Read Data into DataFrame

Learn to read data in pandas and PySpark.

Read data in pandas

There are many ways to read data in pandas, but in this lesson, we’ll focus on the following two ways:

  1. Read the file with custom code, then convert the result to a pandas DataFrame.

  2. Read with a built-in pandas function.
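As a rough illustration of how the two ways differ, the sketch below parses the same small in-memory sample both ways. The sample records and their field names (`asin`, `overall`) are hypothetical, and the input is assumed to hold one JSON object per line. The built-in call shown is `pd.read_json` with `lines=True`, which is one possible choice for this kind of input.

```python
import io
import json

import pandas as pd

# Hypothetical two-record sample, one JSON object per line (assumed format).
SAMPLE = '{"asin": "A1", "overall": 5.0}\n{"asin": "B2", "overall": 3.0}\n'

# Way 1: parse each line with custom code, then build the DataFrame.
records = [json.loads(line) for line in io.StringIO(SAMPLE)]
custom_pdf = pd.DataFrame(records)

# Way 2: let a built-in pandas function parse the line-delimited JSON.
builtin_pdf = pd.read_json(io.StringIO(SAMPLE), lines=True)

print(custom_pdf)
print(builtin_pdf)
```

Both calls produce a DataFrame with one row per input line. The custom route leaves room for per-record handling, such as filtering or progress reporting, while the built-in route needs less code.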

Note: The code discussed below is executable.

Using custom code

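import pandas as pd
from tqdm import tqdm
import json
PATH_BIGDATA = '/Toys_and_Games_5.json'  # read line by line: one JSON object per line
def read_json_to_pdf(path: str) -> pd.DataFrame:
    data = []  # one parsed dict per record
    with open(path, 'r') as f:
        for line in tqdm(f):  # tqdm reports progress while the file is read
            data.append(json.loads(line))  # parse one line into a dict
    df = pd.DataFrame(data)  # build the DataFrame from the list of dicts
    return df
raw_pdf = read_json_to_pdf(PATH_BIGDATA)
print(raw_pdf.head())  # preview the first five rows

print('Code Executed Successfully')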
Read data using custom code in pandas

If the code runs without errors, the message “Code Executed Successfully” appears in the terminal.

Explanation

  • Lines 1–3: We import the ...