Read Data into DataFrame
Learn to read data in pandas and PySpark.
Read data in pandas
There are many ways to read data in pandas, but in this lesson, we’ll focus on the following two ways:
- Read with custom code, then convert the result to a pandas DataFrame.
- Read with a built-in pandas function.
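As a rough illustration of how the two ways relate, here is a minimal sketch that runs both on a tiny in-memory sample. The sample records and the field names `item` and `rating` are hypothetical and are not taken from the lesson's dataset. The sketch assumes the data is in JSON Lines format (one JSON object per line), and `pd.read_json` with `lines=True` stands in for the built-in function.

```python
import io
import json

import pandas as pd

# Hypothetical two-record sample in JSON Lines format (one JSON object per line).
# The field names are made up for illustration only.
SAMPLE = '{"item": "A1", "rating": 4.5}\n{"item": "B2", "rating": 3.5}\n'

# Way 1: parse each line with custom code, then convert the list of dicts
# to a pandas DataFrame.
records = [json.loads(line) for line in io.StringIO(SAMPLE)]
custom_pdf = pd.DataFrame(records)

# Way 2: let a built-in pandas function parse the same text.
# lines=True tells pandas to treat each line as a separate JSON object.
builtin_pdf = pd.read_json(io.StringIO(SAMPLE), lines=True)

print(custom_pdf.head())
print(builtin_pdf.head())
```

Both DataFrames should hold the same records. The custom route gives full control over how each line is parsed, while the built-in route is shorter to write.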
Note: The code examples below are executable.
Using custom code
import pandas as pd
from tqdm import tqdm
import json

PATH_BIGDATA = '/Toys_and_Games_5.json'

def read_json_to_pdf(path: str) -> pd.DataFrame:
    data = []
    with open(path, 'r') as f:
        for line in tqdm(f):
            data.append(json.loads(line))
    df = pd.DataFrame(data)
    return df

raw_pdf = read_json_to_pdf(PATH_BIGDATA)
print(raw_pdf.head())
print('Code Executed Successfully')
Read data using custom code in pandas
After a successful code execution, we’ll see the message “Code Executed Successfully” in the terminal.
Explanation
- Lines 1–3: We import the ...