Data Processing

Get some hands-on experience in data processing with Python.

In this lesson, we’ll cover the essential steps of data preprocessing, which are crucial for preparing data for machine learning (ML) models. By the end of this lesson, you’ll have hands-on experience processing data to make it ready for analysis.

Data processing

Before diving into data analysis or ML, it’s essential to preprocess the data. This step ensures that the data is clean and well-structured. Common preprocessing tasks include handling missing values, encoding categorical variables, and scaling numerical features. Here’s how to perform some basic data preprocessing:

Handling missing values

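If the dataset contains missing values, we need to decide how to handle them: either impute them (fill them in with an estimate, such as the column mean or median) or remove the affected rows or columns. Left unhandled, missing values can bias results, so dealing with them improves dataset quality. Keep in mind that removing rows also discards the valid values in those rows, so the right choice depends on how much data is missing. The first step is to check how many values are missing in each column.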

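import seaborn as sns
data = sns.load_dataset("tips")
print(data.isnull().sum())  # count missing values per column

Code explanation

  • Line 1: We import the seaborn library under its conventional alias, sns.

  • Line 2: We load the tips example dataset into a pandas DataFrame.

  • Line 3: The data.isnull() call marks each element as True if it’s missing and False otherwise, and .sum() then adds up the missing values for each column.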
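
Once we know where the gaps are, the two options described above (removing or imputing) can be sketched as follows. This is only an illustration under stated assumptions: the small DataFrame is made-up toy data rather than the tips dataset, and filling with the median and the most frequent value is just one common imputation choice, not the only valid one.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the tips dataset) with two missing entries.
df = pd.DataFrame({
    "total_bill": [16.99, np.nan, 21.01, 23.68],
    "day": ["Sun", "Sat", None, "Sun"],
})

# Count the missing values in each column, as in the check above.
missing_counts = df.isnull().sum()
print(missing_counts)

# Option 1: remove every row that contains a missing value.
dropped = df.dropna()

# Option 2: impute. The numeric column gets its median, and the
# categorical column gets its most frequent value (the mode).
imputed = df.copy()
imputed["total_bill"] = imputed["total_bill"].fillna(imputed["total_bill"].median())
imputed["day"] = imputed["day"].fillna(imputed["day"].mode()[0])

print(len(dropped))                   # rows left after dropping
print(imputed.isnull().sum().sum())   # missing values left after imputing
```

Dropping is simplest but shrinks the dataset, while imputing keeps every row at the cost of introducing estimated values. Which one fits better depends on how many values are missing and why.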

Encoding categorical variables

We need to convert categorical variables into numerical form so they ...