
Logistic Regression Steps: 5 to 7

Explore key steps in preparing data for logistic regression, including handling missing values by removal or imputation, defining dependent and independent variables, and assigning the logistic regression algorithm for classification tasks. This lesson helps you understand practical data preprocessing and model setup.

5) Remove and fill missing values

Let’s now inspect the data frame for missing values.

Python
#5. Remove and fill missing values
print(df.isnull().sum())
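
As a rough illustration of what this check reports, here is a minimal sketch on a small made-up data frame. The name `toy` and its values are hypothetical stand-ins for the lesson's `df`; only the column names are borrowed from the dataset. Filtering the counts to those above zero is an optional convenience when a data frame has many columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the lesson's df (values are made up)
toy = pd.DataFrame({
    "Facebook Friends": [120.0, np.nan, 340.0, np.nan],
    "# Videos": [1.0, 0.0, np.nan, 2.0],
    "State_successful": [1, 0, 1, 0],
})

# Count missing values per column, as in the lesson's step 5
missing_counts = toy.isnull().sum()

# Keep only the columns that actually contain missing values
missing_only = missing_counts[missing_counts > 0]
print(missing_only)
```

With thirty-six variables, showing only the non-zero counts makes the affected columns easier to spot.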

The output shows that four of the thirty-six variables contain missing values. These four variables and their correlation with the dependent variable y (State_successful) are summarized in the table below.

Python
#Code for obtaining correlation coefficients
#print() is needed to display each result when run as a script
print(df['State_successful'].corr(df['Facebook Friends'].astype(float)))
print(df['State_successful'].corr(df['Creator - # Projects Backed'].astype(float)))
print(df['State_successful'].corr(df['# Videos'].astype(float)))
print(df['State_successful'].corr(df['# Words (Risks and Challenges)'].astype(float)))
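
The four separate calls can also be gathered into one summary that pairs each variable's missing-value count with its correlation to the target. The sketch below is only an illustration: `toy`, `cols_with_missing`, and `summary` are hypothetical names, and the values are made up. It relies on the fact that pandas `Series.corr` ignores rows where either value is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the lesson's df (values are made up)
toy = pd.DataFrame({
    "State_successful": [1, 0, 1, 0, 1, 0],
    "Facebook Friends": [300.0, 100.0, np.nan, 150.0, 400.0, np.nan],
    "# Videos": [1.0, 0.0, 1.0, np.nan, 2.0, 0.0],
})

cols_with_missing = ["Facebook Friends", "# Videos"]

# Pearson correlation of each column with the target;
# Series.corr skips rows where either value is NaN
corr = pd.Series({
    col: toy["State_successful"].corr(toy[col].astype(float))
    for col in cols_with_missing
})

# Put missing-value counts and correlations side by side
summary = pd.DataFrame({
    "missing": toy[cols_with_missing].isnull().sum(),
    "corr_with_target": corr,
})
print(summary)
```

Seeing both numbers together helps when weighing whether a variable with many missing values is worth keeping.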

The Facebook Friends and Creator - # Projects Backed variables have many missing values, but their correlation to the dependent variable (State_successful) is ...