Machine Learning and Imbalanced Data
Learn to deal with the class imbalance problem manually and with SMOTE.
Since we have the features and the targets from our previous lesson, let's split them into train and test datasets.
Imbalanced data
Let's also check the class imbalance for our training data.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

print(y_train.value_counts(), '\n',
      "Minority class (Active) is only {} % in the training set".format(
          round(y_train.value_counts()[1] / len(y_train) * 100, 2)))
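As an optional variation that the lesson's code does not use, `train_test_split` accepts a `stratify` argument that keeps the class proportions the same in both splits. The sketch below is only an illustration: `X_demo` and `y_demo` are hypothetical stand-ins for the lesson's `X` and `y`, built from random numbers with an assumed 5 % minority class.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the lesson's X and y: 1000 rows, 5 % positives.
rng = np.random.default_rng(101)
X_demo = pd.DataFrame(rng.normal(size=(1000, 3)), columns=["f1", "f2", "f3"])
y_demo = pd.Series([1] * 50 + [0] * 950, name="target")

# stratify=y_demo asks for the same class proportions in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=101, stratify=y_demo
)

# With 0/1 labels, the mean of the target is the minority-class share.
train_share = y_tr.mean() * 100
test_share = y_te.mean() * 100
print(f"Minority share in train: {train_share:.2f} %")
print(f"Minority share in test:  {test_share:.2f} %")
```

Without `stratify`, the minority share can drift between the two splits by chance, which matters more the rarer the minority class is.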
With that, let's train a logistic regression model.
from sklearn.linear_model import LogisticRegression

# Creating the model instance
logR = LogisticRegression(max_iter=10000)

# Fitting the model
logR.fit(X_train, y_train)

# Accuracy score
print("Accuracy Score for (X_train, y_train):", logR.score(X_train, y_train))
The numbers look impressive with an accuracy of ~98%. The minority class is only ...
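To see why a high accuracy can be misleading on imbalanced data, it may help to compare the model against a baseline that always predicts the majority class. The sketch below is a minimal illustration on synthetic data, not the lesson's dataset: `X_demo` and `y_demo` are hypothetical, generated with `make_classification` so that only a small percentage of rows belong to class 1.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic, hypothetical data: only a few percent of rows are class 1.
X_demo, y_demo = make_classification(
    n_samples=5000, n_features=10, weights=[0.98, 0.02], random_state=101
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=101
)

# Baseline that ignores the features and always predicts the majority class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
model = LogisticRegression(max_iter=10000).fit(X_tr, y_tr)

# Accuracy rewards getting the majority class right;
# recall on class 1 shows how many minority rows are actually found.
baseline_acc = baseline.score(X_te, y_te)
baseline_recall = recall_score(y_te, baseline.predict(X_te))
model_acc = model.score(X_te, y_te)
model_recall = recall_score(y_te, model.predict(X_te))

print(f"Baseline accuracy: {baseline_acc:.3f}, minority recall: {baseline_recall:.3f}")
print(f"LogReg accuracy:   {model_acc:.3f}, minority recall: {model_recall:.3f}")
```

The baseline never finds a single minority row, yet its accuracy is still close to the majority-class share. That is why a metric such as recall on the minority class is worth checking alongside accuracy.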