Solution: Supervised Learning
Follow the instructions for using supervised learning algorithms on preprocessed data.
There are multiple possible solutions for this coding challenge, depending on the algorithms we choose, but here is one possible solution:
main.py:

```python
import pandas as pd

preprocessed = pd.read_csv("preprocessed.csv")

y_var = 'Churn_Yes'
X_var = ['SeniorCitizen_1', 'tenure', 'MonthlyCharges', 'TotalCharges',
         'gender_Male', 'Partner_Yes', 'Dependents_Yes', 'PhoneService_Yes',
         'MultipleLines_No phone service', 'MultipleLines_Yes',
         'InternetService_Fiber optic', 'InternetService_No',
         'OnlineSecurity_No internet service', 'OnlineSecurity_Yes',
         'OnlineBackup_No internet service', 'OnlineBackup_Yes',
         'DeviceProtection_No internet service', 'DeviceProtection_Yes',
         'TechSupport_No internet service', 'TechSupport_Yes',
         'StreamingTV_No internet service', 'StreamingTV_Yes',
         'StreamingMovies_No internet service', 'StreamingMovies_Yes',
         'Contract_One year', 'Contract_Two year', 'PaperlessBilling_Yes',
         'PaymentMethod_Credit card (automatic)',
         'PaymentMethod_Electronic check',
         'PaymentMethod_Mailed check']

# Define X (model features) and y (target variable)
X = preprocessed[X_var]
y = preprocessed[y_var]

# First algorithm: logistic regression
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(penalty='l2', C=10)
lr.fit(X, y)
lr_predictions = lr.predict(X)
print(lr_predictions)

# Second algorithm: k-nearest neighbors
from sklearn.neighbors import KNeighborsClassifier
kn_classifier = KNeighborsClassifier(n_neighbors=4, metric='euclidean',
                                     weights='distance')
kn_classifier.fit(X, y)
kn_predictions = kn_classifier.predict(X)
print(kn_predictions)

# Third algorithm: decision tree
from sklearn.tree import DecisionTreeClassifier
dt_classifier = DecisionTreeClassifier(max_depth=5, min_samples_split=10)
dt_classifier.fit(X, y)
dt_predictions = dt_classifier.predict(X)
print(dt_predictions)

# Ensemble: stacking the three fitted classifiers
from sklearn.ensemble import StackingClassifier
estimators = [('lr', lr), ('knn', kn_classifier), ('dt', dt_classifier)]
clf = StackingClassifier(estimators=estimators,
                         final_estimator=LogisticRegression())
clf.fit(X, y)
ensemble_predictions = clf.predict(X)
print(ensemble_predictions)
print(y)
```
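Note that every model above is fitted and then asked to predict on the same data, so its predictions can look better than they would on unseen customers. A minimal sketch of the safer pattern, holding out a test set before fitting (the synthetic arrays here are a stand-in for the churn features in preprocessed.csv, not part of the solution):

```python
# Sketch: evaluate on a held-out test split rather than on the
# training data. The random arrays below are hypothetical stand-ins
# for the churn features; only the evaluation pattern matters.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X_syn = rng.normal(size=(200, 5))                  # 200 rows, 5 features
y_syn = (X_syn[:, 0] + X_syn[:, 1] > 0).astype(int)  # binary target

# Hold out 25% of the rows before any fitting happens
X_train, X_test, y_train, y_test = train_test_split(
    X_syn, y_syn, test_size=0.25, random_state=0)

lr = LogisticRegression(penalty='l2', C=10)
lr.fit(X_train, y_train)                 # fit on training rows only
acc = accuracy_score(y_test, lr.predict(X_test))
print(acc)                               # accuracy on unseen rows
```

The same split can be reused to score the k-nearest neighbors, decision tree, and stacking models so their test accuracies are directly comparable.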
We implement the first algorithm: logistic regression. An instance of LogisticRegression is created with penalty='l2' (ridge, i.e., L2 regularization) and C=10 (the inverse of the regularization strength, so larger values mean weaker regularization). The model is fitted to the feature matrix X and the target vector y, and predictions are made on the same feature matrix X using the predict()
...
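The fit/predict pattern described above can be seen in isolation in a tiny sketch (the four-row arrays here are made up for illustration, not taken from the lesson's dataset):

```python
# Minimal fit/predict sketch for LogisticRegression; the tiny
# arrays are hypothetical, not the churn data from the lesson.
import numpy as np
from sklearn.linear_model import LogisticRegression

X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])  # one numeric feature
y_demo = np.array([0, 0, 1, 1])                  # binary target

model = LogisticRegression(penalty='l2', C=10)
model.fit(X_demo, y_demo)                 # learn the coefficients
print(model.predict(X_demo))              # hard 0/1 class labels
print(model.predict_proba(X_demo)[:, 1])  # P(class == 1) per row
```

predict() returns hard class labels, while predict_proba() exposes the underlying probabilities, which is often more useful for churn work where customers are ranked by risk.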