Logistic Regression Implementation
Implement logistic regression on an admissions dataset.
It's time to implement what we have learned and see how logistic regression can be used in business.
The data and its overview
Consider a college that shortlists applicants for admission based on their GRE score and GPA. The college offers several specializations, whose codes are stored in the field column. To keep things simple, we consider only a few columns:
gre: The GRE score of the applicant.
gpa: The GPA of the applicant.
field: The field of study to which the student has applied.
admit: The target column, with binary 1-0 outcomes showing whether the student was admitted or not.
import numpy as np   # used later for the probability calculations
import pandas as pd

adm = pd.read_csv('admissions.csv')
print(adm.head())
We can see some missing values in this small dataset. Let's calculate what percentage of each column is missing.
# Percentage of missing data in each column
print((adm.isnull().sum() / len(adm) * 100).to_string())
Only a small fraction of the data is missing, so we can simply drop the affected rows.
print('No. of observations before removing missing values:', adm.shape[0])
adm.dropna(inplace=True)  # inplace=True modifies adm directly
print('No. of observations after removing missing values:', adm.shape[0])
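The two steps above (measuring missing data, then dropping incomplete rows) can be tried on a small stand-in dataset. This is a minimal sketch: the `toy` frame and its values are made up for illustration and are not taken from admissions.csv. It also uses `dropna()` without `inplace=True`, which returns a new frame instead of modifying the original.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for admissions.csv (values are made up)
toy = pd.DataFrame({
    "admit": [0, 1, 1, 0, 1],
    "gre":   [380.0, 660.0, np.nan, 640.0, 520.0],
    "gpa":   [3.61, 3.67, 4.00, np.nan, 2.93],
    "field": [3, 3, 1, 4, 4],
})

# The mean of a boolean mask is the share of True values,
# so this gives the percentage of missing entries per column
missing_pct = toy.isnull().mean() * 100

# dropna() without inplace returns a new frame and leaves `toy` untouched
toy_clean = toy.dropna()

print(missing_pct.to_string())
print("rows before:", len(toy), "rows after:", len(toy_clean))
```

Keeping the original frame around makes it easy to compare row counts before and after, at the cost of holding two copies in memory.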
Let's compute the probabilities and the odds for admission based on the field of study.
print(adm.field.value_counts().to_string())  # number of applicants in each field
We have four fields available to the applicants, and the field with code 2 received the most applications.
# targets w.r.t. the fields 1, 2, 3 and 4
y_f1 = adm[adm.field == 1].admit
y_f2 = adm[adm.field == 2].admit
y_f3 = adm[adm.field == 3].admit
y_f4 = adm[adm.field == 4].admit

# chance of getting into the respective field -- probability!
print("Probabilities:")
print('P(admit | field = 1):', np.mean(y_f1))
print('P(admit | field = 2):', np.mean(y_f2))
print('P(admit | field = 3):', np.mean(y_f3))
print('P(admit | field = 4):', np.mean(y_f4))

# let's get the odds from the probabilities
print("\nOdds:")

def odds(p):
    return float(p) / (1 - p)  # a function to compute odds

print('odds(admit | field = 1):', odds(np.mean(y_f1)))
print('odds(admit | field = 2):', odds(np.mean(y_f2)))
print('odds(admit | field = 3):', odds(np.mean(y_f3)))
print('odds(admit | field = 4):', odds(np.mean(y_f4)))
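The same per-field probabilities and odds can be computed in one pass with a group-by, which avoids repeating a line per field. This is a sketch under stated assumptions: the `toy` frame and its values are made up, and `prob_from_odds` is a hypothetical helper added only to show that odds can be converted back into a probability.

```python
import pandas as pd

# Hypothetical toy data (made-up values), using the lesson's column names
toy = pd.DataFrame({
    "field": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    "admit": [1, 1, 1, 0, 1, 0, 0, 0, 1, 0],
})

def odds(p):
    """Convert a probability p (0 <= p < 1) into odds p / (1 - p)."""
    return p / (1 - p)

def prob_from_odds(o):
    """Invert odds back into a probability: o / (1 + o)."""
    return o / (1 + o)

# The mean of a 0/1 column within each group is that field's admission rate
summary = toy.groupby("field")["admit"].mean().to_frame("probability")
summary["odds"] = summary["probability"].apply(odds)

print(summary)
```

In this toy data, field 1 admits 3 of 4 applicants, so its probability of 0.75 corresponds to odds of 3 (three admissions for every rejection), while a probability of 0.5 corresponds to odds of exactly 1.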
Now that we have the probabilities and odds, let’s model them and compare.
Linear regression vs. logistic regression
As with linear regression, we need to create an instance of the logistic regression model and train it on the dataset. Once ...