Logistic Regression Implementation

Implement logistic regression.

It's time to implement what we have learned and see how logistic regression can be used in business.

The data and its overview

Consider a college’s decision to shortlist applicants for admission based on their GRE score or GPA. The college offers several specializations, whose codes are stored in the field column. To keep things simple, we consider only a few columns:

  • gre: The GRE score of the applicant.

  • gpa: The GPA of the applicant.

  • field: The code of the field of study to which the student has applied.

  • admit: The target column, a binary outcome that is 1 if the applicant was admitted and 0 otherwise.

import pandas as pd

adm = pd.read_csv('admissions.csv')
print(adm.head())

The preview shows some missing values in this small dataset. Let's calculate what percentage of each column is missing.

# Percentage of missing values in each column
print((adm.isnull().sum()/len(adm)*100).to_string())

Only a small fraction of the data is missing, so we can simply drop the affected rows.

print('No. of observations before removing missing values:', adm.shape[0])
adm.dropna(inplace=True) # inplace=True modifies adm itself instead of returning a new DataFrame
print('No. of observations after removing missing values:', adm.shape[0])

Let's compute the probabilities and the odds for admission based on the field of study.

print(adm.field.value_counts().to_string()) # number of applicants per field code

There are four fields available to the applicants, and the largest number of students applied to the field with code 2. Note that these counts cover all applicants, not only those who were admitted.

import numpy as np

# targets w.r.t. the fields 1, 2, 3 and 4
y_f1 = adm[adm.field == 1].admit
y_f2 = adm[adm.field == 2].admit
y_f3 = adm[adm.field == 3].admit
y_f4 = adm[adm.field == 4].admit
# chance of getting into the respective field -- probability!
# (the mean of a 0/1 column is the share of 1s)
print("Probabilities:")
print('P(admit | field = 1):', np.mean(y_f1))
print('P(admit | field = 2):', np.mean(y_f2))
print('P(admit | field = 3):', np.mean(y_f3))
print('P(admit | field = 4):', np.mean(y_f4))
# let's get the odds from the probabilities
print("\nOdds:")
def odds(p):
    return float(p) / (1 - p) # odds = p / (1 - p)
print('odds(admit | field = 1):', odds(np.mean(y_f1)))
print('odds(admit | field = 2):', odds(np.mean(y_f2)))
print('odds(admit | field = 3):', odds(np.mean(y_f3)))
print('odds(admit | field = 4):', odds(np.mean(y_f4)))
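The per-field calculation above can also be written more compactly with a group-by. The following is a minimal sketch, not the lesson's own code: the `toy` DataFrame is made-up data standing in for `admissions.csv`, and the names `prob_by_field`, `odds_by_field`, and `summary` are hypothetical. It also adds the log-odds, assuming that is the quantity we want to model next.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, NOT the course's admissions.csv
toy = pd.DataFrame({
    "field": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "admit": [1, 1, 1, 0, 1, 0, 0, 0, 0],
})

# P(admit | field): the mean of a 0/1 column is the share of 1s in each group
prob_by_field = toy.groupby("field")["admit"].mean()

# odds = p / (1 - p), computed for every field at once
odds_by_field = prob_by_field / (1 - prob_by_field)

# log-odds (logit): negative when p < 0.5, positive when p > 0.5
log_odds_by_field = np.log(odds_by_field)

summary = pd.DataFrame({
    "probability": prob_by_field,
    "odds": odds_by_field,
    "log_odds": log_odds_by_field,
})
print(summary)
```

In this toy data, field 1 admits 3 of 4 applicants (probability 0.75, odds 3), while field 2 admits 1 of 5 (probability 0.2, odds 0.25). Odds above 1 mean admission is more likely than rejection, and odds below 1 mean the opposite.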

Now that we have the probabilities and odds, let’s model them and compare.

Linear regression vs. logistic regression

Similar to linear regression, we’ll have to create an instance of the logistic regression model and train it on the dataset. Once ...
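To make the comparison concrete, here is a rough sketch of the create-an-instance-then-train pattern. It assumes scikit-learn's `LinearRegression` and `LogisticRegression`, which may not be the library used in the rest of this lesson, and it uses synthetic GPA data rather than `admissions.csv`. The variable names and the assumed relationship between GPA and admission are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Synthetic stand-in for the admissions data (an assumption, NOT the course dataset)
rng = np.random.default_rng(0)
gpa = rng.uniform(2.0, 4.0, size=200)
# Assumed ground truth: the chance of admission rises with GPA
p_true = 1 / (1 + np.exp(-3.0 * (gpa - 3.0)))
admit = rng.binomial(1, p_true)

X = gpa.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix

# Create an instance of each model and train it on the same data
lin = LinearRegression().fit(X, admit)
log = LogisticRegression().fit(X, admit)

# Predict on a grid that deliberately extends beyond the observed GPA range
grid = np.array([[0.0], [3.0], [6.0]])
lin_pred = lin.predict(grid)               # unbounded straight line
log_pred = log.predict_proba(grid)[:, 1]   # P(admit = 1), always within [0, 1]

print("linear  :", lin_pred)
print("logistic:", log_pred)
```

The point of the sketch is the difference in output range: the linear model's predictions can fall below 0 or above 1 when we move away from the training data, while the logistic model always returns a valid probability.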