Estimating the Coefficients and Intercepts of Logistic Regression

Learn about cost functions used in logistic regression models.

In the previous chapter, we learned that the coefficients of a logistic regression model (each of which goes with a particular feature), as well as the intercept, are determined using the training data when the fit method is called on a logistic regression model in scikit-learn. These numbers are called the parameters of the model, and the process of finding the best values for them is called parameter estimation. Once the parameters are found, the logistic regression model is essentially a finished product: with just these numbers, we can use a logistic regression model in any environment where we can perform common mathematical functions.
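As a minimal sketch of this workflow, the following uses made-up synthetic data to fit a logistic regression in scikit-learn and then inspects the estimated parameters; the data and variable names are illustrative, not from the chapter's case study:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data for illustration: 100 samples, 2 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # labels depend on both features

model = LogisticRegression()
model.fit(X, y)  # parameter estimation happens here

print(model.coef_)       # one coefficient per feature, shape (1, 2)
print(model.intercept_)  # the intercept, shape (1,)
```

Once `coef_` and `intercept_` are known, predictions need nothing more than these numbers and basic arithmetic, which is why the fitted model is portable to any environment.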

Figure: Parameter estimation

Understanding cost functions

The process of parameter estimation is clearly important, because this is how we turn our data into a predictive model. So, how does parameter estimation work? The first step in understanding it is to become familiar with the concept of a cost function. A cost function measures how far the model predictions are from perfectly describing the data: the larger the difference between the model predictions and the actual data, the larger the "cost" returned by the cost function.

Mean squared error for linear regression

For regression problems, this is a straightforward concept: the difference between each prediction and the true value can serve as the cost. This difference is passed through a transformation, such as taking the absolute value or squaring, to ensure the cost is positive. Finally, the cost is averaged over all the training samples; with squaring, this average is the mean squared error.
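The steps above (difference, squaring, averaging) can be sketched in a few lines; the function name here is our own, not a library API:

```python
import numpy as np

def mean_squared_error_cost(y_true, y_pred):
    """Average squared difference between true values and predictions."""
    residuals = np.asarray(y_true) - np.asarray(y_pred)  # differences
    return np.mean(residuals ** 2)                       # square, then average

# Predictions close to the truth give a small cost
print(mean_squared_error_cost([3.0, 5.0], [2.5, 5.5]))  # 0.25
```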

Log-loss for logistic regression

For classification problems, especially in fitting logistic regression models, a typical cost function is the log-loss function, also called cross-entropy loss. This is the cost function that scikit-learn uses, in a modified form, to fit logistic regression:

$$\text{log loss} = -\frac{1}{n}\sum_{i=1}^{n}\Big[\,y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\,\Big]$$
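Here $y_i$ is the true label (0 or 1) and $p_i$ is the predicted probability that sample $i$ belongs to the positive class. A direct NumPy translation of the formula, with a hypothetical function name of our choosing, looks like this:

```python
import numpy as np

def log_loss_cost(y_true, p_pred):
    """Log-loss (cross-entropy) averaged over n samples."""
    y = np.asarray(y_true, dtype=float)  # true labels y_i in {0, 1}
    p = np.asarray(p_pred, dtype=float)  # predicted probabilities p_i
    # For y_i = 1 only the first term contributes; for y_i = 0, only the second
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Confident, correct predictions give a small cost
print(log_loss_cost([1, 0], [0.9, 0.1]))  # about 0.105
```

Note that a confident but wrong prediction (e.g., $p_i$ near 1 when $y_i = 0$) drives the cost toward infinity, which is exactly the strong penalty that makes log-loss suitable for fitting probability-producing models like logistic regression.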