Gradient Descent to Find Optimal Parameter Values
Learn how finding the parameter values for a logistic regression with a log-loss cost can be treated as an optimization problem.
Optimization problem for logistic regression
The problem of finding the parameter values (coefficients and intercept) for a logistic regression model using a log-loss cost boils down to a problem of optimization: we would like to find the set of parameters that results in the minimum cost, because costs are higher for worse predictions. In other words, we want the set of parameters that is the “least wrong” on average over all of the training samples. This process is done for you automatically by the fit method of the logistic regression model in scikit-learn. There are different solution techniques for finding the set of parameters with the lowest cost, and you can choose which one you would like to use with the solver keyword when you are instantiating the model class. All of these methods work somewhat differently. However, they are all based on the concept of gradient descent.
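As a minimal sketch of the solver keyword, the example below fits the same model with two of scikit-learn's solvers and compares the resulting log-loss cost. The synthetic data, the variable names, and the choice of 'lbfgs' and 'liblinear' are assumptions made for illustration and are not part of the lesson's own dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Hypothetical synthetic data: one feature, noisy binary labels
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

costs = {}
for solver in ['lbfgs', 'liblinear']:
    # The solver keyword selects the optimization technique used by fit
    model = LogisticRegression(solver=solver)
    model.fit(X, y)
    # Average log-loss over the training samples for the fitted parameters
    costs[solver] = log_loss(y, model.predict_proba(X))
    print(solver, model.coef_, model.intercept_, costs[solver])
```

The two solvers should arrive at similar, though not necessarily identical, parameters and costs. Small differences can arise from the default regularization and from each solver's stopping criteria.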
Understanding gradient descent
The gradient descent process starts with an initial guess. The choice of the initial guess is not that important for logistic regression and you don’t need to make it manually; this is handled by the solver you select with the solver keyword. However, for more advanced machine learning algorithms such as deep neural networks, selection of the initial guesses for parameters requires more attention.
For the sake of illustration, we will consider a problem where there is only one parameter to estimate. We’ll look at the value of a hypothetical cost function and devise a gradient descent procedure to find the value of the parameter, x, for which the cost, y, is the lowest. Here, we choose some x values, create a function that returns the value of the cost function, and look at the value of the cost function over this range of parameters.
The code to do this is as follows:
import numpy as np
# 81 evenly spaced parameter values from -3 to 5
X_poly = np.linspace(-3, 5, 81)
print(X_poly[:5], '...', X_poly[-5:])
Here is the output of the print statement:
[-3. -2.9 -2.8 -2.7 -2.6] ... [4.6 4.7 4.8 4.9 5. ]
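The 0.1 spacing in this output follows from the arguments: 81 points cover an interval of length 8 with 80 gaps, so each gap is 8/80 = 0.1. A quick check, using NumPy's retstep option (the variable names points and step are chosen here only for illustration):

```python
import numpy as np

# retstep=True also returns the spacing between consecutive points
points, step = np.linspace(-3, 5, 81, retstep=True)
print(len(points), step)
```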
The remaining code snippet is as follows:
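import matplotlib.pyplot as plt

def cost_function(X):
    # Hypothetical cost as a function of a single parameter
    return X * (X-2)

# Evaluate the cost over the range of parameter values and plot it
y_poly = cost_function(X_poly)
plt.plot(X_poly, y_poly)
plt.xlabel('Parameter value')
plt.ylabel('Cost function')
plt.title('Error surface')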
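The resulting plot shows the cost as a parabola over parameter values from -3 to 5, with the x-axis labeled 'Parameter value', the y-axis labeled 'Cost function', and the title 'Error surface'.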
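To make the gradient descent idea concrete for this one-parameter cost, here is a minimal sketch. It assumes the derivative of x * (x - 2), which is 2x - 2, and uses a hypothetical learning rate, step count, and pair of initial guesses. These values and the helper names gradient and gradient_descent are illustrative choices, not settings taken from scikit-learn or from the lesson.

```python
def cost_function(x):
    # Same hypothetical cost as in the lesson: x * (x - 2)
    return x * (x - 2)

def gradient(x):
    # Derivative of x**2 - 2*x with respect to x
    return 2 * x - 2

def gradient_descent(x_start, learning_rate=0.1, n_steps=100):
    # Repeatedly step against the gradient, recording each visited value
    path = [x_start]
    x = x_start
    for _ in range(n_steps):
        x = x - learning_rate * gradient(x)
        path.append(x)
    return path

# Two assumed initial guesses on opposite sides of the minimum
for x_start in (4.5, -2.5):
    path = gradient_descent(x_start)
    x_final = path[-1]
    print(x_start, '->', round(x_final, 4), 'cost', round(cost_function(x_final), 4))
```

Both runs end near a parameter value of 1, where the cost is -1, the minimum of this cost function. Because the curve has a single minimum, the starting point affects only how many steps are needed, not where the procedure ends up.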