Update the Gradient

Apply what we have learned about sigmoids by updating the loss function.

Updating the gradient

Now that we have a brand-new loss function, let’s look up its gradient. Here is the partial derivative of the log loss with respect to the weight, straight from the math textbooks:

$$\frac{\delta L}{\delta w} = \frac{1}{m} \sum_{i=1}^{m} x_i (\hat{y_i} - y_i)$$

This gradient might look familiar. In fact, it closely resembles the gradient of the mean squared error that we have used so far:

$$\frac{\delta MSE}{\delta w} = \frac{2}{m}\sum_{i=1}^{m} x_i (\hat{y_i}-y_i)$$

See how similar they are? The two gradients differ only in the constant factor and in how the prediction $\hat{y_i}$ is computed. This means that we can take our previous gradient() function and adapt it to the log loss with only a small change.
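As a rough sketch of what that adaptation could look like, here is one possible NumPy version. The names forward(), X, Y, and w are assumptions for illustration, not necessarily the exact ones used earlier in this course; the point is simply that the log-loss gradient reuses the same structure as the MSE gradient, minus the factor of 2, with the sigmoid applied to the prediction.

```python
import numpy as np

def sigmoid(z):
    # Squash any real number into the (0, 1) range
    return 1 / (1 + np.exp(-z))

def forward(X, w):
    # Hypothetical prediction step: weighted input passed through the sigmoid
    return sigmoid(X * w)

def gradient(X, Y, w):
    # Gradient of the log loss with respect to w:
    #   (1 / m) * sum(x_i * (y_hat_i - y_i))
    # np.average() divides by the number of examples m for us.
    return np.average(X * (forward(X, w) - Y))

# Tiny usage example with made-up data
X = np.array([1.0, 2.0, 3.0])   # input values
Y = np.array([0, 0, 1])         # binary labels
w = 0.1
print(gradient(X, Y, w))
```

In this sketch, the only differences from an MSE-style gradient are that the factor of 2 disappears and the prediction goes through the sigmoid before being compared with the label.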
