Why non-linear regression?

We’ve already seen that we can fit even a non-linear polynomial through data points using linear regression if the function to be approximated was linear in parameters. However, in some cases, linear regression may not fit the data well. In such a case, non-linear regression is preferred. For instance, in a classification problem where the targets are classes (or categories), we may be interested in predicting the probability distribution, bounded between $0$ and $1$ , over all the classes for a given input. If the probability of a class, $c$ , is $p(\hat y_i=c|\bold{x_i})=f_\bold{w}(\bold{x_i})$ , then it’s hard to come up with a function, $f_\bold{w}$ , that’s linear with respect to $\bold{w}$ . The figure below is a pictorial example of such a case, where the number of classes is three and we have to employ non-linear regression to find their probability distribution.

Logistic regression

Consider a classification problem with two classes, where $y_i \in\{0,1\}$ . Such a problem is also called a binary classification problem. One way of modeling $p(\hat y_i=1|\bold{x_i})$ is by using a logistic function as follows:

p(\hat y_i=1|\bold{x_i})=\frac{1}{1+\exp(-(\bold{x_i}^T\bold{w}+b))}

p(\hat y_i=0|\bold{x_i})=1-p(\hat y_i=1|\bold{x_i})

Note: A logistic function is also known as sigmoid, typically denoted as $\sigma(z)$ .
$\sigma(z)=\frac{1}{1+\exp(-z)}$