Logistic regression is a common machine learning algorithm for binary classification and predicts a binary categorical variable through a logistic function.
The logistic regression model passes the outcome of a linear function of features through a logistic function to calculate the probability of an occurrence. The model then maps the probability to binary outcomes.
The logistic function in linear regression is a type of sigmoid, a class of functions with the same specific properties.
Sigmoid is a mathematical function that takes any real number and maps it to a probability between 1 and 0.
The formula of the sigmoid function is:
The sigmoid function forms an S shaped graph, which means as approaches infinity, the probability becomes 1, and as approaches negative infinity, the probability becomes 0. The model sets a threshold that decides what range of probability is mapped to which binary variable.
Suppose we have two possible outcomes, true and false, and have set the threshold as 0.5. A probability less than 0.5 would be mapped to the outcome false, and a probability greater than or equal to 0.5 would be mapped to the outcome true.
Suppose, a regression model is fit using some training data to obtain β and x represents the input features:
The probability of being mapped to 1 is given by the equation: