
Linear Regression vs. Logistic Regression

9 min read
Jan 10, 2025
content
What is linear regression?
Is there a correlation between the data?
How linear regression works
Steps in linear regression
What is logistic regression?
How logistic regression works
Steps in logistic regression
Differences between linear regression and logistic regression
Related functions
Output type
Correlation
Metrics used
Applications

Before we delve deep into the topic of this blog, let’s look at some key takeaways:

Key Takeaways

Linear regression: Predicts continuous values (e.g., weight, price) by fitting a straight line through data.

Logistic regression: Classifies data by predicting the probability of an outcome (e.g., overweight or not).

Key differences: Linear regression predicts values, while logistic regression predicts probabilities for classification. Linear regression is only useful when the input and the output are correlated, while logistic regression works best when its input variables are not highly correlated with each other.

Applications: Linear regression is used for value forecasting; logistic regression is used for classification tasks like weather prediction.

“The coldest winter I have ever spent was a summer in San Francisco.”

The quote above is commonly attributed to Mark Twain. Without digressing any further, let’s get back to the data, the prediction, and everything related to numbers. We're going to take a close look at daily temperatures. Let’s start with the summer of 2024 in San Francisco and look at the daily minimum and maximum temperatures.

What is linear regression?#

Let's assume I want to see whether the daily maximum temperature and the daily minimum temperature are related. It seems that on cold days, both the minimum and the maximum temperatures go down. Similarly, on warmer days, both numbers increase, so my guess is that there should be some relation between them.

If these numbers are related, then knowing one of the values should let us predict the other fairly accurately. This is exactly what we do in linear regression. First, we find out whether the two quantities are related, and then we find the linear relation (defined by a straight line) that maps one value to the other.

Temperature Data for September 2024

In the above chart, I plotted the daily minimum and maximum temperatures for the month of September 2024. The location I picked is San Francisco. Now, we want to check whether the values shown above are related. For this, we use a concept from statistics called correlation, and we ask the following question.

Is there a correlation between the data?#

If the answer is yes, we can use the data to make predictions. With a high correlation, we can be more confident about our predictions.

The correlation coefficient between two numbers is a number between -1 and 1. Values close to zero mean very low correlation. A positive value means increasing one value will most likely increase the other value.
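As a rough illustration of how such a coefficient can be computed, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The function name `pearson_correlation` and the temperature values are assumptions made up for this example; they are not the San Francisco readings plotted above.

```python
import math

def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: how the two sequences vary together around their means.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: how much each sequence varies on its own.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return co_variation / math.sqrt(spread_x * spread_y)

# Made-up illustrative temperatures (assumption), NOT the data from the chart.
daily_min = [50, 52, 54, 56, 58]
daily_max = [60, 63, 64, 68, 70]
r = pearson_correlation(daily_min, daily_max)
print(f"correlation coefficient: {r:.3f}")
```

In this made-up sample, the maximum rises almost in step with the minimum, so the coefficient comes out close to 1.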

The correlation coefficient of the September data is around 0.017, which is very close to zero. This means that there is no significant correlation between the minimum temperature and the maximum temperature of a day.

To keep this blog simple and more accessible, mathematical details have been left out intentionally.

How linear regression works#

In linear regression, there is another keyword: the regression line. Linear regression draws a line, called the regression line, through the data points so that it passes as close to them as possible, in the sense that the sum of the squared vertical distances from the points to the line is as small as possible. This line shows the overall trend and where this trend will lead us. Let's look at the regression line on the points that we discussed earlier.

Regression line for the temperature data

The red line is the trend line and it shows that the maximum temperature will remain close to 70. This does not make much sense, and again, it cannot be used for prediction because the correlation is close to zero.

Now, let's look at similar data, but for the month of January 2024, in the diagram below.

Temperature for January 2024

The regression coefficient is 0.684, and the trend line (regression line or the line of best fit) is shown in red. This is closer to what we expected: an increase in the minimum temperature indicates an increase in the maximum temperature for the same day. This is also an example of positive correlation.

As we know, there are 31 days in January. The above plot was constructed using the first 30 days. Now let's see if we can use the minimum temperature of January 31, 2024 to predict the maximum temperature for the same day.

The minimum temperature on that day was 53. If we look at the red line where the minimum temperature is 53, the maximum temperature is around 60.5. In fact, the actual maximum temperature on that day was 60. This is a pretty good guess.

Now, let's list the steps in linear regression.

Steps in linear regression#

To use linear regression, follow the steps given below.

  1. Separate the variables (input and output).

  2. Load the data (in an Excel sheet or in any programming language).

  3. Find the regression line—the line that best fits the data.

  4. For a new point, use the regression line to predict the output.
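The steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch: it assumes a single input variable and an ordinary least-squares fit, the helper names `fit_line` and `predict` are hypothetical, and the temperatures are made-up values rather than the January readings from the chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: return (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all inputs are identical, so the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Read the predicted output off the regression line."""
    return slope * x + intercept

# Steps 1 and 2: separate input and output, and load the data.
# These values are made up for illustration (assumption).
min_temps = [44, 46, 48, 50, 52]   # input
max_temps = [55, 56, 58, 59, 61]   # output

# Step 3: find the regression line.
slope, intercept = fit_line(min_temps, max_temps)

# Step 4: use the line to predict the output for a new input.
predicted_max = predict(slope, intercept, 53)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, predicted max={predicted_max:.2f}")
```

With more than one input variable, the same idea applies, but the fit is usually delegated to a library rather than written by hand.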

Learn more about linear regression through a hands-on project.


What is logistic regression?#

Now, let's see what other information we can extract from these temperatures.

The diagram above shows minimum and maximum temperatures for the first 30 days of January 2024. Red color indicates that it did not rain on that day, and blue means that it rained. Now, I want to see whether the minimum and maximum temperatures can be used to guess whether it rained on January 31, 2024 or not. The temperature for this day is shown with a black dot.

We can use logistic regression in such scenarios, where we want to predict the likelihood of a specific outcome. In fact, logistic regression returns the probability of a particular outcome. It does this by assigning a score to each combination of inputs. This score can range from negative infinity to positive infinity. A score close to zero means that the probability is around 50%. A large positive score indicates that the event is likely to occur, and a large negative score indicates that the event is unlikely to happen. This score can be converted to a probability, as we’ll see later on.

How logistic regression works#

Logistic regression requires labeled data, such as the one we saw earlier, where each data point is labeled into two classes. Let's call them positive and negative. Using this data, we want to classify a new point p. For the point p, a score is calculated. This score can be anywhere between -∞ and +∞. Then, using a logistic function such as the sigmoid function, this score is converted into a probability. This function maps the score to a range between 0 and 1.
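To make the score-to-probability conversion concrete, here is a small sketch of the standard sigmoid function, 1 / (1 + e^(-score)), in plain Python. The function name `sigmoid` and the sample scores are chosen for illustration only.

```python
import math

def sigmoid(score):
    """Map any real-valued score to a probability between 0 and 1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    exp_score = math.exp(score)
    return exp_score / (1.0 + exp_score)

# Illustrative scores (assumption): negative, zero, and positive.
for score in (-6.0, -1.0, 0.0, 1.0, 6.0):
    print(f"score={score:+.1f} -> probability={sigmoid(score):.3f}")
```

A score of zero maps to a probability of exactly one half, and the probability moves toward 1 or 0 as the score becomes large and positive or large and negative.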

Steps in logistic regression#

For logistic regression, apply the following steps.

  1. Load the data, along with their class or label.

  2. Divide the data into two parts: training and testing.

  3. Use the training data to train the model, i.e., the weights (one for each input variable) are calculated.

  4. Use the testing data to evaluate the performance of logistic regression. For each test point, calculate the score using the weights found above.

  5. Apply the logistic function to convert the score into a probability.

  6. This probability is the probability that the input belongs to a particular class.

  7. If the probability is greater than 1/2, the function returns a positive result, otherwise it returns negative. In case of multiple classes, the model returns the class with the highest probability.

For simplicity, some of the steps of the logistic regression have been omitted. For implementation, you can read the process given in the project Implement Logistic Regression in Python from Scratch.
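As a rough companion to the steps listed above, the following is a minimal from-scratch sketch in plain Python. It is not the implementation from the project mentioned above. It assumes two classes, batch gradient descent on the log loss as the training method, and a fixed train/test split. All function names, the learning rate, the number of epochs, and the data points are assumptions made up for illustration.

```python
import math

def sigmoid(score):
    """Convert a real-valued score into a probability between 0 and 1."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    exp_score = math.exp(score)
    return exp_score / (1.0 + exp_score)

def raw_score(weights, bias, point):
    """Weighted sum of the inputs plus a bias term."""
    return sum(w * x for w, x in zip(weights, point)) + bias

def train(points, labels, learning_rate=0.1, epochs=1000):
    """Fit the weights and bias with batch gradient descent on the log loss."""
    weights = [0.0] * len(points[0])
    bias = 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for point, label in zip(points, labels):
            error = sigmoid(raw_score(weights, bias, point)) - label
            for j, x in enumerate(point):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

def predict_probability(weights, bias, point):
    """Probability that the point belongs to the positive class."""
    return sigmoid(raw_score(weights, bias, point))

def classify(weights, bias, point, threshold=0.5):
    """Return 1 (positive) if the probability exceeds the threshold, else 0."""
    return 1 if predict_probability(weights, bias, point) > threshold else 0

# Steps 1 and 2: labeled, made-up data (assumption), already split by hand.
train_points = [(1.0, 2.0), (2.0, 1.5), (1.5, 3.0), (2.5, 2.5),
                (-1.0, -2.0), (-2.0, -1.5), (-1.5, -3.0), (-2.5, -2.5)]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
test_points = [(2.0, 1.0), (1.0, 2.5), (-2.0, -1.0), (-1.0, -2.5)]
test_labels = [1, 1, 0, 0]

# Step 3: train the model to find the weights.
weights, bias = train(train_points, train_labels)

# Step 4: evaluate on the testing data.
predictions = [classify(weights, bias, p) for p in test_points]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)

# Steps 5 to 7: score a new point, convert to a probability, apply the threshold.
new_point = (0.5, 1.0)
probability = predict_probability(weights, bias, new_point)
print(f"test accuracy={accuracy:.2f}, "
      f"P(positive)={probability:.3f}, class={classify(weights, bias, new_point)}")
```

In practice, a library implementation would normally replace the hand-written training loop; the sketch only shows how the score, the sigmoid, and the threshold fit together.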

Differences between linear regression and logistic regression#

Let's look at the differences between linear regression and logistic regression in the table below before discussing each of these points.

| | Linear Regression | Logistic Regression |
|---|---|---|
| Function in action | Linear – Straight line | Sigmoid – S-shaped |
| Output | A number | A probability |
| Typically used for | Predicting the output | Classifying the input |
| Requires correlation | Yes | No |
| Evaluation metrics | Correlation coefficient, root mean square error | Accuracy, precision, recall, F-1 score |

Related functions#

  • In linear regression, a best-fit straight line is drawn so that the sum of the squared distances between the available data points and the line is as small as possible. The same idea extends beyond two dimensions, so there is no restriction on the number of independent (input) variables. When there is more than one independent variable, the regression is called multiple linear regression. This line is used for output prediction.

  • In logistic regression, an S-shaped logistic function – the sigmoid function – is used to convert the raw score into a probability. If the probability is more than half, the data point is more likely than not to belong to a particular class. A probability of less than half means otherwise.

Output type#

  • In linear regression, the output is a real-valued number. This number can even be negative.

  • In logistic regression, after the probability is computed, the data is assigned a particular label (class with highest probability), which is the actual output of the logistic regression.

Correlation#

  • In linear regression, if the input variables are not correlated with the output variable, then the results will be inconclusive. A high correlation, whether positive or negative, indicates that the predicted output is likely to be close to the actual value.

  • In logistic regression, it is not a good idea to use input variables that are highly correlated with each other. When two inputs are highly correlated, the model cannot cleanly separate their individual effects on the outcome (class), so the learned weights become unreliable.

Metrics used#

  • In linear regression, finding the best-fit line involves reducing the error, or more technically, the root mean square error (RMSE). A lower RMSE means better prediction. Also, the correlation coefficient (a number between -1 and +1) gives us an idea of how good the prediction will be.

  • In logistic regression, accuracy, precision, recall, and F-1 score are the metrics used to measure the effectiveness of the results. Each of them has its own applications and is used in different situations.
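The metrics named above can be sketched in plain Python as follows. The helper names `rmse` and `classification_metrics`, the choice of returning 0.0 when a ratio is undefined, and all sample values are assumptions made for illustration.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences of numbers."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def classification_metrics(actual, predicted, positive=1):
    """Return accuracy, precision, recall, and F1 score for two-class labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)  # true positives
    fp = sum(1 for a, p in pairs if a != positive and p == positive)  # false positives
    fn = sum(1 for a, p in pairs if a == positive and p != positive)  # false negatives
    tn = len(pairs) - tp - fp - fn                                    # true negatives
    accuracy = (tp + tn) / len(pairs)
    # Assumption: report 0.0 when a ratio has a zero denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up regression outputs (assumption): actual vs. predicted maximum temperatures.
error = rmse([60, 62, 58], [61, 60, 58])
print(f"RMSE: {error:.3f}")

# Made-up classification outputs (assumption): 1 = positive class, 0 = negative class.
metrics = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(metrics)
```

Accuracy counts all correct predictions, while precision and recall focus on the positive class, which is why they can tell different stories when one class is much rarer than the other.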

Applications#

  • Linear regression is used for predicting a number, e.g. stock forecasting, operational efficiency of machines, number of subscribers, etc.

  • Logistic regression is used for weather prediction, text analysis, etc.


Frequently Asked Questions

What is linear regression used for?

Linear regression is used to predict an output that is in the form of a real-valued number.



Written By:
Khawaja Muhammad Fahd