Mathematics of Linear Correlation

Learn about the mathematical equation for linear correlation and the F-test.

Understanding the linear correlation equation

What is linear correlation, mathematically speaking? If you’ve taken basic statistics, you are likely familiar with linear correlation already. Linear correlation works very similarly to linear regression. For two columns, X and Y, linear correlation ρ (the lowercase Greek letter “rho”) is defined as the following:

ρ = \frac{E[(X - µ_X)(Y - µ_Y)]}{σ_X σ_Y}

This equation describes the expected value (E, which you can think of as the average) of the difference between the elements of X and their average, µ_X, multiplied by the difference between the corresponding elements of Y and their average, µ_Y. The average for E is taken over pairs of X, Y values. You can imagine that if, when X is relatively large compared to its mean, µ_X, Y also tends to be similarly large, then the terms of the multiplication in the numerator will both tend to be positive, leading to a positive product and positive correlation after the expected value, E, is taken. Similarly, if Y tends to be small when X is small, both terms in the numerator will be negative and again lead to positive correlation. Conversely, if Y tends to decrease as X increases, they will have negative correlation.

The denominator (the product of the standard deviations of X and Y) serves to normalize linear correlation to the scale of [-1, 1]. Because Pearson correlation is adjusted for the mean and standard deviation of the data, the actual values of the data are not as important as the relationship between X and Y. Stronger linear correlations are closer to 1 or -1. If there is no linear relation between X and Y, the correlation will be close to 0.
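To make the formula concrete, here is a minimal sketch (using NumPy and made-up data) that computes ρ directly from the definition and checks it against NumPy's built-in calculation:

```python
import numpy as np

# Made-up example data with a roughly linear relationship
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Pearson correlation straight from the definition: the expected value
# (mean) of the product of deviations from the means, normalized by
# the product of the standard deviations
rho = np.mean((x - x.mean()) * (y - y.mean())) / (x.std() * y.std())

# This should agree with NumPy's np.corrcoef
assert np.isclose(rho, np.corrcoef(x, y)[0, 1])
print(rho)
```

Because both the numerator and denominator use the same divisor for the averages, the result is identical whether you use the population or sample conventions consistently.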

A limitation of Pearson correlation

It’s worth noting that, while it is regularly used in this context by data science practitioners, Pearson correlation is not strictly appropriate for a binary response variable, as we have in the case study problem. Technically speaking, among other restrictions, Pearson correlation is only valid for continuous data. However, Pearson correlation can still accomplish the purpose of giving a quick idea of the potential usefulness of features. It is also conveniently available in software libraries such as pandas.
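In pandas, this quick check is a one-liner. The sketch below uses synthetic data (the column names `feature` and `response` are made up for illustration) with a binary response that tends to be 1 when the feature is large:

```python
import pandas as pd

# Synthetic data: a binary response that tends to be 1
# when the feature value is large
df = pd.DataFrame({
    'feature': [0.2, 0.5, 1.1, 1.8, 2.5, 3.1, 3.9, 4.4],
    'response': [0, 0, 0, 1, 0, 1, 1, 1],
})

# DataFrame.corr computes Pearson correlation by default
print(df.corr())

# Or for a single pair of columns:
print(df['feature'].corr(df['response']))
```

Even though `response` is binary, the calculation goes through and gives a rough, positive signal of the feature's potential usefulness.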

In data science in general, you will find that certain widely used techniques may be applied to data that violate their formal statistical assumptions. It is important to be aware of the formal assumptions underlying analytical methods. In fact, knowledge of these assumptions may be tested during interviews for data science jobs. However, in practice, as long as a technique can help us on our way to understanding the problem and finding an effective solution, it can still be a valuable tool.

That being said, linear correlation will not be an effective measure of the predictive power of all features. In particular, it only picks up on linear relationships. Shifting our focus momentarily to a hypothetical regression problem, have a look at the following examples and consider what you expect the linear correlations to be. Notice that the values of the data on the x and y axes are not labeled; this is because the location (mean) and standard deviation (scale) of the data do not affect the Pearson correlation, only the relationship between the variables, which can be discerned by plotting them together:
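As a quick illustration of this limitation (a sketch with made-up data): if y is a perfect quadratic function of x, symmetric about the mean of x, the Pearson correlation comes out near zero even though y is completely determined by x.

```python
import numpy as np

# y is completely determined by x, but the relationship
# is quadratic, not linear
x = np.linspace(-1, 1, 101)
y = x ** 2

# Pearson correlation is near zero: positive and negative
# deviations in x contribute products that cancel out
rho = np.corrcoef(x, y)[0, 1]
print(rho)
```

This is why a near-zero Pearson correlation rules out only a *linear* relationship; plotting the variables together can reveal structure the correlation misses.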
