SciPy, an External Library

This lesson introduces SciPy, an external library, and discusses in detail how it supports statistics and probability functionality.

Calculating correlations

SciPy is a Python library for scientific computing. SciPy and NumPy are core libraries of the scientific Python stack that Pandas builds on: Pandas is built directly on NumPy, and some of its features rely on SciPy. We will discuss Pandas later in the course, but it is useful to understand SciPy and NumPy first.

A correlation is a numerical measure of the statistical relationship between two variables. For us, those variables will usually be two columns of data, such as the outside temperature and the likelihood of rain.

One way to calculate the correlation between two vectors of data is with Pearson’s correlation coefficient, or r-value. This value ranges from -1 to 1, where -1 means a total negative correlation, 0 means no correlation, and 1 means a total positive correlation.
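A minimal sketch of computing Pearson’s r with SciPy, assuming SciPy and NumPy are installed. `scipy.stats.pearsonr` returns the coefficient together with a p-value. The helper `pearson_r` is a hypothetical name used only to show that the same value follows from the definition of r. The data are made up so that one pair of vectors is perfectly positively correlated and the other perfectly negatively correlated.

```python
import numpy as np
from scipy import stats

# Hypothetical example data (made-up values, not real measurements).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pos = 2.0 * x + 1.0      # increases exactly in step with x
y_neg = -3.0 * x + 10.0    # decreases exactly in step with x

# scipy.stats.pearsonr returns the correlation coefficient r and a p-value.
r_pos, p_pos = stats.pearsonr(x, y_pos)
r_neg, p_neg = stats.pearsonr(x, y_neg)

def pearson_r(a, b):
    """Hypothetical helper: Pearson's r computed from its definition,
    sum of products of deviations divided by the square root of the
    product of the sums of squared deviations."""
    a_dev = a - a.mean()
    b_dev = b - b.mean()
    return np.sum(a_dev * b_dev) / np.sqrt(np.sum(a_dev**2) * np.sum(b_dev**2))

print(f"{r_pos:.3f} {pearson_r(x, y_pos):.3f}")  # 1.000 1.000
print(f"{r_neg:.3f} {pearson_r(x, y_neg):.3f}")  # -1.000 -1.000
```

Any vector that is an exact linear function of `x` with a positive slope gives r = 1, and one with a negative slope gives r = -1; real data usually lands somewhere in between.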

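Note: Pearson’s r measures only linear correlation, so a value of 0 means there is no linear relationship, not necessarily no relationship at all.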
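To illustrate that Pearson’s r captures only linear relationships, here is a small sketch with made-up data, again assuming SciPy and NumPy are available. Here `y` is completely determined by `x` (a symmetric parabola), yet r comes out as zero because the relationship is not a straight line.

```python
import numpy as np
from scipy import stats

# Hypothetical data: y depends perfectly on x, but not linearly.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = x**2  # symmetric parabola: y falls, then rises, as x increases

r, p = stats.pearsonr(x, y)
# r is zero up to floating-point rounding, despite the perfect dependence.
print(f"r = {r:.3f}")
```

The rising and falling halves of the parabola cancel out, which is why a linear measure sees no correlation here.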

In the image below, you can see a graphical representation of correlation. Source.