Identifying Context: Correlation vs. Causation

Break down the components of correlation and causation and their impact on data storytelling.

It can be challenging to tell a story without knowing the full context.

"Correlation is not causation" is a common saying in data science. It emphasizes that a relationship found in the data through correlation analysis does not, by itself, imply that one event or variable causes another.

Confounding variables are variables that influence both of the variables being compared and that were often not considered during the analysis. They can distort the apparent causal relationship and therefore complicate data storytelling, because the storyteller is working with only part of the context. However, investigating these relationships can also help us mine interesting insights from our data.
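To make the effect of a confounder concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the variables x, y, and z are synthetic, z plays the role of the confounder, and the pearson helper is a hypothetical hand-written function rather than part of any library. Neither x nor y influences the other, yet they end up correlated because both depend on z.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# z is the (synthetic) confounder: it drives both x and y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]  # x depends on z, not on y
y = [zi + rng.gauss(0, 1) for zi in z]  # y depends on z, not on x

# x and y look related even though neither causes the other.
r_xy = pearson(x, y)

# Because this is a simulation, we know exactly how z enters x and y,
# so we can subtract it out and correlate what is left over.
x_without_z = [xi - zi for xi, zi in zip(x, z)]
y_without_z = [yi - zi for yi, zi in zip(y, z)]
r_without_z = pearson(x_without_z, y_without_z)

print(f"correlation of x and y:             {r_xy:.2f}")
print(f"correlation after removing z:       {r_without_z:.2f}")
```

With real data the confounder's contribution is not known in advance, so it would have to be estimated (for example, with a regression) rather than subtracted directly as it is here.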

Correlations and causal context

Let's consider a few case studies on correlations with some causal context behind them. Our objective with this analysis is to go beyond simply reporting that a correlation is, say, strongly positive or weakly negative, and to identify potential causal links between the variables.

Case study 1: Life expectancy and population

Let's take a look at the scatter plot below showing the correlation between life expectancy and population in the United States, from the Gapminder dataset provided by the Plotly package.
