Reshaping DataFrames

Learn how to reshape DataFrames to and from long format data.

Let’s now explore how we can reshape DataFrames to and from long format, and more importantly, why we would want to do so.

Melting DataFrames

One of the first things to notice is that the years are spread across the columns, with each value sitting in its own cell under the corresponding year. The issue is that a column such as 1980 is not really a variable; it is one value of a variable. A more useful structure has a single year column whose values range from 1974 to 2019. Recall how we created the first chart in this section: this structure makes that kind of work much easier. Let’s first illustrate the idea with a small dataset so things are clear, and then apply the same approach to the data DataFrame.
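To make the "1980 is not a variable" point concrete, here is a minimal sketch assuming pandas and a made-up two-country table (the names `wide`, `long`, and the numbers are hypothetical, not the section's data DataFrame). It shows that selecting a range of years in wide format means filtering column labels, while in long format it is an ordinary filter on a year column.

```python
import pandas as pd

# Hypothetical wide data: one column per year (made-up numbers).
wide = pd.DataFrame({
    "country": ["A", "B"],
    "1980": [1.0, 2.0],
    "1981": [1.5, 2.5],
    "1982": [1.7, 2.9],
})

# Wide format: years live in the column labels, so a year range
# has to be selected by inspecting and filtering the labels themselves.
year_cols = [c for c in wide.columns if c != "country" and 1980 <= int(c) <= 1981]
subset_wide = wide[["country"] + year_cols]

# Long format: "year" is an ordinary variable, so the same range
# is a plain boolean filter on a column's values.
long = wide.melt(id_vars="country", var_name="year", value_name="value")
long["year"] = long["year"].astype(int)  # labels were strings; make them numbers
subset_long = long[long["year"].between(1980, 1981)]
print(subset_long)
```

The long version keeps working unchanged if more years are added, whereas the wide version depends on how the column labels happen to be named.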

The Jupyter Notebook setup below shows how the same data can be structured in two different ways while containing exactly the same information:


Our current DataFrame is structured like the “Wide” format table, and it would be easier to work with in a format like the “Long (tidy) format” table.
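Since the notebook tables are not reproduced here, the following minimal sketch rebuilds both layouts by hand, assuming pandas and hypothetical values (the countries "X" and "Y", the indicators "pop" and "gdp", and all numbers are made up). It then checks that melting the wide table produces exactly the hand-written long table, i.e. that the two hold the same information.

```python
import pandas as pd

# Hypothetical wide table (made-up numbers): identifiers in columns,
# measurements spread across one column per year.
wide_df = pd.DataFrame({
    "country": ["X", "X", "Y", "Y"],
    "indicator": ["pop", "gdp", "pop", "gdp"],
    "2015": [100, 10, 200, 20],
    "2020": [110, 12, 210, 25],
})

# The same information in long (tidy) format, written out by hand:
# one row per (country, indicator, year) combination.
long_df = pd.DataFrame({
    "country": ["X", "X", "Y", "Y"] * 2,
    "indicator": ["pop", "gdp", "pop", "gdp"] * 2,
    "year": ["2015"] * 4 + ["2020"] * 4,
    "value": [100, 10, 200, 20, 110, 12, 210, 25],
})

# melt stacks the year columns one after another, repeating the
# identifier columns for each; the result matches the hand-built table.
melted = wide_df.melt(id_vars=["country", "indicator"],
                      var_name="year", value_name="value")
pd.testing.assert_frame_equal(melted, long_df)
print(melted)
```

The wide table has 4 rows by 2 year columns; the long table has 4 × 2 = 8 rows, one per cell of the original year columns.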

The difficulty with the wide format is that the variables are presented inconsistently. Some are displayed vertically, each in its own column (“country” and “indicator”), while the year is displayed horizontally across the column headers “2015” and “2020.” In the long format DataFrame, accessing the same data is straightforward: we simply specify the columns that we want. In addition, the values stay automatically paired with one another. For example, taking the columns “year” and “value” from the long DataFrame automatically maps 2015 to 100, 2015 to 10, and so on. At the same time, each row is a complete and independent representation of the case we are dealing with.
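A minimal sketch of what "automatic mapping" and "each row is complete" buy us, assuming pandas and a hypothetical long table (all names and numbers are made up for illustration). Selecting two columns keeps each year aligned with its value, and a specific question becomes a single boolean filter across ordinary columns.

```python
import pandas as pd

# Hypothetical long-format data (made-up numbers), as produced by melt.
long_df = pd.DataFrame({
    "country": ["X", "X", "X", "X"],
    "indicator": ["pop", "gdp", "pop", "gdp"],
    "year": [2015, 2015, 2020, 2020],
    "value": [100, 10, 110, 12],
})

# Selecting two columns keeps each year paired with its value, row by row.
pairs = long_df[["year", "value"]]
print(pairs)

# Because each row is self-contained, a question such as
# "population of X in 2020" is one filter over ordinary columns.
mask = (
    (long_df["country"] == "X")
    & (long_df["indicator"] == "pop")
    & (long_df["year"] == 2020)
)
print(long_df.loc[mask, "value"].item())  # prints 110
```

In the wide layout, the same question would mix a row filter (country, indicator) with a column choice (the "2020" header), which is exactly the inconsistency described above.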

The good news is that this is doable with one call to the melt method.

wide_df.melt(id_vars=['country', 'indicator'],
    value_vars=['2015',
...