Plotting with pandas

Learn about pandas and plotting graphs using a given dataset.

pandas basics and installation

Although Python can be used to parse CSV files, it can still be complex. While there are libraries like csv_reader() that can help, we’ll still have to run many functions manually. Enter pandas, which is an excellent library for data analysis. If we’re dealing with complicated or large datasets, pandas can be really helpful. It is a superset based on NumPy and SciPy. So, it contains many convenient features, like reading directly from a Microsoft Excel file, executing data joins (like we might perform in SQL), and so on. We’ll go over some of these functionalities in this lesson.

pandas has two primary data structures:

  • Series

  • DataFrames

Series is similar to NumPy’s array or dictionary, although it comes with many extra features.

DataFrames is a two-dimensional data structure that contains both column and row information, like the fields of an Excel file.

Note: An Excel file has rows, columns, and an optional header field, which can be represented in a dataframe. That’s what we’ll use in our examples below.

We should get pandas if we’ve installed Anaconda on Windows. We can run pip install pandas on Linux. pip will tell us about any dependencies that we might need to install along with it.

Analyze obesity in England

We will use the data provided by the UK Government in 2014 to analyze obesity in England. The data is in Microsoft Excel format, so we can easily open it and view the different sections given to us.

Let’s get started then.

pandas is imported as pd and numpy is imported as np for efficiency’s sake.

data = pd.ExcelFile("Obes-phys-acti-diet-eng-2014-tab.xls")

Note: The name “Obes-phys-acti-diet-eng-2014-tab.xls” is the name of the file downloaded from the official UK Government’s website.

Let’s open the xls file. The great thing about pandas is that we can open Excel files directly. Most libraries can only work with CSV files where we print all the sheet names:

['Chapter 7', '7.1', '7.2', '7.3', '7.4', '7.5', '7.6', '7.7', '7.8', '7.9', '7.10']

A sheet is just one “page,” of Excel data. Users break the data into multiple sheets rather than keep all the data in one substantial unmanageable sheet.

Obesity by gender

Let’s have a look at the data:

Get hands-on with 1400+ tech skills courses.