Plotting with pandas
Learn about pandas and plotting graphs using a given dataset.
pandas basics and installation
Although plain Python can parse CSV files, doing so can get complex. The standard library's csv module (for example, csv.reader()) helps, but we still have to do much of the work manually. Enter pandas, an excellent library for data analysis that is especially helpful when we're dealing with complicated or large datasets. It is built on top of NumPy and adds many convenient features, like reading directly from a Microsoft Excel file, executing data joins (like we might perform in SQL), and so on. We'll go over some of these functionalities in this lesson.
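To make the comparison concrete, here is a minimal sketch that parses the same CSV text with the standard library and with pandas, then performs a SQL-style join. The CSV content, the column names, and the `cities` table are made up purely for illustration.

```python
import csv
import io

import pandas as pd

# Hypothetical in-memory CSV data, used only for illustration.
CSV_TEXT = "name,age\nAlice,34\nBob,29\n"

# Standard-library approach: every value arrives as a string,
# so we split off the header and convert types ourselves.
rows = list(csv.reader(io.StringIO(CSV_TEXT)))
header, records = rows[0], rows[1:]
ages_manual = [int(record[1]) for record in records]

# pandas approach: one call parses the header and infers column types.
people = pd.read_csv(io.StringIO(CSV_TEXT))

# A SQL-style inner join on the shared "name" column.
cities = pd.DataFrame({"name": ["Alice", "Bob"], "city": ["Leeds", "York"]})
joined = people.merge(cities, on="name", how="inner")

print(ages_manual)
print(joined)
```

The manual version needs explicit header handling and type conversion, while `pd.read_csv` and `DataFrame.merge` handle both steps in one call each.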
pandas has two primary data structures:
- Series
- DataFrame
A Series is a one-dimensional labeled array. It is similar to a NumPy array or a dictionary, although it comes with many extra features.
A DataFrame is a two-dimensional data structure that contains both column and row information, like a sheet in an Excel file.
Note: An Excel file has rows, columns, and an optional header field, which can be represented in a dataframe. That’s what we’ll use in our examples below.
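The two structures described above can be sketched as follows. The labels, column names, and values here are invented for illustration and are not taken from the obesity dataset.

```python
import pandas as pd

# A Series: one-dimensional values, each paired with an index label.
heights = pd.Series([1.62, 1.80, 1.75], index=["ann", "ben", "cat"])
print(heights["ben"])    # label-based lookup, like a dictionary
print(heights.mean())    # vectorized arithmetic, like a NumPy array

# A DataFrame: rows plus named columns, like a sheet with a header row.
table = pd.DataFrame({"year": [2012, 2013], "total": [10, 12]})
print(table.columns.tolist())  # the header field
print(table.shape)             # (number of rows, number of columns)
```

Each column of a DataFrame is itself a Series, so `table["year"]` supports the same operations as `heights`.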
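pandas comes bundled with Anaconda, so if we've installed Anaconda (for example, on Windows), we already have it. Otherwise, such as on Linux, we can run pip install pandas. pip installs the required dependencies along with it. Optional ones, such as an engine for reading Excel files, may need to be installed separately.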
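A quick way to confirm the installation is to ask Python whether it can locate the packages. The sketch below uses a small helper, `is_installed`, which is a hypothetical name introduced here. Including xlrd in the list is an assumption based on the fact that pandas relies on it to read legacy .xls files.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


# xlrd is listed because pandas uses it as the engine for .xls files.
for package in ("pandas", "numpy", "xlrd"):
    status = "found" if is_installed(package) else "missing"
    print(f"{package}: {status}")
```

Any package reported as missing can be installed with pip in the same way as pandas.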
Analyze obesity in England
We will use the data provided by the UK Government in 2014 to analyze obesity in England. The data is in Microsoft Excel format, so we can easily open it and view the different sections given to us.
Let’s get started then.
We import pandas as pd and numpy as np, following the usual convention, to keep the code short.
data = pd.ExcelFile("Obes-phys-acti-diet-eng-2014-tab.xls")
Note: “Obes-phys-acti-diet-eng-2014-tab.xls” is the name of the file as downloaded from the UK Government’s official website.
The code above opens the xls file. The great thing about pandas is that we can open Excel files directly, whereas many libraries can only work with CSV files. Next, we print all the sheet names, which are available as data.sheet_names:
['Chapter 7', '7.1', '7.2', '7.3', '7.4', '7.5', '7.6', '7.7', '7.8', '7.9', '7.10']
A sheet is just one “page” of Excel data. Users often break the data into multiple sheets rather than keep everything in one large, unmanageable sheet.
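Opening the workbook and listing its sheets can be sketched as below. The helper `list_sheets` is a hypothetical name, and the sketch assumes the downloaded file sits in the current working directory. It returns an empty list when the file is absent, so the snippet runs without the download.

```python
from pathlib import Path

import pandas as pd

# File name taken from the lesson; its location on disk is an assumption.
XLS_PATH = Path("Obes-phys-acti-diet-eng-2014-tab.xls")


def list_sheets(path):
    """Return the sheet names of an Excel workbook, or [] if the file is absent."""
    if not Path(path).exists():
        return []
    # ExcelFile works as a context manager, so the file handle is closed for us.
    with pd.ExcelFile(path) as workbook:
        return workbook.sheet_names


print(list_sheets(XLS_PATH))
```

With the file present and an .xls engine such as xlrd installed, the call should print the list of sheet names shown above.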
Obesity by gender
Let’s have a look at the data: