pandas

Learn the basics of using pandas, the main Python library that works with DataFrames.

What is pandas?

According to its own website, pandas (yes, with a lowercase “p”) is “a fast, powerful, flexible, and easy to use open source data analysis and manipulation tool built on top of the Python programming language.” Excel users often find pandas the most familiar way to manipulate data in Python, since a pandas DataFrame arranges data in labeled rows and columns, much like a spreadsheet.

Getting started

In Python, every time we want to use a library in our code, we bring it in with the import statement, optionally giving it an alias. The conventional alias for pandas is pd, so the usual way to start using pandas is to write import pandas as pd and then use the pd prefix whenever we need something from the library.
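As a minimal sketch of that convention, the snippet below imports pandas under the pd alias and then reaches the library through it. Printing the version is only an illustrative way to confirm the import worked; it is not something the lesson requires.

```python
# Import pandas under its conventional alias.
import pandas as pd

# Everything in the library is now reachable through the pd prefix.
# Here we just read the installed version string as a quick sanity check.
version = pd.__version__
print(version)
```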

Understanding pandas objects

Series

A pandas Series is the equivalent of a column in a table. Let’s look at an example of how to create a pandas Series:

import pandas as pd
a = pd.Series([10, 20, 30, 40])
print(a)
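When a Series is printed, the left-hand numbers are its index and the right-hand numbers are its values. The sketch below makes those two parts explicit. The name "price" and the variable names are assumptions chosen for illustration, not part of the original example.

```python
import pandas as pd

# A Series with an optional name, which acts like a column header.
# The name "price" is a hypothetical label used only for this sketch.
prices = pd.Series([10, 20, 30, 40], name="price")

# With no index given, pandas assigns the default labels 0, 1, 2, 3.
index_labels = list(prices.index)
print(index_labels)

# .loc looks a value up by its index label.
third_value = prices.loc[2]
print(third_value)
```

Giving a Series a name is optional, but it becomes useful once several Series are combined into a table, because the name can serve as the column label.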

DataFrame

A DataFrame object is similar to a table and will have an index and at least one column, each of which is a Series. The index labels the rows of the DataFrame; these labels are usually unique, although pandas does not strictly require it. When we work with time series data, the index is usually a DatetimeIndex, so instead of being identified by a number, each row is identified by a date and a time. Not every DataFrame is indexed this way, though.
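To make the role of the index concrete, here is a small sketch of a DataFrame whose rows are labeled by dates. The dates, the column name "close", and the values are all made up for illustration.

```python
import pandas as pd

# Three consecutive daily timestamps (hypothetical dates).
dates = pd.date_range("2024-01-01", periods=3, freq="D")

# One column of made-up values, with the dates as the row index.
df = pd.DataFrame({"close": [101.5, 102.0, 100.8]}, index=dates)

# The index is a DatetimeIndex rather than plain integers.
print(type(df.index).__name__)

# Rows are looked up by timestamp instead of by position.
value = df.loc[pd.Timestamp("2024-01-02"), "close"]
print(value)

# Selecting a single column gives back a Series.
column = df["close"]
print(type(column).__name__)
```

The last line illustrates the relationship described above: each column of a DataFrame is itself a Series, and it shares the DataFrame's index.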

Let’s see how to create a ...