Examples of Time Series Data
Understand time series through a few examples of real-world data.
Don’t worry if you’ve never used Python or pandas and don’t fully understand the code yet. All of that will be covered in the next sections. The goal here is to see some examples and start getting familiar with the way Python handles time series data.
Microsoft stock prices
Take a look at the first five rows of a dataset containing Microsoft stock prices and see how those prices change over time.
```python
import pandas as pd

df = pd.read_csv('microsoft_stock.csv')
print(df.head())
```
Many questions pop up just from looking at these rows:
What is the format of the “Date” column? Does “4/1/2015” mean April 1st or January 4th? In the US it would be the former, while in most other countries it would be the latter.
Where is the data for “4/3/2015,” “4/4/2015,” and “4/5/2015”?
What is the most recent data available?
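These three questions can each be checked with a few lines of pandas. The sketch below is only an illustration: it uses a tiny hand-made table with placeholder prices (not the real contents of `microsoft_stock.csv`) and assumes the dates are written month-first.

```python
import pandas as pd

# Hypothetical stand-in for the first rows of the dataset.
# The dates mirror the ones discussed above; the prices are placeholders.
df = pd.DataFrame({
    "Date": ["4/1/2015", "4/2/2015", "4/6/2015", "4/7/2015"],
    "Close": [10.0, 11.0, 12.0, 13.0],
})

# Question 1: the same string gives different dates depending on the format.
as_month_first = pd.to_datetime("4/1/2015", format="%m/%d/%Y")  # April 1, 2015
as_day_first = pd.to_datetime("4/1/2015", format="%d/%m/%Y")    # January 4, 2015

# Stating the format explicitly avoids any guessing (month-first assumed here).
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
df = df.set_index("Date")

# Question 2: which calendar days between the first and last row have no data?
all_days = pd.date_range(df.index.min(), df.index.max(), freq="D")
missing_days = all_days.difference(df.index)

# Question 3: what is the most recent date available?
latest = df.index.max()

print(as_month_first.date(), as_day_first.date())
print(missing_days.strftime("%Y-%m-%d").tolist())
print(missing_days.day_name().tolist())
print(latest.date())
```

The day names printed for the missing dates hint at an answer to the second question: stock exchanges do not trade on weekends or market holidays, so gaps like these are probably expected rather than a data error.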
As you might have noticed, when looking at a time series in a table, sometimes it can be hard to see the big picture. The standard way to visualize time series data is by using line charts.
```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('microsoft_stock.csv')

# Changing the datatype
df["Date"] = pd.to_datetime(df['Date'], format='%m/%d/%Y %H:%M:%S')

# Setting the Date as index
df = df.set_index('Date')

# Plotting
fig, ax = plt.subplots(figsize=(7, 4.5), dpi=300)
ax.plot(df['Close'])
plt.xlabel("Date")
plt.ylabel("Price")
plt.title("Microsoft stock closing price")

fig.savefig("output/output.png")
plt.close(fig)
```
As you can see, the line chart gives a much better overview of the stock price data. The series runs from 2015 to 2021 and shows the price increasing over time, apart from some bumps in 2019 and 2020.
Seattle weather
Now, let’s take a look at a dataset containing weather data for Seattle (US), such as the daily maximum temperature and humidity, and plot how the maximum temperature changes over time.
```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dt

df = pd.read_csv('seattle_weather.csv')

# Changing the datatype
df["date"] = pd.to_datetime(df['date'], format='%Y-%m-%d')

# Setting the date as index
df = df.set_index('date')

fig, ax = plt.subplots(figsize=(7, 4.5), dpi=300)
ax.plot(df['temp_max'])

# Formatting the x-axis to make it easier to read
ax.xaxis.set_major_locator(dt.YearLocator())
ax.xaxis.set_minor_locator(dt.MonthLocator((1, 4, 7, 10)))
ax.xaxis.set_major_formatter(dt.DateFormatter("\n%Y"))
ax.xaxis.set_minor_formatter(dt.DateFormatter("%b"))
plt.setp(ax.get_xticklabels(), rotation=0, ha="center")
plt.subplots_adjust(bottom=0.15)

# Labelling
plt.xlabel("Date")
plt.ylabel("Temperature (max)")
plt.title("Seattle daily max temperature")

fig.savefig("output/output.png")
plt.close(fig)
```
Here we can see more variability and a clear seasonal pattern. Every year, temperatures peak around July/August and bottom out around December/January. On the other hand, there is no clear trend: the overall temperature doesn't seem to be getting significantly higher or lower from year to year.
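One rough way to back up what the chart suggests is to average the series by calendar month (to expose seasonality) and by year (to look for a trend). The sketch below does this on synthetic data, not on the Seattle dataset: the date range, the yearly cycle, and the noise level are all assumptions chosen only to mimic a seasonal series without a trend.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the 'temp_max' column (assumed values, not real data):
# a yearly cycle peaking in late July, plus random noise, with no trend.
dates = pd.date_range("2012-01-01", "2015-12-31", freq="D")
rng = np.random.default_rng(0)
day_of_year = dates.dayofyear.to_numpy()
seasonal = 10 * np.cos(2 * np.pi * (day_of_year - 205) / 365.25)
noise = rng.normal(0, 3, len(dates))
temp_max = pd.Series(16 + seasonal + noise, index=dates, name="temp_max")

# Seasonality: average over all years for each calendar month (1 to 12).
monthly_mean = temp_max.groupby(temp_max.index.month).mean()

# Trend: average per year. Similar values from year to year suggest no trend.
yearly_mean = temp_max.groupby(temp_max.index.year).mean()

print(monthly_mean.round(1))
print(yearly_mean.round(1))
```

In a seasonal series, the monthly averages differ widely between summer and winter, while the yearly averages stay close to each other when there is no trend. Applying the same two `groupby` calls to `df['temp_max']` from the code above would be a natural next step.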