Dates in the Index
Learn to manipulate date information in pandas.
If we have dates in the index, we can do some powerful manipulation and aggregation of our data.
We’ll shift gears and look at data that has a date as an index. Let’s look at the amount of snow that fell each day at a ski resort:
import pandas as pd
url = 'https://github.com/mattharrison/datasets/raw/master/data/alta-noaa-1980-2019.csv'
alta_df = pd.read_csv(url)
dates = pd.to_datetime(alta_df.DATE)
snow = (alta_df.SNOW.rename(dates))
print(snow)
Finding missing data
Let’s look for missing data. pandas has several methods that help with missing values in time-series data. We can check whether any values are missing by calling any on the result of isna:
print(snow.isna().any())
There is missing data. Let’s find out where it is:
print(snow[snow.isna()])
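The two checks above can be tried without downloading anything. The following is a minimal sketch on a small made-up series (the name toy and its values are assumptions for illustration, not the Alta data). It also uses sum to count the missing entries, which the lesson does not show but which is often a useful next step:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the snow series (not the Alta data)
idx = pd.date_range('2020-01-01', periods=5, freq='D')
toy = pd.Series([0.0, 2.5, np.nan, 1.0, np.nan], index=idx, name='SNOW')

has_missing = toy.isna().any()   # True if at least one value is NaN
n_missing = toy.isna().sum()     # how many values are NaN
missing_rows = toy[toy.isna()]   # boolean mask keeps only the NaN rows

print(has_missing, n_missing)
print(missing_rows)
```

Because the mask keeps the original index, the dates of the missing entries come along for free.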
With a date index, we can pass partial date strings to the loc indexing attribute. This lets us inspect the rows around the missing data and see whether they give us any insight into why it’s missing:
print(snow.loc['1985-09':'1985-09-20'])
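To see how partial date strings behave, here is a small sketch on a made-up daily series (toy and its values are hypothetical). It assumes a sorted DatetimeIndex. A string such as '1985-09' selects the whole month, and in a slice both endpoints are included:

```python
import numpy as np
import pandas as pd

# Hypothetical daily series spanning late August to early October 1985
idx = pd.date_range('1985-08-25', '1985-10-05', freq='D')
toy = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# A partial string selects every row in that month
sept = toy.loc['1985-09']

# In a slice, '1985-09' starts at the first of the month,
# and the end label '1985-09-20' is included
window = toy.loc['1985-09':'1985-09-20']

print(len(sept))    # 30
print(len(window))  # 20
print(window.index[0], window.index[-1])
```

Unlike positional slicing in Python, label slices with loc include the stop value, which is why 1985-09-20 appears in the result.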
Filling in the missing data
Time-series data often has missing values. For example, in the snow data, the value for 1985-09-19 is missing (see the previous output).
This value looks like it could be filled in with zero (since this is the end of summer):
print(snow.loc['1985-09':'1985-09-20'].fillna(0))
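Note that fillna returns a new series and leaves the original untouched. The sketch below, on made-up values (toy and filled are hypothetical names), shows one way to keep the zero fill for the September window only, while leaving a missing winter value alone:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one gap at the end of summer, one in midwinter
idx = pd.to_datetime(['1985-09-18', '1985-09-19', '1985-09-20',
                      '1988-01-01', '1988-01-02'])
toy = pd.Series([0.0, np.nan, 0.0, np.nan, 3.0], index=idx)

# fillna does not modify toy; it returns a filled copy of the slice
september = toy.loc['1985-09':'1985-09-20'].fillna(0)

# Work on a copy and write the filled window back by label slice
filled = toy.copy()
filled.loc['1985-09':'1985-09-20'] = september

print(toy.isna().sum())     # original still has both gaps
print(filled.isna().sum())  # only the winter gap remains
```

Restricting the fill to a window keeps the assumption (no snow at the end of summer) from leaking into dates where it may not hold.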
However, missing values in the middle of winter, such as those in January, might not be zero. (It’s not clear to us why these values are missing. Did a sensor fail? Did someone forget to write down the amount? Was it really zero?) The best way to deal with missing data is to talk to an expert and determine why it’s missing:
print(snow.loc['1987-12-30':'1988-01-10'])
pandas has various tricks for dealing with missing data. Let’s demonstrate them using missing data from the end of December through January. Notice what happens to the January 1 value as we demonstrate these. ...