What is the pandas shift() method?

The pandas package is a core library for data analysis in Python. It lets us load, reshape, and examine tabular data with very little code. One of its methods, shift(), is particularly useful for sequential data such as time series.

What is the shift() method?

The shift() method moves the values in a DataFrame (or Series) along an axis by a specified number of positions, while by default the index stays where it is. The shift can be forward (downward) or backward (upward), which lets us line up each row with values from earlier or later rows.

Why is shift() useful?

The shift() method is handy in many data situations, especially when dealing with time-related information. Here are some practical ways in which it can be helpful:

  • Predicting future values: When we want to forecast what might happen next based on past patterns, shift() helps by creating new columns that hold historical values. These lagged columns can be used as input features in a prediction model.

  • Comparing now and then: shift() is great for directly comparing data at different times. It helps us spot trends, patterns, or changes in our information.

  • Calculating changing trends: We can combine shift() with other tools, like rolling averages or standard deviations, to measure how our data changes over time.
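
The three uses above can be sketched on a small series. This is a minimal illustration, not a full forecasting workflow; the sales figures, dates, and column names are made up for the example.

```python
import pandas as pd

# Hypothetical daily sales figures, made up for illustration.
sales = pd.Series(
    [100, 150, 80, 200],
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
    name="sales",
)

features = pd.DataFrame({"sales": sales})

# Predicting: a lag feature holding the previous day's value.
features["sales_lag1"] = sales.shift(1)

# Comparing now and then: absolute and relative change from the previous day.
features["change"] = sales - sales.shift(1)
features["pct_change"] = sales / sales.shift(1) - 1

# Trend: mean of the two previous days, excluding the current day.
features["prev_2day_mean"] = sales.shift(1).rolling(window=2).mean()

print(features)
```

For the two comparison columns, pandas also offers diff() and pct_change() as shortcuts. Writing them with shift() simply makes the alignment explicit.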

How does shift() work?

The main argument to the shift() method is the number of positions to move the data (periods, which defaults to 1). A positive number shifts the data down, and a negative number shifts it up. The positions vacated by the shift are filled with missing values.
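
To make the direction concrete, here is a small sketch on a four-element Series (the values are arbitrary):

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

down = s.shift(1)     # positive: values move down, the first slot becomes NaN
up = s.shift(-1)      # negative: values move up, the last slot becomes NaN
default = s.shift()   # periods defaults to 1, so this matches s.shift(1)

print(pd.DataFrame({"original": s, "shift(1)": down, "shift(-1)": up}))
```

Note that the shifted integer columns become floating point, because NaN cannot be stored in a plain integer column.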

Syntax

Here is the syntax of the shift() method:

DataFrame.shift(periods=1, freq=None, axis=0, fill_value=None, suffix=None)

Parameters

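  • periods (int or sequence of int): The number of periods (positions) to shift. In recent pandas versions, a sequence of integers shifts the data once for each value and returns all the shifted columns together.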

    • Positive values shift data downwards (forward in time if working with time series).

    • Negative values shift data upwards (backward in time).

  • freq (str or DateOffset): When given, the index itself is shifted by this offset and the data is not realigned, so no missing values are introduced. This applies to datetime-like indexes. Here are some common examples:

    • 'D': Daily frequency

    • 'B': Business day frequency

    • 'M': Month end frequency (spelled 'ME' in pandas 2.2 and later, where 'M' is deprecated)

  • axis (int or str {'index', 'columns'}): The axis along which to shift the data.

    • 0 or 'index': Shift values along the rows.

    • 1 or 'columns': Shift values along the columns.

  • fill_value (object): The value to fill the new entries created by the shift operation.

    • If not specified, the default depends on the data type: NaN (Not a Number) for numeric data and NaT for datetime data.

  • suffix (str, optional): Used only when periods is an iterable. If given, it is added after the column name and before the shift value in each shifted column name.
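
The following sketch shows how fill_value, axis, and freq change the result. The DataFrame and its dates are invented for the example, and the behavior shown assumes a reasonably recent pandas version.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [10, 20, 30]},
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)

# fill_value: replaces the default NaN in the vacated slots.
filled = df.shift(1, fill_value=0)

# axis=1: values move across columns instead of down the rows,
# so column "b" receives the values of column "a".
across = df.shift(1, axis=1)

# freq: the index moves forward by one day; the values stay in place,
# so no missing values appear.
moved_index = df.shift(1, freq="D")

print(filled, across, moved_index, sep="\n\n")
```

Using freq is usually the better choice when we want to relabel timestamps, while a plain periods shift is the better choice when we want lagged values aligned to the original timestamps.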

Code example

Here’s an example that uses the shift() method:

import pandas as pd

data = {'Sales': [100, 150, 80, 200], 'Profit': [20, 30, 15, 40]}
df = pd.DataFrame(data)

df['Sales_prev'] = df['Sales'].shift(1)

df['Profit_lag2'] = df['Profit'].shift(2)

print(df)

Code explanation

In the above code:

  • Line 1: We import the pandas library as pd.

  • Line 3: We create a dictionary named data to store sample sales and profit figures as lists.

  • Line 4: We convert the data dictionary into a pandas DataFrame named df.

  • Line 6: We create a new column Sales_prev in the DataFrame df. This column contains the values of the Sales column shifted down by one position (1 period). If each row represents a day, it shows the previous day’s sales. The first row has no previous value, so it is NaN.

  • Line 8: Another new column Profit_lag2 is added. This column contains the values of the Profit column shifted down by two positions (2 periods). If each row represents a day, it shows the profit from two days ago. The first two rows have no such value, so they are NaN.

  • Line 10: We print the entire DataFrame df with the added columns. We’ll see the original Sales and Profit columns along with the new Sales_prev and Profit_lag2 columns showing the shifted values.
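
One way to confirm the behavior described above is to inspect a single row and count the missing values. This is a small check built on the same sample data; the dropna() step at the end is a common follow-up, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'Sales': [100, 150, 80, 200], 'Profit': [20, 30, 15, 40]})
df['Sales_prev'] = df['Sales'].shift(1)
df['Profit_lag2'] = df['Profit'].shift(2)

# Row 2 holds the sales from row 1 and the profit from row 0.
print(df.loc[2, 'Sales_prev'], df.loc[2, 'Profit_lag2'])

# The earliest rows have no prior data to draw from, so they are missing.
print(df['Sales_prev'].isna().sum(), df['Profit_lag2'].isna().sum())

# Rows with incomplete lags are often dropped before further analysis.
complete = df.dropna()
print(complete)
```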

Conclusion

The shift() method lets us align each row with earlier or later values. That alignment is the basis for lag features, period-over-period comparisons, and many other calculations on time series data in pandas.

Copyright ©2024 Educative, Inc. All rights reserved