How to perform time series forecasting using ARIMA in Python


Time series data is a collection of observations recorded at successive points in time, usually at regular intervals. Because the data points are ordered chronologically, past values can be used to predict future ones. Time series forecasting is a crucial technique in domains such as finance, economics, and meteorology. In this answer, we’ll explore ARIMA time series forecasting using Python.

Understanding ARIMA

ARIMA stands for autoregressive integrated moving average. It’s a robust statistical method for analyzing and forecasting time series data. ARIMA combines three key components to model a time series:

  • Autoregressive (AR): The autoregressive component models the current value of the time series as a function of its previous values. It assumes that the future values of the series can be predicted using past values.

  • Integrated (I): The integrated component applies differencing, which replaces each value with its change from the previous value, to make the time series stationary. Stationarity is important because many time series models, including ARIMA, work best when the data is stationary. Stationary data has a constant mean and variance over time.

  • Moving average (MA): The moving average component models the relationship between the current value and the past prediction errors (residuals).
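To make the differencing step concrete, here is a minimal sketch using NumPy. The series is a synthetic random walk with drift, chosen only for illustration; it is not the S&P 500 data used later, and the drift and noise values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical series: a random walk with drift. It is non-stationary
# because its mean keeps rising over time.
steps = rng.normal(loc=0.5, scale=1.0, size=500)
series = np.cumsum(steps)

# First-order differencing (d = 1): y'[t] = y[t] - y[t-1]
differenced = np.diff(series)

# Compare the mean of the first and second halves of each series.
half = len(series) // 2
print("Raw series means: ", series[:half].mean(), series[half:].mean())
print("Differenced means:", differenced[:half].mean(), differenced[half:].mean())
```

The two halves of the raw series have very different means, while the halves of the differenced series have similar means. This is the property that the d value in an ARIMA order relies on.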

ARIMA model in Python

Let’s apply the ARIMA model in Python to predict the next 10 closing prices of the S&P 500 from the given data:

main.py
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA

# Data
my_df = pd.read_csv('data.csv')
closing_price = my_df['prices']

# ARIMA model
my_model = ARIMA(closing_price, order=(1, 1, 2))
model_fit = my_model.fit()

# Predict values
# Indices 100-109 are the next 10 values after the last observation (index 99), assuming data.csv holds 100 rows
predicted_values = model_fit.predict(100, 109)

# Plot actual and predicted values
plt.figure()
plt.plot(closing_price, label='Actual Values')
plt.plot(predicted_values, label='Predicted Values', color='red', linestyle='dotted')
plt.xlabel('Time')
plt.ylabel('Value')
plt.title('Actual vs. Predicted Values from ARIMA')
plt.legend()
plt.show()

Explanation

  • Lines 1–3: We import the necessary libraries.

    • pandas for data manipulation, matplotlib for creating visualizations, and statsmodels for time series analysis using the ARIMA model.

  • Lines 6–7: We load the data from a CSV file named 'data.csv' into a pandas DataFrame.

    • We then select the 'prices' column of the DataFrame, which holds the closing prices of the S&P 500 index.

  • Lines 10–11: We create an ARIMA model with an order of (1, 1, 2). The order consists of three components:

    • The autoregressive order (p), the differencing order (d), and the moving average order (q). These values determine the behavior of the ARIMA model.

    • Once the ARIMA model is defined, we fit it to the closing prices of the S&P 500 data.

  • Line 15: Here, we forecast the closing price values for the next 10 time points after the last index in our original data. The model_fit.predict(100, 109) call generates predictions for these time points using the fitted ARIMA model. The indices 100–109 assume the data holds 100 observations, indexed 0–99.

  • Lines 18–25: We create a line plot using matplotlib. The plot includes two lines: one representing the actual closing price values and another representing the predicted values from our ARIMA model.

    • The actual values are displayed in blue, while the predicted values are shown in red with a dotted line style.

    • The x-axis of the plot represents time, and the y-axis represents the value of the closing prices.

    • The title of the plot is set as 'Actual vs. Predicted Values from ARIMA'.

    • To provide clarity, a legend is included to distinguish between the actual and predicted values.

Conclusion

In this answer, we used the ARIMA model for time series forecasting, with a code example that loads time series data, fits an ARIMA model, makes predictions, and plots the actual versus predicted closing prices. It’s a practical example of how forecasting can be performed using Python with pandas, statsmodels, and matplotlib.

Copyright ©2024 Educative, Inc. All rights reserved