Stochastic Processes
Learn the difference between stochastic and deterministic processes and why stochastic processes are important for time series analysis.
Randomness
Time series are datasets whose time index is one of their main characteristics. This means we need to know when an observation was produced to understand the data-generating process behind our sample. Yet, before going deep into why time is so important, let’s take a step back and reflect on the expression “data-generating process” itself. What does it mean?
The data-generating process of our sample is nothing but the real-world mechanism that produces our data. While the definition may seem obvious, it implies an uncomfortable truth: In the empirical sciences, where statistics and data science belong, we usually don’t know for certain the underlying mechanism that produced our data. At most, we know an approximation of it. Think, for instance, of stock prices: We know that good news about a company’s performance might positively impact its stock price. However, we almost never know exactly by how many dollars the stock will go up if the company beats its revenue forecasts by 1%. In such cases, the best we can usually do is give an estimate together with a wider or narrower confidence interval.
For this reason, in the empirical sciences, we often think of data as realizations of a more or less random process. In mathematical terms, such processes are known as stochastic processes.
Deterministic systems
Roughly speaking, stochastic processes are data-generating processes that contain a random component. This is in contrast to deterministic processes, which can be perfectly reconstructed based on logical rules.
Deterministic systems can be as simple as the metric system (one meter will always be 100 centimeters) or as complex as Newtonian physics. To understand the state of a deterministic system, we only need to know two things:
The laws that regulate the system
The initial conditions of the system
Look at the code snippet below. It is an extremely simple example of a deterministic time series. The function doubles the state at every step, so starting from 2 it produces successive powers of 2. Note that by knowing the initial condition of the system (defined by the parameter initial_state) and how many steps have been calculated, we can trace the whole process.
    import pandas as pd
    import matplotlib.pyplot as plt


    def deterministic(initial_state, steps):
        '''
        Take an initial value for a deterministic system and a number of steps,
        and return the realisations of the system after each step.
        The state doubles at every step.

        Parameters:
        :initial_state: integer
        :steps: integer
        '''
        state = initial_state
        realisations = []
        for _ in range(steps):
            # The law of the system: each state is twice the previous one
            state = 2 * state
            realisations.append(state)
        return realisations


    results = deterministic(2, 10)
    s = pd.Series(results)
    plt.plot(s)
    plt.xlabel('Step')
    plt.ylabel('Value of system')
    plt.show()
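Because the law and the initial condition fully determine the process, any state can also be computed directly, without simulating the intermediate steps. The sketch below assumes the doubling rule used in the snippet above. The helper names simulate_doubling and closed_form_state are hypothetical and introduced only for illustration.

```python
def simulate_doubling(initial_state, steps):
    """Apply the doubling rule step by step and return every state reached."""
    state = initial_state
    realisations = []
    for _ in range(steps):
        state = 2 * state
        realisations.append(state)
    return realisations


def closed_form_state(initial_state, k):
    """State after k doublings, computed from the law and the initial condition alone."""
    return initial_state * 2 ** k


simulated = simulate_doubling(2, 10)
predicted = [closed_form_state(2, k) for k in range(1, 11)]

# The step-by-step simulation and the direct formula agree at every step.
print(simulated == predicted)
```

No comparable shortcut exists once a random component enters the system, which is the contrast the next section develops.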
Stochastic processes
Stochastic processes, on the other hand, have a built-in random component. This doesn’t mean that the whole system is random, though. Think, for instance, of weather: If you were to guess what the weather will be like tomorrow, expecting it to be similar to today’s wouldn’t be a bad approximation. However, you couldn’t possibly know exactly how many minutes of sunlight there will be or how many millimeters of rain will fall on a given day, and even the best forecast will only give you a range of likely values. In other words, weather patterns are sticky and forecastable, but not entirely.
Another example is a country’s gross domestic product (GDP). Modern economists are confident that the value of a country’s production of goods and services tends to increase under the current economic system. However, the upward trend is not monotonic. It is regularly interrupted by small and large fluctuations in both directions, as you can see in the figure below, which shows the US GDP from January 2000 to July 2022.
In the following code snippet, we can see an example of one of the most basic stochastic processes in time series: a random walk. In a random walk, each realization is the previous realization plus some random noise. Don’t worry about the model itself. Just focus on how the code works. Run it a few times, and you will see that for a given initial_state and number of steps, there are infinitely many possible realizations:
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt


    def random_walk(initial_state, steps):
        '''
        Take an initial value and a number of steps, and return a series
        of realisations of a simple random walk, one per step.
        The noise added at each step is drawn from a standard normal distribution.

        Parameters:
        :initial_state: integer
        :steps: integer
        '''
        state = initial_state
        realisations = []
        for _ in range(steps):
            # Each state is the previous state plus standard normal noise
            state = state + np.random.normal(0, 1)
            realisations.append(state)
        return realisations


    results = random_walk(0, 10)
    s = pd.Series(results)
    plt.plot(s)
    plt.xlabel('Step')
    plt.ylabel('Value of system')
    plt.show()
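One way to make the role of randomness visible is to control it with a seed. The sketch below is a variant of the snippet above, not a replacement for it. It assumes NumPy's default_rng generator, and the helper name seeded_random_walk and its seed parameter are hypothetical. The same seed reproduces the same path, while a different seed gives a different realization of the same process.

```python
import numpy as np


def seeded_random_walk(initial_state, steps, seed):
    """Random walk with standard normal noise, reproducible for a given seed."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=0.0, scale=1.0, size=steps)
    # Each state is the initial state plus the running sum of the noise so far.
    return initial_state + np.cumsum(noise)


walk_a = seeded_random_walk(0, 10, seed=1)
walk_b = seeded_random_walk(0, 10, seed=1)
walk_c = seeded_random_walk(0, 10, seed=2)

print(np.array_equal(walk_a, walk_b))  # compares two runs with the same seed
print(np.array_equal(walk_a, walk_c))  # compares two runs with different seeds
```

Fixing the seed does not make the process deterministic in the sense discussed earlier. It only pins down which of the possible realizations we observe.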
Stochastic processes are fundamental to the way we understand and model time series. Because we don’t usually understand how the data-generating mechanism of our time series works, we need to assume randomness is built into the system. This is what ARIMA models, among many others, do.
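As a small illustration of how such models treat built-in randomness, consider that the "I" in ARIMA stands for "integrated" and refers to differencing the series. Taking first differences of a random walk returns the noise terms that drove it. The sketch below checks this with NumPy alone. The seed and the series length are arbitrary assumptions, and no ARIMA model is fitted here.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility
noise = rng.normal(0.0, 1.0, size=500)
walk = np.cumsum(noise)  # random walk whose first value is the first noise draw

# First differences of the walk recover the noise terms
# (from the second draw onward, up to floating-point error).
differences = np.diff(walk)
print(np.allclose(differences, noise[1:]))
```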