The objective of the shortest path problem in a directed graph is to determine the minimum-weight path between two vertices, where the edge weights correspond to the cost or distance between vertices. To achieve this, we are given a directed graph $G = (V, E)$, where $V$ is the set of vertices and $E$ is the set of edges with associated weights, along with two vertices $s$ (the source) and $t$ (the target). This problem can be tackled by employing different algorithms, such as Dijkstra's algorithm, the Bellman-Ford algorithm, or the Floyd-Warshall algorithm, based on the specific requirements and properties of the graph.
This blog post will focus on the Bellman-Ford algorithm as a means of solving the shortest path problem in a directed graph. The algorithm's core is based on Bellman's equation, which also serves as the foundation for reinforcement learning, a popular field in data science.
Bellman's equation is a fundamental concept in dynamic programming, which is a technique for solving optimization problems that involve making decisions over time. It is named after Richard Bellman, who first formulated it in the 1950s. In its simplest form, Bellman's equation is a recursive equation that expresses the optimal value of a problem in terms of the optimal values of its subproblems. It is written as follows:

$$V(s) = \max_{a} \big( R(s, a) + V(s') \big)$$

Here, $V(s)$ is the optimal value of the problem at state $s$, $R(s, a)$ is the reward or cost of taking action $a$ in state $s$, $s'$ is the next state that results from taking action $a$ in state $s$, and $\max$ denotes the maximum over all possible actions $a$.
Note: Bellman’s equation for minimization problems has “min” in place of “max.”
Consider the directed graph given in the figure below (its vertices are $x$, $y$, $p$, $w$, and $z$, and its edge weights are the ones defined in the code that follows). The problem involves finding the shortest path from all vertices to a fixed destination vertex, denoted as $z$. We can express the length of the shortest path from a given vertex $u$ to $z$ that consists of no more than $k$ edges as $V_k(u)$. Using the notation $E(u, v)$ to represent the weight of the edge from vertex $u$ to vertex $v$, we can conclude that $V_1(u) = E(u, z)$. However, for $V_2(u)$, we need to consider the sum of the weights of the edges from vertex $u$ to an intermediate vertex $v$ and from $v$ to $z$. Therefore, we have $V_2(u) = \min_{v}\big(E(u, v) + V_1(v)\big)$. Finally, we have $V_1(x) = \infty$, meaning there is no path from vertex $x$ to $z$ that consists of at most one edge. Considering that a shortest path can be found with no more than $n - 1$ edges, where $n$ is the total number of vertices, our goal is to find $V_{n-1}(u)$ for every vertex $u$, given that $V_0(z) = 0$ and $V_0(u) = \infty$ for $u \neq z$. We can express Bellman's equation for the shortest path as follows:

$$V_k(u) = \min_{v}\big(E(u, v) + V_{k-1}(v)\big), \qquad \text{with } V_k(z) = 0 \text{ for every } k.$$
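For intuition, here is one step of this recursion worked out by hand for vertex $y$, using the edge weights defined in the code below ($E(y, p) = 1$, $E(y, w) = 3$, $E(y, z) = 2$, $E(p, z) = 4$, $E(w, z) = -5$):

$$V_1(y) = E(y, z) = 2, \qquad V_2(y) = \min\big(\underbrace{1 + 4}_{\text{via } p},\ \underbrace{3 + (-5)}_{\text{via } w},\ \underbrace{2 + 0}_{\text{direct}}\big) = -2.$$

The two-edge route through $w$ wins because it exploits the negative-weight edge $w \to z$.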
The implementation of the algorithm is straightforward. We first define the nodes and edges of the graph and then directly apply Bellman’s equation.
import numpy as np

# Define the nodes
nodes = list('xypwz')
x, y, p, w, z = 0, 1, 2, 3, 4

# Number of nodes
n = len(nodes)

# Create index list
idxs = list(np.arange(n))

# Create a mapping of indices to node labels
i2s = dict(zip(idxs, nodes))

# Initialize the cost matrix E and the value matrix V
V = np.inf * np.ones(n)
V[z] = 0
E = np.inf * np.ones((n, n))

# Set the costs for the edges
E[x, y] = 4
E[x, p] = 2
E[y, p] = 1
E[y, w] = 3
E[y, z] = 2
E[p, y] = 3
E[p, w] = 5
E[p, z] = 4
E[w, z] = -5
E[z, x] = 3

# Bellman's algorithm to find shortest paths
def Bellman(E, V):
    valueChanged = True
    numIters = 0
    while valueChanged:
        numIters += 1
        valueChanged = False
        for i in range(V.size):
            minVi = V[i]
            for j in range(V.size):
                vij = E[i, j] + V[j]
                if vij < minVi:
                    minVi = vij
                    valueChanged = True
            V[i] = minVi
    return V, numIters

# Function to print shortest paths
def print_shortest_paths(E, V, V_star, i2s):
    for i in range(V.size):
        minVi = V[i]
        bestNext = i
        for j in range(V.size):
            vij = E[i, j] + V_star[j]
            if vij < minVi:
                minVi = vij
                bestNext = j
        print(i2s[i] + '-->' + i2s[bestNext])

# Run the Bellman's algorithm and print the shortest paths
V_star, numIters = Bellman(E.copy(), V.copy())
print('Iterations:', numIters)
print_shortest_paths(E.copy(), V.copy(), V_star.copy(), i2s)
The code above implements Bellman's algorithm to find the shortest paths in a directed graph. The graph is represented by an adjacency matrix E, where E[i, j] represents the cost of the edge from node i to node j. The algorithm initializes the cost vector V with infinite values, except for the destination node z, which is set to 0.
The Bellman function performs iterations of the algorithm until no further improvements can be made to the cost vector V. It updates the cost value of each node by taking the minimum cost of reaching the destination through any of its neighbors.
The print_shortest_paths function takes the original cost matrix E, the initial cost vector V, the final cost vector V_star, and a mapping of indices to node labels i2s. It prints, for each node, the best next node on the shortest path, as determined by the final cost vector V_star.
Finally, the code runs Bellman’s algorithm and prints the number of iterations performed and the shortest paths for each node.
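Running the listing above on this graph should terminate after four sweeps (the fourth sweep makes no further changes) and print one best next hop per node, along the lines of:

Iterations: 4
x-->y
y-->w
p-->w
w-->z
z-->z

The z-->z line simply indicates that the destination points to itself; following the hops from x gives the route x → y → w → z with total cost 4 + 3 + (-5) = 2.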
To achieve efficiency, the code can be vectorized. However, the benefits of vectorization extend beyond just speed. In particular, we redefine $V_k(u)$ as the length of the shortest path from $u$ that consists of exactly $k$ edges. To ensure consistency, we allow a self-loop on each node with a cost of $0$, in case the shortest path has fewer than $k$ edges.
We consider the nodes as states in an environment where an agent takes actions of the form $a_v$, representing a jump to state $v$. The cost of taking action $a_v$ in state $u$ is defined as the cost of the edge from $u$ to $v$, denoted $R(u, a_v)$. Bellman's equation can now be rewritten as follows:

$$V_k(u) = \min_{a_v} \big( R(u, a_v) + V_{k-1}(v) \big)$$
We can interpret $V_k(u)$ as the minimum cost of taking a sequence of $k$ actions starting from state $u$. If an action does not lead to a state, its cost is considered infinite; for example, the action "jump to $w$" taken in state $x$ has infinite cost, since there is no edge from $x$ to $w$ in our example graph. Furthermore, $V_k(u)$ is computed by estimating the costs over all actions in state $u$ and selecting the action with the minimum cost. If we denote by $Q_k(u, a)$ the minimum cost of a sequence of $k$ actions whose first action, taken in state $u$, is $a$, then $V_k(u) = \min_a Q_k(u, a)$. It is important to note that $Q_k(u, a)$ satisfies the following equation:

$$Q_k(u, a_v) = R(u, a_v) + \min_{a'} Q_{k-1}(v, a') = R(u, a_v) + V_{k-1}(v)$$
The algorithm can now be decomposed to calculate $Q_k(u, a)$ values for every state-action pair $(u, a)$.
Additionally, we introduce a transition matrix, denoted $T_a$, for each action $a$. $T_a$ is a matrix consisting of zeros and ones: the entry $T_a[u, u']$ equals $1$ if action $a$ taken from state $u$ leads to state $u'$ via an edge (including the zero-cost self-loops). In our example graph, when ordering the states as $x$, $y$, $p$, $w$, and $z$, the transition matrix for the action $a_x$ ("jump to $x$") is as follows:

$$T_{a_x} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \end{pmatrix}$$
This matrix represents the connectivity between states when taking action $a_x$: the entry $T_{a_x}[u, u']$ indicates whether action $a_x$ taken from state $u$ leads to state $u'$ via an edge.
We are set for vectorization. In particular, if we denote by $\mathbf{V}_{k-1}$ the vector with entries $V_{k-1}(u)$ and by $\mathbf{Q}_k$ the matrix with entries $Q_k(u, a)$, then we get the following equation:

$$\mathbf{Q}_k[u, a] = R[u, a] + \sum_{u'} T_a[u, u']\, \mathbf{V}_{k-1}[u']$$
Furthermore, once $\mathbf{Q}_k$ has been computed, the vector $\mathbf{V}_k$ can be found by minimizing the matrix $\mathbf{Q}_k$ along the horizontal (action) axis.
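To see how this update maps onto NumPy before diving into the full implementation, here is a minimal sketch on a tiny stand-in problem (three states in a chain, with hypothetical costs chosen only for illustration); np.einsum contracts the transition tensor with the value vector so that all state-action pairs are updated in one shot:

import numpy as np

# Minimal sketch of one vectorized Bellman sweep on a tiny stand-in problem
# (three states in a chain, not the blog's graph). Conventions follow the
# listing below: R[u, a] is the action cost (a large finite sentinel if the
# action is unavailable), and T[u, u', a] is 1 if action a moves u to u'.
BIG = np.finfo(np.float64).max

R = np.array([[0.,  5., BIG],    # state 0: self-loop (0), edge to state 1 (5)
              [BIG, 0.,  2.],    # state 1: self-loop (0), edge to state 2 (2)
              [BIG, BIG, 0.]])   # state 2 (the goal): self-loop only

T = np.zeros((3, 3, 3))
T[0, 0, 0] = T[1, 1, 1] = T[2, 2, 2] = 1   # zero-cost self-loops
T[0, 1, 1] = 1                             # action 1 in state 0 -> state 1
T[1, 2, 2] = 1                             # action 2 in state 1 -> state 2

V = np.array([BIG, BIG, 0.])               # only the goal state starts at 0

# One sweep: Q[u, a] = R[u, a] + sum over u' of T[u, u', a] * V[u']
Q = R + np.einsum('ijk,j->ik', T, V)
V = Q.min(axis=1)          # V_k(u) = min over a of Q_k(u, a)
policy = Q.argmin(axis=1)  # best action per state after this sweep
print(V)                   # roughly [BIG, 2., 0.] after a single sweep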
Here is a vectorized implementation that utilizes np.einsum to avoid the loop over the actions.
import numpy as np

# Define the nodes and their corresponding indices
nodes = list('xypwz')
x, y, p, w, z = 0, 1, 2, 3, 4

# Define the action indices
ax, ay, ap, aw, az = 0, 1, 2, 3, 4

# Define the actions
actions = ['jump to x', 'jump to y', 'jump to p', 'jump to w', 'jump to z']

# Number of nodes
n = len(nodes)

# Create lists and dictionaries for node and action indices
idxs = list(np.arange(n))
i2s = dict(zip(idxs, nodes))
i2a = dict(zip(idxs, actions))

# Initialize the value function V and set the value of the goal state to 0
V = np.inf * np.ones(n)
V[z] = 0

# Initialize the cost/reward matrix R with infinity values
R = np.inf * np.ones((n, n))

# Define the costs/rewards for each state-action pair
R[x, ax] = 0
R[y, ay] = 0
R[p, ap] = 0
R[w, aw] = 0
R[z, az] = 0
R[x, ay] = 4
R[x, ap] = 2
R[y, ap] = 1
R[y, aw] = 3
R[y, az] = 2
R[p, ay] = 3
R[p, aw] = 5
R[p, az] = 4
R[w, az] = -5
R[z, ax] = 3

# Initialize the transition tensor T with zeros
T = np.zeros((n, n, len(actions)))

# Define the transitions for each state-action pair
T[x, x, ax] = 1
T[z, x, ax] = 1
T[y, y, ay] = 1
T[x, y, ay] = 1
T[p, y, ay] = 1
T[p, p, ap] = 1
T[x, p, ap] = 1
T[y, p, ap] = 1
T[w, w, aw] = 1
T[y, w, aw] = 1
T[p, w, aw] = 1
T[z, z, az] = 1
T[y, z, az] = 1
T[p, z, az] = 1
T[w, z, az] = 1

# Define the Bellman algorithm for finding the optimal value function
def Bellman_fast(T, R, V):
    # Set infinite values in V and R to the maximum representable float value
    V[V == np.inf] = np.finfo(np.float64).max
    R[R == np.inf] = np.finfo(np.float64).max

    # Number of iterations
    num_iterations = len(V) - 1

    # Initialize the policy/best-action and the updated value function
    P = np.arange(len(V))
    for i in range(num_iterations):
        # Calculate the Q-values
        Q = R + np.einsum('ijk,j->ik', T, V)
        # Update the value function
        Vi = np.min(Q, axis=1).copy()
        Pi = np.argmin(Q, axis=1).copy()
        idx = V - Vi > 0
        P[idx] = Pi[idx]
        V = Vi.copy()
    return V, P

# Define a function to print the optimal path
def print_path(s, z, P, i2a):
    while P[s] != z:
        print('From ' + nodes[s] + ' ' + i2a[P[s]])
        s = P[s]
    print('From ' + nodes[s] + ' ' + i2a[P[s]])

# Find the optimal value function and policy using the Bellman algorithm
V_star, P = Bellman_fast(T.copy(), R.copy(), V.copy())

# Print the optimal path from state x to z
print_path(x, z, P, i2a)
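If run as-is, the vectorized version should recover the same route from x to z as before, printing something along the lines of:

From x jump to y
From y jump to w
From w jump to z

which again corresponds to the path x → y → w → z with total cost 2.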
Up to this point, our assumption has been that taking action $a_v$ from state $u$ always results in transitioning to state $v$ with a probability of $1$.
Note: $T_a[u, u']$ represents the probability of reaching state $u'$ from state $u$ when taking action $a$.
However, in many real-world environments, there is uncertainty involved. What if there is a non-zero probability that action $a_v$ taken in state $u$ leads to a state other than $v$?
Consider the scenario of playing with a remote-control toy car. Pressing the button to move the car right may not always guarantee that the car moves to the right. Factors such as a slippery floor or an unexpected earthquake can introduce uncertainty during the action. In reality, most environments are inherently uncertain, and it becomes necessary to incorporate this uncertainty into our algorithm. The Bellman equation can now be reformulated as follows:

$$V_k(u) = \min_{a} \Big( R(u, a) + \sum_{u'} T_a[u, u']\, V_{k-1}(u') \Big)$$
Fortunately, our vectorized algorithm can handle this uncertainty by incorporating transition probabilities into the corresponding transition matrices: we assign the probability of reaching state $u'$ from state $u$ under action $a$ to the entry $T_a[u, u']$.
With this adjustment, nothing else in the algorithm needs to change except the number of iterations. It is important to note, however, that when uncertainty is present, the number of iterations required to reach the optimal solution is no longer bounded by the number of vertices and can, in the limit, grow without bound.
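As an illustration of that adjustment (not part of the original code), here is a sketch that builds on the T, R, and V arrays from the listing above: it adds a hypothetical "slippery" transition for one action and keeps sweeping the same vectorized update until the values stop changing, instead of fixing the number of sweeps at n - 1 in advance.

import numpy as np

# Hypothetical tweak: the 'jump to y' action taken in state x now succeeds only 90% of
# the time; with probability 0.1 the agent stays in x. T, R, V, x, y, ay are the objects
# defined in the listing above.
T_noisy = T.copy()
T_noisy[x, y, ay] = 0.9
T_noisy[x, x, ay] = 0.1

def Bellman_stochastic(T, R, V, max_iters=1000, tol=1e-9):
    # Same Q-update as Bellman_fast, but iterated until the value vector converges,
    # since n - 1 sweeps are no longer guaranteed to be enough under uncertainty.
    V = V.copy()
    R = R.copy()
    V[V == np.inf] = 1e15   # large finite sentinel (safe to multiply by probabilities)
    R[R == np.inf] = 1e15
    for _ in range(max_iters):
        Q = R + np.einsum('ijk,j->ik', T, V)   # expected cost of each state-action pair
        V_new = np.min(Q, axis=1)
        if np.max(np.abs(V_new - V)) < tol:    # stop once no value changes noticeably
            break
        V = V_new
    return V_new, np.argmin(Q, axis=1)

V_noisy, P_noisy = Bellman_stochastic(T_noisy, R, V)
# For this particular graph, V_noisy[x] should still come out to 2.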
A Markov decision process (MDP) is a mathematical framework used to model decision-making problems in situations where outcomes are influenced by both stochastic (random) elements and the actions taken by an agent. It is named after the mathematician Andrey Markov.
The key components of an MDP are listed below:

- States: the set of situations the agent can be in (the graph vertices in our example).
- Actions: the choices available to the agent in each state (the "jump to" moves).
- Transition model: the probability $T_a[u, u']$ of reaching state $u'$ when taking action $a$ in state $u$.
- Rewards (or costs): the immediate payoff $R(u, a)$ of taking action $a$ in state $u$.
- Policy: a mapping from states to actions, which the agent seeks to optimize.
The goal in an MDP is to find an optimal policy that maximizes the expected cumulative reward over time. Various algorithms, such as dynamic programming, reinforcement learning, and Monte Carlo methods, can be used to solve MDPs and find the optimal policy.
Note: We have indeed written vectorized code that solves an MDP! Just replace “minimization” with “maximization,” “path” with “policy” and “cost” with “reward.”
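To make that note concrete, here is a minimal sketch (not from the original post) of the same vectorized update with min/argmin swapped for max/argmax, which is what a reward-maximization MDP calls for; unavailable actions would then need a very negative sentinel reward rather than a very large cost:

import numpy as np

# Minimal sketch: value iteration for a reward-maximization MDP, mirroring Bellman_fast.
# T[s, s', a] holds transition probabilities, R[s, a] holds immediate rewards (use a very
# negative sentinel such as -1e15 for actions that are unavailable in a state), and V is
# the initial value vector. num_iterations is assumed to be at least 1.
def Bellman_max(T, R, V, num_iterations):
    for _ in range(num_iterations):
        Q = R + np.einsum('ijk,j->ik', T, V)   # expected return of each state-action pair
        V = np.max(Q, axis=1)                  # greedy (optimal) value per state
    return V, np.argmax(Q, axis=1)             # optimal values and greedy policy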
So far we’ve been using Bellman’s equation to solve a Markov decision process (MDP). However, when the transition probabilities of an MDP are unknown, the approach to solving such problems is commonly referred to as reinforcement learning. In reinforcement learning, the objective is to approximate the MDP using samples collected through the agent’s actions. From these samples, an explicit transition model can be constructed, or the algorithm can proceed without explicitly estimating the transition model.
Furthermore, an MDP assumes that the transition model of the environment is fixed, often referred to as a stationary process. Reinforcement learning relaxes this assumption as well, allowing for more complex and dynamic environments. This relaxation introduces additional challenges in learning an optimal policy, because the transition model may change over time. Moreover, continuous action and state spaces add further dimensions of complexity to the problem. Now that we've discussed the Bellman-Ford algorithm in this blog, you can learn more about creating AI solutions in the Grokking AI for Engineering & Product Managers or Mastering Machine Learning Theory and Practice course.
Reinforcement learning involves addressing the exploration-exploitation trade-off, where the agent needs to balance exploiting its current knowledge to maximize rewards against exploring new actions to gather more information about the environment. Various algorithms and techniques, such as Q-learning and policy gradients, have been developed to tackle the complexity of reinforcement learning problems and to learn optimal policies in uncertain and changing environments. When value functions and/or policy functions are represented using deep neural networks, it is referred to as deep reinforcement learning.
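As a point of reference (not from the original post), the tabular Q-learning update shows how learning can proceed without an explicit transition model: after observing a single transition from state $s$ to $s'$ with cost $r$ under action $a$, the agent nudges its estimate toward the sampled Bellman target,

$$Q(s, a) \leftarrow Q(s, a) + \alpha \Big( r + \min_{a'} Q(s', a') - Q(s, a) \Big),$$

where $\alpha$ is a learning rate; in the usual reward-maximization setting the $\min$ becomes a $\max$, typically with a discount factor $\gamma$ on the bootstrapped term.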
If you like this blog, you may want to explore the following courses on Educative:
The machine learning field is rapidly advancing today due to the availability of large datasets and the ability to process big data efficiently. Moreover, several new techniques have produced groundbreaking results for standard machine learning problems. This course provides a detailed description of different machine learning algorithms and techniques, including regression, deep learning, reinforcement learning, Bayes nets, support vector machines (SVMs), and decision trees. The course also offers sufficient mathematical details for a deeper understanding of how different techniques work. An overview of the Python programming language and the fundamental theoretical aspects of ML, including probability theory and optimization, is also included. The course contains several practical coding exercises as well. By the end of the course, you will have a deep understanding of different machine-learning methods and the ability to choose the right method for different applications.
Game data science is emerging as a significant field of study due to the rise of social games embedded in online social networks. The ubiquity of social games gives access to new data sources and impacts essential business decisions, given the introduction of freemium business models. Game data science covers collecting, storing, and analyzing data, and communicating insights. This course will teach you game data extraction, processing, data abstraction, data analysis through visualization, data clustering, supervised learning, neural networks, and advanced sequence analysis using a case study and many examples in the R programming language.