Implementation of DFS
Learn how to implement depth first search.
In this lesson, we’ll solve the problem of detecting cycles in directed graphs by implementing DFS.
Implementation notes
The easiest way to implement DFS is as a recursive procedure:
- Mark the current node as “in progress.”
- Recursively process all nodes behind discovery edges.
- Mark the current node as done.
Due to the recursion occuring between starting and finishing a node, we automatically explore deeper nodes first before going up to shallower nodes again.
So, in principle, our depth-first search procedure should be a recursive function
void dfs(int u) {}
that performs DFS from the starting node u
.
As we mostly need to perform operations on all neighboring edges of a node, we’ll make use of the adjacency list graph representation. We are considering unweighted graphs here, but the implementation for weighted graphs is very similar. We’ll simply ignore the values of the weights.
But now our DFS procedure additionally needs to access some “global” state such as:
- the adjacency list of the graph.
- the current state or color of each node.
One option would be to pass this state to the dfs
routine additionally, like in
void dfs(int u, vector<vector<int>> const &adjacencyList, vector<NodeState> &state) {}
However, this is unwieldy, especially when we want to keep track of more state during the search, such as node predecessors or distances.
Instead, we’ll implement our cycle detection algorithm as a class where this state is stored in the member variables.
Avoiding stack overflows
There is one downside of implementing DFS recursively. As each recursive function call creates a stack frame, it is possible to run into stack overflows on very large graphs.
An easy workaround for this problem is to simply increase the maximum stack size, for example, using the ulimit -s
command on Linux systems.
But some operating systems do not support raising the stack size past a hard limit. In that case, the only remedy is to implement DFS iteratively by keeping track of a vector
of “in progress” nodes. This vector
can be allocated on the heap and thus we can avoid stack overflow issues.
Implementing cycle detection with DFS
Let’s start by setting up a basic class to run the cycle detection.
#include <vector>#include <iostream>using namespace std;// shorthand for adjacency list typetypedef vector<vector<int>> AdjacencyList;// define an enum for node states during DFS executionenum class NodeState {UNVISITED, IN_PROGRESS, FINISHED};class CycleDetector {private:const AdjacencyList adjacencyList;vector<NodeState> nodeStates;bool cycleFound = false;public:CycleDetector(AdjacencyList _adjacencyList): adjacencyList {_adjacencyList}// initialize all nodes as unvisited, nodeStates {vector<NodeState>(adjacencyList.size(), NodeState::UNVISITED)}{}};
To keep track of the node states during DFS, we have defined the enumeration NodeState
.
enum class NodeState {UNVISITED, IN_PROGRESS, FINISHED};
The states correspond to the colors white, gray, and blue from the previous lesson.
In our class, we use a vector
to keep track of the current state of each node. Initially, all nodes are in the UNVISITED
state. We also maintain a boolean variable cycleFound
, initially false
, which keeps track of whether we have discovered a cycle yet.
Next, let’s write the DFS routine.
void dfs(int u) {// mark current node as in progressthis->nodeStates[u] = NodeState::IN_PROGRESS;for (int v : this->adjacencyList[u]) {switch (this->nodeStates[v]) {// discovery edge: recursively call dfscase NodeState::UNVISITED: dfs(v); break;// back edge: mark cycle as foundcase NodeState::IN_PROGRESS: this->cycleFound = true; break;// redundant edge: skipcase NodeState::FINISHED: break;}}// mark current node as donethis->nodeStates[u] = NodeState::FINISHED;}
After marking the current node u
as IN_PROGRESS
, we’ll check all of its outgoing edges. Our action will depend on the type of each edge:
- For discovery edges leading to unvisited nodes, we’ll recursively call
dfs
. - For back edges leading to “in progress” nodes, we’ll note that we have found a cycle, but will not