...

/

Implementation of DFS

Implementation of DFS

Learn how to implement depth first search.

In this lesson, we’ll solve the problem of detecting cycles in directed graphs by implementing DFS.

Implementation notes

The easiest way to implement DFS is as a recursive procedure:

  1. Mark the current node as “in progress.”
  2. Recursively process all nodes behind discovery edges.
  3. Mark the current node as done.

Due to the recursion occuring between starting and finishing a node, we automatically explore deeper nodes first before going up to shallower nodes again.

So, in principle, our depth-first search procedure should be a recursive function

void dfs(int u) {}

that performs DFS from the starting node u.

As we mostly need to perform operations on all neighboring edges of a node, we’ll make use of the adjacency list graph representation. We are considering unweighted graphs here, but the implementation for weighted graphs is very similar. We’ll simply ignore the values of the weights.

But now our DFS procedure additionally needs to access some “global” state such as:

  • the adjacency list of the graph.
  • the current state or color of each node.

One option would be to pass this state to the dfs routine additionally, like in

void dfs(int u, vector<vector<int>> const &adjacencyList, vector<NodeState> &state) {}

However, this is unwieldy, especially when we want to keep track of more state during the search, such as node predecessors or distances.

Instead, we’ll implement our cycle detection algorithm as a class where this state is stored in the member variables.

Avoiding stack overflows

There is one downside of implementing DFS recursively. As each recursive function call creates a stack frame, it is possible to run into stack overflows on very large graphs.

%0 node_1 dfs(u1) node_2 dfs(u2) node_3 dfs(u3) node_1622885713399 ... node_1622885682682 dfs(u1000000) node_1622885667177 dfs(u1000001) top_node top node_1622885667177->top_node
Visualization of the call stack of a DFS execution where a high recursion depth can lead to stack overflows.

An easy workaround for this problem is to simply increase the maximum stack size, for example, using the ulimit -s command on Linux systems.

But some operating systems do not support raising the stack size past a hard limit. In that case, the only remedy is to implement DFS iteratively by keeping track of a vector of “in progress” nodes. This vector can be allocated on the heap and thus we can avoid stack overflow issues.

Implementing cycle detection with DFS

Let’s start by setting up a basic class to run the cycle detection.

Press + to interact
#include <vector>
#include <iostream>
using namespace std;
// shorthand for adjacency list type
typedef vector<vector<int>> AdjacencyList;
// define an enum for node states during DFS execution
enum class NodeState {UNVISITED, IN_PROGRESS, FINISHED};
class CycleDetector {
private:
const AdjacencyList adjacencyList;
vector<NodeState> nodeStates;
bool cycleFound = false;
public:
CycleDetector(AdjacencyList _adjacencyList)
: adjacencyList {_adjacencyList}
// initialize all nodes as unvisited
, nodeStates {vector<NodeState>(adjacencyList.size(), NodeState::UNVISITED)}
{}
};

To keep track of the node states during DFS, we have defined the enumeration NodeState.

enum class NodeState {UNVISITED, IN_PROGRESS, FINISHED};

The states correspond to the colors white, gray, and blue from the previous lesson.

In our class, we use a vector to keep track of the current state of each node. Initially, all nodes are in the UNVISITED state. We also maintain a boolean variable cycleFound, initially false, which keeps track of whether we have discovered a cycle yet.

Next, let’s write the DFS routine.

Press + to interact
void dfs(int u) {
// mark current node as in progress
this->nodeStates[u] = NodeState::IN_PROGRESS;
for (int v : this->adjacencyList[u]) {
switch (this->nodeStates[v]) {
// discovery edge: recursively call dfs
case NodeState::UNVISITED: dfs(v); break;
// back edge: mark cycle as found
case NodeState::IN_PROGRESS: this->cycleFound = true; break;
// redundant edge: skip
case NodeState::FINISHED: break;
}
}
// mark current node as done
this->nodeStates[u] = NodeState::FINISHED;
}

After marking the current node u as IN_PROGRESS, we’ll check all of its outgoing edges. Our action will depend on the type of each edge:

  • For discovery edges leading to unvisited nodes, we’ll recursively call dfs.
  • For back edges leading to “in progress” nodes, we’ll note that we have found a cycle, but will not
...