Tarjan’s Algorithm
Learn about Tarjan's algorithm for finding strongly connected components.
The earliest algorithm for finding strongly connected components is a natural extension of depth-first search and is attributed to a 1972
Building intuition
The first vertex of an SCC to be visited by depth-first search becomes an ancestor of all other vertices in that SCC (and potentially vertices of other SCCs visited subsequently). We call this vertex the root of the SCC or scc-root, since it is the root of the depth-first subtree rooted at .
An scc-root is the first vertex of an SCC to be visited and the last vertex of that SCC to be marked explored.
The idea behind Tarjan’s algorithm is to follow the natural flow of depth-first search and identify the strongly connected components as soon as the last vertex of that strongly connected component is marked explored.
For example, if the boxes in the following illustration represent the strongly connected components in a tree-like hierarchy (corresponding to a depth-first tree), and the dotted edges represent non-tree edges going between different SCCs, then the algorithm identifies first, then , then , and so on, as indicated by increasing subscripts—in the order specified by the timestamps of the scc-roots.
Using a stack for identification of an SCC
To implement Tarjan’s algorithm, each vertex is pushed onto the stack as soon as it’s visited by depth-first search. When a vertex is marked explored, it is not removed from the stack unless it is an scc-root.
As soon as an scc-root is marked explored, we pop all of its descendants present on the stack in addition to the scc-root itself and create a new strongly connected component containing these vertices.
So, at any point in time, the only vertices on the stack are the ones that have been visited but whose SCCs have not been determined.
Don’t worry about how to identify an scc-root at this point. For now, the interest is only in seeing how the SCCs are identified whenever an scc-root is identified.
The magic in Tarjan’s algorithm is in figuring out which vertices are the scc-roots. Once an scc-root is identified, the scc-root and the descendants present above it on the stack can be popped to create a new SCC.
To see how to identify scc-roots, it’s useful to think about how a vertex in the directed graph can be reached from another vertex.
Reflections on non-tree edges and reachability
Suppose depth-first search is run on a directed graph, and every edge is classified into one of four types. The tree-edges offer us limited insight into how some of the vertices can be reached from others. So, let’s think about how the non-tree edges can be useful for reaching one vertex from another:
Forward-edge
A forward-edge just gives an alternate path from to , since by definition of a forward-edge, another path exists from to consisting entirely of tree-edges. So, it adds nothing more to our knowledge that is reachable from .
Since forward-edges offer us no new insights into strongly connected components, they are unimportant for our present purpose.
Back-edge
A back-edge creates a cycle, and all vertices in a cycle must belong to the same SCC.
A back-edge can only lead to a vertex in the same SCC.
Cross-edge
The end-vertices of a cross-edge may belong to the same SCC, but they may very well belong to two different ones.
Claim: For a cross-edge , if and belong to different SCCs, the SCC containing ...