...

/

Tarjan’s Algorithm

Tarjan’s Algorithm

Learn about Tarjan's algorithm for finding strongly connected components.

The earliest algorithm for finding strongly connected components is a natural extension of depth-first search and is attributed to a 1972 paperTarjan, R. E. “Depth-first search and linear graph algorithms.” SIAM Journal on Computing 1, no. 2 (1972). by Robert Tarjan.

Press + to interact

Building intuition

The first vertex of an SCC to be visited by depth-first search becomes an ancestor of all other vertices in that SCC (and potentially vertices of other SCCs visited subsequently). We call this vertex the root of the SCC or scc-root, since it is the root of the depth-first subtree rooted at vv.

An scc-root is the first vertex of an SCC to be visited and the last vertex of that SCC to be marked explored.

The idea behind Tarjan’s algorithm is to follow the natural flow of depth-first search and identify the strongly connected components as soon as the last vertex of that strongly connected component is marked explored.

For example, if the boxes in the following illustration represent the strongly connected components CiC_i in a tree-like hierarchy (corresponding to a depth-first tree), and the dotted edges represent non-tree edges going between different SCCs, then the algorithm identifies C1C_1 first, then C2C_2, then C3C_3, and so on, as indicated by increasing subscripts—in the order specified by the endedended timestamps of the scc-roots.

Press + to interact
Cross-edges and forward-edges shown dotted
Cross-edges and forward-edges shown dotted

Using a stack for identification of an SCC

To implement Tarjan’s algorithm, each vertex is pushed onto the stack as soon as it’s visited by depth-first search. When a vertex is marked explored, it is not removed from the stack unless it is an scc-root.

As soon as an scc-root is marked explored, we pop all of its descendants present on the stack in addition to the scc-root itself and create a new strongly connected component containing these vertices.

So, at any point in time, the only vertices on the stack are the ones that have been visited but whose SCCs have not been determined.

Don’t worry about how to identify an scc-root at this point. For now, the interest is only in seeing how the SCCs are identified whenever an scc-root is identified.

Press + to interact
Depth-first search is started at a vertex v1 of a directed graph whose SCCs are to be identified
1 / 13
Depth-first search is started at a vertex v1 of a directed graph whose SCCs are to be identified

The magic in Tarjan’s algorithm is in figuring out which vertices are the scc-roots. Once an scc-root is identified, the scc-root and the descendants present above it on the stack can be popped to create a new SCC.

To see how to identify scc-roots, it’s useful to think about how a vertex in the directed graph can be reached from another vertex.

Reflections on non-tree edges and reachability

Suppose depth-first search is run on a directed graph, and every edge is classified into one of four types. The tree-edges offer us limited insight into how some of the vertices can be reached from others. So, let’s think about how the non-tree edges can be useful for reaching one vertex from another:

Forward-edge

A forward-edge (u,v)(u,v) just gives an alternate path from uu to vv, since by definition of a forward-edge, another path exists from uu to vv consisting entirely of tree-edges. So, it adds nothing more to our knowledge that vv is reachable from uu.

Since forward-edges offer us no new insights into strongly connected components, they are unimportant for our present purpose.

Back-edge

A back-edge (u,v)(u,v) creates a cycle, and all vertices in a cycle must belong to the same SCC.

A back-edge can only lead to a vertex in the same SCC.

Cross-edge

The end-vertices of a cross-edge may belong to the same SCC, but they may very well belong to two different ones.

Claim: For a cross-edge (u,v)(u,v), if uu and vv belong to different SCCs, the SCC containing ...