Data Structures for Disjoint Sets

Overview

Disjoint-set data structures, also known as union-find structures, manage a dynamic collection of non-overlapping sets. They are crucial in problems that involve connectivity and group membership, such as network clustering, image processing, and graph algorithms.

The core operations supported are:

MAKE-SET(x): Initializes a new set containing only the element x.
FIND-SET(x): Returns the representative of the set containing x. This helps identify which subset an element belongs to.
UNION(x, y): Merges the two disjoint sets that contain elements x and y.

These operations are designed to be fast—even across large sequences of updates—especially when combined with key optimizations.

Applications and Basic Operations

A canonical use case of disjoint-set structures is identifying connected components in an undirected graph. The algorithm works by initially placing each vertex into its own singleton set and then performing a UNION for each edge to merge the connected vertices. After processing all edges, any two vertices are in the same connected component if and only if they belong to the same set.

CONNECTED-COMPONENTS(G):
    for each vertex v in G.V:
        MAKE-SET(v)
    for each edge (u, v) in G.E:
        if FIND-SET(u) ≠ FIND-SET(v):
            UNION(u, v)

This algorithm is especially beneficial in dynamic graphs, where edges are added incrementally over time, making a full traversal (like DFS or BFS) costly for every update.

Linked-List Representation

The simplest representation of disjoint sets is via linked lists, where each set is a list and its head serves as the representative. This design yields:

MAKE-SET(x): O(1) — simply create a new node.
FIND-SET(x): O(1) — follow a pointer back to the set object.

However, UNION(x, y) can be expensive. To merge two sets, we must update all elements in the shorter list to now point to the new representative, taking O(n) time in the worst case.

Weighted-Union Heuristic

To reduce cost, a weighted-union strategy is used: always append the smaller list to the larger one. This ensures any single element’s pointer is updated at most O(log n) times.

Theorem 19.1
A sequence of m operations (with n MAKE-SET calls) takes O(m + n log n) time using the linked-list structure with weighted-union.

Disjoint-Set Forests

A more sophisticated approach uses rooted trees to represent sets, forming what’s known as a disjoint-set forest. Here:

Each node points to its parent.
The root is its own parent and serves as the set representative.
MAKE-SET creates a new root.
FIND-SET walks up the tree to find the root.
UNION attaches one tree to another.

Two Key Heuristics

Union by Rank
Each node keeps a rank, which approximates tree height. When merging two trees, the root of the lower-rank tree becomes a child of the higher-rank tree. If ranks are equal, one becomes the root and its rank increases.
Path Compression
During FIND-SET, every node visited is directly connected to the root. This drastically flattens the structure, making subsequent operations nearly constant time.

Theorem 19.14
A sequence of m disjoint-set operations takes O(m α(n)) time, where α(n) is the inverse Ackermann function—growing so slowly that α(n) ≤ 4 for all practical inputs.

Analysis of Union by Rank with Path Compression

To rigorously understand the efficiency of disjoint-set forests, we apply amortized analysis using the potential method. This considers how expensive operations are on average, over time.

Key Concepts:

Rank Invariant: Ranks strictly increase on any path toward the root.
Ackermann Function (Aₖ(j)): Grows extremely fast.
Inverse Ackermann Function (α(n)): Grows extremely slowly and is used to bound the running time.

Amortized time per operation:

Operation	Amortized Time
MAKE-SET	O(1)
FIND-SET	O(α(n))
UNION	O(α(n))

This makes disjoint-set forests with union by rank and path compression one of the most efficient data structures for dynamic connectivity.

Applications

Disjoint-set data structures are widely used in:

Kruskal’s Algorithm for computing Minimum Spanning Trees
Tarjan’s Offline Least Common Ancestor (LCA) queries in rooted trees
Connected components detection in graphs
Dynamic connectivity and clustering
Image segmentation and object tracking in computer vision
Network and circuit analysis (e.g., testing bridge-failure connectivity)

APA References

Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2022). Introduction to Algorithms (4th ed.). The MIT Press.

Tarjan, R. E. (1975). Efficiency of a good but not linear set union algorithm. Journal of the ACM (JACM), 22(2), 215–225. https://doi.org/10.1145/321879.321884

Sedgewick, R., & Wayne, K. (2011). Algorithms (4th ed.). Addison-Wesley.

Skiena, S. S. (2020). The Algorithm Design Manual (3rd ed.). Springer.

GeeksforGeeks. (n.d.). Disjoint Set (Union-Find) | Set 1 (Detect Cycle in an Undirected Graph). https://www.geeksforgeeks.org/disjoint-set-data-structures/

USFCA Visualization Tool. (n.d.). Disjoint Set Visualizer. Retrieved from https://www.cs.usfca.edu/~galles/visualization/DisjointSets.html