Overview
Disjoint-set data structures, also known as union-find structures, manage a dynamic collection of non-overlapping sets. They are crucial in problems that involve connectivity and group membership, such as network clustering, image processing, and graph algorithms.
The core operations supported are:
MAKE-SET(x)
: Initializes a new set containing only the elementx
.FIND-SET(x)
: Returns the representative of the set containingx
. This helps identify which subset an element belongs to.UNION(x, y)
: Merges the two disjoint sets that contain elementsx
andy
.
These operations are designed to be fast—even across large sequences of updates—especially when combined with key optimizations.
Applications and Basic Operations
A canonical use case of disjoint-set structures is identifying connected components in an undirected graph. The algorithm works by initially placing each vertex into its own singleton set and then performing a UNION
for each edge to merge the connected vertices. After processing all edges, any two vertices are in the same connected component if and only if they belong to the same set.
CONNECTED-COMPONENTS(G):
for each vertex v in G.V:
MAKE-SET(v)
for each edge (u, v) in G.E:
if FIND-SET(u) ≠ FIND-SET(v):
UNION(u, v)
This algorithm is especially beneficial in dynamic graphs, where edges are added incrementally over time, making a full traversal (like DFS or BFS) costly for every update.
Linked-List Representation
The simplest representation of disjoint sets is via linked lists, where each set is a list and its head serves as the representative. This design yields:
MAKE-SET(x)
: O(1) — simply create a new node.FIND-SET(x)
: O(1) — follow a pointer back to the set object.
However, UNION(x, y)
can be expensive. To merge two sets, we must update all elements in the shorter list to now point to the new representative, taking O(n) time in the worst case.
Weighted-Union Heuristic
To reduce cost, a weighted-union strategy is used: always append the smaller list to the larger one. This ensures any single element’s pointer is updated at most O(log n) times.
Theorem 19.1
A sequence ofm
operations (withn
MAKE-SET calls) takes O(m + n log n) time using the linked-list structure with weighted-union.
Disjoint-Set Forests
A more sophisticated approach uses rooted trees to represent sets, forming what’s known as a disjoint-set forest. Here:
- Each node points to its parent.
- The root is its own parent and serves as the set representative.
MAKE-SET
creates a new root.FIND-SET
walks up the tree to find the root.UNION
attaches one tree to another.
Two Key Heuristics
- Union by Rank
Each node keeps arank
, which approximates tree height. When merging two trees, the root of the lower-rank tree becomes a child of the higher-rank tree. If ranks are equal, one becomes the root and its rank increases. - Path Compression
DuringFIND-SET
, every node visited is directly connected to the root. This drastically flattens the structure, making subsequent operations nearly constant time.
Theorem 19.14
A sequence ofm
disjoint-set operations takes O(m α(n)) time, where α(n) is the inverse Ackermann function—growing so slowly that α(n) ≤ 4 for all practical inputs.
Analysis of Union by Rank with Path Compression
To rigorously understand the efficiency of disjoint-set forests, we apply amortized analysis using the potential method. This considers how expensive operations are on average, over time.
Key Concepts:
- Rank Invariant: Ranks strictly increase on any path toward the root.
- Ackermann Function (Aₖ(j)): Grows extremely fast.
- Inverse Ackermann Function (α(n)): Grows extremely slowly and is used to bound the running time.
Amortized time per operation:
Operation | Amortized Time |
---|---|
MAKE-SET | O(1) |
FIND-SET | O(α(n)) |
UNION | O(α(n)) |
This makes disjoint-set forests with union by rank and path compression one of the most efficient data structures for dynamic connectivity.
Applications
Disjoint-set data structures are widely used in:
- Kruskal’s Algorithm for computing Minimum Spanning Trees
- Tarjan’s Offline Least Common Ancestor (LCA) queries in rooted trees
- Connected components detection in graphs
- Dynamic connectivity and clustering
- Image segmentation and object tracking in computer vision
- Network and circuit analysis (e.g., testing bridge-failure connectivity)
APA References
Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2022). Introduction to Algorithms (4th ed.). The MIT Press.
Tarjan, R. E. (1975). Efficiency of a good but not linear set union algorithm. Journal of the ACM (JACM), 22(2), 215–225. https://doi.org/10.1145/321879.321884
Sedgewick, R., & Wayne, K. (2011). Algorithms (4th ed.). Addison-Wesley.
Skiena, S. S. (2020). The Algorithm Design Manual (3rd ed.). Springer.
GeeksforGeeks. (n.d.). Disjoint Set (Union-Find) | Set 1 (Detect Cycle in an Undirected Graph). https://www.geeksforgeeks.org/disjoint-set-data-structures/
USFCA Visualization Tool. (n.d.). Disjoint Set Visualizer. Retrieved from https://www.cs.usfca.edu/~galles/visualization/DisjointSets.html