Introduction
Union-Find (also called Disjoint Set Union or DSU) helps you track elements partitioned into non-overlapping sets. You can perform operations: find tells you which set an element belongs to, and union merges sets into one.
You'll use Union-Find in:
Kruskal's MST: Check if adding an edge creates a cycle
Cycle detection: Determine if a graph has cycles
Connected components: Group nodes that can reach each other
The key insight is representing sets as trees. Each element points to a parent, and the root identifies the set.
Naive Approach
Start with a parent array where parent[i] = i means each element is its own set.
function find(x):
while parent[x] != x:
x = parent[x]
return x
function union(x, y):
rootX = find(x)
rootY = find(y)
parent[rootX] = rootY
The problem? Trees can become tall chains. If you union , finding the root of takes time. You walk the entire chain.
This makes operations cost time with space. You need optimizations to fix this.
Path Compression
Path compression flattens the tree during find. When you search for a root, you make every node along the path point directly to the root.
function find(x):
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
After calling find($1$) on chain , all nodes point directly to . The next find on any of these nodes takes time.
Path compression alone gives amortized per operation. Combined with union by rank, you get near-constant time.
Union by Rank
Union by rank keeps trees balanced. Track each tree's height (rank) and always attach the shorter tree under the taller one.
function init(n):
for i from 0 to n-1:
parent[i] = i
rank[i] = 0
function union(x, y):
rootX = find(x)
rootY = find(y)
if rootX == rootY:
return
if rank[rootX] < rank[rootY]:
parent[rootX] = rootY
else if rank[rootX] > rank[rootY]:
parent[rootY] = rootX
else:
parent[rootY] = rootX
rank[rootX] = rank[rootX] + 1
Rank only increases when merging equal-height trees. This bounds tree height to .
Complexity Analysis
With both path compression and union by rank:
Time: per operation, where is the inverse Ackermann function. For any practical (up to ), . This is effectively .
Space: for the parent and rank arrays.
Neither optimization alone achieves this. Path compression without rank gives . Rank without compression also gives . Together, they create a structure where trees stay flat and operations stay fast.