Union-Find (DSU) - Algorithm Tutorial | Repovive

#####   ######  #####    ###    #   #  ###  #   #  ######
##  ##  ##      ##  ##  ## ##   #   #   #   #   #  ##
#####   ####    #####   #   #   #   #   #   #   #  ####
##  #   ##      ##      ## ##    # #    #    # #   ##
##   #  ######  ##       ###      #    ###    #    ######

$ curl repovive.com/algorithms/union-find

Introduction

Union-Find (also called Disjoint Set Union or DSU) tracks elements partitioned into non-overlapping sets. It supports two operations: find returns the representative (root) of the set an element belongs to, and union merges two sets into one.

You'll use Union-Find in:

$1.$ Kruskal's MST: Check if adding an edge creates a cycle

$2.$ Cycle detection: Determine if a graph has cycles

$3.$ Connected components: Group nodes that can reach each other

The key insight is representing sets as trees. Each element points to a parent, and the root identifies the set.
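As a small illustration of the tree representation, the sketch below encodes two sets in a parent array and groups elements by their root. The array contents and the helper names `root_of` and `groups` are made up for this example.

```python
from collections import defaultdict

# Hypothetical example: elements 0..4 split into the sets {0, 1, 2} and {3, 4}.
# parent[i] is the parent of i; a root is its own parent.
parent = [0, 0, 1, 3, 3]


def root_of(x):
    """Follow parent pointers until reaching a root."""
    while parent[x] != x:
        x = parent[x]
    return x


def groups():
    """Group every element under the root that identifies its set."""
    by_root = defaultdict(list)
    for x in range(len(parent)):
        by_root[root_of(x)].append(x)
    return dict(by_root)


print(groups())  # {0: [0, 1, 2], 3: [3, 4]}
```

Two elements are in the same set exactly when `root_of` returns the same value for both.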

Naive Approach

Start with a parent array where parent[i] = i means each element is its own set.

text

function find(x):
    while parent[x] != x:
        x = parent[x]
    return x

function union(x, y):
    rootX = find(x)
    rootY = find(y)
    parent[rootX] = rootY

The problem? Trees can become tall chains. If a sequence of unions builds the chain $1 \to 2 \to 3 \to \dots \to n$, finding the root of $1$ takes $O(n)$ time because you walk the entire chain.

In the worst case, $m$ operations cost $O(m \cdot n)$ time with $O(n)$ space. You need optimizations to fix this.
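To make the worst case concrete, here is a minimal Python sketch of the naive version that builds such a chain and counts the steps taken by find. It uses 0-indexed elements, and the step counter and the helper name `make_sets` are additions for illustration only.

```python
def make_sets(n):
    """Each element starts as its own root."""
    return list(range(n))


def find(parent, x):
    """Naive find: walk parent pointers to the root, counting steps."""
    steps = 0
    while parent[x] != x:
        x = parent[x]
        steps += 1
    return x, steps


def union(parent, x, y):
    """Naive union: attach the root of x under the root of y."""
    root_x, _ = find(parent, x)
    root_y, _ = find(parent, y)
    parent[root_x] = root_y


n = 1000
parent = make_sets(n)
# Build the chain 0 -> 1 -> ... -> n-1: each union attaches the current
# root i under the fresh single-element set i + 1.
for i in range(n - 1):
    union(parent, i, i + 1)

root, steps = find(parent, 0)
print(root, steps)  # 999 999
```

A single find from the bottom of the chain walks all $n - 1$ links, which is the $O(n)$ cost described above.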

Path Compression

Path compression flattens the tree during find. When you search for a root, you make every node along the path point directly to the root.

text

function find(x):
    if parent[x] != x:
        parent[x] = find(parent[x])
    return parent[x]

After calling find($1$) on the chain $1 \to 2 \to 3 \to 4$, nodes $1$, $2$, and $3$ all point directly to the root $4$. The next find on any of these nodes takes $O(1)$ time.

Path compression alone gives amortized $O(\log n)$ per operation. Combined with union by rank, you get near-constant time.
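A possible Python version of find with path compression is sketched below. It uses two loops instead of recursion, which is a substitution made here to avoid Python's recursion limit on long chains; the effect on the parent array is the same as the recursive pseudocode. The example array is 0-indexed and hypothetical.

```python
def find(parent, x):
    """Return the root of x and point every node on the path at it."""
    # First pass: locate the root.
    root = x
    while parent[root] != root:
        root = parent[root]
    # Second pass: re-point each node on the path directly at the root.
    while parent[x] != root:
        next_node = parent[x]
        parent[x] = root
        x = next_node
    return root


# Chain 0 -> 1 -> 2 -> 3, where 3 is the root.
chain = [1, 2, 3, 3]
root = find(chain, 0)
print(root, chain)  # 3 [3, 3, 3, 3]
```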

Union by Rank

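Union by rank keeps trees balanced. Track each tree's height (rank) at its root and always attach the shorter tree under the taller one. Once path compression is added, the rank is an upper bound on the height rather than the exact height.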

text

function init(n):
    for i from 0 to n-1:
        parent[i] = i
        rank[i] = 0

function union(x, y):
    rootX = find(x)
    rootY = find(y)
    if rootX == rootY:
        return
    if rank[rootX] < rank[rootY]:
        parent[rootX] = rootY
    else if rank[rootX] > rank[rootY]:
        parent[rootY] = rootX
    else:
        parent[rootY] = rootX
        rank[rootX] = rank[rootX] + 1

Rank increases only when two trees of equal rank merge, so a root of rank $r$ has at least $2^r$ nodes in its tree. This bounds tree height to $O(\log n)$.
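The height bound can be checked with a short Python sketch. The class name `RankedDSU` and the `depth` helper are hypothetical, and find deliberately omits path compression so that rank equals tree height. The merge order below pairs up equal-sized trees, which is the pattern that makes ranks grow as fast as possible.

```python
import math


class RankedDSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Plain find with no compression, so rank is the exact height.
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        root_x, root_y = self.find(x), self.find(y)
        if root_x == root_y:
            return False
        # Make root_x the root with the larger (or equal) rank.
        if self.rank[root_x] < self.rank[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x
        if self.rank[root_x] == self.rank[root_y]:
            self.rank[root_x] += 1
        return True

    def depth(self, x):
        """Number of parent links between x and its root."""
        d = 0
        while self.parent[x] != x:
            x = self.parent[x]
            d += 1
        return d


n = 1024
dsu = RankedDSU(n)
step = 1
while step < n:
    # Merge neighbouring blocks of equal size: 1+1, 2+2, 4+4, ...
    for i in range(0, n, 2 * step):
        dsu.union(i, i + step)
    step *= 2

max_depth = max(dsu.depth(x) for x in range(n))
print(max_depth, int(math.log2(n)))  # 10 10
```

Even with this adversarial merge order, no node ends up more than $\log_2 n$ links from its root.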

Complexity Analysis

With both path compression and union by rank:

Time: $O(\alpha(n))$ amortized per operation, where $\alpha$ is the inverse Ackermann function. For any practical $n$ (up to $10^{80}$), $\alpha(n) \leq 4$. This is effectively $O(1)$.

Space: $O(n)$ for the parent and rank arrays.

Neither optimization alone achieves this. Path compression without union by rank gives amortized $O(\log n)$ per operation. Union by rank without compression gives worst-case $O(\log n)$ per operation. Together, they keep trees nearly flat and operations fast.
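Putting both optimizations together, a compact Python sketch might look like the following. The class name `DSU`, the `components` counter, and the `has_cycle` helper are assumptions added for illustration; the helper applies the structure to cycle detection in an undirected graph, one of the uses listed in the introduction.

```python
class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.components = n  # number of disjoint sets remaining

    def find(self, x):
        # Locate the root, then compress the path (iterative form).
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, x, y):
        """Merge the sets of x and y. Return False if already merged."""
        root_x, root_y = self.find(x), self.find(y)
        if root_x == root_y:
            return False
        # Union by rank: attach the lower-rank root under the higher one.
        if self.rank[root_x] < self.rank[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x
        if self.rank[root_x] == self.rank[root_y]:
            self.rank[root_x] += 1
        self.components -= 1
        return True


def has_cycle(n, edges):
    """An undirected edge inside an existing set closes a cycle."""
    dsu = DSU(n)
    return any(not dsu.union(u, v) for u, v in edges)


forest = [(0, 1), (1, 2), (3, 4)]
dsu = DSU(5)
for u, v in forest:
    dsu.union(u, v)

print(dsu.components)                  # 2
print(has_cycle(5, forest))            # False
print(has_cycle(5, forest + [(2, 0)])) # True
```

Returning a boolean from `union` is a design choice made here so that callers such as Kruskal's algorithm can tell whether an edge joined two different components.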
