The Chain Idea

Here is the idea: if you split the tree into chains the right way, any path crosses only $O(\log n)$ of them. Each chain is a run of nodes in which every node is the parent of the next, so a chain can be stored as a contiguous segment of an array. To query a path from $u$ to $v$, you query $O(\log n)$ segments (one per chain the path crosses) and combine the results.
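To make the query concrete, here is a minimal sketch on a hand-built 7-node tree. The array names `parent`, `depth`, `head` (top node of each chain) and `pos` (a node's index in the flattened array) are illustrative assumptions, and their values are written out by hand rather than computed. A plain slice sum stands in for whatever range structure (for example a segment tree) would answer each segment in practice.

```python
# Tree (root 0):  0 -> 1, 2;  1 -> 3, 4;  3 -> 5;  2 -> 6
# Chains, written out by hand: [0, 1, 3, 5], [4], [2, 6]
parent = [-1, 0, 0, 1, 1, 3, 2]   # parent[v], -1 for the root
depth  = [0, 1, 1, 2, 2, 3, 2]    # depth[v], root has depth 0
head   = [0, 0, 2, 0, 4, 0, 2]    # head[v]: top node of v's chain
pos    = [0, 1, 5, 2, 4, 3, 6]    # pos[v]: index of v in the flat array

value = [5, 3, 8, 1, 4, 7, 2]     # one value per node
base = [0] * len(value)
for v, x in enumerate(value):
    base[pos[v]] = x              # each chain occupies a contiguous slice


def path_segments(u, v):
    """Return the (lo, hi) index ranges, inclusive, covering the path u..v."""
    segments = []
    while head[u] != head[v]:
        # Always climb from the chain whose top is deeper.
        if depth[head[u]] < depth[head[v]]:
            u, v = v, u
        segments.append((pos[head[u]], pos[u]))
        u = parent[head[u]]       # jump to the chain above
    lo, hi = sorted((pos[u], pos[v]))
    segments.append((lo, hi))     # u and v now share a chain
    return segments


def path_sum(u, v):
    # A slice sum is a stand-in for a real range query on each segment.
    return sum(sum(base[lo:hi + 1]) for lo, hi in path_segments(u, v))
```

On this tree, `path_segments(5, 6)` yields two ranges, one for the chain `[2, 6]` and one for the part of `[0, 1, 3, 5]` that the path uses. Swapping the slice sum for a segment-tree query changes the cost per segment, not the shape of the loop.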

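The magic is in choosing which edges form chains so that paths cross few chains. If you pick chains arbitrarily, a single path can cross on the order of $n$ chains. The heavy-light edge rule guarantees $O(\log n)$ crossings: every time a path leaves a chain by stepping down a non-chain edge, the subtree it enters is at most half the size of the one it left, and that can happen at most $\log_2 n$ times.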
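The effect of the edge choice can be checked with a small experiment. The sketch below assumes the usual size-based rule, in which each node extends its chain to the child with the largest subtree. The function names, the `choose` parameter, and the caterpillar test tree are all illustrative assumptions. Passing `min` instead of `max` simulates a bad choice of chain edges.

```python
def decompose(children, root=0, choose=max):
    """Split a rooted tree into chains; return (parent, head) arrays.

    children[v] lists the children of v. `choose` picks which child
    continues v's chain, compared by subtree size (max = heavy child).
    """
    n = len(children)
    parent = [-1] * n
    order = []                      # preorder: parents before children
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for c in children[v]:
            parent[c] = v
            stack.append(c)

    size = [1] * n
    for v in reversed(order):       # children are finished before parents
        if parent[v] != -1:
            size[parent[v]] += size[v]

    head = list(range(n))           # by default every node starts a chain
    for v in order:
        if children[v]:
            kept = choose(children[v], key=lambda c: size[c])
            head[kept] = head[v]    # the kept child extends v's chain
    return parent, head


def chains_to_root(v, parent, head):
    """Count the chains touched on the path from v up to the root."""
    count = 0
    while v != -1:
        count += 1
        v = parent[head[v]]         # jump above the top of this chain
    return count


def caterpillar(k):
    """A spine 0..k-1 where spine node i also has a leaf child k+i."""
    children = [[] for _ in range(2 * k)]
    for i in range(k):
        if i + 1 < k:
            children[i].append(i + 1)
        children[i].append(k + i)
    return children
```

With `tree = caterpillar(100)`, the heavy rule (`choose=max`) keeps the whole spine in one chain, while `choose=min` sends every chain into a leaf, so the walk from the bottom of the spine to the root touches a separate chain at every step.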

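Space complexity is $O(n)$: the decomposition keeps a constant amount of bookkeeping per node, and the range-query structure built over the flattened chains covers $n$ positions in total.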