Small-to-large merging relies on the idea of a heavy child: for each node, the child with the largest subtree (ties broken arbitrarily). This is the same concept used in heavy-light decomposition (HLD), but you do not need the full HLD structure here. No chains, no path decomposition.
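You identify the heavy child of each node and reuse its data structure as the node's own, then merge the smaller structures of the remaining (light) children into it. The rest is a standard DFS. An element is only moved when it crosses an edge to a light child's parent, and each such edge at least doubles the subtree size, so any element is moved O(log n) times in total. The heavy child concept gives you that efficiency without the complexity of full HLD.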
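To make the idea concrete, here is a minimal sketch in Python, assuming a rooted tree given as child lists and a task of counting distinct colors in every subtree. The function name `distinct_colors_per_subtree`, the `children` and `color` inputs, and the choice of task are illustrative assumptions, not something fixed by the technique. The traversal is iterative only to avoid Python's recursion limit; a recursive DFS works the same way.

```python
def distinct_colors_per_subtree(n, children, color, root=0):
    """Return answer[v] = number of distinct colors in the subtree of v.

    Hypothetical example task; `children[v]` lists the children of v
    and `color[v]` is the color of v.
    """
    # Iterative preorder. Reversing it visits every child before its parent.
    order = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])

    # Subtree sizes and the heavy child (largest subtree) of each node.
    size = [1] * n
    heavy = [-1] * n
    for v in reversed(order):
        for c in children[v]:
            size[v] += size[c]
            if heavy[v] == -1 or size[c] > size[heavy[v]]:
                heavy[v] = c

    sets = [None] * n
    answer = [0] * n
    for v in reversed(order):
        if heavy[v] == -1:
            bag = set()                # leaf: start from an empty set
        else:
            bag = sets[heavy[v]]       # reuse the heavy child's set, no copy
            sets[heavy[v]] = None
        bag.add(color[v])
        for c in children[v]:
            if c != heavy[v]:
                bag.update(sets[c])    # merge each light child into it
                sets[c] = None         # the light child's set is no longer needed
        sets[v] = bag
        answer[v] = len(bag)
    return answer


if __name__ == "__main__":
    # Tree: 0 -> {1, 2}, 1 -> {3, 4}
    children = [[1, 2], [3, 4], [], [], []]
    color = [1, 2, 1, 2, 3]
    print(distinct_colors_per_subtree(5, children, color))
```

Note that a child's set is handed over to its parent and then dropped, so this sketch only suits problems where each answer can be recorded at the moment its node is processed.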