For each node, you need to merge frequency maps from all its children. If you merge naively, you recompute everything from scratch, giving time. Small-to-Large merging fixes this. For each node, keep the largest child's frequency map.
Then merge all smaller children into it, one color at a time. Each color gets moved at most times (it only moves when its subtree is the smaller one). Total: .
Space complexity is for the data structures used.