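Each node stores a set or map. In the worst case, if merged child sets are kept around, the sets can hold O(n log n) elements in total at any point in the DFS, because each element is inserted into a larger set at most O(log n) times.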
However, you can free child sets after merging if you do not need them later. Each element is then held by at most one live set at a time, which keeps memory usage close to linear in the number of nodes.
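In practice, small-to-large uses less memory than you would expect because sets are reused, not copied. The larger set is handed to the parent by moving a pointer or reference, and only the elements of the smaller set are inserted into it, so no data is duplicated.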
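
The memory behaviour described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the task (counting distinct colours in each subtree), the function name `count_distinct_in_subtrees`, and the `children` / `color` inputs are all assumptions chosen for the example. It uses an iterative traversal instead of a recursive DFS to avoid Python's recursion limit.

```python
def count_distinct_in_subtrees(children, color, root=0):
    """Return, for every node, the number of distinct colours in its subtree.

    children[v] is the list of v's children; color[v] is v's colour.
    (Hypothetical inputs chosen for this sketch.)
    """
    n = len(children)
    sets = [None] * n   # sets[v] stays alive only until v is merged into its parent
    answer = [0] * n

    # Iterative preorder; walking it in reverse visits children before parents.
    order = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])

    for v in reversed(order):
        big = {color[v]}
        for c in children[v]:
            small = sets[c]
            sets[c] = None                # release the child's slot so its set can be freed
            if len(small) > len(big):
                big, small = small, big   # swap references; no elements are copied
            big.update(small)             # only the smaller set's elements are inserted
        sets[v] = big
        answer[v] = len(big)
    return answer
```

Because each child slot is cleared as soon as the child is merged, every colour entry belongs to at most one live set, and the swap keeps the larger set in place instead of rebuilding it.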