Here's the trick: process nums1 in order, from index 0 to n-1 (n being the array length). When assigning nums1[i], you've already assigned indices 0 to i-1. The mask tells you which elements of nums2 are taken. Since each assigned nums1 element takes exactly one nums2 element, the number of set bits equals how many nums1 elements have been assigned.
So popcount(mask) tells you which nums1 index comes next. This eliminates the need for a second dimension. You don't need a state like dp[i][mask] because i = popcount(mask) is already determined by the mask.
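
To make the popcount idea concrete, here is a minimal sketch in Python. It assumes both arrays have the same length n and that you want to minimize a total pairing cost. The function name `min_assignment_cost` and the `cost` parameter are hypothetical, chosen for illustration only; swap in whatever objective your problem actually uses.

```python
from typing import Callable, List


def min_assignment_cost(
    nums1: List[int],
    nums2: List[int],
    cost: Callable[[int, int], int],
) -> int:
    """Pair each nums1[i] with a distinct nums2[j], minimizing total cost.

    Hypothetical helper: `cost` is an assumed pairwise cost function.
    """
    n = len(nums1)
    INF = float("inf")
    # dp[mask] = best cost after assigning nums1[0..i-1], where
    # i = popcount(mask) and mask marks the nums2 elements already taken.
    dp = [INF] * (1 << n)
    dp[0] = 0

    for mask in range(1 << n):
        if dp[mask] == INF:
            continue
        # The next nums1 index is implied by the mask, so no second dimension.
        i = bin(mask).count("1")
        if i == n:
            continue  # every element is already assigned
        for j in range(n):
            if mask & (1 << j):
                continue  # nums2[j] is already taken
            new_mask = mask | (1 << j)
            candidate = dp[mask] + cost(nums1[i], nums2[j])
            if candidate < dp[new_mask]:
                dp[new_mask] = candidate

    return dp[(1 << n) - 1]


# Example with XOR as the (assumed) pairing cost.
print(min_assignment_cost([1, 2], [2, 3], lambda a, b: a ^ b))
```

The table has 2^n entries instead of n * 2^n, and each entry tries at most n choices for j, so the loop does on the order of n * 2^n work.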