Instead of iterating subsets directly, build them incrementally. Process one bit position at a time. For each bit, either keep it or turn it off (if it was on).
Define = sum of where submask matches mask in bits through , and can differ in bits through . Base case: . No bits are flexible yet, so the only "submask" is mask itself. Final answer: gives the sum over all submasks.