The baseline train.py already uses RMSNorm for all of its normalization layers. But when agents start from architectures that use LayerNorm instead, they independently discover that swapping to RMSNorm lowers val_bpb.
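The swap itself is small. A minimal NumPy sketch of the two normalizations (function names and shapes are illustrative, not taken from train.py) shows the only structural difference: RMSNorm drops the mean-centering and the shift term, rescaling by the root-mean-square alone.

```python
import numpy as np

def layernorm(x, gamma, beta, eps=1e-5):
    # LayerNorm: center by the mean, divide by the std, then scale and shift
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def rmsnorm(x, gamma, eps=1e-5):
    # RMSNorm: no mean subtraction, no bias; rescale by the RMS only
    rms = np.sqrt((x * x).mean(axis=-1, keepdims=True) + eps)
    return gamma * x / rms
```

On an input whose last-axis mean is already zero, the two coincide (with the LayerNorm shift set to zero), which is exactly why the swap is a drop-in change rather than a redesign.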
RMSNorm was formalized by Zhang and Sennrich (2019). It took human researchers years of theoretical work to propose and validate it. Your agent finds the same result in a single experiment by trying the swap and measuring the outcome.
The agent has no understanding of why RMSNorm works better. It doesn't know the theory. It discovers the technique purely through empirical testing. That's what searching program space gives you.