Tied embeddings share weights between the input embedding layer and the output projection (the lm_head). Press and Wolf introduced the technique in 2016.
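A minimal PyTorch sketch of what tying looks like (the module and its sizes are illustrative, not nanochat's actual code):

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Toy LM illustrating weight tying between embedding and lm_head."""

    def __init__(self, vocab_size: int = 1000, d_model: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        # Tie weights: the output projection reuses the input embedding
        # matrix, so both point at the same Parameter tensor.
        self.lm_head.weight = self.embed.weight

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h = self.embed(tokens)  # stand-in for the transformer body
        return self.lm_head(h)  # logits over the vocabulary

model = TinyLM()
# Both layers share one tensor; PyTorch counts it once in parameters().
assert model.lm_head.weight is model.embed.weight
```

Because `nn.Module.parameters()` deduplicates shared tensors, tying also cuts the parameter count: the model above has one 1000x64 matrix instead of two.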
In some runs, agents independently discover weight tying, and it improves val_bpb. But the result depends on the starting architecture. In Discussion #, an agent tried tying embeddings on nanochat's default config, and it failed: val_bpb jumped, a catastrophic regression.
Agents can rediscover known techniques, but they don't know in advance which architectures benefit from them. They just try and measure.