Every experiment produces a git commit, a results.tsv entry, and a run.log file. But reproducing a result means more than having the code. You also need the same GPU, the same driver version, the same CUDA toolkit, and the same random state.
torch.compile behaves differently across GPU architectures. An optimization that works on a datacenter-class Hopper card might not reproduce on a consumer RTX card. CUDA nondeterminism means the same code on the same GPU can produce slightly different val_bpb scores across runs.
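Run-to-run nondeterminism can at least be narrowed by pinning every random state you control. A minimal sketch (the `seed_everything` helper name is mine, not from any particular library; the torch-specific calls are skipped if torch is not installed):

```python
import os
import random


def seed_everything(seed: int = 1337) -> None:
    """Pin every RNG we can reach for this process."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    # NumPy and PyTorch are optional here; seed them if available.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Trade kernel speed for repeatable cuDNN algorithm selection.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass
```

Even with all of this, some CUDA kernels (e.g. certain atomics-based reductions) remain nondeterministic, which is why the val_bpb jitter described above never fully disappears.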
If reproducibility matters to you, pin your software versions and record your GPU model alongside every experiment result.
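Recording the environment can be a few lines of stdlib Python run at the start of each experiment. A sketch under my own naming (`capture_run_env` is a hypothetical helper; the `nvidia-smi` query assumes an NVIDIA driver is installed and degrades to `None` otherwise):

```python
import platform
import subprocess


def capture_run_env() -> dict:
    """Fingerprint the software/hardware so it can be logged next to each result."""
    env = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    # Git commit of the current checkout, if we are inside a repo.
    try:
        env["git_commit"] = subprocess.check_output(
            ["git", "rev-parse", "HEAD"],
            text=True, stderr=subprocess.DEVNULL,
        ).strip()
    except (OSError, subprocess.CalledProcessError):
        env["git_commit"] = None
    # GPU model and driver version via nvidia-smi, if present.
    try:
        env["gpu"] = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=name,driver_version",
             "--format=csv,noheader"],
            text=True, stderr=subprocess.DEVNULL,
        ).strip()
    except (OSError, subprocess.CalledProcessError):
        env["gpu"] = None
    return env
```

Dumping this dict as one more column in results.tsv (or a JSON sidecar next to run.log) costs almost nothing and answers "which GPU and driver produced this number?" months later.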