prepare.py is the file you never touch. It holds the infrastructure that keeps your experiments fair and comparable.
Inside, you'll find global constants like MAX_SEQ_LEN = 2048 and TIME_BUDGET = 300 (in seconds, the 5-minute cap per experiment), the evaluate_bpb function that computes your validation metric, BPE tokenizer training with a default vocabulary size, dataset downloading and tokenization, and the dataloaders.
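To make the metric concrete, here is a minimal sketch of a bits-per-byte calculation, assuming "bpb" stands for bits per byte and that the loss is a summed cross-entropy in nats. The function name bits_per_byte and its arguments are hypothetical; the real evaluate_bpb in prepare.py may be structured differently.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte.

    Hypothetical helper for illustration; not the evaluate_bpb from prepare.py.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by the byte count
    # normalizes by the size of the raw text rather than the token count.
    return total_nll_nats / (math.log(2) * total_bytes)
```

Normalizing by bytes instead of tokens is what would keep such a score comparable across tokenizers, since the number of bytes in the validation text does not depend on the vocabulary.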
If your agent could edit prepare.py, it could game the metric or change the time budget. Locking this file keeps your playing field level.
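One way to enforce the lock is to record a checksum of prepare.py and verify it before each run. The sketch below is an assumption about how you might do this yourself, not something the project is known to ship; file_sha256 and assert_unchanged are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def assert_unchanged(path, expected_digest):
    """Raise RuntimeError if the file no longer matches the recorded digest."""
    actual = file_sha256(path)
    if actual != expected_digest:
        raise RuntimeError(
            f"{path} was modified: expected {expected_digest}, got {actual}"
        )
```

You would call file_sha256("prepare.py") once, store the result somewhere the agent cannot write, and call assert_unchanged("prepare.py", stored_digest) at the start of every experiment.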