The autoresearch-mlx fork runs natively on Apple Silicon through MLX, Apple's array framework. No PyTorch. No CUDA.
Setup follows the same pattern: install uv, run `uv sync`, then `uv run prepare.py` to build the tokenizer and data shards. Smoke-test with `uv run train.py`. Point your agent at `program.md` and let it loop.
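The setup steps above can be collected into a short shell session. This is a sketch: the `prepare.py` and `train.py` entry points come from the text, while the uv installer one-liner is the standard one from uv's own documentation.

```shell
# Install uv (official standalone installer from astral.sh)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Sync the project's dependencies into a local virtual environment
uv sync

# Build the tokenizer and data shards
uv run prepare.py

# Smoke-test that a training run starts
uv run train.py
```

Because MLX only runs on Apple Silicon, this sequence assumes a macOS arm64 machine; `uv sync` will fail to resolve the MLX dependency elsewhere.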
On an M Max, one community run drove val_bpb from down to overnight. Smaller hardware rewards different winning strategies than H runs do: width scaling matters less, and attention patterns matter more. The fixed -minute budget still applies.