The community fork autoresearch-win-rtx adds support for consumer NVIDIA GPUs on Windows and Linux. A consumer RTX card has far less VRAM than an H100.
The fork uses a profile-driven approach: it detects your GPU's compute capability, BF16/TF32 support, and VRAM tier, then auto-tunes batch sizes, so you don't set DEPTH or sequence length by hand. A short autotune pass runs at startup and caches its decisions per GPU.
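The profile-and-cache flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the fork's code. `GpuProfile`, `build_profile`, `autotune_batch_size`, `cached_batch_size`, the tier thresholds, and the JSON cache file are all hypothetical. In a real PyTorch setup the inputs would come from `torch.cuda.get_device_capability()` and `torch.cuda.get_device_properties(0).total_memory`, and the `fits` probe would run a training step and catch `torch.cuda.OutOfMemoryError`. Here the probe is simulated so the sketch runs without a GPU.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class GpuProfile:
    name: str
    compute_capability: Tuple[int, int]
    supports_bf16: bool
    supports_tf32: bool
    vram_tier: str


def build_profile(name: str, capability: Tuple[int, int], vram_gb: float) -> GpuProfile:
    """Classify a GPU from its compute capability and total VRAM (hypothetical helper)."""
    major, _minor = capability
    # Native BF16 and TF32 tensor-core math arrived with Ampere (compute capability 8.x).
    ampere_or_newer = major >= 8
    # Hypothetical tier boundaries, for illustration only.
    if vram_gb >= 40:
        tier = "large"
    elif vram_gb >= 16:
        tier = "medium"
    else:
        tier = "small"
    return GpuProfile(name, capability, ampere_or_newer, ampere_or_newer, tier)


def autotune_batch_size(fits: Callable[[int], bool], candidates: List[int]) -> int:
    """Return the largest candidate batch size for which the probe succeeds."""
    for bs in sorted(candidates, reverse=True):
        if fits(bs):
            return bs
    raise RuntimeError("no candidate batch size fits in VRAM")


def cached_batch_size(profile: GpuProfile, fits: Callable[[int], bool],
                      candidates: List[int], cache_path: str) -> int:
    """Return the tuned batch size for this GPU, running autotune only on a cache miss."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    if profile.name not in cache:
        cache[profile.name] = autotune_batch_size(fits, candidates)
        with open(cache_path, "w") as f:
            json.dump(cache, f)
    return cache[profile.name]


if __name__ == "__main__":
    # Simulated probe: pretend anything above batch size 16 runs out of memory.
    probe = lambda bs: bs <= 16
    profile = build_profile("Example RTX GPU", (8, 9), vram_gb=24)
    path = os.path.join(tempfile.mkdtemp(), "gpu_cache.json")
    print(profile.vram_tier, cached_batch_size(profile, probe, [8, 16, 32, 64], path))  # medium 16
```

Keying the cache by GPU name means the autotune pass only costs startup time the first time a given card is seen.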
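Add a VRAM constraint to your program.md, for example: "N GB VRAM. Revert any change that causes an out-of-memory error.", with N set to your card's VRAM. Each experiment still runs for the same fixed wall-clock budget, but a consumer card completes fewer gradient steps in that time than an H100.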
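A minimal sketch of the revert rule, and of why a fixed time budget yields fewer steps on a slower card. In practice the agent applies the rule itself by editing and reverting code, so this only models the logic. `Config`, `try_change`, `steps_in_budget`, the memory model, and the `OutOfMemory` stand-in (for PyTorch's `torch.cuda.OutOfMemoryError`) are all hypothetical. The 300-second budget and step times are illustrative numbers, not measurements.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


class OutOfMemory(Exception):
    """Stand-in for torch.cuda.OutOfMemoryError so this sketch runs without a GPU."""


@dataclass(frozen=True)
class Config:
    depth: int
    batch_size: int


def steps_in_budget(budget_s: float, step_time_s: float) -> int:
    """Gradient steps that fit in a fixed wall-clock budget."""
    return int(budget_s // step_time_s)


def try_change(current: Config, proposed: Config,
               run: Callable[[Config], float]) -> Tuple[Config, Optional[float]]:
    """Run the proposed config; if it runs out of memory, revert to the current one."""
    try:
        score = run(proposed)
    except OutOfMemory:
        return current, None
    return proposed, score


if __name__ == "__main__":
    # Hypothetical memory model: depth * batch_size above 256 "does not fit".
    def fake_run(cfg: Config) -> float:
        if cfg.depth * cfg.batch_size > 256:
            raise OutOfMemory()
        return 1.0 / cfg.depth  # placeholder score, not a real metric

    base = Config(depth=8, batch_size=16)
    kept, score = try_change(base, Config(depth=12, batch_size=32), fake_run)
    print(kept == base, score)  # True None
    # Same budget, half the throughput: half as many steps.
    print(steps_in_budget(300, 0.25), steps_in_budget(300, 0.5))  # 1200 600
```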