QLoRA introduced paged optimizers, which use NVIDIA unified memory to let optimizer states spill automatically from GPU memory to CPU RAM when the GPU fills up. This prevents out-of-memory crashes during transient memory spikes in training, such as a mini-batch with unusually long sequences. You may see a slowdown while pages transfer between devices, but training continues instead of crashing.
Enable paged optimizers when GPU memory is tight; once you select a paged optimizer, the bitsandbytes library handles the paging automatically, with no changes to the training loop.
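A minimal sketch of what "enabling" this looks like in practice, assuming a CUDA machine with PyTorch and a recent bitsandbytes installed (the `Linear` model and hyperparameters here are placeholders, not from the original text):

```python
# Sketch: swapping a standard optimizer for a paged one from bitsandbytes.
# Assumes CUDA is available and bitsandbytes >= 0.39 is installed.
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(4096, 4096).cuda()  # stand-in for a real model

# PagedAdamW8bit allocates optimizer states in unified memory, so under
# GPU memory pressure they can be evicted to CPU RAM instead of raising OOM.
optimizer = bnb.optim.PagedAdamW8bit(model.parameters(), lr=1e-4)

# The training loop itself is unchanged; paging happens transparently.
loss = model(torch.randn(8, 4096, device="cuda")).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

If you train through Hugging Face's `Trainer`, the same effect is available by setting `optim="paged_adamw_8bit"` (or `"paged_adamw_32bit"`) in `TrainingArguments` rather than constructing the optimizer yourself.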