You can run DPO and other alignment methods with LoRA: instead of updating the full model weights, you train small adapter matrices on top of a frozen base model. This cuts memory requirements substantially, so DPO with LoRA on a multi-billion-parameter model can fit on a consumer GPU.
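A convenient property of the LoRA setup is that the frozen base model can double as the DPO reference policy (by evaluating with the adapters disabled), so no second model copy is needed. As a minimal sketch of the objective itself, here is the per-pair DPO loss computed from sequence log-probabilities; the function name and the scalar log-prob inputs are illustrative placeholders, not part of any particular library:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed token log-probs of the chosen/rejected responses
    under the policy (base + LoRA adapters) and the frozen reference
    (base with adapters disabled). beta scales the implicit reward.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written in a numerically stable form
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference, the margin is zero and the loss is log 2; as the adapters shift probability mass toward the chosen response, the loss drops toward zero.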
QLoRA also works for alignment: the 4-bit quantized base model stays frozen while you train LoRA adapters on the preference data.
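One way to set this up uses Hugging Face `transformers` and `peft`; the sketch below loads a 4-bit base and attaches trainable adapters. The model name is a placeholder, and the `target_modules` list assumes Llama-style attention projection names, so adjust both for your model:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization for the frozen base model (QLoRA recipe)
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "your-base-model",  # placeholder model id
    quantization_config=bnb,
)
model = prepare_model_for_kbit_training(model)

# Trainable low-rank adapters; only these receive gradients
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Llama-style names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
```

The resulting `model` can then be handed to a preference-optimization trainer; only the adapter weights are updated, while the quantized base supplies both the policy backbone and, with adapters disabled, the reference.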