Unsloth optimizes LLM fine-tuning for speed: it rewrites attention and training kernels to train faster while using less GPU memory. It supports Llama, Mistral, Phi, Gemma, and Qwen models.
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
)
model = FastLanguageModel.get_peft_model(model, r=16)
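The `r=16` argument above is the LoRA rank: instead of updating a full weight matrix, training learns two small low-rank factors per adapted projection. A rough sketch of why this is cheap, assuming Llama-3-8B's hidden size of 4096 (the per-matrix count is illustrative, not Unsloth's exact trainable-parameter total):

```python
def lora_params(d_out: int, d_in: int, r: int) -> int:
    # LoRA adds two low-rank factors per adapted matrix:
    # B with shape (d_out, r) and A with shape (r, d_in).
    return d_out * r + r * d_in

full = 4096 * 4096                   # one frozen 4096x4096 projection
added = lora_params(4096, 4096, 16)  # trainable adapter parameters
print(added, added / full)           # the adapter is under 1% of the matrix
```

At rank 16 the adapter for a single square 4096×4096 projection adds 131,072 trainable parameters versus ~16.8M frozen ones, which is why LoRA fits on consumer GPUs.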
The API mirrors Hugging Face Transformers, so existing training scripts carry over with minimal changes. If you're fine-tuning on a consumer GPU, Unsloth is almost always worth using.