QLoRA
A fine-tuning method combining quantization with LoRA. The frozen base model is loaded in 4-bit precision (the NF4 data type) to cut memory use, while the small LoRA adapters train in higher precision. QLoRA enables fine-tuning very large models, on the order of tens of billions of parameters, on a single GPU with minimal quality loss.
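The idea can be sketched numerically: quantize the frozen base weights to 4-bit, keep the low-rank adapters in float, and dequantize the base on the fly during the forward pass. This toy NumPy sketch uses simple absmax quantization rather than QLoRA's actual NF4 data type, and all names here are illustrative, not a real library API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen base weight, quantized to a signed 4-bit range (-7..7).
# Real QLoRA uses the NF4 data type; absmax is a simplification.
W = rng.standard_normal((8, 8)).astype(np.float32)
scale = np.abs(W).max() / 7.0
W_q = np.round(W / scale).astype(np.int8)  # 4 bits of information per weight

# LoRA adapters stay in full precision and are the only trained parameters.
r = 2  # low rank
A = (rng.standard_normal((r, 8)) * 0.01).astype(np.float32)
B = np.zeros((8, r), dtype=np.float32)  # zero-init so training starts at the base model

def forward(x):
    # Dequantize the base on the fly, then add the low-rank update B @ A @ x.
    return (W_q.astype(np.float32) * scale) @ x + B @ (A @ x)

x = rng.standard_normal(8).astype(np.float32)
y = forward(x)  # shape (8,)
```

Only A and B receive gradients during training; the 4-bit base weights never change, which is where the memory saving comes from (roughly 0.5 bytes per base weight instead of 2 or 4).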