Fine-tune massive models on consumer hardware with LoRA and QLoRA techniques
Instead of updating all 7B+ parameters, LoRA freezes the original weights and trains tiny low-rank adapter matrices on top of them. The adapters are small (~10-50 MB) and can be swapped in and out per task.
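A minimal sketch of the idea, assuming PyTorch (the class name and defaults below are illustrative, not from any particular library): the frozen weight W is left untouched, and two small trainable matrices add a low-rank update, so the layer computes h = Wx + (alpha/rank)·BAx.

```python
# Minimal LoRA linear layer sketch in PyTorch.
# Class name, rank, and alpha are illustrative defaults, not prescriptive.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the original weights
        # Low-rank adapters: A projects down to `rank`, B projects back up.
        self.lora_A = nn.Linear(base.in_features, rank, bias=False)
        self.lora_B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # B starts at zero, so the update is a no-op at init
        self.scaling = alpha / rank

    def forward(self, x):
        # h = Wx + (alpha/rank) * B(A(x))
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))

layer = LoRALinear(nn.Linear(4096, 4096))
x = torch.randn(2, 4096)
h = layer(x)  # identical to the frozen base output at init, since B is zero
```

At rank 8, a 4096×4096 projection gains only 2 × 4096 × 8 ≈ 65k adapter parameters (~130 KB in fp16), which is how a whole model's adapters stay in the tens-of-megabytes range.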
Approximate training VRAM by method; the full fine-tuning figures work out to roughly 8 bytes per parameter once gradients and optimizer state are counted:

| Method | VRAM (7B model) | VRAM (70B model) |
|---|---|---|
| Full fine-tuning | 56 GB | 560 GB |
| LoRA (rank=8) | 16 GB | 160 GB |
| QLoRA (4-bit) | 6 GB | 48 GB |
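To make the QLoRA row concrete, here is a minimal setup sketch using the Hugging Face transformers, peft, and bitsandbytes stack; the model name, rank, and target modules are illustrative assumptions, not fixed recommendations.

```python
# Minimal QLoRA setup sketch (assumes transformers, peft, bitsandbytes, accelerate installed).
# Model name, rank, and target modules are illustrative.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the frozen base model in 4-bit NF4 (the QLoRA quantization scheme)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # illustrative 7B model
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach rank-8 LoRA adapters to the attention projections
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```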
How much training data you need depends on the goal; a rough guide by example count:

| Dataset size (examples) | Quality | Suitable for |
|---|---|---|
| 100-500 | Low | Experiments |
| 1,000-5,000 | Medium | Task adaptation |
| 5,000-20,000 | High | Production |
Calculate QLoRA training cost:

```python
gpu_hours = 12
gpu_cost_per_hour = 3.50  # A100 on RunPod
total = gpu_hours * gpu_cost_per_hour
print(f'QLoRA training: ${total:.2f}')
print(f'Compare to full FT: ${total * 100:.0f}')
```
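With these illustrative numbers, that's $42.00 for the QLoRA run versus roughly $4,200 for full fine-tuning, using the script's rough 100× cost multiplier; actual GPU pricing and training hours vary by provider and dataset size.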