Fine-tune massive models on consumer hardware with LoRA and QLoRA techniques
Instead of updating all 7B+ parameters, LoRA freezes the original weights and trains tiny low-rank adapter matrices on top of them. The adapters are small (~10-50 MB) and can be swapped in and out per task.
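A minimal sketch of the idea, assuming PyTorch (the class name and defaults below are illustrative, not from any particular library): the frozen weight W is left untouched, and two small trainable matrices add a low-rank update, so the layer computes h = Wx + (alpha/rank)·BAx.

```python
# Minimal LoRA linear layer sketch in PyTorch.
# Class name, rank, and alpha are illustrative defaults, not prescriptive.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the original weights
        # Low-rank adapters: A projects down to `rank`, B projects back up.
        self.lora_A = nn.Linear(base.in_features, rank, bias=False)
        self.lora_B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # B starts at zero, so the update is a no-op at init
        self.scaling = alpha / rank

    def forward(self, x):
        # h = Wx + (alpha/rank) * B(A(x))
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))

layer = LoRALinear(nn.Linear(4096, 4096))
x = torch.randn(2, 4096)
h = layer(x)  # identical to the frozen base output at init, since B is zero
```

At rank 8, a 4096×4096 projection gains only 2 × 4096 × 8 ≈ 65k adapter parameters (~130 KB in fp16), which is how a whole model's adapters stay in the tens-of-megabytes range.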
Approximate training VRAM by method; the full fine-tuning figures work out to roughly 8 bytes per parameter once gradients and optimizer state are counted:

| Method | VRAM (7B model) | VRAM (70B model) |
|---|---|---|
| Full fine-tuning | 56 GB | 560 GB |
| LoRA (rank=8) | 16 GB | 160 GB |
| QLoRA (4-bit) | 6 GB | 48 GB |
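To make the QLoRA row concrete, here is a minimal setup sketch using the Hugging Face transformers, peft, and bitsandbytes stack; the model name, rank, and target modules are illustrative assumptions, not fixed recommendations.

```python
# Minimal QLoRA setup sketch (assumes transformers, peft, bitsandbytes, accelerate installed).
# Model name, rank, and target modules are illustrative.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the frozen base model in 4-bit NF4 (the QLoRA quantization scheme)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # illustrative 7B model
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach rank-8 LoRA adapters to the attention projections
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```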
How much training data you need depends on the goal; a rough guide by example count:

| Dataset size (examples) | Quality | Suitable for |
|---|---|---|
| 100-500 | Low | Experiments |
| 1,000-5,000 | Medium | Task adaptation |
| 5,000-20,000 | High | Production |
Calculate QLoRA training cost:

```python
gpu_hours = 12
gpu_cost_per_hour = 3.50  # A100 on RunPod
total = gpu_hours * gpu_cost_per_hour
print(f'QLoRA training: ${total:.2f}')
print(f'Compare to full FT: ${total * 100:.0f}')
```
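With these illustrative numbers, that's $42.00 for the QLoRA run versus roughly $4,200 for full fine-tuning, using the script's rough 100× cost multiplier; actual GPU pricing and training hours vary by provider and dataset size.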