---
name: fine-tuning-customization
description: LLM fine-tuning with LoRA, QLoRA, DPO alignment, and synthetic data generation. Efficient training, preference learning, data creation. Use when customizing models for specific domains.
version: 1.0.0
tags: [fine-tuning, lora, qlora, dpo, synthetic-data, rlhf, 2026]
context: fork
agent: llm-integrator
author: OrchestKit
user-invocable: false
---

# Fine-Tuning & Customization

Customize LLMs for specific domains using parameter-efficient fine-tuning and alignment techniques.

> **Unsloth 2026**: 7x longer context RL, FP8 RL on consumer GPUs, rsLoRA support. **TRL**: OpenEnv integration, vLLM server mode, transformers 5.0.0+ compatible.

## Decision Framework: Fine-Tune or Not?

| Approach | Try First | When It Works |
|----------|-----------|---------------|
| Prompt Engineering | Always | Simple tasks, clear instructions |
| RAG | External knowledge needed | Knowledge-intensive tasks |
| Fine-Tuning | Last resort | Deep specialization, format control |

**Fine-tune ONLY when:**

1. Prompt engineering tried and insufficient
2. RAG doesn't capture domain nuances
3. Specific output format consistently required
4. Persona/style must be deeply embedded
5. You have ~1000+ high-quality examples

## LoRA vs QLoRA (Unsloth 2026)

| Criteria | LoRA | QLoRA |
|----------|------|-------|
| Model fits in VRAM | Use LoRA | |
| Memory constrained | | Use QLoRA |
| Training speed | 39% faster | |
| Memory savings | | 75%+ (dynamic 4-bit quants) |
| Quality | Baseline | ~Same (Unsloth recovered accuracy loss) |
| 70B LLaMA | | <48GB VRAM with QLoRA |

## Quick Reference: LoRA Training

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer

# Load with 4-bit quantization (QLoRA)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,              # Rank (16-64 typical)
    lora_alpha=32,     # Scaling (2x r)
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # Attention
        "gate_proj", "up_proj", "down_proj",     # MLP (QLoRA paper)
    ],
)

# Train
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    max_seq_length=2048,
)
trainer.train()
```

## DPO Alignment

```python
from trl import DPOTrainer, DPOConfig

config = DPOConfig(
    learning_rate=5e-6,  # Lower for alignment
    beta=0.1,            # KL penalty coefficient
    per_device_train_batch_size=4,
    num_train_epochs=1,
)

# Preference dataset: {prompt, chosen, rejected}
trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,  # Frozen reference
    args=config,
    train_dataset=preference_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```
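The `preference_dataset` above holds one record per comparison: a prompt plus a preferred (`chosen`) and a dispreferred (`rejected`) completion, which is TRL's standard preference format. A minimal sketch of that shape (the example rows are invented placeholders; real pairs come from human raters or an AI judge):

```python
from datasets import Dataset

# Hypothetical preference pairs illustrating the {prompt, chosen, rejected} schema.
preference_dataset = Dataset.from_list([
    {
        "prompt": "Explain the indemnification clause in plain English.",
        "chosen": "If the supplier's mistake causes a loss, the supplier covers that loss.",
        "rejected": "Indemnification is a legal term related to indemnity.",
    },
    {
        "prompt": "Summarize the termination terms.",
        "chosen": "Either party can end the agreement with 30 days' written notice.",
        "rejected": "The contract ends when it ends.",
    },
])
```

Aim for the same ~1000+ high-quality bar noted in the decision framework; the contrast between chosen and rejected answers is what carries the training signal.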
## Synthetic Data Generation

```python
import json

from openai import AsyncOpenAI

client = AsyncOpenAI()  # Teacher-model client (any OpenAI-compatible endpoint works)


async def generate_synthetic(topic: str, n: int = 100) -> list[dict]:
    """Generate training examples using a teacher model."""
    examples = []
    for _ in range(n):
        response = await client.chat.completions.create(
            model="gpt-4o",  # Teacher
            messages=[{
                "role": "system",
                "content": f"Generate a training example about {topic}. "
                           "Include instruction and response.",
            }],
            response_format={"type": "json_object"},
        )
        examples.append(json.loads(response.choices[0].message.content))
    return examples
```

## Key Hyperparameters

| Parameter | Recommended | Notes |
|-----------|-------------|-------|
| Learning rate | 2e-4 | LoRA/QLoRA standard |
| Epochs | 1-3 | More risks overfitting |
| LoRA r | 16-64 | Higher = more capacity |
| LoRA alpha | 2x r | Scaling factor |
| Batch size | 4-8 | Per device |
| Warmup | 3% | Ratio of total steps |

## Anti-Patterns (FORBIDDEN)

```python
# NEVER fine-tune without trying alternatives first
model.fine_tune(data)  # Try prompt engineering & RAG first!

# NEVER use low-quality training data
data = scrape_random_web()  # Garbage in, garbage out

# NEVER skip evaluation
trainer.train()
deploy(model)  # Always evaluate before deploying!

# ALWAYS hold out a separate eval set
train_ds, eval_ds = split(data, test_size=0.1)
trainer = SFTTrainer(..., eval_dataset=eval_ds)
```

## Detailed Documentation

| Resource | Description |
|----------|-------------|
| [references/lora-qlora.md](references/lora-qlora.md) | Parameter-efficient fine-tuning |
| [references/dpo-alignment.md](references/dpo-alignment.md) | Direct Preference Optimization |
| [references/synthetic-data.md](references/synthetic-data.md) | Training data generation |
| [references/when-to-finetune.md](references/when-to-finetune.md) | Decision framework |

## Related Skills

- `llm-evaluation` - Evaluate fine-tuned models
- `embeddings` - When to use embeddings instead
- `rag-retrieval` - When RAG is better than fine-tuning
- `langfuse-observability` - Track training experiments

## Capability Details

### lora-qlora

**Keywords:** LoRA, QLoRA, PEFT, parameter efficient, adapter, low-rank

**Solves:**
- Fine-tune large models on consumer hardware
- Configure LoRA hyperparameters
- Choose target modules for adapters

### dpo-alignment

**Keywords:** DPO, RLHF, preference, alignment, human feedback, preference data

**Solves:**
- Align models to human preferences
- Create preference datasets
- Configure DPO training

### synthetic-data

**Keywords:** synthetic data, data generation, teacher model, distillation

**Solves:**
- Generate training data with LLMs
- Implement teacher-student training
- Scale training data quality

### when-to-finetune

**Keywords:** should I fine-tune, fine-tune decision, customize model

**Solves:**
- Decide when fine-tuning is appropriate
- Evaluate alternatives to fine-tuning
- Assess data requirements