---
name: funsloth-hfjobs
description: Training manager for Hugging Face Jobs - launch fine-tuning on HF cloud GPUs with optional WandB monitoring
---

# Hugging Face Jobs Training Manager

Run Unsloth training on Hugging Face Jobs (cloud GPU training).

## Prerequisites

1. **HF Authentication**: `huggingface-cli whoami` (login if needed)
2. **HF Jobs Access**: Requires PRO subscription or org compute access
3. **Training notebook/script**: From `funsloth-train`

## Workflow

### 1. Select Hardware

| GPU | VRAM | Cost | Best For |
|-----|------|------|----------|
| A10G | 24GB | ~$1.50/hr | 7-14B LoRA |
| A100 40GB | 40GB | ~$4/hr | 14-34B |
| A100 80GB | 80GB | ~$6/hr | 70B |
| H100 | 80GB | ~$8/hr | Fastest |

See [references/HARDWARE_GUIDE.md](references/HARDWARE_GUIDE.md) for model-to-GPU mapping.

### 2. Convert Notebook to Script

HF Jobs requires PEP 723 script format:

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git",
#     "torch>=2.0",
#     "transformers>=4.45",
#     "trl>=0.12",
#     "peft>=0.13",
#     "datasets>=2.18",
# ]
# ///
```

Use [scripts/train_sft.py](scripts/train_sft.py) as a template.

### 3. Optional: WandB Integration

Add to script:
```python
import wandb
wandb.init(project="funsloth-training")
# Add report_to="wandb" in TrainingArguments
```

Set: `export WANDB_API_KEY="your-key"`

### 4. Estimate Costs

Use the cost estimator:
```bash
python scripts/estimate_cost.py --tokens {total_tokens} --platform hfjobs
```

### 5. Launch Job

```bash
# Create job config
cat > job_config.yaml << 'EOF'
compute:
  gpu: {gpu_type}
  gpu_count: 1
script: train_hfjobs.py
outputs:
  - /outputs/*
EOF

# Submit
huggingface-cli jobs create --config job_config.yaml
```

### 6. Monitor Progress

```bash
huggingface-cli jobs status {job_id}
huggingface-cli jobs logs {job_id} --follow
```

WandB: `https://wandb.ai/{username}/funsloth-training`

### 7. Download Artifacts

```python
from huggingface_hub import snapshot_download
snapshot_download(repo_id="{username}/funsloth-job", local_dir="./outputs")
```

### 8. Handoff

Offer `funsloth-upload` for Hub upload with model card.

## Error Handling

| Error | Resolution |
|-------|------------|
| No HF Jobs access | Get PRO subscription |
| OOM | Reduce batch size or upgrade GPU |
| Job timeout | Enable checkpointing |
| Script error | Check PEP 723 dependencies |

## Bundled Resources

- [scripts/train_sft.py](scripts/train_sft.py) - PEP 723 script template
- [scripts/estimate_cost.py](scripts/estimate_cost.py) - Cost estimation
- [references/PLATFORM_COMPARISON.md](references/PLATFORM_COMPARISON.md) - HF Jobs vs alternatives
- [references/HARDWARE_GUIDE.md](references/HARDWARE_GUIDE.md) - VRAM requirements
- [references/TROUBLESHOOTING.md](references/TROUBLESHOOTING.md) - Common issues