---
name: transformers
description: Use when "HuggingFace Transformers", "pre-trained models", "pipeline API", or asking about "text generation", "text classification", "question answering", "NER", "fine-tuning transformers", "AutoModel", "Trainer API"
version: 1.0.0
---

# HuggingFace Transformers

Access thousands of pre-trained models for NLP, vision, audio, and multimodal tasks.

## When to Use

- Quick inference with pipelines
- Text generation, classification, QA, NER
- Image classification, object detection
- Fine-tuning on custom datasets
- Loading pre-trained models from the HuggingFace Hub

---

## Pipeline Tasks

### NLP Tasks

| Task | Pipeline Name | Output |
|------|---------------|--------|
| **Text Generation** | `text-generation` | Completed text |
| **Classification** | `text-classification` | Label + confidence |
| **Question Answering** | `question-answering` | Answer span |
| **Summarization** | `summarization` | Shorter text |
| **Translation** | `translation_en_to_fr` | Translated text |
| **NER** | `ner` | Entity spans + types |
| **Fill Mask** | `fill-mask` | Predicted tokens |

### Vision Tasks

| Task | Pipeline Name | Output |
|------|---------------|--------|
| **Image Classification** | `image-classification` | Label + confidence |
| **Object Detection** | `object-detection` | Bounding boxes |
| **Image Segmentation** | `image-segmentation` | Pixel masks |

### Audio Tasks

| Task | Pipeline Name | Output |
|------|---------------|--------|
| **Speech Recognition** | `automatic-speech-recognition` | Transcribed text |
| **Audio Classification** | `audio-classification` | Label + confidence |

A pipeline usage sketch appears under "Inference Sketches" below.

---

## Model Loading Patterns

### Auto Classes

| Class | Use Case |
|-------|----------|
| **AutoModel** | Base model (embeddings) |
| **AutoModelForCausalLM** | Text generation (GPT-style) |
| **AutoModelForSeq2SeqLM** | Encoder-decoder (T5, BART) |
| **AutoModelForSequenceClassification** | Classification head |
| **AutoModelForTokenClassification** | NER, POS tagging |
| **AutoModelForQuestionAnswering** | Extractive QA |

**Key concept**: Always use Auto classes unless you need a specific architecture; they read the checkpoint's config and select the right architecture automatically.

---

## Generation Parameters

| Parameter | Effect | Typical Values |
|-----------|--------|----------------|
| **max_new_tokens** | Output length | 50-500 |
| **temperature** | Randomness (lower = more deterministic) | 0.1-1.0 |
| **top_p** | Nucleus sampling threshold | 0.9-0.95 |
| **top_k** | Limit vocabulary per step | 50 |
| **num_beams** | Beam search (used instead of sampling) | 4-8 |
| **repetition_penalty** | Discourage repetition | 1.1-1.3 |

**Key concept**: Higher temperature = more creative but less coherent. For factual tasks, use a low temperature (0.1-0.3). A full `generate()` sketch appears under "Inference Sketches" below.

---

## Memory Management

### Device Placement Options

| Option | When to Use |
|--------|-------------|
| **device_map="auto"** | Let the library decide GPU allocation |
| **device_map="cuda:0"** | Specific GPU |
| **device_map="cpu"** | CPU only |

### Quantization Options

| Method | Memory Reduction | Quality Impact |
|--------|------------------|----------------|
| **8-bit** | ~50% | Minimal |
| **4-bit** | ~75% | Small for most tasks |
| **GPTQ** | ~75% | Requires calibration data |
| **AWQ** | ~75% | Activation-aware |

**Key concept**: Use `torch_dtype="auto"` to load the model in its native precision (often bfloat16) instead of defaulting to float32.
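
---

## Inference Sketches

A minimal sketch of the pipeline API from the task tables above. The two model ids are illustrative; any compatible checkpoint from the Hub works.

```python
from transformers import pipeline

# Text classification; pinning a model id is more reproducible than
# relying on the task's default checkpoint.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("Transformers makes NLP easy."))
# [{'label': 'POSITIVE', 'score': 0.99...}]

# Extractive QA takes a question plus a context passage and returns a span.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
print(qa(
    question="What does the library provide?",
    context="Transformers provides thousands of pre-trained models.",
))
```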
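
The Auto classes and generation parameters combine like this; `gpt2` is a stand-in for any causal LM checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # illustrative; any causal LM on the Hub works

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

inputs = tokenizer("The key idea behind attention is", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,       # output length
    do_sample=True,          # sampling; use num_beams>1 with do_sample=False for beam search
    temperature=0.7,         # lower = more deterministic
    top_p=0.9,               # nucleus sampling threshold
    repetition_penalty=1.2,  # discourage loops
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```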
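
For the quantization table, a 4-bit loading sketch. It assumes the `bitsandbytes` package and a CUDA GPU; the 7B model id is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization (~75% memory reduction); requires bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",   # placeholder 7B checkpoint
    quantization_config=bnb_config,
    device_map="auto",             # let accelerate place layers across devices
)
```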
---

## Fine-Tuning Concepts

### Trainer Arguments

| Argument | Purpose | Typical Value |
|----------|---------|---------------|
| **num_train_epochs** | Training passes | 3-5 |
| **per_device_train_batch_size** | Samples per GPU | 8-32 |
| **learning_rate** | Step size | 2e-5 for fine-tuning |
| **weight_decay** | Regularization | 0.01 |
| **warmup_ratio** | LR warmup | 0.1 |
| **evaluation_strategy** | When to eval | "epoch" or "steps" |

### Fine-Tuning Strategies

| Strategy | Memory | Quality | Use Case |
|----------|--------|---------|----------|
| **Full fine-tuning** | High | Best | Small models, enough data |
| **LoRA** | Low | Good | Large models, limited GPU |
| **QLoRA** | Very Low | Good | 7B+ models on consumer GPUs |
| **Prefix tuning** | Low | Moderate | When you can't modify weights |

Trainer and LoRA sketches appear in the appendix at the end of this file.

---

## Tokenization Concepts

| Parameter | Purpose |
|-----------|---------|
| **padding** | Make sequences the same length |
| **truncation** | Cut sequences to max_length |
| **max_length** | Maximum tokens (model-specific) |
| **return_tensors** | Output format ("pt", "tf", "np") |

**Key concept**: Always use the tokenizer that matches the model; different models use different vocabularies. A batching sketch appears in the appendix.

---

## Best Practices

| Practice | Why |
|----------|-----|
| Use pipelines for inference | Handles pre- and post-processing automatically |
| Use device_map="auto" | Optimal GPU memory distribution |
| Batch inputs | Better throughput |
| Use quantization for large models | Run 7B+ models on consumer GPUs |
| Match tokenizer to model | Vocabularies differ between models |
| Use Trainer for fine-tuning | Built-in best practices |

## Resources

- Docs: https://huggingface.co/docs/transformers
- Model Hub: https://huggingface.co/models
- Course: https://huggingface.co/learn/nlp-course
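
---

## Appendix: Fine-Tuning and Tokenization Sketches

To make the Trainer arguments above concrete, a minimal classification fine-tune. This is a sketch, not a recipe: the model id, the `imdb` dataset, and `max_length=256` are illustrative choices, and `datasets` is a separate package.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_id = "distilbert-base-uncased"  # illustrative base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

dataset = load_dataset("imdb")  # illustrative binary-sentiment dataset

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="out",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_ratio=0.1,
    evaluation_strategy="epoch",  # renamed to `eval_strategy` in transformers >= 4.41
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```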
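
LoRA from the strategies table lives in the separate `peft` package. A sketch wrapping GPT-2, assuming its fused attention projection `c_attn` as the target module (the right module names vary by architecture):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative base model

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor for the updates
    target_modules=["c_attn"],  # GPT-2's fused QKV projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of parameters train
```

QLoRA is the same recipe applied on top of a base model loaded in 4-bit (see the quantization sketch earlier in this file).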
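
Finally, the tokenization parameters in action on a small batch; the BERT checkpoint is illustrative:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

batch = tokenizer(
    ["a short sentence",
     "a much longer sentence that may get cut at the token limit"],
    padding=True,         # pad to the longest sequence in the batch
    truncation=True,      # drop tokens past max_length
    max_length=32,        # model-specific ceiling (512 for BERT)
    return_tensors="pt",  # PyTorch tensors; "tf" and "np" also work
)
print(batch["input_ids"].shape)  # -> torch.Size([2, <batch max length>])
```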