---
name: ai-llm-engineering
description: |
 Operational skill hub for LLM system architecture, evaluation, deployment, and optimization under modern production standards. Links to specialized skills for prompts, RAG, agents, and safety. Covers PEFT/LoRA fine-tuning, hybrid RAG (handed off to the dedicated RAG skill), high-throughput serving with vLLM (up to 24x throughput reported against a baseline Hugging Face Transformers setup), multi-layered security (single-layer defenses are cited as bypassed in 90%+ of attack attempts), automated drift detection (18-second response time cited), and CI/CD-aligned evaluation.
---

# LLM Engineering – Operational Skill Hub

A single resource for executing, validating, and scaling LLM systems with **modern production standards**, while delegating domain depth to specialized skills.

This skill provides quick reference, decision frameworks, and navigation to detailed operational patterns for:

- Data, training, and fine-tuning (PEFT/LoRA as the default approach)
- Evaluation (automated testing, metrics, rollout gates)
- Deployment (vLLM serving with up to 24x reported throughput gains, FP8/FP4 quantization)
- LLMOps (automated drift detection, retraining)
- Safety (multi-layered defenses, AI-powered guardrails)

**For detailed patterns:** See the [Resources](#resources-best-practices--operational-patterns) and [Templates](#templates-copy-paste-ready) sections below.

---

## Quick Reference

| Task | Tool/Framework | Command/Pattern | When to Use |
|------|----------------|-----------------|-------------|
| RAG Pipeline | LlamaIndex, LangChain | Page-level chunking + hybrid retrieval | Dynamic knowledge (0.648 accuracy cited for page-level chunking; benchmark-dependent) |
| Agentic Workflow | LangGraph, AutoGen, CrewAI | ReAct, multi-agent orchestration | Complex tasks, tool use required |
| Prompt Design | Anthropic, OpenAI guides | CoT, few-shot, structured | Task-specific behavior control |
| Evaluation | LangSmith, W&B, RAGAS | Multi-metric (hallucination, bias, cost) | Quality validation, A/B testing |
| Production Deploy | vLLM, TensorRT-LLM | FP8/FP4 quantization; up to 24x throughput reported for vLLM | High-throughput serving, cost optimization |
| Monitoring | Arize Phoenix, LangFuse | Drift detection (18-second response time cited) | Production LLM systems |
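
The hybrid retrieval pattern in the RAG Pipeline row combines keyword and vector search results. One common way to merge the two ranked lists is reciprocal rank fusion (RRF). The sketch below assumes RRF, which this skill does not prescribe, and uses hypothetical document IDs with the conventional constant `k=60`.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword (BM25) retriever and a vector retriever.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that rank well in both lists (`doc_b`, `doc_a`) rise to the top, which is the behavior hybrid retrieval is after. For production tuning, see the dedicated search and RAG skills listed under Scope Boundaries.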

---

## Decision Tree: LLM System Architecture

```text
Building LLM application: [Architecture Selection]
    ├─ Need current knowledge?
    │   ├─ Simple Q&A? → Basic RAG (page-level chunking + hybrid retrieval)
    │   └─ Complex retrieval? → Advanced RAG (reranking + contextual retrieval)
    │
    ├─ Need tool use / actions?
    │   ├─ Single task? → Simple agent (ReAct pattern)
    │   └─ Multi-step workflow? → Multi-agent (LangGraph, CrewAI)
    │
    ├─ Static behavior sufficient?
    │   ├─ Quick MVP? → Prompt engineering (CI/CD integrated)
    │   └─ Production quality? → Fine-tuning (PEFT/LoRA)
    │
    └─ Best results?
        └─ Hybrid (RAG + Fine-tuning + Agents) → Comprehensive solution
```

**See [Decision Matrices](resources/decision-matrices.md) for detailed selection criteria.**
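
The tree above can also be read as a small selection function. This is a minimal sketch of one possible reading, not the canonical selection logic: the `Requirements` fields and the returned component labels are hypothetical names introduced for illustration, and a result with more than one component corresponds to the hybrid branch. It requires Python 3.9+.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    # Hypothetical flags mirroring the questions in the decision tree.
    needs_current_knowledge: bool = False
    complex_retrieval: bool = False
    needs_tool_use: bool = False
    multi_step_workflow: bool = False
    production_quality: bool = False


def select_architecture(req: Requirements) -> list[str]:
    """Return the architecture components suggested by the decision tree."""
    components = []
    if req.needs_current_knowledge:
        # Basic RAG for simple Q&A, advanced RAG when retrieval is complex.
        components.append("advanced_rag" if req.complex_retrieval else "basic_rag")
    if req.needs_tool_use:
        # ReAct-style agent for single tasks, multi-agent for workflows.
        components.append("multi_agent" if req.multi_step_workflow else "react_agent")
    if not components:
        # Static behavior is sufficient: prompts for an MVP, PEFT/LoRA for production.
        components.append(
            "peft_lora_fine_tuning" if req.production_quality else "prompt_engineering"
        )
    return components


choice = select_architecture(
    Requirements(needs_current_knowledge=True, needs_tool_use=True, multi_step_workflow=True)
)
print(choice)  # ['basic_rag', 'multi_agent']
```

Returning a list keeps the hybrid case explicit: the caller can see which pieces are being combined instead of receiving a single opaque "hybrid" label.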

---

## When to Use This Skill

Claude should invoke this skill when the user asks about:

- LLM preflight/project checklists, production best practices, or data pipelines
- Building or deploying RAG, agentic, or prompt-based LLM apps
- Prompt design, chain-of-thought (CoT), ReAct, or template patterns
- Troubleshooting LLM hallucination, bias, retrieval issues, or production failures
- Evaluating LLMs: benchmarks, multi-metric eval, or rollout/monitoring
- LLMOps: deployment, rollback, scaling, resource optimization
- Technology stack selection (models, vector DBs, frameworks)
- Production deployment strategies and operational patterns

---

## Scope Boundaries (Use These Skills for Depth)

- **Prompt design & CI/CD** → [ai-prompt-engineering](../ai-prompt-engineering/SKILL.md)
- **RAG pipelines & chunking** → [ai-llm-rag-engineering](../ai-llm-rag-engineering/SKILL.md)
- **Search tuning (BM25, HNSW, hybrid)** → [ai-llm-search-retrieval](../ai-llm-search-retrieval/SKILL.md)
- **Agent architectures & tools** → [ai-agents-development](../ai-agents-development/SKILL.md)
- **Serving optimization/quantization** → [ai-llm-ops-inference](../ai-llm-ops-inference/SKILL.md)
- **Production deployment/monitoring** → [ai-ml-ops-production](../ai-ml-ops-production/SKILL.md)
- **Security/guardrails** → [ai-ml-ops-security](../ai-ml-ops-security/SKILL.md)

---

## Resources (Best Practices & Operational Patterns)

Comprehensive operational guides with checklists, patterns, and decision frameworks:

### Core Operational Patterns

- **[Project Planning Patterns](resources/project-planning-patterns.md)** - Stack selection, FTI pipeline, performance budgeting
  - AI engineering stack selection matrix
  - Feature/Training/Inference (FTI) pipeline blueprint
  - Performance budgeting and goodput gates
  - Progressive complexity (prompt → RAG → fine-tune → hybrid)

- **[Production Checklists](resources/production-checklists.md)** - Pre-deployment validation and operational checklists
  - LLM lifecycle checklist (modern production standards)
  - Data & training, RAG pipeline, deployment & serving
  - Safety/guardrails, evaluation, agentic systems
  - Reliability & data infrastructure (DDIA-grade)
  - Weekly production tasks

- **[Common Design Patterns](resources/common-design-patterns.md)** - Copy-paste ready implementation examples
  - Chain-of-Thought (CoT) prompting
  - ReAct (Reason + Act) pattern
  - RAG pipeline (minimal to advanced)
  - Agentic planning loop
  - Self-reflection and multi-agent collaboration

- **[Decision Matrices](resources/decision-matrices.md)** - Quick reference tables for selection
  - RAG type decision matrix (naive → advanced → modular)
  - Production evaluation table with targets and actions
  - Model selection matrix (GPT-4, Claude, Gemini, self-hosted)
  - Vector database, embedding model, framework selection
  - Deployment strategy matrix

- **[Anti-Patterns](resources/anti-patterns.md)** - Common mistakes and prevention strategies
  - Data leakage, prompt dilution, RAG context overload
  - Agentic runaway, over-engineering, ignoring evaluation
  - Hard-coded prompts, missing observability
  - Detection methods and prevention code examples

### Domain-Specific Patterns

- **[LLMOps Best Practices](resources/llmops-best-practices.md)** - Operational lifecycle and deployment patterns
- **[Evaluation Patterns](resources/eval-patterns.md)** - Testing, metrics, and quality validation
- **[Prompt Engineering Patterns](resources/prompt-engineering-patterns.md)** - Quick reference (canonical skill: [ai-prompt-engineering](../ai-prompt-engineering/SKILL.md))
- **[Agentic Patterns](resources/agentic-patterns.md)** - Quick reference (canonical skill: [ai-agents-development](../ai-agents-development/SKILL.md))
- **[RAG Best Practices](resources/rag-best-practices.md)** - Quick reference (canonical skill: [ai-llm-rag-engineering](../ai-llm-rag-engineering/SKILL.md))

**Note:** Each resource file includes preflight/validation checklists, copy-paste reference tables, inline templates, anti-patterns, and decision matrices.

---

## Templates (Copy-Paste Ready)

Production templates by use case and technology:

### RAG Pipelines

- **[Basic RAG](templates/rag-pipelines/template-basic-rag.md)** - Simple retrieval-augmented generation
- **[Advanced RAG](templates/rag-pipelines/template-advanced-rag.md)** - Hybrid retrieval, reranking, contextual embeddings

### Prompt Engineering

- **[Chain-of-Thought](templates/prompt-engineering/template-cot.md)** - Step-by-step reasoning pattern
- **[ReAct](templates/prompt-engineering/template-react.md)** - Reason + Act for tool use

### Agentic Workflows

- **[Reflection Agent](templates/agentic-workflows/template-reflection.md)** - Self-critique and improvement
- **[Multi-Agent](templates/agentic-workflows/template-multi-agent.md)** - Manager-worker orchestration

### Data Pipelines

- **[Data Quality](templates/data-pipelines/template-data-quality.md)** - Validation, deduplication, PII detection

### Deployment

- **[LLM Deployment](templates/deployment/template-llm-deployment.md)** - Production deployment with monitoring

### Evaluation

- **[Multi-Metric Evaluation](templates/evaluation/template-multi-metric.md)** - Comprehensive testing suite

---

## Related Skills

This skill integrates with complementary Claude Code skills:

### Core Dependencies

- **[ai-llm-rag-engineering](../ai-llm-rag-engineering/SKILL.md)** - Advanced RAG patterns, chunking strategies, hybrid retrieval, reranking
- **[ai-llm-search-retrieval](../ai-llm-search-retrieval/SKILL.md)** - Search optimization, BM25 tuning, vector search, ranking pipelines
- **[ai-prompt-engineering](../ai-prompt-engineering/SKILL.md)** - Systematic prompt design, evaluation, testing, and optimization
- **[ai-agents-development](../ai-agents-development/SKILL.md)** - Agent architectures, tool use, multi-agent systems, autonomous workflows

### Production & Operations

- **[ai-llm-development](../ai-llm-development/SKILL.md)** - Model training, fine-tuning, dataset creation, instruction tuning
- **[ai-llm-ops-inference](../ai-llm-ops-inference/SKILL.md)** - Production serving, quantization, batching, GPU optimization
- **[ai-ml-ops-production](../ai-ml-ops-production/SKILL.md)** - Deployment patterns, monitoring, drift detection, API design
- **[ai-ml-ops-security](../ai-ml-ops-security/SKILL.md)** - Security guardrails, prompt injection defense, privacy protection

---

## External Resources

See **[data/sources.json](data/sources.json)** for 50+ curated authoritative sources:

- **Official LLM platform docs** - OpenAI, Anthropic, Gemini, Mistral, Azure OpenAI, AWS Bedrock
- **Open-source models and frameworks** - HuggingFace Transformers, LLaMA, vLLM, PEFT/LoRA, DeepSpeed
- **RAG frameworks and vector DBs** - LlamaIndex, LangChain, LangGraph, Haystack, Pinecone, Qdrant, Chroma
- **2025 agentic frameworks** - Anthropic Agent SDK, AutoGen, CrewAI, LangGraph Multi-Agent, Semantic Kernel
- **2025 RAG innovations** - Microsoft GraphRAG (knowledge graphs), Pathway (real-time), hybrid retrieval
- **Prompt engineering** - Anthropic Prompt Library, Prompt Engineering Guide, CoT/ReAct patterns
- **Evaluation and monitoring** - OpenAI Evals, HELM, Anthropic Evals, LangSmith, W&B, Arize Phoenix
- **Production deployment** - LiteLLM, Ollama, RunPod, Together AI, vLLM serving

---

## Usage

### For New Projects

1. Start with **[Production Checklists](resources/production-checklists.md)** - Validate all pre-deployment requirements
2. Use **[Decision Matrices](resources/decision-matrices.md)** - Select technology stack
3. Reference **[Project Planning Patterns](resources/project-planning-patterns.md)** - Design FTI pipeline
4. Implement with **[Common Design Patterns](resources/common-design-patterns.md)** - Copy-paste code examples
5. Avoid **[Anti-Patterns](resources/anti-patterns.md)** - Learn from common mistakes

### For Troubleshooting

1. Check **[Anti-Patterns](resources/anti-patterns.md)** - Identify failure modes and mitigations
2. Use **[Decision Matrices](resources/decision-matrices.md)** - Evaluate if architecture fits use case
3. Reference **[Common Design Patterns](resources/common-design-patterns.md)** - Verify implementation correctness

### For Ongoing Operations

1. Follow **[Production Checklists](resources/production-checklists.md)** - Weekly operational tasks
2. Integrate **[Evaluation Patterns](resources/eval-patterns.md)** - Continuous quality monitoring
3. Apply **[LLMOps Best Practices](resources/llmops-best-practices.md)** - Deployment and rollback procedures
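
As a rough illustration of how continuous evaluation can feed deployment and rollback decisions, the sketch below compares a candidate release against a baseline. The metric names, the `max_regression` tolerance, and the `hard_floors` thresholds are all hypothetical values chosen for the example, and the linked resource files remain the reference for actual procedures.

```python
def rollout_decision(baseline, candidate, max_regression=0.02, hard_floors=None):
    """Return "promote", "hold", or "rollback" for a candidate release.

    baseline / candidate: dicts of metric name -> score (higher is better).
    max_regression: allowed absolute drop versus baseline, per metric.
    hard_floors: optional dict of metric name -> minimum acceptable score.
    """
    hard_floors = hard_floors or {}

    # A metric below its hard floor (or missing entirely) triggers rollback.
    for name, floor in hard_floors.items():
        if candidate.get(name, 0.0) < floor:
            return "rollback"

    # A missing metric or a regression beyond tolerance blocks promotion.
    for name, base_score in baseline.items():
        if name not in candidate:
            return "hold"
        if base_score - candidate[name] > max_regression:
            return "hold"

    return "promote"


# Hypothetical evaluation scores for illustration only.
baseline = {"faithfulness": 0.90, "answer_relevance": 0.85}
candidate = {"faithfulness": 0.91, "answer_relevance": 0.86}

print(rollout_decision(baseline, candidate))  # promote
```

Separating "hold" from "rollback" lets a pipeline pause a rollout on a mild regression while reserving automatic rollback for clear threshold violations.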

---

## Navigation Summary

**Quick Decisions:** [Decision Matrices](resources/decision-matrices.md)
**Pre-Deployment:** [Production Checklists](resources/production-checklists.md)
**Planning:** [Project Planning Patterns](resources/project-planning-patterns.md)
**Implementation:** [Common Design Patterns](resources/common-design-patterns.md)
**Troubleshooting:** [Anti-Patterns](resources/anti-patterns.md)

**Domain Depth:** [LLMOps](resources/llmops-best-practices.md) | [Evaluation](resources/eval-patterns.md) | [Prompts](resources/prompt-engineering-patterns.md) | [Agents](resources/agentic-patterns.md) | [RAG](resources/rag-best-practices.md)

**Templates:** [templates/](templates/) - Copy-paste ready production code

**Sources:** [data/sources.json](data/sources.json) - Authoritative documentation links

---