---
name: ai-mlops
description: Production MLOps and ML/LLM/agent security skill for deploying and operating ML systems in production (registry + CI/CD, serving, monitoring/drift, evaluation loops, incident response/runbooks, and governance), including GenAI security (prompt injection, jailbreaks, RAG security, privacy, and supply chain).
---

# MLOps & ML Security - Complete Reference (Jan 2026)

Production ML lifecycle with **modern security practices**.

This skill covers:

- **Production**: Data ingestion, deployment, drift detection, monitoring, incident response
- **Security**: Prompt injection, jailbreak defense, RAG security, output filtering
- **Governance**: Privacy protection, supply chain security, safety evaluation

1. **Data ingestion** (dlt): Load data from APIs, databases to warehouses
2. **Model deployment**: Batch jobs, real-time APIs, hybrid systems, event-driven automation
3. **Operations**: Real-time monitoring, drift detection, automated retraining, incident response

**Modern Best Practices (Jan 2026)**:

- Version everything that can change: model artifacts, data snapshots, feature definitions, prompts/configs, and agent graphs; require reproducibility, rollbacks, and audit logs (NIST SSDF: https://csrc.nist.gov/pubs/sp/800/218/final).
- Gate changes with evals (offline + online) and safe rollout (shadow/canary/blue-green); treat regressions in quality, safety, latency, and cost as release blockers.
- Align controls and documentation to risk posture (EU AI Act: https://eur-lex.europa.eu/eli/reg/2024/1689/oj; NIST AI RMF + GenAI profile: https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf, https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf).
- Operationalize security: threat model the full system (data, model, prompts, tools, RAG), harden the supply chain (SBOM/signing), and ship incident playbooks for both reliability and safety events.

It is execution-focused:

- Data ingestion patterns (REST APIs, database replication, incremental loading)
- Deployment patterns (batch, online, hybrid, streaming, event-driven)
- **Automated monitoring** with real-time drift detection
- **Automated retraining** pipelines (monitor → detect → trigger → validate → deploy)
- Incident handling with validated rollback and postmortems
- Links to copy-paste templates in `assets/`

## Quick Reference

| Task | Tool/Framework | Command | When to Use |
|------|----------------|---------|-------------|
| Data Ingestion | dlt (data load tool) | `dlt pipeline run`, `dlt init` | Loading from APIs, databases to warehouses |
| Batch Deployment | Airflow, Dagster, Prefect | `airflow dags trigger`, `dagster job launch` | Scheduled predictions on large datasets |
| API Deployment | FastAPI, Flask, TorchServe | `uvicorn app:app`, `torchserve --start` | Real-time inference (<500ms latency) |
| LLM Serving | vLLM, TGI, BentoML | `vllm serve model`, `bentoml serve` | High-throughput LLM inference |
| Model Registry | MLflow, W&B, ZenML | `mlflow.register_model()`, `zenml model register` | Versioning and promoting models |
| Drift Detection | Statistical tests + monitors | PSI/KS, embedding drift, prediction drift | Detect data/process changes and trigger review |
| Monitoring | Prometheus, Grafana | `prometheus.yml`, Grafana dashboards | Metrics, alerts, SLO tracking |
| AgentOps | AgentOps, Langfuse, LangSmith | `agentops.init()`, trace visualization | AI agent observability, session replay |
| Incident Response | Runbooks, PagerDuty | Documented playbooks, alert routing | Handling failures and degradation |

## Use This Skill When

Use this skill when the user asks for **deployment, operations, monitoring, incident handling, or governance** for ML/LLM/agent systems, e.g.:

- "How do I deploy this model to prod?"
- "Design a batch + online scoring architecture."
- "Add monitoring and drift detection to our model."
- "Write an incident runbook for this ML service."
- "Package this LLM/RAG pipeline as an API."
- "Plan our retraining and promotion workflow."
- "Load data from Stripe API to Snowflake."
- "Set up incremental database replication with dlt."
- "Build an ELT pipeline for warehouse loading."

If the user is asking only about **EDA, modelling, or theory**, prefer:

- `ai-ml-data-science` (EDA, features, modelling, SQL transformation with SQLMesh)
- `ai-llm` (prompting, fine-tuning, eval)
- `ai-rag` (retrieval pipeline design)
- `ai-llm-inference` (compression, spec decode, serving internals)

If the user is asking about **SQL transformation (after data is loaded)**, prefer:

- `ai-ml-data-science` (SQLMesh templates for staging, intermediate, marts layers)

## Decision Tree: Choosing Deployment Strategy

```text
User needs to deploy: [ML System]
    ├─ Data Ingestion?
    │   ├─ From REST APIs? → dlt REST API templates
    │   ├─ From databases? → dlt database sources (PostgreSQL, MySQL, MongoDB)
    │   └─ Incremental loading? → dlt incremental patterns (timestamp, ID-based)
    │
    ├─ Model Serving?
    │   ├─ Latency <500ms? → FastAPI real-time API
    │   ├─ Batch predictions? → Airflow/Dagster batch pipeline
    │   └─ Mix of both? → Hybrid (batch features + online scoring)
    │
    ├─ Monitoring & Ops?
    │   ├─ Drift detection? → Evidently + automated retraining triggers
    │   ├─ Performance tracking? → Prometheus + Grafana dashboards
    │   └─ Incident response? → Runbooks + PagerDuty alerts
    │
    └─ LLM/RAG Production?
        ├─ Cost optimization? → Caching, prompt templates, token budgets
        └─ Safety? → See ai-mlops skill
```

## Core Concepts (Vendor-Agnostic)

- **Lifecycle loop**: train → validate → deploy → monitor → respond → retrain/retire.
- **Risk controls**: access control, data minimization, logging, and change management (NIST AI RMF: https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf).
- **Observability planes**: system metrics (latency/errors), data metrics (freshness/drift), quality metrics (model performance).
- **Incident readiness**: detection, containment, rollback, and root-cause analysis.

## Do / Avoid

**Do**
- Do gate deployments with repeatable checks: evaluation pass, load test, security review, rollback plan.
- Do version everything: code, data, features, model artifact, prompt templates, configuration.
- Do define SLOs and budgets (latency/cost/error rate) before optimizing.

**Avoid**
- Avoid manual “clickops” deployments without audit trail.
- Avoid silent upgrades; require eval + canary for model/prompt changes.
- Avoid drift dashboards without actions; every alert needs an owner and runbook.

## Core Patterns Overview

This skill provides production-ready patterns and guides organized into comprehensive references:

### Data & Infrastructure Patterns

**Pattern 0: Data Contracts, Ingestion & Lineage**
→ See [Data Ingestion Patterns](references/data-ingestion-patterns.md)

- Data contracts with SLAs and versioning
- Ingestion modes (CDC, batch, streaming)
- Lineage tracking and schema evolution
- Replay and backfill procedures

**Pattern 1: Choose Deployment Mode**
→ See [Deployment Patterns](references/deployment-patterns.md)

- Decision table (batch, online, hybrid, streaming)
- When to use each mode
- Deployment mode selection checklist

**Pattern 2: Standard Deployment Lifecycle**
→ See [Deployment Lifecycle](references/deployment-lifecycle.md)

- Pre-deploy, deploy, observe, operate, evolve phases
- Environment promotion (dev → staging → prod)
- Gradual rollout strategies (canary, blue-green)

**Pattern 3: Packaging & Model Registry**
→ See [Model Registry Patterns](references/model-registry-patterns.md)

- Model registry structure and metadata
- Packaging strategies (Docker, ONNX, MLflow)
- Promotion flows (experimental → production)
- Versioning and governance

### Serving Patterns

**Pattern 4: Batch Scoring Pipeline**
→ See [Deployment Patterns](references/deployment-patterns.md)

- Orchestration with Airflow/Dagster
- Idempotent scoring jobs
- Validation and backfill procedures

**Pattern 5: Real-Time API Scoring**
→ See [API Design Patterns](references/api-design-patterns.md)

- Service design (HTTP/JSON, gRPC)
- Input/output schemas
- Rate limiting, timeouts, circuit breakers

**Pattern 6: Hybrid & Feature Store Integration**
→ See [Feature Store Patterns](references/feature-store-patterns.md)

- Batch vs online features
- Feature store architecture
- Training-serving consistency
- Point-in-time correctness

### Operations Patterns

**Pattern 7: Monitoring & Alerting**
→ See [Monitoring Best Practices](references/monitoring-best-practices.md)

- Data, performance, and technical metrics
- SLO definition and tracking
- Dashboard design and alerting strategies

**Pattern 8: Drift Detection & Automated Retraining**
→ See [Drift Detection Guide](references/drift-detection-guide.md)

- Automated retraining triggers
- Event-driven retraining pipelines

**Pattern 9: Incidents & Runbooks**
→ See [Incident Response Playbooks](references/incident-response-playbooks.md)

- Common failure modes
- Detection, diagnosis, resolution
- Post-mortem procedures

**Pattern 10: LLM / RAG in Production**
→ See [LLM & RAG Production Patterns](references/llm-rag-production-patterns.md)

- Prompt and configuration management
- Safety and compliance (PII, jailbreaks)
- Cost optimization (token budgets, caching)
- Monitoring and fallbacks

**Pattern 11: Cross-Region, Residency & Rollback**
→ See [Multi-Region Patterns](references/multi-region-patterns.md)

- Multi-region deployment architectures
- Data residency and tenant isolation
- Disaster recovery and failover
- Regional rollback procedures

**Pattern 12: Online Evaluation & Feedback Loops**
→ See [Online Evaluation Patterns](references/online-evaluation-patterns.md)

- Feedback signal collection (implicit, explicit)
- Shadow and canary deployments
- A/B testing with statistical significance
- Human-in-the-loop labeling
- Automated retraining cadence

**Pattern 13: AgentOps (AI Agent Operations)**
→ See [AgentOps Patterns](references/agentops-patterns.md)

- Session tracing and replay for AI agents
- Cost and latency tracking across agent runs
- Multi-agent visualization and debugging
- Tool invocation monitoring
- Integration with CrewAI, LangGraph, OpenAI Agents SDK

**Pattern 14: Edge MLOps & TinyML**
→ See [Edge MLOps Patterns](references/edge-mlops-patterns.md)

- Device-aware CI/CD pipelines
- OTA model updates with rollback
- Federated learning operations
- Edge drift detection
- Intermittent connectivity handling

## Resources (Detailed Guides)

For comprehensive operational guides, see:

**Core Infrastructure:**

- **[Data Ingestion Patterns](references/data-ingestion-patterns.md)** - Data contracts, CDC, batch/streaming ingestion, lineage, schema evolution
- **[Deployment Lifecycle](references/deployment-lifecycle.md)** - Pre-deploy validation, environment promotion, gradual rollout, rollback
- **[Model Registry Patterns](references/model-registry-patterns.md)** - Versioning, packaging, promotion workflows, governance
- **[Feature Store Patterns](references/feature-store-patterns.md)** - Batch/online features, hybrid architectures, consistency, latency optimization

**Serving & APIs:**

- **[Deployment Patterns](references/deployment-patterns.md)** - Batch, online, hybrid, streaming deployment strategies and architectures
- **[API Design Patterns](references/api-design-patterns.md)** - ML/LLM/RAG API patterns, input/output schemas, reliability patterns, versioning

**Operations & Reliability:**

- **[Monitoring Best Practices](references/monitoring-best-practices.md)** - Metrics collection, alerting strategies, SLO definition, dashboard design
- **[Drift Detection Guide](references/drift-detection-guide.md)** - Statistical tests, automated detection, retraining triggers, recovery strategies
- **[Incident Response Playbooks](references/incident-response-playbooks.md)** - Runbooks for common failure modes, diagnostics, resolution steps

**Security & Governance:**

- **[Threat Models](references/threat-models.md)** - Trust boundaries, attack surface, control mapping
- **[Prompt Injection Mitigation](references/prompt-injection-mitigation.md)** - Input hardening, tool/RAG containment, least privilege
- **[Jailbreak Defense](references/jailbreak-defense.md)** - Robust refusal behavior, safe completion patterns
- **[RAG Security](references/rag-security.md)** - Retrieval poisoning, context injection, sensitive data leakage
- **[Output Filtering](references/output-filtering.md)** - Layered filters (PII/toxicity/policy), block/rewrite strategies
- **[Privacy Protection](references/privacy-protection.md)** - PII handling, data minimization, retention, consent
- **[Supply Chain Security](references/supply-chain-security.md)** - SBOM, dependency pinning, artifact signing
- **[Safety Evaluation](references/safety-evaluation.md)** - Red teaming, eval sets, incident readiness

**Advanced Patterns:**

- **[LLM & RAG Production Patterns](references/llm-rag-production-patterns.md)** - Prompt management, safety, cost optimization, caching, monitoring
- **[Multi-Region Patterns](references/multi-region-patterns.md)** - Multi-region deployment, data residency, disaster recovery, rollback
- **[Online Evaluation Patterns](references/online-evaluation-patterns.md)** - A/B testing, shadow deployments, feedback loops, automated retraining
- **[AgentOps Patterns](references/agentops-patterns.md)** - AI agent observability, session replay, cost tracking, multi-agent debugging
- **[Edge MLOps Patterns](references/edge-mlops-patterns.md)** - TinyML, federated learning, OTA updates, device-aware CI/CD

## Templates

Use these as copy-paste starting points for production artifacts:

### Data Ingestion (dlt)

For loading data into warehouses and pipelines:

- **[dlt basic pipeline setup](../data-lake-platform/assets/ingestion/dlt/template-dlt-pipeline.md)** - Install, configure, run basic extraction and loading
- **[dlt REST API sources](../data-lake-platform/assets/ingestion/dlt/template-dlt-rest-api.md)** - Extract from REST APIs with pagination, authentication, rate limiting
- **[dlt database sources](../data-lake-platform/assets/ingestion/dlt/template-dlt-database-source.md)** - Replicate from PostgreSQL, MySQL, MongoDB, SQL Server
- **[dlt incremental loading](../data-lake-platform/assets/ingestion/dlt/template-dlt-incremental.md)** - Timestamp-based, ID-based, merge/upsert patterns, lookback windows
- **[dlt warehouse loading](../data-lake-platform/assets/ingestion/dlt/template-dlt-warehouse-loading.md)** - Load to Snowflake, BigQuery, Redshift, Postgres, DuckDB

**Use dlt when:**

- Loading data from APIs (Stripe, HubSpot, Shopify, custom APIs)
- Replicating databases to warehouses
- Building ELT pipelines with incremental loading
- Managing data ingestion with Python

**For SQL transformation (after ingestion), use:**

→ `ai-ml-data-science` skill (SQLMesh templates for staging/intermediate/marts layers)

### Deployment & Packaging

- **[Deployment & MLOps template](assets/deployment/template-deployment-mlops.md)** - Complete MLOps lifecycle, model registry, promotion workflows
- **[Deployment readiness checklist](assets/deployment/deployment-readiness-checklist.md)** - Go/No-Go gate, monitoring, and rollback plan
- **[API service template](assets/deployment/template-api-service.md)** - Real-time REST/gRPC API with FastAPI, input validation, rate limiting
- **[Batch scoring pipeline template](assets/deployment/template-batch-pipeline.md)** - Orchestrated batch inference with Airflow/Dagster, validation, backfill

### Monitoring & Operations

- **[Monitoring & alerting template](assets/monitoring/template-monitoring-plan.md)** - Data/performance/technical metrics, dashboards, SLO definition
- **[Drift detection & retraining template](assets/monitoring/template-drift-retraining.md)** - Automated drift detection, retraining triggers, promotion pipelines
- **[Incident runbook template](assets/ops/template-incident-runbook.md)** - Failure mode playbooks, diagnosis steps, resolution procedures

## Navigation

**Resources**
- [references/drift-detection-guide.md](references/drift-detection-guide.md)
- [references/model-registry-patterns.md](references/model-registry-patterns.md)
- [references/online-evaluation-patterns.md](references/online-evaluation-patterns.md)
- [references/monitoring-best-practices.md](references/monitoring-best-practices.md)
- [references/llm-rag-production-patterns.md](references/llm-rag-production-patterns.md)
- [references/api-design-patterns.md](references/api-design-patterns.md)
- [references/incident-response-playbooks.md](references/incident-response-playbooks.md)
- [references/deployment-patterns.md](references/deployment-patterns.md)
- [references/data-ingestion-patterns.md](references/data-ingestion-patterns.md)
- [references/deployment-lifecycle.md](references/deployment-lifecycle.md)
- [references/feature-store-patterns.md](references/feature-store-patterns.md)
- [references/multi-region-patterns.md](references/multi-region-patterns.md)
- [references/agentops-patterns.md](references/agentops-patterns.md)
- [references/edge-mlops-patterns.md](references/edge-mlops-patterns.md)

**Templates**
- [template-dlt-pipeline.md](../data-lake-platform/assets/ingestion/dlt/template-dlt-pipeline.md)
- [template-dlt-rest-api.md](../data-lake-platform/assets/ingestion/dlt/template-dlt-rest-api.md)
- [template-dlt-database-source.md](../data-lake-platform/assets/ingestion/dlt/template-dlt-database-source.md)
- [template-dlt-incremental.md](../data-lake-platform/assets/ingestion/dlt/template-dlt-incremental.md)
- [template-dlt-warehouse-loading.md](../data-lake-platform/assets/ingestion/dlt/template-dlt-warehouse-loading.md)
- [assets/deployment/template-deployment-mlops.md](assets/deployment/template-deployment-mlops.md)
- [assets/deployment/deployment-readiness-checklist.md](assets/deployment/deployment-readiness-checklist.md)
- [assets/deployment/template-api-service.md](assets/deployment/template-api-service.md)
- [assets/deployment/template-batch-pipeline.md](assets/deployment/template-batch-pipeline.md)
- [assets/ops/template-incident-runbook.md](assets/ops/template-incident-runbook.md)
- [assets/monitoring/template-drift-retraining.md](assets/monitoring/template-drift-retraining.md)
- [assets/monitoring/template-monitoring-plan.md](assets/monitoring/template-monitoring-plan.md)

**Data**
- [data/sources.json](data/sources.json) - Curated external references

## External Resources

See `data/sources.json` for curated references on:

- Serving frameworks (FastAPI, Flask, gRPC, TorchServe, KServe, Ray Serve)
- Orchestration (Airflow, Dagster, Prefect)
- Model registries and MLOps (MLflow, W&B, Vertex AI, Sagemaker)
- Monitoring and observability (Prometheus, Grafana, OpenTelemetry, Evidently)
- Feature stores (Feast, Tecton, Vertex, Databricks)
- Streaming & messaging (Kafka, Pulsar, Kinesis)
- LLMOps & RAG infra (vector DBs, LLM gateways, safety tools)

## Data Lake & Lakehouse

For comprehensive data lake/lakehouse patterns (beyond dlt ingestion), see **[data-lake-platform](../data-lake-platform/SKILL.md)**:

- **Table formats:** Apache Iceberg, Delta Lake, Apache Hudi
- **Query engines:** ClickHouse, DuckDB, Apache Doris, StarRocks
- **Alternative ingestion:** Airbyte (GUI-based connectors)
- **Transformation:** dbt (alternative to SQLMesh)
- **Streaming:** Apache Kafka patterns
- **Orchestration:** Dagster, Airflow

This skill focuses on **ML-specific deployment, monitoring, and security**. Use data-lake-platform for general-purpose data infrastructure.

## Recency Protocol (Tooling Recommendations)

When users ask recommendation questions about MLOps tooling, verify recency before answering.

### Trigger Conditions

- "What's the best MLOps platform for [use case]?"
- "What should I use for [deployment/monitoring/drift detection]?"
- "What's the latest in MLOps?"
- "Current best practices for [model registry/feature store/observability]?"
- "Is [MLflow/Kubeflow/Vertex AI] still relevant in 2026?"
- "[MLOps tool A] vs [MLOps tool B]?"
- "Best way to deploy [LLM/ML model] to production?"
- "What feature store should I use?"

### Minimal Recency Check

1. Start from `data/sources.json` and prefer sources with `add_as_web_search: true`.
2. If web search or browsing is available, confirm at least: (a) the tool’s latest release/docs date, (b) active maintenance signals, (c) a recent comparison/alternatives post.
3. If live search is not available, state that you are relying on static knowledge + `data/sources.json`, and recommend validation steps (POC + evals + rollout plan).

### What to Report

After searching, provide:

- **Current landscape**: What MLOps tools/platforms are popular NOW
- **Emerging trends**: New approaches gaining traction (LLMOps, GenAI ops)
- **Deprecated/declining**: Tools or approaches losing relevance
- **Recommendation**: Based on fresh data, not just static knowledge

## Related Skills

For adjacent topics, reference these skills:

- **[ai-ml-data-science](../ai-ml-data-science/SKILL.md)** - EDA, feature engineering, modelling, evaluation, SQLMesh transformations
- **[ai-llm](../ai-llm/SKILL.md)** - Prompting, fine-tuning, evaluation for LLMs
- **[ai-agents](../ai-agents/SKILL.md)** - Agentic workflows, multi-agent systems, LLMOps
- **[ai-rag](../ai-rag/SKILL.md)** - RAG pipeline design, chunking, retrieval, evaluation
- **[ai-llm-inference](../ai-llm-inference/SKILL.md)** - Model serving optimization, quantization, batching
- **[ai-prompt-engineering](../ai-prompt-engineering/SKILL.md)** - Prompt design patterns and best practices
- **[data-lake-platform](../data-lake-platform/SKILL.md)** - Data lake/lakehouse infrastructure (ClickHouse, Iceberg, Kafka)

Use this skill to **turn trained models into reliable services**, not to derive the model itself.