---
name: ml-engineering
description: Use when "deploying ML models", "MLOps", "model serving", "feature stores", "model monitoring", or asking about "PyTorch deployment", "TensorFlow production", "RAG systems", "LLM integration", "ML infrastructure"
version: 1.0.0
---

# ML Engineering Guide

Production-grade ML/AI systems, MLOps, and model deployment.

## When to Use

- Deploying ML models to production
- Building ML platforms and infrastructure
- Implementing MLOps pipelines
- Integrating LLMs into production systems
- Setting up model monitoring and drift detection

## Tech Stack

| Category | Tools |
|----------|-------|
| ML Frameworks | PyTorch, TensorFlow, Scikit-learn, XGBoost |
| LLM Frameworks | LangChain, LlamaIndex, DSPy |
| Data Tools | Spark, Airflow, dbt, Kafka, Databricks |
| Deployment | Docker, Kubernetes, AWS/GCP/Azure |
| Monitoring | MLflow, Weights & Biases, Prometheus |
| Databases | PostgreSQL, BigQuery, Snowflake, Pinecone |

## Production Patterns

### Model Deployment Pipeline

```python
# Model serving with FastAPI
from fastapi import FastAPI
import torch

app = FastAPI()

# Loads a fully pickled model. Recent PyTorch releases default to
# weights_only=True, so pass False explicitly (trusted checkpoints only).
model = torch.load("model.pth", weights_only=False)
model.eval()  # disable dropout and batch-norm updates for inference

@app.post("/predict")
def predict(data: dict):
    # preprocess() is application-specific and must return a batched tensor.
    # A plain `def` lets FastAPI run the blocking inference in its threadpool.
    tensor = preprocess(data)
    with torch.no_grad():
        prediction = model(tensor)
    return {"prediction": prediction.tolist()}
```

### Feature Store Integration

```python
# Feast feature store
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Fetch the latest online values for one entity, keyed by user_id
features = store.get_online_features(
    features=["user_features:age", "user_features:location"],
    entity_rows=[{"user_id": 123}]
).to_dict()
```

### Model Monitoring

```python
# Drift detection with Evidently (legacy Report API; newer Evidently
# releases use different import paths)
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# ref_df and curr_df are pandas DataFrames with the same columns:
# the training-time reference data and the recent production data
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=ref_df, current_data=curr_df)
```

## MLOps Best Practices

### Development

- Test-driven development for ML pipelines
- Version control models and data
- Reproducible experiments with MLflow

### Production

- A/B testing infrastructure
- Canary deployments for models
- Automated retraining pipelines
- Model monitoring and drift detection

### Performance Targets

| Metric | Target |
|--------|--------|
| P50 Latency | < 50ms |
| P95 Latency | < 100ms |
| P99 Latency | < 200ms |
| Throughput | > 1000 RPS |
| Availability | 99.9% |

## LLM Integration Patterns

### RAG System

```python
# Basic RAG with LangChain (legacy import paths; recent LangChain releases
# move these into packages such as langchain_community and langchain_openai)
from langchain.vectorstores import Pinecone
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# Assumes an OpenAI chat model; any LangChain-compatible LLM works here
llm = ChatOpenAI()

vectorstore = Pinecone.from_existing_index(
    index_name="docs",
    embedding=OpenAIEmbeddings()
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever()
)
```

### Prompt Management

```python
# Structured prompts with DSPy
import dspy

class QA(dspy.Signature):
    """Answer questions based on context."""
    context = dspy.InputField()
    question = dspy.InputField()
    answer = dspy.OutputField()

# A language model must be configured with dspy.configure(lm=...) before calling qa
qa = dspy.Predict(QA)
```

## Common Commands

```bash
# Development (--cov requires the pytest-cov plugin)
python -m pytest tests/ -v --cov
python -m black src/
python -m pylint src/

# Training
python scripts/train.py --config prod.yaml
mlflow run . -P epochs=10

# Deployment
docker build -t model:v1 .
kubectl apply -f k8s/model-serving.yaml

# Monitoring
mlflow ui --port 5000
```

## Security & Compliance

- Authentication for model endpoints
- Data encryption (at rest & in transit)
- PII handling and anonymization
- GDPR/CCPA compliance
- Model access audit logging
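
The PII handling and audit logging items above can be combined in a small helper. The following is a minimal sketch using only the Python standard library, not a prescribed implementation: `PSEUDONYM_KEY`, `pseudonymize`, `audit_record`, and the record fields are hypothetical names chosen for illustration. Note that keyed hashing is pseudonymization rather than full anonymization, so the resulting values may still count as personal data.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

# Hypothetical secret for illustration; in practice, load it from a
# secrets manager rather than source code.
PSEUDONYM_KEY = b"replace-with-a-secret-key"


def pseudonymize(value: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Return a keyed HMAC-SHA256 hex digest so the raw ID never reaches the logs."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def audit_record(user_id: str, model_name: str, model_version: str) -> str:
    """Build one JSON audit line for a model access, with the user ID pseudonymized."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": pseudonymize(user_id),
        "model": model_name,
        "version": model_version,
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    # Example values are made up
    print(audit_record("user-123", "churn-model", "v1"))
```

Because the digest is deterministic for a given key, the same user maps to the same pseudonym across log lines, which keeps access patterns auditable without storing the raw identifier.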