---
name: modal-deployment
description: Run Python code in the cloud with serverless containers, GPUs, and autoscaling. Use when deploying ML models, running batch jobs, scheduling tasks, serving APIs with GPU acceleration, or scaling compute-intensive workloads. Triggers on requests for serverless GPU infrastructure, LLM inference, model training/fine-tuning, parallel data processing, cron jobs in the cloud, or deploying Python web endpoints.
---

# Modal

Modal is a serverless platform for running Python in the cloud with zero configuration. Define everything in code: no YAML, Docker, or Kubernetes required.

## Quick Start

```python
import modal

app = modal.App("my-app")

@app.function()
def hello():
    return "Hello from Modal!"

@app.local_entrypoint()
def main():
    print(hello.remote())
```

Run: `modal run app.py`

## Core Concepts

### Functions

Decorate Python functions to run remotely:

```python
@app.function(gpu="H100", memory=32768, timeout=600)
def train_model(data):
    # Runs on an H100 GPU with 32 GiB RAM and a 10-minute timeout
    return model.fit(data)
```

### Images

Define container environments via method chaining:

```python
image = (
    modal.Image.debian_slim(python_version="3.12")
    .apt_install("ffmpeg", "libsndfile1")
    .uv_pip_install("torch", "transformers", "numpy")
    .env({"CUDA_VISIBLE_DEVICES": "0"})
)

app = modal.App("ml-app", image=image)
```

Key image methods:

- `.debian_slim()` / `.micromamba()` - Base images
- `.uv_pip_install()` / `.pip_install()` - Python packages
- `.apt_install()` - System packages
- `.run_commands()` - Shell commands
- `.add_local_python_source()` - Local modules
- `.env()` - Environment variables

### GPUs

Attach GPUs with a single parameter:

```python
@app.function(gpu="H100")                   # Single H100
@app.function(gpu="A100-80GB")              # 80GB A100
@app.function(gpu="H100:4")                 # 4x H100
@app.function(gpu=["H100", "A100-40GB:2"])  # Fallback options
```

Available: B200, H200, H100, A100-80GB, A100-40GB, L40S, L4, A10G, T4

### Classes with Lifecycle Hooks

Load models once at container startup:

```python
@app.cls(gpu="L40S")
class Model:
    @modal.enter()
    def load(self):
        self.model = load_pretrained("model-name")

    @modal.method()
    def predict(self, x):
        return self.model(x)

# Usage
Model().predict.remote(data)
```

### Web Endpoints

Deploy APIs instantly:

```python
@app.function()
@modal.fastapi_endpoint()
def api(text: str):
    return {"result": process(text)}

# For complex apps
@app.function()
@modal.asgi_app()
def fastapi_app():
    from fastapi import FastAPI

    web = FastAPI()

    @web.get("/health")
    def health():
        return {"status": "ok"}

    return web
```

### Volumes (Persistent Storage)

```python
volume = modal.Volume.from_name("my-data", create_if_missing=True)

@app.function(volumes={"/data": volume})
def save_file(content: str):
    with open("/data/output.txt", "w") as f:
        f.write(content)
    volume.commit()  # Persist changes
```

### Secrets

```python
@app.function(secrets=[modal.Secret.from_name("my-api-key")])
def call_api():
    import os
    key = os.environ["API_KEY"]
```

Create secrets via the dashboard or `modal secret create my-secret KEY=value`.

### Dicts (Distributed Key-Value Store)

```python
cache = modal.Dict.from_name("my-cache", create_if_missing=True)

@app.function()
def cached_compute(key: str):
    if key in cache:
        return cache[key]
    result = expensive_computation(key)
    cache[key] = result
    return result
```

### Queues (Distributed FIFO)

```python
task_queue = modal.Queue.from_name("task-queue", create_if_missing=True)

@app.function()
def producer():
    task_queue.put_many([{"task": i} for i in range(10)])

@app.function()
def consumer():
    import queue  # stdlib module; Modal raises queue.Empty on timeout

    while True:
        try:
            task = task_queue.get(timeout=60)
        except queue.Empty:
            break  # no work arrived within 60 seconds
        process(task)
```
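As a usage sketch (the entrypoint name and worker count are illustrative, not part of Modal's API), the producer and consumer above can be driven from a local entrypoint, fanning out consumers with `.spawn()`:

```python
@app.local_entrypoint()
def run_pipeline():
    producer.remote()  # fill the queue first
    workers = [consumer.spawn() for _ in range(2)]  # run two consumers in parallel
    for w in workers:
        w.get()  # block until each consumer drains its share and exits
```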
### Parallel Processing

```python
# Map over inputs (auto-parallelized)
results = list(process.map(items))

# Spawn async jobs
calls = [process.spawn(item) for item in items]
results = [call.get() for call in calls]

# Batch processing (up to 1M inputs)
process.spawn_map(range(100_000))
```

### Scheduling

```python
@app.function(schedule=modal.Period(hours=1))
def hourly_job():
    pass

@app.function(schedule=modal.Cron("0 9 * * 1-5"))  # 9am on weekdays
def daily_report():
    pass
```

## CLI Commands

```bash
modal run app.py      # Run locally-triggered function
modal serve app.py    # Hot-reload web endpoints
modal deploy app.py   # Deploy persistently
modal shell app.py    # Interactive shell in container
modal app list        # List deployed apps
modal app logs        # Stream logs
modal volume list     # List volumes
modal secret list     # List secrets
```

## Common Patterns

### LLM Inference

```python
@app.cls(gpu="H100", image=image)
class LLM:
    @modal.enter()
    def load(self):
        from vllm import LLM
        self.llm = LLM("meta-llama/Llama-3-8B")

    @modal.method()
    def generate(self, prompt: str):
        return self.llm.generate(prompt)
```

### Download Models at Build Time

```python
def download_model():
    from huggingface_hub import snapshot_download
    snapshot_download("model-id", local_dir="/models")

image = (
    modal.Image.debian_slim()
    .pip_install("huggingface-hub")
    .run_function(download_model)
)
```

### Concurrency for I/O-bound Work

```python
@app.function()
@modal.concurrent(max_inputs=100)
async def fetch_url(url: str):
    import aiohttp

    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()
```

### Memory Snapshots (Faster Cold Starts)

```python
@app.cls(enable_memory_snapshot=True, gpu="A10G")
class FastModel:
    @modal.enter(snap=True)
    def load(self):
        self.model = load_model()  # this state is captured in the snapshot
```

## Autoscaling

```python
@app.function(
    min_containers=2,      # Always keep 2 warm
    max_containers=100,    # Scale up to 100
    buffer_containers=5,   # Extra buffer for bursts
    scaledown_window=300,  # Keep idle containers for 5 min
)
def serve():
    pass
```

## Best Practices

1. **Put imports inside functions** when packages aren't installed locally
2. **Use `@modal.enter()`** for expensive initialization (model loading)
3. **Pin dependency versions** for reproducible builds
4. **Use Volumes** for model weights and persistent data
5. **Use memory snapshots** for sub-second cold starts in production
6. **Set appropriate timeouts** for long-running tasks
7. **Use `min_containers=1`** for production APIs to keep containers warm
8. **Use absolute imports** with full package paths (not relative imports)
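A minimal sketch combining several of the practices above (the app name, volume name, weight path, and model code are illustrative):

```python
import modal

image = modal.Image.debian_slim(python_version="3.12").pip_install(
    "torch==2.4.0"  # pinned version for reproducible builds
)
weights = modal.Volume.from_name("model-weights", create_if_missing=True)
app = modal.App("prod-api", image=image)

@app.cls(gpu="L4", volumes={"/weights": weights}, min_containers=1, timeout=600)
class Server:
    @modal.enter()
    def load(self):
        import torch  # imported inside the function: not required locally

        # Load weights once per container from the shared Volume
        self.model = torch.load("/weights/model.pt")

    @modal.method()
    def predict(self, x):
        return self.model(x)
```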
### Fast Image Builds with uv_sync

Use `.uv_sync()` instead of `.pip_install()` for faster dependency installation:

```python
# In pyproject.toml, define dependency groups:
# [dependency-groups]
# modal = ["fastapi", "pydantic-ai>=1.0.0", "logfire"]

image = (
    modal.Image.debian_slim(python_version="3.12")
    .uv_sync("agent", groups=["modal"], frozen=False)
    .add_local_python_source("agent.src")  # Use dot notation for packages
)
```

**Key points:**

- Deploy from the project root: `modal deploy agent/src/api.py`
- Use dot notation in `.add_local_python_source("package.subpackage")`
- Imports must match: `from agent.src.config import ...` (not relative `from .config import ...`)

### Logfire Observability

Add observability with Logfire (especially for pydantic-ai):

```python
@app.cls(image=image, secrets=[..., modal.Secret.from_name("logfire")], min_containers=1)
class Web:
    @modal.enter()
    def startup(self):
        import logfire

        logfire.configure(
            send_to_logfire="if-token-present",
            environment="production",
            service_name="my-agent",
        )
        logfire.instrument_pydantic_ai()
        self.agent = create_agent()
```

## Reference Documentation

See `references/` for detailed guides on images, functions, GPUs, scaling, web endpoints, storage, dicts, queues, sandboxes, and networking.

Official docs: https://modal.com/docs