---
name: modal
description: Use when "Modal", "serverless GPU", "cloud GPU", "deploy ML model", or asking about "serverless containers", "GPU compute", "batch processing", "scheduled jobs", "autoscaling ML"
version: 1.0.0
---

# Modal Serverless Cloud Platform

Serverless Python execution with GPUs, autoscaling, and pay-per-use compute.

## When to Use

- Deploy and serve ML models (LLMs, image generation)
- Run GPU-accelerated computation
- Batch-process large datasets in parallel
- Schedule compute-intensive jobs
- Build serverless APIs with autoscaling

## Quick Start

```bash
# Install
pip install modal

# Authenticate
modal token new
```

```python
import modal

app = modal.App("my-app")

@app.function()
def hello():
    return "Hello from Modal!"

# Run with: modal run script.py
```

## Container Images

```python
# Build an image with dependencies; pin versions for reproducible builds
image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install("torch", "transformers", "numpy")
)

app = modal.App("ml-app", image=image)
```

## GPU Functions

```python
@app.function(gpu="H100")
def train_model():
    import torch
    assert torch.cuda.is_available()
    # GPU code here

# Available GPUs: T4, L4, A10G, A100, L40S, H100, H200, B200
# Multi-GPU: gpu="H100:8"
```

## Web Endpoints

```python
@app.function()
@modal.fastapi_endpoint(method="POST")  # replaces the deprecated modal.web_endpoint
def predict(data: dict):
    # Assumes `model` is loaded at container start (see ML Model Serving below)
    result = model.predict(data["input"])
    return {"prediction": result}

# Deploy: modal deploy script.py
```

## Scheduled Jobs

```python
@app.function(schedule=modal.Cron("0 2 * * *"))  # Daily at 2 AM UTC
def daily_backup():
    pass

@app.function(schedule=modal.Period(hours=4))  # Every 4 hours
def refresh_cache():
    pass
```

## Autoscaling

```python
@app.function()
def process_item(item_id: int):
    return item_id ** 2  # placeholder for real per-item work

@app.local_entrypoint()
def main():
    items = range(1000)
    # Automatically fanned out across containers
    results = list(process_item.map(items))
```

## Persistent Storage

```python
volume = modal.Volume.from_name("my-data", create_if_missing=True)

@app.function(volumes={"/data": volume})
def save_results(data):
    with open("/data/results.txt", "w") as f:
        f.write(data)
    volume.commit()  # Persist changes
```

## Secrets Management

```python
@app.function(secrets=[modal.Secret.from_name("huggingface")])
def download_model():
    import os
    token = os.environ["HF_TOKEN"]
```

## ML Model Serving

```python
@app.cls(gpu="L40S")
class Model:
    @modal.enter()
    def load_model(self):
        from transformers import pipeline
        self.pipe = pipeline("text-classification", device="cuda")

    @modal.method()
    def predict(self, text: str):
        return self.pipe(text)

@app.local_entrypoint()
def main():
    model = Model()
    result = model.predict.remote("Modal is great!")
```

## Resource Configuration

```python
@app.function(
    cpu=8.0,               # 8 CPU cores
    memory=32768,          # 32 GiB RAM (value is in MiB)
    ephemeral_disk=10240,  # 10 GiB disk (value is in MiB)
    timeout=3600,          # 1 hour timeout
)
def memory_intensive_task():
    pass
```

## Best Practices

1. **Pin dependencies** for reproducible builds
2. **Use appropriate GPU types** - L40S for inference, H100 for training
3. **Cache model weights in Volumes** to avoid re-downloading on every cold start (see the sketch at the end of this file)
4. **Use `.map()` for parallel processing**
5. **Import packages inside functions** if they are not installed locally
6. **Store secrets securely** - never hardcode API keys

## vs Alternatives

| Platform | Best For |
|----------|----------|
| **Modal** | Serverless GPUs, autoscaling, Python-native |
| RunPod | GPU rental, long-running jobs |
| AWS Lambda | CPU workloads, AWS ecosystem |
| Replicate | Model hosting, simple deployments |

## Resources

- Docs: https://modal.com/docs
- Examples: https://modal.com/docs/examples
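
## Sketch: Caching Model Weights in a Volume

Best practice 3 above combines several APIs shown earlier, so here is a minimal sketch of the pattern: download weights into a Volume once, then load them from disk in serving containers. The Volume name `model-cache`, the mount path `/cache`, and the model ID are illustrative choices, not part of Modal's API; the download uses `huggingface_hub.snapshot_download`.

```python
import modal

image = modal.Image.debian_slim(python_version="3.12").pip_install(
    "huggingface_hub", "transformers", "torch"
)
app = modal.App("weight-cache-demo", image=image)
cache = modal.Volume.from_name("model-cache", create_if_missing=True)

@app.function(volumes={"/cache": cache})
def warm_cache():
    from huggingface_hub import snapshot_download
    # Download once into the Volume; later containers reuse these files
    snapshot_download(
        "distilbert-base-uncased-finetuned-sst-2-english",  # illustrative model
        local_dir="/cache/sst2",
    )
    cache.commit()  # persist the downloaded weights

@app.cls(gpu="L40S", volumes={"/cache": cache})
class CachedModel:
    @modal.enter()
    def load(self):
        from transformers import pipeline
        # Load from the Volume path instead of re-downloading from the Hub
        self.pipe = pipeline("text-classification", model="/cache/sst2", device="cuda")

    @modal.method()
    def predict(self, text: str):
        return self.pipe(text)
```

Run `warm_cache` once (e.g. `modal run script.py::warm_cache`); after that, `CachedModel` containers start by reading weights from the Volume, which is typically much faster than pulling them over the network on every cold start.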