---
name: modal-knowledge
description: Comprehensive Modal.com platform knowledge covering all features, pricing, and best practices
---

# Modal Knowledge Skill

Activate this skill when users need detailed information about Modal's serverless cloud platform, including its features, pricing, and best practices.

## Activation Triggers

Activate this skill when users ask about:
- Modal.com platform features and capabilities
- GPU-accelerated Python functions
- Serverless container configuration
- Modal pricing and billing
- Modal CLI commands
- Web endpoints and APIs on Modal
- Scheduled/cron jobs on Modal
- Modal volumes, secrets, and storage
- Parallel processing with Modal
- Modal deployment and CI/CD

---

## Platform Overview

Modal is a serverless cloud platform for running Python code, optimized for AI/ML workloads with:

- **Zero Configuration**: Everything defined in Python code
- **Fast GPU Startup**: ~1 second container spin-up
- **Automatic Scaling**: Scale to zero, scale to thousands
- **Per-Second Billing**: Only pay for active compute
- **Multi-Cloud**: AWS, GCP, Oracle Cloud Infrastructure

---

## Core Components Reference

### Apps and Functions

```python
import modal

app = modal.App("app-name")

@app.function()
def basic_function(arg: str) -> str:
    return f"Result: {arg}"

@app.local_entrypoint()
def main():
    result = basic_function.remote("test")
    print(result)
```

### Function Decorator Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `image` | Image | Container image configuration |
| `gpu` | str/list | GPU type(s): "T4", "A100", ["H100", "A100"] |
| `cpu` | float | CPU cores (0.125 to 64) |
| `memory` | int | Memory in MB (128 to 262144) |
| `timeout` | int | Max execution seconds |
| `retries` | int | Retry attempts on failure |
| `secrets` | list | Secrets to inject |
| `volumes` | dict | Volume mount points |
| `schedule` | Cron/Period | Scheduled execution |
| `max_containers` | int | Max concurrent containers (formerly `concurrency_limit`) |
| `scaledown_window` | int | Seconds an idle container stays warm (formerly `container_idle_timeout`) |
| `include_source` | bool | Auto-sync source code |
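
The CPU and memory ranges in the table can be checked locally before a deploy. This is a minimal sketch in plain Python, assuming the limits listed above are current; `validate_resources` is a hypothetical helper, not part of the Modal SDK.

```python
# Hypothetical pre-flight check. The limits are copied from the table above
# and may not match Modal's current quotas.
CPU_RANGE = (0.125, 64.0)        # cores
MEMORY_RANGE_MB = (128, 262144)  # MB


def validate_resources(cpu: float, memory: int) -> dict:
    """Return kwargs for @app.function(), or raise ValueError if out of range."""
    if not CPU_RANGE[0] <= cpu <= CPU_RANGE[1]:
        raise ValueError(f"cpu={cpu} outside {CPU_RANGE}")
    if not MEMORY_RANGE_MB[0] <= memory <= MEMORY_RANGE_MB[1]:
        raise ValueError(f"memory={memory} outside {MEMORY_RANGE_MB}")
    return {"cpu": cpu, "memory": memory}


resources = validate_resources(cpu=2.0, memory=4096)
print(resources)
```

The returned dict can then be unpacked into the decorator, for example `@app.function(**resources)`.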

---

## GPU Reference

### Available GPUs

| GPU | Memory | Use Case | ~Cost/hr |
|-----|--------|----------|----------|
| T4 | 16 GB | Small inference | $0.59 |
| L4 | 24 GB | Medium inference | $0.80 |
| A10G | 24 GB | Inference/fine-tuning | $1.10 |
| L40S | 48 GB | Heavy inference | $1.50 |
| A100-40GB | 40 GB | Training | $2.00 |
| A100-80GB | 80 GB | Large models | $3.00 |
| H100 | 80 GB | Cutting-edge | $5.00 |
| H200 | 141 GB | Largest models | $5.00 |
| B200 | 180+ GB | Latest gen | $6.25 |
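
The table can be turned into a quick lookup for the lowest-cost GPU that fits a given memory requirement. A sketch under the assumption that the memory sizes and approximate prices above are accurate; `cheapest_gpu` is a hypothetical helper and the figures are copied from the table, not fetched from Modal.

```python
# Memory (GB) and approximate hourly cost (USD), copied from the table above.
GPUS = {
    "T4": (16, 0.59),
    "L4": (24, 0.80),
    "A10G": (24, 1.10),
    "L40S": (48, 1.50),
    "A100-40GB": (40, 2.00),
    "A100-80GB": (80, 3.00),
    "H100": (80, 5.00),
    "H200": (141, 5.00),
    "B200": (180, 6.25),  # table lists "180+ GB"; 180 is used as a lower bound
}


def cheapest_gpu(required_gb: float) -> str:
    """Return the lowest-cost GPU whose memory is at least required_gb."""
    candidates = [
        (cost, name) for name, (mem, cost) in GPUS.items() if mem >= required_gb
    ]
    if not candidates:
        raise ValueError(f"no single GPU offers {required_gb} GB")
    return min(candidates)[1]


print(cheapest_gpu(20))  # L4
print(cheapest_gpu(60))  # A100-80GB
```

Memory is only one constraint; throughput and availability may justify a more expensive card.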

### GPU Configuration

```python
# Single GPU
@app.function(gpu="A100")

# Specific memory variant
@app.function(gpu="A100-80GB")

# Multi-GPU
@app.function(gpu="H100:4")

# Fallbacks (tries in order)
@app.function(gpu=["H100", "A100", "any"])

# "any" = L4, A10G, or T4
@app.function(gpu="any")
```
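
When the GPU type and count come from configuration, the `"TYPE:COUNT"` strings shown above can be built programmatically. A small sketch; `gpu_spec` is a hypothetical helper that only formats strings and does not check that Modal offers the requested type or count.

```python
def gpu_spec(gpu_type: str, count: int = 1) -> str:
    """Build a GPU string in the "TYPE" or "TYPE:COUNT" form shown above."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return gpu_type if count == 1 else f"{gpu_type}:{count}"


# Fallback list: earlier entries are tried first.
preferred = [gpu_spec("H100"), gpu_spec("A100"), "any"]

print(gpu_spec("H100", 4))  # H100:4
print(preferred)            # ['H100', 'A100', 'any']
```

The result can be passed straight to the decorator, for example `@app.function(gpu=preferred)`.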

---

## Image Building

### Base Images

```python
# Debian slim (recommended)
modal.Image.debian_slim(python_version="3.11")

# From Dockerfile
modal.Image.from_dockerfile("./Dockerfile")

# From Docker registry
modal.Image.from_registry("nvidia/cuda:12.1.0-base-ubuntu22.04")
```

### Package Installation

```python
# pip (standard)
image.pip_install("torch", "transformers")

# uv (usually much faster than pip)
image.uv_pip_install("torch", "transformers")

# System packages
image.apt_install("ffmpeg", "libsm6")

# Shell commands
image.run_commands("apt-get update", "make install")
```

### Adding Files

```python
# Single file
image.add_local_file("./config.json", "/app/config.json")

# Directory
image.add_local_dir("./models", "/app/models")

# Python source
image.add_local_python_source("my_module")

# Environment variables
image.env({"VAR": "value"})
```

### Build-Time Function

```python
def download_model():
    from huggingface_hub import snapshot_download
    snapshot_download("model-name")

image.run_function(download_model, secrets=[...])
```

---

## Storage

### Volumes

```python
# Create/reference volume
vol = modal.Volume.from_name("my-vol", create_if_missing=True)

# Mount in function
@app.function(volumes={"/data": vol})
def func():
    # Read/write to /data
    vol.commit()  # Persist changes
```

### Secrets

```python
# From dashboard (recommended)
modal.Secret.from_name("secret-name")

# From dictionary
modal.Secret.from_dict({"KEY": "value"})

# From local env
modal.Secret.from_local_environ(["KEY1", "KEY2"])

# From .env file
modal.Secret.from_dotenv()

# Usage
@app.function(secrets=[modal.Secret.from_name("api-keys")])
def func():
    import os
    key = os.environ["API_KEY"]
```
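
Secrets reach the function as environment variables, so a missing or misnamed secret surfaces as a bare `KeyError`. A minimal sketch of a clearer failure mode in plain Python; `require_env` is a hypothetical helper, and the assignment to `os.environ` only simulates what an attached secret would provide.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; check that the secret is attached to the function"
        )
    return value


# Simulate an attached secret by populating the environment directly.
os.environ["API_KEY"] = "example-value"
print(require_env("API_KEY"))  # example-value
```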

### Dict and Queue

```python
# Distributed dict
d = modal.Dict.from_name("cache", create_if_missing=True)
d["key"] = "value"
d.put("key", "value")  # Same effect as item assignment

# Distributed queue
q = modal.Queue.from_name("jobs", create_if_missing=True)
q.put("task")
item = q.get()
```

---

## Web Endpoints

### FastAPI Endpoint (Simple)

```python
@app.function()
@modal.fastapi_endpoint()
def hello(name: str = "World"):
    return {"message": f"Hello, {name}!"}
```

### ASGI App (Full FastAPI)

```python
from fastapi import FastAPI
web_app = FastAPI()

@web_app.post("/predict")
def predict(text: str):
    return {"result": process(text)}

@app.function()
@modal.asgi_app()
def fastapi_app():
    return web_app
```

### WSGI App (Flask)

```python
from flask import Flask
flask_app = Flask(__name__)

@app.function()
@modal.wsgi_app()
def flask_endpoint():
    return flask_app
```

### Custom Web Server

```python
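import subprocess

@app.function()
@modal.web_server(port=8000)
def custom_server():
    # Start the server in the background and return; a blocking call would never finish startup
    subprocess.Popen(["python", "-m", "http.server", "8000"])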
```

### Custom Domains

```python
@modal.asgi_app(custom_domains=["api.example.com"])
```

---

## Scheduling

### Cron

```python
# Daily at 8 AM UTC
@app.function(schedule=modal.Cron("0 8 * * *"))

# With timezone
@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York"))
```
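
To sanity-check when a cron string fires, the five fields (minute, hour, day of month, month, day of week) can be compared against a timestamp. This is a deliberately simplified sketch in plain Python: it handles only `*` and single integers, not ranges, lists, or steps, and `cron_matches` is a hypothetical helper unrelated to how Modal evaluates schedules.

```python
from datetime import datetime


def cron_matches(expr: str, when: datetime) -> bool:
    """Check a datetime against a 5-field cron expression.

    Simplified: each field must be "*" or a single integer.
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 space-separated fields")
    actual = [
        when.minute,
        when.hour,
        when.day,
        when.month,
        (when.weekday() + 1) % 7,  # cron uses 0 = Sunday; Python uses 0 = Monday
    ]
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


print(cron_matches("0 8 * * *", datetime(2025, 1, 1, 8, 0)))   # True
print(cron_matches("0 8 * * *", datetime(2025, 1, 1, 8, 30)))  # False
```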

### Period

```python
# Every 5 hours
@app.function(schedule=modal.Period(hours=5))

# Every day
@app.function(schedule=modal.Period(days=1))
```

**Note:** Scheduled functions only run with `modal deploy`, not `modal run`.

---

## Parallel Processing

### Map

```python
# Parallel execution (up to 1000 concurrent)
results = list(func.map(items))

# Unordered (faster)
results = list(func.map(items, order_outputs=False))
```
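
The difference between ordered and unordered results can be seen locally with the standard library. This is an analogy only, using threads rather than Modal containers: `pool.map` plays the role of `func.map(items)` and `as_completed` plays the role of `order_outputs=False`.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def square(x: int) -> int:
    time.sleep(random.uniform(0, 0.05))  # simulate uneven task durations
    return x * x


items = list(range(8))

with ThreadPoolExecutor(max_workers=4) as pool:
    # Ordered: results come back in input order.
    ordered = list(pool.map(square, items))

    # Unordered: results are collected as each task finishes.
    futures = [pool.submit(square, x) for x in items]
    unordered = [f.result() for f in as_completed(futures)]

print(ordered)
print(sorted(unordered) == ordered)  # True: same results, possibly different order
```

Unordered collection avoids waiting on a slow early item before later results can be consumed.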

### Starmap

```python
# Spread args
pairs = [(1, 2), (3, 4)]
results = list(add.starmap(pairs))
```

### Spawn

```python
# Async job (returns immediately)
call = func.spawn(data)
result = call.get()  # Get result later

# Spawn many
calls = [func.spawn(item) for item in items]
results = [call.get() for call in calls]
```
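
The spawn-then-get pattern mirrors futures in the standard library. A local analogy, not Modal code: `submit` returns a handle immediately and `result()` blocks until the value is ready. `slow_double` is a hypothetical stand-in for a remote function.

```python
from concurrent.futures import ThreadPoolExecutor


def slow_double(x: int) -> int:
    return 2 * x


with ThreadPoolExecutor() as pool:
    handle = pool.submit(slow_double, 21)  # returns immediately, like func.spawn(data)
    # ... other work can happen here ...
    result = handle.result()               # blocks until done, like call.get()

print(result)  # 42
```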

---

## Container Lifecycle (Classes)

```python
@app.cls(gpu="A100", scaledown_window=300)  # scaledown_window: formerly container_idle_timeout
class Server:

    @modal.enter()
    def load(self):
        self.model = load_model()

    @modal.method()
    def predict(self, text):
        return self.model(text)

    @modal.exit()
    def cleanup(self):
        del self.model
```
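
The point of the lifecycle hooks is that setup runs once per container rather than once per call. The sketch below simulates that sequence in plain Python with no Modal decorators; the method names mirror the class above, and `str.upper` is a placeholder for a real model.

```python
class Server:
    """Plain-Python stand-in for the class above."""

    load_count = 0

    def load(self):  # plays the role of the @modal.enter() hook
        Server.load_count += 1
        self.model = str.upper  # placeholder "model"

    def predict(self, text: str) -> str:  # plays the role of @modal.method()
        return self.model(text)

    def cleanup(self):  # plays the role of the @modal.exit() hook
        del self.model


# Simulated container: enter once, serve many inputs, exit once.
server = Server()
server.load()
outputs = [server.predict(t) for t in ["a", "b", "c"]]
server.cleanup()

print(outputs)            # ['A', 'B', 'C']
print(Server.load_count)  # 1
```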

### Concurrency

```python
# Apply @modal.concurrent to the class, not to individual methods
@app.cls()
@modal.concurrent(max_inputs=100, target_inputs=80)
class Worker:
    @modal.method()
    def handle(self, item):
        pass
```

---

## CLI Commands

### Development

```bash
modal run app.py              # Run function
modal serve app.py            # Hot-reload dev server
modal shell app.py            # Interactive shell
modal shell app.py --gpu A100 # Shell with GPU
```

### Deployment

```bash
modal deploy app.py           # Deploy
modal app list                # List apps
modal app logs app-name       # View logs
modal app stop app-name       # Stop app
```

### Resources

```bash
# Volumes
modal volume create name
modal volume list
modal volume put name local remote
modal volume get name remote local

# Secrets
modal secret create name KEY=value
modal secret list

# Environments
modal environment create staging
```

---

## Pricing (2025)

### Plans

| Plan | Price | Containers | GPU Concurrency |
|------|-------|------------|-----------------|
| Starter | Free ($30/month in credits) | 100 | 10 |
| Team | $250/month | 1000 | 50 |
| Enterprise | Custom | Unlimited | Custom |

### Compute

- **CPU**: $0.0000131/core/sec
- **Memory**: $0.00000222/GiB/sec
- **GPUs**: See GPU table above
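
A rough per-run estimate follows from the rates above. This sketch assumes CPU, memory, and GPU charges are simply additive and uses the approximate hourly GPU prices from the GPU table; `estimate_cost` is a hypothetical helper, and actual bills depend on Modal's current pricing.

```python
CPU_PER_CORE_SEC = 0.0000131  # USD, from the list above
MEM_PER_GIB_SEC = 0.00000222  # USD, from the list above


def estimate_cost(seconds: float, cpu_cores: float, memory_gib: float,
                  gpu_per_hour: float = 0.0) -> float:
    """Estimate the cost in USD of one container running for `seconds`."""
    cpu = CPU_PER_CORE_SEC * cpu_cores * seconds
    memory = MEM_PER_GIB_SEC * memory_gib * seconds
    gpu = gpu_per_hour / 3600 * seconds
    return cpu + memory + gpu


# One hour on a T4 (~$0.59/hr in the GPU table) with 2 cores and 4 GiB.
print(f"${estimate_cost(3600, 2, 4, gpu_per_hour=0.59):.2f}")  # $0.72
```

In this example the GPU dominates: CPU and memory together add roughly $0.13 to the $0.59 GPU charge.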

### Special Programs

- Startups: Up to $25k credits
- Researchers: Up to $10k credits

---

## Best Practices

1. **Use `@modal.enter()`** for model loading
2. **Use `uv_pip_install`** for faster builds
3. **Use GPU fallbacks** for availability
4. **Set appropriate timeouts** and retries
5. **Use environments** (dev/staging/prod)
6. **Download models during build**, not runtime
7. **Use `order_outputs=False`** when order doesn't matter
8. **Set `scaledown_window`** (formerly `container_idle_timeout`) to balance cost/latency
9. **Monitor costs** in Modal dashboard
10. **Test with `modal run`** before `modal deploy`

---

## Common Patterns

### LLM Inference

```python
@app.cls(gpu="A100", scaledown_window=300)  # scaledown_window: formerly container_idle_timeout
class LLM:
    @modal.enter()
    def load(self):
        from vllm import LLM
        self.llm = LLM(model="...")

    @modal.method()
    def generate(self, prompt):
        return self.llm.generate([prompt])
```

### Batch Processing

```python
@app.function(volumes={"/data": vol})
def process(file):
    # Process file
    vol.commit()

# Parallel
results = list(process.map(files))
```

### Scheduled ETL

```python
@app.function(
    schedule=modal.Cron("0 6 * * *"),
    secrets=[modal.Secret.from_name("db")]
)
def daily_etl():
    extract()
    transform()
    load()
```

---

## Quick Reference

| Task | Code |
|------|------|
| Create app | `app = modal.App("name")` |
| Basic function | `@app.function()` |
| With GPU | `@app.function(gpu="A100")` |
| With image | `@app.function(image=img)` |
| Web endpoint | `@modal.asgi_app()` |
| Scheduled | `schedule=modal.Cron("...")` |
| Mount volume | `volumes={"/path": vol}` |
| Use secret | `secrets=[modal.Secret.from_name("x")]` |
| Parallel map | `func.map(items)` |
| Async spawn | `func.spawn(arg)` |
| Class pattern | `@app.cls()` with `@modal.enter()` |