--- name: "daft-udf-tuning" description: "Optimize Daft UDF performance. Invoke when user needs GPU inference, encounters slow UDFs, or asks about async/batch processing." --- # Daft UDF Tuning Optimize User-Defined Functions for performance. ## UDF Types | Type | Decorator | Use Case | |---|---|---| | **Stateless** | `@daft.func` | Simple transforms. Use `async` for I/O-bound tasks. | | **Stateful** | `@daft.cls` | Expensive init (e.g., loading models). Supports `gpus=N`. | | **Batch** | `@daft.func.batch` | Vectorized CPU/GPU ops (NumPy/PyTorch). Faster. | ## Quick Recipes ### 1. Async I/O (Web APIs) ```python @daft.func async def fetch(url: str): async with aiohttp.ClientSession() as s: return await s.get(url).text() ``` ### 2. GPU Batch Inference (PyTorch/Models) ```python @daft.cls(gpus=1) class Classifier: def __init__(self): self.model = load_model().cuda() # Run once per worker @daft.method.batch(batch_size=32) def predict(self, images): return self.model(images.to_pylist()) # Run with concurrency df.with_column("preds", Classifier(max_concurrency=4).predict(df["img"])) ``` ## Tuning Keys - **`max_concurrency`**: Total parallel UDF instances. - **`gpus=N`**: GPU request per instance. - **`batch_size`**: Rows per call. Too small = overhead; too big = OOM. - **`into_batches(N)`**: Pre-slice partitions if memory is tight.