--- description: "Architecture overview of NeMo Curator's Ray-based distributed video curation system with autoscaling" categories: ["concepts-architecture"] tags: ["architecture", "distributed", "ray", "autoscaling", "video-curation", "performance"] personas: ["data-scientist-focused", "mle-focused", "admin-focused"] difficulty: "intermediate" content_type: "concept" modality: "video-only" only: not ga --- # Architecture NeMo Curator's video curation system builds on Ray, a distributed framework for scalable, high‑throughput data processing across machine clusters. ## Ray Foundation NeMo Curator leverages two essential Ray Core capabilities: - **Distributed Actor Management**: Creates and manages Ray actors across a cluster. Cosmos-Xenna supports per-stage runtime environments. In Curator today, per-stage `runtime_env` is not user-configurable through stage specs; the integration sets only limited executor-level environment variables. - **Ray Object Store and References**: Uses Ray's object store and data references to reduce data movement and increase throughput. ![Ray Architecture](/assets/images/stages-pipelines-diagram.png) ## Execution and Auto Scaling Curator runs pipelines through an executor. The Cosmos-Xenna executor (`XennaExecutor`) translates `ProcessingStage` definitions into Cosmos-Xenna stage specifications and runs them on Ray in either streaming or batch mode. During streaming execution, the auto-scaling mechanism: - Monitors each stage's throughput - Dynamically adjusts worker allocation - Optimizes pipeline performance by balancing resources across stages This dynamic scaling reduces bottlenecks and uses hardware efficiently for large-scale video curation tasks. Key executor configuration (actual keys): - `logging_interval`: Seconds between status logs (default: 60) - `ignore_failures`: Continue on failures (default: False) - `execution_mode`: "streaming" or "batch" (default: "streaming") - `cpu_allocation_percentage`: CPU allocation ratio (default: 0.95) - `autoscale_interval_s`: Auto-scaling interval in seconds (applies in streaming mode; default: 180) Use `Pipeline.describe()` to review stage resources and input/output requirements at a glance during development.