---
description: "Key abstractions in video curation including stages, pipelines, and execution modes for scalable processing"
categories: ["concepts-architecture"]
tags: ["abstractions", "pipeline", "stages", "video-curation", "distributed", "ray"]
personas: ["data-scientist-focused", "mle-focused"]
difficulty: "intermediate"
content_type: "concept"
modality: "video-only"
only: not ga
---

# Key Abstractions

NeMo Curator introduces core abstractions to organize and scale video curation workflows:

- **Pipelines**: Ordered sequences of stages forming an end-to-end workflow.
- **Stages**: Individual processing units that perform a single step (for example, reading, splitting, format conversion, filtering, embedding, captioning, writing).
- **Tasks**: The unit of data that flows through a pipeline (for video, `VideoTask` holding a `Video` and its `Clip` objects).
- **Executors**: Components that run pipelines on a backend (Ray) with automatic scaling.

![Stages and Pipelines](/assets/images/stages-pipelines-diagram.png)

## Pipelines

A pipeline orchestrates stages into an end-to-end workflow. Key characteristics:

- **Stage Sequence**: Stages must follow a logical order where each stage's output feeds into the next
- **Input Configuration**: Specifies the data source location
- **Stage Configuration**: Stages accept their own parameters, including model paths and algorithm settings
- **Execution Mode**: Supports streaming and batch processing through the executor

## Stages

A stage represents a single step in your data curation workflow. Video stages are organized into several functional categories:

- **Input/Output**: Read video files and write processed outputs to storage ([Save & Export Documentation](/curate-video/save-export))
- **Video Clipping**: Split videos into clips using fixed stride or scene-change detection ([Video Clipping Documentation](/curate-video/process-data/clipping))
- **Frame Extraction**: Extract frames from videos or clips for analysis and embeddings ([Frame Extraction Documentation](/curate-video/process-data/frame-extraction))
- **Embedding Generation**: Generate clip-level embeddings using Cosmos-Embed1 models ([Embeddings Documentation](/curate-video/process-data/embeddings))
- **Filtering**: Filter clips based on motion analysis and aesthetic quality scores ([Filtering Documentation](/curate-video/process-data/filtering))
- **Caption and Preview**: Generate captions and preview images from video clips ([Captions & Preview Documentation](/curate-video/process-data/captions-preview))
- **Deduplication**: Remove near-duplicate clips using embedding-based clustering ([Duplicate Removal Documentation](/curate-video/process-data/dedup))

### Stage Architecture

Each processing stage:

1. Inherits from `ProcessingStage`
2. Declares a stable `name` and `resources: Resources` (CPU cores, GPU memory, or multiple GPUs)
3. Defines `inputs()`/`outputs()` to document required attributes and produced attributes on tasks
4. Implements `setup(worker_metadata)` for model initialization and `process(task)` to transform tasks

This design enables map-style execution with executor-managed fault tolerance and dynamic scaling per stage. Stages can optionally provide `process_batch()` to support vectorized batch processing.

Composite stages provide a user-facing convenience API and decompose into one or more execution stages at build time.

```python
class MyStage(ProcessingStage[X, Y]):
    name: str = "..."
    resources: Resources = Resources(...)

    def inputs(self) -> tuple[list[str], list[str]]: ...
    def outputs(self) -> tuple[list[str], list[str]]: ...

    def setup(self, worker_metadata: WorkerMetadata | None = None) -> None: ...
    def process(self, task: X) -> Y | list[Y]: ...
```

Refer to the stage base and resources definitions in Curator for full details.

### Resource Semantics

`Resources` support both fractional and whole‑GPU semantics:

- `gpu_memory_gb`: Request a fraction of a single GPU by memory; Curator rounds to a fractional GPU share and enforces that `gpu_memory_gb` stays within one device.
- `gpus`: Request one or more GPUs for a stage that is GPU aware.

Choose one of `gpu_memory_gb` (single‑GPU fractional) or `gpus` (one or more full GPUs). Combining both is not allowed.

## Tasks

Video pipelines operate on task types defined in Curator:

- `VideoTask`: Wraps a single input `Video`
- `Video`: Holds decoded metadata, frames, and lists of `Clip`
- `Clip`: Holds buffers, extracted frames, embeddings, and caption windows

Stages transform tasks stage by stage (for example, `VideoReader` populates `Video`, splitting stages create `Clip` objects, embedding and captioning stages annotate clips, and writer stages persist outputs).

## Executors

Executors run pipelines on a backend. Curator uses [`XennaExecutor`](https://github.com/nvidia-cosmos/cosmos-xenna) to translate `ProcessingStage` definitions into Cosmos-Xenna stage specifications and run them on Ray with automatic scaling. Execution modes include streaming (default) and batch.