---
description: "Step-by-step guide to installing Curator and running your first video curation pipeline"
categories: ["getting-started"]
tags: ["video-curation", "installation", "quickstart", "gpu-accelerated", "ray", "python"]
personas: ["data-scientist-focused", "mle-focused"]
difficulty: "beginner"
content_type: "tutorial"
modality: "video-only"
---
# Get Started with Video Curation
This guide shows how to install Curator and run your first video curation pipeline.
The [example pipeline](#run-the-splitting-pipeline-example) processes a list of videos, splitting each into 10‑second clips using a fixed stride. It then generates clip‑level embeddings for downstream tasks such as duplicate removal and similarity search.
## Overview
This quickstart guide demonstrates how to:
1. **Install NeMo Curator** with video processing support
2. **Set up FFmpeg** with GPU-accelerated encoding
3. **Configure embedding models** (Cosmos-Embed1)
4. **Process videos** through a complete splitting and embedding pipeline
5. **Generate outputs** ready for duplicate removal, captioning, and model training
**What you build:** A video processing pipeline that:
- Splits videos into 10-second clips using fixed stride or scene detection
- Generates clip-level embeddings for similarity search and deduplication
- Optionally creates captions and preview images
- Outputs results in formats compatible with multimodal training workflows
## Prerequisites
### System Requirements
To use NeMo Curator's video curation capabilities, ensure your system meets these requirements:
#### Operating System
* **Ubuntu 24.04, 22.04, or 20.04** (required for GPU-accelerated video processing)
* Other Linux distributions may work but are not officially supported
#### Python Environment
* **Python 3.10, 3.11, or 3.12**
* **uv package manager** for dependency management
* **Git** for model and repository dependencies
#### GPU Requirements
* **NVIDIA GPU required** (CPU-only mode not supported for video processing)
* **Architecture**: Volta™ or newer (compute capability 7.0+)
- Examples: V100, T4, RTX 2080+, A100, H100
* **CUDA**: Version 12.0 or above
* **VRAM**: Minimum requirements by configuration:
- Basic splitting + embedding: ~16GB VRAM
- Full pipeline (splitting + embedding + captioning): ~38GB VRAM
- Reduced configuration (lower batch sizes, FP8): ~21GB VRAM
#### Software Dependencies
* **FFmpeg 8.0+** with one of the following encoders:
- GPU encoder: `h264_nvenc` (recommended for performance; requires an NVENC-equipped GPU — note that A100 and H100 do **not** include NVENC)
- CPU encoder: `libvpx-vp9` (for non-NVENC GPUs; produces VP9 in `.mp4`)
If `uv` is not installed, refer to the [Installation Guide](/get-started/installation) for setup instructions, or install it quickly with:
```bash
curl -LsSf https://astral.sh/uv/0.8.22/install.sh | sh
source $HOME/.local/bin/env
```
---
## Install
Create and activate a virtual environment, then choose an install option:
```bash
uv pip install torch wheel_stub psutil setuptools setuptools_scm
uv pip install --no-build-isolation "nemo-curator[video_cuda12]"
```
```bash
git clone https://github.com/NVIDIA-NeMo/Curator.git
cd Curator
uv sync --extra video_cuda12 --all-groups
source .venv/bin/activate
```
NeMo Curator is available as a standalone container:
```bash
# Pull the container
docker pull nvcr.io/nvidia/nemo-curator:{{ container_version }}
# Run the container
docker run --gpus all -it --rm nvcr.io/nvidia/nemo-curator:{{ container_version }}
```
For details on container environments and configurations, see [Container Environments](/reference/infra/container-environments).
## Install FFmpeg and Encoders
Curator’s video pipelines rely on `FFmpeg` for decoding and encoding. If you plan to encode clips (using `--transcode-encoder h264_nvenc` or `--transcode-encoder libvpx-vp9`), install `FFmpeg` with NVENC and `libvpx-vp9` support. The maintained install script bundles both.
Use the maintained script in the repository to build and install `FFmpeg` with NVIDIA NVENC and `libvpx-vp9` support. The script enables `--enable-cuda-nvcc`, `--enable-libnpp`, and `--enable-libvpx`.
- Script source: [docker/common/install_ffmpeg.sh](https://github.com/NVIDIA-NeMo/Curator/blob/main/docker/common/install_ffmpeg.sh)
```bash
curl -fsSL https://raw.githubusercontent.com/NVIDIA-NeMo/Curator/main/docker/common/install_ffmpeg.sh -o install_ffmpeg.sh
chmod +x install_ffmpeg.sh
sudo bash install_ffmpeg.sh
```
Confirm that `FFmpeg` is on your `PATH` and that at least one supported encoder is available:
```bash
ffmpeg -hide_banner -version | head -n 5
ffmpeg -encoders | grep -E "h264_nvenc|libvpx-vp9" | cat
```
If encoders are missing, reinstall `FFmpeg` with the required options or use the Debian/Ubuntu script above.
**Processing H.264/HEVC/AV1 inputs? You might still need a software decoder — even with NVENC/NVDEC.**
Curator runs `ffprobe` inside CPU-only Ray actors (`VideoReader`, `ClipWriter`) for metadata extraction. Those actors can't open NVDEC decoders, so without a software h264/hevc/av1 decoder your inputs are silently skipped (`SoftwareCodecMissingError` in the logs).
Run the bundled installer inside the container to add software decoder support — no image rebuild needed:
```bash
bash /opt/Curator/docker/common/install_h264_support.sh
```
See [Software H.264/HEVC/AV1 Codec Support](/get-started/installation#software-h264hevcav1-codec-support-advanced) for the full picture.
Refer to [Clip Encoding](/curate-video/process-data/transcoding) to choose encoders and verify NVENC support on your system.
### Available Models
Embeddings convert each video clip into a numeric vector that captures visual and semantic content. Curator uses these vectors to:
- Remove near-duplicate clips during duplicate removal
- Enable similarity search and clustering
- Support downstream analysis such as caption verification
NeMo Curator supports two embedding model families:
#### Cosmos-Embed1 (Recommended)
**Cosmos-Embed1 (default)**: Available in three variants—**cosmos-embed1-224p**, **cosmos-embed1-336p**, and **cosmos-embed1-448p**—which differ in input resolution and accuracy/VRAM tradeoff. All variants are automatically downloaded to `MODEL_DIR` on first run.
| Model Variant | Resolution | VRAM Usage | Speed | Accuracy | Best For |
|---------------|------------|------------|-------|----------|----------|
| **cosmos-embed1-224p** | 224×224 | ~8GB | Fastest | Good | Large-scale processing, initial curation |
| **cosmos-embed1-336p** | 336×336 | ~12GB | Medium | Better | Balanced performance and quality |
| **cosmos-embed1-448p** | 448×448 | ~16GB | Slower | Best | High-quality embeddings, fine-grained matching |
**Model links:**
- [cosmos-embed1-224p on Hugging Face](https://huggingface.co/nvidia/cosmos-embed1-224p)
- [cosmos-embed1-336p on Hugging Face](https://huggingface.co/nvidia/cosmos-embed1-336p)
- [cosmos-embed1-448p on Hugging Face](https://huggingface.co/nvidia/cosmos-embed1-448p)
For this quickstart, the following steps set up support for **Cosmos-Embed1-224p**.
### Prepare Model Weights
For most use cases, you only need to create a model directory. The required model files will be downloaded automatically on first run.
1. Create a model directory:
```bash
mkdir -p "$MODEL_DIR"
```
You can reuse the same `` across runs.
2. No additional setup is required. The model will be downloaded automatically when first used.
## Set Up Data Directories
Organize input videos and output locations before running the pipeline.
- **Local**: For local file processing. Define paths like:
```bash
DATA_DIR=/path/to/videos
OUT_DIR=/path/to/output_clips
MODEL_DIR=/path/to/models
```
- **S3**: For cloud storage (AWS S3, MinIO, etc.). Configure credentials in `~/.aws/credentials` and use `s3://` paths for `--video-dir` and `--output-clip-path`.
**S3 usage notes:**
- Input videos can be read from S3 paths
- Output clips can be written to S3 paths
- Model directory should remain local for performance
- Ensure IAM permissions allow read/write access to specified buckets
## Run the Splitting Pipeline Example
Use the example script from https://github.com/NVIDIA-NeMo/Curator/tree/main/tutorials/video/getting-started to read videos, split into clips, and write outputs. This runs a Ray pipeline with `XennaExecutor` under the hood.
```bash
python tutorials/video/getting-started/video_split_clip_example.py \
--video-dir "$DATA_DIR" \
--model-dir "$MODEL_DIR" \
--output-clip-path "$OUT_DIR" \
--splitting-algorithm fixed_stride \
--fixed-stride-split-duration 10.0 \
--embedding-algorithm cosmos-embed1-224p \
--transcode-encoder h264_nvenc \
--verbose
```
**What this command does:**
1. Reads all video files from `$DATA_DIR`
2. Splits each video into 10-second clips using fixed stride
3. Generates embeddings using Cosmos-Embed1-224p model
4. Encodes clips using h264_nvenc codec
5. Writes output clips and metadata to `$OUT_DIR`
**Using a config file**: The example script accepts many command-line arguments. For complex configurations, you can store arguments in a file and pass them with the `@` prefix:
echo '--video-dir /data/videos
--output-clip-path /data/output
--splitting-algorithm fixed_stride
--fixed-stride-split-duration 10.0
--embedding-algorithm cosmos-embed1-224p
--transcode-encoder h264_nvenc' > my_config.txt
python tutorials/video/getting-started/video_split_clip_example.py @my_config.txt
### Configuration Options Reference
| Option | Values | Description |
|--------|--------|-------------|
| **Splitting** |
| `--splitting-algorithm` | `fixed_stride`, `transnetv2` | Method for dividing videos into clips |
| `--fixed-stride-split-duration` | Float (seconds) | Clip length for fixed stride (default: 10.0) |
| `--transnetv2-frame-decoder-mode` | `pynvc`, `ffmpeg_gpu`, `ffmpeg_cpu` | Frame decoding method for TransNetV2 |
| **Embedding** |
| `--embedding-algorithm` | `cosmos-embed1-224p`, `cosmos-embed1-336p`, `cosmos-embed1-448p` | Embedding model to use |
| **Encoding** |
| `--transcode-encoder` | `h264_nvenc`, `libvpx-vp9`, `libopenh264` | Video encoder for output clips. Use `libvpx-vp9` (CPU) on GPUs without NVENC such as A100/H100. `libopenh264` is opt-in — run `install_h264_support.sh --with-libopenh264` inside the container or provide a system FFmpeg that includes it. See [Software H.264/HEVC/AV1 Codec Support](/get-started/installation#software-h264hevcav1-codec-support-advanced). |
| `--transcode-use-hwaccel` | Flag | Enable hardware acceleration for encoding (only valid with `h264_nvenc`). |
| **Optional Features** |
| `--generate-captions` | Flag | Generate text captions for each clip |
| `--generate-previews` | Flag | Create preview images for each clip |
| `--verbose` | Flag | Enable detailed logging output |
### Understanding Pipeline Output
After successful execution, the output directory will contain:
```
$OUT_DIR/
├── clips/
│ ├── video1_clip_0000.mp4
│ ├── video1_clip_0001.mp4
│ └── ...
├── embeddings/
│ ├── video1_clip_0000.npy
│ ├── video1_clip_0001.npy
│ └── ...
├── metadata/
│ └── manifest.jsonl
└── previews/ (if --generate-previews enabled)
├── video1_clip_0000.jpg
└── ...
```
**File descriptions:**
- **clips/**: Encoded video clips (MP4 format)
- **embeddings/**: Numpy arrays containing clip embeddings (for similarity search)
- **metadata/manifest.jsonl**: JSONL file with clip metadata (paths, timestamps, embeddings)
- **previews/**: Thumbnail images for each clip (optional)
**Example manifest entry:**
```json
{
"video_path": "/data/input_videos/video1.mp4",
"clip_path": "/data/output_clips/clips/video1_clip_0000.mp4",
"start_time": 0.0,
"end_time": 10.0,
"embedding_path": "/data/output_clips/embeddings/video1_clip_0000.npy",
"preview_path": "/data/output_clips/previews/video1_clip_0000.jpg"
}
```
## Best Practices
### Data Preparation
- **Validate input videos**: Ensure videos are not corrupted before processing
- **Consistent formats**: Convert videos to a standard format (MP4 with H.264) for consistent results
- **Organize by content**: Group similar videos together for efficient processing
### Model Selection
- **Start with Cosmos-Embed1-224p**: Best balance of speed and quality for initial experiments
- **Upgrade resolution as needed**: Use 336p or 448p only when higher precision is required
- **Monitor VRAM usage**: Check GPU memory with `nvidia-smi` during processing
### Pipeline Configuration
- **Enable verbose logging**: Use `--verbose` flag for debugging and monitoring
- **Test on small subset**: Run pipeline on 5-10 videos before processing large datasets
- **Use GPU encoding**: Enable NVENC for significant performance improvements
- **Save intermediate results**: Keep embeddings and metadata for downstream tasks
### Infrastructure
- **Use shared storage**: Mount shared filesystem for multi-node processing
- **Allocate sufficient VRAM**: Plan for peak usage (captioning + embedding)
- **Monitor GPU utilization**: Use `nvidia-smi dmon` to track GPU usage during processing
- **Schedule long-running jobs**: Process large video datasets in batch jobs overnight
## Next Steps
Explore the [Video Curation documentation](/curate-video). For encoding guidance, refer to [Clip Encoding](/curate-video/process-data/transcoding).