--- description: "Step-by-step guide to installing Curator and running your first video curation pipeline" categories: ["getting-started"] tags: ["video-curation", "installation", "quickstart", "gpu-accelerated", "ray", "python"] personas: ["data-scientist-focused", "mle-focused"] difficulty: "beginner" content_type: "tutorial" modality: "video-only" --- # Get Started with Video Curation This guide shows how to install Curator and run your first video curation pipeline. The [example pipeline](#run-the-splitting-pipeline-example) processes a list of videos, splitting each into 10‑second clips using a fixed stride. It then generates clip‑level embeddings for downstream tasks such as duplicate removal and similarity search. ## Overview This quickstart guide demonstrates how to: 1. **Install NeMo Curator** with video processing support 2. **Set up FFmpeg** with GPU-accelerated encoding 3. **Configure embedding models** (Cosmos-Embed1) 4. **Process videos** through a complete splitting and embedding pipeline 5. **Generate outputs** ready for duplicate removal, captioning, and model training **What you build:** A video processing pipeline that: - Splits videos into 10-second clips using fixed stride or scene detection - Generates clip-level embeddings for similarity search and deduplication - Optionally creates captions and preview images - Outputs results in formats compatible with multimodal training workflows ## Prerequisites ### System Requirements To use NeMo Curator's video curation capabilities, ensure your system meets these requirements: #### Operating System * **Ubuntu 24.04, 22.04, or 20.04** (required for GPU-accelerated video processing) * Other Linux distributions may work but are not officially supported #### Python Environment * **Python 3.10, 3.11, or 3.12** * **uv package manager** for dependency management * **Git** for model and repository dependencies #### GPU Requirements * **NVIDIA GPU required** (CPU-only mode not supported for video processing) * **Architecture**: Volta™ or newer (compute capability 7.0+) - Examples: V100, T4, RTX 2080+, A100, H100 * **CUDA**: Version 12.0 or above * **VRAM**: Minimum requirements by configuration: - Basic splitting + embedding: ~16GB VRAM - Full pipeline (splitting + embedding + captioning): ~38GB VRAM - Reduced configuration (lower batch sizes, FP8): ~21GB VRAM #### Software Dependencies * **FFmpeg 8.0+** with one of the following encoders: - GPU encoder: `h264_nvenc` (recommended for performance; requires an NVENC-equipped GPU — note that A100 and H100 do **not** include NVENC) - CPU encoder: `libvpx-vp9` (for non-NVENC GPUs; produces VP9 in `.mp4`) If `uv` is not installed, refer to the [Installation Guide](/get-started/installation) for setup instructions, or install it quickly with: ```bash curl -LsSf https://astral.sh/uv/0.8.22/install.sh | sh source $HOME/.local/bin/env ``` --- ## Install Create and activate a virtual environment, then choose an install option: ```bash uv pip install torch wheel_stub psutil setuptools setuptools_scm uv pip install --no-build-isolation "nemo-curator[video_cuda12]" ``` ```bash git clone https://github.com/NVIDIA-NeMo/Curator.git cd Curator uv sync --extra video_cuda12 --all-groups source .venv/bin/activate ``` NeMo Curator is available as a standalone container: ```bash # Pull the container docker pull nvcr.io/nvidia/nemo-curator:{{ container_version }} # Run the container docker run --gpus all -it --rm nvcr.io/nvidia/nemo-curator:{{ container_version }} ``` For details on container environments and configurations, see [Container Environments](/reference/infra/container-environments). ## Install FFmpeg and Encoders Curator’s video pipelines rely on `FFmpeg` for decoding and encoding. If you plan to encode clips (using `--transcode-encoder h264_nvenc` or `--transcode-encoder libvpx-vp9`), install `FFmpeg` with NVENC and `libvpx-vp9` support. The maintained install script bundles both. Use the maintained script in the repository to build and install `FFmpeg` with NVIDIA NVENC and `libvpx-vp9` support. The script enables `--enable-cuda-nvcc`, `--enable-libnpp`, and `--enable-libvpx`. - Script source: [docker/common/install_ffmpeg.sh](https://github.com/NVIDIA-NeMo/Curator/blob/main/docker/common/install_ffmpeg.sh) ```bash curl -fsSL https://raw.githubusercontent.com/NVIDIA-NeMo/Curator/main/docker/common/install_ffmpeg.sh -o install_ffmpeg.sh chmod +x install_ffmpeg.sh sudo bash install_ffmpeg.sh ``` Confirm that `FFmpeg` is on your `PATH` and that at least one supported encoder is available: ```bash ffmpeg -hide_banner -version | head -n 5 ffmpeg -encoders | grep -E "h264_nvenc|libvpx-vp9" | cat ``` If encoders are missing, reinstall `FFmpeg` with the required options or use the Debian/Ubuntu script above. **Processing H.264/HEVC/AV1 inputs? You might still need a software decoder — even with NVENC/NVDEC.** Curator runs `ffprobe` inside CPU-only Ray actors (`VideoReader`, `ClipWriter`) for metadata extraction. Those actors can't open NVDEC decoders, so without a software h264/hevc/av1 decoder your inputs are silently skipped (`SoftwareCodecMissingError` in the logs). Run the bundled installer inside the container to add software decoder support — no image rebuild needed: ```bash bash /opt/Curator/docker/common/install_h264_support.sh ``` See [Software H.264/HEVC/AV1 Codec Support](/get-started/installation#software-h264hevcav1-codec-support-advanced) for the full picture. Refer to [Clip Encoding](/curate-video/process-data/transcoding) to choose encoders and verify NVENC support on your system. ### Available Models Embeddings convert each video clip into a numeric vector that captures visual and semantic content. Curator uses these vectors to: - Remove near-duplicate clips during duplicate removal - Enable similarity search and clustering - Support downstream analysis such as caption verification NeMo Curator supports two embedding model families: #### Cosmos-Embed1 (Recommended) **Cosmos-Embed1 (default)**: Available in three variants—**cosmos-embed1-224p**, **cosmos-embed1-336p**, and **cosmos-embed1-448p**—which differ in input resolution and accuracy/VRAM tradeoff. All variants are automatically downloaded to `MODEL_DIR` on first run. | Model Variant | Resolution | VRAM Usage | Speed | Accuracy | Best For | |---------------|------------|------------|-------|----------|----------| | **cosmos-embed1-224p** | 224×224 | ~8GB | Fastest | Good | Large-scale processing, initial curation | | **cosmos-embed1-336p** | 336×336 | ~12GB | Medium | Better | Balanced performance and quality | | **cosmos-embed1-448p** | 448×448 | ~16GB | Slower | Best | High-quality embeddings, fine-grained matching | **Model links:** - [cosmos-embed1-224p on Hugging Face](https://huggingface.co/nvidia/cosmos-embed1-224p) - [cosmos-embed1-336p on Hugging Face](https://huggingface.co/nvidia/cosmos-embed1-336p) - [cosmos-embed1-448p on Hugging Face](https://huggingface.co/nvidia/cosmos-embed1-448p) For this quickstart, the following steps set up support for **Cosmos-Embed1-224p**. ### Prepare Model Weights For most use cases, you only need to create a model directory. The required model files will be downloaded automatically on first run. 1. Create a model directory: ```bash mkdir -p "$MODEL_DIR" ``` You can reuse the same `` across runs. 2. No additional setup is required. The model will be downloaded automatically when first used. ## Set Up Data Directories Organize input videos and output locations before running the pipeline. - **Local**: For local file processing. Define paths like: ```bash DATA_DIR=/path/to/videos OUT_DIR=/path/to/output_clips MODEL_DIR=/path/to/models ``` - **S3**: For cloud storage (AWS S3, MinIO, etc.). Configure credentials in `~/.aws/credentials` and use `s3://` paths for `--video-dir` and `--output-clip-path`. **S3 usage notes:** - Input videos can be read from S3 paths - Output clips can be written to S3 paths - Model directory should remain local for performance - Ensure IAM permissions allow read/write access to specified buckets ## Run the Splitting Pipeline Example Use the example script from https://github.com/NVIDIA-NeMo/Curator/tree/main/tutorials/video/getting-started to read videos, split into clips, and write outputs. This runs a Ray pipeline with `XennaExecutor` under the hood. ```bash python tutorials/video/getting-started/video_split_clip_example.py \ --video-dir "$DATA_DIR" \ --model-dir "$MODEL_DIR" \ --output-clip-path "$OUT_DIR" \ --splitting-algorithm fixed_stride \ --fixed-stride-split-duration 10.0 \ --embedding-algorithm cosmos-embed1-224p \ --transcode-encoder h264_nvenc \ --verbose ``` **What this command does:** 1. Reads all video files from `$DATA_DIR` 2. Splits each video into 10-second clips using fixed stride 3. Generates embeddings using Cosmos-Embed1-224p model 4. Encodes clips using h264_nvenc codec 5. Writes output clips and metadata to `$OUT_DIR` **Using a config file**: The example script accepts many command-line arguments. For complex configurations, you can store arguments in a file and pass them with the `@` prefix: echo '--video-dir /data/videos --output-clip-path /data/output --splitting-algorithm fixed_stride --fixed-stride-split-duration 10.0 --embedding-algorithm cosmos-embed1-224p --transcode-encoder h264_nvenc' > my_config.txt python tutorials/video/getting-started/video_split_clip_example.py @my_config.txt ### Configuration Options Reference | Option | Values | Description | |--------|--------|-------------| | **Splitting** | | `--splitting-algorithm` | `fixed_stride`, `transnetv2` | Method for dividing videos into clips | | `--fixed-stride-split-duration` | Float (seconds) | Clip length for fixed stride (default: 10.0) | | `--transnetv2-frame-decoder-mode` | `pynvc`, `ffmpeg_gpu`, `ffmpeg_cpu` | Frame decoding method for TransNetV2 | | **Embedding** | | `--embedding-algorithm` | `cosmos-embed1-224p`, `cosmos-embed1-336p`, `cosmos-embed1-448p` | Embedding model to use | | **Encoding** | | `--transcode-encoder` | `h264_nvenc`, `libvpx-vp9`, `libopenh264` | Video encoder for output clips. Use `libvpx-vp9` (CPU) on GPUs without NVENC such as A100/H100. `libopenh264` is opt-in — run `install_h264_support.sh --with-libopenh264` inside the container or provide a system FFmpeg that includes it. See [Software H.264/HEVC/AV1 Codec Support](/get-started/installation#software-h264hevcav1-codec-support-advanced). | | `--transcode-use-hwaccel` | Flag | Enable hardware acceleration for encoding (only valid with `h264_nvenc`). | | **Optional Features** | | `--generate-captions` | Flag | Generate text captions for each clip | | `--generate-previews` | Flag | Create preview images for each clip | | `--verbose` | Flag | Enable detailed logging output | ### Understanding Pipeline Output After successful execution, the output directory will contain: ``` $OUT_DIR/ ├── clips/ │ ├── video1_clip_0000.mp4 │ ├── video1_clip_0001.mp4 │ └── ... ├── embeddings/ │ ├── video1_clip_0000.npy │ ├── video1_clip_0001.npy │ └── ... ├── metadata/ │ └── manifest.jsonl └── previews/ (if --generate-previews enabled) ├── video1_clip_0000.jpg └── ... ``` **File descriptions:** - **clips/**: Encoded video clips (MP4 format) - **embeddings/**: Numpy arrays containing clip embeddings (for similarity search) - **metadata/manifest.jsonl**: JSONL file with clip metadata (paths, timestamps, embeddings) - **previews/**: Thumbnail images for each clip (optional) **Example manifest entry:** ```json { "video_path": "/data/input_videos/video1.mp4", "clip_path": "/data/output_clips/clips/video1_clip_0000.mp4", "start_time": 0.0, "end_time": 10.0, "embedding_path": "/data/output_clips/embeddings/video1_clip_0000.npy", "preview_path": "/data/output_clips/previews/video1_clip_0000.jpg" } ``` ## Best Practices ### Data Preparation - **Validate input videos**: Ensure videos are not corrupted before processing - **Consistent formats**: Convert videos to a standard format (MP4 with H.264) for consistent results - **Organize by content**: Group similar videos together for efficient processing ### Model Selection - **Start with Cosmos-Embed1-224p**: Best balance of speed and quality for initial experiments - **Upgrade resolution as needed**: Use 336p or 448p only when higher precision is required - **Monitor VRAM usage**: Check GPU memory with `nvidia-smi` during processing ### Pipeline Configuration - **Enable verbose logging**: Use `--verbose` flag for debugging and monitoring - **Test on small subset**: Run pipeline on 5-10 videos before processing large datasets - **Use GPU encoding**: Enable NVENC for significant performance improvements - **Save intermediate results**: Keep embeddings and metadata for downstream tasks ### Infrastructure - **Use shared storage**: Mount shared filesystem for multi-node processing - **Allocate sufficient VRAM**: Plan for peak usage (captioning + embedding) - **Monitor GPU utilization**: Use `nvidia-smi dmon` to track GPU usage during processing - **Schedule long-running jobs**: Process large video datasets in batch jobs overnight ## Next Steps Explore the [Video Curation documentation](/curate-video). For encoding guidance, refer to [Clip Encoding](/curate-video/process-data/transcoding).