---
description: "Generate clip-level embeddings using Cosmos-Embed1"
categories: ["video-curation"]
tags: ["embeddings", "cosmos-embed1", "video"]
personas: ["data-scientist-focused", "mle-focused"]
difficulty: "intermediate"
content_type: "howto"
modality: "video-only"
---

# Embeddings

Generate clip-level embeddings for search, question answering, filtering, and duplicate removal.

## Use Cases

- Prepare semantic vectors for search, clustering, and near-duplicate detection.
- Score optional text prompts against clip content.
- Enable downstream filtering or retrieval tasks that need clip-level vectors.

## Before You Start

- Create clips upstream. Refer to [Clipping](/curate-video/process-data/clipping).
- Extract frames for embeddings upstream, or let the frame creation stage sample them at the required rate. Refer to [Frame Extraction](/curate-video/process-data/frame-extraction).
- Make model weights available on each node. The stages download the weights if they are missing.

---

## Quickstart

Use the pipeline stages or the example script flags to generate clip-level embeddings.

```python
from nemo_curator.pipeline import Pipeline
from nemo_curator.stages.video.clipping.clip_frame_extraction import ClipFrameExtractionStage
from nemo_curator.stages.video.embedding.cosmos_embed1 import (
    CosmosEmbed1EmbeddingStage,
    CosmosEmbed1FrameCreationStage,
)
from nemo_curator.utils.decoder_utils import FrameExtractionPolicy, FramePurpose

pipe = Pipeline(name="video_embeddings_example")

# Add your video reading and clipping stages before this point (refer to Clipping).

# Extract frames from each clip for the embedding model.
pipe.add_stage(
    ClipFrameExtractionStage(
        extraction_policies=(FrameExtractionPolicy.sequence,),
        extract_purposes=(FramePurpose.EMBEDDINGS,),
        target_res=(-1, -1),
        verbose=True,
    )
)

# Keep model_dir and variant identical in both Cosmos-Embed1 stages so the
# frame tensors match the input size the model weights expect.
pipe.add_stage(CosmosEmbed1FrameCreationStage(model_dir="/models", variant="224p", target_fps=2.0, verbose=True))
pipe.add_stage(CosmosEmbed1EmbeddingStage(model_dir="/models", variant="224p", gpu_memory_gb=20.0, verbose=True))

pipe.run()
```

```bash
# Cosmos-Embed1 (224p)
python tutorials/video/getting-started/video_split_clip_example.py \
  ... \
  --generate-embeddings \
  --embedding-algorithm cosmos-embed1-224p \
  --embedding-gpu-memory-gb 20.0
```

## Embedding Options

### Cosmos-Embed1

1. Add `CosmosEmbed1FrameCreationStage` to transform extracted frames into model-ready tensors.

   ```python
   from nemo_curator.stages.video.embedding.cosmos_embed1 import (
       CosmosEmbed1EmbeddingStage,
       CosmosEmbed1FrameCreationStage,
   )

   frames = CosmosEmbed1FrameCreationStage(
       model_dir="/models",
       variant="224p",  # or "336p", "448p"
       target_fps=2.0,
       verbose=True,
   )
   ```

2. Add `CosmosEmbed1EmbeddingStage` to generate `clip.cosmos_embed1_embedding` and, when `texts_to_verify` is set, `clip.cosmos_embed1_text_match`.

   ```python
   embed = CosmosEmbed1EmbeddingStage(
       model_dir="/models",
       variant="224p",  # match the variant used by the frame creation stage
       gpu_memory_gb=20.0,
       verbose=True,
   )
   ```

#### Parameters

`CosmosEmbed1FrameCreationStage`:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model_dir` | str | `"models/cosmos_embed1"` | Directory for the model utilities and configs used to format input frames. |
| `variant` | {"224p", "336p", "448p"} | `"336p"` | Resolution preset that controls the model’s expected input size. |
| `target_fps` | float | 2.0 | Sampling rate (frames per second) used to select frames from each clip. The stage may re-extract at a higher FPS if the clip does not yield enough frames. |
| `num_cpus` | int | 3 | CPU cores used when on-the-fly re-extraction is required. |
| `verbose` | bool | `False` | Log per-clip decisions and re-extraction messages. |

`CosmosEmbed1EmbeddingStage`:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model_dir` | str | `"models/cosmos_embed1"` | Directory for model weights. The weights are downloaded on each node if missing. |
| `variant` | {"224p", "336p", "448p"} | `"336p"` | Resolution preset used by the model weights. |
| `gpu_memory_gb` | int | 20 | Approximate GPU memory (GB) reserved per worker. |
| `texts_to_verify` | list[str] \| None | `None` | Optional text prompts to score against the clip embedding. |
| `verbose` | bool | `False` | Log setup and per-clip outcomes. |
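The number of frames a clip contributes depends on its duration and on `target_fps`. The sketch below estimates that count and the minimum sampling rate needed to reach a given frame budget. It assumes uniform sampling across the whole clip, and `required_frames` is a hypothetical stand-in for the model's input length. None of these helpers are part of NeMo Curator.

```python
import math


def sampled_frame_count(clip_duration_s: float, target_fps: float) -> int:
    """Approximate frames obtained from a clip sampled uniformly at target_fps."""
    if clip_duration_s < 0 or target_fps <= 0:
        raise ValueError("clip_duration_s must be >= 0 and target_fps must be > 0")
    return math.floor(clip_duration_s * target_fps)


def min_fps_for_frames(clip_duration_s: float, required_frames: int) -> float:
    """Lowest sampling rate that yields required_frames over the clip.

    required_frames is a hypothetical placeholder for the model's input length.
    """
    if clip_duration_s <= 0:
        raise ValueError("clip_duration_s must be > 0")
    return required_frames / clip_duration_s


def has_enough_frames(clip_duration_s: float, target_fps: float, required_frames: int) -> bool:
    """True when uniform sampling at target_fps covers required_frames."""
    return sampled_frame_count(clip_duration_s, target_fps) >= required_frames
```

For example, a 3-second clip at `target_fps=2.0` yields about 6 frames. If the model needed 8, `min_fps_for_frames(3.0, 8)` suggests raising `target_fps` to roughly 2.67, or using longer clips instead.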
#### Outputs

- `clip.cosmos_embed1_frames` → temporary tensors used by the embedding stage
- `clip.cosmos_embed1_embedding` → final clip-level vector (NumPy array)
- `clip.cosmos_embed1_text_match` → optional; produced when `texts_to_verify` is set

## Troubleshooting

- Not enough frames for embeddings: Increase `target_fps` on `CosmosEmbed1FrameCreationStage` or adjust clip length so that the model receives the required number of frames.
- Out of memory during embedding: Check that `gpu_memory_gb` reflects the memory each worker actually needs, reduce the batch size if the stage exposes one, or use a smaller resolution variant.
- Weights not found on node: Confirm that `model_dir` points to the expected directory and that the node has network access. The stages download the weights into `model_dir` when they are missing.

## Next Steps

- Use embeddings for duplicate removal. Refer to [Duplicate Removal](/curate-video/process-data/dedup).
- Generate captions and previews for review workflows. Refer to [Captions & Preview](/curate-video/process-data/captions-preview).
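Once the `clip.cosmos_embed1_embedding` vectors are available, both near-duplicate detection and search come down to comparing vectors. The sketch below uses plain cosine similarity with NumPy. It assumes the per-clip arrays have already been stacked into a 2-D matrix with one row per clip. The `0.95` threshold is an illustrative value, not a NeMo Curator default. This is a generic illustration, not the algorithm behind the Duplicate Removal stage.

```python
import numpy as np


def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)  # guard against zero vectors


def near_duplicate_pairs(embeddings: np.ndarray, threshold: float = 0.95) -> list[tuple[int, int, float]]:
    """Return (i, j, similarity) for clip pairs with cosine similarity >= threshold.

    threshold is an illustrative value; tune it on your own data.
    """
    unit = l2_normalize(embeddings.astype(np.float64))
    sims = unit @ unit.T
    i_idx, j_idx = np.triu_indices(len(unit), k=1)  # visit each unordered pair once
    mask = sims[i_idx, j_idx] >= threshold
    return [(int(i), int(j), float(sims[i, j])) for i, j in zip(i_idx[mask], j_idx[mask])]


def top_k_matches(query: np.ndarray, embeddings: np.ndarray, k: int = 3) -> list[int]:
    """Indices of the k clips most similar to a query vector, best match first."""
    unit = l2_normalize(embeddings.astype(np.float64))
    q = query / max(np.linalg.norm(query), 1e-12)
    scores = unit @ q
    return np.argsort(-scores)[:k].tolist()
```

To build the input matrix from a hypothetical list of clip objects, `np.stack([np.ravel(c.cosmos_embed1_embedding) for c in clips])` works as long as every embedding has the same length. The all-pairs similarity matrix grows quadratically with the number of clips, so large collections typically call for an approximate nearest-neighbor index instead.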