---
description: "Essential concepts for image data curation including loading, processing, and export with GPU acceleration"
categories: ["concepts-architecture"]
tags: ["concepts", "image-curation", "tar-archives", "gpu-accelerated", "embedding", "classification"]
personas: ["data-scientist-focused", "mle-focused"]
difficulty: "beginner"
content_type: "concept"
modality: "image-only"
---
# Image Curation Concepts
This document covers the essential concepts for image data curation in NVIDIA NeMo Curator. It assumes basic familiarity with data science and machine learning principles.
## Core Concept Areas
Image curation in NVIDIA NeMo Curator focuses on these key areas:

- Loading and managing image datasets
- Embedding generation, classification, filtering, and deduplication
- Saving, exporting, and resharding curated image datasets

## Infrastructure Components
The image curation concepts build on NVIDIA NeMo Curator's core infrastructure components, which are shared across all modalities (text, image, video). These components let you:

- Optimize memory usage when processing large datasets (topics: partitioning, batching, monitoring)
- Leverage NVIDIA GPUs for faster data processing (topics: CUDA, DALI, performance)
- Continue interrupted operations across large datasets (topics: checkpoints, recovery, batching)
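
To make the batching and checkpoint ideas above more concrete, here is a minimal sketch in plain Python. It does not use the NeMo Curator API. The function names (`iter_batches`, `process_resumable`), the JSON checkpoint format, and the file names are all hypothetical and chosen only for illustration. The idea is to handle one batch of image paths at a time, which bounds memory use, and to record each finished batch so that an interrupted run can pick up where it stopped.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterator, List, Sequence, Tuple


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[Tuple[int, Sequence[str]]]:
    """Yield (batch_index, batch) pairs so only one batch is handled at a time."""
    for start in range(0, len(items), batch_size):
        yield start // batch_size, items[start:start + batch_size]


def process_resumable(
    items: Sequence[str],
    batch_size: int,
    process_batch: Callable[[Sequence[str]], None],
    checkpoint_path: Path,
) -> List[int]:
    """Process batches, skipping any batch index already stored in the checkpoint.

    Returns the indices of the batches processed during this call.
    """
    # Hypothetical checkpoint format: a JSON list of completed batch indices.
    if checkpoint_path.exists():
        done = set(json.loads(checkpoint_path.read_text()))
    else:
        done = set()

    processed_now = []
    for index, batch in iter_batches(items, batch_size):
        if index in done:
            continue  # finished in an earlier run
        process_batch(batch)
        done.add(index)
        # Persist progress after every batch so an interruption loses at most one batch.
        checkpoint_path.write_text(json.dumps(sorted(done)))
        processed_now.append(index)
    return processed_now


if __name__ == "__main__":
    # Illustrative file names only; no images are actually read.
    image_paths = [f"img_{i}.jpg" for i in range(5)]
    checkpoint = Path(tempfile.mkdtemp()) / "progress.json"
    print(process_resumable(image_paths, 2, lambda batch: None, checkpoint))
    # A second call finds every batch in the checkpoint and does no work.
    print(process_resumable(image_paths, 2, lambda batch: None, checkpoint))
```

Writing the checkpoint after every batch trades some I/O for a smaller amount of repeated work after a failure. A real pipeline would likely checkpoint less often and write the file atomically.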