---
description: "Comprehensive audio curation capabilities for speech data processing, including ASR inference, quality assessment, and text integration workflows"
categories: ["workflows"]
tags: ["audio-curation", "asr-inference", "speech-processing", "quality-metrics", "manifests", "text-integration"]
personas: ["data-scientist-focused", "mle-focused"]
difficulty: "beginner"
content_type: "workflow"
modality: "audio-only"
---

# About Audio Curation

NeMo Curator provides audio curation capabilities for preparing high-quality speech data for automatic speech recognition (ASR) and multi-modal model training. The toolkit includes processors for loading audio datasets, running ASR inference, assessing transcription quality, and integrating results with text curation workflows.

## Use Cases

- Process and curate large-scale speech datasets for ASR model training
- Assess quality and filter samples based on transcription accuracy metrics
- Generate transcriptions with state-of-the-art NVIDIA NeMo ASR models
- Integrate audio processing with text curation pipelines for multi-modal workflows
- Scale audio processing efficiently across GPU clusters

---

## Introduction

Master the fundamentals of NeMo Curator and set up your audio processing environment.

- Learn about `AudioTask`, ASR pipelines, quality metrics, and other core data structures for efficient audio curation.
- Learn the prerequisites, setup instructions, and initial configuration for audio curation.

## Curation Tasks

### Load Data

Import your audio data from various sources into NeMo Curator's processing pipeline.
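As a rough illustration of what a manifest-based load step involves, here is a plain-Python sketch that parses and validates a JSON Lines manifest. It is not NeMo Curator's loader. It assumes the one-JSON-object-per-line convention used by NeMo ASR manifests, with `audio_filepath`, `duration` (seconds), and `text` keys. The function names `parse_manifest` and `load_manifest` are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterable

# Assumed schema: NeMo-style JSON Lines manifest, one JSON object per line.
REQUIRED_KEYS = ("audio_filepath", "duration", "text")


def parse_manifest(lines: Iterable[str]) -> list[dict]:
    """Parse manifest lines into dicts, skipping blank lines.

    Raises ValueError when an entry lacks one of REQUIRED_KEYS.
    """
    entries = []
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank or trailing lines
        entry = json.loads(raw)
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"line {lineno}: missing keys {missing}")
        entries.append(entry)
    return entries


def load_manifest(path: str | Path) -> list[dict]:
    """Hypothetical helper: read and validate a manifest file from disk."""
    with open(path, encoding="utf-8") as f:
        return parse_manifest(f)


if __name__ == "__main__":
    sample = [
        '{"audio_filepath": "clips/a.wav", "duration": 3.2, "text": "hello world"}',
        "",
        '{"audio_filepath": "clips/b.wav", "duration": 11.0, "text": "good morning"}',
    ]
    entries = parse_manifest(sample)
    print(len(entries), entries[0]["audio_filepath"])  # 2 clips/a.wav
```

Validating required keys at load time surfaces malformed entries before any GPU work starts. Missing keys produce an explicit error rather than being silently skipped.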
- Load audio files from local directories and file systems, with file discovery and batch processing.
- Create and load custom audio dataset manifests with metadata.
- Load and process the multilingual FLEURS speech dataset for benchmarking.

### Process Data

Transform and enhance your audio data through ASR inference, quality assessment, and analysis.

- Generate transcriptions using GPU-accelerated NVIDIA NeMo ASR models.
- Assess transcription quality using word error rate (WER) and character error rate (CER), and filter samples by WER and duration.
- Analyze audio characteristics, including duration calculation, format validation, and metadata extraction.
- Integrate audio processing results with text curation workflows, including text filtering, for multi-modal pipelines.

### Save & Export

Save processed audio data and transcriptions in formats suitable for downstream training and analysis.

- Export curated audio datasets with transcriptions, quality metrics, and metadata as manifests or Parquet files.

---

## Tutorials

Build practical experience with step-by-step guides for common audio curation workflows.

- Learn the basics of audio loading, ASR inference, and quality filtering.
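The WER- and CER-based quality assessment listed under Process Data can be illustrated with a minimal, self-contained sketch. It does not use NeMo Curator's own processors. It computes WER as word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words, and CER as the same computation at the character level. It assumes each manifest entry carries a reference `text`, an ASR hypothesis in a `pred_text` field (an assumed field name), and a `duration` in seconds. The helper names and thresholds are hypothetical and illustrative, not recommended defaults.

```python
from __future__ import annotations


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # delete reference token r
                curr[j - 1] + 1,     # insert hypothesis token h
                prev[j - 1] + cost,  # substitute (or match when cost is 0)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else 1.0  # convention chosen here for empty references
    return edit_distance(ref, hyp) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def filter_by_quality(entries, max_wer=0.3, min_duration=0.5, max_duration=20.0):
    """Attach WER/CER to each entry and keep entries within the thresholds.

    Assumes hypothetical fields: "text" (reference), "pred_text" (ASR output),
    and "duration" in seconds. Threshold values are illustrative only.
    """
    kept = []
    for entry in entries:
        scored = {
            **entry,
            "wer": wer(entry["text"], entry["pred_text"]),
            "cer": cer(entry["text"], entry["pred_text"]),
        }
        if scored["wer"] <= max_wer and min_duration <= scored["duration"] <= max_duration:
            kept.append(scored)
    return kept


if __name__ == "__main__":
    print(round(wer("the cat sat", "the cat sit"), 3))  # 0.333
```

`filter_by_quality` returns new dicts with `wer` and `cer` attached, so the scores remain available when exporting transcriptions together with quality metrics.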