---
description: "Comprehensive audio curation capabilities for speech data processing including ASR inference, quality assessment, and text integration workflows"
categories: ["workflows"]
tags: ["audio-curation", "asr-inference", "speech-processing", "quality-metrics", "manifests", "text-integration"]
personas: ["data-scientist-focused", "mle-focused"]
difficulty: "beginner"
content_type: "workflow"
modality: "audio-only"
---
# About Audio Curation
NeMo Curator provides comprehensive audio curation capabilities to prepare high-quality speech data for automatic speech recognition (ASR) and multi-modal model training. The toolkit includes processors for loading audio datasets, performing ASR inference, assessing transcription quality, and integrating with text curation workflows.
## Use Cases
- Process and curate large-scale speech datasets for ASR model training
- Perform quality assessment and filtering based on transcription accuracy metrics
- Generate transcriptions using state-of-the-art NVIDIA NeMo ASR models
- Integrate audio processing with text curation pipelines for multi-modal workflows
- Scale audio processing across GPU clusters efficiently
---
## Introduction
Master the fundamentals of NeMo Curator and set up your audio processing environment.
- Learn about AudioTask, ASR pipelines, and other core data structures for efficient audio curation (tags: `data-structures`, `asr-pipeline`, `quality-metrics`)
- Learn prerequisites, setup instructions, and initial configuration for audio curation (tags: `setup`, `configuration`, `quickstart`)
## Curation Tasks
### Load Data
Import your audio data from various sources into NeMo Curator's processing pipeline.
- Load audio files from local directories and file systems (tags: `local-storage`, `file-discovery`, `batch-processing`)
- Create and load custom audio dataset manifests with metadata (tags: `manifests`, `metadata`, `custom-formats`)
- Load and process the multilingual FLEURS speech dataset (tags: `fleurs`, `multilingual`, `benchmarks`)
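
The local-file discovery and manifest loading described above can be illustrated with the Python standard library alone. This is a minimal sketch, not NeMo Curator's own loader: the helper names (`wav_duration_seconds`, `build_manifest`, `load_manifest`) are hypothetical, and the JSON Lines layout with `audio_filepath`, `duration`, and `text` keys follows the common NeMo ASR manifest convention, which is assumed here rather than taken from this page.

```python
import json
import wave
from pathlib import Path


def wav_duration_seconds(path: Path) -> float:
    """Return the duration of a PCM WAV file, read from its header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def build_manifest(audio_dir: Path, manifest_path: Path) -> int:
    """Write one JSON object per WAV file found under audio_dir.

    Returns the number of entries written. Key names are an assumption
    based on the usual NeMo ASR manifest layout.
    """
    count = 0
    with manifest_path.open("w", encoding="utf-8") as out:
        # rglob walks subdirectories; the pattern is case-sensitive on Linux.
        for wav_path in sorted(audio_dir.rglob("*.wav")):
            entry = {
                "audio_filepath": str(wav_path),
                "duration": round(wav_duration_seconds(wav_path), 3),
                "text": "",  # reference transcript, if one is available
            }
            out.write(json.dumps(entry, ensure_ascii=False) + "\n")
            count += 1
    return count


def load_manifest(manifest_path: Path) -> list:
    """Read a JSON Lines manifest back into a list of dicts."""
    with manifest_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

The `wave` module only reads uncompressed PCM WAV files, so other formats such as FLAC or MP3 would need a third-party reader. Storing `duration` in the manifest lets later steps filter on length without reopening the audio.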
### Process Data
Transform and enhance your audio data through ASR inference, quality assessment, and analysis.
- Generate transcriptions using NVIDIA NeMo ASR models (tags: `nemo-models`, `transcription`, `gpu-accelerated`)
- Assess transcription quality using word error rate (WER) and character error rate (CER) (tags: `wer-filtering`, `duration-filtering`)
- Analyze audio characteristics, including duration calculation and format validation (tags: `duration-calculation`, `format-validation`, `metadata-extraction`)
- Integrate audio processing results with text curation workflows (tags: `multimodal`, `text-filtering`, `pipeline-integration`)
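
The WER- and duration-based quality filtering mentioned above can be sketched as follows. WER and CER are both edit distances normalized by the reference length, computed over words and characters respectively. The function names, the `pred_text` key, and the thresholds are hypothetical choices for illustration and are not NeMo Curator's API or defaults.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, relative to the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent, relative to the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


def keep_entry(entry: dict, max_wer: float = 30.0,
               min_duration: float = 1.0, max_duration: float = 30.0) -> bool:
    """Keep an entry only if its WER and duration fall within the limits.

    The thresholds are illustrative values, not recommended settings.
    """
    score = wer(entry["text"], entry["pred_text"])
    return score <= max_wer and min_duration <= entry["duration"] <= max_duration
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many insertions. In practice, text is usually normalized (case, punctuation) before scoring so that formatting differences are not counted as errors.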
### Save & Export
Save processed audio data and transcriptions in formats suitable for downstream training and analysis.
- Export curated audio datasets with transcriptions and quality metrics (tags: `manifests`, `parquet`, `metadata`)
---
## Tutorials
Build practical experience with step-by-step guides for common audio curation workflows.
- Learn the basics of audio loading, ASR inference, and quality filtering (tags: `asr-inference`, `quality-filtering`, `basic-workflow`)