---
name: audio-mixing-patterns
description: ffmpeg audio mixing patterns for video production. Use when mixing narration with music, implementing ducking, or balancing volume levels for demos
tags: [video, audio, ffmpeg, mixing, ducking, narration, music]
context: fork
agent: demo-producer
user-invocable: false
version: 1.0.0
---

# Audio Mixing Patterns

Comprehensive guide to audio mixing for video production using ffmpeg. Covers narration/music balancing, automatic ducking, timing control, and loudness normalization.

## Core Principle

**Quality Audio = Clear Narration + Supportive Music + Appropriate Levels**

The fundamental frequency of the human voice sits at roughly 85-255 Hz, with harmonics extending up to about 8 kHz. Music must support the voice, not compete with it.

## Volume Balancing Formula

```
Standard Video Mix Ratios:
--------------------------
Narration:  100% (reference level)
Music:      15-20% of narration level
SFX:        70-100% of narration level (contextual)

Loudness Targets (LUFS):
------------------------
Narration:  -14 LUFS (matches the common streaming target)
Music bed:  -30 to -35 LUFS (under narration)
Music only: -16 LUFS (no narration sections)
SFX:        -18 to -20 LUFS
```

### Volume Multiplier Quick Reference

| Ratio | Multiplier | Use Case |
|-------|------------|----------|
| 100% | 1.0 | Full volume (narration) |
| 70% | 0.7 | Prominent SFX |
| 50% | 0.5 | Half amplitude (about -6 dB), strong presence |
| 30% | 0.3 | Noticeable background |
| 20% | 0.2 | Subtle bed (recommended music) |
| 15% | 0.15 | Minimal presence |
| 10% | 0.1 | Barely audible |
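
The multipliers above are linear amplitude factors, so they map to decibels through `dB = 20 * log10(multiplier)`. The sketch below shows that conversion in both directions; the helper names `mult_to_db` and `db_to_mult` are illustrative and not part of ffmpeg, and `awk` is assumed to be available.

```bash
# Convert a linear volume multiplier to decibels: dB = 20 * log10(multiplier)
mult_to_db() {
  awk -v m="$1" 'BEGIN { printf "%.2f\n", 20 * log(m) / log(10) }'
}

# Convert decibels back to a linear multiplier: multiplier = 10^(dB / 20)
db_to_mult() {
  awk -v d="$1" 'BEGIN { printf "%.3f\n", 10 ^ (d / 20) }'
}

mult_to_db 0.15   # -16.48
mult_to_db 0.2    # -13.98
db_to_mult -20    # 0.100
```

Read this way, the recommended 15-20% music bed sits roughly 14 to 16.5 dB below the narration, assuming both sources start at similar loudness.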

## Basic ffmpeg Mixing Commands

### Two-Track Mix (Narration + Music)

```bash
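# Basic mix: narration at full, music at 15%
# Note: amix normalizes by default, which scales the summed output down while
# keeping the 1:0.15 ratio. Add normalize=0 (ffmpeg 4.4+) to keep absolute levels.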
ffmpeg -i narration.mp3 -i music.mp3 \
  -filter_complex "[0:a]volume=1.0[narr];[1:a]volume=0.15[music];[narr][music]amix=inputs=2:duration=first" \
  -c:a aac -b:a 192k output.m4a
```

### Three-Track Mix (Narration + Music + SFX)

```bash
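# Note: amix weights compound with the volume filters, so the effective
# balance is volume x weight for each track.
ffmpeg -i narration.mp3 -i music.mp3 -i sfx.mp3 \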
  -filter_complex "\
    [0:a]volume=1.0[narr];\
    [1:a]volume=0.15[music];\
    [2:a]volume=0.7[sfx];\
    [narr][music][sfx]amix=inputs=3:duration=first:weights='3 1 2'" \
  -c:a aac -b:a 192k output.m4a
```

## Timing with adelay Filter

The `adelay` filter positions audio at precise timestamps.

### Syntax

```bash
adelay=delays[|delays...][:all=1]
# delays: one value per channel, in milliseconds or samples (with 'S' suffix)
# all=1: use the last specified delay for all remaining channels
```
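
Because `adelay` takes one millisecond value per channel, cue times written in seconds have to be converted and repeated. A minimal sketch of that conversion follows; `adelay_arg` is a hypothetical helper name, and the default of two channels is an assumption that suits stereo sources.

```bash
# Build an adelay argument from a start time in seconds.
# $1 = start time in seconds (may be fractional), $2 = channel count (default 2)
adelay_arg() {
  awk -v s="$1" -v ch="${2:-2}" 'BEGIN {
    ms = sprintf("%.0f", s * 1000)               # adelay takes milliseconds by default
    out = ms
    for (i = 1; i < ch; i++) out = out "|" ms    # repeat the delay for each channel
    print "adelay=" out
  }'
}

adelay_arg 5.5     # adelay=5500|5500
adelay_arg 2 1     # adelay=2000
```

The result can be spliced into a filtergraph, for example `"[1:a]$(adelay_arg 5),volume=0.15[music]"`.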

### Position Music at Specific Time

```bash
# Start music at 5 seconds
ffmpeg -i narration.mp3 -i music.mp3 \
  -filter_complex "\
    [0:a]volume=1.0[narr];\
    [1:a]adelay=5000|5000,volume=0.15[music];\
    [narr][music]amix=inputs=2:duration=first" \
  output.m4a
```

### Multiple Timed Audio Cues

```bash
# Narration starts at 0, music at 2s, SFX at 5.5s
ffmpeg -i narration.mp3 -i music.mp3 -i sfx.wav \
  -filter_complex "\
    [0:a]volume=1.0[narr];\
    [1:a]adelay=2000|2000,volume=0.15[music];\
    [2:a]adelay=5500|5500,volume=0.7[sfx];\
    [narr][music][sfx]amix=inputs=3:duration=longest" \
  output.m4a
```

## Audio Ducking

Automatically lower music when speech is present.

### Simple Sidechain Compression (Ducking)

```bash
ffmpeg -i narration.mp3 -i music.mp3 \
  -filter_complex "\
    [0:a]asplit=2[narr][sc];\
    [1:a][sc]sidechaincompress=threshold=0.02:ratio=10:attack=50:release=500[ducked];\
    [narr][ducked]amix=inputs=2:duration=first" \
  output.m4a
```

### Parameters Explained

| Parameter | Value | Effect |
|-----------|-------|--------|
| threshold | 0.02 (default 0.125) | Linear amplitude (0.02 is about -34 dBFS); lower = more sensitive to speech |
| ratio | 10:1 | How much to reduce (10:1 = significant duck) |
| attack | 50 ms | How fast to duck when speech starts |
| release | 500 ms | How fast to return after speech ends |
| knee | 2.82843 (default) | Softness of compression curve |
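
The `threshold` value is a linear amplitude rather than a dB figure, which makes it awkward to pick by ear. The sketch below builds the filter string from a threshold expressed in dBFS; `duck_filter` is a hypothetical helper, and the conversion assumes the usual `10^(dB/20)` amplitude relationship.

```bash
# Build a sidechaincompress filter string from a threshold given in dBFS.
# $1 = threshold in dBFS, $2 = ratio, $3 = attack (ms), $4 = release (ms)
duck_filter() {
  awk -v db="$1" -v r="$2" -v a="$3" -v rel="$4" 'BEGIN {
    thr = 10 ^ (db / 20)   # convert dBFS to the linear value the filter expects
    printf "sidechaincompress=threshold=%.4f:ratio=%s:attack=%s:release=%s\n", thr, r, a, rel
  }'
}

duck_filter -34 10 50 500
# sidechaincompress=threshold=0.0200:ratio=10:attack=50:release=500
```

For comparison, `duck_filter -18 2 20 250` yields a threshold of 0.1259, close to the filter's default of 0.125.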

### Advanced Ducking with Precise Control

```bash
ffmpeg -i narration.mp3 -i music.mp3 \
  -filter_complex "\
    [0:a]asplit=2[narr][sc];\
    [1:a]volume=0.5[music_pre];\
    [music_pre][sc]sidechaincompress=\
      threshold=0.015:\
      ratio=15:\
      attack=30:\
      release=800:\
      makeup=1:\
      knee=6[ducked];\
    [narr][ducked]amix=inputs=2:duration=first:weights='1 0.4'" \
  output.m4a
```

## Mix Ratios by Content Type

```
Content Type          | Narration | Music | SFX  | Notes
---------------------|-----------|-------|------|------------------
Tutorial/How-to      | 100%      | 10%   | 50%  | Voice clarity critical
Corporate/Business   | 100%      | 15%   | 60%  | Professional presence
Social Media         | 100%      | 20%   | 80%  | Higher energy
Documentary          | 100%      | 25%   | 100% | Cinematic feel
Promo/Advertising    | 100%      | 30%   | 100% | Impactful
Music Video          | 50%       | 100%  | 80%  | Music dominant
Podcast              | 100%      | 5%    | 30%  | Minimal distraction
E-learning           | 100%      | 8%    | 40%  | Focus on retention
```
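
One way to apply the table in a script is a small lookup that turns a content type into the music multiplier. This is only a sketch: the function names, the short content-type keys, and the 0.15 fallback are assumptions, and the music-video row is left out because it inverts the narration/music relationship.

```bash
# Look up the suggested music multiplier for a content type (values from the table above).
music_level() {
  case "$1" in
    tutorial)    echo 0.10 ;;
    corporate)   echo 0.15 ;;
    social)      echo 0.20 ;;
    documentary) echo 0.25 ;;
    promo)       echo 0.30 ;;
    podcast)     echo 0.05 ;;
    elearning)   echo 0.08 ;;
    *)           echo 0.15 ;;   # assumed fallback for unknown types
  esac
}

# Print a two-track filtergraph (narration + music) for the given content type.
mix_filter() {
  printf '[0:a]volume=1.0[narr];[1:a]volume=%s[music];[narr][music]amix=inputs=2:duration=first\n' \
    "$(music_level "$1")"
}

mix_filter podcast
# [0:a]volume=1.0[narr];[1:a]volume=0.05[music];[narr][music]amix=inputs=2:duration=first
```

The printed string can be passed straight to `-filter_complex`, for example `ffmpeg -i narration.mp3 -i music.mp3 -filter_complex "$(mix_filter tutorial)" output.m4a`.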

## Loudness Normalization (LUFS)

LUFS (Loudness Units relative to Full Scale) is the standard unit for perceived loudness in broadcast and streaming.

### Target Levels by Platform

| Platform | Target LUFS | True Peak | Notes |
|----------|-------------|-----------|-------|
| YouTube | -14 LUFS | -1 dB TP | Auto-normalized |
| Spotify | -14 LUFS | -1 dB TP | Loudness penalty applied |
| Apple Music | -16 LUFS | -1 dB TP | Sound Check |
| Broadcast TV | -24 LUFS | -2 dB TP | ATSC A/85 (US); EBU R128 (Europe) targets -23 LUFS, -1 dB TP |
| Podcast | -16 to -19 LUFS | -1 dB TP | Apple spec |
| TikTok/Reels | -14 LUFS | -1 dB TP | Mobile optimization |

### Loudness Normalization Command

```bash
# Normalize to -14 LUFS (YouTube/Spotify standard)
ffmpeg -i input.mp3 \
  -af loudnorm=I=-14:TP=-1:LRA=11 \
  -c:a aac -b:a 192k output.m4a
```

### Two-Pass Normalization (More Accurate)

```bash
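# Pass 1: Analyze (the input_* values in the JSON report feed pass 2)
ffmpeg -i input.mp3 \
  -af loudnorm=I=-14:TP=-1:LRA=11:print_format=json \
  -f null - 2>&1 | grep -A 10 '"input_i"'

# Pass 2: Apply measured values. The numbers below are placeholders; substitute
# input_i, input_tp, input_lra, and input_thresh from the pass 1 report.
ffmpeg -i input.mp3 \
  -af loudnorm=I=-14:TP=-1:LRA=11:\
measured_I=-18.5:measured_TP=-2.3:measured_LRA=8.2:\
measured_thresh=-28.5:\
linear=true \
  -c:a aac -b:a 192k output.m4a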
```
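
Copying the measured values by hand is error-prone, so the two passes can be chained. The sketch below parses the pass-1 JSON report and prints the pass-2 filter string. `json_field` and `loudnorm_pass2_filter` are hypothetical helpers, the report shown is abridged with made-up numbers, and the parsing assumes each field sits on its own line as a quoted string.

```bash
# Extract one string-valued field from loudnorm's JSON report.
# $1 = report text, $2 = key name
json_field() {
  printf '%s\n' "$1" | sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# Build the pass-2 loudnorm filter from a pass-1 report read on stdin.
loudnorm_pass2_filter() {
  local report
  report=$(cat)
  printf 'loudnorm=I=-14:TP=-1:LRA=11:measured_I=%s:measured_TP=%s:measured_LRA=%s:measured_thresh=%s:offset=%s:linear=true\n' \
    "$(json_field "$report" input_i)" \
    "$(json_field "$report" input_tp)" \
    "$(json_field "$report" input_lra)" \
    "$(json_field "$report" input_thresh)" \
    "$(json_field "$report" target_offset)"
}

# Illustrative, abridged report (made-up numbers, not a real measurement)
sample_report='{
  "input_i" : "-18.50",
  "input_tp" : "-2.30",
  "input_lra" : "8.20",
  "input_thresh" : "-28.50",
  "target_offset" : "0.10"
}'
printf '%s\n' "$sample_report" | loudnorm_pass2_filter
```

In a real run, the pass-1 command's stderr (`2>&1`) would be piped into `loudnorm_pass2_filter` and the resulting string passed to `-af` in pass 2.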

## Multi-Track Production Pipeline

### Complete Video Audio Mix

```bash
ffmpeg -i video.mp4 -i narration.wav -i music.mp3 -i sfx.wav \
  -filter_complex "\
    [1:a]volume=1.0,aformat=sample_fmts=fltp:sample_rates=48000:channel_layouts=stereo[narr];\
    [2:a]volume=0.15,aformat=sample_fmts=fltp:sample_rates=48000:channel_layouts=stereo[music];\
    [3:a]adelay=3000|3000,volume=0.7,aformat=sample_fmts=fltp:sample_rates=48000:channel_layouts=stereo[sfx];\
    [narr][music][sfx]amix=inputs=3:duration=first:normalize=0[mixed];\
    [mixed]loudnorm=I=-14:TP=-1:LRA=11[final]" \
  -map 0:v -map "[final]" \
  -c:v copy -c:a aac -b:a 192k \
  output.mp4
```

### Audio-Only Master Mix

```bash
ffmpeg -i narration.wav -i music.mp3 -i intro_sfx.wav -i outro_sfx.wav \
  -filter_complex "\
    [0:a]volume=1.0[narr];\
    [1:a]volume=0.15[music];\
    [2:a]adelay=0|0,volume=0.8[intro];\
    [3:a]adelay=55000|55000,volume=0.8[outro];\
    [narr][music][intro][outro]amix=inputs=4:duration=longest:weights='3 1 2 2'[mix];\
    [mix]loudnorm=I=-14:TP=-1[final]" \
  -map "[final]" -c:a aac -b:a 256k master_audio.m4a
```

## Quick Reference: Common Patterns

### Pattern 1: Narration + Background Music

```bash
ffmpeg -i narration.mp3 -i music.mp3 \
  -filter_complex "[0:a]volume=1.0[n];[1:a]volume=0.15[m];[n][m]amix=inputs=2:duration=first" \
  output.m4a
```

### Pattern 2: Music with Auto-Duck

```bash
ffmpeg -i narration.mp3 -i music.mp3 \
  -filter_complex "[0:a]asplit=2[n][sc];[1:a][sc]sidechaincompress=threshold=0.02:ratio=10[d];[n][d]amix=inputs=2" \
  output.m4a
```

### Pattern 3: Timed Intro Music Fade

```bash
ffmpeg -i narration.mp3 -i intro_music.mp3 \
  -filter_complex "\
    [1:a]afade=t=out:st=8:d=2,volume=0.3[intro];\
    [0:a]adelay=10000|10000[narr];\
    [intro][narr]amix=inputs=2:duration=longest" \
  output.m4a
```

### Pattern 4: Crossfade Between Segments

```bash
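# Assumes segment1 is 30 s long. The fade-in is applied before adelay because
# afade counts its start time from the beginning of its input stream; fading
# after the delay would only act on the leading silence.
ffmpeg -i segment1.mp3 -i segment2.mp3 \
  -filter_complex "\
    [0:a]afade=t=out:st=28:d=2[s1];\
    [1:a]afade=t=in:d=2,adelay=28000|28000[s2];\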
    [s1][s2]amix=inputs=2:duration=longest" \
  output.m4a
```

## Troubleshooting

| Issue | Cause | Solution |
|-------|-------|----------|
| Clipping/distortion | Combined levels too high | Reduce individual volumes or add limiter |
| Narration buried | Music too loud | Reduce music to 10-15%, add ducking |
| Hollow/thin sound | Phase cancellation | Check mono compatibility |
| Pumping artifacts | Aggressive ducking | Increase attack/release times |
| Inconsistent levels | No normalization | Apply loudnorm filter |

### Add Limiter to Prevent Clipping

```bash
ffmpeg -i input.mp3 \
  -af "alimiter=level_in=1:level_out=0.9:limit=0.95:attack=5:release=50" \
  output.m4a
```
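
A quick way to anticipate the clipping case from the table is to add up the linear gains before mixing. The sketch below is a conservative worst-case check, assuming every track peaks near full scale at the same moment and that the mix uses `amix` with `normalize=0`; `peak_sum` and `needs_limiter` are hypothetical helper names.

```bash
# Sum the linear gains of all tracks passed as arguments.
peak_sum() {
  awk 'BEGIN { s = 0; for (i = 1; i < ARGC; i++) s += ARGV[i]; printf "%.2f\n", s }' "$@"
}

# Succeed (exit 0) when the summed gain exceeds 1.0, meaning peaks could exceed full scale.
needs_limiter() {
  awk -v s="$(peak_sum "$@")" 'BEGIN { exit !(s > 1.0) }'
}

peak_sum 1.0 0.15 0.7    # 1.85

if needs_limiter 1.0 0.15 0.7; then
  echo "worst-case peak above full scale: lower levels or add alimiter"
fi
```

Real material rarely peaks on every track at once, so this check tends to overestimate; it is meant as a prompt to inspect the mix, not as a measurement.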

## Related Skills

- `video-pacing`: Video rhythm and timing patterns
- `remotion-composer`: Programmatic video generation
- `demo-producer`: Product demo video production
- `thumbnail-first-frame`: Video thumbnail optimization

## References

- [ffmpeg Filters](./references/ffmpeg-filters.md) - Complete audio filter reference
- [Volume Balancing](./references/volume-balancing.md) - Detailed formulas and calculations
- [Ducking Patterns](./references/ducking-patterns.md) - Automatic ducking implementation