---
name: audio-video
description: "Audio and video processing with FFmpeg, WebRTC, and streaming. Covers transcoding, format conversion, real-time communication, and media pipelines. Use for video processing, live streaming, or audio manipulation."
---

# Audio & Video Processing Skill

A practical reference for audio and video processing with FFmpeg, its Python wrappers, and common streaming workflows.

## Quick Reference

### FFmpeg Commands
| Task | Command |
|------|---------|
| **Convert** | `ffmpeg -i input.mp4 output.webm` |
| **Extract Audio** | `ffmpeg -i video.mp4 -vn audio.mp3` |
| **Thumbnail** | `ffmpeg -i video.mp4 -ss 5 -frames:v 1 thumb.jpg` |
| **Resize** | `ffmpeg -i input.mp4 -vf scale=1280:720 output.mp4` |
| **Compress** | `ffmpeg -i input.mp4 -crf 28 output.mp4` |

### Common Formats
```
Video: MP4, WebM, MKV, AVI, MOV
Audio: MP3, AAC, WAV, FLAC, OGG
Video codecs: H.264, H.265/HEVC, VP9, AV1
```

---

## 1. FFmpeg Basics

### Installation
```bash
# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Windows (Chocolatey)
choco install ffmpeg
```

### Basic Conversion
```bash
# Video format conversion
ffmpeg -i input.avi output.mp4

# Audio format conversion
ffmpeg -i input.wav output.mp3

# With codec specification
ffmpeg -i input.mp4 -c:v libx264 -c:a aac output.mp4
```

### Video Quality Control
```bash
# CRF (Constant Rate Factor): 0-51 for 8-bit libx264, lower = better quality, default 23
ffmpeg -i input.mp4 -c:v libx264 -crf 23 output.mp4

# Bitrate control
ffmpeg -i input.mp4 -c:v libx264 -b:v 2M output.mp4

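# Two-pass encoding to hit a target bitrate (and file size) more accurately
# Pass 1 only gathers statistics, so audio is skipped and the output discarded (use NUL instead of /dev/null on Windows)
ffmpeg -y -i input.mp4 -c:v libx264 -b:v 2M -pass 1 -an -f null /dev/null
ffmpeg -i input.mp4 -c:v libx264 -b:v 2M -pass 2 output.mp4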
```
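
When two-pass encoding is used to hit a file size, the `-b:v` value has to be derived from the size budget. A minimal sketch of that arithmetic, assuming decimal megabytes and ignoring container overhead (the function name is hypothetical):

```python
def target_video_bitrate_kbps(target_size_mb: float, duration_s: float,
                              audio_kbps: float = 128.0) -> int:
    """Video bitrate in kbit/s that roughly fits a target size for a given duration."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    # 1 MB is treated as 8000 kilobits (decimal megabytes).
    total_kbps = target_size_mb * 8000 / duration_s
    video_kbps = int(total_kbps - audio_kbps)
    if video_kbps <= 0:
        raise ValueError("target size is too small for this duration and audio bitrate")
    return video_kbps


# Example: fit a 10-minute video into roughly 50 MB with 128 kbit/s audio.
bitrate = target_video_bitrate_kbps(50, 600)
print(f"-b:v {bitrate}k")
```

Leave a few percent of headroom in practice, since muxing overhead is not accounted for here.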

---

## 2. Video Processing

### Resize and Scale
```bash
# Scale to specific resolution
ffmpeg -i input.mp4 -vf "scale=1920:1080" output.mp4

# Scale maintaining aspect ratio (-2 rounds to an even number, which libx264 with yuv420p requires)
ffmpeg -i input.mp4 -vf "scale=1280:-2" output.mp4  # Auto height
ffmpeg -i input.mp4 -vf "scale=-2:720" output.mp4   # Auto width

# Scale with padding (letterbox)
ffmpeg -i input.mp4 -vf "scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2" output.mp4
```
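
The letterbox filter chain above shrinks the frame to fit inside the target box and centers it with padding. The same geometry can be sketched in Python to predict the result before encoding; the function name is hypothetical and FFmpeg's own rounding may differ by a pixel:

```python
def letterbox(src_w: int, src_h: int, dst_w: int = 1920, dst_h: int = 1080):
    """Return (scaled_w, scaled_h, pad_x, pad_y) for fitting src inside dst."""
    # Use the smaller ratio so neither dimension exceeds the target box.
    scale = min(dst_w / src_w, dst_h / src_h)
    scaled_w = round(src_w * scale)
    scaled_h = round(src_h * scale)
    # Center the scaled frame, mirroring (ow-iw)/2 and (oh-ih)/2 in the pad filter.
    pad_x = (dst_w - scaled_w) // 2
    pad_y = (dst_h - scaled_h) // 2
    return scaled_w, scaled_h, pad_x, pad_y


# A 4:3 source placed in a 16:9 frame gets bars on the left and right.
print(letterbox(640, 480))
```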

### Trim and Cut
```bash
# Cut from timestamp to timestamp (-c copy avoids re-encoding but can only cut on keyframes)
ffmpeg -i input.mp4 -ss 00:01:00 -to 00:02:00 -c copy output.mp4

# Cut with duration
ffmpeg -i input.mp4 -ss 00:01:00 -t 30 -c copy output.mp4

# Fast seek (put -ss before input)
ffmpeg -ss 00:01:00 -i input.mp4 -t 30 -c copy output.mp4
```
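
With `-ss` placed before the input, it is usually simpler to pass a duration with `-t` than an end timestamp. A small sketch that converts a start/end pair into fast-seek arguments; both helper names are hypothetical:

```python
def to_seconds(ts: str) -> float:
    """Convert 'HH:MM:SS[.ms]', 'MM:SS', or plain seconds to float seconds."""
    seconds = 0.0
    for part in ts.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def fast_trim_args(src: str, dst: str, start: str, end: str) -> list:
    """Build an ffmpeg argument list that seeks on the input and copies streams."""
    duration = to_seconds(end) - to_seconds(start)
    if duration <= 0:
        raise ValueError("end must be later than start")
    return ["ffmpeg", "-ss", start, "-i", src,
            "-t", f"{duration:g}", "-c", "copy", dst]


print(" ".join(fast_trim_args("input.mp4", "output.mp4", "00:01:00", "00:02:00")))
```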

### Concatenate Videos
```bash
# Create file list
echo "file 'video1.mp4'" > list.txt
echo "file 'video2.mp4'" >> list.txt
echo "file 'video3.mp4'" >> list.txt

# Concatenate
ffmpeg -f concat -safe 0 -i list.txt -c copy output.mp4
```
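
For more than a handful of files, the list can be generated instead of echoed by hand. A sketch, assuming the concat demuxer's single-quote escaping rule (`'` becomes `'\''`); the function name is hypothetical:

```python
def concat_list_text(paths) -> str:
    """Return the contents of a concat demuxer list file for the given paths."""
    lines = []
    for path in paths:
        # Inside single quotes, a literal quote is written as '\''
        escaped = str(path).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


# Write list.txt for use with: ffmpeg -f concat -safe 0 -i list.txt -c copy output.mp4
with open("list.txt", "w", encoding="utf-8") as fh:
    fh.write(concat_list_text(["video1.mp4", "video2.mp4", "video3.mp4"]))
```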

### Add Watermark
```bash
# Image watermark
ffmpeg -i input.mp4 -i logo.png -filter_complex "overlay=10:10" output.mp4

# Bottom right corner
ffmpeg -i input.mp4 -i logo.png -filter_complex "overlay=W-w-10:H-h-10" output.mp4

# Text watermark
ffmpeg -i input.mp4 -vf "drawtext=text='Copyright':fontsize=24:fontcolor=white:x=10:y=10" output.mp4
```

### Speed Adjustment
```bash
# Speed up 2x
ffmpeg -i input.mp4 -filter:v "setpts=0.5*PTS" -filter:a "atempo=2.0" output.mp4

# Slow down 0.5x
ffmpeg -i input.mp4 -filter:v "setpts=2*PTS" -filter:a "atempo=0.5" output.mp4
```
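
A single `atempo` instance does not accept factors below 0.5, and older FFmpeg builds also cap it at 2.0, so larger changes are commonly built by chaining several instances. A sketch of that chaining with hypothetical helper names:

```python
def atempo_chain(speed: float) -> str:
    """Build an atempo filter chain whose factors each stay within 0.5-2.0."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    factors = []
    remaining = speed
    while remaining > 2.0:   # Peel off 2x steps for large speed-ups
        factors.append(2.0)
        remaining /= 2.0
    while remaining < 0.5:   # Peel off 0.5x steps for large slow-downs
        factors.append(0.5)
        remaining /= 0.5
    factors.append(remaining)
    return ",".join(f"atempo={f:g}" for f in factors)


def speed_filters(speed: float):
    """Return the (video_filter, audio_filter) pair for a given speed factor."""
    return f"setpts=PTS/{speed:g}", atempo_chain(speed)


video_filter, audio_filter = speed_filters(4.0)
print(video_filter, audio_filter)
```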

---

## 3. Audio Processing

### Extract Audio
```bash
# Extract audio track without re-encoding (assumes the source audio is AAC)
ffmpeg -i video.mp4 -vn -c:a copy audio.aac

# Convert to MP3
ffmpeg -i video.mp4 -vn -c:a libmp3lame -q:a 2 audio.mp3

# Extract specific audio stream
ffmpeg -i video.mp4 -map 0:a:0 -c copy audio.m4a
```

### Audio Conversion
```bash
# WAV to MP3
ffmpeg -i input.wav -c:a libmp3lame -b:a 320k output.mp3

# FLAC to MP3
ffmpeg -i input.flac -c:a libmp3lame -q:a 0 output.mp3

# MP3 to AAC
ffmpeg -i input.mp3 -c:a aac -b:a 256k output.m4a
```

### Audio Normalization
```bash
# Loudnorm filter (EBU R128)
ffmpeg -i input.mp3 -af loudnorm=I=-16:LRA=11:TP=-1.5 output.mp3

# Volume adjustment
ffmpeg -i input.mp3 -af "volume=2.0" output.mp3  # 2x louder
ffmpeg -i input.mp3 -af "volume=0.5" output.mp3  # Half volume

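# Dynamic range compression (acompressor does not normalize peaks;
# for peak normalization, measure max_volume with volumedetect and apply the gain with volume)
ffmpeg -i input.mp3 -af "acompressor" output.mp3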
```
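
The single-pass `loudnorm` call above normalizes dynamically. For a more accurate result, a common approach is to run the filter twice: a measurement pass with `print_format=json`, then a second pass fed with the measured values. The sketch below only builds the second-pass filter string from the first pass's stderr; the function name is hypothetical, and the JSON key names are as I understand the filter's output, so check them against your FFmpeg build:

```python
import json


def second_pass_loudnorm(stderr_text: str, i: float = -16.0,
                         lra: float = 11.0, tp: float = -1.5) -> str:
    """Build the second-pass loudnorm filter from first-pass JSON output."""
    # The measurement JSON is assumed to be the last {...} block in stderr.
    start = stderr_text.rfind("{")
    end = stderr_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no loudnorm JSON found in ffmpeg output")
    m = json.loads(stderr_text[start:end + 1])
    return (
        f"loudnorm=I={i:g}:LRA={lra:g}:TP={tp:g}"
        f":measured_I={m['input_i']}:measured_TP={m['input_tp']}"
        f":measured_LRA={m['input_lra']}:measured_thresh={m['input_thresh']}"
        f":offset={m['target_offset']}:linear=true"
    )
```

The first pass would be run as `ffmpeg -i input.mp3 -af loudnorm=I=-16:LRA=11:TP=-1.5:print_format=json -f null -`, with its stderr passed to this function.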

### Merge Audio/Video
```bash
# Replace audio track
ffmpeg -i video.mp4 -i audio.mp3 -c:v copy -c:a aac -map 0:v -map 1:a output.mp4

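# Mix new audio with the video's existing audio track (explicit maps keep the video and the mixed audio)
ffmpeg -i video.mp4 -i audio.mp3 -filter_complex "[0:a][1:a]amix=inputs=2:duration=first[aout]" -map 0:v -map "[aout]" -c:v copy -c:a aac output.mp4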
```

---

## 4. Thumbnails and Screenshots

### Single Thumbnail
```bash
# At specific time
ffmpeg -i video.mp4 -ss 00:00:05 -frames:v 1 thumbnail.jpg

# Best quality
ffmpeg -i video.mp4 -ss 00:00:05 -frames:v 1 -q:v 2 thumbnail.jpg

# Specific size
ffmpeg -i video.mp4 -ss 00:00:05 -frames:v 1 -vf "scale=320:180" thumbnail.jpg
```

### Multiple Thumbnails
```bash
# Every N seconds
ffmpeg -i video.mp4 -vf "fps=1/10" thumbnails_%03d.jpg  # Every 10 seconds

# Every 300th frame (about every 10 seconds at 30 fps)
# On FFmpeg 5.1+, -fps_mode vfr replaces the deprecated -vsync vfr
ffmpeg -i video.mp4 -vf "select='not(mod(n,300))'" -vsync vfr thumb_%03d.jpg

# Thumbnail sprite/grid (-frames:v 1 writes a single 10x10 sheet covering the first 100 samples)
ffmpeg -i video.mp4 -vf "fps=1/10,scale=160:90,tile=10x10" -frames:v 1 sprite.jpg
```
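
A sprite sheet is typically consumed by a player that shows the tile nearest to the hovered timestamp. A sketch of the lookup, assuming the 10-second interval, 10x10 grid, and 160x90 tiles used above, and that tiles are laid out left to right, top to bottom (the function name is hypothetical, and the mapping is approximate because the `fps` filter picks the nearest frames):

```python
def sprite_tile_offset(t: float, interval: float = 10, cols: int = 10, rows: int = 10,
                       tile_w: int = 160, tile_h: int = 90):
    """Return the (x, y) pixel offset of the tile that covers time t (seconds)."""
    if t < 0:
        raise ValueError("t must be non-negative")
    index = int(t // interval)
    # Clamp to the last tile so times past the sheet still resolve.
    index = min(index, cols * rows - 1)
    row, col = divmod(index, cols)
    return col * tile_w, row * tile_h


# The preview for 2m05s sits in the second row, third column.
print(sprite_tile_offset(125))
```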

### GIF Creation
```bash
# Simple GIF
ffmpeg -i video.mp4 -ss 0 -t 5 -vf "fps=10,scale=320:-1" output.gif

# High quality GIF with palette
ffmpeg -i video.mp4 -ss 0 -t 5 -vf "fps=10,scale=320:-1:flags=lanczos,palettegen" palette.png
ffmpeg -i video.mp4 -i palette.png -ss 0 -t 5 -filter_complex "[0:v]fps=10,scale=320:-1:flags=lanczos[x];[x][1:v]paletteuse" output.gif
```

---

## 5. Streaming Formats

### HLS (HTTP Live Streaming)
```bash
# Generate HLS
ffmpeg -i input.mp4 \
  -c:v libx264 -c:a aac \
  -hls_time 10 \
  -hls_list_size 0 \
  -hls_segment_filename "segment_%03d.ts" \
  output.m3u8

# Adaptive bitrate HLS
ffmpeg -i input.mp4 \
  -filter_complex "[0:v]split=3[v1][v2][v3];[v1]scale=1920:1080[v1out];[v2]scale=1280:720[v2out];[v3]scale=854:480[v3out]" \
  -map "[v1out]" -c:v:0 libx264 -b:v:0 5M \
  -map "[v2out]" -c:v:1 libx264 -b:v:1 2M \
  -map "[v3out]" -c:v:2 libx264 -b:v:2 1M \
  -map 0:a -map 0:a -map 0:a -c:a aac -b:a 128k \
  -var_stream_map "v:0,a:0 v:1,a:1 v:2,a:2" \
  -master_pl_name master.m3u8 \
  -hls_time 6 \
  -hls_segment_filename "v%v/segment_%03d.ts" \
  v%v/index.m3u8
```
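
The adaptive bitrate command grows quickly as renditions are added, so it can help to generate it from a ladder definition. A sketch that builds the argument list; the function name and the ladder tuple layout are hypothetical, and the audio is mapped once per rendition on the assumption that each variant in `-var_stream_map` needs its own audio stream index:

```python
def abr_hls_args(src: str, ladder, out_dir: str = ".") -> list:
    """Build ffmpeg args for adaptive HLS; ladder is [(width, height, bitrate), ...]."""
    n = len(ladder)
    if n == 0:
        raise ValueError("ladder must contain at least one rendition")
    # Split the decoded video once, then scale each branch.
    labels = "".join(f"[v{i}]" for i in range(n))
    chains = [f"[0:v]split={n}{labels}"]
    for i, (width, height, _) in enumerate(ladder):
        chains.append(f"[v{i}]scale={width}:{height}[v{i}out]")
    args = ["ffmpeg", "-i", src, "-filter_complex", ";".join(chains)]
    for i, (_, _, bitrate) in enumerate(ladder):
        args += ["-map", f"[v{i}out]", f"-c:v:{i}", "libx264", f"-b:v:{i}", bitrate]
    for _ in ladder:
        args += ["-map", "0:a"]  # One audio stream per rendition
    args += ["-c:a", "aac", "-b:a", "128k"]
    args += ["-var_stream_map", " ".join(f"v:{i},a:{i}" for i in range(n))]
    args += ["-master_pl_name", "master.m3u8", "-hls_time", "6",
             "-hls_segment_filename", f"{out_dir}/v%v/segment_%03d.ts",
             f"{out_dir}/v%v/index.m3u8"]
    return args


ladder = [(1920, 1080, "5M"), (1280, 720, "2M"), (854, 480, "1M")]
print(" ".join(abr_hls_args("input.mp4", ladder)))
```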

### DASH (Dynamic Adaptive Streaming)
```bash
ffmpeg -i input.mp4 \
  -c:v libx264 -c:a aac \
  -f dash \
  -init_seg_name "init-\$RepresentationID\$.m4s" \
  -media_seg_name "chunk-\$RepresentationID\$-\$Number%05d\$.m4s" \
  output.mpd
```

---

## 6. Python Integration

### PyAV (FFmpeg bindings)
```python
import av

def transcode_video(input_path, output_path, target_codec='libx264'):
    input_container = av.open(input_path)
    output_container = av.open(output_path, 'w')

    # Get input streams
    video_stream = input_container.streams.video[0]
    audio_stream = input_container.streams.audio[0] if input_container.streams.audio else None

    # Create output streams
    output_video = output_container.add_stream(target_codec, rate=video_stream.average_rate)
    output_video.width = video_stream.width
    output_video.height = video_stream.height
    output_video.pix_fmt = 'yuv420p'

    if audio_stream:
        output_audio = output_container.add_stream('aac', rate=audio_stream.rate)

    for packet in input_container.demux():
        if packet.stream.type == 'video':
            for frame in packet.decode():
                for out_packet in output_video.encode(frame):
                    output_container.mux(out_packet)
        elif packet.stream.type == 'audio' and audio_stream:
            for frame in packet.decode():
                for out_packet in output_audio.encode(frame):
                    output_container.mux(out_packet)

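    # Flush encoders (both video and audio, or trailing frames are lost)
    for packet in output_video.encode():
        output_container.mux(packet)
    if audio_stream:
        for packet in output_audio.encode():
            output_container.mux(packet)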

    output_container.close()
    input_container.close()
```

### MoviePy
```python
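# MoviePy 1.x API; MoviePy 2.x dropped moviepy.editor and renamed these methods (e.g. subclip -> subclipped)
from moviepy.editor import VideoFileClip, concatenate_videoclips, TextClip, CompositeVideoClip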

def process_video(input_path, output_path):
    # Load video
    clip = VideoFileClip(input_path)

    # Trim
    trimmed = clip.subclip(10, 60)  # 10s to 60s

    # Resize
    resized = trimmed.resize(height=720)

    # Add text overlay
    txt_clip = TextClip("Hello World", fontsize=50, color='white')
    txt_clip = txt_clip.set_position('center').set_duration(5)

    # Composite
    final = CompositeVideoClip([resized, txt_clip])

    # Export
    final.write_videofile(
        output_path,
        codec='libx264',
        audio_codec='aac'
    )

    clip.close()

def create_thumbnail(video_path, output_path, time=5):
    clip = VideoFileClip(video_path)
    # save_frame extracts and writes the frame in one call
    clip.save_frame(output_path, t=time)
    clip.close()

def concatenate_videos(video_paths, output_path):
    clips = [VideoFileClip(path) for path in video_paths]
    final = concatenate_videoclips(clips)
    final.write_videofile(output_path)
    for clip in clips:
        clip.close()
```

### FFmpeg Subprocess
```python
import subprocess
import json

def get_video_info(video_path):
    """Get video metadata using ffprobe"""
    cmd = [
        'ffprobe',
        '-v', 'quiet',
        '-print_format', 'json',
        '-show_format',
        '-show_streams',
        video_path
    ]

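    # check=True raises CalledProcessError if ffprobe fails (e.g. missing or unreadable file)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)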

def transcode_video(input_path, output_path, options=None):
    """Transcode video with FFmpeg"""
    cmd = ['ffmpeg', '-i', input_path]

    if options:
        if 'video_codec' in options:
            cmd.extend(['-c:v', options['video_codec']])
        if 'audio_codec' in options:
            cmd.extend(['-c:a', options['audio_codec']])
        if 'crf' in options:
            cmd.extend(['-crf', str(options['crf'])])
        if 'resolution' in options:
            cmd.extend(['-vf', f"scale={options['resolution']}"])

    cmd.extend(['-y', output_path])

    result = subprocess.run(cmd, capture_output=True, text=True)

    if result.returncode != 0:
        raise RuntimeError(f"FFmpeg error: {result.stderr}")

    return True

def generate_hls(input_path, output_dir):
    """Generate HLS stream"""
    import os
    os.makedirs(output_dir, exist_ok=True)

    cmd = [
        'ffmpeg', '-i', input_path,
        '-c:v', 'libx264',
        '-c:a', 'aac',
        '-hls_time', '10',
        '-hls_list_size', '0',
        '-hls_segment_filename', f'{output_dir}/segment_%03d.ts',
        f'{output_dir}/index.m3u8'
    ]

    subprocess.run(cmd, check=True)
```
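
The dictionary returned by `get_video_info` is verbose, so callers usually reduce it to a few fields. A sketch of such a reducer, assuming the standard ffprobe JSON layout (`format.duration`, and `codec_type`, `codec_name`, `width`, `height`, `avg_frame_rate` on each stream); the function name is hypothetical:

```python
from fractions import Fraction


def summarize_probe(info: dict) -> dict:
    """Reduce ffprobe JSON output to duration, resolution, codec, fps, and audio presence."""
    streams = info.get("streams", [])
    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    audio = next((s for s in streams if s.get("codec_type") == "audio"), None)
    summary = {
        "duration": float(info.get("format", {}).get("duration", 0.0)),
        "has_audio": audio is not None,
        "has_video": video is not None,
    }
    if video is not None:
        try:
            # Frame rates are reported as fractions such as "30000/1001".
            fps = round(float(Fraction(video.get("avg_frame_rate", "0/1"))), 3)
        except (ZeroDivisionError, ValueError):
            fps = None  # ffprobe can report "0/0" when the rate is unknown
        summary.update(
            width=video.get("width"),
            height=video.get("height"),
            codec=video.get("codec_name"),
            fps=fps,
        )
    return summary
```

Usage: `summarize_probe(get_video_info("input.mp4"))`.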

---

## 7. Live Streaming

### RTMP Server (nginx-rtmp)
```nginx
# nginx.conf
rtmp {
    server {
        listen 1935;

        application live {
            live on;
            record off;

            # HLS output
            hls on;
            hls_path /var/www/hls;
            hls_fragment 3;
            hls_playlist_length 60;
        }
    }
}
```

### Stream to RTMP
```bash
# Stream file to RTMP
ffmpeg -re -i input.mp4 -c copy -f flv rtmp://server/live/stream

# Stream webcam (Linux: V4L2 video + ALSA audio)
ffmpeg -f v4l2 -i /dev/video0 -f alsa -i hw:0 \
  -c:v libx264 -preset veryfast -b:v 2M \
  -c:a aac -b:a 128k \
  -f flv rtmp://server/live/stream

# Stream to YouTube (-re reads the file at its native frame rate, which live ingest expects)
ffmpeg -re -i input.mp4 \
  -c:v libx264 -preset medium -b:v 4M \
  -c:a aac -b:a 128k \
  -f flv rtmp://a.rtmp.youtube.com/live2/YOUR_STREAM_KEY
```

---

## 8. Video Processing Pipeline

### Async Processing with Celery
```python
from celery import Celery
import os
import subprocess

celery = Celery('video_tasks')

@celery.task
def process_upload(video_id, input_path):
    """Complete video processing pipeline"""
    output_dir = f'/videos/{video_id}'
    os.makedirs(output_dir, exist_ok=True)

    # Generate thumbnail
    generate_thumbnail.delay(video_id, input_path, f'{output_dir}/thumb.jpg')

    # Transcode to multiple resolutions
    resolutions = [
        ('1080p', '1920:1080', '5M'),
        ('720p', '1280:720', '2.5M'),
        ('480p', '854:480', '1M'),
    ]

    for name, scale, bitrate in resolutions:
        transcode_resolution.delay(
            video_id, input_path,
            f'{output_dir}/{name}.mp4',
            scale, bitrate
        )

    # Generate HLS
    generate_hls.delay(video_id, input_path, f'{output_dir}/hls')

@celery.task
def generate_thumbnail(video_id, input_path, output_path):
    cmd = [
        'ffmpeg', '-i', input_path,
        '-ss', '5', '-frames:v', '1',
        '-vf', 'scale=320:-1',
        output_path
    ]
    subprocess.run(cmd, check=True)

@celery.task
def transcode_resolution(video_id, input_path, output_path, scale, bitrate):
    cmd = [
        'ffmpeg', '-i', input_path,
        '-c:v', 'libx264', '-b:v', bitrate,
        '-vf', f'scale={scale}',
        '-c:a', 'aac', '-b:a', '128k',
        output_path
    ]
    subprocess.run(cmd, check=True)

@celery.task
def generate_hls(video_id, input_path, output_dir):
    # Task counterpart to the generate_hls.delay(...) call in process_upload
    os.makedirs(output_dir, exist_ok=True)
    cmd = [
        'ffmpeg', '-i', input_path,
        '-c:v', 'libx264', '-c:a', 'aac',
        '-hls_time', '10', '-hls_list_size', '0',
        '-hls_segment_filename', f'{output_dir}/segment_%03d.ts',
        f'{output_dir}/index.m3u8'
    ]
    subprocess.run(cmd, check=True)
```

---

## 9. Audio Analysis

### Waveform Generation
```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

def generate_waveform(audio_path, output_path):
    """Generate waveform image"""
    y, sr = librosa.load(audio_path)

    plt.figure(figsize=(14, 5))
    librosa.display.waveshow(y, sr=sr)
    plt.title('Waveform')
    plt.savefig(output_path, dpi=150, bbox_inches='tight')
    plt.close()

def generate_spectrogram(audio_path, output_path):
    """Generate spectrogram image"""
    y, sr = librosa.load(audio_path)
    D = librosa.stft(y)
    S_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)

    plt.figure(figsize=(14, 5))
    librosa.display.specshow(S_db, sr=sr, x_axis='time', y_axis='hz')
    plt.colorbar(format='%+2.0f dB')
    plt.title('Spectrogram')
    plt.savefig(output_path, dpi=150, bbox_inches='tight')
    plt.close()
```

---

## 10. Best Practices

### Encoding Presets
```bash
# Web-optimized MP4
ffmpeg -i input.mp4 \
  -c:v libx264 -preset slow -crf 22 \
  -c:a aac -b:a 128k \
  -movflags +faststart \
  output.mp4

# Archive quality
ffmpeg -i input.mp4 \
  -c:v libx264 -preset veryslow -crf 18 \
  -c:a flac \
  archive.mkv

# Mobile-optimized
ffmpeg -i input.mp4 \
  -c:v libx264 -preset fast -crf 28 \
  -vf "scale='min(720,iw)':-2" \
  -c:a aac -b:a 96k \
  mobile.mp4
```

### Checklist

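1. **Use hardware acceleration** - NVENC, VAAPI, or VideoToolbox when encoding speed is the bottleneck
2. **Optimize for web** - `-movflags +faststart` so MP4 playback can begin before the download finishes
3. **Choose the right codec** - H.264 for compatibility, VP9/AV1 for smaller files at similar quality
4. **Pick an appropriate CRF** - 18-23 for quality, 28+ for smaller files (libx264 scale)
5. **Two-pass for size** - When a target bitrate or file size matters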
6. **Process async** - Queue long operations
7. **Generate previews** - Thumbnails and sprites
8. **Validate input** - Check format before processing
9. **Clean temp files** - Remove intermediate files
10. **Monitor resources** - CPU/memory limits
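
Items 9 and 10 can be combined in one small wrapper: run the command with a timeout and keep intermediate files (two-pass logs, palettes) in a scratch directory that is removed afterwards. A sketch with a hypothetical function name; the example call uses a Python child process as a stand-in for an ffmpeg command so it runs anywhere:

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_scratch_dir(build_cmd, timeout_s: float = 600) -> str:
    """Run build_cmd(scratch_dir) with a timeout; the scratch dir is always deleted."""
    with tempfile.TemporaryDirectory(prefix="media_") as tmp:
        scratch = Path(tmp)
        cmd = build_cmd(scratch)
        # timeout raises subprocess.TimeoutExpired if the process runs too long
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        if result.returncode != 0:
            raise RuntimeError(result.stderr.strip())
        return result.stdout


# Stand-in command; a real call would return an ffmpeg argument list that
# writes intermediates under `scratch` and the final output elsewhere.
print(run_with_scratch_dir(lambda scratch: [sys.executable, "-c", "print('ok')"]))
```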