---
name: interview-transcription
description: Transcription workflows, recording management, and quote extraction for journalists. Use when processing audio/video recordings, generating transcripts with timestamps, extracting quotes for fact-checking, or building source-and-recording databases. For interview question design and pre-interview preparation, see the interview-prep skill.
---

# Interview transcription and management

Practical workflows for journalists managing interviews from preparation through publication.

## When to activate

- Preparing questions for an interview
- Processing audio/video recordings
- Creating or managing transcripts
- Organizing notes from multiple sources
- Building a source relationship database
- Generating timestamped quotes for fact-checking
- Converting recordings to publishable quotes

## Recording setup for transcription

For pre-interview research, question design, attribution agreements, and consent scripts, use the **interview-prep** skill. The notes here cover only the recording configuration that affects transcription quality.

```python
# Standard recording configuration for clean transcription
RECORDING_SETTINGS = {
    'format': 'wav',           # Lossless for transcription
    'sample_rate': 16000,      # Whisper resamples to 16k anyway; 16k saves disk
    'channels': 1,             # Mono is fine for speech; stereo only if mics are positionally distinct
    'backup': True,            # Always run a backup recorder
}

# File naming convention
# YYYY-MM-DD_source-lastname_topic.wav
# Example: 2026-05-08_smith_budget-hearing.wav
```

**Two-device rule.** Always record on two devices. Phone as backup minimum. If using a wireless lav mic, the recorder built into the lav unit is one device; the phone running a backup app is the second.

**Mono is preferred** unless each speaker has their own dedicated microphone routed to a distinct channel. Stereo with both speakers bleeding into both channels is worse for diarization than clean mono.

## Transcription workflows

### Automated transcription pipeline

Vanilla OpenAI Whisper transcribes audio to text but does **not** assign speaker labels. To get diarized output ("Speaker 1:" / "Speaker 2:" / etc.) you need a tool that combines Whisper with a diarization model — typically **WhisperX** (`m-bain/whisperX`), which wraps faster-whisper transcription with pyannote.audio diarization and produces word-level timestamps with speaker IDs in one pass.

```python
from pathlib import Path
import subprocess
import json

def transcribe_interview(
    audio_path: str,
    output_dir: str = "./transcripts",
    diarize: bool = True,
    hf_token: str | None = None,
    min_speakers: int = 2,
    max_speakers: int = 2,
) -> dict:
    """
    Transcribe an interview using WhisperX (Whisper + pyannote diarization).
    Returns a transcript with word-level timestamps and speaker labels.

    Diarization needs a Hugging Face token with access to the pyannote
    speaker-diarization-3.1 model. Accept the model EULA at
    huggingface.co/pyannote/speaker-diarization-3.1 once, then pass the token.
    """
    Path(output_dir).mkdir(exist_ok=True)

    cmd = [
        'whisperx', audio_path,
        '--model', 'large-v3',
        '--output_format', 'json',
        '--output_dir', output_dir,
        '--language', 'en',
        '--compute_type', 'int8',     # CPU-friendly; use 'float16' on GPU
        '--min_speakers', str(min_speakers),
        '--max_speakers', str(max_speakers),
    ]

    if diarize:
        cmd.append('--diarize')
        if hf_token:
            cmd += ['--hf_token', hf_token]

    subprocess.run(cmd, check=True, capture_output=True)

    json_path = Path(output_dir) / f"{Path(audio_path).stem}.json"
    with open(json_path) as f:
        return json.load(f)

def format_for_editing(transcript: dict) -> str:
    """Convert to journalist-friendly format with timestamps."""
    lines = []
    for segment in transcript.get('segments', []):
        timestamp = format_timestamp(segment['start'])
        text = segment['text'].strip()
        lines.append(f"[{timestamp}] {text}")
    return '\n\n'.join(lines)

def format_timestamp(seconds: float) -> str:
    """Convert seconds to HH:MM:SS format."""
    h = int(seconds // 3600)
    m = int((seconds % 3600) // 60)
    s = int(seconds % 60)
    return f"{h:02d}:{m:02d}:{s:02d}"
```

**Falling back to plain Whisper.** If diarization is overkill or you can't get a Hugging Face token, drop the `--diarize` flag — the model still produces accurate timestamped transcription and you label speakers manually based on context. `faster-whisper` (CTranslate2 backend) is the speed-optimized variant and works the same way at the CLI. `whisper.cpp` is the C++ port for resource-constrained machines (Raspberry Pi, older laptops); it doesn't include diarization but runs the small/medium models on CPU comfortably.

### Manual transcription template

For sensitive interviews or when AI transcription fails:

```markdown
## Transcript: [Source] - [Date]

**Recording file**: [filename]
**Duration**: [XX:XX]
**Transcribed by**: [name]
**Verified against recording**: [ ] Yes / [ ] No

---

[00:00:15] **Q**: [Your question]

[00:00:45] **A**: [Source response - verbatim, including ums, pauses noted as (...)]

[00:01:30] **Q**: [Follow-up]

[00:01:42] **A**: [Response]

---

## Notes
- [Anything not captured in audio: gestures, documents shown, etc.]

## Potential quotes
- [00:01:42] "Quote that stands out" - context: [why it matters]
```

## Quote extraction and verification

### Pull quotes workflow

```python
from dataclasses import dataclass
from typing import Optional
import re

@dataclass
class Quote:
    text: str
    timestamp: str
    speaker: str
    context: str
    verified: bool = False
    used_in: Optional[str] = None

class QuoteBank:
    """Manage quotes from interview transcripts."""

    def __init__(self):
        self.quotes = []

    def extract_quote(self, transcript: str, start_time: str,
                      end_time: str, speaker: str, context: str) -> Quote:
        """Extract and store a quote with metadata."""
        # Pull text between timestamps
        pattern = rf'\[{re.escape(start_time)}\](.+?)(?=\[\d|$)'
        match = re.search(pattern, transcript, re.DOTALL)

        if match:
            text = match.group(1).strip()
            quote = Quote(
                text=text,
                timestamp=start_time,
                speaker=speaker,
                context=context
            )
            self.quotes.append(quote)
            return quote
        return None

    def verify_quote(self, quote: Quote, audio_path: str) -> bool:
        """Mark quote as verified against original recording."""
        # In practice: listen to audio at timestamp, confirm accuracy
        quote.verified = True
        return True

    def export_for_story(self) -> str:
        """Export verified quotes ready for publication."""
        output = []
        for q in self.quotes:
            if q.verified:
                output.append(f'"{q.text}"\n— {q.speaker}\n[Timestamp: {q.timestamp}]')
        return '\n\n'.join(output)
```

### Quote accuracy checklist

Before publishing any quote:

```markdown
- [ ] Listened to original recording at timestamp
- [ ] Quote is verbatim (or clearly marked as paraphrased)
- [ ] Context preserved (not cherry-picked to change meaning)
- [ ] Speaker identified correctly
- [ ] Timestamp documented for fact-checker
- [ ] Source approved quote (if agreement made)
```

## Source management database

### Interview tracking schema

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional
from enum import Enum

class SourceStatus(Enum):
    ACTIVE = "active"           # Currently engaged
    DORMANT = "dormant"         # Not recently contacted
    DECLINED = "declined"       # Refused to participate
    OFF_RECORD = "off_record"   # Background only

class InterviewType(Enum):
    ON_RECORD = "on_record"
    BACKGROUND = "background"
    DEEP_BACKGROUND = "deep_background"
    OFF_RECORD = "off_record"

@dataclass
class Source:
    name: str
    organization: str
    contact_info: dict  # email, phone, signal, etc.
    beat: str
    status: SourceStatus = SourceStatus.ACTIVE
    interviews: List['Interview'] = field(default_factory=list)
    notes: str = ""

    # Relationship tracking
    first_contact: Optional[datetime] = None
    trust_level: int = 1  # 1-5 scale

@dataclass
class Interview:
    source: str
    date: datetime
    interview_type: InterviewType
    recording_path: Optional[str] = None
    transcript_path: Optional[str] = None
    story_slug: Optional[str] = None
    key_quotes: List[str] = field(default_factory=list)
    follow_up_needed: bool = False
    notes: str = ""
```

### Quick source lookup

```python
def find_sources_for_story(sources: List[Source], topic: str,
                           beat: str = None) -> List[Source]:
    """Find relevant sources for a new story."""
    matches = []
    for source in sources:
        # Filter by beat if specified
        if beat and source.beat != beat:
            continue
        # Only suggest active sources
        if source.status != SourceStatus.ACTIVE:
            continue
        # Check if they've spoken on similar topics
        for interview in source.interviews:
            if topic.lower() in interview.notes.lower():
                matches.append(source)
                break

    # Sort by trust level
    return sorted(matches, key=lambda s: s.trust_level, reverse=True)
```

## Audio/video processing

### Batch processing multiple recordings

```python
from pathlib import Path
from concurrent.futures import ProcessPoolExecutor
import json

def batch_transcribe(recordings_dir: str, output_dir: str) -> dict:
    """Process all recordings in a directory."""
    recordings = list(Path(recordings_dir).glob('*.wav')) + \
                 list(Path(recordings_dir).glob('*.mp3')) + \
                 list(Path(recordings_dir).glob('*.m4a'))

    results = {}

    with ProcessPoolExecutor(max_workers=4) as executor:
        futures = {
            executor.submit(transcribe_interview, str(rec), output_dir): rec
            for rec in recordings
        }

        for future in futures:
            rec = futures[future]
            try:
                transcript = future.result()
                results[rec.name] = {
                    'status': 'success',
                    'transcript': transcript
                }
            except Exception as e:
                results[rec.name] = {
                    'status': 'error',
                    'error': str(e)
                }

    return results
```

### Video interview extraction

```python
import subprocess

def extract_audio_from_video(video_path: str, output_path: str = None) -> str:
    """Extract audio track from video for transcription."""
    if output_path is None:
        output_path = video_path.rsplit('.', 1)[0] + '.wav'

    subprocess.run([
        'ffmpeg', '-i', video_path,
        '-vn',  # No video
        '-acodec', 'pcm_s16le',  # WAV format
        '-ar', '44100',  # Sample rate
        '-ac', '1',  # Mono
        output_path
    ], check=True)

    return output_path
```

## Legal and ethical considerations

### Consent documentation

```markdown
## Recording consent record

**Date**:
**Source name**:
**Recording type**: [ ] Audio [ ] Video
**Interview type**: [ ] On record [ ] Background [ ] Off record

### Consent obtained:
- [ ] Verbal consent recorded at start of interview
- [ ] Written consent form signed
- [ ] Email confirmation of consent

### Jurisdiction notes:
- Interview location state/country:
- One-party or two-party consent jurisdiction:
- Any specific restrictions agreed:

### Agreed terms:
- [ ] Full attribution allowed
- [ ] Organization attribution only
- [ ] Anonymous source
- [ ] Review quotes before publication
- [ ] Embargo until [date]:
```

### Recording-consent jurisdiction

For the per-state breakdown of one-party vs. all-party consent, hidden-recording rules, and federal preemption, use the **interview-prep** skill (which points to the Reporters Committee for Freedom of the Press *Reporter's Recording Guide* — the authoritative continuously-updated source).

**Always get explicit consent on recording** regardless of jurisdiction. Note the consent verbatim at the head of every transcript file (timestamp, speaker, response). This protects you legally everywhere and gives the fact-checker a clean starting point.

## Tools and resources

| Tool | Purpose | Notes |
|------|---------|-------|
| OpenAI Whisper | Local transcription, no diarization | Free, runs offline. `large-v3` is the current best model |
| WhisperX | Whisper + speaker diarization | `m-bain/whisperX`. Free. Word-level timestamps with speaker IDs. Needs a Hugging Face token for the pyannote model |
| faster-whisper | Speed-optimized Whisper | CTranslate2 backend. ~4x faster than vanilla Whisper at the same accuracy. Used internally by WhisperX |
| whisper.cpp | CPU-friendly Whisper port | C++ implementation. Runs the small/medium models on a Raspberry Pi |
| pyannote.audio | Standalone speaker diarization | Use directly when you already have transcripts from another source |
| MacWhisper / Buzz | GUI wrappers for Whisper | macOS / cross-platform GUIs for journalists who don't want a CLI |
| Otter.ai | Cloud transcription, real-time | Verify privacy posture before using with sensitive sources — Otter Pilot has historically joined meetings unannounced and indexed transcripts; check current settings |
| Descript | Edit audio like text | Good for pulling clips. Cloud-hosted |
| Rev (human + AI) | Human transcription for sensitive material | Slower, more accurate. Cloud-hosted |
| Trint | Journalist-focused, collaboration | Cloud-hosted. Has team features |
| oTranscribe | Free web-based manual transcription aid | Local-only (browser); no upload. Good for off-the-record material you can't hand to a cloud service |

## Related skills

- **interview-prep** — Pre-interview research, question design, consent scripts, and recording-law jurisdiction
- **source-verification** — Verify source credentials before interview
- **fact-check-workflow** — Verify quotes against the recording before publication
- **foia-requests** — Get documents to inform interview questions
- **data-journalism** — Analyze data sources mentioned in interviews
- **newsroom-style** — Convert verbatim quotes into AP-style copy for publication

---

## Skill metadata

| Field | Value |
|-------|-------|
| version | 1.0.0 |
| created | 2025-12-26 |
| updated | 2026-05-08 |
| author | Joe Amditis |
| domain | journalism, research |
| complexity | intermediate |