---
name: interview-transcription
description: Interview management, transcription workflows, and source note-taking for journalists. Use when preparing for interviews, managing recordings, transcribing audio/video, organizing source notes, creating timestamped references, or building interview databases. Essential for reporters conducting interviews and managing source relationships.
---

# Interview transcription and management

Practical workflows for journalists managing interviews from preparation through publication.

## When to activate

- Preparing questions for an interview
- Processing audio/video recordings
- Creating or managing transcripts
- Organizing notes from multiple sources
- Building a source relationship database
- Generating timestamped quotes for fact-checking
- Converting recordings to publishable quotes

## Pre-interview preparation

### Research checklist

Before recording starts, you should already know:

```markdown
## Source prep for: [Name]

### Background
- Role/title:
- Organization:
- Why they're relevant to this story:
- Previous media appearances (note inconsistencies):

### Key questions (prioritized)
1. [Must-ask question]
2. [Must-ask question]
3. [If time permits]

### Documents to reference
- [ ] Bring/share [specific document]
- [ ] Ask about [specific claim/data point]

### Red lines
- Topics they'll likely avoid:
- Sensitive areas to approach carefully:
```

### Recording setup

```python
# Standard recording configuration
RECORDING_SETTINGS = {
    'format': 'wav',           # Lossless for transcription
    'sample_rate': 44100,      # Standard quality
    'channels': 1,             # Mono is fine for speech
    'backup': True,            # Always run backup recorder
}

# File naming convention
# YYYY-MM-DD_source-lastname_topic.wav
# Example: 2024-03-15_smith_budget-hearing.wav
```

**Two-device rule**: Always record on two devices. Phone as backup minimum.

## Transcription workflows

### Automated transcription pipeline

```python
from pathlib import Path
import subprocess
import json

def transcribe_interview(audio_path: str, output_dir: str = "./transcripts") -> dict:
    """
    Transcribe using Whisper with speaker diarization.
    Returns transcript with timestamps.
    """
    Path(output_dir).mkdir(exist_ok=True)

    # Use whisper.cpp or OpenAI Whisper
    result = subprocess.run([
        'whisper',
        audio_path,
        '--model', 'medium',
        '--output_format', 'json',
        '--output_dir', output_dir,
        '--language', 'en',
        '--word_timestamps', 'True'
    ], capture_output=True)

    # Load and return structured transcript
    json_path = Path(output_dir) / f"{Path(audio_path).stem}.json"
    with open(json_path) as f:
        return json.load(f)

def format_for_editing(transcript: dict) -> str:
    """Convert to journalist-friendly format with timestamps."""
    lines = []
    for segment in transcript.get('segments', []):
        timestamp = format_timestamp(segment['start'])
        text = segment['text'].strip()
        lines.append(f"[{timestamp}] {text}")
    return '\n\n'.join(lines)

def format_timestamp(seconds: float) -> str:
    """Convert seconds to HH:MM:SS format."""
    h = int(seconds // 3600)
    m = int((seconds % 3600) // 60)
    s = int(seconds % 60)
    return f"{h:02d}:{m:02d}:{s:02d}"
```

### Manual transcription template

For sensitive interviews or when AI transcription fails:

```markdown
## Transcript: [Source] - [Date]

**Recording file**: [filename]
**Duration**: [XX:XX]
**Transcribed by**: [name]
**Verified against recording**: [ ] Yes / [ ] No

---

[00:00:15] **Q**: [Your question]

[00:00:45] **A**: [Source response - verbatim, including ums, pauses noted as (...)]

[00:01:30] **Q**: [Follow-up]

[00:01:42] **A**: [Response]

---

## Notes
- [Anything not captured in audio: gestures, documents shown, etc.]

## Potential quotes
- [00:01:42] "Quote that stands out" - context: [why it matters]
```

## Quote extraction and verification

### Pull quotes workflow

```python
from dataclasses import dataclass
from typing import Optional
import re

@dataclass
class Quote:
    text: str
    timestamp: str
    speaker: str
    context: str
    verified: bool = False
    used_in: Optional[str] = None

class QuoteBank:
    """Manage quotes from interview transcripts."""

    def __init__(self):
        self.quotes = []

    def extract_quote(self, transcript: str, start_time: str,
                      end_time: str, speaker: str, context: str) -> Quote:
        """Extract and store a quote with metadata."""
        # Pull text between timestamps
        pattern = rf'\[{re.escape(start_time)}\](.+?)(?=\[\d|$)'
        match = re.search(pattern, transcript, re.DOTALL)

        if match:
            text = match.group(1).strip()
            quote = Quote(
                text=text,
                timestamp=start_time,
                speaker=speaker,
                context=context
            )
            self.quotes.append(quote)
            return quote
        return None

    def verify_quote(self, quote: Quote, audio_path: str) -> bool:
        """Mark quote as verified against original recording."""
        # In practice: listen to audio at timestamp, confirm accuracy
        quote.verified = True
        return True

    def export_for_story(self) -> str:
        """Export verified quotes ready for publication."""
        output = []
        for q in self.quotes:
            if q.verified:
                output.append(f'"{q.text}"\n— {q.speaker}\n[Timestamp: {q.timestamp}]')
        return '\n\n'.join(output)
```

### Quote accuracy checklist

Before publishing any quote:

```markdown
- [ ] Listened to original recording at timestamp
- [ ] Quote is verbatim (or clearly marked as paraphrased)
- [ ] Context preserved (not cherry-picked to change meaning)
- [ ] Speaker identified correctly
- [ ] Timestamp documented for fact-checker
- [ ] Source approved quote (if agreement made)
```

## Source management database

### Interview tracking schema

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional
from enum import Enum

class SourceStatus(Enum):
    ACTIVE = "active"           # Currently engaged
    DORMANT = "dormant"         # Not recently contacted
    DECLINED = "declined"       # Refused to participate
    OFF_RECORD = "off_record"   # Background only

class InterviewType(Enum):
    ON_RECORD = "on_record"
    BACKGROUND = "background"
    DEEP_BACKGROUND = "deep_background"
    OFF_RECORD = "off_record"

@dataclass
class Source:
    name: str
    organization: str
    contact_info: dict  # email, phone, signal, etc.
    beat: str
    status: SourceStatus = SourceStatus.ACTIVE
    interviews: List['Interview'] = field(default_factory=list)
    notes: str = ""

    # Relationship tracking
    first_contact: Optional[datetime] = None
    trust_level: int = 1  # 1-5 scale

@dataclass
class Interview:
    source: str
    date: datetime
    interview_type: InterviewType
    recording_path: Optional[str] = None
    transcript_path: Optional[str] = None
    story_slug: Optional[str] = None
    key_quotes: List[str] = field(default_factory=list)
    follow_up_needed: bool = False
    notes: str = ""
```

### Quick source lookup

```python
def find_sources_for_story(sources: List[Source], topic: str,
                           beat: str = None) -> List[Source]:
    """Find relevant sources for a new story."""
    matches = []
    for source in sources:
        # Filter by beat if specified
        if beat and source.beat != beat:
            continue
        # Only suggest active sources
        if source.status != SourceStatus.ACTIVE:
            continue
        # Check if they've spoken on similar topics
        for interview in source.interviews:
            if topic.lower() in interview.notes.lower():
                matches.append(source)
                break

    # Sort by trust level
    return sorted(matches, key=lambda s: s.trust_level, reverse=True)
```

## Audio/video processing

### Batch processing multiple recordings

```python
from pathlib import Path
from concurrent.futures import ProcessPoolExecutor
import json

def batch_transcribe(recordings_dir: str, output_dir: str) -> dict:
    """Process all recordings in a directory."""
    recordings = list(Path(recordings_dir).glob('*.wav')) + \
                 list(Path(recordings_dir).glob('*.mp3')) + \
                 list(Path(recordings_dir).glob('*.m4a'))

    results = {}

    with ProcessPoolExecutor(max_workers=4) as executor:
        futures = {
            executor.submit(transcribe_interview, str(rec), output_dir): rec
            for rec in recordings
        }

        for future in futures:
            rec = futures[future]
            try:
                transcript = future.result()
                results[rec.name] = {
                    'status': 'success',
                    'transcript': transcript
                }
            except Exception as e:
                results[rec.name] = {
                    'status': 'error',
                    'error': str(e)
                }

    return results
```

### Video interview extraction

```python
import subprocess

def extract_audio_from_video(video_path: str, output_path: str = None) -> str:
    """Extract audio track from video for transcription."""
    if output_path is None:
        output_path = video_path.rsplit('.', 1)[0] + '.wav'

    subprocess.run([
        'ffmpeg', '-i', video_path,
        '-vn',  # No video
        '-acodec', 'pcm_s16le',  # WAV format
        '-ar', '44100',  # Sample rate
        '-ac', '1',  # Mono
        output_path
    ], check=True)

    return output_path
```

## Legal and ethical considerations

### Consent documentation

```markdown
## Recording consent record

**Date**:
**Source name**:
**Recording type**: [ ] Audio [ ] Video
**Interview type**: [ ] On record [ ] Background [ ] Off record

### Consent obtained:
- [ ] Verbal consent recorded at start of interview
- [ ] Written consent form signed
- [ ] Email confirmation of consent

### Jurisdiction notes:
- Interview location state/country:
- One-party or two-party consent jurisdiction:
- Any specific restrictions agreed:

### Agreed terms:
- [ ] Full attribution allowed
- [ ] Organization attribution only
- [ ] Anonymous source
- [ ] Review quotes before publication
- [ ] Embargo until [date]:
```

### Two-party consent states (US)

California, Connecticut, Florida, Illinois, Maryland, Massachusetts, Michigan, Montana, Nevada, New Hampshire, Pennsylvania, Washington require all-party consent.

**Always get explicit consent on recording** regardless of jurisdiction.

## Tools and resources

| Tool | Purpose | Notes |
|------|---------|-------|
| Whisper | Local transcription | Free, accurate, private |
| Otter.ai | Cloud transcription | Real-time, speaker ID |
| Descript | Edit audio like text | Good for pulling clips |
| Rev | Human transcription | For sensitive/legal |
| Trint | Journalist-focused | Collaboration features |
| oTranscribe | Free web player | Manual transcription aid |

## Related skills

- **source-verification** - Verify source credentials before interview
- **foia-requests** - Get documents to inform interview questions
- **data-journalism** - Analyze data sources mention in interviews

---

## Skill metadata

| Field | Value |
|-------|-------|
| Version | 1.0.0 |
| Created | 2025-12-26 |
| Author | Claude Skills for Journalism |
| Domain | Journalism, Research |
| Complexity | Intermediate |