--- name: Transcription Automation description: Automate audio/video transcription, meeting notes, subtitle generation, and content processing version: 1.0.0 author: Claude Office Skills category: multimedia tags: - transcription - audio - video - meetings - subtitles department: content models: - claude-3-opus - claude-3-sonnet - gpt-4 - whisper mcp: server: media-mcp tools: - whisper_transcribe - assembly_ai - deepgram_api - subtitle_generate capabilities: - Speech-to-text conversion - Meeting transcription - Subtitle generation - Speaker diarization input: - Audio files - Video files - Live streams - Meeting recordings output: - Text transcripts - SRT/VTT subtitles - Meeting notes - Searchable archives languages: - en - zh - multi related_skills: - youtube-automation - podcast-automation - meeting-notes --- # Transcription Automation Comprehensive skill for automating audio/video transcription and content processing. ## Core Workflows ### 1. Transcription Pipeline ``` TRANSCRIPTION FLOW: ┌─────────────────┐ │ Audio/Video │ │ Input │ └────────┬────────┘ ▼ ┌─────────────────┐ │ Pre-Processing │ │ - Convert │ │ - Enhance │ │ - Split │ └────────┬────────┘ ▼ ┌─────────────────┐ │ Transcription │ │ - STT Engine │ │ - Diarization │ └────────┬────────┘ ▼ ┌─────────────────┐ │ Post-Processing │ │ - Format │ │ - Timestamps │ │ - Speakers │ └────────┬────────┘ ▼ ┌─────────────────┐ │ Output │ │ - Text/SRT/VTT │ │ - Summary │ └─────────────────┘ ``` ### 2. Transcription Configuration ```yaml transcription_config: engine: whisper # whisper, assembly_ai, deepgram audio_settings: sample_rate: 16000 channels: mono format: wav transcription: language: auto # or specific: en, zh, es model: large # tiny, base, small, medium, large task: transcribe # transcribe or translate features: speaker_diarization: true word_timestamps: true punctuation: true profanity_filter: false output: formats: - txt - srt - vtt - json include_confidence: true include_timestamps: true ``` ## Meeting Transcription ### Meeting Notes Template ```yaml meeting_transcript: metadata: title: "{{meeting_title}}" date: "{{date}}" duration: "{{duration}}" attendees: "{{speakers}}" output_template: | # {{title}} **Date:** {{date}} **Duration:** {{duration}} **Attendees:** {{attendees}} ## Summary {{ai_summary}} ## Key Points {{#each key_points}} - {{this}} {{/each}} ## Action Items {{#each action_items}} - [ ] {{task}} - @{{assignee}} - Due: {{due_date}} {{/each}} ## Full Transcript {{#each segments}} **[{{timestamp}}] {{speaker}}:** {{text}} {{/each}} ``` ### Speaker Diarization ```yaml diarization_config: min_speakers: 2 max_speakers: 10 speaker_labels: - name: "Speaker 1" voice_sample: "sample_1.wav" # Optional - name: "Speaker 2" voice_sample: "sample_2.wav" output_format: speaker_prefix: true speaker_timestamps: true example_output: | [00:00:05] SPEAKER_1: Welcome everyone to today's meeting. [00:00:12] SPEAKER_2: Thanks for having us. [00:00:18] SPEAKER_1: Let's start with the agenda. ``` ## Subtitle Generation ### SRT Format ```yaml subtitle_config: format: srt timing: max_duration: 7 # seconds per subtitle min_gap: 0.1 # seconds between subtitles chars_per_line: 42 max_lines: 2 style: case: sentence # sentence, upper, lower numbers: words # words, digits example_output: | 1 00:00:05,000 --> 00:00:08,500 Welcome to today's presentation about transcription automation. 2 00:00:09,000 --> 00:00:12,000 Let me start by explaining the basic concepts. ``` ### VTT Format ```yaml vtt_config: format: vtt features: cue_settings: true styling: true example_output: | WEBVTT 00:00:05.000 --> 00:00:08.500 align:center Welcome to today's presentation about transcription automation. 00:00:09.000 --> 00:00:12.000 align:center Let me start by explaining the basic concepts. ``` ## Integration Workflows ### Zoom Integration ```yaml zoom_transcription: trigger: event: recording_completed workflow: - step: download_recording source: zoom_cloud - step: transcribe engine: whisper language: auto - step: diarize identify_speakers: true - step: generate_notes template: meeting_notes include_summary: true extract_action_items: true - step: distribute destinations: - notion_page - slack_channel - email_attendees ``` ### YouTube Integration ```yaml youtube_subtitles: trigger: event: video_uploaded workflow: - step: download_audio source: youtube_video - step: transcribe engine: whisper task: transcribe - step: generate_subtitles formats: [srt, vtt] - step: translate target_languages: [es, zh, ja, de, fr] - step: upload_subtitles destination: youtube as_cc: true ``` ### Podcast Processing ```yaml podcast_workflow: input: source: rss_feed format: audio/mp3 processing: - transcribe: engine: whisper model: large - generate_chapters: detect_topics: true min_duration: 60 # seconds - create_show_notes: summarize: true extract_links: true highlight_quotes: true - create_searchable_index: full_text: true timestamps: true output: - transcript_txt - chapters_json - show_notes_md - search_index ``` ## Language Support ### Multi-Language Transcription ```yaml multilingual: auto_detect: true supported_languages: - code: en name: English model: large - code: zh name: Chinese model: large - code: es name: Spanish model: large - code: ja name: Japanese model: medium translation: enabled: true target: en preserve_original: true ``` ### Code-Switching ```yaml code_switching: enabled: true primary_language: en secondary_languages: [zh, es] output: | [00:01:23] The next topic is about 人工智能, which has been muy importante in recent years. handling: detect_language_per_segment: true tag_language_switches: true ``` ## Quality Enhancement ### Post-Processing ```yaml post_processing: text_cleanup: - remove_filler_words: ["um", "uh", "like"] - fix_common_errors: true - normalize_numbers: true formatting: - add_punctuation: true - capitalize_sentences: true - paragraph_breaks: true speaker_attribution: - merge_short_segments: true - min_segment_duration: 1.0 output_enhancement: - add_timestamps: true - highlight_keywords: true - generate_summary: true ``` ### Accuracy Metrics ``` TRANSCRIPTION QUALITY REPORT ═══════════════════════════════════════ File: meeting_2024_01_15.mp3 Duration: 45:32 Engine: Whisper Large METRICS: Word Error Rate (WER): 4.2% Character Error Rate: 2.8% Confidence Score: 0.94 SPEAKER DIARIZATION: Speakers Detected: 4 Diarization Accuracy: 91% PROCESSING TIME: Total: 8m 23s Real-time Factor: 0.18x DETECTED ISSUES: • Low confidence at 12:34 (background noise) • Overlapping speech at 23:45 • Unknown speaker at 34:12 ``` ## API Examples ### OpenAI Whisper ```python import openai # Transcribe audio with open("meeting.mp3", "rb") as audio_file: transcript = openai.Audio.transcribe( model="whisper-1", file=audio_file, response_format="verbose_json", timestamp_granularities=["word", "segment"] ) # Access results for segment in transcript.segments: print(f"[{segment.start:.2f}] {segment.text}") ``` ### AssemblyAI ```python import assemblyai as aai transcriber = aai.Transcriber() config = aai.TranscriptionConfig( speaker_labels=True, auto_chapters=True, entity_detection=True ) transcript = transcriber.transcribe( "https://example.com/meeting.mp3", config=config ) for utterance in transcript.utterances: print(f"Speaker {utterance.speaker}: {utterance.text}") ``` ## Best Practices 1. **Quality Audio**: Clean input = better output 2. **Choose Right Model**: Balance speed vs accuracy 3. **Use Diarization**: Identify speakers clearly 4. **Post-Process**: Clean up automated output 5. **Verify Critical Content**: Human review important 6. **Consider Privacy**: Handle sensitive content 7. **Store Efficiently**: Compress and index 8. **Provide Context**: Vocabulary hints help