--- name: scribe description: "Reference skill for Zoom AI Services Scribe. Use after routing to a transcription workflow when handling uploaded or stored media, Build-platform JWT auth, fast mode transcription, batch jobs, or transcript pipeline design." user-invocable: false triggers: - "scribe" - "ai services scribe" - "zoom scribe" - "transcribe audio file" - "transcribe video file" - "batch transcription" - "fast mode transcription" - "build platform jwt" --- # Zoom AI Services Scribe Background reference for Zoom AI Services Scribe across: - synchronous single-file transcription (`POST /aiservices/scribe/transcribe`) - asynchronous batch jobs (`/aiservices/scribe/jobs*`) - browser microphone pseudo-streaming via repeated short file uploads - webhook-driven batch status updates - Build-platform JWT generation and credential handling Official docs: - https://developers.zoom.us/docs/ai-services/ - https://developers.zoom.us/docs/ai-services/scribe/ - https://developers.zoom.us/docs/api/ai-services/ - https://developers.zoom.us/api-hub/ai-services/methods/endpoints.json - Quickstart sample: https://github.com/zoom/scribe-quickstart/ ## Routing Guardrail - If the user needs **uploaded or stored media transcribed into text**, route here first. - If the user needs **live meeting media** without file-based upload/batch jobs, route to [../rtms/SKILL.md](../rtms/SKILL.md). - If the user needs **Zoom REST API inventory** for AI Services paths, chain [../rest-api/SKILL.md](../rest-api/SKILL.md). - If the user needs webhook signature patterns or generic HMAC receiver hardening, optionally chain [../webhooks/SKILL.md](../webhooks/SKILL.md). ## Quick Links 1. [concepts/auth-and-processing-modes.md](concepts/auth-and-processing-modes.md) 2. [scenarios/high-level-scenarios.md](scenarios/high-level-scenarios.md) 3. [examples/fast-mode-node.md](examples/fast-mode-node.md) 4. [examples/batch-webhook-pipeline.md](examples/batch-webhook-pipeline.md) 5. [references/api-reference.md](references/api-reference.md) 6. [references/environment-variables.md](references/environment-variables.md) 7. [references/samples-validation.md](references/samples-validation.md) 8. [references/versioning-and-drift.md](references/versioning-and-drift.md) 9. [troubleshooting/common-drift-and-breaks.md](troubleshooting/common-drift-and-breaks.md) 10. [RUNBOOK.md](RUNBOOK.md) ## Core Workflow 1. Get Build-platform credentials and generate an HS256 JWT. 2. Choose **fast mode** for one short file or **batch mode** for stored archives / large sets. 3. Submit the transcription request. 4. For batch jobs, poll job/file status or receive webhook notifications. 5. Persist and post-process transcript JSON. ## Hosted Fast-Mode Guardrail - The formal fast-mode API limits are `100 MB` and `2 hours`, but hosted browser flows can still time out before the upstream response returns. - Current deployed-sample observations: - ~17.2 MB MP4 completed in about `26s` - ~38.6 MB MP4 completed in about `26-37s` - ~59.2 MB MP4 completed in about `32-34s` on the backend - some ~59.2 MB browser requests still surfaced as frontend `504` while backend logs later showed `200` - Treat frontend `504` plus backend `200` as a browser/edge timeout race, not an automatic transcription failure. - For hosted UIs, prefer an async request/polling wrapper for fast mode instead of holding the browser open for the full upstream response. - For larger or less predictable media, prefer batch mode even when the file is still within the formal fast-mode size limit. ## Browser Microphone Pattern - `scribe` does not expose a documented real-time streaming API surface. - If you want a browser microphone experience, use pseudo-streaming: 1. capture microphone audio in short chunks 2. upload each chunk through the async fast-mode wrapper 3. poll for completion 4. append chunk transcripts in sequence - Recommended starting cadence: - chunk size: `5 seconds` - acceptable range: `5-10 seconds` - in-flight chunk requests: `2-3` - This is a practical UI pattern for incremental transcript updates, not a substitute for `rtms`. - Treat this as a fallback demo pattern, not the preferred production architecture. - It adds repeated upload overhead, chunk-boundary drift, browser codec/container variability, and transcript stitching complexity. - If the user asks for actual live stream ingestion, low-latency continuous media, or server-push media transport, route to [../rtms/SKILL.md](../rtms/SKILL.md) instead. ## Endpoint Surface | Mode | Method | Path | Use | |------|--------|------|-----| | Fast | `POST` | `/aiservices/scribe/transcribe` | Synchronous transcription for one file | | Batch | `POST` | `/aiservices/scribe/jobs` | Submit asynchronous batch job | | Batch | `GET` | `/aiservices/scribe/jobs` | List jobs | | Batch | `GET` | `/aiservices/scribe/jobs/{jobId}` | Inspect job summary/state | | Batch | `DELETE` | `/aiservices/scribe/jobs/{jobId}` | Cancel queued/processing job | | Batch | `GET` | `/aiservices/scribe/jobs/{jobId}/files` | Inspect per-file results | ## High-Level Scenarios - On-demand clip transcription after a user uploads one recording. - Batch transcription of stored S3 call archives. - Webhook-driven ETL pipeline that writes transcripts to your database/search index. - Re-transcription of Zoom-managed recordings after exporting them to your own storage. - Offline compliance or QA workflows that need timestamps, channel separation, and speaker hints. ## Chaining - Stored Zoom recordings -> [../rest-api/SKILL.md](../rest-api/SKILL.md) + `scribe` - Webhook verification hardening -> [../webhooks/SKILL.md](../webhooks/SKILL.md) - Real-time live transcript/media -> [../rtms/SKILL.md](../rtms/SKILL.md) - Cross-product routing -> [../general/SKILL.md](../general/SKILL.md) ## Operations - [RUNBOOK.md](RUNBOOK.md) - 5-minute preflight and debugging checklist.