# API Reference

All workflows accept an optional `credentials` object for [runtime credential injection](./CREDENTIALS.md#runtime-credentials). This is inherited from the base `MuxAIOptions` interface and is not repeated for each workflow below.

## `getSummaryAndTags(assetId, options?)`

Analyzes a Mux video or audio asset and returns AI-generated metadata.

**Parameters:**

- `assetId` (string) - Mux asset ID (video or audio-only)
- `options` (optional) - Configuration options

**Options:**

- `provider?: 'openai' | 'anthropic' | 'google'` - AI provider (default: 'openai')
- `tone?: 'neutral' | 'playful' | 'professional'` - Analysis tone (default: 'neutral')
- `model?: string` - AI model to use (defaults: `gpt-5.1`, `claude-sonnet-4-5`, or `gemini-3-flash-preview`)
- `languageCode?: string` - Language code for transcript track selection (e.g., 'en', 'fr'). When omitted, prefers English if available.
- `outputLanguageCode?: string` - BCP 47 language code (e.g., 'en', 'fr', 'ja') for the generated title, description, and tags. When omitted or set to `'auto'`, auto-detects from the selected transcript track's language. Falls back to unconstrained (LLM decides) if no language metadata is available.
- `includeTranscript?: boolean` - Include transcript in analysis (default: true)
- `cleanTranscript?: boolean` - Remove VTT timestamps and formatting from transcript (default: true)
- `imageSubmissionMode?: 'url' | 'base64'` - How to submit storyboard to AI providers (default: 'url')
- `imageDownloadOptions?: object` - Options for image download when using base64 mode
  - `timeout?: number` - Request timeout in milliseconds (default: 10000)
  - `retries?: number` - Maximum retry attempts (default: 3)
  - `retryDelay?: number` - Base delay between retries in milliseconds (default: 1000)
  - `maxRetryDelay?: number` - Maximum delay between retries in milliseconds (default: 10000)
  - `exponentialBackoff?: boolean` - Whether to use exponential backoff (default: true)
- `promptOverrides?: object` - Override specific sections of the prompt for custom use cases
  - `task?: string` - Override the main task instruction
  - `title?: string` - Override title generation guidance
  - `description?: string` - Override description generation guidance
  - `keywords?: string` - Override keywords generation guidance
  - `qualityGuidelines?: string` - Override quality guidelines
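The `languageCode` and `outputLanguageCode` rules above can be sketched as two small functions. This is a minimal illustration, not the library's internal code. The track shape and the helper names `pickTranscriptTrack` and `resolveOutputLanguage` are hypothetical. Falling back to the first track when no English track exists is also an assumption, and so is returning `undefined` when an explicit `languageCode` has no matching track.

```typescript
// Hypothetical, simplified text-track shape; real Mux tracks carry more fields.
interface TranscriptTrack {
  id: string;
  languageCode?: string; // e.g. "en", "fr"; may be missing
}

// Pick a transcript track: the requested language if given, otherwise prefer
// English, otherwise (assumption) fall back to the first available track.
function pickTranscriptTrack(
  tracks: TranscriptTrack[],
  languageCode?: string,
): TranscriptTrack | undefined {
  if (languageCode) {
    return tracks.find((t) => t.languageCode === languageCode);
  }
  return tracks.find((t) => t.languageCode === "en") ?? tracks[0];
}

// Resolve the language for generated metadata: an explicit code wins,
// omitted or "auto" follows the selected track, and undefined means
// "unconstrained" (the LLM decides).
function resolveOutputLanguage(
  outputLanguageCode: string | undefined,
  track: TranscriptTrack | undefined,
): string | undefined {
  if (outputLanguageCode && outputLanguageCode !== "auto") {
    return outputLanguageCode;
  }
  return track?.languageCode;
}

const tracks: TranscriptTrack[] = [
  { id: "t1", languageCode: "fr" },
  { id: "t2", languageCode: "en" },
];
const track = pickTranscriptTrack(tracks);
console.log(track?.id, resolveOutputLanguage("auto", track)); // t2 en
```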

**Returns:**

```typescript
interface SummaryAndTagsResult {
  assetId: string;
  title: string; // Short title
  description: string; // Detailed description
  tags: string[]; // Up to 10 relevant keywords
  storyboardUrl?: string; // Video storyboard URL (undefined for audio-only assets)
  usage?: TokenUsage; // Token usage from the AI provider
  transcriptText?: string; // Raw transcript text (when includeTranscript is true)
}
```

## `getModerationScores(assetId, options?)`

Analyzes a Mux asset for inappropriate content using OpenAI's Moderation API, Hive's Moderation API, or Google Cloud Vision SafeSearch.

- For **video assets**, this moderates **storyboard thumbnails** (image moderation).
- For **audio-only assets**, this moderates the **underlying transcript text** (text moderation). Only `openai` supports this; `hive` and `google-vision-api` are image-only and will throw.

**Parameters:**

- `assetId` (string) - Mux asset ID (video or audio-only)
- `options` (optional) - Configuration options

**Options:**

- `provider?: 'openai' | 'hive' | 'google-vision-api'` - Moderation provider (default: 'openai')
- `model?: string` - OpenAI moderation model to use (default: `omni-moderation-latest`); ignored for `hive` and `google-vision-api`
- `languageCode?: string` - Transcript language code when moderating audio-only assets (optional)
- `thresholds?: { sexual?: number; violence?: number }` - Custom thresholds (default: {sexual: 0.7, violence: 0.8})
- `thumbnailInterval?: number` - Seconds between thumbnails for long videos (default: 10)
- `thumbnailWidth?: number` - Thumbnail width in pixels (default: 640)
- `maxSamples?: number` - Maximum number of thumbnails to sample (default: unlimited). Acts as a cap: if `thumbnailInterval` produces fewer samples than this limit, the interval is respected; otherwise, samples are evenly distributed with the first and last frames pinned.
- `maxConcurrent?: number` - Maximum concurrent API requests (default: 5)
- `imageSubmissionMode?: 'url' | 'base64'` - How to submit images to AI providers (default: 'url')
- `imageDownloadOptions?: object` - Options for image download when using base64 mode
  - `timeout?: number` - Request timeout in milliseconds (default: 10000)
  - `retries?: number` - Maximum retry attempts (default: 3)
  - `retryDelay?: number` - Base delay between retries in milliseconds (default: 1000)
  - `maxRetryDelay?: number` - Maximum delay between retries in milliseconds (default: 10000)
  - `exponentialBackoff?: boolean` - Whether to use exponential backoff (default: true)
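The interaction between `thumbnailInterval` and `maxSamples` can be illustrated with a short sketch. It is not the library's sampler: `sampleTimes` is a hypothetical helper, and it assumes that interval sampling starts at t=0 and that the pinned "last frame" sits at the asset duration.

```typescript
// Sketch of the documented maxSamples behavior (hypothetical helper).
// Assumptions: interval sampling starts at t=0; the last frame is at durationSeconds.
function sampleTimes(
  durationSeconds: number,
  thumbnailInterval = 10,
  maxSamples = Infinity,
): number[] {
  // Timestamps produced by the plain interval: 0, 10, 20, ...
  const intervalTimes: number[] = [];
  for (let t = 0; t < durationSeconds; t += thumbnailInterval) {
    intervalTimes.push(t);
  }
  // Within the cap: respect the interval as-is.
  if (intervalTimes.length <= maxSamples) return intervalTimes;
  // Over the cap: spread maxSamples evenly, pinning the first and last frames.
  if (maxSamples <= 1) return [0];
  const step = durationSeconds / (maxSamples - 1);
  return Array.from({ length: maxSamples }, (_, i) => i * step);
}

console.log(JSON.stringify(sampleTimes(60))); // [0,10,20,30,40,50]
console.log(JSON.stringify(sampleTimes(100, 10, 5))); // [0,25,50,75,100]
```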

**Hive note (audio-only):** transcript moderation submits `text_data` and requires a Hive **Text Moderation** project/API key. If you use a Visual Moderation key, Hive will reject the request (see [Hive Text Moderation docs](https://docs.thehive.ai/docs/classification-text)).

**Google Vision note:** SafeSearch returns `Likelihood` enum values (`UNKNOWN` through `VERY_LIKELY`) for `adult`, `violence`, `racy`, `spoof`, and `medical`. `@mux/ai` uses only `adult` (mapped to `sexual`) and `violence`, and converts each enum value linearly onto a 0..1 scale: `UNKNOWN`=0, `VERY_UNLIKELY`=0.2, `UNLIKELY`=0.4, `POSSIBLE`=0.6, `LIKELY`=0.8, `VERY_LIKELY`=1.0. Because `exceedsThreshold` uses a strict `>` comparison, the default `violence` threshold of 0.8 flags only `VERY_LIKELY`, while the default `sexual` threshold of 0.7 already flags `LIKELY`. Lower the `violence` threshold (e.g. to 0.7) if you want `LIKELY` violence to be flagged as well. This mapping may change in future versions of `@mux/ai`.
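The mapping and the strict comparison can be checked with a small sketch. `exceeds` is a hypothetical helper that restates the documented rule; it is not an exported function of `@mux/ai`.

```typescript
// SafeSearch Likelihood values and the 0..1 scores documented above.
type Likelihood = "UNKNOWN" | "VERY_UNLIKELY" | "UNLIKELY" | "POSSIBLE" | "LIKELY" | "VERY_LIKELY";

const LIKELIHOOD_SCORE: Record<Likelihood, number> = {
  UNKNOWN: 0,
  VERY_UNLIKELY: 0.2,
  UNLIKELY: 0.4,
  POSSIBLE: 0.6,
  LIKELY: 0.8,
  VERY_LIKELY: 1.0,
};

// Strict ">" comparison, matching the documented exceedsThreshold behavior.
function exceeds(likelihood: Likelihood, threshold: number): boolean {
  return LIKELIHOOD_SCORE[likelihood] > threshold;
}

console.log(exceeds("LIKELY", 0.8)); // false (0.8 is not > 0.8)
console.log(exceeds("LIKELY", 0.7)); // true
console.log(exceeds("VERY_LIKELY", 0.8)); // true
```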

**Returns:**

```typescript
{
  assetId: string;
  mode: 'thumbnails' | 'transcript';
  isAudioOnly: boolean;
  thumbnailScores: Array<{ // Individual thumbnail results
    url: string;
    time?: number; // Time in seconds of the thumbnail within the video
    sexual: number; // 0-1 score
    violence: number; // 0-1 score
    error: boolean;
    errorMessage?: string;
  }>;
  maxScores: { // Highest scores across all thumbnails (or transcript chunks for audio-only)
    sexual: number;
    violence: number;
  };
  coverage: {
    requestedSampleCount: number;
    successfulSampleCount: number;
    failedSampleCount: number;
    sampleCoverage: number; // 0-1 fraction of requested samples that succeeded
    isPartial: boolean; // true when some samples failed but the workflow still returned a result
    isLowConfidence: boolean; // true when coverage is thin and thresholds should be interpreted cautiously
  };
  exceedsThreshold: boolean; // true if content should be flagged
  thresholds: { // Threshold values used
    sexual: number;
    violence: number;
  };
  usage?: TokenUsage; // Workflow usage metadata
}
```
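As a rough illustration of how the summary fields relate to `thumbnailScores`, the sketch below derives `maxScores`, `exceedsThreshold`, `sampleCoverage`, and `isPartial` from per-sample scores. It is a hypothetical helper, not the library's aggregation code. Excluding failed samples from the maxima is an assumption, and `isLowConfidence` is left out because its cutoff is not documented here.

```typescript
// Simplified shape of one thumbnailScores entry.
interface SampleScore {
  sexual: number;
  violence: number;
  error: boolean;
}

interface Thresholds {
  sexual: number;
  violence: number;
}

// Hypothetical aggregation. Assumption: failed samples are ignored for maxima.
function summarize(scores: SampleScore[], thresholds: Thresholds) {
  const ok = scores.filter((s) => !s.error);
  const maxScores = {
    sexual: Math.max(0, ...ok.map((s) => s.sexual)),
    violence: Math.max(0, ...ok.map((s) => s.violence)),
  };
  // Strict ">" as described for exceedsThreshold.
  const exceedsThreshold =
    maxScores.sexual > thresholds.sexual || maxScores.violence > thresholds.violence;
  const sampleCoverage = scores.length === 0 ? 0 : ok.length / scores.length;
  const isPartial = ok.length > 0 && ok.length < scores.length;
  return { maxScores, exceedsThreshold, sampleCoverage, isPartial };
}

const summary = summarize(
  [
    { sexual: 0.1, violence: 0.85, error: false },
    { sexual: 0.3, violence: 0.2, error: false },
    { sexual: 0, violence: 0, error: true },
  ],
  { sexual: 0.7, violence: 0.8 },
);
console.log(summary.exceedsThreshold); // true (violence 0.85 > 0.8)
```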

## `hasBurnedInCaptions(assetId, options?)`

Analyzes video frames to detect burned-in captions (hardcoded subtitles) that are permanently embedded in the video image.

**Parameters:**

- `assetId` (string) - Mux video asset ID
- `options` (optional) - Configuration options

**Options:**

- `provider?: 'openai' | 'anthropic' | 'google'` - AI provider (default: 'openai')
- `model?: string` - AI model to use (defaults: `gpt-5.1`, `claude-sonnet-4-5`, or `gemini-3-flash-preview`)
- `imageSubmissionMode?: 'url' | 'base64'` - How to submit storyboard to AI providers (default: 'url')
- `imageDownloadOptions?: object` - Options for image download when using base64 mode
  - `timeout?: number` - Request timeout in milliseconds (default: 10000)
  - `retries?: number` - Maximum retry attempts (default: 3)
  - `retryDelay?: number` - Base delay between retries in milliseconds (default: 1000)
  - `maxRetryDelay?: number` - Maximum delay between retries in milliseconds (default: 10000)
  - `exponentialBackoff?: boolean` - Whether to use exponential backoff (default: true)
- `promptOverrides?: object` - Override specific sections of the detection prompt
  - `task?: string` - Override the main analysis task instruction
  - `analysisSteps?: string` - Override the step-by-step analysis procedure
  - `positiveIndicators?: string` - Override criteria for classifying text as captions
  - `negativeIndicators?: string` - Override criteria for ruling out captions
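The `imageDownloadOptions` fields shared by several workflows describe a capped retry delay. The sketch below shows one common way such fields combine; `retryDelayMs` is hypothetical, and the doubling-per-attempt formula is an assumption rather than the library's confirmed behavior.

```typescript
// Hypothetical retry-delay calculation for the imageDownloadOptions fields.
// Assumption: attempt 1 waits retryDelay, and each later attempt doubles it.
interface ImageDownloadOptions {
  retryDelay?: number;
  maxRetryDelay?: number;
  exponentialBackoff?: boolean;
}

function retryDelayMs(attempt: number, opts: ImageDownloadOptions = {}): number {
  const { retryDelay = 1000, maxRetryDelay = 10000, exponentialBackoff = true } = opts;
  const delay = exponentialBackoff ? retryDelay * Math.pow(2, attempt - 1) : retryDelay;
  // Never wait longer than maxRetryDelay.
  return Math.min(delay, maxRetryDelay);
}

console.log([1, 2, 3, 4, 5].map((n) => retryDelayMs(n)).join(", ")); // 1000, 2000, 4000, 8000, 10000
```

With the default `retries` of 3, only the first three delays would be used.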

**Returns:**

```typescript
{
  assetId: string;
  hasBurnedInCaptions: boolean; // Whether burned-in captions were detected
  confidence: number; // Confidence score (0.0-1.0)
  detectedLanguage: string | null; // Language of detected captions, or null
  storyboardUrl: string; // URL to analyzed storyboard
  usage?: TokenUsage; // Token usage from the AI provider
}
```

**Detection Logic:**

- Analyzes video storyboard frames to identify text overlays
- Distinguishes between actual captions and marketing/end-card text
- Text appearing only in final 1-2 frames is classified as marketing copy
- Caption text must appear across multiple frames throughout the timeline
- Optimized prompts minimize false positives

## `askQuestions(assetId, questions, options?)`

Answer questions about asset content by analyzing storyboard frames and optional transcripts. For audio-only assets, this workflow analyzes transcript text only. By default, answers are "yes"/"no", but you can override the allowed responses.

**Parameters:**

- `assetId` (string) - Mux asset ID (video or audio-only)
- `questions` (array) - Array of question objects
  - Each question object must have a `question` field (string)
  - Each question may optionally include `answerOptions?: string[]` (defaults to `["yes", "no"]`)
  - Each question may optionally include `freeFormReply?: boolean` (**experimental** — see below)
  - Example: `[{ question: "What is the production quality?", answerOptions: ["amateur", "semi-pro", "professional"] }]`
- `options` (optional) - Configuration options

**Options:**

- `provider?: 'openai' | 'anthropic' | 'google'` - AI provider (default: 'openai')
- `model?: string` - AI model to use (defaults: `gpt-5.1`, `claude-sonnet-4-5`, or `gemini-3-flash-preview`)
- `languageCode?: string` - Language code for transcript track selection (e.g., 'en', 'fr'). When omitted, prefers English if available.
- `includeTranscript?: boolean` - Include transcript in analysis (default: true, required for audio-only assets)
- `cleanTranscript?: boolean` - Remove VTT timestamps and formatting from transcript (default: true)
- `imageSubmissionMode?: 'url' | 'base64'` - How to submit storyboard to AI providers (default: 'url')
- `imageDownloadOptions?: object` - Options for image download when using base64 mode
  - `timeout?: number` - Request timeout in milliseconds (default: 10000)
  - `retries?: number` - Maximum retry attempts (default: 3)
  - `retryDelay?: number` - Base delay between retries in milliseconds (default: 1000)
  - `maxRetryDelay?: number` - Maximum delay between retries in milliseconds (default: 10000)
  - `exponentialBackoff?: boolean` - Whether to use exponential backoff (default: true)
- `storyboardWidth?: number` - Storyboard resolution in pixels (default: 640)
- `maxFreeFormAnswerLength?: number` - **Experimental.** Maximum character length for free-form answers when a question sets `freeFormReply: true` (default: 500). Keep low to bound the open-ended output channel.

**Returns:**

```typescript
interface AskQuestionsResult {
  assetId: string;
  answers: Array<{
    question: string; // The original question
    answer: string | null; // Answer from allowed options (null when skipped)
    confidence: number; // Confidence score (0.0-1.0)
    reasoning: string; // AI's explanation based on observable evidence or why the question was skipped
    skipped: boolean; // True when the question was not answerable from the asset content
  }>;
  storyboardUrl?: string; // URL to analyzed storyboard (undefined for audio-only assets)
  usage?: TokenUsage; // Token usage from the AI provider
  transcriptText?: string; // Raw transcript (when includeTranscript is true)
}
```

**Examples:**

```typescript
import { askQuestions } from "@mux/ai/workflows";

// Single question
const single = await askQuestions("asset-id", [
  { question: "Does this video contain cooking?" },
]);

console.log(single.answers[0].answer); // "yes" or "no" by default
console.log(single.answers[0].confidence); // e.g. 0.95
console.log(single.answers[0].reasoning); // e.g. "A chef prepares ingredients..."

// Multiple questions (answered in a single API call)
const questions = [
  { question: "Does this video contain people?" },
  { question: "Is this video in color?" },
  { question: "Does this video contain violence?" },
];
const multiple = await askQuestions("asset-id", questions);

// Without transcript (visual-only analysis)
const visualOnly = await askQuestions("asset-id", questions, {
  includeTranscript: false,
});

// Per-question answer options: mix yes/no with classification scales
const classified = await askQuestions("asset-id", [
  { question: "Does this contain cooking?" }, // answer options default to yes/no
  { question: "What is the production quality?", answerOptions: ["amateur", "semi-pro", "professional"] },
  { question: "What is the primary content type?", answerOptions: ["tutorial", "entertainment", "news", "advertisement"] },
  { question: "What is the overall sentiment?", answerOptions: ["positive", "neutral", "negative"] },
]);
```

**Tips for Effective Questions:**

- Be specific and focused on observable evidence
- Frame questions positively (prefer "Is X present?" over "Is X not present?")
- Avoid ambiguous or subjective questions
- Questions should have clear answers that map to your allowed options
- The AI prioritizes visual evidence when transcript and visuals conflict

> ⚠️ **Experimental:** `freeFormReply` enables open-ended answers for a given
> question. This bypasses the enum schema and allows the model to reply with
> prose instead of yes/no or `answerOptions` — use it only when your use
> case genuinely needs open-ended answers (e.g. describing a scene,
> extracting a quoted line) and treat the answer as untrusted model output.
> Mutually exclusive with `answerOptions` (setting both throws). The
> length cap (`maxFreeFormAnswerLength`, default 500 chars) and the
> output-safety scrubber still apply. API shape may change.

```typescript
// Experimental: free-form answers for a single question
const result = await askQuestions("asset-id", [
  { question: "Is this video about glasses?" }, // defaults to yes/no
  {
    question: "Describe the primary subject of this video in one sentence.",
    freeFormReply: true,
  },
], { maxFreeFormAnswerLength: 300 });

console.log(result.answers[1].answer); // e.g. "A pair of tortoiseshell glasses..."
```
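The per-question rules (default `["yes", "no"]` options, and `freeFormReply` being mutually exclusive with `answerOptions`) can be restated as a small validator. `normalizeQuestion` is a hypothetical helper that mirrors the documented behavior; it is not part of the `@mux/ai` API, and the exact error message is illustrative.

```typescript
// Question shape from the parameters described above.
interface Question {
  question: string;
  answerOptions?: string[];
  freeFormReply?: boolean;
}

// Hypothetical normalizer: rejects freeFormReply + answerOptions and
// fills in the default yes/no options for constrained questions.
function normalizeQuestion(q: Question): Question {
  if (q.freeFormReply && q.answerOptions) {
    throw new Error("freeFormReply and answerOptions are mutually exclusive");
  }
  if (q.freeFormReply) return q; // open-ended: no enum of allowed answers
  return { ...q, answerOptions: q.answerOptions ?? ["yes", "no"] };
}

console.log(normalizeQuestion({ question: "Is this video about glasses?" }).answerOptions); // [ 'yes', 'no' ]
```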

## `generateEngagementInsights(assetId, options?)`

Generate AI-powered insights explaining viewer engagement patterns by analyzing hotspot data, heatmap statistics, visual frames, and transcripts.

**Parameters:**

- `assetId` (string) - Mux asset ID
- `options` (optional) - Configuration options

**Options:**

- `provider?: 'openai' | 'anthropic' | 'google'` - AI provider (default: 'openai')
- `model?: string` - AI model to use (defaults: `gpt-5.1`, `claude-sonnet-4-5`, or `gemini-3-flash-preview`)
- `hotspotLimit?: number` - Number of engagement moments to analyze per direction (default: 5, range: 1-10). Note: actual moment count may be up to 2x this value since both peaks and valleys are fetched.
- `timeframe?: string` - Engagement data timeframe (default: '7:days')
  - Examples: `'60:minutes'`, `'24:hours'`, `'7:days'`, `'30:days'`
- `skipShots?: boolean` - Skip shots integration, use thumbnails instead (default: false). Recommended for latency-sensitive use cases.
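The `timeframe` format and the `hotspotLimit` range can be made concrete with two hypothetical helpers. Parsing `'<amount>:<unit>'` into milliseconds and clamping `hotspotLimit` (instead of rejecting out-of-range values) are assumptions for illustration; the library may validate these options differently.

```typescript
// Hypothetical helpers illustrating the documented option formats.
const UNIT_MS = { minutes: 60_000, hours: 3_600_000, days: 86_400_000 } as const;

// Parse a "<amount>:<unit>" timeframe such as "7:days" into milliseconds.
function timeframeToMs(timeframe: string): number {
  const [amount, unit] = timeframe.split(":");
  const n = Number(amount);
  const knownUnit = Object.prototype.hasOwnProperty.call(UNIT_MS, unit);
  if (!Number.isInteger(n) || n <= 0 || !knownUnit) {
    throw new Error(`Unsupported timeframe: ${timeframe}`);
  }
  return n * UNIT_MS[unit as keyof typeof UNIT_MS];
}

// Clamp hotspotLimit to the documented 1-10 range. Peaks and valleys are fetched
// separately, so up to twice this many moments can come back.
function maxMoments(hotspotLimit = 5): number {
  const limit = Math.min(10, Math.max(1, Math.floor(hotspotLimit)));
  return limit * 2;
}

console.log(timeframeToMs("7:days")); // 604800000
console.log(maxMoments()); // 10
```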

**Returns:**

```typescript
interface EngagementInsightsResult {
  assetId: string;
  momentInsights: Array<{
    startMs: number; // Start time in milliseconds
    endMs: number; // End time in milliseconds
    timestamp: string; // Human-readable timestamp (e.g., "2:15")
    engagementScore: number; // Normalized score (0.0-1.0)
    insight: string; // Explanation of engagement pattern
  }>;
  overallInsight: {
    summary: string; // Overall engagement summary
    trends: string[]; // Key trends identified
  };
  usage?: { // Token usage statistics
    inputTokens: number;
    outputTokens: number;
    totalTokens: number;
  };
}
```
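Each moment carries both millisecond offsets and a human-readable `timestamp` such as `"2:15"`. The sketch below shows one way to produce that string from `startMs`. `formatTimestamp` is hypothetical, and the `h:mm:ss` form for moments past one hour is an assumption.

```typescript
// Format a millisecond offset as "m:ss" (or "h:mm:ss" past one hour; assumption).
function formatTimestamp(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  const ss = String(totalSeconds % 60).padStart(2, "0");
  return h > 0 ? `${h}:${String(m).padStart(2, "0")}:${ss}` : `${m}:${ss}`;
}

console.log(formatTimestamp(135_000)); // 2:15
```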

**Examples:**

```typescript
import { generateEngagementInsights } from "@mux/ai/workflows";

// Basic usage: informational insights
const basic = await generateEngagementInsights("asset-id");

basic.momentInsights.forEach((m) => {
  console.log(`${m.timestamp}: ${m.insight}`);
});

// Custom timeframe
const monthly = await generateEngagementInsights("asset-id", {
  timeframe: "30:days",
  hotspotLimit: 5,
});

console.log(monthly.overallInsight.summary);
console.log("Trends:", monthly.overallInsight.trends);

// Low-latency mode (skip shots polling)
const fast = await generateEngagementInsights("asset-id", {
  skipShots: true,
});
```

**Requirements:**

- The asset needs enough viewer engagement data (hotspots and heatmap statistics); newer or low-view videos may not have sufficient data
- Works with both video and audio-only assets (audio-only skips visual analysis)

**Use Cases:**

- Content optimization based on viewer behavior
- Understanding what drives re-watching and engagement
- Identifying pacing issues and drop-off points
- A/B testing video variations
- Providing engagement feedback to content creators

## `translateCaptions(assetId, trackId, toLanguageCode, options?)`

Translates existing captions from one language to another and optionally adds them as a new track to the Mux asset. The source language is inferred from the track's metadata.

**Parameters:**

- `assetId` (string) - Mux asset ID (video or audio-only)
- `trackId` (string) - ID of the source caption track to translate
- `toLanguageCode` (string) - Target language code (e.g., 'es', 'fr', 'de')
- `options` - Configuration options (required, since `provider` is required)

**Options:**

- `provider: 'openai' | 'anthropic' | 'google'` - AI provider (required)
- `model?: string` - Model to use (defaults to the provider's chat model if omitted)
- `uploadToMux?: boolean` - Whether to upload translated track to Mux (default: true)
- `s3Endpoint?: string` - S3-compatible storage endpoint
- `s3Region?: string` - S3 region (default: 'auto')
- `s3Bucket?: string` - S3 bucket name
- `storageAdapter?: StorageAdapter` - Optional adapter with `putObject` and `createPresignedGetUrl` methods
- `s3SignedUrlExpirySeconds?: number` - Expiry duration in seconds for S3 presigned GET URLs (default: 86400 / 24 hours)
- `chunking?: object` - Optional VTT-aware chunking controls for large caption translations
  - `enabled?: boolean` - Set to `false` to translate all cues in a single structured request (default: `true`)
  - `minimumAssetDurationSeconds?: number` - Prefer a single request until the asset is at least this long (default: `1800`)
  - `targetChunkDurationSeconds?: number` - Soft target for chunk duration once chunking starts (default: `1800`)
  - `maxConcurrentTranslations?: number` - Max number of concurrent translation requests when chunking (default: `4`)
  - `maxCuesPerChunk?: number` - Hard cap for cues included in a single AI translation chunk (default: `80`)
  - `maxCueTextTokensPerChunk?: number` - Approximate cap for cue text tokens included in a single AI translation chunk (default: `2000`)

**Returns:**

```typescript
interface TranslationResult {
  assetId: string;
  trackId: string; // Source track ID
  sourceLanguageCode: string; // Inferred from track metadata
  targetLanguageCode: string;
  sourceLanguage: LanguageCodePair; // { iso639_1: string; iso639_3: string }
  targetLanguage: LanguageCodePair; // { iso639_1: string; iso639_3: string }
  originalVtt: string; // Original VTT content
  translatedVtt: string; // Translated VTT content
  uploadedTrackId?: string; // Mux track ID (if uploaded)
  presignedUrl?: string; // S3 presigned URL (default expiry: 24 hours)
  usage?: TokenUsage; // Token usage from the AI provider
}
```

**Supported Languages:**
Any ISO 639-1 language code is supported; human-readable language names are resolved automatically with `Intl.DisplayNames`. Examples: Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Japanese (ja), Korean (ko), Chinese (zh), Russian (ru), Arabic (ar), Hindi (hi), Thai (th), Swahili (sw), and many more.
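For reference, this is how the standard `Intl.DisplayNames` API turns an ISO 639-1 code into an English language name. The `languageName` wrapper is a hypothetical convenience; how `@mux/ai` uses the resulting name internally is not shown here.

```typescript
// Built-in ECMAScript Intl API: display names for language codes, in English.
const languageNames = new Intl.DisplayNames(["en"], { type: "language" });

// With the default fallback ("code"), unknown codes come back unchanged;
// `?? code` only satisfies the `string | undefined` return type of .of().
function languageName(code: string): string {
  return languageNames.of(code) ?? code;
}

console.log(languageName("pl")); // Polish
console.log(languageName("ja")); // Japanese
```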

**Chunking Behavior:**

- Chunking is enabled by default for `translateCaptions`
- Shorter assets are translated in a single request until `minimumAssetDurationSeconds` is reached
- When chunking is active, requests stay aligned to VTT cues and the final VTT is rebuilt locally
- Chunk size is bounded by both cue count and approximate cue text token budget
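The chunking rules above can be sketched as a greedy pass over cues. A chunk closes when adding the next cue would exceed either the cue cap or the token budget. This is an illustration under stated assumptions, not the library's chunker: tokens are estimated at roughly four characters each, `targetChunkDurationSeconds` is not modeled, and `chunkCues`/`shouldChunk` are hypothetical names. Because whole cues are kept intact with their timings, a translated VTT can be rebuilt by swapping in each cue's translated text.

```typescript
interface Cue {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Rough token estimate (assumption: ~4 characters per token).
const approxTokens = (text: string): number => Math.ceil(text.length / 4);

// Chunking only starts once the asset is long enough (documented default: 1800 s).
function shouldChunk(assetDurationSeconds: number, enabled = true, minimumAssetDurationSeconds = 1800): boolean {
  return enabled && assetDurationSeconds >= minimumAssetDurationSeconds;
}

// Greedy chunker bounded by cue count and approximate cue-text tokens.
function chunkCues(cues: Cue[], maxCuesPerChunk = 80, maxCueTextTokensPerChunk = 2000): Cue[][] {
  const chunks: Cue[][] = [];
  let current: Cue[] = [];
  let tokens = 0;
  for (const cue of cues) {
    const cueTokens = approxTokens(cue.text);
    const full = current.length >= maxCuesPerChunk || tokens + cueTokens > maxCueTextTokensPerChunk;
    // Close the current chunk (never emit an empty one, even for an oversized cue).
    if (current.length > 0 && full) {
      chunks.push(current);
      current = [];
      tokens = 0;
    }
    current.push(cue);
    tokens += cueTokens;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```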

## `editCaptions(assetId, trackId, options)`

Edits a caption track using LLM-powered profanity censorship, static find/replace, or both. Optionally uploads the edited track to Mux.

**Parameters:**

- `assetId` (string) - Mux asset ID (video or audio-only)
- `trackId` (string) - ID of the caption track to edit
- `options` - Configuration options

**Options:**

- `provider?: 'openai' | 'anthropic' | 'google'` - AI provider (required when `autoCensorProfanity` is set)
- `model?: string` - Model to use (defaults to the provider's chat model if omitted)
- `autoCensorProfanity?: object` - LLM-powered profanity censorship (optional)
  - `mode?: 'blank' | 'remove' | 'mask'` - Replacement strategy (default: 'blank')
    - `'blank'`: `shit` → `[____]` (bracketed underscores matching word length)
    - `'remove'`: word removed entirely
    - `'mask'`: `shit` → `????` (question marks matching word length)
  - `alwaysCensor?: string[]` - Words to always censor regardless of LLM output
  - `neverCensor?: string[]` - Words to never censor even if the LLM flags them (takes precedence over `alwaysCensor`)
- `replacements?: Array<{ find: string; replace: string; caseSensitive?: boolean }>` - Static find/replace pairs (optional, no LLM needed). Each entry matches case-sensitively by default; set `caseSensitive: false` to match regardless of case.
- `uploadToMux?: boolean` - Whether to upload edited track to Mux (default: true)
- `deleteOriginalTrack?: boolean` - Whether to delete the original track after uploading the edited one (default: true)
- `s3Endpoint?: string` - S3-compatible storage endpoint
- `s3Region?: string` - S3 region (default: 'auto')
- `s3Bucket?: string` - S3 bucket name
- `trackNameSuffix?: string` - Suffix appended to the original track name in parentheses (default: 'edited', e.g. "Subtitles (edited)")
- `storageAdapter?: StorageAdapter` - Optional adapter with `putObject` and `createPresignedGetUrl` methods
- `s3SignedUrlExpirySeconds?: number` - Expiry duration in seconds for S3 presigned GET URLs (default: 86400 / 24 hours)

At least one of `autoCensorProfanity` or `replacements` must be provided.
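The three `mode` strategies and the `alwaysCensor`/`neverCensor` precedence can be illustrated with a short sketch. `CENSOR` and `censorSet` are hypothetical; case-insensitive matching of censor lists is an assumption, and whitespace cleanup after `'remove'` is not modeled.

```typescript
type CensorMode = "blank" | "remove" | "mask";

// Replacement text for one censored word, per the documented modes.
const CENSOR: Record<CensorMode, (word: string) => string> = {
  blank: (w) => `[${"_".repeat(w.length)}]`, // bracketed underscores, same length
  remove: () => "", // word removed entirely
  mask: (w) => "?".repeat(w.length), // question marks, same length
};

// Final censor list: LLM-flagged words plus alwaysCensor, minus neverCensor
// (neverCensor takes precedence). Assumption: comparisons are case-insensitive.
function censorSet(llmFlagged: string[], alwaysCensor: string[] = [], neverCensor: string[] = []): Set<string> {
  const never = new Set(neverCensor.map((w) => w.toLowerCase()));
  return new Set(
    [...llmFlagged, ...alwaysCensor].map((w) => w.toLowerCase()).filter((w) => !never.has(w)),
  );
}

console.log(CENSOR.blank("darn")); // [____]
```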

**Returns:**

```typescript
interface ReplacementRecord {
  cueStartTime: number; // Start time of the cue where the replacement occurred (seconds)
  before: string; // Original word/phrase
  after: string; // Replacement text
}

interface EditCaptionsResult {
  assetId: string;
  trackId: string;
  originalVtt: string; // Original VTT content
  editedVtt: string; // Edited VTT content
  totalReplacementCount: number; // Total replacements across all operations
  autoCensorProfanity?: { // Present when autoCensorProfanity was used
    replacements: ReplacementRecord[]; // Each censored word with cue timing
  };
  replacements?: { // Present when replacements were used
    replacements: ReplacementRecord[]; // Each static replacement with cue timing
  };
  uploadedTrackId?: string; // Mux track ID (if uploaded)
  presignedUrl?: string; // S3 presigned URL (default expiry: 24 hours)
  usage?: TokenUsage; // Token usage (only present if LLM was used)
}
```
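The static `replacements` option and the `ReplacementRecord` output fit together as shown in this sketch, which applies find/replace pairs to one cue and records each match. `applyReplacements` is hypothetical. Plain substring matching without word boundaries is an assumption. A function replacer is used so that `$` sequences in `replace` are inserted literally.

```typescript
interface Replacement {
  find: string;
  replace: string;
  caseSensitive?: boolean;
}

interface ReplacementRecord {
  cueStartTime: number;
  before: string;
  after: string;
}

// Escape regex metacharacters so `find` is matched literally.
const escapeRegExp = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Apply replacements to one cue's text, recording each match with the cue start time.
function applyReplacements(cueStartTime: number, text: string, replacements: Replacement[]) {
  const records: ReplacementRecord[] = [];
  let result = text;
  for (const { find, replace, caseSensitive = true } of replacements) {
    const pattern = new RegExp(escapeRegExp(find), caseSensitive ? "g" : "gi");
    result = result.replace(pattern, (match) => {
      records.push({ cueStartTime, before: match, after: replace });
      return replace;
    });
  }
  return { text: result, records };
}

console.log(applyReplacements(12.5, "mux ai and MUX AI", [{ find: "mux ai", replace: "Mux AI", caseSensitive: false }]).text); // Mux AI and Mux AI
```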

## `generateChapters(assetId, options?)`

Generates AI-powered chapter markers by analyzing video or audio transcripts. Creates logical chapter breaks based on topic changes and content transitions.

**Parameters:**

- `assetId` (string) - Mux asset ID (video or audio-only)
- `options` (optional) - Configuration options

**Options:**

- `languageCode?: string` - Language code for captions (e.g., 'en', 'es', 'fr'). When omitted, prefers English if available.
- `outputLanguageCode?: string` - BCP 47 language code (e.g., 'en', 'fr', 'ja') for the generated chapter titles. When omitted or set to `'auto'`, auto-detects from the selected transcript track's language. Falls back to unconstrained (LLM decides) if no language metadata is available.
- `provider?: 'openai' | 'anthropic' | 'google'` - AI provider (default: 'openai')
- `model?: string` - AI model to use (defaults: `gpt-5.1`, `claude-sonnet-4-5`, or `gemini-3-flash-preview`)
- `promptOverrides?: object` - Override specific sections of the chaptering prompt
  - `task?: string` - Override the main task instruction
  - `outputFormat?: string` - Override the expected output format description
  - `chapterGuidelines?: string` - Override chapter count and formatting guidelines
  - `titleGuidelines?: string` - Override chapter title style guidelines
- `minChaptersPerHour?: number` - Minimum chapters to generate per hour of content (default: 3)
- `maxChaptersPerHour?: number` - Maximum chapters to generate per hour of content (default: 8)
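The per-hour bounds scale with asset length. The hypothetical `chapterRange` helper below shows one way to turn them into a chapter range for a specific asset; rounding outward and always allowing at least one chapter are assumptions, not documented library behavior.

```typescript
// Hypothetical: scale per-hour chapter bounds to a given asset duration.
// Assumption: round the bounds outward and always allow at least one chapter.
function chapterRange(durationSeconds: number, minChaptersPerHour = 3, maxChaptersPerHour = 8) {
  const hours = durationSeconds / 3600;
  const min = Math.max(1, Math.floor(minChaptersPerHour * hours));
  const max = Math.max(min, Math.ceil(maxChaptersPerHour * hours));
  return { min, max };
}

console.log(chapterRange(5400)); // { min: 4, max: 12 }  (a 90-minute asset)
```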

**Returns:**

```typescript
{
  assetId: string;
  languageCode?: string; // Resolved from input or track metadata
  chapters: Array<{
    startTime: number; // Chapter start time in seconds
    title: string; // Descriptive chapter title
  }>;
  usage?: TokenUsage; // Token usage from the AI provider
}
```

**Requirements:**

- Asset must have a ready caption/transcript track
- When `languageCode` is omitted, prefers an English track if available
- Uses existing auto-generated or uploaded captions/transcripts

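**Example usage with Mux Player:**

```javascript
// generateChapters returns chapters shaped like:
// [
//   { startTime: 0, title: "Introduction and Setup" },
//   { startTime: 45, title: "Main Content Discussion" },
//   { startTime: 120, title: "Conclusion" }
// ]
// Mux Player's addChapters() expects { startTime, value }, so map title to value
player.addChapters(
  result.chapters.map(({ startTime, title }) => ({ startTime, value: title }))
);
```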

## `translateAudio(assetId, toLanguageCode, options?)`

Creates AI-dubbed audio tracks from existing media content using ElevenLabs voice cloning and translation. Uses the default audio track on your asset. Source language is auto-detected unless `fromLanguageCode` is provided.

**Parameters:**

- `assetId` (string) - Mux asset ID (video or audio-only; must have audio.m4a static rendition)
- `toLanguageCode` (string) - Target language code (e.g., 'es', 'fr', 'de')
- `options` (optional) - Configuration options

**Options:**

- `provider?: 'elevenlabs'` - AI provider (default: 'elevenlabs')
- `fromLanguageCode?: string` - Optional source language code passed to ElevenLabs `source_lang` (ISO 639-1 or ISO 639-3, default: auto-detect)
- `numSpeakers?: number` - Number of speakers (default: 0 for auto-detect)
- `uploadToMux?: boolean` - Whether to upload dubbed track to Mux (default: true)
- `s3Endpoint?: string` - S3-compatible storage endpoint
- `s3Region?: string` - S3 region (default: 'auto')
- `s3Bucket?: string` - S3 bucket name
- `storageAdapter?: StorageAdapter` - Optional adapter with `putObject` and `createPresignedGetUrl` methods
- `s3SignedUrlExpirySeconds?: number` - Expiry duration in seconds for S3 presigned GET URLs (default: 86400 / 24 hours)

**Returns:**

```typescript
interface TranslateAudioResult {
  assetId: string;
  targetLanguageCode: string;
  targetLanguage: LanguageCodePair; // { iso639_1: string; iso639_3: string }
  dubbingId: string; // ElevenLabs dubbing job ID
  uploadedTrackId?: string; // Mux audio track ID (if uploaded)
  presignedUrl?: string; // S3 presigned URL (default expiry: 24 hours)
  usage?: TokenUsage; // Workflow usage metadata
}
```

**Requirements:**

- Asset must have an `audio.m4a` static rendition (auto-requested if missing)
- ElevenLabs API key with Creator plan or higher
- S3-compatible storage for Mux ingestion

**Supported Languages:**
ElevenLabs supports 32+ languages with automatic language name detection via `Intl.DisplayNames`. Supported languages include English, Spanish, French, German, Italian, Portuguese, Polish, Japanese, Korean, Chinese, Russian, Arabic, Hindi, Thai, and many more. Track names are automatically generated (e.g., "Polish (auto-dubbed)").

## `generateEmbeddings(assetId, options?)`

Generate vector embeddings for transcript chunks from video or audio assets for semantic search.

**Deprecated:**
`generateVideoEmbeddings` is deprecated. Use `generateEmbeddings` instead.

**Parameters:**

- `assetId` (string) - Mux asset ID (video or audio-only)
- `options` (optional) - Configuration options

**Options:**

- `provider?: 'openai' | 'google'` - Embedding provider (default: 'openai')
- `model?: string` - Model to use (defaults: `text-embedding-3-small` for OpenAI, `gemini-embedding-001` for Google)
- `chunkingStrategy?: object` - How to chunk the transcript
  - `type: 'token' | 'vtt'` - Chunking method
  - `maxTokens?: number` - Maximum tokens per chunk (default: 500)
  - `overlap?: number` - Token overlap between chunks (for type: 'token', default: 100)
  - `overlapCues?: number` - VTT cue overlap between chunks (for type: 'vtt', default: 2)
- `languageCode?: string` - Language code for transcript track selection. When omitted, prefers English if available.
- `batchSize?: number` - Maximum number of chunks to process concurrently (default: 5)
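For `type: 'token'`, `maxTokens` and `overlap` describe a sliding window: each chunk shares `overlap` tokens with the previous one, so the window advances by `maxTokens - overlap`. The sketch below works on an already-tokenized array, since the tokenizer is not specified here. `chunkTokens` is a hypothetical helper, not the library's implementation.

```typescript
// Sliding-window chunking over pre-tokenized input (hypothetical helper).
function chunkTokens<T>(tokens: T[], maxTokens = 500, overlap = 100): T[][] {
  if (overlap >= maxTokens) throw new Error("overlap must be smaller than maxTokens");
  const step = maxTokens - overlap; // how far each new window advances
  const chunks: T[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + maxTokens));
    if (start + maxTokens >= tokens.length) break; // this window reached the end
  }
  return chunks;
}

// With the defaults, 1000 tokens yield windows starting at 0, 400, and 800.
console.log(chunkTokens(Array.from({ length: 1000 }, (_, i) => i)).length); // 3
```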

**Returns:**

```typescript
{
  assetId: string;
  chunks: Array<{
    chunkId: string;
    embedding: number[]; // Vector embedding
    metadata: {
      startTime?: number; // Chunk start time in seconds
      endTime?: number; // Chunk end time in seconds
      tokenCount: number;
    };
  }>;
  averagedEmbedding: number[]; // Single embedding for entire transcript
  provider: string;
  model: string;
  metadata: {
    totalChunks: number;
    totalTokens: number;
    chunkingStrategy: string;
    embeddingDimensions: number;
    generatedAt: string;
  };
  usage?: TokenUsage; // Workflow usage metadata
}
```
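To use these vectors for semantic search, you typically compare a query embedding against each chunk embedding. The sketch below shows an element-wise mean (one plausible way to form something like `averagedEmbedding`) and cosine similarity for ranking. Both helpers are hypothetical; using an unweighted mean is an assumption, since the library may weight chunks differently (for example, by `tokenCount`).

```typescript
// Element-wise mean of equal-length vectors (assumption: unweighted).
function averageEmbedding(vectors: number[][]): number[] {
  if (vectors.length === 0) return [];
  const dims = vectors[0].length;
  const sum = new Array<number>(dims).fill(0);
  for (const v of vectors) {
    for (let i = 0; i < dims; i++) sum[i] += v[i];
  }
  return sum.map((x) => x / vectors.length);
}

// Cosine similarity: 1 for identical direction, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

console.log(averageEmbedding([[1, 0], [0, 1]])); // [ 0.5, 0.5 ]
```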

## Custom Prompts with `promptOverrides`

Customize specific sections of the summarization prompt for different use cases like SEO, social media, or technical analysis. See the [Prompt Customization guide](./PROMPT-CUSTOMIZATION.md) for a full overview of the prompt builder pattern.

**Tip:** Before adding overrides, read through the default summarization prompt template in `src/workflows/summarization.ts` (the `summarizationPromptBuilder` config) so that you have clear context on what each section does and what you're changing.

```typescript
import { getSummaryAndTags } from "@mux/ai/workflows";

// SEO-optimized metadata
const seoResult = await getSummaryAndTags(assetId, {
  tone: "professional",
  promptOverrides: {
    task: "Generate SEO-optimized metadata that maximizes discoverability.",
    title: "Create a search-optimized title (50-60 chars) with primary keyword front-loaded.",
    keywords: "Focus on high search volume terms and long-tail keywords.",
  },
});

// Social media optimized for engagement
const socialResult = await getSummaryAndTags(assetId, {
  promptOverrides: {
    title: "Create a scroll-stopping headline using emotional triggers or curiosity gaps.",
    description: "Write shareable copy that creates FOMO and works without watching the video.",
    keywords: "Generate hashtag-ready keywords for trending and niche community tags.",
  },
});

// Technical/production analysis
const technicalResult = await getSummaryAndTags(assetId, {
  tone: "professional",
  promptOverrides: {
    task: "Analyze cinematography, lighting, and production techniques.",
    title: "Describe the production style or filmmaking technique.",
    description: "Provide a technical breakdown of camera work, lighting, and editing.",
    keywords: "Use industry-standard production terminology.",
  },
});
```

**Available override sections:**

| Section | Description |
|---------|-------------|
| `task` | Main instruction for what to analyze |
| `title` | Guidance for generating the title |
| `description` | Guidance for generating the description |
| `keywords` | Guidance for generating keywords/tags |
| `qualityGuidelines` | General quality instructions |

Each override can be a simple string (replaces the section content) or a full `PromptSection` object for advanced control over XML tag names and attributes.

## Common Types

### `TokenUsage`

Returned in the optional `usage` field of workflow results:

```typescript
interface TokenUsage {
  inputTokens?: number; // Tokens in the input prompt
  outputTokens?: number; // Tokens generated in the output
  totalTokens?: number; // Total tokens consumed
  reasoningTokens?: number; // Chain-of-thought reasoning tokens
  cachedInputTokens?: number; // Input tokens served from cache
  metadata?: {
    assetDurationSeconds?: number;
    thumbnailCount?: number;
  };
}
```
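Because every count in `TokenUsage` is optional, totals across several workflow calls need a default for missing fields. The hypothetical `sumUsage` helper below treats absent counts (and absent `usage` objects) as zero. It omits `metadata`, which is descriptive rather than additive.

```typescript
// Token-count fields of TokenUsage (metadata omitted for this sketch).
interface TokenUsage {
  inputTokens?: number;
  outputTokens?: number;
  totalTokens?: number;
  reasoningTokens?: number;
  cachedInputTokens?: number;
}

const KEYS: Array<keyof TokenUsage> = [
  "inputTokens", "outputTokens", "totalTokens", "reasoningTokens", "cachedInputTokens",
];

// Sum token counts across results; missing usage objects or fields count as 0.
function sumUsage(usages: Array<TokenUsage | undefined>): Required<TokenUsage> {
  const total = { inputTokens: 0, outputTokens: 0, totalTokens: 0, reasoningTokens: 0, cachedInputTokens: 0 };
  for (const u of usages) {
    if (!u) continue;
    for (const k of KEYS) total[k] += u[k] ?? 0;
  }
  return total;
}

console.log(sumUsage([{ inputTokens: 10, outputTokens: 5, totalTokens: 15 }, undefined, { inputTokens: 3 }]).inputTokens); // 13
```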

### `LanguageCodePair`

Returned by `translateCaptions` and `translateAudio`:

```typescript
interface LanguageCodePair {
  iso639_1: string; // Two-letter code (e.g., "en", "es") — use for Mux/browser players
  iso639_3: string; // Three-letter code (e.g., "eng", "spa") — use for ElevenLabs
}
```