# media-gen-mcp


---

**Media Gen MCP** is a **strict TypeScript** Model Context Protocol (MCP) server for OpenAI Images (`gpt-image-1.5`, `gpt-image-1`), OpenAI Videos (Sora), and Google GenAI Videos (Veo): generate/edit images, create/remix video jobs, and fetch media from URLs or disk with smart `resource_link` vs inline `image` outputs and optional `sharp` processing. Production-focused (full strict typecheck, ESLint + Vitest CI). Works with fast-agent, Claude Desktop, ChatGPT, Cursor, VS Code, Windsurf, and any MCP-compatible client.

**Design principle:** spec-first, type-safe image tooling – strict OpenAI Images API + MCP compliance with fully static TypeScript types and flexible result placements/response formats for different clients.

- **Generate images** from text prompts using OpenAI's `gpt-image-1.5` model (with `gpt-image-1` compatibility and DALL·E support planned in future versions).
- **Edit images** (inpainting, outpainting, compositing) from 1 up to 16 images at once, with advanced prompt control.
- **Generate videos** via OpenAI Videos (`sora-2`, `sora-2-pro`) with job create/remix/list/retrieve/delete and asset downloads.
- **Generate videos** via Google GenAI (Veo) with operation polling and file-first downloads.
- **Fetch & compress images** from HTTP(S) URLs or local file paths with smart size/quality optimization.
- **Fetch documents** from HTTP(S) URLs or local file paths and return `resource_link`/`resource` outputs.
- **Debug MCP output shapes** with a `test-images` tool that mirrors production result placement (`content`, `structuredContent`, `toplevel`).
- **Integrates with**: [fast-agent](https://github.com/strato-space/fast-agent), [Windsurf](https://windsurf.com), [Claude Desktop](https://www.anthropic.com/claude/desktop), [Cursor](https://cursor.com), [VS Code](https://code.visualstudio.com/), and any MCP-compatible client.
---

## ✨ Features

- **Strict MCP spec support**
  Tool outputs are first-class [`CallToolResult`](https://github.com/modelcontextprotocol/spec/blob/main/schema/2025-11-25/schema.json) objects from the latest MCP schema, including: `content` items (`text`, `image`, `resource_link`, `resource`), optional `structuredContent`, optional top-level `files`, and the `isError` flag for failures.
- **Full `gpt-image-1.5` and `sora-2`/`sora-2-pro` parameter coverage (generate & edit)**
  - [`openai-images-generate`](#openai-images-generate) mirrors the OpenAI Images [`create`](https://platform.openai.com/docs/api-reference/images/create) API for `gpt-image-1.5` (and `gpt-image-1`): background, moderation, size, quality, output_format, output_compression, `n`, `user`, etc.
  - [`openai-images-edit`](#openai-images-edit) mirrors the OpenAI Images [`createEdit`](https://platform.openai.com/docs/api-reference/images/createEdit) API for `gpt-image-1.5` (and `gpt-image-1`): image, mask, `n`, quality, size, `user`.
- **OpenAI Videos (Sora) job tooling (create / remix / list / retrieve / delete / content)**
  - [`openai-videos-create`](#openai-videos-create) mirrors [`videos/create`](https://platform.openai.com/docs/api-reference/videos/create) and can optionally wait for completion.
  - [`openai-videos-remix`](#openai-videos-remix) mirrors [`videos/remix`](https://platform.openai.com/docs/api-reference/videos/remix).
  - [`openai-videos-list`](#openai-videos-list) mirrors [`videos/list`](https://platform.openai.com/docs/api-reference/videos/list).
  - [`openai-videos-retrieve`](#openai-videos-retrieve) mirrors [`videos/retrieve`](https://platform.openai.com/docs/api-reference/videos/retrieve).
  - [`openai-videos-delete`](#openai-videos-delete) mirrors [`videos/delete`](https://platform.openai.com/docs/api-reference/videos/delete).
  - [`openai-videos-retrieve-content`](#openai-videos-retrieve-content) mirrors [`videos/content`](https://platform.openai.com/docs/api-reference/videos/content) and downloads `video` / `thumbnail` / `spritesheet` assets to disk, returning MCP `resource_link` (default) or embedded `resource` blocks (via `tool_result`).
- **Google GenAI (Veo) operations + downloads (generate / retrieve operation / retrieve content)**
  - [`google-videos-generate`](#google-videos-generate) starts a long-running operation (`ai.models.generateVideos`) and can optionally wait for completion and download `.mp4` outputs. [Veo model reference](https://docs.cloud.google.com/vertex-ai/generative-ai/docs/model-reference/veo-video-generation)
  - [`google-videos-retrieve-operation`](#google-videos-retrieve-operation) polls an existing operation.
  - [`google-videos-retrieve-content`](#google-videos-retrieve-content) downloads an `.mp4` from a completed operation, returning MCP `resource_link` (default) or embedded `resource` blocks (via `tool_result`).
- **Fetch and process images from URLs or files**
  The [`fetch-images`](#fetch-images) tool loads images from HTTP(S) URLs or local file paths with optional, user-controlled compression (disabled by default). Supports parallel processing of up to 20 images.
- **Fetch videos from URLs or files**
  The [`fetch-videos`](#fetch-videos) tool lists local videos or downloads remote video URLs to disk and returns MCP `resource_link` (default) or embedded `resource` blocks (via `tool_result`).
- **Fetch documents from URLs or files**
  The [`fetch-document`](#fetch-document) tool downloads remote files or reuses local paths and returns MCP `resource_link` (default) or embedded `resource` blocks (via `tool_result`).
- **Mix and edit up to 16 images**
  [`openai-images-edit`](#openai-images-edit) accepts `image` as a single string or an array of 1–16 file paths/base64 strings, matching the OpenAI spec for GPT Image models (`gpt-image-1.5`, `gpt-image-1`) image edits.
- **Smart image compression**
  Built-in compression using [sharp](https://sharp.pixelplumbing.com/) — iteratively reduces quality and dimensions to fit MCP payload limits while maintaining visual quality.
- **Resource-aware file output with `resource_link`**
  - Automatic switch from inline base64 to `file` when the total response size exceeds a safe threshold.
  - Outputs are written to disk with generated filenames (images/documents use a generated UUID; videos use the OpenAI `video_id`) and exposed to MCP clients via `content[]` depending on `tool_result` (`resource_link`/`image` for images, `resource_link`/`resource` for video/document downloads).
- **Built-in `test-images` tool for MCP client debugging**
  [`test-images`](#test-images) reads sample images from a configured directory and returns them using the same result-building logic as production tools. Use the `tool_result` and `response_format` parameters to test how different MCP clients handle `content[]` and `structuredContent`.
- **Structured MCP error handling**
  All tool errors (validation, OpenAI API failures, I/O) are returned as MCP errors with `isError: true` and `content: [{ type: "text", text: "<error message>" }]`, making failures easy to parse and surface in MCP clients.

---

## 🚀 Installation

```sh
git clone https://github.com/strato-space/media-gen-mcp.git
cd media-gen-mcp
npm install
npm run build
```

Build modes:

- `npm run build` – strict TypeScript build with **all strict flags enabled**, including `skipLibCheck: false`. Incremental builds via `.tsbuildinfo` (~2-3s on a warm cache).
- `npm run esbuild` – fast bundling via esbuild (no type checking, useful for rapid iteration).
### Development mode (no build required)

For development, or when TypeScript compilation fails due to memory constraints:

```sh
npm run dev   # Uses tsx to run TypeScript directly
```

### Quality checks

```sh
npm run lint        # ESLint with typescript-eslint
npm run typecheck   # Strict tsc --noEmit
npm run test        # Unit tests (vitest)
npm run test:watch  # Watch mode for TDD
npm run ci          # lint + typecheck + test
```

### Unit tests

The project uses [vitest](https://vitest.dev/) for unit testing. Tests are located in `test/`.

**Covered modules:**

| Module | Tests | Description |
|--------|-------|-------------|
| `compression` | 12 | Image format detection, buffer processing, file I/O |
| `helpers` | 31 | URL/path validation, output resolution, result placement, resource links |
| `env` | 19 | Configuration parsing, env validation, defaults |
| `logger` | 10 | Structured logging + truncation safety |
| `pricing` | 5 | Sora pricing estimate helpers |
| `schemas` | 69 | Zod schema validation for all tools, type inference |
| `fetch-images` (integration) | 3 | End-to-end MCP tool call behavior |
| `fetch-videos` (integration) | 3 | End-to-end MCP tool call behavior |

**Test categories:**

- **compression** — `isCompressionAvailable`, `detectImageFormat`, `processBufferWithCompression`, `readAndProcessImage`
- **helpers** — `isHttpUrl`, `isAbsolutePath`, `isBase64Image`, `ensureDirectoryWritable`, `resolveOutputPath`, `getResultPlacement`, `buildResourceLinks`
- **env** — config loading and validation for `MEDIA_GEN_*` / `MEDIA_GEN_MCP_*` settings
- **logger** — truncation and error formatting behavior
- **schemas** — validation for `openai-images-*`, `openai-videos-*`, `fetch-images`, `fetch-videos`, `test-images` inputs, boundary testing (prompt length, image count limits, path validation)

```sh
npm run test
# ✓ test/compression.test.ts (12 tests)
# ✓ test/helpers.test.ts (31 tests)
# ✓ test/env.test.ts (19 tests)
# ✓ test/logger.test.ts (10 tests)
# ✓ test/pricing.test.ts (5 tests)
# ✓ test/schemas.test.ts (69 tests)
# ✓ test/fetch-images.integration.test.ts (3 tests)
# ✓ test/fetch-videos.integration.test.ts (3 tests)
# Tests: 152 passed
```

### Run directly via npx (no local clone)

You can also run the server straight from a remote repo using `npx`:

```sh
npx -y github:strato-space/media-gen-mcp --env-file /path/to/media-gen.env
```

The `--env-file` argument tells the server which env file to load (e.g. when you keep secrets outside the cloned directory). The file should contain `OPENAI_API_KEY`, optional Azure variables, and any `MEDIA_GEN_MCP_*` settings.

### `secrets.yaml` (optional)

You can keep API keys (and optional Google Vertex AI settings) in a `secrets.yaml` file (compatible with the fast-agent secrets template):

```yaml
openai:
  api_key: <your-openai-api-key>
anthropic:
  api_key: <your-anthropic-api-key>
google:
  api_key: <your-google-api-key>
  vertex_ai:
    enabled: true
    project_id: your-gcp-project-id
    location: europe-west4
```

`media-gen-mcp` loads `secrets.yaml` from the current working directory (or from `--secrets-file /path/to/secrets.yaml`) and applies it to env vars; values in `secrets.yaml` override env, and placeholder values are ignored.

---

## ⚡ Quick start (fast-agent & Windsurf)

### fast-agent

In fast-agent, MCP servers are configured in `fastagent.config.yaml` under the `mcp.servers` section (see the [fast-agent docs](https://github.com/strato-space/fast-agent)). To add `media-gen-mcp` from GitHub via `npx` as an MCP server:

```yaml
# fastagent.config.yaml
mcp:
  servers:
    # your existing servers (e.g. fetch, filesystem, huggingface, ...)
    media-gen-mcp:
      command: "npx"
      args: ["-y", "github:strato-space/media-gen-mcp", "--env-file", "/path/to/media-gen.env"]
```

Put `OPENAI_API_KEY` and other settings into `media-gen.env` (see `.env.sample` in this repo).
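A minimal `media-gen.env` for the `--env-file` flag might look like the sketch below. All values are placeholders; `.env.sample` in the repo is the authoritative reference for the available settings.

```sh
# media-gen.env -- illustrative values only, see .env.sample for the full list
OPENAI_API_KEY=sk-...

# Optional Azure settings
# AZURE_OPENAI_API_KEY=sk-...
# AZURE_OPENAI_ENDPOINT=my.endpoint.com
# OPENAI_API_VERSION=2024-12-01-preview

# Restrict local file access; the first entry receives generated outputs
MEDIA_GEN_DIRS=/tmp/media-gen-mcp
# Optional: allow remote fetches only from these URL prefixes
MEDIA_GEN_URLS=https://storage.example.com/**/assets/
```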
### Windsurf

Add an MCP server that runs `media-gen-mcp` from GitHub via `npx` using the JSON format below (similar to Claude Desktop / VS Code):

```json
{
  "mcpServers": {
    "media-gen-mcp": {
      "command": "npx",
      "args": ["-y", "github:strato-space/media-gen-mcp", "--env-file", "/path/to/media-gen.env"]
    }
  }
}
```

---

## 🔑 Configuration

Add to your MCP client config (fast-agent, Windsurf, Claude Desktop, Cursor, VS Code):

```json
{
  "mcpServers": {
    "media-gen-mcp": {
      "command": "npx",
      "args": ["-y", "github:strato-space/media-gen-mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```

Azure deployments are also supported:

```jsonc
{
  "mcpServers": {
    "media-gen-mcp": {
      "command": "npx",
      "args": ["-y", "github:strato-space/media-gen-mcp"],
      "env": {
        // "AZURE_OPENAI_API_KEY": "sk-...",
        // "AZURE_OPENAI_ENDPOINT": "my.endpoint.com",
        "OPENAI_API_VERSION": "2024-12-01-preview"
      }
    }
  }
}
```

Environment variables:

- Set `OPENAI_API_KEY` (and optionally `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`, `OPENAI_API_VERSION`) in the environment of the process that runs `node dist/index.js` (shell, systemd unit, Docker env, etc.).
- The server will **optionally** load a local `.env` file from its working directory if present (it does not override already-set environment variables).
- You can also pass `--env-file /path/to/env` when starting the server (including via `npx`); this file is loaded via `dotenv` before tools run, again without overriding already-set variables.

### Logging and base64 truncation

To avoid flooding logs with huge image payloads, the built-in logger applies a log-only sanitizer to structured `data` passed to `log.debug/info/warn/error`:

- Truncates configured string fields (e.g. `b64_json`, `base64`, string `data`, `image_url`) to a short preview controlled by `LOG_TRUNCATE_DATA_MAX` (default: 64 characters).
  The list of keys defaults to `LOG_SANITIZE_KEYS` inside `src/lib/logger.ts` and can be overridden via `MEDIA_GEN_MCP_LOG_SANITIZE_KEYS` (comma-separated list of field names).
- Sanitization is applied **only** to log serialization; tool results returned to MCP clients are never modified.

Control via environment:

- `MEDIA_GEN_MCP_LOG_SANITIZE_IMAGES` (default: `true`)
  - `1`, `true`, `yes`, `on` – enable truncation (default behaviour).
  - `0`, `false`, `no`, `off` – disable truncation and log full payloads.

Field list and limits are configured in `src/lib/logger.ts` via `LOG_SANITIZE_KEYS` and `LOG_TRUNCATE_DATA_MAX`.

### Security and local file access

- **Allowed directories**: all tools are restricted to paths matching `MEDIA_GEN_DIRS`. If unset, this defaults to `/tmp/media-gen-mcp` (or `%TEMP%/media-gen-mcp` on Windows).
- **Test samples**: `MEDIA_GEN_MCP_TEST_SAMPLE_DIR` adds a directory to the allowlist and enables the `test-images` tool.
- **Local reads**: `fetch-images` and `fetch-document` accept file paths (absolute or relative). Relative paths are resolved against the first `MEDIA_GEN_DIRS` entry and must still match an allowed pattern.
- **Remote reads**: HTTP(S) fetches are filtered by `MEDIA_GEN_URLS` patterns. If the list is empty, all URLs are allowed.
- **Writes**: `openai-images-generate`, `openai-images-edit`, `fetch-images`, `fetch-videos`, and `fetch-document` write under the first entry of `MEDIA_GEN_DIRS`. `test-images` is read-only and does not create new files.
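Conceptually, the `MEDIA_GEN_DIRS` / `MEDIA_GEN_URLS` checks reduce to translating each pattern into an anchored regular expression and testing the candidate path or URL against it (the glob syntax is detailed in the next section). The sketch below is illustrative only; the server's actual implementation may differ.

```typescript
// Hypothetical sketch of allowlist matching: `*` matches within a single
// path segment (no "/"), `**` matches any number of segments. Patterns act
// as allowed prefixes, so the regex is anchored only at the start.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*\*/g, "\u0000")           // temporary placeholder for "**"
    .replace(/\*/g, "[^/]*")              // "*": one segment, never crosses "/"
    .replace(/\u0000/g, ".*");            // "**": any number of segments
  return new RegExp("^" + escaped);
}

function isAllowed(candidate: string, patterns: string[]): boolean {
  return patterns.some((p) => globToRegExp(p).test(candidate));
}
```

For example, `isAllowed("/home/user1/media/pic.png", ["/home/*/media/"])` passes, while a path with an extra segment such as `/home/a/b/media/pic.png` does not, because `*` never crosses a `/`.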
#### Glob patterns

Both `MEDIA_GEN_DIRS` and `MEDIA_GEN_URLS` support glob wildcards:

| Pattern | Matches | Example |
|---------|---------|---------|
| `*` | Any single segment (no `/`) | `/home/*/media/` matches `/home/user1/media/` |
| `**` | Any number of segments | `/data/**/images/` matches `/data/a/b/images/` |

URL examples:

```shell
MEDIA_GEN_URLS=https://*.cdn.example.com/,https://storage.example.com/**/assets/
```

Path examples:

```shell
MEDIA_GEN_DIRS=/home/*/media-gen/output/,/data/**/images/
```

⚠️ **Warning**: trailing wildcards without a delimiter (e.g., `/home/user/*` or `https://cdn.com/**`) expose entire subtrees and trigger a console warning at startup.

#### Recommended mitigations

1. Run under a dedicated OS user with access only to allowed directories.
2. Keep allowlists minimal. Avoid `*` in home directories or system paths.
3. Use explicit `MEDIA_GEN_URLS` prefixes for remote fetches.
4. Monitor allowed directories via OS ACLs or backups.

### Tool result parameters: `tool_result` and `response_format`

Image tools (`openai-images-*`, `fetch-images`, `test-images`) support two parameters that control the shape of the MCP tool result:

| Parameter | Values | Default | Description |
|-----------|--------|---------|-------------|
| `tool_result` | `resource_link`, `image` | `resource_link` | Controls `content[]` shape |
| `response_format` | `url`, `path`, `b64_json` | `url` | Controls `structuredContent` shape (OpenAI ImagesResponse format) |

Video/document download tools (`openai-videos-create` / `openai-videos-remix` when downloading, `openai-videos-retrieve-content`, `google-videos-generate` when downloading, `google-videos-retrieve-content`, `fetch-videos`, `fetch-document`) support:

| Parameter | Values | Default | Description |
|-----------|--------|---------|-------------|
| `tool_result` | `resource_link`, `resource` | `resource_link` | Controls `content[]` shape |

Google video tools (`google-videos-*`) also support:

| Parameter | Values | Default | Description |
|-----------|--------|---------|-------------|
| `response_format` | `url`, `b64_json` | `url` | Controls `structuredContent.response.generatedVideos[].video` shape (`uri` vs `videoBytes`) |

#### `tool_result` — controls `content[]`

- **Images** (`openai-images-*`, `fetch-images`, `test-images`)
  - **`resource_link`** (default): emits `ResourceLink` items with `file://` or `https://` URIs
  - **`image`**: emits base64 `ImageContent` blocks
- **Videos** (tools that download video data)
  - **`resource_link`** (default): emits `ResourceLink` items with `file://` or `https://` URIs
  - **`resource`**: emits `EmbeddedResource` blocks with base64 `resource.blob`
- **Documents** (`fetch-document`)
  - **`resource_link`** (default): emits `ResourceLink` items with `file://` or `https://` URIs
  - **`resource`**: emits `EmbeddedResource` blocks with base64 `resource.blob`

#### `response_format` — controls `structuredContent`

For OpenAI images, `structuredContent` always contains an OpenAI ImagesResponse-style object:

```jsonc
{
  "created": 1234567890,
  "data": [
    { "url": "https://..." } // or { "path": "/abs/path.png" } / { "b64_json": "..." } depending on response_format
  ]
}
```

- **`url`** (default): `data[].url` contains file URLs
- **`path`**: `data[].path` contains local filesystem paths
- **`b64_json`**: `data[].b64_json` contains base64-encoded image data

For Google videos, `response_format` controls whether `structuredContent.response.generatedVideos[].video` prefers:

- **`url`** (default): `video.uri` (and strips `video.videoBytes`)
- **`b64_json`**: `video.videoBytes` (and strips `video.uri`)

#### Backward compatibility (MCP 5.2.6)

Per MCP spec section 5.2.6, a `TextContent` block with serialized JSON (always using URLs in `data[]`) is also included in `content[]` for backward compatibility with clients that don't support `structuredContent`.
Example tool result structure:

```jsonc
{
  "content": [
    // ResourceLink or ImageContent based on tool_result
    { "type": "resource_link", "uri": "https://...", "name": "image.png", "mimeType": "image/png" },
    // Serialized JSON for backward compatibility (MCP 5.2.6)
    { "type": "text", "text": "{ \"created\": 1234567890, \"data\": [{ \"url\": \"https://...\" }] }" }
  ],
  "structuredContent": {
    "created": 1234567890,
    "data": [{ "url": "https://..." }]
  }
}
```

**ChatGPT MCP client behavior (chatgpt.com, as of 2025-12-01):**

- ChatGPT currently ignores `content[]` image data in favor of `structuredContent`.
- For ChatGPT, use `response_format: "url"` and configure the first `MEDIA_GEN_MCP_URL_PREFIXES` entry as a public HTTPS prefix (for example `MEDIA_GEN_MCP_URL_PREFIXES=https://media-gen.example.com/media`).

For Anthropic clients (Claude Desktop, etc.), the default configuration works well.

### Network access via mcp-proxy (SSE)

For networked SSE access you can front `media-gen-mcp` with [`mcp-proxy`](https://github.com/modelcontextprotocol/servers/tree/main/src/proxy) or an equivalent. This setup has been tested with the TypeScript SSE proxy implementation [`punkpeye/mcp-proxy`](https://github.com/punkpeye/mcp-proxy). For example, a one-line command looks like:

```sh
mcp-proxy --host=0.0.0.0 --port=99 --server=sse --sseEndpoint=/ --shell 'npx -y github:strato-space/media-gen-mcp --env-file /path/to/media-gen.env'
```

In production you would typically wire this up via a systemd template unit that loads `PORT`/`SHELL_CMD` from an `EnvironmentFile=` (see `server/mcp/mcp@.service`-style setups).

---

## 🛠 Tool signatures

### openai-images-generate

Arguments (input schema):

- `prompt` (string, required)
  - Text prompt describing the desired image.
  - Max length: 32,000 characters.
- `background` ("transparent" | "opaque" | "auto", optional)
  - Background handling mode.
  - If `background` is `"transparent"`, then `output_format` must be `"png"` or `"webp"`.
- `model` ("gpt-image-1.5" | "gpt-image-1", optional, default: "gpt-image-1.5")
- `moderation` ("auto" | "low", optional)
  - Content moderation behavior, passed through to the Images API.
- `n` (integer, optional)
  - Number of images to generate.
  - Min: 1, max: 10.
- `output_compression` (integer, optional)
  - Compression level (0–100).
  - Only applied when `output_format` is `"jpeg"` or `"webp"`.
- `output_format` ("png" | "jpeg" | "webp", optional)
  - Output image format.
  - If omitted, the server treats output as PNG semantics.
- `quality` ("auto" | "high" | "medium" | "low", default: "high")
- `size` ("1024x1024" | "1536x1024" | "1024x1536" | "auto", default: "1024x1536")
- `user` (string, optional)
  - User identifier forwarded to OpenAI for monitoring.
- `response_format` ("url" | "path" | "b64_json", default: "url")
  - Response format (aligned with the OpenAI Images API):
    - `"url"`: file/URL-based output (resource_link items, `image_url` fields, `data[].url` in `api` placement).
    - `"path"`: local filesystem paths in `data[].path` (for local skill workflows).
    - `"b64_json"`: inline base64 image data (image content, `data[].b64_json` in `api` placement).
- `tool_result` ("resource_link" | "image", default: "resource_link")
  - Controls `content[]` shape:
    - `"resource_link"` emits ResourceLink items (file/URL-based)
    - `"image"` emits base64 ImageContent blocks

Behavior notes:

- The server uses OpenAI `gpt-image-1.5` by default (set `model: "gpt-image-1"` for legacy behavior).
- If the total size of all base64 images would exceed the configured payload threshold (default ~50MB via `MCP_MAX_CONTENT_BYTES`), the server automatically switches the **effective output mode** to file/URL-based and saves images to the first entry of `MEDIA_GEN_DIRS` (default: `/tmp/media-gen-mcp`).
- Even when you explicitly request `response_format: "b64_json"`, the server still writes the files to disk (for static hosting, caching, or later reuse).
  Exposure of file paths / URLs in the tool result then depends on `MEDIA_GEN_MCP_RESULT_PLACEMENT` and the per-call `result_placement` (see the section below).

Output (MCP CallToolResult, when placement includes `"content"`):

- When the effective `output` mode is `"base64"`:
  - `content` is an array that may contain:
    - image items:
      - `{ type: "image", data: <base64 data>, mimeType: <"image/png" | "image/jpeg" | "image/webp"> }`
    - optional text items with revised prompts returned by the Images API (for models that support it, e.g. DALL·E 3):
      - `{ type: "text", text: <revised prompt> }`
- When the effective `output` mode is `"file"`:
  - `content` contains one `resource_link` item per file, plus the same optional `text` items with revised prompts:
    - `{ type: "resource_link", uri: "file:///absolute-path-1.png", name: "absolute-path-1.png", mimeType: <mime type> }`
- For `gpt-image-1.5` and `gpt-image-1`, an additional `text` line is included with a pricing estimate (based on `structuredContent.usage`), and `structuredContent.pricing` contains the full pricing breakdown.

When `result_placement` includes `"api"`, `openai-images-generate` instead returns an **OpenAI Images API-like object** without MCP wrappers:

```jsonc
{
  "created": 1764599500,
  "data": [
    { "b64_json": "..." } // or { "url": "https://.../media/file.png" } when output: "file"
  ],
  "background": "opaque",
  "output_format": "png",
  "size": "1024x1024",
  "quality": "high"
}
```

### openai-images-edit

Arguments (input schema):

- `image` (string or string[], required)
  - Either a single absolute path to an image file (`.png`, `.jpg`, `.jpeg`, `.webp`), a base64-encoded image string (optionally as a `data:image/...;base64,...` URL), **or an HTTP(S) URL** pointing to a publicly accessible image, **or** an array of 1–16 such strings (for multi-image editing).
  - When an HTTP(S) URL is provided, the server fetches the image and converts it to base64 before sending it to OpenAI.
- `prompt` (string, required)
  - Text description of the desired edit.
  - Max length: 32,000 characters.
- `mask` (string, optional)
  - Absolute path, base64 string, or HTTP(S) URL for a mask image (PNG < 4MB, same dimensions as the source image). Transparent areas mark the regions to edit.
- `model` ("gpt-image-1.5" | "gpt-image-1", optional, default: "gpt-image-1.5")
- `n` (integer, optional)
  - Number of images to generate.
  - Min: 1, max: 10.
- `quality` ("auto" | "high" | "medium" | "low", default: "high")
- `size` ("1024x1024" | "1536x1024" | "1024x1536" | "auto", default: "1024x1536")
- `user` (string, optional)
  - User identifier forwarded to OpenAI for monitoring.
- `response_format` ("url" | "path" | "b64_json", default: "url")
  - Response format (aligned with the OpenAI Images API):
    - `"url"`: file/URL-based output (resource_link items, `image_url` fields, `data[].url` in `api` placement).
    - `"path"`: local filesystem paths in `data[].path` (for local skill workflows).
    - `"b64_json"`: inline base64 image data (image content, `data[].b64_json` in `api` placement).
- `tool_result` ("resource_link" | "image", default: "resource_link")
  - Controls `content[]` shape:
    - `"resource_link"` emits ResourceLink items (file/URL-based)
    - `"image"` emits base64 ImageContent blocks

Behavior notes:

- The server accepts `image` and `mask` as absolute paths, base64/data URLs, or HTTP(S) URLs.
- When an HTTP(S) URL is provided, the server fetches the image and converts it to a base64 data URL before calling OpenAI.
- For edits, the server always returns PNG semantics (mime type `image/png`) when emitting images.
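The `base64` vs `file` effective output mode used by both image tools amounts to a payload-size check against the configured threshold. A hypothetical sketch of that decision (names and details are illustrative, not the server's actual code):

```typescript
// Illustrative sketch: if the combined size of all base64 payloads would
// exceed the threshold (~50MB by default via MCP_MAX_CONTENT_BYTES), the
// server switches the effective output mode to file/resource_link output.
const DEFAULT_MAX_CONTENT_BYTES = 50 * 1024 * 1024;

function effectiveOutputMode(
  base64Images: string[],
  maxBytes: number = DEFAULT_MAX_CONTENT_BYTES,
): "base64" | "file" {
  // Each base64 character is one byte on the wire; sum the payload sizes.
  const total = base64Images.reduce((sum, b64) => sum + b64.length, 0);
  return total > maxBytes ? "file" : "base64";
}
```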
Output (MCP CallToolResult):

- When the effective `output` mode is `"base64"`:
  - `content` is an array that may contain:
    - image items:
      - `{ type: "image", data: <base64 data>, mimeType: "image/png" }`
    - optional text items with revised prompts (when the underlying model returns them):
      - `{ type: "text", text: <revised prompt> }`
- When the effective `output` mode is `"file"`:
  - `content` contains one `resource_link` item per file, plus the same optional `text` items with revised prompts:
    - `{ type: "resource_link", uri: "file:///absolute-path-1.png", name: "absolute-path-1.png", mimeType: "image/png" }`
- For `gpt-image-1.5` and `gpt-image-1`, an additional `text` line is included with a pricing estimate (based on `structuredContent.usage`), and `structuredContent.pricing` contains the full pricing breakdown.

When `result_placement` includes `"api"`, `openai-images-edit` follows the **same raw API format** as `openai-images-generate` (top-level `created`, `data[]`, `background`, `output_format`, `size`, `quality`, with `b64_json` for base64 output or `url` for file output).

Error handling (both tools):

- On errors inside the tool handler (validation, OpenAI API failures, I/O, etc.), the server returns a CallToolResult marked as an error:
  - `isError: true`
  - `content: [{ type: "text", text: "<error message>" }]`
- The error message text is taken directly from the underlying exception message, without additional commentary from the server, while full details are logged to the server console.

### openai-videos-create

Create a video generation job using the OpenAI Videos API (`videos.create`).

Arguments (input schema):

- `prompt` (string, required) — text prompt describing the video (max 32K chars).
- `input_reference` (string, optional) — optional image reference (HTTP(S) URL, base64/data URL, or file path).
- `input_reference_fit` ("match" | "cover" | "contain" | "stretch", default: "contain")
  - How to fit `input_reference` to the requested video `size`:
    - `match`: require exact dimensions (fails fast on mismatch)
    - `cover`: resize + center-crop to fill
    - `contain`: resize + pad/letterbox to fit (default)
    - `stretch`: resize with distortion
- `input_reference_background` ("blur" | "black" | "white" | "#RRGGBB" | "#RRGGBBAA", default: "blur")
  - Padding background used when `input_reference_fit="contain"`.
- `model` ("sora-2" | "sora-2-pro", default: "sora-2-pro")
- `seconds` ("4" | "8" | "12", optional)
- `size` ("720x1280" | "1280x720" | "1024x1792" | "1792x1024", optional)
  - `1024x1792` and `1792x1024` require `sora-2-pro`.
  - If both `input_reference` and `size` are omitted, the API default is used.
- `wait_for_completion` (boolean, default: true)
  - When true, the server polls `openai-videos-retrieve` until `completed` or `failed` (or timeout), then downloads assets.
- `timeout_ms` (integer, default: 900000)
- `poll_interval_ms` (integer, default: 2000)
- `download_variants` (string[], default: ["video"])
  - Allowed values: `"video" | "thumbnail" | "spritesheet"`.
- `tool_result` (`"resource_link"` | `"resource"`, default: `"resource_link"`)
  - Controls `content[]` shape for downloaded assets:
    - `"resource_link"` emits ResourceLink items (file/URL-based)
    - `"resource"` emits EmbeddedResource blocks with base64 `resource.blob`

Output (MCP CallToolResult):

- `structuredContent`: OpenAI `Video` object (job metadata; final state when `wait_for_completion=true`).
- `content`: includes `resource_link` (default) or embedded `resource` blocks for downloaded assets (when requested) and text blocks with JSON.
- Includes a summary JSON block: `{ "video_id": "...", "pricing": { "currency": "USD", "model": "...", "size": "...", "seconds": 4, "price": 0.1, "cost": 0.4 } | null }` (and, when waiting: `{ "video_id": "...", "assets": [...], "pricing": ... }`).
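As the summary block suggests, the pricing estimate is a per-second calculation: `cost` is the per-second `price` for a model/size combination multiplied by `seconds`. A hypothetical sketch of that arithmetic (the rate table below is a placeholder, not OpenAI's actual pricing):

```typescript
// Illustrative sketch of the per-second pricing estimate in the summary
// block. The rate table is a placeholder; real rates come from OpenAI.
interface PricingEstimate {
  currency: "USD";
  model: string;
  size: string;
  seconds: number;
  price: number; // per-second rate
  cost: number;  // price * seconds
}

const RATES: Record<string, number> = {
  "sora-2:720x1280": 0.1, // placeholder rate, not a real price
};

function estimateVideoCost(model: string, size: string, seconds: number): PricingEstimate | null {
  const price = RATES[`${model}:${size}`];
  if (price === undefined) return null; // unknown combination -> "pricing": null
  return { currency: "USD", model, size, seconds, price, cost: price * seconds };
}
```

With the placeholder rate above, a 4-second `sora-2` 720x1280 job yields `cost = 0.1 * 4`, matching the shape of the summary example.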
### openai-videos-remix

Create a remix job from an existing `video_id` (`videos.remix`).

Arguments (input schema):

- `video_id` (string, required)
- `prompt` (string, required)
- `wait_for_completion`, `timeout_ms`, `poll_interval_ms`, `download_variants`, `tool_result` — same semantics as `openai-videos-create` (default wait is true).

### openai-videos-list

List video jobs (`videos.list`).

Arguments (input schema):

- `after` (string, optional) — cursor (video id) to list after.
- `limit` (integer, optional)
- `order` ("asc" | "desc", optional)

Output:

- `structuredContent`: OpenAI list response shape `{ data, has_more, last_id }`.
- `content`: a text block with serialized JSON.

### openai-videos-retrieve

Retrieve job status (`videos.retrieve`).

- `video_id` (string, required)

### openai-videos-delete

Delete a video job (`videos.delete`).

- `video_id` (string, required)

### openai-videos-retrieve-content

Retrieve an asset for a completed job (`videos.downloadContent`, REST `GET /videos/{video_id}/content`), write it under allowed `MEDIA_GEN_DIRS`, and return MCP `resource_link` (default) or embedded `resource` blocks (via `tool_result`).

Arguments (input schema):

- `video_id` (string, required)
- `variant` ("video" | "thumbnail" | "spritesheet", default: "video")
- `tool_result` (`"resource_link"` | `"resource"`, default: `"resource_link"`)

Output (MCP CallToolResult):

- `structuredContent`: OpenAI `Video` object.
- `content`: a `resource_link` (or embedded `resource`), a summary JSON block `{ video_id, variant, uri, pricing }`, plus the full video JSON.

### google-videos-generate

Create a Google video generation operation using the Google GenAI SDK (`@google/genai`) `ai.models.generateVideos`.
Arguments (input schema):

- `prompt` (string, optional)
- `input_reference` (string, optional) — image-to-video input (HTTP(S) URL, base64/data URL, or file path under `MEDIA_GEN_DIRS`)
- `input_reference_mime_type` (string, optional) — override for the `input_reference` MIME type (must be `image/*`)
- `input_video_reference` (string, optional) — video-extension input (HTTP(S) URL or file path under `MEDIA_GEN_DIRS`; mutually exclusive with `input_reference`)
- `model` (string, default: `"veo-3.1-generate-001"`)
- `number_of_videos` (integer, default: `1`)
- `aspect_ratio` (`"16:9" | "9:16"`, optional)
- `duration_seconds` (integer, optional)
  - Veo 2 models: 5–8 seconds (default: 8)
  - Veo 3 models: 4, 6, or 8 seconds (default: 8)
  - When using `referenceImages`: 8 seconds
- `person_generation` (`"DONT_ALLOW" | "ALLOW_ADULT" | "ALLOW_ALL"`, optional)
- `wait_for_completion` (boolean, default: `true`)
- `timeout_ms` (integer, default: `900000`)
- `poll_interval_ms` (integer, default: `10000`)
- `download_when_done` (boolean, optional; defaults to `true` when waiting)
- `tool_result` (`"resource_link"` | `"resource"`, default: `"resource_link"`)
  - Controls `content[]` shape when downloading generated videos.
- `response_format` (`"url"` | `"b64_json"`, default: `"url"`)
  - Controls `structuredContent.response.generatedVideos[].video` fields:
    - `"url"` prefers `video.uri` (and strips `video.videoBytes`)
    - `"b64_json"` prefers `video.videoBytes` (and strips `video.uri`)

Requirements:

- Gemini Developer API: set `GEMINI_API_KEY` (or `GOOGLE_API_KEY`), or `google.api_key` in `secrets.yaml`.
- Vertex AI: set `GOOGLE_GENAI_USE_VERTEXAI=true`, `GOOGLE_CLOUD_PROJECT`, and `GOOGLE_CLOUD_LOCATION` (or `google.vertex_ai.*` in `secrets.yaml`).

Output:

- `structuredContent`: Google operation object (includes `name`, `done`, and `response.generatedVideos[]` when available).
- `content`: status text, optional `.mp4` `resource_link` (default) or embedded `resource` blocks (when downloaded), plus JSON text blocks for compatibility. ### google-videos-retrieve-operation Retrieve/poll an existing Google video operation (`ai.operations.getVideosOperation`). - `operation_name` (string, required) - `response_format` (`"url"` | `"b64_json"`, default: `"url"`) Output: - `structuredContent`: Google operation object. - `content`: JSON text blocks with a short summary + the full operation. ### google-videos-retrieve-content Download `.mp4` content for a completed operation and return file-first MCP `resource_link` (default) or embedded `resource` blocks (via `tool_result`). - `operation_name` (string, required) - `index` (integer, default: `0`) — selects `response.generatedVideos[index]` - `tool_result` (`"resource_link"` | `"resource"`, default: `"resource_link"`) - `response_format` (`"url"` | `"b64_json"`, default: `"url"`) Recommended workflow: 1) Call `google-videos-generate` with `wait_for_completion=true` (default) to get the completed operation and downloads; set to false only if you need the operation id immediately. 2) Poll `google-videos-retrieve-operation` until `done=true`. 3) Call `google-videos-retrieve-content` to download an `.mp4` and receive a `resource_link` (or embedded `resource`). ### fetch-images Fetch and process images from URLs or local file paths with optional compression. Arguments (input schema): - `sources` (string[], optional) - Array of image sources: HTTP(S) URLs or file paths (absolute or relative to the first `MEDIA_GEN_DIRS` entry). - Min: 1, Max: 20 images. - Mutually exclusive with `ids` and `n`. - `ids` (string[], optional) - Array of image IDs to fetch by local filename match under the primary `MEDIA_GEN_DIRS[0]` directory. - IDs must be safe (`[A-Za-z0-9_-]` only; no `..`, `*`, `?`, slashes). 
- Matches filenames containing `_{id}_` or `_{id}.` (supports both single outputs and multi-output suffixes like `_1.png`). - When `ids` is used, `compression` and `file` are not supported (no new files are created). - Mutually exclusive with `sources` and `n`. - `n` (integer, optional) - When set, returns the last N image files from the primary `MEDIA_GEN_DIRS[0]` directory. - Files are sorted by modification time (most recently modified first). - Mutually exclusive with `sources` and `ids`. - `compression` (object, optional) - `max_size` (integer, optional): Max dimension in pixels. Images larger than this will be resized. - `max_bytes` (integer, optional): Target max file size in bytes. Default: 819200 (800KB). - `quality` (integer, optional): JPEG/WebP quality 1-100. Default: 85. - `format` ("jpeg" | "png" | "webp", optional): Output format. Default: jpeg. - `response_format` ("url" | "path" | "b64_json", default: "url") - Response format: file/URL-based (`url`), local path (`path`), or inline base64 (`b64_json`). - `tool_result` ("resource_link" | "image", default: "resource_link") - Controls `content[]` shape: - `"resource_link"` emits ResourceLink items (file/URL-based) - `"image"` emits base64 ImageContent blocks - `file` (string, optional) - Base path for output files. If multiple images, index suffix is added. Behavior notes: - Images are processed in parallel for maximum throughput. - Compression is **only** applied when `compression` options are provided. - Compression uses [sharp](https://sharp.pixelplumbing.com/) with iterative quality/size reduction when enabled. - Partial success: if some sources fail, successful images are still returned with errors listed in the response. - When `n` is provided, it is only honored when the `MEDIA_GEN_MCP_ALLOW_FETCH_LAST_N_IMAGES` environment variable is set to `true`. Otherwise, the call fails with a validation error. 
- Sometimes an MCP client (for example, ChatGPT) may not wait for a response from `media-gen-mcp` due to a timeout. In interactive workflows where you need to quickly retrieve the latest `openai-images-generate` / `openai-images-edit` outputs, you can use `fetch-images` with the `n` argument. When the `MEDIA_GEN_MCP_ALLOW_FETCH_LAST_N_IMAGES=true` environment variable is set, `fetch-images` will return the last N files from `MEDIA_GEN_DIRS[0]` even if the original generation or edit operation timed out on the MCP client side. ### fetch-videos Fetch videos from HTTP(S) URLs or local file paths. Arguments (input schema): - `sources` (string[], optional) - Array of video sources: HTTP(S) URLs or file paths (absolute or relative to the first `MEDIA_GEN_DIRS` entry). - Min: 1, Max: 20 videos. - Mutually exclusive with `ids` and `n`. - `ids` (string[], optional) - Array of video IDs to fetch by local filename match under the primary `MEDIA_GEN_DIRS[0]` directory. - IDs must be safe (`[A-Za-z0-9_-]` only; no `..`, `*`, `?`, slashes). - Matches filenames containing `_{id}_` or `_{id}.` (supports both single outputs and multi-asset suffixes like `_thumbnail.webp`). - When `ids` is used, `file` is not supported (no downloads; returns existing files). - Mutually exclusive with `sources` and `n`. - `n` (integer, optional) - When set, returns the last N video files from the primary `MEDIA_GEN_DIRS[0]` directory. - Files are sorted by modification time (most recently modified first). - Mutually exclusive with `sources` and `ids`. - `tool_result` (`"resource_link"` | `"resource"`, default: `"resource_link"`) - Controls `content[]` shape: - `"resource_link"` emits ResourceLink items (file/URL-based) - `"resource"` emits EmbeddedResource blocks with base64 `resource.blob` - `file` (string, optional) - Base path for output files (used when downloading from URLs). If multiple videos are downloaded, an index suffix is added.
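The safe-ID and `_{id}_` / `_{id}.` filename-matching rules shared by `fetch-images` and `fetch-videos` can be sketched as follows (hypothetical helper names; the server's real implementation may differ in details):

```typescript
// Hypothetical sketch of the fetch-* ID rules described above.
const SAFE_ID = /^[A-Za-z0-9_-]+$/;

// IDs may only contain [A-Za-z0-9_-]; this rejects "..", "*", "?",
// slashes, and anything else that could escape the media directory.
function isSafeId(id: string): boolean {
  return SAFE_ID.test(id);
}

// A filename matches an ID when it contains `_{id}_` or `_{id}.`,
// covering single outputs and suffixed assets like `_1.png` or
// `_thumbnail.webp`.
function matchesId(filename: string, id: string): boolean {
  return filename.includes(`_${id}_`) || filename.includes(`_${id}.`);
}
```

Note that the `_{id}_` / `_{id}.` delimiters prevent one ID from matching files that belong to a longer ID sharing the same prefix.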
Output: - `content`: one `resource_link` (default) or embedded `resource` block per resolved video, plus an optional error summary text block. - `structuredContent`: `{ data: [{ source, uri, file, mimeType, name, downloaded }], errors?: string[] }`. Behavior notes: - URL downloads are only allowed when the URL matches `MEDIA_GEN_URLS` (when set). - When `n` is provided, it is only honored when the `MEDIA_GEN_MCP_ALLOW_FETCH_LAST_N_VIDEOS` environment variable is set to `true`. Otherwise, the call fails with a validation error. ### fetch-document Fetch documents from HTTP(S) URLs or local file paths. Arguments (input schema): - `sources` (string[]) - Array of document sources: HTTP(S) URLs or file paths (absolute or relative to the first `MEDIA_GEN_DIRS` entry). - Min: 1, Max: 20 documents. - `tool_result` (`"resource_link"` | `"resource"`, default: `"resource_link"`) - Controls `content[]` shape: - `"resource_link"` emits ResourceLink items (file/URL-based) - `"resource"` emits EmbeddedResource blocks with base64 `resource.blob` - `file` (string, optional) - Base path for output files (used when downloading from URLs). If multiple documents are downloaded, an index suffix is added. Output: - `content`: one `resource_link` (default) or embedded `resource` block per resolved document, plus an optional error summary text block. - `structuredContent`: `{ data: [{ source, uri, file, mimeType, name, downloaded }], errors?: string[] }`. Behavior notes: - URL downloads are only allowed when the URL matches `MEDIA_GEN_URLS` (when set). - Local paths are validated against `MEDIA_GEN_DIRS` and can be provided as `file://` URLs. - Default filenames use the `output__media-gen__fetch-document_` prefix when `file` is omitted. ### test-images Debug tool for testing MCP result placement without calling OpenAI API. **Enabled only when `MEDIA_GEN_MCP_TEST_SAMPLE_DIR` is set**. The tool reads existing images from this directory and does **not** create new files.
Arguments (input schema): - `response_format` ("url" | "path" | "b64_json", default: "url") - `result_placement` ("content" | "api" | "structured" | "toplevel" or array of these, optional) - Override `MEDIA_GEN_MCP_RESULT_PLACEMENT` for this call. - `tool_result` ("resource_link" | "image", default: "resource_link") - Controls `content[]` shape: - `"resource_link"` emits ResourceLink items (file/URL-based) - `"image"` emits base64 ImageContent blocks - `compression` (object, optional) - Same logical tuning knobs as `fetch-images`, but using camelCase keys: - `maxSize` (integer, optional): max dimension in pixels. - `maxBytes` (integer, optional): target max file size in bytes. - `quality` (integer, optional): JPEG/WebP quality 1–100. - `format` ("jpeg" | "png" | "webp", optional): output format. Behavior notes: - Reads up to 10 images from the sample directory (no sorting — filesystem order). - Uses the same result-building logic as `openai-images-generate` and `openai-images-edit` (including `result_placement` overrides). - When `response_format` is `"b64_json"` and `compression` is provided, sample files are read and compressed **in memory** using `sharp`; original files on disk are never modified. - Useful for testing how different MCP clients handle various result structures. - When `result_placement` includes `"api"`, the tool returns a **mock OpenAI Images API-style object**: - Top level: `created`, `data[]`, `background`, `output_format`, `size`, `quality`. - For `response_format: "b64_json"` each `data[i]` contains `b64_json`. - For `response_format: "path"` each `data[i]` contains `path`. - For `response_format: "url"` each `data[i]` contains `url` instead of `b64_json`. #### Debug CLI helpers for `test-images` For local debugging there are two helper scripts that call `test-images` directly: - `npm run test-images` – uses `debug/debug-call.ts` and prints the validated `CallToolResult` as seen by the MCP SDK client.
Usage: ```sh npm run test-images -- [placement] [--response_format url|path|b64_json] # examples: # npm run test-images -- structured --response_format b64_json # npm run test-images -- structured --response_format path # npm run test-images -- structured --response_format url ``` - `npm run test-images:raw` – uses `debug/debug-call-raw.ts` and prints the raw JSON-RPC `result` (the underlying `CallToolResult` without extra wrapping). Same CLI flags as above. Both scripts truncate large fields for readability: - `image_url` → first 80 characters, then `...(N chars)`; - `b64_json` and `data` (when it is a base64 string) → first 25 characters, then `...(N chars)`. --- ## 🧩 Version policy ### Semantic Versioning (SemVer) This package follows **SemVer**: `MAJOR.MINOR.PATCH` (x.y.z). - `MAJOR` — breaking changes (tool names, input schemas, output shapes). - `MINOR` — new tools or backward-compatible additions (new optional params, new fields in responses). - `PATCH` — bug fixes and internal refactors with no intentional behavior change. Since `1.0.0`, this project follows **standard SemVer rules**: breaking changes bump **MAJOR** (npm’s `^1.0.0` allows `1.x`, but not `2.0.0`). ### Dependency policy This repository aims to stay **closely aligned with current stable releases**: - **MCP SDK**: targeting the latest stable `@modelcontextprotocol/sdk` and schema. - **OpenAI SDK**: regularly updated to the latest stable `openai` package. - **Zod**: using the Zod 4.x line (currently `^4.1.3`). In this project we previously ran on Zod 3.x and, in combination with the MCP TypeScript SDK typings, hit heavy TypeScript errors when passing `.shape` into `inputSchema` — in particular TS2589 (*"type instantiation is excessively deep and possibly infinite"*) and TS2322 (*schema shape not assignable to `AnySchema | ZodRawShapeCompat`*). 
We track the upstream discussion in [modelcontextprotocol/typescript-sdk#494](https://github.com/modelcontextprotocol/typescript-sdk/issues/494) and the related Zod typing work in [colinhacks/zod#5222](https://github.com/colinhacks/zod/pull/5222), and keep the stack on a combination that passes **full strict** compilation reliably. - **Tooling stack** (Node.js, TypeScript, etc.): developed and tested against recent LTS / current releases, with a dedicated `tsconfig-strict.json` that enables all strict TypeScript checks (`strict`, `noUnusedLocals`, `noUnusedParameters`, `exactOptionalPropertyTypes`, `noUncheckedIndexedAccess`, `noPropertyAccessFromIndexSignature`, etc.). You are welcome to pin or downgrade Node.js, TypeScript, the OpenAI SDK, Zod, or other pieces of the stack if your environment requires it, but please keep in mind: - we primarily test and tune against the latest stack; - issues that only reproduce on older runtimes / SDK versions may be harder for us to investigate and support; - upstream compatibility is validated first of all against the latest MCP spec and OpenAI Images API. This project is intentionally a bit **futuristic**: it tries to keep up with new capabilities as they appear in MCP and OpenAI tooling (in particular, robust multimodal/image support over MCP and in ChatGPT’s UI). A detailed real‑world bug report and analysis of MCP image rendering in ChatGPT is listed in the **References** section as a case study. If you need a long-term-stable stack, pin exact versions in your own fork and validate them carefully in your environment. --- ## 🧩 Typed tool callbacks All tool handlers use **strongly typed callback parameters** derived from Zod schemas via `z.input`: ```typescript // Schema definition const openaiImagesGenerateBaseSchema = z.object({ prompt: z.string().max(32000), background: z.enum(["transparent", "opaque", "auto"]).optional(), // ... 
more fields }); // Type alias type OpenAIImagesGenerateArgs = z.input<typeof openaiImagesGenerateBaseSchema>; // Strictly typed callback server.registerTool( "openai-images-generate", { inputSchema: openaiImagesGenerateBaseSchema.shape, ... }, async (args: OpenAIImagesGenerateArgs, _extra: unknown) => { const validated = openaiImagesGenerateSchema.parse(args); // ... handler logic }, ); ``` This pattern provides: - **Static type safety** — IDE autocomplete and compile-time checks for all input fields. - **Runtime validation** — Zod `.parse()` ensures all inputs match the schema before processing. - **MCP SDK compatibility** — `inputSchema: schema.shape` hands the SDK the raw Zod shape from which it derives the JSON Schema for tool registration. All tools (`openai-images-*`, `openai-videos-*`, `fetch-images`, `fetch-videos`, `fetch-document`, `test-images`) follow this pattern. --- ## 🧩 Tool annotations This MCP server exposes the following tools with annotation hints: | Tool | `readOnlyHint` | `destructiveHint` | `idempotentHint` | `openWorldHint` | |------|----------------|-------------------|------------------|-----------------| | **openai-images-generate** | `true` | `false` | `false` | `true` | | **openai-images-edit** | `true` | `false` | `false` | `true` | | **openai-videos-create** | `true` | `false` | `false` | `true` | | **openai-videos-remix** | `true` | `false` | `false` | `true` | | **openai-videos-list** | `true` | `false` | `false` | `true` | | **openai-videos-retrieve** | `true` | `false` | `false` | `true` | | **openai-videos-delete** | `true` | `false` | `false` | `true` | | **openai-videos-retrieve-content** | `true` | `false` | `false` | `true` | | **fetch-images** | `true` | `false` | `false` | `false` | | **fetch-videos** | `true` | `false` | `false` | `false` | | **fetch-document** | `true` | `false` | `false` | `false` | | **test-images** | `true` | `false` | `false` | `false` | These hints help MCP clients understand that these tools: - may invoke external APIs or read external resources (open world), - do not modify
existing project files or user data; they only create new media files (images/videos/documents) in configured output directories, - may produce different outputs on each call, even with the same inputs. Because `readOnlyHint` is set to `true` for most tools, MCP platforms (including chatgpt.com) can treat this server as logically read-only and usually will not show "this tool can modify your files" warnings. --- ## 📁 Project structure ```text media-gen-mcp/ ├── src/ │ ├── index.ts # MCP server entry point │ └── lib/ │ ├── compression.ts # Image compression (sharp) │ ├── env.ts # Env parsing + allowlists (+ glob support) │ ├── helpers.ts # URL/path validation, result building │ ├── logger.ts # Structured logging + truncation helpers │ └── schemas.ts # Zod schemas for all tools ├── test/ │ ├── compression.test.ts # 12 tests │ ├── env.test.ts # 19 tests │ ├── fetch-images.integration.test.ts # 2 tests │ ├── fetch-videos.integration.test.ts # 2 tests │ ├── helpers.test.ts # 31 tests │ ├── logger.test.ts # 10 tests │ └── schemas.test.ts # 64 tests ├── debug/ # Local debug helpers (MCP client scripts) ├── plan/ # Design notes / plans ├── dist/ # Compiled output ├── tsconfig.json ├── vitest.config.ts ├── package.json ├── CHANGELOG.md ├── README.md └── AGENTS.md ``` --- ## 📝 License MIT --- ## 🩺 Troubleshooting - Make sure your `OPENAI_API_KEY` is valid and has image API access. - You must have a [verified OpenAI organization](https://platform.openai.com/account/organization). After verifying, it can take 15–20 minutes for image API access to activate. - Optional file-path parameters, when provided, must be absolute. - **Unix/macOS/Linux**: Starting with `/` (e.g., `/path/to/image.png`) - **Windows**: Drive letter followed by `:` (e.g., `C:/path/to/image.png` or `C:\path\to\image.png`) - For file output, ensure the target directory is writable. - If you see errors about file types, check your image file extensions and formats.
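The absolute-path rule above can be sketched as a small cross-platform check (a hypothetical helper for illustration, not the server's actual validation code):

```typescript
// Hypothetical helper mirroring the troubleshooting rule above:
// Unix-style paths must start with "/", Windows paths with a drive
// letter followed by ":" and a path separator.
function isAbsoluteMediaPath(p: string): boolean {
  return p.startsWith("/") || /^[A-Za-z]:[\\/]/.test(p);
}
```

In practice, Node's built-in `path.isAbsolute()` (and `path.win32.isAbsolute()`) covers these cases, including UNC paths.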
--- ## 🙏 Inspiration This server was originally inspired by [SureScaleAI/openai-gpt-image-mcp](https://github.com/SureScaleAI/openai-gpt-image-mcp), but is now a separate implementation focused on **closely tracking the official specifications**: - **OpenAI Images API alignment** – The arguments for `openai-images-generate` and `openai-images-edit` mirror [`images.create` / `gpt-image-1.5`](https://platform.openai.com/docs/api-reference/images/create): `prompt`, `n`, `size`, `quality`, `background`, `output_format`, `output_compression`, `user`, plus `response_format` (`url` / `b64_json`) with the same semantics as the OpenAI Images API. - **MCP Tool Result alignment (image + resource_link)** – With `result_placement = "content"`, the server follows the MCP **5.2 Tool Result** section ([5.2.2 Image Content](https://modelcontextprotocol.io/specification/2025-11-25/server/tools#image-content), [5.2.4 Resource Links](https://modelcontextprotocol.io/specification/2025-11-25/server/tools#tool-result)) and emits strongly-typed `content[]` items: - `{ "type": "image", "data": "<base64>", "mimeType": "image/png" }` for `response_format = "b64_json"`; - `{ "type": "resource_link", "uri": "file:///..." | "https://...", "name": "...", "mimeType": "image/..." }` for file/URL-based output. - **Raw OpenAI-style API output** – With `result_placement = "api"`, the tool result itself **is** an OpenAI Images-style object: `{ created, data: [...], background, output_format, size, quality, usage? }`, where each `data[]` entry contains either `b64_json` (for `response_format = "b64_json"`) or `url` (for `response_format = "url"`). No MCP wrapper fields (`content`, `structuredContent`, `files`, `urls`) are added in this mode.
In short, this library: - tracks the OpenAI Images API for **arguments and result shape** when `result_placement = "api"` with `response_format = "url" | "b64_json"`, and - follows the MCP specification for **tool result content blocks** (`image`, `resource_link`, `text`) when `result_placement = "content"`. ### Recommended presets for common clients - **Default mode / Claude Desktop / strict MCP clients** For clients that strictly follow the MCP spec, the recommended (and natural) configuration is: - `result_placement = content` - `response_format = b64_json` In this mode the server returns: - `content[]` with `type: "image"` (base64 image data) and `type: "resource_link"` (file/URL links), matching MCP section 5.2 (Image Content and Resource Links). This output works well for **direct integration** with Claude Desktop and any client that fully implements the 2025-11-25 spec. - **chatgpt.com Developer Mode** For running this server as an MCP backend behind ChatGPT Developer Mode, the most practical configuration is the one that most closely matches the OpenAI Images API: - `result_placement = api` - `response_format = url` In this mode the tool result matches the `images.create` / `gpt-image-1.5` format (including `data[].url`), which simplifies consumption from backends and libraries that expect the OpenAI schema. However, **even with this OpenAI-native shape, the chatgpt.com client does not currently render images**. This behavior is documented in detail in the case-study report listed in the **References** section. --- ## ⚠️ Limitations & Large File Handling - **Configurable payload safeguard:** By default this server uses a ~50MB budget (52,428,800 bytes) for inline `content` to stay within typical MCP client limits. You can override this threshold by setting the `MCP_MAX_CONTENT_BYTES` environment variable to a higher (or lower) value.
- **Auto-Switch to File Output:** If the total image base64 size exceeds the configured threshold, the tool automatically saves images to disk and returns file path(s) via `resource_link` instead of inline base64. This helps avoid client-side "payload too large" errors while still delivering full-resolution images. - **Default File Location:** If you do not specify a `file` path, outputs are saved under `MEDIA_GEN_DIRS[0]` (default: `/tmp/media-gen-mcp`) using generated names with the `output__media-gen__` prefix. - **Environment Variables:** - `MEDIA_GEN_DIRS`: Set this to control where outputs are saved. Example: `export MEDIA_GEN_DIRS=/your/desired/dir`. This directory may coincide with your public static directory if you serve files directly from it. - `MEDIA_GEN_MCP_URL_PREFIXES`: Optional comma-separated HTTPS prefixes for public URLs, matched positionally to `MEDIA_GEN_DIRS` entries. When set, the server builds public URLs as `<prefix>/<filename>` and returns them alongside file paths (for example via `resource_link` URIs and `structuredContent.data[].url` when `response_format: "url"`). Example: `export MEDIA_GEN_MCP_URL_PREFIXES=https://media-gen.example.com/media,https://media-gen.example.com/samples` - **Best Practice:** For large or production images, always use file output and ensure your client is configured to handle file paths. Configure `MEDIA_GEN_DIRS` and (optionally) `MEDIA_GEN_MCP_URL_PREFIXES` to serve images via a public web server (e.g., nginx). --- ## 🌐 Serving generated files over HTTPS If you want ChatGPT (or any MCP client) to mention publicly accessible URLs alongside file paths: 1. Expose your image directory via HTTPS. For example, on nginx (the commented-out lines are placeholders to adapt to your TLS setup): ```nginx server { # listen 443 ssl http2; # server_name media-gen.example.com; # ssl_certificate /path/to/fullchain.pem; # ssl_certificate_key /path/to/privkey.pem; location /media/ { alias /home/username/media-gen-mcp/media/; autoindex off; expires 7d; add_header Cache-Control "public, immutable"; } } ``` 2. Ensure the first entry in `MEDIA_GEN_DIRS` points to the same directory (e.g.
`MEDIA_GEN_DIRS=/home/username/media-gen-mcp/media/` or `MEDIA_GEN_DIRS=media/` when running from the project root). 3. Set `MEDIA_GEN_MCP_URL_PREFIXES=https://media-gen.example.com/media` so the server returns matching HTTPS URLs in top-level `urls`, `resource_link` URIs, and `image_url` fields (for `response_format: "url"`). Both `openai-images-generate` and `openai-images-edit` now attach `files` + `urls` for **base64** and **file** response modes, allowing clients to reference either the local filesystem path or the public HTTPS link. This is particularly useful while ChatGPT cannot yet render MCP image blocks inline. --- ## 📚 References - **Model Context Protocol** - [MCP Specification](https://modelcontextprotocol.io/docs/getting-started/intro) - [MCP Schema (2025-11-25)](https://github.com/modelcontextprotocol/modelcontextprotocol/blob/main/schema/2025-11-25/schema.json) - **OpenAI Images** - [Images API overview](https://platform.openai.com/docs/api-reference/images) - [Images generate (gpt-image-1.5)](https://platform.openai.com/docs/api-reference/images/create) - [Images edit (`createEdit`)](https://platform.openai.com/docs/api-reference/images/createEdit) - [Tools guide: image generation & revised_prompt](https://platform.openai.com/docs/guides/tools-image-generation) - **OpenAI Videos** - [Videos API overview](https://platform.openai.com/docs/api-reference/videos) - **Case studies** - [MCP image rendering in ChatGPT (GitHub issue)](https://github.com/strato-space/report/issues/1) - **Symptoms:** ChatGPT often ignored or mishandled MCP `image` content blocks: empty tool results, raw base64 treated as text (huge token usage), or generic "I can't see the image" responses, while other MCP clients (Cursor, Claude) rendered the same images correctly. 
- **Root cause:** not a problem with the MCP spec itself, but with ChatGPT's handling/serialization of MCP `CallToolResult` image content blocks and media objects (especially around UI rendering and nested containers). - **Status & workarounds:** OpenAI has begun rolling out fixes for MCP image support in Codex/ChatGPT, but behavior is still inconsistent; this server uses file/resource_link + URL patterns and spec‑conformant `image` blocks so that tools remain usable across current and future MCP clients. --- ## 🙏 Credits - Built with [@modelcontextprotocol/sdk](https://www.npmjs.com/package/@modelcontextprotocol/sdk) - Uses [openai](https://www.npmjs.com/package/openai) Node.js SDK - Refactoring and MCP spec alignment assisted by [Windsurf](https://windsurf.com) and [GPT-5 High Reasoning](https://openai.com).