# AI Vision MCP Server

A powerful Model Context Protocol (MCP) server that provides AI-powered image and video analysis using Google Gemini and Vertex AI models.

## Features

- **Dual Provider Support**: Choose between Google Gemini API and Vertex AI
- **Multimodal Analysis**: Support for both image and video content analysis
- **Flexible File Handling**: Upload via multiple methods (URLs, local files, base64)
- **Storage Integration**: Built-in Google Cloud Storage support
- **Comprehensive Validation**: Zod-based data validation throughout
- **Error Handling**: Robust error handling with retry logic and circuit breakers
- **TypeScript**: Full TypeScript support with strict type checking

## Quick Start

### Prerequisites

You can use either the [`google` provider](https://aistudio.google.com/welcome) or the [`vertex_ai` provider](https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstart). For simplicity, the `google` provider is recommended.

Set the environment variables below for the provider you select. (Note: It’s recommended to set your MCP client's timeout to more than 5 minutes.)

(i) **Using Google AI Studio Provider**

```bash
export IMAGE_PROVIDER="google" # or vertex_ai
export VIDEO_PROVIDER="google" # or vertex_ai
export GEMINI_API_KEY="your-gemini-api-key"
```

Get your Google AI Studio API key [here](https://aistudio.google.com/app/api-keys).

(ii) **Using Vertex AI Provider**

```bash
export IMAGE_PROVIDER="vertex_ai"
export VIDEO_PROVIDER="vertex_ai"
export VERTEX_CLIENT_EMAIL="your-service-account@project.iam.gserviceaccount.com"
export VERTEX_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
export VERTEX_PROJECT_ID="your-gcp-project-id"
export GCS_BUCKET_NAME="your-gcs-bucket"
```

Refer to [the Vertex AI setup guide](docs/provider/vertex-ai-setup-guide.md) for how to set this up.
### Installation

Below are installation guides for this MCP server on different MCP clients, such as Claude Desktop, Claude Code, Cursor, and Cline.
#### Claude Desktop

Add to your Claude Desktop configuration:

(i) Using Google AI Studio Provider

```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}
```

(ii) Using Vertex AI Provider

```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "vertex_ai",
        "VIDEO_PROVIDER": "vertex_ai",
        "VERTEX_CLIENT_EMAIL": "your-service-account@project.iam.gserviceaccount.com",
        "VERTEX_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "VERTEX_PROJECT_ID": "your-gcp-project-id",
        "GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
      }
    }
  }
}
```
#### Claude Code

(i) Using Google AI Studio Provider

```bash
claude mcp add ai-vision-mcp \
  -e IMAGE_PROVIDER=google \
  -e VIDEO_PROVIDER=google \
  -e GEMINI_API_KEY=your-gemini-api-key \
  -- npx ai-vision-mcp
```

(ii) Using Vertex AI Provider

```bash
claude mcp add ai-vision-mcp \
  -e IMAGE_PROVIDER=vertex_ai \
  -e VIDEO_PROVIDER=vertex_ai \
  -e VERTEX_CLIENT_EMAIL=your-service-account@project.iam.gserviceaccount.com \
  -e VERTEX_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n" \
  -e VERTEX_PROJECT_ID=your-gcp-project-id \
  -e GCS_BUCKET_NAME=ai-vision-mcp-{VERTEX_PROJECT_ID} \
  -- npx ai-vision-mcp
```

Note: Increase the MCP startup timeout to 1 minute and the MCP tool execution timeout to about 5 minutes by updating `~/.claude/settings.json` as follows:

```json
{
  "env": {
    "MCP_TIMEOUT": "60000",
    "MCP_TOOL_TIMEOUT": "300000"
  }
}
```
#### Cursor

Go to: Settings -> Cursor Settings -> MCP -> Add new global MCP server

Pasting the following configuration into your Cursor `~/.cursor/mcp.json` file is the recommended approach. You may also install in a specific project by creating `.cursor/mcp.json` in your project folder. See the [Cursor MCP docs](https://docs.cursor.com/context/model-context-protocol) for more info.

(i) Using Google AI Studio Provider

```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}
```

(ii) Using Vertex AI Provider

```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "vertex_ai",
        "VIDEO_PROVIDER": "vertex_ai",
        "VERTEX_CLIENT_EMAIL": "your-service-account@project.iam.gserviceaccount.com",
        "VERTEX_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "VERTEX_PROJECT_ID": "your-gcp-project-id",
        "GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
      }
    }
  }
}
```
#### Cline

Cline uses a JSON configuration file to manage MCP servers. To integrate the provided MCP server configuration:

1. Open Cline and click on the MCP Servers icon in the top navigation bar.
2. Select the Installed tab, then click Advanced MCP Settings.
3. In the `cline_mcp_settings.json` file, add the following configuration:

(i) Using Google AI Studio Provider

```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "timeout": 300,
      "type": "stdio",
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}
```

(ii) Using Vertex AI Provider

```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "timeout": 300,
      "type": "stdio",
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "vertex_ai",
        "VIDEO_PROVIDER": "vertex_ai",
        "VERTEX_CLIENT_EMAIL": "your-service-account@project.iam.gserviceaccount.com",
        "VERTEX_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "VERTEX_PROJECT_ID": "your-gcp-project-id",
        "GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
      }
    }
  }
}
```
#### Other MCP clients

The server uses stdio transport and follows the standard MCP protocol. It can be integrated with any MCP-compatible client by running:

```bash
npx ai-vision-mcp
```
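For clients that speak the protocol directly, the sketch below shows what a single tool request might look like on the wire. It assumes the standard MCP `tools/call` method and one JSON-RPC message per line on stdio; the helper names `buildToolCall` and `frame` are hypothetical, and a real client must complete the MCP `initialize` handshake before sending tool calls.

```typescript
// Minimal JSON-RPC 2.0 request shape, as used by MCP.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Build a `tools/call` request for one of the server's tools (hypothetical helper).
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Serialize to exactly one line: JSON.stringify escapes any newlines inside
// string values, so the only raw newline is the trailing delimiter.
function frame(message: JsonRpcRequest): string {
  return JSON.stringify(message) + "\n";
}

const line = frame(
  buildToolCall(1, "analyze_image", {
    imageSource: "https://example.com/design.png",
    prompt: "Describe this image",
  }),
);
```

In practice an MCP client SDK handles this framing for you; the sketch is only meant to show what travels over the server's stdin.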
## MCP Tools

The server provides five main MCP tools:

### 1) `analyze_image`

Analyzes an image using AI and returns a detailed description.

**Parameters:**

- `imageSource` (string): URL, base64 data, or file path to the image
- `prompt` (string): Question or instruction for the AI
- `mode` (string, optional): Analysis mode - one of:
  - `general` (default) - General image analysis
  - `palette` - Extract design tokens (colors, spacing, typography)
  - `hierarchy` - Analyze visual hierarchy and eye flow
  - `components` - Catalog UI components and design system maturity
- `options` (object, optional): Analysis options including temperature and max tokens

**Examples:**

1. **General image analysis:**

```json
{
  "imageSource": "https://plus.unsplash.com/premium_photo-1710965560034-778eedc929ff",
  "prompt": "What is this image about? Describe what you see in detail."
}
```

2. **Extract design tokens:**

```json
{
  "imageSource": "https://example.com/design.png",
  "prompt": "Extract all design tokens from this screenshot",
  "mode": "palette"
}
```

3. **Analyze visual hierarchy:**

```json
{
  "imageSource": "C:\\Users\\username\\Downloads\\ui_mockup.png",
  "prompt": "Analyze the visual hierarchy and eye flow",
  "mode": "hierarchy"
}
```

4. **Component inventory:**

```json
{
  "imageSource": "https://example.com/design-system.png",
  "prompt": "List all UI components and evaluate design system maturity",
  "mode": "components"
}
```

### 2) `compare_images`

Compares multiple images using AI and returns a detailed comparison analysis.

**Parameters:**

- `imageSources` (array): Array of image sources (URLs, base64 data, or file paths) - minimum 2, maximum 4 images
- `prompt` (string): Question or instruction for comparing the images
- `options` (object, optional): Analysis options including temperature and max tokens

**Examples:**

1.
**Compare images from URLs:**

```json
{
  "imageSources": [
    "https://example.com/image1.jpg",
    "https://example.com/image2.jpg"
  ],
  "prompt": "Compare these two images and tell me the differences"
}
```

2. **Compare mixed sources:**

```json
{
  "imageSources": [
    "https://example.com/image1.jpg",
    "C:\\Users\\username\\Downloads\\image2.jpg",
    "data:image/jpeg;base64,/9j/4AAQSkZJRgAB..."
  ],
  "prompt": "Which image has the best lighting quality?"
}
```

### 3) `detect_objects_in_image`

Detects objects in an image using AI vision models and generates annotated images with bounding boxes. Returns the detected objects with coordinates and saves the annotated image either to a specified file path or to a temporary directory.

**Parameters:**

- `imageSource` (string): URL, base64 data, or file path to the image
- `prompt` (string): Custom detection prompt describing what to detect or recognize in the image
- `outputFilePath` (string, optional): Explicit output path for the annotated image

**Configuration:**

This function uses optimized default parameters for object detection and does not accept a runtime `options` parameter. To customize the AI parameters (temperature, topP, topK, maxTokens), use environment variables:

```
# Recommended environment variable settings for object detection (these are now the defaults)
TEMPERATURE_FOR_DETECT_OBJECTS_IN_IMAGE=0.0   # Deterministic responses
TOP_P_FOR_DETECT_OBJECTS_IN_IMAGE=0.95        # Nucleus sampling
TOP_K_FOR_DETECT_OBJECTS_IN_IMAGE=30          # Vocabulary selection
MAX_TOKENS_FOR_DETECT_OBJECTS_IN_IMAGE=8192   # High token limit for JSON
```

**File Handling Logic:**

1. **Explicit outputFilePath provided** → Saves to the exact path specified
2.
**No outputFilePath provided** → Automatically saves to a temporary directory

**Response Types:**

- Returns a `file` object when an explicit outputFilePath is provided
- Returns a `tempFile` object when no outputFilePath is provided, in which case the annotated image is auto-saved to a temporary folder
- Always includes a `detections` array with detected objects and coordinates
- Includes a `summary` with percentage-based coordinates for browser automation

**Examples:**

1. **Basic object detection:**

```json
{
  "imageSource": "https://example.com/image.jpg",
  "prompt": "Detect all objects in this image"
}
```

2. **Save annotated image to specific path:**

```json
{
  "imageSource": "C:\\Users\\username\\Downloads\\image.jpg",
  "prompt": "Detect all objects in this image",
  "outputFilePath": "C:\\Users\\username\\Documents\\annotated_image.png"
}
```

3. **Custom detection prompt:**

```json
{
  "imageSource": "data:image/jpeg;base64,/9j/4AAQSkZJRgAB...",
  "prompt": "Detect and label all electronic devices in this image"
}
```

### 4) `audit_design`

Audits UI/UX design compliance with pixel-level analysis and AI critique.

This tool provides automated design compliance auditing using pure TypeScript/JavaScript pixel analysis combined with Gemini Vision API critique. It extracts dominant colors, detects visual complexity, validates WCAG contrast ratios, and generates actionable design recommendations.
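To make the WCAG contrast check concrete, here is a minimal sketch of the W3C relative luminance and contrast ratio formulas from WCAG 2.x. The function names are illustrative and this is not necessarily how the server implements the check; the thresholds shown (4.5 for AA, 7 for AAA) apply to normal-size text.

```typescript
type Rgb = [number, number, number]; // 8-bit sRGB channels, 0-255

// Linearize one sRGB channel per the WCAG 2.x relative luminance definition.
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance: weighted sum of the linearized channels.
function relativeLuminance([r, g, b]: Rgb): number {
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// Classify a ratio against the normal-text thresholds.
function wcagLevel(ratio: number): "AAA" | "AA" | "fail" {
  if (ratio >= 7) return "AAA";
  if (ratio >= 4.5) return "AA";
  return "fail";
}

// White text on a black background is the maximum possible contrast.
const blackOnWhite = contrastRatio([255, 255, 255], [0, 0, 0]);
```

The argument order does not matter, since the ratio always divides the lighter luminance by the darker one.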
**Inspired by:** [Automating UX/UI Design Analysis with Python, Machine Learning, and LLMs](https://medium.com/@jadeygraham96/automating-ux-ui-design-analysis-with-python-machine-learning-and-llms-1fa1440b719b) by Jade Graham

**Parameters:**

- `imageSource` (string): URL, base64 data, or file path to the design image
- `prompt` (string, optional): Custom audit context or focus areas
- `options` (object, optional): Analysis options including temperature and max tokens

**Features:**

- **Dominant Colors**: K-means clustering to extract 5 primary colors
- **Edge Complexity**: Sobel operator for visual structure analysis
- **WCAG Contrast**: W3C relative luminance formula validation (AA/AAA)
- **Luminance Stats**: Mean brightness and standard deviation calculations
- **Design Issues**: Automated detection of contrast, complexity, and brightness problems
- **AI Critique**: Gemini-powered recommendations for design improvements

**Examples:**

1. **Basic design audit:**

```json
{
  "imageSource": "https://example.com/design.png",
  "prompt": "Audit this design for accessibility and visual hierarchy"
}
```

2. **Audit local design file:**

```json
{
  "imageSource": "C:\\Users\\username\\Downloads\\ui_design.png",
  "prompt": "Check WCAG AA compliance"
}
```

### 5) `analyze_video`

Analyzes a video using AI and returns a detailed description.

**Parameters:**

- `videoSource` (string): YouTube URL, GCS URI, or local file path to the video
- `prompt` (string): Question or instruction for the AI
- `options` (object, optional): Analysis options including temperature and max tokens

**Supported video sources:**

- YouTube URLs (e.g., `https://www.youtube.com/watch?v=...`)
- Local file paths (e.g., `C:\Users\username\Downloads\video.mp4`)

**Examples:**

1. **Analyze video from YouTube URL:**

```json
{
  "videoSource": "https://www.youtube.com/watch?v=9hE5-98ZeCg",
  "prompt": "What is this video about? Describe what you see in detail."
}
```

2.
**Analyze local video file:**

```json
{
  "videoSource": "C:\\Users\\username\\Downloads\\video.mp4",
  "prompt": "What is this video about? Describe what you see in detail."
}
```

**Note:** Only YouTube URLs are supported for public video URLs. Other public video URLs are not currently supported.

## Environment Configuration

For basic setup, you only need to configure the provider selection and required credentials:

### Google AI Studio Provider (Recommended)

```bash
export IMAGE_PROVIDER="google"
export VIDEO_PROVIDER="google"
export GEMINI_API_KEY="your-gemini-api-key"
```

### Vertex AI Provider (Production)

```bash
export IMAGE_PROVIDER="vertex_ai"
export VIDEO_PROVIDER="vertex_ai"
export VERTEX_CLIENT_EMAIL="your-service-account@project.iam.gserviceaccount.com"
export VERTEX_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
export VERTEX_PROJECT_ID="your-gcp-project-id"
export GCS_BUCKET_NAME="your-gcs-bucket"
```

### 📖 **Detailed Configuration Guide**

For comprehensive environment variable documentation, including:

- Complete configuration reference (60+ environment variables)
- Function-specific optimization examples
- Advanced configuration patterns
- Troubleshooting guidance

👉 **[See Environment Variable Guide](docs/environment-variable-guide.md)**

### Configuration Priority Overview

The server uses a hierarchical configuration system where more specific settings override general ones:

1. **LLM-assigned values** (runtime parameters in tool calls)
2. **Function-specific variables** (`TEMPERATURE_FOR_ANALYZE_IMAGE`, etc.)
3. **Task-specific variables** (`TEMPERATURE_FOR_IMAGE`, etc.)
4. **Universal variables** (`TEMPERATURE`, etc.)
5. **System defaults**
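The priority order above can be sketched as a simple lookup chain. This is an illustration of the precedence only, not the server's actual `ConfigService` code: the `resolveNumber` helper, its parameters, and the choice to skip empty or non-numeric values are all assumptions.

```typescript
type Env = Record<string, string | undefined>;

// Resolve a numeric setting by walking from the most specific source to the
// least specific one (hypothetical helper, not the server's real API).
function resolveNumber(
  base: string, // universal variable name, e.g. "TEMPERATURE"
  fn: string, // function suffix, e.g. "ANALYZE_IMAGE"
  task: string, // task suffix, e.g. "IMAGE"
  runtime: number | undefined, // value supplied in the tool call, if any
  env: Env,
  fallback: number, // system default
): number {
  // 1. A runtime value from the tool call always wins.
  if (runtime !== undefined) return runtime;

  // 2-4. Function-specific, then task-specific, then universal variable.
  const candidates = [`${base}_FOR_${fn}`, `${base}_FOR_${task}`, base];
  for (const key of candidates) {
    const raw = env[key];
    if (raw !== undefined && raw.trim() !== "") {
      const value = Number(raw);
      // Assumption: unparsable values are skipped rather than treated as errors.
      if (!Number.isNaN(value)) return value;
    }
  }

  // 5. Nothing set: use the system default.
  return fallback;
}

// The task-specific value overrides the universal one for image tools.
const exampleEnv: Env = { TEMPERATURE: "0.7", TEMPERATURE_FOR_IMAGE: "0.2" };
const temperature = resolveNumber("TEMPERATURE", "ANALYZE_IMAGE", "IMAGE", undefined, exampleEnv, 1.0);
```

With `exampleEnv` as shown, `temperature` resolves to the task-specific `0.2`; adding `TEMPERATURE_FOR_ANALYZE_IMAGE` would take precedence over it.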
### Quick Configuration Examples

**Basic Optimization:**

```bash
# General settings
export TEMPERATURE=0.7
export MAX_TOKENS=1500

# Task-specific optimization
export TEMPERATURE_FOR_IMAGE=0.2  # More precise for images
export TEMPERATURE_FOR_VIDEO=0.5  # More creative for videos
```

**Function-specific Optimization:**

```bash
# Optimize individual functions
export TEMPERATURE_FOR_ANALYZE_IMAGE=0.1
export TEMPERATURE_FOR_COMPARE_IMAGES=0.3
export TEMPERATURE_FOR_DETECT_OBJECTS_IN_IMAGE=0.0  # Deterministic
export MAX_TOKENS_FOR_DETECT_OBJECTS_IN_IMAGE=8192  # High token limit
```

**Model Selection:**

```bash
# Choose models per function
export ANALYZE_IMAGE_MODEL="gemini-2.5-flash-lite"
export COMPARE_IMAGES_MODEL="gemini-2.5-flash"
export ANALYZE_VIDEO_MODEL="gemini-2.5-pro"
```
## Troubleshooting (stdio / Codex / Claude Code)

### 1) "Transport closed" / tool call fails

If you see errors like:

- `tools/call failed: Transport closed`

Common causes:

**A) Image annotation dependency failed to load**

This server uses [`imagescript`](https://github.com/matmen/ImageScript) for image annotation/dimension extraction. Verify it loads:

```bash
npm run doctor
# or
npm run check:imagescript
```

**B) stdout logs corrupt stdio MCP framing**

This server uses the MCP **stdio** transport (newline-delimited JSON-RPC over stdout).

- ✅ stdout must contain **only** MCP JSON-RPC messages
- ✅ write logs to **stderr** (e.g. `console.error`)
- ❌ do not use `console.log` in stdio MCP servers

If stdout is polluted, clients (Codex/Claude Code) may disconnect and report `Transport closed`.

## Development

### Prerequisites

- Node.js 18+
- npm or yarn

### Setup

```bash
# Clone the repository
git clone https://github.com/tan-yong-sheng/ai-vision-mcp.git
cd ai-vision-mcp

# Install dependencies
npm install

# Build the project
npm run build

# Start development server
npm run dev
```

### Scripts

- `npm run build` - Build the TypeScript project
- `npm run dev` - Start development server with watch mode
- `npm run lint` - Run ESLint
- `npm run format` - Format code with Prettier
- `npm start` - Start the built server

## Architecture

The project follows a modular architecture:

```
src/
├── providers/           # AI provider implementations
│   ├── gemini/          # Google Gemini provider
│   ├── vertexai/        # Vertex AI provider
│   └── factory/         # Provider factory
├── services/            # Core services
│   ├── ConfigService.ts
│   └── FileService.ts
├── storage/             # Storage implementations
├── file-upload/         # File upload strategies
├── types/               # TypeScript type definitions
├── utils/               # Utility functions
└── server.ts            # Main MCP server
```

## Error Handling

The server includes comprehensive error handling:

- **Validation Errors**: Input validation using
Zod schemas
- **Network Errors**: Automatic retries with exponential backoff
- **Authentication Errors**: Clear error messages for API key issues
- **File Errors**: Handling for file size limits and format restrictions

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Google for the Gemini and Vertex AI APIs
- The Model Context Protocol team for the MCP framework
- Jade Graham for the [design analysis methodology](https://medium.com/@jadeygraham96/automating-ux-ui-design-analysis-with-python-machine-learning-and-llms-1fa1440b719b) that inspired the `audit_design` tool
- All contributors and users of this project