---
name: ai-vision-cli
description: >
  Analyze images and videos with AI vision models. Detect objects with bounding boxes,
  compare multiple images, audit design compliance, and analyze video content using
  Google Gemini or Vertex AI. Supports CLI and MCP modes.
license: MIT
metadata:
  version: 1.0.0
  author: Tan Yong Sheng
  tags:
    - image-analysis
    - video-analysis
    - gemini
    - vertex-ai
    - mcp
    - cli
    - object-detection
    - design-analysis
  keywords:
    - vision
    - ai
    - image
    - video
    - analysis
    - detection
    - comparison
    - design-audit
---

# AI Vision MCP

AI-powered image and video analysis CLI using Google Gemini and Vertex AI models.

## Quick Start

### Installation

```bash
npm install -g ai-vision-mcp
# or use directly
npx ai-vision-mcp [options]
```

### Setup

Set your provider via environment variables:

**Google AI Studio (Recommended)**

```bash
export IMAGE_PROVIDER="google"
export VIDEO_PROVIDER="google"
export GEMINI_API_KEY="your-api-key"
```

Get your API key at [aistudio.google.com/app/api-keys](https://aistudio.google.com/app/api-keys)

**Vertex AI**

```bash
export IMAGE_PROVIDER="vertex_ai"
export VIDEO_PROVIDER="vertex_ai"
export VERTEX_CLIENT_EMAIL="your-service-account@project.iam.gserviceaccount.com"
export VERTEX_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
export VERTEX_PROJECT_ID="your-gcp-project-id"
export GCS_BUCKET_NAME="your-gcs-bucket"
```

## Commands

### audit-design

Audit a website or UI design for accessibility, visual quality, WCAG contrast compliance, and design best practices.

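As background for the WCAG contrast check named above: WCAG 2.x defines the contrast ratio of two colors as `(L1 + 0.05) / (L2 + 0.05)`, where `L1` and `L2` are the relative luminances of the lighter and darker color. The sketch below works that formula through in shell with `awk`. It only illustrates the standard definition; `contrast_ratio` is a hypothetical helper and not part of `ai-vision`, and this section does not say how the tool derives its own contrast figure.

```bash
# Contrast ratio of two sRGB colors, following the WCAG 2.x definition.
# Hypothetical helper for illustration; not provided by ai-vision.
# Usage: contrast_ratio RRGGBB RRGGBB   (six hex digits, no leading "#")
contrast_ratio() {
  # Prefix "#" so awk always treats the values as strings, never as numbers.
  awk -v fg="#$1" -v bg="#$2" '
    # Convert a hex string to an integer (portable awk has no strtonum).
    function hex(s,    i, n) {
      n = 0
      for (i = 1; i <= length(s); i++)
        n = n * 16 + index("0123456789abcdef", tolower(substr(s, i, 1))) - 1
      return n
    }
    # Linearize one 8-bit sRGB channel.
    function lin(c) {
      c /= 255
      return (c <= 0.03928) ? c / 12.92 : ((c + 0.055) / 1.055) ^ 2.4
    }
    # Relative luminance of a "#RRGGBB" color.
    function lum(rgb,    r, g, b) {
      r = lin(hex(substr(rgb, 2, 2)))
      g = lin(hex(substr(rgb, 4, 2)))
      b = lin(hex(substr(rgb, 6, 2)))
      return 0.2126 * r + 0.7152 * g + 0.0722 * b
    }
    BEGIN {
      hi = lum(fg); lo = lum(bg)
      if (hi < lo) { t = hi; hi = lo; lo = t }   # lighter color is the numerator
      printf "%.2f\n", (hi + 0.05) / (lo + 0.05)
    }'
}

contrast_ratio 000000 ffffff   # black on white, prints 21.00
contrast_ratio 767676 ffffff   # mid-gray on white, prints 4.54
```

WCAG level AA asks for at least 4.5:1 for normal-size text, so `#767676` on white passes only narrowly.
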
```bash
ai-vision audit-design <source> [--prompt <text>] [options]
```

**Options:**

- `--prompt <text>`: Custom audit prompt (optional)
- `--temperature <number>`: Temperature 0-2 (default: 0.7)
- `--top-p <number>`: Top P 0-1
- `--top-k <number>`: Top K 1-100
- `--max-tokens <number>`: Max output tokens
- `--json`: Output raw JSON

**Output includes:**

- Severity level (critical/major/minor/pass)
- Metrics: dimensions, dominant colors, edge complexity, luminance, WCAG contrast
- Issues identified with recommendations
- AI critique

**Examples:**

```bash
ai-vision audit-design https://example.com/hero.jpg
ai-vision audit-design screenshot.png --prompt "Evaluate accessibility"
ai-vision audit-design design.jpg --json
```

### analyze-image

Analyze an image with AI vision models.

```bash
ai-vision analyze-image <source> --prompt <text> [options]
```

**Options:**

- `--prompt <text>`: Analysis prompt (required)
- `--temperature <number>`: Temperature 0-2 (default: 0.7)
- `--top-p <number>`: Top P 0-1
- `--top-k <number>`: Top K 1-100
- `--max-tokens <number>`: Max output tokens
- `--json`: Output raw JSON

**Examples:**

```bash
ai-vision analyze-image https://example.com/image.jpg --prompt "describe the scene"
ai-vision analyze-image screenshot.png --prompt "extract design tokens"
ai-vision analyze-image image.jpg --prompt "analyze" --json
```

### compare-images

Compare 2-4 images to identify differences, similarities, or changes.

```bash
ai-vision compare-images <source1> <source2> [source3] [source4] --prompt <text> [options]
```

**Options:**

- `--prompt <text>`: Comparison prompt (required)
- `--temperature <number>`: Temperature 0-2 (default: 0.7)
- `--top-p <number>`: Top P 0-1
- `--top-k <number>`: Top K 1-100
- `--max-tokens <number>`: Max output tokens
- `--json`: Output raw JSON

**Examples:**

```bash
ai-vision compare-images before.jpg after.jpg --prompt "what changed?"
ai-vision compare-images v1.png v2.png v3.png --prompt "which is best?"
ai-vision compare-images baseline.png current.png --prompt "find visual bugs" --json
```

### detect-objects

Detect and identify objects in an image with bounding boxes and confidence scores.

```bash
ai-vision detect-objects <source> --prompt <text> [--output <path>] [options]
```

**Options:**

- `--prompt <text>`: Detection prompt (required)
- `--output <path>`: Save annotated image (optional)
- `--viewport-width <number>`: Logical viewport width for web screenshots
- `--viewport-height <number>`: Logical viewport height for web screenshots
- `--temperature <number>`: Temperature 0-2 (default: 0.7)
- `--top-p <number>`: Top P 0-1
- `--top-k <number>`: Top K 1-100
- `--max-tokens <number>`: Max output tokens
- `--json`: Output raw JSON

**Output includes:**

- Detections: Array of objects with bounding boxes and confidence scores
- Summary: Human-readable text with CSS selectors for web elements
- Metadata: Detection model, provider, processing time

**Examples:**

```bash
ai-vision detect-objects photo.jpg --prompt "find all cars"
ai-vision detect-objects scene.jpg --prompt "detect people" --output annotated.jpg
ai-vision detect-objects screenshot.png --prompt "find buttons" --viewport-width 1920 --viewport-height 1080
ai-vision detect-objects image.jpg --prompt "find text" --json
```

### analyze-video

Analyze video content frame-by-frame or as a whole. Supports URLs, local files, and YouTube videos.

```bash
ai-vision analyze-video <source> --prompt <text> [options]
```

**Options:**

- `--prompt <text>`: Analysis prompt (required)
- `--start-offset