---
name: video-agent
description: AI content generation suite with 35+ models. Image generation, video creation, and audio processing via FAL AI, Google Vertex AI, and ElevenLabs, plus pipeline orchestration and cost management.
original_source: https://github.com/donghaozhang/video-agent-claude-skill
author: donghaozhang
license: MIT
---

# Video Agent - AI Content Generation Suite

A comprehensive AI content generation package providing a unified interface across 35+ models for image, video, and audio creation.

## When to Use This Skill

- Text-to-image generation
- Image-to-image transformations
- Text-to-video creation
- Image-to-video animation
- Professional text-to-speech
- Multi-step content pipelines
- Batch content generation

## Supported Providers

### FAL AI
- FLUX models (text-to-image)
- Image transformations
- Fast inference

### Google Vertex AI
- Imagen 4 (text-to-image)
- Veo (text-to-video)
- High-quality outputs

### ElevenLabs
- 20+ voice options
- Professional TTS
- Multiple languages

### OpenRouter
- Access to various LLMs
- Text generation
- Content writing

## Core Capabilities

### Image Generation

```
Generate image:
Prompt: "A serene Japanese garden at sunset"
Model: flux-pro
Size: 1024x1024
Style: photorealistic
```

**Available Models:**
- FLUX Pro/Dev (FAL)
- Imagen 4 (Google)
- Stable Diffusion variants
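The fields in the prompt block above map naturally onto a small request object. The sketch below uses a hypothetical `ImageRequest` dataclass (not part of the package's documented API) that mirrors those fields, parses the `WIDTHxHEIGHT` size string, and rejects malformed values before any API call would be made.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImageRequest:
    """Hypothetical container mirroring the prompt block fields above."""
    prompt: str
    model: str = "flux-pro"
    size: str = "1024x1024"
    style: Optional[str] = None

    def dimensions(self) -> tuple[int, int]:
        # Split "1024x1024" into (width, height); raise on anything malformed.
        parts = self.size.lower().split("x")
        if len(parts) != 2 or not all(p.isdigit() for p in parts):
            raise ValueError(f"size must look like WIDTHxHEIGHT, got {self.size!r}")
        width, height = (int(p) for p in parts)
        if width <= 0 or height <= 0:
            raise ValueError("width and height must be positive")
        return width, height


request = ImageRequest(
    prompt="A serene Japanese garden at sunset",
    size="1024x1024",
    style="photorealistic",
)
print(request.dimensions())  # (1024, 1024)
```

Validating locally like this catches typos in the size string without spending an API call.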

### Video Creation

```
Generate video:
Prompt: "Ocean waves crashing on rocky shore"
Model: veo
Duration: 5 seconds
Resolution: 1080p
```

**Available Models:**
- Google Veo
- MiniMax Hailuo
- Kling

### Image-to-Video

```
Animate image:
Source: /path/to/image.png
Motion: "gentle zoom out with particle effects"
Duration: 4 seconds
```
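Image-to-video models need the source image to be reachable by the provider. One common approach, assumed here rather than taken from the package, is to inline a local file as a base64 data URI. The helper name `image_to_data_uri` is hypothetical, and some providers require an uploaded URL instead.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(path: str) -> str:
    """Hypothetical helper: read a local image and return a data URI string."""
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"source image not found: {path}")
    # Infer the MIME type from the file extension (e.g. .png -> image/png).
    mime, _ = mimetypes.guess_type(file.name)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {file.name}")
    encoded = base64.b64encode(file.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Usage would look like `uri = image_to_data_uri("/path/to/image.png")`. Data URIs grow the payload by about a third, so large images are usually better uploaded.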

### Text-to-Speech

```
Generate audio:
Text: "Welcome to our product demo..."
Voice: professional-female-1
Speed: 1.0
Output: welcome.mp3
```

**Voice Options:**
- Professional male/female
- Casual conversational
- Narrator styles
- Multiple accents
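Long scripts are often split into several requests before synthesis, because TTS services typically cap the text length per call. The sketch below splits text at sentence boundaries under an assumed `max_chars` limit. The 2,500 default is a placeholder, not a documented ElevenLabs limit, so check the provider's current limits.

```python
import re


def chunk_text(text: str, max_chars: int = 2500) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence breaks."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the current chunk
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting at sentence ends keeps the prosody of each request natural, and the resulting audio files can be concatenated in order.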

## Pipeline Orchestration

### YAML Configuration

```yaml
pipeline: product-demo
steps:
  - name: generate-logo
    type: image
    model: flux-pro
    prompt: "Modern tech logo for AI startup"

  - name: create-intro
    type: video
    model: veo
    prompt: "Logo animation reveal"

  - name: add-voiceover
    type: audio
    model: elevenlabs
    text: "Introducing the future of AI..."
    voice: professional-male

  - name: combine
    type: merge
    inputs: [create-intro, add-voiceover]
```
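The `combine` step lists its `inputs`, so a valid execution order follows from those dependencies. A minimal sketch, assuming each step may declare an optional `inputs` list as in the YAML above, resolves an order with the standard-library `graphlib.TopologicalSorter` (Python 3.9+). The steps are written as the plain dicts a YAML loader would produce, to avoid a parser dependency. This is an illustration, not the package's own scheduler.

```python
from graphlib import CycleError, TopologicalSorter

# The YAML pipeline above, as the dicts a YAML loader would produce.
steps = [
    {"name": "generate-logo", "type": "image"},
    {"name": "create-intro", "type": "video"},
    {"name": "add-voiceover", "type": "audio"},
    {"name": "combine", "type": "merge", "inputs": ["create-intro", "add-voiceover"]},
]


def execution_order(steps: list[dict]) -> list[str]:
    """Return step names so that every step comes after the steps it consumes."""
    names = {step["name"] for step in steps}
    graph = {}
    for step in steps:
        deps = step.get("inputs", [])
        unknown = set(deps) - names
        if unknown:
            raise ValueError(f"{step['name']} references unknown steps: {sorted(unknown)}")
        graph[step["name"]] = deps  # node -> its predecessors
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"pipeline has a dependency cycle: {exc.args[1]}") from exc


order = execution_order(steps)
```

Steps with no dependencies between them (here `create-intro` and `add-voiceover`) could also be dispatched concurrently; `TopologicalSorter.get_ready()` supports that pattern.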

### JSON Configuration

```json
{
  "pipeline": "social-content",
  "parallel": true,
  "steps": [
    {
      "type": "image",
      "variants": 4,
      "prompt": "Product hero shot"
    }
  ]
}
```
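The `variants` field asks for several outputs from a single step. A hypothetical expansion, assuming each variant becomes an independent job and that `variants` defaults to 1 when omitted, can be sketched with the standard `json` module:

```python
import json

config_text = """
{
  "pipeline": "social-content",
  "parallel": true,
  "steps": [
    {"type": "image", "variants": 4, "prompt": "Product hero shot"}
  ]
}
"""


def expand_jobs(config: dict) -> list[dict]:
    """Turn each step into one job per requested variant (hypothetical scheme)."""
    jobs = []
    for step_index, step in enumerate(config["steps"]):
        variants = int(step.get("variants", 1))
        if variants < 1:
            raise ValueError(f"step {step_index}: variants must be >= 1")
        for variant in range(variants):
            # Copy the step's settings, dropping the fan-out count itself.
            job = {k: v for k, v in step.items() if k != "variants"}
            job.update(step=step_index, variant=variant)
            jobs.append(job)
    return jobs


config = json.loads(config_text)
jobs = expand_jobs(config)
print(len(jobs), config["parallel"])  # 4 True
```

Because `"parallel": true` is set, the four resulting jobs are independent and could be submitted concurrently.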

## Cost Management

### Real-time Estimation

```
Estimate cost for:
- 10 images (1024x1024)
- 2 videos (5 seconds)
- 1 audio (60 seconds)

Estimated: $2.45
```
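A simple way to produce an estimate like the one above is to sum quantity times unit price for each output type. The rates below are placeholders chosen for illustration only, not actual FAL, Vertex AI, or ElevenLabs pricing, so the total will not match the $2.45 figure; substitute current provider prices. `Decimal` avoids floating-point rounding in currency arithmetic.

```python
from decimal import Decimal

# Placeholder unit prices in USD. NOT real provider pricing.
UNIT_PRICES = {
    "image": Decimal("0.05"),          # per image
    "video_second": Decimal("0.10"),   # per second of video
    "audio_second": Decimal("0.002"),  # per second of audio
}


def estimate_cost(images: int, video_seconds: float, audio_seconds: float) -> Decimal:
    """Sum quantity x unit price per output type, rounded to cents."""
    total = (
        images * UNIT_PRICES["image"]
        + Decimal(str(video_seconds)) * UNIT_PRICES["video_second"]
        + Decimal(str(audio_seconds)) * UNIT_PRICES["audio_second"]
    )
    return total.quantize(Decimal("0.01"))


# 10 images, 2 videos x 5 seconds, 1 audio clip of 60 seconds.
print(estimate_cost(images=10, video_seconds=2 * 5, audio_seconds=60))  # 1.62 with these placeholder rates
```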

### Budget Limits

```yaml
budget:
  max_per_job: $5.00
  max_daily: $50.00
  alert_threshold: 80%
```
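The budget block stores amounts as `$`-prefixed strings and the threshold as a percentage. A minimal sketch of how such limits might be enforced uses a hypothetical `BudgetGuard` class that tracks daily spend in memory only; a real tool would need to persist it across runs.

```python
from decimal import Decimal

budget_config = {"max_per_job": "$5.00", "max_daily": "$50.00", "alert_threshold": "80%"}


def parse_money(value: str) -> Decimal:
    return Decimal(value.lstrip("$"))


def parse_percent(value: str) -> Decimal:
    return Decimal(value.rstrip("%")) / 100


class BudgetGuard:
    """Hypothetical in-memory enforcement of the budget block above."""

    def __init__(self, config: dict):
        self.max_per_job = parse_money(config["max_per_job"])
        self.max_daily = parse_money(config["max_daily"])
        self.alert_at = self.max_daily * parse_percent(config["alert_threshold"])
        self.spent_today = Decimal("0")

    def charge(self, estimated: Decimal) -> bool:
        """Record a job's cost; return True once daily spend reaches the alert level."""
        if estimated > self.max_per_job:
            raise ValueError(f"job estimate {estimated} exceeds per-job limit {self.max_per_job}")
        if self.spent_today + estimated > self.max_daily:
            raise ValueError("daily budget would be exceeded")
        self.spent_today += estimated
        return self.spent_today >= self.alert_at
```

Checking the estimate before submitting a job, rather than after, is what keeps a runaway batch from overshooting the limit.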

## Performance Features

### Parallel Execution

```
Generate 10 image variants in parallel
Threads: 4
Expected speedup: 2-3x
```
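Generation calls spend most of their time waiting on the network, so a fixed thread pool is a natural fit. The sketch uses `concurrent.futures.ThreadPoolExecutor` with four workers; `fake_generate` is a hypothetical stand-in for a real API call, and any real speedup depends on provider rate limits rather than being guaranteed.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_generate(prompt: str, variant: int) -> str:
    """Hypothetical stand-in for a network-bound generation call."""
    time.sleep(0.1)  # simulate waiting on the provider
    return f"{prompt} #{variant}"


def generate_variants(prompt: str, count: int, threads: int = 4) -> list[str]:
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() returns results in input order even if calls finish out of order.
        return list(pool.map(lambda v: fake_generate(prompt, v), range(count)))


start = time.perf_counter()
results = generate_variants("Product hero shot", count=10, threads=4)
elapsed = time.perf_counter() - start
```

With 10 jobs and 4 workers the pool runs three rounds instead of ten sequential calls; retries and rate limiting usually keep the observed gain below that ideal.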

### Caching

- Automatic prompt caching
- Reuse similar generations
- Reduce redundant API calls
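A straightforward form of prompt caching keys each request on a hash of its normalized parameters, so an identical request is served from the cache instead of triggering another API call. The sketch below, including the `GenerationCache` name and the whitespace normalization, is an assumption about how such a cache could work; reusing merely *similar* generations would need fuzzier matching than shown here.

```python
import hashlib
import json
from typing import Callable


class GenerationCache:
    """Hypothetical in-memory cache keyed on model + normalized prompt + params."""

    def __init__(self):
        self._store: dict[str, object] = {}
        self.api_calls = 0

    @staticmethod
    def key(model: str, prompt: str, **params) -> str:
        normalized = " ".join(prompt.split())  # collapse runs of whitespace
        payload = json.dumps(
            {"model": model, "prompt": normalized, "params": params}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_generate(self, generate: Callable[[], object], model: str, prompt: str, **params):
        k = self.key(model, prompt, **params)
        if k not in self._store:
            self.api_calls += 1  # cache miss: this is where a real API call would happen
            self._store[k] = generate()
        return self._store[k]
```

Including the model and every generation parameter in the key prevents, for example, a 512px result from being returned for a 1024px request.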

## CLI Commands

```bash
# Image generation
video-agent image "prompt" --model flux-pro --size 1024

# Video generation
video-agent video "prompt" --model veo --duration 5

# Audio generation
video-agent audio "text" --voice professional-female

# Pipeline execution
video-agent pipeline config.yaml

# Cost check
video-agent cost --estimate
```
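For batch runs, the CLI can be driven from a script. This sketch only assembles argument lists from the image flags shown above (`--model`, `--size`); `build_image_command` and `run_all` are hypothetical helpers, and actually running the commands assumes `video-agent` is on your `PATH`.

```python
import shlex
import subprocess


def build_image_command(prompt: str, model: str = "flux-pro", size: int = 1024) -> list[str]:
    # Passing a list (not a string) means prompts with spaces or quotes need no escaping.
    return ["video-agent", "image", prompt, "--model", model, "--size", str(size)]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(shlex.join(cmd))  # preview the equivalent shell command
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failing command


prompts = ["sunset over mountains", "a lighthouse in fog"]
commands = [build_image_command(p) for p in prompts]
run_all(commands)  # dry run: prints the commands only
```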

## Python API

```python
from video_agent import ImageGenerator, VideoGenerator

# Generate an image (FLUX Pro via FAL)
img = ImageGenerator(model="flux-pro")
image_result = img.generate("sunset over mountains")

# Generate a video (Google Veo)
vid = VideoGenerator(model="veo")
video_result = vid.generate("timelapse of clouds")
```

## Setup

### 1. Install Package
```bash
pip install video-agent-claude-skill
```

### 2. Configure API Keys
```bash
export FAL_API_KEY="your-key"
export GOOGLE_VERTEX_KEY="your-key"
export ELEVENLABS_API_KEY="your-key"
```
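Before running `video-agent status`, a quick local check can confirm that the three variables above are exported in the current environment. This only tests that they are present and non-empty, not that the keys are valid; the helper name `missing_keys` is hypothetical.

```python
import os
from typing import Mapping, Optional

REQUIRED_KEYS = ("FAL_API_KEY", "GOOGLE_VERTEX_KEY", "ELEVENLABS_API_KEY")


def missing_keys(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


missing = missing_keys()
print("Missing API keys:", ", ".join(missing) if missing else "none")
```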

### 3. Verify Setup
```bash
video-agent status
```

## Use Cases

- **Marketing**: Product images, promo videos
- **Social Media**: Content at scale
- **Education**: Explainer videos, voiceovers
- **Prototyping**: Visual concepts, mockups
- **Automation**: Batch content pipelines

## Credits

Created by [donghaozhang](https://github.com/donghaozhang/video-agent-claude-skill). Licensed under MIT.