---
name: video-designer
description: Expert video designer that generates comprehensive design specifications based on video direction. Creates precise JSON schemas for scenes including elements, animations, timing, and styling following strict design guidelines.
---
# Video Designer
This skill provides design guidelines for creating consistent, high-quality video content by following a specific schema. Load the relevant reference files based on what needs to be designed.
`./references/characters.md` — Primitive geometric character design (Hey Duggee style), constraints, emotions
`./references/path-guidelines.md` — Path element schema, arrow markers, composite paths, path animations
`./references/3d-shapes.md` — Design 3D cubes/boxes as single unified elements (not separate faces)
`./references/path-references.md` — Different types of paths along with their parameters and output JSON example
```
COORDINATE SYSTEM
Origin: Top-left corner (0, 0)
X-axis: Increases rightward →
Y-axis: Increases downward ↓
FOR SHAPES/TEXT/ICONS:
Position: Always refers to element's CENTER point
FOR PATHS:
All coordinates are ABSOLUTE screen positions
No position/size fields needed (implied by path coordinates)
ROTATION DIRECTIONS
0° = pointing up (↑)
90° = pointing right (→)
180° = pointing down (↓)
270° = pointing left (←)
Positive values = clockwise rotation
Negative values = counter-clockwise (-90° same as 270°)
SETTING TRANSFORMS BY ELEMENT TYPE
For shapes/text: Set `rotation` directly to the desired direction.
For assets: Check `asset_manifest` for `composition`.
- `composition` describes the asset's structure and orientation
- Infer the asset's base orientation from the composition description
- Use composition to determine if the asset is symmetric or asymmetric
SYMMETRIC ASSETS (no distinct top/bottom in composition):
- Use `rotation = desired_direction - inferred_orientation`
ASYMMETRIC ASSETS (composition mentions distinct top/bottom):
- If NO direction change needed, omit rotation/flip fields entirely
- If direction change is small (e.g., 45°) and won't invert, use `rotation` only
- If direction change would invert the asset (180°), use `flipX`/`flipY` instead of rotation
- `flipX: true` = horizontal mirror (for LEFT↔RIGHT flip)
- `flipY: true` = vertical mirror (for UP↔DOWN flip)
- Can combine flip + rotation for diagonal directions
Example: Asset with composition: "Wedge-shaped vehicle pointing RIGHT. Flat BOTTOM, angled TOP"
- Inferred orientation: 90° (pointing RIGHT)
- To point RIGHT: No transform needed (omit rotation/flip fields)
- To point UP-RIGHT: Use `rotation: -45` (small angle, no inversion)
- To point BOTTOM-RIGHT: Use `rotation: 45` (small angle, no inversion)
- To point LEFT: Use `flipX: true` (not rotation: 180 which inverts it)
- To point UP-LEFT: Use `flipX: true, rotation: 45`
- To point DOWN-LEFT: Use `flipX: true, rotation: -45`
EXAMPLE (1920×1080 viewport)
Screen center: x = 960, y = 540
Top-center: x = 960, y = 100
Bottom-left quadrant: x = 480, y = 810
Right edge center: x = 1820, y = 540
```
Your output must be a valid JSON object matching this schema:
```json
{
"scene": 0,
"startTime": 0,
"endTime": 7664,
"video_metadata": {
"viewport_size": "1920x1080",
"backgroundColor": "#HEXCOLOR",
"layout": {
"strategy": "three-column|centered|freeform"
}
},
"elements": [
{
"id": "unique_element_id",
"type": "shape|text|asset|pattern|path",
"enterOn": 0,
"exitOn": 7664,
"content": "CRITICAL: Complete visual specification (see content_field_requirements)",
"x": number,
"y": number,
"width": number,
"height": number,
"rotation": number,
"flipX": boolean,
"flipY": boolean,
"orientation": number,
"scale": number,
"opacity": number,
"fill": "#HEXCOLOR",
"stroke": "#HEXCOLOR",
"strokeWidth": number,
"fontSize": number,
"fontWeight": number,
"textAlign": "left|center|right",
"lineHeight": number,
"zIndex": number,
"animation": {
"entrance": {
"type": "string (e.g., pop-in, fade-in, slide-in-left, slide-in-right, slide-in-top, slide-in-bottom, draw-on, cut, scale-in, scale-up, path-draw, etc.)",
"duration": number
},
"exit": {
"type": "string (e.g., fade-out, pop-out, slide-out-left, slide-out-right, slide-out-top, slide-out-bottom, cut, etc.)",
"duration": number
},
"actions": [
{
"on": number,
"duration": number,
"targetProperty": "opacity|scale|x|y|rotation|fill|stroke",
"value": number,
"easing": "linear|easeIn|easeOut|easeInOut"
},
{
"type": "follow-path",
"pathId": "path_element_id",
"autoRotate": boolean,
"on": number,
"duration": number,
"easing": "linear|easeIn|easeOut|easeInOut"
}
]
}
}
]
}
```
**Required fields per element:** `id`, `type`, `enterOn`, `exitOn`, `content`, `zIndex`
**Required for assets with follow-path:** `orientation` (inferred from composition, e.g., 0°=UP, 90°=RIGHT, 180°=DOWN, 270°=LEFT)
**Optional fields:** `animation` (but recommended for visual engagement)
When you need multiple similar elements with only a few varying properties, use the `instances` pattern to avoid duplication and reduce token count.
All base properties go at the root level. The `instances` array specifies overrides for each copy.
```json
{
"id": "arrow",
"type": "shape",
"content": "Right-pointing arrow",
"enterOn": 1000,
"exitOn": 5000,
"x": 100,
"y": 200,
"width": 50,
"height": 50,
"fill": "#3B82F6",
"opacity": 1,
"instances": [
{ "useDefaults": true },
{ "x": 200 },
{ "x": 300, "fill": "#FF5722" },
{ "y": 400 }
]
}
```
**This creates 4 elements:**
- `arrow_0`: x=100, y=200, fill=#3B82F6 (uses all base values)
- `arrow_1`: x=200, y=200, fill=#3B82F6 (overrides only x)
- `arrow_2`: x=300, y=200, fill=#FF5722 (overrides x and fill)
- `arrow_3`: x=100, y=400, fill=#3B82F6 (overrides only y)
1. **First instance is always `{ "useDefaults": true }`** - Represents an element with no overrides (uses all base values)
2. **Deep merge for nested objects** - Nested objects (`animation`, `path_params`, `container`) merge recursively; unspecified fields inherit from base
3. **Field ordering** - `instances` must be the **last field** in the object
4. **IDs generated automatically** - Pattern: `${baseId}_${index}` (e.g., `arrow_0`, `arrow_1`)
5. **Works on any level** - Can be used on elements, `path_params`, or any nested object
6. **Array length = instance count** - The array length determines how many copies are created
**⚠️ CRITICAL: Instances are MANDATORY when 2+ similar elements exist.**
✅ **MUST use instances for:**
- Any 2+ elements of the same `type` with similar structure
- This includes: shapes, paths, text labels, assets - ANY element type
```json
{
"id": "dot",
"type": "shape",
"content": "Small circular dot",
"enterOn": 1000,
"exitOn": 5000,
"x": 660,
"y": 290,
"width": 30,
"height": 30,
"fill": "#3B82F6",
"opacity": 1,
"animation": {
"entrance": { "type": "pop-in", "duration": 300 }
},
"instances": [
{ "useDefaults": true },
{ "x": 960, "animation": { "entrance": { "duration": 200 } } },
{ "x": 1260, "fill": "#FF5722" }
]
}
```
Creates 3 dots:
- `dot_0`: x=660, animation.entrance.duration=300 (base values)
- `dot_1`: x=960, animation.entrance.duration=200, type="pop-in" inherited (deep merge)
- `dot_2`: x=1260, fill=#FF5722, animation.entrance.duration=300 inherited
```json
{
"id": "arc_element",
"type": "path",
"enterOn": 1000,
"exitOn": 5000,
"x": 400,
"y": 300,
"stroke": "#FFFFFF",
"strokeWidth": 3, // Calculate as percentage of viewport dimension
"path_params": {
"type": "arc",
"start_x": 100,
"start_y": 300,
"end_x": 400,
"end_y": 300,
"radius": 2, // Calculate as percentage of viewport dimension
"sweep": 1,
"instances": [
{ "useDefaults": true },
{ "radius": 4 },
{ "start_x": 200, "end_x": 500 },
{ "sweep": 0 }
]
}
}
```
Creates 4 arc paths with different radii, positions, and directions while sharing the base coordinates.
- **Logical organization** - Related elements stay together (e.g., all parts of a smartphone)
- **Simplified management** - Transform entire groups at once (move, rotate, scale)
- **Shared timing** - Parent `enterOn`/`exitOn` applies to all children unless overridden
- **Reduced repetition** - Common properties defined at parent level
✅ **Use groups when:**
- Multiple elements logically belong together (parts of an icon, device, or object)
- Elements share common timing or transformations
- Scene has many elements that need organization
- Elements form a composite object (phone = body + screen + buttons)
❌ **Don't use groups when:**
- Single standalone element
- Elements are unrelated or independent
- No shared timing or transformation needed
Groups support these properties:
- **Timing:** `enterOn`, `exitOn` (inherited by children unless overridden)
- **Transform:** `x`, `y`, `rotation`, `scale` (applied to entire group)
- **Animation:** `entrance`, `exit` (for the group as a whole)
Individual children can override timing and have their own animations.
**CRITICAL: Timing Values Must Be Absolute**
**All timing values (`enterOn`, `exitOn`, `action.on`) must use absolute video timestamps, NOT relative scene timestamps.**
Given:
- `scene.startTime = 18192` (absolute video time)
- Audio transcript shows word "dust" at `1777ms` (relative to scene start)
Your timing should be:
```json
"enterOn": 19969, // 18192 + 1777 = absolute video time
"exitOn": 24589 // matches scene.endTime (absolute)
```
**Formula:** `absolute_time = scene.startTime + audio_relative_time`
The `content` field is the most critical part. It must answer ALL of these:
| Aspect | What to Specify | Example |
|--------|-----------------|---------|
| **Shape/Form** | Exact geometry, proportions | "Asymmetrical rounded blob—right lobe shorter, left lobe extends 2x downward" |
| **Visual Details** | Colors, textures, features | "Deep orange center (#E65100) fading to bright orange (#FF9800) edges, 3 subtle lighter spots" |
| **Face/Expression** | If character: eyes, mouth, emotion | "Wide white eyes with violet pupils, V-shaped pink eyebrows angled inward expressing anger" |
| **Position Context** | Where in frame, relative to what | "Centered in belly area of silhouette, taking 75% of belly's width" |
| **Initial State** | Starting appearance | "Begins as small concentrated core at liver's center" |
| **Transformations** | What changes and how | "On inhale: body compresses, eyes shrink, mouth tightens to small 'o'; on exhale: expands, eyes widen, mouth stretches to tall oval" |
| **Interaction** | How it relates to other elements | "Scales at same rate as silhouette to maintain relative position inside belly" |
**Precision Test**: Could someone draw this without seeing the original? If uncertain, add more detail.
Before designing anything, analyze the example to establish your style guide:
- Background color
- Layout strategy
Document font styles and calculate proportional sizes (fontSize ÷ viewport_height).
Document each animation type with duration:
```
pop-in: Xms
fade-in: Xms
scale-grow: Xms
color-transition: Xms
```
| Attribute | Example's Approach |
|-----------|-------------------|
| Mood | (playful/serious/technical) |
| Shapes | (rounded/sharp, thick/thin strokes) |
| Characters | (Kawaii faces, googly eyes, expressive) |
| Motion | (organic wobbles, breathing animations) |
| Metaphors | (how abstract concepts become visual) |
Reconstruct sentences from transcript. Identify:
- Key moments needing visual emphasis
- Natural timing beats for element entrances
From scene_direction, list:
- Primary elements (must have)
- Supporting elements (enhance clarity)
- Labels/text (identify concepts)
**CRITICAL: Only create elements mentioned in `videoDescription`.**
For every element:
1. Write complete `content` description (see content_field_requirements)
2. Calculate exact position and size
3. Assign zIndex for layer order
4. Set timing synced to transcript
5. Define entrance/exit animations
6. Specify any mid-scene actions
Audio timestamps are relative to scene start. Convert to absolute video time:
```
Audio: "ball" at 4708ms (relative)
Scene: startTime = 7000ms (absolute)
Element enterOn: 7000 + 4600 = 11600ms (absolute, with anticipation)
Audio: "bat" at 5908ms (relative)
Element enterOn: 7000 + 5800 = 12800ms (absolute, with anticipation)
```
**Always add scene.startTime to audio timestamps.**
Always create text elements when you need to show the text on the screen.
Text elements are auto-sized based on content and fontSize. Position elements with adequate spacing to avoid overlaps.
**CRITICAL: textAlign is CSS only, NOT positioning**
The `textAlign` field controls how text lines flow within the text container. It does NOT change where the container is positioned.
**Universal coordinate system rule applies:**
- x,y is ALWAYS the center point of the text element
- textAlign affects internal text alignment, not container position
**When textAlign matters:**
- Multi-line text (e.g., "MONKEY\nTALKS") - shorter lines align left/center/right within the container width
- Single-line text with `containerWidth` set - text aligns within that fixed width
**When textAlign has NO visible effect:**
- Single-line text without containerWidth - the container shrinks to fit the text exactly, leaving no extra space for alignment to matter
**Example - Multi-line text at x=540 (viewport center):**
```
textAlign: "center" textAlign: "left"
┌──────┐ ┌──────┐
│MONKEY│ │MONKEY│
│ TALKS│ │TALKS │
└──────┘ └──────┘
↑ ↑
x=540 x=540
```
Both containers are centered on x=540. PHASED (longest line) fills container width. Only ARRAY (shorter line) shifts based on textAlign.
Calculate text sizes as a percentage of viewport height for consistency across resolutions.
**Calculation formula:**
```
fontSize = viewport_height × percentage
```
**Principle:** Choose percentage based on:
- Visual hierarchy (headlines larger than body text)
- Readability requirements (ensure text is readable at target viewport size)
- Direction requirements (follow what the direction specifies)
- Context (technical content may need different sizing than casual content)
**IMPORTANT:** Never create separate shape elements for text backgrounds. Use the `container` object instead.
The `container` object gives text its own background, border, and padding. The background automatically sizes to fit the text content.
- Keep text within viewport boundaries
```json
{
"id": "Unique identifier for this text element",
"type": "Element type must be text",
"content": "Brief description of what this text represents or its purpose in the scene",
"text": "The actual text content to display",
"bgID": "ID of parent/background element - ONLY for positioning text ON diagram elements (optional)",
"enterOn": "ABSOLUTE video time in ms (scene.startTime + relative_time)",
"exitOn": "ABSOLUTE video time in ms (typically scene.endTime)",
"x": number,
"y": number,
"rotation": number,
"opacity": number,
"fontColor": "#HEXCOLOR",
"fontSize": number,
"textAlign": "left|center|right",
"fontWeight": number,
"lineHeight": number, // Optional, default 1.5
"containerWidth": number, // Optional, for fixed-width text that wraps
"padding": number, // Optional, default 8
"zIndex": number,
"animation": {
"entrance": {
"type": "Entry animation style (string - e.g., pop-in, fade-in, slide-in-left, slide-in-right, slide-in-top, slide-in-bottom, draw-on, cut, scale-in, etc.)",
"duration": "Entry animation duration in milliseconds"
},
"exit": {
"type": "Exit animation style (string - e.g., fade-out, pop-out, slide-out-left, slide-out-right, slide-out-top, slide-out-bottom, cut, etc.)",
"duration": "Exit animation duration in milliseconds"
}
},
"container": {
"padding": number,
"background": {
"type": "none|solid|gradient|frosted-glass|highlight",
"color": "#HEXCOLOR",
"opacity": number,
"gradient": {
"from": "#HEXCOLOR",
"to": "#HEXCOLOR",
"direction": "to-right|to-left|to-bottom|to-top|to-br|to-bl"
}
},
"border": {
"radius": number,
"color": "#HEXCOLOR",
"width": number
},
"backdropBlur": "sm|md|lg|xl"
}
}
```
```json
{
"id": "code_example_label",
"type": "text",
"content": "Label displaying code example with gradient background",
"text": "Hello World!",
"bgID": "",
"enterOn": 1000,
"exitOn": 5000,
"x": 960,
"y": 540,
"rotation": 0,
"opacity": 1,
"fontColor": "#FFFFFF",
"fontSize": 96,
"textAlign": "center",
"fontWeight": 700,
"lineHeight": 1.4,
"zIndex": 5,
"animation": {
"entrance": {
"type": "pop-in",
"duration": 500
},
"exit": {
"type": "fade-out",
"duration": 400
}
},
"container": {
"padding": 20, // Calculate as percentage of fontSize or viewport dimension
"background": {
"type": "gradient",
"color": "#3B82F6",
"opacity": 0.9,
"gradient": {
"from": "#3B82F6",
"to": "#8B5CF6",
"direction": "to-right"
}
},
"border": {
"radius": 12, // Calculate as percentage of viewport dimension
"color": "#FFFFFF",
"width": 2 // Calculate as percentage of viewport dimension
},
"backdropBlur": "md"
}
}
```
1. **Auto-fit backgrounds**: When using container.background, the background automatically sizes to fit the text + padding
2. **Padding makes backgrounds auto-size**: The container.padding property creates spacing and makes the background fit perfectly
3. **No separate shape elements needed**: Text backgrounds are built into the text element itself