---
name: austn-tools
description: Generate content using austn.net AI services (TTS, images, etc.)
user_invocable: true
---

# Austn Tools Skill

## Purpose
Access Austin's local GPU-powered AI services at austn.net for content generation:
- Text-to-Speech (Chatterbox TTS)
- Image Generation (ComfyUI)
- Background Removal
- Vector Tracing
- Audio Stem Separation
- And more

## Available Services

### 1. Text-to-Speech (`/tts`)
**URL**: https://austn.net/tts/new
**Backend**: Chatterbox TTS on local GPU

**⚠️ CRITICAL CONSTRAINT: 40-second maximum duration**
- Audio caps at 40 seconds regardless of text length
- For longer content: split into multiple clips with separate share links
- Estimate: ~100-120 words = ~40 seconds

**Parameters**:
| Field | Description | Default |
|-------|-------------|---------|
| text | Text to speak (keep under ~120 words) | Required |
| voice | Voice selection | "Default voice" |
| exaggeration | Emotional intensity (0-1) | 0.5 |
| cfg_weight | Voice adherence (0-1) | 1.0 |

**Expression Tags** (add inline to text):
- `[laughter]` - Laughing
- `[giggle]` - Giggling
- `[sigh]` - Sighing
- `[gasp]` - Gasping
- `[whisper]` - Whispering
- `[cough]` - Coughing
- `[clear_throat]` - Throat clearing
- `[groan]` - Groaning
- `[humming]` - Humming
- `[UH]`, `[UM]` - Filler sounds

**Example Text**:
```
Hello! [sigh] This is austnomaton speaking. [laughter] Pretty wild, right?
```

### 2. Image Generation (`/images`)
**URL**: https://austn.net/images/ai_generate
**Backend**: ComfyUI on local GPU

**Parameters**:
| Field | Description | Default |
|-------|-------------|---------|
| prompt | Image description | Required |
| negative_prompt | What to avoid | "blurry, low quality, distorted" |
| seed | Reproducibility seed | Random |
| size | Image dimensions | 512x512 |
| batch_size | Number of images | 1 |
| publish | Show in gallery 10min | false |

### 3. Background Removal (`/rembg`)
**URL**: https://austn.net/rembg
Remove backgrounds from images.

### 4. Vector Tracing (`/vtracer`)
**URL**: https://austn.net/vtracer
Convert raster images to SVG vectors.

### 5. Audio Stems (`/stems`)
**URL**: https://austn.net/stems
Separate audio into vocal/instrument tracks.

### 6. 3D Tools (`/3d`)
**URL**: https://austn.net/3d
3D content generation.

### 7. MIDI Generation (`/midi`)
**URL**: https://austn.net/midi
Generate MIDI sequences.

## Usage via Browser Automation

Since these are web UIs, use browser automation to interact:

### TTS Generation
```python
# 1. Navigate to TTS
navigate("https://austn.net/tts/new")

# 2. Click text field and enter text
click(text_field)
type("Hello world! [laughter] This is a test.")

# 3. Optionally expand advanced options
click(advanced_options_checkbox)
# Adjust sliders if needed

# 4. Click Generate Speech
click(generate_button)

# 5. Wait for audio, then download
```

### Image Generation
```python
# 1. Navigate to image generator
navigate("https://austn.net/images/ai_generate")

# 2. Enter prompt
click(prompt_field)
type("A robot writing code in a cozy office, digital art")

# 3. Optionally set advanced options
click(advanced_options_checkbox)
# Set negative prompt, seed, size, batch

# 4. Click Generate Image
click(generate_button)

# 5. Wait for result, download
```

## Browser Automation Tips

### Field Locations (approximate)
**TTS Page** (`/tts/new`):
- Text input: Center of page, large textarea
- Voice dropdown: Below text input
- Advanced options checkbox: Below voice dropdown
- Exaggeration slider: After checkbox expanded
- CFG Weight slider: Below exaggeration
- Generate button: Green button at bottom

**Image Page** (`/images/ai_generate`):
- Prompt textarea: Top of form
- Advanced options checkbox: Below prompt
- Negative prompt: First advanced field
- Seed input: Below negative prompt
- Size dropdown: Below seed
- Batch size dropdown: Below size
- Generate button: Green button at bottom

### Downloading Results
- TTS: Audio player appears, right-click to save or use download button
- Images: Image appears in result area, right-click to save

## Integration with Video Pipeline

These tools combine well for autonomous video creation:

1. **Script** → Write narration text
2. **TTS** → Generate voiceover audio
3. **Images** → Generate visuals/thumbnails
4. **Combine** → Use ffmpeg or video editor

### Example Workflow
```
1. Generate narration: /austn-tools tts "Welcome to austnomaton..."
2. Generate thumbnail: /austn-tools image "Robot mascot, friendly, digital art"
3. Record screen session with browser automation
4. Combine audio + video with ffmpeg
5. Export final video
```

## Output Locations

Save generated content to:
- Audio: `content/audio/`
- Images: `content/images/`
- Videos: `content/videos/`

## Service Status & Dependencies

| Service | Backend | Requires Local GPU |
|---------|---------|-------------------|
| TTS | Chatterbox TTS | Yes (but often available) |
| Images | ComfyUI | Yes - needs server running |
| Rembg | Python | Likely |
| VTracer | Rust | Likely |
| Stems | Demucs | Yes |
| 3D | Unknown | Yes |
| MIDI | Unknown | Yes |

### Connection Details
- Services route to local GPU via Tailscale
- Image generation connects to `100.68.94.33:8188` (ComfyUI)
- If generation fails with "TCP connection" error, the backend server isn't running

### Verified Working (2026-02-02)
- ✅ TTS - Generated 8.4s audio in 6.9s
- ❌ Images - Failed (ComfyUI server not running)

## Notes

- Services depend on Austin's local GPU being online
- No API keys needed - it's Austin's own infrastructure
- TTS has "Share Link" that lasts 7 days
- Gallery publish is optional and temporary (10 min)
- Large batches may take time depending on GPU load