---
name: id-generator
description: Generate intelligent session IDs based on detected content source type.
  Analyzes ContentSummary and creates meaningful IDs (podcast-xyz, transcript-abc, etc.).
---

# ID Generator Skill

Expert at generating meaningful session IDs based on content source type and characteristics.

## What This Skill Does

- Reads ContentSummary JSON with detectedSource field
- Analyzes source type and main topic/theme
- Generates human-readable, meaningful session ID
- Returns ID with confidence score and rationale
- Enables reusable content identification

## ID Generation Process

Follow these 5 steps:

### Step 1: Validate Input
- Confirm ContentSummary has required fields
- Check if detectedSource field exists
- Note the content's main topic/headline/category

### Step 2: Analyze Source Type
- Review detectedSource value (podcast, transcript, article, youtube, twitter, text, other)
- Consider content characteristics:
  - **Podcast**: Audio transcript with timestamps, speaker markers, conversational flow
  - **Transcript**: Conversational content, dialogue, timestamps, multiple speakers
  - **Article**: Written structure, sections, headlines, formal tone
  - **YouTube**: Video description, channel info, timestamps
  - **Twitter**: Social media context, short-form, engagement metrics
  - **Text**: Unstructured thoughts, raw content, raw notes
  - **Other**: Any other content type

### Step 3: Extract Topic Keywords
- Identify main topic from headline, category, or keyThemes
- Select 1-2 most significant keywords
- Avoid generic terms, prefer specific subject matter
- Examples: "healthcare", "ai", "python", "productivity"

### Step 4: Generate Base ID
- Format: `{source-type}-{date}-{topic}`
- Source prefix from detectedSource (lowercase)
- Date in YYYY-MM-DD format
- Topic as 1-2 words (hyphen-separated, lowercase)
- Example: `podcast-2024-12-08-ai-healthcare`

### Step 5: Quality Check & Return
- Ensure ID is lowercase, hyphen-separated
- Validate length (20-50 characters preferred)
- Return JSON with:
  - `contentId`: Generated ID
  - `detectedSource`: Confirmed source type
  - `sourceConfidence`: 0.0-1.0 confidence in source detection
  - `rationale`: Brief explanation of why this source type

## ID Generation Rules

### Source-Based Naming
- **Podcast**: `podcast-{date}-{topic}`
- **Transcript**: `transcript-{date}-{topic}`
- **Article**: `article-{date}-{topic}`
- **YouTube**: `youtube-{date}-{topic}`
- **Twitter**: `tweet-{date}-{topic}`
- **Text**: `text-{date}-{topic}`
- **Other**: `content-{date}-{topic}`

### Topic Selection
- Extract from headline (first 2-3 words)
- If headline too generic, use category or first keyTheme
- Avoid articles (a, the, and, or)
- Use only letters, numbers, hyphens
- Maximum 3 words

### Date Handling
- Use current date in YYYY-MM-DD format
- Or extract from context field if content has original date
- Format consistently

### Examples
```
Input: Podcast about AI in Healthcare (2024-12-08)
Output: podcast-2024-12-08-ai-healthcare

Input: Whisper transcript of React conference talk (2024-12-07)
Output: transcript-2024-12-07-react-conference

Input: Article on productivity hacks (2024-12-06)
Output: article-2024-12-06-productivity-hacks

Input: YouTube tutorial on Python (2024-12-05)
Output: youtube-2024-12-05-python-tutorial
```

## Output Format

Return structured JSON:
```json
{
  "contentId": "podcast-2024-12-08-ai-healthcare",
  "detectedSource": "podcast",
  "sourceConfidence": 0.95,
  "rationale": "Audio transcript with speaker markers, timestamps, and conversational flow indicates podcast source."
}
```

## Important Rules

- Always return valid JSON
- Confidence score reflects certainty in source detection (0.5-1.0 typical range)
- Rationale should be brief (1-2 sentences)
- ID must be unique and reusable
- Never include spaces or special characters (except hyphens)
- Generate ID even if source confidence is moderate (>0.5)
- Prefer readable IDs over random strings
- Keep ID length under 60 characters when possible