---
name: q-topic-finetuning
description: "Fine-tune and consolidate topic modeling outputs (BERTopic, LDA, etc.) into a theory-driven classification framework for academic manuscripts. Use when processing topic modeling results that need topic consolidation, theoretical classification, domain-specific preservation, multi-category handling, data verification, or Excel updates with final labels."
---

# Q Topic Finetuning Skill

Fine-tune topic modeling outputs into consolidated, theory-driven topic frameworks for academic manuscripts.

## When to Use

- Converting raw topic model outputs (BERTopic, LDA, NMF) into manuscript-ready categories
- Applying theoretical frameworks (legitimacy, stakeholder theory, etc.) to topic clusters
- Consolidating 50+ raw topics into 20-50 theoretically meaningful groups
- Preserving domain-specific distinctions (by entity, event, geography, time)
- Creating reproducible Excel outputs with classification labels

## Workflow Overview

```
Source Data (Topic Model Excel)
    |
    v
1. Load & Analyze Topics --> identify overlaps, unassigned
    |
    v
2. Define Final Topic Structure --> FINAL_TOPICS dictionary
    |
    v
3. Apply Theoretical Framework --> classify each topic
    |
    v
4. Generate Implementation Plan (MD)
    |
    v
5. Update Source Data with Labels (Excel)
```

## Core Principles

### Preservation Rules (Customize per Domain)
Identify what should NEVER be merged based on theoretical importance:
- **Entity-specific**: Different companies, teams, people
- **Event-specific**: Different conferences, tournaments, time periods
- **Geography-specific**: Different countries, regions
- **Stakeholder-specific**: Different actor perspectives
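
A preservation rule can be checked mechanically once the final structure exists. The following is a minimal sketch, assuming a hypothetical `PRESERVED_IDS` set of original topic IDs that must stay unmerged and a `FINAL_TOPICS` dictionary shaped like the one in Pattern 1; the labels and IDs are illustrative only.

```python
def find_preservation_violations(final_topics, preserved_ids):
    """Return {topic_id: [final codes]} for preserved IDs merged with other sources."""
    violations = {}
    for code, data in final_topics.items():
        sources = data['sources']
        # A final topic with a single source keeps that topic separate.
        if len(sources) > 1:
            for tid in sources:
                if tid in preserved_ids:
                    violations.setdefault(tid, []).append(code)
    return violations


# Illustrative data (hypothetical labels and IDs)
FINAL_TOPICS = {
    'A1': {'label': 'Preserved entity topic', 'theme': 'Cross-cutting', 'sources': [8]},
    'A2': {'label': 'Merged general topic', 'theme': 'Pragmatic-Fan', 'sources': [3, 12, 17]},
}
PRESERVED_IDS = {8, 12}  # hypothetical: topics that must never be merged

violations = find_preservation_violations(FINAL_TOPICS, PRESERVED_IDS)
print(violations)  # {12: ['A2']}
```

An empty result means every preserved topic stands alone in its final code; any entry points to the final code where the merge happened.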

### Theoretical Framework (Template)
Replace with your relevant framework:

| Type | Description | Example Topics |
|------|-------------|----------------|
| **Category A** | Definition | Topics fitting A |
| **Category B** | Definition | Topics fitting B |
| **Category C** | Definition | Topics fitting C |
| **Cross-cutting** | Spans multiple | Topics by entity/domain |

**Example: Legitimacy Framework (Suchman, 1995)**
- Cognitive: Institutional recognition, taken-for-granted status
- Pragmatic: Direct stakeholder benefits, practical interests
- Moral: Normative evaluation, values alignment

### Multi-Category Topics
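Some topics belong to more than one category:
- Track each such topic explicitly in the `assignments` dictionary
- Calculate the overlap (documents counted more than once) so category totals can be reconciled
- Display the categories as a semicolon-separated string: "Category A; Cross-cutting"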

## Required Inputs

1. **Topic model output** (Excel/CSV)
   - Columns: Topic ID, Count, Name/Label, Keywords, Representative_Docs (optional)

2. **Merge recommendations** (optional)
   - Sheets: MERGE_GROUPS, INDEPENDENT_TOPICS

3. **Document data** (for label updates)
   - Contains individual documents with Topic ID column

## Key Code Patterns

### Pattern 1: Final Topic Definition

```python
FINAL_TOPICS = {
    'A1': {
        'label': 'Descriptive Label for Topic',
        'theme': 'Category-Subcategory',  # e.g., 'Pragmatic-Fan'
        'sources': [8, 12, 45]  # Original topic IDs to merge
    },
    'A2': {
        'label': 'Another Topic Label',
        'theme': 'Category-Subcategory',
        'sources': [3, 17, 33]
    },
    # Topics can appear in multiple final topics for multi-category
}
```
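
Before building assignments, it can help to check that every entry follows the structure above. This is a minimal sketch; `validate_final_topics` and `REQUIRED_KEYS` are hypothetical helper names, and the specific checks are assumptions about what counts as a well-formed entry.

```python
REQUIRED_KEYS = {'label', 'theme', 'sources'}


def validate_final_topics(final_topics):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for code, data in final_topics.items():
        missing = REQUIRED_KEYS - set(data)
        if missing:
            problems.append(f"{code}: missing keys {sorted(missing)}")
            continue
        if not data['sources']:
            problems.append(f"{code}: 'sources' is empty")
        elif len(set(data['sources'])) != len(data['sources']):
            problems.append(f"{code}: duplicate source IDs")
    return problems


# Illustrative input with two deliberate mistakes
sample = {
    'A1': {'label': 'Descriptive Label for Topic', 'theme': 'Category-Subcategory',
           'sources': [8, 12, 45]},
    'A2': {'label': 'Another Topic Label', 'theme': 'Category-Subcategory', 'sources': []},
    'A3': {'label': 'Third Topic Label', 'sources': [3, 3]},
}
problems = validate_final_topics(sample)
for line in problems:
    print(line)
```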

### Pattern 2: Assignment Mapping

```python
assignments = {}
for code, data in FINAL_TOPICS.items():
    for tid in data['sources']:
        if tid not in assignments:
            assignments[tid] = []
        assignments[tid].append((code, data['theme']))

# Find multi-category topics
multi_cat = {tid: assigns for tid, assigns in assignments.items()
             if len(assigns) > 1}
```

### Pattern 3: Overlap Calculation

```python
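# `topics` maps each original topic ID to its metadata, e.g. {8: {'count': 120}};
# `assignments` is the dictionary built in Pattern 2.
total_overlap = sum(
    topics[tid]['count'] * (len(assigns) - 1)
    for tid, assigns in assignments.items()
    if len(assigns) > 1
)

# Verification: non_outlier_docs + total_overlap = table1_total
# (each multi-category topic is counted once per additional category)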
```
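
The reconciliation identity is easiest to see with small numbers. The sketch below uses made-up counts and placeholder labels: topic 12 (80 documents) belongs to two final topics, so it is counted twice in the category table and contributes 80 documents of overlap.

```python
# Illustrative counts per original topic ID (made-up numbers)
topics = {8: {'count': 120}, 12: {'count': 80}, 45: {'count': 50}}

FINAL_TOPICS = {
    'A1': {'label': 'Label A1', 'theme': 'Category A', 'sources': [8, 12]},
    'C1': {'label': 'Label C1', 'theme': 'Cross-cutting', 'sources': [12, 45]},
}

# Same mapping as Pattern 2: topic ID -> list of (code, theme)
assignments = {}
for code, data in FINAL_TOPICS.items():
    for tid in data['sources']:
        assignments.setdefault(tid, []).append((code, data['theme']))

# Documents counted more than once across final topics
total_overlap = sum(
    topics[tid]['count'] * (len(assigns) - 1)
    for tid, assigns in assignments.items()
    if len(assigns) > 1
)

# Each document counted exactly once
non_outlier_docs = sum(t['count'] for t in topics.values())

# Per-code totals as they would appear in the category table
code_totals = {
    code: sum(topics[tid]['count'] for tid in data['sources'])
    for code, data in FINAL_TOPICS.items()
}
table1_total = sum(code_totals.values())

print(non_outlier_docs, total_overlap, table1_total)  # 250 80 330
assert non_outlier_docs + total_overlap == table1_total
```

If the assertion fails, a topic is either missing from every final topic or its count differs between the topic table and the document data.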

### Pattern 4: Excel Label Update

```python
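TOPIC_MAPPING = {
    0: [('A1', 'Category')],
    1: [('A2', 'Category')],
    4: [('B1', 'Category A'), ('C1', 'Cross-cutting')],  # Multi-category
    # ... all topic IDs
}

# Maps each final code to its label; a missing code falls back to the code itself
FINAL_LABELS = {
    'A1': 'Descriptive Label for Topic',
    'A2': 'Another Topic Label',
    # ... all final codes
}

def _join_unique(values):
    # dict.fromkeys deduplicates in first-seen order; set() would not keep order
    return '; '.join(dict.fromkeys(values)) or 'Unknown'

def get_final_codes(topic_id):
    return _join_unique(code for code, _ in TOPIC_MAPPING.get(topic_id, []))

def get_final_labels(topic_id):
    return _join_unique(FINAL_LABELS.get(code, code) for code, _ in TOPIC_MAPPING.get(topic_id, []))

def get_themes(topic_id):
    return _join_unique(theme for _, theme in TOPIC_MAPPING.get(topic_id, []))

# df: document-level DataFrame with a 'Topic' column of original topic IDs
df['Final_Topic_Code'] = df['Topic'].apply(get_final_codes)
df['Final_Topic_Label'] = df['Topic'].apply(get_final_labels)
df['Category_Theme'] = df['Topic'].apply(get_themes)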
```

## Handling Common Requests

| User Request | Action |
|--------------|--------|
| "Preserve X separately" | Add as independent topic code |
| "Merge these topics" | Combine source IDs into single code |
| "Exact counts please" | Replace ~ approximations with computed totals |
| "Integrate into tables" | Remove standalone sections, embed in category tables |
| "Count mismatch?" | Explain multi-category overlap, show reconciliation |
| "Keep original style" | Preserve existing template structure when updating |

## Script Templates

See `scripts/` for reference implementations:
- `generate_implementation_plan.py` - Full plan generation
- `update_excel_with_labels.py` - Excel column updates

Adapt these scripts by:
1. Updating FINAL_TOPICS with your topic structure
2. Replacing FINAL_LABELS with your labels
3. Modifying theme categories to match your framework

## Verification Checklist

- [ ] All non-outlier topics assigned to at least one category
- [ ] Multi-category topics explicitly tracked
- [ ] Overlap reconciliation verified (non-outlier documents + overlap = category table total)
- [ ] Domain-specific topics preserved separately
- [ ] Category subtotals sum to the reported grand total
- [ ] Output file has new classification columns
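
The first checklist item can be automated. This is a minimal sketch, assuming outlier documents carry topic ID `-1` (BERTopic's convention; adjust or disable for models without an outlier topic). `find_unassigned` is a hypothetical helper name and the data is illustrative.

```python
def find_unassigned(topic_ids, final_topics, outlier_id=-1):
    """Return sorted original topic IDs that appear in no final topic's sources."""
    assigned = {tid for data in final_topics.values() for tid in data['sources']}
    return sorted(
        tid for tid in topic_ids
        if tid != outlier_id and tid not in assigned
    )


# Illustrative data: topic 1 is multi-category, topics 2 and 4 were forgotten
all_topic_ids = [-1, 0, 1, 2, 3, 4]
FINAL_TOPICS = {
    'A1': {'label': 'Label A1', 'theme': 'Category A', 'sources': [0, 1]},
    'B1': {'label': 'Label B1', 'theme': 'Category B', 'sources': [1, 3]},
}

unassigned = find_unassigned(all_topic_ids, FINAL_TOPICS)
print(unassigned)  # [2, 4]
```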