---
name: topic-modeling-text-mining
description: Apply LDA, NMF, and other computational methods to discover patterns in large text corpora with appropriate parameter tuning
allowed-tools: Read, Grep, Write, Edit, Glob, Bash, WebFetch
---

# Topic Modeling and Text Mining

Apply LDA, NMF, and other computational methods to discover patterns in large text corpora with appropriate parameter tuning.

## Overview

This skill enables computational analysis of large text collections. It encompasses topic modeling, text mining techniques, and pattern discovery to reveal structures and themes in textual data for humanistic inquiry.

## Capabilities

### Topic Modeling

- LDA implementation
- NMF analysis
- Structural topic models
- Dynamic topic models
- Parameter optimization

### Text Preprocessing

- Tokenization
- Stopword removal
- Lemmatization/stemming
- N-gram extraction
- Document-term matrices

### Pattern Discovery

- Word frequency analysis
- Collocation detection
- Named entity recognition
- Sentiment analysis
- Network extraction

### Visualization

- Word clouds
- Topic distributions
- Temporal trends
- Network graphs
- Interactive displays

## Usage Guidelines

### Analysis Process

1. Prepare text corpus
2. Preprocess documents
3. Select modeling approach
4. Tune parameters
5. Run analysis
6. Interpret results
7. Validate findings

### Parameter Considerations

- Number of topics
- Iteration counts
- Hyperparameters
- Coherence metrics
- Validation approaches

### Interpretation Guidelines

- Examine topic words
- Review representative documents
- Consider domain knowledge
- Validate with close reading
- Acknowledge limitations

## Integration Points

### Related Processes

- Text Mining and Distant Reading
- Corpus Linguistics Analysis
- Network Analysis for Humanities

### Collaborating Skills

- tei-text-encoding
- gis-mapping-humanities
- literary-close-reading

## References

- Digital humanities methodology
- Topic modeling tutorials
- Text analysis tools
- Computational linguistics resources
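
## Example Sketch

The analysis process listed under Usage Guidelines (prepare, preprocess, model, inspect) can be illustrated with a minimal sketch. It is not this skill's prescribed implementation: the toy corpus, the stopword list, and every function name below are hypothetical, and the NMF step uses plain-Python multiplicative updates only to keep the example dependency-free. For a real corpus, an established library such as scikit-learn or gensim would normally be the better choice, and the number of topics and iteration count would need tuning and validation by close reading.

```python
import math
import random
import re
from collections import Counter

# Hypothetical toy corpus and stopword list, for illustration only.
DOCS = [
    "The whale hunted the ship across the sea",
    "The ship sailed the sea chasing the whale",
    "The court heard the trial and the judge ruled",
    "The judge asked the court to end the trial",
]
STOPWORDS = {"the", "and", "to", "a", "of", "across"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_dtm(docs):
    """Return (vocabulary, document-term count matrix as a list of rows)."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[float(c[w]) for w in vocab] for c in counts]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH, used here as a simple fit measure."""
    wh = matmul(w, h)
    return math.sqrt(
        sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))
    )


def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into W (docs x topics) and H (topics x terms)."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(n_topics)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(n_iter):
        # Multiplicative update: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_terms)]
             for i in range(n_topics)]
        # Multiplicative update: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_topics)]
             for i in range(n_docs)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors


def top_words(h, vocab, n=3):
    """Return the n highest-weighted vocabulary terms for each topic row of H."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in h]


vocab, dtm = build_dtm(DOCS)
W, H, errors = nmf(dtm, n_topics=2)
topics = top_words(H, vocab)
print(topics)
```

Each row of `W` gives one document's topic weights, and each row of `H` gives one topic's term weights. Reading the top words alongside the documents that load most heavily on a topic is the starting point for interpretation, not a substitute for it.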