---
name: Sentiment Analyzer
slug: sentiment-analyzer
description: Analyze text sentiment at scale with nuanced understanding
category: ai-ml
complexity: intermediate
version: "1.0.0"
author: "ID8Labs"
triggers:
  - "analyze sentiment"
  - "sentiment analysis"
  - "opinion mining"
  - "emotion detection"
  - "text sentiment"
tags:
  - sentiment
  - NLP
  - opinion-mining
  - emotion
  - text-analysis
---

# Sentiment Analyzer

The Sentiment Analyzer skill guides you through implementing sentiment analysis systems that understand the emotional tone and opinion in text. From simple positive/negative classification to nuanced aspect-based sentiment and emotion detection, this skill covers the full spectrum of sentiment analysis capabilities.

Sentiment analysis is deceptively complex. Sarcasm, context, domain-specific language, and cultural nuances all challenge simple approaches. This skill helps you choose the right technique for your accuracy requirements, whether that is a fast rule-based system, a fine-tuned classifier, or LLM-based analysis.

Whether you're analyzing customer reviews, social media mentions, support tickets, or survey responses, this skill helps your sentiment analysis reflect what your users actually mean.

## Core Workflows

### Workflow 1: Choose Sentiment Analysis Approach

1. **Define** requirements:
   - Granularity: Binary, ternary, or continuous?
   - Aspects: Overall or aspect-based?
   - Emotions: Sentiment or specific emotions?
   - Languages: Single or multilingual?
   - Volume: Batch or real-time?
2. **Evaluate** options:

   | Approach | Speed | Accuracy | Customizable | Best For |
   |----------|-------|----------|--------------|----------|
   | Rule-based (VADER) | Very fast | Moderate | Limited | Social media, quick analysis |
   | Pre-trained (RoBERTa) | Fast | Good | Fine-tunable | General text |
   | Fine-tuned | Fast | Best | Requires data | Domain-specific |
   | LLM (GPT-4, Claude) | Slow | Excellent | Prompt-based | Nuanced, complex |

3. **Select** based on tradeoffs
4. **Plan** implementation

### Workflow 2: Implement Sentiment Pipeline

1. **Preprocess** text:

   ```python
   def preprocess_for_sentiment(text):
       # The helper functions called here are placeholders to implement for your stack.
       # Preserve sentiment-relevant features
       text = normalize_unicode(text)

       # Handle social media conventions
       text = expand_contractions(text)   # don't -> do not
       text = normalize_elongation(text)  # loooove -> love
       text = handle_negation(text)       # Mark negation scope

       # Preserve but normalize emoji/emoticons
       text = convert_emoji_to_text(text)  # :) -> [HAPPY]

       return text
   ```

2. **Analyze** sentiment:

   ```python
   from transformers import pipeline
   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

   class SentimentAnalyzer:
       def __init__(self, model_type="transformer"):
           self.model_type = model_type
           if model_type == "transformer":
               self.model = pipeline("sentiment-analysis",
                                     model="cardiffnlp/twitter-roberta-base-sentiment")
           elif model_type == "vader":
               self.model = SentimentIntensityAnalyzer()
           else:
               raise ValueError(f"Unknown model_type: {model_type}")

       def analyze(self, text):
           preprocessed = preprocess_for_sentiment(text)
           if self.model_type == "transformer":
               # The pipeline returns a list with one result dict per input
               result = self.model(preprocessed)[0]
               label, score = result["label"], result["score"]
           else:
               # VADER's compound score lies in [-1, 1]; +/-0.05 is the usual cutoff
               compound = self.model.polarity_scores(preprocessed)["compound"]
               label = ("positive" if compound >= 0.05
                        else "negative" if compound <= -0.05 else "neutral")
               score = abs(compound)  # Magnitude as a rough confidence proxy
           return {"text": text, "sentiment": label, "confidence": score}
   ```

3. **Aggregate** for insights:
   - Overall sentiment distribution
   - Sentiment over time
   - Sentiment by segment/topic
4. **Validate** results

### Workflow 3: Aspect-Based Sentiment Analysis

1. **Identify** aspects to track:
   - Product features (price, quality, service)
   - Experience dimensions (speed, accuracy, friendliness)
   - Custom aspects for your domain
2. **Extract** aspects from text:

   ```python
   def extract_aspects(text, aspect_list):
       # Find mentions of known aspects (case-insensitive substring match)
       text_lower = text.lower()
       found_aspects = [a for a in aspect_list if a.lower() in text_lower]

       # Also extract using NER or LLM for unknown aspects
       # (extract_noun_phrases is a placeholder for that step)
       extracted = extract_noun_phrases(text)

       return found_aspects + extracted
   ```

3. **Analyze** sentiment per aspect:

   ```python
   def aspect_sentiment(text, aspects):
       # extract_aspect_context and analyze_sentiment are placeholders
       # for your own context extractor and sentiment function.
       results = {}

       for aspect in aspects:
           # Extract sentences mentioning aspect
           relevant = extract_aspect_context(text, aspect)

           # Analyze sentiment of relevant text
           if relevant:
               sentiment = analyze_sentiment(relevant)
               results[aspect] = sentiment

       return results
   ```

4. **Aggregate** aspect sentiments across documents

## Quick Reference

| Action | Command/Trigger |
|--------|-----------------|
| Analyze sentiment | "Analyze sentiment of [text]" |
| Choose approach | "Best sentiment analysis for [use case]" |
| Aspect-based | "Sentiment by feature for [reviews]" |
| Detect emotions | "Detect emotions in [text]" |
| Handle sarcasm | "How to handle sarcasm in sentiment" |
| Aggregate results | "Summarize sentiment trends" |

## Best Practices

- **Preserve Sentiment Signals**: Don't preprocess away important cues
  - Keep punctuation (!! vs .)
  - Preserve capitalization patterns
  - Keep emoji/emoticons (convert to text)
  - Handle negation explicitly

- **Match Model to Domain**: Pre-trained models have domain bias
  - Twitter models behave differently from product review models
  - Fine-tune or select domain-appropriate models
  - Test on your actual data before deploying

- **Handle Negation Properly**: "Not bad" isn't negative
  - Rule-based: Mark negation scope
  - Neural models: Usually handle automatically
  - Test negation cases explicitly

- **Consider Context**: Sentiment depends on context
  - "Cheap" is positive for budget items, negative for luxury
  - Use aspect-based analysis for nuance
  - Include surrounding context when possible

- **Validate with Humans**: Machine sentiment != human sentiment
  - Sample and manually verify results
  - Calculate agreement metrics
  - Iterate on disagreements

- **Report Uncertainty**: Not all text has clear sentiment
  - Neutral is a valid class
  - Low confidence predictions should be flagged
  - Consider abstaining on ambiguous cases

## Advanced Techniques

### LLM-Based Nuanced Sentiment

Use language models for complex analysis:

```python
import json

def llm_sentiment_analysis(text, aspects=None):
    # `llm` is a placeholder for your LLM client; complete() is assumed to return a string.
    aspect_line = (
        "Also rate sentiment specifically for these aspects: " + ", ".join(aspects)
        if aspects else ""
    )
    prompt = f"""Analyze the sentiment of the following text.

Text: "{text}"

Provide:
1. Overall sentiment (positive/negative/neutral/mixed)
2. Confidence (0-1)
3. Key positive aspects mentioned
4. Key negative aspects mentioned
5. Notable emotional tones (joy, frustration, surprise, etc.)

{aspect_line}

Respond in JSON format."""

    response = llm.complete(prompt)
    return json.loads(response)
```

### Emotion Detection

Go beyond positive/negative to specific emotions:

```python
from transformers import pipeline

# Multi-label emotion classification; top_k=None returns a score for every label
emotion_classifier = pipeline(
    "text-classification",
    model="SamLowe/roberta-base-go_emotions",
    top_k=None
)

def detect_emotions(text):
    # Pass a one-item list so the result is always one list of scores per input
    results = emotion_classifier([text])[0]

    # Filter to significant emotions
    significant = [r for r in results if r["score"] > 0.1]

    return sorted(significant, key=lambda x: x["score"], reverse=True)

# Example output:
# [{"label": "admiration", "score": 0.45},
#  {"label": "joy", "score": 0.32},
#  {"label": "gratitude", "score": 0.28}]
```

### Comparative Sentiment

Detect sentiment comparisons:

```python
import json

def comparative_sentiment(text):
    """
    Detect "A is better than B" patterns.
    """
    prompt = f"""Analyze this text for comparative sentiment.

Text: "{text}"

If the text compares entities, identify:
1. Entity A (the preferred/better one)
2. Entity B (the less preferred/worse one)
3. Dimension of comparison (price, quality, etc.)
4. Strength of preference (slight, moderate, strong)

If no comparison, respond with: {{"comparison": false}}

Respond in JSON."""

    # `llm` is a placeholder client; parse the reply so callers get a dict
    return json.loads(llm.complete(prompt))
```

### Temporal Sentiment Tracking

Analyze sentiment over time:

```python
import pandas as pd

def sentiment_timeline(documents, time_field, window="D"):
    """
    Track sentiment trends over time.

    `window` is a pandas frequency alias, such as "D" for daily buckets.
    """
    # Analyze each document; analyze_sentiment is a placeholder that is
    # assumed to return a signed numeric value under "score"
    results = []
    for doc in documents:
        sentiment = analyze_sentiment(doc["text"])
        results.append({
            "timestamp": doc[time_field],
            "sentiment": sentiment["score"],
            "text": doc["text"]
        })

    # Aggregate by time window
    df = pd.DataFrame(results)
    df["timestamp"] = pd.to_datetime(df["timestamp"])  # Needed for the .dt accessor
    df["window"] = df["timestamp"].dt.floor(window)

    trends = df.groupby("window").agg({
        "sentiment": ["mean", "std", "count"],
        "text": lambda x: list(x)[:3]  # Sample texts
    })

    return trends
```

### Sarcasm Detection

Handle sarcasm before sentiment analysis:

```python
import re

def detect_sarcasm(text):
    """
    Detect potential sarcasm indicators.
    """
    lowered = text.lower()
    indicators = {
        "exaggeration": bool(re.search(r'\b(best|worst|ever|always|never)\b', lowered)),
        "air_quotes": '"' in text,
        "ellipsis": "..." in text,
        "positive_negative_mix": has_mixed_signals(text),  # Placeholder helper
        "hashtags": "#sarcasm" in lowered or "#not" in lowered
    }

    # Use model for detection; sarcasm_model is a placeholder whose
    # predict() is assumed to return a probability in [0, 1]
    sarcasm_score = sarcasm_model.predict(text)

    return {
        "is_sarcastic": sarcasm_score > 0.5,
        "confidence": sarcasm_score,
        "indicators": indicators
    }

def sentiment_with_sarcasm(text):
    sarcasm = detect_sarcasm(text)
    base_sentiment = analyze_sentiment(text)

    # Only flip when the sarcasm score is well above the 0.5 decision boundary
    if sarcasm["is_sarcastic"] and sarcasm["confidence"] > 0.7:
        # Flip sentiment (flip_sentiment is a placeholder helper)
        return flip_sentiment(base_sentiment)

    return base_sentiment
```

## Common Pitfalls to Avoid

- Using generic models on domain-specific text
- Preprocessing away sentiment-relevant features (emoji, punctuation)
- Ignoring negation handling
- Treating neutral as the absence of opinion rather than as an explicit class
- Not validating model outputs against human judgment
- Assuming sarcasm doesn't exist in your data
- Over-weighting extreme sentiments in aggregation
- Reporting sentiment without confidence/uncertainty