---
name: survey-analyzer
description: Analyze survey responses with Likert scale analysis, cross-tabulations, sentiment scoring, and frequency distributions with visualizations.
---

# Survey Analyzer

Comprehensive survey data analysis with Likert scales, cross-tabs, and sentiment analysis.

## Features

- **Likert Scale Analysis**: Agreement scale scoring and visualization
- **Cross-Tabulation**: Relationship analysis between categorical variables
- **Frequency Analysis**: Response distributions and percentages
- **Sentiment Scoring**: Text response sentiment analysis
- **Open-Ended Analysis**: Theme extraction from text responses
- **Statistical Tests**: Chi-square, correlations, significance testing
- **Visualizations**: Bar charts, heatmaps, word clouds, distribution plots
- **Report Generation**: Comprehensive PDF/HTML reports

## Quick Start

```python
from survey_analyzer import SurveyAnalyzer

analyzer = SurveyAnalyzer()

# Load survey data
analyzer.load_csv('survey_responses.csv')

# Analyze Likert scale question
results = analyzer.likert_analysis('satisfaction', scale_type='agreement')
print(f"Mean score: {results['mean_score']:.2f}")

# Cross-tabulation
crosstab = analyzer.crosstab('age_group', 'product_preference')
print(crosstab)

# Generate report
analyzer.generate_report('survey_report.pdf')
```

## CLI Usage

```bash
# Analyze Likert scale
python survey_analyzer.py --data survey.csv --likert satisfaction --output results.pdf

# Cross-tabulation
python survey_analyzer.py --data survey.csv --crosstab age_group product_preference --output crosstab.png

# Sentiment analysis
python survey_analyzer.py --data survey.csv --sentiment comments --output sentiment.html

# Full report
python survey_analyzer.py --data survey.csv --report --output full_report.pdf
```

## API Reference

### SurveyAnalyzer Class

```python
from typing import Dict, List

import pandas as pd


class SurveyAnalyzer:
    def __init__(self): ...

    # Data Loading
    def load_csv(self, filepath, **kwargs) -> 'SurveyAnalyzer': ...
    def load_data(self, data: pd.DataFrame) -> 'SurveyAnalyzer': ...

    # Likert Scale Analysis
    def likert_analysis(self, column, scale_type='agreement', labels=None) -> Dict: ...
    def likert_comparison(self, columns: List[str]) -> pd.DataFrame: ...
    def plot_likert(self, column, output, scale_type='agreement') -> str: ...
    def plot_likert_comparison(self, columns: List[str], output): ...

    # Frequency Analysis
    def frequency_table(self, column) -> pd.DataFrame: ...
    def multiple_choice(self, column, delimiter=',') -> pd.DataFrame: ...
    def plot_frequencies(self, column, output, top_n=None) -> str: ...

    # Cross-Tabulation
    def crosstab(self, row_var, col_var, normalize=None) -> pd.DataFrame: ...
    def chi_square_test(self, row_var, col_var) -> Dict: ...
    def plot_crosstab(self, row_var, col_var, output) -> str: ...

    # Sentiment Analysis
    def sentiment_analysis(self, column) -> pd.DataFrame: ...
    def sentiment_summary(self, column) -> Dict: ...
    def plot_sentiment(self, column, output) -> str: ...

    # Open-Ended Analysis
    def word_frequency(self, column, top_n=20) -> pd.DataFrame: ...
    def word_cloud(self, column, output) -> str: ...
    def extract_themes(self, column, n_themes=5) -> List[str]: ...

    # Statistics
    def satisfaction_score(self, columns: List[str]) -> Dict: ...
    def nps_score(self, column) -> Dict: ...
    def response_rate(self) -> Dict: ...
    def demographics_summary(self, columns: List[str]) -> pd.DataFrame: ...

    # Segmentation and Trends
    def filter(self, column, value): ...
    def clear_filter(self): ...
    def compare_segments(self, segment_col, metric_col): ...
    def trend_analysis(self, metric, time_col, period): ...
    def plot_trends(self, trends, output): ...

    # Reporting
    def generate_report(self, output, format='pdf') -> str: ...
    def set_report_sections(self, sections: List[str]): ...
    def summary(self) -> str: ...
```

## Likert Scale Analysis

### Standard Scales

```python
# 5-point agreement scale
analyzer.likert_analysis('satisfaction', scale_type='agreement')
# 1=Strongly Disagree, 2=Disagree, 3=Neutral, 4=Agree, 5=Strongly Agree

# 5-point frequency scale
analyzer.likert_analysis('usage', scale_type='frequency')
# 1=Never, 2=Rarely, 3=Sometimes, 4=Often, 5=Always

# Custom scale
analyzer.likert_analysis('rating', scale_type='custom',
                         labels=['Poor', 'Fair', 'Good', 'Excellent'])
```

### Results

```python
results = analyzer.likert_analysis('satisfaction')
# {
#     'mean_score': 4.07,
#     'median': 4,
#     'mode': 4,
#     'distribution': {1: 2, 2: 5, 3: 15, 4: 40, 5: 38},
#     'percentages': {1: 2%, 2: 5%, 3: 15%, 4: 40%, 5: 38%},
#     'top_2_box': 78%,      # % Agree + Strongly Agree
#     'bottom_2_box': 7%     # % Disagree + Strongly Disagree
# }
```

### Visualization

```python
# Stacked bar chart
analyzer.plot_likert('satisfaction', 'likert_chart.png')

# Compare multiple questions
comparison = analyzer.likert_comparison(['quality', 'value', 'service'])
analyzer.plot_likert_comparison(['quality', 'value', 'service'], 'comparison.png')
```

## Frequency Analysis

### Single Choice

```python
freq = analyzer.frequency_table('age_group')
#          Count  Percentage
# 18-24       45       22.5%
# 25-34       78       39.0%
# 35-44       52       26.0%
# 45+         25       12.5%

# Plot
analyzer.plot_frequencies('age_group', 'age_distribution.png')
```

### Multiple Choice

For questions allowing multiple selections:

```python
# Data format: "Option A, Option B, Option C"
results = analyzer.multiple_choice('features_liked', delimiter=',')
#             Count  Percentage
# Price         120       60.0%
# Quality        95       47.5%
# Design         80       40.0%
# Durability     70       35.0%

analyzer.plot_frequencies('features_liked', 'features.png', top_n=10)
```

Percentages are relative to the number of respondents, so they can sum to more than 100%.

## Cross-Tabulation

### Basic Cross-Tab

```python
crosstab = analyzer.crosstab('age_group', 'satisfaction')
#         Satisfied  Neutral  Dissatisfied
# 18-24          30       10             5
# 25-34          60       15             3
# 35-44          40        8             4
# 45+            18        5             2

# With percentages
crosstab_pct = analyzer.crosstab('age_group', 'satisfaction',
                                 normalize='index')  # Row percentages
```

### Statistical Testing

```python
result = analyzer.chi_square_test('age_group', 'satisfaction')
# {
#     'statistic': 12.45,
#     'p_value': 0.014,
#     'significant': True,
#     'interpretation': 'There is a significant relationship between
#                        age_group and satisfaction (p=0.014)'
# }
```

### Visualization

```python
# Heatmap
analyzer.plot_crosstab('age_group', 'satisfaction', 'crosstab_heatmap.png')
```

## Sentiment Analysis

Analyze open-ended text responses:

```python
# Analyze all comments
sentiment_df = analyzer.sentiment_analysis('comments')
#    comment               polarity  sentiment
# 0  "Great product!"          0.8   Positive
# 1  "Could be better"         0.1   Neutral
# 2  "Very disappointed"      -0.6   Negative

# Summary
summary = analyzer.sentiment_summary('comments')
# {
#     'positive': 65%,
#     'neutral': 20%,
#     'negative': 15%,
#     'avg_polarity': 0.35
# }

# Visualize
analyzer.plot_sentiment('comments', 'sentiment_distribution.png')
```

## Open-Ended Analysis

### Word Frequency

```python
words = analyzer.word_frequency('comments', top_n=20)
#       Word  Frequency
# 0    great         45
# 1  quality         38
# 2    price         32
# ...
```

### Word Cloud

```python
analyzer.word_cloud('comments', 'wordcloud.png')
```

### Theme Extraction

```python
themes = analyzer.extract_themes('feedback', n_themes=5)
# ['product quality', 'customer service', 'pricing',
#  'delivery speed', 'user experience']
```

## Satisfaction Metrics

### Net Promoter Score (NPS)

```python
nps = analyzer.nps_score('recommendation')  # 0-10 scale
# {
#     'promoters': 65%,    # 9-10
#     'passives': 25%,     # 7-8
#     'detractors': 10%,   # 0-6
#     'nps': 55            # % promoters - % detractors
# }
```

### Overall Satisfaction

```python
satisfaction = analyzer.satisfaction_score([
    'product_quality',
    'customer_service',
    'value_for_money',
    'ease_of_use'
])
# {
#     'overall_score': 4.3,
#     'category_scores': {...},
#     'satisfaction_rate': 86%   # % scoring 4-5
# }
```

## Demographics Analysis

```python
demographics = analyzer.demographics_summary([
    'age_group',
    'gender',
    'location',
    'income_range'
])
# Returns frequency tables for each demographic variable
```

## Response Rate Analysis

```python
response_rate = analyzer.response_rate()
# {
#     'total_respondents': 200,
#     'completion_rate': 85%,
#     'average_time': '5m 30s',
#     'dropout_points': {
#         'question_5': 8%,
#         'question_12': 5%
#     }
# }
```

## Report Generation

### Comprehensive Report

```python
analyzer.generate_report('survey_report.pdf', format='pdf')
```

Report includes:

- Executive summary
- Response rate and demographics
- Question-by-question analysis
- Likert scale visualizations
- Cross-tabulations
- Sentiment analysis
- Key findings and recommendations

### Custom Report Sections

```python
analyzer.set_report_sections([
    'executive_summary',
    'demographics',
    'likert_questions',
    'cross_tabs',
    'sentiment',
    'recommendations'
])
```

## Advanced Features

### Filter by Segment

```python
# Analyze a subset of responses
analyzer.filter('age_group', '25-34')
results = analyzer.likert_analysis('satisfaction')
analyzer.clear_filter()
```

### Compare Segments

```python
comparison = analyzer.compare_segments(
    segment_col='age_group',
    metric_col='satisfaction'
)
# Shows how each segment scored on the metric
```

### Trend Analysis

For longitudinal surveys:

```python
trends = analyzer.trend_analysis(
    metric='satisfaction',
    time_col='survey_date',
    period='month'
)
analyzer.plot_trends(trends, 'satisfaction_trend.png')
```

## Dependencies

- pandas>=2.0.0
- numpy>=1.24.0
- scipy>=1.10.0
- textblob>=0.17.0
- matplotlib>=3.7.0
- seaborn>=0.12.0
- wordcloud>=1.9.0
- reportlab>=4.0.0
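
## Example: Likert Summary Statistics

The summary fields reported by `likert_analysis` (mean, median, mode, top-2 and bottom-2 box) can all be derived from a distribution of counts. This is a standalone sketch, not the package's implementation; the helper `likert_summary` and its `scale_min`/`scale_max` parameters are hypothetical, and it assumes a numeric scale coded from `scale_min` to `scale_max`.

```python
from statistics import median


def likert_summary(distribution, scale_min=1, scale_max=5):
    """Summarize Likert counts given as {scale_point: count} (hypothetical helper)."""
    n = sum(distribution.values())
    # Expand counts into individual responses so median() sees every answer
    responses = [point for point in sorted(distribution)
                 for _ in range(distribution[point])]
    percentages = {point: 100.0 * count / n for point, count in distribution.items()}
    return {
        'mean_score': sum(responses) / n,
        'median': median(responses),
        # Ties resolve to the first scale point with the highest count
        'mode': max(distribution, key=distribution.get),
        'percentages': percentages,
        # Top-2 box: the two highest points (e.g. Agree + Strongly Agree)
        'top_2_box': percentages.get(scale_max, 0.0) + percentages.get(scale_max - 1, 0.0),
        # Bottom-2 box: the two lowest points (e.g. Strongly Disagree + Disagree)
        'bottom_2_box': percentages.get(scale_min, 0.0) + percentages.get(scale_min + 1, 0.0),
    }


stats = likert_summary({1: 2, 2: 5, 3: 15, 4: 40, 5: 38})
print(stats['mean_score'], stats['median'], stats['mode'], stats['top_2_box'])
```

For the distribution in the Results example this gives a mean of 4.07, a median of 4, a mode of 4 (40 "Agree" responses versus 38 "Strongly Agree"), a top-2 box of 78% and a bottom-2 box of 7%.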
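
## Example: Multiple-Choice Counts

Multi-select answers stored as delimited strings have to be split before counting, and each option's percentage is taken over respondents rather than over selections. The sketch below shows one way to do that in plain Python; `multiple_choice` here is a hypothetical helper, and it assumes blank answers still count toward the respondent total.

```python
from collections import Counter


def multiple_choice(responses, delimiter=','):
    """Count options in delimited multi-select answers (hypothetical helper).

    Returns {option: (count, percent_of_respondents)}, most common first."""
    n_respondents = len(responses)
    counts = Counter()
    for answer in responses:
        if not answer:
            continue  # blank answer: respondent still counts in the denominator
        # A set avoids double-counting an option repeated within one answer
        options = {opt.strip() for opt in answer.split(delimiter) if opt.strip()}
        counts.update(options)
    return {
        option: (count, 100.0 * count / n_respondents)
        for option, count in counts.most_common()
    }


answers = ["Price, Quality", "Price", "Design, Price", ""]
result = multiple_choice(answers)
print(result)
```

Because one respondent can pick several options, the percentages are not expected to add up to 100%.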
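
## Example: Chi-Square Test of Independence

A chi-square test of independence compares each observed cell count with the count expected if rows and columns were unrelated, $E_{ij} = R_i C_j / N$, sums $(O_{ij} - E_{ij})^2 / E_{ij}$, and refers the total to a chi-square distribution with $(r-1)(c-1)$ degrees of freedom. The sketch below does this in plain Python, computing the p-value from a series expansion of the regularized incomplete gamma function; in practice `scipy.stats.chi2_contingency` (scipy is already a dependency) is the usual route. The helper names are hypothetical, and the input is assumed to have no empty rows or columns.

```python
import math


def chi2_sf(stat, dof, max_terms=10_000):
    """Upper-tail probability P(X >= stat) of a chi-square with `dof` degrees of freedom."""
    if stat <= 0:
        return 1.0
    a, x = dof / 2.0, stat / 2.0
    # Series for the regularized lower incomplete gamma function P(a, x)
    term = total = 1.0 / a
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < 1e-15 * total:
            break
    lower = math.exp(a * math.log(x) - x - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def chi_square_test(table, alpha=0.05):
    """Pearson chi-square test on a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count under independence
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    p_value = chi2_sf(statistic, dof)
    return {'statistic': statistic, 'dof': dof, 'p_value': p_value,
            'significant': p_value < alpha}


# Counts from the basic cross-tab example (age groups x satisfied/neutral/dissatisfied)
observed = [[30, 10, 5], [60, 15, 3], [40, 8, 4], [18, 5, 2]]
test_result = chi_square_test(observed)
print(test_result)
```

On those counts the statistic is about 3.38 with 6 degrees of freedom (p ≈ 0.76). Several expected counts in that table fall below 5, the common rule-of-thumb minimum for the chi-square approximation, so sparse categories may need merging before testing.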
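
## Example: Sentiment Labels and Summary

The `sentiment_analysis` output pairs each comment with a polarity score and a label. With TextBlob (listed under Dependencies), a polarity in [-1, 1] is available as `TextBlob(text).sentiment.polarity`. The sketch below covers only the step after scoring: it maps polarities to labels and builds a summary. The `neutral_band` cutoff of 0.1 is an assumption chosen so that the 0.1 example above is labeled Neutral, and both helpers are hypothetical.

```python
def label_sentiment(polarity, neutral_band=0.1):
    """Map a polarity in [-1, 1] to a label; |polarity| <= neutral_band is Neutral (assumed cutoff)."""
    if polarity > neutral_band:
        return 'Positive'
    if polarity < -neutral_band:
        return 'Negative'
    return 'Neutral'


def sentiment_summary(polarities, neutral_band=0.1):
    """Percentage of responses per label plus the mean polarity."""
    n = len(polarities)
    labels = [label_sentiment(p, neutral_band) for p in polarities]
    summary = {name.lower(): 100.0 * labels.count(name) / n
               for name in ('Positive', 'Neutral', 'Negative')}
    summary['avg_polarity'] = sum(polarities) / n
    return summary


# Polarity scores as in the sentiment_analysis example; in practice these could
# come from TextBlob(comment).sentiment.polarity
scores = [0.8, 0.1, -0.6]
labels = [label_sentiment(s) for s in scores]
overview = sentiment_summary(scores)
print(labels, overview)
```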
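
## Example: Word Frequency

Word-frequency tables for open-ended answers usually come from lowercasing, tokenizing, and dropping stopwords and very short tokens before counting. This is a minimal sketch of that pipeline; the tokenizer regex, the `min_length` threshold, and the small `STOPWORDS` set are illustrative assumptions, not the package's settings.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumption; real tools use much larger lists)
STOPWORDS = {'the', 'and', 'was', 'for', 'but', 'this', 'that', 'with', 'very', 'not'}


def word_frequency(texts, top_n=20, min_length=3):
    """Count lowercase word tokens across responses, skipping stopwords and short tokens."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if len(t) >= min_length and t not in STOPWORDS)
    return counts.most_common(top_n)


comments = ["Great quality, great price", "The quality is great", "Price was too high"]
top_words = word_frequency(comments, top_n=3)
print(top_words)
```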
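
## Example: Net Promoter Score

NPS is the percentage of promoters (ratings 9-10) minus the percentage of detractors (ratings 0-6), with 7-8 counted as passives. The helper below is a hypothetical standalone version of that calculation; the 20 ratings in the usage line are made up to reproduce the 65/25/10 split shown in the NPS example.

```python
def nps_score(scores):
    """Net Promoter Score from 0-10 ratings: % promoters (9-10) minus % detractors (0-6)."""
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("NPS ratings must be on a 0-10 scale")
    n = len(scores)
    promoters = 100.0 * sum(1 for s in scores if s >= 9) / n
    detractors = 100.0 * sum(1 for s in scores if s <= 6) / n
    passives = 100.0 - promoters - detractors  # everything else (ratings 7-8)
    return {'promoters': promoters, 'passives': passives,
            'detractors': detractors, 'nps': promoters - detractors}


ratings = [10] * 13 + [8] * 5 + [3] * 2  # hypothetical: 13 promoters, 5 passives, 2 detractors
nps = nps_score(ratings)
print(nps)
```

The score therefore ranges from -100 (all detractors) to +100 (all promoters).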
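
## Example: Overall Satisfaction

`satisfaction_score` combines several rating questions into one score plus a share of satisfied answers. The sketch below is one plausible definition, not necessarily the package's: `overall_score` pools every individual rating (so questions with more answers weigh more than a mean of category means would give them), and `satisfaction_rate` is the share of ratings at or above 4 on a 5-point scale. The helper and its input format are hypothetical.

```python
from statistics import mean


def satisfaction_score(ratings_by_question, satisfied_min=4):
    """Combine 1-5 ratings from several questions (hypothetical helper).

    ratings_by_question maps each question name to its list of ratings."""
    category_scores = {q: mean(r) for q, r in ratings_by_question.items()}
    # Pool every rating across questions for the overall figures
    all_ratings = [x for r in ratings_by_question.values() for x in r]
    satisfied = sum(1 for x in all_ratings if x >= satisfied_min)
    return {
        'overall_score': mean(all_ratings),
        'category_scores': category_scores,
        'satisfaction_rate': 100.0 * satisfied / len(all_ratings),
    }


ratings = {
    'product_quality': [5, 4, 4, 3],
    'customer_service': [5, 5, 4, 2],
}
overall = satisfaction_score(ratings)
print(overall)
```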
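
## Example: Completion Rate and Dropout Points

Completion rate and dropout points depend on how "completed" and "dropped out" are defined, which the output above does not spell out. The sketch below assumes each respondent is a list of answers in question order with `None` for an unanswered question, treats answering the last question as completion, and records a dropout at the question after the last answer. These definitions and the helper are hypothetical; average completion time would also need timestamps and is not covered.

```python
def response_rate(responses):
    """Completion rate and dropout points under hypothetical definitions (see lead-in)."""
    total = len(responses)
    completed = 0
    dropouts = {}  # 1-based question number -> respondents who stopped there
    for answers in responses:
        answered = [i for i, a in enumerate(answers) if a is not None]
        if answered and answered[-1] == len(answers) - 1:
            completed += 1
            continue
        # Stopped at the question after the last answered one (question 1 if none answered)
        stop = answered[-1] + 2 if answered else 1
        dropouts[stop] = dropouts.get(stop, 0) + 1
    return {
        'total_respondents': total,
        'completion_rate': 100.0 * completed / total,
        # Sort numerically so question_5 comes before question_12
        'dropout_points': {f'question_{q}': 100.0 * c / total
                           for q, c in sorted(dropouts.items())},
    }


data = [[5, 4, 3], [5, None, 3], [4, 2, None], [3, None, None]]
rates = response_rate(data)
print(rates)
```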
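
## Example: Segment Filtering and Comparison

Filtering to one segment and comparing segments both reduce to selecting or grouping respondents by a categorical column. With pandas this is roughly `df[df[column] == value]` and `df.groupby(segment_col)[metric_col].agg(['count', 'mean'])`; the dependency-free sketch below shows the same logic on a list of dicts. `filter_rows` and `compare_segments` are hypothetical helpers, and missing metric values are assumed to be skipped.

```python
from collections import defaultdict
from statistics import mean


def filter_rows(rows, column, value):
    """Keep only respondents whose `column` equals `value`."""
    return [row for row in rows if row.get(column) == value]


def compare_segments(rows, segment_col, metric_col):
    """Respondent count and mean metric per segment; missing metric values are skipped."""
    groups = defaultdict(list)
    for row in rows:
        value = row.get(metric_col)
        if value is not None:
            groups[row[segment_col]].append(value)
    return {seg: {'n': len(vals), 'mean': mean(vals)}
            for seg, vals in sorted(groups.items())}


rows = [
    {'age_group': '18-24', 'satisfaction': 4},
    {'age_group': '18-24', 'satisfaction': 2},
    {'age_group': '25-34', 'satisfaction': 5},
    {'age_group': '25-34', 'satisfaction': None},
]
segments = compare_segments(rows, 'age_group', 'satisfaction')
subset = filter_rows(rows, 'age_group', '25-34')
print(segments, len(subset))
```

Reporting `n` next to each mean makes small segments easy to spot before comparing their scores.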
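
## Example: Monthly Trend

For longitudinal surveys, a monthly trend is the mean of the metric within each calendar month. In pandas this is roughly `df.groupby(df[time_col].dt.to_period('M'))[metric].mean()`; the sketch below does the same with the standard library. The `trend_analysis` helper here is hypothetical and supports only `'month'` and `'year'` periods.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def trend_analysis(rows, metric, time_col, period='month'):
    """Mean metric per calendar period (hypothetical helper; 'month' or 'year' only)."""
    formats = {'month': '%Y-%m', 'year': '%Y'}
    if period not in formats:
        raise ValueError(f"unsupported period: {period!r}")
    buckets = defaultdict(list)
    for row in rows:
        value = row.get(metric)
        if value is not None:
            buckets[row[time_col].strftime(formats[period])].append(value)
    # 'YYYY-MM' and 'YYYY' keys sort chronologically as strings
    return {key: mean(vals) for key, vals in sorted(buckets.items())}


rows = [
    {'survey_date': date(2024, 1, 5), 'satisfaction': 4},
    {'survey_date': date(2024, 1, 20), 'satisfaction': 3},
    {'survey_date': date(2024, 2, 3), 'satisfaction': 5},
]
trend = trend_analysis(rows, 'satisfaction', 'survey_date')
print(trend)
```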