{ "cells": [ { "cell_type": "markdown", "id": "650e0648", "metadata": {}, "source": [ "---\n", "title: \"Statistical Analysis with Python\"\n", "description: \"Learn statistical analysis using Python's scipy and pandas libraries with real survey data\"\n", "date: 2025-01-27\n", "lastmod: 2025-01-27\n", "author: \"Zer0-Mistakes Team\"\n", "layout: notebook\n", "difficulty: intermediate\n", "tags: [python, statistics, scipy, data-analysis, surveys]\n", "categories: [Notebooks, Tutorials]\n", "toc: true\n", "comments: true\n", "---\n", "\n", "# Statistical Analysis with Python\n", "\n", "Learn to perform statistical analysis with Python's pandas and SciPy libraries. This tutorial covers descriptive statistics, correlation analysis, hypothesis testing, normality testing, and confidence intervals using real survey response data.\n", "\n", "**What you'll learn:**\n", "- Descriptive statistics (mean, median, mode, variance)\n", "- Correlation analysis\n", "- Hypothesis testing (t-tests, chi-square)\n", "- Normal distribution and normality testing\n", "- Confidence intervals" ] }, { "cell_type": "markdown", "id": "4aa8976a", "metadata": {}, "source": [ "## Setup and Imports" ] }, { "cell_type": "code", "execution_count": 12, "id": "25a1e67b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "✅ Libraries imported successfully!\n", "Pandas: 3.0.0\n", "NumPy: 2.4.2\n", "SciPy: 1.17.0\n" ] } ], "source": [ "# Import statistical and data libraries\n", "import pandas as pd\n", "import numpy as np\n", "import scipy\n", "from scipy import stats\n", "from scipy.stats import ttest_ind, chi2_contingency, pearsonr, spearmanr\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "print(\"✅ Libraries imported successfully!\")\n", "print(f\"Pandas: {pd.__version__}\")\n", "print(f\"NumPy: {np.__version__}\")\n", "print(f\"SciPy: {scipy.__version__}\")" ] }, { "cell_type": "markdown", "id": "63c12e25", "metadata": {}, "source": [ "## Load Survey Data" ] }, { "cell_type": 
"code", "execution_count": 3, "id": "5bd6d2fb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "📊 Survey Data Preview:\n", "Shape: 75 respondents × 13 questions\n", "\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead><tr><th></th><th>respondent_id</th><th>age</th><th>gender</th><th>education</th><th>employment</th><th>income_bracket</th><th>product_satisfaction</th><th>service_rating</th><th>would_recommend</th><th>purchase_frequency</th><th>category_preference</th><th>feedback_length</th><th>response_date</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>1</td><td>28</td><td>Female</td><td>Bachelor's</td><td>Full-time</td><td>50000-75000</td><td>4</td><td>5</td><td>Yes</td><td>Monthly</td><td>Electronics</td><td>142</td><td>2025-01-15</td></tr>\n",
"<tr><th>1</th><td>2</td><td>35</td><td>Male</td><td>Master's</td><td>Full-time</td><td>75000-100000</td><td>5</td><td>4</td><td>Yes</td><td>Weekly</td><td>Electronics</td><td>89</td><td>2025-01-16</td></tr>\n",
"<tr><th>2</th><td>3</td><td>42</td><td>Female</td><td>Bachelor's</td><td>Part-time</td><td>25000-50000</td><td>3</td><td>3</td><td>Maybe</td><td>Quarterly</td><td>Furniture</td><td>156</td><td>2025-01-17</td></tr>\n",
"<tr><th>3</th><td>4</td><td>23</td><td>Male</td><td>High School</td><td>Student</td><td>Under 25000</td><td>4</td><td>4</td><td>Yes</td><td>Monthly</td><td>Electronics</td><td>45</td><td>2025-01-18</td></tr>\n",
"<tr><th>4</th><td>5</td><td>51</td><td>Female</td><td>Doctorate</td><td>Full-time</td><td>100000+</td><td>5</td><td>5</td><td>Yes</td><td>Weekly</td><td>Electronics</td><td>203</td><td>2025-01-19</td></tr>\n",
"<tr><th>5</th><td>6</td><td>31</td><td>Non-binary</td><td>Bachelor's</td><td>Full-time</td><td>50000-75000</td><td>4</td><td>4</td><td>Yes</td><td>Monthly</td><td>Furniture</td><td>78</td><td>2025-01-20</td></tr>\n",
"<tr><th>6</th><td>7</td><td>45</td><td>Male</td><td>Master's</td><td>Self-employed</td><td>75000-100000</td><td>3</td><td>2</td><td>No</td><td>Rarely</td><td>Electronics</td><td>312</td><td>2025-01-21</td></tr>\n",
"<tr><th>7</th><td>8</td><td>27</td><td>Female</td><td>Bachelor's</td><td>Full-time</td><td>50000-75000</td><td>5</td><td>5</td><td>Yes</td><td>Monthly</td><td>Electronics</td><td>67</td><td>2025-01-22</td></tr>\n",
"<tr><th>8</th><td>9</td><td>38</td><td>Male</td><td>Bachelor's</td><td>Full-time</td><td>75000-100000</td><td>4</td><td>4</td><td>Yes</td><td>Quarterly</td><td>Furniture</td><td>95</td><td>2025-01-23</td></tr>\n",
"<tr><th>9</th><td>10</td><td>56</td><td>Female</td><td>High School</td><td>Retired</td><td>25000-50000</td><td>4</td><td>5</td><td>Yes</td><td>Monthly</td><td>Furniture</td><td>124</td><td>2025-01-24</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " respondent_id age gender education employment income_bracket \\\n", "0 1 28 Female Bachelor's Full-time 50000-75000 \n", "1 2 35 Male Master's Full-time 75000-100000 \n", "2 3 42 Female Bachelor's Part-time 25000-50000 \n", "3 4 23 Male High School Student Under 25000 \n", "4 5 51 Female Doctorate Full-time 100000+ \n", "5 6 31 Non-binary Bachelor's Full-time 50000-75000 \n", "6 7 45 Male Master's Self-employed 75000-100000 \n", "7 8 27 Female Bachelor's Full-time 50000-75000 \n", "8 9 38 Male Bachelor's Full-time 75000-100000 \n", "9 10 56 Female High School Retired 25000-50000 \n", "\n", " product_satisfaction service_rating would_recommend purchase_frequency \\\n", "0 4 5 Yes Monthly \n", "1 5 4 Yes Weekly \n", "2 3 3 Maybe Quarterly \n", "3 4 4 Yes Monthly \n", "4 5 5 Yes Weekly \n", "5 4 4 Yes Monthly \n", "6 3 2 No Rarely \n", "7 5 5 Yes Monthly \n", "8 4 4 Yes Quarterly \n", "9 4 5 Yes Monthly \n", "\n", " category_preference feedback_length response_date \n", "0 Electronics 142 2025-01-15 \n", "1 Electronics 89 2025-01-16 \n", "2 Furniture 156 2025-01-17 \n", "3 Electronics 45 2025-01-18 \n", "4 Electronics 203 2025-01-19 \n", "5 Furniture 78 2025-01-20 \n", "6 Electronics 312 2025-01-21 \n", "7 Electronics 67 2025-01-22 \n", "8 Furniture 95 2025-01-23 \n", "9 Furniture 124 2025-01-24 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Load the survey response dataset\n", "survey = pd.read_csv('/Users/bamr87/github/zer0-mistakes/assets/data/notebooks/survey_responses.csv')\n", "\n", "print(\"📊 Survey Data Preview:\")\n", "print(f\"Shape: {survey.shape[0]} respondents × {survey.shape[1]} questions\\n\")\n", "survey.head(10)" ] }, { "cell_type": "markdown", "id": "58bc6f8d", "metadata": {}, "source": [ "## Descriptive Statistics\n", "\n", "Let's calculate key descriptive statistics for our numerical columns:" ] }, { "cell_type": "code", "execution_count": 5, "id": "7417d28e", "metadata": 
{}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "📈 Descriptive Statistics for Survey Responses:\n", "======================================================================\n", "\n", "AGE:\n", " Mean: 38.28\n", " Median: 37.00\n", " Mode: 26\n", " Std Dev: 11.24\n", " Variance: 126.31\n", " Range: 20 - 62\n", " IQR: 18.00\n", "\n", "PRODUCT SATISFACTION:\n", " Mean: 4.16\n", " Median: 4.00\n", " Mode: 4\n", " Std Dev: 0.74\n", " Variance: 0.54\n", " Range: 2 - 5\n", " IQR: 1.00\n", "\n", "SERVICE RATING:\n", " Mean: 4.07\n", " Median: 4.00\n", " Mode: 4\n", " Std Dev: 0.86\n", " Variance: 0.74\n", " Range: 2 - 5\n", " IQR: 1.00\n", "\n", "FEEDBACK LENGTH:\n", " Mean: 130.56\n", " Median: 112.00\n", " Mode: 98\n", " Std Dev: 71.89\n", " Variance: 5167.84\n", " Range: 32 - 312\n", " IQR: 97.50\n" ] } ], "source": [ "# Calculate comprehensive descriptive statistics\n", "numeric_cols = ['age', 'product_satisfaction', 'service_rating', 'feedback_length']\n", "\n", "print(\"📈 Descriptive Statistics for Survey Responses:\")\n", "print(\"=\" * 70)\n", "\n", "for col in numeric_cols:\n", "    data = survey[col]\n", "    print(f\"\\n{col.upper().replace('_', ' ')}:\")\n", "    print(f\" Mean: {data.mean():.2f}\")\n", "    print(f\" Median: {data.median():.2f}\")\n", "    print(f\" Mode: {data.mode().values[0]}\")\n", "    print(f\" Std Dev: {data.std():.2f}\")\n", "    print(f\" Variance: {data.var():.2f}\")\n", "    print(f\" Range: {data.min()} - {data.max()}\")\n", "    print(f\" IQR: {data.quantile(0.75) - data.quantile(0.25):.2f}\")" ] }, { "cell_type": "markdown", "id": "7e9cfbad", "metadata": {}, "source": [ "## Correlation Analysis\n", "\n", "Examine relationships between satisfaction metrics:" ] }, { "cell_type": "code", "execution_count": 6, "id": "f49742d6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🔗 Correlation Matrix (Pearson):\n", " product_satisfaction service_rating feedback_length \\\n", "product_satisfaction 
1.000 0.709 -0.309 \n", "service_rating 0.709 1.000 -0.233 \n", "feedback_length -0.309 -0.233 1.000 \n", "age -0.218 -0.005 0.647 \n", "\n", " age \n", "product_satisfaction -0.218 \n", "service_rating -0.005 \n", "feedback_length 0.647 \n", "age 1.000 \n", "\n", "\n", "📊 Detailed Correlation Analysis:\n", "============================================================\n", "\n", "product_satisfaction vs service_rating:\n", " Pearson r = 0.7093 (***)\n", " p-value = 0.0000\n", " → Strong positive correlation\n", "\n", "age vs product_satisfaction:\n", " Pearson r = -0.2179 (ns)\n", " p-value = 0.0604\n", " → Weak negative correlation\n", "\n", "age vs service_rating:\n", " Pearson r = -0.0048 (ns)\n", " p-value = 0.9677\n", " → Weak negative correlation\n", "\n", "feedback_length vs product_satisfaction:\n", " Pearson r = -0.3092 (**)\n", " p-value = 0.0069\n", " → Weak negative correlation\n" ] } ], "source": [ "# Calculate correlation matrix for satisfaction metrics\n", "satisfaction_cols = ['product_satisfaction', 'service_rating', 'feedback_length', 'age']\n", "correlation_matrix = survey[satisfaction_cols].corr()\n", "\n", "print(\"🔗 Correlation Matrix (Pearson):\")\n", "print(correlation_matrix.round(3))\n", "\n", "# Detailed pairwise correlations with significance\n", "print(\"\\n\\n📊 Detailed Correlation Analysis:\")\n", "print(\"=\" * 60)\n", "\n", "pairs = [\n", "    ('product_satisfaction', 'service_rating'),\n", "    ('age', 'product_satisfaction'),\n", "    ('age', 'service_rating'),\n", "    ('feedback_length', 'product_satisfaction')\n", "]\n", "\n", "for col1, col2 in pairs:\n", "    r, p_value = pearsonr(survey[col1], survey[col2])\n", "    significance = \"***\" if p_value < 0.001 else \"**\" if p_value < 0.01 else \"*\" if p_value < 0.05 else \"ns\"\n", "    print(f\"\\n{col1} vs {col2}:\")\n", "    print(f\" Pearson r = {r:.4f} ({significance})\")\n", "    print(f\" p-value = {p_value:.4f}\")\n", "    if abs(r) >= 0.7:\n", "        strength = \"strong\"\n", "    elif 
abs(r) >= 0.4:\n", "        strength = \"moderate\"\n", "    else:\n", "        strength = \"weak\"\n", "    direction = \"positive\" if r > 0 else \"negative\"\n", "    print(f\" → {strength.capitalize()} {direction} correlation\")" ] }, { "cell_type": "markdown", "id": "30248e44", "metadata": {}, "source": [ "## Hypothesis Testing: T-Tests\n", "\n", "Compare satisfaction scores between different groups:" ] }, { "cell_type": "code", "execution_count": 7, "id": "e273a147", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "🧪 Independent Samples T-Test: Product Satisfaction by Gender\n", "============================================================\n", "\n", "Group Statistics:\n", " Male (n=35): M = 4.00, SD = 0.77\n", " Female (n=36): M = 4.28, SD = 0.70\n", "\n", "Test Results:\n", " t-statistic = -1.5932\n", " p-value = 0.1157\n", "\n", "Conclusion at α = 0.05:\n", " ✗ FAIL TO REJECT null hypothesis - no significant difference\n" ] } ], "source": [ "# Independent samples t-test: Compare satisfaction between genders\n", "male_satisfaction = survey[survey['gender'] == 'Male']['product_satisfaction']\n", "female_satisfaction = survey[survey['gender'] == 'Female']['product_satisfaction']\n", "\n", "t_stat, p_value = ttest_ind(male_satisfaction, female_satisfaction)\n", "\n", "print(\"🧪 Independent Samples T-Test: Product Satisfaction by Gender\")\n", "print(\"=\" * 60)\n", "print(f\"\\nGroup Statistics:\")\n", "print(f\" Male (n={len(male_satisfaction)}): M = {male_satisfaction.mean():.2f}, SD = {male_satisfaction.std():.2f}\")\n", "print(f\" Female (n={len(female_satisfaction)}): M = {female_satisfaction.mean():.2f}, SD = {female_satisfaction.std():.2f}\")\n", "print(f\"\\nTest Results:\")\n", "print(f\" t-statistic = {t_stat:.4f}\")\n", "print(f\" p-value = {p_value:.4f}\")\n", "print(f\"\\nConclusion at α = 0.05:\")\n", "if p_value < 0.05:\n", "    print(\" ✓ REJECT null hypothesis - significant difference exists\")\n", "else:\n", "    print(\" ✗ FAIL TO REJECT null hypothesis - no significant difference\")" ] }, { "cell_type": "markdown", "id": "ef914e6c", "metadata": {}, "source": [ "## Chi-Square Test\n", "\n", "Test for association between categorical variables:" ] }, { "cell_type": "code", "execution_count": 8, "id": "1bcd7356", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "📋 Contingency Table: Purchase Frequency × Category Preference\n", "category_preference Electronics Furniture\n", "purchase_frequency \n", "Monthly 23 19\n", "Quarterly 4 8\n", "Rarely 3 4\n", "Weekly 14 0\n", "\n", "\n", "🧪 Chi-Square Test of Independence\n", "============================================================\n", "\n", "Results:\n", " Chi-square statistic = 14.0252\n", " Degrees of freedom = 3\n", " p-value = 0.0029\n", "\n", "Conclusion at α = 0.05:\n", " ✓ REJECT null hypothesis - variables are DEPENDENT\n", " → Purchase frequency IS associated with category preference\n" ] } ], "source": [ "# Chi-square test: Association between purchase frequency and category preference\n", "contingency_table = pd.crosstab(survey['purchase_frequency'], survey['category_preference'])\n", "\n", "print(\"📋 Contingency Table: Purchase Frequency × Category Preference\")\n", "print(contingency_table)\n", "\n", "chi2, p_value, dof, expected = chi2_contingency(contingency_table)\n", "\n", "print(\"\\n\\n🧪 Chi-Square Test of Independence\")\n", "print(\"=\" * 60)\n", "print(f\"\\nResults:\")\n", "print(f\" Chi-square statistic = {chi2:.4f}\")\n", "print(f\" Degrees of freedom = {dof}\")\n", "print(f\" p-value = {p_value:.4f}\")\n", "print(f\"\\nConclusion at α = 0.05:\")\n", "if p_value < 0.05:\n", "    print(\" ✓ REJECT null hypothesis - variables are DEPENDENT\")\n", "    print(\" → Purchase frequency IS associated with category preference\")\n", "else:\n", "    print(\" ✗ FAIL TO REJECT null hypothesis - variables are INDEPENDENT\")\n", "    print(\" → No significant association 
between purchase frequency and category\")" ] }, { "cell_type": "markdown", "id": "2b578c79", "metadata": {}, "source": [ "## Normality Testing\n", "\n", "Check if satisfaction scores follow a normal distribution:" ] }, { "cell_type": "code", "execution_count": 9, "id": "10b6a01f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "📐 Normality Tests for Product Satisfaction Scores\n", "============================================================\n", "\n", "1. Shapiro-Wilk Test:\n", " W-statistic = 0.8159\n", " p-value = 0.0000\n", "\n", "2. D'Agostino-Pearson Test:\n", " K² statistic = 3.1266\n", " p-value = 0.2094\n", "\n", "3. Distribution Shape:\n", " Skewness = -0.4623 (left-skewed)\n", " Kurtosis = -0.3638 (platykurtic)\n", "\n", "📊 Conclusion:\n", " Data deviates significantly from normal distribution (p < 0.05)\n" ] } ], "source": [ "# Test normality of satisfaction scores using multiple methods\n", "data = survey['product_satisfaction']\n", "\n", "print(\"📐 Normality Tests for Product Satisfaction Scores\")\n", "print(\"=\" * 60)\n", "\n", "# Shapiro-Wilk Test (best for n < 5000)\n", "shapiro_stat, shapiro_p = stats.shapiro(data)\n", "print(f\"\\n1. Shapiro-Wilk Test:\")\n", "print(f\" W-statistic = {shapiro_stat:.4f}\")\n", "print(f\" p-value = {shapiro_p:.4f}\")\n", "\n", "# D'Agostino's K-squared Test\n", "dagostino_stat, dagostino_p = stats.normaltest(data)\n", "print(f\"\\n2. D'Agostino-Pearson Test:\")\n", "print(f\" K² statistic = {dagostino_stat:.4f}\")\n", "print(f\" p-value = {dagostino_p:.4f}\")\n", "\n", "# Skewness and Kurtosis\n", "skew = stats.skew(data)\n", "kurt = stats.kurtosis(data)\n", "print(f\"\\n3. 
Distribution Shape:\")\n", "print(f\" Skewness = {skew:.4f} ({'right-skewed' if skew > 0 else 'left-skewed' if skew < 0 else 'symmetric'})\")\n", "print(f\" Kurtosis = {kurt:.4f} ({'leptokurtic' if kurt > 0 else 'platykurtic' if kurt < 0 else 'mesokurtic'})\")\n", "\n", "print(f\"\\n📊 Conclusion:\")\n", "if shapiro_p > 0.05:\n", "    print(\" Data appears to be normally distributed (p > 0.05)\")\n", "else:\n", "    print(\" Data deviates significantly from normal distribution (p < 0.05)\")" ] }, { "cell_type": "markdown", "id": "c71ceffe", "metadata": {}, "source": [ "## Confidence Intervals\n", "\n", "Calculate confidence intervals for key metrics:" ] }, { "cell_type": "code", "execution_count": 10, "id": "20826442", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "📏 95% Confidence Intervals\n", "============================================================\n", "\n", "Product Satisfaction:\n", " Sample Mean: 4.16\n", " 95% CI: [3.99, 4.33]\n", " → We are 95% confident the true population mean\n", " falls between 3.99 and 4.33\n", "\n", "Service Rating:\n", " Sample Mean: 4.07\n", " 95% CI: [3.87, 4.26]\n", " → We are 95% confident the true population mean\n", " falls between 3.87 and 4.26\n", "\n", "Feedback Length:\n", " Sample Mean: 130.56\n", " 95% CI: [114.02, 147.10]\n", " → We are 95% confident the true population mean\n", " falls between 114.02 and 147.10\n", "\n", "Age:\n", " Sample Mean: 38.28\n", " 95% CI: [35.69, 40.87]\n", " → We are 95% confident the true population mean\n", " falls between 35.69 and 40.87\n" ] } ], "source": [ "# Calculate 95% confidence intervals for satisfaction metrics\n", "def confidence_interval(data, confidence=0.95):\n", "    \"\"\"Calculate confidence interval for mean\"\"\"\n", "    n = len(data)\n", "    mean = np.mean(data)\n", "    se = stats.sem(data)  # Standard error of the mean\n", "    h = se * stats.t.ppf((1 + confidence) / 2, n - 1)  # Margin of error\n", "    return mean, mean - h, mean + 
h\n", "\n", "print(\"📏 95% Confidence Intervals\")\n", "print(\"=\" * 60)\n", "\n", "metrics = {\n", "    'Product Satisfaction': survey['product_satisfaction'],\n", "    'Service Rating': survey['service_rating'],\n", "    'Feedback Length': survey['feedback_length'],\n", "    'Age': survey['age']\n", "}\n", "\n", "for name, data in metrics.items():\n", "    mean, lower, upper = confidence_interval(data)\n", "    print(f\"\\n{name}:\")\n", "    print(f\" Sample Mean: {mean:.2f}\")\n", "    print(f\" 95% CI: [{lower:.2f}, {upper:.2f}]\")\n", "    print(f\" → We are 95% confident the true population mean\")\n", "    print(f\" falls between {lower:.2f} and {upper:.2f}\")" ] }, { "cell_type": "markdown", "id": "d4085a33", "metadata": {}, "source": [ "## Summary Statistics by Group" ] }, { "cell_type": "code", "execution_count": 11, "id": "e135c1e7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "📊 SURVEY ANALYSIS SUMMARY\n", "======================================================================\n", "\n", "📋 Dataset Overview:\n", " Total Respondents: 75\n", " Average Age: 38.3 years\n", " Gender Distribution: {'Female': np.int64(36), 'Male': np.int64(35), 'Non-binary': np.int64(4)}\n", "\n", "🎯 Key Satisfaction Metrics:\n", " Product Satisfaction: 4.16/5\n", " Service Rating: 4.07/5\n", " Would Recommend: 58/75 (77.3%)\n", "\n", "💻 Category Preferences:\n", " Electronics: 44 (58.7%)\n", " Furniture: 31 (41.3%)\n", "\n", "⏱️ Purchase Frequency:\n", " Monthly: 42 (56.0%)\n", " Weekly: 14 (18.7%)\n", " Quarterly: 12 (16.0%)\n", " Rarely: 7 (9.3%)\n", "\n", "======================================================================\n" ] } ], "source": [ "# Summarize overall metrics, then break down counts by category and purchase frequency\n", "print(\"📊 SURVEY ANALYSIS SUMMARY\")\n", "print(\"=\" * 70)\n", "\n", "# Overall statistics\n", "print(f\"\\n📋 Dataset Overview:\")\n", "print(f\" Total Respondents: {len(survey)}\")\n", "print(f\" Average Age: 
{survey['age'].mean():.1f} years\")\n", "print(f\" Gender Distribution: {dict(survey['gender'].value_counts())}\")\n", "\n", "# Key findings\n", "print(f\"\\n🎯 Key Satisfaction Metrics:\")\n", "print(f\" Product Satisfaction: {survey['product_satisfaction'].mean():.2f}/5\")\n", "print(f\" Service Rating: {survey['service_rating'].mean():.2f}/5\")\n", "recommend_yes = (survey['would_recommend'] == 'Yes').sum()\n", "print(f\" Would Recommend: {recommend_yes}/{len(survey)} ({100*recommend_yes/len(survey):.1f}%)\")\n", "\n", "# Category preferences\n", "print(f\"\\n💻 Category Preferences:\")\n", "category_counts = survey['category_preference'].value_counts()\n", "for category, count in category_counts.items():\n", "    pct = (count / len(survey)) * 100\n", "    print(f\" {category}: {count} ({pct:.1f}%)\")\n", "\n", "# Purchase patterns\n", "print(f\"\\n⏱️ Purchase Frequency:\")\n", "frequency_counts = survey['purchase_frequency'].value_counts()\n", "for freq, count in frequency_counts.items():\n", "    pct = (count / len(survey)) * 100\n", "    print(f\" {freq}: {count} ({pct:.1f}%)\")\n", "\n", "print(\"\\n\" + \"=\" * 70)" ] }, { "cell_type": "markdown", "id": "ee08aedd", "metadata": {}, "source": [ "## Next Steps\n", "\n", "This tutorial covered the fundamentals of statistical analysis with Python. To continue learning:\n", "\n", "1. **Visualize your statistics** - Check out the [Matplotlib Visualization](/notebooks/matplotlib-visualization/) tutorial\n", "2. **Analyze larger datasets** - See the [Pandas Data Analysis](/notebooks/pandas-data-analysis/) tutorial\n", "3. 
**Fetch external data** - Learn about APIs in the [API Requests](/notebooks/api-requests/) tutorial\n", "\n", "**Key Takeaways:**\n", "- Use `describe()` for quick descriptive statistics\n", "- `scipy.stats` provides comprehensive hypothesis testing tools\n", "- Always check assumptions (normality, equal variances) before parametric tests\n", "- Correlation ≠ causation - always interpret results carefully\n", "- Report confidence intervals alongside point estimates" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.14.0" } }, "nbformat": 4, "nbformat_minor": 5 }