{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Summary Statistics with Python\n", "> Summary statistics gives you the tools you need to boil down massive datasets to reveal the highlights. In this chapter, you'll explore summary statistics including mean, median, and standard deviation, and learn how to accurately interpret them. You'll also develop your critical thinking skills, allowing you to choose the best summary statistics for your data. This is a summary of the lecture \"Introduction to Statistics in Python\", via DataCamp.\n", "\n", "- toc: true \n", "- badges: true\n", "- comments: true\n", "- author: Chanseok Kang\n", "- categories: [Python, Datacamp, Statistics]\n", "- image: images/iqr.png" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "plt.rcParams['figure.figsize'] = (10, 8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What is statistics?\n", "- Statistics\n", " - The practice and study of collecting and analyzing data\n", " - Summary statistic: a fact about or summary of some data\n", "- Examples\n", " - How likely is someone to purchase a product? Are people more likely to purchase it if they can use a different payment system?\n", " - How many occupants will your hotel have? How can you optimize occupancy?\n", " - How many sizes of jeans need to be manufactured so they can fit 95% of the population? 
Should the same number of each size be produced?\n", " - A/B tests: Which ad is more effective in getting people to purchase a product?\n", "- Types of statistics\n", " - Descriptive statistics\n", " - Describe and summarize data\n", " - Inferential statistics\n", " - Use a sample of data to make inferences about a larger population\n", "- Types of data\n", " - Numeric (Quantitative)\n", " - Continuous (Measured)\n", " - Discrete (Counted)\n", " - Categorical (Qualitative)\n", " - Nominal (Unordered)\n", " - Ordinal (Ordered)\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Measures of center\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Mean and median\n", "In this chapter, you'll be working with the [2018 Food Carbon Footprint Index](https://www.nu3.de/blogs/nutrition/food-carbon-footprint-index-2018) from nu3. The `food_consumption` dataset contains the kilograms of food consumed per person per year in each country for each food category (`consumption`), as well as the carbon footprint of that food category (`co2_emission`), measured in kilograms of carbon dioxide, or $CO_2$, per person per year in each country.\n", "\n", "In this exercise, you'll compute measures of center to compare food consumption in the US and Belgium." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "| | country | food_category | consumption | co2_emission |\n",
"|---|---|---|---|---|\n",
"| 1 | Argentina | pork | 10.51 | 37.20 |\n",
"| 2 | Argentina | poultry | 38.66 | 41.53 |\n",
"| 3 | Argentina | beef | 55.48 | 1712.00 |\n",
"| 4 | Argentina | lamb_goat | 1.56 | 54.63 |\n",
"| 5 | Argentina | fish | 4.36 | 6.96 |\n", "