{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Correlation and Experimental Design\n", "> In this chapter, you'll learn how to quantify the strength of a linear relationship between two variables, and explore how confounding variables can affect the relationship between two other variables. You'll also see how a study’s design can influence its results, change how the data should be analyzed, and potentially affect the reliability of your conclusions. This is the summary of the lecture \"Introduction to Statistics in Python\", via DataCamp.\n", "\n", "- toc: true\n", "- badges: true\n", "- comments: true\n", "- author: Chanseok Kang\n", "- categories: [Python, Datacamp, Statistics]\n", "- image: images/lmplot_wh.png" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Correlation\n", "- Correlation coefficient\n", "    - Quantifies the linear relationship between two variables\n", "    - Number between -1 and 1\n", "    - Magnitude corresponds to the strength of the relationship\n", "    - Sign (+ or -) corresponds to the direction of the relationship\n", "- Pearson product-moment correlation ($r$)\n", "    - Most common\n", "    - $\\bar{x}$ = mean of $x$\n", "    - $\\sigma_x$ = standard deviation of $x$ (population form, dividing by $n$); $\\bar{y}$ and $\\sigma_y$ are defined likewise for $y$\n", "    - $n$ = number of $(x_i, y_i)$ pairs\n", "    \n", "    $$ r = \\frac{1}{n} \\sum_{i=1}^{n} \\frac{(x_i - \\bar{x})(y_i - \\bar{y})}{\\sigma_x \\times \\sigma_y} $$\n", "    \n", "    - Rank-based alternatives\n", "        - Kendall's tau\n", "        - Spearman's rho" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Relationships between variables\n", "In this chapter, you'll be working with a dataset `world_happiness` containing results from the [2019 World Happiness Report](https://worldhappiness.report/ed/2019/). The report scores various countries based on how happy people in that country are. 
It also ranks each country on various societal aspects such as social support, freedom, corruption, and others. The dataset also includes the GDP per capita and life expectancy for each country.\n", "\n", "In this exercise, you'll examine the relationship between a country's life expectancy (`life_exp`) and happiness score (`happiness_score`) both visually and quantitatively." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

|  | country | social_support | freedom | corruption | generosity | gdp_per_cap | life_exp | happiness_score |
|---|---|---|---|---|---|---|---|---|
| 1 | Finland | 2.0 | 5.0 | 4.0 | 47.0 | 42400 | 81.8 | 155 |
| 2 | Denmark | 4.0 | 6.0 | 3.0 | 22.0 | 48300 | 81.0 | 154 |
| 3 | Norway | 3.0 | 3.0 | 8.0 | 11.0 | 66300 | 82.6 | 153 |
| 4 | Iceland | 1.0 | 7.0 | 45.0 | 3.0 | 47900 | 83.0 | 152 |
| 5 | Netherlands | 15.0 | 19.0 | 12.0 | 7.0 | 50500 | 81.8 | 151 |
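
The Pearson formula above can be checked numerically. The sketch below is a minimal illustration that uses only the five `life_exp` / `happiness_score` pairs visible in the preview table (Finland to the Netherlands), not the full dataset, so the value it produces says nothing reliable about the relationship across all countries. The names `preview` and `pearson_r` are hypothetical. `pearson_r` follows the formula literally, with population standard deviations (`ddof=0`), and is compared against pandas' `Series.corr`, whose default method is Pearson. Spearman's rho is computed here as Pearson's r on average ranks, a substitute for `method="spearman"` that avoids a SciPy dependency.

```python
import numpy as np
import pandas as pd

# Only the five rows visible in the world_happiness preview (not the full dataset)
preview = pd.DataFrame({
    "country": ["Finland", "Denmark", "Norway", "Iceland", "Netherlands"],
    "life_exp": [81.8, 81.0, 82.6, 83.0, 81.8],
    "happiness_score": [155, 154, 153, 152, 151],
})


def pearson_r(x, y):
    """Hypothetical helper: Pearson's r from the definition, i.e. the mean of the
    products of deviations divided by the product of population standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    cov = np.sum((x - x.mean()) * (y - y.mean())) / n  # the 1/n factor in the formula
    return cov / (x.std(ddof=0) * y.std(ddof=0))


r_manual = pearson_r(preview["life_exp"], preview["happiness_score"])
r_pandas = preview["life_exp"].corr(preview["happiness_score"])  # Pearson by default

# Spearman's rho = Pearson's r on ranks; tied values receive their average rank
rho = preview["life_exp"].rank().corr(preview["happiness_score"].rank())

print(f"Pearson r (manual): {r_manual:.4f}")
print(f"Pearson r (pandas): {r_pandas:.4f}")
print(f"Spearman rho:       {rho:.4f}")
```

The manual and pandas values should agree to floating-point precision. Dividing by $n$ with population standard deviations gives the same $r$ as dividing by $n-1$ with sample standard deviations, because the factor cancels between numerator and denominator.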

| Unnamed: 0 | country | social_support | freedom | corruption | generosity | gdp_per_cap | life_exp | happiness_score | grams_sugar_per_day |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Finland | 2 | 5 | 4.0 | 47 | 42400 | 81.8 | 155 | 86.8 |
| 2 | Denmark | 4 | 6 | 3.0 | 22 | 48300 | 81.0 | 154 | 152.0 |
| 3 | Norway | 3 | 3 | 8.0 | 11 | 66300 | 82.6 | 153 | 120.0 |
| 4 | Iceland | 1 | 7 | 45.0 | 3 | 47900 | 83.0 | 152 | 132.0 |
| 5 | Netherlands | 15 | 19 | 12.0 | 7 | 50500 | 81.8 | 151 | 122.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 129 | Yemen | 100 | 147 | 83.0 | 155 | 2340 | 68.1 | 5 | 77.9 |
| 130 | Rwanda | 144 | 21 | 2.0 | 90 | 2110 | 69.1 | 4 | 14.1 |
| 131 | Tanzania | 131 | 78 | 34.0 | 49 | 2980 | 67.7 | 3 | 28.0 |
| 132 | Afghanistan | 151 | 155 | 136.0 | 137 | 1760 | 64.1 | 2 | 24.5 |
| 133 | Central African Republic | 155 | 133 | 122.0 | 113 | 794 | 52.9 | 1 | 22.4 |

133 rows × 9 columns
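
The `grams_sugar_per_day` column invites a causal reading (more sugar, more happiness), which is exactly the kind of association a confounding variable can produce. The sketch below is a minimal illustration under strong assumptions: it uses only the ten rows visible in the preview (the five happiest and five least happy countries), an extreme, non-random subset of the 133 rows, so the numbers are illustrative only. It prints the pooled correlation matrix, then stratifies by income to see how much of the sugar/happiness association remains within each group. The names `visible` and `income_group`, and the 10,000 `gdp_per_cap` cut-off, are hypothetical choices made only because that threshold cleanly separates the two groups shown here.

```python
import pandas as pd

# Only the ten rows visible in the preview; an extreme, non-random subset,
# so treat the results as illustrative rather than as estimates for all 133 rows.
visible = pd.DataFrame({
    "country": ["Finland", "Denmark", "Norway", "Iceland", "Netherlands",
                "Yemen", "Rwanda", "Tanzania", "Afghanistan",
                "Central African Republic"],
    "gdp_per_cap": [42400, 48300, 66300, 47900, 50500,
                    2340, 2110, 2980, 1760, 794],
    "happiness_score": [155, 154, 153, 152, 151, 5, 4, 3, 2, 1],
    "grams_sugar_per_day": [86.8, 152.0, 120.0, 132.0, 122.0,
                            77.9, 14.1, 28.0, 24.5, 22.4],
})

numeric = visible[["gdp_per_cap", "happiness_score", "grams_sugar_per_day"]]

# Pooled Pearson correlations between every pair of numeric columns
print(numeric.corr())

# Hypothetical stratification: the 10,000 cut-off is chosen only because it
# separates the two groups visible in this preview.
visible["income_group"] = (visible["gdp_per_cap"] > 10_000).map(
    {True: "high", False: "low"}
)

pooled_r = numeric["grams_sugar_per_day"].corr(numeric["happiness_score"])
within_r = {
    name: group["grams_sugar_per_day"].corr(group["happiness_score"])
    for name, group in visible.groupby("income_group")
}

print(f"Pooled sugar-happiness r: {pooled_r:.3f}")
for name, r in within_r.items():
    print(f"Within {name}-income countries: r = {r:.3f}")
```

If the pooled correlation is much stronger than the within-group correlations, much of the sugar/happiness association is carried by the gap between richer and poorer countries, which is what you would expect if GDP drives both variables. With only five countries per group, the within-group values are far too noisy to support any conclusion on their own.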