{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# In Search Of Happiness: When Is It?\n",
"\n",
"The happiness challenge on the go! \n",
"We have already answered some questions about happiness. You can read about this and much more [here](https://nbviewer.jupyter.org/github/chupstee/data.sugar/blob/master/00002_world_happiness/world_happiness.map.ipynb). \n",
"Today we wondered when we are becoming happier. \n",
"We'll take [the World Happiness report from Kaggle](https://www.kaggle.com/unsdsn/world-happiness?select=2018.csv), which ranks 156 countries by their level of happiness on a 10-point scale.\n",
"\n",
"\n",
"## The World Happiness Report\n",
"\n",
"Recall quoting Kaggle:\n",
"\n",
"The World Happiness Report is a landmark survey of the state of global happiness. The first report was published in 2012, the second in 2013, the third in 2015, and the fourth in the 2016 Update. The World Happiness 2017, which ranks 155 countries by their happiness levels, was released at the United Nations at an event celebrating International Day of Happiness on March 20th. The report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions. Leading experts across fields – economics, psychology, survey analysis, national statistics, health, public policy and more – describe how measurements of well-being can be used effectively to assess the progress of nations. The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness.\n",
"\n",
"You can read more [here](https://www.kaggle.com/unsdsn/world-happiness).\n",
"\n",
"We are most interested in the following columns:\n",
"\n",
"- `Country or region` - country name\n",
"- `Overall rank` - country's place in the rating\n",
"- `Score` - happiness score\n",
"\n",
"\n",
"## Introduction\n",
"\n",
"We'll try to identify the relationship between the level of happiness and the age of the population by country.\n",
"\n",
"[The World Factbook](https://www.cia.gov/library/publications/the-world-factbook) by CIA provides information on the history, people and society, government, economy, energy, geography, communications, transportation, military, and transnational issues for 267 world entities. \n",
"For our purposes, we will take the following indicators:\n",
"\n",
"- `Life expectancy at birth` - the average number of years to be lived by a group of people born in the same year, if mortality at each age remains constant in the future. Life expectancy at birth is also a measure of overall quality of life in a country and summarizes the mortality at all ages. \n",
"- `Median age` - the age that divides a population into two numerically equal groups; that is, half the people are younger than this age and half are older. It is a single index that summarizes the age distribution of a population. Currently, the median age ranges from a low of about 15 in Niger and Uganda to 40 or more in several European countries and Japan. \n",
"- `Population growth rate` - the average annual percent change in populations, resulting from a surplus (or deficit) of births over deaths and the balance of migrants entering and leaving a country. The rate may be positive or negative.\n",
"- `Death rate` - the average annual number of deaths during a year per 1,000 population at midyear; also known as crude death rate.\n",
"- `Birth rate` - the average annual number of births during a year per 1,000 persons in the population at midyear; also known as crude birth rate.\n",
"\n",
"We will compare the happiness scores with the CIA rates for 2018, as this is the year when the data is presented in the most complete way. \n",
"\n",
"Let's find out ***when*** *is happiness*.\n",
"\n",
"\n",
"## Reading The Data"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Import libs\n",
"from glob import glob\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Happiness Reports\n",
"\n",
"As we mentioned above, we'll read the happiness report published in 2018. \n",
"We'll also rename the columns according to the snake_case format."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index(['rank', 'country', 'score', 'gdp_per_capita', 'social_support', 'life_expectancy', 'freedom', 'generosity',\n",
" 'trust_corruption'],\n",
" dtype='object')\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
rank
\n",
"
country
\n",
"
score
\n",
"
gdp_per_capita
\n",
"
social_support
\n",
"
life_expectancy
\n",
"
freedom
\n",
"
generosity
\n",
"
trust_corruption
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
1
\n",
"
Finland
\n",
"
7.632
\n",
"
1.305
\n",
"
1.592
\n",
"
0.874
\n",
"
0.681
\n",
"
0.202
\n",
"
0.393
\n",
"
\n",
"
\n",
"
1
\n",
"
2
\n",
"
Norway
\n",
"
7.594
\n",
"
1.456
\n",
"
1.582
\n",
"
0.861
\n",
"
0.686
\n",
"
0.286
\n",
"
0.340
\n",
"
\n",
"
\n",
"
2
\n",
"
3
\n",
"
Denmark
\n",
"
7.555
\n",
"
1.351
\n",
"
1.590
\n",
"
0.868
\n",
"
0.683
\n",
"
0.284
\n",
"
0.408
\n",
"
\n",
"
\n",
"
3
\n",
"
4
\n",
"
Iceland
\n",
"
7.495
\n",
"
1.343
\n",
"
1.644
\n",
"
0.914
\n",
"
0.677
\n",
"
0.353
\n",
"
0.138
\n",
"
\n",
"
\n",
"
4
\n",
"
5
\n",
"
Switzerland
\n",
"
7.487
\n",
"
1.420
\n",
"
1.549
\n",
"
0.927
\n",
"
0.660
\n",
"
0.256
\n",
"
0.357
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" rank country score gdp_per_capita social_support life_expectancy freedom generosity trust_corruption\n",
"0 1 Finland 7.632 1.305 1.592 0.874 0.681 0.202 0.393\n",
"1 2 Norway 7.594 1.456 1.582 0.861 0.686 0.286 0.340\n",
"2 3 Denmark 7.555 1.351 1.590 0.868 0.683 0.284 0.408\n",
"3 4 Iceland 7.495 1.343 1.644 0.914 0.677 0.353 0.138\n",
"4 5 Switzerland 7.487 1.420 1.549 0.927 0.660 0.256 0.357"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Set the width to display\n",
"pd.set_option('display.width', 120)\n",
"# Increase the number of rows to display\n",
"pd.set_option('display.max_rows', 60) \n",
"\n",
"# Get the data\n",
"happiness = pd.read_csv('data/happiness_2018.csv')\n",
"\n",
"# Column map to rename\n",
"cols_dict = {'Country':'country',\n",
" 'Country or region':'country',\n",
" 'Region':'region',\n",
" 'Happiness Rank':'rank',\n",
" 'Happiness.Rank':'rank',\n",
" 'Overall rank':'rank',\n",
" 'Happiness Score':'score',\n",
" 'Happiness.Score':'score',\n",
" 'Score':'score',\n",
" 'Economy (GDP per Capita)':'gdp_per_capita',\n",
" 'Economy..GDP.per.Capita.':'gdp_per_capita',\n",
" 'GDP per capita':'gdp_per_capita',\n",
" 'Family':'family',\n",
" 'Freedom':'freedom',\n",
" 'Freedom to make life choices':'freedom',\n",
" 'Generosity':'generosity',\n",
" 'Health (Life Expectancy)':'life_expectancy',\n",
" 'Health..Life.Expectancy.':'life_expectancy',\n",
" 'Healthy life expectancy':'life_expectancy',\n",
" 'Perceptions of corruption':'trust_corruption',\n",
" 'Trust (Government Corruption)':'trust_corruption',\n",
" 'Trust..Government.Corruption.':'trust_corruption',\n",
" 'Social support':'social_support',\n",
" 'Dystopia Residual':'dystopia_residual',\n",
" 'Dystopia.Residual':'dystopia_residual',\n",
" 'Standard Error':'standard_error',\n",
" 'Upper Confidence Interval':'whisker_high',\n",
" 'Whisker.high':'whisker_high',\n",
" 'Lower Confidence Interval':'whisker_low',\n",
" 'Whisker.low':'whisker_low'\n",
" }\n",
"\n",
"# Rename the columns\n",
"happiness.rename(columns=cols_dict, inplace=True)\n",
"\n",
"print(happiness.columns) # check the new column names\n",
"happiness.head() # check the values"
]
},
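{
"cell_type": "markdown",
"metadata": {},
"source": [
"The map above is written by hand because it also shortens the names (for example, `Overall rank` becomes `rank`). If only the snake_case conversion were needed, a small helper could do it. This is a minimal sketch; `to_snake_case` is a hypothetical helper, not part of pandas, and it simply lowercases the name and joins its alphanumeric runs with underscores:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def to_snake_case(name: str) -> str:\n",
"    # Replace every non-alphanumeric character with an underscore,\n",
"    # then drop empty pieces so repeated separators collapse into one\n",
"    chars = [ch.lower() if ch.isalnum() else '_' for ch in name]\n",
"    parts = [part for part in ''.join(chars).split('_') if part]\n",
"    return '_'.join(parts)\n",
"\n",
"demo = pd.DataFrame(columns=['Overall rank', 'Country or region', 'GDP per capita'])\n",
"demo = demo.rename(columns=to_snake_case)  # rename also accepts a function\n",
"print(list(demo.columns))  # ['overall_rank', 'country_or_region', 'gdp_per_capita']\n",
"```"
]
},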
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"RangeIndex: 156 entries, 0 to 155\n",
"Data columns (total 9 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 rank 156 non-null int64 \n",
" 1 country 156 non-null object \n",
" 2 score 156 non-null float64\n",
" 3 gdp_per_capita 156 non-null float64\n",
" 4 social_support 156 non-null float64\n",
" 5 life_expectancy 156 non-null float64\n",
" 6 freedom 156 non-null float64\n",
" 7 generosity 156 non-null float64\n",
" 8 trust_corruption 155 non-null float64\n",
"dtypes: float64(7), int64(1), object(1)\n",
"memory usage: 11.1+ KB\n"
]
}
],
"source": [
"happiness.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see 156 countries in the report of 2018. There are no missing values for the `country`, `rank`, `score` columns.\n",
"\n",
"Let's check for duplicates."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Duplicated: 0\n"
]
}
],
"source": [
"# Duplicated\n",
"print('Duplicated: {}'.format(happiness.duplicated(subset='country').sum()))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It's OK. Let's get the CIA data.\n",
"\n",
"### CIA Reports\n",
"\n",
"We have downloaded The World Factbook archive for different years and saved the data that was collected in 2018."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Initialize birth: 226\n",
"Merge death: 226\n",
"Merge life_expectancy_at_birth: 226\n",
"Merge median_age: 229\n",
"Merge population_growth: 237\n",
"\n",
"Int64Index: 237 entries, 0 to 236\n",
"Data columns (total 6 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 country_cia 237 non-null object \n",
" 1 birth 226 non-null float64\n",
" 2 death 226 non-null float64\n",
" 3 life_expectancy_at_birth 223 non-null float64\n",
" 4 median_age 228 non-null float64\n",
" 5 population_growth 234 non-null float64\n",
"dtypes: float64(5), object(1)\n",
"memory usage: 13.0+ KB\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
country_cia
\n",
"
birth
\n",
"
death
\n",
"
life_expectancy_at_birth
\n",
"
median_age
\n",
"
population_growth
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
Angola
\n",
"
43.7
\n",
"
9.0
\n",
"
60.6
\n",
"
15.9
\n",
"
3.49
\n",
"
\n",
"
\n",
"
1
\n",
"
Niger
\n",
"
43.6
\n",
"
11.5
\n",
"
56.3
\n",
"
15.5
\n",
"
3.16
\n",
"
\n",
"
\n",
"
2
\n",
"
Mali
\n",
"
43.2
\n",
"
9.6
\n",
"
60.8
\n",
"
15.8
\n",
"
2.98
\n",
"
\n",
"
\n",
"
3
\n",
"
Chad
\n",
"
43.0
\n",
"
10.5
\n",
"
57.5
\n",
"
15.8
\n",
"
3.23
\n",
"
\n",
"
\n",
"
4
\n",
"
Uganda
\n",
"
42.4
\n",
"
9.9
\n",
"
56.3
\n",
"
15.9
\n",
"
3.18
\n",
"
\n",
"
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
\n",
"
\n",
"
232
\n",
"
Cocos (Keeling) Islands
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
0.00
\n",
"
\n",
"
\n",
"
233
\n",
"
Pitcairn Islands
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
0.00
\n",
"
\n",
"
\n",
"
234
\n",
"
Tokelau
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
-0.01
\n",
"
\n",
"
\n",
"
235
\n",
"
Svalbard
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
-0.03
\n",
"
\n",
"
\n",
"
236
\n",
"
Niue
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
-0.03
\n",
"
\n",
" \n",
"
\n",
"
237 rows × 6 columns
\n",
"
"
],
"text/plain": [
" country_cia birth death life_expectancy_at_birth median_age population_growth\n",
"0 Angola 43.7 9.0 60.6 15.9 3.49\n",
"1 Niger 43.6 11.5 56.3 15.5 3.16\n",
"2 Mali 43.2 9.6 60.8 15.8 2.98\n",
"3 Chad 43.0 10.5 57.5 15.8 3.23\n",
"4 Uganda 42.4 9.9 56.3 15.9 3.18\n",
".. ... ... ... ... ... ...\n",
"232 Cocos (Keeling) Islands NaN NaN NaN NaN 0.00\n",
"233 Pitcairn Islands NaN NaN NaN NaN 0.00\n",
"234 Tokelau NaN NaN NaN NaN -0.01\n",
"235 Svalbard NaN NaN NaN NaN -0.03\n",
"236 Niue NaN NaN NaN NaN -0.03\n",
"\n",
"[237 rows x 6 columns]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cia_files = glob('data/cia.age.*.2018.txt')\n",
"cia = pd.DataFrame()\n",
"\n",
"for file in cia_files:\n",
" c = pd.read_csv(file,\n",
" engine='python', sep=r'\\s{3,}', header=None,\n",
" names=['country_cia', file.split('.')[2], 'data_year'],\n",
" squeeze=False, skiprows=1, index_col=0,\n",
" thousands=',', dtype={file.split('.')[2]:'float64'}\n",
" )[['country_cia', file.split('.')[2]]] # read the file\n",
" if cia.size == 0:\n",
" cia = cia.append(c)\n",
" print('Initialize {}: {}'.format(file.split('.')[2], cia.shape[0])) # for the first file\n",
" else:\n",
" cia = cia.merge(c, on='country_cia', how='outer')\n",
" print('Merge {}: {}'.format(file.split('.')[2], cia.shape[0]))\n",
"\n",
"cia.reset_index()\n",
"\n",
"cia.info()\n",
"cia"
]
},
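{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above merges the indicator tables one at a time. The same outer merge can also be expressed with `functools.reduce`. Below is a minimal sketch on a few values copied from the output above; the three small data frames stand in for the CIA files and are not read from disk:\n",
"\n",
"```python\n",
"from functools import reduce\n",
"import pandas as pd\n",
"\n",
"# Stand-ins for three indicator files (values taken from the output above)\n",
"birth = pd.DataFrame({'country_cia': ['Angola', 'Niger'], 'birth': [43.7, 43.6]})\n",
"median_age = pd.DataFrame({'country_cia': ['Niger', 'Japan'], 'median_age': [15.5, 47.7]})\n",
"growth = pd.DataFrame({'country_cia': ['Angola', 'Tokelau'], 'population_growth': [3.49, -0.01]})\n",
"\n",
"# An outer merge keeps a country even if it is missing from some tables (its cells become NaN)\n",
"cia_demo = reduce(lambda left, right: left.merge(right, on='country_cia', how='outer'),\n",
"                  [birth, median_age, growth])\n",
"print(cia_demo)\n",
"```\n",
"\n",
"This is why the merged CIA table grows from 226 to 237 rows while some indicators stay incomplete."
]
},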
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is interesting to see what the median age and life expectancy in the world are."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
birth
\n",
"
death
\n",
"
life_expectancy_at_birth
\n",
"
median_age
\n",
"
population_growth
\n",
"
\n",
" \n",
" \n",
"
\n",
"
count
\n",
"
226.000000
\n",
"
226.000000
\n",
"
223.000000
\n",
"
228.00000
\n",
"
234.000000
\n",
"
\n",
"
\n",
"
mean
\n",
"
18.816372
\n",
"
7.650442
\n",
"
73.296861
\n",
"
30.99386
\n",
"
1.006154
\n",
"
\n",
"
\n",
"
std
\n",
"
9.345047
\n",
"
2.717405
\n",
"
7.639940
\n",
"
8.95895
\n",
"
1.159035
\n",
"
\n",
"
\n",
"
min
\n",
"
6.500000
\n",
"
1.600000
\n",
"
52.100000
\n",
"
15.50000
\n",
"
-3.130000
\n",
"
\n",
"
\n",
"
25%
\n",
"
11.600000
\n",
"
5.900000
\n",
"
68.400000
\n",
"
23.37500
\n",
"
0.262500
\n",
"
\n",
"
\n",
"
50%
\n",
"
15.850000
\n",
"
7.400000
\n",
"
75.200000
\n",
"
30.65000
\n",
"
0.940000
\n",
"
\n",
"
\n",
"
75%
\n",
"
23.475000
\n",
"
9.275000
\n",
"
78.900000
\n",
"
38.82500
\n",
"
1.710000
\n",
"
\n",
"
\n",
"
max
\n",
"
43.700000
\n",
"
19.300000
\n",
"
89.400000
\n",
"
53.80000
\n",
"
7.370000
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" birth death life_expectancy_at_birth median_age population_growth\n",
"count 226.000000 226.000000 223.000000 228.00000 234.000000\n",
"mean 18.816372 7.650442 73.296861 30.99386 1.006154\n",
"std 9.345047 2.717405 7.639940 8.95895 1.159035\n",
"min 6.500000 1.600000 52.100000 15.50000 -3.130000\n",
"25% 11.600000 5.900000 68.400000 23.37500 0.262500\n",
"50% 15.850000 7.400000 75.200000 30.65000 0.940000\n",
"75% 23.475000 9.275000 78.900000 38.82500 1.710000\n",
"max 43.700000 19.300000 89.400000 53.80000 7.370000"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cia.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Life expectancy today is about 73 years. The average age of a modern person is about 30 years. \n",
"What are the countries with the lowest life expectancy at birth and median age? \n",
"Let's define TOP5 ratings."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"TOP5 countries with the min life expectancy at birth\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
country_cia
\n",
"
life_expectancy_at_birth
\n",
"
\n",
" \n",
" \n",
"
\n",
"
11
\n",
"
Afghanistan
\n",
"
52.1
\n",
"
\n",
"
\n",
"
5
\n",
"
Zambia
\n",
"
53.0
\n",
"
\n",
"
\n",
"
51
\n",
"
Lesotho
\n",
"
53.0
\n",
"
\n",
"
\n",
"
8
\n",
"
Somalia
\n",
"
53.2
\n",
"
\n",
"
\n",
"
23
\n",
"
Central African Republic
\n",
"
53.3
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" country_cia life_expectancy_at_birth\n",
"11 Afghanistan 52.1\n",
"5 Zambia 53.0\n",
"51 Lesotho 53.0\n",
"8 Somalia 53.2\n",
"23 Central African Republic 53.3"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Print the countries with the min life expectancy at birth\n",
"print('TOP5 countries with the min life expectancy at birth')\n",
"cia[['country_cia', 'life_expectancy_at_birth']].sort_values(by='life_expectancy_at_birth', ascending=True).head()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"TOP5 countries with the min median age\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
country_cia
\n",
"
median_age
\n",
"
\n",
" \n",
" \n",
"
\n",
"
1
\n",
"
Niger
\n",
"
15.5
\n",
"
\n",
"
\n",
"
2
\n",
"
Mali
\n",
"
15.8
\n",
"
\n",
"
\n",
"
3
\n",
"
Chad
\n",
"
15.8
\n",
"
\n",
"
\n",
"
0
\n",
"
Angola
\n",
"
15.9
\n",
"
\n",
"
\n",
"
4
\n",
"
Uganda
\n",
"
15.9
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" country_cia median_age\n",
"1 Niger 15.5\n",
"2 Mali 15.8\n",
"3 Chad 15.8\n",
"0 Angola 15.9\n",
"4 Uganda 15.9"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Print the countries with the min median age\n",
"print('TOP5 countries with the min median age')\n",
"cia[['country_cia', 'median_age']].sort_values(by='median_age', ascending=True).head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What are the countries with the highest life expectancy at birth and median age?"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"TOP5 countries with the max life expectancy at birth\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
country_cia
\n",
"
life_expectancy_at_birth
\n",
"
\n",
" \n",
" \n",
"
\n",
"
225
\n",
"
Monaco
\n",
"
89.4
\n",
"
\n",
"
\n",
"
211
\n",
"
Singapore
\n",
"
85.5
\n",
"
\n",
"
\n",
"
222
\n",
"
Japan
\n",
"
85.5
\n",
"
\n",
"
\n",
"
216
\n",
"
Macau
\n",
"
84.6
\n",
"
\n",
"
\n",
"
213
\n",
"
San Marino
\n",
"
83.4
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" country_cia life_expectancy_at_birth\n",
"225 Monaco 89.4\n",
"211 Singapore 85.5\n",
"222 Japan 85.5\n",
"216 Macau 84.6\n",
"213 San Marino 83.4"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Print the countries with the max life expectancy at birth\n",
"print('TOP5 countries with the max life expectancy at birth')\n",
"cia[['country_cia', 'life_expectancy_at_birth']].sort_values(by='life_expectancy_at_birth', ascending=False).head()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"TOP5 countries with the max median age\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
country_cia
\n",
"
median_age
\n",
"
\n",
" \n",
" \n",
"
\n",
"
225
\n",
"
Monaco
\n",
"
53.8
\n",
"
\n",
"
\n",
"
222
\n",
"
Japan
\n",
"
47.7
\n",
"
\n",
"
\n",
"
212
\n",
"
Germany
\n",
"
47.4
\n",
"
\n",
"
\n",
"
224
\n",
"
Saint Pierre and Miquelon
\n",
"
47.2
\n",
"
\n",
"
\n",
"
215
\n",
"
Italy
\n",
"
45.8
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" country_cia median_age\n",
"225 Monaco 53.8\n",
"222 Japan 47.7\n",
"212 Germany 47.4\n",
"224 Saint Pierre and Miquelon 47.2\n",
"215 Italy 45.8"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Print the countries with the max median age\n",
"print('TOP5 countries with the max median age')\n",
"cia[['country_cia', 'median_age']].sort_values(by='median_age', ascending=False).head()"
]
},
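{
"cell_type": "markdown",
"metadata": {},
"source": [
"The four TOP5 cells above sort the whole table and take its head. pandas also offers `DataFrame.nsmallest` and `DataFrame.nlargest`, which do the same selection in one call. A small sketch on values copied from the outputs above:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# A few rows of the CIA table (values taken from the outputs above)\n",
"cia_demo = pd.DataFrame({\n",
"    'country_cia': ['Angola', 'Niger', 'Mali', 'Monaco', 'Japan'],\n",
"    'median_age': [15.9, 15.5, 15.8, 53.8, 47.7],\n",
"})\n",
"\n",
"youngest = cia_demo.nsmallest(3, 'median_age')  # same rows as sort_values(...).head(3) here\n",
"oldest = cia_demo.nlargest(2, 'median_age')\n",
"print(youngest)\n",
"print(oldest)\n",
"```"
]
},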
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preparing The Data Sets\n",
"\n",
"Now we should combine the `happiness` and `cia` datasets.\n",
"\n",
"First, we need to check the columns that will be used for merging.\n",
"Country names may differ in data sets, for instance, `eSwatini` and `Swaziland`, `Trinidad and Tobago` and `Trinidad & Tobago`. In this case, the rows will not match.\n",
"\n",
"Before, we store the `country_cia` column of the `cia` in a new column `country`."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"cia['country'] = cia['country_cia']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's compare the `country` columns of the `happiness` data set and the `cia` data set.\n",
"To do this, we'll combine two data sets using `outer` join."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
country
\n",
"
score
\n",
"
population_growth
\n",
"
\n",
" \n",
" \n",
"
\n",
"
131
\n",
"
Congo (Kinshasa)
\n",
"
4.245
\n",
"
NaN
\n",
"
\n",
"
\n",
"
129
\n",
"
Myanmar
\n",
"
4.308
\n",
"
NaN
\n",
"
\n",
"
\n",
"
113
\n",
"
Congo (Brazzaville)
\n",
"
4.559
\n",
"
NaN
\n",
"
\n",
"
\n",
"
106
\n",
"
Ivory Coast
\n",
"
4.671
\n",
"
NaN
\n",
"
\n",
"
\n",
"
103
\n",
"
Palestinian Territories
\n",
"
4.743
\n",
"
NaN
\n",
"
\n",
"
\n",
"
65
\n",
"
Kosovo
\n",
"
5.662
\n",
"
NaN
\n",
"
\n",
"
\n",
"
57
\n",
"
Northern Cyprus
\n",
"
5.835
\n",
"
NaN
\n",
"
\n",
"
\n",
"
56
\n",
"
South Korea
\n",
"
5.875
\n",
"
NaN
\n",
"
\n",
"
\n",
"
37
\n",
"
Trinidad & Tobago
\n",
"
6.192
\n",
"
NaN
\n",
"
\n",
"
\n",
"
20
\n",
"
Czech Republic
\n",
"
6.711
\n",
"
NaN
\n",
"
\n",
"
\n",
"
184
\n",
"
American Samoa
\n",
"
NaN
\n",
"
-1.35
\n",
"
\n",
"
\n",
"
233
\n",
"
Andorra
\n",
"
NaN
\n",
"
-0.01
\n",
"
\n",
"
\n",
"
213
\n",
"
Anguilla
\n",
"
NaN
\n",
"
1.92
\n",
"
\n",
"
\n",
"
189
\n",
"
Antigua and Barbuda
\n",
"
NaN
\n",
"
1.20
\n",
"
\n",
"
\n",
"
214
\n",
"
Aruba
\n",
"
NaN
\n",
"
1.24
\n",
"
\n",
"
\n",
"
193
\n",
"
Bahamas, The
\n",
"
NaN
\n",
"
0.79
\n",
"
\n",
"
\n",
"
217
\n",
"
Barbados
\n",
"
NaN
\n",
"
0.26
\n",
"
\n",
"
\n",
"
218
\n",
"
Bermuda
\n",
"
NaN
\n",
"
0.43
\n",
"
\n",
"
\n",
"
220
\n",
"
British Virgin Islands
\n",
"
NaN
\n",
"
2.20
\n",
"
\n",
"
\n",
"
187
\n",
"
Brunei
\n",
"
NaN
\n",
"
1.55
\n",
"
\n",
"
\n",
"
186
\n",
"
Burma
\n",
"
NaN
\n",
"
0.89
\n",
"
\n",
"
\n",
"
181
\n",
"
Cabo Verde
\n",
"
NaN
\n",
"
1.32
\n",
"
\n",
"
\n",
"
216
\n",
"
Cayman Islands
\n",
"
NaN
\n",
"
1.96
\n",
"
\n",
"
\n",
"
238
\n",
"
Christmas Island
\n",
"
NaN
\n",
"
1.11
\n",
"
\n",
"
\n",
"
241
\n",
"
Cocos (Keeling) Islands
\n",
"
NaN
\n",
"
0.00
\n",
"
\n",
"
\n",
"
169
\n",
"
Comoros
\n",
"
NaN
\n",
"
1.57
\n",
"
\n",
"
\n",
"
159
\n",
"
Congo, Democratic Republic of the
\n",
"
NaN
\n",
"
2.33
\n",
"
\n",
"
\n",
"
157
\n",
"
Congo, Republic of the
\n",
"
NaN
\n",
"
2.17
\n",
"
\n",
"
\n",
"
203
\n",
"
Cook Islands
\n",
"
NaN
\n",
"
-2.72
\n",
"
\n",
"
\n",
"
163
\n",
"
Cote d'Ivoire
\n",
"
NaN
\n",
"
2.30
\n",
"
\n",
"
\n",
"
223
\n",
"
Cuba
\n",
"
NaN
\n",
"
-0.27
\n",
"
\n",
"
\n",
"
204
\n",
"
Curacao
\n",
"
NaN
\n",
"
0.39
\n",
"
\n",
"
\n",
"
228
\n",
"
Czechia
\n",
"
NaN
\n",
"
0.10
\n",
"
\n",
"
\n",
"
175
\n",
"
Djibouti
\n",
"
NaN
\n",
"
2.13
\n",
"
\n",
"
\n",
"
194
\n",
"
Dominica
\n",
"
NaN
\n",
"
0.17
\n",
"
\n",
"
\n",
"
160
\n",
"
Equatorial Guinea
\n",
"
NaN
\n",
"
2.41
\n",
"
\n",
"
\n",
"
164
\n",
"
Eritrea
\n",
"
NaN
\n",
"
0.89
\n",
"
\n",
"
\n",
"
168
\n",
"
Eswatini
\n",
"
NaN
\n",
"
0.82
\n",
"
\n",
"
\n",
"
221
\n",
"
Falkland Islands (Islas Malvinas)
\n",
"
NaN
\n",
"
0.01
\n",
"
\n",
"
\n",
"
199
\n",
"
Faroe Islands
\n",
"
NaN
\n",
"
0.58
\n",
"
\n",
"
\n",
"
185
\n",
"
Fiji
\n",
"
NaN
\n",
"
0.56
\n",
"
\n",
"
\n",
"
200
\n",
"
French Polynesia
\n",
"
NaN
\n",
"
0.85
\n",
"
\n",
"
\n",
"
166
\n",
"
Gambia, The
\n",
"
NaN
\n",
"
1.99
\n",
"
\n",
"
\n",
"
162
\n",
"
Gaza Strip
\n",
"
NaN
\n",
"
2.25
\n",
"
\n",
"
\n",
"
202
\n",
"
Gibraltar
\n",
"
NaN
\n",
"
0.21
\n",
"
\n",
"
\n",
"
201
\n",
"
Greenland
\n",
"
NaN
\n",
"
-0.04
\n",
"
\n",
"
\n",
"
192
\n",
"
Grenada
\n",
"
NaN
\n",
"
0.42
\n",
"
\n",
"
\n",
"
183
\n",
"
Guam
\n",
"
NaN
\n",
"
0.23
\n",
"
\n",
"
\n",
"
226
\n",
"
Guernsey
\n",
"
NaN
\n",
"
0.28
\n",
"
\n",
"
\n",
"
156
\n",
"
Guinea-Bissau
\n",
"
NaN
\n",
"
2.48
\n",
"
\n",
"
\n",
"
191
\n",
"
Guyana
\n",
"
NaN
\n",
"
0.48
\n",
"
\n",
"
\n",
"
240
\n",
"
Holy See (Vatican City)
\n",
"
NaN
\n",
"
0.00
\n",
"
\n",
"
\n",
"
222
\n",
"
Isle of Man
\n",
"
NaN
\n",
"
0.65
\n",
"
\n",
"
\n",
"
211
\n",
"
Jersey
\n",
"
NaN
\n",
"
0.76
\n",
"
\n",
"
\n",
"
179
\n",
"
Kiribati
\n",
"
NaN
\n",
"
1.12
\n",
"
\n",
"
\n",
"
198
\n",
"
Korea, North
\n",
"
NaN
\n",
"
0.52
\n",
"
\n",
"
\n",
"
231
\n",
"
Korea, South
\n",
"
NaN
\n",
"
0.44
\n",
"
\n",
"
\n",
"
225
\n",
"
Liechtenstein
\n",
"
NaN
\n",
"
0.78
\n",
"
\n",
"
\n",
"
230
\n",
"
Macau
\n",
"
NaN
\n",
"
0.71
\n",
"
\n",
"
\n",
"
188
\n",
"
Maldives
\n",
"
NaN
\n",
"
-0.06
\n",
"
\n",
"
\n",
"
171
\n",
"
Marshall Islands
\n",
"
NaN
\n",
"
1.50
\n",
"
\n",
"
\n",
"
182
\n",
"
Micronesia, Federated States of
\n",
"
NaN
\n",
"
-0.55
\n",
"
\n",
"
\n",
"
235
\n",
"
Monaco
\n",
"
NaN
\n",
"
0.30
\n",
"
\n",
"
\n",
"
224
\n",
"
Montserrat
\n",
"
NaN
\n",
"
0.43
\n",
"
\n",
"
\n",
"
177
\n",
"
Nauru
\n",
"
NaN
\n",
"
0.51
\n",
"
\n",
"
\n",
"
197
\n",
"
New Caledonia
\n",
"
NaN
\n",
"
1.30
\n",
"
\n",
"
\n",
"
245
\n",
"
Niue
\n",
"
NaN
\n",
"
-0.03
\n",
"
\n",
"
\n",
"
239
\n",
"
Norfolk Island
\n",
"
NaN
\n",
"
0.01
\n",
"
\n",
"
\n",
"
195
\n",
"
Northern Mariana Islands
\n",
"
NaN
\n",
"
-0.52
\n",
"
\n",
"
\n",
"
172
\n",
"
Oman
\n",
"
NaN
\n",
"
2.00
\n",
"
\n",
"
\n",
"
219
\n",
"
Palau
\n",
"
NaN
\n",
"
0.40
\n",
"
\n",
"
\n",
"
176
\n",
"
Papua New Guinea
\n",
"
NaN
\n",
"
1.67
\n",
"
\n",
"
\n",
"
242
\n",
"
Pitcairn Islands
\n",
"
NaN
\n",
"
0.00
\n",
"
\n",
"
\n",
"
232
\n",
"
Puerto Rico
\n",
"
NaN
\n",
"
-1.70
\n",
"
\n",
"
\n",
"
236
\n",
"
Saint Barthelemy
\n",
"
NaN
\n",
"
NaN
\n",
"
\n",
"
\n",
"
227
\n",
"
Saint Helena, Ascension, and Tristan da Cunha
\n",
"
NaN
\n",
"
0.14
\n",
"
\n",
"
\n",
"
208
\n",
"
Saint Kitts and Nevis
\n",
"
NaN
\n",
"
0.70
\n",
"
\n",
"
\n",
"
206
\n",
"
Saint Lucia
\n",
"
NaN
\n",
"
0.31
\n",
"
\n",
"
\n",
"
237
\n",
"
Saint Martin
\n",
"
NaN
\n",
"
NaN
\n",
"
\n",
"
\n",
"
234
\n",
"
Saint Pierre and Miquelon
\n",
"
NaN
\n",
"
-1.13
\n",
"
\n",
"
\n",
"
209
\n",
"
Saint Vincent and the Grenadines
\n",
"
NaN
\n",
"
-0.23
\n",
"
\n",
"
\n",
"
180
\n",
"
Samoa
\n",
"
NaN
\n",
"
0.61
\n",
"
\n",
"
\n",
"
229
\n",
"
San Marino
\n",
"
NaN
\n",
"
0.70
\n",
"
\n",
"
\n",
"
161
\n",
"
Sao Tome and Principe
\n",
"
NaN
\n",
"
1.66
\n",
"
\n",
"
\n",
"
205
\n",
"
Seychelles
\n",
"
NaN
\n",
"
0.74
\n",
"
\n",
"
\n",
"
207
\n",
"
Sint Maarten
\n",
"
NaN
\n",
"
1.39
\n",
"
\n",
"
\n",
"
170
\n",
"
Solomon Islands
\n",
"
NaN
\n",
"
1.90
\n",
"
\n",
"
\n",
"
190
\n",
"
Suriname
\n",
"
NaN
\n",
"
1.00
\n",
"
\n",
"
\n",
"
244
\n",
"
Svalbard
\n",
"
NaN
\n",
"
-0.03
\n",
"
\n",
"
\n",
"
158
\n",
"
Timor-Leste
\n",
"
NaN
\n",
"
2.32
\n",
"
\n",
"
\n",
"
243
\n",
"
Tokelau
\n",
"
NaN
\n",
"
-0.01
\n",
"
\n",
"
\n",
"
178
\n",
"
Tonga
\n",
"
NaN
\n",
"
-0.10
\n",
"
\n",
"
\n",
"
215
\n",
"
Trinidad and Tobago
\n",
"
NaN
\n",
"
-0.23
\n",
"
\n",
"
\n",
"
196
\n",
"
Turks and Caicos Islands
\n",
"
NaN
\n",
"
2.09
\n",
"
\n",
"
\n",
"
173
\n",
"
Tuvalu
\n",
"
NaN
\n",
"
0.86
\n",
"
\n",
"
\n",
"
174
\n",
"
Vanuatu
\n",
"
NaN
\n",
"
1.81
\n",
"
\n",
"
\n",
"
212
\n",
"
Virgin Islands
\n",
"
NaN
\n",
"
-0.30
\n",
"
\n",
"
\n",
"
210
\n",
"
Wallis and Futuna
\n",
"
NaN
\n",
"
0.30
\n",
"
\n",
"
\n",
"
167
\n",
"
West Bank
\n",
"
NaN
\n",
"
1.81
\n",
"
\n",
"
\n",
"
165
\n",
"
Western Sahara
\n",
"
NaN
\n",
"
2.64
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" country score population_growth\n",
"131 Congo (Kinshasa) 4.245 NaN\n",
"129 Myanmar 4.308 NaN\n",
"113 Congo (Brazzaville) 4.559 NaN\n",
"106 Ivory Coast 4.671 NaN\n",
"103 Palestinian Territories 4.743 NaN\n",
"65 Kosovo 5.662 NaN\n",
"57 Northern Cyprus 5.835 NaN\n",
"56 South Korea 5.875 NaN\n",
"37 Trinidad & Tobago 6.192 NaN\n",
"20 Czech Republic 6.711 NaN\n",
"184 American Samoa NaN -1.35\n",
"233 Andorra NaN -0.01\n",
"213 Anguilla NaN 1.92\n",
"189 Antigua and Barbuda NaN 1.20\n",
"214 Aruba NaN 1.24\n",
"193 Bahamas, The NaN 0.79\n",
"217 Barbados NaN 0.26\n",
"218 Bermuda NaN 0.43\n",
"220 British Virgin Islands NaN 2.20\n",
"187 Brunei NaN 1.55\n",
"186 Burma NaN 0.89\n",
"181 Cabo Verde NaN 1.32\n",
"216 Cayman Islands NaN 1.96\n",
"238 Christmas Island NaN 1.11\n",
"241 Cocos (Keeling) Islands NaN 0.00\n",
"169 Comoros NaN 1.57\n",
"159 Congo, Democratic Republic of the NaN 2.33\n",
"157 Congo, Republic of the NaN 2.17\n",
"203 Cook Islands NaN -2.72\n",
"163 Cote d'Ivoire NaN 2.30\n",
"223 Cuba NaN -0.27\n",
"204 Curacao NaN 0.39\n",
"228 Czechia NaN 0.10\n",
"175 Djibouti NaN 2.13\n",
"194 Dominica NaN 0.17\n",
"160 Equatorial Guinea NaN 2.41\n",
"164 Eritrea NaN 0.89\n",
"168 Eswatini NaN 0.82\n",
"221 Falkland Islands (Islas Malvinas) NaN 0.01\n",
"199 Faroe Islands NaN 0.58\n",
"185 Fiji NaN 0.56\n",
"200 French Polynesia NaN 0.85\n",
"166 Gambia, The NaN 1.99\n",
"162 Gaza Strip NaN 2.25\n",
"202 Gibraltar NaN 0.21\n",
"201 Greenland NaN -0.04\n",
"192 Grenada NaN 0.42\n",
"183 Guam NaN 0.23\n",
"226 Guernsey NaN 0.28\n",
"156 Guinea-Bissau NaN 2.48\n",
"191 Guyana NaN 0.48\n",
"240 Holy See (Vatican City) NaN 0.00\n",
"222 Isle of Man NaN 0.65\n",
"211 Jersey NaN 0.76\n",
"179 Kiribati NaN 1.12\n",
"198 Korea, North NaN 0.52\n",
"231 Korea, South NaN 0.44\n",
"225 Liechtenstein NaN 0.78\n",
"230 Macau NaN 0.71\n",
"188 Maldives NaN -0.06\n",
"171 Marshall Islands NaN 1.50\n",
"182 Micronesia, Federated States of NaN -0.55\n",
"235 Monaco NaN 0.30\n",
"224 Montserrat NaN 0.43\n",
"177 Nauru NaN 0.51\n",
"197 New Caledonia NaN 1.30\n",
"245 Niue NaN -0.03\n",
"239 Norfolk Island NaN 0.01\n",
"195 Northern Mariana Islands NaN -0.52\n",
"172 Oman NaN 2.00\n",
"219 Palau NaN 0.40\n",
"176 Papua New Guinea NaN 1.67\n",
"242 Pitcairn Islands NaN 0.00\n",
"232 Puerto Rico NaN -1.70\n",
"236 Saint Barthelemy NaN NaN\n",
"227 Saint Helena, Ascension, and Tristan da Cunha NaN 0.14\n",
"208 Saint Kitts and Nevis NaN 0.70\n",
"206 Saint Lucia NaN 0.31\n",
"237 Saint Martin NaN NaN\n",
"234 Saint Pierre and Miquelon NaN -1.13\n",
"209 Saint Vincent and the Grenadines NaN -0.23\n",
"180 Samoa NaN 0.61\n",
"229 San Marino NaN 0.70\n",
"161 Sao Tome and Principe NaN 1.66\n",
"205 Seychelles NaN 0.74\n",
"207 Sint Maarten NaN 1.39\n",
"170 Solomon Islands NaN 1.90\n",
"190 Suriname NaN 1.00\n",
"244 Svalbard NaN -0.03\n",
"158 Timor-Leste NaN 2.32\n",
"243 Tokelau NaN -0.01\n",
"178 Tonga NaN -0.10\n",
"215 Trinidad and Tobago NaN -0.23\n",
"196 Turks and Caicos Islands NaN 2.09\n",
"173 Tuvalu NaN 0.86\n",
"174 Vanuatu NaN 1.81\n",
"212 Virgin Islands NaN -0.30\n",
"210 Wallis and Futuna NaN 0.30\n",
"167 West Bank NaN 1.81\n",
"165 Western Sahara NaN 2.64"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"happiness_cia = happiness.merge(cia, on='country', how='outer')[['country', 'score', 'population_growth']]\n",
"\n",
"pd.set_option('display.max_rows', 100) # increase the number of rows to display\n",
"happiness_cia[happiness_cia.isnull().any(axis=1)].sort_values(by=['score', 'country']) # the countries don't match"
]
},
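{
"cell_type": "markdown",
"metadata": {},
"source": [
"The table above lists both happiness countries without a CIA match and CIA entries without a happiness score. Passing `indicator=True` to `merge` adds a `_merge` column that tells which side each row came from. A minimal sketch with a few names from the outputs above:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Country names as spelled in each source (taken from the outputs above)\n",
"happy_names = pd.DataFrame({'country': ['Finland', 'Czech Republic', 'South Korea']})\n",
"cia_names = pd.DataFrame({'country': ['Finland', 'Czechia', 'Korea, South']})\n",
"\n",
"merged = happy_names.merge(cia_names, on='country', how='outer', indicator=True)\n",
"only_happiness = merged.loc[merged['_merge'] == 'left_only', 'country'].tolist()\n",
"only_cia = merged.loc[merged['_merge'] == 'right_only', 'country'].tolist()\n",
"print(only_happiness)  # names to look up in the CIA data\n",
"print(only_cia)        # candidate CIA spellings\n",
"```"
]
},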
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To provide the same country names:\n",
"\n",
"- Create a dictionary mapping all names to the values in the `happiness` dataset since we explore the happiness data.\n",
"- Rename the countries in the `cia` dataset by replacing the values according to the map dictionary."
]
},
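{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of these two steps, with a partial map built from a few unmatched names in the table above (`country_map` and `cia_demo` are illustrative names; the complete map would need an entry for every mismatched country):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Partial map: CIA name -> name used in the happiness report\n",
"country_map = {\n",
"    'Burma': 'Myanmar',\n",
"    'Congo, Democratic Republic of the': 'Congo (Kinshasa)',\n",
"    'Congo, Republic of the': 'Congo (Brazzaville)',\n",
"    'Czechia': 'Czech Republic',\n",
"    'Korea, South': 'South Korea',\n",
"    'Trinidad and Tobago': 'Trinidad & Tobago',\n",
"}\n",
"\n",
"cia_demo = pd.DataFrame({'country_cia': ['Czechia', 'Burma', 'Niger']})\n",
"# replace() swaps exact matches only; names missing from the map stay unchanged\n",
"cia_demo['country'] = cia_demo['country_cia'].replace(country_map)\n",
"print(cia_demo['country'].tolist())  # ['Czech Republic', 'Myanmar', 'Niger']\n",
"```"
]
},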
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
],
"source": [
"# Color palette for the data\n",
"palette = 'summer_r'\n",
"\n",
"# Inscriptions\n",
"title = \"\"\"The Relationship Between Life Expectancy at Birth And Happiness\"\"\"\n",
"description = \"\"\"\n",
"Correlation of the life expectancy at birth with the happiness score by country based on 2018 data.\n",
"Data: Gallup World Poll - www.kaggle.com/unsdsn/world-happiness & CIA - www.cia.gov/library/publications/the-world-factbook | Author: @data.sugar\n",
"\"\"\"\n",
"\n",
"# Plot size\n",
"figsize = (6,4)\n",
"\n",
"# Set the figure\n",
"sns.set(context='paper', style='ticks', palette=snark_palette,\n",
" rc={'xtick.major.size': 4, 'ytick.major.size':4,\n",
" 'axes.spines.left': True, 'axes.spines.bottom': True,\n",
" 'axes.spines.right': False, 'axes.spines.top': False\n",
" }\n",
" )\n",
"\n",
"# Create the plot\n",
"fig = plt.figure(figsize=figsize, facecolor='w')\n",
"ax = sns.scatterplot(x='life_expectancy_at_birth', y='score',\n",
" hue=happiness_cia['score'].tolist(),\n",
" size=happiness_cia['life_expectancy_at_birth'].tolist(),\n",
" sizes=(10,100),\n",
" data=happiness_cia,\n",
" palette=palette, legend=False\n",
" )\n",
"\n",
"# Set some aesthetic params for the plot\n",
"ax.set_title(title, fontdict={'fontsize': 16}, loc='center', pad=10, c=snark_palette[-1]) # set a title of the plot\n",
"ax.annotate(description, xy=(0.1, -0.015), size=6, xycoords='figure fraction', c=snark_palette[-1])\n",
"ax.spines['bottom'].set_linestyle((0, (1, 10)))\n",
"ax.spines['bottom'].set_color(snark_palette[-1])\n",
"ax.spines['left'].set_linestyle((0, (1, 10)))\n",
"ax.spines['left'].set_color(snark_palette[-1])\n",
"ax.set_xlabel('Life Expectancy at Birth', loc='center', size='x-large', c=snark_palette[-1]) # set label of x axis\n",
"ax.set_xticks(list(range(50, 100, 10)))\n",
"ax.set_xticklabels(list(range(50, 100, 10)), c=snark_palette[-1])\n",
"ax.set_ylabel('Score', loc='center', size='x-large', c=snark_palette[-1]) # set label of y axis\n",
"ax.set_yticks(list(range(2, 9)))\n",
"ax.set_yticklabels(list(range(2, 9)), c=snark_palette[-1])\n",
"ax.tick_params(axis='both', labelsize='small', colors=snark_palette[-1], direction='out') # set x/y ticks\n",
"\n",
"# Save and plot\n",
"plt.savefig('plot.pic/plot.happiness.life_exp_at_birth.png', dpi=150, bbox_inches='tight')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Median Age And Happiness Score\n",
"\n",
"We would like to see how happiness is distributed according to the age of the population. \n",
"To do this, we will divide the countries by age group based on the median age."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
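"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>age_range</th>\n",
"      <th>median_age</th>\n",
"      <th>score</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>(15, 20]</td>\n",
"      <td>18.134375</td>\n",
"      <td>4.083625</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>(20, 25]</td>\n",
"      <td>22.980952</td>\n",
"      <td>4.746381</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>(25, 30]</td>\n",
"      <td>28.318519</td>\n",
"      <td>5.493296</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>3</th>\n",
"      <td>(30, 35]</td>\n",
"      <td>32.005263</td>\n",
"      <td>5.717632</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>4</th>\n",
"      <td>(35, 40]</td>\n",
"      <td>37.645000</td>\n",
"      <td>6.170850</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>5</th>\n",
"      <td>(40, 45]</td>\n",
"      <td>42.484375</td>\n",
"      <td>6.202000</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>6</th>\n",
"      <td>(45, 50]</td>\n",
"      <td>46.966667</td>\n",
"      <td>6.293333</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"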
],
"text/plain": [
" age_range median_age score\n",
"0 (15, 20] 18.134375 4.083625\n",
"1 (20, 25] 22.980952 4.746381\n",
"2 (25, 30] 28.318519 5.493296\n",
"3 (30, 35] 32.005263 5.717632\n",
"4 (35, 40] 37.645000 6.170850\n",
"5 (40, 45] 42.484375 6.202000\n",
"6 (45, 50] 46.966667 6.293333"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Calculate min and max median age\n",
"median_age_min = cia['median_age'].min()\n",
"median_age_max = cia['median_age'].max()\n",
"\n",
"# Calculate bins\n",
"bin_step = 5\n",
"bin_from = int((median_age_min // bin_step) * bin_step)\n",
"bin_to = int((median_age_max // bin_step) * bin_step + bin_step)\n",
"bins = list(range(bin_from, bin_to + bin_step, bin_step)) # age-group edges; range() excludes its stop, so run one step past bin_to\n",
"\n",
"happiness_cia['age_range'] = pd.cut(happiness_cia['median_age'], bins)\n",
"happiness_cia_grouped = (happiness_cia[['age_range', 'median_age', 'score']].groupby('age_range')\n",
" .mean()\n",
" .reset_index()\n",
" )\n",
"happiness_cia_grouped"
]
},
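{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick check of the bin-edge arithmetic on hypothetical median ages (the values in `ages` are made up): `range()` stops before its end value, so `bin_to` only becomes the last edge if the range runs one step further, and `pd.cut` returns `NaN` for any value above the last edge.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical median ages; the maximum (51.3) deliberately exceeds 50\n",
"ages = pd.Series([15.4, 22.0, 35.0, 47.3, 51.3])\n",
"bin_step = 5\n",
"bin_from = int((ages.min() // bin_step) * bin_step)            # 15\n",
"bin_to = int((ages.max() // bin_step) * bin_step + bin_step)   # 55\n",
"\n",
"# range() stops before its end value, so bin_to itself is never an edge here\n",
"edges_short = list(range(bin_from, bin_to, bin_step))            # [15, 20, ..., 50]\n",
"# Running one step further makes bin_to the last edge\n",
"edges_full = list(range(bin_from, bin_to + bin_step, bin_step))  # [15, 20, ..., 55]\n",
"\n",
"print(pd.cut(ages, edges_short).isna().sum())  # 1: the 51.3 value falls outside every bin\n",
"print(pd.cut(ages, edges_full).isna().sum())   # 0\n",
"```\n",
"\n",
"Since `pd.cut` builds right-closed intervals by default, a value exactly equal to `bin_from` would also be dropped unless `include_lowest=True` is passed."
]
},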
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"\r\n",
"\r\n",
"\r\n",
"\r\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Inscriptions\n",
"title = \"\"\"The Relationship Between Median Age And Happiness\"\"\"\n",
"description = \"\"\"\n",
"Correlation of the median age with the happiness score by country based on 2018 data.\n",
"Data: Gallup World Poll - www.kaggle.com/unsdsn/world-happiness & CIA - www.cia.gov/library/publications/the-world-factbook | Author: @data.sugar\n",
"\"\"\"\n",
"\n",
"# Plot size\n",
"figsize = (6,4)\n",
"\n",
"# Set the figure\n",
"sns.set(context='paper', style='whitegrid', palette=snark_palette,\n",
" rc={'xtick.major.size': 4, 'ytick.major.size':4,\n",
" 'axes.spines.left': False, 'axes.spines.bottom': False,\n",
" 'axes.spines.right': False, 'axes.spines.top': False\n",
" }\n",
" )\n",
"\n",
"# Create the plot\n",
"fig = plt.figure(figsize=figsize, facecolor='w')\n",
"ax = sns.violinplot(x='age_range', y='score', data=happiness_cia,\n",
" inner='quart', linewidth=1,\n",
" palette='spring_r'\n",
" )\n",
"\n",
"# Set some aesthetic params for the plot\n",
"ax.set_title(title, fontdict={'fontsize': 16}, loc='center', pad=10, c=snark_palette[-1]) # set a title of the plot\n",
"ax.annotate(description, xy=(0.03, 0), size=6, xycoords='figure fraction', c=snark_palette[-1])\n",
"ax.xaxis.set_label_text('') # remove label of x axis\n",
"ax.text(s='Median Age', x=3, y=2.2, horizontalalignment='center', verticalalignment='center', size='large', c=snark_palette[-1]) # set label of x axis\n",
"ax.set_ylabel('Happiness score', loc='center', size='large', c=snark_palette[-1]) # set label of y axis\n",
"ax.tick_params(axis='both', labelsize='medium', colors=snark_palette[-1]) # set x/y ticks\n",
"\n",
"# Save and plot\n",
"plt.savefig('plot.pic/plot.happiness.median_age.png', dpi=150, bbox_inches='tight')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Correlation Map\n",
"\n",
"We'll plot the correlation matrix below, but first, we'll prepare the data to make the plot easier to read."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
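"# Drop the first row and the last column: they hold only self-correlations or pairs already shown in the lower triangle\n",
"happiness_cia_corr = happiness_cia_corr.iloc[1:, :-1]\n",
"# Triangular mask: hide the cells above the main diagonal so each pair appears once\n",
"mask = np.triu(np.ones_like(happiness_cia_corr, dtype=bool), k=1)\n",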
"\n",
"# Readable names for the plot\n",
"cols_dict = {'score':'Happiness',\n",
" 'life_expectancy_at_birth':'Life exp.\\nat birth',\n",
" 'median_age':'Median\\nage',\n",
" 'birth':'Birth',\n",
" 'population_growth':'Population\\ngrowth',\n",
" 'death':'Death'\n",
" }\n",
"# Rename columns in the correlation matrix\n",
"happiness_cia_corr.rename(columns=cols_dict, index=cols_dict, inplace=True)"
]
},
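{
"cell_type": "markdown",
"metadata": {},
"source": [
"How the trimming and the triangular mask combine, shown on a hypothetical 4x4 correlation matrix (the column names and values are made up): after `iloc[1:, :-1]` the main diagonal holds distinct pairs rather than self-correlations, so `k=1` hides only the cells above it, and `seaborn.heatmap` skips every cell where the mask is `True`.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical 4x4 correlation matrix, values made up for illustration\n",
"cols = ['a', 'b', 'c', 'd']\n",
"corr = pd.DataFrame([[1.0, 0.5, 0.2, 0.1],\n",
"                     [0.5, 1.0, 0.3, 0.4],\n",
"                     [0.2, 0.3, 1.0, 0.6],\n",
"                     [0.1, 0.4, 0.6, 1.0]], index=cols, columns=cols)\n",
"\n",
"# After trimming, the main diagonal holds the pairs (b, a), (c, b), (d, c)\n",
"reduced = corr.iloc[1:, :-1]\n",
"\n",
"# k=1 keeps the main diagonal visible and masks only the cells above it\n",
"mask = np.triu(np.ones_like(reduced, dtype=bool), k=1)\n",
"\n",
"# Cells where the mask is False are the ones a heatmap would draw\n",
"visible = reduced.where(~mask)\n",
"print(visible)\n",
"```\n",
"\n",
"Six cells remain visible, one for each of the six distinct pairs among four variables, and none of them is a self-correlation."
]
},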
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"\r\n",
"\r\n",
"\r\n",
"\r\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Color palette for the data\n",
"palette = [snark_palette[0], # red\n",
" 'lightgrey',\n",
" snark_palette[1] # green\n",
" ]\n",
"\n",
"# Inscriptions\n",
"title = \"\"\"Relationship Between Age Indicators And The Happiness Score\"\"\"\n",
"description = \"\"\"\n",
"Correlation of age indicators with the happiness score by country based on 2018 data.\n",
"Data: Gallup World Poll - www.kaggle.com/unsdsn/world-happiness & CIA - www.cia.gov/library/publications/the-world-factbook | Author: @data.sugar\n",
"\"\"\"\n",
"\n",
"# Plot size\n",
"figsize = (6,4)\n",
"\n",
"# Set the figure\n",
"sns.set(context='paper', style='ticks', palette=palette,\n",
" rc={'xtick.bottom':False, 'ytick.left':False, \n",
" 'axes.spines.left': False, 'axes.spines.bottom': False,\n",
" 'axes.spines.right': False, 'axes.spines.top': False\n",
" }\n",
" )\n",
"\n",
"# Create the plot\n",
"fig, ax = plt.subplots(1, 1, figsize=figsize, facecolor='w')\n",
"sns.heatmap(happiness_cia_corr, mask=mask, cmap=palette,\n",
" vmin=-1, vmax=1, center=0,\n",
" square=False, linewidths=.5, annot=True, fmt='.2g',\n",
" cbar_kws={'shrink': 1, 'ticks':[], 'label':'-1 negative <- correlation -> positive +1'},\n",
" ax=ax)\n",
"\n",
"# Set some aesthetic params for the plot\n",
"ax.set_title(title, fontdict={'fontsize': 16}, loc='center', pad=10, c=snark_palette[-1]) # set a title of the plot\n",
"ax.annotate(description, xy=(20, -4), size=6, xycoords='figure points', c=snark_palette[-1])\n",
"ax.tick_params(axis='both', colors=snark_palette[-1]) # set x/y ticks\n",
"ax.set_yticklabels(ax.get_yticklabels(), rotation=0) # set rotation for y tick labels\n",
"\n",
"# Save and plot\n",
"plt.savefig('plot.pic/plot.happiness.age.png', dpi=150, bbox_inches='tight')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Not surprisingly, the older the population, the lower the birth rate, although the strength of the correlation between birth rate and median age is striking. \n",
"Countries with younger populations have higher population growth but lower happiness scores."
]
},
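{
"cell_type": "markdown",
"metadata": {},
"source": [
"The strength of a linear relationship such as median age versus birth rate is commonly summarised by Pearson's r, which `DataFrame.corr()` computes by default; values near -1 mean a strong negative relationship. A sketch on hypothetical values (`toy` is made up and is not the CIA data):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical values, not the CIA figures: younger countries get higher birth rates\n",
"toy = pd.DataFrame({'median_age': [16.0, 20.0, 28.0, 35.0, 42.0, 47.0],\n",
"                    'birth':      [45.0, 38.0, 22.0, 15.0, 10.0, 8.0]})\n",
"\n",
"# DataFrame.corr() uses Pearson's r by default; the result lies between -1 and +1\n",
"r = toy.corr().loc['median_age', 'birth']\n",
"print(round(r, 2))\n",
"```\n",
"\n",
"Pearson's r measures only linear association, so a strong value says nothing on its own about which variable drives the other."
]
},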
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Median Age And Birth Rate"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/svg+xml": [
"\r\n",
"\r\n",
"\r\n",
"\r\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Inscriptions\n",
"title = \"\"\"The Relationship Between Median Age And Birth\"\"\"\n",
"description = \"\"\"\n",
"Correlation of the median age with the birth rate by country based on 2018 data.\n",
"Data: Gallup World Poll - www.kaggle.com/unsdsn/world-happiness & CIA - www.cia.gov/library/publications/the-world-factbook | Author: @data.sugar\n",
"\"\"\"\n",
"\n",
"# Plot size\n",
"figsize = (6,4)\n",
"\n",
"# Set the figure\n",
"sns.set(context='paper', style='ticks', palette=snark_palette,\n",
" rc={'xtick.major.size': 4, 'ytick.major.size':4,\n",
" 'axes.spines.left': False, 'axes.spines.bottom': False,\n",
" 'axes.spines.right': False, 'axes.spines.top': False\n",
" }\n",
" )\n",
"\n",
"# Create the plot (jointplot creates its own figure, so no plt.figure() call is needed)\n",
"g = sns.jointplot(x='median_age', y='birth', data=happiness_cia,\n",
" kind='reg', truncate=False, dropna=True,\n",
" xlim=(10, 50), ylim=(0, 50),\n",
" marginal_kws=dict(hist=True, bins=10),\n",
" color=snark_palette[0]\n",
" )\n",
"\n",
"# Set some aesthetic params for the plot\n",
"g.ax_marg_x.set_title(title, fontdict={'fontsize': 16}, loc='center', pad=10, c=snark_palette[-1]) # set a title of the plot\n",
"g.ax_marg_x.annotate(description, xy=(0.015, -0.01), size=6, xycoords='figure fraction', c=snark_palette[-1])\n",
"g.ax_joint.set_xlabel('Median Age', loc='center', size='x-large', c=snark_palette[-1]) # set label of x axis\n",
"g.ax_joint.set_ylabel('Birth', loc='center', size='x-large', c=snark_palette[-1]) # set label of y axis\n",
"g.ax_joint.tick_params(axis='both', labelsize='large', colors=snark_palette[-1]) # set x/y ticks\n",
"g.ax_joint.spines['bottom'].set_color(snark_palette[-1]) # color x axis\n",
"g.ax_joint.spines['left'].set_color(snark_palette[-1]) # color y axis\n",
"g.ax_marg_x.tick_params(axis='x', bottom=False) # disable x margin ticks\n",
"g.ax_marg_x.spines['bottom'].set_color(snark_palette[0])\n",
"g.ax_marg_y.tick_params(axis='y', left=False) # disable y margin ticks\n",
"g.ax_marg_y.spines['left'].set_color(snark_palette[0])\n",
"\n",
"# Save and plot\n",
"plt.savefig('plot.pic/plot.happiness.age.birth.png', dpi=150, bbox_inches='tight')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusions\n",
"\n",
"[In the previous project](https://nbviewer.jupyter.org/github/chupstee/data.sugar/blob/master/00002_world_happiness/world_happiness.where.map.ipynb), we identified the TOP5 least happy countries. This list includes Afghanistan and the Central African Republic. \n",
"Both countries are also among the TOP5 countries with the lowest life expectancy at birth. \n",
"\n",
"Life expectancy at birth reflects the overall quality of life in a country and the health of its population. Not surprisingly, the higher the life expectancy at birth, the higher the happiness score. \n",
"\n",
"Now let's look at the relationship between median age and happiness. \n",
"Societies seem to reach a saturation point beyond which living longer no longer makes people happier: the mean happiness score rises from 4.08 in the (15, 20] median-age group to 6.17 in the (35, 40] group, then barely changes (6.20 for (40, 45] and 6.29 for (45, 50]). \n",
"\n",
"Some correlations suggest that the older the population, the less inclined (or able?) it is to reproduce:\n",
"\n",
"- the higher the median age, the lower the population growth;\n",
"- the higher the median age, the lower the birth rate.\n",
"\n",
"Earlier, we found out that thirty-three is a special age. You will find research on this topic [here](https://nbviewer.jupyter.org/github/chupstee/data.sugar/blob/master/00001_thirty_years_old/thirty_years_old.ipynb). \n",
"Curiously, the median age above which the happiness score no longer changes significantly is also about thirty-five. \n",
"Even more interesting, the median age of the world's population is about thirty, as we saw above. \n",
"Well, perhaps humanity is in its prime. On average, of course. \n",
"\n",
"\n",
"# Blog Post"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1976"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"blog_post = r\"\"\"\n",
"## IN SEARCH OF 🎈HAPPINESS: 🕗WHEN? \n",
"\n",
"\n",
"💪The \\#happiness \\#challenge on the go! \n",
"\n",
"👀Today we wondered when we are becoming happier. \n",
"To compare population age with happiness score, we'll take the World Happiness Report from \\#Kaggle, which ranks 156 countries for happiness on a 10-point scale, and use \\#CIA data from The World Factbook. \n",
"\n",
"📌We will try to discover the happiest age! \n",
"\n",
"✔In the previous project \\#data_sugar_happiness, we have already defined the \\#TOP5 least happy countries. \n",
"This rating includes Afghanistan and the Central African Republic. \n",
"These countries were also included in the TOP5 countries with the lowest life expectancy at birth. \n",
"\n",
"✔Life expectancy at birth describes the overall quality of life in the country and indicates the health of the population. \n",
"Not surprisingly, the higher the life expectancy at birth, the higher the happiness score. \n",
"\n",
"✔Let's take a look at the relationship between median age and happiness. \n",
"It seems that society in its development reaches some point of saturation, when a person, living a longer \\#life, does not become happier. \n",
"\n",
"✔Some correlations indicate that the older the \\#population, the less striving (or able?) to self-reproduction: \n",
"\n",
"- the higher median age, the lower population growth; \n",
"- the higher median age, the lower birth rate. \n",
"\n",
"📝Earlier \\#data_sugar_brain, we found out that 33 is a special age. \n",
"It is curious that the age of the population over which the happiness score does not change significantly is also about 35 years old. \n",
"Even more interesting is that the median age of the world's population is 30 years, as we saw above. \n",
"\n",
"Well, perhaps \\#humanity is in its \\#prime. \n",
"On average, of course. \n",
"\n",
"\\#tobecontinued\n",
"\n",
"(Interested in more details? Follow the link in bio for the entire research project!) \n",
". \n",
". \n",
". \n",
"\\#funtime \\#probably \\#datascience \\#datapower \\#data_sugar_happiness \\#happy\n",
"\\#data_know_everything_and_nothing \\#linkinbio \\#datajournalism \\#python\n",
"\"\"\"\n",
"\n",
"# Check post text length for Instagram\n",
"len(blog_post)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"\n",
"## IN SEARCH OF 🎈HAPPINESS: 🕗WHEN? \n",
"\n",
"\n",
"💪The \\#happiness \\#challenge on the go! \n",
"\n",
"👀Today we wondered when we are becoming happier. \n",
"To compare population age with happiness score, we'll take the World Happiness Report from \\#Kaggle, which ranks 156 countries for happiness on a 10-point scale, and use \\#CIA data from The World Factbook. \n",
"\n",
"📌We will try to discover the happiest age! \n",
"\n",
"✔In the previous project \\#data_sugar_happiness, we have already defined the \\#TOP5 least happy countries. \n",
"This rating includes Afghanistan and the Central African Republic. \n",
"These countries were also included in the TOP5 countries with the lowest life expectancy at birth. \n",
"\n",
"✔Life expectancy at birth describes the overall quality of life in the country and indicates the health of the population. \n",
"Not surprisingly, the higher the life expectancy at birth, the higher the happiness score. \n",
"\n",
"✔Let's take a look at the relationship between median age and happiness. \n",
"It seems that society in its development reaches some point of saturation, when a person, living a longer \\#life, does not become happier. \n",
"\n",
"✔Some correlations indicate that the older the \\#population, the less striving (or able?) to self-reproduction: \n",
"\n",
"- the higher median age, the lower population growth; \n",
"- the higher median age, the lower birth rate. \n",
"\n",
"📝Earlier \\#data_sugar_brain, we found out that 33 is a special age. \n",
"It is curious that the age of the population over which the happiness score does not change significantly is also about 35 years old. \n",
"Even more interesting is that the median age of the world's population is 30 years, as we saw above. \n",
"\n",
"Well, perhaps \\#humanity is in its \\#prime. \n",
"On average, of course. \n",
"\n",
"\\#tobecontinued\n",
"\n",
"(Interested in more details? Follow the link in bio for the entire research project!) \n",
". \n",
". \n",
". \n",
"\\#funtime \\#probably \\#datascience \\#datapower \\#data_sugar_happiness \\#happy\n",
"\\#data_know_everything_and_nothing \\#linkinbio \\#datajournalism \\#python\n"
],
"text/plain": [
""
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from IPython.display import Markdown as md\n",
"md(blog_post)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.1"
}
},
"nbformat": 4,
"nbformat_minor": 4
}