{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Python 101\n",
    "## Part XI.\n",
    "---\n",
    "\n",
    "## More pandas are coming!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "BASE_URI = \"./data/\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## 1. Read demographic data\n",
    "\n",
    "- Read data from a csv file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "df = pd.read_csv(BASE_URI + 'population.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Select the columns we need"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "columns = ['Country Name'] + [str(year) for year in range(1960, 2013)]\n",
    "pop = df[columns].dropna()\n",
    "pop.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Rename `'Country Name'` to `'country'` and set it as index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "pop = pop.rename(columns={'Country Name': 'country'}).set_index('country')\n",
    "pop.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Select a few countries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "countries = ['United Kingdom', 'Hungary', 'France', 'Germany']\n",
    "subpop = pop.loc[pop.index.isin(countries)]\n",
    "subpop"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Transpose the dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "subpop = subpop.transpose()\n",
    "subpop.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Plot!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "subpop.plot.line();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## 2. Read data about alcohol\n",
    "\n",
    "- Read the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "df = pd.read_csv(BASE_URI + 'alcohol.csv')\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Select the columns, and rename them"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "columns = {\n",
    "    'Country': 'country',\n",
    "    'Year': 'year',\n",
    "    'Beverage Types': 'type',\n",
    "    'Display Value': 'alcohol'\n",
    "}\n",
    "alc = df[list(columns.keys())].rename(columns=columns).dropna()\n",
    "alc.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Select the same country subset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "subalc = alc.loc[alc['country'].isin(countries)]\n",
    "subalc.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We only care about the combined consumption:\n",
    "- filter the dataframe, select rows where `type` is `'All'`\n",
    "- remove the now defunct column `type`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "subalc = subalc.loc[subalc['type'] == 'All'].drop(columns=['type'])\n",
    "subalc.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Pivot the dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "subalc = subalc.pivot(index='year', columns='country', values='alcohol')\n",
    "subalc.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Plot!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "subalc.plot();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## 3. Merge data\n",
    "\n",
    "- Check index types"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "print('subpop index type:', subpop.index.dtype)\n",
    "print('subalc index type:', subalc.index.dtype)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "subpop's index type is unicode, change it to integer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "subpop.index = subpop.index.astype('int')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Check index lengths"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "set(subpop.index.values).symmetric_difference(set(subalc.index.values))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Remove missing index values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "subpop = subpop.loc[(1989 < subpop.index) & (subpop.index < 2011)]\n",
    "subalc = subalc.loc[(1989 < subalc.index) & (subalc.index < 2011)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Join the two dataframe!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "merged = subpop.join(subalc, rsuffix='_alc')\n",
    "merged.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Plot data into separate coordinate systems"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "merged[['Hungary', 'Hungary_alc']].plot.line(subplots=True);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Compute the total actual alcohol consumption"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "for country in countries:\n",
    "    merged[country + '_consumption'] = merged[country] * merged[country + '_alc']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- and plot it!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "merged[[c + '_consumption' for c in countries]].plot();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Let's do some...\n",
    "\n",
    "<img align=\"left\" width=150 src=\"pics/magic.gif\">\n",
    "<br style=\"clear:left;\"/>\n",
    "\n",
    "### Act IV: Cool library of the week: <a href=\"https://github.com/ydataai/ydata-profiling\">ydata-profiling</a>\n",
    "#### Generate detailed reports from pandas dataframes\n",
    "- import it\n",
    "- generate report\n",
    " \n",
    "Install with:\n",
    "```bash\n",
    "pip install ydata-profiling\n",
    "```\n",
    "\n",
    "in case of import error, try:\n",
    "```bash\n",
    "pip install numba -U\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import ydata_profiling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "profile = ydata_profiling.ProfileReport(alc)\n",
    "profile"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Export it to a file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "profile.to_file('./alc_profile.html')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Final Act: your play time!\n",
    "## It's your turn - write the missing code snippets!\n",
    "\n",
    "#### 1.  Plot the top 5 alcohol consuming country in 1990 and their consumption"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2. Compare the average alcohol consumption in France, Germany, UK and Hungary by plotting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3. Compute the most spirit consuming country for each year "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4. How does the different type of alcohols changed over time in Hungary? Show it by plotting them!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 5. Which countries combined consuption changed the most from 1990 to 2010? (top5)\n",
    "Hint: `np.abs()` computes the absolute value of a pandas series "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "szisz_python_2025_automn",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}