{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"https://dsiag.ch/images/dsi_rgb.png\" alt=\"dsi logo\" width=\"100\" style=\"position: absolute; right: 0px;\"/>\n",
    "\n",
    "# Pandas 101 - Series & DataFrames\n",
    "\n",
    "Get to know the basic operations on series and dataframes.\n",
    "\n",
    "For more details check the [User Guide](https://pandas.pydata.org/docs/user_guide/index.html) or the [API reference](https://pandas.pydata.org/docs/reference/index.html) from https://pandas.pydata.org\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "# Series\n",
    "\n",
    "\n",
    "### Creation\n",
    "\n",
    "A `Series` object can be created by `pd.Series()` with a list of values as parameter. You can get the values by the `values` attribute of the `Series`, and the index values by the `index.values` attribute. An index can be a number o a string and it can be passed by the named parameter `index` of the `pd.Series()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "series1 = pd.Series([1,2,3,4])\n",
    "print(\"series1 values: \" + str(series1.values))\n",
    "print(\"series1 indexes: \" + str(series1.index.values))\n",
    "\n",
    "series2 = pd.Series([1,2,3,4], index = ['a', 'b', 'c', 'd'])\n",
    "print(\"series2 values: \" + str(series2.values))\n",
    "print(\"series2 indexes: \" + str(series2.index.values))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "\n",
    "<div class=\"alert alert-block alert-success\">\n",
    "<b>Exercise: Create a Series</b> \n",
    "\n",
    "Create a series of values with the five largest countries on earth with the name of the country as index\n",
    "\n",
    "</div>\n",
    "\n",
    "\n",
    "| Country         | Size ($km^2$) |\n",
    "| --------------- | ------------ |\n",
    "| Russia          | 17,098,246   |\n",
    "| Canada          | 9,984,670    |\n",
    "| China           | 9,596,961    |\n",
    "| United States   | 9,525,067    |\n",
    "| Brazil          | 8,515,767    |\n",
    "\n",
    " \n",
    "***\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Selecting and Filtering Entries \n",
    "\n",
    "Single entries of a `Series` can be selected by the index or the index number.\n",
    "\n",
    "A filter can be used to select elements of the `Series` that match a given criteria. The result of the filtered `Series` are `Series` again."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "byIndex = series2['c']\n",
    "print(\"selecting by index: \"+ str(byIndex))\n",
    "\n",
    "byIndexNumber = series2[3]\n",
    "print(\"selecting by index number: \"+ str(byIndexNumber))\n",
    "\n",
    "byFilter = series2[series2 > 2]\n",
    "print(\"Filtering values:\\n\" + str(byFilter))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "\n",
    "<div class=\"alert alert-block alert-success\">\n",
    "<b>Exercise: Select & Filter a Series</b> \n",
    "    \n",
    "1. Select the value for Brazil\n",
    "\n",
    "2. Create a filter that select all countries that have more than $9,600,000~km^2$\n",
    "</div>\n",
    "\n",
    "\n",
    "***\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*** \n",
    "\n",
    "# DataFrame\n",
    "\n",
    "### Creation\n",
    "\n",
    "A `DataFrame` can be created in several ways. One of the simplest ways to create one from scratch is the pass a `dict` object to `pd.DataFrame()`. In Jupypter the `display()` function can be used to render a `DataFrame` as a pretty table. To print information about the metadata of your `DataFrame` you can use the `.info()` on your dataset. The get only the first or last rows you can use `.head()`and `tail()`, which is especially useful on large datasets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame({\n",
    "    \"Column1\": [1, 4, 8, 7, 9],\n",
    "    \"Column2\": ['a', 'b', 'c', 'd', 'f'],\n",
    "    \"Column3\": [1.1, 23.5, 45.6, 32.1234, 89.453],\n",
    "    \"Column4\": [True, False, True, False, True]\n",
    "})\n",
    "\n",
    "print(\"Essential Information on DataFrame:\\n\")\n",
    "df.info()\n",
    "\n",
    "print(\"\\n\\nFull DataFrame:\")\n",
    "display(df)\n",
    "\n",
    "print(\"\\n\\nHead of DataFrame:\")\n",
    "display(df.head(2))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "\n",
    "<div class=\"alert alert-block alert-success\">\n",
    "<b>Exercise: Create a DataFrame</b> \n",
    "\n",
    "Create a DataFrame for the land and water size of the largest countries. What are the datatypes of your columns\n",
    "    \n",
    "</div>\n",
    "\n",
    "| Country        | Land        | Water    |\n",
    "| -------------- | ----------- | -------- |\n",
    "| Russia         | 16,377,742  | 720,500  |\n",
    "| Canada         | 9,093,507   | 891,163  |\n",
    "| China          | 9,326,410   | 270,550  |\n",
    "| United States  | 9,147,593   | 377,424  |\n",
    "| Brazil         | 8,460,415   | 55,352   |\n",
    "\n",
    "***\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Selecting data by column\n",
    "\n",
    "A single column an be selected by `df['column_name']` (the result is a `Series` object), multiple columns by `df[['column_name1', 'column_name2']]`  (the result is a `DataFrame` object)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "display(df['Column1'])\n",
    "\n",
    "display(df[['Column2', 'Column3']])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "\n",
    "<div class=\"alert alert-block alert-success\">\n",
    "<b>Exercise: Select columns of DataFrame</b> \n",
    "\n",
    "Use your country dataframe.\n",
    "1. Select all country names as a series    \n",
    "2. Create a new DataFrame that contains only two columns, the country name and the size of the land.   \n",
    "</div>\n",
    "\n",
    "***\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Selecting data by row\n",
    "\n",
    "Every row has an index. This index can be used to select one or multiple rows with the `.loc` function. \n",
    "\n",
    "A filter can be used to select elements of the `DataFrame` that match a given criteria. The result of the filtered `DataFrame` are `DataFrame` again.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"A single row as Series object:\")\n",
    "print(df.loc[2])\n",
    "\n",
    "print(\"\\n\\nMultiple rows as DataFrame object:\")\n",
    "display(df.loc[[2, 4]])\n",
    "\n",
    "print(\"\\n\\nMultiple rows (as range) as DataFrame object: \\n\")\n",
    "display(df.loc[2:5])\n",
    "\n",
    "print(\"\\n\\nRows filtered (Column3 > 42):\")\n",
    "\n",
    "filtered = df[df['Column3'] < 42]\n",
    "display(filtered)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "\n",
    "<div class=\"alert alert-block alert-success\">\n",
    "<b>Exercise: Filter rows of DataFrame</b> \n",
    "\n",
    "1. Filter your country dataframe to only display rows that have a water size of $500000~km^2$ \n",
    "    \n",
    "2. Filter your country dataframe to only display rows that have a land size that more than 20 times larger than the water size \n",
    "</div>\n",
    "\n",
    "***\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Adding / Removing a Series to a DataFrame\n",
    "\n",
    "A `Series` can easily be added to a `DataFrame` as a column with the assignment operation. For each index in the `DataFrame` the value at the index in the `Series` is assigned. If an index is not found the cell will be empty.\n",
    "\n",
    "With the `.drop` function a column (axis=1) or a row (axis=0) can be removed from a `DataFrame`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df['myserie'] = series1\n",
    "display(df)\n",
    "\n",
    "df.drop('myserie', axis=1, inplace=True)\n",
    "display(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.set_index('Column3')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "\n",
    "<div class=\"alert alert-block alert-success\">\n",
    "<b>Exercise: Add a Series to a DataFrame</b> \n",
    "\n",
    "1. Add your Series with the total size of the country as a column named 'Total' to your DataFrame.\n",
    "   \n",
    "    <i><b>Note</b>: Your DataFrame must have the same index as your Series. Use df.set_index('columnname') to transform a regular column to an index. (This can reverted by the .reset_index() function)</i>\n",
    "   \n",
    "2. Create a new column named 'LandAndWater' that contains the sum of Land and Water. Is it equal to the 'Total' column?\n",
    "</div>\n",
    "\n",
    "***\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}