{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>\n",
    "<img src=\"../../img/ods_stickers.jpg\" />\n",
    "    \n",
    "## [mlcourse.ai](https://mlcourse.ai) – Open Machine Learning Course \n",
    "\n",
    "Author: [Yury Kashnitsky](https://yorko.github.io). This material is subject to the terms and conditions of the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. Free use is permitted for any non-commercial purpose."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# <center> Topic 1. Exploratory data analysis with Pandas\n",
    "## <center>Practice. Analyzing \"Titanic\" passengers\n",
    "\n",
    "**Fill in the missing code (\"You code here\") and choose answers in a [web-form](https://docs.google.com/forms/d/16EfhpDGPrREry0gfDQdRPjoiQX9IumaL2mPR0rcj19k/edit).**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "from matplotlib import pyplot as plt\n",
    "\n",
    "# Graphics in SVG format are more sharp and legible\n",
    "%config InlineBackend.figure_format = 'svg'\n",
    "pd.set_option(\"display.precision\", 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Read data into a Pandas DataFrame**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = pd.read_csv(\"../../data/titanic_train.csv\", index_col=\"PassengerId\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**First 5 rows**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Let's select those passengers who embarked in Cherbourg (Embarked=C) and paid > 200 pounds for their ticker (fare > 200).**\n",
    "\n",
    "Make sure you understand how actually this construction works."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data[(data[\"Embarked\"] == \"C\") & (data.Fare > 200)].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**We can sort these people by Fare in descending order.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data[(data[\"Embarked\"] == \"C\") & (data[\"Fare\"] > 200)].sort_values(\n",
    "    by=\"Fare\", ascending=False\n",
    ").head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Let's create a new feature.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def age_category(age):\n",
    "    \"\"\"\n",
    "    < 30 -> 1\n",
    "    >= 30, <55 -> 2\n",
    "    >= 55 -> 3\n",
    "    \"\"\"\n",
    "    if age < 30:\n",
    "        return 1\n",
    "    elif age < 55:\n",
    "        return 2\n",
    "    elif age >= 55:\n",
    "        return 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "age_categories = [age_category(age) for age in data.Age]\n",
    "data[\"Age_category\"] = age_categories"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Another way is to do it with `apply`.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data[\"Age_category\"] = data[\"Age\"].apply(age_category)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**1. How many men/women were there onboard?**\n",
    "- 412 men and 479 women\n",
    "- 314 men and 577 women\n",
    "- 479 men and 412 women\n",
    "- 577 men and 314 women"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# You code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**2. Print the distribution of the `Pclass` feature. Then the same, but for men and women separately. How many men from second class were there onboard?**\n",
    "- 104\n",
    "- 108\n",
    "- 112\n",
    "- 125"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# You code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**3. What are median and standard deviation of `Fare`?. Round to two decimals.**\n",
    "- median is  14.45, standard deviation is 49.69\n",
    "- median is 15.1, standard deviation is 12.15\n",
    "- median is 13.15, standard deviation is 35.3\n",
    "- median is  17.43, standard deviation is 39.1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# You code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**4. Is that true that the mean age of survived people is higher than that of passengers who eventually died?**\n",
    "- Yes\n",
    "- No\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# You code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**5. Is that true that passengers younger than 30 y.o. survived more frequently than those older than 60 y.o.? What are shares of survived people among young and old people?**\n",
    "- 22.7% among young and 40.6% among old\n",
    "- 40.6% among young and 22.7% among old\n",
    "- 35.3% among young and 27.4% among old\n",
    "- 27.4% among young and  35.3% among old"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# You code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**6. Is that true that women survived more frequently than men? What are shares of survived people among men and women?**\n",
    "- 30.2% among men and 46.2% among women\n",
    "- 35.7% among men and 74.2% among women\n",
    "- 21.1% among men and 46.2% among women\n",
    "- 18.9% among men and 74.2% among women"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# You code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**7. What's the most popular first name among male passengers?**\n",
    "- Charles\n",
    "- Thomas\n",
    "- William\n",
    "- John"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# You code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**8. How is average age for men/women dependent on `Pclass`? Choose all correct statements:**\n",
    "- On average, men of 1 class are older than 40\n",
    "- On average, women of 1 class are older than 40\n",
    "- Men of all classes are on average older than women of the same class\n",
    "- On average, passengers ofthe first class are older than those of the 2nd class who are older than passengers of the 3rd class"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# You code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Useful resources\n",
    "* The same notebook as an interactive web-based [Kaggle Kernel](https://www.kaggle.com/kashnitsky/topic-1-practice-analyzing-titanic-passengers) with a [solution](https://www.kaggle.com/kashnitsky/topic-1-practice-solution)\n",
    "* Topic 1 \"Exploratory Data Analysis with Pandas\" as a [Kaggle Kernel](https://www.kaggle.com/kashnitsky/topic-1-exploratory-data-analysis-with-pandas)\n",
    "* Main course [site](https://mlcourse.ai), [course repo](https://github.com/Yorko/mlcourse.ai), and YouTube [channel](https://www.youtube.com/watch?v=QKTuw4PNOsU&list=PLVlY_7IJCMJeRfZ68eVfEcu-UcN9BbwiX)"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  },
  "name": "seminar02_practice_pandas_titanic.ipynb"
 },
 "nbformat": 4,
 "nbformat_minor": 1
}