{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
## [mlcourse.ai](https://mlcourse.ai) – Open Machine Learning Course \n", "\n", "Author: [Yury Kashnitsky](https://yorko.github.io). This material is subject to the terms and conditions of the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. Free use is permitted for any non-commercial purpose." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# Topic 1. Exploratory data analysis with Pandas\n", "## Practice. Analyzing \"Titanic\" passengers. Solution\n", "\n", "**Fill in the missing code (\"Your code here\") and choose answers in a [web form](https://docs.google.com/forms/d/16EfhpDGPrREry0gfDQdRPjoiQX9IumaL2mPR0rcj19k/edit).**" ] },
{ "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "pd.set_option(\"display.precision\", 2)\n", "from matplotlib import pyplot as plt\n", "# Graphics in SVG format are sharper and more legible\n", "%config InlineBackend.figure_format = 'svg'" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "**Read data into a Pandas DataFrame**" ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "data = pd.read_csv('../../data/titanic_train.csv',\n", "                   index_col='PassengerId')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "**First 5 rows**" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th>PassengerId</th><th>Survived</th><th>Pclass</th><th>Name</th><th>Sex</th><th>Age</th><th>SibSp</th><th>Parch</th><th>Ticket</th><th>Fare</th><th>Cabin</th><th>Embarked</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1</th><td>0</td><td>3</td><td>Braund, Mr. Owen Harris</td><td>male</td><td>22.0</td><td>1</td><td>0</td><td>A/5 21171</td><td>7.25</td><td>NaN</td><td>S</td></tr>\n",
"<tr><th>2</th><td>1</td><td>1</td><td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td><td>female</td><td>38.0</td><td>1</td><td>0</td><td>PC 17599</td><td>71.28</td><td>C85</td><td>C</td></tr>\n",
"<tr><th>3</th><td>1</td><td>3</td><td>Heikkinen, Miss. Laina</td><td>female</td><td>26.0</td><td>0</td><td>0</td><td>STON/O2. 3101282</td><td>7.92</td><td>NaN</td><td>S</td></tr>\n",
"<tr><th>4</th><td>1</td><td>1</td><td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td><td>female</td><td>35.0</td><td>1</td><td>0</td><td>113803</td><td>53.10</td><td>C123</td><td>S</td></tr>\n",
"<tr><th>5</th><td>0</td><td>3</td><td>Allen, Mr. William Henry</td><td>male</td><td>35.0</td><td>0</td><td>0</td><td>373450</td><td>8.05</td><td>NaN</td><td>S</td></tr>\n",
"</tbody>\n",
"</table>
\n", "
" ], "text/plain": [ " Survived Pclass \\\n", "PassengerId \n", "1 0 3 \n", "2 1 1 \n", "3 1 3 \n", "4 1 1 \n", "5 0 3 \n", "\n", " Name Sex Age \\\n", "PassengerId \n", "1 Braund, Mr. Owen Harris male 22.0 \n", "2 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 \n", "3 Heikkinen, Miss. Laina female 26.0 \n", "4 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 \n", "5 Allen, Mr. William Henry male 35.0 \n", "\n", " SibSp Parch Ticket Fare Cabin Embarked \n", "PassengerId \n", "1 1 0 A/5 21171 7.25 NaN S \n", "2 1 0 PC 17599 71.28 C85 C \n", "3 0 0 STON/O2. 3101282 7.92 NaN S \n", "4 1 0 113803 53.10 C123 S \n", "5 0 0 373450 8.05 NaN S " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.head(5)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>Survived</th><th>Pclass</th><th>Age</th><th>SibSp</th><th>Parch</th><th>Fare</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>count</th><td>891.00</td><td>891.00</td><td>714.00</td><td>891.00</td><td>891.00</td><td>891.00</td></tr>\n",
"<tr><th>mean</th><td>0.38</td><td>2.31</td><td>29.70</td><td>0.52</td><td>0.38</td><td>32.20</td></tr>\n",
"<tr><th>std</th><td>0.49</td><td>0.84</td><td>14.53</td><td>1.10</td><td>0.81</td><td>49.69</td></tr>\n",
"<tr><th>min</th><td>0.00</td><td>1.00</td><td>0.42</td><td>0.00</td><td>0.00</td><td>0.00</td></tr>\n",
"<tr><th>25%</th><td>0.00</td><td>2.00</td><td>20.12</td><td>0.00</td><td>0.00</td><td>7.91</td></tr>\n",
"<tr><th>50%</th><td>0.00</td><td>3.00</td><td>28.00</td><td>0.00</td><td>0.00</td><td>14.45</td></tr>\n",
"<tr><th>75%</th><td>1.00</td><td>3.00</td><td>38.00</td><td>1.00</td><td>0.00</td><td>31.00</td></tr>\n",
"<tr><th>max</th><td>1.00</td><td>3.00</td><td>80.00</td><td>8.00</td><td>6.00</td><td>512.33</td></tr>\n",
"</tbody>\n",
"</table>
\n", "
" ], "text/plain": [ " Survived Pclass Age SibSp Parch Fare\n", "count 891.00 891.00 714.00 891.00 891.00 891.00\n", "mean 0.38 2.31 29.70 0.52 0.38 32.20\n", "std 0.49 0.84 14.53 1.10 0.81 49.69\n", "min 0.00 1.00 0.42 0.00 0.00 0.00\n", "25% 0.00 2.00 20.12 0.00 0.00 7.91\n", "50% 0.00 3.00 28.00 0.00 0.00 14.45\n", "75% 1.00 3.00 38.00 1.00 0.00 31.00\n", "max 1.00 3.00 80.00 8.00 6.00 512.33" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Let's select the passengers who embarked in Cherbourg (Embarked=C) and paid more than 200 pounds for their ticket (Fare > 200).**\n", "\n", "Make sure you understand how this construction actually works: each comparison yields a boolean Series, `&` combines the two element-wise, and indexing with the result keeps only the rows where it is True." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th>PassengerId</th><th>Survived</th><th>Pclass</th><th>Name</th><th>Sex</th><th>Age</th><th>SibSp</th><th>Parch</th><th>Ticket</th><th>Fare</th><th>Cabin</th><th>Embarked</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>119</th><td>0</td><td>1</td><td>Baxter, Mr. Quigg Edmond</td><td>male</td><td>24.0</td><td>0</td><td>1</td><td>PC 17558</td><td>247.52</td><td>B58 B60</td><td>C</td></tr>\n",
"<tr><th>259</th><td>1</td><td>1</td><td>Ward, Miss. Anna</td><td>female</td><td>35.0</td><td>0</td><td>0</td><td>PC 17755</td><td>512.33</td><td>NaN</td><td>C</td></tr>\n",
"<tr><th>300</th><td>1</td><td>1</td><td>Baxter, Mrs. James (Helene DeLaudeniere Chaput)</td><td>female</td><td>50.0</td><td>0</td><td>1</td><td>PC 17558</td><td>247.52</td><td>B58 B60</td><td>C</td></tr>\n",
"<tr><th>312</th><td>1</td><td>1</td><td>Ryerson, Miss. Emily Borie</td><td>female</td><td>18.0</td><td>2</td><td>2</td><td>PC 17608</td><td>262.38</td><td>B57 B59 B63 B66</td><td>C</td></tr>\n",
"<tr><th>378</th><td>0</td><td>1</td><td>Widener, Mr. Harry Elkins</td><td>male</td><td>27.0</td><td>0</td><td>2</td><td>113503</td><td>211.50</td><td>C82</td><td>C</td></tr>\n",
"</tbody>\n",
"</table>
\n", "
" ], "text/plain": [ " Survived Pclass \\\n", "PassengerId \n", "119 0 1 \n", "259 1 1 \n", "300 1 1 \n", "312 1 1 \n", "378 0 1 \n", "\n", " Name Sex Age \\\n", "PassengerId \n", "119 Baxter, Mr. Quigg Edmond male 24.0 \n", "259 Ward, Miss. Anna female 35.0 \n", "300 Baxter, Mrs. James (Helene DeLaudeniere Chaput) female 50.0 \n", "312 Ryerson, Miss. Emily Borie female 18.0 \n", "378 Widener, Mr. Harry Elkins male 27.0 \n", "\n", " SibSp Parch Ticket Fare Cabin Embarked \n", "PassengerId \n", "119 0 1 PC 17558 247.52 B58 B60 C \n", "259 0 0 PC 17755 512.33 NaN C \n", "300 0 1 PC 17558 247.52 B58 B60 C \n", "312 2 2 PC 17608 262.38 B57 B59 B63 B66 C \n", "378 0 2 113503 211.50 C82 C " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[(data['Embarked'] == 'C') & (data.Fare > 200)].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**We can sort these people by Fare in descending order.**" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th>PassengerId</th><th>Survived</th><th>Pclass</th><th>Name</th><th>Sex</th><th>Age</th><th>SibSp</th><th>Parch</th><th>Ticket</th><th>Fare</th><th>Cabin</th><th>Embarked</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>259</th><td>1</td><td>1</td><td>Ward, Miss. Anna</td><td>female</td><td>35.0</td><td>0</td><td>0</td><td>PC 17755</td><td>512.33</td><td>NaN</td><td>C</td></tr>\n",
"<tr><th>680</th><td>1</td><td>1</td><td>Cardeza, Mr. Thomas Drake Martinez</td><td>male</td><td>36.0</td><td>0</td><td>1</td><td>PC 17755</td><td>512.33</td><td>B51 B53 B55</td><td>C</td></tr>\n",
"<tr><th>738</th><td>1</td><td>1</td><td>Lesurer, Mr. Gustave J</td><td>male</td><td>35.0</td><td>0</td><td>0</td><td>PC 17755</td><td>512.33</td><td>B101</td><td>C</td></tr>\n",
"<tr><th>312</th><td>1</td><td>1</td><td>Ryerson, Miss. Emily Borie</td><td>female</td><td>18.0</td><td>2</td><td>2</td><td>PC 17608</td><td>262.38</td><td>B57 B59 B63 B66</td><td>C</td></tr>\n",
"<tr><th>743</th><td>1</td><td>1</td><td>Ryerson, Miss. Susan Parker \"Suzette\"</td><td>female</td><td>21.0</td><td>2</td><td>2</td><td>PC 17608</td><td>262.38</td><td>B57 B59 B63 B66</td><td>C</td></tr>\n",
"</tbody>\n",
"</table>
\n", "