{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Visualization of the Titanic's Voyage\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. [Data Preparation](#1.-Data-Preparation)\n", "\n", " 1.1 [Travellers Survival Rates by the Port of Embarkation](#1.1-Travellers-Survival-Rates-by-the-Port-of-Embarkation)\n", " \n", "2. [Ports of Embarkation on Map](#2.-Ports-of-Embarkation-on-Map) \n", "\n", " 2.1 [Spatial DataFrame (Pandas GeoDataFrame)](#2.1-Spatial-DataFrame-(Pandas-GeoDataFrame))\n", " \n", " 2.2 [Markers on Map](#2.2-Markers-on-Map)\n", " \n", "3. [The \"Titanic's site\" Marker](#3.-The-\"Titanic's-site\"-Marker) \n", "\n", "4. [The New York City Marker](#4.-The-New-York-City-Marker) \n", "\n", "5. [Connecting Markers on Map](#5.-Connecting-Markers-on-Map) \n", "\n", "6. [Pie-chart Markers on Map](#6.-Pie-chart-Markers-on-Map) \n", "\n", " 6.1 [Travellers Survival Rates, Now with Pie](#6.1-Travellers-Survival-Rates,-Now-with-Pie)\n", " \n", " 6.2 [Spatial Pies](#6.2-Spatial-Pies)\n", " \n", " 6.3 [Adjusting the Map Zoom Level and Position](#6.3-Adjusting-the-Map-Zoom-Level-and-Position)\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T20:24:51.075753Z", "iopub.status.busy": "2024-11-01T20:24:51.075678Z", "iopub.status.idle": "2024-11-01T20:24:51.395338Z", "shell.execute_reply": "2024-11-01T20:24:51.394914Z" } }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "from lets_plot import *" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T20:24:51.396725Z", "iopub.status.busy": "2024-11-01T20:24:51.396610Z", "iopub.status.idle": "2024-11-01T20:24:51.398865Z", "shell.execute_reply": "2024-11-01T20:24:51.398632Z" } }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " \n", " " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "LetsPlot.setup_html()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1. Data Preparation\n", "\n", "The Titanic dataset is available on [Kaggle](https://www.kaggle.com): [\"Titanic: cleaned data\" dataset](https://www.kaggle.com/jamesleslie/titanic-cleaned-data?select=train_clean.csv) (train_clean.csv)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T20:24:51.411629Z", "iopub.status.busy": "2024-11-01T20:24:51.411543Z", "iopub.status.idle": "2024-11-01T20:24:51.909048Z", "shell.execute_reply": "2024-11-01T20:24:51.908580Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Age Cabin Embarked Fare \\\n", "0 22.0 NaN S 7.2500 \n", "1 38.0 C85 C 71.2833 \n", "2 26.0 NaN S 7.9250 \n", "\n", " Name Parch PassengerId \\\n", "0 Braund, Mr. Owen Harris 0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... 0 2 \n", "2 Heikkinen, Miss. Laina 0 3 \n", "\n", " Pclass Sex SibSp Survived Ticket Title Family_Size \n", "0 3 male 1 0.0 A/5 21171 Mr 1 \n", "1 1 female 1 1.0 PC 17599 Mrs 1 \n", "2 3 female 0 1.0 STON/O2. 3101282 Miss 0 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(\"https://raw.githubusercontent.com/JetBrains/lets-plot-docs/master/data/titanic.csv\")\n", "df.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this Titanic dataset, the column `Embarked` contains single-letter codes for the Titanic's ports of embarkation:\n", "- S: Southampton (UK)\n", "- C: Cherbourg (France)\n", "- Q: Cobh (Ireland)\n", "\n", "Let's add a new column \"Port\" to the data:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T20:24:51.910165Z", "iopub.status.busy": "2024-11-01T20:24:51.910077Z", "iopub.status.idle": "2024-11-01T20:24:51.917890Z", "shell.execute_reply": "2024-11-01T20:24:51.917637Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Age Cabin Embarked Fare \\\n", "0 22.0 NaN S 7.2500 \n", "1 38.0 C85 C 71.2833 \n", "2 26.0 NaN S 7.9250 \n", "\n", " Name Parch PassengerId \\\n", "0 Braund, Mr. Owen Harris 0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... 0 2 \n", "2 Heikkinen, Miss. Laina 0 3 \n", "\n", " Pclass Sex SibSp Survived Ticket Title Family_Size \\\n", "0 3 male 1 0.0 A/5 21171 Mr 1 \n", "1 1 female 1 1.0 PC 17599 Mrs 1 \n", "2 3 female 0 1.0 STON/O2. 3101282 Miss 0 \n", "\n", " Port \n", "0 Southampton \n", "1 Cherbourg \n", "2 Southampton " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def to_port_name(row):\n", "    \"\"\"Translate the single-letter `Embarked` code into a port name.\"\"\"\n", "    if row['Embarked'] == 'S':\n", "        return 'Southampton'\n", "    if row['Embarked'] == 'C':\n", "        return 'Cherbourg'\n", "    if row['Embarked'] == 'Q':\n", "        return 'Cobh'\n", "    # Missing or unrecognized codes.\n", "    return 'Other'\n", "\n", "# Apply the mapping row by row (axis=1) to build the new column.\n", "df['Port'] = df.apply(to_port_name, axis=1)\n", "df.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 1.1 Travellers Survival Rates by the Port of Embarkation" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T20:24:51.918961Z", "iopub.status.busy": "2024-11-01T20:24:51.918879Z", "iopub.status.idle": "2024-11-01T20:24:51.960570Z", "shell.execute_reply": "2024-11-01T20:24:51.960317Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Fill colors for survivors and for those lost.\n", "c_surv = \"#E1A439\"\n", "c_lost = \"#6B9993\"\n", "\n", "bars = (ggplot(df) +\n", "    geom_bar(\n", "        aes('Port', fill=as_discrete('Survived')),\n", "        tooltips=layer_tooltips()\n", "            .line('@{..count..} (@{..prop..})')\n", "            .format('@{..prop..}', '.0%'),\n", "        position='dodge') +\n", "    scale_fill_manual(values=[c_lost, c_surv], labels=['no', 'yes']) +\n", "    scale_x_discrete(limits=['Cobh', 'Cherbourg', 'Southampton']) +\n", "    labs(x=\"\", y=\"Travellers count\") +\n", "    ggtitle(\"Survival by the Port of Embarkation\")\n", ")\n", "\n", "bars + ggsize(800, 300)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2. Ports of Embarkation on Map\n", "\n", "The Titanic's ports of embarkation were:\n", "- Southampton (UK)\n", "- Cherbourg (France)\n", "- Cobh (Ireland)\n", "\n", "Let's find the geographical coordinates of these cities using the `Lets-Plot` geocoding module."
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T20:24:51.961666Z", "iopub.status.busy": "2024-11-01T20:24:51.961575Z", "iopub.status.idle": "2024-11-01T20:24:51.966257Z", "shell.execute_reply": "2024-11-01T20:24:51.966051Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The geodata is provided by © OpenStreetMap contributors and is made available here under the Open Database License (ODbL).\n" ] } ], "source": [ "from lets_plot.geo_data import *\n", "\n", "ports = ['Southampton', 'Cherbourg', 'Cobh']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 2.1 Spatial DataFrame (Pandas GeoDataFrame)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T20:24:51.967241Z", "iopub.status.busy": "2024-11-01T20:24:51.967156Z", "iopub.status.idle": "2024-11-01T20:24:52.464840Z", "shell.execute_reply": "2024-11-01T20:24:52.464535Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " city found name geometry\n", "0 Southampton Southampton POINT (-1.40254 50.91837)\n", "1 Cherbourg Cherbourg POINT (-1.60901 49.62728)\n", "2 Cobh Cobh POINT (-8.29428 51.85315)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Geocode the ports; `where()` narrows an ambiguous city name to a parent region.\n", "ports_gcoder = (geocode_cities(ports)\n", "                .where(ports[0], scope='England')\n", "                .where(ports[1], scope='France'))\n", "ports_gdf = ports_gcoder.get_centroids()\n", "ports_gdf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 2.2 Markers on Map\n", "\n", "The `Lets-Plot` API makes it easy to create an interactive basemap layer using either its own vector tile service or \n", "by configuring a third-party ZXY raster tile provider.\n", "\n", "In this notebook we will use the beautiful *CARTO Voyager* raster tiles by [CARTO](https://carto.com/attribution/) as our basemap.\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T20:24:52.466085Z", "iopub.status.busy": "2024-11-01T20:24:52.465986Z", "iopub.status.idle": "2024-11-01T20:24:52.468137Z", "shell.execute_reply": "2024-11-01T20:24:52.467864Z" } }, "outputs": [], "source": [ "from lets_plot import tilesets\n", "\n", "LetsPlot.set(tilesets.CARTO_VOYAGER_HIRES)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T20:24:52.469119Z", "iopub.status.busy": "2024-11-01T20:24:52.469041Z", "iopub.status.idle": "2024-11-01T20:24:52.472928Z", "shell.execute_reply": "2024-11-01T20:24:52.472746Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "basemap = ggplot() + geom_livemap() + ggsize(800, 350)\n", "\n", "\n", "port_markers = geom_point(\n", " map=ports_gdf, \n", " size=7, \n", " shape=21, \n", " color=\"black\", \n", " fill=\"yellow\")\n", "\n", "basemap + port_markers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3. The \"Titanic's site\" Marker" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T20:24:52.473962Z", "iopub.status.busy": "2024-11-01T20:24:52.473881Z", "iopub.status.idle": "2024-11-01T20:24:52.477538Z", "shell.execute_reply": "2024-11-01T20:24:52.477357Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from shapely.geometry import Point, LineString\n", "titanic_site = Point(-38.056641, 46.920255)\n", "\n", "titanic_site_marker = geom_point(x=titanic_site.x, y = titanic_site.y, size=10, shape=9, color='red')\n", "\n", "basemap + port_markers + titanic_site_marker" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4. The New York City Marker\n", "\n", "New York City was the Titanic's destination." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T20:24:52.478591Z", "iopub.status.busy": "2024-11-01T20:24:52.478509Z", "iopub.status.idle": "2024-11-01T20:24:52.920858Z", "shell.execute_reply": "2024-11-01T20:24:52.920492Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "NYC = geocode_cities(['New York']).get_centroids().geometry[0]\n", "NYC_marker = geom_point(x=NYC.x, y=NYC.y, size=7, shape=21, color='black', fill='white')\n", "\n", "(basemap +\n", " port_markers +\n", " titanic_site_marker +\n", " NYC_marker\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 5. Connecting Markers on Map\n", "\n", "To connect the markers on the map, we will create a `LineString` object (from the `Shapely` package).\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T20:24:52.922128Z", "iopub.status.busy": "2024-11-01T20:24:52.922008Z", "iopub.status.idle": "2024-11-01T20:24:52.924815Z", "shell.execute_reply": "2024-11-01T20:24:52.924633Z" } }, "outputs": [], "source": [ "from geopandas import GeoSeries\n", "from geopandas import GeoDataFrame\n", "\n", "# Points of embarkation (GeoSeries).\n", "port_points = ports_gdf.geometry\n", "path_points = pd.concat([port_points, GeoSeries([titanic_site, NYC], crs=ports_gdf.crs)], ignore_index=True)\n", "\n", "# Create a new GeoDataFrame containing a `LineString` geometry.\n", "path_gdf = GeoDataFrame(\n", "    dict(geometry=[LineString(path_points)])\n", ")" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T20:24:52.925822Z", "iopub.status.busy": "2024-11-01T20:24:52.925736Z", "iopub.status.idle": "2024-11-01T20:24:52.930304Z", "shell.execute_reply": "2024-11-01T20:24:52.930112Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Add \"path\" to the map.\n", "titanic_path = geom_path(\n", " map=path_gdf, \n", " color='dark-blue', \n", " linetype='dotted', size=1.2)\n", "\n", "(basemap +\n", " titanic_path +\n", " port_markers +\n", " titanic_site_marker +\n", " NYC_marker\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 6. Pie-chart Markers on Map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 6.1 Travellers Survival Rates, Now with Pie" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T20:24:52.931464Z", "iopub.status.busy": "2024-11-01T20:24:52.931337Z", "iopub.status.idle": "2024-11-01T20:24:52.943230Z", "shell.execute_reply": "2024-11-01T20:24:52.943032Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pies = (ggplot(df) +\n", " geom_pie(\n", " aes(x='Port', y=\"..sum..\", fill=as_discrete('Survived'), size=\"..sum..\"), \n", " labels=layer_labels()\n", " .line('@{..count..}')\n", " .line('(@{..prop..})').format('@{..prop..}', '.0%'),\n", " tooltips=layer_tooltips().title(\"@Port (@{..sum..})\"),\n", " stroke=1.5, \n", " hole=0.5) +\n", " scale_fill_manual(values=[c_lost, c_surv], labels=['no', 'yes']) +\n", " scale_x_discrete(limits=['Cobh', 'Cherbourg', 'Southampton'], expand=[0, 0.3]) +\n", " scale_size(range=[3, 10], guide=\"none\") +\n", " ylim(0, 800) +\n", " labs(x=\"\", y=\"Travellers count\") + \n", " ggtitle(\"Survival by the Port of Embarkation\")\n", ")\n", "\n", "pies + ggsize(800, 300)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 6.2 Spatial Pies\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T20:24:52.944249Z", "iopub.status.busy": "2024-11-01T20:24:52.944076Z", "iopub.status.idle": "2024-11-01T20:24:52.957803Z", "shell.execute_reply": "2024-11-01T20:24:52.957614Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "spatial_pies = (\n", " geom_pie(\n", " aes(x='Port', fill=as_discrete('Survived'), size=\"..sum..\"), \n", " data=df,\n", " map=ports_gdf, \n", " map_join=['Port','city'],\n", " tooltips=layer_tooltips()\n", " .title(\"@Port (@{..sum..})\")\n", " .line('@{..count..} (@{..prop..})')\n", " .format('@{..prop..}', '.0%'),\n", " stroke=1.5, \n", " hole=0.5,\n", " color='white') +\n", " scale_fill_manual(values=[c_lost, c_surv], labels=['lost', 'survived']) +\n", " scale_size(range=[3, 10], guide=\"none\") +\n", " theme(legend_position=[0.5, 1], \n", " legend_justification=[0.5, 1], \n", " legend_direction='horizontal',\n", " legend_title=element_blank())\n", ") \n", "\n", "(basemap + \n", " titanic_path +\n", " spatial_pies +\n", " titanic_site_marker +\n", " NYC_marker\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 6.3 Adjusting the Map Zoom Level and Position" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T20:24:52.958845Z", "iopub.status.busy": "2024-11-01T20:24:52.958689Z", "iopub.status.idle": "2024-11-01T20:24:52.972853Z", "shell.execute_reply": "2024-11-01T20:24:52.972637Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(ggplot() + ggsize(900, 600) + ggtitle(\"Titanic Survival Rate by Port of Embarkation\") +\n", " geom_livemap(zoom=6, location=[-8.29, 51.85, -1.61, 49.63]) + \n", " titanic_path +\n", " port_markers +\n", " spatial_pies +\n", " titanic_site_marker +\n", " NYC_marker +\n", " theme(text=element_text(family=\"Garamond\"),\n", " plot_title=element_text(size=30))\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 4 }