{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", " \n", "## [mlcourse.ai](https://mlcourse.ai) – Open Machine Learning Course \n", "###
Author: Pisarev Ivan, ODS Slack: pisarev_i\n", " \n", "##
Tutorial\n", "##
Customizing some details in Matplotlib" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Intro" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the Matplotlib library for visualization is great because you can customize nothing if you don't need it, and everything works fine out of the box. \n", "Another great thing is you can customize almost any part of the plot as you wish, the tuning options are very wide. \n", " \n", "A good source of information about the possibilities of Matplotlib is a Gallery and tutorials on the project website. In the gallery you can find an example for any need, it is enough to imagine what and how you want to visualize - and you will find the implementation of your imagination in the gallery. \n", "In this tutorial, we will not retell the User's Guide. \n", "We just will create some plots and make corrections to them to better convey the idea." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Backend and style" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib\n", "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Backend specify different output formats, and there are two types of backends: user interface (interactive) backends and hardcopy (non-interactive) backends to make image files. You can see available backends:\n", "```python\n", "matplotlib.rcsetup.interactive_bk\n", "matplotlib.rcsetup.non_interactive_bk\n", "```\n", "\n", "If you want to get more interactive capabilities with your plots (such as zoom in, zoom out etc.) you can choose an appropriate backend\n", "```python\n", "matplotlib.use('nbagg')\n", "```\n", "before you call\n", "```python\n", "import matplotlib.pyplot as plt\n", "```\n", "We will not do that." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "%matplotlib inline\n", "import seaborn as sns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What you really need is to choose a style. In Appendix 1 you can see the available styles. \n", "The style defines many parameters of the chart, so if you can not live without a grid and with a gray background - first of all set your favorite style. \n", "I will:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.rcdefaults()\n", "plt.style.use(\"seaborn-whitegrid\")\n", "plt.rcParams[\"figure.figsize\"] = (8, 4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's get Olympic dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dfAthlete = pd.read_csv(\"../../data/athlete_events.csv\")\n", "dfNOC = pd.read_csv(\"../../data/noc_regions.csv\")\n", "\n", "dfNOC.columns = [\"NOC\", \"NOC_Region\", \"NOC_Notes\"]\n", "dfAthlete = pd.merge(\n", " left=dfAthlete, right=dfNOC, left_on=\"NOC\", right_on=\"NOC\", how=\"left\"\n", ")\n", "dfAthlete.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our first idea would be to demonstrate the growth of some values over the years." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(\n", " dfAthlete[dfAthlete[\"Season\"] == \"Summer\"].groupby(\"Year\")[\"ID\"].nunique(), \"ro-\"\n", ")\n", "plt.plot(\n", " dfAthlete[dfAthlete[\"Season\"] == \"Winter\"].groupby(\"Year\")[\"ID\"].nunique(), \"bo-\"\n", ")\n", "plt.xlabel(\"Years\", fontsize=14)\n", "plt.ylabel(\"Athletes\", fontsize=14)\n", "plt.title(\"The number of Athletes\", fontsize=16);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The total number of athletes taking part in the games is growing over time, obviously. \n", "Autoscaling has done well, but we want better. Let's consider, what we can do with a grid." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Ticks and grid" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we need to briefly describe the relationship between `Figure`, `Axes`, `Subplots` and `pyplot`. \n", " - The Figure is the top-level container. Figure can contain multiple Axes\n", " - The Axes is the area that we plot data on\n", " - Subplot places Axes on a regular grid\n", " - When we call most `pylot` functions, pyplot calls Axes method on Axes is \"current\" \n", "\n", "We purposely used `pylot` functions above, because it is faster and easier. \n", "But further we instead of this record:\n", "```python\n", "plt.plot(...)\n", "plt.title(...)\n", "```\n", "will use this record:\n", "```python\n", "fig, ax = plt.subplots()\n", "ax.plot(...)\n", "ax.set_title(...)\n", "```\n", "Yes, it's longer, but it's more explicit and this variant we can use in multiple axes with subplots. \n", "\n", "Ok, we want to turn on minor lines for XAxis and show it every 4 years. \n", "We can do it something like this:\n", "```python\n", "ax.xaxis.set_ticks(np.arange(minYear, maxYear, 4), minor=True) # set minor ticks location\n", "ax.grid(True, which='minor', linestyle='dotted') # turn minor ticks lines on\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots()\n", "ax.plot(\n", " dfAthlete[dfAthlete[\"Season\"] == \"Summer\"].groupby(\"Year\")[\"ID\"].nunique(), \"ro-\"\n", ")\n", "ax.plot(\n", " dfAthlete[dfAthlete[\"Season\"] == \"Winter\"].groupby(\"Year\")[\"ID\"].nunique(), \"bo-\"\n", ")\n", "ax.xaxis.set_ticks(np.arange(1896, 2020, 4), minor=True)\n", "ax.grid(True, which=\"minor\", linestyle=\"dotted\")\n", "ax.set_xlabel(\"Years\", fontsize=14)\n", "ax.set_ylabel(\"Athletes\", fontsize=14)\n", "ax.set_title(\"The number of Athletes\", fontsize=16);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But there is a better way for location ticks. \n", "We can use function `MultipleLocator`, it is just what we need in this case.\n", "Do not invent your own ways to location ticks until you have looked into `matplotlib.ticker`. There is many useful locator for any cases (but for datetime you can use lacators from `matplotlib.dates`). \n", "So, let's\n", " - Use `MultipleLocator` for minor (ticks every 4 years) and major (every 24 years) ticks. \n", " - For YAxis ticks we will use `MaxNLocator` (no more than 3 ticks). \n", " - Also, we add a legend, increase linewidth, marker and label size, add 5% \"padding\" to a plot in the y-direction." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from matplotlib.ticker import MaxNLocator, MultipleLocator\n", "\n", "fig, ax = plt.subplots()\n", "ax.plot(\n", " dfAthlete[dfAthlete[\"Season\"] == \"Summer\"].groupby(\"Year\")[\"ID\"].nunique(),\n", " color=\"r\",\n", " linewidth=4,\n", " marker=\"o\",\n", " markersize=8,\n", " markerfacecolor=\"w\",\n", " markeredgecolor=\"r\",\n", " label=\"Summer\",\n", ")\n", "ax.plot(\n", " dfAthlete[dfAthlete[\"Season\"] == \"Winter\"].groupby(\"Year\")[\"ID\"].nunique(),\n", " color=\"b\",\n", " linewidth=4,\n", " marker=\"o\",\n", " markersize=8,\n", " markerfacecolor=\"w\",\n", " markeredgecolor=\"b\",\n", " label=\"Winter\",\n", ")\n", "ax.xaxis.set_major_locator(MultipleLocator(24)) # Major ticks every 24 years\n", "ax.xaxis.set_minor_locator(MultipleLocator(4)) # Minor ticks every 4 years\n", "ax.yaxis.set_major_locator(MaxNLocator(3)) # Major ticks, maximum 3 lines\n", "ax.grid(True, which=\"minor\", linestyle=\"dotted\") # Minor lines dotted\n", "ax.margins(x=0.05, y=0.1) # padding\n", "ax.tick_params(labelsize=12)\n", "ax.legend(loc=\"center left\", frameon=True, fontsize=12) # legend with frame\n", "ax.set_xlabel(\"Years\", fontsize=14)\n", "ax.set_ylabel(\"Athletes\", fontsize=14)\n", "ax.set_title(\"The number of Athletes\", fontsize=16)\n", "plt.show();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sublots" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can add the same plot for the number of Events, and gather them together. \n", "We want to use one XAxis for both plots so we will use\n", "```python\n", "plt.subplots(nrows=2, ncols=1, sharex=True)\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True)\n", "axes[0].plot(\n", " dfAthlete[dfAthlete[\"Season\"] == \"Summer\"].groupby(\"Year\")[\"ID\"].nunique(),\n", " color=\"r\",\n", " linewidth=4,\n", " marker=\"o\",\n", " markersize=8,\n", " markerfacecolor=\"w\",\n", " markeredgecolor=\"r\",\n", " label=\"Summer\",\n", ")\n", "axes[0].plot(\n", " dfAthlete[dfAthlete[\"Season\"] == \"Winter\"].groupby(\"Year\")[\"ID\"].nunique(),\n", " color=\"b\",\n", " linewidth=4,\n", " marker=\"o\",\n", " markersize=8,\n", " markerfacecolor=\"w\",\n", " markeredgecolor=\"b\",\n", " label=\"Winter\",\n", ")\n", "axes[0].grid(True, which=\"minor\", linestyle=\"dotted\")\n", "axes[0].yaxis.set_major_locator(MaxNLocator(3))\n", "axes[0].margins(x=0.05, y=0.1)\n", "axes[0].legend(loc=\"center left\", frameon=True, fontsize=12)\n", "axes[0].set_xlabel(\"Years\", fontsize=14)\n", "axes[0].set_ylabel(\"Athletes\", fontsize=14)\n", "axes[0].set_title(\"The number of Athletes\", fontsize=16)\n", "\n", "axes[1].plot(\n", " dfAthlete[dfAthlete[\"Season\"] == \"Summer\"].groupby(\"Year\")[\"Event\"].nunique(),\n", " color=\"r\",\n", " linewidth=4,\n", " marker=\"o\",\n", " markersize=8,\n", " markerfacecolor=\"w\",\n", " markeredgecolor=\"r\",\n", " label=\"Summer\",\n", ")\n", "axes[1].plot(\n", " dfAthlete[dfAthlete[\"Season\"] == \"Winter\"].groupby(\"Year\")[\"Event\"].nunique(),\n", " color=\"b\",\n", " linewidth=4,\n", " marker=\"o\",\n", " markersize=8,\n", " markerfacecolor=\"w\",\n", " markeredgecolor=\"b\",\n", " label=\"Winter\",\n", ")\n", "\n", "axes[1].xaxis.set_major_locator(MultipleLocator(24))\n", "axes[1].xaxis.set_minor_locator(MultipleLocator(4))\n", "axes[1].yaxis.set_major_locator(MaxNLocator(3))\n", "axes[1].grid(True, which=\"minor\", linestyle=\"dotted\")\n", "axes[1].margins(x=0.05, y=0.1)\n", "axes[1].legend(loc=\"center left\", frameon=True, fontsize=12)\n", "axes[1].set_xlabel(\"Years\", fontsize=14)\n", "axes[1].set_ylabel(\"Events\", fontsize=14)\n", "axes[1].set_title(\"The number of Events\", fontsize=16)\n", "\n", "plt.show();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Oh, something wrong with labels and titles. \n", "In this case, possible solutions are: \n", " - If you want matplotlib automatically adjusts subplot params so that the subplots fit into the figure area, you can use `fig.tight_layout()`.\n", " - You can increase space between subplots by `fig.subplots_adjust(hspace=XX)`\n", " - We just increase `figure.figsize`.\n", "\n", "\n", "Now let's set location of shared xticks. For example, we can locate it between two plots. \n", "We can do so: \n", " - for second plot move xticks to the top `axes[1].xaxis.tick_top()`\n", " - hide xlabel `axes[1].set_xlabel('')`\n", " - increase space between xticks-labels and plot `axes[1].tick_params(axis='x', pad=5)`\n", " - reduce space between plots `fig.subplots_adjust(hspace=0.1)`\n", "\n", "Let's try." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(10, 8))\n", "axes[0].plot(\n", " dfAthlete[dfAthlete[\"Season\"] == \"Summer\"].groupby(\"Year\")[\"ID\"].nunique(),\n", " color=\"r\",\n", " linewidth=4,\n", " marker=\"o\",\n", " markersize=8,\n", " markerfacecolor=\"w\",\n", " markeredgecolor=\"r\",\n", " label=\"Summer\",\n", ")\n", "axes[0].plot(\n", " dfAthlete[dfAthlete[\"Season\"] == \"Winter\"].groupby(\"Year\")[\"ID\"].nunique(),\n", " color=\"b\",\n", " linewidth=4,\n", " marker=\"o\",\n", " markersize=8,\n", " markerfacecolor=\"w\",\n", " markeredgecolor=\"b\",\n", " label=\"Winter\",\n", ")\n", "axes[0].grid(True, which=\"minor\", linestyle=\"dotted\")\n", "axes[0].yaxis.set_major_locator(MaxNLocator(3))\n", "axes[0].margins(x=0.05, y=0.1)\n", "axes[0].set_xlabel(\"Years\", fontsize=14)\n", "axes[0].set_ylabel(\"Athletes\", fontsize=14)\n", "axes[0].set_title(\"The number of athletes and events is growing\", fontsize=16)\n", "# Common title\n", "\n", "axes[1].plot(\n", " dfAthlete[dfAthlete[\"Season\"] == \"Summer\"].groupby(\"Year\")[\"Event\"].nunique(),\n", " color=\"r\",\n", " linewidth=4,\n", " marker=\"o\",\n", " markersize=8,\n", " markerfacecolor=\"w\",\n", " markeredgecolor=\"r\",\n", " label=\"Summer\",\n", ")\n", "axes[1].plot(\n", " dfAthlete[dfAthlete[\"Season\"] == \"Winter\"].groupby(\"Year\")[\"Event\"].nunique(),\n", " color=\"b\",\n", " linewidth=4,\n", " marker=\"o\",\n", " markersize=8,\n", " markerfacecolor=\"w\",\n", " markeredgecolor=\"b\",\n", " label=\"Winter\",\n", ")\n", "\n", "axes[1].xaxis.set_major_locator(MultipleLocator(24))\n", "axes[1].xaxis.set_minor_locator(MultipleLocator(4))\n", "axes[1].yaxis.set_major_locator(MaxNLocator(3))\n", "axes[1].grid(True, which=\"minor\", linestyle=\"dotted\")\n", "axes[1].margins(x=0.05, y=0.1)\n", "axes[1].legend(loc=\"upper left\", frameon=True, fontsize=12)\n", "# Common legend\n", "axes[1].set_xlabel(\"\")\n", "# Hide x-label\n", "axes[1].xaxis.tick_top()\n", "# Move ticks to top\n", "axes[1].tick_params(axis=\"x\", pad=5) # Increase space to plot\n", "axes[1].set_ylabel(\"Events\", fontsize=14)\n", "\n", "fig.subplots_adjust(hspace=0.1) # Reduce space between plots\n", "plt.show();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Style settings" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are going to add two more plots in the same style. So, we can get tired to set the same linewidth, markersize etc every time. \n", "If we plan to customize more parameters and use this style again, we can save `.mplstyle` file to `mpl_configdir/stylelib` with something like\n", " - lines.linewidth : 3\n", " - lines.markersize : 6\n", " - lines.marker : 'o'\n", " - lines.markerfacecolor : 'white'\n", "\n", "And then get your style with `plt.style.use()` everytime you need it.\n", "\n", "Now we just set some common parameters with rcParams:\n", "```python\n", "plt.rcParams['lines.linewidth'] = 3\n", "plt.rcParams['lines.marker'] = 'o'\n", "plt.rcParams['lines.markersize'] = 6\n", "plt.rcParams['lines.markerfacecolor'] = 'white'\n", "```\n", "\n", "After we finished to work with this parameters, we need to return to defaults settings:\n", "```python\n", "plt.rcdefaults()\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.rcdefaults()\n", "plt.style.use(\"seaborn-whitegrid\")\n", "\n", "plt.rcParams[\"lines.linewidth\"] = 3\n", "plt.rcParams[\"lines.marker\"] = \"o\"\n", "plt.rcParams[\"lines.markersize\"] = 6\n", "plt.rcParams[\"lines.markerfacecolor\"] = \"white\"\n", "\n", "color = sns.color_palette(\"Paired\")\n", "\n", "fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(10, 8))\n", "\n", "axes[0].plot(\n", " dfAthlete[(dfAthlete[\"Sex\"] == \"F\") & (dfAthlete[\"Season\"] == \"Summer\")][\n", " [\"ID\", \"Year\", \"Height\"]\n", " ]\n", " .drop_duplicates()\n", " .groupby(\"Year\")[\"Height\"]\n", " .mean(),\n", " color=color[5],\n", " markeredgecolor=color[5],\n", " label=\"Female Summer\",\n", ")\n", "axes[0].plot(\n", " dfAthlete[(dfAthlete[\"Sex\"] == \"F\") & (dfAthlete[\"Season\"] == \"Winter\")][\n", " [\"ID\", \"Year\", \"Height\"]\n", " ]\n", " .drop_duplicates()\n", " .groupby(\"Year\")[\"Height\"]\n", " .mean(),\n", " color=color[4],\n", " markeredgecolor=color[4],\n", " label=\"Female Winter\",\n", ")\n", "axes[0].plot(\n", " dfAthlete[(dfAthlete[\"Sex\"] == \"M\") & (dfAthlete[\"Season\"] == \"Summer\")][\n", " [\"ID\", \"Year\", \"Height\"]\n", " ]\n", " .drop_duplicates()\n", " .groupby(\"Year\")[\"Height\"]\n", " .mean(),\n", " color=color[1],\n", " markeredgecolor=color[1],\n", " label=\"Male Summer\",\n", ")\n", "axes[0].plot(\n", " dfAthlete[(dfAthlete[\"Sex\"] == \"M\") & (dfAthlete[\"Season\"] == \"Winter\")][\n", " [\"ID\", \"Year\", \"Height\"]\n", " ]\n", " .drop_duplicates()\n", " .groupby(\"Year\")[\"Height\"]\n", " .mean(),\n", " color=color[0],\n", " markeredgecolor=color[0],\n", " label=\"Male Winter\",\n", ")\n", "axes[0].yaxis.set_major_locator(MaxNLocator(6))\n", "axes[0].grid(True, which=\"minor\", linestyle=\"dotted\")\n", "axes[0].margins(x=0.05, y=0.1)\n", "axes[0].set_xlabel(\"Years\", fontsize=12)\n", "axes[0].set_ylabel(\"Height\", fontsize=12)\n", "axes[0].set_title(\"The average height and weight of the Athletes\", fontsize=14)\n", "\n", "axes[1].plot(\n", " dfAthlete[(dfAthlete[\"Sex\"] == \"F\") & (dfAthlete[\"Season\"] == \"Summer\")][\n", " [\"ID\", \"Year\", \"Weight\"]\n", " ]\n", " .drop_duplicates()\n", " .groupby(\"Year\")[\"Weight\"]\n", " .mean(),\n", " color=color[5],\n", " markeredgecolor=color[5],\n", " label=\"Female Summer\",\n", ")\n", "axes[1].plot(\n", " dfAthlete[(dfAthlete[\"Sex\"] == \"F\") & (dfAthlete[\"Season\"] == \"Winter\")][\n", " [\"ID\", \"Year\", \"Weight\"]\n", " ]\n", " .drop_duplicates()\n", " .groupby(\"Year\")[\"Weight\"]\n", " .mean(),\n", " color=color[4],\n", " markeredgecolor=color[4],\n", " label=\"Female Winter\",\n", ")\n", "axes[1].plot(\n", " dfAthlete[(dfAthlete[\"Sex\"] == \"M\") & (dfAthlete[\"Season\"] == \"Summer\")][\n", " [\"ID\", \"Year\", \"Weight\"]\n", " ]\n", " .drop_duplicates()\n", " .groupby(\"Year\")[\"Weight\"]\n", " .mean(),\n", " color=color[1],\n", " markeredgecolor=color[1],\n", " label=\"Male Summer\",\n", ")\n", "axes[1].plot(\n", " dfAthlete[(dfAthlete[\"Sex\"] == \"M\") & (dfAthlete[\"Season\"] == \"Winter\")][\n", " [\"ID\", \"Year\", \"Weight\"]\n", " ]\n", " .drop_duplicates()\n", " .groupby(\"Year\")[\"Weight\"]\n", " .mean(),\n", " color=color[0],\n", " markeredgecolor=color[0],\n", " label=\"Male Winter\",\n", ")\n", "\n", "axes[1].set_ylabel(\"Weight\", fontsize=12)\n", "axes[1].xaxis.set_major_locator(MultipleLocator(24))\n", "axes[1].xaxis.set_minor_locator(MultipleLocator(4))\n", "axes[1].yaxis.set_major_locator(MaxNLocator(6))\n", "axes[1].grid(True, which=\"minor\", linestyle=\"dotted\")\n", "axes[1].margins(x=0.05, y=0.1)\n", "axes[1].set_xlabel(\"\")\n", "axes[1].xaxis.tick_top()\n", "axes[1].tick_params(axis=\"x\", pad=5)\n", "\n", "fig.subplots_adjust(hspace=0.1)\n", "plt.show();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Colormaps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How you can see in the list of available colormaps, they are:\n", " - Sequential: colormaps varying smoothly between two color tones, good to show progression from low to high values\n", " - Diverging: have a median value, good when data has a significant median value\n", " - Qualitative: for choosing a discrete color\n", " - Other (cyclic, miscellaneous)\n", "\n", "We used qualitative colors from colormap Paired above for Male/Female and Summer/Winter gradation.\n", "```python\n", "color = sns.color_palette('Paired')\n", "ax.plot(...,color=color[5])\n", "``` \n", "\n", "Now let's use some diverging colormap for heatmap. Let's look at the change in the number of medals per athlete in 10 countries that have got the most medals in the last 5 games." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "TopTen = (\n", " dfAthlete[\n", " (dfAthlete[\"Season\"] == \"Summer\")\n", " & (dfAthlete[\"Year\"] >= 2000)\n", " & (~dfAthlete[\"Medal\"].isna())\n", " ][[\"Year\", \"NOC\", \"Event\", \"Medal\"]]\n", " .drop_duplicates()\n", " .groupby(\"NOC\")\n", " .size()\n", " .sort_values(ascending=False)\n", " .head(10)\n", " .index.values\n", ")\n", "TopTen_MedalRate = []\n", "for c in TopTen:\n", " for y in range(2000, 2020, 4):\n", " TopTen_MedalRate.append(\n", " [\n", " y,\n", " c,\n", " dfAthlete[\n", " (dfAthlete[\"Season\"] == \"Summer\")\n", " & (dfAthlete[\"Year\"] == y)\n", " & (dfAthlete[\"NOC\"] == c)\n", " & (~dfAthlete[\"Medal\"].isna())\n", " ][[\"Event\", \"Medal\"]]\n", " .drop_duplicates()[\"Event\"]\n", " .count(),\n", " dfAthlete[\n", " (dfAthlete[\"Season\"] == \"Summer\")\n", " & (dfAthlete[\"Year\"] == y)\n", " & (dfAthlete[\"NOC\"] == c)\n", " ][[\"ID\"]]\n", " .nunique()\n", " .values[0],\n", " ]\n", " )\n", "dfTopTen_MedalRate = pd.DataFrame(\n", " TopTen_MedalRate, columns=[\"Year\", \"NOC\", \"Medals\", \"Athletes\"]\n", ")\n", "dfTopTen_MedalRate[\"MedalsToAthlete\"] = (\n", " dfTopTen_MedalRate[\"Medals\"] / dfTopTen_MedalRate[\"Athletes\"]\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.rcdefaults()\n", "plt.style.use(\"seaborn-whitegrid\")\n", "sns.heatmap(\n", " dfTopTen_MedalRate.pivot_table(\n", " columns=\"Year\", index=\"NOC\", values=\"MedalsToAthlete\"\n", " ),\n", " annot=True,\n", " fmt=\".5f\",\n", " linewidths=0.5,\n", " cmap=\"RdGy_r\",\n", ");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can manually set center if we know the boundary separating the losers (we will consider as losers all who doesn't hold out to 0.16 medals on the athlete). Also, we can rotate yticks labels, and increase space from it to the plot." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots()\n", "sns.heatmap(\n", " dfTopTen_MedalRate.pivot_table(\n", " columns=\"Year\", index=\"NOC\", values=\"MedalsToAthlete\"\n", " ),\n", " annot=True,\n", " fmt=\".5f\",\n", " linewidths=0.5,\n", " cmap=\"RdGy_r\",\n", " center=0.16,\n", " ax=ax,\n", ")\n", "plt.setp(ax.yaxis.get_majorticklabels(), rotation=0)\n", "ax.tick_params(axis=\"y\", pad=10)\n", "ax.set_title(\n", " \"Change Medal-per-Athlete in last 5 Summer Olympic Games\",\n", " fontsize=12,\n", " weight=\"bold\",\n", ")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Usage of the miscellaneous colormap we will show on the rating data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.colors as clrs\n", "import numpy as np\n", "import pandas as pd\n", "\n", "dfMLO = pd.read_csv(\"../../data/English session 2 rating - Sheet1.csv\")\n", "dfMLO.fillna(value=0, inplace=True)\n", "AssignColimns = dfMLO.columns.values[\n", " [True if x.find(\"A\") == 0 else False for x in dfMLO.columns.values]\n", "]\n", "CURRENT_STEP = 9\n", "dfPlaces = pd.DataFrame(\n", " {\"Place\": range(1, 1 + dfMLO[\"Overall\"].drop_duplicates().shape[0])},\n", " index=dfMLO[\"Overall\"].drop_duplicates(),\n", ")\n", "dfMLO = pd.merge(\n", " left=dfMLO, right=dfPlaces, left_on=\"Overall\", right_index=True, how=\"left\"\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "MaxPlace = dfMLO.iloc[-1][\"Place\"]\n", "MaxGroupMembers = dfMLO.groupby(\"Overall\").size().max()\n", "plt.rcParams[\"figure.figsize\"] = (20, 20)\n", "\n", "cmap = plt.cm.rainbow\n", "cl_norm = clrs.Normalize(vmin=1, vmax=MaxPlace)\n", "\n", "fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)\n", "\n", "sns.countplot(x=\"Overall\", data=dfMLO, palette=\"rainbow\", ax=ax1)\n", "ax1.set_title(str(dfMLO.shape[0]) + \" participants\", fontsize=16)\n", "ax1.set_xlabel(\"Points\")\n", "ax1.set_xlim([dfMLO.iloc[-1][\"Overall\"] - 1, dfMLO.iloc[0][\"Overall\"] + 2])\n", "ax1.set_yticks(np.arange(0, ((MaxGroupMembers // 50) + 1) * 50, 50))\n", "ax1.set_yticks(np.arange(0, ((MaxGroupMembers // 10) + 1) * 10, 10), minor=True)\n", "ax1.grid(axis=\"y\", which=\"minor\", linestyle=\"dotted\")\n", "ax1.set_ylabel(\"Number of persons\", fontsize=16)\n", "\n", "for i in range(dfMLO.shape[0] - 1, 0, -1):\n", " ax2.plot(\n", " dfMLO.iloc[i][AssignColimns[:CURRENT_STEP]].cumsum().values,\n", " range(1, len(AssignColimns[:CURRENT_STEP]) + 1),\n", " color=cmap(cl_norm(MaxPlace - dfMLO.iloc[i][\"Place\"])),\n", " marker=\"o\",\n", " linestyle=\"-\",\n", " alpha=0.3,\n", " )\n", "\n", "ax2.set_yticks(ticks=range(1, len(AssignColimns[:CURRENT_STEP]) + 1))\n", "ax2.set_yticklabels(labels=AssignColimns[:CURRENT_STEP])\n", "ax2.set_ylabel(\"Assignment\", fontsize=16)\n", "ax2.xaxis.tick_top()\n", "ax2.set_xlim([dfMLO.iloc[-1][\"Overall\"] - 1, dfMLO.iloc[0][\"Overall\"] + 2])\n", "ax2.set_xticks(np.arange(0, ((dfMLO.iloc[0][\"Overall\"] // 5) + 1) * 5, 5))\n", "ax2.set_xticklabels(\n", " range(0, ((dfMLO.iloc[0][\"Overall\"] // 5) + 1) * 5, 5), rotation=\"vertical\"\n", ")\n", "ax2.tick_params(axis=\"x\", pad=20)\n", "ax2.grid(None)\n", "\n", "plt.subplots_adjust(hspace=0.1)\n", "plt.show();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Colormap here is sets by place in current rating and shows the \"way\" of participant in all previous assignment:\n", "```python\n", "import matplotlib.colors as clrs\n", "cmap = plt.cm.rainbow\n", "cl_norm = clrs.Normalize(vmin=1, vmax=MaxPlace)\n", "ax.plot(..., color=cmap(cl_norm(...))\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Conclusion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We very briefly reviewed:\n", " - Figures, Axes, Subplots\n", " - Ticks, Tickers (locators), Tick labels, grid\n", " - Styles, set common parameters in rcParams\n", " - Colormaps\n", "\n", "In preparation for this tutorial, I did not met the limits of the capabilities of Matplotlib. To find them, we need more experience. Good luck with that!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Refereces" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " 1. User's Guide\n", " 2. Tutorials\n", " 3. Gallery\n", " 4. https://github.com/matplotlib/AnatomyOfMatplotlib" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Appendix I" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for st in plt.style.available:\n", " plt.rcdefaults()\n", " plt.style.use(st)\n", " plt.plot(range(10), np.random.normal(5, 3, 10))\n", " plt.title(st)\n", " plt.show();" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 2 }