{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Contour plots in Python with matplotlib: Easy as X-Y-Z\n", "\n", "When I have continuous data in three dimensions, my first visualization inclination is to generate a contour plot. While 3-D surface plots might be useful in [some special cases](https://www.visualisingdata.com/2015/03/when-3d-works/), in general I think they should be avoided since they [add a great deal of complexity](https://www.gabrielaplucinska.com/blog/2017/8/7/3d-graphs) to a visualization without adding much (if any) information beyond a 2-D contour plot. \n", "\n", "While I usually use R/ggplot2 to generate my data visualizations, I found its support for good-looking, out-of-the-box contour plots to be a bit lacking. Of course, you can make anything look great with enough effort, but you can also waste an excessive amount of time fiddling with customizable tools. \n", "\n", "This isn't to say the Pythonic contour plot doesn't come with its own set of frustrations, but hopefully this post will make the task easier for any of you going down this road.\n", "\n", "**The most difficult part of using the Python/matplotlib implementation of contour plots is formatting your data.** The main plotting function we'll use, ax.contour(X,Y,Z), requires that your three-dimensional input data be in an odd and unintuitive structure. In this post, I'll give you the code to get from a more traditional data structure to this Python-specific format.\n", "\n", "## Data preparation\n", "\n", "To begin, I'll start with some dummy data that is in a standard \"long\" format, where each row corresponds to a single observation. In this case, my three dimensions are just x, y, and z, which map directly to the axes on which we wish to plot them. " ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
 "<table border=\"1\" class=\"dataframe\">\n",
 " <thead>\n",
 " <tr style=\"text-align: right;\"><th></th><th>x</th><th>y</th><th>z</th></tr>\n",
 " </thead>\n",
 " <tbody>\n",
 " <tr><th>0</th><td>0.000000</td><td>0.0</td><td>0.392</td></tr>\n",
 " <tr><th>1</th><td>0.198970</td><td>0.0</td><td>0.496</td></tr>\n",
 " <tr><th>2</th><td>0.349485</td><td>0.0</td><td>0.500</td></tr>\n",
 " <tr><th>3</th><td>0.500000</td><td>0.0</td><td>0.500</td></tr>\n",
 " <tr><th>4</th><td>0.698970</td><td>0.0</td><td>0.500</td></tr>\n",
 " </tbody>\n",
 "</table>\n",
 "</div>
" ], "text/plain": [ " x y z\n", "0 0.000000 0.0 0.392\n", "1 0.198970 0.0 0.496\n", "2 0.349485 0.0 0.500\n", "3 0.500000 0.0 0.500\n", "4 0.698970 0.0 0.500" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "data_url = 'https://raw.githubusercontent.com/alexmill/website_notebooks/master/data/data_3d_contour.csv'\n", "contour_data = pd.read_csv(data_url)\n", "contour_data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Nota bene:* For best results, make sure that there is a row for every combination of x and y coordinates in the plane of the range you want to plot. (Said differently, if $X$ is the set of points you want to plot on the $x$-axis and $Y$ is the set of points you want to plot on the $y$-axis, then your dataframe should contain a $z$-value for every point in the Cartesian product of $X \\times Y$.) If you know you're going to be making a contour plot, you can plat ahead of time so your data-generating process results in this format. It's not detrimental if your data don't meet this requirement, but you may get unwanted blank spots in your plot if your data is missing any points in the plane.\n", "\n", "Assuming your data are in a similar format, you can quickly convert it to the requisite structure for matplotlib using the code below. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "Z = contour_data.pivot_table(index='x', columns='y', values='z').T.values\n", "\n", "X_unique = np.sort(contour_data.x.unique())\n", "Y_unique = np.sort(contour_data.y.unique())\n", "X, Y = np.meshgrid(X_unique, Y_unique)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What's going on here? 
Looking at the Z data first, I've merely used the pivot_table method from pandas to cast my data into a matrix format, where each row corresponds to a unique $y$ value and each column to a unique $x$ value (hence the transpose), and each entry holds the matching value of $z$. We can see the resulting data structure below:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
 "<table border=\"1\" class=\"dataframe\">\n",
 " <thead>\n",
 " <tr style=\"text-align: right;\"><th></th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th></tr>\n",
 " </thead>\n",
 " <tbody>\n",
 " <tr><th>0</th><td>0.392</td><td>0.496</td><td>0.500</td><td>0.500</td><td>0.500</td><td>0.500</td><td>0.500</td></tr>\n",
 " <tr><th>1</th><td>0.286</td><td>0.472</td><td>0.494</td><td>0.500</td><td>0.500</td><td>0.500</td><td>0.500</td></tr>\n",
 " <tr><th>2</th><td>0.094</td><td>0.304</td><td>0.434</td><td>0.496</td><td>0.500</td><td>0.500</td><td>0.500</td></tr>\n",
 " <tr><th>3</th><td>-0.036</td><td>0.118</td><td>0.308</td><td>0.460</td><td>0.500</td><td>0.500</td><td>0.500</td></tr>\n",
 " <tr><th>4</th><td>-0.052</td><td>-0.042</td><td>0.120</td><td>0.328</td><td>0.480</td><td>0.500</td><td>0.500</td></tr>\n",
 " <tr><th>5</th><td>-0.212</td><td>-0.192</td><td>-0.120</td><td>0.004</td><td>0.266</td><td>0.438</td><td>0.496</td></tr>\n",
 " <tr><th>6</th><td>-0.320</td><td>-0.362</td><td>-0.348</td><td>-0.352</td><td>-0.304</td><td>-0.247</td><td>-0.145</td></tr>\n",
 " <tr><th>7</th><td>-0.328</td><td>-0.414</td><td>-0.454</td><td>-0.460</td><td>-0.478</td><td>-0.474</td><td>-0.490</td></tr>\n",
 " </tbody>\n",
 "</table>\n",
 "</div>
" ], "text/plain": [ " 0 1 2 3 4 5 6\n", "0 0.392 0.496 0.500 0.500 0.500 0.500 0.500\n", "1 0.286 0.472 0.494 0.500 0.500 0.500 0.500\n", "2 0.094 0.304 0.434 0.496 0.500 0.500 0.500\n", "3 -0.036 0.118 0.308 0.460 0.500 0.500 0.500\n", "4 -0.052 -0.042 0.120 0.328 0.480 0.500 0.500\n", "5 -0.212 -0.192 -0.120 0.004 0.266 0.438 0.496\n", "6 -0.320 -0.362 -0.348 -0.352 -0.304 -0.247 -0.145\n", "7 -0.328 -0.414 -0.454 -0.460 -0.478 -0.474 -0.490" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(Z).round(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This by itself is not terribly unintuitive, but the odd part about matplotlib's contour method is that it also requires your X and Y data to have the exact same shape as your Z data. This means that we need to *duplicate* our $x$ and $y$ values along different axes, so that each entry in Z has its corresponding $x$ and $y$ coordinates in the same entry of the X and Y matrices. Fortunately, the meshgrid method from numpy will do this automatically for us.\n", "\n", "To help you visualize exacctly what meshgrid is doing, first notice the unique values in each of my x/y axes:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([0. , 0.19897 , 0.349485, 0.5 , 0.69897 , 0.849485,\n", " 1. ]),\n", " array([0. , 0.26315789, 0.52631579, 0.63157895, 0.73684211,\n", " 0.84210526, 0.94736842, 1. ]))" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X_unique,Y_unique" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And now let's display the matrices X and Y generated by np.meshgrid(X_unique, Y_unique):" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
 "<table border=\"1\" class=\"dataframe\">\n",
 " <thead>\n",
 " <tr style=\"text-align: right;\"><th></th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th></tr>\n",
 " </thead>\n",
 " <tbody>\n",
 " <tr><th>0</th><td>0.0</td><td>0.199</td><td>0.349</td><td>0.5</td><td>0.699</td><td>0.849</td><td>1.0</td></tr>\n",
 " <tr><th>1</th><td>0.0</td><td>0.199</td><td>0.349</td><td>0.5</td><td>0.699</td><td>0.849</td><td>1.0</td></tr>\n",
 " <tr><th>2</th><td>0.0</td><td>0.199</td><td>0.349</td><td>0.5</td><td>0.699</td><td>0.849</td><td>1.0</td></tr>\n",
 " <tr><th>3</th><td>0.0</td><td>0.199</td><td>0.349</td><td>0.5</td><td>0.699</td><td>0.849</td><td>1.0</td></tr>\n",
 " <tr><th>4</th><td>0.0</td><td>0.199</td><td>0.349</td><td>0.5</td><td>0.699</td><td>0.849</td><td>1.0</td></tr>\n",
 " <tr><th>5</th><td>0.0</td><td>0.199</td><td>0.349</td><td>0.5</td><td>0.699</td><td>0.849</td><td>1.0</td></tr>\n",
 " <tr><th>6</th><td>0.0</td><td>0.199</td><td>0.349</td><td>0.5</td><td>0.699</td><td>0.849</td><td>1.0</td></tr>\n",
 " <tr><th>7</th><td>0.0</td><td>0.199</td><td>0.349</td><td>0.5</td><td>0.699</td><td>0.849</td><td>1.0</td></tr>\n",
 " </tbody>\n",
 "</table>\n",
 "</div>
" ], "text/plain": [ " 0 1 2 3 4 5 6\n", "0 0.0 0.199 0.349 0.5 0.699 0.849 1.0\n", "1 0.0 0.199 0.349 0.5 0.699 0.849 1.0\n", "2 0.0 0.199 0.349 0.5 0.699 0.849 1.0\n", "3 0.0 0.199 0.349 0.5 0.699 0.849 1.0\n", "4 0.0 0.199 0.349 0.5 0.699 0.849 1.0\n", "5 0.0 0.199 0.349 0.5 0.699 0.849 1.0\n", "6 0.0 0.199 0.349 0.5 0.699 0.849 1.0\n", "7 0.0 0.199 0.349 0.5 0.699 0.849 1.0" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(X).round(3)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
 "<table border=\"1\" class=\"dataframe\">\n",
 " <thead>\n",
 " <tr style=\"text-align: right;\"><th></th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th></tr>\n",
 " </thead>\n",
 " <tbody>\n",
 " <tr><th>0</th><td>0.000</td><td>0.000</td><td>0.000</td><td>0.000</td><td>0.000</td><td>0.000</td><td>0.000</td></tr>\n",
 " <tr><th>1</th><td>0.263</td><td>0.263</td><td>0.263</td><td>0.263</td><td>0.263</td><td>0.263</td><td>0.263</td></tr>\n",
 " <tr><th>2</th><td>0.526</td><td>0.526</td><td>0.526</td><td>0.526</td><td>0.526</td><td>0.526</td><td>0.526</td></tr>\n",
 " <tr><th>3</th><td>0.632</td><td>0.632</td><td>0.632</td><td>0.632</td><td>0.632</td><td>0.632</td><td>0.632</td></tr>\n",
 " <tr><th>4</th><td>0.737</td><td>0.737</td><td>0.737</td><td>0.737</td><td>0.737</td><td>0.737</td><td>0.737</td></tr>\n",
 " <tr><th>5</th><td>0.842</td><td>0.842</td><td>0.842</td><td>0.842</td><td>0.842</td><td>0.842</td><td>0.842</td></tr>\n",
 " <tr><th>6</th><td>0.947</td><td>0.947</td><td>0.947</td><td>0.947</td><td>0.947</td><td>0.947</td><td>0.947</td></tr>\n",
 " <tr><th>7</th><td>1.000</td><td>1.000</td><td>1.000</td><td>1.000</td><td>1.000</td><td>1.000</td><td>1.000</td></tr>\n",
 " </tbody>\n",
 "</table>\n",
 "</div>
" ], "text/plain": [ " 0 1 2 3 4 5 6\n", "0 0.000 0.000 0.000 0.000 0.000 0.000 0.000\n", "1 0.263 0.263 0.263 0.263 0.263 0.263 0.263\n", "2 0.526 0.526 0.526 0.526 0.526 0.526 0.526\n", "3 0.632 0.632 0.632 0.632 0.632 0.632 0.632\n", "4 0.737 0.737 0.737 0.737 0.737 0.737 0.737\n", "5 0.842 0.842 0.842 0.842 0.842 0.842 0.842\n", "6 0.947 0.947 0.947 0.947 0.947 0.947 0.947\n", "7 1.000 1.000 1.000 1.000 1.000 1.000 1.000" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(Y).round(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I'm not a huge fan of this formatting requirement since we have to duplicate a bunch of data, but hopefully I've helped you understand the basic process required to get here from a more standard \"long\" data format." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### matplotlib's default contour plot \n", "\n", "Now that my data is in the correct format for matplotlib to understand, I can generate my first pass at a contour plot:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n" ], "text/plain": [ "