{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Styling data frames\n", "\n", "[Data set download](https://s3.amazonaws.com/bebi103.caltech.edu/data/gfmt_sleep.csv)\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "nbsphinx": "hidden", "tags": [] }, "outputs": [], "source": [ "# Colab setup ------------------\n", "import os, sys, subprocess\n", "if \"google.colab\" in sys.modules:\n", " cmd = \"pip install --upgrade watermark\"\n", " process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n", " stdout, stderr = process.communicate()\n", " data_path = \"https://s3.amazonaws.com/bebi103.caltech.edu/data/\"\n", "else:\n", " data_path = \"../data/\"" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "It is sometimes useful to highlight features in a data frame when viewing them. (Note that this is generally far less useful than making informative plots, which we will come to shortly.) Pandas offers some convenient ways to style the display of a data frame.\n", "\n", "To demonstrate, we will again use a data set from [Beattie, et al.](https://doi.org/10.1098/rsos.160321) containing results from a study the effects of sleep quality on performance in the [Glasgow Facial Matching Test](https://doi.org/10.3758/BRM.42.1.286) (GMFT)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(os.path.join(data_path, 'gfmt_sleep.csv'), na_values='*')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As our first example demonstrating styling, let's say we wanted to highlight rows corresponding to women who scored at or above 75% correct. We can write a function that will take as an argument a row of the data frame, check the value in the `'gender'` and `'percent correct'` columns, and then specify a row color of gray or green accordingly. We then use `df.style.apply()` with the `axis=1` kwarg to apply that function to each row." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 participant numbergenderagecorrect hit percentagecorrect reject percentagepercent correctconfidence when correct hitconfidence incorrect hitconfidence correct rejectconfidence incorrect rejectconfidence when correctconfidence when incorrectscipsqiess
08f39658072.50000091.00000090.00000093.00000083.50000093.00000090.0000009132
116m42909090.00000075.50000055.50000070.50000050.00000075.00000050.0000004117
218f31909592.50000089.50000090.00000086.00000081.00000089.00000088.0000001093
322f351007587.50000089.500000nan71.00000080.00000088.00000080.00000013820
427f74606562.50000068.50000049.00000061.00000049.00000065.00000049.00000013912
528f61802050.00000071.00000063.00000031.00000072.50000064.50000070.50000015142
630m32907582.50000067.00000056.50000066.00000065.00000066.00000064.0000001693
733m62459067.50000054.00000037.00000065.00000081.50000062.00000061.0000001499
834f338010090.00000070.50000076.50000064.500000nan68.00000076.500000141210
935f531005075.00000074.500000nan60.50000065.00000071.00000065.0000001487
\n" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def highlight_high_scoring_females(s):\n", " if s[\"gender\"] == \"f\" and s[\"percent correct\"] >= 75:\n", " return [\"background-color: #7fc97f\"] * len(s)\n", " else:\n", " return [\"background-color: lightgray\"] * len(s)\n", "\n", "df.head(10).style.apply(highlight_high_scoring_females, axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can be more fancy. Let's say we want to shade the `'percent correct'` column with a bar corresponding to the value in the column. We use the `df.style.bar()` method to do so. The `subset` kwarg specifies which columns are to have bars." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 participant numbergenderagecorrect hit percentagecorrect reject percentagepercent correctconfidence when correct hitconfidence incorrect hitconfidence correct rejectconfidence incorrect rejectconfidence when correctconfidence when incorrectscipsqiess
08f39658072.50000091.00000090.00000093.00000083.50000093.00000090.0000009132
116m42909090.00000075.50000055.50000070.50000050.00000075.00000050.0000004117
218f31909592.50000089.50000090.00000086.00000081.00000089.00000088.0000001093
322f351007587.50000089.500000nan71.00000080.00000088.00000080.00000013820
427f74606562.50000068.50000049.00000061.00000049.00000065.00000049.00000013912
528f61802050.00000071.00000063.00000031.00000072.50000064.50000070.50000015142
630m32907582.50000067.00000056.50000066.00000065.00000066.00000064.0000001693
733m62459067.50000054.00000037.00000065.00000081.50000062.00000061.0000001499
834f338010090.00000070.50000076.50000064.500000nan68.00000076.500000141210
935f531005075.00000074.500000nan60.50000065.00000071.00000065.0000001487
\n" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head(10).style.bar(subset=[\"percent correct\"], vmin=0, vmax=100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that I have used the `vmin=0` and `vmax=100` kwargs to set the base of the bar to be at zero and the maximum to be 100.\n", "\n", "Alternatively, I could color the percent correct according to the percent correct." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 participant numbergenderagecorrect hit percentagecorrect reject percentagepercent correctconfidence when correct hitconfidence incorrect hitconfidence correct rejectconfidence incorrect rejectconfidence when correctconfidence when incorrectscipsqiess
08f39658072.50000091.00000090.00000093.00000083.50000093.00000090.0000009132
116m42909090.00000075.50000055.50000070.50000050.00000075.00000050.0000004117
218f31909592.50000089.50000090.00000086.00000081.00000089.00000088.0000001093
322f351007587.50000089.500000nan71.00000080.00000088.00000080.00000013820
427f74606562.50000068.50000049.00000061.00000049.00000065.00000049.00000013912
528f61802050.00000071.00000063.00000031.00000072.50000064.50000070.50000015142
630m32907582.50000067.00000056.50000066.00000065.00000066.00000064.0000001693
733m62459067.50000054.00000037.00000065.00000081.50000062.00000061.0000001499
834f338010090.00000070.50000076.50000064.500000nan68.00000076.500000141210
935f531005075.00000074.500000nan60.50000065.00000071.00000065.0000001487
\n" ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head(10).style.background_gradient(subset=[\"percent correct\"], cmap=\"Reds\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We could have multiple effects together as well." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 participant numbergenderagecorrect hit percentagecorrect reject percentagepercent correctconfidence when correct hitconfidence incorrect hitconfidence correct rejectconfidence incorrect rejectconfidence when correctconfidence when incorrectscipsqiess
08f39658072.50000091.00000090.00000093.00000083.50000093.00000090.0000009132
116m42909090.00000075.50000055.50000070.50000050.00000075.00000050.0000004117
218f31909592.50000089.50000090.00000086.00000081.00000089.00000088.0000001093
322f351007587.50000089.500000nan71.00000080.00000088.00000080.00000013820
427f74606562.50000068.50000049.00000061.00000049.00000065.00000049.00000013912
528f61802050.00000071.00000063.00000031.00000072.50000064.50000070.50000015142
630m32907582.50000067.00000056.50000066.00000065.00000066.00000064.0000001693
733m62459067.50000054.00000037.00000065.00000081.50000062.00000061.0000001499
834f338010090.00000070.50000076.50000064.500000nan68.00000076.500000141210
935f531005075.00000074.500000nan60.50000065.00000071.00000065.0000001487
\n" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head(10).style.bar(\n", " subset=[\"percent correct\"], vmin=0, vmax=100\n", ").apply(\n", " highlight_high_scoring_females, axis=1\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In practice, I almost never use these features because it is almost always better to display results as a plot rather than in tabular form. Still, it can be useful when exploring data sets to highlight certain aspects when exploring data sets in tabular form." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Computing environment" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python implementation: CPython\n", "Python version : 3.11.5\n", "IPython version : 8.15.0\n", "\n", "pandas : 2.0.3\n", "jupyterlab: 4.0.6\n", "\n" ] } ], "source": [ "%load_ext watermark\n", "%watermark -v -p pandas,jupyterlab" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 4 }