{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "In this demo, we explore how world development indicators relate to a country’s early COVID-19 response efforts." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
# Visualizations of dataframes beyond simple tables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To enable Lux, simply add `import lux` along with your Pandas import statement." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import lux\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lux preserves Pandas dataframe semantics, which means that you can apply any command from the Pandas API to dataframes in Lux and expect the same behavior. For example, we can load the [Happy Planet Index (HPI)](http://happyplanetindex.org/) dataset via the standard Pandas `read_csv` command." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(\"https://github.com/lux-org/lux-datasets/blob/master/data/hpi_full.csv?raw=True\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can quickly get an overview of the dataframe simply by printing out `df`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the Pandas table view, we see that the dataframe contains country-level data on sustainability and well-being.\n", "By clicking on the Toggle button, you can explore the data visually through Lux. You should see several tabs of recommended visualizations, including scatterplots, bar charts, and maps. Lux recommends visualizations that may be relevant or interesting to you across different [actions](https://lux-api.readthedocs.io/en/latest/source/getting_started/overview.html#visualizing-dataframes-with-recommendations), which are displayed as different tabs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
# Data Manipulation + Vis without changing a line of your Pandas code" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lux is designed to be tightly integrated with Pandas and can be used as-is, without modifying your existing Pandas code. This means that you can seamlessly transition from data cleaning and transformation to visualizing your dataframes with no extra effort. This section demonstrates how Lux can help you visualize your dataframe in a realistic scenario that involves extensive data cleaning and transformation.\n", "\n", "__Note:__ If you're short on time, you can quickly execute these cells and skip to the next section." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Add a feature column indicating whether the country is one of the G10 nations\n", "df[\"G10\"] = df[\"Country\"].isin([\"Belgium\",\"Canada\",\"France\",\"Germany\",\"Italy\",\"Japan\",\"Netherlands\",\"United Kingdom\",\"Switzerland\",\"Sweden\",\"United States of America\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We drop the inequality-adjusted measures since they are clearly correlated with each other. We also drop HPI Rank and keep only the Happy Planet Index." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = df[df.columns.drop(list(df.filter(regex='IneqAdj'))+[\"HPIRank\"])]\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After dropping these columns, the correlations are a bit more realistic." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `Country` column needs to be mapped to a code that is easier to work with later on, so we load the [countries dataset](https://github.com/mledoze/countries), which contains the ISO-3 country code along with information such as currency, language, and geography." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "countries = pd.read_csv(\"https://github.com/lux-org/lux-datasets/blob/master/data/countries.csv?raw=True\")\n", "# Keep only the short name (the text before the first comma) as the country name\n", "countries[\"Country\"]=countries[\"name\"].apply(lambda x:x.split(\",\")[0])\n", "countries.loc[countries[\"Country\"]=='United States',\"Country\"] = 'United States of America'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "countries" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The countries dataset has some additional feature columns that we can add in\n", "countries[\"landlocked\"] = countries[\"landlocked\"].fillna(\"False\").replace(1,\"True\")\n", "countries[\"NumOfficialLanguages\"]=countries.languages.str.count(\",\")+1\n", "countries[\"NumBorderingCountries\"]=countries.borders.str.count(\",\")+1\n", "countries[\"NumBorderingCountries\"]=countries[\"NumBorderingCountries\"].fillna(0)\n", "countries = countries[['Country','cca3', 'landlocked', \"NumOfficialLanguages\", \"NumBorderingCountries\",'area']]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Combine with the HPI data to get the ISO-3 code\n", "df = df.merge(countries)\n", "df = df.rename(index=str, columns={\"SubRegion\":\"Region\",\"subregion\":\"SubRegion\"})\n", "df[\"Region\"] = df[\"Region\"].replace(\"Middle East and North Africa\",\"Middle East\")\n", "df[\"area\"] = df[\"area\"].astype(int)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Ensure well-formatted country names based on: https://github.com/deactivated/python-iso3166/blob/master/iso3166/__init__.py\n", "df.loc[df.Country==\"Russia\",\"Country\"]=\"Russian Federation\"\n", "df.loc[df[\"Country\"]==\"Czech Republic\",\"Country\"]=\"Czechia\"\n", "df.loc[df.Country==\"DR Congo\",\"Country\"]=\"Congo, Democratic Republic of the\"  # not working?\n", "df.loc[df.Country==\"Bolivia\",\"Country\"]=\"Bolivia, Plurinational State of\"\n", "df.loc[df[\"Country\"]==\"Cote d'Ivoire\",\"Country\"]=\"Côte d'Ivoire\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After all this data cleaning, we print out the combined dataframe again to look at the visualizations and patterns in the dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By inspecting the `Correlation` tab, we learn that there is a negative correlation between `AvrgLifeExpectancy` and `Inequality`. In other words, countries with higher levels of inequality tend to have a lower average life expectancy. We can also look at other tabs, which show the Distribution of quantitative attributes and the Occurrence of categorical attributes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
# Steering analysis with intent\n", "\n", "Let's say that we want to investigate whether any country-level characteristics explain the observed negative correlation between inequality and life expectancy. Beyond the basic recommendations, you can further specify your analysis *intent*, i.e., the data attributes and values that you are interested in visualizing." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can do this by specifying our analysis intent to Lux via `df.intent`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.intent = [\"Inequality\",\"AvrgLifeExpectancy\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Upon printing the dataframe again, Lux leverages the analysis intent to steer the recommendations towards what the user might be interested in." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By looking at the colored scatterplots in the `Enhance` tab, we find that most G10 industrialized countries fall in the upper-left quadrant of the scatterplot (low inequality, high life expectancy). In the breakdown by Region, we observe that countries in Sub-Saharan Africa (yellow points) tend to be in the bottom right, with lower life expectancy and higher inequality." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "ℹ️ Check out [this tutorial](https://lux-api.readthedocs.io/en/latest/source/guide/intent.html#) to learn more about how to specify intent in Lux." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
# Quick inspection of 1-D Series" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Continuing our analysis, we are now interested in how these country-level metrics relate to a country's COVID intervention strategy and response. We download the [COVID pandemic policy dataset](https://ourworldindata.org/grapher/covid-stringency-index)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "covid = pd.read_csv(\"https://github.com/lux-org/lux-datasets/blob/master/data/covid-stringency.csv?raw=True\")\n", "covid = covid.rename(columns={\"stringency_index\":\"stringency\"})\n", "covid['Day'] = pd.to_datetime(covid['Day'], format=\"%Y-%m-%d\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we print out the column `Day`, we see that the records track stringency daily from January 2020 to March 2021. We also see the temporal patterns across year, month, and day of the week." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "covid[\"Day\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
# Quick-and-dirty experimentation with visualizations\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our COVID dataset contains a column `stringency`, which is a number from 0 to 100, with 100 being the strictest level of response (e.g., travel bans, stay-at-home orders, and school closures)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to plot the distribution of `stringency` by creating a `Vis` object in Lux.\n", "To generate a `Vis`, specify your intent (i.e., which columns/values you want to plot, in this case `['stringency']`) and a source dataframe (in this case `covid`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from lux.vis.Vis import Vis\n", "Vis([\"stringency\"],covid)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we display this `Vis`, we see that stringency values are concentrated at medium to high levels, with the distribution peaking at a stringency of 60-80." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are only interested in the records on March 11, 2020, the day the WHO declared COVID-19 a pandemic. By filtering to the records on this day only, the stringency score becomes a proxy that measures the strictness of the country's **early** intervention efforts." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "early = covid[covid[\"Day\"]==\"2020-03-11\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Vis([\"stringency\"],early)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Somewhat interestingly, we see that on this early date, the stringency is heavily right-skewed, suggesting that most countries didn't enact strict measures in the early days of the pandemic." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lux is built on the principle that users should always be able to visualize and explore anything they specify, without having to think about what the visualization should look like. The programmatic generation of `Vis` provides a quick and dirty way to ask specific questions about the dataframe without having to write a lot of code." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "ℹ️ In addition to `Vis`, you can also specify a list of visualizations to browse through via a `VisList`. Check out [this tutorial](https://lux-api.readthedocs.io/en/latest/source/guide/vis.html) to learn more about how to create `Vis` and `VisList` in Lux." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 🤔 Some interesting findings" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now join the countries dataframe `df` with the filtered `early` COVID dataframe:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = early.merge(df,left_on=[\"Entity\",\"Code\"],right_on=[\"Country\",\"cca3\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result.intent = [\"stringency\"]\n", "result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we set the intent as `stringency`, we see that China and Italy have the strictest measures (corresponding to dark blue on the geo map, among a sea of light yellow and green)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to discern these country-level differences further, so we bin the stringency index into a categorical variable `stringency_level`. We use [pd.qcut](https://pandas.pydata.org/docs/reference/api/pandas.qcut.html) to ensure that there is a roughly equal number of records in the `Low` and `High` bins." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result[\"stringency_level\"] = pd.qcut(result[\"stringency\"],2,labels=[\"Low\",\"High\"])\n", "result = result.drop(columns=[\"stringency\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With the modified dataframe, we revisit the negative correlation that we observed previously by setting the intent as average life expectancy and inequality again. The result is similar to what we saw before, with one visualization showing the breakdown by `stringency_level`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result.intent = [\"Inequality\",\"AvrgLifeExpectancy\"]\n", "result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see a strong separation: stricter countries (blue) tend to be countries with higher life expectancy and lower levels of inequality. This suggests that these countries may have better-developed public health infrastructure that enabled an early pandemic response. However, we observe three outliers that seem to defy this trend." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we filter to these records, we find that these countries are [Afghanistan](https://www.who.int/news-room/feature-stories/detail/afghanistan-who-mission-reviews-covid-19-response), [Pakistan](https://www.who.int/news-room/feature-stories/detail/covid-19-in-pakistan-who-fighting-tirelessly-against-the-odds), and [Rwanda](https://www.npr.org/sections/goatsandsoda/2020/07/15/889802561/a-covid-19-success-story-in-rwanda-free-testing-robot-caregivers), all countries that were praised for their early pandemic response despite limited resources." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result[(result[\"Inequality\"]>0.35)&(result[\"stringency_level\"]==\"High\")]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
# Exporting visualization insight to edit and share" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To export this visualization insight and share it with others, we can click on the visualization in the Lux view and then click the export button." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This exports the visualization from the widget to a `Vis` object. We can access the exported `Vis` object via the `exported` property and print it as code." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result.exported" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(result.exported[0].to_code(\"altair\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can copy and paste the output Altair code into a separate cell, then tweak the plotting code a bit before sharing this insight with our colleagues." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import altair as alt\n", "\n", "c = \"#e7298a\"\n", "chart = alt.Chart(result,title=\"Check out this cool insight!\").mark_circle().encode(\n", " x=alt.X('Inequality',scale=alt.Scale(domain=(0.04, 0.51)),type='quantitative', axis=alt.Axis(title='Inequality')),\n", " y=alt.Y('AvrgLifeExpectancy',scale=alt.Scale(domain=(48.9, 83.6)),type='quantitative', axis=alt.Axis(title='AvrgLifeExpectancy'))\n", ")\n", "highlight = result[(result[\"Inequality\"]>0.35)&(result[\"stringency_level\"]==\"High\")]\n", "\n", "hchart = alt.Chart(highlight).mark_point(color=c,size=50,shape=\"cross\").encode(\n", " x=alt.X('Inequality',scale=alt.Scale(domain=(0.04, 0.51)),type='quantitative', axis=alt.Axis(title='Inequality')),\n", " y=alt.Y('AvrgLifeExpectancy',scale=alt.Scale(domain=(48.9, 83.6)),type='quantitative', axis=alt.Axis(title='AvrgLifeExpectancy')),\n", ")\n", "\n", "text = alt.Chart(highlight).mark_text(color=c,dx=-35,dy=0,fontWeight=800).encode(\n", " x=alt.X('Inequality',scale=alt.Scale(domain=(0.04, 0.51)),type='quantitative', axis=alt.Axis(title='Inequality')),\n", " y=alt.Y('AvrgLifeExpectancy',scale=alt.Scale(domain=(48.9, 83.6)),type='quantitative', axis=alt.Axis(title='AvrgLifeExpectancy')),\n", " text=alt.Text('Country')\n", ")\n", "\n", "chart = chart.encode(color=alt.Color('stringency_level',type='nominal'))\n", "chart = chart.properties(width=160,height=150)\n", "\n", "(chart + hchart + text).configure_title(color=c)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "ℹ️ Check out [this tutorial](https://lux-api.readthedocs.io/en/latest/source/guide/export.html) to learn more about exporting visualizations in Lux.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Try out Lux! " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get started, Lux can be installed through [PyPI](https://pypi.org/project/lux-api/). 
\n", "\n", "```bash\n", "pip install lux-api\n", "```\n", "\n", "\n", "To use Lux in [Jupyter notebook](https://github.com/jupyter/notebook) or [VSCode](https://code.visualstudio.com/docs/python/jupyter-support), activate the notebook extension:\n", "\n", "```bash\n", "jupyter nbextension install --py luxwidget\n", "jupyter nbextension enable --py luxwidget\n", "```\n", "\n", "To use Lux in [Jupyter Lab](https://github.com/jupyterlab/jupyterlab), activate the lab extension:\n", "\n", "```bash\n", "jupyter labextension install @jupyter-widgets/jupyterlab-manager\n", "jupyter labextension install luxwidget\n", "```\n", "\n", "If you encounter issues with the installation, please refer to [this page](https://lux-api.readthedocs.io/en/latest/source/guide/FAQ.html#troubleshooting-tips) for troubleshooting tips." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### More information: \n", "\n", "- Follow us on [Twitter](https://twitter.com/lux_api) for discussion and updates.\n", "- Sign up for the early-user [mailing list](https://forms.gle/XKv3ejrshkCi3FJE6) to stay tuned for upcoming releases, updates, or user studies. \n", "- Visit [Read the Docs](https://lux-api.readthedocs.io/en/latest/) for more detailed documentation.\n", "- Check out our [notebook gallery](https://lux-api.readthedocs.io/en/latest/source/reference/gallery.html) for links to more demos, tutorials, and exercises on how to use Lux.\n", "- Report any bugs, issues, or requests through [GitHub Issues](https://github.com/lux-org/lux/issues). \n", "\n", "
Icons made by Freepik from www.flaticon.com
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 4 }