{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "In this demo, we will be analyzing the [IBM HR Employee Attrition dataset](https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset) to understand how employee characteristics might influence attrition. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lux is designed to be tightly integrated with Pandas and can be used as-is, without modifying your existing Pandas code. \n", "\n", "To enable Lux, simply add `import lux` along with your Pandas import statement." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import lux" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " \n", "

Visualizations of dataframes beyond simple tables

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lux preserves Pandas dataframe semantics, which means that you can apply any command from the Pandas API to the dataframes in Lux and expect the same behavior. For example, we can load the dataset via the standard Pandas `read_csv` command." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(\"https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/employee.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get an overview of the dataframe, simply print out the dataframe `df`. By clicking on the Toggle button, you can now explore the data visually through Lux. You should see three tabs of visualizations recommended to you. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The visualizations are displayed in different tabs as [actions](https://lux-api.readthedocs.io/en/latest/source/getting_started/overview.html#visualizing-dataframes-with-recommendations).\n", "By inspecting the Correlation action, we see several visualizations with a sharp triangular shape. This matches our intuition that total working years cannot exceed an employee's age, and that no employee can have stayed at a company longer than their total working years. " ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " \n", "

Steering analysis with intent

\n", "
\n", "\n", "Let's say that we want to investigate factors that influence employee attrition. Beyond these basic recommendations, you can further specify your analysis *intent*, i.e., the data attributes and values that you are interested in visualizing. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.intent=[\"Attrition\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Upon printing the dataframe again, Lux leverages the analysis intent to steer the recommendations towards what the user might be interested in." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On the left, the visualization based on the specified intent shows that around 15% of employees leave the company. On the right, in the Enhance action, we learn that employees who leave have typically spent about four fewer years, both in their working life and at the company, than their counterparts. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "ℹ️ Check out [this tutorial](https://lux-api.readthedocs.io/en/latest/source/guide/intent.html#) to learn more about how to specify intent in Lux." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " \n", "

Quick inspection of 1-D Series

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Based on this insight, we want to derive a new column that captures what fraction of each employee's working years has been spent at this company. We quickly divide the two columns in order to inspect the Series visualization. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df[\"YearsAtCompany\"]/df[\"TotalWorkingYears\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We print out the result of dividing the two columns. The Series visualization enables us to quickly verify that the values lie between 0 and 1. Somewhat surprisingly, we also find a large group of employees who have spent almost all of their working life at this company. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to learn more, so we create a new column `%WorkingYearsAtCompany` to capture this metric." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df[\"%WorkingYearsAtCompany\"]=df[\"YearsAtCompany\"]/df[\"TotalWorkingYears\"]" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " \n", "

Quick-and-dirty experimentation with visualizations

\n", "
\n", "\n", "\n", "Lux is built on the principle that users should always be able to visualize and explore anything they specify, without having to think about what the visualization should look like.\n", "\n", "Continuing our analysis, we are interested in seeing whether the percentage of working years spent at the company differs between younger and older employees. To investigate this hypothesis, we generate a Vis object showing the relationship between `Age` and `%WorkingYearsAtCompany`. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "from lux.vis.Vis import Vis\n", "Vis([\"%WorkingYearsAtCompany\",\"Age\"],df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The visualization does not show a clear trend. To look into this more, we can binarize the `Age` variable based on whether the employee is above or below the average age across employees. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df[\"Age\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df[\"IsOld\"]=df[\"Age\"]>df[\"Age\"].mean()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df[\"IsOld\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we again look at whether younger or older employees tend to stay at the company for longer. The visualization generated by Lux shows that older employees have actually spent a smaller percentage of their working years at the company. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Vis([\"%WorkingYearsAtCompany\",\"IsOld\"],df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are also interested in whether the salary of old and young employees differs significantly. 
We take a look at the columns in the dataframe and see that there is a group of columns that contain the word `Rate`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.columns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "compensation = list(filter(lambda col: 'Rate' in col, df.columns))\n", "compensation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then plot these compensation-related attributes against the `IsOld` variable and find that the differences in compensation between older and younger employees are not significant. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from lux.vis.VisList import VisList" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "VisList([compensation,\"IsOld\"],df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The programmatic generation of Vis and VisList provides a quick-and-dirty way to ask specific questions about the dataframe." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "ℹ️ Check out [this tutorial](https://lux-api.readthedocs.io/en/latest/source/guide/vis.html) to learn more about how to create Vis and VisList in Lux." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " \n", "

Exporting visualization insight to edit and share

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By skimming through the recommended visualizations again, we find that distance from home to work is an important factor in employee attrition." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Scrolling through the recommended visualizations, we notice that employees who live farther away have higher rates of attrition. We click on the bar chart visualization of `DistanceFromHome` vs. `Attrition` and then on the export button in the top-right corner to export this visualization.\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The exported visualization is then available through `df.exported`, and we can print it out as Altair code." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vis = df.exported[0]\n", "print(vis.to_code(language=\"altair\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can then make minor stylistic changes before adding this visualization to a slide deck to share with colleagues." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import altair as alt\n", "visData = pd.DataFrame({'Attrition': {0: 'No', 1: 'Yes'}, \n", " 'DistanceFromHome': {0: 8.915652879156529, 1: 10.632911392405063}})\n", "\n", "chart = alt.Chart(visData,title=\"Important Factor for Employee Attrition!!\").mark_bar().encode(\n", " y = alt.Y('Attrition', type= 'nominal', axis=alt.Axis(labelOverlap=True, title='Attrition')),\n", " x = alt.X('DistanceFromHome', type= 'quantitative', axis=alt.Axis(title='Average Distance from Home')),\n", ")\n", "chart = chart.configure_mark(tooltip=alt.TooltipContent('encoding'))\n", "chart = chart.configure_title(fontWeight=500,fontSize=13,font='Helvetica Neue')\n", "chart = chart.configure_axis(titleFontWeight=500,titleFontSize=11,titleFont='Helvetica Neue',\n", " labelFontWeight=400,labelFontSize=8,labelFont='Helvetica Neue',labelColor='#505050')\n", "chart = chart.configure_legend(titleFontWeight=500,titleFontSize=10,titleFont='Helvetica Neue',\n", " labelFontWeight=400,labelFontSize=8,labelFont='Helvetica Neue')\n", "chart = chart.properties(width=250,height=70)\n", "\n", "chart" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "ℹ️ Check out [this tutorial](https://lux-api.readthedocs.io/en/latest/source/guide/export.html) to learn more about exporting visualizations in Lux.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Try out Lux! " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get started, Lux can be installed through [PyPI](https://pypi.org/project/lux-api/). 
\n", "\n", "```bash\n", "pip install lux-api\n", "``` \n", "\n", "\n", "To use Lux in [Jupyter notebook](https://github.com/jupyter/notebook) or [VSCode](https://code.visualstudio.com/docs/python/jupyter-support), activate the notebook extension:\n", "\n", "```bash\n", "jupyter nbextension install --py luxwidget\n", "jupyter nbextension enable --py luxwidget\n", "```\n", "\n", "To use Lux in [Jupyter Lab](https://github.com/jupyterlab/jupyterlab), activate the lab extension:\n", "\n", "```bash\n", "jupyter labextension install @jupyter-widgets/jupyterlab-manager\n", "jupyter labextension install luxwidget\n", "```\n", "\n", "If you encounter issues with the installation, please refer to [this page](https://lux-api.readthedocs.io/en/latest/source/guide/FAQ.html#troubleshooting-tips) to troubleshoot the installation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### More information: \n", "\n", "- Follow us on [Twitter](https://twitter.com/lux_api) for discussion and updates.\n", "- Sign up for the early-user [mailing list](https://forms.gle/XKv3ejrshkCi3FJE6) to stay tuned for upcoming releases, updates, or user studies. \n", "- Visit [Read the Docs](https://lux-api.readthedocs.io/en/latest/) for more detailed documentation.\n", "- Check out our [notebook gallery](https://lux-api.readthedocs.io/en/latest/source/reference/gallery.html) for links to more demos, tutorials, and exercises on how to use Lux.\n", "- Report any bugs, issues, or requests through [GitHub Issues](https://github.com/lux-org/lux/issues). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Icons made by Freepik from www.flaticon.com
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 4 }