{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Overview of Lux" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lux is designed to be tightly integrated with Pandas and can be used as-is, without modifying your existing Pandas code. To enable Lux, simply add `import lux` along with your Pandas import statement." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import lux" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Lux preserves the Pandas dataframe semantics -- which means that you can apply any command from Pandas's API to the dataframes in Lux and expect the same behavior. For example, we can load the dataset via standard Pandas `read_*` commands." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(\"../data/college.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lux is built on the philosophy that generating useful visualizations should be as simple as printing out a dataframe. \n", "When you print out the dataframe in the notebook, you should see the default Pandas table display with an additional Toggle button. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By clicking on the Toggle button, you can now explore the data visually through Lux. You should see three tabs of visualizations recommended to you. Voila! You have generated your first set of recommendations through Lux!\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we will describe the details of how these recommendations are generated." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualizing Dataframes with _Recommendations_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recommendations highlight interesting patterns and trends in your dataframe. Lux offers different types of recommendations, known as _analytical actions_. These analytical actions represent different analyses that can be performed on the data. Lux recommends a set of actions depending on the content of your dataframe and your analysis goals and interests (described later). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As shown in the example above, Lux displays three types of actions by default, each in its own tab: \n", "\n", "- **Correlation** displays relationships between two quantitative variables as scatterplots, ranked from most to least correlated.\n", "\n", "- **Distribution** displays histograms of the quantitative attributes in the dataframe, ranked from most to least skewed.\n", "\n", "- **Occurrence** displays bar charts of the categorical attributes in the dataframe, ranked from most to least uneven.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Seamless Integration with Pandas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that ACTMedian and SATAverage are very strongly correlated. This means that we could probably keep just one of the two columns and still retain about the same information. So let's drop the ACTMedian column." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "df = df.drop(columns=[\"ACTMedian\"])\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how the recommendations are regenerated based on the updated dataframe. 
This means that we can move seamlessly from transforming data to visualizing our dataframes, without importing data into or exporting it from a separate visualization tool." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Steering Recommendations via User Intent" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We saw an example of how recommendations can be generated for the dataframe without providing additional information.\n", "Beyond these basic recommendations, you can further specify your analysis *intent*, i.e., the data attributes and values that you are interested in visualizing. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, let's say that you are interested in learning more about the median earnings of students after they attend college. You can set your intent in Lux to indicate that you are interested in the attribute `MedianEarnings`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.intent = [\"MedianEarnings\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you print out the dataframe again, you should see three different tabs of visualizations recommended to you. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the displayed widget, the visualization on the left represents the intent that you have specified. \n", "On the right, you see the gallery of visualizations recommended based on the specified intent.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given the current intent represented as C = {MedianEarnings}, additional actions (**Enhance** and **Filter**) are generated. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- **Enhance** adds an attribute to the intended visualization. 
Enhance lets users compare the effect of the added variable on the intended visualization. For example, Enhance displays visualizations involving C' = {MedianEarnings, **added attribute**}, including:\n", "\n", " - {MedianEarnings, **Expenditure**}\n", " - {MedianEarnings, **AverageCost**}\n", " - {MedianEarnings, **AverageFacultySalary**}." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- **Filter** adds a filter to the intended visualization. Filter lets users browse through what the intended visualization looks like for different subsets of the data. For example, Filter displays visualizations involving C' = {MedianEarnings, **added filter**}, including: \n", "\n", " - {MedianEarnings, **FundingModel=Public**}\n", " - {MedianEarnings, **Region=Southeast**}\n", " - {MedianEarnings, **Region=Great Lakes**}." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lux supports a variety of analysis intents, such as specifying values of interest or encoding preferences. Refer to [this page](https://lux-api.readthedocs.io/en/latest/source/guide/intent.html) to learn more." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 4 }