{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Visualization in Python: Altair" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "Altair is a declarative statistical visualization library for Python. It offers a powerful and concise visualization grammar that enables users to build a wide range of statistical visualizations quickly and simply.\n", "\n", "Here are some benefits of using Altair for visualization:\n", "\n", "- Charts can easily be made interactive\n", "\n", "- Every visualization generated by Altair can be downloaded as a PNG file by clicking the three dots at the upper right of the chart\n", "\n", "- The syntax is consistent and well structured, so features are easy to add\n", "\n", "(Note: the material here collects commonly used visualization methods from https://altair-viz.github.io/. If it does not cover your specific visualization problem, please visit the website for further reference.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installation\n", "\n", "If you are using pip: \n", "`! pip install altair vega_datasets`\n", "\n", "If you are using conda: \n", "`! conda install -c conda-forge altair vega_datasets`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading Package\n", "\n", "Altair is imported like any other package; the examples here use the alias `alt`." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import altair as alt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic Graph Format\n", "\n", "### Chart\n", "The fundamental object in Altair is the **Chart**, which takes a **dataframe** as a single argument.\n", "\n", "`alt.Chart(dataframe)`\n", "\n", "On its own, however, a chart will not draw anything, because we have not yet told it what to do with the data. We need to specify a mark to successfully draw the graph.\n", "\n", "### Marks\n", "The mark property specifies how the data is represented on the plot.\n", "\n", "`alt.Chart(dataframe).mark_point()`\n", "\n", "### Encodings\n", "Once we have the data and have decided how to represent it, we need to specify which columns of the dataframe map to which visual properties. That is, we set the x and y positions, size, color, etc. This is where we use encodings.\n", "\n", "`alt.Chart(dataframe).mark_point().encode()`\n", "\n", "**Now that we know the general structure of the command, we can explore each category in more depth.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Marks\n", "The mark property specifies how the data is represented on the plot. 
The following are the common mark types provided by Altair:\n", "\n", "(For the full list of marks, visit https://altair-viz.github.io/user_guide/marks.html)\n", "\n", "| Mark Type | Command | Description |\n", "| :-: | :-: | :-: |\n", "| **area** | `mark_area()` | A filled area |\n", "| **line** | `mark_line()` | A line plot |\n", "| **bar** | `mark_bar()` | A bar plot |\n", "| **point** | `mark_point()` | A scatter plot with hollow points |\n", "| **circle** | `mark_circle()` | A scatter plot with solid points |\n", "| **text** | `mark_text()` | A scatter plot with text as points |\n", "| **square** | `mark_square()` | A scatter plot with square points |\n", "| **rect** | `mark_rect()` | A heatmap |\n", "| **box plot** | `mark_boxplot()` | A box plot |\n", "\n", "#### Example 1. Scatter Plot of Acceleration vs. Horsepower in the Cars Dataset" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "
