{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Python Data Viz Libraries Compared: 8 Popular Graphs Made with pandas, matplotlib, seaborn, and plotly.express\n", "\n", "Author: [Dylan Castillo](https://twitter.com/_dylancastillo)\n", "\n", "I'm teaching a course about the essential tools of Data Science. Among those, I'm going to cover how to use some of the most popular data visualization libraries in Python: `pandas` (yes, that's not a typo!), `matplotlib`, `seaborn`, and `plotly.express`.\n", "\n", "I thought it would be useful for my students to have a cheat sheet with some popular graphs made with each of these tools, so I wrote one.\n", "\n", "In the next sections, you'll learn how to set up your local environment, read the data, and get the code to make the following types of graphs:\n", "\n", "- Line plot\n", "- Grouped bars plot\n", "- Stacked bars plot\n", "- Area chart\n", "- Pie/Donut chart\n", "- Histogram\n", "- Scatter plot\n", "- Boxplot\n", "\n", "Let me know what you think!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set Up a Virtual Environment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Working with virtual environments will save you lots of headaches when working on Python projects. So, you'll start by creating one and installing the required libraries." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you're using `venv`, here's how you set up your local environment:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```\n", "$ python3 -m venv .dataviz\n", "$ source .dataviz/bin/activate\n", "(.dataviz) $ python3 -m pip install pandas==1.2.4 numpy==1.19.2 matplotlib==3.4.2 plotly==4.14.3 seaborn==0.11.1 notebook==6.4.0\n", "(.dataviz) $ jupyter notebook\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you're using `conda`, run these commands instead:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```\n", "$ conda create --name .dataviz\n", "$ conda activate .dataviz\n", "(.dataviz) $ conda install pandas==1.2.4 numpy==1.19.2 matplotlib==3.4.2 plotly==4.14.3 seaborn==0.11.1 notebook==6.4.0 -y\n", "(.dataviz) $ jupyter notebook\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's it! These commands will:\n", "\n", "1. Create a virtual environment called `.dataviz`\n", "2. Activate the virtual environment\n", "3. Install the required packages (`pandas`, `numpy`, `matplotlib`, `plotly`, `seaborn`, and `notebook`)\n", "4. Start a Jupyter Notebook\n", "\n", "Note that if you only plan to use one of the data visualization libraries, you don't need to install all of them. For example, if you want to use `plotly.express`, you can skip `matplotlib` and `seaborn`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Start Jupyter Notebook and Import Libraries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The last command opens *Jupyter Notebook* in your browser. Create a new notebook by clicking on *New > Python 3* in the menu. You should now have an empty notebook in front of you. Let's get to the fun part!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, you'll need to import the required libraries. 
Create a new cell in your notebook and paste the following code:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# All\n", "import pandas as pd\n", "import numpy as np\n", "\n", "# matplotlib\n", "import matplotlib.ticker as mtick\n", "import matplotlib.pyplot as plt\n", "\n", "# plotly\n", "import plotly.io as pio\n", "import plotly.express as px\n", "\n", "# seaborn\n", "import seaborn as sns\n", "\n", "# Set templates\n", "pio.templates.default = \"seaborn\"\n", "plt.style.use(\"seaborn\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This code imports the required libraries and sets the themes for `matplotlib` and `plotly`. Each library serves a specific purpose:\n", " \n", "- `pandas` helps you read the data\n", "- `matplotlib.pyplot`, `plotly.express`, and `seaborn` help you make the graphs\n", "- `matplotlib.ticker` lets you customize the ticks (their positions and label formats) on the axes of your `matplotlib` graphs\n", "- `plotly.io` makes it easy to set a default theme for your `plotly` graphs\n", "\n", "In **lines 17 and 18**, you set the default themes for `plotly` and `matplotlib`, in both cases to the `seaborn` theme. Because `seaborn` draws its graphs with `matplotlib`, it picks up the same style, so the graphs from all the libraries look similar." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Understand the Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Throughout this tutorial, you'll use a [dataset](https://github.com/szrlee/Stock-Time-Series-Analysis) with stock market data for 29 companies compiled by [ichardddddd](https://github.com/szrlee). 
It has the following columns:\n", "\n", "- **Date**: Date of the observation\n", "- **Open**: Price (in USD) at market open on the specified date\n", "- **High**: Highest price (in USD) reached on the specified date\n", "- **Low**: Lowest price (in USD) reached on the specified date\n", "- **Close**: Price (in USD) at market close on the specified date\n", "- **Volume**: Number of shares traded\n", "- **Name**: Stock symbol of the company\n", "\n", "You can take a look at the data by sampling a few rows:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | Date | \n", "Open | \n", "High | \n", "Low | \n", "Close | \n", "Volume | \n", "Name | \n", "
---|---|---|---|---|---|---|---|
25931 | \n", "2013-01-18 | \n", "52.24 | \n", "52.34 | \n", "51.81 | \n", "52.34 | \n", "8492176 | \n", "DIS | \n", "
53204 | \n", "2013-06-05 | \n", "98.13 | \n", "98.16 | \n", "96.12 | \n", "96.42 | \n", "5394802 | \n", "MCD | \n", "
39946 | \n", "2008-09-26 | \n", "117.21 | \n", "121.01 | \n", "117.01 | \n", "119.42 | \n", "4760683 | \n", "IBM | \n", "
37191 | \n", "2009-10-15 | \n", "27.28 | \n", "27.37 | \n", "27.05 | \n", "27.30 | \n", "13350145 | \n", "HD | \n", "
2877 | \n", "2017-06-08 | \n", "204.84 | \n", "206.03 | \n", "204.09 | \n", "205.94 | \n", "2451348 | \n", "MMM | \n", "