{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Practical Data Visualization with Python (Homework - Participant)\n", "\n", "## Homework Overview\n", "\n", "Thanks for checking out the hands-on reinforcement exercises for this seminar. The goal of this homework is to provide you with a handful of questions requiring visualization that you might conceivably face on the job. There is not one \"right\" answer for the questions below, but some answers are *more* right than others. For example, if you were asked to visualize the trends in [LTV](https://www.investopedia.com/terms/l/loantovalue.asp) over the course of a year, would plotting average LTV over time be a better visualization than building twelve violin plots of LTV--one for each month? Not necessarily. But would both of those be better than a single box-and-whisker plot of LTV for all originations in that year? Absolutely. It all depends on the context of the question and the information you intend to convey with your visualization. \n", "\n", "When in doubt, ask yourself: **am I clearly and powerfully communicating the relevant information with this visualization?**\n", "\n", "- With each of these questions below, you will be asked to do two things:\n", " 1. Construct a visualization to answer the question.\n", " - You'll be pre-allotted one code cell in the notebook for this, but feel free to use as many as you'd like. As was shown in the lecture materials, a good visualization almost always requires iteration. Feel free to keep the remnants of your iterative creative process in your notebooks; just ensure your final viz. for each question is clearly marked. \n", " 2. Briefly explain (in no more than a paragraph) why you chose to visualize the data as you did. \n", " - You'll be pre-allotted one markdown cell in the notebook for this, directly following the code cell. 
If you are struggling to think of what to write, fall back on the lecture materials, particularly Section 1: Why We Visualize. Imagine that each of your visualizations was going to be presented to your team at a Process Confirm / Code Review; your paragraph should read like the explanation you would give in that context, detailing why your choices made for the most effective viz. Be sure to focus on how your visualization answers the question at hand, the crux of which is **in bold**, although the entire question provides relevant information as to what is expected.\n", " \n", "We'll be using the [same data](https://nbviewer.jupyter.org/github/pmaji/practical-python-data-viz-guide/blob/master/notebooks/data_prep_nb.ipynb) we've been dealing with throughout the seminar: January and December 2017 FNMA originations. Remember, if you don't understand what some of the variables mean, all the information you need is in `data_prep_nb.ipynb`, including links to relevant glossaries and data dictionaries. \n", "\n", "**Note**: For all questions below, you are free to use whatever Python visualization package you want. That said, some questions require a specific type of visualization (for example, if you know that you need an interactive visualization, don't start by using a package that you know cannot build interactive visualizations). \n", "\n", "Good luck!" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# basic packages\n", "import numpy as np\n", "import pandas as pd\n", "import datetime" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'This notebook was last executed on 2019-09-08 20:42'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# store the datetime of the most recent running of this notebook as a form of a log\n", "most_recent_run_datetime = datetime.datetime.now().strftime(\"%Y-%m-%d %H:%M\")\n", "f\"This notebook was last executed on {most_recent_run_datetime}\"" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | loan_id | orig_chn | seller_name | orig_rt | orig_amt | orig_trm | orig_dte | frst_dte | oltv | ocltv | ... | occ_stat | state | zip_3 | mi_pct | product_type | cscore_c | mi_type | relocation_flg | cscore_min | orig_val |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 0 | 100020736692 | B | CALIBER HOME LOANS, INC. | 4.875 | 492000 | 360 | 12/2017 | 02/2018 | 75 | 75 | ... | I | CA | 920 | NaN | FRM | NaN | NaN | N | 757.0 | 656000.000000 |\n",
"| 1 | 100036136334 | R | OTHER | 2.750 | 190000 | 180 | 12/2017 | 01/2018 | 67 | 67 | ... | P | MD | 206 | NaN | FRM | 798.0 | NaN | N | 797.0 | 283582.089552 |\n",
"| 2 | 100043912941 | R | OTHER | 4.125 | 68000 | 360 | 12/2017 | 02/2018 | 66 | 66 | ... | P | OH | 432 | NaN | FRM | NaN | NaN | N | 804.0 | 103030.303030 |\n",
"| 3 | 100057175226 | R | OTHER | 4.990 | 71000 | 360 | 12/2017 | 02/2018 | 95 | 95 | ... | P | NC | 278 | 30.0 | FRM | NaN | 1.0 | N | 696.0 | 74736.842105 |\n",
"| 4 | 100060715643 | R | OTHER | 4.500 | 180000 | 360 | 12/2017 | 02/2018 | 75 | 75 | ... | I | WA | 983 | NaN | FRM | NaN | NaN | N | 726.0 | 240000.000000 |\n",
"\n",
"5 rows × 27 columns
\n", "