{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Practical Data Visualization with Python (Homework - Participant)\n", "\n", "## Homework Overview\n", "\n", "Thanks for checking out the hands-on reinforcement exercises for this seminar. The goal of this homework is to give you a handful of questions you might conceivably face on the job, each of which calls for a visualization. There is not one \"right\" answer for the questions below, but some answers are *more* right than others. For example, if you were asked to visualize the trends in [LTV](https://www.investopedia.com/terms/l/loantovalue.asp) over the course of a year, would plotting average LTV over time be a better visualization than building twelve violin plots of LTV (one for each month)? Not necessarily. But would both of those be better than a single box-and-whisker plot of LTV for all originations in that year? Absolutely. It all depends on the context of the question and the information you intend to convey with your visualization.\n", "\n", "When in doubt, ask yourself: **am I clearly and powerfully communicating the relevant information with this visualization?**\n", "\n", "- For each of the questions below, you will be asked to do two things:\n", "    1. Construct a visualization to answer the question.\n", "        - You'll be pre-allotted one code cell in the notebook for this, but feel free to use as many as you'd like. As was shown in the lecture materials, a good visualization almost always requires iteration. You may keep the remnants of your iterative creative process in your notebook; just ensure your final viz. for each question is clearly marked.\n", "    2. Briefly explain (in no more than a paragraph) why you chose to visualize the data as you did.\n", "        - You'll be pre-allotted one markdown cell in the notebook for this, directly following the code cell. 
If you are struggling to think of what to write, fall back on the lecture materials, particularly Section 1: Why We Visualize. Imagine that each of your visualizations is going to be presented to your team at a Process Confirm / Code Review; your paragraph should read like the explanation you would give in that context, detailing why your choices make for the most effective viz. Be sure to focus on how your visualization answers the question at hand. The crux of each question is **in bold**, although the entire question provides relevant context on what is expected.\n", "\n", "We'll be using the [same data](https://nbviewer.jupyter.org/github/pmaji/practical-python-data-viz-guide/blob/master/notebooks/data_prep_nb.ipynb) we've been working with throughout the seminar: January and December 2017 FNMA originations. Remember, if you don't understand what some of the variables mean, all the information you need is in `data_prep_nb.ipynb`, including links to relevant glossaries and data dictionaries.\n", "\n", "**Note**: For all questions below, you are free to use whatever Python visualization package you want. That said, some questions require a specific type of visualization, so choose your package accordingly (for example, if you know that you need an interactive visualization, don't start with a package that cannot build interactive visualizations).\n", "\n", "Good luck!" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# basic packages\n", "import numpy as np\n", "import pandas as pd\n", "import datetime" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'This notebook was last executed on 2019-09-08 20:42'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# store the datetime of this notebook's most recent run as a simple execution log\n", "most_recent_run_datetime = datetime.datetime.now().strftime(\"%Y-%m-%d %H:%M\")\n", "f\"This notebook was last executed on {most_recent_run_datetime}\"" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr><th></th><th>loan_id</th><th>orig_chn</th><th>seller_name</th><th>orig_rt</th><th>orig_amt</th><th>orig_trm</th><th>orig_dte</th><th>frst_dte</th><th>oltv</th><th>ocltv</th><th>...</th><th>occ_stat</th><th>state</th><th>zip_3</th><th>mi_pct</th><th>product_type</th><th>cscore_c</th><th>mi_type</th><th>relocation_flg</th><th>cscore_min</th><th>orig_val</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>100020736692</td><td>B</td><td>CALIBER HOME LOANS, INC.</td><td>4.875</td><td>492000</td><td>360</td><td>12/2017</td><td>02/2018</td><td>75</td><td>75</td><td>...</td><td>I</td><td>CA</td><td>920</td><td>NaN</td><td>FRM</td><td>NaN</td><td>NaN</td><td>N</td><td>757.0</td><td>656000.000000</td></tr>\n",
"    <tr><th>1</th><td>100036136334</td><td>R</td><td>OTHER</td><td>2.750</td><td>190000</td><td>180</td><td>12/2017</td><td>01/2018</td><td>67</td><td>67</td><td>...</td><td>P</td><td>MD</td><td>206</td><td>NaN</td><td>FRM</td><td>798.0</td><td>NaN</td><td>N</td><td>797.0</td><td>283582.089552</td></tr>\n",
"    <tr><th>2</th><td>100043912941</td><td>R</td><td>OTHER</td><td>4.125</td><td>68000</td><td>360</td><td>12/2017</td><td>02/2018</td><td>66</td><td>66</td><td>...</td><td>P</td><td>OH</td><td>432</td><td>NaN</td><td>FRM</td><td>NaN</td><td>NaN</td><td>N</td><td>804.0</td><td>103030.303030</td></tr>\n",
"    <tr><th>3</th><td>100057175226</td><td>R</td><td>OTHER</td><td>4.990</td><td>71000</td><td>360</td><td>12/2017</td><td>02/2018</td><td>95</td><td>95</td><td>...</td><td>P</td><td>NC</td><td>278</td><td>30.0</td><td>FRM</td><td>NaN</td><td>1.0</td><td>N</td><td>696.0</td><td>74736.842105</td></tr>\n",
"    <tr><th>4</th><td>100060715643</td><td>R</td><td>OTHER</td><td>4.500</td><td>180000</td><td>360</td><td>12/2017</td><td>02/2018</td><td>75</td><td>75</td><td>...</td><td>I</td><td>WA</td><td>983</td><td>NaN</td><td>FRM</td><td>NaN</td><td>NaN</td><td>N</td><td>726.0</td><td>240000.000000</td></tr>\n",
"  </tbody>\n", "</table>\n", "<p>5 rows × 27 columns</p>\n", "</div>
" ], "text/plain": [ " loan_id orig_chn seller_name orig_rt orig_amt \\\n", "0 100020736692 B CALIBER HOME LOANS, INC. 4.875 492000 \n", "1 100036136334 R OTHER 2.750 190000 \n", "2 100043912941 R OTHER 4.125 68000 \n", "3 100057175226 R OTHER 4.990 71000 \n", "4 100060715643 R OTHER 4.500 180000 \n", "\n", " orig_trm orig_dte frst_dte oltv ocltv ... occ_stat state zip_3 \\\n", "0 360 12/2017 02/2018 75 75 ... I CA 920 \n", "1 180 12/2017 01/2018 67 67 ... P MD 206 \n", "2 360 12/2017 02/2018 66 66 ... P OH 432 \n", "3 360 12/2017 02/2018 95 95 ... P NC 278 \n", "4 360 12/2017 02/2018 75 75 ... I WA 983 \n", "\n", " mi_pct product_type cscore_c mi_type relocation_flg cscore_min \\\n", "0 NaN FRM NaN NaN N 757.0 \n", "1 NaN FRM 798.0 NaN N 797.0 \n", "2 NaN FRM NaN NaN N 804.0 \n", "3 30.0 FRM NaN 1.0 N 696.0 \n", "4 NaN FRM NaN NaN N 726.0 \n", "\n", " orig_val \n", "0 656000.000000 \n", "1 283582.089552 \n", "2 103030.303030 \n", "3 74736.842105 \n", "4 240000.000000 \n", "\n", "[5 rows x 27 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# pulling in our main data; for more info on the data, see the \"data_prep_nb.ipynb\" file\n", "main_df = pd.read_csv(filepath_or_buffer='../data/jan_and_dec_17_acqs.csv')\n", "\n", "# taking a peek at our data\n", "main_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 1\n", "\n", "A business partner of yours came to you to ask about how occupancy status relates to risk. They were wondering, **what occupancy status appears riskier in our data: principal homes (i.e. someone's primary residence), second homes, or investor-owned homes?** There are obviously many ways of measuring risk. Here it's safe to assume your business partner means credit risk, so some variables you may want to consider would be the borrower's credit score, DTI, or LTV. 
You can use one or more of these variables in your analysis, or something else altogether if you see fit; just ensure that in the end you arrive at a single visualization to share with your business partner. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# code for visualization goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Explanation for why you chose this particular visualization goes here..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 2\n", "\n", "Imagine that a news story recently broke involving [mortgage insurance (MI)](https://en.wikipedia.org/wiki/Mortgage_insurance). Even though we don't yet know exactly how that news will impact Fannie Mae's business, you've been asked to produce a visualization that communicates **to what extent our December 2017 acquisitions were covered by MI**. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# code for visualization goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Explanation for why you chose this particular visualization goes here..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 3\n", "\n", "One of your business partners is trying to **learn more about the areas of the country where we are providing the highest-value loans in terms of origination amount**. You've also been told that an interactive map of the United States would be optimal here, and they'd like you to add whatever data you think are relevant to the tooltip. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# code for visualization goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Explanation for why you chose this particular visualization goes here..." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 4\n", "\n", "You've received a very open-ended question from an account manager who wants to learn how the seller they work with most closely compares to all sellers. Pick any seller (aside from \"OTHER\") and any two variables in our data (e.g. origination amount and origination value, but don't use that combination), and put together a visualization that communicates **whether or not that seller is unique in any way with respect to the two variables you selected**. The answer can be yes, no, or maybe; just justify your answer with your visualization. " ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# code for visualization goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Explanation for why you chose this particular visualization goes here..." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 4 }