{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Practical Data Visualization with Python (Homework - Participant)\n",
    "\n",
    "## Homework Overview\n",
    "\n",
    "Thanks for checking out the hands-on reinforcement exercises for this seminar. The goal of this homework is to provide you with a handful of questions that necessitate visualization that you might conceivably face on the job. There is not one \"right\" answer for the questions below, but some answers are *more* right than others. For example, if you were to be asked to visualize the trends in [LTV](https://www.investopedia.com/terms/l/loantovalue.asp) over the course of a year, would plotting average LTV over time be a better visualization than building twelve violin plots of LTV--one for each month? Not necessarily. But would both of those be better than a single box-and-whisker plot of LTV all originations in that year? Absolutely. It all depends on the context of the question, and the information you intend to convey with your visualization. \n",
    "\n",
    "When in doubt, ask yourself: **am I clearly and powerfully communicating the relevant information with this visualization?**\n",
    "\n",
    "- With each of these questions below, you will be asked to do two things:\n",
    "    1. Construct a visualization to answer the question.\n",
    "        - You'll be pre-allotted one code cell in the notebook for this, but feel free to use as many as you'd like. As was shown in the lecture materials, a good visualization almost always requires iteration. Feel free to keep the remnants of your iterative creative procees in your notebooks; just ensure your final viz. for each question is clearly marked. \n",
    "    2. Briefly explain (in no more than a paragraph) why you chose to visualize the data as you did. \n",
    "        - You'll be pre-allotted one markdown cell in the notebook for this, directly following the code cell. If you are struggling to think of what to write, fall back on the lecture materials, particulary Section 1: Why We Visualize. Imagine that each of your visualizations was going to be presented to your team at a Process Confirm / Code Review; your paragraph should read like the explanation you would give in that context, detailing why your choices made for the most effective viz. Be sure to focus on how your visualization answers the question at hand, the crux of which is **in bold** although the entire question provides relevant information as to what is expected.\n",
    "    \n",
    "We'll be using the [same data](https://nbviewer.jupyter.org/github/pmaji/practical-python-data-viz-guide/blob/master/notebooks/data_prep_nb.ipynb) we've been dealing with throughout the seminar: January and December 2017 FNMA originations. Remember, if you don't understand what some of the variables mean, all the information you need is in the `data_prep_nb.ipynb`, including links to relevant glossaries and data dictionnaries.  \n",
    "\n",
    "**Note**: For all questions below, you are free to use whatever python visualization package you want. That said, some questions require a specific type of visualization (example: if you know that you need an interactive visualization, don't start by using a package that you know cannot build interactive visualizations). \n",
    "\n",
    "Good luck!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# basic packages\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import datetime"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'This notebook was last executed on 2019-09-08 20:42'"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# store the datetime of the most recent running of this notebook as a form of a log\n",
    "most_recent_run_datetime = datetime.datetime.now().strftime(\"%Y-%m-%d %H:%M\")\n",
    "f\"This notebook was last executed on {most_recent_run_datetime}\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>loan_id</th>\n",
       "      <th>orig_chn</th>\n",
       "      <th>seller_name</th>\n",
       "      <th>orig_rt</th>\n",
       "      <th>orig_amt</th>\n",
       "      <th>orig_trm</th>\n",
       "      <th>orig_dte</th>\n",
       "      <th>frst_dte</th>\n",
       "      <th>oltv</th>\n",
       "      <th>ocltv</th>\n",
       "      <th>...</th>\n",
       "      <th>occ_stat</th>\n",
       "      <th>state</th>\n",
       "      <th>zip_3</th>\n",
       "      <th>mi_pct</th>\n",
       "      <th>product_type</th>\n",
       "      <th>cscore_c</th>\n",
       "      <th>mi_type</th>\n",
       "      <th>relocation_flg</th>\n",
       "      <th>cscore_min</th>\n",
       "      <th>orig_val</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>100020736692</td>\n",
       "      <td>B</td>\n",
       "      <td>CALIBER HOME LOANS, INC.</td>\n",
       "      <td>4.875</td>\n",
       "      <td>492000</td>\n",
       "      <td>360</td>\n",
       "      <td>12/2017</td>\n",
       "      <td>02/2018</td>\n",
       "      <td>75</td>\n",
       "      <td>75</td>\n",
       "      <td>...</td>\n",
       "      <td>I</td>\n",
       "      <td>CA</td>\n",
       "      <td>920</td>\n",
       "      <td>NaN</td>\n",
       "      <td>FRM</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>N</td>\n",
       "      <td>757.0</td>\n",
       "      <td>656000.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>100036136334</td>\n",
       "      <td>R</td>\n",
       "      <td>OTHER</td>\n",
       "      <td>2.750</td>\n",
       "      <td>190000</td>\n",
       "      <td>180</td>\n",
       "      <td>12/2017</td>\n",
       "      <td>01/2018</td>\n",
       "      <td>67</td>\n",
       "      <td>67</td>\n",
       "      <td>...</td>\n",
       "      <td>P</td>\n",
       "      <td>MD</td>\n",
       "      <td>206</td>\n",
       "      <td>NaN</td>\n",
       "      <td>FRM</td>\n",
       "      <td>798.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>N</td>\n",
       "      <td>797.0</td>\n",
       "      <td>283582.089552</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>100043912941</td>\n",
       "      <td>R</td>\n",
       "      <td>OTHER</td>\n",
       "      <td>4.125</td>\n",
       "      <td>68000</td>\n",
       "      <td>360</td>\n",
       "      <td>12/2017</td>\n",
       "      <td>02/2018</td>\n",
       "      <td>66</td>\n",
       "      <td>66</td>\n",
       "      <td>...</td>\n",
       "      <td>P</td>\n",
       "      <td>OH</td>\n",
       "      <td>432</td>\n",
       "      <td>NaN</td>\n",
       "      <td>FRM</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>N</td>\n",
       "      <td>804.0</td>\n",
       "      <td>103030.303030</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>100057175226</td>\n",
       "      <td>R</td>\n",
       "      <td>OTHER</td>\n",
       "      <td>4.990</td>\n",
       "      <td>71000</td>\n",
       "      <td>360</td>\n",
       "      <td>12/2017</td>\n",
       "      <td>02/2018</td>\n",
       "      <td>95</td>\n",
       "      <td>95</td>\n",
       "      <td>...</td>\n",
       "      <td>P</td>\n",
       "      <td>NC</td>\n",
       "      <td>278</td>\n",
       "      <td>30.0</td>\n",
       "      <td>FRM</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>N</td>\n",
       "      <td>696.0</td>\n",
       "      <td>74736.842105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>100060715643</td>\n",
       "      <td>R</td>\n",
       "      <td>OTHER</td>\n",
       "      <td>4.500</td>\n",
       "      <td>180000</td>\n",
       "      <td>360</td>\n",
       "      <td>12/2017</td>\n",
       "      <td>02/2018</td>\n",
       "      <td>75</td>\n",
       "      <td>75</td>\n",
       "      <td>...</td>\n",
       "      <td>I</td>\n",
       "      <td>WA</td>\n",
       "      <td>983</td>\n",
       "      <td>NaN</td>\n",
       "      <td>FRM</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>N</td>\n",
       "      <td>726.0</td>\n",
       "      <td>240000.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 27 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        loan_id orig_chn               seller_name  orig_rt  orig_amt  \\\n",
       "0  100020736692        B  CALIBER HOME LOANS, INC.    4.875    492000   \n",
       "1  100036136334        R                     OTHER    2.750    190000   \n",
       "2  100043912941        R                     OTHER    4.125     68000   \n",
       "3  100057175226        R                     OTHER    4.990     71000   \n",
       "4  100060715643        R                     OTHER    4.500    180000   \n",
       "\n",
       "   orig_trm orig_dte frst_dte  oltv  ocltv  ...  occ_stat  state  zip_3  \\\n",
       "0       360  12/2017  02/2018    75     75  ...         I     CA    920   \n",
       "1       180  12/2017  01/2018    67     67  ...         P     MD    206   \n",
       "2       360  12/2017  02/2018    66     66  ...         P     OH    432   \n",
       "3       360  12/2017  02/2018    95     95  ...         P     NC    278   \n",
       "4       360  12/2017  02/2018    75     75  ...         I     WA    983   \n",
       "\n",
       "  mi_pct product_type cscore_c  mi_type relocation_flg cscore_min  \\\n",
       "0    NaN          FRM      NaN      NaN              N      757.0   \n",
       "1    NaN          FRM    798.0      NaN              N      797.0   \n",
       "2    NaN          FRM      NaN      NaN              N      804.0   \n",
       "3   30.0          FRM      NaN      1.0              N      696.0   \n",
       "4    NaN          FRM      NaN      NaN              N      726.0   \n",
       "\n",
       "        orig_val  \n",
       "0  656000.000000  \n",
       "1  283582.089552  \n",
       "2  103030.303030  \n",
       "3   74736.842105  \n",
       "4  240000.000000  \n",
       "\n",
       "[5 rows x 27 columns]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# pulling in our main data; for more info on the data, see the \"data_prep_nb.ipynb\" file\n",
    "main_df = pd.read_csv(filepath_or_buffer='../data/jan_and_dec_17_acqs.csv')\n",
    "\n",
    "# taking a peek at our data\n",
    "main_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 1\n",
    "\n",
    "A business partner of yours came to you to ask about how occupancy status relates to risk. They were wondering, **what occupancy status appears riskier in our data: principal homes (i.e. someone's primary residence), second homes, or investor-owned homes?** There are obviously many ways of measuring risk. Here it's safe to assume your business partner means credit risk, so some variables you may want to consider would be the borrower's credit score, DTI, or LTV. You can use one or more of these variables in your analysis, or something else altogether if you see fit; just ensure that in the end you arrive at one a single visualization to share with your business partner. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# code for visualization goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Explanation for why you chose this particular visualization goes here..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 2\n",
    "\n",
    "Imagine that a recent news event broke that had to do with [mortgage insurance (MI)](https://en.wikipedia.org/wiki/Mortgage_insurance), and even though we don't yet know exactly how that news will impact Fannie Mae's business, you've been asked to produce a visualization that communicates **to what extent our December 2017 acquisitions were covered by MI**. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# code for visualization goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Explanation for why you chose this particular visualization goes here..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 3\n",
    "\n",
    "One of your business partners is trying to **learn more about the areas of the country where we are providing the highest value loans in terms of origination amount**. You've also been told that an interactive map of the United States would be optimal here, and they'd like you to add whatever data you might think are relevant to the tooltip. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# code for visualization goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Explanation for why you chose this particular visualization goes here..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 4\n",
    "\n",
    "You've received a very open-ended question from an account manager hoping to learn more about how the seller with whom they work most closely compares to all sellers. Pick any seller (aside from \"Other\") and any two variables in our data (i.e. origination amount and origination value, but don't use that combo), and put together a visualization that communicates **whether or not that seller is unique in any way as it pertains to the two variables you selected**. The answer can be yes, no, or maybe... just justify your answer with your visualization. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# code for visualization goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Explanation for why you chose this particular visualization goes here..."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}