{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Prep\n", "\n", "For this vignette, we'll be using acquisition loan data from [Fannie Mae's public datasets](https://loanperformancedata.fanniemae.com/lppub/index.html#Portfolio). There is sadly not yet a Python script to facilitate the reading in of all the data—which are stored as pipe-delimited .txt files, released quarterly on ~1 year lag—so I had to infer some of my data cleaning steps from the [R code that Fannie made available](https://loanperformancedata.fanniemae.com/lppub-docs/FNMA_SF_Loan_Performance_r_Primary.zip) to download and clean the data.\n", "\n", "If you are not immediately familiar with these data (or mortgage finance-related data in general), I highly recommend referencing the 1st table—\"Acquisition File Layout\"—in [this dictionnary file here](https://loanperformancedata.fanniemae.com/lppub-docs/FNMA_SF_Loan_Performance_File_layout.pdf), along with the [corresponding glossary here](https://loanperformancedata.fanniemae.com/lppub-docs/FNMA_SF_Loan_Performance_Glossary.pdf), which provides an explanation of the meaning of the values for each variable.\n", "\n", "For our purposes here, I will only be using acquisition data from the 1st and 4th quarter of 2017; thus, I will need to read in 2 .txt files. 
The code below generalizes to any number of text files; you need only point the path_to_data variable at the directory that stores your .txt files.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# basic packages\n", "import numpy as np\n", "import pandas as pd\n", "import datetime\n", "\n", "# for data importing\n", "import os\n", "import csv\n", "\n", "# for data cleaning\n", "from janitor import clean_names, remove_empty" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'This notebook was last executed on 2019-08-05 22:09'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# store the datetime of the most recent run of this notebook as a simple log\n", "most_recent_run_datetime = datetime.datetime.now().strftime(\"%Y-%m-%d %H:%M\")\n", "f\"This notebook was last executed on {most_recent_run_datetime}\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading in the Data" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# location of my .txt data files\n", "path_to_data = '../data/'\n", "\n", "# instantiate an empty list to store the file paths to read in\n", "filelist = []\n", "\n", "# loop over each filename in the specified directory\n", "for filename in os.listdir(path_to_data):\n", " # check that the path corresponds to a file,\n", " # that the file in question is a .txt file,\n", " # and that we haven't already added that file to the list\n", " if os.path.isfile(path_to_data + filename) \\\n", " and filename.endswith(\".txt\") \\\n", " and path_to_data + filename not in filelist:\n", " # append the file path to our list of files\n", " filelist.append(path_to_data + filename)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { 
"text/plain": [ "['../data/Acquisition_2017Q4.txt', '../data/Acquisition_2017Q1.txt']" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# take a look at the files we'll be reading in\n", "filelist" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# instantiate an empty df that we'll use to store all our data\n", "acq_df = pd.DataFrame()\n", "\n", "# list of variable names for all the fields in the .txt files (adapted from the aforementioned docs)\n", "acq_var_names = ['LOAN_ID', 'ORIG_CHN', 'Seller.Name', 'ORIG_RT', 'ORIG_AMT', 'ORIG_TRM', 'ORIG_DTE','FRST_DTE',\n", " 'OLTV', 'OCLTV', 'NUM_BO', 'DTI', 'CSCORE_B', 'FTHB_FLG', 'PURPOSE', 'PROP_TYP', 'NUM_UNIT', \n", " 'OCC_STAT', 'STATE', 'ZIP_3', 'MI_PCT', 'Product.Type', 'CSCORE_C', 'MI_TYPE', 'RELOCATION_FLG']\n", "\n", "# loop over the .txt files, read them in, and append them to make our master acquisitions df\n", "for f in filelist:\n", " # specify that our delimiter is a pipe, ignore the header, and use pre-specified variable names\n", " temp_df = pd.read_csv(filepath_or_buffer = f, sep=\"|\", header=None, names=acq_var_names)\n", " # ensure that concatenation is row-wise and ignore the index values as they don't convey meaning here\n", " acq_df = pd.concat(objs=[acq_df, temp_df], axis=0, ignore_index=True)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
LOAN_IDORIG_CHNSeller.NameORIG_RTORIG_AMTORIG_TRMORIG_DTEFRST_DTEOLTVOCLTV...PROP_TYPNUM_UNITOCC_STATSTATEZIP_3MI_PCTProduct.TypeCSCORE_CMI_TYPERELOCATION_FLG
0100004724719RGUILD MORTGAGE COMPANY4.37522800036009/201711/20179595...SF1PHI96730.0FRMNaN1.0N
1100005682269RWELLS FARGO BANK, N.A.3.3757500012010/201712/20177979...SF1PMS394NaNFRM765.0NaNN
2100005754525CWELLS FARGO BANK, N.A.4.7507900036011/201701/20188080...SF1IMO653NaNFRM799.0NaNN
3100006973104CSUNTRUST BANK4.25029500036011/201701/20189191...SF1POH45230.0FRM789.01.0N
4100007420970RFREEDOM MORTGAGE CORP.4.50017800036009/201711/20176363...SF1PCA924NaNFRMNaNNaNN
\n", "

5 rows × 25 columns

\n", "
" ], "text/plain": [ " LOAN_ID ORIG_CHN Seller.Name ORIG_RT ORIG_AMT ORIG_TRM \\\n", "0 100004724719 R GUILD MORTGAGE COMPANY 4.375 228000 360 \n", "1 100005682269 R WELLS FARGO BANK, N.A. 3.375 75000 120 \n", "2 100005754525 C WELLS FARGO BANK, N.A. 4.750 79000 360 \n", "3 100006973104 C SUNTRUST BANK 4.250 295000 360 \n", "4 100007420970 R FREEDOM MORTGAGE CORP. 4.500 178000 360 \n", "\n", " ORIG_DTE FRST_DTE OLTV OCLTV ... PROP_TYP NUM_UNIT OCC_STAT STATE \\\n", "0 09/2017 11/2017 95 95 ... SF 1 P HI \n", "1 10/2017 12/2017 79 79 ... SF 1 P MS \n", "2 11/2017 01/2018 80 80 ... SF 1 I MO \n", "3 11/2017 01/2018 91 91 ... SF 1 P OH \n", "4 09/2017 11/2017 63 63 ... SF 1 P CA \n", "\n", " ZIP_3 MI_PCT Product.Type CSCORE_C MI_TYPE RELOCATION_FLG \n", "0 967 30.0 FRM NaN 1.0 N \n", "1 394 NaN FRM 765.0 NaN N \n", "2 653 NaN FRM 799.0 NaN N \n", "3 452 30.0 FRM 789.0 1.0 N \n", "4 924 NaN FRM NaN NaN N \n", "\n", "[5 rows x 25 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# taking a look at the data structure we have so far\n", "acq_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cleaning the Data\n", "\n", "For the most part, our data here are pretty clean. In the section below, we'll just make a few convenience changes, execute a couple checks, and create a few new variables." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# use pyjanitor package to take care of basic data cleaning\n", "acq_df = (\n", " acq_df\n", " # clean the column names, remove any leading/trailing underscores\n", " .clean_names(strip_underscores=True)\n", " # remove any rows that are entirely NA\n", " .remove_empty()\n", ")" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# create a few new fields as recommended by the aformentioned docs\n", "\n", "# find minimum credit score of borrower and co-borrower\n", "acq_df['cscore_min'] = (\n", " acq_df[['cscore_b','cscore_c']].min(axis=1)\n", ")\n", "\n", "# find origination value = origination amount / origination loan-to-value ratio\n", "acq_df['orig_val'] = (\n", " acq_df['orig_amt'] / (acq_df['oltv']/100)\n", ")\n", "\n", "# check if the ocltv is null; if it is, set it to the oltv\n", "acq_df['ocltv'] = (\n", " np.where(acq_df['ocltv'].isnull(), acq_df['oltv'], acq_df['ocltv'])\n", ")" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
loan_idorig_chnseller_nameorig_rtorig_amtorig_trmorig_dtefrst_dteoltvocltv...occ_statstatezip_3mi_pctproduct_typecscore_cmi_typerelocation_flgcscore_minorig_val
0100004724719RGUILD MORTGAGE COMPANY4.37522800036009/201711/20179595...PHI96730.0FRMNaN1.0N693.0240000.000000
1100005682269RWELLS FARGO BANK, N.A.3.3757500012010/201712/20177979...PMS394NaNFRM765.0NaNN737.094936.708861
2100005754525CWELLS FARGO BANK, N.A.4.7507900036011/201701/20188080...IMO653NaNFRM799.0NaNN799.098750.000000
3100006973104CSUNTRUST BANK4.25029500036011/201701/20189191...POH45230.0FRM789.01.0N760.0324175.824176
4100007420970RFREEDOM MORTGAGE CORP.4.50017800036009/201711/20176363...PCA924NaNFRMNaNNaNN675.0282539.682540
\n", "

5 rows × 27 columns

\n", "
" ], "text/plain": [ " loan_id orig_chn seller_name orig_rt orig_amt orig_trm \\\n", "0 100004724719 R GUILD MORTGAGE COMPANY 4.375 228000 360 \n", "1 100005682269 R WELLS FARGO BANK, N.A. 3.375 75000 120 \n", "2 100005754525 C WELLS FARGO BANK, N.A. 4.750 79000 360 \n", "3 100006973104 C SUNTRUST BANK 4.250 295000 360 \n", "4 100007420970 R FREEDOM MORTGAGE CORP. 4.500 178000 360 \n", "\n", " orig_dte frst_dte oltv ocltv ... occ_stat state zip_3 mi_pct \\\n", "0 09/2017 11/2017 95 95 ... P HI 967 30.0 \n", "1 10/2017 12/2017 79 79 ... P MS 394 NaN \n", "2 11/2017 01/2018 80 80 ... I MO 653 NaN \n", "3 11/2017 01/2018 91 91 ... P OH 452 30.0 \n", "4 09/2017 11/2017 63 63 ... P CA 924 NaN \n", "\n", " product_type cscore_c mi_type relocation_flg cscore_min orig_val \n", "0 FRM NaN 1.0 N 693.0 240000.000000 \n", "1 FRM 765.0 NaN N 737.0 94936.708861 \n", "2 FRM 799.0 NaN N 799.0 98750.000000 \n", "3 FRM 789.0 1.0 N 760.0 324175.824176 \n", "4 FRM NaN NaN N 675.0 282539.682540 \n", "\n", "[5 rows x 27 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# inspect our final cleaned data\n", "acq_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Final Data Trimming\n", "\n", "For the sake of size / efficiency, I am going to create a dataset composed of only loans originated in two months: January 2017 and December 2017. 
" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'The final dataset filtered to just Jan2017 and Dec2017 originations has 158168 rows and 27 columns.'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# filter original acq_df to just loans with origination dates in jan or dec 2017\n", "jan_and_dec_17_acqs = acq_df.loc[\n", " (acq_df['orig_dte'] == '01/2017') | (acq_df['orig_dte'] == '12/2017')\n", "]\n", "\n", "# inspect the features of the resulting dataset\n", "row_count, column_count = jan_and_dec_17_acqs.shape\n", "f\"The final dataset filtered to just Jan2017 and Dec2017 originations has {row_count} rows and {column_count} columns.\"" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# lastly, we'll save out this dataset for use elsewhere\n", "jan_and_dec_17_acqs.to_csv(path_or_buf='../data/jan_and_dec_17_acqs.csv', index=False)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 4 }