{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# ZBP to ZCTA\n",
    "Retrieves data from the Census Bureau's ZIP Code Business Patterns (ZBP) API for a specific area and summarizes it by ZCTA. Three tables are generated: an employment table that contains establishments, employment, and payroll; an industry table that contains counts of establishments by 2-digit NAICS sector code; and a reference table that maps sector codes to sector names. The raw data retrieved from the API is written to JSON files, and the final output is written to a SQLite database.\n",
    "\n",
    "Confirmed to work with the 2018 ZBP series\n",
    "\n",
    "https://www.census.gov/data/developers/data-sets/cbp-nonemp-zbp/zbp-api.html\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Variables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd, requests, sqlite3, os, json\n",
    "from IPython.display import clear_output"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Crosswalk files - update only if necessary\n",
    "uszips_file='zip_to_zcta_2019_uds.csv'\n",
    "zcta_file='geocorr14_modified.csv'\n",
    "\n",
    "uszips_path=os.path.join('inputs',uszips_file)\n",
    "zcta_path=os.path.join('inputs',zcta_file)\n",
    "\n",
    "#Dump files for api data storage\n",
    "ejsonpath=os.path.join('outputs', 'emp_data.json')\n",
    "ijsonpath=os.path.join('outputs', 'ind_data.json')\n",
    "cjsonpath=os.path.join('outputs', 'codes_data.json')\n",
    "\n",
    "#API variables - UPDATE THE YEAR\n",
    "keyfile='census_key.txt'\n",
    "\n",
    "year='2018'\n",
    "dsource='zbp'\n",
    "state='36'\n",
    "ecols='ESTAB,EMP,PAYQTR1,PAYANN'\n",
    "icols='ESTAB'\n",
    "ncodes=['00','11','21','22','23','31-33','42','44-45','48-49',\n",
    "        '51','52','53','54','55','56','61','62','71','72','81',\n",
    "        '99']\n",
    "\n",
    "#SQL output - UPDATE EACH TABLE NAME\n",
    "dbname=os.path.join('outputs','testdb.sqlite')\n",
    "emptable='zbp2018emp'\n",
    "indtable='zbp2018ind'\n",
    "codetable='zbp2018indcodes'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Crosswalking\n",
    "Read in the files that relate US ZIP Codes to ZCTAs (from JSI) and\n",
    "ZCTAs to counties for the local area (from MCDC Geocorr), then join them on the\n",
    "ZCTA code to create a ZIP Code to ZCTA table for the local area."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ZIP_CODE</th>\n",
       "      <th>PO_NAME</th>\n",
       "      <th>STATE</th>\n",
       "      <th>ZIP_TYPE</th>\n",
       "      <th>ZCTA</th>\n",
       "      <th>zip_join_type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>00501</td>\n",
       "      <td>Holtsville</td>\n",
       "      <td>NY</td>\n",
       "      <td>Post Office or large volume customer</td>\n",
       "      <td>11742</td>\n",
       "      <td>Spatial join to ZCTA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>00544</td>\n",
       "      <td>Holtsville</td>\n",
       "      <td>NY</td>\n",
       "      <td>Post Office or large volume customer</td>\n",
       "      <td>11742</td>\n",
       "      <td>Spatial join to ZCTA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>00601</td>\n",
       "      <td>Adjuntas</td>\n",
       "      <td>PR</td>\n",
       "      <td>Zip Code Area</td>\n",
       "      <td>00601</td>\n",
       "      <td>Zip Matches ZCTA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>00602</td>\n",
       "      <td>Aguada</td>\n",
       "      <td>PR</td>\n",
       "      <td>Zip Code Area</td>\n",
       "      <td>00602</td>\n",
       "      <td>Zip Matches ZCTA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>00603</td>\n",
       "      <td>Aguadilla</td>\n",
       "      <td>PR</td>\n",
       "      <td>Zip Code Area</td>\n",
       "      <td>00603</td>\n",
       "      <td>Zip Matches ZCTA</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  ZIP_CODE     PO_NAME STATE                              ZIP_TYPE   ZCTA  \\\n",
       "0    00501  Holtsville    NY  Post Office or large volume customer  11742   \n",
       "1    00544  Holtsville    NY  Post Office or large volume customer  11742   \n",
       "2    00601    Adjuntas    PR                         Zip Code Area  00601   \n",
       "3    00602      Aguada    PR                         Zip Code Area  00602   \n",
       "4    00603   Aguadilla    PR                         Zip Code Area  00603   \n",
       "\n",
       "          zip_join_type  \n",
       "0  Spatial join to ZCTA  \n",
       "1  Spatial join to ZCTA  \n",
       "2      Zip Matches ZCTA  \n",
       "3      Zip Matches ZCTA  \n",
       "4      Zip Matches ZCTA  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "uszips=pd.read_csv(uszips_path, sep=',', dtype={'ZIP_CODE':str, 'ZCTA':str})\n",
    "uszips.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(41107, 6)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#All ZIP Codes in US\n",
    "uszips.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>zcta5</th>\n",
       "      <th>county14</th>\n",
       "      <th>cntyname2</th>\n",
       "      <th>zipname</th>\n",
       "      <th>pop10</th>\n",
       "      <th>afact</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>10001</td>\n",
       "      <td>36061</td>\n",
       "      <td>New York NY</td>\n",
       "      <td>New York, NY</td>\n",
       "      <td>21102</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>10002</td>\n",
       "      <td>36061</td>\n",
       "      <td>New York NY</td>\n",
       "      <td>New York, NY</td>\n",
       "      <td>81410</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>10003</td>\n",
       "      <td>36061</td>\n",
       "      <td>New York NY</td>\n",
       "      <td>New York, NY</td>\n",
       "      <td>56024</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>10004</td>\n",
       "      <td>36061</td>\n",
       "      <td>New York NY</td>\n",
       "      <td>New York, NY</td>\n",
       "      <td>3089</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>10005</td>\n",
       "      <td>36061</td>\n",
       "      <td>New York NY</td>\n",
       "      <td>New York, NY</td>\n",
       "      <td>7135</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   zcta5 county14    cntyname2       zipname  pop10  afact\n",
       "0  10001    36061  New York NY  New York, NY  21102    1.0\n",
       "1  10002    36061  New York NY  New York, NY  81410    1.0\n",
       "2  10003    36061  New York NY  New York, NY  56024    1.0\n",
       "3  10004    36061  New York NY  New York, NY   3089    1.0\n",
       "4  10005    36061  New York NY  New York, NY   7135    1.0"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "zcta=pd.read_csv(zcta_path, sep=',', dtype={'zcta5':str, 'county14':str})\n",
    "zcta.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(214, 6)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#ZCTAs in local area\n",
    "zcta.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ZIP_TYPE</th>\n",
       "      <th>PO_NAME</th>\n",
       "      <th>zcta5</th>\n",
       "      <th>county14</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ZIP_CODE</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10001</th>\n",
       "      <td>Zip Code Area</td>\n",
       "      <td>New York</td>\n",
       "      <td>10001</td>\n",
       "      <td>36061</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10118</th>\n",
       "      <td>Post Office or large volume customer</td>\n",
       "      <td>New York</td>\n",
       "      <td>10001</td>\n",
       "      <td>36061</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10120</th>\n",
       "      <td>Post Office or large volume customer</td>\n",
       "      <td>New York</td>\n",
       "      <td>10001</td>\n",
       "      <td>36061</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10122</th>\n",
       "      <td>Post Office or large volume customer</td>\n",
       "      <td>New York</td>\n",
       "      <td>10001</td>\n",
       "      <td>36061</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10123</th>\n",
       "      <td>Post Office or large volume customer</td>\n",
       "      <td>New York</td>\n",
       "      <td>10001</td>\n",
       "      <td>36061</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                      ZIP_TYPE   PO_NAME  zcta5 county14\n",
       "ZIP_CODE                                                                \n",
       "10001                            Zip Code Area  New York  10001    36061\n",
       "10118     Post Office or large volume customer  New York  10001    36061\n",
       "10120     Post Office or large volume customer  New York  10001    36061\n",
       "10122     Post Office or large volume customer  New York  10001    36061\n",
       "10123     Post Office or large volume customer  New York  10001    36061"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Merge ZIP Codes with ZCTAs for local area\n",
    "zip2zcta = pd.merge(uszips[['ZIP_CODE','ZIP_TYPE','PO_NAME','ZCTA']],zcta[['zcta5','county14']],how='right', \n",
    "                    left_on='ZCTA', right_on='zcta5').set_index('ZIP_CODE')\n",
    "zip2zcta.drop(columns=['ZCTA'],inplace=True)\n",
    "zip2zcta.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(313, 4)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#ZIP Codes in local area\n",
    "zip2zcta.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## API Call and Processing\n",
    "Request the ZBP data from the Census Bureau for the state, join it to the local ZIP Code to ZCTA table on ZIP Code, and aggregate the data by ZCTA."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(keyfile) as key:\n",
    "    api_key=key.read().strip()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'https://api.census.gov/data/2018/zbp'"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "base_url = f'https://api.census.gov/data/{year}/{dsource}'\n",
    "base_url"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ZBP Employment Data\n",
    "This data is requested in a series of chunks, each containing multiple ZIP Codes. If retrieval succeeds and you later need to change the notebook, do not rerun the requests block; proceed to the next block and load the data from the JSON dump file instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "def chunks(l, n):\n",
    "    #Yield successive slices of l, each containing at most n items\n",
    "    for i in range(0, len(l), n):\n",
    "        yield l[i:i+n]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of chunks: 7\n"
     ]
    }
   ],
   "source": [
    "reqzips=list(chunks(zip2zcta.index.tolist(),48))\n",
    "print('Number of chunks:',len(reqzips))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### ***THIS BLOCK IS A REQUESTS BLOCK!*** "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Retrieved data for chunk 6\n",
      "Done - Data dumped to json file\n"
     ]
    }
   ],
   "source": [
    "#Code 200 = success, do not rerun this block unless it's necessary\n",
    "emp_data=[]\n",
    "for i, v in enumerate(reqzips):\n",
    "    batchzips=','.join(v)\n",
    "    edata_url = f'{base_url}?get={ecols}&EMPSZES=001&for=zipcode:{batchzips}&key={api_key}'\n",
    "    response=requests.get(edata_url)\n",
    "    if response.status_code==200:\n",
    "        clear_output(wait=True)\n",
    "        data=response.json()\n",
    "        #Keep the header row from the first chunk only\n",
    "        emp_data.extend(data if i == 0 else data[1:])\n",
    "        print('Retrieved data for chunk',i)\n",
    "    else:\n",
    "        print('***Problem with retrieval***, response code',response.status_code)\n",
    "        break\n",
    "else:\n",
    "    #Runs only if no chunk failed, so a partial download never overwrites the dump file\n",
    "    with open(ejsonpath, 'w') as f:\n",
    "        json.dump(emp_data, f)\n",
    "    print('Done - Data dumped to json file')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ESTAB</th>\n",
       "      <th>EMP</th>\n",
       "      <th>PAYQ1</th>\n",
       "      <th>PAYAN</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>zipcode</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10118</th>\n",
       "      <td>326</td>\n",
       "      <td>6358</td>\n",
       "      <td>202796</td>\n",
       "      <td>815737</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10120</th>\n",
       "      <td>70</td>\n",
       "      <td>1378</td>\n",
       "      <td>41023</td>\n",
       "      <td>132076</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10121</th>\n",
       "      <td>69</td>\n",
       "      <td>6104</td>\n",
       "      <td>199451</td>\n",
       "      <td>887899</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10122</th>\n",
       "      <td>177</td>\n",
       "      <td>1917</td>\n",
       "      <td>40477</td>\n",
       "      <td>156824</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10123</th>\n",
       "      <td>180</td>\n",
       "      <td>2036</td>\n",
       "      <td>51353</td>\n",
       "      <td>195513</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         ESTAB   EMP   PAYQ1   PAYAN\n",
       "zipcode                             \n",
       "10118      326  6358  202796  815737\n",
       "10120       70  1378   41023  132076\n",
       "10121       69  6104  199451  887899\n",
       "10122      177  1917   40477  156824\n",
       "10123      180  2036   51353  195513"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "with open(ejsonpath, 'r') as f:\n",
    "    ejsondata=json.load(f)\n",
    "#First record holds the column names returned by the API\n",
    "zbpemp=pd.DataFrame(ejsondata[1:], columns=ejsondata[0]).rename(columns={'PAYQTR1':'PAYQ1','PAYANN':'PAYAN','zip code':'zipcode'}).set_index('zipcode')\n",
    "#Drop the size class filter column, then convert the remaining columns to integers\n",
    "zbpemp=zbpemp.drop(columns=['EMPSZES']).astype('int64')\n",
    "zbpemp.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(266, 4)"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#ZIP Codes retrieved - may differ from zip2zcta as some zips have no businesses\n",
    "zbpemp.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ESTAB</th>\n",
       "      <th>EMP</th>\n",
       "      <th>PAYQ1</th>\n",
       "      <th>PAYAN</th>\n",
       "      <th>FLAG_EMP</th>\n",
       "      <th>FLAG_PAYQ1</th>\n",
       "      <th>FLAG_PAYAN</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>zipcode</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10118</th>\n",
       "      <td>326</td>\n",
       "      <td>6358</td>\n",
       "      <td>202796</td>\n",
       "      <td>815737</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10120</th>\n",
       "      <td>70</td>\n",
       "      <td>1378</td>\n",
       "      <td>41023</td>\n",
       "      <td>132076</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10121</th>\n",
       "      <td>69</td>\n",
       "      <td>6104</td>\n",
       "      <td>199451</td>\n",
       "      <td>887899</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10122</th>\n",
       "      <td>177</td>\n",
       "      <td>1917</td>\n",
       "      <td>40477</td>\n",
       "      <td>156824</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10123</th>\n",
       "      <td>180</td>\n",
       "      <td>2036</td>\n",
       "      <td>51353</td>\n",
       "      <td>195513</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         ESTAB   EMP   PAYQ1   PAYAN  FLAG_EMP  FLAG_PAYQ1  FLAG_PAYAN\n",
       "zipcode                                                               \n",
       "10118      326  6358  202796  815737         0           0           0\n",
       "10120       70  1378   41023  132076         0           0           0\n",
       "10121       69  6104  199451  887899         0           0           0\n",
       "10122      177  1917   40477  156824         0           0           0\n",
       "10123      180  2036   51353  195513         0           0           0"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Flag columns count the number of establishments for which data is not disclosed\n",
    "flags=['FLAG_EMP','FLAG_PAYQ1','FLAG_PAYAN']\n",
    "for flagcol in flags:\n",
    "    datacol=flagcol.split('_')[1]\n",
    "    zbpemp[flagcol]=0\n",
    "    zbpemp.loc[zbpemp[datacol] == 0, flagcol] = zbpemp['ESTAB']\n",
    "zbpemp.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ZIP_TYPE</th>\n",
       "      <th>PO_NAME</th>\n",
       "      <th>zcta5</th>\n",
       "      <th>county14</th>\n",
       "      <th>ESTAB</th>\n",
       "      <th>EMP</th>\n",
       "      <th>PAYQ1</th>\n",
       "      <th>PAYAN</th>\n",
       "      <th>FLAG_EMP</th>\n",
       "      <th>FLAG_PAYQ1</th>\n",
       "      <th>FLAG_PAYAN</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ZIP_CODE</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10001</th>\n",
       "      <td>Zip Code Area</td>\n",
       "      <td>New York</td>\n",
       "      <td>10001</td>\n",
       "      <td>36061</td>\n",
       "      <td>7248</td>\n",
       "      <td>151769</td>\n",
       "      <td>2717186</td>\n",
       "      <td>10646611</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10118</th>\n",
       "      <td>Post Office or large volume customer</td>\n",
       "      <td>New York</td>\n",
       "      <td>10001</td>\n",
       "      <td>36061</td>\n",
       "      <td>326</td>\n",
       "      <td>6358</td>\n",
       "      <td>202796</td>\n",
       "      <td>815737</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10120</th>\n",
       "      <td>Post Office or large volume customer</td>\n",
       "      <td>New York</td>\n",
       "      <td>10001</td>\n",
       "      <td>36061</td>\n",
       "      <td>70</td>\n",
       "      <td>1378</td>\n",
       "      <td>41023</td>\n",
       "      <td>132076</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10122</th>\n",
       "      <td>Post Office or large volume customer</td>\n",
       "      <td>New York</td>\n",
       "      <td>10001</td>\n",
       "      <td>36061</td>\n",
       "      <td>177</td>\n",
       "      <td>1917</td>\n",
       "      <td>40477</td>\n",
       "      <td>156824</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10123</th>\n",
       "      <td>Post Office or large volume customer</td>\n",
       "      <td>New York</td>\n",
       "      <td>10001</td>\n",
       "      <td>36061</td>\n",
       "      <td>180</td>\n",
       "      <td>2036</td>\n",
       "      <td>51353</td>\n",
       "      <td>195513</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                      ZIP_TYPE   PO_NAME  zcta5 county14  \\\n",
       "ZIP_CODE                                                                   \n",
       "10001                            Zip Code Area  New York  10001    36061   \n",
       "10118     Post Office or large volume customer  New York  10001    36061   \n",
       "10120     Post Office or large volume customer  New York  10001    36061   \n",
       "10122     Post Office or large volume customer  New York  10001    36061   \n",
       "10123     Post Office or large volume customer  New York  10001    36061   \n",
       "\n",
       "          ESTAB     EMP    PAYQ1     PAYAN  FLAG_EMP  FLAG_PAYQ1  FLAG_PAYAN  \n",
       "ZIP_CODE                                                                      \n",
       "10001      7248  151769  2717186  10646611         0           0           0  \n",
       "10118       326    6358   202796    815737         0           0           0  \n",
       "10120        70    1378    41023    132076         0           0           0  \n",
       "10122       177    1917    40477    156824         0           0           0  \n",
       "10123       180    2036    51353    195513         0           0           0  "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Join to ZIP ZCTA crosswalk\n",
    "zbpemp2zcta = pd.merge(zip2zcta,zbpemp,how='inner',left_index=True,right_index=True)\n",
    "zbpemp2zcta.index.name = 'ZIP_CODE'\n",
    "zbpemp2zcta.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(266, 11)"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#ZIP codes in the local area that appear in the ZBP data\n",
    "zbpemp2zcta.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ESTAB</th>\n",
       "      <th>EMP</th>\n",
       "      <th>PAYQ1</th>\n",
       "      <th>PAYAN</th>\n",
       "      <th>FLAG_EMP</th>\n",
       "      <th>FLAG_PAYQ1</th>\n",
       "      <th>FLAG_PAYAN</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>zcta5</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10001</th>\n",
       "      <td>8006</td>\n",
       "      <td>163513</td>\n",
       "      <td>3053219</td>\n",
       "      <td>11948301</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10002</th>\n",
       "      <td>2962</td>\n",
       "      <td>22737</td>\n",
       "      <td>176288</td>\n",
       "      <td>764720</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10003</th>\n",
       "      <td>4273</td>\n",
       "      <td>98742</td>\n",
       "      <td>1830369</td>\n",
       "      <td>6916902</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10004</th>\n",
       "      <td>1660</td>\n",
       "      <td>69025</td>\n",
       "      <td>2337956</td>\n",
       "      <td>7349828</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10005</th>\n",
       "      <td>1397</td>\n",
       "      <td>48129</td>\n",
       "      <td>3043620</td>\n",
       "      <td>7586070</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       ESTAB     EMP    PAYQ1     PAYAN  FLAG_EMP  FLAG_PAYQ1  FLAG_PAYAN\n",
       "zcta5                                                                    \n",
       "10001   8006  163513  3053219  11948301         0           0           0\n",
       "10002   2962   22737   176288    764720         0           0           0\n",
       "10003   4273   98742  1830369   6916902         0           0           0\n",
       "10004   1660   69025  2337956   7349828         0           0           0\n",
       "10005   1397   48129  3043620   7586070         0           0           0"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Aggregate to ZCTAs\n",
    "zctaemp=zbpemp2zcta.groupby('zcta5')[['ESTAB','EMP','PAYQ1','PAYAN','FLAG_EMP','FLAG_PAYQ1','FLAG_PAYAN']].sum()\n",
    "zctaemp.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ZBP Industry Data\n",
    "This data must be requested one record at a time, so the request can take a long time: up to one hour for approximately 300 ZIP Codes. Once the request finishes, the data is dumped to a json file. If the request succeeds but later blocks need to be modified, don't rerun the requests block; load the data from the json file instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "266"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "zipcodes=zbpemp2zcta.index.tolist()\n",
    "len(zipcodes)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### ***THIS BLOCK IS A REQUESTS BLOCK!***  \n",
    "Retrieving approximately 300 ZIP Codes takes up to 1 hour\n",
    "\n",
    "*NOTE - revise in the future to retrieve chunks of ZIP Codes*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "5586 records have been retrieved for 266 ZIP codes...\n",
      "Done\n",
      "Data dumped to json file\n"
     ]
    }
   ],
   "source": [
    "#If this block is successful but there are subsequent problems, do not rerun it - start from the following block.\n",
    "#For industry data, if there are no records for an industry create a blank record with zeros\n",
    "n=0\n",
    "z=0\n",
    "failed=False\n",
    "ind_data=[['estab','naics','zipcode']]\n",
    "for zcode in zipcodes:\n",
    "#for zcode in zipcodes[0:5]:\n",
    "    clear_output(wait=True)\n",
    "    for naics in ncodes:\n",
    "        idata_url = f'{base_url}?get={icols}&NAICS2017={naics}&for=zipcode:{zcode}&key={api_key}'\n",
    "        try:\n",
    "            response=requests.get(idata_url)\n",
    "        except requests.exceptions.RequestException as e:\n",
    "            print(e)\n",
    "            failed=True\n",
    "            break\n",
    "        if response.status_code==200:\n",
    "            #First row of the response is the header, second row is the record\n",
    "            jsondata=response.json()\n",
    "            ind_data.append(jsondata[1])\n",
    "            n=n+1\n",
    "        elif response.status_code==204:\n",
    "            #No content: create a zero record for this industry\n",
    "            record=['0',naics,zcode]\n",
    "            ind_data.append(record)\n",
    "            n=n+1\n",
    "        else:\n",
    "            print('Problem retrieving data, status code:',response.status_code)\n",
    "            failed=True\n",
    "            break\n",
    "    #A break above only exits the inner loop, so stop the outer loop here too\n",
    "    if failed:\n",
    "        break\n",
    "    z=z+1\n",
    "    print(n,'records have been retrieved for',z,'ZIP codes...')\n",
    "if failed:\n",
    "    print('Stopped early - data is incomplete')\n",
    "else:\n",
    "    print('Done')\n",
    "\n",
    "with open(ijsonpath, 'w') as f:\n",
    "    json.dump(ind_data, f)\n",
    "print('Data dumped to json file')"
   ]
  },
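  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The note above about retrieving chunks of ZIP Codes could be approached along the lines below. This is a minimal sketch, not the method used in this notebook: `chunked` and `fill_missing` are hypothetical helpers, and it assumes (untested here) that the API accepts a comma-separated list of ZIP Codes in the `for=zipcode:` clause. A chunked response would presumably contain rows only for ZIP Codes that have records, so zero records are added for the rest, mirroring the blank records created in the requests block.\n",
    "\n",
    "```python\n",
    "def chunked(items, size):\n",
    "    # Yield consecutive slices of at most `size` items\n",
    "    for i in range(0, len(items), size):\n",
    "        yield items[i:i + size]\n",
    "\n",
    "def fill_missing(rows, zips, naics):\n",
    "    # rows: [estab, naics, zipcode] records returned for one chunk and one NAICS code\n",
    "    # Add a zero record for each ZIP Code in the chunk that has no row\n",
    "    found = {row[2] for row in rows}\n",
    "    return rows + [['0', naics, z] for z in zips if z not in found]\n",
    "\n",
    "# Example with sample rows in place of a live response\n",
    "chunk = ['10001', '10002', '10003']\n",
    "rows = [['5', '22', '10001']]\n",
    "print(fill_missing(rows, chunk, '22'))\n",
    "\n",
    "# Assumed form of the geography clause for one chunk\n",
    "print('for=zipcode:' + ','.join(chunk))\n",
    "```\n",
    "\n",
    "With chunks of this kind, the number of requests would drop from one per ZIP Code and NAICS code to one per chunk and NAICS code."
   ]
  },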
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>estab</th>\n",
       "      <th>naics</th>\n",
       "      <th>zipcode</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>7248</td>\n",
       "      <td>00</td>\n",
       "      <td>10001</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>11</td>\n",
       "      <td>10001</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>21</td>\n",
       "      <td>10001</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5</td>\n",
       "      <td>22</td>\n",
       "      <td>10001</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>262</td>\n",
       "      <td>23</td>\n",
       "      <td>10001</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   estab naics zipcode\n",
       "0   7248    00   10001\n",
       "1      0    11   10001\n",
       "2      0    21   10001\n",
       "3      5    22   10001\n",
       "4    262    23   10001"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "with open(ijsonpath, 'r') as f:\n",
    "    ijsondata=json.load(f)\n",
    "zbpind = pd.DataFrame(ijsondata[1:],columns=ijsondata[0])\n",
    "zbpind['estab']=zbpind['estab'].astype('int64')\n",
    "zbpind.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>naics</th>\n",
       "      <th>N00</th>\n",
       "      <th>N11</th>\n",
       "      <th>N21</th>\n",
       "      <th>N22</th>\n",
       "      <th>N23</th>\n",
       "      <th>N31_33</th>\n",
       "      <th>N42</th>\n",
       "      <th>N44_45</th>\n",
       "      <th>N48_49</th>\n",
       "      <th>N51</th>\n",
       "      <th>...</th>\n",
       "      <th>N53</th>\n",
       "      <th>N54</th>\n",
       "      <th>N55</th>\n",
       "      <th>N56</th>\n",
       "      <th>N61</th>\n",
       "      <th>N62</th>\n",
       "      <th>N71</th>\n",
       "      <th>N72</th>\n",
       "      <th>N81</th>\n",
       "      <th>N99</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>zipcode</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10001</th>\n",
       "      <td>7248</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>262</td>\n",
       "      <td>188</td>\n",
       "      <td>1079</td>\n",
       "      <td>654</td>\n",
       "      <td>62</td>\n",
       "      <td>418</td>\n",
       "      <td>...</td>\n",
       "      <td>500</td>\n",
       "      <td>1453</td>\n",
       "      <td>153</td>\n",
       "      <td>363</td>\n",
       "      <td>161</td>\n",
       "      <td>405</td>\n",
       "      <td>313</td>\n",
       "      <td>528</td>\n",
       "      <td>506</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10002</th>\n",
       "      <td>2962</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>100</td>\n",
       "      <td>60</td>\n",
       "      <td>167</td>\n",
       "      <td>520</td>\n",
       "      <td>46</td>\n",
       "      <td>66</td>\n",
       "      <td>...</td>\n",
       "      <td>274</td>\n",
       "      <td>323</td>\n",
       "      <td>3</td>\n",
       "      <td>88</td>\n",
       "      <td>27</td>\n",
       "      <td>206</td>\n",
       "      <td>100</td>\n",
       "      <td>607</td>\n",
       "      <td>314</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10003</th>\n",
       "      <td>4268</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>51</td>\n",
       "      <td>35</td>\n",
       "      <td>94</td>\n",
       "      <td>433</td>\n",
       "      <td>5</td>\n",
       "      <td>195</td>\n",
       "      <td>...</td>\n",
       "      <td>372</td>\n",
       "      <td>672</td>\n",
       "      <td>25</td>\n",
       "      <td>120</td>\n",
       "      <td>75</td>\n",
       "      <td>342</td>\n",
       "      <td>561</td>\n",
       "      <td>658</td>\n",
       "      <td>504</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10004</th>\n",
       "      <td>1600</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>34</td>\n",
       "      <td>13</td>\n",
       "      <td>68</td>\n",
       "      <td>65</td>\n",
       "      <td>20</td>\n",
       "      <td>109</td>\n",
       "      <td>...</td>\n",
       "      <td>87</td>\n",
       "      <td>483</td>\n",
       "      <td>12</td>\n",
       "      <td>84</td>\n",
       "      <td>34</td>\n",
       "      <td>85</td>\n",
       "      <td>31</td>\n",
       "      <td>131</td>\n",
       "      <td>134</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10005</th>\n",
       "      <td>1346</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>26</td>\n",
       "      <td>6</td>\n",
       "      <td>39</td>\n",
       "      <td>43</td>\n",
       "      <td>11</td>\n",
       "      <td>72</td>\n",
       "      <td>...</td>\n",
       "      <td>78</td>\n",
       "      <td>421</td>\n",
       "      <td>25</td>\n",
       "      <td>74</td>\n",
       "      <td>21</td>\n",
       "      <td>41</td>\n",
       "      <td>30</td>\n",
       "      <td>81</td>\n",
       "      <td>102</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 21 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "naics     N00  N11  N21  N22  N23  N31_33   N42  N44_45  N48_49  N51  ...  \\\n",
       "zipcode                                                               ...   \n",
       "10001    7248    0    0    5  262     188  1079     654      62  418  ...   \n",
       "10002    2962    0    0    0  100      60   167     520      46   66  ...   \n",
       "10003    4268    0    0    0   51      35    94     433       5  195  ...   \n",
       "10004    1600    0    0    0   34      13    68      65      20  109  ...   \n",
       "10005    1346    0    0    0   26       6    39      43      11   72  ...   \n",
       "\n",
       "naics    N53   N54  N55  N56  N61  N62  N71  N72  N81  N99  \n",
       "zipcode                                                     \n",
       "10001    500  1453  153  363  161  405  313  528  506    9  \n",
       "10002    274   323    3   88   27  206  100  607  314    3  \n",
       "10003    372   672   25  120   75  342  561  658  504    8  \n",
       "10004     87   483   12   84   34   85   31  131  134    4  \n",
       "10005     78   421   25   74   21   41   30   81  102    3  \n",
       "\n",
       "[5 rows x 21 columns]"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Pivot data to move NAICS to columns\n",
    "zbpind_tab=zbpind.pivot(index='zipcode', columns='naics', values='estab')\n",
    "zbpind_tab=zbpind_tab.add_prefix('N')\n",
    "zbpind_tab.rename(columns=lambda x: x.replace('-', '_'),inplace=True)\n",
    "zbpind_tab.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>naics</th>\n",
       "      <th>N00</th>\n",
       "      <th>N11</th>\n",
       "      <th>N21</th>\n",
       "      <th>N22</th>\n",
       "      <th>N23</th>\n",
       "      <th>N31_33</th>\n",
       "      <th>N42</th>\n",
       "      <th>N44_45</th>\n",
       "      <th>N48_49</th>\n",
       "      <th>N51</th>\n",
       "      <th>...</th>\n",
       "      <th>N54</th>\n",
       "      <th>N55</th>\n",
       "      <th>N56</th>\n",
       "      <th>N61</th>\n",
       "      <th>N62</th>\n",
       "      <th>N71</th>\n",
       "      <th>N72</th>\n",
       "      <th>N81</th>\n",
       "      <th>N99</th>\n",
       "      <th>NXX</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>zipcode</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10001</th>\n",
       "      <td>7248</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>262</td>\n",
       "      <td>188</td>\n",
       "      <td>1079</td>\n",
       "      <td>654</td>\n",
       "      <td>62</td>\n",
       "      <td>418</td>\n",
       "      <td>...</td>\n",
       "      <td>1453</td>\n",
       "      <td>153</td>\n",
       "      <td>363</td>\n",
       "      <td>161</td>\n",
       "      <td>405</td>\n",
       "      <td>313</td>\n",
       "      <td>528</td>\n",
       "      <td>506</td>\n",
       "      <td>9</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10002</th>\n",
       "      <td>2962</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>100</td>\n",
       "      <td>60</td>\n",
       "      <td>167</td>\n",
       "      <td>520</td>\n",
       "      <td>46</td>\n",
       "      <td>66</td>\n",
       "      <td>...</td>\n",
       "      <td>323</td>\n",
       "      <td>3</td>\n",
       "      <td>88</td>\n",
       "      <td>27</td>\n",
       "      <td>206</td>\n",
       "      <td>100</td>\n",
       "      <td>607</td>\n",
       "      <td>314</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10003</th>\n",
       "      <td>4268</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>51</td>\n",
       "      <td>35</td>\n",
       "      <td>94</td>\n",
       "      <td>433</td>\n",
       "      <td>5</td>\n",
       "      <td>195</td>\n",
       "      <td>...</td>\n",
       "      <td>672</td>\n",
       "      <td>25</td>\n",
       "      <td>120</td>\n",
       "      <td>75</td>\n",
       "      <td>342</td>\n",
       "      <td>561</td>\n",
       "      <td>658</td>\n",
       "      <td>504</td>\n",
       "      <td>8</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10004</th>\n",
       "      <td>1600</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>34</td>\n",
       "      <td>13</td>\n",
       "      <td>68</td>\n",
       "      <td>65</td>\n",
       "      <td>20</td>\n",
       "      <td>109</td>\n",
       "      <td>...</td>\n",
       "      <td>483</td>\n",
       "      <td>12</td>\n",
       "      <td>84</td>\n",
       "      <td>34</td>\n",
       "      <td>85</td>\n",
       "      <td>31</td>\n",
       "      <td>131</td>\n",
       "      <td>134</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10005</th>\n",
       "      <td>1346</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>26</td>\n",
       "      <td>6</td>\n",
       "      <td>39</td>\n",
       "      <td>43</td>\n",
       "      <td>11</td>\n",
       "      <td>72</td>\n",
       "      <td>...</td>\n",
       "      <td>421</td>\n",
       "      <td>25</td>\n",
       "      <td>74</td>\n",
       "      <td>21</td>\n",
       "      <td>41</td>\n",
       "      <td>30</td>\n",
       "      <td>81</td>\n",
       "      <td>102</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 22 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "naics     N00  N11  N21  N22  N23  N31_33   N42  N44_45  N48_49  N51  ...  \\\n",
       "zipcode                                                               ...   \n",
       "10001    7248    0    0    5  262     188  1079     654      62  418  ...   \n",
       "10002    2962    0    0    0  100      60   167     520      46   66  ...   \n",
       "10003    4268    0    0    0   51      35    94     433       5  195  ...   \n",
       "10004    1600    0    0    0   34      13    68      65      20  109  ...   \n",
       "10005    1346    0    0    0   26       6    39      43      11   72  ...   \n",
       "\n",
       "naics     N54  N55  N56  N61  N62  N71  N72  N81  N99  NXX  \n",
       "zipcode                                                     \n",
       "10001    1453  153  363  161  405  313  528  506    9    0  \n",
       "10002     323    3   88   27  206  100  607  314    3    2  \n",
       "10003     672   25  120   75  342  561  658  504    8    2  \n",
       "10004     483   12   84   34   85   31  131  134    4    1  \n",
       "10005     421   25   74   21   41   30   81  102    3    0  \n",
       "\n",
       "[5 rows x 22 columns]"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Create column for establishments that are counted in the total (N00) but not reported under any individual sector\n",
    "zbpind_tab['NXX']=zbpind_tab.loc[:,'N00'].subtract(zbpind_tab.loc[:,'N11':'N99'].sum(axis=1))\n",
    "zbpind_tab.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>zcta5</th>\n",
       "      <th>N00</th>\n",
       "      <th>N11</th>\n",
       "      <th>N21</th>\n",
       "      <th>N22</th>\n",
       "      <th>N23</th>\n",
       "      <th>N31_33</th>\n",
       "      <th>N42</th>\n",
       "      <th>N44_45</th>\n",
       "      <th>N48_49</th>\n",
       "      <th>...</th>\n",
       "      <th>N54</th>\n",
       "      <th>N55</th>\n",
       "      <th>N56</th>\n",
       "      <th>N61</th>\n",
       "      <th>N62</th>\n",
       "      <th>N71</th>\n",
       "      <th>N72</th>\n",
       "      <th>N81</th>\n",
       "      <th>N99</th>\n",
       "      <th>NXX</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ZIP_CODE</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10001</th>\n",
       "      <td>10001</td>\n",
       "      <td>7248</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>262</td>\n",
       "      <td>188</td>\n",
       "      <td>1079</td>\n",
       "      <td>654</td>\n",
       "      <td>62</td>\n",
       "      <td>...</td>\n",
       "      <td>1453</td>\n",
       "      <td>153</td>\n",
       "      <td>363</td>\n",
       "      <td>161</td>\n",
       "      <td>405</td>\n",
       "      <td>313</td>\n",
       "      <td>528</td>\n",
       "      <td>506</td>\n",
       "      <td>9</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10118</th>\n",
       "      <td>10001</td>\n",
       "      <td>326</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8</td>\n",
       "      <td>0</td>\n",
       "      <td>45</td>\n",
       "      <td>10</td>\n",
       "      <td>5</td>\n",
       "      <td>...</td>\n",
       "      <td>103</td>\n",
       "      <td>12</td>\n",
       "      <td>17</td>\n",
       "      <td>0</td>\n",
       "      <td>8</td>\n",
       "      <td>29</td>\n",
       "      <td>11</td>\n",
       "      <td>23</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10120</th>\n",
       "      <td>10001</td>\n",
       "      <td>70</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8</td>\n",
       "      <td>8</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>18</td>\n",
       "      <td>10</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10122</th>\n",
       "      <td>10001</td>\n",
       "      <td>177</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>10</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>45</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>53</td>\n",
       "      <td>0</td>\n",
       "      <td>10</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10123</th>\n",
       "      <td>10001</td>\n",
       "      <td>180</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "      <td>0</td>\n",
       "      <td>14</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>77</td>\n",
       "      <td>0</td>\n",
       "      <td>13</td>\n",
       "      <td>7</td>\n",
       "      <td>11</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>8</td>\n",
       "      <td>0</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 23 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "          zcta5   N00  N11  N21  N22  N23  N31_33   N42  N44_45  N48_49  ...  \\\n",
       "ZIP_CODE                                                                 ...   \n",
       "10001     10001  7248    0    0    5  262     188  1079     654      62  ...   \n",
       "10118     10001   326    0    0    0    8       0    45      10       5  ...   \n",
       "10120     10001    70    0    0    0    0       0     8       8       0  ...   \n",
       "10122     10001   177    0    0    0    0       0    10       3       0  ...   \n",
       "10123     10001   180    0    0    0    7       0    14       4       0  ...   \n",
       "\n",
       "           N54  N55  N56  N61  N62  N71  N72  N81  N99  NXX  \n",
       "ZIP_CODE                                                     \n",
       "10001     1453  153  363  161  405  313  528  506    9    0  \n",
       "10118      103   12   17    0    8   29   11   23    0    5  \n",
       "10120       18   10    0    0    3    0    0    0    0   11  \n",
       "10122       45    0    7    0    5   53    0   10    0    3  \n",
       "10123       77    0   13    7   11    3    0    8    0    6  \n",
       "\n",
       "[5 rows x 23 columns]"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Join to ZIP ZCTA crosswalk\n",
    "zbpind2zcta = pd.merge(zip2zcta[['zcta5']],zbpind_tab,how='inner',left_index=True,right_index=True)\n",
    "zbpind2zcta.index.name = 'ZIP_CODE'\n",
    "zbpind2zcta.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(266, 23)"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "zbpind2zcta.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>N00</th>\n",
       "      <th>N11</th>\n",
       "      <th>N21</th>\n",
       "      <th>N22</th>\n",
       "      <th>N23</th>\n",
       "      <th>N31_33</th>\n",
       "      <th>N42</th>\n",
       "      <th>N44_45</th>\n",
       "      <th>N48_49</th>\n",
       "      <th>N51</th>\n",
       "      <th>...</th>\n",
       "      <th>N54</th>\n",
       "      <th>N55</th>\n",
       "      <th>N56</th>\n",
       "      <th>N61</th>\n",
       "      <th>N62</th>\n",
       "      <th>N71</th>\n",
       "      <th>N72</th>\n",
       "      <th>N81</th>\n",
       "      <th>N99</th>\n",
       "      <th>NXX</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>zcta5</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10001</th>\n",
       "      <td>8006</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>277</td>\n",
       "      <td>188</td>\n",
       "      <td>1156</td>\n",
       "      <td>679</td>\n",
       "      <td>67</td>\n",
       "      <td>450</td>\n",
       "      <td>...</td>\n",
       "      <td>1696</td>\n",
       "      <td>175</td>\n",
       "      <td>400</td>\n",
       "      <td>168</td>\n",
       "      <td>432</td>\n",
       "      <td>398</td>\n",
       "      <td>539</td>\n",
       "      <td>547</td>\n",
       "      <td>9</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10002</th>\n",
       "      <td>2962</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>100</td>\n",
       "      <td>60</td>\n",
       "      <td>167</td>\n",
       "      <td>520</td>\n",
       "      <td>46</td>\n",
       "      <td>66</td>\n",
       "      <td>...</td>\n",
       "      <td>323</td>\n",
       "      <td>3</td>\n",
       "      <td>88</td>\n",
       "      <td>27</td>\n",
       "      <td>206</td>\n",
       "      <td>100</td>\n",
       "      <td>607</td>\n",
       "      <td>314</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10003</th>\n",
       "      <td>4273</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>51</td>\n",
       "      <td>35</td>\n",
       "      <td>94</td>\n",
       "      <td>433</td>\n",
       "      <td>5</td>\n",
       "      <td>195</td>\n",
       "      <td>...</td>\n",
       "      <td>672</td>\n",
       "      <td>25</td>\n",
       "      <td>120</td>\n",
       "      <td>75</td>\n",
       "      <td>342</td>\n",
       "      <td>561</td>\n",
       "      <td>658</td>\n",
       "      <td>504</td>\n",
       "      <td>8</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10004</th>\n",
       "      <td>1660</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>34</td>\n",
       "      <td>13</td>\n",
       "      <td>68</td>\n",
       "      <td>65</td>\n",
       "      <td>20</td>\n",
       "      <td>126</td>\n",
       "      <td>...</td>\n",
       "      <td>491</td>\n",
       "      <td>12</td>\n",
       "      <td>88</td>\n",
       "      <td>34</td>\n",
       "      <td>85</td>\n",
       "      <td>31</td>\n",
       "      <td>131</td>\n",
       "      <td>137</td>\n",
       "      <td>4</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10005</th>\n",
       "      <td>1397</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>26</td>\n",
       "      <td>6</td>\n",
       "      <td>39</td>\n",
       "      <td>43</td>\n",
       "      <td>11</td>\n",
       "      <td>72</td>\n",
       "      <td>...</td>\n",
       "      <td>421</td>\n",
       "      <td>25</td>\n",
       "      <td>74</td>\n",
       "      <td>21</td>\n",
       "      <td>41</td>\n",
       "      <td>30</td>\n",
       "      <td>81</td>\n",
       "      <td>102</td>\n",
       "      <td>3</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 22 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        N00  N11  N21  N22  N23  N31_33   N42  N44_45  N48_49  N51  ...   N54  \\\n",
       "zcta5                                                               ...         \n",
       "10001  8006    0    0    5  277     188  1156     679      67  450  ...  1696   \n",
       "10002  2962    0    0    0  100      60   167     520      46   66  ...   323   \n",
       "10003  4273    0    0    0   51      35    94     433       5  195  ...   672   \n",
       "10004  1660    0    0    0   34      13    68      65      20  126  ...   491   \n",
       "10005  1397    0    0    0   26       6    39      43      11   72  ...   421   \n",
       "\n",
       "       N55  N56  N61  N62  N71  N72  N81  N99  NXX  \n",
       "zcta5                                               \n",
       "10001  175  400  168  432  398  539  547    9   30  \n",
       "10002    3   88   27  206  100  607  314    3    2  \n",
       "10003   25  120   75  342  561  658  504    8    7  \n",
       "10004   12   88   34   85   31  131  137    4   12  \n",
       "10005   25   74   21   41   30   81  102    3   13  \n",
       "\n",
       "[5 rows x 22 columns]"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Aggregate to ZCTAs\n",
    "zctaind=zbpind2zcta.groupby(['zcta5']).sum()\n",
    "zctaind.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>N00</th>\n",
       "      <th>N11</th>\n",
       "      <th>N21</th>\n",
       "      <th>N22</th>\n",
       "      <th>N23</th>\n",
       "      <th>N31_33</th>\n",
       "      <th>N42</th>\n",
       "      <th>N44_45</th>\n",
       "      <th>N48_49</th>\n",
       "      <th>N51</th>\n",
       "      <th>...</th>\n",
       "      <th>N54_PCT</th>\n",
       "      <th>N55_PCT</th>\n",
       "      <th>N56_PCT</th>\n",
       "      <th>N61_PCT</th>\n",
       "      <th>N62_PCT</th>\n",
       "      <th>N71_PCT</th>\n",
       "      <th>N72_PCT</th>\n",
       "      <th>N81_PCT</th>\n",
       "      <th>N99_PCT</th>\n",
       "      <th>NXX_PCT</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>zcta5</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10001</th>\n",
       "      <td>8006</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>277</td>\n",
       "      <td>188</td>\n",
       "      <td>1156</td>\n",
       "      <td>679</td>\n",
       "      <td>67</td>\n",
       "      <td>450</td>\n",
       "      <td>...</td>\n",
       "      <td>21.18</td>\n",
       "      <td>2.19</td>\n",
       "      <td>5.00</td>\n",
       "      <td>2.10</td>\n",
       "      <td>5.40</td>\n",
       "      <td>4.97</td>\n",
       "      <td>6.73</td>\n",
       "      <td>6.83</td>\n",
       "      <td>0.11</td>\n",
       "      <td>0.37</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10002</th>\n",
       "      <td>2962</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>100</td>\n",
       "      <td>60</td>\n",
       "      <td>167</td>\n",
       "      <td>520</td>\n",
       "      <td>46</td>\n",
       "      <td>66</td>\n",
       "      <td>...</td>\n",
       "      <td>10.90</td>\n",
       "      <td>0.10</td>\n",
       "      <td>2.97</td>\n",
       "      <td>0.91</td>\n",
       "      <td>6.95</td>\n",
       "      <td>3.38</td>\n",
       "      <td>20.49</td>\n",
       "      <td>10.60</td>\n",
       "      <td>0.10</td>\n",
       "      <td>0.07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10003</th>\n",
       "      <td>4273</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>51</td>\n",
       "      <td>35</td>\n",
       "      <td>94</td>\n",
       "      <td>433</td>\n",
       "      <td>5</td>\n",
       "      <td>195</td>\n",
       "      <td>...</td>\n",
       "      <td>15.73</td>\n",
       "      <td>0.59</td>\n",
       "      <td>2.81</td>\n",
       "      <td>1.76</td>\n",
       "      <td>8.00</td>\n",
       "      <td>13.13</td>\n",
       "      <td>15.40</td>\n",
       "      <td>11.79</td>\n",
       "      <td>0.19</td>\n",
       "      <td>0.16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10004</th>\n",
       "      <td>1660</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>34</td>\n",
       "      <td>13</td>\n",
       "      <td>68</td>\n",
       "      <td>65</td>\n",
       "      <td>20</td>\n",
       "      <td>126</td>\n",
       "      <td>...</td>\n",
       "      <td>29.58</td>\n",
       "      <td>0.72</td>\n",
       "      <td>5.30</td>\n",
       "      <td>2.05</td>\n",
       "      <td>5.12</td>\n",
       "      <td>1.87</td>\n",
       "      <td>7.89</td>\n",
       "      <td>8.25</td>\n",
       "      <td>0.24</td>\n",
       "      <td>0.72</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10005</th>\n",
       "      <td>1397</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>26</td>\n",
       "      <td>6</td>\n",
       "      <td>39</td>\n",
       "      <td>43</td>\n",
       "      <td>11</td>\n",
       "      <td>72</td>\n",
       "      <td>...</td>\n",
       "      <td>30.14</td>\n",
       "      <td>1.79</td>\n",
       "      <td>5.30</td>\n",
       "      <td>1.50</td>\n",
       "      <td>2.93</td>\n",
       "      <td>2.15</td>\n",
       "      <td>5.80</td>\n",
       "      <td>7.30</td>\n",
       "      <td>0.21</td>\n",
       "      <td>0.93</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 43 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        N00  N11  N21  N22  N23  N31_33   N42  N44_45  N48_49  N51  ...  \\\n",
       "zcta5                                                               ...   \n",
       "10001  8006    0    0    5  277     188  1156     679      67  450  ...   \n",
       "10002  2962    0    0    0  100      60   167     520      46   66  ...   \n",
       "10003  4273    0    0    0   51      35    94     433       5  195  ...   \n",
       "10004  1660    0    0    0   34      13    68      65      20  126  ...   \n",
       "10005  1397    0    0    0   26       6    39      43      11   72  ...   \n",
       "\n",
       "       N54_PCT  N55_PCT  N56_PCT  N61_PCT  N62_PCT  N71_PCT  N72_PCT  N81_PCT  \\\n",
       "zcta5                                                                           \n",
       "10001    21.18     2.19     5.00     2.10     5.40     4.97     6.73     6.83   \n",
       "10002    10.90     0.10     2.97     0.91     6.95     3.38    20.49    10.60   \n",
       "10003    15.73     0.59     2.81     1.76     8.00    13.13    15.40    11.79   \n",
       "10004    29.58     0.72     5.30     2.05     5.12     1.87     7.89     8.25   \n",
       "10005    30.14     1.79     5.30     1.50     2.93     2.15     5.80     7.30   \n",
       "\n",
       "       N99_PCT  NXX_PCT  \n",
       "zcta5                    \n",
       "10001     0.11     0.37  \n",
       "10002     0.10     0.07  \n",
       "10003     0.19     0.16  \n",
       "10004     0.24     0.72  \n",
       "10005     0.21     0.93  \n",
       "\n",
       "[5 rows x 43 columns]"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Generate and calculate percent total columns\n",
    "ncols=list(zctaind)\n",
    "for c in ncols[1:]:\n",
    "    pct=c+'_PCT'\n",
    "    zctaind[pct]=((zctaind[c]/zctaind['N00'])*100).round(2)\n",
    "zctaind.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### NAICS Codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### ***THIS BLOCK IS A REQUESTS BLOCK!***"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "codedict={}\n",
    "codes_url=f'https://api.census.gov/data/2018/zbp/variables/NAICS2017.json'\n",
    "response=requests.get(codes_url)\n",
    "codes_data=response.json()\n",
    "codedict.update(codes_data['values']['item'])\n",
    "sectordict=dict((k, codedict[k]) for k in ncodes)\n",
    "sectordict['XX']='Establishments omitted from classification due to privacy regulations'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>naics</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>00</th>\n",
       "      <td>Total for all sectors</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Agriculture, forestry, fishing and hunting</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>Mining, quarrying, and oil and gas extraction</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>Utilities</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>Construction</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31-33</th>\n",
       "      <td>Manufacturing</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>Wholesale trade</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44-45</th>\n",
       "      <td>Retail trade</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48-49</th>\n",
       "      <td>Transportation and warehousing</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>51</th>\n",
       "      <td>Information</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>52</th>\n",
       "      <td>Finance and insurance</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>53</th>\n",
       "      <td>Real estate and rental and leasing</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>54</th>\n",
       "      <td>Professional, scientific, and technical services</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>55</th>\n",
       "      <td>Management of companies and enterprises</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>56</th>\n",
       "      <td>Administrative and support and waste managemen...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>61</th>\n",
       "      <td>Educational services</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>62</th>\n",
       "      <td>Health care and social assistance</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>71</th>\n",
       "      <td>Arts, entertainment, and recreation</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>72</th>\n",
       "      <td>Accommodation and food services</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>81</th>\n",
       "      <td>Other services (except public administration)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>99</th>\n",
       "      <td>Industries not classified</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>XX</th>\n",
       "      <td>Establishments omitted from classification due...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                    name\n",
       "naics                                                   \n",
       "00                                 Total for all sectors\n",
       "11            Agriculture, forestry, fishing and hunting\n",
       "21         Mining, quarrying, and oil and gas extraction\n",
       "22                                             Utilities\n",
       "23                                          Construction\n",
       "31-33                                      Manufacturing\n",
       "42                                       Wholesale trade\n",
       "44-45                                       Retail trade\n",
       "48-49                     Transportation and warehousing\n",
       "51                                           Information\n",
       "52                                 Finance and insurance\n",
       "53                    Real estate and rental and leasing\n",
       "54      Professional, scientific, and technical services\n",
       "55               Management of companies and enterprises\n",
       "56     Administrative and support and waste managemen...\n",
       "61                                  Educational services\n",
       "62                     Health care and social assistance\n",
       "71                   Arts, entertainment, and recreation\n",
       "72                       Accommodation and food services\n",
       "81         Other services (except public administration)\n",
       "99                             Industries not classified\n",
       "XX     Establishments omitted from classification due..."
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "codes=pd.DataFrame(list(sectordict.items()), columns=['naics', 'name']).set_index('naics')\n",
    "codes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quality Control Checks"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n"
     ]
    }
   ],
   "source": [
    "#Does sum of industries equal industry total?\n",
    "indsum=zctaind['N00'].subtract(zctaind.iloc[:,1:22].sum(axis=1))\n",
    "if indsum.sum()==0:\n",
    "    print (True)\n",
    "else:\n",
    "    print(indsum.loc[indsum != 0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n"
     ]
    }
   ],
   "source": [
    "#Is sum of percent totals approximately 100?\n",
    "ptotal=zctaind.iloc[:,22:].sum(axis=1)\n",
    "if ptotal.loc[(ptotal <= 99.05) | (ptotal >= 100.05)].empty:\n",
    "    print(True)\n",
    "else:\n",
    "    print(ptotal.loc[(ptotal <= 99.05) | (ptotal >= 100.05)])\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n"
     ]
    }
   ],
   "source": [
    "#Do number of ZCTAs in employment table match the industries table?\n",
    "ecount=zctaemp.shape[0]\n",
    "icount=zctaind.shape[0]\n",
    "if ecount == icount:\n",
    "    print (True)\n",
    "else:\n",
    "    print('Mistmatched count between employment',ecount, 'rows and industry',icount, 'rows')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n"
     ]
    }
   ],
   "source": [
    "#Does sum of estabslishments from employment table equal establishments in industries table?\n",
    "estsum=zctaemp['ESTAB'].subtract(zctaind['N00'])\n",
    "if estsum.sum()==0:\n",
    "    print (True)\n",
    "else:\n",
    "    print(estsum.loc[estsum != 0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Write to Database "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [],
   "source": [
    "con = sqlite3.connect(dbname) \n",
    "cur = con.cursor()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<sqlite3.Cursor at 0x7fac37579570>"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Employment table\n",
    "cur.execute('DROP TABLE IF EXISTS {};'.format(emptable))\n",
    "qcreate_emptab=\"\"\"\n",
    "CREATE TABLE {}(\n",
    "zcta5 TEXT NOT NULL PRIMARY KEY,\n",
    "estab INTEGER,\n",
    "emp INTEGER,\n",
    "payq1 INTEGER,\n",
    "payan INTEGER,\n",
    "flag_emp INTEGER,\n",
    "flag_payq1 INTEGER,\n",
    "flag_payan INTEGER);\n",
    "\"\"\".format(emptable)\n",
    "\n",
    "cur.execute(qcreate_emptab)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Don't run this block unless you've run the previous one\n",
    "zctaemp.to_sql(name='{}'.format(emptable), if_exists='append', index=True, con=con)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "212 records written to zbp2018emp\n"
     ]
    }
   ],
   "source": [
    "cur.execute('SELECT COUNT(*) FROM {};'.format(emptable))\n",
    "rows = cur.fetchone()\n",
    "print(rows[0], 'records written to', emptable)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0 records updated for EMP\n",
      "0 records updated for PAYQ1\n",
      "0 records updated for PAYAN\n",
      "212 records updated for FLAG_EMP\n",
      "212 records updated for FLAG_PAYQ1\n",
      "212 records updated for FLAG_PAYAN\n"
     ]
    }
   ],
   "source": [
    "#Replace zeros with nulls, as these values really represent no data\n",
    "for col in zctaemp.columns[1:]:\n",
    "    qupdate='UPDATE {} SET {} = NULL WHERE {} = 0;'.format(emptable,col,col)\n",
    "    cur.execute(qupdate)\n",
    "    print(cur.rowcount,'records updated for',col)\n",
    "    con.commit()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<sqlite3.Cursor at 0x7fac37579570>"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Industry table\n",
    "cur.execute('DROP TABLE IF EXISTS {}'.format(indtable))\n",
    "qcreate_indtab=\"\"\"\n",
    "CREATE TABLE {} (\n",
    "zcta5 TEXT NOT NULL PRIMARY KEY, \n",
    "N00 INTEGER, \n",
    "N11 INTEGER, \n",
    "N21 INTEGER, \n",
    "N22 INTEGER, \n",
    "N23 INTEGER, \n",
    "N31_33 INTEGER, \n",
    "N42 INTEGER, \n",
    "N44_45 INTEGER, \n",
    "N48_49 INTEGER, \n",
    "N51 INTEGER, \n",
    "N52 INTEGER, \n",
    "N53 INTEGER, \n",
    "N54 INTEGER, \n",
    "N55 INTEGER, \n",
    "N56 INTEGER, \n",
    "N61 INTEGER, \n",
    "N62 INTEGER, \n",
    "N71 INTEGER, \n",
    "N72 INTEGER, \n",
    "N81 INTEGER, \n",
    "N99 INTEGER,\n",
    "NXX INTEGER,\n",
    "N11_PCT REAL, \n",
    "N21_PCT REAL, \n",
    "N22_PCT REAL, \n",
    "N23_PCT REAL, \n",
    "N31_33_PCT REAL, \n",
    "N42_PCT REAL, \n",
    "N44_45_PCT REAL, \n",
    "N48_49_PCT REAL, \n",
    "N51_PCT REAL, \n",
    "N52_PCT REAL, \n",
    "N53_PCT REAL, \n",
    "N54_PCT REAL, \n",
    "N55_PCT REAL, \n",
    "N56_PCT REAL, \n",
    "N61_PCT REAL, \n",
    "N62_PCT REAL, \n",
    "N71_PCT REAL, \n",
    "N72_PCT REAL, \n",
    "N81_PCT REAL, \n",
    "N99_PCT REAL,\n",
    "NXX_PCT REAL);\n",
    "\"\"\".format(indtable)\n",
    "\n",
    "cur.execute(qcreate_indtab)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Don't run this block unless you've run the previous one\n",
    "zctaind.to_sql(name='{}'.format(indtable), if_exists='append', index=True, con=con)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "212 records written to zbp2018ind\n"
     ]
    }
   ],
   "source": [
    "cur.execute('SELECT COUNT(*) FROM {};'.format(indtable))\n",
    "rows = cur.fetchone()\n",
    "print(rows[0], 'records written to', indtable)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "212 records updated for N11_PCT\n",
      "211 records updated for N21_PCT\n",
      "201 records updated for N22_PCT\n",
      "27 records updated for N23_PCT\n",
      "58 records updated for N31_33_PCT\n",
      "26 records updated for N42_PCT\n",
      "18 records updated for N44_45_PCT\n",
      "35 records updated for N48_49_PCT\n",
      "39 records updated for N51_PCT\n",
      "12 records updated for N52_PCT\n",
      "10 records updated for N53_PCT\n",
      "7 records updated for N54_PCT\n",
      "140 records updated for N55_PCT\n",
      "12 records updated for N56_PCT\n",
      "39 records updated for N61_PCT\n",
      "16 records updated for N62_PCT\n",
      "46 records updated for N71_PCT\n",
      "17 records updated for N72_PCT\n",
      "8 records updated for N81_PCT\n",
      "158 records updated for N99_PCT\n",
      "19 records updated for NXX_PCT\n"
     ]
    }
   ],
   "source": [
    "#For percentages, replace zeros with nulls, as these values really represent no data\n",
    "for col in zctaind.columns[22:]:\n",
    "    qupdate='UPDATE {} SET {} = NULL WHERE {} = 0.0;'.format(indtable,col,col)\n",
    "    cur.execute(qupdate)\n",
    "    print(cur.rowcount,'records updated for',col)\n",
    "    con.commit()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "193 records updated for N11\n",
      "192 records updated for N21\n",
      "184 records updated for N22\n",
      "27 records updated for N23\n",
      "58 records updated for N31_33\n",
      "26 records updated for N42\n",
      "18 records updated for N44_45\n",
      "35 records updated for N48_49\n",
      "39 records updated for N51\n",
      "12 records updated for N52\n",
      "10 records updated for N53\n",
      "7 records updated for N54\n",
      "132 records updated for N55\n",
      "12 records updated for N56\n",
      "39 records updated for N61\n",
      "16 records updated for N62\n",
      "46 records updated for N71\n",
      "17 records updated for N72\n",
      "8 records updated for N81\n",
      "147 records updated for N99\n",
      "0 records updated for NXX\n"
     ]
    }
   ],
   "source": [
    "#For establishments, replace zeros with nulls unless establishments were omitted from classification\n",
    "for col in zctaind.columns[1:22]:\n",
    "    qupdate='UPDATE {} SET {} = NULL WHERE {} = 0 AND NXX !=0;'.format(indtable,col,col)\n",
    "    cur.execute(qupdate)\n",
    "    print(cur.rowcount,'records updated for',col)\n",
    "    con.commit()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<sqlite3.Cursor at 0x7fac37579570>"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#NAICS code table\n",
    "cur.execute('DROP TABLE IF EXISTS {};'.format(codetable))\n",
    "qcreate_codetab=\"\"\"\n",
    "CREATE TABLE {}(\n",
    "naics TEXT NOT NULL PRIMARY KEY,\n",
    "name TEXT);\n",
    "\"\"\".format(codetable)\n",
    "\n",
    "cur.execute(qcreate_codetab)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Don't run this block unless you've run the previous one\n",
    "codes.to_sql(name='{}'.format(codetable), if_exists='append', index=True, con=con)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "22 records written to zbp2018indcodes\n"
     ]
    }
   ],
   "source": [
    "cur.execute('SELECT COUNT(*) FROM {};'.format(codetable))\n",
    "rows = cur.fetchone()\n",
    "print(rows[0], 'records written to', codetable)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [],
   "source": [
    "con.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}