{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Census ACS\n", "\n", "Retrieves data from the Census Bureau's American Community Survey 5-year series API for zctas, pumas, and tracts. A specific list of census variables is passed into the script, which are retrieved from the four ACS profile tables. Variables must be retrieved in chunks because only 50 can be passed to the API at a time, and each url varies by geography and retreives them in different combinations. After some processing output is written to a SQLite database. An option to create a metadata table appears at the bottom, but should only be run once for a given extract (acs1 and acs2) and not for each individual geography.\n", "\n", "https://www.census.gov/data/developers/data-sets/acs-5year.html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Variables" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import os, requests, json, sqlite3, random, pandas as pd, numpy as np\n", "from IPython.display import clear_output" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "keyfile='census_key.txt'\n", "\n", "#API variables - UPDATE THE YEAR AND GEO\n", "year='2018'\n", "geo='zip code tabulation area' # 'zip code tabulation area' or 'public use microdata area' or 'tract'\n", "state='36'\n", "dsource='acs'\n", "dsource2='acs5'\n", "dname='profile'\n", "\n", "#Variables to read in from spreadsheet - UPDATE WORKSHEET\n", "worksheet='acs2' # 'acs1' or 'acs2'\n", "geoexcelsheet={'zip code tabulation area':'zctas', 'public use microdata area':'pumas', 'tract':'tracts'}\n", "geotype=geoexcelsheet.get(geo)\n", "\n", "#SQL output\n", "tabname='{}_{}{}'.format(geotype,year,worksheet)\n", "dbname=os.path.join('outputs','testdb.sqlite')\n", "\n", "#Dump files for api data storage\n", "jsonpath=os.path.join('outputs', tabname+'_retrieved_data.json')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Variable Lists\n", "Get full list of variables from the API, read in our retrieval list, and compare the varianle IDs and names to make sure nothing is missing and that nothing has changed since the last iteration. *Don't move on to the next block until both lists match.* Lastly, read in list of geographies." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('DP05_0086MA',\n", " {'label': 'Annotation of Margin of Error!!Total housing units',\n", " 'predicateType': 'string',\n", " 'group': 'DP05',\n", " 'limit': 0,\n", " 'predicateOnly': True}),\n", " ('DP02_0011PM',\n", " {'label': 'Percent Margin of Error!!HOUSEHOLDS BY TYPE!!Total households!!Nonfamily households!!Householder living alone',\n", " 'concept': 'SELECTED SOCIAL CHARACTERISTICS IN THE UNITED STATES',\n", " 'predicateType': 'float',\n", " 'group': 'DP02',\n", " 'limit': 0,\n", " 'predicateOnly': True})]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "datadict={}\n", "dps=['DP02','DP03','DP04','DP05']\n", "for p in dps:\n", " vars_url = f'https://api.census.gov/data/{year}/{dsource}/{dsource2}/{dname}/groups/{p}.json'\n", " response=requests.get(vars_url)\n", " var_data=response.json()\n", " datadict.update(var_data['variables'])\n", "random.sample(datadict.items(), 2)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
db_varcensus_varcensus_labeldtype
0HOC01_EDP04_0001EEstimate!!HOUSING OCCUPANCY!!Total housing unitsint
1HOC01_MDP04_0001MMargin of Error!!HOUSING OCCUPANCY!!Total hous...int
2HOC01_PCDP04_0001PEPercent Estimate!!HOUSING OCCUPANCY!!Total hou...int
3HOC01_PMDP04_0001PMPercent Margin of Error!!HOUSING OCCUPANCY!!To...int
4HOC02_EDP04_0002EEstimate!!HOUSING OCCUPANCY!!Total housing uni...int
\n", "
" ], "text/plain": [ " db_var census_var census_label \\\n", "0 HOC01_E DP04_0001E Estimate!!HOUSING OCCUPANCY!!Total housing units \n", "1 HOC01_M DP04_0001M Margin of Error!!HOUSING OCCUPANCY!!Total hous... \n", "2 HOC01_PC DP04_0001PE Percent Estimate!!HOUSING OCCUPANCY!!Total hou... \n", "3 HOC01_PM DP04_0001PM Percent Margin of Error!!HOUSING OCCUPANCY!!To... \n", "4 HOC02_E DP04_0002E Estimate!!HOUSING OCCUPANCY!!Total housing uni... \n", "\n", " dtype \n", "0 int \n", "1 int \n", "2 int \n", "3 int \n", "4 int " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfexcel = pd.read_excel(os.path.join('inputs','acs_variables.xlsx'),sheet_name=worksheet)\n", "dfexcel.head()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are an equal number of variables in both lists: 236\n" ] } ], "source": [ "dfvars = pd.DataFrame.from_dict(datadict,columns=['label'],orient='index')\n", "dfvars_selected=dfvars.loc[dfvars.index.isin(dfexcel['census_var'])]\n", "dfvars_count=len(dfvars_selected)\n", "dfexcel_count=len(dfexcel['census_var'])\n", "\n", "if dfvars_count==dfexcel_count:\n", " print('There are an equal number of variables in both lists:', dfvars_count)\n", "else:\n", " print('There is a mismatch in the number of variables; the api has,', dfvars_count, \n", " 'while the original list has',dfexcel_count,'. Missing:')\n", " nomatch=dfexcel[~dfexcel['census_var'].isin(dfvars_selected.index)]\n", " print(nomatch)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "All labels match\n" ] } ], "source": [ "mismatch=dfexcel[~dfexcel['census_label'].isin(dfvars_selected['label'])]\n", "\n", "if len (mismatch) ==0:\n", " print('All labels match')\n", "else:\n", " compare=pd.merge(mismatch,dfvars_selected, left_on='census_var', right_on=dfvars_selected.index)\n", " misfile=os.path.join('outputs','{}_mismatch.csv'.format(worksheet))\n", " compare.to_csv(misfile, columns=['db_var', 'census_var', 'census_label', 'label'],sep=',', index=False,\n", " header=['db_var','census_var' ,'oldlabel','newlabel'])\n", " print('* Mismatch file printed to outputs folder * \\n')\n", " print('These labels do not match:')\n", " print(compare[['census_var','census_label','label']])" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of geographic indetifiers: 215\n" ] } ], "source": [ "# Geographic indetifiers: zctas to retrieve, pumas to filter by, and counties containing tracts to retrieve\n", "excelgeo = pd.read_excel(os.path.join('inputs','acs_variables.xlsx'),sheet_name=geotype, dtype=object)\n", "geoids = excelgeo['GEO'].tolist()\n", "print('Number of geographic indetifiers:',len(geoids))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Retrieve Data\n", "Given the large number of variables in the ACS and limits of the API, variables must be passed to the url in separate blocks or chunks. The first chunk that's captured is written to an empty datalist; the header row and then one row for each geography. Each subsequent chunk is iterated through by row, so each row is appended to the correct row in datalist. In all cases, the last values, identifiers automatically returned with each API call, are not appended." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def chunks(l, n):\n", " # For item i in a range that is a length of l,\n", " for i in range(0, len(l), n):\n", " # Create an index range for l of n items:\n", " yield l[i:i+n]" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of chunks: 6\n" ] } ], "source": [ "reqvars=list(chunks(dfvars_selected.index.tolist(),46))\n", "reqvars[0].insert(0,'NAME')\n", "reqvars[0].insert(0,'GEO_ID')\n", "print('Number of chunks:',len(reqvars))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'https://api.census.gov/data/2018/acs/acs5/profile'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with open(keyfile) as key:\n", " api_key=key.read().strip()\n", "\n", "base_url = f'https://api.census.gov/data/{year}/{dsource}/{dsource2}/{dname}'\n", "base_url" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "#Function for retrieving data; running this block loads it into memory\n", "#Different geographies have different urls, \n", "#and a different number of identifiers tacked on to the end of each request\n", "\n", "def getdata():\n", " dlist=[]\n", " for i, v in enumerate(reqvars):\n", " batchcols=','.join(v)\n", " if geotype=='zctas':\n", " data_url = f'{base_url}?get={batchcols}&for={geo}:{g}&key={api_key}'\n", " dropvar=-1\n", " elif geotype=='pumas':\n", " data_url = f'{base_url}?get={batchcols}&for={geo}:*&in=state:{state}&key={api_key}'\n", " dropvar=-2\n", " elif geotype=='tracts':\n", " data_url = f'{base_url}?get={batchcols}&for={geo}:*&in=state:{state}&in=county:{county}&key={api_key}'\n", " dropvar=-3\n", " else:\n", " print('Appropriate geography not specified in variables block')\n", " break \n", " response=requests.get(data_url)\n", " if response.status_code==200:\n", " clear_output(wait=True)\n", " data=response.json()\n", " for i2, v2 in enumerate(data):\n", " if i == 0:\n", " dlist.append(v2[:dropvar])\n", " else:\n", " for item in v2[:dropvar]:\n", " dlist[i2].append(item)\n", " else:\n", " print('***Problem with retrieval***, response code',response.status_code)\n", " return dlist" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### ***THIS BLOCK IS A REQUESTS BLOCK!***\n", "*NOTE: ZCTA retrieval takes a long time - 15 mins for 215 ZCTAs*" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Retrieved data for 11697\n", "Retrieved 216 records and 51408 data points with 238.0 points for each record...\n", "Done - Data dumped to json file\n" ] } ], "source": [ "#If this block was run successfully for a given table and geography don't rerun - next block pulls from saved json\n", "datalist=[]\n", "if geotype=='zctas':\n", " for g in geoids:\n", " georecord=getdata()\n", " print('Retrieved data for',g)\n", " if len(datalist)==0:\n", " datalist.append(georecord[0])\n", " datalist.append(georecord[1])\n", " else:\n", " datalist.append(georecord[1])\n", "elif geotype=='pumas':\n", " datalist=getdata()\n", "elif geotype=='tracts':\n", " for county in geoids:\n", " georecord=getdata()\n", " print('Retrieved data for',county)\n", " if len(datalist)==0:\n", " for tract in georecord:\n", " datalist.append(tract)\n", " else:\n", " for tract in georecord[1:]:\n", " 
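{ "cell_type": "markdown", "metadata": {}, "source": [ "Optional: a minimal sketch of a more defensive request helper for long retrievals. It is not wired into getdata() above; get_with_retries, max_tries, and timeout are illustrative names, and only the standard requests and time libraries are assumed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import time\n", "\n", "def get_with_retries(url, max_tries=3, timeout=60):\n", "    #Sketch: retry a GET a few times with a growing pause between attempts\n", "    for attempt in range(max_tries):\n", "        try:\n", "            response = requests.get(url, timeout=timeout)\n", "            if response.status_code == 200:\n", "                return response\n", "            print('Response code', response.status_code, '- retrying...')\n", "        except requests.exceptions.RequestException as e:\n", "            print('Request failed:', e)\n", "        time.sleep(5 * (attempt + 1))\n", "    return None" ] },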
{ "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Retrieved data for 11697\n", "Retrieved 216 records and 51408 data points with 238.0 points for each record...\n", "Done - Data dumped to json file\n" ] } ], "source": [ "#If this block was run successfully for a given table and geography, don't rerun - the next block pulls from the saved json\n", "datalist=[]\n", "if geotype=='zctas':\n", "    for g in geoids:\n", "        georecord=getdata()\n", "        print('Retrieved data for',g)\n", "        if len(datalist)==0:\n", "            datalist.append(georecord[0])\n", "            datalist.append(georecord[1])\n", "        else:\n", "            datalist.append(georecord[1])\n", "elif geotype=='pumas':\n", "    datalist=getdata()\n", "elif geotype=='tracts':\n", "    for county in geoids:\n", "        georecord=getdata()\n", "        print('Retrieved data for',county)\n", "        if len(datalist)==0:\n", "            for tract in georecord:\n", "                datalist.append(tract)\n", "        else:\n", "            for tract in georecord[1:]:\n", "                datalist.append(tract)\n", "\n", "dlrows=len(datalist)\n", "dlitems=sum(len(x) for x in datalist)\n", "dlbyrow=dlitems / dlrows\n", "print('Retrieved', dlrows, 'records and', dlitems, 'data points with', dlbyrow, 'points for each record...')\n", "\n", "with open(jsonpath, 'w') as f:\n", "    json.dump(datalist, f)\n", "print('Done - Data dumped to json file')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Process Data\n", "Replace footnote values with nulls, create a new GEOID2 column, and replace census variable names with database variable names." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "Index: 215 entries, 8600000US10001 to 8600000US11697\n", "Columns: 237 entries, GEOLABEL to DP05_0087PM\n", "dtypes: object(237)\n", "memory usage: 399.8+ KB\n" ] } ], "source": [ "with open(jsonpath, 'r') as f:\n", "    jsondata=json.load(f)\n", "alldata = pd.DataFrame(jsondata[1:],columns=jsondata[0],dtype=object).rename(columns={\n", "    'GEO_ID':'GEOID','NAME':'GEOLABEL'}).set_index('GEOID')\n", "alldata.info()\n", "# Index and column counts should each be 1 less than the previous counts (the header row and index column are excluded)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n",
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
GEOLABELDP04_0001EDP04_0001MDP04_0001PEDP04_0001PMDP04_0002EDP04_0002MDP04_0002PEDP04_0002PMDP04_0003E...DP05_0082PEDP05_0082PMDP05_0083EDP05_0083MDP05_0083PEDP05_0083PMDP05_0087EDP05_0087MDP05_0087PEDP05_0087PM
GEOID
8600000US10001ZCTA5 100011414153914141-8888888881243152187.92.21710...0.50.53091331.30.61744198117441-888888888
8600000US10002ZCTA5 100023572445335724-8888888883354061493.91.22184...0.20.215164452.00.653042171553042-888888888
8600000US10003ZCTA5 100033087637830876-8888888882612470384.62.14752...0.30.210743042.00.644049156044049-888888888
\n", "

3 rows × 237 columns

\n", "
" ], "text/plain": [ " GEOLABEL DP04_0001E DP04_0001M DP04_0001PE DP04_0001PM \\\n", "GEOID \n", "8600000US10001 ZCTA5 10001 14141 539 14141 -888888888 \n", "8600000US10002 ZCTA5 10002 35724 453 35724 -888888888 \n", "8600000US10003 ZCTA5 10003 30876 378 30876 -888888888 \n", "\n", " DP04_0002E DP04_0002M DP04_0002PE DP04_0002PM DP04_0003E ... \\\n", "GEOID ... \n", "8600000US10001 12431 521 87.9 2.2 1710 ... \n", "8600000US10002 33540 614 93.9 1.2 2184 ... \n", "8600000US10003 26124 703 84.6 2.1 4752 ... \n", "\n", " DP05_0082PE DP05_0082PM DP05_0083E DP05_0083M DP05_0083PE \\\n", "GEOID \n", "8600000US10001 0.5 0.5 309 133 1.3 \n", "8600000US10002 0.2 0.2 1516 445 2.0 \n", "8600000US10003 0.3 0.2 1074 304 2.0 \n", "\n", " DP05_0083PM DP05_0087E DP05_0087M DP05_0087PE DP05_0087PM \n", "GEOID \n", "8600000US10001 0.6 17441 981 17441 -888888888 \n", "8600000US10002 0.6 53042 1715 53042 -888888888 \n", "8600000US10003 0.6 44049 1560 44049 -888888888 \n", "\n", "[3 rows x 237 columns]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alldata.head(3)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "#This is a lousy solution, come up with something better in the future\n", "footnotes=['-999999999','-999999999.0', '-999999999.00',\n", " '-888888888','-888888888.0', '-888888888.00',\n", " '-666666666','-666666666.0', '-666666666.00',\n", " '-555555555','-555555555.0', '-555555555.00',\n", " '-333333333','-333333333.0', '-333333333.00',\n", " '-222222222','-222222222.0', '-222222222.00']\n", "alldata.replace(footnotes,np.nan,inplace=True)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
GEOID2GEOLABELDP04_0001EDP04_0001MDP04_0001PEDP04_0001PMDP04_0002EDP04_0002MDP04_0002PEDP04_0002PM...DP05_0082PEDP05_0082PMDP05_0083EDP05_0083MDP05_0083PEDP05_0083PMDP05_0087EDP05_0087MDP05_0087PEDP05_0087PM
GEOID
8600000US1000110001ZCTA5 100011414153914141NaN1243152187.92.2...0.50.53091331.30.61744198117441NaN
8600000US1000210002ZCTA5 100023572445335724NaN3354061493.91.2...0.20.215164452.00.653042171553042NaN
8600000US1000310003ZCTA5 100033087637830876NaN2612470384.62.1...0.30.210743042.00.644049156044049NaN
\n", "

3 rows × 238 columns

\n", "
" ], "text/plain": [ " GEOID2 GEOLABEL DP04_0001E DP04_0001M DP04_0001PE \\\n", "GEOID \n", "8600000US10001 10001 ZCTA5 10001 14141 539 14141 \n", "8600000US10002 10002 ZCTA5 10002 35724 453 35724 \n", "8600000US10003 10003 ZCTA5 10003 30876 378 30876 \n", "\n", " DP04_0001PM DP04_0002E DP04_0002M DP04_0002PE DP04_0002PM \\\n", "GEOID \n", "8600000US10001 NaN 12431 521 87.9 2.2 \n", "8600000US10002 NaN 33540 614 93.9 1.2 \n", "8600000US10003 NaN 26124 703 84.6 2.1 \n", "\n", " ... DP05_0082PE DP05_0082PM DP05_0083E DP05_0083M DP05_0083PE \\\n", "GEOID ... \n", "8600000US10001 ... 0.5 0.5 309 133 1.3 \n", "8600000US10002 ... 0.2 0.2 1516 445 2.0 \n", "8600000US10003 ... 0.3 0.2 1074 304 2.0 \n", "\n", " DP05_0083PM DP05_0087E DP05_0087M DP05_0087PE DP05_0087PM \n", "GEOID \n", "8600000US10001 0.6 17441 981 17441 NaN \n", "8600000US10002 0.6 53042 1715 53042 NaN \n", "8600000US10003 0.6 44049 1560 44049 NaN \n", "\n", "[3 rows x 238 columns]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "idxgeoid2={'zctas':-5, 'pumas':-7,'tracts':-11}\n", "alldata.insert(loc=0, column='GEOID2',value=alldata.index.str[idxgeoid2.get(geotype):])\n", "alldata.head(3)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(215, 238)" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# For PUMAS filter all the geotype for the state by local areas\n", "if geotype == 'pumas':\n", " acsdata=alldata.loc[alldata.GEOID2.isin(geoids)].copy().astype(object).sort_index()\n", "else:\n", " acsdata=alldata.copy().astype(object).sort_index()\n", "acsdata.shape" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('DP05_0001M', 'SXAG01_M'),\n", " ('DP05_0036PM', 'RACE02_PM'),\n", " ('DP05_0039M', 'RACE05_M'),\n", " ('DP04_0134E', 'RENT09_E'),\n", " ('DP05_0037PM', 'RACE03_PM')]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Dictionary of column names from the census and the nyc geodatabase\n", "cv_to_db=dict(zip(dfexcel.census_var, dfexcel.db_var))\n", "random.sample(cv_to_db.items(), 5)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
GEOID2GEOLABELHOC01_EHOC01_MHOC01_PCHOC01_PMHOC02_EHOC02_MHOC02_PCHOC02_PM...HISL09_PCHISL09_PMHISL10_EHISL10_MHISL10_PCHISL10_PMVOTE01_EVOTE01_MVOTE01_PCVOTE01_PM
GEOID
8600000US1000110001ZCTA5 100011414153914141NaN1243152187.92.2...0.50.53091331.30.61744198117441NaN
8600000US1000210002ZCTA5 100023572445335724NaN3354061493.91.2...0.20.215164452.00.653042171553042NaN
8600000US1000310003ZCTA5 100033087637830876NaN2612470384.62.1...0.30.210743042.00.644049156044049NaN
\n", "

3 rows × 238 columns

\n", "
" ], "text/plain": [ " GEOID2 GEOLABEL HOC01_E HOC01_M HOC01_PC HOC01_PM HOC02_E \\\n", "GEOID \n", "8600000US10001 10001 ZCTA5 10001 14141 539 14141 NaN 12431 \n", "8600000US10002 10002 ZCTA5 10002 35724 453 35724 NaN 33540 \n", "8600000US10003 10003 ZCTA5 10003 30876 378 30876 NaN 26124 \n", "\n", " HOC02_M HOC02_PC HOC02_PM ... HISL09_PC HISL09_PM HISL10_E \\\n", "GEOID ... \n", "8600000US10001 521 87.9 2.2 ... 0.5 0.5 309 \n", "8600000US10002 614 93.9 1.2 ... 0.2 0.2 1516 \n", "8600000US10003 703 84.6 2.1 ... 0.3 0.2 1074 \n", "\n", " HISL10_M HISL10_PC HISL10_PM VOTE01_E VOTE01_M VOTE01_PC \\\n", "GEOID \n", "8600000US10001 133 1.3 0.6 17441 981 17441 \n", "8600000US10002 445 2.0 0.6 53042 1715 53042 \n", "8600000US10003 304 2.0 0.6 44049 1560 44049 \n", "\n", " VOTE01_PM \n", "GEOID \n", "8600000US10001 NaN \n", "8600000US10002 NaN \n", "8600000US10003 NaN \n", "\n", "[3 rows x 238 columns]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Rename the census variables to nyc geodatabase variables\n", "acsdata.rename(columns=cv_to_db,inplace=True)\n", "acsdata.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Write to Database\n", "Update list of variables and data types, build create table string, create datatable in temporary database.\n" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
db_varcensus_varcensus_labeldtype
0HOC01_EDP04_0001EEstimate - HOUSING OCCUPANCY - Total housing u...INTEGER
1HOC01_MDP04_0001MMargin of Error - HOUSING OCCUPANCY - Total ho...INTEGER
2HOC01_PCDP04_0001PEPercent Estimate - HOUSING OCCUPANCY - Total h...INTEGER
3HOC01_PMDP04_0001PMPercent Margin of Error - HOUSING OCCUPANCY - ...INTEGER
4HOC02_EDP04_0002EEstimate - HOUSING OCCUPANCY - Total housing u...INTEGER
\n", "
" ], "text/plain": [ " db_var census_var census_label \\\n", "0 HOC01_E DP04_0001E Estimate - HOUSING OCCUPANCY - Total housing u... \n", "1 HOC01_M DP04_0001M Margin of Error - HOUSING OCCUPANCY - Total ho... \n", "2 HOC01_PC DP04_0001PE Percent Estimate - HOUSING OCCUPANCY - Total h... \n", "3 HOC01_PM DP04_0001PM Percent Margin of Error - HOUSING OCCUPANCY - ... \n", "4 HOC02_E DP04_0002E Estimate - HOUSING OCCUPANCY - Total housing u... \n", "\n", " dtype \n", "0 INTEGER \n", "1 INTEGER \n", "2 INTEGER \n", "3 INTEGER \n", "4 INTEGER " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfexcel.replace({'dtype': {'int': 'INTEGER', 'float': 'REAL'}},inplace=True)\n", "dfexcel.census_label.replace({'!!': ' - '},inplace=True, regex=True)\n", "dfexcel.head()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('HISL09_PM',\n", " ['DP05_0082PM',\n", " 'Percent Margin of Error - HISPANIC OR LATINO AND RACE - Total population - Not Hispanic or Latino - Some other race alone',\n", " 'REAL']),\n", " ('HISL03_E',\n", " ['DP05_0076E',\n", " 'Estimate - HISPANIC OR LATINO AND RACE - Total population - Not Hispanic or Latino',\n", " 'INTEGER'])]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vardict=dfexcel.set_index('db_var').T.to_dict('list')\n", "random.sample(vardict.items(), 2)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "con = sqlite3.connect(dbname) \n", "cur = con.cursor()" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "CREATE TABLE zctas_2018acs2 (\n", "GEOID TEXT,\n", "GEOID2 TEXT NOT NULL PRIMARY KEY,\n", "GEOLABEL TEXT,\n", "HOC01_E INTEGER, \n", "HOC01_M INTEGER, \n", "HOC01_PC INTEGER, \n", "HOC01_PM INTEGER, \n", "HOC02_E INTEGER, \n", "HOC02_M INTEGER, \n", "HOC02_PC REAL, \n", "HOC02_PM REAL, \n", "HOC03_E INTEGER, \n", "HOC03_M INTEGER, \n", "HOC03_PC REAL, \n", "HOC03_PM REAL, \n", "HOC04_E REAL, \n", "HOC04_M REAL, \n", "HOC04_PC INTEGER, \n", "HOC04_PM INTEGER, \n", "HOC05_E REAL, \n", "HOC05_M REAL, \n", "HOC05_PC INTEGER, \n", "HOC05_PM INTEGER, \n", "HTEN01_E INTEGER, \n", "HTEN01_M INTEGER, \n", "HTEN01_PC INTEGER, \n", "HTEN01_PM INTEGER, \n", "HTEN02_E INTEGER, \n", "HTEN02_M INTEGER, \n", "HTEN02_PC REAL, \n", "HTEN02_PM REAL, \n", "HTEN03_E INTEGER, \n", "HTEN03_M INTEGER, \n", "HTEN03_PC REAL, \n", "HTEN03_PM REAL, \n", "HVAL01_E INTEGER, \n", "HVAL01_M INTEGER, \n", "HVAL01_PC INTEGER, \n", "HVAL01_PM INTEGER, \n", "MORT01_E INTEGER, \n", "MORT01_M INTEGER, \n", "MORT01_PC INTEGER, \n", "MORT01_PM INTEGER, \n", "MORT02_E INTEGER, \n", "MORT02_M INTEGER, \n", "MORT02_PC REAL, \n", "MORT02_PM REAL, \n", "MORT03_E INTEGER, \n", "MORT03_M INTEGER, \n", "MORT03_PC REAL, \n", "MORT03_PM REAL, \n", "RENT01_E INTEGER, \n", "RENT01_M INTEGER, \n", "RENT01_PC INTEGER, \n", "RENT01_PM INTEGER, \n", "RENT02_E INTEGER, \n", "RENT02_M INTEGER, \n", "RENT02_PC REAL, \n", "RENT02_PM REAL, \n", "RENT03_E INTEGER, \n", "RENT03_M INTEGER, \n", "RENT03_PC REAL, \n", "RENT03_PM REAL, \n", "RENT04_E INTEGER, \n", "RENT04_M INTEGER, \n", "RENT04_PC REAL, \n", "RENT04_PM REAL, \n", "RENT05_E INTEGER, \n", "RENT05_M INTEGER, \n", "RENT05_PC REAL, \n", "RENT05_PM REAL, \n", "RENT06_E INTEGER, \n", "RENT06_M INTEGER, \n", "RENT06_PC REAL, \n", "RENT06_PM REAL, \n", "RENT07_E INTEGER, \n", "RENT07_M 
INTEGER, \n", "RENT07_PC REAL, \n", "RENT07_PM REAL, \n", "RENT08_E INTEGER, \n", "RENT08_M INTEGER, \n", "RENT08_PC REAL, \n", "RENT08_PM REAL, \n", "RENT09_E INTEGER, \n", "RENT09_M INTEGER, \n", "RENT09_PC INTEGER, \n", "RENT09_PM INTEGER, \n", "RENT10_E INTEGER, \n", "RENT10_M INTEGER, \n", "RENT10_PC INTEGER, \n", "RENT10_PM INTEGER, \n", "SXAG01_E INTEGER, \n", "SXAG01_M INTEGER, \n", "SXAG01_PC INTEGER, \n", "SXAG01_PM INTEGER, \n", "SXAG02_E INTEGER, \n", "SXAG02_M INTEGER, \n", "SXAG02_PC REAL, \n", "SXAG02_PM REAL, \n", "SXAG03_E INTEGER, \n", "SXAG03_M INTEGER, \n", "SXAG03_PC REAL, \n", "SXAG03_PM REAL, \n", "SXAG04_E INTEGER, \n", "SXAG04_M INTEGER, \n", "SXAG04_PC REAL, \n", "SXAG04_PM REAL, \n", "SXAG05_E INTEGER, \n", "SXAG05_M INTEGER, \n", "SXAG05_PC REAL, \n", "SXAG05_PM REAL, \n", "SXAG06_E INTEGER, \n", "SXAG06_M INTEGER, \n", "SXAG06_PC REAL, \n", "SXAG06_PM REAL, \n", "SXAG07_E INTEGER, \n", "SXAG07_M INTEGER, \n", "SXAG07_PC REAL, \n", "SXAG07_PM REAL, \n", "SXAG08_E INTEGER, \n", "SXAG08_M INTEGER, \n", "SXAG08_PC REAL, \n", "SXAG08_PM REAL, \n", "SXAG09_E INTEGER, \n", "SXAG09_M INTEGER, \n", "SXAG09_PC REAL, \n", "SXAG09_PM REAL, \n", "SXAG10_E INTEGER, \n", "SXAG10_M INTEGER, \n", "SXAG10_PC REAL, \n", "SXAG10_PM REAL, \n", "SXAG11_E INTEGER, \n", "SXAG11_M INTEGER, \n", "SXAG11_PC REAL, \n", "SXAG11_PM REAL, \n", "SXAG12_E INTEGER, \n", "SXAG12_M INTEGER, \n", "SXAG12_PC REAL, \n", "SXAG12_PM REAL, \n", "SXAG13_E INTEGER, \n", "SXAG13_M INTEGER, \n", "SXAG13_PC REAL, \n", "SXAG13_PM REAL, \n", "SXAG14_E INTEGER, \n", "SXAG14_M INTEGER, \n", "SXAG14_PC REAL, \n", "SXAG14_PM REAL, \n", "SXAG15_E INTEGER, \n", "SXAG15_M INTEGER, \n", "SXAG15_PC REAL, \n", "SXAG15_PM REAL, \n", "SXAG16_E INTEGER, \n", "SXAG16_M INTEGER, \n", "SXAG16_PC REAL, \n", "SXAG16_PM REAL, \n", "SXAG17_E REAL, \n", "SXAG17_M REAL, \n", "SXAG17_PC INTEGER, \n", "SXAG17_PM INTEGER, \n", "RACE01_E INTEGER, \n", "RACE01_M INTEGER, \n", "RACE01_PC INTEGER, \n", "RACE01_PM INTEGER, \n", "RACE02_E INTEGER, \n", "RACE02_M INTEGER, \n", "RACE02_PC REAL, \n", "RACE02_PM REAL, \n", "RACE03_E INTEGER, \n", "RACE03_M INTEGER, \n", "RACE03_PC REAL, \n", "RACE03_PM REAL, \n", "RACE04_E INTEGER, \n", "RACE04_M INTEGER, \n", "RACE04_PC REAL, \n", "RACE04_PM REAL, \n", "RACE05_E INTEGER, \n", "RACE05_M INTEGER, \n", "RACE05_PC REAL, \n", "RACE05_PM REAL, \n", "RACE06_E INTEGER, \n", "RACE06_M INTEGER, \n", "RACE06_PC REAL, \n", "RACE06_PM REAL, \n", "RACE07_E INTEGER, \n", "RACE07_M INTEGER, \n", "RACE07_PC REAL, \n", "RACE07_PM REAL, \n", "RACE08_E INTEGER, \n", "RACE08_M INTEGER, \n", "RACE08_PC REAL, \n", "RACE08_PM REAL, \n", "RACE09_E INTEGER, \n", "RACE09_M INTEGER, \n", "RACE09_PC REAL, \n", "RACE09_PM REAL, \n", "HISL01_E INTEGER, \n", "HISL01_M INTEGER, \n", "HISL01_PC INTEGER, \n", "HISL01_PM INTEGER, \n", "HISL02_E INTEGER, \n", "HISL02_M INTEGER, \n", "HISL02_PC REAL, \n", "HISL02_PM REAL, \n", "HISL03_E INTEGER, \n", "HISL03_M INTEGER, \n", "HISL03_PC REAL, \n", "HISL03_PM REAL, \n", "HISL04_E INTEGER, \n", "HISL04_M INTEGER, \n", "HISL04_PC REAL, \n", "HISL04_PM REAL, \n", "HISL05_E INTEGER, \n", "HISL05_M INTEGER, \n", "HISL05_PC REAL, \n", "HISL05_PM REAL, \n", "HISL06_E INTEGER, \n", "HISL06_M INTEGER, \n", "HISL06_PC REAL, \n", "HISL06_PM REAL, \n", "HISL07_E INTEGER, \n", "HISL07_M INTEGER, \n", "HISL07_PC REAL, \n", "HISL07_PM REAL, \n", "HISL08_E INTEGER, \n", "HISL08_M INTEGER, \n", "HISL08_PC REAL, \n", "HISL08_PM REAL, \n", "HISL09_E INTEGER, \n", "HISL09_M INTEGER, \n", "HISL09_PC 
REAL, \n", "HISL09_PM REAL, \n", "HISL10_E INTEGER, \n", "HISL10_M INTEGER, \n", "HISL10_PC REAL, \n", "HISL10_PM REAL, \n", "VOTE01_E INTEGER, \n", "VOTE01_M INTEGER, \n", "VOTE01_PC INTEGER, \n", "VOTE01_PM INTEGER);\n" ] } ], "source": [ "cur.execute('DROP TABLE IF EXISTS {}'.format(tabname))\n", "dbstring=\"\"\"\n", "CREATE TABLE {} (\n", "GEOID TEXT,\n", "GEOID2 TEXT NOT NULL PRIMARY KEY,\n", "GEOLABEL TEXT,\n", "\"\"\".format(tabname)\n", "\n", "#Append one 'column-name datatype' pair per variable, then trim the trailing comma and close the statement\n", "for k,v in vardict.items():\n", "    dbstring=dbstring+k+' '+v[2]+', \\n'\n", "\n", "dbstring=dbstring[:-3]\n", "dbstring=dbstring+');'\n", "print(dbstring)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cur.execute(dbstring)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "acsdata.to_sql(name=tabname, if_exists='append', index=True, con=con)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "215 records written to zctas_2018acs2\n" ] } ], "source": [ "cur.execute('SELECT COUNT(*) FROM {};'.format(tabname))\n", "rows = cur.fetchone()\n", "print(rows[0], 'records written to', tabname)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "239 columns written to zctas_2018acs2\n" ] } ], "source": [ "cur.execute('SELECT * FROM {} LIMIT 1;'.format(tabname))\n", "col_names = [cn[0] for cn in cur.description]\n", "print(len(col_names), 'columns written to', tabname)\n", "#Number should be the same as the column count of df acsdata plus 1, since the index is not included in the df count" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "con.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Metadata Table\n", "DO NOT RERUN THIS SECTION FOR MULTIPLE GEOGRAPHIES. In the NYC Geodatabase there is only one metadata table for all of the ACS tables (acs1 and acs2) for all geographies. For whichever geography is processed first, set the action variable to 'create' and run this entire series of blocks for the acs1 table. For the acs2 table, set the action variable to 'append' and skip the table creation and identifier insertion blocks."
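, "\n", "As a worked example of what the insertion block below writes (using the acs2 worksheet and the first housing variable seen in the outputs above), the vardict entry\n", "```python\n", "('HOC01_E', ['DP04_0001E', 'Estimate - HOUSING OCCUPANCY - Total housing units', 'INTEGER'])\n", "```\n", "becomes the metadata row\n", "```python\n", "('acs2', 'HOC01', 'HOUSING OCCUPANCY - Total housing units')\n", "```\n", "since mk[:-2] drops the _E suffix and mv[1][11:] slices the leading 'Estimate - ' off the label."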
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Change table name and specify an action - you're creating the table for the first time with acs1 variables, \n", "#or appending the tables with acs2 variables\n", "\n", "metatab='acslookup2018'\n", "action='append' # 'create' or 'append'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "con = sqlite3.connect(dbname) \n", "cur = con.cursor()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Only run this block when creating initial table\n", "if action=='create':\n", " mdstring=\"\"\"\n", " CREATE TABLE {} (\n", " tabnum TEXT,\n", " est_id TEXT,\n", " est_value TEXT);\n", " \"\"\".format(metatab)\n", " cur.execute(mdstring)\n", "else:\n", " print('Block not executed because \"create\" not selected as an action in earlier block')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Only run this block when creating initial table\n", "if action=='create':\n", " exstring=\"\"\"\n", " INSERT INTO {} VALUES('both','NOTE','Each variable has 4 values that are identified by a particular suffix: E for estimate, M for margin of error for the estimate, PC for percent total, and PM for margin of error for the percent total');\n", " INSERT INTO {} VALUES('both','GEOID','Id');\n", " INSERT INTO {} VALUES('both','GEOID2','Id2');\n", " INSERT INTO {} VALUES('both','GEOLABEL','Geography');\n", " \"\"\".format(metatab,metatab,metatab,metatab)\n", " cur.executescript(exstring)\n", " con.commit()\n", "else:\n", " print('Block not executed because \"create\" not selected as an action in earlier block')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Run when creating table or when appending records\n", "#Keys and values - db ids and labels - are simplified and truncated to 1 entry for each 4-column group (E,M,PC,PM)\n", "if action in ('create','append'):\n", " for mk, mv in vardict.items():\n", " if mk.endswith('_E'):\n", " cur.execute(\"INSERT INTO {} values(?,?,?)\".format(metatab),(worksheet,mk[:-2],mv[1][11:]))\n", " con.commit()\n", "else:\n", " print('Block not executed because action not specified in earlier block')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cur.execute('SELECT COUNT(*) FROM {};'.format(metatab))\n", "rows = cur.fetchone()\n", "print(rows[0], 'records in', metatab)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "action=''\n", "con.close()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }