{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lab 11: Part 2 - Food Inspection Forecasting: Data processing\n", "This file is an ipython notebook with [`R-magic`](https://ipython.org/ipython-doc/1/config/extensions/rmagic.html) to convert the data from Rds (the R programming language data dtorage sytem) to `csv` to be read into Python. If you ever find yourself in a bind with R code available for you... give `R-magic` a try. \n", "\n", "\n", "## **HUGE NOTE: All code here is taken from the [food-inspections-evaluation]( https://github.com/Chicago/food-inspections-evaluation) repository** \n", "### They did a great job at cleaning the data in R so I don't want to repeat work.\n", "\n", "All code and data is available on GitHub:\n", "https://github.com/Chicago/food-inspections-evaluation" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The rpy2.ipython extension is already loaded. To reload it, use:\n", " %reload_ext rpy2.ipython\n" ] } ], "source": [ "import rpy2\n", "import pandas as pd\n", "%load_ext rpy2.ipython" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%R\n", "# change to your local clone\n", "data_dir = '~/food-inspections-evaluation/'\n", "out_dir = '~'\n", "\n", "library(\"data.table\", \"ggplot2\")\n", "\n", "setwd(data_dir)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Food Inspection database processing" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%R\n", "food = readRDS(\"DATA/food_inspections.Rds\")\n", "write.csv(food, file = paste(out_dir, '/food_inspections.csv', sep = ''), row.names = FALSE)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Model Dataframe processing" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%R\n", "dat = readRDS(\"DATA/dat_model.Rds\")\n", "write.csv(dat, file = paste(out_dir, '/dat_model.csv', sep = ''))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "Classes ‘data.table’ and 'data.frame':\t18712 obs. of 16 variables:\n", " $ Inspectorblue : num 0 1 1 1 1 0 0 0 0 0 ...\n", " $ Inspectorbrown : num 0 0 0 0 0 0 0 0 0 0 ...\n", " $ Inspectorgreen : num 1 0 0 0 0 0 0 0 0 0 ...\n", " $ Inspectororange : num 0 0 0 0 0 1 1 1 1 1 ...\n", " $ Inspectorpurple : num 0 0 0 0 0 0 0 0 0 0 ...\n", " $ Inspectoryellow : num 0 0 0 0 0 0 0 0 0 0 ...\n", " $ pastSerious : num 0 0 0 0 0 0 0 0 0 0 ...\n", " $ pastCritical : num 0 0 0 0 0 0 0 0 0 0 ...\n", " $ timeSinceLast : num 2 2 2 2 2 2 2 2 2 2 ...\n", " $ ageAtInspection : num 1 1 1 1 1 1 0 1 1 0 ...\n", " $ consumption_on_premises_incidental_activity: num 0 0 0 0 0 0 0 0 0 0 ...\n", " $ tobacco_retail_over_counter : num 1 0 0 0 0 0 0 0 0 0 ...\n", " $ temperatureMax : num 53.5 59 59 56.2 52.7 ...\n", " $ heat_burglary : num 26.99 13.98 12.61 35.91 9.53 ...\n", " $ heat_sanitation : num 37.75 15.41 8.32 38.19 2.13 ...\n", " $ heat_garbage : num 12.8 12.9 8 26.2 3.4 ...\n", " - attr(*, \".internal.selfref\")= \n", "[1] 18712\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%R\n", "dat <- readRDS(\"DATA/dat_model.Rds\")\n", "\n", "## Only keep \"Retail Food Establishment\"\n", "dat <- dat[LICENSE_DESCRIPTION == \"Retail Food Establishment\"]\n", "## Remove License Description\n", "dat$LICENSE_DESCRIPTION <- NULL\n", "dat <- na.omit(dat)\n", "\n", "## Add criticalFound variable to dat:\n", "dat$criticalFound <- pmin(1, dat$criticalCount)\n", "\n", "# ## Set the key for dat\n", "setkey(dat, Inspection_ID)\n", "\n", "# Match time period of original results\n", "# dat <- dat[Inspection_Date < \"2013-09-01\" | Inspection_Date > \"2014-07-01\"]\n", "\n", "#==============================================================================\n", "# CREATE MODEL DATA\n", "#==============================================================================\n", "# sort(colnames(dat))\n", "\n", "xmat <- dat[ , list(Inspector = Inspector_Assigned,\n", " pastSerious = pmin(pastSerious, 1),\n", " pastCritical = pmin(pastCritical, 1),\n", " timeSinceLast,\n", " ageAtInspection = ifelse(ageAtInspection > 4, 1L, 0L),\n", " consumption_on_premises_incidental_activity,\n", " tobacco_retail_over_counter,\n", " temperatureMax,\n", " heat_burglary = pmin(heat_burglary, 70),\n", " heat_sanitation = pmin(heat_sanitation, 70),\n", " heat_garbage = pmin(heat_garbage, 50),\n", " # Facility_Type,\n", " criticalFound),\n", " keyby = Inspection_ID]\n", "mm <- model.matrix(criticalFound ~ . -1, data=xmat[ , -1, with=F])\n", "mm <- as.data.table(mm)\n", "str(mm)\n", "colnames(mm)\n", "\n", "#==============================================================================\n", "# CREATE TEST / TRAIN PARTITIONS\n", "#==============================================================================\n", "# 2014-07-01 is an easy separator\n", "\n", "dat[Inspection_Date < \"2014-07-01\", range(Inspection_Date)]\n", "dat[Inspection_Date > \"2014-07-01\", range(Inspection_Date)]\n", "\n", "iiTrain <- dat[ , which(Inspection_Date < \"2014-07-01\")]\n", "iiTest <- dat[ , which(Inspection_Date > \"2014-07-01\")]\n", "\n", "## Check to see if any rows didn't make it through the model.matrix formula\n", "nrow(dat)\n", "nrow(xmat)\n", "nrow(mm)\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%R\n", "# Output Model Matrix and Target\n", "write.csv(mm, file = paste(out_dir, '/model_matrix.csv', sep = ''), row.names = FALSE)\n", "write.csv(xmat$criticalFound, file = paste(out_dir, '/TARGET.csv', sep = ''), row.names = FALSE)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }