{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Incubator Challenge\n", "\n", "This notebook lists the questions from the 2019 Data Incubator Challenge.\n", "\n", "## Table of Contents\n", "\n", "1. [Section 1: New York City Fires](#Section-1:-New-York-City-Fires)\n", "2. [Section 2: Cars in a circular road](#Section-2:-Cars-in-a-circular-road)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Section 1: New York City Fires\n", "#### ([Back to Top](#Table-of-Contents))\n", "The New York City Fire Department keeps a log of detailed information on incidents handled by FDNY units. In this challenge we will work with a dataset that contains a record of incidents handled by FDNY units from 2013-2017. Download the [FDNY data set](https://data.cityofnewyork.us/api/views/tm6d-hbzd/rows.csv?accessType=DOWNLOAD). Also take a look at the [dataset landing page](https://data.cityofnewyork.us/Public-Safety/Incidents-Responded-to-by-Fire-Companies/tm6d-hbzd) and find [descriptions of the column names](https://data.cityofnewyork.us/api/views/tm6d-hbzd/files/1434d09c-fbf8-4450-8b42-9fe0c3b85fb3?download=true&filename=OPEN_DATA_FIRE_INCIDENTS_FILE_DESCRIPTION.xls).\n", "\n", "1. What proportion of FDNY responses in this dataset correspond to the most common type of incident?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. What is the ratio of the average number of units that arrive at the scene of an incident classified as `111 - Building fire` to the average number that arrive for `651 - Smoke scare, odor of smoke`?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. How many times more likely is an incident in Staten Island to be a false call than an incident in Manhattan? 
The answer should be the ratio of the Staten Island false call rate to the Manhattan false call rate. A false call is an incident for which `INCIDENT_TYPE_DESC` is `710 - Malicious, mischievous false call, other`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. Check the distribution of the number of minutes between the time a `111 - Building fire` incident is logged into the Computer Aided Dispatch system and the time at which the first unit arrives on scene. \n", " * What is the third quartile of that distribution? **Note:** the number of minutes can be fractional (i.e., do not round)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "scrolled": false }, "source": [ "5. We can use the FDNY dataset to investigate at what time of the day people cook most. \n", " * For every hour of the day, compute the proportion of all incidents that are cooking fires by dividing the number of cooking fires in that hour by the total number of incidents that occurred in that hour.\n", " * Find the hour of the day that has the highest proportion of cooking fires and submit that proportion of cooking fires. A cooking fire is an incident for which `INCIDENT_TYPE_DESC` is `113 - Cooking fire, confined to container`. **Note:** round incident times down. For example, if an incident occurred at 22:55, it occurred in hour 22." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "6. What is the coefficient of determination (R squared) between the number of residents at each zip code and the number of incidents whose type is classified as `111 - Building fire` at each of those zip codes? 
**Note:** The 2010 US Census population by zip code dataset should be downloaded from here (https://s3.amazonaws.com/SplitwiseBlogJB/2010+Census+Population+By+Zipcode+(ZCTA).csv). You will need to use both the FDNY responses and the US Census dataset. Ignore zip codes that do not appear in the census table." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "7. For this question, only consider incidents that have information about whether a CO detector was present or not. We are interested in how many times more likely it is that an incident is long when no CO detector is present compared to when a CO detector is present. For events with a CO detector and for those without one, compute the proportion of incidents that lasted $20-30, 30-40, 40-50, 50-60,$ and $60-70$ minutes (both interval boundary values included) by dividing the number of incidents in each time interval by the total number of incidents. \n", " * For each bin, compute the ratio of the `CO detector absent` frequency to the `CO detector present` frequency.\n", " * Perform a linear regression of this ratio against the midpoints of the bins.\n", " * From this, what is the predicted ratio for events lasting $39$ minutes?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "8. Calculate the chi-square test statistic for testing whether an incident is more likely to last longer than 60 minutes when a CO detector is not present. Again, only consider incidents that have information about whether a CO detector was present or not." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "9. Please provide the script used to generate this result (max 10000 characters)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Section 2: Cars in a circular road\n", "#### ([Back to Top](#Table-of-Contents))\n", "A circular road has $N$ positions labeled $0$ through $N-1$, where adjacent positions are connected to each other and position $N-1$ is connected to $0$. $M$ cars start at positions $0$ through $M-1$ (inclusive). A car can make a valid move by moving forward one position (or going from $N-1$ to $0$) if the position it is moving into is empty. At each turn, consider only the cars that have a valid move available and make one of those valid moves, chosen randomly with equal probability. After $T$ rounds, we compute the average ($A$) and standard deviation ($S$) of the positions of the cars.\n", "1. What is the expected value of $A$ when $N = 10, M = 5,$ and $T = 20$?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. What is the standard deviation of $A$ when $N = 10, M = 5,$ and $T = 20$?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. What is the expected value of $S$ when $N = 10, M = 5,$ and $T = 20$?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. What is the standard deviation of $S$ when $N = 10, M = 5,$ and $T = 20$?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "5. What is the expected value of $A$ when $N = 25, M = 10,$ and $T = 50$?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "6. What is the standard deviation of $A$ when $N = 25, M = 10,$ and $T = 50$?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "7. What is the expected value of $S$ when $N = 25, M = 10,$ and $T = 50$?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "8. What is the standard deviation of $S$ when $N = 25, M = 10,$ and $T = 50$?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "9. Please provide the script used to generate this result (max 10000 characters)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 2 }