{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd\n", "# pd.set_option('max_colwidth', 50)\n", "# set this if you need to" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Health Department has developed an inspection report and scoring system. \n", "After conducting an inspection of the facility, the Health Inspector calculates a score based on the violations observed. \n", "Violations can fall into:\n", "- **high risk category**: records specific violations that directly relate to the transmission of food borne illnesses,the adulteration of food products and the contamination of food-contact surfaces.\n", "- **moderate risk category**: records specific violations that are of a moderate risk to the public health and safety.\n", "- **low risk category**: records violations that are low risk or have no immediate risk to the public health and safety." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "businesses = pd.read_csv('./data/businesses_plus.csv', parse_dates=True, dtype={'phone_number': str})\n", "businesses.head()\n", "# dtype casts the column as a specific data type" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "inspections = pd.read_csv('./data/inspections_plus.csv', parse_dates=True)\n", "inspections.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "violations = pd.read_csv('./data/violations_plus.csv', parse_dates=True)\n", "violations.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# 1 Combine the three dataframes into one data frame called restaurant_scores\n", "# Hint: http://pandas.pydata.org/pandas-docs/stable/merging.html" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# 2 Which ten business have had the most inspections?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 3 Group and count the inspections by type" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# 4 Create a plot that shows number of inspections per month\n", "# Bonus for creating a heatmap\n", "# http://stanford.edu/~mwaskom/software/seaborn/generated/seaborn.heatmap.html?highlight=heatmap" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 5 Which zip code contains the most high risk violations?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 6 If inspection is prompted by a change in restaurant ownership, \n", "# is the inspection more likely to be categorized as higher or lower risk?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 7 Examining the descriptions, what is the most common violation?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 8 Create a hist of the scores with 10 bins" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 9 Can you predict risk category based on the other features in this dataset?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 10 Extra Credit:\n", "# Use Instagram location API to find pictures taken at the lat, long of the most High Risk restaurant\n", "# https://www.instagram.com/developer/endpoints/locations/" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "############################\n", "### A Little More Morbid ###\n", "############################" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "killings = pd.read_csv('./data/police-killings.csv')\n", "killings.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 1. Make the following changed to column names:\n", "# lawenforcementagency -> agency\n", "# raceethnicity -> race" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 2. Show the count of missing values in each column" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 3. replace each null value in the dataframe with the string \"Unknown\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 4. How many killings were there so far in 2015?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 5. Of all killings, how many were male and how many female?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 6. How many killings were of unarmed people?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 7. What percentage of all killings were unarmed?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 8. What are the 5 states with the most killings?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 9. Show a value counts of deaths for each race" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 10. Display a histogram of ages of all killings" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 11. Show 6 histograms of ages by race" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 12. What is the average age of death by race?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 13. Show a bar chart with counts of deaths every month" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "###################\n", "### Less Morbid ###\n", "###################" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "majors = pd.read_csv('./data/college-majors.csv')\n", "majors.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 1. Delete the columns (employed_full_time_year_round, major_code)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 2. Show the cout of missing values in each column" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 3. What are the top 10 highest paying majors?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 4. Plot the data from the last question in a bar chart, include proper title, and labels!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 5. What is the average median salary for each major category?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 6. Show only the top 5 paying major categories" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 7. Plot a histogram of the distribution of median salaries" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 8. Plot a histogram of the distribution of median salaries by major category" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 9. What are the top 10 most UNemployed majors?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# What are the unemployment rates?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 10. What are the top 10 most UNemployed majors CATEGORIES? Use the mean for each category\n", "# What are the unemployment rates?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 11. the total and employed column refer to the people that were surveyed.\n", "# Create a new column showing the emlpoyment rate of the people surveyed for each major\n", "# call it \"sample_employment_rate\"\n", "# Example the first row has total: 128148 and employed: 90245. it's \n", "# sample_employment_rate should be 90245.0 / 128148.0 = .7042" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 12. Create a \"sample_unemployment_rate\" column\n", "# this column should be 1 - \"sample_employment_rate\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [sfdat26-env]", "language": "python", "name": "Python [sfdat26-env]" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.12" } }, "nbformat": 4, "nbformat_minor": 0 }