{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "**[SQL Home Page](https://www.kaggle.com/learn/intro-to-sql)**\n", "\n", "---\n", "\n", "*This exercise involves you writing code, and we check it automatically to tell you if it's right. We're having a temporary problem with out checking infrastructure, causing a bar that says `None` in some cases when you have the right answer. We're sorry. We're fixing it. In the meantime, if you see a bar saying `None` that means you've done something good.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction\n", "\n", "The first test of your new data exploration skills uses data describing crime in the city of Chicago.\n", "\n", "Before you get started, run the following cell. It sets up the automated feedback system to review your answers." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using Kaggle's public dataset BigQuery integration.\n", "Setup Complete\n" ] } ], "source": [ "# Set up feedack system\n", "from learntools.core import binder\n", "binder.bind(globals())\n", "from learntools.sql.ex1 import *\n", "print(\"Setup Complete\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use the next code cell to fetch the dataset." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using Kaggle's public dataset BigQuery integration.\n" ] } ], "source": [ "from google.cloud import bigquery\n", "\n", "# Create a \"Client\" object\n", "client = bigquery.Client()\n", "\n", "# Construct a reference to the \"chicago_crime\" dataset\n", "dataset_ref = client.dataset(\"chicago_crime\", project=\"bigquery-public-data\")\n", "\n", "# API request - fetch the dataset\n", "dataset = client.get_dataset(dataset_ref)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercises\n", "\n", "### 1) Count tables in the dataset\n", "\n", "How many tables are in the Chicago Crime dataset?" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['crime']\n" ] } ], "source": [ "# Write the code you need here to figure out the answer\n", "table_ids = [x.table_id for x in client.list_tables(dataset)]\n", "print(table_ids)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "parent.postMessage({\"jupyterEvent\": \"custom.exercise_interaction\", \"data\": {\"outcomeType\": 1, \"valueTowardsCompletion\": 0.3333333333333333, \"interactionType\": 1, \"questionType\": 1, \"questionId\": \"1_CountTables\", \"learnToolsVersion\": \"0.3.4\", \"failureMessage\": \"\", \"exceptionClass\": \"\", \"trace\": \"\"}}, \"*\")" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "Correct" ], "text/plain": [ "Correct" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "num_tables = len(table_ids) # Store the answer as num_tables and then run this cell\n", "\n", "q_1.check()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a hint or the solution, uncomment the appropriate line below." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "#q_1.hint()\n", "#q_1.solution()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2) Explore the table schema\n", "\n", "How many columns in the `crime` table have `TIMESTAMP` data?" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[SchemaField('unique_key', 'INTEGER', 'REQUIRED', 'Unique identifier for the record.', ()),\n", " SchemaField('case_number', 'STRING', 'NULLABLE', 'The Chicago Police Department RD Number (Records Division Number), which is unique to the incident.', ()),\n", " SchemaField('date', 'TIMESTAMP', 'NULLABLE', 'Date when the incident occurred. this is sometimes a best estimate.', ()),\n", " SchemaField('block', 'STRING', 'NULLABLE', 'The partially redacted address where the incident occurred, placing it on the same block as the actual address.', ()),\n", " SchemaField('iucr', 'STRING', 'NULLABLE', 'The Illinois Unifrom Crime Reporting code. This is directly linked to the Primary Type and Description. See the list of IUCR codes at https://data.cityofchicago.org/d/c7ck-438e.', ()),\n", " SchemaField('primary_type', 'STRING', 'NULLABLE', 'The primary description of the IUCR code.', ()),\n", " SchemaField('description', 'STRING', 'NULLABLE', 'The secondary description of the IUCR code, a subcategory of the primary description.', ()),\n", " SchemaField('location_description', 'STRING', 'NULLABLE', 'Description of the location where the incident occurred.', ()),\n", " SchemaField('arrest', 'BOOLEAN', 'NULLABLE', 'Indicates whether an arrest was made.', ()),\n", " SchemaField('domestic', 'BOOLEAN', 'NULLABLE', 'Indicates whether the incident was domestic-related as defined by the Illinois Domestic Violence Act.', ()),\n", " SchemaField('beat', 'INTEGER', 'NULLABLE', 'Indicates the beat where the incident occurred. A beat is the smallest police geographic area – each beat has a dedicated police beat car. Three to five beats make up a police sector, and three sectors make up a police district. The Chicago Police Department has 22 police districts. See the beats at https://data.cityofchicago.org/d/aerh-rz74.', ()),\n", " SchemaField('district', 'INTEGER', 'NULLABLE', 'Indicates the police district where the incident occurred. See the districts at https://data.cityofchicago.org/d/fthy-xz3r.', ()),\n", " SchemaField('ward', 'INTEGER', 'NULLABLE', 'The ward (City Council district) where the incident occurred. See the wards at https://data.cityofchicago.org/d/sp34-6z76.', ()),\n", " SchemaField('community_area', 'INTEGER', 'NULLABLE', 'Indicates the community area where the incident occurred. Chicago has 77 community areas. See the community areas at https://data.cityofchicago.org/d/cauq-8yn6.', ()),\n", " SchemaField('fbi_code', 'STRING', 'NULLABLE', \"Indicates the crime classification as outlined in the FBI's National Incident-Based Reporting System (NIBRS). See the Chicago Police Department listing of these classifications at http://gis.chicagopolice.org/clearmap_crime_sums/crime_types.html.\", ()),\n", " SchemaField('x_coordinate', 'FLOAT', 'NULLABLE', 'The x coordinate of the location where the incident occurred in State Plane Illinois East NAD 1983 projection. This location is shifted from the actual location for partial redaction but falls on the same block.', ()),\n", " SchemaField('y_coordinate', 'FLOAT', 'NULLABLE', 'The y coordinate of the location where the incident occurred in State Plane Illinois East NAD 1983 projection. This location is shifted from the actual location for partial redaction but falls on the same block.', ()),\n", " SchemaField('year', 'INTEGER', 'NULLABLE', 'Year the incident occurred.', ()),\n", " SchemaField('updated_on', 'TIMESTAMP', 'NULLABLE', 'Date and time the record was last updated.', ()),\n", " SchemaField('latitude', 'FLOAT', 'NULLABLE', 'The latitude of the location where the incident occurred. This location is shifted from the actual location for partial redaction but falls on the same block.', ()),\n", " SchemaField('longitude', 'FLOAT', 'NULLABLE', 'The longitude of the location where the incident occurred. This location is shifted from the actual location for partial redaction but falls on the same block.', ()),\n", " SchemaField('location', 'STRING', 'NULLABLE', 'The location where the incident occurred in a format that allows for creation of maps and other geographic operations on this data portal. This location is shifted from the actual location for partial redaction but falls on the same block.', ())]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Write the code to figure out the answer\n", "# Construct a reference to the \"crime\" table\n", "table_ref = dataset_ref.table(\"crime\")\n", "\n", "# API request - fetch the table\n", "table = client.get_table(table_ref)\n", "\n", "# Print information on all the columns in the \"crime\" table in the \"chicago_crime\" dataset\n", "table.schema" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "parent.postMessage({\"jupyterEvent\": \"custom.exercise_interaction\", \"data\": {\"outcomeType\": 1, \"valueTowardsCompletion\": 0.3333333333333333, \"interactionType\": 1, \"questionType\": 1, \"questionId\": \"2_CountTimestampFields\", \"learnToolsVersion\": \"0.3.4\", \"failureMessage\": \"\", \"exceptionClass\": \"\", \"trace\": \"\"}}, \"*\")" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "Correct" ], "text/plain": [ "Correct" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# List of field of type TIMESTAMP\n", "timestamp_fields = [x for x in table.schema if x.field_type == 'TIMESTAMP']\n", "num_timestamp_fields = len(timestamp_fields) # Put your answer here\n", "\n", "q_2.check()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a hint or the solution, uncomment the appropriate line below." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "#q_2.hint()\n", "#q_2.solution()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3) Create a crime map\n", "\n", "If you wanted to create a map with a dot at the location of each crime, what are the names of the two fields you likely need to pull out of the `crime` table to plot the crimes on a map?" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
unique_keycase_numberdateblockiucrprimary_typedescriptionlocation_descriptionarrestdomestic...wardcommunity_areafbi_codex_coordinatey_coordinateyearupdated_onlatitudelongitudelocation
010272397HY4605272015-10-09 00:00:00+00:00111XX S KEDZIE AVE1752OFFENSE INVOLVING CHILDRENAGG CRIM SEX ABUSE FAM MEMBERRESIDENCEFalseTrue...197420NoneNone20152016-04-15 11:16:05+00:00NoneNoneNone
111297332JB2398542017-08-01 12:00:00+00:00053XX N NAGLE AVE1153DECEPTIVE PRACTICEFINANCIAL IDENTITY THEFT OVER $ 300RESIDENCEFalseFalse...411011NoneNone20172018-04-28 16:03:06+00:00NoneNoneNone
24903025HM5188002004-01-15 08:00:00+00:00052XX W FOSTER AVE1120DECEPTIVE PRACTICEFORGERYRESIDENCETrueFalse...451110NoneNone20042015-08-17 15:03:40+00:00NoneNoneNone
37781803HS5877662009-08-01 00:00:00+00:00071XX W DEVON AVE1751OFFENSE INVOLVING CHILDRENCRIM SEX ABUSE BY FAM MEMBERRESIDENCETrueTrue...411020NoneNone20092015-08-17 15:03:40+00:00NoneNoneNone
410691033HZ4435652015-09-29 09:00:00+00:00021XX S ARCHER AVE1154DECEPTIVE PRACTICEFINANCIAL IDENTITY THEFT $300 AND UNDERSMALL RETAIL STOREFalseFalse...253411NoneNone20152016-09-23 15:50:53+00:00NoneNoneNone
\n", "

5 rows × 22 columns

\n", "
" ], "text/plain": [ " unique_key case_number date block iucr \\\n", "0 10272397 HY460527 2015-10-09 00:00:00+00:00 111XX S KEDZIE AVE 1752 \n", "1 11297332 JB239854 2017-08-01 12:00:00+00:00 053XX N NAGLE AVE 1153 \n", "2 4903025 HM518800 2004-01-15 08:00:00+00:00 052XX W FOSTER AVE 1120 \n", "3 7781803 HS587766 2009-08-01 00:00:00+00:00 071XX W DEVON AVE 1751 \n", "4 10691033 HZ443565 2015-09-29 09:00:00+00:00 021XX S ARCHER AVE 1154 \n", "\n", " primary_type description \\\n", "0 OFFENSE INVOLVING CHILDREN AGG CRIM SEX ABUSE FAM MEMBER \n", "1 DECEPTIVE PRACTICE FINANCIAL IDENTITY THEFT OVER $ 300 \n", "2 DECEPTIVE PRACTICE FORGERY \n", "3 OFFENSE INVOLVING CHILDREN CRIM SEX ABUSE BY FAM MEMBER \n", "4 DECEPTIVE PRACTICE FINANCIAL IDENTITY THEFT $300 AND UNDER \n", "\n", " location_description arrest domestic ... ward community_area fbi_code \\\n", "0 RESIDENCE False True ... 19 74 20 \n", "1 RESIDENCE False False ... 41 10 11 \n", "2 RESIDENCE True False ... 45 11 10 \n", "3 RESIDENCE True True ... 41 10 20 \n", "4 SMALL RETAIL STORE False False ... 25 34 11 \n", "\n", " x_coordinate y_coordinate year updated_on latitude \\\n", "0 None None 2015 2016-04-15 11:16:05+00:00 None \n", "1 None None 2017 2018-04-28 16:03:06+00:00 None \n", "2 None None 2004 2015-08-17 15:03:40+00:00 None \n", "3 None None 2009 2015-08-17 15:03:40+00:00 None \n", "4 None None 2015 2016-09-23 15:50:53+00:00 None \n", "\n", " longitude location \n", "0 None None \n", "1 None None \n", "2 None None \n", "3 None None \n", "4 None None \n", "\n", "[5 rows x 22 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Write the code here to explore the data so you can find the answer\n", "client.list_rows(table, max_results=5).to_dataframe()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "parent.postMessage({\"jupyterEvent\": \"custom.exercise_interaction\", \"data\": {\"outcomeType\": 1, \"valueTowardsCompletion\": 0.3333333333333333, \"interactionType\": 1, \"questionType\": 2, \"questionId\": \"3_IdentifyFieldsForPlotting\", \"learnToolsVersion\": \"0.3.4\", \"failureMessage\": \"\", \"exceptionClass\": \"\", \"trace\": \"\"}}, \"*\")" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "Correct" ], "text/plain": [ "Correct" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fields_for_plotting = ['latitude', 'longitude'] # Put your answers here\n", "\n", "q_3.check()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a hint or the solution, uncomment the appropriate line below." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "#q_3.hint()\n", "#q_3.solution()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Thinking about the question above, there are a few columns that appear to have geographic data. Look at a few values (with the `list_rows()` command) to see if you can determine their relationship. Two columns will still be hard to interpret. But it should be obvious how the `location` column relates to `latitude` and `longitude`." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
latitudelongitudelocation
0NoneNoneNone
1NoneNoneNone
2NoneNoneNone
3NoneNoneNone
4NoneNoneNone
\n", "
" ], "text/plain": [ " latitude longitude location\n", "0 None None None\n", "1 None None None\n", "2 None None None\n", "3 None None None\n", "4 None None None" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Scratch space for your code\n", "client.list_rows(table, selected_fields=table.schema[-3:], max_results=5).to_dataframe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Keep going\n", "\n", "You've looked at the schema, but you haven't yet done anything exciting with the data itself. Things get more interesting when you get to the data, so keep going to **[write your first SQL query](https://www.kaggle.com/dansbecker/select-from-where).**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "**[SQL Home Page](https://www.kaggle.com/learn/intro-to-sql)**\n", "\n", "\n", "\n", "\n", "\n", "*Have questions or comments? Visit the [Learn Discussion forum](https://www.kaggle.com/learn-forum) to chat with other Learners.*" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 4 }