{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "f13923fd-5ece-41e8-ad79-06a0a8972672",
"showTitle": false,
"title": ""
}
},
"source": [
"# SafeGraph SIGSPATIAL 2021 Workshop\n",
"\n",
"## Understand Consumer Behavior With Verified Foot Traffic Data\n",
"\n",
"**Background** Visitor and demographic aggregation data provide essential context on population behavior. How often a place of interest is visited, how long do visitors stay, where did they come from, and where are they going? The answers are invaluable in numerous industries. Building financial indicators, city and urban planning, public health indicators, or identifying your primary business competitors all require accurate, high quality population and POI data. \n",
"\n",
"**Objective** Our workshop’s objective is to provide professionals, researchers, and practitioners interested in deriving human movement patterns from location data. We use a sample of our Weekly and Monthly Patterns and Core Places products to perform market research on a potential new coffee shop location. We’ll address these concerns and more in building a market analysis proposal in real time. \n",
"\n",
"**Questions to Answer** \n",
"- How far are customers willing to travel for coffee? \n",
"- What location will receive the most visibility? \n",
"- Where do most of the coffee customers come from?"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "6ff40847-a1f5-434e-89e9-099b086340d8",
"showTitle": false,
"title": ""
}
},
"source": [
"## Notebook Setup"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING: An illegal reflective access operation has occurred\n",
"WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/media/hdd1/code/spark-3.1.2-bin-hadoop3.2/jars/spark-unsafe_2.12-3.1.2.jar) to constructor java.nio.DirectByteBuffer(long,int)\n",
"WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform\n",
"WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations\n",
"WARNING: All illegal access operations will be denied in a future release\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
":: loading settings :: url = jar:file:/media/hdd1/code/spark-3.1.2-bin-hadoop3.2/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Ivy Default Cache set to: /home/jiayu/.ivy2/cache\n",
"The jars for the packages stored in: /home/jiayu/.ivy2/jars\n",
"org.apache.sedona#sedona-python-adapter-3.0_2.12 added as a dependency\n",
"org.datasyslab#geotools-wrapper added as a dependency\n",
":: resolving dependencies :: org.apache.spark#spark-submit-parent-f959fc69-d6af-4223-b017-6334535799e9;1.0\n",
"\tconfs: [default]\n",
"\tfound org.apache.sedona#sedona-python-adapter-3.0_2.12;1.2.0-incubating in central\n",
"\tfound org.locationtech.jts#jts-core;1.18.0 in local-m2-cache\n",
"\tfound org.wololo#jts2geojson;0.16.1 in central\n",
"\tfound com.fasterxml.jackson.core#jackson-databind;2.12.2 in central\n",
"\tfound com.fasterxml.jackson.core#jackson-annotations;2.12.2 in central\n",
"\tfound com.fasterxml.jackson.core#jackson-core;2.12.2 in central\n",
"\tfound org.apache.sedona#sedona-core-3.0_2.12;1.2.0-incubating in central\n",
"\tfound org.scala-lang.modules#scala-collection-compat_2.12;2.5.0 in central\n",
"\tfound org.apache.sedona#sedona-sql-3.0_2.12;1.2.0-incubating in central\n",
"\tfound org.datasyslab#geotools-wrapper;1.1.0-25.2 in central\n",
":: resolution report :: resolve 322ms :: artifacts dl 7ms\n",
"\t:: modules in use:\n",
"\tcom.fasterxml.jackson.core#jackson-annotations;2.12.2 from central in [default]\n",
"\tcom.fasterxml.jackson.core#jackson-core;2.12.2 from central in [default]\n",
"\tcom.fasterxml.jackson.core#jackson-databind;2.12.2 from central in [default]\n",
"\torg.apache.sedona#sedona-core-3.0_2.12;1.2.0-incubating from central in [default]\n",
"\torg.apache.sedona#sedona-python-adapter-3.0_2.12;1.2.0-incubating from central in [default]\n",
"\torg.apache.sedona#sedona-sql-3.0_2.12;1.2.0-incubating from central in [default]\n",
"\torg.datasyslab#geotools-wrapper;1.1.0-25.2 from central in [default]\n",
"\torg.locationtech.jts#jts-core;1.18.0 from local-m2-cache in [default]\n",
"\torg.scala-lang.modules#scala-collection-compat_2.12;2.5.0 from central in [default]\n",
"\torg.wololo#jts2geojson;0.16.1 from central in [default]\n",
"\t:: evicted modules:\n",
"\torg.locationtech.jts#jts-core;1.18.1 by [org.locationtech.jts#jts-core;1.18.0] in [default]\n",
"\t---------------------------------------------------------------------\n",
"\t| | modules || artifacts |\n",
"\t| conf | number| search|dwnlded|evicted|| number|dwnlded|\n",
"\t---------------------------------------------------------------------\n",
"\t| default | 11 | 0 | 0 | 1 || 10 | 0 |\n",
"\t---------------------------------------------------------------------\n",
":: retrieving :: org.apache.spark#spark-submit-parent-f959fc69-d6af-4223-b017-6334535799e9\n",
"\tconfs: [default]\n",
"\t0 artifacts copied, 10 already retrieved (0kB/8ms)\n",
"2022-05-22 13:07:11,104 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\n",
"Setting default log level to \"WARN\".\n",
"To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\n"
]
}
],
"source": [
"from pyspark.sql import SparkSession\n",
"from sedona.register import SedonaRegistrator\n",
"from sedona.utils import SedonaKryoRegistrator, KryoSerializer\n",
"\n",
"spark = SparkSession. \\\n",
" builder. \\\n",
" appName('sigspatial2021'). \\\n",
" master(\"spark://data-ocean-lab-1:7077\").\\\n",
" config(\"spark.serializer\", KryoSerializer.getName).\\\n",
" config(\"spark.kryo.registrator\", SedonaKryoRegistrator.getName) .\\\n",
" config('spark.jars.packages',\n",
" 'org.apache.sedona:sedona-python-adapter-3.0_2.12:1.2.0-incubating,'\n",
" 'org.datasyslab:geotools-wrapper:1.1.0-25.2'). \\\n",
" getOrCreate()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "b148206f-8930-4a38-bbcc-7be781f099ce",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"import pyspark.sql.functions as f\n",
"from pyspark.sql.types import MapType, StringType, IntegerType\n",
"\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "f01fd7cd-55f3-4a21-b8d6-bc57655e1056",
"showTitle": false,
"title": ""
}
},
"source": [
"### Load in SafeGraph sample data from s3\n",
"The covers all coffee shops (`category_tag.contains(\"Coffee Shop\")`) in Seattle (`county_FIPS.isin([\"53033\",\"53053\",\"53061\"]`), with multiple rows per POI corresponding to monthly foot traffic since the beginning of 2018 (1 month per row per POI).\n",
"\n",
"Columns are SafeGraph Core, Geo, and Patterns pre-joined together with `placekey` as the join key."
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "4a3b905c-d951-4607-afd8-8d997316303c",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
}
],
"source": [
"sample_csv_path = 'file:///media/hdd1/code/sigspatial-2021-cafe-analysis/data/seattle_coffee_monthly_patterns/'\n",
"sample = (\n",
" spark.read.option(\"header\", \"true\").option(\"escape\", \"\\\"\").csv(sample_csv_path)\n",
" .withColumn('date_range_start', f.to_date(f.col('date_range_start')))\n",
" .withColumn('date_range_end', f.to_date(f.col('date_range_end')))\n",
" .withColumn('visitor_home_cbgs', f.from_json('visitor_home_cbgs', schema = MapType(StringType(), IntegerType())))\n",
" .withColumn(\"distance_from_home\", f.col(\"distance_from_home\"))\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "6b261aaf-614c-4313-b928-455783a3b432",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of coffe shops patterns: 66607\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of coffe shops: 1679\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" placekey | \n",
" safegraph_place_id | \n",
" parent_placekey | \n",
" parent_safegraph_place_id | \n",
" safegraph_brand_ids | \n",
" location_name | \n",
" brands | \n",
" store_id | \n",
" top_category | \n",
" sub_category | \n",
" ... | \n",
" distance_from_home | \n",
" median_dwell | \n",
" bucketed_dwell_times | \n",
" related_same_day_brand | \n",
" related_same_month_brand | \n",
" popularity_by_hour | \n",
" popularity_by_day | \n",
" device_type | \n",
" carrier_name | \n",
" county_fips | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 226-222@5x3-t3j-ch5 | \n",
" sg:290a6c59dd3f4b28a1c086a510e40944 | \n",
" None | \n",
" None | \n",
" None | \n",
" Black Gold Coffee Company | \n",
" None | \n",
" None | \n",
" Restaurants and Other Eating Places | \n",
" Snack and Nonalcoholic Beverage Bars | \n",
" ... | \n",
" 11416 | \n",
" 20.0 | \n",
" {\"<5\":3,\"5-10\":30,\"11-20\":12,\"21-60\":14,\"61-12... | \n",
" {\"CENEX\":10,\"Starbucks\":9,\"Safeway Pharmacy\":9... | \n",
" {\"McDonald's\":44,\"Walmart\":37,\"Starbucks\":37,\"... | \n",
" [9,9,9,9,9,12,11,9,15,20,15,22,16,12,11,11,8,7... | \n",
" {\"Monday\":13,\"Tuesday\":20,\"Wednesday\":8,\"Thurs... | \n",
" {\"android\":25,\"ios\":14} | \n",
" {\"AT&T\":5,\"Sprint\":4,\"T-Mobile\":9,\"Verizon\":14} | \n",
" 53033 | \n",
"
\n",
" \n",
" 1 | \n",
" 224-222@5x4-49y-8gk | \n",
" sg:33d6c187ce6e419cbcb09f0b47ed6236 | \n",
" None | \n",
" None | \n",
" None | \n",
" Bite Box | \n",
" None | \n",
" None | \n",
" Restaurants and Other Eating Places | \n",
" Snack and Nonalcoholic Beverage Bars | \n",
" ... | \n",
" 2737 | \n",
" 41.0 | \n",
" {\"<5\":4,\"5-10\":28,\"11-20\":12,\"21-60\":30,\"61-12... | \n",
" {\"Safeway Pharmacy\":6,\"Ace Hardware\":5,\"ARCO\":... | \n",
" {\"Starbucks\":41,\"Shell Oil\":33,\"Safeway Pharma... | \n",
" [8,10,10,9,8,12,13,19,22,25,30,34,39,32,21,20,... | \n",
" {\"Monday\":17,\"Tuesday\":25,\"Wednesday\":15,\"Thur... | \n",
" {\"android\":20,\"ios\":51} | \n",
" {\"AT&T\":14,\"Sprint\":1,\"T-Mobile\":14,\"Verizon\":35} | \n",
" 53033 | \n",
"
\n",
" \n",
" 2 | \n",
" 228-222@5x2-t6c-qfz | \n",
" sg:442c1606a0d54d2aa017ea5a89ea8d78 | \n",
" zzw-223@5x2-t6c-qfz | \n",
" sg:a6e078c59b154cd59d8ac9e1f6f6311c | \n",
" SG_BRAND_f116acfe9147494063e58da666d1d57e | \n",
" Starbucks | \n",
" Starbucks | \n",
" 26209-243989 | \n",
" Restaurants and Other Eating Places | \n",
" Snack and Nonalcoholic Beverage Bars | \n",
" ... | \n",
" 5074 | \n",
" 9.0 | \n",
" {\"<5\":22,\"5-10\":165,\"11-20\":47,\"21-60\":50,\"61-... | \n",
" {\"Walmart\":7,\"Safeway Fuel Station\":4,\"McDonal... | \n",
" {\"Walmart\":55,\"McDonald's\":45,\"Costco\":35,\"She... | \n",
" [9,7,6,6,6,10,17,22,26,47,61,51,39,35,39,48,26... | \n",
" {\"Monday\":52,\"Tuesday\":60,\"Wednesday\":38,\"Thur... | \n",
" {\"android\":100,\"ios\":160} | \n",
" {\"AT&T\":64,\"Sprint\":9,\"T-Mobile\":62,\"Verizon\":98} | \n",
" 53061 | \n",
"
\n",
" \n",
" 3 | \n",
" zzw-22b@5x4-4yr-gtv | \n",
" sg:5b5977ecc8814452bb5630d0b1b56ae1 | \n",
" None | \n",
" None | \n",
" None | \n",
" Cozy Bubble Tea | \n",
" None | \n",
" None | \n",
" Restaurants and Other Eating Places | \n",
" Snack and Nonalcoholic Beverage Bars | \n",
" ... | \n",
" None | \n",
" 90.0 | \n",
" {\"<5\":0,\"5-10\":0,\"11-20\":0,\"21-60\":1,\"61-120\":... | \n",
" {\"Hallmark Cards\":20,\"76\":20} | \n",
" {\"Costco Gasoline\":60,\"Costco\":60,\"Starbucks\":... | \n",
" [1,1,1,1,1,1,1,1,1,1,1,0,1,0,0,0,0,0,0,1,2,2,2,2] | \n",
" {\"Monday\":0,\"Tuesday\":1,\"Wednesday\":1,\"Thursda... | \n",
" {\"android\":4,\"ios\":4} | \n",
" {\"AT&T\":1,\"T-Mobile\":2,\"Verizon\":1} | \n",
" 53033 | \n",
"
\n",
" \n",
" 4 | \n",
" zzw-224@5x4-4bc-d5f | \n",
" sg:6af014f98e9f471b85b8e3eacb58b2f7 | \n",
" None | \n",
" None | \n",
" SG_BRAND_f116acfe9147494063e58da666d1d57e | \n",
" Starbucks | \n",
" Starbucks | \n",
" 3278-4859 | \n",
" Restaurants and Other Eating Places | \n",
" Snack and Nonalcoholic Beverage Bars | \n",
" ... | \n",
" 9319 | \n",
" 45.0 | \n",
" {\"<5\":36,\"5-10\":201,\"11-20\":119,\"21-60\":175,\"6... | \n",
" {\"Potbelly Sandwich Works\":1,\"Kung Fu Tea\":1,\"... | \n",
" {\"Chevron\":18,\"Costco\":15,\"Shell Oil\":13,\"McDo... | \n",
" [202,186,187,192,187,196,235,254,272,267,259,2... | \n",
" {\"Monday\":181,\"Tuesday\":171,\"Wednesday\":127,\"T... | \n",
" {\"android\":136,\"ios\":289} | \n",
" {\"AT&T\":41,\"Sprint\":2,\"T-Mobile\":69,\"Verizon\":91} | \n",
" 53033 | \n",
"
\n",
" \n",
"
\n",
"
5 rows × 51 columns
\n",
"
"
],
"text/plain": [
" placekey safegraph_place_id \\\n",
"0 226-222@5x3-t3j-ch5 sg:290a6c59dd3f4b28a1c086a510e40944 \n",
"1 224-222@5x4-49y-8gk sg:33d6c187ce6e419cbcb09f0b47ed6236 \n",
"2 228-222@5x2-t6c-qfz sg:442c1606a0d54d2aa017ea5a89ea8d78 \n",
"3 zzw-22b@5x4-4yr-gtv sg:5b5977ecc8814452bb5630d0b1b56ae1 \n",
"4 zzw-224@5x4-4bc-d5f sg:6af014f98e9f471b85b8e3eacb58b2f7 \n",
"\n",
" parent_placekey parent_safegraph_place_id \\\n",
"0 None None \n",
"1 None None \n",
"2 zzw-223@5x2-t6c-qfz sg:a6e078c59b154cd59d8ac9e1f6f6311c \n",
"3 None None \n",
"4 None None \n",
"\n",
" safegraph_brand_ids location_name \\\n",
"0 None Black Gold Coffee Company \n",
"1 None Bite Box \n",
"2 SG_BRAND_f116acfe9147494063e58da666d1d57e Starbucks \n",
"3 None Cozy Bubble Tea \n",
"4 SG_BRAND_f116acfe9147494063e58da666d1d57e Starbucks \n",
"\n",
" brands store_id top_category \\\n",
"0 None None Restaurants and Other Eating Places \n",
"1 None None Restaurants and Other Eating Places \n",
"2 Starbucks 26209-243989 Restaurants and Other Eating Places \n",
"3 None None Restaurants and Other Eating Places \n",
"4 Starbucks 3278-4859 Restaurants and Other Eating Places \n",
"\n",
" sub_category ... distance_from_home median_dwell \\\n",
"0 Snack and Nonalcoholic Beverage Bars ... 11416 20.0 \n",
"1 Snack and Nonalcoholic Beverage Bars ... 2737 41.0 \n",
"2 Snack and Nonalcoholic Beverage Bars ... 5074 9.0 \n",
"3 Snack and Nonalcoholic Beverage Bars ... None 90.0 \n",
"4 Snack and Nonalcoholic Beverage Bars ... 9319 45.0 \n",
"\n",
" bucketed_dwell_times \\\n",
"0 {\"<5\":3,\"5-10\":30,\"11-20\":12,\"21-60\":14,\"61-12... \n",
"1 {\"<5\":4,\"5-10\":28,\"11-20\":12,\"21-60\":30,\"61-12... \n",
"2 {\"<5\":22,\"5-10\":165,\"11-20\":47,\"21-60\":50,\"61-... \n",
"3 {\"<5\":0,\"5-10\":0,\"11-20\":0,\"21-60\":1,\"61-120\":... \n",
"4 {\"<5\":36,\"5-10\":201,\"11-20\":119,\"21-60\":175,\"6... \n",
"\n",
" related_same_day_brand \\\n",
"0 {\"CENEX\":10,\"Starbucks\":9,\"Safeway Pharmacy\":9... \n",
"1 {\"Safeway Pharmacy\":6,\"Ace Hardware\":5,\"ARCO\":... \n",
"2 {\"Walmart\":7,\"Safeway Fuel Station\":4,\"McDonal... \n",
"3 {\"Hallmark Cards\":20,\"76\":20} \n",
"4 {\"Potbelly Sandwich Works\":1,\"Kung Fu Tea\":1,\"... \n",
"\n",
" related_same_month_brand \\\n",
"0 {\"McDonald's\":44,\"Walmart\":37,\"Starbucks\":37,\"... \n",
"1 {\"Starbucks\":41,\"Shell Oil\":33,\"Safeway Pharma... \n",
"2 {\"Walmart\":55,\"McDonald's\":45,\"Costco\":35,\"She... \n",
"3 {\"Costco Gasoline\":60,\"Costco\":60,\"Starbucks\":... \n",
"4 {\"Chevron\":18,\"Costco\":15,\"Shell Oil\":13,\"McDo... \n",
"\n",
" popularity_by_hour \\\n",
"0 [9,9,9,9,9,12,11,9,15,20,15,22,16,12,11,11,8,7... \n",
"1 [8,10,10,9,8,12,13,19,22,25,30,34,39,32,21,20,... \n",
"2 [9,7,6,6,6,10,17,22,26,47,61,51,39,35,39,48,26... \n",
"3 [1,1,1,1,1,1,1,1,1,1,1,0,1,0,0,0,0,0,0,1,2,2,2,2] \n",
"4 [202,186,187,192,187,196,235,254,272,267,259,2... \n",
"\n",
" popularity_by_day \\\n",
"0 {\"Monday\":13,\"Tuesday\":20,\"Wednesday\":8,\"Thurs... \n",
"1 {\"Monday\":17,\"Tuesday\":25,\"Wednesday\":15,\"Thur... \n",
"2 {\"Monday\":52,\"Tuesday\":60,\"Wednesday\":38,\"Thur... \n",
"3 {\"Monday\":0,\"Tuesday\":1,\"Wednesday\":1,\"Thursda... \n",
"4 {\"Monday\":181,\"Tuesday\":171,\"Wednesday\":127,\"T... \n",
"\n",
" device_type \\\n",
"0 {\"android\":25,\"ios\":14} \n",
"1 {\"android\":20,\"ios\":51} \n",
"2 {\"android\":100,\"ios\":160} \n",
"3 {\"android\":4,\"ios\":4} \n",
"4 {\"android\":136,\"ios\":289} \n",
"\n",
" carrier_name county_fips \n",
"0 {\"AT&T\":5,\"Sprint\":4,\"T-Mobile\":9,\"Verizon\":14} 53033 \n",
"1 {\"AT&T\":14,\"Sprint\":1,\"T-Mobile\":14,\"Verizon\":35} 53033 \n",
"2 {\"AT&T\":64,\"Sprint\":9,\"T-Mobile\":62,\"Verizon\":98} 53061 \n",
"3 {\"AT&T\":1,\"T-Mobile\":2,\"Verizon\":1} 53033 \n",
"4 {\"AT&T\":41,\"Sprint\":2,\"T-Mobile\":69,\"Verizon\":91} 53033 \n",
"\n",
"[5 rows x 51 columns]"
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(\"Number of coffe shops patterns: \", sample.count())\n",
"print(\"Number of coffe shops: \", sample.select(\"placekey\").distinct().count())\n",
"sample.limit(10).toPandas().head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "3212b739-9ab0-40ac-8800-a6c9b7b617a1",
"showTitle": false,
"title": ""
}
},
"source": [
"## Exploratory Data Analysis and Visualization"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "6f5cfe88-b2f7-4f42-b8f6-1cb34aef79ad",
"showTitle": false,
"title": ""
}
},
"source": [
"### Visualize the coffee shops"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "3e44811d-87da-465a-8ab3-ae29ad7b9a39",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"from pyspark.sql.window import Window\n",
"import geopandas as gpd\n",
"import folium"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "2e47587c-7a3f-47b6-b8de-2b5eb685fff6",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"w = Window().partitionBy('placekey').orderBy(f.col('date_range_start').desc())\n",
"\n",
"cafes_latest = (\n",
" sample\n",
" # as our data improves, addresses or geocodes for a given location may change over time\n",
" # use a window function to keep only the most recent appearance of the given cafe\n",
" .withColumn('row_num', f.row_number().over(w))\n",
" .filter(f.col('row_num') == 1)\n",
" # select the columns we need for mapping\n",
" .select('placekey', 'location_name', 'brands', 'street_address', 'city', 'region', 'postal_code', 'latitude', 'longitude', 'open_hours')\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "7e021a76-d6bc-4141-b8e0-07ab7945e5e5",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
}
],
"source": [
"# create a geopandas geodataframe\n",
"cafes_gdf = cafes_latest.toPandas()\n",
"cafes_gdf = gpd.GeoDataFrame(cafes_gdf, geometry = gpd.points_from_xy(cafes_gdf['longitude'], cafes_gdf['latitude']), crs = 'EPSG:4326')"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "3ffcf731-aff1-4fd5-a125-a70bdb969212",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"def map_cafes(gdf):\n",
" \n",
" # map bounds\n",
" sw = [gdf.unary_union.bounds[1], gdf.unary_union.bounds[0]]\n",
" ne = [gdf.unary_union.bounds[3], gdf.unary_union.bounds[2]]\n",
" folium_bounds = [sw, ne]\n",
" \n",
" # map\n",
" x = gdf.centroid.x[0]\n",
" y = gdf.centroid.y[0]\n",
" \n",
" map_ = folium.Map(\n",
" location = [y, x],\n",
" tiles = \"OpenStreetMap\"\n",
" )\n",
" \n",
" for i, point in gdf.iterrows():\n",
" \n",
" tooltip = f\"placekey: {point['placekey']}
location_name: {point['location_name']}
brands: {point['brands']}
street_address: {point['street_address']}
city: {point['city']}
region: {point['region']}
postal_code: {point['postal_code']}
open_hours: {point['open_hours']}\"\n",
" \n",
" folium.Circle(\n",
" [point['geometry'].y, point['geometry'].x],\n",
" radius = 40,\n",
" fill_color = 'blue',\n",
" color = 'blue',\n",
" fill_opacity = 1,\n",
" tooltip = tooltip\n",
" ).add_to(map_)\n",
"\n",
" map_.fit_bounds(folium_bounds) \n",
" \n",
" return map_"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "fb5e0ed3-d73b-4f62-94a8-c69bd25b2366",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_2685102/3619001546.py:9: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.\n",
"\n",
" x = gdf.centroid.x[0]\n",
"/tmp/ipykernel_2685102/3619001546.py:10: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.\n",
"\n",
" y = gdf.centroid.y[0]\n"
]
},
{
"data": {
"text/html": [
"Make this Notebook Trusted to load map: File -> Trust Notebook
"
],
"text/plain": [
""
]
},
"execution_count": 82,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"map_ = map_cafes(cafes_gdf)\n",
"map_"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "b339352b-fb18-4940-ac18-8c4b50d162a3",
"showTitle": false,
"title": ""
}
},
"source": [
"## Analysis"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "7fea603b-83fc-41fa-a423-778cdb51e88b",
"showTitle": false,
"title": ""
}
},
"source": [
"### How far are people willing to travel for coffee?"
]
},
{
"cell_type": "code",
"execution_count": 128,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "520fefcf-e222-453d-b5e2-4efd8a098a61",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"data": {
"text/plain": [
"DataFrame[placekey: string, location_name: string, street_address: string, avg_median_dist_from_home: double]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\r",
"[Stage 332:=> (1 + 39) / 40]\r",
"\r",
"[Stage 332:======================> (17 + 23) / 40]\r"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"+-------------------+--------------------------+-----------------------------------+-------------------------+\n",
"|placekey |location_name |street_address |avg_median_dist_from_home|\n",
"+-------------------+--------------------------+-----------------------------------+-------------------------+\n",
"|zzy-222@5x4-4b4-p7q|Starbucks |1700 Seventh Ave Nordstrom Building|1124.5566666666668 |\n",
"|25x-222@5x4-4b5-dvz|Storyville Coffee Company |94 Pike St Ste 34 |757.5968378378378 |\n",
"|zzw-22f@5x4-4b5-p7q|Starbucks |1912 Pike Pl |750.9135581395349 |\n",
"|223-222@5x4-4b5-dvz|Ghost Alley Espresso |1499 Post Aly |727.4028888888889 |\n",
"|224-224@5x4-4b4-xt9|The Bar |110 6th Ave N |667.3632222222222 |\n",
"|zzw-222@5x4-4b5-pvz|Cafe Opla |2200 Alaskan Way Ste 120 |661.2868000000001 |\n",
"|22c-222@5x4-4b4-s5z|Starbucks |1124 Pike St |654.5950666666666 |\n",
"|222-222@5x4-4b5-dvz|Starbucks |102 Pike St |633.7193111111111 |\n",
"|22q-222@5x4-4b5-m8v|Limoncello Belltown |2326 1st Ave |485.0474516129032 |\n",
"|22j-222@5x4-4b5-dvz|The Crumpet Shop |1503 1st Ave |431.5123333333333 |\n",
"|zzw-22c@5x4-4b7-bkz|Starbucks |305 Harrison St Ste 220 |406.287 |\n",
"|zzy-222@5xd-hjq-yn5|Starbucks |1300 Station Drive |369.72424137931034 |\n",
"|zzw-22b@5x4-4b5-pgk|World Spice Merchants |1509 Western Ave |360.2277209302325 |\n",
"|zzw-22f@5x4-4b5-pgk|Procopio Gelateria |1501 Western Ave Ste 300 |344.466 |\n",
"|223-222@5x4-4b5-hqz|Starbucks |1101 Alaskan Way Ste 102 |317.55111111111114 |\n",
"|zzw-22r@5x4-4b5-gkz|Cafe ABoDegas |1303 6th Ave |316.76696000000004 |\n",
"|22g-222@5x4-4b5-dsq|Bottega Italiana |1425 1st Ave |308.5182222222222 |\n",
"|zzw-22d@5x4-4b7-bkz|Eltana |305 Harrison St |294.919 |\n",
"|zzw-224@5x4-4b5-f75|MarketSpice |85A Pike St |263.49886363636364 |\n",
"|22b-222@5x4-4vs-qzz|Cherry Street Coffee House|700 1st Ave |210.19386666666668 |\n",
"+-------------------+--------------------------+-----------------------------------+-------------------------+\n",
"only showing top 20 rows\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\r",
" \r"
]
}
],
"source": [
"# the `distance_from_home` column tells us the median distance (as the crow flies), in meters, between the coffee shop and the visitors' homes \n",
"# which coffee shop's visitors had the highest average median distance traveled since Jan 2018?\n",
"\n",
"# outlier values in this column distort the histogram\n",
"# these outliers are likely due to a combination of (1) coffee shops in downtown areas that receive high numbers of out-of-town visitors and (2) quirks in the underlying GPS data\n",
"furthest_traveled = (\n",
" sample\n",
" .groupBy('placekey', 'location_name', 'street_address')\n",
" .agg(f.mean('distance_from_home').alias('avg_median_dist_from_home'))\n",
" .orderBy('avg_median_dist_from_home', ascending = False)\n",
")\n",
"\n",
"display(furthest_traveled)\n",
"furthest_traveled.filter(f.col(\"avg_median_dist_from_home\").isNotNull()).withColumn(\"avg_median_dist_from_home\", f.col(\"avg_median_dist_from_home\")/1000.0).show(truncate=False)\n",
"furthest_traveled = furthest_traveled.drop('street_address')"
]
},
{
"cell_type": "code",
"execution_count": 129,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "c4a5839c-610b-4802-a26e-d97a9145a818",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"data": {
"text/plain": [
"DataFrame[placekey: string, location_name: string, avg_median_dist_from_home: double]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[Stage 337:========================================> (30 + 10) / 40]\r"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"1071 coffee shops have vistors' home <10 km away, out of 1688 coffee shops.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\r",
" \r"
]
}
],
"source": [
"# most coffee shops' visitors' homes are <10km away\n",
"display(furthest_traveled.filter(f.col('avg_median_dist_from_home') < 10000))\n",
"print(furthest_traveled.filter(f.col('avg_median_dist_from_home') < 10000).count(), \" coffee shops have vistors' home <10 km away, out of \", furthest_traveled.count(), \"coffee shops.\")"
]
},
{
"cell_type": "code",
"execution_count": 130,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "30567048-2827-4fa2-afbb-6a8bf8880a50",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2022-05-22 14:02:05,714 WARN types.UDTRegistration: Cannot register UDT for org.locationtech.jts.geom.Geometry, which is already registered.\n",
"2022-05-22 14:02:05,714 WARN types.UDTRegistration: Cannot register UDT for org.locationtech.jts.index.SpatialIndex, which is already registered.\n",
"2022-05-22 14:02:05,714 WARN analysis.SimpleFunctionRegistry: The function st_pointfromtext replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_polygonfromtext replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_linestringfromtext replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_geomfromtext replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_geomfromwkt replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_geomfromwkb replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_geomfromgeojson replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_point replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_polygonfromenvelope replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_contains replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_intersects replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_within replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_distance replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_3ddistance replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_convexhull replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_npoints replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_buffer replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_envelope replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_length replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_area replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_centroid replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_transform replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_intersection replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_difference replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_symdifference replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_union replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_isvalid replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_precisionreduce replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_equals replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_touches replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_overlaps replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_crosses replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_issimple replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_makevalid replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_simplifypreservetopology replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_astext replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_asgeojson replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_asbinary replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_asewkb replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_srid replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_setsrid replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_geometrytype replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_numgeometries replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_linemerge replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_azimuth replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_x replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_y replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_z replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_startpoint replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_boundary replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_minimumboundingradius replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_minimumboundingcircle replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_endpoint replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_exteriorring replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_geometryn replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_interiorringn replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_dump replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_dumppoints replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_isclosed replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_numinteriorrings replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_addpoint replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_removepoint replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_isring replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_flipcoordinates replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_linesubstring replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_lineinterpolatepoint replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_subdivideexplode replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_subdivide replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_makepolygon replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_geohash replaced a previously registered function.\n",
"2022-05-22 14:02:05,715 WARN analysis.SimpleFunctionRegistry: The function st_geomfromgeohash replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function st_collect replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function st_multi replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_normalizeddifference replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_mean replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_mode replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_fetchregion replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_greaterthan replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_greaterthanequal replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_lessthan replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_lessthanequal replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_addbands replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_subtractbands replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_dividebands replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_multiplyfactor replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_multiplybands replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_bitwiseand replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_bitwiseor replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_count replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_modulo replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_getband replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_squareroot replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_logicaldifference replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_logicalover replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_base64 replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_html replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_array replaced a previously registered function.\n",
"2022-05-22 14:02:05,716 WARN analysis.SimpleFunctionRegistry: The function rs_normalize replaced a previously registered function.\n",
"2022-05-22 14:02:05,717 WARN analysis.SimpleFunctionRegistry: The function st_union_aggr replaced a previously registered function.\n",
"2022-05-22 14:02:05,719 WARN analysis.SimpleFunctionRegistry: The function st_envelope_aggr replaced a previously registered function.\n",
"2022-05-22 14:02:05,720 WARN analysis.SimpleFunctionRegistry: The function st_intersection_aggr replaced a previously registered function.\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 130,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# but `distance_from_home` takes into account ALL visitors to the coffee shop, which as we saw above can distort values. When selecting a site for a coffee shop, we likely care more about how far visitors traveled from within Seattle\n",
"# we can compute this using Sedona\n",
"import sedona\n",
"from sedona.register import SedonaRegistrator\n",
"from sedona.utils import SedonaKryoRegistrator, KryoSerializer\n",
"\n",
"SedonaRegistrator.registerAll(spark)"
]
},
{
"cell_type": "code",
"execution_count": 131,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "d516f594-62db-494d-b08d-8261836ed7e0",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"data": {
"text/plain": [
"Row(GEOID='530330033001', geometry='POLYGON ((-122.376771 47.679527, -122.376172 47.67953199999999, -122.375716 47.679533, -122.375137 47.679534, -122.374659 47.679531, -122.374092 47.679536, -122.373605 47.679537, -122.373003 47.679538, -122.37255 47.679539, -122.371964 47.679541, -122.371494 47.679539, -122.370899 47.679543, -122.370439 47.679543, -122.369828 47.679544, -122.369383 47.679545, -122.368747 47.67954599999999, -122.368328 47.679547, -122.367673 47.679548, -122.367273 47.679548, -122.366609 47.67954899999999, -122.366038 47.679551, -122.365979 47.679399, -122.365958 47.677762, -122.365957 47.676293, -122.36602 47.67616899999999, -122.366025 47.675984, -122.366197 47.675983, -122.36656 47.675982, -122.367624 47.67598, -122.368688 47.67598599999999, -122.36874 47.67598599999999, -122.369067 47.675987, -122.369779 47.675991, -122.370845 47.675993, -122.371399 47.675997, -122.371946 47.675999, -122.373158 47.675993, -122.373574 47.675991, -122.37401 47.67599, -122.37558 47.675988, -122.375645 47.675989, -122.375908 47.675992, -122.376117 47.675992, -122.376277 47.675993, -122.376513 47.675996, -122.376652 47.676203, -122.376708 47.67631799999999, -122.376749 47.676443, -122.376767 47.676553, -122.376765 47.677709, -122.376765 47.6778, -122.376771 47.679527))')"
]
},
"execution_count": 131,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# load the census block groups for Washington State\n",
"# filter it down to our three counties of interest in Seattle \n",
"WA_cbgs = (\n",
" spark.read.option('header', 'true').option('escape', \"\\\"\").csv('file:///media/hdd1/code/sigspatial-2021-cafe-analysis/data/wa_cbg.csv')\n",
" .filter(f.col('GEOID').rlike('^(53033|53053|53061)'))\n",
")\n",
"WA_cbgs.head()"
]
},
{
"cell_type": "code",
"execution_count": 132,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "df54c6c2-d183-44c2-9a1a-6f56a1ed6793",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"# transform the geometry column into a Geometry-type\n",
"WA_cbgs = (\n",
" WA_cbgs\n",
" .withColumn('cbg_geometry', f.expr(\"ST_GeomFromWkt(geometry)\"))\n",
" # we'll just use the CBG centroid\n",
" .withColumn('cbg_geometry', f.expr(\"ST_Centroid(cbg_geometry)\"))\n",
" # since we'll be doing a distance calculation, let's also use a projected CRS - epsg:3857\n",
" .withColumn('cbg_geometry', f.expr(\"ST_Transform(ST_FlipCoordinates(cbg_geometry), 'epsg:4326','epsg:3857', false)\")) # ST_FlipCoordinates() necessary due to this bug: https://issues.apache.org/jira/browse/SEDONA-39\n",
" .withColumnRenamed('GEOID', 'cbg')\n",
" .withColumnRenamed('geometry', 'cbg_polygon_geometry')\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 133,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "5ce1db0f-0446-4369-ae2e-672853bbf7c4",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" cbg | \n",
" cbg_polygon_geometry | \n",
" cbg_geometry | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 530330033001 | \n",
" POLYGON ((-122.376771 47.679527, -122.376172 4... | \n",
" POINT (-13622316.63058369 6053413.257523819) | \n",
"
\n",
" \n",
" 1 | \n",
" 530530729031 | \n",
" POLYGON ((-122.643503 47.134887, -122.643502 4... | \n",
" POINT (-13647989.55994223 5963208.486137308) | \n",
"
\n",
" \n",
" 2 | \n",
" 530530730015 | \n",
" POLYGON ((-122.567775 46.970114, -122.566807 4... | \n",
" POINT (-13641020.87626943 5941126.596682728) | \n",
"
\n",
" \n",
" 3 | \n",
" 530530730013 | \n",
" POLYGON ((-122.533868 46.996149, -122.532109 4... | \n",
" POINT (-13637080.70343767 5938242.363543665) | \n",
"
\n",
" \n",
" 4 | \n",
" 530530730014 | \n",
" POLYGON ((-122.484146 46.938446, -122.484146 4... | \n",
" POINT (-13632564.01655458 5936624.370012998) | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" cbg cbg_polygon_geometry \\\n",
"0 530330033001 POLYGON ((-122.376771 47.679527, -122.376172 4... \n",
"1 530530729031 POLYGON ((-122.643503 47.134887, -122.643502 4... \n",
"2 530530730015 POLYGON ((-122.567775 46.970114, -122.566807 4... \n",
"3 530530730013 POLYGON ((-122.533868 46.996149, -122.532109 4... \n",
"4 530530730014 POLYGON ((-122.484146 46.938446, -122.484146 4... \n",
"\n",
" cbg_geometry \n",
"0 POINT (-13622316.63058369 6053413.257523819) \n",
"1 POINT (-13647989.55994223 5963208.486137308) \n",
"2 POINT (-13641020.87626943 5941126.596682728) \n",
"3 POINT (-13637080.70343767 5938242.363543665) \n",
"4 POINT (-13632564.01655458 5936624.370012998) "
]
},
"execution_count": 133,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"WA_cbgs.limit(10).toPandas().head()"
]
},
{
"cell_type": "code",
"execution_count": 134,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "b15c3035-5ab9-4b23-9d37-b5f7ee023026",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"# Next let's prep our sample data\n",
"sample_seattle_visitors = (\n",
" sample\n",
" .select('placekey', f.explode('visitor_home_cbgs'))\n",
" .withColumnRenamed('key', 'cbg')\n",
" .withColumnRenamed('value', 'visitors')\n",
"# # filter out CBGs with low visitor counts\n",
" .filter(f.col('visitors') > 4)\n",
" # filter down to only the visitors from Seattle CBGs\n",
" .filter(f.col('cbg').rlike('^(53033|53053|53061)'))\n",
" # aggregate up all the visitors over time from each CBG to each Cafe\n",
" .groupBy('placekey', 'cbg')\n",
" .agg(\n",
" f.sum('visitors').alias('visitors')\n",
" )\n",
" # join back with most up-to-date POI information\n",
" .join(\n",
" cafes_latest.select('placekey', 'latitude', 'longitude'),\n",
" 'placekey'\n",
" )\n",
" # transform geometry column\n",
" .withColumn('cafe_geometry', f.expr(\"ST_Point(CAST(longitude AS Decimal(24, 20)), CAST(latitude AS Decimal(24, 20)))\"))\n",
" .withColumn('cafe_geometry', f.expr(\"ST_Transform(ST_FlipCoordinates(cafe_geometry), 'epsg:4326','epsg:3857', false)\"))\n",
" # join with CBG geometries\n",
" .join(WA_cbgs, 'cbg')\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 135,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "620a4eea-a132-4900-9b93-164c0228455f",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" cbg | \n",
" placekey | \n",
" visitors | \n",
" latitude | \n",
" longitude | \n",
" cafe_geometry | \n",
" cbg_polygon_geometry | \n",
" cbg_geometry | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 530330002006 | \n",
" zzw-222@5x4-4d5-mrk | \n",
" 77 | \n",
" 47.742573 | \n",
" -122.349879 | \n",
" POINT (-13619926.22889864 6064134.508228292) | \n",
" POLYGON ((-122.308696 47.733904, -122.308045 4... | \n",
" POINT (-13614894.7487456 6061924.520392932) | \n",
"
\n",
" \n",
" 1 | \n",
" 530330002006 | \n",
" 223-222@5x4-48d-f4v | \n",
" 5 | \n",
" 47.701524 | \n",
" -122.323637 | \n",
" POINT (-13617004.98282124 6057341.933516146) | \n",
" POLYGON ((-122.308696 47.733904, -122.308045 4... | \n",
" POINT (-13614894.7487456 6061924.520392932) | \n",
"
\n",
" \n",
" 2 | \n",
" 530330002006 | \n",
" zzw-226@5x4-48g-t5f | \n",
" 5 | \n",
" 47.680164 | \n",
" -122.323755 | \n",
" POINT (-13617018.11852115 6053809.507537933) | \n",
" POLYGON ((-122.308696 47.733904, -122.308045 4... | \n",
" POINT (-13614894.7487456 6061924.520392932) | \n",
"
\n",
" \n",
" 3 | \n",
" 530330002006 | \n",
" 223-222@5x4-4nz-pqf | \n",
" 5 | \n",
" 47.71778 | \n",
" -122.313042 | \n",
" POINT (-13615825.55281628 6060031.251671663) | \n",
" POLYGON ((-122.308696 47.733904, -122.308045 4... | \n",
" POINT (-13614894.7487456 6061924.520392932) | \n",
"
\n",
" \n",
" 4 | \n",
" 530330002006 | \n",
" 225-222@5x4-4d6-789 | \n",
" 10 | \n",
" 47.724882 | \n",
" -122.343856 | \n",
" POINT (-13619255.75160559 6061206.437371126) | \n",
" POLYGON ((-122.308696 47.733904, -122.308045 4... | \n",
" POINT (-13614894.7487456 6061924.520392932) | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" cbg placekey visitors latitude longitude \\\n",
"0 530330002006 zzw-222@5x4-4d5-mrk 77 47.742573 -122.349879 \n",
"1 530330002006 223-222@5x4-48d-f4v 5 47.701524 -122.323637 \n",
"2 530330002006 zzw-226@5x4-48g-t5f 5 47.680164 -122.323755 \n",
"3 530330002006 223-222@5x4-4nz-pqf 5 47.71778 -122.313042 \n",
"4 530330002006 225-222@5x4-4d6-789 10 47.724882 -122.343856 \n",
"\n",
" cafe_geometry \\\n",
"0 POINT (-13619926.22889864 6064134.508228292) \n",
"1 POINT (-13617004.98282124 6057341.933516146) \n",
"2 POINT (-13617018.11852115 6053809.507537933) \n",
"3 POINT (-13615825.55281628 6060031.251671663) \n",
"4 POINT (-13619255.75160559 6061206.437371126) \n",
"\n",
" cbg_polygon_geometry \\\n",
"0 POLYGON ((-122.308696 47.733904, -122.308045 4... \n",
"1 POLYGON ((-122.308696 47.733904, -122.308045 4... \n",
"2 POLYGON ((-122.308696 47.733904, -122.308045 4... \n",
"3 POLYGON ((-122.308696 47.733904, -122.308045 4... \n",
"4 POLYGON ((-122.308696 47.733904, -122.308045 4... \n",
"\n",
" cbg_geometry \n",
"0 POINT (-13614894.7487456 6061924.520392932) \n",
"1 POINT (-13614894.7487456 6061924.520392932) \n",
"2 POINT (-13614894.7487456 6061924.520392932) \n",
"3 POINT (-13614894.7487456 6061924.520392932) \n",
"4 POINT (-13614894.7487456 6061924.520392932) "
]
},
"execution_count": 135,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sample_seattle_visitors.limit(10).toPandas().head()"
]
},
{
"cell_type": "code",
"execution_count": 136,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "11f5e9d4-73b3-4df0-91f0-1df67935edc9",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"distance_traveled_SEA = (\n",
" sample_seattle_visitors\n",
" # calculate the distance from home in meters\n",
" .withColumn('distance_from_home', f.expr(\"ST_Distance(cafe_geometry, cbg_geometry)\"))\n",
")\n",
"distance_traveled_SEA.createOrReplaceTempView('distance_traveled_SEA')"
]
},
{
"cell_type": "code",
"execution_count": 137,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "95616ebb-b693-4144-8492-1a1efebd91d2",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"q = '''\n",
"SELECT *\n",
"FROM (\n",
" SELECT DISTINCT \n",
" placekey, \n",
" cbg,\n",
" visitors,\n",
" distance_from_home,\n",
" posexplode(split(repeat(\",\", visitors), \",\"))\n",
" FROM distance_traveled_SEA\n",
")\n",
"WHERE pos > 0\n",
"'''\n",
"\n",
"weighted_median_tmp = spark.sql(q)"
]
},
{
"cell_type": "code",
"execution_count": 138,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "0b9e4328-7f7d-4a8c-a3f9-2b2eb915b81e",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"grp_window = Window.partitionBy('placekey')\n",
"median_percentile = f.expr('percentile_approx(distance_from_home, 0.5)')"
]
},
{
"cell_type": "code",
"execution_count": 139,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "6b759ccb-409c-4862-a4a3-d3bc4b5ac790",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" placekey | \n",
" cbg | \n",
" visitors | \n",
" distance_from_home | \n",
" pos | \n",
" col | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" zzw-222@5x4-4d5-mrk | \n",
" 530330002006 | \n",
" 77 | \n",
" 5495.437996 | \n",
" 1 | \n",
" | \n",
"
\n",
" \n",
" 1 | \n",
" zzw-222@5x4-4d5-mrk | \n",
" 530330002006 | \n",
" 77 | \n",
" 5495.437996 | \n",
" 2 | \n",
" | \n",
"
\n",
" \n",
" 2 | \n",
" zzw-222@5x4-4d5-mrk | \n",
" 530330002006 | \n",
" 77 | \n",
" 5495.437996 | \n",
" 3 | \n",
" | \n",
"
\n",
" \n",
" 3 | \n",
" zzw-222@5x4-4d5-mrk | \n",
" 530330002006 | \n",
" 77 | \n",
" 5495.437996 | \n",
" 4 | \n",
" | \n",
"
\n",
" \n",
" 4 | \n",
" zzw-222@5x4-4d5-mrk | \n",
" 530330002006 | \n",
" 77 | \n",
" 5495.437996 | \n",
" 5 | \n",
" | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" placekey cbg visitors distance_from_home pos col\n",
"0 zzw-222@5x4-4d5-mrk 530330002006 77 5495.437996 1 \n",
"1 zzw-222@5x4-4d5-mrk 530330002006 77 5495.437996 2 \n",
"2 zzw-222@5x4-4d5-mrk 530330002006 77 5495.437996 3 \n",
"3 zzw-222@5x4-4d5-mrk 530330002006 77 5495.437996 4 \n",
"4 zzw-222@5x4-4d5-mrk 530330002006 77 5495.437996 5 "
]
},
"execution_count": 139,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"weighted_median_tmp.limit(10).toPandas().head()"
]
},
{
"cell_type": "code",
"execution_count": 140,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "c6d72ef3-d4ae-444c-8e1b-8beb8d990407",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"median_dist_traveled_SEA = (\n",
" weighted_median_tmp\n",
" .groupBy('placekey')\n",
" .agg(median_percentile.alias('median_dist_traveled_SEA'))\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 141,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "ffb3a4b3-1ede-4e80-a22d-bb2968ef1b40",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" placekey | \n",
" median_dist_traveled_SEA | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" zzw-225@5x4-8xs-h3q | \n",
" 5373.004786 | \n",
"
\n",
" \n",
" 1 | \n",
" zzw-222@5x4-49x-jqf | \n",
" 1337.200428 | \n",
"
\n",
" \n",
" 2 | \n",
" 224-222@5x4-4mg-9vf | \n",
" 2707.661156 | \n",
"
\n",
" \n",
" 3 | \n",
" 222-222@5x4-4fr-mrk | \n",
" 9124.893780 | \n",
"
\n",
" \n",
" 4 | \n",
" zzw-223@5x4-8pj-b49 | \n",
" 2364.791999 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" placekey median_dist_traveled_SEA\n",
"0 zzw-225@5x4-8xs-h3q 5373.004786\n",
"1 zzw-222@5x4-49x-jqf 1337.200428\n",
"2 224-222@5x4-4mg-9vf 2707.661156\n",
"3 222-222@5x4-4fr-mrk 9124.893780\n",
"4 zzw-223@5x4-8pj-b49 2364.791999"
]
},
"execution_count": 141,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"median_dist_traveled_SEA.filter(f.col('median_dist_traveled_SEA') < 25000).limit(10).toPandas().head()\n"
]
},
{
"cell_type": "code",
"execution_count": 142,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "0eab1532-a2b6-4de0-ab48-feb16672125e",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"total_visits = (\n",
" sample\n",
" .groupBy('placekey')\n",
" .agg(\n",
" f.sum('raw_visit_counts').alias('total_visits'),\n",
" f.sum('raw_visitor_counts').alias('total_visitors')\n",
" )\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 143,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "443071d7-f9ae-469b-8b73-d57af8880635",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"distance_traveled_final = (\n",
" furthest_traveled\n",
" .join(median_dist_traveled_SEA, 'placekey')\n",
" .join(cafes_latest, ['placekey', 'location_name'])\n",
" .withColumn('distance_traveled_diff', f.col('avg_median_dist_from_home') - f.col('median_dist_traveled_SEA'))\n",
" # keep only the cafes with a meaningful sample - at least 1000 visits since 2018\n",
" .join(total_visits, 'placekey')\n",
" .filter(f.col('total_visits') > 1000)\n",
" .orderBy('distance_traveled_diff', ascending = False)\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 144,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "1af4befb-3fed-4010-ac42-b85e4be5fc2a",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" placekey | \n",
" location_name | \n",
" avg_median_dist_from_home | \n",
" median_dist_traveled_SEA | \n",
" brands | \n",
" street_address | \n",
" city | \n",
" region | \n",
" postal_code | \n",
" latitude | \n",
" longitude | \n",
" open_hours | \n",
" distance_traveled_diff | \n",
" total_visits | \n",
" total_visitors | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" zzw-22f@5x4-4b5-p7q | \n",
" Starbucks | \n",
" 750913.558140 | \n",
" 23663.589737 | \n",
" Starbucks | \n",
" 1912 Pike Pl | \n",
" Seattle | \n",
" WA | \n",
" 98101 | \n",
" 47.6101 | \n",
" -122.342482 | \n",
" { \"Mon\": [[\"6:30\", \"20:30\"]], \"Tue\": [[\"6:30\",... | \n",
" 727249.968402 | \n",
" 5195.0 | \n",
" 4938.0 | \n",
"
\n",
" \n",
" 1 | \n",
" 223-222@5x4-4b5-dvz | \n",
" Ghost Alley Espresso | \n",
" 727402.888889 | \n",
" 26369.553927 | \n",
" None | \n",
" 1499 Post Aly | \n",
" Seattle | \n",
" WA | \n",
" 98101 | \n",
" 47.608608 | \n",
" -122.340546 | \n",
" { \"Mon\": [[\"7:30\", \"17:30\"]], \"Tue\": [[\"7:30\",... | \n",
" 701033.334962 | \n",
" 17286.0 | \n",
" 16489.0 | \n",
"
\n",
" \n",
" 2 | \n",
" 224-224@5x4-4b4-xt9 | \n",
" The Bar | \n",
" 667363.222222 | \n",
" 290.137631 | \n",
" None | \n",
" 110 6th Ave N | \n",
" Seattle | \n",
" WA | \n",
" 98109 | \n",
" 47.618842 | \n",
" -122.344648 | \n",
" None | \n",
" 667073.084592 | \n",
" 15564.0 | \n",
" 7781.0 | \n",
"
\n",
" \n",
" 3 | \n",
" 22c-222@5x4-4b4-s5z | \n",
" Starbucks | \n",
" 654595.066667 | \n",
" 1881.813510 | \n",
" Starbucks | \n",
" 1124 Pike St | \n",
" Seattle | \n",
" WA | \n",
" 98101 | \n",
" 47.613994 | \n",
" -122.328201 | \n",
" { \"Mon\": [[\"7:00\", \"21:00\"]], \"Tue\": [[\"7:00\",... | \n",
" 652713.253157 | \n",
" 52085.0 | \n",
" 43177.0 | \n",
"
\n",
" \n",
" 4 | \n",
" zzw-222@5x4-4b5-pvz | \n",
" Cafe Opla | \n",
" 661286.800000 | \n",
" 24168.052053 | \n",
" None | \n",
" 2200 Alaskan Way Ste 120 | \n",
" Seattle | \n",
" WA | \n",
" 98121 | \n",
" 47.611012 | \n",
" -122.34752 | \n",
" None | \n",
" 637118.747947 | \n",
" 14462.0 | \n",
" 9118.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" placekey location_name avg_median_dist_from_home \\\n",
"0 zzw-22f@5x4-4b5-p7q Starbucks 750913.558140 \n",
"1 223-222@5x4-4b5-dvz Ghost Alley Espresso 727402.888889 \n",
"2 224-224@5x4-4b4-xt9 The Bar 667363.222222 \n",
"3 22c-222@5x4-4b4-s5z Starbucks 654595.066667 \n",
"4 zzw-222@5x4-4b5-pvz Cafe Opla 661286.800000 \n",
"\n",
" median_dist_traveled_SEA brands street_address city \\\n",
"0 23663.589737 Starbucks 1912 Pike Pl Seattle \n",
"1 26369.553927 None 1499 Post Aly Seattle \n",
"2 290.137631 None 110 6th Ave N Seattle \n",
"3 1881.813510 Starbucks 1124 Pike St Seattle \n",
"4 24168.052053 None 2200 Alaskan Way Ste 120 Seattle \n",
"\n",
" region postal_code latitude longitude \\\n",
"0 WA 98101 47.6101 -122.342482 \n",
"1 WA 98101 47.608608 -122.340546 \n",
"2 WA 98109 47.618842 -122.344648 \n",
"3 WA 98101 47.613994 -122.328201 \n",
"4 WA 98121 47.611012 -122.34752 \n",
"\n",
" open_hours distance_traveled_diff \\\n",
"0 { \"Mon\": [[\"6:30\", \"20:30\"]], \"Tue\": [[\"6:30\",... 727249.968402 \n",
"1 { \"Mon\": [[\"7:30\", \"17:30\"]], \"Tue\": [[\"7:30\",... 701033.334962 \n",
"2 None 667073.084592 \n",
"3 { \"Mon\": [[\"7:00\", \"21:00\"]], \"Tue\": [[\"7:00\",... 652713.253157 \n",
"4 None 637118.747947 \n",
"\n",
" total_visits total_visitors \n",
"0 5195.0 4938.0 \n",
"1 17286.0 16489.0 \n",
"2 15564.0 7781.0 \n",
"3 52085.0 43177.0 \n",
"4 14462.0 9118.0 "
]
},
"execution_count": 144,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"distance_traveled_final.limit(10).toPandas().head()"
]
},
{
"cell_type": "code",
"execution_count": 145,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "7ebcbde8-5138-4b78-95f9-72584936c841",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
},
{
"data": {
"text/plain": [
"1429"
]
},
"execution_count": 145,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"distance_traveled_final.select('placekey').distinct().count()"
]
},
{
"cell_type": "code",
"execution_count": 146,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "99f02384-ffd0-4524-ae86-c39fd3295d55",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
}
],
"source": [
"most_tourists = distance_traveled_final.limit(500).withColumn('visitor_type', f.lit('tourist'))\n",
"most_locals = distance_traveled_final.orderBy('distance_traveled_diff').limit(500).withColumn('visitor_type', f.lit('local'))\n",
"\n",
"visitor_type = most_tourists.unionByName(most_locals)\n",
"\n",
"# create a geopandas geodataframe\n",
"visitor_type_gdf = visitor_type.toPandas()\n",
"visitor_type_gdf = gpd.GeoDataFrame(visitor_type_gdf, geometry = gpd.points_from_xy(visitor_type_gdf['longitude'], visitor_type_gdf['latitude']), crs = 'EPSG:4326')"
]
},
{
"cell_type": "code",
"execution_count": 147,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "853ad746-d7a7-4a3d-aa49-6c206187cda5",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"def map_cafe_visitor_type(gdf):\n",
" \n",
" # map bounds\n",
" sw = [gdf.unary_union.bounds[1], gdf.unary_union.bounds[0]]\n",
" ne = [gdf.unary_union.bounds[3], gdf.unary_union.bounds[2]]\n",
" folium_bounds = [sw, ne]\n",
" \n",
" # map\n",
" x = gdf.centroid.x[0]\n",
" y = gdf.centroid.y[0]\n",
" \n",
" map_ = folium.Map(\n",
" location = [y, x],\n",
" tiles = \"OpenStreetMap\"\n",
" )\n",
" \n",
" for i, point in gdf.iterrows():\n",
" \n",
" tooltip = f\"placekey: {point['placekey']}
location_name: {point['location_name']}
brands: {point['brands']}
street_address: {point['street_address']}
city: {point['city']}
region: {point['region']}
postal_code: {point['postal_code']}
visitor_type: {point['visitor_type']}
avg_median_dist_from_home: {point['avg_median_dist_from_home']}\"\n",
" \n",
" folium.Circle(\n",
" [point['geometry'].y, point['geometry'].x],\n",
" radius = 40,\n",
" fill_color = 'blue' if point['visitor_type'] == 'tourist' else 'red',\n",
" color = 'blue' if point['visitor_type'] == 'tourist' else 'red',\n",
" fill_opacity = 1,\n",
" tooltip = tooltip\n",
" ).add_to(map_)\n",
"\n",
" map_.fit_bounds(folium_bounds) \n",
" \n",
" return map_"
]
},
{
"cell_type": "code",
"execution_count": 148,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "67e401a7-fb5e-4ab0-8f97-7d7361df2397",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_2685102/2617676160.py:9: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.\n",
"\n",
" x = gdf.centroid.x[0]\n",
"/tmp/ipykernel_2685102/2617676160.py:10: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.\n",
"\n",
" y = gdf.centroid.y[0]\n"
]
},
{
"data": {
"text/html": [
"Make this Notebook Trusted to load map: File -> Trust Notebook
"
],
"text/plain": [
""
]
},
"execution_count": 148,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# blue = touristy, red = locals\n",
"map_cafe_visitor_type(visitor_type_gdf)"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "305edc97-20cf-4ccc-b5e4-64385bdf2bc4",
"showTitle": false,
"title": ""
}
},
"source": [
"### What location will receive the most visibility?\n",
"\n",
"In other words, in what neighborhood do coffee shops get the most visits generally?"
]
},
{
"cell_type": "code",
"execution_count": 149,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "6da8be09-d963-4e90-a263-a8bdb9664a99",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" S_HOOD | \n",
" L_HOOD | \n",
" geometry | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" OOO | \n",
" | \n",
" POLYGON ((-122.2739789529401 47.69522647266365... | \n",
"
\n",
" \n",
" 1 | \n",
" OOO | \n",
" | \n",
" POLYGON ((-122.2875597861967 47.64522740482133... | \n",
"
\n",
" \n",
" 2 | \n",
" OOO | \n",
" | \n",
" POLYGON ((-122.3952908582123 47.6651350445393,... | \n",
"
\n",
" \n",
" 3 | \n",
" OOO | \n",
" | \n",
" POLYGON ((-122.3983207858678 47.66608770690774... | \n",
"
\n",
" \n",
" 4 | \n",
" OOO | \n",
" | \n",
" POLYGON ((-122.2885127664106 47.65630022774357... | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" S_HOOD L_HOOD geometry\n",
"0 OOO POLYGON ((-122.2739789529401 47.69522647266365...\n",
"1 OOO POLYGON ((-122.2875597861967 47.64522740482133...\n",
"2 OOO POLYGON ((-122.3952908582123 47.6651350445393,...\n",
"3 OOO POLYGON ((-122.3983207858678 47.66608770690774...\n",
"4 OOO POLYGON ((-122.2885127664106 47.65630022774357..."
]
},
"execution_count": 149,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"WA_neighbs = (\n",
" spark.read.option('header', 'true').option('escape', \"\\\"\").csv('file:///media/hdd1/code/sigspatial-2021-cafe-analysis/data/seattle_neighborhoods.csv')\n",
" # transform the geometry column into a Geometry-type\n",
" .withColumn('geometry', f.expr(\"ST_GeomFromWkt(geometry)\"))\n",
")\n",
"WA_neighbs.createOrReplaceTempView('WA_neighbs')\n",
"\n",
"WA_neighbs.limit(10).toPandas().head()"
]
},
{
"cell_type": "code",
"execution_count": 150,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "6430ac39-a32e-4baa-b006-11679cbc6f36",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"cafes_geo = cafes_latest.withColumn('cafe_geometry', f.expr(\"ST_Point(CAST(longitude AS Decimal(24,20)), CAST(latitude AS Decimal(24,20)))\")).select('placekey', 'cafe_geometry')\n",
"cafes_geo.createOrReplaceTempView('cafes_geo')"
]
},
{
"cell_type": "code",
"execution_count": 151,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "8fe0b764-dccd-4eba-ad9e-15cb554fb797",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"# perform a spatial join\n",
"q = '''\n",
"SELECT cafes_geo.placekey, WA_neighbs.S_HOOD as neighborhood, WA_neighbs.geometry\n",
"FROM WA_neighbs, cafes_geo\n",
"WHERE ST_Intersects(WA_neighbs.geometry, cafes_geo.cafe_geometry)\n",
"'''\n",
"\n",
"cafe_neighb_join = spark.sql(q)"
]
},
{
"cell_type": "code",
"execution_count": 152,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "b70c626b-8ccb-4671-95d8-df7baade3a26",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2022-05-22 14:03:39,421 WARN spatialOperator.JoinQuery: UseIndex is true, but no index exists. Will build index on the fly.\n",
" \r"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" placekey | \n",
" neighborhood | \n",
" geometry | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 222-222@5x4-4fr-mrk | \n",
" Adams | \n",
" POLYGON ((-122.3763365647225 47.67591769896643... | \n",
"
\n",
" \n",
" 1 | \n",
" zzw-222@5x4-49x-jqf | \n",
" South Lake Union | \n",
" POLYGON ((-122.3296783220331 47.63972718752359... | \n",
"
\n",
" \n",
" 2 | \n",
" 223-222@5x4-487-3t9 | \n",
" Maple Leaf | \n",
" POLYGON ((-122.3300076901854 47.70863525163715... | \n",
"
\n",
" \n",
" 3 | \n",
" zzw-224@5x4-4vs-jsq | \n",
" International District | \n",
" POLYGON ((-122.3202379357072 47.59582723132589... | \n",
"
\n",
" \n",
" 4 | \n",
" 223-222@5x4-4ft-3bk | \n",
" Sunset Hill | \n",
" POLYGON ((-122.402107196151 47.69768191273809,... | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" placekey neighborhood \\\n",
"0 222-222@5x4-4fr-mrk Adams \n",
"1 zzw-222@5x4-49x-jqf South Lake Union \n",
"2 223-222@5x4-487-3t9 Maple Leaf \n",
"3 zzw-224@5x4-4vs-jsq International District \n",
"4 223-222@5x4-4ft-3bk Sunset Hill \n",
"\n",
" geometry \n",
"0 POLYGON ((-122.3763365647225 47.67591769896643... \n",
"1 POLYGON ((-122.3296783220331 47.63972718752359... \n",
"2 POLYGON ((-122.3300076901854 47.70863525163715... \n",
"3 POLYGON ((-122.3202379357072 47.59582723132589... \n",
"4 POLYGON ((-122.402107196151 47.69768191273809,... "
]
},
"execution_count": 152,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cafe_neighb_join.limit(10).toPandas().head()"
]
},
{
"cell_type": "code",
"execution_count": 153,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "eef49cf0-b948-4bc2-a710-a26e4100ed8f",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"# add the visit and visitor counts and aggregate up to the neighborhood\n",
"neighborhood_agg = (\n",
" cafe_neighb_join\n",
" .join(total_visits, 'placekey')\n",
" .groupBy('neighborhood', f.col('geometry').cast('string').alias('geometry'))\n",
" .agg(\n",
" f.sum('total_visits').alias('total_visits'),\n",
" f.sum('total_visitors').alias('total_visitors')\n",
" )\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 154,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "2589de66-5e04-4455-ba0c-ce6df58eb577",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2022-05-22 14:03:48,317 WARN spatialOperator.JoinQuery: UseIndex is true, but no index exists. Will build index on the fly.\n",
" \r"
]
}
],
"source": [
"neighbs_gdf = (\n",
" neighborhood_agg\n",
" .toPandas()\n",
")\n",
"neighbs_gdf = gpd.GeoDataFrame(neighbs_gdf, geometry = gpd.GeoSeries.from_wkt(neighbs_gdf['geometry']), crs = 'EPSG:4326')"
]
},
{
"cell_type": "code",
"execution_count": 155,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "4d59b328-99e1-459f-b409-8dfa9e9f518a",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"def map_neighbs(gdf):\n",
" \n",
" # map bounds\n",
" sw = [gdf.unary_union.bounds[1], gdf.unary_union.bounds[0]]\n",
" ne = [gdf.unary_union.bounds[3], gdf.unary_union.bounds[2]]\n",
" folium_bounds = [sw, ne]\n",
" \n",
" # map\n",
" x = gdf.centroid.x[0]\n",
" y = gdf.centroid.y[0]\n",
" \n",
" map_ = folium.Map(\n",
" location = [y, x],\n",
" tiles = \"OpenStreetMap\"\n",
" )\n",
" \n",
" gdf['percentile'] = pd.qcut(gdf['total_visits'], 100, labels=False) / 100\n",
" \n",
" folium.GeoJson(\n",
" gdf[['neighborhood', 'total_visits', 'total_visitors', 'percentile', 'geometry']],\n",
" style_function = lambda x: {\n",
" 'weight':0,\n",
" 'color':'blue',\n",
" 'fillOpacity': x['properties']['percentile']\n",
" },\n",
" tooltip = folium.features.GeoJsonTooltip(\n",
" fields = ['neighborhood', 'total_visits', 'total_visitors', 'percentile']\n",
" )\n",
" ).add_to(map_)\n",
"\n",
" map_.fit_bounds(folium_bounds) \n",
" \n",
" return map_"
]
},
{
"cell_type": "code",
"execution_count": 156,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "a63202a8-4e20-4e05-a406-0feedea81106",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_2685102/3760141471.py:9: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.\n",
"\n",
" x = gdf.centroid.x[0]\n",
"/tmp/ipykernel_2685102/3760141471.py:10: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.\n",
"\n",
" y = gdf.centroid.y[0]\n"
]
},
{
"data": {
"text/html": [
"Make this Notebook Trusted to load map: File -> Trust Notebook
"
],
"text/plain": [
""
]
},
"execution_count": 156,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"map_neighbs(neighbs_gdf)"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "bd8784f0-5063-4b62-9da0-48f42aea8779",
"showTitle": false,
"title": ""
}
},
"source": [
"### What home location has the most coffee-shop-goers?"
]
},
{
"cell_type": "code",
"execution_count": 157,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "576a8374-63cc-461b-8fb7-034cad0799d3",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"home_loc_most_cafe_visitors = (\n",
" sample\n",
" .select(f.explode('visitor_home_cbgs'))\n",
" .withColumnRenamed('key', 'cbg')\n",
" .withColumnRenamed('value', 'visitors')\n",
" .groupBy('cbg')\n",
" .agg(f.sum('visitors').alias('visitors'))\n",
" .orderBy('visitors', ascending = False)\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 158,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "fb9c94b2-cb04-4fe9-982a-b8c34ae27928",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" cbg | \n",
" visitors | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 530330082001 | \n",
" 40286 | \n",
"
\n",
" \n",
" 1 | \n",
" 530330326023 | \n",
" 20529 | \n",
"
\n",
" \n",
" 2 | \n",
" 530330072001 | \n",
" 19432 | \n",
"
\n",
" \n",
" 3 | \n",
" 530530731251 | \n",
" 18486 | \n",
"
\n",
" \n",
" 4 | \n",
" 530530702031 | \n",
" 18131 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" cbg visitors\n",
"0 530330082001 40286\n",
"1 530330326023 20529\n",
"2 530330072001 19432\n",
"3 530530731251 18486\n",
"4 530530702031 18131"
]
},
"execution_count": 158,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 40k visitors to Seattle coffee shops since Jan 2018 originated from CBG `530330082001`, just under twice as many as the next CBG.\n",
"home_loc_most_cafe_visitors.limit(10).toPandas().head()"
]
},
{
"cell_type": "code",
"execution_count": 159,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "97f719eb-f50b-45fa-81a5-2566865e6682",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"from shapely import wkt"
]
},
{
"cell_type": "code",
"execution_count": 160,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "2fa0df31-b5d8-492b-8e34-b699f7a52adf",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
}
],
"source": [
"# Map of the top 1000 CBGs in terms of visitors' origins\n",
"home_loc_gdf = (\n",
" home_loc_most_cafe_visitors\n",
" .limit(1000)\n",
" .join(\n",
" WA_cbgs.select('cbg', 'cbg_polygon_geometry'),\n",
" 'cbg'\n",
" )\n",
" .toPandas()\n",
")\n",
"home_loc_gdf = gpd.GeoDataFrame(home_loc_gdf, geometry = gpd.GeoSeries.from_wkt(home_loc_gdf['cbg_polygon_geometry']), crs = 'EPSG:4326')"
]
},
{
"cell_type": "code",
"execution_count": 161,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "f73c0bcb-a047-4437-ab68-6561f86b9c96",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": [
"def map_cbgs(gdf):\n",
" \n",
" # map bounds\n",
" sw = [gdf.unary_union.bounds[1], gdf.unary_union.bounds[0]]\n",
" ne = [gdf.unary_union.bounds[3], gdf.unary_union.bounds[2]]\n",
" folium_bounds = [sw, ne]\n",
" \n",
" # map\n",
" x = gdf.centroid.x[0]\n",
" y = gdf.centroid.y[0]\n",
" \n",
" map_ = folium.Map(\n",
" location = [y, x],\n",
" tiles = \"OpenStreetMap\"\n",
" )\n",
" \n",
" gdf['quantile'] = pd.qcut(gdf['visitors'], 100, labels=False) / 100\n",
" \n",
" folium.GeoJson(\n",
" gdf[['cbg', 'visitors', 'geometry', 'quantile']],\n",
" style_function = lambda x: {\n",
" 'weight':0,\n",
" 'color':'blue',\n",
" 'fillOpacity': x['properties']['quantile']\n",
" },\n",
" tooltip = folium.features.GeoJsonTooltip(\n",
" fields = ['cbg', 'visitors', 'quantile']\n",
" )\n",
" ).add_to(map_)\n",
"\n",
" map_.fit_bounds(folium_bounds) \n",
" \n",
" return map_"
]
},
{
"cell_type": "code",
"execution_count": 162,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "fbc5ed14-3a88-42aa-bcd7-1e99509828f3",
"showTitle": false,
"title": ""
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_2685102/4182001321.py:9: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.\n",
"\n",
" x = gdf.centroid.x[0]\n",
"/tmp/ipykernel_2685102/4182001321.py:10: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.\n",
"\n",
" y = gdf.centroid.y[0]\n"
]
},
{
"data": {
"text/html": [
"Make this Notebook Trusted to load map: File -> Trust Notebook
"
],
"text/plain": [
""
]
},
"execution_count": 162,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# top 1000 home_cbgs by number of people who visited coffee shops in Seattle\n",
"map_cbgs(home_loc_gdf)"
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "51cd82af-30e8-42ae-9cb7-6e29c8095e1b",
"showTitle": false,
"title": ""
}
},
"source": [
"## Conclusion\n",
"\n",
"Apache Sedona provides a familiar and straight-forward spatial SQL API for performing distributed spatial queries."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"application/vnd.databricks.v1+cell": {
"inputWidgets": {},
"nuid": "d4489511-104a-4c4f-aa95-7c9f1f81a6a8",
"showTitle": false,
"title": ""
}
},
"outputs": [],
"source": []
}
],
"metadata": {
"application/vnd.databricks.v1+notebook": {
"dashboards": [],
"language": "python",
"notebookMetadata": {
"pythonIndentUnit": 2
},
"notebookName": "Workshop_example",
"notebookOrigID": 3914119931284500,
"widgets": {}
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
}