{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Los Angeles Weedmaps analysis\n", "\n", "By [Ben Welsh](https://palewi.re/who-is-ben-welsh/)\n", "\n", "This analysis was conducted for the May 29, 2019, Los Angeles Times story [\"Black market cannabis shops thrive in L.A. even as city cracks down\"](https://www.latimes.com/local/lanow/la-me-weed-pot-dispensaries-illegal-marijuana-weedmaps-black-market-los-angeles-20190529-story.html). It found more than 220 dispensaries inside Los Angeles advertised on Weedmaps.com that are not on the city's list of authorized retailers.\n", "\n", "Due to the volatility of the marijuana business and the high frequency of law enforcement actions against illegal operators, the number of listings can vary from day to day. So if you rerun this notebook, you are likely to get slightly different results." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### How we did it" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import Python tools" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import math\n", "import time\n", "import requests\n", "import pandas as pd\n", "import geopandas as gpd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Scrape the WeedMaps API" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "def scrape():\n", " \"\"\"\n", " Returns all LA-area listings from the WeedMaps API, one page at a time\n", " \"\"\"\n", " base_url = \"https://api-g.weedmaps.com/discovery/v1/listings\"\n", " \n", " # Configure our request payload\n", " params = {\n", " # The types of business we want returned\n", " 'filter[any_retailer_services][]': ['doctor', 'storefront', 'delivery'],\n", " # A bounding box around the city of Los Angeles, with some area to spare\n", " 'filter[bounding_box]': ['33.0,-119.0,36.0,-117.0'],\n", " # The number of items we want returned on each page. 
I believe this is the maximum allowed.\n", " 'page_size': [\"150\"],\n", " 'size': ['100'],\n", " # The page number we are requesting\n", " 'page': [1]\n", " }\n", "\n", " # Make an initial probe to determine how many pages we need to scrape\n", " r = requests.get(base_url, params=params)\n", " total_listings = r.json()['meta']['total_listings']\n", " \n", " # Do the math\n", " print(f\"{total_listings} listings found\")\n", " pages = int(math.ceil(total_listings / 100))\n", " print(f\"Downloading {pages} pages\")\n", " \n", "\n", " # Loop through and download each page\n", " dict_list = [] \n", " for page in range(1, pages+1):\n", " print(f\"Downloading page {page}\")\n", " \n", " # Reconfigure the payload\n", " params['page'] = [str(page)]\n", "\n", " # Make the request\n", " r = requests.get(base_url, params=params)\n", "\n", " # Parse each item\n", " for d in r.json()['data']['listings']:\n", " dict_list.append(dict(\n", " name=d['name'],\n", " slug=d['slug'],\n", " type=d['type'],\n", " address=d['address'],\n", " city=d['city'],\n", " state=d['state'],\n", " zipcode=d['zip_code'],\n", " x=d['longitude'],\n", " y=d['latitude'],\n", " url=d['web_url']\n", " ))\n", " \n", " # Give the API a break\n", " time.sleep(1)\n", " \n", " # Return the result\n", " return dict_list" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dict_list = scrape()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Convert the results into a DataFrame" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "df = pd.DataFrame(dict_list)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Read in the Los Angeles city boundaries" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "la_city = gpd.read_file(\"http://boundaries.latimes.com/1.0/boundary/los-angeles-census-place-2012/?format=geojson\") " ] }, { "cell_type": "markdown", "metadata": 
{}, "source": [ "Convert our DataFrame to a GeoDataFrame with a matching CRS." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "gdf = gpd.GeoDataFrame(\n", " df,\n", " crs=la_city.crs,\n", " # Build point geometries from longitude (x) and latitude (y)\n", " geometry=gpd.points_from_xy(df.x, df.y)\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pull out the LA city geometry." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "city_limits = la_city.iloc[0].geometry" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "city_limits" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Filter the scrape to the city limits." ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "gdf_la = gdf[gdf.geometry.within(city_limits)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Clean up the file a little." ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [], "source": [ "gdf_trimmed = gdf_la[[\n", " 'name',\n", " 'slug',\n", " 'type',\n", " 'address',\n", " 'city',\n", " 'zipcode',\n", " 'url',\n", " 'x',\n", " 'y',\n", " 'geometry'\n", "]].sort_values(\"name\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Write the result to a file we can vet." ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [], "source": [ "gdf_trimmed.to_csv(\n", " \"./data/scraped-la-listings.csv\",\n", " index=False,\n", " encoding=\"utf-8\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Offline, this scraped file was compared to the [official list of retailers authorized by the city](https://cannabis.lacity.org/resources/authorized-retail-businesses). Licensed storefronts were marked manually. 
That refined list will be read back in for analysis." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "analysis_df = pd.read_csv(\"./data/vetted-la-dispensaries.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many total dispensaries were listed?" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "365" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(analysis_df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many are registered versus unregistered?" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Listings that were not marked as licensed count as unregistered\n", "analysis_df['registered'] = analysis_df['registered'].fillna(False)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False 221\n", "True 144\n", "Name: registered, dtype: int64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "analysis_df.registered.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What's the percentage of each?" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False 0.605479\n", "True 0.394521\n", "Name: registered, dtype: float64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "analysis_df.registered.value_counts() / len(analysis_df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many unregistered dispensaries are on Florence Avenue?" 
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "florence = analysis_df[\n", " (analysis_df.address.str.upper().str.contains(\"FLORENCE\", na=False)) &\n", " (analysis_df.registered == False)\n", "]" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "12" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(florence)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What are the totals in our different regions of LA city?" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "analysis_gdf = gpd.GeoDataFrame(\n", " analysis_df,\n", " crs=\"EPSG:4326\",\n", " geometry=gpd.points_from_xy(analysis_df.x, analysis_df.y)\n", ")" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "def spatial_pivot(gdf, shapes):\n", " # Pair each shape with every listing that falls inside it\n", " sjoin = gpd.sjoin(shapes, gdf, predicate=\"intersects\")\n", " # Count listings per shape and registration status, one column per status\n", " spivot = sjoin.groupby([\"name_left\", \"registered\"]).size().rename(\"dispensaries\").reset_index().pivot(\n", " index=\"name_left\",\n", " columns=\"registered\",\n", " values=\"dispensaries\"\n", " )\n", " spivot['total'] = spivot.sum(axis=1)\n", " spivot['percent_unlicensed'] = spivot[False] / spivot['total']\n", " return spivot.sort_values(\"percent_unlicensed\", ascending=False)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "regions = gpd.read_file(\"http://boundaries.latimes.com/1.0/boundary-set/la-county-municipal-regions-v6/?format=geojson\")" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "registered False True total percent_unlicensed\n", "name_left \n", "South 70 8 78 0.897436\n", "Harbor 22 8 30 0.733333\n", "The Eastside 16 7 23 0.695652\n", "Central 51 30 81 0.629630\n", "The Valley 50 69 119 0.420168\n", "The Westside 11 19 30 0.366667\n", "Northeast 1 3 4 0.250000" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "spatial_pivot(analysis_gdf, regions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Which neighborhoods of LA city have the most unregistered dispensaries?" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "hoods = gpd.read_file(\"http://boundaries.latimes.com/1.0/boundary-set/la-county-neighborhoods-v6/?format=geojson\")" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "registered False True total percent_unlicensed\n", "name_left \n", "Downtown 21.0 7.0 28.0 0.750000\n", "Florence 14.0 1.0 15.0 0.933333\n", "North Hollywood 13.0 6.0 19.0 0.684211\n", "Wilmington 10.0 1.0 11.0 0.909091\n", "Boyle Heights 10.0 4.0 14.0 0.714286" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "spatial_pivot(analysis_gdf, hoods).sort_values(False, ascending=False).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Write out a GeoJSON file for mapping." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "analysis_gdf.to_file(\"./data/map.geojson\", driver=\"GeoJSON\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 2 }
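The paging logic in `scrape()` above (probe once for `total_listings`, then request `ceil(total / page_size)` pages) can be sketched without touching the network. This is a minimal sketch under stated assumptions: `fetch_page` is a hypothetical stand-in for the `requests.get(...).json()` call, and the response shape (`meta.total_listings`, `data.listings`) is copied from the notebook, not from any WeedMaps documentation. Note that the notebook sends both `page_size` (150) and `size` (100) but divides by 100, so this sketch derives the page count from a single `page_size` value.

```python
import math
import time
from typing import Callable, Dict, List


def count_pages(total_listings: int, page_size: int) -> int:
    """Number of pages needed to cover every listing."""
    return math.ceil(total_listings / page_size)


def scrape_all(fetch_page: Callable[[int], Dict], page_size: int,
               delay: float = 0.0) -> List[Dict]:
    """Collect listings from every page.

    fetch_page is a hypothetical callable: it takes a 1-based page number and
    returns a decoded payload shaped like the one the notebook reads,
    {"meta": {"total_listings": ...}, "data": {"listings": [...]}}.
    """
    # Probe the first page to learn how many listings exist in total
    first = fetch_page(1)
    pages = count_pages(first["meta"]["total_listings"], page_size)

    # Reuse the probe response instead of downloading page 1 twice
    listings = list(first["data"]["listings"])
    for page in range(2, pages + 1):
        time.sleep(delay)  # give the API a break between requests
        listings.extend(fetch_page(page)["data"]["listings"])
    return listings


# Usage with a fake, in-memory "API" of 7 listings served 3 per page
FAKE = [{"name": f"shop {i}"} for i in range(7)]


def fake_fetch(page: int, size: int = 3) -> Dict:
    start = (page - 1) * size
    return {"meta": {"total_listings": len(FAKE)},
            "data": {"listings": FAKE[start:start + size]}}


print(count_pages(7, 3))               # 3
print(len(scrape_all(fake_fetch, 3)))  # 7
```

Reusing the probe response saves one request compared with the notebook, which downloads page 1 twice; the `delay` argument mirrors its one-second `time.sleep`.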
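The city-limits filter above relies on `GeoSeries.within`, which hands the point-in-polygon test to shapely. To show what that test decides, here is a pure-Python ray-casting sketch. It is an illustration, not what geopandas runs internally, and it assumes a single polygon ring with no holes (the real city boundary is more complex); points exactly on an edge may land either way. The shapes below are made-up examples, not LA's boundary.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), i.e. (longitude, latitude)


def point_in_ring(point: Point, ring: List[Point]) -> bool:
    """Ray casting: shoot a ray from the point toward +x and count how many
    ring edges it crosses. An odd number of crossings means inside."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the ray's y value can be crossed;
        # this also guarantees y1 != y2, so the division below is safe
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_within(points: List[Point], ring: List[Point]) -> List[Point]:
    """Keep only points inside the ring, like gdf[gdf.geometry.within(shape)]."""
    return [p for p in points if point_in_ring(p, ring)]


# Hypothetical unit square
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(filter_within([(0.5, 0.5), (1.5, 0.5), (0.2, 0.9)], square))
# [(0.5, 0.5), (0.2, 0.9)]
```

The crossing count also handles concave shapes, which matters for an irregular boundary like a city's.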
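After the spatial join, `spatial_pivot` above counts listings per shape and registration status, then divides the unregistered count by the total. Once each listing has been tagged with the name of the shape it falls in, the same arithmetic can be sketched in pure Python. The `(shape_name, registered)` pairs are a hypothetical input standing in for the joined rows. Missing statuses count as unregistered, as with the notebook's `fillna(False)`. The usage counts are chosen to mirror two rows of the regional table above.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def pivot_unlicensed(
    rows: Iterable[Tuple[str, Optional[bool]]]
) -> Dict[str, Dict[str, float]]:
    """Summarize (shape_name, registered) pairs into counts and shares.

    A registered value of None is treated as unregistered (False).
    """
    counts = Counter((name, bool(registered)) for name, registered in rows)
    table = {}
    for name in sorted({name for name, _ in counts}):
        unregistered = counts[(name, False)]
        registered = counts[(name, True)]
        total = unregistered + registered
        table[name] = {
            "unregistered": unregistered,
            "registered": registered,
            "total": total,
            "percent_unlicensed": unregistered / total,
        }
    # Highest share of unregistered listings first, like the notebook's sort
    return dict(sorted(table.items(),
                       key=lambda kv: kv[1]["percent_unlicensed"],
                       reverse=True))


rows = ([("South", False)] * 70 + [("South", True)] * 8
        + [("Northeast", None)] + [("Northeast", True)] * 3)
for name, stats in pivot_unlicensed(rows).items():
    print(name, stats["total"], round(stats["percent_unlicensed"], 6))
# South 78 0.897436
# Northeast 4 0.25
```

Unlike the pandas pivot, which leaves NaN where a shape has no listings of one status, the `Counter` lookup returns 0 for missing combinations.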