{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "![title](https://www.nationsonline.org/gallery/USA/Golden-Gate-Bridge-San-Francisco.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#
An intelligent location study using machine learning algorithms to select locations for an Italian restaurant in the city of San Francisco
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Roque Leal
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Italian restaurants of San Francisco are part of the city's culture, the customs of its inhabitants, and its tourist circuit. They have been studied by various writers, have inspired countless artistic creations, and are traditional meeting places.\n", "This project aims to find an optimal location for a new Italian restaurant, using machine learning techniques from the \"The Battle of Neighborhoods: Coursera Capstone Project\" course (1).\n", "Since Italian restaurants compete with restaurants in general, we will first look for locations based on the factors that will influence our decision:\n", "\n", "** 1- Places that are not yet crowded with restaurants. **\n", "\n", "** 2- Areas with few or no Italian restaurants nearby. **\n", "\n", "** 3- As close to the city center as possible, provided the first two conditions are met. **\n", "\n", "With these simple criteria, we will write an algorithm to discover which candidate locations can be obtained." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data Source\n", "\n", "The following data sources will be needed to extract and generate the required information:\n", "\n", "1.- The centers of the candidate areas will be generated automatically by the algorithm, and their approximate addresses will be obtained using one of the Geopy geocoders. (2)\n", "\n", "2.- The number of restaurants, their type, and their location in each neighborhood will be obtained using the Foursquare API. (3)\n", "\n", "The data will be used in the following scenarios:\n", "\n", "** 1- To discover the density of all restaurants and Italian restaurants from the extracted data. **\n", "\n", "** 2- To identify areas that are neither dense nor highly competitive. **\n", "\n", "** 3- To calculate the distances between competing restaurants. 
**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Locate the candidates\n", "\n", "The target area will be the center of the city, where tourist attractions are more numerous than elsewhere. From this we will create a grid of cells covering the area of interest, which will be about 12x12 kilometers centered on the center of San Francisco." ] }, { "cell_type": "code", "execution_count": 140, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Coordinate of 199 Gough St, San Francisco, CA 94102, USA: [37.7752096, -122.4227735] location : Rich Table, 199, Gough Street, Western Addition, San Francisco, San Francisco City and County, California, 94102, United States\n" ] } ], "source": [ "import requests\n", "\n", "from geopy.geocoders import Nominatim\n", "\n", "\n", "address = '199 Gough St, San Francisco, CA 94102, USA'\n", "geolocator = Nominatim(user_agent=\"usa_explorer\")\n", "location = geolocator.geocode(address)\n", "lat = location.latitude\n", "lng = location.longitude\n", "sf_center = [lat, lng]\n", "print('Coordinate of {}: {}'.format(address, sf_center), ' location : ', location)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We create a grid of equidistant candidate areas centered on the city center and extending 6 km around it. To do so, we compute the grid locations in a 2D Cartesian coordinate system (UTM), which lets us calculate distances in meters.\n", "\n", "Next, we will project these coordinates back to latitude/longitude degrees so they can be displayed on maps with Mapbox and Folium (3)."
] }, { "cell_type": "code", "execution_count": 141, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Coordinate Verification\n", "-------------------------------\n", "San Francisco Center Union Square longitude=-122.4227735, latitude=37.7752096\n", "San Francisco Center Union Square UTM X=550833.4653390996, Y=4181031.39254272\n", "San Francisco Center Union Square longitude=-122.4227735, latitude=37.7752096\n" ] } ], "source": [ "#!pip install shapely\n", "import shapely.geometry\n", "\n", "#!pip install pyproj\n", "import pyproj\n", "\n", "import math\n", "\n", "def lonlat_to_xy(lon, lat):\n", " proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')\n", " proj_xy = pyproj.Proj(proj=\"utm\", zone=10, datum='WGS84')\n", " xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)\n", " return xy[0], xy[1]\n", "\n", "def xy_to_lonlat(x, y):\n", " proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')\n", " proj_xy = pyproj.Proj(proj=\"utm\", zone=10, datum='WGS84')\n", " lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)\n", " return lonlat[0], lonlat[1]\n", "\n", "def calc_xy_distance(x1, y1, x2, y2):\n", " dx = x2 - x1\n", " dy = y2 - y1\n", " return math.sqrt(dx*dx + dy*dy)\n", "\n", "print('Coordinate Verification')\n", "print('-------------------------------')\n", "print('San Francisco Center Union Square longitude={}, latitude={}'.format(sf_center[1], sf_center[0]))\n", "x, y = lonlat_to_xy(sf_center[1], sf_center[0])\n", "print('San Francisco Center Union Square UTM X={}, Y={}'.format(x, y))\n", "lo, la = xy_to_lonlat(x, y)\n", "print('San Francisco Center Union Square longitude={}, latitude={}'.format(lo, la))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We create a hexagonal grid of cells: ** we move all the lines and adjust the spacing of the vertical lines so that each cell center is equidistant from all its neighbors. 
**" ] }, { "cell_type": "code", "execution_count": 142, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "728 Union Square San Francisco grid - SF\n" ] } ], "source": [ "sf_center_x, sf_center_y = lonlat_to_xy(sf_center[1], sf_center[0]) # City center in Cartesian coordinates\n", "\n", "k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells\n", "x_min = sf_center_x - 6000\n", "x_step = 600\n", "y_min = sf_center_y - 6000 - (int(21/k)*k*600 - 12000)/2\n", "y_step = 600 * k\n", "\n", "latitude = []\n", "longitude = []\n", "distances_from_center = []\n", "xs = []\n", "ys = []\n", "for i in range(0, int(21/k)):\n", "    y = y_min + i * y_step\n", "    x_offset = 300 if i%2==0 else 0  # Shift every other row by half a step\n", "    for j in range(0, 21):\n", "        x = x_min + j * x_step + x_offset\n", "        distance_from_center = calc_xy_distance(sf_center_x, sf_center_y, x, y)\n", "        if (distance_from_center <= 6001):  # Keep only cells within 6 km of the center\n", "            lon, lat = xy_to_lonlat(x, y)\n", "            latitude.append(lat)\n", "            longitude.append(lon)\n", "            distances_from_center.append(distance_from_center)\n", "            xs.append(x)\n", "            ys.append(y)\n", "\n", "print(len(latitude), 'Union Square San Francisco grid - SF')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at the data we have so far: the city center location and the candidate neighborhood centers:" ] }, { "cell_type": "code", "execution_count": 143, "metadata": {}, "outputs": [], "source": [ "import folium" ] }, { "cell_type": "code", "execution_count": 144, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 144, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tileset = r'https://api.mapbox.com'\n", "attribution = (r'Map data © OpenStreetMap'\n", "               ' contributors, Imagery © MapBox')\n", "\n", "map_sf = folium.Map(location=sf_center, zoom_start=14, tiles=tileset, attr=attribution)\n", "folium.Marker(sf_center, popup='San Francisco').add_to(map_sf)\n", "# Draw one 300 m circle per candidate area\n", "for lat, lon in zip(latitude, longitude):\n", "    folium.Circle([lat, lon], radius=300, color='purple', fill=False).add_to(map_sf)\n", "map_sf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this point, we have the coordinates of the centers of the candidate areas to be evaluated. They are equidistant (the distance between each point and its neighbors is exactly the same) and lie within approximately 6 km of downtown San Francisco." 
] }, { "cell_type": "code", "execution_count": 145, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Reverse geocoding check\n", "-----------------------\n", "Address of [37.7752096, -122.4227735] is: Rich Table, 199, Gough Street, Western Addition, San Francisco, San Francisco City and County, California, 94102, United States\n", "\n" ] } ], "source": [ "def get_address(lat, lng):\n", " #print('entering get address')\n", " try:\n", " #address = '{},{}'.format(lat, lng)\n", " address = [lat, lng]\n", " geolocator = Nominatim(user_agent=\"usa_explorer\")\n", " location = geolocator.geocode(address)\n", " #print(location[0])\n", " return location[0]\n", " except:\n", " return 'nothing found'\n", "\n", "\n", "addr = get_address(sf_center[0], sf_center[1])\n", "print('Reverse geocoding check')\n", "print('-----------------------')\n", "print('Address of [{}, {}] is: {}'.format(sf_center[0], sf_center[1], addr)) \n", "print(type(location[0]))" ] }, { "cell_type": "code", "execution_count": 146, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Getting Locations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
done.\n" ] } ], "source": [ "print('Getting Locations: ', end='')\n", "addresses = []\n", "for lat, lon in zip(latitude, longitude):\n", " address = get_address(lat, lon)\n", " if address is None:\n", " address = 'NO ADDRESS'\n", " address = address.replace(', United States', '') \n", " addresses.append(address)\n", " print(' .', end='')\n", "print(' done.')" ] }, { "cell_type": "code", "execution_count": 180, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
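<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>Dirección</th><th>Latitude</th><th>Longitude</th><th>X</th><th>Y</th><th>Distance from centroid</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>San Jose Avenue, Excelsior, San Francisco, San...</td><td>37.723793</td><td>-122.443598</td><td>549033.465339</td><td>4.175316e+06</td><td>5992.495307</td></tr>\n",
"<tr><th>1</th><td>nothing found</td><td>37.723760</td><td>-122.436790</td><td>549633.465339</td><td>4.175316e+06</td><td>5840.376700</td></tr>\n",
"<tr><th>2</th><td>335, Edinburgh Street, Excelsior, San Francisc...</td><td>37.723727</td><td>-122.429982</td><td>550233.465339</td><td>4.175316e+06</td><td>5747.173218</td></tr>\n",
"<tr><th>3</th><td>John McLaren Park Playground, Burrows Street, ...</td><td>37.723694</td><td>-122.423174</td><td>550833.465339</td><td>4.175316e+06</td><td>5715.767665</td></tr>\n",
"<tr><th>4</th><td>400, Yale Street, Portola, San Francisco, San ...</td><td>37.723661</td><td>-122.416365</td><td>551433.465339</td><td>4.175316e+06</td><td>5747.173218</td></tr>\n",
"</tbody>\n",
"</table>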
" ], "text/plain": [ " Dirección Latitude Longitude \\\n", "0 San Jose Avenue, Excelsior, San Francisco, San... 37.723793 -122.443598 \n", "1 nothing found 37.723760 -122.436790 \n", "2 335, Edinburgh Street, Excelsior, San Francisc... 37.723727 -122.429982 \n", "3 John McLaren Park Playground, Burrows Street, ... 37.723694 -122.423174 \n", "4 400, Yale Street, Portola, San Francisco, San ... 37.723661 -122.416365 \n", "\n", " X Y Distance from centroid \n", "0 549033.465339 4.175316e+06 5992.495307 \n", "1 549633.465339 4.175316e+06 5840.376700 \n", "2 550233.465339 4.175316e+06 5747.173218 \n", "3 550833.465339 4.175316e+06 5715.767665 \n", "4 551433.465339 4.175316e+06 5747.173218 " ] }, "execution_count": 180, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "df_locations = pd.DataFrame({'Dirección': addresses,\n", " 'Latitude': latitude,\n", " 'Longitude': longitude,\n", " 'X': xs,\n", " 'Y': ys,\n", " 'Distance from centroid': distances_from_center})\n", "\n", "df_locations.head()" ] }, { "cell_type": "code", "execution_count": 181, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(364, 6)" ] }, "execution_count": 181, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_locations.shape" ] }, { "cell_type": "code", "execution_count": 182, "metadata": {}, "outputs": [], "source": [ "df_locations.to_pickle('./Dataset/sf_locations.pkl') " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Foursquare" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we will use the Foursquare API to explore the number of restaurants available within these grids and we will limit the search to food categories to retrieve latitude and longitude data from restaurants and Italian restaurant." 
] }, { "cell_type": "code", "execution_count": 183, "metadata": {}, "outputs": [], "source": [ "client_id = 'xxx'\n", "client_secret = 'xxx'\n", "VERSION = 'xxx'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use the Foursquare API to explore the restaurants available around each candidate location near downtown San Francisco, limiting the search to venues in the restaurant category and flagging those that are Italian restaurants." ] }, { "cell_type": "code", "execution_count": 184, "metadata": {}, "outputs": [], "source": [ "food_category = '4d4b7105d754a06374d81259' # Root 'Food' category (restaurants, cafés)\n", "\n", "sf_italian_categories = ['4bf58dd8d48988d110941735', '55a5a1ebe4b013909087cbb6', '55a5a1ebe4b013909087cb7c', '55a5a1ebe4b013909087cba7',\n", " '55a5a1ebe4b013909087cba1', '55a5a1ebe4b013909087cba4', '55a5a1ebe4b013909087cb95', '55a5a1ebe4b013909087cb89',\n", " '55a5a1ebe4b013909087cb9b', '55a5a1ebe4b013909087cb98', '55a5a1ebe4b013909087cbbf', '55a5a1ebe4b013909087cb79',\n", " '55a5a1ebe4b013909087cbb0', '55a5a1ebe4b013909087cbb3', '55a5a1ebe4b013909087cb74', '55a5a1ebe4b013909087cbaa',\n", " '55a5a1ebe4b013909087cb83', '55a5a1ebe4b013909087cb8c', '55a5a1ebe4b013909087cb92', '55a5a1ebe4b013909087cb8f',\n", " '55a5a1ebe4b013909087cb86', '55a5a1ebe4b013909087cbb9', '55a5a1ebe4b013909087cb7f', '55a5a1ebe4b013909087cbbc',\n", " '55a5a1ebe4b013909087cb9e', '55a5a1ebe4b013909087cbc2', '55a5a1ebe4b013909087cbad']\n", "\n" ] }, { "cell_type": "code", "execution_count": 185, "metadata": {}, "outputs": [], "source": [ "def is_restaurant(categories, specific_filter=None):\n", " restaurant_words = ['restaurant', 'sushi', 'hamburger', 'seafood']\n", " restaurant = False\n", " specific = False\n", " for c in categories:\n", " category_name = c[0].lower()\n", " category_id = c[1]\n", " for r in restaurant_words:\n", " if r in category_name:\n", " restaurant = True\n", " if 'Restaurante' in category_name:\n", " restaurant = 
False\n", " if not(specific_filter is None) and (category_id in specific_filter):\n", " specific = True\n", " restaurant = True\n", " return restaurant, specific\n", "\n", "def get_categories(categories):\n", " return [(cat['name'], cat['id']) for cat in categories]\n", "\n", "def format_address(location):\n", " address = ', '.join(location['formattedAddress'])\n", " address = address.replace(', USA', '')\n", " address = address.replace(', United States', '')\n", " return address\n", "\n", "def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=1000):\n", " version = '20180724'\n", " url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(\n", " client_id, client_secret, version, lat, lon, category, radius, limit)\n", " try:\n", " results = requests.get(url).json()['response']['groups'][0]['items']\n", " venues = [(item['venue']['id'],\n", " item['venue']['name'],\n", " get_categories(item['venue']['categories']),\n", " (item['venue']['location']['lat'], item['venue']['location']['lng']),\n", " format_address(item['venue']['location']),\n", " item['venue']['location']['distance']) for item in results] \n", " except:\n", " venues = []\n", " return venues" ] }, { "cell_type": "code", "execution_count": 186, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Restaurant Data Downloading\n", "Obtaining the candidates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.\n" ] } ], "source": [ "\n", "import pickle\n", "\n", "def get_restaurants(lats, lons):\n", " restaurants = {}\n", " sf_italian = {}\n", " location_restaurants = []\n", "\n", " print('Obtaining the candidates', end='')\n", " for lat, lon in zip(lats, lons):\n", " venues = get_venues_near_location(lat, lon, food_category, client_id, client_secret, radius=350, limit=100)\n", " area_restaurants = []\n", " for venue in venues:\n", " venue_id = venue[0]\n", " venue_name = venue[1]\n", " venue_categories = venue[2]\n", " venue_latlon = venue[3]\n", " venue_address = venue[4]\n", " venue_distance = venue[5]\n", " is_res, is_italian = is_restaurant(venue_categories, specific_filter=sf_italian_categories)\n", " if is_res:\n", " x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])\n", " restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_italian, x, y)\n", " if venue_distance<=300:\n", " area_restaurants.append(restaurant)\n", " restaurants[venue_id] = restaurant\n", " if 
is_italian:\n", " sf_italian[venue_id] = restaurant\n", " location_restaurants.append(area_restaurants)\n", " print(' .', end='')\n", " print(' done.')\n", " return restaurants, sf_italian, location_restaurants\n", "\n", "\n", "# Try to load previously saved results; fall back to the Foursquare download otherwise\n", "restaurants = {}\n", "sf_italian = {}\n", "location_restaurants = []\n", "loaded = False\n", "try:\n", "    with open('/Dataset/restaurants_350.pkl', 'rb') as f:\n", "        restaurants = pickle.load(f)\n", "    print('Restaurant data loaded.')\n", "    with open('/Dataset/sf_italian_350.pkl', 'rb') as f:\n", "        sf_italian = pickle.load(f)\n", "    print('Italian restaurant data loaded.')\n", "    with open('/Dataset/location_restaurants_350.pkl', 'rb') as f:\n", "        location_restaurants = pickle.load(f)\n", "    print('Restaurants per location loaded.')\n", "    loaded = True\n", "except Exception:\n", "    print('Restaurant Data Downloading')\n", "\n", "\n", "if not loaded:\n", "    restaurants, sf_italian, location_restaurants = get_restaurants(latitude, longitude)" ] }, { "cell_type": "code", "execution_count": 187, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 188, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "**Results**\n", "Total Number of Restaurants: 1681\n", "Total Number of Italian restaurants: 118\n", "Percentage of Italian restaurants: 7.02%\n", "Average of Venues per grid: 4.052197802197802\n" ] } ], "source": [ "print('**Results**',)\n", "print('Total Number of Restaurants:', len(restaurants))\n", "print('Total Number of Italian restaurants:', len(sf_italian))\n", "print('Percentage of Italian restaurants: {:.2f}%'.format(len(sf_italian) / len(restaurants) * 100))\n", "print('Average of Venues per grid:', np.array([len(r) for r in location_restaurants]).mean())" ] }, { "cell_type": "code", "execution_count": 189, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "List of All Restaurants\n", 
"-----------------------\n", "('4ceec3b83b03f04de88d3bdc', \"Henry's Hunan Restaurant\", 37.72218603642267, -122.43659651808754, '4753 Mission St, San Francisco, CA 94112', 176, False, 549651.5465355547, 4175141.074954057)\n", "('4a0e123af964a520c2751fe3', 'Taquerias El Farolito', 37.72122961664814, -122.43739536867459, '4817 Mission St (at Onondaga St), San Francisco, CA 94112', 286, False, 549581.7824803684, 4175034.538650764)\n", "('546960f7498eac74bd5baf47', 'Tao Sushi', 37.721036775089686, -122.4376651904847, '4808 Mission At (Onondaga Ave), San Francisco, CA', 312, False, 549558.1316389883, 4175013.0004052045)\n", "('4b244110f964a520c76424e3', 'Taqueria Guadalajara', 37.7212324569519, -122.43763599260711, '4798 Mission St (at Onondaga Ave), San Francisco, CA 94112', 291, False, 549560.574468874, 4175034.726401493)\n", "('4a6b8478f964a520ecce1fe3', 'Mexico Tipico', 37.72501226746621, -122.43447912554541, '4581 Mission St (at Brazil Ave), San Francisco, CA 94112', 246, False, 549836.2556397481, 4175455.7650877447)\n", "('4a91a3faf964a520171b20e3', 'Beijing Restaurant 北京小馆', 37.723599683798, -122.43719187724251, '1801 Alemany Blvd (at Ocean Ave), San Francisco, CA 94112', 39, False, 549598.1357189683, 4175297.6010806696)\n", "('588e3e6632b072494c6cf57e', 'An Chi', 37.72343008519264, -122.43573516334256, '4683 Mission St, San Francisco, CA 94112', 99, False, 549726.6248046655, 4175279.5569969686)\n", "('4aff274cf964a5200b3522e3', 'Hawaiian Drive Inn #28', 37.72114068878443, -122.43738942911332, '4827 Mission St, San Francisco, CA 94112', 296, False, 549582.3652084664, 4175024.675411926)\n", "('57bd06c8cd10e903763a7664', 'Hwaro', 37.725637597880784, -122.43431782363075, '4516 Mission St, San Francisco, CA 94112', 322, False, 549850.0512717982, 4175525.230272441)\n", "('5941ec67e2ead1688f4f464a', 'El Gran Taco Loco', 37.724746, -122.43448300000001, '4591 Mission St, San Francisco, CA 94112', 230, False, 549836.0926191276, 4175426.2211156166)\n", "...\n", "Total: 
1681\n" ] } ], "source": [ "print('List of All Restaurants')\n", "print('-----------------------')\n", "for r in list(restaurants.values())[:10]:\n", " print(r)\n", "print('...')\n", "print('Total:', len(restaurants))" ] }, { "cell_type": "code", "execution_count": 190, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "List of all Italian restaurants\n", "---------------------------\n", "('4be4bf122457a593e2b9aa15', 'Marche Club', 37.728095, -122.432397, '4346 Mission St (btwn Tingley St & Theresa St), San Francisco, CA 94112', 91, True, 550017.6701432205, 4175798.899217597)\n", "('4ef010c00e01e1fde2099099', 'Manzoni', 37.73467816914885, -122.43389799980405, '2790 Diamond St, San Francisco, CA 94131', 302, True, 549880.9832699064, 4176528.490363779)\n", "('5195394d498e344eeb952b4f', 'Trattoria Da Vittorio', 37.739295412112625, -122.46759110305597, '150 West Portal Ave, San Francisco, CA 94127', 151, True, 546909.2447572381, 4177023.347445145)\n", "('4be72d932457a593b8a6ad15', 'Spiazzo Ristorante', 37.74049906835031, -122.46611414213069, '33 West Portal Ave, San Francisco, CA 94127', 306, True, 547038.6154491554, 4177157.632339159)\n", "('4b2edd7df964a520a2e724e3', 'Vega', 37.7391742135669, -122.41743951497574, '419 Cortland Ave (btwn Bennington & Wool), San Francisco, CA 94110', 253, True, 551328.0990331663, 4177036.2170301196)\n", "('4ae4ff0cf964a520f49f21e3', 'VinoRosso', 37.73901245660888, -122.41534272358848, '629 Cortland Ave (at Anderson Street), San Francisco, CA 94110', 263, True, 551512.9563385877, 4177019.42214691)\n", "('49bed272f964a520e3541fe3', 'La Ciccia', 37.74200800946477, -122.42653101682663, '291 30th St (at Church), San Francisco, CA 94131', 311, True, 550525.1341258159, 4177345.6763315448)\n", "('58c6b74f730a925fc305a126', 'Ardiana', 37.74248738572593, -122.42650722060347, '1781 Church St, San Francisco, CA 94131', 306, True, 550526.9048224975, 4177398.875309537)\n", "('4b5fb718f964a5209dc929e3', 'Cafe 
Stefano', 37.74236536, -122.423196, '59 30th St (btw Mission & San Jose), San Francisco, CA 94110', 16, True, 550818.7219270115, 4177387.1293513896)\n", "('4be1d60c4283c9b68da754f8', 'South Beach Cafe', 37.74791482485267, -122.43318557739258, '800 Embarcadero, San Francisco, CA 94107', 84, True, 549934.8644184133, 4177997.458160931)\n", "...\n", "Total: 118\n" ] } ], "source": [ "print('List of all Italian restaurants')\n", "print('---------------------------')\n", "for r in list(sf_italian.values())[:10]:\n", " print(r)\n", "print('...')\n", "print('Total:', len(sf_italian))" ] }, { "cell_type": "code", "execution_count": 191, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Author Restaurants\n", "---------------------------\n", "Restaurants around location 101: \n", "Restaurants around location 102: Rainbow Cafe\n", "Restaurants around location 103: \n", "Restaurants around location 104: restaurante pressman@berman, Le Chateau De Bob\n", "Restaurants around location 105: \n", "Restaurants around location 106: Lolinda, Foreign Cinema, El Techo, Loló, Radio Habana Social Club, Naked Kitchen, Californios, Udupi Palace\n", "Restaurants around location 107: Heirloom Café, Bon, Nene, El Metate, flour + water, Sushi Hon, Mis Antojitos, El Porvenir Produce Market, Sasaki\n", "Restaurants around location 108: La Paz Restaurant Pupuseria, VBOWLS\n", "Restaurants around location 109: \n", "Restaurants around location 110: ChocolateLab\n" ] } ], "source": [ "print('Author Restaurants')\n", "print('---------------------------')\n", "for i in range(100, 110):\n", " rs = location_restaurants[i][:8]\n", " names = ', '.join([r[1] for r in rs])\n", " print('Restaurants around location {}: {}'.format(i+1, names))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All restaurants in the city of San Francisco are indicated in gray and those associated with Italian restaurants will be highlighted in red." 
] }, { "cell_type": "code", "execution_count": 192, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 192, "metadata": {}, "output_type": "execute_result" } ], "source": [ "map_sf = folium.Map(location=sf_center, zoom_start=13, tiles=tileset, attr=attribution)\n", "folium.Marker(sf_center, popup='San Francisco').add_to(map_sf)\n", "for res in restaurants.values():\n", "    lat = res[2]; lon = res[3]\n", "    is_italian = res[6]\n", "    color = 'red' if is_italian else 'grey'\n", "    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_sf)\n", "map_sf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we calculate the distance ** from each grid cell center to the nearest Italian restaurant ** (not only those located less than 300 m away, since we want the distance to the closest one however far it is)." ] }, { "cell_type": "code", "execution_count": 194, "metadata": {}, "outputs": [], "source": [ "distances_to_sf_italian = []\n", "\n", "for area_x, area_y in zip(xs, ys):\n", "    min_distance = 100  # Starting value; it also caps the reported distance at 100 m\n", "    for res in sf_italian.values():\n", "        res_x = res[7]\n", "        res_y = res[8]\n", "        d = calc_xy_distance(area_x, area_y, res_x, res_y)\n", "        if d < min_distance:\n", "            min_distance = d\n", "    distances_to_sf_italian.append(min_distance)\n", "\n", "df_locations['Distances to the Italian restaurant'] = distances_to_sf_italian" ] }, { "cell_type": "code", "execution_count": 195, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>Dirección</th><th>Latitude</th><th>Longitude</th><th>X</th><th>Y</th><th>Distance from centroid</th><th>Distances to the Italian restaurant</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>San Jose Avenue, Excelsior, San Francisco, San...</td><td>37.723793</td><td>-122.443598</td><td>549033.465339</td><td>4.175316e+06</td><td>5992.495307</td><td>100.0</td></tr>\n",
"<tr><th>1</th><td>nothing found</td><td>37.723760</td><td>-122.436790</td><td>549633.465339</td><td>4.175316e+06</td><td>5840.376700</td><td>100.0</td></tr>\n",
"<tr><th>2</th><td>335, Edinburgh Street, Excelsior, San Francisc...</td><td>37.723727</td><td>-122.429982</td><td>550233.465339</td><td>4.175316e+06</td><td>5747.173218</td><td>100.0</td></tr>\n",
"<tr><th>3</th><td>John McLaren Park Playground, Burrows Street, ...</td><td>37.723694</td><td>-122.423174</td><td>550833.465339</td><td>4.175316e+06</td><td>5715.767665</td><td>100.0</td></tr>\n",
"<tr><th>4</th><td>400, Yale Street, Portola, San Francisco, San ...</td><td>37.723661</td><td>-122.416365</td><td>551433.465339</td><td>4.175316e+06</td><td>5747.173218</td><td>100.0</td></tr>\n",
"<tr><th>5</th><td>Bowdoin Street, Portola, San Francisco, San Fr...</td><td>37.723627</td><td>-122.409557</td><td>552033.465339</td><td>4.175316e+06</td><td>5840.376700</td><td>100.0</td></tr>\n",
"<tr><th>6</th><td>717, Girard Street, Portola, San Francisco, Sa...</td><td>37.723593</td><td>-122.402749</td><td>552633.465339</td><td>4.175316e+06</td><td>5992.495307</td><td>100.0</td></tr>\n",
"<tr><th>7</th><td>Archbishop Riordan High School, Judson Avenue,...</td><td>37.728524</td><td>-122.453776</td><td>548133.465339</td><td>4.175835e+06</td><td>5855.766389</td><td>100.0</td></tr>\n",
"<tr><th>8</th><td>212, Judson Avenue, Ingleside, San Francisco, ...</td><td>37.728492</td><td>-122.446967</td><td>548733.465339</td><td>4.175835e+06</td><td>5604.462508</td><td>100.0</td></tr>\n",
"<tr><th>9</th><td>Samoan Assemblies of God, 1819, San Jose Avenu...</td><td>37.728460</td><td>-122.440159</td><td>549333.465339</td><td>4.175835e+06</td><td>5408.326913</td><td>100.0</td></tr>\n",
"</tbody>\n",
"</table>
\n", "" ], "text/plain": [ " Dirección Latitude Longitude \\\n", "0 San Jose Avenue, Excelsior, San Francisco, San... 37.723793 -122.443598 \n", "1 nothing found 37.723760 -122.436790 \n", "2 335, Edinburgh Street, Excelsior, San Francisc... 37.723727 -122.429982 \n", "3 John McLaren Park Playground, Burrows Street, ... 37.723694 -122.423174 \n", "4 400, Yale Street, Portola, San Francisco, San ... 37.723661 -122.416365 \n", "5 Bowdoin Street, Portola, San Francisco, San Fr... 37.723627 -122.409557 \n", "6 717, Girard Street, Portola, San Francisco, Sa... 37.723593 -122.402749 \n", "7 Archbishop Riordan High School, Judson Avenue,... 37.728524 -122.453776 \n", "8 212, Judson Avenue, Ingleside, San Francisco, ... 37.728492 -122.446967 \n", "9 Samoan Assemblies of God, 1819, San Jose Avenu... 37.728460 -122.440159 \n", "\n", " X Y Distance from centroid \\\n", "0 549033.465339 4.175316e+06 5992.495307 \n", "1 549633.465339 4.175316e+06 5840.376700 \n", "2 550233.465339 4.175316e+06 5747.173218 \n", "3 550833.465339 4.175316e+06 5715.767665 \n", "4 551433.465339 4.175316e+06 5747.173218 \n", "5 552033.465339 4.175316e+06 5840.376700 \n", "6 552633.465339 4.175316e+06 5992.495307 \n", "7 548133.465339 4.175835e+06 5855.766389 \n", "8 548733.465339 4.175835e+06 5604.462508 \n", "9 549333.465339 4.175835e+06 5408.326913 \n", "\n", " Distances to the Italian restaurant \n", "0 100.0 \n", "1 100.0 \n", "2 100.0 \n", "3 100.0 \n", "4 100.0 \n", "5 100.0 \n", "6 100.0 \n", "7 100.0 \n", "8 100.0 \n", "9 100.0 " ] }, "execution_count": 195, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_locations.head(10)" ] }, { "cell_type": "code", "execution_count": 196, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Average distance in meters from the nearest coffee shop to each center: 98.57250001080786\n" ] } ], "source": [ "print('Average distance in meters from the nearest coffee shop to each center:', df_locations['Distances 
to the Italian restaurant'].mean())\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use a **HeatMap with Mapbox to visualize the density of restaurants within the selected radius of downtown San Francisco.**" ] }, { "cell_type": "code", "execution_count": 197, "metadata": {}, "outputs": [], "source": [ "restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]\n", "\n", "italian_latlons = [[res[2], res[3]] for res in sf_italian.values()]" ] }, { "cell_type": "code", "execution_count": 198, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 198, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from folium import plugins\n", "from folium.plugins import HeatMap\n", "\n", "map_sf = folium.Map(location=sf_center, zoom_start=13, tiles=tileset, attr=attribution)\n", "HeatMap(restaurant_latlons).add_to(map_sf)\n", "folium.Marker(sf_center).add_to(map_sf)\n", "folium.Circle(sf_center, radius=1000, fill=False, color='white').add_to(map_sf)\n", "folium.Circle(sf_center, radius=2000, fill=False, color='blue').add_to(map_sf)\n", "folium.Circle(sf_center, radius=3000, fill=False, color='red').add_to(map_sf)\n", "map_sf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we present another visualization with a Heatmap of only Italian restaurants" ] }, { "cell_type": "code", "execution_count": 199, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 199, "metadata": {}, "output_type": "execute_result" } ], "source": [ "map_sf = folium.Map(location=sf_center, zoom_start=13, tiles=tileset, attr=attribution)\n", "HeatMap(italian_latlons).add_to(map_sf)\n", "folium.Marker(sf_center).add_to(map_sf)\n", "folium.Circle(sf_center, radius=1000, fill=False, color='white').add_to(map_sf)\n", "folium.Circle(sf_center, radius=2000, fill=False, color='blue').add_to(map_sf)\n", "folium.Circle(sf_center, radius=3000, fill=False, color='red').add_to(map_sf)\n", "map_sf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the above maps, we found that most of the restaurants are scattered on the north side of the center of the area under study. We will focus on the areas with the lowest density to locate the candidates." ] }, { "cell_type": "code", "execution_count": 200, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 200, "metadata": {}, "output_type": "execute_result" } ], "source": [ "roi_x_min = sf_center_x - 2000\n", "roi_y_max = sf_center_y + 1000\n", "roi_width = 5000\n", "roi_height = 5000\n", "roi_center_x = roi_x_min + 1900\n", "roi_center_y = roi_y_max - 700\n", "roi_center_lon, roi_center_lat = xy_to_lonlat(roi_center_x, roi_center_y)\n", "roi_center = [roi_center_lat, roi_center_lon]\n", "map_sf = folium.Map(location=sf_center, zoom_start=13, tiles=tileset, attr=attribution)\n", "HeatMap(restaurant_latlons).add_to(map_sf)\n", "folium.Marker(sf_center).add_to(map_sf)\n", "folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_sf)\n", "map_sf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we build a denser grid (100 m spacing) over this region of interest to locate the candidates." ] }, { "cell_type": "code", "execution_count": 201, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2120 Locations with possible candidates.\n" ] } ], "source": [ "k = math.sqrt(3) / 2 \n", "x_step = 100\n", "y_step = 100 * k \n", "roi_y_min = roi_center_y - 2500\n", "\n", "roi_latitudes = []\n", "roi_longitudes = []\n", "roi_xs = []\n", "roi_ys = []\n", "for i in range(0, int(51/k)):\n", "    y = roi_y_min + i * y_step\n", "    x_offset = 50 if i%2==0 else 0\n", "    for j in range(0, 51):\n", "        x = roi_x_min + j * x_step + x_offset\n", "        d = calc_xy_distance(roi_center_x, roi_center_y, x, y)\n", "        if (d <= 2501):\n", "            lon, lat = xy_to_lonlat(x, y)\n", "            roi_latitudes.append(lat)\n", "            roi_longitudes.append(lon)\n", "            roi_xs.append(x)\n", "            roi_ys.append(y)\n", "\n", "print(len(roi_latitudes), 'Locations with possible candidates.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We calculate two more important things for each candidate location: the number of nearby restaurants ** (we will use a radius of 250 meters) ** and the distance to the nearest 
Italian restaurant." ] }, { "cell_type": "code", "execution_count": 216, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Generating the data of potential candidates... done.\n" ] } ], "source": [ "def count_restaurants_nearby(x, y, restaurants, radius=250): \n", "    # Count the restaurants within `radius` meters of the point (x, y)\n", "    count = 0\n", "    for res in restaurants.values():\n", "        res_x = res[7]; res_y = res[8]\n", "        d = calc_xy_distance(x, y, res_x, res_y)\n", "        if d<=radius:\n", "            count += 1\n", "    return count\n", "\n", "def find_nearest_restaurant(x, y, restaurants):\n", "    # Return the distance in meters from (x, y) to the closest restaurant\n", "    d_min = 100000\n", "    for res in restaurants.values():\n", "        res_x = res[7]; res_y = res[8]\n", "        d = calc_xy_distance(x, y, res_x, res_y)\n", "        if d<=d_min:\n", "            d_min = d\n", "    return d_min\n", "\n", "roi_restaurant_counts = []\n", "roi_italian_distances = []\n", "\n", "print('Generating the data of potential candidates... ', end='')\n", "for x, y in zip(roi_xs, roi_ys):\n", "    count = count_restaurants_nearby(x, y, restaurants, radius=250)\n", "    roi_restaurant_counts.append(count)\n", "    distance = find_nearest_restaurant(x, y, sf_italian)\n", "    roi_italian_distances.append(distance)\n", "print('done.')\n" ] }, { "cell_type": "code", "execution_count": 217, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Latitude Longitude X Y Nearby Restaurants \\\n", "1988 37.795104 -122.405581 552333.465339 4.183248e+06 43 \n", "1955 37.794320 -122.405020 552383.465339 4.183162e+06 43 \n", "1987 37.795109 -122.406717 552233.465339 4.183248e+06 37 \n", "1451 37.785083 -122.431214 550083.465339 4.182122e+06 36 \n", "1954 37.794326 -122.406156 552283.465339 4.183162e+06 35 \n", "\n", " Distance to nearby Italian restaurants \n", "1988 161.782747 \n", "1955 104.505932 \n", "1987 242.223144 \n", "1451 322.730435 \n", "1954 202.651736 " ] }, "execution_count": 217, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\n", "df_roi_locations = pd.DataFrame({'Latitude':roi_latitudes,\n", " 'Longitude':roi_longitudes,\n", " 'X':roi_xs,\n", " 'Y':roi_ys,\n", " 'Nearby Restaurants':roi_restaurant_counts,\n", " 'Distance to nearby Italian restaurants':roi_italian_distances})\n", "\n", "\n", "df_roi_locations.sort_values(by=['Nearby Restaurants'], ascending=False, inplace=True)\n", "\n", "df_roi_locations.head(5)" ] }, { "cell_type": "code", "execution_count": 218, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2120, 6)" ] }, "execution_count": 218, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_roi_locations.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we are going to ** filter ** these places: we are only interested in ** locations with no more than two restaurants within a radius of 250 meters and no Italian Restaurant within a perimeter of 400 meters. 
**" ] }, { "cell_type": "code", "execution_count": 219, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Places with no more than two restaurants nearby: 596\n", "Grids without Italian restaurants within 400 m.: 823\n", "Places with both conditions met: 356\n" ] } ], "source": [ "good_res_count = np.array((df_roi_locations['Nearby Restaurants']<=2))\n", "print('Places with no more than two restaurants nearby:', good_res_count.sum())\n", "\n", "good_ind_distance = np.array(df_roi_locations['Distance to nearby Italian restaurants']>=400)\n", "print('Grids without Italian restaurants within 400 m.:', good_ind_distance.sum())\n", "\n", "good_locations = np.logical_and(good_res_count, good_ind_distance)\n", "print('Places with both conditions met:', good_locations.sum())\n", "\n", "df_good_locations = df_roi_locations[good_locations]\n" ] }, { "cell_type": "code", "execution_count": 220, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 220, "metadata": {}, "output_type": "execute_result" } ], "source": [ "good_latitudes = df_good_locations['Latitude'].values\n", "good_longitudes = df_good_locations['Longitude'].values\n", "\n", "good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]\n", "map_sf = folium.Map(location=sf_center, zoom_start=14, tiles=tileset, attr=attribution)\n", "HeatMap(restaurant_latlons).add_to(map_sf)\n", "folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.6).add_to(map_sf)\n", "folium.Marker(sf_center).add_to(map_sf)\n", "for lat, lon in zip(good_latitudes, good_longitudes):\n", " folium.CircleMarker([lat, lon], radius=2, color='purple', fill=True, fill_color='blue', fill_opacity=1).add_to(map_sf) \n", "map_sf" ] }, { "cell_type": "code", "execution_count": 215, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 215, "metadata": {}, "output_type": "execute_result" } ], "source": [ "map_sf = folium.Map(location=sf_center, zoom_start=14, tiles=tileset, attr=attribution)\n", "HeatMap(good_locations, radius=25).add_to(map_sf)\n", "folium.Marker(sf_center).add_to(map_sf)\n", "for lat, lon in zip(good_latitudes, good_longitudes):\n", "    folium.CircleMarker([lat, lon], radius=2, color='purple', fill=True, fill_color='blue', fill_opacity=1).add_to(map_sf)\n", "map_sf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we are going to **group** these locations with a machine learning algorithm, in this case K-means, to create **8 clusters of good locations.** These areas, their centers, and their addresses will be the final result of our analysis." ] }, { "cell_type": "code", "execution_count": 221, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 221, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.cluster import KMeans\n", "\n", "number_of_clusters = 8\n", "\n", "good_xys = df_good_locations[['X', 'Y']].values\n", "kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)\n", "\n", "cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]\n", "\n", "map_sf = folium.Map(location=sf_center, zoom_start=14, tiles=tileset, attr=attribution)\n", "HeatMap(restaurant_latlons).add_to(map_sf)\n", "folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_sf)\n", "folium.Marker(sf_center).add_to(map_sf)\n", "for lon, lat in cluster_centers:\n", "    folium.Circle([lat, lon], radius=500, color='gray', fill=True, fill_opacity=0.25).add_to(map_sf) \n", "for lat, lon in zip(good_latitudes, good_longitudes):\n", "    folium.CircleMarker([lat, lon], radius=2, color='purple', fill=True, fill_color='blue', fill_opacity=1).add_to(map_sf)\n", "map_sf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at these areas west and south of the city with a Heatmap, using shaded areas to indicate the 8 groups created:" ] }, { "cell_type": "code", "execution_count": 222, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 222, "metadata": {}, "output_type": "execute_result" } ], "source": [ "map_sf = folium.Map(location=sf_center, zoom_start=14, tiles=tileset, attr=attribution)\n", "folium.Marker(sf_center).add_to(map_sf)\n", "for lat, lon in zip(good_latitudes, good_longitudes):\n", "    folium.Circle([lat, lon], radius=250, color='#00000000', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_sf)\n", "for lat, lon in zip(good_latitudes, good_longitudes):\n", "    folium.CircleMarker([lat, lon], radius=2, color='purple', fill=True, fill_color='blue', fill_opacity=1).add_to(map_sf)\n", "for lon, lat in cluster_centers:\n", "    folium.Circle([lat, lon], radius=500, color='white', fill=False).add_to(map_sf) \n", "map_sf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we list the addresses of the candidate locations (the cluster centers):" ] }, { "cell_type": "code", "execution_count": 223, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==============================================================\n", "Addresses of recommended locations\n", "==============================================================\n", "\n", "nothing found => 2.1km from downtown San Francisco\n", "1049, Laguna Street, Western Addition City and County, California, 94115 => 0.7km from downtown San Francisco\n", "355, Buena Vista Avenue East, Haight-Ashbury City and County, California, 94117 of America => 1.7km from downtown San Francisco\n", "219, Saint Josephs Avenue, Western Addition City and County, California, 94115 => 1.8km from downtown San Francisco\n", "20th Street, Liberty Street Historic District City and County, California, 94143 => 1.9km from downtown San Francisco\n", "nothing found => 1.6km from downtown San Francisco\n", "2801, Pacific Avenue, Pacific Heights City and County, California, 94123 => 2.5km from downtown San Francisco\n", "2247, Octavia Street, Japantown City and County, California, 94109 
=> 2.0km from downtown San Francisco\n" ] } ], "source": [ "candidate_area_addresses = []\n", "print('==============================================================')\n", "print('Addresses of recommended locations')\n", "print('==============================================================\\n')\n", "for lon, lat in cluster_centers:\n", " addr = get_address(lat, lon)\n", " addr = addr.replace(', United States', '')\n", " addr = addr.replace(', San Francisco', '')\n", " addr = addr.replace(', USA', '')\n", " addr = addr.replace(', SF', '')\n", " addr = addr.replace(\"'\", '')\n", " candidate_area_addresses.append(addr) \n", " x, y = lonlat_to_xy(lon, lat)\n", " d = calc_xy_distance(x, y, sf_center_x, sf_center_y)\n", " print('{}{} => {:.1f}km from downtown San Francisco'.format(addr, ' '*(50-len(addr)), d/1000))\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Results\n", "\n" ] }, { "cell_type": "code", "execution_count": 224, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 224, "metadata": {}, "output_type": "execute_result" } ], "source": [ "map_sf = folium.Map(location=sf_center, zoom_start=14, tiles=tileset, attr=attribution)\n", "folium.Circle(sf_center, radius=50, color='red', fill=True, fill_color='red', fill_opacity=1).add_to(map_sf)\n", "for lonlat, addr in zip(cluster_centers, candidate_area_addresses):\n", "    folium.Marker([lonlat[1], lonlat[0]], popup=addr).add_to(map_sf) \n", "for lat, lon in zip(good_latitudes, good_longitudes):\n", "    folium.Circle([lat, lon], radius=250, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.05).add_to(map_sf)\n", "map_sf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The locations above are quite close to downtown San Francisco, and each has no more than two restaurants within a radius of 250 m and no Italian restaurant within 400 m. Any of them is a potential candidate for the new restaurant, at least as far as nearby competition is concerned. The K-means unsupervised learning algorithm grouped the candidate locations into 8 clusters, whose centers are the recommended areas listed above for interested parties to choose from." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Conclusions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The objective of this project was to identify areas of San Francisco near the center with a small number of restaurants (especially Italian restaurants), to help stakeholders narrow the search for an optimal location for a new Italian restaurant.\n", "\n", "By calculating the distribution of restaurant density from the Foursquare API data, it is possible to generate a large collection of locations that meet certain basic requirements.\n", "\n", "These locations were then grouped using a machine learning algorithm (K-means) to create the main areas of interest (containing the greatest number of potential locations), and the addresses of these area centers were obtained. This gives interested parties a starting point for their final exploration.\n", "\n", "Interested parties will make the final decision on the optimal location of the restaurant based on the specific characteristics of the neighborhood in each recommended area, taking into account additional factors such as the attractiveness of each location (proximity to a park or water), noise levels and proximity to main roads, real estate availability, price, and the social and economic dynamics of each neighborhood.\n", "\n", "Finally, a more complete analysis and future work should integrate data from other external databases." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# References" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. The Battle of Neighborhoods: Coursera Capstone Project\n", "\n", "2. Geopy Geocoders\n", "\n", "3. Foursquare API\n", "\n", "4. MapBox Location Data Visualization library for Jupyter Notebooks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 👍👍
I invite you to write me your ideas, your comments and above all share your opinions🌍
##" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }