{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "# Using Maps and Geo-data to identify the ideal implantation for a new super marker in Brussels." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Intro\n", "\n", "In this [blog](https://indepthdata.wordpress.com/), we will have :\n", "- Scrap a dynamic website to extract geographical position of super markets in the city of Brussels\n", "- Cleanup the extracted data to avoid duplications from several lists (based on distance information)\n", "- Compute the center of mass of the shops and use this to center a map\n", "- Draw the shop position on a map\n", "- Use the OSRM API and public server to compute the travel time (by car) between a specific point and all identified shops\n", "- Make a map of shop coverage, by coloring area according to the brand of the closest (in time) shop\n", "- Make a map of the travel time to the closest shop (in order to identify area that are far from existing shops)\n", "\n", "Putting all together, we will be able to identify the best implentation to open a new shop of a given brand.\n" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Typical imports\n", "\n", "Nothing very special here." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true, "deletable": true, "editable": true, "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": true } } } } }, "outputs": [], "source": [ "#needed for general operations\n", "import os, operator, time, random \n", "\n", "#needed for mathematical operations or drawing\n", "import numpy as np\n", "import matplotlib\n", "import matplotlib.pyplot as plt\n", "import matplotlib.patches as patches\n", "\n", "#needed for http dialog with RESTful API\n", "import json\n", "import urllib3\n", "http = urllib3.PoolManager()\n", "\n", "#needed for browser emulation (dynamic page scrapping) in hidden window\n", "from selenium import webdriver\n", "from selenium.webdriver.common.keys import Keys\n", "from pyvirtualdisplay import Display\n", "\n", "#needed for web page mining\n", "from bs4 import BeautifulSoup\n", "\n", "#needed for address to coordinate conversions and distance computations\n", "from geopy.geocoders import Nominatim\n", "from geopy.distance import great_circle\n", "geolocator = Nominatim()\n", "\n", "#needed to display map widget in notebook\n", "from ipyleaflet import Map,DrawControl, Circle, Rectangle, CircleMarker, LayerGroup, Marker" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Building a module to extract data from the Pagesdor web site.\n", "\n", "As the webpage are dynamically generated via javascrips, we are using Selenium to emulate a firefox browser while navigating on the pages d'or website. Also, only the 20 first results are loaded on the page by default. So we need to press a button to load following results until everything is loaded. The rest of the code is traditional beautifulSoup parsing." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false, "deletable": true, "editable": true, "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": true } } } } }, "outputs": [], "source": [ "def getPagesdorData(driver, topic, city):\n", " url = \"http://www.pagesdor.be/qn/business/advanced/where/\"+city+\"/what/\"+topic+\"/\"\n", " print(\"processing %s\" % url)\n", " driver.get(url)\n", " \n", " while(True): #click next button until all elements are loaded on the page\n", " try:\n", " nextElemButton = driver.find_element_by_css_selector(\".-next-result\")\n", " if(nextElemButton.is_displayed()):\n", " nextElemButton.click()\n", " print(\" - load next results\")\n", " else: break\n", " except: break\n", "\n", " resultsElem = driver.find_element_by_css_selector(\".cst-results\")\n", " resultsHTML = resultsElem.get_attribute(\"innerHTML\")\n", " resultsBS = BeautifulSoup( resultsHTML , \"lxml\") \n", " results = resultsBS.body.find_all(class_=\"-result list-item\", recursive=True) #get the block corresponding to each results\n", " data = []\n", " for result in results: \n", " fields = result.find_all(class_=\"row-fluid\", recursive=True)\n", " Name = fields[0].find_all(\"a\")[1].get_text(\"\", strip=True) \n", " Address = fields[1].get_text(\"\", strip=True)\n", " dic = {\"name\":Name, \"address\":Address}\n", " data += [dic] \n", " return data " ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Scrapping data for super markets in Brussels area\n", "\n", "- We use pyvirtualdisplay to emulate a display where the Selenium fake firefox browser can live\n", "- We then scrap pages d'or super marker data for each town zip code in brussels (data are aggregated into a single array)\n", "- We can then turn off the virtual display" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "deletable": true, "editable": true, "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "processing http://www.pagesdor.be/qn/business/advanced/where/1000/what/super marché/\n", " - load next results\n", "processing http://www.pagesdor.be/qn/business/advanced/where/1030/what/super marché/\n", " - load next results\n", "processing http://www.pagesdor.be/qn/business/advanced/where/1040/what/super marché/\n", " - load next results\n", "processing http://www.pagesdor.be/qn/business/advanced/where/1050/what/super marché/\n", " - load next results\n", "processing http://www.pagesdor.be/qn/business/advanced/where/1060/what/super marché/\n", " - load next results\n", "processing http://www.pagesdor.be/qn/business/advanced/where/1070/what/super marché/\n", " - load next results\n", "processing http://www.pagesdor.be/qn/business/advanced/where/1080/what/super marché/\n", " - load next results\n", "processing http://www.pagesdor.be/qn/business/advanced/where/1081/what/super marché/\n", " - load next results\n", "processing http://www.pagesdor.be/qn/business/advanced/where/1082/what/super marché/\n", " - load next results\n", "processing http://www.pagesdor.be/qn/business/advanced/where/1083/what/super marché/\n", " - load next results\n", "processing http://www.pagesdor.be/qn/business/advanced/where/1090/what/super marché/\n", " - load next results\n", "processing http://www.pagesdor.be/qn/business/advanced/where/1140/what/super marché/\n", " - load next results\n", "processing http://www.pagesdor.be/qn/business/advanced/where/1150/what/super marché/\n", " - load next results\n", "processing http://www.pagesdor.be/qn/business/advanced/where/1160/what/super marché/\n", " - load next results\n", "processing http://www.pagesdor.be/qn/business/advanced/where/1170/what/super marché/\n", " - load next results\n", "processing http://www.pagesdor.be/qn/business/advanced/where/1180/what/super marché/\n", " - load next results\n", "processing http://www.pagesdor.be/qn/business/advanced/where/1190/what/super marché/\n", " - load next results\n", "processing http://www.pagesdor.be/qn/business/advanced/where/1200/what/super marché/\n", " - load next results\n", "processing http://www.pagesdor.be/qn/business/advanced/where/1210/what/super marché/\n", " - load next results\n", "processing http://www.pagesdor.be/qn/business/advanced/where/1700/what/super marché/\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "os.environ[\"PATH\"] += \":/home/loicus/Data/Soft/geckodriver\"\n", "display = Display(visible=0, size=(1024, 768))\n", "display.start()\n", "driver = webdriver.Firefox()\n", "driver.implicitly_wait(10) # seconds\n", "\n", "data = []\n", "data += getPagesdorData(driver, \"super marché\" , \"1000\" ) \n", "data += getPagesdorData(driver, \"super marché\" , \"1030\" ) \n", "data += getPagesdorData(driver, \"super marché\" , \"1040\" ) \n", "data += getPagesdorData(driver, \"super marché\" , \"1050\" ) \n", "data += getPagesdorData(driver, \"super marché\" , \"1060\" ) \n", "data += getPagesdorData(driver, \"super marché\" , \"1070\" ) \n", "data += getPagesdorData(driver, \"super marché\" , \"1080\" ) \n", "data += getPagesdorData(driver, \"super marché\" , \"1081\" ) \n", "data += getPagesdorData(driver, \"super marché\" , \"1082\" ) \n", "data += getPagesdorData(driver, \"super marché\" , \"1083\" ) \n", "data += getPagesdorData(driver, \"super marché\" , \"1090\" ) \n", "data += getPagesdorData(driver, \"super marché\" , \"1140\" ) \n", "data += getPagesdorData(driver, \"super marché\" , \"1150\" ) \n", "data += getPagesdorData(driver, \"super marché\" , \"1160\" ) \n", "data += getPagesdorData(driver, \"super marché\" , \"1170\" ) \n", "data += getPagesdorData(driver, \"super marché\" , \"1180\" ) \n", "data += getPagesdorData(driver, \"super marché\" , \"1190\" ) \n", "data += getPagesdorData(driver, \"super marché\" , \"1200\" ) \n", "data += getPagesdorData(driver, \"super marché\" , \"1210\" ) \n", "data += getPagesdorData(driver, \"super marché\" , \"1700\" ) #add dilbeek\n", " \n", "driver.quit()\n", "display.stop() " ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Data cross-cleaning\n", "\n", "- We keep only super market that are in the four main brands in Belgium: Delhaize, Carrefour, Colruyt and Lidl\n", "- We define a color code for these 4 brands (Red, Blue, Orange and Yellow, respectively)\n", "- We compute the GPS coordinates from the adress of the shop (using the geopy geolocator)\n", "- We make sure that all shops in our list are distant from each other by at least 10m. (This is to avoid counting the same shop twice).\n", "- We count the number of identified shops" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "107" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "colorMap = {\"Delhaize\":\"red\", \"Carrefour\":\"blue\", \"Colruyt\":\"DarkOrange\", \"Lidl\":\"yellow\"}\n", "\n", "cleanedData = []\n", "for d in data: \n", " if (\"delhaize\" in d[\"name\"].lower()):d[\"brand\"] = \"Delhaize\"\n", " elif(\"lidl\" in d[\"name\"].lower()):d[\"brand\"] = \"Lidl\"\n", " elif(\"colruyt\" in d[\"name\"].lower()):d[\"brand\"] = \"Colruyt\"\n", " elif(\"carrefour\" in d[\"name\"].lower()):d[\"brand\"] = \"Carrefour\"\n", " else: continue\n", " \n", " loc = geolocator.geocode(d[\"address\"])\n", " if(not loc): continue\n", " d[\"coord\"] = [loc.latitude, loc.longitude]\n", "\n", " minDist=1e99\n", " for d2 in cleanedData: \n", " dist = great_circle(d[\"coord\"], d2[\"coord\"]).meters\n", " if(dist10): #only consider shops that aren't too close to avoid duplicates\n", " cleanedData += [d]\n", " \n", "data = cleanedData\n", "len(data)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Shops center of mass\n", "\n", "- We compute the center of mass of our shop clouds (using even weight for each shop)\n", "- We will use this to center the map " ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "deletable": true, "editable": true, "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "outputs": [], "source": [ "averageLat = 0.0\n", "averageLon = 0.0\n", "averageCnt = 0\n", "for d in data: \n", " averageLat+=d[\"coord\"][0]\n", " averageLon+=d[\"coord\"][1]\n", " averageCnt+=1\n", " \n", "averageLat /= averageCnt \n", "averageLon /= averageCnt " ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Maps\n", "\n", "- We create three maps centered on the shop clouds\n", "- We fix the zoop level to a proper range\n", "- We don't display the maps here. (will me drawn later on)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false, "deletable": true, "editable": true, "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "outputs": [], "source": [ "m0 = Map(center=[averageLat, averageLon], zoom=13)\n", "m0.scroll_wheel_zoom =False\n", "\n", "m1 = Map(center=[averageLat, averageLon], zoom=13)\n", "m1.scroll_wheel_zoom =False\n", "\n", "m2 = Map(center=[averageLat, averageLon], zoom=13)\n", "m2.scroll_wheel_zoom =False" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Shop location\n", "\n", "- We draw shop locations on the Maps using circle marker.\n", "- The marker color is chosen according to the shop brand." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false, "deletable": true, "editable": true, "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "outputs": [], "source": [ "for d in data:\n", " if(\"coord\" not in d or d[\"coord\"]==None):continue\n", " color=colorMap.get(d[\"brand\"], \"black\")\n", " c = Circle(location=d[\"coord\"], radius=50, weight=1, fill_opacity=1.0, fill_color=color, color=\"black\")\n", " #c = Marker(location=d[\"coord\"], title=d[\"brand\"], rise_offset=100, fill_color=color, color=color)\n", " m0.add_layer(c)\n", " m1.add_layer(c)\n", " m2.add_layer(c) " ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "m0" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Color Legend\n", "\n", "- We use matplotlib to get the color legend associated to the brands" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false, "deletable": true, "editable": true, "extensions": { "jupyter_dashboards": { "version": 1, "views": { "grid_default": {}, "report_default": { "hidden": false } } } } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAApgAAAA9CAYAAAAJfxm9AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAADfVJREFUeJzt3X20XFV5x/Hvj1elQhAEpS1atBUoLqqVsmLREBURoS8W\nNYWgJL5U1FpEoTYghZSyiEItLdgUCiYIC20VLG0VeSkh2AipvFlRCiKFUCVBBBKxvJo8/ePZY04m\nM/fOvXfPTO7N77PWXZM758w+++Q+s89zzt5nH0UEZmZmZma1bDHsCpiZmZnZ1OIE08zMzMyqcoJp\nZmZmZlU5wTQzMzOzqpxgmpmZmVlVTjDNzMzMrConmGZmZmZWlRNMMzMzM6vKCaaZmZmZVeUE08zM\nzMyqcoJpZmZmZlU5wTQzMzOzqpxgmpmZmVlVTjDNzMzMrConmGZmZmZWlRNMMzMzM6tqq4l9XKuA\nF1apSX88BPGiQW5Q0jpgaUS8ocf1DwSuB+ZHxGmN95cCMyJicpwEaBLEQgw2FiZK0hxgMTA3Ii5u\nvH8/sC4iXtrHbS9lMsVfP316EsT28ZMjtgcRu5sbiU0+PiOYFPE5TJJ+FTgLmE7+PVdHxE7DrdXE\nTPTgsSkHNVSqn6R1ktb2uHqUn4mqVc6gbE6x0Px5StKPJN0q6QJJh0iqmZR1ioFBxEUA6wawnclg\ns4jtFkl7SjpX0h2SVkt6WtIPJX1F0nskbTOB4jfJNk3SnPJ9PnrYdRmHzSo+oe8xOnDlmPEvwCHA\nvwHzgU8Os041TPAKpnWwN/DEsCthfRVkAyBgS2BHYB/gncB7gVskHRUR9wythhP3LmC7YVfCBkvS\nKcApZGzfBFwH/JRMEmYCFwAfAPYfUhX7aZNMfm1DUzRG9yBzh/Mj4oPDrkwtTjAri4jvDbsO1n8R\n8Zft70naBTgXmAVcK2m/iPjxwCtXQUT8YNh1sMGSdBJ54rQCeEdE3NJhnUOB4wdctUHQsCtgo5vC\nMfpL5XXlUGtRmcdXVVa6WZZ0eH9XSZ+VtErSE5Jun6TdMdZFRDwMHAksBXYHTmpfR9LzJS2QdGeJ\ng9WS/l3Sm8a6PUnbSTpL0orSVX+PpI93WXeupMsk3Vu2u0bSMklHdVl/aRlP3HyvfXhA+88p/dpX\n6y9JLwFOBZ4BDu104AaIiCvJbrzmZ2dJ+nr5+z4h6duS5vXaTdnsni5DTK4vZa1t1a0sX9Tl8xvE\nqqSDy/qf7bL+NpJ+XIa2bC3peqBV9kWNeF4r6cW97IP133hjdJxt39oSG6dIuqu0r4sa5XWN10Y5\ne0q6SNIDpQt/laRLJb28bb115DEDYH6n9lTSDqUtvUvSk5IelXSVpDd2qP+Iwz065SiSWtudIWm2\npOWSHpf0P53K6JWvYA6ApJ3JS/m/AvwH8A1gN+DvgWtx18yUEREh6XSyq+ZI4GOtZeVgdQPwYjIO\nvgb8AvA7wFWS3h8RHQ+KHWwNXE3G0ZXAz4C3Ap+UtG2HK6wLge+U7a8EdgYOBS6R9PKIOLV9V9g4\nLud3qcvRZBfPz4eGVN5X67/3kDH1+Yj475FWjIhnW/+WdAYwD3gYuJTsqnwLcAZwsKSDI+JnPWw/\ngHeQicGVZNvYa3K3QaxGxDWS7gVmSTouIh5vW//twE7AWRHxrKTFwGPA7wNXAN9qlLu6xzpY/40r\nRhlf2wdwObAf2Xb9M/CjxvIR41XSIeXzW5FjKr8P/DJwOHCYpJkR0Yqz+WRuMJdMNJeW95eWsqYB\nNwJ7ATcDXwZeQPaUXSPpAxFxQZd96FVrn04ADip1XgJMG2M5G3CCORgLyAA6OyJOaL0p6TPA8mFV\nyvpmGZnw7SrpJRGxorx/MXll84iI+FJrZUk7kI3fOZL+tVwJHc0vkgfCgyLi6VLOacD3gI9KOiMi\nmmfU+0TEfc0CJG0FXAXMk3ReRIzYPdOc5aBRxlzgpeRJ0zmNRTX31frvAPIAs1HvSzeSppPJ5Qpg\n/9bfUtKJZKJ2GHnA6uVmBZGJ6Vsi4tqxVb2j84AzybHEC9uWvZ/c1wsAIuJiSaIkmM0ZG2yTMuYY\nLcbT9olMGPeJiMc6lNk1XiXtCHyBPNmaERF3N5b9OvCfwIVk8kpEnKacTWYuOQNNezt7Jjk+87yI\n+FCjrE8Bt5Jt6dUR8cDo/xUjEvB6YHpEfHuCZQHuIu+7EsizgceBv2gui4jbyLN+m0Ii4hngkfLr\nLgCS9gVmAJc3E66y/k/Irp/nAG8bw6aObSWXpZyHyTsRpwF7tm3jvrbPUq4s/R15orlRV8toSvfM\n+eTZ+VvLfvdrX62/diuvYxl7+17ygH9680QhItaRY+ACeN8YyruiUnIJOb3X08AxzTdL9+QMYElE\nfL/StmwwxhOj4237Aji5S3LZ0i1e5wA7kFMP3t1cEBF3kic2r5K012h1l7Q1cBSZP2ww5Coi7iVP\n6rche5FqOL9Wcgm+gjkIe5F34369Q1cN5GXwOQOtkQ1C66aBVlfFa8rrNEntXTIAu5bP7N1j+Ws6\nNZzA/5bX529QGWl38mrTG8gz8+c2FgfrB5n3pJyJXw78hBwP9Uhjce19tU3Tq8rr9e0LIuIeST8A\n9pC0fZe2r93NtSoWEY9K+iLwLknTI6LVU3QMGe/n1dqWbdom0PaNFo/dlk8vr6/s0v61xmDuDdw1\nyjb2JPOHZRHRabjGEuBk1n8XJyKo+B0EJ5iD0BrD8FCX5asGVREbDEnbkmO8IMemQY77AXhT+ekk\nyHGKveg2Nqw13m3LRn32IBuOaeR4yKuBNcBacujGHGDbHreLpBeS4462BQ7rcCWo9r5a/60kT4bH\ncqLRatu6Da1YSQ6T2JG8AjOa2m3hQvLKzjHAcuVNR0eTY+muqLwt678xx+hE2r6I6HbMbukWrzuT\nJ9CjXb1/3ijLobfvGOR3rIaq30EnmP23prx2m2zWTziYel5HfrdWNcbFtOLgIxHxmQHX53jyiubc\niLikuUDSEeTYn55Iei7wFTJxOCoivtFhtWHuq43PMvIKzxvJ7uVetP7OLwI6XU3frW29kYz0YInW\nHeLdjlcdD64R8U1Jt1Nu9iHHhO4MLGgbn2yTw3hitFrb12akeF1Tlu0bEd8dZ/nNsqB7ntDpO7aO\nTHA3+r6UG4ZGUvWGY4/B7L+7yLtrXylp+w7LX4/vIp8yys0CnyD/ps3xta0uutcNvFLwsvL65Q7L\nZtJj/JV9+wLwm8CfR8Q/dll1mPtq47MYeBZ422hjw7R++qHby+vMDuu8jLxr9r4y7nYiWuPgdu+w\nne1Z3+XYyUJyvO8c4I/Ig2/7HbeQV7RaD06wTdN4YrRK2zdGy8lYmlGhrLvJ/OE3yg2S7VqPpL6t\n8V7X7wvwWxXq1DMnmH1WBhNfShn021wmaT/yBiCbAiTtCvwTcCB5Z+2C1rKIuJXsojlc0ru7fP4V\nysnaa7u/vM5s296byRs1enU28HvARRFxRreVhryvNg5lpoP5ZHfhlZJe3Wm9Mv3K18qvi8gD6cmS\nXtBYZwvg02XZhRXq9lPyRP2AZmJRtnM2G46pa/d5cpzwx8nv5TURcX+H9VpjiD3v5SZqnDF6f3md\n2bbOWNu+sVhMDmE6VdJGCZ3Sgb0UVKZbauUPG0w9V07ijiXnBW1enb2FPJGaXXqcWuvvBHyKAV7Q\nchf5GJT50rr5YEQ81WXZSeRl/eNKwC0jp5mZBXyVnB7DJpHG4O0tWP+oyNeS87QtB94ZEY+2fWw2\n+VizCyUdS05XsZq80rNvKeM1rB+3CXWeMLIQeDdwmaTLgAeBVwBvBr4IHDFaASVujwWeBFZ2Gby+\nNCJuKP8ez77aEEXEAklbknf53yzpRvJg1XoM3wzg14BvlvVvknQm8KfAd0ps/R85fcs+5EnGX/W4\n+dHi/CwyWb1R0peAp8jen62A/yJjqtM+PSnpc2TsBjnrQSc3kVeKjivJcmss2jk93qBkAzDWGKVC\n29dF13gtN5i9nbxqulzSdcB3yfjbnWz3dqL3R/HOI3uDPixpf/Kmul3IeTifB/xxYyo8ImKVpEvJ\nRxd/S9JXyQT1UHKKuBo3BPVkognmQ/ThQfYVjTZIt1etjL/bVAABfIRs9DYamxERj0j6bXLy4d8F\nXk1e+j4GeIC8KtTprGIydZ1vbrHQesrCM+QNDCuAzwGXdZtqJSJ+WM66/4Scomc22SW3CrgT+Fvg\nji7b61aP0SsccYekmcDpZCPTOij/AXl15w+7lNd8b7vy+3OAE0eo0w1lm+PZ103V5hLbRMTpJYH7\nEJnAzSX/5o+Q864uoDH0IyLmSboN+DA55+TWwL3kMJG/7jLJ+pjbuohYnCM0+BjZDj9G3qjzCfJA\nPtLnF5EJ5kpyAulO5a+WdDiZuMxh/Q1ol9DbDUrDtNnEJ4wtRiu1fR2rMUodl5Tp2k4gk9nXkseK\nB8kT78u6lLlRuRHxWJlz9kRyovaPkif6y8mHBVzXoaz3kW3tkeT/0wPA35A9C7N62L8qFDGZchgz\nM7PelYcBLAJOi4j5w62N2ebDCaaZmU1JpTv1dnI+wT0i4sEhV8lss+ExmGZmNqVIOoC8sWMmOR70\nXCeXZoPlBNPMzKaag8hx0o8C/wD82XCrY7b5cRe5mZmZmVXleTDNzMzMrConmGZmZmZWlRNMMzMz\nM6vKCaaZmZmZVeUE08zMzMyqcoJpZmZmZlU5wTQzMzOzqpxgmpmZmVlVTjDNzMzMrConmGZmZmZW\nlRNMMzMzM6vKCaaZmZmZVeUE08zMzMyqcoJpZmZmZlU5wTQzMzOzqpxgmpmZmVlVTjDNzMzMrCon\nmGZmZmZWlRNMMzMzM6vKCaaZmZmZVeUE08zMzMyqcoJpZmZmZlU5wTQzMzOzqpxgmpmZmVlV/w9h\nwrzUBsw4AAAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "\n", "labels = list(colorMap.keys())\n", "colors = list(colorMap.values())\n", "\n", "plt.figure(figsize=(8, 0.5))\n", "legpatches = [ patches.Patch(color=color, label=label) for label, color in zip(labels, colors)]\n", "plt.legend(legpatches, labels, loc='center', frameon=False, ncol=4, prop={'size':20})\n", "plt.axis('off')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### OSRM helper function\n", "\n", "- We create a helper function to compute OSRM time distance between points\n", "- This is based on a RESTful API deployed on the OSRM public server (free of charge, but without SLA)\n", "- The GET URL contains the list of GPS coordinates to evaluate\n", "- The Response is a JSON containing the distance matrix between sources and destiations\n", "- The number of points that we can test by request is limited, so there is a function to chunk our data in smaller pieces in order to do the computation per bunch\n", "- Function output is a vector containing the tested cell, the time to closest shop and the closest shop itself" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "\n", "def getRoutesChunk(data, cells): \n", " url = 'http://router.project-osrm.org/table/v1/driving/'\n", " count=0\n", " sources = []\n", " goodData = []\n", " for index,d in enumerate(data):\n", " if(\"coord\" not in d):continue \n", " if(index!=0): url+=\";\" \n", " url += \"%.8f,%.8f\" % (d[\"coord\"][1],d[\"coord\"][0])\n", " goodData +=[d] \n", " count +=1;\n", " \n", " for index,cell in enumerate(cells):\n", " center = ( (cell[1][1]+cell[0][1])*0.5 , (cell[1][0]+cell[0][0])*0.5 )\n", " url +=\";\"\n", " url += \"%.8f,%.8f\" % (center[0], center[1])\n", " sources +=[count] \n", " count +=1;\n", "\n", " url += \"?sources=\" \n", " for index,source in enumerate(sources):\n", " if(index!=0): url+=\";\"\n", " url += str(source) \n", " \n", " request = http.request('GET', url)\n", " if(request.status!=200):\n", " print(\"bad response: %s\" % str(request.status) )\n", " return []\n", " \n", " response = json.loads( request.data.decode('utf8'))\n", " request.release_conn()\n", " #print( json.dumps(response, sort_keys=True, indent=3) )\n", "\n", " toReturn = []\n", " cellTimes = response[\"durations\"] \n", " for index,cellTime in enumerate(cellTimes): \n", " timeToData = cellTime[:len(goodData)]\n", " min_index, min_time = min(enumerate(timeToData), key=operator.itemgetter(1)) \n", " toReturn += [ (cells[index],min_time,goodData[min_index]) ]\n", " \n", " return toReturn\n", "\n", "def getRoutes(data, cells, chunkSize=50):\n", " chunkCells = (cells[i:i+chunkSize] for i in range(0, len(cells), chunkSize))\n", " toReturn = []\n", " for chunk in chunkCells:\n", " toReturn += getRoutesChunk(data, chunk)\n", " return toReturn " ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Cell Grids \n", "\n", "- We split the map into a grid of cells\n", "- The finner the grid, the higher the resolution but the slower the computation\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "NedgesX = 150\n", "NedgesY = 75\n", "boundX = 0.1700\n", "boundY = 0.0500\n", "edgeX = boundX/NedgesX\n", "edgeY = boundY/NedgesY\n", "cells = []\n", "for i in range(0,NedgesY):\n", " for j in range(0,NedgesX):\n", " y = averageLat - boundY*0.5 + i*edgeY\n", " x = averageLon - boundX*0.5 + j*edgeX\n", " cells += [ [(y, x), (y+edgeY, x+edgeX)] ] " ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Compute Cells\n", "- for each cells we compute the time distance to shop using OSRM helper functions" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "results = getRoutes(data, cells, 50)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Display cells on map\n", "\n", "- We display each cell on the maps (either according to brand of closest shop or according to the time to the closest shop)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "for (cell,time,closest) in results:\n", " color=colorMap.get(closest[\"brand\"], \"black\")\n", " r1 = Rectangle(bounds=cell, weight=0, fill_opacity=0.40, fill_color=color, color=color) \n", " m1.add_layer(r1)\n", " \n", " color = matplotlib.colors.rgb2hex( np.array([1.0, 0.0, 0.0]) + max(0,min(1,time/300.0)) * np.array([0.0, 1.0, 1.0]) )\n", " r2 = Rectangle(bounds=cell, weight=0, fill_opacity=0.60, fill_color=color, color=color)\n", " m2.add_layer(r2) " ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### We display the map\n", "\n", "- finaly!" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "m1" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "m2" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "deletable": true, "editable": true }, "source": [ "### Identify the best place for a new shop\n", "\n", "- We can see that there is a large area in the north-west of Anderlecht where there are currently no shops and where the travel time to the closest shop is therefore quite large (>5min)\n", "- This area is currently shared among Carrefour, Delhaize and Colruyt, so Lidl would be smart to implement a new shop there." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "# What's next\n", "\n", "- We should correlate these information with immo-data in order to find a good location/price commercial surface for renting\n", "- We should correlate there information with census data in order to open the shop in high population density area\n", "- We should redo these maps with different new shop implantation hypothesis and see how much these choice would impact the competitors \n", "- Etc... As usual so many things can be done from there" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "# Do you need help with your next shop business ?\n", "Feel free to contact me, I'd be happy to help\n", "\n", "Many other fun data science examples are available on my blog: https://indepthdata.wordpress.com/" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "extensions": { "jupyter_dashboards": { "activeView": "report_default", "version": 1, "views": { "grid_default": { "name": "grid", "type": "grid" }, "report_default": { "name": "report", "type": "report" } } } }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" }, "widgets": { "state": { "5ac66e25f0334dbd86c4fe4b0fa4934c": { "views": [ { "cell_index": 29 } ] }, "62fe6724227848888b12e4548fe2313b": { "views": [ { "cell_index": 16 } ] }, "8cf18bad256140f3b73fccf79f2025fa": { "views": [ { "cell_index": 28 } ] } }, "version": "1.2.0" } }, "nbformat": 4, "nbformat_minor": 2 }