{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# PoC of Recommender System for safe and efficient Food Deliveries during Infectious Disease induced Lockdowns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Course \"IBM Data Science\", Capstone Project, April 2020 by [Markus Mächler](https://www.linkedin.com/in/markus-maechler/)\n", "\n", "_written and tested with Jupyter Notebook 6.0.3 (with Python 3.7.6 on Win10x64 and MacOS 10.15.4)_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Folium maps are not rendered on GitHub. For the best experience, [please view this notebook with this link](https://nbviewer.jupyter.org/github/Funisher-code/Coursera_Capstone/blob/master/notebook/POC_food_delivery_recommender_system.ipynb).**\n", "\n", "----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Major problems during a lockdown concerning shopping / food deliveries\n", "1. keep people at risk **safe but not hungry**\n", "2. keep shops and restaurants **up and running**\n", "3. use helpers as **efficiently and safely** as possible" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The purpose of this project is to create a small POC (proof of concept) that helps tackle these three problems with a simple but efficient recommender system that could be used to place actual orders." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**For more background information, [read the full report here](https://github.com/Funisher-code/Coursera_Capstone/blob/master/report/COVID-19_Safe_And_Efficient_Food_Deliveries.md).**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data + Methodology" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Location Data\n", "\n", "For every order, **location data for all three parties** is absolutely necessary. 
For this POC I'm using the following data sources.\n", "\n", "- customers: **hypothetical address of a customer in Zurich**\n", "- helpers: **hypothetical addresses in Zurich**\n", "- shops: **Foursquare location data** acquired via API in the vicinity of people at risk\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### install dependencies if not already installed" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# map rendering library\n", "\n", "#!conda install -c conda-forge folium=0.5.0 --yes # uncomment if not installed already" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# used to locate the coordinates of addresses, cities, countries, and landmarks across the globe\n", "# using third-party geocoders and other data sources\n", "\n", "#!conda install -c conda-forge geopy --yes # uncomment if not installed already" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# install the Python library used to generate fake data\n", "\n", "#!conda install -c conda-forge faker --yes # uncomment if not installed already" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### import necessary libraries" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Libraries imported.\n" ] } ], "source": [ "from geopy.geocoders import Nominatim\n", "import geopy.distance\n", "import folium \n", "from collections import namedtuple\n", "import pandas as pd # for working with dataframes\n", "import numpy as np # for working with arrays and matrices\n", "from scipy import stats # for getting quick descriptive statistics\n", "import matplotlib.pyplot as plt # for plotting\n", "from matplotlib.colors import LinearSegmentedColormap # to define custom colormaps for our plots\n", "import requests # to handle requests\n", "import time # used to sleep for a defined 
amount of time\n", "from faker import Faker # to generate fake person data\n", "\n", "print('Libraries imported.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### customer location data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's define a customer address to work with. This is simply something a customer would have to specify when signing up. I just picked one at random." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "custAddr = 'Gsteigstrasse 9, 8049 Zurich'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we create a geocoder instance with the user agent _food_explorer_.\n", "\n", "We then use it to convert our customer address into a location given by latitude and longitude." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The geographical coordinates of Gsteigstrasse 9, 8049 Zurich are 47.4028839, 8.499580847726023.\n" ] } ], "source": [ "geolocator = Nominatim(user_agent=\"food_explorer\")\n", "location = geolocator.geocode(custAddr)\n", "custLatitude = location.latitude\n", "custLongitude = location.longitude\n", "print('The geographical coordinates of {} are {}, {}.'.format(custAddr, custLatitude, custLongitude))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's quickly check whether the coordinates match the customer's address." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create map of Zurich using the latitude and longitude values generated by the geocoder\n", "map_folium = folium.Map(location=[custLatitude, custLongitude], zoom_start=15)\n", "\n", "# add a marker for our address to the map\n", "label = custAddr\n", "label = folium.Popup(label, parse_html=True)\n", "folium.CircleMarker(\n", " [custLatitude, custLongitude],\n", " radius=5,\n", " popup=label,\n", " color='red',\n", " parse_html=False).add_to(map_folium) \n", " \n", "map_folium" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, if you can't see the map above, that's normal GitHub behaviour. [Please view this notebook with this link](https://nbviewer.jupyter.org/github/Funisher-code/Coursera_Capstone/blob/master/notebook/POC_food_delivery_recommender_system.ipynb).\n", "\n", "Back to the geolocator. That certainly looks like Zurich, but let's zoom in all the way to make sure it's accurate." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create map of Zurich using the latitude and longitude values generated by the geocoder\n", "map_folium = folium.Map(location=[custLatitude, custLongitude], zoom_start=18)\n", "\n", "# add a marker for our address to the map\n", "label = custAddr\n", "label = folium.Popup(label, parse_html=True)\n", "folium.CircleMarker(\n", " [custLatitude, custLongitude],\n", " radius=10,\n", " popup=label,\n", " color='red',\n", " parse_html=False).add_to(map_folium) \n", " \n", "map_folium" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nice, that's something we can work with.\n", "\n", "So customers signing up for the delivery service can simply specify their address as location data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### helper location data " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just like with our customer, we will use a Nominatim geolocator to convert our helper addresses into coordinates.\n", "\n", "Unlike the freshly signed-up customer, however, the helpers come from a list I created with __fictional helper data__ in the vicinity of our customer.\n", "\n", "_TLDR alert: if you don't care how I got this data, skip the next section [by clicking here to import the resulting csv](#tldr1)._\n", ".\n", "\n", ".\n", "\n", ".\n", "\n", ".\n", "\n", ".\n", "\n", ".\n", "\n", ".\n", "\n", ".\n", "\n", ".\n", "\n", ".\n", "\n", ".\n", "\n", "Or you can take the long way\n", "\n", ".\n", "\n", ".\n", "\n", ".\n", "\n", ".\n", "\n", ".\n", "\n", ".\n", "\n", "**Thanks for staying with me. Here's what I did:**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. query https://tel.search.ch/ for the postal code 8049, since that's where our customer lives.\n", "2. copy all search results from the website. 
That looks something like this:\n", "```\n", "name01, surname01\t\n", "name01, surname01\n", "street01 number01, 8049 Zürich\n", "044 123 45 67\n", "Details\n", "name02, surname02\t\n", "name02, surname02\n", "street02 number02, 8049 Zürich\n", "044 765 43 21 *\n", "Details\n", "```\n", "3. use a powerful text editor to strip leading and trailing garbage, then convert the text into a usable csv format. I was on Win10 at the time and used Notepad++ (search mode *Extended*) for the following:\n", " - search for \"\\n\" and replace with \";\" since \",\" is already widely used in the addresses\n", " - search for \"\\r\" and \"\\t\" and replace with nothing\n", " - search for \";Details;\" and replace with \"\\n\"\n", "\n", "\n", "4. the result looks something like this:\n", "```\n", "name01, surname01;name01, surname01;street01 number01, 8049 Zürich;044 123 45 67\n", "name02, surname02;name02, surname02;street02 number02, 8049 Zürich;044 765 43 21 *\n", "...\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First we import the resulting file into a pandas dataframe. **I tried the following code:**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", "helperData=pd.read_csv('helperAddr_raw.csv', sep=';', header=None)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This leads to the following error:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", "ParserError: Error tokenizing data. C error: Expected 4 fields in line 51, saw 5\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That didn't work. What happened? After inspecting line 51, I realized that some people also list a profession, which results in an additional column.\n", "\n", "**This is just another example of why carefully curating your raw data is both important and time-consuming.**\n", "\n", "For our purpose, I'll simply skip rows that don't have the expected number of columns."
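, "\n", "\n", "As a side note, the Notepad++ clean-up from steps 3 and 4 above could also be scripted, which would make it repeatable. The following is a minimal sketch, not the procedure actually used for this notebook. It assumes the copied search results are available as one string; `RAW`, `to_records` and `to_csv_lines` are hypothetical names, and the sample only mirrors the placeholder snippet from step 2. Since every search hit ends with a `Details` line, the lines are grouped into records at that sentinel and joined with `;`:\n", "
```python
# Hypothetical sample shaped like the copied tel.search.ch results from step 2
RAW = '''name01, surname01
name01, surname01
street01 number01, 8049 Zürich
044 123 45 67
Details
name02, surname02
name02, surname02
street02 number02, 8049 Zürich
044 765 43 21 *
Details
'''


def to_records(raw, sentinel='Details'):
    # Group the copied lines into one list of fields per search hit.
    # The sentinel line plays the role of the ';Details;' separator
    # that was replaced by a line break in Notepad++.
    records, current = [], []
    for line in raw.splitlines():
        line = line.strip()  # drops stray tabs and carriage returns
        if not line:
            continue
        if line == sentinel:
            records.append(current)
            current = []
        else:
            current.append(line)
    if current:  # keep a trailing hit that lacks the sentinel
        records.append(current)
    return records


def to_csv_lines(records, sep=';'):
    # ';' as separator, because ',' already occurs in names and addresses
    return [sep.join(fields) for fields in records]


for row in to_csv_lines(to_records(RAW)):
    print(row)
```
", "\n", "\n", "On this sample the script prints the same two semicolon-separated rows as step 4. A hit where someone also lists a profession would simply come out with five fields instead of four, which is exactly what trips up `pd.read_csv` below."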
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "b'Skipping line 4: expected 4 fields, saw 5\\nSkipping line 5: expected 4 fields, saw 5\\nSkipping line 8: expected 4 fields, saw 5\\nSkipping line 10: expected 4 fields, saw 5\\nSkipping line 11: expected 4 fields, saw 5\\nSkipping line 12: expected 4 fields, saw 5\\nSkipping line 14: expected 4 fields, saw 5\\nSkipping line 16: expected 4 fields, saw 5\\nSkipping line 17: expected 4 fields, saw 5\\nSkipping line 21: expected 4 fields, saw 5\\nSkipping line 23: expected 4 fields, saw 5\\nSkipping line 27: expected 4 fields, saw 5\\nSkipping line 28: expected 4 fields, saw 5\\nSkipping line 33: expected 4 fields, saw 5\\nSkipping line 39: expected 4 fields, saw 5\\nSkipping line 41: expected 4 fields, saw 5\\nSkipping line 42: expected 4 fields, saw 5\\nSkipping line 43: expected 4 fields, saw 5\\nSkipping line 44: expected 4 fields, saw 5\\nSkipping line 46: expected 4 fields, saw 5\\nSkipping line 48: expected 4 fields, saw 5\\nSkipping line 50: expected 4 fields, saw 5\\nSkipping line 53: expected 4 fields, saw 5\\nSkipping line 54: expected 4 fields, saw 5\\nSkipping line 61: expected 4 fields, saw 5\\nSkipping line 64: expected 4 fields, saw 5\\nSkipping line 71: expected 4 fields, saw 5\\nSkipping line 72: expected 4 fields, saw 5\\nSkipping line 73: expected 4 fields, saw 5\\nSkipping line 77: expected 4 fields, saw 5\\nSkipping line 78: expected 4 fields, saw 5\\nSkipping line 79: expected 4 fields, saw 5\\nSkipping line 80: expected 4 fields, saw 5\\nSkipping line 82: expected 4 fields, saw 5\\nSkipping line 84: expected 4 fields, saw 5\\nSkipping line 85: expected 4 fields, saw 5\\nSkipping line 87: expected 4 fields, saw 5\\nSkipping line 90: expected 4 fields, saw 5\\nSkipping line 92: expected 4 fields, saw 5\\nSkipping line 93: expected 4 fields, saw 5\\nSkipping line 94: expected 4 fields, saw 5\\nSkipping line 97: 
expected 4 fields, saw 5\\nSkipping line 100: expected 4 fields, saw 5\\nSkipping line 102: expected 4 fields, saw 5\\nSkipping line 104: expected 4 fields, saw 5\\nSkipping line 105: expected 4 fields, saw 5\\nSkipping line 106: expected 4 fields, saw 5\\nSkipping line 107: expected 4 fields, saw 5\\nSkipping line 108: expected 4 fields, saw 5\\nSkipping line 109: expected 4 fields, saw 5\\nSkipping line 110: expected 4 fields, saw 5\\nSkipping line 111: expected 4 fields, saw 5\\nSkipping line 113: expected 4 fields, saw 5\\nSkipping line 115: expected 4 fields, saw 5\\nSkipping line 117: expected 4 fields, saw 5\\nSkipping line 122: expected 4 fields, saw 5\\nSkipping line 134: expected 4 fields, saw 5\\nSkipping line 142: expected 4 fields, saw 5\\nSkipping line 151: expected 4 fields, saw 5\\nSkipping line 161: expected 4 fields, saw 5\\nSkipping line 163: expected 4 fields, saw 5\\nSkipping line 164: expected 4 fields, saw 5\\nSkipping line 166: expected 4 fields, saw 5\\nSkipping line 172: expected 4 fields, saw 5\\nSkipping line 175: expected 4 fields, saw 5\\nSkipping line 188: expected 4 fields, saw 5\\nSkipping line 189: expected 4 fields, saw 5\\nSkipping line 192: expected 4 fields, saw 5\\nSkipping line 199: expected 4 fields, saw 5\\nSkipping line 200: expected 4 fields, saw 5\\n'\n" ] } ], "source": [ "helperData=pd.read_csv('helperAddr_raw.csv', sep=';', header=None, error_bad_lines=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's quite a few skipped lines. 
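\n", "\n", "Skipping does discard a good part of the data, though. A hedged alternative, assuming a newer pandas than the one this notebook was written with: since pandas 1.3, `error_bad_lines=False` is deprecated in favour of `on_bad_lines='skip'` (it was removed in pandas 2.0), and since pandas 1.4, `on_bad_lines` also accepts a callable when `engine='python'` is used. The minimal sketch below uses such a callable to repair the five-field rows instead of dropping them. `SAMPLE`, `POSTAL_CODE`, `keep_address` and `helperSample` are hypothetical names, the sample rows are made up, and locating the address via the postal code is an assumption, since the position of the extra profession field in the real file is not known:\n", "
```python
import io

import pandas as pd

# Hypothetical sample: the second row carries an extra profession field
SAMPLE = '''name01, surname01;name01, surname01;street01 number01, 8049 Zürich;044 123 45 67
name02, surname02;name02, surname02;Schreiner;street02 number02, 8049 Zürich;044 765 43 21 *
name03, surname03;name03, surname03;street03 number03, 8049 Zürich;044 111 22 33
'''

POSTAL_CODE = '8049'  # assumption: the address is the field containing the postal code


def keep_address(bad_line):
    # pandas hands over the offending row already split on sep.
    # Keep both name fields, the address and the phone number; drop the extra field.
    address = next((field for field in bad_line if POSTAL_CODE in field), None)
    if address is None:
        return None  # returning None makes pandas skip the row
    return [bad_line[0], bad_line[1], address, bad_line[-1]]


helperSample = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=';',
    header=None,
    engine='python',  # a callable on_bad_lines requires the python engine
    on_bad_lines=keep_address,
)
print(helperSample.shape)
```
", "\n", "\n", "On the sample this prints `(3, 4)`: the row with the profession is kept and only its extra field is dropped. The same callable could be passed to the `read_csv` call above to keep the skipped helpers, although that has not been tried on `helperAddr_raw.csv`.\n", "\n", "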
Let's see how many addresses we got:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(130, 4)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "helperData.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So after skipping quite a few rows, we are left with 130 \"fictional helpers\". Note that they are not all unique: some addresses, such as Riedhofstrasse 366, occur several times.\n", "\n", "By the way, the **asterisk after the phone number** indicates that the holder of the number doesn't want to receive advertising calls.\n", "\n", "**Although this is all publicly available information, let's drop all names and phone numbers.**\n", "\n", "Then we rename the remaining column to `address` and have a look at the first 5 rows." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "scrolled": true }, "outputs": [ { "data": {
"text/plain": [ " address\n", "0 Michelstrasse 6, 8049 Zürich\n", "1 Riedhofstrasse 277, 8049 Zürich\n", "2 Brunnwiesenstrasse. 78, 8049 Zürich\n", "3 Rütihofstrasse 32, 8049 Zürich\n", "4 Segantinistrasse 38, 8049 Zürich" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "helperData.drop(columns=[0,1,3], inplace=True)\n", "helperData.rename(columns={2:'address'}, inplace=True)\n", "helperData.head()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "helperData.to_csv('helperData_addresses.csv',sep=';') # save dataframe to a csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "#### TLDR off\n", "\n", "If you skipped the manual steps above, you can rejoin from here.\n", "\n", "Or, if you accidentally missed the fun part or simply changed your mind, you can [click here to go back](#tldr1back).\n", "\n", "If you're not interested in enriching the helper dataframe with geolocation and fake email data, [you can skip that and jump to the next part, where we visualize the helper and customer locations on a map](#tldr1evenfurther).\n", "\n", "\n", "\n", "**Again, thanks for your interest and for sticking with me.**\n", "\n", "Let's import the csv containing our addresses." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": {
"text/plain": [ " address\n", "0 Michelstrasse 6, 8049 Zürich\n", "1 Riedhofstrasse 277, 8049 Zürich\n", "2 Brunnwiesenstrasse. 78, 8049 Zürich\n", "3 Rütihofstrasse 32, 8049 Zürich\n", "4 Segantinistrasse 38, 8049 Zürich" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "helperData=pd.read_csv('helperData_addresses.csv',sep=';') # import csv that resulted from manual data gathering\n", "helperData.drop(columns=['Unnamed: 0'], inplace=True) # drop the unnamed column with the row numbers\n", "helperData.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the next step, we use the geolocator again to convert our helper addresses into coordinates.\n", "\n", "I found out the hard way that too many requests in a short time lead to timeouts, which made the for loop crash again and again (Nominatim's public usage policy also asks for at most one request per second).\n", "But the **following combination of a try block and a sleep call works just fine**: " ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "starting geolocator conversion:\n", "row 1 converted: Michelstrasse 6, 8049 Zürich to 47.4045882 , 8.4956993\n", "row 2 converted: Riedhofstrasse 277, 8049 Zürich to 47.4068335 , 8.486364788336479\n", "row 3 converted: Brunnwiesenstrasse. 
78, 8049 Zürich to 47.401818899999995 , 8.502904109747357\n", "row 4 converted: Rütihofstrasse 32, 8049 Zürich to 47.4139362 , 8.477824429475469\n", "row 5 converted: Segantinistrasse 38, 8049 Zürich to 47.405278499999994 , 8.500653449999996\n", "row 6 converted: Riedhofstrasse 63, 8049 Zürich to 47.4047016 , 8.492083950865386\n", "row 7 converted: Riedhofstrasse 299, 8049 Zürich to 47.4077043 , 8.4850702\n", "row 8 converted: Imbisbühlstrasse 17, 8049 Zürich to 47.403117 , 8.4936818\n", "row 9 converted: Limmattalstrasse 266, 8049 Zürich to 47.4023642 , 8.4938806\n", "row 10 converted: Im Wingert 12, 8049 Zürich to 47.4003858 , 8.509406051787845\n", "row 11 converted: Kettberg 5, 8049 Zürich to 47.398512049999994 , 8.511144443967313\n", "row 12 converted: Riedhofstrasse 29, 8049 Zürich to 47.40424195 , 8.493492214249404\n", "row 13 converted: Im Wingert 36, 8049 Zürich to 47.401517850000005 , 8.50783103406484\n", "row 14 converted: Eichholzweg 6, 8049 Zürich to 47.4035913 , 8.507237738482058\n", "row 15 converted: Nötzlistrasse 15, 8049 Zürich to 47.4048336 , 8.502146001864272\n", "row 16 converted: Giacomettistrasse 19, 8049 Zürich to 47.402628500000006 , 8.505656048990708\n", "row 17 converted: Limmattalstrasse 353, 8049 Zürich to 47.4023642 , 8.4938806\n", "row 18 converted: Regensdorferstrasse 29, 8049 Zürich to 47.40405615 , 8.49537580285676\n", "row 19 converted: Winzerhalde 46, 8049 Zürich to 47.400393699999995 , 8.490019776228419\n", "row 20 converted: Kappenbühlweg 11, 8049 Zürich to 47.40397745 , 8.498320080768725\n", "row 21 converted: Segantinistrasse 125, 8049 Zürich to 47.406653649999996 , 8.493464163600624\n", "row 22 converted: Kettberg 7, 8049 Zürich to 47.398474050000004 , 8.511562547187086\n", "row 23 converted: Limmattalstrasse 388, 8049 Zürich to 47.4023642 , 8.4938806\n", "row 24 converted: Segantinistrasse 141, 8049 Zürich to 47.40724525 , 8.49231699229448\n", "row 25 converted: Riedhofstrasse 260, 8049 Zürich to 47.4073525 , 
8.487545992277301\n", "row 26 converted: Winzerhalde 46, 8049 Zürich to 47.400393699999995 , 8.490019776228419\n", "row 27 converted: Riedhofstrasse 378, 8049 Zürich to 47.40997865 , 8.481313518053021\n", "row 28 converted: Giacomettistrasse 16, 8049 Zürich to 47.40261315 , 8.504931210121436\n", "row 29 converted: Limmattalstrasse 155, 8049 Zürich to 47.4009218 , 8.501674200879306\n", "row 30 converted: Limmattalstrasse 289, 8049 Zürich to 47.4023642 , 8.4938806\n", "row 31 converted: Winzerstrasse 85, 8049 Zürich to 47.4001692 , 8.4942713\n", "row 32 converted: Ottenbergstrasse 52, 8049 Zürich to 47.3982443 , 8.5082464\n", "row 33 converted: Rebstockweg 10, 8049 Zürich to 47.40084375 , 8.499320839294402\n", "row 34 converted: Limmattalstrasse 29, 8049 Zürich to 47.39646655 , 8.510388795637967\n", "row 35 converted: Ackersteinstrasse 143, 8049 Zürich to 47.398583200000004 , 8.504129803030196\n", "row 36 converted: Obere Bläsistrasse 3, 8049 Zürich to 47.40407485 , 8.504589908836735\n", "row 37 converted: Grossmannstrasse 30, 8049 Zürich to 47.3965458 , 8.506668409076683\n", "row 38 converted: Geeringstrasse 37, 8049 Zürich to 47.414803649999996 , 8.4800326180024\n", "row 39 converted: Naglerwiesenstrasse 52, 8049 Zürich to 47.4129728 , 8.481258365051458\n", "row 40 converted: Limmattalstrasse 5, 8049 Zürich to 47.3959048 , 8.51437249405286\n", "row 41 converted: Imbisbühlhalde 11, 8049 Zürich to 47.407041050000004 , 8.483379649989264\n", "row 42 converted: Segantinistrasse 114, 8049 Zürich to 47.406776750000006 , 8.494158042122827\n", "row 43 converted: Ottenbergstrasse 23, 8049 Zürich to 47.39691345 , 8.511821033987438\n", "row 44 converted: Ferdinand-Hodler-Strasse 10, 8049 Zürich to 47.40406205 , 8.5014806\n", "row 45 converted: Am Börtli 6, 8049 Zürich to 47.399609 , 8.508015663133941\n", "row 46 converted: Heizenholz 45, 8049 Zürich to 47.411592299999995 , 8.485578726437662\n", "row 47 converted: Heizenholz 45, 8049 Zürich to 47.411592299999995 , 
8.485578726437662\n", "row 48 converted: Limmattalstrasse 163, 8049 Zürich to 47.40139945 , 8.50031672341791\n", "row 49 converted: Riedhofweg 35, 8049 Zürich to 47.4094958 , 8.484568246313057\n", "row 50 converted: Appenzellerstrasse 11, 8049 Zürich to 47.39936925 , 8.50878727210031\n", "row 51 converted: Segantinistrasse 216, 8049 Zürich to 47.4055211 , 8.4990666\n", "row 52 converted: Limmattalstrasse 55, 8049 Zürich to 47.397697300000004 , 8.507149700000003\n", "row 53 converted: Grossmannstrasse 30, 8049 Zürich to 47.3965458 , 8.506668409076683\n", "row 54 converted: Segantinisteig 2, 8049 Zürich to 47.40659025 , 8.492463546329287\n", "row 55 converted: Kürbergstrasse 34, 8049 Zürich to 47.39851575 , 8.50836297095779\n", "row 56 converted: Riedhofstrasse 366, 8049 Zürich to 47.4099269 , 8.482156201366113\n", "row 57 converted: Schärrergasse 3, 8049 Zürich to 47.4024113 , 8.497628509978188\n", "row 58 converted: Rütihofstrasse 2, 8049 Zürich to 47.41447625 , 8.479546130169688\n", "row 59 converted: Konrad-Ilg-Strasse 22, 8049 Zürich to 47.406844050000004 , 8.478812911455975\n", "row 60 converted: Im oberen Boden 11, 8049 Zürich to 47.4158279 , 8.4810428\n", "row 61 converted: Hohenklingenstrasse 28, 8049 Zürich to 47.40206635 , 8.492932251058683\n", "row 62 converted: Ackersteinstrasse 161, 8049 Zürich to 47.3996702 , 8.502347348296613\n", "row 63 converted: Im Maas 8, 8049 Zürich to 47.40202845 , 8.503932799627133\n", "row 64 converted: Reinhold-Frei-Strasse 27, 8049 Zürich to 47.40922205 , 8.484806457645165\n", "row 65 converted: Konrad-Ilg-Strasse 22, 8049 Zürich to 47.406844050000004 , 8.478812911455975\n", "row 66 converted: Riedhofstrasse 366, 8049 Zürich to 47.4099269 , 8.482156201366113\n", "row 67 converted: Rebbergstrasse 49, 8049 Zürich to 47.397550949999996 , 8.512940485974205\n", "row 68 converted: Ferdinand-Hodler-Strasse 15, 8049 Zürich to 47.404049 , 8.5007635\n", "row 69 converted: Kappenbühlweg 9, 8049 Zürich to 47.4035729 , 8.4986268\n", "row 
70 converted: Riedhofstrasse 366, 8049 Zürich to 47.4099269 , 8.482156201366113\n", "row 71 converted: Riedhofstrasse 366, 8049 Zürich to 47.4099269 , 8.482156201366113\n", "row 72 converted: Hardeggstrasse 11, 8049 Zürich to 47.39591585 , 8.503814796285795\n", "row 73 converted: Winzerstrasse 9, 8049 Zürich to 47.4001692 , 8.4942713\n", "row 74 converted: Rütihofstrasse 32, 8049 Zürich to 47.4139362 , 8.477824429475469\n", "row 75 converted: Winzerstrasse 17, 8049 Zürich to 47.4001692 , 8.4942713\n", "row 76 converted: Segantinistrasse 116, 8049 Zürich to 47.40698605 , 8.494096435663089\n", "row 77 converted: Limmattalstrasse 387, 8049 Zürich to 47.4023642 , 8.4938806\n", "row 78 converted: Limmattalstrasse 44, 8049 Zürich to 47.3967363 , 8.510600824286353\n", "row 79 converted: Grossmannstrasse 25, 8049 Zürich to 47.3957288 , 8.507356001739868\n", "row 80 converted: Ackersteinstrasse 72, 8049 Zürich to 47.39660415 , 8.508825040147928\n", "row 81 converted: Limmattalstrasse 385, 8049 Zürich to 47.4023642 , 8.4938806\n", "row 82 converted: Limmattalstrasse 47, 8049 Zürich to 47.3972976 , 8.507906203219179\n", "row 83 converted: Ferdinand-Hodler-Strasse 3, 8049 Zürich to 47.4033394 , 8.502052907946574\n", "row 84 converted: Rütihofstrasse 27, 8049 Zürich to 47.4134899 , 8.479216234892291\n", "row 85 converted: Limmattalstrasse 65, 8049 Zürich to 47.39809405 , 8.506421018270299\n", "row 86 converted: Bergellerstrasse 41, 8049 Zürich to 47.4065698 , 8.4918502\n", "row 87 converted: Lachenacker 23, 8049 Zürich to 47.406785150000005 , 8.489951210345343\n", "row 88 converted: Michelstrasse 42, 8049 Zürich to 47.406724 , 8.495634063234608\n", "row 89 converted: Segantinistrasse 67, 8049 Zürich to 47.4055331 , 8.498280308181073\n", "row 90 converted: Kappenbühlweg 11, 8049 Zürich to 47.40397745 , 8.498320080768725\n", "row 91 converted: Heizenholz 32, 8049 Zürich to 47.411048050000005 , 8.487030915985708\n", "row 92 converted: Michelstrasse 52, 8049 Zürich to 47.40735665 , 
8.49514231967135\n", "row 93 converted: Segantinistrasse 141, 8049 Zürich to 47.40724525 , 8.49231699229448\n", "row 94 converted: Wildenstrasse 15, 8049 Zürich to 47.4074537 , 8.488328928012049\n", "row 95 converted: Ackersteinstrasse 12, 8049 Zürich to 47.39557175 , 8.514209950400817\n", "row 96 converted: Bläsistrasse 49, 8049 Zürich to 47.40327485 , 8.50388585112654\n", "row 97 converted: Imbisbühlstrasse 159, 8049 Zürich to 47.4035273 , 8.4925379\n", "row 98 converted: Rütihofstrasse 3, 8049 Zürich to 47.41406945 , 8.480366719245135\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "row 99 converted: Reinhold-Frei-Strasse 67, 8049 Zürich to 47.4105669 , 8.483620068895908\n", "row 100 converted: Riedhofstrasse 41, 8049 Zürich to 47.4039369 , 8.4929601\n", "row 101 converted: Rütihofstrasse 31, 8049 Zürich to 47.4135121 , 8.478689754897568\n", "row 102 converted: Regensdorferstrasse 157, 8049 Zürich to 47.4034924 , 8.4973495\n", "row 103 converted: Im oberen Boden 72, 8049 Zürich to 47.416623 , 8.481363723023257\n", "row 104 converted: Segantinistrasse 63, 8049 Zürich to 47.4054201 , 8.498685540823669\n", "row 105 converted: Rebbergstrasse 65, 8049 Zürich to 47.3977891 , 8.5115899\n", "row 106 converted: Appenzellerstrasse 55, 8049 Zürich to 47.4006704 , 8.507789952394452\n", "row 107 converted: Kappenbühlweg 11, 8049 Zürich to 47.40397745 , 8.498320080768725\n", "row 108 converted: Heizenholz 45, 8049 Zürich to 47.411592299999995 , 8.485578726437662\n", "row 109 converted: Frankentalerstrasse 35, 8049 Zürich to 47.40705455 , 8.479552765429101\n", "row 110 converted: Bauherrenstrasse 48, 8049 Zürich to 47.401067999999995 , 8.498660900058468\n", "row 111 converted: Am Wasser 60, 8049 Zürich to 47.3960814 , 8.5061461\n", "row 112 converted: Frankentalerstrasse 35, 8049 Zürich to 47.40705455 , 8.479552765429101\n", "row 113 converted: Am Wasser 135, 8049 Zürich to 47.3991721 , 8.49784505\n", "row 114 converted: Engadinerweg 3, 8049 Zürich to 
47.40721385 , 8.494296400462744\n", "row 115 converted: Riedhofstrasse 291, 8049 Zürich to 47.4075104 , 8.4859317\n", "row 116 converted: Ackersteinstrasse 150, 8049 Zürich to 47.3995796 , 8.503456578133632\n", "row 117 converted: Im Stelzenacker 17, 8049 Zürich to 47.4164693 , 8.482322248287408\n", "row 118 converted: Imbisbühlstrasse 91, 8049 Zürich to 47.4035273 , 8.4925379\n", "row 119 converted: Am Wasser 104C, 8049 Zürich to 47.3972397 , 8.5029985\n", "row 120 converted: Rütihofstrasse 45, 8049 Zürich to 47.41304865 , 8.4780279\n", "row 121 converted: Rütihofstrasse 14, 8049 Zürich to 47.41509235 , 8.47869765\n", "row 122 converted: Segantinistrasse 93, 8049 Zürich to 47.4062361 , 8.495016249999999\n", "row 123 converted: Bäulistrasse 26A, 8049 Zürich to 47.3986874 , 8.502970441951632\n", "row 124 converted: Am Börtli 10, 8049 Zürich to 47.4000024 , 8.508161923884499\n", "row 125 converted: Giblenstrasse 48, 8049 Zürich to 47.4122174 , 8.481714491018776\n", "row 126 converted: Riedhofweg 4, 8049 Zürich to 47.408466 , 8.484371898020221\n", "row 127 converted: Kürbergstrasse 50, 8049 Zürich to 47.398748499999996 , 8.510023075722788\n", "row 128 converted: Riedhofstrasse 70, 8049 Zürich to 47.40531435 , 8.492104419535483\n", "row 129 converted: Winzerhalde 85, 8049 Zürich to 47.4024914 , 8.485379154222267\n", "row 130 converted: Gsteigstrasse 31, 8049 Zürich to 47.4030586 , 8.5019213\n" ] } ], "source": [ "helpLat=[] # list to store latitudes\n", "helpLong= [] # list to store longitudes\n", "location=None # variable to check if geolocator returned a result\n", "count=0 # counter for converted addresses \n", "\n", "print('starting geolocator conversion:')\n", "for tmpAddr in helperData['address']:\n", " while(location is None): # try to use geolocator until a result is returned\n", " try: \n", " location = geolocator.geocode(tmpAddr) # try converting address to GPS position\n", " count+=1\n", " helpLat.append(location.latitude) # append latitude to list\n", "
helpLong.append(location.longitude) # append longitude to list\n", " print('row',count,'converted:',tmpAddr,'to',location.latitude,',',location.longitude)\n", " except: # in case geocoder times out, wait half a second \n", " #print('.',end='') # uncomment to see how many attempts failed\n", " time.sleep(0.5) #wait half a second\n", " location=None" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of elements in the list helpLat: 130\n", "Number of elements in the list helpLong: 130\n" ] } ], "source": [ "print('Number of elements in the list helpLat: ',len(helpLat))\n", "print('Number of elements in the list helpLong:',len(helpLong))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Seems to have worked.\n", "\n", "Now let's **add the resulting geoposition rows to our helper dataframe**, name the columns and have a look at the resulting dataframe." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " address latitude longitude\n", "helperId \n", "0 Michelstrasse 6, 8049 Zürich 47.404588 8.495699\n", "1 Riedhofstrasse 277, 8049 Zürich 47.406833 8.486365\n", "2 Brunnwiesenstrasse. 78, 8049 Zürich 47.401819 8.502904\n", "3 Rütihofstrasse 32, 8049 Zürich 47.413936 8.477824\n", "4 Segantinistrasse 38, 8049 Zürich 47.405278 8.500653" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "helperData['latitude']=helpLat\n", "helperData['longitude']=helpLong\n", "helperData.index.name='helperId'\n", "helperData.head()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " address latitude longitude\n", "helperId \n", "125 Riedhofweg 4, 8049 Zürich 47.408466 8.484372\n", "126 Kürbergstrasse 50, 8049 Zürich 47.398748 8.510023\n", "127 Riedhofstrasse 70, 8049 Zürich 47.405314 8.492104\n", "128 Winzerhalde 85, 8049 Zürich 47.402491 8.485379\n", "129 Gsteigstrasse 31, 8049 Zürich 47.403059 8.501921" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "helperData.tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we **create and add some fake email addresses** for our helpers:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "# create a list of fake email addresses (one per helper)\n", "fake = Faker(['de_DE'])\n", "helperEmail=[]\n", "for i in range(len(helperData)):\n", "    helperEmail.append(fake.email())" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['olena23@web.de',\n", " 'abdul37@lorch.de',\n", " 'ron06@etzler.org',\n", " 'stanislaw11@gorlitz.com',\n", " 'dorleboerner@aol.de']" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check result\n", "helperEmail[:5]" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "# add fake email addresses to our dataframe\n", "helperData['email']=helperEmail" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's have a final look at our dataframe:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " address latitude longitude \\\n", "helperId \n", "0 Michelstrasse 6, 8049 Zürich 47.404588 8.495699 \n", "1 Riedhofstrasse 277, 8049 Zürich 47.406833 8.486365 \n", "2 Brunnwiesenstrasse. 78, 8049 Zürich 47.401819 8.502904 \n", "3 Rütihofstrasse 32, 8049 Zürich 47.413936 8.477824 \n", "4 Segantinistrasse 38, 8049 Zürich 47.405278 8.500653 \n", "\n", " email \n", "helperId \n", "0 olena23@web.de \n", "1 abdul37@lorch.de \n", "2 ron06@etzler.org \n", "3 stanislaw11@gorlitz.com \n", "4 dorleboerner@aol.de " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "helperData.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks good to me.\n", "\n", "Let's save our work to a CSV file in case anything goes wrong:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "helperData.to_csv('helperData.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**With that, the manual data gathering for our fictitious helpers and customer is done.**\n", "\n", "----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### customer and helper geolocations visualized on a map\n", "Now let's visualize the helpers and our customer on the map.\n", "\n", "The customer is represented by a red circle and the possible helpers by blue circles." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "# helperData = pd.read_csv('helperData.csv', index_col='helperId') # uncomment in case you want to proceed from here" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create map of Zurich using the latitude and longitude values generated by geocoder\n", "map_folium = folium.Map(location=[custLatitude, custLongitude], zoom_start=15)\n", "\n", "# add a red circle for the position of our customer to the map\n", "label = custAddr\n", "label = folium.Popup(label, parse_html=True)\n", "folium.CircleMarker(\n", "    [custLatitude, custLongitude],\n", "    radius=10,\n", "    popup=label,\n", "    color='red',\n", "    parse_html=False).add_to(map_folium)\n", "\n", "# add smaller blue circles for helpers\n", "for email, lat, lng in zip(helperData['email'], helperData['latitude'], helperData['longitude']):\n", "    label = email\n", "    label = folium.Popup(label, parse_html=True)\n", "    folium.CircleMarker(\n", "        [lat, lng],\n", "        radius=3,\n", "        popup=label,\n", "        color='blue',\n", "        fill=True,\n", "        fill_color='#1234cc',\n", "        fill_opacity=0.7,\n", "        parse_html=False).add_to(map_folium)\n", "\n", "map_folium" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Very nice. Let's hope that we have that many helpers in the real world." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### shops - get Foursquare location data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Right now we have a customer and possible helpers, but we still need the positions of shops or restaurants where the helpers can fetch the goods and bring them to the customer." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For that, we will use real-time data obtained by querying the Foursquare API. So we're finally done with fake data."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First of all, we're going to read the necessary Foursquare credentials and the API version **from the local file \"cred_foursquare.json\"**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This JSON file could have the following structure (the code below reads the entries in `keys` by position: VERSION first, then ID, then SECRET):\n", "```json\n", "{\n", "  \"name\": \"Password JSON\",\n", "  \"version\": \"1.0.0\",\n", "  \"description\": \"\",\n", "  \"command\": \"\",\n", "  \"log\": \"\",\n", "  \"location\": \"\",\n", "  \"timeout\": \"0\",\n", "  \"commandargs\": \"\",\n", "  \"keys\":\n", "  [\n", "    {\n", "      \"scriptkey\": \"VERSION\",\n", "      \"scriptvalue\": \"20180605\",\n", "      \"scriptdefaultvalue\": \"\"\n", "    },\n", "    {\n", "      \"scriptkey\": \"ID\",\n", "      \"scriptvalue\": \"***************************\",\n", "      \"scriptdefaultvalue\": \"\"\n", "    },\n", "    {\n", "      \"scriptkey\": \"SECRET\",\n", "      \"scriptvalue\": \"***************************\",\n", "      \"scriptdefaultvalue\": \"\",\n", "      \"type\": \"password\"\n", "    }\n", "  ]\n", "}\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ID and SECRET can be obtained from https://developer.foursquare.com/ by registering for a free developer account.\n", "\n", "After saving the credentials in the file \"cred_foursquare.json\", we can load the values into variables."
] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "cred=pd.read_json(r'cred_foursquare.json') # read the json file" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "VERSION=cred['keys'][0]['scriptvalue'] # Foursquare API version\n", "CLIENT_ID=cred['keys'][1]['scriptvalue'] # Foursquare ID\n", "CLIENT_SECRET=cred['keys'][2]['scriptvalue'] # Foursquare Secret\n", "LIMIT=100 # maximum number of venues returned per request (usage of the free account is limited)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's think about __what types of venues__ we want to include as possible shops / restaurants.\n", "\n", "All of Foursquare's venue categories and the corresponding IDs that can be queried with the API are listed here:\n", "https://developer.foursquare.com/docs/build-with-foursquare/categories/\n", "\n", "__I'll go with the following in this case:__\n", "\n", "- Food: `4d4b7105d754a06374d81259`\n", "- Food & Drink Shop: `4bf58dd8d48988d1f9941735`\n", "- Fruit & Vegetable Store: `52f2ab2ebcbc57f1066b8b1c`\n", "- Market: `50be8ee891d4fa8dcc7199a7`\n", "- Pharmacy: `4bf58dd8d48988d10f951735`\n", "- Shopping Mall: `4bf58dd8d48988d1fd941735`\n", "\n", "Of course, one could also query for much more specific kinds of restaurants, e.g. a Vegetarian / Vegan Restaurant: `4bf58dd8d48988d1d3941735`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get going, we first __define a function that queries the Foursquare API and returns nearby venues__."
] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "def getNearbyVenues(names, latitudes, longitudes, radius, categories): # function adapted from course script\n", "\n", "    venues_list=[]\n", "    for name, lat, lng in zip(names, latitudes, longitudes):\n", "        print('Searching venues for',name)\n", "\n", "        # create the API request URL\n", "        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(\n", "            CLIENT_ID,\n", "            CLIENT_SECRET,\n", "            VERSION,\n", "            lat,\n", "            lng,\n", "            radius,\n", "            LIMIT,\n", "            categories)\n", "\n", "        # make the GET request and extract the list of recommended venues\n", "        results = requests.get(url).json()[\"response\"]['groups'][0]['items']\n", "\n", "        # keep only the relevant information for each nearby venue\n", "        venues_list.append([(\n", "            name,\n", "            lat,\n", "            lng,\n", "            v['venue']['name'],\n", "            v['venue']['location']['lat'],\n", "            v['venue']['location']['lng'],\n", "            v['venue']['categories'][0]['name']) for v in results])\n", "\n", "    # flatten the per-customer lists into one dataframe\n", "    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])\n", "    nearby_venues.columns = ['Customer Email',\n", "                             'Neighborhood Latitude',\n", "                             'Neighborhood Longitude',\n", "                             'Venue',\n", "                             'Venue Latitude',\n", "                             'Venue Longitude',\n", "                             'Venue Category']\n", "\n", "    return nearby_venues" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have the function in place, let's use it to query for the venues that are close to our customer.\n", "\n", "For the radius, let's just assume a reasonable number: a healthy young adult walks at about 4 km/h, so in 15 minutes the helper would cover 1 kilometer (=1000 meters). That's a reasonable distance in a city like Zurich, where there are plenty of shopping opportunities. 
In rural areas, one would probably choose the car as the means of transport and would also need a larger radius.\n", "\n", "Here, 1000 meters should be fine." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Searching venues for hungrycustomer1942@bluewin.ch\n" ] }, { "data": { "text/plain": [ "(12, 7)" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "venues = getNearbyVenues(names=['hungrycustomer1942@bluewin.ch'],\n", "                         latitudes=[custLatitude],\n", "                         longitudes=[custLongitude],\n", "                         radius=1000,\n", "                         categories='4d4b7105d754a06374d81259,4bf58dd8d48988d1f9941735,52f2ab2ebcbc57f1066b8b1c,50be8ee891d4fa8dcc7199a7,4bf58dd8d48988d10f951735,4bf58dd8d48988d1fd941735'\n", "                         )\n", "venues.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So in our customer's case, we found 12 venues nearby (within a radius of 1000 meters)." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Customer Email Neighborhood Latitude \\\n", "0 hungrycustomer1942@bluewin.ch 47.402884 \n", "1 hungrycustomer1942@bluewin.ch 47.402884 \n", "2 hungrycustomer1942@bluewin.ch 47.402884 \n", "3 hungrycustomer1942@bluewin.ch 47.402884 \n", "4 hungrycustomer1942@bluewin.ch 47.402884 \n", "5 hungrycustomer1942@bluewin.ch 47.402884 \n", "6 hungrycustomer1942@bluewin.ch 47.402884 \n", "7 hungrycustomer1942@bluewin.ch 47.402884 \n", "8 hungrycustomer1942@bluewin.ch 47.402884 \n", "9 hungrycustomer1942@bluewin.ch 47.402884 \n", "10 hungrycustomer1942@bluewin.ch 47.402884 \n", "11 hungrycustomer1942@bluewin.ch 47.402884 \n", "\n", " Neighborhood Longitude Venue Venue Latitude \\\n", "0 8.499581 Kebab-Haus Höngg 47.402172 \n", "1 8.499581 Desperado 47.402083 \n", "2 8.499581 Argentina Steakhouse 47.403935 \n", "3 8.499581 Osteria da Biagio 47.402264 \n", "4 8.499581 Pizzeria Rapido 47.401672 \n", "5 8.499581 Marcello's Bistro 47.403001 \n", "6 8.499581 Restaurant Turbinenhaus 47.395374 \n", "7 8.499581 Del Sole Pizzakurier Höngg 47.400281 \n", "8 8.499581 Maharani 47.402779 \n", "9 8.499581 Restaurant Werdinsel 47.399353 \n", "10 8.499581 ETH Chemiecafeteria 47.408014 \n", "11 8.499581 SV Bistro Hönggerberg 47.408312 \n", "\n", " Venue Longitude Venue Category \n", "0 8.495997 Fast Food Restaurant \n", "1 8.496903 Mexican Restaurant \n", "2 8.496524 Steakhouse \n", "3 8.495558 Italian Restaurant \n", "4 8.499121 Pizza Place \n", "5 8.498206 Café \n", "6 8.505774 Italian Restaurant \n", "7 8.503689 Pizza Place \n", "8 8.492496 Indian Restaurant \n", "9 8.489361 Snack Place \n", "10 8.507681 Café \n", "11 8.507627 Café " ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "venues" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now how about we create a **function** we can reuse to **display a map of all three parties (customer, helpers, shops) together**:" ] }, { "cell_type": "code", "execution_count": 
30, "metadata": { "scrolled": true }, "outputs": [], "source": [ "def createRawMap(custLatitude, custLongitude, custAddr, helperData, venues, zoomStart=15):\n", "\n", "    # create map of Zurich using the latitude and longitude values generated by geocoder\n", "    map_folium = folium.Map(location=[custLatitude, custLongitude], zoom_start=zoomStart)\n", "\n", "    # add a red circle for the position of our customer to the map\n", "    label = custAddr\n", "    label = folium.Popup(label, parse_html=True)\n", "    folium.CircleMarker(\n", "        [custLatitude, custLongitude],\n", "        radius=10,\n", "        popup=label,\n", "        color='red',\n", "        parse_html=False).add_to(map_folium)\n", "\n", "    # add smaller blue circles for helpers\n", "    for email, lat, lng in zip(helperData['email'], helperData['latitude'], helperData['longitude']):\n", "        label = email\n", "        label = folium.Popup(label, parse_html=True)\n", "        folium.CircleMarker(\n", "            [lat, lng],\n", "            radius=3,\n", "            popup=label,\n", "            color='blue',\n", "            fill=True,\n", "            fill_color='#1234cc',\n", "            fill_opacity=0.7,\n", "            parse_html=False).add_to(map_folium)\n", "\n", "    # add smaller green circles for shops (venues)\n", "    for venue, lat, lng in zip(venues['Venue'], venues['Venue Latitude'], venues['Venue Longitude']):\n", "        label = venue\n", "        label = folium.Popup(label, parse_html=True)\n", "        folium.CircleMarker(\n", "            [lat, lng],\n", "            radius=3,\n", "            popup=label,\n", "            color='green',\n", "            fill=True,\n", "            fill_color='#90ee90',\n", "            fill_opacity=0.7,\n", "            parse_html=False).add_to(map_folium)\n", "\n", "    return map_folium" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see if it works:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(createRawMap(custLatitude, custLongitude, custAddr, helperData, venues))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Nice.\n", "\n", "Customer = red circle\n", "\n", "Helpers = blue dots\n", "\n", "Shops = green dots\n", "\n", "As a human, it would now be quite obvious to tell the customer to order at Marcello's Bistro or Pizzeria Rapido.\n", "There are also quite a few ideal helpers, because they live close to both the bistro and the customer.\n", "\n", "However, our customer may not like the food Marcello's Bistro serves, the choices may not always be that obvious, or there could simply be too many requests for a person to process by phone.\n", "\n", "__We would like to automate such decisions and ideally introduce various parameters in the process.__\n", "\n", "This is where our recommender system comes into play." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### recommender system - what measurements do we use to get to our recommendations?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "what we want to minimize:\n", "- the total distance the helper has to travel\n", "\n", "assumptions:\n", "- distances are measured as straight-line (geodesic) distances; where the street topology is special, this may not always lead to the best decisions (e.g. elevation and dead-end streets are ignored)\n", "\n", "variables to measure the distances the helper has to travel:\n", "1. __HtoS__ = distance from the helper (H) to the shop (S)\n", "2. __StoC__ = distance from the shop (S) to the customer (C)\n", "3. 
__CtoH__ = distance from the customer (C) back to the helper (H)\n", "\n", "To minimize the travel time of our helper, we basically need to **minimize the sum of variables 1 to 3** (HtoS+StoC+CtoH = __totDist__)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "# initialize variables\n", "numH = helperData.shape[0] # number of helpers\n", "numS = venues.shape[0] # number of shops\n", "numC = 1 # test with only 1 customer\n", "HtoS = np.zeros([numS, numH]) # 2-dim matrix, helpers in columns, shops in rows\n", "StoC = np.zeros([numS, 1]) # column vector (numS x 1), shops in rows\n", "CtoH = np.zeros([1, numH]) # row vector (1 x numH), helpers in columns\n", "totDist = np.zeros([numS, numH]) # 2-dim matrix, helpers in columns, shops in rows" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "# calculate distances from all helpers to all shops (in meters, rounded to whole numbers)\n", "for hId, helper in helperData.iterrows():\n", "    for sId, shop in venues.iterrows():\n", "        tmpCord1 = (helper['latitude'], helper['longitude'])\n", "        tmpCord2 = (shop['Venue Latitude'], shop['Venue Longitude'])\n", "        HtoS[sId, hId] = round(geopy.distance.distance(tmpCord1, tmpCord2).m, 0)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "# calculate distances from all shops to our customer (in meters, rounded to whole numbers)\n", "tmpCord1 = (custLatitude, custLongitude)\n", "for sId, shop in venues.iterrows():\n", "    tmpCord2 = (shop['Venue Latitude'], shop['Venue Longitude'])\n", "    StoC[sId, 0] = round(geopy.distance.distance(tmpCord1, tmpCord2).m, 0)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "# calculate distances from our customer to all helpers (in meters, rounded to whole numbers)\n", "tmpCord2 = (custLatitude, custLongitude)\n", "for hId, helper in helperData.iterrows():\n", "    tmpCord1 = (helper['latitude'], helper['longitude'])\n", "    CtoH[0, hId] = round(geopy.distance.distance(tmpCord1, tmpCord2).m, 0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### descriptive statistics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we create some quick and dirty **heatmaps and descriptive statistics to get a feel for the distribution of the distances** we have calculated." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "# define custom color map ranging from green via white to red\n", "greenToRed=LinearSegmentedColormap.from_list('gr',[\"g\", \"w\",\"w\",\"w\",\"w\",\"w\", \"r\"], N=128)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The color map runs from green to red with **five white anchor colors in between, so that only the extreme values stand out**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### HtoS – distances between helpers and shops" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DescribeResult(nobs=1560, minmax=(66.0, 2997.0), mean=1020.1467948717949, variance=347633.2350121708, skewness=0.6149120572489256, kurtosis=-0.05753935414874878)\n" ] } ], "source": [ "print(stats.describe(HtoS, axis=None))" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "scrolled": false }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# set font size of plot labels to 24 pt\n", "font = {'size' : 24}\n", "plt.rc('font', **font)\n", "\n", "# Heatmap of distances from all helpers to all shops (green = closer, red = further away)\n", "plt.figure(figsize = (30,30))\n", "plt.imshow(HtoS, cmap=greenToRed)\n", "plt.title(\"distances - HtoS\")\n", "plt.xlabel('helpers')\n", "plt.ylabel('shops')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The closest shop to a helper is only 66 meters away, the mean distance is about 1 km, and the spread is large (variance ≈ 347,600 m², i.e. a standard deviation of roughly 590 m).\n", "\n", "The shop with index 6 seems to be the one that is furthest away from some of the helpers. Looking at the venues dataframe, you can see that this is \"Restaurant Turbinenhaus\". On the map, it is located at the bottom right and is therefore naturally far away from the helpers at the top left." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### StoC – distances between shops and our customer" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DescribeResult(nobs=12, minmax=(105.0, 957.0), mean=482.5833333333333, variance=99210.08333333334, skewness=0.3728582113979914, kurtosis=-1.475914078451692)\n" ] } ], "source": [ "print(stats.describe(StoC,axis=None))" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "scrolled": true }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAADwAAAD4CAYAAACwhYXQAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAHa0lEQVR4nO2dbYgdVxnHf393t63pVpq6qTYvsLHESPBL3EtR2/ohaSG+kAgKGrQ0RRHBl7YIEvFDwS9aKb6hCLWJFgxRqAWjlGptGzWgpZs0tY1xzbqGZJs02dVaG0uTjX38cCdy9+buS+acmW6e+/zgcufOnDnz/O6ZO3fu3GfOkZnRTbzutQ6gbkLYOyHsnd46NzYwMGCDg4PJ9Rw+fJjJyUmVWbdW4cHBQYaHh5PraTQapdftul06hL0TwheCpA2SRiSNStqaK6gqKS0sqQf4HvBeYA2wWdKaXIFVRUoLXweMmtmYmZ0BfgJsyhNWdaQILwOOtrweL+ZNQ9KnJA1LGp6YmEjYXB5ShDud6Zz349rM7jWzhpk1lixZkrC5PKQIjwMrWl4vB46lhVM9KcJPAqskrZR0CfBRYFeesKqj9Lm0mZ2V9FngV0APsN3MDmSLrCKSfjyY2UPAQ5liqYU40/JO1wnXegHg5amX2Xtsb5Z6ytJ1LRzC3glh74Swd0LYOyHsnRD2Tgh7J4S9U+sVj0V9ixhaOpSlnrJ0XQuHsHdC2DspKQ8rJD0u6aCkA5JuzxlYVaR8LZ0FvmBm+yRdAeyV9IiZ/TlTbJVQuoXN7LiZ7SumXwIO0iHlYaGR5TMsaRBYCzyRo74qSRaW1A/8DLjDzP7dYbmbpBYk9dGU3WFmD3Yq4yapRZKAbcBBM/tGvpCqJaWFrwduAdZJ2l883pcprspISWrZQ+dcrQVNnGl5J4S9E8LeCWHvhLB3Qtg7IeydEPZOCHsnhL0Twt6pNallZHKEG7ffmKWesnRdC4ewd0LYOzn+EO+R9JSkX+YIqGpytPDtNPM7LgpSMwCWA+8H7ssTTvWktvC3gC8Cr85UoDXHY+rUVOLm0klJefgAcNLMZr0DujXHo6+/r+zmspGa8rBR0mGa/fCsk/TjLFFVSEpi2pfMbLmZDdLstOQxM/t4tsgqouu+h7P8WjKz3cDuHHVVTde1cAh7p9YrHqfOnGLPkT3pFZ0pv2rXtXAIeyeEvRPC3glh74Swd0LYOyHsnRD2juocQqwhWfpILdAAhs1K3QbYdS0cwt4JYe+kZgBcKekBSX8pOjB5V67AqiL1Qvy3gYfN7MPFWA/lO7qqidLCkt4AvAfYAlAMUJPwn0A9pOzSbwEmgB8WaUv3Sbq8vdC0fjwSNpaLFOFe4B3A981sLfAf4LxBpqb145GwsVykDk4zbmbnuqN5gOYbsKBJyfF4HjgqaXUxaz2woHtagvSj9OeAHcURegy4LT2kakkdnGY/zR8vFw1xpuWdEK6UoSEwS38Mle/uNVrYOyHsnRD2Tgh7J4S9E8LeCWHvhLB3Qtg7IeydEPZOCF8Iku4sBqZ5VtJOSZflCqwqUjo9WAZ8HmiY2duBHpq3xi9oUnfpXuD1knppJrQcSw+pWlL+EH8OuAc4AhwHXjSzX7eXczNWi6TFwCZgJbAUuFzSeb08uBmrBbgJ+LuZTZjZFPAg8O48YVVHivAR4J2SFhUD1aznIuiEKOUz/ATNzJ19wDNFXfdmiqsyUnM87gLuyhRLLcSZlne6TrjWfjyYnIRt2/LUU5Kua+EQ9k4IeyeEvRPC3glh74Swd0LYOyHsna4TrvcST38/3HBDnnpK0nUtHMLeCeF2JG2XdFLSsy3zrpL0iKRDxfPiasPMx3xa+EfAhrZ5W4FHzWwV8CgdOjtYqMwpbGa/A/7ZNnsTcH8xfT/wwcxxVUbZz/CbzOw4QPF8db6QqqXyg9a0pJYXXqh6c3NSVviEpGsAiueTMxWcltSy+LU/tpUV3gX
cWkzfCvw8TzjVM5+vpZ3AH4DVksYlfQL4GnCzpEPAzcXri4I5fzyY2eYZFq3PHEstxJmWd0LYO/Ve8Th9GkZH89RTkq5r4RD2Tgh7J4S9E8LeCWHvhLB3Qtg7IeydWgenkfQSMDJHsQFgrvt0VpvZFWViqPcSD4yY2az9UUsank+ZsgF03S4dwhUzn/uLc5XpSK0HrYVA7NLeqVR4tvQmSRskjUgalfSqpP3FY1ex/FJJv5V0RtIrku7uUP8WSRMt635yzqDMrLIH8HVgazG9Fbi7mO4B/kZzgJtLgP8Ca9rW/QzwYlHmY8C/OpTZAnz3QmKqepeeKb3pOmDUzMaKYYvOFmVbuQU4aGZjwE+Bvg5lLpiqhWdKb1oGHG0p1wvcKemPks69KUtpDoeCmZ0FTgHXdtjGhyT9qRjZa8VcASWfWkr6DfDmDou+PNtqba/vAN4KfBN4TNIzHcoAtH+H/gLYaWanJX2a5l60brZ4k4XN7KaZlkk6IekaMzvelt40DrS2Rj9wzMzGJO0G1gLP0fz8UvTm1E/R4i3b/kfLyx8A5x3Y2ql6l54pvelJYJWklZKuBjYDuyQNANfTHMZoB7BG0krgI8BUUd//OZcrVrCR+XSNU/FR+o00k08PFc9XFfMbwMPAX2m29vPA08AJ4DtFmcuA39Mch+0V4J5i/leAjcX0V4EDxbqPA2+bK6Y4tfROCHsnhL0Twt75H/3/GNhgr4fiAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Heatmap of distances from all shops to our customer (green = closer, red = further away)\n", "plt.imshow(StoC.transpose(), cmap=greenToRed) # transposed to use less space\n", "plt.title(\"distances - StoC\")\n", "plt.xlabel('shops')\n", "plt.tick_params(\n", "    axis='y',          # changes apply to the y-axis\n", "    which='both',      # both major and minor ticks are affected\n", "    left=False,        # ticks along the left edge are off\n", "    right=False,       # ticks along the right edge are off\n", "    labelleft=False)   # labels along the left edge are off\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This gives us a good picture of possible shop recommendations. The shop with index 6 is also the one furthest away from our customer (957 m, the largest StoC value), so it probably won't make it into our recommendation list.\n", "\n", "**Shops 5, 4 and 1 are the closest to our customer and could be interesting when the shop has to be close.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### CtoH – distances between our customer and our helpers" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DescribeResult(nobs=130, minmax=(105.0, 2080.0), mean=876.6769230769231, variance=306164.14287418005, skewness=0.6655659563462873, kurtosis=-0.6208715228192303)\n" ] } ], "source": [ "print(stats.describe(CtoH,axis=None))" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAABrcAAAAwCAYAAAC7biG8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAMYElEQVR4nO3df6zV5X3A8fenFwV/FfBXR0EnZHe0jHWlEmPX2Rhtp1QDTVaqTbuxdoa4rNMt+1EoWdc1NbPbsmnr2qVBq8uaasvcSrrC5mqXYhasWIylOAbDRUEUKYhSFAU/++N8iTeXc+2ec849557zfb+Sm3u+3/N8zvPk3nOe5/k+n/P9fiMzkSRJkiRJkiRJkvrBG3rdAEmSJEmSJEmSJOn/y+SWJEmSJEmSJEmS+obJLUmSJEmSJEmSJPUNk1uSJEmSJEmSJEnqGya3JEmSJEmSJEmS1DdMbkmSJEmSJEmSJKlvtJXciogzI+K+iNhe/Z4+RrljEfFI9bO2nTolSZIkSZIkSZJUX5GZrQdH/AWwPzNvjogVwPTM/ESTcocy8/Q22ilJkiRJkiRJkiS1ndzaBlyamXsiYgbwH5k5t0k5k1uSJEmSJEmSJElqW7v33HpTZu4BqH6fO0a5KRGxKSI2RsT726xTkiRJkiRJkiRJNTXppxWIiH8HfqbJU6sK6jk/M5+KiDnA/RHxw8z8nyZ1LQeWA0w5dcqFs2bPKqgCpk6ZWlQeYM+hPcUxM7Y/XVbHcLM/30+p4/QZxTGauPb+ZG9xzLmnjZUr7qyDLx0sjmnls7Zl75ai8vPPnV9ch1rzyrFXimNK39Mz3zizuI5u2XlgZ3HMnOlzxqElJ2rlf3PS0EllAdu3F9fBBReUx5xU2K4Jbtu+bcUxc88+4eT2CWHrs1uLY+adM28cWnKiiTx+TlSl4y3A/IOTyysaHi6P2Vr+XmNe2XvtWB4rruLQkUPFMa3MhSayVt43R44eKSp/4ZsvLK5jQtu8uTxmwYLOt0Md0dK4fviU8orOP788ZoLad3hfcczTh8rWUsDjwgMvHiiOmX7K9PKK9pSvjTGjS+tWO8uP17ae8VJR+Xm7ysY0mNhrfS3NB7v0Wdu+v+z4c/jMFuacgm3l4xpzu3C82sr6w/PPl8dcOGDzzgHSyvrDi0+8uC8zz2n2XFcuSzgq5k7gW5m55vXKDc8fzlvuvaWoPVf9/FVF5QFu+t5NxTGrrrq5rI5/WVFex7tLcofddezV8kWDoTcMjUNL+setG28tjrnx4huLY1r536zfsb44ppXP2vDnyyYk229oYcBrge9n2P387uKY275/W1H5z1722eI6uvV3Xvr1pcUx3/jgN8ahJSdq5X9TnEhctKi4DlavLo+ZOXETnK245I5LimM2fGzDOLSkfQv+rnyxdfP1LSzqtqBb4+cgKR1vAbav+7nyitatK49pZWG/MIHQypd2HnjigeKYVuZCE1kr75sd+3cUlc9Pt37cOSGdcUZ5zAsvdL4dTZTObyfy3LZbc/WWxvXNby+O4QtfKI+ZoG7/we3FMTc/ULaWAt07Lpyo7tlyT3HMNfOvKa/opvK1MVZ1ad1qafnx2oLLy8aozX9UVh4m9lpfS/PBLn3WFv1D2fHnuo+0MOdswcCtDV1SPq6xoQvHq62sP6wvX7ekjXyHxlcr6w+P/PYjD2fmwmbPtXtZwrXAsurxMuCbowtExPSImFw9Pht4F9DC1zYlSZIkSZIkSZJUd+0mt24G3hsRu4E/AT4QESsiYmFEHP9a91uBhyPiOWA3MAQcbrNeSZIkSZIkSZIk1VBbya3M/DHwq8BLwC8CbwE+BBzOzOuqMv8JfBG4OzMnA58EPtdOvZIkSZIkSZIkSaqnds/cArgI2JGZOzPzZeBuYMmoMkuAu6rHa4DLIyI6ULckSZIkSZIkSZJqpBPJrZnAkyO2d1X7mpbJzKPAQeCs0S8UEcsjYlNEbDp4oPx
m0JIkSZIkSZIkSRpsnUhuNTsDK1soQ2Z+OTMXZubCqdOndqBpkiRJkiRJkiRJGiSdSG7tAs4bsT0LeGqsMhExCZgK7O9A3ZIkSZIkSZIkSaqRSR14jYeAt0XETuBV4DTgPaPKPAPcHxHbgGnA45l5wplbkiRJkiRJkiRJ0uvpxJlbx5NUwWuXH8yI+ExELK62NwC7gdOBvcA1HahXkiRJkiRJkiRJNdOJM7cuAh7NzCsAImIlsCQzPzWizCvA+sz8eAfqkyRJkiRJkiRJUk114sytmcCTI7Z3VftG+7WIeDQi1kTEeU2elyRJkiRJkiRJkl5XtHvrq4hYClyRmddV278OXJSZvzuizFnAocw8EhHXAx/MzMuavNZyYHm1ORfYNka1ZwP72mq4pH5mHyDJfkCS/YBUb/YBkuwHpHqzD6iHn83Mc5o90Ynk1juBT4+6LCGZ+edjlB8C9mfm1Dbq3JSZC1uNl9Tf7AMk2Q9Ish+Q6s0+QJL9gFRv9gHqxGUJHwKGI2J2RJwMXAusHVkgImaM2FwMPNaBeiVJkiRJkiRJklQzk9p9gcw8GhEfB/4VGALuyMwfRcRngE2ZuRa4ISIWA0eB/cBvtluvJEmSJEmSJEmS6qft5BZAZn4b+PaofZ8a8XglsLITdVW+3MHXktR/7AMk2Q9Ish+Q6s0+QJL9gFRv9gE11/Y9tyRJkiRJkiRJkqRu6cQ9tyRJkiRJkiRJkqSu6KvkVkRcGRHbImJHRKzodXskjb+IOC8ivhsRj0XEjyLixmr/mRFxX0Rsr35P73VbJY2fiBiKiM0R8a1qe3ZEPFj1AfdExMm9bqOk8RMR0yJiTUT8VzUneKdzAak+IuL3q2OBLRHxtYiY4lxAGmwRcUdE7I2ILSP2NR37o+Hz1XrhoxHxjt61XFKnjNEP/GV1TPBoRPxTREwb8dzKqh/YFhFX9KbV6qa+SW5FxBDwt8AiYB7woYiY19tWSeqCo8AfZOZbgYuB36k++yuA72TmMPCdalvS4LoReGzE9ueAv6n6gAPAb/WkVZK65VZgfWa+BfglGv2BcwGpBiJiJnADsDAz5wNDwLU4F5AG3Z3AlaP2jTX2LwKGq5/lwJe61EZJ4+tOTuwH7gPmZ+bbgP8GVgJUa4XXAr9QxXyxyidogPVNcgu4CNiRmTsz82XgbmBJj9skaZxl5p7M/EH1+AUai1kzaXz+76qK3QW8vzctlDTeImIWcBWwutoO4DJgTVXEPkAaYBHxRuDdwO0AmflyZj6HcwGpTiYBp0TEJOBUYA/OBaSBlpnfA/aP2j3W2L8E+Pts2AhMi4gZ3WmppPHSrB/IzH/LzKPV5kZgVvV4CXB3Zh7JzMeBHTTyCRpg/ZTcmgk8OWJ7V7VPUk1ExAXAAuBB4E2ZuQcaCTDg3N61TNI4uwX4Y+DVavss4LkRE1rnBNJgmwM8C3ylujzp6og4DecCUi1k5m7gr4AnaCS1DgIP41xAqqOxxn7XDKV6+hiwrnpsP1BD/ZTciib7suutkNQTEXE68I/A72Xm871uj6TuiIirgb2Z+fDI3U2KOieQBtck4B3AlzJzAfATvAShVBvVPXWWALOBNwOn0bgE2WjOBaT68vhAqpmIWEXjViZfPb6rSTH7gQHXT8mtXcB5I7ZnAU/1qC2SuigiTqKR2PpqZt5b7X7m+GUGqt97e9U+SePqXcDiiPhfGpckvozGmVzTqksTgXMCadDtAnZl5oPV9hoayS7nAlI9vAd4PDOfzcxXgHuBX8a5gFRHY439rhlKNRIRy4CrgQ9n5vEElv1ADfVTcushYDgiZkfEyTRuELe2x22SNM6qe+vcDjyWmX894qm1wLLq8TLgm91um6Txl5krM3NWZl5AY+y/PzM/DHwX+EBVzD5AGmCZ+TTwZETMrXZdDmzFuYBUF08AF0fEqdWxwfE+wLmAVD9jjf1rgd+IhouBg8cvXyhpsETElcAngMWZeXjEU2uBayNickTMBoaB7/e
ijeqeeC25OfFFxPtofFt7CLgjM2/qcZMkjbOI+BVgA/BDXrvfzidp3Hfr68D5NA54l2bm6JvNShogEXEp8IeZeXVEzKFxJteZwGbgI5l5pJftkzR+IuLtwGrgZGAn8FEaX9RzLiDVQET8GXANjcsPbQauo3EfDecC0oCKiK8BlwJnA88Afwr8M03G/irxfRtwJXAY+GhmbupFuyV1zhj9wEpgMvDjqtjGzLy+Kr+Kxn24jtK4rcm60a+pwdJXyS1JkiRJkiRJkiTVWz9dllCSJEmSJEmSJEk1Z3JLkiRJkiRJkiRJfcPkliRJkiRJkiRJkvqGyS1JkiRJkiRJkiT1DZNbkiRJkiRJkiRJ6hsmtyRJkiRJkiRJktQ3TG5JkiRJkiRJkiSpb5jckiRJkiRJkiRJUt/4P/fnt1+HTOYJAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Heatmap of distances from our customer to all helpers (green = closer, red = further away)\n", "plt.figure(figsize = (30,30))\n", "plt.imshow(CtoH, cmap=greenToRed)\n", "plt.title(\"distances - CtoH\")\n", "plt.xlabel('helpers')\n", "plt.tick_params(\n", " axis='y', # changes apply to the x-axis\n", " which='both', # both major and minor ticks are affected\n", " left=False, # ticks along the bottom edge are off\n", " top=False, # ticks along the top edge are off\n", " labelleft=False) # labels along the bottom edge are offplt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are quite a few helpers located close to our customer, which is promising." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### totDist – total distance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Having our 3 matrices in place it is actually really simple now to **calculate our total distance:**" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "# calculate total distance\n", "totDist=HtoS+StoC+CtoH" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, we do simple descriptive statistics and a quick heatmap:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DescribeResult(nobs=1560, minmax=(281.0, 6034.0), mean=2379.4070512820513, variance=1379222.639206181, skewness=0.5854207297337536, kurtosis=-0.2367640214900928)\n" ] } ], "source": [ "print(stats.describe(totDist,axis=None))" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Heatmap of total distances for all shop and helper combinations (green = closer, red = further away)\n", "plt.figure(figsize = (30,30))\n", "plt.imshow(HtoS, cmap=greenToRed)\n", "plt.title(\"distances - totDist (total distance)\")\n", "plt.xlabel('helpers')\n", "plt.ylabel('shops')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, we can clearly see that \"Restaurant Turbinenhaus\" with index 6 will not be a good recommendation. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### weighting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have 4 different distance measures with totDist clearly being the most important one.\n", "\n", "**BUT: Let's consider the case, where the customer orders refrigerated or especially heavy goods.**\n", "\n", "As a customer you don't want to receive your refrigerated goods thawed.\n", "\n", "As a helper you don't want to lift heavy goods for a long time and over a long distance.\n", "\n", "**So total distance won't be the most important measure anymore.** For both cases it would then make sense to **minimize the distance between the shop and the customer**. Even if this means to have a larger total distance than any other helper shop combinations. \n", "\n", "Additionally, refrigerate goods thaw more quickly, the hotter it is. So the local weather needs to be addressed as well." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### calculation of the weights" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Warning: Tuning of the following parameters is based on (somewhat educated) guesses, not sound science." 
] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "# function that returns a normalized weight vector depending on the weather, the items purchased and the helper's equipment\n", "def returnWeights (temperature, isRefrigerated=False, isHeavy=False):\n", " \n", " # array structure:\n", " # weights[HtoS, StoC, StoC_T, CtoH, totDist] -> \"_T\" means that the helper has a trolley\n", " w_standard = np.array([0, 0, 0, 0, 10]) # total distance gets optimized, rest omitted\n", " \n", " # refrigerated items: StoC gets weighted more the hotter it is;\n", " # StoC_T gets half that weight, since a trolley (with a thermos bag) helps to keep the goods cold\n", " if temperature>30:\n", " w_refrigerated = np.array([0, 20, 10, 0, 0])\n", " elif temperature>25:\n", " w_refrigerated = np.array([0, 10, 5, 0, 0])\n", " elif temperature>15:\n", " w_refrigerated = np.array([0, 4, 2, 0, 0])\n", " else:\n", " w_refrigerated = np.array([0, 2, 1, 0, 0])\n", "\n", " # heavy items: StoC gets weighted more (20), but only half as much if the helper has a trolley (StoC_T: 10)\n", " w_heavy = np.array([0, 20, 10, 0, 0])\n", " \n", " # calculate total weight by adding the subweights\n", " w_total = w_standard + isRefrigerated*w_refrigerated + isHeavy*w_heavy\n", "\n", " # normalize so that all weights sum up to 1 and return the weight vector\n", " w_total = w_total / w_total.sum()\n", " return w_total" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We still need *current temperature* data that can be passed to our returnWeights function.\n", "\n", "For this purpose, we can use the free **openweathermap.org API**.\n", "\n", "We simply repeat the same process as for the Foursquare API:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "# read the openweathermap.org API credentials, as in the Foursquare API example further above\n", "cred_weather=pd.read_json(r'cred_openweathermap.json')\n",
"\n", "OWM_VERSION=cred_weather['keys'][0]['scriptvalue'] # Openweathermap API version\n", "OWM_APIKEY=cred_weather['keys'][1]['scriptvalue'] # Openweathermap API key" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "def getWeatherData (lat, lon, APIKEY, VERSION):\n", " # create the API request URL\n", " url = 'https://api.openweathermap.org/data/{}/weather?lat={}&lon={}&appid={}&units=metric'.format(\n", " VERSION, \n", " lat, \n", " lon, \n", " APIKEY)\n", "\n", " # make the GET request\n", " results = requests.get(url).json()\n", " \n", " return results" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'coord': {'lon': 8.5, 'lat': 47.4},\n", " 'weather': [{'id': 500,\n", " 'main': 'Rain',\n", " 'description': 'light rain',\n", " 'icon': '10d'}],\n", " 'base': 'stations',\n", " 'main': {'temp': 19.93,\n", " 'feels_like': 16.91,\n", " 'temp_min': 18.33,\n", " 'temp_max': 21,\n", " 'pressure': 1008,\n", " 'humidity': 32},\n", " 'visibility': 10000,\n", " 'wind': {'speed': 2.1, 'deg': 40},\n", " 'rain': {'3h': 0.33},\n", " 'clouds': {'all': 95},\n", " 'dt': 1587910061,\n", " 'sys': {'type': 1,\n", " 'id': 6941,\n", " 'country': 'CH',\n", " 'sunrise': 1587874624,\n", " 'sunset': 1587925808},\n", " 'timezone': 7200,\n", " 'id': 2658017,\n", " 'name': 'Werdhölzli',\n", " 'cod': 200}" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# call function to get current weather of our customer location\n", "weather = getWeatherData(custLatitude, custLongitude, OWM_APIKEY, OWM_VERSION)\n", "weather" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This works fine, the selected weather station 'Werdhölzli' makes sense and data seems to be reasonable for today.\n", "\n", "Let's extract the current temperature at our customer location." 
] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "19.93 °C\n" ] } ], "source": [ "# extract the current temperature\n", "temp = weather['main']['temp']\n", "print(temp,'°C')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a few test cases in which we vary the parameters and look at how the weights change." ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Now we have a look at what weights are applied when we alter our variables\n", "\n", "First we're getting the current temperature:\n", "temperature at customer location: 19.93 °C\n", "\n", "[HtoS StoC StoC_T CtoH totDis]\n", "[0.000 0.000 0.000 0.000 1.000] -> standard case\n", "[0.000 0.500 0.250 0.000 0.250] -> heavy items\n", "[0.000 0.250 0.125 0.000 0.625] -> refrigerated items with today's weather\n", "[0.000 0.571 0.286 0.000 0.143] -> heavy and refrigerated item on a hot summer day\n" ] } ], "source": [ "print('Now we have a look at what weights are applied when we alter our variables\\n')\n", "\n", "print('First we\\'re getting the current temperature:')\n", "temp=getWeatherData(custLatitude, custLongitude, OWM_APIKEY, OWM_VERSION)['main']['temp']\n", "print('temperature at customer location:',temp,'°C\\n')\n", "\n", "np.set_printoptions(formatter={'float': lambda x: \"{0:0.3f}\".format(x)}) # print floats with 3 decimal places\n", "\n", "print('[HtoS StoC StoC_T CtoH totDis]')\n", "standard_weights = returnWeights (temp, isRefrigerated=False, isHeavy=False)\n", "print(standard_weights,'-> standard case')\n", "\n", "heavy_weights = returnWeights (temp, isRefrigerated=False, isHeavy=True)\n", "print(heavy_weights,'-> heavy items')\n", "\n", "frigo_weights = returnWeights (temp, isRefrigerated=True, isHeavy=False)\n", "print(frigo_weights,'-> refrigerated items with today\\'s
weather')\n", "\n", "hot_heavy_frigo_weights = returnWeights (temperature=35, isRefrigerated=True, isHeavy=True)\n", "print(hot_heavy_frigo_weights,'-> heavy and refrigerated item on a hot summer day')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So in this setting **total distance gets less weight and the distance from the shop to the customer gets more weight under certain conditions**. Just like we wanted to." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's also obvious that HtoS and CtoH have no relevance at all in our current setting. So we might as well leave it out completely. But because future relevant use cases might bring importance to those, we'll keep it for now." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The parameter isRefrigerated depends on the food ordered. The trolly on the other hand is something that is determined by the helper.\n", "\n", "So let's create a **one hot encoded matrix for trolleys** with 1 indicating that the helper has a trolley:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DescribeResult(nobs=130, minmax=(0, 1), mean=0.23846153846153847, variance=0.18300536672629697, skewness=1.2274688184341327, kurtosis=-0.4933202997719137)\n" ] } ], "source": [ "# generate matrix with length corresponding to our number of helpers and about 20% having a trolley\n", "hTrolley=np.random.choice([0, 1], size=helperData.shape[0], p=[.8, .2])\n", "#hTrolley=hTrolley.reshape((-1, 1))\n", "print(stats.describe(hTrolley,axis=None))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we're only missing a reusable **function** that applies all this data and returns a **score matrix**:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "scrolled": true }, "outputs": [], "source": [ "def calcScores(HtoS, StoC, CtoH, totDist, weights, hTrolley):\n", " w_HtoS=weights[0]\n", " 
w_StoC=weights[1]\n", " w_StoC_T=weights[2]\n", " w_CtoH=weights[3]\n", " w_totDist=weights[4]\n", " \n", " return w_HtoS*HtoS + w_StoC*StoC*hTrolley + w_StoC_T*StoC*abs(hTrolley-1) + w_CtoH*CtoH + w_totDist*totDist" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This function norms the score matrix by dividing it by the maximum value. This way we get a range from 0 to 1.\n", "\n", "0 would be the case, where the helper and the customer live at the same address as the shop.\n", "\n", "**So the closer the score to 0, the better:**" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "def normPerfectToZero(array):\n", " return array / array.max()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we should be all set up." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There could be numerous other parameters applied, e.g. shop rating and so an. But for the purpose of this PoC we'll leave it at that and see what our recommender system returns for our customer." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### standard case - not refrigerated, not heavy" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [], "source": [ "standard_score = normPerfectToZero(calcScores(HtoS, StoC, CtoH, totDist, standard_weights, hTrolley))" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "# set font size of plot labels to 24 pt\n", "font = {'size' : 24}\n", "plt.rc('font', **font)" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "scrolled": true }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Heatmap of total distances for all shop and helper combinations\n", "# (green = lower score = better, red = high score = bad choice)\n", "plt.figure(figsize = (30,30))\n", "plt.imshow(standard_score, cmap=greenToRed)\n", "plt.title(\"scores – standard case\")\n", "plt.xlabel('helpers')\n", "plt.ylabel('shops')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DescribeResult(nobs=1560, minmax=(0.04656943984090156, 1.0), mean=0.39433328658966715, variance=0.03788120326673271, skewness=0.5854207297337527, kurtosis=-0.23676402149009323)" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stats.describe(standard_score,axis=None)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### heavy items" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [], "source": [ "heavy_score = normPerfectToZero(calcScores(HtoS, StoC, CtoH, totDist, heavy_weights, hTrolley))" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Heatmap of total distances for all shop and helper combinations\n", "# (green = lower score = better, red = high score = bad choice)\n", "plt.figure(figsize = (30,30))\n", "plt.imshow(heavy_score, cmap=greenToRed)\n", "plt.title(\"scores – heavy items\")\n", "plt.xlabel('helpers')\n", "plt.ylabel('shops')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DescribeResult(nobs=1560, minmax=(0.04902832465388035, 1.0), mean=0.3781364090839513, variance=0.03131248140918763, skewness=0.5316299109179674, kurtosis=-0.04014952503549729)" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stats.describe(heavy_score,axis=None)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### refrigerated items" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [], "source": [ "frigo_score = normPerfectToZero(calcScores(HtoS, StoC, CtoH, totDist, frigo_weights, hTrolley))" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Heatmap of total distances for all shop and helper combinations\n", "# (green = lower score = better, red = high score = bad choice)\n", "plt.figure(figsize = (30,30))\n", "plt.imshow(frigo_score, cmap=greenToRed)\n", "plt.title(\"scores – refrigerated items\")\n", "plt.xlabel('helpers')\n", "plt.ylabel('shops')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DescribeResult(nobs=1560, minmax=(0.047620549370841085, 1.0), mean=0.39404257951515825, variance=0.03629694972459201, skewness=0.5651590284958143, kurtosis=-0.17996869422459616)" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stats.describe(frigo_score,axis=None)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### refrigerated and heavy items on a hot summers day" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [], "source": [ "hot_heavy_frigo_score = normPerfectToZero(calcScores(HtoS, StoC, CtoH, totDist, hot_heavy_frigo_weights, hTrolley))" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Heatmap of total distances for all shop and helper combinations\n", "# (green = lower score = better, red = high score = bad choice)\n", "plt.figure(figsize = (30,30))\n", "plt.imshow(hot_heavy_frigo_score, cmap=greenToRed)\n", "plt.title(\"scores – refrigerated and heavy items on a hot summers day\")\n", "plt.xlabel('helpers')\n", "plt.ylabel('shops')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DescribeResult(nobs=1560, minmax=(0.050168590988045374, 1.0), mean=0.36525276858627226, variance=0.030269464745680003, skewness=0.5951194092669662, kurtosis=0.06004874094664281)" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stats.describe(hot_heavy_frigo_score,axis=None)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### short analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we have a quick look at the heatmaps of our score matrix, we can see that the score matrix seems to represent what it should and makes sense.\n", "\n", "There's shop rows that are either mostly white (shops with medium distance to the customer), white and green (closer shops) or white and red (shops with a larger distance). This is nice.\n", "\n", "**Shops 0 to 5 seem to have the highest scores. It's not as clear anymore that shops 5,4 and 1 are closest to the customer since not only StoC has an effect, but also the total distance.**\n", "\n", "When we look at the rows in the score matrices, we can clearly see that **helpers are either a fit or not**. There's not a single helper that has both red and green scores with different shops. 
Then there's helper rows that show no color at all, which are probably the ones having a medium distance to the customer.\n", "\n", "Also it's quite striking, that the \"standard case\" and \"refrigerated items case\" look quite alike. This is not surprising, since on the day I calculated this, the weather was not particularly hot. So there's only a minor differnece in weighting taking place.\n", "\n", "Also the \"heavy items case\" and the \"refrigerated and heavy items on a hot summers day case\" share more similarities, with the **extreme combination showing the most differences to the standard case**. This makes sense because there the weighting has the most differences to the standard case.\n", "\n", "**This appears to be a good base for making our recommendations.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### recommendation lists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can list the **top 20 recommendations** (smallest scores and the corresponding helper shop combination):" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [], "source": [ "# function that takes a 2dim numpy array and a number k\n", "# it returns a list with k rows where each row consists of [row index, column index, value]\n", "def findKMinFromNp(originalArray, k):\n", " array=np.copy(originalArray) # copy the numpy array in order not to change the original array\n", " dimArray=len(array[0]) # get number of columns\n", " arrMax=np.amax(array)+1 # get maximum value of array and add 1\n", " \n", " res=[] # initialize results list\n", " for i in range(k): # repeat this step k times\n", " minIndex=array.argmin() #find minimum value in array\n", " row=int(minIndex/dimArray) # determine row index of min value\n", " col=minIndex%dimArray # determine column index of min value\n", " value=array[row][col] # save value\n", " res.append([row, col, value]) # append results in results list in the form [value, row index, column 
index]\n", " array[row][col]=arrMax # overwrite the min value with a value that is larger than the current maximum\n", " # ...to prevent finding the same minimum in the next iteration \n", " return res" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [], "source": [ "def printRecommendation(scores, venues):\n", " for rank, score in enumerate(scores, start=1):\n", " print(rank,'/ shop:',venues.iloc[score[0]]['Venue'],'/ helper:',helperData.iloc[score[1]]['email'])" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 / shop: Marcello's Bistro / helper: zrogge@karz.com\n", "2 / shop: Marcello's Bistro / helper: alexeiholsten@web.de\n", "3 / shop: Marcello's Bistro / helper: bbachmann@web.de\n", "4 / shop: Marcello's Bistro / helper: gerlachgerhardt@gmail.com\n", "5 / shop: Marcello's Bistro / helper: angelicabloch@web.de\n", "6 / shop: Marcello's Bistro / helper: iris38@beyer.de\n", "7 / shop: Pizzeria Rapido / helper: vzirme@tlustek.de\n", "8 / shop: Pizzeria Rapido / helper: selmaaustermuehle@googlemail.com\n", "9 / shop: Pizzeria Rapido / helper: alexeiholsten@web.de\n", "10 / shop: Desperado / helper: alexeiholsten@web.de\n" ] } ], "source": [ "standard_recommendation=findKMinFromNp(standard_score,10)\n", "printRecommendation(standard_recommendation, venues)" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 / shop: Marcello's Bistro / helper: zrogge@karz.com\n", "2 / shop: Marcello's Bistro / helper: bbachmann@web.de\n", "3 / shop: Marcello's Bistro / helper: gerlachgerhardt@gmail.com\n", "4 / shop: Marcello's Bistro / helper: angelicabloch@web.de\n", "5 / shop: Marcello's Bistro / helper: iris38@beyer.de\n", "6 / shop: Marcello's Bistro / helper: alexeiholsten@web.de\n", "7 / shop: Pizzeria Rapido / helper: selmaaustermuehle@googlemail.com\n", "8 / shop: 
Marcello's Bistro / helper: wally73@heydrich.com\n", "9 / shop: Pizzeria Rapido / helper: zrogge@karz.com\n", "10 / shop: Pizzeria Rapido / helper: kreinberta@paertzelt.net\n" ] } ], "source": [ "heavy_recommendation=findKMinFromNp(heavy_score,10)\n", "printRecommendation(heavy_recommendation, venues)" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 / shop: Marcello's Bistro / helper: zrogge@karz.com\n", "2 / shop: Marcello's Bistro / helper: alexeiholsten@web.de\n", "3 / shop: Marcello's Bistro / helper: bbachmann@web.de\n", "4 / shop: Marcello's Bistro / helper: gerlachgerhardt@gmail.com\n", "5 / shop: Marcello's Bistro / helper: angelicabloch@web.de\n", "6 / shop: Marcello's Bistro / helper: iris38@beyer.de\n", "7 / shop: Pizzeria Rapido / helper: selmaaustermuehle@googlemail.com\n", "8 / shop: Pizzeria Rapido / helper: vzirme@tlustek.de\n", "9 / shop: Pizzeria Rapido / helper: zrogge@karz.com\n", "10 / shop: Pizzeria Rapido / helper: kreinberta@paertzelt.net\n" ] } ], "source": [ "frigo_recommendation=findKMinFromNp(frigo_score,10)\n", "printRecommendation(frigo_recommendation, venues)" ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 / shop: Marcello's Bistro / helper: zrogge@karz.com\n", "2 / shop: Marcello's Bistro / helper: bbachmann@web.de\n", "3 / shop: Marcello's Bistro / helper: gerlachgerhardt@gmail.com\n", "4 / shop: Marcello's Bistro / helper: angelicabloch@web.de\n", "5 / shop: Marcello's Bistro / helper: iris38@beyer.de\n", "6 / shop: Marcello's Bistro / helper: wally73@heydrich.com\n", "7 / shop: Pizzeria Rapido / helper: selmaaustermuehle@googlemail.com\n", "8 / shop: Pizzeria Rapido / helper: zrogge@karz.com\n", "9 / shop: Pizzeria Rapido / helper: kreinberta@paertzelt.net\n", "10 / shop: Marcello's Bistro / helper: selmaaustermuehle@googlemail.com\n" 
] } ], "source": [ "hot_heavy_frigo_recommendation=findKMinFromNp(hot_heavy_frigo_score,10)\n", "printRecommendation(hot_heavy_frigo_recommendation, venues)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now isn't that nice? Already in the top 10 recommendations, we can see major differences depending on the circumstances.\n", "\n", "Since Marcello's Bistro is definitely the closest shop to our customer, it makes sense that the top 6 recommendations are for this shop. But there are quite a few differences in the choice of helper, presumably because some helpers have a trolley and others don't.\n", "\n", "Take for example alexeiholsten@web.de, the number 2 recommendation in the standard case. This helper is also number 2 in the \"refrigerated items case\" in today's not-so-hot weather. In the \"heavy items case\" he's number 6, and in the extreme combination he's not even in the top 10 anymore.\n", "\n", "So this works quite well." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So now we have our list. Once a decision has been made, it is easy to look up the details of the shop and the helper. For example,
here are the details for the top combination of the standard case:" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Customer Email hungrycustomer1942@bluewin.ch\n", "Neighborhood Latitude 47.4029\n", "Neighborhood Longitude 8.49958\n", "Venue Marcello's Bistro\n", "Venue Latitude 47.403\n", "Venue Longitude 8.49821\n", "Venue Category Café\n", "Name: 5, dtype: object" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "venues.iloc[standard_recommendation[0][0]]" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "address Kappenbühlweg 9, 8049 Zürich\n", "latitude 47.4036\n", "longitude 8.49863\n", "email zrogge@karz.com\n", "Name: 68, dtype: object" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "helperData.iloc[standard_recommendation[0][1]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### create a map of the distances for our top 3 recommendations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now wouldn't it be nice to see the top 3 recommendations visually on our folium map?\n", "\n", "Credit for the following two functions, which draw lines with arrows on folium maps, goes to [Bob Haffner](https://medium.com/@bobhaffner/folium-lines-with-arrows-25a0fe88e4e)." ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [], "source": [ "def get_arrows(locations, color='blue', size=6, n_arrows=3):\n", " \n", " '''\n", " Get a list of correctly placed and rotated \n", " arrows/markers to be plotted\n", " \n", " Parameters\n", " locations : list of lists of lat lons that represent the \n", " start and end of the line. \n", " eg [[41.1132, -96.1993],[41.3810, -95.8021]]\n", " color : default is 'blue'\n", " size : default is 6\n", " n_arrows : number of arrows to create. 
default is 3\n", " Return\n", " list of arrows/markers\n", " '''\n", " \n", " Point = namedtuple('Point', field_names=['lat', 'lon'])\n", " \n", " # creating point from our Point named tuple\n", " p1 = Point(locations[0][0], locations[0][1])\n", " p2 = Point(locations[1][0], locations[1][1])\n", " \n", " # getting the rotation needed for our marker. \n", " # Subtracting 90 to account for the marker's orientation\n", " # of due East(get_bearing returns North)\n", " rotation = get_bearing(p1, p2) - 90\n", " \n", " # get an evenly space list of lats and lons for our arrows\n", " # note that I'm discarding the first and last for aesthetics\n", " # as I'm using markers to denote the start and end\n", " arrow_lats = np.linspace(p1.lat, p2.lat, n_arrows + 2)[1:n_arrows+1]\n", " arrow_lons = np.linspace(p1.lon, p2.lon, n_arrows + 2)[1:n_arrows+1]\n", " \n", " arrows = []\n", " \n", " #creating each \"arrow\" and appending them to our arrows list\n", " for points in zip(arrow_lats, arrow_lons):\n", " arrows.append(folium.RegularPolygonMarker(location=points, \n", " fill_color=color, number_of_sides=3, \n", " radius=size, rotation=rotation))\n", " return arrows" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [], "source": [ "def get_bearing(p1, p2):\n", " \n", " '''\n", " Returns compass bearing from p1 to p2\n", " \n", " Parameters\n", " p1 : namedtuple with lat lon\n", " p2 : namedtuple with lat lon\n", " \n", " Return\n", " compass bearing of type float\n", " \n", " Notes\n", " Based on https://gist.github.com/jeromer/2005586\n", " '''\n", " \n", " long_diff = np.radians(p2.lon - p1.lon)\n", " \n", " lat1 = np.radians(p1.lat)\n", " lat2 = np.radians(p2.lat)\n", " \n", " x = np.sin(long_diff) * np.cos(lat2)\n", " y = (np.cos(lat1) * np.sin(lat2) \n", " - (np.sin(lat1) * np.cos(lat2) \n", " * np.cos(long_diff)))\n", " bearing = np.degrees(np.arctan2(x, y))\n", " \n", " # adjusting for compass bearing\n", " if bearing < 0:\n", " return bearing 
+ 360\n", " return bearing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Now we can draw the final map:**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "standard case:" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [], "source": [ "topHelper=helperData.iloc[standard_recommendation[0][1]]\n", "topVenue=venues.iloc[standard_recommendation[0][0]]\n", "secondHelper=helperData.iloc[standard_recommendation[1][1]]\n", "secondVenue=venues.iloc[standard_recommendation[1][0]]\n", "thirdHelper=helperData.iloc[standard_recommendation[2][1]]\n", "thirdVenue=venues.iloc[standard_recommendation[2][0]]\n", "\n", "p=[]\n", "\n", "p.append([topHelper['latitude'], topHelper['longitude'], topVenue['Venue Latitude'], topVenue['Venue Longitude'], 'green'])\n", "p.append([topVenue['Venue Latitude'], topVenue['Venue Longitude'], custLatitude, custLongitude, 'green'])\n", "p.append([custLatitude, custLongitude, topHelper['latitude'], topHelper['longitude'], 'green'])\n", "\n", "p.append([secondHelper['latitude'], secondHelper['longitude'], secondVenue['Venue Latitude'], secondVenue['Venue Longitude'], 'orange'])\n", "p.append([secondVenue['Venue Latitude'], secondVenue['Venue Longitude'], custLatitude, custLongitude, 'orange'])\n", "p.append([custLatitude, custLongitude, secondHelper['latitude'], secondHelper['longitude'], 'orange'])\n", "\n", "p.append([thirdHelper['latitude'], thirdHelper['longitude'], thirdVenue['Venue Latitude'], thirdVenue['Venue Longitude'], 'red'])\n", "p.append([thirdVenue['Venue Latitude'], thirdVenue['Venue Longitude'], custLatitude, custLongitude, 'red'])\n", "p.append([custLatitude, custLongitude, thirdHelper['latitude'], thirdHelper['longitude'], 'red'])" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# draw base map with position markers\n", "folium_map=createRawMap(custLatitude, custLongitude, custAddr, helperData, venues, zoomStart=17)\n", "\n", "# add lines and arrows\n", "for pos in p:\n", " p1=[pos[0], pos[1]]\n", " p2=[pos[2], pos[3]]\n", " colorLine=pos[4]\n", " folium.PolyLine(locations=[p1, p2], color=colorLine).add_to(folium_map)\n", "\n", " arrows = get_arrows(locations=[p1, p2], color=colorLine, n_arrows=1)\n", " for arrow in arrows:\n", " arrow.add_to(folium_map)\n", " \n", "display(folium_map)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a funny coincidence: two of the \"helpers\" live at \"Tertianum Residenz - Im Brühl\", a residence where people spend their retirement years.\n", "\n", "Probably not the best place to recruit helpers, since the residents themselves most likely belong to the risk group.\n", "\n", "In real life, of course, one would have to make reasonably sure that the helpers are up to the task.\n", "\n", "Since this PoC is only based on fake data extracted from OSINT sources, this does not matter yet. Either way, the example shows once again how crucial data cleaning is. Without it, you put a lot of work into a model that, in accordance with the \"garbage in, garbage out\" principle, ends up producing garbage.\n", "\n", "Then again, maybe it's the janitor and his 18-year-old son, both living on site and young and healthy?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### That was fun. Let's try to apply the code to a different customer.\n", "\n", "Once you have arrived here, you can easily:\n", "1. plug in a different customer address (custAddr_2) in the cell below\n", "2. select this cell\n", "3. jump to the [end of the notebook](#end), hold Shift and click the last cell to select everything in between, then...\n", "4. 
hit run" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [], "source": [ "custAddr_2 = 'Singlistrasse 11, 8049 Zurich' # <-- enter the address of your choice here, then run the rest of the notebook" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The geographical coordinates of Singlistrasse 11, 8049 Zurich are 47.403521, 8.4947084.\n" ] } ], "source": [ "# convert address to location\n", "location_2 = geolocator.geocode(custAddr_2)\n", "custLatitude_2 = location_2.latitude\n", "custLongitude_2 = location_2.longitude\n", "print('The geographical coordinates of {} are {}, {}.'.format(custAddr_2, custLatitude_2, custLongitude_2))" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Searching venues for hungrycustomer1942@bluewin.ch\n" ] }, { "data": { "text/plain": [ "(9, 7)" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "venues_2 = getNearbyVenues(names=['hungrycustomer1942@bluewin.ch'],\n", " latitudes=[custLatitude_2],\n", " longitudes=[custLongitude_2],\n", " radius=1000,\n", " categories='4d4b7105d754a06374d81259,4bf58dd8d48988d1f9941735,52f2ab2ebcbc57f1066b8b1c,50be8ee891d4fa8dcc7199a7,4bf58dd8d48988d10f951735,4bf58dd8d48988d1fd941735')\n", "venues_2.shape" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [], "source": [ "# initialize variables\n", "numH_2 = helperData.shape[0] # number of helpers\n", "numS_2 = venues_2.shape[0] # number of shops\n", "numC_2 = 1 # test with only 1 customer\n", "HtoS_2 = np.zeros([numS_2, numH_2]) # 2-dim matrix, helpers in columns, shops in rows\n", "StoC_2 = np.zeros([numS_2,1]) # column vector (numS_2 x 1), shops in rows\n", "CtoH_2 = np.zeros([1, numH_2]) # row vector (1 x numH_2), helpers in columns\n", "totDist_2 = np.zeros([numS_2, numH_2]) # 2-dim matrix, helpers in
columns, shops in rows" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [], "source": [ "# calculate distances from all helpers to all shops (in meters rounded to whole numbers)\n", "for hId, helper in helperData.iterrows():\n", " for sId, shop in venues_2.iterrows():\n", " tmpCord1 = (helper['latitude'], helper['longitude'])\n", " tmpCord2 = (shop['Venue Latitude'], shop['Venue Longitude'])\n", " HtoS_2[sId][hId]=round(geopy.distance.distance(tmpCord1, tmpCord2).m,0)" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [], "source": [ "# calculate distances from all shops to our customer (in meters rounded to whole numbers)\n", "tmpCord1 = (custLatitude_2, custLongitude_2)\n", "for sId, shop in venues_2.iterrows():\n", " tmpCord2 = (shop['Venue Latitude'], shop['Venue Longitude'])\n", " StoC_2[sId][0]=round(geopy.distance.distance(tmpCord1, tmpCord2).m,0)" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [], "source": [ "# calculate distances from our customer to all helpers (in meters rounded to whole numbers)\n", "tmpCord2 = (custLatitude_2, custLongitude_2)\n", "for hId, helper in helperData.iterrows():\n", " tmpCord1 = (helper['latitude'], helper['longitude'])\n", " CtoH_2[0][hId]=round(geopy.distance.distance(tmpCord1, tmpCord2).m,0) " ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [], "source": [ "# calculate total distance\n", "totDist_2=HtoS_2+StoC_2+CtoH_2" ] }, { "cell_type": "code", "execution_count": 88, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 / shop: Argentina Steakhouse / helper: anneli05@aol.de\n", "2 / shop: Maharani / helper: kuschkarl-august@holzapfel.de\n", "3 / shop: Argentina Steakhouse / helper: olena23@web.de\n", "4 / shop: Osteria da Biagio / helper: kuschkarl-august@holzapfel.de\n", "5 / shop: Argentina Steakhouse / helper: iris38@beyer.de\n", "6 / shop: Osteria 
da Biagio / helper: metin34@aol.de\n", "7 / shop: Osteria da Biagio / helper: ehoerle@gmail.com\n", "8 / shop: Osteria da Biagio / helper: josef95@boerner.com\n", "9 / shop: Osteria da Biagio / helper: ditschlerinmehdi@googlemail.com\n", "10 / shop: Osteria da Biagio / helper: evelyneloeffler@paertzelt.de\n" ] } ], "source": [ "standard_score2 = normPerfectToZero(calcScores(HtoS_2, StoC_2, CtoH_2, totDist_2, standard_weights, hTrolley))\n", "standard_recommendation2=findKMinFromNp(standard_score2,10)\n", "printRecommendation(standard_recommendation2, venues_2)" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 / shop: Argentina Steakhouse / helper: anneli05@aol.de\n", "2 / shop: Argentina Steakhouse / helper: olena23@web.de\n", "3 / shop: Argentina Steakhouse / helper: iris38@beyer.de\n", "4 / shop: Osteria da Biagio / helper: kuschkarl-august@holzapfel.de\n", "5 / shop: Osteria da Biagio / helper: metin34@aol.de\n", "6 / shop: Osteria da Biagio / helper: ehoerle@gmail.com\n", "7 / shop: Osteria da Biagio / helper: josef95@boerner.com\n", "8 / shop: Osteria da Biagio / helper: ditschlerinmehdi@googlemail.com\n", "9 / shop: Osteria da Biagio / helper: evelyneloeffler@paertzelt.de\n", "10 / shop: Osteria da Biagio / helper: walburgajacob@gmx.de\n" ] } ], "source": [ "hot_heavy_frigo_score2 = normPerfectToZero(calcScores(HtoS_2, StoC_2, CtoH_2, totDist_2, hot_heavy_frigo_weights, hTrolley))\n", "hot_heavy_frigo_recommendation2=findKMinFromNp(hot_heavy_frigo_score2,10)\n", "printRecommendation(hot_heavy_frigo_recommendation2, venues_2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To display the distances on the map, we need to extract the positions of our top 3 helper shop combinations:" ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [], "source": [ "topHelper=helperData.iloc[standard_recommendation2[0][1]]\n", 
"topVenue=venues_2.iloc[standard_recommendation2[0][0]]\n", "secondHelper=helperData.iloc[standard_recommendation2[1][1]]\n", "secondVenue=venues_2.iloc[standard_recommendation2[1][0]]\n", "thirdHelper=helperData.iloc[standard_recommendation2[2][1]]\n", "thirdVenue=venues_2.iloc[standard_recommendation2[2][0]]\n", "custLat=custLatitude_2\n", "custLong=custLongitude_2\n", "\n", "p=[]\n", "\n", "p.append([topHelper['latitude'], topHelper['longitude'], topVenue['Venue Latitude'], topVenue['Venue Longitude'], 'green'])\n", "p.append([topVenue['Venue Latitude'], topVenue['Venue Longitude'], custLat, custLong, 'green'])\n", "p.append([custLat, custLong, topHelper['latitude'], topHelper['longitude'], 'green'])\n", "\n", "p.append([secondHelper['latitude'], secondHelper['longitude'], secondVenue['Venue Latitude'], secondVenue['Venue Longitude'], 'orange'])\n", "p.append([secondVenue['Venue Latitude'], secondVenue['Venue Longitude'], custLat, custLong, 'orange'])\n", "p.append([custLat, custLong, secondHelper['latitude'], secondHelper['longitude'], 'orange'])\n", "\n", "p.append([thirdHelper['latitude'], thirdHelper['longitude'], thirdVenue['Venue Latitude'], thirdVenue['Venue Longitude'], 'red'])\n", "p.append([thirdVenue['Venue Latitude'], thirdVenue['Venue Longitude'], custLat, custLong, 'red'])\n", "p.append([custLat, custLong, thirdHelper['latitude'], thirdHelper['longitude'], 'red'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Now we can draw the final map:**" ] }, { "cell_type": "code", "execution_count": 91, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# draw base map with position markers\n", "folium_map=createRawMap(custLatitude_2, custLongitude_2, custAddr_2, helperData, venues_2, zoomStart=16)\n", "\n", "# add lines and arrows\n", "for pos in p:\n", " p1=[pos[0], pos[1]]\n", " p2=[pos[2], pos[3]]\n", " colorLine=pos[4]\n", " folium.PolyLine(locations=[p1, p2], color=colorLine).add_to(folium_map)\n", "\n", " arrows = get_arrows(locations=[p1, p2], color=colorLine, n_arrows=1)\n", " for arrow in arrows:\n", " arrow.add_to(folium_map)\n", " \n", "display(folium_map)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Very nice!\n", "\n", "If you want to try a different customer address, [simply jump here and have some fun](#tldr2).\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **For the discussion and conclusions, [have a look at the full report here](https://github.com/Funisher-code/Coursera_Capstone/blob/master/report/COVID-19_Safe_And_Efficient_Food_Deliveries.md).**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "### And now?\n", "\n", "The goal was to build a simple PoC of a recommender system.\n", "\n", "Goal achieved.\n", "\n", "But of course, there is still plenty of room for improvement!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**I hope you enjoyed following the code as much as I enjoyed piecing it together.**\n", "\n", "If you have suggestions or remarks, you can reach me [here](https://www.linkedin.com/in/markus-maechler/).\n", "\n", "----" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }