{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Two-layer Territory Management. Optimizing territories considering clients and sales reps\n",
"\n",
"In territory management, a territory is a customer group or geographic area over which either an individual salesperson or a sales team has responsibility. These territories are usually defined based on geography, sales potential, number of clients or a combination of these factors.\n",
"\n",
"The main complexity in territory management is to create areas that are balanced with regards to more than one factor that usually behave very differently. There is no one-size-fits-all solution, and if the balance is off, sales management is likely to leave someone within their organization unhappy or leave money on the table. This is why it is very important to identify and understand all the components and requirements of your use case to apply the most appropriate technique.\n",
"\n",
"We can differentiate between two main use cases: when the location of sales reps is important (usually because they have to travel to visit their clients) and when it is not (travel rarely occurs). The first case is clearly more complex than the latter.\n",
"\n",
"In this notebook we will work on a Territory Management problem in which clusters need to be balanced in terms of number of clients and distance to their sales rep. This way, we will have two layers of data: one consisting of the client locations and the other one with the sales rep locations. We will prove the value Spatial Data Science techniques by showing their additional value compared to traditional techniques."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Use case description\n",
"\n",
"A pharma lab is interested in balancing their sales territories in the state of Texas based on the number of current and potential clients per territory. They would also like these clusters to be close to sales reps locations since they usually have to travel to visit them.\n",
"\n",
"Their clients are mainly offices and clinics of medical doctors.\n",
"\n",
"They are interested in creating 5 balanced territories taking into account number of clients and distance traveled.\n",
"\n",
"We will use the following three datasets from [CARTO's Data Observatory](https://carto.com/spatial-data-catalog/):\n",
"- Points of Interest (POIs). In particular, office and clinic of medical doctors POIs. We will use [Pitney Bowes POI-Consumer dataset](https://carto.com/spatial-data-catalog/browser/dataset/pb_consumer_po_62cddc04/).\n",
"- Texas boundary geometry. We'll use [Who's on First GeoJSON - Global dataset](https://carto.com/spatial-data-catalog/browser/geography/wof_geojson_4e78587c/).\n",
"- Sales rep locations. This dataset is provided by the client.\n",
"\n",
"*Note* the POI dataset is premium and a subscription is needed to access this data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 0. Setup\n",
"\n",
"We'll start by importing all packages we'll use."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import geopandas as gpd\n",
"import matplotlib.pyplot as plt\n",
"import networkx as nx\n",
"import numpy as np\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"\n",
"from cartoframes.auth import set_default_credentials\n",
"from cartoframes.data.observatory import *\n",
"from cartoframes.viz import *\n",
"from h3 import h3\n",
"from libpysal.weights import Rook\n",
"from ortools.graph import pywrapgraph\n",
"from scipy.spatial.distance import cdist\n",
"from shapely import wkt\n",
"from shapely.geometry import mapping, Polygon\n",
"from sklearn.cluster import KMeans\n",
"from splot.libpysal import plot_spatial_weights\n",
"\n",
"pd.set_option('display.max_columns', None)\n",
"plt.rc('axes', titlesize='large')\n",
"plt.rc('xtick', labelsize='large')\n",
"plt.rc('ytick', labelsize='large')\n",
"sns.set_style('whitegrid')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to be able to use the Data Observatory via CARTOframes, you need to set your CARTO account credentials first.\n",
"\n",
"Please, visit the [Authentication guide](https://carto.com/developers/cartoframes/guides/Authentication/) for further detail."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"set_default_credentials('creds.json')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 0.1. Functions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following function creates an [H3](https://eng.uber.com/h3/) polyfill of the polygon and at the resolution indicated."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def create_h3_grid(polygon, resolution=8):\n",
" hex_id_list = list(h3.polyfill(geojson = mapping(polygon), res = resolution, geo_json_conformant=True))\n",
" hexagon_list = list(map(lambda x : Polygon(h3.h3_to_geo_boundary(h=x, geo_json=True)), hex_id_list))\n",
" grid = pd.DataFrame(data={'hex_id':hex_id_list, 'geometry':hexagon_list})\n",
" grid = gpd.GeoDataFrame(grid, crs='epsg:4326')\n",
" return grid"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The function below is used throughout the analysis to check is clusters are balanced based on different metrics.\n",
"\n",
"The function arguments are:\n",
"- `cluster_names` so that we can provide descriptive names to clusters\n",
"- `areas_df` is the GeoDataFrame\n",
"- `groupby` is the column with the cluster to which each cell belongs to\n",
"- `**kaggregations` for the different metrics we'd like to calculate"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"def plot_cluster_comparison(cluster_names, areas_df, groupby, **kaggregations):\n",
" areas_df_g = areas_df.groupby(groupby).agg(kaggregations).reset_index()\n",
"\n",
" n_plots = len(kaggregations)\n",
" fig, axs = plt.subplots(1, n_plots, figsize=(9 + 3*n_plots,4))\n",
" if n_plots == 1:\n",
" axs = [axs]\n",
" \n",
" for i in range(n_plots):\n",
" sns.barplot(y=groupby, x=list(kaggregations.keys())[i], data=areas_df_g, order=cluster_names, \n",
" palette=['#7F3C8D','#11A579','#3969AC','#F2B701','#E73F74'], ax=axs[i])\n",
" axs[i].set_xlabel(list(kaggregations.keys())[i], fontsize=13)\n",
" axs[i].set_ylabel('Sales rep locations', fontsize=13)\n",
" \n",
" fig.tight_layout()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Download and visualize data\n",
"\n",
"Next, we will download the data described in the usecase using [CARTOframes](https://carto.com/developers/cartoframes/).\n",
"\n",
"*Note* in this notebook some prior knowledge on how to explore and download data from the [Data Observatory](https://carto.com/spatial-data-catalog/) is assumed. If this is your first time exploring and downloading data from the [Data Observatory](https://carto.com/spatial-data-catalog/), take a look at [CARTOframes Guides](https://carto.com/developers/cartoframes/guides/Introduction/) and the [Data Observatory examples](https://carto.com/developers/cartoframes/guides/Data-Observatory/) and **discover how easy it is to get started!**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1.1 Texas boundary geometry\n",
"\n",
"We are interested in the geometry of the state of Texas. We'll download it from [Who's on First GeoJSON - Global dataset](https://carto.com/spatial-data-catalog/browser/geography/wof_geojson_4e78587c/)."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'slug': 'wof_geojson_4e78587c',\n",
" 'name': 'GeoJSON - Global',\n",
" 'description': \"The main table in Who's On First. Holds all the relevant information for a place in the 'body' JSON field.\",\n",
" 'country_id': 'glo',\n",
" 'provider_id': 'whos_on_first',\n",
" 'geom_type': 'MULTIPLE',\n",
" 'update_frequency': None,\n",
" 'is_public_data': True,\n",
" 'lang': 'eng',\n",
" 'version': '20190520',\n",
" 'provider_name': \"Who's On First\",\n",
" 'id': 'carto-do-public-data.whos_on_first.geography_glo_geojson_20190520'}"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wof_grographies = Geography.get('wof_geojson_4e78587c')\n",
"wof_grographies.to_dict()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
"
],
"text/plain": [
" geoid id body \\\n",
"0 85688753 85688753 {\"id\": 85688753, \"type\": \"Feature\", \"propertie... \n",
"\n",
" name country parent_id is_current placetype geometry_type \\\n",
"0 Texas US 85633793 1 region Polygon \n",
"\n",
" bbox \\\n",
"0 POLYGON((-93.508039 25.837164, -93.508039 36.5... \n",
"\n",
" geom lastmodified \\\n",
"0 POLYGON ((-103.06466 32.95910, -103.06442 32.0... 1555446728 \n",
"\n",
" lastmodified_timestamp \n",
"0 2019-04-16 20:32:08+00:00 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"state_name = 'Texas'\n",
"country_code = 'US'\n",
"placetype = 'region'\n",
"\n",
"sql_query = f\"\"\"SELECT * \n",
" FROM $geography$ \n",
" WHERE name = '{state_name}' AND \n",
" country = '{country_code}' AND \n",
" placetype='{placetype}'\"\"\"\n",
"\n",
"tx_boundary = wof_grographies.to_dataframe(sql_query=sql_query)\n",
"tx_boundary.crs='epsg:4326'\n",
"tx_boundary['geom'] = tx_boundary.simplify(0.01)\n",
"tx_boundary"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1.2. Client locations\n",
"\n",
"We'll download all POIs in Texas classified as \"OFFICES AND CLINICS OF MEDICAL DOCTORS\" from [Pitney Bowes POI-Consumer dataset](https://carto.com/spatial-data-catalog/browser/dataset/pb_consumer_po_62cddc04/).\n",
"\n",
"*Note* this is a premium dataset and a subscription is required."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"poi_dataset = Dataset.get('pb_consumer_po_62cddc04')"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
geoid
\n",
"
do_date
\n",
"
name
\n",
"
brandname
\n",
"
pb_id
\n",
"
trade_name
\n",
"
franchise_name
\n",
"
iso3
\n",
"
areaname4
\n",
"
areaname3
\n",
"
areaname2
\n",
"
areaname1
\n",
"
stabb
\n",
"
postcode
\n",
"
formattedaddress
\n",
"
mainaddressline
\n",
"
addresslastline
\n",
"
longitude
\n",
"
latitude
\n",
"
georesult
\n",
"
confidence_code
\n",
"
country_access_code
\n",
"
tel_num
\n",
"
faxnum
\n",
"
email
\n",
"
http
\n",
"
open_24h
\n",
"
business_line
\n",
"
sic1
\n",
"
sic2
\n",
"
sic8
\n",
"
sic8_description
\n",
"
alt_industry_code
\n",
"
micode
\n",
"
trade_division
\n",
"
group
\n",
"
class
\n",
"
sub_class
\n",
"
employee_here
\n",
"
employee_count
\n",
"
year_start
\n",
"
sales_volume_local
\n",
"
sales_volume_us_dollars
\n",
"
currency_code
\n",
"
agent_code
\n",
"
legal_status_code
\n",
"
status_code
\n",
"
subsidiary_indicator
\n",
"
parent_business_name
\n",
"
parent_address
\n",
"
parent_street_address
\n",
"
parent_areaname3
\n",
"
parent_areaname1
\n",
"
parent_country
\n",
"
parent_postcode
\n",
"
domestic_ultimate_business_name
\n",
"
domestic_ultimate_address
\n",
"
domestic_ultimate_street_address
\n",
"
domestic_ultimate_areaname3
\n",
"
domestic_ultimate_areaname1
\n",
"
domestic_ultimate_postcode
\n",
"
global_ultimate_indicator
\n",
"
global_ultimate_business_name
\n",
"
global_ultimate_address
\n",
"
global_ultimate_street_address
\n",
"
global_ultimate_areaname3
\n",
"
global_ultimate_areaname1
\n",
"
global_ultimate_country
\n",
"
global_ultimate_postcode
\n",
"
family_members
\n",
"
hierarchy_code
\n",
"
ticker_symbol
\n",
"
exchange_name
\n",
"
geom
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
1128750296#-96.398536#32.469934
\n",
"
2020-08-01
\n",
"
SIMMONS & ASSOC SOUTH CENTRAL LLC
\n",
"
NaN
\n",
"
1128750296
\n",
"
SIMMONS & ASSOCIATES
\n",
"
NaN
\n",
"
USA
\n",
"
NaN
\n",
"
SCURRY
\n",
"
NaN
\n",
"
TEXAS
\n",
"
TX
\n",
"
75158-3304
\n",
"
9084 FM 2451, SCURRY, TX, 75158-3304
\n",
"
9084 FM 2451
\n",
"
SCURRY, TX, 75158-3304
\n",
"
-96.398536
\n",
"
32.469934
\n",
"
S8HPNTSCZA
\n",
"
HIGH
\n",
"
1.0
\n",
"
(972) 452-8013
\n",
"
NaN
\n",
"
NaN
\n",
"
WWW.SIMMONSINC.COM
\n",
"
NaN
\n",
"
OFFICES AND CLINICS MEDICAL DOCTORS,NSK
\n",
"
8011.0
\n",
"
NaN
\n",
"
80110000
\n",
"
OFFICES AND CLINICS OF MEDICAL DOCTORS
\n",
"
621111.0
\n",
"
10238011
\n",
"
DIVISION I. - SERVICES
\n",
"
HEALTH SERVICES
\n",
"
OFFICES AND CLINICS OF DOCTORS OF MEDICINE
\n",
"
OFFICES AND CLINICS OF MEDICAL DOCTORS
\n",
"
1.0
\n",
"
1.0
\n",
"
1997.0
\n",
"
251657.0
\n",
"
251657.0
\n",
"
20.0
\n",
"
G
\n",
"
3.0
\n",
"
0.0
\n",
"
0.0
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
N
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
0.0
\n",
"
0.0
\n",
"
NaN
\n",
"
NaN
\n",
"
POINT (-96.39854 32.46993)
\n",
"
\n",
"
\n",
"
1
\n",
"
1217171653#-96.858002#32.715481
\n",
"
2020-08-01
\n",
"
SOUTHWEST DALLAS ORTHOPEDIC ASSOCIATES
\n",
"
NaN
\n",
"
1217171653
\n",
"
NaN
\n",
"
NaN
\n",
"
USA
\n",
"
NaN
\n",
"
DALLAS
\n",
"
NaN
\n",
"
TEXAS
\n",
"
TX
\n",
"
75224-3059
\n",
"
2909 S HAMPTON RD STE D121, DALLAS, TX, 75224-...
\n",
"
2909 S HAMPTON RD STE D121
\n",
"
DALLAS, TX, 75224-3059
\n",
"
-96.858002
\n",
"
32.715481
\n",
"
S8HPNTSCZA
\n",
"
HIGH
\n",
"
1.0
\n",
"
(214) 333-3741
\n",
"
NaN
\n",
"
NaN
\n",
"
WWW.DALLASORTHO.COM
\n",
"
NaN
\n",
"
OFFICES AND CLINICS MEDICAL DOCTORS,NSK
\n",
"
8011.0
\n",
"
NaN
\n",
"
80110514
\n",
"
ORTHOPEDIC PHYSICIAN
\n",
"
621111.0
\n",
"
10942514
\n",
"
DIVISION I. - SERVICES
\n",
"
HEALTH SERVICES
\n",
"
OFFICES AND CLINICS OF DOCTORS OF MEDICINE
\n",
"
OFFICES AND CLINICS OF MEDICAL DOCTORS
\n",
"
6.0
\n",
"
6.0
\n",
"
1991.0
\n",
"
620608.0
\n",
"
620608.0
\n",
"
20.0
\n",
"
G
\n",
"
13.0
\n",
"
0.0
\n",
"
0.0
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
N
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
0.0
\n",
"
0.0
\n",
"
NaN
\n",
"
NaN
\n",
"
POINT (-96.85800 32.71548)
\n",
"
\n",
"
\n",
"
2
\n",
"
1123005494#-97.104542#32.926825
\n",
"
2020-08-01
\n",
"
S ROBERT HARLA DOPA
\n",
"
NaN
\n",
"
1123005494
\n",
"
GRAPEVINE DERMATOLOGY
\n",
"
NaN
\n",
"
USA
\n",
"
NaN
\n",
"
GRAPEVINE
\n",
"
NaN
\n",
"
TEXAS
\n",
"
TX
\n",
"
76051-8632
\n",
"
2321 IRA E WOODS AVE STE 180, GRAPEVINE, TX, 7...
\n",
"
2321 IRA E WOODS AVE STE 180
\n",
"
GRAPEVINE, TX, 76051-8632
\n",
"
-97.104542
\n",
"
32.926825
\n",
"
S8HPNTSCZA
\n",
"
HIGH
\n",
"
1.0
\n",
"
(817) 329-2263
\n",
"
NaN
\n",
"
NaN
\n",
"
WWW.DERMDFW.COM
\n",
"
NaN
\n",
"
OFFICES AND CLINICS MEDICAL DOCTORS,NSK
\n",
"
8011.0
\n",
"
NaN
\n",
"
80110503
\n",
"
DERMATOLOGIST
\n",
"
621111.0
\n",
"
10942503
\n",
"
DIVISION I. - SERVICES
\n",
"
HEALTH SERVICES
\n",
"
OFFICES AND CLINICS OF DOCTORS OF MEDICINE
\n",
"
OFFICES AND CLINICS OF MEDICAL DOCTORS
\n",
"
13.0
\n",
"
13.0
\n",
"
1990.0
\n",
"
1345253.0
\n",
"
1345253.0
\n",
"
20.0
\n",
"
G
\n",
"
13.0
\n",
"
0.0
\n",
"
0.0
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
N
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
0.0
\n",
"
0.0
\n",
"
NaN
\n",
"
NaN
\n",
"
POINT (-97.10454 32.92683)
\n",
"
\n",
"
\n",
"
3
\n",
"
1221252299#-101.902112#33.573453
\n",
"
2020-08-01
\n",
"
CONSULTANTS IN INFECTIOUS DISEASES LLP
\n",
"
NaN
\n",
"
1221252299
\n",
"
NaN
\n",
"
NaN
\n",
"
USA
\n",
"
NaN
\n",
"
LUBBOCK
\n",
"
NaN
\n",
"
TEXAS
\n",
"
TX
\n",
"
79410-1804
\n",
"
4102 24TH ST STE 403, LUBBOCK, TX, 79410-1804
\n",
"
4102 24TH ST STE 403
\n",
"
LUBBOCK, TX, 79410-1804
\n",
"
-101.902112
\n",
"
33.573453
\n",
"
S8HPNTSCZA
\n",
"
HIGH
\n",
"
1.0
\n",
"
(806) 725-7150
\n",
"
NaN
\n",
"
NaN
\n",
"
WWW.COVMEDGROUP.ORG
\n",
"
NaN
\n",
"
OFFICES AND CLINICS MEDICAL DOCTORS,NSK
\n",
"
8011.0
\n",
"
NaN
\n",
"
80110510
\n",
"
INFECTIOUS DISEASE SPECIALIST, PHYSICIAN/SURGEON
\n",
"
621111.0
\n",
"
10942510
\n",
"
DIVISION I. - SERVICES
\n",
"
HEALTH SERVICES
\n",
"
OFFICES AND CLINICS OF DOCTORS OF MEDICINE
\n",
"
OFFICES AND CLINICS OF MEDICAL DOCTORS
\n",
"
22.0
\n",
"
22.0
\n",
"
1995.0
\n",
"
1058845.0
\n",
"
1058845.0
\n",
"
20.0
\n",
"
G
\n",
"
12.0
\n",
"
0.0
\n",
"
0.0
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
N
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
0.0
\n",
"
0.0
\n",
"
NaN
\n",
"
NaN
\n",
"
POINT (-101.90211 33.57345)
\n",
"
\n",
"
\n",
"
4
\n",
"
1217934291#-95.419502#29.170283
\n",
"
2020-08-01
\n",
"
SUZAN CARPENTER
\n",
"
NaN
\n",
"
1217934291
\n",
"
CARPENTER, SU ZAN MD
\n",
"
NaN
\n",
"
USA
\n",
"
NaN
\n",
"
ANGLETON
\n",
"
NaN
\n",
"
TEXAS
\n",
"
TX
\n",
"
77515-5836
\n",
"
1113 E CEDAR ST, ANGLETON, TX, 77515-5836
\n",
"
1113 E CEDAR ST
\n",
"
ANGLETON, TX, 77515-5836
\n",
"
-95.419502
\n",
"
29.170283
\n",
"
S8HPNTSCZA
\n",
"
HIGH
\n",
"
1.0
\n",
"
(979) 849-5703
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
OFFICES AND CLINICS MEDICAL DOCTORS,NSK
\n",
"
8011.0
\n",
"
NaN
\n",
"
80119901
\n",
"
GENERAL AND FAMILY PRACTICE, PHYSICIAN/SURGEON
\n",
"
621111.0
\n",
"
10230302
\n",
"
DIVISION I. - SERVICES
\n",
"
HEALTH SERVICES
\n",
"
OFFICES AND CLINICS OF DOCTORS OF MEDICINE
\n",
"
OFFICES AND CLINICS OF MEDICAL DOCTORS
\n",
"
3.0
\n",
"
3.0
\n",
"
1990.0
\n",
"
257133.0
\n",
"
257133.0
\n",
"
20.0
\n",
"
G
\n",
"
13.0
\n",
"
0.0
\n",
"
0.0
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
N
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
0.0
\n",
"
0.0
\n",
"
NaN
\n",
"
NaN
\n",
"
POINT (-95.41950 29.17028)
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" geoid do_date \\\n",
"0 1128750296#-96.398536#32.469934 2020-08-01 \n",
"1 1217171653#-96.858002#32.715481 2020-08-01 \n",
"2 1123005494#-97.104542#32.926825 2020-08-01 \n",
"3 1221252299#-101.902112#33.573453 2020-08-01 \n",
"4 1217934291#-95.419502#29.170283 2020-08-01 \n",
"\n",
" name brandname pb_id \\\n",
"0 SIMMONS & ASSOC SOUTH CENTRAL LLC NaN 1128750296 \n",
"1 SOUTHWEST DALLAS ORTHOPEDIC ASSOCIATES NaN 1217171653 \n",
"2 S ROBERT HARLA DOPA NaN 1123005494 \n",
"3 CONSULTANTS IN INFECTIOUS DISEASES LLP NaN 1221252299 \n",
"4 SUZAN CARPENTER NaN 1217934291 \n",
"\n",
" trade_name franchise_name iso3 areaname4 areaname3 areaname2 \\\n",
"0 SIMMONS & ASSOCIATES NaN USA NaN SCURRY NaN \n",
"1 NaN NaN USA NaN DALLAS NaN \n",
"2 GRAPEVINE DERMATOLOGY NaN USA NaN GRAPEVINE NaN \n",
"3 NaN NaN USA NaN LUBBOCK NaN \n",
"4 CARPENTER, SU ZAN MD NaN USA NaN ANGLETON NaN \n",
"\n",
" areaname1 stabb postcode \\\n",
"0 TEXAS TX 75158-3304 \n",
"1 TEXAS TX 75224-3059 \n",
"2 TEXAS TX 76051-8632 \n",
"3 TEXAS TX 79410-1804 \n",
"4 TEXAS TX 77515-5836 \n",
"\n",
" formattedaddress \\\n",
"0 9084 FM 2451, SCURRY, TX, 75158-3304 \n",
"1 2909 S HAMPTON RD STE D121, DALLAS, TX, 75224-... \n",
"2 2321 IRA E WOODS AVE STE 180, GRAPEVINE, TX, 7... \n",
"3 4102 24TH ST STE 403, LUBBOCK, TX, 79410-1804 \n",
"4 1113 E CEDAR ST, ANGLETON, TX, 77515-5836 \n",
"\n",
" mainaddressline addresslastline longitude \\\n",
"0 9084 FM 2451 SCURRY, TX, 75158-3304 -96.398536 \n",
"1 2909 S HAMPTON RD STE D121 DALLAS, TX, 75224-3059 -96.858002 \n",
"2 2321 IRA E WOODS AVE STE 180 GRAPEVINE, TX, 76051-8632 -97.104542 \n",
"3 4102 24TH ST STE 403 LUBBOCK, TX, 79410-1804 -101.902112 \n",
"4 1113 E CEDAR ST ANGLETON, TX, 77515-5836 -95.419502 \n",
"\n",
" latitude georesult confidence_code country_access_code tel_num \\\n",
"0 32.469934 S8HPNTSCZA HIGH 1.0 (972) 452-8013 \n",
"1 32.715481 S8HPNTSCZA HIGH 1.0 (214) 333-3741 \n",
"2 32.926825 S8HPNTSCZA HIGH 1.0 (817) 329-2263 \n",
"3 33.573453 S8HPNTSCZA HIGH 1.0 (806) 725-7150 \n",
"4 29.170283 S8HPNTSCZA HIGH 1.0 (979) 849-5703 \n",
"\n",
" faxnum email http open_24h \\\n",
"0 NaN NaN WWW.SIMMONSINC.COM NaN \n",
"1 NaN NaN WWW.DALLASORTHO.COM NaN \n",
"2 NaN NaN WWW.DERMDFW.COM NaN \n",
"3 NaN NaN WWW.COVMEDGROUP.ORG NaN \n",
"4 NaN NaN NaN NaN \n",
"\n",
" business_line sic1 sic2 sic8 \\\n",
"0 OFFICES AND CLINICS MEDICAL DOCTORS,NSK 8011.0 NaN 80110000 \n",
"1 OFFICES AND CLINICS MEDICAL DOCTORS,NSK 8011.0 NaN 80110514 \n",
"2 OFFICES AND CLINICS MEDICAL DOCTORS,NSK 8011.0 NaN 80110503 \n",
"3 OFFICES AND CLINICS MEDICAL DOCTORS,NSK 8011.0 NaN 80110510 \n",
"4 OFFICES AND CLINICS MEDICAL DOCTORS,NSK 8011.0 NaN 80119901 \n",
"\n",
" sic8_description alt_industry_code \\\n",
"0 OFFICES AND CLINICS OF MEDICAL DOCTORS 621111.0 \n",
"1 ORTHOPEDIC PHYSICIAN 621111.0 \n",
"2 DERMATOLOGIST 621111.0 \n",
"3 INFECTIOUS DISEASE SPECIALIST, PHYSICIAN/SURGEON 621111.0 \n",
"4 GENERAL AND FAMILY PRACTICE, PHYSICIAN/SURGEON 621111.0 \n",
"\n",
" micode trade_division group \\\n",
"0 10238011 DIVISION I. - SERVICES HEALTH SERVICES \n",
"1 10942514 DIVISION I. - SERVICES HEALTH SERVICES \n",
"2 10942503 DIVISION I. - SERVICES HEALTH SERVICES \n",
"3 10942510 DIVISION I. - SERVICES HEALTH SERVICES \n",
"4 10230302 DIVISION I. - SERVICES HEALTH SERVICES \n",
"\n",
" class \\\n",
"0 OFFICES AND CLINICS OF DOCTORS OF MEDICINE \n",
"1 OFFICES AND CLINICS OF DOCTORS OF MEDICINE \n",
"2 OFFICES AND CLINICS OF DOCTORS OF MEDICINE \n",
"3 OFFICES AND CLINICS OF DOCTORS OF MEDICINE \n",
"4 OFFICES AND CLINICS OF DOCTORS OF MEDICINE \n",
"\n",
" sub_class employee_here employee_count \\\n",
"0 OFFICES AND CLINICS OF MEDICAL DOCTORS 1.0 1.0 \n",
"1 OFFICES AND CLINICS OF MEDICAL DOCTORS 6.0 6.0 \n",
"2 OFFICES AND CLINICS OF MEDICAL DOCTORS 13.0 13.0 \n",
"3 OFFICES AND CLINICS OF MEDICAL DOCTORS 22.0 22.0 \n",
"4 OFFICES AND CLINICS OF MEDICAL DOCTORS 3.0 3.0 \n",
"\n",
" year_start sales_volume_local sales_volume_us_dollars currency_code \\\n",
"0 1997.0 251657.0 251657.0 20.0 \n",
"1 1991.0 620608.0 620608.0 20.0 \n",
"2 1990.0 1345253.0 1345253.0 20.0 \n",
"3 1995.0 1058845.0 1058845.0 20.0 \n",
"4 1990.0 257133.0 257133.0 20.0 \n",
"\n",
" agent_code legal_status_code status_code subsidiary_indicator \\\n",
"0 G 3.0 0.0 0.0 \n",
"1 G 13.0 0.0 0.0 \n",
"2 G 13.0 0.0 0.0 \n",
"3 G 12.0 0.0 0.0 \n",
"4 G 13.0 0.0 0.0 \n",
"\n",
" parent_business_name parent_address parent_street_address parent_areaname3 \\\n",
"0 NaN NaN NaN NaN \n",
"1 NaN NaN NaN NaN \n",
"2 NaN NaN NaN NaN \n",
"3 NaN NaN NaN NaN \n",
"4 NaN NaN NaN NaN \n",
"\n",
" parent_areaname1 parent_country parent_postcode \\\n",
"0 NaN NaN NaN \n",
"1 NaN NaN NaN \n",
"2 NaN NaN NaN \n",
"3 NaN NaN NaN \n",
"4 NaN NaN NaN \n",
"\n",
" domestic_ultimate_business_name domestic_ultimate_address \\\n",
"0 NaN NaN \n",
"1 NaN NaN \n",
"2 NaN NaN \n",
"3 NaN NaN \n",
"4 NaN NaN \n",
"\n",
" domestic_ultimate_street_address domestic_ultimate_areaname3 \\\n",
"0 NaN NaN \n",
"1 NaN NaN \n",
"2 NaN NaN \n",
"3 NaN NaN \n",
"4 NaN NaN \n",
"\n",
" domestic_ultimate_areaname1 domestic_ultimate_postcode \\\n",
"0 NaN NaN \n",
"1 NaN NaN \n",
"2 NaN NaN \n",
"3 NaN NaN \n",
"4 NaN NaN \n",
"\n",
" global_ultimate_indicator global_ultimate_business_name \\\n",
"0 N NaN \n",
"1 N NaN \n",
"2 N NaN \n",
"3 N NaN \n",
"4 N NaN \n",
"\n",
" global_ultimate_address global_ultimate_street_address \\\n",
"0 NaN NaN \n",
"1 NaN NaN \n",
"2 NaN NaN \n",
"3 NaN NaN \n",
"4 NaN NaN \n",
"\n",
" global_ultimate_areaname3 global_ultimate_areaname1 global_ultimate_country \\\n",
"0 NaN NaN NaN \n",
"1 NaN NaN NaN \n",
"2 NaN NaN NaN \n",
"3 NaN NaN NaN \n",
"4 NaN NaN NaN \n",
"\n",
" global_ultimate_postcode family_members hierarchy_code ticker_symbol \\\n",
"0 NaN 0.0 0.0 NaN \n",
"1 NaN 0.0 0.0 NaN \n",
"2 NaN 0.0 0.0 NaN \n",
"3 NaN 0.0 0.0 NaN \n",
"4 NaN 0.0 0.0 NaN \n",
"\n",
" exchange_name geom \n",
"0 NaN POINT (-96.39854 32.46993) \n",
"1 NaN POINT (-96.85800 32.71548) \n",
"2 NaN POINT (-97.10454 32.92683) \n",
"3 NaN POINT (-101.90211 33.57345) \n",
"4 NaN POINT (-95.41950 29.17028) "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sql_query = \"\"\"\n",
" SELECT * except(do_label) FROM $dataset$ \n",
" WHERE SUB_CLASS = 'OFFICES AND CLINICS OF MEDICAL DOCTORS' \n",
" AND STABB = 'TX'\n",
" AND CAST(do_date AS date) >= (SELECT MAX(CAST(do_date AS date)) from $dataset$)\n",
"\"\"\"\n",
"pois = poi_dataset.to_dataframe(sql_query=sql_query)\n",
"pois.columns = list(map(str.lower, pois.columns))\n",
"pois.crs='epsg:4326'\n",
"pois.head()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(59554, 74)"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pois.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1.3. Sales reps"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
id
\n",
"
city
\n",
"
state_name
\n",
"
lat
\n",
"
lon
\n",
"
geometry
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
0
\n",
"
Dallas
\n",
"
Texas
\n",
"
32.725728
\n",
"
-96.844694
\n",
"
POINT (-96.84469 32.72573)
\n",
"
\n",
"
\n",
"
1
\n",
"
1
\n",
"
Austin
\n",
"
Texas
\n",
"
30.255450
\n",
"
-97.729611
\n",
"
POINT (-97.72961 30.25545)
\n",
"
\n",
"
\n",
"
2
\n",
"
2
\n",
"
San Antonio
\n",
"
Texas
\n",
"
29.424510
\n",
"
-98.524053
\n",
"
POINT (-98.52405 29.42451)
\n",
"
\n",
"
\n",
"
3
\n",
"
3
\n",
"
Houston
\n",
"
Texas
\n",
"
29.778360
\n",
"
-95.332154
\n",
"
POINT (-95.33215 29.77836)
\n",
"
\n",
"
\n",
"
4
\n",
"
4
\n",
"
El Paso
\n",
"
Texas
\n",
"
31.795306
\n",
"
-106.456893
\n",
"
POINT (-106.45689 31.79531)
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id city state_name lat lon \\\n",
"0 0 Dallas Texas 32.725728 -96.844694 \n",
"1 1 Austin Texas 30.255450 -97.729611 \n",
"2 2 San Antonio Texas 29.424510 -98.524053 \n",
"3 3 Houston Texas 29.778360 -95.332154 \n",
"4 4 El Paso Texas 31.795306 -106.456893 \n",
"\n",
" geometry \n",
"0 POINT (-96.84469 32.72573) \n",
"1 POINT (-97.72961 30.25545) \n",
"2 POINT (-98.52405 29.42451) \n",
"3 POINT (-95.33215 29.77836) \n",
"4 POINT (-106.45689 31.79531) "
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sales_reps = pd.read_csv('data/sales_reps.csv')\n",
"sales_reps['geometry'] = gpd.points_from_xy(sales_reps['lon'], sales_reps['lat'])\n",
"sales_reps = gpd.GeoDataFrame(sales_reps, crs='epsg:4326')\n",
"sales_reps"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1.4. Visualize data"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Map([Layer(tx_boundary,\n",
" style=basic_style(opacity=0, stroke_color='#11A579', stroke_width=5),\n",
" legends=basic_legend('Texas Boundary')),\n",
" Layer(pois.sample(5000), \n",
" style=basic_style(color='#F2B701', size=2, opacity=0.9, stroke_width=0),\n",
" popup_hover=[popup_element('name', 'Client'),\n",
" popup_element('employee_here', '# Employees')],\n",
" legends=basic_legend('Client Locations')),\n",
" Layer(sales_reps,\n",
" style=basic_style(color='red', size=5, stroke_width=0),\n",
" popup_hover=popup_element('city', 'City'),\n",
" legends=basic_legend('Sales Rep Locations'))], \n",
" basemap=basemaps.darkmatter)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Discretize space. H3 grid\n",
"\n",
"A fundamental step in territory management is to discretize space. Territory management algorithms are computationally complex and hence it is crucial to leverage the spatial component to reduce complexity. We can do this by working at an aggregated level instead of considering each client location independently.\n",
"\n",
"We first need to identify the smallest spatial aggregation that makes sense for your business, our **geographic support**. This can be census block groups, zip codes or counties, or you can be interested in using a standard grid, in which case it would ideally be a hierarchical spatial index such as [Quadkey grid](https://docs.microsoft.com/en-us/azure/azure-maps/zoom-levels-and-tile-grid?tabs=csharp) or [H3 grid](https://eng.uber.com/h3/).\n",
"\n",
"In this notebook we will use an H3 grid of resolution 5. We can easily discretize space by performing a polyfill of the Texas boundary polygon. \n",
"\n",
"*Note* a buffer has been applied because H3 will fill the polygon with all hexagons of resolution 5 whose centroid lies within the polygon to be filled and we want to make sure the whole territory is covered."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
"
],
"text/plain": [
" hex_id geometry \\\n",
"0 8526d9d7fffffff POLYGON ((-99.33557 32.87875, -99.26773 32.953... \n",
"1 8526d9a3fffffff POLYGON ((-98.71570 32.47235, -98.64788 32.546... \n",
"2 8526dc1bfffffff POLYGON ((-100.32151 34.39136, -100.25314 34.4... \n",
"3 85446c77fffffff POLYGON ((-94.98183 29.87978, -94.91454 29.952... \n",
"4 85444db7fffffff POLYGON ((-95.10421 32.87010, -95.03517 32.941... \n",
"\n",
" poi_count employee_avg \n",
"0 0 0.000000 \n",
"1 0 0.000000 \n",
"2 0 0.000000 \n",
"3 13 3.833333 \n",
"4 1 3.000000 "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pois_g = gpd.sjoin(pois, grid, how='right').groupby('hex_id').agg({'geoid':'count', 'employee_here':'mean'}).\\\n",
" reset_index().rename(columns={'geoid':'poi_count', 'employee_here':'employee_avg'})\n",
"pois_g[['poi_count', 'employee_avg']] = pois_g[['poi_count', 'employee_avg']].fillna(0)\n",
"areas = grid.merge(pois_g, on='hex_id')\n",
"areas = gpd.GeoDataFrame(areas, crs='epsg:4326')\n",
"areas.head()"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"count 2584.000000\n",
"mean 23.047214\n",
"std 151.594386\n",
"min 0.000000\n",
"50% 0.000000\n",
"75% 1.000000\n",
"90% 14.000000\n",
"97.5% 226.000000\n",
"99% 571.870000\n",
"max 3471.000000\n",
"Name: poi_count, dtype: float64"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"areas['poi_count'].describe(percentiles=[0.75, 0.9, 0.975, 0.99])"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"breaks = [15, 200, 550]\n",
"\n",
"Map([Layer(areas[areas['poi_count'] > 0], \n",
" style=color_bins_style('poi_count', breaks=breaks),\n",
" legends=color_bins_legend('Number of Clients')),\n",
" Layer(sales_reps, \n",
" style=basic_style('#F2B701'), \n",
" popup_hover=popup_element('city'),\n",
" legends=basic_legend('Sales Reps'))])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. Territory Optimization\n",
"\n",
"Once we have our data aggregated, it's time to start working on building balanced territories.\n",
"\n",
"It is important to note that clients mainly concentrate around sales rep locations, and there are large areas with a very low density of clients. In addition, the sales rep in El Paso is very far from Eastern Texas, where most clients concentrate. This means it's going to be very challenging to balance distance and number of clients, especially for the sales rep in El Paso.\n",
"\n",
"We will explore two different techniques:\n",
"- Closest distance.\n",
"- [Minimum cost flow approach](https://en.wikipedia.org/wiki/Minimum-cost_flow_problem#:~:text=The%20minimum%2Dcost%20flow%20problem,flow%20through%20a%20flow%20network.). "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 3.1. Data preparation\n",
"\n",
"For both techniques we need to calculate sales rep - client distance matrix.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Distance metric**\n",
"\n",
"*Note* different distance metrics can be used for calculating distance. Results change significantly wdepending on this metric.\n",
" - `metric`: Distance metric. [Here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html) you can find all available metrics."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"metric = 'cityblock' # 'euclidean'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Representative point**\n",
"\n",
"In order to calculate distances, we need a reference location for each cell. Since we are using regular polygons, we can use the centroid. However, selecting the centroid is discouraged with non-convex geometries as they may have their centroid out of the polygon. If that's the case, the Shapely function `representative_point()` returns a point within the polygon."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plot_cluster_comparison(sales_reps['city'].tolist(), areas[areas['poi_count']>0], 'sales_rep_1_cat', sales_rep_dist_1='max')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 3.3. Approach 2. Minimum Cost Flow\n",
"\n",
"[Minimum cost flow](https://en.wikipedia.org/wiki/Minimum-cost_flow_problem#:~:text=The%20minimum%2Dcost%20flow%20problem,flow%20through%20a%20flow%20network.) is a well known type of problems in Optimization that finds the cheapest possible way of sending a certain amount of flow through a flow network. Initially it is not the approach to follow for spatial clustering problems. However, since we can represent our problem as a graph and interpret sales reps as source nodes and grid cells as sink nodes, we can use algorithms designed for Minimum Cost Flow problems for solving Two-Layer Territory Management use cases.\n",
"\n",
"We will use [ortools pywrapgraph wrapper](https://developers.google.com/optimization/flow/mincostflow) as solver."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 3.3.1. Data preparation\n",
"\n",
"We need to perform some extra data processing for in order tu use this algorithm."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Balancing criteria**\n",
"\n",
"We would like to balance clusters based on total number of clients and maximum distance. Normally we are not looking for a perfect balance, especially when dealing with more than one multiple criteria, and a balance tolerance is introduced. In our case, we will consider a tolerance of 10% in number of clients, which means that we allow clusters to be as much as 10% below the perfect balance.\n",
"\n",
"*Note* if you get very unbalanced clusters in terms of maximum distance, you should set a higher tolerance.\n",
"\n",
"*Note* we are only using the number of clients and the maximum distance to balance, but this dataset also has the number of employees per client and you might even have other data you might be interested in using. In order to introduce other criteria in this algorithm, you need to create a multidimensional index to combine all criteria. Distance may or may not be included as it is always treated separately."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The maximum number of clients a sales rep can have assigned is 13102\n"
]
}
],
"source": [
"balance_tolerance = 0.1 # 10%\n",
"max_pois= int(np.round((1+balance_tolerance) * areas['poi_count'].sum()/sales_reps.shape[0]))\n",
"print('The maximum number of clients a sales rep can have assigned is', max_pois)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Pairwise distance**\n",
"\n",
"First, we need pairwise distances between cells. We will use the euclidean distance in this case."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"catesian_dist = cdist(gpd.GeoSeries(areas['rep_point'], crs='epsg:4326').to_crs('epsg:26914').apply(lambda point:[point.x, point.y]).tolist(),\n",
" gpd.GeoSeries(areas['rep_point'], crs='epsg:4326').to_crs('epsg:26914').apply(lambda point:[point.x, point.y]).tolist())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Consider all cells**\n",
"\n",
"Even though we only care about cells with clients, we will consider all of them to reduce complexity. This way, we only need to consider contiguous cells as adjacent, otherwise we would need to define some k-nearest weights to make sure we can get from one cell to any other.\n",
"\n",
"In order for the algorithm to not disregard cells without clients, we artifically add 1 to all cells."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"areas['poi_count'] = areas['poi_count']+1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Graph component data**\n",
"\n",
"Next, we need to generate all information required by the algorithm, namely:\n",
"- Adjacency matrix\n",
"- Edges, unit cost, capacity, and supply/demand"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Adjacency matrix*\n",
"\n",
"We will use [Rook weights](https://pysal.org/libpysal/generated/libpysal.weights.Rook.html) which considers two polygons to be contiguous if they share one edge."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"wgt = Rook.from_dataframe(areas, geom_col='geometry')"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(2, 5), (3, 65), (4, 123), (5, 69), (6, 2322)]"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wgt.histogram"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Edges, unit cost, capacity, and supply/demand*\n",
"\n",
"Here we create all graph components required by the algorithm."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
start_node
\n",
"
end_node
\n",
"
capacity
\n",
"
unit_cost
\n",
"
\n",
" \n",
" \n",
"
\n",
"
14974
\n",
"
2584
\n",
"
2589
\n",
"
13102
\n",
"
20
\n",
"
\n",
"
\n",
"
14975
\n",
"
2584
\n",
"
255
\n",
"
13102
\n",
"
0
\n",
"
\n",
"
\n",
"
14976
\n",
"
2585
\n",
"
2589
\n",
"
13102
\n",
"
20
\n",
"
\n",
"
\n",
"
14977
\n",
"
2585
\n",
"
1850
\n",
"
13102
\n",
"
0
\n",
"
\n",
"
\n",
"
14978
\n",
"
2586
\n",
"
2589
\n",
"
13102
\n",
"
20
\n",
"
\n",
"
\n",
"
14979
\n",
"
2586
\n",
"
1165
\n",
"
13102
\n",
"
0
\n",
"
\n",
"
\n",
"
14980
\n",
"
2587
\n",
"
2589
\n",
"
13102
\n",
"
20
\n",
"
\n",
"
\n",
"
14981
\n",
"
2587
\n",
"
2235
\n",
"
13102
\n",
"
0
\n",
"
\n",
"
\n",
"
14982
\n",
"
2588
\n",
"
2589
\n",
"
13102
\n",
"
20
\n",
"
\n",
"
\n",
"
14983
\n",
"
2588
\n",
"
547
\n",
"
13102
\n",
"
0
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" start_node end_node capacity unit_cost\n",
"14974 2584 2589 13102 20\n",
"14975 2584 255 13102 0\n",
"14976 2585 2589 13102 20\n",
"14977 2585 1850 13102 0\n",
"14978 2586 2589 13102 20\n",
"14979 2586 1165 13102 0\n",
"14980 2587 2589 13102 20\n",
"14981 2587 2235 13102 0\n",
"14982 2588 2589 13102 20\n",
"14983 2588 547 13102 0"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Starting idx for sales reps and default sink\n",
"sales_reps_first_idx = areas.shape[0]\n",
"default_sink_idx = sales_reps_first_idx + sales_reps.shape[0]\n",
"\n",
"# Default sink demand\n",
"default_sink_demand = int(max_pois*sales_reps.shape[0]-areas['poi_count'].sum())\n",
"\n",
"# Edges, unit cost, and capacity for grid cell connections\n",
"graph_info_nd_array = [[i, k, max_pois, int(np.round(catesian_dist[i][k]/1e3))] for i in range(len(wgt.cardinalities)) for k in wgt[i]]\n",
"\n",
"# Default sink unit cost\n",
"default_sink_ucost = np.transpose(graph_info_nd_array)[-1].max()+1\n",
"\n",
"# Edges, unit cost, and capacity for sales rep - cell connections and default sink\n",
"for idx, row in gpd.sjoin(sales_reps, areas).iterrows():\n",
" # default sink\n",
" graph_info_nd_array.append([sales_reps_first_idx+idx, default_sink_idx, max_pois, default_sink_ucost])\n",
" # same cell connection\n",
" cell_id = row['index_right']\n",
" graph_info_nd_array.append([sales_reps_first_idx+idx, cell_id, max_pois, 0])\n",
" \n",
"# DataFrame with edges, unit cost, and capacity\n",
"graph_info = pd.DataFrame(graph_info_nd_array, columns=['start_node', 'end_node', 'capacity', 'unit_cost'])\n",
"graph_info.tail(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 3.3.2. Calculate clusters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We first build the graph with all its information."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"# Instantiate a SimpleMinCostFlow solver.\n",
"min_cost_flow = pywrapgraph.SimpleMinCostFlow()\n",
"\n",
"# Add edges, unit costs, and capacities\n",
"for idx, row in graph_info.iterrows():\n",
" min_cost_flow.AddArcWithCapacityAndUnitCost(row['start_node'].item(), row['end_node'].item(), row['capacity'].item(), row['unit_cost'].item())\n",
" \n",
"# Add supply/demand\n",
"for idx, row in areas.iterrows():\n",
" min_cost_flow.SetNodeSupply(idx, -row['poi_count'])\n",
" \n",
"for idx, row in sales_reps.iterrows():\n",
" min_cost_flow.SetNodeSupply(sales_reps_first_idx+idx, max_pois)\n",
" \n",
"min_cost_flow.SetNodeSupply(default_sink_idx, -default_sink_demand)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We're ready to solve now."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Minimum cost: 11903441\n"
]
}
],
"source": [
"if min_cost_flow.Solve() == min_cost_flow.OPTIMAL:\n",
" print('Minimum cost:', min_cost_flow.OptimalCost())\n",
" '''print('')\n",
" print(' Arc Flow / Capacity Cost')\n",
" for i in range(min_cost_flow.NumArcs()):\n",
" cost = min_cost_flow.Flow(i) * min_cost_flow.UnitCost(i)\n",
" print('%1s -> %1s %3s / %3s %3s' % (\n",
" min_cost_flow.Tail(i),\n",
" min_cost_flow.Head(i),\n",
" min_cost_flow.Flow(i),\n",
" min_cost_flow.Capacity(i),\n",
" cost))'''\n",
"else:\n",
" print('There was an issue with the min cost flow input.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 3.3.3. Label grid cells\n",
"\n",
"As a solution of the Minimum Cost Flow problem, we have a graph with 5 components, one per cluster. In order to infer the cluster to which each cell belongs, we need to recursively explore the graph.\n",
"\n",
"Let's first build the solution graph."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"SG = nx.DiGraph()\n",
"\n",
"nodes = [(i, {'color':'blue'}) for i in range(areas.shape[0])]\n",
"nodes += [(i+sales_reps_first_idx, {'color':'red'}) for i in range(sales_reps.shape[0])]\n",
"SG.add_nodes_from(nodes)\n",
"\n",
"edges = []\n",
"total = 0\n",
"for i in range(min_cost_flow.NumArcs()):\n",
" if min_cost_flow.Flow(i) > 0.1 and min_cost_flow.Head(i) != default_sink_idx: \n",
" edges.append((min_cost_flow.Tail(i), min_cost_flow.Head(i)))\n",
" total +=1\n",
" \n",
"SG.add_edges_from(edges)\n",
"\n",
"'''pos = areas['rep_point'].apply(lambda point:(point.x, point.y)).tolist() +\\\n",
" sales_reps.apply(lambda row: (row['lon'], row['lat']), axis=1).tolist()\n",
"pos = dict(zip(list(range(len(nodes))), pos))\n",
"\n",
"plt.figure(figsize=(18, 18))\n",
"\n",
"node_color = ['blue']*areas.shape[0] + ['red']*sales_reps.shape[0]\n",
"nx.draw(SG, node_size=5, pos=pos,width=0.1, edge_color='grey', node_color=node_color)\n",
"#nx.draw(SG, node_size=5,width=0.1, edge_color='grey', node_color=node_color)'''\n",
"print('')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's now calculate each cell's cluster."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"def tag_successors(successors, cluster, sol_dict):\n",
" for s in successors:\n",
" \n",
" if sol_dict.get(s) is None:\n",
" sol_dict[s] = cluster\n",
" elif sol_dict[s] != cluster:\n",
" # Border cells can be assign to more than one cluster.\n",
" #print(f'Watch out! Node {s} belongs to clusters {sol_dict.get(s)} and {cluster}')\n",
" pass\n",
" else:\n",
" # Already tagged path\n",
" continue\n",
" \n",
" if len(list(SG.successors(s))) == 0:\n",
" pass\n",
" else:\n",
" tag_successors(SG.successors(s), cluster, sol_dict)\n",
"\n",
"sol_dict = dict()\n",
"for cluster in range(sales_reps.shape[0]):\n",
" successors = list(SG.successors(sales_reps_first_idx+cluster))\n",
" tag_successors(successors, sales_reps_first_idx+cluster, sol_dict) "
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"assert(len(sol_dict)==areas.shape[0])"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
hex_id
\n",
"
geometry
\n",
"
poi_count
\n",
"
employee_avg
\n",
"
rep_point
\n",
"
sales_rep_1
\n",
"
sales_rep_1_cat
\n",
"
sales_rep_dist_1
\n",
"
sales_rep_2
\n",
"
sales_rep_2_cat
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
8526d9d7fffffff
\n",
"
POLYGON ((-99.33557 32.87875, -99.26773 32.953...
\n",
"
1
\n",
"
0.000000
\n",
"
POINT (-99.37959 32.96652)
\n",
"
0
\n",
"
Dallas
\n",
"
262156.516612
\n",
"
4
\n",
"
El Paso
\n",
"
\n",
"
\n",
"
1
\n",
"
8526d9a3fffffff
\n",
"
POLYGON ((-98.71570 32.47235, -98.64788 32.546...
\n",
"
1
\n",
"
0.000000
\n",
"
POINT (-98.75897 32.56035)
\n",
"
0
\n",
"
Dallas
\n",
"
199715.113991
\n",
"
4
\n",
"
El Paso
\n",
"
\n",
"
\n",
"
2
\n",
"
8526dc1bfffffff
\n",
"
POLYGON ((-100.32151 34.39136, -100.25314 34.4...
\n",
"
1
\n",
"
0.000000
\n",
"
POINT (-100.36707 34.47834)
\n",
"
0
\n",
"
Dallas
\n",
"
520625.827092
\n",
"
4
\n",
"
El Paso
\n",
"
\n",
"
\n",
"
3
\n",
"
85446c77fffffff
\n",
"
POLYGON ((-94.98183 29.87978, -94.91454 29.952...
\n",
"
14
\n",
"
3.833333
\n",
"
POINT (-95.02049 29.96866)
\n",
"
3
\n",
"
Houston
\n",
"
51541.627399
\n",
"
3
\n",
"
Houston
\n",
"
\n",
"
\n",
"
4
\n",
"
85444db7fffffff
\n",
"
POLYGON ((-95.10421 32.87010, -95.03517 32.941...
\n",
"
2
\n",
"
3.000000
\n",
"
POINT (-95.14402 32.95831)
\n",
"
0
\n",
"
Dallas
\n",
"
188844.476467
\n",
"
0
\n",
"
Dallas
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" hex_id geometry \\\n",
"0 8526d9d7fffffff POLYGON ((-99.33557 32.87875, -99.26773 32.953... \n",
"1 8526d9a3fffffff POLYGON ((-98.71570 32.47235, -98.64788 32.546... \n",
"2 8526dc1bfffffff POLYGON ((-100.32151 34.39136, -100.25314 34.4... \n",
"3 85446c77fffffff POLYGON ((-94.98183 29.87978, -94.91454 29.952... \n",
"4 85444db7fffffff POLYGON ((-95.10421 32.87010, -95.03517 32.941... \n",
"\n",
" poi_count employee_avg rep_point sales_rep_1 \\\n",
"0 1 0.000000 POINT (-99.37959 32.96652) 0 \n",
"1 1 0.000000 POINT (-98.75897 32.56035) 0 \n",
"2 1 0.000000 POINT (-100.36707 34.47834) 0 \n",
"3 14 3.833333 POINT (-95.02049 29.96866) 3 \n",
"4 2 3.000000 POINT (-95.14402 32.95831) 0 \n",
"\n",
" sales_rep_1_cat sales_rep_dist_1 sales_rep_2 sales_rep_2_cat \n",
"0 Dallas 262156.516612 4 El Paso \n",
"1 Dallas 199715.113991 4 El Paso \n",
"2 Dallas 520625.827092 4 El Paso \n",
"3 Houston 51541.627399 3 Houston \n",
"4 Dallas 188844.476467 0 Dallas "
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"areas['sales_rep_2'] = list({k: v for k, v in sorted(sol_dict.items(), key=lambda item:item[0])}.values())\n",
"areas['sales_rep_2'] -= sales_reps_first_idx\n",
"areas['sales_rep_2_cat'] = sales_reps.loc[areas['sales_rep_2'], 'city'].tolist()\n",
"areas.head()"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"areas['sales_rep_dist_2'] = areas.apply(lambda row : dist_array[row.name, row.sales_rep_2], axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"areas['poi_count'] = areas['poi_count']-1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 3.3.4. Visualize and analyze results\n",
"\n",
"We can see how the number of clients per cluster is more balanced now with the exception of El Paso. This issue was already identified at the beginning of the analysis. In order to double the number of clients for El Paso, we went from a maximum distance of 840 km to more than 1150 km, doubling the max distance of the cluster with the second highest max distnace.\n",
"\n",
"Next step would be to play with the balance tolerance to give a higher priority to distance or to the number of clients."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following layout map compares the clusters generated by both techniques."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Layout([Map([Layer(areas[areas['poi_count'] > 0],\n",
" style=color_category_style('sales_rep_1_cat', cat=sales_reps['city'].tolist()), \n",
" legends=color_category_legend('Closest Distance', 'Sales Reps'), encode_data=False),\n",
" Layer(sales_reps, \n",
" style=basic_style(color='black'))]),\n",
" Map([Layer(areas[areas['poi_count'] > 1],\n",
" style=color_category_style('sales_rep_2_cat', cat=sales_reps['city'].tolist()), \n",
" legends=color_category_legend('Min Cost Flow', 'Sales Reps'), encode_data=False),\n",
" Layer(sales_reps, \n",
" style=basic_style(color='black'))])\n",
" ], map_height=400)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following two charts show the number of clients and maximum distance per cluster, respectively."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA1gAAAEYCAYAAABBWFftAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3deZgkVZnv8W81vbApCoIIKI0DvDQNCBYgjI4CXnZFYERQkAGHYVcZR0QYRAQBZVOQTQVBBrkyyOoCCLIpCGoBsli+sjWLgDabXpbe8/4R0VqU3V3Z3REVnZXfz/P005kRGZFvntMk9atz4kRPq9VCkiRJkrTwRjVdgCRJkiSNFAYsSZIkSaqIAUuSJEmSKmLAkiRJkqSKGLAkSZIkqSKjmy6gm911112tJZZYoukyutrUqVMZN25c02V0Ldu/efZBs2z/5tkHzbL9m2cfLLhXXnnl2d7e3uUHbzdgNainp4cJEyY0XUZX6+/vtw8aZPs3zz5olu3fPPugWbZ/8+yDBdfX1/fYnLY7RVCSJEmSKmLAkiRJkqSKGLAkSZIkqSI9rVar6Rq61gMPPNCaOHFi02VIkiRJw2r6tBmMGdvZy0H09fX19fb2bjh4e2d/qg43atQoDt/6y02XIUmSJA2rE647sukSauMUQUmSJEmqiAFLkiRJkipiwJIkSZKkihiwJEmSJKkiBixJkiRJqogBS5IkSZIqYsCSJEmSpIoYsCRJkiSpIgYsSZIkSaqIAUuSJEmSKmLAkiRJkqSKGLAkSZIkqSIGLEmSJEmqiAFLkiRJkioyuukC6hYR44GHgfvKTaOA6cBpmXnhEMe2gOWBDwAfzswP1FiqJEmSpA434gNW6dXMXH/2k4hYFfhZRLycmZc1WJckSZKkEaRbAtZrZOZjEXEUcGhE3AecCSwNrATcA+yamVPmdGxEbAKcCIwD3gJcn5n/HhGjgW8A7wGmAY8Ae2fmS7V/IEmSJEmLhG6+Buu3wLrAfwDfzcxNgdWB1YDt53Hcp4GjMvNdwNrADhHRC2wKbAasl5m9FAFrvfrKlyRJkrSo6coRrFILeAU4DNgyIj4HrEkxirX0PI77N2C7iDgCWAtYsnz9b4GZwJ0RcR1wWWb+qsb6JUmSJC1iunkEayOKhS/+L7Av8BjwNeAuoGcex/0c2A74PXAM8CTQk5kvAu8APksRtC6JiP+srXpJkiRJi5yuDFgRsSbwBeAUYGvgmMy8hGJU613AYnM57o3AhsBhmXk5sDLFtMLFIuIDwM+A2zPzaOBCisAlSZIkqUt0yxTBJSLinvLxLGAKcHhm/ric6ndFRDxPMWXwForQ9A8y84WIOAG4KyKeA54Fbitffy6wLXB/RLwEvEBxfZckSZKkLtHTarWarqFr9ff3ty48xFXiJUmS1F1OuO7IpktYaH19fX29vb0bDt7elVMEJUmSJKkOBixJkiRJqogBS5IkSZIqYsCSJEmSpIoYsCRJkiSpIgYsSZIkSaqIAUuSJEmSKmLAkiRJkqSKGLAkSZIkqSIGLEmSJEmqiAFLkiRJkipiwJIkSZKkioxuuoBuNmvWLE647simy5AkSZKG1fRpMxgzdmRGEUewGjRt2rSmS+h6/f39TZfQ1Wz/5tkHzbL9m2cfNMv2b15TfTBSwxUYsCRJkiSpMgYsSZIkSaqIAUuSJEmSKmLAkiRJkqSKGLAkSZIkqSIGLEmSJEmqiAFLkiRJkipiwJIkSZKkivS0Wq2ma+haDzzwQGvixIlNlyFJkiQt8qbOmM640WOaLuNv+vr6+np7ezccvH3k3kK5A4waNYrVz/ivpsuQJEmSFnkPHXxK0yW0xSmCkiRJklQRA5YkSZIkVcSAJUmSJEkVMWBJkiRJUkUMWJIkSZJUEQOWJEmSJFXEgCVJkiRJFTFgSZIkSVJFDFiSJEmSVBEDliRJkiRVZHS7L4yILTLzxohYATgaeA44PjNfras4SZIkSeokbY1gRcRXgfPLp98CJgCbAGfVVJckSZIkdZx2R7B2BjaNiGWA7YE1gcnAY3UVJkmSJEmdpt1rsJbLzKeALYFHM/NRYDrQU1tlFYuIMRHxVERcu5DnWS0iLisfrxQRt1dToSRJkqRO1+4I1t0R8TXgfcAVEbEscCJwR22VVW8n4F6gNyImZGb/Ap5nVSAAytD5zxXVJ0mSJKnDtRuw9gKOA35DscDFesCbgX1rqaoeBwLfBx4CDgH2i4jNgDMycx2Agc8jYi3gPGBxipG6c4Fvln+vHBHXAfsB92fm0hFxNDAeeAtFCJsM7FqGMEmSJEldoK2AlZlPAHsO2HQn8MFaKqpBRKxNsSjHzkAfcEtEHDHEYYcCP8zMr0TEisDXgXOAfShC2NYRMX7QMf8CbJCZf42IqykC2Bcr/CiSJEmSFmFtBayIeBtwOLAGg67byswtaqiragcAP87M54HnI+JRivAzr+unrgAujIiNgRuAT2XmrIiY1/vcnJl/LR/fDSy78KVLkiRJ6hTtThG8kGKa3BUUi1t0jIhYimL0bUpETCo3vx44CLiN1y7UMXb2g8z8UUSsQbGwx/uBL0bEUNdbDbwnWIsOWgREkiRJ0sJrN2C9E1g5M/9fncXUZHfgWWDNzJwJEBFvoFhi/j3A28qbJ08Gdpx9UERcDPwiM88qVw3cDHgrMAMYM6yfQJIkSVJHaHeZ9oeBN9ZZSI0OAE6dHa4AMvNF4HSKQPVNisU77gCeHnDcscDuEfFbimvOrgBuAR4AZkbEr3CESpIkSdIAPa1Wa8gXRcSxwG7ARRQjPX+TmWfVU9rI19/f3/rgz85tugxJkiRpkffQwac0XcJr9PX19fX29m44eHu7UwTfAzxJMU1uoBZgwJIkSZIk2l+mffO6C5EkSZKkTtfuCBYRsS/FDYdXAf4EXJSZp9VUlyRJkiR1nLYWuYiIQ4HPARcA/wF8B/hURHy+vtIkSZIkqbO0O4K1H7B9ZubsDRFxE3A98JU6CpMkSZKkTtPuMu3LUizVPtAjwFLVliNJkiRJnavdgHUbcGxEjAIo/z4G+GVdhUmSJElSp2l3iuAhFNMB942IP1IsdPE08MG6CpMkSZKkTtPuMu0PR0QA/wKsADwO/CozZ9RZnCRJkiR1knlOEYyIfcq/D6RYPXAtiuux1qcYzTqw9golSZIkqUMMNYK1M3AusMtc9reAsyqtSJIkSZI61DwDVmZuVz7cOTNfGLw/It5eS1WSJEmS1IHaXUXwscEbImI0cHe15UiSJElS55rrCFZEjAduL1+zdET8edBLFgd+V19pI9+sWbN46OBTmi5DkiRJWuRNnTGdcaPHNF3GkOY6gpWZk4DtgI8AUyiuwxr4Zytgi/pLHLmmTZvWdAldr7+/v+kSuprt3zz7oFm2f/Psg2bZ/s3rpD7ohHAFQ1+DdQ9ARKyamZMH759942FJkiRJUvs3Gn5jRJwMrMzfR73GAGsAK9ZRmCRJkiR1mnZHoL4DLEOx2EULuAlYDTizprokSZIkqeO0G7A2AD4KnAKQmccCHwY+VFNdkiRJktRx2g1Yz1EsdPEwMBEgM+8A/qmmuiRJkiSp47R7DdbdwPHAl4CnImIn4BXg5boKkyRJkqRO027AOgT4NrAccBhwGcV9sPavqS5JkiRJ6jhtTRHMzEeBLYGnM/N6YHXgTZn5nTqLG+nGjh3bdAldb8KECU2X0NVs/+bZB82y/ZvXKX0wbfqMpkuQ1Ka2RrAi4p3A5cCuwJ3AfwG7RMT2mdk5dydbxIwaNYptDj6/6TIkSdIi7toz9m66BEltaneRi7OAEzPzToDMPAw4GfhmXYVJkiRJUqdpN2CtnZlnDdp2NrBexfVIkiRJUsdqN2A9HhFbDtq2GcWNhyVJkiRJtL+K4BeBqyLiGuBJYBWKRS92q6swSZIkSeo07a4ieBmwMXAfMJbivlgbZuZPaqxNkiRJkjpKuyNYAA8Ct1OMXv0JeKKWiiRJkiSpQ7W7TPuawDUUo1dPAm8DWhGxpcu0S5IkSVKh3UUuTge+C7wtMzelGMX6NvCNugqTJEmSpE7TbsDaCDg+M1sA5d/Hl9slSZIkSbQfsF4EYtC2ACZXW44kSZIkda52F7k4HbgmIr5Gce+r8cAhwCk11SVJkiRJHaetgJWZp0XEq8AewArA48Dhmfm9OouTJEmSpE7S9jLtmfkt4Fs11iJJkiRJHW2eASsifg205vWazNy40ookSZIkqUMNNYJ1RhVvEhGbACcAy1EsrPEE8NnMfKCK85fvMYbi+rB7M3ObNo/ZCPj3zNx/Id73HmCzzHxxQc8hSZIkaWSYZ8DKzO8u7BtExDjgR8BWmXlXuW0PikUzVsvMmQv7HqWdgHuB3oiY0OYNkCdS3NNrgWXm+gtzvCRJkqSRo+1rsBbCksAbgKUHbPse8FdgsYhoAV8DNgFeB/QA+2TmbRFxQfm6dYG3Ar8HdsvMl+bwPgcC3wceoljhcD+AiNgMOA54BFgHGAccVL7uGGCZiDg/M/eOiH2BTwEzgT8BB2fmH+ZVR1n/8pn5bER8AfgoMAP4Q3n8MwvRdpIkSZI6SLv3wVpgmfkC8Dng2oh4JCL+B9gbuCEzpwHvAlYCNs3MtYHvAp8fcIpeYBtgQvm6XQa/R0SsTRHQ/rc8/uMRsdyAl7wLOCUzNwDOA47OzCeAo4Cfl+Fqi7LOzTPzHcDFwJUR0dNOHRGxN7AtsFFmrgfcD1wwv+0lSZIkqXPVHrAAMvNU4M0Uo0NPA4cBd0fEMpn5S+BIYL+IOBn4MK8d7bo2M6dm5nTgPmDZObzFAcCPM/P5zPw18CjlCFbpscy8p3x811zOsQ1wSWZOLmu+AFiZ4p5f7dSxLXB+Zr5cPj8NeH9EjJ1bu0iSJEkaWdqeIhgRE4CPACtSLFLx/cx8pI3j3g38c2aeRHEt1o8i4giKkLJleX+t0yhuWnwVxfS7PQac4tUBj1sUUwgHnn8pYE9gSkRMKje/HjgoIk5q5xylOYXNHmBMm+cYfPwoivad03tJkiRJGoHaGsGKiF0pRn7Wp7i+aBPgtxGxdRuHTwaOjIj3DNj2FmApypAF/DAzzwZ+DewILNb2J4DdgWeBlTJzfGaOB95OMQr2kSGOncHfA9R1wK4RsTz8bcrfcxTXarXjOmDvMvBBMVp3a2ZObfeDSJIkSeps7U4RPA7YITN3ysxPZuYOwK4Uo07zlJl/oAhNx5fXYP2O4lqpfTMzgXOA90XEvcAvgYeB1SKi3doOAE4duBphuWT66RSLXczLL4G1IuKKzLyeYrGNGyPiAeDfgA9k5qw26zgPuAH4VUT0A++kCH+SJEmSukRPqzXP+wgDEBF/BZYrrz+avW008ExmvqnG+ka0/v7+1n+eeUfTZUiSpEXctWfs3XQJtejv72fChAlNl9HV7IMF19fX19fb27vh4O3tjhL9D/CV2Qs2lCvrHQpcUl2JkiRJktTZ2l3k4n3A2sA+EfEkxYqAywIvRsTflivPzBWqL1GSJEmSOkO7AeugWquQJEmSpBGgrYCVmbcARMTqwKrALcBSmfmXGmuTJEmSpI7S7jLtb46ImymWVb8KWB2YFBGb1libJEmSJHWUdhe5OIviHlXLANMz8/fAURTLmkuSJEmSaD9gvRf478ycBsxe1/1MYK1aqpIkSZKkDtRuwHoBePugbasBf662HEmSJEnqXO0GrK8B10bEp4ExEbEXcCVwRl2FSZIkSVKnaStgZebZFDcW3gZ4HNgDODEzT6+xNkmSJEnqKG0t0x4RZwKHZealNdcjSZIkSR2r3SmCuwLT6ixEkiRJkjpdWyNYwGXAlRFxGfAMf19JkMz8SR2FSZIkSVKnaTdgbVX+feSg7S3+cXVBtWnWrFlce8beTZchSZIWcdOmz2DsmHZ/bJPUpLb+S83M1eoupBtNm+asy6b19/czYcKEpsvoWrZ/8+yDZtn+zeuUPjBcSZ2j3WuwJEmSJElDMGBJkiRJUkUMWJIkSZJUkQUKWBGxYkS8rupiJEmSJKmTtRWwIuKdEXFz+Xgv4I/A0xGxbX2lSZIkSVJnaXcE61TgxojoAY4B9gQ+DJxUV2GSJEmS1GnaDVgTM/MYYD1gOeDSzLwWWLW2yiRJkiSpw7QbsF6JiFWAXYBbMnNaRGwATK6vtJFv3NixTZfQ9Trh3icjme3fPPugWbZ/8+yD4TVr5pSmS5Bq1+5d674O9ANjgG0jYmPgeuCwugrrBj2jRvHAJeObLkOSJGlYTNx1UtMlSLVrawQrM78GrA+smZk3AY8AW2TmOXUWJ0mSJEmdZH6WaX8O2CoivgRMAV5fT0mSJEmS1JnaXaZ9I+BB4GPAZ4A3AVdFxCdqrE2SJEmSOkq7I1inA/tn5hbAjMycBGwLHF5XYZIkSZLUadoNWGsBV5SPWwCZeRuwQh1FSZIkSVInajdgPQhsP3BDRGwB/KHyiiRJkiSpQ7W7TPtngR9GxI3AkhFxAfBB4CN1FSZJkiRJnabdZdpvBdYF7gDOAx4C3pWZP6uxNkmSJEnqKO2OYJGZjwNfrbEWSZIkSepo8wxYETGZclGLuclMF7qQJEmSJIYewfrwsFQhSZIkSSPAPANWZt4yt30RsRgwofKKJEmSJKlDtXUNVkTsAHwDWBnoGbDrZeD1NdQlSZIkSR2n3ftgnQScAxwCXApsBvwcOKaesuYuIloR8aZB2/aKiB/V9H4/Hfx+kiRJkjQn7QaslYGvAD8EVsvMnwMfB/avq7BFyJZNFyBJkiSpM7S7TPvTwFLAE8DqEdGTmU9ExCK3gmBELAOcCaxPsQLiNcARmTkjIlrA8pn5bPnaFrA8MAU4H1gDmAX0AftR3PML4KaI2I5iOuQZwHLluU/JzAsjYjPgOOARYB1gHHBQZt5U/yeWJEmStKhodwTreuAqYBngTuCUiDgOmFRTXUO5KSLumf2H105VPB14juLGyBsC7wA+O8T5dgJel5nrAxuV296emXuXjzenCJlXA9/IzPWAbYHjI2LT8jXvoghcG1AEs6MX5gNKkiRJ6jztBqzPADdRjNocSDFKswWwb011DWXzzFx/9h/gqAH7tgXOyMxWZk6luHZs2yHO9wtgYkTcDHwe+HpmPjToNWsCi2fm5QCZ+RRwGbBNuf+xzLynfHwXsOwCfjZJkiRJHaqtKYKZOQX4cvn0RWCr2ipaeIND4yhgzIDnPQARMXb2hsx8NCJWp1i8Ywvghoj4ZGb+YB7nHXzuVwdsb/Ha1RYlSZIkdYEhR7AiYqeI2LV8vGxEXBERj0bEaRHR7jVcw+k64KCI6ImIcRSjbNeX+yZTTBsE2Hn2ARFxAMU1WD/NzMPKc6xT7p5JEaISmBYRO5fHrAT864BzS5IkSepy8wxYEfEJ4FyKBS6gWOBhFeBgYC3gC7VWt2A+BawA3Ff+SYoFKGbvOzMi7gI2oLiuCuBCYDHgdxHxG4rFLE4r911OMYUwgB2BT0fEvcANwDEuZCFJkiRptp5WqzXXnRFxN/DpzLw1IpYEnge2y8wbyyl1P83Mtw9TrSNOf39/a9a9Q10eJkmSNDJM3HXSa5739/czYcKEZooRYB8sjL6+vr7e3t4NB28faorg2zPz1vLxxhTXFv0CoFwEYpFbpl2SJEmSmjJUwJo5YDGIzYA7M3MaQEQsD7xcY22SJEmS1FGGCli3Ap+NiPHAHhTXI812BHBLTXVJkiRJUscZahXAQ4FrgWOBmynuKUVEPAwsDbynzuIkSZIkqZPMM2Bl5oPlYhZvyszJA3YdDtyQmc/XWp0kSZIkdZAh72OVmS2K+0cN3Pa/tVUkSZIkSR1qyBsNS5IkSZLaY8CSJEmSpIoYsCRJkiSpIgYsSZIkSaqIAUuSJEmSKmLAkiRJkqSKGLAkSZIkqSJD3gdL9WnNmsXEXSc1XYYkSdKwmDVzCqMWW7zpMqRaOYLVoKnTpjVdQtfr7+9vuoSuZvs3zz5olu3fPPtgeBmu1A0MWJIkSZJUEQOWJEmSJFXEgCVJkiRJFTFgSZIkSVJFDFiSJEmSVBEDliRJkiRVxIAlSZIkSRUxYDVo3NixTZfQ9SZMmNB0CV3N9m9eN/bBrGnTmy5BkjSCjW66gG7WM2oU9+3yhabLkKSusu6lxzZdgiRpBHMES5IkSZIqYsCSJEmSpIoYsCRJkiSpIgYsSZIkSaqIAUuSJEmSKmLAkiRJkqSKGLAkSZIkqSIGLEmSJEmqiAFLkiRJkipiwJIkSZKkihiwJEmSJKkiBixJkiRJqogBS5IkSZIqYsCSJEmSpIqMbrqAhRURLeB+YOagXTsC44EzMnOdQceMBx4G7huwuQc4LTO/U1uxkiRJkka0jg9Ypc0z89nBG8sgNTevZub6A167MnB/RPwmM++toUZJkiRJI9xICVgLLTP/GBEPAmtGxMPA2cCawLLA/wM+lpkZETsDRwKzKEbNDs3MWyNilfKY8RSjYd/NzJMa+CiSJEmSGjJSrsG6KSLuGfDnivk9QURsCqwO3AlsC7yYmZtk5prAr4GDy5eeBByYmRsCXwA2K7d/D7gpM9cF3g3sERG7LdSnkiRJktRRRsoI1hynCA5hiYi4p3w8GngW2D0znwCeiIhHIuKTFKFrM+CX5Wu/D1wRET8GrgdOjIilKELVVgCZ+ZeIuIAiqH1/wT+WJEmSpE4yUgLWgnjNNVgDRcQBwL7AGcDFwPPAagCZ+d8RcR5FmNoL+DxFAOsZdJpRwJg6CpckSZK0aBopUwSrtjVwQWaeByTwQWCxiBgdEZOApTLzHOBAYAIwBbgDOAggIpYB9qQY4ZIkSZLUJUbKCNZNETF4mfYjgFcW8HwnA9+KiL0pFrLoA9bNzBkRcQhwcURMp1jo4hOZOTUidgfOLI8ZS3FN1gUL+P6SJEmSOlDHB6zMHDw1b7B1Bm/IzEnA0vM45y+Ateey70rgyrmcc/shapEkSZI0gjlFUJIkSZIqYsCSJEmSpIoYsCRJkiSpIgYsSZIkSaqIAUuSJEmSKmLAkiRJkqSKGLAkSZIkqSIGLEmSJEmqiAFLkiRJkipiwJIkSZKkihiwJEmSJKkiBixJkiRJqogBS5IkSZIqMrrpArpZa9Ys1r302KbLkKSuMmvadEaNHdN0GZKkEcoRrAZNnTat6RK6Xn9/f9MldDXbv3nd2AeGK0lSnQxYkiRJklQRA5YkSZIkVaSn1Wo1XUPX6uvrmww81nQdkiRJkubbqr29vcsP3mjAkiRJkqSKOEVQkiRJkipiwJIkSZKkihiwJEmSJKkiBixJkiRJqogBS5IkSZIqYsCSJEmSpIqMbrqAbhQR2wMnAOOAe4F/z8y/NlvVyBARewCHAi3gFeBTmfmbiDgC2JPi3/xFwJcysxURywMXAqsCs4B9M/P28lz20wKKiB2BCzPz9eVz23+YRMS6wDeAZYCZwH6Z2WcfDJ+I2An4EkV7vgDsA0wCTgW2puiDkzPznPL1awDfAZYDXgL2zMzfl/s+QfGdNhq4geI7bfpwfp5OERE9wPnA/Zl5ckQsRoVtHhFLAucCG1D8gvqwzLxyOD/jomwO7b8EcCawEUV73QkclJmvLsj3zrz6U4XBfTBo3+XAU5l5cPncPqiRI1jDrPwHfT7wr5kZwCPAV5qtamSIiABOArbJzPWBLwOXR8R2wC5AL7AOsHn5HIov/59n5trAHsClEbGk/bTgyh9cTqb8frH9h0/5A+BPgRMzcwPgWOB79sHwKX+ovAjYufweuho4HdgPWIOi/TcCDomIjcvDvgecXfbBF4HLIqInItahCGrvBQJ4A/Cfw/l5OkVETAB+BnxkwOaq2/xo4KXMnABsCZwVEavU+sE6xFza/78pfgh/B7AesARweLlvQb535tWfXW8ufTB73+eAfxm02T6okQFr+G0F/DozHyyfnw3sXv7WQQtnKrBPZj5dPv8NsCLFD5IXZ+bLmTmF4otjj4gYDXwA+DZAZt4DPAhsg/20QMof8C8CPjNg807Y/sNlK+DhzPxJ+fxqiv/Z2gfDZzGgh2IEEWBpYApFH5yfmTMy8wXg+xR9sDKwVvmczLwGWIpilORDwNWZOTkzZwHfpPhBSP/oIIp/1/87YFvVbb4Tf/9v5XGKX2b8ww+zXWpO7X8r8OXMnJWZM4G7gVUX4ntnjv1Z/0frGHPqAyJic4q2PWfANvugZk4RHH5vBZ4Y8PxJ4PXA6wCn3iyEzJxEMQ1n9jD5qRQ/YL4FuG7AS58EVgHeBIzKzMlz2Lck9tOC+Gb5594B295K8Vu12Wz/+qwJPBMR51H81vhF4HPYB8MmM1+KiP2B2yPiOYrA9W7gR/xje65H0TdPlT/MD9y3Srlv0hy2a5AB057eP2DznP5/uzBtPqfz2R/Muf0z86ezH0fEqsAhwL4s+PfO3PpTzLkPImIl4DSKKX37DXi5fVAzR7CG39zafOawVjGCRcRSFL/BWZ3i2oc5tfnMuWxvZ5/mICIOBGZk5ncG7bL9h88YYDvgW5m5IcW1WD+hmEc/mH1Qg/IauKOAtTNzJeA44DKKoDXYgvSB7d++qr57Zre5/bEAIqIX+DlwRmb+iAX/3rH950NEjKEYYTpkwMye2eyDmhmwht/jFCMqs60MvJCZLzdUz4gSEW8Dbqf4D37zzHyRObf5k8Cfy2PeOId99tP82wvYKCLuofihfony8ZPY/sPlKeD3mXknQGZeRfGD/Szsg+GyNXBbZj5cPj+T4nqFx5hzHzwOrDho6uW8+uDJmuoeiebWfgva5vbHfIqI3YDrgc9n5vHl5gX93rH958+GwGrAqeX/i/cHdo2Ic7EPamfAGn4/BTYpFwKA4h/8VQ3WM2JExLLALcDlmblbZr5a7rqKYv7wUhExjiIIXJmZM4AfUw6bR8R6wNrAzdhP8y0zN87MdcoL+7cDXi0fX4HtP1yuAcaXvzEmIt5LsaLm17EPhstdwPsi4s3l8x2BRyna7hMRMToi3gDsRtEHTwIPA7sCRMTWFIH4PoopzjtExAplGNgXcNW69lXd5leVzykXt9iGYuqn5iAiPkyxwMtWmXnx7O0L8b0zx/4cho/SkTLzl5n51sxcv/x/8TnAJZm5j31QP6/BGmaZ+aNUyU8AAAPhSURBVOeI2Bv4QUSMpfiS37PhskaKA4C3ATtFsUzybO8HLgd+BYyl+IK4sNx3IHBuRNxP8YPoxzPzLwD2UzUy84fltCnbv2aZ+UwUS+SfVU6VnUqxmt0v7IPhkZk3RsRJwM0RMQ14nmLhhAT+CfgtRR98MzNvKQ/bDfh2RBxJsSDGLuX1QfdGxDHAjRTTP+8EvjqsH6iznU21bf5F4OyIeIBiZPjQASOV+kcnUCz4cm5EzN52W2YexIJ978yrPzX/7IMa9bRaraZrkCRJkqQRwSmCkiRJklQRA5YkSZIkVcSAJUmSJEkVMWBJkiRJUkUMWJIkSZJUEQOWJEkdLCJWa7oGSdLfGbAkSRpCRFwTEfs2XcdgEbEBcHvTdUiS/s4bDUuSNITM3LbpGuZiGYob4kqSFhHeaFiS1DUiYjxwP/BV4DPAq8BXM/O0iFgd+DrwbuAF4BzgpMxsRcTNwA8y84whzr94ee6PUcwSuRo4IDOnRMSWwAnAmsAjwBGZ+ZPyuBawbmbeXz7/AXB/Zh5dvvcvgA8A/wTcBfwb8ArwGLA48DKwamY+t9CNJElaKE4RlCR1m6WA9YBVgA8CR0fEDsD1wO+AFYHtgP3KP/PjaGBT4B3AasCqwFERMZEibB0PLAscAVwaEeu2ed6PAjuVNfcAh2fmn4Ftgecyc2nDlSQtGgxYkqRudEhmvpyZfcB3gTOBN1CMKk3NzN8DJwJ7zed5Pwocl5lPZeZfgT2B84DdgBsy8/LMnFGOXF0N7N7meS/KzEcz8y/AFcAa81mXJGmYeA2WJKnbTMnMPw54/iSwAvBgZs4YsP0xihGj+fHm8nwAZOaTABGxQnm+gebn/JMHPJ6OvyCVpEWWX9CSpG6zeES8ccDzVSlW4ls5Igb+4nE14E/zee4/AivPfhIRG0bEJ4HHgfGDXjvw/LOAsQP2LTef7ytJWkQ4giVJ6kYnRMSnKa6V+jjwIYpFLY6PiC9QhJ9DgXkuajEH3wMOj4g7gSnAV4A7gAuAIyJiZ+AqYCtgB+C95XF/AD4UEXcD/4fiOq5b2ni/qRSBcWxmTpvPWiVJNXAES5LUjV6iGFW6BPh0Zt5CseDFOsAzwM8orp36+nye9ziK0bC7gYcogtMxmfkQsCNwJPAicBLwscz8dXncJ4Gdgb8ABwMXt/l+9wIPAM+VqyBKkhrmMu2SpK5RLtP+KPC6zHyp4XIkSSOQI1iSJEmSVBGvwZIkqU0R8Qyw9Fx2X5SZ+w9nPZKkRY9TBCVJkiSpIk4RlCRJkqSKGLAkSZIkqSIGLEmSJEmqiAFLkiRJkipiwJIkSZKkivx/gLmkLvTYv0wAAAAASUVORK5CYII=\n",
"text/plain": [
"