## Tips to extract data from a geojson dict to define a `choroplethmapbox` chart

The choroplethmapbox chart type is available starting with Plotly version 4.1.0. It is defined by a geojson file and, optionally, a pandas dataframe. The dict `jdata` read from the geojson file must have the following structure:

```
jdata = {"type": "FeatureCollection",
         "features": []
        }
```
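To make this structure concrete, here is a small hand-made FeatureCollection together with a rough structural check. The names `jdata_demo` and `looks_like_feature_collection` are illustrative only; the check simply tests the keys described above and does not validate the full GeoJSON specification.

```python
# A minimal, hand-made FeatureCollection with one square polygon.
# GeoJSON positions are [longitude, latitude]; a Polygon is a list of
# linear rings, and each ring repeats its first position at the end.
jdata_demo = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": 0,
            "properties": {"name": "square"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0],
                                 [0.0, 1.0], [0.0, 0.0]]],
            },
        }
    ],
}


def looks_like_feature_collection(d):
    """Rough structural check: a FeatureCollection whose features are
    dicts carrying at least the keys 'type' and 'geometry'."""
    return (
        isinstance(d, dict)
        and d.get("type") == "FeatureCollection"
        and isinstance(d.get("features"), list)
        and all(isinstance(f, dict) and "type" in f and "geometry" in f
                for f in d["features"])
    )


print(looks_like_feature_collection(jdata_demo))  # True
```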

where `jdata['features']` is a list of features, i.e. a list of dicts that contain at least the keys `['type', 'geometry']`.
The content of each feature in `jdata['features']` is not defined in a single way, because GeoJSON is "an open standard format", i.e. "there is no single definition and interpretations vary with usage" [Wikipedia](https://en.wikipedia.org/wiki/GeoJSON).

Most geojson files also provide, for each feature, a subdict `'properties'`.
A `go.Choroplethmapbox` chart is defined by a geojson file whose features have a `'geometry'` of type `'Polygon'` or `'MultiPolygon'`.
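Since only polygonal geometries are colored, it can help to count the geometry types in a freshly loaded file and, if needed, keep only the Polygon/MultiPolygon features. The helpers below are a hypothetical sketch working on a plain dict; they also tolerate features whose `geometry` is `null`, which GeoJSON allows.

```python
from collections import Counter

POLYGONAL = ("Polygon", "MultiPolygon")


def geometry_types(jdata):
    """Count the geometry types present in a FeatureCollection dict."""
    return Counter(f["geometry"]["type"]
                   for f in jdata["features"] if f.get("geometry"))


def polygonal_only(jdata):
    """Return a shallow copy of jdata keeping only Polygon/MultiPolygon features."""
    kept = [f for f in jdata["features"]
            if f.get("geometry") and f["geometry"]["type"] in POLYGONAL]
    return {**jdata, "features": kept}
```

For example, `geometry_types(jdata)` on a province file would be expected to report only `'Polygon'` and `'MultiPolygon'` entries; anything else (points, lines) will not be drawn as colored regions.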

A `go.Choroplethmapbox` trace has the following basic attributes: `geojson` (the dict `jdata`), `locations`, i.e. a list of ids, each one identifying a `feature['geometry']` to be colored, and `z`, a list or dataframe column of numerical values that determine the colors. The z-values are usually provided by other sources, not by the geojson file; `z` can contain the population, unemployment percentage, etc., of each geographic unit.

`locations` can be the entire list of `feature['geometry']` identifiers read from the geojson file, or only a sublist.

The essential ingredient is a key that establishes the correspondence between each `feature['geometry']` (geographic unit) and the data associated with these units, i.e. a key that uniquely identifies each unit to be colored in the choropleth.

That's why the **FIRST STEP** after reading a geojson file as a dict, `jdata`, is to inspect its structure:

```
print(jdata.keys())
```
There can be three cases:

1. Each feature dict in the list `jdata['features']` has a key `'id'`, which can be seen by displaying:
```
jdata['features'][0].keys()
```
In this case the following keys are displayed:
```
dict_keys(['type', 'id', 'geometry'])
```

2. There is no key called `'id'` in the feature dicts, neither at the top level nor inside an inner dict of each feature (such as `feature['properties']['id']` or `feature['anykey']['id']`). In this case, if no key with another name uniquely identifies each `feature['geometry']`, define an id for each feature yourself:

```
for k, feature in enumerate(jdata['features']):
    feature['id'] = k
```

3. When displaying
```
jdata['features'][0].keys()
```

more keys than above are listed, for example:

```
dict_keys(['type', 'geometry', 'properties', 'anykey'])
```

Inside `feature['properties']` (or possibly `feature['anykey']`) there is a key, called either `'id'` or something else, say `'someidentifier'`, that uniquely identifies the geographic region defined by `feature['geometry']`.
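The three cases can also be told apart programmatically. The sketch below (a hypothetical helper, not part of Plotly) returns the dotted key paths whose values exist in every feature and are all distinct: `'id'` signals case 1, a path such as `'properties.someidentifier'` signals case 3, and an empty list means case 2. It only inspects keys that appear in the first feature and looks one level deep.

```python
def _all_unique(values):
    """True if no value is None and all values are distinct (and hashable)."""
    if any(v is None for v in values):
        return False
    try:
        return len(set(values)) == len(values)
    except TypeError:  # unhashable values (lists, dicts) cannot serve as ids here
        return False


def candidate_id_keys(jdata):
    """Return dotted key paths that uniquely identify every feature.

    'id' covers case 1; paths such as 'properties.NAME' cover case 3.
    An empty list means case 2: assign your own ids.
    """
    features = jdata["features"]
    if not features:
        return []
    candidates = []
    if _all_unique([f.get("id") for f in features]):
        candidates.append("id")
    # Look one level down inside every dict-valued key except 'geometry'.
    for outer, inner in features[0].items():
        if outer == "geometry" or not isinstance(inner, dict):
            continue
        for key in inner:
            values = [f[outer].get(key) if isinstance(f.get(outer), dict) else None
                      for f in features]
            if _all_unique(values):
                candidates.append(f"{outer}.{key}")
    return candidates
```

A returned path can be passed directly as `featureidkey`; when several candidates come back, pick the one that also appears in your data file.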


**SECOND STEP**: Based on the `jdata` definition, define a dataframe, `df`, with a column (name of your choice, but `'ids'` is the most suggestive) consisting of all the ids, or a subset of them, in cases 1 and 2 above, or, in case 3, all or a subset of the ids recorded as `feature['properties']['id']`, `feature['properties']['someidentifier']` or `feature['anykey']['someidentifier']`.
Note that `list(df['ids'])` can be a permutation of the `jdata` ids, or a permutation of a subset of them.

The second column, say `df['vals']`, is numerical; it can contain the population of each geographic region represented by `feature['geometry']`, the unemployment percentage, etc.
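Before building the trace it is worth checking that every value in the ids column actually matches a feature id; an id that matches nothing simply leaves that row unplotted. A minimal stdlib sketch, with a hypothetical helper name; with pandas, `df['ids'].isin(feature_ids)` gives the same information as a boolean mask. The comparison uses Python equality, so a string `'15'` will not match an integer `15` here, which often reveals ids read from a CSV with a different type.

```python
def check_locations(locations, feature_ids):
    """Split `locations` into ids that match a feature and ids that do not."""
    known = set(feature_ids)
    matched = [loc for loc in locations if loc in known]
    unmatched = [loc for loc in locations if loc not in known]
    return matched, unmatched


# Example: the ids read from the geojson file, and a permuted subset plus one bad id.
matched, unmatched = check_locations([2, 0, 7], [0, 1, 2, 3])
print(matched, unmatched)  # [2, 0] [7]
```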

With these data we can define a trace of type choroplethmapbox as follows:

Cases 1 and 2:
```
import plotly.graph_objects as go

trace = go.Choroplethmapbox(geojson=jdata,
                            locations=df['ids'],  # matched against feature['id'] (default featureidkey)
                            z=df['vals'],
                            colorscale='Viridis',
                            colorbar_thickness=20,
                            hoverinfo='all',
                            )
```
Case 3:
```
trace = go.Choroplethmapbox(geojson=jdata,
                            locations=df['ids'],
                            z=df['vals'],
                            featureidkey='properties.id',  # or 'anykey.someidentifier'
                            colorscale='deep_r',
                            colorbar_thickness=20,
                            hoverinfo='all',
                            )
```
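`featureidkey` is a dotted path into each feature. To see in advance which values your `locations` will be compared with, you can follow the same path yourself. The sketch below is only an illustration of that lookup for checking purposes, written by us under the assumption that the path is a chain of dict keys; it is not Plotly's own code.

```python
def feature_id(feature, featureidkey="id"):
    """Follow a dotted path such as 'properties.id' inside one feature dict."""
    value = feature
    for part in featureidkey.split("."):
        value = value[part]  # raises KeyError if the path does not exist
    return value


def feature_ids(jdata, featureidkey="id"):
    """Collect the id of every feature along the given path."""
    return [feature_id(f, featureidkey) for f in jdata["features"]]


demo = {"features": [{"id": 3, "properties": {"id": "ZH"}}]}
print(feature_ids(demo), feature_ids(demo, "properties.id"))  # [3] ['ZH']
```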

The geojson files for a choroplethmapbox can be found on the web by searching for the geojson file of the states, regions, provinces or counties of a given country.
Often we can find `topojson` files for such administrative divisions instead.

A `topojson` file can be converted to a `geojson` file online, for example at [https://mygeodata.cloud/converter/topojson-to-geojson](https://mygeodata.cloud/converter/topojson-to-geojson).

If no geojson or topojson file can be found for a country/region, the solution is to read a shapefile and convert it
to a geojson file. Details are given in the last section.

Examples: 

```python
import plotly
plotly.__version__
```
```
'4.9.0'
```

### Choropleth mapbox for a few Chinese provinces

```python
import numpy as np
import json
import plotly.graph_objects as go
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
init_notebook_mode(connected=True)
```

Read a geojson file from a URL and check its structure:

```python
china_url = 'https://raw.githubusercontent.com/chemzqm/geomap/master/china-province.geojson'
```

```python
import urllib.request

def read_geojson(url):
    # Download the file and parse its JSON content into a dict
    with urllib.request.urlopen(url) as response:
        jdata = json.loads(response.read().decode())
    return jdata
```
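Later in this notebook a geojson file is also read from local disk. A single loader that accepts either an http(s) URL or a local path can be sketched as follows; `load_geojson` is a hypothetical name, and the last lines only run an offline self-check with a temporary file.

```python
import json
import tempfile
import urllib.request
from pathlib import Path


def load_geojson(source):
    """Return the GeoJSON dict stored at an http(s) URL or a local file path."""
    if str(source).startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as response:
            return json.loads(response.read().decode("utf-8"))
    return json.loads(Path(source).read_text(encoding="utf-8"))


# Offline self-check: write a tiny FeatureCollection to a temporary file and read it back.
with tempfile.NamedTemporaryFile("w", suffix=".geojson", delete=False,
                                 encoding="utf-8") as tmp:
    json.dump({"type": "FeatureCollection", "features": []}, tmp)
demo = load_geojson(tmp.name)
Path(tmp.name).unlink()  # remove the temporary file
print(demo["type"])  # FeatureCollection
```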

```python
jdata = read_geojson(china_url)
```

Inspect the geojson file content:

```python
jdata['type']
```
```
'FeatureCollection'
```

```python
jdata['features'][0].keys()
```
```
dict_keys(['type', 'id', 'properties', 'geometry'])
```

```python
jdata['features'][0]['properties']
```
```
{'GEO_ID': 23, 'NAME': '黑龙江'}
```

```python
# jdata['features'][9]['geometry']['coordinates']
```

For Choroplethmapbox attributes see: [https://plot.ly/python/reference/#choroplethmapbox](https://plot.ly/python/reference/#choroplethmapbox).

 Let us select a list of ids as locations:

```python
locations = [15 + k for k in range(13)]
text = [feat['properties']['NAME'] for feat in jdata['features'] if feat['id'] in locations]  # province names
text
```
```
['福建', '广西', '广东', '海南', '吉林', '辽宁', '天津', '青海', '甘肃', '陕西', '内蒙古', '重庆', '河北']
```
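The list comprehension above returns the names in the order in which the features appear in the file, so `text` lines up with `locations` and `z` only if the features happen to be stored sorted by id. A dict lookup keeps the three lists aligned regardless of file order. A small sketch, with a hypothetical helper name and the `'NAME'` property key taken from this file:

```python
def names_for(locations, jdata, name_key="NAME"):
    """Return feature names in the same order as `locations`."""
    id_to_name = {f["id"]: f["properties"][name_key] for f in jdata["features"]}
    return [id_to_name[loc] for loc in locations]


# Features stored out of id order: the lookup still follows `locations`.
unsorted = {"features": [{"id": 2, "properties": {"NAME": "b"}},
                         {"id": 1, "properties": {"NAME": "a"}}]}
print(names_for([1, 2], unsorted))  # ['a', 'b']
```

With the China file one would call `text = names_for(locations, jdata)`.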

Define some synthetic data for z:

```python
z = [4.2, 8.1, 6.85, 11.3, 3.56, 10.3, 8.25, 12.57, 5.28, 14.9, 8.67, 10.3, 6.1]
```

```python
# My Mapbox access token; it is needed only for Mapbox-hosted styles, such as 'basic' below
with open(".mapbox_token") as f:
    mapboxt = f.read().rstrip()
```

For hovering we can set `hoverinfo='all'` (to display the location, z-value and text on hover) or any combination of the flags `'location'`, `'z'` and `'text'`, joined with `+`, e.g. `'location+z'`. Note that although the trace attribute is `locations`, the corresponding `hoverinfo` flag is `location` (singular).


```python
fig = go.Figure(go.Choroplethmapbox(z=z,
                                    locations=locations,
                                    colorscale='reds',
                                    colorbar=dict(thickness=20, ticklen=3),
                                    geojson=jdata,
                                    text=text,
                                    hoverinfo='all',
                                    marker_line_width=1, marker_opacity=0.75))

fig.update_layout(title_text='Choroplethmapbox',
                  title_x=0.5, width=700,  # height=700,
                  mapbox=dict(center=dict(lat=36.913818, lon=106.363625),
                              accesstoken=mapboxt,
                              style='basic',
                              zoom=2.35,
                              ));

#fig.show()
```

Seeing only numbers and text on hover, as above, is not very informative.
To display what each value represents, we define a `hovertemplate`.
Its syntax is described in the `go.Choroplethmapbox` docstring:

```python
# help(go.Choroplethmapbox.hovertemplate)
```

```python
fig.data[0].hovertemplate = '<b>Province</b>: <b>%{text}</b>' + \
                            '<br> <b>Val </b>: %{z}<br>'
fig.update_layout(title_text="Choroplethmapbox with hovertemplate");
iplot(fig)
```

Notice a fantastic feature of this chart type:
although the trace definition nowhere gives the geographic position of the province (polygon) centers,
the hover box is automatically placed at the visual center of each polygon/multipolygon.

### Choropleth mapbox for Swiss cantons

```python
swiss_url = 'https://raw.githubusercontent.com/empet/Datasets/master/swiss-cantons.geojson'
jdata = read_geojson(swiss_url)
```

```python
jdata['features'][0].keys()
```

```python
jdata['features'][0]['properties']
```

```python
import pandas as pd

data_url = "https://raw.githubusercontent.com/empet/Datasets/master/Swiss-synthetic-data.csv"
df = pd.read_csv(data_url)
df.head()
```

```python
mycustomdata = np.stack((df['canton-name'], df['2018']), axis=-1)
title = 'Swiss Canton Choroplethmapbox'

fig = go.Figure(go.Choroplethmapbox(geojson=jdata,
                                    locations=df['canton-id'],
                                    z=df['2018'],
                                    featureidkey='properties.id',
                                    coloraxis="coloraxis",
                                    customdata=mycustomdata,
                                    hovertemplate='Canton: %{customdata[0]}' +
                                                  '<br>2018: %{customdata[1]}%<extra></extra>',
                                    marker_line_width=1))

fig.update_layout(title_text=title,
                  title_x=0.5,
                  coloraxis_colorscale='algae_r',
                  mapbox=dict(style='carto-positron',
                              zoom=6.5,
                              center={"lat": 46.8181877, "lon": 8.2275124},
                              ));

#fig.show()
```

Plotly Express version of the same choroplethmapbox:

```python
import plotly.express as px

fig = px.choropleth_mapbox(df, geojson=jdata,
                           featureidkey='properties.id',
                           locations='canton-id',
                           color='2018',
                           # 'algae_r' is already reversed; setting coloraxis_reversescale=True
                           # as well would flip it back and no longer match the go version
                           color_continuous_scale='algae_r',
                           zoom=6.5,
                           center={'lat': 46.8181877, 'lon': 8.2275124},
                           mapbox_style='carto-positron')

fig.update_layout(title_text=title,
                  title_x=0.5);
fig.show()
```

### Choroplethmapbox from a geojson dict derived from a shapefile

To get the shapefile for the counties/regions of a country, we access the Global Administrative Areas Database (GADM), https://gadm.org/, select Data, then click the Country link and choose the country of interest from the dropdown menu: https://gadm.org/download_country_v3.html.

Each zip file downloaded from GADM contains multiple shapefiles, indexed by the level of detail: 0, 1, 2, 3 and possibly 4. Level 0 contains the shapefile of the country itself (the UK, for example). Level 1 corresponds to provinces/regions (for the UK these are England, Scotland, Wales and Northern Ireland). Level 2 shapefiles represent counties, and levels 3 and 4 smaller administrative subdivisions of each county.

For each level there are at least four files sharing the same name, with the extensions shp, shx, dbf and prj. For more information on these files see https://en.wikipedia.org/wiki/Shapefile.
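Because these files only work together, a quick check that all companions of a given `.shp` file are present can save some confusion; in particular the `.prj` file is where the coordinate reference system is typically read from. A minimal stdlib sketch with a hypothetical helper name:

```python
from pathlib import Path

REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf", ".prj")


def missing_shapefile_parts(shp_path):
    """List the extensions whose file (same stem as shp_path) does not exist."""
    shp_path = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS
            if not shp_path.with_suffix(ext).exists()]
```

For instance, `missing_shapefile_parts("gadm36_NOR_shp/gadm36_NOR_1.shp")` should return an empty list once the GADM zip file has been extracted.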

A shapefile is read as a geopandas GeoDataFrame by `geopandas.read_file('filename.shp')` (see https://github.com/geopandas/geopandas/blob/fbe743f3131cc5942fef8362ef2aed606dc45e23/doc/source/io.rst).

Then it is converted to a geojson file to be used for a choroplethmapbox definition. 

```python
import geopandas as gpd
gpd.__version__
```

We downloaded a zip file containing Norway's administrative regions. Read the level-1 shapefile:

```python
level = 1
gdf = gpd.read_file(f"gadm36_NOR_shp/gadm36_NOR_{level}.shp", encoding='utf-8')
#gdf.head()
```

To be sure that the data you pass to `go.Choroplethmapbox` are correct and will be displayed, you must check the CRS of `gdf`. The geometric shapes in the GeoDataFrame `gdf` are represented by coordinates in an arbitrary space; a CRS (Coordinate Reference System) tells Python how those coordinates relate to places on the Earth.

```python
gdf.crs
```

`gdf.crs` shows that our `gdf` contains data (coordinates) in the WGS84 (EPSG:4326) standard. This is the best case when we intend to convert the GeoDataFrame to a geojson file for mapbox. Mapbox maps are rendered in the Web Mercator projection (EPSG:3857), but according to the Mapbox documentation, [https://docs.mapbox.com/api/#coordinate-format](https://docs.mapbox.com/api/#coordinate-format),
geographic coordinates provided to a Mapbox API (in our case, to define a `go.Scattermapbox` or `go.Choroplethmapbox`) should be given in the order longitude, latitude, as decimal degrees in the WGS84 coordinate system. If `gdf.crs` is WGS84, the following conversion produces a valid geojson file for a `Choroplethmapbox`:

```python
gdf.to_file('norway-geo.json', driver='GeoJSON')
with open('norway-geo.json') as geofile:
    jdataNo = json.load(geofile)
```

If `gdf.crs` does not display the WGS84 coordinate system, a CRS conversion must be performed before converting `gdf` to geojson. Since `to_crs` returns a new GeoDataFrame, assign the result:
`gdf = gdf.to_crs(epsg=4326)`
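After the conversion, a cheap sanity check on the resulting dict is to verify that every position falls inside the longitude/latitude range; projected coordinates (in meters) fail this immediately. This is only a heuristic sketch with hypothetical helper names: it cannot prove the data are WGS84, and it will not detect swapped latitude/longitude order when both values are small.

```python
def iter_positions(coords):
    """Yield every [x, y, ...] position from nested GeoJSON coordinate lists."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for c in coords:
            yield from iter_positions(c)


def looks_like_lonlat(jdata):
    """Heuristic: True if all positions lie within [-180, 180] x [-90, 90]."""
    for f in jdata["features"]:
        geom = f.get("geometry")
        if not geom:
            continue
        for x, y, *_ in iter_positions(geom.get("coordinates", [])):
            if not (-180 <= x <= 180 and -90 <= y <= 90):
                return False
    return True
```

For the Norway file, `looks_like_lonlat(jdataNo)` would be expected to return `True`.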

Now let us check Norway's geojson dict:

```python
jdataNo['features'][0].keys()
```

```python
jdataNo['features'][0]['properties']
```

Since it is difficult to decide which key uniquely identifies each region, we define a default id as follows:

```python
for k, feature in enumerate(jdataNo['features']):
    feature['id'] = k
```
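With ids assigned, a template for the data file can be generated directly from the geojson dict: one row per feature with its id and name, to which a `vals` column is then added. The sketch below writes such a CSV with the column names `geo-id` and `geo-name` used in the next cell. It assumes, as an unverified assumption, that GADM level-1 features store the region name under the property `'NAME_1'`; check `jdataNo['features'][0]['properties']` and adjust `name_key` if needed. The helper name is hypothetical.

```python
import csv
import io


def id_name_table(jdata, name_key="NAME_1"):
    """Return a CSV string with one 'geo-id,geo-name' row per feature."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["geo-id", "geo-name"])
    for f in jdata["features"]:
        writer.writerow([f["id"], f["properties"].get(name_key, "")])
    return buf.getvalue()


demo = {"features": [{"id": 0, "properties": {"NAME_1": "Oslo"}},
                     {"id": 1, "properties": {"NAME_1": "Viken"}}]}
print(id_name_table(demo))
```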

Based on this `'id'` definition we set up a pandas dataframe that contains the data to be associated with each Norwegian region:

```python
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/empet/Datasets/master/Norway-vals.csv')
df.head()
```

```python
fig = go.Figure(go.Choroplethmapbox(z=df['vals'],
                                    locations=df['geo-id'],
                                    colorscale='ice',
                                    colorbar=dict(thickness=20, ticklen=3),
                                    geojson=jdataNo,
                                    text=df['geo-name'],
                                    hovertemplate='<b>Region</b>: <b>%{text}</b>' +
                                                  '<br> <b>Val </b>: %{z}<br>',
                                    marker_line_width=0.1, marker_opacity=0.7))

fig.update_layout(title_text='Norway mapbox choropleth', title_x=0.5, width=750, height=700,
                  mapbox=dict(center=dict(lat=64.5, lon=18.75),
                              accesstoken=mapboxt,
                              zoom=3
                              ))

iplot(fig)
```
