## Geographic Visualization using plotly/geoplotlib - Neerja Doshi

Dataset: the World Happiness Report data for 2015, available at https://www.kaggle.com/unsdsn/world-happiness/data

In [1]:
# import the necessary packages
import pandas as pd
import numpy as np

# geoplotlib (which draws in a pyglet window) and scikit-learn for the interactive k-means map
import geoplotlib
from geoplotlib.colors import create_set_cmap
import pyglet
from sklearn.cluster import KMeans
from geoplotlib.layers import BaseLayer
from geoplotlib.core import BatchPainter
from geoplotlib.utils import BoundingBox

# plotly in offline mode, rendering figures inline in the notebook
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)


In [2]:
df = pd.read_csv('2015.csv')             # World Happiness Report 2015
print(df.head())
map_data = pd.read_csv('countries.csv')  # ISO code, country name, latitude and longitude
map_data.head()

       Country          Region  Happiness Rank  Happiness Score  \
0  Switzerland  Western Europe               1            7.587   
1      Iceland  Western Europe               2            7.561   
2      Denmark  Western Europe               3            7.527   
3       Norway  Western Europe               4            7.522   
4       Canada   North America               5            7.427   

   Standard Error  Economy (GDP per Capita)   Family  \
0         0.03411                   1.39651  1.34951   
1         0.04884                   1.30232  1.40223   
2         0.03328                   1.32548  1.36058   
3         0.03880                   1.45900  1.33095   
4         0.03553                   1.32629  1.32261   

   Health (Life Expectancy)  Freedom  Trust (Government Corruption)  \
0                   0.94143  0.66557                        0.41978   
1                   0.94784  0.62877                        0.14145   
2                   0.87464  0.64938           

| Unnamed: 0 | ISO 3166 Country Code | Country | Latitude | Longitude |
|---|---|---|---|---|
| 0 | AD | Andorra | 42.5 | 1.5 |
| 1 | AE | United Arab Emirates | 24.0 | 54.0 |
| 2 | AF | Afghanistan | 33.0 | 65.0 |
| 3 | AG | Antigua and Barbuda | 17.05 | -61.8 |
| 4 | AI | Anguilla | 18.25 | -63.17 |


### Choropleth
This map shows the happiness rank of every country in the dataset for 2015. The darker the colour, the better (numerically lower) the rank, i.e. the happier the people in that country. The countries of North America (the US, Mexico and Canada), Australia, New Zealand and Western Europe have the happiest citizens. Countries with lower happiness scores are either war-torn (e.g., Iraq) or highly underdeveloped, such as Chad and Congo in Africa. The top 5 countries are:
1. Switzerland
2. Iceland
3. Denmark
4. Norway
5. Canada
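
The top-5 list can be read straight off the data: rank 1 is the happiest country, so the "top" countries are the rows with the smallest `Happiness Rank`. A minimal pandas sketch; the helper name `top_n_happiest` is hypothetical, and the demo uses only the five rows printed above, shuffled.

```python
import pandas as pd


def top_n_happiest(df: pd.DataFrame, n: int = 5) -> list:
    """Return the names of the n countries with the best (lowest) Happiness Rank."""
    # nsmallest sorts ascending, so rank 1 comes first
    best = df.nsmallest(n, "Happiness Rank")
    return best["Country"].tolist()


if __name__ == "__main__":
    # The five rows shown in the df.head() output, deliberately shuffled
    excerpt = pd.DataFrame({
        "Country": ["Norway", "Canada", "Switzerland", "Denmark", "Iceland"],
        "Happiness Rank": [4, 5, 1, 3, 2],
    })
    print(top_n_happiest(excerpt, n=3))  # ['Switzerland', 'Iceland', 'Denmark']
```

On the notebook's full table, `top_n_happiest(df)` should reproduce the list above.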


In [3]:
# Blue colour scale: darkest at 0, lightest at 1
scl = [[0, "rgb(5, 10, 172)"], [0.35, "rgb(40, 60, 190)"], [0.5, "rgb(70, 100, 245)"],
       [0.6, "rgb(90, 120, 245)"], [0.7, "rgb(106, 137, 247)"], [1, "rgb(240, 210, 250)"]]

data = dict(type='choropleth',
            colorscale=scl,
            autocolorscale=False,  # True would make plotly ignore scl and choose its own palette
            reversescale=False,    # z is the rank, so rank 1 (happiest) gets the darkest colour
            locations=df['Country'],
            locationmode='country names',
            z=df['Happiness Rank'],
            text=df['Country'],
            colorbar={'title': 'Happiness Rank'})
layout = dict(title='Global Happiness',
              geo=dict(showframe=False,
                       projection={'type': 'orthographic'}))  # plotly projection names are lowercase

choromap3 = go.Figure(data=[data], layout=layout)
iplot(choromap3)

### Symbol Map
To take our analysis further, we plot a symbol map where the size of each circle represents the Happiness Score and its colour represents GDP per capita: the larger the circle, the happier the citizens; the darker the circle, the higher the GDP.
From this plot, we can see that the top 5 countries listed above have a much higher GDP than most other countries. **This suggests that the more economically well off a country is, the happier its citizens tend to be.** Conversely, underdeveloped countries such as Chad, Congo, Burundi and Togo have both a very low GDP and a very low happiness score. While we cannot conclude that low GDP causes lower happiness, it appears to be an important factor. This trend holds broadly across all countries: **there are almost no countries with a high GDP but a low happiness score, or vice versa.**
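
The visual impression that GDP and happiness rise together can be checked with a single number, the correlation between the two columns. A minimal sketch, assuming the column names of the 2015 CSV; the helper name is hypothetical. Pearson measures linear association, while Spearman only compares rankings and is less sensitive to outliers.

```python
import pandas as pd


def gdp_happiness_correlation(df: pd.DataFrame, method: str = "pearson") -> float:
    """Correlation between GDP per capita and Happiness Score.

    method: "pearson" (linear) or "spearman" (rank-based), both pandas options.
    """
    cols = ["Economy (GDP per Capita)", "Happiness Score"]
    # Drop countries missing either value before correlating
    subset = df[cols].dropna()
    # DataFrame.corr returns a 2x2 matrix; the off-diagonal entry is the one we want
    return float(subset.corr(method=method).iloc[0, 1])
```

Calling `gdp_happiness_correlation(df)` and `gdp_happiness_correlation(df, "spearman")` on the notebook's table would quantify the trend; a value near +1 supports the reading above, while it still says nothing about causation.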

Geographic location and neighbours may also play an important role. We can see clusters of neighbouring countries with similar GDPs and similar Happiness Scores. **Countries that have a good economy and good relations with their neighbours may benefit from mutual growth, which is also reflected in their happiness scores. Countries in unstable neighbourhoods, such as the Middle East and neighbouring parts of Asia (Iraq, Afghanistan, etc.), show much lower growth and economic prosperity as well as lower happiness scores.**
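
The regional pattern can also be summarised numerically using the `Region` column already present in the data. A minimal sketch (the helper name is hypothetical) that averages GDP and Happiness Score per region and counts the countries in each, so that averages over only a few countries are easy to spot:

```python
import pandas as pd


def regional_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Mean GDP per capita and Happiness Score per region, happiest region first."""
    cols = ["Economy (GDP per Capita)", "Happiness Score"]
    summary = df.groupby("Region")[cols].mean()
    # Number of countries behind each regional mean
    summary["Countries"] = df.groupby("Region").size()
    return summary.sort_values("Happiness Score", ascending=False)
```

If regions with high mean GDP also top the mean Happiness Score column, that is consistent with the neighbourhood effect described above, though a regional average can hide large differences between individual countries.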

In general, countries known for their low population density (https://www.worldatlas.com/articles/the-10-least-densely-populated-places-in-the-world-2015.html), like Denmark and Iceland, are also much happier than more densely populated countries.

In [4]:
df = df.merge(map_data, how='left', on = ['Country'])
df.head()

| Unnamed: 0 | Country | Region | Happiness Rank | Happiness Score | Standard Error | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Trust (Government Corruption) | Generosity | Dystopia Residual | ISO 3166 Country Code | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Switzerland | Western Europe | 1 | 7.587 | 0.03411 | 1.39651 | 1.34951 | 0.94143 | 0.66557 | 0.41978 | 0.29678 | 2.51738 | CH | 47.0 | 8.0 |
| 1 | Iceland | Western Europe | 2 | 7.561 | 0.04884 | 1.30232 | 1.40223 | 0.94784 | 0.62877 | 0.14145 | 0.4363 | 2.70201 | IS | 65.0 | -18.0 |
| 2 | Denmark | Western Europe | 3 | 7.527 | 0.03328 | 1.32548 | 1.36058 | 0.87464 | 0.64938 | 0.48357 | 0.34139 | 2.49204 | DK | 56.0 | 10.0 |
| 3 | Norway | Western Europe | 4 | 7.522 | 0.0388 | 1.459 | 1.33095 | 0.88521 | 0.66973 | 0.36503 | 0.34699 | 2.46531 | NO | 62.0 | 10.0 |
| 4 | Canada | North America | 5 | 7.427 | 0.03553 | 1.32629 | 1.32261 | 0.90563 | 0.63297 | 0.32957 | 0.45811 | 2.45176 | CA | 60.0 | -95.0 |
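
Because the merge matches on exact country names, any spelling difference between the two files leaves `Latitude`/`Longitude` as NaN, and those countries cannot be placed on the symbol map. A minimal sketch using pandas' `indicator=True` option to list them; the helper name is hypothetical, and for brevity it brings over only the `Country`, `Latitude` and `Longitude` columns rather than every column of `countries.csv`.

```python
import pandas as pd


def merge_with_coordinates(happiness: pd.DataFrame, coords: pd.DataFrame):
    """Left-merge coordinates onto the happiness table and report misses.

    Returns (merged, unmatched): the merged frame and the list of countries
    whose name had no exact match in coords (their coordinates are NaN).
    """
    merged = happiness.merge(
        coords[["Country", "Latitude", "Longitude"]],
        how="left",
        on="Country",
        indicator=True,  # adds a '_merge' column: 'both' or 'left_only'
    )
    unmatched = merged.loc[merged["_merge"] == "left_only", "Country"].tolist()
    return merged.drop(columns="_merge"), unmatched
```

Printing `unmatched` after `merged, unmatched = merge_with_coordinates(df, map_data)` shows which names would need aligning by hand before plotting.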


In [5]:
df['Happiness Score'].min(), df['Happiness Score'].max()

(2.839, 7.5870000000000006)
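
This range bounds the circle sizes in the symbol map below: with `size = score * 3`, marker diameters run from about 8.5 px (2.839 × 3) to about 22.8 px (7.587 × 3), so the happiest country's circle is only about 2.7 times the least happy one's. An alternative is min-max scaling into an explicit pixel range. A minimal sketch; the helper name and the bounds 5 and 30 px are assumptions for illustration, not values from the original notebook.

```python
import pandas as pd


def scale_to_range(values: pd.Series, lo: float, hi: float) -> pd.Series:
    """Linearly map values from [min, max] onto [lo, hi] (min-max scaling)."""
    vmin, vmax = values.min(), values.max()
    if vmax == vmin:
        # All values equal: avoid dividing by zero and use the midpoint size
        return pd.Series((lo + hi) / 2, index=values.index)
    return lo + (values - vmin) * (hi - lo) / (vmax - vmin)
```

Passing `size=scale_to_range(df['Happiness Score'], 5, 30)` to the marker would spread the circles over a wider range, making differences between mid-ranking countries easier to see.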

In [6]:
# Hover text: country name plus its happiness score
df['text'] = df['Country'] + '<br>Happiness Score ' + df['Happiness Score'].astype(str)
# Blue scale; with reversescale=True the highest GDP gets the darkest blue
scl = [[0, "rgb(5, 10, 172)"], [0.35, "rgb(40, 60, 190)"], [0.5, "rgb(70, 100, 245)"],
       [0.6, "rgb(90, 120, 245)"], [0.7, "rgb(106, 137, 247)"], [1, "rgb(220, 220, 220)"]]

data = [dict(
    type='scattergeo',
    lon=df['Longitude'],
    lat=df['Latitude'],
    text=df['text'],
    mode='markers',
    marker=dict(
        size=df['Happiness Score'] * 3,  # circle size encodes happiness
        opacity=0.8,
        reversescale=True,
        autocolorscale=False,
        symbol='circle',
        line=dict(width=1, color='rgb(102, 102, 102)'),
        colorscale=scl,
        cmin=0,
        color=df['Economy (GDP per Capita)'],  # circle colour encodes GDP
        cmax=df['Economy (GDP per Capita)'].max(),
        colorbar=dict(title="GDP per Capita"),
    ))]

layout = dict(
    title='Happiness Scores by GDP',
    geo=dict(
        projection=dict(type='mercator'),  # plotly projection names are lowercase
        showland=True,
        landcolor="rgb(250, 250, 250)",
        countrywidth=0.5,
        subunitwidth=0.5,
    ),
)

symbolmap = go.Figure(data=data, layout=layout)
iplot(symbolmap)

### Extra plot
A k-means cluster plot may also be used to see whether clusters of nearby countries coincide with any of the regions above. (This plot opens in a separate window rather than inline in the notebook.)

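In [7]:
# geoplotlib layer that clusters the countries with k-means and draws each
# cluster's convex hull; the LEFT/RIGHT arrow keys change the number of clusters k.
class KMeansLayer(BaseLayer):

    def __init__(self, data):
        self.data = data
        self.k = 2  # initial number of clusters

    def invalidate(self, proj):
        # Called when the layer needs to be redrawn (e.g. on zoom): redo the clustering
        self.painter = BatchPainter()
        # Cluster in projected screen coordinates, not raw lon/lat
        x, y = proj.lonlat_to_screen(self.data['Longitude'], self.data['Latitude'])

        # Fixed random_state keeps the clusters stable between redraws
        k_means = KMeans(n_clusters=self.k, n_init=10, random_state=0)
        k_means.fit(np.vstack([x, y]).T)
        labels = k_means.labels_

        # One distinct colour per cluster label
        self.cmap = create_set_cmap(set(labels), 'hsv')
        for l in set(labels):
            self.painter.set_color(self.cmap[l])
            self.painter.convexhull(x[labels == l], y[labels == l])
            self.painter.points(x[labels == l], y[labels == l], 2)

    def draw(self, proj, mouse_x, mouse_y, ui_manager):
        ui_manager.info('Use left and right to increase/decrease the number of clusters. k = %d' % self.k)
        self.painter.batch_draw()

    def on_key_release(self, key, modifiers):
        # Returning True asks geoplotlib to invalidate the layer, re-clustering with the new k
        if key == pyglet.window.key.LEFT:
            self.k = max(2, self.k - 1)
            return True
        elif key == pyglet.window.key.RIGHT:
            self.k = self.k + 1
            return True
        return False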

In [8]:
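# Countries whose names found no match in countries.csv have NaN coordinates,
# which KMeans cannot handle, so keep only the located countries
located = df.dropna(subset=['Latitude', 'Longitude'])
data = geoplotlib.utils.DataAccessObject(located)
geoplotlib.add_layer(KMeansLayer(data))
geoplotlib.set_smoothing(True)
# Start from a world view (BoundingBox.DK would zoom in on Denmark only)
geoplotlib.set_bbox(BoundingBox.WORLD)
geoplotlib.show()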
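
To compare the clusters with the dataset's own `Region` labels without opening a window, the same idea can be run directly with scikit-learn and summarised as a table. A minimal sketch under stated assumptions: the helper name and the default `k=5` are hypothetical, and unlike the geoplotlib layer (which clusters projected screen coordinates) it clusters raw longitude/latitude in degrees, a simplification that ignores the Earth's curvature and wrap-around at ±180°.

```python
import pandas as pd
from sklearn.cluster import KMeans


def cluster_vs_region(df: pd.DataFrame, k: int = 5, seed: int = 0) -> pd.DataFrame:
    """Cluster countries by (Longitude, Latitude) and cross-tabulate with Region."""
    # Countries without coordinates (failed name match) cannot be clustered
    located = df.dropna(subset=["Longitude", "Latitude"])
    coords = located[["Longitude", "Latitude"]].to_numpy()
    # n_init set explicitly; random_state makes the labels reproducible
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(coords)
    cluster = pd.Series(labels, index=located.index, name="Cluster")
    # Rows: cluster ids, columns: regions, cells: number of countries
    return pd.crosstab(cluster, located["Region"])
```

In the resulting table, a cluster whose countries fall almost entirely under one region suggests that the geographic cluster and the region coincide; clusters spread over several columns mix regions.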