GEOG 172: INTERMEDIATE GEOGRAPHICAL ANALYSIS

Evgeny Noi

Lecture 04: Distance, Adjacency and Contiguity

Reminders¶

  1. Lab2 is due next week
  2. Quiz 2 is tomorrow

Characterizing location of an object¶

  • 1d - one reference point
  • 2d - two reference points

Coordinate Systems¶

  • Plane Coordinate systems
    • Cartesian coordinates (origin + axes). $x$ (horizontal, E-W), $y$ (vertical, N-S).
    • Polar coordinates (distance from origin + angle): R and $\theta$. $x=r \sin \theta$, $y=r \cos \theta$, $r=\sqrt{x^2+y^2}$, $\theta=\arctan(x/y)$
  • Global Reference Systems
    • Latitude (90 degrees S/N, parallels, ecuator), Longitude (180 degrees E/W, meridians, Greenwich). Graticule.

Distance

Georeferencing and distance¶

  • Once we can 'place' an object in space we can calculate the distance between any number of objects
  • Distance will depend on the underlying coordinate system that we use
    • large distances: use great circle distance
    • small distances: use Euclidean coordinates or other CS

Great Circle Distance¶

  • Assume Earth is a sphere with radius $R$ and two points $A(\phi_1, \lambda_1)$ and $B(\phi_2, \lambda_2)$
  • Formula: $$ d = R \arccos([\sin \phi_1 \times \sin \phi_2]+[\cos \phi_1 \times cos \phi_2 \times cos(\lambda_1 - \lambda_2)]) $$

Exercise¶

In groups of 2-4 people, write a Python program that calculates great circle distance between NYC and London.

In [2]:
import numpy as np
import math
from geopy.distance import great_circle

def great_circle_dist(a,b):
    '''
    given two points a and b, calculate great circle distance between them
    where phi(s) are latitudes and lambda(s) are longitudes
    '''
    r = 6378.1 # km
    phi1 = math.radians(a[1]) 
    phi2 = math.radians(b[1])
    lambda1 = math.radians(a[0])
    lambda2 = math.radians(b[0])
    
    d = r * math.acos((math.sin(phi1)*math.sin(phi2)) + (math.cos(phi1)*math.cos(phi2)*math.cos(abs(lambda1-lambda2))))
    return d 

NYC = (-73.935242, 40.730610) # lng, lat
London = (-0.1275, 51.50722) # lng, lat

dist = great_circle_dist(NYC, London)
print(dist, 'km')
print(great_circle(NYC[::-1], London[::-1]))
5570.560077026861 km
5564.366878189245 km
In [4]:
from IPython.display import Image
Image(filename='flights.jpeg') 
Out[4]:
In [5]:
import osmnx as ox
import pyproj 
import geopandas as gpd
#pyproj.datadir.get_data_dir()
#pyproj.datadir.set_data_dir('C:/Users/barguzin/Anaconda3/envs/geo_env/Library/share/proj')
ox.settings.use_cache = True
import contextily as cx

goleta = ox.geocode_to_gdf("Goleta, CA, USA") # get Goleta

go_graph = ox.graph_from_place('Goleta, CA, USA', network_type="drive")
go_nodes, go_streets  = ox.graph_to_gdfs(go_graph)

tags = {'amenity': ['pub', 'bar', 'cafe', 'restaurant']} # used for parsing OSM data
dist = 5000 # set search radius (in meters) 

# download POIs
pois = ox.geometries.geometries_from_point(center_point = (goleta.lat[0],goleta.lon[0]), tags=tags, dist=dist)
# keep only points 
pois = pois.loc[pois.geom_type=="Point"]

go_proj = ox.project_gdf(goleta, to_crs='EPSG:3857') # re-project layers
go_streets_proj = ox.project_gdf(go_streets, to_crs='EPSG:3857')
pois_proj = ox.project_gdf(pois, to_crs='EPSG:3857')

# print to the output
pois.head(3)
Out[5]:
amenity brand brand:wikidata brand:wikipedia cuisine name official_name takeaway geometry source ... disused:building capacity height roof:colour roof:shape contact:facebook addr:housename designation ways type
element_type osmid
node 448863565 cafe Starbucks Q37158 en:Starbucks coffee_shop Starbucks Starbucks Coffee yes POINT (-119.84817 34.41147) NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1341709739 restaurant NaN NaN NaN american IHOP NaN NaN POINT (-119.78818 34.44325) Local Knowledge ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1348747781 restaurant NaN NaN NaN indian Masala Spice Indian Cuisine NaN NaN POINT (-119.82570 34.44172) NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

3 rows × 58 columns

In [6]:
fig, ax = plt.subplots()

go_proj.plot(ax=ax, color='gray') 
go_streets_proj.plot(ax=ax, zorder=2, lw=.25, color='k')
pois_proj.plot(ax=ax, color='r', markersize=15, ec='w')
go_proj.centroid.plot(color='r', markersize=200, ec='w', marker='*', ax=ax, zorder=3)
cx.add_basemap(ax=ax)
_ = ax.axis("off")

Different Types of Distance¶

Assume we would like to calculate how far each of the POIs from the center of the polygon.

  1. Calculate pairwise distance between points
    1. Euclidean
    2. Manhattan
    3. Network distance
In [7]:
centr = gpd.GeoDataFrame(data=[1], columns=['geo_id'], geometry=go_proj.geometry.centroid)

dist_to_centr = pois_proj.distance(centr.geometry.iloc[0]).reset_index()
dist_to_centr.drop('element_type', axis=1, inplace=True)
dist_to_centr.columns = ['osm_id', 'dist_m']
dist_to_centr.head()
Out[7]:
osm_id dist_m
0 448863565 3548.258535
1 1341709739 8014.092418
2 1348747781 3856.467935
3 1396621915 2469.264505
4 1469690940 6317.223055
In [8]:
# add the data back to the pois_proj 
print(pois_proj.shape)
pois_proj = pois_proj.merge(dist_to_centr, left_on='osmid', right_on='osm_id')
print(pois_proj.shape)
print(type(pois_proj))
(46, 58)
(46, 60)
<class 'geopandas.geodataframe.GeoDataFrame'>
In [9]:
import shapely.geometry as spg 

lines = []

for i, j in zip(pois_proj.geometry.values, np.repeat(centr.centroid.geometry.values, pois_proj.shape[0])):
    ax, ay = i.x, i.y
    bx, by = j.x, j.y
    l = spg.LineString([(ax,ay),(bx,by)])
    lines.append(l)
    
lines_gdf = gpd.GeoDataFrame(data=np.arange(0,len(lines)), columns=['geo_id'], geometry=lines, crs='epsg:3857')

print(lines_gdf.crs, centr.crs)
lines_gdf.head()
epsg:3857 EPSG:3857
Out[9]:
geo_id geometry
0 0 LINESTRING (-13341437.257 4084187.000, -133427...
1 1 LINESTRING (-13334759.579 4088475.583, -133427...
2 2 LINESTRING (-13338935.474 4088270.153, -133427...
3 3 LINESTRING (-13344993.881 4088445.536, -133427...
4 4 LINESTRING (-13336396.922 4087430.222, -133427...
In [10]:
fig, ax = plt.subplots(figsize=(12,5))

go_proj.plot(ax=ax, color='gray') 
go_streets_proj.plot(ax=ax, zorder=2, lw=.25, color='k')
pois_proj.plot(ax=ax, color='r', markersize=15, ec='w')
go_proj.centroid.plot(color='r', markersize=200, ec='w', marker='*', ax=ax, zorder=3)
lines_gdf.plot(ax=ax, color='r', linestyle='dashed', linewidth=.5)
cx.add_basemap(ax=ax)
    
for index, row in pois_proj.iloc[:5,].iterrows():
    ax.annotate(np.round(row['dist_m'],2), (row['geometry'].x, row['geometry'].y), fontsize=10, weight='bold')

_ = ax.axis("off")
In [11]:
print(f'Average distance from POIs to the centroid of Goleta is {np.round(pois_proj.dist_m.mean(),2)}m')
Average distance from POIs to the centroid of Goleta is 3039.32m
In [12]:
# reproject graph and generate find central node
Gp = ox.project_graph(go_graph, to_crs='epsg:3857')
# find each nearest node to several points, and optionally return distance
nodes, dists = ox.nearest_nodes(Gp, pois_proj.geometry.x, pois_proj.geometry.y, return_dist=True)

centr_node = ox.nearest_nodes(Gp, centr.geometry.x, centr.geometry.y)
centr_node = centr_node[0]
In [13]:
routes_list = []
for i in nodes:
    route = ox.shortest_path(Gp, centr_node, list(go_graph[i])[0], weight="length")
    if route is not None:
        routes_list.append(route)

ox.plot_graph_routes(Gp, routes=routes_list, route_linewidth=1, node_size=0);
In [14]:
# create lines from routes 
nodes2, edges2 = ox.graph_to_gdfs(Gp)

list_lines = []

for r in routes_list:
    route_nodes = nodes2.loc[r]
    l = spg.LineString(route_nodes['geometry'].to_list())
    list_lines.append(l)

gdf1 = gpd.GeoDataFrame(geometry=list_lines, crs='epsg:3857')

print(f'Average distance from POIs to the centroid of Goleta on the road network: {np.round(gdf1.geometry.length.mean(),2)}m')

gdf1.plot();
Average distance from POIs to the centroid of Goleta on the road network: 3763.46m

Geographic Proximity¶

What do we call adjacent when it comes to points, lines, polygons?

  1. Buffer
  2. Nearest Neighbor
  3. Adjacency
In [15]:
# Buffer
fig, ax = plt.subplots(figsize=(12,5))

go_proj.plot(ax=ax, color='gray') 
go_proj.centroid.plot(color='r', markersize=200, ec='w', marker='*', ax=ax)
go_proj.centroid.buffer(1000).plot(fc='None', ec='r', ax=ax)
go_proj.centroid.buffer(2000).plot(fc='None', ec='r', ax=ax, linestyle='dashed')
go_proj.centroid.buffer(5000).plot(fc='None', ec='r', ax=ax, linestyle='-.')

pois_proj.plot(ax=ax, color='b', markersize=15, ec='w')
cx.add_basemap(ax=ax)

_ = ax.axis("off")

Polygons¶

In [16]:
import cenpy 
dectest = cenpy.products.Decennial2010()
gol_data = dectest.from_place('Goleta,CA', level='block', variables=['^P004', 'P001001'])

fig, ax = plt.subplots(figsize=(12,4))

gol_data.plot(column='P004003', cmap='Reds', ax=ax, fc='w', ec='k');
gol_data.plot(ax=ax, fc='None', ec='k');
Matched: Goleta,CA to Goleta city within layer Incorporated Places
C:\Users\noibar\AppData\Local\Temp\ipykernel_22332\2738113903.py:3: FutureWarning: The `op` parameter is deprecated and will be removed in a future release. Please use the `predicate` parameter instead.
  gol_data = dectest.from_place('Goleta,CA', level='block', variables=['^P004', 'P001001'])
In [20]:
from pysal.lib import weights

w_queen = weights.contiguity.Queen.from_dataframe(gol_data)
w_rook = weights.contiguity.Rook.from_dataframe(gol_data)

w_queen.neighbors[0]
C:\Users\noibar\anaconda3\envs\geo_env\lib\site-packages\libpysal\weights\weights.py:172: UserWarning: The weights matrix is not fully connected: 
 There are 11 disconnected components.
 There are 7 islands with ids: 86, 134, 144, 190, 232, 264, 276.
  warnings.warn(message)
C:\Users\noibar\anaconda3\envs\geo_env\lib\site-packages\libpysal\weights\weights.py:172: UserWarning: The weights matrix is not fully connected: 
 There are 12 disconnected components.
 There are 7 islands with ids: 86, 134, 144, 190, 232, 264, 276.
  warnings.warn(message)
Out[20]:
[322, 42, 146, 85, 182, 87, 184, 286]
In [21]:
import pandas as pd

df_q = pd.DataFrame(*w_queen.full()).astype(int)
df_r = pd.DataFrame(*w_rook.full()).astype(int)
print(df_q.shape) 
(365, 365)
In [23]:
import warnings
warnings.filterwarnings('ignore')

# Plot tract geography
fig, ax = plt.subplots(figsize=(4, 4))

minx, miny, maxx, maxy = go_proj.centroid.buffer(2000).total_bounds;

gol_data.plot(ax=ax, fc='w', ec='k');
w_queen.plot(gol_data,ax=ax, edge_kws=dict(color="r", linestyle=":", linewidth=1),
        node_kws=dict(marker=""));

ax.set_xlim(minx, maxx)
ax.set_ylim(miny, maxy)
Out[23]:
(4085497.682154004, 4089497.682154004)
In [24]:
import pandas as pd

fig, ax = plt.subplots(figsize=(12,4))

# check the number of neighbors
s = pd.Series(w_queen.cardinalities)
s2 = pd.Series(w_rook.cardinalities)
s.plot.hist(bins=s.unique().shape[0], ax=ax, color='b', alpha=.5);
s2.plot.hist(bins=s.unique().shape[0], ax=ax, color='r', alpha=.5);

ax.axvline(s.mean(), color='b', lw=2)
ax.axvline(s2.mean(), color='r', lw=2)
Out[24]:
<matplotlib.lines.Line2D at 0x20eeeff3be0>

Weight matrices¶

  • Contiguity-based
  • Distance-based
  • Kernel-based
In [25]:
knn4_w = weights.distance.KNN.from_dataframe(gol_data, k=4)

knn4_w.islands
Out[25]:
[]
In [26]:
knn4_w.histogram
Out[26]:
[(4, 365)]

Exercise¶

Create a weight matrix for the units below.

In [28]:
from IPython.display import Image
Image(filename='vor.png') 
Out[28]:

More examples on spatial weighs¶

For more examples check out the Chapter on Spatial Weights in Geographic Data Science

Issues with Spatial Data¶

  • Abstraction
  • Data errors and Uncertainty (think about GPS accuracy versus Bluetooth accuracy)
  • Independence Assumption
  • Measurement and methods
  • Boundary
  • MAUP and Ecological Fallacy

PROBLEM: Statistics requires random sample. Geographic samples are almost never random.¶

SOLUTION: Account for this and run tests.¶

Spatial Autocorrelation¶

  • Closer objects tend to be more similar to one another
  • Drawing samples from the same neighborhood is redundant (does not yield any new information)

Boundary¶

  • Spatial extent matters
    • Size
    • Shape
  • Where do we draw a boundary? Franklin County, Columbus MSA, and Columbus will have different population density, but are very close geographically
  • Edge effects, non-uniformity

Spatial Analysis Concern¶

  • Modifiable Area Unit Problem
    • Method is sensitive to spatial scale / unit definition
    • Changes in spatial representation might affect outcomes
  • Ecological Fallacy
    • erroneous conculsion made about individuals in the area based on aggregate characteristics of area

Source: GIS Lounge

Source: Swift, A., Liu, L., & Uber, J. (2008). Reducing MAUP bias of correlation statistics between water quality and GI illness. Computers, Environment and Urban Systems, 32(2), 134-148.

More examples on MAUP with Python Code¶

https://gist.github.com/darribas/8b5a7b93d4085223f1c5

Questions?¶