GEOG 172: INTERMEDIATE GEOGRAPHICAL ANALYSIS

Evgeny Noi

Lecture 10: Global Spatial Autocorrelation

Global Spatial Autocorrrelation¶

Moran's $I$ is just one measure of spatial autocorrelation. There are multiple SA measures.
Geary's $c$
Getis and Ord's $G$

Geary's $c$¶

$$ C = \dfrac{(n-1)} {2 \sum_i \sum_j w_{ij}} \dfrac{\sum_i \sum_j w_{ij} (y_i - y_{j})^2} {\sum_i (y_i - \bar{y})^2} $$

where $n$ is the number of observations, $w_{ij}$ is the cell of binary matrix, and $\bar{y}$ is the sample mean.

When compared to Moran’s I, it is apparent both measures compare the relationship of $Y$ within each observation's local neighborhood.
Moran’s I takes cross-products on the standardized values, Geary’s C uses differences on the values without any standardization.

In [1]:

from libpysal.examples import load_example
import geopandas as gpd
guerry_loc = load_example('Guerry')
guerry = gpd.read_file(guerry_loc.get_path('guerry.geojson'))
print(guerry.shape)
guerry.head()

Example not available: Guerry
Example not downloaded: Chicago parcels
Example not downloaded: Chile Migration
Example not downloaded: Spirals
(85, 24)

Out[1]:

	dept	Region	Dprtmnt	Crm_prs	Crm_prp	Litercy	Donatns	Infants	Suicids	MainCty	...	Infntcd	Dntn_cl	Lottery	Desertn	Instrct	Prsttts	Distanc	Area	Pop1831	geometry
0	1	E	Ain	28870	15890	37	5098	33120	35039	2	...	60	69	41	55	46	13	218.372	5762	346.03	MULTIPOLYGON (((801150.00000 2092615.00000, 80...
1	2	N	Aisne	26226	5521	51	8901	14572	12831	2	...	82	36	38	82	24	327	65.945	7369	513.00	MULTIPOLYGON (((729326.00000 2521619.00000, 72...
2	3	C	Allier	26747	7925	13	10973	17044	114121	2	...	42	76	66	16	85	34	161.927	7340	298.26	MULTIPOLYGON (((710830.00000 2137350.00000, 71...
3	4	E	Basses-Alpes	12935	7289	46	2733	23018	14238	1	...	12	37	80	32	29	2	351.399	6925	155.90	MULTIPOLYGON (((882701.00000 1920024.00000, 88...
4	5	E	Hautes-Alpes	17488	8174	69	6962	23076	16171	1	...	23	64	79	35	7	1	320.280	5549	129.10	MULTIPOLYGON (((886504.00000 1922890.00000, 88...

5 rows × 24 columns

In [2]:

from libpysal.weights.contiguity import Queen
import splot
from esda.moran import Moran

y = guerry['Crm_prp'].values
w = Queen.from_dataframe(guerry)
w.transform = 'r'

moran = Moran(y, w)
moran.I

Out[2]:

0.26355334031776523

Geary's C Permutation Inference¶

Inference is performed in a similar way as with Moran’s I.
Through simulation we can draw an empirical distribution of the statistic under the null of spatial randomness, and then compare it with the statistic obtained when using the observed geographical distribution of the data.
To access the pseudo p-value, calculated as in the Moran case, we can call p_sim:

In [3]:

from esda.geary import Geary

geary = Geary(guerry["Crm_prp"], w)
print(f"The Geary's C is {geary.C}")
print(f"The simulated pseudo p-value is {geary.p_sim}")

The Geary's C is 0.6778380960702041
The simulated pseudo p-value is 0.001

Getis & Ord's G¶

Global version of a family of statistics of spatial autocorrelation based on distance. Originally measure of 'concentration'
Originally conceived for points, thus we use distances in $W_{ij}$ (as opposed to binary)
It is designed for the study of positive variables with a natural origin

$$ G(d) = \dfrac{ \sum_i \sum_j w_{ij}(d) \, y_i \, y_j } { \sum_i \sum_j y_i \, y_j } $$

where $w_{ij}(d)$ is the binary weight assigned based on the distance bin.

Getis & Ord's G¶

IMPORTANT! Best suited to test to what extent similar values (either high or low) tend to co-locate. $G$ is a statistic of positive spatial autocorrelation. Thus the statistic is not able to pick up cases of negative spatial autocorrelation.

In [4]:

import pandas as pd
from pysal.lib import weights
pts = guerry.centroid
xys = pd.DataFrame({"X": pts.x, "Y": pts.y})
min_thr = weights.util.min_threshold_distance(xys)
min_thr

C:\Users\noibar\AppData\Local\Temp\ipykernel_6356\1545389804.py:3: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  pts = guerry.centroid

Out[4]:

96726.14323874969

Getis & Ord's G¶

Similarly, inference can also be carried out by relying on computational simulations that replicate several instances of spatial randomness using the values in the variable of interest, but shuffling their locations. In this case, the pseudo P-value computed suggests a clear departure from the hypothesis of no concentration.

In [5]:

from esda.getisord import G

w_db = weights.DistanceBand.from_dataframe(guerry, min_thr)
gao = G(guerry["Crm_prp"], w_db)
print(
    "Getis & Ord G: %.3f | Pseudo P-value: %.3f" % (gao.G, gao.p_sim)
)

Getis & Ord G: 0.049 | Pseudo P-value: 0.001

C:\Users\noibar\anaconda3\envs\geo_env\lib\site-packages\esda\getisord.py:167: RuntimeWarning: overflow encountered in longlong_scalars
  EG2DEN = ((sum(y) ** 2 - sum(y2)) ** 2) * n * (n - 1) * (n - 2) * (n - 3)

Bivariate Moran's¶

The concept of bivariate spatial correlation is complex and often misinterpreted. It is typically considered to be the correlation between one variable and the spatial lag of another variable, as originally implemented in the precursor of GeoDa (e.g., as described in Anselin, Syabri, and Smirnov 2002). However, this does not take into account the inherent correlation between the two variables. More precisely, the bivariate spatial correlation is between $x_i$ and $\Sigma_j w_{ij}y_{j}$, but does not take into account the correlation between $x_i$ and $y_i$, i.e., between the two variables at the same location. <...> As a result, this statistic is often interpreted incorrectly, as it may overestimate the spatial aspect of the correlation that instead may be due mostly to the in-place correlation.

source

Bivariate Moran's¶

In its initial conceptualization, as mentioned above, a bivariate Moran scatter plot extends the idea of a Moran scatter plot with a variable on the x-axis and its spatial lag on a y-axis to a bivariate context. The fundamental difference is that in the bivariate case the spatial lag pertains to a different variable.

$$ I_B = \frac{\sum_i (\sum_j w_{ij} y_j \times x_i)}{ \sum_i x_i^2}, $$

In [8]:

from esda.moran import Moran_BV

x = guerry['Litercy'].values
y = guerry['Crm_prp'].values

moran_bv = Moran_BV(y, x, w)
print(f"Bivariate Moran's I: {round(moran_bv.I, 3)}")

Bivariate Moran's I: -0.316

In [11]:

import scipy 
import matplotlib.pyplot as plt
import numpy as np

plt.scatter(x,y)
plt.plot(np.unique(x), np.poly1d(np.polyfit(x, y, 1))(np.unique(x)))

r, p = scipy.stats.pearsonr(x, y) 

In [12]:

from splot.esda import plot_moran_bv_simulation, plot_moran_bv
import matplotlib.pyplot as plt

plot_moran_bv(moran_bv)
plt.show()

C:\Users\noibar\anaconda3\envs\geo_env\lib\site-packages\splot\_viz_esda_mpl.py:628: FutureWarning: 

`shade` is now deprecated in favor of `fill`; setting `fill=True`.
This will become an error in seaborn v0.14.0; please update your code.

  sbn.kdeplot(moran_bv.sim, shade=shade, color=color, ax=ax, **kwargs)

In [13]:

from esda.moran import Moran_BV_matrix
from splot.esda import moran_facet
guerry_vars = guerry[['Crm_prs', 'Litercy', 'Donatns', 'Wealth']]
matrix = Moran_BV_matrix(guerry_vars, w)

moran_facet(matrix)
fig = plt.gcf()
fig.savefig('moran_facet.png')
plt.close()

In [14]:

from IPython.display import Image
Image(filename='moran_facet.png') 

Out[14]:

Even more types of autocorrelating one variable¶

Differential Moran's $I$
Moran's Scatter plot for EB rates
Spatial correlogram

Check out GeoDa documentation for more details.¶

GeoDa¶

Basic Mapping
E(S)DA
Spatial Weights
Global Spatial Autocorrelation
Local Spatial Autocorrelation
Clustering / Regionalization
Dimension reduction

GEOG 172: INTERMEDIATE GEOGRAPHICAL ANALYSIS

Evgeny Noi

Lecture 10: Global Spatial Autocorrelation

Global Spatial Autocorrrelation¶

Geary's $c$¶

Geary's C Permutation Inference¶

Getis & Ord's G¶

Getis & Ord's G¶

Getis & Ord's G¶

Bivariate Moran's¶

Bivariate Moran's¶

Bivariate Moran Facet Matrix¶

Even more types of autocorrelating one variable¶

Check out GeoDa documentation for more details.¶

GeoDa¶

GeoDa¶