{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
},
"colab": {
"name": "01 - GIS introduction with geopandas (vector data) #bigdive.ipynb",
"provenance": []
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "RcHoA7XG2k9X",
"colab_type": "text"
},
"source": [
"# GIS introduction with geopandas (vector data)\n",
"based on the scipy2018-geospatial tutorial\n",
"\n",
"## goals of the tutorial\n",
"- vector data and the ESRI Shapefile format\n",
"- the GeoDataFrame in geopandas\n",
"- spatial projections\n",
"\n",
"**based on the open data of:**\n",
"- [ISTAT](https://www.istat.it/it/archivio/222527), the Italian National Institute of Statistics\n",
"\n",
"### requirements\n",
"- python knowledge\n",
"- pandas\n",
"\n",
"### status \n",
"*\"The Earth isn't flat!!!\"*\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PJgnO67uDpcn",
"colab_type": "text"
},
"source": [
"## install geopandas"
]
},
{
"cell_type": "code",
"metadata": {
"id": "ETYot_2J2k9c",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 465
},
"outputId": "fd2436ad-80ec-4936-9be2-23505ad3239b"
},
"source": [
"!pip install geopandas"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Collecting geopandas\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/f7/a4/e66aafbefcbb717813bf3a355c8c4fc3ed04ea1dd7feb2920f2f4f868921/geopandas-0.8.1-py2.py3-none-any.whl (962kB)\n",
"\u001b[K |████████████████████████████████| 972kB 2.8MB/s \n",
"\u001b[?25hRequirement already satisfied: shapely in /usr/local/lib/python3.6/dist-packages (from geopandas) (1.7.1)\n",
"Collecting fiona\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/36/8b/e8b2c11bed5373c8e98edb85ce891b09aa1f4210fd451d0fb3696b7695a2/Fiona-1.8.17-cp36-cp36m-manylinux1_x86_64.whl (14.8MB)\n",
"\u001b[K |████████████████████████████████| 14.8MB 297kB/s \n",
"\u001b[?25hRequirement already satisfied: pandas>=0.23.0 in /usr/local/lib/python3.6/dist-packages (from geopandas) (1.0.5)\n",
"Collecting pyproj>=2.2.0\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/e5/c3/071e080230ac4b6c64f1a2e2f9161c9737a2bc7b683d2c90b024825000c0/pyproj-2.6.1.post1-cp36-cp36m-manylinux2010_x86_64.whl (10.9MB)\n",
"\u001b[K |████████████████████████████████| 10.9MB 41.6MB/s \n",
"\u001b[?25hRequirement already satisfied: click<8,>=4.0 in /usr/local/lib/python3.6/dist-packages (from fiona->geopandas) (7.1.2)\n",
"Collecting click-plugins>=1.0\n",
" Downloading https://files.pythonhosted.org/packages/e9/da/824b92d9942f4e472702488857914bdd50f73021efea15b4cad9aca8ecef/click_plugins-1.1.1-py2.py3-none-any.whl\n",
"Requirement already satisfied: attrs>=17 in /usr/local/lib/python3.6/dist-packages (from fiona->geopandas) (20.2.0)\n",
"Requirement already satisfied: six>=1.7 in /usr/local/lib/python3.6/dist-packages (from fiona->geopandas) (1.15.0)\n",
"Collecting cligj>=0.5\n",
" Downloading https://files.pythonhosted.org/packages/e4/be/30a58b4b0733850280d01f8bd132591b4668ed5c7046761098d665ac2174/cligj-0.5.0-py3-none-any.whl\n",
"Collecting munch\n",
" Downloading https://files.pythonhosted.org/packages/cc/ab/85d8da5c9a45e072301beb37ad7f833cd344e04c817d97e0cc75681d248f/munch-2.5.0-py2.py3-none-any.whl\n",
"Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib/python3.6/dist-packages (from pandas>=0.23.0->geopandas) (1.18.5)\n",
"Requirement already satisfied: python-dateutil>=2.6.1 in /usr/local/lib/python3.6/dist-packages (from pandas>=0.23.0->geopandas) (2.8.1)\n",
"Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.6/dist-packages (from pandas>=0.23.0->geopandas) (2018.9)\n",
"Installing collected packages: click-plugins, cligj, munch, fiona, pyproj, geopandas\n",
"Successfully installed click-plugins-1.1.1 cligj-0.5.0 fiona-1.8.17 geopandas-0.8.1 munch-2.5.0 pyproj-2.6.1.post1\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "i528d6Nu2k9-",
"colab_type": "text"
},
"source": [
"---"
]
},
{
"cell_type": "code",
"metadata": {
"id": "RDCQpfag2k-D",
"colab_type": "code",
"colab": {}
},
"source": [
"import os\n",
"import geopandas as gpd"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "djyLoZlL2k-p",
"colab_type": "text"
},
"source": [
"# Let's start with GeoPandas\n",
"\n",
"## Importing geospatial data\n",
"\n",
"geopandas reads vector data through fiona, which is built on the GDAL/OGR library, so it supports most of the vector formats listed here:\n",
"\n",
"https://www.gdal.org/ogr_formats.html\n",
"\n",
"we will work with the geospatial data published by ISTAT:\n",
"\n",
"https://www.istat.it/it/archivio/104317\n",
"\n",
"\n",
"### administrative borders\n",
"https://www.istat.it/it/archivio/222527\n",
"\n",
"the complete archive with all the boundaries (year 2020):\n",
"\n",
"http://www.istat.it/storage/cartografia/confini_amministrativi/generalizzati/Limiti01012020.zip"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "P9BLW56_2k-q",
"colab_type": "text"
},
"source": [
"---\n",
"#### download and investigate the data"
]
},
{
"cell_type": "code",
"metadata": {
"id": "4Y8VjbcS2k-t",
"colab_type": "code",
"colab": {}
},
"source": [
"if not os.path.exists('Limiti01012020'):\n",
"    # download the data only if the folder is not there yet\n",
"    import requests, zipfile, io\n",
"    zip_file_url = 'http://www.istat.it/storage/cartografia/confini_amministrativi/generalizzati/Limiti01012020.zip'\n",
"    # request the file and stop if the download failed\n",
"    r = requests.get(zip_file_url)\n",
"    r.raise_for_status()\n",
"    # open the zip archive in memory and extract it into the current folder\n",
"    z = zipfile.ZipFile(io.BytesIO(r.content))\n",
"    z.extractall()\n"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "5uRgjWv-EJnc",
"colab_type": "text"
},
"source": [
"Directory listing"
]
},
{
"cell_type": "code",
"metadata": {
"id": "NdlPDnUv2k-4",
"colab_type": "code",
"colab": {}
},
"source": [
"os.listdir(\".\")"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "kQfg5pGB2k_F",
"colab_type": "code",
"colab": {}
},
"source": [
"os.listdir('Limiti01012020')"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "kjEvvecf2k_S",
"colab_type": "text"
},
"source": [
"Limiti01012020 => main folder with all the administrative borders of Italy in 2020\n",
"- ProvCM01012020 => folder with the provinces of Italy\n",
"- Reg01012020 => folder with the regions of Italy\n",
"- RipGeo01012020 => folder with the macro-regions of Italy\n",
"- Com01012020 => folder with the municipalities of Italy\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "iSNsqfAT2k_V",
"colab_type": "text"
},
"source": [
"#### Inspect the macro-regions\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "ol9efFyJ2k_W",
"colab_type": "code",
"colab": {}
},
"source": [
"# move into the folder with the macro-region files\n",
"# (run this cell only once: the path is relative to the current folder)\n",
"os.chdir(os.path.join('Limiti01012020', 'RipGeo01012020'))"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "sqGx69vL2k_f",
"colab_type": "code",
"colab": {}
},
"source": [
"# list only the files (not the folders)\n",
"for root, dirs, files in os.walk(\".\"): \n",
" for filename in files:\n",
" print(filename)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "v9UDse3y2k_r",
"colab_type": "text"
},
"source": [
"### ESRI Shapefile\n",
"\n",
"This is an *ESRI Shapefile*, an old but commonly used format for geospatial vector data.\n",
"\n",
"The format is proprietary, but the specification of its main files is public.\n",
"An \"ESRI Shapefile\" is a collection of files that share the same name and differ only in their extension.\n",
"\n",
"The main extensions are:\n",
"\n",
"| extension | meaning | content of the file |\n",
"| --------- | ------------- | -------------------------------------------------------------------- |\n",
"| .shp | shape | the geometries (point, line, polygon) |\n",
"| .dbf | database file | the attributes of each geometry, one record per shape in the same order |\n",
"| .shx | shape index | the positional index of the geometries, to find each record in the .shp quickly |\n",
"| .prj | projection | the coordinate reference system of the geometries, written in WKT |\n",
"\n",
"Three files are _mandatory_ (*.shp*, *.shx* and *.dbf*); the *.prj* file is optional, but it is crucial for analysing the data together with other sources.\n",
"\n",
"Other optional files can also be present, e.g. *.cpg*, which declares the character encoding of the *.dbf*.\n",
"\n",
"More information is available here:\n",
"\n",
"https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf\n"
]
},
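{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch, using only the Python standard library, of how the files in a folder can be grouped by name to check that each shapefile has its three mandatory components. The function name `check_shapefiles` is hypothetical, and the check looks only at the file extensions, not at the file contents.\n",
"\n",
"```python\n",
"import os\n",
"from collections import defaultdict\n",
"\n",
"MANDATORY = {'.shp', '.shx', '.dbf'}\n",
"\n",
"def check_shapefiles(folder):\n",
"    # group the files of the folder by name, collecting their extensions\n",
"    groups = defaultdict(set)\n",
"    for filename in os.listdir(folder):\n",
"        name, ext = os.path.splitext(filename)\n",
"        groups[name].add(ext.lower())\n",
"    report = {}\n",
"    for name, exts in groups.items():\n",
"        # only names with a .shp file are treated as shapefiles\n",
"        if '.shp' not in exts:\n",
"            continue\n",
"        report[name] = {\n",
"            'missing': sorted(MANDATORY - exts),\n",
"            'has_prj': '.prj' in exts,\n",
"        }\n",
"    return report\n",
"\n",
"# e.g. on the current folder\n",
"print(check_shapefiles('.'))\n",
"```"
]
},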
{
"cell_type": "markdown",
"metadata": {
"id": "7_jIxrVS2k_u",
"colab_type": "text"
},
"source": [
"**read the file with geopandas**"
]
},
{
"cell_type": "code",
"metadata": {
"id": "IBTGAbcE2k_v",
"colab_type": "code",
"colab": {}
},
"source": [
"# read the shapefile into a GeoDataFrame\n",
"macroregions = gpd.read_file('RipGeo01012020_WGS84.shp')"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "VWZer03C2k_5",
"colab_type": "code",
"colab": {}
},
"source": [
"type(macroregions)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZWbCtsxc2lAD",
"colab_type": "text"
},
"source": [
"### GeoDataframe\n",
"\n",
"geopandas loads the data into a [GeoDataFrame](http://geopandas.org/data_structures.html#geodataframe).\n",
"\n",
"A GeoDataFrame is a pandas [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) with a special \"geometry\" column (a GeoSeries) and additional geospatial methods."
]
},
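{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see that a GeoDataFrame is a DataFrame plus a geometry column, here is a minimal sketch that builds one by hand from shapely geometries; the column name and the two squares are made up for the example.\n",
"\n",
"```python\n",
"import geopandas as gpd\n",
"from shapely.geometry import Polygon\n",
"\n",
"# two made-up squares, with one attribute column\n",
"squares = gpd.GeoDataFrame(\n",
"    {'name': ['small', 'big']},\n",
"    geometry=[\n",
"        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),\n",
"        Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),\n",
"    ],\n",
")\n",
"\n",
"# the usual pandas API still works...\n",
"print(squares['name'].tolist())\n",
"# ...and the geometry column adds the spatial methods\n",
"print(squares.geometry.area.tolist())\n",
"```"
]
},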
{
"cell_type": "code",
"metadata": {
"id": "881_roIL2lAF",
"colab_type": "code",
"colab": {}
},
"source": [
"macroregions"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "YrMy2Bcg2lAP",
"colab_type": "text"
},
"source": [
"E.g.\n",
"calculate the area of each geometry (the result is in the units of the coordinate reference system, discussed below)"
]
},
{
"cell_type": "code",
"metadata": {
"id": "XALEDsAb2lAS",
"colab_type": "code",
"colab": {}
},
"source": [
"macroregions.geometry.area"
],
"execution_count": null,
"outputs": []
},
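{
"cell_type": "markdown",
"metadata": {},
"source": [
"`area` is a planar computation on the raw coordinates, so its unit is the unit of the CRS: square metres for a projected CRS such as UTM, but meaningless *square degrees* for latitude/longitude. A minimal sketch of the planar shoelace formula for a simple polygon, in plain Python; for such a polygon it should match shapely's `area` (the function name `shoelace_area` is hypothetical).\n",
"\n",
"```python\n",
"def shoelace_area(ring):\n",
"    # ring: list of (x, y) vertices of a simple polygon, first vertex not repeated\n",
"    total = 0.0\n",
"    n = len(ring)\n",
"    for i in range(n):\n",
"        x1, y1 = ring[i]\n",
"        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring\n",
"        total += x1 * y2 - x2 * y1\n",
"    # the sign depends on the orientation, so keep the absolute value\n",
"    return abs(total) / 2.0\n",
"\n",
"# a 3 x 2 rectangle in metres covers 6 square metres\n",
"print(shoelace_area([(0, 0), (3, 0), (3, 2), (0, 2)]))\n",
"```"
]
},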
{
"cell_type": "markdown",
"metadata": {
"id": "7vkrcV5j2lAc",
"colab_type": "text"
},
"source": [
"**you can plot it**"
]
},
{
"cell_type": "code",
"metadata": {
"id": "TTAF2UCs2lAe",
"colab_type": "code",
"colab": {}
},
"source": [
"macroregions.plot(figsize=(10,10))"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "rorgHLgg2lAn",
"colab_type": "text"
},
"source": [
"... and use the **classic methods of the pandas DataFrame.**\n",
"\n",
"E.g.\n",
"\n",
"extract a (Geo)DataFrame by filtering on an attribute"
]
},
{
"cell_type": "code",
"metadata": {
"id": "VPsZfibd2lAo",
"colab_type": "code",
"colab": {}
},
"source": [
"isole = macroregions[macroregions['DEN_RIP'] == 'Isole']"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "Ih-pqOIj2lAy",
"colab_type": "code",
"colab": {}
},
"source": [
"isole.plot()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "p-EzUSjX2lA6",
"colab_type": "text"
},
"source": [
"In an ESRI Shapefile all the features share the same geometry type, whereas a GeoDataFrame can hold a different geometry type in each row."
]
},
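{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of a GeoDataFrame mixing a point, a line and a polygon in the same geometry column, something a single shapefile layer could not store; the data are invented for the example.\n",
"\n",
"```python\n",
"import geopandas as gpd\n",
"from shapely.geometry import Point, LineString, Polygon\n",
"\n",
"mixed = gpd.GeoDataFrame(\n",
"    {'kind': ['a point', 'a line', 'a polygon']},\n",
"    geometry=[\n",
"        Point(0, 0),\n",
"        LineString([(0, 0), (1, 1)]),\n",
"        Polygon([(0, 0), (1, 0), (1, 1)]),\n",
"    ],\n",
")\n",
"\n",
"# one geometry type per row\n",
"print(mixed.geom_type.tolist())\n",
"```"
]
},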
{
"cell_type": "code",
"metadata": {
"id": "L34gC60U2lA7",
"colab_type": "code",
"colab": {}
},
"source": [
"macroregions.geom_type"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "8-iew2aa2lBE",
"colab_type": "text"
},
"source": [
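"In our case we have MultiPolygons.\n",
"The geometry types allowed are:\n",
"\n",
"| geometry | description |\n",
"|:--|:--|\n",
"| POINT | a single position |\n",
"| LINESTRING | a sequence of points joined by straight segments |\n",
"| LINEARRING | a closed LineString (first point = last point) |\n",
"| POLYGON | an exterior ring, optionally with interior rings (holes) |\n",
"| MULTIPOINT | a collection of points |\n",
"| MULTILINESTRING | a collection of LineStrings |\n",
"| MULTIPOLYGON | a collection of Polygons |\n",
"| GEOMETRYCOLLECTION | a collection of geometries of any type |\n",
"\n",
"Note: table based on the Wikipedia page [WKT](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry)\n"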
]
},
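{
"cell_type": "markdown",
"metadata": {},
"source": [
"WKT is plain text, so geometries can be written and parsed with shapely. A minimal round trip, with a made-up multipolygon made of two unit squares:\n",
"\n",
"```python\n",
"from shapely import wkt\n",
"\n",
"text = 'MULTIPOLYGON (((0 0, 1 0, 1 1, 0 1, 0 0)), ((2 0, 3 0, 3 1, 2 1, 2 0)))'\n",
"geom = wkt.loads(text)  # parse the WKT into a shapely geometry\n",
"\n",
"print(geom.geom_type)   # the type named in the WKT\n",
"print(len(geom.geoms))  # number of polygons in the collection\n",
"print(geom.area)        # total area of the two unit squares\n",
"print(geom.wkt)         # serialise it back to WKT\n",
"```"
]
},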
{
"cell_type": "markdown",
"metadata": {
"id": "9sIjat2e2lBF",
"colab_type": "text"
},
"source": [
"#### Now we are ready to look at the geometries"
]
},
{
"cell_type": "code",
"metadata": {
"id": "MLNkTOTE2lBG",
"colab_type": "code",
"colab": {}
},
"source": [
"macroregions.geometry[0]"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "zt9ni3-N2lBO",
"colab_type": "code",
"colab": {}
},
"source": [
"macroregions.DEN_RIP[0]"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "gn0Pb7R12lBW",
"colab_type": "code",
"colab": {}
},
"source": [
"macroregions.geometry[1]"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "LWa8Cst_2lBd",
"colab_type": "code",
"colab": {}
},
"source": [
"macroregions.DEN_RIP[1]"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "626pbzf72lBj",
"colab_type": "code",
"colab": {}
},
"source": [
"macroregions.geometry[2]"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "mku7VLw92lBr",
"colab_type": "code",
"colab": {}
},
"source": [
"macroregions.DEN_RIP[2]"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "wd5RNYpo2lBx",
"colab_type": "code",
"colab": {}
},
"source": [
"macroregions.geometry[3]"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "HKmlh4wl2lB4",
"colab_type": "code",
"colab": {}
},
"source": [
"macroregions.geometry[4]"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "JWINHkgX2lB8",
"colab_type": "code",
"colab": {}
},
"source": [
"macroregions.DEN_RIP[4]"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "HQjiV1mZ2lCH",
"colab_type": "text"
},
"source": [
"**in the shapely preview, the red color means an invalid geometry!!!**\n",
"##### ... and we can check it!"
]
},
{
"cell_type": "code",
"metadata": {
"id": "uzdcGL1H2lCI",
"colab_type": "code",
"colab": {}
},
"source": [
"macroregions.geometry.is_valid"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "xe08t9SsGI3X",
"colab_type": "code",
"colab": {}
},
"source": [
"# a zero-width buffer is a common trick to rebuild an invalid polygon as a valid one\n",
"macroregions.geometry[4].buffer(0)"
],
"execution_count": null,
"outputs": []
},
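{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch reproducing the problem on a made-up 'bow-tie' polygon, whose boundary crosses itself. `explain_validity` (from `shapely.validation`) reports why a geometry is invalid; `buffer(0)` returns a valid geometry, but for self-intersecting shapes it can drop part of the original area, so it is worth comparing areas before and after the fix.\n",
"\n",
"```python\n",
"from shapely.geometry import Polygon\n",
"from shapely.validation import explain_validity\n",
"\n",
"# the ring goes (0,0) -> (2,2) -> (2,0) -> (0,2): two edges cross at (1,1)\n",
"bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])\n",
"print(bowtie.is_valid)           # False\n",
"print(explain_validity(bowtie))  # a message about the self-intersection\n",
"\n",
"fixed = bowtie.buffer(0)         # zero-width buffer rebuilds a valid geometry\n",
"print(fixed.is_valid)            # True\n",
"print(fixed.area)                # compare with the area of the two triangles (2.0)\n",
"```\n",
"\n",
"Newer shapely releases (1.8 and later) also provide `shapely.validation.make_valid`, which tries to keep all the parts of the original geometry."
]
},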
{
"cell_type": "markdown",
"metadata": {
"id": "oK4VI9AM2lCN",
"colab_type": "text"
},
"source": [
"#### Do you want to know the centroid of each geometry?"
]
},
{
"cell_type": "code",
"metadata": {
"id": "JG3nAhUp2lCN",
"colab_type": "code",
"colab": {}
},
"source": [
"macroregions.geometry.centroid"
],
"execution_count": null,
"outputs": []
},
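{
"cell_type": "markdown",
"metadata": {},
"source": [
"The centroid is the centre of mass of the shape, so for a concave shape it can fall outside the polygon. A minimal sketch with a made-up U-shaped polygon; `representative_point()` is the shapely method that instead returns a point guaranteed to lie inside the geometry (useful e.g. for placing labels).\n",
"\n",
"```python\n",
"from shapely.geometry import Polygon\n",
"\n",
"# a made-up U shape: a 3 x 3 square with a notch cut from the top middle\n",
"u_shape = Polygon([(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)])\n",
"\n",
"c = u_shape.centroid\n",
"p = u_shape.representative_point()\n",
"\n",
"print(c.x, c.y)             # about 1.5 and 1.36: inside the notch\n",
"print(u_shape.contains(c))  # False: the centroid is outside the shape\n",
"print(u_shape.contains(p))  # True: guaranteed to lie inside\n",
"```"
]
},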
{
"cell_type": "markdown",
"metadata": {
"id": "ArKCUxnR2lCT",
"colab_type": "text"
},
"source": [
"The geometries are printed in [Well-Known Text (WKT) format](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry).\n",
"\n",
"But... how are the coordinates expressed??\n",
"To know that, we need the Coordinate Reference System (CRS)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "yH2upKTF2lCV",
"colab_type": "text"
},
"source": [
"## The true size\n",
"\n",
"\n",
"https://thetruesize.com/"
]
},
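{
"cell_type": "markdown",
"metadata": {},
"source": [
"thetruesize.com shows how the Mercator projection, used by most web maps, inflates areas far from the equator. On a spherical Mercator map the linear scale factor at latitude φ is 1/cos φ, so areas are inflated by 1/cos² φ. A minimal sketch of this formula in plain Python (the function name `mercator_area_factor` is hypothetical):\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def mercator_area_factor(lat_deg):\n",
"    # linear scale factor of the Mercator projection on a sphere: 1 / cos(lat)\n",
"    k = 1.0 / math.cos(math.radians(lat_deg))\n",
"    # areas scale with the square of the linear factor\n",
"    return k * k\n",
"\n",
"for lat in (0, 45, 60, 70):\n",
"    print(lat, round(mercator_area_factor(lat), 2))\n",
"```\n",
"\n",
"At 60° N, roughly the latitude of Oslo and Helsinki, areas look four times larger than at the equator."
]
},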
{
"cell_type": "markdown",
"metadata": {
"id": "_WGDUJt12lCW",
"colab_type": "text"
},
"source": [
"# SPATIAL PROJECTIONS\n",
"\n",
"**CRS** = *Coordinate Reference System*"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "A_fmf24U2lCX",
"colab_type": "text"
},
"source": [
"## How to convert to latitude/longitude?"
]
},
{
"cell_type": "code",
"metadata": {
"id": "9y_0IEny2lCY",
"colab_type": "code",
"colab": {}
},
"source": [
"macroregions.crs"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "dMGysEI32lCd",
"colab_type": "text"
},
"source": [
"## EPSG?\n",
"European Petroleum Survey Group (1986-2005)\n",
"\n",
"[IOGP](https://www.iogp.org/about-us/) - International Association of Oil & Gas Producers (2005-now)\n",
"\n",
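"An important project is the [EPSG registry](http://www.epsg-registry.org/), a dataset of geodetic parameters in which each coordinate reference system is identified by a numeric code (e.g. EPSG:32632 is WGS 84 / UTM zone 32N)\n",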
"\n",
"http://epsg.io/32632\n",
""
]
},
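{
"cell_type": "markdown",
"metadata": {},
"source": [
"An EPSG code can be expanded into a full CRS definition with pyproj (installed above as a geopandas dependency); since geopandas 0.7, `.crs` is itself a `pyproj.CRS` object. A minimal sketch inspecting EPSG:32632, the code linked above:\n",
"\n",
"```python\n",
"from pyproj import CRS\n",
"\n",
"utm32n = CRS.from_epsg(32632)\n",
"\n",
"print(utm32n.name)          # the human-readable name of the CRS\n",
"print(utm32n.is_projected)  # True: planar coordinates, in metres\n",
"print(utm32n.to_epsg())     # back to the numeric code\n",
"print(utm32n.axis_info)     # the axes and their units\n",
"```"
]
},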
{
"cell_type": "code",
"metadata": {
"id": "pfYfEdPm2lCf",
"colab_type": "code",
"colab": {}
},
"source": [
"# compute the centroids in the projected CRS (metres), then convert them to latitude/longitude\n",
"macroregions.geometry.centroid.to_crs(epsg=4326)"
],
"execution_count": null,
"outputs": []
},
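{
"cell_type": "markdown",
"metadata": {},
"source": [
"`to_crs` reprojects a whole GeoDataFrame; a single coordinate pair can be converted with pyproj's `Transformer`. A minimal sketch: `always_xy=True` keeps the (x = longitude, y = latitude) order, because EPSG:4326 officially lists latitude first. The sample point is chosen for its easy answer: in UTM zone 32N the easting 500000 m lies on the central meridian of the zone (9° E) and the northing 0 m on the equator.\n",
"\n",
"```python\n",
"from pyproj import Transformer\n",
"\n",
"# from WGS 84 / UTM zone 32N to WGS 84 longitude/latitude\n",
"to_lonlat = Transformer.from_crs('EPSG:32632', 'EPSG:4326', always_xy=True)\n",
"\n",
"lon, lat = to_lonlat.transform(500000, 0)\n",
"print(lon, lat)  # approximately 9.0 and 0.0\n",
"```"
]
},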
{
"cell_type": "code",
"metadata": {
"id": "qysf6w-w2lCk",
"colab_type": "code",
"colab": {}
},
"source": [
"macroregions.to_crs(epsg=4326).plot(figsize=(10, 10))"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "kZkt6qfn2lCq",
"colab_type": "code",
"colab": {}
},
"source": [
"# the original data, in WGS 84 / UTM zone 32N (EPSG:32632)\n",
"macroregions.plot(figsize=(10,10))"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "jKPjCeIt2lC1",
"colab_type": "text"
},
"source": [
"\n",
"### WGS84 VS ETRS89\n",
"\n",
"| [WGS84](https://epsg.io/4326) | [ETRS89](https://epsg.io/4258) |\n",
"|---|---|\n",
"| EPSG:4326, area of use: world | EPSG:4258, area of use: Europe |"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "w_7rmHa-2lC2",
"colab_type": "text"
},
"source": [
"## exploring a .prj file"
]
},
{
"cell_type": "code",
"metadata": {
"id": "GcZ3tDKn2lC3",
"colab_type": "code",
"colab": {}
},
"source": [
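"# read the projection file; the with-block closes it automatically\n",
"with open('RipGeo01012020_WGS84.prj', 'r') as f:\n",
"    prj_wkt = f.read()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "_MFmsoca2lC9",
"colab_type": "code",
"colab": {}
},
"source": [
"prj_wkt"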
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "EeNhHdDu2lDC",
"colab_type": "text"
},
"source": [
"... the same kind of WKT definition published here:\n",
"http://epsg.io/32632.wkt\n",
"\n",
"http://epsg.io/32632.prettywkt\n",
"\n"
]
},
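{
"cell_type": "markdown",
"metadata": {},
"source": [
"The WKT in a .prj file can be handed to pyproj to find the matching EPSG code. A minimal sketch; the helper name `epsg_from_prj` is hypothetical, and `to_epsg()` may return None when the .prj text (often written in the Esri WKT dialect) does not match an EPSG entry with enough confidence, in which case lowering `min_confidence` can help.\n",
"\n",
"```python\n",
"import os\n",
"from pyproj import CRS\n",
"\n",
"def epsg_from_prj(path, min_confidence=70):\n",
"    # read the WKT text stored in the .prj file\n",
"    with open(path, 'r') as f:\n",
"        wkt_text = f.read()\n",
"    # parse the WKT and look for the closest EPSG code (None if no good match)\n",
"    return CRS.from_wkt(wkt_text).to_epsg(min_confidence=min_confidence)\n",
"\n",
"# e.g. on the file opened above, if it is in the current folder\n",
"if os.path.exists('RipGeo01012020_WGS84.prj'):\n",
"    print(epsg_from_prj('RipGeo01012020_WGS84.prj'))\n",
"```"
]
},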
{
"cell_type": "markdown",
"metadata": {
"id": "iuDKufVM2lDD",
"colab_type": "text"
},
"source": [
"