{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Structures" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "import pandas\n", "import geopandas\n", "import rasterio\n", "from rasterio.plot import show as rioshow" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Contents\n", "\n", "* [Core data types](#Core-data-types)\n", "* [Reading (spatial) data](#Reading-(spatial)-data)\n", "* [`(Geo)DataFrames`](#(Geo)DataFrames)\n", "* [`Series`](#Series)\n", "* [The `geometry` column](#The-geometry-column)\n", "    * [CRS](#CRS)\n", "    * [Geometries](#Geometries)\n", "    * [Geometric operations](#Geometric-operations)\n", "* [A note on rasters](#A-note-on-rasters)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Core data types" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "Core:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "int" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(1)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "float" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(1.0)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "str" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type('a')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "str" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type('hello world!')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Extensions:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Timestamp('2019-11-05 09:00:00')" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pandas.to_datetime(\"2019-11-05 9:00\")" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Apples, Oranges]\n", "Categories (2, object): [Apples, Oranges]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pandas.Categorical([\"Apples\", \"Oranges\"])" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from shapely.geometry import Point\n", "\n", "Point(-0.08947918950509948, 51.49441830214852)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading (spatial) 
data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For non-spatial data, we use `pandas` and its `read_XXX` functions. Have a peek at what's available by typing `pandas.read_` and pressing `TAB`; auto-completion will show you all supported file formats." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For spatial data, `geopandas` *extends* `pandas` functionality to support vector spatial data. Let's illustrate its main `read_file` function with a dataset of aggregate statistics for the communes of Luxembourg:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "db = geopandas.read_file('../data/lux_regions.gpkg')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `(Geo)DataFrames`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you read a multi-column tabular file, a `DataFrame` is created. If that table contains spatial information and is read with `geopandas`, you get a `GeoDataFrame`:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "geopandas.geodataframe.GeoDataFrame" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(db)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Both data structures are very similar and are modeled after tables in relational (SQL) databases (and not completely unlike an Excel spreadsheet!). Let's print the top (\"head\") of the table to inspect its contents:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>POPULATION</th><th>COMMUNE_1</th><th>LAU2</th><th>_subtype</th><th>COMMUNE</th><th>DISTRICT</th><th>CANTON</th><th>tree_count</th><th>ghsl_pop</th><th>light_level</th><th>geometry</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>10750</td><td>Schifflange</td><td>0214</td><td>Inspire_10072018</td><td>Schifflange</td><td>Luxembourg</td><td>Esch-sur-Alzette</td><td>3.0</td><td>7699.378065</td><td>681.0</td><td>POLYGON ((5.999664443592493 49.51861834298563,...</td></tr>\n",
"<tr><th>1</th><td>1265</td><td>Bech</td><td>1002</td><td>Inspire_10072018</td><td>Bech</td><td>Grevenmacher</td><td>Echternach</td><td>4.0</td><td>1156.143936</td><td>567.0</td><td>POLYGON ((6.347791382484162 49.76015684859829,...</td></tr>\n",
"<tr><th>2</th><td>35040</td><td>Esch-sur-Alzette</td><td>0204</td><td>Inspire_10072018</td><td>Esch-sur-Alzette</td><td>Luxembourg</td><td>Esch-sur-Alzette</td><td>0.0</td><td>30072.804172</td><td>1370.0</td><td>POLYGON ((5.966652609366625 49.51287419342548,...</td></tr>\n",
"<tr><th>3</th><td>8169</td><td>Walferdange</td><td>0310</td><td>Inspire_10072018</td><td>Walferdange</td><td>Luxembourg</td><td>Luxembourg</td><td>4.0</td><td>7496.442968</td><td>730.0</td><td>POLYGON ((6.125796013396808 49.66347171771235,...</td></tr>\n",
"<tr><th>4</th><td>9440</td><td>Mersch</td><td>0409</td><td>Inspire_10072018</td><td>Mersch</td><td>Luxembourg</td><td>Mersch</td><td>9.0</td><td>8168.835842</td><td>2388.0</td><td>POLYGON ((6.08267171189537 49.77458013764257, ...</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " POPULATION COMMUNE_1 LAU2 _subtype COMMUNE \\\n", "0 10750 Schifflange 0214 Inspire_10072018 Schifflange \n", "1 1265 Bech 1002 Inspire_10072018 Bech \n", "2 35040 Esch-sur-Alzette 0204 Inspire_10072018 Esch-sur-Alzette \n", "3 8169 Walferdange 0310 Inspire_10072018 Walferdange \n", "4 9440 Mersch 0409 Inspire_10072018 Mersch \n", "\n", " DISTRICT CANTON tree_count ghsl_pop light_level \\\n", "0 Luxembourg Esch-sur-Alzette 3.0 7699.378065 681.0 \n", "1 Grevenmacher Echternach 4.0 1156.143936 567.0 \n", "2 Luxembourg Esch-sur-Alzette 0.0 30072.804172 1370.0 \n", "3 Luxembourg Luxembourg 4.0 7496.442968 730.0 \n", "4 Luxembourg Mersch 9.0 8168.835842 2388.0 \n", "\n", " geometry \n", "0 POLYGON ((5.999664443592493 49.51861834298563,... \n", "1 POLYGON ((6.347791382484162 49.76015684859829,... \n", "2 POLYGON ((5.966652609366625 49.51287419342548,... \n", "3 POLYGON ((6.125796013396808 49.66347171771235,... \n", "4 POLYGON ((6.08267171189537 49.77458013764257, ... " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Other quick exploratory methods:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 102 entries, 0 to 101\n", "Data columns (total 11 columns):\n", "POPULATION 102 non-null int64\n", "COMMUNE_1 102 non-null object\n", "LAU2 102 non-null object\n", "_subtype 102 non-null object\n", "COMMUNE 102 non-null object\n", "DISTRICT 102 non-null object\n", "CANTON 102 non-null object\n", "tree_count 102 non-null float64\n", "ghsl_pop 102 non-null float64\n", "light_level 102 non-null float64\n", "geometry 102 non-null object\n", "dtypes: float64(3), int64(1), object(7)\n", "memory usage: 8.9+ KB\n" ] } ], "source": [ "db.info()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { 
"text/plain": [ "(102, 11)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.shape" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>POPULATION</th><th>tree_count</th><th>ghsl_pop</th><th>light_level</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>count</th><td>102.000000</td><td>102.000000</td><td>102.000000</td><td>102.000000</td></tr>\n",
"<tr><th>mean</th><td>5902.009804</td><td>5.245098</td><td>5542.161666</td><td>1076.931373</td></tr>\n",
"<tr><th>std</th><td>12231.243923</td><td>5.627789</td><td>11155.324437</td><td>749.838811</td></tr>\n",
"<tr><th>min</th><td>790.000000</td><td>0.000000</td><td>815.144388</td><td>299.000000</td></tr>\n",
"<tr><th>25%</th><td>1915.000000</td><td>2.000000</td><td>1726.167268</td><td>628.750000</td></tr>\n",
"<tr><th>50%</th><td>2937.500000</td><td>4.000000</td><td>3072.414461</td><td>896.500000</td></tr>\n",
"<tr><th>75%</th><td>5489.000000</td><td>7.000000</td><td>5190.177861</td><td>1255.250000</td></tr>\n",
"<tr><th>max</th><td>116323.000000</td><td>43.000000</td><td>106143.956825</td><td>5614.000000</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " POPULATION tree_count ghsl_pop light_level\n", "count 102.000000 102.000000 102.000000 102.000000\n", "mean 5902.009804 5.245098 5542.161666 1076.931373\n", "std 12231.243923 5.627789 11155.324437 749.838811\n", "min 790.000000 0.000000 815.144388 299.000000\n", "25% 1915.000000 2.000000 1726.167268 628.750000\n", "50% 2937.500000 4.000000 3072.414461 896.500000\n", "75% 5489.000000 7.000000 5190.177861 1255.250000\n", "max 116323.000000 43.000000 106143.956825 5614.000000" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `Series`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`DataFrames` are two-dimensional array-like structures (think a matrix but with mixed types), and are \"made up\" of `Series`, which are one-dimensional objects (think of vectors). " ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 10750\n", "1 1265\n", "2 35040\n", "3 8169\n", "4 9440\n", "Name: POPULATION, dtype: int64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db['POPULATION'].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The `geometry` column" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 POLYGON ((5.999664443592493 49.51861834298563,...\n", "1 POLYGON ((6.347791382484162 49.76015684859829,...\n", "2 POLYGON ((5.966652609366625 49.51287419342548,...\n", "3 POLYGON ((6.125796013396808 49.66347171771235,...\n", "4 POLYGON ((6.08267171189537 49.77458013764257, ...\n", "Name: geometry, dtype: object" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db['geometry'].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remember:\n", "\n", "- (Almost) like a standard `Series` 
object\n", "- Only one per `GeoDataFrame`\n", "- Extends `pandas` bringing all sorts of geospatial goodies" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### CRS" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Small but powerful attribute:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'init': 'epsg:4326'}" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.crs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**IMPORTANT**: `crs` is an attribute of a `GeoDataFrame`, not of each geometry!" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "db_wgs84 = db.to_crs(epsg=4326)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Geometries" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "poly = db.loc[0, 'geometry']" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "" ], "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "poly" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(5.993289939672974, 49.483288039852084, 6.041161215183009, 49.52370478379051)" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "poly.bounds" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.0009631318046161011" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "poly.area" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "#poly.crs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Geometric operations" ] }, { "cell_type": "code", "execution_count": 24, "metadata": 
{}, "outputs": [ { "data": { "image/svg+xml": [ "" ], "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "poly2 = db.loc[27, 'geometry']\n", "poly2" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "poly.touches(poly2)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "poly.intersects(poly2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can \"broadcast\" this too!" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 False\n", "1 False\n", "2 True\n", "3 False\n", "4 False\n", " ... \n", "97 False\n", "98 False\n", "99 False\n", "100 False\n", "101 False\n", "Length: 102, dtype: bool" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.touches(poly)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>POPULATION</th><th>COMMUNE_1</th><th>LAU2</th><th>_subtype</th><th>COMMUNE</th><th>DISTRICT</th><th>CANTON</th><th>tree_count</th><th>ghsl_pop</th><th>light_level</th><th>geometry</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>2</th><td>35040</td><td>Esch-sur-Alzette</td><td>0204</td><td>Inspire_10072018</td><td>Esch-sur-Alzette</td><td>Luxembourg</td><td>Esch-sur-Alzette</td><td>0.0</td><td>30072.804172</td><td>1370.0</td><td>POLYGON ((5.966652609366625 49.51287419342548,...</td></tr>\n",
"<tr><th>7</th><td>11003</td><td>Bettembourg</td><td>0201</td><td>Inspire_10072018</td><td>Bettembourg</td><td>Luxembourg</td><td>Esch-sur-Alzette</td><td>1.0</td><td>11060.821753</td><td>1909.0</td><td>POLYGON ((6.06035425169626 49.54561673366194, ...</td></tr>\n",
"<tr><th>41</th><td>9098</td><td>Kayl</td><td>0206</td><td>Inspire_10072018</td><td>Kayl</td><td>Luxembourg</td><td>Esch-sur-Alzette</td><td>7.0</td><td>6947.824143</td><td>1267.0</td><td>POLYGON ((6.018431355561023 49.46931155795932,...</td></tr>\n",
"<tr><th>47</th><td>6936</td><td>Mondercange</td><td>0208</td><td>Inspire_10072018</td><td>Mondercange</td><td>Luxembourg</td><td>Esch-sur-Alzette</td><td>4.0</td><td>8458.784720</td><td>2011.0</td><td>POLYGON ((5.976497438379162 49.52610334413541,...</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " POPULATION COMMUNE_1 LAU2 _subtype COMMUNE \\\n", "2 35040 Esch-sur-Alzette 0204 Inspire_10072018 Esch-sur-Alzette \n", "7 11003 Bettembourg 0201 Inspire_10072018 Bettembourg \n", "41 9098 Kayl 0206 Inspire_10072018 Kayl \n", "47 6936 Mondercange 0208 Inspire_10072018 Mondercange \n", "\n", " DISTRICT CANTON tree_count ghsl_pop light_level \\\n", "2 Luxembourg Esch-sur-Alzette 0.0 30072.804172 1370.0 \n", "7 Luxembourg Esch-sur-Alzette 1.0 11060.821753 1909.0 \n", "41 Luxembourg Esch-sur-Alzette 7.0 6947.824143 1267.0 \n", "47 Luxembourg Esch-sur-Alzette 4.0 8458.784720 2011.0 \n", "\n", " geometry \n", "2 POLYGON ((5.966652609366625 49.51287419342548,... \n", "7 POLYGON ((6.06035425169626 49.54561673366194, ... \n", "41 POLYGON ((6.018431355561023 49.46931155795932,... \n", "47 POLYGON ((5.976497438379162 49.52610334413541,... " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db[db.touches(poly)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A note on rasters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Very different approach. Your friend here is `rasterio`." 
] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "p = '../data/lights.tif'\n", "src = rasterio.open(p)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "src.count" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "CRS.from_epsg(4326)" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "src.crs" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BoundingBox(left=5.737499257050018, bottom=49.45416676884999, right=6.529165920550014, top=50.17916676594999)" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "src.bounds" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rioshow(src)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 4 }