{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Working with 3D city models in Python\n", "\n", "\n", "\n", "**Balázs Dukai** [*@BalazsDukai*](https://twitter.com/balazsdukai), **FOSS4G 2019**\n", "\n", "Tweet #CityJSON\n", "\n", "[3D geoinformation research group, TU Delft, Netherlands](https://3d.bk.tudelft.nl/)\n", "\n", "![](figures/logos.png)\n", "\n", "Repo of this talk: [https://github.com/balazsdukai/foss4g2019](https://github.com/balazsdukai/foss4g2019)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# 3D + city + model ?\n", "![](figures/google_earth.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "Probably the best-known 3d city model is what we see in Google Earth. It is a very nice model to look at, and it is improving continuously. However, certain applications require more information than what is stored in such a mesh model: they need to know what an object in the model represents in the real world." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Semantic models\n", "![](figures/semantic_model.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "That is why we have semantic models, where for each object in the model we store a label of its meaning.\n", "Once we have labels on the objects and on their parts, data preparation becomes simpler, which is an important property for analytical applications such as wind flow simulations." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Useful for urban analysis\n", "\n", "![](figures/cfd.gif)\n", "\n", "García-Sánchez, C., van Beeck, J., Gorlé, C., Predictive Large Eddy Simulations for Urban Flows: Challenges and Opportunities, Building and Environment, 139, 146-156, 2018." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "But we can do much more with 3d city models. We can use them to better estimate the energy consumption of buildings, simulate noise in cities or analyse views and shadows. In the Netherlands sunshine is a precious commodity, so we like to get as much of it as we can." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# And many more...\n", "\n", "![3d city model applications](figures/3d_cm_applications.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "There are many open 3d city models available. They come in different formats and of varying quality. However, in our group we are still waiting for the \"year of the 3d city model\" to come. We don't really see mainstream use apart from visualisation. Visualisation is nice, but I believe 3d city models can provide much more value than just being something nice to look at." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# ...mostly just production of the models\n", "\n", "many available, but who **uses** them? 
**For more than visualisation?**\n", "\n", "![open 3d city models](figures/open_cms.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# In truth, 3D CMs are a bit difficult to work with" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "### Our built environment is complex, and the objects are complex too\n", "\n", "![](figures/assembling_solid.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "### Software is lagging behind\n", "\n", "+ few software packages support 3D city models\n", "\n", "+ those that do mostly use a proprietary data model and format\n", "\n", "+ large, *\"enterprise\"*-type applications (think Esri, FME, Bentley ... )\n", "\n", "+ few tools accessible for the individual developer / hobbyist" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "2. GML doesn't help ( *[GML madness](http://erouault.blogspot.com/2014/04/gml-madness.html) by Even Rouault* )" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "That is why we are developing CityJSON, a data format for 3d city models. Essentially, it aims to increase the value of 3d city models by making them simpler to work with and by lowering the barrier to entry for a wider audience than cadastral organisations." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "![cityjson logo](figures/cityjson_webpage.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Key concepts of CityJSON" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "+ *simple*, as in easy to implement\n", "+ designed with programmers in mind\n", "+ fully developed in the open\n", "+ flattened hierarchy of objects\n", "+ implementation first\n", "\n", "![GitHub Issues](figures/github_issues.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "CityJSON implements the data model of CityGML. CityGML is an international standard for 3d city models and it is coupled with its GML-based encoding. \n", "\n", "We don't really like GML, because it's verbose, files are deeply nested and large (often several GB). And there are many different ways to do one thing.\n", "\n", "Also, I'm not a web-developer, but I would be surprised if anyone prefers GML over JSON for sending stuff around the web." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# JSON-based encoding of the CityGML data model\n", "![](figures/citygml_encoding.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "
> I just got sent a CityGML file. pic.twitter.com/jnTVoRnVLS (James Fee, @jamesmfee, June 29, 2016)
\n", "\n", "+ files are deeply nested, and large\n", "+ many \"points of entry\"\n", "+ many diff ways to do one thing (GML doesn't help, *[GML madness](http://erouault.blogspot.com/2014/04/gml-madness.html) by Even Rouault* )" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## The CityGML data model\n", "\n", "![](figures/citygml_uml.gif)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Compression ~6x over CityGML\n", "\n", "![](figures/zurich_size.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## Compression\n", "| file | CityGML size (original) | CityGML size (w/o spaces) | textures | CityJSON | compression |\n", "| -------- | ----------------------- | ----------------------------- |--------- | ------------ | --------------- | \n", "| [CityGML demo \"GeoRes\"](https://www.citygml.org/samplefiles/) | 4.3MB | 4.1MB | yes | 524KB | 8.0 |\n", "| [CityGML v2 demo \"Railway\"](https://www.citygml.org/samplefiles/) | 45MB | 34MB | yes | 4.3MB | 8.1 |\n", "| [Den Haag \"tile 01\"](https://data.overheid.nl/data/dataset/ngr-3d-model-den-haag) | 23MB | 18MB | no, material | 2.9MB | 6.2 |\n", "| [Montréal VM05](http://donnees.ville.montreal.qc.ca/dataset/maquette-numerique-batiments-citygml-lod2-avec-textures/resource/36047113-aa19-4462-854a-cdcd6281a5af) | 56MB | 42MB | yes | 5.4MB | 7.8 |\n", "| [New York LoD2 (DA13)](https://www1.nyc.gov/site/doitt/initiatives/3d-building.page) | 590MB | 574MB | no | 105MB | 5.5 |\n", "| [Rotterdam Delfshaven](http://rotterdamopendata.nl/dataset/rotterdam-3d-bestanden/resource/edacea54-76ce-41c7-a0cc-2ebe5750ac18) | 16MB | 15MB | yes | 2.6MB | 5.8 |\n", "| [Vienna (the demo file)](https://www.data.gv.at/katalog/dataset/86d88cae-ad97-4476-bae5-73488a12776d) | 37MB | 36MB | no | 5.3MB | 6.8 |\n", "| [Zürich 
LoD2](https://www.data.gv.at/katalog/dataset/86d88cae-ad97-4476-bae5-73488a12776d) | 3.03GB | 2.07GB | no | 292MB | 7.1 |" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "If you are interested in a more detailed comparison between CityGML and CityJSON, you can read our article; it's open access." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![cityjson paper](figures/cityjson_paper.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "And yes, we are guilty as charged." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![standards](figures/standards.png)\n", "\n", "[https://xkcd.com/927/](https://xkcd.com/927/)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Let's have a look-see, shall we?\n", "![](figures/looksee.gif)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "Now let's take a peek under the hood at what's going on in a CityJSON file." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## An empty CityJSON file\n", "\n", "![](figures/cj01.svg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "In a city model we represent real-world objects such as buildings, bridges and trees as different types of CityObjects. Each CityObject has its \n", "\n", "+ unique ID, \n", "+ attributes,\n", "+ geometry,\n", "+ and it can have children objects or it can be part of a parent object.\n", "\n", "Note, however, that CityObjects are not nested. Each of them is stored at the root, and the hierarchy is represented by linking to object IDs. 
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## A CityObject\n", "\n", "![](figures/cj02.svg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "Each CityObject has a geometry representation. This geometry is composed of *boundaries* and *semantics*." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Geometry\n", "\n", "+ **boundaries** definition uses vertex indices (inspired by Wavefront OBJ)\n", "+ We have a vertex list at the root of the document\n", "+ Vertices are not repeated (unlike Simple Features)\n", "+ **semantics** are linked to the boundary surfaces\n", "![](figures/cj04.svg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "This `MultiSurface` has \n", "\n", "5 surfaces \n", "```json\n", "[[0, 3, 2, 1]], [[4, 5, 6, 7]], [[0, 1, 5, 4]], [[0, 2, 3, 8]], [[10, 12, 23, 48]]\n", "```\n", "each surface has only an exterior ring (the first array)\n", "```json\n", "[ [0, 3, 2, 1] ]\n", "```\n", "\n", "The semantic surfaces in the `semantics` JSON object are linked to the boundary surfaces: the `values` array of `semantics` has one entry per boundary surface, and each entry is the 0-based index of a semantic object in the `surfaces` array (or `null`)." 
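] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "The sketch below shows how index-based boundaries and `semantics` fit together. It uses a tiny hand-made geometry rather than the Rotterdam file, so the vertices and surface types are made up. `resolve_multisurface` is a hypothetical helper and not part of `cjio`: it replaces every vertex index with its coordinates and looks up the semantic type of each surface through `values`.

```python
# Hypothetical vertex list: a 1 x 1 footprint and the same square 3 units higher
vertices = [
    [0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
    [0, 0, 3], [1, 0, 3], [1, 1, 3], [0, 1, 3],
]

# Hypothetical MultiSurface with two surfaces, each with only an exterior ring
geometry = {
    'type': 'MultiSurface',
    'lod': 2,
    'boundaries': [[[0, 3, 2, 1]], [[4, 5, 6, 7]]],
    'semantics': {
        'surfaces': [{'type': 'GroundSurface'}, {'type': 'RoofSurface'}],
        # one entry per boundary surface, pointing into 'surfaces'
        'values': [0, 1],
    },
}


def resolve_multisurface(geometry, vertices):
    # Return (semantic type or None, rings with coordinates) for each surface
    semantics = geometry.get('semantics', {})
    values = semantics.get('values') or [None] * len(geometry['boundaries'])
    resolved = []
    for surface, sem_idx in zip(geometry['boundaries'], values):
        rings = [[tuple(vertices[i]) for i in ring] for ring in surface]
        if sem_idx is None:
            sem_type = None
        else:
            sem_type = semantics['surfaces'][sem_idx]['type']
        resolved.append((sem_type, rings))
    return resolved


for sem_type, rings in resolve_multisurface(geometry, vertices):
    print(sem_type, rings[0])
```

A single shared vertex list, referenced by index, keeps vertices from being repeated. The cost is this extra lookup step whenever you need actual coordinates."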
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "pycharm": { "is_executing": false }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 16 CityObjects\n", "{C9D4A5CF-094A-47DA-97E4-4A3BFD75D3AE} \t\n", "{71B60053-BC28-404D-BAB9-8A642AAC0CF4} \t\n", "{6271F75F-E8D8-4EE4-AC46-9DB02771A031} \t\n", "{DE77E78F-B110-43D2-A55C-8B61911192DE} \t\n", "{19935DFC-F7B3-4D6E-92DD-C48EE1D1519A} \t\n", "{953BC999-2F92-4B38-95CF-218F7E05AFA9} \t\n", "{8D716FDE-18DD-4FB5-AB06-9D207377240E} \t\n", "{C6AAF95B-8C09-4130-AB4D-6777A2A18A2E} \t\n", "{72390BDE-903C-4C8C-8A3F-2DF5647CD9B4} \t\n", "{8244B286-63E2-436E-9D4E-169B8ACFE9D0} \t\n", "{87316D28-7574-4763-B9CE-BF6A2DF8092C} \t\n", "{CD98680D-A8DD-4106-A18E-15EE2A908D75} \t\n", "{64A9018E-4F56-47CD-941F-43F6F0C4285B} \t\n", "{459F183A-D0C2-4F8A-8B5F-C498EFDE366D} \t\n", "{237D41CC-991E-4308-8986-42ABFB4F7431} \t\n", "{23D8CA22-0C82-4453-A11E-B3F2B3116DB4} \t\n" ] } ], "source": [ "import json\n", "import os\n", "\n", "path = os.path.join('data', 'rotterdam_subset.json')\n", "with open(path) as fin:\n", "    cm = json.load(fin)\n", "\n", "print(f\"There are {len(cm['CityObjects'])} CityObjects\")\n", "\n", "# list all IDs\n", "for co_id in cm['CityObjects']:\n", "    print(co_id, \"\\t\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "+ Working with a CityJSON file is straightforward. One can open it with the standard library and get going.\n", "+ But you need to know the schema well.\n", "+ And you need to write everything from scratch." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "That is why we are developing **cjio**. \n", "\n", "**cjio** is how *we eat what we cook*\n", "\n", "It aims to help people actually work with and analyse 3D city models, and extract more value from them. 
Instead of letting them gather dust in some governmental repository." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "![cjio](figures/cjio_docs.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## `cjio` has a (quite) stable CLI\n", "\n", "```bash\n", "$ cjio city_model.json reproject 2056 export --format glb /out/model.glb\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "## and an experimental API\n", "\n", "```python\n", "from cjio import cityjson\n", "\n", "cm = cityjson.load('city_model.json')\n", "\n", "cm.get_cityobjects(type='building')\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "**`pip install cjio`**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "This notebook is based on the develop branch." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "**`pip install git+https://github.com/tudelft3d/cjio@develop`**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# `cjio`'s CLI" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "pycharm": { "is_executing": false }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Usage: cjio [OPTIONS] INPUT COMMAND1 [ARGS]... [COMMAND2 [ARGS]...]...\r\n", "\r\n", " Process and manipulate a CityJSON file, and allow different outputs. 
The\r\n", " different operators can be chained to perform several processing in one\r\n", " step, the CityJSON model goes through the different operators.\r\n", "\r\n", " To get help on specific command, eg for 'validate':\r\n", "\r\n", " cjio validate --help\r\n", "\r\n", " Usage examples:\r\n", "\r\n", " cjio example.json info validate\r\n", " cjio example.json assign_epsg 7145 remove_textures export output.obj\r\n", " cjio example.json subset --id house12 save out.json\r\n", "\r\n", "Options:\r\n", " --version Show the version and exit.\r\n", " --ignore_duplicate_keys Load a CityJSON file even if some City Objects have\r\n", " the same IDs (technically invalid file)\r\n", " --help Show this message and exit.\r\n", "\r\n", "Commands:\r\n", " assign_epsg Assign a (new) EPSG.\r\n", " clean Clean = remove_duplicate_vertices +...\r\n", " compress Compress a CityJSON file, ie stores its...\r\n", " decompress Decompress a CityJSON file, ie remove the...\r\n", " export Export the CityJSON to another format.\r\n", " extract_lod Extract only one LoD for a dataset.\r\n", " info Output info in simple JSON.\r\n", " locate_textures Output the location of the texture files.\r\n", " merge Merge the current CityJSON with others.\r\n", " partition Partition the city model into tiles.\r\n", " remove_duplicate_vertices Remove duplicate vertices a CityJSON file.\r\n", " remove_materials Remove all materials from a CityJSON file.\r\n", " remove_orphan_vertices Remove orphan vertices a CityJSON file.\r\n", " remove_textures Remove all textures from a CityJSON file.\r\n", " reproject Reproject the CityJSON to a new EPSG.\r\n", " save Save the city model to a CityJSON file.\r\n", " subset Create a subset of a CityJSON file.\r\n", " translate Translate the file by its (-minx, -miny,...\r\n", " update_bbox Update the bbox of a CityJSON file.\r\n", " update_textures Update the location of the texture files.\r\n", " upgrade_version Upgrade the CityJSON to the latest version.\r\n", " 
validate Validate the CityJSON file: (1) against its...\r\n" ] } ], "source": [ "! cjio --help" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "pycharm": { "is_executing": false }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[30m\u001b[46mParsing data/rotterdam_subset.json\u001b[0m\r\n", "{\r\n", " \"cityjson_version\": \"1.0\",\r\n", " \"epsg\": 7415,\r\n", " \"bbox\": [\r\n", " 90454.18900000001,\r\n", " 435614.88,\r\n", " 0.0,\r\n", " 91002.41900000001,\r\n", " 436048.217,\r\n", " 18.29\r\n", " ],\r\n", " \"transform/compressed\": true,\r\n", " \"cityobjects_total\": 16,\r\n", " \"cityobjects_present\": [\r\n", " \"Building\"\r\n", " ],\r\n", " \"materials\": false,\r\n", " \"textures\": true\r\n", "}\r\n" ] } ], "source": [ "! cjio data/rotterdam_subset.json info" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "pycharm": { "is_executing": false }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[30m\u001b[46mParsing data/rotterdam_subset.json\u001b[0m\n", "\u001b[30m\u001b[46m===== Validation (with official CityJSON schemas) =====\u001b[0m\n", "-- Validating the syntax of the file\n", "\t(using the schemas 1.0.0)\n", "-- Validating the internal consistency of the file (see docs for list)\n", "\t--Vertex indices coherent\n", "\t--Specific for CityGroups\n", "\t--Semantic arrays coherent with geometry\n", "\t--Root properties\n", "\t--Empty geometries\n", "\t--Duplicate vertices\n", "\t--Orphan vertices\n", "\t--CityGML attributes\n", "=====\n", "\u001b[32mFile is valid\u001b[0m\n", "\u001b[31mFile has warnings\u001b[0m\n", "--- WARNINGS ---\n", "WARNING: attributes 'TerrainHeight' not in CityGML schema\n", "\t(16 CityObjects have this warning)\n", "WARNING: attributes 'bron_tex' not in CityGML schema\n", "\t(16 CityObjects have this warning)\n", "WARNING: attributes 'voll_tex' not in 
CityGML schema\n", "\t(16 CityObjects have this warning)\n", "WARNING: attributes 'bron_geo' not in CityGML schema\n", "\t(16 CityObjects have this warning)\n", "WARNING: attributes 'status' not in CityGML schema\n", "\t(16 CityObjects have this warning)\n", "=====================================\n" ] } ], "source": [ "! cjio data/rotterdam_subset.json validate" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "pycharm": { "is_executing": false }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[30m\u001b[46mParsing data/rotterdam_subset.json\u001b[0m\n", "\u001b[30m\u001b[46mSubset of CityJSON\u001b[0m\n", "\u001b[30m\u001b[46mMerging files\u001b[0m\n", "\u001b[30m\u001b[46mReproject to EPSG:2056\u001b[0m\n", "\u001b[?25l [####################################] 100% \u001b[?25h\n", "\u001b[30m\u001b[46mSaving CityJSON to a file /home/balazs/Reports/talk_cjio_foss4g_2019/data/test_rotterdam.json\u001b[0m\n" ] } ], "source": [ "! 
cjio data/rotterdam_subset.json \\\n", " subset --exclude --id \"{CD98680D-A8DD-4106-A18E-15EE2A908D75}\" \\\n", " merge data/rotterdam_one.json \\\n", " reproject 2056 \\\n", " save data/test_rotterdam.json" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "+ The CLI was first, no plans for API\n", "\n", "+ **Works with whole city model only**\n", "\n", "+ Functions for the CLI work with the JSON directly, passing it along\n", "\n", "+ Simple and effective architecture" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# `cjio`'s API\n", "\n", "Allow *read* --> *explore* --> *modify* --> *write* iteration\n", "\n", "Work with CityObjects and their parts\n", "\n", "Functions for common operations\n", "\n", "Inspired by the *tidyverse* from the R ecosystem" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "pycharm": { "is_executing": false }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import os\n", "from copy import deepcopy\n", "from cjio import cityjson\n", "from shapely.geometry import Polygon\n", "import matplotlib.pyplot as plt\n", "plt.close('all')\n", "from sklearn.preprocessing import FunctionTransformer\n", "from sklearn import cluster\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "skip" } }, "source": [ "In the following we work with a subset of the 3D city model of Rotterdam\n", "![](figures/rotterdam_subset.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Load a CityJSON" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "skip" } }, "source": [ "The `load()` method loads a CityJSON file into a CityJSON object." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": { "pycharm": { "is_executing": false, "name": "#%%\n" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "path = os.path.join('data', 'rotterdam_subset.json')\n", "\n", "cm = cityjson.load(path)\n", "\n", "print(type(cm))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## Using the CLI commands in the API\n", "You can use any of the CLI commands on a CityJSON object \n", "\n", "*However,* not all CLI commands are mapped 1-to-1 to `CityJSON` methods\n", "\n", "And we haven't harmonized the CLI and the API yet. " ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "pycharm": { "is_executing": false, "name": "#%%\n" }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-- Validating the syntax of the file\n", "\t(using the schemas 1.0.0)\n", "-- Validating the internal consistency of the file (see docs for list)\n", "\t--Vertex indices coherent\n", "\t--Specific for CityGroups\n", "\t--Semantic arrays coherent with geometry\n", "\t--Root properties\n", "\t--Empty geometries\n", "\t--Duplicate vertices\n", "\t--Orphan vertices\n", "\t--CityGML attributes\n" ] }, { "data": { "text/plain": [ "(True,\n", " False,\n", " [],\n", " [\"WARNING: attributes 'TerrainHeight' not in CityGML schema\",\n", " '\\t(16 CityObjects have this warning)',\n", " \"WARNING: attributes 'bron_tex' not in CityGML schema\",\n", " '\\t(16 CityObjects have this warning)',\n", " \"WARNING: attributes 'voll_tex' not in CityGML schema\",\n", " '\\t(16 CityObjects have this warning)',\n", " \"WARNING: attributes 'bron_geo' not in CityGML schema\",\n", " '\\t(16 CityObjects have this warning)',\n", " \"WARNING: attributes 'status' not in CityGML schema\",\n", " '\\t(16 CityObjects have this warning)'])" ] }, "execution_count": 8, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.validate()" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "skip" } }, "source": [ "## Explore the city model\n", "\n", "Print the basic information about the city model. Note that `print(cm)` shows the same information as the `info` command in the CLI." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "pycharm": { "is_executing": false, "name": "#%%\n" }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"cityjson_version\": \"1.0\",\n", " \"epsg\": 7415,\n", " \"bbox\": [\n", " 90454.18900000001,\n", " 435614.88,\n", " 0.0,\n", " 91002.41900000001,\n", " 436048.217,\n", " 18.29\n", " ],\n", " \"transform/compressed\": true,\n", " \"cityobjects_total\": 16,\n", " \"cityobjects_present\": [\n", " \"Building\"\n", " ],\n", " \"materials\": false,\n", " \"textures\": true\n", "}\n" ] } ], "source": [ "print(cm)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "## Getting objects from the model\n", "Get CityObjects by their *type* (or a list of types), or by their IDs. 
\n", "\n", "Note that `get_cityobjects()` == `cm.cityobjects`" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "pycharm": { "is_executing": false, "name": "#%%\n" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "buildings = cm.get_cityobjects(type='building')\n", "\n", "# both Building and BuildingPart objects\n", "buildings_parts = cm.get_cityobjects(type=['building', 'buildingpart'])\n", "\n", "r_ids = ['{C9D4A5CF-094A-47DA-97E4-4A3BFD75D3AE}',\n", " '{6271F75F-E8D8-4EE4-AC46-9DB02771A031}']\n", "buildings_ids = cm.get_cityobjects(id=r_ids)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "skip" } }, "source": [ "## Properties and geometry of objects" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "pycharm": { "is_executing": false, "name": "#%%\n" }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"id\": \"{C9D4A5CF-094A-47DA-97E4-4A3BFD75D3AE}\",\n", " \"type\": \"Building\",\n", " \"attributes\": {\n", " \"TerrainHeight\": 3.03,\n", " \"bron_tex\": \"UltraCAM-X 10cm juni 2008\",\n", " \"voll_tex\": \"complete\",\n", " \"bron_geo\": \"Lidar 15-30 punten - nov. 
2008\",\n", " \"status\": \"1\"\n", " },\n", " \"children\": null,\n", " \"parents\": null,\n", " \"geometry_type\": [\n", " \"MultiSurface\"\n", " ],\n", " \"geometry_lod\": [\n", " 2\n", " ],\n", " \"semantic_surfaces\": [\n", " \"WallSurface\",\n", " \"RoofSurface\",\n", " \"GroundSurface\"\n", " ]\n", "}\n" ] } ], "source": [ "b01 = buildings_ids['{C9D4A5CF-094A-47DA-97E4-4A3BFD75D3AE}']\n", "print(b01)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "pycharm": { "is_executing": false, "name": "#%%\n" }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "{'TerrainHeight': 3.03,\n", " 'bron_tex': 'UltraCAM-X 10cm juni 2008',\n", " 'voll_tex': 'complete',\n", " 'bron_geo': 'Lidar 15-30 punten - nov. 2008',\n", " 'status': '1'}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b01.attributes" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "skip" } }, "source": [ "CityObjects can have *children* and *parents*" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "pycharm": { "is_executing": false, "name": "#%%\n" }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b01.children is None and b01.parents is None" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "skip" } }, "source": [ "CityObject geometry is a list of `Geometry` objects. That is because a CityObject can have multiple geometry representations in different levels of detail, eg. a geometry in LoD1 and a second geometry in LoD2." 
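] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Because the geometry is a list, code that needs a single representation has to choose one. The sketch below works on a plain CityJSON dictionary, not the `cjio` API. The CityObject and its two geometries are made up. `pick_geometry` is a hypothetical helper that returns the geometry with the requested LoD, or the most detailed one when no LoD is given.

```python
# Hypothetical CityObject with an LoD1 and an LoD2 representation
# (boundaries left empty to keep the example short)
city_object = {
    'type': 'Building',
    'geometry': [
        {'type': 'Solid', 'lod': 1, 'boundaries': []},
        {'type': 'MultiSurface', 'lod': 2, 'boundaries': []},
    ],
}


def pick_geometry(city_object, lod=None):
    geometries = city_object.get('geometry', [])
    if not geometries:
        return None
    if lod is None:
        # No LoD requested: take the most detailed representation
        return max(geometries, key=lambda g: float(g['lod']))
    for g in geometries:
        if float(g['lod']) == float(lod):
            return g
    return None


print(pick_geometry(city_object)['type'])         # MultiSurface
print(pick_geometry(city_object, lod=1)['type'])  # Solid
print(pick_geometry(city_object, lod=3))          # None
```

The helper compares LoDs as floats as a defensive choice. That way it works whether a file stores the LoD as a number or as a string."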
] }, { "cell_type": "code", "execution_count": 14, "metadata": { "pycharm": { "is_executing": false, "name": "#%%\n" }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b01.geometry" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "pycharm": { "is_executing": false, "name": "#%%\n" }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MultiSurface, lod 2\n" ] } ], "source": [ "geom = b01.geometry[0]\n", "print(\"{}, lod {}\".format(geom.type, geom.lod))" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "skip" } }, "source": [ "### Geometry boundaries and Semantic Surfaces\n", "Contrary to a CityJSON file, the geometry boundaries are dereferenced when working with the API. This means that the boundary definition contains the vertex coordinates themselves, not only the vertex indices.\n", "\n", "`cjio` doesn't provide specific geometry classes (yet), e.g. a MultiSurface or Solid class. If you are working with the geometry boundaries, you need to do the geometric operations yourself, or cast the boundary to a geometry class of some other library, for example `shapely` if 2D is enough." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "Vertex coordinates are kept 'as is' when the geometry is loaded. CityJSON files are often compressed: the coordinates are shifted and converted to integers, so you'll probably want to transform them back. Otherwise geometry operations won't make sense." 
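] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Transforming the coordinates back is simple arithmetic. In a compressed CityJSON file, the root `transform` object holds a `scale` array and a `translate` array. Each real coordinate is obtained as `v * scale + translate`, axis by axis. The sketch below uses a made-up transform, not the one stored in `rotterdam_subset.json`. `decompress_vertices` is a hypothetical helper, not a `cjio` function.

```python
def decompress_vertices(vertices, transform):
    # Apply the CityJSON transform per axis: x = xi * scale_x + translate_x
    scale = transform['scale']
    translate = transform['translate']
    return [[v[i] * scale[i] + translate[i] for i in range(3)] for v in vertices]


# Hypothetical transform: millimetre precision and a shifted origin
transform = {'scale': [0.001, 0.001, 0.001], 'translate': [90000.0, 435000.0, 0.0]}
int_vertices = [[1500, 2250, 10652], [0, 0, 0]]

for x, y, z in decompress_vertices(int_vertices, transform):
    print(round(x, 3), round(y, 3), round(z, 3))
```

In this notebook, `geom.transform(cm.transform)` and `cityjson.load(path, transform=True)` take care of this step."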
] }, { "cell_type": "code", "execution_count": 16, "metadata": { "pycharm": { "is_executing": false }, "scrolled": true, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "[(90988.79100000001, 435638.657, 10.652000000000001),\n", " (90987.429, 435642.77, 10.652000000000001),\n", " (90986.46900000001, 435641.09, 10.652000000000001),\n", " (90985.781, 435640.846, 10.652000000000001),\n", " (90986.801, 435637.955, 10.652000000000001)]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "transformation_object = cm.transform\n", "\n", "geom_transformed = geom.transform(transformation_object)\n", "\n", "geom_transformed.boundaries[0][0]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "But it might be easier to transform (decompress) the whole model on load." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "pycharm": { "is_executing": false }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"cityjson_version\": \"1.0\",\n", " \"epsg\": 7415,\n", " \"bbox\": [\n", " 90454.18900000001,\n", " 435614.88,\n", " 0.0,\n", " 91002.41900000001,\n", " 436048.217,\n", " 18.29\n", " ],\n", " \"transform/compressed\": false,\n", " \"cityobjects_total\": 16,\n", " \"cityobjects_present\": [\n", " \"Building\"\n", " ],\n", " \"materials\": false,\n", " \"textures\": true\n", "}\n" ] } ], "source": [ "cm_transformed = cityjson.load(path, transform=True)\n", "print(cm_transformed)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "skip" } }, "source": [ "Semantic Surfaces are stored in a similar fashion as in a CityJSON file, in the `surfaces` attribute of a Geometry object." 
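] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "To show where such a structure can come from, the sketch below regroups the raw `semantics` of a MultiSurface into one entry per semantic surface. Each entry lists the boundary surfaces that point to that semantic surface. The output only imitates the layout of the `surface_idx` dictionaries that `geom.surfaces` returns in this notebook; it is not `cjio`'s implementation. `group_semantic_surfaces` and its input are hypothetical.

```python
def group_semantic_surfaces(semantics):
    # For a MultiSurface, 'values' has one entry per boundary surface:
    # the index of its semantic object in 'surfaces', or None
    grouped = {}
    for boundary_idx, sem_idx in enumerate(semantics['values']):
        if sem_idx is None:
            continue
        entry = grouped.setdefault(
            sem_idx,
            {'surface_idx': [], 'type': semantics['surfaces'][sem_idx]['type']},
        )
        # Store each reference as a path into the boundaries, e.g. [3]
        entry['surface_idx'].append([boundary_idx])
    return grouped


# Hypothetical semantics for a MultiSurface with six surfaces
semantics = {
    'surfaces': [{'type': 'RoofSurface'}, {'type': 'GroundSurface'}, {'type': 'WallSurface'}],
    'values': [0, 0, 1, 2, 2, None],
}
print(group_semantic_surfaces(semantics))
```

Surfaces whose value is `None` carry no semantics, so they do not appear in any group."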
] }, { "cell_type": "code", "execution_count": 18, "metadata": { "pycharm": { "is_executing": false, "name": "#%%\n" }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "{0: {'surface_idx': [[0], [1], [2]], 'type': 'RoofSurface'},\n", " 1: {'surface_idx': [[3]], 'type': 'GroundSurface'},\n", " 2: {'surface_idx': [[4],\n", " [5],\n", " [6],\n", " [7],\n", " [8],\n", " [9],\n", " [10],\n", " [11],\n", " [12],\n", " [13],\n", " [14],\n", " [15],\n", " [16],\n", " [17],\n", " [18],\n", " [19]],\n", " 'type': 'WallSurface'}}" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "geom.surfaces" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "skip" } }, "source": [ "`surfaces` does not store geometry boundaries, just references (`surface_idx`). Use the `get_surface_boundaries()` method to obtain the boundary-parts connected to the semantic surface." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "pycharm": { "is_executing": false, "name": "#%%\n" }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "{0: {'surface_idx': [[0], [1], [2]], 'type': 'RoofSurface'}}" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "roofs = geom.get_surfaces(type='roofsurface')\n", "roofs" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "pycharm": { "is_executing": false, "name": "#%%\n" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "roof_boundaries = []\n", "for r in roofs.values():\n", " roof_boundaries.append(geom.get_surface_boundaries(r))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "pycharm": { "is_executing": false }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "[[[[[579471, 198217, 10652],\n", " [578109, 202330, 10652],\n", " [577149, 200650, 10652],\n", " 
[576461, 200406, 10652],\n", " [577481, 197515, 10652]]],\n", " [[[580840, 194082, 15211],\n", " [579471, 198217, 15211],\n", " [577481, 197515, 15211],\n", " [576461, 200406, 15211],\n", " [572239, 198909, 15211],\n", " [571839, 200119, 15211],\n", " [571503, 201071, 15211],\n", " [566651, 199359, 15211],\n", " [569801, 190223, 15211],\n", " [573253, 191430, 15211],\n", " [574658, 191922, 15211]]],\n", " [[[565589, 202439, 11036],\n", " [566651, 199359, 11036],\n", " [571503, 201071, 11036],\n", " [571839, 200119, 11036],\n", " [573299, 200640, 11036],\n", " [572089, 204029, 11036],\n", " [570629, 203440, 11036],\n", " [570379, 204150, 11036]]]]]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "roof_boundaries" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "skip" } }, "source": [ "### Assigning attributes to Semantic Surfaces\n", "1. extract the surfaces,\n", "2. make the changes on the surface,\n", "3. overwrite the CityObjects with the changes." 
] }, { "cell_type": "code", "execution_count": 22, "metadata": { "pycharm": { "is_executing": false, "name": "#%%\n" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "cm_copy = deepcopy(cm)\n", "new_cos = {}\n", "# Iterate over the copy, so that the original city model is left untouched\n", "for co_id, co in cm_copy.cityobjects.items():\n", "    new_geoms = []\n", "    for geom in co.geometry:\n", "        # Only LoD >= 2 models have semantic surfaces\n", "        if geom.lod >= 2.0:\n", "            # Extract the surfaces\n", "            roofsurfaces = geom.get_surfaces('roofsurface')\n", "            for i, rsrf in roofsurfaces.items():\n", "                # Change the attributes, creating the dictionary if it is missing\n", "                if 'attributes' not in rsrf:\n", "                    rsrf['attributes'] = {}\n", "                rsrf['attributes']['cladding'] = 'tiles'\n", "                geom.surfaces[i] = rsrf\n", "            new_geoms.append(geom)\n", "        else:\n", "            # Use the unchanged geometry\n", "            new_geoms.append(geom)\n", "    co.geometry = new_geoms\n", "    new_cos[co_id] = co\n", "cm_copy.cityobjects = new_cos" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "pycharm": { "is_executing": false }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"id\": \"{C9D4A5CF-094A-47DA-97E4-4A3BFD75D3AE}\",\n", " \"type\": \"Building\",\n", " \"attributes\": {\n", " \"TerrainHeight\": 3.03,\n", " \"bron_tex\": \"UltraCAM-X 10cm juni 2008\",\n", " \"voll_tex\": \"complete\",\n", " \"bron_geo\": \"Lidar 15-30 punten - nov.
2008\",\n", " \"status\": \"1\"\n", " },\n", " \"children\": null,\n", " \"parents\": null,\n", " \"geometry_type\": [\n", " \"MultiSurface\"\n", " ],\n", " \"geometry_lod\": [\n", " 2\n", " ],\n", " \"semantic_surfaces\": [\n", " \"WallSurface\",\n", " \"RoofSurface\",\n", " \"GroundSurface\"\n", " ]\n", "}\n" ] } ], "source": [ "print(cm_copy.cityobjects['{C9D4A5CF-094A-47DA-97E4-4A3BFD75D3AE}'])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "### Create new Semantic Surfaces\n", "The process is similar to the previous one. However, in this example we create new SemanticSurfaces that hold values which we compute from the geometry. The input city model has a single semantic \"WallSurface\", without attributes, for all the walls of a building. The snippet below illustrates how to split this surface into one Semantic Surface per wall polygon and assign an attribute to each of them. Note that the orientation used here is only a placeholder that alternates between 'north' and 'south'; it is not computed from the geometry." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "pycharm": { "is_executing": false, "name": "#%%\n" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "new_cos = {}\n", "\n", "for co_id, co in cm_copy.cityobjects.items():\n", "    new_geoms = []\n", "\n", "    for geom in co.geometry:\n", "        if geom.lod >= 2.0:\n", "            max_id = max(geom.surfaces.keys())\n", "\n", "            for w_i, wsrf in geom.get_surfaces('wallsurface').items():\n", "                # Remove the single WallSurface that covers all the walls\n", "                del geom.surfaces[w_i]\n", "                boundaries = geom.get_surface_boundaries(wsrf)\n", "\n", "                for j, boundary_geometry in enumerate(boundaries):\n", "                    # The original geometry has the same Semantic for all walls,\n", "                    # but we want to divide the wall surfaces by their orientation,\n", "                    # thus we need to keep the correct surface index\n", "                    surface_index = wsrf['surface_idx'][j]\n", "\n", "                    for multisurface in boundary_geometry:\n", "                        # Do any operation here, e.g. read the coordinates\n", "                        x, y, z = multisurface[0]\n", "                        # Placeholder: alternate the orientation by index\n", "                        if j % 2 > 0:\n", "                            orientation = 'north'\n", "                        else:\n", "                            orientation = 'south'\n", "\n", "                    # Copy the attributes, so that each new surface gets its own\n", "                    # dictionary instead of all of them sharing (and overwriting) one\n", "                    attributes = dict(wsrf.get('attributes') or {})\n", "                    attributes['orientation'] = orientation\n", "\n", "                    # surface_idx is a list of index paths, as in the input\n", "                    max_id = max_id + 1\n", "                    geom.surfaces[max_id] = {\n", "                        'type': wsrf['type'],\n", "                        'surface_idx': [surface_index],\n", "                        'attributes': attributes\n", "                    }\n", "\n", "            new_geoms.append(geom)\n", "\n", "        else:\n", "            # If LoD1, just add the geometry unchanged\n", "            new_geoms.append(geom)\n", "\n", "    co.geometry = new_geoms\n", "    new_cos[co_id] = co\n", "\n", "cm_copy.cityobjects = new_cos" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Analysing CityModels\n", "\n", "![](figures/zurich.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "In the following I show how to compute some attributes from the CityObject geometry and use these attributes as input for machine learning. For this we use the LoD2 model of Zürich.\n", "\n", "Download the Zürich data set from https://3d.bk.tudelft.nl/opendata/cityjson/1.0/Zurich_Building_LoD2_V10.json and save it as `data/zurich.json`." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "pycharm": { "is_executing": false }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "path = os.path.join('data', 'zurich.json')\n", "zurich = cityjson.load(path, transform=True)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## A simple geometry function" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "Here is a simple geometry function that computes the area of the ground surface (footprint) of buildings in the model.
It also shows how to cast surfaces, in this case the ground surface, to Shapely Polygons." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def compute_footprint_area(co):\n", "    \"\"\"Compute the area of the footprint\"\"\"\n", "    footprint_area = 0\n", "    for geom in co.geometry:\n", "\n", "        # only LoD2 (or higher) objects have semantic surfaces\n", "        if geom.lod >= 2.0:\n", "            footprints = geom.get_surfaces(type='groundsurface')\n", "\n", "            # there can be many surfaces with label 'groundsurface'\n", "            for i, f in footprints.items():\n", "                # each boundary is a surface, i.e. a list of rings\n", "                for rings in geom.get_surface_boundaries(f):\n", "                    # cast to Shapely polygon: the first ring is the exterior,\n", "                    # any further rings are holes, so their area is subtracted\n", "                    shapely_poly = Polygon(rings[0], rings[1:])\n", "                    footprint_area += shapely_poly.area\n", "\n", "    return footprint_area" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Compute new attributes" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "Then we need to loop through the CityObjects and add the new attributes. Note that the `attributes` CityObject attribute is just a dictionary.\n", "\n", "Here we compute the number of vertices of each CityObject and the area of its footprint. Then we are going to cluster these two variables. This is a completely arbitrary exercise, which is simply meant to illustrate how to transform a city model into machine-learnable features."
] }, { "cell_type": "code", "execution_count": 27, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "for co_id, co in zurich.cityobjects.items():\n", "    co.attributes['nr_vertices'] = len(co.get_vertices())\n", "    co.attributes['fp_area'] = compute_footprint_area(co)\n", "    zurich.cityobjects[co_id] = co" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "skip" } }, "source": [ "It is possible to export the city model into a pandas DataFrame. Note that only the CityObject attributes are exported into the DataFrame, with the CityObject IDs as its index. Thus, if you want to export the attributes of SemanticSurfaces, for example, you first need to copy them to the CityObject attributes.\n", "\n", "The function below illustrates this operation." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "def assign_cityobject_attribute(cm):\n", "    \"\"\"Copy the semantic surface attributes to CityObject attributes.\n", "    Returns a copy of the citymodel.\n", "    \"\"\"\n", "    new_cos = {}\n", "    cm_copy = deepcopy(cm)\n", "    # Modify the copy, so that the input city model is left unchanged\n", "    for co_id, co in cm_copy.cityobjects.items():\n", "        for geom in co.geometry:\n", "            for srf in geom.surfaces.values():\n", "                if 'attributes' in srf:\n", "                    for attr, a_v in srf['attributes'].items():\n", "                        if (attr not in co.attributes) or (co.attributes[attr] is None):\n", "                            co.attributes[attr] = [a_v]\n", "                        else:\n", "                            co.attributes[attr].append(a_v)\n", "        new_cos[co_id] = co\n", "    cm_copy.cityobjects = new_cos\n", "    return cm_copy" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "scrolled": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>creationDate</th><th>Geomtype</th><th>nr_vertices</th><th>fp_area</th><th>class</th><th>Herkunft</th><th>QualitaetStatus</th><th>FileCreationDate</th><th>Region</th><th>GebaeudeStatus</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>UUID_93fc5bae-4446-4336-9ff8-6679ebfdfde3</th><td>2017-01-23</td><td>1.0</td><td>24</td><td>65.209763</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n", "    <tr><th>UUID_c9884c4e-1cac-47f5-b88b-6fb074c0ae50</th><td>2017-01-23</td><td>NaN</td><td>0</td><td>0.000000</td><td>BB01</td><td>EE_LB_2007</td><td>1.0</td><td>2012-02-23</td><td>2.0</td><td>1.0</td></tr>\n", "    <tr><th>UUID_a4a09780-153f-4385-ad19-3a92a6c4eec4</th><td>2017-01-23</td><td>1.0</td><td>38</td><td>20.784309</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n", "    <tr><th>UUID_ba0bb815-5276-4e35-b4c1-878cbf6ba934</th><td>2017-01-23</td><td>NaN</td><td>0</td><td>0.000000</td><td>BB07</td><td>EE_LB_2007</td><td>1.0</td><td>2012-02-23</td><td>2.0</td><td>1.0</td></tr>\n", "    <tr><th>UUID_bb1835bc-7437-453f-ac08-885de0503aaa</th><td>2017-01-23</td><td>1.0</td><td>87</td><td>69.363823</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n", "  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " creationDate Geomtype nr_vertices \\\n", "UUID_93fc5bae-4446-4336-9ff8-6679ebfdfde3 2017-01-23 1.0 24 \n", "UUID_c9884c4e-1cac-47f5-b88b-6fb074c0ae50 2017-01-23 NaN 0 \n", "UUID_a4a09780-153f-4385-ad19-3a92a6c4eec4 2017-01-23 1.0 38 \n", "UUID_ba0bb815-5276-4e35-b4c1-878cbf6ba934 2017-01-23 NaN 0 \n", "UUID_bb1835bc-7437-453f-ac08-885de0503aaa 2017-01-23 1.0 87 \n", "\n", " fp_area class Herkunft \\\n", "UUID_93fc5bae-4446-4336-9ff8-6679ebfdfde3 65.209763 NaN NaN \n", "UUID_c9884c4e-1cac-47f5-b88b-6fb074c0ae50 0.000000 BB01 EE_LB_2007 \n", "UUID_a4a09780-153f-4385-ad19-3a92a6c4eec4 20.784309 NaN NaN \n", "UUID_ba0bb815-5276-4e35-b4c1-878cbf6ba934 0.000000 BB07 EE_LB_2007 \n", "UUID_bb1835bc-7437-453f-ac08-885de0503aaa 69.363823 NaN NaN \n", "\n", " QualitaetStatus FileCreationDate \\\n", "UUID_93fc5bae-4446-4336-9ff8-6679ebfdfde3 NaN NaN \n", "UUID_c9884c4e-1cac-47f5-b88b-6fb074c0ae50 1.0 2012-02-23 \n", "UUID_a4a09780-153f-4385-ad19-3a92a6c4eec4 NaN NaN \n", "UUID_ba0bb815-5276-4e35-b4c1-878cbf6ba934 1.0 2012-02-23 \n", "UUID_bb1835bc-7437-453f-ac08-885de0503aaa NaN NaN \n", "\n", " Region GebaeudeStatus \n", "UUID_93fc5bae-4446-4336-9ff8-6679ebfdfde3 NaN NaN \n", "UUID_c9884c4e-1cac-47f5-b88b-6fb074c0ae50 2.0 1.0 \n", "UUID_a4a09780-153f-4385-ad19-3a92a6c4eec4 NaN NaN \n", "UUID_ba0bb815-5276-4e35-b4c1-878cbf6ba934 2.0 1.0 \n", "UUID_bb1835bc-7437-453f-ac08-885de0503aaa NaN NaN " ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = zurich.to_dataframe()\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "In order to have a nicer distribution of the data, we keep only the CityObjects that have a geometry type and a positive footprint area, and apply a log-transform on the two variables. Note that the `FunctionTransformer` converts the DataFrame into a NumPy array that is ready to be used in `scikit-learn`. The details of a machine learning workflow are beyond the scope of this tutorial, however." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "# The parentheses are needed, because & binds more tightly than >\n", "mask = df['Geomtype'].notnull() & (df['fp_area'] > 0.0)\n", "df_subset = df.loc[mask, ['nr_vertices', 'fp_area']]\n", "transformer = FunctionTransformer(np.log, validate=True)\n", "df_logtransform = transformer.fit_transform(df_subset)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "scrolled": true, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig = plt.figure()\n", "ax = fig.add_subplot(1, 1, 1)\n", "ax.scatter(df_logtransform[:,0], df_logtransform[:,1], alpha=0.3, s=1.0)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "def plot_model_results(model, data):\n", " fig = plt.figure()\n", " ax = fig.add_subplot(1, 1, 1)\n", " colormap = np.array(['lightblue', 'red', 'lime', 'blue','black'])\n", " ax.scatter(data[:,0], data[:,1], c=colormap[model.labels_], s=10, alpha=0.5)\n", " ax.set_xlabel('Number of vertices [log]')\n", " ax.set_ylabel('Footprint area [log]')\n", " plt.title(f\"DBSCAN clustering with estimated {len(set(model.labels_))} clusters\")\n", " plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "Since we transformed our DataFrame, we can fit any model in `scikit-learn`. I use DBSCAN because I wanted to find the data points on the fringes of the central cluster." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "scrolled": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "application/javascript": [ "/* Put everything inside the global mpl namespace */\n", "window.mpl = {};\n", "\n", "\n", "mpl.get_websocket_type = function() {\n", " if (typeof(WebSocket) !== 'undefined') {\n", " return WebSocket;\n", " } else if (typeof(MozWebSocket) !== 'undefined') {\n", " return MozWebSocket;\n", " } else {\n", " alert('Your browser does not have WebSocket support. ' +\n", " 'Please try Chrome, Safari or Firefox ≥ 6. 
' +\n", " 'Firefox 4 and 5 are also supported but you ' +\n", " 'have to enable WebSockets in about:config.');\n", " };\n", "}\n", "\n", "mpl.figure = function(figure_id, websocket, ondownload, parent_element) {\n", " this.id = figure_id;\n", "\n", " this.ws = websocket;\n", "\n", " this.supports_binary = (this.ws.binaryType != undefined);\n", "\n", " if (!this.supports_binary) {\n", " var warnings = document.getElementById(\"mpl-warnings\");\n", " if (warnings) {\n", " warnings.style.display = 'block';\n", " warnings.textContent = (\n", " \"This browser does not support binary websocket messages. \" +\n", " \"Performance may be slow.\");\n", " }\n", " }\n", "\n", " this.imageObj = new Image();\n", "\n", " this.context = undefined;\n", " this.message = undefined;\n", " this.canvas = undefined;\n", " this.rubberband_canvas = undefined;\n", " this.rubberband_context = undefined;\n", " this.format_dropdown = undefined;\n", "\n", " this.image_mode = 'full';\n", "\n", " this.root = $('
');\n", " this._root_extra_style(this.root)\n", " this.root.attr('style', 'display: inline-block');\n", "\n", " $(parent_element).append(this.root);\n", "\n", " this._init_header(this);\n", " this._init_canvas(this);\n", " this._init_toolbar(this);\n", "\n", " var fig = this;\n", "\n", " this.waiting = false;\n", "\n", " this.ws.onopen = function () {\n", " fig.send_message(\"supports_binary\", {value: fig.supports_binary});\n", " fig.send_message(\"send_image_mode\", {});\n", " if (mpl.ratio != 1) {\n", " fig.send_message(\"set_dpi_ratio\", {'dpi_ratio': mpl.ratio});\n", " }\n", " fig.send_message(\"refresh\", {});\n", " }\n", "\n", " this.imageObj.onload = function() {\n", " if (fig.image_mode == 'full') {\n", " // Full images could contain transparency (where diff images\n", " // almost always do), so we need to clear the canvas so that\n", " // there is no ghosting.\n", " fig.context.clearRect(0, 0, fig.canvas.width, fig.canvas.height);\n", " }\n", " fig.context.drawImage(fig.imageObj, 0, 0);\n", " };\n", "\n", " this.imageObj.onunload = function() {\n", " fig.ws.close();\n", " }\n", "\n", " this.ws.onmessage = this._make_on_message_function(this);\n", "\n", " this.ondownload = ondownload;\n", "}\n", "\n", "mpl.figure.prototype._init_header = function() {\n", " var titlebar = $(\n", " '
');\n", " var titletext = $(\n", " '
');\n", " titlebar.append(titletext)\n", " this.root.append(titlebar);\n", " this.header = titletext[0];\n", "}\n", "\n", "\n", "\n", "mpl.figure.prototype._canvas_extra_style = function(canvas_div) {\n", "\n", "}\n", "\n", "\n", "mpl.figure.prototype._root_extra_style = function(canvas_div) {\n", "\n", "}\n", "\n", "mpl.figure.prototype._init_canvas = function() {\n", " var fig = this;\n", "\n", " var canvas_div = $('
');\n", "\n", " canvas_div.attr('style', 'position: relative; clear: both; outline: 0');\n", "\n", " function canvas_keyboard_event(event) {\n", " return fig.key_event(event, event['data']);\n", " }\n", "\n", " canvas_div.keydown('key_press', canvas_keyboard_event);\n", " canvas_div.keyup('key_release', canvas_keyboard_event);\n", " this.canvas_div = canvas_div\n", " this._canvas_extra_style(canvas_div)\n", " this.root.append(canvas_div);\n", "\n", " var canvas = $('');\n", " canvas.addClass('mpl-canvas');\n", " canvas.attr('style', \"left: 0; top: 0; z-index: 0; outline: 0\")\n", "\n", " this.canvas = canvas[0];\n", " this.context = canvas[0].getContext(\"2d\");\n", "\n", " var backingStore = this.context.backingStorePixelRatio ||\n", "\tthis.context.webkitBackingStorePixelRatio ||\n", "\tthis.context.mozBackingStorePixelRatio ||\n", "\tthis.context.msBackingStorePixelRatio ||\n", "\tthis.context.oBackingStorePixelRatio ||\n", "\tthis.context.backingStorePixelRatio || 1;\n", "\n", " mpl.ratio = (window.devicePixelRatio || 1) / backingStore;\n", "\n", " var rubberband = $('');\n", " rubberband.attr('style', \"position: absolute; left: 0; top: 0; z-index: 1;\")\n", "\n", " var pass_mouse_events = true;\n", "\n", " canvas_div.resizable({\n", " start: function(event, ui) {\n", " pass_mouse_events = false;\n", " },\n", " resize: function(event, ui) {\n", " fig.request_resize(ui.size.width, ui.size.height);\n", " },\n", " stop: function(event, ui) {\n", " pass_mouse_events = true;\n", " fig.request_resize(ui.size.width, ui.size.height);\n", " },\n", " });\n", "\n", " function mouse_event_fn(event) {\n", " if (pass_mouse_events)\n", " return fig.mouse_event(event, event['data']);\n", " }\n", "\n", " rubberband.mousedown('button_press', mouse_event_fn);\n", " rubberband.mouseup('button_release', mouse_event_fn);\n", " // Throttle sequential mouse events to 1 every 20ms.\n", " rubberband.mousemove('motion_notify', mouse_event_fn);\n", "\n", " 
rubberband.mouseenter('figure_enter', mouse_event_fn);\n", " rubberband.mouseleave('figure_leave', mouse_event_fn);\n", "\n", " canvas_div.on(\"wheel\", function (event) {\n", " event = event.originalEvent;\n", " event['data'] = 'scroll'\n", " if (event.deltaY < 0) {\n", " event.step = 1;\n", " } else {\n", " event.step = -1;\n", " }\n", " mouse_event_fn(event);\n", " });\n", "\n", " canvas_div.append(canvas);\n", " canvas_div.append(rubberband);\n", "\n", " this.rubberband = rubberband;\n", " this.rubberband_canvas = rubberband[0];\n", " this.rubberband_context = rubberband[0].getContext(\"2d\");\n", " this.rubberband_context.strokeStyle = \"#000000\";\n", "\n", " this._resize_canvas = function(width, height) {\n", " // Keep the size of the canvas, canvas container, and rubber band\n", " // canvas in synch.\n", " canvas_div.css('width', width)\n", " canvas_div.css('height', height)\n", "\n", " canvas.attr('width', width * mpl.ratio);\n", " canvas.attr('height', height * mpl.ratio);\n", " canvas.attr('style', 'width: ' + width + 'px; height: ' + height + 'px;');\n", "\n", " rubberband.attr('width', width);\n", " rubberband.attr('height', height);\n", " }\n", "\n", " // Set the figure to an initial 600x600px, this will subsequently be updated\n", " // upon first draw.\n", " this._resize_canvas(600, 600);\n", "\n", " // Disable right mouse context menu.\n", " $(this.rubberband_canvas).bind(\"contextmenu\",function(e){\n", " return false;\n", " });\n", "\n", " function set_focus () {\n", " canvas.focus();\n", " canvas_div.focus();\n", " }\n", "\n", " window.setTimeout(set_focus, 100);\n", "}\n", "\n", "mpl.figure.prototype._init_toolbar = function() {\n", " var fig = this;\n", "\n", " var nav_element = $('
');\n", " nav_element.attr('style', 'width: 100%');\n", " this.root.append(nav_element);\n", "\n", " // Define a callback function for later on.\n", " function toolbar_event(event) {\n", " return fig.toolbar_button_onclick(event['data']);\n", " }\n", " function toolbar_mouse_event(event) {\n", " return fig.toolbar_button_onmouseover(event['data']);\n", " }\n", "\n", " for(var toolbar_ind in mpl.toolbar_items) {\n", " var name = mpl.toolbar_items[toolbar_ind][0];\n", " var tooltip = mpl.toolbar_items[toolbar_ind][1];\n", " var image = mpl.toolbar_items[toolbar_ind][2];\n", " var method_name = mpl.toolbar_items[toolbar_ind][3];\n", "\n", " if (!name) {\n", " // put a spacer in here.\n", " continue;\n", " }\n", " var button = $('