{ "cells": [ { "cell_type": "markdown", "source": [ "
\n", " \n", "
\n", " \n", "
Run in Google Colab
\n", "
\n", "
\n", " \n", "
\n", " \n", "
Download Notebook
\n", "
\n", "
\n", " \n", "
\n", " \n", "
View on GitHub
\n", "
\n", "
\n", "
\n", "\n", "# Real estate in Zürich\n", "\n", "the Zürich Statistical Office collects data on the city and its residents. This data is published as [Linked Data](https://en.wikipedia.org/wiki/Linked_data). \n", "\n", "In this tutorial, we will show **how to work with Linked Data.** Mainly, we will see how to work with the real estate dataset. \n", "We will look into how to query, process, and visualize it.\n" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "[1. Housing prices](#Housing-prices) \n", "      [1.1 Prices per apartment type](#Prices-per-apartment-type) \n", "      [1.2 Prices per district](#Prices-per-district) \n", "      [1.3 Prices per district and apartment type](#Prices-per-district-and-apartment-type) \n", "      [1.4 Prices over time](#Prices-over-time) \n", "      [1.5 Prices over time for various apartment types](#Prices-over-time-for-various-apartment-types) \n", "[2. Apartments and population growth](#Apartments-and-population-growth) \n", "[3. Apartment types over time](#Apartment-types-over-time) " ], "metadata": {} }, { "cell_type": "markdown", "source": [ "### SPARQL endpoint\n", "\n", "Data on the real estate market is published as Linked Data. It can be accessed with [SPARQL queries](https://www.w3.org/TR/rdf-sparql-query/). \n", "You can send queries using HTTP requests. The API endpoint is **[https://ld.stadt-zuerich.ch/query](https://ld.stadt-zuerich.ch/query).** \n", " \n", " \n", "Let's use `SparqlClient` from [graphly](https://github.com/zazuko/graphly) to communicate with the database. \n", "Graphly will allow us to:\n", "* send SPARQL queries\n", "* automatically add prefixes to all queries\n", "* format response to `pandas` or `geopandas`" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "# Uncomment to install dependencies in Colab environment\n", "#!pip install git+https://github.com/zazuko/graphly.git" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "import datetime\n", "import re\n", "\n", "import pandas as pd\n", "import numpy as np\n", "\n", "import plotly.express as px\n", "import plotly.graph_objects as go\n", "from plotly.subplots import make_subplots\n", "\n", "from graphly.api_client import SparqlClient" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "sparql = SparqlClient(\"https://ld.stadt-zuerich.ch/query\")\n", "sparql.add_prefixes({\n", " \"schema\": \"\",\n", " \"cube\": \"\",\n", " \"property\": \"\",\n", " \"measure\": \"\",\n", " \"skos\": \"\",\n", " \"ssz\": \"\"\n", "})" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "SPARQL queries can become very long. To improve the readibility, we will work wih [prefixes](https://en.wikibooks.org/wiki/SPARQL/Prefixes).\n", " \n", "Using the `add_prefixes` method, we define persistent prefixes. \n", "Every time you send a query, `graphly` will automatically add the prefixes for you." ], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Housing prices\n", "\n", "\n", "Let's find the average price per m2 for an apartment in Zurich. This data is available in the `QMP-EIG-HAA-OBJ-ZIM` data cube. It will allow us to find the price per city district and apartment type. The data is also available for different points in time. \n", "\n", "The query for housing prices in city of Zürich for different districts and apartment types over time looks as follows:\n", "\n" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "query = \"\"\"\n", "SELECT ?time ?place ?rooms ?price\n", "FROM \n", "WHERE {\n", " ssz:QMP-EIG-HAA-OBJ-ZIM a cube:Cube;\n", " cube:observationSet/cube:observation ?observation. \n", " ?observation property:TIME ?time ;\n", " property:RAUM ?place_uri;\n", " property:ZIM/schema:name ?rooms;\n", " measure:QMP ?price .\n", " ?place_uri skos:inScheme ;\n", " schema:name ?place .\n", " FILTER regex(str(?place),\"ab|Stadtgebiet vor\")\n", " FILTER (?price > 0)\n", "}\n", "ORDER BY ?time\n", "\"\"\"\n", "\n", "df = sparql.send_query(query)\n", "df.head()" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Prices per apartment type\n", "\n", "Let's visualize the housing prices per apartment type. To do this, we will aggregate the prices per `rooms`. \n", "The cleaned dataframe becomes:" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "df.place = df.place.apply(lambda x: re.findall('Kreis \\d+', x)[0])\n", "df.rooms = df.rooms.apply(lambda x: int(re.findall('\\d+', x)[0]))\n", "plot_df = df[[\"rooms\", \"price\"]][df.time == df.time.max()].groupby([\"rooms\"]).mean().astype(int).sort_values(by=\"rooms\").reset_index()\n", "plot_df" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "fig = px.bar(plot_df, x=\"rooms\", y=\"price\")\n", "fig.update_layout(\n", " title='Housing prices in Zürich', \n", " title_x=0.5,\n", " yaxis_title=\"CHF per m2\",\n", " xaxis_title=\"rooms\"\n", ")\n", "fig.show()" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Prices per district\n", "\n", "Let's visualize the housing prices per district. To do this, we will aggregate the prices per `place`. \n", "The cleaned dataframe becomes:" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "plot_df = df[[\"place\", \"price\"]][df.time == df.time.max()].groupby([\"place\"]).mean().astype(int).sort_values(by=\"price\").reset_index()" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "fig = px.bar(plot_df, x=\"place\", y=\"price\")\n", "fig.update_layout(\n", " title='Housing prices in Zürich', \n", " title_x=0.5,\n", " yaxis_title=\"CHF per m2\",\n", " xaxis_title=\"district\",\n", ")\n", "fig.show()" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Prices per district and apartment type\n", "\n", "Now, let's combine both views. We will visualize housing prices across districts and apartment types. \n", "The dataframe for plotting becomes:" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "plot_df = df[[\"place\", \"rooms\", \"price\"]][df.time == df.time.max()]\n", "plot_df = plot_df[plot_df.rooms <= 4]\n", "plot_df[\"avg_price\"] = plot_df[[\"price\", \"place\"]].groupby([\"place\"]).transform(lambda x: sum(x)/len(x))\n", "plot_df = plot_df.sort_values(by=[\"avg_price\", \"rooms\"])\n", "\n", "plot_df.head()" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "fig = make_subplots(rows=3, cols=4, subplot_titles=plot_df[\"place\"].unique(), shared_yaxes=True, y_title='CHF per m2', x_title='rooms', vertical_spacing=0.1)\n", "\n", "for i, district in enumerate(plot_df[\"place\"].unique()):\n", " \n", " row = i//4 + 1\n", " col = i%4 + 1\n", " subset = plot_df[plot_df[\"place\"] == district]\n", " fig.append_trace(go.Bar(\n", " x=subset[\"rooms\"],\n", " y=subset[\"price\"],\n", " name=district, \n", " marker_color=px.colors.qualitative.Dark24[0]\n", " ), row=row, col=col)\n", "\n", "fig.update_layout(height=800, width=1000, title={\"text\": \"Housing prices in Zürich\", \"x\": 0.5}, showlegend=False)\n", "fig.update_yaxes(range=[0,20000])\n", "fig.show()" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "

City districts

\n", "\"drawing\" \n", "\n", "\n", "\n" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Prices over time\n", "\n", "Let's take a look at averge housing prices over time. We will aggregate the results by `time`. \n", "Here is the dataframe and the graph:" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "plot_df = df[[\"time\", \"price\"]].groupby([\"time\"]).mean().reset_index()\n", "plot_df.head()" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "fig = px.line(plot_df, x='time', y=\"price\")\n", "fig.update_layout(\n", " title='Housing prices in Zürich', \n", " title_x=0.5,\n", " yaxis_title=\"CHF per m2\",\n", " xaxis_title=\"Year\",\n", ")\n", "fig.show()" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Prices over time for various apartment types\n", "\n", "Now that we have seen the trend, let's dig a bit deeper. We will visualize housing prices over time for 2, 4 and 6-room apartments.\n", "To achieve this, we need to reshape our dataframe to show prices per apartment type over time. \n", "\n", "The reshaped dataframe becomes:" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "plot_df = pd.pivot_table(df, index=\"time\", columns=\"rooms\", values=\"price\", aggfunc=np.mean).reset_index()\n", "plot_df.head()" ], "outputs": [], "metadata": { "scrolled": true } }, { "cell_type": "code", "execution_count": null, "source": [ "fig = make_subplots(rows=3, cols=1, y_title='CHF per m2')\n", "\n", "for i, j in enumerate([2,4,6]):\n", " \n", " fig.append_trace(go.Scatter(\n", " x=plot_df[\"time\"],\n", " y=plot_df[j],\n", " name=\"Rooms: {}\".format(j), \n", " marker_color=px.colors.qualitative.Dark24[i]\n", " ), row=i+1, col=1)\n", "\n", "fig.update_layout(title={\"text\": \"Housing prices in Zürich\", \"x\": 0.5}, showlegend=True)\n", "fig.update_yaxes(range=[8000,14000])\n", "fig.show()" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Apartments and population growth\n", "\n", "Real estate prices are influenced by:\n", "* Number of available apartments\n", "* Number of people willing to buy a an apartment\n", "\n", "Let's take a look at how those numbers evolved over time. Mainly, we want find the apartments and population count in city of Zurich. \n", "\n", "The query for the number of inhabitants and apartments over time looks as follows:" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "query = \"\"\"\n", "SELECT * \n", "FROM \n", "WHERE{ \n", " {\n", " SELECT ?time (SUM(?pop_count) AS ?pop)\n", " WHERE {\n", " ssz:BEW a cube:Cube;\n", " cube:observationSet/cube:observation ?obs_bew. \n", " ?obs_bew property:TIME ?time ;\n", " property:RAUM ?place_uri_pop;\n", " measure:BEW ?pop_count .\n", "\n", " ?place_uri_pop skos:inScheme ;\n", " schema:name ?place_pop .\n", "\n", " FILTER regex(str(?place_pop),\"ab|Stadtgebiet vor\")\n", " }\n", " GROUP BY ?time\n", " }\n", " {\n", " SELECT ?time (SUM(?apt_count) AS ?apts)\n", " WHERE {\n", " ssz:WHG a cube:Cube;\n", " cube:observationSet/cube:observation ?obs_apt. \n", " ?obs_apt property:TIME ?time ;\n", " property:RAUM ?place_uri_apt;\n", " measure:WHG ?apt_count .\n", "\n", " ?place_uri_apt skos:inScheme ;\n", " schema:name ?place .\n", "\n", " FILTER regex(str(?place),\"ab|Stadtgebiet vor\")\n", " }\n", " GROUP BY ?time\n", " }\n", "}\n", "ORDER BY ?time\n", "\"\"\"\n", "\n", "df = sparql.send_query(query)\n", "df.head()" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "fig = make_subplots(specs=[[{\"secondary_y\": True}]])\n", "\n", "fig.add_trace(\n", " go.Scatter(x=df[\"time\"], y=df[\"pop\"], name=\"Population\"),\n", " secondary_y=False,\n", ")\n", "\n", "fig.add_trace(\n", " go.Scatter(x=df[\"time\"], y=df[\"apts\"], name=\"Apartments\"),\n", " secondary_y=True,\n", ")\n", "\n", "fig.update_layout(title={\"text\": \"Population and Apartments in Zürich\", \"x\": 0.5})\n", "fig.update_yaxes(title_text=\"population\", secondary_y=False)\n", "fig.update_yaxes(title_text=\"apartments\", secondary_y=True)\n", "fig.show()" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Apartment types over time\n", "\n", "Let's take a look at what apartments were, and are available in Zurich. How did the share of various apparmnent types evolve over time?\n", "Is there a trend to build more smaller (or bigger) apartments?\n", "These insights are available in the `WHG-ZIM` data cube.\n", "\n", "The query for the number of different apartment types over time looks as follows:" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "query = \"\"\"\n", "SELECT ?time ?rooms (SUM(?count) AS ?apts)\n", "FROM \n", "WHERE {\n", " ssz:WHG-ZIM a cube:Cube;\n", " cube:observationSet/cube:observation ?obs. \n", " ?obs property:TIME ?time ;\n", " property:RAUM ?place_uri;\n", " property:ZIM/schema:name ?rooms ;\n", " measure:WHG ?count .\n", "\n", " ?place_uri skos:inScheme ;\n", " schema:name ?place .\n", " \n", " FILTER regex(str(?place),\"ab|Stadtgebiet vor\")\n", " FILTER (?time >= \"1977-01-01\"^^xsd:time)\n", "}\n", "GROUP BY ?time ?rooms\n", "ORDER BY ?time ?rooms\n", "\"\"\"\n", "\n", "df = sparql.send_query(query)\n", "df.head()" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "Let's reshape our dataframe to show the apartment count per apartment type, over time:" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "def rename_rooms(x: str) -> str:\n", " \n", " elements = x.split(\"- und \")\n", " if elements[1] == \"mehr-Zimmer Wohnung\":\n", " return elements[0] + \"+\"\n", " else:\n", " return elements [0]\n", "\n", "\n", "df.rooms = df.rooms.apply(rename_rooms)\n", "df = pd.pivot_table(df, index=\"time\", columns=\"rooms\", values=\"apts\")\n", "\n", "# Data cleaning\n", "df[\"5+\"][df[\"5+\"].isna()] = df[\"5\"][df[\"5+\"].isna()] + df[\"6+\"][df[\"5+\"].isna()]\n", "df = df[[\"1\", \"2\", \"3\", \"4\", \"5+\"]].reset_index().rename_axis(None, axis=1)\n", "df = df.fillna(method=\"ffill\")\n", "df.head()" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "cols = [\"1\", \"2\", \"3\", \"4\", \"5+\"]\n", "start = df[cols].iloc[0] / sum(df[cols].iloc[0])\n", "end = df[cols].iloc[-1] / sum(df[cols].iloc[-1])\n", "diff = abs(end-start)\n", "cols = [x for _, x in sorted(zip(diff, cols))]\n", "\n", "fig = px.histogram(df, x=\"time\", y=cols, barnorm=\"percent\")\n", "fig.update_layout(\n", " title='Apartments in Zürich', \n", " title_x=0.5,\n", " yaxis_title=\"Share of apartments in %\",\n", " xaxis_title=\"Year\",\n", " legend={\"title\": \"Rooms\"}\n", ")\n", "fig['layout']['yaxis']['range'] = [0,100]\n", "fig.show()" ], "outputs": [], "metadata": {} } ], "metadata": { "kernelspec": { "display_name": "zazuko", "language": "python", "name": "zazuko" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.0" }, "title": "Real estate" }, "nbformat": 4, "nbformat_minor": 5 }