{ "cells": [ { "cell_type": "markdown", "id": "88776377", "metadata": {}, "source": [ "
\n", " \n", "
\n", " \n", "
Run in Google Colab
\n", "
\n", "
\n", " \n", "
\n", " \n", "
Download Notebook
\n", "
\n", "
\n", " \n", "
\n", " \n", "
View on GitHub
\n", "
\n", "
\n", "
\n", "\n", "# Economic activities in Zürich\n", "\n", "Zürich Statistical Office collects data on the city and its residents. This data is published as [Linked Data](https://en.wikipedia.org/wiki/Linked_data). \n", "\n", "In this tutorial, we will show **how to work with Linked Data.** Mainly, we will see how to work with data on economic activities. \n", "We will look into how to query, process, and visualize it. \n" ] }, { "cell_type": "markdown", "id": "2924a8b7", "metadata": {}, "source": [ "[1. Restaurants over time](#Restaurants-over-time) \n", "[2. Restaurants in city quartiers](#Restaurants-in-city-quartiers) \n", "[3. After-school care: gender-representation](#After-school-care:-gender-representation)" ] }, { "cell_type": "markdown", "id": "cb4ca49f", "metadata": {}, "source": [ "### SPARQL endpoint\n", "\n", "Data on some economic activities is published as Linked Data. It can be accessed with [SPARQL queries](https://www.w3.org/TR/rdf-sparql-query/). \n", "You can send queries using HTTP requests. The API endpoint is **[https://ld.stadt-zuerich.ch/query](https://ld.stadt-zuerich.ch/query).** \n", " \n", " \n", "Let's use `SparqlClient` from [graphly](https://github.com/zazuko/graphly) to communicate with the database. \n", "Graphly will allow us to:\n", "* send SPARQL queries\n", "* automatically add prefixes to all queries\n", "* format response to `pandas` or `geopandas`" ] }, { "cell_type": "code", "execution_count": null, "id": "db0ac7ed", "metadata": {}, "outputs": [], "source": [ "# Uncomment to install dependencies in Colab environment\n", "#!pip install mapclassify\n", "#!pip install git+https://github.com/zazuko/graphly.git" ] }, { "cell_type": "code", "execution_count": null, "id": "ede96f69", "metadata": {}, "outputs": [], "source": [ "import mapclassify\n", "import matplotlib\n", "import matplotlib.cm\n", "\n", "import pandas as pd\n", "import plotly.express as px\n", "import plotly.graph_objects as go\n", "\n", "from graphly.api_client import SparqlClient" ] }, { "cell_type": "code", "execution_count": null, "id": "5fc411a1", "metadata": {}, "outputs": [], "source": [ "sparql = SparqlClient(\"https://ld.stadt-zuerich.ch/query\")\n", "wikisparql = SparqlClient(\"https://query.wikidata.org/sparql\")\n", "\n", "sparql.add_prefixes({\n", " \"schema\": \"\",\n", " \"cube\": \"\",\n", " \"property\": \"\",\n", " \"measure\": \"\",\n", " \"skos\": \"\",\n", " \"ssz\": \"\"\n", "})" ] }, { "cell_type": "markdown", "id": "c5beeef5", "metadata": {}, "source": [ "SPARQL queries can become very long. To improve the readibility, we will work wih [prefixes](https://en.wikibooks.org/wiki/SPARQL/Prefixes).\n", " \n", "Using `add_prefixes` method, we can define persistent prefixes. \n", "Every time you send a query, `graphly` will now automatically add the prefixes for you." ] }, { "cell_type": "markdown", "id": "bea24838", "metadata": {}, "source": [ "## Restaurants over time\n", "\n", "Let's find the number of restaurants in Zurich over time. This information is available in the `AST-BTA` data cube. To give restaurant numbers a context, let's scale them by population size. The number of inhabitants over time can be found in the `BEW` data cube.\n", "\n", "The query for number of inhabitants and restaurants over time is as follows:\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "058fd0ea", "metadata": {}, "outputs": [], "source": [ "query = \"\"\"\n", "SELECT *\n", "FROM \n", "WHERE {\n", " {\n", " SELECT ?time (SUM(?ast) AS ?restaurants)\n", " WHERE {\n", " ssz:AST-BTA a cube:Cube;\n", " cube:observationSet/cube:observation ?obs_rest. \n", " ?obs_rest property:TIME ?time ; \n", " property:RAUM ;\n", " property:BTA ;\n", " measure:AST ?ast . \n", " }\n", " GROUP BY ?time ?place\n", " }\n", " {\n", " SELECT ?time ?pop\n", " WHERE {\n", " ssz:BEW a cube:Cube;\n", " cube:observationSet/cube:observation ?obs_pop. \n", " ?obs_pop property:TIME ?time ; \n", " property:RAUM ;\n", " measure:BEW ?pop\n", " }\n", " } \n", "}\n", "ORDER BY ?time\n", "\"\"\"\n", "\n", "df = sparql.send_query(query)\n", "df.head()" ] }, { "cell_type": "markdown", "id": "daf5bb10", "metadata": {}, "source": [ "Let's calculate number of restaurants per 10 000 inhabitants" ] }, { "cell_type": "code", "execution_count": null, "id": "1fc70a24", "metadata": {}, "outputs": [], "source": [ "df = df.fillna(method=\"ffill\")\n", "df[\"Restaurants per 10 000 inhabitants\"] = df[\"restaurants\"]/df[\"pop\"]*10000" ] }, { "cell_type": "code", "execution_count": null, "id": "ccc41ff8", "metadata": {}, "outputs": [], "source": [ "fig = px.line(df, x=\"time\", y = \"Restaurants per 10 000 inhabitants\", labels={\"time\": \"Years\"})\n", "fig.update_layout(title_text='Restaurants in Zürich over time', title_x=0.5)" ] }, { "cell_type": "markdown", "id": "6ff63afe", "metadata": {}, "source": [ "## Restaurants in city quartiers \n", "\n", "Let's find the number of restaurants in different parts of the city. The data on restaurants is available in the `AST-BTA` data cube. To show the quartiers on a map, we will need their geographic coordinates. This data is available in `Wikidata`. We will get the number of restaurants per district from our endpoint, and the quartier centroid from `Wikidata`. \n", "\n", "Both information can be obtained using a SPARQL [federated query](https://www.w3.org/TR/sparql11-federated-query/). The endpoint for Wikidata is ``.\n", "\n", "The query for quartiers, and number of restaurants is:" ] }, { "cell_type": "code", "execution_count": null, "id": "b78ba25a", "metadata": {}, "outputs": [], "source": [ "query = \"\"\"\n", "PREFIX p: \n", "PREFIX ps: \n", "\n", "SELECT ?place ?wikidata_iri (SUM(?ast) AS ?restaurants)\n", "WHERE {\n", " \n", " ssz:AST-BTA a cube:Cube;\n", " cube:observationSet/cube:observation ?obs. \n", " \n", " ?obs property:TIME ?time ; \n", " property:RAUM ?place_uri ;\n", " property:BTA/schema:name ?bta ;\n", " measure:AST ?ast .\n", "\n", " ?place_uri skos:inScheme ;\n", " schema:name ?place ;\n", " schema:sameAs ?wikidata_id .\n", " \n", " FILTER (?time = \"2017-12-31\"^^xsd:date)\n", " \n", " BIND(IRI(?wikidata_id ) AS ?wikidata_iri ) .\n", " \n", " FILTER (?bta = \"Verpflegungsbetriebe\")\n", " \n", "}\n", "GROUP BY ?place ?wikidata_iri ?time\n", "\"\"\"\n", "\n", "restaurants = sparql.send_query(query)\n", "restaurants.head()" ] }, { "cell_type": "markdown", "id": "914715f9", "metadata": {}, "source": [ "The query for quartiers' centroids is:" ] }, { "cell_type": "code", "execution_count": null, "id": "4e1bff91", "metadata": {}, "outputs": [], "source": [ "query = \"\"\"\n", "SELECT * WHERE {{\n", " ?wikidata_iri wdt:P31 wd:Q19644586; # All objects being \"statistical neighbourhoods of Zurich\"\n", " p:P625/ps:P625 ?geometry. # Their coordinates\n", "\n", " FILTER(?wikidata_iri IN({}))\n", "}}\n", "\"\"\".format(\"<\" + \">,<\".join(restaurants.wikidata_iri) + \">\")\n", "\n", "geometries = wikisparql.send_query(query)\n", "geometries.head()" ] }, { "cell_type": "markdown", "id": "421aca69", "metadata": {}, "source": [ "By joining `restaurants` and `geometries`, we get:" ] }, { "cell_type": "code", "execution_count": null, "id": "c70d2f72", "metadata": {}, "outputs": [], "source": [ "df = pd.merge(geometries, restaurants, how=\"inner\", on=\"wikidata_iri\")\n", "df.head()" ] }, { "cell_type": "markdown", "id": "82421837", "metadata": {}, "source": [ "Let's classify the number of restaurants into 5 different buckets. We can use the `mapclassify` library to assign values in the `restaurant` column into one of five categories.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "aee11741", "metadata": {}, "outputs": [], "source": [ "N_CATEGORIES = 5\n", "df[\"text\"] = df.place + \"
Restaurants: \" + df.restaurants.astype(int).astype(str)\n", "classifier = mapclassify.NaturalBreaks(y=df[\"restaurants\"], k=N_CATEGORIES)\n", "df[\"rest_buckets\"] = df[[\"restaurants\"]].apply(classifier) " ] }, { "cell_type": "markdown", "id": "8465a34c", "metadata": {}, "source": [ "Classified values can be easily visualized on the map." ] }, { "cell_type": "code", "execution_count": null, "id": "38d2822e", "metadata": {}, "outputs": [], "source": [ "norm = matplotlib.colors.Normalize(vmin=0, vmax=N_CATEGORIES)\n", "colormap = matplotlib.cm.ScalarMappable(norm=norm, cmap=matplotlib.cm.viridis)\n", "labels = mapclassify.classifiers._get_mpl_labels(classifier, fmt=\"{:.0f}\")\n", "\n", "fig = go.Figure()\n", "\n", "for bucket in range(N_CATEGORIES):\n", "\n", " subset = df[df.rest_buckets == bucket]\n", " fig.add_trace(go.Scattermapbox(\n", " mode=\"markers\",\n", " lat=subset.geometry.y,\n", " lon=subset.geometry.x,\n", " hovertext = subset.text,\n", " hoverinfo = \"text\",\n", " name=labels[bucket],\n", " marker={'size': ((subset.restaurants)**1.5)*0.6, \"sizemode\": \"area\", \"sizemin\": 4, \"color\": \"rgba{}\".format(colormap.to_rgba(bucket+1))}, \n", " ))\n", "\n", "fig.update_layout(\n", " margin={'l': 0, 't': 50, 'b': 0, 'r': 0},\n", " mapbox={\n", " 'center': {\"lat\": 47.3815, \"lon\": 8.532},\n", " 'style': \"carto-darkmatter\",\n", " 'zoom': 11},\n", " showlegend=True,\n", " legend_title=\"Restaurants count\",\n", " title_text='Restaurants in Zürich Quartiers', \n", " title_x=0.5\n", ")\n", "\n", "fig.show(\"notebook\")" ] }, { "cell_type": "markdown", "id": "b71df885", "metadata": {}, "source": [ "## After-school care: gender-representation\n", "\n", "Let's take a look at gender representation in the public sector. In the `BES-BTA-SEX` data cube we can find information on the number of employees in different organizations. The data is reported separately for each sex, and various establishment types. Let's find the number of male and female employees in after-school care (*Hort*). \n", "\n", "The query for the number of female and male employees in after-school care over time looks as follows:\n" ] }, { "cell_type": "code", "execution_count": null, "id": "96226935", "metadata": {}, "outputs": [], "source": [ "query = \"\"\"\n", "SELECT ?time ?employees ?sex\n", "FROM \n", "WHERE {\n", " ssz:BES-BTA-SEX a cube:Cube;\n", " cube:observationSet/cube:observation ?obs. \n", " ?obs property:TIME ?time ; \n", " property:RAUM/skos:inScheme ;\n", " property:BTA/schema:name \"Horte\" ;\n", " property:SEX/schema:name ?sex ;\n", " measure:BES ?employees .\n", "}\n", "ORDER BY ?time\n", "\"\"\"\n", "df = sparql.send_query(query)\n", "df.head()" ] }, { "cell_type": "markdown", "id": "b115123c", "metadata": {}, "source": [ "Let's rearrange and rename the columns:" ] }, { "cell_type": "code", "execution_count": null, "id": "10532404", "metadata": {}, "outputs": [], "source": [ "df = pd.pivot_table(df, index=\"time\", columns=\"sex\", values=\"employees\")\n", "df = df.reset_index().rename_axis(None, axis=1)\n", "df = df.rename(columns={\"männlich\": \"male\", \"weiblich\": \"female\"})" ] }, { "cell_type": "code", "execution_count": null, "id": "85d1efdc", "metadata": {}, "outputs": [], "source": [ "fig = px.histogram(df, x=\"time\", y=df.columns, barnorm=\"percent\", labels={'x':'total_bill', 'y':'count'})\n", "fig.update_layout(\n", " title='After-school care: gender representation', \n", " title_x=0.5,\n", " yaxis_title=\"Employees in %\",\n", " xaxis_title=\"Years\"\n", ")\n", "fig['layout']['yaxis']['range'] = [0,100]\n", "fig.show(\"notebook\")" ] } ], "metadata": { "interpreter": { "hash": "7df9fa510e94119d9b6414b7654a807785c4522000477a84b4534377ae180111" }, "kernelspec": { "display_name": "zazuko", "language": "python", "name": "zazuko" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.0" }, "title": "Economy" }, "nbformat": 4, "nbformat_minor": 5 }