{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "import json\n", "from pandas.io.json import json_normalize\n", "\n", "from IPython.display import IFrame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Datengui.de Praxis @SciCar" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Disclaimer: \n", "\n", "Datengui.de befindet sich in der Entwicklung. \n", "Es gibt eine API und erste Prototypen, um die API zu benutzen. \n", "Die eigentliche Anwendung ist aber noch nicht da. \n", "Deswegen ist der hier beschriebene Weg etwas umständlich und beeinhaltet technische Hürden. \n", "Trotzdem wollten wir Euch die API schon einmal ausprobieren lassen, denn jeder Hinweis von Euch hilft, um die API in der späteren Anwendung besser nutzbar zu machen. Und falls ihr die Hürden nehmt, könnt ihr die Daten schon nutzen :)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Schritt 1: Datensatz suchen" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Simon Wörpel hat dazu folgende Flask App gebaut: \n", "**[Regionalstatistik: docs.genesapi.org](https://docs.genesapi.org)**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sucht doch mal zum Beispiel nach 'Kühe' " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ergebnis:\n", " \n", "**https://docs.genesapi.org/AI0902**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Runterscrollen bis zum [**Beispiel**](https://api-next.datengui.de/graphql?query=%7B%0A%20%20allRegions%20%7B%0A%20%20%20%20page%0A%20%20%20%20total%0A%20%20%20%20itemsPerPage%0A%20%20%20%20regions%20%7B%0A%20%20%20%20%20%20id%0A%20%20%20%20%20%20name%0A%20%20%20%20%20%20AI0902%20%7B%0A%20%20%20%20%20%20%20%20year%0A%20%20%20%20%20%20%20%20value%0A%20%20%20%20%20%20%7D%0A%20%20%20%20%7D%0A%20%20%7D%0A%7D%0A)" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "{\n", " allRegions {\n", " page\n", " total\n", " itemsPerPage\n", " regions {\n", " id\n", " name\n", " AI0902 {\n", " year\n", " value\n", " }\n", " }\n", " }\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dieser Link führt zu: \n", " [**Datenguide API Playground - alpha**](https://api-next.datengui.de/graphql)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Auf ▶️ Play-Drücken:\n", " \n", "Daten werden abgerufen und rechts angezeigt." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Was könnt und solltet ihr anpassen:** " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- `itemsPerPage`: Wie viele Elemente enthält der Datensatz\n", " \n", "- `nuts`: Das ist die lustige, aber offizielle EU-weite Bezeichnung für Gebiets-Ebenen. Nicht alle Daten sind auf allen Gebiets-Ebenen verfügbar.\n", " - `nuts: 1` steht für die Ebene Bundesländer\n", " - `nuts: 3` steht für die Ebene Kreise und kreisfreie Städte\n", " - Statt `nuts` könnt ihr auch `lau: 1` benutzen, um Daten auf Gemeinde-Ebene abzufragen.\n", " \n", " \n", "Einschränken der Datenabfrage mittels Filter:\n", " \n", "Ihr könnt in Klammern hinter dem Merkmal (`AI0902`) noch Filter setzen:\n", " \n", "- `year`: Zeitpunkt der Daten einschränken, zum Beispiel `year: 2016`, um nur Daten für 2016 zu zeigen. Aktuell sind in unserer (experimentellen) API Daten bis 2017 enthalten.\n", "- *Ausprägungen des Merkmals:*\n", " Viele Merkmale haben noch verschiedene Ausprägungen. Zum Beispiel [Bevölkerungsstand](https://docs.genesapi.org/BEVSTD) – hier könnt ihr die Abfrage durch Attribute eingränzen, zum Beispiel nur Daten für das weibliche Geschlecht abfragen ([Beispiel](https://docs.genesapi.org/BEVSTD/GES))" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "{\n", " allRegions (itemsPerPage: 50) {\n", " page\n", " total\n", " itemsPerPage\n", " regions (nuts: 1) {\n", " id\n", " name\n", " AI0902 (year: 2016) {\n", " year\n", " value\n", " }\n", " }\n", " }\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Schritt 2: Daten speichern" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Rechte Seite komplett markieren und als `kuehe.json` speichern" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Alternative für Experten: Daten direkt in dieses Notebook laden\n", "\n", "Ihr könnt mit der `requests`-Library ganz einfach auch die Antwort der API-Abfrage direkt laden:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import requests\n", "\n", "url = 'https://api-next.datengui.de/graphql'\n", "\n", "query = \"\"\"\n", "{\n", " allRegions (itemsPerPage: 50) {\n", " page\n", " total\n", " itemsPerPage\n", " regions (nuts: 1) {\n", " id\n", " name\n", " AI0902 (year: 2016) {\n", " year\n", " value\n", " }\n", " }\n", " }\n", "}\n", "\"\"\"\n", "\n", "res = requests.post(url, data={'query': query})\n", "d = res.json() # das weiter unten nutzen anstatt es aus der Datei (nächster Schritt) zu laden" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Schritt 3: .json einlesen" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In Zukunft soll das Ganze natürlich in verschiedenen Formaten (zum Beispiel .csv) herunterladbar sein.\n", "Gregor Aisch hat da mal einen Prototypen gebaut: https://gka.me/datenguide/. Rückmeldungen, Wünsche dazu gerne. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hier kommt der Python-Weg:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "#load json object\n", "with open('kuehe.json') as f:\n", " d = json.load(f)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'data': {'allRegions': {'page': 0,\n", " 'total': 16,\n", " 'itemsPerPage': 50,\n", " 'regions': [{'id': '10',\n", " 'name': 'Saarland',\n", " 'AI0902': [{'year': 2016, 'value': 61.7}]},\n", " {'id': '11', 'name': 'Berlin', 'AI0902': [{'year': 2016, 'value': 23.6}]},\n", " {'id': '12',\n", " 'name': 'Brandenburg',\n", " 'AI0902': [{'year': 2016, 'value': 40.4}]},\n", " {'id': '13',\n", " 'name': 'Mecklenburg-Vorpommern',\n", " 'AI0902': [{'year': 2016, 'value': 40.6}]},\n", " {'id': '14', 'name': 'Sachsen', 'AI0902': [{'year': 2016, 'value': 53.7}]},\n", " {'id': '15',\n", " 'name': 'Sachsen-Anhalt',\n", " 'AI0902': [{'year': 2016, 'value': 28.6}]},\n", " {'id': '16',\n", " 'name': 'Thüringen',\n", " 'AI0902': [{'year': 2016, 'value': 41.5}]},\n", " {'id': '01',\n", " 'name': 'Schleswig-Holstein',\n", " 'AI0902': [{'year': 2016, 'value': 110.7}]},\n", " {'id': '02', 'name': 'Hamburg', 'AI0902': [{'year': 2016, 'value': 42.1}]},\n", " {'id': '03',\n", " 'name': 'Niedersachsen',\n", " 'AI0902': [{'year': 2016, 'value': 100.3}]},\n", " {'id': '04', 'name': 'Bremen', 'AI0902': [{'year': 2016, 'value': 124.1}]},\n", " {'id': '05',\n", " 'name': 'Nordrhein-Westfalen',\n", " 'AI0902': [{'year': 2016, 'value': 98.1}]},\n", " {'id': '06', 'name': 'Hessen', 'AI0902': [{'year': 2016, 'value': 57.2}]},\n", " {'id': '07',\n", " 'name': 'Rheinland-Pfalz',\n", " 'AI0902': [{'year': 2016, 'value': 49.1}]},\n", " {'id': '08',\n", " 'name': 'Baden-Württemberg',\n", " 'AI0902': [{'year': 2016, 'value': 69.5}]},\n", " {'id': '09',\n", " 'name': 'Bayern',\n", " 'AI0902': [{'year': 2016, 'value': 101.9}]}]}}}" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "d = d['data']['allRegions']['regions']" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'id': '10', 'name': 'Saarland', 'AI0902': [{'year': 2016, 'value': 61.7}]},\n", " {'id': '11', 'name': 'Berlin', 'AI0902': [{'year': 2016, 'value': 23.6}]},\n", " {'id': '12',\n", " 'name': 'Brandenburg',\n", " 'AI0902': [{'year': 2016, 'value': 40.4}]},\n", " {'id': '13',\n", " 'name': 'Mecklenburg-Vorpommern',\n", " 'AI0902': [{'year': 2016, 'value': 40.6}]},\n", " {'id': '14', 'name': 'Sachsen', 'AI0902': [{'year': 2016, 'value': 53.7}]},\n", " {'id': '15',\n", " 'name': 'Sachsen-Anhalt',\n", " 'AI0902': [{'year': 2016, 'value': 28.6}]},\n", " {'id': '16', 'name': 'Thüringen', 'AI0902': [{'year': 2016, 'value': 41.5}]},\n", " {'id': '01',\n", " 'name': 'Schleswig-Holstein',\n", " 'AI0902': [{'year': 2016, 'value': 110.7}]},\n", " {'id': '02', 'name': 'Hamburg', 'AI0902': [{'year': 2016, 'value': 42.1}]},\n", " {'id': '03',\n", " 'name': 'Niedersachsen',\n", " 'AI0902': [{'year': 2016, 'value': 100.3}]},\n", " {'id': '04', 'name': 'Bremen', 'AI0902': [{'year': 2016, 'value': 124.1}]},\n", " {'id': '05',\n", " 'name': 'Nordrhein-Westfalen',\n", " 'AI0902': [{'year': 2016, 'value': 98.1}]},\n", " {'id': '06', 'name': 'Hessen', 'AI0902': [{'year': 2016, 'value': 57.2}]},\n", " {'id': '07',\n", " 'name': 'Rheinland-Pfalz',\n", " 'AI0902': [{'year': 2016, 'value': 49.1}]},\n", " {'id': '08',\n", " 'name': 'Baden-Württemberg',\n", " 'AI0902': [{'year': 2016, 'value': 69.5}]},\n", " {'id': '09', 'name': 'Bayern', 'AI0902': [{'year': 2016, 'value': 101.9}]}]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "d = json_normalize(d, 'AI0902', ['id', 'name', ['year', 'value']], errors='ignore')" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
yearvalueidnameyear.value
0201661.710SaarlandNaN
1201623.611BerlinNaN
2201640.412BrandenburgNaN
3201640.613Mecklenburg-VorpommernNaN
4201653.714SachsenNaN
5201628.615Sachsen-AnhaltNaN
6201641.516ThüringenNaN
72016110.701Schleswig-HolsteinNaN
8201642.102HamburgNaN
92016100.303NiedersachsenNaN
102016124.104BremenNaN
11201698.105Nordrhein-WestfalenNaN
12201657.206HessenNaN
13201649.107Rheinland-PfalzNaN
14201669.508Baden-WürttembergNaN
152016101.909BayernNaN
\n", "
" ], "text/plain": [ " year value id name year.value\n", "0 2016 61.7 10 Saarland NaN\n", "1 2016 23.6 11 Berlin NaN\n", "2 2016 40.4 12 Brandenburg NaN\n", "3 2016 40.6 13 Mecklenburg-Vorpommern NaN\n", "4 2016 53.7 14 Sachsen NaN\n", "5 2016 28.6 15 Sachsen-Anhalt NaN\n", "6 2016 41.5 16 Thüringen NaN\n", "7 2016 110.7 01 Schleswig-Holstein NaN\n", "8 2016 42.1 02 Hamburg NaN\n", "9 2016 100.3 03 Niedersachsen NaN\n", "10 2016 124.1 04 Bremen NaN\n", "11 2016 98.1 05 Nordrhein-Westfalen NaN\n", "12 2016 57.2 06 Hessen NaN\n", "13 2016 49.1 07 Rheinland-Pfalz NaN\n", "14 2016 69.5 08 Baden-Württemberg NaN\n", "15 2016 101.9 09 Bayern NaN" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "d = d[['value', 'name']]" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
valuename
061.7Saarland
123.6Berlin
240.4Brandenburg
340.6Mecklenburg-Vorpommern
453.7Sachsen
528.6Sachsen-Anhalt
641.5Thüringen
7110.7Schleswig-Holstein
842.1Hamburg
9100.3Niedersachsen
10124.1Bremen
1198.1Nordrhein-Westfalen
1257.2Hessen
1349.1Rheinland-Pfalz
1469.5Baden-Württemberg
15101.9Bayern
\n", "
" ], "text/plain": [ " value name\n", "0 61.7 Saarland\n", "1 23.6 Berlin\n", "2 40.4 Brandenburg\n", "3 40.6 Mecklenburg-Vorpommern\n", "4 53.7 Sachsen\n", "5 28.6 Sachsen-Anhalt\n", "6 41.5 Thüringen\n", "7 110.7 Schleswig-Holstein\n", "8 42.1 Hamburg\n", "9 100.3 Niedersachsen\n", "10 124.1 Bremen\n", "11 98.1 Nordrhein-Westfalen\n", "12 57.2 Hessen\n", "13 49.1 Rheinland-Pfalz\n", "14 69.5 Baden-Württemberg\n", "15 101.9 Bayern" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "d = d.set_index('name')" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
value
name
Saarland61.7
Berlin23.6
Brandenburg40.4
Mecklenburg-Vorpommern40.6
Sachsen53.7
Sachsen-Anhalt28.6
Thüringen41.5
Schleswig-Holstein110.7
Hamburg42.1
Niedersachsen100.3
Bremen124.1
Nordrhein-Westfalen98.1
Hessen57.2
Rheinland-Pfalz49.1
Baden-Württemberg69.5
Bayern101.9
\n", "
" ], "text/plain": [ " value\n", "name \n", "Saarland 61.7\n", "Berlin 23.6\n", "Brandenburg 40.4\n", "Mecklenburg-Vorpommern 40.6\n", "Sachsen 53.7\n", "Sachsen-Anhalt 28.6\n", "Thüringen 41.5\n", "Schleswig-Holstein 110.7\n", "Hamburg 42.1\n", "Niedersachsen 100.3\n", "Bremen 124.1\n", "Nordrhein-Westfalen 98.1\n", "Hessen 57.2\n", "Rheinland-Pfalz 49.1\n", "Baden-Württemberg 69.5\n", "Bayern 101.9" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# d.to_clipboard()\n", "d.to_csv('kuehe.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Schritt 4: Datawrapper" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Neue Karte\n", "- Deutschland: Bundesländer \n", "- Daten einfügen\n", "- Anpassen" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "IFrame('//datawrapper.dwcdn.net/AWDSM/2/', width='600', height='400')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }