{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Background\n", "\n", "See the [blog post](http://datenspieler.com/Karten-NOE) for the idea and background behind this notebook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Load required packages" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import folium\n", "import json" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Gather the data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Population data\n", "\n", "Population data for Lower Austria (Niederösterreich), described [here](https://www.data.gv.at/katalog/dataset/land-noe-bevolkerung-nach-alter-und-geschlecht). Specifically, this [CSV file](http://open-data.noe.gv.at/RU2/noe_pop_age_sex_2012_2015_lau2.csv) is used." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "FILE_BEVOELKERUNG = 'noe_pop_age_sex_2012_2015_lau2.csv'" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NUTS1NUTS2NUTS3LAU2_CODELAU2_NAMEAGE_GROUPPOP_TOTALPOP_MALEPOP_FEMALEYEAR
0AT1AT12AT12430101Krems an der Donau5_99504864642015
1AT1AT12AT12430101Krems an der Donau10_149714844872015
2AT1AT12AT12430101Krems an der Donau15_1911845915932015
\n", "
" ], "text/plain": [ " NUTS1 NUTS2 NUTS3 LAU2_CODE LAU2_NAME AGE_GROUP POP_TOTAL \\\n", "0 AT1 AT12 AT124 30101 Krems an der Donau 5_9 950 \n", "1 AT1 AT12 AT124 30101 Krems an der Donau 10_14 971 \n", "2 AT1 AT12 AT124 30101 Krems an der Donau 15_19 1184 \n", "\n", " POP_MALE POP_FEMALE YEAR \n", "0 486 464 2015 \n", "1 484 487 2015 \n", "2 591 593 2015 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bev = pd.read_csv(FILE_BEVOELKERUNG, encoding='latin-1', delimiter=';', decimal=',', skiprows=1)\n", "bev[:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Formatieren des DatenFrames: Löschen von unbenötigten Spalten, Umbenennen von Spalten, Umformatieren und Berechnen des durchschnittlichen Alters pro Altersgruppe." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
iso_gemeindename_gemeindeAGE_GROUPPOP_TOTALPOP_MALEPOP_FEMALEYEARiso_bezirkalter_durchschnitt
030101Krems an der Donau5_995048646420153017.0
130101Krems an der Donau10_14971484487201530112.0
230101Krems an der Donau15_191184591593201530117.0
\n", "
" ], "text/plain": [ " iso_gemeinde name_gemeinde AGE_GROUP POP_TOTAL POP_MALE POP_FEMALE \\\n", "0 30101 Krems an der Donau 5_9 950 486 464 \n", "1 30101 Krems an der Donau 10_14 971 484 487 \n", "2 30101 Krems an der Donau 15_19 1184 591 593 \n", "\n", " YEAR iso_bezirk alter_durchschnitt \n", "0 2015 301 7.0 \n", "1 2015 301 12.0 \n", "2 2015 301 17.0 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bev = bev.drop(['NUTS1', 'NUTS2', 'NUTS3'], axis=1)\n", "bev = bev.rename(columns={'LAU2_CODE':'iso_gemeinde', 'LAU2_NAME':'name_gemeinde'})\n", "bev['iso_gemeinde'] = bev.iso_gemeinde.astype('str')\n", "bev['iso_bezirk'] = bev.iso_gemeinde.str[0:3]\n", "bev['alter_durchschnitt'] = bev.AGE_GROUP.str.split('_', expand=True).replace('', np.nan).astype('float').mean(axis=1)\n", "bev[:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Erzeugen einer Liste aller Gemeinden und Bezirke (jeweils ISO code), für die Daten vorhanden sind. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [], "source": [ "iso_gemeinde = bev.iso_gemeinde.unique().astype('str').tolist()\n", "iso_bezirk = bev.iso_bezirk.unique().astype('str').tolist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## GeoJSON Daten\n", "\n", "Daten zur graphischen Darstellung von Bezirken und Gemeinden finden sich [hier](http://www.strategieanalysen.at/wahlen/geojson/). Konkret verwende ich aus [dieser Datei](http://www.strategieanalysen.at/wahlen/geojson/json_94.7z) die Dateien `gemeinden.json` und `bezirke.json`." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "FILE_GEOJSON_GEMEINDE = 'gemeinden.json'\n", "FILE_GEOJSON_BEZIRK = 'bezirke.json'" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "geo_json_data_gemeinde = json.load(open(FILE_GEOJSON_GEMEINDE, encoding='latin-1'))\n", "geo_json_data_bezirk = json.load(open(FILE_GEOJSON_BEZIRK, encoding='latin-1'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The GeoJSON files contain information on all municipalities and districts in Austria. Since I am only interested in Lower Austria here, I build new GeoJSON objects that contain only that information.\n", "\n", "First the `type` entry is copied, then the `features` entries are restricted to Lower Austria." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'FeatureCollection'" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "geo_json_data_gemeinde['type']" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [], "source": [ "geo_gemeinde = dict()\n", "geo_bezirk = dict()\n", "geo_gemeinde['type'] = geo_json_data_gemeinde['type']\n", "geo_bezirk['type'] = geo_json_data_bezirk['type']" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "{'iso': '80214', 'iso_alt': None, 'name': 'Gaissau'}" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "geo_json_data_gemeinde['features'][1]['properties']" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [], "source": [ "geo_gemeinde['features'] = [data for data in geo_json_data_gemeinde['features'] \n", " if data['properties']['iso'] in 
iso_gemeinde]\n", "geo_bezirk['features'] = [data for data in geo_json_data_bezirk['features'] \n", " if data['properties']['iso'] in iso_bezirk]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To also have the district name in our DataFrame `bev`, a mapping from ISO code to district name is first built from the information in `geo_bezirk`. This mapping is then applied to the district ISO code in `bev`." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "{'301': 'Krems an der Donau (Stadt)',\n", " '302': 'St.Poelten (Stadt)',\n", " '303': 'Waidhofen an der Ybbs (Stadt)',\n", " '304': 'Wiener Neustadt (Stadt)',\n", " '305': 'Amstetten',\n", " '306': 'Baden',\n", " '307': 'Bruck an der Leitha',\n", " '308': 'Gaenserndorf',\n", " '309': 'Gmuend',\n", " '310': 'Hollabrunn',\n", " '311': 'Horn',\n", " '312': 'Korneuburg',\n", " '313': 'Krems (Land)',\n", " '314': 'Lilienfeld',\n", " '315': 'Melk',\n", " '316': 'Mistelbach',\n", " '317': 'Moedling',\n", " '318': 'Neunkirchen',\n", " '319': 'Sankt Poelten (Land)',\n", " '320': 'Scheibbs',\n", " '321': 'Tulln',\n", " '322': 'Waidhofen an der Thaya',\n", " '323': 'Wiener Neustadt (Land)',\n", " '324': 'Wien-Umgebung',\n", " '325': 'Zwettl'}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "map_iso_bezirk = {data['properties']['iso']: data['properties']['name'] for data in geo_bezirk['features']}\n", "map_iso_bezirk" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
iso_gemeindename_gemeindeAGE_GROUPPOP_TOTALPOP_MALEPOP_FEMALEYEARiso_bezirkalter_durchschnittname_bezirk
030101Krems an der Donau5_995048646420153017.0Krems an der Donau (Stadt)
130101Krems an der Donau10_14971484487201530112.0Krems an der Donau (Stadt)
230101Krems an der Donau15_191184591593201530117.0Krems an der Donau (Stadt)
\n", "
" ], "text/plain": [ " iso_gemeinde name_gemeinde AGE_GROUP POP_TOTAL POP_MALE POP_FEMALE \\\n", "0 30101 Krems an der Donau 5_9 950 486 464 \n", "1 30101 Krems an der Donau 10_14 971 484 487 \n", "2 30101 Krems an der Donau 15_19 1184 591 593 \n", "\n", " YEAR iso_bezirk alter_durchschnitt name_bezirk \n", "0 2015 301 7.0 Krems an der Donau (Stadt) \n", "1 2015 301 12.0 Krems an der Donau (Stadt) \n", "2 2015 301 17.0 Krems an der Donau (Stadt) " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bev['name_bezirk'] = bev.iso_bezirk.map(map_iso_bezirk)\n", "bev[:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Karten erstellen\n", "\n", "Dank `folium` ist das recht einfach. Zuerst wird eine Karte erstellt, hier muss man die Koordinaten und den Zoomlevel angeben. Dann wird die Bezirksinformation aus dem GeoJSON File hinzugefügt. Und dann noch die Gemeindeinformation, wobei dort noch eine Formatierung (Farbe, Strichbreite, ..) angegeben wird. Zum Schluss zeige die Karte im Notebook an. Die Karte ist interaktiv, sprich man kann darin zoomen, verschieben, ..." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = folium.Map(location=[48.2,15.8], zoom_start=8)\n", "folium.GeoJson(geo_bezirk).add_to(m)\n", "folium.GeoJson(\n", " geo_gemeinde,\n", " style_function=lambda feature:{\n", " 'fillColor': 'red',\n", " 'color' : 'black',\n", " 'weight' : 2,\n", " 'dashArray' : '5, 5'\n", " }\n", ").add_to(m)\n", "m" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Daten hinzufügen\n", "\n", "## Einwohner pro Bezirk\n", "\n", "Nun sollen Daten zu den Karten hinzugefügt werden. Als erstes Beispiel sollen die Bezirke basierend auf der Einwohneranzahl eingefärbt werden. Auch das ist mit `folium` recht einfach. Zuerst erzeuge ich einen Datenframe mit der entsprechenden Information pro Bezirk. Mit dem Befehl `choropleth` lassen sich die Daten dann mit der Karte verbinden. Am Schluss speichere ich die Karte noch in einer extra html-Datei für meinen Blog." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
iso_bezirkname_bezirkPOP_TOTAL
2303Waidhofen an der Ybbs (Stadt)11306
0301Krems an der Donau (Stadt)24011
13314Lilienfeld26074
\n", "
" ], "text/plain": [ " iso_bezirk name_bezirk POP_TOTAL\n", "2 303 Waidhofen an der Ybbs (Stadt) 11306\n", "0 301 Krems an der Donau (Stadt) 24011\n", "13 314 Lilienfeld 26074" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = bev[bev.YEAR == 2015].groupby(['iso_bezirk', 'name_bezirk'])[['POP_TOTAL']].sum().reset_index()\n", "df = df.sort_values(by='POP_TOTAL')\n", "df[:3]" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
iso_bezirkname_bezirkPOP_TOTAL
16317Moedling116878
23324Wien-Umgebung118691
5306Baden141750
\n", "
" ], "text/plain": [ " iso_bezirk name_bezirk POP_TOTAL\n", "16 317 Moedling 116878\n", "23 324 Wien-Umgebung 118691\n", "5 306 Baden 141750" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[-3:]" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\uniqu\\Anaconda3\\lib\\site-packages\\ipykernel\\__main__.py:6: FutureWarning: 'threshold_scale' default behavior has changed. Now you get a linear scale between the 'min' and the 'max' of your data. To get former behavior, use folium.utilities.split_six.\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = folium.Map(location=[48.2, 15.8], zoom_start=8)\n", "m.choropleth(geo_str=geo_bezirk,\n", " data=df,\n", " columns=['iso_bezirk', 'POP_TOTAL'],\n", " key_on='feature.properties.iso',\n", " fill_color='YlOrRd', fill_opacity=0.7, line_opacity=0.3\n", ")\n", "m.save('01.html')\n", "m" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Frauen vs Männer\n", "\n", "In welchem Bezirk wohnen prozentuell am meisten Frauen, wo am wenigsten?" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
iso_gemeindename_gemeindeAGE_GROUPPOP_TOTALPOP_MALEPOP_FEMALEYEARiso_bezirkalter_durchschnittname_bezirk
030101Krems an der Donau5_995048646420153017.0Krems an der Donau (Stadt)
130101Krems an der Donau10_14971484487201530112.0Krems an der Donau (Stadt)
230101Krems an der Donau15_191184591593201530117.0Krems an der Donau (Stadt)
\n", "
" ], "text/plain": [ " iso_gemeinde name_gemeinde AGE_GROUP POP_TOTAL POP_MALE POP_FEMALE \\\n", "0 30101 Krems an der Donau 5_9 950 486 464 \n", "1 30101 Krems an der Donau 10_14 971 484 487 \n", "2 30101 Krems an der Donau 15_19 1184 591 593 \n", "\n", " YEAR iso_bezirk alter_durchschnitt name_bezirk \n", "0 2015 301 7.0 Krems an der Donau (Stadt) \n", "1 2015 301 12.0 Krems an der Donau (Stadt) \n", "2 2015 301 17.0 Krems an der Donau (Stadt) " ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bev[:3]" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
iso_bezirkname_bezirkPOP_MALEPOP_FEMALEFEMALE_RATIO
24325Zwettl215242141899.507526
19320Scheibbs2051820552100.165708
4305Amstetten5645557050101.053937
\n", "
" ], "text/plain": [ " iso_bezirk name_bezirk POP_MALE POP_FEMALE FEMALE_RATIO\n", "24 325 Zwettl 21524 21418 99.507526\n", "19 320 Scheibbs 20518 20552 100.165708\n", "4 305 Amstetten 56455 57050 101.053937" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = bev[bev.YEAR == 2015].groupby(['iso_bezirk', 'name_bezirk'])[['POP_MALE', 'POP_FEMALE']].sum().reset_index()\n", "df['FEMALE_RATIO'] = df.POP_FEMALE / df.POP_MALE * 100\n", "df = df.sort_values('FEMALE_RATIO')\n", "df[:3]" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
iso_bezirkname_bezirkPOP_MALEPOP_FEMALEFEMALE_RATIO
23324Wien-Umgebung5763861053105.924911
0301Krems an der Donau (Stadt)1164412367106.209206
16317Moedling5615360725108.142040
\n", "
" ], "text/plain": [ " iso_bezirk name_bezirk POP_MALE POP_FEMALE FEMALE_RATIO\n", "23 324 Wien-Umgebung 57638 61053 105.924911\n", "0 301 Krems an der Donau (Stadt) 11644 12367 106.209206\n", "16 317 Moedling 56153 60725 108.142040" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[-3:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Man sieht also, dass in Zwettl auf 100 Männer nur 99.5 Frauen kommen, während das in Mödling 108 sind. Auf einer Landkarte sieht das wie folgt aus." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\uniqu\\Anaconda3\\lib\\site-packages\\ipykernel\\__main__.py:6: FutureWarning: 'threshold_scale' default behavior has changed. Now you get a linear scale between the 'min' and the 'max' of your data. To get former behavior, use folium.utilities.split_six.\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = folium.Map(location=[48.2, 15.8], zoom_start=8)\n", "m.choropleth(geo_str=geo_bezirk,\n", " data=df,\n", " columns=['iso_bezirk', 'FEMALE_RATIO'],\n", " key_on='feature.properties.iso',\n", " fill_color='YlOrRd', fill_opacity=0.7, line_opacity=0.3)\n", "m.save('02.html')\n", "m" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Durchschnittliches Alter\n", "\n", "In welcher Gemeinde ist das durchschnittliche Alter am höchsten, wo leben die durchschnittlich jüngsten Leute?" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
iso_gemeindename_gemeindeAGE_GROUPPOP_TOTALPOP_MALEPOP_FEMALEYEARiso_bezirkalter_durchschnittname_bezirk
030101Krems an der Donau5_995048646420153017.0Krems an der Donau (Stadt)
130101Krems an der Donau10_14971484487201530112.0Krems an der Donau (Stadt)
230101Krems an der Donau15_191184591593201530117.0Krems an der Donau (Stadt)
\n", "
" ], "text/plain": [ " iso_gemeinde name_gemeinde AGE_GROUP POP_TOTAL POP_MALE POP_FEMALE \\\n", "0 30101 Krems an der Donau 5_9 950 486 464 \n", "1 30101 Krems an der Donau 10_14 971 484 487 \n", "2 30101 Krems an der Donau 15_19 1184 591 593 \n", "\n", " YEAR iso_bezirk alter_durchschnitt name_bezirk \n", "0 2015 301 7.0 Krems an der Donau (Stadt) \n", "1 2015 301 12.0 Krems an der Donau (Stadt) \n", "2 2015 301 17.0 Krems an der Donau (Stadt) " ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bev[:3]" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
iso_gemeindename_gemeindealter_durchschnitt
44832010Reinsberg37.579310
2130522Oed-Oehling37.845361
28231528Nöchling38.101289
\n", "
" ], "text/plain": [ " iso_gemeinde name_gemeinde alter_durchschnitt\n", "448 32010 Reinsberg 37.579310\n", "21 30522 Oed-Oehling 37.845361\n", "282 31528 Nöchling 38.101289" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = bev[bev.YEAR == 2015].groupby(['iso_gemeinde', 'name_gemeinde'])[['POP_TOTAL', 'alter_durchschnitt']].\\\n", " apply(lambda x: sum(x['POP_TOTAL'] * x['alter_durchschnitt']) / sum(x['POP_TOTAL']))\n", "df = pd.DataFrame(df, columns=['alter_durchschnitt']).reset_index().sort_values('alter_durchschnitt')\n", "df[:3]" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
iso_gemeindename_gemeindealter_durchschnitt
14630925Litschau49.480440
36031805Breitenstein49.651515
18731113Langau50.958021
\n", "
" ], "text/plain": [ " iso_gemeinde name_gemeinde alter_durchschnitt\n", "146 30925 Litschau 49.480440\n", "360 31805 Breitenstein 49.651515\n", "187 31113 Langau 50.958021" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[-3:]" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\uniqu\\Anaconda3\\lib\\site-packages\\ipykernel\\__main__.py:6: FutureWarning: 'threshold_scale' default behavior has changed. Now you get a linear scale between the 'min' and the 'max' of your data. To get former behavior, use folium.utilities.split_six.\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = folium.Map(location=[48.2, 15.8], zoom_start=8)\n", "m.choropleth(geo_str=geo_gemeinde,\n", " data=df,\n", " columns=['iso_gemeinde', 'alter_durchschnitt'],\n", " key_on='feature.properties.iso',\n", " fill_color='YlOrRd', fill_opacity=0.7, line_opacity=0.3)\n", "m.save('03.html')\n", "m" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Kinder\n", "\n", "In welchem Bezirk leben prozentuell die meisten Kinder (jünger als 10 Jahre)?" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
iso_gemeindename_gemeindeAGE_GROUPPOP_TOTALPOP_MALEPOP_FEMALEYEARiso_bezirkalter_durchschnittname_bezirk
030101Krems an der Donau5_995048646420153017.0Krems an der Donau (Stadt)
130101Krems an der Donau10_14971484487201530112.0Krems an der Donau (Stadt)
230101Krems an der Donau15_191184591593201530117.0Krems an der Donau (Stadt)
\n", "
" ], "text/plain": [ " iso_gemeinde name_gemeinde AGE_GROUP POP_TOTAL POP_MALE POP_FEMALE \\\n", "0 30101 Krems an der Donau 5_9 950 486 464 \n", "1 30101 Krems an der Donau 10_14 971 484 487 \n", "2 30101 Krems an der Donau 15_19 1184 591 593 \n", "\n", " YEAR iso_bezirk alter_durchschnitt name_bezirk \n", "0 2015 301 7.0 Krems an der Donau (Stadt) \n", "1 2015 301 12.0 Krems an der Donau (Stadt) \n", "2 2015 301 17.0 Krems an der Donau (Stadt) " ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bev[:3]" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Die Altersgruppe 5-9 hat das durchschnittliche Alter 7. Daher filtere ich für die Anzahl der Kinder, \n", "# nach dem durchschnittlichen Alter <= Variable jung = 7\n", "jung = 7\n", "\n", "df = bev[bev.YEAR == 2015].groupby(['iso_bezirk', 'name_bezirk'])[['POP_TOTAL', 'alter_durchschnitt']].\\\n", " apply(lambda x: sum(x['POP_TOTAL'] * (x['alter_durchschnitt'] <= jung)) / sum(x['POP_TOTAL']) * 100)\n", "df = pd.DataFrame(df, columns=['prozent_jung']).reset_index().sort_values('prozent_jung')" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\uniqu\\Anaconda3\\lib\\site-packages\\ipykernel\\__main__.py:6: FutureWarning: 'threshold_scale' default behavior has changed. Now you get a linear scale between the 'min' and the 'max' of your data. To get former behavior, use folium.utilities.split_six.\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = folium.Map(location=[48.2, 15.8], zoom_start=8)\n", "m.choropleth(geo_str=geo_bezirk,\n", " data=df,\n", " columns=['iso_bezirk', 'prozent_jung'],\n", " key_on='feature.properties.iso',\n", " fill_color='YlOrRd', fill_opacity=0.7, line_opacity=0.3)\n", "m.save('04.html')\n", "m" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Wachstum\n", "\n", "Welche Gemeinden sind am meisten gewachsten bzw. geschrumpft?" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
iso_gemeindename_gemeindeAGE_GROUPPOP_TOTALPOP_MALEPOP_FEMALEYEARiso_bezirkalter_durchschnittname_bezirk
030101Krems an der Donau5_995048646420153017.0Krems an der Donau (Stadt)
130101Krems an der Donau10_14971484487201530112.0Krems an der Donau (Stadt)
230101Krems an der Donau15_191184591593201530117.0Krems an der Donau (Stadt)
\n", "
" ], "text/plain": [ " iso_gemeinde name_gemeinde AGE_GROUP POP_TOTAL POP_MALE POP_FEMALE \\\n", "0 30101 Krems an der Donau 5_9 950 486 464 \n", "1 30101 Krems an der Donau 10_14 971 484 487 \n", "2 30101 Krems an der Donau 15_19 1184 591 593 \n", "\n", " YEAR iso_bezirk alter_durchschnitt name_bezirk \n", "0 2015 301 7.0 Krems an der Donau (Stadt) \n", "1 2015 301 12.0 Krems an der Donau (Stadt) \n", "2 2015 301 17.0 Krems an der Donau (Stadt) " ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bev[:3]" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
iso_gemeindename_gemeindedelta_pop_total
17031038Retzbach-6.945766
25431409Ramsau-6.888634
19031119Röhrenbach-6.842105
\n", "
" ], "text/plain": [ " iso_gemeinde name_gemeinde delta_pop_total\n", "170 31038 Retzbach -6.945766\n", "254 31409 Ramsau -6.888634\n", "190 31119 Röhrenbach -6.842105" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = bev.groupby(['iso_gemeinde', 'name_gemeinde'])[['POP_TOTAL', 'YEAR']].\\\n", " apply(lambda x: \n", " (sum(x['POP_TOTAL'] * (x['YEAR'] == 2015)) / sum(x['POP_TOTAL'] * (x['YEAR'] == 2012)) - 1 ) * 100)\n", "df = pd.DataFrame(df, columns=['delta_pop_total']).reset_index().sort_values('delta_pop_total')\n", "df[:3]" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
iso_gemeindename_gemeindedelta_pop_total
33631701Achau12.881916
8930802Andlersdorf13.953488
12730858Untersiebenbrunn14.356436
\n", "
" ], "text/plain": [ " iso_gemeinde name_gemeinde delta_pop_total\n", "336 31701 Achau 12.881916\n", "89 30802 Andlersdorf 13.953488\n", "127 30858 Untersiebenbrunn 14.356436" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[-3:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mit `folium` kann man die Farbpalette genau angeben. Hier verwenden wir einen Verlauf von rot (starker Rückgang) über weiß hin zu grün (starkes Wachstum)." ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "-6.94576593720266414.356435643564346" ], "text/plain": [ "" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "colormap = folium.colormap.LinearColormap(('red', 'white', 'green'), \n", " index=(df.delta_pop_total.min(), 0, df.delta_pop_total.max()),\n", " vmin = df.delta_pop_total.min(), vmax=df.delta_pop_total.max())\n", "colormap.caption = 'Bevölkerungsentwicklung in %, 2012 vs 2015'\n", "colormap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Um die Karte zu erstellen, benötigt man eine Funktion, die basierend auf dem ISO-Code der Gemeinde die entsprechende Farbe ausgibt." 
] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def colormap_iso(iso):\n", " delta_pop_total = float(df.loc[df.iso_gemeinde == iso, 'delta_pop_total'])\n", " return colormap(delta_pop_total)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'#fff5f5'" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "colormap_iso('30506')" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'30506'" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "geo_gemeinde['features'][0]['properties']['iso']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So that the green of the color palette does not blend with the green of the base map, I use a different map type this time." ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = folium.Map(location=[48.2, 15.8], zoom_start=8, tiles='cartodbpositron')\n", "folium.GeoJson(geo_gemeinde,\n", " style_function=lambda feature: {\n", " 'fillColor': colormap_iso(feature['properties']['iso']),\n", " 'color' : 'black',\n", " 'weight' : 2,\n", " 'dashArray' : '5, 5'\n", " }).add_to(m)\n", "m.add_children(colormap)\n", "m.save('05.html')\n", "m" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 0 }