{ "cells": [ { "cell_type": "markdown", "id": "0ea02b0c", "metadata": {}, "source": [ "
\n", " \n", "
\n", " \n", "
Run in Google Colab
\n", "
\n", "
\n", " \n", "
\n", " \n", "
Download Notebook
\n", "
\n", "
\n", " \n", "
\n", " \n", "
View on GitHub
\n", "
\n", "
\n", "
\n", "\n", "# Swiss Parliament: Members\n", "\n", "The Federal Chancellery maintains data on the Swiss political system. The curia dataset is publicly available and it provides data on political parties, parliamentary comissions, members of parliament and their affiliations. \n", "\n", "Parliament data is also available as [Linked Data](https://en.wikipedia.org/wiki/Linked_data). " ] }, { "cell_type": "markdown", "id": "06f10cf8", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "markdown", "id": "9b91c431", "metadata": {}, "source": [ "### SPARQL endpoint\n", "\n", "Swiss political data can be accessed with [SPARQL queries](https://www.w3.org/TR/rdf-sparql-query/). \n", "You can send queries using HTTP requests. The API endpoint is **[https://lindas.admin.ch/query/](https://lindas.admin.ch/query).** " ] }, { "cell_type": "markdown", "id": "b7a85d46", "metadata": {}, "source": [ "### SPARQL client\n", "\n", "We will use the `SparqlClient` from [graphly](https://github.com/zazuko/graphly) to communicate with the database. \n", "Graphly will allow us to:\n", "* send SPARQL queries\n", "* automatically add prefixes to all queries\n", "* format response to `pandas` or `geopandas`" ] }, { "cell_type": "code", "execution_count": null, "id": "2e5953d3", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import plotly.express as px\n", "import plotly.graph_objects as go\n", "from plotly.subplots import make_subplots\n", "import datetime\n", "import warnings\n", "\n", "from graphly.api_client import SparqlClient\n", "\n", "%matplotlib inline\n", "pd.set_option('display.max_rows', 100)\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "code", "execution_count": null, "id": "5abffb11", "metadata": {}, "outputs": [], "source": [ "# Uncomment to install dependencies in Colab environment\n", "#!pip install git+https://github.com/zazuko/graphly.git" ] }, { "cell_type": "markdown", "id": "d06f39ee", "metadata": {}, "source": [] }, { "cell_type": "code", "execution_count": null, "id": "5d2666e4", "metadata": {}, "outputs": [], "source": [ "sparql = SparqlClient(\"https://int.lindas.admin.ch/query\")\n", "\n", "sparql.add_prefixes({\n", " \"schema\": \"\"\n", "})" ] }, { "cell_type": "markdown", "id": "1ffbace4", "metadata": {}, "source": [ "SPARQL queries can become very long. To improve the readibility, we will work wih [prefixes](https://en.wikibooks.org/wiki/SPARQL/Prefixes).\n", " \n", "Using the `add_prefixes` method, we can define persistent prefixes. \n", "Every time you send a query, `graphly` will now automatically add the prefixes for you." ] }, { "cell_type": "markdown", "id": "71526fe9", "metadata": {}, "source": [ "## MEP by gender, age and chamber" ] }, { "cell_type": "code", "execution_count": null, "id": "7b7bde5c", "metadata": {}, "outputs": [], "source": [ "# https://s.zazuko.com/5nEJGu\n", "query = \"\"\"\n", "SELECT ?council ?name ?gender ?age ?start\n", "FROM \n", "FROM \n", "WHERE {\n", " \n", " VALUES ?_council { }\n", " ?member a schema:Person ;\n", " schema:name ?name;\n", " schema:gender ?gender;\n", " schema:birthDate ?birthday;\n", " schema:memberOf ?role.\n", " \n", " ?role a schema:Role;\n", " schema:member ?member;\n", " schema:startDate ?start;\n", " schema:memberOf ?_council.\n", " \n", " ?_council schema:name ?council.\n", " \n", " \n", " FILTER (lang(?council) = \"de\")\n", " FILTER NOT EXISTS { ?role schema:endDate ?end }\n", " \n", " BIND( \"2022-06-16\"^^ as ?today )\n", " BIND(YEAR(?today)-YEAR(?birthday) as ?age)\n", "}\n", "ORDER BY ?age\n", "\"\"\"\n", "\n", "df = sparql.send_query(query)\n", "df = df.replace({\"http://schema.org/Male\": \"Male\", \"http://schema.org/Female\": \"Female\"})" ] }, { "cell_type": "code", "execution_count": null, "id": "bee24a15", "metadata": {}, "outputs": [], "source": [ "bucket=5\n", "df[\"bucket\"] = df.age.apply(lambda x: \"{}-{}\".format((x//bucket)*bucket, (x//bucket+1)*bucket))\n", "counts = df.groupby([\"bucket\", \"gender\"]).size().reset_index(name='count')\n", "\n", "fig = px.bar(counts, x=\"bucket\", y=\"count\", color=\"gender\",\n", " barmode=\"stack\", color_discrete_sequence=[px.colors.qualitative.Plotly[1],px.colors.qualitative.Plotly[0]])\n", "\n", "fig.update_layout(\n", " title='Members of Parliament (as of today)', \n", " title_x=0.5,\n", " yaxis_title=\"Share in %\",\n", " xaxis_title=\"Age\"\n", ")\n", "fig.show()" ] }, { "cell_type": "markdown", "id": "2079a499", "metadata": {}, "source": [ "### Age distribution" ] }, { "cell_type": "code", "execution_count": null, "id": "fa2832e1", "metadata": {}, "outputs": [], "source": [ "fig = make_subplots(rows=2, cols=1, shared_yaxes=True, y_title='Share in %', x_title='Age')\n", "\n", "for i, gender in enumerate(df.gender.unique()[::-1]):\n", " subset=df[df.gender == gender][\"age\"]\n", " fig.append_trace(go.Histogram(x=subset, histnorm='percent', xbins=dict(start=25, end=80,size=5), name=gender), row=i+1, col=1)\n", "fig.update_layout(\n", " title='Members of Parliament (as of today)', \n", " title_x=0.5,\n", " bargap=0.1\n", ")\n", "fig.update_yaxes(range=[0,27.5])\n", "fig.update_xaxes(range=[25,80])\n", "fig.show()" ] }, { "cell_type": "markdown", "id": "7f94955c", "metadata": {}, "source": [ "### Gender distribution by chamber" ] }, { "cell_type": "code", "execution_count": null, "id": "c4d5831c", "metadata": {}, "outputs": [], "source": [ "counts = df.groupby([\"council\", \"gender\"]).size().reset_index(name=\"count\")\n", "\n", "fig = make_subplots(rows=1, cols=2, subplot_titles=counts[\"council\"].unique(), specs=[[{\"type\": \"pie\"}, {\"type\": \"pie\"}]])\n", "\n", "for i, council in enumerate(counts[\"council\"].unique()):\n", " fig.add_trace(go.Pie(\n", " values=counts[counts.council == council][\"count\"],\n", " labels=counts[counts.council == council][\"gender\"]\n", " ), row=1, col=i+1)\n", "\n", "fig.update_annotations(yshift=-280)\n", "fig.update_layout(height=400, title={\"text\": \"Members of Parliament (as of today)\", \"x\": 0.5})\n", "fig.show()" ] }, { "cell_type": "markdown", "id": "2804c846", "metadata": {}, "source": [ "## MEP gender over time" ] }, { "cell_type": "code", "execution_count": null, "id": "c6feb602", "metadata": {}, "outputs": [], "source": [ "# https://s.zazuko.com/3VYuVk\n", "query = \"\"\"\n", "SELECT ?name ?gender ?birthday ?start ?end\n", "FROM \n", "FROM \n", "WHERE {\n", " \n", " VALUES ?_council { }\n", " ?member a schema:Person ;\n", " schema:name ?name;\n", " schema:gender ?gender;\n", " schema:birthDate ?birthday;\n", " schema:memberOf ?role.\n", " \n", " ?role a schema:Role;\n", " schema:member ?member;\n", " schema:startDate ?start;\n", " schema:memberOf ?_council.\n", " \n", " OPTIONAL {?role schema:endDate ?end.}\n", " \n", "}\n", "ORDER BY ?start\n", "\"\"\"\n", "\n", "df = sparql.send_query(query)\n", "df.end = df.end.fillna(datetime.datetime.now())" ] }, { "cell_type": "code", "execution_count": null, "id": "e84300bf", "metadata": {}, "outputs": [], "source": [ "m = []\n", "f = []\n", "yrs = range(1965,2021)\n", "\n", "for year in yrs:\n", " \n", " point_in_time = datetime.datetime(year, 7, 1, 12, 0, 0)\n", " subset = df[(df.start <= point_in_time) & (point_in_time < df.end)]\n", " counts = subset.gender.value_counts()\n", " m.append(counts[\"http://schema.org/Male\"])\n", " if \"http://schema.org/Female\" in counts:\n", " f.append(counts[\"http://schema.org/Female\"])\n", " else:\n", " f.append(0)\n", "\n", "res = pd.DataFrame(data={\"year\": yrs, \"male\": m, \"female\": f})\n", "res[\"f_share\"] = res[\"female\"]/(res[\"female\"] + res[\"male\"])" ] }, { "cell_type": "code", "execution_count": null, "id": "a0954e6e", "metadata": {}, "outputs": [], "source": [ "fig = make_subplots(rows=1, cols=1)\n", "\n", "fig.append_trace(go.Bar(x=res[\"year\"],y=(1-res[\"f_share\"])*100,\n", " name=\"male\",\n", " marker_color=px.colors.qualitative.Plotly[0]), row=1, col=1)\n", "\n", "fig.append_trace(go.Bar(x=res[\"year\"],y=(res[\"f_share\"])*100,\n", " name=\"female\",\n", " marker_color=px.colors.qualitative.Plotly[1]), row=1, col=1)\n", "\n", "fig['layout']['yaxis']['title']='MPs share in %'\n", "fig['layout']['yaxis']['range']= [0,100]\n", "fig.update_layout(height=800, title={\"text\": \"Female in Swiss Parliment\", \"x\": 0.5}, barmode = \"stack\",\n", " legend = {\"x\": 1, \"y\": 0.37})\n", "fig.show()" ] }, { "cell_type": "markdown", "id": "90b90494", "metadata": {}, "source": [ "## MEP: age over time" ] }, { "cell_type": "code", "execution_count": null, "id": "201fb7b9", "metadata": {}, "outputs": [], "source": [ "yrs = range(1850, 2021, 4)\n", "#bucket=10\n", "#all_buckets = set(range(2,9))\n", "bucket=5\n", "all_buckets = set(range(5,17))\n", "bucket2age = {i: \"{}-{}\".format(i*bucket, (i+1)*bucket-1) for i in all_buckets}\n", "\n", "res = pd.DataFrame()\n", "average_age = pd.DataFrame(columns=[\"year\", \"Male\", \"Female\"])\n", "for year in yrs:\n", " \n", " point_in_time = datetime.datetime(year, 7, 1, 12, 0, 0)\n", " subset = df[(df.start <= point_in_time) & (point_in_time < df.end)]\n", " subset.loc[:,\"age\"] = point_in_time.year - subset.birthday.dt.year\n", " males_only = subset['gender'] == 'http://schema.org/Male'\n", " average_age_male = subset.loc[males_only, 'age'].mean()\n", " females_only = subset['gender'] == 'http://schema.org/Female'\n", " average_age_female = subset.loc[females_only, 'age'].mean()\n", " #average_age = subset[\"age\"].mean()\n", " subset.loc[:,\"bucket\"] = subset.age.apply(lambda x: x//bucket)\n", " grouped = subset.groupby([\"bucket\"]).size().reset_index(name='count')\n", " grouped.loc[:, \"count\"] = grouped[\"count\"]/(grouped[\"count\"].sum())*100\n", " for b in all_buckets.difference(set(grouped.bucket.unique())):\n", " grouped = grouped.append({\"bucket\": b, \"count\": 0.0}, ignore_index=True)\n", " grouped.loc[:, \"year\"] = year\n", " res = res.append(grouped)\n", " average_age.loc[len(average_age.index)] = [year, average_age_male, average_age_female]\n", " \n", "res = res.sort_values(by=[\"year\", \"bucket\"]).reset_index(drop=True)\n", "res.loc[:,\"age\"] = res[\"bucket\"].replace(bucket2age)\n", " " ] }, { "cell_type": "code", "execution_count": null, "id": "afda0db1", "metadata": {}, "outputs": [], "source": [ "fig = px.bar(res, x=\"age\", y=\"count\", animation_frame=\"year\", range_y=[0,35])\n", "fig.update_layout(\n", " title='Age Distribution', \n", " title_x=0.5,\n", " yaxis_title=\"Share in %\",\n", " xaxis_title=\"Age\",\n", ")\n", "fig.show()\n", "\n", "fig_average = px.bar(average_age, x=\"year\", y=[\"Male\", \"Female\"], barmode=\"group\")\n", "fig_average.update_layout(\n", " title='Age over time', \n", " title_x=0.5,\n", " yaxis_title=\"Average Age\",\n", " xaxis_title=\"Year\",\n", " legend=dict(title=None)\n", ")\n", "fig_average.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.0" }, "title": "Members of Parliament", "vscode": { "interpreter": { "hash": "52612798f5eced5fbe3f58e9481faf8401bf1ada0bff399e6095e3d5ee493763" } } }, "nbformat": 4, "nbformat_minor": 5 }