{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Analysis of the parliamentary activity of the 14th Legislature: December 2020" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [
"**This notebook is an update to the initial work, which can be reached through the links above and which describes all the options in detail, with full visibility of the code, an explanation of the algorithms used and the whole process of data processing and exploration; it covers the entire legislature up to the end of 2020, and therefore includes the State Budget vote, since the initial version was written before those votes.**\n",
"\n",
"The absolute and relative positioning of the various political parties in the Portuguese Parliament has attracted renewed interest in recent years. The election of deputies from parties with no previous parliamentary presence has fuelled a debate whose ideological implications became visible in a very practical way in the dispute over seating: parties unhappy with the seat they were assigned (“Iniciativa Liberal Descontente Com Lugar Atribuído a Deputado No Parlamento - TSF” 2020), general difficulties in arranging the deputies (Renascença 2019), more or less practical questions of access (Almeida 2019); in short, several dimensions of an issue that ends up revealing the symbolic importance of each party's absolute and relative position in the chamber.\n",
"\n",
"This issue is not particularly new (Lourenço 2020): it arises to a greater or lesser degree whenever new parties enter Parliament, with the newly arrived party having to take a position and reach a (possible) accommodation with the others; its subsequent parliamentary activity (in its various forms) may or may not align with its self-identification (whether or not that is reflected in its seats in the chamber).\n",
"\n",
"The starting point for this analysis was precisely to find out whether, based exclusively on parliamentary activity, and specifically on the voting record, it is possible to establish relations of proximity and distance that allow a grouping which does not depend on a priori classifications and, if so, how these groupings confirm or diverge from the existing perception.\n",
"\n",
"The use of open data made available by the Parliament makes this analysis substantially simpler, although the data still need processing and validation; from a practical point of view, this notebook demonstrates how to access and transform the data in a way that may be useful for other analyses. On the national scene, a reference goes to the initiative http://hemiciclo.pt which, in line with similar European initiatives, provides an interface for greater scrutiny of parliamentary activity and a broad set of direct and indirect indicators of great interest (Sapage 2020). The present work has some points of contact with that initiative, within the limits set by its pedagogical purpose.\n",
"\n",
"Combining open data with a Jupyter notebook gives the reader visibility of the various steps and transformations (Randles et al. 2017), which can at times be excessively complex for those unfamiliar with programming; we have tried to mitigate this limitation by describing the various actions so that the logic can be followed and the results enjoyed. This transparency takes on an additional dimension given the subject we set out to analyse, although it matters across the board (on the importance of repeatability, traceability, access and the role of Jupyter notebooks in the context of open science see, among others, examples in ecology (Powers and Hampton 2019) and astronomy (Wofford et al. 2019)).\n",
"\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Methodology" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [
"Based on the data made available by the Assembleia da República in XML format [DadosAbertos], _dataframes_ (two-dimensional tables) are created by selecting the information on the voting patterns of each party (and/or non-attached deputies).\n",
"\n",
"The following analyses are essentially carried out:\n",
"\n",
"1. Overview of each party's votes, visualised as a _heatmap_\n",
"2. Distance matrix between all parties and dendrogram\n",
"3. Identification of groups (_spectral clustering_) and visualisation of the distances in a Cartesian space (_multidimensional scaling_)\n",
"\n",
"\n",
"The XML data are pre-processed so as to select the votes of each party (or non-attached deputy); this process has some complexity, which stems from the parliamentary voting process itself, with its multiple sessions and votes.\n",
"\n",
"In addition, some further analyses are carried out, further removed from the central goal of determining distances but complementing the overall picture of what is possible." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [
"## Obtaining and processing the data\n",
"\n",
"This phase is fundamental to the rest of the analysis: it is where we obtain the data and transform them into information in a format that can be easily manipulated."
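] },
{ "cell_type": "markdown", "metadata": {}, "source": [
"The idea behind this phase can be illustrated with a minimal sketch, which is not the code used in the analysis: each voting element in the XML becomes one dictionary, and a list of such dictionaries can then be handed to `pandas.DataFrame`. The XML fragment, the `resultado` tag and the names `SAMPLE_XML`, `rows_from_xml` and `sample_rows` are assumptions made for illustration only." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import xml.etree.ElementTree as ET\n",
"\n",
"# Hypothetical, hand-written fragment that only mimics the element names used below;\n",
"# the real files published by the Parliament are far richer.\n",
"SAMPLE_XML = '''\n",
"<root>\n",
"  <pt_gov_ar_objectos_VotacaoOut>\n",
"    <id>1</id>\n",
"    <resultado>Aprovado</resultado>\n",
"    <detalhe>A Favor: PS, BE</detalhe>\n",
"  </pt_gov_ar_objectos_VotacaoOut>\n",
"  <pt_gov_ar_objectos_VotacaoOut>\n",
"    <id>2</id>\n",
"    <resultado>Rejeitado</resultado>\n",
"  </pt_gov_ar_objectos_VotacaoOut>\n",
"</root>\n",
"'''\n",
"\n",
"def rows_from_xml(xml_text):\n",
"    # One dict per voting element, mapping each child tag to its text\n",
"    tree = ET.fromstring(xml_text)\n",
"    return [{child.tag: child.text for child in voting}\n",
"            for voting in tree.findall('.//pt_gov_ar_objectos_VotacaoOut')]\n",
"\n",
"# A list of dicts like this one is what pandas.DataFrame accepts directly\n",
"sample_rows = rows_from_xml(SAMPLE_XML)\n",
"sample_rows"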
] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [
"!pip3 install --user -q itables matplotlib pandas bs4 html5lib lxml seaborn scikit-learn pixiedust\n",
"\n",
"%matplotlib inline\n",
"\n",
"from itables import show\n",
"import itables.options as opt\n",
"\n",
"opt.maxColumns=100\n",
"opt.maxRows=2000\n",
"opt.lengthMenu = [10, 20, 50, 100, 200, 500]\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Obtaining the file and converting it to a dataframe" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [
"from urllib.request import urlopen\n",
"import xml.etree.ElementTree as ET\n",
"\n",
"ini_url = 'http://app.parlamento.pt/webutils/docs/doc.xml?path=6148523063446f764c324679626d56304c3239775a57356b595852684c3052685a47397a51574a6c636e52766379394a626d6c6a6157463061585a68637939595356596c4d6a424d5a57647063327868644856795953394a626d6c6a6157463061585a686331684a566935346257773d&fich=IniciativasXIV.xml&Inline=true'\n",
"ini_tree = ET.parse(urlopen(ini_url))" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [
"from bs4 import BeautifulSoup\n",
"import re\n",
"\n",
"## Iterate through the existing dict\n",
"def party_from_votes (votes):\n",
"    \"\"\"\n",
"    Determines the position of a party based on the majority position by summing all the individual votes.\n",
"    Argument is a dictionary returned by parse_voting()\n",
"    Returns a dictionary with the majority position of each party\n",
"    \"\"\"\n",
"    party_vote = {}\n",
"    for k, v in votes.items():\n",
"        ## Erase the name of the MP and keep the party only\n",
"        ## only when it's not from the \"Ninsc\" group - \n",
"        ## these need to be differentiated by name\n",
"        if re.match(r\".*\\(Ninsc\\)\", k) is None:\n",
"            nk = re.sub(r\".*\\((.+?)\\).*\", r\"\\1\", k)\n",
"        else:\n",
"            nk = k\n",
"        ## If it's the first entry for a key, create it\n",
"        if nk not in party_vote:\n",
"            party_vote[nk] = [0,0,0]\n",
"        ## Add to a specific index in a list\n",
"        if v == \"A Favor\":\n",
"            party_vote[nk][0] += 1\n",
"        elif v == \"Abstenção\":\n",
"            party_vote[nk][1] += 1\n",
"        elif v == \"Contra\":\n",
"            party_vote[nk][2] += 1\n",
"    for k,v in party_vote.items():\n",
"        party_vote[k]=[\"A Favor\", \"Abstenção\", \"Contra\"][v.index(max(v))]\n",
"    return party_vote\n",
"\n",
"def parse_voting(v_str):\n",
"    \"\"\"Parses the voting details in a string and returns a dict.\n",
"    \n",
"    Keyword arguments:\n",
"    \n",
"    v_str: a string with the description of the voting behaviour.\n",
"    \"\"\"\n",
"    ## Split by the HTML line break and put it in a dict\n",
"    d = dict(x.split(':') for x in v_str.split('
'))\n",
"    ## Remove the HTML tags\n",
"    for k, v in d.items():\n",
"        ctext = BeautifulSoup(v, \"lxml\")\n",
"        d[k] = ctext.get_text().strip().split(\",\")\n",
"    ## Invert the dict to get a 1-to-1 mapping\n",
"    ## and trim it\n",
"    votes = {}\n",
"    if len(v_str) < 1000: # Naive approach but realistically speaking... works well enough.\n",
"        for k, v in d.items():\n",
"            for p in v:\n",
"                if (p != ' ' and # Bypass empty entries\n",
"                    re.match(r\"[0-9]+\", p.strip()) is None and # Bypass quantified divergent voting patterns\n",
"                    (re.match(r\".*\\w +\\(.+\\)\", p.strip()) is None or # Bypass individual votes...\n",
"                     re.match(r\".*\\(Ninsc\\)\", p.strip()) is not None)): # ... except when coming from \"Ninsc\"\n",
"                    #print(\"|\"+ p.strip() + \"|\" + \":\\t\" + k)\n",
"                    votes[p.strip()] = k\n",
"    else: # This is a nominal vote since the size of the string is greater than 1000\n",
"        for k, v in d.items():\n",
"            for p in v:\n",
"                if p != ' ':\n",
"                    votes[p.strip()] = k\n",
"    ## Call the auxiliary function to produce the party position based on the majority votes\n",
"    votes = party_from_votes(votes)\n",
"    return votes" ] },
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [
"import collections\n",
"\n",
"root = ini_tree\n",
"\n",
"counter=0\n",
"\n",
"## We will build a dataframe from a list of dicts\n",
"## Inspired by the approach of Chris Moffitt here https://pbpython.com/pandas-list-dict.html\n",
"init_list = []\n",
"\n",
"for voting in ini_tree.findall(\".//pt_gov_ar_objectos_VotacaoOut\"):\n",
"    votep = voting.find('./detalhe')\n",
"    if votep is not None:\n",
"        init_dict = collections.OrderedDict()\n",
"        counter +=1\n",
"        init_dict['id'] = voting.find('id').text\n",
"        ## Add the \"I\" for Type to mark this as coming from \"Iniciativas\"\n",
"        init_dict['Tipo'] = \"I\"\n",
"        for c in voting:\n",
"            if c.tag == \"detalhe\":\n",
"                for party, vote in parse_voting(c.text).items():\n",
"                    init_dict[party] = vote\n",
"            elif c.tag == \"descricao\":\n",
"                init_dict[c.tag] = c.text\n",
"            elif c.tag == \"ausencias\":\n",
"                init_dict[c.tag] = c.find(\"string\").text\n",
"            else:\n",
"                init_dict[c.tag] = c.text\n",
"        init_list.append(init_dict)\n",
"        ## Provide progression feedback\n",
"        print('.', end='')\n",
"\n",
"print(counter)" ] },
{ "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [
"import pandas as pd\n",
"\n",
"ini_df = pd.DataFrame(init_list)\n",
"#print(ini_df.shape)\n",
"#ini_df.head()\n" ] },
{ "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [], "source": [
"## Copy Livre voting record to new aggregate columns...\n",
"ini_df[\"L/JKM\"] = ini_df[\"L\"]\n",
"## ... and fill the NAs with JKM voting record.\n",
"#ini_df[\"L/JKM\"] = ini_df[\"L/JKM\"].fillna(ini_df[\"Joacine Katar Moreira (Ninsc)\"])\n",
"#ini_df[[\"descricao\",\"L\",\"Joacine Katar Moreira (Ninsc)\",\"L/JKM\"]]" ] },
{ "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [
"## Copy PAN voting record to new aggregate columns...\n",
"ini_df[\"PAN/CR\"] = ini_df[\"PAN\"]\n",
"## ... 
and update/replace with CR voting where it exists\n",
"#ini_df[\"PAN/CR\"].update(ini_df[\"Cristina Rodrigues (Ninsc)\"])\n",
"#ini_df[[\"descricao\",\"PAN\",\"Cristina Rodrigues (Ninsc)\",\"PAN/CR\"]]\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Activities" ] },
{ "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [
"act_url = 'http://app.parlamento.pt/webutils/docs/doc.xml?path=6148523063446f764c324679626d56304c3239775a57356b595852684c3052685a47397a51574a6c636e52766379394264476c32615752685a47567a4c31684a566955794d45786c5a326c7a6247463064584a684c30463061585a705a47466b5a584e595356597565473173&fich=AtividadesXIV.xml&Inline=true'\n",
"act_tree = ET.parse(urlopen(act_url))" ] },
{ "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "............................................................................................................................................................................................................................................................................268\n" ] } ], "source": [
"import re\n",
"import collections\n",
"\n",
"root = act_tree\n",
"\n",
"counter=0\n",
"\n",
"## We will build a dataframe from a list of dicts\n",
"## Inspired by the approach of Chris Moffitt here https://pbpython.com/pandas-list-dict.html\n",
"act_list = []\n",
"\n",
"def get_toplevel_desc (vid, tree):\n",
"    \"\"\"\n",
"    Gets the top-level title from a voting id\n",
"    \"\"\"\n",
"    for c in tree.find(\".//pt_gov_ar_objectos_VotacaoOut/[id='\"+ vid +\"']/../..\"):\n",
"        if c.tag == \"assunto\":\n",
"            return c.text\n",
"\n",
"for voting in act_tree.findall(\".//pt_gov_ar_objectos_VotacaoOut\"):\n",
"    act_dict = collections.OrderedDict()\n",
"    counter +=1\n",
"    votep = voting.find('./detalhe')\n",
"    if votep is not None:\n",
"        act_dict['id'] = voting.find('id').text\n",
"        ## Add the \"A\" for Type to mark this as coming from \"Actividades\"\n",
"        act_dict['Tipo'] = \"A\"\n",
"        for c in voting:\n",
"            if c.tag == \"id\":\n",
"                act_dict['descricao'] = get_toplevel_desc(c.text, act_tree)\n",
"            if c.tag == \"detalhe\":\n",
"                for party, vote in parse_voting(c.text).items():\n",
"                    act_dict[party] = vote\n",
"            elif c.tag == \"ausencias\":\n",
"                act_dict[c.tag] = c.find(\"string\").text\n",
"            else:\n",
"                act_dict[c.tag] = c.text\n",
"        act_list.append(act_dict)\n",
"        ## Provide progression feedback\n",
"        print('.', end='')\n",
"\n",
"print(counter)" ] },
{ "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [
"act_df = pd.DataFrame(act_list)\n",
"#print(act_df.shape)\n",
"\n",
"#act_df.head()" ] },
{ "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [
"## Copy Livre voting record to new aggregate columns...\n",
"act_df[\"L/JKM\"] = act_df[\"L\"]\n",
"## ... and fill the NAs with JKM voting record.\n",
"act_df[\"L/JKM\"] = act_df[\"L/JKM\"].fillna(act_df[\"Joacine Katar Moreira (Ninsc)\"])\n",
"\n",
"## Copy PAN voting record to new aggregate columns...\n",
"act_df[\"PAN/CR\"] = act_df[\"PAN\"]\n",
"## ... 
and update/replace with CR voting where it exists\n",
"act_df[\"PAN/CR\"].update(act_df[\"Cristina Rodrigues (Ninsc)\"])\n",
"#act_df[[\"descricao\",\"PAN\",\"Cristina Rodrigues (Ninsc)\",\"PAN/CR\"]].head()" ] },
{ "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [
"votes = pd.concat([ini_df.drop([\"tipoReuniao\"],axis=1),act_df.drop([\"data\",\"publicacao\"],axis=1)], sort=True)" ] },
{ "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [
"votes_hm = votes[['BE', 'PCP', 'PEV', 'L/JKM', 'PS', 'PAN','PSD','IL','CDS-PP', 'CH']]\n",
"#votes_hm.head()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Heatmap" ] },
{ "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [
"import matplotlib.pyplot as plt\n",
"import matplotlib as mpl\n",
"import seaborn as sns\n",
"\n",
"votes_hmn = votes_hm.replace([\"A Favor\", \"Contra\", \"Abstenção\", \"Ausência\"], [1,-1,0,2]).fillna(0)\n",
"\n",
"voting_palette = [\"#FB6962\",\"#FCFC99\",\"#79DE79\", \"black\"]\n",
"\n",
"fig = plt.figure(figsize=(8,8))\n",
"sns.heatmap(votes_hmn,\n",
"            square=False,\n",
"            yticklabels = False,\n",
"            cbar=False,\n",
"            cmap=sns.color_palette(voting_palette),\n",
"           )\n",
"plt.show()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [
"## Who votes with whom\n",
"\n",
"With these data we can try to obtain a clearer answer than the similarities and differences in the voting record suggested by the heatmap above.\n",
"\n",
"One question that is often raised (usually with greater emphasis whenever a vote is singled out as \"atypical\", based on the general perception of each party's usual voting behaviour) is \"who votes with whom\". These data can be obtained by counting, for each party, the number of votes in which each other party voted the same way, and building a table with the results:" ] },
{ "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total voting instances: 2457\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
BEPCPPEVL/JKMPSPANPSDILCDS-PPCH
BE2457197320312047118918511121120411061110
PCP1973245723261819115715771093117611001093
PEV2031232624571879114316611066118410511069
L/JKM2047181918792457108417621037117410301040
PS118911571143108424571214164112201287912
PAN1851157716611762121424571248137312091171
PSD1121109310661037164112482457147117231344
IL1204117611841174122013731471245715611476
CDS-PP1106110010511030128712091723156124571525
CH111010931069104091211711344147615252457
\n", "
" ], "text/plain": [ " BE PCP PEV L/JKM PS PAN PSD IL CDS-PP CH\n", "BE 2457 1973 2031 2047 1189 1851 1121 1204 1106 1110\n", "PCP 1973 2457 2326 1819 1157 1577 1093 1176 1100 1093\n", "PEV 2031 2326 2457 1879 1143 1661 1066 1184 1051 1069\n", "L/JKM 2047 1819 1879 2457 1084 1762 1037 1174 1030 1040\n", "PS 1189 1157 1143 1084 2457 1214 1641 1220 1287 912\n", "PAN 1851 1577 1661 1762 1214 2457 1248 1373 1209 1171\n", "PSD 1121 1093 1066 1037 1641 1248 2457 1471 1723 1344\n", "IL 1204 1176 1184 1174 1220 1373 1471 2457 1561 1476\n", "CDS-PP 1106 1100 1051 1030 1287 1209 1723 1561 2457 1525\n", "CH 1110 1093 1069 1040 912 1171 1344 1476 1525 2457" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import collections\n", "import numpy as np\n", "pv_list = []\n", "print(\"Total voting instances: \", votes_hm.shape[0])\n", "\n", "## Not necessarily the most straightforward way (check .crosstab or .pivot_table, possibly with pandas.melt and/or groupby)\n", "## but follows the same approach as before in using a list of dicts\n", "for party in votes_hm.columns:\n", " pv_dict = collections.OrderedDict()\n", " for column in votes_hmn:\n", " pv_dict[column]=votes_hmn[votes_hmn[party] == votes_hmn[column]].shape[0]\n", " pv_list.append(pv_dict)\n", "\n", "pv = pd.DataFrame(pv_list,index=votes_hm.columns)\n", "pv" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig = plt.figure(figsize=(8,8))\n", "ax = fig.add_subplot()\n", "\n", "sns.heatmap(\n", " pv,\n", " cmap=sns.color_palette(\"mako_r\"),\n", " linewidth=1,\n", " annot = True,\n", " square =True,\n", " fmt=\"d\",\n", " cbar_kws={\"shrink\": 0.8})\n", "plt.title('Portuguese Parliament 14th Legislature, identical voting count')\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Distance matrix\n", "\n", "Based on each party's voting record we build a matrix of the distances between them; a distance matrix is a square $n\\times n$ matrix (where _n_ is the number of parties) in which the distance between _p_ and _q_ is the value $ d_{pq} $.\n", "\n", "$ \n", "\\begin{equation}\n", "D= \\begin{bmatrix} d_{11} & d_{12} & \\cdots & d_{1 n} \\\\ d_{21} & d_{22} & \\cdots & d_{2 n} \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ d_{n1} & d_{n2} & \\cdots & d_{n n} \\end{bmatrix}_{\\ n\\times n} \n", "\\end{equation}\n", "$\n", "\n", "\n", "The distance is obtained by comparing all the observations of each pair using a given distance metric; the Euclidean distance is very common in general and also in studies of the same subject area _(Krilavičius and Žilinskas 2008)_: each element of the matrix is $ d\\left( p,q\\right) = \\sqrt {\\sum _{i=1}^{m} \\left( q_{i}-p_{i}\\right)^2 }$, where $m$ is the number of votes and $p_{i}$, $q_{i}$ are the numeric votes of the two parties in vote $i$. This is the more general Minkowski distance $ D\\left(p,q\\right)=\\left(\\sum _{i=1}^{m}|p_{i}-q_{i}|^{r}\\right)^{\\frac {1}{r}} $ with $ r = 2 $ (whereas $ r = 1 $ gives the Manhattan distance). Note that the diagonal of the matrix represents the distance between a party and itself, hence $ d_{11} = d_{22} = \\dots = d_{nn} = 0 $.\n", "\n", "In the section [Distâncias e matrizes](Distâncias_e_matrizes) we include a more detailed discussion (step by step, and aimed at readers who may not have 
the underlying mathematics fresh in mind) of distances, _clustering_ and how they are computed, for those interested in a more quantitative understanding of the subject.\n", "\n", "Votes can be converted into numeric representations in several ways _(Hix, Noury, and Roland 2006)_; we adopt the approach of Krilavičius & Žilinskas (2008), from the already cited work on voting in the Lithuanian parliament, as it seems appropriate to the Portuguese case:\n", "\n", "* In favour (`A Favor`): 1\n", "* Against (`Contra`): -1\n", "* Abstention (`Abstenção`): 0\n", "* Absence (`Ausência`): 0\n", "\n", "This is (yet another) choice that can influence the results in a relatively opaque way, since details rarely get the same attention as final results; we believe that equating _abstention_ with _absence_ in particular deserves some reflection: we considered that an absence in a given vote carries the same weight as an abstention, although one is passive and the other active.\n", "\n", "To obtain the distance matrix we use the `pdist` function and build a _dataframe_ that is a symmetric matrix of the distances between the parties." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
               BE        PCP        PEV      L/JKM         PS        PAN        PSD         IL     CDS-PP         CH
BE       0.000000  29.000000  25.573424  22.693611  63.553127  33.896903  61.343296  52.497619  56.877060  49.386233
PCP     29.000000   0.000000  15.066519  31.112698  60.232881  42.047592  58.189346  52.507142  54.424259  48.249352
PEV     25.573424  15.066519   0.000000  28.231188  61.506097  39.509493  59.640590  52.744668  56.115951  49.183331
L/JKM   22.693611  31.112698  28.231188   0.000000  62.912638  33.555923  60.016664  50.606324  55.731499  48.062459
PS      63.553127  60.232881  61.506097  62.912638   0.000000  62.817195  44.022721  54.506880  50.497525  56.727418
PAN     33.896903  42.047592  39.509493  33.555923  62.817195   0.000000  56.709788  47.191101  52.763624  46.432747
PSD     61.343296  58.189346  59.640590  60.016664  44.022721  56.709788   0.000000  43.943145  34.842503  43.058100
IL      52.497619  52.507142  52.744668  50.606324  54.506880  47.191101  43.943145   0.000000  36.565011  37.054015
CDS-PP  56.877060  54.424259  56.115951  55.731499  50.497525  52.763624  34.842503  36.565011   0.000000  34.263683
CH      49.386233  48.249352  49.183331  48.062459  56.727418  46.432747  43.058100  37.054015  34.263683   0.000000
\n", "
" ], "text/plain": [ " BE PCP PEV L/JKM PS PAN \\\n", "BE 0.000000 29.000000 25.573424 22.693611 63.553127 33.896903 \n", "PCP 29.000000 0.000000 15.066519 31.112698 60.232881 42.047592 \n", "PEV 25.573424 15.066519 0.000000 28.231188 61.506097 39.509493 \n", "L/JKM 22.693611 31.112698 28.231188 0.000000 62.912638 33.555923 \n", "PS 63.553127 60.232881 61.506097 62.912638 0.000000 62.817195 \n", "PAN 33.896903 42.047592 39.509493 33.555923 62.817195 0.000000 \n", "PSD 61.343296 58.189346 59.640590 60.016664 44.022721 56.709788 \n", "IL 52.497619 52.507142 52.744668 50.606324 54.506880 47.191101 \n", "CDS-PP 56.877060 54.424259 56.115951 55.731499 50.497525 52.763624 \n", "CH 49.386233 48.249352 49.183331 48.062459 56.727418 46.432747 \n", "\n", " PSD IL CDS-PP CH \n", "BE 61.343296 52.497619 56.877060 49.386233 \n", "PCP 58.189346 52.507142 54.424259 48.249352 \n", "PEV 59.640590 52.744668 56.115951 49.183331 \n", "L/JKM 60.016664 50.606324 55.731499 48.062459 \n", "PS 44.022721 54.506880 50.497525 56.727418 \n", "PAN 56.709788 47.191101 52.763624 46.432747 \n", "PSD 0.000000 43.943145 34.842503 43.058100 \n", "IL 43.943145 0.000000 36.565011 37.054015 \n", "CDS-PP 34.842503 36.565011 0.000000 34.263683 \n", "CH 43.058100 37.054015 34.263683 0.000000 " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from scipy.spatial.distance import squareform\n", "from scipy.spatial.distance import pdist\n", "import scipy.spatial as sp, scipy.cluster.hierarchy as hc\n", "from itables import show\n", "\n", "votes_hmn = votes_hm.replace([\"A Favor\", \"Contra\", \"Abstenção\", \"Ausência\"], [1,-1,0,0]).fillna(0)\n", "\n", "## Transpose the dataframe used for the heatmap\n", "votes_t = votes_hmn.transpose()\n", "\n", "## Determine the Euclidean pairwise distance\n", "## (\"euclidean\" is actually the default option)\n", "pwdist = pdist(votes_t, metric='euclidean')\n", "\n", "## Create a square dataframe with the pairwise distances: the 
distance matrix\n", "distmat = pd.DataFrame(\n", " squareform(pwdist), # pass a symmetric distance matrix\n", " columns = votes_t.index,\n", " index = votes_t.index\n", ")\n", "#show(distmat, scrollY=\"200px\", scrollCollapse=True, paging=False)\n", "\n", "## Normalise by scaling between 0-1, using dataframe max value to keep the symmetry.\n", "## This is essentially a cosmetic step to \n", "#distmat=((distmat-distmat.min().min())/(distmat.max().max()-distmat.min().min()))*1\n", "distmat" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "## Display the heatmap of the distance matrix\n", "\n", "fig = plt.figure(figsize=(8,8))\n", "ax = fig.add_subplot()\n", "\n", "sns.heatmap(\n", " distmat,\n", " cmap=sns.color_palette(\"Reds_r\"),\n", " linewidth=1,\n", " annot = True,\n", " square =True,\n", " cbar_kws={\"shrink\": 0.8})\n", "plt.title('Portuguese Parliament 14th Legislature, Distance Matrix')\n", "\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "## Perform hierarchical linkage on the distance matrix using Ward's method.\n", "distmat_link = hc.linkage(pwdist, method=\"ward\", optimal_ordering=True )\n", "\n", "sns.clustermap(\n", " distmat,\n", " annot = True,\n", " cmap=sns.color_palette(\"Reds_r\"),\n", " linewidth=1,\n", " #standard_scale=1,\n", " row_linkage=distmat_link,\n", " col_linkage=distmat_link,\n", " figsize=(8,8)).fig.suptitle('Portuguese Parliament 14th Legislature, Clustermap')\n", "\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from scipy.cluster.hierarchy import dendrogram\n", "fig = plt.figure(figsize=(8,5))\n", "dendrogram(distmat_link, labels=votes_hmn.columns)\n", "\n", "plt.title(\"Portuguese Parliament 14th Legislature, Dendrogram\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clustering observations: DBSCAN and _Spectral Clustering_\n" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
              BE       PCP       PEV     L/JKM        PS       PAN       PSD        IL    CDS-PP        CH
BE      0.000000  0.456311  0.402394  0.357081  1.000000  0.533363  0.965229  0.826043  0.894953  0.777086
PCP     0.456311  0.000000  0.237070  0.489554  0.947756  0.661613  0.915602  0.826193  0.856358  0.759197
PEV     0.402394  0.237070  0.000000  0.444214  0.967790  0.621677  0.938437  0.829930  0.882977  0.773893
L/JKM   0.357081  0.489554  0.444214  0.000000  0.989922  0.527998  0.944354  0.796284  0.876928  0.756256
PS      1.000000  0.947756  0.967790  0.989922  0.000000  0.988420  0.692692  0.857659  0.794572  0.892598
PAN     0.533363  0.661613  0.621677  0.527998  0.988420  0.000000  0.892321  0.742546  0.830229  0.730613
PSD     0.965229  0.915602  0.938437  0.944354  0.692692  0.892321  0.000000  0.691440  0.548242  0.677513
IL      0.826043  0.826193  0.829930  0.796284  0.857659  0.742546  0.691440  0.000000  0.575346  0.583040
CDS-PP  0.894953  0.856358  0.882977  0.876928  0.794572  0.830229  0.548242  0.575346  0.000000  0.539134
CH      0.777086  0.759197  0.773893  0.756256  0.892598  0.730613  0.677513  0.583040  0.539134  0.000000
\n", "
" ], "text/plain": [ " BE PCP PEV L/JKM PS PAN PSD \\\n", "BE 0.000000 0.456311 0.402394 0.357081 1.000000 0.533363 0.965229 \n", "PCP 0.456311 0.000000 0.237070 0.489554 0.947756 0.661613 0.915602 \n", "PEV 0.402394 0.237070 0.000000 0.444214 0.967790 0.621677 0.938437 \n", "L/JKM 0.357081 0.489554 0.444214 0.000000 0.989922 0.527998 0.944354 \n", "PS 1.000000 0.947756 0.967790 0.989922 0.000000 0.988420 0.692692 \n", "PAN 0.533363 0.661613 0.621677 0.527998 0.988420 0.000000 0.892321 \n", "PSD 0.965229 0.915602 0.938437 0.944354 0.692692 0.892321 0.000000 \n", "IL 0.826043 0.826193 0.829930 0.796284 0.857659 0.742546 0.691440 \n", "CDS-PP 0.894953 0.856358 0.882977 0.876928 0.794572 0.830229 0.548242 \n", "CH 0.777086 0.759197 0.773893 0.756256 0.892598 0.730613 0.677513 \n", "\n", " IL CDS-PP CH \n", "BE 0.826043 0.894953 0.777086 \n", "PCP 0.826193 0.856358 0.759197 \n", "PEV 0.829930 0.882977 0.773893 \n", "L/JKM 0.796284 0.876928 0.756256 \n", "PS 0.857659 0.794572 0.892598 \n", "PAN 0.742546 0.830229 0.730613 \n", "PSD 0.691440 0.548242 0.677513 \n", "IL 0.000000 0.575346 0.583040 \n", "CDS-PP 0.575346 0.000000 0.539134 \n", "CH 0.583040 0.539134 0.000000 " ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "\n", "distmat_mm=((distmat-distmat.min().min())/(distmat.max().max()-distmat.min().min()))*1\n", "pd.DataFrame(distmat_mm, distmat.index, distmat.columns)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
              BE       PCP       PEV     L/JKM        PS       PAN       PSD        IL    CDS-PP        CH
BE      1.000000  0.543689  0.597606  0.642919  0.000000  0.466637  0.034771  0.173957  0.105047  0.222914
PCP     0.543689  1.000000  0.762930  0.510446  0.052244  0.338387  0.084398  0.173807  0.143642  0.240803
PEV     0.597606  0.762930  1.000000  0.555786  0.032210  0.378323  0.061563  0.170070  0.117023  0.226107
L/JKM   0.642919  0.510446  0.555786  1.000000  0.010078  0.472002  0.055646  0.203716  0.123072  0.243744
PS      0.000000  0.052244  0.032210  0.010078  1.000000  0.011580  0.307308  0.142341  0.205428  0.107402
PAN     0.466637  0.338387  0.378323  0.472002  0.011580  1.000000  0.107679  0.257454  0.169771  0.269387
PSD     0.034771  0.084398  0.061563  0.055646  0.307308  0.107679  1.000000  0.308560  0.451758  0.322487
IL      0.173957  0.173807  0.170070  0.203716  0.142341  0.257454  0.308560  1.000000  0.424654  0.416960
CDS-PP  0.105047  0.143642  0.117023  0.123072  0.205428  0.169771  0.451758  0.424654  1.000000  0.460866
CH      0.222914  0.240803  0.226107  0.243744  0.107402  0.269387  0.322487  0.416960  0.460866  1.000000
\n", "
" ], "text/plain": [ " BE PCP PEV L/JKM PS PAN PSD \\\n", "BE 1.000000 0.543689 0.597606 0.642919 0.000000 0.466637 0.034771 \n", "PCP 0.543689 1.000000 0.762930 0.510446 0.052244 0.338387 0.084398 \n", "PEV 0.597606 0.762930 1.000000 0.555786 0.032210 0.378323 0.061563 \n", "L/JKM 0.642919 0.510446 0.555786 1.000000 0.010078 0.472002 0.055646 \n", "PS 0.000000 0.052244 0.032210 0.010078 1.000000 0.011580 0.307308 \n", "PAN 0.466637 0.338387 0.378323 0.472002 0.011580 1.000000 0.107679 \n", "PSD 0.034771 0.084398 0.061563 0.055646 0.307308 0.107679 1.000000 \n", "IL 0.173957 0.173807 0.170070 0.203716 0.142341 0.257454 0.308560 \n", "CDS-PP 0.105047 0.143642 0.117023 0.123072 0.205428 0.169771 0.451758 \n", "CH 0.222914 0.240803 0.226107 0.243744 0.107402 0.269387 0.322487 \n", "\n", " IL CDS-PP CH \n", "BE 0.173957 0.105047 0.222914 \n", "PCP 0.173807 0.143642 0.240803 \n", "PEV 0.170070 0.117023 0.226107 \n", "L/JKM 0.203716 0.123072 0.243744 \n", "PS 0.142341 0.205428 0.107402 \n", "PAN 0.257454 0.169771 0.269387 \n", "PSD 0.308560 0.451758 0.322487 \n", "IL 1.000000 0.424654 0.416960 \n", "CDS-PP 0.424654 1.000000 0.460866 \n", "CH 0.416960 0.460866 1.000000 " ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "affinmat_mm = pd.DataFrame(1-distmat_mm, distmat.index, distmat.columns)\n", "affinmat_mm " ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.set(style=\"white\")\n", "\n", "## Build a mask for the upper triangle (np.bool was removed from recent NumPy, so use bool)\n", "mask = np.triu(np.ones_like(affinmat_mm, dtype=bool))\n", "fig = plt.figure(figsize=(8,8))\n", "ax = fig.add_subplot()\n", "plt.title('Portuguese Parliament 14th Legislature, Affinity Matrix')\n", "\n", "## Display the heatmap of the affinity matrix, masking the top triangle\n", "\n", "sns.heatmap(\n", " affinmat_mm,\n", " cmap=sns.color_palette(\"Greens\"),\n", " annot = False,\n", " square =True,\n", " cbar_kws={\"shrink\": .8},\n", " mask=mask,linewidths=.5)\n", "\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'BE': 0,\n", " 'PCP': 0,\n", " 'PEV': 0,\n", " 'L/JKM': 0,\n", " 'PS': 1,\n", " 'PAN': 0,\n", " 'PSD': 1,\n", " 'IL': 1,\n", " 'CDS-PP': 1,\n", " 'CH': 1}" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.cluster import DBSCAN\n", "\n", "dbscan_labels = DBSCAN(eps=1.1).fit(affinmat_mm)\n", "dbscan_labels.labels_\n", "dbscan_dict = dict(zip(distmat_mm,dbscan_labels.labels_))\n", "dbscan_dict" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'BE': 2, 'PCP': 3, 'PEV': 3, 'L/JKM': 2, 'PS': 1, 'PAN': 2, 'PSD': 1, 'IL': 0, 'CDS-PP': 0, 'CH': 0}\n" ] } ], "source": [ "from sklearn.cluster import SpectralClustering\n", "sc = SpectralClustering(4, affinity=\"precomputed\",random_state=2020).fit_predict(affinmat_mm)\n", "sc_dict = dict(zip(distmat,sc))\n", "\n", "print(sc_dict)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### _Multidimensional scaling_ \n", "\n", "So far we have been able to extract interesting information from the voting data:\n", "\n", "1. The voting heatmap gives us a first view of the behaviour of all parties. \n", "2. 
The distance matrix gives us a way to compare the distances between the different parties through a heatmap.\n", "3. The dendrogram identifies groupings hierarchically.\n", "4. Through DBSCAN and _Spectral Clustering_ we identify \"blocks\" based on the affinity matrix.\n", "\n", "We still lack a way to visualise the relative distance of each party to the others based on the distances/similarities: the dendrogram gives us something close, but there are other interesting forms of visualisation.\n", "\n", "One of them is _multidimensional scaling_, which lets us visualise distance by projecting multidimensional sets onto 2 or 3 dimensions (also known as _visualisable dimensions_) while preserving relative distances _(“Graphical Representation of Proximity Measures for Multidimensional Data « The Mathematica Journal” 2020)_.\n", "\n", "As usual, Python offers, through the `scikit-learn` library (which we already used for DBSCAN and _Spectral Clustering_), an implementation that we can use without much difficulty _(“2.2. Manifold Learning — Scikit-Learn 0.23.2 Documentation” 2020)_." 
] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0.33299885, 0.36138221],\n", " [-0.00657669, 0.42473032],\n", " [ 0.10930041, 0.44054896],\n", " [ 0.39557388, 0.21339868],\n", " [-0.68909687, 0.15393775],\n", " [ 0.49318781, -0.08163115],\n", " [-0.51275165, -0.28664118],\n", " [ 0.14351589, -0.50188438],\n", " [-0.22776952, -0.48473615],\n", " [-0.03838211, -0.23910504]])" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.manifold import MDS\n", "\n", "mds = MDS(n_components=2, dissimilarity='precomputed',random_state=2020, n_init=100, max_iter=1000)\n", "\n", "## We use the normalised distance matrix but results would\n", "## be similar with the original one, just with a different scale/axis\n", "results = mds.fit(distmat_mm.values)\n", "coords = results.embedding_\n", "coords" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "## Graphic options\n", "sns.set()\n", "sns.set_style(\"ticks\")\n", "\n", "fig, ax = plt.subplots(figsize=(8,8))\n", "\n", "plt.title('Portuguese Parliament Voting Records Analysis, 14th Legislature', fontsize=14)\n", "\n", "for label, x, y in zip(distmat_mm.columns, coords[:, 0], coords[:, 1]):\n", " ax.scatter(x, y, s=250)\n", " ax.axis('equal')\n", " ax.annotate(label,xy = (x-0.02, y+0.025))\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.set()\n", "sns.set_style(\"ticks\")\n", "\n", "fig, ax = plt.subplots(figsize=(8,8))\n", "\n", "fig.suptitle('Portuguese Parliament Voting Records Analysis, 14th Legislature', fontsize=14)\n", "ax.set_title('MDS with DBSCAN clusters (2D)')\n", "\n", "## Same 2D MDS coordinates as above, coloured by DBSCAN cluster label\n", "for label, x, y in zip(distmat_mm.columns, coords[:, 0], coords[:, 1]):\n", "    ax.scatter(x, y, c=\"C\"+str(dbscan_dict[label]), s=250)\n", "    ax.annotate(label, xy=(x-0.02, y+0.025))\n", "\n", "ax.axis('equal')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.set()\n", "sns.set_style(\"ticks\")\n", "\n", "fig, ax = plt.subplots(figsize=(8,8))\n", "fig.suptitle('Portuguese Parliament Voting Records Analysis, 14th Legislature', fontsize=14)\n", "ax.set_title('MDS with Spectral Clustering clusters (2D)')\n", "\n", "## Same 2D MDS coordinates as above, coloured by Spectral Clustering label\n", "for label, x, y in zip(distmat_mm.columns, coords[:, 0], coords[:, 1]):\n", "    ax.scatter(x, y, c=\"C\"+str(sc_dict[label]), s=250)\n", "    ax.annotate(label, xy=(x-0.02, y+0.025))\n", "\n", "ax.axis('equal')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "## From https://stackoverflow.com/questions/10374930/matplotlib-annotating-a-3d-scatter-plot\n", "\n", "from mpl_toolkits.mplot3d.proj3d import proj_transform\n", "from matplotlib.text import Annotation\n", "\n", "class Annotation3D(Annotation):\n", "    '''Annotate the 3D point xyz with text s'''\n", "\n", "    def __init__(self, s, xyz, *args, **kwargs):\n", "        super().__init__(s, xy=(0, 0), *args, **kwargs)\n", "        self._verts3d = xyz\n", "\n", "    def draw(self, renderer):\n", "        xs3d, ys3d, zs3d = self._verts3d\n", "        ## Project the 3D point to 2D with the projection matrix kept on the parent\n", "        ## Axes3D; newer matplotlib releases no longer expose it as renderer.M\n", "        xs, ys, zs = proj_transform(xs3d, ys3d, zs3d, self.axes.M)\n", "        self.xy = (xs, ys)\n", "        super().draw(renderer)\n", "\n", "def annotate3D(ax, s, *args, **kwargs):\n", "    '''Add annotation text s to Axes3D ax'''\n", "\n", "    tag = Annotation3D(s, *args, **kwargs)\n", "    ax.add_artist(tag)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from sklearn.manifold import MDS\n", "import mpl_toolkits.mplot3d\n", "\n", "## 3D embedding, computed here from the original (non-normalised) distance matrix\n", "mds = MDS(n_components=3, dissimilarity='precomputed', random_state=1234, n_init=100, max_iter=1000)\n", "results = mds.fit(distmat.values)\n", "parties = distmat.columns\n", "coords = results.embedding_\n", "\n", "sns.set()\n", "sns.set_style(\"ticks\")\n", "\n", "fig = plt.figure(figsize=(8,8))\n", "ax = fig.add_subplot(111, projection='3d')\n", "\n", "fig.suptitle('Portuguese Parliament Voting Records Analysis, 14th Legislature', fontsize=14)\n", "ax.set_title('MDS with Spectral Clustering clusters (3D)')\n", "\n", "## One marker per party, coloured by Spectral Clustering label and annotated in 3D\n", "for label, x, y, z in zip(parties, coords[:, 0], coords[:, 1], coords[:, 2]):\n", "    ax.scatter(x, y, z, c=\"C\"+str(sc_dict[label]), s=250)\n", "    annotate3D(ax, s=str(label), xyz=[x,y,z], fontsize=10, xytext=(-3,3),\n", "               textcoords='offset points', ha='right', va='bottom')\n", "plt.show()\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 4 }