{ "cells": [ { "cell_type": "markdown", "id": "a4a3de80", "metadata": {}, "source": [ "# POlitical DIscourse Ontology Introduction" ] }, { "cell_type": "markdown", "id": "16365aa8", "metadata": {}, "source": [ "This research addresses the need for the systematic organization and integration of political discourse, regardless of the communication channels employed, whether rooted in social platforms or dissemination channels. The aim is to facilitate nuanced and advanced analyses that consider specific aspects of the political domain, including the persuasive nature of discourse, the ideological basis of the messages, the targeted audience (whether groups or communities), and the temporal context of communication. The resulting ontology, named PODIO (POlitical DIscourse Ontology), offers a structured framework to enhance the understanding of political debate. It was successfully evaluated, confirming its error-free design and alignment with functional requirements. For validation, we integrated existing datasets from social media, news, and electoral programs into the Knowledge Graph, demonstrating the effectiveness of PODIO in representing diverse forms of political discourse." 
] }, { "cell_type": "markdown", "id": "fc9c4bab", "metadata": {}, "source": [ "The available resources are listed below:\n", "- [The ontology GitHub repository.](https://github.com/oeg-upm/PODIO)\n", "- [The ontology and the documentation.](https://w3id.org/podio)\n", "- [The Knowledge Graph.](https://w3id.org/podio/sparql)\n" ] }, { "attachments": { "image.png": { "image/png": "" } }, "cell_type": "markdown", "id": "90896c11", "metadata": { "hide_input": true }, "source": [ "![image.png](attachment:image.png)" ] }, { "cell_type": "markdown", "id": "e0188633", "metadata": {}, "source": [ "# PODIO Knowledge Graph" ] }, { "cell_type": "markdown", "id": "68717218", "metadata": {}, "source": [ "## Generating triples" ] }, { "cell_type": "markdown", "id": "c88d9142", "metadata": { "ExecuteTime": { "end_time": "2023-12-20T19:42:50.523364Z", "start_time": "2023-12-20T19:42:50.520202Z" } }, "source": [ "To generate the triples you will need to install [yarrrml-parser](https://rml.io/yarrrml/tutorial/getting-started/), which converts the YARRRML mappings into RML rules, and download [rmlmapper-6.1.3](https://github.com/RMLio/rmlmapper-java/releases), which executes those rules to produce the RDF triples. rmlmapper is distributed as a JAR, so a Java runtime is also required." 
] }, { "cell_type": "code", "execution_count": 18, "id": "664ca44e", "metadata": { "ExecuteTime": { "end_time": "2024-02-12T11:52:33.720162Z", "start_time": "2024-02-12T11:52:33.717031Z" } }, "outputs": [], "source": [ "# Helper used throughout the notebook to generate the triples for a mapping file.\n", "\n", "import os\n", "\n", "## Directory that contains the rmlmapper JAR (adjust to your local setup)\n", "script_path=\"/home/ibai/OEG/MadridElectoralTwitterScrapper/SeleniumTwitterScrapper/mapper/mapping_requirements\"\n", "\n", "def generate_triples(filename_mappings, verbose=False):\n", "    ### Convert the YARRRML mappings (.yml) into RML rules (.ttl)\n", "    os.system(f\"yarrrml-parser -i {filename_mappings}.yml -o {filename_mappings}.ttl\")\n", "    ### Run rmlmapper on the RML rules to produce the N-Triples file (.nt)\n", "    if verbose:\n", "        os.system(f\"java -jar {script_path}/rmlmapper-6.1.3-r367-all.jar -v -m {filename_mappings}.ttl -o {filename_mappings}.nt\")\n", "    else:\n", "        os.system(f\"java -jar {script_path}/rmlmapper-6.1.3-r367-all.jar -m {filename_mappings}.ttl -o {filename_mappings}.nt\")\n", "\n", "    return \"Triples Generated\"" ] }, { "cell_type": "markdown", "id": "e52e363f", "metadata": { "heading_collapsed": true }, "source": [ "### Party Manifestos and Political Proposals" ] }, { "cell_type": "markdown", "id": "9ee5656a", "metadata": { "hidden": true }, "source": [ "Among the datasets of party manifestos available online, the most relevant we have found are:\n", "\n", "Description|Dataset\n", "---|---\n", "**USA elections party manifestos**| Woolley, J. T., Peters, G. & University Of California, S. B. (1999) The American Presidency Project. Santa Barbara, Calif.: University of California. [Web.] 
Retrieved from the Library of Congress, https://lccn.loc.gov/2005616760.\n", "**USA States elections party manifestos** | Hopkins, Daniel J; Coffey, Daniel J; Galvin, Daniel J; Gamm, Gerald; Henderson, John; Paddock, Joel W.; Schickler, Eric, 2022, \"Select American State Party Platforms, 1846-2017\", https://doi.org/10.7910/DVN/KNOSHL, Harvard Dataverse, V1\n", "**Scottish elections party manifestos** | Greene, Zachary; McMillan, Fraser, 2020, \"Scottish Party Election Manifestos, 1999-2016\", https://doi.org/10.7910/DVN/PH8XZO, Harvard Dataverse, V1\n", "**German local elections party manifestos** | Gross, M., & Jankowski, M. (2019). Dimensions of political conflict and party positions in multi-level democracies: evidence from the Local Manifesto Project. In West European Politics (Vol. 43, Issue 1, pp. 74–101). Informa UK Limited. https://doi.org/10.1080/01402382.2019.1602816\n", "**Spanish regional elections party manifestos** | Alonso, S., Gómez, B., & Cabeza, L. (2013). Measuring Centre–Periphery Preferences: The Regional Manifestos Project. In Regional & Federal Studies (Vol. 23, Issue 2, pp. 189–211). Informa UK Limited. https://doi.org/10.1080/13597566.2012.754351\n", "**European elections party manifestos** | Schmitt, Hermann, & Wüst, Andreas M. (2012). Euromanifestos Project (EMP) 1979 - 2004. GESIS Data Archive, Cologne. ZA4457 Data file Version 1.0.0, https://doi.org/10.4232/1.4457.\n" ] }, { "cell_type": "markdown", "id": "ea4f269b", "metadata": { "hidden": true }, "source": [ "To meet the competency questions we will reuse USA elections party manifestos from 2020 and 2016. This implies that we will use the 2020 and 2016 Democratic party manifestos and the 2016 Republican party manifesto. 
The reason is that the Republican Party did not adopt a new platform for the 2020 elections and instead reused its 2016 manifesto: [*\"RESOLVED, That the 2020 Republican National Convention will adjourn without adopting a new platform until the 2024 Republican National Convention;\"*](https://www.presidency.ucsb.edu/documents/resolution-regarding-the-republican-party-platform)\n", "\n", "Party manifestos are replete with policy proposals such as the following: *Democrats will aggressively enforce non-discrimination protections in the Americans with Disabilities Act and other civil rights laws, especially when designing emergency management systems and new facilities and services in response to the pandemic. Democrats will prohibit unjustified segregation of patients with disabilities, and additionally prohibit rationing of health care that refuses or diverts hospitalization, treatment, or supplies based on a patient's disability. We recognize people with disabilities living in group homes and other care facilities are at greater risk of contracting COVID-19, and that people with disabilities may require additional resources to protect their health, well-being, and independence during the pandemic. We will improve oversight and expand protections for residents and staff at nursing homes, which have seen some of the worst COVID-19 outbreaks. And we will expand support for telemedicine, so Americans do not have to go without essential health care during the pandemic.*\n", "\n", "Extracting policy proposals with fine granularity and precision is difficult. Other party manifestos number their proposals, which makes extraction easier, but the US election manifestos do not. To populate the KG, we therefore generalise and treat every paragraph of a party manifesto as a policy proposal. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "b1fa99d6", "metadata": { "ExecuteTime": { "end_time": "2024-02-06T16:57:05.876944Z", "start_time": "2024-02-06T16:57:03.534104Z" }, "code_folding": [], "hidden": true, "scrolled": false }, "outputs": [], "source": [ "# Downloading the Manifestos, aggregate with metadata and save them as JSON\n", "from bs4 import BeautifulSoup\n", "import requests, json, os\n", "\n", "## Set filenames (WITHOUT extension)\n", "if not os.path.exists(\"data\"): os.mkdir(\"data\")\n", "filename_manifestos= \"data/manifestos\"\n", "filename_proposals= \"data/proposals\"\n", "\n", "## Download the manifestos from the web and aggregate with extra data\n", "manifestos= {}\n", "pmanifestos= [\"2020-democratic-party-platform\", \"2016-democratic-party-platform\", \"2016-republican-party-platform\"]\n", "for pmanifesto in pmanifestos:\n", " response = requests.get(f'https://www.presidency.ucsb.edu/documents/{pmanifesto}')\n", " assert(response.status_code==200)\n", "\n", " soup = BeautifulSoup(response.content, 'html.parser')\n", " content= soup.find('div', class_='field-docs-content').text\n", " manifestos[pmanifesto]= {\"date\": f\"{pmanifesto.split('-')[0]}-1-1T00:00:00\", \n", " \"pparty_id\": f'{pmanifesto.split(\"-\")[1]}Party',\n", " \"source\": f'https://www.presidency.ucsb.edu/documents/{pmanifesto}',\n", " \"language\": \"http://id.loc.gov/vocabulary/iso639-2/eng\",\n", " \"content\": content}\n", "\n", "manifestos[\"2020-democratic-party-platform\"][\"pparty\"]= \"http://www.wikidata.org/entity/Q29552\"\n", "manifestos[\"2020-democratic-party-platform\"][\"ideology\"]= \"http://www.wikidata.org/entity/Q16152203\"\n", "manifestos[\"2020-democratic-party-platform\"][\"candidate\"]= \"http://www.wikidata.org/entity/Q6279\"\n", "manifestos[\"2020-democratic-party-platform\"][\"party_wikidata_id\"]= \"Q29552\"\n", "manifestos[\"2020-democratic-party-platform\"][\"candidate_wikidata_id\"]= \"Q6279\"\n", "\n", "\n", "\n", 
"manifestos[\"2016-democratic-party-platform\"][\"pparty\"]= \"http://www.wikidata.org/entity/Q29552\"\n", "manifestos[\"2016-democratic-party-platform\"][\"ideology\"]= \"http://www.wikidata.org/entity/Q16152203\"\n", "manifestos[\"2016-democratic-party-platform\"][\"candidate\"]= \"http://www.wikidata.org/entity/Q6294\"\n", "manifestos[\"2016-democratic-party-platform\"][\"party_wikidata_id\"]= \"Q29552\"\n", "manifestos[\"2016-democratic-party-platform\"][\"candidate_wikidata_id\"]= \"Q6294\"\n", "\n", "\n", "manifestos[\"2016-republican-party-platform\"][\"pparty\"]= \"http://www.wikidata.org/entity/Q29468\"\n", "manifestos[\"2016-republican-party-platform\"][\"ideology\"]= \"http://www.wikidata.org/entity/Q7169\"\n", "manifestos[\"2016-republican-party-platform\"][\"candidate\"]= \"http://www.wikidata.org/entity/Q22686\"\n", "manifestos[\"2016-republican-party-platform\"][\"party_wikidata_id\"]= \"Q29468\"\n", "manifestos[\"2016-republican-party-platform\"][\"candidate_wikidata_id\"]= \"Q22686\"\n", "\n", "\n", "manifestos[\"2020-republican-party-platform\"]= manifestos[\"2016-republican-party-platform\"].copy()\n", "manifestos[\"2020-republican-party-platform\"][\"date\"]= \"2020-1-1T00:00:00\"\n", "pmanifestos.append(\"2020-republican-party-platform\")\n", "\n", "## Save manifestos data\n", "with open(f\"{filename_manifestos}.json\", \"w\") as f:\n", " json.dump(manifestos, f)\n", "\n", "## Extract policy proposals from manifestos\n", "policy_proposals= []\n", "for pmanifesto in pmanifestos:\n", " proposals= [x for x in manifestos[pmanifesto][\"content\"].split(\"\\n\") if x != \"\"]\n", " counter= 0\n", " for proposal in proposals:\n", " counter += 1\n", " if pmanifesto.split(\"-\")[1] == \"republican\": \n", " author= \"http://www.wikidata.org/entity/Q29468\"\n", " ideology= \"http://www.wikidata.org/entity/Q7169\"\n", " else: \n", " author=\"http://www.wikidata.org/entity/Q29552\"\n", " ideology= \"http://www.wikidata.org/entity/Q16152203\"\n", " 
proposal_json= {\n", " \"id\": counter,\n", " \"date\": f\"{pmanifesto.split('-')[0]}-1-1T00:00:00\", \n", " \"language\": \"http://id.loc.gov/vocabulary/iso639-2/eng\",\n", " \"content\": proposal, \n", " \"source\": f'https://www.presidency.ucsb.edu/documents/{pmanifesto}',\n", " \"word_count\": len(proposal.split()),\n", " \"part_of\": f'{pmanifesto.split(\"-\")[1]}Party',\n", " \"target\": \"http://www.wikidata.org/entity/Q846570\",\n", " \"ideology\": ideology, \n", " \"creator\": author,\n", " \"publisher\": author\n", " }\n", " policy_proposals.append(proposal_json)\n", "\n", "## Save policy proposals data\n", "with open(f\"{filename_proposals}.json\", \"w\") as f:\n", " json.dump(policy_proposals, f)" ] }, { "cell_type": "code", "execution_count": null, "id": "5f94501a", "metadata": { "ExecuteTime": { "end_time": "2024-02-06T16:57:30.037738Z", "start_time": "2024-02-06T16:57:05.878173Z" }, "code_folding": [ 8, 31, 35 ], "hidden": true }, "outputs": [], "source": [ "# Generate the data mapping file to transform the above data into triples\n", "import os\n", "\n", "## Set filenames (WITHOUT extension)\n", "filename_mappings= \"mappings/mappings_manifestos\"\n", "if not os.path.exists(\"mappings\"): os.mkdir(\"mappings\")\n", "\n", "## Generate mapping file according to the JSONs data\n", "mapping= f\"\"\"\n", "prefixes:\n", " #Core imports\n", " rdf: \"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n", " rdfs: \"http://www.w3.org/2000/01/rdf-schema#\"\n", " xsd: \"http://www.w3.org/2001/XMLSchema#\"\n", " xml: \"http://www.w3.org/XML/1998/namespace\"\n", " #Vocabulary imports\n", " schema: \"http://schema.org/\"\n", " terms: \"http://purl.org/dc/terms/\"\n", " dc: \"http://purl.org/dc/elements/1.1/\"\n", " dcam: \"http://purl.org/dc/dcam/\"\n", " vann: \"http://purl.org/vocab/vann/\"\n", " skos: \"http://www.w3.org/2004/02/skos/core#\"\n", " #Ontology imports\n", " owl: \"http://www.w3.org/2002/07/owl#\"\n", " foaf: \"http://xmlns.com/foaf/0.1/\"\n", " sioc: 
\"http://rdfs.org/sioc/ns#\"\n", " lkg: \"http://lkg.lynx-project.eu/def/\"\n", " nif: \"http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#\"\n", " eli: \"http://data.europa.eu/eli/ontology#\"\n", " #Knowledge graph domain declaration\n", " podio: \"http://w3id.org/podio#\" # URL to ontoology\n", "\n", "sources:\n", " manifestos_json: [{filename_manifestos}.json~jsonpath, \"$.[*]\"]\n", " proposals_json: [{filename_proposals}.json~jsonpath, \"$.[*]\"]\n", "\n", "mappings:\n", " Manifestos:\n", " sources: \n", " - manifestos_json\n", " s: podio:PartyManifesto/USA/$(pparty_id)/$(date)\n", " po:\n", " - [a, podio:PartyManifesto]\n", " - [terms:language, $(language)~iri]\n", " - [terms:created, $(date), xsd:dateTime]\n", " - [terms:source, $(source)~iri]\n", " - [terms:publisher, $(pparty)~iri]\n", " - [podio:content, $(content), xsd:string]\n", " - [podio:ideology, $(ideology)~iri]\n", " - [podio:proposesCandidate, $(candidate)~iri]\n", "\n", " Proposals:\n", " sources: \n", " - proposals_json\n", " s: podio:PolicyProposal/USA/$(part_of)/$(date)/$(id)\n", " po:\n", " - [a, podio:PolicyProposal]\n", " - [terms:language, $(language)~iri]\n", " - [terms:created, $(date), xsd:dateTime]\n", " - [terms:source, $(source)~iri]\n", " - [terms:identifier, $(id), xsd:int]\n", " - [podio:ideology, $(ideology)~iri]\n", " - [podio:content, $(content), xsd:string]\n", " - [schema:wordCount, $(word_count)]\n", " - [terms:publisher, $(publisher)~iri]\n", " - [terms:creator, $(creator)~iri]\n", " - [podio:hasTarget, $(target)~iri]\n", " - [terms:isPartOf, podio:PartyManifesto/USA/$(part_of)/$(date)~iri]\n", " \n", " AgentCandidate:\n", " sources: \n", " - manifestos_json\n", " s: $(candidate)\n", " po:\n", " - [a, foaf:Agent]\n", " - [terms:identifier, $(candidate_wikidata_id), xsd:string]\n", " - [rdfs:isDefinedBy, $(candidate)~iri]\n", " \n", " AgentParty:\n", " sources: \n", " - manifestos_json\n", " s: $(pparty)\n", " po:\n", " - [a, foaf:Agent]\n", " - 
[terms:identifier, $(party_wikidata_id), xsd:string]\n", " - [rdfs:isDefinedBy, $(pparty)~iri]\n", "\"\"\"\n", "\n", "## Save mappings file\n", "with open(f\"{filename_mappings}.yml\", \"w\") as f:\n", " f.write(mapping)\n", " \n", "## Generate the triples\n", "generate_triples(filename_mappings)\n", "\n", "print(\"Political Party Manifestos Triples Generated\")" ] }, { "cell_type": "markdown", "id": "a6a89168", "metadata": {}, "source": [ "### Social Media Posts" ] }, { "cell_type": "markdown", "id": "f71044ba", "metadata": {}, "source": [ "Among the datasets of political social media posts available online, the most relevant we have found are:\n", "\n", "Social Media | Description | Dataset\n", "--- | --- | ---\n", "Facebook | **2019 Spanish General Elections Facebook Ads** | Baviera Puig, T. (2020). 2019 Spanish General Elections Facebook Ads Dataset. Universitat Politècnica de València. https://doi.org/10.4995/Dataset/10251/146502\n", "Twitter | **Spanish political parties tweets** | https://www.kaggle.com/datasets/ricardomoya/tweets-poltica-espaa/data\n", "Twitter | **Trump Tweets as of June 2020** | https://www.kaggle.com/datasets/austinreese/trump-tweets/data\n", "Twitter | **Biden Tweets in 2019 and 2020** | https://www.kaggle.com/datasets/akashdusane/joe-biden-tweets-us-elections\n", "\n", "To show that PODIO can work with different social networks, we include data from both Twitter and Facebook. In addition, we include posts from two different countries to demonstrate that PODIO can correctly represent international political discourse. This knowledge will be exploited with SPARQL queries.\n", "\n", "As several of these datasets are hosted on Kaggle, you need to install the [kaggle](https://pypi.org/project/kaggle/) library and obtain a free API key; see the [kaggle documentation](https://www.kaggle.com/docs/api) for more information. 
" ] }, { "cell_type": "code", "execution_count": 30, "id": "5d57024f", "metadata": { "ExecuteTime": { "end_time": "2024-02-12T12:00:34.331589Z", "start_time": "2024-02-12T12:00:27.417066Z" }, "code_folding": [] }, "outputs": [ { "data": { "text/plain": [ "4605002" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Data set downloading\n", "import kaggle\n", "import requests\n", "import os \n", "\n", "if not os.path.exists(\"data\"): os.mkdir(\"data\")\n", "\n", "## https://www.kaggle.com/docs/api\n", "kaggle.api.authenticate\n", "\n", "## Spanish political parties tweets\n", "kaggle.api.dataset_download_files('ricardomoya/tweets-poltica-espaa', path='data/', unzip=True)\n", "\n", "## Trump Tweets as of June 2020\n", "kaggle.api.dataset_download_files(\"austinreese/trump-tweets\", path='data/', unzip=True)\n", "\n", "## Bidedn Tweets in 2019 and 2020\n", "kaggle.api.dataset_download_files(\"akashdusane/joe-biden-tweets-us-elections\", path='data/', unzip=True)\n", "\n", "## 2019 Spanish General Elections Facebook Ads\n", "response = requests.get(\"https://riunet.upv.es/bitstream/handle/10251/146502/Facebook_Ads_2019_Spanish_General_Elections.csv?sequence=1&isAllowed=y\")\n", "open(\"data/facebook_ads.csv\", \"w\").write(response.content.decode('utf-16'))\n" ] }, { "cell_type": "code", "execution_count": 31, "id": "bb443c3f", "metadata": { "ExecuteTime": { "end_time": "2024-02-12T12:00:36.281706Z", "start_time": "2024-02-12T12:00:34.333174Z" }, "code_folding": [], "scrolled": true }, "outputs": [], "source": [ "# Data sets loading, cleaning, enriching and storage\n", "import pandas as pd\n", "from datetime import datetime\n", "import re, json\n", "\n", "## Set filenames (WITHOUT extension)\n", "filename_all_data= \"data/conversational\"\n", "filename_extra_data= \"data/conversational_extra\"\n", "filename_metrics_data= \"data/conversational_metrics\"\n", "\n", "## Limit to avoid overloading the graph database due to 
excessive volume of data sets.\n", "limit=4000 #If you do not want limit set as None\n", "\n", "## Dataset loading\n", "df_spain_facebook_ads= pd.read_csv(\"data/facebook_ads.csv\", sep=';', on_bad_lines='skip')[:limit]\n", "df_spain_tweets= pd.read_csv(\"data/tweets_politica_kaggle.csv\", sep='\\t', on_bad_lines='skip')[:limit]\n", "df_trump_tweets= pd.read_csv(\"data/realdonaldtrump.csv\", sep=',', on_bad_lines='skip')[:limit]\n", "df_biden_tweets= pd.read_csv(\"data/JoeBiden_Tweets_2019-20.csv\", sep=',', on_bad_lines='skip')[:limit]\n", "\n", "## Manual entity linking\n", "wikidata_ideology= {\"izquierdaunida\": \"http://www.wikidata.org/entity/Q121254\",\n", " \"ciudadanos\": \"http://www.wikidata.org/entity/Q6216\",\n", " \"ciudadanoscs\": \"http://www.wikidata.org/entity/Q6216\",\n", " \"psoe\": \"http://www.wikidata.org/entity/Q821102\",\n", " \"PSOE\": \"http://www.wikidata.org/entity/Q821102\",\n", " \"partidopopular\": \"http://www.wikidata.org/entity/Q617609\",\n", " \"pp\": \"http://www.wikidata.org/entity/Q617609\",\n", " \"podemos\": \"http://www.wikidata.org/entity/Q275595\",\n", " \"PODEMOS\": \"http://www.wikidata.org/entity/Q275595\",\n", " \"ppopular\": \"http://www.wikidata.org/entity/Q617609\",\n", " \"voxespaña\": \"http://www.wikidata.org/entity/Q948731\",\n", " \"vox\": \"http://www.wikidata.org/entity/Q948731\",\n", " \"vox_es\": \"http://www.wikidata.org/entity/Q948731\",\n", " \"realdonaldtrump\": \"http://www.wikidata.org/entity/Q31838499\",\n", " \"joebiden\": \"http://www.wikidata.org/entity/Q16152203\"}\n", "\n", "wikidata_agent= { \"izquierdaunida\": \"http://www.wikidata.org/entity/Q623740\",\n", " \"ciudadanos\": \"http://www.wikidata.org/entity/Q1393123\",\n", " \"ciudadanoscs\": \"http://www.wikidata.org/entity/Q1393123\",\n", " \"psoe\": \"http://www.wikidata.org/entity/Q138198\",\n", " \"PSOE\": \"http://www.wikidata.org/entity/Q138198\",\n", " \"partidopopular\": \"http://www.wikidata.org/entity/Q185088\",\n", " \"podemos\": 
\"http://www.wikidata.org/entity/Q16059622\",\n", " \"PODEMOS\": \"http://www.wikidata.org/entity/Q16059622\",\n", " \"pp\": \"http://www.wikidata.org/entity/Q185088\",\n", " \"ppopular\": \"http://www.wikidata.org/entity/Q185088\",\n", " \"voxespaña\": \"http://www.wikidata.org/entity/Q15630787\",\n", " \"vox\": \"http://www.wikidata.org/entity/Q15630787\",\n", " \"vox_es\": \"http://www.wikidata.org/entity/Q15630787\",\n", " \"realdonaldtrump\": \"http://www.wikidata.org/entity/Q22686\",\n", " \"joebiden\": \"http://www.wikidata.org/entity/Q6279\"}\n", "\n", "twitter_accounts= {\n", " \"psoe\": \"PSOE\",\n", " \"vox\": \"vox_es\",\n", " \"ciudadanos\": \"CiudadanosCs\",\n", " \"pp\": \"ppopular\",\n", " \"podemos\": \"PODEMOS\"\n", "}\n", "\n", "## Regular expresion for extracting social media specific data\n", "pattern_twitter_mentions= r'(?<=[^\\w!])@(\\w+)\\b'\n", "pattern_facebook_mentions = r'(?<=[^\\w!])@([A-Z][a-z0-9]*\\s?(?:[A-Z][a-z0-9]*)*)\\b'\n", "pattern_hashtag = r'(?!\\s)#([A-Za-z]\\w*)\\b'\n", "pattern_links= r'(?:http|ftp|https):\\/\\/(?:[\\w_-]+(?:(?:\\.[\\w_-]+)+))(?:[\\w.,@?^=%&:\\/~+#-]*[\\w@?^=%&\\/~+#-])'\n", "\n", "## Dataset specific operations: cleaning and enriching\n", "df_spain_tweets.drop(['cuenta'], axis=1, inplace=True)\n", "df_spain_tweets.rename(columns = {'timestamp':'date'}, inplace = True) \n", "df_spain_tweets.rename(columns = {'tweet':'content'}, inplace = True) \n", "df_spain_tweets.rename(columns = {'partido':'account'}, inplace = True) \n", "df_spain_tweets[\"account\"]= [x.replace(\" \",\"\") for x in df_spain_tweets[\"account\"]]\n", "df_spain_tweets[\"account\"]= [twitter_accounts[x] for x in df_spain_tweets[\"account\"]]\n", "df_spain_tweets[\"date\"]= [str(datetime.fromtimestamp(row).isoformat()) for row in df_spain_tweets[\"date\"]]\n", "df_spain_tweets[\"mentions\"]= [re.findall(pattern_twitter_mentions, str(text), re.IGNORECASE) for text in df_spain_tweets[\"content\"]]\n", "df_spain_tweets[\"hashtags\"]= 
[re.findall(pattern_hashtag, str(text), re.IGNORECASE) for text in df_spain_tweets[\"content\"]]\n", "df_spain_tweets[\"links\"]= [re.findall(pattern_links, str(text), re.IGNORECASE) for text in df_spain_tweets[\"content\"]]\n", "df_spain_tweets[\"agent\"]= [wikidata_agent[x.lower()] for x in df_spain_tweets[\"account\"]]\n", "df_spain_tweets[\"ideology\"]= [wikidata_ideology[x.lower()] for x in df_spain_tweets[\"account\"]]\n", "df_spain_tweets[\"media\"]=[\"Twitter\" for x in range(df_spain_tweets.shape[0])]\n", "df_spain_tweets[\"media_url\"]=[\"https://twitter.com/\" for x in range(df_spain_tweets.shape[0])]\n", "df_spain_tweets[\"source\"]=[\"https://www.kaggle.com/datasets/ricardomoya/tweets-poltica-espaa\" for x in range(df_spain_tweets.shape[0])]\n", "df_spain_tweets[\"language\"]=[\"http://id.loc.gov/vocabulary/iso639-2/spa\" for x in range(df_spain_tweets.shape[0])]\n", "df_spain_tweets[\"id\"]= [f\"twes{x}\" for x in range(df_spain_tweets.shape[0])]\n", "df_spain_tweets[\"word_count\"]= [len(str(text).split()) for text in df_spain_tweets[\"content\"]]\n", "df_spain_tweets[\"content\"]= [str(x).replace(\"\\\"\", \"\\'\").replace(\"“\", \"\\'\") for x in df_spain_tweets[\"content\"]]\n", "\n", "### ----------- ###\n", "\n", "df_spain_facebook_ads.drop(['id_anuncio'], axis=1, inplace=True)\n", "df_spain_facebook_ads.drop(['id_nombre_archivo'], axis=1, inplace=True)\n", "df_spain_facebook_ads.drop(['elecciones'], axis=1, inplace=True)\n", "df_spain_facebook_ads.drop(['Identificador_Fb'], axis=1, inplace=True)\n", "df_spain_facebook_ads.drop(['Fecha cierre'], axis=1, inplace=True)\n", "df_spain_facebook_ads.drop(['Texto de las Fechas'], axis=1, inplace=True)\n", "df_spain_facebook_ads.drop(['Imagen'], axis=1, inplace=True)\n", "df_spain_facebook_ads.drop(['Video'], axis=1, inplace=True)\n", "df_spain_facebook_ads.drop(['Carrusel'], axis=1, inplace=True)\n", "df_spain_facebook_ads.drop(['Dinero'], axis=1, inplace=True)\n", 
"df_spain_facebook_ads.drop(['id_contenido_anuncio'], axis=1, inplace=True)\n", "df_spain_facebook_ads.rename(columns = {'Fecha lanzamiento':'date'}, inplace = True) \n", "df_spain_facebook_ads.rename(columns = {'Texto del Anuncio':'content'}, inplace = True) \n", "df_spain_facebook_ads.rename(columns = {'Impresiones':'views'}, inplace = True) \n", "df_spain_facebook_ads.rename(columns = {'Partido':'account'}, inplace = True)\n", "df_spain_facebook_ads[\"account\"]= [x.replace(\" \",\"\") for x in df_spain_facebook_ads[\"account\"]]\n", "df_spain_facebook_ads[\"date\"]= [str(datetime.strptime(f\"{row[:-2]}2019\", '%d/%m/%Y').isoformat()) for row in df_spain_facebook_ads[\"date\"]]\n", "df_spain_facebook_ads[\"mentions\"]= [re.findall(pattern_facebook_mentions, str(text), re.IGNORECASE) for text in df_spain_facebook_ads[\"content\"]]\n", "df_spain_facebook_ads[\"hashtags\"]= [re.findall(pattern_hashtag, str(text), re.IGNORECASE) for text in df_spain_facebook_ads[\"content\"]]\n", "df_spain_facebook_ads[\"links\"]= [re.findall(pattern_links, str(text), re.IGNORECASE) for text in df_spain_facebook_ads[\"content\"]]\n", "df_spain_facebook_ads[\"agent\"]= [wikidata_agent[x.lower()] for x in df_spain_facebook_ads[\"account\"]]\n", "df_spain_facebook_ads[\"ideology\"]= [wikidata_ideology[x.lower()] for x in df_spain_facebook_ads[\"account\"]]\n", "df_spain_facebook_ads[\"media\"]=[\"Facebook\" for x in range(df_spain_facebook_ads.shape[0])]\n", "df_spain_facebook_ads[\"media_url\"]=[\"https://facebook.com/\" for x in range(df_spain_facebook_ads.shape[0])]\n", "df_spain_facebook_ads[\"source\"]=[\"https://doi.org/10.4995/Dataset/10251/146502\" for x in range(df_spain_facebook_ads.shape[0])]\n", "df_spain_facebook_ads[\"language\"]=[\"http://id.loc.gov/vocabulary/iso639-2/spa\" for x in range(df_spain_facebook_ads.shape[0])]\n", "df_spain_facebook_ads[\"id\"]= [f\"fbesads{x}\" for x in range(df_spain_facebook_ads.shape[0])]\n", "df_spain_facebook_ads[\"word_count\"]= 
[len(str(text).split()) for text in df_spain_facebook_ads[\"content\"]]\n", "df_spain_facebook_ads[\"content\"]= [str(x).replace(\"\\\"\", \"\\'\").replace(\"“\", \"\\'\") for x in df_spain_facebook_ads[\"content\"]]\n", "\n", "### ----------- ###\n", " \n", "#The trump dataset comes with mentions and hashtags fields\n", "df_trump_tweets.fillna(\"\", inplace=True)\n", "df_trump_tweets.rename(columns = {'retweets':'reposts'}, inplace = True) \n", "df_trump_tweets.rename(columns = {'link':'source'}, inplace = True) \n", "df_trump_tweets.rename(columns = {'favorites':'likes'}, inplace = True) \n", "df_trump_tweets[\"links\"]= [re.findall(pattern_links, str(text), re.IGNORECASE) for text in df_trump_tweets[\"content\"]]\n", "df_trump_tweets[\"account\"]=[\"realDonaldTrump\" for x in range(df_trump_tweets.shape[0])]\n", "df_trump_tweets[\"agent\"]= [wikidata_agent[x.lower()] for x in df_trump_tweets[\"account\"]]\n", "df_trump_tweets[\"ideology\"]= [wikidata_ideology[x.lower()] for x in df_trump_tweets[\"account\"]]\n", "df_trump_tweets[\"media\"]=[\"Twitter\" for x in range(df_trump_tweets.shape[0])]\n", "df_trump_tweets[\"media_url\"]=[\"https://twitter.com/\" for x in range(df_trump_tweets.shape[0])]\n", "df_trump_tweets[\"language\"]=[\"http://id.loc.gov/vocabulary/iso639-2/eng\" for x in range(df_trump_tweets.shape[0])]\n", "df_trump_tweets[\"mentions\"]= [x.replace(\"@\", \"\").split(\",\") for x in df_trump_tweets[\"mentions\"]]\n", "df_trump_tweets[\"hashtags\"]= [x.replace(\"#\", \"\").split(\",\") for x in df_trump_tweets[\"hashtags\"]]\n", "df_trump_tweets['id'] = df_trump_tweets['id'].astype(str)\n", "df_trump_tweets[\"word_count\"]= [len(str(text).split()) for text in df_trump_tweets[\"content\"]]\n", "df_trump_tweets[\"content\"]= [str(x).replace(\"\\\"\", \"\\'\").replace(\"“\", \"\\'\") for x in df_trump_tweets[\"content\"]]\n", "\n", "### ----------- ###\n", "\n", "df_biden_tweets.rename(columns = {'tweet':'content'}, inplace = True) \n", 
"df_biden_tweets.rename(columns = {'retweets':'reposts'}, inplace = True) \n", "df_biden_tweets.rename(columns = {'url':'source'}, inplace = True) \n", "df_biden_tweets.rename(columns = {'timestamp':'date'}, inplace = True) \n", "df_biden_tweets[\"mentions\"]= [re.findall(pattern_twitter_mentions, str(text), re.IGNORECASE) for text in df_biden_tweets[\"content\"]]\n", "df_biden_tweets[\"hashtags\"]= [re.findall(pattern_hashtag, str(text), re.IGNORECASE) for text in df_biden_tweets[\"content\"]]\n", "df_biden_tweets[\"links\"]= [re.findall(pattern_links, str(text), re.IGNORECASE) for text in df_biden_tweets[\"content\"]]\n", "df_biden_tweets[\"account\"]=[\"joebiden\" for x in range(df_biden_tweets.shape[0])]\n", "df_biden_tweets[\"agent\"]= [wikidata_agent[x.lower()] for x in df_biden_tweets[\"account\"]]\n", "df_biden_tweets[\"ideology\"]= [wikidata_ideology[x.lower()] for x in df_biden_tweets[\"account\"]]\n", "df_biden_tweets[\"date\"]= [str(datetime.strptime(f\"{str(row)}\", '%d-%m-%Y %H:%M').isoformat()) for row in df_biden_tweets[\"date\"]]\n", "df_biden_tweets[\"media\"]=[\"Twitter\" for x in range(df_biden_tweets.shape[0])]\n", "df_biden_tweets[\"media_url\"]=[\"https://twitter.com/\" for x in range(df_biden_tweets.shape[0])]\n", "df_biden_tweets[\"language\"]=[\"http://id.loc.gov/vocabulary/iso639-2/eng\" for x in range(df_biden_tweets.shape[0])]\n", "#df_biden_tweets['id'] = df_biden_tweets['id'].astype(str)\n", "#Wrong kaggle data, we have to regenerate ids\n", "df_biden_tweets[\"id\"]= [f\"jbustw{x}\" for x in range(df_biden_tweets.shape[0])]\n", "df_biden_tweets[\"word_count\"]= [len(str(text).split()) for text in df_biden_tweets[\"content\"]]\n", "df_biden_tweets[\"content\"]= [str(x).replace(\"\\\"\", \"\\'\").replace(\"“\", \"\\'\") for x in df_biden_tweets[\"content\"]]\n", "\n", "\n", "## Merge all datasets\n", "all_data= pd.concat([df_spain_tweets, df_spain_facebook_ads, df_biden_tweets, df_trump_tweets], axis=0, join='outer', 
ignore_index=True)\n", "all_data.fillna(\"\", inplace=True)\n", "\n", "## We detect noise in the dataset that causes errors, so we must remove \\ character \n", "all_data[\"content\"]= [str(x).replace(\"\\\\\", \"\\\\\\\\\") for x in all_data[\"content\"]]\n", "\n", "## Extract some data from the dataset to different JSON files. This facilitates mapping.\n", "extra_data= []\n", "metrics_data= []\n", "for r in range(all_data.shape[0]):\n", " for link in all_data[\"links\"][r]:\n", " if link==\"\": continue\n", " links_json= {\"media\": all_data[\"media\"][r],\n", " \"id\": all_data[\"id\"][r],\n", " \"account\": all_data[\"account\"][r],\n", " \"link\": link}\n", " extra_data.append(links_json)\n", "\n", " for hashtag in all_data[\"hashtags\"][r]:\n", " if hashtag==\"\": continue\n", " hashtags_json= {\"media\": all_data[\"media\"][r],\n", " \"id\": all_data[\"id\"][r],\n", " \"account\": all_data[\"account\"][r],\n", " \"hashtag\": hashtag}\n", " extra_data.append(hashtags_json)\n", " \n", " for mention in all_data[\"mentions\"][r]:\n", " if mention==\"\": continue\n", " mentions_json= {\"media\": all_data[\"media\"][r],\n", " \"id\": all_data[\"id\"][r],\n", " \"account\": all_data[\"account\"][r],\n", " \"mention\": mention}\n", " extra_data.append(mentions_json)\n", " \n", " #Views\n", " views= all_data[\"views\"][r]\n", " if views !=\"\":\n", " views_json= {\"media\": all_data[\"media\"][r],\n", " \"id\": all_data[\"id\"][r],\n", " \"account\": all_data[\"account\"][r],\n", " \"metric\": \"http://schema.org/ViewAction\",\n", " \"metric_name\": \"ViewAction\",\n", " \"number\": views}\n", " metrics_data.append(views_json)\n", " \n", " #Replies\n", " replies = all_data[\"replies\"][r]\n", " if replies !=\"\": \n", " replies_json= { \"media\": all_data[\"media\"][r],\n", " \"id\": all_data[\"id\"][r],\n", " \"account\": all_data[\"account\"][r],\n", " \"metric\": \"http://schema.org/ReplyAction\",\n", " \"metric_name\": \"ReplyAction\",\n", " \"number\": 
replies}\n", " metrics_data.append(replies_json)\n", "\n", " #Reposts\n", " reposts= all_data[\"reposts\"][r]\n", " if reposts !=\"\":\n", " reposts_json= { \"media\": all_data[\"media\"][r],\n", " \"id\": all_data[\"id\"][r],\n", " \"account\": all_data[\"account\"][r],\n", " \"metric\": \"http://schema.org/ShareAction\",\n", " \"metric_name\": \"ShareAction\",\n", " \"number\": reposts}\n", " metrics_data.append(reposts_json)\n", " \n", " #Quotes\n", " quotes= all_data[\"quotes\"][r]\n", " if quotes !=\"\":\n", " quotes_json= {\"media\": all_data[\"media\"][r],\n", " \"id\": all_data[\"id\"][r],\n", " \"account\": all_data[\"account\"][r],\n", " \"metric\": \"http://schema.org/CommentAction\",\n", " \"metric_name\": \"CommentAction\",\n", " \"number\": quotes}\n", " metrics_data.append(quotes_json)\n", " \n", " #Likes\n", " likes= all_data[\"likes\"][r]\n", " if likes !=\"\":\n", " likes_json= {\"media\": all_data[\"media\"][r],\n", " \"id\": all_data[\"id\"][r],\n", " \"account\": all_data[\"account\"][r],\n", " \"metric\": \"http://schema.org/LikeAction\",\n", " \"metric_name\": \"LikeAction\",\n", " \"number\": likes}\n", " metrics_data.append(likes_json)\n", "\n", "\n", "## Remove redundant fields that exists in JSONs\n", "all_data.drop(['mentions'], axis=1, inplace=True)\n", "all_data.drop(['hashtags'], axis=1, inplace=True)\n", "all_data.drop(['links'], axis=1, inplace=True)\n", "\n", "all_data.drop(['views'], axis=1, inplace=True)\n", "all_data.drop(['replies'], axis=1, inplace=True)\n", "all_data.drop(['reposts'], axis=1, inplace=True)\n", "all_data.drop(['quotes'], axis=1, inplace=True)\n", "all_data.drop(['likes'], axis=1, inplace=True)\n", "\n", "## Export the dataset and the JSON files\n", "with open(f\"{filename_extra_data}.json\", \"w\") as f:\n", " json.dump(extra_data, f)\n", " \n", "with open(f\"{filename_metrics_data}.json\", \"w\") as f:\n", " json.dump(metrics_data, f)\n", " \n", "all_data.to_csv(f\"{filename_all_data}.csv\", sep=',', 
index=False)\n", " " ] }, { "cell_type": "code", "execution_count": 32, "id": "aded5d7d", "metadata": { "ExecuteTime": { "end_time": "2024-02-12T12:00:42.915435Z", "start_time": "2024-02-12T12:00:36.282970Z" }, "code_folding": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Conversational Discourse Triples Generated\n" ] } ], "source": [ "# Generate the data mapping file to transform the above data into triples\n", "import os\n", "\n", "## Set filenames (WITHOUT extension)\n", "filename_mappings= \"mappings/mappings_conversational\"\n", "if not os.path.exists(\"mappings\"): os.mkdir(\"mappings\")\n", "\n", "## Generate mapping file according to the JSONs and dataset\n", "mapping= f\"\"\"\n", "prefixes:\n", " #Core imports\n", " rdf: \"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n", " rdfs: \"http://www.w3.org/2000/01/rdf-schema#\"\n", " xsd: \"http://www.w3.org/2001/XMLSchema#\"\n", " xml: \"http://www.w3.org/XML/1998/namespace\"\n", " #Vocabulary imports\n", " schema: \"http://schema.org/\"\n", " terms: \"http://purl.org/dc/terms/\"\n", " dc: \"http://purl.org/dc/elements/1.1/\"\n", " dcam: \"http://purl.org/dc/dcam/\"\n", " vann: \"http://purl.org/vocab/vann/\"\n", " skos: \"http://www.w3.org/2004/02/skos/core#\"\n", " #Ontology imports\n", " owl: \"http://www.w3.org/2002/07/owl#\"\n", " foaf: \"http://xmlns.com/foaf/0.1/\"\n", " sioc: \"http://rdfs.org/sioc/ns#\"\n", " lkg: \"http://lkg.lynx-project.eu/def/\"\n", " nif: \"http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#\"\n", " eli: \"http://data.europa.eu/eli/ontology#\"\n", " #Knowledge graph domain declaration\n", " podio: \"http://w3id.org/podio#\" # URL to ontoology\n", "\n", "sources:\n", " extra_json: [{filename_extra_data}.json~jsonpath, \"$.[*]\"]\n", " metrics_json: [{filename_metrics_data}.json~jsonpath, \"$.[*]\"]\n", " data: [{filename_all_data}.csv~csv ]\n", "\n", "mappings:\n", " Conversational:\n", " sources:\n", " - data\n", " s: 
podio:Conversational/$(media)/$(account)/$(id)\n", " po:\n", " - [a, podio:Conversational]\n", " - [terms:language, $(language)~iri]\n", " - [terms:created, $(date), xsd:dateTime]\n", " - [terms:source, $(source)~iri]\n", " - [terms:identifier, $(id), xsd:string]\n", " - [terms:publisher, $(agent)~iri]\n", " - [terms:creator, $(agent)~iri]\n", " - [sioc:has_creator, podio:UserAccount/$(media)/$(account)~iri]\n", " - [podio:content, $(content), xsd:string]\n", " - [podio:ideology, $(ideology)~iri]\n", " - [schema:wordCount, $(word_count)]\n", "\n", " conversationalObjectProperties:\n", " sources:\n", " - extra_json\n", " s: podio:Conversational/$(media)/$(account)/$(id)\n", " po:\n", " - [sioc:has_container, podio:Hashtag/$(hashtag)~iri]\n", " - [sioc:links_to, $(link), xsd:string]\n", " - [sioc:mentions, podio:UserAccount/$(media)/$(mention)~iri]\n", " \n", " Hashtag:\n", " sources:\n", " - extra_json\n", " s: podio:Hashtag/$(hashtag)\n", " po:\n", " - [a, podio:Hashtag]\n", " - [terms:identifier, $(hashtag), xsd:string]\n", "\n", " InteractionCounter:\n", " sources:\n", " - metrics_json\n", " s: podio:Conversational/$(media)/$(account)/$(id)/$(metric_name)\n", " po:\n", " - [a, schema:InteractionCounter]\n", " - [schema:interactionType, $(metric)~iri]\n", " - [schema:userInteractionCount, $(number), xsd:string]\n", " \n", " interactionStatistic:\n", " sources:\n", " - metrics_json\n", " s: podio:Conversational/$(media)/$(account)/$(id)\n", " po:\n", " - [schema:interactionStatistic, podio:Conversational/$(media)/$(account)/$(id)/$(metric_name)~iri]\n", " \n", " UserAccount:\n", " sources:\n", " - data\n", " s: podio:UserAccount/$(media)/$(account)\n", " po:\n", " - [a, sioc:UserAccount]\n", " - [sioc:account_of, $(agent)]\n", " - [foaf:accountName, $(account), xsd:string]\n", " - [foaf:accountServiceHomepage, $(media_url)]\n", " - [sioc:creator_of, podio:Conversational/$(media)/$(account)/$(id)]\n", " Agent:\n", " sources: \n", " - data\n", " s: $(agent)\n", " 
po:\n", " - [a, foaf:Agent]\n", " - [rdfs:isDefinedBy, $(agent)~iri]\n", " - [foaf:holdsAccount, podio:UserAccount/$(media)/$(account)~iri]\n", " \n", "\"\"\"\n", "\n", "## Save mappings file\n", "with open(f\"{filename_mappings}.yml\", \"w\") as f:\n", " f.write(mapping)\n", " \n", "## Generate the triples\n", "generate_triples(filename_mappings)\n", "\n", "print(\"Conversational Discourse Triples Generated\")" ] }, { "cell_type": "markdown", "id": "a2bfe788", "metadata": { "heading_collapsed": true }, "source": [ "### Approved Policies" ] }, { "cell_type": "markdown", "id": "91118419", "metadata": { "hidden": true }, "source": [ "Of the resources available on the web on legislative documents, the most relevant we found to demonstrate the potential of PODIO are:\n", "\n", "Description | Dataset\n", "--- | --- \n", "**European Legal Knowledge Graph** | Moreno Schneider, J., Rehm, G., Montiel-Ponsoda, E., Rodríguez-Doncel, V., Martín-Chozas, P., Navas-Loro, M., Kaltenböck, M., Revenko, A., Karampatakis, S., Sageder, C., Gracia, J., Maganza, F., Kernerman, I., Lonke, D., Lagzdins, A., Bosque Gil, J., Verhoeven, P., Gomez Diaz, E., & Boil Ballesteros, P. (2022). Lynx: A knowledge-based AI service platform for content processing, enrichment and analysis for the legal domain. In Information Systems (Vol. 106, p. 101966). Elsevier BV. https://doi.org/10.1016/j.is.2021.101966 \n", "**Publications of Arganda del Rey city council in the official gazette of the state and of the community in the period 1985-2023**| https://datos.gob.es/es/catalogo/l01280148-publicaciones-boe-2023\n", "\n", "On the one hand, to reuse the *European Legal Knowledge Graph* resource it is not necessary to download anything, just use federated SPARQL queries to access its data. This is explored in more detail in the *Querying the graph* section. 
For more information, see the [SPARQL federated query documentation](https://www.w3.org/TR/sparql11-federated-query/).\n", "\n", "On the other hand, to exploit the open data of the Arganda del Rey city council, we follow the steps below. You need to install the [PyMuPDF](https://pypi.org/project/PyMuPDF/) package to read the content of the legislation in PDF format. " ] }, { "cell_type": "code", "execution_count": null, "id": "a8a824cf", "metadata": { "ExecuteTime": { "end_time": "2024-02-06T16:57:46.910250Z", "start_time": "2024-02-06T16:57:46.234081Z" }, "hidden": true }, "outputs": [], "source": [ "# Dataset downloading\n", "import requests\n", "import os \n", "\n", "## Generate folders where data will be stored\n", "if not os.path.exists(\"data\"): os.mkdir(\"data\")\n", "if not os.path.exists(\"data/aux\"): os.mkdir(\"data/aux\")\n", "\n", "## Download and store the dataset (decoding as utf-8-sig strips the byte order mark)\n", "response = requests.get(\"https://datosabiertos.ayto-arganda.es/dataset/e519f1ba-8dfd-41b0-bda0-5b6660dbdda7/resource/99282a1a-8eaf-41dd-8cf4-cef6f930b0a3/download/publicaciones1985_2023.csv\")\n", "with open(\"data/arganda.csv\", \"w\", encoding=\"utf-8\") as f:\n", "    f.write(response.content.decode('utf-8-sig'))\n" ] }, { "cell_type": "code", "execution_count": null, "id": "788e702f", "metadata": { "ExecuteTime": { "end_time": "2024-02-06T16:58:40.092250Z", "start_time": "2024-02-06T16:57:46.915631Z" }, "code_folding": [ 21, 71 ], "hidden": true }, "outputs": [], "source": [ "# Dataset loading and enriching\n", "import pandas as pd\n", "from datetime import datetime\n", "import fitz # PyMuPDF\n", "import numpy as np\n", "import os\n", "import requests\n", "import xml.etree.ElementTree as ET\n", "\n", "## Set filenames (WITHOUT extension)\n", "filename_legislations= \"data/legislation\"\n", "\n", "## Limit to avoid overloading the graph database due to excessive volume of data sets.\n", "limit=4000 # Set to None to disable the limit\n", "\n", "## Dataset loading\n", "df_arganda_legislation= pd.read_csv(\"data/arganda.csv\", 
sep=';', on_bad_lines='skip')[:limit]\n", "\n", "## Extracting the content of the pdf links\n", "content= []\n", "\n", "for index, link in enumerate(df_arganda_legislation[\"Hipervinculo\"]):\n", " filename_aux= f\"data/aux/{link.split('/')[-1].split('.pdf')[0]}.pdf\"\n", " \n", " if not os.path.isfile(filename_aux):\n", " #Download the file\n", " headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'}\n", " try:\n", " response = requests.get(link, headers=headers)\n", " if response.status_code == 200:\n", " open(filename_aux, \"wb\").write(response.content)\n", " else:\n", " open(filename_aux, \"w\").write(\"\")\n", " except:\n", " open(filename_aux, \"w\").write(\"\")\n", " try: \n", " with fitz.open(filename_aux) as doc:\n", " global_text = []\n", " doc_content= \"\"\n", " for pagenumber, page in enumerate(doc):\n", " doc_content+= page.get_text()\n", " content.append(doc_content.replace(\"\\\"\", \"\\'\")) #Is important to remove because otherwhise will generate malformed csv\n", " except:\n", " content.append(\"\")\n", "\n", "## Manual entity linking\n", "publishers= {\"BOE\": \"http://www.wikidata.org/entity/Q5659724\", \"BOC\": \"http://www.wikidata.org/entity/Q578788\"}\n", "audiences= {\"BOE\": \"http://www.wikidata.org/entity/Q29\", \"BOC\": \"http://www.wikidata.org/entity/Q5756\"}\n", "\n", "## Dataset cleaning and enriching\n", "df_arganda_legislation[\"content\"]= content\n", "df_arganda_legislation.rename(columns = {'Hipervinculo':'source'}, inplace = True) \n", "df_arganda_legislation.rename(columns = {'Fecha':'date'}, inplace = True) \n", "df_arganda_legislation.rename(columns = {'Materia':'topic'}, inplace = True) \n", "df_arganda_legislation.rename(columns = {'Descripcion':'description'}, inplace = True) \n", "df_arganda_legislation.rename(columns = {'Año':'year'}, inplace = True) \n", "df_arganda_legislation[\"publisher\"] = [publishers[x] for x in 
df_arganda_legislation[\"Boletin\"]]\n", "df_arganda_legislation[\"audience\"]= [audiences[x] for x in df_arganda_legislation[\"Boletin\"]]\n", "df_arganda_legislation[\"date\"]= [str(datetime.strptime(row, \"%Y-%m-%d\").isoformat()) for row in df_arganda_legislation[\"date\"]]\n", "df_arganda_legislation[\"language\"]=[\"http://id.loc.gov/vocabulary/iso639-2/spa\" for x in range(df_arganda_legislation.shape[0])]\n", "df_arganda_legislation[\"creator\"]=[\"http://www.wikidata.org/entity/Q60052813\" for x in range(df_arganda_legislation.shape[0])]\n", "df_arganda_legislation[\"jurisdiction\"]=[\"ES-MA\" for x in range(df_arganda_legislation.shape[0])]\n", "df_arganda_legislation[\"title\"]=[\" \" for x in range(df_arganda_legislation.shape[0])]\n", "df_arganda_legislation[\"word_count\"]= [len(x.split()) for x in df_arganda_legislation[\"content\"]]\n", "\n", "## Extracting and generating legislative document identifiers\n", "parents_links= []\n", "parent_ids= []\n", "parent_pdfs= []\n", "ids= []\n", "\n", "for x in range(df_arganda_legislation.shape[0]):\n", "\n", " if df_arganda_legislation[\"Boletin\"][x]==\"BOE\":\n", " link_pdf= f\"https://www.boe.es/boe/dias/{df_arganda_legislation['date'][x].replace('-','/')}/pdfs/BOE-S-{df_arganda_legislation['date'][x].split('-')[0]}-{df_arganda_legislation['Numero Boletin'][x]}.pdf\"\n", " parent_pdfs.append(link_pdf)\n", " \n", " parent_id= f\"BOE-S-{df_arganda_legislation['year'][x]}-{df_arganda_legislation['Numero Boletin'][x]}\"\n", " parent_ids.append(parent_id)\n", "\n", " link_xml= f\"https://www.boe.es/diario_boe/xml.php?id={parent_id}\"\n", " xmlstring= requests.get(link_xml).content.decode(\"utf-8\")\n", " tree = ET.ElementTree(ET.fromstring(xmlstring))\n", " \n", " parent_map = {c: p for p in tree.iter() for c in p}\n", " found= False\n", " for idx, c in enumerate(tree.findall(\".//titulo\")):\n", " if c.text == df_arganda_legislation[\"description\"][x]:\n", " found= True\n", " break\n", " if found: \n", " 
identifier= parent_map[c].attrib['id']\n", " else:\n", " e= 1\n", " while True:\n", " _id= f\"{parent_id}-{e}\"\n", " if _id not in ids:\n", " identifier= _id\n", " break\n", " else:\n", " e+= 1\n", " ids.append(identifier)\n", "\n", " link_id= f\"https://www.boe.es/diario_boe/txt.php?id={identifier}\"\n", " parents_links.append(link_id)\n", " \n", " elif df_arganda_legislation[\"Boletin\"][x]==\"BOC\":\n", " \n", " link_pdf= f\"https://www.bocm.es/boletin/CM_Boletin_BOCM/{df_arganda_legislation['date'][x].replace('-','/')}/{str(df_arganda_legislation['Numero Boletin'][x]).zfill(3)}00.pdf\"\n", " parent_pdfs.append(link_pdf)\n", " \n", " link_id= f\"https://www.bocm.es/boletin-completo/bocm-{df_arganda_legislation['date'][x].replace('-','')}/{df_arganda_legislation['Numero Boletin'][x]}/\"\n", " parents_links.append(link_id)\n", " parent_id= f\"BOCM-{df_arganda_legislation['date'][x].replace('-','')}-{df_arganda_legislation['Numero Boletin'][x]}\"\n", " parent_ids.append(parent_id)\n", " \n", " # We generate child ids because they are not defined by the CAM\n", " e= 1\n", " while True:\n", " child_id = f\"{parent_id}-{e}\"\n", " if child_id not in ids:\n", " ids.append(child_id)\n", " break\n", " else:\n", " e +=1\n", "\n", "df_arganda_legislation[\"id\"]= ids\n", "df_arganda_legislation[\"parent_id\"]= parent_ids\n", "df_arganda_legislation[\"parent_source\"]= parents_links\n", "\n", "## Remove unused columns\n", "df_arganda_legislation.drop(['Numero Boletin'], axis=1, inplace=True)\n", "df_arganda_legislation.drop(['Pagina'], axis=1, inplace=True)\n", "df_arganda_legislation.drop(['Boletin'], axis=1, inplace=True)\n", "df_arganda_legislation.drop(['year'], axis=1, inplace=True)\n", "\n", "## Export the data to csv\n", "df_arganda_legislation.to_csv(f\"{filename_legislations}.csv\", sep=',', index=False)" ] }, { "cell_type": "code", "execution_count": null, "id": "427e5506", "metadata": { "ExecuteTime": { "end_time": "2024-02-06T17:03:18.507522Z", "start_time": 
"2024-02-06T16:58:40.093776Z" }, "code_folding": [ 9, 32, 35 ], "hidden": true }, "outputs": [], "source": [ "# Generate the data mapping file to transform the above data into triples\n", "import os\n", "\n", "## Set filenames (WITHOUT extension)\n", "filename_mappings= \"mappings/mappings_legislation\"\n", "if not os.path.exists(\"mappings\"): os.mkdir(\"mappings\")\n", "\n", "## Generate mapping file according to the previous dataset\n", "mapping= f\"\"\"\n", "prefixes:\n", " #Core imports\n", " rdf: \"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n", " rdfs: \"http://www.w3.org/2000/01/rdf-schema#\"\n", " xsd: \"http://www.w3.org/2001/XMLSchema#\"\n", " xml: \"http://www.w3.org/XML/1998/namespace\"\n", " #Vocabulary imports\n", " schema: \"http://schema.org/\"\n", " terms: \"http://purl.org/dc/terms/\"\n", " dc: \"http://purl.org/dc/elements/1.1/\"\n", " dcam: \"http://purl.org/dc/dcam/\"\n", " vann: \"http://purl.org/vocab/vann/\"\n", " skos: \"http://www.w3.org/2004/02/skos/core#\"\n", " #Ontology imports\n", " owl: \"http://www.w3.org/2002/07/owl#\"\n", " foaf: \"http://xmlns.com/foaf/0.1/\"\n", " sioc: \"http://rdfs.org/sioc/ns#\"\n", " lkg: \"http://lkg.lynx-project.eu/def/\"\n", " nif: \"http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#\"\n", " eli: \"http://data.europa.eu/eli/ontology#\"\n", " #Knowledge graph domain declaration\n", " podio: \"http://w3id.org/podio#\" # URL to ontoology\n", "\n", "sources:\n", " data: [{filename_legislations}.csv~csv ]\n", " \n", "mappings:\n", " ApprovedPolicy:\n", " sources:\n", " - data\n", " s: podio:ApprovedPolicy/$(id)\n", " po:\n", " - [a, podio:ApprovedPolicy]\n", " - [lkg:metadata, podio:ApprovedPolicy/$(id)/Metadata~iri]\n", " - [eli:is_part_of, podio:ApprovedPolicy/$(parent_id)~iri]\n", " - [podio:content, $(content), xsd:string]\n", " - [terms:language, $(language)~iri]\n", " - [terms:created, $(date), xsd:dateTime]\n", " - [terms:description, $(description), xsd:string]\n", " - [terms:source, 
$(source)~iri]\n", " - [terms:identifier, $(id), xsd:string]\n", " - [schema:wordCount, $(word_count), xsd:int]\n", " - [terms:audience, $(audience)~iri]\n", " - [terms:publisher, $(publisher)~iri]\n", " - [terms:creator, $(creator)~iri]\n", "\n", " ParentPolicy:\n", " sources:\n", " - data\n", " s: podio:ApprovedPolicy/$(parent_id)\n", " po:\n", " - [a, podio:ApprovedPolicy]\n", " - [eli:has_part, podio:ApprovedPolicy/$(id)~iri]\n", " - [terms:language, $(language)~iri]\n", " - [terms:source, $(parent_source)~iri]\n", " - [terms:identifier, $(parent_id), xsd:string]\n", " - [terms:created, $(date), xsd:dateTime]\n", " - [terms:audience, $(audience)~iri]\n", " - [terms:publisher, $(publisher)~iri]\n", " - [terms:creator, $(publisher)~iri]\n", " \n", " Metadata:\n", " sources:\n", " - data\n", " s: podio:ApprovedPolicy/$(id)/Metadata\n", " po:\n", " - [a, podio:ApprovedPolicy]\n", " - [terms:language, $(language)~iri]\n", " - [lkg:hasPDF, $(source)~iri]\n", " - [terms:source, $(parent_source)~iri]\n", " - [eli:version_date, $(date), xsd:dateTime]\n", " - [eli:id_local, $(id), xsd:string]\n", " - [lkg:summary, $(description), xsd:string]\n", " - [terms:subject, $(topic), xsd:string]\n", " - [terms:title, $(title), xsd:string]\n", " - [lkg:hasAuthority, $(publisher)~iri]\n", " - [eli:jurisdiction, $(jurisdiction), xsd:string]\n", "\n", "\"\"\"\n", "\n", "## Save mappings file\n", "with open(f\"{filename_mappings}.yml\", \"w\") as f:\n", " f.write(mapping)\n", "\n", "## Generate the triples\n", "generate_triples(filename_mappings)\n", "\n", "print(\"Approved Policies Discourse Triples Generated\")" ] }, { "cell_type": "markdown", "id": "8422c3a9", "metadata": { "heading_collapsed": true }, "source": [ "### Other Discourses" ] }, { "cell_type": "markdown", "id": "9fa08c9c", "metadata": { "hidden": true }, "source": [ "There are also other types of discourse, such as the speeches delivered at Christmas by the Spanish head of state. 
These speeches are expository and are broadcast on television on Christmas Eve. We found the following resource, which collects them: \n", "\n", "Description|Dataset\n", "---|---\n", "**Spanish head of state speeches** | Elena Álvarez-Mellado. 2020. A Corpus of Spanish Political Speeches from 1937 to 2019. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 928–932, Marseille, France. European Language Resources Association.\n", "\n", "To work with this resource, install the [GitPython](https://gitpython.readthedocs.io/en/stable/intro.html) package (or skip this step and clone the repository manually)." ] }, { "cell_type": "code", "execution_count": null, "id": "ec1d83be", "metadata": { "ExecuteTime": { "end_time": "2024-02-06T17:03:18.673083Z", "start_time": "2024-02-06T17:03:18.509330Z" }, "hidden": true }, "outputs": [], "source": [ "import os\n", "from git import Repo # pip install gitpython\n", "\n", "## Set the directory where the repository will be cloned\n", "repo_dir= \"data/aux/discursos-de-navidad\"\n", "\n", "## Clone the repository (skipped if the directory already exists, so the cell can be re-run)\n", "if not os.path.exists(repo_dir):\n", "    Repo.clone_from(\"https://github.com/lirondos/discursos-de-navidad.git\", repo_dir)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "025072d6", "metadata": { "ExecuteTime": { "end_time": "2024-02-06T17:03:28.314326Z", "start_time": "2024-02-06T17:03:28.250292Z" }, "code_folding": [ 20 ], "hidden": true }, "outputs": [], "source": [ "#Dataset loading, cleaning, enriching and storage \n", "import pandas as pd\n", "import numpy as np\n", "import os\n", "\n", "## Set filenames (WITHOUT extension)\n", "repo_dir= \"data/aux/discursos-de-navidad\"\n", "filename_discursos= \"data/discursos_navidad\"\n", "\n", "## Limit to avoid overloading the graph database due to excessive volume of data sets.\n", "limit=4000 # Set to None to disable the limit; otherwise, an int\n", "\n", "## Dataset loading\n", "discursos_navidad= pd.read_csv(f\"{repo_dir}/data/metadata.csv\", sep=',', on_bad_lines='skip')[:limit]\n", 
"\n", "## Manual entity linking\n", "head_of_state = {\"Felipe de Borbón\": \"http://www.wikidata.org/entity/Q191045\", \n", " \"Juan Carlos de Borbón\": \"http://www.wikidata.org/entity/Q19943\", \n", " \"Francisco Franco\": \"http://www.wikidata.org/entity/Q29179\"}\n", "\n", "ideologies = { \"Felipe de Borbón\": \"\", \n", " \"Juan Carlos de Borbón\": \"\", \n", " \"Francisco Franco\": \"http://www.wikidata.org/entity/Q210890\"}\n", "\n", "\n", "## Dataset cleaning and enriching\n", "discursos_navidad.rename(columns = {'url_text':'source'}, inplace = True) \n", "discursos_navidad.rename(columns = {'head_of_state':'creator'}, inplace = True) \n", "discursos_navidad[\"ideology\"]= [ideologies[x] for x in discursos_navidad[\"creator\"]]\n", "discursos_navidad[\"creator\"]= [head_of_state[x] for x in discursos_navidad[\"creator\"]]\n", "discursos_navidad[\"target\"]= [\"http://www.wikidata.org/entity/Q29\" for x in range(discursos_navidad.shape[0])]\n", "discursos_navidad[\"language\"]=[\"http://id.loc.gov/vocabulary/iso639-2/spa\" for x in range(discursos_navidad.shape[0])]\n", "discursos_navidad[\"id\"]=[f\"{x}-spa-christmas\" for x in discursos_navidad[\"year\"]]\n", "discursos_navidad.rename(columns = {'year':'date'}, inplace = True) \n", "discursos_navidad[\"date\"]= [f\"{x}-12-24T21:00:00\" for x in discursos_navidad[\"date\"]]\n", "\n", "## Read discourse files content and add to the dataset\n", "contents= []\n", "for file_name in discursos_navidad[\"file_name\"]:\n", " with open(f\"{repo_dir}/data/speeches/{file_name}\", \"r\") as f:\n", " contents.append(f.read())\n", "discursos_navidad[\"content\"]= contents\n", "discursos_navidad[\"word_count\"]= [len(x.split()) for x in discursos_navidad[\"content\"]]\n", "discursos_navidad.drop(['file_name'], axis=1, inplace=True)\n", "\n", "## Save dataset to file\n", "discursos_navidad.to_csv(f\"{filename_discursos}.csv\", sep=',', index=False)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "d5ac6f86", 
"metadata": { "ExecuteTime": { "end_time": "2024-02-06T17:03:31.975300Z", "start_time": "2024-02-06T17:03:29.104828Z" }, "code_folding": [ 9 ], "hidden": true }, "outputs": [], "source": [ "# Generate the data mapping file to transform the above data into triples\n", "import os\n", "\n", "## Set filenames (WITHOUT extension)\n", "filename_mappings= \"mappings/mappings_discursos_navidad\"\n", "if not os.path.exists(\"mappings\"): os.mkdir(\"mappings\")\n", "\n", "## Generate mapping file according to the JSONs and dataset\n", "mapping= f\"\"\"\n", "prefixes:\n", " #Core imports\n", " rdf: \"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n", " rdfs: \"http://www.w3.org/2000/01/rdf-schema#\"\n", " xsd: \"http://www.w3.org/2001/XMLSchema#\"\n", " xml: \"http://www.w3.org/XML/1998/namespace\"\n", " #Vocabulary imports\n", " schema: \"http://schema.org/\"\n", " terms: \"http://purl.org/dc/terms/\"\n", " dc: \"http://purl.org/dc/elements/1.1/\"\n", " dcam: \"http://purl.org/dc/dcam/\"\n", " vann: \"http://purl.org/vocab/vann/\"\n", " skos: \"http://www.w3.org/2004/02/skos/core#\"\n", " #Ontology imports\n", " owl: \"http://www.w3.org/2002/07/owl#\"\n", " foaf: \"http://xmlns.com/foaf/0.1/\"\n", " sioc: \"http://rdfs.org/sioc/ns#\"\n", " lkg: \"http://lkg.lynx-project.eu/def/\"\n", " nif: \"http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#\"\n", " eli: \"http://data.europa.eu/eli/ontology#\"\n", " #Knowledge graph domain declaration\n", " podio: \"http://w3id.org/podio#\" # URL to ontoology\n", "\n", "sources:\n", " data: [{filename_discursos}.csv~csv ]\n", "\n", "mappings:\n", " Expository:\n", " sources:\n", " - data\n", " s: podio:Expository/$(id)\n", " po:\n", " - [a, podio:Expository]\n", " - [terms:identifier, $(id), xsd:int]\n", " - [podio:content, $(content), xsd:string]\n", " - [terms:language, $(language)~iri]\n", " - [terms:created, $(date), xsd:dateTime]\n", " - [terms:source, $(source)~iri]\n", " - [schema:wordCount, $(word_count), xsd:int]\n", " 
- [podio:hasTarget, $(target)~iri]\n", " - [podio:ideology, $(ideology)~iri]\n", " - [terms:creator, $(creator)~iri]\n", "\n", "\"\"\"\n", "\n", "## Save mappings file\n", "with open(f\"{filename_mappings}.yml\", \"w\") as f:\n", " f.write(mapping)\n", "\n", "## Generate the triples\n", "\n", "generate_triples(filename_mappings)\n", "\n", "print(\"Expository Discourse Triples Generated\")" ] }, { "cell_type": "markdown", "id": "449d388a", "metadata": { "heading_collapsed": true }, "source": [ "### Additional Triples" ] }, { "cell_type": "markdown", "id": "fa22b803", "metadata": { "hidden": true }, "source": [ "Here we generate cross-cutting triples that the graph needs to work correctly. Among them are the triples that identify some agents as political parties." ] }, { "cell_type": "code", "execution_count": null, "id": "ed67a59b", "metadata": { "ExecuteTime": { "end_time": "2024-02-06T17:04:10.606120Z", "start_time": "2024-02-06T17:04:10.601820Z" }, "hidden": true }, "outputs": [], "source": [ "# Generate the extra triples \n", "import os\n", "\n", "## Set filenames (WITHOUT extension)\n", "filename_extra= \"mappings/extra_triples\"\n", "if not os.path.exists(\"mappings\"): os.mkdir(\"mappings\")\n", "\n", " \n", "wikidata_id= { \n", " \"http://www.wikidata.org/entity/Q623740\": \"Q623740\",\n", " \"http://www.wikidata.org/entity/Q1393123\": \"Q1393123\",\n", " \"http://www.wikidata.org/entity/Q138198\": \"Q138198\",\n", " \"http://www.wikidata.org/entity/Q185088\": \"Q185088\",\n", " \"http://www.wikidata.org/entity/Q16059622\": \"Q16059622\",\n", " \"http://www.wikidata.org/entity/Q15630787\": \"Q15630787\",\n", " \"http://www.wikidata.org/entity/Q29552\": \"Q29552\",\n", " \"http://www.wikidata.org/entity/Q29468\": \"Q29468\",\n", " \"http://www.wikidata.org/entity/Q22686\": \"Q22686\",\n", " \"http://www.wikidata.org/entity/Q6279\": \"Q6279\",\n", " \"http://www.wikidata.org/entity/Q191045\": \"Q191045\",\n", " 
\"http://www.wikidata.org/entity/Q19943\": \"Q19943\",\n", " \"http://www.wikidata.org/entity/Q29179\": \"Q29179\",\n", " \"http://www.wikidata.org/entity/Q60052813\": \"Q60052813\"\n", " \n", "}\n", "\n", "\n", "## Attach its Wikidata identifier to each agent (terms:identifier is the property the queries below read)\n", "trip= \"\"\n", "for key, value in wikidata_id.items():\n", " trip += f'<{key}> <http://purl.org/dc/terms/identifier> \"{value}\" .\\n'\n", "\n", "\n", "with open(f\"{filename_extra}.nt\", \"w\") as f:\n", " f.write(trip)\n", "\n", "print(\"Extra triples generated\")\n" ] }, { "cell_type": "markdown", "id": "4846f258", "metadata": { "heading_collapsed": true }, "source": [ "### Summary" ] }, { "cell_type": "markdown", "id": "51a8ba40", "metadata": { "hidden": true }, "source": [ "In this section we downloaded several publicly available datasets on political discourse from different countries. We cleaned, enriched, merged and mapped them to the ontology to generate the triples. We limited each dataset to its first 4000 entries so that the database answers queries faster. However, this notebook can be used to load more data without any limit. The triples that we generated, and that we query in the next section, are:\n", "\n", "Categories | Triples\n", "--- | ---\n", "Party Manifestos and Political Proposals | 19,390\n", "Social Media Posts | 446,346\n", "Approved Policies | 127,838\n", "Other Discourses | 743\n", "Additional triples | 14\n", "**Total** | **549,321**" ] }, { "cell_type": "markdown", "id": "9a34256c", "metadata": { "heading_collapsed": true }, "source": [ "## Querying the graph" ] }, { "cell_type": "markdown", "id": "4cbc1f84", "metadata": { "hidden": true }, "source": [ "To execute the queries we use the Python package [SPARQLWrapper](https://pypi.org/project/SPARQLWrapper/). The endpoint we use is the one behind https://w3id.org/podio/sparql, although you are free to use another one and replicate or extend the previous steps. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "cc939de1", "metadata": { "ExecuteTime": { "end_time": "2024-02-06T14:05:45.176985Z", "start_time": "2024-02-06T14:05:45.172872Z" }, "code_folding": [], "hidden": true, "hide_input": false, "scrolled": true }, "outputs": [], "source": [ "#Default endpoint configuration to run SPARQL queries\n", "from SPARQLWrapper import SPARQLWrapper, JSON, CSV\n", "import pandas as pd\n", "from io import StringIO\n", "from IPython.display import display, HTML\n", "\n", "sparql = SPARQLWrapper(\"https://graphdb.linkeddata.es/repositories/podio\")\n", "sparql.setReturnFormat(CSV)\n", "\n", "\n", "\n", "def run_query(query):\n", " sparql.setQuery(query)\n", " try:\n", " results = sparql.queryAndConvert()\n", " decode_results= results.decode(encoding='utf-8', errors='strict')\n", " df = pd.read_csv(StringIO(decode_results), sep=\",\")\n", " display(HTML(df.to_html()))\n", " except Exception as e:\n", " print(e)" ] }, { "cell_type": "markdown", "id": "d30b3600", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### CQ1: What are the names of all political parties, when were they created and what is their ideology?" 
] }, { "cell_type": "code", "execution_count": null, "id": "250dc2b3", "metadata": { "ExecuteTime": { "end_time": "2024-02-06T16:06:47.306125Z", "start_time": "2024-02-06T16:06:44.547887Z" }, "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n", "PREFIX dcam: <http://purl.org/dc/dcam/>\n", "PREFIX eli: <http://data.europa.eu/eli/ontology#>\n", "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n", "PREFIX lkg: <http://lkg.lynx-project.eu/def/>\n", "PREFIX nif-core: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>\n", "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n", "PREFIX podio: <http://w3id.org/podio#>\n", "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n", "PREFIX schema: <http://schema.org/>\n", "PREFIX sioc: <http://rdfs.org/sioc/ns#>\n", "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n", "PREFIX terms: <http://purl.org/dc/terms/>\n", "PREFIX vann: <http://purl.org/vocab/vann/>\n", "PREFIX xml: <http://www.w3.org/XML/1998/namespace>\n", "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n", "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n", "PREFIX nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>\n", "PREFIX wd: <http://www.wikidata.org/entity/>\n", "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n", "PREFIX wikibase: <http://wikiba.se/ontology#>\n", "PREFIX bd: <http://www.bigdata.com/rdf#>\n", "\n", "select ?name ?creationDate ?ideologyLabel where { \n", "\n", " select ?s (SAMPLE(?creation) as ?creationDate) (SAMPLE(?ideologyLabel) as ?ideologyLabel) (SAMPLE(?name) as ?name)\n", " where { \n", "\n", " ?s a foaf:Agent;\n", " terms:identifier ?id.\n", "\n", " BIND(IRI(CONCAT(\"http://www.wikidata.org/entity/\", str(?id))) AS ?wikidataIri) .\n", "\n", " # Query Wikidata using federation\n", " SERVICE <https://query.wikidata.org/sparql> {\n", " ?wikidataIri rdfs:label ?name;\n", " wdt:P31/wdt:P279* wd:Q7278;\n", " wdt:P571 ?creation;\n", " wdt:P1387 ?ideology.\n", " ?ideology rdfs:label ?ideologyLabel.\n", " FILTER(LANGMATCHES(LANG(?name), \"en\")).\n", " FILTER(LANGMATCHES(LANG(?ideologyLabel), \"en\")).\n", " SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n", " }\n", " } group by ?s\n", "\n", "}\n", "\"\"\"\n", "\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "5058fe5f", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### CQ2: Which social media accounts does the PP have?" 
] }, { "cell_type": "code", "execution_count": null, "id": "53d1d027", "metadata": { "ExecuteTime": { "end_time": "2024-02-05T16:06:29.421807Z", "start_time": "2024-02-05T16:06:29.388101Z" }, "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "select ?socialMediaUserName ?socialMedia where { \n", " foaf:holdsAccount ?accounts .\n", " ?accounts foaf:accountServiceHomepage ?socialMedia;\n", " foaf:accountName ?socialMediaUserName.\n", "} \n", "\"\"\"\n", "\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "025c2929", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### CQ3: How many posts have political parties published on each of their social media accounts?" 
] }, { "cell_type": "code", "execution_count": null, "id": "7a267a58", "metadata": { "ExecuteTime": { "end_time": "2024-02-06T16:10:08.850489Z", "start_time": "2024-02-06T16:10:03.989746Z" }, "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "\n", "select (?name as ?PoliticalParty) ?socialMedia (?accname as ?accountName) ?numberOfpost where { \n", "\n", " select ?socialMedia ?accname\n", " (SAMPLE(?name) AS ?name)\n", " (COUNT(DISTINCT(?discourse)) AS ?numberOfpost) \n", " where { \n", " ?discourse a podio:Discourse;\n", " sioc:has_creator ?account.\n", " ?s a foaf:Agent;\n", " terms:identifier ?ids;\n", " foaf:holdsAccount ?account .\n", " ?account foaf:accountServiceHomepage ?socialMedia;\n", " foaf:accountName ?accname.\n", " BIND(IRI(CONCAT(\"http://www.wikidata.org/entity/\", str(?ids))) AS ?wikidataIri) .\n", "\n", " # Query Wikidata using federation\n", " SERVICE {\n", " ?wikidataIri rdfs:label ?name;\n", " wdt:P31/wdt:P279* wd:Q7278.\n", " FILTER(LANGMATCHES(LANG(?name), \"en\")).\n", " SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n", " }\n", " } group by ?socialMedia ?accname\n", "}\n", "\"\"\"\n", "\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "75c9eea2", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### CQ4: Which account and on which social media network is the most mentioned by each politician and political party?" 
] }, { "cell_type": "code", "execution_count": null, "id": "86a96a91", "metadata": { "ExecuteTime": { "end_time": "2024-02-06T14:07:14.006517Z", "start_time": "2024-02-06T14:07:12.214027Z" }, "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "\n", "SELECT (SAMPLE(?name) as ?agentName) (SAMPLE(?socialMedia) as ?socialMedia) (SAMPLE(?mentioned_account) as ?maxMentionedAccount) (MAX(?count_mentions) as ?numberOfTimesMentioned) WHERE{\n", " \n", " SELECT DISTINCT ?pparty ?name ?socialMedia ?mentioned_account (COUNT (?tweet) as ?count_mentions)\n", " WHERE {\n", " ?tweet a podio:Discourse ;\n", " sioc:mentions ?mentioned_account ;\n", " podio:content ?tweet_text ;\n", " sioc:has_creator ?account .\n", " ?pparty foaf:holdsAccount ?account;\n", " terms:identifier ?wikiId.\n", " ?account foaf:accountServiceHomepage ?socialMedia.\n", "\n", " BIND(IRI(CONCAT(\"http://www.wikidata.org/entity/\", str(?wikiId))) AS ?wikidataIri) .\n", "\n", " # Query Wikidata using federation\n", " SERVICE {\n", " ?wikidataIri rdfs:label ?name.\n", " FILTER(LANGMATCHES(LANG(?name), \"en\")).\n", " SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". 
}\n", " }\n", " } GROUP BY ?pparty ?mentioned_account ?name ?socialMedia\n", " \n", "} group by ?pparty \n", "ORDER BY DESC(?numberOfTimesMentioned)\n", "\"\"\"\n", "\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "7d50426b", "metadata": { "ExecuteTime": { "end_time": "2024-02-06T15:03:05.467876Z", "start_time": "2024-02-06T15:03:05.464693Z" }, "heading_collapsed": true, "hidden": true }, "source": [ "### CQ5: What were the top 10 hashtags used in political speech during 2020?" ] }, { "cell_type": "code", "execution_count": null, "id": "36729a4b", "metadata": { "ExecuteTime": { "end_time": "2024-02-06T15:02:34.584683Z", "start_time": "2024-02-06T15:02:34.489026Z" }, "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "\n", "\n", "Select ?hashtagName (MAX(?count) as ?usedTimes) where{\n", " select ?hashtagName (count(distinct ?s) as ?count) where {\n", " ?s a podio:Discourse;\n", " podio:content ?text;\n", " terms:creator ?authAcc;\n", " sioc:has_container ?container;\n", " terms:created ?date .\n", " ?container a podio:Hashtag;\n", " terms:identifier ?hashtagName.\n", "\n", " \n", " FILTER(year(?date) = 2020)\n", " } GROUP BY ?hashtagName \n", "\n", "} group by ?hashtagName\n", "Order BY desc (?usedTimes) LIMIT 10\n", "\n", "\"\"\"\n", "\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "3212419c", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### CQ6: Which political party has generated the most speeches?" 
] }, { "cell_type": "code", "execution_count": null, "id": "7ccf644f", "metadata": { "ExecuteTime": { "end_time": "2024-02-06T09:07:00.394957Z", "start_time": "2024-02-06T09:06:58.301208Z" }, "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "select (?name as ?PoliticalParty) ?numberOfDiscourses where { \n", "\n", " select ?s\n", " (SAMPLE(?name) AS ?name)\n", " (COUNT(DISTINCT(?discourse)) AS ?numberOfDiscourses) \n", " where { \n", " ?discourse a podio:Discourse;\n", " terms:creator ?s.\n", " ?s a foaf:Agent;\n", " terms:identifier ?ids.\n", " \n", " BIND(IRI(CONCAT(\"http://www.wikidata.org/entity/\", str(?ids))) AS ?wikidataIri) .\n", "\n", " # Query Wikidata using federation\n", " SERVICE {\n", " ?wikidataIri rdfs:label ?name;\n", " wdt:P31/wdt:P279* wd:Q7278.\n", " FILTER(LANGMATCHES(LANG(?name), \"en\")).\n", " SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n", " }\n", " } group by ?s\n", "}ORDER BY DESC(?numberOfDiscourses)\n", "LIMIT 1\n", "\"\"\"\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "59605058", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### CQ7: How many references to the LGTBI community have been made in each year's political speeches by the different political agents?" 
] }, { "cell_type": "code", "execution_count": null, "id": "e5442aab", "metadata": { "ExecuteTime": { "end_time": "2024-02-06T14:32:10.374675Z", "start_time": "2024-02-06T14:32:06.275746Z" }, "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "select (SAMPLE(?name) as ?name) ?timeslice (count(distinct ?s) as ?count) where {\n", " ?s a podio:Discourse;\n", " podio:content ?text;\n", " terms:creator ?auth;\n", " terms:created ?date .\n", " BIND(year(?date) as ?timeslice) .\n", " filter contains(lcase(str(?text)),\"lgtbi\")\n", " \n", " ?auth terms:identifier ?ids.\n", " BIND(IRI(CONCAT(\"http://www.wikidata.org/entity/\", str(?ids))) AS ?wikidataIri) .\n", "\n", " SERVICE {\n", " ?wikidataIri rdfs:label ?name.\n", " FILTER(LANGMATCHES(LANG(?name), \"en\")).\n", " SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n", " }\n", " \n", " } GROUP BY ?timeslice ?auth\n", "ORDER BY DESC (?count)\n", "\n", "\"\"\"\n", "\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "44686cf8", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### CQ8: What were the five most shared speeches?" 
] }, { "cell_type": "code", "execution_count": null, "id": "bd5155ad", "metadata": { "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "select ?text (?int as ?numberOfTimeShared) where { \n", " ?s a podio:Discourse;\n", " schema:interactionStatistic ?stats;\n", " terms:creator ?auth;\n", " podio:content ?text.\n", " \n", " ?stats schema:interactionType schema:ShareAction;\n", " schema:userInteractionCount ?number.\n", "\n", " BIND( abs(xsd:float(?number)) as ?int)\n", "\n", "\n", "} \n", "ORDER BY DESC (?int)\n", "LIMIT 5\n", "\"\"\"\n", "\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "f2ef6523", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### CQ9: Which ten authorities have approved the most legislation?" 
] }, { "cell_type": "code", "execution_count": null, "id": "fabcbff8", "metadata": { "ExecuteTime": { "end_time": "2024-02-06T11:59:57.197182Z", "start_time": "2024-02-06T11:59:41.028911Z" }, "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "\n", "SELECT ?jurisdiction (?auth as ?authority) (MAX(?numberOfLegislations) AS ?numberOfLegislations)\n", "WHERE {\n", " SELECT (?juris as ?jurisdiction) ?auth (COUNT(DISTINCT(?id)) AS ?numberOfLegislations)\n", " WHERE {\n", " {\n", " ?s a podio:ApprovedPolicy ;\n", " podio:content ?s_text ;\n", " lkg:metadata ?metadata .\n", " ?metadata eli:jurisdiction ?juris ;\n", " eli:version_date ?date ;\n", " eli:id_local ?id ;\n", " lkg:hasAuthority ?auth ;\n", " terms:source ?source .\n", " \n", " }\n", " UNION{\n", " SERVICE {\n", " ?s a lkg:Legislation ;\n", " nif:isString ?s_text ;\n", " lkg:metadata ?metadata .\n", " ?metadata eli:jurisdiction ?juris ;\n", " eli:version_date ?date ;\n", " eli:id_local ?id ;\n", " lkg:hasAuthority ?auth ;\n", " terms:source ?source .\n", " }\n", " }\n", " BIND(year(?date) as ?year)\n", " } group by ?juris ?auth\n", " \n", "} group by ?jurisdiction ?auth\n", "ORDER BY DESC(?numberOfLegislations) Limit 10\n", "\n", "\"\"\"\n", "\n", "run_query(query)\n" ] }, { "cell_type": "markdown", "id": "87c0d99d", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### CQ10: How many pieces of legislation are available for each jurisdiction and year?" 
] }, { "cell_type": "code", "execution_count": null, "id": "9db23272", "metadata": { "ExecuteTime": { "end_time": "2024-02-05T10:15:05.049922Z", "start_time": "2024-02-05T10:14:48.582259Z" }, "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "SELECT (?juris as ?jurisdiction) ?year (COUNT(DISTINCT(?id)) AS ?numberOfLegislations)\n", "WHERE {\n", " {\n", " ?s a lkg:Legislation ;\n", " nif:isString ?s_text ;\n", " lkg:metadata ?metadata .\n", " ?metadata eli:jurisdiction ?juris ;\n", " eli:version_date ?date ;\n", " eli:id_local ?id ;\n", " terms:source ?source .\n", " }\n", " UNION{\n", " SERVICE {\n", " ?s a lkg:Legislation ;\n", " nif:isString ?s_text ;\n", " lkg:metadata ?metadata .\n", " ?metadata eli:jurisdiction ?juris ;\n", " eli:version_date ?date ;\n", " eli:id_local ?id ;\n", " terms:source ?source .\n", " }\n", " }\n", " BIND(year(?date) as ?year)\n", " \n", "} group by ?juris ?year\n", "\n", "\"\"\"\n", "\n", "run_query(query)\n", "\n" ] }, { "cell_type": "markdown", "id": "903e4f1f", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### CQ11: In which year did each jurisdiction approve the most legislation?" 
] }, { "cell_type": "code", "execution_count": null, "id": "e294725d", "metadata": { "ExecuteTime": { "end_time": "2023-12-21T11:36:05.822275Z", "start_time": "2023-12-21T11:35:50.082728Z" }, "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "SELECT ?jurisdiction (SAMPLE(?year) as ?year) (MAX(?numberOfLegislations) AS ?numberOfLegislations)\n", "WHERE {\n", " SELECT (?juris as ?jurisdiction) ?year (COUNT(DISTINCT(?id)) AS ?numberOfLegislations)\n", " WHERE {\n", " {\n", " ?s a lkg:Legislation ;\n", " nif:isString ?s_text ;\n", " lkg:metadata ?metadata .\n", " ?metadata eli:jurisdiction ?juris ;\n", " eli:version_date ?date ;\n", " eli:id_local ?id ;\n", " terms:source ?source .\n", " }\n", " UNION{\n", " SERVICE {\n", " ?s a lkg:Legislation ;\n", " nif:isString ?s_text ;\n", " lkg:metadata ?metadata .\n", " ?metadata eli:jurisdiction ?juris ;\n", " eli:version_date ?date ;\n", " eli:id_local ?id ;\n", " terms:source ?source .\n", " }\n", " }\n", " BIND(year(?date) as ?year)\n", "\n", " } group by ?juris ?year\n", "} group by ?jurisdiction\n", "\n", "\"\"\"\n", "\n", "run_query(query)\n" ] }, { "cell_type": "markdown", "id": "e65f6c41", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### CQ12: What has been the latest legislation approved by each jurisdiction?" 
] }, { "cell_type": "code", "execution_count": null, "id": "b9b2b928", "metadata": { "ExecuteTime": { "end_time": "2024-02-05T10:14:12.998280Z", "start_time": "2024-02-05T10:13:53.841761Z" }, "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "\n", "\n", "SELECT (?juris as ?jurisdiction) (MAX(?date) AS ?date) (SAMPLE(?id) AS ?id) (SAMPLE(?source) AS ?source) (SAMPLE(?s_text) AS ?content) \n", "WHERE {\n", " {\n", " ?s a lkg:Legislation ;\n", " nif:isString ?s_text ;\n", " lkg:metadata ?metadata .\n", " ?metadata eli:jurisdiction ?juris ;\n", " eli:version_date ?date ;\n", " eli:id_local ?id ;\n", " terms:source ?source .\n", " }\n", " UNION{\n", " SERVICE {\n", " ?s a lkg:Legislation ;\n", " nif:isString ?s_text ;\n", " lkg:metadata ?metadata .\n", " ?metadata eli:jurisdiction ?juris ;\n", " eli:version_date ?date ;\n", " eli:id_local ?id ;\n", " terms:source ?source .\n", " }\n", " } \n", "} group by ?juris\n", "\"\"\"\n", "\n", "run_query(query)\n", "\n" ] }, { "cell_type": "markdown", "id": "fad9450d", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### CQ13: What was the last legislation approved by the government of the Community of Madrid?" 
] }, { "cell_type": "code", "execution_count": null, "id": "3e6aed30", "metadata": { "ExecuteTime": { "end_time": "2024-02-06T12:03:16.030100Z", "start_time": "2024-02-06T12:03:00.292895Z" }, "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "\n", "\n", "SELECT ?auth (SAMPLE(?juris) as ?jurisdiction) (MAX(?date) AS ?date) (SAMPLE(?id) AS ?id) (SAMPLE(?source) AS ?source) (SAMPLE(?s_text) AS ?content) \n", "WHERE {\n", " {\n", " ?s a lkg:Legislation ;\n", " nif:isString ?s_text ;\n", " lkg:metadata ?metadata .\n", " ?metadata eli:jurisdiction ?juris ;\n", " eli:version_date ?date ;\n", " eli:id_local ?id ;\n", " lkg:hasAuthority ?auth;\n", " terms:source ?source .\n", " }\n", " UNION{\n", " SERVICE {\n", " ?s a lkg:Legislation ;\n", " nif:isString ?s_text ;\n", " lkg:metadata ?metadata .\n", " ?metadata eli:jurisdiction ?juris ;\n", " eli:version_date ?date ;\n", " eli:id_local ?id ;\n", " lkg:hasAuthority ?auth;\n", " terms:source ?source .\n", " }\n", " } FILTER(?auth=)\n", "}GROUP BY ?auth\n", "\"\"\"\n", "\n", "run_query(query)\n" ] }, { "cell_type": "markdown", "id": "95773e74", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### CQ14: What is the longest legislation?" 
] }, { "cell_type": "code", "execution_count": null, "id": "159f3ffa", "metadata": { "ExecuteTime": { "end_time": "2023-12-21T13:16:19.741150Z", "start_time": "2023-12-21T13:16:04.097220Z" }, "hidden": true, "scrolled": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "\n", "\n", "SELECT (SAMPLE(?juris) as ?jurisdiction) (SAMPLE(?date) AS ?date) (SAMPLE(?id) AS ?id) \n", "(SAMPLE(?source) AS ?source) (MAX(STRLEN(?s_text)) AS ?len_s_text) (SAMPLE(?s_text) as ?content)\n", "WHERE {\n", " {\n", " ?s a lkg:Legislation ;\n", " nif:isString ?s_text ;\n", " lkg:metadata ?metadata .\n", " ?metadata eli:jurisdiction ?juris ;\n", " eli:version_date ?date ;\n", " eli:id_local ?id ;\n", " terms:source ?source .\n", " }\n", " UNION{\n", " SERVICE {\n", " ?s a lkg:Legislation ;\n", " nif:isString ?s_text ;\n", " lkg:metadata ?metadata .\n", " ?metadata eli:jurisdiction ?juris ;\n", " eli:version_date ?date ;\n", " eli:id_local ?id ;\n", " terms:source ?source .\n", " }\n", " } \n", "}\n", "\"\"\"\n", "\n", "run_query(query)\n" ] }, { "cell_type": "markdown", "id": "3160df4b", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### R1: Political discourse has different metrics (views, listeners, attendees, likes, etc.)" ] }, { "cell_type": "code", "execution_count": null, "id": "e3ba47ba", "metadata": { "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX 
schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "select DISTINCT ?metrics where { \n", " ?s a podio:Discourse;\n", " schema:interactionStatistic ?stats.\n", " ?stats a schema:InteractionCounter;\n", " schema:interactionType ?metrics.\n", "}\n", "\n", "\"\"\"\n", "\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "6e8937c2", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### R2: A speech is published by an agent who may not be the creator of the speech itself." ] }, { "cell_type": "code", "execution_count": null, "id": "4620e1c7", "metadata": { "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "select DISTINCT ?s ?publisher ?creator where { \n", " ?s a podio:Discourse;\n", " terms:publisher ?publisher;\n", " terms:creator ?creator.\n", " FILTER(?publisher != ?creator)\n", "}\n", "\n", "\"\"\"\n", "\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "0b23be7e", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### R3: A speech has an audience. That is, the group of people who have received the message." 
] }, { "cell_type": "code", "execution_count": null, "id": "5a5298d3", "metadata": { "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "select DISTINCT ?audience where { \n", " ?s a podio:Discourse;\n", " terms:audience ?audience.\n", "}\n", "\n", "\"\"\"\n", "\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "80dcc938", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### R4: A speech has a target. That is, the group of people for whom the message is intended. This group does not necessarily have to coincide with the audience." 
] }, { "cell_type": "code", "execution_count": null, "id": "a53c4201", "metadata": { "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "select DISTINCT ?audience ?target where { \n", " ?s a podio:Discourse;\n", " terms:audience ?audience;\n", " podio:hasTarget ?target.\n", " FILTER(?target != ?audience)\n", "}\n", "\n", "\"\"\"\n", "\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "cac600cb", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### R5: Both speeches and party manifestos are written in one or more languages." ] }, { "cell_type": "code", "execution_count": null, "id": "7324a5f0", "metadata": { "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "select DISTINCT ?language where { \n", " ?s a podio:Discourse;\n", " terms:language ?language.\n", "}\n", "\"\"\"\n", "\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "d7ddeab6", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### R6: Both speeches and party manifestos have a publication date." 
] }, { "cell_type": "code", "execution_count": null, "id": "c563f7c8", "metadata": { "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "select ?date where { \n", " ?s a podio:Discourse;\n", " terms:created ?date.\n", "} Limit 10\n", "\"\"\"\n", "\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "ff5dd4f5", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### R7: Both speeches and party manifestos can have a description." ] }, { "cell_type": "code", "execution_count": null, "id": "0b48972d", "metadata": { "ExecuteTime": { "end_time": "2024-02-06T16:49:52.806604Z", "start_time": "2024-02-06T16:49:52.773220Z" }, "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "select ?desc where { \n", " VALUES ?clases {\n", " podio:Discourse\n", " podio:PartyManifesto\n", " }\n", " ?s a ?clases;\n", " terms:description ?desc.\n", "} Limit 10\n", "\n", "\"\"\"\n", "\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "21b2387c", "metadata": { "ExecuteTime": { "end_time": "2024-02-06T17:09:13.610230Z", "start_time": "2024-02-06T17:09:13.606913Z" }, "heading_collapsed": true, "hidden": 
true }, "source": [ "### R8: Both speeches and party manifestos have an ideology." ] }, { "cell_type": "code", "execution_count": null, "id": "c492ac14", "metadata": { "ExecuteTime": { "end_time": "2024-02-06T17:09:13.603052Z", "start_time": "2024-02-06T17:08:54.253934Z" }, "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "select DISTINCT ?ideology ?ideologyLabel where { \n", " VALUES ?clases {\n", " podio:Discourse\n", " podio:PartyManifesto\n", " }\n", " ?s a ?clases;\n", " podio:ideology ?ideology.\n", " \n", " # Query Wikidata using federation\n", " SERVICE {\n", " ?ideology rdfs:label ?ideologyLabel.\n", " FILTER(LANGMATCHES(LANG(?ideologyLabel), \"en\")).\n", " SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n", " }\n", " \n", "} Limit 3\n", "\n", "\"\"\"\n", "\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "77c46630", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### R9: Both speeches and party manifestos should reference the source of information in the official channel." 
] }, { "cell_type": "code", "execution_count": null, "id": "4f55826a", "metadata": { "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "select DISTINCT ?sources where { \n", " VALUES ?clases {\n", " podio:Discourse\n", " podio:PartyManifesto\n", " }\n", " ?s a ?clases;\n", " terms:source ?sources.\n", " \n", " \n", "} Limit 3\n", "\n", "\"\"\"\n", "\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "6a5d986a", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### R10: A party manifesto has a textual content extracted from a document." ] }, { "cell_type": "code", "execution_count": null, "id": "6c72b535", "metadata": { "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "select DISTINCT ?content ?sources where { \n", " ?s a podio:PartyManifesto;\n", " podio:content ?content;\n", " terms:source ?sources.\n", "\n", "} Limit 3\n", "\"\"\"\n", "\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "1e1975c8", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### R11: A party manifesto proposes a candidate for a political position." 
] }, { "cell_type": "code", "execution_count": null, "id": "392260a5", "metadata": { "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "select DISTINCT ?candidate ?candidateLabel ?sources where { \n", " ?s a podio:PartyManifesto;\n", " podio:proposesCandidate ?candidate;\n", " terms:source ?sources.\n", " \n", " # Query Wikidata using federation\n", " SERVICE {\n", " ?candidate rdfs:label ?candidateLabel.\n", " FILTER(LANGMATCHES(LANG(?candidateLabel), \"en\")).\n", " SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n", " }\n", " \n", "} Limit 3\n", "\n", "\"\"\"\n", "\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "d04b715c", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### R12: A party manifesto is published by the political party that drafts it." 
] }, { "cell_type": "code", "execution_count": null, "id": "f42867a7", "metadata": { "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "select DISTINCT ?publisher ?publisherLabel where { \n", " ?s a podio:PartyManifesto;\n", " terms:publisher ?publisher.\n", " \n", " # Query Wikidata using federation\n", " SERVICE {\n", " ?publisher rdfs:label ?publisherLabel.\n", " FILTER(LANGMATCHES(LANG(?publisherLabel), \"en\")).\n", " SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n", " }\n", " \n", "} Limit 3\n", "\"\"\"\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "653b4f17", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### R13: A party manifesto has several policy proposals." 
] }, { "cell_type": "code", "execution_count": null, "id": "de532edc", "metadata": { "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "select DISTINCT ?pmanifest ?proposal ?proposalContent where { \n", " ?pmanifest a podio:PartyManifesto.\n", " \n", " ?proposal terms:isPartOf ?pmanifest;\n", " podio:content ?proposalContent.\n", " \n", " \n", "} Limit 3\n", "\"\"\"\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "9b10f19b", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### R14: Political speech can have content in any format, both text and multimedia." ] }, { "cell_type": "code", "execution_count": null, "id": "0acaac39", "metadata": { "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "select DISTINCT ?content where { \n", " ?speech a podio:Discourse;\n", " podio:content ?content.\n", " \n", "} Limit 3\n", "\"\"\"\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "f6552fe8", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### R15: There are speeches that are shared in digital communities and allow direct interaction with the audience." 
] }, { "cell_type": "code", "execution_count": null, "id": "0ea88c5e", "metadata": { "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "select DISTINCT ?content where { \n", " ?speech a podio:Conversational;\n", " podio:content ?content.\n", " \n", "} Limit 3\n", "\"\"\"\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "68d562d3", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### R16: There are speeches that may be shared on channels that do not allow interaction with the audience." ] }, { "cell_type": "code", "execution_count": null, "id": "b3178478", "metadata": { "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "select DISTINCT ?content where { \n", " ?speech a podio:Expository;\n", " podio:content ?content.\n", " \n", "} Limit 3\n", "\"\"\"\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "64209ba0", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### R17: Speeches shared in digital communities should implement the basic mechanics of interaction (reply, mention, thread, hashtags, post, repost, follow, and share content)." 
] }, { "cell_type": "code", "execution_count": null, "id": "7bf6f9a9", "metadata": { "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "select DISTINCT ?p where { \n", " ?speech a podio:Conversational;\n", " ?p ?o.\n", " \n", "} \n", "\n", "\"\"\"\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "7800861c", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### R18: The laws approved by political parties and their electoral proposals are speeches that do not allow for direct interaction with the audience." ] }, { "cell_type": "code", "execution_count": null, "id": "1d8c8a24", "metadata": { "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "select DISTINCT ?speech where { \n", " VALUES ?clases {\n", " podio:PolicyProposal\n", " podio:ApprovedPolicy\n", " }\n", " ?speech a ?clases;\n", " a podio:Expository.\n", " \n", "} \n", "\"\"\"\n", "run_query(query)" ] }, { "cell_type": "markdown", "id": "3399e2fe", "metadata": { "hidden": true }, "source": [ "### R19: It should be possible to explore the existing laws in the Lynx knowledge graph." 
] }, { "cell_type": "code", "execution_count": null, "id": "0d154bb6", "metadata": { "hidden": true }, "outputs": [], "source": [ "query= \"\"\"\n", "PREFIX dc: \n", "PREFIX dcam: \n", "PREFIX eli: \n", "PREFIX foaf: \n", "PREFIX lkg: \n", "PREFIX nif-core: \n", "PREFIX owl: \n", "PREFIX podio: \n", "PREFIX rdf: \n", "PREFIX schema: \n", "PREFIX sioc: \n", "PREFIX skos: \n", "PREFIX terms: \n", "PREFIX vann: \n", "PREFIX xml: \n", "PREFIX xsd: \n", "PREFIX rdfs: \n", "PREFIX nif: \n", "PREFIX wd: \n", "PREFIX wdt: \n", "PREFIX wikibase: \n", "PREFIX bd: \n", "\n", "SELECT (?juris as ?jurisdiction) ?year ?s_text\n", "WHERE {\n", " \n", " SERVICE {\n", " ?s a lkg:Legislation ;\n", " nif:isString ?s_text ;\n", " lkg:metadata ?metadata .\n", " ?metadata eli:jurisdiction ?juris ;\n", " eli:version_date ?date ;\n", " eli:id_local ?id ;\n", " terms:source ?source .\n", " }\n", " \n", " BIND(year(?date) as ?year)\n", " \n", "} LIMIT 10\n", "\"\"\"\n", "run_query(query)" ] } ], "metadata": { "hide_input": false, "kernelspec": { "display_name": "mets", "language": "python", "name": "mets" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.6" }, "toc": { "base_numbering": 1, "nav_menu": { "height": "319px", "width": "160px" }, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }