{ "cells": [ { "cell_type": "markdown", "id": "97493594", "metadata": {}, "source": [ "# Knowledge Graph construction and querying with extracted software metadata\n", "\n", "This notebook first generates a knowledge graph from the metadata that [SOMEF](https://github.com/KnowledgeCaptureAndDiscovery/somef) extracts from software repositories. The graph is then queried to assess the good practices followed by those repositories." ] }, { "cell_type": "code", "execution_count": 1, "id": "16cf3ebb", "metadata": {}, "outputs": [], "source": [ "import morph_kgc\n", "import pyoxigraph\n", "import re" ] }, { "cell_type": "markdown", "id": "7d0436b4", "metadata": {}, "source": [ "## Knowledge Graph construction\n", "The knowledge graph is generated using [Morph-KGC](https://morph-kgc.readthedocs.io/en/latest/), which uses RML mappings to transform the JSON file into RDF. In this case, we use the [RML-star](http://w3id.org/rml/star/spec) module to generate an RDF-star graph. The tool requires some configuration parameters, in which we indicate the desired output serialisation and the path to the RML mapping file. The knowledge graph is then generated and stored as an Oxigraph store in the variable `graph`, which is also saved as a `.nq` file." ] }, { "cell_type": "markdown", "id": "7f074d59", "metadata": {}, "source": [ "First, the mapping, written in the YARRRML serialisation, is translated into RML-star using [Yatter](https://github.com/oeg-upm/yatter)." 
] }, { "cell_type": "code", "execution_count": 2, "id": "e0c4a09c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32m2023-09-11 18:25:38,323\u001b[0m | \u001b[1;30mINFO:\u001b[0m Translating YARRRML mapping to [R2]RML\n", "\u001b[32m2023-09-11 18:25:38,326\u001b[0m | \u001b[1;30mINFO:\u001b[0m RML content is created!\n", "\u001b[32m2023-09-11 18:25:38,398\u001b[0m | \u001b[1;30mINFO:\u001b[0m Mapping has been syntactically validated.\n", "\u001b[32m2023-09-11 18:25:38,398\u001b[0m | \u001b[1;30mINFO:\u001b[0m Translation has finished successfully.\n" ] } ], "source": [ "!python3 -m yatter -i ../mappings/mapping-somef-star.yml -o ../mappings/mapping-somef-star.ttl" ] }, { "cell_type": "markdown", "id": "86fd8c30", "metadata": {}, "source": [ "Configuration options for running Morph-KGC: the path to the mapping file and the desired output serialisation (N-Quads)." ] }, { "cell_type": "code", "execution_count": 3, "id": "7f142c18", "metadata": {}, "outputs": [], "source": [ "config = \"\"\"\n", " [CONFIGURATION]\n", " output_format=N-QUADS\n", " \n", " [SOMEF-json]\n", " mappings=../mappings/mapping-somef-star.ttl\n", " \"\"\"" ] }, { "cell_type": "markdown", "id": "9d1c5bd4", "metadata": {}, "source": [ "Generate the knowledge graph and store it in the variable `graph`." ] }, { "cell_type": "code", "execution_count": 4, "id": "053b4628", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/dgarijo/miniconda3/envs/oeg_software_graph/lib/python3.10/site-packages/morph_kgc/mapping/mapping_parser.py:390: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n", " fnml_df = fnml_df.applymap(str)\n", "/home/dgarijo/miniconda3/envs/oeg_software_graph/lib/python3.10/site-packages/morph_kgc/mapping/mapping_parser.py:607: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. 
Value 'JSON' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.\n", " self.rml_df.at[i, 'source_type'] = file_extension.upper()\n", "INFO | 2023-09-11 18:25:48,419 | 145 mapping rules retrieved.\n", "INFO | 2023-09-11 18:25:48,485 | Mapping partition with 46 groups generated.\n", "/home/dgarijo/miniconda3/envs/oeg_software_graph/lib/python3.10/site-packages/morph_kgc/mapping/mapping_partitioner.py:182: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`\n", " f\"{self.rml_df['mapping_partition'].value_counts()[0]}.\")\n", "INFO | 2023-09-11 18:25:48,486 | Maximum number of rules within mapping group: 30.\n", "INFO | 2023-09-11 18:25:48,487 | Mappings processed in 3.325 seconds.\n", "/home/dgarijo/miniconda3/envs/oeg_software_graph/lib/python3.10/site-packages/morph_kgc/materializer.py:36: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n", " data = data.applymap(str)\n", "/home/dgarijo/miniconda3/envs/oeg_software_graph/lib/python3.10/site-packages/morph_kgc/materializer.py:36: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n", " data = data.applymap(str)\n", "/home/dgarijo/miniconda3/envs/oeg_software_graph/lib/python3.10/site-packages/morph_kgc/materializer.py:36: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n", " data = data.applymap(str)\n", "/home/dgarijo/miniconda3/envs/oeg_software_graph/lib/python3.10/site-packages/morph_kgc/materializer.py:36: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n", " data = data.applymap(str)\n", "/home/dgarijo/miniconda3/envs/oeg_software_graph/lib/python3.10/site-packages/morph_kgc/materializer.py:36: FutureWarning: DataFrame.applymap has been deprecated. 
Use DataFrame.map instead.\n", " data = data.applymap(str)\n", "/home/dgarijo/miniconda3/envs/oeg_software_graph/lib/python3.10/site-packages/morph_kgc/materializer.py:36: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n", " data = data.applymap(str)\n", "/home/dgarijo/miniconda3/envs/oeg_software_graph/lib/python3.10/site-packages/morph_kgc/materializer.py:36: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n", " data = data.applymap(str)\n", "/home/dgarijo/miniconda3/envs/oeg_software_graph/lib/python3.10/site-packages/morph_kgc/materializer.py:36: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n", " data = data.applymap(str)\n", "/home/dgarijo/miniconda3/envs/oeg_software_graph/lib/python3.10/site-packages/morph_kgc/materializer.py:36: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n", " data = data.applymap(str)\n", "/home/dgarijo/miniconda3/envs/oeg_software_graph/lib/python3.10/site-packages/morph_kgc/materializer.py:36: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n", " data = data.applymap(str)\n", "/home/dgarijo/miniconda3/envs/oeg_software_graph/lib/python3.10/site-packages/morph_kgc/materializer.py:36: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n", " data = data.applymap(str)\n", "/home/dgarijo/miniconda3/envs/oeg_software_graph/lib/python3.10/site-packages/morph_kgc/materializer.py:36: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n", " data = data.applymap(str)\n", "/home/dgarijo/miniconda3/envs/oeg_software_graph/lib/python3.10/site-packages/morph_kgc/materializer.py:36: FutureWarning: DataFrame.applymap has been deprecated. 
Use DataFrame.map instead.\n", " data = data.applymap(str)\n", "/home/dgarijo/miniconda3/envs/oeg_software_graph/lib/python3.10/site-packages/morph_kgc/materializer.py:36: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n", " data = data.applymap(str)\n", "/home/dgarijo/miniconda3/envs/oeg_software_graph/lib/python3.10/site-packages/morph_kgc/materializer.py:36: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n", " data = data.applymap(str)\n", "/home/dgarijo/miniconda3/envs/oeg_software_graph/lib/python3.10/site-packages/morph_kgc/materializer.py:36: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n", " data = data.applymap(str)\n", "/home/dgarijo/miniconda3/envs/oeg_software_graph/lib/python3.10/site-packages/morph_kgc/materializer.py:36: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.\n", " data = data.applymap(str)\n", "INFO | 2023-09-11 18:25:51,654 | Number of triples generated in total: 19449.\n" ] } ], "source": [ "graph = morph_kgc.materialize_oxigraph(config)" ] }, { "cell_type": "markdown", "id": "f5850103", "metadata": {}, "source": [ "Addition of triples to annotate the named graph that contains all triples created in the previous step. These annotations record the date on which the metadata was extracted and the tool used for the extraction." 
] }, { "cell_type": "code", "execution_count": 5, "id": "10e1312c", "metadata": {}, "outputs": [], "source": [ "# xsd:dateTime requires the ISO 8601 'T' separator between date and time\n", "graph.add(pyoxigraph.Quad(\n", " pyoxigraph.NamedNode('https://w3id.org/okn/i/graph/20230628'),\n", " pyoxigraph.NamedNode('http://purl.org/dc/terms/created'),\n", " pyoxigraph.Literal('2023-06-28T00:00:00', datatype=pyoxigraph.NamedNode('http://www.w3.org/2001/XMLSchema#dateTime')),\n", " pyoxigraph.NamedNode('https://w3id.org/okn/i/graph/default')))\n", "graph.add(pyoxigraph.Quad(\n", " pyoxigraph.NamedNode('https://w3id.org/okn/i/graph/20230628'),\n", " pyoxigraph.NamedNode('http://www.w3.org/ns/prov#wasAttributedTo'),\n", " pyoxigraph.Literal('SOftware Metadata Extraction Framework (SOMEF)', datatype=pyoxigraph.NamedNode('http://www.w3.org/2001/XMLSchema#string')),\n", " pyoxigraph.NamedNode('https://w3id.org/okn/i/graph/default')))" ] }, { "cell_type": "markdown", "id": "5e51f11a", "metadata": {}, "source": [ "The graph is saved as a local file." ] }, { "cell_type": "code", "execution_count": 6, "id": "5c94be6a", "metadata": {}, "outputs": [], "source": [ "with open('../data/somef-kg.nq', 'w') as result:\n", " result.write(str(graph))" ] }, { "cell_type": "markdown", "id": "de8e6b63", "metadata": {}, "source": [ "## KG querying - Assessment of research software best practices" ] }, { "cell_type": "markdown", "id": "ce67ee97", "metadata": {}, "source": [ "Once the knowledge graph is created, we query it to assess how well the GitHub repositories comply with the following research software best practices:\n", "\n", "\n", "| ID | Best practice | FAIR Principle |\n", "|------|-----------------------------------------------------------------------|----------------|\n", "| BP1 | A description (long or short) is available | F |\n", "| BP2 | A persistent identifier (e.g., DOI) is available | F |\n", "| BP3 | A download URL is available | A |\n", "| BP4 | A software versioning scheme is followed | A |\n", "| BP5 | Usage 
documentation (including I/O) is available | I,R |\n", "| BP6 | A license is declared | R |\n", "| BP7 | An explicit citation is provided | R |\n", "| BP8 | Software metadata (programming language, keywords, etc.) is available | F,R |\n", "| BP9 | Installation instructions are available | R |\n", "| BP10 | Software requirements are available | R |\n" ] }, { "cell_type": "markdown", "id": "9d92b113", "metadata": {}, "source": [ "First, the extracted repositories are counted." ] }, { "cell_type": "code", "execution_count": 7, "id": "d1b1b20c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of repositories: 270\n" ] } ], "source": [ "q_res = graph.query(\"\"\"\n", " PREFIX sd: \n", " \n", " SELECT (COUNT (DISTINCT ?s) AS ?count_software)\n", " FROM \n", " WHERE {\n", " ?s a sd:Software\n", " }\n", "\"\"\")\n", "\n", "result_list = {}\n", "\n", "for solution in q_res:\n", " print(\"Total number of repositories:\", solution['count_software'].value)\n", " result_list['total_repos'] = solution['count_software'].value" ] }, { "cell_type": "markdown", "id": "45d11b4e", "metadata": {}, "source": [ "### BP1: Description is available" ] }, { "cell_type": "markdown", "id": "64046403", "metadata": {}, "source": [ "Number of repositories with a description (long, short, or both)." 
] }, { "cell_type": "code", "execution_count": 8, "id": "76b2ae7e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of descriptions: 229\n" ] } ], "source": [ "q_res = graph.query(\"\"\"\n", " PREFIX sd: \n", " \n", " SELECT (COUNT (DISTINCT ?software) AS ?software_count) \n", " FROM \n", " WHERE {\n", " ?software a sd:Software; \n", " sd:description ?desc .\n", " } \n", "\"\"\")\n", "\n", "BP_results = {}\n", "\n", "for solution in q_res:\n", " print(\"Total number of descriptions:\", solution['software_count'].value)\n", " result_list['total_description'] = solution['software_count'].value\n", " BP_results['BP1'] = int(solution['software_count'].value)*100/int(result_list['total_repos'])" ] }, { "cell_type": "markdown", "id": "942aa175", "metadata": {}, "source": [ "Number of repositories with a description, by type: long (from the README) or short (from the GitHub API)." ] }, { "cell_type": "code", "execution_count": 9, "id": "80a7ac2b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of repositories with short description: 200\n" ] } ], "source": [ "q_res = graph.query(\"\"\"\n", " PREFIX sd: \n", " \n", " SELECT (COUNT (DISTINCT ?software_short) AS ?short_desc_count)\n", " FROM \n", " WHERE {\n", " << ?software_short sd:description ?desc_short >> sd:technique \"GitHub_API\".\n", " }\n", "\"\"\")\n", "\n", "for solution in q_res:\n", " print(\"Total number of repositories with short description:\", solution['short_desc_count'].value)\n", " result_list['total_short_desc'] = solution['short_desc_count'].value\n", " " ] }, { "cell_type": "code", "execution_count": 10, "id": "ebcce44e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of repositories with long description: 88\n" ] } ], "source": [ "q_res = graph.query(\"\"\"\n", " PREFIX sd: \n", " PREFIX xsd: \n", " \n", " SELECT (COUNT (DISTINCT ?software_long) AS ?long_desc_count)\n", " FROM 
\n", " WHERE {\n", " << ?software_long sd:description ?desc_long >> sd:technique ?long_technique ;\n", " sd:confidence ?long_conf .\n", " VALUES ?long_technique {\"supervised_classification\" \"header_analysis\"}\n", " FILTER(?long_conf > 0.98)\n", " }\n", "\"\"\")\n", "\n", "for solution in q_res:\n", " print(\"Total number of repositories with long description:\", solution['long_desc_count'].value)\n", " result_list['total_long_desc'] = solution['long_desc_count'].value\n", " " ] }, { "cell_type": "markdown", "id": "7f66cd54", "metadata": {}, "source": [ "### BP2: Persistent identifier\n", "Repositories that provide a DOI (not from a publication, but from e.g. Zenodo)" ] }, { "cell_type": "code", "execution_count": 11, "id": "98b4893e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of repositories with DOI: 21\n" ] } ], "source": [ "q_res = graph.query(\"\"\"\n", " PREFIX sd: \n", " \n", " SELECT (COUNT (DISTINCT ?software) AS ?count_software)\n", " FROM \n", " WHERE {\n", " ?software sd:identifier ?id \n", " }\n", "\"\"\")\n", "\n", "for solution in q_res:\n", " print(\"Total number of repositories with DOI:\", solution['count_software'].value)\n", " result_list['total_id'] = solution['count_software'].value\n", " BP_results['BP2'] = int(solution['count_software'].value)*100/int(result_list['total_repos'])" ] }, { "cell_type": "markdown", "id": "14d1e157", "metadata": {}, "source": [ "### BP3: Download URL\n", "Repositories that provide a URL for download from releases" ] }, { "cell_type": "code", "execution_count": 12, "id": "5f188c03", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of repositories with download URL: 81\n" ] } ], "source": [ "q_res = graph.query(\"\"\"\n", " PREFIX sd: \n", " \n", " SELECT (COUNT (DISTINCT ?software) AS ?count_software)\n", " FROM \n", " WHERE {\n", " ?software sd:hasVersion ?version \n", " }\n", "\"\"\")\n", "\n", "for 
solution in q_res:\n", " print(\"Total number of repositories with download URL:\", solution['count_software'].value)\n", " result_list['total_down_url'] = solution['count_software'].value\n", " BP_results['BP3'] = int(solution['count_software'].value)*100/int(result_list['total_repos'])\n" ] }, { "cell_type": "markdown", "id": "54e78c80", "metadata": {}, "source": [ "### BP4: A software versioning scheme is followed\n", "Repositories whose version tags all follow the semantic versioning scheme" ] }, { "cell_type": "code", "execution_count": 13, "id": "c42a54c8", "metadata": {}, "outputs": [], "source": [ "def is_semantic_version(version):\n", " # Optional v/V prefix, then MAJOR.MINOR.PATCH with optional pre-release and build metadata\n", " pattern = r\"^[vV]?(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-([0-9A-Za-z-]+(?:\\.[0-9A-Za-z-]+)*))?(?:\\+([0-9A-Za-z-]+(?:\\.[0-9A-Za-z-]+)*))?$\"\n", " return re.match(pattern, version) is not None" ] }, { "cell_type": "code", "execution_count": 14, "id": "55ce0802", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of repositories with semantic versioning: 30\n" ] } ], "source": [ "q_res = graph.query(\"\"\"\n", " PREFIX sd: \n", " \n", " SELECT DISTINCT ?software (GROUP_CONCAT (?versionId) AS ?ids)\n", " FROM \n", " WHERE {\n", " ?software sd:hasVersion/sd:hasVersionId ?versionId\n", " } GROUP BY ?software\n", "\"\"\")\n", "\n", "\n", "total_semantic_versioning = 0\n", "for solution in q_res:\n", " # GROUP_CONCAT joins the version IDs with a single space by default\n", " version_ids_array = solution['ids'].value.split(' ')\n", " # A repository counts only if every one of its version tags is a semantic version\n", " all_semantic = all(is_semantic_version(version) for version in version_ids_array)\n", " total_semantic_versioning += 1 if all_semantic else 0\n", "\n", "print(\"Total number of repositories with semantic versioning:\", total_semantic_versioning)\n", "result_list['total_semantic_versioning'] = total_semantic_versioning\n", "BP_results['BP4'] = 
int(total_semantic_versioning)*100/int(result_list['total_repos'])\n" ] }, { "cell_type": "markdown", "id": "bc235e5f", "metadata": {}, "source": [ "### BP5: Documentation is available\n", "Repositories that provide readable documentation" ] }, { "cell_type": "code", "execution_count": 15, "id": "b0cc1eec", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of repositories with available documentation: 42\n" ] } ], "source": [ "q_res = graph.query(\"\"\"\n", " PREFIX sd: \n", " \n", " SELECT (COUNT (DISTINCT ?software) AS ?count_software)\n", " FROM \n", " WHERE {\n", " ?software sd:hasDocumentation ?doc \n", " }\n", "\"\"\")\n", "\n", "for solution in q_res:\n", " print(\"Total number of repositories with available documentation:\", solution['count_software'].value)\n", " result_list['total_docs'] = solution['count_software'].value\n", " BP_results['BP5'] = int(solution['count_software'].value)*100/int(result_list['total_repos'])\n", " \n" ] }, { "cell_type": "markdown", "id": "f0fee18c", "metadata": {}, "source": [ "### BP6: License available\n", "Repositories that declare a license" ] }, { "cell_type": "code", "execution_count": 16, "id": "d1353952", "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of repositories with license: 164\n" ] } ], "source": [ "q_res = graph.query(\"\"\"\n", " PREFIX sd: \n", " PREFIX schema: \n", " \n", " SELECT (COUNT (DISTINCT ?software) AS ?count_software)\n", " FROM \n", " WHERE {\n", " ?software a sd:Software ;\n", " schema:license ?license .\n", " ?license a schema:CreativeWork ;\n", " sd:name ?license_name .\n", " }\n", "\"\"\")\n", "\n", "for solution in q_res:\n", " print(\"Total number of repositories with license:\", solution['count_software'].value)\n", " result_list['total_license'] = solution['count_software'].value\n", " BP_results['BP6'] = 
int(solution['count_software'].value)*100/int(result_list['total_repos'])\n" ] }, { "cell_type": "markdown", "id": "51929ac1", "metadata": {}, "source": [ "### BP7: Explicit citation\n", "Repositories that provide an explicit citation, either in the README or with a CFF file." ] }, { "cell_type": "code", "execution_count": 17, "id": "04bc7cb3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of repositories with citation: 22\n" ] } ], "source": [ "q_res = graph.query(\"\"\"\n", " PREFIX sd: \n", " PREFIX schema: \n", " PREFIX prov: \n", " \n", " SELECT (COUNT (DISTINCT ?software) AS ?count_software)\n", " FROM \n", " WHERE {\n", " ?software sd:citation ?cite \n", " }\n", "\"\"\")\n", "\n", "for solution in q_res:\n", " print(\"Total number of repositories with citation:\", solution['count_software'].value)\n", " result_list['total_citation'] = solution['count_software'].value\n", " BP_results['BP7'] = int(solution['count_software'].value)*100/int(result_list['total_repos'])\n" ] }, { "cell_type": "code", "execution_count": 18, "id": "1a0e584e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of repositories with citation in README: 20\n" ] } ], "source": [ "q_res = graph.query(\"\"\"\n", " PREFIX sd: \n", " PREFIX schema: \n", " PREFIX prov: \n", " \n", " SELECT (COUNT (DISTINCT ?software) AS ?count_software)\n", " FROM \n", " WHERE {\n", " << ?software sd:citation ?cite >> prov:hadPrimarySource ?source\n", " FILTER(CONTAINS(str(?source),'README'))\n", " }\n", "\"\"\")\n", "\n", "for solution in q_res:\n", " print(\"Total number of repositories with citation in README:\", solution['count_software'].value)\n", " result_list['readme_citation'] = solution['count_software'].value\n" ] }, { "cell_type": "code", "execution_count": 19, "id": "d8bb8cf7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of repositories with CFF 
citation file: 5\n" ] } ], "source": [ "q_res = graph.query(\"\"\"\n", " PREFIX sd: \n", " PREFIX schema: \n", " PREFIX prov: \n", " \n", " SELECT (COUNT (DISTINCT ?software) AS ?count_software)\n", " FROM \n", " WHERE {\n", " << ?software sd:citation ?cite >> prov:hadPrimarySource ?source\n", " FILTER(CONTAINS(str(?source),'.cff'))\n", " }\n", "\"\"\")\n", "\n", "for solution in q_res:\n", " print(\"Total number of repositories with CFF citation file:\", solution['count_software'].value)\n", " result_list['cff_citation'] = solution['count_software'].value\n" ] }, { "cell_type": "markdown", "id": "3c6634b3", "metadata": {}, "source": [ "### BP8: Available software metadata\n", "Repositories with minimum software metadata: programming language, creation date, description, at least one release, and keywords" ] }, { "cell_type": "code", "execution_count": 20, "id": "deb3bd03", "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of repositories with minimum metadata: 22\n" ] } ], "source": [ "q_res = graph.query(\"\"\"\n", " PREFIX sd: \n", " PREFIX schema: \n", " PREFIX prov: \n", " \n", " SELECT (COUNT (DISTINCT ?software) AS ?count_software)\n", " FROM \n", " WHERE {\n", " ?software sd:hasSourceCode/sd:programmingLanguage ?language .\n", " ?software sd:dateCreated ?date .\n", " ?software sd:description ?desc .\n", " ?software sd:hasVersion ?rel .\n", " ?software sd:keywords ?keys .\n", " \n", " }\n", "\"\"\")\n", "\n", "for solution in q_res:\n", " print(\"Total number of repositories with minimum metadata:\", solution['count_software'].value)\n", " result_list['total_repo_metadata'] = solution['count_software'].value\n", " BP_results['BP8'] = int(solution['count_software'].value)*100/int(result_list['total_repos'])\n" ] }, { "cell_type": "markdown", "id": "7fc6aee9", "metadata": {}, "source": [ "### BP9: Installation instructions\n", "Repositories that provide installation instructions" ] }, { "cell_type": "code", 
"execution_count": 21, "id": "07af6c54", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of repositories with installation instructions: 60\n" ] } ], "source": [ "q_res = graph.query(\"\"\"\n", " PREFIX sd: \n", " PREFIX schema: \n", " PREFIX prov: \n", " \n", " SELECT (COUNT (DISTINCT ?software) AS ?count_software)\n", " FROM \n", " WHERE {\n", " ?software sd:hasInstallationInstructions ?inst .\n", " \n", " }\n", "\"\"\")\n", "\n", "for solution in q_res:\n", " print(\"Total number of repositories with installation instructions:\", solution['count_software'].value)\n", " result_list['total_install_inst'] = solution['count_software'].value\n", " BP_results['BP9'] = int(solution['count_software'].value)*100/int(result_list['total_repos'])\n" ] }, { "cell_type": "markdown", "id": "bb1ff6f8", "metadata": {}, "source": [ "### BP10: Software requirements\n", "Repositories that provide requirements to use the software" ] }, { "cell_type": "code", "execution_count": 22, "id": "38231239", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of repositories with software requirements: 22\n" ] } ], "source": [ "q_res = graph.query(\"\"\"\n", " PREFIX sd: \n", " PREFIX schema: \n", " PREFIX prov: \n", " \n", " SELECT (COUNT (DISTINCT ?software) AS ?count_software)\n", " FROM \n", " WHERE {\n", " ?software sd:softwareRequirements ?requirements .\n", " \n", " }\n", "\"\"\")\n", "\n", "for solution in q_res:\n", " print(\"Total number of repositories with software requirements:\", solution['count_software'].value)\n", " result_list['total_soft_requirements'] = solution['count_software'].value\n", " BP_results['BP10'] = int(solution['count_software'].value)*100/int(result_list['total_repos'])\n" ] }, { "cell_type": "markdown", "id": "8dd59a57", "metadata": {}, "source": [ "## Graphics and statistics\n", "Graphical representation of the results 
obtained in the queries above to represent the percentage of repositories in the GitHub organisation that are compliant with the best practices" ] }, { "cell_type": "code", "execution_count": 23, "id": "7783d253", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "id": "053f7f5f", "metadata": {}, "source": [ "#### General barplot showing percentage of repos that comply with the BPs" ] }, { "cell_type": "code", "execution_count": 24, "id": "73b99020", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "numeric_result_list = dict([a, int(x)] for a, x in result_list.items()) \n", "\n", "labels_BP = ['Description (BP1)',\n", " 'Persistent ID (BP2)',\n", " 'Download URL (BP3)',\n", " 'Versioning scheme (BP4)',\n", " 'Documentation (BP5)',\n", " 'License (BP6)',\n", " 'Citation (BP7)',\n", " 'Metadata (BP8)',\n", " 'Installation (BP9)',\n", " 'Requirements (BP10)']\n", "\n", "bars = plt.bar(*zip(*BP_results.items()), color='#4895ef')\n", "plt.ylabel('Percentage of repositories')\n", "plt.xticks(range(len(BP_results)), list(labels_BP), rotation=45, ha='right')\n", "plt.ylim((0,100))\n", "\n", "for bar in bars:\n", " height = bar.get_height()\n", " percentage = str(round(height,2)) + '%'\n", " plt.text(bar.get_x() + bar.get_width() / 2, height, percentage, ha='center', va='bottom')\n", "\n", "\n", "plt.savefig('general_bp.png',dpi=400,bbox_inches = \"tight\") \n", "plt.show()" ] }, { "cell_type": "markdown", "id": "dffc5cc7", "metadata": {}, "source": [ "#### Specific plots for citations and descriptions" ] }, { "cell_type": "code", "execution_count": 25, "id": "d5297a95", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "labels_desc = ['Short description', 'Long description']\n", "values_desc = [numeric_result_list['total_short_desc']*100/int(result_list['total_repos']),\n", " numeric_result_list['total_long_desc']*100/int(result_list['total_repos'])]\n", "\n", "plt.figure(figsize=(4,3))\n", "bars = plt.bar(labels_desc, values_desc, color='#4895ef')\n", "plt.ylabel('Percentage of respositories')\n", "plt.ylim((0,100))\n", "\n", "for bar in bars:\n", " height = bar.get_height()\n", " percentage = str(round(height,2)) + '%'\n", " plt.text(bar.get_x() + bar.get_width() / 2, height, percentage, ha='center', va='bottom')\n", "\n", "plt.savefig('descriptions.png',dpi=400,bbox_inches = \"tight\") \n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 26, "id": "4da80232", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "labels_cite = ['README', 'CFF']\n", "values_cite = [numeric_result_list['readme_citation']*100/int(result_list['total_repos']),\n", " numeric_result_list['cff_citation']*100/int(result_list['total_repos'])]\n", "\n", "plt.figure(figsize=(4,3))\n", "bars = plt.bar(labels_cite, values_cite, color='#4895ef')\n", "plt.ylabel('Percentage of repositories')\n", "plt.ylim((0,100))\n", "for bar in bars:\n", " height = bar.get_height()\n", " percentage = str(round(height,2)) + '%'\n", " plt.text(bar.get_x() + bar.get_width() / 2, height, percentage, ha='center', va='bottom')\n", "\n", "\n", "plt.savefig('citations.png',dpi=400,bbox_inches = \"tight\") \n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 5 }