{ "cells": [ { "cell_type": "markdown", "id": "c4ee8736-2cae-487b-b722-78dbbe2f556d", "metadata": {}, "source": [ "# KGTK Browser Cache Setup" ] }, { "cell_type": "markdown", "id": "d8198f9d-7aad-417d-972b-b10514faf242", "metadata": {}, "source": [ "This note book will create the SQLite DB Cache and the required indices for KGTK Browser.\n", "\n", "The required input parameters are:\n", "- input_path: Path where the following files should be present\n", " 1. labels.en.tsv.gz\n", " 2. aliases.en.tsv.gz\n", " 3. descriptions.en.tsv.gz\n", " 4. claims.tsv.gz\n", " 5. metadata.property.datatypes.tsv.gz\n", " 6. qualifiers.tsv.gz\n", " 7. metadata.pagerank.undirected.tsv.gz\n", " 8. class-visualization.edge.tsv.gz **# optional, required for class visualization graph**\n", " 9. class-visualization.node.tsv.gz **# optional, required for class visualization graph**\n", " 10. derived.isastar.tsv.gz **# This file is required for creating the ES index only.**\n", "\n", "The files `metadata.pagerank.undirected.tsv.gz` and `derived.isastar.tsv.gz` are created by running [this](https://github.com/usc-isi-i2/kgtk/blob/dev/use-cases/Wikidata%20Useful%20Files.ipynb) notebook\n", "- output_path: Output path \n", "- project_name: folder inside the `output_path` where the required files and cache will be created\n", "- es_host: ES host, default `localhost`\n", "- es_port: ES port, default `9200`\n", "- es_index: name of the desired elasticsearch index, default `wikidata-dwd-kgtk-search-01`\n", "- create_db: variable to control creation of sqlite database cache. `yes|no`, default `yes`\n", "- create_es: variable to control creation of the ES index. `yes|no`, defaut `no`\n", "- create_class_viz: variable to control creation of class visualization tables in the sqlite database cache. Most probably, if you are not working with Wikidata, you will not have files # 8 and 9 listed above. Set this parameter to 'no' in that case. `yes|no`, default `yes`\n", "\n", "**Cache file location:** `//temp./wikidata.sqlite3.db`" ] }, { "cell_type": "code", "execution_count": 1, "id": "1a557f86-0d8a-4e89-9e76-4cedfafbf15a", "metadata": {}, "outputs": [], "source": [ "import os\n", "import pandas as pd\n", "from kgtk.configure_kgtk_notebooks import ConfigureKGTK\n", "import kgtk.kypher.api as kapi\n", "from kgtk.functions import kgtk, kypher" ] }, { "cell_type": "code", "execution_count": 2, "id": "3442d8b8-4db0-40a9-a6aa-2db2e4d5b377", "metadata": { "tags": [ "parameters" ] }, "outputs": [], "source": [ "input_path = \"/data02/ana_iglesias/data/subset/reframings/events\"\n", "output_path = \"/data02/ana_iglesias/data/subset\"\n", "\n", "project_name = \"KRHC-NAry-Events\"\n", "\n", "files = 'label,pagerank_undirected,alias,description,claims,datatypes,qualifiers'\n", "\n", "es_host = \"http://localhost\"\n", "es_port = 9200\n", "es_index = \"wikidata-dwd-kgtk-search-01\"\n", "\n", "create_db = 'yes'\n", "create_es = 'no'\n", "create_class_viz = 'no'" ] }, { "cell_type": "code", "execution_count": 3, "id": "9d7e7730-53a9-4307-b448-c5d4e0946968", "metadata": {}, "outputs": [], "source": [ "files = files.split(',')\n", "\n", "create_sqlite_cache = create_db.lower() == 'yes'\n", "create_es_index = create_es.lower() == 'yes'\n", "create_class_viz_tables = create_class_viz.lower() == 'yes'\n", "\n", "additional_files = {\n", " 'classvizedge': 'class-visualization.edge.tsv.gz',\n", " 'classviznode': 'class-visualization.node.tsv.gz'\n", "}" ] }, { "cell_type": "code", "execution_count": 4, "id": "5d7adfbe-aa7e-46d2-ad76-e78f39434bda", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "User home: /data02/ana_iglesias\n", "Current dir: /data02/ana_iglesias/data/subset\n", "KGTK dir: /data02/ana_iglesias/data\n", "Use-cases dir: /data02/ana_iglesias/data/use-cases\n" ] } ], "source": [ "ck = ConfigureKGTK(files)\n", "if create_class_viz_tables:\n", " ck.configure_kgtk(input_graph_path=input_path,\n", " output_path=output_path,\n", " project_name=project_name,\n", " additional_files=additional_files)\n", "else:\n", " ck.configure_kgtk(input_graph_path=input_path,\n", " output_path=output_path,\n", " project_name=project_name)" ] }, { "cell_type": "code", "execution_count": 5, "id": "36bcea63-6cff-48fc-b06c-d0acc8346c80", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "kypher: kgtk query --graph-cache /data02/ana_iglesias/data/subset/KRHC-NAry-Events/temp.KRHC-NAry-Events/wikidata.sqlite3.db\n", "GRAPH: /data02/ana_iglesias/data/subset/reframings/events\n", "TEMP: /data02/ana_iglesias/data/subset/KRHC-NAry-Events/temp.KRHC-NAry-Events\n", "STORE: /data02/ana_iglesias/data/subset/KRHC-NAry-Events/temp.KRHC-NAry-Events/wikidata.sqlite3.db\n", "KGTK_OPTION_DEBUG: false\n", "KGTK_GRAPH_CACHE: /data02/ana_iglesias/data/subset/KRHC-NAry-Events/temp.KRHC-NAry-Events/wikidata.sqlite3.db\n", "OUT: /data02/ana_iglesias/data/subset/KRHC-NAry-Events\n", "USE_CASES_DIR: /data02/ana_iglesias/data/use-cases\n", "EXAMPLES_DIR: /data02/ana_iglesias/data/examples\n", "kgtk: kgtk\n", "KGTK_LABEL_FILE: /data02/ana_iglesias/data/subset/reframings/events/labels.en.tsv.gz\n", "label: /data02/ana_iglesias/data/subset/reframings/events/labels.en.tsv.gz\n", "pagerank_undirected: /data02/ana_iglesias/data/subset/reframings/events/metadata.pagerank.undirected.tsv.gz\n", "alias: /data02/ana_iglesias/data/subset/reframings/events/aliases.en.tsv.gz\n", "description: /data02/ana_iglesias/data/subset/reframings/events/descriptions.en.tsv.gz\n", "claims: /data02/ana_iglesias/data/subset/reframings/events/claims.tsv.gz\n", "datatypes: /data02/ana_iglesias/data/subset/reframings/events/metadata.property.datatypes.tsv.gz\n", "qualifiers: /data02/ana_iglesias/data/subset/reframings/events/qualifiers.tsv.gz\n" ] } ], "source": [ "ck.print_env_variables()" ] }, { "cell_type": "markdown", "id": "99a8c8ef-3617-4f40-adfd-f2da816216c2", "metadata": {}, "source": [ "## Load the files into cache" ] }, { "cell_type": "code", "execution_count": 6, "id": "a7683f2e-49ee-4b4c-861e-587173380a6d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "kgtk query --graph-cache /data02/ana_iglesias/data/subset/KRHC-NAry-Events/temp.KRHC-NAry-Events/wikidata.sqlite3.db -i \"/data02/ana_iglesias/data/subset/reframings/events/labels.en.tsv.gz\" --as label -i \"/data02/ana_iglesias/data/subset/reframings/events/metadata.pagerank.undirected.tsv.gz\" --as pagerank_undirected -i \"/data02/ana_iglesias/data/subset/reframings/events/aliases.en.tsv.gz\" --as alias -i \"/data02/ana_iglesias/data/subset/reframings/events/descriptions.en.tsv.gz\" --as description -i \"/data02/ana_iglesias/data/subset/reframings/events/claims.tsv.gz\" --as claims -i \"/data02/ana_iglesias/data/subset/reframings/events/metadata.property.datatypes.tsv.gz\" --as datatypes -i \"/data02/ana_iglesias/data/subset/reframings/events/qualifiers.tsv.gz\" --as qualifiers --limit 3\n", "node1\tlabel\tnode2\tid\n", "P10\tlabel\t'video'@en\tP10-label-en\n", "P1000\tlabel\t'record held'@en\tP1000-label-en\n", "P10000\tlabel\t'Research Vocabularies Australia ID'@en\tP10000-label-en\n" ] } ], "source": [ "if create_sqlite_cache:\n", " ck.load_files_into_cache()" ] }, { "cell_type": "markdown", "id": "bb65acc2-2a3d-4c8c-8e4f-d5167f9f84ae", "metadata": {}, "source": [ "## Define the Kypher API" ] }, { "cell_type": "code", "execution_count": 7, "id": "ba6552ee-2a6a-4893-b468-c1e15efd6885", "metadata": {}, "outputs": [], "source": [ "_kapi2 = kapi.KypherApi(graphcache=os.environ['STORE'], loglevel=1, index='auto',\n", " maxresults=100, maxcache=0)" ] }, { "cell_type": "markdown", "id": "97479a01-92b4-489c-91cc-e301907548ad", "metadata": {}, "source": [ "## Create a file with `label`, `undirected_pagerank` and `description`" ] }, { "cell_type": "code", "execution_count": 8, "id": "4fbcb99b-a4d9-4a24-aea3-93b0e7bbbc64", "metadata": {}, "outputs": [], "source": [ "if create_sqlite_cache:\n", " !kgtk query --gc $STORE \\\n", " -i label pagerank_undirected description\\\n", " --match 'label: (qnode)-[l]->(y), pagerank: (qnode)-[:Pundirected_pagerank]->(pr)' \\\n", " --opt 'description: (qnode)-[:description]->(d)' \\\n", " --return 'qnode as node1, l.label as label, y as node2, upper(y) as `node2;upper`, pr as `node1;pagerank`, ifnull(d, \"\") as `node1;description`' \\\n", " --order-by 'qnode' \\\n", " -o $OUT/label_pagerank_undirected_description.tsv.gz" ] }, { "cell_type": "markdown", "id": "d0ecdfcc-1ea4-4c83-8890-588349a6b57c", "metadata": {}, "source": [ "### Load this file into cache as well" ] }, { "cell_type": "code", "execution_count": 9, "id": "6ca3f1c3-91cd-4941-b6db-f6d0a887f35b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "node1\tlabel\tnode2\tnode2;upper\tnode1;pagerank\tnode1;description\r\n", "P10\tlabel\t'video'@en\t'VIDEO'@EN\t1.90092695780294e-06\t'relevant video. For images, use the property P18. For film trailers, qualify with \\\"object has role\\\" (P3831)=\\\"trailer\\\" (Q622550)'@en\r\n", "P1000\tlabel\t'record held'@en\t'RECORD HELD'@EN\t8.846157108863055e-07\t'notable record achieved by a person or entity, include qualifiers for dates held'@en\r\n", "P10000\tlabel\t'Research Vocabularies Australia ID'@en\t'RESEARCH VOCABULARIES AUSTRALIA ID'@EN\t1.7379313655369044e-06\t'identifier of a vocabulary in Research Vocabularies Australia'@en\r\n", "P10006\tlabel\t'AllSides ID'@en\t'ALLSIDES ID'@EN\t1.8528701572707561e-06\t'alphabetical identifier of a publication, organization, or person at AllSides'@en\r\n", "P10007\tlabel\t'Birdata ID'@en\t'BIRDATA ID'@EN\t1.275801866591085e-06\t'identifier for an bird species in the Birdata database'@en\r\n", "P10008\tlabel\t'Geographical Names Board of NSW geoname ID'@en\t'GEOGRAPHICAL NAMES BOARD OF NSW GEONAME ID'@EN\t2.0272204848959872e-06\t'identifier for a registered place in the Geographical Names Board of NSW website'@en\r\n", "P1001\tlabel\t'applies to jurisdiction'@en\t'APPLIES TO JURISDICTION'@EN\t2.991437704358036e-06\t'the item (institution, law, public office, public register...) or statement belongs to or has power over or applies to the value (a territorial jurisdiction: a country, state, municipality, ...)'@en\r\n", "P10012\tlabel\t'NSW Parliament member ID'@en\t'NSW PARLIAMENT MEMBER ID'@EN\t2.320997304725948e-06\t'numerical identifier for a member of the Parliament of New South Wales'@en\r\n", "P10015\tlabel\t'NatureScot Sitelink ID'@en\t'NATURESCOT SITELINK ID'@EN\t1.5980845593309356e-06\t'identifier for items in the \\\"register of European sites under Regulation 11 of the Conservation (Natural Habitats, &c.) Regulations 1994\\\" and published by NatureScot, Scottish government Nature Agency'@en\r\n", "P10017\tlabel\t'Dizionario delle Scienze Fisiche ID'@en\t'DIZIONARIO DELLE SCIENZE FISICHE ID'@EN\t1.4733014297291168e-06\t'identifier for an item in the Dizionario delle Scienze Fisiche on the Institute of the Italian Encyclopaedia website'@en\r\n" ] } ], "source": [ "if create_sqlite_cache:\n", " !kgtk query --gc $STORE -i $OUT/label_pagerank_undirected_description.tsv.gz --as l_d_pgr_ud --limit 10" ] }, { "cell_type": "markdown", "id": "49ba5559-1550-4c53-b5cd-7a7084164b61", "metadata": {}, "source": [ "## Create the required indices" ] }, { "cell_type": "code", "execution_count": 10, "id": "03aa07dd-5a8d-4451-acaf-41337978ce13", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2022-10-03 18:56:01 query]: SQL Translation:\n", "---------------------------------------------\n", " SELECT *\n", " FROM graph_8 AS graph_8_c1\n", " LIMIT ?\n", " PARAS: [5]\n", "---------------------------------------------\n", "[2022-10-03 18:56:01 sqlstore]: CREATE INDEX \"graph_8_node1_idx\" ON \"graph_8\" (\"node1\")\n", "[2022-10-03 18:56:02 sqlstore]: ANALYZE \"graph_8_node1_idx\"\n", "[2022-10-03 18:56:02 sqlstore]: CREATE INDEX \"graph_8_node2;upper_idx\" ON \"graph_8\" (\"node2;upper\")\n", "[2022-10-03 18:56:03 sqlstore]: ANALYZE \"graph_8_node2;upper_idx\"\n", "[2022-10-03 18:56:03 sqlstore]: CREATE INDEX \"graph_8_label_idx\" ON \"graph_8\" (\"label\")\n", "[2022-10-03 18:56:03 sqlstore]: ANALYZE \"graph_8_label_idx\"\n", "[2022-10-03 18:56:04 sqlstore]: CREATE VIRTUAL TABLE \"graph_8_txtidx_ldpgridx\" USING FTS5 (\"node2\", tokenize=\"trigram\", content=\"graph_8\")\n", "[2022-10-03 18:56:04 sqlstore]: INSERT INTO \"graph_8_txtidx_ldpgridx\" (\"node2\") SELECT \"node2\" FROM graph_8\n", "node1\tlabel\tnode2\tnode2;upper\tnode1;pagerank\tnode1;description\n", "P10\tlabel\t'video'@en\t'VIDEO'@EN\t1.90092695780294e-06\t'relevant video. For images, use the property P18. For film trailers, qualify with \\\"object has role\\\" (P3831)=\\\"trailer\\\" (Q622550)'@en\n", "P1000\tlabel\t'record held'@en\t'RECORD HELD'@EN\t8.846157108863055e-07\t'notable record achieved by a person or entity, include qualifiers for dates held'@en\n", "P10000\tlabel\t'Research Vocabularies Australia ID'@en\t'RESEARCH VOCABULARIES AUSTRALIA ID'@EN\t1.7379313655369044e-06\t'identifier of a vocabulary in Research Vocabularies Australia'@en\n", "P10006\tlabel\t'AllSides ID'@en\t'ALLSIDES ID'@EN\t1.8528701572707561e-06\t'alphabetical identifier of a publication, organization, or person at AllSides'@en\n", "P10007\tlabel\t'Birdata ID'@en\t'BIRDATA ID'@EN\t1.275801866591085e-06\t'identifier for an bird species in the Birdata database'@en\n", "CPU times: user 157 ms, sys: 45.5 ms, total: 203 ms\n", "Wall time: 9.81 s\n" ] } ], "source": [ "%%time \n", "if create_sqlite_cache:\n", " !kgtk --debug query -i l_d_pgr_ud --idx node1 \"node2;upper\" label text:node2//name=ldpgridx --gc $STORE --limit 5" ] }, { "cell_type": "code", "execution_count": 11, "id": "e33e4d9d-e0da-4d29-8abf-5bada3316b88", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2022-10-03 18:56:11 query]: SQL Translation:\n", "---------------------------------------------\n", " SELECT *\n", " FROM graph_1 AS graph_1_c1\n", " LIMIT ?\n", " PARAS: [5]\n", "---------------------------------------------\n", "[2022-10-03 18:56:11 sqlstore]: CREATE INDEX \"graph_1_label_idx\" ON \"graph_1\" (\"label\")\n", "[2022-10-03 18:56:11 sqlstore]: ANALYZE \"graph_1_label_idx\"\n", "node1\tlabel\tnode2\tid\n", "P10\tlabel\t'video'@en\tP10-label-en\n", "P1000\tlabel\t'record held'@en\tP1000-label-en\n", "P10000\tlabel\t'Research Vocabularies Australia ID'@en\tP10000-label-en\n", "P10006\tlabel\t'AllSides ID'@en\tP10006-label-en\n", "P10007\tlabel\t'Birdata ID'@en\tP10007-label-en\n", "CPU times: user 25 ms, sys: 14.2 ms, total: 39.2 ms\n", "Wall time: 1.67 s\n" ] } ], "source": [ "%%time\n", "if create_sqlite_cache:\n", " !kgtk --debug query -i label --idx label --gc $STORE --limit 5" ] }, { "cell_type": "code", "execution_count": 12, "id": "85f0a37f-0a55-4187-99d6-e251b7f095cf", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2022-10-03 18:56:13 query]: SQL Translation:\n", "---------------------------------------------\n", " SELECT *\n", " FROM graph_3 AS graph_3_c1\n", " LIMIT ?\n", " PARAS: [5]\n", "---------------------------------------------\n", "[2022-10-03 18:56:13 sqlstore]: CREATE INDEX \"graph_3_label_idx\" ON \"graph_3\" (\"label\")\n", "[2022-10-03 18:56:13 sqlstore]: ANALYZE \"graph_3_label_idx\"\n", "node1\tlabel\tnode2\tid\n", "P10\talias\t'animation'@en\tP10-alias-en-2f86d8-0\n", "P10\talias\t'gif'@en\tP10-alias-en-282226-0\n", "P10\talias\t'media'@en\tP10-alias-en-c1427e-0\n", "P10\talias\t'trailer (Commons)'@en\tP10-alias-en-c61ab1-0\n", "P10000\talias\t'Australian Research Vocabularies ID'@en\tP10000-alias-en-0df7f5-0\n", "CPU times: user 28.4 ms, sys: 17.4 ms, total: 45.8 ms\n", "Wall time: 1.72 s\n" ] } ], "source": [ "%%time\n", "if create_sqlite_cache:\n", " !kgtk --debug query -i alias --idx label --gc $STORE --limit 5" ] }, { "cell_type": "code", "execution_count": 13, "id": "cd728937-6ffd-406f-988a-6dc7c38cb6ef", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2022-10-03 18:56:14 query]: SQL Translation:\n", "---------------------------------------------\n", " SELECT *\n", " FROM graph_4 AS graph_4_c1\n", " LIMIT ?\n", " PARAS: [5]\n", "---------------------------------------------\n", "[2022-10-03 18:56:14 sqlstore]: CREATE INDEX \"graph_4_id_idx\" ON \"graph_4\" (\"id\")\n", "[2022-10-03 18:56:15 sqlstore]: ANALYZE \"graph_4_id_idx\"\n", "node1\tlabel\tnode2\tid\n", "P10\tdescription\t'relevant video. For images, use the property P18. For film trailers, qualify with \\\"object has role\\\" (P3831)=\\\"trailer\\\" (Q622550)'@en\tP10-description-en\n", "P1000\tdescription\t'notable record achieved by a person or entity, include qualifiers for dates held'@en\tP1000-description-en\n", "P10000\tdescription\t'identifier of a vocabulary in Research Vocabularies Australia'@en\tP10000-description-en\n", "P10006\tdescription\t'alphabetical identifier of a publication, organization, or person at AllSides'@en\tP10006-description-en\n", "P10007\tdescription\t'identifier for an bird species in the Birdata database'@en\tP10007-description-en\n", "CPU times: user 24 ms, sys: 15.3 ms, total: 39.3 ms\n", "Wall time: 1.97 s\n" ] } ], "source": [ "%%time\n", "if create_sqlite_cache:\n", " !kgtk --debug query -i description --idx id --gc $STORE --limit 5" ] }, { "cell_type": "code", "execution_count": 14, "id": "c67f7305-8008-4ac0-bd0c-dd76186a8720", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2022-10-03 18:56:16 query]: SQL Translation:\n", "---------------------------------------------\n", " SELECT *\n", " FROM graph_5 AS graph_5_c1\n", " LIMIT ?\n", " PARAS: [5]\n", "---------------------------------------------\n", "[2022-10-03 18:56:16 sqlstore]: CREATE INDEX \"graph_5_label_idx\" ON \"graph_5\" (\"label\")\n", "[2022-10-03 18:56:21 sqlstore]: ANALYZE \"graph_5_label_idx\"\n", "[2022-10-03 18:56:22 sqlstore]: CREATE INDEX \"graph_5_node1_idx\" ON \"graph_5\" (\"node1\")\n", "[2022-10-03 18:56:24 sqlstore]: ANALYZE \"graph_5_node1_idx\"\n", "[2022-10-03 18:56:25 sqlstore]: CREATE INDEX \"graph_5_node2_idx\" ON \"graph_5\" (\"node2\")\n", "[2022-10-03 18:56:32 sqlstore]: ANALYZE \"graph_5_node2_idx\"\n", "[2022-10-03 18:56:32 sqlstore]: CREATE INDEX \"graph_5_id_idx\" ON \"graph_5\" (\"id\")\n", "[2022-10-03 18:56:37 sqlstore]: ANALYZE \"graph_5_id_idx\"\n", "node1\tlabel\tnode2\tid\n", "P10\tP1628\t\"http://www.w3.org/2006/vcard/ns#Video\"\tP10-P1628-32b85d-7927ece6-0\n", "P10\tP1628\t\"https://schema.org/video\"\tP10-P1628-acf60d-b8950832-0\n", "P10\tP1629\tQ34508\tP10-P1629-Q34508-bcc39400-0\n", "P10\tP1630\t\"https://commons.wikimedia.org/wiki/File:$1\"\tP10-P1630-53947a-fbe9093e-0\n", "P10\tP1659\tP1651\tP10-P1659-P1651-c4068028-0\n", "CPU times: user 354 ms, sys: 96.4 ms, total: 450 ms\n", "Wall time: 23.5 s\n" ] } ], "source": [ "%%time\n", "if create_sqlite_cache:\n", " !kgtk --debug query -i claims --idx label node1 node2 id --gc $STORE --limit 5" ] }, { "cell_type": "code", "execution_count": 15, "id": "726beb5e-05b0-45ca-b56f-738a147f5536", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2022-10-03 18:56:40 query]: SQL Translation:\n", "---------------------------------------------\n", " SELECT *\n", " FROM graph_6 AS graph_6_c1\n", " LIMIT ?\n", " PARAS: [5]\n", "---------------------------------------------\n", "[2022-10-03 18:56:40 sqlstore]: CREATE INDEX \"graph_6_label_idx\" ON \"graph_6\" (\"label\")\n", "[2022-10-03 18:56:40 sqlstore]: ANALYZE \"graph_6_label_idx\"\n", "[2022-10-03 18:56:40 sqlstore]: CREATE INDEX \"graph_6_node1_idx\" ON \"graph_6\" (\"node1\")\n", "[2022-10-03 18:56:40 sqlstore]: ANALYZE \"graph_6_node1_idx\"\n", "node1\tlabel\tnode2\tid\n", "Psubject\tdatatype\twikibase-item\tPsubject-datatype\n", "Ppredicate\tdatatype\twikibase-item\tPpredicate-datatype\n", "Pobject\tdatatype\tstring\tPobject-datatype\n", "Phas_event\tdatatype\twikibase-item\tPevent-datatype\n", "P10\tdatatype\tcommonsMedia\tP10-datatype\n", "CPU times: user 24.1 ms, sys: 9.25 ms, total: 33.4 ms\n", "Wall time: 1.38 s\n" ] } ], "source": [ "%%time\n", "if create_sqlite_cache:\n", " !kgtk --debug query -i datatypes --idx label node1 --gc $STORE --limit 5" ] }, { "cell_type": "code", "execution_count": 16, "id": "a37dc7b4-16aa-468a-9de6-f58064bcfca3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2022-10-03 18:56:41 query]: SQL Translation:\n", "---------------------------------------------\n", " SELECT *\n", " FROM graph_7 AS graph_7_c1\n", " LIMIT ?\n", " PARAS: [5]\n", "---------------------------------------------\n", "[2022-10-03 18:56:41 sqlstore]: CREATE INDEX \"graph_7_node2_idx\" ON \"graph_7\" (\"node2\")\n", "[2022-10-03 18:56:41 sqlstore]: ANALYZE \"graph_7_node2_idx\"\n", "[2022-10-03 18:56:41 sqlstore]: CREATE INDEX \"graph_7_node1_idx\" ON \"graph_7\" (\"node1\")\n", "[2022-10-03 18:56:41 sqlstore]: ANALYZE \"graph_7_node1_idx\"\n", "[2022-10-03 18:56:41 sqlstore]: CREATE INDEX \"graph_7_label_idx\" ON \"graph_7\" (\"label\")\n", "[2022-10-03 18:56:42 sqlstore]: ANALYZE \"graph_7_label_idx\"\n", "node1\tlabel\tnode2\tid\n", "CPU times: user 20.4 ms, sys: 17.1 ms, total: 37.5 ms\n", "Wall time: 1.51 s\n" ] } ], "source": [ "%%time\n", "if create_sqlite_cache:\n", " !kgtk --debug query -i qualifiers --idx node2 node1 label --gc $STORE --limit 5" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 5 }