{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Analyzing CSKG\n", "\n", "This notebook performs various analyses on CSKG: it normalizes the edge file, counts its edges, nodes and relations, and explores the relations contributed by each data source.\n", "\n", "Parameters are set up in the first code cell so that we can run this notebook in batch mode. Example invocation command:\n", "\n", "```\n", "papermill Example8\\ -\\ Wikidata\\ Subset.ipynb example8.out.ipynb \\\n", "-p cskg_path /Users/pedroszekely/Downloads/kypher/cskg \\\n", "-p kg cskg_connected.tsv.gz \\\n", "-p delete_database no \n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Parameters for invoking the notebook\n", "\n", "- `cskg_path`: a folder containing the CSKG edges file and all the analysis products.\n", "- `kg`: the name of the edge file.\n", "- `delete_database`: whether to delete the SQLite database used as the Kypher graph cache before running the notebook; \"\" or \"no\" means don't delete it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Preamble\n", "\n", "Set up paths and environment variables." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "parameters" ] }, "outputs": [], "source": [ "# Parameters\n", "cskg_path = \"/Users/pedroszekely/Downloads/kypher/cskg\"\n", "kg = \"cskg_connected.kgtk.gz\"\n", "delete_database = \"yes\"" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import io\n", "import os\n", "import subprocess\n", "import sys\n", "\n", "import numpy as np\n", "import pandas as pd\n", "\n", "import altair as alt" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "os.environ['CSKG'] = cskg_path\n", "os.environ['KG'] = \"{}/{}\".format(cskg_path, kg)\n", "os.environ['NKG'] = \"{}/cskg-normalized.kgtk.gz\".format(cskg_path)\n", "os.environ['STORE'] = \"{}/wikidata.sqlite3.db\".format(cskg_path)\n", "os.environ['kypher'] = \"time kgtk query --graph-cache \" + os.environ['STORE']\n", "# os.environ['kypher'] = \"time kgtk --debug query --graph-cache \" + 
os.environ['STORE']" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/pedroszekely/Downloads/kypher/cskg\n", "/Users/pedroszekely/Downloads/kypher/cskg/cskg_connected.kgtk.gz\n", "/Users/pedroszekely/Downloads/kypher/cskg/cskg-normalized.kgtk.gz\n", "time kgtk query --graph-cache /Users/pedroszekely/Downloads/kypher/cskg/wikidata.sqlite3.db\n", "/Users/pedroszekely/Downloads/kypher/cskg/wikidata.sqlite3.db\n" ] } ], "source": [ "!echo $CSKG\n", "!echo $KG\n", "!echo $NKG\n", "!echo $kypher\n", "!echo $STORE" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/pedroszekely/Downloads/kypher/cskg\n" ] } ], "source": [ "cd $cskg_path" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Deleted database\n" ] } ], "source": [ "if delete_database and delete_database != \"no\":\n", " print(\"Deleted database\")\n", " !rm $STORE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Utilities" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def bar_chart(data, x_column, y_column, title=\"\", width=800):\n", " \"\"\"Construct a simple bar chart with two properties\"\"\"\n", " bars = alt.Chart(data).mark_bar().encode(\n", " y=alt.Y(y_column, sort='-x'),\n", " x=x_column\n", " ).properties(\n", " title=title,\n", " width=width\n", " )\n", "\n", " text = bars.mark_text(\n", " align='left',\n", " baseline='middle',\n", " dx=3 # Nudges text to right so it doesn't appear on top of the bar\n", " ).encode(\n", " text=x_column\n", " )\n", "\n", " return (bars + text)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "import io\n", "import pandas\n", "import subprocess\n", "\n", "def shell_df(command, shell=False, 
**kwargs):\n", " \"\"\"\n", " Takes a shell command as a string and reads the result into a Pandas DataFrame.\n", " \n", " Additional keyword arguments are passed through to pandas.read_csv.\n", " \n", " :param command: a shell command that returns tabular data\n", " :type command: str\n", " :param shell: passed to subprocess.Popen\n", " :type shell: bool\n", " \n", " :return: a pandas dataframe\n", " :rtype: :class:`pandas.DataFrame`\n", " \"\"\"\n", " proc = subprocess.Popen(command, \n", " shell=shell,\n", " stdout=subprocess.PIPE, \n", " stderr=subprocess.PIPE)\n", " output, error = proc.communicate()\n", " \n", " if proc.returncode == 0:\n", " if error:\n", " print(error.decode())\n", " with io.StringIO(output.decode()) as buffer:\n", " return pandas.read_csv(buffer, **kwargs)\n", " else:\n", " message = (\"Shell command returned non-zero exit status: {0}\\n\\n\"\n", " \"Command was:\\n{1}\\n\\n\"\n", " \"Standard error was:\\n{2}\")\n", " raise IOError(message.format(proc.returncode, command, error.decode()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Poking around" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Print the first few lines to see what the file contains." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "zcat: error writing to output: Broken pipe\n", "id node1 relation node2 node1;label node2;label relation;label relation;dimension source sentence\n", "/c/en/0-/r/DefinedAs-/c/en/empty_set-0000 /c/en/0 /r/DefinedAs /c/en/empty_set \"0\" \"empty set\" \"defined as\" \"CN\" \"[[0]] is the [[empty set]].\"\n", "/c/en/0-/r/DefinedAs-/c/en/first_limit_ordinal-0000 /c/en/0 /r/DefinedAs /c/en/first_limit_ordinal \"0\" \"first limit ordinal\" \"defined as\" \"CN\" \"[[0]] is the [[first limit ordinal]].\"\n", "/c/en/0-/r/DefinedAs-/c/en/number_zero-0000 /c/en/0 /r/DefinedAs /c/en/number_zero \"0\" \"number zero\" \"defined as\" \"CN\" \"[[0]] is the 
[[number zero]].\"\n", "/c/en/0-/r/HasContext-/c/en/internet_slang-0000 /c/en/0 /r/HasContext /c/en/internet_slang \"0\" \"internet slang\" \"has context\" \"CN\"\n", "/c/en/0-/r/HasProperty-/c/en/pronounced_zero-0000 /c/en/0 /r/HasProperty /c/en/pronounced_zero \"0\" \"pronounced zero\" \"has property\" \"CN\" \"[[\\\"0\\\"]] is [[pronounced zero]]\"\n", "/c/en/0-/r/IsA-/c/en/set_containing_one_element-0000 /c/en/0 /r/IsA /c/en/set_containing_one_element \"0\" \"set containing one element\" \"is a\" \"CN\" \"[[{0}]] is a type of [[set containing one element]].\"\n", "/c/en/0-/r/RelatedTo-/c/en/1-0000 /c/en/0 /r/RelatedTo /c/en/1 \"0\" \"1\" \"related to\" \"CN\"\n", "/c/en/0-/r/RelatedTo-/c/en/2-0000 /c/en/0 /r/RelatedTo /c/en/2 \"0\" \"2\" \"related to\" \"CN\"\n", "/c/en/0.22_inch_calibre-/r/IsA-/c/en/5.6_millimetres-0000 /c/en/0.22_inch_calibre /r/IsA /c/en/5.6_millimetres \"0.22 inch calibre\" \"5.6 millimetres\" \"is a\" \"CN\" \"[[0.22 inch calibre]] is [[5.6 millimetres]]\"\n" ] } ], "source": [ "!zcat < \"$KG\" | head | column -t -s $'\\t' " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Normalize the file so that it is easier to process with Kypher" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Opening the input file: /Users/pedroszekely/Downloads/kypher/cskg/cskg_connected.kgtk.gz\n", "KgtkReader: File_path.suffix: .gz\n", "KgtkReader: reading gzip /Users/pedroszekely/Downloads/kypher/cskg/cskg_connected.kgtk.gz\n", "header: id\tnode1\trelation\tnode2\tnode1;label\tnode2;label\trelation;label\trelation;dimension\tsource\tsentence\n", "KgtkReader: Special columns: node1=1 label=2 node2=3 id=0\n", "KgtkReader: Reading an edge file.\n", "Node1 column name: node1\n", "Label column name: relation\n", "Node2 column name: node2\n", "Id column name: id\n", "The following columns will be lowered or normalized\n", " node1;label from node1 (label 'label')\n", " 
node2;label from node2 (label 'label')\n", " relation;label from relation (label 'label')\n", " relation;dimension from relation (label 'dimension')\n", " source from id (label 'source')\n", " sentence from id (label 'sentence')\n", "The output columns are: id node1 relation node2\n", "Opening the output file: /Users/pedroszekely/Downloads/kypher/cskg/temp.cskg.normalize.1.kgtk.gz\n", "File_path.suffix: .gz\n", "KgtkWriter: writing gzip /Users/pedroszekely/Downloads/kypher/cskg/temp.cskg.normalize.1.kgtk.gz\n", "header: id\tnode1\trelation\tnode2\n", "Read 6003237 rows, wrote 15120425 rows with 9117188 labels.\n" ] } ], "source": [ "!kgtk normalize --verbose -i $KG -o $CSKG/temp.cskg.normalize.1.kgtk.gz --columns-to-lower 'relation;dimension' source sentence 'node1;label' 'relation;label' 'node2;label'" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "zcat: error writing to output: Broken pipe\n", "id node1 relation node2\n", "/c/en/0-/r/DefinedAs-/c/en/empty_set-0000 /c/en/0 /r/DefinedAs /c/en/empty_set\n", "/c/en/0-/r/DefinedAs-/c/en/empty_set-0000 source \"CN\"\n", "/c/en/0-/r/DefinedAs-/c/en/empty_set-0000 sentence \"[[0]] is the [[empty set]].\"\n", "/c/en/0 label \"0\"\n", "/r/DefinedAs label \"defined as\"\n", "/c/en/empty_set label \"empty set\"\n", "/c/en/0-/r/DefinedAs-/c/en/first_limit_ordinal-0000 /c/en/0 /r/DefinedAs /c/en/first_limit_ordinal\n", "/c/en/0-/r/DefinedAs-/c/en/first_limit_ordinal-0000 source \"CN\"\n", "/c/en/0-/r/DefinedAs-/c/en/first_limit_ordinal-0000 sentence \"[[0]] is the [[first limit ordinal]].\"\n" ] } ], "source": [ "!zcat < $CSKG/temp.cskg.normalize.1.kgtk.gz | head | column -t -s $'\\t' " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Rename the columns to the standard `node1/label/node2` and add ids" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "!kgtk rename-columns --mode NONE 
-i $CSKG/temp.cskg.normalize.1.kgtk.gz --output-columns id node1 label node2 \\\n", "/ add-id --id-style node1-label-node2 -o $NKG" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Count the number of edges, distinct `node1` nodes, relations, and distinct `node2` values" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 220.69 real 269.23 user 25.36 sys\n", "num_edges num_nodes num_relations num_values\n", "15120425 8164285 84 3285154\n" ] } ], "source": [ "!$kypher -i $NKG \\\n", "--match '(n1)-[e]->(n2)' \\\n", "--return 'count(e) as num_edges, count(distinct n1) as num_nodes, count(distinct e.label) as num_relations, count(distinct n2) as num_values' \\\n", "| column -t -s $'\\t' " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Some Statistics" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 29.24 real 22.94 user 5.21 sys\n", "\n" ] } ], "source": [ "command = \"$kypher -i $NKG \\\n", "--match '(n1)-[e]->(n2)' \\\n", "--return 'distinct e.label, count(distinct n1) as nodes' \\\n", "--order-by 'count(distinct n1) desc'\"\n", "stats = shell_df(command, shell=True, sep='\\t')" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bar_chart(stats[:20], 'nodes', 'label', title=\"Relations in CSKG\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Distribution of edges in each data source" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 63.68 real 50.39 user 10.79 sys\n", "\n" ] }, { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "command = \"$kypher -i $NKG \\\n", "--match '(n1)-[r]->(n2), (r)-[l:source]->(s)' \\\n", "--return 's as source, count(distinct r) as `count of relations`' \\\n", "--order-by 'count(distinct r) desc'\"\n", "data = shell_df(command, shell=True, sep='\\t')\n", "bar_chart(data, 'count of relations', 'source', title=\"CSKG Relation Counts\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute the distribution of relations in each data source, counting the distinct `node1` nodes that use each relation" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 33.97 real 25.07 user 5.16 sys\n", "\n", " 1.79 real 1.18 user 0.57 sys\n", "\n", " 5.39 real 2.51 user 1.05 sys\n", "\n", " 2.46 real 1.21 user 0.47 sys\n", "\n", " 0.88 real 0.59 user 0.15 sys\n", "\n", " 1.37 real 0.81 user 0.21 sys\n", "\n", " 6.86 real 4.12 user 1.23 sys\n", "\n" ] } ], "source": [ "command = \"$kypher -i $NKG \\\n", "--match '(n1)-[r {label: label}]->(n2), (r)-[:source]->(source:`\\\"SOURCE\\\"`)' \\\n", "--return 'label as relation, count(distinct n1) as count' \\\n", "--order-by 'count(distinct n1) desc'\"\n", "datasets = []\n", "for source in [\"CN\", \"WN\", \"AT\", \"VG\", \"FN\", \"WD\", \"RG\"]:\n", " data = shell_df(command.replace(\"SOURCE\", source), shell=True, sep='\\t')\n", " datasets.append(data)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bar_chart(datasets[0], 'count', 'relation', title=\"ConceptNet: Count Of Relations\", width=200)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bar_chart(datasets[1], 'count', 'relation', title=\"WordNet: Count Of Relations\", width=200)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bar_chart(datasets[2], 'count', 'relation', title=\"Atomic: Count Of Relations\", width=200)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bar_chart(datasets[3], 'count', 'relation', title=\"Visual Genome: Count Of Relations\", width=200)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bar_chart(datasets[4], 'count', 'relation', title=\"FrameNet: Count Of Relations\", width=200)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bar_chart(datasets[5], 'count', 'relation', title=\"Wikidata: Count Of Relations\", width=200)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bar_chart(datasets[6], 'count', 'relation', title=\"Roget: Count Of Relations\", width=200)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### ConceptNet nodes that contain `catch` and `throw`" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "https://www.conceptnet.io/c/en/catch/n\n", "https://www.conceptnet.io/c/en/catch/n/wn/act\n", "https://www.conceptnet.io/c/en/catch/n/wn/artifact\n", "https://www.conceptnet.io/c/en/catch/n/wn/attribute\n", "https://www.conceptnet.io/c/en/catch/n/wn/object\n", "https://www.conceptnet.io/c/en/catch/n/wn/person\n", "https://www.conceptnet.io/c/en/catch/n/wn/quantity\n", "https://www.conceptnet.io/c/en/catch/v\n", "https://www.conceptnet.io/c/en/catch/v/wn/body\n", "https://www.conceptnet.io/c/en/catch/v/wn/competition\n", "https://www.conceptnet.io/c/en/catch/v/wn/emotion\n", "https://www.conceptnet.io/c/en/catch/v/wn/motion\n", "https://www.conceptnet.io/c/en/catch/v/wn/perception\n", "https://www.conceptnet.io/c/en/catch/v/wn/possession\n", "https://www.conceptnet.io/c/en/catch/v/wn/social\n" ] } ], "source": [ "catch = !$kypher -i $NKG \\\n", "--match '(n1)-[r {label: label}]->(n2), (r)-[:source]->(source)' \\\n", "--where 'source = $s and n1 =~ \".*/catch/.*\"' \\\n", "--return 'distinct n1 as node1' \\\n", "--spara s='CN' \\\n", "--limit 100\n", "\n", "for n in catch[1:-1]:\n", " print(\"https://www.conceptnet.io\"+n)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "https://www.conceptnet.io/c/en/thro\n", "https://www.conceptnet.io/c/en/thro/a/wikt/en_2\n", "https://www.conceptnet.io/c/en/throw\n", "https://www.conceptnet.io/c/en/throw/n\n", "https://www.conceptnet.io/c/en/throw/n/wikt/en_1\n", 
"https://www.conceptnet.io/c/en/throw/n/wikt/en_2\n", "https://www.conceptnet.io/c/en/throw/n/wikt/en_3\n", "https://www.conceptnet.io/c/en/throw/n/wikt/en_4\n", "https://www.conceptnet.io/c/en/throw/n/wn/artifact\n", "https://www.conceptnet.io/c/en/throw/n/wn/event\n", "https://www.conceptnet.io/c/en/throw/n/wn/state\n", "https://www.conceptnet.io/c/en/throw/n/wp/grappling\n", "https://www.conceptnet.io/c/en/throw/v\n", "https://www.conceptnet.io/c/en/throw/v/wikt/en_1\n", "https://www.conceptnet.io/c/en/throw/v/wikt/en_2\n", "https://www.conceptnet.io/c/en/throw/v/wn/cognition\n", "https://www.conceptnet.io/c/en/throw/v/wn/communication\n", "https://www.conceptnet.io/c/en/throw/v/wn/emotion\n" ] } ], "source": [ "throw = !$kypher -i $NKG \\\n", "--match '(n1)-[r {label: label}]->(n2), (r)-[:source]->(source)' \\\n", "--where 'source in [$cn] and n1 =~ \".*/throw?\\\\b.*\"' \\\n", "--return 'distinct n1 as node1' \\\n", "--spara cn='CN' --spara wd='WD' --spara vg='VG'\\\n", "--limit 100\n", "\n", "for n in throw[1:-1]:\n", " print(\"https://www.conceptnet.io\"+n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### ConceptNet nodes that contain `dog`" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "https://www.conceptnet.io/c/en/boxer/n/wp/dog\n", "https://www.conceptnet.io/c/en/coat/n/wp/dog\n", "https://www.conceptnet.io/c/en/dog\n", "https://www.conceptnet.io/c/en/dog's/n\n", "https://www.conceptnet.io/c/en/dog's_abuse/n\n", "https://www.conceptnet.io/c/en/dog's_acoustical_sense\n", "https://www.conceptnet.io/c/en/dog's_age/n\n", "https://www.conceptnet.io/c/en/dog's_bark\n", "https://www.conceptnet.io/c/en/dog's_bollocks\n", "https://www.conceptnet.io/c/en/dog's_bollocks/n\n", "https://www.conceptnet.io/c/en/dog's_breakfast\n", "https://www.conceptnet.io/c/en/dog's_breakfast/n\n", "https://www.conceptnet.io/c/en/dog's_breakfast/n/wn/state\n", 
"https://www.conceptnet.io/c/en/dog's_breakfasts/n\n", "https://www.conceptnet.io/c/en/dog's_chance\n", "https://www.conceptnet.io/c/en/dog's_chance/n\n", "https://www.conceptnet.io/c/en/dog's_cherries/n\n", "https://www.conceptnet.io/c/en/dog's_cherry\n", "https://www.conceptnet.io/c/en/dog's_dinner/n\n", "https://www.conceptnet.io/c/en/dog's_dinner/n/wn/state\n", "https://www.conceptnet.io/c/en/dog's_ear/v\n", "https://www.conceptnet.io/c/en/dog's_fingers\n", "https://www.conceptnet.io/c/en/dog's_fur\n", "https://www.conceptnet.io/c/en/dog's_letter/n\n", "https://www.conceptnet.io/c/en/dog's_life/n\n", "https://www.conceptnet.io/c/en/dog's_mercury/n\n", "https://www.conceptnet.io/c/en/dog's_mercury/n/wn/plant\n", "https://www.conceptnet.io/c/en/dog's_olfactory_sense\n", "https://www.conceptnet.io/c/en/dog's_tail_grass/n\n", "https://www.conceptnet.io/c/en/dog's_tail_grasses/n\n", "https://www.conceptnet.io/c/en/dog's_tongue\n", "https://www.conceptnet.io/c/en/dog's_tongue/n\n", "https://www.conceptnet.io/c/en/dog's_tongues/n\n", "https://www.conceptnet.io/c/en/dog's_tooth_check/n/wn/artifact\n", "https://www.conceptnet.io/c/en/dog's_tooth_violet/n/wn/plant\n", "https://www.conceptnet.io/c/en/dog's_tooth_violets/n\n", "https://www.conceptnet.io/c/en/dog's_weight\n", "https://www.conceptnet.io/c/en/dog/n\n", "https://www.conceptnet.io/c/en/dog/n/wn/animal\n", "https://www.conceptnet.io/c/en/dog/n/wn/artifact\n", "https://www.conceptnet.io/c/en/dog/n/wn/food\n", "https://www.conceptnet.io/c/en/dog/n/wn/person\n", "https://www.conceptnet.io/c/en/dog/n/wp/goya\n", "https://www.conceptnet.io/c/en/dog/v\n", "https://www.conceptnet.io/c/en/dog/v/wn/motion\n", "https://www.conceptnet.io/c/en/dogs\n", "https://www.conceptnet.io/c/en/dogs/n\n", "https://www.conceptnet.io/c/en/dogs/v\n", "https://www.conceptnet.io/c/en/hokkaido/n/wp/dog\n", "https://www.conceptnet.io/c/en/moose/n/wp/dog\n" ] } ], "source": [ "dogs = !$kypher -i $NKG \\\n", "--match '(n1)-[r {label: 
label}]->(n2), (r)-[:source]->(source)' \\\n", "--where 'source in [$cn] and n1 =~ \".*/dogs?\\\\b.*\"' \\\n", "--return 'distinct n1 as node1' \\\n", "--spara cn='CN' \\\n", "--limit 100\n", "\n", "for n in dogs[1:-1]:\n", " print(\"https://www.conceptnet.io\"+n)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "https://www.conceptnet.io/c/en/frisbee\n", "https://www.conceptnet.io/c/en/frisbee/n\n", "https://www.conceptnet.io/c/en/frisbee/v\n", "https://www.conceptnet.io/c/en/frisbees\n", "https://www.conceptnet.io/c/en/frisbees/n\n", "https://www.conceptnet.io/c/en/frisbees/v\n" ] } ], "source": [ "frisbee = !$kypher -i $NKG \\\n", "--match '(n1)-[r {label: label}]->(n2), (r)-[:source]->(source)' \\\n", "--where 'source in [$cn] and n1 =~ \".*/frisbees?\\\\b.*\"' \\\n", "--return 'distinct n1 as node1' \\\n", "--spara cn='CN' --spara wd='WD' --spara vg='VG'\\\n", "--limit 100\n", "\n", "for n in frisbee[1:-1]:\n", " print(\"https://www.conceptnet.io\"+n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get all the edges in ConceptNet, Wikidata-CS and Visual Genome whose `node1` contains `frisbee`" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 279.25 real 238.73 user 12.22 sys\n" ] } ], "source": [ "!$kypher -i $NKG \\\n", "--match '(n1)-[r {label: label}]->(n2), (r)-[:source]->(source), (n1)-[:label]->(n1_label)' \\\n", "--where 'source in [$cn, $vg, $wd] and n1 =~ \".*frisbees?.*\"' \\\n", "--return 'distinct source as source, n1 as node1, n1_label as `node1 label`, label as relation, n2 as node2' \\\n", "--order-by 'source, n1' \\\n", "--spara cn='CN' --spara wd='WD' --spara vg='VG'\\\n", "-o $CSKG/frisbee.tsv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Many edges relate to `frisbee`: `frisbee.tsv` has 12,828 lines, including the header." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, 
"outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 12828 73835 953300 /Users/pedroszekely/Downloads/kypher/cskg/frisbee.tsv\n" ] } ], "source": [ "!wc $CSKG/frisbee.tsv" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "source node1 node1 label relation node2\n", "\"CN\" /c/en/capturing_frisbee \"capturing frisbee\" /r/HasPrerequisite /c/en/coordination\n", "\"CN\" /c/en/dogs_catching_frisbees \"dogs catching frisbees\" /r/AtLocation /c/en/park\n", "\"CN\" /c/en/frisbee \"frisbee\" /r/AtLocation /c/en/air\n", "\"CN\" /c/en/frisbee \"frisbee\" /r/AtLocation /c/en/deadhead's_van\n", "\"CN\" /c/en/frisbee \"frisbee\" /r/AtLocation /c/en/frisbee_golf_course\n", "\"CN\" /c/en/frisbee \"frisbee\" /r/AtLocation /c/en/park\n", "\"CN\" /c/en/frisbee \"frisbee\" /r/AtLocation /c/en/roof\n", "\"CN\" /c/en/frisbee \"frisbee\" /r/AtLocation /c/en/toy_chest\n", "\"CN\" /c/en/frisbee \"frisbee\" /r/AtLocation /c/en/tree\n" ] } ], "source": [ "!head $CSKG/frisbee.tsv | column -t -s $'\\t' " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "kgtk", "language": "python", "name": "kgtk" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8" } }, "nbformat": 4, "nbformat_minor": 4 }