{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# KGTK Tutorial: Introduction\n",
    "\n",
    "We begin the tutorial with a quick overview of some of the commands in KGTK. Then we turn our attention to working with Wikidata."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Tutorial Setup\n",
    "\n",
    "Import utility functions and define environment variables for the folders and files that we will use."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ALIAS: \"/Users/pedroszekely/Downloads/kgtk-tutorial/miniwikidata/aliases.en.tsv.gz\"\n",
      "ALL: \"/Users/pedroszekely/Downloads/kgtk-tutorial/miniwikidata/all.tsv.gz\"\n",
      "CLAIMS: \"/Users/pedroszekely/Downloads/kgtk-tutorial/miniwikidata/claims.tsv.gz\"\n",
      "DESCRIPTION: \"/Users/pedroszekely/Downloads/kgtk-tutorial/miniwikidata/descriptions.en.tsv.gz\"\n",
      "EXAMPLES_DIR: \"/Users/pedroszekely/Documents/GitHub/kgtk/examples\"\n",
      "GE: \"/Users/pedroszekely/Downloads/kgtk-tutorial/temp/graph-embedding\"\n",
      "ISA: \"/Users/pedroszekely/Downloads/kgtk-tutorial/miniwikidata/derived.isa.tsv.gz\"\n",
      "ITEM: \"/Users/pedroszekely/Downloads/kgtk-tutorial/miniwikidata/claims.wikibase-item.tsv.gz\"\n",
      "KGTK_PATH: \"/Users/pedroszekely/Documents/GitHub/kgtk\"\n",
      "LABEL: \"/Users/pedroszekely/Downloads/kgtk-tutorial/miniwikidata/labels.en.tsv.gz\"\n",
      "OUT: \"/Users/pedroszekely/Downloads/kgtk-tutorial/output\"\n",
      "P279: \"/Users/pedroszekely/Downloads/kgtk-tutorial/miniwikidata/derived.P279.tsv.gz\"\n",
      "P279STAR: \"/Users/pedroszekely/Downloads/kgtk-tutorial/miniwikidata/derived.P279star.tsv.gz\"\n",
      "PROPERTY_DATATYPES: \"/Users/pedroszekely/Downloads/kgtk-tutorial/miniwikidata/metadata.property.datatypes.tsv.gz\"\n",
      "Q154ALIAS: \"/Users/pedroszekely/Downloads/kgtk-tutorial/output/parts/aliases.en.tsv.gz\"\n",
      "Q154ALL: \"/Users/pedroszekely/Downloads/kgtk-tutorial/output/parts/all.tsv.gz\"\n",
      "Q154CLAIMS: \"/Users/pedroszekely/Downloads/kgtk-tutorial/output/parts/claims.tsv.gz\"\n",
      "Q154DESCRIPTION: \"/Users/pedroszekely/Downloads/kgtk-tutorial/output/parts/descriptions.en.tsv.gz\"\n",
      "Q154ISA: \"/Users/pedroszekely/Downloads/kgtk-tutorial/output/parts/derived.isa.tsv.gz\"\n",
      "Q154ITEM: \"/Users/pedroszekely/Downloads/kgtk-tutorial/output/parts/claims.wikibase-item.tsv.gz\"\n",
      "Q154LABEL: \"/Users/pedroszekely/Downloads/kgtk-tutorial/output/parts/labels.en.tsv.gz\"\n",
      "Q154P279: \"/Users/pedroszekely/Downloads/kgtk-tutorial/output/parts/derived.P279.tsv.gz\"\n",
      "Q154P279STAR: \"/Users/pedroszekely/Downloads/kgtk-tutorial/output/parts/derived.P279star.tsv.gz\"\n",
      "Q154PROPERTY_DATATYPES: \"/Users/pedroszekely/Downloads/kgtk-tutorial/output/parts/metadata.property.datatypes.tsv.gz\"\n",
      "Q154QUALIFIERS: \"/Users/pedroszekely/Downloads/kgtk-tutorial/output/parts/qualifiers.tsv.gz\"\n",
      "Q154QUALIFIERS_TIME: \"/Users/pedroszekely/Downloads/kgtk-tutorial/output/parts/qualifiers.time.tsv.gz\"\n",
      "Q154SITELINKS: \"/Users/pedroszekely/Downloads/kgtk-tutorial/output/parts/sitelinks.tsv.gz\"\n",
      "QUALIFIERS: \"/Users/pedroszekely/Downloads/kgtk-tutorial/miniwikidata/qualifiers.tsv.gz\"\n",
      "QUALIFIERS_TIME: \"/Users/pedroszekely/Downloads/kgtk-tutorial/miniwikidata/qualifiers.time.tsv.gz\"\n",
      "SITELINKS: \"/Users/pedroszekely/Downloads/kgtk-tutorial/miniwikidata/sitelinks.tsv.gz\"\n",
      "STORE: \"/Users/pedroszekely/Downloads/kgtk-tutorial/wikidata.sqlite3.miniwikidata.db\"\n",
      "TE: \"/Users/pedroszekely/Downloads/kgtk-tutorial/temp/text-embedding\"\n",
      "TEMP: \"/Users/pedroszekely/Downloads/kgtk-tutorial/temp\"\n",
      "USECASE_DIR: \"/Users/pedroszekely/Documents/GitHub/kgtk/use-cases\"\n",
      "WIKIDATA: \"/Users/pedroszekely/Downloads/kgtk-tutorial/miniwikidata/\"\n",
      "kgtk: \"kgtk --debug\"\n",
      "kypher: \"kgtk query --graph-cache /Users/pedroszekely/Downloads/kgtk-tutorial/wikidata.sqlite3.miniwikidata.db\"\n"
     ]
    }
   ],
   "source": [
    "# Add the tutorial folder to the import path, then load the setup helpers and path variables\n",
    "import sys\n",
    "sys.path.insert(0, 'tutorial')\n",
    "from tutorial_setup import *"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/Users/pedroszekely/Downloads/kgtk-tutorial\n"
     ]
    }
   ],
   "source": [
    "!mkdir -p {output_path}\n",
    "%cd {output_path}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p {output_folder}\n",
    "!mkdir -p {temp_folder}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p \"$GE\"\n",
    "!mkdir -p \"$TE\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quick Tour Of KGTK Commands\n",
    "\n",
    "Our sample input file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>node1</th>\n",
       "      <th>label</th>\n",
       "      <th>node2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td></td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>label</td>\n",
       "      <td>\"Terminator 2\"@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td></td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>instance_of</td>\n",
       "      <td>film</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td></td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>genre</td>\n",
       "      <td>science_fiction</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td></td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>genre</td>\n",
       "      <td>action</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>t4</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>cast</td>\n",
       "      <td>a_schwarzenegger</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td></td>\n",
       "      <td>t4</td>\n",
       "      <td>role</td>\n",
       "      <td>terminator</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>t6</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>cast</td>\n",
       "      <td>l_hamilton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td></td>\n",
       "      <td>t6</td>\n",
       "      <td>role</td>\n",
       "      <td>s_connor</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>t8</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>award</td>\n",
       "      <td>academy-best-sound-editing</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td></td>\n",
       "      <td>t8</td>\n",
       "      <td>point_in_time</td>\n",
       "      <td>^1992-03-30T00:00:00Z/11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td></td>\n",
       "      <td>t8</td>\n",
       "      <td>winner</td>\n",
       "      <td>g_rydstrom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td></td>\n",
       "      <td>t8</td>\n",
       "      <td>winner</td>\n",
       "      <td>g_borders</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td></td>\n",
       "      <td>l_hamilton</td>\n",
       "      <td>label</td>\n",
       "      <td>\"Linda Hamilton\"@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td></td>\n",
       "      <td>a_schwarzenegger</td>\n",
       "      <td>label</td>\n",
       "      <td>\"Arnold Schwarzenegger\"@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td></td>\n",
       "      <td>film</td>\n",
       "      <td>subclass_of</td>\n",
       "      <td>visual_artwork</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td></td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>publication_date</td>\n",
       "      <td>^1984-10-26T00:00:00Z/11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td></td>\n",
       "      <td>t15</td>\n",
       "      <td>location</td>\n",
       "      <td>united_states</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td></td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>publication_date</td>\n",
       "      <td>^1985-02-08T00:00:00Z/11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td></td>\n",
       "      <td>t17</td>\n",
       "      <td>location</td>\n",
       "      <td>sweden</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td></td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>duration</td>\n",
       "      <td>108minute</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td></td>\n",
       "      <td>instance_of</td>\n",
       "      <td>label</td>\n",
       "      <td>\"instance of\"@en</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    id             node1             label                       node2\n",
       "0         terminator2_jd             label           \"Terminator 2\"@en\n",
       "1         terminator2_jd       instance_of                        film\n",
       "2         terminator2_jd             genre             science_fiction\n",
       "3         terminator2_jd             genre                      action\n",
       "4   t4    terminator2_jd              cast            a_schwarzenegger\n",
       "5                     t4              role                  terminator\n",
       "6   t6    terminator2_jd              cast                  l_hamilton\n",
       "7                     t6              role                    s_connor\n",
       "8   t8    terminator2_jd             award  academy-best-sound-editing\n",
       "9                     t8     point_in_time    ^1992-03-30T00:00:00Z/11\n",
       "10                    t8            winner                  g_rydstrom\n",
       "11                    t8            winner                   g_borders\n",
       "12            l_hamilton             label         \"Linda Hamilton\"@en\n",
       "13      a_schwarzenegger             label  \"Arnold Schwarzenegger\"@en\n",
       "14                  film       subclass_of              visual_artwork\n",
       "15        terminator2_jd  publication_date    ^1984-10-26T00:00:00Z/11\n",
       "16                   t15          location               united_states\n",
       "17        terminator2_jd  publication_date    ^1985-02-08T00:00:00Z/11\n",
       "18                   t17          location                      sweden\n",
       "19        terminator2_jd          duration                   108minute\n",
       "20           instance_of             label            \"instance of\"@en"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lines = !$kgtk cat -i \"$KGTK_PATH\"/tutorial/datasets/movies.tsv\n",
    "kgtk_to_dataframe(lines)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Many edges are missing ids, so let's add them. Here we generate Wikidata-style ids, but there are many other styles:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>node1</th>\n",
       "      <th>label</th>\n",
       "      <th>node2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>terminator2_jd-label-01de63</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>label</td>\n",
       "      <td>\"Terminator 2\"@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>terminator2_jd-instance_of-d0607f</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>instance_of</td>\n",
       "      <td>film</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>terminator2_jd-genre-2e6128</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>genre</td>\n",
       "      <td>science_fiction</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>terminator2_jd-genre-bd938c</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>genre</td>\n",
       "      <td>action</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>t4</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>cast</td>\n",
       "      <td>a_schwarzenegger</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>t4-role-aa802f</td>\n",
       "      <td>t4</td>\n",
       "      <td>role</td>\n",
       "      <td>terminator</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>t6</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>cast</td>\n",
       "      <td>l_hamilton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>t6-role-a29a51</td>\n",
       "      <td>t6</td>\n",
       "      <td>role</td>\n",
       "      <td>s_connor</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>t8</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>award</td>\n",
       "      <td>academy-best-sound-editing</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>t8-point_in_time-370fac</td>\n",
       "      <td>t8</td>\n",
       "      <td>point_in_time</td>\n",
       "      <td>^1992-03-30T00:00:00Z/11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>t8-winner-dc3cda</td>\n",
       "      <td>t8</td>\n",
       "      <td>winner</td>\n",
       "      <td>g_rydstrom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>t8-winner-211455</td>\n",
       "      <td>t8</td>\n",
       "      <td>winner</td>\n",
       "      <td>g_borders</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>l_hamilton-label-2b3667</td>\n",
       "      <td>l_hamilton</td>\n",
       "      <td>label</td>\n",
       "      <td>\"Linda Hamilton\"@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>a_schwarzenegger-label-2a4c28</td>\n",
       "      <td>a_schwarzenegger</td>\n",
       "      <td>label</td>\n",
       "      <td>\"Arnold Schwarzenegger\"@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>film-subclass_of-f126ab</td>\n",
       "      <td>film</td>\n",
       "      <td>subclass_of</td>\n",
       "      <td>visual_artwork</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>terminator2_jd-publication_date-e29331</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>publication_date</td>\n",
       "      <td>^1984-10-26T00:00:00Z/11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>t15-location-303f2a</td>\n",
       "      <td>t15</td>\n",
       "      <td>location</td>\n",
       "      <td>united_states</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>terminator2_jd-publication_date-6aeb53</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>publication_date</td>\n",
       "      <td>^1985-02-08T00:00:00Z/11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>t17-location-295099</td>\n",
       "      <td>t17</td>\n",
       "      <td>location</td>\n",
       "      <td>sweden</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>terminator2_jd-duration-79d04d</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>duration</td>\n",
       "      <td>108minute</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>instance_of-label-0e46af</td>\n",
       "      <td>instance_of</td>\n",
       "      <td>label</td>\n",
       "      <td>\"instance of\"@en</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                        id             node1  \\\n",
       "0              terminator2_jd-label-01de63    terminator2_jd   \n",
       "1        terminator2_jd-instance_of-d0607f    terminator2_jd   \n",
       "2              terminator2_jd-genre-2e6128    terminator2_jd   \n",
       "3              terminator2_jd-genre-bd938c    terminator2_jd   \n",
       "4                                       t4    terminator2_jd   \n",
       "5                           t4-role-aa802f                t4   \n",
       "6                                       t6    terminator2_jd   \n",
       "7                           t6-role-a29a51                t6   \n",
       "8                                       t8    terminator2_jd   \n",
       "9                  t8-point_in_time-370fac                t8   \n",
       "10                        t8-winner-dc3cda                t8   \n",
       "11                        t8-winner-211455                t8   \n",
       "12                 l_hamilton-label-2b3667        l_hamilton   \n",
       "13           a_schwarzenegger-label-2a4c28  a_schwarzenegger   \n",
       "14                 film-subclass_of-f126ab              film   \n",
       "15  terminator2_jd-publication_date-e29331    terminator2_jd   \n",
       "16                     t15-location-303f2a               t15   \n",
       "17  terminator2_jd-publication_date-6aeb53    terminator2_jd   \n",
       "18                     t17-location-295099               t17   \n",
       "19          terminator2_jd-duration-79d04d    terminator2_jd   \n",
       "20                instance_of-label-0e46af       instance_of   \n",
       "\n",
       "               label                       node2  \n",
       "0              label           \"Terminator 2\"@en  \n",
       "1        instance_of                        film  \n",
       "2              genre             science_fiction  \n",
       "3              genre                      action  \n",
       "4               cast            a_schwarzenegger  \n",
       "5               role                  terminator  \n",
       "6               cast                  l_hamilton  \n",
       "7               role                    s_connor  \n",
       "8              award  academy-best-sound-editing  \n",
       "9      point_in_time    ^1992-03-30T00:00:00Z/11  \n",
       "10            winner                  g_rydstrom  \n",
       "11            winner                   g_borders  \n",
       "12             label         \"Linda Hamilton\"@en  \n",
       "13             label  \"Arnold Schwarzenegger\"@en  \n",
       "14       subclass_of              visual_artwork  \n",
       "15  publication_date    ^1984-10-26T00:00:00Z/11  \n",
       "16          location               united_states  \n",
       "17  publication_date    ^1985-02-08T00:00:00Z/11  \n",
       "18          location                      sweden  \n",
       "19          duration                   108minute  \n",
       "20             label            \"instance of\"@en  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lines = !$kgtk add-id --id-style wikidata -i \"$KGTK_PATH\"/tutorial/datasets/movies.tsv\n",
    "kgtk_to_dataframe(lines)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Write the movies graph, now with ids on every edge, to a file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "!$kgtk add-id --id-style wikidata -i \"$KGTK_PATH\"/tutorial/datasets/movies.tsv \\\n",
    "-o \"$TEMP\"/movies.ids.tsv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sort the file by id (there are many other ways to sort):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>node1</th>\n",
       "      <th>label</th>\n",
       "      <th>node2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>a_schwarzenegger-label-2a4c28</td>\n",
       "      <td>a_schwarzenegger</td>\n",
       "      <td>label</td>\n",
       "      <td>\"Arnold Schwarzenegger\"@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>film-subclass_of-f126ab</td>\n",
       "      <td>film</td>\n",
       "      <td>subclass_of</td>\n",
       "      <td>visual_artwork</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>instance_of-label-0e46af</td>\n",
       "      <td>instance_of</td>\n",
       "      <td>label</td>\n",
       "      <td>\"instance of\"@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>l_hamilton-label-2b3667</td>\n",
       "      <td>l_hamilton</td>\n",
       "      <td>label</td>\n",
       "      <td>\"Linda Hamilton\"@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>t15-location-303f2a</td>\n",
       "      <td>t15</td>\n",
       "      <td>location</td>\n",
       "      <td>united_states</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>t17-location-295099</td>\n",
       "      <td>t17</td>\n",
       "      <td>location</td>\n",
       "      <td>sweden</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>t4</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>cast</td>\n",
       "      <td>a_schwarzenegger</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>t4-role-aa802f</td>\n",
       "      <td>t4</td>\n",
       "      <td>role</td>\n",
       "      <td>terminator</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>t6</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>cast</td>\n",
       "      <td>l_hamilton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>t6-role-a29a51</td>\n",
       "      <td>t6</td>\n",
       "      <td>role</td>\n",
       "      <td>s_connor</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>t8</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>award</td>\n",
       "      <td>academy-best-sound-editing</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>t8-point_in_time-370fac</td>\n",
       "      <td>t8</td>\n",
       "      <td>point_in_time</td>\n",
       "      <td>^1992-03-30T00:00:00Z/11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>t8-winner-211455</td>\n",
       "      <td>t8</td>\n",
       "      <td>winner</td>\n",
       "      <td>g_borders</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>t8-winner-dc3cda</td>\n",
       "      <td>t8</td>\n",
       "      <td>winner</td>\n",
       "      <td>g_rydstrom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>terminator2_jd-duration-79d04d</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>duration</td>\n",
       "      <td>108minute</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>terminator2_jd-genre-2e6128</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>genre</td>\n",
       "      <td>science_fiction</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>terminator2_jd-genre-bd938c</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>genre</td>\n",
       "      <td>action</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>terminator2_jd-instance_of-d0607f</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>instance_of</td>\n",
       "      <td>film</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>terminator2_jd-label-01de63</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>label</td>\n",
       "      <td>\"Terminator 2\"@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>terminator2_jd-publication_date-6aeb53</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>publication_date</td>\n",
       "      <td>^1985-02-08T00:00:00Z/11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>terminator2_jd-publication_date-e29331</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>publication_date</td>\n",
       "      <td>^1984-10-26T00:00:00Z/11</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                        id             node1  \\\n",
       "0            a_schwarzenegger-label-2a4c28  a_schwarzenegger   \n",
       "1                  film-subclass_of-f126ab              film   \n",
       "2                 instance_of-label-0e46af       instance_of   \n",
       "3                  l_hamilton-label-2b3667        l_hamilton   \n",
       "4                      t15-location-303f2a               t15   \n",
       "5                      t17-location-295099               t17   \n",
       "6                                       t4    terminator2_jd   \n",
       "7                           t4-role-aa802f                t4   \n",
       "8                                       t6    terminator2_jd   \n",
       "9                           t6-role-a29a51                t6   \n",
       "10                                      t8    terminator2_jd   \n",
       "11                 t8-point_in_time-370fac                t8   \n",
       "12                        t8-winner-211455                t8   \n",
       "13                        t8-winner-dc3cda                t8   \n",
       "14          terminator2_jd-duration-79d04d    terminator2_jd   \n",
       "15             terminator2_jd-genre-2e6128    terminator2_jd   \n",
       "16             terminator2_jd-genre-bd938c    terminator2_jd   \n",
       "17       terminator2_jd-instance_of-d0607f    terminator2_jd   \n",
       "18             terminator2_jd-label-01de63    terminator2_jd   \n",
       "19  terminator2_jd-publication_date-6aeb53    terminator2_jd   \n",
       "20  terminator2_jd-publication_date-e29331    terminator2_jd   \n",
       "\n",
       "               label                       node2  \n",
       "0              label  \"Arnold Schwarzenegger\"@en  \n",
       "1        subclass_of              visual_artwork  \n",
       "2              label            \"instance of\"@en  \n",
       "3              label         \"Linda Hamilton\"@en  \n",
       "4           location               united_states  \n",
       "5           location                      sweden  \n",
       "6               cast            a_schwarzenegger  \n",
       "7               role                  terminator  \n",
       "8               cast                  l_hamilton  \n",
       "9               role                    s_connor  \n",
       "10             award  academy-best-sound-editing  \n",
       "11     point_in_time    ^1992-03-30T00:00:00Z/11  \n",
       "12            winner                   g_borders  \n",
       "13            winner                  g_rydstrom  \n",
       "14          duration                   108minute  \n",
       "15             genre             science_fiction  \n",
       "16             genre                      action  \n",
       "17       instance_of                        film  \n",
       "18             label           \"Terminator 2\"@en  \n",
       "19  publication_date    ^1985-02-08T00:00:00Z/11  \n",
       "20  publication_date    ^1984-10-26T00:00:00Z/11  "
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lines = !$kgtk sort -i \"$TEMP\"/movies.ids.tsv\n",
    "kgtk_to_dataframe(lines)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is nice to be able to see the labels of the nodes. We can use the `lift` command to lift the labels from rows to columns (it is possible to lift other relations too):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>node1</th>\n",
       "      <th>label</th>\n",
       "      <th>node2</th>\n",
       "      <th>node1;label</th>\n",
       "      <th>label;label</th>\n",
       "      <th>node2;label</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>terminator2_jd-instance_of-d0607f</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>instance_of</td>\n",
       "      <td>film</td>\n",
       "      <td>\"Terminator 2\"@en</td>\n",
       "      <td>\"instance of\"@en</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>terminator2_jd-genre-2e6128</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>genre</td>\n",
       "      <td>science_fiction</td>\n",
       "      <td>\"Terminator 2\"@en</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>terminator2_jd-genre-bd938c</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>genre</td>\n",
       "      <td>action</td>\n",
       "      <td>\"Terminator 2\"@en</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>t4</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>cast</td>\n",
       "      <td>a_schwarzenegger</td>\n",
       "      <td>\"Terminator 2\"@en</td>\n",
       "      <td></td>\n",
       "      <td>\"Arnold Schwarzenegger\"@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>t4-role-aa802f</td>\n",
       "      <td>t4</td>\n",
       "      <td>role</td>\n",
       "      <td>terminator</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>t6</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>cast</td>\n",
       "      <td>l_hamilton</td>\n",
       "      <td>\"Terminator 2\"@en</td>\n",
       "      <td></td>\n",
       "      <td>\"Linda Hamilton\"@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>t6-role-a29a51</td>\n",
       "      <td>t6</td>\n",
       "      <td>role</td>\n",
       "      <td>s_connor</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>t8</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>award</td>\n",
       "      <td>academy-best-sound-editing</td>\n",
       "      <td>\"Terminator 2\"@en</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>t8-point_in_time-370fac</td>\n",
       "      <td>t8</td>\n",
       "      <td>point_in_time</td>\n",
       "      <td>^1992-03-30T00:00:00Z/11</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>t8-winner-dc3cda</td>\n",
       "      <td>t8</td>\n",
       "      <td>winner</td>\n",
       "      <td>g_rydstrom</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>t8-winner-211455</td>\n",
       "      <td>t8</td>\n",
       "      <td>winner</td>\n",
       "      <td>g_borders</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>film-subclass_of-f126ab</td>\n",
       "      <td>film</td>\n",
       "      <td>subclass_of</td>\n",
       "      <td>visual_artwork</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>terminator2_jd-publication_date-e29331</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>publication_date</td>\n",
       "      <td>^1984-10-26T00:00:00Z/11</td>\n",
       "      <td>\"Terminator 2\"@en</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>t15-location-303f2a</td>\n",
       "      <td>t15</td>\n",
       "      <td>location</td>\n",
       "      <td>united_states</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>terminator2_jd-publication_date-6aeb53</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>publication_date</td>\n",
       "      <td>^1985-02-08T00:00:00Z/11</td>\n",
       "      <td>\"Terminator 2\"@en</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>t17-location-295099</td>\n",
       "      <td>t17</td>\n",
       "      <td>location</td>\n",
       "      <td>sweden</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>terminator2_jd-duration-79d04d</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>duration</td>\n",
       "      <td>108minute</td>\n",
       "      <td>\"Terminator 2\"@en</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                        id           node1             label  \\\n",
       "0        terminator2_jd-instance_of-d0607f  terminator2_jd       instance_of   \n",
       "1              terminator2_jd-genre-2e6128  terminator2_jd             genre   \n",
       "2              terminator2_jd-genre-bd938c  terminator2_jd             genre   \n",
       "3                                       t4  terminator2_jd              cast   \n",
       "4                           t4-role-aa802f              t4              role   \n",
       "5                                       t6  terminator2_jd              cast   \n",
       "6                           t6-role-a29a51              t6              role   \n",
       "7                                       t8  terminator2_jd             award   \n",
       "8                  t8-point_in_time-370fac              t8     point_in_time   \n",
       "9                         t8-winner-dc3cda              t8            winner   \n",
       "10                        t8-winner-211455              t8            winner   \n",
       "11                 film-subclass_of-f126ab            film       subclass_of   \n",
       "12  terminator2_jd-publication_date-e29331  terminator2_jd  publication_date   \n",
       "13                     t15-location-303f2a             t15          location   \n",
       "14  terminator2_jd-publication_date-6aeb53  terminator2_jd  publication_date   \n",
       "15                     t17-location-295099             t17          location   \n",
       "16          terminator2_jd-duration-79d04d  terminator2_jd          duration   \n",
       "\n",
       "                         node2        node1;label       label;label  \\\n",
       "0                         film  \"Terminator 2\"@en  \"instance of\"@en   \n",
       "1              science_fiction  \"Terminator 2\"@en                     \n",
       "2                       action  \"Terminator 2\"@en                     \n",
       "3             a_schwarzenegger  \"Terminator 2\"@en                     \n",
       "4                   terminator                                        \n",
       "5                   l_hamilton  \"Terminator 2\"@en                     \n",
       "6                     s_connor                                        \n",
       "7   academy-best-sound-editing  \"Terminator 2\"@en                     \n",
       "8     ^1992-03-30T00:00:00Z/11                                        \n",
       "9                   g_rydstrom                                        \n",
       "10                   g_borders                                        \n",
       "11              visual_artwork                                        \n",
       "12    ^1984-10-26T00:00:00Z/11  \"Terminator 2\"@en                     \n",
       "13               united_states                                        \n",
       "14    ^1985-02-08T00:00:00Z/11  \"Terminator 2\"@en                     \n",
       "15                      sweden                                        \n",
       "16                   108minute  \"Terminator 2\"@en                     \n",
       "\n",
       "                   node2;label  \n",
       "0                               \n",
       "1                               \n",
       "2                               \n",
       "3   \"Arnold Schwarzenegger\"@en  \n",
       "4                               \n",
       "5          \"Linda Hamilton\"@en  \n",
       "6                               \n",
       "7                               \n",
       "8                               \n",
       "9                               \n",
       "10                              \n",
       "11                              \n",
       "12                              \n",
       "13                              \n",
       "14                              \n",
       "15                              \n",
       "16                              "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lines = !$kgtk lift -i \"$TEMP\"/movies.ids.tsv \n",
    "kgtk_to_dataframe(lines)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
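    "The `filter` command is the KGTK equivalent of grep. The pattern has the form `node1;label;node2`; here we select the edges whose `label` is `cast` or `genre`:"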
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>node1</th>\n",
       "      <th>label</th>\n",
       "      <th>node2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>terminator2_jd-genre-2e6128</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>genre</td>\n",
       "      <td>science_fiction</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>terminator2_jd-genre-bd938c</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>genre</td>\n",
       "      <td>action</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>t4</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>cast</td>\n",
       "      <td>a_schwarzenegger</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>t6</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>cast</td>\n",
       "      <td>l_hamilton</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                            id           node1  label             node2\n",
       "0  terminator2_jd-genre-2e6128  terminator2_jd  genre   science_fiction\n",
       "1  terminator2_jd-genre-bd938c  terminator2_jd  genre            action\n",
       "2                           t4  terminator2_jd   cast  a_schwarzenegger\n",
       "3                           t6  terminator2_jd   cast        l_hamilton"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lines = !$kgtk filter -i \"$TEMP\"/movies.ids.tsv -p \";cast,genre;\"\n",
    "kgtk_to_dataframe(lines)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `filter` command also supports regular expressions. Here are the edges whose `node2` value contains `mi` followed, anywhere later, by `@en`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>node1</th>\n",
       "      <th>label</th>\n",
       "      <th>node2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>terminator2_jd-label-01de63</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>label</td>\n",
       "      <td>\"Terminator 2\"@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>l_hamilton-label-2b3667</td>\n",
       "      <td>l_hamilton</td>\n",
       "      <td>label</td>\n",
       "      <td>\"Linda Hamilton\"@en</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                            id           node1  label                node2\n",
       "0  terminator2_jd-label-01de63  terminator2_jd  label    \"Terminator 2\"@en\n",
       "1      l_hamilton-label-2b3667      l_hamilton  label  \"Linda Hamilton\"@en"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lines = !$kgtk filter -i \"$TEMP\"/movies.ids.tsv -p \";;mi.*@en\" --regex --match-type search\n",
    "kgtk_to_dataframe(lines)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `md` command makes it easy to convert the output to markdown:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "| id | node1 | label | node2 |\n",
      "| -- | -- | -- | -- |\n",
      "| terminator2_jd-genre-2e6128 | terminator2_jd | genre | science_fiction |\n",
      "| terminator2_jd-genre-bd938c | terminator2_jd | genre | action |\n",
      "| t4 | terminator2_jd | cast | a_schwarzenegger |\n",
      "| t6 | terminator2_jd | cast | l_hamilton |\n"
     ]
    }
   ],
   "source": [
    "!$kgtk filter -i \"$TEMP\"/movies.ids.tsv -p \";cast,genre;\" / md "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `cat` command has many output formats, so we can output CSV:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "id,node1,label,node2\n",
      "terminator2_jd-genre-2e6128,terminator2_jd,genre,science_fiction\n",
      "terminator2_jd-genre-bd938c,terminator2_jd,genre,action\n",
      "t4,terminator2_jd,cast,a_schwarzenegger\n",
      "t6,terminator2_jd,cast,l_hamilton\n"
     ]
    }
   ],
   "source": [
    "!$kgtk filter -i \"$TEMP\"/movies.ids.tsv -p \";cast,genre;\" / cat --output-format csv "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `cat` command can also output JSON (and several other formats):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[\n",
      "{\"id\":\"terminator2_jd-genre-2e6128\",\"node1\":\"terminator2_jd\",\"label\":\"genre\",\"node2\":\"science_fiction\"},\n",
      "{\"id\":\"terminator2_jd-genre-bd938c\",\"node1\":\"terminator2_jd\",\"label\":\"genre\",\"node2\":\"action\"},\n",
      "{\"id\":\"t4\",\"node1\":\"terminator2_jd\",\"label\":\"cast\",\"node2\":\"a_schwarzenegger\"},\n",
      "{\"id\":\"t6\",\"node1\":\"terminator2_jd\",\"label\":\"cast\",\"node2\":\"l_hamilton\"}\n",
      "]\n"
     ]
    }
   ],
   "source": [
    "!$kgtk filter -i \"$TEMP\"/movies.ids.tsv -p \";cast,genre;\" / cat --output-format json-map "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Remove the `id` and `label` columns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>node1</th>\n",
       "      <th>node2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>science_fiction</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>action</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>a_schwarzenegger</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>l_hamilton</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            node1             node2\n",
       "0  terminator2_jd   science_fiction\n",
       "1  terminator2_jd            action\n",
       "2  terminator2_jd  a_schwarzenegger\n",
       "3  terminator2_jd        l_hamilton"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lines = !$kgtk filter -i \"$TEMP\"/movies.ids.tsv -p \";cast,genre;\"  \\\n",
    "/ remove-columns -c id label\n",
    "kgtk_to_dataframe(lines)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In a single pipeline, remove the columns we don't want and then rename the remaining ones:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>movie_id</th>\n",
       "      <th>title</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>science_fiction</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>action</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>a_schwarzenegger</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>l_hamilton</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         movie_id             title\n",
       "0  terminator2_jd   science_fiction\n",
       "1  terminator2_jd            action\n",
       "2  terminator2_jd  a_schwarzenegger\n",
       "3  terminator2_jd        l_hamilton"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lines = !$kgtk filter -i \"$TEMP\"/movies.ids.tsv -p \";cast,genre;\"  \\\n",
    "/ remove-columns -c id label \\\n",
    "/ rename-columns --mode NONE --output-columns movie_id title \n",
    "\n",
    "kgtk_to_dataframe(lines)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Count how many times each distinct value occurs in the `label` column, and sort the result by count:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>node1</th>\n",
       "      <th>label</th>\n",
       "      <th>node2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>award</td>\n",
       "      <td>count</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>duration</td>\n",
       "      <td>count</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>instance_of</td>\n",
       "      <td>count</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>point_in_time</td>\n",
       "      <td>count</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>subclass_of</td>\n",
       "      <td>count</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>cast</td>\n",
       "      <td>count</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>genre</td>\n",
       "      <td>count</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>location</td>\n",
       "      <td>count</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>publication_date</td>\n",
       "      <td>count</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>role</td>\n",
       "      <td>count</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>winner</td>\n",
       "      <td>count</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>label</td>\n",
       "      <td>count</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               node1  label node2\n",
       "0              award  count     1\n",
       "1           duration  count     1\n",
       "2        instance_of  count     1\n",
       "3      point_in_time  count     1\n",
       "4        subclass_of  count     1\n",
       "5               cast  count     2\n",
       "6              genre  count     2\n",
       "7           location  count     2\n",
       "8   publication_date  count     2\n",
       "9               role  count     2\n",
       "10            winner  count     2\n",
       "11             label  count     4"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lines = !$kgtk unique -i \"$TEMP\"/movies.ids.tsv  --column label / sort -c node2\n",
    "\n",
    "kgtk_to_dataframe(lines)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Expand the structured literals into columns that hold their constituents, so that developers can use the parts without parsing the literals themselves:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>node1</th>\n",
       "      <th>label</th>\n",
       "      <th>node2</th>\n",
       "      <th>node2;kgtk:data_type</th>\n",
       "      <th>node2;kgtk:valid</th>\n",
       "      <th>node2;kgtk:list_len</th>\n",
       "      <th>node2;kgtk:number</th>\n",
       "      <th>node2;kgtk:low_tolerance</th>\n",
       "      <th>node2;kgtk:high_tolerance</th>\n",
       "      <th>...</th>\n",
       "      <th>node2;kgtk:units_node</th>\n",
       "      <th>node2;kgtk:text</th>\n",
       "      <th>node2;kgtk:language</th>\n",
       "      <th>node2;kgtk:language_suffix</th>\n",
       "      <th>node2;kgtk:latitude</th>\n",
       "      <th>node2;kgtk:longitude</th>\n",
       "      <th>node2;kgtk:date_and_time</th>\n",
       "      <th>node2;kgtk:precision</th>\n",
       "      <th>node2;kgtk:truth</th>\n",
       "      <th>node2;kgtk:symbol</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>terminator2_jd-label-01de63</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>label</td>\n",
       "      <td>\"Terminator 2\"@en</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>terminator2_jd-instance_of-d0607f</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>instance_of</td>\n",
       "      <td>film</td>\n",
       "      <td>symbol</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>film</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>terminator2_jd-genre-2e6128</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>genre</td>\n",
       "      <td>science_fiction</td>\n",
       "      <td>symbol</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>science_fiction</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>terminator2_jd-genre-bd938c</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>genre</td>\n",
       "      <td>action</td>\n",
       "      <td>symbol</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>action</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>t4</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>cast</td>\n",
       "      <td>a_schwarzenegger</td>\n",
       "      <td>symbol</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>a_schwarzenegger</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>t4-role-aa802f</td>\n",
       "      <td>t4</td>\n",
       "      <td>role</td>\n",
       "      <td>terminator</td>\n",
       "      <td>symbol</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>terminator</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>t6</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>cast</td>\n",
       "      <td>l_hamilton</td>\n",
       "      <td>symbol</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>l_hamilton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>t6-role-a29a51</td>\n",
       "      <td>t6</td>\n",
       "      <td>role</td>\n",
       "      <td>s_connor</td>\n",
       "      <td>symbol</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>s_connor</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>t8</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>award</td>\n",
       "      <td>academy-best-sound-editing</td>\n",
       "      <td>symbol</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>academy-best-sound-editing</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>t8-point_in_time-370fac</td>\n",
       "      <td>t8</td>\n",
       "      <td>point_in_time</td>\n",
       "      <td>^1992-03-30T00:00:00Z/11</td>\n",
       "      <td>date_and_times</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>\"1992-03-30T00:00:00Z\"</td>\n",
       "      <td>11</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>t8-winner-dc3cda</td>\n",
       "      <td>t8</td>\n",
       "      <td>winner</td>\n",
       "      <td>g_rydstrom</td>\n",
       "      <td>symbol</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>g_rydstrom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>t8-winner-211455</td>\n",
       "      <td>t8</td>\n",
       "      <td>winner</td>\n",
       "      <td>g_borders</td>\n",
       "      <td>symbol</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>g_borders</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>l_hamilton-label-2b3667</td>\n",
       "      <td>l_hamilton</td>\n",
       "      <td>label</td>\n",
       "      <td>\"Linda Hamilton\"@en</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>a_schwarzenegger-label-2a4c28</td>\n",
       "      <td>a_schwarzenegger</td>\n",
       "      <td>label</td>\n",
       "      <td>\"Arnold Schwarzenegger\"@en</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>film-subclass_of-f126ab</td>\n",
       "      <td>film</td>\n",
       "      <td>subclass_of</td>\n",
       "      <td>visual_artwork</td>\n",
       "      <td>symbol</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>visual_artwork</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>terminator2_jd-publication_date-e29331</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>publication_date</td>\n",
       "      <td>^1984-10-26T00:00:00Z/11</td>\n",
       "      <td>date_and_times</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>\"1984-10-26T00:00:00Z\"</td>\n",
       "      <td>11</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>t15-location-303f2a</td>\n",
       "      <td>t15</td>\n",
       "      <td>location</td>\n",
       "      <td>united_states</td>\n",
       "      <td>symbol</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>united_states</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>terminator2_jd-publication_date-6aeb53</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>publication_date</td>\n",
       "      <td>^1985-02-08T00:00:00Z/11</td>\n",
       "      <td>date_and_times</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>\"1985-02-08T00:00:00Z\"</td>\n",
       "      <td>11</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>t17-location-295099</td>\n",
       "      <td>t17</td>\n",
       "      <td>location</td>\n",
       "      <td>sweden</td>\n",
       "      <td>symbol</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>sweden</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>terminator2_jd-duration-79d04d</td>\n",
       "      <td>terminator2_jd</td>\n",
       "      <td>duration</td>\n",
       "      <td>108minute</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>instance_of-label-0e46af</td>\n",
       "      <td>instance_of</td>\n",
       "      <td>label</td>\n",
       "      <td>\"instance of\"@en</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>...</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>21 rows × 21 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                        id             node1  \\\n",
       "0              terminator2_jd-label-01de63    terminator2_jd   \n",
       "1        terminator2_jd-instance_of-d0607f    terminator2_jd   \n",
       "2              terminator2_jd-genre-2e6128    terminator2_jd   \n",
       "3              terminator2_jd-genre-bd938c    terminator2_jd   \n",
       "4                                       t4    terminator2_jd   \n",
       "5                           t4-role-aa802f                t4   \n",
       "6                                       t6    terminator2_jd   \n",
       "7                           t6-role-a29a51                t6   \n",
       "8                                       t8    terminator2_jd   \n",
       "9                  t8-point_in_time-370fac                t8   \n",
       "10                        t8-winner-dc3cda                t8   \n",
       "11                        t8-winner-211455                t8   \n",
       "12                 l_hamilton-label-2b3667        l_hamilton   \n",
       "13           a_schwarzenegger-label-2a4c28  a_schwarzenegger   \n",
       "14                 film-subclass_of-f126ab              film   \n",
       "15  terminator2_jd-publication_date-e29331    terminator2_jd   \n",
       "16                     t15-location-303f2a               t15   \n",
       "17  terminator2_jd-publication_date-6aeb53    terminator2_jd   \n",
       "18                     t17-location-295099               t17   \n",
       "19          terminator2_jd-duration-79d04d    terminator2_jd   \n",
       "20                instance_of-label-0e46af       instance_of   \n",
       "\n",
       "               label                       node2 node2;kgtk:data_type  \\\n",
       "0              label           \"Terminator 2\"@en                        \n",
       "1        instance_of                        film               symbol   \n",
       "2              genre             science_fiction               symbol   \n",
       "3              genre                      action               symbol   \n",
       "4               cast            a_schwarzenegger               symbol   \n",
       "5               role                  terminator               symbol   \n",
       "6               cast                  l_hamilton               symbol   \n",
       "7               role                    s_connor               symbol   \n",
       "8              award  academy-best-sound-editing               symbol   \n",
       "9      point_in_time    ^1992-03-30T00:00:00Z/11       date_and_times   \n",
       "10            winner                  g_rydstrom               symbol   \n",
       "11            winner                   g_borders               symbol   \n",
       "12             label         \"Linda Hamilton\"@en                        \n",
       "13             label  \"Arnold Schwarzenegger\"@en                        \n",
       "14       subclass_of              visual_artwork               symbol   \n",
       "15  publication_date    ^1984-10-26T00:00:00Z/11       date_and_times   \n",
       "16          location               united_states               symbol   \n",
       "17  publication_date    ^1985-02-08T00:00:00Z/11       date_and_times   \n",
       "18          location                      sweden               symbol   \n",
       "19          duration                   108minute                        \n",
       "20             label            \"instance of\"@en                        \n",
       "\n",
       "   node2;kgtk:valid node2;kgtk:list_len node2;kgtk:number  \\\n",
       "0                                                           \n",
       "1              True                   0                     \n",
       "2              True                   0                     \n",
       "3              True                   0                     \n",
       "4              True                   0                     \n",
       "5              True                   0                     \n",
       "6              True                   0                     \n",
       "7              True                   0                     \n",
       "8              True                   0                     \n",
       "9              True                   0                     \n",
       "10             True                   0                     \n",
       "11             True                   0                     \n",
       "12                                                          \n",
       "13                                                          \n",
       "14             True                   0                     \n",
       "15             True                   0                     \n",
       "16             True                   0                     \n",
       "17             True                   0                     \n",
       "18             True                   0                     \n",
       "19                                                          \n",
       "20                                                          \n",
       "\n",
       "   node2;kgtk:low_tolerance node2;kgtk:high_tolerance  ...  \\\n",
       "0                                                      ...   \n",
       "1                                                      ...   \n",
       "2                                                      ...   \n",
       "3                                                      ...   \n",
       "4                                                      ...   \n",
       "5                                                      ...   \n",
       "6                                                      ...   \n",
       "7                                                      ...   \n",
       "8                                                      ...   \n",
       "9                                                      ...   \n",
       "10                                                     ...   \n",
       "11                                                     ...   \n",
       "12                                                     ...   \n",
       "13                                                     ...   \n",
       "14                                                     ...   \n",
       "15                                                     ...   \n",
       "16                                                     ...   \n",
       "17                                                     ...   \n",
       "18                                                     ...   \n",
       "19                                                     ...   \n",
       "20                                                     ...   \n",
       "\n",
       "   node2;kgtk:units_node node2;kgtk:text node2;kgtk:language  \\\n",
       "0                                                              \n",
       "1                                                              \n",
       "2                                                              \n",
       "3                                                              \n",
       "4                                                              \n",
       "5                                                              \n",
       "6                                                              \n",
       "7                                                              \n",
       "8                                                              \n",
       "9                                                              \n",
       "10                                                             \n",
       "11                                                             \n",
       "12                                                             \n",
       "13                                                             \n",
       "14                                                             \n",
       "15                                                             \n",
       "16                                                             \n",
       "17                                                             \n",
       "18                                                             \n",
       "19                                                             \n",
       "20                                                             \n",
       "\n",
       "   node2;kgtk:language_suffix node2;kgtk:latitude node2;kgtk:longitude  \\\n",
       "0                                                                        \n",
       "1                                                                        \n",
       "2                                                                        \n",
       "3                                                                        \n",
       "4                                                                        \n",
       "5                                                                        \n",
       "6                                                                        \n",
       "7                                                                        \n",
       "8                                                                        \n",
       "9                                                                        \n",
       "10                                                                       \n",
       "11                                                                       \n",
       "12                                                                       \n",
       "13                                                                       \n",
       "14                                                                       \n",
       "15                                                                       \n",
       "16                                                                       \n",
       "17                                                                       \n",
       "18                                                                       \n",
       "19                                                                       \n",
       "20                                                                       \n",
       "\n",
       "   node2;kgtk:date_and_time node2;kgtk:precision node2;kgtk:truth  \\\n",
       "0                                                                   \n",
       "1                                                                   \n",
       "2                                                                   \n",
       "3                                                                   \n",
       "4                                                                   \n",
       "5                                                                   \n",
       "6                                                                   \n",
       "7                                                                   \n",
       "8                                                                   \n",
       "9    \"1992-03-30T00:00:00Z\"                   11                    \n",
       "10                                                                  \n",
       "11                                                                  \n",
       "12                                                                  \n",
       "13                                                                  \n",
       "14                                                                  \n",
       "15   \"1984-10-26T00:00:00Z\"                   11                    \n",
       "16                                                                  \n",
       "17   \"1985-02-08T00:00:00Z\"                   11                    \n",
       "18                                                                  \n",
       "19                                                                  \n",
       "20                                                                  \n",
       "\n",
       "             node2;kgtk:symbol  \n",
       "0                               \n",
       "1                         film  \n",
       "2              science_fiction  \n",
       "3                       action  \n",
       "4             a_schwarzenegger  \n",
       "5                   terminator  \n",
       "6                   l_hamilton  \n",
       "7                     s_connor  \n",
       "8   academy-best-sound-editing  \n",
       "9                               \n",
       "10                  g_rydstrom  \n",
       "11                   g_borders  \n",
       "12                              \n",
       "13                              \n",
       "14              visual_artwork  \n",
       "15                              \n",
       "16               united_states  \n",
       "17                              \n",
       "18                      sweden  \n",
       "19                              \n",
       "20                              \n",
       "\n",
       "[21 rows x 21 columns]"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lines = !$kgtk explode -i \"$TEMP\"/movies.ids.tsv\n",
    "kgtk_to_dataframe(lines)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Wikidata in KGTK\n",
    "KGTK has the ability to import a Wikidata JSON dump and convert it to the KGTK representation, making it easy to process the full Wikidata KG on a laptop. There are 86 files, which include all the information available in the Wikidata dump as well as files containing commonly used information derived from the dump. We partitioned the files because most use cases need only a subset of them.\n",
    "\n",
    "The files are very large. `claims.tsv` (23GB compressed) contains all the statements in the Wikidata dump, `qualifiers.tsv` contains the qualifiers of those edges, and `labels.en.tsv`, `aliases.en.tsv` and `descriptions.en.tsv` contain the English labels, aliases and descriptions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "-rw-r--r--  1 pedroszekely  staff    32M Jan 24 00:32 /Users/pedroszekely/Downloads/kgtk-tutorial/miniwikidata/aliases.en.tsv.gz\n",
      "-rw-r--r--  1 pedroszekely  staff   1.7G Jan 24 00:30 /Users/pedroszekely/Downloads/kgtk-tutorial/miniwikidata/claims.tsv.gz\n",
      "-rw-r--r--  1 pedroszekely  staff   122M Jan 24 00:33 /Users/pedroszekely/Downloads/kgtk-tutorial/miniwikidata/descriptions.en.tsv.gz\n",
      "-rw-r--r--  1 pedroszekely  staff   167M Jan 24 00:35 /Users/pedroszekely/Downloads/kgtk-tutorial/miniwikidata/labels.en.tsv.gz\n",
      "-rw-r--r--  1 pedroszekely  staff   264M Jan 24 00:32 /Users/pedroszekely/Downloads/kgtk-tutorial/miniwikidata/qualifiers.tsv.gz\n"
     ]
    }
   ],
   "source": [
    "!ls -lh \"$CLAIMS\" \"$QUALIFIERS\" \"$LABEL\" \"$ALIAS\" \"$DESCRIPTION\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
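    "`claims.tsv` contains one edge per line after the header. Counting lines with `wc` shows about 94 million edges in the copy used here:"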
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " 94123796 587802328 7562639743\n"
     ]
    }
   ],
   "source": [
    "!zcat < \"$CLAIMS\" | wc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# KGTK Data Model\n",
    "The KGTK data model is a generalization of RDF and property graphs, inspired by the Wikidata data model. In KGTK, a KG is represented using TSV files with four columns: three columns to store the subject, predicate and object of a triple, and a fourth column to store an identifier for the triple. By convention, we use the heading `id` for the identifier, `node1` for the subject, `node2` for the object and `label` for the predicate, as it labels the edge between `node1` and `node2`. The order of the columns is arbitrary.\n",
    "\n",
    "All KGTK files must include the required `id`, `node1`, `label` and `node2` columns, and can contain additional columns to store additional information about an edge or the nodes in the edge. We will explain the details after we discuss *qualifiers*.\n",
    "Let's take a look at the first few lines of the `claims.tsv` file. We see the four required columns and two additional columns that the Wikidata import includes to facilitate processing of the `claims` file using custom scripts. The `rank` column records the Wikidata rank of a statement, and the `node2;wikidatatype` column records the Wikidata type of the value in the `node2` column."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Claims"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "zcat: error writing to output: Broken pipe\n",
      "id                              node1  label  node2                                    node2;wikidatatype  rank\n",
      "P10-P1628-32b85d-7927ece6-0     P10    P1628  \"http://www.w3.org/2006/vcard/ns#Video\"  url                 normal\n",
      "P10-P1628-acf60d-b8950832-0     P10    P1628  \"https://schema.org/video\"               url                 normal\n",
      "P10-P1629-Q34508-bcc39400-0     P10    P1629  Q34508                                   wikibase-item       normal\n",
      "P10-P1659-P1651-c4068028-0      P10    P1659  P1651                                    wikibase-property   normal\n",
      "P10-P1659-P18-5e4b9c4f-0        P10    P1659  P18                                      wikibase-property   normal\n",
      "P10-P1659-P4238-d21d1ac0-0      P10    P1659  P4238                                    wikibase-property   normal\n",
      "P10-P1659-P51-86aca4c5-0        P10    P1659  P51                                      wikibase-property   normal\n",
      "P10-P1855-Q7378-555592a4-0      P10    P1855  Q7378                                    wikibase-item       normal\n",
      "P10-P2302-Q21502404-d012aef4-0  P10    P2302  Q21502404                                wikibase-item       normal\n"
     ]
    }
   ],
   "source": [
    "!zcat < \"$CLAIMS\" | head | column -t -s $'\\t'"
   ]
  },
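  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the column order is arbitrary, scripts that process KGTK files are safer reading columns by header name than by position. The following is a minimal sketch using only the Python standard library, not a KGTK API: `read_edges` and `REQUIRED` are hypothetical names introduced for illustration, and the two sample rows are copied from the output above with the columns deliberately reordered.\n",
    "\n",
    "```python\n",
    "import csv\n",
    "import io\n",
    "\n",
    "# The four columns every KGTK edge file must provide (hypothetical constant name).\n",
    "REQUIRED = {'id', 'node1', 'label', 'node2'}\n",
    "\n",
    "\n",
    "def read_edges(text):\n",
    "    # Parse KGTK-style TSV text into a list of dicts keyed by column name.\n",
    "    # QUOTE_NONE keeps the literal double quotes that KGTK string values carry.\n",
    "    reader = csv.DictReader(io.StringIO(text), delimiter='\\t', quoting=csv.QUOTE_NONE)\n",
    "    missing = REQUIRED - set(reader.fieldnames or [])\n",
    "    if missing:\n",
    "        raise ValueError(f'missing required columns: {sorted(missing)}')\n",
    "    return list(reader)\n",
    "\n",
    "\n",
    "# Two edges from the claims output, with the columns in a different order.\n",
    "header = ['node1', 'label', 'node2', 'id', 'rank']\n",
    "rows = [\n",
    "    ['P10', 'P1629', 'Q34508', 'P10-P1629-Q34508-bcc39400-0', 'normal'],\n",
    "    ['P10', 'P1659', 'P18', 'P10-P1659-P18-5e4b9c4f-0', 'normal'],\n",
    "]\n",
    "text = '\\n'.join('\\t'.join(row) for row in [header] + rows)\n",
    "\n",
    "edges = read_edges(text)\n",
    "print(edges[0]['id'], edges[0]['node2'])\n",
    "```\n",
    "\n",
    "Disabling quote handling is a deliberate choice in this sketch: with the `csv` module's default settings, a value such as `\"https://schema.org/video\"` would lose its surrounding double quotes, which are part of the value in the file."
   ]
  },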
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Wikidata uses numbers to identify items and properties. We can use the `wd` utility (https://github.com/maxlath/wikibase-cli) to understand the first few lines. The second line states that the `P10` property in Wikidata has an equivalent property in another ontology. Notice that each edge has a distinct id. These ids are unique identifiers for statements. The format of the id can be arbitrary, but we assigned ids so that sorting a file by id places all edges about a subject on consecutive lines."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[90mid\u001b[39m P10\n",
      "\u001b[42mLabel\u001b[49m video\n",
      "\u001b[44mDescription\u001b[49m relevant video. For images, use the property P18. For film trailers, qualify with \"object has role\" (P3831)=\"trailer\" (Q622550)\n",
      "\u001b[30m\u001b[47minstance of\u001b[49m\u001b[39m \u001b[90m(P31)\u001b[39m\u001b[90m: \u001b[39mWikidata property to link to Commons \u001b[90m(Q18610173)\u001b[39m\n",
      "\n",
      "\u001b[90mid\u001b[39m P1628\n",
      "\u001b[42mLabel\u001b[49m equivalent property\n",
      "\u001b[44mDescription\u001b[49m equivalent property in other ontologies (use in statements on properties, use property URI)\n",
      "\u001b[30m\u001b[47minstance of\u001b[49m\u001b[39m \u001b[90m(P31)\u001b[39m\u001b[90m: \u001b[39mWikidata metaproperty for ontology mapping \u001b[90m(Q42842547)\u001b[39m\n",
      "\n",
      "\u001b[90mid\u001b[39m P1629\n",
      "\u001b[42mLabel\u001b[49m subject item of this property\n",
      "\u001b[44mDescription\u001b[49m relationship represented by the property\n",
      "\u001b[30m\u001b[47minstance of\u001b[49m\u001b[39m \u001b[90m(P31)\u001b[39m\u001b[90m: \u001b[39mWikidata property for property documentation \u001b[90m(Q19820110)\u001b[39m\n"
     ]
    }
   ],
   "source": [
    "!wd u P10 P1628 P1629"
   ]
  },
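  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see why ids of this shape keep the edges of a subject together, here is a small sketch in plain Python. It assumes, based only on the ids shown in this notebook, that every id starts with the subject followed by `-`. The helper `subject_of` is a name made up for this illustration and is not part of KGTK. Sorting the ids as strings places all ids that share a subject prefix next to each other."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from itertools import groupby\n",
    "\n",
    "# Edge ids copied from outputs in this notebook (three for P10, two for Q31), deliberately shuffled.\n",
    "edge_ids = [\n",
    "    'Q31-P1036-c4e1ad-df86eeb8-0',\n",
    "    'P10-P1659-P18-5e4b9c4f-0',\n",
    "    'Q31-P1081-02c2ed-033524b0-0',\n",
    "    'P10-P1628-acf60d-b8950832-0',\n",
    "    'P10-P1629-Q34508-bcc39400-0',\n",
    "]\n",
    "\n",
    "def subject_of(edge_id):\n",
    "    # Assumption: the id begins with the subject, followed by '-'.\n",
    "    return edge_id.split('-', 1)[0]\n",
    "\n",
    "# After a plain string sort, ids with the same subject are consecutive,\n",
    "# so groupby sees each subject exactly once.\n",
    "for subject, ids in groupby(sorted(edge_ids), key=subject_of):\n",
    "    print(subject, list(ids))"
   ]
  },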
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's look at a more meaningful example. `Q31` (https://www.wikidata.org/wiki/Q31) is the Wikidata item about Belgium. We will use a KGTK query to fetch edges about Belgium. `$kypher` is a shortcut for the `kgtk query` command that also passes in the location of the SQLite database we are using to store the files. KGTK queries use Cypher syntax (https://neo4j.com/developer/cypher/): the following simple query retrieves 10 edges where `node1` is `Q31`, the q-node for Belgium. The results include an edge with `label` `P1036` (Dewey Decimal Classification) and several edges with `label` `P1081` (Human Development Index).\n",
    "\n",
    "**Note:** We are using the `--as` option in `kgtk query` to set an alias for the `$CLAIMS` file. This alias can be used in subsequent `kgtk query` commands."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>node1</th>\n",
       "      <th>label</th>\n",
       "      <th>node2</th>\n",
       "      <th>node2;wikidatatype</th>\n",
       "      <th>rank</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Q31-P1036-c4e1ad-df86eeb8-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1036</td>\n",
       "      <td>\"2--493\"</td>\n",
       "      <td>external-id</td>\n",
       "      <td>normal</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Q31-P1081-02c2ed-033524b0-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1081</td>\n",
       "      <td>+0.866</td>\n",
       "      <td>quantity</td>\n",
       "      <td>normal</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Q31-P1081-02c2ed-7971505b-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1081</td>\n",
       "      <td>+0.866</td>\n",
       "      <td>quantity</td>\n",
       "      <td>normal</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Q31-P1081-068470-c1c63b8d-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1081</td>\n",
       "      <td>+0.889</td>\n",
       "      <td>quantity</td>\n",
       "      <td>normal</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Q31-P1081-068470-ddac01e0-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1081</td>\n",
       "      <td>+0.889</td>\n",
       "      <td>quantity</td>\n",
       "      <td>normal</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Q31-P1081-144738-c1851cdc-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1081</td>\n",
       "      <td>+0.905</td>\n",
       "      <td>quantity</td>\n",
       "      <td>normal</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Q31-P1081-175742-c07ac1c8-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1081</td>\n",
       "      <td>+0.888</td>\n",
       "      <td>quantity</td>\n",
       "      <td>normal</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Q31-P1081-19636d-c08dd8a8-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1081</td>\n",
       "      <td>+0.896</td>\n",
       "      <td>quantity</td>\n",
       "      <td>normal</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Q31-P1081-1efc03-433a7a4d-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1081</td>\n",
       "      <td>+0.913</td>\n",
       "      <td>quantity</td>\n",
       "      <td>normal</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Q31-P1081-1f8602-ddac530d-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1081</td>\n",
       "      <td>+0.852</td>\n",
       "      <td>quantity</td>\n",
       "      <td>normal</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                            id node1  label     node2 node2;wikidatatype  \\\n",
       "0  Q31-P1036-c4e1ad-df86eeb8-0   Q31  P1036  \"2--493\"        external-id   \n",
       "1  Q31-P1081-02c2ed-033524b0-0   Q31  P1081    +0.866           quantity   \n",
       "2  Q31-P1081-02c2ed-7971505b-0   Q31  P1081    +0.866           quantity   \n",
       "3  Q31-P1081-068470-c1c63b8d-0   Q31  P1081    +0.889           quantity   \n",
       "4  Q31-P1081-068470-ddac01e0-0   Q31  P1081    +0.889           quantity   \n",
       "5  Q31-P1081-144738-c1851cdc-0   Q31  P1081    +0.905           quantity   \n",
       "6  Q31-P1081-175742-c07ac1c8-0   Q31  P1081    +0.888           quantity   \n",
       "7  Q31-P1081-19636d-c08dd8a8-0   Q31  P1081    +0.896           quantity   \n",
       "8  Q31-P1081-1efc03-433a7a4d-0   Q31  P1081    +0.913           quantity   \n",
       "9  Q31-P1081-1f8602-ddac530d-0   Q31  P1081    +0.852           quantity   \n",
       "\n",
       "     rank  \n",
       "0  normal  \n",
       "1  normal  \n",
       "2  normal  \n",
       "3  normal  \n",
       "4  normal  \n",
       "5  normal  \n",
       "6  normal  \n",
       "7  normal  \n",
       "8  normal  \n",
       "9  normal  "
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = !$kypher -i \"$CLAIMS\" --as \"claims\" \\\n",
    "--match '(:Q31)-[]->()' \\\n",
    "--limit 10 \n",
    "\n",
    "kgtk_to_dataframe(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The output of the command above is hard to read because we are seeing the numeric Wikidata identifiers. To make the output more readable, we need to look up the labels of the Wikidata nodes. This information is in the `labels.en.tsv` file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "id              node1  label  node2                                     node2;wikidatatype  rank\n",
      "P10-label-en    P10    label  'video'@en\n",
      "P1000-label-en  P1000  label  'record held'@en\n",
      "P1001-label-en  P1001  label  'applies to jurisdiction'@en\n",
      "P1002-label-en  P1002  label  'engine configuration'@en\n",
      "P1003-label-en  P1003  label  'National Library of Romania ID'@en\n",
      "P1004-label-en  P1004  label  'MusicBrainz place ID'@en\n",
      "P1005-label-en  P1005  label  'Portuguese National Library ID'@en\n",
      "P1006-label-en  P1006  label  'Nationale Thesaurus voor Auteurs ID'@en\n",
      "P1007-label-en  P1007  label  'Lattes Platform number'@en\n",
      "zcat: error writing to output: Broken pipe\n"
     ]
    }
   ],
   "source": [
    "!zcat < \"$LABEL\" | head | column -t -s $'\\t'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "KGTK accepts multiple files as input and can do a join to retrieve the label for each property. When using multiple files, it is necessary to tag each clause with the file that provides the data for the clause. For example, the first clause is tagged with `claim` because the word `claim` is part of the file name. The variable `property` connects the two clauses.\n",
    "\n",
    "**Note:** We use the alias `claims` defined in a previous cell and introduce a new alias, `labels`, for the `$LABEL` file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>node1</th>\n",
       "      <th>label</th>\n",
       "      <th>node2</th>\n",
       "      <th>label;label</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Q31-P1036-c4e1ad-df86eeb8-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1036</td>\n",
       "      <td>\"2--493\"</td>\n",
       "      <td>'Dewey Decimal Classification'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Q31-P1081-02c2ed-033524b0-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1081</td>\n",
       "      <td>+0.866</td>\n",
       "      <td>'Human Development Index'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Q31-P1081-02c2ed-7971505b-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1081</td>\n",
       "      <td>+0.866</td>\n",
       "      <td>'Human Development Index'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Q31-P1081-068470-c1c63b8d-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1081</td>\n",
       "      <td>+0.889</td>\n",
       "      <td>'Human Development Index'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Q31-P1081-068470-ddac01e0-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1081</td>\n",
       "      <td>+0.889</td>\n",
       "      <td>'Human Development Index'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Q31-P1081-144738-c1851cdc-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1081</td>\n",
       "      <td>+0.905</td>\n",
       "      <td>'Human Development Index'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Q31-P1081-175742-c07ac1c8-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1081</td>\n",
       "      <td>+0.888</td>\n",
       "      <td>'Human Development Index'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Q31-P1081-19636d-c08dd8a8-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1081</td>\n",
       "      <td>+0.896</td>\n",
       "      <td>'Human Development Index'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Q31-P1081-1efc03-433a7a4d-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1081</td>\n",
       "      <td>+0.913</td>\n",
       "      <td>'Human Development Index'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Q31-P1081-1f8602-ddac530d-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1081</td>\n",
       "      <td>+0.852</td>\n",
       "      <td>'Human Development Index'@en</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                            id node1  label     node2  \\\n",
       "0  Q31-P1036-c4e1ad-df86eeb8-0   Q31  P1036  \"2--493\"   \n",
       "1  Q31-P1081-02c2ed-033524b0-0   Q31  P1081    +0.866   \n",
       "2  Q31-P1081-02c2ed-7971505b-0   Q31  P1081    +0.866   \n",
       "3  Q31-P1081-068470-c1c63b8d-0   Q31  P1081    +0.889   \n",
       "4  Q31-P1081-068470-ddac01e0-0   Q31  P1081    +0.889   \n",
       "5  Q31-P1081-144738-c1851cdc-0   Q31  P1081    +0.905   \n",
       "6  Q31-P1081-175742-c07ac1c8-0   Q31  P1081    +0.888   \n",
       "7  Q31-P1081-19636d-c08dd8a8-0   Q31  P1081    +0.896   \n",
       "8  Q31-P1081-1efc03-433a7a4d-0   Q31  P1081    +0.913   \n",
       "9  Q31-P1081-1f8602-ddac530d-0   Q31  P1081    +0.852   \n",
       "\n",
       "                         label;label  \n",
       "0  'Dewey Decimal Classification'@en  \n",
       "1       'Human Development Index'@en  \n",
       "2       'Human Development Index'@en  \n",
       "3       'Human Development Index'@en  \n",
       "4       'Human Development Index'@en  \n",
       "5       'Human Development Index'@en  \n",
       "6       'Human Development Index'@en  \n",
       "7       'Human Development Index'@en  \n",
       "8       'Human Development Index'@en  \n",
       "9       'Human Development Index'@en  "
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = !$kypher -i claims -i \"$LABEL\" --as \"labels\" \\\n",
    "--match 'claim: (n1:Q31)-[l {label: property}]->(n2), label: (property)-[:label]->(property_label)' \\\n",
    "--return 'l as id, n1 as node1, property as label, n2 as node2, property_label as `label;label`' \\\n",
    "--limit 10 \n",
    "\n",
    "kgtk_to_dataframe(result)"
   ]
  },
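  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two-clause query above can be pictured as a lookup join on the shared variable `property`. The following is a conceptual sketch in plain Python, using a few rows copied from outputs in this notebook. It only illustrates the idea and is not how KGTK evaluates the query. The names `claims_rows`, `label_rows`, `label_of` and `joined` are made up for this illustration, and the label rows omit the `id` column for brevity."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# (id, node1, label, node2) rows, as in the claims file\n",
    "claims_rows = [\n",
    "    ('Q31-P1036-c4e1ad-df86eeb8-0', 'Q31', 'P1036', '\"2--493\"'),\n",
    "    ('Q31-P1081-02c2ed-033524b0-0', 'Q31', 'P1081', '+0.866'),\n",
    "    ('P10-P1629-Q34508-bcc39400-0', 'P10', 'P1629', 'Q34508'),\n",
    "]\n",
    "\n",
    "# (node1, label, node2) rows, as in the labels file\n",
    "label_rows = [\n",
    "    ('P1036', 'label', \"'Dewey Decimal Classification'@en\"),\n",
    "    ('P1081', 'label', \"'Human Development Index'@en\"),\n",
    "    ('P10', 'label', \"'video'@en\"),\n",
    "]\n",
    "\n",
    "# Second clause: (property)-[:label]->(property_label)\n",
    "label_of = {node1: node2 for node1, label, node2 in label_rows if label == 'label'}\n",
    "\n",
    "# First clause: (n1:Q31)-[l {label: property}]->(n2).\n",
    "# The shared variable `property` links each claim row to its label row.\n",
    "joined = [\n",
    "    (edge_id, node1, prop, node2, label_of[prop])\n",
    "    for edge_id, node1, prop, node2 in claims_rows\n",
    "    if node1 == 'Q31' and prop in label_of\n",
    "]\n",
    "\n",
    "for row in joined:\n",
    "    print('\\t'.join(row))"
   ]
  },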
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get all the distinct properties defined for Belgium"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>label;label</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>P1036</td>\n",
       "      <td>'Dewey Decimal Classification'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>P1081</td>\n",
       "      <td>'Human Development Index'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>P1082</td>\n",
       "      <td>'population'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>P1151</td>\n",
       "      <td>'topic\\\\'s main Wikimedia portal'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>P1198</td>\n",
       "      <td>'unemployment rate'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>205</th>\n",
       "      <td>P949</td>\n",
       "      <td>'National Library of Israel ID'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>206</th>\n",
       "      <td>P982</td>\n",
       "      <td>'MusicBrainz area ID'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>207</th>\n",
       "      <td>P984</td>\n",
       "      <td>'IOC country code'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>208</th>\n",
       "      <td>P989</td>\n",
       "      <td>'spoken text audio'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>209</th>\n",
       "      <td>P998</td>\n",
       "      <td>'DMOZ ID'@en</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>210 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     label                           label;label\n",
       "0    P1036     'Dewey Decimal Classification'@en\n",
       "1    P1081          'Human Development Index'@en\n",
       "2    P1082                       'population'@en\n",
       "3    P1151  'topic\\\\'s main Wikimedia portal'@en\n",
       "4    P1198                'unemployment rate'@en\n",
       "..     ...                                   ...\n",
       "205   P949    'National Library of Israel ID'@en\n",
       "206   P982              'MusicBrainz area ID'@en\n",
       "207   P984                 'IOC country code'@en\n",
       "208   P989                'spoken text audio'@en\n",
       "209   P998                          'DMOZ ID'@en\n",
       "\n",
       "[210 rows x 2 columns]"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = !$kypher -i claims -i \"$LABEL\" --as \"labels\" \\\n",
    "--match 'claim: (n1:Q31)-[l {label: property}]->(n2), label: (property)-[:label]->(property_label)' \\\n",
    "--return 'distinct property as label, property_label as `label;label`' \n",
    "\n",
    "kgtk_to_dataframe(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's look at the classes that Belgium is an instance of, recorded in property `P31`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>node1</th>\n",
       "      <th>label</th>\n",
       "      <th>node2</th>\n",
       "      <th>node2;label</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Q31-P31-Q1250464-7c4e239d-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P31</td>\n",
       "      <td>Q1250464</td>\n",
       "      <td>'realm'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Q31-P31-Q185441-58d7de2e-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P31</td>\n",
       "      <td>Q185441</td>\n",
       "      <td>'member state of the European Union'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Q31-P31-Q20181813-8e41ab67-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P31</td>\n",
       "      <td>Q20181813</td>\n",
       "      <td>'colonial power'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Q31-P31-Q3624078-a1d9d1a3-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P31</td>\n",
       "      <td>Q3624078</td>\n",
       "      <td>'sovereign state'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Q31-P31-Q43702-0dce2031-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P31</td>\n",
       "      <td>Q43702</td>\n",
       "      <td>'federation'@en</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Q31-P31-Q6256-3422ad69-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P31</td>\n",
       "      <td>Q6256</td>\n",
       "      <td>'country'@en</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                             id node1 label      node2  \\\n",
       "0   Q31-P31-Q1250464-7c4e239d-0   Q31   P31   Q1250464   \n",
       "1    Q31-P31-Q185441-58d7de2e-0   Q31   P31    Q185441   \n",
       "2  Q31-P31-Q20181813-8e41ab67-0   Q31   P31  Q20181813   \n",
       "3   Q31-P31-Q3624078-a1d9d1a3-0   Q31   P31   Q3624078   \n",
       "4     Q31-P31-Q43702-0dce2031-0   Q31   P31     Q43702   \n",
       "5      Q31-P31-Q6256-3422ad69-0   Q31   P31      Q6256   \n",
       "\n",
       "                               node2;label  \n",
       "0                               'realm'@en  \n",
       "1  'member state of the European Union'@en  \n",
       "2                      'colonial power'@en  \n",
       "3                     'sovereign state'@en  \n",
       "4                          'federation'@en  \n",
       "5                             'country'@en  "
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = !$kypher -i claims -i labels \\\n",
    "--match 'claims: (n1:Q31)-[l:P31]->(n2), labels: (n2)-[:label]->(n2_label)' \\\n",
    "--return 'l as id, n1 as node1, l.label as label, n2 as node2, n2_label as `node2;label`' \\\n",
    "--limit 10 \n",
    "\n",
    "kgtk_to_dataframe(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get all the values for population"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>node1</th>\n",
       "      <th>label</th>\n",
       "      <th>node2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Q31-P1082-03700d-e9540ac9-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1082</td>\n",
       "      <td>+10136811</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Q31-P1082-04bed1-dfb79a97-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1082</td>\n",
       "      <td>+9772419</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Q31-P1082-09cf36-da068a8a-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1082</td>\n",
       "      <td>+9153489</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Q31-P1082-0d8ab5-e1fa3416-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1082</td>\n",
       "      <td>+9858308</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Q31-P1082-10985f-021cd5f9-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1082</td>\n",
       "      <td>+9618756</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>65</th>\n",
       "      <td>Q31-P1082-ee304f-78930d38-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1082</td>\n",
       "      <td>+9830358</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>66</th>\n",
       "      <td>Q31-P1082-f304d4-5b5295bb-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1082</td>\n",
       "      <td>+9859242</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>67</th>\n",
       "      <td>Q31-P1082-f90107-aedcfbe5-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1082</td>\n",
       "      <td>+10445852</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>68</th>\n",
       "      <td>Q31-P1082-fa9783-4e530113-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1082</td>\n",
       "      <td>+10203008</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>69</th>\n",
       "      <td>Q31-P1082-fb1f82-f3860fe1-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1082</td>\n",
       "      <td>+9646032</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>70 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                             id node1  label      node2\n",
       "0   Q31-P1082-03700d-e9540ac9-0   Q31  P1082  +10136811\n",
       "1   Q31-P1082-04bed1-dfb79a97-0   Q31  P1082   +9772419\n",
       "2   Q31-P1082-09cf36-da068a8a-0   Q31  P1082   +9153489\n",
       "3   Q31-P1082-0d8ab5-e1fa3416-0   Q31  P1082   +9858308\n",
       "4   Q31-P1082-10985f-021cd5f9-0   Q31  P1082   +9618756\n",
       "..                          ...   ...    ...        ...\n",
       "65  Q31-P1082-ee304f-78930d38-0   Q31  P1082   +9830358\n",
       "66  Q31-P1082-f304d4-5b5295bb-0   Q31  P1082   +9859242\n",
       "67  Q31-P1082-f90107-aedcfbe5-0   Q31  P1082  +10445852\n",
       "68  Q31-P1082-fa9783-4e530113-0   Q31  P1082  +10203008\n",
       "69  Q31-P1082-fb1f82-f3860fe1-0   Q31  P1082   +9646032\n",
       "\n",
       "[70 rows x 4 columns]"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = !$kypher -i claims -i labels \\\n",
    "--match 'claims: (n1:Q31)-[l:P1082]->(n2)' \\\n",
    "--return 'l as id, n1 as node1, l.label as label, n2 as node2' \n",
    "\n",
    "kgtk_to_dataframe(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Qualifiers\n",
    "Qualifiers provide additional information about the claims stated in the edges. For `P1082`, the qualifiers tell us the year when the population was measured. The qualifiers can be retrieved using the identifiers of the edges. Let's retrieve the qualifiers associated with the edge for the first population value. To do so, we use the identifier of the edge (`Q31-P1082-03700d-e9540ac9-0`) as `node1` in the `qualifiers.tsv` file. We get one edge, so we know that the population in `1995` was `10136811`. Note that the qualifier edges are the same as any other edge in KGTK, having `id`, `node1`, `label` and `node2` columns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>node1</th>\n",
       "      <th>label</th>\n",
       "      <th>node2</th>\n",
       "      <th>node2;wikidatatype</th>\n",
       "      <th>rank</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Q31-P1082-03700d-e9540ac9-0-P585-2a74fa-0</td>\n",
       "      <td>Q31-P1082-03700d-e9540ac9-0</td>\n",
       "      <td>P585</td>\n",
       "      <td>^1995-00-00T00:00:00Z/9</td>\n",
       "      <td>time</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                          id                        node1  \\\n",
       "0  Q31-P1082-03700d-e9540ac9-0-P585-2a74fa-0  Q31-P1082-03700d-e9540ac9-0   \n",
       "\n",
       "  label                    node2 node2;wikidatatype rank  \n",
       "0  P585  ^1995-00-00T00:00:00Z/9               time       "
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = !$kypher -i \"$QUALIFIERS\" --as \"qualifiers\" \\\n",
    "--match '(n1:`Q31-P1082-03700d-e9540ac9-0`)-[l]->(n2)' \n",
    "\n",
    "kgtk_to_dataframe(result)"
   ]
  },
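  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `node2` value of the qualifier edge above, `^1995-00-00T00:00:00Z/9`, packs a timestamp and a precision into one string. In the Wikidata time model, precision 9 denotes a year, which is why the month and day fields are `00`. Because month `00` is not a valid calendar month, the standard `datetime` parser would reject this string, so the sketch below uses a regular expression instead. It assumes the literal has the shape `^<timestamp>/<precision>` with an unsigned four-digit year, as in the value above; `parse_time_literal` is a hypothetical helper, not a KGTK function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "# Assumed shape: ^YYYY-MM-DDT<time>/<precision>\n",
    "TIME_LITERAL = re.compile(r'^\\^(\\d{4})-(\\d{2})-(\\d{2})T[^/]*/(\\d+)$')\n",
    "\n",
    "def parse_time_literal(value):\n",
    "    match = TIME_LITERAL.match(value)\n",
    "    if match is None:\n",
    "        raise ValueError('not a time literal of the assumed shape: ' + value)\n",
    "    year, month, day, precision = (int(part) for part in match.groups())\n",
    "    return {'year': year, 'month': month, 'day': day, 'precision': precision}\n",
    "\n",
    "parsed = parse_time_literal('^1995-00-00T00:00:00Z/9')\n",
    "print(parsed)"
   ]
  },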
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's make the qualifier edge more readable by retrieving the label of the property: the following query combines the patterns of the previous queries to retrieve the label of the qualifier property. The query omits the identifier of the qualifier edge to save space. The header of the additional column can be arbitrary, i.e., you can name it whatever you want; the name used follows a KGTK convention that enables KGTK to automatically parse the output, which is useful if we want to use the output as an input to another KGTK command. The word before the `;` refers to one of the standard columns, and the name after the `;` refers to a property of that element. In this example, we used `label` because the column contains the label of the entity."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "node1                        label  node2                    label;label\n",
      "Q31-P1082-03700d-e9540ac9-0  P585   ^1995-00-00T00:00:00Z/9  'point in time'@en\n"
     ]
    }
   ],
   "source": [
    "!$kypher -i qualifiers -i labels \\\n",
    "--match 'qual: (n1:`Q31-P1082-03700d-e9540ac9-0`)-[l {label: property}]->(n2), labels: (property)-[:label]->(property_label)' \\\n",
    "--return 'n1 as node1, property as label, n2 as node2, property_label as `label;label`' \\\n",
    "--limit 10 \\\n",
    "| column -t -s $'\\t'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's put all the values of `P1082` in a file, which we will conveniently name `Q31.P1082.tsv`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "!$kypher -i claims \\\n",
    "--match '(n1:Q31)-[l:P1082]->(n2)' \\\n",
    "--return 'l as id, n1 as node1, l.label as label, n2 as node2' \\\n",
    "-o \"$TEMP\"/Q31.P1082.tsv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we are going to combine the `P1082` edges of Belgium with the qualifiers. To do this we will run a query that uses the edges that we stored in `Q31.P1082.tsv`, and retrieve the qualifiers for each of those edges; the result of our query will be the qualifier edges of the head of state edges. To union the qualifier edges with the claim edges, we feed the output of the query to the `cat` command (concatenate), and then feed the output to the `sort2` command to sort the edges. The first 12 edges are shown below. We see a claim edge followed by the qualifiers defined for it.\n",
    "\n",
    "This snippet illustrates that KGTK commands can be chained using the `/` chain operator to compose more complex workflows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>node1</th>\n",
       "      <th>label</th>\n",
       "      <th>node2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Q31-P1082-03700d-e9540ac9-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1082</td>\n",
       "      <td>+10136811</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Q31-P1082-03700d-e9540ac9-0-P585-2a74fa-0</td>\n",
       "      <td>Q31-P1082-03700d-e9540ac9-0</td>\n",
       "      <td>P585</td>\n",
       "      <td>^1995-00-00T00:00:00Z/9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Q31-P1082-04bed1-dfb79a97-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1082</td>\n",
       "      <td>+9772419</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Q31-P1082-04bed1-dfb79a97-0-P585-271261-0</td>\n",
       "      <td>Q31-P1082-04bed1-dfb79a97-0</td>\n",
       "      <td>P585</td>\n",
       "      <td>^1974-00-00T00:00:00Z/9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Q31-P1082-09cf36-da068a8a-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1082</td>\n",
       "      <td>+9153489</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>135</th>\n",
       "      <td>Q31-P1082-f90107-aedcfbe5-0-P585-cab8cf-0</td>\n",
       "      <td>Q31-P1082-f90107-aedcfbe5-0</td>\n",
       "      <td>P585</td>\n",
       "      <td>^2005-01-01T00:00:00Z/11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>136</th>\n",
       "      <td>Q31-P1082-fa9783-4e530113-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1082</td>\n",
       "      <td>+10203008</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>137</th>\n",
       "      <td>Q31-P1082-fa9783-4e530113-0-P585-12d4de-0</td>\n",
       "      <td>Q31-P1082-fa9783-4e530113-0</td>\n",
       "      <td>P585</td>\n",
       "      <td>^1998-00-00T00:00:00Z/9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>138</th>\n",
       "      <td>Q31-P1082-fb1f82-f3860fe1-0</td>\n",
       "      <td>Q31</td>\n",
       "      <td>P1082</td>\n",
       "      <td>+9646032</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>139</th>\n",
       "      <td>Q31-P1082-fb1f82-f3860fe1-0-P585-87910b-0</td>\n",
       "      <td>Q31-P1082-fb1f82-f3860fe1-0</td>\n",
       "      <td>P585</td>\n",
       "      <td>^1969-00-00T00:00:00Z/9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>140 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                            id                        node1  \\\n",
       "0                  Q31-P1082-03700d-e9540ac9-0                          Q31   \n",
       "1    Q31-P1082-03700d-e9540ac9-0-P585-2a74fa-0  Q31-P1082-03700d-e9540ac9-0   \n",
       "2                  Q31-P1082-04bed1-dfb79a97-0                          Q31   \n",
       "3    Q31-P1082-04bed1-dfb79a97-0-P585-271261-0  Q31-P1082-04bed1-dfb79a97-0   \n",
       "4                  Q31-P1082-09cf36-da068a8a-0                          Q31   \n",
       "..                                         ...                          ...   \n",
       "135  Q31-P1082-f90107-aedcfbe5-0-P585-cab8cf-0  Q31-P1082-f90107-aedcfbe5-0   \n",
       "136                Q31-P1082-fa9783-4e530113-0                          Q31   \n",
       "137  Q31-P1082-fa9783-4e530113-0-P585-12d4de-0  Q31-P1082-fa9783-4e530113-0   \n",
       "138                Q31-P1082-fb1f82-f3860fe1-0                          Q31   \n",
       "139  Q31-P1082-fb1f82-f3860fe1-0-P585-87910b-0  Q31-P1082-fb1f82-f3860fe1-0   \n",
       "\n",
       "     label                     node2  \n",
       "0    P1082                 +10136811  \n",
       "1     P585   ^1995-00-00T00:00:00Z/9  \n",
       "2    P1082                  +9772419  \n",
       "3     P585   ^1974-00-00T00:00:00Z/9  \n",
       "4    P1082                  +9153489  \n",
       "..     ...                       ...  \n",
       "135   P585  ^2005-01-01T00:00:00Z/11  \n",
       "136  P1082                 +10203008  \n",
       "137   P585   ^1998-00-00T00:00:00Z/9  \n",
       "138  P1082                  +9646032  \n",
       "139   P585   ^1969-00-00T00:00:00Z/9  \n",
       "\n",
       "[140 rows x 4 columns]"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = !$kypher -i qualifiers -i \"$TEMP\"/Q31.P1082.tsv \\\n",
    "--match 'P1082: ()-[l]->(), qual: (l)-[lq]->(n2)' \\\n",
    "--return 'lq as id, l as node1, lq.label as label, n2 as node2' \\\n",
    "/ cat -i - -i \"$TEMP\"/Q31.P1082.tsv \\\n",
    "/ sort2 \n",
    "\n",
    "kgtk_to_dataframe(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "- KGTK represents graphs in TSV files with standard columns `id`, `node1`, `label` and `node2`\n",
    "- It is possible to include arbitrary additional columns in KGTK files\n",
    "- The identifier of an edge can be used as a node in another edge enabling the representation of edges about edges\n",
    "- KGTK provides a powerful query command based on Cypher as well as a host of other commands, type `kgtk --help` to see the list of commands."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "kgtk-env",
   "language": "python",
   "name": "kgtk-env"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}