{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Structure ins and outs\n",
    "\n",
    "See also the [docs](https://annotation.github.io/text-fabric/Api/Text/#structure)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from tf.fabric import Fabric"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This is Text-Fabric 7.7.2\n",
      "Api reference : https://annotation.github.io/text-fabric/Api/Fabric/\n",
      "\n",
      "10 features found and 0 ignored\n"
     ]
    }
   ],
   "source": [
    "GH_BASE = os.path.expanduser('~/github')\n",
    "ORG = 'annotation'\n",
    "REPO = 'banks'\n",
    "FOLDER = 'tf'\n",
    "TF_DIR = f'{GH_BASE}/{ORG}/{REPO}/{FOLDER}'\n",
    "\n",
    "VERSION = '0.2'\n",
    "\n",
    "TF_PATH = f'{TF_DIR}/{VERSION}'\n",
    "TF = Fabric(locations=TF_PATH)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We ask for a list of all features:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('author',\n",
       " 'gap',\n",
       " 'letters',\n",
       " 'number',\n",
       " 'otype',\n",
       " 'punc',\n",
       " 'terminator',\n",
       " 'title',\n",
       " 'oslots')"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "allFeatures = TF.explore(silent=True, show=True)\n",
    "loadableFeatures = allFeatures['nodes'] + allFeatures['edges']\n",
    "loadableFeatures"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We load all features:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  0.00s loading features ...\n",
      "   |     0.00s B otype                from /Users/dirk/github/annotation/banks/tf/0.2\n",
      "   |     0.00s B oslots               from /Users/dirk/github/annotation/banks/tf/0.2\n",
      "   |     0.00s B title                from /Users/dirk/github/annotation/banks/tf/0.2\n",
      "   |     0.00s B number               from /Users/dirk/github/annotation/banks/tf/0.2\n",
      "   |     0.00s B letters              from /Users/dirk/github/annotation/banks/tf/0.2\n",
      "   |     0.00s B punc                 from /Users/dirk/github/annotation/banks/tf/0.2\n",
      "   |     0.00s B terminator           from /Users/dirk/github/annotation/banks/tf/0.2\n",
      "   |     0.00s B author               from /Users/dirk/github/annotation/banks/tf/0.2\n",
      "   |     0.00s B gap                  from /Users/dirk/github/annotation/banks/tf/0.2\n",
      "  0.03s All features loaded/computed - for details use loadLog()\n"
     ]
    }
   ],
   "source": [
    "api = TF.load(loadableFeatures, silent=False)\n",
    "T = api.T\n",
    "F = api.F"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Look at the structure definition in the `otext` feature:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'compiler': 'Dirk Roorda',\n",
       " 'fmt:line-default': '{letters:XXX}{terminator} ',\n",
       " 'fmt:line-term': 'line#{terminator} ',\n",
       " 'fmt:text-orig-full': '{letters}{punc} ',\n",
       " 'name': 'Culture quotes from Iain Banks',\n",
       " 'purpose': 'exposition',\n",
       " 'sectionFeatures': 'title,number,number',\n",
       " 'sectionTypes': 'book,chapter,sentence',\n",
       " 'source': 'Good Reads',\n",
       " 'status': 'with for similarities in a separate module',\n",
       " 'structureFeatures': 'title,number,number,number',\n",
       " 'structureTypes': 'book,chapter,sentence,line',\n",
       " 'url': 'https://www.goodreads.com/work/quotes/14366-consider-phlebas',\n",
       " 'version': '0.2',\n",
       " 'writtenBy': 'Text-Fabric',\n",
       " 'dateWritten': '2019-05-13T10:20:06Z'}"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "TF.features['otext'].metaData"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The fields `structureTypes` and `structureFeatures` define, position by position, the node types that act as structural elements\n",
    "and the feature that supplies the heading of each: here `book` is headed by `title`, while `chapter`, `sentence` and `line` are each headed by `number`.\n",
    "\n",
    "But we do not have to read this raw configuration; we can simply interrogate the `T` API:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "A heading is a tuple of pairs (node type, feature value)\n",
      "\tof node types and features that have been configured as structural elements\n",
      "These 4 structural elements have been configured\n",
      "\tnode type book       with heading feature title\n",
      "\tnode type chapter    with heading feature number\n",
      "\tnode type sentence   with heading feature number\n",
      "\tnode type line       with heading feature number\n",
      "You can get them as a tuple with T.headings.\n",
      "\n",
      "Structure API:\n",
      "\tT.structure(node=None)       gives the structure below node, or everything if node is None\n",
      "\tT.structurePretty(node=None) prints the structure below node, or everything if node is None\n",
      "\tT.top()                      gives all top-level nodes\n",
      "\tT.up(node)                   gives the (immediate) parent node\n",
      "\tT.down(node)                 gives the (immediate) children nodes\n",
      "\tT.headingFromNode(node)      gives the heading of a node\n",
      "\tT.nodeFromHeading(heading)   gives the node of a heading\n",
      "\tT.ndFromHd                   complete mapping from headings to nodes\n",
      "\tT.hdFromNd                   complete mapping from nodes to headings\n",
      "\tT.hdMult are all headings    with their nodes that occur multiple times\n",
      "\n",
      "There are 18 structural elements in the dataset.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "T.structureInfo()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get the top-level nodes. This corpus has a single one, the book node:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(100,)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "T.top()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Print the heading of the top-level node:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(('book', 'Consider Phlebas'),)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "top = T.top()[0]\n",
    "T.headingFromNode(top)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get the node from the heading:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "100"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "T.nodeFromHeading((('book', 'Consider Phlebas'),))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Go a level down:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(101, 102)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "level2 = T.down(top)\n",
    "level2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Print the headings of these chapter nodes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(('book', 'Consider Phlebas'), ('chapter', 1))\n",
      "(('book', 'Consider Phlebas'), ('chapter', 2))\n"
     ]
    }
   ],
   "source": [
    "for l2 in level2:\n",
    "    print(T.headingFromNode(l2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Go a level up again. Both chapters lead back to the book node:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100\n",
      "100\n"
     ]
    }
   ],
   "source": [
    "for l2 in level2:\n",
    "    print(T.up(l2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The complete structure of the corpus as a tuple:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((100,\n",
       "  ((101,\n",
       "    ((115, ((103, ()), (104, ()), (105, ()), (106, ()))),\n",
       "     (116, ((107, ()), (108, ()), (109, ()))))),\n",
       "   (102, ((117, ((110, ()), (111, ()), (112, ()), (113, ()), (114, ()))),)))),)"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "T.structure()"
   ]
  },
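  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The nested tuple can be unpacked with a small recursive generator.\n",
    "What follows is a minimal sketch, not part of the Text-Fabric API: `walk` is a hypothetical helper,\n",
    "and the tuple is copied literally from the output above so that the cell runs without the corpus.\n",
    "In this notebook you could pass `T.structure()` instead.\n",
    "The number of nodes it visits should agree with the 18 structural elements reported by `T.structureInfo()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Minimal sketch: `walk` is a hypothetical helper, not a Text-Fabric function.\n",
    "def walk(tree, depth=0):\n",
    "    # tree is a tuple of (node, children) pairs; yield (depth, node) depth-first\n",
    "    for (node, children) in tree:\n",
    "        yield (depth, node)\n",
    "        yield from walk(children, depth + 1)\n",
    "\n",
    "\n",
    "# Literal copy of the T.structure() output shown above (an assumption of this sketch);\n",
    "# with the corpus loaded you could use structure = T.structure() instead.\n",
    "structure = (\n",
    "    (100, (\n",
    "        (101, (\n",
    "            (115, ((103, ()), (104, ()), (105, ()), (106, ()))),\n",
    "            (116, ((107, ()), (108, ()), (109, ()))),\n",
    "        )),\n",
    "        (102, (\n",
    "            (117, ((110, ()), (111, ()), (112, ()), (113, ()), (114, ()))),\n",
    "        )),\n",
    "    )),\n",
    ")\n",
    "\n",
    "# Collect every structural node, parents before their children\n",
    "nodes = [node for (depth, node) in walk(structure)]\n",
    "len(nodes)"
   ]
  },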
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The structure of the first chapter as a tuple:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(101,\n",
       " ((115, ((103, ()), (104, ()), (105, ()), (106, ()))),\n",
       "  (116, ((107, ()), (108, ()), (109, ())))))"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "T.structure(node=101)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pretty-print the structure of the first chapter:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  chapter:1\n",
      "      sentence:1\n",
      "          line:1\n",
      "          line:2\n",
      "          line:3\n",
      "          line:4\n",
      "      sentence:2\n",
      "          line:6\n",
      "          line:7\n",
      "          line:8\n"
     ]
    }
   ],
   "source": [
    "print(T.structurePretty(node=101))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pretty-print the complete structure:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    book:Consider Phlebas\n",
      "        chapter:1\n",
      "            sentence:1\n",
      "                line:1\n",
      "                line:2\n",
      "                line:3\n",
      "                line:4\n",
      "            sentence:2\n",
      "                line:6\n",
      "                line:7\n",
      "                line:8\n",
      "        chapter:2\n",
      "            sentence:1\n",
      "                line:1\n",
      "                line:2\n",
      "                line:3\n",
      "                line:4\n",
      "                line:5\n"
     ]
    }
   ],
   "source": [
    "print(T.structurePretty())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pretty-print the complete structure with full headings:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    book:Consider Phlebas\n",
      "        book:Consider Phlebas-chapter:1\n",
      "            book:Consider Phlebas-chapter:1-sentence:1\n",
      "                book:Consider Phlebas-chapter:1-sentence:1-line:1\n",
      "                book:Consider Phlebas-chapter:1-sentence:1-line:2\n",
      "                book:Consider Phlebas-chapter:1-sentence:1-line:3\n",
      "                book:Consider Phlebas-chapter:1-sentence:1-line:4\n",
      "            book:Consider Phlebas-chapter:1-sentence:2\n",
      "                book:Consider Phlebas-chapter:1-sentence:2-line:6\n",
      "                book:Consider Phlebas-chapter:1-sentence:2-line:7\n",
      "                book:Consider Phlebas-chapter:1-sentence:2-line:8\n",
      "        book:Consider Phlebas-chapter:2\n",
      "            book:Consider Phlebas-chapter:2-sentence:1\n",
      "                book:Consider Phlebas-chapter:2-sentence:1-line:1\n",
      "                book:Consider Phlebas-chapter:2-sentence:1-line:2\n",
      "                book:Consider Phlebas-chapter:2-sentence:1-line:3\n",
      "                book:Consider Phlebas-chapter:2-sentence:1-line:4\n",
      "                book:Consider Phlebas-chapter:2-sentence:1-line:5\n"
     ]
    }
   ],
   "source": [
    "print(T.structurePretty(fullHeading=True))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}