{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Structure ins and outs\n", "\n", "See also the [docs](https://annotation.github.io/text-fabric/Api/Text/#structure)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "from tf.fabric import Fabric" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This is Text-Fabric 7.7.2\n", "Api reference : https://annotation.github.io/text-fabric/Api/Fabric/\n", "\n", "10 features found and 0 ignored\n" ] } ], "source": [ "GH_BASE = os.path.expanduser('~/github')\n", "ORG = 'annotation'\n", "REPO = 'banks'\n", "FOLDER = 'tf'\n", "TF_DIR = f'{GH_BASE}/{ORG}/{REPO}/{FOLDER}'\n", "\n", "VERSION = '0.2'\n", "\n", "TF_PATH = f'{TF_DIR}/{VERSION}'\n", "TF = Fabric(locations=TF_PATH)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We ask for a list of all features:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('author',\n", " 'gap',\n", " 'letters',\n", " 'number',\n", " 'otype',\n", " 'punc',\n", " 'terminator',\n", " 'title',\n", " 'oslots')" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "allFeatures = TF.explore(silent=True, show=True)\n", "loadableFeatures = allFeatures['nodes'] + allFeatures['edges']\n", "loadableFeatures" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We load all features:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s loading features ...\n", " | 0.00s B otype from /Users/dirk/github/annotation/banks/tf/0.2\n", " | 0.00s B oslots from /Users/dirk/github/annotation/banks/tf/0.2\n", " | 0.00s B title from /Users/dirk/github/annotation/banks/tf/0.2\n", " | 0.00s B number from /Users/dirk/github/annotation/banks/tf/0.2\n", " | 0.00s B letters from /Users/dirk/github/annotation/banks/tf/0.2\n", " | 0.00s B punc from /Users/dirk/github/annotation/banks/tf/0.2\n", " | 0.00s B terminator from /Users/dirk/github/annotation/banks/tf/0.2\n", " | 0.00s B author from /Users/dirk/github/annotation/banks/tf/0.2\n", " | 0.00s B gap from /Users/dirk/github/annotation/banks/tf/0.2\n", " 0.03s All features loaded/computed - for details use loadLog()\n" ] } ], "source": [ "api = TF.load(loadableFeatures, silent=False)\n", "T = api.T\n", "F = api.F" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Look at the structure definition in the `otext` feature:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'compiler': 'Dirk Roorda',\n", " 'fmt:line-default': '{letters:XXX}{terminator} ',\n", " 'fmt:line-term': 'line#{terminator} ',\n", " 'fmt:text-orig-full': '{letters}{punc} ',\n", " 'name': 'Culture quotes from Iain Banks',\n", " 'purpose': 'exposition',\n", " 'sectionFeatures': 'title,number,number',\n", " 'sectionTypes': 'book,chapter,sentence',\n", " 'source': 'Good Reads',\n", " 'status': 'with for similarities in a separate module',\n", " 'structureFeatures': 'title,number,number,number',\n", " 'structureTypes': 'book,chapter,sentence,line',\n", " 'url': 'https://www.goodreads.com/work/quotes/14366-consider-phlebas',\n", " 'version': '0.2',\n", " 'writtenBy': 'Text-Fabric',\n", " 'dateWritten': '2019-05-13T10:20:06Z'}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "TF.features['otext'].metaData" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The fields `structureTypes` and `structureFeatures` define the node types that correspond to structural elements\n", "and their heading features.\n", "\n", "But we do not have to ask for the this raw configuration, we can just interrogate the `T` API:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A heading is a tuple of pairs (node type, feature value)\n", "\tof node types and features that have been configured as structural elements\n", "These 4 structural elements have been configured\n", "\tnode type book with heading feature title\n", "\tnode type chapter with heading feature number\n", "\tnode type sentence with heading feature number\n", "\tnode type line with heading feature number\n", "You can get them as a tuple with T.headings.\n", "\n", "Structure API:\n", "\tT.structure(node=None) gives the structure below node, or everything if node is None\n", "\tT.structurePretty(node=None) prints the structure below node, or everything if node is None\n", "\tT.top() gives all top-level nodes\n", "\tT.up(node) gives the (immediate) parent node\n", "\tT.down(node) gives the (immediate) children nodes\n", "\tT.headingFromNode(node) gives the heading of a node\n", "\tT.nodeFromHeading(heading) gives the node of a heading\n", "\tT.ndFromHd complete mapping from headings to nodes\n", "\tT.hdFromNd complete mapping from nodes to headings\n", "\tT.hdMult are all headings with their nodes that occur multiple times\n", "\n", "There are 18 structural elements in the dataset.\n", "\n" ] } ], "source": [ "T.structureInfo()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Print the top-level nodes:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(100,)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.top()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Print the heading of the top-level node:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(('book', 'Consider Phlebas'),)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "top = T.top()[0]\n", "T.headingFromNode(top)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get the node from the heading:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "100" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.nodeFromHeading((('book', 'Consider Phlebas'),))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Go a level down:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(101, 102)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "level2 = T.down(top)\n", "level2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and print their headings:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(('book', 'Consider Phlebas'), ('chapter', 1))\n", "(('book', 'Consider Phlebas'), ('chapter', 2))\n" ] } ], "source": [ "for l2 in level2:\n", " print(T.headingFromNode(l2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Go a level up:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "100\n", "100\n" ] } ], "source": [ "for l2 in level2:\n", " print(T.up(l2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The complete structure of the corpus as a tuple:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((100,\n", " ((101,\n", " ((115, ((103, ()), (104, ()), (105, ()), (106, ()))),\n", " (116, ((107, ()), (108, ()), (109, ()))))),\n", " (102, ((117, ((110, ()), (111, ()), (112, ()), (113, ()), (114, ()))),)))),)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.structure()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The structure of the first chapter as a tuple:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(101,\n", " ((115, ((103, ()), (104, ()), (105, ()), (106, ()))),\n", " (116, ((107, ()), (108, ()), (109, ())))))" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.structure(node=101)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pretty-print the structure of the first chapter:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " chapter:1\n", " sentence:1\n", " line:1\n", " line:2\n", " line:3\n", " line:4\n", " sentence:2\n", " line:6\n", " line:7\n", " line:8\n" ] } ], "source": [ "print(T.structurePretty(node=101))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pretty-print the complete structure:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " book:Consider Phlebas\n", " chapter:1\n", " sentence:1\n", " line:1\n", " line:2\n", " line:3\n", " line:4\n", " sentence:2\n", " line:6\n", " line:7\n", " line:8\n", " chapter:2\n", " sentence:1\n", " line:1\n", " line:2\n", " line:3\n", " line:4\n", " line:5\n" ] } ], "source": [ "print(T.structurePretty())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pretty-print the complete structure with full headings:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " book:Consider Phlebas\n", " book:Consider Phlebas-chapter:1\n", " book:Consider Phlebas-chapter:1-sentence:1\n", " book:Consider Phlebas-chapter:1-sentence:1-line:1\n", " book:Consider Phlebas-chapter:1-sentence:1-line:2\n", " book:Consider Phlebas-chapter:1-sentence:1-line:3\n", " book:Consider Phlebas-chapter:1-sentence:1-line:4\n", " book:Consider Phlebas-chapter:1-sentence:2\n", " book:Consider Phlebas-chapter:1-sentence:2-line:6\n", " book:Consider Phlebas-chapter:1-sentence:2-line:7\n", " book:Consider Phlebas-chapter:1-sentence:2-line:8\n", " book:Consider Phlebas-chapter:2\n", " book:Consider Phlebas-chapter:2-sentence:1\n", " book:Consider Phlebas-chapter:2-sentence:1-line:1\n", " book:Consider Phlebas-chapter:2-sentence:1-line:2\n", " book:Consider Phlebas-chapter:2-sentence:1-line:3\n", " book:Consider Phlebas-chapter:2-sentence:1-line:4\n", " book:Consider Phlebas-chapter:2-sentence:1-line:5\n" ] } ], "source": [ "print(T.structurePretty(fullHeading=True))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" } }, "nbformat": 4, "nbformat_minor": 2 }