{
 "metadata": {
  "name": "",
  "signature": "sha256:e749b1873c002bae959edf28afa7a4800de17e6ca2f4bc2726d1da26c621c246"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<a href=\"http://laf-fabric.readthedocs.org/en/latest/\" target=\"_blank\"><img align=\"left\" src=\"images/laf-fabric-xsmall.png\"/></a>\n",
      "<a href=\"http://www.godgeleerdheid.vu.nl/etcbc\" target=\"_blank\"><img align=\"left\" src=\"images/VU-ETCBC-xsmall.png\"/></a>\n",
      "<a href=\"http://www.persistent-identifier.nl/?identifier=urn%3Anbn%3Anl%3Aui%3A13-048i-71\" target=\"_blank\"><img align=\"left\"src=\"images/etcbc4easy-small.png\"/></a>\n",
      "<a href=\"http://tla.mpi.nl\" target=\"_blank\"><img align=\"right\" src=\"images/TLA-xsmall.png\"/></a>\n",
      "<a href=\"http://www.dans.knaw.nl\" target=\"_blank\"><img align=\"right\"src=\"images/DANS-xsmall.png\"/></a>"
     ]
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Trees - the smooth path"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We show the embedding of nodes annotated as sentences, clauses, phrases, subphrases and words.\n",
      "We put them in a format (eventually) such that they can be read by TGREP.\n",
      "Then Rens Bod and Andreas van Cranenburgh can do interesting business with it.\n",
      "\n",
      "<span style=\"color: red; font-weight: bold\">\n",
      "The method of tree construction has improved significantly since the writing of this notebook\n",
      "</span>\n",
      "\n",
      "* we use more information from the ETCBC database\n",
      "* more sanity checks have been done\n",
      "* there is an attempt to define the steps from ETCBC data to trees in a precise manner.\n",
      "\n",
      "<span style=\"color: red; font-weight: bold\">See the notebook [trees_etcbc4](http://nbviewer.ipython.org/github/ETCBC/laf-fabric-nbs/blob/master/trees_etcbc4.ipynb).\n",
      "</span>\n",
      "\n",
      "This notebook (**trees**) is preserved because a DOP parser by Andreas van Cranenburgh has been based on its output.\n",
      "\n",
      "Method\n",
      "======\n",
      "\n",
      "We walk through all words and follow them upwards, along parents edges until there are no more outgoing edges.\n",
      "We then have the starting points for our sentences.\n",
      "\n",
      "We walk through the starting points, and for each starting point we assemble the tree hanging off that point.\n",
      "This we do by walking the parents edges in the opposite direction.\n",
      "\n",
      "We use the monad numbers (word numbers) to maintain word order.\n",
      "\n",
      "* parents links from words to phrases to clauses to sentences\n",
      "* word numbers\n",
      "\n",
      "More details will follow below, when we deal with them."
     ]
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Starting LAF-Fabric"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import sys\n",
      "import collections\n",
      "import laf\n",
      "from laf.fabric import LafFabric\n",
      "from etcbc.preprocess import prepare\n",
      "fabric = LafFabric()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "  0.00s This is LAF-Fabric 4.2.8\n",
        "http://laf-fabric.readthedocs.org/texts/API-reference.html\n"
       ]
      }
     ],
     "prompt_number": 1
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Declaring the features"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We use the features in our data source for establishing the trees.\n",
      "This is what we need:\n",
      "\n",
      "db.otype\n",
      "--------\n",
      "The type of a node: word, phrase, sentence, etc\n",
      "\n",
      "parents\n",
      "-------\n",
      "This is a feature by which we can identify the edges that correspond to the *parents* relationship.\n",
      "*parents* goes from lower level to higher level (word => ... => sentence).\n",
      "\n",
      "**N.B.** There are two linguistic hierarchies interwoven in this database.\n",
      "Sometimes nodes have more than one parent!\n",
      "\n",
      "ft.text_plain\n",
      "-------------\n",
      "The unvocalized text of a word.\n",
      "\n",
      "ft.part_of_speech\n",
      "-----------------\n",
      "The part of speech of a word.\n",
      "\n",
      "sft.verse-label\n",
      "---------------\n",
      "Passage information: book, chapter, verse all in one feature.\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "fabric.load('bhs3', '--', 'trees', {\n",
      "    \"xmlids\": {\"node\": False, \"edge\": False},\n",
      "    \"features\": ('''\n",
      "        otype monads\n",
      "        text_plain\n",
      "        part_of_speech\n",
      "        clause_constituent_relation phrase_type\n",
      "        verse_label\n",
      "    ''','''\n",
      "        parents.\n",
      "    '''),\n",
      "    \"prepare\": prepare,\n",
      "}, verbose='NORMAL')\n",
      "exec(fabric.localnames.format(var='fabric'))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "  0.00s LOADING API: please wait ... \n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "  0.00s INFO: DATA COMPILED AT: 2014-04-18T18-24-37\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "  5.06s LOGFILE=/Users/dirk/laf-fabric-data/bhs3/tasks/trees/__log__trees.txt\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "  5.76s INFO: DATA LOADED FROM SOURCE bhs3 AND ANNOX -- FOR TASK trees\n"
       ]
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Configuration"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Here we define the formatting of the trees.\n",
      "\n",
      "Relevant nodes\n",
      "--------------\n",
      "Not all nodes will be shown in the output.\n",
      "The nodes that are shown, have abbreviated names.\n",
      "Nodes with ``True`` will be shown, nodes with ``False`` will be suppressed.\n",
      "\n",
      "Suppressing a node leaves its children in place. Another way of looking at it, is: we replace a node by its children.\n",
      "\n",
      "Exception: when a node is visited twice, the second visit refers to the tree built by the first visit.\n",
      "In that case, we do not suppress the node.\n",
      "\n",
      "**N.B.** It turns out that the ``-atom`` nodes are never visited twice.\n",
      "\n",
      "pos_table\n",
      "---------\n",
      "We abbreviate the part-of-speech tags. \n",
      "We include the pos-info by inserting a unary node right above each word."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": true,
     "input": [
      "relevant_nodes = [\n",
      "    (\"word\", '', True),\n",
      "    (\"subphrase\", 'SU', True),\n",
      "    (\"phrase_atom\", 'Pa', False),\n",
      "    (\"phrase\", 'P', True),\n",
      "    (\"clause_atom\", 'Ca', False),\n",
      "    (\"clause\", 'C', True),\n",
      "    (\"sentence_atom\", 'Sa', False),\n",
      "    (\"sentence\", 'S', True),\n",
      "]\n",
      "\n",
      "pos_table = {\n",
      " 'adjective': 'aj',\n",
      " 'adverb': 'av',\n",
      " 'article': 'dt',\n",
      " 'conjunction': 'cj',\n",
      " 'interjection': 'ij',\n",
      " 'interrogative': 'ir',\n",
      " 'negative': 'ng',\n",
      " 'noun': 'n',\n",
      " 'preposition': 'pp',\n",
      " 'pronoun': 'pr',\n",
      " 'verb': 'vb',\n",
      "}\n",
      "\n",
      "select_node = set()\n",
      "select_tag = collections.defaultdict(lambda: None)\n",
      "abbrev_node = collections.defaultdict(lambda: None)\n",
      "\n",
      "for (otype, abb, relevant) in relevant_nodes:\n",
      "    if relevant:\n",
      "        select_node.add(abb)\n",
      "    abbrev_node[otype] = abb if abb != None else otype"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 3
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Exploration"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Currently it is not clear to me how to use the *mother* edges for the syntax trees.\n",
      "\n",
      "The *phrases* are a bit undifferentiated, and the clauses to.\n",
      "\n",
      "We could add the *clause_constituent_relation* to the syntax trees, and also the *phrase_type* and *phrase_function*.\n",
      "\n",
      "The *clause_constituent_relation* is explored in the notebook *clause_constituent_relation.ipynb*, and the phrase features in the notebook *phrase_typology*."
     ]
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Find the top nodes"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We walk through the words.\n",
      "From each word we walk along the parent edges until we cannot get further.\n",
      "All end points are top nodes. \n",
      "We put the end nodes in a set.\n",
      "We join all sets of end nodes that we have found above each word.\n",
      "\n",
      "**N.B.** In this way we encounter the top nodes many times, but it does not matter,\n",
      "because we put them all in a set, without duplicates."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "msg(\"Looking for top nodes\")\n",
      "top_node_types = collections.defaultdict(lambda: 0)\n",
      "top_nodes = set(C.parents_.endnodes(NN(test=F.otype.v, value='word')))\n",
      "\n",
      "msg(\"Top nodes found: {}\".format(len(top_nodes)))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "  8.43s Looking for top nodes\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "    14s Top nodes found: 71354\n"
       ]
      }
     ],
     "prompt_number": 4
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Checking"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let us see what the types are of all the top nodes we have found.\n",
      "\n",
      "We would like to see that they are all sentences.\n",
      "\n",
      "And are all sentences top nodes?"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "top_node_types = collections.defaultdict(lambda: 0)\n",
      "\n",
      "msg(\"Looking up tags for topnodes\")\n",
      "for node in NN(nodes=top_nodes):\n",
      "    tag = abbrev_node[F.otype.v(node)]\n",
      "    top_node_types[tag] += 1\n",
      "\n",
      "for (otype, tag, relevant) in relevant_nodes:\n",
      "    if top_node_types[tag]:\n",
      "        msg(\"{:<2} {} x at the top\".format(tag, top_node_types[tag]))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "    17s Looking up tags for topnodes\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "    17s S  71354 x at the top\n"
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "msg(\"Non top nodes of type S ... \")\n",
      "nt = 0\n",
      "for node in NN():\n",
      "    if  F.otype.v(node) == \"sentence\" and C.parents_.e(node):\n",
      "        msg(\"{} \".format(node), newline=False, withtime=False)\n",
      "        nt += 1\n",
      "        break\n",
      "msg(\"Non top nodes of type S found: {}\".format(nt))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "    20s Non top nodes of type S ... \n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "    22s Non top nodes of type S found: 0\n"
       ]
      }
     ],
     "prompt_number": 6
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Serializing"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "For each top node, we serialize its tree along the inverse parents edges.\n",
      "There are confluences, meaning that sometimes one node has two parents.\n",
      "We detect that, and the second time we reach a node, we output a token that references to the tree we constructed in the first visit to that node.\n",
      "\n",
      "We cannot write the string immediately, because after tree creation we want to renumber referenced trees and word occurrences.\n",
      "\n",
      "Output format\n",
      "-------------\n",
      "We output the trees in the order as their texts occur in the Hebrew bible.\n",
      "The output is a text file, and every line corresponds to a exactly one tree.\n",
      "\n",
      "Every line has three tab-separated fields.\n",
      "\n",
      "* passage label\n",
      "* tree structure with placeholders for the words\n",
      "* word sequence (linking words to place holders). This sequence corresponds to the order as found in the text."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "trees = outfile(\"trees-simple.txt\")\n",
      "\n",
      "nodes_seen = set()\n",
      "words = []\n",
      "sequential = []\n",
      "\n",
      "def write_tree(node):\n",
      "    \n",
      "    if node in nodes_seen:\n",
      "        return\n",
      "    \n",
      "    nodes_seen.add(node)\n",
      "\n",
      "    otype = F.otype.v(node)\n",
      "    tag = abbrev_node[otype]\n",
      "    relevant = tag in select_node\n",
      "\n",
      "    if tag == 'C':\n",
      "        crr = F.clause_constituent_relation.v(node)\n",
      "        if crr != 'none':\n",
      "            tag = crr\n",
      "    elif tag == 'P':\n",
      "        tag = F.phrase_type.v(node)\n",
      "    is_word = otype == 'word'\n",
      "    if is_word:\n",
      "        text = F.text_plain.v(node)\n",
      "        pos = pos_table[F.part_of_speech.v(node)]\n",
      "        monad = int(F.monads.v(node))\n",
      "        sequential.append((\"W\", len(words)))\n",
      "        words.append((monad, text, pos))\n",
      "    else:\n",
      "        sequential.append((\"O\" if relevant else \"N\", tag))\n",
      "    \n",
      "    for child in Ci.parents_.v(node, sort=True):\n",
      "        write_tree(child)\n",
      "    \n",
      "    if not is_word:\n",
      "        sequential.append((\"C\" if relevant else \"N\", tag))\n",
      "\n",
      "def do_sequential():\n",
      "    word_perm = {}\n",
      "    new_words = sorted(enumerate(words), key=lambda x: x[1][0])\n",
      "    word_reps = []\n",
      "    for (nn, (on, (monad, text, pos))) in enumerate(new_words):\n",
      "        word_perm[on] = nn\n",
      "        word_reps.append(text)\n",
      "    word_rep = ' '.join(word_reps)\n",
      "                    \n",
      "    for (code, info) in sequential:\n",
      "        if code == 'O' or code == 'C':\n",
      "            if code == 'O':\n",
      "                trees.write('({}'.format(info))\n",
      "            else:\n",
      "                trees.write(')')\n",
      "        elif code == 'W':\n",
      "            nn = word_perm[info]\n",
      "            pos = words[info][2]\n",
      "            trees.write('({} {})'.format(pos, nn))\n",
      "    \n",
      "    trees.write(\"\\t{}\".format(word_rep))\n",
      "    \n",
      "msg(\"Writing trees ...\")\n",
      "verse_label = ''\n",
      "\n",
      "s = 0\n",
      "chunk = 10000\n",
      "sc = 0\n",
      "\n",
      "msg(\"making nodeset\")\n",
      "msg(\"processing topnodes\")\n",
      "for node in NN(nodes=top_nodes | set(NN(test=F.otype.v, value='verse'))):\n",
      "    otype = F.otype.v(node)\n",
      "    if  otype == 'verse':\n",
      "        verse_label = F.verse_label.v(node)\n",
      "        continue\n",
      "    nodes_seen = set()\n",
      "    sequential = []\n",
      "    words = []\n",
      "    write_tree(node)\n",
      "    do_sequential()\n",
      "    trees.write(\"\\t{}\\n\".format(verse_label))\n",
      "    s += 1\n",
      "    sc += 1\n",
      "    if sc == chunk:\n",
      "        msg(\"{} trees written\".format(s))\n",
      "        sc = 0\n",
      "    \n",
      "msg(\"{} trees written\".format(s))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        " 1m 13s Writing trees ...\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        " 1m 13s making nodeset\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        " 1m 13s processing topnodes\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        " 1m 17s 10000 trees written\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        " 1m 21s 20000 trees written\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        " 1m 24s 30000 trees written\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        " 1m 27s 40000 trees written\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        " 1m 30s 50000 trees written\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        " 1m 32s 60000 trees written\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        " 1m 34s 70000 trees written\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        " 1m 35s 71354 trees written\n"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "close()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        " 1m 38s Results directory:\n",
        "/Users/dirk/laf-fabric-data/bhs3/tasks/trees\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "\n",
        ".DS_Store                              6148 Wed Apr 30 16:30:02 2014\n",
        ".trees2014-04-30.txt.swp             110592 Tue May 27 15:52:45 2014\n",
        "__log__trees.txt                        686 Tue May 27 15:54:22 2014\n",
        "coor.txt                              70830 Wed Apr 30 20:57:45 2014\n",
        "depths.txt                           714284 Wed Apr 30 20:57:44 2014\n",
        "objects-deut-new.txt                    966 Fri Apr 25 09:53:16 2014\n",
        "objects-deut-old.txt                    966 Fri Apr 25 09:51:15 2014\n",
        "objects.txt                           15416 Fri Apr 25 09:47:18 2014\n",
        "tgrep_result.txt                    4916101 Wed Apr 30 20:57:37 2014\n",
        "tree_notabene-2014-04-30.txt         111037 Wed Apr 30 20:51:21 2014\n",
        "tree_notabene.txt                    114212 Tue May 27 12:27:23 2014\n",
        "trees-nocoor.txt                      13914 Mon Apr 28 09:59:30 2014\n",
        "trees-nosisters.txt                   13928 Mon Apr 28 09:59:30 2014\n",
        "trees-notransform.txt                   501 Mon Apr 28 09:47:29 2014\n",
        "trees-simple.txt                    8310229 Tue May 27 15:54:22 2014\n",
        "trees-transformed.txt                   371 Mon Apr 28 09:47:43 2014\n",
        "trees.t2c                          12361153 Wed Apr 30 20:57:22 2014\n",
        "trees.txt                          10621788 Tue May 27 12:28:15 2014\n",
        "trees2014-04-30.txt                10626155 Wed Apr 30 20:56:26 2014\n",
        "trees_fixed_20.txt                   150015 Tue May 27 12:27:23 2014\n",
        "trees_random_20-2014-04-30.txt       150015 Wed Apr 30 20:51:21 2014\n",
        "trees_random_20.txt                  108048 Tue May 27 12:27:23 2014\n"
       ]
      }
     ],
     "prompt_number": 8
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Preview"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Here are the first lines of the output."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!head -n 25 {my_file('trees-simple.txt')}"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "(S(C(PP(pp 0)(n 1))(VP(vb 2))(NP(n 3))(PP(SU(pp 4)(dt 5)(n 6))(cj 7)(SU(pp 8)(dt 9)(n 10)))))\t\u05d1 \u05e8\u05d0\u05e9\u05c1\u05d9\u05ea \u05d1\u05e8\u05d0 \u05d0\u05dc\u05d4\u05d9\u05dd \u05d0\u05ea \u05d4 \u05e9\u05c1\u05de\u05d9\u05dd \u05d5 \u05d0\u05ea \u05d4 \u05d0\u05e8\u05e5\t GEN 01,01\r\n",
        "(S(C(CP(cj 0))(NP(dt 1)(n 2))(VP(vb 3))(NP(SU(n 4))(cj 5)(SU(n 6)))))\t\u05d5 \u05d4 \u05d0\u05e8\u05e5 \u05d4\u05d9\u05ea\u05d4 \u05ea\u05d4\u05d5 \u05d5 \u05d1\u05d4\u05d5\t GEN 01,02\r\n",
        "(S(C(CP(cj 0))(NP(n 1))(PP(pp 2)(SU(n 3))(SU(n 4)))))\t\u05d5 \u05d7\u05e9\u05c1\u05da \u05e2\u05dc \u05e4\u05e0\u05d9 \u05ea\u05d4\u05d5\u05dd\t GEN 01,02\r\n",
        "(S(C(CP(cj 0))(NP(SU(n 1))(SU(n 2)))(VP(vb 3))(PP(pp 4)(SU(n 5))(SU(dt 6)(n 7)))))\t\u05d5 \u05e8\u05d5\u05d7 \u05d0\u05dc\u05d4\u05d9\u05dd \u05de\u05e8\u05d7\u05e4\u05ea \u05e2\u05dc \u05e4\u05e0\u05d9 \u05d4 \u05de\u05d9\u05dd\t GEN 01,02\r\n",
        "(S(C(CP(cj 0))(VP(vb 1))(NP(n 2))))\t\u05d5 \u05d9\u05d0\u05de\u05e8 \u05d0\u05dc\u05d4\u05d9\u05dd\t GEN 01,03\r\n",
        "(S(C(VP(vb 0))(NP(n 1))))\t\u05d9\u05d4\u05d9 \u05d0\u05d5\u05e8\t GEN 01,03\r\n",
        "(S(C(CP(cj 0))(VP(vb 1))(NP(n 2))))\t\u05d5 \u05d9\u05d4\u05d9 \u05d0\u05d5\u05e8\t GEN 01,03\r\n",
        "(S(C(CP(cj 0))(VP(vb 1))(NP(n 2))(PP(pp 3)(dt 4)(n 5)))(Objc(CP(cj 6))(VP(vb 7))))\t\u05d5 \u05d9\u05e8\u05d0 \u05d0\u05dc\u05d4\u05d9\u05dd \u05d0\u05ea \u05d4 \u05d0\u05d5\u05e8 \u05db\u05d9 \u05d8\u05d5\u05d1\t GEN 01,04\r\n",
        "(S(C(CP(cj 0))(VP(vb 1))(NP(n 2))(PP(SU(n 3)(dt 4)(n 5))(cj 6)(SU(n 7)(dt 8)(n 9)))))\t\u05d5 \u05d9\u05d1\u05d3\u05dc \u05d0\u05dc\u05d4\u05d9\u05dd \u05d1\u05d9\u05df \u05d4 \u05d0\u05d5\u05e8 \u05d5 \u05d1\u05d9\u05df \u05d4 \u05d7\u05e9\u05c1\u05da\t GEN 01,04\r\n",
        "(S(C(CP(cj 0))(VP(vb 1))(NP(n 2))(PP(pp 3)(dt 4)(n 5))(NP(n 6))))\t\u05d5 \u05d9\u05e7\u05e8\u05d0 \u05d0\u05dc\u05d4\u05d9\u05dd \u05dc  \u05d0\u05d5\u05e8 \u05d9\u05d5\u05dd\t GEN 01,05\r\n",
        "(S(C(CP(cj 0))(PP(pp 1)(dt 2)(n 3))(VP(vb 4))(NP(n 5))))\t\u05d5 \u05dc  \u05d7\u05e9\u05c1\u05da \u05e7\u05e8\u05d0 \u05dc\u05d9\u05dc\u05d4\t GEN 01,05\r\n",
        "(S(C(CP(cj 0))(VP(vb 1))(NP(n 2))))\t\u05d5 \u05d9\u05d4\u05d9 \u05e2\u05e8\u05d1\t GEN 01,05\r\n",
        "(S(C(CP(cj 0))(VP(vb 1))(NP(n 2))))\t\u05d5 \u05d9\u05d4\u05d9 \u05d1\u05e7\u05e8\t GEN 01,05\r\n",
        "(S(C(NP(SU(n 0))(SU(n 1)))))\t\u05d9\u05d5\u05dd \u05d0\u05d7\u05d3\t GEN 01,05\r\n",
        "(S(C(CP(cj 0))(VP(vb 1))(NP(n 2))))\t\u05d5 \u05d9\u05d0\u05de\u05e8 \u05d0\u05dc\u05d4\u05d9\u05dd\t GEN 01,06\r\n",
        "(S(C(VP(vb 0))(NP(n 1))(PP(pp 2)(SU(n 3))(SU(dt 4)(n 5)))))\t\u05d9\u05d4\u05d9 \u05e8\u05e7\u05d9\u05e2 \u05d1 \u05ea\u05d5\u05da \u05d4 \u05de\u05d9\u05dd\t GEN 01,06\r\n",
        "(S(C(CP(cj 0))(VP(vb 1))(VP(vb 2))(PP(n 3)(n 4)(pp 5)(n 6))))\t\u05d5 \u05d9\u05d4\u05d9 \u05de\u05d1\u05d3\u05d9\u05dc \u05d1\u05d9\u05df \u05de\u05d9\u05dd \u05dc \u05de\u05d9\u05dd\t GEN 01,06\r\n",
        "(S(C(CP(cj 0))(VP(vb 1))(NP(n 2))(PP(pp 3)(dt 4)(n 5))))\t\u05d5 \u05d9\u05e2\u05e9\u05c2 \u05d0\u05dc\u05d4\u05d9\u05dd \u05d0\u05ea \u05d4 \u05e8\u05e7\u05d9\u05e2\t GEN 01,07\r\n",
        "(S(C(CP(cj 0))(VP(vb 1))(PP(n 2)(dt 3)(n 4))(CP(cj 11))(PP(n 12)(dt 13)(n 14)))(Attr(CP(cj 5))(PP(pp 6)(n 7)(pp 8)(dt 9)(n 10)))(Attr(CP(cj 15))(PP(pp 16)(pp 17)(pp 18)(dt 19)(n 20))))\t\u05d5 \u05d9\u05d1\u05d3\u05dc \u05d1\u05d9\u05df \u05d4 \u05de\u05d9\u05dd \u05d0\u05e9\u05c1\u05e8 \u05de \u05ea\u05d7\u05ea \u05dc  \u05e8\u05e7\u05d9\u05e2 \u05d5 \u05d1\u05d9\u05df \u05d4 \u05de\u05d9\u05dd \u05d0\u05e9\u05c1\u05e8 \u05de \u05e2\u05dc \u05dc  \u05e8\u05e7\u05d9\u05e2\t GEN 01,07\r\n",
        "(S(C(CP(cj 0))(VP(vb 1))(AdvP(av 2))))\t\u05d5 \u05d9\u05d4\u05d9 \u05db\u05df\t GEN 01,07\r\n",
        "(S(C(CP(cj 0))(VP(vb 1))(NP(n 2))(PP(pp 3)(dt 4)(n 5))(NP(n 6))))\t\u05d5 \u05d9\u05e7\u05e8\u05d0 \u05d0\u05dc\u05d4\u05d9\u05dd \u05dc  \u05e8\u05e7\u05d9\u05e2 \u05e9\u05c1\u05de\u05d9\u05dd\t GEN 01,08\r\n",
        "(S(C(CP(cj 0))(VP(vb 1))(NP(n 2))))\t\u05d5 \u05d9\u05d4\u05d9 \u05e2\u05e8\u05d1\t GEN 01,08\r\n",
        "(S(C(CP(cj 0))(VP(vb 1))(NP(n 2))))\t\u05d5 \u05d9\u05d4\u05d9 \u05d1\u05e7\u05e8\t GEN 01,08\r\n",
        "(S(C(NP(SU(n 0))(SU(aj 1)))))\t\u05d9\u05d5\u05dd \u05e9\u05c1\u05e0\u05d9\t GEN 01,08\r\n",
        "(S(C(CP(cj 0))(VP(vb 1))(NP(n 2))))\t\u05d5 \u05d9\u05d0\u05de\u05e8 \u05d0\u05dc\u05d4\u05d9\u05dd\t GEN 01,09\r\n"
       ]
      }
     ],
     "prompt_number": 9
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}