{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Getting Heads 😶\n", "## By Cody Kingham, in collaboration with Christiaan Erwich\n", "\n", "## Problem Description\n", "The ETCBC's BHSA core data does not use a standard syntax tree format. As a result, syntactic and functional relationships between individual words are not mapped in a transparent or easily accessible way. In some cases, fine-grained relationships are ignored altogether. For example, for a given noun phrase (NP), there is no explicit way of obtaining its head noun (i.e. the noun itself without any modifying elements). This causes numerous problems for research in the realm of semantics. For instance, it is currently very difficult to calculate the complete person, gender, and number (PGN) of a given subject phrase, because PGN is stored at the word level only. This representation is inadequate, since phrases in the ETCBC data often contain coordinate relationships within the phrase. Even if one selects the first `noun` in the phrase and checks its PGN value, one may overlook the presence of another noun which makes the phrase plural. Ideally, the phrase itself would have a PGN feature. But before this kind of data can be created, it is necessary to separate the head words of a phrase from their modifying elements such as adjectives, determiners, or nouns in construct (genitive) relations.\n", "\n", "A head word can be defined as the word after which a phrase type is named. Examples of phrase types are `NP` (noun phrase) and `VP` (verb phrase). In this notebook, we experiment with and build the functions stored in `heads.py` in order to export a set of Text-Fabric edge features. The edge features represent a mapping from a phrase node to its head element.\n", "\n", "This goal requires us to think carefully about the way inter-word, semantic relations are reflected in the ETCBC's data. 
The ETCBC data *does* contain some rudimentary semantic embeddings through the so-called [subphrase](https://etcbc.github.io/bhsa/features/hebrew/c/otype). These can be utilized to isolate head words from secondary elements. A subphrase should *not* be thought of as a smaller, embedded phrase, like the ETCBC's phrase-atom (though it sometimes must inadequately fill that role). Rather, the subphrase is a way to encode relationships between words below the level of a phrase(atom), hence \"sub.\" A subphrase can be a single word, or it can be a collection of words. A word can be in multiple subphrases, but cannot be in more than three (due to the limitations of the data creation program, [parsephrases](http://www.etcbc.nl/datacreation/#ps3.p)).\n", "\n", "## Method\n", "The types of phrases represented in the ETCBC include `NP` (noun phrase), `VP` (verb phrase), `PrNP` (proper noun phrase), `PP` (prepositional phrase), `AdvP` (adverbial phrase), and [eight others](https://etcbc.github.io/bhsa/features/hebrew/c/typ). For some of these types, isolating the head word is a simple affair. By matching a word's phrase-dependent part of speech against its enclosing phrase's type, one can identify the head word. For a `VP`, that means simply finding the word within the phrase that has a `pdp` (phrase-dependent part of speech) value of `verb`. For a prepositional phrase, it means finding the word with a `pdp` of `prep`.\n", "\n", "The `NP` and `PrNP`, on the other hand, present special challenges. These phrases often contain multiple words with a modifying relation to the head noun. An example of this is the construct relation (e.g. \"Son of Jacob\"). 
The problem becomes particularly thorny when relations like the construct are chained together so that one is faced with the choice between multiple potential head nouns.\n", "\n", "To navigate the problem, we must use the feature [`rela` (relationship)](https://etcbc.github.io/bhsa/features/hebrew/c/rela) stored on `subphrase`s in addition to the `pdp` and phrase `typ` features. In order to isolate the head word of an `NP`, we look for a word within the phrase that has a `pdp` value of `subs` (i.e. noun). We then obtain a list of all the `subphrase`s which contain that word using the [L.u Text-Fabric method](https://github.com/Dans-labs/text-fabric/wiki/Api#locality). We then use the list of subphrase node numbers to create a list of all subphrase relations containing the word. If the list contains *any* dependent relations, then the word is automatically excluded from being a head word and we can move on to the next candidate. One final check is required for candidate words at the level of the `phrase`: the same procedure described above for `subphrase`s must be performed for `phrase_atom` relations. This means excluding words within a `phrase_atom` with a dependent relation to another `phrase_atom` within the `phrase`. If the head of a *`phrase_atom`* is being calculated, this step is not necessary.\n", "\n", "There are only two possible `subphrase` or `phrase_atom` relations for a valid head word: `NA` or `par`/`Para` (the verb is an exception, which in a handful of cases does have a construct relation). `NA` means that no relation is reflected: the word is independent. `par` (`subphrase`) and `Para` (`phrase_atom`) stand for parallel relations, i.e. coordinates. While coordinates are not formally the head, they are often an important part of how the grammatical and semantic relations are built. Thus we provide coordinates alongside the head noun. 
Coordinate words require one further test: it must be verified that their mother (using the [edge feature](https://github.com/Dans-labs/text-fabric/wiki/Api#edge-features) \"[mother](https://etcbc.github.io/bhsa/features/hebrew/c/mother.html)\") is itself a head word. This step requires us to keep track of the words within the phrase which have already been validated. We can do so with a simple list.\n", "\n", "## Results\n", "The function `get_heads` returns the head word nodes of a supplied phrase(atom) node. The results have been manually inspected for consistency.\n", "\n", "For phrase types other than the noun phrase, the results are very accurate. Some phrase types, like the conjunction phrase, do have unexpected forms. For instance, the phrase בעבור is coded as a conjunction phrase in the BHSA, yet it contains no word with a part of speech of `conjunction`. These kinds of cases are easily accounted for by making exceptions in the set of acceptable parts of speech.\n", "\n", "For noun phrases, the situation is different. In the majority of cases, the results are good. But there are a handful of cases that simply cannot be addressed using the current ETCBC data model without a solution that exceeds the bounds of this project. The reason is that the current model does not transparently encode hierarchy between phrases and embedded phrases. For instance, both phrase atoms and subphrases have *some* overlapping features. But what is the relationship of a phrase atom to a subphrase? Or, what is the relation of one subphrase to another? These are only coded implicitly in the data. In reality, there are `subphrases` embedded within the ETCBC's subphrases which are not even registered in the BHSA data. While phrase atoms receive type codes, subphrases do not. Yet, subphrases are `phrases` too, which should also have type codes. Another problem is that the precise level of embedding for the subphrases is not provided. 
Subphrases are presented as equal constituents, even though some subphrases are contained within others. These kinds of problems make a simple method, such as the one applied here, inadequate. But more importantly, they highlight the shortcomings of the ETCBC data model.\n", "\n", "The members of the ETCBC are aware of the inadequacy of that data model to represent complex phrases, and a change is in the pipeline to address it. However, it remains to be seen how long those changes might take. For now, the functions produced and modified in this notebook will suffice to provide a temporary solution for those who require head words from BHSA phrases." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Code Development\n", "\n", "Below we experiment with the code and develop the functions that will extract the head nouns. This involves a good deal of manual inspection of the results before exporting the Text-Fabric features.\n", "\n", "The code is written immediately below. Associated questions that arise while writing or evaluating the code are contained in the subsequent section." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "**Documentation:** BHSA Character table Feature docs BHSA API Text-Fabric API 6.4.6 Search Reference" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Loaded features:\n", "book@ll book chapter function gloss label language lex ls number otype pdp rela sp typ verse voc_lex voc_lex_utf8 vs vt mother oslots
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "This notebook online:\n", "NBViewer\n", "GitHub\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import collections, os, sys, random\n", "from pprint import pprint\n", "from tf.fabric import Fabric\n", "from tf.extra.bhsa import Bhsa\n", "\n", "# load Text-Fabric and data\n", "data_loc = [\"~/github/etcbc/bhsa/tf/c\"]\n", "TF = Fabric(locations=data_loc, silent=True)\n", "api = TF.load(\n", " \"\"\"\n", " book chapter verse\n", " typ pdp rela mother \n", " function lex sp ls\n", " \"\"\",\n", " silent=True,\n", ")\n", "\n", "F, E, T, L = api.F, api.E, api.T, api.L # TF data methods\n", "B = Bhsa(api, name=\"getting_heads\", version=\"c\") # BHSA visualizer" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "lines_to_end_of_cell_marker": 2 }, "outputs": [], "source": [ "def get_heads(phrase):\n", " \"\"\"\n", " Extracts and returns the heads of a supplied\n", " phrase or phrase atom based on that phrase's type\n", " and the relations reflected within the phrase.\n", "\n", " --input--\n", " phrase(atom) node number\n", "\n", " --output--\n", " tuple of head word node(s)\n", " \"\"\"\n", "\n", " # mapping from phrase type to good part of speech values for heads\n", " head_pdps = {\n", " \"VP\": {\"verb\"}, # verb\n", " \"NP\": {\"subs\", \"adjv\", \"nmpr\"}, # noun\n", " \"PrNP\": {\"nmpr\", \"subs\"}, # proper-noun\n", " \"AdvP\": {\"advb\", \"nmpr\", \"subs\"}, # adverbial\n", " \"PP\": {\"prep\"}, # prepositional\n", " \"CP\": {\"conj\", \"prep\"}, # conjunctive\n", " \"PPrP\": {\"prps\"}, # personal pronoun\n", " \"DPrP\": {\"prde\"}, # demonstrative pronoun\n", " \"IPrP\": {\"prin\"}, # interrogative pronoun\n", " \"InjP\": {\"intj\"}, # interjectional\n", " \"NegP\": 
{\"nega\"}, # negative\n", "        \"InrP\": {\"inrg\"}, # interrogative\n", "        \"AdjP\": {\"adjv\"}, # adjective\n", "    }\n", "\n", "    # get phrase-head's part of speech value and list of candidate matches\n", "    phrase_type = F.typ.v(phrase)\n", "    head_candidates = [\n", "        w for w in L.d(phrase, \"word\") if F.pdp.v(w) in head_pdps[phrase_type]\n", "    ]\n", "\n", "    # VP with verbs require no further processing, return the head verb\n", "    if phrase_type == \"VP\":\n", "        return tuple(head_candidates)\n", "\n", "    # go head-hunting!\n", "    heads = []\n", "\n", "    for word in head_candidates:\n", "\n", "        # gather the word's subphrase (+ phrase_atom if otype is phrase) relations\n", "        word_phrases = list(L.u(word, \"subphrase\"))\n", "        word_phrases += (\n", "            list(L.u(word, \"phrase_atom\"))\n", "            if (F.otype.v(phrase) == \"phrase\")\n", "            else list()\n", "        )\n", "        word_relas = set(F.rela.v(phr) for phr in word_phrases) or {\"NA\"}\n", "\n", "        # check (sub)phrase relations for independency\n", "        if word_relas - {\"NA\", \"par\", \"Para\"}:\n", "            continue\n", "\n", "        # check parallel relations for independency\n", "        elif word_relas & {\"par\", \"Para\"} and mother_is_head(word_phrases, heads):\n", "            this_head = find_quantified(word) or find_attributed(word) or word\n", "            heads.append(this_head)\n", "\n", "        # save all others as heads, check for quantifiers first\n", "        elif word_relas == {\"NA\"}:\n", "            this_head = find_quantified(word) or find_attributed(word) or word\n", "            heads.append(this_head)\n", "\n", "    return tuple(sorted(set(heads)))\n", "\n", "\n", "def mother_is_head(word_phrases, previous_heads):\n", "\n", "    \"\"\"\n", "    Test and validate parallel relationships for independency.\n", "    Must gather the mother for each relation and check whether\n", "    the mother contains a head word.\n", "\n", "    --input--\n", "    * list of phrase nodes for a given word (includes subphrases)\n", "    * list of previously approved heads\n", "\n", "    --output--\n", "    boolean\n", "    \"\"\"\n", "\n", "    # 
get word's enclosing phrases that are parallel\n", "    parallel_phrases = [ph for ph in word_phrases if F.rela.v(ph) in {\"par\", \"Para\"}]\n", "    # get the mother for the parallel phrases\n", "    parallel_mothers = [E.mother.f(ph)[0] for ph in parallel_phrases]\n", "    # get mothers' words, by mother\n", "    parallel_mom_words = [set(L.d(mom, \"word\")) for mom in parallel_mothers]\n", "    # test for head in each mother\n", "    test_mothers = [\n", "        bool(phrs_words & set(previous_heads)) for phrs_words in parallel_mom_words\n", "    ]\n", "\n", "    return all(test_mothers)\n", "\n", "\n", "def find_quantified(word):\n", "\n", "    \"\"\"\n", "    Check whether a head candidate is a quantifier (e.g. כל).\n", "    If it is, find the quantified noun if there is one.\n", "    Quantifiers are connected with the modified noun\n", "    either by a subphrase relation of \"rec\" for nomen\n", "    regens. In this case, the quantifier word node is the\n", "    mother itself. In other cases, the noun is related to the\n", "    number via the `atr` (attributive) subphrase relation. In this\n", "    case, the edge relation is connected from the substantive\n", "    to the number's subphrase.\n", "\n", "    --input--\n", "    word node\n", "\n", "    --output--\n", "    new word node or None\n", "    \"\"\"\n", "\n", "    custom_quants = {\n", "        \"KL/\",\n", "        \"M 5:\n", "\n", " heads = get_heads(phrase)\n", "\n", " examples.append(heads)\n", "\n", "len(examples)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "random.shuffle(examples) # get samples at random" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "\n", "\n", "**phrase** *1*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase Time NP: subs nine | subs -teen | subs year | conj and | subs hundred | subs year
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *2*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase Objc NP: subs disk | subs bread | subs one | conj and | subs bread | subs bread | subs oil | subs one | conj and | subs wafer | subs one
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *3*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase PreC NP: subs seven | conj and | subs five | subs thousand | conj and | subs four | subs hundred
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *4*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase Loca PrNP: art the | nmpr Gilead\n", "
phrase Loca PrNP: conj and\n", "
phrase Loca PrNP: subs boundary | art the | subs Geshurite | conj and | art the | subs Maacathite\n", "
phrase Loca PrNP: conj and\n", "
phrase Loca PrNP: subs whole | subs mountain | nmpr Hermon | conj and | subs whole | art the | nmpr Bashan\n", "
phrase Loca PrNP: prep unto | nmpr Salecah
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *5*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase Objc NP: subs cloth | conj and | prep unto | subs dagger | conj and | prep unto | subs bow | conj and | prep unto | subs girdle
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *6*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase Subj NP: subs flow qal ptca | conj and | subs have skin-disease pual ptcp\n", "
phrase Subj NP: conj and\n", "
phrase Subj NP: adjv lacking | subs bread
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *7*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase Subj NP: advb even | art the | subs king | conj and | subs whole | subs servant
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *8*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase Objc NP: subs horse | conj and | subs chariot | conj and | subs power | adjv heavy
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *9*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase Objc NP: subs sound | subs chariot | subs sound | subs horse | subs sound | subs power | adjv great
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *10*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase Subj NP: subs chief | art the | subs king | conj and | subs ten | subs man
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *11*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase Subj NP: subs fish | art the | subs sea | conj and | subs birds | art the | subs heavens | conj and | subs wild animal | art the | subs open field | conj and | subs whole | art the | subs creeping animals\n", "
phrase Subj NP: conj and\n", "
phrase Subj NP: subs whole | art the | subs human, mankind
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *12*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase Subj NP: subs weight | art the | subs house | art the | prde this | art the | adjv at the back
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *13*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase Subj NP: subs town\n", "
phrase Subj NP: subs surrounding\n", "
phrase Subj NP: conj and\n", "
phrase Subj NP: art the | subs south | conj and | art the | subs low land
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *14*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase Time NP: subs day | adjv much\n", "
phrase Time NP: subs eight | conj and | subs hundred | subs day
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *15*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase Objc NP: subs voice | subs horn | subs pipe | subs zither | subs sambuca | subs psaltery | conj and | subs symphony | conj and | subs whole | subs sort | subs music for strings
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *16*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase Objc NP: advb even | subs god(s)\n", "
phrase Objc NP: prep with | subs libation\n", "
phrase Objc NP: prep with | subs tool | subs what is desirable | subs silver | conj and | subs gold
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *17*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase Objc NP: subs weight | art the | subs silver | conj and | art the | subs gold | conj and | art the | subs tool
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *18*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase Cmpl PP: prep upon | subs defilement | art the | subs priesthood\n", "
phrase Cmpl PP: conj and\n", "
phrase Cmpl PP: subs covenant | art the | subs priesthood | conj and | art the | subs Levite
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *19*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase Adju NP: subs four | conj and | subs four | subs thousand | conj and | subs seven | subs hundred | conj and | subs six
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *20*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase Objc NP: subs disk | subs bread | conj and | subs <type of cake> | conj and | subs raisin cake
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "B.show(examples[:20], condenseType=\"phrase\") # uncomment me" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Discovery\n", "\n", "The queries that follow were written at different times while the code for the heads algorithm was being constructed.\n", "\n", "This section asks questions whose answers are needed to ensure that the code is written correctly, and queries the BHSA data to answer them. These are questions like, \"Do we need to check for relational independence only for noun phrases?\" (no); and \"Does every phrase type have a word with a corresponding `pdp`?\" (no).\n", "\n", "### Make definitions available for exploration:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "lines_to_next_cell": 2 }, "outputs": [], "source": [ "# mapping from phrase type to its head part of speech\n", "type_to_pdp = {\n", "    \"VP\": \"verb\",  # verb\n", "    \"NP\": \"subs\",  # noun\n", "    \"PrNP\": \"nmpr\",  # proper-noun\n", "    \"AdvP\": \"advb\",  # adverbial\n", "    \"PP\": \"prep\",  # prepositional\n", "    \"CP\": \"conj\",  # conjunctive\n", "    \"PPrP\": \"prps\",  # personal pronoun\n", "    \"DPrP\": \"prde\",  # demonstrative pronoun\n", "    \"IPrP\": \"prin\",  # interrogative pronoun\n", "    \"InjP\": \"intj\",  # interjectional\n", "    \"NegP\": \"nega\",  # negative\n", "    \"InrP\": \"inrg\",  # interrogative\n", "    \"AdjP\": \"adjv\",  # adjective\n", "}" ] }, { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 2 }, "source": [ "### Test for non-NP phrases with valid `pdp` but invalid head\n", "\n", "These tests demonstrate that subphrase relation checks are also needed for phrase types besides noun phrases. The only valid subphrase/phrase_atom relations for any potential head word are `NA` and `par`/`Para`. While a few phrase types do not need additional relational checks, e.g. 
personal pronoun phrases, we can go ahead and consistently handle all phrases in the same way.\n", "\n", "The only exception to the above rule is the `VP`, for which there are 14 cases in which the head word (verb) is also in a subphrase with a regens (`rec`) relation.\n", "\n", "The operational question of these tests was:\n", "> Are there cases in which a non-NP phrase(atom) contains a word with the corresponding `pdp` value, but which is probably not a head?\n", "\n", "To answer the question, we first survey all cases where the phrase type's head candidate is in a subphrase with a relation that is not normally \"independent.\" Based on the survey, we manually check the most pertinent phrase types and results. The tests reveal that, indeed, relation checks are needed for many phrase types." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "def test_pdp_safe(phrase_object=\"phrase_atom\"):\n", "\n", "    \"\"\"\n", "    Survey phrase types and their matching `pdp` words,\n", "    counting the kinds of subphrase relations these words\n", "    occur in. The survey can then be used to investigate\n", "    whether phrase types besides noun phrases require relationship\n", "    checks for independence.\n", "    \"\"\"\n", "\n", "    pdp_relas_survey = collections.defaultdict(collections.Counter)\n", "    headless = 0\n", "\n", "    for phrase in F.otype.s(phrase_object):\n", "\n", "        typ = F.typ.v(phrase)  # phrase type\n", "\n", "        head_pdp = type_to_pdp[typ]\n", "\n", "        maybe_heads = [w for w in L.d(phrase, \"word\") if F.pdp.v(w) == head_pdp]\n", "\n", "        # this check shows that many\n", "        # phrases don't have a word\n", "        # with a corresponding pdp!\n", "        if not maybe_heads:\n", "            headless += 1\n", "\n", "        # survey the candidate heads' relations\n", "        for word in maybe_heads:\n", "\n", "            head_name = typ + \"|\" + head_pdp\n", "            subphrases = L.u(word, \"subphrase\")\n", "            sp_relas = (\n", "                set(F.rela.v(sp) for sp in subphrases) if subphrases else {\"NA\"}\n", "            )  # <- handle cases without any subphrases (e.g. verbs)\n", "\n", "            pdp_relas_survey[head_name].update(sp_relas)\n", "\n", "    print(f\"{phrase_object}s without matching pdp: {headless}\\n\")\n", "    print(\"subphrase relation survey: \")\n", "    for name, rela_counts in pdp_relas_survey.items():\n", "\n", "        print(name)\n", "\n", "        for r, count in rela_counts.items():\n", "            print(\"\\t\", r, \"-\", count)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "phrase_atoms without matching pdp: 847\n", "\n", "subphrase relation survey: \n", "PP|prep\n", "\t NA - 64521\n", "\t par - 3824\n", "\t adj - 42\n", "\t rec - 8\n", "VP|verb\n", "\t NA - 69010\n", "\t rec - 14\n", "\t par - 1\n", "NP|subs\n", "\t NA - 53881\n", "\t par - 5868\n", "\t rec - 11628\n", "\t adj - 2936\n", "\t atr - 69\n", "CP|conj\n", "\t NA - 53859\n", "AdvP|advb\n", "\t NA - 5131\n", "\t par - 102\n", "\t mod - 49\n", "\t adj - 1\n", "AdjP|adjv\n", "\t NA - 1848\n", "\t par - 135\n", "\t atr - 5\n", "\t adj - 3\n", "\t rec 
- 1\n", "InjP|intj\n", "\t NA - 1872\n", "\t par - 11\n", "DPrP|prde\n", "\t NA - 790\n", "PrNP|nmpr\n", "\t NA - 11794\n", "\t par - 1478\n", "\t adj - 210\n", "\t rec - 83\n", "NegP|nega\n", "\t NA - 6742\n", "PPrP|prps\n", "\t NA - 4468\n", "\t par - 9\n", "IPrP|prin\n", "\t NA - 797\n", "\t par - 1\n", "InrP|inrg\n", "\t NA - 1288\n", "\t par - 3\n" ] } ], "source": [ "# for phrase_atoms\n", "test_pdp_safe()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "lines_to_next_cell": 2 }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "phrases without matching pdp: 679\n", "\n", "subphrase relation survey: \n", "PP|prep\n", "\t NA - 62315\n", "\t par - 3678\n", "\t adj - 42\n", "\t rec - 9\n", "VP|verb\n", "\t NA - 69010\n", "\t rec - 14\n", "\t par - 1\n", "NP|subs\n", "\t NA - 50092\n", "\t par - 5808\n", "\t rec - 11214\n", "\t adj - 2927\n", "\t atr - 51\n", "CP|conj\n", "\t NA - 52544\n", "AdvP|advb\n", "\t NA - 5083\n", "\t par - 101\n", "\t mod - 46\n", "\t adj - 1\n", "AdjP|adjv\n", "\t NA - 1800\n", "\t par - 118\n", "\t atr - 5\n", "\t adj - 3\n", "\t rec - 1\n", "InjP|intj\n", "\t NA - 1872\n", "\t par - 11\n", "DPrP|prde\n", "\t NA - 791\n", "PrNP|nmpr\n", "\t NA - 11138\n", "\t par - 1380\n", "\t rec - 1267\n", "\t adj - 209\n", "NegP|nega\n", "\t NA - 6742\n", "PPrP|prps\n", "\t NA - 4388\n", "\t par - 9\n", "IPrP|prin\n", "\t NA - 797\n", "\t par - 1\n", "InrP|inrg\n", "\t NA - 1288\n", "\t par - 3\n" ] } ], "source": [ "# and for phrases\n", "test_pdp_safe(phrase_object=\"phrase\")" ] }, { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 2 }, "source": [ "^ These surveys tell us that for several of these phrase types, e.g. `InjP`, we can automatically take the word with the `pdp` value that corresponds with its phrase type as the head.\n", "\n", "There are also quite a few cases where the phrase type does not have a word with a matching `pdp` value: 847 for phrase atoms and 679 for phrases. 
In the subsequent section we will run tests to find out why this is the case.\n", "\n", "Back to the question of this section: There are 14 examples of `VP` with verbs that have a `rec` (nomen regens) relation. Are these heads or not? We check now..." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "lines_to_next_cell": 2 }, "outputs": [], "source": [ "def find_and_show(search_pattern):\n", "    \"\"\"Run a search template, print the result count, and display up to 20 results.\"\"\"\n", "    results = sorted(B.search(search_pattern))\n", "    print(len(results), \"results\")\n", "    B.show(results, end=20, condenseType=\"phrase\", withNodes=True)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.86s 14 results\n", "14 results\n" ] }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *1*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 664760 PreS VP: 21355 prep in | 21356 subs time | 21357 verb bear qal infc
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *2*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 672346 PreS VP: 33361 prep in | 33362 subs way | 33363 verb see hif infc
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *3*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 691339 PreS VP: 68049 prep from | 68050 subs year | 68051 verb sell nif infc
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *4*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 760045 Pred VP: 188292 prep from | 188293 subs sufficiency | 188294 verb come qal infc
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *5*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 765331 PreS VP: 196651 prep from | 196652 subs sufficiency | 196653 verb pass qal infc
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *6*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 770954 PreS VP: 206121 prep in | 206122 subs beginning | 206123 verb sit qal infc
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *7*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 774060 PreS VP: 212011 prep in | 212012 subs year | 212013 verb be king qal infc
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *8*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 779643 PreS VP: 221146 prep from | 221147 subs sufficiency | 221148 verb pass qal infc
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *9*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 799114 PreS VP: 251009 prep from | 251010 subs sufficiency | 251011 verb speak piel infc
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *10*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 804887 PreS VP: 261523 prep from | 261524 subs sound | 261525 verb fall qal infc
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *11*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 810542 PreC VP: 270676 prep from | 270677 subs destruction | 270678 verb pass qal ptca
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *12*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 834935 Pred VP: 310033 prep from | 310034 subs <NEG> | 310035 subs duration | 310036 verb turn qal infc
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *13*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 902929 PreS VP: 422921 prep from | 422922 prep to | 422923 subs linen, part, stave | 422924 verb register hit infc
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *14*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 903656 PreS VP: 424337 prep to | 424338 subs face | 424339 verb be humble nif infc
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# run notebook locally to see HTML-formatted results for the below searches\n", "\n", "\n", "rec_verbs = \"\"\"\n", "\n", "phrase_atom typ=VP\n", "    subphrase rela=rec\n", "        word pdp=verb\n", "\"\"\"\n", "\n", "find_and_show(rec_verbs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In all 14 results, the verb serves as the true head word of the `VP`.\n", "\n", "*Note: The verb will prove to be an exception: no other word in a `rec` relation is a head word.*\n", "\n", "The `PP` also has some unexpected relations. We investigate them with the same kind of inspection, looking first at the `rec` (regens) relations." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.87s 12 results\n", "12 results\n" ] }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *1*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 701824 Adju PP: 87947 prep from | 87948 prep to | 87949 subs linen, part, stave | 87950 subs burnt-offering | 87951 art the | 87952 subs month | 87953 conj and | 87954 subs present | 87955 conj and | 87956 subs burnt-offering | 87957 art the | 87958 subs continuity | 87959 conj and | 87960 subs present | 87961 conj and | 87962 subs libation | 87963 prep as | 87964 subs justice
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *2*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 729575 Objc PP: 138117 prep <object marker> | 138118 subs hand | 138119 subs one | 138120 prep from | 138121 subs son
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *3*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 763550 PreC PP: 193931 prep as | 193932 subs word | 193933 subs one | 193934 prep from
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *4*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 793213 Adju PP: 240898 prep from | 240899 subs evil | 240900 subs sit qal ptca | 240901 prep in
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *5*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 848705 Adju PP: 329764 prep from | 329765 subs evil | 329766 subs sit qal ptca | 329767 prep in
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *6*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 893491 Adju PP: 404050 prep to | 404051 subs side | 404052 prep as | 404053 art the | 404054 subs small | 404055 prep as | 404056 art the | 404057 subs great | 404058 subs understand hif ptca | 404059 prep with | 404060 subs scholar
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "rec_preps = \"\"\"\n", "\n", "phrase_atom typ=PP\n", "    subphrase rela=rec\n", "        word pdp=prep\n", "\"\"\"\n", "\n", "find_and_show(rec_preps)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `PP` is different. Where the preposition is in a subphrase with a `rec` relation, the preposition is *not* the head. Thus, the algorithm will need to check for these cases.\n", "\n", "Now for the `adj` subphrase relation in `PP`:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.88s 42 results\n", "42 results\n" ] }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *1*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 659013 Cmpl PP: 12691 prep from | 12692 nmpr Havilah | 12693 prep unto | 12694 nmpr Shur
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *2*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 674965 Time PP: 37803 prep in | 37804 art the | 37805 subs morning | 37806 prep in | 37807 art the | 37808 subs morning
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *3*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 675546 Time PP: 38736 prep from | 38737 art the | 38738 subs morning | 38739 prep unto | 38740 art the | 38741 subs evening
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *4*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 675571 Time PP: 38778 prep from | 38779 subs morning | 38780 prep unto | 38781 subs evening
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *5*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 676757 Subj NP: 40647 art the | 40648 subs what is stolen\n", "phrase 676757 Subj NP: 40649 prep from | 40650 subs bullock | 40651 prep unto | 40652 subs he-ass | 40653 prep unto | 40654 subs lamb
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *6*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 677725 Adju PP: 42226 prep from | 42227 subs end | 42228 prep from | 42229 prde this
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *7*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 677728 Adju PP: 42233 prep from | 42234 subs end | 42235 prep from | 42236 prde this
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *8*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 678437 Time PP: 43705 prep from | 43706 subs evening | 43707 prep unto | 43708 subs morning
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *9*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 679353 Time PP: 45628 prep in | 45629 art the | 45630 subs morning | 45631 prep in | 45632 art the | 45633 subs morning
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *10*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "681309\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 681309 Time PP\n", "
\n", "
\n", "\n", "
\n", "49254\n", "\n", "
prep in
\n", "\n", "\n", "
\n", "\n", "
\n", "49255\n", "
\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "49256\n", "\n", "
subs morning
\n", "\n", "\n", "
\n", "\n", "
\n", "49257\n", "\n", "
prep in
\n", "\n", "\n", "
\n", "\n", "
\n", "49258\n", "
\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "49259\n", "\n", "
subs morning
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *11*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "684131\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 684131 Time PP\n", "
\n", "
\n", "\n", "
\n", "55004\n", "\n", "
prep in
\n", "\n", "\n", "
\n", "\n", "
\n", "55005\n", "
\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "55006\n", "\n", "
subs morning
\n", "\n", "\n", "
\n", "\n", "
\n", "55007\n", "\n", "
prep in
\n", "\n", "\n", "
\n", "\n", "
\n", "55008\n", "
\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "55009\n", "\n", "
subs morning
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *12*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "686147\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 686147 Cmpl PP\n", "
\n", "
\n", "\n", "
\n", "58901\n", "\n", "
prep to
\n", "\n", "\n", "
\n", "\n", "
\n", "58902\n", "\n", "
nmpr Aaron
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase 686147 Cmpl PP\n", "
\n", "
\n", "\n", "
\n", "58903\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "58904\n", "\n", "
subs priest
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase 686147 Cmpl PP\n", "
\n", "
\n", "\n", "
\n", "58905\n", "\n", "
conj or
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase 686147 Cmpl PP\n", "
\n", "
\n", "\n", "
\n", "58906\n", "\n", "
prep to
\n", "\n", "\n", "
\n", "\n", "
\n", "58907\n", "\n", "
subs one
\n", "\n", "\n", "
\n", "\n", "
\n", "58908\n", "\n", "
prep from
\n", "\n", "\n", "
\n", "\n", "
\n", "58909\n", "\n", "
subs son
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase 686147 Cmpl PP\n", "
\n", "
\n", "\n", "
\n", "58910\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "58911\n", "\n", "
subs priest
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *13*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "695140\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 695140 PreC PP\n", "
\n", "
\n", "\n", "
\n", "76198\n", "\n", "
prep from
\n", "\n", "\n", "
\n", "\n", "
\n", "76199\n", "\n", "
subs evening
\n", "\n", "\n", "
\n", "\n", "
\n", "76200\n", "\n", "
prep unto
\n", "\n", "\n", "
\n", "\n", "
\n", "76201\n", "\n", "
subs morning
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *14*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "701824\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 701824 Adju PP\n", "
\n", "
\n", "\n", "
\n", "87947\n", "\n", "
prep from
\n", "\n", "\n", "
\n", "\n", "
\n", "87948\n", "\n", "
prep to
\n", "\n", "\n", "
\n", "\n", "
\n", "87949\n", "\n", "
subs linen, part, stave
\n", "\n", "\n", "
\n", "\n", "
\n", "87950\n", "\n", "
subs burnt-offering
\n", "\n", "\n", "
\n", "\n", "
\n", "87951\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "87952\n", "\n", "
subs month
\n", "\n", "\n", "
\n", "\n", "
\n", "87953\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "87954\n", "\n", "
subs present
\n", "\n", "\n", "
\n", "\n", "
\n", "87955\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "87956\n", "\n", "
subs burnt-offering
\n", "\n", "\n", "
\n", "\n", "
\n", "87957\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "87958\n", "\n", "
subs continuity
\n", "\n", "\n", "
\n", "\n", "
\n", "87959\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "87960\n", "\n", "
subs present
\n", "\n", "\n", "
\n", "\n", "
\n", "87961\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "87962\n", "\n", "
subs libation
\n", "\n", "\n", "
\n", "\n", "
\n", "87963\n", "\n", "
prep as
\n", "\n", "\n", "
\n", "\n", "
\n", "87964\n", "\n", "
subs justice
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *15*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "702443\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 702443 Objc NP\n", "
\n", "
\n", "\n", "
\n", "89462\n", "\n", "
subs one
\n", "\n", "\n", "
\n", "\n", "
\n", "89463\n", "\n", "
subs seize qal ptcp
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase 702443 Objc NP\n", "
\n", "
\n", "\n", "
\n", "89464\n", "\n", "
prep from
\n", "\n", "\n", "
\n", "\n", "
\n", "89465\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "89466\n", "\n", "
subs five
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase 702443 Objc NP\n", "
\n", "
\n", "\n", "
\n", "89467\n", "\n", "
prep from
\n", "\n", "\n", "
\n", "\n", "
\n", "89468\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "89469\n", "\n", "
subs human, mankind
\n", "\n", "\n", "
\n", "\n", "
\n", "89470\n", "\n", "
prep from
\n", "\n", "\n", "
\n", "\n", "
\n", "89471\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "89472\n", "\n", "
subs cattle
\n", "\n", "\n", "
\n", "\n", "
\n", "89473\n", "\n", "
prep from
\n", "\n", "\n", "
\n", "\n", "
\n", "89474\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "89475\n", "\n", "
subs he-ass
\n", "\n", "\n", "
\n", "\n", "
\n", "89476\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "89477\n", "\n", "
prep from
\n", "\n", "\n", "
\n", "\n", "
\n", "89478\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "89479\n", "\n", "
subs cattle
\n", "\n", "\n", "
\n", "\n", "
\n", "89480\n", "\n", "
prep from
\n", "\n", "\n", "
\n", "\n", "
\n", "89481\n", "\n", "
subs whole
\n", "\n", "\n", "
\n", "\n", "
\n", "89482\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "89483\n", "\n", "
subs cattle
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *16*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "705970\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 705970 Cmpl PP\n", "
\n", "
\n", "\n", "
\n", "96035\n", "\n", "
prep in
\n", "\n", "\n", "
\n", "\n", "
\n", "96036\n", "\n", "
subs seed
\n", "\n", "\n", "
\n", "\n", "
\n", "96037\n", "\n", "
prep after
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *17*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "706038\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 706038 Cmpl PP\n", "
\n", "
\n", "\n", "
\n", "96162\n", "\n", "
prep to
\n", "\n", "\n", "
\n", "\n", "
\n", "96163\n", "\n", "
subs one
\n", "\n", "\n", "
\n", "\n", "
\n", "96164\n", "\n", "
prep from
\n", "\n", "\n", "
\n", "\n", "
\n", "96165\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "96166\n", "\n", "
subs town
\n", "\n", "\n", "
\n", "\n", "
\n", "96167\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "96168\n", "\n", "
prde these
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *18*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "709891\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 709891 Adju PP\n", "
\n", "
\n", "\n", "
\n", "103115\n", "\n", "
prep interval
\n", "\n", "\n", "
\n", "\n", "
\n", "103116\n", "\n", "
subs blood
\n", "\n", "\n", "
\n", "\n", "
\n", "103117\n", "\n", "
prep to
\n", "\n", "\n", "
\n", "\n", "
\n", "103118\n", "\n", "
subs blood
\n", "\n", "\n", "
\n", "\n", "
\n", "103119\n", "\n", "
prep interval
\n", "\n", "\n", "
\n", "\n", "
\n", "103120\n", "\n", "
subs claim
\n", "\n", "\n", "
\n", "\n", "
\n", "103121\n", "\n", "
prep to
\n", "\n", "\n", "
\n", "\n", "
\n", "103122\n", "\n", "
subs claim
\n", "\n", "\n", "
\n", "\n", "
\n", "103123\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "103124\n", "\n", "
prep interval
\n", "\n", "\n", "
\n", "\n", "
\n", "103125\n", "\n", "
subs stroke
\n", "\n", "\n", "
\n", "\n", "
\n", "103126\n", "\n", "
prep to
\n", "\n", "\n", "
\n", "\n", "
\n", "103127\n", "\n", "
subs stroke
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *19*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "722079\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 722079 Cmpl PP\n", "
\n", "
\n", "\n", "
\n", "125932\n", "\n", "
prep interval
\n", "\n", "\n", "
\n", "\n", "
\n", "125933\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "125934\n", "\n", "
prep interval
\n", "\n", "\n", "
\n", "\n", "
\n", "125935\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "125936\n", "\n", "
prep interval
\n", "\n", "\n", "
\n", "\n", "
\n", "125937\n", "\n", "
subs generation
\n", "\n", "\n", "
\n", "\n", "
\n", "125938\n", "\n", "
prep after
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *20*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "749909\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 749909 Adju PP\n", "
\n", "
\n", "\n", "
\n", "170284\n", "\n", "
advb even
\n", "\n", "\n", "
\n", "\n", "
\n", "170285\n", "\n", "
prep to
\n", "\n", "\n", "
\n", "\n", "
\n", "170286\n", "\n", "
nmpr David
\n", "\n", "\n", "
\n", "\n", "
\n", "170287\n", "\n", "
advb even
\n", "\n", "\n", "
\n", "\n", "
\n", "170288\n", "\n", "
prep to
\n", "\n", "\n", "
\n", "\n", "
\n", "170289\n", "\n", "
nmpr Absalom
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "adj_preps = \"\"\"\n", "\n", "phrase_atom typ=PP\n", " subphrase rela=adj\n", " word pdp=prep\n", "\"\"\"\n", "\n", "find_and_show(adj_preps)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The results above show that a preposition in an `adj` subphrase relation is likewise not a head. These cases have to be excluded.\n", "\n", "Now we move on to test the **adverb** relations reflected in the survey..." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.76s 1 result\n", "1 results\n" ] }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *1*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "883872\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 883872 Modi AdvP\n", "
\n", "
\n", "\n", "
\n", "383762\n", "\n", "
advb be many hif infa
\n", "\n", "\n", "
\n", "\n", "
\n", "383763\n", "\n", "
advb might
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "adv_adj = \"\"\"\n", "\n", "phrase_atom typ=AdvP\n", " subphrase rela=adj\n", " word pdp=advb\n", "\n", "\"\"\"\n", "\n", "find_and_show(adv_adj)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `adj` relation in the adverbial phrase is also not a true head. Now for the `mod` (modifier) relation." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.77s 49 results\n", "49 results\n" ] }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *1*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "655580\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 655580 Modi AdvP\n", "
\n", "
\n", "\n", "
\n", "7275\n", "\n", "
advb even
\n", "\n", "\n", "
\n", "\n", "
\n", "7276\n", "\n", "
advb hither
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *2*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "656048\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 656048 Modi AdvP\n", "
\n", "
\n", "\n", "
\n", "8037\n", "\n", "
advb even
\n", "\n", "\n", "
\n", "\n", "
\n", "8038\n", "\n", "
advb really
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *3*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "657025\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 657025 Modi AdvP\n", "
\n", "
\n", "\n", "
\n", "9488\n", "\n", "
advb even
\n", "\n", "\n", "
\n", "\n", "
\n", "9489\n", "\n", "
advb indeed
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *4*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "661761\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 661761 Modi AdvP\n", "
\n", "
\n", "\n", "
\n", "16648\n", "\n", "
advb even
\n", "\n", "\n", "
\n", "\n", "
\n", "16649\n", "\n", "
advb eat qal infa
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *5*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "665316\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 665316 Loca AdvP\n", "
\n", "
\n", "\n", "
\n", "22190\n", "\n", "
advb even
\n", "\n", "\n", "
\n", "\n", "
\n", "22191\n", "\n", "
advb here
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *6*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "667158\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 667158 Time AdvP\n", "
\n", "
\n", "\n", "
\n", "25005\n", "\n", "
advb even
\n", "\n", "\n", "
\n", "\n", "
\n", "25006\n", "\n", "
advb now
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *7*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "667895\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 667895 Modi AdvP\n", "
\n", "
\n", "\n", "
\n", "26052\n", "\n", "
advb even
\n", "\n", "\n", "
\n", "\n", "
\n", "26053\n", "\n", "
advb ascend qal infa
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *8*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "672097\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 672097 Modi AdvP\n", "
\n", "
\n", "\n", "
\n", "32927\n", "\n", "
advb only
\n", "\n", "\n", "
\n", "\n", "
\n", "32928\n", "\n", "
advb be far hif infa
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *9*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "697540\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 697540 Modi AdvP\n", "
\n", "
\n", "\n", "
\n", "80306\n", "\n", "
advb even
\n", "\n", "\n", "
\n", "\n", "
\n", "80307\n", "\n", "
advb rule hit infa
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *10*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "700460\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 700460 Modi AdvP\n", "
\n", "
\n", "\n", "
\n", "85109\n", "\n", "
advb even
\n", "\n", "\n", "
\n", "\n", "
\n", "85110\n", "\n", "
advb curse qal infc
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *11*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "700463\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 700463 Modi AdvP\n", "
\n", "
\n", "\n", "
\n", "85113\n", "\n", "
advb even
\n", "\n", "\n", "
\n", "\n", "
\n", "85114\n", "\n", "
advb bless piel infa
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *12*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "705225\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 705225 Adju PP\n", "
\n", "
\n", "\n", "
\n", "94602\n", "\n", "
prep to
\n", "\n", "\n", "
\n", "\n", "
\n", "94603\n", "\n", "
subs linen, part, stave
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase 705225 Adju PP\n", "
\n", "
\n", "\n", "
\n", "94604\n", "\n", "
prep from
\n", "\n", "\n", "
\n", "\n", "
\n", "94605\n", "\n", "
subs town
\n", "\n", "\n", "
\n", "\n", "
\n", "94606\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "94607\n", "\n", "
subs open country
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase 705225 Adju PP\n", "
\n", "
\n", "\n", "
\n", "94608\n", "\n", "
advb be many hif infa
\n", "\n", "\n", "
\n", "\n", "
\n", "94609\n", "\n", "
advb might
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *13*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "716546\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 716546 Modi AdvP\n", "
\n", "
\n", "\n", "
\n", "114323\n", "\n", "
advb be far hif infa
\n", "\n", "\n", "
\n", "\n", "
\n", "114324\n", "\n", "
advb might
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *14*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "719700\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 719700 Modi AdvP\n", "
\n", "
\n", "\n", "
\n", "120380\n", "\n", "
advb be many hif infa
\n", "\n", "\n", "
\n", "\n", "
\n", "120381\n", "\n", "
advb might
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *15*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "721834\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 721834 Adju PP\n", "
\n", "
\n", "\n", "
\n", "125415\n", "\n", "
prep in
\n", "\n", "\n", "
\n", "\n", "
\n", "125416\n", "\n", "
subs riches
\n", "\n", "\n", "
\n", "\n", "
\n", "125417\n", "\n", "
adjv much
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase 721834 Adju PP\n", "
\n", "
\n", "\n", "
\n", "125421\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase 721834 Adju PP\n", "
\n", "
\n", "\n", "
\n", "125422\n", "\n", "
prep in
\n", "\n", "\n", "
\n", "\n", "
\n", "125423\n", "\n", "
subs purchase
\n", "\n", "\n", "
\n", "\n", "
\n", "125424\n", "\n", "
adjv much
\n", "\n", "\n", "
\n", "\n", "
\n", "125425\n", "\n", "
advb might
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase 721834 Adju PP\n", "
\n", "
\n", "\n", "
\n", "125426\n", "\n", "
prep in
\n", "\n", "\n", "
\n", "\n", "
\n", "125427\n", "\n", "
subs silver
\n", "\n", "\n", "
\n", "\n", "
\n", "125428\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "125429\n", "\n", "
prep in
\n", "\n", "\n", "
\n", "\n", "
\n", "125430\n", "\n", "
subs gold
\n", "\n", "\n", "
\n", "\n", "
\n", "125431\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "125432\n", "\n", "
prep in
\n", "\n", "\n", "
\n", "\n", "
\n", "125433\n", "\n", "
subs bronze
\n", "\n", "\n", "
\n", "\n", "
\n", "125434\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "125435\n", "\n", "
prep in
\n", "\n", "\n", "
\n", "\n", "
\n", "125436\n", "\n", "
subs iron
\n", "\n", "\n", "
\n", "\n", "
\n", "125437\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "125438\n", "\n", "
prep in
\n", "\n", "\n", "
\n", "\n", "
\n", "125439\n", "\n", "
subs wrapper
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase 721834 Adju PP\n", "
\n", "
\n", "\n", "
\n", "125440\n", "\n", "
advb be many hif infa
\n", "\n", "\n", "
\n", "\n", "
\n", "125441\n", "\n", "
advb might
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *16*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "731154\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 731154 Modi AdvP\n", "
\n", "
\n", "\n", "
\n", "140793\n", "\n", "
advb only
\n", "\n", "\n", "
\n", "\n", "
\n", "140794\n", "\n", "
advb hurt nif infa
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *17*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "735506\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 735506 Time AdvP\n", "
\n", "
\n", "\n", "
\n", "147570\n", "\n", "
advb even
\n", "\n", "\n", "
\n", "\n", "
\n", "147571\n", "\n", "
advb now
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *18*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "741485\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 741485 Time AdvP\n", "
\n", "
\n", "\n", "
\n", "156845\n", "\n", "
advb even
\n", "\n", "\n", "
\n", "\n", "
\n", "156846\n", "\n", "
advb night
\n", "\n", "\n", "
\n", "\n", "
\n", "156847\n", "\n", "
advb even
\n", "\n", "\n", "
\n", "\n", "
\n", "156848\n", "\n", "
advb by day
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase 741485 Time AdvP\n", "
\n", "
\n", "\n", "
\n", "156849\n", "\n", "
subs whole
\n", "\n", "\n", "
\n", "\n", "
\n", "156850\n", "\n", "
subs day
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *19*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "744756\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 744756 Time AdvP\n", "
\n", "
\n", "\n", "
\n", "162031\n", "\n", "
advb even
\n", "\n", "\n", "
\n", "\n", "
\n", "162032\n", "\n", "
advb yesterday
\n", "\n", "\n", "
\n", "\n", "
\n", "162033\n", "\n", "
advb even
\n", "\n", "\n", "
\n", "\n", "
\n", "162034\n", "\n", "
advb day before yesterday
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *20*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "745299\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 745299 Time AdvP\n", "
\n", "
\n", "\n", "
\n", "162945\n", "\n", "
advb even
\n", "\n", "\n", "
\n", "\n", "
\n", "162946\n", "\n", "
advb yesterday
\n", "\n", "\n", "
\n", "\n", "
\n", "162947\n", "\n", "
advb even
\n", "\n", "\n", "
\n", "\n", "
\n", "162948\n", "\n", "
advb day before yesterday
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "adv_mod = \"\"\"\n", "\n", "phrase_atom typ=AdvP\n", " subphrase rela=mod\n", " word pdp=advb\n", "\n", "\"\"\"\n", "\n", "find_and_show(adv_mod)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case, it appears that `mod` is also an invalid relation for adverb phrases. An example is גם הלם ('also here'), where גם is the adverb in the `mod` relation, but the head is really הלם \"here\" (also an adverb). In several cases, the modifier modifies a verb. In these cases the \"head,\" often a participle or infinitive, acts as the adverb, even though it is not explicitly marked as such.\n", "\n", "Now we move on to the last examination, that of the `AdjP` (adjective phrase). There are three relations of interest:\n", "> `atr` - 6
\n", "> `adj` - 3
\n", "> `rec` - 1
" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.77s 5 results\n", "5 results\n" ] }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *1*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "661695\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 661695 PreC AdjP\n", "
\n", "
\n", "\n", "
\n", "16552\n", "\n", "
adjv twisted
\n", "\n", "\n", "
\n", "\n", "
\n", "16553\n", "\n", "
adjv speckled
\n", "\n", "\n", "
\n", "\n", "
\n", "16554\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "16555\n", "\n", "
adjv speckled
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *2*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "661716\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 661716 PreC AdjP\n", "
\n", "
\n", "\n", "
\n", "16584\n", "\n", "
adjv twisted
\n", "\n", "\n", "
\n", "\n", "
\n", "16585\n", "\n", "
adjv speckled
\n", "\n", "\n", "
\n", "\n", "
\n", "16586\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "16587\n", "\n", "
adjv speckled
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *3*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "715006\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 715006 PreC AdjP\n", "
\n", "
\n", "\n", "
\n", "111758\n", "\n", "
adjv feed qal ptcp
\n", "\n", "\n", "
\n", "\n", "
\n", "111759\n", "\n", "
subs flame
\n", "\n", "\n", "
\n", "\n", "
\n", "111760\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "111761\n", "\n", "
subs sting
\n", "\n", "\n", "
\n", "\n", "
\n", "111762\n", "\n", "
adjv bitter
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "adj_atr = \"\"\"\n", "\n", "phrase_atom typ=AdjP\n", " subphrase rela=atr\n", " word pdp=adjv\n", "\n", "\"\"\"\n", "\n", "find_and_show(adj_atr)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.78s 3 results\n", "3 results\n" ] }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *1*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "686718\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 686718 PreC AdjP\n", "
\n", "
\n", "\n", "
\n", "59788\n", "\n", "
adjv white
\n", "\n", "\n", "
\n", "\n", "
\n", "59789\n", "\n", "
adjv reddish
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *2*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "853166\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 853166 PreC AdjP\n", "
\n", "
\n", "\n", "
\n", "336212\n", "\n", "
adjv complete
\n", "\n", "\n", "
\n", "\n", "
\n", "336213\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "336214\n", "\n", "
adjv right
\n", "\n", "\n", "
\n", "\n", "
\n", "336215\n", "\n", "
adjv afraid
\n", "\n", "\n", "
\n", "\n", "
\n", "336216\n", "\n", "
subs god(s)
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *3*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "853425\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 853425 PreC AdjP\n", "
\n", "
\n", "\n", "
\n", "336595\n", "\n", "
adjv complete
\n", "\n", "\n", "
\n", "\n", "
\n", "336596\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "336597\n", "\n", "
adjv right
\n", "\n", "\n", "
\n", "\n", "
\n", "336598\n", "\n", "
adjv afraid
\n", "\n", "\n", "
\n", "\n", "
\n", "336599\n", "\n", "
subs god(s)
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "adj_adj = \"\"\"\n", "\n", "phrase_atom typ=AdjP\n", "    subphrase rela=adj\n", "        word pdp=adjv\n", "\n", "\"\"\"\n", "\n", "find_and_show(adj_adj)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.83s 1 result\n", "1 results\n" ] }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *1*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "715006\n", "\n", "\n", "\n", "
\n", "\n", "
\n", " phrase 715006 PreC AdjP\n", "
\n", "
\n", "\n", "
\n", "111758\n", "\n", "
adjv feed qal ptcp
\n", "\n", "\n", "
\n", "\n", "
\n", "111759\n", "\n", "
subs flame
\n", "\n", "\n", "
\n", "\n", "
\n", "111760\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "111761\n", "\n", "
subs sting
\n", "\n", "\n", "
\n", "\n", "
\n", "111762\n", "\n", "
adjv bitter
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "adj_rec = \"\"\"\n", "\n", "phrase_atom typ=AdjP\n", "    subphrase rela=rec\n", "        word pdp=adjv\n", "\n", "\"\"\"\n", "\n", "find_and_show(adj_rec)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The results of the three searches above confirm that adjectives in `atr`, `adj`, and `rec` relations are not head words." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tests for phrase types without a word that has a valid `pdp` value\n", "\n", "The initial survey above revealed that 837 phrase atoms and 670 phrases lack a word with a corresponding `pdp` value. Here we investigate why that is the case. Is there a way to compensate for this problem? Are these truly phrases that lack heads?\n", "\n", "We run another survey and count the phrase types against the non-matching `pdp` values found within them. At this point, we must also exclude words that have dependent relations (as defined above, independent words have only subphrase relations of NA or parallel)." 
] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "AdvP\n", "\t nmpr - 253\n", "\t subs - 499\n", "\t art - 190\n", "\t conj - 13\n", "PrNP\n", "\t subs - 9\n", "\t art - 3\n", "CP\n", "\t prep - 85\n", "\t subs - 79\n", "\t advb - 6\n", "NP\n", "\t intj - 1\n" ] } ], "source": [ "count_no_pdp = collections.defaultdict(lambda: collections.Counter())\n", "record_no_pdp = collections.defaultdict(lambda: collections.defaultdict(list))\n", "\n", "for phrase in F.otype.s(\"phrase_atom\"):\n", "\n", "    typ = F.typ.v(phrase)\n", "\n", "    # check whether no word has the corresponding `pdp` value\n", "    corres_pdp = type_to_pdp[typ]\n", "    corresponding_pdps = [w for w in L.d(phrase, \"word\") if F.pdp.v(w) == corres_pdp]\n", "\n", "    if not corresponding_pdps:\n", "\n", "        # put potential heads here\n", "        maybe_heads = []\n", "\n", "        # calculate subphrase relations\n", "        for word in L.d(phrase, \"word\"):\n", "\n", "            # get subphrase relations\n", "            word_subphrs = L.u(word, \"subphrase\")\n", "            sp_relas = set(F.rela.v(sp) for sp in word_subphrs) or {\"NA\"}\n", "\n", "            # check subphrase relations for independence\n", "            if sp_relas == {\"NA\"}:\n", "                maybe_heads.append(word)\n", "\n", "            # test parallel relation for independence\n", "            elif sp_relas == {\"NA\", \"par\"} or sp_relas == {\"par\"}:\n", "\n", "                # check for good, head mothers\n", "                good_mothers = set(\n", "                    sp for w in maybe_heads for sp in L.u(w, \"subphrase\")\n", "                )\n", "                this_daughter = [sp for sp in word_subphrs if F.rela.v(sp) == \"par\"][0]\n", "                this_mother = E.mother.f(this_daughter)\n", "\n", "                if this_mother in good_mothers:\n", "                    maybe_heads.append(word)\n", "\n", "        # sanity check\n", "        # maybe_heads should have SOMETHING\n", "        if not maybe_heads:\n", "            raise Exception(f\"phrase {phrase} looks HEADLESS!\")\n", "\n", "        # count pdp types\n", "        head_pdps = [F.pdp.v(w) for w in maybe_heads]\n", "        count_no_pdp[typ].update(head_pdps)\n", 
"\n", "        # save for examination\n", "        for word in maybe_heads:\n", "            record_no_pdp[typ][F.pdp.v(word)].append((phrase, word))\n", "\n", "for name, counts in count_no_pdp.items():\n", "\n", "    print(name)\n", "\n", "    for pdp, count in counts.items():\n", "        print(\"\\t\", pdp, \"-\", count)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These results are a bit puzzling. The numbers here are words within the phrase atoms that have NO subphrase relations. That means, for example, that words such as הַ \"the\" do not appear to have any subphrase relation to the nouns they modify. That again illustrates the shortcoming of the ETCBC data in this respect. There should be a relation from the article to the determined noun.\n", "\n", "From this point forward, I will work through all four phrase types and the cases reflected in the survey.\n", "\n", "We begin with the `AdvP` type and the article. On initial inspection, I found that in many of the `AdvP` phrases with the article, there is also a substantive (`subs`) that was found by the search. Are there any cases where no `nmpr` or `subs` is found alongside the article? We can use the dict `record_no_pdp`, which has recorded all cases reflected in the survey. Below I check whether all 190 cases of an article in these `AdvP` phrases also have a corresponding noun." ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 without nouns found...\n" ] } ], "source": [ "no_noun = []\n", "\n", "for phrase in record_no_pdp[\"AdvP\"][\"art\"]:\n", "\n", "    pdps = set(F.pdp.v(w) for w in L.d(phrase[0], \"word\"))\n", "\n", "    if not {\"nmpr\", \"subs\"} & pdps:\n", "        no_noun.append((phrase,))\n", "\n", "print(len(no_noun), \"without nouns found...\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There it is. So all cases of these articles can be discarded. 
In these cases, the noun serves as the head of the adverbial phrase. An example of this is when the noun marks the location of the action (hence adverb).\n", "\n", "Next, we check the conjunctions found in the adverbial phrases. Are any of those heads?" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "# B.show(record_no_pdp['AdvP']['conj']) # uncomment me!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All conjunctions in these `AdvP` phrases function to mark coordinate elements (only ו in these results). They can also be discarded as possible heads.\n", "\n", "Now we investigate the `PrNP` results with `subs` and `art`..." ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [], "source": [ "# B.show(record_no_pdp['PrNP']['subs']) # uncomment me!" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "# B.show(record_no_pdp['PrNP']['art']) # uncomment me!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `art` words found in the second search are not heads, but are all related to a substantive. All of the results in `subs` are heads. Thus, the only acceptable `pdp` for `PrNP` besides a proper noun is `subs`.\n", "\n", "Now we dig into the `CP` results. 85 of them have no word with a `pdp` of conjunction, but have a preposition instead. Let's see what's going on..." ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "# B.show(record_no_pdp['CP']['prep'][:20]) # uncomment me!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These are very interesting results. These conjunction phrases are made up of constructions like ב+עבור and ב+טרם. Together these words function as a conjunction, but alone they are prepositions and particles. 
Is it even possible in this case to say that there is a \"head\"?\n", "\n", "It could be said that these combinations of words mean more than the sum of their parts; they are good examples of constructions, i.e. combinations of words whose meaning cannot be inferred simply from their individual words. Constructions illustrate the vague boundary between syntax and lexicon (cf. e.g. Goldberg, 1995, *Constructions*).\n", "\n", "While these words are indeed marked as conjunction phrases, it is better in this case to analyse them as prepositional phrases (which they also are...this is another shortcoming of our data, or perhaps a mistake??). Thus, the head is the preposition, not the prepositional object.\n", "\n", "We should expect that the remaining `subs` and `advb` groups are in fact the objects of those prepositions (and hence excluded). Let's test that assumption by looking for a preposition immediately before these words..." ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "subs|advb with no preceding prepositions: 0\n" ] } ], "source": [ "no_prep = []\n", "\n", "for (phrase, word) in record_no_pdp[\"CP\"][\"subs\"] + record_no_pdp[\"CP\"][\"advb\"]:\n", "\n", "    possible_prep = word - 1\n", "\n", "    if F.sp.v(possible_prep) != \"prep\":\n", "        no_prep.append((phrase, word))\n", "\n", "print(f\"subs|advb with no preceding prepositions: {len(no_prep)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result confirms that none of the nouns or adverbs will be the head of a conjunction phrase. A preposition is the only other kind of head for the `CP` besides a conjunction itself.\n", "\n", "Finally, we're left with a last noun phrase (`NP`) for which no matching noun was found. The search found instead both `adjv` (adjective) and an `intj` (interjection). Let's see it." 
] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "# B.show(record_no_pdp['NP']['intj']) # uncomment me" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case, the word אוי \"woe\" functions like a noun. This thus appears to be another mislabelled `pdp` value, since it should read `subs`. This, like the previous example, will not receive a head value due to the mistake." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Retrieving Quantified Words\n", "\n", "When the heads algorithm looks for a noun without any subphrase relations in the phrase, it will often return a quantifier noun such as a number, e.g. שבעה \"seven\", or another quantifying descriptor such as כל. But these words function semantically in a more descriptive role than a head role. Thus, we want our algorithm to isolate quantified nouns from their quantifiers. To do that, we must first know how the ETCBC encodes the relationship between a quantifier and the quantified noun.\n", "\n", "In a previous algorithm used for quantified extraction, we looked for a nomen regens relation on the quantifier and located the noun within the related subphrase. This approach works well for the quantifier כל. But for cardinal numbers, the relation `adj` (adjunct) is often used as well (as seen in the surveys below).\n", "\n", "To illustrate with the search below, the quantifier שבעה \"seven\" has no nomen regens relation:" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [], "source": [ "# B.show([(2217,)]) # uncomment me!" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Rather than reflecting a regens/rectum relation, the second word שנים \"years,\" the quantified noun, has a subphrase relation of `adj` \"adjunct\":" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "שֶׁ֣בַע שָׁנִ֔ים וּשְׁמֹנֶ֥ה מֵאֹ֖ות שָׁנָ֑ה \n", "\n", "1301096 (subphrase)\n", "\t שָׁנִ֔ים \n", "\t rela: adj\n", "\n", "1301097 (subphrase)\n", "\t שֶׁ֣בַע שָׁנִ֔ים \n", "\t rela: NA\n", "\n" ] } ], "source": [ "print(T.text(L.d(652883, \"word\")))\n", "print()\n", "\n", "for sp in L.u(2218, \"subphrase\"):  # subphrases belonging to \"years\"\n", "    print(sp, \"(subphrase)\")\n", "    print(\"\\t\", T.text(L.d(sp, \"word\")))\n", "    print(\"\\t\", \"rela:\", F.rela.v(sp))\n", "    print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see what other kinds of subphrase relations are reflected by quantifieds.\n", "\n", "Below we make a survey of all mother-daughter relations between a quantifier subphrase and its daughters. The goal is to isolate those relationships which contain the quantified noun. We work through examples to get an idea of the meaning of the features. And we write a few TF search queries further below to confirm hypotheses about these relationships." 
] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ">XD/\n", "\t par - 62\n", "\t adj - 42\n", "\t rec - 69\n", "\t Appo - 1\n", "\t Spec - 6\n", "\t atr - 8\n", "\t mod - 4\n", "CNJM/\n", "\t rec - 345\n", "\t adj - 267\n", "\t par - 109\n", "\t mod - 14\n", "\t atr - 5\n", "\t Spec - 6\n", "\t dem - 3\n", "\t Sfxs - 1\n", ">RBH/\n", "\t rec - 30\n", "\t adj - 302\n", "\t par - 217\n", "\t atr - 2\n", "CMNH/\n", "\t adj - 128\n", "\t par - 28\n", "\t rec - 8\n", "\t mod - 1\n", "TCLP=/\n", "\t adj - 143\n", "\t rec - 33\n", "\t par - 198\n", "\t Spec - 1\n", "\t atr - 2\n", "/\n", "\t adj - 3\n", "\t par - 3\n", "XD/\n", "\t atr - 1\n", "\t Spec - 2\n", "CTJN/\n", "\t par - 1\n", "CT/\n", "TLT/\n", "\t adj - 2\n", "TRJN/\n", "\t rec - 2\n", ">LP/\n", "\t rec - 1\n", "XD/']['atr']) # uncomment me!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `Spec` relations (a `phrase_atom` `rela`) are cases where a phrase atom is used to add adjectival information about the quantifier." ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [], "source": [ "# B.show(quant_ex['>XD/']['Spec']) # uncomment me!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `mod` relation covers cases where the quantifier is modified by particles like גם or רק." ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [], "source": [ "# B.show(quant_ex['CNJM/']['mod']) # uncomment me!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `dem` relation covers cases where a demonstrative like אלה modifies the quantifier." ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [], "source": [ "# B.show(quant_ex['CB Subs Missed Results\n", "\n", "In the first test, several nouns are missed due to the presence of an adjectival element. Let's look at those cases and see what's going on. 
I have copied the phrase numbers of a few relevant examples." ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [], "source": [ "adj_examples = [(771933,), (799523,)]\n", "\n", "# B.show(adj_examples) # uncomment me" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "subphrase \ttext \trelation \tmother\n", "בֶן־ 1355711 NA ()\n", "אָמֹ֔וץ 1355712 rec (207817,)\n" ] } ], "source": [ "show_subphrases(adj_examples[0][0])" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "יְשַֽׁעְיָ֣הוּ nmpr\n", "בֶן־ subs\n", "אָמֹ֔וץ nmpr\n" ] } ], "source": [ "for word in L.d(adj_examples[0][0], \"word\"):\n", "    print(T.text([word]), F.pdp.v(word))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case, the substantive is not detected by the algorithm since it is in a dependent subphrase, a construct relation, with its modifying adjective. How can these nouns be extracted?\n", "\n", "This is very similar to the quantifier case, where the word in the rectum is actually the head (e.g. שתי שנה \"two years\" where \"two\" is registered as the head, but the substantive \"years\" is the semantic head). This kind of relationship is differentiated from non-heads by the fact that the adjective itself is independent. Thus, in cases where the adjective is independent and has a daughter rectum subphrase, the algorithm should retrieve the attributed noun.\n", "\n", "**proposed solution**: Add `adjv` to the set of acceptable `pdp` values for the `NP`. Any adjectives will be processed for dependency: most will fail that test. 
But for the dozens of cases where the adjective does not fail, the algorithm will apply a separate check for a `rec` related subphrase which contains the true head.\n", "\n", "### Participle -> Head Missed Results\n", "\n", "Other phrases that end up headless are noun phrases with a participle that serves as the nominal element but, since it has satellites, is coded as a \"verb\":" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [], "source": [ "verb_examples = [(709010,), (711593,), (756104,)]\n", "\n", "# B.show(verb_examples) # uncomment me" ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "subphrase \ttext \trelation \tmother\n", "\n", "subphrase \ttext \trelation \tmother\n", "\n", "subphrase \ttext \trelation \tmother\n", "\n" ] } ], "source": [ "for phrase in verb_examples:\n", "    show_subphrases(phrase[0])\n", "    print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are mixed cases here due to the shortcomings of the current data model. In these cases, the participle is marked as a \"verb\" since it also has objects or descriptors. In the first example above, the noun גרה functions as the *object* of the verb. The head is מעלה. But the same logic does not hold for the second or third case. In the second case, פצוע־דכא gives an *attribute* or quality of שפכה. 
In the third case, מצק \"poured\" describes an attribute of נחשת \"bronze.\" Thus the opposite of example 1 is true: the head noun is the attributed noun in the construct relation.\n", "\n", "Since the specific role of the noun or the verb is not specified at this lower phrase level, is there even a way to differentiate these cases?\n", "\n" ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "709010\n", "וְ conj\n", "\n", "711593\n", "אֵ֥ין nega\n", "\n", "756104\n", "בֵיתֹו֩ subs\n", "\n" ] } ], "source": [ "for phrase in verb_examples:\n", "    print(phrase[0])\n", "    for word in L.d(phrase[0], \"word\"):\n", "        print(T.text([word]), F.pdp.v(word))\n", "    print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It actually appears that the database treats all three the same: as adjectives at the phrase-dependent part of speech level. Thus, these cases will receive the same treatment as the adjective cases above." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `KL/` relation problems\n", "\n", "I found an instance in Numbers 3:15 where the subphrase relationship that connects כל with its quantified noun is `atr`. That is probably wrong. Are there other cases with the same problem?" 
] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kl_prob = \"\"\"\n", "\n", "sp1:subphrase\n", "    w1:word lex=KL/ st=a\n", "\n", "sp2:subphrase rela=atr\n", "\n", "sp2 -mother> sp1\n", "sp2 >> sp1\n", "w1 :: sp1\n", "\"\"\"\n", "\n", "kl_prob = sorted(B.search(kl_prob))\n", "\n", "len(kl_prob)" ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [], "source": [ "# B.show(kl_prob) # uncomment me" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It seems that the adjectives in this construction are not nominalised with a `pdp` of `subs`. Most of the findings are adjectives in construct with כל. But there are also several cases of the participle.\n", "\n", "Is this encoding correct?\n", "\n", "If the `rela` code were properly `rec`, as most are, then this would simply be a matter of adding an additional acceptable `pdp` to the list within the `get_quantified` function." ] }, { "cell_type": "code", "execution_count": 99, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# kl_prob = [r for r in kl_prob if not {'adjv'} and set(F.pdp.v(w) for w in L.d(r[2]))]\n", "\n", "len(kl_prob)" ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kl_prob = set(r[0] for r in kl_prob)\n", "\n", "len(kl_prob)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Subphrase by Subphrase Approach?\n", "\n", "This section explores switching from a word-by-word approach to a subphrase-by-subphrase approach. The first iteration of the `get_heads` function iterated word by word to identify valid heads with independent subphrase relations. 
A more efficient and methodologically sound approach would be to work from the subphrase down to the word. Here I experiment with such a method." ] }, { "cell_type": "code", "execution_count": 193, "metadata": {}, "outputs": [], "source": [ "test_phrases = [\n", "    ph\n", "    for ph in F.typ.s(\"NP\")\n", "    if len(L.d(ph, \"word\")) == 5 and F.otype.v(ph) == \"phrase\"\n", "]" ] }, { "cell_type": "code", "execution_count": 194, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "655731" ] }, "execution_count": 194, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test = test_phrases[20]\n", "\n", "test" ] }, { "cell_type": "code", "execution_count": 195, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "subphrase \ttext \trelation \tmother\n", "יְלִ֥יד בֵּֽיתְךָ֖ 1302879 NA ()\n", "יְלִ֥יד 1302877 NA ()\n", "בֵּֽיתְךָ֖ 1302878 rec (7530,)\n", "מִקְנַ֣ת כַּסְפֶּ֑ךָ 1302882 par (1302879,)\n", "מִקְנַ֣ת 1302880 NA ()\n", "כַּסְפֶּ֑ךָ 1302881 rec (7533,)\n" ] } ], "source": [ "show_subphrases(test)" ] }, { "cell_type": "code", "execution_count": 196, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1302879, 1302877, 1302880]" ] }, "execution_count": 196, "metadata": {}, "output_type": "execute_result" } ], "source": [ "head_cands = [sp for sp in L.d(test, \"subphrase\") if F.rela.v(sp) == \"NA\"]\n", "\n", "head_cands" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note above that the heads are those within NA relations that consist of single words. How consistent is this? Are there any cases where the head does not receive its own individual subphrase with an NA relation? Or are there cases of NA relations on non-head elements? Below we run a couple of tests, and then we build a primitive head finder based on this hypothesis in order to manually inspect what happens." 
] }, { "cell_type": "code", "execution_count": 201, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "example found: \n", "23 (1300568,) {'rec'}\n", "search complete with 0 results\n" ] } ], "source": [ "for word in F.otype.s(\"word\"):\n", "\n", "    subphrases = L.u(word, \"subphrase\")\n", "\n", "    if not subphrases:\n", "        continue\n", "\n", "    sp_relas = set(F.rela.v(sp) for sp in subphrases)\n", "\n", "    if not {\"NA\", \"par\"} & sp_relas:\n", "        print(\"example found: \")\n", "        print(word, subphrases, sp_relas)\n", "        break\n", "\n", "print(\"search complete with 0 results\")" ] }, { "cell_type": "code", "execution_count": 202, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'תְהֹ֑ום '" ] }, "execution_count": 202, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T.text(L.d(1300568, \"word\"))" ] }, { "cell_type": "code", "execution_count": 224, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "45161" ] }, "execution_count": 224, "metadata": {}, "output_type": "execute_result" } ], "source": [ "no_na = \"\"\"\n", "\n", "sp1:subphrase\n", "    w1:word\n", "sp2:subphrase\n", "\n", "sp2 -mother> w1\n", "\"\"\"\n", "\n", "no_na = sorted(S.search(no_na))\n", "\n", "len(no_na)" ] }, { "cell_type": "code", "execution_count": 229, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "words with construct relation and no NA subphrase: 0\n" ] } ], "source": [ "no_na_filtered = []\n", "\n", "for r in no_na:\n", "\n", "    reg = r[1]\n", "\n", "    reg_subphrases = L.u(reg, \"subphrase\")\n", "    reg_sp_relas = set(F.rela.v(sp) for sp in reg_subphrases)\n", "\n", "    if \"NA\" not in reg_sp_relas:\n", "        no_na_filtered.append(r)\n", "\n", "print(f\"words with construct relation and no NA subphrase: {len(no_na_filtered)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The search above shows that whenever a word is in a construct relation with a subphrase, an NA (no 
relation) subphrase exists.\n", "\n", "Let's broaden the inquiry a bit. What are the specific situations in which there is NO NA (no relation) subphrase at all? What kinds of relations are present? What kinds of phrases are they?" ] }, { "cell_type": "code", "execution_count": 236, "metadata": { "lines_to_end_of_cell_marker": 2 }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Counter({'NO subphrases': 215258, 'has NA': 37952})\n" ] } ], "source": [ "na_survey = collections.Counter()\n", "\n", "for phrase in F.otype.s(\"phrase\"):\n", "\n", "    subphrase_relas = tuple(\n", "        sorted(set(F.rela.v(sp) for sp in L.d(phrase, \"subphrase\")))\n", "    )\n", "\n", "    if not subphrase_relas:\n", "        na_survey[\"NO subphrases\"] += 1\n", "\n", "    elif \"NA\" in subphrase_relas:\n", "        na_survey[\"has NA\"] += 1\n", "\n", "    else:\n", "        na_survey[subphrase_relas] += 1\n", "\n", "pprint(na_survey)" ] }, { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 2 }, "source": [ "This count shows that there are only two situations in the data: either\n", "\n", "1) a phrase has no subphrases present, or\n", "\n", "2) it has a subphrase with a relation of \"NA\". There are NO cases of phrases that lack an NA subphrase but have other relations. That is good for our hypothesis...\n", "\n", "In the experiment below, two important assumptions are made about the head:\n", "\n", "**First**, it is assumed that **the head is the first valid `pdp` word in the phrase**, with the exception of quantifieds and attributed nouns, which are handled differently.\n", "\n", "**Second**, it is assumed that the **first NA-relation subphrase contains the head**. We test that assumption by manually inspecting the output." 
] }, { "cell_type": "code", "execution_count": 292, "metadata": {}, "outputs": [], "source": [ "def primitive_head_hunter(phrase):\n", "\n", "    \"\"\"\n", "    Looks at noun phrases for heads.\n", "    \"\"\"\n", "\n", "    good_pdp = {\"subs\", \"nmpr\"}\n", "\n", "    subphrase_candidates = [\n", "        sp\n", "        for sp in L.d(phrase, \"subphrase\")\n", "        if F.rela.v(sp) == \"NA\" and F.rela.v(L.u(sp, \"phrase_atom\")[0]) == \"NA\"\n", "    ]\n", "\n", "    # handle simple phrases\n", "    if not subphrase_candidates:\n", "        head_candidates = [w for w in L.d(phrase, \"word\") if F.pdp.v(w) in good_pdp]\n", "        try:\n", "            return (head_candidates[0],)\n", "        except IndexError:\n", "            # no valid head word; report and stop here instead of falling through\n", "            print(f\"exception at {phrase}\")\n", "            return None\n", "\n", "    # attempt simple head assignment\n", "    first_na_subphrase = subphrase_candidates[0]\n", "    try:\n", "        the_head = next(\n", "            w for w in L.d(first_na_subphrase, \"word\") if F.pdp.v(w) in good_pdp\n", "        )\n", "        return (the_head,)\n", "    except StopIteration:\n", "        if F.pdp.v(L.d(first_na_subphrase, \"word\")[0]) == \"adjv\":\n", "            pass\n", "        else:\n", "            raise Exception(phrase)" ] }, { "cell_type": "code", "execution_count": 296, "metadata": {}, "outputs": [], "source": [ "test_results = [primitive_head_hunter(ph) for ph in test_phrases]\n", "\n", "random.shuffle(test_results)" ] }, { "cell_type": "code", "execution_count": 298, "metadata": {}, "outputs": [], "source": [ "# B.show(test_results) # uncomment me" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As it turns out, the assumption about the first NA-relation subphrase is workable. But the complications of this approach (explained below) make it an unlikely solution for now." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Conclusion\n", "\n", "I've done some initial testing with the subphrase by subphrase approach. It is a promising method, but requires a more complicated implementation with nested searches through each level of the phrase hierarchy. 
A simple subphrase by subphrase approach is not sufficient: one needs to go phrase by phrase, `phrase_atom` by `phrase_atom`, `subphrase` by `subphrase`, and even beyond. It is a recursive problem that cannot be navigated with the present, limited data model. There is more to say about the present state of the data model which I will save for the final report.\n", "\n", "At present, the word-by-word approach provides an elegant (though limited) solution that is able to navigate the quirks of the present data model and provide an acceptable level of accuracy, with some exceptions for more complicated phrase constructions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Handling Parallels\n", "\n", "What is the best way to handle parallel head elements? In general, a phrase has only one real \"head\". That is, often the first head element determines the grammatical gender or number of the verb (thanks to Constantijn Sikkel for this conversation). Yet, the nouns which are coordinate to the head are often of interest for both grammatical and semantic studies.\n", "\n", "There are two approaches to collecting coordinate heads. Option 1 is to check, for every word with a relation of \"parallel\", whether its mother is already established as a head. Option 2 is to recursively search for nouns that are coordinate with the head word. Up until this inquiry, I have opted for option 1 due to the complexity of checking the necessary relationships for a head candidate. But a phrase in Deuteronomy 12:17 is missed by the current approach, since it contains a chain of head nouns in construct with the quantifier מעשר \"tenth\".\n", "\n", "It is possible to edit the algorithm to accommodate these cases. But the example raises the broader question of whether option 1 is truly sufficient and methodologically sound. In this section, I test whether option 2 is a better alternative. 
First, we are unsure about how to separate a head word from a larger, paralleled subphrase. In option 1, individual words are tested, each for dependent relationships. But option 2 will go the opposite direction: beginning at the subphrase level and working down to the word. Does this affect our ability to separate the head noun of the phrase?" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "def OLD_get_heads(phrase):\n", "    \"\"\"\n", "    Extracts and returns the heads of a supplied\n", "    phrase or phrase atom based on that phrase's type\n", "    and the relations reflected within the phrase.\n", "\n", "    --input--\n", "    phrase(atom) node number\n", "\n", "    --output--\n", "    tuple of head word node(s)\n", "    \"\"\"\n", "\n", "    # mapping from phrase type to good part of speech values for heads\n", "    head_pdps = {\n", "        \"VP\": {\"verb\"},  # verb\n", "        \"NP\": {\"subs\", \"adjv\", \"nmpr\"},  # noun\n", "        \"PrNP\": {\"nmpr\", \"subs\"},  # proper-noun\n", "        \"AdvP\": {\"advb\", \"nmpr\", \"subs\"},  # adverbial\n", "        \"PP\": {\"prep\"},  # prepositional\n", "        \"CP\": {\"conj\", \"prep\"},  # conjunctive\n", "        \"PPrP\": {\"prps\"},  # personal pronoun\n", "        \"DPrP\": {\"prde\"},  # demonstrative pronoun\n", "        \"IPrP\": {\"prin\"},  # interrogative pronoun\n", "        \"InjP\": {\"intj\"},  # interjectional\n", "        \"NegP\": {\"nega\"},  # negative\n", "        \"InrP\": {\"inrg\"},  # interrogative\n", "        \"AdjP\": {\"adjv\"},  # adjective\n", "    }\n", "\n", "    # get phrase-head's part of speech value and list of candidate matches\n", "    phrase_type = F.typ.v(phrase)\n", "    head_candidates = [\n", "        w for w in L.d(phrase, \"word\") if F.pdp.v(w) in head_pdps[phrase_type]\n", "    ]\n", "\n", "    # VP with verbs require no further processing, return the head verb\n", "    if phrase_type == \"VP\":\n", "        return tuple(head_candidates)\n", "\n", "    # go head-hunting!\n", "    heads = []\n", "\n", "    for word in head_candidates:\n", "\n", "        # gather the word's subphrase (+ 
phrase_atom if otype is phrase) relations\n", " word_phrases = list(L.u(word, \"subphrase\"))\n", " word_phrases += (\n", " list(L.u(word, \"phrase_atom\"))\n", " if (F.otype.v(phrase) == \"phrase\")\n", " else list()\n", " )\n", " word_relas = set(F.rela.v(phr) for phr in word_phrases) or {\"NA\"}\n", "\n", " # check (sub)phrase relations for independency\n", " if word_relas - {\"NA\", \"par\", \"Para\"}:\n", " continue\n", "\n", " # check parallel relations for independency\n", " elif word_relas & {\"par\", \"Para\"} and mother_is_head(word_phrases, heads):\n", " this_head = find_quantified(word) or find_attributed(word) or word\n", " heads.append(this_head)\n", "\n", " # save all others as heads, check for quantifiers first\n", " elif word_relas == {\"NA\"}:\n", " this_head = find_quantified(word) or find_attributed(word) or word\n", " heads.append(this_head)\n", "\n", " return tuple(sorted(set(heads)))\n", "\n", "\n", "def mother_is_head(word_phrases, previous_heads):\n", "\n", " \"\"\"\n", " Test and validate parallel relationships for independency.\n", " Must gather the mother for each relation and check whether\n", " the mother contains a head word.\n", "\n", " --input--\n", " * list of phrase nodes for a given word (includes subphrases)\n", " * list of previously approved heads\n", "\n", " --output--\n", " boolean\n", " \"\"\"\n", "\n", " # get word's enclosing phrases that are parallel\n", " parallel_phrases = [ph for ph in word_phrases if F.rela.v(ph) in {\"par\", \"Para\"}]\n", " # get the mother for the parallel phrases\n", " parallel_mothers = [E.mother.f(ph)[0] for ph in parallel_phrases]\n", " # get mothers' words, by mother\n", " parallel_mom_words = [set(L.d(mom, \"word\")) for mom in parallel_mothers]\n", " # test for head in each mother\n", " test_mothers = [\n", " bool(phrs_words & set(previous_heads)) for phrs_words in parallel_mom_words\n", " ]\n", "\n", " return all(test_mothers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"### How many subphrases with a parallel relation to a validated head consist of more than one word?\n", "\n", "We take the first head element for every noun phrase and check its parallel elements." ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "length: 1\n", "\t 3346\n", "length: 2\n", "\t 686\n", "length: 3\n", "\t 86\n", "length: 4\n", "\t 24\n", "length: 9\n", "\t 10\n", "length: 5\n", "\t 4\n", "length: 6\n", "\t 3\n" ] } ], "source": [ "par_word_count = collections.Counter()\n", "par_word_list = collections.defaultdict(list)\n", "\n", "for np in F.typ.s(\"NP\"):\n", "\n", "    heads = OLD_get_heads(np)\n", "\n", "    if not heads:\n", "        continue\n", "\n", "    the_head = heads[0]\n", "\n", "    if not L.u(the_head, \"subphrase\"):\n", "        continue\n", "\n", "    head_smallest_sp = sorted(sp for sp in L.u(the_head, \"subphrase\"))[0]\n", "\n", "    par_daughter = [d for d in E.mother.t(head_smallest_sp) if F.rela.v(d) == \"par\"]\n", "\n", "    for pd in par_daughter:\n", "\n", "        # measure the current daughter (pd), not only the first one in the list\n", "        word_length = len(L.d(pd, \"word\"))\n", "\n", "        par_word_count[word_length] += 1\n", "        par_word_list[word_length].append((the_head, head_smallest_sp))\n", "\n", "for w_count, count in par_word_count.items():\n", "    print(\"length:\", w_count)\n", "    print(\"\\t\", count)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see some of the larger cases..." ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "\n", "\n", "**phrase** *1*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 713759 Objc NP\n", "109564 art the | 109565 subs proving | 109566 art the | 109567 adjv great\n", "phrase 713759 Objc NP\n", "109571 art the | 109572 subs sign | 109573 conj and | 109574 art the | 109575 subs sign | 109576 art the | 109577 adjv great | 109578 art the | 109579 prde they
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *2*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 897414 Objc NP\n", "412359 subs chief | 412360 conj and | 412361 subs supply | 412362 subs food | 412363 conj and | 412364 subs oil | 412365 conj and | 412366 subs wine
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "B.show(par_word_list[6], condenseType=\"phrase\", withNodes=True)" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1409887 par אֹצְרֹ֥ות מַאֲכָ֖ל וְשֶׁ֥מֶן וָיָֽיִן׃ \n" ] } ], "source": [ "ex_subphrase = par_word_list[6][1][1]\n", "\n", "for daughter in [d for d in E.mother.t(ex_subphrase) if F.rela.v(d) == \"par\"]:\n", "\n", "    print(daughter, F.rela.v(daughter), T.text(L.d(daughter, \"word\")))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And now some examples where the parallel is two words long..." ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "\n", "\n", "**phrase** *1*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 651919 Subj NP\n", "676 art the | 677 subs heavens | 678 conj and | 679 art the | 680 subs earth | 681 conj and | 682 subs whole | 683 subs service
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *2*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 652841 Time NP\n", "2152 subs three | 2153 conj and | 2154 subs hundred | 2155 subs year
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**phrase** *3*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
phrase 653522 Time NP\n", "3540 subs five | 3541 conj and | 3542 subs hundred | 3543 subs day
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "B.show(par_word_list[2][:5], condenseType=\"phrase\", withNodes=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These examples raise an important possibility. If we take the first word labeled \"subs\" (substantive) within the parallel, will that give us the coordinate head?\n", "\n", "Below is an example from Genesis 5:3 that shows a potential pitfall of option 2, and even of the current approach." ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "652841 שְׁלֹשִׁ֤ים וּמְאַת֙ שָׁנָ֔ה \n", "subphrase \ttext \trelation \tmother\n", "שְׁלֹשִׁ֤ים 1301044 NA ()\n", "מְאַת֙ שָׁנָ֔ה 1301047 par (1301044,)\n", "מְאַת֙ 1301045 NA ()\n", "שָׁנָ֔ה 1301046 rec (2154,)\n" ] } ], "source": [ "example = L.d(T.nodeFromSection((\"Genesis\", 5, 3)), \"phrase\")[3]\n", "\n", "print(example, T.text(L.d(example, \"word\")))\n", "show_subphrases(example)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The example above illustrates the shortcoming of even the current method of separating quantifiers and quantifieds, as seen in this result:" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'שְׁלֹשִׁ֤ים |שָׁנָ֔ה '" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"|\".join(T.text([h]) for h in get_heads(example))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The algorithm retrieves both \"thirty\" and \"year,\" even though the only head in this case is \"year\". This is a shortcoming of the quantifier function, which in this case has not detected a complex quantifier that is formed with a parallel relation.\n", "\n", "The quantifier algorithm should have passed שלשים along to another test before validating it as a head. 
That is, it should look for this case of a complex quantifier. This is actually another good reason to change the parallels finder to option 2, so that parallels are processed at the head level rather than disconnected from it. In this setup, the algorithm will gather all parallels to the head. If the syntactic head is a quantifier with no quantified noun of its own, the algorithm will look further at the parallel relationship to see whether the parallel element is also a quantifier. If it is, the algorithm will find that quantifier's substantive and return it instead. This is a complex recursive process that will have to be coded." ] }, { "cell_type": "code", "execution_count": 157, "metadata": {}, "outputs": [], "source": [ "# B.show(par_word_list[3][:15]) # uncomment me" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Are there cases where multiple coordinate relations attach to a single subphrase?" ] }, { "cell_type": "code", "execution_count": 160, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 160, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_multiple_coor = \"\"\"\n", "\n", "sp1:subphrase\n", "sp2:subphrase rela=par\n", "sp3:subphrase rela=par\n", "\n", "sp2 -mother> sp1\n", "sp3 -mother> sp1\n", "\n", "sp2 # sp3\n", "\"\"\"\n", "\n", "test_multiple_coor = sorted(S.search(test_multiple_coor))\n", "\n", "len(test_multiple_coor)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "No, this does not happen: the search returns no results. Coordinate relations are therefore chained to one another rather than attached to a single shared mother." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Conclusions\n", "This inquiry prompted the preceding one on the subphrase-by-subphrase approach. We have decided to table this method for now."
] } ], "metadata": { "jupytext": { "encoding": "# -*- coding: utf-8 -*-" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 2 }