{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "You might want to consider the [start](search.ipynb) of this tutorial." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T10:06:39.818664Z", "start_time": "2018-05-24T10:06:39.796588Z" } }, "outputs": [], "source": [ "from tf.fabric import Fabric\n", "from tf.extra.bhsa import Bhsa" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T10:06:41.254515Z", "start_time": "2018-05-24T10:06:41.238046Z" } }, "outputs": [], "source": [ "VERSION = '2017'\n", "DATABASE = '~/github/etcbc'\n", "BHSA = f'bhsa/tf/{VERSION}'\n", "PARA = f'parallels/tf/{VERSION}'\n", "TF = Fabric(locations=[DATABASE], modules=[BHSA, PARA], silent=True )" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T10:06:48.865143Z", "start_time": "2018-05-24T10:06:44.712958Z" } }, "outputs": [ { "data": { "text/markdown": [ "**Documentation:** BHSA Feature docs BHSA API Text-Fabric API 5.0.4 Search Reference" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "This notebook online:\n", "NBViewer\n", "GitHub\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "api = TF.load('', silent=True)\n", "api.makeAvailableIn(globals())\n", "B = Bhsa(api, 'search', version=VERSION)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Gaps and spans\n", "\n", "Searches often do not deliver the results you expect.\n", "Besides typos, lack of familiarity with the template formalism and bugs in the system, there 
is\n", "another cause: **difficult semantics of the data**.\n", "\n", "Most users reason about phrases, clauses and sentences as if they are consecutive blocks of words.\n", "But in the BHSA this is not the case: each of these objects may have **gaps**.\n", "\n", "Most of the time, verse boundaries coincide with the boundaries of sentences, clauses, and phrases.\n", "But not always: there are sentences **spanning** more than one verse.\n", "\n", "These phenomena may wreak havoc with your intuitive reasoning about what search templates should deliver.\n", "\n", "We are going to show these issues in depth." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Gaps\n", "\n", "Search has no direct primitives to deal with gaps.\n", "Nodes correspond to textual objects such as words, phrases, clauses, verses, books.\n", "Usually these are consecutive sequences of words, but in theory they can be arbitrary sets of slots.\n", "\n", "And, as far as the BHSA corpus is concerned, in practice too.\n", "If we look at phrases, then the overwhelming majority is consecutive, so without gaps,\n", "but gaps in phrases do occur and are not even exceptional.\n", "\n", "People who are familiar with MQL (see [fromMQL](searchFromMQL.ipynb))\n", "may remember that in MQL you can search for a gap.\n", "The MQL query\n", "\n", "```\n", "SELECT ALL OBJECTS WHERE\n", "\n", "[phrase FOCUS\n", " [word lex='L']\n", " [gap]\n", "]\n", "```\n", "\n", "looks for a phrase with a gap in it\n", "(i.e. one or more consecutive words between the start and the end of the phrase\n", "that do not belong to the phrase).\n", "The query additionally requires that the phrase has a certain word right in front of the gap.\n", "\n", "**We want this too!**\n", "\n", "> **Note**\n", " The fact that sentences, clauses, and phrases may not be coherent chunks wreaks havoc with your\n", " basic intuitions about textual objects. 
Our query templates do not require the objects to be consecutive and\n", " still they make sense. But that might not be your sense, unless you **Mind the gap!**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Find the gap\n", "\n", "We start with a query that aims to get the same results as the MQL query above.\n", "\n", "A phrase `p` is gapped if there is a word between two of its words that does not belong to it.\n", "\n", "In our template, we require that there is a word `wPreGap` in the phrase that is just before the gap,\n", "and a word `wGap` that comes right after it, so it is in the gap and hence does not belong to the phrase.\n", "All this must happen before the last word `wLast` of the phrase." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T10:09:32.685437Z", "start_time": "2018-05-24T10:09:32.680670Z" } }, "outputs": [], "source": [ "query = '''\n", "verse\n", "  p:phrase\n", "    wPreGap:word lex=L\n", "    wLast:word\n", "    :=\n", "\n", "wGap:word\n", "wPreGap <: wGap\n", "wGap < wLast\n", "p || wGap\n", "'''" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 1.13s 13 results\n" ] } ], "source": [ "results = B.search(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nice and quick.\n", "Let's see the results." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T10:09:43.410941Z", "start_time": "2018-05-24T10:09:43.194596Z" } }, "outputs": [ { "data": { "text/markdown": [ "n | verse | phrase | word | word | word\n", "--- | --- | --- | --- | --- | ---\n", "1|Genesis 17:7|לְךָ֙ וּֽלְזַרְעֲךָ֖ אַחֲרֶֽיךָ׃ |לְךָ֙ |אַחֲרֶֽיךָ׃ |לֵֽ\n", "2|Genesis 28:4|לְךָ֙ לְךָ֖ וּלְזַרְעֲךָ֣ אִתָּ֑ךְ |לְךָ֙ |אִתָּ֑ךְ |אֶת־\n", "3|Genesis 31:16|לָ֥נוּ וּלְבָנֵ֑ינוּ |לָ֥נוּ |בָנֵ֑ינוּ |ה֖וּא \n", "4|Exodus 30:21|לָהֶ֧ם לֹ֥ו וּלְזַרְעֹ֖ו |לָהֶ֧ם |זַרְעֹ֖ו |חָק־\n", "5|Leviticus 25:6|לָכֶם֙ לְךָ֖ וּלְעַבְדְּךָ֣ וְלַאֲמָתֶ֑ךָ וְלִשְׂכִֽירְךָ֙ וּלְתֹושָׁ֣בְךָ֔ |לָכֶם֙ |תֹושָׁ֣בְךָ֔ |לְ\n", "6|Numbers 20:15|לָ֛נוּ וְלַאֲבֹתֵֽינוּ׃ |לָ֛נוּ |אֲבֹתֵֽינוּ׃ |מִצְרַ֖יִם \n", "7|Numbers 32:33|לָהֶ֣ם׀ לִבְנֵי־גָד֩ וְלִבְנֵ֨י רְאוּבֵ֜ן וְלַחֲצִ֣י׀ שֵׁ֣בֶט׀ מְנַשֶּׁ֣ה בֶן־יֹוסֵ֗ף |לָהֶ֣ם׀ |יֹוסֵ֗ף |מֹשֶׁ֡ה \n", "8|Deuteronomy 1:36|לֹֽו־וּלְבָנָ֑יו |לֹֽו־|בָנָ֑יו |אֶתֵּ֧ן \n", "9|Deuteronomy 26:11|לְךָ֛ וּלְבֵיתֶ֑ךָ |לְךָ֛ |בֵיתֶ֑ךָ |יְהוָ֥ה \n", "10|1_Samuel 25:31|לְךָ֡ לַאדֹנִ֗י |לְךָ֡ |אדֹנִ֗י |לְ\n", "11|2_Kings 25:24|לָהֶ֤ם וּלְאַנְשֵׁיהֶ֔ם |לָהֶ֤ם |אַנְשֵׁיהֶ֔ם |גְּדַלְיָ֨הוּ֙ \n", "12|Jeremiah 40:9|לָהֶ֜ם וּלְאַנְשֵׁיהֶ֣ם |לָהֶ֜ם |אַנְשֵׁיהֶ֣ם |גְּדַלְיָ֨הוּ \n", "13|Daniel 9:8|לָ֚נוּ לִמְלָכֵ֥ינוּ לְשָׂרֵ֖ינוּ וְלַאֲבֹתֵ֑ינוּ |לָ֚נוּ |אֲבֹתֵ֑ינוּ |בֹּ֣שֶׁת " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "B.table(results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's color the word in the gap differently." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2018-05-18T09:16:40.841646Z", "start_time": "2018-05-18T09:16:40.654538Z" }, "scrolled": false }, "outputs": [ { "data": { "text/markdown": [ "\n", "\n", "**result** *1*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Display of result 1, labels in reading order:
sentence 18
  clause WQt0
    phrase Conj CP: conj and
    phrase Pred VP: verb arise hif perf
    phrase Objc PP: prep <object marker>; subs covenant
    phrase Cmpl PP: prep interval; conj and; prep interval; conj and; prep interval; subs seed
    phrase Cmpl PP: prep after
    phrase Cmpl PP: prep to; subs generation
    phrase Cmpl PP: prep to; subs covenant; subs eternity
  clause Adju InfC
    phrase Pred VP: prep to; verb be qal infc
    phrase Cmpl PP: prep to
    phrase PreC PP: prep to; subs god(s)
    phrase Cmpl PP: conj and
    phrase Cmpl PP: prep to; subs seed
    phrase Cmpl PP: prep after
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**result** *2*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Display of result 2, labels in reading order:
sentence 13
  clause WYq0
    phrase Conj CP: conj and
    phrase Pred VP: verb give qal impf
    phrase Cmpl PP: prep to
    phrase Objc PP: prep <object marker>; subs blessing; nmpr Abraham
    phrase Cmpl PP: prep to; conj and; prep to; subs seed
    phrase Cmpl PP: prep together with
  clause Adju InfC
    phrase PreS VP: prep to; verb trample down qal infc
    phrase Objc PP: prep <object marker>; subs earth; subs neighbourhood
  clause Attr xQtX
    phrase Rela CP: conj <relative>
    phrase Pred VP: verb give qal perf
    phrase Subj NP: subs god(s)
    phrase Cmpl PP: prep to; nmpr Abraham
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**result** *3*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Display of result 3, labels in reading order:
sentence 49
  clause CPen
    phrase Conj CP: conj that
    phrase Frnt NP: subs whole; art the; subs riches
  clause Attr xQtX
    phrase Rela CP: conj <relative>
    phrase Pred VP: verb deliver hif perf
    phrase Subj NP: subs god(s)
    phrase Cmpl PP: prep from; subs father
  clause Resu NmCl
    phrase PreC PP: prep to
    phrase Subj PPrP: prps he
    phrase PreC PP: conj and
    phrase PreC PP: prep to; subs son
sentence 50
  clause MSyn
    phrase Conj CP: conj and
    phrase Time AdvP: advb now
sentence 51
  clause xIm0
    phrase Objc NP: subs whole
  clause Attr xQtX
    phrase Rela CP: conj <relative>
    phrase Pred VP: verb say qal perf
    phrase Subj NP: subs god(s)
    phrase Cmpl PP: prep to
  clause xIm0
    phrase Pred VP: verb make qal impv
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "B.show(results, end=3, condensed=False, colorMap={2: 'lightyellow', 3: 'yellow', 5: 'magenta'})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## All gapped phrases\n", "\n", "These were particular gaps.\n", "Now we want to get *all* gapped phrases.\n", "\n", "We can simply drop the requirement that \n", "the word `wPreGap` has to satisfy a special lexical condition." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T10:09:32.685437Z", "start_time": "2018-05-24T10:09:32.680670Z" } }, "outputs": [], "source": [ "query = '''\n", "p:phrase\n", "  wPreGap:word\n", "  wLast:word\n", "  :=\n", "\n", "wGap:word\n", "wPreGap <: wGap\n", "wGap < wLast\n", "\n", "p || wGap\n", "'''" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 3.55s 715 results\n" ] } ], "source": [ "results = B.search(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Not too bad! We could wait for it. Here are some results." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "n | phrase | word | word | word\n", "--- | --- | --- | --- | ---\n", "1|בֵּ֤ין הַמַּ֨יִם֙ וּבֵ֣ין הַמַּ֔יִם |מַּ֨יִם֙ |מַּ֔יִם |אֲשֶׁר֙ \n", "2|דֶּ֔שֶׁא עֵ֚שֶׂב עֵ֣ץ פְּרִ֞י |עֵ֚שֶׂב |פְּרִ֞י |מַזְרִ֣יעַ \n", "3|דֶּ֠שֶׁא עֵ֣שֶׂב וְעֵ֧ץ |עֵ֣שֶׂב |עֵ֧ץ |מַזְרִ֤יעַ \n", "4|אֶת־כָּל־עֵ֣שֶׂב׀ וְאֶת־כָּל־הָעֵ֛ץ |עֵ֣שֶׂב׀ |עֵ֛ץ |זֹרֵ֣עַ \n", "5|שְׁנֵיהֶם֙ הָֽאָדָ֖ם וְאִשְׁתֹּ֑ו |שְׁנֵיהֶם֙ |אִשְׁתֹּ֑ו |עֲרוּמִּ֔ים \n", "6|הֶ֨בֶל גַם־ה֛וּא |הֶ֨בֶל |ה֛וּא |הֵבִ֥יא \n", "7|מִן־הַבְּהֵמָה֙ הַטְּהֹורָ֔ה וּמִן־הַ֨בְּהֵמָ֔ה וּמִ֨ן־הָעֹ֔וף וְכֹ֥ל |בְּהֵמָ֔ה |כֹ֥ל |אֲשֶׁ֥ר \n", "8|הֵ֜מָּה וְכָל־הַֽחַיָּ֣ה לְמִינָ֗הּ וְכָל־הַבְּהֵמָה֙ לְמִינָ֔הּ וְכָל־הָרֶ֛מֶשׂ לְמִינֵ֑הוּ וְכָל־הָעֹ֣וף לְמִינֵ֔הוּ כֹּ֖ל צִפֹּ֥ור כָּל־כָּנָֽף׃ |רֶ֛מֶשׂ |כָּנָֽף׃ |הָ\n", "9|כָּל־בָּשָׂ֣ר׀ בָּעֹ֤וף וּבַבְּהֵמָה֙ וּבַ֣חַיָּ֔ה וּבְכָל־הַשֶּׁ֖רֶץ וְכֹ֖ל הָאָדָֽם׃ |בָּשָׂ֣ר׀ |אָדָֽם׃ |הָ\n", "10|כָּל־בָּשָׂ֣ר׀ בָּעֹ֤וף וּבַבְּהֵמָה֙ וּבַ֣חַיָּ֔ה וּבְכָל־הַשֶּׁ֖רֶץ וְכֹ֖ל הָאָדָֽם׃ |שֶּׁ֖רֶץ |אָדָֽם׃ |הַ" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "B.table(results, end=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If a phrase has multiple gaps, we encounter it multiple times in our results.\n", "\n", "We show the two condensed results in Genesis 7:21." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "\n", "\n", "**verse** *9*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Display of verse 9, labels in reading order:
sentence 32
  clause WayX
    phrase Conj CP: conj and
    phrase Pred VP: verb expire qal wayq
    phrase Subj NP: subs whole; subs flesh
  clause Attr Ptcp
    phrase Rela CP: conj the
    phrase PreC VP: verb creep qal ptca
    phrase Cmpl PP: prep upon; art the; subs earth
  clause WayX
    phrase Subj NP: prep in; art the; subs birds; conj and; prep in; art the; subs cattle; conj and; prep in; art the; subs wild animal; conj and; prep in; subs whole; art the; subs swarming creatures
  clause Attr Ptcp
    phrase Rela CP: conj the
    phrase PreC VP: verb swarm qal ptca
    phrase Cmpl PP: prep upon; art the; subs earth
  clause WayX
    phrase Subj NP: conj and
    phrase Subj NP: subs whole; art the; subs human, mankind
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**verse** *10*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Display of verse 10, labels in reading order:
sentence 6
  clause WXYq
    phrase Conj CP: conj and
    phrase Subj NP: subs fear; conj and; subs terror
    phrase Pred VP: verb be qal impf
    phrase PreC PP: prep upon; subs whole; subs wild animal; art the; subs earth; conj and; prep upon; subs whole; subs birds; art the; subs heavens
  clause Coor NmCl
    phrase PreC PP: prep in; subs whole
  clause Attr xYq0
    phrase Rela CP: conj <relative>
    phrase Pred VP: verb creep qal impf
    phrase Cmpl NP: art the; subs soil
  clause Coor NmCl
    phrase PreC PP: conj and
    phrase PreC PP: prep in; subs whole; subs fish; art the; subs sea
sentence 7
  clause xQt0
    phrase Cmpl PP: prep in; subs hand
    phrase Pred VP: verb give nif perf
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "B.show(results, condensed=True, start=9, end=10, colorMap={1: 'lightyellow', 2: 'yellow', 4: 'magenta'})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want just the phrases, each of them only once, we can run the query in shallow mode (see [advanced](searchAdvanced.ipynb)):" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 3.49s 671 results\n" ] } ], "source": [ "gapQueryResults = B.search(query, shallow=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A different query\n", "\n", "We can formulate a different query that should find the same gapped phrases." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:41:30.980164Z", "start_time": "2018-05-24T08:41:30.974422Z" } }, "outputs": [], "source": [ "query = '''\n", "p:phrase\n", "  =: wFirst:word\n", "  wLast:word\n", "  :=\n", "\n", "wGap:word\n", "wFirst < wGap\n", "wLast > wGap\n", "\n", "p || wGap\n", "'''" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Experience has shown that this is a slow query, so we handle it with care." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " | 0.00s Feature overview: 109 for nodes; 8 for edges; 1 configs; 7 computed\n", " 0.00s Checking search template ...\n", " 0.00s Setting up search space for 4 objects ...\n", " 0.38s Constraining search space with 7 relations ...\n", " 0.40s Setting up retrieval plan ...\n", " 0.45s Ready to deliver results from 1532939 nodes\n", "Iterate over S.fetch() to get the results\n", "See S.showPlan() to interpret the results\n", " 0.00s Counting results per 1 up to 8 ...\n", " | 43s 1\n", " | 43s 2\n", " | 43s 3\n", " | 43s 4\n", " | 43s 5\n", " | 43s 6\n", " | 1m 16s 7\n", " | 1m 16s 8\n", " 1m 16s Done: 8 results\n" ] } ], "source": [ "S.study(query)\n", "S.count(progress=1, limit=8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a good example of a query that is slow to deliver even its first result.\n", "And that is bad, because it is such a straightforward query.\n", "\n", "Why is this one so slow, while the previous one went so smoothly?\n", "\n", "The crucial thing is the `wGap` word. In the latter template, `wGap` is not embedded in anything.\n", "It is constrained by `wFirst < wGap` and `wGap < wLast`.\n", "However, the search strategy works by examining all possibilities for `wFirst < wGap`\n", "and only then checking whether `wGap < wLast`.\n", "The algorithm cannot check both conditions at the same time.\n", "\n", "With embedding relations, things are better. Text-Fabric is heavily optimized to deal with embedding\n", "relationships.\n", "\n", "In the former template, `wGap` is required to be adjacent to `wPreGap`, and `wPreGap`\n", "is embedded in the phrase. 
Hence there are few cases to consider for `wPreGap`, and per instance\n", "there is only one `wGap`.\n", "\n", "> **Lesson**\n", "Avoid *free-floating* nodes in your template that are constrained\n", "by spatial relationships other than embedding." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### To the rescue\n", "The former template had it right.\n", "Can we rescue the latter template?\n", "\n", "We can assume that the phrase and the gap both contain a word that belongs to the same verse.\n", "Note that the phrase and the gap may belong to different clauses and sentences.\n", "We assume that a phrase cannot belong to more than two verses, so either the first or the last word\n", "of the phrase is in the same verse as a word in the gap." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:41:30.980164Z", "start_time": "2018-05-24T08:41:30.974422Z" } }, "outputs": [], "source": [ "query = '''\n", "p:phrase\n", "  =: wFirst:word\n", "  wLast:word\n", "  :=\n", "\n", "wGap:word\n", "wFirst < wGap\n", "wLast > wGap\n", "\n", "p || wGap\n", "\n", "v:verse\n", "\n", "v [[ wFirst\n", "v [[ wGap\n", "'''" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " | 0.00s Feature overview: 109 for nodes; 8 for edges; 1 configs; 7 computed\n", " 0.00s Checking search template ...\n", " 0.00s Setting up search space for 5 objects ...\n", " 0.36s Constraining search space with 9 relations ...\n", " 0.38s Setting up retrieval plan ...\n", " 0.45s Ready to deliver results from 1556152 nodes\n", "Iterate over S.fetch() to get the results\n", "See S.showPlan() to interpret the results\n", " 0.00s Counting results per 100 up to 3000 ...\n", " | 0.50s 100\n", " | 1.08s 200\n", " | 1.48s 300\n", " | 1.67s 400\n", " | 1.81s 500\n", " | 2.17s 600\n", " | 2.25s 700\n", " | 2.67s 800\n", " | 2.81s 900\n", " | 3.03s 1000\n", " 
| 3.29s 1100\n", " | 3.48s 1200\n", " | 3.78s 1300\n", " | 4.70s 1400\n", " | 5.44s 1500\n", " | 5.89s 1600\n", " | 6.43s 1700\n", " | 6.68s 1800\n", " | 7.15s 1900\n", " | 7.68s 2000\n", " | 8.13s 2100\n", " | 8.44s 2200\n", " | 9.20s 2300\n", " | 11s 2400\n", " | 12s 2500\n", " | 12s 2600\n", " | 13s 2700\n", " 13s Done: 2707 results\n" ] } ], "source": [ "S.study(query)\n", "S.count(progress=100, limit=3000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are going to run this query in `shallow` mode." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 14s 671 results\n" ] } ], "source": [ "results = B.search(query, shallow=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Shallow mode tends to be quicker, but the speed-up does not always materialize.\n", "The number of results agrees with the first query.\n", "Yet we have been lucky, because we required the word in the gap to be in the same verse as the first word in the phrase.\n", "What if we require it to be in the same verse as the last word in the phrase instead?" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:41:30.980164Z", "start_time": "2018-05-24T08:41:30.974422Z" } }, "outputs": [], "source": [ "query = '''\n", "p:phrase\n", " =: wFirst:word\n", " wLast:word\n", " :=\n", "\n", "wGap:word\n", "wFirst < wGap\n", "wLast > wGap\n", "\n", "p || wGap\n", "\n", "v:verse\n", "\n", "v [[ wLast\n", "v [[ wGap\n", "'''" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 42s 660 results\n" ] } ], "source": [ "results = B.search(query, shallow=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we would not have found all results: 660 instead of 671.\n", "\n", "So this road, although doable, is much less comfortable, both performance-wise and logic-wise."
] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2018-05-23T08:31:45.680786Z", "start_time": "2018-05-23T08:31:45.673210Z" } }, "source": [ "## Check the gaps\n", "\n", "In this misty landscape of gaps we need some corroboration that we found the right results.\n", "\n", "1. Is every node in `gapQueryResults` a phrase?\n", "1. Does every phrase in `gapQueryResults` have a gap?\n", "1. Is every gapped phrase contained in `gapQueryResults`?\n", "\n", "We check all this by hand coding.\n", "\n", "Here is a function that checks whether a phrase has a gap.\n", "If the distance between its end points is greater than the number of words it contains,\n", "it must have a gap." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:41:51.194078Z", "start_time": "2018-05-24T08:41:50.211615Z" } }, "outputs": [], "source": [ "def hasGap(p):\n", " words = L.d(p, otype='word')\n", " return words[-1] - words[0] + 1 > len(words)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can perform the checks." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:41:51.194078Z", "start_time": "2018-05-24T08:41:50.211615Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "671 nodes in query result\n", "1. all nodes are phrases\n", "2. all nodes have gaps\n", "3. all gapped phrases are contained in the results\n" ] } ], "source": [ "otypesGood = True\n", "haveGaps = True\n", "\n", "for p in gapQueryResults:\n", " otype = F.otype.v(p)\n", " if otype != 'phrase':\n", " print(f'Non phrase detected: {p} is a {otype}')\n", " otypesGood = False\n", " break\n", "\n", " if not hasGap(p):\n", " print(f'Phrase without a gap: {p}')\n", " B.pretty(p)\n", " haveGaps = False\n", " break\n", "\n", "print(f'{len(gapQueryResults)} nodes in query result')\n", "if otypesGood:\n", " print('1. 
all nodes are phrases')\n", "if haveGaps:\n", " print('2. all nodes have gaps')\n", "\n", "inResults = True\n", "for p in F.otype.s('phrase'):\n", " if hasGap(p):\n", " if p not in gapQueryResults:\n", " print(f'Gapped phrase outside query results: {p}')\n", " B.pretty(p)\n", " inResults = False\n", " break\n", " \n", "if inResults:\n", " print('3. all gapped phrases are contained in the results')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that by hand coding we can get the gapped phrases much more quickly and reliably!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Custom sets for (non-)gapped phrases\n", "\n", "We have obtained a set with all gapped phrases,\n", "and we have paid a price:\n", "\n", "* either an expensive query,\n", "* or an inconvenient bit of hand coding.\n", "\n", "It would be nice if we could kick-start our queries using this set as a given.\n", "And that is exactly what we are going to do now.\n", "\n", "We make two custom sets and give each a name: one for gapped phrases and one for non-gapped phrases." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "customSets = dict(\n", " gapphrase=gapQueryResults,\n", " conphrase=set(F.otype.s('phrase')) - gapQueryResults,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose we want all verbs that occur in a gapped phrase." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:41:53.694434Z", "start_time": "2018-05-24T08:41:53.689921Z" } }, "outputs": [], "source": [ "query = '''\n", "gapphrase\n", " word sp=verb\n", "'''" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that we have used the foreign name `gapphrase` in our search template, instead of `phrase`.\n", "\n", "But we can still run `search()`, provided we tell it what we mean by `gapphrase`. 
\n", "We do that by passing the `sets` parameter to `search()`, which should be a dictionary of sets.\n", "Search will look up `gapphrase` in this dictionary, and will use its value, which should be a node set.\n", "That way, it understands that the expression `gapphrase` stands for the nodes in the given node set.\n", "\n", "Here we go:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:41:57.840028Z", "start_time": "2018-05-24T08:41:57.047787Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 1.03s 93 results\n" ] } ], "source": [ "results = B.search(query, sets=customSets)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:05:09.044933Z", "start_time": "2018-05-24T08:05:09.005186Z" }, "scrolled": false }, "outputs": [ { "data": { "text/markdown": [ "\n", "\n", "**verse** *1*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "
\n", " \n", " \n", "
\n", "\n", "
\n", "\n", "
\n", " sentence 64\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " clause WayX\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Conj CP\n", "
\n", "
\n", "\n", "
\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Pred VP\n", "
\n", "
\n", "\n", "
\n", "\n", "
verb say qal wayq
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj PrNP\n", "
\n", "
\n", "\n", "
\n", "\n", "
nmpr Leah
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " sentence 65\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " clause ZQtX\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase PreO VP\n", "
\n", "
\n", "\n", "
\n", "\n", "
verb bestow qal perf
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj NP\n", "
\n", "
\n", "\n", "
\n", "\n", "
subs god(s)
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase PreO VP\n", "
\n", "
\n", "\n", "
\n", "\n", "
prep <object marker>
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Objc NP\n", "
\n", "
\n", "\n", "
\n", "\n", "
subs endowment
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
adjv good
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " sentence 66\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " clause xYqX\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Modi NP\n", "
\n", "
\n", "\n", "
\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs foot
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase PreO VP\n", "
\n", "
\n", "\n", "
\n", "\n", "
verb tolerate qal impf
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj NP\n", "
\n", "
\n", "\n", "
\n", "\n", "
subs man
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " sentence 67\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " clause xQt0\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Conj CP\n", "
\n", "
\n", "\n", "
\n", "\n", "
conj that
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Pred VP\n", "
\n", "
\n", "\n", "
\n", "\n", "
verb bear qal perf
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Cmpl PP\n", "
\n", "
\n", "\n", "
\n", "\n", "
prep to
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Objc NP\n", "
\n", "
\n", "\n", "
\n", "\n", "
subs six
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs son
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " sentence 68\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " clause Way0\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Conj CP\n", "
\n", "
\n", "\n", "
\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Pred VP\n", "
\n", "
\n", "\n", "
\n", "\n", "
verb call qal wayq
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Objc PP\n", "
\n", "
\n", "\n", "
\n", "\n", "
prep <object marker>
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs name
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Objc PrNP\n", "
\n", "
\n", "\n", "
\n", "\n", "
nmpr Zebulun
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**verse** *2*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "
\n", " \n", " \n", "
\n", "\n", "
\n", "\n", "
\n", " sentence 118\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " clause Way0\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Conj CP\n", "
\n", "
\n", "\n", "
\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Pred VP\n", "
\n", "
\n", "\n", "
\n", "\n", "
verb turn aside hif wayq
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Time PP\n", "
\n", "
\n", "\n", "
\n", "\n", "
prep in
\n", "\n", "\n", "
\n", "\n", "
\n", "
\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs day
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
prde he
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Objc PP\n", "
\n", "
\n", "\n", "
\n", "\n", "
prep <object marker>
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs he-goat
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
adjv twisted
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
adjv patch qal ptcp
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Objc PP\n", "
\n", "
\n", "\n", "
\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Objc PP\n", "
\n", "
\n", "\n", "
\n", "\n", "
prep <object marker>
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs whole
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs goat
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
adjv speckled
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
adjv patch qal ptcp
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Objc PP\n", "
\n", "
\n", "\n", "
\n", "\n", "
subs whole
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " clause Attr NmCl\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Rela CP\n", "
\n", "
\n", "\n", "
\n", "\n", "
conj <relative>
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj NP\n", "
\n", "
\n", "\n", "
\n", "\n", "
subs white
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase PreC PP\n", "
\n", "
\n", "\n", "
\n", "\n", "
prep in
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " clause Way0\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Objc PP\n", "
\n", "
\n", "\n", "
\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Objc PP\n", "
\n", "
\n", "\n", "
\n", "\n", "
subs whole
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
adjv ruttish
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Objc PP\n", "
\n", "
\n", "\n", "
\n", "\n", "
prep in
\n", "\n", "\n", "
\n", "\n", "
\n", "
\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs young ram
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " sentence 119\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " clause Way0\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Conj CP\n", "
\n", "
\n", "\n", "
\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Pred VP\n", "
\n", "
\n", "\n", "
\n", "\n", "
verb give qal wayq
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Cmpl PP\n", "
\n", "
\n", "\n", "
\n", "\n", "
prep in
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs hand
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs son
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**verse** *3*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "
\n", " \n", " \n", "
\n", "\n", "
\n", "\n", "
\n", " sentence 8\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " clause WayX\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Conj CP\n", "
\n", "
\n", "\n", "
\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Pred VP\n", "
\n", "
\n", "\n", "
\n", "\n", "
verb dream qal wayq
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Objc NP\n", "
\n", "
\n", "\n", "
\n", "\n", "
subs dream
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj NP\n", "
\n", "
\n", "\n", "
\n", "\n", "
subs two
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " clause Ellp\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj NP\n", "
\n", "
\n", "\n", "
\n", "\n", "
subs man
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Objc NP\n", "
\n", "
\n", "\n", "
\n", "\n", "
subs dream
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Time PP\n", "
\n", "
\n", "\n", "
\n", "\n", "
prep in
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs night
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs one
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " clause Ellp\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj NP\n", "
\n", "
\n", "\n", "
\n", "\n", "
subs man
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Adju PP\n", "
\n", "
\n", "\n", "
\n", "\n", "
prep as
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs interpretation
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs dream
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " clause WayX\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj NP\n", "
\n", "
\n", "\n", "
\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs give drink hif ptca
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs baker
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " clause Attr NmCl\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Rela CP\n", "
\n", "
\n", "\n", "
\n", "\n", "
conj <relative>
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase PreC PP\n", "
\n", "
\n", "\n", "
\n", "\n", "
prep to
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs king
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
nmpr Egypt
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " clause Attr Ptcp\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Rela CP\n", "
\n", "
\n", "\n", "
\n", "\n", "
conj <relative>
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase PreC VP\n", "
\n", "
\n", "\n", "
\n", "\n", "
verb bind qal ptcp
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Cmpl PP\n", "
\n", "
\n", "\n", "
\n", "\n", "
prep in
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs house
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs prison
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "B.show(results, start=1, end=3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That looks good.\n", "\n", "We can also apply feature conditions to `gapphrase`:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:05:41.293060Z", "start_time": "2018-05-24T08:05:41.237943Z" }, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s 176 results\n" ] }, { "data": { "text/markdown": [ "\n", "\n", "**verse** *1*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "
\n", " \n", " \n", "
\n", "\n", "
\n", "\n", "
\n", " sentence 59\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " clause WayX\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Conj CP\n", "
\n", "
\n", "\n", "
\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Pred VP\n", "
\n", "
\n", "\n", "
\n", "\n", "
verb be qal wayq
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj NP\n", "
\n", "
\n", "\n", "
\n", "\n", "
subs two
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase PreC AdjP\n", "
\n", "
\n", "\n", "
\n", "\n", "
adjv naked
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj NP\n", "
\n", "
\n", "\n", "
\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs human, mankind
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs woman
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " sentence 60\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " clause WxY0\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Conj CP\n", "
\n", "
\n", "\n", "
\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Nega NegP\n", "
\n", "
\n", "\n", "
\n", "\n", "
nega not
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Pred VP\n", "
\n", "
\n", "\n", "
\n", "\n", "
verb be ashamed hit impf
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**verse** *2*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "
\n", " \n", " \n", "
\n", "\n", "
\n", "\n", "
\n", " sentence 11\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " clause WXQt\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Conj CP\n", "
\n", "
\n", "\n", "
\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj PrNP\n", "
\n", "
\n", "\n", "
\n", "\n", "
nmpr Abel
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Pred VP\n", "
\n", "
\n", "\n", "
\n", "\n", "
verb come hif perf
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj PrNP\n", "
\n", "
\n", "\n", "
\n", "\n", "
advb even
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
prps he
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Cmpl PP\n", "
\n", "
\n", "\n", "
\n", "\n", "
prep from
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs first-born
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs cattle
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
prep from
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs fat
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " sentence 12\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " clause WayX\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Conj CP\n", "
\n", "
\n", "\n", "
\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Pred VP\n", "
\n", "
\n", "\n", "
\n", "\n", "
verb look qal wayq
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj PrNP\n", "
\n", "
\n", "\n", "
\n", "\n", "
nmpr YHWH
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Cmpl PP\n", "
\n", "
\n", "\n", "
\n", "\n", "
prep to
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
nmpr Abel
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
prep to
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs present
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "\n", "\n", "**verse** *3*\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "
\n", " \n", " \n", "
\n", "\n", "
\n", "\n", "
\n", " sentence 17\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " clause Ellp\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj PPrP\n", "
\n", "
\n", "\n", "
\n", "\n", "
prps they
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs whole
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs wild animal
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj PPrP\n", "
\n", "
\n", "\n", "
\n", "\n", "
prep to
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs kind
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj PPrP\n", "
\n", "
\n", "\n", "
\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj PPrP\n", "
\n", "
\n", "\n", "
\n", "\n", "
subs whole
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs cattle
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj PPrP\n", "
\n", "
\n", "\n", "
\n", "\n", "
prep to
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs kind
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj PPrP\n", "
\n", "
\n", "\n", "
\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj PPrP\n", "
\n", "
\n", "\n", "
\n", "\n", "
subs whole
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs creeping animals
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " clause Attr Ptcp\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Rela CP\n", "
\n", "
\n", "\n", "
\n", "\n", "
conj the
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase PreC VP\n", "
\n", "
\n", "\n", "
\n", "\n", "
verb creep qal ptca
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Cmpl PP\n", "
\n", "
\n", "\n", "
\n", "\n", "
prep upon
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs earth
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " clause Ellp\n", "
\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj PPrP\n", "
\n", "
\n", "\n", "
\n", "\n", "
prep to
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs kind
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj PPrP\n", "
\n", "
\n", "\n", "
\n", "\n", "
conj and
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj PPrP\n", "
\n", "
\n", "\n", "
\n", "\n", "
subs whole
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
art the
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs birds
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj PPrP\n", "
\n", "
\n", "\n", "
\n", "\n", "
prep to
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs kind
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
\n", " phrase Subj PPrP\n", "
\n", "
\n", "\n", "
\n", "\n", "
subs whole
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs bird
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs whole
\n", "\n", "\n", "
\n", "\n", "
\n", "\n", "
subs wing
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n", "\n", "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "query = '''\n", "gapphrase function=Subj\n", "'''\n", "results = B.search(query, sets=customSets)\n", "B.show(results, start=1, end=3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Two-phrase clauses\n", "\n", "We can find the gaps, but do our minds always reckon with gaps?\n", "Gaps cause unexpected semantics.\n", "Here is a little puzzle.\n", "\n", "Suppose we want to count the clauses consisting of exactly two phrases.\n", "\n", "Here follows a little journey.\n", "We use a query to find the clauses, check the result with hand-coding, scratch our heads,\n", "refine the query, the hand-coding and our question until we are satisfied.\n", "\n", "### Attempt 1\n", "\n", "#### By query\n", "\n", "The following template should do it:\n", "a clause, starting with a phrase, followed by an adjacent phrase,\n", "which terminates the clause." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:56:03.852429Z", "start_time": "2018-05-24T08:56:03.849179Z" } }, "outputs": [], "source": [ "query = '''\n", "clause\n", " =: phrase\n", " <: phrase\n", " :=\n", "'''" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:56:06.276198Z", "start_time": "2018-05-24T08:56:05.153080Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 1.11s 23483 results\n" ] }, { "data": { "text/markdown": [ "n | clause | phrase | phrase\n", "--- | --- | --- | ---\n", "1|יְהִ֣י אֹ֑ור |יְהִ֣י |אֹ֑ור \n", "2|כִּי־טֹ֑וב |כִּי־|טֹ֑וב \n", "3|אֲשֶׁר֙ מִתַּ֣חַת לָרָקִ֔יעַ |אֲשֶׁר֙ |מִתַּ֣חַת לָרָקִ֔יעַ \n", "4|אֲשֶׁ֖ר מֵעַ֣ל לָרָקִ֑יעַ |אֲשֶׁ֖ר |מֵעַ֣ל לָרָקִ֑יעַ \n", "5|כִּי־טֹֽוב׃ |כִּי־|טֹֽוב׃ \n", "6|מַזְרִ֣יעַ זֶ֔רַע |מַזְרִ֣יעַ |זֶ֔רַע \n", "7|כִּי־טֹֽוב׃ |כִּי־|טֹֽוב׃ \n", "8|לְהַבְדִּ֕יל בֵּ֥ין הַיֹּ֖ום וּבֵ֣ין הַלָּ֑יְלָה |לְהַבְדִּ֕יל |בֵּ֥ין הַיֹּ֖ום 
וּבֵ֣ין הַלָּ֑יְלָה \n", "9|לְהָאִ֖יר עַל־הָאָ֑רֶץ |לְהָאִ֖יר |עַל־הָאָ֑רֶץ \n", "10|לְהָאִ֖יר עַל־הָאָֽרֶץ׃ |לְהָאִ֖יר |עַל־הָאָֽרֶץ׃ " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "results = B.search(query)\n", "B.table(results, end=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want to have the clauses only, we run it in shallow mode:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 1.06s 23483 results\n" ] } ], "source": [ "clausesByQuery = sorted(B.search(query, shallow=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### By hand\n", "\n", "Let us check this with a piece of hand-written code.\n", "We want clauses that consist of exactly two phrases." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:56:12.592108Z", "start_time": "2018-05-24T08:56:11.096022Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s counting ...\n", " 0.82s Done: found 23862\n" ] } ], "source": [ "indent(reset=True)\n", "info('counting ...')\n", "\n", "clausesByHand = []\n", "for clause in F.otype.s('clause'):\n", " phrases = L.d(clause, otype='phrase')\n", " if len(phrases) == 2:\n", " clausesByHand.append(clause)\n", "clausesByHand = sorted(clausesByHand)\n", "info(f'Done: found {len(clausesByHand)}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The difference" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Strange, we end up with too many cases. What is happening? Let us compare the results.\n", "We look at the first result where both methods diverge.\n", "\n", "We put the difference finding in a little function." 
 ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:56:16.255454Z", "start_time": "2018-05-24T08:56:16.244135Z" } }, "outputs": [], "source": [ "def showDiff(queryResults, handResults):\n", "    # compare two sorted lists of clause nodes and show the first clause where they diverge\n", "    diff = [x for x in zip(queryResults, handResults) if x[0] != x[1]]\n", "    if not diff:\n", "        print(f'''\n", "{len(queryResults):>6} queryResults\n", " are identical with\n", "{len(handResults):>6} handResults\n", "''')\n", "        return\n", "    (rQuery, rHand) = diff[0]\n", "    if rQuery < rHand:\n", "        print(f'clause {rQuery} is a query result but not found by hand')\n", "        toShow = rQuery\n", "    else:\n", "        print(f'clause {rHand} is not a query result but has been found by hand')\n", "        toShow = rHand\n", "    # give each phrase of the clause its own highlight color\n", "    colors = ['aqua', 'aquamarine', 'khaki', 'lavender', 'yellow']\n", "    highlights = {}\n", "    for (i, phrase) in enumerate(L.d(toShow, otype='phrase')):\n", "        highlights[phrase] = colors[i % len(colors)]\n", "        # for atom in L.d(phrase, otype='phrase_atom'):\n", "        #     highlights[atom] = colors[i % len(colors)]\n", "    B.pretty(toShow, withNodes=True, suppress={'lex', 'sp', 'vt', 'vs'}, highlights=highlights)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:56:16.255454Z", "start_time": "2018-05-24T08:56:16.244135Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "clause 427931 is not a query result but has been found by hand\n" ] }, { "data": { "text/html": [ "
\n", "427931\n", "clause 427931 XYqt\n", "  phrase 652631 Subj NP\n", "    1904 subs whole\n", "clause 427931 XYqt\n", "  phrase 652633 PreO VP\n", "    1906 verb kill\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "showDiff(clausesByQuery, clausesByHand)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lo and behold:\n", "\n", "* the hand-written code is right in a sense: this is a clause that consists of exactly two phrases.\n", "* the query is also right in a sense: the two phrases are not adjacent, because there is a gap in the clause between them!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Attempt 2\n", "\n", "#### By hand\n", "\n", "We modify the hand-written code so that a clause qualifies only if its two phrases are adjacent." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:56:12.592108Z", "start_time": "2018-05-24T08:56:11.096022Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s counting ...\n", " 1.00s Done: found 23399\n" ] } ], "source": [ "indent(reset=True)\n", "info('counting ...')\n", "\n", "clausesByHand2 = []\n", "for clause in F.otype.s('clause'):\n", "    phrases = L.d(clause, otype='phrase')\n", "    if len(phrases) == 2:\n", "        # the last word of the first phrase must be directly followed by the first word of the second phrase\n", "        if L.d(phrases[0], otype='word')[-1] + 1 == L.d(phrases[1], otype='word')[0]:\n", "            clausesByHand2.append(clause)\n", "clausesByHand2 = sorted(clausesByHand2)\n", "info(f'Done: found {len(clausesByHand2)}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The difference\n", "\n", "Now we have too few cases. What is going on?" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:56:16.255454Z", "start_time": "2018-05-24T08:56:16.244135Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "clause 428692 is a query result but not found by hand\n" ] }, { "data": { "text/html": [ "
\n", "428692\n", "clause 428692 WxQ0\n", "  phrase 655060 Conj CP\n", "    6514 conj and\n", "  phrase 655061 Objc PP\n", "    6515 advb even\n", "    6516 prep <object marker>\n", "    6517 nmpr Lot\n", "  phrase 655061 Objc PP\n", "    6518 subs brother\n", "  phrase 655061 Objc PP\n", "    6519 conj and\n", "  phrase 655061 Objc PP\n", "    6520 subs property\n", "  phrase 655062 Pred VP\n", "    6521 verb return\n", "  phrase 655061 Objc PP\n", "    6522 conj and\n", "  phrase 655061 Objc PP\n", "    6523 advb even\n", "    6524 prep <object marker>\n", "    6525 art the\n", "    6526 subs woman\n", "    6527 conj and\n", "    6528 prep <object marker>\n", "    6529 art the\n", "    6530 subs people\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "showDiff(clausesByQuery, clausesByHand2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Observe:\n", "\n", "This clause has three phrases, but the third one lies inside the second one.\n", "\n", "* the hand-written code is right in a sense: this clause has three phrases.\n", "* the query is right in a sense: the clause contains two adjacent phrases that together span the whole clause." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Attempt 3\n", "\n", "#### By query\n", "\n", "Can we adjust the pattern to exclude cases like this?\n", "Yes, with custom sets, see [advanced](searchAdvanced.ipynb).\n", "\n", "Instead of looking through all phrases, we consider only non-gapped phrases.\n", "\n", "Earlier in this notebook we have constructed the set of non-gapped phrases\n", "and put it under the name `conphrase` in the custom sets." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 1.32s 23327 results\n" ] } ], "source": [ "query = '''\n", "clause\n", " =: conphrase\n", " <: conphrase\n", " :=\n", "'''\n", "\n", "clausesByQuery2 = sorted(B.search(query, sets=customSets, shallow=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The difference\n", "\n", "There is still a difference." ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:56:16.255454Z", "start_time": "2018-05-24T08:56:16.244135Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "clause 428374 is not a query result but has been found by hand\n" ] }, { "data": { "text/html": [ "
\n", "428374\n", "clause 428374 Ellp\n", "  phrase 654063 Conj CP\n", "    4718 conj and\n", "  phrase 654064 Objc PP\n", "    4719 prep <object marker>\n", "    4720 nmpr Pathrusites\n", "    4721 conj and\n", "    4722 prep <object marker>\n", "    4723 nmpr Casluhites\n", "clause 428374 Ellp\n", "  phrase 654064 Objc PP\n", "    4729 conj and\n", "  phrase 654064 Objc PP\n", "    4730 prep <object marker>\n", "    4731 subs Caphtorite\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "showDiff(clausesByQuery2, clausesByHand2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Observe:\n", "\n", "This clause has two phrases; the second one has a gap, which coincides with a gap in the clause.\n", "\n", "* the hand-written code is right in a sense: this clause has two adjacent phrases that together span the whole clause, with nothing left out.\n", "* the query is right in a sense: the second phrase is not consecutive." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Attempt 4\n", "\n", "#### By hand\n", "\n", "We modify the hand-written code so that only clauses without gaps qualify." ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:56:12.592108Z", "start_time": "2018-05-24T08:56:11.096022Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s counting ...\n", " 1.35s Done: found 23327\n" ] } ], "source": [ "indent(reset=True)\n", "info('counting ...')\n", "\n", "clausesByHand3 = []\n", "for clause in F.otype.s('clause'):\n", "    # skip clauses that themselves have a gap\n", "    if hasGap(clause):\n", "        continue\n", "    phrases = L.d(clause, otype='phrase')\n", "    if len(phrases) == 2:\n", "        if L.d(phrases[0], otype='word')[-1] + 1 == L.d(phrases[1], otype='word')[0]:\n", "            clausesByHand3.append(clause)\n", "clausesByHand3 = sorted(clausesByHand3)\n", "info(f'Done: found {len(clausesByHand3)}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The difference\n", "\n", "Now the numbers of results agree. But are the results really the same?"
 ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:56:16.255454Z", "start_time": "2018-05-24T08:56:16.244135Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " 23327 queryResults\n", " are identical with\n", " 23327 handResults\n", "\n" ] } ], "source": [ "showDiff(clausesByQuery2, clausesByHand3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Conclusion\n", "\n", "It took four attempts to arrive at a precise definition of what we were looking for.\n", "\n", "Sometimes the search template had to be modified, sometimes the hand-written code.\n", "\n", "The interplay and systematic comparison between the attempts helped to spot all relevant\n", "configurations of phrases within clauses." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Spans\n", "\n", "Here is another cause of wrong query results: there are sentences that span multiple verses.\n", "Such sentences are not contained in any verse.\n", "As a result, they are easily missed by queries.\n", "\n", "We describe a scenario where that happens.\n", "\n", "## Mother clauses\n", "\n", "A clause and its mother do not have to be in the same verse.\n", "We are going to fetch the cases where they are in different verses.\n", "\n", "### All mother clauses\n", "\n", "But first we fetch all pairs of clauses connected by a mother edge."
 ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:00:06.688698Z", "start_time": "2018-05-24T08:00:05.864656Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.96s 13907 results\n" ] } ], "source": [ "query = '''\n", "clause\n", "-mother> clause\n", "'''\n", "allMotherPairs = B.search(query)\n", "B.table(allMotherPairs, end=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Mother in another verse\n", "\n", "Now we modify the query so that mother and daughter must sit in distinct verses."
 ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:00:11.096751Z", "start_time": "2018-05-24T08:00:10.585477Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.33s 721 results\n" ] } ], "source": [ "query = '''\n", "cm:clause\n", "-mother> cd:clause\n", "\n", "v1:verse\n", "v2:verse\n", "v1 # v2\n", "\n", "cm ]] v1\n", "cd ]] v2\n", "'''\n", "diffMotherPairs = B.search(query)\n", "B.table(diffMotherPairs, end=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Mother in same verse\n", "\n", "As a check,\n", "we modify the latter query and require `v1` and `v2` to be the same verse, to get the\n", "mother pairs in which both members are in the same verse."
 ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:00:11.096751Z", "start_time": "2018-05-24T08:00:10.585477Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.51s 13160 results\n" ] } ], "source": [ "query = '''\n", "cm:clause\n", "-mother> cd:clause\n", "\n", "v1:verse\n", "v2:verse\n", "v1 = v2\n", "\n", "cm ]] v1\n", "cd ]] v2\n", "'''\n", "sameMotherPairs = B.search(query)\n", "B.table(sameMotherPairs, end=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The difference\n", "\n", "Let's check if the numbers add up:\n", "\n", "* the first query asked for all pairs\n", "* the second query asked for pairs with members in different verses\n", "* the third query asked for pairs with members in the same verse\n", "\n", "Then the results of the second and third query combined should\n", "equal the results of the first query.\n", "\n", "That makes sense.\n", "\n", "Still, let's check:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:00:16.632029Z", "start_time": "2018-05-24T08:00:16.627787Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "26\n" ] } ], "source": [ "discrepancy = len(allMotherPairs) - 
len(diffMotherPairs) - len(sameMotherPairs)\n", "print(discrepancy)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The numbers do not add up. We are missing cases. Why?\n", "\n", "Clauses may cross verse boundaries. In that case they are not contained in any single verse, and hence our latter two queries\n", "do not detect them. Let's count how many clauses cross a verse boundary." ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:00:20.987274Z", "start_time": "2018-05-24T08:00:17.973289Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 3.50s 50 results\n" ] } ], "source": [ "query = '''\n", "clause\n", " =: first:word\n", " last:word\n", " :=\n", "v1:verse\n", " w1:word\n", "v2:verse\n", " w2:word\n", " \n", "first = w1\n", "last = w2\n", "v1 # v2\n", "'''\n", "results = B.search(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some of these verse-spanning clauses do not have mothers or are not mothers. Let's count the cases where two clauses\n", "are in a mother relation and at least one of them spans a verse boundary.\n", "\n", "We need two queries for that. These queries are almost identical. One retrieves the clause pairs where the mother\n", "crosses verse boundaries, and the other where the daughter does so.\n", "\n", "But we are programmers. 
We do not have to repeat ourselves:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:00:28.320191Z", "start_time": "2018-05-24T08:00:24.591375Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 26 spanners are missing\n", " 26 missing cases were detected before\n", " 0 is the resulting disagreement\n" ] } ], "source": [ "queryCommon = '''\n", "c1:clause\n", "-mother> c2:clause\n", "\n", "c3:clause\n", " =: first:word\n", " last:word\n", " :=\n", "v1:verse\n", " w1:word\n", "v2:verse\n", " w2:word\n", " \n", "first = w1\n", "last = w2\n", "v1 # v2\n", "'''\n", "\n", "query1 = f'''\n", "{queryCommon}\n", "c1 = c3\n", "'''\n", "query2 = f'''\n", "{queryCommon}\n", "c2 = c3\n", "'''\n", "\n", "results1 = B.search(query1, silent=True)\n", "results2 = B.search(query2, silent=True)\n", "spannersByQuery = {(r[0], r[1]) for r in results1 + results2}\n", "print(f'{len(spannersByQuery):>3} spanners are missing')\n", "print(f'{discrepancy:>3} missing cases were detected before')\n", "print(f'{discrepancy - len(spannersByQuery):>3} is the resulting disagreement')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With hand-coding there is an easier way to find the mother clause pairs in which at least one member spans verses:\n", "\n", "Starting with the set of all mother pairs, we keep every pair in which at least one member is not contained in a verse."
 ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "26" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "spannersByHand = set()\n", "\n", "for (c1, c2) in allMotherPairs:\n", "    # L.u(c, otype='verse') is empty when clause c is not contained in a verse\n", "    if not (\n", "        L.u(c1, otype='verse')\n", "        and\n", "        L.u(c2, otype='verse')\n", "    ):\n", "        spannersByHand.add((c1, c2))\n", "\n", "len(spannersByHand)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And, to be completely sure:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "spannersByHand == spannersByQuery" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### By custom sets\n", "\n", "If we are content with the clauses that do not span verses,\n", "we can put them in a set, and modify the queries by replacing `clause` by `conclause`\n", "and binding the right set to it.\n", "\n", "Here we go. In one cell we run the queries to get all pairs, the mother-daughter-in-separate-verses pairs,\n", "and the mother-daughter-in-same-verse pairs, and then we check that the numbers add up."
] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "All pairs\n", " 0.46s 13881 results\n", "Different verse pairs\n", " 0.56s 721 results\n", "Same verse pairs\n", " 0.45s 13160 results\n", "Intersection same-verse/different-verse pairs: set()\n", "All pairs is union of same-verse/different-verse pairs: True\n" ] } ], "source": [ "conClauses = {c for c in F.otype.s('clause') if L.u(c, otype='verse')}\n", "customSets = dict(conclause=conClauses)\n", "\n", "print('All pairs')\n", "allPairs = B.search('''\n", "conclause\n", "-mother> conclause\n", "''', \n", " sets=customSets,\n", ")\n", "\n", "print('Different verse pairs')\n", "diffPairs = B.search('''\n", "cm:conclause\n", "-mother> cd:conclause\n", "\n", "v1:verse\n", "v2:verse\n", "v1 # v2\n", "\n", "cm ]] v1\n", "cd ]] v2\n", "''',\n", " sets=customSets,\n", ")\n", "\n", "print('Same verse pairs')\n", "samePairs = B.search('''\n", "cm:conclause\n", "-mother> cd:conclause\n", "\n", "v1:verse\n", "v2:verse\n", "v1 = v2\n", "\n", "cm ]] v1\n", "cd ]] v2\n", "''',\n", " sets=customSets,\n", ")\n", "\n", "allPairSet = set(allPairs)\n", "diffPairSet = {(r[0], r[1]) for r in diffPairs}\n", "samePairSet = {(r[0], r[1]) for r in samePairs}\n", "\n", "print(f'Intersection same-verse/different-verse pairs: {samePairSet & diffPairSet}')\n", "print(f'All pairs is union of same-verse/different-verse pairs: {allPairSet == (samePairSet | diffPairSet)}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lessons\n", "\n", "* mix programming with composing queries;\n", "* a good way to do so is custom sets;\n", "* use programming for processing results;\n", "* find the balance between queries and hand-coding." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Next\n", "\n", "**You have now finished the search tutorial.**\n", "\n", "If you are interested in reproducing MQL queries in Text-Fabric search templates,\n", "see [fromMQL](searchFromMQL.ipynb).\n", "\n", "---\n", "\n", "[basic](search.ipynb)\n", "[advanced](searchAdvanced.ipynb)\n", "[relations](searchRelations.ipynb)\n", "[quantifiers](searchQuantifiers.ipynb)\n", "[rough](searchRough.ipynb)\n", "gaps" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }