{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# How to query the `nametype` feature?\n", "\n", "## The case\n", "\n", "Victor Isaak reported a strange case on Slack, a SHEBANQ query\n", "\n", "```\n", "select all objects\n", "where\n", "[lex focus\n", " nametype = 'pers'\n", " OR\n", " nametype = 'gens'\n", "]\n", "```\n", "\n", "whose results were not shown properly shown in SHEBANQ.\n", "\n", "In particular, in this verse there seem to be 3 hits, but only one hit (`Riphath`) is highlighted:\n", "\n", "![nr](images/nametype.png)\n", "\n", "## Locating\n", "\n", "Let's drill down by means of Text-Fabric.\n", "\n", "First we need to find where this case is, and in what version of the BHSA it occurs.\n", "\n", "We start with loading version `c` and locating the case." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will load the versions `4b`, `2017` and `c`." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from tf.app import use" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "TF-app: ~/github/annotation/app-bhsa/code" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/etcbc/bhsa/tf/c" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/etcbc/phono/tf/c" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/etcbc/parallels/tf/c" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Text-Fabric: Text-Fabric API 8.3.0, app-bhsa, Search Reference
Data: BHSA, Character table, Feature docs
Features:
Parallel Passagescrossref
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensisbook
book@ll
chapter
code
det
domain
freq_lex
function
g_cons
g_cons_utf8
g_lex
g_lex_utf8
g_word
g_word_utf8
gloss
gn
label
language
lex
lex_utf8
ls
nametype
nme
nu
number
otype
pargr
pdp
pfm
prs
prs_gn
prs_nu
prs_ps
ps
qere
qere_trailer
qere_trailer_utf8
qere_utf8
rank_lex
rela
sp
st
tab
trailer
trailer_utf8
txt
typ
uvf
vbe
vbs
verse
voc_lex
voc_lex_utf8
vs
vt
mother
oslots
Phonetic Transcriptionsphono
phono_trailer
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Text-Fabric API: names N F E L T S C TF directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# A = use(\"ETCBC/bhsa\", hoist=globals())\n", "A = use(\"ETCBC/bhsa:clone\", checkout=\"clone\", hoist=globals())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Make sure which version we have:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'c'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A.version" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Right. Let's start with looking for `RIJPA37T` in the `g_word` feature:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.45s 1 result\n" ] } ], "source": [ "results = A.search(\n", " \"\"\"\n", "word g_word=RIJPA73T\n", "\"\"\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Good, that's clear. Where is it?" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Genesis 10:3" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "w = results[0][0]\n", "A.webLink(w)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we click this link, the verse opens in SHEBANQ" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reproducing\n", "\n", "Now let's do the original query." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "query = \"\"\"\n", "lex nametype=pers|gens\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.02s 1722 results\n" ] } ], "source": [ "results = A.search(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's find the occurrences of the results in this verse:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "query = \"\"\"\n", "lex nametype=pers|gens\n", " w:word\n", "\n", "verse book=Genesis chapter=10 verse=3\n", " w\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.55s 3 results\n" ] } ], "source": [ "results = A.search(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We show these words and their name types:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "lines_to_next_cell": 2 }, "outputs": [ { "data": { "text/html": [ "

verse 1

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
verse:1414591
book=Genesischapter=10verse=3
sentence:1172884
clause:428355
phrase:654041
4570 וּ
phrase:654042
nametype=pers
phrase:654043
nametype=gens
4574 וְ
nametype=gens
4576 וְ
nametype=topo
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.show(results, condensed=True, withNodes=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This looks perfectly alright." ] }, { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 2 }, "source": [ "# Other versions\n", "\n", "Let's repeat this exercise in two other versions: `2017` and `4b`.\n", "\n", "We write a function that produces the result right away.\n", "\n", "The TF-API of the data source is passed as parameter." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "def gensPers(A):\n", " A.dm(f\"Version `{A.version}`\\n\")\n", " query = \"\"\"\n", "lex nametype=pers|gens\n", " w:word\n", "\n", "verse book=Genesis chapter=10 verse=3\n", " w\n", " \"\"\"\n", " results = A.search(query)\n", " A.show(results, condensed=True, withNodes=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We load version 2017, but without hoisting the API to the global namespace.\n", "Instead, we retain the API in a mapping from version name to TF-API.\n", "We make sure that we do not loose the API for version `c` which we have just loaded." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "A = {\"c\": A}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2017" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "TF-app: ~/github/annotation/app-bhsa/code" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/etcbc/bhsa/tf/2017" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/etcbc/phono/tf/2017" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/etcbc/parallels/tf/2017" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Text-Fabric: Text-Fabric API 8.3.0, app-bhsa, Search Reference
Data: BHSA, Character table, Feature docs
Features:
Parallel Passagescrossref
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensisbook
book@ll
chapter
code
det
domain
freq_lex
function
g_cons
g_cons_utf8
g_lex
g_lex_utf8
g_word
g_word_utf8
gloss
gn
label
language
lex
lex_utf8
ls
nametype
nme
nu
number
otype
pargr
pdp
pfm
prs
prs_gn
prs_nu
prs_ps
ps
qere
qere_trailer
qere_trailer_utf8
qere_utf8
rank_lex
rela
sp
st
tab
trailer
trailer_utf8
txt
typ
uvf
vbe
vbs
verse
voc_lex
voc_lex_utf8
vs
vt
mother
omap@ll
oslots
Phonetic Transcriptionsphono
phono_trailer
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# A['2017'] = use('ETCBC/bhsa', version='2017')\n", "A[\"2017\"] = use(\"ETCBC/bhsa:clone\", checkout=\"clone\", version=\"2017\")" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "Version `2017`\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ " 0.55s 3 results\n" ] }, { "data": { "text/html": [ "

verse 1

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
verse:1414427
book=Genesischapter=10verse=3
sentence:1172803
clause:428355
phrase:654002
4570 וּ
phrase:654003
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "gensPers(A[\"2017\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Observation: the same words are highlighted, but the `nametype` feature is not shown. Why? Probably because in version `2017`\n", "the `nametype` feature is only available for `lex` nodes and not for `word` nodes. Let's find out for sure." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "*word* 4572 has nametype `None`\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "*lexeme* 1437887 has nametype `pers`\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A2017 = A[\"2017\"]\n", "F2017 = A2017.api.F\n", "L2017 = A2017.api.L\n", "\n", "w = 4572\n", "lx = L2017.u(w, otype=\"lex\")[0]\n", "\n", "A2017.dm(f\"*word* {w} has nametype `{F2017.nametype.v(w)}`\\n\")\n", "A2017.dm(f\"*lexeme* {lx} has nametype `{F2017.nametype.v(lx)}`\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Indeed!" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4b" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "TF-app: ~/github/annotation/app-bhsa/code" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/etcbc/bhsa/tf/4b" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "The requested data is not available offline\n" ] }, { "data": { "text/html": [ "data: ~/github/etcbc/parallels/tf/4b" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "There were problems with loading data.\n", "The Text-Fabric API has not been loaded!\n", "The app \"bhsa\" will not work!\n" ] } ], "source": [ "# A['4b'] = use('ETCBC/bhsa', version='4b')\n", "A[\"4b\"] = use(\"ETCBC/bhsa:clone\", checkout=\"clone\", version=\"4b\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, version `4b` is rather old. We go to GitHub to look at the\n", "[release notes of the BHSA data](https://github.com/ETCBC/bhsa/releases).\n", "\n", "There we see that the latest release of the data does not include the older versions anymore.\n", "So we have to go back to an earlier release, `v1.5`:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "TF-app: ~/github/annotation/app-bhsa/code" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "rate limit is 5000 requests per hour, with 4986 left for this hour\n", "\tconnecting to online GitHub repo etcbc/bhsa ... 
connected\n" ] }, { "data": { "text/html": [ "data: ~/text-fabric-data/etcbc/bhsa/tf/4b" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "rate limit is 5000 requests per hour, with 4983 left for this hour\n", "\tconnecting to online GitHub repo etcbc/phono ... connected\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\tno release tagged \"1.5\"\n", "The requested data is not available online\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "rate limit is 5000 requests per hour, with 4981 left for this hour\n", "\tconnecting to online GitHub repo etcbc/parallels ... connected\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\tno release tagged \"1.5\"\n", "The requested data is not available online\n", "There were problems with loading data.\n", "The Text-Fabric API has not been loaded!\n", "The app \"bhsa\" will not work!\n" ] } ], "source": [ "# A['4b'] = use('ETCBC/bhsa', checkout='1.5', version='4b')\n", "A[\"4b\"] = use(\"ETCBC/bhsa:clone\", checkout=\"1.5\", version=\"4b\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have the core data, but TF wants to get additional data (`parallels` and `phono`), which are not available for this version in this release.\n", "We tell TF not to fetch additional modules:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "TF-app: ~/github/annotation/app-bhsa/code" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "rate limit is 5000 requests per hour, with 4979 left for this hour\n", "\tconnecting to online GitHub repo etcbc/bhsa ... 
connected\n" ] }, { "data": { "text/html": [ "data: ~/text-fabric-data/etcbc/bhsa/tf/4b" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Text-Fabric: Text-Fabric API 8.3.0, app-bhsa, Search Reference
Data: BHSA, Character table, Feature docs
Features:
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensisbook
book@ll
chapter
code
det
domain
freq_lex
function
g_cons
g_cons_utf8
g_entry
g_entry_heb
g_lex
g_lex_utf8
g_qere_utf8
g_word
g_word_utf8
gloss
gn
label
language
lex
lex_utf8
ls
nametype
nme
nu
number
otype
pargr
pdp
pfm
phono
phono_sep
prs
ps
qtrailer_utf8
rank_lex
rela
sp
st
tab
trailer_utf8
txt
typ
uvf
vbe
vbs
verse
vs
vt
mother
omap@ll
oslots
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "App config error(s) in lex:\n", "\ttemplate: feature voc_lex_utf8 not loaded\n", "\tlabel: feature voc_lex_utf8 not loaded\n" ] }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# A['4b'] = use('ETCBC/bhsa', checkout='1.5', version='4b', provenanceSpec=dict(moduleSpecs=()))\n", "A[\"4b\"] = use(\n", " \"bhsa:clone\", checkout=\"1.5\", version=\"4b\", provenanceSpec=dict(moduleSpecs=())\n", ")" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "Version `4b`\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ " 0.56s 3 results\n" ] }, { "data": { "text/html": [ "

verse 1

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
verse:1413883
book=Genesischapter=10verse=3
sentence:1172536
clause:428344
phrase:653784
4570 וּ
nametype=
phrase:653785
nametype=
nametype=pers
phrase:653786
nametype=gens
4574 וְ
nametype=
nametype=gens
4576 וְ
nametype=
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "gensPers(A[\"4b\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again: all three highlighted." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "TF-app: ~/github/annotation/app-bhsa/code" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "rate limit is 5000 requests per hour, with 4976 left for this hour\n", "\tconnecting to online GitHub repo etcbc/bhsa ... connected\n" ] }, { "data": { "text/html": [ "data: ~/text-fabric-data/etcbc/bhsa/tf/4" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Text-Fabric: Text-Fabric API 8.3.0, app-bhsa, Search Reference
Data: BHSA, Character table, Feature docs
Features:
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensisbook
book@ll
chapter
clause_kind
code
det
domain
freq_lex
function
g_cons
g_cons_utf8
g_entry
g_entry_heb
g_lex
g_lex_utf8
g_qere_utf8
g_word
g_word_utf8
gloss
gn
half_verse
label
language
lex
lex_utf8
ls
nametype
nme
nu
number
otype
pargr
pdp
pfm
phono
phono_sep
prs
ps
qtrailer_utf8
rank_lex
rela
sp
st
tab
trailer_utf8
txt
typ
uvf
vbe
vbs
verse
vs
vt
mother
omap@ll
oslots
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "App config error(s) in lex:\n", "\ttemplate: feature voc_lex_utf8 not loaded\n", "\tlabel: feature voc_lex_utf8 not loaded\n" ] }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# A['4'] = use('ETCBC/bhsa', checkout='1.5', version='4', provenanceSpec=dict(moduleSpecs=()))\n", "A[\"4\"] = use(\n", " \"bhsa:clone\", checkout=\"1.5\", version=\"4\", provenanceSpec=dict(moduleSpecs=())\n", ")" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "Version `4`\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ " 0.55s 3 results\n" ] }, { "data": { "text/html": [ "

verse 1

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
verse:1418169
book=Genesischapter=10verse=3
sentence:1173552
clause:428331
phrase:652838
4570 וּ
nametype=
phrase:652839
nametype=
nametype=pers
phrase:652840
nametype=gens
4574 וְ
nametype=
nametype=gens
4576 וְ
nametype=
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "gensPers(A[\"4\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Conclusion 1\n", "\n", "Versions `4`, `4b`, `2017`, and `c` of the BHSA all have the `nametype` feature on `lex` nodes with values `pers`, `gens`, `gens` for the three words of Genesis 10:3.\n", "\n", "Version `c` also has the `nametype` on `word` nodes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Conclusion 2\n", "\n", "I have run this\n", "query on [SHEBANQ](https://shebanq.ancient-data.org/hebrew/query?id=3921)\n", "on version `2017`, and `c` and they all produced the expected results.\n", "\n", "For version `4` and `4b` I had to modify the query, because these versions have not the `lex` node type.\n", "\n", "The data on GitHub, however, has the `lex` node type, see [`otype` in version 4](https://github.com/ETCBC/bhsa/blob/master/tf/4/otype.tf).\n", "\n", "Probably I have added `lex` later to `4` and `4b`, without bringing it over to SHEBANQ." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Without more information I can not reproduce the screenshot at the start of the notebook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Reproduced!\n", "\n", "Viktor has shared the full [query](https://shebanq.ancient-data.org/hebrew/query?version=c&id=3919).\n", "\n", "Observations:\n", "\n", "The text of the query is\n", "\n", "```\n", "select all objects\n", "in {4539-4965}\n", "where\n", "[lex focus\n", " nametype = 'pers'\n", " OR\n", " nametype = 'gens'\n", "]\n", "```\n", "\n", "There are only three results, one of which is in Genesis 10:3, the word `RIJPA73T` only." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Explanation\n", "\n", "How does this make sense? 
The meaning of the query is:\n", "\n", "* restrict the search to the portion of the corpus from slot (word) 4539 up to and including slot 4965;\n", "* in that portion, find lexeme nodes with certain properties.\n", "\n", "What does it mean, lexeme nodes inside a portion of the corpus?\n", "\n", "A lexeme node occupies the slots of its occurrences, so we are interested in lexemes that have\n", "all of their occurrences in the indicated portion.\n", "\n", "This rules out many lexemes.\n", "\n", "Let's verify by manual coding that the other two `pers` and `gens` words in Genesis 10:3\n", "have occurrences outside this region." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We do this in version `c` and continue to work in this version only. We still have the globals `N E F L T S TF` tied to the `c` version\n", "of the data, so we only have to restore `A` to the `c` version:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "A = A[\"c\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We repeat the query:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "query = \"\"\"\n", "lex nametype=pers|gens\n", " w:word\n", "\n", "verse book=Genesis chapter=10 verse=3\n", " w\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.52s 3 results\n" ] } ], "source": [ "results = A.search(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For each result (a tuple of nodes), we pretty-display the lex node, which is the first of the tuple\n", "since `lex` is the first node mentioned in the search template.\n", "\n", "The pretty display of a lexeme shows its first and last occurrence." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
lex
nametype=pers
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
lex
nametype=gens
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
lex
nametype=gens
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for result in results:\n", " lx = result[0]\n", " A.pretty(lx)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Indeed, only `RIJPA73T` occurs in the narrow portion that the SHEBANQ query was looking in." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Tips\n", "\n", "How to query for word with certain lexeme properties in a portion of the query?\n", "\n", "If the lexeme properties are present on the occurrences of the lexeme (the word nodes),\n", "this query will do:\n", "\n", "```\n", "select all objects\n", "in {4539-4965}\n", "where\n", "[word focus\n", " nametype = 'pers'\n", " OR\n", " nametype = 'gens'\n", "]\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But, as we saw, in version `2017` the `nametype` property only exists on the `lex` nodes?\n", "\n", "How do we go about this then?\n", "\n", "The clue is on p. 21 of Ulrik's MQL query guide.\n", "We can search for words that are contained in a lex by using monad set relation clauses:\n", "\n", "```\n", "select all objects\n", "in {4539-4965}\n", "where\n", "[word focus\n", " [lex overlap(substrate) nametype = 'pers' OR nametype = 'gens']\n", "]\n", "```\n", "\n", "So, we start for selecting all words 4539 - 4965, and for each word we require that there is a lex with some\n", "properties that overlaps with it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "See [`nametype` x](https://shebanq.ancient-data.org/hebrew/query?version=c&id=3922)\n", "on SHEBANQ." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 }