"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"gensPers(A[\"4\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Conclusion 1\n",
"\n",
"Versions `4`, `4b`, `2017`, and `c` of the BHSA all have the `nametype` feature on `lex` nodes with values `pers`, `gens`, `gens` for the three words of Genesis 10:3.\n",
"\n",
"Version `c` also has the `nametype` on `word` nodes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Conclusion 2\n",
"\n",
"I have run this\n",
"query on [SHEBANQ](https://shebanq.ancient-data.org/hebrew/query?id=3921)\n",
"on version `2017`, and `c` and they all produced the expected results.\n",
"\n",
"For version `4` and `4b` I had to modify the query, because these versions have not the `lex` node type.\n",
"\n",
"The data on GitHub, however, has the `lex` node type, see [`otype` in version 4](https://github.com/ETCBC/bhsa/blob/master/tf/4/otype.tf).\n",
"\n",
"Probably I have added `lex` later to `4` and `4b`, without bringing it over to SHEBANQ."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Without more information I can not reproduce the screenshot at the start of the notebook."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reproduced!\n",
"\n",
"Viktor has shared the full [query](https://shebanq.ancient-data.org/hebrew/query?version=c&id=3919).\n",
"\n",
"Observations:\n",
"\n",
"The text of the query is\n",
"\n",
"```\n",
"select all objects\n",
"in {4539-4965}\n",
"where\n",
"[lex focus\n",
" nametype = 'pers'\n",
" OR\n",
" nametype = 'gens'\n",
"]\n",
"```\n",
"\n",
"There are only three results, one of which is in Genesis 10:3, the word `RIJPA73T` only."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Explanation\n",
"\n",
"How does this make sense? The meaning of the query is:\n",
"\n",
"* restrict the search to the portion of the corpus from slot (word) 4539 till slot 4965 (including);\n",
"* in that portion find lexeme nodes in it with certain properties\n",
"\n",
"What does it mean, lexeme nodes inside a portion of the corpus?\n",
"\n",
"A lexeme node occupies the slots of its occurrences, so we are interested in lexemes that have\n",
"all of their occurrences in the indicated portion.\n",
"\n",
"This rules out many lexemes.\n",
"\n",
"Let's verify by manual coding that the other two `pers` and `gens` words in Genesis 10:3\n",
"have occurrences outside this region."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We do this in version `c` and continue to work in this version only. We still have the globals `N E F L T S TF` tied to the `c` version\n",
"of the data, we only have to restore `A` to the `c` version:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"A = A[\"c\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We repeat the query"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"query = \"\"\"\n",
"lex nametype=pers|gens\n",
" w:word\n",
"\n",
"verse book=Genesis chapter=10 verse=3\n",
" w\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0.52s 3 results\n"
]
}
],
"source": [
"results = A.search(query)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Of each result (a tuple of nodes), we pretty-display the lex node, which is the first of the tuple\n",
"since `lex` is the first node mentioned in the search template.\n",
"\n",
"The pretty display of a lexeme shows the first and last occurrence of it."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"for result in results:\n",
" lx = result[0]\n",
" A.pretty(lx)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Indeed, only `RIJPA73T` occurs in the narrow portion that the SHEBANQ query was looking in."
]
},
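{
"cell_type": "markdown",
"metadata": {},
"source": [
"The containment rule explained above can also be illustrated without the corpus.\n",
"What follows is a minimal sketch on made-up data: the lexeme names and slot numbers are hypothetical,\n",
"and `lexemes_inside` is an illustrative helper, not a function of Text-Fabric or Emdros.\n",
"It keeps a lexeme only if all of its occurrence slots fall within the given range.\n",
"\n",
"```python\n",
"def lexemes_inside(occurrences, first, last):\n",
"    # Keep the lexemes whose occurrence slots all lie in [first, last] (inclusive).\n",
"    return sorted(\n",
"        lex\n",
"        for lex, slots in occurrences.items()\n",
"        if slots and all(first <= s <= last for s in slots)\n",
"    )\n",
"\n",
"\n",
"# Hypothetical toy data: lexeme -> slots of its occurrences (not real BHSA numbers)\n",
"toy_occurrences = {\n",
"    'LEX_A': [4600],          # single occurrence, inside the range\n",
"    'LEX_B': [4601, 250000],  # also occurs far outside the range\n",
"    'LEX_C': [4602, 4700],    # all occurrences inside the range\n",
"}\n",
"\n",
"inside = lexemes_inside(toy_occurrences, 4539, 4965)\n",
"print(inside)\n",
"```\n",
"\n",
"In this toy setup `LEX_B` is dropped although it has an occurrence inside the range, which mirrors what happened\n",
"to the other two words of Genesis 10:3."
]
},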
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tips\n",
"\n",
"How to query for word with certain lexeme properties in a portion of the query?\n",
"\n",
"If the lexeme properties are present on the occurrences of the lexeme (the word nodes),\n",
"this query will do:\n",
"\n",
"```\n",
"select all objects\n",
"in {4539-4965}\n",
"where\n",
"[word focus\n",
" nametype = 'pers'\n",
" OR\n",
" nametype = 'gens'\n",
"]\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But, as we saw, in version `2017` the `nametype` property only exists on the `lex` nodes?\n",
"\n",
"How do we go about this then?\n",
"\n",
"The clue is on p. 21 of Ulrik's MQL query guide.\n",
"We can search for words that are contained in a lex by using monad set relation clauses:\n",
"\n",
"```\n",
"select all objects\n",
"in {4539-4965}\n",
"where\n",
"[word focus\n",
" [lex overlap(substrate) nametype = 'pers' OR nametype = 'gens']\n",
"]\n",
"```\n",
"\n",
"So, we start for selecting all words 4539 - 4965, and for each word we require that there is a lex with some\n",
"properties that overlaps with it."
]
},
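{
"cell_type": "markdown",
"metadata": {},
"source": [
"The difference between containment and overlap can be sketched on plain Python sets.\n",
"This is only an illustration under assumptions: the slot numbers are made up, `part_of` and `overlaps` are\n",
"hypothetical helper names that mimic the idea of the monad set relations (they are not Emdros functions),\n",
"and we assume that the substrate of the inner `lex` block is the monad set of the enclosing word.\n",
"\n",
"```python\n",
"def part_of(monads, substrate):\n",
"    # True if every monad of the object lies within the substrate\n",
"    return monads <= substrate\n",
"\n",
"\n",
"def overlaps(monads, substrate):\n",
"    # True if the object shares at least one monad with the substrate\n",
"    return bool(monads & substrate)\n",
"\n",
"\n",
"# Hypothetical slot numbers, not taken from the BHSA\n",
"word_monads = {4601}          # a word occupies a single slot\n",
"lex_monads = {4601, 250000}   # its lexeme also occurs far outside the region\n",
"\n",
"print(part_of(lex_monads, word_monads))   # containment in the word fails\n",
"print(overlaps(lex_monads, word_monads))  # overlap with the word succeeds\n",
"```\n",
"\n",
"Under these assumptions a lexeme with occurrences elsewhere still matches the word it overlaps with,\n",
"which is the behaviour we want from the rewritten query."
]
},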
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See [`nametype` x](https://shebanq.ancient-data.org/hebrew/query?version=c&id=3922)\n",
"on SHEBANQ."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}