{ "cells": [ { "cell_type": "markdown", "id": "b33ecd94-2835-4391-9f08-670b379d13de", "metadata": {}, "source": [ "# Advanced search options (Nestle1904LFT)\n", "\n", "NOTE: This notebook requires significant cleaning and rework." ] }, { "cell_type": "markdown", "id": "d53d318b-65fe-4980-8a48-daddd355f115", "metadata": {}, "source": [ "## Table of content \n", "* 1 - Introduction\n", "* 2 - Load Text-Fabric app and data\n", "* 3 - Performing the queries\n", " * 3.1 - TBD\n", " * 3.2 - Inspecting your query\n", " * 3.2 - Comparing two lists with query results\n", "* 4 - Discussion\n", "* 5 - Atribution and footnotes\n", "* 6 - Required libraries" ] }, { "cell_type": "markdown", "id": "8ab013e2-c82e-4cd8-a54f-d696b1bebd41", "metadata": {}, "source": [ "# 1 - Introduction \n", "##### [Back to TOC](#TOC)\n", "\n", "TBD" ] }, { "cell_type": "markdown", "id": "dbc8843b-4930-4ab4-b92b-dbbfb76ca88f", "metadata": {}, "source": [ "# 2 - Load Text-Fabric app and data \n", "##### [Back to TOC](#TOC)" ] }, { "cell_type": "code", "execution_count": 1, "id": "9cc1b0db-edf8-4795-af72-2f2be414d5d8", "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "id": "6abd3aa9-e545-48f1-8392-33b63e28db6d", "metadata": {}, "outputs": [], "source": [ "# Loading the Text-Fabric code\n", "# Note: it is assumed Text-Fabric is installed in your environment\n", "from tf.fabric import Fabric\n", "from tf.app import use" ] }, { "cell_type": "code", "execution_count": 3, "id": "61625284-ca5a-4832-b188-c881d8efceca", "metadata": { "scrolled": true, "tags": [] }, "outputs": [ { "data": { "text/markdown": [ "**Locating corpus resources ...**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "app: ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.7" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " TF: TF API 12.2.2, tonyjurg/Nestle1904LFT/app v3, Search Reference
\n", " Data: tonyjurg - Nestle1904LFT 0.7, Character table, Feature docs
\n", "
Node types\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "
Name# of nodes# slots / node% coverage
book275102.93100
chapter260529.92100
verse794317.35100
sentence801117.20100
wg1054306.85524
word1377791.00100
\n", " Sets: no custom sets
\n", " Features:
\n", "
Nestle 1904 (Low Fat Tree)\n", "
\n", "\n", "
\n", "
\n", "after\n", "
\n", "
str
\n", "\n", " ✅ Characters (eg. punctuations) following the word\n", "\n", "
\n", "\n", "
\n", "
\n", "book\n", "
\n", "
str
\n", "\n", " ✅ Book name (in English language)\n", "\n", "
\n", "\n", "
\n", "
\n", "booknumber\n", "
\n", "
int
\n", "\n", " ✅ NT book number (Matthew=1, Mark=2, ..., Revelation=27)\n", "\n", "
\n", "\n", "
\n", "
\n", "bookshort\n", "
\n", "
str
\n", "\n", " ✅ Book name (abbreviated)\n", "\n", "
\n", "\n", "
\n", "
\n", "case\n", "
\n", "
str
\n", "\n", " ✅ Gramatical case (Nominative, Genitive, Dative, Accusative, Vocative)\n", "\n", "
\n", "\n", "
\n", "
\n", "chapter\n", "
\n", "
int
\n", "\n", " ✅ Chapter number inside book\n", "\n", "
\n", "\n", "
\n", "
\n", "clausetype\n", "
\n", "
str
\n", "\n", " ✅ Clause type details (e.g. Verbless, Minor)\n", "\n", "
\n", "\n", "
\n", "
\n", "containedclause\n", "
\n", "
str
\n", "\n", " 🆗 Contained clause (WG number)\n", "\n", "
\n", "\n", "
\n", "
\n", "degree\n", "
\n", "
str
\n", "\n", " ✅ Degree (e.g. Comparitative, Superlative)\n", "\n", "
\n", "\n", "
\n", "
\n", "gloss\n", "
\n", "
str
\n", "\n", " ✅ English gloss\n", "\n", "
\n", "\n", "
\n", "
\n", "gn\n", "
\n", "
str
\n", "\n", " ✅ Gramatical gender (Masculine, Feminine, Neuter)\n", "\n", "
\n", "\n", "
\n", "
\n", "headverse\n", "
\n", "
str
\n", "\n", " ✅ Start verse number of a sentence\n", "\n", "
\n", "\n", "
\n", "
\n", "junction\n", "
\n", "
str
\n", "\n", " ✅ Junction data related to a wordgroup\n", "\n", "
\n", "\n", "
\n", "
\n", "lemma\n", "
\n", "
str
\n", "\n", " ✅ Lexeme (lemma)\n", "\n", "
\n", "\n", "
\n", "
\n", "lex_dom\n", "
\n", "
str
\n", "\n", " ✅ Lexical domain according to Semantic Dictionary of Biblical Greek, SDBG (not present everywhere?)\n", "\n", "
\n", "\n", "
\n", "
\n", "ln\n", "
\n", "
str
\n", "\n", " ✅ Lauw-Nida lexical classification (not present everywhere?)\n", "\n", "
\n", "\n", "
\n", "
\n", "markafter\n", "
\n", "
str
\n", "\n", " 🆗 Text critical marker after word\n", "\n", "
\n", "\n", "
\n", "
\n", "markbefore\n", "
\n", "
str
\n", "\n", " 🆗 Text critical marker before word\n", "\n", "
\n", "\n", "
\n", "
\n", "markorder\n", "
\n", "
str
\n", "\n", "  Order of punctuation and text critical marker\n", "\n", "
\n", "\n", "
\n", "
\n", "monad\n", "
\n", "
int
\n", "\n", " ✅ Monad (smallest token matching word order in the corpus)\n", "\n", "
\n", "\n", "
\n", "
\n", "mood\n", "
\n", "
str
\n", "\n", " ✅ Gramatical mood of the verb (passive, etc)\n", "\n", "
\n", "\n", "
\n", "
\n", "morph\n", "
\n", "
str
\n", "\n", " ✅ Morphological tag (Sandborg-Petersen morphology)\n", "\n", "
\n", "\n", "
\n", "
\n", "nodeID\n", "
\n", "
str
\n", "\n", " ✅ Node ID (as in the XML source data)\n", "\n", "
\n", "\n", "
\n", "
\n", "normalized\n", "
\n", "
str
\n", "\n", " ✅ Surface word with accents normalized and trailing punctuations removed\n", "\n", "
\n", "\n", "
\n", "
\n", "nu\n", "
\n", "
str
\n", "\n", " ✅ Gramatical number (Singular, Plural)\n", "\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
str
\n", "\n", " ✅ Gramatical number of the verb (e.g. singular, plural)\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "person\n", "
\n", "
str
\n", "\n", " ✅ Gramatical person of the verb (first, second, third)\n", "\n", "
\n", "\n", "
\n", "
\n", "punctuation\n", "
\n", "
str
\n", "\n", " ✅ Punctuation after word\n", "\n", "
\n", "\n", "
\n", "
\n", "ref\n", "
\n", "
str
\n", "\n", " ✅ Value of the ref ID (taken from XML sourcedata)\n", "\n", "
\n", "\n", "
\n", "
\n", "reference\n", "
\n", "
str
\n", "\n", " ✅ Reference (to nodeID in XML source data, not yet post-processes)\n", "\n", "
\n", "\n", "
\n", "
\n", "roleclausedistance\n", "
\n", "
str
\n", "\n", " ⚠️ Distance to the wordgroup defining the syntactical role of this word\n", "\n", "
\n", "\n", "
\n", "
\n", "sentence\n", "
\n", "
int
\n", "\n", " ✅ Sentence number (counted per chapter)\n", "\n", "
\n", "\n", "
\n", "
\n", "sp\n", "
\n", "
str
\n", "\n", " ✅ Part of Speech (abbreviated)\n", "\n", "
\n", "\n", "
\n", "
\n", "sp_full\n", "
\n", "
str
\n", "\n", " ✅ Part of Speech (long description)\n", "\n", "
\n", "\n", "
\n", "
\n", "strongs\n", "
\n", "
str
\n", "\n", " ✅ Strongs number\n", "\n", "
\n", "\n", "
\n", "
\n", "subj_ref\n", "
\n", "
str
\n", "\n", " 🆗 Subject reference (to nodeID in XML source data, not yet post-processes)\n", "\n", "
\n", "\n", "
\n", "
\n", "tense\n", "
\n", "
str
\n", "\n", " ✅ Gramatical tense of the verb (e.g. Present, Aorist)\n", "\n", "
\n", "\n", "
\n", "
\n", "type\n", "
\n", "
str
\n", "\n", " ✅ Gramatical type of noun or pronoun (e.g. Common, Personal)\n", "\n", "
\n", "\n", "
\n", "
\n", "unicode\n", "
\n", "
str
\n", "\n", " ✅ Word as it apears in the text in Unicode (incl. punctuations)\n", "\n", "
\n", "\n", "
\n", "
\n", "verse\n", "
\n", "
int
\n", "\n", " ✅ Verse number inside chapter\n", "\n", "
\n", "\n", "
\n", "
\n", "voice\n", "
\n", "
str
\n", "\n", " ✅ Gramatical voice of the verb (e.g. active,passive)\n", "\n", "
\n", "\n", "
\n", "
\n", "wgclass\n", "
\n", "
str
\n", "\n", " ✅ Class of the wordgroup (e.g. cl, np, vp)\n", "\n", "
\n", "\n", "
\n", "
\n", "wglevel\n", "
\n", "
int
\n", "\n", " 🆗 Number of the parent wordgroups for a wordgroup\n", "\n", "
\n", "\n", "
\n", "
\n", "wgnum\n", "
\n", "
int
\n", "\n", " ✅ Wordgroup number (counted per book)\n", "\n", "
\n", "\n", "
\n", "
\n", "wgrole\n", "
\n", "
str
\n", "\n", " ✅ Syntactical role of the wordgroup (abbreviated)\n", "\n", "
\n", "\n", "
\n", "
\n", "wgrolelong\n", "
\n", "
str
\n", "\n", " ✅ Syntactical role of the wordgroup (full)\n", "\n", "
\n", "\n", "
\n", "
\n", "wgrule\n", "
\n", "
str
\n", "\n", " ✅ Wordgroup rule information (e.g. Np-Appos, ClCl2, PrepNp)\n", "\n", "
\n", "\n", "
\n", "
\n", "wgtype\n", "
\n", "
str
\n", "\n", " ✅ Wordgroup type details (e.g. group, apposition)\n", "\n", "
\n", "\n", "
\n", "
\n", "word\n", "
\n", "
str
\n", "\n", " ✅ Word as it appears in the text (excl. punctuations)\n", "\n", "
\n", "\n", "
\n", "
\n", "wordlevel\n", "
\n", "
str
\n", "\n", " 🆗 Number of the parent wordgroups for a word\n", "\n", "
\n", "\n", "
\n", "
\n", "wordrole\n", "
\n", "
str
\n", "\n", " ✅ Syntactical role of the word (abbreviated)\n", "\n", "
\n", "\n", "
\n", "
\n", "wordrolelong\n", "
\n", "
str
\n", "\n", " ✅ Syntactical role of the word (full)\n", "\n", "
\n", "\n", "
\n", "
\n", "wordtranslit\n", "
\n", "
str
\n", "\n", " 🆗 Transliteration of the text (in latin letters, excl. punctuations)\n", "\n", "
\n", "\n", "
\n", "
\n", "wordunacc\n", "
\n", "
str
\n", "\n", " ✅ Word without accents (excl. punctuations)\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "\n", " Settings:
specified
  1. apiVersion: 3
  2. appName: tonyjurg/Nestle1904LFT
  3. appPath:C:/Users/tonyj/text-fabric-data/github/tonyjurg/Nestle1904LFT/app
  4. commit: e68bd68c7c4c862c1464d995d51e27db7691254f
  5. css: ''
  6. dataDisplay:
    • excludedFeatures:
      • orig_order
      • verse
      • book
      • chapter
    • noneValues:
      • none
      • unknown
      • no value
      • NA
      • ''
    • showVerseInTuple: 0
    • textFormat: text-orig-full
  7. docs:
    • docBase: https://github.com/tonyjurg/Nestle1904LFT/blob/main/docs/
    • docPage: about
    • docRoot: https://github.com/tonyjurg/Nestle1904LFT
    • featureBase:https://github.com/tonyjurg/Nestle1904LFT/blob/main/docs/features/<feature>.md
  8. interfaceDefaults: {fmt: layout-orig-full}
  9. isCompatible: True
  10. local: local
  11. localDir:C:/Users/tonyj/text-fabric-data/github/tonyjurg/Nestle1904LFT/_temp
  12. provenanceSpec:
    • corpus: Nestle 1904 (Low Fat Tree)
    • doi: 10.5281/zenodo.10182594
    • org: tonyjurg
    • relative: /tf
    • repo: Nestle1904LFT
    • repro: Nestle1904LFT
    • version: 0.7
    • webBase: https://learner.bible/text/show_text/nestle1904/
    • webHint: Show this on the Bible Online Learner website
    • webLang: en
    • webUrl:https://learner.bible/text/show_text/nestle1904/<1>/<2>/<3>
    • webUrlLex: {webBase}/word?version={version}&id=<lid>
  13. release: v0.6
  14. typeDisplay:
    • book:
      • condense: True
      • hidden: True
      • label: {book}
      • style: ''
    • chapter:
      • condense: True
      • hidden: True
      • label: {chapter}
      • style: ''
    • sentence:
      • hidden: 0
      • label: #{sentence} (start: {book} {chapter}:{headverse})
      • style: ''
    • verse:
      • condense: True
      • excludedFeatures: chapter verse
      • label: {book} {chapter}:{verse}
      • style: ''
    • wg:
      • hidden: 0
      • label:#{wgnum}: {wgtype} {wgclass} {clausetype} {wgrole} {wgrule} {junction}
      • style: ''
    • word:
      • base: True
      • features: lemma
      • featuresBare: gloss
      • surpress: chapter verse
  15. writing: grc
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
TF API: names N F E L T S C TF Fs Fall Es Eall Cs Call directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# load the N1904 app and data\n", "N1904 = use (\"tonyjurg/Nestle1904LFT\", version=\"0.7\", hoist=globals())" ] }, { "cell_type": "code", "execution_count": 4, "id": "99f4d49e-d075-4e3d-b6ba-8c104ebbefd5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)\n", "N1904.dh(N1904.getCss())" ] }, { "cell_type": "code", "execution_count": 5, "id": "2b9232b5-98a4-4277-b31d-9f379e5546c2", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Set default view in a way to limit noise as much as possible.\n", "N1904.displaySetup(condensed=True, multiFeatures=False,queryFeatures=False)" ] }, { "cell_type": "markdown", "id": "6cee9712-ae79-477c-9f39-1a51e3a0a8a1", "metadata": {}, "source": [ "# 3 - Performing the queries \n", "##### [Back to TOC](#TOC)" ] }, { "cell_type": "markdown", "id": "e2b752ab-1b2a-46f3-a079-fe0a45555b8d", "metadata": {}, "source": [ "## 3.1 - TBD\n", "##### [Back to TOC](#TOC)\n", "\n", "TBD" ] }, { "cell_type": "markdown", "id": "47a87214-feb2-4b04-9057-91f302ba600c", "metadata": {}, "source": [ "## 3.2 - Inspecting your query\n", "##### [Back to TOC](#TOC)\n", "\n", "Each query templace can be inspected by use of [`S.study()`](https://annotation.github.io/text-fabric/tf/search/search.html#tf.search.search.Search.study). This is particulary helpfull in case the query is complicated." ] }, { "cell_type": "code", "execution_count": 6, "id": "2352983f-398d-4cd6-afbc-a7ef83d25602", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s Checking search template ...\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " 0 \n", " 1 wg\n", " 2 /where\n", " 3 wg phrasefunction=S\n", " 4 /have\n", " 5 /without\n", " 6 word sp#conj\n", " 7 /-\n", " 8 /-\n", " 9 wg phrasefunction=O\n", "10 \n", "Missing feature \"phrasefunction\" in line(s) 9\n" ] } ], "source": [ "ComplicatedQuery = '''\n", "wg\n", "/where/\n", " wg phrasefunction=S\n", "/have/\n", " /without/\n", " word sp#conj\n", " /-/\n", "/-/\n", " wg phrasefunction=O\n", "'''\n", "S.study(ComplicatedQuery)" ] }, { "cell_type": "code", "execution_count": 5, "id": "2fbd91b5-5092-4018-a9af-fa63a2a934be", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " 15s Cannot show plan if there is no previous \"study()\"\n" ] } ], "source": [ "S.showPlan()" ] }, { "cell_type": "code", "execution_count": 6, "id": "83ee75bd-f2c3-4c91-b0fe-a3663c812873", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " 22s Cannot fetch if there is no previous \"study()\"\n" ] }, { "ename": "TypeError", "evalue": "'NoneType' object is not iterable", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)", "Cell \u001b[1;32mIn[6], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m result \u001b[38;5;129;01min\u001b[39;00m S\u001b[38;5;241m.\u001b[39mfetch(limit\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m10\u001b[39m):\n\u001b[0;32m 2\u001b[0m TF\u001b[38;5;241m.\u001b[39minfo(S\u001b[38;5;241m.\u001b[39mglean(result))\n", "\u001b[1;31mTypeError\u001b[0m: 
'NoneType' object is not iterable" ] } ], "source": [ "for result in S.fetch(limit=10):\n", " TF.info(S.glean(result))" ] }, { "cell_type": "markdown", "id": "e3d3c434-6b7a-435e-9525-d2096866e604", "metadata": {}, "source": [ "## 3.3 - Comparing two lists with query results\n", "##### [Back to TOC](#TOC)\n", "\n", "Using standard Python functions, it appears easy to verify whether the results of two queries are the same or different. However, it is worth taking a closer look at the matter, since there are a few pitfalls that could lead to false conclusions." ] }, { "cell_type": "code", "execution_count": 7, "id": "a57e9482-6843-4f69-a965-3803eac00fe3", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " 0 \n", " 1 phrase phrasefunction=V\n", " 2 word lemma=λέγω\n", " 3 \n", "line 1: Unknown object type: \"phrase\"\n", "Valid object types are: book, chapter, verse, sentence, wg, word\n", "Or choose a custom set from: \n", "Missing feature \"phrasefunction\" in line(s) 1\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " 0.00s Cannot load feature \"phrasefunction\": not in dataset\n", " 0.00s 0 results\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " 0 \n", " 1 phrase phrasefunction=V\n", " 2 word lemma=λέγω\n", " 3 \n", "line 1: Unknown object type: \"phrase\"\n", "Valid object types are: book, chapter, verse, sentence, wg, word\n", "Or choose a custom set from: \n", "Missing feature \"phrasefunction\" in line(s) 1\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " 0.00s Cannot load feature \"phrasefunction\": not in dataset\n", " 0.00s 0 results\n", "Same result? True\n" ] } ], "source": [ "# define the query template\n", "SomeQuery ='''\n", "phrase phrasefunction=V\n", " word lemma=λέγω\n", "'''\n", "# now create two lists with identical query results and compare result lists\n", "SomeResult1=N1904.search(SomeQuery)\n", "SomeResult2=N1904.search(SomeQuery)\n", "print(f'Same result? {SomeResult1 == SomeResult2}')" ] }, { "cell_type": "markdown", "id": "ea751b45-2398-4ea4-b3cf-81808cae3aa4", "metadata": {}, "source": [ "This is exactly what we would expect. But comparing lists can be tricky. Consider the following two queries." ] }, { "cell_type": "code", "execution_count": 25, "id": "5373b900-a8fe-4079-8aeb-bf7f2ee5e64c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.34s 3851 results\n", " 0.35s 3851 results\n", "Same result? False\n" ] } ], "source": [ "# define query template 1\n", "Query1 ='''\n", "phrase\n", " a:word sp=prep\n", " b:word sp=adj\n", " c:word sp=noun \n", "'''\n", "\n", "# define query template 2\n", "Query2 ='''\n", "phrase\n", " a:word sp=prep\n", " b:word sp=noun\n", " c:word sp=adj\n", "'''\n", "\n", "# create and compare result lists\n", "ResultQuery1=N1904.search(Query1)\n", "ResultQuery2=N1904.search(Query2)\n", "\n", "print(f'Same result? {ResultQuery1 == ResultQuery2}')" ] }, { "cell_type": "markdown", "id": "702723ab-2a52-41da-934e-7c4b36118e2b", "metadata": {}, "source": [ "This way of comparing the result lists reports a difference between the two. However, upon closer examination, that may or may not be a real difference, depending on what is understood as a difference. The 'problem' here is that both ResultQuery1 and ResultQuery2 are lists of **ordered tuples**. Swapping the feature conditions 'sp=adj' and 'sp=noun' did not change which combinations of words were found; it only changed the position of the adjective and noun nodes within each result tuple, as the sketch below illustrates."
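, "\n", "\n", "One way to make this explicit is to ignore the order of the nodes inside each result tuple and compare the two result lists as sets of node sets. The cell below is only a minimal sketch (it has not been executed here) and assumes that ResultQuery1 and ResultQuery2 are still available from the cell above:\n", "\n", "```python\n", "# Compare the two result lists while ignoring the order of the nodes inside each tuple.\n", "# If this prints True, both queries matched the same combinations of nodes and only\n", "# the position of the adjective and noun nodes within each tuple differs.\n", "asNodeSets1 = {frozenset(r) for r in ResultQuery1}\n", "asNodeSets2 = {frozenset(r) for r in ResultQuery2}\n", "print(asNodeSets1 == asNodeSets2)\n", "```"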
" ] }, { "cell_type": "code", "execution_count": 33, "id": "a38422bf-24fb-435c-a156-a6bee707986b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.34s 3851 results\n", " 0.33s 3851 results\n", "Unsorted lists: Same result ? False\n", "Sorted lists: Same result ? False\n" ] } ], "source": [ "# create 2 result lists\n", "ResultQuery3=N1904.search(Query1,sort=True)\n", "ResultQuery4=N1904.search(Query1,sort=False)\n", "\n", "# compare unsorted lists\n", "print(f'Unsorted lists: Same result ? {ResultQuery3 == ResultQuery4}')\n", "\n", "# sort both lists on the first tuple \n", "SortedResultQuery3 = sorted(ResultQuery3, key=lambda x: x[0])\n", "SortedResultQuery4 = sorted(ResultQuery4, key=lambda x: x[0])\n", "\n", "# compare sorted lists\n", "print(f'Sorted lists: Same result ? {SortedResultQuery3 == SortedResultQuery4}')" ] }, { "cell_type": "markdown", "id": "979830b3-0714-4824-80eb-8a484062e73c", "metadata": {}, "source": [ "Unexpectedly the python list compare still viewed the two lists as different. But why? Python does report that the two lists are different because the comparison of lists (SortedResultQuery3 == SortedResultQuery4) checks for the equality of the list objects, not their contents.\n", "\n", "The search() function in Text-Fabric returns a list of nodes or tuples representing search results. Even if the search criteria and the data are the same, the two lists, ResultQuery3 and ResultQuery4, are distinct list objects. Hence, when comparing them directly using the == operator, Python considers them as different objects, resulting in False.\n", "\n", "To compare the content of the lists, it is advices to first onvert them to sets and compare those sets. See following example: " ] }, { "cell_type": "code", "execution_count": 35, "id": "3aa8e50e-d7ed-4724-9d23-b2b81dbebda2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Lists ResultQuery3 and ResultQuery4 are equal.\n" ] } ], "source": [ "# Convert tuples to sets\n", "set1 = set(tuple(item) for item in ResultQuery3)\n", "set2 = set(tuple(item) for item in ResultQuery4)\n", "\n", "# Compare the sets\n", "if set1 == set2:\n", " print(\"Lists ResultQuery3 and ResultQuery4 are equal.\")\n", "else:\n", " print(\"Lists ResultQuery3 and ResultQuery4 are not equal.\")" ] }, { "cell_type": "markdown", "id": "d6c70d84-b455-4213-9827-049a03dd21fd", "metadata": {}, "source": [ "Now, let's compare ResultQuery1 and ResultQuery2 again by first converting them to sets. " ] }, { "cell_type": "code", "execution_count": 38, "id": "3302530c-0dd3-4062-882c-087d00c72b71", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Lists ResultQuery1 and ResultQuery2 are not equal.\n" ] } ], "source": [ "# Convert tuples to sets\n", "set1 = set(tuple(item) for item in ResultQuery1)\n", "set2 = set(tuple(item) for item in ResultQuery2)\n", "\n", "# Compare the sets\n", "if set1 == set2:\n", " print(\"Lists ResultQuery1 and ResultQuery2 are equal.\")\n", "else:\n", " print(\"Lists ResultQuery1 and ResultQuery2 are not equal.\")" ] }, { "cell_type": "markdown", "id": "eb6b6ec3-6079-4bb9-8edb-14cad4bae771", "metadata": {}, "source": [ "This is indeed the result we expected (see earlier mentioned reasons)." 
] }, { "cell_type": "markdown", "id": "30d9b5c9-6e9c-45f7-b890-c24c48930015", "metadata": {}, "source": [ "## 3.4 - Using search qualifiers\n", "##### [Back to TOC](#TOC)\n", "\n", "A search template can also use the following:" ] }, { "cell_type": "code", "execution_count": 7, "id": "5ea2fbcc-9f71-4325-83d8-17e9bafb128a", "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'N1904GBI' is not defined", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mNameError\u001b[0m Traceback (most recent call last)", "Cell \u001b[1;32mIn[7], line 10\u001b[0m\n\u001b[0;32m 1\u001b[0m ErgetaiQuery \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m'''\u001b[39m\n\u001b[0;32m 2\u001b[0m \u001b[38;5;124mword word=ἔρχεται\u001b[39m\n\u001b[0;32m 3\u001b[0m \u001b[38;5;124m/with/\u001b[39m\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 7\u001b[0m \u001b[38;5;124m/-/\u001b[39m\n\u001b[0;32m 8\u001b[0m \u001b[38;5;124m'''\u001b[39m\n\u001b[1;32m---> 10\u001b[0m ErgetaiResult \u001b[38;5;241m=\u001b[39m N1904GBI\u001b[38;5;241m.\u001b[39msearch(ErgetaiQuery)\n", "\u001b[1;31mNameError\u001b[0m: name 'N1904GBI' is not defined" ] } ], "source": [ "ErgetaiQuery = '''\n", "word word=ἔρχεται\n", "/with/\n", "book=Mark chapter=6 verse=1 \n", "/or/\n", "book=Revelation chapter=1 verse=7 \n", "/-/\n", "'''\n", "\n", "ErgetaiResult = N1904GBI.search(ErgetaiQuery) \n", "# returns list of ordered tuples" ] }, { "cell_type": "raw", "id": "e35cc584-64f8-4b55-8d0e-439cbe11f862", "metadata": {}, "source": [ "Maybe discuss:\n", "\n", "query = '''\n", "node feature1=A|B\n", "node feature2=X|Y|Z\n", "'''" ] }, { "cell_type": "markdown", "id": "84ba6446-de16-4709-88ff-67d7b2f9d6cc", "metadata": {}, "source": [ "# 4 - Discussion\n", "##### [Back to TOC](#TOC)\n", "\n", "TBA" ] }, { "cell_type": "markdown", "id": "a2256298-1c2c-4918-8c33-8a459d3a2b64", "metadata": { "tags": [] }, "source": [ "# 5 - Attribution and footnotes\n", "##### [Back to TOC](#TOC)\n", "\n", "NA" ] }, { "cell_type": "markdown", "id": "8d487b6a-447f-42b8-a0c7-1165b2312e6c", "metadata": { "tags": [] }, "source": [ "# 6 - Required libraries \n", "##### [Back to TOC](#TOC)\n", "\n", "The scripts in this notebook require (beside `text-fabric`) the following Python libraries to be installed in the environment:\n", "\n", " {none}\n", "\n", "You can install any missing library from within Jupyter Notebook using either`pip` or `pip3`." ] }, { "cell_type": "code", "execution_count": null, "id": "45457f3b-34c3-4d5f-88dc-4fa1a1d033c5", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 5 }