{ "cells": [ { "cell_type": "markdown", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "\n", "\n", "\n", "\n", "You might want to consider the [start](search.ipynb) of this tutorial.\n", "\n", "Short introductions to other TF datasets:\n", "\n", "* [Dead Sea Scrolls](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/dss.ipynb),\n", "* [Old Babylonian Letters](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/oldbabylonian.ipynb),\n", "or the\n", "* [Quran](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/quran.ipynb)\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [] }, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T10:06:39.818664Z", "start_time": "2018-05-24T10:06:39.796588Z" }, "tags": [] }, "outputs": [], "source": [ "import collections\n", "\n", "from tf.app import use" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T10:06:48.865143Z", "start_time": "2018-05-24T10:06:44.712958Z" }, "tags": [] }, "outputs": [ { "data": { "text/markdown": [ "**Locating corpus resources ...**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "app: ~/text-fabric-data/github/ETCBC/bhsa/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/ETCBC/bhsa/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/ETCBC/phono/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/ETCBC/parallels/tf/2021" ], "text/plain": [ "" ] }, 
"metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " TF: TF API 12.1.7, ETCBC/bhsa/app v3, Search Reference
\n", " Data: ETCBC - bhsa 2021, Character table, Feature docs
\n", "
Node types\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "
Name# of nodes# slots / node% coverage
book3910938.21100
chapter929459.19100
lex923046.22100
verse2321318.38100
half_verse451799.44100
sentence637176.70100
sentence_atom645146.61100
clause881314.84100
clause_atom907044.70100
phrase2532031.68100
phrase_atom2675321.59100
subphrase1138501.4238
word4265901.00100
\n", " Sets: no custom sets
\n", " Features:
\n", "
Parallel Passages\n", "
\n", "\n", "
\n", "
\n", "crossref\n", "
\n", "
int
\n", "\n", " 🆗 links between similar passages\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis\n", "
\n", "\n", "
\n", "
\n", "book\n", "
\n", "
str
\n", "\n", " ✅ book name in Latin (Genesis; Numeri; Reges1; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "book@ll\n", "
\n", "
str
\n", "\n", " ✅ book name in Amharic (ኣማርኛ)\n", "\n", "
\n", "\n", "
\n", "
\n", "chapter\n", "
\n", "
int
\n", "\n", " ✅ chapter number (1; 2; 3; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "code\n", "
\n", "
int
\n", "\n", " ✅ identifier of a clause atom relationship (0; 74; 367; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "det\n", "
\n", "
str
\n", "\n", " ✅ determinedness of phrase(atom) (det; und; NA.)\n", "\n", "
\n", "\n", "
\n", "
\n", "domain\n", "
\n", "
str
\n", "\n", " ✅ text type of clause (? (Unknown); N (narrative); D (discursive); Q (Quotation).)\n", "\n", "
\n", "\n", "
\n", "
\n", "freq_lex\n", "
\n", "
int
\n", "\n", " ✅ frequency of lexemes\n", "\n", "
\n", "\n", "
\n", "
\n", "function\n", "
\n", "
str
\n", "\n", " ✅ syntactic function of phrase (Cmpl; Objc; Pred; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons\n", "
\n", "
str
\n", "\n", " ✅ word consonantal-transliterated (B R>CJT BR> >LHJM ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons_utf8\n", "
\n", "
str
\n", "\n", " ✅ word consonantal-Hebrew (ב ראשׁית ברא אלהים)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex\n", "
\n", "
str
\n", "\n", " ✅ lexeme pointed-transliterated (B.:- R;>CIJT B.@R@> >:ELOH ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ lexeme pointed-Hebrew (בְּ רֵאשִׁית בָּרָא אֱלֹה)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_word\n", "
\n", "
str
\n", "\n", " ✅ word pointed-transliterated (B.:- R;>CI73JT B.@R@74> >:ELOHI92JM)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_word_utf8\n", "
\n", "
str
\n", "\n", " ✅ word pointed-Hebrew (בְּ רֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים)\n", "\n", "
\n", "\n", "
\n", "
\n", "gloss\n", "
\n", "
str
\n", "\n", " 🆗 English translation of lexeme (beginning create god(s))\n", "\n", "
\n", "\n", "
\n", "
\n", "gn\n", "
\n", "
str
\n", "\n", " ✅ grammatical gender (m; f; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "label\n", "
\n", "
str
\n", "\n", " ✅ (half-)verse label (half verses: A; B; C; verses: GEN 01,02)\n", "\n", "
\n", "\n", "
\n", "
\n", "language\n", "
\n", "
str
\n", "\n", " ✅ of word or lexeme (Hebrew; Aramaic.)\n", "\n", "
\n", "\n", "
\n", "
\n", "lex\n", "
\n", "
str
\n", "\n", " ✅ lexeme consonantal-transliterated (B R>CJT/ BR>[ >LHJM/)\n", "\n", "
\n", "\n", "
\n", "
\n", "lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ lexeme consonantal-Hebrew (ב ראשׁית֜ ברא אלהים֜)\n", "\n", "
\n", "\n", "
\n", "
\n", "ls\n", "
\n", "
str
\n", "\n", " ✅ lexical set, subclassification of part-of-speech (card; ques; mult)\n", "\n", "
\n", "\n", "
\n", "
\n", "nametype\n", "
\n", "
str
\n", "\n", " ⚠️ named entity type (pers; mens; gens; topo; ppde.)\n", "\n", "
\n", "\n", "
\n", "
\n", "nme\n", "
\n", "
str
\n", "\n", " ✅ nominal ending consonantal-transliterated (absent; n/a; JM, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "nu\n", "
\n", "
str
\n", "\n", " ✅ grammatical number (sg; du; pl; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
int
\n", "\n", " ✅ sequence number of an object within its context\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "pargr\n", "
\n", "
str
\n", "\n", " 🆗 hierarchical paragraph number (1; 1.2; 1.2.3.4; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "pdp\n", "
\n", "
str
\n", "\n", " ✅ phrase dependent part-of-speech (art; verb; subs; nmpr, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "pfm\n", "
\n", "
str
\n", "\n", " ✅ preformative consonantal-transliterated (absent; n/a; J, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix consonantal-transliterated (absent; n/a; W; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_gn\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix gender (m; f; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_nu\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix number (sg; du; pl; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_ps\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix person (p1; p2; p3; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "ps\n", "
\n", "
str
\n", "\n", " ✅ grammatical person (p1; p2; p3; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere\n", "
\n", "
str
\n", "\n", " ✅ word pointed-transliterated masoretic reading correction\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer\n", "
\n", "
str
\n", "\n", " ✅ interword material -pointed-transliterated (Masoretic correction)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer_utf8\n", "
\n", "
str
\n", "\n", " ✅ interword material -pointed-transliterated (Masoretic correction)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_utf8\n", "
\n", "
str
\n", "\n", " ✅ word pointed-Hebrew masoretic reading correction\n", "\n", "
\n", "\n", "
\n", "
\n", "rank_lex\n", "
\n", "
int
\n", "\n", " ✅ ranking of lexemes based on frequency\n", "\n", "
\n", "\n", "
\n", "
\n", "rela\n", "
\n", "
str
\n", "\n", " ✅ linguistic relation between clause/(sub)phrase(atom) (ADJ; MOD; ATR; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "sp\n", "
\n", "
str
\n", "\n", " ✅ part-of-speech (art; verb; subs; nmpr, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "st\n", "
\n", "
str
\n", "\n", " ✅ state of a noun (a (absolute); c (construct); e (emphatic).)\n", "\n", "
\n", "\n", "
\n", "
\n", "tab\n", "
\n", "
int
\n", "\n", " ✅ clause atom: its level in the linguistic embedding\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer\n", "
\n", "
str
\n", "\n", " ✅ interword material pointed-transliterated (& 00 05 00_P ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer_utf8\n", "
\n", "
str
\n", "\n", " ✅ interword material pointed-Hebrew (־ ׃)\n", "\n", "
\n", "\n", "
\n", "
\n", "txt\n", "
\n", "
str
\n", "\n", " ✅ text type of clause and surrounding (repetition of ? N D Q as in feature domain)\n", "\n", "
\n", "\n", "
\n", "
\n", "typ\n", "
\n", "
str
\n", "\n", " ✅ clause/phrase(atom) type (VP; NP; Ellp; Ptcp; WayX)\n", "\n", "
\n", "\n", "
\n", "
\n", "uvf\n", "
\n", "
str
\n", "\n", " ✅ univalent final consonant consonantal-transliterated (absent; N; J; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "vbe\n", "
\n", "
str
\n", "\n", " ✅ verbal ending consonantal-transliterated (n/a; W; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "vbs\n", "
\n", "
str
\n", "\n", " ✅ root formation consonantal-transliterated (absent; n/a; H; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "verse\n", "
\n", "
int
\n", "\n", " ✅ verse number\n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex\n", "
\n", "
str
\n", "\n", " ✅ vocalized lexeme pointed-transliterated (B.: R;>CIJT BR> >:ELOHIJM)\n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ vocalized lexeme pointed-Hebrew (בְּ רֵאשִׁית ברא אֱלֹהִים)\n", "\n", "
\n", "\n", "
\n", "
\n", "vs\n", "
\n", "
str
\n", "\n", " ✅ verbal stem (qal; piel; hif; apel; pael)\n", "\n", "
\n", "\n", "
\n", "
\n", "vt\n", "
\n", "
str
\n", "\n", " ✅ verbal tense (perf; impv; wayq; infc)\n", "\n", "
\n", "\n", "
\n", "
\n", "mother\n", "
\n", "
none
\n", "\n", " ✅ linguistic dependency between textual objects\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
Phonetic Transcriptions\n", "
\n", "\n", "
\n", "
\n", "phono\n", "
\n", "
str
\n", "\n", " 🆗 phonological transcription (bᵊ rēšˌîṯ bārˈā ʔᵉlōhˈîm)\n", "\n", "
\n", "\n", "
\n", "
\n", "phono_trailer\n", "
\n", "
str
\n", "\n", " 🆗 interword material in phonological transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", " Settings:
specified
  1. apiVersion: 3
  2. appName: ETCBC/bhsa
  3. appPath: /Users/me/text-fabric-data/github/ETCBC/bhsa/app
  4. commit: gb112c161cfd21eae403d51a2733740d8743460e7
  5. css: ''
  6. dataDisplay:
    • exampleSectionHtml:<code>Genesis 1:1</code> (use <a href=\"https://github.com/{org}/{repo}/blob/master/tf/{version}/book%40en.tf\" target=\"_blank\">English book names</a>)
    • excludedFeatures:
      • g_uvf_utf8
      • g_vbs
      • kq_hybrid
      • languageISO
      • g_nme
      • lex0
      • is_root
      • g_vbs_utf8
      • g_uvf
      • dist
      • root
      • suffix_person
      • g_vbe
      • dist_unit
      • suffix_number
      • distributional_parent
      • kq_hybrid_utf8
      • crossrefSET
      • instruction
      • g_prs
      • lexeme_count
      • rank_occ
      • g_pfm_utf8
      • freq_occ
      • crossrefLCS
      • functional_parent
      • g_pfm
      • g_nme_utf8
      • g_vbe_utf8
      • kind
      • g_prs_utf8
      • suffix_gender
      • mother_object_type
    • noneValues:
      • absent
      • n/a
      • none
      • unknown
      • no value
      • NA
  7. docs:
    • docBase: {docRoot}/{repo}
    • docExt: ''
    • docPage: ''
    • docRoot: https://{org}.github.io
    • featurePage: 0_home
  8. interfaceDefaults: {}
  9. isCompatible: True
  10. local: local
  11. localDir: /Users/me/text-fabric-data/github/ETCBC/bhsa/_temp
  12. provenanceSpec:
    • corpus: BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis
    • doi: 10.5281/zenodo.1007624
    • extraData: ner
    • moduleSpecs:
      • :
        • backend: no value
        • corpus: Phonetic Transcriptions
        • docUrl:https://nbviewer.jupyter.org/github/etcbc/phono/blob/master/programs/phono.ipynb
        • doi: 10.5281/zenodo.1007636
        • org: ETCBC
        • relative: /tf
        • repo: phono
      • :
        • backend: no value
        • corpus: Parallel Passages
        • docUrl:https://nbviewer.jupyter.org/github/ETCBC/parallels/blob/master/programs/parallels.ipynb
        • doi: 10.5281/zenodo.1007642
        • org: ETCBC
        • relative: /tf
        • repo: parallels
    • org: ETCBC
    • relative: /tf
    • repo: bhsa
    • version: 2021
    • webBase: https://shebanq.ancient-data.org/hebrew
    • webHint: Show this on SHEBANQ
    • webLang: la
    • webLexId: True
    • webUrl:{webBase}/text?book=<1>&chapter=<2>&verse=<3>&version={version}&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt
    • webUrlLex: {webBase}/word?version={version}&id=<lid>
  13. release: v1.8.1
  14. typeDisplay:
    • clause:
      • label: {typ} {rela}
      • style: ''
    • clause_atom:
      • hidden: True
      • label: {code}
      • level: 1
      • style: ''
    • half_verse:
      • hidden: True
      • label: {label}
      • style: ''
      • verselike: True
    • lex:
      • featuresBare: gloss
      • label: {voc_lex_utf8}
      • lexOcc: word
      • style: orig
      • template: {voc_lex_utf8}
    • phrase:
      • label: {typ} {function}
      • style: ''
    • phrase_atom:
      • hidden: True
      • label: {typ} {rela}
      • level: 1
      • style: ''
    • sentence:
      • label: {number}
      • style: ''
    • sentence_atom:
      • hidden: True
      • label: {number}
      • level: 1
      • style: ''
    • subphrase:
      • hidden: True
      • label: {number}
      • style: ''
    • word:
      • features: pdp vs vt
      • featuresBare: lex:gloss
  15. writing: hbo
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
TF API: names N F E L T S C TF Fs Fall Es Eall Cs Call directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A = use(\"ETCBC/bhsa\", hoist=globals())" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# Relations\n", "\n", "So far we have seen search templates specifying feature conditions on nodes\n", "and a bit of nesting of those nodes, with an occasional extra constraint on their\n", "positions.\n", "\n", "We show some more possibilities.\n", "A more thorough treatment is in [relations](searchRelations.ipynb).\n", "\n", "We can refer to (spatial) relationships between nodes by means of extra constraints\n", "of the form\n", "\n", "```\n", "n relop m\n", "```\n", "\n", "where `n` and `m` are names of node parts in your template, and `relop` is the name of a relational operator.\n", "\n", "Text-Fabric comes with a fixed set of spatial relational operators,\n", "and your dataset may contain *edge* features, which correspond to additional relational operators.\n", "\n", "You can get the list of all relational operators that you can currently use:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T07:59:57.799159Z", "start_time": "2018-05-24T07:59:57.776703Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " = left equal to right (as node)\n", " # left unequal to right (as node)\n", " < left before right (in canonical node ordering)\n", " > left after right (in canonical node ordering)\n", " == left occupies same slots as right\n", " && left has overlapping slots with right\n", " ## left and right do not have the same slot set\n", " || left and right do not have common slots\n", " [[ left embeds right\n", " ]] left embedded in right\n", " << left completely before right\n", " >> left completely after right\n", " =: left and right start at the same slot\n", " := left and right end at the same slot\n", " :: left and right start and end at the same slot\n", " <: left 
immediately before right\n", " :> left immediately after right\n", " =k: left and right start at k-nearly the same slot\n", " :k= left and right end at k-nearly the same slot\n", " :k: left and right start and end at k-near slots\n", " <k: left k-nearly before right\n", " :k> left k-nearly after right\n", " .f. left.f = right.f\n", " .f=g. left.f = right.g\n", " .f~r~g. left.f matches right.g\n", " .f#g. left.f # right.g\n", " .f>g. left.f > right.g\n", " .f<g. left.f < right.g\n", " -crossref> edge feature \"crossref\" with value specification allowed\n", " <crossref- edge feature \"crossref\" with value specification allowed (opposite direction)\n", " <crossref> edge feature \"crossref\" with value specification allowed (either direction)\n", " -crossrefLCS> edge feature \"crossrefLCS\" with value specification allowed\n", " <crossrefLCS- edge feature \"crossrefLCS\" with value specification allowed (opposite direction)\n", " <crossrefLCS> edge feature \"crossrefLCS\" with value specification allowed (either direction)\n", " -crossrefSET> edge feature \"crossrefSET\" with value specification allowed\n", " <crossrefSET- edge feature \"crossrefSET\" with value specification allowed (opposite direction)\n", " <crossrefSET> edge feature \"crossrefSET\" with value specification allowed (either direction)\n", "-distributional_parent> edge feature \"distributional_parent\"\n", "<distributional_parent- edge feature \"distributional_parent\" (opposite direction)\n", "<distributional_parent> edge feature \"distributional_parent\" (either direction)\n", " -functional_parent> edge feature \"functional_parent\"\n", " <functional_parent- edge feature \"functional_parent\" (opposite direction)\n", " <functional_parent> edge feature \"functional_parent\" (either direction)\n", " -mother> edge feature \"mother\"\n", " <mother- edge feature \"mother\" (opposite direction)\n", " <mother> edge feature \"mother\" (either direction)\n", "The warp feature \"oslots\" and omap features cannot be used in searches.\n", "One of the above relations on nodes and / or slots will suit you better.\n" ] } ], "source": [ "S.relationsLegend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Feature comparison\n", "\n", "Note the operators that are surrounded by `. .` and contain `f`, `g`, or `r`.\n", "For `f` and `g` you can supply any node features in your dataset, and for `r` any regular expression." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We look for predicate-subject pairs within a clause where the subject consists of a single noun that agrees with the predicate verb in grammatical number.\n", "\n", "The `=:` and `:=` constraints force the noun to start and end at the same slots as the subject phrase, so the noun is the whole subject." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [] }, "outputs": [], "source": [ "query = \"\"\"\n", "clause\n", "  phrase function=Pred\n", "    w1:word pdp=verb\n", "  phrase function=Subj\n", "    =: w2:word pdp=subs\n", "    :=\n", "w1 .nu. w2\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.68s 3759 results\n" ] } ], "source": [ "results = A.search(query)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
npclausephrasewordphraseword
1Genesis 1:3יְהִ֣י אֹ֑ור יְהִ֣י יְהִ֣י אֹ֑ור אֹ֑ור
2Genesis 1:3וַֽיְהִי־אֹֽור׃ יְהִי־יְהִי־אֹֽור׃ אֹֽור׃
3Genesis 1:5וַֽיְהִי־עֶ֥רֶב יְהִי־יְהִי־עֶ֥רֶב עֶ֥רֶב
4Genesis 1:5וַֽיְהִי־בֹ֖קֶר יְהִי־יְהִי־בֹ֖קֶר בֹ֖קֶר
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.table(results, end=4)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "

result 1" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

clause ZYqX NA
phrase VP Pred
function=Pred
pdp=verbnu=sg
phrase NP Subj
function=Subj
pdp=subsnu=sg
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

clause WayX NA
phrase CP Conj
function=Conj
pdp=conj
phrase VP Pred
function=Pred
pdp=verbnu=sg
phrase NP Subj
function=Subj
pdp=subsnu=sg
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 3" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

clause WayX NA
phrase CP Conj
function=Conj
pdp=conj
phrase VP Pred
function=Pred
pdp=verbnu=sg
phrase NP Subj
function=Subj
pdp=subsnu=sg
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 4" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

clause WayX NA
phrase CP Conj
function=Conj
pdp=conj
phrase VP Pred
function=Pred
pdp=verbnu=sg
phrase NP Subj
function=Subj
pdp=subsnu=sg
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.show(results, condenseType=\"clause\", end=4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we want the same kind of pairs, but where the grammatical number of predicate and subject differs." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "tags": [] }, "outputs": [], "source": [ "query = \"\"\"\n", "clause\n", "  phrase function=Pred\n", "    w1:word pdp=verb\n", "  phrase function=Subj\n", "    =: w2:word pdp=subs\n", "    :=\n", "w1 .nu#nu. w2\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.65s 739 results\n" ] } ], "source": [ "results = A.search(query)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
npclausephrasewordphraseword
1Genesis 1:1בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃ בָּרָ֣א בָּרָ֣א אֱלֹהִ֑ים אֱלֹהִ֑ים
2Genesis 1:3וַיֹּ֥אמֶר אֱלֹהִ֖ים יֹּ֥אמֶר יֹּ֥אמֶר אֱלֹהִ֖ים אֱלֹהִ֖ים
3Genesis 1:4וַיַּ֧רְא אֱלֹהִ֛ים אֶת־הָאֹ֖ור יַּ֧רְא יַּ֧רְא אֱלֹהִ֛ים אֱלֹהִ֛ים
4Genesis 1:4וַיַּבְדֵּ֣ל אֱלֹהִ֔ים בֵּ֥ין הָאֹ֖ור וּבֵ֥ין הַחֹֽשֶׁךְ׃ יַּבְדֵּ֣ל יַּבְדֵּ֣ל אֱלֹהִ֔ים אֱלֹהִ֔ים
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.table(results, end=4)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "

result 1" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

clause xQtX NA
phrase PP Time
function=Time
pdp=prep
pdp=subsnu=sg
phrase VP Pred
function=Pred
pdp=verbnu=sg
phrase NP Subj
function=Subj
pdp=subsnu=pl
phrase PP Objc
function=Objc
pdp=prep
pdp=art
pdp=conj
pdp=prep
pdp=art
pdp=subsnu=sg
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

clause WayX NA
phrase CP Conj
function=Conj
pdp=conj
phrase VP Pred
function=Pred
pdp=verbnu=sg
phrase NP Subj
function=Subj
pdp=subsnu=pl
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 3" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

clause WayX NA
phrase CP Conj
function=Conj
pdp=conj
phrase VP Pred
function=Pred
pdp=verbnu=sg
phrase NP Subj
function=Subj
pdp=subsnu=pl
phrase PP Objc
function=Objc
pdp=prep
pdp=art
pdp=subsnu=sg
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 4" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

clause WayX NA
phrase CP Conj
function=Conj
pdp=conj
phrase VP Pred
function=Pred
phrase NP Subj
function=Subj
pdp=subsnu=pl
phrase PP Cmpl
function=Cmpl
pdp=prepnu=sg
pdp=art
pdp=subsnu=sg
pdp=conj
pdp=prepnu=sg
pdp=art
pdp=subsnu=sg
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.show(results, condenseType=\"clause\", end=4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And now the same, but excluding the cases where the subject is the lexeme `>LHJM/` (God(s))." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "tags": [] }, "outputs": [], "source": [ "query = \"\"\"\n", "clause\n", "  phrase function=Pred\n", "    w1:word pdp=verb\n", "  phrase function=Subj\n", "    =: w2:word pdp=subs lex#>LHJM/\n", "    :=\n", "w1 .nu#nu. w2\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.77s 525 results\n" ] } ], "source": [ "results = A.search(query)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
npclausephrasewordphraseword
1Genesis 1:14יְהִ֤י מְאֹרֹת֙ בִּרְקִ֣יעַ הַשָּׁמַ֔יִם יְהִ֤י יְהִ֤י מְאֹרֹת֙ מְאֹרֹת֙
2Genesis 3:5וְנִפְקְח֖וּ עֵֽינֵיכֶ֑ם נִפְקְח֖וּ נִפְקְח֖וּ עֵֽינֵיכֶ֑ם עֵֽינֵיכֶ֑ם
3Genesis 7:22כֹּ֡ל מִכֹּ֛ל מֵֽתוּ׃ מֵֽתוּ׃ מֵֽתוּ׃ כֹּ֡ל כֹּ֡ל
4Genesis 18:32אוּלַ֛י יִמָּצְא֥וּן שָׁ֖ם עֲשָׂרָ֑ה יִמָּצְא֥וּן יִמָּצְא֥וּן עֲשָׂרָ֑ה עֲשָׂרָ֑ה
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.table(results, end=4)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "

result 1" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

clause ZYqX NA
phrase VP Pred
function=Pred
pdp=verblex=HJH[nu=sg
phrase NP Subj
function=Subj
pdp=subslex=M>WR/nu=pl
phrase PP Loca
function=Loca
pdp=preplex=B
pdp=subslex=RQJ</nu=sg
pdp=artlex=H
pdp=subslex=CMJM/nu=pl
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

clause WQtX Resu
phrase CP Conj
function=Conj
pdp=conjlex=W
phrase VP Pred
function=Pred
pdp=verblex=PQX[nu=pl
phrase NP Subj
function=Subj
pdp=subslex=<JN/nu=du
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 3" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

clause XQtl NA
phrase NP Subj
function=Subj
pdp=subslex=KL/nu=sg
clause XQtl NA
phrase PP Adju
function=Adju
pdp=preplex=MN
pdp=subslex=KL/nu=sg
clause XQtl NA
phrase VP Pred
function=Pred
pdp=verblex=MWT[nu=pl
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 4" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

clause xYqX NA
phrase AdvP Modi
function=Modi
pdp=advblex=>WLJ
phrase VP Pred
function=Pred
pdp=verblex=MY>[nu=pl
phrase AdvP Cmpl
function=Cmpl
pdp=advblex=CM
phrase NP Subj
function=Subj
pdp=subslex=<FRH=/nu=sg
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.show(results, condenseType=\"clause\", end=4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Edges\n", "\n", "Note that each *edge* feature in the dataset corresponds to three relational operators.\n", "For example, `mother` gives rise to the operators `-mother>`, `<mother-`, and `<mother>`.\n", "\n", "### Simple edges\n", "Here is an example: look for pairs of clauses of which one is the mother of the other.\n", "In our dataset, there is an *edge* between the two clauses, and this edge is coded in the feature `mother`.\n", "The following query shows how to use the `mother` edge information." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:00:06.688698Z", "start_time": "2018-05-24T08:00:05.864656Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.08s 13917 results\n" ] } ], "source": [ "query = \"\"\"\n", "clause\n", "-mother> clause\n", "\"\"\"\n", "results = A.search(query)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:00:06.688698Z", "start_time": "2018-05-24T08:00:05.864656Z" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
npclauseclause
1Genesis 1:4כִּי־טֹ֑וב וַיַּ֧רְא אֱלֹהִ֛ים אֶת־הָאֹ֖ור
2Genesis 1:10כִּי־טֹֽוב׃ וַיַּ֥רְא אֱלֹהִ֖ים
3Genesis 1:12כִּי־טֹֽוב׃ וַיַּ֥רְא אֱלֹהִ֖ים
4Genesis 1:14לְהַבְדִּ֕יל בֵּ֥ין הַיֹּ֖ום וּבֵ֣ין הַלָּ֑יְלָה יְהִ֤י מְאֹרֹת֙ בִּרְקִ֣יעַ הַשָּׁמַ֔יִם
5Genesis 1:15לְהָאִ֖יר עַל־הָאָ֑רֶץ וְהָי֤וּ לִמְאֹורֹת֙ בִּרְקִ֣יעַ הַשָּׁמַ֔יִם
6Genesis 1:17לְהָאִ֖יר עַל־הָאָֽרֶץ׃ וַיִּתֵּ֥ן אֹתָ֛ם אֱלֹהִ֖ים בִּרְקִ֣יעַ הַשָּׁמָ֑יִם
7Genesis 1:18וְלִמְשֹׁל֙ בַּיֹּ֣ום וּבַלַּ֔יְלָה לְהָאִ֖יר עַל־הָאָֽרֶץ׃
8Genesis 1:18וּֽלֲהַבְדִּ֔יל בֵּ֥ין הָאֹ֖ור וּבֵ֣ין הַחֹ֑שֶׁךְ וְלִמְשֹׁל֙ בַּיֹּ֣ום וּבַלַּ֔יְלָה
9Genesis 1:18כִּי־טֹֽוב׃ וַיַּ֥רְא אֱלֹהִ֖ים
10Genesis 1:21כִּי־טֹֽוב׃ וַיַּ֥רְא אֱלֹהִ֖ים
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.table(results, end=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The mother relation is not always between clause nodes. \n", "What if we are interested in all nodes between which the mother relation exists, irrespective\n", "of the type?\n", "\n", "Use the `.` in the query instead of `clause`. \n", "The `.` stands for: *any node type*." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:00:06.688698Z", "start_time": "2018-05-24T08:00:05.864656Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.75s 182269 results\n" ] } ], "source": [ "query = \"\"\"\n", ".\n", "-mother> .\n", "\"\"\"\n", "results = A.search(query)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
npclause_atom (+1)clause_atom (+1)
1Genesis 1:1אֵ֥ת הָאָֽרֶץ׃ אֵ֥ת הַשָּׁמַ֖יִם
2Genesis 1:2וְהָאָ֗רֶץ הָיְתָ֥ה תֹ֨הוּ֙ וָבֹ֔הוּ בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃
3Genesis 1:2בֹ֔הוּ תֹ֨הוּ֙
4Genesis 1:2וְחֹ֖שֶׁךְ עַל־פְּנֵ֣י תְהֹ֑ום וְהָאָ֗רֶץ הָיְתָ֥ה תֹ֨הוּ֙ וָבֹ֔הוּ
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.table(results, end=4, colorMap={1: \"salmon\", 2: \"cyan\"})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can show more of the edges.\n", "\n", "Let's highlight all edges in the result in yellow." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "

result 1" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

verse
sentence 1
sentence_atom 1
clause xQtX NA
clause_atom 0
mother•\n", "\n", " \n", "\n", "
phrase PP Time
phrase_atom PP NA
phrase VP Pred
phrase_atom VP NA
phrase NP Subj
phrase_atom NP NA
phrase PP Objc
phrase_atom PP NA
subphrase
mother•\n", "\n", "
subphrase
mother•\n", "\n", "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.show(\n", "    results,\n", "    end=1,\n", "    colorMap={1: \"salmon\", 2: \"cyan\"},\n", "    hiddenTypes={\"half_verse\"},\n", "    edgeHighlights=dict(mother={p: \"yellow\" for p in results}),\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we color the edges between subphrases orange, the edges between clause atoms green, and the other edges yellow." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "tags": [] }, "outputs": [], "source": [ "ehighlights = {p: \"yellow\" for p in results}\n", "\n", "for (f, t) in results:\n", "    fType = F.otype.v(f)\n", "    tType = F.otype.v(t)\n", "    ehighlights[(f, t)] = (\n", "        (\n", "            \"orange\"\n", "            if fType == \"subphrase\"\n", "            else \"lightgreen\"\n", "            if fType == \"clause_atom\"\n", "            else \"yellow\"\n", "        )\n", "        if fType == tType\n", "        else \"yellow\"\n", "    )" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "

result 1" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

verse
sentence 1
sentence_atom 1
clause xQtX NA
clause_atom 0
mother•\n", "\n", " \n", "\n", "
phrase PP Time
phrase_atom PP NA
phrase VP Pred
phrase_atom VP NA
phrase NP Subj
phrase_atom NP NA
phrase PP Objc
phrase_atom PP NA
subphrase
mother•\n", "\n", "
subphrase
mother•\n", "\n", "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.show(\n", " results,\n", " end=1,\n", " colorMap={1: \"salmon\", 2: \"cyan\"},\n", " hiddenTypes={\"half_verse\"},\n", " edgeHighlights=dict(mother=ehighlights),\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's have a look at result 2:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "

result 2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

verse
sentence 1
sentence_atom 1
clause xQtX NA
clause_atom 0
mother•\n", "\n", " \n", "\n", "
phrase PP Time
phrase_atom PP NA
phrase VP Pred
phrase_atom VP NA
phrase NP Subj
phrase_atom NP NA
phrase PP Objc
phrase_atom PP NA
subphrase
mother•\n", "\n", "
subphrase
mother•\n", "\n", "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
verse
sentence 2
sentence_atom 2
clause WXQt NA
clause_atom 422
mother•\n", "\n", "\n", "\n", "
phrase CP Conj
phrase_atom CP NA
phrase NP Subj
phrase_atom NP NA
phrase VP Pred
phrase_atom VP NA
phrase NP PreC
phrase_atom NP NA
subphrase
mother•\n", "\n", "
subphrase
mother•\n", "\n", "
sentence 3
sentence_atom 3
clause NmCl NA
clause_atom 402
mother•\n", "\n", "\n", "\n", "
phrase CP Conj
phrase_atom CP NA
phrase NP Subj
phrase_atom NP NA
phrase PP PreC
phrase_atom PP NA
subphrase
mother•\n", "\n", "
subphrase
mother•\n", "\n", "
sentence 4
sentence_atom 4
clause Ptcp NA
clause_atom 460
mother•\n", "\n", "
phrase CP Conj
phrase_atom CP NA
phrase NP Subj
phrase_atom NP NA
subphrase
mother•\n", "\n", "
subphrase
mother•\n", "\n", "
phrase VP PreC
phrase_atom VP NA
phrase PP Cmpl
phrase_atom PP NA
subphrase
mother•\n", "\n", "
subphrase
mother•\n", "\n", "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.show(\n", " results,\n", " start=2,\n", " end=2,\n", " colorMap={1: \"salmon\", 2: \"cyan\"},\n", " hiddenTypes={\"half_verse\"},\n", " edgeHighlights=dict(mother=ehighlights),\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What about those yellow edges in the subphrases above? Didn't we say that those should be orange?\n", "\n", "No, because they do not point to a subphrase, but to the word in the subphrase. To make that\n", "even more explicit, we show the node numbers:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "

result 2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

verse:1414389
sentence:1172308 1
sentence_atom:1236025 1
clause:427559 xQtX NA
clause_atom:515690 0
mother•\n", "515691\n", " \n", "515694\n", "
phrase:651573 PP Time
phrase_atom:904776 PP NA
phrase:651574 VP Pred
phrase_atom:904777 VP NA
phrase:651575 NP Subj
phrase_atom:904778 NP NA
phrase:651576 PP Objc
phrase_atom:904779 PP NA
subphrase:1300539
mother•\n", "1300540\n", "
subphrase:1300540
mother•\n", "1300539\n", "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
verse:1414390
sentence:1172309 2
sentence_atom:1236026 2
clause:427560 WXQt NA
clause_atom:515691 422
mother•\n", "515692\n", "\n", "515690\n", "
phrase:651577 CP Conj
phrase_atom:904780 CP NA
12 וְ
phrase:651578 NP Subj
phrase_atom:904781 NP NA
phrase:651579 VP Pred
phrase_atom:904782 VP NA
phrase:651580 NP PreC
phrase_atom:904783 NP NA
subphrase:1300541
mother•\n", "1300542\n", "
17 וָ
subphrase:1300542
mother•\n", "1300541\n", "
sentence:1172310 3
sentence_atom:1236027 3
clause:427561 NmCl NA
clause_atom:515692 402
mother•\n", "515693\n", "\n", "515691\n", "
phrase:651581 CP Conj
phrase_atom:904784 CP NA
19 וְ
phrase:651582 NP Subj
phrase_atom:904785 NP NA
phrase:651583 PP PreC
phrase_atom:904786 PP NA
subphrase:1300543
mother•\n", "1300544\n", "
subphrase:1300544
mother•\n", "22\n", "
sentence:1172311 4
sentence_atom:1236028 4
clause:427562 Ptcp NA
clause_atom:515693 460
mother•\n", "515692\n", "
phrase:651584 CP Conj
phrase_atom:904787 CP NA
24 וְ
phrase:651585 NP Subj
phrase_atom:904788 NP NA
subphrase:1300545
mother•\n", "1300546\n", "
subphrase:1300546
mother•\n", "25\n", "
phrase:651586 VP PreC
phrase_atom:904789 VP NA
phrase:651587 PP Cmpl
phrase_atom:904790 PP NA
subphrase:1300547
mother•\n", "1300548\n", "
subphrase:1300548
mother•\n", "29\n", "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.show(\n", " results,\n", " start=2,\n", " end=2,\n", " colorMap={1: \"salmon\", 2: \"cyan\"},\n", " withNodes=True,\n", " hiddenTypes={\"half_verse\"},\n", " edgeHighlights=dict(mother=ehighlights),\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A clause and its mother do not have to be in the same verse.\n", "We are going to fetch the cases where they are in different verses.\n", "\n", "Note that we need a more flexible syntax here, where we specify a few templates, give names\n", "to a few positions in the template, and then constrain those positions\n", "by stipulating relationships between them.\n", "\n", "> **Caution**\n", "Referring to verses is not as innocent as it seems.\n", "That will be addressed in [gaps](searchGaps.ipynb)." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:00:11.096751Z", "start_time": "2018-05-24T08:00:10.585477Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.13s 710 results\n" ] } ], "source": [ "query = \"\"\"\n", "v1:verse\n", " c1:clause\n", "v2:verse\n", " c2:clause\n", "\n", "c1 -mother> c2\n", "v1 # v2\n", "\"\"\"\n", "results = A.search(query)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:00:11.096751Z", "start_time": "2018-05-24T08:00:10.585477Z" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "
npverseclauseverseclause
1Genesis 1:18וְלִמְשֹׁל֙ בַּיֹּ֣ום וּבַלַּ֔יְלָה לְהָאִ֖יר עַל־הָאָֽרֶץ׃
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.table(results, end=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to see the different verse references in the table.\n", "\n", "We can skip the verse columns first:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:00:11.096751Z", "start_time": "2018-05-24T08:00:10.585477Z" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "
npverseclauseverseclause
1Genesis 1:18וְלִמְשֹׁל֙ בַּיֹּ֣ום וּבַלַּ֔יְלָה לְהָאִ֖יר עַל־הָאָֽרֶץ׃
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.table(results, end=1, skipCols=\"1 3\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and then specify that the remaining columns (the clauses) show the passage:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:00:11.096751Z", "start_time": "2018-05-24T08:00:10.585477Z" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
nverseclauseverseclause
1Genesis 1:18  וְלִמְשֹׁל֙ בַּיֹּ֣ום וּבַלַּ֔יְלָה לְהָאִ֖יר עַל־הָאָֽרֶץ׃
2Genesis 2:7  וַיִּיצֶר֩ יְהוָ֨ה אֱלֹהִ֜ים אֶת־הָֽאָדָ֗ם עָפָר֙ מִן־הָ֣אֲדָמָ֔ה בְּיֹ֗ום
3Genesis 7:3  לְחַיֹּ֥ות זֶ֖רַע עַל־פְּנֵ֥י כָל־הָאָֽרֶץ׃ מִכֹּ֣ל׀ הַבְּהֵמָ֣ה הַטְּהֹורָ֗ה תִּֽקַּח־לְךָ֛ שִׁבְעָ֥ה שִׁבְעָ֖ה אִ֣ישׁ וְאִשְׁתֹּ֑ו
4Genesis 22:17  כִּֽי־בָרֵ֣ךְ אֲבָרֶכְךָ֗ כִּ֗י
5Genesis 24:44  הִ֣וא הָֽאִשָּׁ֔ה הָֽעַלְמָה֙
6Genesis 27:45  עַד־שׁ֨וּב אַף־אָחִ֜יךָ מִמְּךָ֗ עַ֥ד אֲשֶׁר־תָּשׁ֖וּב חֲמַ֥ת אָחִֽיךָ׃
7Genesis 36:16  אַלּֽוּף־קֹ֛רַח אַלּ֥וּף גַּעְתָּ֖ם אַלּ֣וּף עֲמָלֵ֑ק בְּנֵ֤י אֱלִיפַז֙ בְּכֹ֣ור עֵשָׂ֔ו אַלּ֤וּף תֵּימָן֙ אַלּ֣וּף אֹומָ֔ר אַלּ֥וּף צְפֹ֖ו אַלּ֥וּף קְנַֽז׃
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.table(results, end=7, skipCols=\"1 3\", withPassage=\"1 2\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Edges with values\n", "\n", "There are also edge features that somehow *qualify* the relation between nodes they specify.\n", "\n", "The edge feature `crossref` in the\n", "[parallels](https://github.com/ETCBC/parallels)\n", "module specifies a relationship between verses: they are *parallel* if they are similar.\n", "But `crossref` also tells you how similar, in the form of a number that is the percentage of similarity\n", "according to the measure used by the algorithm to detect the parallels.\n", "\n", "This number is called the *value* of the `crossref` edge.\n", "In our search templates we make use of the *values* of edge features.\n", "\n", "Not all edge features provide values. `mother` does not. But `crossref` does.\n", "\n", "Here is how many cross-references we have. The `crossref` edge feature is symmetric: if `v` is parallel to `w`, `w` is parallel to `v`. 
So in our query we stipulate that `v` comes before `w`:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:00:35.648065Z", "start_time": "2018-05-24T08:00:35.276033Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.06s 15871 results\n" ] } ], "source": [ "query = \"\"\"\n", "v:verse\n", "-crossref> w:verse\n", "v < w\n", "\"\"\"\n", "results = A.search(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We get a quick overview of the similarity distribution of parallels by means of `freqList()`:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:00:38.315507Z", "start_time": "2018-05-24T08:00:38.291652Z" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "((100, 8456),\n", " (80, 7796),\n", " (84, 2874),\n", " (86, 2328),\n", " (76, 1274),\n", " (77, 1220),\n", " (78, 1170),\n", " (79, 844),\n", " (81, 844),\n", " (75, 836),\n", " (83, 754),\n", " (88, 730),\n", " (82, 720),\n", " (92, 250),\n", " (85, 248),\n", " (90, 240),\n", " (91, 216),\n", " (94, 160),\n", " (87, 148),\n", " (95, 148),\n", " (89, 142),\n", " (96, 90),\n", " (93, 88),\n", " (98, 76),\n", " (99, 58),\n", " (97, 32))" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "E.crossref.freqList()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want the cases with a high similarity, we can say:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:00:40.831880Z", "start_time": "2018-05-24T08:00:40.657543Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.04s 4356 results\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
nverseverse
1Genesis 10:2  1_Chronicles 1:5  
2Genesis 10:6  1_Chronicles 1:8  
3Genesis 10:7  1_Chronicles 1:9  
4Genesis 10:8  1_Chronicles 1:10  
5Genesis 10:13  1_Chronicles 1:11  
6Genesis 10:14  1_Chronicles 1:12  
7Genesis 10:15  1_Chronicles 1:13  
8Genesis 10:16  1_Chronicles 1:14  
9Genesis 10:17  1_Chronicles 1:15  
10Genesis 10:24  1_Chronicles 1:18  
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "query = \"\"\"\n", "v:verse\n", "-crossref>95> w:verse\n", "v < w\n", "\"\"\"\n", "results = A.search(query)\n", "A.table(results, end=10, withPassage=\"1 2\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also see the verses written out:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
nverseverse
1Genesis 10:2  בְּנֵ֣י יֶ֔פֶת גֹּ֣מֶר וּמָגֹ֔וג וּמָדַ֖י וְיָוָ֣ן וְתֻבָ֑ל וּמֶ֖שֶׁךְ וְתִירָֽס׃ 1_Chronicles 1:5  בְּנֵ֣י יֶ֔פֶת גֹּ֣מֶר וּמָגֹ֔וג וּמָדַ֖י וְיָוָ֣ן וְתֻבָ֑ל וּמֶ֖שֶׁךְ וְתִירָֽס׃ ס
2Genesis 10:6  וּבְנֵ֖י חָ֑ם כּ֥וּשׁ וּמִצְרַ֖יִם וּפ֥וּט וּכְנָֽעַן׃ 1_Chronicles 1:8  בְּנֵ֖י חָ֑ם כּ֥וּשׁ וּמִצְרַ֖יִם פּ֥וּט וּכְנָֽעַן׃
3Genesis 10:7  וּבְנֵ֣י כ֔וּשׁ סְבָא֙ וַֽחֲוִילָ֔ה וְסַבְתָּ֥ה וְרַעְמָ֖ה וְסַבְתְּכָ֑א וּבְנֵ֥י רַעְמָ֖ה שְׁבָ֥א וּדְדָֽן׃ 1_Chronicles 1:9  וּבְנֵ֣י כ֔וּשׁ סְבָא֙ וַחֲוִילָ֔ה וְסַבְתָּ֥א וְרַעְמָ֖א וְסַבְתְּכָ֑א וּבְנֵ֥י רַעְמָ֖א שְׁבָ֥א וּדְדָֽן׃ ס
4Genesis 10:8  וְכ֖וּשׁ יָלַ֣ד אֶת־נִמְרֹ֑ד ה֣וּא הֵחֵ֔ל לִֽהְיֹ֥ות גִּבֹּ֖ר בָּאָֽרֶץ׃ 1_Chronicles 1:10  וְכ֖וּשׁ יָלַ֣ד אֶת־נִמְרֹ֑וד ה֣וּא הֵחֵ֔ל לִהְיֹ֥ות גִּבֹּ֖ור בָּאָֽרֶץ׃ ס
5Genesis 10:13  וּמִצְרַ֡יִם יָלַ֞ד אֶת־לוּדִ֧ים וְאֶת־עֲנָמִ֛ים וְאֶת־לְהָבִ֖ים וְאֶת־נַפְתֻּחִֽים׃ 1_Chronicles 1:11  וּמִצְרַ֡יִם יָלַ֞ד אֶת־לוּדִ֧ים וְאֶת־עֲנָמִ֛ים וְאֶת־לְהָבִ֖ים וְאֶת־נַפְתֻּחִֽים׃
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.table(results, end=5, withPassage=\"1 2\", full=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want to inspect the cases with a lower similarity:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:00:43.547559Z", "start_time": "2018-05-24T08:00:43.379437Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.03s 2672 results\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "
nverseverse
1Genesis 1:15  וְהָי֤וּ לִמְאֹורֹת֙ בִּרְקִ֣יעַ הַשָּׁמַ֔יִם לְהָאִ֖יר עַל־הָאָ֑רֶץ וַֽיְהִי־כֵֽן׃ Genesis 1:17  וַיִּתֵּ֥ן אֹתָ֛ם אֱלֹהִ֖ים בִּרְקִ֣יעַ הַשָּׁמָ֑יִם לְהָאִ֖יר עַל־הָאָֽרֶץ׃
2Genesis 5:4  וַיִּֽהְי֣וּ יְמֵי־אָדָ֗ם אַֽחֲרֵי֙ הֹולִידֹ֣ו אֶת־שֵׁ֔ת שְׁמֹנֶ֥ה מֵאֹ֖ת שָׁנָ֑ה וַיֹּ֥ולֶד בָּנִ֖ים וּבָנֹֽות׃ Genesis 5:7  וַֽיְחִי־שֵׁ֗ת אַֽחֲרֵי֙ הֹולִידֹ֣ו אֶת־אֱנֹ֔ושׁ שֶׁ֣בַע שָׁנִ֔ים וּשְׁמֹנֶ֥ה מֵאֹ֖ות שָׁנָ֑ה וַיֹּ֥ולֶד בָּנִ֖ים וּבָנֹֽות׃
3Genesis 5:4  וַיִּֽהְי֣וּ יְמֵי־אָדָ֗ם אַֽחֲרֵי֙ הֹולִידֹ֣ו אֶת־שֵׁ֔ת שְׁמֹנֶ֥ה מֵאֹ֖ת שָׁנָ֑ה וַיֹּ֥ולֶד בָּנִ֖ים וּבָנֹֽות׃ Genesis 5:13  וַיְחִ֣י קֵינָ֗ן אַחֲרֵי֙ הֹולִידֹ֣ו אֶת־מַֽהֲלַלְאֵ֔ל אַרְבָּעִ֣ים שָׁנָ֔ה וּשְׁמֹנֶ֥ה מֵאֹ֖ות שָׁנָ֑ה וַיֹּ֥ולֶד בָּנִ֖ים וּבָנֹֽות׃
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "query = \"\"\"\n", "v:verse\n", "-crossref<80> w:verse\n", "v < w\n", "\"\"\"\n", "results = A.search(query)\n", "A.table(results, end=3, withPassage=\"1 2\", full=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This shows how all features in your data can be queried in search templates, even the features that give values\n", "to edges." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Feature conditions\n", "\n", "So far we have seen feature conditions in templates of this form:\n", "\n", "```\n", "node feature=value\n", "```\n", "\n", "But there is more." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Trivially true\n", "\n", "You can say\n", "\n", "```\n", "node feature*\n", "```\n", "\n", "which selects all nodes, irrespective of the existence or value of `feature`.\n", "\n", "This is a useless criterion in the sense that it does not influence the set of results.\n", "\n", "But when some applications run queries for you, they might use the features mentioned in your query\n", "to decorate the results retrieved.\n", "\n", "This is your way to tell such applications that you want the values of `feature` included in your results.\n", "\n", "The Text-Fabric browser looks at the features when it exports your results to CSV."
] }, { "cell_type": "code", "execution_count": 35, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.35s 426590 results\n", "426590\n", " 0.34s 426590 results\n", "426590\n" ] } ], "source": [ "query1 = \"\"\"\n", "word vt*\n", "\"\"\"\n", "\n", "query2 = \"\"\"\n", "word\n", "\"\"\"\n", "\n", "results = A.search(query1)\n", "print(len(results))\n", "\n", "results = A.search(query2)\n", "print(len(results))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inequality\n", "\n", "You can also say\n", "\n", "```\n", "node feature#value\n", "```\n", "which selects nodes where the feature does not have `value`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multiple values\n", "\n", "When stating a feature condition, such as `chapter=1`,\n", "you may also specify a list of alternative values:\n", "\n", "```\n", " chapter=1|2|3\n", "```\n", "\n", "You may list as many values as you wish, for every feature.\n", "\n", "It also works with inequalities:\n", "\n", "```\n", " chapter#1|2|3\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's find all verbally inflected words (verbs that are not infinitives or participles) that are\n", "not in the qal, not in the third person, not in the singular,\n", "and not in the masculine." ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.35s 271 results\n" ] } ], "source": [ "query = \"\"\"\n", "word sp=verb vs#qal vt#infc|infa|ptca|ptcp ps#p3 nu#sg gn#m\n", "\"\"\"\n", "\n", "A.displaySetup(extraFeatures=\"vt ps nu gn\")\n", "results = A.search(query, shallow=True)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
vt=impvgn=fnu=plps=p2
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
vt=impfnu=plps=p1
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
vt=impfnu=plps=p1
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
vt=impfnu=plps=p1
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
vt=impfnu=plps=p1
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for r in sorted(results)[0:5]:\n", " A.pretty(r)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "tags": [] }, "outputs": [], "source": [ "A.displayReset(\"extraFeatures\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Existence of values\n", "\n", "If you are not interested in the particular value of a feature,\n", "but only in whether there is a value or not, you can express that.\n", "\n", "### Qere\n", "\n", "We can ask for all words that have a qere.\n", "Just leave out the `=value` part.\n", "\n", "```\n", "word qere\n", "```\n", "\n", "Conversely, we can ask for words without a qere.\n", "Just add a `#` after the feature name.\n", "\n", "```\n", "word qere#\n", "```\n", "\n", "Let's test it." ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:00:49.932231Z", "start_time": "2018-05-24T08:00:48.725647Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Words in total:\n", " 0.25s 426590 results\n", "Words with a qere:\n", " 0.12s 1892 results\n", "Words without a qere:\n", " 0.30s 424698 results\n", "qereWords + plainWords == allWords ? True\n" ] } ], "source": [ "query = \"\"\"\n", "word\n", "\"\"\"\n", "print(\"Words in total:\")\n", "results = A.search(query)\n", "allWords = len(results)\n", "\n", "print(\"Words with a qere:\")\n", "query = \"\"\"\n", "word qere\n", "\"\"\"\n", "results = A.search(query)\n", "qereWords = len(results)\n", "\n", "print(\"Words without a qere:\")\n", "query = \"\"\"\n", "word qere#\n", "\"\"\"\n", "results = A.search(query)\n", "plainWords = len(results)\n", "\n", "print(f\"qereWords + plainWords == allWords ? 
{qereWords + plainWords == allWords}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Boundaries\n", "\n", "For features with *numerical* values, we may ask for values higher or lower than a given value.\n", "\n", "The\n", "[dist](https://etcbc.github.io/bhsa/features/hebrew/2017/dist.html)\n", "feature gives the distance between an object and its mother.\n", "\n", "We want to see its values by means of `freqList()`, but the feature is not yet loaded.\n", "Let's run a query that uses it; after that, the feature is loaded." ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:00:55.805149Z", "start_time": "2018-05-24T08:00:55.469647Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 1.67s 598 results\n" ] } ], "source": [ "query = \"\"\"\n", "clause dist=1\n", "\"\"\"\n", "results = A.search(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can explore the frequencies:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:00:58.342986Z", "start_time": "2018-05-24T08:00:57.929824Z" }, "scrolled": true, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "((0, 631151),\n", " (-1, 104911),\n", " (-2, 38188),\n", " (-3, 14986),\n", " (-4, 7665),\n", " (-5, 3657),\n", " (-6, 2145),\n", " (1, 1773),\n", " (-7, 1380),\n", " (-8, 918))" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.dist.freqList()[0:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us say we are interested in clauses only. 
The feature `dist` is defined for multiple node types.\n", "We can pass a set of node types to `freqList()` in order to get the frequencies restricted to those types:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:01:00.822906Z", "start_time": "2018-05-24T08:01:00.224369Z" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "((0, 67340),\n", " (-1, 11593),\n", " (-2, 3265),\n", " (-3, 2437),\n", " (-4, 1384),\n", " (-5, 668),\n", " (1, 598),\n", " (-6, 329),\n", " (-7, 167),\n", " (-8, 70))" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "F.dist.freqList({\"clause\"})[0:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are negative distances. In those cases the mother precedes the daughter. Let's get the clauses whose mothers\n", "precede them by a large amount." ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:01:05.718152Z", "start_time": "2018-05-24T08:01:05.541047Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.03s 86 results\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
npclause
1Genesis 25:12אֲשֶׁ֨ר יָלְדָ֜ה הָגָ֧ר הַמִּצְרִ֛ית שִׁפְחַ֥ת שָׂרָ֖ה לְאַבְרָהָֽם׃
2Genesis 30:33אֲשֶׁר־אֵינֶנּוּ֩ נָקֹ֨ד וְטָל֜וּא בָּֽעִזִּ֗ים וְחוּם֙ בַּכְּשָׂבִ֔ים
3Genesis 49:11אֹסְרִ֤י לַגֶּ֨פֶן֙ עִירֹ֔ו
4Genesis 50:13אֲשֶׁ֣ר קָנָה֩ אַבְרָהָ֨ם אֶת־הַשָּׂדֶ֜ה לַאֲחֻזַּת־קֶ֗בֶר מֵאֵ֛ת עֶפְרֹ֥ן הַחִתִּ֖י
5Exodus 18:8אֲשֶׁ֨ר עָשָׂ֤ה יְהוָה֙ לְפַרְעֹ֣ה וּלְמִצְרַ֔יִם עַ֖ל אֹודֹ֣ת יִשְׂרָאֵ֑ל
6Exodus 25:9אֲשֶׁ֤ר אֲנִי֙ מַרְאֶ֣ה אֹותְךָ֔ אֵ֚ת תַּבְנִ֣ית הַמִּשְׁכָּ֔ן וְאֵ֖ת תַּבְנִ֣ית כָּל־כֵּלָ֑יו
7Exodus 38:26הָעֹבֵ֜ר עַל־הַפְּקֻדִ֗ים מִבֶּ֨ן עֶשְׂרִ֤ים שָׁנָה֙ וָמַ֔עְלָה
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "query = \"\"\"\n", "clause dist<-10\n", "\"\"\"\n", "results = A.search(query)\n", "A.table(sorted(results), end=7)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Regular expressions\n", "\n", "An even more powerful way of specifying desired feature values is by regular expressions.\n", "You can do this for *string-valued* features only.\n", "\n", "Instead of specifying a feature condition like this\n", "\n", "```\n", "typ=WIm0\n", "```\n", "\n", "or\n", "\n", "```\n", "typ=WIm0|WImX\n", "```\n", "\n", "you can say\n", "\n", "```\n", "typ~WIm[0X]\n", "```\n", "\n", "Note that you do not use the `=` between feature name and value specification,\n", "but `~`.\n", "\n", "The syntax and semantics of regular expressions are those defined in the\n", "[Python docs](https://docs.python.org/3/library/re.html#regular-expression-syntax).\n", "\n", "Note that if you need to enter a `\\` in the regular expression, you have to double it.\n", "Also, when you need a space in it, you have to put a `\\` in front of it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### No value no match\n", "\n", "If you search with regular expressions, then nodes without a value do not match any regular expression.\n", "\n", "The regular expression `.*` matches every value, but a node must have a value in order to match.\n", "\n", "#### Qere\n", "\n", "Not all words have a qere.\n", "\n", "So we expect the following template to list all words that do have a qere and none of those that don't."
] }, { "cell_type": "code", "execution_count": 44, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:01:11.104476Z", "start_time": "2018-05-24T08:01:10.518168Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.14s 1892 results\n", "Compare this with qere words: 1892: Equal\n" ] } ], "source": [ "query = \"\"\"\n", "word qere~.*\n", "\"\"\"\n", "results = list(A.search(query))\n", "matchWords = len(results)\n", "print(\n", " \"Compare this with qere words: \"\n", " f'{qereWords}: {\"Equal\" if matchWords == qereWords else \"Unequal\"}'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### More examples\n", "\n", "#### Two letter nouns\n", "\n", "We pick two letter nouns that start with an aleph." ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:01:13.922452Z", "start_time": "2018-05-24T08:01:13.089321Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.21s 816 results\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
npword
1Genesis 2:6אֵ֖ד
2Genesis 3:20אֵ֥ם
3Genesis 14:18אֵ֥ל
4Genesis 14:19אֵ֣ל
5Genesis 14:20אֵ֣ל
6Genesis 14:22אֵ֣ל
7Genesis 15:17אֵ֔שׁ
8Genesis 16:13אֵ֣ל
9Genesis 17:1אֵ֣ל
10Genesis 17:4אַ֖ב
11Genesis 17:5אַב־
12Genesis 19:24אֵ֑שׁ
13Genesis 21:33אֵ֥ל
14Genesis 22:6אֵ֖שׁ
15Genesis 22:7אֵשׁ֙
16Genesis 24:29אָ֖ח
17Genesis 27:45אַף־
18Genesis 28:3אֵ֤ל
19Genesis 28:5אֵ֥ם
20Genesis 30:2אַ֥ף
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "query = \"\"\"\n", "word sp=subs g_cons~^>.$\n", "\"\"\"\n", "results = A.search(query, sort=True)\n", "A.table(results, end=20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us zoom in on one of the results.\n", "We want to know more about the lexeme in question.\n", "\n", "There are several methods to do that." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Show the nodes\n", "\n", "First of all, let us show the nodes." ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:01:17.632676Z", "start_time": "2018-05-24T08:01:17.624599Z" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "
npword
20Genesis 30:215621 אַ֥ף
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.table(results, start=20, end=20, withNodes=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can use `pretty()` to get more info." ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:01:20.072171Z", "start_time": "2018-05-24T08:01:20.065240Z" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
g_cons=>Psp=subs
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.pretty(results[19][0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that under the word is a link to its lexeme entry in SHEBANQ.\n", "\n", "##### Programmatically\n", "With a bit of TF juggling you could also have got this link programmatically:" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "tags": [] }, "outputs": [], "source": [ "lx = L.u(results[19][0], otype=\"lex\")[0]" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "אַף" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.webLink(lx)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Enrich the query\n", "\n", "We can also add some context to the query.\n", "Since we are interested in the lexemes, let's add those to the query.\n", "\n", "Every word lies embedded in a lexeme." ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:01:27.629901Z", "start_time": "2018-05-24T08:01:26.793939Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.20s 816 results\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
nplexword
1Exodus 4:8אֹותאֹ֣ת
2Exodus 4:8אֹותאֹ֥ת
3Exodus 8:19אֹותאֹ֥ת
4Exodus 12:13אֹותאֹ֗ת
5Genesis 2:6אֵדאֵ֖ד
6Genesis 27:45אַףאַף־
7Genesis 30:2אַףאַ֥ף
8Exodus 4:14אַףאַ֨ף
9Exodus 11:8אַףאָֽף׃ ס
10Exodus 32:19אַףאַ֣ף
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "query = \"\"\"\n", "lex\n", " word sp=subs g_cons~^>.$\n", "\"\"\"\n", "results = A.search(query)\n", "A.table(results, end=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Same number of results, but the order is different.\n", "We just use Python to get the lexemes only, together with their first occurrence.\n", "We make a list of tuples, and feed that to `A.table()`." ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:01:38.168240Z", "start_time": "2018-05-24T08:01:38.158934Z" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
nplexword
1Exodus 4:8אֹותאֹ֣ת
2Genesis 2:6אֵדאֵ֖ד
3Genesis 27:45אַףאַף־
4Genesis 17:4אָבאַ֖ב
5Genesis 3:20אֵםאֵ֥ם
6Genesis 24:29אָחאָ֖ח
7Isaiah 20:6אִיאִ֣י
8Genesis 14:18אֵלאֵ֥ל
9Genesis 15:17אֵשׁאֵ֔שׁ
10Genesis 31:29אֵלאֵ֣ל
112_Samuel 18:5אַטאַט־
122_Samuel 14:19אִשׁאִ֣שׁ׀
13Ezekiel 40:48אַיִלאֵ֣ל
14Jeremiah 36:22אָחאָ֖ח
15Job 24:25אַלאַ֗ל
16Ezra 5:8אָעאָ֖ע
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lexemes = set()\n", "lexResults = []\n", "for (lex, word) in results:\n", " if lex not in lexemes:\n", " lexemes.add(lex)\n", " lexResults.append((lex, word))\n", "A.table(lexResults)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Observe how you can use a query to get an interesting node set,\n", "which you can then massage using standard Python machinery,\n", "after which you can display the results prettily with `A.table()` or `A.show()`.\n", "\n", "**The take-away lesson is: you can use `A.table()` and `A.show()` on arbitrary iterables of tuples of nodes,\n", "whether or not they come from an executed query.**\n", "\n", "The headers of the tables are taken from the node types of all tuples, but it shows the most\n", "frequent one only. \n", "If there are more types in the same column, it will be indicated, and if you hover over the `(+1)` you see which\n", "types are also present." ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:01:41.297750Z", "start_time": "2018-05-24T08:01:41.291652Z" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
npphrase_atom (+1)phrase_atom (+1)
1Genesis 1:1בְּתֹּ֕אמֶר
2Genesis 1:1בִּי־רֵאשִׁ֖ית
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tuples = (\n", " (1, 1000000),\n", " (1000001, 2),\n", ")\n", "A.table(tuples)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Also `A.show()` makes perfect sense in this case." ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:01:43.690920Z", "start_time": "2018-05-24T08:01:43.667562Z" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "

clause 1" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

clause xQtX NA
phrase PP Time
g_cons=Bsp=prep
g_cons=R>CJTsp=subs
phrase VP Pred
g_cons=BR>sp=verb
phrase NP Subj
g_cons=>LHJMsp=subs
phrase PP Objc
g_cons=>Tsp=prep
g_cons=Hsp=art
g_cons=CMJMsp=subs
g_cons=Wsp=conj
g_cons=>Tsp=prep
g_cons=Hsp=art
g_cons=>RYsp=subs
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

clause 2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

clause Way0 NA
phrase CP Conj
g_cons=Wsp=conj
phrase VP Pred
g_cons=T>MRsp=verb
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

clause 3" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

clause NmCl NA
phrase PP PreC
g_cons=BJsp=prep
g_cons=>NJsp=prps
clause NmCl NA
phrase NP Subj
g_cons=Hsp=art
g_cons=<WNsp=subs
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.show(tuples, condensed=True, condenseType=\"clause\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Everything that is part of a result, we see properly highlighted, but we can not discern what belongs to result 1 and what to result 2.\n", "\n", "That becomes clear if we uncondense:" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:01:46.770065Z", "start_time": "2018-05-24T08:01:46.729601Z" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "

result 1" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

clause xQtX NA
phrase PP Time
g_cons=Bsp=prep
g_cons=R>CJTsp=subs
phrase VP Pred
g_cons=BR>sp=verb
phrase NP Subj
g_cons=>LHJMsp=subs
phrase PP Objc
g_cons=>Tsp=prep
g_cons=Hsp=art
g_cons=CMJMsp=subs
g_cons=Wsp=conj
g_cons=>Tsp=prep
g_cons=Hsp=art
g_cons=>RYsp=subs
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
clause Way0 NA
phrase CP Conj
g_cons=Wsp=conj
phrase VP Pred
g_cons=T>MRsp=verb
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

clause xQtX NA
phrase PP Time
g_cons=Bsp=prep
g_cons=R>CJTsp=subs
phrase VP Pred
g_cons=BR>sp=verb
phrase NP Subj
g_cons=>LHJMsp=subs
phrase PP Objc
g_cons=>Tsp=prep
g_cons=Hsp=art
g_cons=CMJMsp=subs
g_cons=Wsp=conj
g_cons=>Tsp=prep
g_cons=Hsp=art
g_cons=>RYsp=subs
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
clause NmCl NA
phrase PP PreC
g_cons=BJsp=prep
g_cons=>NJsp=prps
clause NmCl NA
phrase NP Subj
g_cons=Hsp=art
g_cons=<WNsp=subs
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.show(tuples, condensed=False, condenseType=\"clause\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### we-x clauses with a non-qal verb\n", "\n", "If you look at the [clause types](https://etcbc.github.io/bhsa/features/hebrew/2017/typ.html)\n", "you see a lot of types indicating that the clause starts with `we`:\n", "\n", "```\n", "Way0\tWayyiqtol-null clause\n", "WayX\tWayyiqtol-X clause\n", "WIm0\tWe-imperative-null clause\n", "WImX\tWe-imperative-X clause\n", "WQt0\tWe-qatal-null clause\n", "WQtX\tWe-qatal-X clause\n", "WxI0\tWe-x-imperative-null clause\n", "WXIm\tWe-X-imperative clause\n", "WxIX\tWe-x-imperative-X clause\n", "WxQ0\tWe-x-qatal-null clause\n", "WXQt\tWe-X-qatal clause\n", "WxQX\tWe-x-qatal-X clause\n", "WxY0\tWe-x-yiqtol-null clause\n", "WXYq\tWe-X-yiqtol clause\n", "WxYX\tWe-x-yiqtol-X clause\n", "WYq0\tWe-yiqtol-null clause\n", "WYqX\tWe-yiqtol-X clause\n", "```\n", "\n", "We are interested in the `We-x` and `We-X` clauses, so all clauses whose `typ` starts with `Wx` or `WX`.\n", "\n", "There are quite a number of verb stems. By means of a regular expression we can pick everything except `qal`.\n", "\n", "In the\n", "[Python docs on regular expressions](https://docs.python.org/3/library/re.html#regular-expression-syntax)\n", "we see that we can check for that by `^(?:!qal)`." ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:01:53.494281Z", "start_time": "2018-05-24T08:01:52.486679Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.24s 3098 results\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
npclauseword
1Genesis 1:20וְעֹוף֙ יְעֹופֵ֣ף עַל־הָאָ֔רֶץ עַל־פְּנֵ֖י רְקִ֥יעַ הַשָּׁמָֽיִם׃ יְעֹופֵ֣ף
2Genesis 2:10וּמִשָּׁם֙ יִפָּרֵ֔ד יִפָּרֵ֔ד
3Genesis 2:25וְלֹ֖א יִתְבֹּשָֽׁשׁוּ׃ יִתְבֹּשָֽׁשׁוּ׃
4Genesis 3:18וְקֹ֥וץ וְדַרְדַּ֖ר תַּצְמִ֣יחַֽ לָ֑ךְ תַּצְמִ֣יחַֽ
5Genesis 4:4וְהֶ֨בֶל הֵבִ֥יא גַם־ה֛וּא מִבְּכֹרֹ֥ות צֹאנֹ֖ו וּמֵֽחֶלְבֵהֶ֑ן הֵבִ֥יא
6Genesis 4:7וְאִם֙ לֹ֣א תֵיטִ֔יב תֵיטִ֔יב
7Genesis 4:14וּמִפָּנֶ֖יךָ אֶסָּתֵ֑ר אֶסָּתֵ֑ר
8Genesis 4:26וּלְשֵׁ֤ת גַּם־הוּא֙ יֻלַּד־בֵּ֔ן יֻלַּד־
9Genesis 6:1וּבָנֹ֖ות יֻלְּד֥וּ לָהֶֽם׃ יֻלְּד֥וּ
10Genesis 6:12וְהִנֵּ֣ה נִשְׁחָ֑תָה נִשְׁחָ֑תָה
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "query = \"\"\"\n", "clause typ~^W[xX]\n", " word sp=verb vs#qal\n", "\"\"\"\n", "results = list(A.search(query))\n", "A.table(results, end=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Find all glosses with a space" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:02:00.113562Z", "start_time": "2018-05-24T08:02:00.028266Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.01s 406 results\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
nplex
1תְּהֹוםתְּהֹום
2תַּחַתתַּחַת
3יַבָּשָׁהיַבָּשָׁה
4דֶּשֶׁאדֶּשֶׁא
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "query = r\"\"\"\n", "lex gloss~[\\ ] sp=subs\n", "\"\"\"\n", "results = list(A.search(query))\n", "A.table(results, start=1, end=4)" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "

result 1" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

lex תְּהֹום
primeval oceansp=subs
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

lex תַּחַת
under partsp=subs
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 3" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

lex יַבָּשָׁה
dry landsp=subs
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 4" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

lex דֶּשֶׁא
young grasssp=subs
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.show(results, condensed=False, start=1, end=4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Custom sets\n", "\n", "Eventually you reach cases where search templates are just not up to it.\n", "\n", "Examples:\n", "\n", "* What if you want to restrict a search to sentences that do not contain infrequent words?\n", "* It is fairly tricky to look for gapped phrases. What if you look for complex patterns, but only in\n", " gapped phrases?\n", "\n", "Before you dive head over heels into hand coding, here is an intermediate solution.\n", "You can create node sets by means of search, and then use those node sets in other search templates\n", "at the places where you have node types." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can make custom sets with arbitrary nodes, not all of the same type.\n", "Let's collect all non-word, non-lex nodes that contain fairly frequent words only.\n", "We also collect a set of nodes that contain highly infrequent words.\n", "\n", "There is a feature for that, [`rank_lex`](https://etcbc.github.io/bhsa/features/hebrew/2017/rank_lex.html).\n", "Since we have not loaded it, we do so now." ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:05:53.690573Z", "start_time": "2018-05-24T08:05:53.592190Z" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "TF.load(\"rank_lex\", add=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We set a threshold `COMMON_RANK`, and pick all objects with only high ranking words, their ranks between 0 and `COMMON_RANK`.\n", "\n", "We set a threshold `RARE_RANK`, and pick all objects that contain at least one low ranking word, its rank higher than `RARE_RANK`." 
] }, { "cell_type": "code", "execution_count": 59, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:06:08.187551Z", "start_time": "2018-05-24T08:06:00.209985Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "669195 members in set frequent\n", "425320 members in set infrequent\n" ] } ], "source": [ "COMMON_RANK = 100\n", "RARE_RANK = 500\n", "\n", "frequent = set()\n", "infrequent = set()\n", "\n", "for n in N.walk():\n", "    nTp = F.otype.v(n)\n", "    # skip lexeme nodes\n", "    if nTp == \"lex\":\n", "        continue\n", "    if nTp == \"word\":\n", "        ranks = [F.rank_lex.v(n)]\n", "    else:\n", "        ranks = [F.rank_lex.v(w) for w in L.d(n, otype=\"word\")]\n", "    maxRank = max(ranks)\n", "    # frequent: all words of the node rank below COMMON_RANK\n", "    if maxRank < COMMON_RANK:\n", "        frequent.add(n)\n", "    # infrequent: at least one word of the node ranks above RARE_RANK\n", "    if maxRank > RARE_RANK:\n", "        infrequent.add(n)\n", "\n", "print(f\"{len(frequent):>6} members in set frequent\")\n", "print(f\"{len(infrequent):>6} members in set infrequent\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can do all kinds of searches within the domain of `frequent` and `infrequent` things.\n", "\n", "We give names to the sets and put them in a dictionary." ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:07:11.688552Z", "start_time": "2018-05-24T08:07:11.685127Z" }, "tags": [] }, "outputs": [], "source": [ "customSets = dict(\n", "    frequent=frequent,\n", "    infrequent=infrequent,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we pass the dictionary to `A.search()`, with a query that looks for sentences with a rare word that have a clause with only frequent words:" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.43s 4311 results\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
npsentenceclause
5Genesis 1:25וַיַּ֥רְא אֱלֹהִ֖ים כִּי־טֹֽוב׃ וַיַּ֥רְא אֱלֹהִ֖ים
6Genesis 1:29הִנֵּה֩ נָתַ֨תִּי לָכֶ֜ם אֶת־כָּל־עֵ֣שֶׂב׀ זֹרֵ֣עַ זֶ֗רַע אֲשֶׁר֙ עַל־פְּנֵ֣י כָל־הָאָ֔רֶץ וְאֶת־כָּל־הָעֵ֛ץ אֲשֶׁר־בֹּ֥ו פְרִי־עֵ֖ץ זֹרֵ֣עַ זָ֑רַע וּֽלְכָל־חַיַּ֣ת הָ֠אָרֶץ וּלְכָל־עֹ֨וף הַשָּׁמַ֜יִם וּלְכֹ֣ל׀ רֹומֵ֣שׂ עַל־הָאָ֗רֶץ אֲשֶׁר־בֹּו֙ נֶ֣פֶשׁ חַיָּ֔ה אֶת־כָּל־יֶ֥רֶק עֵ֖שֶׂב לְאָכְלָ֑ה אֲשֶׁר֙ עַל־פְּנֵ֣י כָל־הָאָ֔רֶץ
7Genesis 2:2וַיְכַ֤ל אֱלֹהִים֙ בַּיֹּ֣ום הַשְּׁבִיעִ֔י מְלַאכְתֹּ֖ו אֲשֶׁ֣ר עָשָׂ֑ה אֲשֶׁ֣ר עָשָׂ֑ה
8Genesis 2:2וַיִּשְׁבֹּת֙ בַּיֹּ֣ום הַשְּׁבִיעִ֔י מִכָּל־מְלַאכְתֹּ֖ו אֲשֶׁ֥ר עָשָֽׂה׃ אֲשֶׁ֥ר עָשָֽׂה׃
9Genesis 2:3כִּ֣י בֹ֤ו שָׁבַת֙ מִכָּל־מְלַאכְתֹּ֔ו אֲשֶׁר־בָּרָ֥א אֱלֹהִ֖ים לַעֲשֹֽׂות׃ פ לַעֲשֹֽׂות׃ פ
10Genesis 2:4בְּיֹ֗ום עֲשֹׂ֛ות יְהוָ֥ה אֱלֹהִ֖ים אֶ֥רֶץ וְשָׁמָֽיִם׃ וַיִּיצֶר֩ יְהוָ֨ה אֱלֹהִ֜ים אֶת־הָֽאָדָ֗ם עָפָר֙ מִן־הָ֣אֲדָמָ֔ה בְּיֹ֗ום
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "query = \"\"\"\n", "infrequent otype=sentence\n", " frequent otype=clause\n", "\"\"\"\n", "results = A.search(query, sets=customSets)\n", "A.table(results, start=5, end=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are going to show this really nice:\n", "\n", "* we add the feature `rank_lex` to the display\n", "* we suppress the other features\n", "* we color the rare words and the common words differently" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T08:07:22.498973Z", "start_time": "2018-05-24T08:07:22.065761Z" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "

result 6" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

verse
sentence 82
clause NA
phrase
rank_lex=0
phrase
phrase
sentence 83
clause NA
phrase
rank_lex=44
phrase
phrase
rank_lex=2
phrase
rank_lex=4
rank_lex=10
rank_lex=1030
clause Attr
phrase
rank_lex=657
phrase
rank_lex=209
clause Attr
phrase
rank_lex=9
phrase
rank_lex=7
rank_lex=25
rank_lex=10
rank_lex=1
rank_lex=22
clause NA
phrase
rank_lex=0
rank_lex=4
rank_lex=10
rank_lex=1
rank_lex=151
clause Attr
phrase
rank_lex=9
phrase
rank_lex=3
phrase
rank_lex=367
rank_lex=151
clause Attr
phrase
rank_lex=657
phrase
rank_lex=209
sentence 84
clause NA
phrase
rank_lex=2
phrase
phrase
rank_lex=2
rank_lex=1544
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
verse
sentence 83
clause NA
phrase
rank_lex=0
phrase
rank_lex=2
rank_lex=10
rank_lex=423
rank_lex=1
rank_lex=22
rank_lex=0
rank_lex=2
rank_lex=10
rank_lex=554
rank_lex=1
rank_lex=0
rank_lex=2
rank_lex=10
rank_lex=1596
rank_lex=7
rank_lex=1
rank_lex=22
clause Attr
phrase
rank_lex=9
phrase
rank_lex=3
phrase
rank_lex=67
rank_lex=201
clause NA
phrase
rank_lex=4
rank_lex=10
rank_lex=2866
rank_lex=1030
phrase
rank_lex=2
rank_lex=1544
sentence 85
clause NA
phrase
rank_lex=0
phrase
rank_lex=15
phrase
rank_lex=95
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 7" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

verse
sentence 2
clause NA
phrase
rank_lex=0
phrase
rank_lex=236
phrase
phrase
rank_lex=3
rank_lex=1
rank_lex=23
rank_lex=1
phrase
clause Attr
phrase
rank_lex=9
phrase
rank_lex=17
sentence 3
clause NA
phrase
rank_lex=0
phrase
phrase
rank_lex=3
rank_lex=1
rank_lex=23
rank_lex=1
phrase
rank_lex=5
rank_lex=10
clause Attr
phrase
rank_lex=9
phrase
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.displaySetup(extraFeatures=\"rank_lex\")\n", "highlights = {}\n", "for (sentence, clause) in results:\n", " highlights[sentence] = \"magenta\"\n", " highlights[clause] = \"cyan\"\n", " for w in L.d(sentence, otype=\"word\"):\n", " if F.rank_lex.v(w) > RARE_RANK:\n", " highlights[w] = \"magenta\"\n", " for w in L.d(clause, otype=\"word\"):\n", " if F.rank_lex.v(w) < COMMON_RANK:\n", " highlights[w] = \"cyan\"\n", "A.show(\n", " results,\n", " condensed=False,\n", " start=6,\n", " end=7,\n", " suppress={\"sp\", \"vt\", \"vs\", \"function\", \"typ\", \"otype\"},\n", " highlights=highlights,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now infrequent sentences ending in a frequent word:" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.45s 10798 results\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
npsentenceword
5Genesis 1:9יִקָּו֨וּ הַמַּ֜יִם מִתַּ֤חַת הַשָּׁמַ֨יִם֙ אֶל־מָקֹ֣ום אֶחָ֔ד אֶחָ֔ד
6Genesis 1:10וַיִּקְרָ֨א אֱלֹהִ֤ים׀ לַיַּבָּשָׁה֙ אֶ֔רֶץ אֶ֔רֶץ
7Genesis 1:11תַּֽדְשֵׁ֤א הָאָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב מַזְרִ֣יעַ זֶ֔רַע עֵ֣ץ פְּרִ֞י עֹ֤שֶׂה פְּרִי֙ לְמִינֹ֔ו אֲשֶׁ֥ר זַרְעֹו־בֹ֖ו עַל־הָאָ֑רֶץ אָ֑רֶץ
8Genesis 1:15וְהָי֤וּ לִמְאֹורֹת֙ בִּרְקִ֣יעַ הַשָּׁמַ֔יִם לְהָאִ֖יר עַל־הָאָ֑רֶץ אָ֑רֶץ
9Genesis 1:22וְהָעֹ֖וף יִ֥רֶב בָּאָֽרֶץ׃ אָֽרֶץ׃
10Genesis 1:26וְיִרְדּוּ֩ בִדְגַ֨ת הַיָּ֜ם וּבְעֹ֣וף הַשָּׁמַ֗יִם וּבַבְּהֵמָה֙ וּבְכָל־הָאָ֔רֶץ וּבְכָל־הָרֶ֖מֶשׂ הָֽרֹמֵ֥שׂ עַל־הָאָֽרֶץ׃ אָֽרֶץ׃
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "query = \"\"\"\n", "infrequent otype=sentence\n", " := frequent otype=word\n", "\"\"\"\n", "results = A.search(query, sets=customSets)\n", "A.table(results, start=5, end=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a check, we replace the custom set `frequent` by the ordinary type `word` with a rank condition." ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.37s 10798 results\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
npsentenceword
5Genesis 1:9יִקָּו֨וּ הַמַּ֜יִם מִתַּ֤חַת הַשָּׁמַ֨יִם֙ אֶל־מָקֹ֣ום אֶחָ֔ד אֶחָ֔ד
6Genesis 1:10וַיִּקְרָ֨א אֱלֹהִ֤ים׀ לַיַּבָּשָׁה֙ אֶ֔רֶץ אֶ֔רֶץ
7Genesis 1:11תַּֽדְשֵׁ֤א הָאָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב מַזְרִ֣יעַ זֶ֔רַע עֵ֣ץ פְּרִ֞י עֹ֤שֶׂה פְּרִי֙ לְמִינֹ֔ו אֲשֶׁ֥ר זַרְעֹו־בֹ֖ו עַל־הָאָ֑רֶץ אָ֑רֶץ
8Genesis 1:15וְהָי֤וּ לִמְאֹורֹת֙ בִּרְקִ֣יעַ הַשָּׁמַ֔יִם לְהָאִ֖יר עַל־הָאָ֑רֶץ אָ֑רֶץ
9Genesis 1:22וְהָעֹ֖וף יִ֥רֶב בָּאָֽרֶץ׃ אָֽרֶץ׃
10Genesis 1:26וְיִרְדּוּ֩ בִדְגַ֨ת הַיָּ֜ם וּבְעֹ֣וף הַשָּׁמַ֗יִם וּבַבְּהֵמָה֙ וּבְכָל־הָאָ֔רֶץ וּבְכָל־הָרֶ֖מֶשׂ הָֽרֹמֵ֥שׂ עַל־הָאָֽרֶץ׃ אָֽרֶץ׃
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "query = \"\"\"\n", "infrequent otype=sentence\n", " := word rank_lex<100\n", "\"\"\"\n", "results = A.search(query, sets=customSets)\n", "A.table(results, start=5, end=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that no matter how expensive the construction of a set has been, once you have it, queries based on it are just fast. There is no penalty when you use given sets instead of the familiar node types." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# All steps\n", "\n", "* **[start](start.ipynb)** your first step in mastering the bible computationally\n", "* **[display](display.ipynb)** become an expert in creating pretty displays of your text structures\n", "* **[search](search.ipynb)** turbo charge your hand-coding with search templates\n", "\n", "---\n", "\n", "advanced\n", "\n", "You have seen how to filter on feature values, of nodes and of edges.\n", "\n", "Now we want to set up sets for real.\n", "\n", "[sets](searchSets.ipynb)\n", "[relations](searchRelations.ipynb)\n", "[quantifiers](searchQuantifiers.ipynb)\n", "[from MQL](searchFromMQL.ipynb)\n", "[rough](searchRough.ipynb)\n", "[gaps](searchGaps.ipynb)\n", "\n", "---\n", "\n", "* **[export Excel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results\n", "* **[share](share.ipynb)** draw in other people's data and let them use yours\n", "* **[export](export.ipynb)** export your dataset as an Emdros database\n", "* **[annotate](annotate.ipynb)** annotate plain text by means of other tools and import the annotations as TF features\n", "* **[map](map.ipynb)** map somebody else's annotations to a new version of the corpus\n", "* **[volumes](volumes.ipynb)** work with selected books only\n", "* **[trees](trees.ipynb)** work with the BHSA data as syntax trees\n", "\n", "CC-BY Dirk Roorda" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 
(ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.0" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }