{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "# Tutorial\n", "\n", "This notebook gets you started with using\n", "[Text-Fabric](https://annotation.github.io/text-fabric/) for coding in the Hebrew Bible.\n", "\n", "If you are totally new to Text-Fabric, it might be helpful to read about the underlying\n", "[data model](https://annotation.github.io/text-fabric/tf/about/datamodel.html) first.\n", "\n", "Short introductions to other TF datasets:\n", "\n", "* [Dead Sea Scrolls](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/dss.ipynb)\n", "* [Old Babylonian Letters](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/oldbabylonian.ipynb)\n", "* [Quran](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/quran.ipynb)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Export to Excel\n", "\n", "In a notebook, you can perform searches, view the results in a tabular display, and zoom in on items with\n", "pretty displays.\n", "\n", "But there are times when you want to take your results outside Text-Fabric, outside a notebook, outside Python, and just\n", "work with them in other programs, such as Excel.\n", "\n", "You may want to do that not only with query results, but with all kinds of lists of tuples of nodes.\n", "\n", "There is a function for that, `A.export()`, and here we show what it can do." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Incantation\n", "\n", "The ins and outs of installing Text-Fabric, getting the corpus, and initializing a notebook are\n", "explained in the [start tutorial](start.ipynb)." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T10:06:39.818664Z", "start_time": "2018-05-24T10:06:39.796588Z" } }, "outputs": [], "source": [ "import os\n", "from tf.app import use" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "**Locating corpus resources ...**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "app: ~/text-fabric-data/github/ETCBC/bhsa/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/ETCBC/bhsa/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/ETCBC/phono/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/ETCBC/parallels/tf/2021" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " Text-Fabric: Text-Fabric API 12.0.4, ETCBC/bhsa/app v3, Search Reference
\n", " Data: ETCBC - bhsa 2021, Character table, Feature docs
\n", "
Node types\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "
Name# of nodes# slots/node% coverage
book3910938.21100
chapter929459.19100
lex923046.22100
verse2321318.38100
half_verse451799.44100
sentence637176.70100
sentence_atom645146.61100
clause881314.84100
clause_atom907044.70100
phrase2532031.68100
phrase_atom2675321.59100
subphrase1138501.4238
word4265901.00100
\n", " Sets: no custom sets
\n", " Features:
\n", "
Parallel Passages\n", "
\n", "\n", "
\n", "
\n", "crossref\n", "
\n", "
int
\n", "\n", " 🆗 links between similar passages\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis\n", "
\n", "\n", "
\n", "
\n", "book\n", "
\n", "
str
\n", "\n", " ✅ book name in Latin (Genesis; Numeri; Reges1; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "book@ll\n", "
\n", "
str
\n", "\n", " ✅ book name in amharic (ኣማርኛ)\n", "\n", "
\n", "\n", "
\n", "
\n", "chapter\n", "
\n", "
int
\n", "\n", " ✅ chapter number (1; 2; 3; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "code\n", "
\n", "
int
\n", "\n", " ✅ identifier of a clause atom relationship (0; 74; 367; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "det\n", "
\n", "
str
\n", "\n", " ✅ determinedness of phrase(atom) (det; und; NA.)\n", "\n", "
\n", "\n", "
\n", "
\n", "domain\n", "
\n", "
str
\n", "\n", " ✅ text type of clause (? (Unknown); N (narrative); D (discursive); Q (Quotation).)\n", "\n", "
\n", "\n", "
\n", "
\n", "freq_lex\n", "
\n", "
int
\n", "\n", " ✅ frequency of lexemes\n", "\n", "
\n", "\n", "
\n", "
\n", "function\n", "
\n", "
str
\n", "\n", " ✅ syntactic function of phrase (Cmpl; Objc; Pred; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons\n", "
\n", "
str
\n", "\n", " ✅ word consonantal-transliterated (B R>CJT BR> >LHJM ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_cons_utf8\n", "
\n", "
str
\n", "\n", " ✅ word consonantal-Hebrew (ב ראשׁית ברא אלהים)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex\n", "
\n", "
str
\n", "\n", " ✅ lexeme pointed-transliterated (B.:- R;>CIJT B.@R@> >:ELOH ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ lexeme pointed-Hebrew (בְּ רֵאשִׁית בָּרָא אֱלֹה)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_word\n", "
\n", "
str
\n", "\n", " ✅ word pointed-transliterated (B.:- R;>CI73JT B.@R@74> >:ELOHI92JM)\n", "\n", "
\n", "\n", "
\n", "
\n", "g_word_utf8\n", "
\n", "
str
\n", "\n", " ✅ word pointed-Hebrew (בְּ רֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים)\n", "\n", "
\n", "\n", "
\n", "
\n", "gloss\n", "
\n", "
str
\n", "\n", " 🆗 english translation of lexeme (beginning create god(s))\n", "\n", "
\n", "\n", "
\n", "
\n", "gn\n", "
\n", "
str
\n", "\n", " ✅ grammatical gender (m; f; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "label\n", "
\n", "
str
\n", "\n", " ✅ (half-)verse label (half verses: A; B; C; verses: GEN 01,02)\n", "\n", "
\n", "\n", "
\n", "
\n", "language\n", "
\n", "
str
\n", "\n", " ✅ of word or lexeme (Hebrew; Aramaic.)\n", "\n", "
\n", "\n", "
\n", "
\n", "lex\n", "
\n", "
str
\n", "\n", " ✅ lexeme consonantal-transliterated (B R>CJT/ BR>[ >LHJM/)\n", "\n", "
\n", "\n", "
\n", "
\n", "lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ lexeme consonantal-Hebrew (ב ראשׁית֜ ברא אלהים֜)\n", "\n", "
\n", "\n", "
\n", "
\n", "ls\n", "
\n", "
str
\n", "\n", " ✅ lexical set, subclassification of part-of-speech (card; ques; mult)\n", "\n", "
\n", "\n", "
\n", "
\n", "nametype\n", "
\n", "
str
\n", "\n", " ⚠️ named entity type (pers; mens; gens; topo; ppde.)\n", "\n", "
\n", "\n", "
\n", "
\n", "nme\n", "
\n", "
str
\n", "\n", " ✅ nominal ending consonantal-transliterated (absent; n/a; JM, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "nu\n", "
\n", "
str
\n", "\n", " ✅ grammatical number (sg; du; pl; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
int
\n", "\n", " ✅ sequence number of an object within its context\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "pargr\n", "
\n", "
str
\n", "\n", " 🆗 hierarchical paragraph number (1; 1.2; 1.2.3.4; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "pdp\n", "
\n", "
str
\n", "\n", " ✅ phrase dependent part-of-speech (art; verb; subs; nmpr, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "pfm\n", "
\n", "
str
\n", "\n", " ✅ preformative consonantal-transliterated (absent; n/a; J, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix consonantal-transliterated (absent; n/a; W; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_gn\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix gender (m; f; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_nu\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix number (sg; du; pl; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "prs_ps\n", "
\n", "
str
\n", "\n", " ✅ pronominal suffix person (p1; p2; p3; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "ps\n", "
\n", "
str
\n", "\n", " ✅ grammatical person (p1; p2; p3; NA; unknown.)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere\n", "
\n", "
str
\n", "\n", " ✅ word pointed-transliterated masoretic reading correction\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer\n", "
\n", "
str
\n", "\n", " ✅ interword material -pointed-transliterated (Masoretic correction)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_trailer_utf8\n", "
\n", "
str
\n", "\n", " ✅ interword material -pointed-transliterated (Masoretic correction)\n", "\n", "
\n", "\n", "
\n", "
\n", "qere_utf8\n", "
\n", "
str
\n", "\n", " ✅ word pointed-Hebrew masoretic reading correction\n", "\n", "
\n", "\n", "
\n", "
\n", "rank_lex\n", "
\n", "
int
\n", "\n", " ✅ ranking of lexemes based on freqnuecy\n", "\n", "
\n", "\n", "
\n", "
\n", "rela\n", "
\n", "
str
\n", "\n", " ✅ linguistic relation between clause/(sub)phrase(atom) (ADJ; MOD; ATR; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "sp\n", "
\n", "
str
\n", "\n", " ✅ part-of-speech (art; verb; subs; nmpr, ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "st\n", "
\n", "
str
\n", "\n", " ✅ state of a noun (a (absolute); c (construct); e (emphatic).)\n", "\n", "
\n", "\n", "
\n", "
\n", "tab\n", "
\n", "
int
\n", "\n", " ✅ clause atom: its level in the linguistic embedding\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer\n", "
\n", "
str
\n", "\n", " ✅ interword material pointed-transliterated (& 00 05 00_P ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "trailer_utf8\n", "
\n", "
str
\n", "\n", " ✅ interword material pointed-Hebrew (־ ׃)\n", "\n", "
\n", "\n", "
\n", "
\n", "txt\n", "
\n", "
str
\n", "\n", " ✅ text type of clause and surrounding (repetion of ? N D Q as in feature domain)\n", "\n", "
\n", "\n", "
\n", "
\n", "typ\n", "
\n", "
str
\n", "\n", " ✅ clause/phrase(atom) type (VP; NP; Ellp; Ptcp; WayX)\n", "\n", "
\n", "\n", "
\n", "
\n", "uvf\n", "
\n", "
str
\n", "\n", " ✅ univalent final consonant consonantal-transliterated (absent; N; J; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "vbe\n", "
\n", "
str
\n", "\n", " ✅ verbal ending consonantal-transliterated (n/a; W; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "vbs\n", "
\n", "
str
\n", "\n", " ✅ root formation consonantal-transliterated (absent; n/a; H; ...)\n", "\n", "
\n", "\n", "
\n", "
\n", "verse\n", "
\n", "
int
\n", "\n", " ✅ verse number\n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex\n", "
\n", "
str
\n", "\n", " ✅ vocalized lexeme pointed-transliterated (B.: R;>CIJT BR> >:ELOHIJM)\n", "\n", "
\n", "\n", "
\n", "
\n", "voc_lex_utf8\n", "
\n", "
str
\n", "\n", " ✅ vocalized lexeme pointed-Hebrew (בְּ רֵאשִׁית ברא אֱלֹהִים)\n", "\n", "
\n", "\n", "
\n", "
\n", "vs\n", "
\n", "
str
\n", "\n", " ✅ verbal stem (qal; piel; hif; apel; pael)\n", "\n", "
\n", "\n", "
\n", "
\n", "vt\n", "
\n", "
str
\n", "\n", " ✅ verbal tense (perf; impv; wayq; infc)\n", "\n", "
\n", "\n", "
\n", "
\n", "mother\n", "
\n", "
none
\n", "\n", " ✅ linguistic dependency between textual objects\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
Phonetic Transcriptions\n", "
\n", "\n", "
\n", "
\n", "phono\n", "
\n", "
str
\n", "\n", " 🆗 phonological transcription (bᵊ rēšˌîṯ bārˈā ʔᵉlōhˈîm)\n", "\n", "
\n", "\n", "
\n", "
\n", "phono_trailer\n", "
\n", "
str
\n", "\n", " 🆗 interword material in phonological transcription\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", " Settings:
specified
  1. apiVersion: 3
  2. appName: ETCBC/bhsa
  3. appPath: /Users/me/text-fabric-data/github/ETCBC/bhsa/app
  4. commit: gd905e3fb6e80d0fa537600337614adc2af157309
  5. css: ''
  6. dataDisplay:
    • exampleSectionHtml:<code>Genesis 1:1</code> (use <a href=\"https://github.com/{org}/{repo}/blob/master/tf/{version}/book%40en.tf\" target=\"_blank\">English book names</a>)
    • excludedFeatures:
      • g_uvf_utf8
      • g_vbs
      • kq_hybrid
      • languageISO
      • g_nme
      • lex0
      • is_root
      • g_vbs_utf8
      • g_uvf
      • dist
      • root
      • suffix_person
      • g_vbe
      • dist_unit
      • suffix_number
      • distributional_parent
      • kq_hybrid_utf8
      • crossrefSET
      • instruction
      • g_prs
      • lexeme_count
      • rank_occ
      • g_pfm_utf8
      • freq_occ
      • crossrefLCS
      • functional_parent
      • g_pfm
      • g_nme_utf8
      • g_vbe_utf8
      • kind
      • g_prs_utf8
      • suffix_gender
      • mother_object_type
    • noneValues:
      • none
      • unknown
      • no value
      • NA
  7. docs:
    • docBase: {docRoot}/{repo}
    • docExt: ''
    • docPage: ''
    • docRoot: https://{org}.github.io
    • featurePage: 0_home
  8. interfaceDefaults: {}
  9. isCompatible: True
  10. local: local
  11. localDir: /Users/me/text-fabric-data/github/ETCBC/bhsa/_temp
  12. provenanceSpec:
    • corpus: BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis
    • doi: 10.5281/zenodo.1007624
    • moduleSpecs:
      • :
        • backend: no value
        • corpus: Phonetic Transcriptions
        • docUrl:https://nbviewer.jupyter.org/github/etcbc/phono/blob/master/programs/phono.ipynb
        • doi: 10.5281/zenodo.1007636
        • org: ETCBC
        • relative: /tf
        • repo: phono
      • :
        • backend: no value
        • corpus: Parallel Passages
        • docUrl:https://nbviewer.jupyter.org/github/ETCBC/parallels/blob/master/programs/parallels.ipynb
        • doi: 10.5281/zenodo.1007642
        • org: ETCBC
        • relative: /tf
        • repo: parallels
    • org: ETCBC
    • relative: /tf
    • repo: bhsa
    • version: 2021
    • webBase: https://shebanq.ancient-data.org/hebrew
    • webHint: Show this on SHEBANQ
    • webLang: la
    • webLexId: True
    • webUrl:{webBase}/text?book=<1>&chapter=<2>&verse=<3>&version={version}&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt
    • webUrlLex: {webBase}/word?version={version}&id=<lid>
  13. release: v1.8
  14. typeDisplay:
    • clause:
      • label: {typ} {rela}
      • style: ''
    • clause_atom:
      • hidden: True
      • label: {code}
      • level: 1
      • style: ''
    • half_verse:
      • hidden: True
      • label: {label}
      • style: ''
      • verselike: True
    • lex:
      • featuresBare: gloss
      • label: {voc_lex_utf8}
      • lexOcc: word
      • style: orig
      • template: {voc_lex_utf8}
    • phrase:
      • label: {typ} {function}
      • style: ''
    • phrase_atom:
      • hidden: True
      • label: {typ} {rela}
      • level: 1
      • style: ''
    • sentence:
      • label: {number}
      • style: ''
    • sentence_atom:
      • hidden: True
      • label: {number}
      • level: 1
      • style: ''
    • subphrase:
      • hidden: True
      • label: {number}
      • style: ''
    • word:
      • features: pdp vs vt
      • featuresBare: lex:gloss
  15. writing: hbo
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Text-Fabric API: names N F E L T S C TF Fs Fall Es Eall Cs Call directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A = use(\"ETCBC/bhsa\", hoist=globals())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Inspect the contents of a file\n", "We write a function that can peek into file on your system, and show the first few lines.\n", "We'll use it to inspect the exported files that we are going to produce." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "EXPORT_FILE = os.path.expanduser(\"~/Downloads/results.tsv\")\n", "UPTO = 10\n", "\n", "\n", "def checkout():\n", " with open(EXPORT_FILE, encoding=\"utf_16\") as fh:\n", " for (i, line) in enumerate(fh):\n", " if i >= UPTO:\n", " break\n", " print(line)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Encoding\n", "\n", "Our exported `.tsv` files open in Excel without hassle, even if they contain non-latin characters.\n", "That is because TF writes such files in an\n", "encoding that works well with Excel: `utf_16_le`.\n", "You can just open them in Excel, there is no need for conversion before or after opening these files.\n", "\n", "Should you want to process these files by means of a (Python) program,\n", "take care to read them with encoding `utf_16`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Example query\n", "\n", "We first run a query in order to export the results." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T07:46:55.998382Z", "start_time": "2018-05-24T07:46:55.137956Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.38s 1868 results\n" ] } ], "source": [ "query = \"\"\"\n", "book book=Samuel_I\n", "  clause\n", "    word sp=nmpr\n", "\"\"\"\n", "results = A.search(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Bare export\n", "\n", "You can export the table of results to Excel.\n", "\n", "The following command writes a tab-separated file `results.tsv` to your downloads directory.\n", "\n", "You can specify arguments `toDir=directory` and `toFile=file name` to write to a different file.\n", "If the directory does not exist, it will be created.\n", "\n", "We stick to the default, however." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "A.export(results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check out the contents:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "R\tS1\tS2\tS3\tNODE1\tTYPE1\tbook1\tNODE2\tTYPE2\tTEXT2\tNODE3\tTYPE3\tTEXT3\tsp3\n", "\n", "1\t1_Samuel\t1\t1\t426598\tbook\tSamuel_I\t453958\tclause\tוַיְהִי֩ אִ֨ישׁ אֶחָ֜ד מִן־הָרָמָתַ֛יִם צֹופִ֖ים מֵהַ֣ר אֶפְרָ֑יִם \t141550\tword\tאֶפְרָ֑יִם \tnmpr\n", "\n", "2\t1_Samuel\t1\t1\t426598\tbook\tSamuel_I\t453959\tclause\tוּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ \t141553\tword\tאֶ֠לְקָנָה \tnmpr\n", "\n", "3\t1_Samuel\t1\t1\t426598\tbook\tSamuel_I\t453959\tclause\tוּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ \t141555\tword\tיְרֹחָ֧ם \tnmpr\n", "\n", "4\t1_Samuel\t1\t1\t426598\tbook\tSamuel_I\t453959\tclause\tוּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ \t141557\tword\tאֱלִיה֛וּא 
\tnmpr\n", "\n", "5\t1_Samuel\t1\t1\t426598\tbook\tSamuel_I\t453959\tclause\tוּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ \t141559\tword\tתֹּ֥חוּ \tnmpr\n", "\n", "6\t1_Samuel\t1\t1\t426598\tbook\tSamuel_I\t453959\tclause\tוּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ \t141561\tword\tצ֖וּף \tnmpr\n", "\n", "7\t1_Samuel\t1\t2\t426598\tbook\tSamuel_I\t453961\tclause\tשֵׁ֤ם אַחַת֙ חַנָּ֔ה \t141569\tword\tחַנָּ֔ה \tnmpr\n", "\n", "8\t1_Samuel\t1\t2\t426598\tbook\tSamuel_I\t453962\tclause\tוְשֵׁ֥ם הַשֵּׁנִ֖ית פְּנִנָּ֑ה \t141574\tword\tפְּנִנָּ֑ה \tnmpr\n", "\n", "9\t1_Samuel\t1\t2\t426598\tbook\tSamuel_I\t453964\tclause\tלִפְנִנָּה֙ יְלָדִ֔ים \t141578\tword\tפְנִנָּה֙ \tnmpr\n", "\n" ] } ], "source": [ "checkout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You see the following columns:\n", "\n", "* `R` the sequence number of the result tuple in the result list\n", "* `S1 S2 S3` the section as book, chapter, verse, in separate columns\n", "* `NODEi TYPEi` the node and its type, for each node `i` in the result tuple\n", "* `TEXTi` the full text of node `i`, if the node type admits a concise text representation\n", "* `sp3` the value of feature `sp` for node 3, since our query mentions the feature `sp` on node 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Richer exports\n", "\n", "If we want to see the clause type (feature `typ`) and the word gender (feature `gn`) as well, we must mention them\n", "in the query.\n", "\n", "We can do so as follows:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T07:46:55.998382Z", "start_time": "2018-05-24T07:46:55.137956Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.67s 1868 results\n" ] } ], "source": [ "query = \"\"\"\n", "book book=Samuel_I\n", "  clause typ*\n", "    word sp=nmpr gn*\n", "\"\"\"\n", "results = A.search(query)" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "The same number of results as before.\n", "The `*` is a trivial condition, it is always true.\n", "\n", "We do the export again and peek at the results." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "R\tS1\tS2\tS3\tNODE1\tTYPE1\tbook1\tNODE2\tTYPE2\tTEXT2\ttyp2\tNODE3\tTYPE3\tTEXT3\tgn3\tsp3\n", "\n", "1\t1_Samuel\t1\t1\t426598\tbook\tSamuel_I\t453958\tclause\tוַיְהִי֩ אִ֨ישׁ אֶחָ֜ד מִן־הָרָמָתַ֛יִם צֹופִ֖ים מֵהַ֣ר אֶפְרָ֑יִם \tWayX\t141550\tword\tאֶפְרָ֑יִם \tunknown\tnmpr\n", "\n", "2\t1_Samuel\t1\t1\t426598\tbook\tSamuel_I\t453959\tclause\tוּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ \tNmCl\t141553\tword\tאֶ֠לְקָנָה \tm\tnmpr\n", "\n", "3\t1_Samuel\t1\t1\t426598\tbook\tSamuel_I\t453959\tclause\tוּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ \tNmCl\t141555\tword\tיְרֹחָ֧ם \tm\tnmpr\n", "\n", "4\t1_Samuel\t1\t1\t426598\tbook\tSamuel_I\t453959\tclause\tוּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ \tNmCl\t141557\tword\tאֱלִיה֛וּא \tm\tnmpr\n", "\n", "5\t1_Samuel\t1\t1\t426598\tbook\tSamuel_I\t453959\tclause\tוּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ \tNmCl\t141559\tword\tתֹּ֥חוּ \tm\tnmpr\n", "\n", "6\t1_Samuel\t1\t1\t426598\tbook\tSamuel_I\t453959\tclause\tוּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ \tNmCl\t141561\tword\tצ֖וּף \tm\tnmpr\n", "\n", "7\t1_Samuel\t1\t2\t426598\tbook\tSamuel_I\t453961\tclause\tשֵׁ֤ם אַחַת֙ חַנָּ֔ה \tNmCl\t141569\tword\tחַנָּ֔ה \tf\tnmpr\n", "\n", "8\t1_Samuel\t1\t2\t426598\tbook\tSamuel_I\t453962\tclause\tוְשֵׁ֥ם הַשֵּׁנִ֖ית פְּנִנָּ֑ה \tNmCl\t141574\tword\tפְּנִנָּ֑ה \tf\tnmpr\n", "\n", "9\t1_Samuel\t1\t2\t426598\tbook\tSamuel_I\t453964\tclause\tלִפְנִנָּה֙ יְלָדִ֔ים 
\tNmCl\t141578\tword\tפְנִנָּה֙ \tf\tnmpr\n", "\n" ] } ], "source": [ "A.export(results)\n", "checkout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you see, you now have the extra columns `typ2` and `gn3`.\n", "\n", "This gives you a lot of control over the generation of spreadsheets." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Not from queries\n", "\n", "You can also export lists of node tuples that are not obtained by a query:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((453958, 141550), (453959, 141553))" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tuples = (\n", "    tuple(results[0][1:3]),\n", "    tuple(results[1][1:3]),\n", ")\n", "\n", "tuples" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Two rows, each row has a clause node and a word node.\n", "\n", "Let's do a bare export:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "R\tS1\tS2\tS3\tNODE1\tTYPE1\tTEXT1\tbook1\tNODE2\tTYPE2\tTEXT2\ttyp2\n", "\n", "1\t1_Samuel\t1\t1\t453958\tclause\tוַיְהִי֩ אִ֨ישׁ אֶחָ֜ד מִן־הָרָמָתַ֛יִם צֹופִ֖ים מֵהַ֣ר אֶפְרָ֑יִם \t\t141550\tword\tאֶפְרָ֑יִם \t\n", "\n", "2\t1_Samuel\t1\t1\t453959\tclause\tוּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ \t\t141553\tword\tאֶ֠לְקָנָה \t\n", "\n" ] } ], "source": [ "A.export(tuples)\n", "checkout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Wait a minute: why is the `typ2` there?\n", "\n", "It is because we have run a query before where we asked for `typ`.\n", "\n", "If we do not want to be influenced by previous things we've run, we need to reset the display:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "A.displayReset(\"tupleFeatures\")" ] }, { "cell_type": "markdown", "metadata": {}, 
"source": [ "Again:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "R\tS1\tS2\tS3\tNODE1\tTYPE1\tTEXT1\tNODE2\tTYPE2\tTEXT2\n", "\n", "1\t1_Samuel\t1\t1\t453958\tclause\tוַיְהִי֩ אִ֨ישׁ אֶחָ֜ד מִן־הָרָמָתַ֛יִם צֹופִ֖ים מֵהַ֣ר אֶפְרָ֑יִם \t141550\tword\tאֶפְרָ֑יִם \n", "\n", "2\t1_Samuel\t1\t1\t453959\tclause\tוּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ \t141553\tword\tאֶ֠לְקָנָה \n", "\n" ] } ], "source": [ "A.export(tuples)\n", "checkout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Display setup\n", "\n", "We can get richer exports by means of\n", "`A.displaySetup()`, using the parameter `tupleFeatures`:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "A.displaySetup(\n", " tupleFeatures=(\n", " (0, \"typ rela\"),\n", " (1, \"sp gn nu pdp\"),\n", " )\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We assign extra features per member of the tuple.\n", "\n", "In the above case:\n", "\n", "* the first (`0`) member (the clause node), gets feature `typ`;\n", "* the second (`1`) member (the word node), gets features `sp` and `gn`." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "R\tS1\tS2\tS3\tNODE1\tTYPE1\tTEXT1\ttyp1\trela1\tNODE2\tTYPE2\tTEXT2\tsp2\tgn2\tnu2\tpdp2\n", "\n", "1\t1_Samuel\t1\t1\t453958\tclause\tוַיְהִי֩ אִ֨ישׁ אֶחָ֜ד מִן־הָרָמָתַ֛יִם צֹופִ֖ים מֵהַ֣ר אֶפְרָ֑יִם \tWayX\tNA\t141550\tword\tאֶפְרָ֑יִם \tnmpr\tunknown\tsg\tnmpr\n", "\n", "2\t1_Samuel\t1\t1\t453959\tclause\tוּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ \tNmCl\tNA\t141553\tword\tאֶ֠לְקָנָה \tnmpr\tm\tsg\tnmpr\n", "\n" ] } ], "source": [ "A.export(tuples)\n", "checkout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Talking about display setup: other parameters also have effect, e.g. the text format.\n", "\n", "Let's change it to the phonetic representation." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "R\tS1\tS2\tS3\tNODE1\tTYPE1\tTEXT1\ttyp1\trela1\tNODE2\tTYPE2\tTEXT2\tsp2\tgn2\tnu2\tpdp2\n", "\n", "1\t1_Samuel\t1\t1\t453958\tclause\twayᵊhˌî ʔˌîš ʔeḥˈāḏ min-hārāmāṯˈayim ṣôfˌîm mēhˈar ʔefrˈāyim \tWayX\tNA\t141550\tword\tʔefrˈāyim \tnmpr\tunknown\tsg\tnmpr\n", "\n", "2\t1_Samuel\t1\t1\t453959\tclause\tûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . 
\tNmCl\tNA\t141553\tword\tʔelqānˌā \tnmpr\tm\tsg\tnmpr\n", "\n" ] } ], "source": [ "A.export(tuples, fmt=\"text-phono-full\")\n", "checkout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Chained queries\n", "\n", "You can chain queries like this:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.39s 6 results\n", " 0.40s 1 result\n" ] } ], "source": [ "results = (\n", "    A.search(\n", "        \"\"\"\n", "book book=Samuel_I\n", "  chapter chapter=1\n", "    verse verse=1\n", "      clause\n", "        word sp=nmpr\n", "\"\"\"\n", "    )\n", "    + A.search(\n", "        \"\"\"\n", "book book=Samuel_I\n", "  chapter chapter=1\n", "    verse verse=1\n", "      clause\n", "        word sp=verb nu=pl\n", "\"\"\"\n", "    )\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In such cases, it is better to set up the features yourself:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "A.displaySetup(\n", "    tupleFeatures=(\n", "        (3, \"typ rela\"),\n", "        (4, \"sp gn vt vs\"),\n", "    ),\n", "    fmt=\"text-phono-full\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can do a fine export:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "R\tS1\tS2\tS3\tNODE1\tTYPE1\tNODE2\tTYPE2\tNODE3\tTYPE3\tTEXT3\tNODE4\tTYPE4\tTEXT4\ttyp4\trela4\tNODE5\tTYPE5\tTEXT5\tsp5\tgn5\tvt5\tvs5\n", "\n", "1\t1_Samuel\t1\t1\t426598\tbook\t426862\tchapter\t1421518\tverse\twayᵊhˌî ʔˌîš ʔeḥˈāḏ min-hārāmāṯˈayim ṣôfˌîm mēhˈar ʔefrˈāyim ûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . 
\t453958\tclause\twayᵊhˌî ʔˌîš ʔeḥˈāḏ min-hārāmāṯˈayim ṣôfˌîm mēhˈar ʔefrˈāyim \tWayX\tNA\t141550\tword\tʔefrˈāyim \tnmpr\tunknown\tNA\tNA\n", "\n", "2\t1_Samuel\t1\t1\t426598\tbook\t426862\tchapter\t1421518\tverse\twayᵊhˌî ʔˌîš ʔeḥˈāḏ min-hārāmāṯˈayim ṣôfˌîm mēhˈar ʔefrˈāyim ûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . \t453959\tclause\tûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . \tNmCl\tNA\t141553\tword\tʔelqānˌā \tnmpr\tm\tNA\tNA\n", "\n", "3\t1_Samuel\t1\t1\t426598\tbook\t426862\tchapter\t1421518\tverse\twayᵊhˌî ʔˌîš ʔeḥˈāḏ min-hārāmāṯˈayim ṣôfˌîm mēhˈar ʔefrˈāyim ûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . \t453959\tclause\tûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . \tNmCl\tNA\t141555\tword\tyᵊrōḥˈām \tnmpr\tm\tNA\tNA\n", "\n", "4\t1_Samuel\t1\t1\t426598\tbook\t426862\tchapter\t1421518\tverse\twayᵊhˌî ʔˌîš ʔeḥˈāḏ min-hārāmāṯˈayim ṣôfˌîm mēhˈar ʔefrˈāyim ûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . \t453959\tclause\tûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . \tNmCl\tNA\t141557\tword\tʔᵉlîhˈû \tnmpr\tm\tNA\tNA\n", "\n", "5\t1_Samuel\t1\t1\t426598\tbook\t426862\tchapter\t1421518\tverse\twayᵊhˌî ʔˌîš ʔeḥˈāḏ min-hārāmāṯˈayim ṣôfˌîm mēhˈar ʔefrˈāyim ûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . \t453959\tclause\tûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . \tNmCl\tNA\t141559\tword\ttˌōḥû \tnmpr\tm\tNA\tNA\n", "\n", "6\t1_Samuel\t1\t1\t426598\tbook\t426862\tchapter\t1421518\tverse\twayᵊhˌî ʔˌîš ʔeḥˈāḏ min-hārāmāṯˈayim ṣôfˌîm mēhˈar ʔefrˈāyim ûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . \t453959\tclause\tûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . 
\tNmCl\tNA\t141561\tword\tṣˌûf \tnmpr\tm\tNA\tNA\n", "\n", "7\t1_Samuel\t1\t1\t426598\tbook\t426862\tchapter\t1421518\tverse\twayᵊhˌî ʔˌîš ʔeḥˈāḏ min-hārāmāṯˈayim ṣôfˌîm mēhˈar ʔefrˈāyim ûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . \t453958\tclause\twayᵊhˌî ʔˌîš ʔeḥˈāḏ min-hārāmāṯˈayim ṣôfˌîm mēhˈar ʔefrˈāyim \tWayX\tNA\t141547\tword\tṣôfˌîm \tverb\tm\tptca\tqal\n", "\n" ] } ], "source": [ "A.export(results)\n", "checkout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# All steps\n", "\n", "Now you know how to escape from Text-Fabric.\n", "\n", "We hope that this makes your stay in TF more comfortable.\n", "It's not a *Hotel California*.\n", "\n", "* **[start](start.ipynb)** your first step in mastering the Bible computationally\n", "* **[display](display.ipynb)** become an expert in creating pretty displays of your text structures\n", "* **[search](search.ipynb)** turbo charge your hand-coding with search templates\n", "* **export Excel** make tailor-made spreadsheets out of your results\n", "* **[share](share.ipynb)** draw in other people's data and let them use yours\n", "* **[export](export.ipynb)** export your dataset as an Emdros database\n", "* **[annotate](annotate.ipynb)** annotate plain text by means of other tools and import the annotations as TF features\n", "* **[map](map.ipynb)** map somebody else's annotations to a new version of the corpus\n", "* **[volumes](volumes.ipynb)** work with selected books only\n", "* **[trees](trees.ipynb)** work with the BHSA data as syntax trees\n", "\n", "CC-BY Dirk Roorda" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.1" }, "widgets": { "application/vnd.jupyter.widget-state+json": { 
"state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }