{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "---\n", "\n", "To get started: consult [start](start.ipynb)\n", "\n", "---\n", "\n", "# Annotate\n", "\n", "Text-Fabric is a tool for computing with read-only datasets.\n", "How can you manually annotate an existing dataset?\n", "\n", "The scenario is: export the portions that must be annotated into a plain text file, accompanied by\n", "location information.\n", "\n", "Use an external tool, e.g.\n", "[BRAT](https://brat.nlplab.org), to manually annotate that text.\n", "\n", "Read the resulting annotations, combine them with the location information,\n", "and export the result as a new feature or set of features.\n", "\n", "These new features can be published anywhere,\n", "see the [share](share.ipynb) tutorial,\n", "and users who want to use the new features can tell Text-Fabric to fetch them from the\n", "published location alongside the main dataset.\n", "\n", "From this point on, the new features act as first-class citizens in the dataset.\n", "\n", "Note how this does not involve modifying existing datasets!" 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T10:06:39.818664Z", "start_time": "2018-05-24T10:06:39.796588Z" }, "lines_to_next_cell": 2 }, "outputs": [], "source": [ "import os\n", "from tf.app import use\n", "from tf.convert.recorder import Recorder\n", "from tf.dataset import Versions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**NB:** We used version 0.4 of this data set to export material, annotate the exported material,\n", "and draw in those annotations as a bunch of new features.\n", "However, in the meantime we have newer versions of the missieven data, where different encoding decisions have been applied.\n", "\n", "Rather than doing the annotation work again, we want to migrate the annotations from 0.4 to 0.7.\n", "We shall show how.\n", "\n", "First we show how we made the annotations in 0.4, and to that end we use a previous version of the data.\n", "\n", "We have to overcome the fact that at that time this repository resided under a different organization on GitHub (`Dans-labs`) and\n", "had a different name (`clariah-gm`). Also, the TF-app for this dataset resided in `annotation/app-missieven`, while\n", "it is now in `clariah/wp6-missieven/app`.\n", "\n", "It is still possible to work with that old version. We ask for the old TF-app\n", "and override the `org` and `repo` settings of the old app, by passing the new values in `provenanceSpec=...`." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "TF-app: ~/text-fabric-data/github/annotation/app-missieven/code" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/CLARIAH/wp6-missieven/tf/0.4" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Text-Fabric: Text-Fabric API 10.2.6, CLARIAH/wp6-missieven/app v3, Search Reference
Data: WP6-MISSIEVEN, Character table, Feature docs
Features:
\n", "
General Missives Dutch East India Company 1600-1800\n", "
\n", "\n", "
\n", "
\n", "author\n", "
\n", "
str
\n", "\n", " authors of the letter, surnames only\n", "\n", "
\n", "\n", "
\n", "
\n", "authorFull\n", "
\n", "
str
\n", "\n", " authors of the letter, full names\n", "\n", "
\n", "\n", "
\n", "
\n", "col\n", "
\n", "
int
\n", "\n", " column number of a column in a row in a table\n", "\n", "
\n", "\n", "
\n", "
\n", "day\n", "
\n", "
int
\n", "\n", " day part of the date of the letter\n", "\n", "
\n", "\n", "
\n", "
\n", "emph\n", "
\n", "
str
\n", "\n", " whether a word is emphasized by typography\n", "\n", "
\n", "\n", "
\n", "
\n", "facs\n", "
\n", "
str
\n", "\n", " url part of the corresponding online facsimile page; the url itself can be constructed using a hard coded template. See also the tpl feature\n", "\n", "
\n", "\n", "
\n", "
\n", "fnote\n", "
\n", "
str
\n", "\n", " all footnotes at that position\n", "\n", "
\n", "\n", "
\n", "
\n", "folio\n", "
\n", "
int
\n", "\n", " a folio reference\n", "\n", "
\n", "\n", "
\n", "
\n", "month\n", "
\n", "
int
\n", "\n", " month part of the date of the letter\n", "\n", "
\n", "\n", "
\n", "
\n", "n\n", "
\n", "
int
\n", "\n", " number of a volume, letter, page, para, line, table\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "page\n", "
\n", "
str
\n", "\n", " number of the first page of this letter in this volume\n", "\n", "
\n", "\n", "
\n", "
\n", "place\n", "
\n", "
str
\n", "\n", " place from where the letter was sent\n", "\n", "
\n", "\n", "
\n", "
\n", "punc\n", "
\n", "
str
\n", "\n", " punctuation and/or whitespace following a wordup to the next word\n", "\n", "
\n", "\n", "
\n", "
\n", "punco\n", "
\n", "
str
\n", "\n", " punctuation and/or whitespace following a word,up to the next word, original text only\n", "\n", "
\n", "\n", "
\n", "
\n", "puncr\n", "
\n", "
str
\n", "\n", " punctuation and/or whitespace following a word,up to the next word, remark text only\n", "\n", "
\n", "\n", "
\n", "
\n", "rawdate\n", "
\n", "
str
\n", "\n", " the date the letter was sent\n", "\n", "
\n", "\n", "
\n", "
\n", "ref\n", "
\n", "
str
\n", "\n", " whether a word belongs to the text of reference\n", "\n", "
\n", "\n", "
\n", "
\n", "remark\n", "
\n", "
int
\n", "\n", " whether a word belongs to the text of editorial remarks\n", "\n", "
\n", "\n", "
\n", "
\n", "row\n", "
\n", "
int
\n", "\n", " row number of a row of column in a table\n", "\n", "
\n", "\n", "
\n", "
\n", "seq\n", "
\n", "
str
\n", "\n", " ('sequence number of this letter among the letters of the same author in this volume',)\n", "\n", "
\n", "\n", "
\n", "
\n", "special\n", "
\n", "
str
\n", "\n", " whether a word has special typography possibly with OCR mistakes as well\n", "\n", "
\n", "\n", "
\n", "
\n", "status\n", "
\n", "
str
\n", "\n", " status of the letter, e.g. secret, copy\n", "\n", "
\n", "\n", "
\n", "
\n", "sub\n", "
\n", "
str
\n", "\n", " whether a word has subscript typography possibly indicating the denominator of a fraction\n", "\n", "
\n", "\n", "
\n", "
\n", "super\n", "
\n", "
str
\n", "\n", " whether a word has superscript typography possibly indicating the numerator of a fraction\n", "\n", "
\n", "\n", "
\n", "
\n", "title\n", "
\n", "
str
\n", "\n", " title of the letter\n", "\n", "
\n", "\n", "
\n", "
\n", "tpl\n", "
\n", "
int
\n", "\n", " url template number of the corresponding online facsimile page;the url itself can be constructed using this template, filled with the contents of the facs attribute.\n", "\n", "
\n", "\n", "
\n", "
\n", "trans\n", "
\n", "
str
\n", "\n", " transcription of a word\n", "\n", "
\n", "\n", "
\n", "
\n", "transo\n", "
\n", "
str
\n", "\n", " transcription of a word, only for original text\n", "\n", "
\n", "\n", "
\n", "
\n", "transr\n", "
\n", "
str
\n", "\n", " transcription of a word, only for remark text\n", "\n", "
\n", "\n", "
\n", "
\n", "und\n", "
\n", "
str
\n", "\n", " whether a word is underlined by typography\n", "\n", "
\n", "\n", "
\n", "
\n", "vol\n", "
\n", "
int
\n", "\n", " volume number\n", "\n", "
\n", "\n", "
\n", "
\n", "year\n", "
\n", "
int
\n", "\n", " year part of the date of the letter\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Text-Fabric API: names N F E L T S C TF directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A = use(\"CLARIAH/wp6-missieven:v0.4\", checkout=\"clone\", version=\"0.4\", hoist=globals(),\n", " legacy=True, provenanceSpec=dict(org=\"CLARIAH\", repo=\"wp6-missieven\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Text-Fabric has support for exporting data together with location information and then importing new data\n", "and turning it into new features based on the location information.\n", "\n", "See [Recorder](https://annotation.github.io/text-fabric/tf/convert/recorder.html).\n", "\n", "We show the workflow by selecting a letter, exporting the original text material as plain text,\n", "manually annotating it for named entities with [BRAT](https://brat.nlplab.org) and then saving the output\n", "as a new feature `name`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Text selection\n", "\n", "We choose volume 1 page 6:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T07:46:55.998382Z", "start_time": "2018-05-24T07:46:55.137956Z" } }, "outputs": [ { "data": { "text/html": [ "
1 6:1  Houding der Bandanezen, vgl. Europeërs VIII, p. 258-259klacht over particuliere
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:2  handel
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:3  
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:4  Op den 12 deser is een jonge slave van een orancaybij nacht comen
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:5  swemmen aen onse sloep, cloeck van verstant ende prompt in ’t antwoorden,
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:6  hebben daeromme den 14en deser goet gevonden 14 soldaeten onder ’t commandement
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:7  van een sergeant met een prau 3 uuren voor daege aen lant te setten aen
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:8  d’ander sijde om te maeken een bosschaede waertoe medegenomen werde den
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:9  overloper tot een guide, niettemin hem gebonden houdende die daertoe seer willich
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:10  was, haer belastende sich niet te openbaeren, dan voor een persoon van qualiteyt
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:11  gevangen ofte doot te crijgen. Soo is gebeurt, dat daer quamp alleen met een
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:12  jongen een groot arancay van Nera, broeder van den sabandaer dewelcke sij
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:13  tref f ten ende ’t hooft ons hier in ’t casteel gebracht, den voorsz. orancaye was door
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:14  den jongen ende andere seer wel bekent, 14 daegen te voorens in ’t parlementeren
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:15  met den onsen gesproken, die hem mede geroemt hadde, 2 van onse Hollanders
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:16  in den moort van den admirael Verhoeven saliger omgebracht te hebben.
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:17  Diverse mededelingen, meest klachten over het personeel. B. gébruikt de term
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:18  „generaele brief f” in tegenstelling tot „carteb ellen” aan repatriërenden als attestatie
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:19  over hun gedrag meegegeven
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:20  
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:21  Ic hebbe voor mij genomen, soo haest dit casteel gemaeckt is te keeren naer
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:22  Ambojna ende van daer naer Ternnate ende soo den Coninck van Spagnien tusschen
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:23  de Heeren Staeten den treves geobserveert werde, metten Coninck van ditto
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:24  plaetse te contracteeren om met sijn hulpe dese plaetse te ocuperen ende hem daer
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:25  mede Coninck van te maeken onder protexie van E Mogende Heeren Staeten, doch
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:26  hiervan sal den tijt leeren ende namaels U E adviseeren
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:27  Dit volck van Banda is superbe, moordadich, wel versien van waepenen, van
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:28  de onsen voor desen ende van de Engelsche gecomen, dan weynich couraege omme
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:29  metten onsen te slaen, maer de bergen sijn haer fortressen, die voor ons innaccessible
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:30  sijn, sij leven mede van de vruchten van de boomen ende wortelen van de aerde,
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:31  daer wijluyden bij vergaen souden, hier regneert onder d’onse een plaege, genaempt
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:32  berebery waervan sij worden geheel impotent van handen ende beenen, mede
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:33  een plage van sere benen, alsoo dat ick er van 20 geen een hebbe sonder plaesters
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:34  aen de beenen.
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:35  Mededelingen over verschillende personen
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:36  
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:37  
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "p = A.nodeFromSectionStr(\"1 6\")\n", "for ln in L.d(p, otype=\"line\"):\n", "    A.plain(ln, fmt=\"text-orig-full\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Quite a lot of names. Let's leave out the notes." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2018-05-24T07:46:55.998382Z", "start_time": "2018-05-24T07:46:55.137956Z" } }, "outputs": [ { "data": { "text/html": [ "
1 6:1  
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:2  
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:3  
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:4  Op den 12 deser is een jonge slave van een orancay*bij nacht comen
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:5  swemmen aen onse sloep, cloeck van verstant ende prompt in ’t antwoorden,
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:6  hebben daeromme den 14en deser goet gevonden 14 soldaeten onder ’t commandement
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:7  van een sergeant met een prau 3 uuren voor daege aen lant te setten aen
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:8  d’ander sijde om te maeken een bosschaede* waertoe medegenomen werde den
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:9  overloper tot een guide, niettemin hem gebonden houdende die daertoe seer willich
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:10  was, haer belastende sich niet te openbaeren, dan voor een persoon van qualiteyt
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:11  gevangen ofte doot te crijgen. Soo is gebeurt, dat daer quamp alleen met een
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:12  jongen een groot arancay van Nera, broeder van den sabandaer* dewelcke sij
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:13  tref f ten ende ’t hooft ons hier in ’t casteel gebracht, den voorsz. orancaye was door
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:14  den jongen ende andere seer wel bekent, 14 daegen te voorens in ’t parlementeren
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:15  met den onsen gesproken, die hem mede geroemt hadde, 2 van onse Hollanders
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:16  in den moort van den admirael Verhoeven* saliger omgebracht te hebben.
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:17  
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:18  
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:19  
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:20  
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:21  Ic hebbe voor mij genomen, soo haest dit casteel gemaeckt is te keeren naer
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:22  Ambojna ende van daer naer Ternnate ende soo den Coninck van Spagnien tusschen
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:23  de Heeren Staeten den treves geobserveert werde, metten Coninck van ditto
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:24  plaetse te contracteeren om met sijn hulpe dese plaetse te ocuperen ende hem daer
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:25  mede Coninck van te maeken onder protexie van E Mogende Heeren Staeten, doch
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:26  hiervan sal den tijt leeren ende namaels U E adviseeren*
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:27  Dit volck van Banda is superbe, moordadich, wel versien van waepenen, van
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:28  de onsen voor desen ende van de Engelsche gecomen, dan weynich couraege omme
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:29  metten onsen te slaen, maer de bergen sijn haer fortressen, die voor ons innaccessible
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:30  sijn, sij leven mede van de vruchten van de boomen ende wortelen van de aerde,
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:31  daer wijluyden bij vergaen souden, hier regneert onder d’onse een plaege, genaempt
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:32  berebery* waervan sij worden geheel impotent van handen ende beenen, mede
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:33  een plage van sere benen, alsoo dat ick er van 20 geen een hebbe sonder plaesters
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:34  aen de beenen.
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:35  
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:36  
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
1 6:37  
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for ln in L.d(p, otype=\"line\"):\n", "    A.plain(ln, fmt=\"layout-orig\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Recording\n", "\n", "We'll prepare this portion of text for annotation outside TF.\n", "\n", "We need to produce a text file and remember the positions of the relevant\n", "nodes in that text file.\n", "\n", "The [Recorder](https://annotation.github.io/text-fabric/tf/convert/recorder.html)\n", "lets you create a string from nodes,\n", "where the positions of the nodes in that string are remembered.\n", "You may add all kinds of material in between the texts of the nodes.\n", "\n", "And it is up to you how you represent the nodes.\n", "\n", "We can add strings to the recorder, and we can tell nodes to start and to stop.\n", "\n", "We add all words in all lines to the recorder, provided the words belong to the original material.\n", "\n", "We add line numbers to each line." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# start a recorder\n", "rec = Recorder()\n", "\n", "for ln in L.d(p, otype=\"line\"):\n", "    # start a line node\n", "    rec.start(ln)\n", "\n", "    # add the line number\n", "    rec.add(f\"{F.n.v(ln)}. \")\n", "\n", "    for w in L.d(ln, otype=\"word\"):\n", "        trans = F.transo.v(w)\n", "        # if there is nothing in transo, it is not original text\n", "        if not trans:\n", "            continue\n", "\n", "        # start a word node\n", "        rec.start(w)\n", "\n", "        # add the word and its trailing punctuation\n", "        rec.add(f\"{trans}{F.punco.v(w)}\")\n", "\n", "        # terminate the word node\n", "        rec.end(w)\n", "\n", "    # add a newline\n", "    rec.add(\"\\n\")\n", "\n", "    # terminate the line node\n", "    rec.end(ln)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a check, let's print the recorded text:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1. \n", "2. \n", "3. \n", "4. Op den 12 deser is een jonge slave van een orancaybij nacht comen\n", "5. swemmen aen onse sloep, cloeck van verstant ende prompt in ’t antwoorden,\n", "6. hebben daeromme den 14en deser goet gevonden 14 soldaeten onder ’t commandement\n", "7. van een sergeant met een prau 3 uuren voor daege aen lant te setten aen\n", "8. d’ander sijde om te maeken een bosschaede waertoe medegenomen werde den\n", "9. overloper tot een guide, niettemin hem gebonden houdende die daertoe seer willich\n", "10. was, haer belastende sich niet te openbaeren, dan voor een persoon van qualiteyt\n", "11. gevangen ofte doot te crijgen. Soo is gebeurt, dat daer quamp alleen met een\n", "12. jongen een groot arancay van Nera, broeder van den sabandaer dewelcke sij\n", "13. tref f ten ende ’t hooft ons hier in ’t casteel gebracht, den voorsz. orancaye was door\n", "14. den jongen ende andere seer wel bekent, 14 daegen te voorens in ’t parlementeren\n", "15. met den onsen gesproken, die hem mede geroemt hadde, 2 van onse Hollanders\n", "16. in den moort van den admirael Verhoeven saliger omgebracht te hebben.\n", "17. \n", "18. \n", "19. \n", "20. \n", "21. Ic hebbe voor mij genomen, soo haest dit casteel gemaeckt is te keeren naer\n", "22. 
Ambojna ende van daer naer Ternnate ende soo den Coninck van Spagnien tusschen\n", "23. de Heeren Staeten den treves geobserveert werde, metten Coninck van ditto\n", "24. plaetse te contracteeren om met sijn hulpe dese plaetse te ocuperen ende hem daer\n", "25. mede Coninck van te maeken onder protexie van E Mogende Heeren Staeten, doch\n", "26. hiervan sal den tijt leeren ende namaels U E adviseeren \n", "27. Dit volck van Banda is superbe, moordadich, wel versien van waepenen, van\n", "28. de onsen voor desen ende van de Engelsche gecomen, dan weynich couraege omme\n", "29. metten onsen te slaen, maer de bergen sijn haer fortressen, die voor ons innaccessible\n", "30. sijn, sij leven mede van de vruchten van de boomen ende wortelen van de aerde,\n", "31. daer wijluyden bij vergaen souden, hier regneert onder d’onse een plaege, genaempt\n", "32. berebery waervan sij worden geheel impotent van handen ende beenen, mede\n", "33. een plage van sere benen, alsoo dat ick er van 20 geen een hebbe sonder plaesters\n", "34. aen de beenen.\n", "35. \n", "36. \n", "37. \n", "\n" ] } ], "source": [ "print(rec.text())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and the recorded node positions." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pos 20: frozenset({5054871, 1039})\n", "pos 21: frozenset({5054871, 1039})\n", "pos 22: frozenset({1040, 5054871})\n", "pos 23: frozenset({1040, 5054871})\n", "pos 24: frozenset({1040, 5054871})\n", "pos 25: frozenset({1041, 5054871})\n", "pos 26: frozenset({1041, 5054871})\n", "pos 27: frozenset({1041, 5054871})\n", "pos 28: frozenset({1041, 5054871})\n", "pos 29: frozenset({1041, 5054871})\n" ] } ], "source": [ "for i in range(20, 30):\n", "    print(f\"pos {i}: {rec.positions()[i]}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This means that the character at position 20 in the plain text string is part of the text of word node 1039 and of line node 5054871." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With one statement we write the recorded text and the positions to two files:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "rec.write(\"exercises/v01-p0006.txt\")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1. \n", "2. \n", "3. \n", "4. Op den 12 deser is een jonge slave van een orancaybij nacht comen\n", "5. swemmen aen onse sloep, cloeck van verstant ende prompt in ’t antwoorden,\n", "6. hebben daeromme den 14en deser goet gevonden 14 soldaeten onder ’t commandement\n", "7. van een sergeant met een prau 3 uuren voor daege aen lant te setten aen\n", "8. d’ander sijde om te maeken een bosschaede waertoe medegenomen werde den\n", "9. overloper tot een guide, niettemin hem gebonden houdende die daertoe seer willich\n", "10. 
was, haer belastende sich niet te openbaeren, dan voor een persoon van qualiteyt\n" ] } ], "source": [ "!head -n 10 exercises/v01-p0006.txt" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5054868\n", "5054868\n", "5054868\n", "5054868\n", "5054869\n", "5054869\n", "5054869\n", "5054869\n", "5054870\n", "5054870\n", "5054870\n", "5054870\n", "5054871\n", "5054871\n", "5054871\n", "1038\t5054871\n", "1038\t5054871\n", "1038\t5054871\n", "5054871\t1039\n", "5054871\t1039\n", "5054871\t1039\n", "5054871\t1039\n", "1040\t5054871\n", "1040\t5054871\n", "1040\t5054871\n", "1041\t5054871\n", "1041\t5054871\n", "1041\t5054871\n", "1041\t5054871\n", "1041\t5054871\n" ] } ], "source": [ "!head -n 30 exercises/v01-p0006.txt.pos" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "# Annotating\n", "\n", "We head over to a local installation of Brat\n", "and annotate our text.\n", "\n", "On the left you see a quick and dirty manual annotation of some entities that I performed in\n", "the Brat interface, served locally.\n", "\n", "We captured the output of this annotation session in the file `v01-p0006.txt.ann`, which has the following contents:\n", "\n", "```\n", "T1\tPerson 675 679\tNera\n", "T2\tGPE 1181 1189\tTernnate\n", "#1\tAnnotatorNotes T2\tTernate\n", "T3\tPerson 1203 1223\tConinck van Spagnien\n", "T4\tGPE 1215 1223\tSpagnien\n", "T5\tOrganization 1240 1254\tHeeren Staeten\n", "T6\tPerson 1293 1300\tConinck\n", "T7\tPerson 1406 1413\tConinck\n", "T8\tOrganization 1457 1471\tHeeren Staeten\n", "T9\tGPE 1557 1562\tBanda\n", "T10\tGPE 1653 1662\tEngelsche\n", "T11\tPerson 58 65\torancay\n", "T12\tPerson 663 670\tarancay\n", "T13\tPerson 697 706\tsabandaer\n", "T14\tPerson 794 802\torancaye\n", "T15\tGPE 965 975\tHollanders\n", "T16\tPerson 1010 1019\tVerhoeven\n", "T17\tGPE 1154 1161\tAmbojna\n", "#2\tAnnotatorNotes T17\tAmboina\n", "T18\tGPE 1305 1310;1311 1322\tditto 
24. plaetse\n", "#3\tAnnotatorNotes T18\tTernate\n", "*\tAlias T11 T14\n", "R1\tGeographical_part Arg1:T2 Arg2:T18\n", "```\n", "\n", "Now we want to feed back these annotations as TF features on word nodes.\n", "The Recorder cannot anticipate the formats that tools like Brat deliver their results in.\n", "Therefore, it expects the data to be in a straightforward tabular format.\n", "\n", "In this case, we must do a small conversion to bring the output annotations\n", "into good shape, namely a tab-separated file\n", "with columns `start end feature1 feature2 ...`.\n", "\n", "Here we choose to expose the identifier (the `Tn` values) as `feature1`\n", "and the kind of entity as `feature2`.\n", "\n", "In case there is a link between two entities, we want to assign\n", "the earliest `T` number to all entities involved.\n", "\n", "We also want to preserve the annotator notes." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "675\t679\tT1\tPerson\t\n", "1181\t1189\tT2\tGPE\tTernate\n", "1203\t1223\tT3\tPerson\t\n", "1215\t1223\tT4\tGPE\t\n", "1240\t1254\tT5\tOrganization\t\n", "1293\t1300\tT6\tPerson\t\n", "1406\t1413\tT7\tPerson\t\n", "1457\t1471\tT8\tOrganization\t\n", "1557\t1562\tT9\tGPE\t\n", "1653\t1662\tT10\tGPE\t\n", "58\t65\tT11\tPerson\t\n", "663\t670\tT12\tPerson\t\n", "697\t706\tT13\tPerson\t\n", "794\t802\tT11\tPerson\t\n", "965\t975\tT15\tGPE\t\n", "1010\t1019\tT16\tPerson\t\n", "1154\t1161\tT17\tGPE\tAmboina\n", "1305\t1322\tT2\tGPE\tTernate\n", "{'T14': 'T11', 'T18': 'T2'}\n" ] } ], "source": [ "def brat2tsv(inh, outh):\n", "    outh.write(\"start\\tend\\tentityId\\tentityKind\\tentityComment\\n\")\n", "    entities = []\n", "    notes = {}\n", "    maps = {}\n", "    for line in inh:\n", "        fields = line.rstrip(\"\\n\").split(\"\\t\")\n", "        if line.startswith(\"T\"):\n", "            id1 = fields[0]\n", "            (kind, *positions) = fields[1].split()\n", "            (start, end) = (positions[0], positions[-1])\n", "            entities.append([start, end, id1, kind, \"\"])\n", "        elif line.startswith(\"#\"):\n", "            id1 = fields[1].split()[1]\n", "            notes[id1] = fields[2]\n", "        elif line.startswith(\"*\"):\n", "            (kind, id1, id2) = fields[1].split()\n", "            maps[id2] = id1\n", "        elif line.startswith(\"R\"):\n", "            (id1, id2) = (f[5:] for f in fields[1].split()[1:])\n", "            maps[id2] = id1\n", "    for entity in entities:\n", "        id1 = entity[2]\n", "        if id1 in maps:\n", "            entity[2] = maps[id1]\n", "        if id1 in notes:\n", "            entity[4] = notes[id1]\n", "        line = \"\\t\".join(entity)\n", "        print(line)\n", "        outh.write(f\"{line}\\n\")\n", "\n", "    print(maps)\n", "\n", "\n", "with open(\"exercises/v01-p0006.txt.ann\") as inh:\n", "    with open(\"exercises/v01-p0006.txt.tsv\", \"w\") as outh:\n", "        brat2tsv(inh, outh)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our recorder knows how to transform this file into feature data." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "features = rec.makeFeatures(\"exercises/v01-p0006.txt.tsv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "entityId\n", "\t {1146: 'T1', 5054879: 'T13', 5054889: 'T17', 1255: 'T2', 1259: 'T3', 1260: 'T3', 1261: 'T4', 1264: 'T5', 1265: 'T5', 5054890: 'T2', 1271: 'T6', 1289: 'T7', 5054892: 'T8', 1298: 'T8', 1299: 'T8', 1314: 'T9', 5054894: 'T9', 1330: 'T10', 5054895: 'T10', 1048: 'T11', 1049: 'T11', 5054871: 'T11', 1144: 'T12', 1150: 'T13', 5054880: 'T11', 1167: 'T11', 5054882: 'T15', 1196: 'T15', 5054883: 'T16', 1203: 'T16', 1250: 'T17', 1273: 'T2', 5054891: 'T2', 1274: 'T2'}\n", "entityKind\n", "\t {1146: 'Person', 5054879: 'Person', 5054889: 'GPE', 1255: 'GPE', 1259: 'Person', 1260: 'Person', 1261: 'GPE', 1264: 'Organization', 1265: 'Organization', 5054890: 'GPE', 1271: 'Person', 1289: 'Person', 5054892: 'Organization', 1298: 'Organization', 1299: 'Organization', 1314: 'GPE', 5054894: 'GPE', 1330: 'GPE', 5054895: 'GPE', 1048: 'Person', 1049: 'Person', 5054871: 'Person', 1144: 'Person', 1150: 'Person', 5054880: 'Person', 1167: 'Person', 5054882: 'GPE', 1196: 'GPE', 5054883: 'Person', 1203: 'Person', 1250: 'GPE', 1273: 'GPE', 5054891: 'GPE', 1274: 'GPE'}\n", "entityComment\n", "\t {5054889: 'Amboina', 1255: 'Ternate', 1250: 'Amboina', 1273: 'Ternate', 5054890: 'Ternate', 5054891: 'Ternate', 1274: 'Ternate'}\n" ] } ], "source": [ "for (feat, data) in features.items():\n", "    print(feat)\n", "    print(\"\\t\", data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can show this in a more readable way:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "entityId\n", "\tword 1146 => T1\n", "\tline 5054879 => T13\n", "\tline 5054889 => T17\n", "\tword 1255 => T2\n", "\tword 1259 => T3\n", "\tword 1260 => T3\n", "\tword 1261 => T4\n", "\tword 1264 => T5\n", "\tword 1265 => T5\n", "\tline 5054890 => T2\n", "\tword 1271 => T6\n", "\tword 1289 => T7\n", "\tline 
5054892 => T8\n", "\tword 1298 => T8\n", "\tword 1299 => T8\n", "\tword 1314 => T9\n", "\tline 5054894 => T9\n", "\tword 1330 => T10\n", "\tline 5054895 => T10\n", "\tword 1048 => T11\n", "\tword 1049 => T11\n", "\tline 5054871 => T11\n", "\tword 1144 => T12\n", "\tword 1150 => T13\n", "\tline 5054880 => T11\n", "\tword 1167 => T11\n", "\tline 5054882 => T15\n", "\tword 1196 => T15\n", "\tline 5054883 => T16\n", "\tword 1203 => T16\n", "\tword 1250 => T17\n", "\tword 1273 => T2\n", "\tline 5054891 => T2\n", "\tword 1274 => T2\n", "entityKind\n", "\tword 1146 => Person\n", "\tline 5054879 => Person\n", "\tline 5054889 => GPE\n", "\tword 1255 => GPE\n", "\tword 1259 => Person\n", "\tword 1260 => Person\n", "\tword 1261 => GPE\n", "\tword 1264 => Organization\n", "\tword 1265 => Organization\n", "\tline 5054890 => GPE\n", "\tword 1271 => Person\n", "\tword 1289 => Person\n", "\tline 5054892 => Organization\n", "\tword 1298 => Organization\n", "\tword 1299 => Organization\n", "\tword 1314 => GPE\n", "\tline 5054894 => GPE\n", "\tword 1330 => GPE\n", "\tline 5054895 => GPE\n", "\tword 1048 => Person\n", "\tword 1049 => Person\n", "\tline 5054871 => Person\n", "\tword 1144 => Person\n", "\tword 1150 => Person\n", "\tline 5054880 => Person\n", "\tword 1167 => Person\n", "\tline 5054882 => GPE\n", "\tword 1196 => GPE\n", "\tline 5054883 => Person\n", "\tword 1203 => Person\n", "\tword 1250 => GPE\n", "\tword 1273 => GPE\n", "\tline 5054891 => GPE\n", "\tword 1274 => GPE\n", "entityComment\n", "\tline 5054889 => Amboina\n", "\tword 1255 => Ternate\n", "\tword 1250 => Amboina\n", "\tword 1273 => Ternate\n", "\tline 5054890 => Ternate\n", "\tline 5054891 => Ternate\n", "\tword 1274 => Ternate\n" ] } ], "source": [ "for (feat, data) in features.items():\n", "    print(feat)\n", "    for (node, value) in data.items():\n", "        print(f\"\\t{F.otype.v(node)} {node} => {value}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that we assign entity features to line nodes 
as well.\n", "\n", "If that is undesired, we should not have instructed the Recorder to `rec.add(ln)` above." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Saving data\n", "\n", "The [documentation](https://annotation.github.io/text-fabric/tf/core/fabric.html#tf.core.fabric.FabricCore.save)\n", "explains how to save this data into Text-Fabric data files.\n", "\n", "We choose a location to save it in: the `exercises` directory next to this notebook." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ORG='CLARIAH' REPO='wp6-missieven' VERSION='0.4'\n" ] } ], "source": [ "GITHUB = os.path.expanduser(\"~/github\")\n", "ORG = A.context.org\n", "REPO = A.context.repo\n", "PATH = \"exercises\"\n", "VERSION = A.version\n", "\n", "print(f\"{ORG=} {REPO=} {VERSION=}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note the version: we have built the features against a specific version of the data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Later on, we pass this version on, so that users of our data will get the shared data in exactly the same version as their core data." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have to specify a bit of metadata for these features:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "metaData = {\n", "    \"entityId\": dict(\n", "        valueType=\"str\",\n", "        description=\"identifier of a named entity\",\n", "        creator=\"Dirk Roorda\",\n", "    ),\n", "    \"entityKind\": dict(\n", "        valueType=\"str\",\n", "        description=\"kind of a named entity\",\n", "        creator=\"Dirk Roorda\",\n", "    ),\n", "    \"entityComment\": dict(\n", "        valueType=\"str\",\n", "        description=\"comment to a named entity\",\n", "        creator=\"Dirk Roorda\",\n", "    ),\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can give the save command:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s Exporting 3 node and 0 edge and 0 config features to ~/github/CLARIAH/wp6-missieven/exercises/entities/tf/0.4:\n", " | 0.00s T entityComment to ~/github/CLARIAH/wp6-missieven/exercises/entities/tf/0.4\n", " | 0.00s T entityId to ~/github/CLARIAH/wp6-missieven/exercises/entities/tf/0.4\n", " | 0.00s T entityKind to ~/github/CLARIAH/wp6-missieven/exercises/entities/tf/0.4\n", " 0.00s Exported 3 node features and 0 edge features and 0 config features to ~/github/CLARIAH/wp6-missieven/exercises/entities/tf/0.4\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "location = f\"{GITHUB}/{ORG}/{REPO}/{PATH}/entities/tf\"\n", "TF.save(nodeFeatures=features, metaData=metaData, location=location, module=VERSION, silent=\"auto\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Migrating\n", "\n", "We now migrate these annotations to the current version (1.0), which differs from 0.4 in that the footnote texts have been drawn into\n", "the main text. 
\n", "We use the mapping from 0.4 nodes to 1.0 nodes, which is available as an edge feature `omap#0.4-1.0` in the\n", "current version of the dataset.\n", "\n", "We load both the old and new versions of the dataset." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "entityModule = \"clariah/wp6-missieven/exercises/entities/tf\"\n", "va = \"0.4\"\n", "\n", "A = {}" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "TF-app: ~/text-fabric-data/github/annotation/app-missieven/code" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/clariah/wp6-missieven/tf/0.4" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/clariah/wp6-missieven/exercises/entities/tf/0.4" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Text-Fabric: Text-Fabric API 10.2.6, clariah/wp6-missieven/app v3, Search Reference
Data: WP6-MISSIEVEN, Character table, Feature docs
Features:
\n", "
clariah/wp6-missieven/exercises/entities/tf\n", "
\n", "\n", "
\n", "
\n", "entityComment\n", "
\n", "
str
\n", "\n", " comment to a named entity\n", "\n", "
\n", "\n", "
\n", "
\n", "entityId\n", "
\n", "
str
\n", "\n", " identifier of a named entity\n", "\n", "
\n", "\n", "
\n", "
\n", "entityKind\n", "
\n", "
str
\n", "\n", " kind of a named entity\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
General Missives Dutch East India Company 1600-1800\n", "
\n", "\n", "
\n", "
\n", "author\n", "
\n", "
str
\n", "\n", " authors of the letter, surnames only\n", "\n", "
\n", "\n", "
\n", "
\n", "authorFull\n", "
\n", "
str
\n", "\n", " authors of the letter, full names\n", "\n", "
\n", "\n", "
\n", "
\n", "col\n", "
\n", "
int
\n", "\n", " column number of a column in a row in a table\n", "\n", "
\n", "\n", "
\n", "
\n", "day\n", "
\n", "
int
\n", "\n", " day part of the date of the letter\n", "\n", "
\n", "\n", "
\n", "
\n", "emph\n", "
\n", "
str
\n", "\n", " whether a word is emphasized by typography\n", "\n", "
\n", "\n", "
\n", "
\n", "facs\n", "
\n", "
str
\n", "\n", " url part of the corresponding online facsimile page; the url itself can be constructed using a hard coded template. See also the tpl feature\n", "\n", "
\n", "\n", "
\n", "
\n", "fnote\n", "
\n", "
str
\n", "\n", " all footnotes at that position\n", "\n", "
\n", "\n", "
\n", "
\n", "folio\n", "
\n", "
int
\n", "\n", " a folio reference\n", "\n", "
\n", "\n", "
\n", "
\n", "month\n", "
\n", "
int
\n", "\n", " month part of the date of the letter\n", "\n", "
\n", "\n", "
\n", "
\n", "n\n", "
\n", "
int
\n", "\n", " number of a volume, letter, page, para, line, table\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "page\n", "
\n", "
str
\n", "\n", " number of the first page of this letter in this volume\n", "\n", "
\n", "\n", "
\n", "
\n", "place\n", "
\n", "
str
\n", "\n", " place from where the letter was sent\n", "\n", "
\n", "\n", "
\n", "
\n", "punc\n", "
\n", "
str
\n", "\n", " punctuation and/or whitespace following a wordup to the next word\n", "\n", "
\n", "\n", "
\n", "
\n", "punco\n", "
\n", "
str
\n", "\n", " punctuation and/or whitespace following a word,up to the next word, original text only\n", "\n", "
\n", "\n", "
\n", "
\n", "puncr\n", "
\n", "
str
\n", "\n", " punctuation and/or whitespace following a word,up to the next word, remark text only\n", "\n", "
\n", "\n", "
\n", "
\n", "rawdate\n", "
\n", "
str
\n", "\n", " the date the letter was sent\n", "\n", "
\n", "\n", "
\n", "
\n", "ref\n", "
\n", "
str
\n", "\n", " whether a word belongs to the text of reference\n", "\n", "
\n", "\n", "
\n", "
\n", "remark\n", "
\n", "
int
\n", "\n", " whether a word belongs to the text of editorial remarks\n", "\n", "
\n", "\n", "
\n", "
\n", "row\n", "
\n", "
int
\n", "\n", " row number of a row of column in a table\n", "\n", "
\n", "\n", "
\n", "
\n", "seq\n", "
\n", "
str
\n", "\n", " ('sequence number of this letter among the letters of the same author in this volume',)\n", "\n", "
\n", "\n", "
\n", "
\n", "special\n", "
\n", "
str
\n", "\n", " whether a word has special typography possibly with OCR mistakes as well\n", "\n", "
\n", "\n", "
\n", "
\n", "status\n", "
\n", "
str
\n", "\n", " status of the letter, e.g. secret, copy\n", "\n", "
\n", "\n", "
\n", "
\n", "sub\n", "
\n", "
str
\n", "\n", " whether a word has subscript typography possibly indicating the denominator of a fraction\n", "\n", "
\n", "\n", "
\n", "
\n", "super\n", "
\n", "
str
\n", "\n", " whether a word has superscript typography possibly indicating the numerator of a fraction\n", "\n", "
\n", "\n", "
\n", "
\n", "title\n", "
\n", "
str
\n", "\n", " title of the letter\n", "\n", "
\n", "\n", "
\n", "
\n", "tpl\n", "
\n", "
int
\n", "\n", " url template number of the corresponding online facsimile page;the url itself can be constructed using this template, filled with the contents of the facs attribute.\n", "\n", "
\n", "\n", "
\n", "
\n", "trans\n", "
\n", "
str
\n", "\n", " transcription of a word\n", "\n", "
\n", "\n", "
\n", "
\n", "transo\n", "
\n", "
str
\n", "\n", " transcription of a word, only for original text\n", "\n", "
\n", "\n", "
\n", "
\n", "transr\n", "
\n", "
str
\n", "\n", " transcription of a word, only for remark text\n", "\n", "
\n", "\n", "
\n", "
\n", "und\n", "
\n", "
str
\n", "\n", " whether a word is underlined by typography\n", "\n", "
\n", "\n", "
\n", "
\n", "vol\n", "
\n", "
int
\n", "\n", " volume number\n", "\n", "
\n", "\n", "
\n", "
\n", "year\n", "
\n", "
int
\n", "\n", " year part of the date of the letter\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A[va] = use(f\"CLARIAH/wp6-missieven:v{va}\", mod=f\"{entityModule}:clone\", checkout=\"clone\", version=va,\n", " legacy=True, provenanceSpec=dict(org=\"clariah\", repo=\"wp6-missieven\"))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "TF-app: ~/github/CLARIAH/wp6-missieven/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/CLARIAH/wp6-missieven/tf/1.0" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Text-Fabric: Text-Fabric API 10.2.6, CLARIAH/wp6-missieven/app v3, Search Reference
Data: WP6-MISSIEVEN, Character table, Feature docs
Features:
\n", "
General Missives Dutch East India Company 1600-1800\n", "
\n", "\n", "
\n", "
\n", "author\n", "
\n", "
str
\n", "\n", " authors of the letter, surnames only\n", "\n", "
\n", "\n", "
\n", "
\n", "authorFull\n", "
\n", "
str
\n", "\n", " authors of the letter, full names\n", "\n", "
\n", "\n", "
\n", "
\n", "col\n", "
\n", "
int
\n", "\n", " column number of a column in a row in a table\n", "\n", "
\n", "\n", "
\n", "
\n", "day\n", "
\n", "
int
\n", "\n", " day part of the date of the letter\n", "\n", "
\n", "\n", "
\n", "
\n", "entityId\n", "
\n", "
str
\n", "\n", " identifier of a named entity\n", "\n", "
\n", "\n", "
\n", "
\n", "entityKind\n", "
\n", "
str
\n", "\n", " kind of a named entity\n", "\n", "
\n", "\n", "
\n", "
\n", "isden\n", "
\n", "
int
\n", "\n", " whether a word is the denominator in fraction, e.g. 4 in 1/4\n", "\n", "
\n", "\n", "
\n", "
\n", "isemph\n", "
\n", "
str
\n", "\n", " whether a word is emphasized by typography\n", "\n", "
\n", "\n", "
\n", "
\n", "isfolio\n", "
\n", "
int
\n", "\n", " a folio reference\n", "\n", "
\n", "\n", "
\n", "
\n", "isnote\n", "
\n", "
int
\n", "\n", " whether a word belongs to footnote text\n", "\n", "
\n", "\n", "
\n", "
\n", "isnum\n", "
\n", "
int
\n", "\n", " whether a word is the numerator in fraction, e.g. 1 in 1/4\n", "\n", "
\n", "\n", "
\n", "
\n", "isorig\n", "
\n", "
int
\n", "\n", " whether a word belongs to original text\n", "\n", "
\n", "\n", "
\n", "
\n", "isq\n", "
\n", "
int
\n", "\n", " whether a word is a numerical fraction, e.g. 1/4\n", "\n", "
\n", "\n", "
\n", "
\n", "isref\n", "
\n", "
int
\n", "\n", " whether a word belongs to the text of reference\n", "\n", "
\n", "\n", "
\n", "
\n", "isremark\n", "
\n", "
int
\n", "\n", " whether a word belongs to the text of editorial remarks\n", "\n", "
\n", "\n", "
\n", "
\n", "isspecial\n", "
\n", "
int
\n", "\n", " whether a word has special typography possibly with OCR mistakes as well\n", "\n", "
\n", "\n", "
\n", "
\n", "issub\n", "
\n", "
int
\n", "\n", " whether a word has subscript typography possibly indicating the denominator of a fraction\n", "\n", "
\n", "\n", "
\n", "
\n", "issuper\n", "
\n", "
int
\n", "\n", " whether a word has superscript typography possibly indicating the numerator of a fraction\n", "\n", "
\n", "\n", "
\n", "
\n", "isund\n", "
\n", "
str
\n", "\n", " whether a word is underlined by typography\n", "\n", "
\n", "\n", "
\n", "
\n", "mark\n", "
\n", "
int
\n", "\n", " footnote mark (not necessarily the same as shown on the printed page\n", "\n", "
\n", "\n", "
\n", "
\n", "month\n", "
\n", "
int
\n", "\n", " month part of the date of the letter\n", "\n", "
\n", "\n", "
\n", "
\n", "n\n", "
\n", "
int
\n", "\n", " number of a volume, letter, page, para, line, table\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "page\n", "
\n", "
str
\n", "\n", " number of the first page of this letter in this volume\n", "\n", "
\n", "\n", "
\n", "
\n", "place\n", "
\n", "
str
\n", "\n", " place from where the letter was sent\n", "\n", "
\n", "\n", "
\n", "
\n", "punc\n", "
\n", "
str
\n", "\n", " punctuation and/or whitespace following a wordup to the next word\n", "\n", "
\n", "\n", "
\n", "
\n", "puncn\n", "
\n", "
str
\n", "\n", " punctuation and/or whitespace following a word,up to the next word, footnote text only\n", "\n", "
\n", "\n", "
\n", "
\n", "punco\n", "
\n", "
str
\n", "\n", " punctuation and/or whitespace following a word,up to the next word, original text only\n", "\n", "
\n", "\n", "
\n", "
\n", "puncr\n", "
\n", "
str
\n", "\n", " punctuation and/or whitespace following a word,up to the next word, remark text only\n", "\n", "
\n", "\n", "
\n", "
\n", "rawdate\n", "
\n", "
str
\n", "\n", " the date the letter was sent\n", "\n", "
\n", "\n", "
\n", "
\n", "row\n", "
\n", "
int
\n", "\n", " row number of a row of column in a table\n", "\n", "
\n", "\n", "
\n", "
\n", "seq\n", "
\n", "
str
\n", "\n", " ('sequence number of this letter among the letters of the same author in this volume',)\n", "\n", "
\n", "\n", "
\n", "
\n", "status\n", "
\n", "
str
\n", "\n", " status of the letter, e.g. secret, copy\n", "\n", "
\n", "\n", "
\n", "
\n", "title\n", "
\n", "
str
\n", "\n", " title of the letter\n", "\n", "
\n", "\n", "
\n", "
\n", "trans\n", "
\n", "
str
\n", "\n", " transcription of a word\n", "\n", "
\n", "\n", "
\n", "
\n", "transn\n", "
\n", "
str
\n", "\n", " transcription of a word, only for footnote text\n", "\n", "
\n", "\n", "
\n", "
\n", "transo\n", "
\n", "
str
\n", "\n", " transcription of a word, only for original text\n", "\n", "
\n", "\n", "
\n", "
\n", "transr\n", "
\n", "
str
\n", "\n", " transcription of a word, only for remark text\n", "\n", "
\n", "\n", "
\n", "
\n", "vol\n", "
\n", "
int
\n", "\n", " volume number\n", "\n", "
\n", "\n", "
\n", "
\n", "weblink\n", "
\n", "
str
\n", "\n", " the page-specific part of web links for page nodes\n", "\n", "
\n", "\n", "
\n", "
\n", "x\n", "
\n", "
int
\n", "\n", " column offset of a column in a row in a table\n", "\n", "
\n", "\n", "
\n", "
\n", "year\n", "
\n", "
int
\n", "\n", " year part of the date of the letter\n", "\n", "
\n", "\n", "
\n", "
\n", "note\n", "
\n", "
none
\n", "\n", " edge between a word and the footnotes associated with it\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "Alater = use(\"CLARIAH/wp6-missieven:clone\", checkout=\"clone\")\n", "vb = Alater.version\n", "A[vb] = Alater" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can use the function\n", "[migrateFeatures](https://annotation.github.io/text-fabric/tf/dataset/nodemaps.html#tf.dataset.nodemaps.Versions.migrateFeatures)\n", "from TF to migrate our features.\n", "See also\n", "[nodeMaps](https://annotation.github.io/text-fabric/tf/dataset/nodemaps.html)." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "V = Versions({va: A[va].api, vb: A[vb].api}, va, vb)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.46s start migrating\n", " 1.90s All additional features loaded - for details use TF.isLoaded()\n", " 1.90s Mapping entityComment (node)\n", " 1.90s Mapping entityId (node)\n", " 1.90s Mapping entityKind (node)\n", " 0.00s Exporting 3 node and 0 edge and 0 config features to ~/github/CLARIAH/wp6-missieven/exercises/entities/tf/1.0:\n", " | 0.00s T entityComment to ~/github/CLARIAH/wp6-missieven/exercises/entities/tf/1.0\n", " | 0.00s T entityId to ~/github/CLARIAH/wp6-missieven/exercises/entities/tf/1.0\n", " | 0.00s T entityKind to ~/github/CLARIAH/wp6-missieven/exercises/entities/tf/1.0\n", " 0.00s Exported 3 node features and 0 edge features and 0 config features to ~/github/CLARIAH/wp6-missieven/exercises/entities/tf/1.0\n", " 0.00s Done\n" ] } ], "source": [ "features = (\"entityComment\", \"entityId\", \"entityKind\")\n", "\n", "V.migrateFeatures(features, location=location, silent=\"auto\")" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "We load the migrated features:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "lines_to_next_cell": 2 }, "outputs": [ { "data": { "text/html": [ "TF-app: ~/github/CLARIAH/wp6-missieven/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/CLARIAH/wp6-missieven/tf/1.0" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/github/clariah/wp6-missieven/exercises/entities/tf/1.0" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Text-Fabric: Text-Fabric API 10.2.6, CLARIAH/wp6-missieven/app v3, Search Reference
Data: WP6-MISSIEVEN, Character table, Feature docs
Features:
\n", "
clariah/wp6-missieven/exercises/entities/tf\n", "
\n", "\n", "
\n", "
\n", "entityComment\n", "
\n", "
str
\n", "\n", " comment to a named entity\n", "\n", "
\n", "\n", "
\n", "
\n", "entityId\n", "
\n", "
str
\n", "\n", " identifier of a named entity\n", "\n", "
\n", "\n", "
\n", "
\n", "entityKind\n", "
\n", "
str
\n", "\n", " kind of a named entity\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
General Missives Dutch East India Company 1600-1800\n", "
\n", "\n", "
\n", "
\n", "author\n", "
\n", "
str
\n", "\n", " authors of the letter, surnames only\n", "\n", "
\n", "\n", "
\n", "
\n", "authorFull\n", "
\n", "
str
\n", "\n", " authors of the letter, full names\n", "\n", "
\n", "\n", "
\n", "
\n", "col\n", "
\n", "
int
\n", "\n", " column number of a column in a row in a table\n", "\n", "
\n", "\n", "
\n", "
\n", "day\n", "
\n", "
int
\n", "\n", " day part of the date of the letter\n", "\n", "
\n", "\n", "
\n", "
\n", "isden\n", "
\n", "
int
\n", "\n", " whether a word is the denominator in fraction, e.g. 4 in 1/4\n", "\n", "
\n", "\n", "
\n", "
\n", "isemph\n", "
\n", "
str
\n", "\n", " whether a word is emphasized by typography\n", "\n", "
\n", "\n", "
\n", "
\n", "isfolio\n", "
\n", "
int
\n", "\n", " a folio reference\n", "\n", "
\n", "\n", "
\n", "
\n", "isnote\n", "
\n", "
int
\n", "\n", " whether a word belongs to footnote text\n", "\n", "
\n", "\n", "
\n", "
\n", "isnum\n", "
\n", "
int
\n", "\n", " whether a word is the numerator in fraction, e.g. 1 in 1/4\n", "\n", "
\n", "\n", "
\n", "
\n", "isorig\n", "
\n", "
int
\n", "\n", " whether a word belongs to original text\n", "\n", "
\n", "\n", "
\n", "
\n", "isq\n", "
\n", "
int
\n", "\n", " whether a word is a numerical fraction, e.g. 1/4\n", "\n", "
\n", "\n", "
\n", "
\n", "isref\n", "
\n", "
int
\n", "\n", " whether a word belongs to the text of reference\n", "\n", "
\n", "\n", "
\n", "
\n", "isremark\n", "
\n", "
int
\n", "\n", " whether a word belongs to the text of editorial remarks\n", "\n", "
\n", "\n", "
\n", "
\n", "isspecial\n", "
\n", "
int
\n", "\n", " whether a word has special typography possibly with OCR mistakes as well\n", "\n", "
\n", "\n", "
\n", "
\n", "issub\n", "
\n", "
int
\n", "\n", " whether a word has subscript typography possibly indicating the denominator of a fraction\n", "\n", "
\n", "\n", "
\n", "
\n", "issuper\n", "
\n", "
int
\n", "\n", " whether a word has superscript typography possibly indicating the numerator of a fraction\n", "\n", "
\n", "\n", "
\n", "
\n", "isund\n", "
\n", "
str
\n", "\n", " whether a word is underlined by typography\n", "\n", "
\n", "\n", "
\n", "
\n", "mark\n", "
\n", "
int
\n", "\n", " footnote mark (not necessarily the same as shown on the printed page\n", "\n", "
\n", "\n", "
\n", "
\n", "month\n", "
\n", "
int
\n", "\n", " month part of the date of the letter\n", "\n", "
\n", "\n", "
\n", "
\n", "n\n", "
\n", "
int
\n", "\n", " number of a volume, letter, page, para, line, table\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "page\n", "
\n", "
str
\n", "\n", " number of the first page of this letter in this volume\n", "\n", "
\n", "\n", "
\n", "
\n", "place\n", "
\n", "
str
\n", "\n", " place from where the letter was sent\n", "\n", "
\n", "\n", "
\n", "
\n", "punc\n", "
\n", "
str
\n", "\n", " punctuation and/or whitespace following a wordup to the next word\n", "\n", "
\n", "\n", "
\n", "
\n", "puncn\n", "
\n", "
str
\n", "\n", " punctuation and/or whitespace following a word,up to the next word, footnote text only\n", "\n", "
\n", "\n", "
\n", "
\n", "punco\n", "
\n", "
str
\n", "\n", " punctuation and/or whitespace following a word,up to the next word, original text only\n", "\n", "
\n", "\n", "
\n", "
\n", "puncr\n", "
\n", "
str
\n", "\n", " punctuation and/or whitespace following a word,up to the next word, remark text only\n", "\n", "
\n", "\n", "
\n", "
\n", "rawdate\n", "
\n", "
str
\n", "\n", " the date the letter was sent\n", "\n", "
\n", "\n", "
\n", "
\n", "row\n", "
\n", "
int
\n", "\n", " row number of a row of column in a table\n", "\n", "
\n", "\n", "
\n", "
\n", "seq\n", "
\n", "
str
\n", "\n", " ('sequence number of this letter among the letters of the same author in this volume',)\n", "\n", "
\n", "\n", "
\n", "
\n", "status\n", "
\n", "
str
\n", "\n", " status of the letter, e.g. secret, copy\n", "\n", "
\n", "\n", "
\n", "
\n", "title\n", "
\n", "
str
\n", "\n", " title of the letter\n", "\n", "
\n", "\n", "
\n", "
\n", "trans\n", "
\n", "
str
\n", "\n", " transcription of a word\n", "\n", "
\n", "\n", "
\n", "
\n", "transn\n", "
\n", "
str
\n", "\n", " transcription of a word, only for footnote text\n", "\n", "
\n", "\n", "
\n", "
\n", "transo\n", "
\n", "
str
\n", "\n", " transcription of a word, only for original text\n", "\n", "
\n", "\n", "
\n", "
\n", "transr\n", "
\n", "
str
\n", "\n", " transcription of a word, only for remark text\n", "\n", "
\n", "\n", "
\n", "
\n", "vol\n", "
\n", "
int
\n", "\n", " volume number\n", "\n", "
\n", "\n", "
\n", "
\n", "weblink\n", "
\n", "
str
\n", "\n", " the page-specific part of web links for page nodes\n", "\n", "
\n", "\n", "
\n", "
\n", "x\n", "
\n", "
int
\n", "\n", " column offset of a column in a row in a table\n", "\n", "
\n", "\n", "
\n", "
\n", "year\n", "
\n", "
int
\n", "\n", " year part of the date of the letter\n", "\n", "
\n", "\n", "
\n", "
\n", "note\n", "
\n", "
none
\n", "\n", " edge between a word and the footnotes associated with it\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A[vb] = use(\"CLARIAH/wp6-missieven:clone\", version=vb, mod=f\"{entityModule}:clone\", checkout=\"clone\")" ] }, { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 2 }, "source": [ "We compare the features in both versions" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "def showFeature(v, f):\n", " F = A[v].api.F\n", " Fs = A[v].api.Fs\n", " T = A[v].api.T\n", "\n", " for (n, val) in Fs(f).items():\n", " ntp = F.otype.v(n)\n", " print(f\"{v} {f} ({ntp:<4} {n:>8}) {val:<8} <= {T.text(n)}\")" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.4 entityComment (word 1250) Amboina <= Ambojna \n", "0.4 entityComment (word 1255) Ternate <= Ternnate \n", "0.4 entityComment (word 1273) Ternate <= ditto\n", "0.4 entityComment (word 1274) Ternate <= plaetse \n", "0.4 entityComment (line 5054889) Amboina <= Ambojna ende van daer naer Ternnate ende soo den Coninck van Spagnien tusschen\n", "0.4 entityComment (line 5054890) Ternate <= de Heeren Staeten den treves geobserveert werde, metten Coninck van ditto\n", "0.4 entityComment (line 5054891) Ternate <= plaetse te contracteeren om met sijn hulpe dese plaetse te ocuperen ende hem daer\n", "\n", "1.0 entityComment (word 1545) Amboina <= Ambojna \n", "1.0 entityComment (word 1550) Ternate <= Ternnate \n", "1.0 entityComment (word 1568) Ternate <= ditto \n", "1.0 entityComment (word 1569) Ternate <= plaetse \n", "1.0 entityComment (line 6018931) Amboina <= Ambojna ende van daer naer Ternnate ende soo den Coninck van Spagnien tusschen \n", "1.0 entityComment (line 6018932) 
Ternate <= de Heeren Staeten den treves geobserveert werde, metten Coninck van ditto \n", "1.0 entityComment (line 6018933) Ternate <= plaetse te contracteeren om met sijn hulpe dese plaetse te ocuperen ende hem daer \n", "\n", "0.4 entityId (word 1048) T11 <= orancay\n", "0.4 entityId (word 1049) T11 <= bij \n", "0.4 entityId (word 1144) T12 <= arancay \n", "0.4 entityId (word 1146) T1 <= Nera, \n", "0.4 entityId (word 1150) T13 <= sabandaer \n", "0.4 entityId (word 1167) T11 <= orancaye \n", "0.4 entityId (word 1196) T15 <= Hollanders\n", "0.4 entityId (word 1203) T16 <= Verhoeven \n", "0.4 entityId (word 1250) T17 <= Ambojna \n", "0.4 entityId (word 1255) T2 <= Ternnate \n", "0.4 entityId (word 1259) T3 <= Coninck \n", "0.4 entityId (word 1260) T3 <= van \n", "0.4 entityId (word 1261) T4 <= Spagnien \n", "0.4 entityId (word 1264) T5 <= Heeren \n", "0.4 entityId (word 1265) T5 <= Staeten \n", "0.4 entityId (word 1271) T6 <= Coninck \n", "0.4 entityId (word 1273) T2 <= ditto\n", "0.4 entityId (word 1274) T2 <= plaetse \n", "0.4 entityId (word 1289) T7 <= Coninck \n", "0.4 entityId (word 1298) T8 <= Heeren \n", "0.4 entityId (word 1299) T8 <= Staeten, \n", "0.4 entityId (word 1314) T9 <= Banda \n", "0.4 entityId (word 1330) T10 <= Engelsche \n", "0.4 entityId (line 5054871) T11 <= Op den 12 deser is een jonge slave van een orancaybij nacht comen\n", "0.4 entityId (line 5054879) T13 <= jongen een groot arancay van Nera, broeder van den sabandaer dewelcke sij\n", "0.4 entityId (line 5054880) T11 <= tref f ten ende ’t hooft ons hier in ’t casteel gebracht, den voorsz. 
orancaye was door\n", "0.4 entityId (line 5054882) T15 <= met den onsen gesproken, die hem mede geroemt hadde, 2 van onse Hollanders\n", "0.4 entityId (line 5054883) T16 <= in den moort van den admirael Verhoeven saliger omgebracht te hebben.\n", "0.4 entityId (line 5054889) T17 <= Ambojna ende van daer naer Ternnate ende soo den Coninck van Spagnien tusschen\n", "0.4 entityId (line 5054890) T2 <= de Heeren Staeten den treves geobserveert werde, metten Coninck van ditto\n", "0.4 entityId (line 5054891) T2 <= plaetse te contracteeren om met sijn hulpe dese plaetse te ocuperen ende hem daer\n", "0.4 entityId (line 5054892) T8 <= mede Coninck van te maeken onder protexie van E Mogende Heeren Staeten, doch\n", "0.4 entityId (line 5054894) T9 <= Dit volck van Banda is superbe, moordadich, wel versien van waepenen, van\n", "0.4 entityId (line 5054895) T10 <= de onsen voor desen ende van de Engelsche gecomen, dan weynich couraege omme\n", "\n", "1.0 entityId (word 1242) T11 <= orancay \n", "1.0 entityId (word 1252) T11 <= bij \n", "1.0 entityId (word 1351) T12 <= arancay \n", "1.0 entityId (word 1353) T1 <= Nera, \n", "1.0 entityId (word 1357) T13 <= sabandaer \n", "1.0 entityId (word 1398) T11 <= orancaye \n", "1.0 entityId (word 1427) T15 <= Hollanders \n", "1.0 entityId (word 1434) T16 <= Verhoeven \n", "1.0 entityId (word 1545) T17 <= Ambojna \n", "1.0 entityId (word 1550) T2 <= Ternnate \n", "1.0 entityId (word 1554) T3 <= Coninck \n", "1.0 entityId (word 1555) T3 <= van \n", "1.0 entityId (word 1556) T4 <= Spagnien \n", "1.0 entityId (word 1559) T5 <= Heeren \n", "1.0 entityId (word 1560) T5 <= Staeten \n", "1.0 entityId (word 1566) T6 <= Coninck \n", "1.0 entityId (word 1568) T2 <= ditto \n", "1.0 entityId (word 1569) T2 <= plaetse \n", "1.0 entityId (word 1584) T7 <= Coninck \n", "1.0 entityId (word 1593) T8 <= Heeren \n", "1.0 entityId (word 1594) T8 <= Staeten, \n", "1.0 entityId (word 1653) T9 <= Banda \n", "1.0 entityId (word 1669) T10 <= Engelsche \n", "1.0 
entityId (line 6018904) T11 <= Op den 12 deser is een jonge slave van een orancay Orangkaja, hier aanduiding voor een Bandanees hoofd of aanzienlijke. \n", "1.0 entityId (line 6018905) T11 <= bij nacht comen \n", "1.0 entityId (line 6018914) T13 <= jongen een groot arancay van Nera, broeder van den sabandaer Sjahbandar, uit het Perzisch overgenomen woord, in Zuidoost-Azië gebruikt voor \n", "1.0 entityId (line 6018916) T13 <= dewelcke sij \n", "1.0 entityId (line 6018917) T11 <= tref f ten ende ’t hooft ons hier in ’t casteel gebracht, den voorsz. orancaye was door \n", "1.0 entityId (line 6018919) T15 <= met den onsen gesproken, die hem mede geroemt hadde, 2 van onse Hollanders \n", "1.0 entityId (line 6018920) T16 <= in den moort van den admirael Verhoeven Admiraal Pieter Willemsz. Verhoeff kwam 23 november 1608 met zijn vloot voor \n", "1.0 entityId (line 6018925) T16 <= saliger omgebracht te hebben. \n", "1.0 entityId (line 6018931) T17 <= Ambojna ende van daer naer Ternnate ende soo den Coninck van Spagnien tusschen \n", "1.0 entityId (line 6018932) T2 <= de Heeren Staeten den treves geobserveert werde, metten Coninck van ditto \n", "1.0 entityId (line 6018933) T2 <= plaetse te contracteeren om met sijn hulpe dese plaetse te ocuperen ende hem daer \n", "1.0 entityId (line 6018934) T8 <= mede Coninck van te maeken onder protexie van E Mogende Heeren Staeten, doch \n", "1.0 entityId (line 6018938) T9 <= \n", "1.0 entityId (line 6018939) T9 <= Dit volck van Banda is superbe, moordadich, wel versien van waepenen, van \n", "1.0 entityId (line 6018940) T10 <= de onsen voor desen ende van de Engelsche gecomen, dan weynich couraege omme \n", "\n", "0.4 entityKind (word 1048) Person <= orancay\n", "0.4 entityKind (word 1049) Person <= bij \n", "0.4 entityKind (word 1144) Person <= arancay \n", "0.4 entityKind (word 1146) Person <= Nera, \n", "0.4 entityKind (word 1150) Person <= sabandaer \n", "0.4 entityKind (word 1167) Person <= orancaye \n", "0.4 entityKind (word 
1196) GPE <= Hollanders\n", "0.4 entityKind (word 1203) Person <= Verhoeven \n", "0.4 entityKind (word 1250) GPE <= Ambojna \n", "0.4 entityKind (word 1255) GPE <= Ternnate \n", "0.4 entityKind (word 1259) Person <= Coninck \n", "0.4 entityKind (word 1260) Person <= van \n", "0.4 entityKind (word 1261) GPE <= Spagnien \n", "0.4 entityKind (word 1264) Organization <= Heeren \n", "0.4 entityKind (word 1265) Organization <= Staeten \n", "0.4 entityKind (word 1271) Person <= Coninck \n", "0.4 entityKind (word 1273) GPE <= ditto\n", "0.4 entityKind (word 1274) GPE <= plaetse \n", "0.4 entityKind (word 1289) Person <= Coninck \n", "0.4 entityKind (word 1298) Organization <= Heeren \n", "0.4 entityKind (word 1299) Organization <= Staeten, \n", "0.4 entityKind (word 1314) GPE <= Banda \n", "0.4 entityKind (word 1330) GPE <= Engelsche \n", "0.4 entityKind (line 5054871) Person <= Op den 12 deser is een jonge slave van een orancaybij nacht comen\n", "0.4 entityKind (line 5054879) Person <= jongen een groot arancay van Nera, broeder van den sabandaer dewelcke sij\n", "0.4 entityKind (line 5054880) Person <= tref f ten ende ’t hooft ons hier in ’t casteel gebracht, den voorsz. 
orancaye was door\n", "0.4 entityKind (line 5054882) GPE <= met den onsen gesproken, die hem mede geroemt hadde, 2 van onse Hollanders\n", "0.4 entityKind (line 5054883) Person <= in den moort van den admirael Verhoeven saliger omgebracht te hebben.\n", "0.4 entityKind (line 5054889) GPE <= Ambojna ende van daer naer Ternnate ende soo den Coninck van Spagnien tusschen\n", "0.4 entityKind (line 5054890) GPE <= de Heeren Staeten den treves geobserveert werde, metten Coninck van ditto\n", "0.4 entityKind (line 5054891) GPE <= plaetse te contracteeren om met sijn hulpe dese plaetse te ocuperen ende hem daer\n", "0.4 entityKind (line 5054892) Organization <= mede Coninck van te maeken onder protexie van E Mogende Heeren Staeten, doch\n", "0.4 entityKind (line 5054894) GPE <= Dit volck van Banda is superbe, moordadich, wel versien van waepenen, van\n", "0.4 entityKind (line 5054895) GPE <= de onsen voor desen ende van de Engelsche gecomen, dan weynich couraege omme\n", "\n", "1.0 entityKind (word 1242) Person <= orancay \n", "1.0 entityKind (word 1252) Person <= bij \n", "1.0 entityKind (word 1351) Person <= arancay \n", "1.0 entityKind (word 1353) Person <= Nera, \n", "1.0 entityKind (word 1357) Person <= sabandaer \n", "1.0 entityKind (word 1398) Person <= orancaye \n", "1.0 entityKind (word 1427) GPE <= Hollanders \n", "1.0 entityKind (word 1434) Person <= Verhoeven \n", "1.0 entityKind (word 1545) GPE <= Ambojna \n", "1.0 entityKind (word 1550) GPE <= Ternnate \n", "1.0 entityKind (word 1554) Person <= Coninck \n", "1.0 entityKind (word 1555) Person <= van \n", "1.0 entityKind (word 1556) GPE <= Spagnien \n", "1.0 entityKind (word 1559) Organization <= Heeren \n", "1.0 entityKind (word 1560) Organization <= Staeten \n", "1.0 entityKind (word 1566) Person <= Coninck \n", "1.0 entityKind (word 1568) GPE <= ditto \n", "1.0 entityKind (word 1569) GPE <= plaetse \n", "1.0 entityKind (word 1584) Person <= Coninck \n", "1.0 entityKind (word 1593) Organization <= Heeren \n", 
"1.0 entityKind (word 1594) Organization <= Staeten, \n", "1.0 entityKind (word 1653) GPE <= Banda \n", "1.0 entityKind (word 1669) GPE <= Engelsche \n", "1.0 entityKind (line 6018904) Person <= Op den 12 deser is een jonge slave van een orancay Orangkaja, hier aanduiding voor een Bandanees hoofd of aanzienlijke. \n", "1.0 entityKind (line 6018905) Person <= bij nacht comen \n", "1.0 entityKind (line 6018914) Person <= jongen een groot arancay van Nera, broeder van den sabandaer Sjahbandar, uit het Perzisch overgenomen woord, in Zuidoost-Azië gebruikt voor \n", "1.0 entityKind (line 6018916) Person <= dewelcke sij \n", "1.0 entityKind (line 6018917) Person <= tref f ten ende ’t hooft ons hier in ’t casteel gebracht, den voorsz. orancaye was door \n", "1.0 entityKind (line 6018919) GPE <= met den onsen gesproken, die hem mede geroemt hadde, 2 van onse Hollanders \n", "1.0 entityKind (line 6018920) Person <= in den moort van den admirael Verhoeven Admiraal Pieter Willemsz. Verhoeff kwam 23 november 1608 met zijn vloot voor \n", "1.0 entityKind (line 6018925) Person <= saliger omgebracht te hebben. 
\n", "1.0 entityKind (line 6018931) GPE <= Ambojna ende van daer naer Ternnate ende soo den Coninck van Spagnien tusschen \n", "1.0 entityKind (line 6018932) GPE <= de Heeren Staeten den treves geobserveert werde, metten Coninck van ditto \n", "1.0 entityKind (line 6018933) GPE <= plaetse te contracteeren om met sijn hulpe dese plaetse te ocuperen ende hem daer \n", "1.0 entityKind (line 6018934) Organization <= mede Coninck van te maeken onder protexie van E Mogende Heeren Staeten, doch \n", "1.0 entityKind (line 6018938) GPE <= \n", "1.0 entityKind (line 6018939) GPE <= Dit volck van Banda is superbe, moordadich, wel versien van waepenen, van \n", "1.0 entityKind (line 6018940) GPE <= de onsen voor desen ende van de Engelsche gecomen, dan weynich couraege omme \n", "\n" ] } ], "source": [ "for f in features:\n", " showFeature(va, f)\n", " print(\"\")\n", " showFeature(vb, f)\n", " print(\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Sharing\n", "\n", "In [share](share.ipynb) we show how we can share and reuse these features." 
] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "---\n", "\n", "# Contents\n", "\n", "* **[start](start.ipynb)** start computing with this corpus\n", "* **[search](search.ipynb)** turbocharge your hand-coding with search templates\n", "* **[compute](compute.ipynb)** sink down a level and compute it yourself\n", "* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results\n", "* **annotate** export text, annotate with BRAT, import annotations\n", "* **[share](share.ipynb)** draw in other people's data and let them use yours\n", "* **[entities](entities.ipynb)** use results of third-party NER (named entity recognition)\n", "* **[volumes](volumes.ipynb)** work with selected volumes only\n", "\n", "CC-BY Dirk Roorda" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.7" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }