"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"NL = \"\\n\"\n",
"\n",
"mdHead = f\"\"\"\n",
"{\" | \".join(features)}\n",
"{\" | \".join(\"---\" for _ in features)}\n",
"\"\"\"\n",
"\n",
"mdData = \"\\n\".join(\n",
" f\"\"\"{\" | \".join(str(c or \"\").replace(NL, \" \") for c in row)}\"\"\" for row in table\n",
")\n",
"\n",
"A.dm(f\"\"\"{mdHead}{mdData}\"\"\")"
]
},
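{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Markdown-building cell above boils down to a small, corpus-independent recipe: a header row of feature names, a separator row with one `---` per column, and one pipe-separated row per result. As a minimal sketch (the function name `md_table_lines` is hypothetical, and it does not call Text-Fabric at all), the same idea in plain Python could look like this. Instead of `.replace(NL, ' ')` it uses `split()` and `join` to collapse all whitespace, including line breaks, into single spaces, which keeps every row on one Markdown line.

```python
def md_table_lines(features, table):
    # Header row: the feature names, separated by pipes.
    head = ' | '.join(features)
    # Separator row: one --- per column turns the first row into a header.
    sep = ' | '.join('---' for _ in features)

    def cell(value):
        # Falsy values (None, empty string, 0) become empty cells, as with
        # str(c or '') in the notebook; split/join collapses whitespace,
        # including line breaks, to single spaces.
        return ' '.join(str(value or '').split())

    rows = [' | '.join(cell(c) for c in row) for row in table]
    return [head, sep] + rows


lines = md_table_lines(['trans', 'punc'], [('door', ' '), ('Bijapur', '; ')])
for line in lines:
    print(line)
```

Joining the returned lines with newlines gives Markdown source that could be handed to a Markdown renderer such as `A.dm`. Like `str(c or '')` in the original cell, this sketch also turns a value of `0` into an empty cell."
]
},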
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the dataset designer has put the text strings of all words into the feature `trans`;\n",
"editorial words also go into `transr`, but not into `transo`;\n",
"original words go into `transo`, but not into `transr`.\n",
"\n",
"These features exist mainly to make it possible to define the selective text formats\n",
"we have seen above."
]
},
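{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the division of labour between these three features concrete, here is a toy sketch in plain Python. The node numbers and the `WORDS` dictionary are hypothetical stand-ins for the corpus (the word forms and punctuation are taken from the display output further down). The real features and text formats are defined by the dataset itself; this only mimics the idea: `trans` gets every word, `transo` only original words, `transr` only editorial words, and a selective text format renders just the words that have a value for the chosen feature.

```python
# Hypothetical mini-corpus: node -> (text, is_original, trailing punctuation).
WORDS = {
    1: ('Wij', True, ' '),
    2: ('twijfelen', True, ', '),
    3: ('door', False, ' '),
    4: ('Sjivadji', False, ' '),
}


def derive_features(words):
    trans, transo, transr = {}, {}, {}
    for node, (text, is_orig, _punc) in words.items():
        trans[node] = text          # every word goes into trans
        if is_orig:
            transo[node] = text     # original words only
        else:
            transr[node] = text     # editorial words only
    return trans, transo, transr


def render(words, feature):
    # A selective text format: skip words that lack a value for this feature.
    return ''.join(feature[n] + words[n][2] for n in sorted(words) if n in feature)


trans, transo, transr = derive_features(WORDS)
print(render(WORDS, transo))
print(render(WORDS, transr))
```
"
]
},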
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If constructing such a table yourself is too low-level for your taste,\n",
"you can simply collect a set of nodes and feed them to a higher-level display function of Text-Fabric:"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(1094113,),\n",
" (1094114,),\n",
" (1094115,),\n",
" (1094116,),\n",
" (1094117,),\n",
" (1094118,),\n",
" (1094119,),\n",
" (1094120,),\n",
" (1094121,),\n",
" (1094122,),\n",
" (1094123,),\n",
" (1094124,),\n",
" (1094125,),\n",
" (1094126,),\n",
" (1094127,),\n",
" (1094128,),\n",
" (1094129,),\n",
" (1094130,),\n",
" (1094131,),\n",
" (1094132,),\n",
" (1094133,),\n",
" (1094134,),\n",
" (1094135,),\n",
" (1094136,),\n",
" (1094137,),\n",
" (1094138,),\n",
" (1094139,),\n",
" (1094140,),\n",
" (1094141,),\n",
" (1094142,),\n",
" (1094143,),\n",
" (1094144,),\n",
" (1094145,),\n",
" (1094146,),\n",
" (1094147,),\n",
" (1094148,),\n",
" (1094149,),\n",
" (1094150,),\n",
" (1094151,),\n",
" (1094152,),\n",
" (1094153,),\n",
" (1094154,),\n",
" (1094155,),\n",
" (1094156,),\n",
" (1094157,),\n",
" (1094158,),\n",
" (1094159,),\n",
" (1094160,),\n",
" (1094161,),\n",
" (1094162,),\n",
" (1094163,),\n",
" (1094164,),\n",
" (1094165,),\n",
" (1094166,)]"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table = []\n",
"\n",
"for lno in range(lineNum - 2, lineNum + 3):\n",
" ln = T.nodeFromSection((3, 717, lno))\n",
" for w in L.d(ln, otype=\"word\"):\n",
" table.append((w,))\n",
"\n",
"table"
]
},
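{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above walks over five lines (two before and two after `lineNum`) and wraps every word node in a one-element tuple, since display functions such as `A.show` treat each tuple as one result. Stripped of Text-Fabric, the collection step can be sketched as follows. Here `words_by_line` is a hypothetical stand-in for what `T.nodeFromSection` and `L.d` provide, keyed by section tuples of the same shape as `(3, 717, lno)`; the node numbers are made up.

```python
def window_rows(words_by_line, volume, page, line, before=2, after=2):
    rows = []
    # Same window as range(lineNum - 2, lineNum + 3) in the notebook.
    for lno in range(line - before, line + after + 1):
        # Lines missing from the mapping simply contribute no words.
        for w in words_by_line.get((volume, page, lno), []):
            rows.append((w,))  # one-element tuple: a single-node result row
    return rows


# Hypothetical word nodes for a few lines of section (3, 717, ...).
words_by_line = {
    (3, 717, 1): [10, 11],
    (3, 717, 2): [12],
    (3, 717, 4): [13, 14],
}
print(window_rows(words_by_line, 3, 717, 3))
```
"
]
},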
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before we ask Text-Fabric to display this, we tell it the features we're interested in."
]
},
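{
"cell_type": "markdown",
"metadata": {},
"source": [
"Judging from the display output below, every word is shown with those extra features that have a value for it: original words carry `isorig` and `transo`, editorial words carry `transr` instead. A toy sketch of that labelling follows, with hypothetical node numbers and a hypothetical `values` dictionary standing in for the feature data; it is not how Text-Fabric implements the display internally.

```python
def feature_labels(node, feature_names, values):
    # values maps feature name -> {node: value}. Features without a value
    # for this node are left out, so each word only shows what applies to it.
    parts = []
    for name in feature_names:
        value = values.get(name, {}).get(node)
        if value is not None:
            parts.append(name + '=' + str(value))
    return ' '.join(parts)


feature_names = ['isorig', 'trans', 'transo', 'transr']
values = {
    'isorig': {1: 1},
    'trans': {1: 'Wij', 3: 'door'},
    'transo': {1: 'Wij'},
    'transr': {3: 'door'},
}
print(feature_labels(1, feature_names, values))
print(feature_labels(3, feature_names, values))
```
"
]
},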
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"line 1"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"line
punc= trans=door transr=door
punc= trans=Sjivadji transr=Sjivadji
punc= trans=hoofden transr=hoofden
punc= trans=van transr=van
punc=; trans=Bijapur transr=Bijapur
punc= trans=resident transr=resident
punc=. trans=Leendertsz transr=Leendertsz
punc= trans=raadt transr=raadt
punc= trans=aan transr=aan
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"line 2"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"line
punc= trans=hun transr=hun
punc= trans=laten transr=laten
punc=, trans=weten transr=weten
punc= trans=dat transr=dat
punc= trans=vestiging transr=vestiging
punc= trans=wordt transr=wordt
punc=, trans=opgeheven transr=opgeheven
punc= trans=indien transr=indien
punc= trans=zij transr=zij
punc= » trans=voortgaan transr=voortgaan
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"line 3"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"line 4"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"line
Wij
isorig=1 punc= trans=Wij transo=Wij
twijfelen,
isorig=1 punc=, trans=twijfelen transo=twijfelen
of
isorig=1 punc= trans=of transo=of
sij
isorig=1 punc= trans=sij transo=sij
sooveel
isorig=1 punc= trans=sooveel transo=sooveel
werck
isorig=1 punc= trans=werck transo=werck
al
isorig=1 punc= trans=al transo=al
van
isorig=1 punc= trans=van transo=van
ons
isorig=1 punc= trans=ons transo=ons
soude
isorig=1 punc= trans=soude transo=soude
maecken,
isorig=1 punc=, trans=maecken transo=maecken
omdat
isorig=1 punc= trans=omdat transo=omdat
se
isorig=1 punc= trans=se transo=se
de
isorig=1 punc= trans=de transo=de
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"line 5"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"line
Engelsen
isorig=1 punc= trans=Engelsen transo=Engelsen
ende
isorig=1 punc= trans=ende transo=ende
nu
isorig=1 punc= trans=nu transo=nu
oocq
isorig=1 punc= trans=oocq transo=oocq
de
isorig=1 punc= trans=de transo=de
Francen
isorig=1 punc= trans=Francen transo=Francen
bij
isorig=1 punc= trans=bij transo=bij
de
isorig=1 punc= trans=de transo=de
wercken
isorig=1 punc= trans=wercken transo=wercken
hebben
isorig=1 punc= trans=hebben transo=hebben
ende
isorig=1 punc= trans=ende transo=ende
van
isorig=1 punc= trans=van transo=van
dewelcke
isorig=1 punc= trans=dewelcke transo=dewelcke
sij
isorig=1 punc= trans=sij transo=sij
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"A.displaySetup(extraFeatures=features)\n",
"A.show(table, condensed=True, fmt=\"layout-full\")"
]
},
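{
"cell_type": "markdown",
"metadata": {},
"source": [
"With `condensed=True` the results are not shown one word at a time; as the output above suggests, they are grouped by the line that contains them, and each line is displayed once. The grouping step can be sketched like this, where `container_of` is a hypothetical mapping from a node to its containing line (in Text-Fabric that information would come from the corpus itself, for example via `L.u`). This sketch leaves out everything else that `A.show` does.

```python
def condense(rows, container_of):
    # Group the nodes of all result rows by their container, keeping
    # the order in which containers are first encountered.
    grouped = {}
    for row in rows:
        for node in row:
            grouped.setdefault(container_of[node], []).append(node)
    return grouped


rows = [(10,), (11,), (12,)]
container_of = {10: 'line 1', 11: 'line 1', 12: 'line 2'}
print(condense(rows, container_of))
```
"
]
},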
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This machinery really shines when it comes to displaying the results of queries.\n",
"See [search](search.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"---\n",
"\n",
"# Next steps\n",
"\n",
"By now you have an impression of how to orient yourself in the Missieven dataset.\n",
"The next steps will show you how to get more out of it: searching and computing.\n",
"\n",
"After that it is time to collect results, use them in new annotations, and share them.\n",
"\n",
"* **start** start computing with this corpus\n",
"* **[search](search.ipynb)** turbocharge your hand-coding with search templates\n",
"* **[compute](compute.ipynb)** sink down a level and compute it yourself\n",
"* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results\n",
"* **[annotate](annotate.ipynb)** export text, annotate with BRAT, import annotations\n",
"* **[share](share.ipynb)** draw in other people's data and let them use yours\n",
"* **[entities](entities.ipynb)** use results of third-party NER (named entity recognition)\n",
"* **[porting](porting.ipynb)** port features made against an older version to a newer version\n",
"* **[volumes](volumes.ipynb)** work with selected volumes only\n",
"\n",
"CC-BY Dirk Roorda"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.1"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}