<img align="right" src="images/tf.png" width="200"/>
<img align="right" src="images/huc.png" width="200"/>
<img align="right" src="images/logo.png" width="200"/>

---

To get started: consult [start](start.ipynb)

---

# Named Entities

A research group at VU University Amsterdam (Piek Vossen, Sophie Arnoult)
has applied a Named Entity Recognition (NER) algorithm to this corpus and
delivered the results as Text-Fabric features in
[cltl/voc-missives](https://github.com/cltl/voc-missives).

We can use these shared features. They are in the `export/tf` directory of that repository,
and they were produced against version `1.0` of the corpus data.

See [entityProto](entityProto.ipynb) for an exploration of these entities.

Based on that exploration, we have created `ent` nodes for entity occurrences and `entity` nodes for collections of `ent`
nodes that share the same entity id and entity kind.
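The relationship between the two node types can be pictured with a small, self-contained sketch. It assumes each occurrence is a plain `(node, eid, kind)` tuple with invented node numbers, and groups the occurrences by their `(eid, kind)` pair. It only illustrates the idea; it is not how Text-Fabric stores or computes the `entity` nodes.

```python
from collections import defaultdict

# Hypothetical occurrence records: (ent node, entity id, entity kind).
# The node numbers are invented; only the grouping logic matters here.
occurrences = [
    (101, "pieter.both", "PER"),
    (102, "japan", "LOC"),
    (103, "pieter.both", "PER"),
    (104, "japan", "LOC"),
]


def group_entities(occs):
    """Group ent nodes by their (entity id, entity kind) pair."""
    entities = defaultdict(list)
    for node, eid, kind in occs:
        # Occurrences sharing both eid and kind end up in the same group,
        # analogous to one `entity` node collecting several `ent` nodes.
        entities[(eid, kind)].append(node)
    return dict(entities)


entities = group_entities(occurrences)
for (eid, kind), nodes in entities.items():
    print(f"entity {kind} {eid} having {len(nodes)} occs: {nodes}")
```

Keying on the pair rather than on the id alone keeps occurrences apart when the same id is tagged with different kinds.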

```python
%load_ext autoreload
%autoreload 2
```

```python
from tf.app import use
```

```python
A = use("CLARIAH/wp6-missieven", checkout="latest", hoist=globals())
```

**Locating corpus resources ...**

```text
   |     1.22s T otype                from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
   |       12s T oslots               from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
   |     0.53s T transn               from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
   |       11s T punc                 from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
   |     0.99s T n                    from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
   |     6.26s T punco                from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
   |     4.51s T puncr                from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
   |     0.41s T puncn                from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
   |     0.00s T title                from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
   |     5.42s T transr               from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
```

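| Node type | # of nodes | Avg. # slots / node | % coverage |
|---|---:|---:|---:|
| volume | 14 | 426954.79 | 100 |
| letter | 607 | 9847.39 | 100 |
| page | 11215 | 532.98 | 100 |
| table | 491 | 137.91 | 1 |
| para | 34773 | 100.79 | 59 |
| remark | 24110 | 97.49 | 39 |
| head | 607 | 31.12 | 0 |
| note | 12476 | 16.88 | 4 |
| line | 526918 | 11.34 | 100 |
| row | 8350 | 8.1 | 1 |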


The following snippet shows how the `entity` and `ent` nodes hang together.

```python
firstEntity = F.otype.s("entity")[0]
entityOccurrences = E.eoccs.f(firstEntity)

print(f"entity {firstEntity} is {F.kind.v(firstEntity)} {F.eid.v(firstEntity)} having {len(entityOccurrences)} occs")

for eo in entityOccurrences:
    print(f"ent    {eo} is {F.kind.v(eo)} {F.eid.v(eo)} {A.sectionStrFromNode(eo)}")
```

```text
entity 6656750 is PER pieter.both having 20 occs
ent    6638994 is PER pieter.both 1 3:1
ent    6639002 is PER pieter.both 1 3:1
ent    6639005 is PER pieter.both 1 3:1
ent    6639023 is PER pieter.both 1 7:1
ent    6639045 is PER pieter.both 1 8:1
ent    6639063 is PER pieter.both 1 16:1
ent    6639067 is PER pieter.both 1 16:1
ent    6639073 is PER pieter.both 1 17:1
ent    6639084 is PER pieter.both 1 18:1
ent    6639086 is PER pieter.both 1 19:1
ent    6639105 is PER pieter.both 1 20:1
ent    6639109 is PER pieter.both 1 20:1
ent    6639115 is PER pieter.both 1 20:1
ent    6639116 is PER pieter.both 1 21:1
ent    6639136 is PER pieter.both 1 27:1
ent    6639138 is PER pieter.both 1 27:1
ent    6639141 is PER pieter.both 1 29:1
ent    6639169 is PER pieter.both 1 33:1
ent    6639183 is PER pieter.both 1 37:1
ent    6639217 is PER pieter.both 1 39:1
```
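Once you have the section labels of the occurrences, ordinary Python suffices to aggregate them. The sketch below is self-contained: instead of calling `A.sectionStrFromNode`, it hard-codes the 20 labels printed above and tallies how many occurrences of `pieter.both` fall under each label.

```python
from collections import Counter

# Section labels copied verbatim from the output above, one per occurrence.
sections = [
    "1 3:1", "1 3:1", "1 3:1", "1 7:1", "1 8:1", "1 16:1", "1 16:1",
    "1 17:1", "1 18:1", "1 19:1", "1 20:1", "1 20:1", "1 20:1",
    "1 21:1", "1 27:1", "1 27:1", "1 29:1", "1 33:1", "1 37:1", "1 39:1",
]

# Count occurrences per section label.
counts = Counter(sections)

print(f"{sum(counts.values())} occurrences in {len(counts)} distinct sections")
for label, n in counts.most_common(3):
    print(f"{label}: {n}")
```

With the corpus loaded, you could build the same list from the real nodes with `[A.sectionStrFromNode(eo) for eo in entityOccurrences]`.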


Here we show the
[NER API](https://annotation.github.io/text-fabric/tf/browser/ner/annotate.html)
as built into Text-Fabric.

```python
NE = A.makeNer()
```

```python
results = NE.filterContent(eVals=("japan", "LOC"))
```

```text
74 lines
```
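Conceptually, `eVals=("japan", "LOC")` selects entity occurrences by their feature values. Here we assume the tuple is ordered as (entity id, kind), matching the `eid` and `kind` features used earlier. The sketch below mimics that selection on hypothetical in-memory records. It is not Text-Fabric's implementation of `filterContent`; the `Occurrence` record, its `line` field, and `filter_by_values` are all invented for illustration.

```python
from typing import NamedTuple


class Occurrence(NamedTuple):
    """Hypothetical record for one entity occurrence."""
    node: int
    eid: str
    kind: str
    line: int  # invented line number in which the occurrence sits


occurrences = [
    Occurrence(201, "japan", "LOC", 1),
    Occurrence(202, "pieter.both", "PER", 1),
    Occurrence(203, "japan", "LOC", 2),
    Occurrence(204, "japan", "LOC", 2),
]


def filter_by_values(occs, e_vals):
    """Keep occurrences whose (eid, kind) pair equals e_vals (assumed order)."""
    return [o for o in occs if (o.eid, o.kind) == e_vals]


hits = filter_by_values(occurrences, ("japan", "LOC"))
lines = sorted({o.line for o in hits})
print(f"{len(hits)} occurrences in {len(lines)} lines")
```

Reporting distinct lines rather than raw hits keeps a line with several matches from being counted twice.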


```python
NE.showContent(results, start=20)
```

---

# Contents

* **[start](start.ipynb)** start computing with this corpus
* **[search](search.ipynb)** turbocharge your hand-coding with search templates
* **[compute](compute.ipynb)** sink down a level and compute it yourself
* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results
* **[annotate](annotate.ipynb)** export text, annotate with BRAT, import annotations
* **[share](share.ipynb)** draw in other people's data and let them use yours
* **entities** use results of third-party NER (named entity recognition)
* **[porting](porting.ipynb)** port features made against an older version to a newer version
* **[volumes](volumes.ipynb)** work with selected volumes only

CC-BY Dirk Roorda