<img align="right" src="images/tf.png" width="200"/>
<img align="right" src="images/huc.png" width="200"/>
<img align="right" src="images/logo.png" width="200"/>

---

To get started: consult [start](start.ipynb)

---

# Search Introduction

*Search* in Text-Fabric is a template-based way of looking for structural patterns in your dataset.

Within Text-Fabric we have the unique possibility to combine the ease of formulating search templates for
complicated syntactical patterns with the power of programmatically processing the results.

This notebook will show you how to get up and running.

## Easy command

Searching is as simple as this (just an example):

```python
results = A.search(template)
A.show(results)
```

See all ins and outs in the
[search template docs](https://annotation.github.io/text-fabric/tf/about/searchusage.html).

# Incantation

The ins and outs of installing Text-Fabric, getting the corpus, and initializing a notebook are
explained in the [start tutorial](start.ipynb).

```python
%load_ext autoreload
%autoreload 2
```

```python
from tf.app import use
```

```python
A = use("clariah/wp6-missieven", hoist=globals())
```

# Basic search command

We start with the simplest form of issuing a query.
Let's look for the words in volume 4, page 239, on the lines numbered below 9.

All work involved in searching takes place under the hood.
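Conceptually, a template asks for every word inside a line with `n` below 9, inside page 239, inside volume 4, and each result is a tuple with one member per template line. The toy below illustrates that reading on a made-up mini-corpus. It is a sketch, not Text-Fabric's algorithm: it assumes that "indented under" means "occupies a range of word positions (slots) within", and the objects, slot numbers and the `search` function are all hypothetical.

```python
# Toy model (hypothetical, NOT Text-Fabric's implementation): every object
# covers a range of word positions ("slots"), and a template line indented
# under another must lie inside it.

# Made-up mini-corpus: one volume, one page, two lines, five words.
corpus = [
    {"type": "volume", "n": 4, "first": 1, "last": 5},
    {"type": "page", "n": 239, "first": 1, "last": 5},
    {"type": "line", "n": 1, "first": 1, "last": 3},
    {"type": "line", "n": 9, "first": 4, "last": 5},
    {"type": "word", "trans": "IV.", "first": 1, "last": 1},
    {"type": "word", "trans": "RYCKLOFF", "first": 2, "last": 2},
    {"type": "word", "trans": "VAN", "first": 3, "last": 3},
    {"type": "word", "trans": "placeholder1", "first": 4, "last": 4},
    {"type": "word", "trans": "placeholder2", "first": 5, "last": 5},
]


def embeds(outer, inner):
    """True if inner's slot range lies within outer's slot range."""
    return outer["first"] <= inner["first"] and inner["last"] <= outer["last"]


def search(objects, template):
    """Return tuples with one object per template line, each embedded in the previous one.

    template is a list of (type, condition) pairs, one per indentation level.
    """
    def extend(partial, level):
        if level == len(template):
            yield tuple(partial)
            return
        otype, condition = template[level]
        for obj in objects:
            if obj["type"] == otype and condition(obj):
                if not partial or embeds(partial[-1], obj):
                    yield from extend(partial + [obj], level + 1)

    return list(extend([], 0))


# The template of this section, written as Python conditions.
template = [
    ("volume", lambda o: o["n"] == 4),
    ("page", lambda o: o["n"] == 239),
    ("line", lambda o: o["n"] < 9),
    ("word", lambda o: True),
]
results = search(corpus, template)
print(len(results))                       # 3
print([r[-1]["trans"] for r in results])  # ['IV.', 'RYCKLOFF', 'VAN']

# Leaving out the line level still works: words are embedded in the page directly.
no_line = [template[0], template[1], template[3]]
print(len(search(corpus, no_line)))       # 5
```

The last lines show why a template can skip levels, as the `volume`/`page`/`word isremark` query later in this notebook does: embedding is about containment, not about being a direct child.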

```python
query = """
volume n=4
  page n=239
    line n<9
      word
"""
results = A.search(query)
A.table(results, skipCols="1 2 3")
```

```
1.80s 61 results
```

| n | p | word |
| --- | --- | --- |
| 1 | 4 239:1 | IV. |
| 2 | 4 239:1 | RYCKLOFF |
| 3 | 4 239:1 | VAN |
| 4 | 4 239:1 | GOENS, |
| 5 | 4 239:1 | CORNELIS |
| 6 | 4 239:1 | SPEELMAN, |
| 7 | 4 239:1 | WILLEM |
| 8 | 4 239:1 | VAN |
| 9 | 4 239:2 | OUTHOORN, |
| 10 | 4 239:2 | JOANNES |


In the notebook, the hyperlinks in the results table take us to the online image of this page at the Huygens Institute.

Note that we can choose start and/or end points in the results list.

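```python
A.table(results, start=44, end=53, skipCols="1 2")
```

| n | p | line | word |
| --- | --- | --- | --- |
| 44 | 4 239:7 | | St. |
| 45 | 4 239:7 | | Andries |
| 46 | 4 239:7 | | kwam |
| 47 | 4 239:7 | | in |
| 48 | 4 239:8 | | slechte |
| 49 | 4 239:8 | | staat |
| 50 | 4 239:8 | | uit |
| 51 | 4 239:8 | | patria |
| 52 | 4 239:8 | | te |
| 53 | 4 239:8 | | Batavia |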

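Judging from the output, `start` and `end` count result rows from 1 and include both ends (`start=44, end=53` gives ten rows), while `skipCols` names 1-based columns of the result tuple to hide. The hypothetical helper `table_view` below mimics that reading on plain Python tuples; it is not Text-Fabric code, and the result list is made up.

```python
def table_view(results, start=1, end=None, skip_cols=""):
    """Hypothetical stand-in for the row/column selection of A.table (not TF code).

    start and end are 1-based and inclusive; skip_cols is a space-separated
    string of 1-based column numbers to hide.
    """
    end = len(results) if end is None else min(end, len(results))
    skipped = {int(c) for c in skip_cols.split()}
    view = []
    for i in range(start, end + 1):
        row = results[i - 1]  # convert the 1-based row number to a 0-based index
        visible = tuple(v for col, v in enumerate(row, start=1) if col not in skipped)
        view.append((i, visible))
    return view


# Made-up results: 61 tuples of (volume, page, line, word), eight words per line.
results = [("vol4", "p239", f"line{(i - 1) // 8 + 1}", f"word{i}") for i in range(1, 62)]

rows = table_view(results, start=44, end=53, skip_cols="1 2")
print(len(rows))  # 10
print(rows[0])    # (44, ('line6', 'word44'))
```

In plain Python, the same rows are the slice `results[start - 1:end]`.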

We can show the results more fully with `show()`.

```python
A.show(results, skipCols="1 2 3", condensed=True, condenseType="line")
```

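Our reading of `condensed=True` is that each container is displayed once, with all of its hits highlighted, instead of one display per result; `condenseType="line"` makes lines the container. That is an interpretation, not a quote from the Text-Fabric docs. The hypothetical `condense` function below sketches only the grouping step, on a few rows copied from the first results table.

```python
from collections import defaultdict


def condense(results, container_col):
    """Group result tuples by the value in container_col (0-based), keeping hit order."""
    groups = defaultdict(list)
    for row in results:
        groups[row[container_col]].append(row[-1])
    return dict(groups)


# (volume, page, line, word) rows taken from the first results table.
results = [
    ("4", "239", "1", "IV."),
    ("4", "239", "1", "RYCKLOFF"),
    ("4", "239", "2", "OUTHOORN,"),
    ("4", "239", "2", "JOANNES"),
]
print(condense(results, container_col=2))
# {'1': ['IV.', 'RYCKLOFF'], '2': ['OUTHOORN,', 'JOANNES']}
```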
Now we pick all numerical words, or rather, words that contain a digit:

```python
query = """
volume n=4
  page n=239
    line n<9
      word trans~[0-9]
"""
results = A.search(query)
A.show(results, skipCols="1 2 3", condensed=True)
```

```
2.97s 11 results
```

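The `~` in `trans~[0-9]` compares the feature value with a regular expression. Assuming it behaves like Python's `re.search` (the value only has to contain a match somewhere, it need not consist of one), the filter amounts to the sketch below. The `matches` helper and the word list are made up for illustration.

```python
import re


def matches(value, pattern):
    """True if value contains a match for pattern (re.search semantics)."""
    return value is not None and re.search(pattern, value) is not None


words = ["IV.", "1685", "f.", "20ste", "Batavia", None]  # made-up values; None = no value
print([w for w in words if matches(w, "[0-9]")])  # ['1685', '20ste']

# A whole-value match would be much stricter: no word here is a single digit.
print([w for w in words if w is not None and re.fullmatch("[0-9]", w)])  # []
```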

Let's look for all words that belong to a remark by the editor:

```python
query = """
word isremark
"""
results = A.search(query)
```

```
2.63s 2349087 results
```

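A feature name without a value or operator, as in `word isremark`, is assumed here to mean "nodes for which this feature has some value". Under that assumption the query is a simple filter; the node numbers and values below are hypothetical.

```python
# Hypothetical mapping from word node to isremark value; None means "no value".
isremark = {1: None, 2: 1, 3: 1, 4: None, 5: 1}

# A bare feature name in a template is assumed to mean "has any value".
hits = [node for node, value in isremark.items() if value is not None]
print(hits)  # [2, 3, 5]
```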

We can narrow down to the page we just inspected:

```python
query = """
volume n=4
  page n=239
    word isremark
"""
results = A.search(query)
```

```
1.72s 198 results
```


Now show the results:

```python
A.show(results, condensed=True)
```

# Special characters

How can we look for special characters?

Let's first see what special characters we have in the corpus.

```python
A = use("clariah/wp6-missieven:clone", hoist=globals())
```

```python
A.specialCharacters()
```

If you click on a character, it is copied to the clipboard.

We can search for all words with a black square:

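```python
results = A.search("""
word trans~■
""")
```

```
2.53s 37 results
```

```python
A.table(results, condensed=True)
```

| n | p | line | word |
| --- | --- | --- | --- |
| 1 | 1 31:32 | | ■ |
| 2 | 1 80:27 | | ■willen |
| 3 | 1 557:3 | | ■ |
| 4 | 2 641:42 | | ■voorsz. |
| 5 | 2 660:41 | | ■ |
| 6 | 3 118:14 | | ■witste |
| 7 | 3 662:13 | | ■ffi. |
| 8 | 4 205:4 | | ■ . . |
| 9 | 4 208:43 | | ■ |
| 10 | 4 209:7 | | ■ ' |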

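The black square is not a regular-expression metacharacter, so `trans~■` can use it literally. Characters such as `.`, `*`, `(` or `[` are different: assuming the pattern after `~` follows Python regex syntax, they need a backslash to be matched literally. Python's `re.escape` adds it; the hypothetical `containing` helper below shows the difference on a few sample strings, some of them adapted from the table above.

```python
import re


def containing(words, char):
    """Return the words that contain char literally, escaping regex metacharacters."""
    pattern = re.compile(re.escape(char))
    return [w for w in words if pattern.search(w)]


words = ["■willen", "voorsz.", "■ . .", "staat"]
print(containing(words, "■"))  # ['■willen', '■ . .']
print(containing(words, "."))  # ['voorsz.', '■ . .']

# Unescaped, "." matches any character, so every non-empty word qualifies.
print([w for w in words if re.search(".", w)])  # ['■willen', 'voorsz.', '■ . .', 'staat']
```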

---

# Contents

* **[start](start.ipynb)** start computing with this corpus
* **search** turbocharge your hand-coding with search templates
* **[compute](compute.ipynb)** sink down a level and compute it yourself
* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results
* **[annotate](annotate.ipynb)** export text, annotate with BRAT, import annotations
* **[share](share.ipynb)** draw in other people's data and let them use yours
* **[entities](entities.ipynb)** use results of third-party NER (named entity recognition)
* **[porting](porting.ipynb)** port features made against an older version to a newer version
* **[volumes](volumes.ipynb)** work with selected volumes only

CC-BY Dirk Roorda