{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "# Start\n", "\n", "This notebook gets you started with using\n", "[Text-Fabric](https://github.com/annotation/text-fabric) for coding in the\n", "letters of René Descartes.\n", "\n", "Familiarity with the underlying\n", "[data model](https://annotation.github.io/text-fabric/tf/about/datamodel.html)\n", "is recommended.\n", "\n", "For provenance, see the documentation:\n", "[about](https://github.com/CLARIAH/descartes-tf/blob/master/docs/about.md)." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Overview\n", "\n", "* we tell you how to get Text-Fabric on your system;\n", "* we tell you how to get the Descartes corpus on your system." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installing Text-Fabric\n", "\n", "See the [installation instructions](https://annotation.github.io/text-fabric/tf/about/install.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Running Text-Fabric\n", "\n", "We will run computer code in the cells below, and this code makes use of the\n", "Text-Fabric library, called `tf` for short.\n", "\n", "We import some standard Python modules and then we import the `use` function from Text-Fabric." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2018-05-17T09:33:07.029915Z", "start_time": "2018-05-17T09:33:07.006073Z" } }, "outputs": [], "source": [ "import sys, os\n", "from tf.app import use" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we are going to *use* the `use` function.\n", "We want to *use* a corpus, and if we specify which corpus, Text-Fabric will load the data for us.\n", "\n", "If you have cloned the `CLARIAH/descartes-tf` repository to your local machine under the directory\n", "\n", "`~/github/CLARIAH/descartes-tf`\n", "\n", "then you already have the data.\n", "In that case, call the `use` function like this:\n", "\n", "```\n", "A = use(\"CLARIAH/descartes-tf:clone\", checkout=\"clone\", hoist=globals())\n", "```\n", "\n", "Below we give the command for the case where you have not cloned the repository.\n", "Text-Fabric will fetch the data from the internet and store it in your directory\n", "\n", "`~/text-fabric-data/github/CLARIAH/descartes-tf`.\n", "\n", "In both cases, the corpus data will be optimised for fast processing, which is a one-time job." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2018-05-17T09:33:10.782586Z", "start_time": "2018-05-17T09:33:09.929360Z" } }, "outputs": [ { "data": { "text/markdown": [ "**Locating corpus resources ...**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "app: ~/text-fabric-data/github/CLARIAH/descartes-tf/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/CLARIAH/descartes-tf/tf/1.1" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/CLARIAH/descartes-tf/parallels/tf/1.1" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " Text-Fabric: Text-Fabric API 11.2.0, CLARIAH/descartes-tf/app v3, Search Reference
\n", " Data: CLARIAH - descartes-tf 1.1, Character table, Feature docs
\n", "
Node types\n", "\n", "
| Name | # of nodes | # slots/node | % coverage |
| --- | --- | --- | --- |
| volume | 8 | 85241.88 | 100 |
| letter | 725 | 940.60 | 100 |
| page | 2884 | 236.45 | 100 |
| postscriptum | 56 | 46.79 | 0 |
| opener | 545 | 1.97 | 0 |
| closer | 541 | 13.10 | 1 |
| address | 86 | 15.22 | 0 |
| head | 725 | 23.37 | 2 |
| p | 8438 | 80.82 | 100 |
| sentence | 13074 | 50.14 | 96 |
| hi | 5972 | 4.63 | 4 |
| formula | 6200 | 1.21 | 1 |
| figure | 319 | 1.00 | 0 |
| word | 681935 | 1.00 | 100 |
\n", " Sets: no custom sets
\n", " Features:
\n", "
Similar Sentences\n", "
\n", "\n", "
\n", "
\n", "sim\n", "
\n", "
int
\n", "\n", " similarity between sentences based on the Levenshtein ratio\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
Descartes = Descartes, all letters\n", "
\n", "\n", "
\n", "
\n", "alt_date\n", "
\n", "
str
\n", "\n", " alternative date of a letter\n", "\n", "
\n", "\n", "
\n", "
\n", "alt_id\n", "
\n", "
str
\n", "\n", " alternative ids of a letter, comma separated\n", "\n", "
\n", "\n", "
\n", "
\n", "cert\n", "
\n", "
str
\n", "\n", " certainty of something\n", "\n", "
\n", "\n", "
\n", "
\n", "date\n", "
\n", "
str
\n", "\n", " date of a letter\n", "\n", "
\n", "\n", "
\n", "
\n", "id\n", "
\n", "
str
\n", "\n", " id of a letter\n", "\n", "
\n", "\n", "
\n", "
\n", "intermediary\n", "
\n", "
str
\n", "\n", " person involved in the transmission of the letter from sender to receiver\n", "\n", "
\n", "\n", "
\n", "
\n", "isitalic\n", "
\n", "
str
\n", "\n", " whether the word is in italic\n", "\n", "
\n", "\n", "
\n", "
\n", "ismargin\n", "
\n", "
str
\n", "\n", " whether the word is in the margin\n", "\n", "
\n", "\n", "
\n", "
\n", "issub\n", "
\n", "
str
\n", "\n", " whether the word is in subscript\n", "\n", "
\n", "\n", "
\n", "
\n", "issup\n", "
\n", "
str
\n", "\n", " whether the word is in superscript\n", "\n", "
\n", "\n", "
\n", "
\n", "language\n", "
\n", "
str
\n", "\n", " language of a letter\n", "\n", "
\n", "\n", "
\n", "
\n", "level\n", "
\n", "
str
\n", "\n", " level of a paragraph when it acts like a heading\n", "\n", "
\n", "\n", "
\n", "
\n", "n\n", "
\n", "
int
\n", "\n", " number of whatever element\n", "\n", "
\n", "\n", "
\n", "
\n", "notation\n", "
\n", "
str
\n", "\n", " notation method of a formula\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "punc\n", "
\n", "
str
\n", "\n", " nonword chars after a word \n", "\n", "
\n", "\n", "
\n", "
\n", "recipient\n", "
\n", "
str
\n", "\n", " recipient of a letter\n", "\n", "
\n", "\n", "
\n", "
\n", "recipientloc\n", "
\n", "
str
\n", "\n", " location from where a letter was received\n", "\n", "
\n", "\n", "
\n", "
\n", "resp\n", "
\n", "
str
\n", "\n", " person responsible for something\n", "\n", "
\n", "\n", "
\n", "
\n", "sender\n", "
\n", "
str
\n", "\n", " sender of a letter\n", "\n", "
\n", "\n", "
\n", "
\n", "senderloc\n", "
\n", "
str
\n", "\n", " location from where a letter was sent\n", "\n", "
\n", "\n", "
\n", "
\n", "tex\n", "
\n", "
str
\n", "\n", " unformatted TeX code of a formula, without the `$`\n", "\n", "
\n", "\n", "
\n", "
\n", "trans\n", "
\n", "
str
\n", "\n", " transcription of a word \n", "\n", "
\n", "\n", "
\n", "
\n", "typ\n", "
\n", "
str
\n", "\n", " kind of a node; \"empty\"; \"formula\", \"head\", \"symbol\", \"illustration\"\n", "\n", "
\n", "\n", "
\n", "
\n", "url\n", "
\n", "
str
\n", "\n", " url of a graphic node\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Text-Fabric API: names N F E L T S C TF directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/CLARIAH/descartes-tf/source/illustrations" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Found 5 symbols
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Found 310 illustrations
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A = use(\"CLARIAH/descartes-tf\", hoist=globals())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# The following loads will be much quicker!\n", "\n", "Just to show the results of the optimization step: if we give the same command again,\n", "the data is loaded much quicker." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "**Locating corpus resources ...**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "app: ~/text-fabric-data/github/CLARIAH/descartes-tf/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/CLARIAH/descartes-tf/tf/1.1" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/CLARIAH/descartes-tf/parallels/tf/1.1" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " Text-Fabric: Text-Fabric API 11.1.2, CLARIAH/descartes-tf/app v3, Search Reference
\n", " Data: DESCARTES-TF, Character table, Feature docs
\n", "
Node types\n", "\n", "
| Name | # of nodes | # slots/node | % coverage |
| --- | --- | --- | --- |
| volume | 8 | 85241.88 | 100 |
| letter | 725 | 940.60 | 100 |
| page | 2884 | 236.45 | 100 |
| postscriptum | 56 | 46.79 | 0 |
| opener | 545 | 1.97 | 0 |
| closer | 541 | 13.10 | 1 |
| address | 86 | 15.22 | 0 |
| head | 725 | 23.37 | 2 |
| p | 8438 | 80.82 | 100 |
| sentence | 13074 | 50.14 | 96 |
| hi | 5972 | 4.63 | 4 |
| formula | 6200 | 1.21 | 1 |
| figure | 319 | 1.00 | 0 |
| word | 681935 | 1.00 | 100 |
\n", " Sets: no custom sets
\n", " Features:
\n", "
Similar Sentences\n", "
\n", "\n", "
\n", "
\n", "sim\n", "
\n", "
int
\n", "\n", " similarity between sentences based on the Levenshtein ratio\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
Descartes = Descartes, all letters\n", "
\n", "\n", "
\n", "
\n", "alt_date\n", "
\n", "
str
\n", "\n", " alternative date of a letter\n", "\n", "
\n", "\n", "
\n", "
\n", "alt_id\n", "
\n", "
str
\n", "\n", " alternative ids of a letter, comma separated\n", "\n", "
\n", "\n", "
\n", "
\n", "cert\n", "
\n", "
str
\n", "\n", " certainty of something\n", "\n", "
\n", "\n", "
\n", "
\n", "date\n", "
\n", "
str
\n", "\n", " date of a letter\n", "\n", "
\n", "\n", "
\n", "
\n", "id\n", "
\n", "
str
\n", "\n", " id of a letter\n", "\n", "
\n", "\n", "
\n", "
\n", "intermediary\n", "
\n", "
str
\n", "\n", " person involved in the transmission of the letter from sender to receiver\n", "\n", "
\n", "\n", "
\n", "
\n", "isitalic\n", "
\n", "
str
\n", "\n", " whether the word is in italic\n", "\n", "
\n", "\n", "
\n", "
\n", "ismargin\n", "
\n", "
str
\n", "\n", " whether the word is in the margin\n", "\n", "
\n", "\n", "
\n", "
\n", "issub\n", "
\n", "
str
\n", "\n", " whether the word is in subscript\n", "\n", "
\n", "\n", "
\n", "
\n", "issup\n", "
\n", "
str
\n", "\n", " whether the word is in superscript\n", "\n", "
\n", "\n", "
\n", "
\n", "language\n", "
\n", "
str
\n", "\n", " language of a letter\n", "\n", "
\n", "\n", "
\n", "
\n", "level\n", "
\n", "
str
\n", "\n", " level of a paragraph when it acts like a heading\n", "\n", "
\n", "\n", "
\n", "
\n", "n\n", "
\n", "
int
\n", "\n", " number of whatever element\n", "\n", "
\n", "\n", "
\n", "
\n", "notation\n", "
\n", "
str
\n", "\n", " notation method of a formula\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "punc\n", "
\n", "
str
\n", "\n", " nonword chars after a word \n", "\n", "
\n", "\n", "
\n", "
\n", "recipient\n", "
\n", "
str
\n", "\n", " recipient of a letter\n", "\n", "
\n", "\n", "
\n", "
\n", "recipientloc\n", "
\n", "
str
\n", "\n", " location from where a letter was received\n", "\n", "
\n", "\n", "
\n", "
\n", "resp\n", "
\n", "
str
\n", "\n", " person responsible for something\n", "\n", "
\n", "\n", "
\n", "
\n", "sender\n", "
\n", "
str
\n", "\n", " sender of a letter\n", "\n", "
\n", "\n", "
\n", "
\n", "senderloc\n", "
\n", "
str
\n", "\n", " location from where a letter was sent\n", "\n", "
\n", "\n", "
\n", "
\n", "tex\n", "
\n", "
str
\n", "\n", " unformatted TeX code of a formula, without the `$`\n", "\n", "
\n", "\n", "
\n", "
\n", "trans\n", "
\n", "
str
\n", "\n", " transcription of a word \n", "\n", "
\n", "\n", "
\n", "
\n", "typ\n", "
\n", "
str
\n", "\n", " kind of a node; \"empty\"; \"formula\", \"head\", \"symbol\", \"illustration\"\n", "\n", "
\n", "\n", "
\n", "
\n", "url\n", "
\n", "
str
\n", "\n", " url of a graphic node\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Text-Fabric API: names N F E L T S C TF directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/CLARIAH/descartes-tf/source/illustrations" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Found 5 symbols
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "Found 310 illustrations
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A = use(\"CLARIAH/descartes-tf\", hoist=globals())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The output\n", "\n", "The messages after loading the corpus contain a lot of information about it.\n", "\n", "**Tip:** click the triangles and the links, and have a quick look.\n", "\n", "The **Text-Fabric** line has various links to the API docs.\n", "\n", "Under **Node types** you find statistics about the corpus.\n", "\n", "Under **Descartes = Descartes, all letters** you find the *features* of the corpus\n", "with short descriptions.\n", "\n", "This corpus has additional material: *illustrations*.\n", "They have been downloaded automatically in the process, and you see how many there are." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Highlights\n", "\n", "This corpus is special in that it has mathematical formulas and illustrations.\n", "\n", "We show some of them to whet your appetite.\n", "\n", "## Formulas\n", "\n", "There are simple formulas and complex formulas.\n", "The latter are represented as TeX codes, and will be typeset nicely.\n", "\n", "Let's find the complex ones." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2018-05-17T09:34:09.968339Z", "start_time": "2018-05-17T09:34:09.963447Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.01s 219 results\n" ] } ], "source": [ "query = \"\"\"\n", "formula notation=TeX\n", "\"\"\"\n", "\n", "results = A.search(query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's show a few." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
| n | p | formula |
| --- | --- | --- |
| 1 | 1 1046:11 | ${1\\over 3} {4\\over 9} {16\\over 27} {64\\over 81}$ |
| 2 | 1 1060:3 | $4.900x^{6} \\ {\\it aequat}\\ - 4.899x^{5} + 2.354x^{4} + 16.858x^{3} + 9.458xx + 429x - 4.900$ |
| 3 | 1 1060:9 | ${\\displaystyle\\strut {3xx - 1x}\\over \\displaystyle\\strut 2}$ |
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.table(results, end=3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see them in context as well:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

result 1" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

1 1046:11
sentence 1
Vous me demandez, en troisième lieu, comment se meut une pierre
hi
in vacuo; mais parce que vous avez oublié à mettre la figure, que vous supposez être à la marge de votre lettre, je ne puis bien entendre ce que vous proposez, et il ne me semble point que les proportions que vous mettez, se rapportent à celles que je vous ai autrefois mandées, ou au lieu de, etc., comme vous m' écrivez, je mettais
formula TeX
notation=TeX
${1\\over 3} {4\\over 9} {16\\over 27} {64\\over 81}$
, etc., ce qui donne bien d' autres conséquences.
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

1 1060:3
sentence 2
Et je trouve que la proportion, qui est entre le moindre côté du triangle
formula
ABC et le plus grand, est comme l' unité à l' une des deux racines qui peuvent être tirées de cette équation:
formula TeX
notation=TeX
$4.900x^{6} \\ {\\it aequat}\\ - 4.899x^{5} + 2.354x^{4} + 16.858x^{3} + 9.458xx + 429x - 4.900$
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 3" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

1 1060:9
sentence 1
( Lequel est nombre figuré comme 5, 12, 22, sont nombres pentagonaux et
formula TeX
notation=TeX
${\\displaystyle\\strut {3xx - 1x}\\over \\displaystyle\\strut 2}$
sont les termes d' algebra qui expriment leurs racines, et ils contiennent 6 unités).
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.show(results, end=3)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "---\n", "\n", "# Next steps\n", "\n", "By now you have an impression how to orient yourself in this corpus.\n", "The next steps will show you how to get powerful: searching and computing.\n", "\n", "After that it is time for collecting results, use them in new annotations and share them.\n", "\n", "* **start** intro and highlights\n", "* **[search](search.ipynb)** turbo charge your hand-coding with search templates\n", "* **[compute](compute.ipynb)** sink down a level and compute it yourself\n", "* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results\n", "\n", "Advanced\n", "\n", "* **[similar sentences](similar.ipynb)** find similar sentences\n", "\n", "CC-BY Dirk Roorda" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.1" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": { "height": "607px", "left": "0px", "right": "983px", "top": "110px", "width": "297px" }, "toc_section_display": "block", "toc_window_display": false }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }