{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "---\n", "Start with [convert](https://nbviewer.jupyter.org/github/annotation/banks/blob/master/programs/convert.ipynb)\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# The Banks example corpus as app" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from tf.app import use" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We do not only load the main corpus data, but also the additional *sim* (similarity) feature that is in a\n", "module.\n", "\n", "For the very last version, use `hot`.\n", "\n", "For the latest release, use `latest`.\n", "\n", "If you have cloned the repos (TF app and data), use `clone`.\n", "\n", "If you do not want/need to upgrade, leave out the checkout specifiers." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "rate limit is 5000 requests per hour, with 4933 left for this hour\n", "\tconnecting to online GitHub repo annotation/banks ... 
connected\n" ] }, { "data": { "text/html": [ "TF-app: ~/text-fabric-data/annotation/banks/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/annotation/banks/tf/0.2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/annotation/banks/sim/tf/0.2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "This is Text-Fabric 9.2.0\n", "Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html\n", "\n", "11 features found and 0 ignored\n" ] }, { "data": { "text/html": [ "Text-Fabric: Text-Fabric API 9.2.0, annotation/banks/app v3, Search Reference
Data: BANKS, Character table, Feature docs
Features:
\n", "
annotation/banks/sim/tf\n", "
\n", "\n", "
\n", "
\n", "sim\n", "
\n", "
int
\n", "
\n", " similarity between words, as a percentage of the common material wrt the combined material\n", "
\n", "\n", "
\n", "
converters:
\n", "
Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-06-10T19:40:37Z
\n", "
\n", "\n", "
\n", "
name:
\n", "
Banks (similar words)
\n", "
\n", "\n", "
\n", "
sourceUrl:
\n", "
https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/text-fabric/use.ipynb
\n", "
\n", "\n", "
\n", "
version:
\n", "
0.2
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "\n", "
Two quotes from Consider Phlebas by Iain M. Banks\n", "
\n", "\n", "
\n", "
\n", "author\n", "
\n", "
str
\n", "
\n", " the author of a book\n", "
\n", "\n", "
\n", "
compiler:
\n", "
Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-02-13T13:37:47Z
\n", "
\n", "\n", "
\n", "
name:
\n", "
Culture quotes from Iain Banks
\n", "
\n", "\n", "
\n", "
purpose:
\n", "
exposition
\n", "
\n", "\n", "
\n", "
source:
\n", "
Good Reads
\n", "
\n", "\n", "
\n", "
status:
\n", "
with for similarities in a separate module
\n", "
\n", "\n", "
\n", "
url:
\n", "
https://www.goodreads.com/work/quotes/14366-consider-phlebas
\n", "
\n", "\n", "
\n", "
version:
\n", "
0.2
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "gap\n", "
\n", "
int
\n", "
\n", " 1 for words that occur between [ ], which are inserted by the editor\n", "
\n", "\n", "
\n", "
compiler:
\n", "
Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-02-13T13:37:47Z
\n", "
\n", "\n", "
\n", "
name:
\n", "
Culture quotes from Iain Banks
\n", "
\n", "\n", "
\n", "
purpose:
\n", "
exposition
\n", "
\n", "\n", "
\n", "
source:
\n", "
Good Reads
\n", "
\n", "\n", "
\n", "
status:
\n", "
with for similarities in a separate module
\n", "
\n", "\n", "
\n", "
url:
\n", "
https://www.goodreads.com/work/quotes/14366-consider-phlebas
\n", "
\n", "\n", "
\n", "
version:
\n", "
0.2
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "letters\n", "
\n", "
str
\n", "
\n", " the letters of a word\n", "
\n", "\n", "
\n", "
compiler:
\n", "
Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-02-13T13:37:47Z
\n", "
\n", "\n", "
\n", "
name:
\n", "
Culture quotes from Iain Banks
\n", "
\n", "\n", "
\n", "
purpose:
\n", "
exposition
\n", "
\n", "\n", "
\n", "
source:
\n", "
Good Reads
\n", "
\n", "\n", "
\n", "
status:
\n", "
with for similarities in a separate module
\n", "
\n", "\n", "
\n", "
url:
\n", "
https://www.goodreads.com/work/quotes/14366-consider-phlebas
\n", "
\n", "\n", "
\n", "
version:
\n", "
0.2
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
int
\n", "
\n", " number of chapter, or sentence in chapter, or line in sentence\n", "
\n", "\n", "
\n", "
compiler:
\n", "
Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-02-13T13:37:47Z
\n", "
\n", "\n", "
\n", "
name:
\n", "
Culture quotes from Iain Banks
\n", "
\n", "\n", "
\n", "
purpose:
\n", "
exposition
\n", "
\n", "\n", "
\n", "
source:
\n", "
Good Reads
\n", "
\n", "\n", "
\n", "
status:
\n", "
with for similarities in a separate module
\n", "
\n", "\n", "
\n", "
url:
\n", "
https://www.goodreads.com/work/quotes/14366-consider-phlebas
\n", "
\n", "\n", "
\n", "
version:
\n", "
0.2
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "
\n", " \n", "
\n", "\n", "
\n", "
compiler:
\n", "
Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-02-13T13:37:47Z
\n", "
\n", "\n", "
\n", "
name:
\n", "
Culture quotes from Iain Banks
\n", "
\n", "\n", "
\n", "
purpose:
\n", "
exposition
\n", "
\n", "\n", "
\n", "
source:
\n", "
Good Reads
\n", "
\n", "\n", "
\n", "
status:
\n", "
with for similarities in a separate module
\n", "
\n", "\n", "
\n", "
url:
\n", "
https://www.goodreads.com/work/quotes/14366-consider-phlebas
\n", "
\n", "\n", "
\n", "
version:
\n", "
0.2
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "punc\n", "
\n", "
str
\n", "
\n", " the punctuation after a word\n", "
\n", "\n", "
\n", "
compiler:
\n", "
Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-02-13T13:37:47Z
\n", "
\n", "\n", "
\n", "
name:
\n", "
Culture quotes from Iain Banks
\n", "
\n", "\n", "
\n", "
purpose:
\n", "
exposition
\n", "
\n", "\n", "
\n", "
remark:
\n", "
a bit more info is needed
\n", "
\n", "\n", "
\n", "
source:
\n", "
Good Reads
\n", "
\n", "\n", "
\n", "
status:
\n", "
with for similarities in a separate module
\n", "
\n", "\n", "
\n", "
url:
\n", "
https://www.goodreads.com/work/quotes/14366-consider-phlebas
\n", "
\n", "\n", "
\n", "
version:
\n", "
0.2
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "terminator\n", "
\n", "
str
\n", "
\n", " the last character of a line\n", "
\n", "\n", "
\n", "
compiler:
\n", "
Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-02-13T13:37:47Z
\n", "
\n", "\n", "
\n", "
name:
\n", "
Culture quotes from Iain Banks
\n", "
\n", "\n", "
\n", "
purpose:
\n", "
exposition
\n", "
\n", "\n", "
\n", "
source:
\n", "
Good Reads
\n", "
\n", "\n", "
\n", "
status:
\n", "
with for similarities in a separate module
\n", "
\n", "\n", "
\n", "
url:
\n", "
https://www.goodreads.com/work/quotes/14366-consider-phlebas
\n", "
\n", "\n", "
\n", "
version:
\n", "
0.2
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "title\n", "
\n", "
str
\n", "
\n", " the title of a book\n", "
\n", "\n", "
\n", "
compiler:
\n", "
Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-02-13T13:37:47Z
\n", "
\n", "\n", "
\n", "
name:
\n", "
Culture quotes from Iain Banks
\n", "
\n", "\n", "
\n", "
purpose:
\n", "
exposition
\n", "
\n", "\n", "
\n", "
source:
\n", "
Good Reads
\n", "
\n", "\n", "
\n", "
status:
\n", "
with for similarities in a separate module
\n", "
\n", "\n", "
\n", "
url:
\n", "
https://www.goodreads.com/work/quotes/14366-consider-phlebas
\n", "
\n", "\n", "
\n", "
version:
\n", "
0.2
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "
\n", " \n", "
\n", "\n", "
\n", "
compiler:
\n", "
Dirk Roorda
\n", "
\n", "\n", "
\n", "
dateWritten:
\n", "
2020-02-13T13:37:47Z
\n", "
\n", "\n", "
\n", "
name:
\n", "
Culture quotes from Iain Banks
\n", "
\n", "\n", "
\n", "
purpose:
\n", "
exposition
\n", "
\n", "\n", "
\n", "
source:
\n", "
Good Reads
\n", "
\n", "\n", "
\n", "
status:
\n", "
with for similarities in a separate module
\n", "
\n", "\n", "
\n", "
url:
\n", "
https://www.goodreads.com/work/quotes/14366-consider-phlebas
\n", "
\n", "\n", "
\n", "
version:
\n", "
0.2
\n", "
\n", "\n", "
\n", "
writtenBy:
\n", "
Text-Fabric
\n", "
\n", "\n", "
\n", "
\n", "
\n", "\n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Text-Fabric API: names N F E L T S C TF directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A = use(\n", " \"annotation/banks:hot\",\n", " mod=\"annotation/banks/sim/tf\",\n", " hoist=globals(),\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Use the similarity edge feature\n", "\n", "We print all similar pairs of words that are at least 50% similar but not 100%." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "query = \"\"\"\n", "word\n", "50> word\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.01s 170 results\n" ] } ], "source": [ "results = A.search(query)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
| n | word | word |
|---|---|---|
| 1 | Consider Phlebas 1:1  Everything | Consider Phlebas 1:1  everything |
| 2 | Consider Phlebas 1:1  Everything | Consider Phlebas 1:1  everything |
| 3 | Consider Phlebas 1:1  us, | Consider Phlebas 1:1  us, |
| 4 | Consider Phlebas 1:1  everything | Consider Phlebas 1:1  Everything |
| 5 | Consider Phlebas 1:1  everything | Consider Phlebas 1:1  everything |
| 6 | Consider Phlebas 1:1  us, | Consider Phlebas 1:1  us, |
| 7 | Consider Phlebas 1:1  everything | Consider Phlebas 1:1  Everything |
| 8 | Consider Phlebas 1:1  everything | Consider Phlebas 1:1  everything |
| 9 | Consider Phlebas 1:1  we | Consider Phlebas 1:2  we |
| 10 | Consider Phlebas 1:1  we | Consider Phlebas 1:2  we |
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.table(results, end=10, withPassage=\"1 2\")" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

result 1

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Consider Phlebas 1:1
sentence
line: Everything about us,
line: everything around us,
line: everything we know and can know of
line: is composed ultimately of patterns of nothing;
line: that’s the bottom line, the final truth.
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 2

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Consider Phlebas 1:1
sentence
line: Everything about us,
line: everything around us,
line: everything we know and can know of
line: is composed ultimately of patterns of nothing;
line: that’s the bottom line, the final truth.
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 3

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Consider Phlebas 1:1
sentence
line: Everything about us,
line: everything around us,
line: everything we know and can know of
line: is composed ultimately of patterns of nothing;
line: that’s the bottom line, the final truth.
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 4

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Consider Phlebas 1:1
sentence
line: Everything about us,
line: everything around us,
line: everything we know and can know of
line: is composed ultimately of patterns of nothing;
line: that’s the bottom line, the final truth.
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 5

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Consider Phlebas 1:1
sentence
line: Everything about us,
line: everything around us,
line: everything we know and can know of
line: is composed ultimately of patterns of nothing;
line: that’s the bottom line, the final truth.
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.show(results, end=5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We sort each pair.\n", "We keep track of pairs we have seen in order to prevent printing duplicate pairs." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "know ~ own\n", "harness ~ patterns\n", "nothing ~ things\n", "that ~ that’s\n", "the ~ those\n", "bottom ~ most\n", "life ~ line\n", "societies ~ those\n", "not ~ to\n", "make ~ take\n", "elegant ~ languages\n", "mattered ~ terms\n", "left ~ life\n", "humans ~ mountains\n", "care ~ romance\n", "studying ~ things\n", "impossible ~ problems\n" ] } ], "source": [ "seen = set()\n", "for (w1, w2) in results:\n", " if (w2, 100) in E.sim.b(w1):\n", " continue\n", " letters1 = F.letters.v(w1)\n", " letters2 = F.letters.v(w2)\n", " pair = tuple(sorted((letters1, letters2)))\n", " if pair in seen:\n", " continue\n", " seen.add(pair)\n", " print(\" ~ \".join(pair))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "All chapters:\n", "\n", "* [use](use.ipynb)\n", "* [share](share.ipynb)\n", "* *app*\n", "* [repo](repo.ipynb)\n", "* [compose](compose.ipynb)\n", "\n", "---\n", "\n", "CC-BY Dirk Roorda" ] } ], "metadata": { "jupytext": { "formats": "ipynb,md,py:light" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.1" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }