{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "---\n", "Start with [convert](https://nbviewer.jupyter.org/github/annotation/banks/blob/master/programs/convert.ipynb)\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# The Banks example corpus as app" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from tf.app import use" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We load not only the main corpus data, but also the additional *sim* (similarity) feature, which lives in a\n", "separate module.\n", "\n", "For the very latest commit, use `hot`.\n", "\n", "For the latest release, use `latest`.\n", "\n", "If you have cloned the repos (TF app and data), use `clone`.\n", "\n", "If you do not want or need to upgrade, leave out the checkout specifiers (suffixes such as `:latest` and `:hot`)." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "# A = use(\"annotation/banks:latest\", mod=\"annotation/banks/sim/tf:hot\", hoist=globals())" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "**Locating corpus resources ...**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "app: ~/text-fabric-data/github/annotation/banks/app" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/annotation/banks/tf/0.2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "data: ~/text-fabric-data/github/annotation/banks/sim/tf/0.2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " TF: TF API 12.5.4, annotation/banks/app v3, Search Reference
\n", " Data: annotation - banks 0.2, Character table, Feature docs
\n", "
Node types\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "
Name | # of nodes | # slots / node | % coverage
book | 1 | 99.00 | 100
chapter | 2 | 49.50 | 100
sentence | 3 | 33.00 | 100
line | 12 | 7.67 | 93
word | 99 | 1.00 | 100
\n", " Sets: no custom sets
\n", " Features:
\n", "
annotation/banks/sim/tf\n", "
\n", "\n", "
\n", "
\n", "sim\n", "
\n", "
int
\n", "\n", " similarity between words, as a percentage of the common material wrt the combined material\n", "\n", "
\n", "\n", "
\n", "
\n", "\n", "
Two quotes from Consider Phlebas by Iain M. Banks\n", "
\n", "\n", "
\n", "
\n", "author\n", "
\n", "
str
\n", "\n", " the author of a book\n", "\n", "
\n", "\n", "
\n", "
\n", "gap\n", "
\n", "
int
\n", "\n", " 1 for words that occur between [ ], which are inserted by the editor\n", "\n", "
\n", "\n", "
\n", "
\n", "letters\n", "
\n", "
str
\n", "\n", " the letters of a word\n", "\n", "
\n", "\n", "
\n", "
\n", "number\n", "
\n", "
int
\n", "\n", " number of chapter, or sentence in chapter, or line in sentence\n", "\n", "
\n", "\n", "
\n", "
\n", "otype\n", "
\n", "
str
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "punc\n", "
\n", "
str
\n", "\n", " the punctuation after a word\n", "\n", "
\n", "\n", "
\n", "
\n", "terminator\n", "
\n", "
str
\n", "\n", " the last character of a line\n", "\n", "
\n", "\n", "
\n", "
\n", "title\n", "
\n", "
str
\n", "\n", " the title of a book\n", "\n", "
\n", "\n", "
\n", "
\n", "oslots\n", "
\n", "
none
\n", "\n", " \n", "\n", "
\n", "\n", "
\n", "
\n", "\n", " Settings:
specified
  1. apiVersion: 3
  2. appName: annotation/banks
  3. appPath: /Users/me/text-fabric-data/github/annotation/banks/app
  4. commit: g5e28b54aaf679d5fbbbf7e879a6316f323b9d330
  5. css: ''
  6. dataDisplay:
  7. \n", " textFormats:\n", "
  8. \n", " layout-orig-full:\n", "
    • method: layoutRich
    • style: normal
    \n", "
  9. \n", "
  10. docs:
    • docBase: {docRoot}/{org}/{repo}/blob/master/programs
    • docExt: .ipynb
    • docPage: convert
    • docRoot: {urlNb}
    • featureBase: {docBase}
    • featurePage: convert
  11. interfaceDefaults: {}
  12. isCompatible: True
  13. local: local
  14. localDir: /Users/me/text-fabric-data/github/annotation/banks/_temp
  15. provenanceSpec:
    • corpus: Two quotes from Consider Phlebas by Iain M. Banks
    • doi: 10.5281/zenodo.2630416
    • org: annotation
    • relative: /tf
    • repo: banks
    • version: 0.2
  16. release: v3.1
  17. typeDisplay:
    • book: {featuresBare: author}
    • line:
      • features: terminator
      • label: {number}
      • template: {number}
      • verselike: True
    • word: {features: gap}
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
TF API: names N F E L T S C TF Fs Fall Es Eall Cs Call directly usable

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A = use(\n", "    \"annotation/banks\",\n", "    mod=\"annotation/banks/sim/tf\",\n", "    hoist=globals(),\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Use the similarity edge feature\n", "\n", "We print all pairs of words that are at least 50% but less than 100% similar.\n", "The search query collects the candidate pairs; the pairs that are 100% similar are filtered out in a later step." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "query = \"\"\"\n", "word\n", "50> word\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.00s 170 results\n" ] } ], "source": [ "results = A.search(query)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
n | word | word
1 | Consider Phlebas 1:1 Everything | Consider Phlebas 1:1 everything
2 | Consider Phlebas 1:1 Everything | Consider Phlebas 1:1 everything
3 | Consider Phlebas 1:1 us, | Consider Phlebas 1:1 us,
4 | Consider Phlebas 1:1 everything | Consider Phlebas 1:1 Everything
5 | Consider Phlebas 1:1 everything | Consider Phlebas 1:1 everything
6 | Consider Phlebas 1:1 us, | Consider Phlebas 1:1 us,
7 | Consider Phlebas 1:1 everything | Consider Phlebas 1:1 Everything
8 | Consider Phlebas 1:1 everything | Consider Phlebas 1:1 everything
9 | Consider Phlebas 1:1 we | Consider Phlebas 1:2 we
10 | Consider Phlebas 1:1 we | Consider Phlebas 1:2 we
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.table(results, end=10, withPassage=\"1 2\")" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

result 1" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

Consider Phlebas 1:1
sentence
line 1: Everything about us,
line 2: everything around us,
line 3: everything we know and can know of
line 3: is composed ultimately of patterns of nothing;
line 4: that’s the bottom line, the final truth.
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 2" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

Consider Phlebas 1:1
sentence
line 1: Everything about us,
line 2: everything around us,
line 3: everything we know and can know of
line 3: is composed ultimately of patterns of nothing;
line 4: that’s the bottom line, the final truth.
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 3" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

Consider Phlebas 1:1
sentence
line 1: Everything about us,
line 2: everything around us,
line 3: everything we know and can know of
line 3: is composed ultimately of patterns of nothing;
line 4: that’s the bottom line, the final truth.
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 4" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

Consider Phlebas 1:1
sentence
line 1: Everything about us,
line 2: everything around us,
line 3: everything we know and can know of
line 3: is composed ultimately of patterns of nothing;
line 4: that’s the bottom line, the final truth.
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

result 5" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "

Consider Phlebas 1:1
sentence
line 1: Everything about us,
line 2: everything around us,
line 3: everything we know and can know of
line 3: is composed ultimately of patterns of nothing;
line 4: that’s the bottom line, the final truth.
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "A.show(results, end=5, tupleFeatures=())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We skip the pairs that are 100% similar and sort the two spellings in each remaining pair, so that a pair and its mirror image become identical.\n", "We keep track of the pairs we have already seen, so that no pair is printed twice." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "know ~ own\n", "harness ~ patterns\n", "nothing ~ things\n", "that ~ that’s\n", "the ~ those\n", "bottom ~ most\n", "life ~ line\n", "societies ~ those\n", "not ~ to\n", "make ~ take\n", "elegant ~ languages\n", "mattered ~ terms\n", "left ~ life\n", "humans ~ mountains\n", "care ~ romance\n", "studying ~ things\n", "impossible ~ problems\n" ] } ], "source": [ "seen = set()  # sorted spelling pairs that have already been printed\n", "for (w1, w2) in results:\n", "    # skip pairs whose sim edge has the value 100\n", "    if (w2, 100) in E.sim.b(w1):\n", "        continue\n", "    letters1 = F.letters.v(w1)\n", "    letters2 = F.letters.v(w2)\n", "    # sort so that (a, b) and (b, a) count as the same pair\n", "    pair = tuple(sorted((letters1, letters2)))\n", "    if pair in seen:\n", "        continue\n", "    seen.add(pair)\n", "    print(\" ~ \".join(pair))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "All chapters:\n", "\n", "* [use](use.ipynb)\n", "* [share](share.ipynb)\n", "* *app*\n", "* [repo](repo.ipynb)\n", "* [compose](compose.ipynb)\n", "\n", "---\n", "\n", "CC-BY Dirk Roorda" ] } ], "metadata": { "jupytext": { "formats": "ipynb,md,py:light" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.4" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }