{ "cells": [ { "cell_type": "markdown", "id": "dominant-bracelet", "metadata": {}, "source": [ "# Getting started with clictagger in Jupyter Notebooks\n", "\n", "First, import the ``TaggedText`` class:" ] }, { "cell_type": "code", "execution_count": 1, "id": "metropolitan-sierra", "metadata": {}, "outputs": [], "source": [ "from clictagger.taggedtext import TaggedText" ] }, { "cell_type": "markdown", "id": "civic-month", "metadata": {}, "source": [ "All clictagger operations are done on a ``TaggedText`` object, so we create one first. Text should conform to the [cleaning of corpora texts](https://github.com/mahlberg-lab/corpora#cleaning-of-corpora-texts) rules in the corpora repository.\n", "\n", "Text can be loaded directly from a string. Evaluating the object displays a summary of the regions found in the text, with a count for each region class:" ] }, { "cell_type": "code", "execution_count": 2, "id": "boxed-exhibition", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "TaggedText:Alice’s Adventures in Wonderland
characters: 675
metadata.title: 1
metadata.author: 1
chapter.title: 1
chapter.text: 1
chapter.paragraph: 2
chapter.sentence: 2
quote.quote: 2
quote.nonquote: 3
quote.suspension.short: 1
tokens: 123
" ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tt = TaggedText('''\n", "Alice’s Adventures in Wonderland\n", "Lewis Carroll\n", "\n", "CHAPTER I. Down the Rabbit-Hole\n", "\n", "Alice was beginning to get very tired of sitting by her sister on the\n", "bank, and of having nothing to do: once or twice she had peeped into the\n", "book her sister was reading, but it had no pictures or conversations in\n", "it, ‘and what is the use of a book,’ thought Alice ‘without pictures or\n", "conversations?’\n", "\n", "So she was considering in her own mind (as well as she could, for the\n", "hot day made her feel very sleepy and stupid), whether the pleasure\n", "of making a daisy-chain would be worth the trouble of getting up and\n", "picking the daisies, when suddenly a White Rabbit with pink eyes ran\n", "close by her.\n", "'''.lstrip())\n", "tt" ] }, { "cell_type": "markdown", "id": "signed-cuisine", "metadata": {}, "source": [ "We can also load text from the [corpora repository](https://github.com/mahlberg-lab/corpora) or directly (or any other repository if we specified the ``repo`` parameter), by specifying a path to a \".txt\" file in the repository. The tag is the version of corpora you are using. Enter the 7-character string of [the latest commit on the commits page](https://github.com/mahlberg-lab/corpora/commits/master), so if the text changes in future your work will stay reproducible:" ] }, { "cell_type": "code", "execution_count": 3, "id": "billion-anaheim", "metadata": {}, "outputs": [ { "data": { "text/html": [ "TaggedText:https://raw.githubusercontent.com/mahlberg-lab/corpora/80d00e4/ChiLit/alice.txt
characters: 144396
metadata.title: 1
metadata.author: 1
chapter.title: 12
chapter.text: 12
chapter.paragraph: 804
chapter.sentence: 1674
quote.quote: 1098
quote.embedded: 47
quote.nonquote: 865
quote.suspension.short: 166
quote.suspension.long: 106
tokens: 26548
" ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tt_corpora = TaggedText.from_github('ChiLit/alice', tag='80d00e4')\n", "tt_corpora" ] }, { "cell_type": "markdown", "id": "coordinate-moses", "metadata": {}, "source": [ "The ``markup()`` function will reformat the tagged text into coloured output, highlighting regions that were found:" ] }, { "cell_type": "code", "execution_count": 4, "id": "liberal-lodging", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
  • chapter.sentence
  • quote.quote
  • quote.suspension.short
  • quote.suspension.long
  • chapter.title

\n", "

\n", "
\n", "
CHAPTER I. Down the Rabbit-Hole
\n", "
\n", "
Alice was beginning to get very tired of sitting by her sister on the
\n", "bank, and of having nothing to do: once or twice she had peeped into the
\n", "book her sister was reading, but it had no pictures or conversations in
\n", "it,
‘and what is the use of a book,’ thought Alice ‘without pictures or
\n", "conversations?’

\n", "
\n", "
So she was considering in her own mind (as well as she could, for the
\n", "hot day made her feel very sleepy and stupid), whether the pleasure
\n", "of making a daisy-chain would be worth the trouble of getting up and
\n", "picking the daisies, when suddenly a White Rabbit with pink eyes ran
\n", "close by her.
" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tt.markup()" ] }, { "cell_type": "markdown", "id": "conventional-association", "metadata": {}, "source": [ "We can also specify which region classes we want highlighted:" ] }, { "cell_type": "code", "execution_count": 5, "id": "original-dress", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
  • quote.quote
  • quote.suspension.short

\n", "

\n", "
\n", "
CHAPTER I. Down the Rabbit-Hole
\n", "
\n", "
Alice was beginning to get very tired of sitting by her sister on the
\n", "bank, and of having nothing to do: once or twice she had peeped into the
\n", "book her sister was reading, but it had no pictures or conversations in
\n", "it,
‘and what is the use of a book,’ thought Alice ‘without pictures or
\n", "conversations?’

\n", "
\n", "
So she was considering in her own mind (as well as she could, for the
\n", "hot day made her feel very sleepy and stupid), whether the pleasure
\n", "of making a daisy-chain would be worth the trouble of getting up and
\n", "picking the daisies, when suddenly a White Rabbit with pink eyes ran
\n", "close by her.
" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tt.markup([\"quote.quote\", \"quote.suspension.short\"])" ] }, { "cell_type": "markdown", "id": "female-motor", "metadata": {}, "source": [ "Alternatively, the ``table()`` function will return a table of each region tag, and it's start and end position in the text. Again we can provide a list of region classes we're interested in:" ] }, { "cell_type": "code", "execution_count": 6, "id": "communist-alaska", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
Region class  Start  End  Region value  Content
quote.quote 300 332 ‘and what is the use of a book,’
quote.quote 347 383 ‘without pictures or\n", "conversations?’
quote.suspension.short 333 346 thought Alice
\n" ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tt.table([\"quote.quote\", \"quote.suspension.short\"])" ] }, { "cell_type": "markdown", "id": "empirical-burden", "metadata": {}, "source": [ "Passing ``display='csv-download'`` produces a CSV download link instead of the table:" ] }, { "cell_type": "code", "execution_count": 7, "id": "forty-verse", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "Download CSV file" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tt.table([\"quote.quote\", \"quote.suspension.short\"], display='csv-download')" ] }, { "cell_type": "markdown", "id": "intelligent-narrow", "metadata": {}, "source": [ "The same method works on the full text loaded from the corpora repository. Here we list every ``quote.embedded`` region:" ] }, { "cell_type": "code", "execution_count": 8, "id": "behind-stock", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Region class  Start  End  Region value  Content
quote.embedded 15351 15374 “How doth the little--”
quote.embedded 16251 16273 “Come up again, dear!”
quote.embedded 16303 16443 “Who am I then? Tell me that first, and then,\n", "if I like being that person, I’ll come up: if not, I’ll stay down here\n", "till I’m somebody else”
quote.embedded 23759 24000 “William\n", "the Conqueror, whose cause was favoured by the pope, was soon submitted\n", "to by the English, who wanted leaders, and had been of late much\n", "accustomed to usurpation and conquest. Edwin and Morcar, the earls of\n", "Mercia and Northumbria--”
quote.embedded 24209 24362 “Edwin and Morcar,\n", "the earls of Mercia and Northumbria, declared for him: and even Stigand,\n", "the patriotic archbishop of Canterbury, found it advisable--”
quote.embedded 24701 24866 “--found\n", "it advisable to go with Edgar Atheling to meet William and offer him the\n", "crown. William’s conduct at first was moderate. But the insolence of his\n", "Normans--”
quote.embedded 28830 29059 “Let us\n", " both go to\n", " law: I will\n", " prosecute\n", " YOU.--Come,\n", " I’ll take no\n", " denial; We\n", " must have a\n", " trial: For\n", " really this\n", " morning I’ve\n", " nothing\n", " to do.”
quote.embedded 29106 29255 “Such\n", " a trial,\n", " dear Sir,\n", " With\n", " no jury\n", " or judge,\n", " would be\n", " wasting\n", " our\n", " breath.”
quote.embedded 29264 29311 “I’ll be\n", " judge, I’ll\n", " be jury,”
quote.embedded 29377 29521 “I’ll\n", " try the\n", " whole\n", " cause,\n", " and\n", " condemn\n", " you\n", " to\n", " death.”
quote.embedded 33844 33906 “Miss Alice! Come\n", "here directly, and get ready for your walk!”
quote.embedded 33907 33987 “Coming in a minute,\n", "nurse! But I’ve got to see that the mouse doesn’t get out.”
quote.embedded 48578 48609 “HOW DOTH THE LITTLE BUSY BEE,”
quote.embedded 48690 48720 “YOU ARE OLD, FATHER WILLIAM,”
quote.embedded 73790 73808 “I see what I eat”
quote.embedded 73830 73848 “I eat what I see”
quote.embedded 73910 73929 “I like what I\n", "get”
quote.embedded 73951 73970 “I get what I like”
quote.embedded 74069 74093 “I breathe when I sleep”
quote.embedded 74115 74139 “I sleep when I breathe”
quote.embedded 78302 78347 “He’s murdering the time! Off with his\n", "head!”
quote.embedded 83048 83068 “much of a muchness”
quote.embedded 90607 90621 “What a pity!”
quote.embedded 90713 90724 “What for?”
quote.embedded 100013 100071 “Oh, ‘tis love,\n", "‘tis love, that makes the world go round!”
quote.embedded 100320 100390 “Take care of the sense, and the sounds will take care of\n", "themselves.”
quote.embedded 100862 100898 “Birds of a feather flock together.”
quote.embedded 101233 101289 “The more there is of mine, the less there is of\n", "yours.”
quote.embedded 101494 101524 “Be what you would seem to be”
quote.embedded 101563 101767 “Never imagine yourself not to be otherwise than what it might\n", "appear to others that what you were or might have been was not otherwise\n", "than what you had been would have appeared to them to be otherwise.”
quote.embedded 105472 105482 “come on!”
quote.embedded 108362 108398 “French, music, AND WASHING--extra.”
quote.embedded 108815 108830 “Uglification,”
quote.embedded 113244 113276 “Will you walk a little faster?”
quote.embedded 114051 114083 “What matters it how far we go?”
quote.embedded 116563 116610 “Keep back, please: we\n", "don’t want YOU with us!”
quote.embedded 116911 116932 “With what porpoise?”
quote.embedded 118280 118313 “‘TIS THE VOICE OF THE SLUGGARD,”
quote.embedded 118691 118743 “You have baked me too brown, I must sugar my hair.”
quote.embedded 119853 119878 “I passed by his garden.”
quote.embedded 121013 121027 “Turtle Soup,”
quote.embedded 130280 130381 “There was some attempts\n", "at applause, which was immediately suppressed by the officers of the\n", "court,”
quote.embedded 139239 139266 “--SAID\n", "I COULD NOT SWIM--”
quote.embedded 139543 139568 “WE KNOW IT TO BE TRUE--”
quote.embedded 139597 139634 “I GAVE HER ONE, THEY GAVE HIM TWO--”
quote.embedded 139711 139747 “THEY ALL RETURNED FROM HIM TO YOU,”
quote.embedded 139896 139923 “BEFORE SHE\n", "HAD THIS FIT--”
\n" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tt_corpora.table([\"quote.embedded\"])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.2" } }, "nbformat": 4, "nbformat_minor": 5 }