{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Speculative magic words workbook\n", "\n", "By [Allison Parrish](http://www.decontextualize.com/)\n", "\n", "(Early draft, incomplete, under construction gif here)\n", "\n", "The goal of this notebook is to demonstrate some computational means for exploring the literary genre of the *magic word*. For present purposes, I define a \"magic word\" as a string of letters that affords a foregrounding of its material properties (e.g., spelling, pronunciation), and suggests some effect beyond meaning alone. The underlying assumption (maybe faulty) is that magic words with similar material properties will also have similar effects, and that by writing computer programs to produce magic words (whether from whole cloth or as variants on other magic words), we can produce *new* magic words with *new* effects.\n", "\n", "I don't understand this notebook as a way of *casting* spells, but merely as a way of investigating potential forms. Hence: *speculative* magic words.\n", "\n", "The notebook serves as a demonstration of (1) Python string manipulation techniques; and (2) the Pincelate library for grapheme-to-phoneme and phoneme-to-grapheme translation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preliminaries\n", "\n", "Some of these examples will be data-driven, i.e., we need an existing corpus of words. 
[Download this file](https://github.com/dariusk/corpora/blob/master/data/words/nouns.json) into the same folder as this notebook like so:" ] }, { "cell_type": "code", "execution_count": 332, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "100 18192 100 18192 0 0 94259 0 --:--:-- --:--:-- --:--:-- 94750\n" ] } ], "source": [ "!curl -L -O https://raw.githubusercontent.com/dariusk/corpora/master/data/words/nouns.json" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The file contains a list of English nouns. The code in the cell below reads them into a list. We'll use this list throughout the code below." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import json\n", "nouns = [item.lower() for item in json.load(open(\"nouns.json\"))['nouns']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `random` module has a function `choice` that picks one item from a list at random:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import random" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'mediator'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "random.choice(nouns)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Orthographic variations\n", "\n", "> \"[W]riting gave physical permanence to words.... Written words continued to act in one's behalf long after the sound of spoken words had ceased\" (Skemer 133)\n", "\n", "> \"Motion terminates at no other end save its own beginning, in order to cease and rest in it... In the intelligible world... 
Grammar begins with the letter, from which all writing is derived and into which it is all resolved\" (John Scotus Erigena, quoted in Leggott 46)\n", "\n", "> \"[T]he unit of textual meaning—the letter—lacks meaning itself. The alphabet's semantic vacuum represents a threat to orthodoxy, for into this space competing meaning systems may rush.\" (Crain 18)\n", "\n", "The words in many apotropaic charms exhibit certain kinds of manipulation that we can characterize as *orthographic* in nature—i.e., they have to do with the letters in the words. In this section of the notebook, I show some computer code for performing these transformations explicitly.\n", "\n", "The following cell defines a short text that we'll use for testing purposes:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "text = \"in the beginning was the notebook\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Cacography\n", "\n", "> \"In medieval manuscripts, the letters themselves were frequently a source of confusion. [...] The first letters of words can be omitted... while others are doubled up.... Words can be dislocated,\" \"compounded,\" \"contracted,\" \"abbreviated\"; \"letters vanish. [...] [W]e should also mention the variations made with uppercase and lowercase letters.... May this overview give the reader a small idea of the difficulties encountered by the researcher!\" (Lecouteux xxi)\n", "\n", "\"Cacography\" here means writing with mistakes. In medieval grimoires, mistakes were usually introduced as errors in copying, but the presence of errors actually made people perceive the spells as more powerful. We can simulate these errors in Python." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Compounding/contracting words\n", "\n", "This operation \"contracts\" two words, smooshing together the first and last parts." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "noun1 = random.choice(nouns)\n", "noun2 = random.choice(nouns)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "spoiler intercession\n" ] } ], "source": [ "print(noun1, noun2)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'spoession'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "noun1[:int(len(noun1)/2)] + noun2[int(len(noun2)/2):]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In function form:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'allrish'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def smoosh(a, b):\n", " return a[:int(len(a)/2)] + b[int(len(b)/2):]\n", "smoosh(\"allison\", \"parrish\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Dislocation\n", "\n", "This operation inserts random spaces, dislocating words from each other." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "in the beginning was the notebo ok\n" ] } ], "source": [ "out = \"\"\n", "for ch in text:\n", "    if random.random() < 0.1:\n", "        out += \" \"\n", "    out += ch\n", "print(out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a function:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'ab rac adabra'" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def dislocate(s, prob=0.1):\n", "    out = \"\"\n", "    for ch in s:\n", "        if random.random() < prob:\n", "            out += \" \"\n", "        out += ch\n", "    return out\n", "dislocate(\"abracadabra\")" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "' a b r ac ad a b r a'" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dislocate(\"abracadabra\", 0.75)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Coding, transliteration, encryption\n", "\n", "Another strategy for producing magic words is transliterating them (e.g., converting Greek letters to their Roman equivalents) or applying ciphers (like a [substitution cipher](https://en.wikipedia.org/wiki/Substitution_cipher), in which each letter is replaced with another letter). These techniques retain the underlying *structure* of the spelling, so the resulting form doesn't look entirely random. 
But they don't retain the surface form: they make the familiar unfamiliar.\n", "\n", "#### Character ciphers\n", "\n", "The function below implements simple character replacement:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "def replace_by_char(s, ch_map):\n", "    out = \"\"\n", "    for ch in s:\n", "        if ch in ch_map:\n", "            out += ch_map[ch]\n", "        else:\n", "            out += ch\n", "    return out" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You need to give the function a dictionary that maps each letter expected in the input to the letter that should replace it. (Characters that aren't in the dictionary pass through unchanged.) This dictionary maps each letter to the letter that follows it in the alphabet:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "nextch_map = {\n", "    'a': 'b', 'b': 'c', 'c': 'd', 'd': 'e',\n", "    'e': 'f', 'f': 'g', 'g': 'h', 'h': 'i',\n", "    'i': 'j', 'j': 'k', 'k': 'l', 'l': 'm',\n", "    'm': 'n', 'n': 'o', 'o': 'p', 'p': 'q',\n", "    'q': 'r', 'r': 's', 's': 't', 't': 'u',\n", "    'u': 'v', 'v': 'w', 'w': 'x', 'x': 'y',\n", "    'y': 'z', 'z': 'a'\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Call it on a string:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'bmmjtpo qbssjti'" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "replace_by_char(\"allison parrish\", nextch_map)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A well-known cipher in computer programming culture is [rot13](https://en.wikipedia.org/wiki/ROT13), in which each character is replaced with the character that comes thirteen spots later in the alphabet (wrapping around the end of the alphabet as needed). 
It's so common, it's already implemented in Python:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'nyyvfba cneevfu'" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import codecs\n", "codecs.encode(\"allison parrish\", 'rot13')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Mirror writing\n", "\n", "> \"According to legend, some devil-pacts were written in retrograde to invoke diabolical powers. [...] Artists depicted retrograde writing as demonic. In a 15th c. block book, a demon is shown holding up a tablet on which the sins of the dying man's life are recorded in mirror writing...\" (Skemer 121)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "# from https://github.com/combatwombat/Lunicode.js/blob/master/lunicode.js\n", "mirror_replacements = {\n", " 'a': 'ɒ', 'b': 'd', 'c': 'ɔ', 'd': 'b', 'e': 'ɘ', \n", " 'f': 'Ꮈ', 'g': 'ǫ', 'h': 'ʜ', 'i': 'i', 'j': 'ꞁ',\n", " 'k': 'ʞ', 'l': 'l', 'm': 'm', 'n': 'ᴎ', 'o': 'o',\n", " 'p': 'q', 'q': 'p', 'r': 'ɿ', 's': 'ꙅ', 't': 'ƚ',\n", " 'u': 'u', 'v': 'v', 'w': 'w', 'x': 'x', 'y': 'ʏ', 'z': 'ƹ',\n", " 'A': 'A', 'B': 'ᙠ', 'C': 'Ɔ', 'D': 'ᗡ', 'E': 'Ǝ',\n", " 'F': 'ꟻ', 'G': 'Ꭾ', 'H': 'H', 'I': 'I', 'J': 'Ⴑ',\n", " 'K': '⋊', 'L': '⅃', 'M': 'M', 'N': 'Ͷ', 'O': 'O',\n", " 'P': 'ꟼ', 'Q': 'Ọ', 'R': 'Я', 'S': 'Ꙅ', 'T': 'T',\n", " 'U': 'U', 'V': 'V', 'W': 'W', 'X': 'X', 'Y': 'Y', 'Z': 'Ƹ'}" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "in the beginning was the notebook iᴎ ƚʜɘ dɘǫiᴎᴎiᴎǫ wɒꙅ ƚʜɘ ᴎoƚɘdooʞ\n" ] } ], "source": [ "print(text + \" \" + replace_by_char(text, mirror_replacements))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Mimicking handwriting mistakes and misinterpretations\n", "\n", "Magic words gain power from being copied over and over; 
mistakes creep in that make the words strange. Lecouteux (p. xxi) suggests that the following accidental replacements were common in medieval manuscripts written in Roman scripts:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "# suggested in Lecouteux, p. xxi\n", "replacements = {\n", "    'u': ['o', 'n'],\n", "    'st': ['h'],\n", "    'p': ['f'],\n", "    'ni': ['m'],\n", "    'rn': ['m'],\n", "    'in': ['m'],\n", "    'iu': ['m', 'in'],\n", "    'r': ['t', 'z', 'c'],\n", "    'l': ['t'],\n", "    'c': ['t'],\n", "    'd': ['ol']\n", "}" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "import re\n", "import random" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These replacements have to be implemented a bit differently from the character substitution ciphers, because the patterns on the left have varying numbers of characters. So we can't just step straight through the source string character by character. The following code finds every occurrence of each character sequence on the left (the dictionary keys) and, if a coin flip succeeds, replaces it with one of the suggested replacements on the right (the dictionary values), chosen at random. Because the patterns are applied one after another, a later pattern can also match text produced by an earlier replacement." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "in the beginning was the notebook\n", "in the begmmng was the notebook\n" ] } ], "source": [ "out = text\n", "for patt, repl in replacements.items():\n", "    out = re.sub(patt,\n", "                 lambda m: random.choice(repl) if random.random() < 0.5 else m.group(),\n", "                 out)\n", "print(text)\n", "print(out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Abbreviations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> [In magic spells] \"we find sequences of letters that can be the initials of words. [...] 
A passage from the *Gesta Imperatorum* suggests this; in fact we read there the sequence \"P P P, S S S, R R R, F F F,\" meaning, \"Pater patriae perditur, sapientia secum sustollitur, ruunt regna Rome ferro, flamma, fame.\" The series of letters would therefore be a mnemonic means used to retain whole phrases, but in charms it also serves as a way to keep things secret...\" (Lecouteux xx)\n", "\n", "The following function takes a string and returns the first *n* characters of each word in the string (as a list)." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['h', 't', 'h', 'a', 'y']" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def abbrev(s, take=1):\n", " words = s.split()\n", " return [w[:take] for w in words]\n", "abbrev(\"hello there how are you?\")" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['in', 'th', 'be', 'wa', 'th', 'no']" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "abbrev(text, 2)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "inthbewathno\n" ] } ], "source": [ "print(''.join(abbrev(text, 2)))" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "In. Th. Be. Wa. Th. No\n" ] } ], "source": [ "init_cap = [item.capitalize() for item in abbrev(text, 2)]\n", "print('. 
'.join(init_cap))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Formatting\n", "\n", "According to Skemer, magic words and formulas such as *abracadabra* and *abraxas* were \"often written as diminishing and augmenting series of letters\", shaped in \"inverted triangles\" or \"[mandorlas](https://en.wikipedia.org/wiki/Mandorla)\" (116).\n", "\n", "The following function builds a word triangle: it returns a list of progressively longer prefixes of the word, one per line. It's demonstrated here with a second call that prints the same list in reverse order; together, the two halves form a mandorla." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a\n", "ab\n", "abr\n", "abra\n", "abrac\n", "abraca\n", "abracad\n", "abracada\n", "abracadab\n", "abracadabr\n", "abracadabra\n", "abracadabra\n", "abracadabr\n", "abracadab\n", "abracada\n", "abracad\n", "abraca\n", "abrac\n", "abra\n", "abr\n", "ab\n", "a\n" ] } ], "source": [ "def triangle(s):\n", "    out = []\n", "    for i in range(len(s)):\n", "        snippet = s[:i+1]\n", "        out.append(snippet)\n", "    return out\n", "print(\"\\n\".join(triangle(\"abracadabra\")))\n", "print(\"\\n\".join(reversed(triangle(\"abracadabra\"))))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `mandorla` function performs both steps, dropping the duplicated middle line:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "def mandorla(s):\n", "    return triangle(s)[:-1] + list(reversed(triangle(s)))" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a\n", "ab\n", "abr\n", "abra\n", "abrac\n", "abraca\n", "abracad\n", "abracada\n", "abracadab\n", "abracadabr\n", "abracadabra\n", "abracadabr\n", "abracadab\n", "abracada\n", "abracad\n", "abraca\n", "abrac\n", "abra\n", "abr\n", "ab\n", "a\n" ] } ], "source": [ 
"print(\"\\n\".join(mandorla(\"abracadabra\")))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Jupyter Notebook displays text in a fixed-width font by default, so centering doesn't work very well. Instead, we'll write the lines out as HTML and display them using IPython's `HTML` display class:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "from IPython.display import display, HTML" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "# wrap the lines in a centered block, separated by HTML line breaks\n", "html_src = \"<div style='text-align: center'>\"\n", "html_src += \"<br>\".join(mandorla(\"abracadabra\"))\n", "html_src += \"</div>\"" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
a
ab
abr
abra
abrac
abraca
abracad
abracada
abracadab
abracadabr
abracadabra
abracadabr
abracadab
abracada
abracad
abraca
abrac
abra
abr
ab
a
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(HTML(html_src))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's a mandorla of the word `abracadabra` followed by its mirror replacement:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "# wrap the lines in a centered block, separated by HTML line breaks\n", "html_src = \"<div style='text-align: center'>\"\n", "html_src += \"<br>\".join(mandorla(\"abracadabra\" + replace_by_char(\"abracadabra\", mirror_replacements)))\n", "html_src += \"</div>\"" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
a
ab
abr
abra
abrac
abraca
abracad
abracada
abracadab
abracadabr
abracadabra
abracadabraɒ
abracadabraɒd
abracadabraɒdɿ
abracadabraɒdɿɒ
abracadabraɒdɿɒɔ
abracadabraɒdɿɒɔɒ
abracadabraɒdɿɒɔɒb
abracadabraɒdɿɒɔɒbɒ
abracadabraɒdɿɒɔɒbɒd
abracadabraɒdɿɒɔɒbɒdɿ
abracadabraɒdɿɒɔɒbɒdɿɒ
abracadabraɒdɿɒɔɒbɒdɿ
abracadabraɒdɿɒɔɒbɒd
abracadabraɒdɿɒɔɒbɒ
abracadabraɒdɿɒɔɒb
abracadabraɒdɿɒɔɒ
abracadabraɒdɿɒɔ
abracadabraɒdɿɒ
abracadabraɒdɿ
abracadabraɒd
abracadabraɒ
abracadabra
abracadabr
abracadab
abracada
abracad
abraca
abrac
abra
abr
ab
a
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(HTML(html_src))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Word squares\n", "\n", "The [Sator Square](https://en.wikipedia.org/wiki/Sator_Square):\n", "\n", " S A T O R\n", " A R E P O\n", " T E N E T\n", " O P E R A\n", " R O T A S\n", " \n", "\"Arepo the sower guides the wheels by his work\" (Skemer's translation, pp. 116–117), an example of an apotropaic formula that \"clearly worked best in writing\" (134).\n", "\n", "We can generate random word squares in Python.\n", "\n", "The function below creates a string of random characters of the given length, using the specified alphabet." ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'yhmok'" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def gen_str(n, alphabet):\n", " return ''.join([random.choice(alphabet) for i in range(n)])\n", "gen_str(5, alphabet=\"abcdefghijklmnopqrstuvwxyz\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can pick the letters in the alphabet, and the word will be constrained to contain only those letters." ] }, { "cell_type": "code", "execution_count": 193, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'aarad'" ] }, "execution_count": 193, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gen_str(5, alphabet=\"abracadabra\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `gen_square` function generates random word squares of size `n` using the given alphabet. 
(\"Letter square\" might be a better term here, since the function is not guaranteed to produce valid \"words.\" Note that, as written, the function only returns an `n`-by-`n` square when `n` is odd; an even `n` yields `n + 1` rows.)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "def gen_square(n, alphabet='abcdefghijklmnopqrstuvwxyz', start=None):\n", "    if start is None:\n", "        rows = [gen_str(n, alphabet)]\n", "    else:\n", "        assert len(start) == n\n", "        rows = [start]\n", "    for i in range(int(n/2)):\n", "        beg = \"\"\n", "        end = \"\"\n", "        for j in range(i+1):\n", "            beg += rows[j][i+1]\n", "            end += rows[j][-i-2]\n", "        row = beg + gen_str(n - ((i+1)*2), alphabet) + ''.join(reversed(end))\n", "        rows.append(row)\n", "    return rows + list(reversed([''.join(reversed(s)) for s in rows[:int(n/2)]]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Five on a side with random letters:" ] }, { "cell_type": "code", "execution_count": 195, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "xoqgr\n", "olfug\n", "qfifq\n", "guflo\n", "rgqox\n" ] } ], "source": [ "print(\"\\n\".join(gen_square(5)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Five on a side, using only the letters in the Sator square and starting with the word `sator`:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sator\n", "aarto\n", "trtrt\n", "otraa\n", "rotas\n" ] } ], "source": [ "print(\"\\n\".join(gen_square(5, alphabet=\"satorarepotenet\", start=\"sator\")))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Seven on a side, starting with the word `allison`:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "allison\n", "lwediho\n", "lehjpis\n", "idjdjdi\n", "sipjhel\n", "ohidewl\n", "nosilla\n" ] } ], "source": [ "print(\"\\n\".join(gen_square(7, start=\"allison\")))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"Five on a side, with an alphabet of emoji:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "🙂😘😁😁😂\n", "😘🙃😍🙂😁\n", "😁😍😊😍😁\n", "😁🙂😍🙃😘\n", "😂😁😁😘🙂\n" ] } ], "source": [ "print()\n", "print(\"\\n\".join(gen_square(5, alphabet=\"😀😄😁😆😅😂🤣😊😙😗😘🥰😍😌😉🙃🙂😇😚😋😛😝😜🤨🧐🤓😎\")))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Numerology\n", "\n", "> \"Gematria was based on the fact that, in Hebrew, numbers are indicated by letters; this means that each Hebrew word can be given a numerical value, calculated by summing numbers represented by its letters. This allows mystic relations to be established between words having different meanings though identical numerical values...\" (Eco 28)\n", "\n", "Let's write some code that groups words by the sum of their letter values. The first function gives each letter its position in the alphabet (a = 1, b = 2, and so on)." ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# only works for the English alphabet (a-z)\n", "def letter_value(ch):\n", "    if not(ch.isalpha()):\n", "        return 0\n", "    return ord(ch.lower()) - 96\n", "letter_value('a')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This function adds up the letter values in a word:" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "82" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def gematriesque(s):\n", "    return sum([letter_value(ch) for ch in s])\n", "gematriesque('allison')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code below finds the sum of every noun in our noun list, then makes a lookup dictionary that shows us all of the nouns with a given sum." 
] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "from collections import defaultdict\n", "by_sum = defaultdict(list)\n", "word_to_sum = {}" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "for item in nouns:\n", " letter_sum = gematriesque(item)\n", " word_to_sum[item] = letter_sum\n", " by_sum[letter_sum].append(item)" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['ambulance',\n", " 'carrier',\n", " 'dawning',\n", " 'discord',\n", " 'homeland',\n", " 'lifeline',\n", " 'mayor',\n", " 'sending',\n", " 'tendon',\n", " 'tracing']" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "by_sum[72]" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "82" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gematriesque('allison')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Display words with the same sum as the given word:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "frenchman\n", "apartheid\n", "artisan\n", "bowling\n", "colors\n", "conflict\n", "glucose\n", "gusto\n", "hallway\n", "indecency\n", "innocence\n", "juror\n", "kangaroo\n", "melodrama\n", "panther\n", "volcano\n", "voltage\n" ] } ], "source": [ "print(\"\\n\".join(by_sum[gematriesque('allison')]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The sound\n", "\n", "> \"[T]hose who are skilled in the use of incantations, relate that the utterance of the same incantation in its proper language can accomplish what the spell professes to do; but when translated into any other tongue, it is observed to become inefficacious and feeble. 
And thus it is not the things signified, but the qualities and peculiarities of words, which possess a certain power for this or that purpose...\"—Origen (in Richardson and Pick 406–407)\n", "\n", "> \"The rhyme, repetition and alliteration of charms produced a sonorous effect that appealed to users and had psychological effects. [...] Words in a sacralized and euphonious language like Latin could be soothing to the ear and thus might seem to have an immediate magical effect. [...] Vocalized reading [was] better able to deter evil spirits\" (Skemer 153)\n", "\n", "> \"I don't think I can breathe / With the way you let me down [...] / I don't need the words / I want the sound, sound, sound...\" (Jepsen)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A recurring concept with magic words is that *what they sound like matters*. So it would be nice if we had some way to compose magic words based solely on their phonetics. The problem (in English, at least) is creating the *written* form of a word from its sound—i.e., spelling." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pincelate is a Python library that provides a simple interface for a machine learning model that can sound out English words and spell English words based on how they sound. \"Sounding out\" here means converting letters (\"orthography\") to sounds (\"phonemes\"), and \"spelling\" means converting sounds to letters (phonemes to orthography). The model is trained on the [CMU Pronouncing Dictionary](http://www.speech.cs.cmu.edu/cgi-bin/cmudict), which means it generally sounds words out as though speaking \"standard\" North American English, and spells words according to \"standard\" North American English rules (at least as far as the model itself is accurate).\n", "\n", "### Installing Pincelate\n", "\n", "You need to install the `tensorflow` and `pincelate` modules. 
Open up a terminal window and type the following lines:\n", "\n", "    pip install tensorflow\n", "    pip install pincelate\n", "\n", "If you're not using Anaconda, you might also need to install a few other libraries:\n", "\n", "    pip install numpy scipy\n", "\n", "Now import the libraries:" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pronouncing as pr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now import Pincelate and instantiate a Pincelate object. (This will load the pre-trained model provided with the package.)" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using TensorFlow backend.\n" ] } ], "source": [ "from pincelate import Pincelate" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "pin = Pincelate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Later in the notebook, I'm going to use some of Jupyter Notebook's interactive features, so I'll import the libraries here:" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [], "source": [ "import ipywidgets as widgets\n", "from IPython.display import display\n", "from ipywidgets import interact, interactive_output, Layout, HBox, VBox" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sounding out and spelling\n", "\n", "Pincelate is a machine learning model trained on the [CMU Pronouncing Dictionary](http://www.speech.cs.cmu.edu/cgi-bin/cmudict), a database of tens of thousands of English words along with their pronunciations. 
To get the pronunciation of a word:" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['M', 'IH1', 'M', 'S', 'IY0']" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pin.soundout(\"mimsy\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "... and to produce a plausible spelling for a word whose sounds you just made up, use the `.spell()` method, passing it a list of Arpabet phonemes:" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'blurf'" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pin.spell(['B', 'L', 'AH1', 'R', 'F'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's important to note that Pincelate's `.soundout()` method will *only* work with letters that appear in the CMU Pronouncing Dictionary's vocabulary. (You need to use lowercase letters only.) 
So the following will throw an error:" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "scrolled": true }, "outputs": [ { "ename": "KeyError", "evalue": "'é'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mpin\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mspell\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"étui\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m/Users/allison/anaconda/lib/python3.6/site-packages/pincelate/__init__.py\u001b[0m in \u001b[0;36mspell\u001b[0;34m(self, phones, temperature)\u001b[0m\n\u001b[1;32m 130\u001b[0m \u001b[0mthis_feat\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mphone_feature_map\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mitem\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 131\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 132\u001b[0;31m \u001b[0mthis_feat\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mphone_feature_map\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mitem\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 133\u001b[0m \u001b[0mfeats\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mthis_feat\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 134\u001b[0m state_val = self.phon2orth.infer(\n", "\u001b[0;31mKeyError\u001b[0m: 'é'" ] } ], "source": [ "pin.spell(\"étui\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Spelling 
words from random phonemes\n", "\n", "Let's invent somewhat plausible neologisms by drawing phonemes at random from the list of Arpabet phonemes. (\"Neologism\" is a fancy word for \"made-up word.\") Here's a list of all of the phonemes in the CMU Pronouncing Dictionary, plus examples in use:\n", "\n", "        Phoneme Example Translation\n", "        ------- ------- -----------\n", "        AA      odd     AA D\n", "        AE      at      AE T\n", "        AH      hut     HH AH T\n", "        AO      ought   AO T\n", "        AW      cow     K AW\n", "        AY      hide    HH AY D\n", "        B       be      B IY\n", "        CH      cheese  CH IY Z\n", "        D       dee     D IY\n", "        DH      thee    DH IY\n", "        EH      Ed      EH D\n", "        ER      hurt    HH ER T\n", "        EY      ate     EY T\n", "        F       fee     F IY\n", "        G       green   G R IY N\n", "        HH      he      HH IY\n", "        IH      it      IH T\n", "        IY      eat     IY T\n", "        JH      gee     JH IY\n", "        K       key     K IY\n", "        L       lee     L IY\n", "        M       me      M IY\n", "        N       knee    N IY\n", "        NG      ping    P IH NG\n", "        OW      oat     OW T\n", "        OY      toy     T OY\n", "        P       pee     P IY\n", "        R       read    R IY D\n", "        S       sea     S IY\n", "        SH      she     SH IY\n", "        T       tea     T IY\n", "        TH      theta   TH EY T AH\n", "        UH      hood    HH UH D\n", "        UW      two     T UW\n", "        V       vee     V IY\n", "        W       we      W IY\n", "        Y       yield   Y IY L D\n", "        Z       zee     Z IY\n", "        ZH      seizure S IY ZH ER" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The cell below has a Python list containing all of these phonemes:" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [], "source": [ "all_phonemes = ['AH', 'N', 'S', 'IH', 'L', 'T', 'R', 'K', 'IY', 'D', 'M',\n", "                'ER', 'Z', 'EH', 'AA', 'AE', 'B', 'P', 'OW', 'F', 'EY',\n", "                'G', 'AO', 'AY', 'V', 'NG', 'UW', 'HH', 'W', 'SH', 'JH',\n", "                'Y', 'CH', 'AW', 'TH', 'UH', 'OY', 'DH', 'ZH']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And then this function will return a random neologism, created from phonemes drawn at random from that list:" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [], "source": [ "def neologism_phonemes():\n", "    return [random.choice(all_phonemes) for item in range(random.randrange(3,10))]" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "Here's a handful, just to get a taste:" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['ER', 'AH', 'AO', 'AH', 'DH', 'B', 'K', 'S']\n", "['JH', 'Y', 'OW', 'AA', 'R', 'Z', 'G', 'AY']\n", "['R', 'HH', 'OY', 'N', 'R', 'OY', 'EY', 'EH']\n", "['F', 'JH', 'NG']\n", "['CH', 'SH', 'DH', 'UH', 'AA', 'S', 'IH']\n" ] } ], "source": [ "for i in range(5):\n", " print(neologism_phonemes())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's all well and good! Try sounding out some of these on your own (consult the [Arpabet](https://en.wikipedia.org/wiki/ARPABET) table to find the English sound corresponding to each symbol).\n", "\n", "But how do you *spell* these neologisms? Why, with Pincelate's `.spell()` method of course:" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'appengts'" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pin.spell(neologism_phonemes())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's a for loop that generates neologisms and prints them along with their spellings:" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "anguinitaiz ['EY', 'NG', 'W', 'AA', 'IY', 'T', 'AY', 'JH', 'EY']\n", "mang ['M', 'AE', 'N', 'G']\n", "bninghich ['B', 'N', 'T', 'NG', 'AY', 'CH']\n", "gauth ['G', 'AA', 'TH']\n", "glojchivew ['G', 'F', 'JH', 'OY', 'CH', 'V', 'S', 'UW']\n", "odchden ['AA', 'CH', 'D', 'N']\n", "bsguing ['B', 'S', 'G']\n", "gjuth ['JH', 'Y', 'TH', 'W']\n", "enlplawen ['IY', 'N', 'L', 'W', 'P', 'L', 'AW', 'N']\n", "pubge's ['P', 'UW', 'B', 'JH', 'Z']\n", "jroshwok ['R', 'JH', 'SH', 'W', 'OW', 'HH', 'K']\n", "adiet ['AH', 'D', 'IY', 'T']\n" ] } ], "source": [ "for i in range(12):\n", " 
phonemes = neologism_phonemes()\n", " print(pin.spell(phonemes), phonemes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Phoneme features\n", "\n", "The examples above use the phoneme as the basic unit of English phonetics. But each phoneme itself has characteristics, and many phonemes have characteristics in common. For example, the phoneme `/B/` has the following characteristics:\n", "\n", "* *bilabial*: you put your lips together when you say it\n", "* *stop*: airflow from the lungs is completely obstructed\n", "* *voiced*: your vocal cords are vibrating while you say it\n", "\n", "The phoneme `/P/` shares two out of three of these characteristics (it's *bilabial* and a *stop*, but is not voiced). The phoneme `/AE/`, on the other hand, shares *none* of these characteristics. Instead, it has these characteristics:\n", "\n", "* *vowel*: your mouth doesn't stop or occlude airflow when making this sound\n", "* *low*: your tongue is low in the mouth\n", "* *front*: your tongue is advanced forward in the mouth\n", "* *unrounded*: your lips are not rounded\n", "\n", "These characteristics of phonemes are traditionally called \"features.\" You can look up the features for particular phonemes using the `phone_feature_map` variable in Pincelate's `featurephone` module:" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [], "source": [ "from pincelate.featurephone import phone_feature_map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, to get the features for the vowel `/UW/` (vowel sound in \"toot\"):" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('hgh', 'bck', 'rnd', 'vwl')" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "phone_feature_map['UW']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The features are referred to here with short three-letter abbreviations. 
Here's a full list:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* `alv`: alveolar\n", "* `apr`: approximant\n", "* `bck`: back\n", "* `blb`: bilabial\n", "* `cnt`: central\n", "* `dnt`: dental\n", "* `fnt`: front\n", "* `frc`: fricative\n", "* `glt`: glottal\n", "* `hgh`: high\n", "* `lat`: lateral\n", "* `lbd`: labiodental\n", "* `lbv`: labiovelar\n", "* `lmd`: low-mid\n", "* `low`: low\n", "* `mid`: mid\n", "* `nas`: nasal\n", "* `pal`: palatal\n", "* `pla`: palato-alveolar\n", "* `rnd`: rounded\n", "* `rzd`: rhoticized\n", "* `smh`: semi-high\n", "* `stp`: stop\n", "* `umd`: upper-mid\n", "* `unr`: unrounded\n", "* `vcd`: voiced\n", "* `vel`: velar\n", "* `vls`: voiceless\n", "* `vwl`: vowel\n", "\n", "Additionally, there are two special phoneme features:\n", "\n", "* `beg`: beginning of word\n", "* `end`: end of word\n", "\n", "... which are found at the beginnings and endings of words." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Internally, Pincelate's model operates on these *phoneme features*, instead of directly on whole phonemes. This allows the model to capture and predict underlying similarities between phonemes.\n", "\n", "Pincelate's `.phonemefeatures()` method works a lot like `.soundout()`, except instead of returning a list of phonemes, it returns a [numpy](https://numpy.org/) array of *phoneme feature probabilities*. This array has one row for each predicted phoneme and one column for each phoneme feature; each value is the probability (between 0 and 1) that the feature is a component of that phoneme. 
To illustrate, here I get the feature array for the word `cat`:" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [], "source": [ "cat_feats = pin.phonemefeatures(\"cat\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This array has the following shape:" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(5, 32)" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cat_feats.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "... which tells us that there are five predicted phonemes. (The `32` is the total number of possible features.) The word `cat`, of course, has only three phonemes (`/K AE T/`)—the extra two are the special \"beginning of the word\" and \"end of the word\" phonemes at the beginning and end, respectively.\n", "\n", "### Examining predicted phoneme features\n", "\n", "Let's look at the feature probabilities for the first phoneme (after the special \"beginning of the word\" token at index 0):" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([6.42707571e-04, 2.13692928e-07, 6.62605757e-08, 5.43442347e-10,\n", " 5.47038814e-09, 7.04440527e-06, 1.58982238e-09, 1.66211791e-08,\n", " 3.81101599e-05, 8.24350354e-05, 1.62252746e-07, 5.46323768e-08,\n", " 1.41502560e-10, 5.33169420e-09, 7.31331828e-10, 2.70081146e-05,\n", " 1.83614669e-04, 1.62359720e-05, 2.74244065e-11, 1.44446346e-07,\n", " 3.33543511e-07, 1.91042790e-08, 3.52445828e-09, 4.54965146e-07,\n", " 9.99929667e-01, 7.26780854e-05, 8.35576885e-10, 2.66875286e-04,\n", " 1.75827936e-05, 9.99930263e-01, 9.99974251e-01, 1.87013138e-04])" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cat_feats[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can look up the index in this array associated with a particular 
phoneme feature using Pincelate's `.featureidx()` method:" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9999302625656128" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cat_feats[1][pin.featureidx('vel')]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tells us that the `vel` (velar) feature for this phoneme is predicted with almost 100% probability, which makes sense, since the phoneme we'd anticipate, `/K/`, is a voiceless velar stop." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following bit of code steps through each row in this array and prints out the phoneme features with the highest probability in that row, using numpy's `argsort` function:" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "phoneme 0\n", "beg 1.0\n", "vwl 0.0\n", "vls 0.0\n", "apr 0.0\n", "bck 0.0\n", "\n", "phoneme 1\n", "vls 0.999974250793457\n", "vel 0.9999302625656128\n", "stp 0.999929666519165\n", "alv 0.000642707571387291\n", "unr 0.00026687528588809073\n", "\n", "phoneme 2\n", "unr 0.9997866749763489\n", "vwl 0.9990422129631042\n", "str 0.9986899495124817\n", "fnt 0.9959463477134705\n", "low 0.9807271957397461\n", "\n", "phoneme 3\n", "vls 0.9993033409118652\n", "alv 0.9990631937980652\n", "stp 0.9904974102973938\n", "frc 0.0036416002549231052\n", "end 0.0013078120537102222\n", "\n", "phoneme 4\n", "end 0.9997904896736145\n", "fnt 0.0006787743768654764\n", "vwl 0.000589678471442312\n", "unr 0.0005248847301118076\n", "str 0.0003406509349588305\n", "\n" ] } ], "source": [ "def idxfeature(pin, idx):\n", "    return pin.orth2phon.target_vocab[idx]\n", "for i, phon in enumerate(cat_feats):\n", "    print(\"phoneme\", i)\n", "    for idx in np.argsort(phon)[::-1][:5]:\n", "        print(idxfeature(pin, idx), phon[idx])\n", "    print()" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "We'll come back to a more complete example that shows how to *manipulate* these values below." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: Resizing feature probability arrays\n", "\n", "Once you have the phonetic feature probability arrays, you can treat them the same way you'd treat any other numpy array. One thing I like to do is use scipy's image manipulation functions to resample the phonetic feature arrays. This lets us use the same phonetic information to spell a shorter or longer word. In particular, `scipy.ndimage.interpolation` has a handy [zoom](https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.ndimage.interpolation.zoom.html) function that resamples an array and interpolates it. Normally you'd use this to resize an image, but nothing's stopping us from using it to resize our phonetic feature array.\n", "\n", "First, import the function:" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [], "source": [ "from scipy.ndimage.interpolation import zoom" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then get some phoneme feature probabilities:" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [], "source": [ "feats = pin.phonemefeatures(\"alphabet\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then resize with `zoom()`. The second parameter to `zoom()` is a tuple giving the factor by which to scale each dimension of the incoming array. 
We only want to scale along the first axis (i.e., the phonemes), keeping the second axis (i.e., the features) constant.\n", "\n", "A shorter version of the word:" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'albe'" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "shorter = zoom(feats, (0.67, 1))\n", "pin.spellfeatures(shorter)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A longer version:" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'all-phafebet'" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "longer = zoom(feats, (2.0, 1))\n", "pin.spellfeatures(longer)" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/allison/anaconda/lib/python3.6/site-packages/scipy/ndimage/interpolation.py:605: UserWarning: From scipy 0.13.0, the output shape of zoom() is calculated with round() instead of int() - for these inputs the size of the returned array has changed.\n", " \"the returned array has changed.\", UserWarning)\n" ] }, { "data": { "text/plain": [ "\"theothis' ayes' ah ttestsed\"" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def stretch_words(s, factor=1.0):\n", "    out = []\n", "    for word in s.split():\n", "        word = word.lower()\n", "        vec = pin.phonemefeatures(word)\n", "        if factor < 1.0:\n", "            order = 3\n", "        else:\n", "            order = 0\n", "        zoomed = zoom(vec, (factor, 1), order=order)\n", "        out.append(pin.spellfeatures(zoomed))\n", "    return \" \".join(out)\n", "stretch_words(\"this is a test\", factor=1.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you've downloaded this notebook and you're following along running the code, the following cell will create an interactive widget that 
lets you \"stretch\" and \"shrink\" the words that you type into the text box by dragging the slider." ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "scrolled": true }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "84f4c92193d04e499fe04fa81be34489", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(Text(value='in the beginning was the notebook', description='words'), FloatSlider(value=…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import warnings\n", "from ipywidgets import interact\n", "warnings.filterwarnings('ignore')\n", "@interact(words=\"in the beginning was the notebook\", factor=(0.1, 4.0, 0.1))\n", "def stretchy(words, factor=1.0):\n", "    print(stretch_words(words, factor))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Round-trip spelling manipulation\n", "\n", "Pincelate actually consists of *two* models: one that knows how to sound out words based on how they're spelled, and another that knows how to spell words from sounds. Pincelate's `.manipulate()` method does a \"round trip\" re-spelling of a word, passing it through both models to arrive back at the original word. Try it out:" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'poetic'" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pin.manipulate(\"poetic\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On the surface, this isn't very interesting! You don't need Pincelate to tell you how to spell a word that you already know how to spell. But the `.manipulate()` method has a handful of parameters that allow you to mess around with the model's internal workings in fun and interesting ways. 
The first is the `temperature` parameter, which artificially increases or decreases the amount of randomness in the model's output probabilities.\n", "\n", "#### Spelling temperature\n", "\n", "When the temperature is close to zero, the model will always pick the most likely spelling of the word at each step." ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'poetic'" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pin.manipulate(\"poetic\", temperature=0.01)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you increase the temperature to 1.0, the model starts picking values at random according to the underlying probabilities." ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "'poetick'" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pin.manipulate(\"poetic\", temperature=1.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At temperatures above 1.0, the model has a higher chance of picking from letters with lower probabilities, producing a more unlikely spelling:" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'poetike'" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pin.manipulate(\"poetic\", temperature=1.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At a high enough temperature, the model's spelling feels essentially random:" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'ppeetinh'" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pin.manipulate(\"poetic\", temperature=3.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following interactive widget lets 
you play with the `temperature` parameter:" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "d326952a7e9d4ec2a725c58a0b37c438", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(Text(value='your text here', description='s'), FloatSlider(value=1.2500000000000002, des…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "@interact(s=\"your text here\", temp=(0.05, 2.5, 0.05))\n", "def tempadjust(s, temp):\n", "    return ' '.join([pin.manipulate(w.lower(), temperature=temp) for w in s.split()])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Manipulating letter and phoneme frequencies\n", "\n", "The `manipulate` method can take two other parameters: `letters` and `features`. These are dictionaries that map letters or phonetic features to exponential multipliers. When Pincelate is spelling the word, it uses these multipliers to adjust the probability of the corresponding letters or features in the output. Somewhat counterintuitively, positive values reduce the corresponding probability, while negative values increase it.\n", "\n", "Here's an example to make this clearer. 
First: respelling a word without the letter `e`:" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'spilling'" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pin.manipulate(\"spelling\", letters={'e': 10})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's do this for a set of randomly selected words from the noun list:" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "carcass carcas\n", "nuisance nusancl\n", "stylus stylus\n", "humility humility\n", "firing firing\n", "councilman councilman\n", "sediment shadimant\n", "constable constably\n", "contentment contintmint\n", "inaction inaction\n" ] } ], "source": [ "for noun in random.sample(nouns, 10):\n", "    print(noun, pin.manipulate(noun, letters={'e': 20}))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `features` parameter does the same thing, except it adjusts the probability of particular phoneme features at each step. 
For example, this makes words more nasal:" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'smnenging'" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pin.manipulate(\"spelling\", features={'nas': -10})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following code makes all of the vowels more rounded and further back in the mouth in a list of random nouns:" ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "vomiting vowautung\n", "ralph raulfough\n", "footing foutung\n", "rendition rundutuon\n", "mechanics mocaunux\n", "cabbage kaubauge\n", "cilantro solauntuo\n", "criminality cruminauloup\n", "lineage lunaug\n", "disobedience dusoubuluong\n" ] } ], "source": [ "for noun in random.sample(nouns, 10):\n", "    print(noun, pin.manipulate(noun, features={'bck': -2, 'rnd': -5}))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Interactive manipulation tool\n", "\n", "The following cells build an interactive tool you can use to play around with temperature, letter probabilities, and phoneme feature probabilities." 
] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [], "source": [ "import ipywidgets as widgets\n", "from IPython.display import display\n", "from ipywidgets import interact, interactive_output, Layout, HBox, VBox" ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [], "source": [ "def manipulate(instr=\"allison\", temp=0.25, **kwargs):\n", "    return ' '.join([\n", "        pin.manipulate(\n", "            w,\n", "            letters={k: v*-1 for k, v in kwargs.items()\n", "                     if k in pin.orth2phon.src_vocab_idx_map.keys()},\n", "            features={k: v*-1 for k, v in kwargs.items()\n", "                      if k in pin.orth2phon.target_vocab_idx_map.keys()},\n", "            temperature=temp\n", "        ) for w in instr.split()]\n", "    )" ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "1c046a0035c841e8ac1f80f610faeea6", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(HBox(children=(VBox(children=(FloatSlider(value=0.0, continuous_update=False, description='$', …" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "27fb9511731e447dbc6705e1c85589f8", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output(layout=Layout(height='100px'))" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "orth_sliders = {}\n", "phon_sliders = {}\n", "for ch in pin.orth2phon.src_vocab_idx_map.keys():\n", "    if ch in \"'-.\": continue\n", "    orth_sliders[ch] = widgets.FloatSlider(description=ch,\n", "                                           continuous_update=False,\n", "                                           value=0,\n", "                                           min=-20,\n", "                                           max=20,\n", "                                           step=0.5,\n", "                                           layout=Layout(height=\"10px\"))\n", "for feat in pin.orth2phon.target_vocab_idx_map.keys():\n", "    if feat in (\"beg\", \"end\", \"cnt\", \"dnt\"): continue\n", "    phon_sliders[feat] = widgets.FloatSlider(description=feat,\n", "                                             continuous_update=False,\n", "                                             value=0,\n", "                                             min=-20,\n", "                                             max=20,\n", 
"                                             step=0.5,\n", "                                             layout=Layout(height=\"10px\"))\n", "instr = widgets.Text(description='input', value=\"spelling words with machine learning\")\n", "tempslider = widgets.FloatSlider(description='temp', continuous_update=False, value=0.3, min=0.01, max=5, step=0.05)\n", "left_box = VBox(tuple(orth_sliders.values()) + (tempslider,))\n", "right_box = VBox(tuple(phon_sliders.values()))\n", "all_sliders = HBox([left_box, right_box])\n", "\n", "out = interactive_output(lambda *args, **kwargs: print(manipulate(*args, **kwargs)),\n", "                         dict(instr=instr, temp=tempslider, **orth_sliders, **phon_sliders))\n", "out.layout.height = \"100px\"\n", "display(VBox([all_sliders, instr]), out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Phonetic states\n", "\n", "The Pincelate model also produces a \"hidden state,\" which is a single fixed-size vector that represents the total sound of a word. (You can think of this as a point on a Cartesian plane, where words with similar sounds are clustered next to each other.) 
To get the hidden state of a word, call the `.phonemestate()` method:" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 7.95686364e-01, -8.32179904e-01, -1.32981718e+00, 7.25831270e-01,\n", " -2.64316416e+00, 1.57794631e+00, -1.49719226e+00, 2.60457993e+00,\n", " -3.31631720e-01, -6.20785542e-02, -1.07942343e+00, -9.35500801e-01,\n", " 1.13087571e+00, -2.40438804e-02, -3.28609198e-01, 2.97865009e+00,\n", " 5.29175103e-01, 1.03818035e+00, -1.86510909e+00, 1.05075657e+00,\n", " 1.13979602e+00, 2.85125399e+00, -6.54058456e-01, 5.91307104e-01,\n", " 4.18249458e-01, 4.07120883e-01, 2.90681601e-01, -2.21350479e+00,\n", " 6.69969380e-01, -6.35705888e-01, -1.40898752e+00, 1.23353994e+00,\n", " -4.64894950e-01, -5.61830521e-01, -2.65465081e-01, 6.93497515e+00,\n", " 2.54075122e+00, -3.86470616e-01, 7.37920403e-01, -2.52454400e-01,\n", " 1.13615263e+00, 1.07363796e+00, -3.24268669e-01, 2.30040264e+00,\n", " 1.46473849e+00, -2.06925702e+00, -1.03245997e+00, -1.25596628e-01,\n", " -1.65496230e+00, -4.91467148e-01, -5.36341250e-01, 4.08115983e-01,\n", " 1.84644151e+00, -1.96521312e-01, -9.94934380e-01, -1.75815284e-01,\n", " -1.07653344e+00, -3.44106033e-02, 2.51568604e+00, 4.28566813e-01,\n", " 4.42921072e-01, 1.39196253e+00, -1.56479609e+00, 3.04453349e+00,\n", " 2.39666963e+00, 8.14390123e-01, -2.70349789e+00, 1.15729785e+00,\n", " -7.88961649e-02, 2.75429010e-01, 6.31406188e-01, 1.58451569e+00,\n", " -1.55730531e-01, 2.57675266e+00, -1.86182892e+00, 1.68593317e-01,\n", " -1.95982814e+00, -7.32693970e-01, 7.66813755e-03, -5.66927716e-02,\n", " -4.79854643e-01, -1.47521091e+00, -3.14706206e+00, -1.85165763e-01,\n", " 1.51251328e+00, -3.31812084e-01, 3.72764409e-01, 1.87518907e+00,\n", " 7.84418583e-01, 5.91440462e-02, 2.49756783e-01, -6.65867984e-01,\n", " -2.45798969e+00, 2.43706182e-01, -1.74799120e+00, 6.31147289e+00,\n", " -2.21082544e+00, -6.17550135e-01, -1.05487323e+00, 1.32610798e+00,\n", " 
-1.96974850e+00, 6.00875989e-02, -7.77341351e-02, 3.41730595e-01,\n", " -3.29071307e+00, 1.91098666e+00, 2.74943769e-01, 2.36249596e-01,\n", " -7.78424263e-01, -1.48498321e+00, -1.75742328e-01, -2.70122141e-01,\n", " -7.82502234e-01, 1.02417684e+00, 1.33242464e+00, 8.82816672e-01,\n", " -9.57970083e-01, -1.86585039e-01, -8.48214865e-01, 1.15504694e+00,\n", " -1.22457647e+00, 2.49675870e-01, 1.96862161e+00, -3.13274860e-01,\n", " 2.70345712e+00, 1.11661434e+00, 1.75637615e+00, -3.24920726e+00,\n", " -1.31210089e+00, 7.51341939e-01, -4.61002064e+00, -1.79387522e+00,\n", " -2.13482738e-01, 1.16403735e+00, 6.09336972e-01, -1.19726789e+00,\n", " 6.51616156e-01, -1.64964771e+00, -1.07895292e-01, 1.17505085e+00,\n", " 1.00255024e+00, 2.09715486e+00, -2.84226799e+00, 3.04437727e-01,\n", " -8.29695046e-01, 1.77979434e+00, -3.90957534e-01, -1.63378143e+00,\n", " 1.43395996e+00, -4.61261392e-01, 4.31022048e-03, 2.70064235e-01,\n", " -2.65720755e-01, -1.66805908e-01, 7.00646102e-01, -3.77741992e-01,\n", " 8.39838505e-01, 1.02057767e+00, 1.36773157e+00, -5.73049784e-01,\n", " 3.41587991e-01, -8.69696915e-01, -5.50617874e-01, -7.18180537e-01,\n", " 5.41177392e-01, -7.49346852e-01, 1.33970344e+00, -1.03110039e+00,\n", " 6.94945455e-01, -1.95170224e-01, -1.03363979e+00, 2.98215580e+00,\n", " 3.45216870e-01, -2.18459392e+00, 2.91187835e+00, 9.79840875e-01,\n", " 3.20049500e+00, 7.04905629e-01, 1.92975909e-01, 7.36262500e-01,\n", " -3.34599018e-02, 1.89192712e+00, 8.96418840e-02, -1.41968474e-01,\n", " -1.46555102e+00, -1.18895268e+00, -1.25323486e+00, 5.48723757e-01,\n", " 1.16233110e+00, -3.77950400e-01, -2.00661182e+00, 3.27691698e+00,\n", " -1.96016419e+00, -2.57373786e+00, 1.35590124e+00, 3.65701348e-01,\n", " -3.07851863e+00, -1.65423945e-01, 1.09554805e-01, 4.22158629e-01,\n", " -4.81078625e-01, 1.02364518e-01, 1.48046303e+00, -1.36909890e+00,\n", " -9.12416160e-01, -2.13123873e-01, 1.57091486e+00, 1.03272748e+00,\n", " 3.81099284e-02, 3.83975387e-01, 2.15760851e+00, 
6.17110789e-01,\n", " -5.82861066e-01, -1.10520041e+00, -8.93351912e-01, -4.44957986e-02,\n", " 1.46159840e+00, -1.04589856e+00, -1.55343175e+00, -1.07608688e+00,\n", " 1.22968698e+00, 8.79801631e-01, -1.39852309e+00, 4.19925094e-01,\n", " 1.06851876e+00, -1.04367542e+00, -2.36931384e-01, 2.73201913e-01,\n", " -6.20300889e-01, -1.98342371e+00, 1.82388949e+00, 1.52567357e-01,\n", " 1.38442791e+00, -1.00117397e+00, -6.62417471e-01, -3.00938010e+00,\n", " 1.61543345e+00, -1.76816809e+00, 2.49266005e+00, 4.57145870e-01,\n", " 7.06938148e-01, -1.41129887e+00, -7.02914178e-01, -5.97419918e-01,\n", " -1.53821719e+00, 2.86762547e+00, 1.05890915e-01, 6.03433847e-01,\n", " 9.90708888e-01, 2.16755056e+00, 8.05558801e-01, -3.54735470e+00,\n", " -2.80189663e-01, 3.64496589e+00, 2.59146929e-01, -1.99815035e+00],\n", " dtype=float32)" ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pin.phonemestate('abracadabra')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a big weird number (a 256-dimensional vector, to be specific) that doesn't seem meaningful on its own. But we can do some interesting things with it.\n", "\n", "#### Blending words\n", "\n", "We can manipulate this underlying representation in various ways and then spell the word resulting from that manipulation with the `.spellstate()` method. 
The following code phonetically blends two words:" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'paceter'" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def blend(a, b):\n", "    factor = 0.5\n", "    start = pin.phonemestate(a)\n", "    end = pin.phonemestate(b)\n", "    return pin.spellstate(((start*factor) + (end*(1-factor))))\n", "blend('paper', 'plastic')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following code picks ten random pairs of nouns and shows the blended word between each pair:" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "philosopher → alloiser → allies\n", "extinction → tichtion → tycoon\n", "variation → neawirting → networking\n", "glitter → gietter → headcount\n", "corpus → torsales → translation\n", "weariness → fraising → phrasing\n", "cheesecake → contecater → contractor\n", "youngster → linguitor → liquidation\n", "standpoint → sandipite → publicity\n", "atheism → geinthism → greens\n" ] } ], "source": [ "for i in range(10):\n", "    worda = random.choice(nouns)\n", "    wordb = random.choice(nouns)\n", "    print(worda, \" → \", blend(worda, wordb), \" → \", wordb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Variants with noise\n", "\n", "> \"[M]agic spells come in a wide variety [...]. [W]hat seems to be most important is the sound, which is often based on alliterations and homophones. The use of sounds prompts a series of variations on a single word, such as, \"festella, festelle, festelle festelli festello festello, festella festellum,\" used to banish all kinds of fistulas.\" (Lecouteux xix)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can create variants of existing words by adding randomly generated noise to the phoneme state vector. 
For example:" ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'a-prakanbabaragh'" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "state = pin.phonemestate(\"abracadabra\")\n", "pin.spellstate(state + np.random.randn(256))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following function scales the noise by `factor` before adding it to the word's phoneme state vector, then spells the word from the result:" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'alison'" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def noisy(word, factor=1.0):\n", "    state = pin.phonemestate(word) + np.random.randn(256) * factor\n", "    return pin.spellstate(state)\n", "noisy(\"allison\", 0.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Adding progressively more noise makes the variants drift further from the original word:" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.50 abrocadable\n", "0.60 abrakadabara\n", "0.70 abracadabra\n", "0.80 arkrakadorab\n", "0.90 abricadabana\n", "1.00 abrocadra\n", "1.10 mbrimidaga\n", "1.20 ammracibadiam\n", "1.30 ab-abaganberrieg\n", "1.40 apcricabada\n", "1.50 hahrhachadagrada\n", "1.60 e\n", "1.70 zbrccracherb\n", "1.80 adradadadrada\n", "1.90 abbbb\n", "2.00 jjqteepadagedavad\n", "2.10 hhhhh\n", "2.20 eegekwywywbyb-byg-b\n", "2.30 ggpradabadarpa\n", "2.40 vqmrttctttritttt-bj\n" ] } ], "source": [ "for i in range(5, 25):\n", "    factor = i * 0.1\n", "    print(\"%0.02f\" % factor, noisy(\"abracadabra\", factor))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bibliography\n", "\n", "Crain, Patricia. *The Story of A: The Alphabetization of America from the New England Primer to the Scarlet Letter*. Stanford University Press, 2000.\n", "\n", "Eco, Umberto. 
*The Search for the Perfect Language*. Blackwell, 1997.\n", "\n", "Jepsen, Carly R. “The Sound.” *Dedicated*. By Jepsen, Carly R., et al., 2019. Digital release.\n", "\n", "Lecouteux, Claude. *Dictionary of Ancient Magic Words and Spells: From Abraxas to Zoar.* First U.S. edition, Inner Traditions, 2015.\n", "\n", "Leggott, Michele J. *Reading Zukofsky’s 80 Flowers*. Johns Hopkins University Press, 1989.\n", "\n", "Richardson, Ernest Cushing, and Bernhard Pick, editors. *The Ante-Nicene Fathers: Translations of the Writings of the Fathers down to A.D. 325.* C. Scribner’s Sons, 1905.\n", "\n", "Skemer, Don C. *Binding Words: Textual Amulets in the Middle Ages.* Penn State Press, 2010." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 2 }