{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "# Spelling Bee" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "This notebook starts our deep dive (no pun intended) into NLP by introducing sequence-to-sequence learning on Spelling Bee." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "heading_collapsed": true }, "source": [ "## Data Stuff\n", "\n", "We take our data set from [The CMU pronouncing dictionary](https://en.wikipedia.org/wiki/CMU_Pronouncing_Dictionary)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using TensorFlow backend.\n", "/home/jhoward/anaconda3/lib/python3.6/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.\n", " \"This module will be removed in 0.20.\", DeprecationWarning)\n" ] } ], "source": [ "%matplotlib inline\n", "import importlib\n", "import utils2; importlib.reload(utils2)\n", "from utils2 import *\n", "np.set_printoptions(4)\n", "PATH = 'data/spellbee/'" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true, "deletable": true, "editable": true, "hidden": true }, "outputs": [], "source": [ "limit_mem()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true, "deletable": true, "editable": true, "hidden": true }, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "hidden": true }, "source": [ "The CMU pronouncing dictionary consists of sounds/words and their corresponding phonetic description (American pronunciation).\n", "\n", "The phonetic descriptions are a sequence of phonemes. Note that the vowels end with integers; these indicate where the stress is.\n", "\n", "Our goal is to learn how to spell these words given the sequence of phonemes." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "hidden": true }, "source": [ "The preparation of this data set follows the same pattern we've seen before for NLP tasks.\n", "\n", "Here we iterate through each line of the file and grab each word/phoneme pair that starts with an uppercase letter. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [ { "data": { "text/plain": [ "(('A', ['AH0']), ('ZYWICKI', ['Z', 'IH0', 'W', 'IH1', 'K', 'IY0']))" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lines = [l.strip().split(\" \") for l in open(PATH+\"cmudict-0.7b\", encoding='latin1') \n", " if re.match('^[A-Z]', l)]\n", "lines = [(w, ps.split()) for w, ps in lines]\n", "lines[0], lines[-1]" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "hidden": true }, "source": [ "Next we're going to get a list of the unique phonemes in our vocabulary, as well as add a null \"_\" for zero-padding." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "['_', 'AA0', 'AA1', 'AA2', 'AE0']" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "phonemes = [\"_\"] + sorted(set(p for w, ps in lines for p in ps))\n", "phonemes[:5]" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [ { "data": { "text/plain": [ "70" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(phonemes)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "hidden": true }, "source": [ "Then we create mappings of phonemes and letters to respective indices.\n", "\n", "Our letters include the padding element \"_\", but also \"*\" which we'll explain later." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [], "source": [ "p2i = dict((v, k) for k,v in enumerate(phonemes))\n", "letters = \"_abcdefghijklmnopqrstuvwxyz*\"\n", "l2i = dict((v, k) for k,v in enumerate(letters))" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "hidden": true }, "source": [ "Let's create a dictionary mapping words to the sequence of indices corresponding to it's phonemes, and let's do it only for words between 5 and 15 characters long." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [ { "data": { "text/plain": [ "108006" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "maxlen=15\n", "pronounce_dict = {w.lower(): [p2i[p] for p in ps] for w, ps in lines\n", " if (5<=len(w)<=maxlen) and re.match(\"^[A-Z]+$\", w)}\n", "len(pronounce_dict)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "hidden": true }, "source": [ "Aside on various approaches to python's list comprehension:\n", "* the first list is a typical example of a list comprehension subject to a conditional\n", "* the second is a list comprehension inside a list comprehension, which returns a list of list\n", "* the third is similar to the second, but is read and behaves like a nested loop\n", " * Since there is no inner bracket, there are no lists wrapping the inner loop" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [ { "data": { "text/plain": [ "(['XYZ'], [['x', 'y', 'z'], ['a', 'b', 'c']], ['x', 'y', 'z', 'a', 'b', 'c'])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a=['xyz','abc']\n", "[o.upper() for o in a if o[0]=='x'], [[p for p in o] for o in a], [p for o in a for p in o]" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "hidden": true }, "source": [ "Split lines into words, phonemes, convert to indexes (with padding), split into training, validation, test sets. Note we also find the max phoneme sequence length for padding." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [], "source": [ "maxlen_p = max([len(v) for k,v in pronounce_dict.items()])" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [], "source": [ "pairs = np.random.permutation(list(pronounce_dict.keys()))\n", "n = len(pairs)\n", "input_ = np.zeros((n, maxlen_p), np.int32)\n", "labels_ = np.zeros((n, maxlen), np.int32)\n", "\n", "for i, k in enumerate(pairs):\n", " for j, p in enumerate(pronounce_dict[k]): input_[i][j] = p\n", " for j, letter in enumerate(k): labels_[i][j] = l2i[letter]" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [], "source": [ "go_token = l2i[\"*\"]\n", "dec_input_ = np.concatenate([np.ones((n,1)) * go_token, labels_[:,:-1]], axis=1)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "hidden": true }, "source": [ "Sklearn's train_test_split is an easy way to split data into training and testing sets." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true, "deletable": true, "editable": true, "hidden": true }, "outputs": [], "source": [ "(input_train, input_test, labels_train, labels_test, dec_input_train, dec_input_test\n", " ) = train_test_split(input_, labels_, dec_input_, test_size=0.1)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [ { "data": { "text/plain": [ "(97205, 16)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "input_train.shape" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [ { "data": { "text/plain": [ "(97205, 15)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "labels_train.shape" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [ { "data": { "text/plain": [ "(70, 28)" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "input_vocab_size, output_vocab_size = len(phonemes), len(letters)\n", "input_vocab_size, output_vocab_size" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "hidden": true }, "source": [ "Next we proceed to build our model." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Keras code" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "parms = {'verbose': 0, 'callbacks': [TQDMNotebookCallback(leave_inner=True)]}\n", "lstm_params = {}" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "dim = 240" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "### Without attention" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "def get_rnn(return_sequences= True): \n", " return LSTM(dim, dropout_U= 0.1, dropout_W= 0.1, \n", " consume_less= 'gpu', return_sequences=return_sequences)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "The model has three parts:\n", "* We first pass list of phonemes through an embedding function to get a list of phoneme embeddings. Our goal is to turn this sequence of embeddings into a single distributed representation that captures what our phonemes say.\n", "* Turning a sequence into a representation can be done using an RNN. This approach is useful because RNN's are able to keep track of state and memory, which is obviously important in forming a complete understanding of a pronunciation.\n", " * BiDirectional passes the original sequence through an RNN, and the reversed sequence through a different RNN and concatenates the results. This allows us to look forward and backwards.\n", " * We do this because in language things that happen later often influence what came before (i.e. in Spanish, \"el chico, la chica\" means the boy, the girl; the word for \"the\" is determined by the gender of the subject, which comes after).\n", "* Finally, we arrive at a vector representation of the sequence which captures everything we need to spell it. We feed this vector into more RNN's, which are trying to generate the labels. After this, we make a classification for what each letter is in the output sequence.\n", " * We use RepeatVector to help our RNN remember at each point what the original word is that it's trying to translate.\n", " \n" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "inp = Input((maxlen_p,))\n", "x = Embedding(input_vocab_size, 120)(inp)\n", "\n", "x = Bidirectional(get_rnn())(x)\n", "x = get_rnn(False)(x)\n", "\n", "x = RepeatVector(maxlen)(x)\n", "x = get_rnn()(x)\n", "x = get_rnn()(x)\n", "x = TimeDistributed(Dense(output_vocab_size, activation='softmax'))(x)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "We can refer to the parts of the model before and after get_rnn(False) returns a vector as the encoder and decoder. The encoder has taken a sequence of embeddings and encoded it into a numerical vector that completely describes it's input, while the decoder transforms that vector into a new sequence.\n", "\n", "Now we can fit our model" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "model = Model(inp, x)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "model.compile(Adam(), 'sparse_categorical_crossentropy', metrics=['acc'])" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [ { "ename": "KeyboardInterrupt", "evalue": "", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m hist=model.fit(input_train, np.expand_dims(labels_train,-1), \n\u001b[1;32m 2\u001b[0m \u001b[0mvalidation_data\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0minput_test\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexpand_dims\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlabels_test\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m batch_size=64, **parms, nb_epoch=3)\n\u001b[0m", "\u001b[0;32m/home/jhoward/anaconda3/lib/python3.6/site-packages/keras/engine/training.py\u001b[0m in \u001b[0;36mfit\u001b[0;34m(self, x, y, batch_size, nb_epoch, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch)\u001b[0m\n\u001b[1;32m 1194\u001b[0m \u001b[0mval_f\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mval_f\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mval_ins\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mval_ins\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mshuffle\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mshuffle\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1195\u001b[0m \u001b[0mcallback_metrics\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mcallback_metrics\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1196\u001b[0;31m initial_epoch=initial_epoch)\n\u001b[0m\u001b[1;32m 1197\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1198\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mevaluate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbatch_size\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m32\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mverbose\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msample_weight\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/home/jhoward/anaconda3/lib/python3.6/site-packages/keras/engine/training.py\u001b[0m in \u001b[0;36m_fit_loop\u001b[0;34m(self, f, ins, out_labels, batch_size, nb_epoch, verbose, callbacks, val_f, val_ins, shuffle, callback_metrics, initial_epoch)\u001b[0m\n\u001b[1;32m 889\u001b[0m \u001b[0mbatch_logs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'size'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mbatch_ids\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 890\u001b[0m \u001b[0mcallbacks\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mon_batch_begin\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mbatch_index\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbatch_logs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 891\u001b[0;31m \u001b[0mouts\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mins_batch\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 892\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mouts\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlist\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 893\u001b[0m \u001b[0mouts\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mouts\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/home/jhoward/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, inputs)\u001b[0m\n\u001b[1;32m 1941\u001b[0m \u001b[0msession\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_session\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1942\u001b[0m updated = session.run(self.outputs + [self.updates_op],\n\u001b[0;32m-> 1943\u001b[0;31m feed_dict=feed_dict)\n\u001b[0m\u001b[1;32m 1944\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mupdated\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moutputs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1945\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/home/jhoward/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py\u001b[0m in \u001b[0;36mrun\u001b[0;34m(self, fetches, feed_dict, options, run_metadata)\u001b[0m\n\u001b[1;32m 765\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 766\u001b[0m result = self._run(None, fetches, feed_dict, options_ptr,\n\u001b[0;32m--> 767\u001b[0;31m run_metadata_ptr)\n\u001b[0m\u001b[1;32m 768\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mrun_metadata\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 769\u001b[0m \u001b[0mproto_data\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtf_session\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mTF_GetBuffer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrun_metadata_ptr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/home/jhoward/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py\u001b[0m in \u001b[0;36m_run\u001b[0;34m(self, handle, fetches, feed_dict, options, run_metadata)\u001b[0m\n\u001b[1;32m 963\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mfinal_fetches\u001b[0m \u001b[0;32mor\u001b[0m \u001b[0mfinal_targets\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 964\u001b[0m results = self._do_run(handle, final_targets, final_fetches,\n\u001b[0;32m--> 965\u001b[0;31m feed_dict_string, options, run_metadata)\n\u001b[0m\u001b[1;32m 966\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 967\u001b[0m \u001b[0mresults\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/home/jhoward/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py\u001b[0m in \u001b[0;36m_do_run\u001b[0;34m(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)\u001b[0m\n\u001b[1;32m 1013\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mhandle\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1014\u001b[0m return self._do_call(_run_fn, self._session, feed_dict, fetch_list,\n\u001b[0;32m-> 1015\u001b[0;31m target_list, options, run_metadata)\n\u001b[0m\u001b[1;32m 1016\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1017\u001b[0m return self._do_call(_prun_fn, self._session, handle, feed_dict,\n", "\u001b[0;32m/home/jhoward/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py\u001b[0m in \u001b[0;36m_do_call\u001b[0;34m(self, fn, *args)\u001b[0m\n\u001b[1;32m 1020\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_do_call\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfn\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1021\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1022\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1023\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0merrors\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mOpError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1024\u001b[0m \u001b[0mmessage\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcompat\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mas_text\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0me\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmessage\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/home/jhoward/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py\u001b[0m in \u001b[0;36m_run_fn\u001b[0;34m(session, feed_dict, fetch_list, target_list, options, run_metadata)\u001b[0m\n\u001b[1;32m 1002\u001b[0m return tf_session.TF_Run(session, options,\n\u001b[1;32m 1003\u001b[0m \u001b[0mfeed_dict\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfetch_list\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtarget_list\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1004\u001b[0;31m status, run_metadata)\n\u001b[0m\u001b[1;32m 1005\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1006\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_prun_fn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msession\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mhandle\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfeed_dict\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfetch_list\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mKeyboardInterrupt\u001b[0m: " ] } ], "source": [ "hist=model.fit(input_train, np.expand_dims(labels_train,-1), \n", " validation_data=[input_test, np.expand_dims(labels_test,-1)], \n", " batch_size=64, **parms, nb_epoch=3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true, "scrolled": true }, "outputs": [], "source": [ "hist.history['val_loss']" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "To evaluate, we don't want to know what percentage of letters are correct but what percentage of words are." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "def eval_keras(input):\n", " preds = model.predict(input, batch_size=128)\n", " predict = np.argmax(preds, axis = 2)\n", " return (np.mean([all(real==p) for real, p in zip(labels_test, predict)]), predict)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "The accuracy isn't great." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "acc, preds = eval_keras(input_test); acc" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "def print_examples(preds):\n", " print(\"pronunciation\".ljust(40), \"real spelling\".ljust(17), \n", " \"model spelling\".ljust(17), \"is correct\")\n", "\n", " for index in range(20):\n", " ps = \"-\".join([phonemes[p] for p in input_test[index]]) \n", " real = [letters[l] for l in labels_test[index]] \n", " predict = [letters[l] for l in preds[index]]\n", " print (ps.split(\"-_\")[0].ljust(40), \"\".join(real).split(\"_\")[0].ljust(17),\n", " \"\".join(predict).split(\"_\")[0].ljust(17), str(real == predict))" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "We can see that sometimes the mistakes are completely reasonable, occasionally they're totally off. This tends to happen with the longer words that have large phoneme sequences.\n", "\n", "That's understandable; we'd expect larger sequences to lose more information in an encoding." ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pronunciation real spelling model spelling is correct\n", "OW1-SH-AA0-F oshaf ossaf False\n", "R-IH1-NG-K-AH0-L-D wrinkled rinkkld False\n", "D-IH0-T-EH1-K-SH-AH0-N detection detection True\n", "Y-AE1-S-IH0-N yassin yasin False\n", "AA1-P-HH-AY2-M opheim ophime False\n", "JH-AH1-S-K-OW0 jusco jusko False\n", "B-L-UW1-P-EH2-N-S-AH0-L-IH0-NG bluepencilling bluopensiinn False\n", "L-AE1-R-OW0 laroe larro False\n", "L-AO1-D-AH0-T-AO2-R-IY0 laudatory laudttrr False\n", "P-ER2-T-ER0-B-EY1-SH-AH0-N-Z perturbations perterbations False\n", "EH1-V-AH0-D-AH0-N-S-AH0-Z evidences evedences False\n", "M-AH0-K-AA1-N-V-IH2-L mcconville mcconvilll False\n", "F-AE1-S-T-S fasts fasts True\n", "M-IH0-S-D-IH0-R-EH1-K-T misdirect misderect False\n", "M-IH0-Z-UW1-L-AH0 missoula misula False\n", "K-L-IH1-R-Z clears cliars False\n", "P-R-AA1-S-T-AH0-T-UW2-T-S prostitutes prostitutes True\n", "R-EY1-Z-AH0-N rayson raion False\n", "S-T-R-OW1-L-D strolled strolldd False\n", "S-T-EY1-OW2-V-ER0-Z stayovers stayevers False\n" ] } ], "source": [ "print_examples(preds)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "heading_collapsed": true }, "source": [ "### Attention model" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "hidden": true }, "source": [ "This graph demonstrates the accuracy decay for a nueral translation task. With an encoding/decoding technique, larger input sequences result in less accuracy.\n", "\n", "\n", "\n", "This can be mitigated using an attentional model." ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [], "source": [ "import attention_wrapper; importlib.reload(attention_wrapper)\n", "from attention_wrapper import Attention" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "hidden": true }, "source": [ "The attentional model doesn't encode into a single vector, but rather a sequence of vectors. The decoder then at every point is passing through this sequence. For example, after the bi-directional RNN we have 16 vectors corresponding to each phoneme's output state. Each output state describes how each phoneme relates between the other phonemes before and after it. After going through more RNN's, our goal is to transform this sequence into a vector of length 15 so we can classify into characters. \n", "\n", "A smart way to take a weighted average of the 16 vectors for each of the 15 outputs, where each set of weights is unique to the output. For example, if character 1 only needs information from the first phoneme vector, that weight might be 1 and the others 0; if it needed information from the 1st and 2nd equally, those two might be 0.5 each.\n", "\n", "The weights for combining all the input states to produce specific outputs can be learned using an attentional model; we update the weights using SGD, and train it jointly with the encoder/decoder. Once we have the outputs, we can classify the character using softmax as usual." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "hidden": true }, "source": [ "Notice below we do not have an RNN that returns a flat vector as we did before; we have a sequence of vectors as desired. We can then pass a sequence of encoded states into the our custom Attention model.\n", "\n", "This attention model also uses a technique called teacher forcing; in addition to passing the encoded hidden state, we also pass the correct answer for the previous time period. We give this information to the model because it makes it easier to train. In the beginning of training, the model will get most things wrong, and if your earlier character predictions are wrong then your later ones will likely be as well. Teacher forcing allows the model to still learn how to predict later characters, even if the earlier characters were all wrong." ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true, "scrolled": false }, "outputs": [], "source": [ "inp = Input((maxlen_p,))\n", "inp_dec = Input((maxlen,))\n", "emb_dec = Embedding(output_vocab_size, 120)(inp_dec)\n", "emb_dec = Dense(dim)(emb_dec)\n", "\n", "x = Embedding(input_vocab_size, 120)(inp)\n", "x = Bidirectional(get_rnn())(x)\n", "x = get_rnn()(x)\n", "x = get_rnn()(x)\n", "x = Attention(get_rnn, 3)([x, emb_dec])\n", "x = TimeDistributed(Dense(output_vocab_size, activation='softmax'))(x)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "hidden": true }, "source": [ "We can now train, passing in the decoder inputs as well for teacher forcing." ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [], "source": [ "model = Model([inp, inp_dec], x)\n", "model.compile(Adam(), 'sparse_categorical_crossentropy', metrics=['acc'])" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\n" ] } ], "source": [ "hist=model.fit([input_train, dec_input_train], np.expand_dims(labels_train,-1), \n", " validation_data=[[input_test, dec_input_test], np.expand_dims(labels_test,-1)], \n", " batch_size=64, **parms, nb_epoch=3)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "[1.1220142268966762,\n", " 0.8048337527904541,\n", " 0.5265462693931019,\n", " 0.31194811385601234,\n", " 0.23548489567190725,\n", " 0.22613589595439379,\n", " 0.19347934096944011,\n", " 0.19403484622484754,\n", " 0.18044625614216103,\n", " 0.17990882804400168]" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hist.history['val_loss']" ] }, { "cell_type": "code", "execution_count": 998, "metadata": { "collapsed": true, "deletable": true, "editable": true, "hidden": true }, "outputs": [], "source": [ "K.set_value(model.optimizer.lr, 1e-4)" ] }, { "cell_type": "code", "execution_count": 999, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "hist=model.fit([input_train, dec_input_train], np.expand_dims(labels_train,-1), \n", " validation_data=[[input_test, dec_input_test], np.expand_dims(labels_test,-1)], \n", " batch_size=64, **parms, nb_epoch=5)" ] }, { "cell_type": "code", "execution_count": 1000, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [ { "data": { "text/plain": [ "array([ 0.1591, 0.1563, 0.1532, 0.1517, 0.1499])" ] }, "execution_count": 1000, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.array(hist.history['val_loss'])" ] }, { "cell_type": "code", "execution_count": 1001, "metadata": { "collapsed": true, "deletable": true, "editable": true, "hidden": true }, "outputs": [], "source": [ "def eval_keras():\n", " preds = model.predict([input_test, dec_input_test], batch_size=128)\n", " predict = np.argmax(preds, axis = 2)\n", " return (np.mean([all(real==p) for real, p in zip(labels_test, predict)]), predict)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "hidden": true }, "source": [ "Better accuracy!" ] }, { "cell_type": "code", "execution_count": 895, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [ { "data": { "text/plain": [ "0.51134154244977315" ] }, "execution_count": 895, "metadata": {}, "output_type": "execute_result" } ], "source": [ "acc, preds = eval_keras(); acc" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "hidden": true }, "source": [ "This model is certainly performing better with longer words. The mistakes it's making are reasonable, and it even succesfully formed the word \"partisanship\"." ] }, { "cell_type": "code", "execution_count": 896, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pronunciation real spelling model spelling is correct\n", "D-AY0-AE1-F-AH0-N-IH0-S diaphanous diaphoneus False\n", "T-IH1-R-IY0 teary tiary False\n", "M-IH1-T-AH0-N mittan mitton False\n", "P-AA1-R-N-AH0-S parness parnass False\n", "SH-UW1-ER0-M-AH0-N schuermann schuerman False\n", "P-AA1-R-T-AH0-Z-AH0-N-SH-IH2-P partisanship partisanship True\n", "AE1-SH-IH0-Z ashes ashes True\n", "S-P-IY1-K speak speek False\n", "K-AO1-R-B-IH0-T corbitt corbit False\n", "B-IH1-L-D-ER0-Z builders biilders False\n", "EH0-S-P-R-IY1 esprit espree False\n", "EH0-NG-G-EH0-L-B-EH1-R-T-AH0 engelberta engelberta True\n", "AE0-N-T-AY2-P-ER0-S-AH0-N-EH1-L antipersonnel antipersoneal False\n", "W-IH1-M-Z whims whimm False\n", "AO2-L-T-AH0-G-EH1-DH-ER0 altogether altagether False\n", "AE1-K-S-AH0-L-Z axles axees False\n", "D-UW1-S doose doose True\n", "B-R-AH0-Z-IH1-L-Y-AH0-N brazilian brazilian True\n", "G-R-EH1-N-Z grenz grenz True\n", "B-AH1-K-HH-AE2-N-T-S buckhantz buckhants False\n" ] } ], "source": [ "print(\"pronunciation\".ljust(40), \"real spelling\".ljust(17), \n", " \"model spelling\".ljust(17), \"is correct\")\n", "\n", "for index in range(20):\n", " ps = \"-\".join([phonemes[p] for p in input_test[index]]) \n", " real = [letters[l] for l in labels_test[index]] \n", " predict = [letters[l] for l in preds[index]]\n", " print (ps.split(\"-_\")[0].ljust(40), \"\".join(real).split(\"_\")[0].ljust(17),\n", " \"\".join(predict).split(\"_\")[0].ljust(17), str(real == predict))" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "deletable": true, "editable": true, "heading_collapsed": true }, "source": [ "## Test code for the attention layer" ] }, { "cell_type": "code", "execution_count": 301, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [], "source": [ "nb_samples, nb_time, input_dim, output_dim = (64, 4, 32, 48)" ] }, { "cell_type": "code", "execution_count": 302, "metadata": { "collapsed": true, "deletable": true, "editable": true, "hidden": true }, "outputs": [], "source": [ "x = tf.placeholder(np.float32, (nb_samples, nb_time, input_dim))" ] }, { "cell_type": "code", "execution_count": 303, "metadata": { "collapsed": true, "deletable": true, "editable": true, "hidden": true }, "outputs": [], "source": [ "xr = K.reshape(x,(-1,nb_time,1,input_dim))" ] }, { "cell_type": "code", "execution_count": 304, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [ { "data": { "text/plain": [ "TensorShape([Dimension(32), Dimension(32)])" ] }, "execution_count": 304, "metadata": {}, "output_type": "execute_result" } ], "source": [ "W1 = tf.placeholder(np.float32, (input_dim, input_dim)); W1.shape" ] }, { "cell_type": "code", "execution_count": 305, "metadata": { "collapsed": true, "deletable": true, "editable": true, "hidden": true }, "outputs": [], "source": [ "W1r = K.reshape(W1, (1, input_dim, input_dim))" ] }, { "cell_type": "code", "execution_count": 306, "metadata": { "collapsed": true, "deletable": true, "editable": true, "hidden": true }, "outputs": [], "source": [ "W1r2 = K.reshape(W1, (1, 1, input_dim, input_dim))" ] }, { "cell_type": "code", "execution_count": 307, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [ { "data": { "text/plain": [ "TensorShape([Dimension(64), Dimension(4), Dimension(32)])" ] }, "execution_count": 307, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xW1 = K.conv1d(x,W1r,border_mode='same'); xW1.shape" ] }, { "cell_type": "code", "execution_count": 308, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [ { "data": { "text/plain": [ "TensorShape([Dimension(64), Dimension(4), Dimension(1), Dimension(32)])" ] }, "execution_count": 308, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xW12 = K.conv2d(xr,W1r2,border_mode='same'); xW12.shape" ] }, { "cell_type": "code", "execution_count": 251, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [], "source": [ "xW2 = K.dot(x, W1)" ] }, { "cell_type": "code", "execution_count": 245, "metadata": { "collapsed": true, "deletable": true, "editable": true, "hidden": true }, "outputs": [], "source": [ "x1 = np.random.normal(size=(nb_samples, nb_time, input_dim))" ] }, { "cell_type": "code", "execution_count": 246, "metadata": { "collapsed": true, "deletable": true, "editable": true, "hidden": true }, "outputs": [], "source": [ "w1 = np.random.normal(size=(input_dim, input_dim))" ] }, { "cell_type": "code", "execution_count": 248, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [], "source": [ "res = sess.run(xW1, {x:x1, W1:w1})" ] }, { "cell_type": "code", "execution_count": 252, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [], "source": [ "res2 = sess.run(xW2, {x:x1, W1:w1})" ] }, { "cell_type": "code", "execution_count": 253, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 253, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.allclose(res, res2)" ] }, { "cell_type": "code", "execution_count": 283, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [ { "data": { "text/plain": [ "TensorShape([Dimension(48), Dimension(32)])" ] }, "execution_count": 283, "metadata": {}, "output_type": "execute_result" } ], "source": [ "W2 = tf.placeholder(np.float32, (output_dim, input_dim)); W2.shape" ] }, { "cell_type": "code", "execution_count": 295, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [], "source": [ "h = tf.placeholder(np.float32, (nb_samples, output_dim))" ] }, { "cell_type": "code", "execution_count": 296, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [ { "data": { "text/plain": [ "TensorShape([Dimension(64), Dimension(32)])" ] }, "execution_count": 296, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hW2 = K.dot(h,W2); hW2.shape" ] }, { "cell_type": "code", "execution_count": 297, "metadata": { "collapsed": false, "deletable": true, "editable": true, "hidden": true }, "outputs": [ { "data": { "text/plain": [ "TensorShape([Dimension(64), Dimension(1), Dimension(1), Dimension(32)])" ] }, "execution_count": 297, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hW2 = K.reshape(hW2,(-1,1,1,input_dim)); hW2.shape" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 0 }