{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "**Important: This notebook will only work with fastai-0.7.x. Do not try to run any fastai-1.x code from this path in the repository because it will load fastai-0.7.x**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "%reload_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Please note that this notebook is likely to leave a stuck process behind. If you run it, restart your Jupyter notebook as soon as it has finished running.\n", "\n", "The bug happens inside the `fastText` library, which we have no control over. You can check the status of this issue [here](https://github.com/fastai/fastai/issues/754) and [here](https://github.com/facebookresearch/fastText/issues/618#issuecomment-419554225).\n", "\n", "For future reference, there are three separate implementations of fastText; one of the others may work:\n", "\n", "- https://github.com/facebookresearch/fastText/tree/master/python\n", "- https://pypi.org/project/fasttext/\n", "- https://radimrehurek.com/gensim/models/fasttext.html#module-gensim.models.fasttext" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Translation files" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from fastai.text import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "French/English parallel texts from the [WMT15 translation task](http://www.statmt.org/wmt15/translation-task.html). The corpus was created by Chris Callison-Burch, who crawled millions of web pages and then used *a set of simple heuristics to transform French URLs onto English URLs (i.e. replacing \"fr\" with \"en\" and about 40 other hand-written rules), and assume that these documents are translations of each other*." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "PATH = Path('data/translate')\n", "TMP_PATH = PATH/'tmp'\n", "TMP_PATH.mkdir(exist_ok=True)\n", "fname='giga-fren.release2.fixed'\n", "en_fname = PATH/f'{fname}.en'\n", "fr_fname = PATH/f'{fname}.fr'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "re_eq = re.compile('^(Wh[^?.!]+\\?)')\n", "re_fq = re.compile('^([^?.!]+\\?)')\n", "\n", "lines = ((re_eq.search(eq), re_fq.search(fq)) \n", " for eq, fq in zip(open(en_fname, encoding='utf-8'), open(fr_fname, encoding='utf-8')))\n", "\n", "qs = [(e.group(), f.group()) for e,f in lines if e and f]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pickle.dump(qs, (PATH/'fr-en-qs.pkl').open('wb'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "qs = pickle.load((PATH/'fr-en-qs.pkl').open('rb'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "([('What is light ?', 'Qu’est-ce que la lumière?'),\n", " ('Who are we?', 'Où sommes-nous?'),\n", " ('Where did we come from?', \"D'où venons-nous?\"),\n", " ('What would we do without it?', 'Que ferions-nous sans elle ?'),\n", " ('What is the absolute location (latitude and longitude) of Badger, Newfoundland and Labrador?',\n", " 'Quelle sont les coordonnées (latitude et longitude) de Badger, à Terre-Neuve-etLabrador?')],\n", " 52331)" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qs[:5], len(qs)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "en_qs,fr_qs = zip(*qs)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "en_tok = Tokenizer.proc_all_mp(partition_by_cores(en_qs))" ] }, { "cell_type": "code", "execution_count": null, "metadata": 
{}, "outputs": [], "source": [ "fr_tok = Tokenizer.proc_all_mp(partition_by_cores(fr_qs), 'fr')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(['what', 'is', 'light', '?'],\n", " ['qu’', 'est', '-ce', 'que', 'la', 'lumière', '?'])" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "en_tok[0], fr_tok[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(23.0, 28.0)" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.percentile([len(o) for o in en_tok], 90), np.percentile([len(o) for o in fr_tok], 90)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "keep = np.array([len(o)<30 for o in en_tok])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "en_tok = np.array(en_tok)[keep]\n", "fr_tok = np.array(fr_tok)[keep]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pickle.dump(en_tok, (PATH/'en_tok.pkl').open('wb'))\n", "pickle.dump(fr_tok, (PATH/'fr_tok.pkl').open('wb'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "en_tok = pickle.load((PATH/'en_tok.pkl').open('rb'))\n", "fr_tok = pickle.load((PATH/'fr_tok.pkl').open('rb'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def toks2ids(tok,pre):\n", " freq = Counter(p for o in tok for p in o)\n", " itos = [o for o,c in freq.most_common(40000)]\n", " itos.insert(0, '_bos_')\n", " itos.insert(1, '_pad_')\n", " itos.insert(2, '_eos_')\n", " itos.insert(3, '_unk')\n", " stoi = collections.defaultdict(lambda: 3, {v:k for k,v in enumerate(itos)})\n", " ids = np.array([([stoi[o] for o in p] + [2]) for p in tok])\n", " np.save(TMP_PATH/f'{pre}_ids.npy', ids)\n", " 
pickle.dump(itos, open(TMP_PATH/f'{pre}_itos.pkl', 'wb'))\n", " return ids,itos,stoi" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "en_ids,en_itos,en_stoi = toks2ids(en_tok,'en')\n", "fr_ids,fr_itos,fr_stoi = toks2ids(fr_tok,'fr')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def load_ids(pre):\n", " ids = np.load(TMP_PATH/f'{pre}_ids.npy')\n", " itos = pickle.load(open(TMP_PATH/f'{pre}_itos.pkl', 'rb'))\n", " stoi = collections.defaultdict(lambda: 3, {v:k for k,v in enumerate(itos)})\n", " return ids,itos,stoi" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "en_ids,en_itos,en_stoi = load_ids('en')\n", "fr_ids,fr_itos,fr_stoi = load_ids('fr')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(['qu’', 'est', '-ce', 'que', 'la', 'lumière', '?', '_eos_'], 17573, 24793)" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[fr_itos[o] for o in fr_ids[0]], len(en_itos), len(fr_itos)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Word vectors" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "fasttext word vectors available from https://fasttext.cc/docs/en/english-vectors.html" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# ! pip install git+https://github.com/facebookresearch/fastText.git" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import fastText as ft" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To use the fastText library, you'll need to download [fasttext word vectors](https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md) for your language (download the 'bin plus text' ones)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "en_vecs = ft.load_model(str((PATH/'wiki.en.bin')))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fr_vecs = ft.load_model(str((PATH/'wiki.fr.bin')))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_vecs(lang, ft_vecs):\n", " vecd = {w:ft_vecs.get_word_vector(w) for w in ft_vecs.get_words()}\n", " pickle.dump(vecd, open(PATH/f'wiki.{lang}.pkl','wb'))\n", " return vecd" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "en_vecd = get_vecs('en', en_vecs)\n", "fr_vecd = get_vecs('fr', fr_vecs)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "en_vecd = pickle.load(open(PATH/'wiki.en.pkl','rb'))\n", "fr_vecd = pickle.load(open(PATH/'wiki.fr.pkl','rb'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ft_words = en_vecs.get_words(include_freq=True)\n", "ft_word_dict = {k:v for k,v in zip(*ft_words)}\n", "ft_words = sorted(ft_word_dict.keys(), key=lambda x: ft_word_dict[x])\n", "\n", "len(ft_words)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(300, 300)" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dim_en_vec = len(en_vecd[','])\n", "dim_fr_vec = len(fr_vecd[','])\n", "dim_en_vec,dim_fr_vec" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0.0075652334, 0.29283327)" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "en_vecs = np.stack(list(en_vecd.values()))\n", "en_vecs.mean(),en_vecs.std()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model data" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(29, 33)" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "enlen_90 = int(np.percentile([len(o) for o in en_ids], 99))\n", "frlen_90 = int(np.percentile([len(o) for o in fr_ids], 97))\n", "enlen_90,frlen_90" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "en_ids_tr = np.array([o[:enlen_90] for o in en_ids])\n", "fr_ids_tr = np.array([o[:frlen_90] for o in fr_ids])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Seq2SeqDataset(Dataset):\n", " def __init__(self, x, y): self.x,self.y = x,y\n", " def __getitem__(self, idx): return A(self.x[idx], self.y[idx])\n", " def __len__(self): return len(self.x)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(45219, 5041)" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.random.seed(42)\n", "trn_keep = np.random.rand(len(en_ids_tr))>0.1\n", "en_trn,fr_trn = en_ids_tr[trn_keep],fr_ids_tr[trn_keep]\n", "en_val,fr_val = en_ids_tr[~trn_keep],fr_ids_tr[~trn_keep]\n", "len(en_trn),len(en_val)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "trn_ds = Seq2SeqDataset(fr_trn,en_trn)\n", "val_ds = Seq2SeqDataset(fr_val,en_val)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bs=125" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "trn_samp = SortishSampler(en_trn, key=lambda x: len(en_trn[x]), bs=bs)\n", "val_samp = SortSampler(en_val, key=lambda x: len(en_val[x]))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "trn_dl = DataLoader(trn_ds, bs, transpose=True, transpose_y=True, num_workers=1, \n", " 
pad_idx=1, pre_pad=False, sampler=trn_samp)\n", "val_dl = DataLoader(val_ds, int(bs*1.6), transpose=True, transpose_y=True, num_workers=1, \n", " pad_idx=1, pre_pad=False, sampler=val_samp)\n", "md = ModelData(PATH, trn_dl, val_dl)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(33, 29), (21, 7), (21, 8), (33, 13), (33, 21)]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "it = iter(trn_dl)\n", "its = [next(it) for i in range(5)]\n", "[(len(x),len(y)) for x,y in its]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initial model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def create_emb(vecs, itos, em_sz):\n", " emb = nn.Embedding(len(itos), em_sz, padding_idx=1)\n", " wgts = emb.weight.data\n", " miss = []\n", " for i,w in enumerate(itos):\n", " try: wgts[i] = torch.from_numpy(vecs[w]*3)\n", " except: miss.append(w)\n", " print(len(miss),miss[5:10])\n", " return emb" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nh,nl = 256,2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Seq2SeqRNN(nn.Module):\n", " def __init__(self, vecs_enc, itos_enc, em_sz_enc, vecs_dec, itos_dec, em_sz_dec, nh, out_sl, nl=2):\n", " super().__init__()\n", " self.nl,self.nh,self.out_sl = nl,nh,out_sl\n", " self.emb_enc = create_emb(vecs_enc, itos_enc, em_sz_enc)\n", " self.emb_enc_drop = nn.Dropout(0.15)\n", " self.gru_enc = nn.GRU(em_sz_enc, nh, num_layers=nl, dropout=0.25)\n", " self.out_enc = nn.Linear(nh, em_sz_dec, bias=False)\n", " \n", " self.emb_dec = create_emb(vecs_dec, itos_dec, em_sz_dec)\n", " self.gru_dec = nn.GRU(em_sz_dec, em_sz_dec, num_layers=nl, dropout=0.1)\n", " self.out_drop = nn.Dropout(0.35)\n", " self.out = nn.Linear(em_sz_dec, len(itos_dec))\n", " self.out.weight.data = 
self.emb_dec.weight.data\n", " \n", " def forward(self, inp):\n", " sl,bs = inp.size()\n", " h = self.initHidden(bs)\n", " emb = self.emb_enc_drop(self.emb_enc(inp))\n", " enc_out, h = self.gru_enc(emb, h)\n", " h = self.out_enc(h)\n", "\n", " dec_inp = V(torch.zeros(bs).long())\n", " res = []\n", " for i in range(self.out_sl):\n", " emb = self.emb_dec(dec_inp).unsqueeze(0)\n", " outp, h = self.gru_dec(emb, h)\n", " outp = self.out(self.out_drop(outp[0]))\n", " res.append(outp)\n", " dec_inp = V(outp.data.max(1)[1])\n", " if (dec_inp==1).all(): break\n", " return torch.stack(res)\n", " \n", " def initHidden(self, bs): return V(torch.zeros(self.nl, bs, self.nh))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def seq2seq_loss(input, target):\n", " sl,bs = target.size()\n", " sl_in,bs_in,nc = input.size()\n", " if sl>sl_in: input = F.pad(input, (0,0,0,0,0,sl-sl_in))\n", " input = input[:sl]\n", " return F.cross_entropy(input.view(-1,nc), target.view(-1))#, ignore_index=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "opt_fn = partial(optim.Adam, betas=(0.8, 0.99))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3097 ['l’', \"d'\", 't_up', 'd’', \"qu'\"]\n", "1285 [\"'s\", '’s', \"n't\", 'n’t', ':']\n" ] } ], "source": [ "rnn = Seq2SeqRNN(fr_vecd, fr_itos, dim_fr_vec, en_vecd, en_itos, dim_en_vec, nh, enlen_90)\n", "learn = RNN_Learner(md, SingleModel(to_gpu(rnn)), opt_fn=opt_fn)\n", "learn.crit = seq2seq_loss" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "93120e170d0f45dbbc0e41fc792709dc", "version_major": 2, "version_minor": 0 }, "text/plain": [ "A Jupyter Widget" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ 
" 16%|█▋ | 62/377 [00:09<00:47, 6.60it/s, loss=11.4] \n", " 17%|█▋ | 64/377 [00:09<00:47, 6.62it/s, loss=11.2]" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Exception in thread Thread-242:\n", "Traceback (most recent call last):\n", " File \"/home/jhoward/anaconda3/lib/python3.6/threading.py\", line 916, in _bootstrap_inner\n", " self.run()\n", " File \"/home/jhoward/anaconda3/lib/python3.6/site-packages/tqdm/_tqdm.py\", line 144, in run\n", " for instance in self.tqdm_cls._instances:\n", " File \"/home/jhoward/anaconda3/lib/python3.6/_weakrefset.py\", line 60, in __iter__\n", " for itemref in self.data:\n", "RuntimeError: Set changed size during iteration\n", "\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " 70%|███████ | 265/377 [00:39<00:16, 6.64it/s, loss=30] \n" ] }, { "data": { "text/plain": [ "[Figure: learning rate finder plot, loss vs. learning rate]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "learn.lr_find()\n", "learn.sched.plot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "lr=3e-3" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "26801451fa214563aaef1d748a892f20", "version_major": 2, "version_minor": 0 }, "text/plain": [ "A Jupyter Widget" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ " 6%|▌ | 22/377 [00:04<01:06, 5.34it/s, loss=10.8] \n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Exception in thread Thread-20:\n", "Traceback (most recent call last):\n", " File \"/home/jhoward/anaconda3/lib/python3.6/threading.py\", line 916, in _bootstrap_inner\n", " self.run()\n", " File \"/home/jhoward/anaconda3/lib/python3.6/site-packages/tqdm/_tqdm.py\", line 144, in run\n", " for instance in self.tqdm_cls._instances:\n", " File \"/home/jhoward/anaconda3/lib/python3.6/_weakrefset.py\", line 60, in __iter__\n", " for itemref in self.data:\n", "RuntimeError: Set changed size during 
iteration\n", "\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "epoch trn_loss val_loss \n", " 0 5.48978 5.462648 \n", " 1 4.616437 4.770539 \n", " 2 4.345884 4.37726 \n", " 3 3.857125 4.136014 \n", " 4 3.612306 3.941867 \n", " 5 3.375064 3.839872 \n", " 6 3.383987 3.708972 \n", " 7 3.224772 3.664173 \n", " 8 3.238523 3.604765 \n", " 9 2.962041 3.587814 \n", " 10 2.96163 3.574888 \n", " 11 2.866477 3.581224 \n", "\n" ] }, { "data": { "text/plain": [ "[3.5812237]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "learn.fit(lr, 1, cycle_len=12, use_clr=(20,10))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.save('initial')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.load('initial')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Test" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "quels facteurs pourraient influer sur le choix de leur emplacement ? _eos_\n", "what factors influencetheir location ? _eos_\n", "what factors might might influence on the their ? ? _eos_\n", "\n", "qu’ est -ce qui ne peut pas changer ? _eos_\n", "what can not change ? _eos_\n", "what not change change ? _eos_\n", "\n", "que faites - vous ? _eos_\n", "what do you do ? _eos_\n", "what do you do ? _eos_\n", "\n", "qui réglemente les pylônes d' antennes ? _eos_\n", "who regulates antenna towers ? _eos_\n", "who regulates the doors doors ? _eos_\n", "\n", "où sont - ils situés ? _eos_\n", "where are they located ? _eos_\n", "where are the located ? _eos_\n", "\n", "quelles sont leurs compétences ? _eos_\n", "what are their qualifications ? _eos_\n", "what are their skills ? _eos_\n", "\n", "qui est victime de harcèlement sexuel ? _eos_\n", "who experiences sexual harassment ? 
_eos_\n", "who is victim sexual sexual ? ? _eos_\n", "\n", "quelles sont les personnes qui visitent les communautés autochtones ? _eos_\n", "who visits indigenous communities ? _eos_\n", "who are people people aboriginal aboriginal ? _eos_\n", "\n", "pourquoi ces trois points en particulier ? _eos_\n", "why these specific three ? _eos_\n", "why are these two different ? ? _eos_\n", "\n", "pourquoi ou pourquoi pas ? _eos_\n", "why or why not ? _eos_\n", "why or why not _eos_\n", "\n" ] } ], "source": [ "x,y = next(iter(val_dl))\n", "probs = learn.model(V(x))\n", "preds = to_np(probs.max(2)[1])\n", "\n", "for i in range(180,190):\n", " print(' '.join([fr_itos[o] for o in x[:,i] if o != 1]))\n", " print(' '.join([en_itos[o] for o in y[:,i] if o != 1]))\n", " print(' '.join([en_itos[o] for o in preds[:,i] if o!=1]))\n", " print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bidir" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Seq2SeqRNN_Bidir(nn.Module):\n", " def __init__(self, vecs_enc, itos_enc, em_sz_enc, vecs_dec, itos_dec, em_sz_dec, nh, out_sl, nl=2):\n", " super().__init__()\n", " self.emb_enc = create_emb(vecs_enc, itos_enc, em_sz_enc)\n", " self.nl,self.nh,self.out_sl = nl,nh,out_sl\n", " self.gru_enc = nn.GRU(em_sz_enc, nh, num_layers=nl, dropout=0.25, bidirectional=True)\n", " self.out_enc = nn.Linear(nh*2, em_sz_dec, bias=False)\n", " self.drop_enc = nn.Dropout(0.05)\n", " self.emb_dec = create_emb(vecs_dec, itos_dec, em_sz_dec)\n", " self.gru_dec = nn.GRU(em_sz_dec, em_sz_dec, num_layers=nl, dropout=0.1)\n", " self.emb_enc_drop = nn.Dropout(0.15)\n", " self.out_drop = nn.Dropout(0.35)\n", " self.out = nn.Linear(em_sz_dec, len(itos_dec))\n", " self.out.weight.data = self.emb_dec.weight.data\n", " \n", " def forward(self, inp):\n", " sl,bs = inp.size()\n", " h = self.initHidden(bs)\n", " emb = self.emb_enc_drop(self.emb_enc(inp))\n", " enc_out, h = self.gru_enc(emb, h)\n", " h = 
h.view(2,2,bs,-1).permute(0,2,1,3).contiguous().view(2,bs,-1)\n", " h = self.out_enc(self.drop_enc(h))\n", "\n", " dec_inp = V(torch.zeros(bs).long())\n", " res = []\n", " for i in range(self.out_sl):\n", " emb = self.emb_dec(dec_inp).unsqueeze(0)\n", " outp, h = self.gru_dec(emb, h)\n", " outp = self.out(self.out_drop(outp[0]))\n", " res.append(outp)\n", " dec_inp = V(outp.data.max(1)[1])\n", " if (dec_inp==1).all(): break\n", " return torch.stack(res)\n", " \n", " def initHidden(self, bs): return V(torch.zeros(self.nl*2, bs, self.nh))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rnn = Seq2SeqRNN_Bidir(fr_vecd, fr_itos, dim_fr_vec, en_vecd, en_itos, dim_en_vec, nh, enlen_90)\n", "learn = RNN_Learner(md, SingleModel(to_gpu(rnn)), opt_fn=opt_fn)\n", "learn.crit = seq2seq_loss" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "311a44c5e45644b49728f8a4105e89ed", "version_major": 2, "version_minor": 0 }, "text/plain": [ "A Jupyter Widget" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "epoch trn_loss val_loss \n", " 0 4.896942 4.761351 \n", " 1 4.323335 4.260878 \n", " 2 3.962747 4.06161 \n", " 3 3.596254 3.940087 \n", " 4 3.432788 3.944787 \n", " 5 3.310895 3.686629 \n", " 6 3.454976 3.638168 \n", " 7 3.093827 3.588456 \n", " 8 3.257495 3.610536 \n", " 9 3.033345 3.540344 \n", " 10 2.967694 3.516766 \n", " 11 2.718945 3.513977 \n", "\n" ] }, { "data": { "text/plain": [ "[3.5139771]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "learn.fit(lr, 1, cycle_len=12, use_clr=(20,10))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.save('bidir')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Teacher forcing" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Seq2SeqStepper(Stepper):\n", " def step(self, xs, y, epoch):\n", " self.m.pr_force = (10-epoch)*0.1 if epoch<10 else 0\n", " xtra = []\n", " output = self.m(*xs, y)\n", " if isinstance(output,tuple): output,*xtra = output\n", " self.opt.zero_grad()\n", " loss = raw_loss = self.crit(output, y)\n", " if self.reg_fn: loss = self.reg_fn(output, xtra, raw_loss)\n", " loss.backward()\n", " if self.clip: # Gradient clipping\n", " nn.utils.clip_grad_norm(trainable_params_(self.m), self.clip)\n", " self.opt.step()\n", " return raw_loss.data[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Seq2SeqRNN_TeacherForcing(nn.Module):\n", " def __init__(self, vecs_enc, itos_enc, em_sz_enc, vecs_dec, itos_dec, em_sz_dec, nh, out_sl, nl=2):\n", " super().__init__()\n", " self.emb_enc = create_emb(vecs_enc, itos_enc, em_sz_enc)\n", " self.nl,self.nh,self.out_sl = nl,nh,out_sl\n", " self.gru_enc = nn.GRU(em_sz_enc, nh, num_layers=nl, dropout=0.25)\n", " self.out_enc = nn.Linear(nh, em_sz_dec, bias=False)\n", " self.emb_dec = create_emb(vecs_dec, itos_dec, em_sz_dec)\n", " self.gru_dec = nn.GRU(em_sz_dec, em_sz_dec, num_layers=nl, dropout=0.1)\n", " self.emb_enc_drop = nn.Dropout(0.15)\n", " self.out_drop = nn.Dropout(0.35)\n", " self.out = nn.Linear(em_sz_dec, len(itos_dec))\n", " self.out.weight.data = self.emb_dec.weight.data\n", " self.pr_force = 1.\n", " \n", " def forward(self, inp, y=None):\n", " sl,bs = inp.size()\n", " h = self.initHidden(bs)\n", " emb = self.emb_enc_drop(self.emb_enc(inp))\n", " enc_out, h = self.gru_enc(emb, h)\n", " h = self.out_enc(h)\n", "\n", " dec_inp = V(torch.zeros(bs).long())\n", " res = []\n", " for i in range(self.out_sl):\n", " emb = self.emb_dec(dec_inp).unsqueeze(0)\n", " outp, h = self.gru_dec(emb, h)\n", " outp = self.out(self.out_drop(outp[0]))\n", " res.append(outp)\n", " dec_inp = V(outp.data.max(1)[1])\n", " if 
(dec_inp==1).all(): break\n", " if (y is not None) and (random.random()<self.pr_force):\n", " if i>=len(y): break\n", " dec_inp = y[i]\n", " return torch.stack(res)\n", " \n", " def initHidden(self, bs): return V(torch.zeros(self.nl, bs, self.nh))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rnn = Seq2SeqRNN_TeacherForcing(fr_vecd, fr_itos, dim_fr_vec, en_vecd, en_itos, dim_en_vec, nh, enlen_90)\n", "learn = RNN_Learner(md, SingleModel(to_gpu(rnn)), opt_fn=opt_fn)\n", "learn.crit = seq2seq_loss" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "fdd91dafb9c94ffc8eda007de8d0dabd", "version_major": 2, "version_minor": 0 }, "text/plain": [ "A Jupyter Widget" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "epoch trn_loss val_loss \n", " 0 4.460622 12.661013 \n", " 1 3.468132 7.138729 \n", " 2 3.235244 6.202878 \n", " 3 3.101616 5.454283 \n", " 4 3.135989 4.823736 \n", " 5 2.980696 4.933402 \n", " 6 2.91562 4.287475 \n", " 7 3.032661 3.975346 \n", " 8 3.103834 3.790773 \n", " 9 3.121457 3.578682 \n", " 10 2.917534 3.532427 \n", " 11 3.326946 3.490643 \n", "\n" ] }, { "data": { "text/plain": [ "[3.490643]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "learn.fit(lr, 1, cycle_len=12, use_clr=(20,10), stepper=Seq2SeqStepper)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.save('forcing')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Attentional model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def rand_t(*sz): return torch.randn(sz)/math.sqrt(sz[0])\n", "def rand_p(*sz): return nn.Parameter(rand_t(*sz))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class 
Seq2SeqAttnRNN(nn.Module):\n", " def __init__(self, vecs_enc, itos_enc, em_sz_enc, vecs_dec, itos_dec, em_sz_dec, nh, out_sl, nl=2):\n", " super().__init__()\n", " self.emb_enc = create_emb(vecs_enc, itos_enc, em_sz_enc)\n", " self.nl,self.nh,self.out_sl = nl,nh,out_sl\n", " self.gru_enc = nn.GRU(em_sz_enc, nh, num_layers=nl, dropout=0.25)\n", " self.out_enc = nn.Linear(nh, em_sz_dec, bias=False)\n", " self.emb_dec = create_emb(vecs_dec, itos_dec, em_sz_dec)\n", " self.gru_dec = nn.GRU(em_sz_dec, em_sz_dec, num_layers=nl, dropout=0.1)\n", " self.emb_enc_drop = nn.Dropout(0.15)\n", " self.out_drop = nn.Dropout(0.35)\n", " self.out = nn.Linear(em_sz_dec, len(itos_dec))\n", " self.out.weight.data = self.emb_dec.weight.data\n", "\n", " self.W1 = rand_p(nh, em_sz_dec)\n", " self.l2 = nn.Linear(em_sz_dec, em_sz_dec)\n", " self.l3 = nn.Linear(em_sz_dec+nh, em_sz_dec)\n", " self.V = rand_p(em_sz_dec)\n", "\n", " def forward(self, inp, y=None, ret_attn=False):\n", " sl,bs = inp.size()\n", " h = self.initHidden(bs)\n", " emb = self.emb_enc_drop(self.emb_enc(inp))\n", " enc_out, h = self.gru_enc(emb, h)\n", " h = self.out_enc(h)\n", "\n", " dec_inp = V(torch.zeros(bs).long())\n", " res,attns = [],[]\n", " w1e = enc_out @ self.W1\n", " for i in range(self.out_sl):\n", " w2h = self.l2(h[-1])\n", " u = F.tanh(w1e + w2h)\n", " a = F.softmax(u @ self.V, 0)\n", " attns.append(a)\n", " Xa = (a.unsqueeze(2) * enc_out).sum(0)\n", " emb = self.emb_dec(dec_inp)\n", " wgt_enc = self.l3(torch.cat([emb, Xa], 1))\n", " \n", " outp, h = self.gru_dec(wgt_enc.unsqueeze(0), h)\n", " outp = self.out(self.out_drop(outp[0]))\n", " res.append(outp)\n", " dec_inp = V(outp.data.max(1)[1])\n", " if (dec_inp==1).all(): break\n", " if (y is not None) and (random.random()<self.pr_force):\n", " if i>=len(y): break\n", " dec_inp = y[i]\n", "\n", " res = torch.stack(res)\n", " if ret_attn: res = res,torch.stack(attns)\n", " return res\n", "\n", " def initHidden(self, bs): return V(torch.zeros(self.nl, bs, self.nh))" ] }, { 
"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rnn = Seq2SeqAttnRNN(fr_vecd, fr_itos, dim_fr_vec, en_vecd, en_itos, dim_en_vec, nh, enlen_90)\n", "learn = RNN_Learner(md, SingleModel(to_gpu(rnn)), opt_fn=opt_fn)\n", "learn.crit = seq2seq_loss" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "lr=2e-3" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "5eeb660d3184439191ed4390415a1241", "version_major": 2, "version_minor": 0 }, "text/plain": [ "A Jupyter Widget" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "epoch trn_loss val_loss \n", " 0 3.882168 11.125291 \n", " 1 3.599992 6.667136 \n", " 2 3.236066 5.552943 \n", " 3 3.050283 4.919096 \n", " 4 2.99024 4.500383 \n", " 5 3.07999 4.000295 \n", " 6 2.891087 4.024115 \n", " 7 2.854725 3.673913 \n", " 8 2.979285 3.590668 \n", " 9 3.109851 3.459867 \n", " 10 2.92878 3.517598 \n", " 11 2.778292 3.390253 \n", " 12 2.795427 3.388423 \n", " 13 2.809757 3.353334 \n", " 14 2.6723 3.368584 \n", "\n" ] }, { "data": { "text/plain": [ "[3.3685837]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "learn.fit(lr, 1, cycle_len=15, use_clr=(20,10), stepper=Seq2SeqStepper)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.save('attn')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.load('attn')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Test" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x,y = next(iter(val_dl))\n", "probs,attns = learn.model(V(x),ret_attn=True)\n", "preds = to_np(probs.max(2)[1])" ] }, { "cell_type": "code", "execution_count": null, 
"metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "quels facteurs pourraient influer sur le choix de leur emplacement ? _eos_\n", "what factors influencetheir location ? _eos_\n", "what factors might influence the their their their ? _eos_\n", "\n", "qu’ est -ce qui ne peut pas changer ? _eos_\n", "what can not change ? _eos_\n", "what can not change change ? _eos_\n", "\n", "que faites - vous ? _eos_\n", "what do you do ? _eos_\n", "what do you do ? _eos_\n", "\n", "qui réglemente les pylônes d' antennes ? _eos_\n", "who regulates antenna towers ? _eos_\n", "who regulates the lights ? ? _eos_\n", "\n", "où sont - ils situés ? _eos_\n", "where are they located ? _eos_\n", "where are they located ? _eos_\n", "\n", "quelles sont leurs compétences ? _eos_\n", "what are their qualifications ? _eos_\n", "what are their skills ? _eos_\n", "\n", "qui est victime de harcèlement sexuel ? _eos_\n", "who experiences sexual harassment ? _eos_\n", "who is victim sexual sexual ? _eos_\n", "\n", "quelles sont les personnes qui visitent les communautés autochtones ? _eos_\n", "who visits indigenous communities ? _eos_\n", "who is people people aboriginal people ? _eos_\n", "\n", "pourquoi ces trois points en particulier ? _eos_\n", "why these specific three ? _eos_\n", "why are these three three ? ? _eos_\n", "\n", "pourquoi ou pourquoi pas ? _eos_\n", "why or why not ? _eos_\n", "why or why not ? 
_eos_\n", "\n" ] } ], "source": [ "for i in range(180,190):\n", " print(' '.join([fr_itos[o] for o in x[:,i] if o != 1]))\n", " print(' '.join([en_itos[o] for o in y[:,i] if o != 1]))\n", " print(' '.join([en_itos[o] for o in preds[:,i] if o!=1]))\n", " print()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "attn = to_np(attns[...,180])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig, axes = plt.subplots(3, 3, figsize=(15, 10))\n", "for i,ax in enumerate(axes.flat):\n", " ax.plot(attn[i])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## All" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Seq2SeqRNN_All(nn.Module):\n", " def __init__(self, vecs_enc, itos_enc, em_sz_enc, vecs_dec, itos_dec, em_sz_dec, nh, out_sl, nl=2):\n", " super().__init__()\n", " self.emb_enc = create_emb(vecs_enc, itos_enc, em_sz_enc)\n", " self.nl,self.nh,self.out_sl = nl,nh,out_sl\n", " self.gru_enc = nn.GRU(em_sz_enc, nh, num_layers=nl, dropout=0.25, bidirectional=True)\n", " self.out_enc = nn.Linear(nh*2, em_sz_dec, bias=False)\n", " self.drop_enc = nn.Dropout(0.25)\n", " self.emb_dec = create_emb(vecs_dec, itos_dec, em_sz_dec)\n", " self.gru_dec = nn.GRU(em_sz_dec, em_sz_dec, num_layers=nl, dropout=0.1)\n", " self.emb_enc_drop = nn.Dropout(0.15)\n", " self.out_drop = nn.Dropout(0.35)\n", " self.out = nn.Linear(em_sz_dec, len(itos_dec))\n", " self.out.weight.data = self.emb_dec.weight.data\n", "\n", " self.W1 = rand_p(nh*2, em_sz_dec)\n", " self.l2 = nn.Linear(em_sz_dec, em_sz_dec)\n", " self.l3 = nn.Linear(em_sz_dec+nh*2, em_sz_dec)\n", " self.V = rand_p(em_sz_dec)\n", "\n", " def forward(self, inp, y=None):\n", " sl,bs = inp.size()\n", " h = self.initHidden(bs)\n", " emb = self.emb_enc_drop(self.emb_enc(inp))\n", " 
enc_out, h = self.gru_enc(emb, h)\n", " h = h.view(2,2,bs,-1).permute(0,2,1,3).contiguous().view(2,bs,-1)\n", " h = self.out_enc(self.drop_enc(h))\n", "\n", " dec_inp = V(torch.zeros(bs).long())\n", " res,attns = [],[]\n", " w1e = enc_out @ self.W1\n", " for i in range(self.out_sl):\n", " w2h = self.l2(h[-1])\n", " u = F.tanh(w1e + w2h)\n", " a = F.softmax(u @ self.V, 0)\n", " attns.append(a)\n", " Xa = (a.unsqueeze(2) * enc_out).sum(0)\n", " emb = self.emb_dec(dec_inp)\n", " wgt_enc = self.l3(torch.cat([emb, Xa], 1))\n", " \n", " outp, h = self.gru_dec(wgt_enc.unsqueeze(0), h)\n", " outp = self.out(self.out_drop(outp[0]))\n", " res.append(outp)\n", " dec_inp = V(outp.data.max(1)[1])\n", " if (dec_inp==1).all(): break\n", " if (y is not None) and (random.random()<self.pr_force):\n", " if i>=len(y): break\n", " dec_inp = y[i]\n", " return torch.stack(res)\n", "\n", " def initHidden(self, bs): return V(torch.zeros(self.nl*2, bs, self.nh))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rnn = Seq2SeqRNN_All(fr_vecd, fr_itos, dim_fr_vec, en_vecd, en_itos, dim_en_vec, nh, enlen_90)\n", "learn = RNN_Learner(md, SingleModel(to_gpu(rnn)), opt_fn=opt_fn)\n", "learn.crit = seq2seq_loss" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "cec8c2bb6118434b8758dd816b504c49", "version_major": 2, "version_minor": 0 }, "text/plain": [ "A Jupyter Widget" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "epoch trn_loss val_loss \n", " 0 3.817306 7.527982 \n", " 1 3.239813 5.82099 \n", " 2 3.06717 5.437195 \n", " 3 3.077923 4.718295 \n", " 4 2.952973 4.337892 \n", " 5 3.018182 3.994012 \n", " 6 2.761607 3.777056 \n", " 7 2.913683 3.595531 \n", " 8 2.91521 3.46984 \n", " 9 2.921533 3.370839 \n", " 10 2.913826 3.336167 \n", " 11 2.746896 3.37274 \n", " 12 2.695839 3.332427 \n", " 13 2.531583 3.341861 \n", 
" 14 2.524642 3.324184 \n", "\n" ] }, { "data": { "text/plain": [ "[3.3241842]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "learn.fit(lr, 1, cycle_len=15, use_clr=(20,10), stepper=Seq2SeqStepper)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Test" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "quels facteurs pourraient influer sur le choix de leur emplacement ? _eos_\n", "what factors influencetheir location ? _eos_\n", "what factors might affect the choice of their ? ? _eos_\n", "\n", "qu’ est -ce qui ne peut pas changer ? _eos_\n", "what can not change ? _eos_\n", "what can not change change _eos_\n", "\n", "que faites - vous ? _eos_\n", "what do you do ? _eos_\n", "what do you do ? _eos_\n", "\n", "qui réglemente les pylônes d' antennes ? _eos_\n", "who regulates antenna towers ? _eos_\n", "who regulates the antenna ? ? _eos_\n", "\n", "où sont - ils situés ? _eos_\n", "where are they located ? _eos_\n", "where are they located ? _eos_\n", "\n", "quelles sont leurs compétences ? _eos_\n", "what are their qualifications ? _eos_\n", "what are their skills ? _eos_\n", "\n", "qui est victime de harcèlement sexuel ? _eos_\n", "who experiences sexual harassment ? _eos_\n", "who is victim harassment harassment ? _eos_\n", "\n", "quelles sont les personnes qui visitent les communautés autochtones ? _eos_\n", "who visits indigenous communities ? _eos_\n", "who are the people people ? ?\n", "\n", "pourquoi ces trois points en particulier ? _eos_\n", "why these specific three ? _eos_\n", "why are these three specific ? _eos_\n", "\n", "pourquoi ou pourquoi pas ? _eos_\n", "why or why not ? _eos_\n", "why or why not ? 
_eos_\n", "\n" ] } ], "source": [ "x,y = next(iter(val_dl))\n", "probs = learn.model(V(x))\n", "preds = to_np(probs.max(2)[1])\n", "\n", "for i in range(180,190):\n", " print(' '.join([fr_itos[o] for o in x[:,i] if o != 1]))\n", " print(' '.join([en_itos[o] for o in y[:,i] if o != 1]))\n", " print(' '.join([en_itos[o] for o in preds[:,i] if o!=1]))\n", " print()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 2 }