{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "run_control": { "frozen": false, "read_only": false } }, "outputs": [], "source": [ "%%capture\n", "%load_ext autoreload\n", "%autoreload 2\n", "%matplotlib inline\n", "%load_ext tikzmagic\n", "# %cd .. \n", "import sys\n", "sys.path.append(\"..\")\n", "import statnlpbook.util as util\n", "import matplotlib\n", "import matplotlib.pyplot as plt\n", "from statnlpbook.lm import *\n", "from statnlpbook.util import safe_log as log\n", "import statnlpbook.mt as mt\n", "# util.execute_notebook('word_mt.ipynb')\n", "# matplotlib.rcParams['figure.figsize'] = (10.0, 6.0)\n", "# import tikzmagic" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "$$\n", "\\newcommand{\\Xs}{\\mathcal{X}}\n", "\\newcommand{\\Ys}{\\mathcal{Y}}\n", "\\newcommand{\\y}{\\mathbf{y}}\n", "\\newcommand{\\balpha}{\\boldsymbol{\\alpha}}\n", "\\newcommand{\\bbeta}{\\boldsymbol{\\beta}}\n", "\\newcommand{\\aligns}{\\mathbf{a}}\n", "\\newcommand{\\align}{a}\n", "\\newcommand{\\source}{\\mathbf{s}}\n", "\\newcommand{\\target}{\\mathbf{t}}\n", "\\newcommand{\\ssource}{s}\n", "\\newcommand{\\starget}{t}\n", "\\newcommand{\\repr}{\\mathbf{f}}\n", "\\newcommand{\\repry}{\\mathbf{g}}\n", "\\newcommand{\\x}{\\mathbf{x}}\n", "\\newcommand{\\prob}{p}\n", "\\newcommand{\\vocab}{V}\n", "\\newcommand{\\params}{\\boldsymbol{\\theta}}\n", "\\newcommand{\\param}{\\theta}\n", "\\DeclareMathOperator{\\perplexity}{PP}\n", "\\DeclareMathOperator{\\argmax}{argmax}\n", "\\DeclareMathOperator{\\argmin}{argmin}\n", "\\newcommand{\\train}{\\mathcal{D}}\n", "\\newcommand{\\counts}[2]{\\#_{#1}(#2) }\n", "\\newcommand{\\length}[1]{\\text{length}(#1) }\n", "\\newcommand{\\indi}{\\mathbb{I}}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Word-based Machine Translation\n", "\n", "Machine Translation (MT) is one of the canonical NLP applications, and one that nowadays most people are familiar with, primarily through online 
translation services of the major search engine providers. While there is still some way to go before machines can provide fluent and flawless translations, in particular for more distant language pairs like English and Japanese, progress in this field has been remarkable. \n", "\n", "In this chapter we illustrate the foundations of this progress and focus on word-based machine translation models. In such models, words are the basic unit of translation. Nowadays the field has mostly moved to phrase- and syntax-based approaches (and fully neural methods), but the word-based approach is still important, both from a foundational point of view and as a sub-component of more complex approaches.\n", "\n", "## MT as Structured Prediction\n", "\n", "Formally, we view MT as the task of translating a _source_ sentence $\\source$ into a _target_ sentence $\\target$. We can tackle the problem using the [structured prediction recipe](structured_prediction.ipynb): we define a parametrised model $s_\\params(\\target,\\source)$ that measures how well a target sentence $\\target$ matches a source sentence $\\source$, learn the parameters $\\params$ from training data, and then find\n", "\n", "\\begin{equation}\\label{decode-mt}\n", "\\argmax_\\target s_\\params(\\target,\\source)\n", "\\end{equation}\n", "\n", "as the translation of $\\source$. In this view, different _statistical_ MT approaches differ primarily in how $s$ is defined, how $\\params$ is learned, and how the $\\argmax$ is found." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Noisy Channel Model for MT\n", "\n", "Many word-based MT systems, as well as those based on more advanced representations, rely on a [Noisy Channel](https://www.dropbox.com/s/gfucv538m6anmgd/NoisyChannel.pdf?dl=0) model as their choice of scoring function $s_\\params$. In this approach to MT, we effectively model the translation process *in reverse*. 
That is, we assume that a probabilistic process (the speaker's brain) first generates the target sentence $\\target$ according to the distribution $\\prob(\\target)$. Then the target sentence $\\target$ is transmitted through a _noisy channel_ $\\prob(\\source|\\target)$ that translates $\\target$ into $\\source$. \n", "\n", "Hence translation is seen as adding noise to a clean $\\target$. This _generative story_ defines a _joint distribution_ over target and source sentences $\\prob(\\source,\\target) = \\prob(\\target) \\prob(\\source|\\target)$." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "run_control": { "frozen": false, "read_only": false } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZAAAAAnEAYAAABxl8L2AAAJJGlDQ1BpY2MAAHjalZVnUJNZF8fv\n8zzphUASQodQQ5EqJYCUEFoo0quoQOidUEVsiLgCK4qINEUQUUDBVSmyVkSxsCgoYkE3yCKgrBtX\nERWUF/Sd0Xnf2Q/7n7n3/OY/Z+4995wPFwCCOFgSvLQnJqULvJ3smIFBwUzwg8L4aSkcT0838I96\nPwyg5XhvBfj3IkREpvGX4sLSyuWnCNIBgLKXWDMrPWWZDy8xPTz+K59dZsFSgUt8Y5mjv/Ho15xv\nLPqa4+vNXXoVCgAcKfoHDv+B/3vvslQ4gvTYqMhspk9yVHpWmCCSmbbcCR6Xy/QUJEfFJkT+UPC/\nSv4HpUdmpy9HbnLKBkFsdEw68/8ONTIwNATfZ/HW62uPIUb//85nWd+95HoA2LMAIHu+e+GVAHTu\nAED68XdPbamvlHwAOu7wMwSZ3zzU8oYGBEABdCADFIEq0AS6wAiYAUtgCxyAC/AAviAIrAN8EAMS\ngQBkgVywDRSAIrAH7AdVoBY0gCbQCk6DTnAeXAHXwW1wFwyDJ0AIJsArIALvwTwEQViIDNEgGUgJ\nUod0ICOIDVlDDpAb5A0FQaFQNJQEZUC50HaoCCqFqqA6qAn6BToHXYFuQoPQI2gMmob+hj7BCEyC\n6bACrAHrw2yYA7vCvvBaOBpOhXPgfHg3XAHXwyfgDvgKfBsehoXwK3gWAQgRYSDKiC7CRriIBxKM\nRCECZDNSiJQj9Ugr0o30IfcQITKDfERhUDQUE6WLskQ5o/xQfFQqajOqGFWFOo7qQPWi7qHGUCLU\nFzQZLY/WQVugeehAdDQ6C12ALkc3otvR19DD6An0ewwGw8CwMGYYZ0wQJg6zEVOMOYhpw1zGDGLG\nMbNYLFYGq4O1wnpgw7Dp2AJsJfYE9hJ2CDuB/YAj4pRwRjhHXDAuCZeHK8c14y7ihnCTuHm8OF4d\nb4H3wEfgN+BL8A34bvwd/AR+niBBYBGsCL6EOMI2QgWhlXCNMEp4SyQSVYjmRC9iLHErsYJ4iniD\nOEb8SKKStElcUggpg7SbdIx0mfSI9JZMJmuQbcnB5HTybnIT+Sr5GfmDGE1MT4wnFiG2RaxarENs\nSOw1BU9Rp3Ao6yg5lHLKGcodyow4XlxDnCseJr5ZvFr8nPiI+KwETcJQwkMiUaJYolnipsQUFUvV\noDpQI6j51CPUq9RxGkJTpXFpfNp2WgPtGm2
CjqGz6Dx6HL2IfpI+QBdJUiWNJf0lsyWrJS9IChkI\nQ4PBYyQwShinGQ8Yn6QUpDhSkVK7pFqlhqTmpOWkbaUjpQul26SHpT/JMGUcZOJl9sp0yjyVRclq\ny3rJZskekr0mOyNHl7OU48sVyp2WeywPy2vLe8tvlD8i3y8/q6Co4KSQolCpcFVhRpGhaKsYp1im\neFFxWommZK0Uq1SmdEnpJVOSyWEmMCuYvUyRsryys3KGcp3ygPK8CkvFTyVPpU3lqSpBla0apVqm\n2qMqUlNSc1fLVWtRe6yOV2erx6gfUO9Tn9NgaQRo7NTo1JhiSbN4rBxWC2tUk6xpo5mqWa95Xwuj\nxdaK1zqodVcb1jbRjtGu1r6jA+uY6sTqHNQZXIFeYb4iaUX9ihFdki5HN1O3RXdMj6Hnppen16n3\nWl9NP1h/r36f/hcDE4MEgwaDJ4ZUQxfDPMNuw7+NtI34RtVG91eSVzqu3LKya+UbYx3jSONDxg9N\naCbuJjtNekw+m5qZCkxbTafN1MxCzWrMRth0tie7mH3DHG1uZ77F/Lz5RwtTi3SL0xZ/Wepaxls2\nW06tYq2KXNWwatxKxSrMqs5KaM20DrU+bC20UbYJs6m3eW6rahth22g7ydHixHFOcF7bGdgJ7Nrt\n5rgW3E3cy/aIvZN9of2AA9XBz6HK4ZmjimO0Y4ujyMnEaaPTZWe0s6vzXucRngKPz2viiVzMXDa5\n9LqSXH1cq1yfu2m7Cdy63WF3F/d97qOr1Vcnre70AB48j30eTz1Znqmev3phvDy9qr1eeBt653r3\n+dB81vs0+7z3tfMt8X3ip+mX4dfjT/EP8W/ynwuwDygNEAbqB24KvB0kGxQb1BWMDfYPbgyeXeOw\nZv+aiRCTkIKQB2tZa7PX3lwnuy5h3YX1lPVh68+EokMDQptDF8I8wurDZsN54TXhIj6Xf4D/KsI2\noixiOtIqsjRyMsoqqjRqKtoqel/0dIxNTHnMTCw3tir2TZxzXG3cXLxH/LH4xYSAhLZEXGJo4rkk\nalJ8Um+yYnJ28mCKTkpBijDVInV/qkjgKmhMg9LWpnWl05c+xf4MzYwdGWOZ1pnVmR+y/LPOZEtk\nJ2X3b9DesGvDZI5jztGNqI38jT25yrnbcsc2cTbVbYY2h2/u2aK6JX/LxFanrce3EbbFb/stzyCv\nNO/d9oDt3fkK+Vvzx3c47WgpECsQFIzstNxZ+xPqp9ifBnat3FW560thROGtIoOi8qKFYn7xrZ8N\nf674eXF31O6BEtOSQ3swe5L2PNhrs/d4qURpTun4Pvd9HWXMssKyd/vX779Zblxee4BwIOOAsMKt\noqtSrXJP5UJVTNVwtV11W418za6auYMRB4cO2R5qrVWoLar9dDj28MM6p7qOeo368iOYI5lHXjT4\nN/QdZR9tapRtLGr8fCzpmPC49/HeJrOmpmb55pIWuCWjZfpEyIm7J+1PdrXqtta1MdqKToFTGade\n/hL6y4PTrqd7zrDPtJ5VP1vTTmsv7IA6NnSIOmM6hV1BXYPnXM71dFt2t/+q9+ux88rnqy9IXii5\nSLiYf3HxUs6l2cspl2euRF8Z71nf8+Rq4NX7vV69A9dcr9247nj9ah+n79INqxvnb1rcPHeLfavz\ntuntjn6T/vbfTH5rHzAd6Lhjdqfrrvnd7sFVgxeHbIau3LO/d/0+7/7t4dXDgw/8HjwcCRkRPox4\nOPUo4dGbx5mP559sHUWPFj4Vf1r+TP5Z/e9av7cJTYUXxuzH+p/7PH8yzh9/9UfaHwsT+S/IL8on\nlSabpoymzk87Tt99ueblxKuUV/MzBX9K/FnzWvP12b9s/+oXBYom3gjeLP5d/Fbm7bF3xu96Zj1n\nn71PfD8/V/hB5sPxj+yPfZ8CPk3OZy1gFyo+a33u/uL6ZXQxcXHxPy6ikLxyKdSVAAAAIGNIUk0A\nAHomAAC
AhAAA+gAAAIDoAAB1MAAA6mAAADqYAAAXcJy6UTwAAAAGYktHRP///////wlY99wAAAAJ\ncEhZcwAAASwAAAEsAHOI6VIAAAAHdElNRQfhChgWBhSa+qlvAAAv9klEQVR42u19B1zN+///+Zw6\np61FGlRKNIwoo2EmVMZFCNluVOaNZI/KRdkZyV7XioRsHVRmuvY1QkOD9jTC/36en7ffv+N7nFvJ\nveH9uo+H5+Wc8/7s1+v1fK0Pj0eFChUqVKhQoUKFChUqVKhQoUKFChUqVKhQoUKFChUqVKhQoUKF\nChUqVKhQoUKFChUqVKhQoUKFChUqVKhQoUKFChUqVKhQoUKlpgnTgvyPWxUXaEywhJ5LKlSo1DD9\nNoT8T5uvXEiG4AmCLapp/0zoNaJChQoVKj+RGLzjsNt2DgX+VVuHP4VDG1sOjZPpuaVChcp/KyYD\nOOwygvCHSV9JFFpzOHgkh3qvqmc/zVZwaL2X/MNZeu2oUKFChcoPKPU3c/gbyXgY7yQffJ4BecCB\n0hZCNOp8YUGSAalHCIzPew71Q+m5pkKFyr8rDUlGYcpzou9yyQdfmQFhLnC4ZA2HFg7Sv6+UwaHg\nH9ZVacmhVwjZzU30GlKhQoUKlR9IBIYcTl/HYZdw6d/XeMThWksOtdZWbDsORWQ7PTiUnU7PPRUq\nVL6tyGtwOLchh7YR1bs+c4rD3305NN8u+XsywRyu0iWE4tcKEid9sn46h5pCek2pUKFChUoNEuaX\nLxi+f4jwWZoSw5jEoeLcL3zxGgf91Di8I8uhLsmcMF7ke70k/1zpGCEupzlsOoReMypUqFRQv/l9\nQb9ZS/+dbTyHQbM4FG6o2Pb4iwkWVpKA7JD8vVYXOXxMsCPJJDNkv3i/fWE/BnM4y5Lo3zB6L1Ch\nQoUKlf9QZBQ4bEsibl4LiGEjBtf+DTFYpQTVOVS4IL6ONzHgv5VK3o6QrOtuwOF9EkksbMvhljgO\nF5OagvpK0vd7JsmYjLpFryEVKlS+4HiTUqXWDTj0jOCwiyYhFiSw0vcFh66JHCpbiK/jSxx9j3jp\n21NV43AYcfB993E4jei5QJL57XL2HwhIT7L/ZL97ET13iTSnvycBn0Ok92R5Iw6b/EOJVR9S6rp4\nG9H/7vQeoUKlmuXjR4oUqw9/XDEihrgnaX70IdOmjupxaDKKw0+9GivyicH+LEMRQgz2L4++QHQm\nc2ixnMOLSzjM0OKwW1MOm5NInqKB9P0eRAjL0nlUn1GkSPWkZNEnvWN9iX7yJkQjijR9m5YRfUP0\nyBLSezHoICEGTTjcTBz3Ll7Stzd8NIeHnxJCsIcEVFI49CO9a+7MPxAQUlrKeBA9TXCjPIfvAjgc\nQ4Z9tCD6WfWk9P2zNORwDyndUjKneo8ixWrWgx8/+ogoUqw6jrHmUO3cj+3aGpDmcEMS+Vs2jsOZ\nCp9FEsdwuOlTKr+h+Of7SMlVBw3p25MhBCXiHodJxHDW6Vq5/e5BHImtXXk/gXz8yMzgsGshfT4p\n1hwcPIdD+YY188nRI5kGI1LqGUgIycI/PyMAlhyuJhmGQEJMZK8QfZXGYcuZ0rfnTvRaOnHso1tx\nuP4lIQwko6E74bPtn/mMgOySvP6cbA7fkoyzXSUJhBE5/ijiKqluqNl6z+A+h+6hBGdRpFgD8QiH\nfS+TG5fjIhQpVg2TlDg0Wc/7KURlIIdHpnHo+Eb8cw0yjSr6KocDPjN8O4khcwiUvh0hiSQeeUsI\nCOkZ0SKZFSUXDuu+l75Ob+JYhHb/OQiITG8Ot7rR55NizcHrNhxqdKvZT5ASGeN9sDOHPVeKf16L\nEIzTJEM79AgJvJASrQNTOWxdJn07tYljP86VBGYIJpASqgxSOjWdnC9mKcH1nxGQLzShzyPE5C35\nvT3R26qRHGqekb5/Jvs5jCSErJZ7zdZ7A/w5fDeYYG+KFGsgjiZ+ozMlI
BSrARM3EQIi+3MQEIu6\nHF4mU6wafBbp67+Iw70HiKH9rATLnxjS4WOlb0dAMiWHyRz9F6QUS5tkWLoTgzh3ifR1xhJCNPP3\nn4SAdOZwUyZ9PinWHLxiQAhIDX+CjGtzGOPIYZMk8c9dSI/H/m0kAHKcfED00jKi9/r9gz0YlsWh\nB9Fz8qQpXI2M7w0kBGMf0bMCMqacsa8YAZlN9HIZKclqTwjJMD6Hnv+QEWlHxvxuJKWucgdqtt5z\n60WfM4rfD6YJKQGhSAlIpcWNlFzFx3I4kBjgrmTc48zdHJqNkvz7Tz0h/r8Qg/qFJvJPb+YdtpHD\nW2ocfppqteZTrfUXInmMHIdLiAF3+YsSEIoUKQH5h4wpyUAkRJMMBymJ6kKGWfiS3rdm8l8gFrM5\n9PvUA6Iv+Xu/EX12lrxAddJwcUKynDj+PT6NKbcheu1YxQiILZkWeL0ZIUykd2Q9eR9JWxnp52FU\nFIdTH5J/yKvhBMSQPmcUvx98EUwJCEVKQCosDGm2DCbEYeEHkgEx5LCRNylhGCB9nbpk2ssq0jxZ\n/x/G48oYcWjYl0MrEjk0IE2UjJHk3+mTUq01Thxq/SRN6JSAUKQEpApC9IU/CagEEQJiSIZuNLrJ\noXK69GUMSOBkJdFTWvUkf6826SHRJxncJuR7Lcj4cV0SQPl8zHhF3wPCI/pOl2QyrEhJl8GnMenH\nJf9MkfTuBSmT3935PvQeJSAUKQGhSAnIjyZk+pQKKXmKIgasz5gqrkcyJb1JpsTjMIfV9aJAPik1\nGEGaL3vr8n4ioQSEIiUglRKSgVUi7984TKZgDe5QxfVIj4UbyVQMIxkO/rJqCgR9IiBkKId5ZPWe\njk4xHHqTDI1sISUgFCnWYAKiHaMdw+JQ3lD8fWW/lf1Y3Pq3sLj8w/IPLA72GezDos4rnVc/8wXQ\nT9RPZNGqk1UnFhXWK6ynBKSmipDURLvZcXikDykZIM2XqlU01EIy3rE/iQB2JVNh+FU0eAyJFLYi\nJQf9MzmU+ykyH9+SgChEKUSx2Nm0symLc7vO7cri5pDNISyGbg7dzOIU6ynWLDbzaObBIv8g/6C0\ndeuPqT+GReub1jdZlM+Sz6KGqeaghq2GLYuto1tHs6ipo6nz4xEQGdIT0Y9kJA6RDMe0+hxqzq/a\nugrEgR9Mpvl1JPqTuf+VBISURgWQzK9Zz+o5D4YkoDSc9LhoLvm+9F7VCYhKVxXos9bprdNZbLup\n7SZgbNtYadh6fWv4LcaDjAexKIgSRFG98f+Rmc/MZ9Ei2gL6w9zO3I5FGQUZBXp+qoGA2HnZebEY\nqxyrzGJR+6L2LD6a92gei9dGXBvB4oPCB4UsFpwrOMfi5aGXh7JoG2YbhvVkeDI/0wVYVLKohMXn\nxs+NWTTLM8ujBKTGymOSmielUoqEkCgRIsI3+brlBWR8sRGZHiNzsIqGeRuHemQ+v1ws7yeU6iQg\n+qb6IBxb3m55y2Ju09ymLCb/lvwbi9dfXn/J4p29d/bic5NcE3x+N/kui55XPa/CMG8WbJa0/vzf\n5//OYtrLNKzTWKaxDDVMNQd7H+99HNd1bO5YFvv59/P/ATMg44leMyRIMqdKpHSTf/7rlpcjhKMB\nCYjwnb5yf8n0QJMbZD+Dquc01CHj1VWDvk+9V3UCYqtsC/8tzzoPAZRi9WJ1FvPv5kOP5WfnZ4vh\np383zDdk8enup7tZ3LR602oWTfgmfKo/Pn4UHhQiAHXR96Ivi2fOnYH/qzJHZQ49P19BQOqY1TFj\n8ULkhUgWs1tkt4DB/fs/VgzqGNRhUW2W2ixE+lrXb83iyKYjYcDTLNMsWYyZGzOXRa1Yrdif6QIE\nywfLs5gRlBEEhuxv4U8JCBUqNYOAqF5QvcDiH+F/hLNY8kvJL3huRwePxvPKt4CBVc9Xz0cGeKX2\nShYd/v6PFVG8KB6O6+vc1yz2WNpjq
aTtBA4KROQwd3UuDLfpXNO51DDVHOwb2Rf27fXI1yNZHLBr\nwK4fvwmdys9GQOxt7G1YLFtVtorFsxPPTsT9f7HvRRDvaf2mlcdP/z4iYUQCiwdnHpyJ398pu8Pi\n3oi9ESwqC5WFP7P+kD0te5rFJTeW3GAxIC4gDhl1XwVfql+/goB0edrlKRTznNdgchsbb2yMkoPz\n/PNSf6/CU2Fx1u1Zt1l82/4tMiYupi6mFUoVClQEiExO0Z8C/Ps//LuXitfXnBAZTxkwJ53lOstZ\nNFQ2RERAZb/K/kql3DozncsTKsMYQ5SmaTbRbFL+e0GOQY4gIO0y2sGh8bHwkcqkWwhB8HTX6K5h\nsYFCA6TwNAdpDvqa4+YX84tZ1PPRw/ZVbVRtKAGh8tMSED2eHoujFUbj+XrzyxsQj7Wz1iKQotBG\noU1F1ml9tPVRBGb0srFe5LDIYSwqrlFcI0ZAzAIRyMl1ynVCBkS2sSwMl4os9KROpk4m9Mh+Q+gh\nDZ5G5UoAFBlFFmsfq30M66QbosRCP1Yf+knNRM3kqwzsbVnocb3VeiBQBlEGKMFQuapy9WvW1XTT\ndMP+phimsFjXuC4yxYwao1aZdeTuyMEhqjeyHgiEoZMhzrO6q7prhQhISF+U2L2Wf42A0YDAAYGU\ngFD54QiIsT2er/eF71GpslF3o25lfq91TguR/dgLsQjc5DzOecyi5QHLAxX5vdoaNehFQ29Dbzyv\nQ+uhQkb+nPy5qhyPUoZSBvTcOP1xwJb6LfHv9ZTqVUp/+jDwi+rsrLMTfpdVAysEnN5pv4PfeEnm\nktR1THjQr7JtZdvi+/kyCFjxuvG6/RuOvmKCIghi/fX1USpncMbgDIu1xtQa8zXrChYKFuI6rai3\nAkTTS9nrXyUgrtauSNV9kqC3QW8r8/umU5tOZTGsUVgj1FbHdpaYAZEdLjucxT42feAYnxp4aiBK\nHGKT8f1U01QQl9NTToOQuAa7BuN3XrIST4hdvB0ik4tfLUYPipm9mT2L/o/98cDc176vjQxN/bT6\n2J7FKQsQrkldJuGG7MP0kdgDE6cNZrvg3QLcmLdVb6uymH4y/SSLV/dc3YMemX5D0RuzsvFKELb0\nW+m3pBEQy/6W/dFL47MVnz+a9ggRiIx+GVjnWvtrIHDTrKfheqjbqttKWqeRViMtXKfbQXAYWvZo\n2YNFr5Ne2L/EnMQcZLD+FkpAqHwfYkFKMJokf0ZAelSVgCg1VsJzeWbjmY14Pueno4a3mX4z/Uql\n3hcKoaDXXl57mcUo1yg4vLqtdFuJERCLQOiX7AfZD1hs/7Q9AjvBnsF4Du+/uP8C+zEtHc99nFKc\nEotDzIaAuMjWk5VoUHXW68DgLJy+cDqL8QPiB2CdtPQ0FlMCUgJQGrDu4joWxwWOg2OtMFthtqT1\nnHY6wQDPT52fCr3pbQZHYfnR5SBaj3Y9QmbgxbwXKL09aXwSDk0XQRcEjHiveGI9f9pztBG4WnJv\nyT0c97b221h083TDcV+ZfQX7kd41HbXpt61vQ79NV5kOYqZiqyJRz/Hm8LCubQtbBGz+mPcH9ufp\n6KfIXKVdSIODFLs0FhkprwAvnAelcKVwSkCo1GzRIFO8Oq3mkN/oMwLS4KsJSPL7ZPhlY8Kq5KCG\nvgl9w+IHhw9IBbd72O6hRMc4TBGl9x6vPKAXYmRjEHhJK0wDAXo64+kMFveb7IcDb//E/kn5gMr/\nBFJH8UdBTxU4FbB4bOsx9B4nb0uGXkn2SYb/dDT+KPy/7vHdgfzx/PES/TkrbRANf0V/bC9hRQIc\n7YxVGcgQPVB+gAD1mp1roBcbtWzUUmJg21YGemqYYBj04OCcwfCz5G7Jwe8bunsoStcWb16MEl3d\nEbojpAbKD8qgpGvsr2N/xf5N9YcfXdu6NvQjvwO/A4vdNnVDD0+kfST82+cvn6PE98UfL/5g8bz5\ne
XPYkalD8HvhYeFhSdvraNoR/vXvHX/viNK6YyYIZC3xWIIex2clz9BK0D+9f/q/SkDaCtrihOYE\n58Dhf2b4DLWAQy2HorRK66EWbjxmH7NP6gk9JwOGy9xgboh9vp+HiN+Qk0PgIGdnZaM58+aUmyAa\n8/3no2TJ38kfEa27qnfh8GenZ+NEDJ0zFIaImcHMKL+uJ88TWLy6GBG7Xdt3bWdxZ9xOEIhhl4fB\nYZifNx89GTkbc+CIJMxOgEGsp1ZPLAKnXKiMB2ab1jY4+K//FjwA9Y+BwPiZ+OEBWt91PWdIX93G\nA/dw1UPcyKlhqWGSCEiTpU1gIP90+dMF3xuSOgTERX0lajOnp06HI3Bi1Ak8eCVyJXIshpwOQcpP\ncYyimAJxUHfA74r+KvoLhKb21tos3nt9D/t7wvAErl+3h90eVoGAhHHY8Bg1ElT+PbHN5nDdOw7r\naHH3IcMnBORlZfVak3dNEEDIiMlA5vL4mON4jhRiFapUIipnJgeioLRJCQaBP5Q/tPznAWEBeP6L\nw4qBJ9+ffM/i9qDtKM0cNnnYZOiRX/yQiUm8mojMQoZsBgy2zVUbsUyD3AS5CXAEBoYiUFO6sBRE\n6OCBg4hETus0DUMv5i+cj39PGJOA4yv+vRi9KAOcBzhjPXmefPl1F+gtQCYnq3lWcxb3jd8Hw701\naCv2c2jPoT3R29Z/EQImuVtyt2D99ATo43rv670vv56FgYUBasjt8tGUeWTYEWSIIiMiUboxcdHE\nRTC0z8Y+Q6BlybUl0K/vXuP6uLdxl5iJsltqB735xPkJjiPRJBH6d/G9xSA6s1rMAjER5YsQiSx2\nKYZ+XSC/AMcraCJo8g0JiB6H6tvps0ul8qLZhMP1JOBiIy9OQAbmfjUByXoPP2tj2MawSum5pnIo\nrT+y7ggCGpkZmchAWDS0aCgWOQ8WwF+cO34u9EdB0wL87nThafhRfpf8kFFYFrsM+jbZNRmBm6eu\nT4HtNdprSNp+tznd4O+lXUrD7xOsEkAg5lnPg2O+yGAR9E2ibiIyO6l1U+tivSntp4hlYkRqIhb3\nnNwDvzNvaR70yZ7FexazODV9KvTZlnVbcJx58XkgMhdtL4JoGGYZig0REd4RIgMbtz0Ofmb05Wj4\nl0ojlEA0JttOxu8+eH5A4MV9pvtMaefZONc4l8WkV0nwI4+/Pw69qtBPAQHpHkN7wL6kN0mHHrvj\newelXgtDF4bifKjNg/96w/4GiEnBrAJk9sfHjYf/K2MsY1x+e5MPTwYxyR+TDzuxxXgLPn+09BHO\nS/i6cJwHW1Nb03+VgCikKsABDu4YDGZUalWKC/7O7h0Myi3FW2COoe6h7uj9UBiJkobml5vjAiib\nK5tLW19ntw6Y4Z/MnwyaPe9dhwFpeKfhHUnft7C0APFJeJqACOJtp9sgJrrZutnlv+fRy6MXLvik\nD8hohH9E4OujZqpmqljqLY1BpHBFyQowvCLjIpz4jq86ikXyeh7oCcNe/Lz4Oabi8DejNlytg1oH\nMaLVUQbnaZjXMGRmilOKUVqQKp8Kw2ahbgGCwA/gIyK3dtdaRBRz9HMQeXXe6gxGz9gz9uXXVf9V\nHUx4i9MWHG/RliIY/u4R3SPKf6/T+U4ojSuKK4orPyTAubkzHArhCiEYPu8Y71il74fMRIyRTQ0w\n8+NUoZC88E6YRJHit0Ml0jT6G5ketiKV/VOrtB5emPbx1SaDyuo1xwBHPH9vh7zlCD+zEvqHl8BL\n+BYp8gCDABjGj4kfMRVvp/JOrvSzk0onSd8f7jwcjvW7vu/6wkHXGSs2lanxhMYgIKm3UhFhi5CJ\nQFO7yhuVN5LWa+PcBuvlKedhuyudVkKP8N7wxL4/O2U29NUH2w8wmGGDwlD6qZSvlC8WiYzgQ+8s\nn70cAZvXKq+RsbAV2YrKf8+8i3kXbHdQHtb56+BfiOxZdbXqKmk
/O0R2QC9GQdcCfL762uprYvYo\nTAEOU7hjOEpbU5+lgrjYnbA7ITHCKaON8xJhGQG78dLqJexXq/mt5n8zAhJ9BUMr4jTrkiERwiL6\nHFOsBJKpi13IlMNPb4i3fsv++e7t/70J/UWVCYjZewRM9i3btwwO71RjRMiN3xu/F8PxxiAQzXY1\ng5/il+4Hxzx9Vzr+vvgj/PWPcgI5QfntWP9mjeEdmfGZcNz3Dd0Hh1lDWUNZbL+SeEnIoI7vgu2k\nl6aXsnhI7RAcaIVFCghQ1LpWC3rgrNtZlGze63WvV/kM7efH6eLsAn1XHFKM5zpELgSBW8aMwXEP\nCRsCPVJypwR+5uxOs6GH5S7IXRAjUmECfM97ojd6ZV6bv4Y/O//yfPi3TBlTBr9qtRCB7rg+caic\nifaJRqBZqYFSAwS8/JsgkJ7ZPhOVLLsm7IL+FswQzJC0/yOnjsT1KL1XCn94eJ3h6LWu5VML615o\ndwEl/fcy76F01yLEIkTSOkbLjdBqEHM+Bn7hU/en8NNNzpmIlbxN8J+A/Xsf+h4E5lLWJRCsVidb\ngaDJashyhHApb+l/MgVLKVkJKTu3kW6osd3xZgcM1wO3B7ghioOLwXg/HPsAxzbzUeYjlDYNOgXD\n4zbKDRF8OUaOEZs+ktMbqaqSAyVw8D32eeyryP5MTp6M/SkuKEYqrtvYbmPFCMhWDzjyb4+9xf6M\njB8ZL229sVPH4oIXlRbhAXDs5dir/OchHUJANLJdshFJa5nUMklqc+szVRjGS/UuoXQirW8aHAmz\nK2ZXQJi66aI28EnDJ4gc7NXei5IwfkN+Q2nrWj+1BvHKdc1FpMD/vv99SQSkpG8Jtrdh0wZEZJlG\nTKOvdaQ+2DxDathx4OIDnEIMI+Nlw/ZQpPgN8RcOt/lwmIgXqcm2jcQc/+Q/D/et7P3cO713evnS\n0gC9AL1vWaMbcDoAGcvSm6UYw+uy2GWxtO934nUCFt4txBQa79neYiVT2hnaiDx65XohUtZGu422\ntPVaqrQEQcgalQU9vLb9WhhCpjfTu/z35rjNgT5/M+sNImbOV5yvSFvXa5YXvlf6aykCJO1C24WK\nERBHcxCF/Ix87O/WLVsROGESGIlEz7iHMUpGk24m4Txt6L1BbP8sjlqgFCxzfCYcllVDVg2pyPl3\nMXeB41B8rRiOjHeSt5j+7rus77LqIiBvZ99YxGKDzUvIe4U2mdLnmGIVkLzB/VoYh7cm4W4at2I3\nd799mFBVAvKm7xv4B0WJRQiIpPRJgeOcsjNlpxj2TEHGMzMtE4Ha9zfeo4JlV/NdCGiq/aX2l9h2\nLHkg+n5r/dbiud+djwBz+4z2GRXZv62OW7nAwvXU6+hlaGAAB76VVysEdPPm5mF4xzyFeVLH2ypP\nVYY/tyFnA/zLwDaByKQq71TGce2z2YdS/0dzH2G9erb1bKWtV0uhFrZ33fY6vnfuzjkQFyUfJRAC\nQb4AAZq4FnHIvEa7RsM/UxmsMhgEpaMQgemDPgfx/WcnniFgYpRkJKaHBDsEO1CS1m8/Mh2Pgh4h\n86xXpgeiY3/CHr8rVC9EINtvpN/IipzXEaNHoDS1+EjxEbQwHHE9IkZALk7AsIGy22Uo3fdq49Wm\n+uzfN3gRoWyuLAyfTqkO57D7O4JBzVw/EzXJZ26fwYGUZpciM1H4rhApde823jgwJprBvOQpF6aA\ncZZpl8GAhu0IwwXw7eOLB8J3l+8uMdT3RabgD/c/3Ms7ECMfjXwkRkCyPbDdorNFZ7F/Lo4uUgmI\nyViT8qVLjnxHbrxcBA+RvlM2p3DDxq+LRypKc6LmRKlN3z34MKTrE9bD0KYnpANNB5vihmxzuA1S\nXrn1ckFQojOi8YD6GvkaSTzukb640ZYWLAXhKoksQaRwT+c9nSVmQGoXofRqQsaEjGq77nWferA4\n3m8geVFfLzeCARQpfjvseZd
kQMgbmq9gfKjw3lQhi3kDdm+t7P3c/UN3vK+oTFSGiP2ymcu4lPgM\n3oxvQUACTQKhX3LX5UJ/mF0yk9rU2KmsEwxO4fPC5+X15uff+1T62rhNY3zu+swVgQ+frT4IwITo\nhYBYxYfGgxh8CPwAx3rN0jWIZDHOjHP59eaazkWKPWdcDpo6rc9bSx024j3JGxnm0jal2H47r3Ze\nkghI3ug8GED/uf5Sp3817NwQ+ixZPhlEYH3Z+jKxHpX9TijZfb3pNQIrR2cdBQHy1fPVk6g3+/ui\nVGzd1nU4H2U7ymBfgnsE9xAjIMP7Dq8uAvLe8tpAFu2EvUgTSK8n9DmmWAl05HDwWg7PLuUwRJ/9\n84Khz1Vyv7WtKgF5F/gO9/fVS1ehh3wFvshg+PJ8eWIo9MV0q3mH5h1CJN0mBn5Qqk8qHOmBSwai\nZJLH58Ff4r/go5dt847NeM4K3QvhpwWPCkbgw3eB7wKx53OnLwiBr5svAh8XfS5i3eL4YgSM20S2\ngZ/jttMN3ysdWDqwfEXKF49XmYdMi6KJIvSuwhMF9JbU+a0OMjPXYq6h9Pa59XOUbs25OOeiRP2x\n3hf+7MzFMxEweqLzBJnoxEaJCOhqXtW8Wr5Z+38ISIxKTPn9GrV1FPRQqUopAkIDwwaKlcCZWpqC\nwCWPSsb5Wj1zNewSs5FBi8CYemPgL77r8Q76a/fZ3fBvfZv5NpO4/6180Yu49dDWQ+X95SnZU8Qq\nhiY6ToSeLupbBGLqEOsQ+58SEFl9WTj6jSIb4QZoaNQQjrGMUKZC49Zq1amFlJFnpCd+n1+aD6Ly\n58g/4UjXFtWG4Z+6bSqah95GvIWjH68TjwssWiRC6k20QbRBDOeIUAMo2irChTyvfx776fTc6bkY\nAcn3ACMt6lWETIbDfgepU67G2oy1ESMgjCMyNYwDgyYr0RsRMj4xejEwdKqZqpkVOQ9BzkEw8Bn2\nGSipMjU2hQKwb2APZp8nzMP5fLz2MSIGIh+Rj8TjXiLCgy7aJILhvSBzAaUFs0WzRZIISOHFQjxQ\nnrqeutV3IyWu4dAogtbpUvn3RGM5h2vIm5Fd67B/rp0pM570gDSq7P3c4lELBCyyhmfB8YxYHwFD\nI+8u716V58PFzgUlqduPbEdkyXaV7SqJU7CsclECZOpo6vg1BET5kjIch4B1ASA0KaYpIA5Z/bPg\ncD9JeIKAx/W51+Hwh38IB+Eq8ivyAzGxCbGRSEBazkWTZXZ8NhwA63bW7aqFgPTIg8FcuHLhyq8h\nIL079O5QfirjffP75lLtxe8i9LyIlopAuETPRTifnvM854kRkLF9x1ZjD4iAQ9W29NmlUgUh76Oa\nQF7oOJ8QWaVR3H01oOlX94CUvcdzFTomtFJN6OZLzOGHPBv0DJUtJ1uehL6QWysH/0XmsgxKk7a/\n3Y5hRSWaJZpwzKfEoQdDtFK0UuJzGiJCCZEoTASH/PjF4/BfLKZbYLiGRxsP6JfSG6XIwHTU6KhR\nleOvq1QXwz3iS+Phj2Y1zsIwElE/UT+J+7VWxPlloSIEcER3RMh87FqzC9O8VFupwsEX3BLckkhA\nzquIBXAaeTdCyVjSoCScv82vNqPUX2a6DI7T67UXenXz3PJAyBysHazFepuveSKDW1a3DL0tCcMT\nYL9EC0ULJe7/AtGC8v6yKE2ETNYAZoBYJdLEYRPRm1cwpgD3Q7t+7fr9pwSkVlQtjFk8tewUUtMn\nrp3Agav1UOtRmXXkneVh4KJbReNC5R7PxQufjG8bI0Myfut4nJiCeQUwCJ2VOuMGEdoL4bALRwlH\niWEnIVerpyqHZnRFdUWkomR2yOyQSEAGFoExO/zp8GelCIjAkatpdOPhRjjoehA31P1h93GhtMdq\nj5VK4Mxk4XBst9sOxyR9bjocAdMlpniArZKtUEKWk5wDnHd+Hm5UYW1hbYnHPVqICKJQQYhUo
EKa\nAm4kYR9hH4kE5HIhFIGnh6cHfQ8Ile9bnIZzOGUCh8x+7j7kXyMEJKuy97O6hbpF+abBZL9kOOaN\nHzau1HAGRsQgABA2LQzTq96UvoFh67KoyyKJBMQ+lwtEeJt6V4qAWHuLGSLXGFdE1kpSS9DTFv5X\nOPRWl8NdkFn99IKwWnNqwVHXD9ZHiWza+TToh7XOa50lEpAWc2FAsx9lg6BZ97HuUy0EpFceAkEL\nDyw88DUExDHSEQGtEqUS2AmvUi+cb0GuIFei3hwqRO25cLEQEUyFPAUMHREMEYiVbvVd03cNnYJF\npWZIfVJ69TvpHaozuvynHz+6Naraffn1U7AUVBXgd0VZRSGQctPpJnrJNOZqwL/hy/IxNGPj+Y3Q\nM0kNkhBobdyvMRxa4RDhEInPKV8IfSWXLIf9UjiqgFJL5h6DHgi3N24IAJc+LsUUU9eBrgOl7q8v\nD03ZRu5GCCiZuZsBtXZroSTsqvZVVNycvXoWGQwVPRU9iftFUHBSgF4I+bfyIFbyi+W5EtqjPOyn\n8ILwQkUIiGyxLF6HsGPzDkzD+lvNIhNhKmOKgHJkVCT87kttL2Gcr2qcapxYb+CD4ZiiWFpYimb+\nX4x/wfUUWggtJO4/8Q+Fx4XwuxWbK6J0TtZFVqwiaOLQidCTBbMLUOrb7kC7A/8pAZHxkUEkfu/U\nvaily2FywJjsTtudrsw6dWbXwQEljE9Aze6D7Q9g8Osq1kXzusNuB9wQxQrFcKz9xvuNr8i6Qy4P\ngYO9+ulq9EToXNG5Uq0ERNVRtfznM4/NRC9J8dFi3HA9DXsaSltPb4Iemozu6N5BBuLFlRfYP3Mt\nc0zR0szSRJPP3Zd3MTYtSi0KTVeKrxRfSe0BMbOGIxPmH4aSt/bu7d0pAaHyY0stTxIBlBU3xDJd\nCAGp/HtAgniorZ20axJS1WXqZQhkBF8JxnMqaCFoUZF1Ggc3hmOfODcRBvim1U0YZq1srexqJSB/\n/6D858EXgrk5/CtyMFSiafOmzaWtZ3bPDIY8yyALzfAhq0PQNMlYMVbfEwFp2LghIpbJ/snQfzuO\n7EDGSWApsJS2bufmnXF+QvuFwhGy7GrZlY7hpVIzRRjKoXqepE+r5T0gVSQgcpFyCAAcERxBgPZ+\nyn0MrdDN0wWx5zE8+Inj14wHoS96XATC0Kd+n/pS3zfRSwD9ML3hdPTAztOch8yJwhUF6ONm4c0w\nRehV6CtkIpbrLOeGcuzk7ZTYA1JbGYHcY0nH0GMRqR0JwqF6WRV+0Za/BXokJhmBHLNCs0KpfmxU\nHRCD5Y7Loc/GNB8DfcJfzYceFQYIAypCQD7hIKtB0Lt5e/P2YnqXziIcT1JxEgjKlPgpEnuWWz1r\nhRLb3DO5eM/H4qzF3DSuZbxl0va/T3Yf2KN1Z9ehZMs43Ti9RhOQT+g02gmR91eFr3CB4mTjwHA7\nduiIVLhGngZuPPlAeShs5Uhl3KBGzY1wgYIGB6Hn4V2td7VgiDsFIoPB38wHA1RfqY6U3LnN5/D3\nlIcpD8szO9VEVTRJqcSpgAk6mznDkKfkpWC7p8NP48ZU9VD1qFYCouWoJfaejlJLRNpS41NxY8TG\nxOLGtVK3guOiFKCEG1D3ii4emGXrlq0r38yZ7JUMw2xhY8GVPuQz2L+F0QvRC1MiW4Lz6tfTD01f\nde3qInOieFMRzZjmOeZopjrlego3dsaADMz7t3KzcqMEhMrPKNXxJnRtB22UWJ4pO8M5/K0LW2PK\nSdJ8GC7jB8aIOCnOUsRzrBKrgtrYlqtaosQq4k0EInOvW75GKYLncU9EmhgdRqdaCUhT76blP58x\nfwamOBVdKYK+6f1bb9Q28y/z8dzzV/BBTBqmN4Sh2dZ7G5q5P4g+IGOz4coG/I7RZrS/JwLyKdOx\nPnY9rkP+83yupErWE/pTc4Qmxl4qtVNC6Zh1HWuUAl8Zc
AX68uH1h2huNR5gPIASECrfp977DwkI\nTw542OUw99qAlanw30zSTNLEAjOTGkMvPN76GBUuVw9fRWa2jWsb+C/Kk5Uxdlx9hjp67sYUjoF/\nWahYiMD06rLVeO5lp8oiAK6wUAElRnuj9oIIZIRnwO/rd6rfKWR6B9VCSZNqiioI0bh+4xBoKHEs\ngf4JNAxEwJjpxUAPuUS7wO8qSCqAnt/dcjf0t8lKk5Xl31+i+7fAj50RhP18ve41/Lpf3//KjRuf\nxZtVPsNQUQJiYGKA3pR7+ffgB77Mewl/9umrpwhAm2803yiRWI1TRm/e4d6Hoc8zCzLREzxw/EAE\n7tV2qyGgr3xFGfq9o05H2KGH4Q9xvq4euopekLpudd2+CwIiayMLh3lE5gj0PDx9/BSMNvtZNpjY\nlfQr6eVr9s65ncOBPb78GIYwv0k+5hRvV9yOG0srRytH0nbar24PJnn3+V0YlNzCXNyQl6Iu4YaL\nXh2Nz7MistArcm/UPaSY7C/bX5a0nkcLD9wIBSsKYIgd+jhINaQefh4owShULUTmo8vaLmvFMkLd\nZbqjqXvmBDQFZe/PRk9J8uxkXLAzq89g/xLOJoBhPqv3DM1ChzsfhkF9FPIIBs5iqYXYGDPds7r4\nfkTnCHyvMK0QD/JNuZsYG3fa5DRu1MQNiajpezniJQzs+KDxiODKHJc5Lua4OHSCQ1XAK8Dfx20f\nt50SECqUgPzDe0GuNUGJ6WnV03j+S+VL4YgmNkxERO6U7ymk9KP3R+O5T3ufBgOUXS8bz3lgRiCG\nPSjvVt4tcQqWZQAi9NlzslESZWpoKjWD2mlBJ9TuFugUwIB4TfKaVP7zZt2aYYre/Yf3EbBJaZcC\nh3v/gv0LyhvqG043UCIRYRIBPZIiSEHk8kXIC+ijyaMnjxabguU4BwY7Sz8LvXVWuVa5Uqdg7fHC\ni1eLPYsx396+1L5ULPNSYobx5jnLchChW3BiwQmp8+8/GKNX5fmo59Dv656vey7peyadTaAvz10+\nB/1f6FAIvXc17Coch9MPT3PnpXUKCGVKXAoCWO5+7tDzTEemo1iE8G0frmbdpgT2rn9K/xRKQKj8\naATE7qgdKjjK5Mug3zZ23ti5UiWnIxn08K56vgrP5euE1+g1817vjR46/q98TMNjGjLQm+5N3OH/\npW5ORYA5bUsaMg/nnM5BL8WejkVFTb5yvnJ5/9HkickTidP8FrYEEbnZ4CZKu3IicuAPivaKkEm4\ndOASHOe8jXlw4KPVolFZYpRqJPb6BbmRcjiOAMMA6OHc1Fx8/njeY7QCnLp7CtMHE64lwC4UtiiE\nP7lx80Ych9oFNbFxvcKLQvi/MdExIDbncs9Bb6rsUtklcUhREh/EZ43rGhCVjx0+IqC/w28H9JOg\nraCttOtgFWOFAHh8UDz8wDzfPNinWO1YBJTOXTyH/ckIyEBg/HHPxwhsO/Vz4no7hvLE3lM1YcUE\n+Ml5J/Kgn9u1bde2Zk3BKuSBEFgUWgB/U/8Nkf8dH3bAYJwwP4FmwPAj4UiJL7VaihST0xMn3EjK\n+5X3V2Q7pjqmMLizd85Gai3CPQIlRoeXHYYBm2s2F5FEcxNzE2nr6Ofpg1E6TXTCtKraI2tLHVdm\nIDCAYXZa6wTiUftSbYlTamTXyuJzR0VHEKoQpxA8SFGOUTDc6zLWwRHpcLcDbmCDJwY4/s69OoN5\nq1irWEtat3aD2nigRpeOhgHffWw3Sr6OnzoOhr88fjkyL532dsKDJttLtpekdTQvauLGc3JwgkE2\nCDcIpwSECiUgFSwZja4DAzIkcwgCLqFTQtE8eZx3HJ8fMTkCvfP74d8R0Wuf3B6RROEc4Rypek1g\nCv3S7UU3TIlRbqQsdSy25l1N6A8nZyf0aBi2NRQ3CAG8gPLjdYPuBKE5Mqo4Cin87XrbUdPs5uGG\nDKj6MHX0rg35MAT6+
lCLQzCoU/ZM2SPm2GeZIKXfbVY3RPZUT6mekrafhk6G0H/OLs6IiGpM05gm\nFrE7pgw91r1td+x/Y8fGUpvvFV8o4vw4ZDugZKBpeFOp+kvnFx28uNE73xuRxP299yMyeNT9KOzG\nkj+W4I3AbTPaQi/z9/L3SsyE3dJGE6nzUmcEiHR5ujxKQKj8aAREQ1MDpU09/hYEMo42O1qVdYzC\njED0e9r2xFja1g9bg/Dz3/HfSeqRa6PUBj1bSyyXIBBz7P4xvD5gb529yFBOrD0RJVO6YboVejGi\n0SUj+GezDs7Ce4UOPTqEjO2h+4ewrl9bP+gb/R36O6T2KDeShx528XFBy8H6mPVw7KPuRaFkdYv7\nFuiRQTmDEDiv1aRWE4mEQomP47NbaYcMis18G2SoZRvINpC2/QauDUBAeuT2AGEx0TOp1Dj4Bt4N\nkEn37eiLgEr4y3CU9B85eATnZYHVAvjhzZKaceN++/L6SlzHogF6SLo/647EgsZijcU1egzv/+EB\nHhgn85RBLwbTk+lZneszsxlkGJjxzPhvsv/VtZ93mbvVuu4jHh4o5hpzreYcJyUgVH5sAvI/aMxD\nyQJznkEqnVnELKrReug2g+EevKm8qVK/58sgYsbz5/nX5OOp9PEbMlypxW5m93+3H5SAUKmZBKTG\nPKfkNQy8t7y31bLeKAYZU2YMM+ar1pPjcS8sfMJwGZixvLHf1XkdwKC0lAlmgmvOfn1LAkLxJ0JK\nQKj8ZASEIkVKQKhQAkKRIiUgFCkBoUKFEhCKlIBQoQSEIkVKQChSAkKFCiUgFClSAkKFEhCKFCkB\noUgJCBVKQChSpASEyg9CQDzpc0bx+8E0F0pAKFICQoUSEIoUKQGh8l3rvd71SWQ5n+CfFCnWPExT\n4TA+7f8B+P15KNMCkjoAAAAldEVYdGRhdGU6Y3JlYXRlADIwMTctMTAtMjRUMjI6MDY6MjArMDA6\nMDC0w4BVAAAAJXRFWHRkYXRlOm1vZGlmeQAyMDE3LTEwLTI0VDIyOjA2OjIwKzAwOjAwxZ446QAA\nABR0RVh0cGRmOlZlcnNpb24AUERGLTEuNSAFXAs5AAAAAElFTkSuQmCC\n" }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%tikz\n", "\\tikzset{every node/.style={font=\\sffamily,white}} \n", "\\node[fill=blue] at (0,0) (a) {Sender}; \n", "\\node[fill=blue] at (3,0) (b) {Channel}; \n", "\\node[fill=blue] at (6,0) (c) {Receiver}; \n", "\\draw[->] (a) -- (b) node [midway,above,font=\\scriptsize,black]{$p(\\mathbf{t})$}; \n", "\\draw[->] (b) -- (c) node [midway,above,font=\\scriptsize,black]{$p(\\mathbf{s}|\\mathbf{t})$};" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can in turn operate this distribution in the direction we actually care about: to infer a target sentence $\\target$ given a source sentence $\\source$ we find the _maximum a posteriori_ sentence\n", "\n", "\\begin{equation}\n", "\\label{decode-nc}\n", "\\target^* = \\argmax_\\target \\prob(\\target | \\source) = 
\\argmax_\\target \\prob(\\target) \\, \\prob(\\source | \\target). \n", "\\end{equation}\n", "\n", "The second equality follows from Bayes' rule, $\\prob(\\target|\\source) = \\prob(\\target) \\, \\prob(\\source|\\target) / \\prob(\\source)$, because the denominator $\\prob(\\source)$ does not depend on $\\target$ and can therefore be dropped from the $\\argmax$.\n", "\n", "For the structured prediction recipe, this means setting \n", "\n", "$$\n", "s_\\params(\\target,\\source) = \\prob(\\target) \\, \\prob(\\source | \\target). \n", "$$\n", "\n", "In the noisy channel approach to MT, the distribution $\\prob(\\target)$ that generates the target sentence is usually referred to as the [language model](/template/statnlpbook/01_tasks/01_languagemodels), and the noisy channel is called the _translation model_. As we have discussed language models earlier, in this chapter we focus on the translation model $\\prob(\\source|\\target)$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A Naive Baseline Translation Model\n", "The most straightforward translation model translates words one by one, in the order of appearance:\n", "$$\n", "\\prob_\\params^\\text{Naive}(\\source|\\target) = \\prod_{i=1}^{\\length{\\source}} \\param_{\\ssource_i,\\starget_i}\n", "$$\n", "where $\\param_{\\ssource,\\starget}$ is the probability of translating $\\starget$ as $\\ssource$. Note that this model implicitly assumes that $\\source$ and $\\target$ have the same length, since the $i$-th source word is always paired with the $i$-th target word. The parameters $\\params$ are often referred to as the *translation table*.\n", "\n", "For many language pairs, one can acquire training sets $\\train=\\left( \\left(\\source_i,\\target_i\\right) \\right)_{i=1}^n $ of paired source and target sentences. For example, for French and English the [Aligned Hansards](http://www.isi.edu/natural-language/download/hansard/) of the Parliament of Canada can be used. Given such a training set $\\train$, we can learn the parameters $\\params$ using the [Maximum Likelihood estimator](/template/statnlpbook/02_methods/0x_mle). 
For our naive model, this amounts to setting\n", "$$\n", "\\param_{\\ssource,\\starget} = \\frac{\\counts{\\train}{s,t}}{\\counts{\\train}{t}} \n", "$$\n", "Here $\\counts{\\train}{s,t}$ is the number of times we see the target word $t$ translated as the source word $s$ (that is, both words occur at the same position of a sentence pair), and $\\counts{\\train}{t}$ is the number of times we see the target word $t$ in total.\n", "\n", "### Training the Naive Model\n", "Let us prepare some toy data to show how to train this naive model." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "run_control": { "frozen": false, "read_only": false } }, "outputs": [ { "data": { "text/plain": [ "[(['the', 'house', 'is', 'small'], ['das', 'Haus', 'ist', 'klein']),\n", " (['the', 'house', 'is', 'small'], ['klein', 'ist', 'das', 'Haus']),\n", " (['a', 'man', 'is', 'tall'], ['ein', 'Mann', 'ist', 'groß']),\n", " (['my', 'house', 'is', 'small'], ['klein', 'ist', 'mein', 'Haus'])]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_raw = [\n", " (\"the house is small\", \"das Haus ist klein\"),\n", " (\"the house is small\", \"klein ist das Haus\"),\n", " (\"a man is tall\", \"ein Mann ist groß\"),\n", " (\"my house is small\", \"klein ist mein Haus\")\n", "]\n", "train = [(t.split(\" \"), s.split(\" \")) for t,s in train_raw]\n", "train" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how we transformed raw strings into tokenised sentences via `split`. Each training instance is a (target, source) pair, with English as the target language and German as the source language. This dataset can be used to train the naive model as follows. 
" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "run_control": { "frozen": false, "read_only": false } }, "outputs": [], "source": [ "from collections import defaultdict\n", "\n", "def learn_naive_model(data):\n", "    \"\"\"\n", "    Trains a naive per-word-translation model.\n", "    Args:\n", "        data: list of (target,source) pairs\n", "    Returns:\n", "        dictionary from (source,target) word pair to probability.\n", "    \"\"\"\n", "    norm = defaultdict(float)    # #(t): how often each target word occurs\n", "    counts = defaultdict(float)  # #(s,t): how often s and t share a position\n", "    for target, source in data:\n", "        for i in range(0, len(target)):\n", "            norm[target[i]] += 1.0\n", "            counts[(source[i],target[i])] += 1.0\n", "    result = {}\n", "    for (source,target),score in counts.items():\n", "        # maximum likelihood estimate: #(s,t) / #(t)\n", "        result[(source,target)] = score / norm[target]\n", "    return result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us train on the toy dataset:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "run_control": { "frozen": false, "read_only": false } }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXcAAAEiCAYAAAACg5K6AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFvlJREFUeJzt3X+0XWV95/H3J0HQTnUEidgJOInIlFLFjs2AdWFRWiyo\nlVJoQR2tKMVggTUiaqrWqaOj4KoDg6IxWop2dBhrW82CIFWqto5jTSiIouKkgBJGJSCC+CtGvvPH\n3heO10vuuTcn95z75P1aK4uz937OPV+ycz732Xs/+9mpKiRJbVky7gIkSaNnuEtSgwx3SWqQ4S5J\nDTLcJalBhrskNchwl6QGGe6S1CDDXZIatMe4PnjfffetFStWjOvjJWlRuvrqq2+vqmWztRtbuK9Y\nsYJNmzaN6+MlaVFK8rVh2nlaRpIaZLhLUoMMd0lqkOEuSQ0y3CWpQUOFe5JjktyQZHOSNTNsf2qS\nu5Jc2/953ehLlSQNa9ahkEmWAhcBRwNbgI1J1lfVl6Y1/ceqetYuqFGSNEfD9NwPAzZX1Y1VtQ24\nFDhu15YlSdoZw9zEtBy4ZWB5C3D4DO2enOQ64FbgnKq6fnqDJKcBpwE8+tGPnnu1WrRWrLl83CU0\n6+ZznznuEjSBRnVB9Z+BR1fVocDbgA/P1Kiq1lXVqqpatWzZrHfPSpLmaZhwvxU4YGB5/37dfarq\n7qq6p3+9AXhQkn1HVqUkaU6GCfeNwEFJVibZEzgZWD/YIMmjkqR/fVj/c+8YdbGSpOHMes69qrYn\nOQO4ElgKXFxV1ydZ3W9fC5wInJ5kO/AD4OSqql1YtyRpB4aaFbI/1bJh2rq1A6/fDrx9tKVJkubL\nO1QlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDD\nXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwl\nqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDRoq3JMck+SGJJuTrNlBu/+QZHuSE0dX\noiRprmYN9yRLgYuAY4FDgOckOeQB2p0H/N2oi5Qkzc0wPffDgM1VdWNVbQMuBY6bod2ZwF8Dt42w\nPknSPAwT7suBWwaWt/Tr7pNkOXA88M7RlSZJmq9RXVC9AHhVVd27o0ZJTkuyKcmmrVu3juijJUnT\n7TFEm1uBAwaW9+/XDVoFXJoEYF/gGUm2V9WHBxtV1TpgHcCqVatqvkVLknZsmHDfCByUZCVdqJ8M\nPHewQVWtnHqd5BLgsunBLklaOLOGe1VtT3IGcCWwFLi4qq5PsrrfvnYX1yhJmqNheu5U1QZgw7R1\nM4Z6Vb1w58uSJO0M71CVpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S\n1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkN\nMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGDRXuSY5J\nckOSzUnWzLD9uCTXJbk2yaYkR4y+VEnSsPaYrUGSpcBFwNHAFmBjkvVV9aWBZlcB66uqkhwKfBA4\neFcULEma3TA998OAzVV1Y1VtAy4FjhtsUFX3VFX1i/8KKCRJYzNMuC8HbhlY3tKv+ylJjk/yFeBy\n4EUz/aAkp/WnbTZt3bp1PvVKkoYwsguqVfW3VXUw8DvAGx6gzbqqWlVVq5YtWzaqj5YkTTNMuN8K\nHDCwvH+/bkZV9Q/AY5Lsu5O1SZLmaZhw3wgclGRlkj2Bk4H1gw2SPDZJ+tdPBPYC7hh1sZKk4cw6\nWqaqtic5A7gSWApcXFXXJ1ndb18LnAC8IMmPgR8AJw1cYJUkL
bBZwx2gqjYAG6atWzvw+jzgvNGW\nJkmaL+9QlaQGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4\nS1KDDHdJapDhLkkNMtwlqUFDzec+aVasuXzcJTTr5nOfOe4SNCH8nu06C/E9s+cuSQ0y3CWpQYa7\nJDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtS\ngwx3SWrQUOGe5JgkNyTZnGTNDNufl+S6JF9I8pkkTxh9qZKkYc0a7kmWAhcBxwKHAM9Jcsi0ZjcB\nR1bV44E3AOtGXagkaXjD9NwPAzZX1Y1VtQ24FDhusEFVfaaq7uwXPwvsP9oyJUlzMUy4LwduGVje\n0q97IC8GrphpQ5LTkmxKsmnr1q3DVylJmpORXlBN8jS6cH/VTNural1VraqqVcuWLRvlR0uSBuwx\nRJtbgQMGlvfv1/2UJIcC7wGOrao7RlOeJGk+hum5bwQOSrIyyZ7AycD6wQZJHg38DfD8qvrq6MuU\nJM3FrD33qtqe5AzgSmApcHFVXZ9kdb99LfA64BHAO5IAbK+qVbuubEnSjgxzWoaq2gBsmLZu7cDr\nU4FTR1uaJGm+vENVkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMM\nd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCX\npAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaNFS4JzkmyQ1J\nNidZM8P2g5P8nyQ/SnLO6MuUJM3FHrM1SLIUuAg4GtgCbEyyvqq+NNDs28BZwO/skiolSXMyTM/9\nMGBzVd1YVduAS4HjBhtU1W1VtRH48S6oUZI0R8OE+3LgloHlLf26OUtyWpJNSTZt3bp1Pj9CkjSE\nBb2gWlXrqmpVVa1atmzZQn60JO1Whgn3W4EDBpb379dJkibUMOG+ETgoycokewInA+t3bVmSpJ0x\n62iZqtqe5AzgSmApcHFVXZ9kdb99bZJHAZuAhwH3JvlPwCFVdfcurF2S9ABmDXeAqtoAbJi2bu3A\n62/Sna6RJE0A71CVpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDD\nXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwl\nqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNWiocE9yTJIbkmxOsmaG\n7UlyYb/9uiRPHH2pkqRhzRruSZYCFwHHAocAz0lyyLRmxwIH9X9OA9454jolSXMwTM/9MGBzVd1Y\nVduAS4HjprU5DnhfdT4LPDzJL4y4VknSkPYYos1y4JaB5S3A4UO0WQ58Y7BRktPoevYA9yS5YU7V\nLl77ArePu4hh5LxxVzAx3GeLy6LZX7DT++zfDtNomHAfmapaB6xbyM+cBEk2VdWqcdeh4bnPFhf3\n188a5rTMrcABA8v79+vm2kaStECGCfeNwEFJVibZEzgZWD+tzXrgBf2omScBd1XVN6b/IEnSwpj1\ntExVbU9yBnAlsBS4uKquT7K6374W2AA8A9gMfB84ZdeVvCjtdqeiGuA+W1zcX9OkqsZdgyRpxLxD\nVZIaZLhLUoMMd2kEkmTcNUiDDPcJleTn+v8aGotAefFqUUuy97hrGDXDfQIlOR54fZKjDI3JluSk\nJBck2S/Jw8Zdj+YuyVHAB5I8PckvjrueUXG0zITpe+or6CZpOwe4HPh8VX1snHVpZkn2A14BhG5o\n8bqqun68VWlYSZZW1U+S/
DZwBPBwuu/bO8Zc2k4z3CdIkt8FHgu8o6ruSfLLdDNurgQ+WVV/NdYC\ndZ8kZwNLqurP+uXHA78J/D7w0qq6Zpz1aXZJTgJeAJxQVT9M8ki6TtV/Bq6sqnPHWuBO8rTMZPkx\n8FDg1CR79j3A9wLXAIcnedxYq9OgfwSOSnIWQFV9AfjvwPuBNyVZOc7iNJTLgS8Dl/Q9+Nuq6pPA\n2cARSU4ca3U7yXCfAEmm7hS+DPg4cH0/vTJVtRW4gu6Q34egjFn/fAOqaiPwGuA7U9uq6l7gfcCn\ngaf27b0gPmGmvm9VdQ/dPnxvVf1koMl1dL+kD0qyR5JFmZOLsuiW9D2G7f0/uBOr6lNT59engqGq\nbgU+AJxl7318Bs7PLknya1V1TVW9r982ta/uBr4IHNkve95zggx835YmObOqflRVV/TblgD0QX8N\n8BTg4P6X9qJjuI/ZVFjQTb725GmbA11wVNXn6A77H7LAJYruiz+wrz4KPG9we1XVQDh8BPhBkt8Y\nQ6nagYF9+GHg0Gnb7h34Jf0VulOihy7Woy8vqI5JkoP7f0AkOQ/4blW9sR/f/ltV9bf9tkz1/vpe\n+y1VddfYCt8NJVneHz2R5BLgG1X1x/0pmlXAl6rqu/32JX1IHADcUVXfH1vhuk+Sp1XVJ/rXrwH2\nraqXJdmH7kly/xu4qap+PPCehwE/qarvjaXonWS4j0GSZwEPrqoP9cvn0x1FfQdYRjdCZnNVHT2+\nKgWQ5FhgRVW9s18+F7gW+FW6i9+/DbwLeOPg4fvgL2WNVz/q7Mip4Y39E+EOpHui0TeBp9Gdhjm7\nqr49tkJHzHBfYEkeBdzW9+5OBb4CfAF4D/A54INV9bUk64Gzqurm8VW7e0uyV1X9qO+h/zHwIbrH\nR/5H4Gt008zuA7wZeHFV3Ta2YjWjJI+rqi/2r/+ErgP1F8Bq4Ht0F1O/n+RjwGur6p/GV+1oLehj\n9nZ3SX4TOB54b5JrgYOBXwK+V1W/17fZpz/0v9NgH5/+noPTk5xQVXf3p8v+K/Dyqjqlb7MEeBPd\nqTKDfcL09x78jySvqarL6DpRL6R7mNDU/Ql7J3k38PWWgh28oLrQbqHrqf8usCfdBdI7gef1wQ/w\narojqj8Ah9KN0Qbgn4A/70cyXQh8FrggydSFuLcB26rqpeC+mjT9vQevAs7p7zu4EngHcHySP+qb\n/QFwe1W9GNrah56WWWBJDgSWVdVn++X96f6B/Wu6ce6fq6of9tuWLNZhWC1I8mDgiKr6eL+8H90o\nmV8HXgncU1X/r9/mvppQSQ6qqv/bv96Lbv+9FPgY8M6BAQtN7UPDfYymjaw4E7i2qj7Qb/OC3ATq\nA/6PgG9X1QX9OvfVIjDwfXsw8HTgwKo6v9/W3D403CdEkn1aulLfsiQP629W0iKVZI+q2j7uOnYl\nw33CtNiDWAzm8/fuvpo87pP7eUF1F0jyrCRHzOe9/sNcWElelORX+jtM53QxzX01GZK8cOoCqfvk\nfob7iCV5ON280EcmOXwH7Zq5Kr9Y9fvqEXRz9jx+4MJaprVzX02oJA8CvgX8+yQv2kG73S7rdrv/\n4V2pn5ToO3RD5MK0eWCSLE/ye0keZA9j/Pp99T/pZnF87MD6SnJgkuOmlsdUombRTxfwSbr9eOfg\ntiQHJPnT/lRNM6NghmW4j0junzFwKd3NYedWNzf0oCfRDcP6Nwtdn+6X5NAkR/aLzwI+PTWXz4DH\nAC9K8oSFrU7D6L9nAFTVD6rqqhn24V50dxQfv6DFTQjDfQT6nsHUbHN/DZw0eCV+4JDwMrqbl35r\nDGWK+/bFD4Gzk9wAPLKqvjpD088BV+MsnBNn8PuWZF3fOz9+cHv/8iZgI/BzYyl0zJx+YAQGDtvf\nD2yqqrcAJPnFqrqhH1u7tJ+n5Gy65zRqgQ0cXd1CN+nXUro5Yn5GVd2V5O/pAkITZOD79k7
gwXT7\n6M39ENX39qfVpvb1X9F1qHY7hvtOGBx2leQRwMOATUleAPwa8Owk66rq9VM9jX760EU5hehiNvBl\nXwL8O+AlwH7Ay/pQeFvfbv+q2gJQVZ8eX8WaLsmvVtXV/es/o8v5F/TLNwPv7r+Tl/T7OlV15w5+\nZNM8LTNPfVgM3rZ8B90j1t5FFx5/CRwNrOjviJt6DJsW2LTTZh8FTu9vR/8c8OfAbyR5WZJ3AUeN\ns1bNLN2DTx4zsGoJ8MQkT+pvSPoU3S/styV5Cngh3JuY5mFaL/DtdA+2/mhVXZHkoVX13XSTTa2j\nm/HxzLEWvBtLsrKqbupfv5vuARprkuxJNyf7zcBK4BV0UzG/ZGzF6gGle2D8tiSr6Sb6+lCSNwEr\ngP8CfLU//flLVfXlsRY7Iey5z8PAqJj1wL/Qnbd9e5KzgL37yYn+F7BkKtgdK73w0s20+YyBVd8C\nvpLkAroZOT8ErK6qzwAnTAW7+2py5P4Hkm/rj4AfDjwtybFV9Wrg68CfAof0R2hf7t+322ebPfc5\nSHIS3bC5W5M8GzgEOB+4HLiR7hzuPwAfBPapqs/372tqtrnFIMnPV9U9/Zf8tXQjlfaim897C91R\n1UPppvJdXVVf79/n7esTYtoR8iq6R1F+OckfAr8CXNYfLa8FvlBVF4214AljuA8pyUPpHqp7Dd0Y\n9tv7Q/tXAvdW1Zv6w8RjgXOq6qr+fYbFAutvPno5cGJV3ZbuCTyrgDUDPbs96J5+9YOqOn181WpH\n+mC/HLgOeArwd3QPTXkJ3cNu/r6q/mZ8FU6u3f7QZRj9HaXfBX6f7jzt65PsXVXbgAdx/1PUHwlc\nOhXs4EWdcaiqj9DN1f2e/lB+LfBx4C25f0qItwI/mgp2T8VMrLfS7bs39Mvf6u9KfTfd6dCHTjV0\nH/40e+6zyP1zQO9J9+zF79I9SenDwBrg5+me7vILwBdr4AlKBvvCSvJ84MlVdXp/rvYpU3cJJ9kX\nOInuBrLXAbdW1dZ+m6fNJsT0702SM+meXvZy4BNVdV6Sg+gebvPP7rcHZs99B/qRL/f2QfFKYGlV\nrQYeBywDzuX+sF9tsI/dFcC3+nHrPwE+NbWhqm6nu8j9WeCogWDfLecdmUSDw4sH3EP3eLyrquq8\nft1bgCOn9ps99pl5E9MD6C+YviLdA5Jv62+SWJbkIf3yauBLdH+HZ1XVxv59Bvv4/AT4ZeC5wNr+\nTsX79kd/neTCqrpn6g3uq8kxcPH0CmAr3dQBf0HXkTolyR10HamtVfXWgfe5D2dgz/0BVNV6uos3\nFyd5CN0wuv2Ag5PsVVXforsh5ibDYjL0dyO+EXhNkuf26yq9fvkesLc3STIwCRiwmm7QwvuB/YE/\noZtm4PV0Uw1cVVWn9O8zv3bAc+7TzHLedg3dEKy76KaIvb6qzuq32WOfEP349guBt1TVJQPrPbc+\nYQauaS0BTgSeSvfQ6i8keRJwHN3R8fnVP4x88H1jKXqRMNyn6S+8nQH8t6q6e4YLPEcAewPLq2pt\nv85gnzD9fvpLuvsQNlfVhjGXpGn6aQO290dRl9N1mg6ju3B6at/mcOD5wGeqf3i8hmO4T5Nkb7ob\nXK4aDG+6v6uf6SnYg5hc/aiKo+nmJPniYC9e45Wfnqbj1cDKqjol3QR8V9IF/Cv6tgdW1b+Ms97F\nyHCfQboHNFwGvGqqt2DvfHHrR9DcPe46dP9gBeAEYBvwZrqH2Dyvqq5Nsh/dRdWrq+oPB97nd3AO\nvCAxg37agFOA1yZ5Yb/uvhkgx1ia5slgnxwDgxXeTTf99fl0D7l5YZJD+8EKzwS+Oe19Bvsc2HPf\nAc/bSqMzw2CFX6+qT/TbngA8m+4u70uqn7e932aPfR7she5AdQ9reDqwHThqqhcvaV6m32T2yakN\n/dHyR+gegXjI4JsM9vmx5z4HnreV5u+BBitMG412QFX
dMq4aW2K4S1owDzRYAX66h+6pmJ3n9AOS\nFkxVfT7JKcCF6Z6udMngYIWpYcUG+86z5y5pwTlYYdcz3CWNhTeZ7VqGu6Sxc7DC6BnuktQgx7lL\nUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBv1/fIlxQs8EY9IAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "table = learn_naive_model(train)\n", "def plot_table_for_target(table, target):\n", " source_for_is, scores = zip(*[item for item in sorted(table.items()) if item[0][1] == target])\n", " util.plot_bar_graph(scores, source_for_is, rotation=45, align='center')\n", "plot_table_for_target(table, \"is\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Decoding with the Naive Model\n", "\n", "*Decoding* in MT is the task of finding the solution to equation $\\ref{decode-mt}$. That is, we need to find that target sentence with maximum a posteriori probability, which is equivalent to finding the target sentence with maximum likelihood as per equation $\\ref{decode-nc}$. The phrase \"decoding\" relates to the noisy channel analogy. Somebody generated a message, the channel encodes (translates) this message and the receiver needs to find out what the original message was. \n", "\n", "In the naive model decoding is trivial if we assume a unigram language model. We need to choose, for each source word, the target word with maximal product of translation and language model probability. For more complex models this is not sufficient, and we discuss a more powerful decoding method later.\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "run_control": { "frozen": false, "read_only": false } }, "outputs": [ { "data": { "text/plain": [ "['my', 'house', 'the', 'small']" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def decode(source_sent, model, lm):\n", " \"\"\"\n", " Decodes using the naive model. 
Translates each source token in isolation and appends the results.\n", " Args:\n", " source_sent: the source sentence as a list of tokens.\n", " model: the naive model, a dictionary from (source,target) to probabilities.\n", " lm: a uniform language model as defined in the language_models chapter.\n", " Returns:\n", " a list of target tokens. \n", " \"\"\"\n", " source_to_targets = defaultdict(list)\n", " for (source,target),prob in model.items():\n", " source_to_targets[source] += [(target,prob)]\n", " result = []\n", " for tok in source_sent:\n", " candidates = source_to_targets[tok]\n", " multiplied_with_lm = [(target,prob * lm.probability(target)) for target, prob in candidates]\n", " target = max(multiplied_with_lm, key=lambda t: t[1])\n", " result.append(target[0])\n", " return result\n", "\n", "source = train[1][1]\n", "lm = UniformLM(set([target for _, target in table.keys()]))\n", "target = decode(source, table, lm)\n", "target" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The naive model is broken in several ways. Most severely, it ignores the fact that word order can differ and still yield (roughly) the same meaning. \n", "\n", "## IBM Model 2\n", "IBM Model 2 is one of the most influential translation models, even though these days it is only indirectly used in actual MT systems, for example to initialize translation and alignment models. As IBM Model 2 can be understood as a generalization of IBM Model 1, we omit the latter for now and briefly illustrate it after our introduction of Model 2. Notice that parts of this exposition are based on the excellent [lecture notes on IBM Model 1 and 2](http://www.cs.columbia.edu/~mcollins/courses/nlp2011/notes/ibm12.pdf) of Mike Collins.\n", "\n", "### Alignment\n", "The core difference between Model 2 and our naive baseline model is the introduction of _latent_ auxiliary variables: the word-to-word _alignment_ $\\aligns$. 
In particular, we introduce a variable $a_i \\in [0 \\ldots \\length{\\target}]$ for each source sentence index $i \\in [1 \\ldots \\length{\\source}]$. The word alignment $a_i = j$ means that the source word at index $i$ is _aligned_ with the target word at index $j$. \n", "\n", "Notice that $\\align_i$ can be $0$. This corresponds to an imaginary _NULL_ token $\\starget_0$ in the target sentence and allows source words to be omitted in an alignment. \n", "\n", "Below you see a simple example of an alignment.\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "run_control": { "frozen": false, "read_only": false } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", "\n", " \n", " NULL the house is small\n", " \n", " \n", " klein ist das Haus\n", " \n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import statnlpbook.word_mt as word_mt\n", "alignments=word_mt.Alignment(\"NULL the house is small\".split(\" \"),\n", " \"klein ist das Haus\".split(\" \"),\n", " [(1,2),(2,3),(3,1),(4,0)])\n", "alignments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An example where source words have been dropped, as indicated via the `NULL` alignment, can be seen below. Here the Japanese case marker が is dropped in the English translation.
] }, { "cell_type": "code", "execution_count": 8, "metadata": { "run_control": { "frozen": false, "read_only": false } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", "\n", " \n", " NULL I like music\n", " \n", " \n", " 音楽 好き\n", " \n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "word_mt.Alignment(\"NULL I like music\".split(\" \"),\n", " \"音楽 が 好き\".split(\" \"),\n", " [(0,1),(2,2),(3,0)])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "IBM Model 2 defines a conditional distribution $\\prob(\\source,\\aligns|\\target)$ over both the source sentence $\\source$ and its alignment $\\aligns$ to the target sentence $\\target$. Such a model can be used as a translation model $\\prob(\\source|\\target)$, as defined above, by marginalizing out the alignment: \n", "\n", "$$\n", "\\prob(\\source|\\target) = \\sum_{\\aligns} \\prob(\\source,\\aligns|\\target).\n", "$$\n", "\n", "### Model Parametrization\n", "\n", "\n", "IBM Model 2 defines its conditional distribution over source sentences and alignments using two sets of parameters $\\params=(\\balpha,\\bbeta)$. Here $\\alpha(\\ssource|\\starget)$ is a parameter defining the probability of translating target word $\\starget$ into source word $\\ssource$, and $\\beta(j|i,l_\\starget,l_\\ssource)$ is a parameter that defines the probability of aligning the source word at index $i$ with the target word at index $j$, conditioned on the length $l_\\starget$ of the target sentence and the length $l_\\ssource$ of the source sentence. In addition, Model 2 assigns a uniform probability $\\epsilon$ over source sentence lengths. 
\n", "\n", "With the above parameters, IBM Model 2 defines a conditional distribution over source sentences and alignments, conditioned on a target sentence _and a desired source sentence length_ $l_\\ssource$:\n", "\n", "\\begin{equation}\n", "\\label{ibm2}\n", " p_\\params^\\text{IBM2}(\\ssource_1 \\ldots \\ssource_{l_\\ssource},\\align_1 \\ldots \\align_{l_\\ssource}|\\starget_1 \\ldots \\starget_{l_\\starget}) = \\epsilon \\prod_i^{l_\\ssource} \\alpha(\\ssource_i|\\starget_{a_i}) \\beta(a_i|i,l_\\starget,l_\\ssource)\n", "\\end{equation}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training IBM Model 2 with the EM Algorithm\n", "\n", "Training IBM Model 2 is less straightforward than training our naive baseline. The main reason is the lack of _gold alignments_ in the training data. That is, while we can quite easily find, or heuristically construct, _sentence-aligned_ corpora like our toy dataset, we generally do not have _word-aligned_ sentences.\n", "\n", "To overcome this problem, IBM Model 2 can be trained using the Expectation Maximization (EM) Algorithm, a general recipe for learning with partially observed data. In our case the data is partially observed because we observe the source and target sentences, but not their alignments. The EM algorithm maximizes a lower bound of the log-likelihood of the data. 
The log-likelihood of the data is:\n", "\n", "$$\n", " \\sum_{(\\target_i,\\source_i) \\in \\train} \\log p_\\params^\\text{IBM2}(\\source_i|\\target_i) = \\sum_{(\\target_i,\\source_i) \\in \\train} \\log \\sum_{\\aligns} p_\\params^\\text{IBM2}(\\source_i,\\aligns|\\target_i) \n", "$$\n", "\n", "EM can be seen as [block coordinate ascent](https://www.dropbox.com/s/vrsefe3m57bxpgv/EMforTM.pdf?dl=0) on this lower bound.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The EM algorithm is an iterative method that alternates between two steps, the E-Step (Expectation) and the M-Step (Maximization), until convergence. For the case of IBM Model 2 the E- and M-Steps are instantiated as follows: \n", "\n", " * **E-Step**: Given the current set of parameters $\\params$, calculate the **expectations** $\\pi$ of the latent alignment variables under the model $p_\\params^\\text{IBM2}$; this amounts to estimating a _soft alignment_ for each sentence. \n", "\n", " * **M-Step**: Given the training set of soft alignments $\\pi$, find new parameters $\\params$ that **maximize** the log-likelihood of this (weighted) training set. This amounts to soft counting. \n", "\n", "### E-Step\n", "
\n", "\n", "The E-Step calculates the distribution\n", "\n", "$$\n", "\\pi(\\aligns|\\source,\\target) = p_\\params^\\text{IBM2}(\\aligns|\\source,\\target)\n", "$$\n", "\n", "for the current parameters $\\params$. For Model 2 this distribution has a very simple form:\n", "\n", "$$\n", "\\pi(\\aligns|\\source,\\target) = \\prod_i^{l_{\\ssource}} \\pi(a_i|\\source,\\target,i) = \\prod_i^{l_{\\ssource}} \n", " \\frac\n", " {\\alpha(\\ssource_i|\\starget_{a_i}) \\beta(a_i|i,l_\\starget,l_\\ssource)}\n", " {\\sum_j^{l_{\\starget}} \\alpha(\\ssource_i|\\starget_j) \\beta(j|i,l_\\starget,l_\\ssource) }\n", "$$\n", "\n", "Importantly, the distribution over alignments *factorizes* in a per-source-token fashion, and hence we only need to calculate, for each source token $i$ and each possible alignment $a_i$, the probability (or expectation) $\\pi(a_i|\\source,\\target,i)$.\n", "\n", "Before we look at the implementation of this algorithm we will set up the training data to be compatible with our formulation. This involves introducing a 'NULL' token to each target sentence to allow source tokens to remain unaligned. 
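

The factorization of the alignment posterior can be checked numerically on a tiny example. The following self-contained sketch is not part of `statnlpbook`: the names `toy_alpha`, `toy_beta`, `joint` and `token_posterior` are hypothetical, and all parameter values are made up for illustration. It enumerates all 3 x 3 = 9 alignments of a two-token source sentence to a three-token target (including NULL), computes the exact posterior of each alignment by brute force, and checks that it equals the product of the per-token posteriors π(a_i|s,t,i). The constant ε is left out because it cancels in the normalization.

```python
import itertools

# Hypothetical toy parameters, for illustration only (not learned from data).
# Sentence lengths are fixed, so beta is indexed by (j, i) alone here.
toy_target = ['NULL', 'the', 'house']
toy_source = ['das', 'Haus']
toy_alpha = {
    ('das', 'NULL'): 0.1, ('das', 'the'): 0.6, ('das', 'house'): 0.1,
    ('Haus', 'NULL'): 0.1, ('Haus', 'the'): 0.1, ('Haus', 'house'): 0.7,
}
toy_beta = {(j, i): 1.0 / len(toy_target)
            for j in range(len(toy_target)) for i in range(len(toy_source))}

def joint(alignment):
    # p(s, a | t) without the constant epsilon: one alpha * beta factor per source token
    prob = 1.0
    for i, j in enumerate(alignment):
        prob *= toy_alpha[toy_source[i], toy_target[j]] * toy_beta[j, i]
    return prob

def token_posterior(i, j):
    # pi(a_i = j | s, t, i): normalise over target positions for source token i only
    scores = [toy_alpha[toy_source[i], toy_target[k]] * toy_beta[k, i]
              for k in range(len(toy_target))]
    return scores[j] / sum(scores)

# Brute force: enumerate all len(toy_target) ** len(toy_source) alignments.
all_alignments = list(itertools.product(range(len(toy_target)), repeat=len(toy_source)))
marginal = sum(joint(a) for a in all_alignments)

for a in all_alignments:
    brute_force = joint(a) / marginal
    factorised = 1.0
    for i, j in enumerate(a):
        factorised *= token_posterior(i, j)
    assert abs(brute_force - factorised) < 1e-12

print(round(token_posterior(0, 1), 3), round(token_posterior(1, 2), 3))  # prints: 0.75 0.778
```

Brute-force enumeration touches every one of the (target length) to the power (source length) alignments, whereas the factorized form needs only one normalization per source token, which is what `e_step` below computes.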
" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "run_control": { "frozen": false, "read_only": false } }, "outputs": [ { "data": { "text/plain": [ "[(['NULL', 'the', 'house', 'is', 'small'], ['klein', 'ist', 'das', 'Haus']),\n", " (['NULL', 'a', 'man', 'is', 'tall'], ['groß', 'ist', 'ein', 'Mann']),\n", " (['NULL', 'my', 'house', 'is', 'small'], ['klein', 'ist', 'mein', 'Haus']),\n", " (['NULL', 'the', 'building', 'is', 'big'], ['groß', 'ist', 'das', 'Gebäude']),\n", " (['NULL', 'the', 'building', 'is', 'long'],\n", " ['lang', 'ist', 'das', 'Gebäude'])]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_model_2_raw = [\n", " (\"NULL the house is small\" , \"klein ist das Haus\"),\n", " (\"NULL a man is tall\" , \"groß ist ein Mann\"),\n", " (\"NULL my house is small\" , \"klein ist mein Haus\"),\n", " (\"NULL the building is big\" , \"groß ist das Gebäude\"),\n", " (\"NULL the building is long\" , \"lang ist das Gebäude\")\n", "]\n", "train_model_2 = [(t.split(\" \"), s.split(\" \")) for t,s in train_model_2_raw]\n", "train_model_2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now implement the E-Step. First we introduce a data structure to represent the IBM Model 2. 
" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "class IBMModel2:\n", " \"\"\"\n", " A representation of IBM Model 2 with alpha and beta parameters.\n", " \"\"\"\n", " def __init__(self, alpha, beta):\n", " \"\"\"\n", " Create a new IBM Model 2 instance.\n", " Params:\n", " alpha: a dictionary that maps pairs (s,t) of source and target words to probabilities.\n", " beta: a dictionary that maps from tuples (ti,si,lt,ls) of target index, source index, target length\n", " and source length to the probability p(ti|si,lt,ls).\n", " \"\"\"\n", " self.alpha = alpha\n", " self.beta = beta" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also need a way to normalise a vector of real values to sum up to 1." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "def norm_scores(scores):\n", " \"\"\"\n", " Normalises the alignment scores to sum up to one.\n", " Params:\n", " scores: a list of unnormalised scores.\n", " Returns:\n", " a scaled version of the input list such that its elements sum up to 1.\n", " \"\"\"\n", " norm = sum(scores)\n", " return [s/norm for s in scores]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The E-Step can now be implemented as follows." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "run_control": { "frozen": false, "read_only": false } }, "outputs": [], "source": [ "def e_step(model, data):\n", " \"\"\"\n", " Perform the E-Step of the EM algorithm.\n", " Params:\n", " model: An IBMModel2 instance.\n", " data: a list of (t,s) pairs of target and source sentences.\n", " Returns:\n", " a list of alignment matrices, one for each instance in `data`. 
An alignment matrix is a list of lists, one list\n", " per source token; the list for source token si holds the probabilities of aligning it with\n", " each target token ti (including NULL).\n", " \"\"\"\n", " all_alignments = []\n", " for target,source in data:\n", " def score(si, ti):\n", " return model.alpha[source[si],target[ti]] * model.beta[ti,si, len(target),len(source)]\n", " result = []\n", " for si in range(0, len(source)):\n", " scores = norm_scores([score(si,ti) for ti in range(0, len(target))])\n", " result.append(scores)\n", " all_alignments.append(result)\n", " return all_alignments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us run this code using a simple, uniform initial model. " ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "run_control": { "frozen": false, "read_only": false } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", "\n", " \n", " NULL the house is small\n", " \n", " \n", " klein ist das Haus\n", " \n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "source_vocab = set([tok for _,s in train_model_2 for tok in s])\n", "target_vocab = set([tok for t,_ in train_model_2 for tok in t])\n", "\n", "max_length = 5\n", "alpha, beta = {}, {}\n", "for s in source_vocab:\n", " for t in target_vocab:\n", " alpha[s,t] = 1.0 / len(source_vocab)\n", "for ti in range(0, max_length):\n", " for si in range(0, max_length):\n", " for lt in range(1, max_length+1):\n", " for ls in range(1, max_length+1):\n", " beta[ti,si,lt,ls] = 1.0 / lt\n", " \n", "init_model = IBMModel2(alpha,beta)\n", "align_matrices = e_step(init_model, train_model_2)\n", "word_mt.Alignment.from_matrix(align_matrices[0], train_model_2[0][1], train_model_2[0][0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can play around with the initialization of $\\balpha$ to see how the alignments react to changes of the word-to-word 
translation probabilities.\n", "\n", "### M-Step\n", "\n", "The M-Step optimizes a *weighted* or *expected* version of the log-likelihood of the data, using the distribution $\\pi$ from the last E-Step:\n", "\n", "$$\n", " \\params^* = \\argmax_\\params \\sum_{(\\target,\\source) \\in \\train} \\sum_\\aligns \\pi(\\aligns|\\source,\\target) \\log \\prob_\\params^\\text{IBM2}(\\source,\\aligns|\\target)\n", "$$\n", "\n", "The sum over hidden alignments seems daunting, but because $\\pi$ factorizes as we discussed above, we again have a simple closed-form solution:\n", "\n", "$$\n", " \\alpha(\\ssource|\\starget) = \\frac\n", " {\\sum_{(\\target,\\source)}\\sum_i^{l_\\ssource} \\sum_j^{l_\\starget} \\pi(j|i) \\delta(\\ssource,\\ssource_i) \\delta(\\starget,\\starget_j) }\n", " {\\sum_{(\\target,\\source)}\\sum_i^{l_\\ssource} \\sum_j^{l_\\starget} \\pi(j|i) \\delta(\\starget,\\starget_j) }\n", "$$\n", "\n", "where $\\pi(j|i)$ abbreviates $\\pi(a_i=j|\\source,\\target,i)$ and $\\delta(x,y)$ is 1 if $x=y$ and 0 otherwise. The numerator is the expected number of times $\\starget$ is aligned to $\\ssource$; the denominator is the expected number of source tokens aligned to $\\starget$, so that $\\alpha(\\cdot|\\starget)$ sums to one. The updates for $\\beta$ are similar. \n", "\n", "Let us implement the M-Step now. In this step we estimate parameters $\\params$ from a given set of soft alignments $\\pi$. 
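

As a warm-up for that implementation, the α update can be computed by soft counting: for each target word t, divide the expected number of times t is aligned to source word s by the expected number of source tokens aligned to t, which makes α(·|t) sum to one. The sketch below is a hypothetical stand-alone helper (`alpha_update` is not a `statnlpbook` function, and the toy alignment values are made up); it assumes the same layout that `e_step` returns, one alignment matrix per `(target, source)` pair indexed `[si][ti]`, and it ignores the β update.

```python
from collections import defaultdict

def alpha_update(data, align_matrices):
    # Hypothetical helper: M-step update for alpha only, by soft counting.
    # data: list of (target, source) token lists; align_matrices[k][si][ti] is the
    # E-step probability that source token si of pair k aligns to target token ti.
    pair_counts = defaultdict(float)    # expected count of (s, t) alignments
    target_counts = defaultdict(float)  # expected number of source tokens aligned to t
    for (target, source), matrix in zip(data, align_matrices):
        for si, s in enumerate(source):
            for ti, t in enumerate(target):
                pair_counts[s, t] += matrix[si][ti]
                target_counts[t] += matrix[si][ti]
    return {(s, t): count / target_counts[t] for (s, t), count in pair_counts.items()}

# Toy usage: one sentence pair with a hand-made soft alignment (made-up values).
toy_data = [(['NULL', 'the', 'house'], ['das', 'Haus'])]
toy_matrices = [[[0.2, 0.6, 0.2],    # source token 'das' over NULL, the, house
                 [0.1, 0.3, 0.6]]]   # source token 'Haus' over NULL, the, house
toy_alpha = alpha_update(toy_data, toy_matrices)
print(round(toy_alpha['das', 'the'], 3), round(toy_alpha['Haus', 'house'], 3))  # prints: 0.667 0.75
```

Because `pair_counts` and `target_counts` accumulate the same weights, each distribution α(·|t) is normalized by construction; a β update could be written the same way, with counts keyed by positions and sentence lengths instead of words.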
\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "run_control": { "frozen": false, "read_only": false } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAExCAYAAAB71MlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJztnXeYXVXV/z8rE0JvJiEEQkiAUEKJYggtlCABQjGgIr0p\nYkAIgi0qRFBfKYJ0CKFIeUUstAABpIiKwE9A6S/wRuCVoIgoUhRFYP3+WPtm9lwmmTtzz7kzyfl+\nnmeeuaeuvc/Z53v2WXvtvc3dEUIIUR369XYChBBCtBYJvxBCVAwJvxBCVAwJvxBCVAwJvxBCVAwJ\nvxBCVAwJvxBCVAwJvxBCVAwJvxBCVAwJvxBCVIz+vZ2Azhg0aJCPGDGit5MhhBALDQ899NAr7j64\nkX37pPCPGDGCBx98sLeTIYQQCw1m9n+N7itXjxBCVAwJvxBCVAwJvxBCVAwJvxBCVIyGhN/MdjKz\np81sjplN62T7fmb2qJk9Zmb3mtmYbNvzaf3DZqYWWyGE6GW6jOoxszbgPGAiMBd4wMxmufuT2W7P\nAdu4+6tmNgmYCWyabZ/g7q8UmG4hhBA9pJEa/zhgjrs/6+5vA1cDk/Md3P1ed381Ld4PDCs2mUII\nIYqiEeFfFXghW56b1s2PTwO3ZMsO3GFmD5nZYd1PohBCiCIptAOXmU0ghH98tnq8u79oZisBt5vZ\nU+7+y06OPQw4DGD48OFFJkuIRY4R024u3cbzJ+9Sug3ROzRS438RWC1bHpbWdcDMNgIuBia7+19r\n6939xfT/ZeA6wnX0Ptx9pruPdfexgwc31OtYCCFED2hE+B8ARpnZSDMbAOwNzMp3MLPhwLXAAe7+\nTLZ+aTNbtvYb2AF4vKjECyGE6D5dunrc/R0zOxK4DWgDLnX3J8xsSto+A5gODATONzOAd9x9LDAE\nuC6t6w9c5e63lpITIYQQDdGQj9/dZwOz69bNyH4fChzayXHPAmPq1wshhOg91HNXCCEqhoRfCCEq\nhoRfCCEqhoRfCCEqhoRfCCEqhoRfCCEqhoRfCCEqhoRfCCEqhoRfCCEqhoRfCCEqhoRfCCEqhoRf\nCCEqhoRfCCEqhoRfCCEqhoRfCCEqhoRfCCEqhoRfCCEqhoRfCCEqhoRfCCEqhoRfCCEqhoRfCCEq\nhoRfCCEqhoRfCCEqhoRfCCEqhoRfCCEqhoRfCCEqhoRfCCEqhoRfCCEqhoRfCCEqhoRfCCEqhoRf\nCCEqhoRfCCEqRkPCb2Y7mdnTZjbHzKZ1sn0/M3vUzB4zs3vNbEyjxwohhGgtXQq/mbUB5wGTgNHA\nPmY2um6354Bt3H1D4FvAzG4cK4QQooU0UuMfB8xx92fd/W3gamByvoO73+vur6bF+4FhjR4rhBCi\ntTQi/KsCL2TLc9O6+fFp4JYeHiuEEKJk+hd5MjObQAj/+B4cexhwGMDw4cOLTJYQQoiMRmr8LwKr\nZcvD0roOmNlGwMXAZHf/a3eOBXD3me4+1t3HDh48uJG0CyGE6AGNCP8DwCgzG2lmA4C9gVn5DmY2\nHLgWOMDdn+nOsUIIIVpLl64ed3/HzI4EbgPagEvd/Qkzm5K2zwCmAwOB880M4J1Ue+/02JLyIoQQ\nogEa8vG7+2xgdt26GdnvQ4FDGz1WCCFE76Geu0IIUTEk/EIIUTEk/EIIUTEk/EIIUTEk/EIIUTEk\n/EIIUTEk/EIIUTEk/EIIUTEk/EIIUTEk/EIIUTEk/EIIUTEk/EIIUTEk/EIIUTEk/EIIUTEk/EII\nUTEk/EIIUTEk/EIIUTEk/EIIUTEk/EIIUTEk/EII
UTEk/EIIUTEk/EIIUTEk/EIIUTEk/EIIUTEk\n/EIIUTEk/EIIUTEk/EIIUTEk/EIIUTEk/EIIUTEk/EIIUTEk/EIIUTEk/EIIUTEk/EIIUTEaEn4z\n28nMnjazOWY2rZPt65rZfWb2bzP7Yt22583sMTN72MweLCrhQgghekb/rnYwszbgPGAiMBd4wMxm\nufuT2W5/A6YCu8/nNBPc/ZVmEyuEEKJ5GqnxjwPmuPuz7v42cDUwOd/B3V929weA/5SQRiGEEAXS\niPCvCryQLc9N6xrFgTvM7CEzO6w7iRNCCFE8Xbp6CmC8u79oZisBt5vZU+7+y/qd0kvhMIDhw4e3\nIFmiKEZMu7l0G8+fvEvpNoSoCo3U+F8EVsuWh6V1DeHuL6b/LwPXEa6jzvab6e5j3X3s4MGDGz29\nEEKIbtKI8D8AjDKzkWY2ANgbmNXIyc1saTNbtvYb2AF4vKeJFUII0Txdunrc/R0zOxK4DWgDLnX3\nJ8xsSto+w8xWBh4ElgPeM7PPA6OBQcB1ZlazdZW731pOVoQQQjRCQz5+d58NzK5bNyP7/RLhAqrn\ndWBMMwkUQghRLOq5K4QQFUPCL4QQFUPCL4QQFUPCL4QQFUPCL4QQFUPCL4QQFUPCL4QQFUPCL4QQ\nFUPCL4QQFUPCL4QQFUPCL4QQFUPCL4QQFUPCL4QQFUPCL4QQFUPCL4QQFUPCL4QQFUPCL4QQFUPC\nL4QQFUPCL4QQFUPCL4QQFUPCL4QQFUPCL4QQFUPCL4QQFUPCL4QQFUPCL4QQFUPCL4QQFUPCL4QQ\nFUPCL4QQFUPCL4QQFUPCL4QQFUPCL4QQFUPCL4QQFaMh4TeznczsaTObY2bTOtm+rpndZ2b/NrMv\ndudYIYQQraVL4TezNuA8YBIwGtjHzEbX7fY3YCpwWg+OFUII0UIaqfGPA+a4+7Pu/jZwNTA538Hd\nX3b3B4D/dPdYIYQQraUR4V8VeCFbnpvWNUIzxwohhCiB/r2dgBpmdhhwGMDw4cN7fJ4R024uKknz\n5fmTd+lztkXrqer9rmq+FyUaqfG/CKyWLQ9L6xqh4WPdfaa7j3X3sYMHD27w9EIIIbpLI8L/ADDK\nzEaa2QBgb2BWg+dv5lghhBAl0KWrx93fMbMjgduANuBSd3/CzKak7TPMbGXgQWA54D0z+zww2t1f\n7+zYsjIjhBCiaxry8bv7bGB23boZ2e+XCDdOQ8cKIYToPdRzVwghKoaEXwghKoaEXwghKoaEXwgh\nKoaEXwghKoaEXwghKoaEXwghKoaEXwghKoaEXwghKoaEXwghKoaEXwghKoaEXwghKoaEXwghKoaE\nXwghKoaEXwghKoaEXwghKoaEXwghKoaEXwghKoaEXwghKoaEXwghKoaEXwghKoaEXwghKoaEXwgh\nKoaEXwghKoaEXwghKoaEXwghKoaEXwghKoaEXwghKoaEXwghKoaEXwghKoaEXwghKoaEXwghKoaE\nXwghKkZDwm9mO5nZ02Y2x8ymdbLdzOzstP1RM9s42/a8mT1mZg+b2YNFJl4IIUT36d/VDmbWBpwH\nTATmAg+Y2Sx3fzLbbRIwKv1tClyQ/teY4O6vFJZqIYQQPaaRGv84YI67P+vubwNXA5Pr9pkMXOHB\n/cAKZja04LQKIYQogEaEf1XghWx5blrX6D4O3GFmD5nZYT1NqBBCiGLo0tVTAOPd/UUzWwm43cye\ncvdf1u+UXgqHAQwfPrwFyVq0GDHt5tJtPH/yLqXb6C5VzXdV6c37vSiVtUZq/C8Cq2XLw9K6hvZx\n99r/l4HrCNfR+3D3me4+1t3HDh48uLHUCyGE6DaNCP8DwCgzG2lmA4C9gVl1+8wCDkzRPZsBr7n7\nn8xsaTNbFsDMlgZ2AB4vMP1CCCG6SZeuHnd/x8yOBG4D2oBL3f0JM5uSts8AZgM7A3OAfwKHpMOH\nANeZWc3WVe5+
a+G5EEII0TAN+fjdfTYh7vm6GdlvBz7XyXHPAmOaTKMQQogCUc9dIYSoGBJ+IYSo\nGBJ+IYSoGBJ+IYSoGBJ+IYSoGBJ+IYSoGBJ+IYSoGBJ+IYSoGBJ+IYSoGBJ+IYSoGBJ+IYSoGBJ+\nIYSoGBJ+IYSoGBJ+IYSoGBJ+IYSoGBJ+IYSoGBJ+IYSoGBJ+IYSoGBJ+IYSoGBJ+IYSoGBJ+IYSo\nGBJ+IYSoGBJ+IYSoGBJ+IYSoGBJ+IYSoGBJ+IYSoGBJ+IYSoGBJ+IYSoGBJ+IYSoGBJ+IYSoGBJ+\nIYSoGBJ+IYSoGA0Jv5ntZGZPm9kcM5vWyXYzs7PT9kfNbONGjxVCCNFauhR+M2sDzgMmAaOBfcxs\ndN1uk4BR6e8w4IJuHCuEEKKFNFLjHwfMcfdn3f1t4Gpgct0+k4ErPLgfWMHMhjZ4rBBCiBbSiPCv\nCryQLc9N6xrZp5FjhRBCtJD+vZ2AGmZ2GOEmAnjTzJ5ukelBwCvdOcBOkW3Zbo192ZbtbrB6ozs2\nIvwvAqtly8PSukb2WayBYwFw95nAzAbSUyhm9qC7j221Xdmunu3eti/b1bK9IBpx9TwAjDKzkWY2\nANgbmFW3zyzgwBTdsxnwmrv/qcFjhRBCtJAua/zu/o6ZHQncBrQBl7r7E2Y2JW2fAcwGdgbmAP8E\nDlnQsaXkRAghREM05ON399mEuOfrZmS/Hfhco8f2MVruXpLtytrubfuyXS3b88VCs4UQQlQFDdkg\nhBAVQ8K/kGNm1ttpEIsmKluBmS1yOrnIZahKmNkQz3x1i/KDamZLpf8tz2NvX1czW8rMVmq1Xe9D\nfuBW3wMzW9HMtjYzc/f3Wmk7T0NZ55bwl4iZfbik85qZbQlca2ZHm9k+0NoH1cw+Z2Z7mNm2LbC1\nB3CimW3XajEys+V6UwDNbEngCOAiM9vXzAa2wOZeZnammQ0xs+XKttdFWraDXnkJbQ/sCdxuZluZ\n2SqtNJ7yfZWZ7WBm6xR+/j70Ul+kMLM1gMuAm4E/uvuVBZ57MXf/j5mNAYYCxwG/dfepabuV/aCY\n2WRgiWT7EuCH7v7nEuwYMIIY5O+LxPV8xN1vL9pWJ7a3B44Bfgs86e4/LNtmnf0VgLfc/d9pxNtp\nwFPAY+7+kxLtDgG+BBgR+TezN8KwzWxV4BTgLSIk/P+5+wsLPqppm4OBdd39V2n5CGBtIkz9h+7+\nWJn2k802d3/XzHYDxgMrEGX+/MJsSPiLx8zWcvc5aaC6rYBtgdXcfbcCzr038E1gjLu/ldYtB/wc\nuM/dj2zWRhf2DyD6a9zv7m5mawMnAY8DV7v7/xRo62PAWsD57v6mma1PjPQ6Eri7ZPFb1t3fMLM1\ngVWAi4gX+dnu/s+y7Gb2lwSOJvq/zHT3v5jZMKK/zBjgnqJfRGZ2LNDP3U9LyxsSNd9PAke4+++K\ntNdFWsa7+z3p94HAcGBT4Kvu/nhJNvsRtfytgLvc/dq0fiywE7AcURafL8N+srUXcCDwcXf/V3Lx\njQa+Adzm7icXYUeunoJJheR3ZrZ16r38U+Ao4A0zu7fZ87v71USt9+dm1j/V/l8nXi7bmNm3mrXR\nBWsC+wPjzGyAuz9DCNRQ4OMF2/oPsCxwaLL1BHA58DtgUzPboGB7wLyH7+rUhvL7VPvbhRDBL5Rh\ns570Uv8t8VX1cTPr7+5zgR+n9WPMrOGxWRrkV8B2ZjY1peEx4CzgB8B3zGxkwfY6xcxGAGeY2eEp\nHVcA5wN3ES6vdcuwm3z5vwAeBEbXfOzu/iDtfZHWT2ksq83hZuB/gMtSzf9ld78bOBYYb2afKMSK\nu+uvoD/av6A+ATxHDJrUL9t+BVFj6Mm52+qWd6jfRtSKrgI2LCFv/bPfRwO71eV5deA3wKFF2SJc\nDdsAE+u2rwqcCRxY4r08nRC8ZbLrO4IYhuSYkstRXmbGAMPqtg8lvj4OKsheW/
b7Q/XXlajpfh04\nJL/nJV+DscDPgE3ydBIuqHOAFQq2VyvHSxMV4oH1eQUOSmlaooT85s/X4sCk+nsE7AN8lXC/9WvK\nXtk3sCp/mTjUCtDobFu/9H8EcCqwTg/P3Q84mah1W71dYADwLWD7kvLWH9isk+21/I0BfgKsWpCt\nPeu25Q/hOKJmtkHR+cyWt89+1/K4NnAPMK6kclS7rzsAx3aW97S8MeHeW6uge9sP2HwB13sycFkZ\neV7A9X9fOSJq3CcDIzu7Lj20W7u3k4ga98r5Nai7Dt8F9i0j30ncj+osben3usSXR9NlXq6eAsga\nY/oBM1Mr/NNpWz9vDwd7ifh037o758/OPYuobfzeU0lI28yDt4FfAh+0mP2sEDL7twIfyc+d8v5e\n2v574FGgx+FvdXndom6zJZvm7r8h3BBL9tRWhxNn99DMTjKz9d39jmzbe+n/M8CPgA8UYbced/fU\ncH468ETd+n4pPf3c/bfADOAfPbWVzpPf2/06SUu/9PsG4C0z+0hP7XWRlvz6X2dmHwT+WEtnzbXi\n4e4zotGd2nPQQ5uLpXO8Z2YTgLOBk9z9JTNbvHb+/DoQk0k91VObnZHdg+uBjeq2vZfl/SnC1blR\n066mIt9cVf4jakw3AWfOZ3vuEjkCGNDAOQ/Kfh9CRBVATHH5JWL+goGdHLdkQXlaOvt9MTA9/V4C\n2AxYLM9b+r0VMKgHttbNfp8CHJd+LwXsUX8d0+8NgOULvoe3AKd1ti37vTMFunuAIcCWWX5/AqxB\nfMFtTbwElu4k/6N6eq/JatOE2+ik9LuNaERdtj7vxBDrSxWV7/lc/58Bp89ve7bfXrXy18Q1P6R2\n/YCvEI2qyxIunXuA4zs5brGirgEwIfv9deCM9PsDKW1r1+eRcLst3axtRfU0gZkdBZzr7p5Crz4O\nHE64dA4gasA/d/dn0/79PN7gS7j7vxZwXiMK2HHuPj2tG5POvQlwN+HnfYV4GdxXQt6mAs+5+41p\n+SzgTeKrZWNCdP4OfNSb7OBiZrsSXzI/TctnEA/334HBxCf4HHef2Iyd+djeLcvjIYRv9ZOp4Xgn\nQny/V3+/zGwZd3+zAPtGlJX7gRfd/R9m9lNCgF4F/kT4u19290Iaz81sEjDC3WtzY58MPAx8ONnd\nDbgQ+HZ+b8sIEzazzwHXeNSytyDE9/PAhoQAPwdclz1DRtT4l/AmoqusPTb+b8SztjZwLeE+vAP4\nXyJUeX8vMFIts78+sI2nEE2LiajWJCqGLwETiCCGY939b0Xbl6unh6QC+I8k+v2AZwgXxy+AzxDC\nvAmwcu2Y2kO0INFPbOnub7v7dDO70Mx+7O6PECP9neruX3D3fYmCMrTwzAXfd/cbzexAM/sA8Yk5\nlhDicwmf70tEDbXHmNnKwGx3/6mZHWpm44ETiBDKN4FT3H0k4WYY0YytTmwPJBoza9wPvG1mDwOf\nJmrdqxNfMR0oSPQHEQ3xVwB/BU42s62IRrzbiZrvMUTt710zW7YAm4u7+y2ES/K4FCFzO7Aj4Tb6\nZvq9CTF71DxKEP3+wPNJ9IcBfyBCdW8lIsdeAzYH5nVg8uC9noq+mQ20iM3/i7s/DRxPVKh+TeT5\nIHc/lWjEfwfo6lntSRo2cPcn3P18Mzs+VSCvAv5C6MfX3H0MEcQwqmj7gFw9PfkDDsh+f4EIgzOi\nlrJztu0mUiREN869OvA6UdOAiDJ4kiwaiPgUvIaI7y46b58GjiQe+hWJB+A4Orp9jHgRXNakre2B\n84iG2gHAaYRb40N1eb0MuLzgfI7Nfp+b7LYR7qO9sm03AJNLuM5thKCfQ3TSWYaI1T4D2Drbb0/g\nEWD3Amx+jBD55dLyd1I5GpHt0y9d7x5Fn3UjLZ8luUyICZruISoVawJbZfv9lOhDUJTdzQlX4ldS\nmZuQyt3XgDXSPnsRIZV7FGU3s79hup+7pu
XdCd/+gdk+KxIRZZeUdv3LvLmL4l8SvcdqDwbRuPj9\nVEBrrfOrEH7a73f33On/hwk30djMxv/UhB7YDjghO66p0K66NGyVHvxD0/Jw4gX2zfRgrphE8sr6\ndPfA1jpEH4eTk/CtRrxkTiNF1KTflzdrq87uukQj9B5pea20fHy2zzLpgby4xLI0gmjv+XpaHkL0\nzj2daEcw4qX00SLyTrTNfDuVzf7E1+iXUj43SvucR1ahKOJ6d5KOxYi4/KtT2R6Y8n0NKeKNqAD9\nOC9nBdrfmugFPjgtb5uu+VdSeR9LCpcuKf87Ee7akSn/OwDXAZ9L2z8PnFXqPSirUC+Kf5kwtxGf\nZLVY9qUJ8b8hbRtLfK7VjmtYmGlvwBpaZ3NJoub/g872LzifmwEfy5ZXA24EphOupdWLsk/U8DbL\nlocRDV2npgd0iaJs1dndLj18NcEbQfQ+Pj4tb0NJL9e6dKwKrJQtr5TE+PyUhlp5KOThJ8Q/D1Md\nQnQOup7wc69Sdp7TuZdM9/gTaXloJv6jkihOKzIt+TUElqnbtnW65sdTF1ZaUv5HZb8XByYm8T+i\nLp2l3AM17naTLOysQ0OXmS1NfLaPdPcJ2fo8nLMn9voRN/+dZOMb7v7lZvLQTfv9k+3hRA3sZHe/\nPm0rtLEva/xejfgSeNjdryrDVjrnSHd/rnbu1BP2JuAmd/9qfbqKtD2f9NTSMYRwuV3v7k+2wO4Q\nYga9v7n7mXlaSrbb393fyZZXJhp0JxFfY6+m9aVd/zyfKaTzJS+hMXcB9ucFfBA1/zXd/Yz6tBVu\nV8LffeoKy7xCmRrf9nH3hqdba/TmdvKQlP5gZrZqhfMDXkKEwXxsttJWP+JZeNdibJ5Dc+FvJZn4\nL+bu/2mh3eU8hv5oOXXP0KrAxp4irVpkv2XPUhfp6PCMl2qrD+S3T9PTQtGd42pfEd1PXeuoe9m1\n8qXTSlsd7kNRtvPw3b5+7fqKCEJzaTGz5YmAhD8WnKyepKXPXNMaCudcAOkN7NaDXrBd3Wgzm2Zm\nV6Z9303hoS3DzCalcLqGyPPTAzHZNYVpdpuiHhhrYBal+pdvQaK/InBUciMUlp8ubH7KzD6Yym63\nylUJ7rSV0/9ul+8mRH8JYjypPZPbsLbe6vYr7Zkzs4Mt+ii05J53Fwn/fDCzTwI3pLjnDsJcUIE5\nHRhuZufDvO7x9QWzYWHuDhbjvB8HHGsdh18o/EFItsYTI4duuoD9yrD9TUujlbbCRz8fFiP6Okww\ns3H1Gy0NG1AU6XoPBKaa2Yb5V1rdfqVXNCyGdzjXzNadn/iVkY70dXUfESiwrbUPzVC7FiPy5aJJ\n9v4MfMjMPrWA/XpNfyX882c2EUL5/eQCyAvJ4tCzG1cT2uS//QjRUzDfvrqZ3ZT8nu8U/WCk8/6d\niA/flmw8nPTyWc3M1kv7NmU7Xbe/E43eRt24Oma2qpntmfzZZTyEFwO7m9m0zGZtvJ+2fLloLGav\nGuruLxNjCr0EzK3bZyJwsJkNKMpuut4/JOLi18rWu5mtaTEOUKtqoU8T0W9ftuisNo/0girjC6M2\nttDtRKTdL/K2EjPbHDi8VsbLINm7m7gPr9albzUzOyG5f3qrMiLhrycrOG8SoV1X1vl9BwE/TbWp\nbt046zgQ1X7u/o67X5q21QZa+z+iA9c5KR1FRs3MG1At+T739jTTUMb+REeipmxneW0jYsZP9hhX\nPGczIoyu0GntLOjn7n8g4uHn9bJNAjgcOKv2Qi/h5Tqa6P7/XTM7PInxhe7+xzpbQ4gOY00PqGdm\nG5nZNmlxV2KiluvqdlsD+JTF8B+lkT1Dc4n+LRe7+yvZ9uWB0y2m1CzSrqXyPdrMTiWGufiDdXTV\n/pn4Alur87M0ZX+eHXd/y93v7OQeLE6E8Baa9+4i4c/IhNHMbJV0825J22qF+RViONyDLWZJahhv\nH4VvNt
GDMN/m2Wf/NODR7p5/QVjHkRiPTbXNAdn22giAJwFtZnZ0E7Yss3UN0RM2j0iqlbubUhp2\n7KmtTmy3pRfoe2Y20N1fcPdza+kCSC+EkURnpaJfrksSna7OIobYWCfZeLdmKxOIq4hp9SZ0cqru\n2OxHDC1wrJk9TfQLeKaTXX8DPERBI5rOJy3zKhcWUW5/cfd707baS+9t4hkabQW6M9O13YHoB7IL\ncEx6jt/Nnt9niXj5tYuyCx3LvJnNTLX6PfLt6edzRG/4poY6aZZSfMgLI3XCeBPwkpktQ4y//WBd\n7f5nRA/XhiNxajX6dL5n3H2qxZSJ2wLPuvvj2SfpS0RHrbeaz1mHmpARsfgDgI8Ct5jZjcl2HkJ4\nENFztUdkQvoD4tqdmtKxjrs/7e1DHP/bYrq/FZrKYKLuHl4NDLZoQL/G3V+riW4S4b2BKVbQYGsZ\n7xIP92OE6K9n4eteHljR3S9Jaaz1jzikmU/+7MvqBWKAtTbg/zrb191fM7O7UvoKxzoOT34XMcDc\nEimPfycqmu+6+1tm9jghvoV9bVnMSzyTGF56DNEp7xgz+667v5zS5e5+V0pfYWRl/gKik9xzwEkW\nYbKX52XPzH5CVunqFbzkHmoLyx9RKI3wC08lhiZ4jY4zXeU96pbtoZ0pRMPqVUSN8xGidvi+4ZVL\nyOOZwHfT7zHJ7tcoaDKTuuszkJjUYidixMULgBeJDmjzrnkJeTSiB+Y3iDaUO0ljD9Xt0w9YvKTr\nfArh4x1ICNH16Z6/RMdhIayz392wk0+iMobo8Tqe+Mo6KttvWE/y0UT+v0BUcAalMvb/SDNmkQ0z\nTF3v2WbLHTHuzSXZ+m2JsYn+izQ8Qwl5/XD2+zRgRra8DTF448HN3Ocy/irv6jGzE81sc49a1xKE\nOD0CXAl8y91/ZtEI2aEB0t3f6KadH6VPv1nEXLK/I8YG2Z4YO2alYnLUwWYeiTSciHLYxWIi8UeI\nxq9VgAMsRkdsxta8BvBU8/4rMdXkhUTN7kqiW/oIi3A7vKDGLYvhlGt8lvgau8Dd7yTaaSYC+1j0\nUMWD99z930XYr8fdv0Lc358Qcxjs7u7fJr6kBlpqzK0rTz2Jt88nUTnc3f+XcOdcQkyYc4yZXUgM\nT1EaZvZRS2GTZnY6IcA3ufsr7n40cC9wm0WnvHkNrd7kl1ZWvms69iQRSbNLOv/dRM17ZZI7sa6N\npSnSl9wa2ap+wMZmtln6ovsFUR7PsRh1tdv3uTR6+83T23/EOCUPE70FIUYsfIaO0979jJj1vjvn\n7Ve3vDcwB9gxWzcA+G/KGWUzn0e1NhrjIEIULs7WjaPJuWvpWPM8n6jlTUrrlk3/+wOXAucUnM9l\ngSnZ8haEEJ5BmqSFaE+5l7q5ewtMQ63GOY4Q983S8jeIqK2BxJSBc6mbS7WH9kZmvy8iGs5r5Wlz\nYtybLQjlFHUVAAAP2klEQVRf9oVl5DmzP5z2we5WIMZeuoUYWjufzOUysrGPCrS/HTHmz2QidPaQ\ndE0OI+aN+A0xCGCh5a52vdP/KbSPOfQd4mt+XdrHWVqvzHvQo7T3dgJ6LeMdhfG89ICuSURZ3J0K\n0E7ENHs9GqGRcCmsQsdP0WdpH21xGnBevn9BectnKrqdiDC5Kq0bSkQMXUrd7FXN2Cd8yzcRn/nH\nEqOLTk3CsDjhfris6Lxm55tO+PIhxlU/K4lP7QXX1Ny0DdjfjfhSvIT4urk0CdF0IqzxUGCLAuxs\nTxrFMS1/GziYcOPV3Gkn5uWgjOvdSboOIRq1h6bn6I5UDgqbIa0Tm1sSlbbpxDSVhxIur52IxuOb\n0vL2RHRRUTNn5dqxRO05pr2yczLRxrQBlD/gWo/y0NsJ6NXMt0+1d2oSyN8Sn26bEJOWX0y4e7p1\n4zKhn0bUutbI1n2GaHzbkWz6xTIKRRKFU4la8T3ArLR+KNHwemiT59+L
NIUf0Vg8LYn8HYRv+wbi\nRbAaMKbIvPL+ibmHpGtdG7p6U2IO1TPoOMJnGcPstqX8bp2WhxPDWNdG+vwe2Rj7TdhZJiu304ka\n7eaEO+34dA3WIqLGhped57rlMamsfZtwrayVnqnpZNNDFpUW4uVyI+017S1TeTuciFrqT3wBTSRe\nChsVme90D8aRavPpuc7FfwbZC7qv/fV6Ano183GzfpR+r5AKzQOk2iHdfFvXPwxp3bnEZ+7ItDwi\nFdgzsn3KeDCPTw9ePqnFL2kX/6ZqYsTL5E6iQWtQWjeAaMT8Wlr+DuHr/kiRea17+PYlXjqLE+0k\n1xP+fYg+Agc0a6+LtGwKjCbcS0fX0keEE15et28zX1ST0/1bKbu/N5C5EZLYXVbLf4l5zq//14mR\nRDdKgn8q8dJbmXB3fKVg27UK1PapfF9DR5fencRE7P2JL65jSGP8F5iGWoXxFMKFeEKydWR63j9W\npL0y/irVuNtJw84/gVrI5BvE5+AbwM8tm+avFg7Zxbnzzlknmtn+ZraWux+Zznliatw9C7jZY0o9\noLAxYeo7Ad1HdAQbbzHcMO6+NbCOmR3j7q/V8tYDW4t5NG5/kpg05kQzW9Hd3yYegI3SrisBV3s0\nspLS0FRe60I27yJCJo8j/Lr/Il7mq1hMV/lLd78yHVfGkBDrEffzZeILcVcz+7hHuOjfgaFmNjjr\nP9DjvLv7DYTQXZwax2cQX1anWvtQGKcD/3b3w1P6SumVnF3/G4gX7lDixfcq4eJYnHgh/MndTyki\nLdnxK6WG0zuI4Ig/ESGby3nMPX088CuPzpH/Ac70mGKxSE4nrv230vKfk62LiK/5eVNklnUPmqa3\n3zy98UeM9b4LcYN+R5oBKW07E/hsD8/bRojRt4mpCWcBO6VtxxIF5sJs/6I+e/Ma2EHEZ/dSxIQw\nPyC+ZFYvyFat/WBAOvcMQvhmEtMk1mbseojiZ86qNRT3IyasmJ6Wf53fM9LkIiWXoTGE6BxRSxvt\nbTgziIb8XQqwcwDtXzBtwLbZtkHEOPqzgA+ShSxSjutww+z37oTwLkEEPxydbVuf6BVelN1amduZ\naKw9g5iLGeKL60zCr75cSffa6paPIlxIt5K+aIhQ2rFlXPdS8tTbCWhJJjs2xgwmagVnEzHPI4hG\nuSuIRt08DrgR985mtE/hdgzwxfT7PqKR73KyGY+6c+5u5rEW1ncR0cB4dsrrxkmgv0jWV6AnQpwJ\nbxtRy746La9E8q8T/R8GAJs0Y6sT2x8l5jauuTompof9vuyaDwP2rTuuMDca7W6G2ov2LuDJun1G\npjKxURH2k7ifQHsjdb0IDSL6Yhxbn86Cy9cuqWzVfNgTiXDgO2h3cS1LehkUkRayfhZETP5jhFvt\nq8TE5JelbVsS0WSjemprAWnozH17CPAe8KVs3XXAF8q8B4Xmq7cT0LKMRoRNTaCHJZE+i6gxLENE\nAnyiOzcuHXsT4WdeMj2Ey6QHovYw3EN0ZNqy6EJBzNZT+30KMWVfGzGF4A+JxukPEKF9hzdpq154\n90/Xb8m0PAT4KxHNskx2XJHCe3y63ksSnWN+RcfG9+vJoqSKLj/p/7h0rRdPyz8H7iqx3K5I9AeY\nUp+WbLmQjlBdpGMVovH+NCKEckAq27/O9rmGbK7YJu2tQIRFjk/LE4komR2JDmHrEbX/S9P2MqOH\n+gG3EaHXRwPLAV8m+g18KpXJ75d9DwrNU28noNTMwQey34cC/0vqxUi4JL5L+Ckn1R3XnTly9yZq\nQvsTNZ6a73OLtP1HZD33CszbpangT0jLGxPDAtxK1O7XJkLdrs0fimaEuE54JxL+3A9lIvgDsp6p\nBeWz3tWxTbZtKvGldg3xcr28SNudpGX7dG/fIl6sS6T1s4H7S7Q7BniB7GuGqMjUvwDKqOnnIaGD\nkvifRVQmVkoifG0SxkLcmLS7dr5G
1KTHpuUl0v3eNS2fSoTKbthTWwtIQ+4lOIL4upyUNOO76Vnf\nK207prPr1Zf/ej0BpWUsWth3rFv3HcIfPDwtjyPcO/t289xr1C3/N/EZul8S/s8Q3fPvIsXPp/2K\nrP1+nfCjf4v2doQVgB9m+/wY2L9JOwvyMU9L4n8hEU1xdtF5pWtXx8ZE7X/XbF0hD1+d6G0EPEXU\nNEcSL8Dzae/E83Ng0xLL8/ZEDfPg+aWxBJtDa9eceKlvSTTkfpXws29KRM98iI5ftD1OE9HZ7Uo6\nuk9vrF1bov1kCvEFehPluHfyfjCfTPd5w7RuM+CkJP6rdHbcwvDX6wkoJVNRKGo36gDgM9m2byaR\nHpMEe3o3z70D4Vcel5anEGOOH03UwvdLD8rGwOTsuKI7LH2EaJg+Mf1NSnYfTw/pbOCiZu03ILzj\nic5L83VFNJnPTl0d87NRlG3CRfbfwMpp+cNEzXZAlq5niMH0WlWuxxNDEEwFdi7Z1u5JgGtx6pcB\nI9LvlQlf/llkL9yirj9Ru74xXeMViErcjUR46HjClfprMtdsgfnun5Wx2cSX3e/JOnESL7xz6WaF\nsS/99XoCSrhxA4iYZyOiHU4k3B93ZA/xN4hPyAuy4xrx6S+Tzvt5ojb9PaKmV3Mf7Zcels/UClBa\nX1QN9CCyjkDp4TuPaGg9k3jZLJ7SN7U7eVuAzfkJb6d5Kiqvdefs1NXRgrJ0ElGrHEwEAVxJxIrX\nOlJNIToH/VcrynayOYpwL5xGCS7EZGMpwhV6ItHLe32iXWOVbJ+ViOi1fQqyuRKpfSCVr6NJbVhE\n5WMq0YYzOq2rBRoUWcnIhxeZTvLbE18hD5IGOEzr1izKbm/89XoCCs1Mxw5XJ6SC2ZZE/0GillAT\n/3wckUaidyYTNfuBaflY4BVg99w20dgzpdm8dGJ/DyKSYA7h3lmHaOjan5jY4UtEJM/EuuOK6CXb\nK8Jbl4aWuDo6EaCpJNce0Xbyo7TuU4SbcDJRG+3RaK1NprXw8MWUn7uJF/4qSQAvAh4lwpH3IyJ8\nDqbgES8Jl8r6nd3XJP5fJGrhKxRd/siCF9L5LyBe6h9M24cQPfsvqjuuT0fvzO+vJlYLPbXxzbPl\nQUQj3GXAne4+w8yeIGZj2tFjfPB8nPxGbEwnPvn3ImpF+xFhZqd5dB4phdqY8WZ2JCH2ixG18H2J\n8LbPEIX2y8AT7n5VCWnYnnixnOrul2Xr+3mLppCzmLD9SsK/PMfdZ5dk53wiOuiJ+vyZ2b7EEBwf\nIob7XY5wH+7k7v8sIz2txsy+QeRvTyKS5uNEY/7LRCeyQcSL7pi0f8PP0HzsGVFBO4HoDHVOZ+c1\ns8HEeDudzjfQLGZ2PBGL/wmiHWd/4v5e6u6PmtlQot/G8WXYbym9/eYp6G29dPrfnwg127z2Nib8\ngdun5XOAb3bz3I12oBlfUt5qXfUHpfx9mqhhbk1EVlxPasClxJC2dP6W+ZgXkIbSXB2pvPQnvhSP\nqltfPzZNGzHFYWHjwPTmXxflfAPCrXgOdS4OinW1rEf0fD2wjPM3mO8J2bYxRCTbOWTj7rciXWX/\nLfQ1/lQDW5FoiLsO+BsxyNolRIjZBCK0cSMi5vjodFxDtZT05XAk8D13f72TWsggIlT0bXf/XqGZ\na7eRf2ksT9TEdiQ6jDxjZku7+z+y/ZuqgXWRllFE7W8N4HHPav+tJnXTf72E865HtAsd7+5XpHXv\nu6ZmtiXwqrs/WXQaWk0D5XwMIZKP1q5JSenYmnD5nOJpuI20vpQvywbyvRHt+b5yfudZ2FgUhP+z\nhLvDCb/g1PTgHgr8gXCJrEHUys5Px3THvbMi0SP1Tnef0dnxVvz0fZjZAURfgMPTODxbeZqsPBXW\nfQgBPtXd7ynSdjfSWIrw9gUWIEClvVR7kwbL+Wru/kIL0jKBiNf/DjGu1R9KtNVn8t1KFlrht/a5\n
UzGz/Qk/5FLAnunNPZoYQW8Xd388O67bNYdU27mJGJfjqrTufQNvFSkKfeFLo+q0UoD6Ar1RzheQ\nlg2IMXH+QUzYflKJtvpMvlvFQin81j4SZi7+2xGRBncTQy3/w8xuJ4YIfqAAmy1t3OytLw3RkVYK\nUF+gLzTiZzaXJPzuHwZ+4+5vdXFIM7b6TL5bwUIn/LUbkYaFvYnoRDOKaInfg/B9DwP+TORvnwJt\ntySqJLNXuZpIX6SVAtQXaHU57ytUKd8LnfDDPPH7MXA/caNeIjo23WNmHyWiPn7i7pfU9i/QBdPS\nxs2q1URE36AvNeK3kqrke6ERfjNbBXjdI559cWIMjx8Q0Tu3uvv3zGyYu881s01q7p2SI1xa0rhZ\npZqI6Hssyo34C2JRzvdCIfxmdiYxFMFtxKQPyxIds9YiQu6uTvtdQ3S8uSstLzIukKrURIQQ5dPn\nhd/MLiWGAT4W+EftDWxm2xBD8e5GxO5/DXjL3Q/upaS2jEW5JiKEKJ8+LfxmdjDwSXffOVu3ChHm\neDawITGt4BvAK+7+hbTPIlPTF0KIounf2wnoguWJgdFqk4kPJEbDfILwd08n4vfnRbmo0VMIIRZM\nv95OQGeYWW2W+peBjcxs+RSv/y/gRHf/GDFa4BbQQfRNoi+EEAumzwl/Cse8xcyGECNOGrBVEv/X\nvX3kybHAYrlLR+4dIYTomj4n/O4+i/b5O+cS88oeDOxuZhub2ZJmdi3wkrtf3ItJFUKIhZI+07jb\nyaBk22RhmYcQ067tRUx7+Ja7fyptU0OuEEJ0g74k/AsclCztMxR4093fSMtqyBVCiG7Sl1w97xJz\ne+4L4a+vjUtTw93/lIm+GnKFEKIH9JkaP8x/UDK5coQQojj6Uo0fd38EOAQ4LnXe6hCf34tJE0KI\nRYY+VeOvoUHJhBCiPPqk8IMGJRNCiLLos8Kfo0HJhBCiOBYK4RdCCFEcajAVQoiKIeEXQoiKIeEX\nQoiKIeEXQoiKIeEXQoiKIeEXQoiK8f8Bx6TOFtAZBDgAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "def m_step(aligns, data):\n", " \"\"\"\n", " Perform the M-step for IBM Model 2.\n", " Params:\n", " aligns: a list of alignments, one for each sentence in the training data. An alignment is represented\n", " as a list of vectors, one vector per source, that represents the distribution over aligned target tokens.\n", " data* a list of (target, source) sentence pairs.\n", " \n", " Return:\n", " an IBMModel2 object with transition and distortion tables estimated via soft counting on the \n", " assignments. 
\n", " \"\"\"\n", " alpha = defaultdict(float)\n", " alpha_norm = defaultdict(float)\n", " beta = defaultdict(float)\n", " beta_norm = defaultdict(float)\n", " for pi, (t,s) in zip(aligns, data):\n", " for ti in range(0, len(t)):\n", " for si in range(0, len(s)):\n", " prob = pi[si][ti]\n", " alpha[s[si], t[ti]] += prob\n", " alpha_norm[t[ti]] += prob\n", " beta[ti,si,len(t),len(s)] += prob\n", " beta_norm[si,len(t),len(s)] += prob\n", " for key in alpha.keys():\n", " alpha[key] /= alpha_norm[key[1]]\n", " for key in beta.keys():\n", " beta[key] /= beta_norm[key[1:]]\n", " return IBMModel2(alpha,beta)\n", "theta1 = m_step(align_matrices, train_model_2) \n", "plot_table_for_target(theta1.alpha, \"is\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that after a single M-step the algorithm has already figured out that \"is\" is most likely translated to \"ist\". This is because \"ist\" is (softly) aligned with \"is\" in every sentence, whereas the other German words only appear in a subset of the sentences.\n", "\n", "### Initialization (IBM Model 1)\n", "We could already call `e_step` and `m_step` iteratively until convergence. However, a crucial question is how to initialize the model parameters for the first call to `e_step`. So far we have used a uniform initialization, but since the results of EM usually depend significantly on the initialization, a more informed starting point can be useful. \n", "\n", "A common way to initialize EM for IBM Model 2 training is to first train the so-called IBM Model 1 using EM. This model is simply an instance of Model 2 with a specific and **fixed** alignment parameter set $\\bbeta$. Instead of being estimated, $\\bbeta$ is set to assign uniform probability to all target positions for a given target length: \n", "\n", "$$\n", " \\beta(a_i|i,l_\\starget,l_\\ssource) = \\frac{1}{l_\\starget + 1}\n", "$$\n", "\n", "Here the $+1$ accounts for the possibility of aligning to the NULL token. After training, the parameters $\\params$ of Model 1 can be used to initialize EM for Model 2. 
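To make the fixed alignment table concrete, here is a minimal sketch in plain Python. The helper `uniform_beta` is hypothetical (it is not part of the notebook). It assumes that every target sentence already starts with the NULL token, so that `len(t)` equals l_t + 1, and it uses the same key layout `beta[ti, si, len(t), len(s)]` as `m_step` above.

```python
from collections import defaultdict


def uniform_beta(data):
    # Hypothetical helper: the fixed IBM Model 1 alignment table, keyed like
    # the beta table in m_step, i.e. beta[ti, si, len(t), len(s)].
    # Assumes each target sentence t already starts with a NULL token, so
    # len(t) = l_t + 1 and each target position gets probability 1 / len(t).
    beta = defaultdict(float)
    for t, s in data:
        for si in range(len(s)):
            for ti in range(len(t)):
                beta[ti, si, len(t), len(s)] = 1.0 / len(t)
    return beta


# Toy (target, source) pair, made up for illustration.
pairs = [(['NULL', 'the', 'house'], ['das', 'Haus'])]
beta = uniform_beta(pairs)
# For each source position, the 3 target positions share probability 1.
```

A table like this could serve as the fixed `beta` argument when constructing an `IBMModel2` for Model 1 training. Because it is a `defaultdict`, lookups for length combinations that never occur in the data return 0.0.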
\n", "\n", "Training Model 1 using EM could suffer from the same initialization problem. Fortunately, it can be shown that with $\\bbeta$ fixed in this way, EM will, under mild conditions, converge to a global optimum of the likelihood, which makes IBM Model 1 robust to the choice of initialization.\n", "\n", "Let us train IBM Model 1 now. This amounts to reusing our `e_step` and `m_step` functions, initializing $\\bbeta$ as above and keeping it fixed, i.e. discarding the $\\bbeta$ that `m_step` re-estimates. To measure the progress of the algorithm, we use the average absolute change of the alignment vectors between consecutive iterations." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "def measure_change(alignments1, alignments2):\n", "    \"\"\"\n", "    Measure the difference between two lists of alignments.\n", "    Params:\n", "        alignments1: a list of alignments (lists of alignment vectors)\n", "        alignments2: another list of alignments, for the same sentences\n", "    Returns:\n", "        The average absolute change per alignment probability between both lists of alignments.\n", "    \"\"\"\n", "    total_change = 0.0\n", "    norm = 0.0\n", "    for a1, a2 in zip(alignments1, alignments2):\n", "        for v1, v2 in zip(a1, a2):\n", "            for p1, p2 in zip(v1, v2):\n", "                total_change += abs(p1 - p2)\n", "                norm += 1.0\n", "    return total_change / norm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "EM for IBM Model 1 amounts to the following algorithm. Notice how, after each M-step, we reset the $\\bbeta$ part of the model to its fixed uniform value, since our `m_step` implementation re-estimates it. 
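The following self-contained toy sketch mirrors this procedure without relying on the notebook helpers. The function `model1_em` and the three-sentence corpus are hypothetical. Because the alignment distribution is uniform and fixed, it cancels when the E-step normalises the posterior over target positions, so only the translation table `alpha` is re-estimated. The sketch also records the data log-likelihood (ignoring the sentence-length term), which EM never decreases.

```python
import math
import random
from collections import defaultdict


def model1_em(data, iterations, seed=0):
    # data: list of (target, source) token lists; targets include a NULL token.
    rng = random.Random(seed)
    src_vocab = sorted({sw for _, s in data for sw in s})
    tgt_vocab = sorted({tw for t, _ in data for tw in t})
    # Random (non-uniform) initialisation of alpha(sw | tw), normalised per tw.
    alpha = {}
    for tw in tgt_vocab:
        weights = {sw: rng.random() + 0.1 for sw in src_vocab}
        z = sum(weights.values())
        for sw, w in weights.items():
            alpha[sw, tw] = w / z
    log_liks = []
    for _ in range(iterations):
        counts = defaultdict(float)
        totals = defaultdict(float)
        log_lik = 0.0
        for t, s in data:
            for sw in s:
                # E-step: posterior over target positions; the uniform
                # beta = 1 / len(t) cancels in this normalisation.
                scores = [alpha[sw, tw] for tw in t]
                z = sum(scores)
                log_lik += math.log(z / len(t))
                for tw, score in zip(t, scores):
                    counts[sw, tw] += score / z
                    totals[tw] += score / z
        log_liks.append(log_lik)
        # M-step: normalise the soft counts per target word.
        alpha = {key: c / totals[key[1]] for key, c in counts.items()}
    return alpha, log_liks


corpus = [(['NULL', 'the', 'house'], ['das', 'Haus']),
          (['NULL', 'the', 'book'], ['das', 'Buch']),
          (['NULL', 'a', 'book'], ['ein', 'Buch'])]
alpha, log_liks = model1_em(corpus, 30, seed=1)
```

Runs with different `seed` values start from different random translation tables. Per the convergence result above, their log-likelihoods should approach the same value as the number of iterations grows.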
" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "run_control": { "frozen": false, "read_only": false } }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAD8CAYAAAB3u9PLAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHaRJREFUeJzt3XuUHOV95vHv0z09gMRFt0HIuiCBZVj5gmxPZMV2HNss\nNiJeD87ZZcVZsOLgI3NiwCTmbOTNZpc9OcmyHC6xzwG0si0Hso4JiSHMOoqJzBp7HYMzAyGgiwUj\nISKNdRkQRlhYzO23f3QNW2p199Tcpa7nczxnqt56q+p9LdGP6lfV3YoIzMzMClM9ADMzOzE4EMzM\nDHAgmJlZwoFgZmaAA8HMzBIOBDMzAxwIZmaWcCCYmRngQDAzs0TTVA9gJObMmROLFy+e6mGYmZ1U\nnnzyyZciomW4fidVICxevJjOzs6pHoaZ2UlF0otZ+rlkZGZmgAPBzMwSDgQzMwMcCGZmlnAgmJkZ\nkDEQJF0qaYekLknrqmz/D5KekfSspB9Lumi4fSXNkrRZ0vPJ75njMyUzMxuNYQNBUhG4C1gFLAOu\nlLSsotsLwK9HxDuBPwI2ZNh3HfBoRCwFHk3WzcxsimS5QlgBdEXErojoBe4H2tIdIuLHEfFKsvoE\nsCDDvm3AvcnyvcDlo59GfY9uP8Ddj3VN1OHNzBpClkCYD+xJre9N2mq5Bvi7DPvOjYh9yfJ+YG61\ng0laK6lTUmdPT0+G4R7vB8/18NUf7hrVvmZmeTGuN5UlfYRyIPz+SPaLiACixrYNEdEaEa0tLcO+\n87qqUrFA30DVw5uZWSJLIHQDC1PrC5K2Y0h6F/A1oC0iXs6w7wFJ85J95wEHRzb07JqKondgcKIO\nb2bWELIEQgewVNISSc3AaqA93UHSIuBB4OqIeC7jvu3AmmR5DfDw6KdRX3OxQJ8DwcysrmE/3C4i\n+iVdBzwCFIGNEbFV0rXJ9vXAfwFmA3dLAuhPyjxV900OfQvwgKRrgBeBK8Z5bm8qFQtEwMBgUCxo\nok5jZnZSy/RppxGxCdhU0bY+tfxZ4LNZ903aXwYuHslgR6upWA6BvoFBioXiZJzSzOykk4t3KjcX\ny9P0fQQzs9pyEQilJBD6/aSRmVlNuQiEdMnIzMyqy0UgDF0h9PY7EMzMaslFIAzdQ+gfdMnIzKyW\nXASCS0ZmZsPLRSC4ZGRmNrxcBIJLRmZmw8tFILhkZGY2vFwEwlDJqM8lIzOzmvIVCC4ZmZnVlJNA\nSEpGvkIwM6spJ4GQXCH4HoKZWU35CgSXjMzMaspJILhkZGY2nEyBIOlSSTskdUlaV2X7hZIel/SG\npJtS7RdIejr1c1jSjcm2myV1p7ZdNn7TOpZLRmZmwxv2C3IkFYG7gEuAvUCHpPaI2Jbqdgi4Abg8\nvW9E7ACWp47TDTyU6nJnRNw2phlk4JKRmdnwslwhrAC6ImJXRPQC9wNt6Q4RcTAiOoC+Ose5GNgZ\nES+OerSj1Oz3IZiZDStLIMwH9qTW9yZtI7Ua+FZF2/WSnpG0UdLMURwzE79T2cxseJNyU1lSM/BJ\n4K9SzfcA51EuKe0Dbq+x71pJnZI6e3p6RnX+kj/LyMxsWFkCoRtYmFpfkLSNxCrgqYg4MNQQEQci\nYiAiBoGvUi5NHSciNkREa0S0trS0jPC0ZUNPGfnTTs3MassSCB3AUklLkn/prwbaR3ieK6koF0ma\nl1r9FLBl
hMfMTBJNBblkZGZWx7BPGUVEv6TrgEeAIrAxIrZKujbZvl7SOUAncCYwmDxauiwiDkua\nTvkJpc9VHPpWScuBAHZX2T6uSsWCS0ZmZnUMGwgAEbEJ2FTRtj61vJ9yKanavkeA2VXarx7RSMeo\nVJRLRmZmdeTincpQvkJwycjMrDYHgpmZAXkKhCbRP+B7CGZmteQnEAoFen2FYGZWU34CwSUjM7O6\n8hMILhmZmdWVm0BocsnIzKyu3ARCs0tGZmZ15SYQXDIyM6svN4HQVPAVgplZPbkJhFKxQK+vEMzM\naspNIDQ3iX5fIZiZ1ZSbQHDJyMysvtwEQvmNaS4ZmZnVkptAaG7yF+SYmdWTm0BwycjMrL5MgSDp\nUkk7JHVJWldl+4WSHpf0hqSbKrbtlvSspKcldabaZ0naLOn55PfMsU+nNpeMzMzqGzYQJBWBu4BV\nwDLgSknLKrodAm4AbqtxmI9ExPKIaE21rQMejYilwKPJ+oQpuWRkZlZXliuEFUBXROyKiF7gfqAt\n3SEiDkZEB9A3gnO3Afcmy/cCl49g3xEruWRkZlZXlkCYD+xJre9N2rIK4HuSnpS0NtU+NyL2Jcv7\ngbkjOOaIlYoFBgMGBl02MjOrpmkSzvHBiOiWdDawWdJPI+KH6Q4REZKqvlInIbIWYNGiRaMeRKlJ\nAPQNDFIsFEd9HDOzRpXlCqEbWJhaX5C0ZRIR3cnvg8BDlEtQAAckzQNIfh+ssf+GiGiNiNaWlpas\npz1OqVCeqstGZmbVZQmEDmCppCWSmoHVQHuWg0uaLumMoWXgY8CWZHM7sCZZXgM8PJKBj1SpOHSF\n4JKRmVk1w5aMIqJf0nXAI0AR2BgRWyVdm2xfL+kcoBM4ExiUdCPlJ5LmAA9JGjrXX0TEd5ND3wI8\nIOka4EXgivGd2rFKTeXs8+cZmZlVl+keQkRsAjZVtK1PLe+nXEqqdBi4qMYxXwYuzjzSMSoVy4Hg\nb00zM6suN+9UdsnIzKy+HAWCS0ZmZvXkLhBcMjIzqy5HgeCSkZlZPTkKBL8PwcysHgeCmZkBuQoE\nl4zMzOrJUSAkVwj9vkIwM6smd4HQP+hAMDOrJkeBUC4Z9bpkZGZWVY4CwSUjM7N6chcILhmZmVWX\nm0BocsnIzKyu3ARCs0tGZmZ15SYQXDIyM6svN4HQ5DemmZnVlSkQJF0qaYekLknrqmy/UNLjkt6Q\ndFOqfaGk70vaJmmrpC+ktt0sqVvS08nPZeMzpeqGvlO51yUjM7Oqhv3GNElF4C7gEmAv0CGpPSK2\npbodAm4ALq/YvR/4YkQ8lXy38pOSNqf2vTMibhvzLDIoFERTQS4ZmZnVkOUKYQXQFRG7IqIXuB9o\nS3eIiIMR0QH0VbTvi4inkuXXgO3A/HEZ+Sg0FeWSkZlZDVkCYT6wJ7W+l1G8qEtaDLwb+Emq+XpJ\nz0jaKGlmjf3WSuqU1NnT0zPS0x6jVCy4ZGRmVsOk3FSWdDrwbeDGiDicNN8DnAcsB/YBt1fbNyI2\nRERrRLS2tLSMaRzNxYJLRmZmNWQJhG5gYWp9QdKWiaQS5TD4ZkQ8ONQeEQciYiAiBoGvUi5NTaim\noujrd8nIzKyaLIHQASyVtERSM7AaaM9ycEkCvg5sj4g7KrbNS61+CtiSbcijVyoW/AU5ZmY1DPuU\nUUT0S7oOeAQoAhsjYquka5Pt6yWdA3QCZwKDkm4ElgHvAq4GnpX0dHLI/xQRm4BbJS0HAtgNfG58\np3a85mKBvkFfIZiZVTNsIAAkL+CbKtrWp5b3Uy4lVfoRoBrHvDr7MMdHuWTkKwQzs2py805lcMnI\nzKye/AWCS0ZmZlXlLBBcMjIzqyVngeCSkZlZLQ4EMzMDchkIvodgZlZNzgJBvkIwM6shZ4HgkpGZ\nWS05DASXjMzMqslZILhkZGZWS84CwSUjM7NachcI/S4ZmZlVlbNAEL2+Qj
AzqypngeCSkZlZLbkL\nhMGAAX/AnZnZcTIFgqRLJe2Q1CVpXZXtF0p6XNIbkm7Ksq+kWZI2S3o++T1z7NOpr6lY/moGXyWY\nmR1v2ECQVATuAlZR/ha0KyUtq+h2CLgBuG0E+64DHo2IpcCjyfqEai6Wp+tAMDM7XpYrhBVAV0Ts\niohe4H6gLd0hIg5GRAfQN4J924B7k+V7gctHOYfMSskVgp80MjM7XpZAmA/sSa3vTdqyqLfv3IjY\nlyzvB+ZmPOaoNfkKwcysphPipnJEBFD1n+2S1krqlNTZ09MzpvMMlYz86KmZ2fGyBEI3sDC1viBp\ny6LevgckzQNIfh+sdoCI2BARrRHR2tLSkvG01ZWaXDIyM6slSyB0AEslLZHUDKwG2jMev96+7cCa\nZHkN8HD2YY9OU8ElIzOzWpqG6xAR/ZKuAx4BisDGiNgq6dpk+3pJ5wCdwJnAoKQbgWURcbjavsmh\nbwEekHQN8CJwxXhPrlLJJSMzs5qGDQSAiNgEbKpoW59a3k+5HJRp36T9ZeDikQx2rJpdMjIzq+mE\nuKk8WVwyMjOrLVeB4JKRmVltuQoEl4zMzGrLVSC4ZGRmVluuAqHkdyqbmdWUq0AYKhn1uWRkZnac\nXAWCS0ZmZrXlKhBKTQ4EM7Na8hUIycdf97pkZGZ2nHwFQlIy6vcVgpnZcfIVCC4ZmZnVlK9AKPop\nIzOzWvIVCH7KyMysplwFQqEgigU5EMzMqshVIEC5bOSSkZnZ8XIYCAVfIZiZVZEpECRdKmmHpC5J\n66psl6SvJNufkfSepP0CSU+nfg4n36aGpJsldae2XTa+U6vOgWBmVt2w35gmqQjcBVwC7AU6JLVH\nxLZUt1XA0uTnfcA9wPsiYgewPHWcbuCh1H53RsRt4zGRrEpF0dfvkpGZWaUsVwgrgK6I2BURvcD9\nQFtFnzbgvih7ApghaV5Fn4uBnRHx4phHPQalYoG+QV8hmJlVyhII84E9qfW9SdtI+6wGvlXRdn1S\nYtooaWaGsYxZuWTkKwQzs0qTclNZUjPwSeCvUs33AOdRLintA26vse9aSZ2SOnt6esY8lnLJyFcI\nZmaVsgRCN7Awtb4gaRtJn1XAUxFxYKghIg5ExEBEDAJfpVyaOk5EbIiI1ohobWlpyTDc+krFAv0u\nGZmZHSdLIHQASyUtSf6lvxpor+jTDnw6edpoJfBqROxLbb+SinJRxT2GTwFbRjz6UWgqFvxpp2Zm\nVQz7lFFE9Eu6DngEKAIbI2KrpGuT7euBTcBlQBfwOvCZof0lTaf8hNLnKg59q6TlQAC7q2yfEM0u\nGZmZVTVsIABExCbKL/rptvWp5QA+X2PfI8DsKu1Xj2ik48TvQzAzqy5371R2ycjMrLrcBYJLRmZm\n1eUuEPyUkZlZdbkLhCa/Mc3MrKrcBUKpKHpdMjIzO07uAqHZJSMzs6pyFwhN/oIcM7OqchcIpWLB\nTxmZmVWRu0Bo9sdfm5lVlbtAKBUL9PYPUn5ztZmZDcldIMw98xQGAw4cfmOqh2JmdkLJXSCc33I6\nADt7fjHFIzEzO7HkLxDOdiCYmVWTu0A4+4xTmN5cZOdBB4KZWVruAkES5599Ojt7jkz1UMzMTii5\nCwQo30fY5ZKRmdkxMgWCpEsl7ZDUJWldle2S9JVk+zOS3pPatlvSs5KeltSZap8labOk55PfM8dn\nSsM7v2U6P3v1KEfe6J+sU5qZnfCGDQRJReAuYBWwDLhS0rKKbquApcnPWuCeiu0fiYjlEdGaalsH\nPBoRS4FHk/VJMfSk0QsvuWxkZjYkyxXCCqArInZFRC9wP9BW0acNuC/KngBmSJo3zHHbgHuT5XuB\ny0cw7jHxk0ZmZsfLEgjzgT2p9b1JW9Y+AXxP0pOS1qb6zI2IfcnyfmBu5lGP0bmzp1EQftLIzCyl\naRLO8cGI6JZ0NrBZ0k8j4ofpDhERkq
p+lkQSImsBFi1aNC4DOqWpyKJZ0/ykkZlZSpYrhG5gYWp9\nQdKWqU9EDP0+CDxEuQQFcGCorJT8Pljt5BGxISJaI6K1paUlw3CzOb/ldJeMzMxSsgRCB7BU0hJJ\nzcBqoL2iTzvw6eRpo5XAqxGxT9J0SWcASJoOfAzYktpnTbK8Bnh4jHMZkfNaprPrpSMMDPpD7szM\nIEPJKCL6JV0HPAIUgY0RsVXStcn29cAm4DKgC3gd+Eyy+1zgIUlD5/qLiPhusu0W4AFJ1wAvAleM\n26wyOL/ldHr7B+l+5Zcsmj1tMk9tZnZCynQPISI2UX7RT7etTy0H8Pkq++0CLqpxzJeBi0cy2PGU\nftLIgWBmltN3KoM/9dTMrFJuA2HW9GZmTiv5SSMzs0RuAwH8pJGZWVruA8EfcmdmVpbrQFg693Re\n+kUv2352eKqHYmY25XIdCP/2vQuYOa3Ef23fQvlBKTOz/Mp1IMyY1sy6VRfSsfsVHnyq8s3XZmb5\nkutAAPh3713IuxfN4L//3XZe/WXfVA/HzGzK5D4QCgXxR23v4NCRXu7c/NxUD8fMbMrkPhAA3jH/\nLK5aeS73Pb6b7ft8g9nM8smBkPi9S97GGaeW+OO/3e4bzGaWSw6ExIxpzXzh4qX8qOslHtvRM9XD\nMTObdA6ElKtWnsuSOdP5403b6R8YnOrhmJlNKgdCSnNTgS+tupCug7/gWx17ht/BzKyBOBAqXLJs\nLivPm8Wdm5/j8FE/hmpm+eFAqCCJ//wbyzh0pJf/+YOdUz0cM7NJkykQJF0qaYekLknrqmyXpK8k\n25+R9J6kfaGk70vaJmmrpC+k9rlZUrekp5Ofy8ZvWmPzjvln0bb8LXz9Ry+w/9WjUz0cM7NJMWwg\nSCoCdwGrgGXAlZKWVXRbBSxNftYC9yTt/cAXI2IZsBL4fMW+d0bE8uTnmG9km2o3fewCBgaDP/2e\n36xmZvmQ5QphBdAVEbsiohe4H2ir6NMG3BdlTwAzJM2LiH0R8RRARLwGbAfmj+P4J8zCWdO4auW5\nPNC5h+cPvDbVwzEzm3BZAmE+kH7kZi/Hv6gP20fSYuDdwE9SzdcnJaaNkmZWO7mktZI6JXX29Ezu\n+wOu/+hSpjc3cesjOyb1vGZmU2FSbipLOh34NnBjRAx9NsQ9wHnAcmAfcHu1fSNiQ0S0RkRrS0vL\nZAz3TbOmN3Pth89n87YDPLHr5Uk9t5nZZMsSCN3AwtT6gqQtUx9JJcph8M2IeHCoQ0QciIiBiBgE\nvkq5NHXC+e0PLGH+jNO4uX2r36xmZg0tSyB0AEslLZHUDKwG2iv6tAOfTp42Wgm8GhH7JAn4OrA9\nIu5I7yBpXmr1U8CWUc9iAp3WXOQPP7GMn+5/jfsef3Gqh2NmNmGGDYSI6AeuAx6hfFP4gYjYKula\nSdcm3TYBu4Auyv/a/52k/QPA1cBHqzxeequkZyU9A3wE+N1xm9U4+/jb5/Lrb2vhzs3PcfA1P4Zq\nZo1JJ9Mne7a2tkZnZ+eUnPuFl47w8Tt/yCfeNY87/v3yKRmDmdloSHoyIlqH6+d3Kme0ZM501n7o\nPB78p25+vPOlqR6Omdm4cyCMwOc/8laWzJnOFx/4Z1593Z9zZGaNxYEwAqc1F/ny6uX0vPYGX3ro\nGX+Rjpk1FAfCCL1rwQxu+vgFbHp2Pw90+iOyzaxxOBBGYe2vncf7z5/Nze3b6Droj7Uws8bgQBiF\nQkHcccVypjUX+a1vdHDwsB9FNbOTnwNhlM4561S+8Zlf4dCRXtZ8o8NfpmNmJz0Hwhi8a8EM1l/1\nXp4/8Bpr7+vkaN/AVA/JzGzUHAhj9KG3tXD7FRfxxK5D/PafdfhxVDM7aTkQxkHb8vncccVFdOw+\nxKfu/gd2v3RkqodkZjZiDoRx8pvvWcA3P7uSV17v5fK7/4EfPDe5391gZjZWDoRxtGLJLB76nQ/Q\ncv
oprNn4j/zeXz7NoSO9Uz0sM7NMHAjjbPGc6fzv6z/I9R99K+3//DMuvv0x/vyJF33D2cxOeA6E\nCXBqqcgXP3YBf3vDr/HWs0/nD/9mCx/8H9/n7se6ePWXvulsZicmf/z1BIsIHt/5Mvf8YCf/9/mX\naC4W+PAFLfybi97Cxf/qbKY1N031EM2swWX9+OtMr0aSLgW+DBSBr0XELRXblWy/DHgd+K2IeKre\nvpJmAX8JLAZ2A1dExCtZxnMykcT73zqH9791Dlu6X+XBp7r5zjM/4++3HaBUFBctmMGvnj+b1sWz\nePtbzmTO6adM9ZDNLKeGvUKQVASeAy4B9lL+Ss0rI2Jbqs9lwPWUA+F9wJcj4n319pV0K3AoIm6R\ntA6YGRG/X28sJ+MVQjUDg0HH7kM8tqOHx3e9zLN7f85g8sdw9hmncME5Z7B49nTOnT2NhbOmcc6Z\np3LOWacye3ozTUVX+cxsZMbzCmEF0BURu5ID3w+0AdtSfdqA+6KcLk9ImpF8Z/LiOvu2AR9O9r8X\neAyoGwiNolgQK8+bzcrzZgNw+GgfW/a+yrZ9h9m27zDPH/gFD+/p5vDR/mP2k2DGaSVmTm9m1rRm\nzjytxBmnNnHGqU1MP6WJaaUmpp9S5NTS0E+BU5qKNDcVaC4W3vzdVBSlomgqFCgWRKlY/t1UEIWC\nKBZEQVDQ0HJ5vXwhaGaNKksgzAfSn/O8l/JVwHB95g+z79yI2Jcs7wfmZhxzwznz1NKbZaW0n7/e\ny55Dv2T/4aPsP3yUnsNHOfR6L68c6ePQkV4OvnaUnT39vHa0nyNv9PNG/+CEj1VJUChZFiL535vr\nenO93K+8Y6pNbzYdEzJKnWNobehYHLNtqP+xAXXstnR79SCrlW/1zjHsvtWbq+w/fM8Rx+845vVk\nRr//oZHt/+8/+c138iuLZ03oOE6IO5oREZKq1q4krQXWAixatGhSxzXVZkxrZsa0Zt7JWZn69w8M\n8nrfAEd7BzjaN8gv+wbo7R+kd2CAN/oG6RsM+voH6R0YpH8w6B8YpH8gGIh4c31gMBiMYDDKpa1I\nL0OyHkSQrENQXhnant4G5e1vLicL6e1DfXjzeLy5/P/XKvpX/G2JWv3Sfaqc7zg19j2mS40ya9bH\nM7I8xzHSRz3G8+GQSX3M5OR5pmXC1Py7WOG0UnGCR5ItELqBhan1BUlblj6lOvsekDQvIvYl5aWD\n1U4eERuADVC+h5BhvLnVVCxwZrHAmaeWpnooZnYSynKHsgNYKmmJpGZgNdBe0acd+LTKVgKvJuWg\nevu2A2uS5TXAw2Oci5mZjcGwVwgR0S/pOuARyo+OboyIrZKuTbavBzZRfsKoi/Jjp5+pt29y6FuA\nByRdA7wIXDGuMzMzsxHxG9PMzBpc1sdO/VC7mZkBDgQzM0s4EMzMDHAgmJlZwoFgZmbASfaUkaQe\nyo+ojsYc4KVxHM7JIo/zzuOcIZ/zzuOcYeTzPjciWobrdFIFwlhI6szy2FWjyeO88zhnyOe88zhn\nmLh5u2RkZmaAA8HMzBJ5CoQNUz2AKZLHeedxzpDPeedxzjBB887NPQQzM6svT1cIZmZWRy4CQdKl\nknZI6kq+v7nhSFoo6fuStknaKukLSfssSZslPZ/8njnVYx1vkoqS/knSd5L1PMx5hqS/lvRTSdsl\n/Wqjz1vS7yZ/t7dI+pakUxtxzpI2SjooaUuqreY8JX0peW3bIenjYzl3wweCpCJwF7AKWAZcKWnZ\n1I5qQvQDX4yIZcBK4PPJPNcBj0bEUuDRZL3RfAHYnlrPw5y/DHw3Ii4ELqI8/4adt6T5wA1Aa0S8\ng/LH6a+mMef8Z8ClFW1V55n8N74aeHuyz93Ja96oNHwgACuArojYFRG9wP1A2xSPadxFxL6IeCpZ\nfo3yC8R8ynO9N+l2L3D51IxwYkhaAPwG8LVUc6PP+SzgQ8DXASKi
NyJ+ToPPm/L3t5wmqQmYBvyM\nBpxzRPwQOFTRXGuebcD9EfFGRLxA+TtpVoz23HkIhPnAntT63qStYUlaDLwb+AkwN/n2OoD9wNwp\nGtZE+VPgPwKDqbZGn/MSoAf4RlIq+5qk6TTwvCOiG7gN+BdgH+VvZfx7GnjOFWrNc1xf3/IQCLki\n6XTg28CNEXE4vS3Kj5Q1zGNlkj4BHIyIJ2v1abQ5J5qA9wD3RMS7gSNUlEoabd5JzbyNchi+BZgu\n6ap0n0abcy0TOc88BEI3sDC1viBpaziSSpTD4JsR8WDSfEDSvGT7PODgVI1vAnwA+KSk3ZRLgR+V\n9L9o7DlD+V+BeyPiJ8n6X1MOiEae978GXoiInojoAx4E3k9jzzmt1jzH9fUtD4HQASyVtERSM+Ub\nMO1TPKZxJ0mUa8rbI+KO1KZ2YE2yvAZ4eLLHNlEi4ksRsSAiFlP+c/0/EXEVDTxngIjYD+yRdEHS\ndDGwjcae978AKyVNS/6uX0z5Plkjzzmt1jzbgdWSTpG0BFgK/OOozxIRDf8DXAY8B+wE/mCqxzNB\nc/wg5cvIZ4Cnk5/LgNmUn0p4HvgeMGuqxzpB8/8w8J1kueHnDCwHOpM/778BZjb6vIH/BvwU2AL8\nOXBKI84Z+Bbl+yR9lK8Gr6k3T+APkte2HcCqsZzb71Q2MzMgHyUjMzPLwIFgZmaAA8HMzBIOBDMz\nAxwIZmaWcCCYmRngQDAzs4QDwczMAPh/NpPBN4z5s28AAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "def em_model1(init_model, data, iterations):\n", " \"\"\"\n", " Estimate IBM Model 1 parameters from data.\n", " Params:\n", " init_model: the initial model.\n", " data: (target,source) sentence pairs\n", " iterations: number of iterations.\n", " Return:\n", " Trained IBM Model 1.\n", " \"\"\"\n", " model = init_model\n", " alignments = []\n", " for t,s in data:\n", " alignments.append([[0.0 for _ in s] for _ in t]) \n", " results_for_iterations = []\n", " for _ in range(0, iterations):\n", " old = alignments\n", " alignments = e_step(model, data)\n", " tmp_model = m_step(alignments, data)\n", " model = IBMModel2(tmp_model.alpha, init_model.beta)\n", " results_for_iterations.append((alignments, model, measure_change(old,alignments)))\n", " return results_for_iterations\n", "\n", "# measure_change(align_matrices, align_matrices) \n", "ibm1_iterations = em_model1(init_model, train_model_2, 100)\n", "plt.plot(range(0,len(ibm1_iterations)), [change for _, _, change in ibm1_iterations])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us have a look at the translation table." 
] }, { "cell_type": "code", "execution_count": 17, "metadata": { "run_control": { "frozen": false, "read_only": false } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEyCAYAAAABVZAhAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xm4XVV5x/HvL2FQBApCEAiJDFJoKqAQUZxwKBXUGqla\nUEABlQZkKIMQLSJDS8GCiEwxDIIIgoJohEiKVLQIagIiCAqEQQhFiBSZRGLg7R/vOsnO8Sb3JNx7\nzz3r/j7Pkyf37L32vWuds/d71l57DYoIzMysLqO6nQEzMxt4Du5mZhVycDczq5CDu5lZhRzczcwq\n5OBuZlYhB3czswo5uJuZVcjB3cysQit06w+vvfbaseGGG3brz5uZ9aSbbrrp9xExpr90XQvuG264\nIbNnz+7Wnzcz60mSfttJOjfLmJlVyMHdzKxCDu5mZhVycDczq5CDu5lZhToK7pJ2lHSnpDmSpvSx\n/22SnpB0S/l31MBn1czMOtVvV0hJo4EzgB2AucAsSdMj4o62pP8TEe8dhDyamdky6qTmvi0wJyLu\njYj5wCXApMHNlpmZvRidDGIaCzzYeD0XeH0f6d4o6VbgIeCwiLi9PYGkfYB9AMaPH7/suS02nHLV\nch873Nx/wnu6nQXrIbWc+z7vB99APVC9GRgfEVsCpwHf6StRREyLiIkRMXHMmH5Hz5qZ2XLqJLg/\nBIxrvN6gbFsoIp6MiKfLzzOAFSWtPWC5NDOzZdJJcJ8FbCppI0krAbsC05sJJK0rSeXnbcvvfWyg\nM2tmZp3pt809IhZI2h+YCYwGzouI2yVNLvunAh8E9pW0AHgW2DUiYhDzbWZmS9HRrJClqWVG27ap\njZ9PB04f2KyZmdny8ghVM7MKObibmVXIwd3MrEIO7mZmFXJwNzOrkIO7mVmFHNzNzCrk4G5mViEH\ndzOzCjm4m5lVyMHdzKxCDu5mZhVycDczq5CDu5lZhRzczcwq5OBuZlYhB3czswo5uJuZVcjB3cys\nQg7uZmYVcnA3M6uQg7uZWYUc3M3MKuTgbmZWIQd3M7MKObibmVXIwd3MrEIO7mZmFXJwNzOrkIO7\nmVmFHNzNzCrk4G5mVqGOgrukHSXdKWmOpClLSfc6SQskfXDgsmhmZsuq3+AuaTRwBrATMAH4sKQJ\nS0h3IvBfA51JMzNbNp3U3LcF5kTEvRExH7gEmNRHugOAy4FHBzB/Zma2HDoJ7mOBBxuv55ZtC0ka\nC+wMnDVwWTMzs+U1UA9UvwQcEREvLC2RpH0kzZY0e968eQP0p83MrN0KHaR5CBjXeL1B2dY0EbhE\nEsDawLslLYiI7zQTRcQ0YBrAxIkTY3kzbWZmS9dJcJ8FbCppIzKo7wp8pJkgIjZq/SzpfODK9sBu\nZmZDp9/gHhELJO0PzARGA+dFxO2SJpf9Uwc5j2Zmtow6qbkTETOAGW3b+gzqEbHni8+WmZm9GB6h\namZWIQd3M7MKObibmVXIwd3MrEIO7mZmFXJwNzOrkIO7mVmFHNzNzCrk4G5mViEHdzOzCjm4m5lV\nyMHdzKxCDu5mZhVycDczq5CDu5lZhRzczcwq5OBuZlYhB3czswo5uJuZVcjB3cysQg7uZmYVcnA3\nM6uQg7uZWYUc3M3MKuTgbmZWIQd3M7MKObibmVXIwd3MrEIO7mZmFXJwNzOrkIO7mVmFHNzNzCrk\n4G5mVqGOgrukHSXdKWmOpCl97J8k6VZJt0iaLenNA59VMzPr1Ar9JZA0GjgD2AGYC8ySND0i7mgk\nuxaYHhEhaUvgm8Dmg5FhMzPrXyc1922BORFxb0TMBy4B
JjUTRMTTERHl5cuAwMzMuqaT4D4WeLDx\nem7ZthhJO0v6DXAVsHdfv0jSPqXZZva8efOWJ79mZtaBAXugGhFXRMTmwPuB45aQZlpETIyIiWPG\njBmoP21mZm06Ce4PAeMarzco2/oUET8GNpa09ovMm5mZLadOgvssYFNJG0laCdgVmN5MIOlVklR+\n3hpYGXhsoDNrZmad6be3TEQskLQ/MBMYDZwXEbdLmlz2TwU+AHxU0p+BZ4FdGg9YzcxsiPUb3AEi\nYgYwo23b1MbPJwInDmzWzMxseXmEqplZhRzczcwq5OBuZlYhB3czswo5uJuZVcjB3cysQg7uZmYV\ncnA3M6uQg7uZWYUc3M3MKuTgbmZWIQd3M7MKObibmVXIwd3MrEIO7mZmFXJwNzOrkIO7mVmFHNzN\nzCrk4G5mViEHdzOzCjm4m5lVyMHdzKxCDu5mZhVycDczq5CDu5lZhRzczcwq5OBuZlYhB3czswo5\nuJuZVcjB3cysQg7uZmYVcnA3M6tQR8Fd0o6S7pQ0R9KUPvbvJulWSbdJukHSVgOfVTMz61S/wV3S\naOAMYCdgAvBhSRPakt0HbB8RWwDHAdMGOqNmZta5Tmru2wJzIuLeiJgPXAJMaiaIiBsi4vHy8qfA\nBgObTTMzWxadBPexwION13PLtiX5OPD9F5MpMzN7cVYYyF8m6e1kcH/zEvbvA+wDMH78+IH802Zm\n1tBJzf0hYFzj9QZl22IkbQmcA0yKiMf6+kURMS0iJkbExDFjxixPfs3MrAOdBPdZwKaSNpK0ErAr\nML2ZQNJ44NvAHhFx18Bn08zMlkW/zTIRsUDS/sBMYDRwXkTcLmly2T8VOApYCzhTEsCCiJg4eNk2\nM7Ol6ajNPSJmADPatk1t/PwJ4BMDmzUzM1teHqFqZlYhB3czswo5uJuZVcjB3cysQg7uZmYVcnA3\nM6uQg7uZWYUc3M3MKuTgbmZWIQd3M7MKObibmVXIwd3MrEIO7mZmFXJwNzOrkIO7mVmFHNzNzCrk\n4G5mViEHdzOzCjm4m5lVyMHdzKxCDu5mZhVycDczq5CDu5lZhRzczcwq5OBuZlYhB3czswo5uJuZ\nVcjB3cysQg7uZmYVcnA3M6uQg7uZWYUc3M3MKuTgbmZWoY6Cu6QdJd0paY6kKX3s31zSjZKek3TY\nwGfTzMyWxQr9JZA0GjgD2AGYC8ySND0i7mgk+z/gQOD9g5JLMzNbJp3U3LcF5kTEvRExH7gEmNRM\nEBGPRsQs4M+DkEczM1tGnQT3scCDjddzy7ZlJmkfSbMlzZ43b97y/AozM+vAkD5QjYhpETExIiaO\nGTNmKP+0mdmI0klwfwgY13i9QdlmZmbDVCfBfRawqaSNJK0E7ApMH9xsmZnZi9Fvb5mIWCBpf2Am\nMBo4LyJulzS57J8qaV1gNrA68IKkfwEmRMSTg5h3MzNbgn6DO0BEzABmtG2b2vj5d2RzjZmZDQMe\noWpmViEHdzOzCjm4m5lVyMHdzKxCDu5mZhVycDczq5CDu5lZhRzczcwq5OBuZlYhB3czswo5uJuZ\nVcjB3cysQg7uZmYVcnA3M6uQg7uZWYUc3M3MKuTgbmZWIQd3M7MKObibmVXIwd3MrEIO7mZmFXJw\nNzOrkIO7mVmFHNzNzCrk4G5mViEHdzOzCjm4m5lVyMHdzKxCDu5mZhVycDczq5CDu5lZhRzczcwq\n1FFwl7SjpDslzZE0pY/9kvTlsv9WSVsPfFbNzKxT/QZ3SaOBM4CdgAnAhyVNaEu2E7Bp+bcPcNYA\n59PMzJZBJzX3bYE5EXFvRMwHLgEmtaWZBHwt0k+BNSStN8B5NTOzDnUS3McCDzZezy3bljWNmZkN\nkRWG8o9J2odstgF4WtKdQ/n3l8PawO8H8w/oxMH87S/KoJd9mBvJ5fd5P7y9spNEnQT3h4Bxjdcb\nlG3LmoaImAZM6yRj
w4Gk2RExsdv56IaRXHYY2eV32esoeyfNMrOATSVtJGklYFdgelua6cBHS6+Z\nNwBPRMTDA5xXMzPrUL8194hYIGl/YCYwGjgvIm6XNLnsnwrMAN4NzAH+COw1eFk2M7P+dNTmHhEz\nyADe3Da18XMAnxrYrA0LPdOENAhGctlhZJffZa+AMi6bmVlNPP2AmVmFHNzNbEhJUrfzMBI4uA8D\nPtnr5894kXBb8JBwcO8CSZ+T9CFJr4U82Wu/+CVNlPSabudjqEn6B0krOaCBpA9IOlvSKyWt1e38\nDDVJH5X01iH7ez7nhp6kHcg5e8YD8yPigLJdNQYBSRsCHyTHSJwP/Dgibu1iloaEpJcDxwLrAFcC\nt4yEci+JpNWBQ4DVgZWBC8tcVCOCpD2A3YHrgP+NiAsG9e9VGEuGLUnHAedExG8lvQx4CfAd8oPe\npaSpKsBL2hG4MSKekPTXwGTgBeD2iPhqd3M3eCS9JyKuKj9PAjYG3gWcHBHXdDVzQ0zSAcA6EfG5\n8nozYHvgE8DhEXFdF7M36Er5vxkRj0jaBNgC+Ajw24j49GD9XTfLDC0BV0oaGxHPRMRjEfEWYIyk\nS6Cu9khJmwI7AJMlrRsRdwH/BtwCbCPpA13N4CCRNAY4TNLRABHxXeBs4GvA8UN5az5M/ADYQtKR\nABFxZ5mK5AzgXyW9uqu5G0SSRpGTKJ4t6RURcQ85ov+zwARJRw3W33ZwHwJlTnwi4kjgUmDnsn2F\nsv0dwJqSDulaJgdBRNwNXAG8lJzrn4j4P/LkvgPYUtKK3cvh4IiIecDHga0kvb1sezoiLibXOthd\n0vrdzONQaD1HiohfA4cD97UluYwc+f6mZvqaRMQLwOeA2cDftbZFxBzgSGADSdsNxt92s8wgkzQ6\nIp5v/b+U/a8D3hUR/9aFbA6o9rJKGlMC3sJmpxLczge+GhHf6FJWB1Qf5V4zIh5vS7Mx8EngGxFx\na23NcC3N90LSyyLimca+USXoIeldwJ7AR2p6H5Z0vbelWQ3YH3g4Is4f6Dy45j6Iykn8fLk1O13S\n5yV9UtK6rTSNE+BR4DWSNu9KZgdI48tqlKSPSdoN+EPZN6rVMygi/hc4GnibpL/qZp4HQlu5J0na\nHnimPV1E3As8Bkwpr6sJaC3N817S14FTJH1G0nhYWJul/DyTXP9hxy5ld8C1lf9QSQeUiRcXm+4l\nIp4CrgH2Gow7OdfcB1m51fwv4GryQeLHgL0j4uY+0m4OPFqaLnpW+TL7HvAzYGuyt8jbykpezXSr\nAW8HruqvltMLSrmvA64H/gG4Fvh8RDxR9i+spUt6PzCj/T2pRTnvvwHcAPySPB92iIifNdKMiogX\nSmB7olm773Wl/DOB/wa2IqdE3zUi5jbTlMrOG4GbI+JPA5kH19wH32bATRFxMrnW7Fcj4mZJY9vb\nmyPiN70c2BttpvuRF/XxwCrApc0g1miLfQr4Qa8H9ka5DwaujojPAs8Bt5VeQq1nLs3xDDNrC+zl\ny63llcADZBfQQ4FjIuJn5WEzsFgN/uFaAnvj8/0AMCsiTgDWAi6JiLmSXtJK27hrmz3QgR0c3Adc\n60JumEc2t9wHTI+IU0uaQ6lkKcJm8CqbHgSeJe9Wri1lHitp37Z0RMQfhzq/A6WPcs8BVpZ0PXkx\nn1uaIhY2ObTSRsSzQ53fwVSapV5QGgX8L7AScBVwXUScXN6vsyUtthhGDU1TSzgXVpd0E/lFfrqk\ndYBD+6jUDcqXvIP7AGpra7tKOVjpabJZ5mdAawDLN4CXR8T9XcrqgGlra/68pA3I5weHAXdERGtB\ntS8DG3UtowOsrdzHlIt7PhnIr4mIk0rS0ym9JGrV9mzpamC/ErDuI2vvvyhfchcBj0XE7C5md8C1\nnQsXK/vxP0M2xdwInFmSngFsGBF/Hop8DekaqrVr1VzI2sotUQarSJpJBvmTJT1Iti
/uVfb1dG+J\nxkV9HTlYaS4wV9LHgHPLvs2AByLicOj9MsNi5b6KHIzyPPB9SVsBm0m6iByJ+X8RcXA38zrYGs0r\n1wDXR8Tp5XUrqO0CvII8Bw6COs6Blsa58APglxFxJ4Ckk4B9gW8qV7F7OCI+WfYNevn9QHUAaPFu\nX5sBnwGOIdsddwceIWvvvwBeGhGPlLSjGhdGT2l7OLgPsEVEHCDpb8hnC7PJvuwTgDUiYnpJ27Nl\nhr8o94HAuhHxWUljyXJfSza3vRxYKSIuK2l7utx90eJdGtcETgP+GXg/8E7gr4EDyzOml7eeJ9Xy\nXrSVfwfgvRFxkKStgd3IWvutQABrR8SN7ccNav4c3AdGqbFvRX6Y55JDjP+H7Oa1DvDraPRlraXm\nImllcp6cC8j1dh8newX9E9k74heNtFWUuUXS3mRAf7r8m0jejn84Sr/+kq6qcjeV836XiLhE0reA\nDcgmyO+Qo5NXiIgjmulrei9K+dcH1iAHKN5f/j0HvBX4dDSmVxjK8rtZZuC8C7gc2I6svWwWEbcB\nSLqUvPgXquEEl3Q42c78FeAg8tb7mxHxpKQ1gPYHRz1fZgBJhwGrRMSxpZx/AC4vPWNmkM1QC4N7\nLeVuagSp1wH/JGnViPiQpPUi4uGS5qNkoFuowvdiD2ASuW70XuQd239HxJ/Ll926zcRDWX4/UF1O\nbd2+iIiryXVkvw28ISJuU05t+k3guYg4rhv5HEjtZSabmyaQTU8/iohzMpkuJ8/jnw91HgeDGoNP\nyp3KncA4SQdFxBcj4jxgFeWAncci4vpu5XWwtbr6NYLUr8kv9+0kHRYRD0taQ9KVZMvAsc3jep3+\nsjfcbeRcSScDj0cOylpP0mXkdX/JUOexxcF9OTW6fR0k6ZVl2/nkqMtzldMJPEZ2g/oo9Bkce0qj\nzLuV1xeQD9G2IWtvryBrcvMiYjeo46KOiAWlJ8RrIuI5sl39W8DmklpNDu8FnoyIPaCOci+NpBmS\n3hY5VuF6sgfY1pL2jYg/AOc3Og2MqqXGXh6eStLB5fUvyPmT7gWOKL2C/gq4JyJ2hy5e9xHhf8vw\nDxjV+PklwAyyi9M4sn0Rsu3taWB8X8f12j/Ks5ny81bkqLsjGts+DtwF/AuwWg1l7qPc7yafJWxf\nXq8GfJQMbEe0HdfT5V7CezGq7fVBZOXlTeX1isA55EPEd9T2XrRd9y8DngTOamx7YzkXLhsu10BP\n1ySHmhYfqDEBGE0OM1+R7CHTmh/iLuDQiHigdWz0aO+AUuZW75AVIuKX5N3JVpJa86OcCzxE3pY+\n1Tq2V8sMi5cbICJmkF36vtuosV5FDtZ5onGcerncfWk7798paZOIOJWcm/8qSW+M7Ls9CjgtIv67\ndWwN70Wr/OXndSJH064HvEnSVwAi4gay9v694XINuLfMMiq3WJeTtfZngV9FxFGSziS/0bci54nY\nu5W+V09wLZr7YxRwHjl17x1k+dciR9nOI29DH4iIKqYsbiv3KWR/9Z+STQ9vBS4mp1jYlZxi4DPl\nuKp6gjSV9+Ia4I/keXALcALwjvL//WRzXFWLzrTKUcr/dWBN4B7y2dpPgJuBm8heQndHxD83j+tS\ntgEH92Um6YvkEmEHkqvrnER+uMcBWwJjSy1vWHzAA0HSxWTN/EJyXu67yBP9z2SQeyJyDo1qygwL\nB5/dBtxOzkc/hmx6ejvwZvK5YvWBHUA5dcSrI+JTkrYF3gCsFxGfUY7tWDnKEoI1vheSzicrMmeQ\n3V9fQ14PN5OrKkW5gx025XdXyH708UH9EfhJ5IOVe8gJ9z9LtrP9kpwBr6dr7E2S1iZrK1Mi4oHS\nDfBM8mQ+jmyOaqWtoswAymmZ/xgRh5XXryJ7Q/1jRFwoaWZpiqiq3C19lGlN8jkDEfHz0oPoWOUK\nW3c2jhsWgW0gSVqFvFu5ICLuLz3gVgF2juwZdU
4j7bA5F9zm3o8+TtQ7gS9IGl8+xAfIGt2abccN\niw94WbU/2Y+I35Ntie+WtFZE/A44EehrVsueLDMs3t2xeJRc/vBwgMiVc+4mewYRjflBerncS9Jo\nY9+gbLoYWFG5yHOrjfl5siNB87ieD+ztPZ0iJ7e7m+wNs25EPEbOobOxctHvZtphcy44uC+BGut7\nSpou6VJJHye7PZ0M/Kh0CbyAbGv7bZeyOmAkrd94cDRF0mGl1vJt4LXAx8qD5MPIppghmQBpsJUL\ndoGkFZRdW3cv78Mx5JfY50rS7WkMTqqRpA8ru7RCLod4maR/J2PFd4D3S/qGpCvINvZZ3crrYJD0\nukYHglMknSFpI/LL7W7gjNIsdTRwX0Q82b3cLp3b3PsgaT3gx2SXxudY9CCx1Xf9i8DfAxuSbY3/\nUY7r2VtSSXsBm5NL3/072fz0PFlT3QZ4PTlnyMbkSX1wOa5nywwLv8TfQ849fyLwJ/LZyRVkMFuN\nXAPzceCZiNizHNfT5e6LckraY8iOAi8jp864gGxnvoF8kP4U2S30mSgDdGp5L5SrZx1IBvJ/ID/z\nl5LTh5wK/A74R3J208ejTKswXMvv4N5G0nYRcaNyAqyTgXUiYmLZ93dkUH8aODvKMOuyb9i0tS0r\n5QIKLwWOIvvvviQi9iv7vkJOqbBdRDyjxrqgvVxmgNKs9Epgb7Jr258iYt8S5L5Izsk9lRyJu1qr\nltbr5e6LcvK3r5G9PvYgpy4+oLSvb0j2iJkDnBm5RGLruCreC0l/ExG/lvRJco6gdSNiUtk3hfzC\nnxoRP24G8+FcfjfLNChHms4st+W/Bg4h211PAYiIH5CzO44la7ILDdcPuD/lC+sYsq/+UWS3v82V\nI2wpXbtuAB6W9FIWrYfa0/25S439e2Q/9XPIcm0vabOIeJTsFfTXwOeBNRuBvafL3ZfyJfda8n14\nkOzyeQewr6SNI9cd+DR5zr+xeWwN74WkLYBLJb0zIs4me79tppy2mtIT7Fbg6HJ+tAL7sD4XXHMv\nWt/GkrYhh5bvHBG/VK5rOg34cUQcWdL+bUTc3s38DpTSje1d5BfWMeRArM+Q3R2viohflXR7xiCs\n0N4tyuXOjgI2IbuybU7WWFcApkXEXeVh4vsi4swl/6Y6lC/uY8k5+b9droNJwKrA6RFxrxrT9tZG\n0k7AEeSo46fJZfK2JicB+1ZJ896IuLJ7uVw2Du4NWjR4Zf22W8/NgLPJifgPaGwflm1ty0rSJsCY\niPhpef035F3L/WSAv6WRdtjehi6rEuDfEosWVdkGeB85KGtaRNzRSFvFZ700yhHICxqvX0e+H2PJ\nKRbmle1VvheSNo2Iu8vPryDnC3o9uQDJ1xrpeqL8bpZpaAWtVmBX6RZY+vHuR7a9NtMP+w+4ExFx\nTyOwjy5NUieTD1LHt6WtIrADRMSfWoG9vL6J7CEC2UzRTFvFZ700rcCuReuBziK7/M2Ixvz0tb4X\njcA+KnJBnSvJAWwrtKXrifKP6Jr78n4D98o39/JqNFEtnJt7JJG0YVSwvu1Aq/2874uk1Ydzd8el\nGdHBHRbWVJ/vdj6Gsxou6k7K0J6mhnK3K336f1dj2ZbFspa/F9+vEdksoxygcyEsmp+523kabJLe\nK+nNy3Nsr53UTZL2Vs7DHv19zu3l7OVy90XSO4HTJW2+pLLVfC1I2lPSp2DZP9tePBdGZHAn25PH\nK2dypK8LX385HL1nKZeCezPZ1e/1S0lX1YVdyr0WcKCkLZpd2NrSVVXupbgT+BFwuHLOoIXKe9WT\nQawTpbvnI8BrlWvfLildNTGxmoJ0ovGg6M/k6uyz2/a/UtKV5YHKghou+tLs9AdyZXqRg5Wa+8dK\n+pCkFWu7sEu5v0EuovCqxvaQtImkSa3XXcrikGh0DJhLLiZxTuScQa39fwWcLGnnLmVx0JVr/jry\nfHi8uU/SOE
lHD/d+68tqxAT3Vtu6crm03SJiQeTalwvb0yLnh3mSDIQ9f9E3yjyafOJ/QjRWYi/e\nQM5Rvn778b1K0pbKoeSQ3dmuj4gr2pJtDOwtaauhzd3Q0qKFNkZJWo2cD+aGsq9VeZkP/BCYUNMd\nKyyq0AFExLMRcW0f58LKZHfPqr7cRkxwbwV2clm87dr2hRbNcDgFuFU5qKNnlS+sVpkvB3Zp68Pc\n+uyvBFYiBzL1vFKuPwGHSLqTnD7irj6S/pwcidjTn/PSNCs05NKI04BvtZpgKNd/RDwL/IoMcj1/\nt9rSvAYkTSu1852b+8uP9wGzyGl8qzEignvjQ/xP4K6I2F/S6pLeJ+nVsNgUrr8DLionfM9q3HVc\nBMyOiC/AwgFZrSldR0cu+HwIuWRcT9Oi5dAeJCf8Gg30OVtnRDxBBrz7hi6HQ6vRC+xgMngdQE5R\nPVPSGiXwrVjS3gJ8ISqZ6RMWuwbOIldOuw/4Dy2aViAaveW+Ra40VY0R1RVS0mRgbWAC2e72ZrId\n7tjIOZp7XrPLlqS1yMmgTiNnttuOHHE4LSKOKWmqGHHaVkvdgpzV8hVkYLsuIk4r6TYobc/VkvQ+\n4BcR8aCkk4FtgSMj4kdl/ynkHDE7RYXTCUjaJnJAGpJOAlaNiMnl9fbkaPPjo0yn0YvdHDsxUmru\nl5bbsenk0nC/IOeR+DtyTpF1upi9AaPFF7MeVb6wvgZ8hZwE60JgB2BD5dD7KkactjVBXQ3sGzna\n8OfAucA7JR2snOHyHd3M62CTNB4YXQL7GuSqWU8Dby9t7kRO1/xrcnrbqpTunhs3No0Ctpb0BuX0\nCj8C/hk4TdJboPefrS1JlTX39tqopF2BfwM+FREzy7aVyEWf/xgR+3QnpwOnreZ6OvkldnVEfF/S\nahHxVHlYNo2ci/uApf7CHiFpo4i4r/x8NvBYREwpn+825Pw4G5GzGj4aZQHj2inn59+GnJt/FfIL\nfgZwbmmSqpKklSJifrlL/31EXCbpeHLthWPJZtkXVKb47WpmB1mVNfdYtETY+qVWdwm5etBZ5ZYV\nsp35iVZg7/Vuj41eMdPJ1dl/Sw5YORBYU9LK5OIjo1qBvdfLrJyu+N2NTY8Av5H0JXJxhcuAyaV3\nyAeisTL9kGd2kDV7hRQ3k01TnwKeASaTCzsf1OwsUMt7oUXdnOeXu9I1yLuVnSLis+SzhqPJHkFq\nBXZV1K+9XXU191b7mXKC/dcDh5IrB4VyIv4jgX2AH0bE/HJMz7Y7S9qF7Or3UPnimgCcQj4gvZds\nd/4x8E3g5ZGLePd0mQEkrRoRT5eL80iy18/KwJ7kCkLTyIeqXyYD/APluOraV9vu2j5DdgqYRa4D\newjZe+hMMuBNiogTu5bZQdBW/onAU7Fo4Y3XAFeWO9ipwG0RcUZXMzxEqgnu6mOOGEmnk/NRHxMR\n9ylXlDmDpnH2AAAG9UlEQVQNmBMVLBNX2lC/Qz5DOCEifl+aIw4HXoiI48st6U7AYRFxbTmuZ8sM\noBx8dCjwwYh4VLnG6URgSqNGtgK5+MSzEbFv93I7NEpg+y55LswnZzHdCPhbYBeyaebIVpNMr58D\n7Ur5ryIX1XgLuajOv5Pt65uT87J/u3s5HHpV3JJo8QFKx0jaXdKrImJ/cs3HY8oD1VPJ+ckPbh3b\nqye4ckTpU8A/kW2rxyiXwJtPrqq0ZUm6DnBJK7BD75a5JSK+S3ZbO6fcgk8FfgB8QYumVzgZeK4V\n2GtpfmhSriDU8j5yJO7x5KC0EyPiuYi4mXyo/pNmW3uvnwN9OJk8B44rrx8p3TrPJpsoV2slrPFc\n6EtVNXfygr8BGAesSa73eLWkQ8gRaKs221179QTXokVFVgK+Sn6B/SNZi59C3q2cSa4L+quI+Fg5\nrmfLDCBpD+CNkeucjiYX2riu7FubrKG+i1xh6aFYtLhETzdB9UXSe8gFy79d
mhx2IFeUGgd8LyJO\nLXd2+wGnRsSfynE9fQ60tJdD0gHAb8g7uh9GxImSNiUXXrm5ts+/Ez0d3CW9AbgnIuZJOpjsAnaS\npBvJxXxfAC6MXPu0eVzPXuyNni+jyfbVV0fErspFnb8CzCO7eT4DbBW54EIVF3UJ4PsDX4yIJ/u4\nwNcmn6f8KSK+WLb1fLn7Iml9ckm4tcleMNeTg7IUEW8qaS4H5kbEQV3L6CBYQhPsXmS31yMi4j/L\ntivI51Enl9dVngtL0rPBvdx+fw64GLgCeBn54Og04JZSc7keeIIcsPCTclzPfsDlgemnyZ4fj0ra\nHXgd2db8rHJpsDvIttcDI+LpclzPlrlJ0prkg9JrI2Jq2dYe4FdtlbtGzYpJ+TL7BHmHdilZofke\n8BB5Pdxfw51qX0ob+/fJysws8g52MvlA/STyTnZeROzVrTx2W8+2uUfEz4CvA28nF7N9juz69XLy\nw4Y8yb/VCuzluJ49wSNiOvmg6LzSne0RsjfM5pJWjlwa7Gqyd9DTjeN6tsxNEfE4OV7hXyV9pGwL\nFeX1wi+07uV0cChXxmp1870I2Ay4gOwd8yHyAeqbyHbnYxuBfVQN54AW7+45mXx4fBGwAVnRO4tc\n5P0lZAVgr3Jcz8a5F6Pnau6SNo6Iexuvvw5sBZxATuf5cfLkvgP4XUR8pKTr2ZpLP23NU8juXk+Q\n09reHhEHln09W+alUfZv/zI5F8r5je0929zWH0nvJysxx0d28zsfODoi7pe0LvAxcmbPayLiysZx\nVZwDjedMo4APAm8DzoqI20rz7CRy5tNTYvHF7as9J/rTU9N7Svp7slfIQRHxc+UotHFkl7d3lmTn\nkLP9jSu9Kmo4wb8PbKKynqOkH7V2RMQJyhWW1gTGLqm5oiYR8QNJ+wAXSlqd7No6o9aLWNIq5KCk\n1wL7KftrP0J2eSRy2byvktMJrNY8toZzQDltQGt9hSvJisy25Gymn4iIn0oKYA8y6F/cOrbWc6IT\nPVNzl7Qq+ZDwIHLSo7nkyb5HRMyVtBuwIzlg56uxaCX3nv/mXlJbM/n5/UXZaihzJ0pviB3IuUR+\n1azF10LZp/9gcq7xl5Jt7OPIAXrXkEH/D8AYspvvvC5ldVBo8akzPgtsFBF7KSfFm0n2jPl0SbtJ\nRNzTzfwOJz3RFlVO8KvIEZZfAn5K9hQ4tQR2RcRF5IIDo6Mxb3kNQW5Jbc1An9/MNZS5ExFxd0Sc\nGRGHAVUOUCl3nz8kHxjOIx+YPgI8S04GtxYZ6LeKRV0/q3jeUDoQzCg9wVYlHxxvq1wT9zHgPeSk\ncGcDtAJ7LeV/sXqp5n4UOVinNdpuN/IW7KSIuLGLWRsyI7GteaTq5znLq8nAtgHwpWZttbbmOC0a\nffxB8oHx7sDqwHkRcauk9YD9IuJzXczmsDSsg3s/J3hz0MoXIuL67uV06JT29QvJ+WPmRMSMLmfJ\nBoH679O/FdnGfGtEfK1b+RwMfVz3b42IH5Z9W5GjcdcBzo8yb3vZV9UX24s13Jtlvg88Uh4kPk+u\n3A5A5AK/l5IjUrftUv6GXPkS+3tgAfAOSXt2N0c2SJ4n54VZrMtna2fkBHCn1hbYi/br/rrWjlLu\n75JjWiY0D3JgX9xwr7mP+EEr/Wn1oOl2PmzglVrqleSoy4vLtlZ//uY1UFWNtcPrflxEPNitPPaC\nYR3cYeSe4GYwcp+z+Lp/8YZ9P/eI+KVy3ogvK1dZOb/1YTZPcH/AVqOR1qe/xdf9izfsa+4tfpBo\nI9lI6NPfF1/3y69ngjuM3BPcrGmkPWfxdb98eiq4N420E9zMfN0vi54N7mZmtmTDvZ+7mZktBwd3\nM7MKObibmVXIwd3MrEIO7mZmFXJwNzOr0P8DRSY53JuFIpUAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, 
"output_type": "display_data" } ], "source": [ "plot_table_for_target(ibm1_iterations[-1][1].alpha, \"house\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also inspect the alignments that EM produces, here for the first training pair." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "run_control": { "frozen": false, "read_only": false } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", "\n", " \n", " NULL the house is small\n", " \n", " \n", " klein ist das Haus\n", " \n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "word_mt.Alignment.from_matrix(ibm1_iterations[-1][0][0],train_model_2[0][1], train_model_2[0][0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Training IBM Model 2\n", "Now that we have a reasonable initial model, we can use it to initialize EM for IBM Model 2. Here is the EM code in full.\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "run_control": { "frozen": false, "read_only": false } }, "outputs": [], "source": [ "def em_model2(init_model, data, iterations):\n", " \"\"\"\n", " Estimate IBM Model 2 parameters from data.\n", " Params:\n", " init_model: the initial model.\n", " data: (target,source) sentence pairs\n", " iterations: number of iterations.\n", " Return:\n", " A list with one (alignments, model, change) triple per iteration; the model in the\n", " last triple is the trained IBM Model 2.\n", " \"\"\" \n", " model = init_model\n", " # all-zero alignments, only used as the 'old' alignments when measuring the change in the first iteration\n", " alignments = []\n", " for t,s in data:\n", " alignments.append([[0.0 for _ in s] for _ in t]) \n", " results_for_iterations = []\n", " for _ in range(0, iterations):\n", " old = alignments\n", " alignments = e_step(model, data)\n", " model = m_step(alignments, data)\n", " results_for_iterations.append((alignments, model, measure_change(old,alignments)))\n", " return results_for_iterations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Initializing EM with the trained IBM Model 1 gives us the following translation table for the target word \"house\":" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { 
"run_control": { "frozen": false, "read_only": false } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEyCAYAAAABVZAhAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmcXFWZ//HPNwmgCAhCUEiCLINgRoxKRHADZRBQx+io\nIwgouDAo249FiA4oywwDSkRki2GRRTYXVIQIo47ADxElIDsDhEUIIkREZMfAM388p5KbspPuhOqu\nrtPf9+uVF133nkudU3XvU+eeexZFBGZmVpdR3c6AmZl1noO7mVmFHNzNzCrk4G5mViEHdzOzCjm4\nm5lVyMHdzKxCDu5mZhVycDczq9CYbr3xaqutFmuvvXa33t7MrCdde+21f4qIsf2l61pwX3vttZk1\na1a33t7MrCdJ+v1A0rlZxsysQg7uZmYVcnA3M6uQg7uZWYUc3M3MKtRvcJd0mqSHJd28iP2S9E1J\nsyXdKOlNnc+mmZktiYHU3E8HtlnM/m2B9cu/XYGTXny2zMzsxeg3uEfEFcCfF5NkCnBmpKuBlSWt\n0akMmpnZkuvEIKZxwP2N13PKtgfbE0ralazds9Zaay31G6499eKlPna4uffI93U7C2ZWoSF9oBoR\nMyJickRMHju239GzZma2lDoR3B8AJjRejy/bzMysSzoR3C8EPlF6zWwKPBYRf9ckY2ZmQ6ffNndJ\n5wJbAKtJmgN8BVgGICKmAzOB9wKzgaeAXQYrs2ZmNjD9BveI2L6f/QHs3rEcmZnZi+YRqmZmFXJw\nNzOrkIO7mVmFHNzNzCrk4G5mViEHdzOzCjm4m5lVyMHdzKxCDu5mZhVycDczq5CDu5lZhRzczcwq\n5OBuZlYhB3czswo5uJuZVcjB3cysQg7uZmYVcnA3M6uQg7uZWYUc3M3MKuTgbmZWIQd3M7MKObib\nmVXIwd3MrEIO7mZmFXJwNzOrkIO7mVmFHNzNzCrk4G5mViEHdzOzCjm4m5lVyMHdzKxCAwrukraR\ndLuk2ZKm9rH/5ZJ+IukGSbdI2qXzWTUzs4HqN7hLGg2cAGwLTAS2lzSxLdnuwK0RMQnYApgmadkO\n59XMzAZoIDX3TYDZEXF3RDwHnAdMaUsTwIqSBKwA/BmY19GcmpnZgA0kuI8D7m+8nlO2NR0PvBb4\nA3ATsHdEvNCRHJqZ2RLr1APVrYHrgTWBNwDHS1qpPZGkXSXNkjRr7ty5HXprMzNrN5Dg/gAwofF6\nfNnWtAtwQaTZwD3Ahu3/o4iYERGTI2Ly2LFjlzbPZmbWj4EE92uA9SWtUx6Sbgdc2JbmPmBLAEmv\nBDYA7u5kRs3MbODG9JcgIuZJ2gO4FBgNnBYRt0jareyfDhwOnC7pJkDAgRHxp0HMt5mZLUa/wR0g\nImYCM9u2TW/8/QfgPZ3NmpmZLS2PUDUzq5CDu5lZhRzczcwq5OBuZlYhB3czswo5uJuZVcjB3cys\nQg7uZmYVcnA3M6uQg7uZWYUc3M3MKuTgbmZWIQd3M7MKObibmVXIwd3MrEIO7mZmFXJwNzOrkIO7\nmVmFHNzNzCrk4G5mViEHdzOzCjm4m5lVyMHdzKxCDu5mZhVycDczq5CDu5lZhRzczcwq5OBuZlYh\nB3czswo5uJuZVcjB3cysQg7uZmYVcnA3M6vQgIK7pG0k3S5ptqSpi0izhaTrJd0i6fLOZtPMzJbE\nmP4SSBoNnABsBcwBrpF0YUTc2kizMnAisE1E3Cdp9cHKsJmZ9W8gNfdNgNkRcXdEPAecB0xpS/Nx\n4IKIuA8gIh7ubDbNzGxJDCS4jwPub7yeU7Y1vQZYRdJlkq6V9Im+/keSdpU0S9KsuXPnLl2Ozcys\nX516oDoG2Bh4H7A1cLCk17QniogZETE
5IiaPHTu2Q29tZmbt+m1zBx4AJjRejy/bmuYAj0TEk8CT\nkq4AJgF3dCSXZma2RAZSc78GWF/SOpKWBbYDLmxL82Pg7ZLGSFoeeAtwW2ezamZmA9VvzT0i5kna\nA7gUGA2cFhG3SNqt7J8eEbdJugS4EXgBOCUibh7MjJuZ2aINpFmGiJgJzGzbNr3t9deAr3Uua2Zm\ntrQ8QtXMrEIO7mZmFXJwNzOrkIO7mVmFHNzNzCrk4G5mViEHdzOzCjm4m5lVyMHdzKxCDu5mZhVy\ncDczq5CDu5lZhRzczcwq5OBuZlYhB3czswo5uJuZVcjB3cysQg7uZmYVcnA3M6uQg7uZWYUc3M3M\nKuTgbmZWIQd3M7MKObibmVXIwd3MrEIO7mZmFXJwNzOrkIO7mVmFHNzNzCrk4G5mViEHdzOzCjm4\nm5lVaEDBXdI2km6XNFvS1MWke7OkeZI+0rksmpnZkuo3uEsaDZwAbAtMBLaXNHER6Y4C/rvTmTQz\nsyUzkJr7JsDsiLg7Ip4DzgOm9JFuT+AHwMMdzJ+ZmS2FgQT3ccD9jddzyrb5JI0DPgSc1LmsmZnZ\n0urUA9VvAAdGxAuLSyRpV0mzJM2aO3duh97azMzajRlAmgeACY3X48u2psnAeZIAVgPeK2leRPyo\nmSgiZgAzACZPnhxLm2kzM1u8gQT3a4D1Ja1DBvXtgI83E0TEOq2/JZ0OXNQe2M3MbOj0G9wjYp6k\nPYBLgdHAaRFxi6Tdyv7pg5xHMzNbQgOpuRMRM4GZbdv6DOoRsfOLz5aZmb0YHqFqZlYhB3czswo5\nuJuZVcjB3cysQg7uZmYVcnA3M6uQg7uZWYUc3M3MKuTgbmZWIQd3M7MKObibmVXIwd3MrEIO7mZm\nFXJwNzOrkIO7mVmFHNzNzCrk4G5mViEHdzOzCjm4m5lVyMHdzKxCDu5mZhVycDczq5CDu5lZhRzc\nzcwq5OBuZlYhB3czswo5uJuZVcjB3cysQg7uZmYVcnA3M6uQg7uZWYUc3M3MKuTgbmZWoQEFd0nb\nSLpd0mxJU/vYv4OkGyXdJOkqSZM6n1UzMxuofoO7pNHACcC2wERge0kT25LdA2weERsBhwMzOp1R\nMzMbuIHU3DcBZkfE3RHxHHAeMKWZICKuiohHy8urgfGdzaaZmS2JgQT3ccD9jddzyrZF+TTw0752\nSNpV0ixJs+bOnTvwXJqZ2RLp6ANVSe8ig/uBfe2PiBkRMTkiJo8dO7aTb21mZg1jBpDmAWBC4/X4\nsm0hkl4PnAJsGxGPdCZ7Zma2NAZSc78GWF/SOpKWBbYDLmwmkLQWcAGwU0Tc0flsmpnZkui35h4R\n8yTtAVwKjAZOi4hbJO1W9k8HvgysCpwoCWBeREwevGybmdniDKRZhoiYCcxs2za98fdngM90Nmtm\nZra0PELVzKxCDu5mZhVycDczq5CDu5lZhRzczcwq5OBuZlYhB3czswo5uJuZVcjB3cysQg7uZmYV\ncnA3M6uQg7uZWYUc3M3MKuTgbmZWIQd3M7MKObibmVXIwd3MrEIO7mZmFXJwNzOrkIO7mVmFHNzN\nzCrk4G5mViEHdzOzCjm4m5lVyMHdzKxCDu5mZhVycDczq5CDu5lZhRzczcwq5OBuZlYhB3czswo5\nuJuZVWhAwV3SNpJulzRb0tQ+9kvSN8v+GyW9qfNZNTOzgeo3uEsaDZwAbAtMBLaXNLEt2bbA+uXf\nrsBJHc6nmZktgYHU3DcBZkfE3RHxHHAeMKUtzRTgzEhXAytLWqPDeTUzswEaSHAfB9zfeD2nbFvS\nNGZmNkTGDOWbSdqVbLYBeELS7UP5/kthNeBPg/kGOmow/+8vyqCXfZgbyeV32Ye3Vw8k0UCC+wPA\nhMbr8WXbkqYhImYAMwaSseFA0qyImNztfHTDSC47jOzyu+x1lH0gzTLXAOtLWkfSssB2wIVtaS4E\nPlF
6zWwKPBYRD3Y4r2ZmNkD91twjYp6kPYBLgdHAaRFxi6Tdyv7pwEzgvcBs4Clgl8HLspmZ9WdA\nbe4RMZMM4M1t0xt/B7B7Z7M2LPRME9IgGMllh5Fdfpe9Asq4bGZmNfH0A2ZmFXJwN7MhJUndzsNI\n4OA+DPhkr5+/4wXCbcFDwsG9CyQdLOmjkt4IebLXfvFLmizpDd3Ox1CT9M+SlnVAA0kflnSypFdL\nWrXb+Rlqkj4h6Z1D9n4+54aepK3IOXvWAp6LiD3LdtUYBCStDXyEHCNxOnBFRNzYxSwNCUmvAA4D\nVgcuAq4fCeVeFEkrAfsCKwHLAWeVuahGBEk7ATsClwF/iIgzBvX9Kowlw5akw4FTIuL3kl4GvAT4\nEflFf6ykqSrAS9oG+HVEPCbpNcBuwAvALRHx7e7mbvBIel9EXFz+ngKsC2wNTIuIn3U1c0NM0p7A\n6hFxcHm9AbA58BnggIi4rIvZG3Sl/N+NiIckrQdsBHwc+H1EfGGw3tfNMkNLwEWSxkXEkxHxSES8\nAxgr6Tyoqz1S0vrAVsBukl4VEXcA/wFcD2ws6cNdzeAgkTQW2F/SIQAR8WPgZOBM4IihvDUfJn4O\nbCTpIICIuL1MRXIC8O+SXtfV3A0iSaPISRRPlvTKiLiLHNH/JWCipC8P1ns7uA+BMic+EXEQcD7w\nobJ9TNn+bmAVSft2LZODICLuBH4IvJSc65+I+DN5ct8KvF7SMt3L4eCIiLnAp4FJkt5Vtj0REeeQ\nax3sKGnNbuZxKLSeI0XEbcABwD1tSb5Pjnx/WzN9TSLiBeBgYBbwT61tETEbOAgYL2mzwXhvN8sM\nMkmjI+L51n8Xs//NwNYR8R9dyGZHtZdV0tgS8OY3O5Xgdjrw7Yg4t0tZ7ag+yr1KRDzalmZd4LPA\nuRFxY23NcC3Nz0LSyyLiyca+USXoIWlrYGfg4zV9Dou63tvSrAjsATwYEad3Og+uuQ+ichI/X27N\njpf0FUmflfSqVprGCfAw8AZJG3Ylsx3S+LEaJemTknYA/lL2jWr1DIqIPwCHAFtIenk389wJbeWe\nImlz4Mn2dBFxN/AIMLW8riagtTTPe0nfAY6R9EVJa8H82izl70vJ9R+26VJ2O66t/PtJ2rNMvLjQ\ndC8R8TjwM2CXwbiTc819kJVbzf8GLiEfJH4S+FREXNdH2g2Bh0vTRc8qP2Y/AX4DvInsLbJFWcmr\nmW5F4F3Axf3VcnpBKfdlwJXAPwO/AL4SEY+V/fNr6ZI+CMxs/0xqUc77c4GrgBvI82GriPhNI82o\niHihBLbHmrX7XlfKfynwP8Akckr07SJiTjNNqey8FbguIp7pZB5ccx98GwDXRsQ0cq3Zb0fEdZLG\ntbc3R8T/9nJgb7SZfp68qI8AlgfObwaxRlvs48DPez2wN8q9D3BJRHwJeBa4qfQSaj1zaY5nuLS2\nwF5+3FpeDdxHdgHdDzg0In5THjYDC9XgH6wlsDe+3w8D10TEkcCqwHkRMUfSS1ppG3dtszod2MHB\nveNaF3LDXLK55R7gwog4tqTZj0qWImwGr7LpfuBp8m7lF6XM4yR9ri0dEfHUUOe3U/oo92xgOUlX\nkhfzqaUpYn6TQyttRDw91PkdTKVZ6gWlUcAfgGWBi4HLImJa+bxOlrTQYhg1NE0t4lxYSdK15A/5\n8ZJWB/bro1I3KD/yDu4d1NbWdrFysNITZLPMb4DWAJZzgVdExL1dymrHtLU1f0XSePL5wf7ArRHR\nWkjwm8A6Xctoh7WV+9BycT9HBvKfRcTRJenxlF4StWp7tnQJ8PkSsO4ha++/Kz9yZwOPRMSsLma3\n49rOhXOU/fifJJtifg2cWJKeAKwdEX8binwN6RqqtWvVXMjayvVRBqtIupQM8tMk3U+2L+5S9vV0\nb4nGRX0ZOVhpDjBH0ieBU8u+DYD7IuIA6P0yw0LlvpgcjPI88FNJk
4ANJJ1NjsT8c0Ts0828DrZG\n88rPgCsj4vjyuhXUPga8kjwH9oY6zoGWxrnwc+CGiLgdQNLRwOeA7ypXsXswIj5b9g16+f1AtQO0\ncLevDYAvAoeS7Y47Ag+RtfffAS+NiIdK2lGNC6OntD0c3BXYKCL2lPRa8tnCLLIv+0Rg5Yi4sKTt\n2TLD35V7L+BVEfElSePIcv+CbG57BbBsRHy/pO3pcvdFC3dpXAU4Dvg34IPAlsBrgL3KM6ZXtJ4n\n1fJZtJV/K+D9EbG3pDcBO5C19huBAFaLiF+3Hzeo+XNw74xSY59EfpmnkkOM/z/ZzWt14LZo9GWt\npeYiaTlynpwzyPV2HyV7Bf0r2Tvid420VZS5RdKnyID+RPk3mbwd3z5Kv/6SrqpyN5Xz/mMRcZ6k\n7wHjySbIH5Gjk8dExIHN9DV9FqX8awIrkwMU7y3/ngXeCXwhGtMrDGX53SzTOVsDPwA2I2svG0TE\nTQCSzicv/vlqOMElHUC2M38L2Ju89f5uRPxV0spA+4Ojni8zgKT9geUj4rBSzr8APyg9Y2aSzVDz\ng3st5W5qBKk3A/8qaYWI+KikNSLiwZLmE2Sgm6/Cz2InYAq5bvQu5B3b/0TE38qP3auaiYey/H6g\nupTaun0REZeQ68heAGwaETcppzb9LvBsRBzejXx2UnuZyeamiWTT0+URcUom0w/I8/i3Q53HwaDG\n4JNyp3I7MEHS3hHx9Yg4DVheOWDnkYi4slt5HWytrn6NIHUb+eO+maT9I+JBSStLuohsGTiseVyv\n09/3hruJnCtpGvBo5KCsNSR9n7zuzxvqPLY4uC+lRrevvSW9umw7nRx1eapyOoFHyG5Qn4A+g2NP\naZR5h/L6DPIh2sZk7e2VZE1ubkTsAHVc1BExr/SEeENEPEu2q38P2FBSq8nh/cBfI2InqKPciyNp\npqQtIscqXEn2AHuTpM9FxF+A0xudBkbVUmMvD08laZ/y+nfk/El3AweWXkEvB+6KiB2hi9d9RPjf\nEvwDRjX+fgkwk+ziNIFsX4Rse3sCWKuv43rtH+XZTPl7Ejnq7sDGtk8DdwD/D1ixhjL3Ue73ks8S\nNi+vVwQ+QQa2A9uO6+lyL+KzGNX2em+y8vK28noZ4BTyIeK7a/ss2q77lwF/BU5qbHtrORe+P1yu\ngZ6uSQ41LTxQYyIwmhxmvgzZQ6Y1P8QdwH4RcV/r2OjR3gGlzK3eIWMi4gby7mSSpNb8KKcCD5C3\npY+3ju3VMsPC5QaIiJlkl74fN2qsF5ODdR5rHKdeLndf2s77LSWtFxHHknPzXyzprZF9t0cBx0XE\n/7SOreGzaJW//L165GjaNYC3SfoWQERcRdbefzJcrgH3lllC5RbrB2St/Wng5oj4sqQTyV/0SeQ8\nEZ9qpe/VE1wL5v4YBZxGTt17K1n+VclRtnPJ29D7IqKKKYvbyn0M2V/9arLp4Z3AOeQUC9uRUwx8\nsRxXVU+QpvJZ/Ax4ijwPrgeOBN5d/nsv2RxX1aIzrXKU8n8HWAW4i3y29ivgOuBaspfQnRHxb83j\nupRtwMF9iUn6OrlE2F7k6jpHk1/u4cDrgXGlljcsvuBOkHQOWTM/i5yX+w7yRP8bGeQei5xDo5oy\nw/zBZzcBt5Dz0Y8lm57eBbydfK5YfWAHUE4d8bqI2F3SJsCmwBoR8UXl2I7loiwhWONnIel0siJz\nAtn99Q3k9XAduapSlDvYYVN+d4XsRx9f1FPAryIfrNxFTrj/JbKd7QZyBryerrE3SVqNrK1MjYj7\nSjfAE8mT+XCyOaqVtooyAyinZX4qIvYvr/+B7A31LxFxlqRLS1NEVeVu6aNMq5DPGYiI35YeRIcp\nV9i6vXHcsAhsnSRpefJu5YyIuLf0gFse+FBkz6hTGmmHzbngNvd+9HGi3g58VdJa5Uu8j6zRrdJ2\n3LD4gpdU+5P9iPgT2Zb4XkmrR
sQfgaOAvma17Mkyw8LdHYuHyeUPDwCIXDnnTrJnENGYH6SXy70o\njTb28WXTOcAyykWeW23Mz5MdCZrH9Xxgb+/pFDm53Z1kb5hXRcQj5Bw66yoX/W6mHTbngoP7Iqix\nvqekCyWdL+nTZLenacDlpUvgGWRb2++7lNWOkbRm48HRVEn7l1rLBcAbgU+WB8n7k00xQzIB0mAr\nF+w8SWOUXVt3LJ/DoeSP2MEl6eY0BifVSNL2yi6tkMshfl/Sf5Kx4kfAByWdK+mHZBv7Nd3K62CQ\n9OZGB4JjJJ0gaR3yx+1O4ITSLHUIcE9E/LV7uV08t7n3QdIawBVkl8ZnWfAgsdV3/evAe4C1ybbG\n/yrH9ewtqaRdgA3Jpe/+k2x+ep6sqW4MvIWcM2Rd8qTepxzXs2WG+T/i7yPnnj8KeIZ8dvJDMpit\nSK6B+SjwZETsXI7r6XL3RTkl7aFkR4GXkVNnnEG2M19FPkh/nOwW+mSUATq1fBbK1bP2IgP5P5Pf\n+UvJ6UOOBf4I/As5u+mjUaZVGK7ld3BvI2mziPi1cgKsacDqETG57PsnMqg/AZwcZZh12Tds2tqW\nlHIBhZcCXyb7774kIj5f9n2LnFJhs4h4Uo11QXu5zAClWenVwKfIrm3PRMTnSpD7Ojkn93RyJO6K\nrVpar5e7L8rJ384ke33sRE5dvGdpX1+b7BEzGzgxconE1nFVfBaSXhsRt0n6LDlH0KsiYkrZN5X8\nwZ8eEVc0g/lwLr+bZRqUI00vLbfltwH7ku2uxwBExM/J2R3HkTXZ+YbrF9yf8oN1KNlX/8tkt78N\nlSNsKV27rgIelPRSFqyH2tP9uUuN/SdkP/VTyHJtLmmDiHiY7BX0GuArwCqNwN7T5e5L+ZF7I/k5\n3E92+bwV+JykdSPXHfgCec6/tXlsDZ+FpI2A8yVtGREnk73fNlBOW03pCXYjcEg5P1qBfVifC665\nF61fY0kbk0PLPxQRNyjXNZ0BXBERB5W0/xgRt3Qzv51SurFtTf5gHUoOxPoi2d3x4oi4uaTbOQZh\nhfZuUS539mVgPbIr24ZkjXUMMCMi7igPEz8QEScu+v9Uh/LDfRg5J/8F5TqYAqwAHB8Rd6sxbW9t\nJG0LHEiOOn6CXCbvTeQkYN8rad4fERd1L5dLxsG9QQsGr6zZduu5AXAyORH/no3tw7KtbUlJWg8Y\nGxFXl9evJe9a7iUD/PWNtMP2NnRJlQD/jliwqMrGwAfIQVkzIuLWRtoqvuvFUY5Antd4/Wby8xhH\nTrEwt2yv8rOQtH5E3Fn+fiU5X9BbyAVIzmyk64nyu1mmoRW0WoFdpVtg6cf7ebLttZl+2H/BAxER\ndzUC++jSJDWNfJC6VlvaKgI7QEQ80wrs5fW1ZA8RyGaKZtoqvuvFaQV2LVgP9Bqyy9/MaMxPX+tn\n0QjsoyIX1LmIHMA2pi1dT5R/RNfcl/YXuFd+uZdWo4lq/tzcI4mktaOC9W07rfbzvi+SVhrO3R0X\nZ0QHd5hfU32+2/kYzmq4qAdShvY0NZS7XenT/8cay7YklrT8vfh5jchmGeUAnbNgwfzM3c7TYJP0\nfklvX5pje+2kbpL0KeU87NHf99xezl4ud18kbQkcL2nDRZWt5mtB0s6Sdocl/2578VwYkcGdbE9e\nSzmTI31d+Pr74eg9S7kU3NvJrn5vWUy6qi7sUu5Vgb0kbdTswtaWrqpyL8btwOXAAco5g+Yrn1VP\nBrGBKN09HwLeqFz7dlHpqomJ1RRkIBoPiv5Grs4+q23/qyVdVB6ozKvhoi/NTn8hV6YXOVipuX+c\npI9KWqa2C7uU+1xyEYV/aGwPSetJmtJ63aUsDolGx4A55GISp0TOGdTa/3JgmqQPdSmLg65c85eR\n58OjzX2SJkg6ZLj3W19SIya4t9rWlcul7RAR8yLXvpzfnhY5P8xfyUDY8xd9o8yjySf+R0ZjJfZ
i\nU3KO8jXbj+9Vkl6vHEoO2Z3tyoj4YVuydYFPSZo0tLkbWlqw0MYoSSuS88FcVfa1Ki/PAb8EJtZ0\nxwoLKnQAEfF0RPyij3NhObK7Z1U/biMmuLcCO7ks3mZt+0ILZjicCtyoHNTRs8oPVqvMPwA+1taH\nufXdXwQsSw5k6nmlXM8A+0q6nZw+4o4+kv6WHInY09/z4jQrNOTSiDOA77WaYCjXf0Q8DdxMBrme\nv1ttaV4DkmaU2vmHmvvLn/cA15DT+FZjRAT3xpf4NeCOiNhD0kqSPiDpdbDQFK5/BM4uJ3zPatx1\nnA3MioivwvwBWa0pXUdHLvi8L7lkXE/TguXQ7icn/BoN9DlbZ0Q8Rga8e4Yuh0Or0QtsHzJ47UlO\nUX2ppJVL4FumpL0e+GpUMtMnLHQNnESunHYP8F9aMK1ANHrLfY9caaoaI6orpKTdgNWAiWS729vJ\ndrjDIudo7nnNLluSViUngzqOnNluM3LE4YyIOLSkqWLEaVstdSNyVstXkoHtsog4rqQbX9qeqyXp\nA8DvIuJ+SdOATYCDIuLysv8Yco6YbaPC6QQkbRw5IA1JRwMrRMRu5fXm5GjzI6JMp9GL3RwHYqTU\n3M8vt2MXkkvD/Y6cR+KfyDlFVu9i9jpGCy9mPar8YJ0JfIucBOssYCtgbeXQ+ypGnLY1QV0CfC5y\ntOFvgVOBLSXto5zh8t3dzOtgk7QWMLoE9pXJVbOeAN5V2tyJnK75NnJ626qU7p7rNjaNAt4kaVPl\n9AqXA/8GHCfpHdD7z9YWpcqae3ttVNJ2wH8Au0fEpWXbsuSiz09FxK7dyWnntNVcjyd/xC6JiJ9K\nWjEiHi8Py2aQc3Hvudj/YY+QtE5E3FP+Phl4JCKmlu93Y3J+nHXIWQ0fjrKAce2U8/NvTM7Nvzz5\nAz8TOLU0SVVJ0rIR8Vy5S/9TRHxf0hHk2guHkc2yL6hM8dvVzA6yKmvusWCJsDVLre48cvWgk8ot\nK2Q782OtwN7r3R4bvWIuJFdn/z05YGUvYBVJy5GLj4xqBfZeL7NyuuL3NjY9BPyvpG+Qiyt8H9it\n9A75cDRWph/yzA6yZq+Q4jqyaWp34ElgN3Jh572bnQVq+Sy0oJvzc+WudGXybmXbiPgS+azhELJH\nkFqBXRX1a29XXc291X6mnGD/LcB+5MpBoZyI/yBgV+CXEfFcOaZn250lfYzs6vdA+eGaCBxDPiC9\nm2x3vgL4LvCKyEW8e7rMAJJWiIgnysV5ENnrZzlgZ3IFoRnkQ9VvkgH+vnJcde2rbXdtXyQ7BVxD\nrgO7L9l76EQy4E2JiKO6ltlB0Fb+ycDjsWDhjTcAF5U72OnATRFxQlczPESqCe7qY44YSceT81Ef\nGhH3KFe39M6ZAAAG90lEQVSUOQ6YHRUsE1faUH9EPkM4MiL+VJojDgBeiIgjyi3ptsD+EfGLclzP\nlhlAOfhoP+AjEfGwco3TycDURo1sDLn4xNMR8bnu5XZolMD2Y/JceI6cxXQd4B+Bj5FNMwe1mmR6\n/RxoV8p/MbmoxjvIRXX+k2xf35Ccl/2C7uVw6FVxS6KFBygdKmlHSf8QEXuQaz4eWh6oHkvOT75P\n69hePcGVI0ofB/6VbFs9VLkE3nPkqkqvL0lXB85rBXbo3TK3RMSPyW5rp5Rb8OnAz4GvasH0CtOA\nZ1uBvZbmhyblCkItHyBH4h5BDko7KiKejYjryIfqv2q2tff6OdCHaeQ5cHh5/VDp1nky2US5Yith\njedCX6qquZMX/FXABGAVcr3HSyTtS45AW6HZ7tqrJ7gWLCqyLPBt8gfsX8ha/FTybuVEcl3QmyPi\nk+W4ni0zgKSdgLdGrnM6mlxo47KybzWyhro1ucLSA7FgcYmeboLqi6T3kQuWX1CaHLYiV5SaAPwk\nIo4td3afB46NiGfKcT19DrS0l0PSnsD/knd0v4yIoyStTy6
8cl1t3/9A9HRwl7QpcFdEzJW0D9kF\n7GhJvyYX830BOCty7dPmcT17sTd6vowm21dfFxHbKRd1/hYwl+zm+SQwKXLBhSou6hLA9wC+HhF/\n7eMCX418nvJMRHy9bOv5cvdF0prkknCrkb1griQHZSki3lbS/ACYExF7dy2jg2ARTbC7kN1eD4yI\nr5VtPySfR00rr6s8FxalZ4N7uf0+GDgH+CHwMvLB0XHA9aXmciXwGDlg4VfluJ79gssD0y+QPT8e\nlrQj8Gayrflp5dJgt5Jtr3tFxBPluJ4tc5OkVcgHpb+IiOllW3uAX6FV7ho1Kyblx+wz5B3a+WSF\n5ifAA+T1cG8Nd6p9KW3sPyUrM9eQd7C7kQ/UjybvZOdGxC7dymO39Wybe0T8BvgO8C5yMdtnya5f\nryC/bMiT/HutwF6O69kTPCIuJB8UnVa6sz1E9obZUNJykUuDXUL2DnqicVzPlrkpIh4lxyv8u6SP\nl22horye/4PWvZwODuXKWK1uvmcDGwBnkL1jPko+QH0b2e58WCOwj6rhHNDC3T13Ix8enw2MJyt6\nJ5GLvL+ErADsUo7r2Tj3YvRczV3SuhFxd+P1d4BJwJHkdJ6fJk/uW4E/RsTHS7qerbn009Y8lezu\n9Rg5re0tEbFX2dezZV4cZf/2b5JzoZze2N6zzW39kfRBshJzRGQ3v9OBQyLiXkmvAj5Jzuz5s4i4\nqHFcFedA4znTKOAjwBbASRFxU2menULOfHpMLLy4fbXnRH96anpPSe8he4XsHRG/VY5Cm0B2eduy\nJDuFnO1vQulVUcMJ/lNgPZX1HCVd3toREUcqV1haBRi3qOaKmkTEzyXtCpwlaSWya+vMWi9iScuT\ng5LeCHxe2V/7IbLLI5HL5n2bnE5gxeaxNZwDymkDWusrXERWZDYhZzP9TERcLSmAncigf07r2FrP\niYHomZq7pBXIh4R7k5MezSFP9p0iYo6kHYBtyAE7344FK7n3/C/3otqaye/v78pWQ5kHovSG2Iqc\nS+TmZi2+Fso+/fuQc42/lGxjn0AO0PsZGfT/Aowlu/nO7VJWB4UWnjrjS8A6EbGLclK8S8meMV8o\nadeLiLu6md/hpCfaosoJfjE5wvIbwNVkT4FjS2BXRJxNLjgwOhrzltcQ5BbV1gz0+ctcQ5kHIiLu\njIgTI2J/oMoBKuXu85fkA8O55APTh4CnycngViUD/aRY0PWziucNpQPBzNITbAXywfEmyjVxHwHe\nR04KdzJAK7DXUv4Xq5dq7l8mB+u0RtvtQN6CHR0Rv+5i1obMSGxrHqn6ec7yOjKwjQe+0ayt1tYc\npwWjjz9CPjDeEVgJOC0ibpS0BvD5iDi4i9kcloZ1cO/nBG8OWvlqRFzZvZwOndK+fhY5f8zsiJjZ\n5SzZIFD/ffonkW3MN0bEmd3K52Do47p/Z0T8suybRI7GXR04Pcq87WVfVT9sL9Zwb5b5KfBQeZD4\nPLlyOwCRC/yeT45I3aRL+Rty5UfsPcA84N2Sdu5ujmyQPE/OC7NQl8/WzsgJ4I6tLbAX7df9Za0d\npdw/Jse0TGwe5MC+sOFecx/xg1b60+pB0+18WOeVWupF5KjLc8q2Vn/+5jVQVY11gNf9hIi4v1t5\n7AXDOrjDyD3BzWDkPmfxdf/iDft+7hFxg3LeiG8qV1k5vfVlNk9wf8FWo5HWp7/F1/2LN+xr7i1+\nkGgj2Ujo098XX/dLr2eCO4zcE9ysaaQ9Z/F1v3R6Krg3jbQT3Mx83S+Jng3uZma2aMO9n7uZmS0F\nB3czswo5uJuZVcjB3cysQg7uZmYVcnA3M6vQ/wEK6R9jqTacaAAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "ibm1 = 
ibm1_iterations[-1][1]\n", "ibm2_iterations = em_model2(ibm1, train_model_2, 100)\n", "ibm2 = ibm2_iterations[-1][1]\n", "plot_table_for_target(ibm2.alpha, \"house\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The corresponding alignment for the first training pair is:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "run_control": { "frozen": false, "read_only": false } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", "\n", " \n", " NULL the house is small\n", " \n", " \n", " klein ist das Haus\n", " \n", " \n", " \n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "word_mt.Alignment.from_matrix(ibm2_iterations[-1][0][0],\n", " train_model_2[0][1], train_model_2[0][0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us look at the distortion probabilities over target positions for a fixed source position (here position 0), a target length of 5 (including the NULL token) and a source length of 4." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "run_control": { "frozen": false, "read_only": false } }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXcAAAD8CAYAAACMwORRAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAADEJJREFUeJzt3H+o3fddx/Hny6TFydQOc5WaHyZ/xGlQW+c1FiZYN6ZJ\nJ4aBf7TTFctGKLRSQbDxH0X2j2MoY6xrCDWUoSwIKxrn1Tiw2j+2alLt2qY145LNJlkhqdXpHFhi\n3/5xz+R4muZ8783JPev7Ph9w4Xy/30/v9/2l5MmX77nnpKqQJPXybfMeQJI0e8Zdkhoy7pLUkHGX\npIaMuyQ1ZNwlqSHjLkkNGXdJasi4S1JDm+d14i1bttTOnTvndXpJelN66qmnXq6qhWnr5hb3nTt3\ncurUqXmdXpLelJL8y5B1PpaRpIaMuyQ1ZNwlqSHjLkkNGXdJamhq3JMcTXIxyXNvcDxJPp5kOckz\nSd4x+zElSasx5M79UWDfVY7vB3aPfg4CD1/7WJKkazE17lX1BPDKVZYcAD5VK54Ebkpy86wGlCSt\n3iyeuW8Fzo1tnx/tkyTNybp+QjXJQVYe3bBjx471PLWkN7mdh/5i3iPMzFd+773X/RyzuHO/AGwf\n29422vc6VXWkqharanFhYepXI0iS1mgWcT8O3D36q5nbgK9V1Usz+L2SpDWa+lgmyaeB24EtSc4D\nvwPcAFBVh4El4A5gGfgGcM/1GlaSNMzUuFfVXVOOF3DfzCaSJF0zP6EqSQ0Zd0lqyLhLUkPGXZIa\nMu6S1JBxl6SGjLskNWTcJakh4y5JDRl3SWrIuEtSQ8Zdkhoy7pLUkHGXpIaMuyQ1ZNwlqSHjLkkN\nGXdJasi4S1JDxl2SGjLuktSQcZekhoy7JDVk3CWpIeMuSQ0Zd0lqyLhLUkPGXZIaMu6S1JBxl6SG\njLskNWTcJamhQXFPsi/JmSTLSQ5d4fh3J/nzJF9McjrJPbMfVZI01NS4J9kEPATsB/YAdyXZM7Hs\nPuD5qroFuB34/SQ3znhWSdJAQ+7c9wLLVXW2ql4FjgEHJtYU8J1JArwVeAW4PNNJJUmDDYn7VuDc\n2Pb50b5xnwB+GPgq8CzwQFW9NpMJJUmrNqs3VH8eeBr4fuBW4BNJvmtyUZKDSU4lOXXp0qUZnVqS\nNGlI3C8A28e2t432jbsHeKxWLANfBn5o8hdV1ZGqWqyqxYWFhbXOLEmaYkjcTwK7k+wavUl6J3B8\nYs2LwLsBknwf8Hbg7CwHlSQNt3nagqq6nOR+4ASwCThaVaeT3Ds6fhj4MPBokmeBAA9W1cvXcW5J\n0lVMjTtAVS0BSxP7Do+9/irwc7MdTZK0Vn5CVZIaMu6S1JBxl6SGjLskNWTcJakh4y5JDRl3SWrI\nuEtSQ8Zdkhoy7pLUkHGXpIaMuyQ1ZNwlqSHjLkkNGXdJasi4S1JDxl2SGjLuktSQcZekhoy7JDVk\n3CWpIeMuSQ0Zd0lqyLhLUkPGXZIaMu6S1JBxl6SGjLskNWTcJakh4y5JDRl3SWrIuEtSQ8Zdkhoa\nFPck+5KcSbKc5NAbrLk9ydNJTif5u9mOKUlajc3TFiTZBDwEvAc4D5xMcryqnh9bcxPwSWBfVb2Y\n5Huv18CSpOmG3LnvBZar6mxVvQocAw5MrHk/8FhVvQhQVRdnO6YkaTWGxH0rcG5s+/xo37gfBN6W\n5G+TPJXk7lkNKElavamPZVbxe34CeDfwFuALSZ6sqi+NL0pyEDgIsGPHjhmdWpI0acid+wVg+9j2\nttG+ceeBE1X1X1X1MvAEcMvkL6qqI1W1WFWLCwsLa51ZkjTFkLifBHYn2ZXkRuBO4PjEmj8DfjrJ\n5iTfAfwU8MJsR5UkDTX1sUxVXU5yP3AC2AQcrarTSe4dHT9cVS8k+SvgGeA14JGqeu56Di5JemOD\nnrlX1RKwNLHv8MT2R4GPzm40SdJa+QlVSWrIuEtSQ8Zdkhoy7
pLUkHGXpIaMuyQ1ZNwlqSHjLkkN\nGXdJasi4S1JDxl2SGjLuktSQcZekhoy7JDVk3CWpIeMuSQ0Zd0lqyLhLUkPGXZIaMu6S1JBxl6SG\njLskNWTcJakh4y5JDRl3SWrIuEtSQ8Zdkhoy7pLUkHGXpIaMuyQ1ZNwlqSHjLkkNDYp7kn1JziRZ\nTnLoKut+MsnlJL80uxElSas1Ne5JNgEPAfuBPcBdSfa8wbqPAH896yElSasz5M59L7BcVWer6lXg\nGHDgCut+DfgMcHGG80mS1mBI3LcC58a2z4/2/Z8kW4H3AQ/PbjRJ0lrN6g3VjwEPVtVrV1uU5GCS\nU0lOXbp0aUanliRN2jxgzQVg+9j2ttG+cYvAsSQAW4A7klyuqj8dX1RVR4AjAIuLi7XWoSVJVzck\n7ieB3Ul2sRL1O4H3jy+oql3ffJ3kUeCzk2GXJK2fqXGvqstJ7gdOAJuAo1V1Osm9o+OHr/OMkqRV\nGnLnTlUtAUsT+64Y9ar61WsfS5J0LfyEqiQ1ZNwlqSHjLkkNGXdJasi4S1JDxl2SGjLuktSQcZek\nhoy7JDVk3CWpIeMuSQ0Zd0lqyLhLUkPGXZIaMu6S1JBxl6SGjLskNWTcJakh4y5JDRl3SWrIuEtS\nQ8Zdkhoy7pLUkHGXpIaMuyQ1ZNwlqSHjLkkNGXdJasi4S1JDxl2SGjLuktSQcZekhoy7JDU0KO5J\n9iU5k2Q5yaErHP/lJM8keTbJ55PcMvtRJUlDTY17kk3AQ8B+YA9wV5I9E8u+DPxMVf0o8GHgyKwH\nlSQNN+TOfS+wXFVnq+pV4BhwYHxBVX2+qv5ttPkksG22Y0qSVmNI3LcC58a2z4/2vZEPAn95pQNJ\nDiY5leTUpUuXhk8pSVqVmb6hmuRnWYn7g1c6XlVHqmqxqhYXFhZmeWpJ0pjNA9ZcALaPbW8b7ft/\nkvwY8Aiwv6r+dTbjSZLWYsid+0lgd5JdSW4E7gSOjy9IsgN4DPhAVX1p9mNKklZj6p17VV1Ocj9w\nAtgEHK2q00nuHR0/DPw28D3AJ5MAXK6qxes3tiTpaoY8lqGqloCliX2Hx15/CPjQbEeTJK2Vn1CV\npIaMuyQ1ZNwlqSHjLkkNGXdJasi4S1JDxl2SGjLuktSQcZekhoy7JDVk3CWpIeMuSQ0Zd0lqyLhL\nUkPGXZIaMu6S1JBxl6SGjLskNWTcJakh4y5JDRl3SWrIuEtSQ8Zdkhoy7pLUkHGXpIaMuyQ1ZNwl\nqSHjLkkNGXdJasi4S1JDxl2SGjLuktTQoLgn2ZfkTJLlJIeucDxJPj46/kySd8x+VEnSUFPjnmQT\n8BCwH9gD3JVkz8Sy/cDu0c9B4OEZzylJWoUhd+57geWqOltVrwLHgAMTaw4An6oVTwI3Jbl5xrNK\nkgYaEvetwLmx7fOjfatdI0laJ5vX82RJDrLy2Abg60nOrOf512AL8PK8h5iTjXztsLGv32u/zvKR\na/rPf2DIoiFxvwBsH9veNtq32jVU1RHgyJDBvhUkOVVVi/OeYx428rXDxr5+r73HtQ95LHMS2J1k\nV5IbgTuB4xNrjgN3j/5q5jbga1X10oxnlSQNNPXOvaouJ7kfOAFsAo5W1ekk946OHwaWgDuAZeAb\nwD3Xb2RJ0jSDnrlX1RIrAR/fd3jsdQH3zXa0bwlvmkdI18FGvnbY2NfvtTeQlS5Lkjrx6wckqSHj\nfgXTvm6hsyRHk1xM8ty8Z1lvSbYneTzJ80lOJ3lg3jOtpyTfnuQfknxxdP2/O++Z1luSTUn+Kcln\n5z3LtTLuEwZ+3UJnjwL75j3EnFwGfqOq9gC3AfdtsP/3/w28q6puAW4F9o3++m0jeQB4Yd5DzIJx\nf70hX7fQVlU9Abwy7znmoapeqqp/HL3+T1b+kW+YT1qPvj7k66PNG0Y/G+ZNuSTbgPcCj8x7llkw\n7q/nVymIJDuBHwf+fr6Tr
K/RY4mngYvA56pqI13/x4DfBF6b9yCzYNylCUneCnwG+PWq+o95z7Oe\nqup/qupWVj5lvjfJj8x7pvWQ5BeAi1X11LxnmRXj/nqDvkpBPSW5gZWw/3FVPTbveealqv4deJyN\n8/7LO4FfTPIVVh7FvivJH813pGtj3F9vyNctqKEkAf4QeKGq/mDe86y3JAtJbhq9fgvwHuCf5zvV\n+qiq36qqbVW1k5V/839TVb8y57GuiXGfUFWXgW9+3cILwJ9U1en5TrV+knwa+ALw9iTnk3xw3jOt\no3cCH2Dlru3p0c8d8x5qHd0MPJ7kGVZucj5XVW/6PwncqPyEqiQ15J27JDVk3CWpIeMuSQ0Zd0lq\nyLhLUkPGXZIaMu6S1JBxl6SG/hfoR9x2tQodZAAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "def distort(si):\n", " return [ibm2.beta[ti,si,5,4] for ti in range(0,5)]\n", "util.plot_bar_graph(distort(0),range(0,5))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Decoding for IBM Model 2\n", "\n", "Decoding IBM Model 2 requires us to solve the argmax problem in equation $\\ref{decode-nc}$, this time using the conditional probability from equation $\\ref{ibm2}$ with the hidden alignments marginalized out:\n", "\n", "\\begin{equation}\n", " \\argmax_{\\target} \\prob(\\target) p_\\params^\\text{IBM2}(\\source | \\target) = \n", " \\argmax_{\\target} \\prob(\\target) \\sum_{\\aligns} p_\\params^\\text{IBM2}(\\source,\\aligns | \\target)\n", "\\end{equation}\n", "\n", "Recall that $\\prob(\\target)$ is the language model. This nested argmax and sum is generally computationally very hard (see [Park and Darwiche](http://arxiv.org/pdf/1107.0024.pdf)), and often replaced with the simpler problem of finding a combination of best target sequence and corresponding alignment. \n", "\n", "\\begin{equation}\n", " \\argmax_{\\target,\\aligns} \\prob(\\target) p_\\params^\\text{IBM2}(\\source,\\aligns | \\target)\n", "\\end{equation}\n", "\n", "As it turns out for IBM Model 2 the sum can be efficiently calculated, and [Wang and Waibel](http://aclweb.org/anthology/P/P97/P97-1047.pdf) show a stack based decoder that does take this into account. \n", "\n", "
\n", "However, both for simplicity of exposition and because for most real-world models this marginalization is not tractable, we present a decoder that searches over both target and alignment. To simplify the algorithm further, we assume that target and source sentences have the same length (not counting the target's leading NULL token). This is a major restriction and not necessary in general, but it makes the algorithm easier to explain while preserving the core mechanism. Here we only show the Python code and refer the reader to our [slides](https://www.dropbox.com/s/p495n19h5rtk3uf/IBM-decoding.pdf?dl=0) for an illustration of how stack and beam based decoders work." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "run_control": { "frozen": false, "read_only": false } }, "outputs": [], "source": [ "class Hypothesis:\n", " \"\"\"\n", " A Hypothesis represents a partial translation and its (log) score under the model. \n", " \"\"\"\n", " def __init__(self, target, align, remaining, score, parent = None, source=None):\n", " \"\"\"\n", " Create a new hypothesis. \n", " Params:\n", " target: the list of target words built so far.\n", " align: a list of (source index, target index) pairs aligning the target words built so far to source tokens.\n", " remaining: a set of source token indices, indicating which source tokens still need to be translated.\n", " score: the log probability of the translation so far.\n", " parent: the previous hypothesis that was extended to create this one. For the root hypothesis\n", " this should be `None`.\n", " source: the full source sentence. Can be `None` if the parent is not `None`. 
\n", " \n", " \"\"\"\n", " self.target = target\n", " self.align = align\n", " self.remaining = remaining\n", " self.score = score\n", " self.parent = parent\n", " self.source = parent.source if source is None else source\n", " def __str__(self):\n", " \"\"\"\n", " Returns a string representation of the hypothesis.\n", " \"\"\"\n", " return \"({},{},{},{},{})\".format(self.target, self.align, self.remaining, self.score, self.parent)\n", " \n", "def decode_model_2(tm, lm, source, beam_size):\n", " \"\"\"\n", " Decode using IBM Model 2.\n", " Params:\n", " tm: an IBM Model 2.\n", " lm: a LanguageModel instance.\n", " source: the source sentence.\n", " beam_size: the size of the beam.\n", " Returns:\n", " A list of beams, each of which is a list of hypotheses, with one beam per decoding step. The last\n", " element of the list is the beam of the final decoding step.\n", " \"\"\"\n", " # the target has one token more than the source: the leading NULL token\n", " target_length = len(source) + 1\n", " def score(hyp, new_target, source_index):\n", " \"\"\"\n", " Calculates the score of appending the target word `new_target` to the hypothesis `hyp` aligned\n", " with source index `source_index`.\n", " Params:\n", " hyp: the hypothesis to add a target to.\n", " new_target: the new target word to add to the list of target words in `hyp.target`.\n", " source_index: the index of the source word that is aligned with the new target word.\n", " Returns:\n", " the log probability of extending the hypothesis by the new target word.\n", " \"\"\"\n", " lm_prob = log(lm.probability(new_target, *hyp.target))\n", " tm_prob = log(tm.alpha[source[source_index],new_target]) + \\\n", " log(tm.beta[len(hyp.target), source_index, target_length, len(source)])\n", " return lm_prob + tm_prob\n", " def append(hyp):\n", " \"\"\"\n", " Expand the given hypothesis to create several new hypotheses: one for each possible remaining \n", " source word and each way of translating it.\n", " Params:\n", " hyp: the hypothesis to expand.\n", " Returns:\n", " A list of expanded 
hypotheses.\n", " \"\"\"\n", " return [Hypothesis(hyp.target + [target_word], \n", " hyp.align + [(source_index, len(hyp.target))],\n", " {r for r in hyp.remaining if r != source_index},\n", " hyp.score + score(hyp, target_word, source_index),\n", " hyp) \n", " for source_index in hyp.remaining\n", " for target_word in lm.vocab]\n", " # Create the initial beam: a single hypothesis with only the NULL token and all source tokens remaining.\n", " beam = [Hypothesis(['NULL'], [], set(range(0,len(source))), 0.0, None, source)]\n", " history = [beam]\n", " while len(beam[0].remaining) > 0:\n", " # create all possible new hypotheses by expanding every hypothesis in the current beam.\n", " # (a list rather than a set keeps the order, and hence tie-breaking, deterministic)\n", " with_new_target = [new_hyp \n", " for hyp in beam \n", " for new_hyp in append(hyp)]\n", " # sort the new hypotheses and keep only the top `beam_size` ones.\n", " beam = sorted(with_new_target, key=lambda h: -h.score)[:beam_size]\n", " # Remember this beam for future visualisations.\n", " history.append(beam)\n", " return history" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us test this decoder on a simple German sentence, using a uniform language model over the target vocabulary." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "run_control": { "frozen": false, "read_only": false } }, "outputs": [ { "data": { "text/html": [ "
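<table>\n", "<tr><th>Target</th><th>Remaining</th><th>Len</th><th>Score</th></tr>\n", "<tr><td>NULL</td><td>groß ist ein Mann</td><td>4</td><td>0.00</td></tr>\n", "<tr><td>NULL is</td><td>groß _ ein Mann</td><td>3</td><td>-inf</td></tr>\n", "<tr><td>NULL a</td><td>groß ist _ Mann</td><td>3</td><td>-2.40</td></tr>\n", "<tr><td>NULL a a</td><td>_ ist _ Mann</td><td>2</td><td>-inf</td></tr>\n", "<tr><td>NULL a man</td><td>groß ist _ _</td><td>2</td><td>-4.80</td></tr>\n", "<tr><td>NULL a a man</td><td>_ ist _ _</td><td>1</td><td>-inf</td></tr>\n", "<tr><td>NULL a man is</td><td>groß _ _ _</td><td>1</td><td>-7.89</td></tr>\n", "<tr><td>NULL a man is big</td><td>_ _ _ _</td><td>0</td><td>-10.28</td></tr>\n", "<tr><td>NULL a man is tall</td><td>_ _ _ _</td><td>0</td><td>-10.28</td></tr>\n", "</table>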
" ], "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "source = [\"groß\", \"ist\", \"ein\", \"Mann\"]\n", "target_vocab = {tok for target, _ in train_model_2 for tok in target}\n", "lm = UniformLM({w for w in target_vocab if w != 'NULL'})\n", "hist = decode_model_2(ibm2, lm, source, 2)\n", "mt.render_history(hist)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The final beam contains two translations with the same score: \"a man is big\" and \"a man is tall\". The uniform language model cannot tell them apart, and the translation model is uncertain about the translation of \"groß\", which can be \"tall\" when describing a person's height and \"big\" in most other settings. To break the tie we can use a language model that captures the fact that \"man is big\" is a little less likely than \"man is tall\"." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "run_control": { "frozen": false, "read_only": false } }, "outputs": [ { "data": { "text/html": [ "
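<table>\n", "<tr><th>Target</th><th>Remaining</th><th>Len</th><th>Score</th></tr>\n", "<tr><td>NULL</td><td>groß ist ein Mann</td><td>4</td><td>0.00</td></tr>\n", "<tr><td>NULL small</td><td>groß ist ein _</td><td>3</td><td>-inf</td></tr>\n", "<tr><td>NULL a</td><td>groß ist _ Mann</td><td>3</td><td>-2.48</td></tr>\n", "<tr><td>NULL small long</td><td>_ ist ein _</td><td>2</td><td>-inf</td></tr>\n", "<tr><td>NULL a man</td><td>groß ist _ _</td><td>2</td><td>-3.18</td></tr>\n", "<tr><td>NULL a man NULL</td><td>groß _ _ _</td><td>1</td><td>-6.96</td></tr>\n", "<tr><td>NULL a man is</td><td>groß _ _ _</td><td>1</td><td>-4.56</td></tr>\n", "<tr><td>NULL a man is big</td><td>_ _ _ _</td><td>0</td><td>-7.66</td></tr>\n", "<tr><td>NULL a man is tall</td><td>_ _ _ _</td><td>0</td><td>-5.26</td></tr>\n", "</table>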
" ], "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lm_train = [tok for t,_ in train_model_2 for tok in t]\n", "lm2 = LaplaceLM(NGramLM(lm_train, 3),0.1)\n", "hist2 = decode_model_2(ibm2, lm2, source, 2)\n", "mt.render_history(hist2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With a Laplace-smoothed trigram language model trained on the target side of the training data, \"a man is tall\" (-5.26) now scores higher than \"a man is big\" (-7.66). Note that \"a man is tall\" is also more frequent in the [Google Books Ngram corpus](https://books.google.com/ngrams/graph?content=a+man+is+tall%2C+a+man+is+big&year_start=1800&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2Ca%20man%20is%20tall%3B%2Cc0%3B.t1%3B%2Ca%20man%20is%20big%3B%2Cc0).\n", "\n", "## Summary\n", "There are a few high-level messages to take away from this chapter.\n", "\n", "* MT is an instance of the structured prediction recipe.\n", "* The noisy channel model is one modeling framework for MT.\n", "* Word-based MT is the foundation and blueprint for more complex models.\n", "* Model parameters are trained with EM.\n", "* NLP tricks:\n", "    * introducing latent alignment variables to simplify the problem\n", "    * decoding with beam search\n", "\n", "## Background Material\n", "* [Lecture notes on IBM Models 1 and 2](http://www.cs.columbia.edu/~mcollins/courses/nlp2011/notes/ibm12.pdf) by Mike Collins.\n", "* Jurafsky & Martin, Speech and Language Processing:\n", "    * Chapter 26, Machine Translation.\n", "    * Chapter 6, EM Algorithm.\n", "* Brown et al., [The Mathematics of Statistical Machine Translation: Parameter Estimation](http://www.aclweb.org/anthology/J93-2003)" ] } ], "metadata": { "hide_input": false, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 1 }