{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Here we look at another dimensionality reduction technique, Linear Discriminant Analysis (LDA), and compare it with PCA. Later on, we will do topic modelling with Latent Dirichlet Allocation, a different technique that shares the LDA acronym.\n", "* LDA\n", "* LDA vs PCA\n", "* Topic Modelling" ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [], "source": [ "import nltk\n", "import numpy as np\n", "from nltk.stem import WordNetLemmatizer\n", "from nltk.stem.porter import *\n", "from sklearn.datasets import fetch_20newsgroups\n", "from sklearn.feature_extraction.text import CountVectorizer\n", "from nltk.tokenize import word_tokenize\n", "from nltk.corpus import stopwords\n", "import pandas as pd\n", "from sklearn.datasets import load_iris\n", "from sklearn.decomposition import PCA, LatentDirichletAllocation\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### LDA (Linear Discriminant Analysis) and LDA vs PCA\n", "\n", "**LDA** is a dimensionality reduction technique. It transforms your data from, say, 'n' dimensions to 'k' dimensions. LDA and PCA produce similar kinds of output, with one **major** difference: LDA is a supervised algorithm, whereas PCA is unsupervised and ignores **class labels**.\n", "\n", "As we have seen in previous weeks, PCA finds the directions of maximum variance and projects the data onto these new axes without taking class labels into consideration. **LDA**, on the other hand, creates new axes such that projecting the data onto them gives maximum separation between the classes. For a problem with C classes, LDA can produce at most C - 1 such axes.\n", "\n", "Below, both methods are demonstrated on the Iris dataset.
" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [], "source": [ "data=load_iris().data\n", "target=load_iris().target\n", "target_names=load_iris().target_names" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [], "source": [ "dataframe=pd.DataFrame(data=np.concatenate((data,target.reshape(150,1)),axis=1),columns=['col_1','col_2','col_3','col_4','target'])" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead><tr><th></th><th>col_1</th><th>col_2</th><th>col_3</th><th>col_4</th><th>target</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>5.1</td><td>3.5</td><td>1.4</td><td>0.2</td><td>0.0</td></tr>\n",
"<tr><th>1</th><td>4.9</td><td>3.0</td><td>1.4</td><td>0.2</td><td>0.0</td></tr>\n",
"<tr><th>2</th><td>4.7</td><td>3.2</td><td>1.3</td><td>0.2</td><td>0.0</td></tr>\n",
"<tr><th>3</th><td>4.6</td><td>3.1</td><td>1.5</td><td>0.2</td><td>0.0</td></tr>\n",
"<tr><th>4</th><td>5.0</td><td>3.6</td><td>1.4</td><td>0.2</td><td>0.0</td></tr>\n",
"</tbody>\n",
"</table>\n", "
" ], "text/plain": [ " col_1 col_2 col_3 col_4 target\n", "0 5.1 3.5 1.4 0.2 0.0\n", "1 4.9 3.0 1.4 0.2 0.0\n", "2 4.7 3.2 1.3 0.2 0.0\n", "3 4.6 3.1 1.5 0.2 0.0\n", "4 5.0 3.6 1.4 0.2 0.0" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataframe.head()" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [], "source": [ "dataframe.drop(columns=['target'],axis=1,inplace=True)" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [], "source": [ "pca = PCA (n_components=2)\n", "X_feature_reduced = pca.fit(dataframe).transform(dataframe)" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAEICAYAAAC3Y/QeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAIABJREFUeJzsnXd4lFX2xz/3faenk0LvVSwgIGBBQBBUml1R7IJ9Xfuqa9l1rT/dXdfeRbErIkWliIggIEVAOqF3EkLq9Hnv748JgclM+iQTkvt5njySt9z3TEzOe++553yPkFKiUCgUisaFFmsDFAqFQlH3KOevUCgUjRDl/BUKhaIRopy/QqFQNEKU81coFIpGiHL+CoVC0QhRzl+hUCgaIcr5KxTFCCG2CyFcQohCIcQBIcSHQoj44nPDhRDzhRAFQogsIcQvQojRpe4fJISQQoiHYvMJFIrKo5y/QhHKKCllPNAL6AP8XQhxKfAV8BHQCmgKPA6MKnXvdUAOcG3dmatQVA/l/BWKCEgp9wA/ACcD/waeklK+K6XMk1IaUspfpJTjj1wvhIgDLgXuADoLIfrExHCFopIo569QREAI0Rq4AHACrYGvK7jlYqCQ4AphJsFVgEJRb1HOX6EIZYoQIhdYAPwC/Lf4+L4K7rsO+EJKGQA+Ba4UQphrz0yFomYo569QhHKhlDJZStlWSnk7cKj4ePOybiheJQwGPik+9B1gA0bUqqUKRQ1Qzl+hKJ+NwC7gknKuuYbg39I0IcR+YCtB569CP4p6i3L+CkU5yKDm+b3AY0KIG4QQiUIITQhxlhDi7eLLrgP+AfQ85usS4AIhRGpMDFcoKkA5f4WiAqSUXwNXADcCe4EDwL+A74QQ/YG2wGtSyv3HfE0FMoGxsbJboSgPoZq5KBQKReNDzfwVCoWiEaKcv0KhUDRClPNXKBSKRkhUnL8Q4n0hxEEhxJoyzg8SQuQJIVYWfz0ejecqFAqFonqYojTOh8CrBIWvyuJXKeXIyg6YlpYm27VrV0OzFAqFonGxfPnybCllekXXRcX5SynnCyHaRWOsI7Rr145ly5ZFc0iFQqFo8AghdlTmurqM+Z8uhFglhPhBCHFipAuEEBOEEMuEEMuysrLq0DSFQqFoXNSV818BtJVS9gBeAaZEukhK+baUso+Usk96eoWrFoVCoVBUkzpx/lLKfCllYfG/vwfMQoi0uni2QqFQKMKpE+cvhGgmh
BDF/+5b/NxD5d+lUCgUitoiKhu+QojPgEFAmhBiN/AEYAaQUr5JsMPRbUIIP+ACrpRKV0KhUChiRrSyfcoVr5JSvkowFVShUCiOW6R/Gxj5YO6GENZYm1MjopXnr1AoFA0WGdiPPHwr+LeCMAESmfAYmuPiWJtWbZS8g0KhUJSDlBKZcyP4NwJukIUgiyD/SaR3VazNqzZq5l/L5Ow/zJxJ8zm0J4ceg06i38he6Loea7MUCkVl8a8HYw8QKHXCg3R+hLC8FAuraoxy/rXI6vnreHTEMxgBA6/bxw/vzaXdSa15ce6TWGyWWJunUCgqg5EDRJqwSQgcqGtrooYK+9QShmHw9JX/wV3kwev2AeAqdLN11Q6mvj4zxtYpFIpKYz4JpDfCCRtYB9W1NVFDOf9aYsfaXbgK3WHHPS4vsz/+JQYWKRSK6iC0ZIi/HYT9mKNW0NMQjitjZldNUWGfWkI3mzCMyKUMZou5jq1RKBQ1QYu/HWnujiyaCMZhsA1FOK5FaPGxNq3aKOdfS7Tu2oLU5ins3bI/5LgtzsqICUNjZJVCoaguwjoIcRyHeUqjwj61hBCCJ799gMS0BOwJNiw2M1aHhb7n92LY9YNibZ5CoWjkqJl/LdL+pDZ8tvNNFk1bzuH9uZw0oBuderaPtVkKhUKhnH9tY7FZGHjZ6bE2Q6FQKEJQYR+FQqFohCjnr1AoFI0Q5fwVCoWiEaJi/lVESsn8rxYx7c1ZeFxezhl7JiMmnKvkGhQKRQnSyAXXNKRxEGE5DSxnIUT9mmsr519FXr79bX6a9CvuIg8A2/7cwdxPF/CfX5/CZFY/ToWisSO9K5GHbwAZANxI58dgOgGaTESI+jNJrF+vonrOnsx9zJ74S4njB/A4vexYt5uFU5bG0DKFQlEfkFIic/8SlHymWN5FOsG3Flk0Kaa2lUY5/yrw5/z1aHr4j8xV6GbZzD9iYJFCoahXBLaAkRfhhBvc39a5OeWhnH8VSEpPRNPCf2Qmi06T5ikxsEihUNQvdKCs9uT1q4+Hcv5VoM/wHpht4aJsuq5z3o3nxMAihUJRr9DbgZ4R4YQd7JfVtTXlopx/FTBbzLw490matc/AFmfFHm/DGmflgglDVbaPQqFACIFIfhVEEog4gjk1drD2QziuiLV5IQgpy1qixJY+ffrIZcuWxdqMiEgpmfbmLN66byK6SUdKSSBgcNMzV3HJX0fG2jyFQhFjpHSBezYYWWDuDeYeCCHq5NlCiOVSyj4VXadyE6uBs8DF2w98XNyhy1dy/INHP+PUc06mwyltY2ecQqGIOULYwT461maUiwr7VIMlM1ag6eFvcZ/Xr7p0KRSK4wLl/KuB1+1DRujSJQ0DjytSr0+FQqGoXyjnXw1OO68ngYARdtzqsDLwUiXfrFAo6j/K+VeD1OYp3PzsVVjtFjRdQ4hge8YBl/TnlIHdY22eQqFQVIja8K0mF989kp6DT2bOpF/wuLwMuLg/PQadWGc7+gqFQlETopLqKYR4HxgJHJRSnhThvABeBi4AnMD1UsoV5Y1Zn1M9FQqFor5S2VTPaIV9PgTOK+f8+UDn4q8JwBtReq5CoVAoqkFUnL+Ucj6QU84lY4CPZJDFQLIQonk0nq1QKBSKqlNXG74tgV3HfL+7+FgIQogJQohlQohlWVlZdWSaQqFQND7qVbaPlPJtKWUfKWWf9PT0WJujUCgUDZa6cv57gNbHfN+q+JhCoVAoYkBdOf+pwLUiSH8gT0q5r46erVAoFIpSRCXPXwjxGTAISBNC7AaeAMwAUso3ge8JpnlmEkz1vCEaz1UoFApF9YiK85dSjq3gvATuiMazGiI+rw9XoZuElHhVJKZQKOoEVeEbQ/w+P28/8DHfvzuHgN8gMTWB2/97AwMvU/pACoWidqlX2T6NjVfveo/v352Dx+nF7/WTs+8w/3fDq6z8eU2sTVMoFA0c5fxjhLPAxeyPfsHjDJWA9ji9THrq6xhZpVAoGgvK+
ceInP256CY94rl9Ww7UsTUKhaKxoWL+VSQQCDD/q8X8+P5c8rLzOaFfZy5/cAzN2zet0jgZrVOJJKknhKBLn47RMVahUCjKQM38q4BhGDw+5nleuP5VVsxZzZaV25n+1myu73IXX/9nepXGstgsjPv7JdjirCHHrQ4L1z55WTTNVigUijCU868Cy2etYtW8dfi9/pDjRkDy/sOfsHtz1erWLn9gDH95fTytu7YgLtlB73NP4T/zn6L9yaoBvEJRm0jfOqTzc6TnF6QMxNqcmKDCPlVgyYzleJyeiOcCgQALvlnMlX+7qNLjCSE495qBnHvNwGiZqFAoykFKL/LwHeD9HZAgdBBJkPoZQm9cQsNq5l8F4lPi0fTIRVhSBr8UCkX9RRZ9AN4lgAtwgywCYz8y955Ym1bnKOdfBYZdNwjdHHmxZDLrnHlR30qNk3+ogAXfLmHFnNUE/I1zyalQxATXl4C71EEDfH8ijfJakjQ8VNinCrTo2IwHP7iD5699Bb/vqNM2WUyMe+xS2nQLa1EQxuSXp/Pew59isphAgsVm5tmZf6dTz/a1abpCoQCQvjJOiHLONUyi0sO3NqjPPXxdRW5+/mwhGxZvIqNtOmdfdnqlHP+6xZt4cOg/wgq7ktMT+XzP22Xm/SsUiuhg5D8Lzk+A0L9B9A5o6T/GxKZoU9kevmrmXw3scTYuuHkIF9w8pEr3ff/2bLyu8NmF1+1j1S/r6DXk5GiZqFAoIiDib0d6fgbjIEgnYAVhQiT/X6xNq3OU869DCnOLiLjSEuAqcNW9QYoGz+78PHyGQbukZKUYCwgtCdKmgfsHpHc56G0Q9osReipA8O/Ttxr8m8HUFsx9GuzPTTn/OmTAJaezfPZq3EWh6aJ+r59TBnaPkVWKhkhmziHu+H4au/LzEEATu4P/nTeCU5u3iLVpMUcIK9gvRNgvDDkuDSfy8M3gW3vkQtDbQJOPEFpyDCytXVS2Tx0y8PLT6XRq+5KqXqEJrA4L458fR0JKfIytUzQUPH4/V37zBZk5h3D7/bj8fvYU5HPtlK/JcTljbV69RRb+OzjrxxX8kk7wb0HmPxljy2oHNfOPIlJKdm/ai98XoG33Vmha6LvVZDbxfz89wfyvF/Pr14uIbxLPiPFD6da3c4wsVjREftq2BY/fH6YdFZCSKRvWc+OpvWNiV73HNYWwjWB84J6NlAGEaFgJGcr5R4nta3fxxEUvcGjvYYQGjgQHj372V045OzScYzKbOGfsWZwz9qwYWapo6BwsKsJnGGHH3X4/+woLYmDR8UJZqZ4BwAAalvNXYZ8o4HV7uW/wE+zN3I/H6cFd6CFn32EeHfEMhw/kxto8RSPj1OYt0CNsUjrMZk5rUXFKcqPFMpBwlyjA3AshzLGwqFZRzj8KLJq2HJ8nfNZgBAxmf/xLDCxSNGZ6NG3G6a3aYDcdXdhbdZ0OKU04p72SCy8LkfgwaE0Ae/ERG4gERNJTsTSr1lBhn2rgKnLzzX+m89Mnv2Iy6bTu1iJM6ROC+fuH9jSuknFF/eCNEaP55M9VfLH2T/yGwYXdTuDGnr0xaWq+VxZCbw5ps5CuKeD/E/QuCMclDTLTB5TzrzIBf4B7Bz7OznW78bqDs/29W/bj94Zr9NjjbfQYfFJdm6hQYNZ1ru/Zi+t79oq1KccVQotHxI2LtRl1gnL+VeS3qcvYs2lfieMHQv59BIvNTNsTW9NvhPrjUygU9Q+1Bqwiaxasx1VYWhUwFCEEnXt14KWfn0TXG1aGgEJRn5DuuRhZ52Ps746RNQTDOS12tgQOIr2rkEZ+zGyoCmrmX0Uy2qRhsVvwukrnAx9FSsnW1Tuw2Cx1aJlC0biQ7p+RuX+lRKI5sAvyH8XAi+a4pO7skC5k7v3gmQ/CAtKLdFyLSLi/XktDqJl/FRly9QB0veIfm6ecl4NCoag5svBFwrX53VD4UmQNrdqyI/8fQcePB2RB8L/OSUjnF3VmQ3VQzr+KJKcn8
dysx2jWLh2rwwIRXuxCCHoMOrHujVMoGhP+HZGPGzmUXbAVXaT0gGs6ULq9qwuc79WJDdVFOf9q0L1/F9744//IaJOG2Rpa/CE0gS3Oyp2v3BQj6xSKisn3ePjfkkWM+PQjrpr8JbO2bK7T2XJU0MsoWBNJQB0VZckiCBPSKMao3wWeUXH+QojzhBAbhRCZQoi/RTh/vRAiSwixsvjr5mg8N5Z88Ohn7Nt6EF+pTB9pSAKBAOuXbIqRZQpF+RR5vYz5fBJvLFvC+uwsFu/exT0zv+ffixfG2rQqIRLuAWylDtoh/q66i7WLFNDSIp0Ay2l1Y0M1qbHzF0G1o9eA84HuwFghRCR94i+klD2Lv96t6XNjzdxPF0Qs7ALwuny8fvcHeCNU/SoUVcVvGFGdlX+57k8OFBXiCRytTXH5/by7YhnZzuNH9VPYzoPEf4HWLHhAS4X4BxGOq+vOBiEQiU8SfAkdeeGYQMQhEu6vMzuqQzSyffoCmVLKrQBCiM+BMcC6KIxdbzEiCGeFIGHn+t2qN6+i2szeksnTv85jV34eiVYrE3qdxi19+qLVcFY7b/s23P7wiYtZ11l1YB9D6oEEhAzsRxa+HtxI1VIQcTeCbWTYjF5zjAbHaKT0I0RskheFbTCkfoIsfBsC28HcCxw3g3EAo2By8EVgG4UwtYqJfWURjZ9WS2DXMd/vBvpFuO4SIcTZwCbgHinlrtIXCCEmABMA2rRpEwXTao8Bl/Tnp0nzQxq5H4vf5yexidLoV1SPhbt2cPfMGSVOOs/j4dWli/EEAvy1/xk1GrtZfAKaEBilVhOGlKQ54mo0djSQgUPI7DHFmTN+MPYi8/4O/s2IhHsj3hMrx1/yfPPJiJRXgGCqt8y7H+meQzAbyYQsfAOZ9AyafWRM7TyWutrwnQa0k1KeAswGJka6SEr5tpSyj5SyT3p6eh2ZVj3GPz+O9NZpWGzhG0u6WadLn45ktCn/M3jdXia/PIM7+v6Ne85+jDmT5le8olA0Cv69aGHY7Nzl9/PuH8vwBiJPOCrLdT1OxVKq+FAXgmZx8ZyS0bRGY0cD6fyweCP12M/vgqL3kfV8ExUAzzzwzCHYFEYSzDxyQ94jSKMwpqYdSzSc/x6g9THftyo+VoKU8pCU8kgu1LvAcd9NIiktkffW/YdbX7oOkyV01qFpGve+fWu59wf8Ae4/5x+8/8inbFq2hTULNvDybW/z4o2v16bZiuOE7bmHIx43pOSwq2b9nrunZ/D8kOEkWCzEmy3YTCa6paXz0UWX1o+iJO9iwpuqECyg8tf/RArpng4ywv8jYQLvoro3qAyisVZaCnQWQrQn6PSvBK469gIhRHMp5b7ib0cD66Pw3Jhjtpj5/ceVGIHQ2bo0DD59djK9z+3Bly9OJT87n15DT+G6f1xB07bB1cDi6cvZtmZnSDGYu8jD/K8WccWDY2jbvTWKxkvnJmn8vnd32HGTptHEbo9wR5Cdebl8sfZP9hUWcHabdpzfqQtWU/if+aiu3RjeqTMbD2WTYLHQLjmlRvbKwEGkcyJ4fwe9HSLuRoT5hOoNprcqbqdYapNb+kGL/cqkQoSJ4OZvpE36+iOqIKKRRSCEuAD4L8FWN+9LKZ8WQvwTWCalnCqEeJag0/cDOcBtUsoN5Y3Zp08fuWzZshrbVpsYhsEFtrEE/OGhGt2sY7aYSpq1a7pGXKKDByfeyaG9OSz5fgWLpoZ/Povdwq0vXceoW4fVuv2K+svSvbu5bso3IaEfu8nEX/qdzi29+0a8Z972bdz+/VQChoHPMHCYzLRJTubry8biMNde3rsM7EFmXxjseYuPYEDBikj5H8I6sOrj+f5EHrqa0OpdM5h7oKV+Gh2jaxHpWYw8fAvBsM8xCAciYzFC2CLeFy2EEMullH0qvK6+FnYcL87/fOvYsJl/mQjQdQ2TxUTAF4i4WWxPsPPAB3cw4OJIe+aKxsRvu3byzIJf2HwomzRHHHee1o8rTzolY
mjGbxj0ffcNct2hcgc2k4m7+57OLX0ivzCigZH7ALinEWx1eAxaU0T6/GqFkgzXLMh/DKQb8IHeFuLvQ9iGVDs0JQNZYOwDvQNCq91kDCP/eXBOKv5OByQi5XWE9cxafS5U3vnXnzXIcYimafQaejLLZq0KWeHpugaaIFDauUsI+A0C/rJ1f8xWk5KBVgBwRus2TB97TaWu3ZidhS/CRrDb72fapg216vzxLiDM8UOwwtU4CHpTpGchsuAZ8GeCaALx4xGOG8p05Jp9GIalN+RcCYEsCOyG/PuRRW2gySSEllhp844Kr/1SLLzmQ8bdjIj/S63tcWiJDyEdl4PnVxBxYDu3SjbXBcr514D8QwVsX7MrLLQX3yQeZ76LAJXLyrDH25BSkpyRxD+nPIjF2vD6hSpqF7vZHJa6eey5WkUkAYcinJAg4pDeFcjDt1ESxpGHoPBlpFGASLi77HELnobAXkp0eiTg34IseBaR9GylzZN5jxULr3lBFk+8it4NqoIKAdYzgy8iPbXSY1YGYWoPpvpb56O0fWrAp89OJjcrXLtbN+mcPKBbmO5PJDRd8MwPj/K/Rc/wUeartD+5bW2YqmjgtE9OoUVCYpjOoN1kZtwpPWv34Y4bg7IKIVjAOhihxSMLXyZMfVO6wPk+R5IApfQHY/2+oMaQlBLcMwkXaPOB6/ujw/g2YOSMxzjQFyN7BNI1I/QxhhPcPxIuvOaBwDrwr4WiD5HZI5CBg9X8ARyfqJl/JcjNyuPH9+aydfUOOp7anrMv7U/Ttuks/Pb3iBIPznwn418YxydPTWbJ98vRdA1pSHxeP9I4OjsTArr06cRJZ3ary4+jOA5Yn3WQt5cvZcvhHE5t3oIJvU+jZULZYQMhBG+NHMPVk7+k0OtDIgkU9+4d3aV2f7+E4zJkIBOcn4KwBmfXll5HZ+f+zLJvDmQhA5nI3AcBH0gD9AxIfoOIoaTgTQDBF0XOlcVplRL8uci8R5DGQbS4G4KXykIiSu+G4AWZjyx8E5H0eKU/9/GO2vCtgB3rd3P3mY/ic/tC2jXqZh0hRETnb7aambTtNZo0S6Eor4jCXCe6WeOufo9QmFuEu8iD1W7BbDXz3wVPqbRORQi/7tzOLdO/w+P3h0QUL+ranWeGnBsxdfMIfsPgt107yXYW0bt5S9om113zcWnkgG8T6C0QpqMV+kbOtcW5+6UQdmgyGQ5dROjKQIDWBPQTwPcboS8BDaxD0FJewzh8J3hmExZ3FXGIjCUIYUFKA5k1AIysij+A3hYtfXblP3A9RW34Ron/3voWzjwnpd+RYZu5xegmja6ndaRJs2DedFxSHHFJwZL59ze8zM+fLmDD0kzadGvJsOsGkZiaUKv2K44vpJT8fe6ciNo7321cR57HzbujLyrzfpOmcXbbdrVoYdkIrQlY+4cfj78bmbOSUAdvB8cN4J5KaCUvgAxm+djOAd9SjoZsNBDJiMTHgt9GqgU4cn9gP5jaIISGTHgS8u4rHqecya4W3Zh/fUc5/3II+AOsXbgxzPFHwmTWMVvNpLdO5e9fRNYfscfZuGD8UC4YPzTKlioaCvkeD/sKCyKeM4CFu3ayLfcw7ZNT2JOfz4Kd24mzWDinfcdazeWvCcLSG1LeQOY/DYEtQRnkuPGIuBuR+Q8T7vwJhn+K3iN01i85sokMBIvBjP0R7g0EVw7FaPZzkfpHyKK3gg1gZAEYh0o91x4Uj2tEKOdfBlJKtqzajtBESJy+LKwOK8/88Cgn9OtcP0rkFcclNpOpXNVOi66xNSeHb9ev450VS9GEQBMaMIv3Rl9M35bVV470BgKsz84izmymY0qTqP4eC+uZiPTvw09YzkK6fgRKS0n7I3TkkoAb6fwWiRP8kepEbWAfFZbHLyw9EZY3gqMYucjDd4DvTxDm4B5F/C0IW+MqrFTOPwK5WXk8fN7T7Fy/u9IFXFaHle79u9SyZYqGjtVkYlSXbkzZsI5AhCWnN2BQ6PPy3
h/LQvT4AcZPm8LvN99a7p5AWczM3MxDc2ZiSElAGrRISOSdURfWWPahQmzDoej94k3hI2EhO5hPBN+a8OulC1xfBPP+S1fQYgb7xYjER8p9pNCSEamfIP07g3sBpq61XvRVGaSU4F0SDGfpzYO1AbVYDaycfwSevfp/bPtzR0TZhrIYecu5tWiRojHxz0FDOFBYyIJdoT1qLZrGGa1b8+uO7RH3BCSSxbt3MbBdaG757vw8ZmzeiMfv55z2HTmplHLnlpxD3DPr+5Axtx7OYdzkL5l/dS+EcQDMJyNM0U9DFsIMqZ8iiz4F9wwQNoRjLFJrCrnjI4TobRDYCmE1NAIsp6MlPVn5Z5vaAPVDOl5KDzLnevCvD65EhBXy/wWpnyFMHWrlmcr5lyL/UAF//rquSo4/tWUTeg87pRatUjQm7GYzH110KQt2bue5BfNZn52Fw2zm8hNP5sEzBnD7jKllblt6AqEvha/XreGxn+cEZ/SG5M3lS7nyxFN4fODgkms+/XN1WHVwmq2IiQO+IJDzOrqmgfQjbcMRSc8TbN4XPYSwIeJvhPhjYu5SIvX2xSqeR0I/WnGFriB81i/LTymt58iid4tXOsWb29IPOJG59yDSvquVZyrnXwpXoRtNK7v2zZ5g528f38lPnyxg+axV+Dw+nHlOHjr3KTr37sAz3z+KzWGtQ4sVDZWz2rRj+lXtSr43pOThn2bxa6kVwRH8AYPTWx2dyea4nDz285yQ8JDb7+eLtasZ0aULvZsHG6DvKyoICzH9t/9PtIrLQxfy6OzbPRtp7omIG1ejzyWNwqDevXSC5ayQtNAjCCGgyUfIgqfBNQPwg+VMSHgQDl0aYVQBpuM47OqaTHghmgxWNAcOIvSMqD9SVfiWIqNNGgnldODy+/z0GHgiJrMJr8uL1+3DVejGXeRh4++ZvP9I/VcdVByffPrnKqZv2oA/QsMfm8nEU4OHkGA9OvH4Zfv24Ky9FG6/n2kbj26WDm7XAYfpaKZQE6uLnqkHMGul1xcucH5So88gPb8hs85E5j2JzH8OmT0Co+DfEa8VWgJa0nNozf5ENF2H1uRdNHMXcIwDSlcUWxEJd9bItphSZkphWdLQNUc5/1IIIbjvvdsjSjNY7BbOvWYg9gQ7879ehK9UgZfX7WPWxHkl3xuGwcqf1/D9O3NYv2RzVJtwKxoWS/fu5obvJjPko/d5aM5MduaFd6yauOoPXBFi/ZoQfHLxZVzS/aSQ40KIiLWtoiRDKMjoLt1olZSEVQ8GAqy6HynLyPSR5Td4l4E9Qc0c/5bwc9KNzL2juCLXSXCD1wNFE5HepcdcF/5yOzbzSCTcD/F3BFNGEWDqhkh5B2E+jkOv9jFAhIiB3gah104PAxX2iUCfYT14a+X/8dZ9H/HHz3/i8/ix2MyMum04Nz1zFdKQBPyRi7yOvBDysvO5b9ATHNyZHUwVFdC5Vwee+UGFhRShfL95I/fP/rFkw3VnXi4/ZG5iyhVX0yHlaL6601da5yaIRdfJiAtfrQ5u1z5ixpBV1xnT7WijFavJxDeXjWXS6pXMyNxIvKUVUmsClK6KNUMZ6ZBS+pF5D4J7dnH6pB9p7oFIeROhFeflexYRWWrBjXRNRgYOQeHzENiDFE0g/naE45qwlFMhNET8BIifENGW4xERNx7pmQeBbcUvWDsIMyI58qooGijnXwatu7bkX9MfRkqJu8iNxW5BP6bvaccebcn8Y3vIPZqu0Wd4DwD+M+Etdm/eF1IJvGFpJhOf+IJb/u/aOvkMivqPISVPzJsbkmkTkBKnz8eLvy3g9RGjS44Pad+BL9b+ia9U2CfFZqdFfHileJLNxovnnsf9s34seZYQgpu4pozvAAAgAElEQVRO7UOPps1Cro2zWLilT98S6WfpbY/MGU+wEMoH2EFLQcTfFvFzyKK3wT0H8MCRjq2+P5D5TyKS/6/4Ki9lVuT6d4LrQY4qf+ZAwUtI/I2i+EpoDkj9Gjzzk
b7VCL0Z2EbUagqq0vapBm/c8wHT3pqN7xitH5PFRFySg9d+f47UFimMjB8XUQIiMTWBb7Ler0tzFfUQXyDA3O1bWXfwIG8uX4rPCP9dSbU7WDr+qLPNdjoZ9dnH5HncuP1+TJqGWdN5e9QYzmxddhpmlrOImZmb8QQCDG7XPmQ1UR7Svxvp+jxYFWvpj7BfeHQWXwrj4Jll6OdYEE3/QAgz0ihAHjyDsI1N4QiGcIw94beLRETG7wihItSVpVFr++xYt4v927Po2KMtaS2jq9exZuEGZrzzU4jjBzACBq8seoambdNxO91lVgVHEoJTNC725Odz2defUeDx4PEH8EeIcQOkOkI3NdMcDmaOu54v1/7Jot07aZecwjWn9KywECvdEVctWWdhahWMr1eGMvcCAsV562aEloC0Xwmujzi6ArCBZRB455U9riwCoTSwok2Dcv5FeUX8fdRzbF6xNZiN4/YxdNwA/vrWLeWmb1aFeV8sxOsK78RltVtY+9tGMv/Yxit3vRfR+esmjf6jekfFDsXxy32zfiCrqChiPP4IdpOJW3uHt/JMtFq5uVcfbu5V4cSubrH0A8/c8OOmjiWrBaPwNXB9yVHHrwVn/YlPwuHrwb8u/H4Rf1TLRxFVGtRa6sWb3mDj75l4nF6K8pz4PD7mfraQKa/8EJXxNy3fws71e5BlpF7t2byf5697hcP7c8Mye2xxVpIzkpig4v2NmgKPhxX795bp+OMtFuwmE7ef1o8xXY+PPg9G/r/B83OpoxoIOyLxXwBIIw8K3yS0OMsIzupdnxavMEpLGdgh/m4V8qklGszM31XkZvH05WFhFY/Tw5RXfuDiu0dUe2yPy8PfRz3H+sXF6ZoR/m4DAYP1SzbhcYavCoQmuOaJyxh+wzn8+vVifv5sAbY4KyNvHUb/kb2VEFwjQiLLbC3SxGbno4supV1ySr1V6CyN4ZoBzjcjnBHQZCrCXLwX4VtbXJ0boaOW51dE/O2Q8j9k/vMQ2AFaBsTfhea4pLY/QqOlwTh/j7P0L9VRivLLz02uiI//8RXrftsY0swFgrnHFpsZCTz4wR1MfPLLiPfb422cMvBEHh/zPFtWbi+xdfX8dYyYMJRbX7q+RvYpjh8SrTa6paWz5uCBkDmEWdMZ2bUb3dOjX8kZTaR0FTvyJIS5MxS8VMaVgaDcwhHnr6UXSxaURgRFzABhHYRIH1QbZisi0GDWU0lpiaS3Cs9i0HSN0847tVJjeFweZk2cxxv3fsgP7/2EqyiYdvbjBz+HOX4IzuhvfOYqPt7yKgMvP4MTz+yKpof/SAO+ALs37mHrqu0hLyl3kYdpb8ziwI5KdBlSNBheGnY+iVYr9mL1zTizmdZJidzb/4xKj/Hrzu2M+uxjTnz9ZYZN+oAfMzfXlrklGM4vkQf7Iw9PQB66FCN7ZLEufhkEtpb8U5g7FzczLz3ftCIc19WKvYryaVCpnqvmreXRkc/i9/oI+A0sNjO2OBtvLH+ejDbp5d6bvTeHu/o9TGGeE3ehG1u8DZvDyqtLnmVCj/tw5pcWkgq+WL7L+6ikaGvf1gPc2usBXIVHs31scVYuf2AM+7cfZNaH88LGsMfbuPOVmxh23aAqfVbF8U2Bx8O0TRvYkZdLj6bNOLdDJ8x65QTTft2xnVtmfBdSG2AzmXjmnHO5sFv3WrFXev9A5lxHaDcuDTATrklTTOqPaOajipQykIXMvRN860CYgvcnPI7mGFMrNjdWGmWqZ49BJ/LmiheY/PL37Nqwh5PPPoHRtw8nOT2pwntf/+sH5OzPLdHvdxe68bq8vHz7O/Qf2ZtfvvwtTOmzc68OIdW6zTs05bXfn+P9Rz9l9fz1JGckcsWDFzJ03Nm89/An6GY9LPdfaEK1cmyEJFitXHVyj2rd++zC+WGSzm6/n+cXzmdM1xNqZQ9JOj8m3Mkf+XuIoD8jmiG00IbzQk9HpH6BDOwBIw9MnRDCEnVbFZWjQTl/gFZdWvCX1
26u8n1Lpi8Pa9xiBAyWz1rFpG2vsfLntTjznLidHix2C2aLifveC692bNWlBY9/FZ4bfd5NQ5jyyg9hzt9kMSk5aEWV2HY4J+LxLKcTbyBQrWYuFRLIImKmgzAVi5KVSnSQ+5FZgyHlVYR1YOgtekvQW0bfRkWVaHDOv7pEitUDaJogrWUqH2x4mTkfz2f975tp170VQ64ewNqFG1n47e+07d6K00f3wWQu+8fZqnNzHpx4Jy/e9DpCBFtDxifH8a/pD2O2HB+ZHYqasWLfXt5fuZwDhYUMatuea3r0JNFa9U5NzeIT2BFB+C3RasVSydBRlbEOBt8qQsM+BButU9YzPcjceyBjsZrh10MaVMy/Jrx00+vM+eTXkFRRk1nnzIv68ffP7wm59tC+w/zl9EcoyCnEVeTGHm8jOT2J/y16usIQk9fjY8OSzVjtFjr37hC14jNF/ebrdWt4Yt5PuP1+JEFxtVS7g+lXXUOyrbQ8cflM3bieh3+aFaLwaTeZuKf/mbVW/CWNQuShiyCwn6PhHzvobSEQqZduMSIekfwqwlr5zWxFzahszD8qnkcIcZ4QYqMQIlMI8bcI561CiC+Kzy8RQrSLxnOjyS0vXUfrri2wx9swW03YE2w069CUu169Keza/93+Dof25uAqdIMEV4Gbg7uyeeOeDyt8jsVq5pSzu9P1tE7K8TcSPH4///hlLq5ixw/gCQTIdjl5/4/lVR5vdNcTePzsc0i1O9CFRqLVyt39zuCmU2uvelxo8YjUbyH+TjD1AMsgRMrr4LiEcG39Y5FEVvKsHlIaSPePGIdvxzh8F9IzT0mlV5Maz/xFsKfbJuBcYDewFBgrpVx3zDW3A6dIKW8VQlwJXCSlvKK8cWMh7GYYBivnrmH72l207tqC3sN6hDloKSXnW8dGlHS22C3MKKpZswtFw2P1gf2M+/YrCr3hBYDd0tL5/qrqVX3LYvVPu9mMFqNCQWkUIrOHgZHD0Q3gYxAJiIxFUQn7SCmRuXeB51eOVgrbwX5RlXr3NnTqMtunL5Appdxa/ODPgTHAsUIdY4Ani//9NfCqEELIevbK1jSNXkNPodfQ6m3AlvXnV5hbxI/vz+XPBetp060lo24dVmHqqaLhkGyzRey+BZBqD581u/0+3l6+jG/Wr0UiubBrd27r0xd7qapfIQRxltjG0oUWD6mTkfkvgOdHghLQArCCEIjk16rs+KVRENQJki6wno3QWwRP+JaCdwGhEhEucE1Gxo1DmDpF50M1EqLh/FsCu475fjdQWpGq5BoppV8IkQekAtnHXiSEmABMAGjTJryvZ31ACEH/kb1ZPH1ZSOqnXrw/UJrsvTnc3vshnAVOPE4vv1v+YMorP/D87Mfp3v847jmqqDRtkpLpkprG2oMHQjR97CYTN5YK1RhSMm7yV6zNOljSe/edFUuZv3M7ky+/KmYz/PIQejNEyr+BfyN968D7W1CF03YeQqs4zfpYpGdBcbcvQXAl8TQy/ha0+DuRnvllqIca4FkIyvlXiXoVdJZSvi2l7COl7JOeXn9nxne9djOpLZtgT7AhNIE9wUbTNmnc9p/wSsUPHv2M/EP5JZo/fq8fd5GHl256va7NVkQZj9/Pgp07WLRrJ75A5M5uR3h75Bi6pqZhN5lIsFiw6jp3ntafwe06hFy3aPdONhzKDmm67gkE2JJziPk7ttfGx4gqwtwdEXczwnFF1R2/4QwWgZVu81j4NtK7CkQiEGkVYVKSz9UgGjP/PUDrY75vVXws0jW7hRAmIAkopy68fpPaPIUPN/6PRVOXsWvDXtqe2Ir+I3tHTPVcPGN5WHEYwN4tB8jPKSCxifqlPR6Zu20rf/1xRkmsTxcab40cQ9+WrSJenxEXz/SrrmXToWyynU5OymhKojW8nefK/fvDCrgAinw+Vh3Yx6B27UOOG1Kyr6CABKulWmmj9Qrvr0QOnnqRrsmI+NuQha+GnxaU2V5SUTbRcP5Lgc5CiPYEn
fyVwFWlrpkKXAcsAi4F5ta3eP+xbFuzk8+emczW1Tvo2LMdYx++mHYntg65xmwxc/alp1c4ls1hJZ+CiOciNYlX1H/2FxZw5w/Twpz0jVMn89uNt0R06kfokppGl3L6C7VMSMBuMlFUql+vw2ymealWjXO2ZvLo3NkUeL0YUjKgTTteGnbe8fsSkJF7FAfDP16E3gyZ9BLkP8DRoIVApLxRq+0OGyo1DvtIKf3AncBMYD3wpZRyrRDin0KIIw1I3wNShRCZwL1AWDpofWHdoo3c1f8RfvnyN3as2828zxdyZ7+HWbdoY7XGG3XrMKyO0KWqyaxz2vCe2OOO0z/SRs7UjRswypi7zNyymWynk38vWsiV33zBIz/NIjOn8ovc4R07Y9H1kPmvIKj6OaJz15Jjaw4e4C8/ziDL6cTt9+MNBPh1x3ZunT61mp+qHmA9K7Lyp3AgbBcAoNnPRWQsRiS/HKwfyFiEsPStY0MbBlGJ+Uspv5dSdpFSdpRSPl187HEp5dTif7ullJdJKTtJKfseyQyqj7z6l/fxOD0YxcJshiHxOD28/tcP2LftAI+Nfo7zbVcyIu4q7h30OCt+Wl1unvFl94+m34jeWOwWHAl2bHFW2p7Ymvvfv72uPpIiyuR53HgjxPj9AYNd+XkMn/QB76xYyu97dvPVurWM+XwSC3buqNTYdrOZLy+9km5p6Vh0HYuu0yU1jS8uvYICr4ddeXlIKXl3xbIwG7xGgJUH9rEjN1j9K73LMA7fipE9GiP/OWTgYM0/fC0itGRI/DtgJRiUECDsYB0ClrOOXidsCOsAhPUMVTlcA1SF7zFIKRluviJy/10BSakJ5B8q4NgfmdAE3ft34blZj4WIvJVm9+Z9bPljG83aZ9ClT0fVwOU4ZsnuXdw09Vuc/tAwhc1k4oxWrZm3Y3vYyqBlQiLzr7+5Sv/fs4qKkEjcfj93fD+NzJxDCCFIcziwm8xsjrCiSLBYeHvkhZzWZAXkP8ZROQZzsNo27TuE3qyqH7lOkf6tSNdUkEUI61Cw9FV/L1WgUap61oTDB3KZ9uYsNE0jYEQo4LJZcDs9lH5XSkOycdkWPnz8c4ZefTZzJs3H5/Vx9iWnc8rA7iW/tK06N6dV5+Z18VEUtUzflq04q01bFuzcUfICcJjMjO7ajdlbt0QMCWU5i8h2OkmPq3w/2vS4OPyGwYAP3iHLWVQy7u78fExCYNY0fKXqB7yBAF1TkyD/KUJ1eHwgC5CFb0DiI0jnp+CaApgQjrFgv4hgvWbsEaYOiIS/xtqMBo9y/gRn5Xf1fxivyxuxctfqsNCiUzO2rd4Z8X6/18/0N2cz/Y1Z+Dw+pJTM+nAegy4/g3vfvU3NWhoYQgheu2AUP2Ru4tsN6zBpGpd1P4kh7TuyeM8uDrnCc9GllGFFWqU5WFTI2qyDNIuL54Tijl6/7NhGYfGG7rFoQqBrGoaUJbUDnRKdXNq9K4mmgwSLrUrjB8+vyJxrwbeeIy8HmZ8JnvmIlP9V+WehOH5Rzh94454PKMp1hsXuNV3DZNYZOu5smrbPYNeGL/B7I+dzl24j6S7yMO/L3xh+w2BOOuuEWrNdERt0TWNkl26M7BLaZP2Gnr15bsEvIaJrFk1nULv2xJdRjSul5J/zf+azNaux6DoBQ9IhJYUPx1zCgcJCAjI8VdhrGIzu1AWrycSO7OU812caLeIKMAkdDieCDJeSAIJ9dP0bCV0VuMAzD+lbhzDXvBlMsM91brCBu1BJDfUV5fyBlXPXRty0NQyDj7e+SfaeHB654OkyHb/QBEKIsH4AHqeXXycvUc6/EXH1yT3YmJ3F1+vXYtV1fIbByRlNeeHc4WXe8836tXy59k+8gUDJJu7GQ9mM+/Yr/IYRMe/fYTZzTvsOjOrSEXnwCZCHKdHbN8K7zgU5osIZKd/CAO8yqKHzl54FyLzHwDgICKTtfETiPxCao0bjKqKPcv4Ewzped/hMyWwx40i087cTn6Lgc
FHEe4UmsFjNGEgMV6jz13SB1a6yERoTmhD865xz+Uu/01mflUWrxEQ6NiknsR/4YOWKkJUCgN8w2HgoO+L1Fl2nZUIi53XqAp7ZBCWWK0rcEBB/GxjuYn2cUjn1wgxaWgVjlI/0rUcevp2QVYX7R6SRh2jydo3GVkQf5fyBC8YP5duXZ4Q0aTdbzQy5egDLZq6KWKF7BGlIPK7IS2yT2cSQqwdE3V5FbPEFAnyfuYmZmZtJtFoZe9Ip9GgWupmfERdPRlzlCo/yPWX0wI2A3WTixp69mdD7NCy6jnQfjJwbH34neH8PfpV2/ACYwTakUjZIaYB3Efi3gqkzWPoFGxQVvUtYRy884F2EDOwF32pk4ZvBngCmduC4EmEbVW82mhsbyvkD1z55OTvX7Wb57FXoZh2f109KsyQ6nNKG7L2HMcpQZIyEbtIw2ywEfAHGvzCOtt1bV3yT4rjBFwhwzZSvWXPwAE6fD00Ipm7awINnDOD6nr2qNeY57Tvw+ZrVYZk7kUiy2rjvjKM571hOpXLlOl7wLiZ8I1gHvSUi+XWEKDtV+QjSOIw8dDUY+4IvHaGD3g6afAz+bUSWdTYjC98F1zeUKHL6ciBvBTL/GWjyAcJ8YiU+gyKaqDz/Y9i2Zif/vPQlsnYfwuP0oOlaWBy/InSTzl9eH8/po3qT0jS5lixVxIrvNq7n0Z9mh+X4W3WdRTfdUuWuXBBMAx312cfkuz24A350IULUP4+ld/MWfHXZ2JBjxuHbgqqWpVssVgozZKxG0yo3+zZy7wH3TEJfImawXwjCBs7PCH/BmAnOM8vYixDJiIwFCGFBGnngXRIs7rL0U0Vc1UDl+VeDJTNWcHBXNt7iME5VHT8AAgZe1p+4pMrncytqj42HspmZuRldCC7o0pX2ySk1Gu+HzZvCHD+AWddZsmc3wzt2rvKY6Y44Zl59PZ+tWc3CXTtok5RMtrOIX3fuCNnstZtM3HFa/7D7RfIrSOfn4PoiqI9jZBVLHx9JULASDMdEeqEEECJA2X14jyKlBPcswp27D9wzEGk/IF1TQBZxdAVgB9tw8MwqZ1vCD56FGIH9UPBMcP8BgjalvIuw9KjQNkXVUc7/GGZ/NK/E8VeXZu0ycCSqzIb6wH8WL+SdFcvwBQIIIXh16RIePHMAN1QjPLOvoIAZmzeypyAfQQQ/JikzlbMyJNls3NqnL7f2CerUePx+nvxlLlM2rEMIgc1k4uEzzw5T9QQQwoSIGwdx44KmGDnIgv8GZ+jCAvZLwbsCfIvDLTd1qsLsWnL0hVL6lB+hN4fUb5AFLwZn71oiOG4C2wXg/r6cYQ2kfwMUvgF4QB7dA5GHb4KM39QKoBZQzv8YRA166mq6hsVq5p63blFFXfWA9dlZvLNi2dGZs5T4MXhh4XyGdexEy4TESo81bdMGHpo9EwOJLxCIOIG1mHT6tYze/o7VZOLZIcN47OzB5LpdZMTFY6rk76fQmiCS/glJ/yw5Jv2ZyOxLCWYGBQjuE1gQif+otE1CaEjLGcFmLSGxfR2sg4PXmNohUsJllw3becUvgEib0wHw7yR8sxhKGrXYBlfaTkXlqFfNXGLNeTcMrlJqZr8RvRh12zC6n96FYdcO5NXfn6XHILVxVR+YmbkpoviaQPDT1i2VHiff4+GhOTNxB4LKmcc6fpuuY9E0zJrGqc2asz47KwqWh+Iwm2mRkFhpx18mestghk2JXqgGwgFaRpWGEYn/AJHM0abtDtBSEYmPln9f0tNgG3XM849gh7jxBF8KEcKsUoIsrJKNisqhZv7HMObO8/j9hz9Yv3gT7qLy0+8sNjMnn30CVzxwYR1Zp6gKmtAi91QWVKoVYpHXS47LxYp9e9HLuP7ICs8XCPDz9m0s3LWTpwYN4ZLuJ9XA8tpBFr4B/i0cnXn7QeYi8x5ApH5e6XGEqTWkz0G6poN/Y7Ai2DaiwiIuIayI5OcxAg+D+9vgbF6LR9ivQFhPR7pnIT2zI7Rp9
IGl4r4Ziqqjsn1KIaVkzYINLJ25kmUzV5L5x1YiVNfjSLAzMfMVktOr1qpOUTdk5hxi9OeTwqpjrbrOL9ffXGYOvjcQ4Ml5P/HthnVoQkMiMaSMuIrQhAjT3HGYzSwbfxs2U/1o1OM3DLKKishwno+QkSSdzUFNfK3yYbDaQMoA8vDN4Puj+AUgABvE34EWPyGmth1vqGyfSuAscGGymLAc01FLCMHJA07g5AEncNFd53PzSfdScLgwROY5MS2BZ394VDn+ekynJqnc0+8M/r14IRD8/yql5KnBQ8stvnpq/s9M2bi+uIdu2X15BURU79SEYM3Bg/Rp0bKmH6HGfLHmT55b+AueQIC55xeSUWYWajWy2qKMEMHMHjyzkK4fQItD2C9HWE6NtWkNlkbp/Dcu28JLN73OzvV7EJrg9NF9uOetW0hICXUK3/x3Oq5CV5i+v7vQTctO9VsTXQHje5/G+Z27MHvrFnQhGN6xM03jy3b8br+Pb9atxR2ouGLWquu4I6wGAoZBQjltHA0pWXPwAAHD4OSmzWoeyy+Dn7Zu4Z/z55bIRkzb2ZFxndZi1Y919AJMXYNNVOoBQuhgOx9hOz/WpjQKGp3zP7grmwfOeRJX4dGCmEVTl3FwRxavLnku5NoVc/7E5wl3BCarmW1rdnHSmd3CziliR6HXS6HXQ0ZcfElcv1ViUqVTO/PcngoVco4QyfFrQtAyIZEuZWj5rNq/j1tmfEeR1wsIzLrGK+eP5MzWbSv51MrzytLFIXpBr6zrzYBmu2npKCTO7APsICyI5Bei/mzF8UGjc/7T35yF31tKRMvrZ8e63WxavoUuvTuWHG/aNp3MFVvDGrj4vX5Sm9esWEgRPYq8Xh6ZO4uZWzIRCBKtFp4aPJRhVSy4SnM4cJhNeCox8z8Wk6Zh0XXSHA7eG31xxFTfIq+Xa6d8TYH3mHRGH0yYNoV5192M1wjwvyWLWLBzB6kOBxN69WFE567VThveW5Af8n2hz8roWZdwXuvdPHVWcxIdHYo3auORRiHS9S3414DeBeG4pN6sBhS1R6NL9dyxdjc+b/gft6Zp7NsauiF22f2jsZRK/TRZTHTr24nmHZrWqp2KyvOXH6czc0sm3kAAT8BPltPJPTO/Z9X+fVUaR9c0HjlrIHZT1eZEVt3EpIsu4+drb6J1UuR9oFlbMiNKNhhSMmn1SkZ9+jGT169lX2EBaw4e4KE5s3jl98VVsuNYejRtHpbt5Jc68w90Ia7JQwjHFUHHH9iHzB4GhS+C61sofBmZdS7SX2/bbCuiRKNz/t3P6BLm0AH8Pj8de4Quv7v378K979xKfEoctngbZquZnoNP5MnJD9SVuYoK2FdQwG+7doZl47j9ft5avrTK413S/SReu2A0pzZrTkZcHKdkNMNSge6N3WyiZ7Pm5c7SD7td+CPIhXgCAebt2EaRzxvycnD5fbyx7HcKqqD4eSz3nX4mdrM55AVgN5l48IwBIfsMMv9pMHJAHtHdcYPMD2ryKxo0jS7sc/7NQ/j6pWn4vf4S7R6r3UKf83rSqkuLsOvPGTuAgZedwd4t+4lPiSclQ2X41Cf2FxZg0fXi7JyjSGB7Xm61xhzUrn2JjEJWUSHnTvoQrydy5o9V17m4W8WFff1btUbTRFhijcNsJs/jjqjoadE1MnMOcWrz8N/LiuiWls5Xl43lpd8WsOrAfprHx3NX39PpnJrKmoMH6JqahlnXwTOf8GwfCb7lSBlQcssNmEbn/BObJPDasud596FJ/P7DH9gcVkbcei5j/3ZRmffoJp3WXWOfuqcIp2OT1Ig5+CZNo2/LVjUe/9Gf5+D0hssOaMV6O93S0rm7X8VFSFJKAqUcvC4EvZo1x242sysvL2yz2RsIkFFOdlJFnJCWzrujg7/Xe/LzmTB9CttyD6MLDV0Inh1yLsMTzSAjqYHqhFfjKhoSjc75A2S0TuORT/8aazMUUSDRauWmU/vwwcrlJdktuhA4TGYm9
KqwzqVc/IbBvO3b8EeI1Zs1nQ/GXEyf5i0r3JQ1pGT8tClhs3tdaFx+4sm0SEhkwc4doX1/dZ3TWrSqkgZRWUgpGfftV+zKzwupTbhv9o/0u2Q4yUwlVFfHDLbhCNHoosKNCvV/V3Hcc9/pZ/LU4KF0SU0lze7ggs5dmTp2HC1q6DillBF7OwPomuC0Fq0qlY2z5uABCrzhsXuvEeDLdWvo1bwFLww9jxSbHbvJjEXXGdi2Ha9dMKpG9h/hj/37yHIWhRWleQMBXll3BphPIqjVYw/q/Zg6IRKfiMqzFfWXRjnzVzQshBBcfMKJXHxCdEX1zLpO35atWLJnd4jjNAnBsA6dKj2Ot1hSOhKe4tn+iC5dOa9TZ3bn55Nks1arKUxZZDuLIuoZGVKys8CLaPIZ+FaDfxOY2oO5t1KmbQQ06pl/blYeq+ev4+DO6KsxKhoGzw4ZRrLNhr1Yq8dhNpMRF8/DAwZWegyzpuH1h+9L2E0mxnQ9oeR7XdNom5wcVccP0LNZ84j7InaTiUFt2yOEQFh6IByXISx9lONvJDTKmb9hGLx29/v8+N5czFYzPo+PXkNP4dHP78HmqLiPqaLx0CYpmV+uu5lpmzaQmZPDiekZXNC5C9ZK1gI8Nf9nPluzGp8R6nwdZjMnpWdwaR0ogGbExXN9j158vHolruIuZFZdp2l8fLUUSKVvHfgzgxLRppPVy+I4pUaqnkKIJsAXQDtgO3C5lPJwhOsCwJ/F3+6UUo6uaOzqqnru23qA+V8vxu/zc8aY02h/Upuwaya/PIP3H/0Mj/NoHMGBBUoAABXkSURBVNZiMzN47Fnc/97tVX6movGxJz+fx+fN4dedO9CFxsguXXjs7MEkWm0l1yzft4drv/06ZCMXghvSLwwdzuiuJ6DXkrZPaaSU/LhlMxNX/kG+18N5HTtzQ89e5eoQhY/hQuZMCIaIhAhq7Zu7IlLeR2jVz0pSRJfKqnrW1Pm/AORIKZ8TQvwNSJFSPhThukIpZZV+O6rj/Ke9OZM3752IETCQhsRkMXHxX0dw49NXhVx3dfvbObgjPNRjtpn5LnciZkv9kONV1E8KvV4GT3yPXLerpDDLrGl0apLK9LHXlMyE//nLz0xctSIshdNhNvPkwHPqZNYfTYz8f4HzC4LdwI5gAfsItKTnY2WWohSVdf41nXaMASYW/3siELPOJtl7c3jz3ol43T78vgCBgIHH5WXyyzPI/GNbyLWFh4sijmEEjBr38FU0fL7buB5nqYpcn2GwIy+XJXt2lxzThCgzU16vQhplntvN1sM5EeP20UYaeUjXt8j/b+/Ow6Oq7z2Ov7+zZV9YQkBAQYQAIm4ICGoVRBEVlGurdXn0UdtetZb2tvda9d62Wu2l19bWDS0qaq/eom2xxRUXUNzYFEQRUBaVfQtLQpLJzJzv/WMGTDIzCSGZnEnm+3qePGTOnMz5hCf55szv/M73V/UcGtla/8nqWdQv/AC1UP1S0llRJn21tPiXquqBBipbgWQNb7JFZImILBCRpH8gROT7sf2W7NjRvIuwC1/8EPHE/6qFakLM/9sH9bYdf+bghOOUpUeV2OLrJiFHlTfWreHmV17g8aUfxg3lHNhnTfmug48nlg0kkODaQMRRzuobvxB7QzXhEFNefZERjz/CxJlPc/L0afzvx0tb9o00wql+A91+Orr3DnTf3eiOcTj7n/hmB012YhQmwZL2Js01edVKRN4AEjWvr7dop6qqiCT7CThKVTeJyNHAXBH5RFXjFlJV1enAdIgO+zSZvn5OEt6RKBJX6L/32yv5+K0V1FbXEg5F8HgEf3aAKQ9/3y5emTiqypRXX2Te+vVUxS6YJuIR4Zg67ZyHlnbnByedwiMfLkKJjvWrwj3jxh/SjJ5b3niN19aupTYSOXjWP/W9+RxRWMjYvv2a+OrmUWcv7P03oMHdvhV/QAOjEf8ACIyG2rep3w5CYlNDM3riYLvUZPFX1bOTPSci20Skh
6puEZEeQKJ14lDVTbF/14nIW8CJwKGvon0ITp04jIemzIjb7g/4OPPSUfW29S7ryaPLf89ff/8CKxd8Tu+BPZn84/PZvGYb037yBN37dGPsFadT1NXdpe1M29lfW8vOqiq65+fHzeRZuGkj875svPD7PV6OKipmRIOWElNGjmLiwEHMXb+OLK+Xc4/pT0luXpN59gVrmLP2i7ihnupwmGmLFx528XdqP4a9P4fIVyA5kHcdnvwbITiXxAMBIbRmNuL/GVJ4O7praawdRA2QFV0ToOiOw8pi3NXSqZ6zgauBqbF//9lwBxHpBFSpalBEugKjgVZfQaJTaTE/evh73H/DoyCCOg4iwndvm0zf4+IXy+h2ZAk33XctEF3Occro29n65Q5qKmsI5AR46hfP8j9v/pKyYa17hmXSS9hxuPudt5j56Sd4Y8OGNwwbzo3DRhx8Fzh3/VqqQ/GF3wMggt/j4YIBZfzXGWclfOfYt7gT1514crNylVdX4/N4Eo7zb62sbNZrHeDUfgzl3/5mg1ZA5R9xQquRrNEkHrpxDg73iO9IKHkNrfpbtPe/ryzWGrrzYeUx7mpp8Z8KPCci1wFfAd8BEJFhwL+q6vXAIOBPIuIQ/X2ZqqqftfC4CZ179VmcfPZQ3p21iHAozKkTh9HzmB5Nft3Mqc+z6YuthILRX/Da6lpqgalX3seMlffZUFAHdu8H7/Lsik+iC7jE6uy0xQvpmpvHpcceB0BBVhY+jyeuN0+238+dZ45t9TuLAXoWFCa8KOwVSdqwTp39UPMqONvAfzwETq0/HLP31sQHC76C5t9M4rV8s5Hscw4+Ek8xkn99M74Tk65aVPxVdRcwNsH2JcD1sc/fB45ryXGao2vPLlx0c/PWAJ03872Dhb+u7V/vZMfGXXTr3bW14pk0EnEc/rx8GTUNLt4eGFo5UPwvKhvMw0sWxbddVhjXjDYPzeH3ern1tDP49fx59RrW5fj9CbuIauhztPxy0DBQHR3S8ZVB56cQid17EPky+QHDa6Hgp1BxLxAi+ocgG3IuBH/z3rWY9iEj7/BtyOtL3LNcFXx+62feUQUjkYO9dRraWVV18PPeRUVMHXsOP3/ztXoLoTxy/qRm3STVXJcNGUr3/AKmLV7IlsoKTjmiJz8acSp9iuOXENU9U0DrLN2oVRD6DN3/BJJ/Q3Sb5Nbfpy5fXzz+c9HAKLRmNmht9Izf+vx0WFb8gQnXj+XPv3qOYJ05/uIR+h7Xm87dba3ejirH56M0P5/NFRVxzw3p1q3e44llgxjTtx8fbPgan9fDqF5HHnKLh5aou7BMMhrZApGNCZ4JRpdmPFD8866Dyj/E7+bpjMc/AADxD0D8P2thatMe2Pws4OIpEzju9EFk52Xhz/KTU5BDp9Jibv/LT9yOZlJIRPjFGWeRXaeIC9E/CredFt+4LT8QYFy/Yzirz9FtUvhbmyf/Bsi6oMHGLtD5H+4EMq5qUXuHVDrc3j6HS1VZvXgNqxatoaRXF0acfxI+f/v7BTfxVLXRoYsFGzdw/8IP+HLPbgaXdOMnI0dxbLdk9ysmt31/JU8u+4gPt2ymX6fOXHfiyfSrM+8/lZwd50Gk4ezpbMi/IVr06+7rVELw/dhQT/82yWfaTpv09kmlti7+pmMJOw4PLvqAJz9eSkUwyLHdSvnVt8Zw0iGuh7umfBe/e/9dlmzZREluHjeeMoILBwxMuv/Xe/cwaebTVIfC1DoRvCIEvF4enziZkb16t9a3lVT0gu8VoCGgBiQbfAORzk8hYp1qM4kVf5PRbnvzNf6xemW9mTw5Ph+zLr2CIwuL2FJZQWlePnmBQNzXrttdzqSZT1MVCh2c+Z7j8/Gj4afyg2HDEx7vppdnM2ftmrjVsvoUF/PmVde2yUXT6FTPObGpnkMhMMou1magQy3+Nq5hOpzd1dXMWvVZ3A1SwUiEKa+8yIZ9e/GIEHaUy48bym2nfatea+UHFi2gJhyud8tTdTjM/
YsWcPUJJ5Lti+/6+t6Gr+MKP8DGffvYFwxSlJ0d91xrE08e5E5O+XFMx2AXfE2Hs2HfXgLe+Cm6jipflO+iOhxmfyhEMBLmyWUfcfKj07jn/XfYF4x2rFy6ZXO9jp0HeAS+3rs34TELAomHVjwi9S4oG5MurPibDqd3YRGhJO2PG5Z0BfYFgzz20RImP/cMwXCY3kVFCb825Dh0y0vcl+eaE04kp0GRD3i8jO/Xv13ODDIdnxV/k1Jhx+HtL9fz95Ur+GrPnjY5ZqecHCYPOrZZZ9whx2FrZSUvfbGam04ZGVfIs7w+zuvXP2k3zmuOP4kLBwwky+ulIBAg2+dj2BE9uWvMuBZ9L60hXa/rGXfZBV+TMut2l3P5rOeoqg3hqBJRZfKgQdx11riUX4iMOA4PLV7Ak7FlC4eUlFIVqmXN7vJGv+5fBg7mnnPO44XVq7hz/jwqa2sBZWLZIO48c2yTZ/HbKiv5vHwnvQqL6JvgTty2pLXL0H13QPiz6N29OZcjBT9GxFaq68hsto9xlaoy7uknWL97d72hlhyfn9+MHcekskFtnunDLZu46vm/xfXyOSDg9XLDsOFMGRFtAe6osmP/fgqzssjxt6+CqeG16M7JQHX9J6QIKboLyT7XlVwm9dpqGUdjElq3u5wtFRVxY+zV4RBPL1/mSqaTe/TkuUsuY0yfvngSvPPwiYfvDP6mB6FHhNL8/HZX+AF0/3Qgwcpbuhfd8+84lY+2eSaTXqz4m5SoCYcTFlggYW/8tjKkWymPTZzM21dfz9BupQS8XnJ8PnrkFzBj0mR6FBS4lq1VhVZxsEd1nBqofBB1qpI8bzKBTUMwKVHWtQS/1wsNCn2W19fonbJtpWdhIf+47Eq2VVZSEw5zZFFRwusQjirVoRC5fn/7umHKfyyEPyfpHwDxRls8ewa3ZSqTRqz4m5TweTzce84Ebnp5NmHHIeQ45Pr8HFVczFXHn+h2vINK8/MTbndUmbZ4AdM/WkJ1KETnnFxuPe0MLhrYPoql5H0PrXk52to5Ea0FT0nbhjJpxYq/SZkz+/Tl1Suu4dkVy9lcWcEZR/ZlQv8BCW/ASjcPLPyA6R8tPriQyo6q/dw293XyAwHOTtECLq1JfH2h89Ponlsg8kWDZwOQdTriteKfyaz4m5TqXVTEz0ad7naMZgk7Do8tXXKw8B9QEw7zhwXvt4viDyD+IUjJSzhV/4SKu0GDQASyxiBF/+12POMyK/7GNFARDMYv2RizcV/i9g7pzJM7Cc05HyKbwVOMeArdjmTSgBV/YxoozMoi1++PawwHMKBL+1zPWcQHviPdjmHSiE31NKYBr8fDT0eOjmvxkO3z8e/tbAjLmGTszN+YBK4YegKF2dn8ccH7bK2spKxLF24ZfQbDe/ZyO5oxrcKKvzFJXDhgYFrck2BMKljxb2D1krXMfuhVdm0uZ8QFJzP+2jHk5KV+IQ5jjGlLVvzrmPPUPB646TFqa0Koo3z63ipmT5vDQ4umkluQuJWvMca0R3bBNyZYHeTBm2cQrKpFnWg7smBVLdu/3skLD89xOZ0xxrQuK/4xa5Z+iccT37ultrqWd2YtdCGRMcakjhX/mLyiXJxw4ht7Crsk7v9ijDHtVYvG/EXk28CvgEHAcFVNuPqKiIwH7gO8wGOqOrUlx02Fowb3orRPCRtWbcJxvulCn52XxcU3T3AxWcdUFQrxu/ff5flVK6iNRBjYtYQLBwxkUtkgOuXY9RVjUq2lZ/6fApOB+cl2EBEv8BBwHjAY+K6IpF1rRBHhrhdvpUe/UnLys8ktzCGQ7eeyWy7ilPHp04WyI1BVrnr+r/zfpx+zNxikOhxm6dYt3Dl/HqNm/IlZK1e4HdGYDq9FZ/6quhJoqs/5cGCNqq6L7TsTmAR81pJjp0L3Pt14YtX9rF68hr07Kxg4/BiKuloflNb20dbNrN61M2H7hGAkwu1z32BEr970LLD/e2NSpS3G/HsCG+o83hjbFkdEv
i8iS0RkyY4dO9ogWsIMDBzenxETTrLCnyKrd+6ksbWjFeWVLz5vw0TGZJ4mz/xF5A2ge4KnblfVf7ZmGFWdDkyH6ALurfnaJn30Ke6UdIlHgIjjEIwkXmTdGNM6miz+qnp2C4+xCehd53Gv2DaToUb26k2P/ALW79lNJME7AL/Xy5g+R7uQzJjM0RbDPouB/iLSV0QCwGXA7DY4rklTHhFmXnIpZx/dj7rn/wLk+HxcPmQog0q6uRXPmIzQ0qmeFwMPACXASyKyTFXPFZEjiE7pnKCqYRH5ITCH6FTPGapq0zkyXOecXB4+fxJhx2HZ1i289MVqVJWJZYM4qccRbsczpsOTxi68uWnYsGG6ZEnC2waMMcYkISIfquqwpvazO3yNMSYDWfE3xpgMZMXfGGMykBV/Y4zJQFb8jUkRVQfV+BYWxqQDK/7GtDKN7MDZfSO67Vh02xCc8uvQyBa3YxlTjxV/Y1qRahgtvxSCbwGR6Efte+iuS1CtcTmdMd+w4m9Mawq+Bc5uoG5vIgd0P9S86lIoY+JZ8TemNYXXgwbjt2sVGl7X9nmMScKKvzGtyd8fJCt+u+QivrK2z2NMEi3q7WNMW9gXrGHWys/4ZPs2BnUt4ZLBx1KcnaZLPQZOB08PiHwFhGIbfeDpDNnj3ExmTD1W/E1a27hvLxc9+wzVoRDV4TCv+nxMW7yQv196OX2LO7kdL46IF7r8Ba34LdS8DOpA9jik4DaiTW2NSQ827GPS2h1vz2NPTQ3V4egF1OpwmL3BGv5z7usuJ0tOPEV4in6Dp3QZnu7L8RT/HvF2cTuWMfVY8Tdp7Z2vv8Rp0HlWgYWbNsZtN8YcOiv+Jq35PYl/RL3iIflCkMaYpljxN2nt4oGDCXi99bb5PV4uGFCGNLIOsDGmcVb8TVq7ZfQZDOlWSo7PT67PT67fT1nXrvzyW2PcjmZMu2azfUxaywsE+Osll7F821Y+L9/F0Z06cVL3I+ys35gWsuJv0p6IcHz3HhzfvYfbUYzpMGzYxxhjMpAVf2OMyUBW/I0xJgNZ8TfGmAxkxd8YYzKQFX9jjMlAomnaH0VEdgBftcGhugI72+A4rcGypkZ7ygrtK69lTY3Gsh6lqiVNvUDaFv+2IiJLVHWY2zkOhWVNjfaUFdpXXsuaGq2R1YZ9jDEmA1nxN8aYDGTFH6a7HaAZLGtqtKes0L7yWtbUaHHWjB/zN8aYTGRn/sYYk4Gs+BtjTAay4g+IyK9FZLmILBOR10TkCLczJSMi94jIqlje50Wk2O1MyYjIt0VkhYg4IpKWU+hEZLyIrBaRNSLyc7fzNEZEZojIdhH51O0sTRGR3iIyT0Q+i/0MTHE7UzIiki0ii0Tk41jWO9zO1BQR8YrIUhF58XBfw4p/1D2qOlRVTwBeBH7hdqBGvA4MUdWhwOfArS7nacynwGRgvttBEhERL/AQcB4wGPiuiAx2N1WjngTGux3iEIWBn6rqYGAkcFMa/98GgTGqejxwAjBeREa6nKkpU4CVLXkBK/6Aqu6r8zAPSNur4Kr6mqqGYw8XAL3czNMYVV2pqqvdztGI4cAaVV2nqrXATGCSy5mSUtX5QLnbOQ6Fqm5R1Y9in1cQLVQ93U2VmEZVxh76Yx9pWwNEpBdwPvBYS17Hin+MiNwtIhuAK0jvM/+6rgVecTtEO9YT2FDn8UbStEC1ZyLSBzgRWOhukuRiwyjLgO3A66qatlmBPwL/ATgteZGMKf4i8oaIfJrgYxKAqt6uqr2BZ4AfpnPW2D63E31r/Yx7SQ8tq8lcIpIP/B34cYN32GlFVSOxYd9ewHARGeJ2pkRE5AJgu6p+2NLXypg1fFX17EPc9RngZeCXKYzTqKayisg1wAXAWHX5Ro1m/L+mo01A7zqPe8W2mVYgIn6ihf8ZVZ3ldp5Doap7RGQe0Wsr6XhhfTQwUUQmANlAoYg8rapXNveFMubMvzEi0r/Ow0nAKreyNEVExhN9yzdRVavcztPOL
Qb6i0hfEQkAlwGzXc7UIYiIAI8DK1X1XrfzNEZESg7MmhORHGAcaVoDVPVWVe2lqn2I/rzOPZzCD1b8D5gaG6pYDpxD9Ep6unoQKABej01NfcTtQMmIyMUishE4FXhJROa4namu2IXzHwJziF6QfE5VV7ibKjkR+QvwAVAmIhtF5Dq3MzViNHAVMCb2c7osdraajnoA82K//4uJjvkf9hTK9sLaOxhjTAayM39jjMlAVvyNMSYDWfE3xpgMZMXfGGMykBV/Y4zJQFb8jTEmA1nxN8aYDPT/FPnS9Io0NHMAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.scatter(X_feature_reduced[:,0],X_feature_reduced[:,1],c=target)\n", "plt.title(\"PCA\")\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [], "source": [ " lda = LatentDirichletAllocation(n_components=2)" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [], "source": [ "X_feature_reduced = lda.fit(dataframe).transform(dataframe)" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEICAYAAACktLTqAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAG6ZJREFUeJzt3Xl4FfW9x/H392SFEBZJEAVkExdqXTByu+mVSgW3uNBaUFypKAXbqnVpa13o7WLVVqt0QW+1IlWxVZu6FKnFalGUUMEFl0ZAQQUispP1nO/9I9EbIcsEMueczPm8nifPk5n5/XI+zxA+z2TOnBlzd0REJFpiqQ4gIiIdT+UuIhJBKncRkQhSuYuIRJDKXUQkglTuIiIRpHIXEYkglbtkDDNbaWajd1h3tJklzGxr49dqM5tjZkc0M9/MbLmZLUteapFdo3IXgffdvRtQCHwOeAN41syO2WHcUUAfYEhz5S+STrJTHUAkXXjDx7VXA9eY2R7ADUBJkyHnAH8BujR+vyjpIUUC0pG7SPMeAkaYWQGAmXUFvgrMbvwab2a5Kcwn0iqVu0jz3gcM6Nm4fBpQAzwJPAbkACekJppI21TuIs3rBziwsXH5HGCOu9e7ezXw58Z1ImlJ59xFmncq8G9332Zm/YEvAyPNbFzj9q5AvpkVufuHKUsp0gKVu2SaHDPLb7L8yf8BMzNgb+AbjV+ljZvOAt4CRu3ws54DJgC3hZZWZBep3CXTPL7D8gJgbzPbSsM59k00lPbR7r6wccw5wAx3X9N0opn9tnGbyl3SjulhHSIi0aM3VEVEIkjlLiISQSp3EZEIUrmLiERQyq6WKSoq8kGDBqXq5UVEOqXFixd/6O7FbY1LWbkPGjSI8vLyVL28iEinZGbvBBmn0zIiIhGkchcRiSCVu4hIBKncRUQiSOUuIhJBKncRkQhSuYuIRFCnLnd3xxPb0Z0tRUQ+rdOWe2L7A3jl5/F1h+PrRpLYdpdKXkSkUad8WEdiexls/jFQ3bDCN8GWW3CysIKzU5pNRCQddM4j92238kmxf6IKts7Q0buICJ213ONrml/vG4H6pEYREUlHnbPcswY2vz7WF7Oc5GYREUlDnbLcrfuVQP4Oa/Oh8LupiCMiknY6Z7nn/TfW63bIPgDIh6yhWM+biHU5KdXRRETSQqe8WgbA8o7C8o4KNNarn8K33ADxVRDbE7p9h1jXU0JOKCKSOp223IPy6vn4xkv45OqaxPuw+VoS1BDr+vWUZhMRCUunPC3THr7lRpq9bHLzD0ms2Z/EujEk6rakIpqISGgiX+7EV7Wy0SGxAtYfTmLrQ0mLJCIStuiXe1a/YOO2XkVi/ZkkEtvDzSMi
kgSRL3crvJSdL5tsQd0iWHc4iZoXQ80kIhK26Jd7/rHQ46cQC3gETxw2TCTx4Wkk4joXLyKdU+TLHSDW5QRifeZD73nBJ9W/CpWHk6g8VferEZFOJyPK/WOxnIGQf277JsVfw9fuT2LbI6FkEhEJQ0aVO0Cs5/eh5xNA1/ZN3HIFiY1X6CheRDqFjCt3gFj+UGJ9l0DP3wBZwSdWP4KvPYDE5t+Elk1EpCNkZLl/LJZ/DLbnUuhyNmABZzls/yWJNQeRiG8NM56IyC7L6HIHMMsl1uNq2ONR2nUUTy1UjiBR/XxY0UREdlnGl/vHYrnDoM9SoKh9EzeeQ2Lb7FAyiYjsKpV7E7FYLrG+z0HhT9s3ccv1JCpPIlE1N5xgIiLtpHJvRqxgHLG+b0H++MBzPP4mvvFitr93BNVVa0NMJyLSNpV7K2I9p0NRsHPqBphBXmwTORuPZMOqI0kkEuEGFBFpgcq9DbHs3g1H8dkHBRpv1vDVPXsttR8MJ1FTHnJCEZGdqdwDihU9BMUvQNawQOPNICeWwD86g1eWX60PP4lIUqnc2yGW1YtY8WNY0d8hv+3H9H18FD+8yxwWLPsOifiaJKQUEVG57xLL3odYz59Dlwm4Q5CD8s/v8QReeRSJDVN1FC8ioVO574ZYj+tJ9PoTTusF//ERvAHUzMO33Iz7jo/+ExHpOIHK3czGmtmbZlZhZlc1s30fM5tvZi+Z2ctmdnzHR01POfkHY3u8RFW8S+CjeLbPxNceTGLtKBLxytAzikjmabPczSwLmAEcBwwHJpjZ8B2GXQ3McffDgPHArzs6aDrLyiugW/+l1PT6GwmyghU8gL8HlV8kUfVkqPlEJPMEOXIfCVS4+3J3rwXuB07eYYwD3Ru/7wG833ERO4+u+UPI7ruMirpJrNjSK3jJb7qYxKbr8MTmUPOJSOYIUu79gFVNllc3rmvqOmCima0GHgcubu4HmdlkMys3s/LKymiejjAz9t/nSgbvu5CqrGMJ1u8OVffhlcfg9ctDTigimaCj3lCdANzt7v2B44FZZrbTz3b3me5e4u4lxcXFHfTS6SlmRrc+t2OF10KsT4AZDr6ZxKbvs6m6moSuqBGR3ZAdYMx7wIAmy/0b1zU1CRgL4O7Pm1k+DbdXXNcRITuzWMGZUHAmidoX4aOJbYx2EjUv8V/3/YraRBaH9NmTP59+BrGYLmoSkfYJ0hqLgGFmNtjMcml4w7RshzHvAscAmNmBQD4QzfMuuyiWOxIKp7c5LoGRaDxoX7puLSPv/C2rN28KOZ2IRE2b5e7u9cA0YC7wOg1XxbxmZtPNrLRx2GXABWa2FLgPONf1SZ2dxArGQ583IO9kmnswSG08xvz396He/3/bR9VVjL7n98yt+E8Sk4pIZ2ep6uCSkhIvL8/cm2ol4lthwzlQX4GTYFtdgjVVBZwxv5SParrsNL4gJ4fyC75JXnaQM2kiElVmttjdS9oap6ZIkVhWN7z3n6DuJah7g6l/XcRz6/rhLTzLNeHOxIcfZEtNDSP37s9lX/gSPfLzk5xaRDoLlXsKmRnkjsByR1DUqwe+7o0Wx1bV17P4g4aPD7z10Xpmv7qUq488mvMOOzxZcUWkE9FlGGnil2NO4Lh9g91OGBo+NfajZ59m6K9uZtk6PflJRD5N5Z5GZhxfSsW0Szj9wM+QG4uRn51Nflbrf1w5cOL99zLpL39OTkgR6RT0hmqaWr7hI55b9S7b6uq46blniQf4dzpkzz3539LT2KNL1yQkFJFU0BuqndyQXnswpNceuDv3LP03H2zd2uacpWvXUnLHbxjaaw8eP+NscrJ2vtxSRDKDTsukOTPjj+O+3ubpmabe3vARB864hfkrdJ8akUylcu8EBvboyZILp/KlAfsEnpMAJv31Ya59+ik9+UkkA6ncO4nc7GzuOfVr/PjLx7Rr3qyXl3DgjFuZ/fKSkJKJSDpSuXcyEw46lIpplzCi716B59Qm4vzw6ac45f7Z
1MbjIaYTkXShcu+EYrEYfzr9DN6a+h0O7F0UeN7L69YwfMYt3PXS4hDTiUg6ULl3YtlZWTx6xtkcM2hI4DkJ/v/DT0+v1BuuIlGlcu/kzIw7Sk/llQunUdy1IPA8B84ve5hvlD1EIpEIL6CIpITKPSIK8vJ44RsXcf+40ynIyQk87x8rV/CZ3/yK/6z/MMR0IpJsKveIGdlvAC9+Ywr79+4deE5NPM6Y2X/g+n8+FWIyEUkmlXsEdcnJ4Ykzz+WBcV+nMDc38Lw/LF3C/rf/kif1YBCRTk/lHmFH9OvP4slTGTN0aOA5dYkEFz1eRsnvZrC5ujrEdCISJpV7xGXHYvzmhFP4+egx7Zr3UU01h86cwW0vLgwpmYiESeWeIb46/CAqpl3SrlsYAPxy4QJueX4B2+vqQkomImFQuWeQWCzGPad+jX+ePYle7XhE368WLeSQ397Gn5a9GmI6EelIKvcMNKBnTxZPnsrPvnxs4Dlxd6566kmeeWdleMFEpMOo3DPY6Qd9lhcmXcieBcE+/JRw57y//JmvzLqLv7yxLOR0IrI7VO4ZrrigG89PuohvlYwMNN5puF/89/8xj9/rHjUiaUuP2ZNPfLhtG6X338uabW0/9eljA3v05LxDRzDx4EOJmYWYTkQg+GP2dOQunygqKOC5SRey5MJp9Al4n5p3Nm3khgXPcPU/5oWcTkTaQ+UuO+mel8eC8ydz5mcPoWt22/epqaqv5+E3lrFm65YkpBORIFTu0qysWIwfjRrNK1Mu5n9POpUu2a0/wzU3K4tllZVJSicibVG5S6vMjFGDh/CzY8ZQ3LWAls6q1ycS9OvePanZRKRlrR+OiTQ6af8DOHG//Xl57RomPDSH6vr6T7blxGIcUFTM/u14KpSIhEvlLoGZGYf03Ys7TzqVK/8+lw+3byPhcNTAQdz0lbGfGlu5bRvX/fMf/H15Be7OoXvtxS+OPZ7+3XukKL1IZtGlkLJL3J3K7dvokp1DYV7ep7bV1NdzzD2/54OtW2j622XAXSefxlEDByc1q0iU6FJICZWZ0aeg207FDjD37f+wobqKHQ8bHLjw0b/oJmQiSaBylw73n4/WU9XknHxT7jB/hR7MLRI2lbt0uH336E1WC59WNYPqePPFLyIdR+UuHW7s0GF0b+Z0DTScmjlqn0FJzSOSiVTu0uHysrN5bMLZ9C3o9un1WVlc/oUjKQ54F0oR2XW6FFJC0bewkAXnT+Zf777DExVv0SUnh3EHfobhxX1SHU0kIwQqdzMbC9wKZAF3uvvPmhlzOnAdDX95L3X3Mzowp3RCZsaRAwdx5MBBqY4iknHaLHczywJmAF8BVgOLzKzM3Zc1GTMM+B7wRXffYGY6PBMRSaEg59xHAhXuvtzda4H7gZN3GHMBMMPdNwC4+7qOjSkiIu0RpNz7AauaLK9uXNfUfsB+ZrbAzBY2nsbZiZlNNrNyMyuv1B0ERURC01FXy2QDw4CjgQnAHWbWc8dB7j7T3UvcvaS4uLiDXlpERHYUpNzfAwY0We7fuK6p1UCZu9e5+wrgLRrKXkREUiBIuS8ChpnZYDPLBcYDZTuMeYSGo3bMrIiG0zT6jLmISIq0We7uXg9MA+YCrwNz3P01M5tuZqWNw+YC681sGTAfuNzd14cVWkREWqdb/oqIdCK65a+ISAZTuYuIRJDKXUQkglTuIiIRpHIXEYkglbuISASp3EVEIkjlLiISQSp3EZEIUrmLiESQyl1EJIJU7iIiEaRyFxGJIJW7iEgEqdxFRCJI5S4iEkEqdxGRCFK5i4hEkMpdRCSCVO4iIhGkchcRiSCVu4hIBKncRUQiSOUuIhJBKncRkQhSuYuIRJDKXUQkglTuIiIRpHIXEYkglbuISASp3EVEIkjlLiISQSp3EZEIUrmLiESQyl1EJIJU7iIiEaRyFxGJIJW7iEgEBSp3MxtrZm+aWYWZXdXKuHFm5mZW0nERRUSkvdosdzPLAmYAxwHDgQlm
NryZcYXAt4EXOjqkiIi0T5Aj95FAhbsvd/da4H7g5GbG/Qi4AajuwHwiIrILgpR7P2BVk+XVjes+YWYjgAHu/lhrP8jMJptZuZmVV1ZWtjusiIgEs9tvqJpZDPgFcFlbY919pruXuHtJcXHx7r60iIi0IEi5vwcMaLLcv3HdxwqBg4CnzWwl8DmgTG+qioikTpByXwQMM7PBZpYLjAfKPt7o7pvcvcjdB7n7IGAhUOru5aEkFhGRNrVZ7u5eD0wD5gKvA3Pc/TUzm25mpWEHFBGR9ssOMsjdHwce32HdNS2MPXr3Y4mIyO7QJ1RFRCJI5S4iEkEqdxGRCFK5i4hEkMpdRCSCVO4iIhGkchcRiSCVu4hIBKncRUQiSOUuIhJBKncRkQhSuYuIRJDKXUQkglTuIiIRpHIXEYkglbuISASp3EVEIkjlLiISQSp3EZEIUrmLSFJUbasmHo+nOkbGCPSAbBGRXbVo7hJun3Yna1ZWkpWdxRdOPoLL7/4mefl5qY4WaTpyF5HQvPHif7h+3I28//ZaEvEEdTV1/HPOc5zW+zwqlqxIdbxIU7mLSGj++OOHqKmq3Wl9bVUdV4yeTn1dfQpSZQaVu4iE5t033wNvfltdTR2L572c3EAZROUuIqHZ7/ChYC1v37phW/LCZBiVu4iE5syrx5GTm9Pstnh9gkOOHv6pdXW1ddTW1CUjWuSp3EUkNAMP7M8vnplOQY+un1qf1zWXr11eSlG/3gCs/2ADPzjxJ5zU7SxO6jaRy0Zdy/tvr0lF5Mgw9xZOiIWspKTEy8vLU/LaIpJcdbV1PDX7Xzz9wAK6FnbhhMmjOfwrhwAQr49zzn4X8+Hq9cTrEwBYzOjeu5BZb99Ol25dUhk97ZjZYncvaWucrnMXkdDl5OYw9rxRjD1v1E7bXnj832xev+WTYgfwhFOzvYYHb/4rB4wcxrARg+m1Z89kRu70VO4iklLvV6yhrmbnSyKrt9Xwxx8/RF7XPOpq6iidciwX3nwOZq28QyufULmLSEoNOXggObnZ1NfuXPDx+jjbN28H4NGZ88jvls9JU8bQe69eyY7Z6egNVRFJqUO/fBD9hu1FTl7rx5o122u57ycPcdaQqfzPhF/qqpo2qNxFJKVisRg3zb+O4y8YTeEe3ejSLZ+s7OarKZFw6mrqeL6snDuumJXkpJ2LrpYRkbQSj8cZv/dkNlZubnVcXpdcyrbMIhbLrGPUoFfLZNZeEZG0l5WVxaV3TiGvax6xrJYrqramjvq6OG8vXcm8e/7Ja8+9SaoOVtORjtxFJC2tePVdHr71cZ59aGGztynY54B+FA/ozasL3iQWMxLu9B1UzA3zrqF33+i+4aojdxHp1AYftA+X3nERN8+//lPn4WMxI69rHkMPG8Qrz75OzfYaqrZWU7OthndeW834vSdzzSk3UFO9890oM0mgcjezsWb2pplVmNlVzWy/1MyWmdnLZvaUmQ3s+KgikomGHDyQ3750I2POG8XQQwcxasKXuG3hT/j3vJeprW7+ipnny8opLZzIi39bkuS06aPN0zJmlgW8BXwFWA0sAia4+7ImY0YBL7j7djObAhzt7l9v7efqtIyI7I7SHmdRtaW6zXGjzzqKS++4qMUbmHU2HXlaZiRQ4e7L3b0WuB84uekAd5/v7tsbFxcC/dsbWESkPUYeN6LVN1w/9vdZz/DVPpN4+LbHk5AqfQQp937AqibLqxvXtWQS8ERzG8xsspmVm1l5ZWVl8JQiIju48Kaz6VFUGGjs9s1V/PrbdzE29+sZ83i/Dn1D1cwmAiXAjc1td/eZ7l7i7iXFxcUd+dIikmGK+/fmrjdu5eSpYwLPidcnmDLiCu783uwQk6WHIOX+HjCgyXL/xnWfYmajgR8Ape5e0zHxRERaVtCjgGm3fYNrHrwMiwW/odgDNzzCAzc9EmKy1AtS7ouAYWY22MxygfFAWdMBZnYY8Dsain1dx8cUEWnZkeM+x1+3zGLkcYcF
nnPnFbM5Lm88Pyz9GXV10btPTZvl7u71wDRgLvA6MMfdXzOz6WZW2jjsRqAb8KCZLTGzshZ+nIhIKPK65PHjx77P75bcxP7/tW+gOfV1cRY+upjj885g5uX3hJwwufQJVRGJpBvPn8GTdz/drjmFvbtxT8XtdOtREE6oDqBPqIpIRrv891O57qHLoR3P9tiyfiun9jqXP1z7QHjBkkTlLiKR9cVTRvJk/RxOvngsOfnBn01074/+xKm9z2XT+tbvTJnOVO4iEmlmxrRbJ/Ho1tn84P7vBJ63dcM2vlo8iWtO+XmI6cKjcheRjBCLxTj69C8y6adntmve82WLKO1+FqveXB1SsnCo3EUko4y/8hTuXTGjXefiq7ZWc/6Bl/Czs39FPB4PL1wHUrmLSMbZc2Af5sUfZNwlJ7Rr3lP3PsvYnPE8dse8kJJ1HF0KKSIZrb6+njMHTuGjDza2a16fgUXMentG0h/zp0shRUQCyM7O5oH37uCHcy5t17x173zIFaOvDynV7lO5i4gAR3318zxRcx97Ddkz8JylTy/jt5feTdmMv1Ffn17n4nVaRkRkB2+/spJvfe4H1FYFf1RfLCvG/5RdyRHHjQgxmU7LiIjssqGfHcSjW+9lzPmjAs9JxBN8/8Sf8tjMedTWpP75rSp3EZFmmBnfvfOb3P7iT4NPcrjlopmc2HUiD/4itfdPVLmLiLRi/5J9mZd4kM+XHhF4jrsz87uzuHLMdLZu3BZiupap3EVEApj+yBU8GZ/DIxvu5ounBiv6f897hakjr6LipeQ/2k/lLiISkJlR0KOAy+78JrldcgPNeb9iDVMOv4Kv9Z2U1Oe3qtxFRNqpsFc35nxwB6PPPor8bnmB5mxct5lpI69i5WurQk7XQOUuIrILCrp35cq7L+aRDX+g9969As2J1yeYeUVynvikchcR2Q1ZWVnMWjGDY88+OtD4V//1ZriBGqncRUR2U05ODpffPZXZK3/Nl888kqycrBbH9izunpRMKncRkQ7SZ59ivjfrW/x+2S3k5O385Kec/By+9t2TkpJF5S4i0sH2HtqXeypuZ/DBA7GYkZ2bTU5eNqd9+wROvPDYpGQI/lBBEREJrKhfb2YuuYkN6zZRuepD+u3bl4IeBUl7fZW7iEiIevXpQa8+PZL+ujotIyISQSp3EZEIUrmLiESQyl1EJIJU7iIiEaRyFxGJIJW7iEgEqdxFRCLI3D01L2xWCbyTkhdPvSLgw1SHSGPaP23TPmpdlPfPQHcvbmtQyso9k5lZubuXpDpHutL+aZv2Ueu0f3RaRkQkklTuIiIRpHJPjZmpDpDmtH/apn3UuozfPzrnLiISQTpyFxGJIJW7iEgEqdxDZGZjzexNM6sws6ua2X6pmS0zs5fN7CkzG5iKnKnS1v5pMm6cmbmZZdSlbUH2j5md3vg79JqZ/THZGVMtwP+xfcxsvpm91Pj/7PhU5EwJd9dXCF9AFvA2MATIBZYCw3cYMwro2vj9FOCBVOdOp/3TOK4QeAZYCJSkOnc67R9gGPAS0KtxuU+qc6fhPpoJTGn8fjiwMtW5k/WlI/fwjAQq3H25u9cC9wMnNx3g7vPdfXvj4kKgf5IzplKb+6fRj4AbgOpkhksDQfbPBcAMd98A4O7rkpwx1YLsIwe6N37fA3g/iflSSuUenn7AqibLqxvXtWQS8ESoidJLm/vHzEYAA9z9sWQGSxNBfn/2A/YzswVmttDMxiYtXXoIso+uAyaa2WrgceDi5ERLPT0gOw2Y2USgBPjvVGdJF2YWA34BnJviKOksm4ZTM0fT8FffM2b2WXffmNJU6WUCcLe732xmnwdmmdlB7p5IdbCw6cg9PO8BA5os929c9ylmNhr4AVDq7jVJypYO2to/hcBBwNNmthL4HFCWQW+qBvn9WQ2UuXudu68A3qKh7DNFkH00CZgD4O7PA/k03FQs8lTu4VkEDDOzwWaWC4wHypoOMLPDgN/RUOyZdr601f3j
7pvcvcjdB7n7IBrekyh19/LUxE26Nn9/gEdoOGrHzIpoOE2zPJkhUyzIPnoXOAbAzA6kodwrk5oyRVTuIXH3emAaMBd4HZjj7q+Z2XQzK20cdiPQDXjQzJaY2Y6/mJEVcP9krID7Zy6w3syWAfOBy919fWoSJ1/AfXQZcIGZLQXuA871xktnok63HxARiSAduYuIRJDKXUQkglTuIiIRpHIXEYkglbuISASp3EVEIkjlLiISQf8HW6ZD+cG/XpUAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.scatter(X_feature_reduced[:,0],X_feature_reduced[:,1],c=target)\n", "plt.title('LDA')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Observation\n", "As we can see from above that LDA projected data on new axis in such a way that class are separated as much as possible. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### How does LDA achieves this?\n", "\n", "LDA creates new axis based on two criteria:\n", "* Distance between means of classes\n", "* Variation within each category\n", "\n", "It projects data on new axis and finds mean for each class and variance for each class. It tries to maximise the distance between class means and tries to minimise the variation with each class. Using these into consideration we get a new axis." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![alt text](../../img/shivam_panwar_data1.png \"data\")\n", "Above is the data for two Genes, we want to project them on new axis with one dimension." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Criterion which we choose above to solve this is\n", "\n", "**(µ1-µ2)^2\n", "/(s1+s2)^2**\n", "\n", ",where µ1 and µ2 are mean of each class and 's1 and s2' are variation/scatter within a clas while making new axis.\n", "We try to maximise this criteria while making new axis." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Topic modelling \n", "\n", "**Topic modelling** is a method of assigning topic to each document. Each topic is made up of certain words.\n", "\n", "Consider for example:\n", "\n", "We have two topics, topic 1 and topic 2. **'Topic1'** is represented by 'apple, banana, mango' and **topic2** is represented by 'tennis, cricket, hockey'. We can infer that topic1 is talking about fruits and topic2 is talking about sports. 
A new incoming document can then be assigned to one of these topics, which also makes topic modelling useful for **clustering**. It is used in recommendation systems, among other applications.\n", "\n", "Another example:\n", "Suppose we have 6 documents:\n", "* apple banana\n", "* apple orange\n", "* banana orange\n", "* tiger cat\n", "* tiger dog\n", "* cat dog\n", "\n", "If we ask for, say, two topics from these documents, topic modelling returns two distributions: a topic-word distribution and a doc-topic distribution. The topic-word distribution gives, for each topic, a distribution over words; the doc-topic distribution gives, for each document, a distribution over topics.\n", "\n", "The ideal topic-word distribution would be:\n", "\n", "| Topic | Apple | Banana | Orange | Tiger | Cat | Dog |\n", "| --- | --- | --- | --- | --- | --- | --- |\n", "| Topic 1 | 0.33 | 0.33 | 0.33 | 0 | 0 | 0 |\n", "| Topic 2 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 |\n", "\n", "and the ideal document-topic distribution would be:\n", "\n", "| Topic | doc1 | doc2 | doc3 | doc4 | doc5 | doc6 |\n", "| --- | --- | --- | --- | --- | --- | --- |\n", "| Topic 1 | 1 | 1 | 1 | 0 | 0 | 0 |\n", "| Topic 2 | 0 | 0 | 0 | 1 | 1 | 1 |\n", "\n", "Now suppose we have a new document, 'cat dog apple'. Two of its three words belong to Topic 2 and one to Topic 1, so its topic-wise representation should be\n", "\n", "\n", "**Topic1: 0.33**\n", "\n", "**Topic2: 0.67**\n", "\n", "\n", "Latent Dirichlet Allocation (also abbreviated LDA, but unrelated to Linear Discriminant Analysis above) is widely used for this purpose. Its use for topic modelling is demonstrated below. We give it the number of topics we want to find in the corpus. 
Remember that it follows a bag-of-words (BoW) approach, so word order and the relationships between words are lost.\n" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [], "source": [ "lemmatizer=WordNetLemmatizer() # for lemmatizing words\n", "stemmer=PorterStemmer() # for stemming words (defined but not used below)\n", "stop_words=set(stopwords.words('english'))" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [], "source": [ "import re\n", "\n", "def TokenizeText(text):\n", "    '''\n", "    Tokenizes text: strips punctuation, drops English stopwords and lemmatizes the remaining words\n", "    '''\n", "    text=re.sub(r'[^A-Za-z0-9\\s]+', '', text)  # keep only letters, digits and whitespace\n", "    word_list=word_tokenize(text)\n", "    word_list_final=[]\n", "    for word in word_list:\n", "        if word not in stop_words:\n", "            word_list_final.append(lemmatizer.lemmatize(word))\n", "    return word_list_final" ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [], "source": [ "def gettopicwords(topics,cv,n_words=10):\n", "    '''\n", "    Print top n_words for each topic.\n", "    cv=CountVectorizer\n", "    '''\n", "    # get_feature_names() was removed in newer scikit-learn releases in favour of get_feature_names_out()\n", "    feature_names=np.array(cv.get_feature_names_out() if hasattr(cv,'get_feature_names_out') else cv.get_feature_names())\n", "    for i,topic in enumerate(topics):\n", "        # indices of the n_words largest weights, in descending order\n", "        top_words_array=feature_names[np.argsort(topic)[::-1][:n_words]]\n", "        print(\"For topic {} it's top {} words are \".format(str(i),str(n_words)))\n", "        print(\" \".join(top_words_array)+\" \")\n", "        print(\" \")" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [], "source": [ "df=pd.read_csv('million-headlines.zip',usecols=[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Data link: \n", "\n", "[https://www.kaggle.com/therohk/million-headlines](https://www.kaggle.com/therohk/million-headlines)" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " headline_text\n", "0 aba decides against community broadcasting lic...\n", "1 act fire witnesses must be aware of defamation\n", "2 a g calls for infrastructure protection summit\n", "3 air nz staff in aust strike for pay rise\n", "4 air nz strike to affect australian travellers" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 5min 6s, sys: 3.17 s, total: 5min 9s\n", "Wall time: 5min 8s\n" ] } ], "source": [ "%%time \n", "num_features=100000\n", "# cv=CountVectorizer(min_df=0.01,max_df=0.97,tokenizer=TokenizeText,max_features=num_features)\n", "cv=CountVectorizer(tokenizer=TokenizeText,max_features=num_features)\n", "transformed_data=cv.fit_transform(df['headline_text'])" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<1103665x87637 sparse matrix of type ''\n", "\twith 5855365 stored elements in Compressed Sparse Row format>" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "transformed_data" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 1h 33min 20s, sys: 5.73 s, total: 1h 33min 25s\n", "Wall time: 1h 33min 32s\n" ] } ], "source": [ "%%time\n", "no_topics=10 ## We can change this, hyperparameter\n", "lda = LatentDirichletAllocation(n_components=no_topics, max_iter=5, learning_method='online', learning_offset=50.,random_state=0).fit(transformed_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Lda.components_** is a topic_word table, it shows representation of each word in the topic. components_[i, j] can be viewed as pseudocount that represents the number of times word j was assigned to topic i. 
After normalization, each row can also be viewed as a distribution over the words for that topic." ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "For topic 0 it's top 10 words are \n", "sydney child first found national family brisbane house record abc \n", " \n", "For topic 1 it's top 10 words are \n", "court council home face final state drug could job cut \n", " \n", "For topic 2 it's top 10 words are \n", "call plan sa trump farmer test accused centre go league \n", " \n", "For topic 3 it's top 10 words are \n", "win queensland crash u school canberra perth set one service \n", " \n", "For topic 4 it's top 10 words are \n", "australia woman day attack china show death labor tasmania business \n", " \n", "For topic 5 it's top 10 words are \n", "new wa south coast cup nt price gold killed tasmanian \n", " \n", "For topic 6 it's top 10 words are \n", "say government world qld country car change help dy minister \n", " \n", "For topic 7 it's top 10 words are \n", "year nsw back election new adelaide north take fire west \n", " \n", "For topic 8 it's top 10 words are \n", "police australian interview report say hit get time ban driver \n", " \n", "For topic 9 it's top 10 words are \n", "man rural melbourne charged murder hour two open hospital road \n", " \n" ] } ], "source": [ "gettopicwords(lda.components_,cv)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Assigning topics to documents" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each document is a mixture of topics. Let's look at the topic representation of the first ten documents." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first ten documents and their topic-wise representations are shown below." ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [], "source": [ "docs=df['headline_text'][:10]" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [], "source": [ "data=[]\n", "for doc in docs:\n", "    # transform one headline at a time; each result has shape (1, no_topics)\n", "    data.append(lda.transform(cv.transform([doc])))" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [], "source": [ "# columns are numbered from 1, so topicN here corresponds to topic N-1 in the printout above\n", "cols=['topic'+str(i) for i in range(1,11)]\n", "doc_topic_df=pd.DataFrame(columns=cols,data=np.array(data).reshape((10,10)))" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [], "source": [ "doc_topic_df['major_topic']=doc_topic_df.idxmax(axis=1) # topic with the highest weight per document\n", "doc_topic_df['raw_doc']=docs" ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " topic1 topic2 topic3 topic4 topic5 topic6 topic7 \\\n", "0 0.016667 0.016667 0.183332 0.016667 0.016667 0.016667 0.016670 \n", "1 0.014286 0.014286 0.014286 0.014286 0.014286 0.014286 0.014286 \n", "2 0.016667 0.183333 0.183333 0.016667 0.016667 0.016667 0.016667 \n", "3 0.012500 0.012502 0.012500 0.012501 0.012500 0.012500 0.762497 \n", "4 0.014286 0.014286 0.014286 0.014289 0.014286 0.014286 0.442854 \n", "5 0.016667 0.016667 0.016667 0.183463 0.016667 0.683204 0.016667 \n", "6 0.183333 0.016667 0.016667 0.016667 0.016667 0.016667 0.516667 \n", "7 0.012500 0.012500 0.012500 0.137500 0.012500 0.137500 0.012500 \n", "8 0.014286 0.728571 0.014286 0.014286 0.157143 0.014286 0.014286 \n", "9 0.016667 0.516667 0.016667 0.016667 0.183334 0.183332 0.016667 \n", "\n", " topic8 topic9 topic10 major_topic \\\n", "0 0.016667 0.683331 0.016667 topic9 \n", "1 0.014289 0.871424 0.014287 topic9 \n", "2 0.516667 0.016667 0.016667 topic8 \n", "3 0.137500 0.012500 0.012500 topic7 \n", "4 0.014286 0.442857 0.014286 topic9 \n", "5 0.016667 0.016667 0.016667 topic6 \n", "6 0.183333 0.016667 0.016667 topic7 \n", "7 0.012500 0.637500 0.012500 topic9 \n", "8 0.014286 0.014286 0.014286 topic2 \n", "9 0.016667 0.016667 0.016667 topic2 \n", "\n", " raw_doc \n", "0 aba decides against community broadcasting lic... 
\n", "1 act fire witnesses must be aware of defamation \n", "2 a g calls for infrastructure protection summit \n", "3 air nz staff in aust strike for pay rise \n", "4 air nz strike to affect australian travellers \n", "5 ambitious olsson wins triple jump \n", "6 antic delighted with record breaking barca \n", "7 aussie qualifier stosur wastes four memphis match \n", "8 aust addresses un security council over iraq \n", "9 australia is locked into war timetable opp " ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "doc_topic_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We saw how Latent Dirichlet Allocation (LDA) can be used for topic modelling. The doc-topic representation can in turn be used for document clustering. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### References\n", "[StatQuest LDA](https://www.youtube.com/watch?v=azXCzI57Yfc)\n", "\n", "[https://www.analyticsvidhya.com/blog/2016/08/beginners-guide-to-topic-modeling-in-python/](https://www.analyticsvidhya.com/blog/2016/08/beginners-guide-to-topic-modeling-in-python/)\n", "\n", "[https://sebastianraschka.com/faq/docs/lda-vs-pca.html](https://sebastianraschka.com/faq/docs/lda-vs-pca.html)\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }