{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Homework II: Logistic Regression, SVM and some computational drug design\n",
"\n",
"__Given date:__ Saturday March 7 \n",
"\n",
"__Due date:__ Friday March 20\n",
"\n",
"__Total__ : 30 pts + Bonus (One of the bonus question is on 3pts. For the other, it depends on what you can do)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise I.1. Logistic regression (10pts)\n",
"\n",
"Use the lines below to load the variables __HW2_ExI_X__ and __HW2_ExI_Y__. In this first exercise, you will learn a logistic regression classifier on this dataset. Recall that the logistig regression model takes the form\n",
"\n",
"$$p(t=1|\\mathbf{x}) = \\sigma(\\mathbf{\\beta}^T\\tilde{\\mathbf{x}})$$\n",
"\n",
"where $\\tilde{\\mathbf{x}} = [1, \\mathbf{x}] = [1, x_1, x_2, \\ldots, x_D]$. Consequently, we thus have \n",
"\n",
"$$p(t = 0|\\mathbf{x}) = 1 - \\sigma(\\mathbf{\\beta}^T\\tilde{\\mathbf{x}})$$\n",
"\n",
"we can then write the total probability that a point $\\mathbf{x}$ will be from class $c = \\left\\{0,1\\right\\}$ as \n",
"\n",
"$$p(t = c|\\mathbf{x}) = p(t = 1|\\mathbf{x})^{c} p(t = 0|\\mathbf{x})^{1-c} $$\n",
"\n",
"or equivalently\n",
"\n",
"$$p(t = c|\\mathbf{x}) = (\\sigma(\\mathbf{\\beta}^T\\tilde{\\mathbf{x}}))^{c} (1-\\sigma(\\mathbf{\\beta}^T\\tilde{\\mathbf{x}}))^{1-c} = p^c (1-p)^{1-c}$$\n",
"\n",
"which is a binomial distribution with probability of success $\\sigma(\\mathbf{\\beta}^T\\tilde{\\mathbf{x}})$. If we assume that all the samples are independent, the probability of observing the dataset can read as the product\n",
"\n",
"$$p(\\left\\{\\mathbf{x}_i, t_i\\right\\}) = \\prod_{i=1}^N p(t = t(\\mathbf{x}_i)|\\mathbf{x}_i)\\quad (*)$$\n",
"\n",
"We can then try to learn the parameters $\\mathbf{\\beta}$ that maximize this probability (i.e such that the probability $p(t = t_i |\\mathbf{x}_i)$ is high for every sample pair in the dataset. To do this, we take the negative logarithm of $(*)$ which gives \n",
"\n",
"$$-\\log \\prod_{i=1}^N p(t = t(\\mathbf{x}_i)|\\mathbf{x}_i) = -\\sum_{i=1}^N c_i \\log(\\sigma(\\mathbf{\\beta}^T\\tilde{\\mathbf{x}}_i)) - \\sum_{i=1}^N (1-c_i)\\log(1-\\sigma(\\mathbf{\\beta}^T\\tilde{\\mathbf{x}_i}))\\quad (**)$$\n",
"\n",
"and find the $\\mathbf{\\beta}$ that minimizes this expression."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAD8CAYAAAB0IB+mAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi40LCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcv7US4rQAAF5pJREFUeJzt3W+MHddZx/Hfk252vUX8aWWnLfUG\nx2mK7BaI8DaCF4BKS5sismmQgtxXRUW4VA4CBEV2XSEQqmS1iAiBkYggoiCo0xbCXlFoafqCConQ\nrqskjTFp45Skdilxy4tWwo7j9uHF3A3r9cy9c+/MmTlnzvcjXa139vreM3Pv/u7Zc545Y+4uAMDw\nXdd3AwAA3SDwASATBD4AZILAB4BMEPgAkAkCHwAyQeADQCYIfADIBIEPAJlY6LsBW+3cudP37NnT\ndzMAICmnTp36mrvvmna/qAJ/z5492tjY6LsZAJAUM3u6zv0Y0gGATBD4AJAJAh8AMkHgA0AmCHwA\nyASBjziMRtL+/dKOHcXX0ajvFgGDQ+Cjf6ORdPCgdOaM9NxzxdeDBwl9oGUEPuoL1Qs/ckS6ePHq\nbRcvSkePtvP4ACQR+KgrZC/8qafKt5892/yxAbyAwEc9IXvhe/eWb7/55uaPDeAFBD7qCdkLP35c\nWl6+etvycrEdQGsIfNQTshe+tiadPFnMCywtFV8feEC6447mjw3gBVEtnoaIHT9ejNlvHdZpsxe+\ntlbcAARDDx/10AsHkkcPH/XRCweSRg8fADJB4ANAJgh8AMgEgQ8AmWgl8M3sfjN71swe37LtpWb2\nSTP74vjrS9p4LgDAfNrq4f+5pNu3bTsi6VPufoukT42/R+xYphgYrFYC390/Lel/tm2+U9IHx//+\noKS3tvFcCIhlioFBCzmG/zJ3/6/xv78q6WUBnwttYJliYNA6mbR1d5fkZT8zs0NmtmFmGxcuXOii\nOajCMsXdYegMPQgZ+P9tZq+QpPHXZ8vu5O73ufuqu6/u2rUrYHMwFcsUd4OhM/QkZOCPJL19/O+3\nS1oP+FzD00cPkGWKu8HQGXrSVlnmhyT9q6TvN7NzZvYLko5L+ikz+6KkN46/Rx199QBZIK0bDJ2h\nJ1YMr8dhdXXVNzY2+m5G//bvL0K+bPvp0923B+3i9UXLzOyUu69Oux9n2saIHuCwMXSGnhD4MWLy\ndNgYOkNPWA8/RqGvLoX+cW0B9IAefozoAQIIgMCP1dpaMYF36VLx9Y474jhZJ1QbRiNpZUW67rri\ntrJCXTrQMgI/FTGcrBOqDaORdPfd0rlzkntxO3eu2EboA62hLDMVMZTyhWpD1eO28dhABijLHJoY\nSjUntaHJUE/V424+9qxiGPoCIkTgpyKGUs2qNtxwQ7OhnqrHlWbfvxiGvoBIEfipiOFknao2uM+2\nNsz2Hvhdd0mLi9feb2Fh9v1jnRqgEoGfihhKNavaULWsddlwTFkP/N57q2vSZ51jimHoC4gUk7Zo\nbpbJ3Kr7LixIV65cu31lRXrmmTBtAQaCSVt0Z5bhpqoeeFnYS0V5Zqi2AJkh8NHcLMNNkyZou24L\nkBmGdNCtzTH87esEvfjF0te/fu39Zx3SATLEkA7iVNUDf+c7r73v4qJ04kT3bayDWn8kiMBH97av\nE+ReVOpsZSa9+91xDsVQ649EEfgpGlrvsqx23l168MF+2jMNtf5IFIGfmiH2LkPVzof6YMyx1n9o\nnYxMEfipGWLvMsSyESE/GGNY5qJLQ+xkZIrAT80Qe5chaudDfjDmVus/xE5GpvIO/BT/TB1i7zJE\n7XzID8bcav2H2MnIlbtHcztw4IAHsb7uvm+f+9JS8XV9vbgtL29ebqO4LS8X22NW1e7RKMxzbT9u\nqdi37+pjtHnbv7+/NqV6PGM8lriKpA2vkbG9h/zWW5DArwrI3bvTfROvrxftXFoqvjYN+yF9IG6a\n9MFYFbwhAznl49llJwNzIfA3VfVOzMq3Ly2134aYDfEDcVPZB+N73nPta7+8XGwPGcgx9pJn+YBr\nu5OBVtUN/OEvrbBjR1FZsJ1Z+dK7ua2qWLW6ZNXxWVoqTphKzWgkHT5cvRjb4qJ0+fK129t6P1S9\nD/s6nlVLXJw8Wb1UNaLF0gqbqiY5d+/Oq9KiyqTLC5ZJcXJ4M9wmrbxZFvZSexOTsU22U3mTpeEH\nflUJ3YkTeVVaVBniB+L26qvDh68Nt+3KrrgltRfIsZVyUnmTpzrjPl3dglbpbI4/rqwU49OpVUqE\nMm1yM/TkcNvK9mfazcz92LHwE5MxjYPHOKeAuYlJ2xIpV0qE1EUQdXXsq4JsUti/973/38ZYAjk0\nKm8GhcAv01evJsb6667b1NWxX1qqH/YrK3kHXE4fcANH4JepCoN5SjHrBmaMf1Wsr7svLl7dpsXF\nsG1q89hPUvXBsrJCuGGwCPwybfUyZwnxGMdKq2rsV1bCPWdXx4GhCmSobuAPv0pnq7YqJWYpaYux\nGuL8+fLts14wfBZdVankts4NMIO8Ar+tMJglxGOrv+5Ll0G8/YpauYV9iosCoht1/gzo6hZ8SKct\nswxPxDjEUDWks3Nnf21KXSwT8zHOGSE4MaQT0CzDEzEOMZw4IS0sXLv9G98Ybm8wZK83pguEcAYt\nJgi+lo6Z/aekb0r6lqQrPmG9hyBr6YQyGhW/RGfPFsMzx4+nNXSwslI+Zj/EtYRCrxtTtR5RH8cy\ntjV70InY1tJ5vbvfWqdByUh9nPjChfLtQzy1PnSvt82J+aZ/iTBnlJ4O51wY0slVVTDccMPwJvxC\nV0q1FbJtDA3FtmYPJut6OLDOQH+Tm6QvSfqcpFOSDpX8/JCkDUkbN954Y6g5DWxXNrm3uHjtCVlD\nmPALfQ5AWxPzbZ4nwklmaWjpNVcsJ15JeuX46w2SHpX041X3TaZKZyi2B8MQLnpSpotKqTZCtquz\nkRGPll7zuoEffEjH3c+Pvz4r6UFJt4V+TtS0fR5iqOP6XVRKtTGnw/h7fjp+zYMGvpl9h5l95+a/\nJb1J0uMhnxMNDDlwUphkb2v8nROv0tHxnEvoHv7LJP2LmT0q6TOSPubuHw/8nJhXShN+Qwy1Nv4S\niemcAEzX8Xk6w7+mLWaTwvkFXI+1WkznBKAzdevwCXykh1CrxolXWYrtxCugPTGuQBqLIc/DoDEC\nH+kh1KqlNA+DzhH4SA+hVi3GxfoQDQIf6SHUJkuhBBW9IPCRpmmhNsSyTaAhAh9xahLY1KIDpQh8\nxKdpYHMREKAUgY/4NA1syjanY8grSwQ+4tM0sCnbnIwhr2wR+IhP08CmbHMyhryyReAjPk0Dm7LN\nyRjyyhaBj/i0EdjUoldjyCtbC303ACi1tsbKl6EcP16+2ihDXoNHDz9lVFpgHgx5ZYvlkVPFmvAA\nxlgeeeiotAAwIwI/VVRaAJgRgZ8qKi0AzIjATxUnFwGYEYGfKiotAMyIOvyUUasOYAb08AEgEwQ+\nAGSCwAeATBD4AJAJAh8AMkHgA0AmCHwAyASBDwCZIPABIBMEPgBkgsAHgEwQ+ACQCQIfADJB4ANA\nJoIHvpndbmZPmNmTZnYk9PMBAMoFDXwze5GkE5LeImm/pLeZ2f6Qz4mMjUbFhWB27Ci+jkZ9twiI\nSuge/m2SnnT3p9z9sqSTku4M/JzI0WgkHTwonTkjPfdc8fXgQUIf2CJ04L9S0pe3fH9uvA1o15Ej\n0sWLV2+7eFE6erSf9gAR6n3S1swOmdmGmW1cuHCh7+YgVU89Vb797Nlu2wFELHTgn5e0suX73eNt\nL3D3+9x91d1Xd+3aFbg5GKy9e8u333xzt+0AIhY68D8r6RYzu8nMFiUdlMSgKtp3/Li0vHz1tuXl\nYjsASYED392vSLpH0icknZH0YXc/HfI5kam1NenkyaI6Z2mp+PrAA9Idd/TdMiAa5u59t+EFq6ur\nvrGx0XczACApZnbK3Ven3a/3SVsAQDcIfADIBIEPAJkg8AEgEwQ+AGSCwAeATBD4AJAJAh8AMkHg\nA0AmCHwAyASBDwCZIPABIBMEPgBkgsAH0AjXjk8HgQ9gblw7Pi0EPoC5ce34tBD4AObGtePTQuAD\nmBvXjk8LgQ9gblw7Pi0EPoC5ce34tCz03QAAaVtbK26IHz18AMgEgQ8AmSDwW8ZZhwBiReC3iLMO\nAcSMwG8RZx0CiBmB3yLOOgQQMwK/RZx1CCBmBH6LUjnrMLeJ5dz2F6gyyMDv6xc8hbMOc5lY3nwP\nXH+99Na3Dn9/gTrM3ftuwwtWV1d9Y2Oj0WNsBtrWydPl5SKIORuwCMEzZ8q3nz7dfXtCKHsPbDek\n/QXM7JS7r0673+B6+FTKTNbmxHKsQyVl74HtmEhHjgYX+FTKTNbWxHLMQ0NV74GtmEhHjgYX+FTK\nTNbWxHLMf0lVvQc2xTiRDnRhcIGfSqVMX9qaWI75L6my94BZMYEb40Q60JXBBX4KlTJ9W1srJiwv\nXSq+znNsYv5Lquw9sL4uXb48//4CQzC4Kh10o6oaig9XoHu9V+mY2W+b2Xkze2R8++lQz4Xu5f6X\nVKwVSsAkoYd07nX3W8e3fwj8XOhYG0NDKZpUoTS0D4Kh7U/uBjWGz5sTXaiqULrnnnhLVecRc+kt\n5hM68O8xs8fM7H4ze0nIJ+LNia5UVSidOxdvqeo8Yi69xXwaTdqa2UOSXl7yo2OSHpb0NUku6Xcl\nvcLd31HyGIckHZKkG2+88cDTTz89V1tyWDIAcah6r1VZWJCefz5ce0LZsaPoPG23tFQM4yEedSdt\nO6nSMbM9kv7e3V876X5NqnR4c6IrZRVKZlLVr9LiYvl7M3Z0otIRQ5XOK7Z8e5ekx0M9lxR3Xfg8\nupyPYO5jNlsrlBYWJoe9NPlns+j6deIkxgFy9yA3SX8p6fOSHpM0UjGkM/H/HDhwwOe1vu6+vOxe\n/HoVt+Vl99Fo7ofsTdW+rK+n/VxDtG/f1ceu7LZ/f/Pn6et1Wl8v2r+0VHxN8fcpB5I2vE4u17lT\nV7cmge8+nDdnVYi0ERx9PteQrK/XC/u2Oh28TpikbuBzpm2EupyPYO5jdqORdPfdxVINVcykffuK\n4Y82zk/gdcIkvY/hY35dzkcMbe6jC4cPTw775eVi7Z42T0bjdUIbCPwIdTlZxsTc7M6fr/5ZqCUm\neJ3QBgI/Ql2uU5P7mjhtMgu3xASvE9rAGD4wo5WV4qzasu3PPNN9ewDG8Adusyb7+uuLHt/iIjX0\nXTlxojjeWy0uFtuBmBH4PZvnZJqt6wZduVJMID7/POsHdWVtTfrIR64eXvnoRxleQfwY0ulR1UVE\nTp4sQmXr/Y4cKRbt2rtX+uY3y4cUNnHqO5CXqNbSqSu3wK+zVknZh8I01GYDeWEMPwF1LgRetkTt\nNHVrs1lDB8gLgd+jOifTVH0oVJlUm7014FdWirNF275+AB8iaeJ1y0Sd9Re6ujVdSyc1dRZ8q1pD\nZWWlWEdlYcF9cdH9+usnrx9U9lxtr80ScoGvzbVrlpaKryzu1h4W0Eufclw8LUXTFnxraxXQOgt9\nSUU75hVqga/UAyn2D6s+F2aL/dikgsAfkDZWAV1aqhf4TX7Jq55j3g+RzTAw6y+Qmkrhw6rt162u\nFI5NKuoGPmP4CVhbK6p2Ll2a/9T9qvmCrZquzdLmAl9bzzXwikKyrZPbsUrhurB9LcyWwrEZGgI/\nMfNOrpUtvrW4WEzetrU2S5sLfNWpTkphpcg6lVh962ththSOzeDU+TOgqxtDOpM1/RO46dBQnfHW\nti5CU2cIavfu+f/8n3fseNb/l8qFS/q4eFAqxyYFYgx/ePqeXOtyvLXuJPM8bZh3X+b5f0O69Gbb\nODbtIfAHqK/JNffuP2zKwmDS5O0sPe9592Xe/zeUS2+GwLFpR93AZ2mFhNRZiiGUPi6xNxoVE3hn\nzxbj9V/4QrFY3HYLC8WqodPWJNo0775wmUHEiqUVBqjPqx71UcmxvTrpllvK73fddbNVe8y7L1xm\nEKkj8BPS51WPYrjEXlUbqv5Irar2mHdfYjgGIbG8QgbqjPt0dWMMP25djLdOG4sva8M8Y+vbH+fY\nsXpzAEMdc+YkqLSJSVukpu3qmbphTNhRIpm6uoHPkA6iMe+Zl02Hujjjk5OgcrHQdwOATU1CZ22t\nvCIn9PMOxd695RVgTEgPCz38gUpxAq6vKhiqb4Y/IY0CgT9AWxcea/PiJqH1FTp9PW9MH8p9VoCh\nQ3UG+ru6MWnbjpQn4Pqqgun6eZkoRpvEpG1+NnuMZWOxUhpj0ttPtnLvphfcxhLUswgxURzTXwyp\nGvwxrPOp0NWNHv786lzCMIUe/lZD7gVXrYu0sDD/Kp5DPVZdSfkYijr8vExbXTK2VQjrLHaW8tDU\nNFX7tn2BuLqBM+Rj1ZWUj2HdwGdIZyCqSgul+Cbg6k4qD7lcsmyi2OzaZSLqDvMM+Vh1JYdjSOAP\nRFVp4eZKmrGEvVR//HrI5ZJlVTELFWfF1AmcIR+rruRwDAn8gUipjrpuTyqlfZrH9oniV72q/H51\nAmfox6oLORxDAn8gmtRRd12ZULcnlVtteJ3AqXqtcjtWIWRxDOsM9Hd1Y9K2fXVWn+y6MoFL21Wb\ndD5AylUkCEtdVOlIulvSaUnflrS67WdHJT0p6QlJb67zeAR+u+oERF+VCUNdZjiklKtIEFbdwG90\niUMz2zcO+z+R9BvuvjHevl/ShyTdJul7JT0k6dXu/q1Jj8clDttV55KIXLYvHbxWqNLJJQ7d/Yy7\nP1HyozslnXT359z9Syp6+rc1eS7Mrs7kaA6VCUPBa4WmQk3avlLSl7d8f2687RpmdsjMNsxs48KF\nC4Gak6c6AZFDZcJQ8FqhqamBb2YPmdnjJbc722iAu9/n7qvuvrpr1642HhJjdQIii8qEgeC1QlNT\nL4Di7m+c43HPS1rZ8v3u8TZ0aDMgjh4thnFuvrkI++0B0eTiIegWrxWaCHXFq5Gkvzaz31cxaXuL\npM8Eei5MQEAA2NRoDN/M7jKzc5J+VNLHzOwTkuTupyV9WNK/S/q4pMPTKnQAAGE16uG7+4OSHqz4\n2fskva/J4wMA2sPSCgCQCQIfADJB4ANAJgh8AMgEgQ8AmWi0eFrbzOyCpKf7bsecdkr6Wt+NCGCo\n+yUNd9/Yr7S0sV/f5+5TlyqIKvBTZmYbdVarS81Q90sa7r6xX2npcr8Y0gGATBD4AJAJAr899/Xd\ngECGul/ScPeN/UpLZ/vFGD4AZIIePgBkgsBvyMzuNrPTZvZtM1vd9rOjZvakmT1hZm/uq41Nmdmt\nZvawmT0yvjrZYC5XaWa/bGb/MX4N3993e9pmZr9uZm5mO/tuSxvM7APj1+sxM3vQzL6n7zY1YWa3\nj/PhSTM7Evr5CPzmHpf0s5I+vXXj+ELuByW9RtLtkv7YzF7UffNa8X5Jv+Put0r6rfH3yTOz16u4\n/vIPuftrJP1ez01qlZmtSHqTpGf6bkuLPinpte7+g5K+IOloz+2Z2zgPTkh6i6T9kt42zo1gCPyG\nMrmQu0v6rvG/v1vSV3psS5veJem4uz8nSe7+bM/tadu9kn5Txes3CO7+T+5+Zfztwyquppeq2yQ9\n6e5PuftlSSdV5EYwBH44tS/knoBflfQBM/uyil5wsr2qbV4t6cfM7N/M7J/N7HV9N6gt42tOn3f3\nR/tuS0DvkPSPfTeigc4zItQlDgfFzB6S9PKSHx1z9/Wu2xPCpH2U9AZJv+buf2NmPyfpzyTNc63j\nzk3ZrwVJL5X0I5JeJ+nDZrbXEyldm7Jv71ExnJOcOr9vZnZM0hVJf9Vl21JH4NeQw4XcJ+2jmf2F\npF8Zf/sRSX/aSaNaMGW/3iXpb8cB/xkz+7aKdU0udNW+Jqr2zcx+QNJNkh41M6l4733OzG5z9692\n2MS5TPt9M7Ofl/Qzkt6Qyodzhc4zgiGdcEaSDprZkpndpLQv5P4VST8x/vdPSvpij21p099Jer0k\nmdmrJS1qAItzufvn3f0Gd9/j7ntUDBX8cAphP42Z3a5iXmLN3f+37/Y09FlJt5jZTWa2qKLIYxTy\nCenhN2Rmd0n6Q0m7VFzI/RF3f7O7nzazzQu5X1HaF3L/RUl/YGYLki5JOtRze9pyv6T7zexxSZcl\nvT3xHmMO/kjSkqRPjv96edjdf6nfJs3H3a+Y2T2SPiHpRZLud/fTIZ+TM20BIBMM6QBAJgh8AMgE\ngQ8AmSDwASATBD4AZILAB4BMEPgAkAkCHwAy8X+Y9usk9ukdSgAAAABJRU5ErkJggg==\n",
"text/plain": [
"