{ "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.1" }, "name": "", "signature": "sha256:d386cb093ae65da896b8f2d227892a990d38d97600ed14bc5c63eb8af80dddc7" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "[![Py4Life](https://raw.githubusercontent.com/Py4Life/TAU2015/gh-pages/img/Py4Life-logo-small.png)](http://py4life.github.io/TAU2015/)\n", "\n", "## Exam - Example\n", "\n", "### Tel-Aviv University / 0411-3122 / Spring 2015" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Important notes:\n", "* This example exam resembles the real exams in the type and level of its questions.\n", "* You have to answer all questions in the dedicated cells within the notebook.\n", "* Download the notebook and insert your solutions. In the real exam, you will submit them just like you did with homework during the semester.\n", "* The expected outputs are included (as they will be in the real exams), and your code has to reproduce them.\n", "* Make sure you follow the instructions and generate the outputs exactly as described.\n", "* We strongly recommend that you try to solve the example, and only then look at the solutions.\n", "* You are welcome to discuss the example exam in the course forum.\n", "\n", "Good luck!" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1) Protein hydrophobicity\n", "\n", "In this question we will calculate the hydrophobicity of a protein based on its amino-acid (aa) sequence.\n", "\n", "> In chemistry, hydrophobicity is the physical property of a molecule (known as a hydrophobe) that is seemingly repelled from a mass of water.\n" ] }, { "cell_type": "code", "collapsed": true, "input": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "`ges_scale` is a `dict` that contains the hydrophobicity score of every aa. The keys are the one-letter aa codes and the values are the scores. For example, the letter for Leucine is `L` and its hydrophobicity score is -2.8." ] }, { "cell_type": "code", "collapsed": true, "input": [ "ges_scale = {'F':-3.7,'M':-3.4,'I':-3.1,'L':-2.8,'V':-2.6,\n", " 'C':-2.0,'W':-1.9,'A':-1.6,'T':-1.2,'G':-1.0,\n", " 'S':-0.6,'P': 0.2,'Y': 0.7,'H': 3.0,'Q': 4.1,\n", " 'N': 4.8,'E': 8.2,'K': 8.8,'D': 9.2,'R':12.3}" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "**a)** Write a function called `hydrophobicity` that calculates the hydrophobicity profile of a sequence. The function averages the hydrophobicity scores over a _window_: a fixed number of consecutive positions. This method is known as a _sliding window_, because after each calculation the window _slides_ one position further along the sequence.\n", "\n", "Input: `seq` is a string in which each character is an aa letter. `win_size` is the window size (the number of positions to average over).\n", "\n", "Output: `list` of `float`s: the average hydrophobicity score of each window position, i.e. `len(seq) - win_size + 1` values." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "def hydrophobicity(seq, win_size=15):\n", " \"\"\"Scan a protein sequence for hydrophobic regions using the GES\n", " hydrophobicity scale.\n", " \"\"\"\n", " pass \n", "\n", "protein_seq = 'IRTNGTHMQPLLKLMKFQKFLLELFTLQKRKPEKGYNLPIISLNQ'\n", "scores = hydrophobicity(protein_seq)\n", "print(scores)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[0.7666666666666665, 1.5599999999999998, 0.49333333333333323, 0.8466666666666665, 1.1133333333333333, 0.9333333333333333, 0.8266666666666665, 0.43999999999999984, 1.2133333333333332, 0.7533333333333331, 0.493333333333333, 0.5999999999999996, 0.5999999999999996, 0.28666666666666624, 1.0599999999999996, 2.1066666666666665, 2.106666666666666, 2.3666666666666663, 2.6399999999999992, 2.6399999999999997, 2.82, 3.0533333333333332, 3.5599999999999996, 2.826666666666666, 3.026666666666666, 3.066666666666666, 2.9399999999999995, 3.086666666666666, 2.626666666666666, 2.3599999999999994, 1.8133333333333328]\n" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "**b)** Next, plot the hydrophobicity scores. Don't forget the axis labels." 
] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAEPCAYAAABCyrPIAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmcU/XVx/HPQUBwq1Ue1CqCCFqoIojFFRlbtWqrVuvu\nC1zqI3UDd23dplSf1i5UrRviLiqiKGJFrQIBV1wAQUGLVVrQiqCgbKI45/njd0dCyMwkmdys3/fr\nldckNzf3nhjMyf0t52fujoiISItiByAiIqVBCUFERAAlBBERiSghiIgIoIQgIiIRJQQREQEKkBDM\nbD0zm2ZmTzTw/A1mNsfM3jSzXnHHIyIi6RXiCmEwMAtYZ8KDmR0CdHH3rsDpwC0FiEdERNKINSGY\n2TbAIcDtgKXZ5TDgHgB3nwJsamZbxBmTiIikF/cVwl+Bi4C6Bp7fGpiX9Hg+sE3MMYmISBqxJQQz\n+xnwibtPI/3Vwbe7pjxWLQ0RkSJoGeOx9wIOi/oJ2gCbmNm97j4gaZ8PgQ5Jj7eJtq3FzJQkRERy\n4O6N/SBfS2xXCO7+G3fv4O7bAccBE1KSAcBYYACAme0BLHH3BQ0cr2JvV111VdFj0PvTe9P7q7xb\ntuK8QkjlAGY2EMDdh7n7ODM7xMzeA5YDpxQwHhERSVKQhODuk4BJ0f1hKc+dXYgYRESkcZqpXAJq\namqKHUKsKvn9VfJ7A72/amO5tDMVmpl5OcQpIlJKzAwvhU5lEREpL0oIIiICKCGIiEhECUFERAAl\nBBERiSghiIgIoIQgIiIRJQQREQGUEEREJKKEICIigBKCiIhElBBERARQQhARkYgSgoiIAEoIIiIS\nUUIQERFACUFERCKxJgQza2NmU8xsupnNMrPfp9mnxsw+N7Np0e3yOGMSkeJ7661iRyDptIzz4O7+\npZnt5+4rzKwl8IKZ7ePuL6TsOsndD4szFhEpDe++CzvvHP7usEOxo5FksTcZufuK6G5rYD3gszS7\nZbzmp4iUt9tvh5Yt4dlnix2JpIo9IZhZCzObDiwAJrr7rJRdHNjLzN40s3Fm1j3umESkOL76Cu69\nFy6/HP7xj2JHI6libTICcPc6oKeZfQd4xsxq3D2RtMtUoEPUrHQwMAbQhaRIBXr8cejWDc48E7p0\nga+/hlatih2V1Is9IdRz98/N7ElgNyCRtH1p0v2nzOxmM9vM3ddqWqqtrf32fk1NDTU1NXGHLCJ5\nNnw4nH46/M//hIQwZQrss0+xo6ociUSCRCKR8+vN3fMXTerBzdoBq919iZm1BZ4Bfuvu45P22QL4\nxN3dzPoAo9y9U8pxPM44RSR+H3wAffrAvHnQpg1ceim0bg1DhhQ7ssplZrh7xn20cfchbAVMiPoQ\npgBPuPt4MxtoZgOjfY4CZkb7XAccF3NMIlIEd9wBJ54YkgHAgQeqH6HUxHqFkC+6QhApb6tXQ8eO\nIQH84Adh26pVoelo7lzYbLOihlexSu0KQUSEceNCQqhPBgDrrx/6DyZMKF5csjYlBBGJ3fDh8L//\nu+52NRuVFjUZiUis5s+HHj1CZ/KGG6793Ntvw09/GjqcTdNT805NRiJSUu66C449dt1kANC9e5iL\n8N57hY9L1qWEICKxqasLo4vSNRdBuCpQs1HpUEIQkdg8+yxsvjnsumvD+yghlA4lBBGJTUOdycn2\n3x8SidB0JMWlhCAisViwAMaPhxNOaHy/5DIWUlxKCCISi3vugSOOgE02aXrfAw5Qs1EpUEIQkbxz\nD+seNNVcVE/9CKVBCUFE8m7SpFC4bo89Mtt/7
71h1iz4LN3yWVIwSggikne33RauDjKdbKYyFqVB\nCUFE8urTT0Ptov79s3udmo2KTwlBRPLqvvvgZz/LvoJpfUJQlZriUUIQkbxxz2zuQTrduqmMRbEp\nIYhI3rz8clj7YN99s3+tylgUnxKCiOTN8OFw2mm5Vy5VQigulb8Wkbz4/HPo1AnefRfat8/tGAsX\nhlnLixZBq1Z5Da8qqfy1iBTFAw+EukS5JgMo/zIWdXXw5z/DfvuF++UmtoRgZm3MbIqZTTezWWb2\n+wb2u8HM5pjZm2bWK654RCReuXYmp4qr2cg9LMTzyCNw6aXhPGecAYsX5+f4n3wSFvsZPTpc6Tzz\nTH6OW0ixJQR3/xLYz917Aj2A/cxsn+R9zOwQoIu7dwVOB26JKx4Ric+IEbBsWbhCaK581DWq//J/\n+OHw5X/AAdCuHfTtG4bFbrABDBoELVqEdZ4feqh5w10nTAglvnv2hMmT4aKL4LrrmvceiqEgfQhm\ntgEwCTjJ3Wclbb8VmOjuD0WP3wH6ufuClNerD0GkRD3/PPziF+FLcaedmn+8VatC09HcudnPZbjr\nrtB0NXUqtG0LvXuvfdtyy3Vf89JLcPrpof/jppugY8fMz7d6NdTWwp13hmJ+Bxyw5j107AgTJ4bh\ntMVSUn0IZtbCzKYDCwhf/LNSdtkamJf0eD6wTZwxiUj+zJkDRx8N99+fn2QAuZexGDoUrrkGBg8O\ndZHmz4fHH4crrwxNOemSAcBee4UEsueeIWlcdx18803T5/vPf6CmBl57DaZNW5MM6t/DwIFwww3Z\nvYdiaxnnwd29DuhpZt8BnjGzGndPpOyWmr3SXgrU1tZ+e7+mpoaampr8BSoiWfv0UzjkEPjd79b+\nMsyH+n6Eo47KbP+hQ+Hmm8Mv8g4dsj9f69Zw2WUhuQ0cGBLc8OGhCSidxx6DX/0KLrgALrwwND2l\nOuOMcHVwzTXZX+nkKpFIkEgkcj+AuxfkBlwBXJiy7VbguKTH7wBbpHmti0jp+PJL97593S+6KJ7j\nv/22e8eO7nV1Te/7l7+4b7+9+3/+k59z19W533GHe/v27hdf7L58+ZrnVq50P/NM9+22c3/llaaP\n1b+/+7XX5ieuXETfnRl/T8c5yqidmW0a3W8LHABMS9ltLDAg2mcPYImn9B+ISGlxD5PP2reHP/wh\nnnN06xba55sqY9HcK4N0zODUU2HmTJg3D3beOVytzJ4Nu+8eRhBNnRruN2Xw4NAvsXp1fmKLW5xN\nRlsB95hZC0JfxX3uPt7MBgK4+zB3H2dmh5jZe8By4JQY4xGRPBgyBP75z/AlnK6pJB/M1ow26to1\n/T5xJINk7duHDuqnngrNSEuWwB//mN1M7N69YdttYcyYzJu/ikkzlUUkYyNGwOWXwyuvNNxJmy8P\nPggjR4aO4VRxJ4NUy5fD0qW5vedHHoHrrw+jsQot21FGSggikpF8Dy9tSkNlLAqdDJpr9WrYfnt4\n9NFwxVBIJTXsVEQqQxzDS5uSroxFuSUDgJYt4ayzwlVCqVNCEJFGxTm8tCnJZSzKMRnUO+00+Pvf\n4eOPix1J45QQRKRBq1bBEUeEWz7qFGWrvmO5nJMBhHkIxx4Lt95a7Egapz4EEUnLHQYMgJUrYdSo\n+EYUNWbVKth889CZW67JoN7s2aEK6r//HWYyF4L6EESk2b75Bs48M/Qd3HtvcZIBhC/OW28t/2QA\nYW5Fz55h5FSp0hWCiKxl5Uo48UT44oswMmaTTYodUeV46qlQIuONN3JfVS4bukIQkZwtXgw/+Qm0\naQPjxikZ5NtPfhLmNLzwQrEjSU8JQUSAUB20b98wVn7EiFDwTfKrRYtQzqJU10pQk5GIMGsWHHww\nnHNOqOBZiOaMarVsWVh74fXXw984qclIRLLy4oth9MvVV4dSzkoG8dpoIzj55FD0rtToCkGkio0d\nGyZN3Xdfa
N+Wwpg7F3bbLfzdaKP4zqMrBBHJyPDhoYrnk08qGRRap07Qr18Y0ltKdIUgUmXcQxmK\ne+6Bp59uuLy0xGvy5LCW86xZcZYR1xWCSFVxDxPJMrl99VWYcDZmTOg7UDIonr59oW3bNbWaSkGT\nCcHMNi9EICKSvbq6sIB8q1ZhmGhTt7Zt4YMPIJGIfz0DaZwZnHtuaQ1BzeQK4RUzezha2UzjD0RK\nyLXXhmGMX32V+VXC009rwlmpOO64sPrcY48VO5KgyT6EaAnM/YFTgR8Co4C73P2f8Yf3bQzqQxBJ\n8fLL8POfh/Hs5V7np5q9+ir87Gdh3YfttsvvsWNdMc3MfgSMADYEpgO/dveXso4yS0oIImtbsgR6\n9QrNDYcfXuxopLmuvz4sPvTCC/mdIZ73hGBm7YATgQHAAuB24AlgF+ARd+/UyGs7APcC7QEHbnP3\nG1L2qQEeB96PNo1296tT9lFCEIm4wzHHhD6Av/2t2NFIPrjDkUdCx4757VPINiG0zGCflwhXBYe7\n+/yk7a+bWVPLPXwNnOfu081sI+ANM3vW3Wen7DfJ3Q/LNGiRajZ8eChLfd99xY5E8sUM7rwTdt01\nzE844ojixJFJp/Ll7j4kORmY2TEA7v6Hxl7o7h+7+/To/jJgNvC9NLuqs1okA2+/HconjxwZKpJK\n5fjud+Ghh8JkwQ8+KE4MmSSES9Ns+3W2JzKzTkAvYErKUw7sZWZvmtk4M+ue7bFFqsGKFWEZxj/+\nEb7//WJHI3Ho0yck/GOPDSPHCq3BJiMzOxg4BNjGzG5gza/4jQlNQRmLmoseAQZHVwrJpgId3H1F\ndM4xwA6px6itrf32fk1NDTU1NdmEIFL2zj8fevQIhdGkcg0aFOaJXHxx9v0JiUSCRCKR87kb7FQ2\ns10Iv+iHAFewJiF8AUx098UZncCsFfB34Cl3b/LtmdkHQG93/yxpmzqVpaqNHh2+IKZN0xyCarB4\ncehPGDq0ef0JcYwyauXuWV0RJL3WgHuAT939vAb22QL4xN3dzPoAo1JHLikhSDWbOzc0JTz5JPzw\nh8WORgolH/MT8pYQzOxhdz/azGamedrdvUcGwewDTAZmEPoKAH4DbBsdZJiZnQWcAawGVgDnu/sr\nKcdRQpCq9PXXYdTJkUeGtQqkujR3fkI+E8L33P2jqDN4He4+N/vwcqOEINWqfkH2cePiq4gppau5\n8xPyNg/B3T+qPybwsbuvjE7QFtgi+9BEJBvjx8Pdd4d+AyWD6lTo+QmZ/DN7BPgm6XFdtE1EYvLJ\nJzBgQFizoH37YkcjxVTI+QmZJIT13P3bEbHuvgpoFV9IItWtrg5OOikkhP33L3Y0Ugrq5yccc0y8\n8xMySQiLzOzb8lnR/UXxhSRS3f7v/0JJ6yFDih2JlJJBg8Ls9CeeiO8cmQw77QLcz5qSE/OB/u7+\nXnxhrRODOpWlKjz3XLgyeP11+F66Ii9S1e6+O6yd8Pjjme0fW/nraLZxfU2iglJCkGowf36YZ/DA\nA7DffsWORkrR0qVh7Yv33oN27ZreP5/DTvu7+31mdgFr5hBAGHXk7j4005M0lxKCVLqvvoKaGjj0\nUPh11pXCpJqceCLsuSecfXbT+2abEBrrQ9gg+rtxAzcRyZOLL4bNN4dLLil2JFLqBgyAe++N59hZ\nrZhWLLpCkEo2ahRcemmYgPbd7xY7Gil1q1fDttvChAlNV73N5xVC/QG3N7MnzGyRmS00s8fNrHOm\nJxCRhr3zDpx1FjzyiJKBZKZlSzjhhHgWSMpk2OkDwChgK8JIo4eBB/Mfikh1Wb4cjjoqDDPddddi\nRyPlZMAAGDEizFnJp0wSQlt3v8/dv45uIwCt1STSDO5h5uluu8FppxU7Gik3PXqEK8rJk/N73MYW\nyNmMMKLoKTP7NWuuCo4FnspvGCLV5dZbYeZMePnlUK9GJFv9+4fO5XyuFdb
YsNO5rD3c9NunCMNO\nc6zQnT11Kkslee01+OlP4cUXoWvXYkcj5eq//4Xu3eHDD2GDDdLvk89qp52yjlBEGvXpp3D00eEK\nQclAmmOrrWCPPcKs5eOPz88xMxll1NrMBpvZaDN7xMzOiZbFFJEs1NWFy/yjjgo17kWaq77ZKF8y\nqWV0B+FK4h5Cc1F/YLW7F6wrTE1GUqpeeQVWrsxs33Hjwv4TJkAr/aSSPFixArbeGmbNClcMqeJY\nU3lG6nKZ6bbFSQlBStGcOWG46G67Zbb/hhvCbbepaJ3k16mnwk47wfnnr/tc3voQkqw2sy711U3N\nbHvC+sciVW3qVDjgAHj00WJHItWsf38477z0CSFbmcxDuAiYYGaTzGwSMAHIaLlvM+tgZhPN7G0z\ne8vMBjWw3w1mNsfM3jSzXpmHL1I806ZBL/1rlSLr1w8++wxmzGj+sZpMCO4+HtgBGAScA+zg7hMy\nPP7XwHnu/gNgD+AsM+uWvIOZHQJ0cfeuwOnALVnEL1I0SghSClq0CFcJ+ShlkenS3bsCOwG9gGPN\nbEAmL3L3j919enR/GTCbNQvt1DuM0GGNu08BNjWzLTKMS6Qo3JUQpHT07w/33x8K3zVHJsNORwB/\nBvYGdgN+GN2yYmadCAllSspTWwPzkh7PB7bJ9vgihfTRR+GvOoilFHz/+7DNNjB+fPOOk0mncm+g\ne3OG+USrrT0CDG5gxbXUXvB1zlVbW/vt/ZqaGmryOV9bJEvTp0PPnio7IaVjwAD4858TvPxyIudj\nZDLs9GHCF/lHOZ0gTGL7O/CUu1+X5vlbgYS7j4wevwP0c/cFSfto2KmUlKuvDssZXnttsSMRCRYt\ngi5dYN482Dhawixv6yFEayA8AbQDZpnZP+q3mdnYTA5uZgbcAcxKlwwiY4EB0f57AEuSk4FIKVL/\ngZSadu3CiKPRo3M/RmPF7WpSNtXvWF/cblKTBzfbB5gMzEh6/W+AbQkHGRbtdyNwELAcOMXdp6Yc\nR1cIUlI6d4annoIddyx2JCJrjB4NN90UZsNDDDOVo4NuBfQB6oDX3P3j3MLNjRKClJIlS6BDB/j8\n8zDkT6RUrFoVBjpMmxaW2YxjCc3TCCODjgSOAqaY2S9zD1mkvE2fHhYoUTKQUrP++qGa7v335/b6\nTP5JXwz0cveT3P0kwpyES3I7nUj5U/+BlLIBA0IF1FwaVTJJCIuA5KGiy6JtIlVJCUFK2Z57wtdf\nwxtvZP/aTOYh/At4xcwejx4fDswwswsInctDsz+tSPmaNg3OPbfYUYikZ5b7OgmZzEOoje6uNcqo\n/nl3/232p82OOpWlVKxcCZttFjqW11+/2NGIpPf++2E1tYUL81z+2t1rAcxs4+jx0pyjFClzb70F\nO+ygZCClrXPn8O904cLsXpfJKKOdzWwa8Dbwtpm9YWY75RamSHmbPl39B1IeHn44+9dk0ql8G3C+\nu2/r7tsCF0TbRKqOOpSlXKRbUrMpmSSEDdx9Yv0Dd08AG2Z/KpHyp4QglSyTUUYfmNkVwH2EDuUT\ngfdjjUqkBH3zDcycGaqcilSiTK4QTgHaA48Co4H/AU6NMyiRUvTPf8KWW8ImmxQ7EpF4NHqFYGYt\ngUfdfb8CxSNSstRcJJWu0SsEd18N1JnZpgWKR6RkKSFIpcukD2E5MNPMno3uQ5ihPCi+sERKz7Rp\ncMEFxY5CJD6ZJIRHo1syTRuWquKuKwSpfJnMVL67AHGIlLR586BVq9CpLFKpGkwIZjazkde5u/eI\nIR6RkqSrA6kGjV0hHBr9PTP6mzwPQaSqqGSFVIMGE4K7zwUwswPdPXkqzoyotpEWyZGqMW0anKif\nQlLhMpmYZma2T9KDvQlXCpm88E4zW9BQ85OZ1ZjZ52Y2LbpdnlnY6U2bBjfc0JwjiKSnJiOpBpms\nh9AbuAv4TrRpCXCKu09t8uBmfQkrrN3
r7juneb6GUDjvsCaOk9F6CJdcAk89BTNmNLmrSMY+/TSU\nE168WOsoS3kxy/N6CMB0d+9RPznN3ZdkenB3f97MOjWxW8bBNmXiRJg9G778Etq0yddRpdpNnw67\n7KJkIJUvk3/ic8zsT8D3skkGGXJgLzN708zGmVn3XA/0xRcwaxZst134K5Ivai6SapHJFUJP4Djg\ndjNbD7gTeNDdv8jD+acCHdx9hZkdDIwBdki3Y21t7bf3a2pqqKmpWev555+HPn1gm23CL7pdd81D\ndCKEhHDAAcWOQqRpiUSCRCKR8+ub7ENYa+fQ5n8/8F3gYeB37v5eE6/pBDyRrg8hzb4fAL3d/bOU\n7U32IVx0EWy8MWy0Ecydq85lyZ/u3eHBB0OzkUg5ybYPIZMlNFua2eFmNga4DvgL0Bl4AhiXc6Th\n2FuYmUX3+xAS1GdNvCytiRNhv/1Crfrp05sTlcgaK1aEHxjduhU7EpH4ZdJk9E8gAfzR3V9K2v6I\nmfVr7IVm9iDQD2hnZvOAq4BWAO4+DDgKOMPMVgMrCE1TWVuyBN59NzQZLVsGb74JdXXqBJTmmzED\nvv99aN262JGIxC+ThNDD3Zele8Ldz2nshe5+fBPP3wTclEEMjXr+edh9d1h//XD7znfCr7rOnZt7\nZKl26lCWatJYLaO/Jd131h4eWlLlrxMJSO5jrm82UkKQ5lJCkGrSWKPKG8Dr0d/Dk+7X30pGff9B\nvV12UT+C5IdqGEk1yWiUkZlNc/ei/W/R2CijxYth223DbNL6dt7Ro+Gee2Ds2AIGKRVn9erQ/Lhg\nQRi9JlJu8j7KqNRNngx77rl2p59GGkk+vPNOmNeiZCDVouwTQiKxdnMRhNnKS5aEqwaRXKn/QKpN\ngwnBzJaZ2VIzWwrsXH8/uuVjlnJeTJy4docyhOGmu+wShp+K5EoJQapNgwnB3Tdy942jW8uk+xu7\n+yaFDLIhn30G778Pu+227nNqNpLmUkKQalPWTUaTJsFee4W1blMpIUhzuId/Pz17Nr2vSKUo64SQ\nOtw0mRKCNMfcubDBBtC+fbEjESmcsk4IqRPSkv3gBzBnTlgbQSRbai6SalS2CWHhQvj3v6F37/TP\nt2kDXbpobYRMfPUVvP56saMoLUoIUo3KNiFMngx77w0tG6nGpGajzDz+eLjSWpLv5Y/KmBKCVKOy\nTQiN9R/UU0LIzIQJ4e+ddxY3jlKikhVSjco2ITTWf1BPCSEzEybA0KHwt7/BN98UO5riW7gQli+H\nTp2KHYlIYZVlQvjkE5g/v+lfcPWT0+rqChNXOZo/HxYtgtNOgy23VP0nCM1FPXuCZVwBRqQylGVC\nmDQJ+vZtvP8AoF072GSTMIRQ0qtvemvRAs49F66/vtgRFZ/6D6RalWVCSFeuoiGFbjb68ks46igY\nP75w52yO8ePhxz8O9488Ev71LzWzKSFItSrLhJCuoF1DCp0QzjsvFNU7/ngYObJw582Fe+g/+NGP\nwuNWreDMM3WVUN9kJFJtyi4hfPwx/Pe/oX8gE4VMCCNHwnPPhWGczz0HF14I111XmHPn4l//Cp3I\nO+ywZtvpp8OYMaGfphotWwbz5oV1lEWqTawJwczuNLMFZjazkX1uMLM5ZvammTV5oT5pEuy7L6y3\nXmYxFCohzJkDgwbBqFGh36JHD3jxRRg2DC65JPwaLzX1VwfJnaebbw5HHx3irkbTpoVZ7unqY4lU\nurivEO4CDmroSTM7BOji7l2B04FbmjpgNv0HUJi1Eb78MnyJDhmydttzx47wwgthEt3JJ8PXX8cX\nQy4mTFjTf5Bs0CC4+eYwg7na1P/gEKlGsSYEd38eWNzILocB90T7TgE2NbMtGjtmJhPSkhVibYTz\nzgtNDAMHrvvc5puHjtvPPoNDDw1NEqWgri4khHT/LXfaKfxKHjWq8HEVW3Kfiki1KXYfwtbAvKTH\n84F
tGtr5o4/CpKEePbI7SZzNRvX9Brfd1vC49Q02gMceg623Dl82CxfGE0s23n47NG117Jj++foh\nqKXY1BWXlSvh1Vd1hSDVq4mR/AWR+jWa9iuotraWmTNDOeLJk2uoyaLdqGfP0BSQb/X9Bs88E75c\nG9OyJdx+O1x5ZajB9PTT0Llz/mPKVFO/hA85JFz5vPRSiLcavPRS+LGx8cbFjkQkN4lEgkQikfPr\nzWP+CWhmnYAn3H3nNM/dCiTcfWT0+B2gn7svSNnP3Z3TTw9NGYMHZxfDG2/AKafAjBk5vok0vvwS\n9tgDfvWrcMvGzTfDNdfA3/9evPHuhx8ehsYed1zD+9xwQ+gDqZamo8suC1d5V19d7EhE8sPMcPeM\n59wXu8loLDAAwMz2AJakJoNk2fYf1ItjbYTG+g2acuaZ4cv2Jz8pzgS21atDR3dT/y1PPjk0h/3n\nPwUJq+jUfyDVLu5hpw8CLwE7mtk8MzvVzAaa2UAAdx8HvG9m7wHDgDMbOtb8+bB4cejwzFa+10bI\npN+gKb/4BTz8cPiVfvXVoZhaoUydCttsA1s02n0fmsFOOgluuqkwcRXTF1/AW2+FJVlFqlXco4yO\nd/fvuXtrd+/g7ne6+zB3H5a0z9nu3sXdd3H3qQ0dK5GAfv3CqKFc5KtjOXW+QXP06wevvBI6eLt2\nLdxQz2x+CZ9zTiiLXciEVQyTJ0OfPuHHg0i1KnaTUcayKVeRTj4SQkPzDZqjc2d48EF48slQabR7\n9/A4zgqt2SSEzp3Dr+YRI+KLpxSouUikjBJCthPSUuUjITSn36ApvXqFkUfDh4dyF717h8f57vNf\ntQpefjlcnWRq8ODKH4KqhCBSRglh6dLQOZyr5q6NkI9+g0zst19oRrryypCA6h/ny5Qp0K0bbLpp\ndjG1bAnPPpu/OErJokXwwQew227FjkSkuMomIfTr17wv4uasjfDRR6EtPR/9BpkwgyOOgJkzYcAA\nOOaY8Hj27OYfe/z47H8Jm625SqhEiURYX0P1i6TalU1CaE7/Qb1cm40uuwx++cvCzxlo2RJOPRXe\nfTdMDuvXLzQpNUeuTSMnnACvvRZiqTS5JEmRShT7xLR8MDN/+22ne/fmHeeKK8Kv3SFDMn/N1Knw\n05+GL8JCXB005q23YP/9w0inXGbTLl8ehpouWAAbbpj96y+/PBQKvPHG7F9bynbcER56SGsgSOUp\nt4lpGevWrfnH6NkzuyJ37nD++fDb3xY/GUCYg/HjH4dJbbl44QXYddfckgGECXUPPBCSQqWoX1M6\n2/pYIpWobBJCPjpys20yGjMmlM0+9dTmnztfrroK/vrXMEkvW8nLZebie9+Dgw6CO+7I/RilJnlN\naZFqV1X/G2y3Xfgi/eyzpvddtQouugiGDg1t+aVihx1CHaK//CX71+ZjaOXgweEKpVTKeDdXc5Ok\nSCWpqoTTeyvDAAAN5UlEQVSQzdoIN94Y5hwccED8cWXryivhlluyW+Zy8eLQD7L77s079+67hy/Q\n008v/3kJqWtKi1S7qkoIkFmz0cKF8Ic/wJ//XJiYstWxY6iBdO21mb9m0qQw47h16+af/6abQl2o\nm29u/rGKKd2a0iLVTAkhjdra8IVbygutX3YZ3HUXfPhhZvs3tFxmLtq2hUceCZ3tU6bk55jFkG5N\naZFqpoSQYtasUIX0qqsKF1MuttoqzI245prM9s/3WPsuXWDYsDBpbtGi/B23kNRcJLK2spmHkK84\nv/wSNtsstKmvv/66zx98cFin4Nxz83K6WC1aFMbQv/566DBvyMcfh2G7ixbBeuvlN4aLLgozqp98\nMv/HjlNdHWy5ZZhs19AyoiLlrmLnIeRLmzaw/fbp10Z4+unQrnxmg6sylJZ27eCss5qeaDdxYpjl\nHMcX9u9/DytWlN8qY02tKS1SjaouIUD6ZqPVq+GCC+BPf8pPx2uhn
H9+WIqzsZIScTaNtGwZZvkO\nGxbWli4Xai4SWVcJjbAvnHQJ4bbbQlmHww4rTky52nTTkBRqa8M6CulMmBBvE9hWW4UZzMcdF5pg\nOnTI7Tgff5zdUNpu3XIvSDdhQhg4ICJrVF0fAoQO1iFDwlBMCKUYdtwx/MItx3o2y5aFTt5//GPd\nEgxz54a5Ax9/HP9ommuvhcceC6uPZXOVtWBB6BwfMSIs7ZmJL76AffbJbeGe1atDc9u77za9jKhI\nOcu2D6EqrxDqJ6e5hy/Jq6+GQw8tz2QAsNFGcMklYcLamDFrP1fIoZUXXwwvvQQXXphZvaUvvggz\nrm+8Efr3h3fegfbtMzvX8uWw887w1FNhIEA2pk4NVzFKBiJri70PwcwOMrN3zGyOmV2S5vkaM/vc\nzKZFt8vjjqldu1AtdO7c0Il8993l1yma6owzwmijV19de3sh28rN4J57woijkSMb3m/VqrAqXNeu\n4TN4443wONNkAKFA3623hvedbRkN9R+IpBdrQjCz9YAbgYOA7sDxZpaubukkd+8V3Qry1Vzfj3Dx\nxaENfsstC3HW+LRpE8pTX3HFmm3FKM2w6aZh0to556y7oM8334SEseOOIa7nnguPO3XK7VwHHgj7\n7huujLKhhCCSXtxXCH2A99x9rrt/DYwEDk+zX8HnivbsGZoqXn89LFVZCU49NayVMHlyePzuu6Et\nv3PnwsbRq1co/fGLX4Rf7+4wdmxoqhs+PLT7jx0bmnyaa+jQ0KH92muZ7Z/LmtIi1SLuPoStgXlJ\nj+cDqeXVHNjLzN4EPgQudPc0swTyq2fP0Ez0wAOhFEMlaN06zLC+/PLQYV4/O7kYpRl++Ut48UU4\n9tjQaf/FF2HOws9+lt942rUL/RCnnRaSe1Ojjl55JZQkyWZNaZFqEXdCyGRo0FSgg7uvMLODgTHA\nOuXGamtrv71fU1NDTU1NswLr2zc0axx3XLMOU3JOPDF88T77bGgaOeKI4sVy003hi/rYY0Nccc1k\nPuEEuO++kBguvbTxffNZ00mk1CQSCRKJRM6vj3XYqZntAdS6+0HR418Dde7eYJ1OM/sA6O3unyVt\ny+uw00o3alSYYPf++zBjBmy9dbEjit/cubDbbqE5qGvXhvfr2zf0sxx4YMFCEymabIedxp0QWgLv\nAj8GPgJeBY5399lJ+2wBfOLubmZ9gFHu3inlOEoIWairC+34q1aFoZzV4q9/hSeeCE1l6Zqlmrum\ntEi5KalaRu6+GjgbeAaYBTzk7rPNbKCZDYx2OwqYaWbTgeuACmvEKbwWLUKH+YUXFjuSwho0CJYu\nDWXB03n++eatKS1S6apyprJUrjffDKvczZix7lDiiy8OyaDUS5uL5EtJXSGIFNouu4QRToMHr/uc\n5h+INE5XCFJxVq4MNZ2GDg0lSSCsf7HttvDpp+VVzVakOXSFIFWvbdtQvfass8L8B4BEIn9rSotU\nKiUEqUj77Rf6Ei67LDxWc5FI05QQpGL96U8wenSYm6AJaSJNUx+CVLRRo+A3vwl9B3GsKS1Sykpq\nYlq+KCFIrtzDKnjrrbfuWhEilU4JQSTF0qVhlnK5lzgXyZYSgoiIABp2KiIiOVJCEBERQAlBREQi\nSggiIgIoIYiISEQJQUREACUEERGJKCGIiAighCAiIpFYE4KZHWRm75jZHDO7pIF9boief9PMesUZ\nj4iINCy2hGBm6wE3AgcB3YHjzaxbyj6HAF3cvStwOnBLXPGUskQiUewQYlXJ76+S3xvo/VWbOK8Q\n+gDvuftcd/8aGAkcnrLPYcA9AO4+BdjUzLaIMaaSVOn/KCv5/VXyewO9v2oTZ0LYGpiX9Hh+tK2p\nfbaJMSYREWlAnAkh0/KkqZX4VNZURKQIYit/bWZ7ALXuflD0+NdAnbtfm7TPrUDC3UdGj98B+rn7\ngpRjKUmIiOQgm/LXLWOM43Wgq
5l1Aj4CjgWOT9lnLHA2MDJKIEtSkwFk94ZERCQ3sSUEd19tZmcD\nzwDrAXe4+2wzGxg9P8zdx5nZIWb2HrAcOCWueEREpHFlsWKaiIjEr+RnKmcyua1cmdlcM5thZtPM\n7NVix9NcZnanmS0ws5lJ2zYzs2fN7J9m9g8z27SYMTZHA++v1szmR5/hNDM7qJgxNoeZdTCziWb2\ntpm9ZWaDou1l/xk28t4q4vMzszZmNsXMppvZLDP7fbQ9q8+upK8Qoslt7wL7Ax8CrwHHu/vsogaW\nJ2b2AdDb3T8rdiz5YGZ9gWXAve6+c7Ttj8Aid/9jlNC/6+6XFjPOXDXw/q4Clrr70KIGlwdmtiWw\npbtPN7ONgDeAnxOacsv6M2zkvR1D5Xx+G7j7CjNrCbwAXEiY65XxZ1fqVwiZTG4rdxXTYe7uzwOL\nUzZ/O/kw+vvzggaVRw28P6iQz9DdP3b36dH9ZcBswlyhsv8MG3lvUDmf34robmtCv+1isvzsSj0h\nZDK5rZw58JyZvW5m/1vsYGKyRdLIsQVAJc5EPyeqxXVHOTanpBONDuwFTKHCPsOk9/ZKtKkiPj8z\na2Fm0wmf0UR3f5ssP7tSTwil256VH3u7ey/gYOCsqEmiYnlon6y0z/QWYDugJ/Bf4C/FDaf5oiaV\n0cBgd1+a/Fy5f4bRe3uE8N6WUUGfn7vXuXtPQrWHfc1sv5Tnm/zsSj0hfAh0SHrcgXCVUBHc/b/R\n34XAY4QmskqzIGq/xcy2Aj4pcjx55e6feAS4nTL/DM2sFSEZ3OfuY6LNFfEZJr23EfXvrdI+PwB3\n/xx4EuhNlp9dqSeEbye3mVlrwuS2sUWOKS/MbAMz2zi6vyFwIDCz8VeVpbHASdH9k4AxjexbdqL/\nyeodQRl/hmZmwB3ALHe/Lumpsv8MG3pvlfL5mVm7+uYuM2sLHABMI8vPrqRHGQGY2cHAdayZ3Pb7\nIoeUF2a2HeGqAMIEwfvL/b2Z2YNAP6Adob3ySuBxYBSwLTAXOMbdlxQrxuZI8/6uAmoIzQ0OfAAM\nTDfbvhyY2T7AZGAGa5oWfg28Spl/hg28t98QqieU/ednZjsTOo1bRLf73P1PZrYZWXx2JZ8QRESk\nMEq9yUhERApECUFERAAlBBERiSghiIgIoIQgIiIRJQQREQGUEKSMmNk3UYnimWY2KpqAk+lrd4nm\ntDS1X28zu755kWbPzA6tL+9uZj83s25Jz/3WzH5c6Jik+mgegpQNM1vq7vWzu0cAb7j7X5Oeb+nu\nqxt47cmEUuPnFCTYZjCzu4En3H10sWOR6qIrBClXzwNdzKyfmT1vZo8Db5nZ+mZ2l4WFh6aaWU1U\nw2YIcGx0hXG0mW1oYcGbKdF+hwFE+z8R3a+N9ploZv8ys7TJxMyWmdnQaOGV58ysXbS9p5m9ElXS\nfDSptMAgCwu1vGlmD0TbTjazv5nZnsChwJ+iuDqb2d1m9otovx9H22dE1TlbR9vnRvG+ET23Y5z/\n8aUyKSFI2YkWADmEUIYAQinjQe7+feBs4Bt370EoS1A/nf8KYKS793L3h4HLgPHuvjvwI8IX8AZp\nTrcDoc5UH+AqC4s2pdoAeM3ddwImEUpaANwLXOTuuxBq5NRvvwToGW3/VbTNAdz9ZUL9mQvdfVd3\nfz96zs2sDXAXofxAD0LJkzOSXr/Q3XsTKnhe2OR/SJEUSghSTtqa2TTCynlzgTsJi5u86u7/jvbZ\nGxgB4O7vAv8mfKnD2guhHAhcGh1vIrA+a1fWhfAl+6S7f+3unxIqRaarJ18HPBTdHwHsY2abAN+J\nFtWBkJj2je7PAB4wsxOBbxp4r6mLthiwI/CBu7+X5pgAj0Z/pwKdGjiuSINaFjsAkSysjNaP+FYo\nYsnylP0yXQHrSHefk3K8rVL2+Srp/jc0/f+Mkb7mfHJMPyV8kR8KXBYVJkuNOd0xUrelnmtVFnG
K\nrENXCFJpngdOBDCzHQhVHt8BlgIbJ+33DDCo/oGZrZVo6jdneM4WwNHR/ROA5939C2BxVGUToD+Q\niMowb+vuCeBS4DvARinHWwpskrLNCeuLdzKz7ZOOOSnDGEWapIQg5aShX83J228GWpjZDMIa3CdF\n63FPBLrXdyoDvwNaRR2wbwG/TXOeTFcHWw70MbOZhHLYQ6LtJxH6Jt4EekTbWwL3RfFNBa6PFjRJ\nPtdI4KKog7jzt0G5ryIseP9w9PrVwK1p/tuU9apmUjwadirSTMnDYUXKma4QRJpPv6qkIugKQURE\nAF0hiIhIRAlBREQAJQQREYkoIYiICKCEICIiESUEEREB4P8BJ2luiNVtD6sAAAAASUVORK5CYII=\n", "text": [ "" ] } ], "prompt_number": 4 }, { "cell_type": "heading", "level": 2, "metadata": { "collapsed": true }, "source": [ "2) Translate DNA to protein" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this question we will translate RNA sequences to amino-acid (aa) sequences.\n", "\n", "`genetic_code` is a `dict` in which keys are RNA codons - triplets of base names (A, G, C, T) and the values are short names of aa (`Leu` for Leucine, etc.)." ] }, { "cell_type": "code", "collapsed": true, "input": [ "genetic_code = { \n", " 'UUU':'Phe', 'UUC':'Phe', 'UCU':'Ser', 'UCC':'Ser',\n", " 'UAU':'Tyr', 'UAC':'Tyr', 'UGU':'Cys', 'UGC':'Cys',\n", " 'UUA':'Leu', 'UCA':'Ser', 'UAA':None, 'UGA':None,\n", " 'UUG':'Leu', 'UCG':'Ser', 'UAG':None, 'UGG':'Trp',\n", " 'CUU':'Leu', 'CUC':'Leu', 'CCU':'Pro', 'CCC':'Pro',\n", " 'CAU':'His', 'CAC':'His', 'CGU':'Arg', 'CGC':'Arg',\n", " 'CUA':'Leu', 'CUG':'Leu', 'CCA':'Pro', 'CCG':'Pro',\n", " 'CAA':'Gln', 'CAG':'Gln', 'CGA':'Arg', 'CGG':'Arg',\n", " 'AUU':'Ile', 'AUC':'Ile', 'ACU':'Thr', 'ACC':'Thr',\n", " 'AAU':'Asn', 'AAC':'Asn', 'AGU':'Ser', 'AGC':'Ser',\n", " 'AUA':'Ile', 'ACA':'Thr', 'AAA':'Lys', 'AGA':'Arg',\n", " 'AUG':'Met', 'ACG':'Thr', 'AAG':'Lys', 'AGG':'Arg',\n", " 'GUU':'Val', 'GUC':'Val', 'GCU':'Ala', 'GCC':'Ala',\n", " 'GAU':'Asp', 'GAC':'Asp', 'GGU':'Gly', 'GGC':'Gly',\n", " 'GUA':'Val', 'GUG':'Val', 'GCA':'Ala', 'GCG':'Ala', \n", " 'GAA':'Glu', 'GAG':'Glu', 'GGA':'Gly', 'GGG':'Gly'}" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"**a)** Write a function called `protein_translation` that translates a nucleotide sequence to an aa sequence.\n", "\n", "Input: `seq` is a string of uppercase nucleotides. Since the keys of `genetic_code` are RNA codons (A, G, C and U), a DNA sequence such as the one in the example below must have its T's converted to U's before translation.\n", "\n", "Output: `list` of the three-letter aa names (strings). Translation stops at the end of the sequence or at the first stop codon, whichever comes first." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def protein_translation(seq):\n", " \"\"\" This function translates a nucleic acid sequence into a\n", " protein sequence, until the end or until it comes across\n", " a stop codon\n", " \"\"\"\n", " pass\n", " \n", " \n", "dna_seq = 'ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTG'\n", "aa_seq = protein_translation(dna_seq)\n", "print(aa_seq)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['Met', 'Val', 'His', 'Leu', 'Thr', 'Pro', 'Glu', 'Glu', 'Lys', 'Ser', 'Ala', 'Val', 'Thr', 'Ala', 'Leu', 'Trp', 'Gly', 'Lys', 'Val']\n" ] } ], "prompt_number": 7 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "3) Maize chloroplast" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**a)** Write a function called `fetch_gb_by_id` that receives a GenBank ID (such as `KF241981`) and returns the corresponding record as a Biopython `SeqRecord` object. Use it to fetch the maize chloroplast genome record.\n", "\n", "Assume the default settings, as shown in lecture 6. Ignore any warning messages from NCBI that might be displayed." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "from Bio import Entrez\n", "from Bio import SeqIO" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "def fetch_gb_by_id(rec_id):\n", " pass\n", "\n", "maize_chl = fetch_gb_by_id('KF241981')\n", "print(maize_chl.description)\n", "assert len(maize_chl.features) == 263" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Zea mays subsp. 
mays cultivar B73 chloroplast, complete genome.\n" ] } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "__b)__ Write a function called `extract_rRNA` that receives a `SeqRecord` object and extracts its **rRNA** features.\n", "The function should return a `list` of `SeqRecord` objects of the corresponding sequences.\n", "\n", "Set the `description` field of each returned record to the corresponding gene name, followed by ` | ` and the sequence length. For example: `rrn22 | 1231`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def extract_rRNA(gb_record):\n", " pass\n", " \n", "maize_chl_rRNAs = extract_rRNA(maize_chl)\n", "print([x.description for x in maize_chl_rRNAs])" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['rrn16 | 1492', 'rrn23 | 2888', 'rrn4.5 | 95', 'rrn5 | 121', 'rrn5 | 121', 'rrn4.5 | 95', 'rrn23 | 2888', 'rrn16 | 1492']\n" ] } ], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "__c)__ Write the rRNA sequences to the output file `maize_chloroplast_rRNAs.fasta` in __FASTA__ format." ] }, { "cell_type": "code", "collapsed": false, "input": [ "out_filename = \"maize_chloroplast_rRNAs.fasta\"\n" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ "8" ] } ], "prompt_number": 12 } ], "metadata": {} } ] }