{ "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.1" }, "name": "", "signature": "sha256:03ad4f7c7bf50ab03c7a43e3940ddb590d69b2f706757f213a1767fa7dacd1e1" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "[![Py4Life](https://raw.githubusercontent.com/Py4Life/TAU2015/gh-pages/img/Py4Life-logo-small.png)](http://py4life.github.io/TAU2015/)\n", "\n", "## Exam - Example - Solution\n", "\n", "### Tel-Aviv University / 0411-3122 / Spring 2015" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1) Protein hydrophobicity\n", "\n", "In this question we will calculate the hydrophobicity of a protein based on its amino-acid (aa) sequence.\n", "\n", "> In chemistry, hydrophobicity is the physical property of a molecule (known as a hydrophobe) that is seemingly repelled from a mass of water.\n" ] }, { "cell_type": "code", "collapsed": true, "input": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "`ges_scale` is a `dict` that contains the hydrophobicity score of every aa. The keys are the one-letter aa codes and the values are the scores. For example, the letter for Leucine is `L` and its hydrophobicity score is -2.8."
] }, { "cell_type": "code", "collapsed": true, "input": [ "ges_scale = {'F':-3.7,'M':-3.4,'I':-3.1,'L':-2.8,'V':-2.6,\n", "             'C':-2.0,'W':-1.9,'A':-1.6,'T':-1.2,'G':-1.0,\n", "             'S':-0.6,'P': 0.2,'Y': 0.7,'H': 3.0,'Q': 4.1,\n", "             'N': 4.8,'E': 8.2,'K': 8.8,'D': 9.2,'R':12.3}" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "**a)** Write a function called `hydrophobicity` that calculates the hydrophobicity of a sequence. The function calculates the average hydrophobicity around every position in the sequence. The average is calculated over a _window_ - a set number of positions. This method is known as a _sliding window_, because after each calculation the window _slides_ to the next position.\n", "\n", "Input: `seq` is a string in which each character is an aa letter. `win_size` is the window size (the number of positions over which to average).\n", "\n", "Output: a `list` of `float`s holding the average hydrophobicity score for each position of the window."
] }, { "cell_type": "code", "collapsed": false, "input": [ "def hydrophobicity(seq, win_size=15):\n", "    \"\"\"Scan a protein sequence for hydrophobic regions using the GES\n", "    hydrophobicity scale.\n", "    \"\"\"\n", "\n", "    score = None\n", "    score_list = []\n", "\n", "    for i in range(len(seq) - win_size + 1):\n", "        j = i + win_size\n", "\n", "        if score is None:\n", "            # First window: sum the scores of all its positions\n", "            score = 0\n", "            for k in range(i, j):\n", "                score += ges_scale[seq[k]]\n", "\n", "        else:\n", "            # Slide the window: add the position that entered it\n", "            # and subtract the position that left it\n", "            score += ges_scale[seq[j - 1]]\n", "            score -= ges_scale[seq[i - 1]]\n", "\n", "        score_list.append(score/win_size)\n", "\n", "    return score_list\n", "\n", "protein_seq = 'IRTNGTHMQPLLKLMKFQKFLLELFTLQKRKPEKGYNLPIISLNQ'\n", "scores = hydrophobicity(protein_seq)\n", "print(scores)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[0.7666666666666665, 1.5599999999999998, 0.49333333333333323, 0.8466666666666665, 1.1133333333333333, 0.9333333333333333, 0.8266666666666665, 0.43999999999999984, 1.2133333333333332, 0.7533333333333331, 0.493333333333333, 0.5999999999999996, 0.5999999999999996, 0.28666666666666624, 1.0599999999999996, 2.1066666666666665, 2.106666666666666, 2.3666666666666663, 2.6399999999999992, 2.6399999999999997, 2.82, 3.0533333333333332, 3.5599999999999996, 2.826666666666666, 3.026666666666666, 3.066666666666666, 2.9399999999999995, 3.086666666666666, 2.626666666666666, 2.3599999999999994, 1.8133333333333328]\n" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "**b)** Next, plot the hydrophobicity scores. Don't forget the axis labels."
] }, { "cell_type": "code", "collapsed": false, "input": [ "plt.plot(scores)\n", "plt.xlabel('Protein position')\n", "plt.ylabel('Hydrophobicity');" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAEPCAYAAABCyrPIAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmcU/XVx/HPQUBwq1Ue1CqCCFqoIojFFRlbtWqrVuvu\nC1zqI3UDd23dplSf1i5UrRviLiqiKGJFrQIBV1wAQUGLVVrQiqCgbKI45/njd0dCyMwkmdys3/fr\nldckNzf3nhjMyf0t52fujoiISItiByAiIqVBCUFERAAlBBERiSghiIgIoIQgIiIRJQQREQEKkBDM\nbD0zm2ZmTzTw/A1mNsfM3jSzXnHHIyIi6RXiCmEwMAtYZ8KDmR0CdHH3rsDpwC0FiEdERNKINSGY\n2TbAIcDtgKXZ5TDgHgB3nwJsamZbxBmTiIikF/cVwl+Bi4C6Bp7fGpiX9Hg+sE3MMYmISBqxJQQz\n+xnwibtPI/3Vwbe7pjxWLQ0RkSJoGeOx9wIOi/oJ2gCbmNm97j4gaZ8PgQ5Jj7eJtq3FzJQkRERy\n4O6N/SBfS2xXCO7+G3fv4O7bAccBE1KSAcBYYACAme0BLHH3BQ0cr2JvV111VdFj0PvTe9P7q7xb\ntuK8QkjlAGY2EMDdh7n7ODM7xMzeA5YDpxQwHhERSVKQhODuk4BJ0f1hKc+dXYgYRESkcZqpXAJq\namqKHUKsKvn9VfJ7A72/amO5tDMVmpl5OcQpIlJKzAwvhU5lEREpL0oIIiICKCGIiEhECUFERAAl\nBBERiSghiIgIoIQgIiIRJQQREQGUEEREJKKEICIigBKCiIhElBBERARQQhARkYgSgoiIAEoIIiIS\nUUIQERFACUFERCKxJgQza2NmU8xsupnNMrPfp9mnxsw+N7Np0e3yOGMSkeJ7661iRyDptIzz4O7+\npZnt5+4rzKwl8IKZ7ePuL6TsOsndD4szFhEpDe++CzvvHP7usEOxo5FksTcZufuK6G5rYD3gszS7\nZbzmp4iUt9tvh5Yt4dlnix2JpIo9IZhZCzObDiwAJrr7rJRdHNjLzN40s3Fm1j3umESkOL76Cu69\nFy6/HP7xj2JHI6libTICcPc6oKeZfQd4xsxq3D2RtMtUoEPUrHQwMAbQhaRIBXr8cejWDc48E7p0\nga+/hlatih2V1Is9IdRz98/N7ElgNyCRtH1p0v2nzOxmM9vM3ddqWqqtrf32fk1NDTU1NXGHLCJ5\nNnw4nH46/M//hIQwZQrss0+xo6ociUSCRCKR8+vN3fMXTerBzdoBq919iZm1BZ4Bfuvu45P22QL4\nxN3dzPoAo9y9U8pxPM44RSR+H3wAffrAvHnQpg1ceim0bg1DhhQ7ssplZrh7xn20cfchbAVMiPoQ\npgBPuPt4MxtoZgOjfY4CZkb7XAccF3NMIlIEd9wBJ54YkgHAgQeqH6HUxHqFkC+6QhApb6tXQ8eO\nIQH84Adh26pVoelo7lzYbLOihlexSu0KQUSEceNCQqhPBgDrrx/6DyZMKF5csjYlBBGJ3fDh8L//\nu+52NRuVFjUZiUis5s+HHj1CZ/KGG6793Ntvw09/GjqcTdNT805NRiJSUu66C449dt1kANC9e5iL\n8N57hY9L1qWEICKxqasLo4vSNRdBuCpQs1HpUEIQkdg8+yxsvjnsumvD+yghlA4lBBGJTUOdycn2\n3x8SidB0JMWlhCAisViwAMaP
hxNOaHy/5DIWUlxKCCISi3vugSOOgE02aXrfAw5Qs1EpUEIQkbxz\nD+seNNVcVE/9CKVBCUFE8m7SpFC4bo89Mtt/771h1iz4LN3yWVIwSggikne33RauDjKdbKYyFqVB\nCUFE8urTT0Ptov79s3udmo2KTwlBRPLqvvvgZz/LvoJpfUJQlZriUUIQkbxxz2zuQTrduqmMRbEp\nIYhI3rz8clj7YN99s3+tylgUnxKCiOTN8OFw2mm5Vy5VQigulb8Wkbz4/HPo1AnefRfat8/tGAsX\nhlnLixZBq1Z5Da8qqfy1iBTFAw+EukS5JgMo/zIWdXXw5z/DfvuF++UmtoRgZm3MbIqZTTezWWb2\n+wb2u8HM5pjZm2bWK654RCReuXYmp4qr2cg9LMTzyCNw6aXhPGecAYsX5+f4n3wSFvsZPTpc6Tzz\nTH6OW0ixJQR3/xLYz917Aj2A/cxsn+R9zOwQoIu7dwVOB26JKx4Ric+IEbBsWbhCaK581DWq//J/\n+OHw5X/AAdCuHfTtG4bFbrABDBoELVqEdZ4feqh5w10nTAglvnv2hMmT4aKL4LrrmvceiqEgfQhm\ntgEwCTjJ3Wclbb8VmOjuD0WP3wH6ufuClNerD0GkRD3/PPziF+FLcaedmn+8VatC09HcudnPZbjr\nrtB0NXUqtG0LvXuvfdtyy3Vf89JLcPrpof/jppugY8fMz7d6NdTWwp13hmJ+Bxyw5j107AgTJ4bh\ntMVSUn0IZtbCzKYDCwhf/LNSdtkamJf0eD6wTZwxiUj+zJkDRx8N99+fn2QAuZexGDoUrrkGBg8O\ndZHmz4fHH4crrwxNOemSAcBee4UEsueeIWlcdx18803T5/vPf6CmBl57DaZNW5MM6t/DwIFwww3Z\nvYdiaxnnwd29DuhpZt8BnjGzGndPpOyWmr3SXgrU1tZ+e7+mpoaampr8BSoiWfv0UzjkEPjd79b+\nMsyH+n6Eo47KbP+hQ+Hmm8Mv8g4dsj9f69Zw2WUhuQ0cGBLc8OGhCSidxx6DX/0KLrgALrwwND2l\nOuOMcHVwzTXZX+nkKpFIkEgkcj+AuxfkBlwBXJiy7VbguKTH7wBbpHmti0jp+PJL97593S+6KJ7j\nv/22e8eO7nV1Te/7l7+4b7+9+3/+k59z19W533GHe/v27hdf7L58+ZrnVq50P/NM9+22c3/llaaP\n1b+/+7XX5ieuXETfnRl/T8c5yqidmW0a3W8LHABMS9ltLDAg2mcPYImn9B+ISGlxD5PP2reHP/wh\nnnN06xba55sqY9HcK4N0zODUU2HmTJg3D3beOVytzJ4Nu+8eRhBNnRruN2Xw4NAvsXp1fmKLW5xN\nRlsB95hZC0JfxX3uPt7MBgK4+zB3H2dmh5jZe8By4JQY4xGRPBgyBP75z/AlnK6pJB/M1ow26to1\n/T5xJINk7duHDuqnngrNSEuWwB//mN1M7N69YdttYcyYzJu/ikkzlUUkYyNGwOWXwyuvNNxJmy8P\nPggjR4aO4VRxJ4NUy5fD0qW5vedHHoHrrw+jsQot21FGSggikpF8Dy9tSkNlLAqdDJpr9WrYfnt4\n9NFwxVBIJTXsVEQqQxzDS5uSroxFuSUDgJYt4ayzwlVCqVNCEJFGxTm8tCnJZSzKMRnUO+00+Pvf\n4eOPix1J45QQRKRBq1bBEUeEWz7qFGWrvmO5nJMBhHkIxx4Lt95a7Egapz4EEUnLHQYMgJUrYdSo\n+EYUNWbVKth889CZW67JoN7s2aEK6r//HWYyF4L6EESk2b75Bs48M/Qd3HtvcZIBhC/OW28t/2QA\nYW5Fz55h5FSp0hWCiKxl5Uo48UT44oswMmaTTYodUeV46qlQIuONN3JfVS4bukIQkZwtXgw/+Qm0\naQPjxikZ5NtPfhLmNLzwQrEjSU8JQUSAUB20b98wVn7EiFDwTfKrRYtQzqJU10pQk5GIMGsWHH
ww\nnHNOqOBZiOaMarVsWVh74fXXw984qclIRLLy4oth9MvVV4dSzkoG8dpoIzj55FD0rtToCkGkio0d\nGyZN3XdfaN+Wwpg7F3bbLfzdaKP4zqMrBBHJyPDhoYrnk08qGRRap07Qr18Y0ltKdIUgUmXcQxmK\ne+6Bp59uuLy0xGvy5LCW86xZcZYR1xWCSFVxDxPJMrl99VWYcDZmTOg7UDIonr59oW3bNbWaSkGT\nCcHMNi9EICKSvbq6sIB8q1ZhmGhTt7Zt4YMPIJGIfz0DaZwZnHtuaQ1BzeQK4RUzezha2UzjD0RK\nyLXXhmGMX32V+VXC009rwlmpOO64sPrcY48VO5KgyT6EaAnM/YFTgR8Co4C73P2f8Yf3bQzqQxBJ\n8fLL8POfh/Hs5V7np5q9+ir87Gdh3YfttsvvsWNdMc3MfgSMADYEpgO/dveXso4yS0oIImtbsgR6\n9QrNDYcfXuxopLmuvz4sPvTCC/mdIZ73hGBm7YATgQHAAuB24AlgF+ARd+/UyGs7APcC7QEHbnP3\nG1L2qQEeB96PNo1296tT9lFCEIm4wzHHhD6Av/2t2NFIPrjDkUdCx4757VPINiG0zGCflwhXBYe7\n+/yk7a+bWVPLPXwNnOfu081sI+ANM3vW3Wen7DfJ3Q/LNGiRajZ8eChLfd99xY5E8sUM7rwTdt01\nzE844ojixJFJp/Ll7j4kORmY2TEA7v6Hxl7o7h+7+/To/jJgNvC9NLuqs1okA2+/HconjxwZKpJK\n5fjud+Ghh8JkwQ8+KE4MmSSES9Ns+3W2JzKzTkAvYErKUw7sZWZvmtk4M+ue7bFFqsGKFWEZxj/+\nEb7//WJHI3Ho0yck/GOPDSPHCq3BJiMzOxg4BNjGzG5gza/4jQlNQRmLmoseAQZHVwrJpgId3H1F\ndM4xwA6px6itrf32fk1NDTU1NdmEIFL2zj8fevQIhdGkcg0aFOaJXHxx9v0JiUSCRCKR87kb7FQ2\ns10Iv+iHAFewJiF8AUx098UZncCsFfB34Cl3b/LtmdkHQG93/yxpmzqVpaqNHh2+IKZN0xyCarB4\ncehPGDq0ef0JcYwyauXuWV0RJL3WgHuAT939vAb22QL4xN3dzPoAo1JHLikhSDWbOzc0JTz5JPzw\nh8WORgolH/MT8pYQzOxhdz/azGamedrdvUcGwewDTAZmEPoKAH4DbBsdZJiZnQWcAawGVgDnu/sr\nKcdRQpCq9PXXYdTJkUeGtQqkujR3fkI+E8L33P2jqDN4He4+N/vwcqOEINWqfkH2cePiq4gppau5\n8xPyNg/B3T+qPybwsbuvjE7QFtgi+9BEJBvjx8Pdd4d+AyWD6lTo+QmZ/DN7BPgm6XFdtE1EYvLJ\nJzBgQFizoH37YkcjxVTI+QmZJIT13P3bEbHuvgpoFV9IItWtrg5OOikkhP33L3Y0Ugrq5yccc0y8\n8xMySQiLzOzb8lnR/UXxhSRS3f7v/0JJ6yFDih2JlJJBg8Ls9CeeiO8cmQw77QLcz5qSE/OB/u7+\nXnxhrRODOpWlKjz3XLgyeP11+F66Ii9S1e6+O6yd8Pjjme0fW/nraLZxfU2iglJCkGowf36YZ/DA\nA7DffsWORkrR0qVh7Yv33oN27ZreP5/DTvu7+31mdgFr5hBAGHXk7j4005M0lxKCVLqvvoKaGjj0\nUPh11pXCpJqceCLsuSecfXbT+2abEBrrQ9gg+rtxAzcRyZOLL4bNN4dLLil2JFLqBgyAe++N59hZ\nrZhWLLpCkEo2ahRcemmYgPbd7xY7Gil1q1fDttvChAlNV73N5xVC/QG3N7MnzGyRmS00s8fNrHOm\nJxCRhr3zDpx1FjzyiJKBZKZlSzjhhHgWSMpk2OkDwChgK8JIo4eBB/Mfikh1Wb4cjjoqDDPddddi\nRyPlZMAAGDEizFnJp0wSQlt3v8/dv45uIwCt1STSDO5h5u
luu8FppxU7Gik3PXqEK8rJk/N73MYW\nyNmMMKLoKTP7NWuuCo4FnspvGCLV5dZbYeZMePnlUK9GJFv9+4fO5XyuFdbYsNO5rD3c9NunCMNO\nc6zQnT11Kkslee01+OlP4cUXoWvXYkcj5eq//4Xu3eHDD2GDDdLvk89qp52yjlBEGvXpp3D00eEK\nQclAmmOrrWCPPcKs5eOPz88xMxll1NrMBpvZaDN7xMzOiZbFFJEs1NWFy/yjjgo17kWaq77ZKF8y\nqWV0B+FK4h5Cc1F/YLW7F6wrTE1GUqpeeQVWrsxs33Hjwv4TJkAr/aSSPFixArbeGmbNClcMqeJY\nU3lG6nKZ6bbFSQlBStGcOWG46G67Zbb/hhvCbbepaJ3k16mnwk47wfnnr/tc3voQkqw2sy711U3N\nbHvC+sciVW3qVDjgAHj00WJHItWsf38477z0CSFbmcxDuAiYYGaTzGwSMAHIaLlvM+tgZhPN7G0z\ne8vMBjWw3w1mNsfM3jSzXpmHL1I806ZBL/1rlSLr1w8++wxmzGj+sZpMCO4+HtgBGAScA+zg7hMy\nPP7XwHnu/gNgD+AsM+uWvIOZHQJ0cfeuwOnALVnEL1I0SghSClq0CFcJ+ShlkenS3bsCOwG9gGPN\nbEAmL3L3j919enR/GTCbNQvt1DuM0GGNu08BNjWzLTKMS6Qo3JUQpHT07w/33x8K3zVHJsNORwB/\nBvYGdgN+GN2yYmadCAllSspTWwPzkh7PB7bJ9vgihfTRR+GvOoilFHz/+7DNNjB+fPOOk0mncm+g\ne3OG+USrrT0CDG5gxbXUXvB1zlVbW/vt/ZqaGmryOV9bJEvTp0PPnio7IaVjwAD4858TvPxyIudj\nZDLs9GHCF/lHOZ0gTGL7O/CUu1+X5vlbgYS7j4wevwP0c/cFSfto2KmUlKuvDssZXnttsSMRCRYt\ngi5dYN482Dhawixv6yFEayA8AbQDZpnZP+q3mdnYTA5uZgbcAcxKlwwiY4EB0f57AEuSk4FIKVL/\ngZSadu3CiKPRo3M/RmPF7WpSNtXvWF/cblKTBzfbB5gMzEh6/W+AbQkHGRbtdyNwELAcOMXdp6Yc\nR1cIUlI6d4annoIddyx2JCJrjB4NN90UZsNDDDOVo4NuBfQB6oDX3P3j3MLNjRKClJIlS6BDB/j8\n8zDkT6RUrFoVBjpMmxaW2YxjCc3TCCODjgSOAqaY2S9zD1mkvE2fHhYoUTKQUrP++qGa7v335/b6\nTP5JXwz0cveT3P0kwpyES3I7nUj5U/+BlLIBA0IF1FwaVTJJCIuA5KGiy6JtIlVJCUFK2Z57wtdf\nwxtvZP/aTOYh/At4xcwejx4fDswwswsInctDsz+tSPmaNg3OPbfYUYikZ5b7OgmZzEOoje6uNcqo\n/nl3/232p82OOpWlVKxcCZttFjqW11+/2NGIpPf++2E1tYUL81z+2t1rAcxs4+jx0pyjFClzb70F\nO+ygZCClrXPn8O904cLsXpfJKKOdzWwa8Dbwtpm9YWY75RamSHmbPl39B1IeHn44+9dk0ql8G3C+\nu2/r7tsCF0TbRKqOOpSlXKRbUrMpmSSEDdx9Yv0Dd08AG2Z/KpHyp4QglSyTUUYfmNkVwH2EDuUT\ngfdjjUqkBH3zDcycGaqcilSiTK4QTgHaA48Co4H/AU6NMyiRUvTPf8KWW8ImmxQ7EpF4NHqFYGYt\ngUfdfb8CxSNSstRcJJWu0SsEd18N1JnZpgWKR6RkKSFIpcukD2E5MNPMno3uQ5ihPCi+sERKz7Rp\ncMEFxY5CJD6ZJIRHo1syTRuWquKuKwSpfJnMVL67AHGIlLR586BVq9CpLFKpGkwIZjazkde5u/eI\nIR6RkqSrA6kGjV0hHBr9PTP6mzwPQaSqqGSFVIMGE4K7zwUwswPdPXkqzoyotpEWyZGqMW0anKif\nQlLhMpmYZma2T9KDvQ
lXCpm88E4zW9BQ85OZ1ZjZ52Y2LbpdnlnY6U2bBjfc0JwjiKSnJiOpBpms\nh9AbuAv4TrRpCXCKu09t8uBmfQkrrN3r7juneb6GUDjvsCaOk9F6CJdcAk89BTNmNLmrSMY+/TSU\nE168WOsoS3kxy/N6CMB0d+9RPznN3ZdkenB3f97MOjWxW8bBNmXiRJg9G778Etq0yddRpdpNnw67\n7KJkIJUvk3/ic8zsT8D3skkGGXJgLzN708zGmVn3XA/0xRcwaxZst134K5Ivai6SapHJFUJP4Djg\ndjNbD7gTeNDdv8jD+acCHdx9hZkdDIwBdki3Y21t7bf3a2pqqKmpWev555+HPn1gm23CL7pdd81D\ndCKEhHDAAcWOQqRpiUSCRCKR8+ub7ENYa+fQ5n8/8F3gYeB37v5eE6/pBDyRrg8hzb4fAL3d/bOU\n7U32IVx0EWy8MWy0Ecydq85lyZ/u3eHBB0OzkUg5ybYPIZMlNFua2eFmNga4DvgL0Bl4AhiXc6Th\n2FuYmUX3+xAS1GdNvCytiRNhv/1Crfrp05sTlcgaK1aEHxjduhU7EpH4ZdJk9E8gAfzR3V9K2v6I\nmfVr7IVm9iDQD2hnZvOAq4BWAO4+DDgKOMPMVgMrCE1TWVuyBN59NzQZLVsGb74JdXXqBJTmmzED\nvv99aN262JGIxC+ThNDD3Zele8Ldz2nshe5+fBPP3wTclEEMjXr+edh9d1h//XD7znfCr7rOnZt7\nZKl26lCWatJYLaO/Jd131h4eWlLlrxMJSO5jrm82UkKQ5lJCkGrSWKPKG8Dr0d/Dk+7X30pGff9B\nvV12UT+C5IdqGEk1yWiUkZlNc/ei/W/R2CijxYth223DbNL6dt7Ro+Gee2Ds2AIGKRVn9erQ/Lhg\nQRi9JlJu8j7KqNRNngx77rl2p59GGkk+vPNOmNeiZCDVouwTQiKxdnMRhNnKS5aEqwaRXKn/QKpN\ngwnBzJaZ2VIzWwrsXH8/uuVjlnJeTJy4docyhOGmu+wShp+K5EoJQapNgwnB3Tdy942jW8uk+xu7\n+yaFDLIhn30G778Pu+227nNqNpLmUkKQalPWTUaTJsFee4W1blMpIUhzuId/Pz17Nr2vSKUo64SQ\nOtw0mRKCNMfcubDBBtC+fbEjESmcsk4IqRPSkv3gBzBnTlgbQSRbai6SalS2CWHhQvj3v6F37/TP\nt2kDXbpobYRMfPUVvP56saMoLUoIUo3KNiFMngx77w0tG6nGpGajzDz+eLjSWpLv5Y/KmBKCVKOy\nTQiN9R/UU0LIzIQJ4e+ddxY3jlKikhVSjco2ITTWf1BPCSEzEybA0KHwt7/BN98UO5riW7gQli+H\nTp2KHYlIYZVlQvjkE5g/v+lfcPWT0+rqChNXOZo/HxYtgtNOgy23VP0nCM1FPXuCZVwBRqQylGVC\nmDQJ+vZtvP8AoF072GSTMIRQ0qtvemvRAs49F66/vtgRFZ/6D6RalWVCSFeuoiGFbjb68ks46igY\nP75w52yO8ePhxz8O9488Ev71LzWzKSFItSrLhJCuoF1DCp0QzjsvFNU7/ngYObJw582Fe+g/+NGP\nwuNWreDMM3WVUN9kJFJtyi4hfPwx/Pe/oX8gE4VMCCNHwnPPhWGczz0HF14I111XmHPn4l//Cp3I\nO+ywZtvpp8OYMaGfphotWwbz5oV1lEWqTawJwczuNLMFZjazkX1uMLM5ZvammTV5oT5pEuy7L6y3\nXmYxFCohzJkDgwbBqFGh36JHD3jxRRg2DC65JPwaLzX1VwfJnaebbw5HHx3irkbTpoVZ7unqY4lU\nurivEO4CDmroSTM7BOji7l2B04FbmjpgNv0HUJi1Eb78MnyJDhmydttzx47wwgthEt3JJ8PXX8cX\nQy4mTFjTf5Bs0CC4+eYwg7na1P/gEKlGsSYEd38eWNzILocB90T7TgE2NbMtGjtmJhPS
khVibYTz\nzgtNDAMHrvvc5puHjtvPPoNDDw1NEqWgri4khHT/LXfaKfxKHjWq8HEVW3Kfiki1KXYfwtbAvKTH\n84FtGtr5o4/CpKEePbI7SZzNRvX9Brfd1vC49Q02gMceg623Dl82CxfGE0s23n47NG117Jj++foh\nqKXY1BWXlSvh1Vd1hSDVq4mR/AWR+jWa9iuotraWmTNDOeLJk2uoyaLdqGfP0BSQb/X9Bs88E75c\nG9OyJdx+O1x5ZajB9PTT0Llz/mPKVFO/hA85JFz5vPRSiLcavPRS+LGx8cbFjkQkN4lEgkQikfPr\nzWP+CWhmnYAn3H3nNM/dCiTcfWT0+B2gn7svSNnP3Z3TTw9NGYMHZxfDG2/AKafAjBk5vok0vvwS\n9tgDfvWrcMvGzTfDNdfA3/9evPHuhx8ehsYed1zD+9xwQ+gDqZamo8suC1d5V19d7EhE8sPMcPeM\n59wXu8loLDAAwMz2AJakJoNk2fYf1ItjbYTG+g2acuaZ4cv2Jz8pzgS21atDR3dT/y1PPjk0h/3n\nPwUJq+jUfyDVLu5hpw8CLwE7mtk8MzvVzAaa2UAAdx8HvG9m7wHDgDMbOtb8+bB4cejwzFa+10bI\npN+gKb/4BTz8cPiVfvXVoZhaoUydCttsA1s02n0fmsFOOgluuqkwcRXTF1/AW2+FJVlFqlXco4yO\nd/fvuXtrd+/g7ne6+zB3H5a0z9nu3sXdd3H3qQ0dK5GAfv3CqKFc5KtjOXW+QXP06wevvBI6eLt2\nLdxQz2x+CZ9zTiiLXciEVQyTJ0OfPuHHg0i1KnaTUcayKVeRTj4SQkPzDZqjc2d48EF48slQabR7\n9/A4zgqt2SSEzp3Dr+YRI+KLpxSouUikjBJCthPSUuUjITSn36ApvXqFkUfDh4dyF717h8f57vNf\ntQpefjlcnWRq8ODKH4KqhCBSRglh6dLQOZyr5q6NkI9+g0zst19oRrryypCA6h/ny5Qp0K0bbLpp\ndjG1bAnPPpu/OErJokXwwQew227FjkSkuMomIfTr17wv4uasjfDRR6EtPR/9BpkwgyOOgJkzYcAA\nOOaY8Hj27OYfe/z47H8Jm625SqhEiURYX0P1i6TalU1CaE7/Qb1cm40uuwx++cvCzxlo2RJOPRXe\nfTdMDuvXLzQpNUeuTSMnnACvvRZiqTS5JEmRShT7xLR8MDN/+22ne/fmHeeKK8Kv3SFDMn/N1Knw\n05+GL8JCXB005q23YP/9w0inXGbTLl8ehpouWAAbbpj96y+/PBQKvPHG7F9bynbcER56SGsgSOUp\nt4lpGevWrfnH6NkzuyJ37nD++fDb3xY/GUCYg/HjH4dJbbl44QXYddfckgGECXUPPBCSQqWoX1M6\n2/pYIpWobBJCPjpys20yGjMmlM0+9dTmnztfrroK/vrXMEkvW8nLZebie9+Dgw6CO+7I/RilJnlN\naZFqV1X/G2y3Xfgi/eyzpvddtQouugiGDg1t+aVihx1CHaK//CX71+ZjaOXgweEKpVTKeDdXc5Ok\nSCWpqoTTeyvDAAAN5UlEQVSQzdoIN94Y5hwccED8cWXryivhlluyW+Zy8eLQD7L77s079+67hy/Q\n008v/3kJqWtKi1S7qkoIkFmz0cKF8Ic/wJ//XJiYstWxY6iBdO21mb9m0qQw47h16+af/6abQl2o\nm29u/rGKKd2a0iLVTAkhjdra8IVbygutX3YZ3HUXfPhhZvs3tFxmLtq2hUceCZ3tU6bk55jFkG5N\naZFqpoSQYtasUIX0qqsKF1MuttoqzI245prM9s/3WPsuXWDYsDBpbtGi/B23kNRcJLK2spmHkK84\nv/wSNtsstKmvv/66zx98cFin4Nxz83K6WC1aFMbQv/566DBvyMcfh2G7ixbBeuvlN4aLLgozqp98\nMv/HjlNdHWy5ZZhs19AyoiLlrmLnIeRLmzaw/fbp
10Z4+unQrnxmg6sylJZ27eCss5qeaDdxYpjl\nHMcX9u9/DytWlN8qY02tKS1SjaouIUD6ZqPVq+GCC+BPf8pPx2uhnH9+WIqzsZIScTaNtGwZZvkO\nGxbWli4Xai4SWVcJjbAvnHQJ4bbbQlmHww4rTky52nTTkBRqa8M6CulMmBBvE9hWW4UZzMcdF5pg\nOnTI7Tgff5zdUNpu3XIvSDdhQhg4ICJrVF0fAoQO1iFDwlBMCKUYdtwx/MItx3o2y5aFTt5//GPd\nEgxz54a5Ax9/HP9ommuvhcceC6uPZXOVtWBB6BwfMSIs7ZmJL76AffbJbeGe1atDc9u77za9jKhI\nOcu2D6EqrxDqJ6e5hy/Jq6+GQw8tz2QAsNFGcMklYcLamDFrP1fIoZUXXwwvvQQXXphZvaUvvggz\nrm+8Efr3h3fegfbtMzvX8uWw887w1FNhIEA2pk4NVzFKBiJri70PwcwOMrN3zGyOmV2S5vkaM/vc\nzKZFt8vjjqldu1AtdO7c0Il8993l1yma6owzwmijV19de3sh28rN4J57woijkSMb3m/VqrAqXNeu\n4TN4443wONNkAKFA3623hvedbRkN9R+IpBdrQjCz9YAbgYOA7sDxZpaubukkd+8V3Qry1Vzfj3Dx\nxaENfsstC3HW+LRpE8pTX3HFmm3FKM2w6aZh0to556y7oM8334SEseOOIa7nnguPO3XK7VwHHgj7\n7huujLKhhCCSXtxXCH2A99x9rrt/DYwEDk+zX8HnivbsGZoqXn89LFVZCU49NayVMHlyePzuu6Et\nv3PnwsbRq1co/fGLX4Rf7+4wdmxoqhs+PLT7jx0bmnyaa+jQ0KH92muZ7Z/LmtIi1SLuPoStgXlJ\nj+cDqeXVHNjLzN4EPgQudPc0swTyq2fP0Ez0wAOhFEMlaN06zLC+/PLQYV4/O7kYpRl++Ut48UU4\n9tjQaf/FF2HOws9+lt942rUL/RCnnRaSe1Ojjl55JZQkyWZNaZFqEXdCyGRo0FSgg7uvMLODgTHA\nOuXGamtrv71fU1NDTU1NswLr2zc0axx3XLMOU3JOPDF88T77bGgaOeKI4sVy003hi/rYY0Nccc1k\nPuEEuO++kBguvbTxffNZ00mk1CQSCRKJRM6vj3XYqZntAdS6+0HR418Dde7eYJ1OM/sA6O3unyVt\ny+uw00o3alSYYPf++zBjBmy9dbEjit/cubDbbqE5qGvXhvfr2zf0sxx4YMFCEymabIedxp0QWgLv\nAj8GPgJeBY5399lJ+2wBfOLubmZ9gFHu3inlOEoIWairC+34q1aFoZzV4q9/hSeeCE1l6Zqlmrum\ntEi5KalaRu6+GjgbeAaYBTzk7rPNbKCZDYx2OwqYaWbTgeuACmvEKbwWLUKH+YUXFjuSwho0CJYu\nDWXB03n++eatKS1S6apyprJUrjffDKvczZix7lDiiy8OyaDUS5uL5EtJXSGIFNouu4QRToMHr/uc\n5h+INE5XCFJxVq4MNZ2GDg0lSSCsf7HttvDpp+VVzVakOXSFIFWvbdtQvfass8L8B4BEIn9rSotU\nKiUEqUj77Rf6Ei67LDxWc5FI05QQpGL96U8wenSYm6AJaSJNUx+CVLRRo+A3vwl9B3GsKS1Sykpq\nYlq+KCFIrtzDKnjrrbfuWhEilU4JQSTF0qVhlnK5lzgXyZYSgoiIABp2KiIiOVJCEBERQAlBREQi\nSggiIgIoIYiISEQJQUREACUEERGJKCGIiAighCAiIpFYE4KZHWRm75jZHDO7pIF9boief9PMesUZ\nj4iINCy2hGBm6wE3AgcB3YHjzaxbyj6HAF3cvStwOnBLXPGUskQiUewQYlXJ76+S3xvo/VWbOK8Q\n+gDvuftcd/8aGAkcnrLPYcA9AO4+BdjUzLaIMaaSVOn/KCv5/VXyewO9v2oTZ0LYGpiX9Hh+tK2p\nfbaJMSYREWlA
nAkh0/KkqZX4VNZURKQIYit/bWZ7ALXuflD0+NdAnbtfm7TPrUDC3UdGj98B+rn7\ngpRjKUmIiOQgm/LXLWOM43Wgq5l1Aj4CjgWOT9lnLHA2MDJKIEtSkwFk94ZERCQ3sSUEd19tZmcD\nzwDrAXe4+2wzGxg9P8zdx5nZIWb2HrAcOCWueEREpHFlsWKaiIjEr+RnKmcyua1cmdlcM5thZtPM\n7NVix9NcZnanmS0ws5lJ2zYzs2fN7J9m9g8z27SYMTZHA++v1szmR5/hNDM7qJgxNoeZdTCziWb2\ntpm9ZWaDou1l/xk28t4q4vMzszZmNsXMppvZLDP7fbQ9q8+upK8Qoslt7wL7Ax8CrwHHu/vsogaW\nJ2b2AdDb3T8rdiz5YGZ9gWXAve6+c7Ttj8Aid/9jlNC/6+6XFjPOXDXw/q4Clrr70KIGlwdmtiWw\npbtPN7ONgDeAnxOacsv6M2zkvR1D5Xx+G7j7CjNrCbwAXEiY65XxZ1fqVwiZTG4rdxXTYe7uzwOL\nUzZ/O/kw+vvzggaVRw28P6iQz9DdP3b36dH9ZcBswlyhsv8MG3lvUDmf34robmtCv+1isvzsSj0h\nZDK5rZw58JyZvW5m/1vsYGKyRdLIsQVAJc5EPyeqxXVHOTanpBONDuwFTKHCPsOk9/ZKtKkiPj8z\na2Fm0wmf0UR3f5ssP7tSTwil256VH3u7ey/gYOCsqEmiYnlon6y0z/QWYDugJ/Bf4C/FDaf5oiaV\n0cBgd1+a/Fy5f4bRe3uE8N6WUUGfn7vXuXtPQrWHfc1sv5Tnm/zsSj0hfAh0SHrcgXCVUBHc/b/R\n34XAY4QmskqzIGq/xcy2Aj4pcjx55e6feAS4nTL/DM2sFSEZ3OfuY6LNFfEZJr23EfXvrdI+PwB3\n/xx4EuhNlp9dqSeEbye3mVlrwuS2sUWOKS/MbAMz2zi6vyFwIDCz8VeVpbHASdH9k4AxjexbdqL/\nyeodQRl/hmZmwB3ALHe/Lumpsv8MG3pvlfL5mVm7+uYuM2sLHABMI8vPrqRHGQGY2cHAdayZ3Pb7\nIoeUF2a2HeGqAMIEwfvL/b2Z2YNAP6Adob3ySuBxYBSwLTAXOMbdlxQrxuZI8/6uAmoIzQ0OfAAM\nTDfbvhyY2T7AZGAGa5oWfg28Spl/hg28t98QqieU/ednZjsTOo1bRLf73P1PZrYZWXx2JZ8QRESk\nMEq9yUhERApECUFERAAlBBERiSghiIgIoIQgIiIRJQQREQGUEKSMmNk3UYnimWY2KpqAk+lrd4nm\ntDS1X28zu755kWbPzA6tL+9uZj83s25Jz/3WzH5c6Jik+mgegpQNM1vq7vWzu0cAb7j7X5Oeb+nu\nqxt47cmEUuPnFCTYZjCzu4En3H10sWOR6qIrBClXzwNdzKyfmT1vZo8Db5nZ+mZ2l4WFh6aaWU1U\nw2YIcGx0hXG0mW1oYcGbKdF+hwFE+z8R3a+N9ploZv8ys7TJxMyWmdnQaOGV58ysXbS9p5m9ElXS\nfDSptMAgCwu1vGlmD0TbTjazv5nZnsChwJ+iuDqb2d1m9otovx9H22dE1TlbR9vnRvG+ET23Y5z/\n8aUyKSFI2YkWADmEUIYAQinjQe7+feBs4Bt370EoS1A/nf8KYKS793L3h4HLgPHuvjvwI8IX8AZp\nTrcDoc5UH+AqC4s2pdoAeM3ddwImEUpaANwLXOTuuxBq5NRvvwToGW3/VbTNAdz9ZUL9mQvdfVd3\nfz96zs2sDXAXofxAD0LJkzOSXr/Q3XsTKnhe2OR/SJEUSghSTtqa2TTCynlzgTsJi5u86u7/jvbZ\nGxgB4O7vAv8mfKnD2guhHAhcGh1vIrA+a1fWhfAl+6S7f+3unxIqRaarJ18HPBTdHwHsY2abAN+J\nFtWBkJj2je7PAB4wsxOBbxp4r6mLthiwI/CBu7+X5pgAj0Z/pwKdGjiuSINaFj
sAkSysjNaP+FYo\nYsnylP0yXQHrSHefk3K8rVL2+Srp/jc0/f+Mkb7mfHJMPyV8kR8KXBYVJkuNOd0xUrelnmtVFnGK\nrENXCFJpngdOBDCzHQhVHt8BlgIbJ+33DDCo/oGZrZVo6jdneM4WwNHR/ROA5939C2BxVGUToD+Q\niMowb+vuCeBS4DvARinHWwpskrLNCeuLdzKz7ZOOOSnDGEWapIQg5aShX83J228GWpjZDMIa3CdF\n63FPBLrXdyoDvwNaRR2wbwG/TXOeTFcHWw70MbOZhHLYQ6LtJxH6Jt4EekTbWwL3RfFNBa6PFjRJ\nPtdI4KKog7jzt0G5ryIseP9w9PrVwK1p/tuU9apmUjwadirSTMnDYUXKma4QRJpPv6qkIugKQURE\nAF0hiIhIRAlBREQAJQQREYkoIYiICKCEICIiESUEEREB4P8BJ2luiNVtD6sAAAAASUVORK5CYII=\n", "text": [ "" ] } ], "prompt_number": 5 }, { "cell_type": "heading", "level": 2, "metadata": { "collapsed": true }, "source": [ "2) Translate DNA to protein" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this question we will translate RNA sequences to amino-acid (aa) sequences.\n", "\n", "`genetic_code` is a `dict` in which keys are RNA codons - triplets of base names (A, G, C, T) and the values are short names of aa (`Leu` for Leucine, etc.)." ] }, { "cell_type": "code", "collapsed": true, "input": [ "genetic_code = { \n", " 'UUU':'Phe', 'UUC':'Phe', 'UCU':'Ser', 'UCC':'Ser',\n", " 'UAU':'Tyr', 'UAC':'Tyr', 'UGU':'Cys', 'UGC':'Cys',\n", " 'UUA':'Leu', 'UCA':'Ser', 'UAA':None, 'UGA':None,\n", " 'UUG':'Leu', 'UCG':'Ser', 'UAG':None, 'UGG':'Trp',\n", " 'CUU':'Leu', 'CUC':'Leu', 'CCU':'Pro', 'CCC':'Pro',\n", " 'CAU':'His', 'CAC':'His', 'CGU':'Arg', 'CGC':'Arg',\n", " 'CUA':'Leu', 'CUG':'Leu', 'CCA':'Pro', 'CCG':'Pro',\n", " 'CAA':'Gln', 'CAG':'Gln', 'CGA':'Arg', 'CGG':'Arg',\n", " 'AUU':'Ile', 'AUC':'Ile', 'ACU':'Thr', 'ACC':'Thr',\n", " 'AAU':'Asn', 'AAC':'Asn', 'AGU':'Ser', 'AGC':'Ser',\n", " 'AUA':'Ile', 'ACA':'Thr', 'AAA':'Lys', 'AGA':'Arg',\n", " 'AUG':'Met', 'ACG':'Thr', 'AAG':'Lys', 'AGG':'Arg',\n", " 'GUU':'Val', 'GUC':'Val', 'GCU':'Ala', 'GCC':'Ala',\n", " 'GAU':'Asp', 'GAC':'Asp', 'GGU':'Gly', 'GGC':'Gly',\n", " 'GUA':'Val', 'GUG':'Val', 'GCA':'Ala', 'GCG':'Ala', \n", " 'GAA':'Glu', 'GAG':'Glu', 'GGA':'Gly', 'GGG':'Gly'}" ], "language": "python", "metadata": {}, 
"outputs": [], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "**a)** Write a function called `protein_translation` that translates an RNA sequence to an aa sequence.\n", "\n", "Input: `seq` is a string of A, G, C and U (all uppercase).\n", "\n", "Output: `list` of strings of short names of aa." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def protein_translation(seq):\n", " \"\"\" This function translates a nucleic acid sequence into a\n", " protein sequence, until the end or until it comes across\n", " a stop codon\n", " \"\"\"\n", " \n", " seq = seq.replace('T','U') # Make sure we have RNA sequence\n", " proteinSeq = []\n", "\n", " i = 0\n", " while i+2 < len(seq):\n", " codon = seq[i:i+3]\n", " aminoAcid = genetic_code[codon]\n", "\n", " if aminoAcid is None: # Found stop codon\n", " break\n", "\n", " proteinSeq.append(aminoAcid)\n", " i += 3\n", "\n", " return proteinSeq\n", "\n", "dna_seq = 'ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTG'\n", "aa_seq = protein_translation(dna_seq)\n", "print(aa_seq)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['Met', 'Val', 'His', 'Leu', 'Thr', 'Pro', 'Glu', 'Glu', 'Lys', 'Ser', 'Ala', 'Val', 'Thr', 'Ala', 'Leu', 'Trp', 'Gly', 'Lys', 'Val']\n" ] } ], "prompt_number": 7 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "3) Maize chloroplast" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**a)** Write a function called `fetch_gb_by_id` that receives a GenBank ID (such as `KF241981`) and returns a Biopython `SeqRecord` object of the corresponding result. Use it to fetch the maize chloroplast genome record. \n", "\n", "Assume the default settings, as shown in lecture 6. Ignore any warning messages from NCBI that might be displayed." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "from Bio import Entrez\n", "from Bio import SeqIO" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "def fetch_gb_by_id(rec_id):\n", "    handle = Entrez.efetch(db=\"nucleotide\", rettype=\"gb\", retmode=\"text\", id=rec_id)\n", "    gb_record = SeqIO.read(handle, \"gb\")\n", "    handle.close()\n", "    return gb_record\n", "\n", "maize_chl = fetch_gb_by_id('KF241981')\n", "print(maize_chl.description)\n", "assert len(maize_chl.features) == 263" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Zea mays subsp. mays cultivar B73 chloroplast, complete genome.\n" ] } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "__b)__ Write a function called `extract_rRNA` that receives a `SeqRecord` object and extracts its **rRNA** features. \n", "The function should return a `list` of `SeqRecord` objects of the corresponding sequences. \n", "\n", "Change the `description` field of each returned record to the corresponding gene name, followed by ` | ` and the sequence length. For example: `rrn22 | 1231`."
] }, { "cell_type": "code", "collapsed": false, "input": [ "def extract_rRNA(gb_record):\n", "    rRNAs = []\n", "    for feat in gb_record.features:\n", "        if feat.type == 'rRNA':\n", "            name = feat.qualifiers['gene'][0]\n", "            location = feat.location\n", "            start = location.start\n", "            end = location.end\n", "            # Slicing the parent record returns a new SeqRecord for this region\n", "            rRNA = gb_record[start:end]\n", "            rRNA.description = name + ' | ' + str(len(rRNA))\n", "            rRNAs.append(rRNA)\n", "    return rRNAs\n", "\n", "maize_chl_rRNAs = extract_rRNA(maize_chl)\n", "print([x.description for x in maize_chl_rRNAs])" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['rrn16 | 1492', 'rrn23 | 2888', 'rrn4.5 | 95', 'rrn5 | 121', 'rrn5 | 121', 'rrn4.5 | 95', 'rrn23 | 2888', 'rrn16 | 1492']\n" ] } ], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "__c)__ Write the rRNA sequences to the output file `maize_chloroplast_rRNAs.fasta` in __fasta__ format." ] }, { "cell_type": "code", "collapsed": false, "input": [ "out_file = \"maize_chloroplast_rRNAs.fasta\"\n", "SeqIO.write(maize_chl_rRNAs,out_file,'fasta')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ "8" ] } ], "prompt_number": 12 } ], "metadata": {} } ] }