{
 "metadata": {
  "name": "",
  "signature": "sha256:b343f9ef4ed576d5ac685ae4237bd33849e1d7adc6eb6a9bdf4fbe9afeaec436"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## [Python Programming for Biologists, Tel-Aviv University / 0411-3122 / Spring 2015](http://py4life.github.io/TAU2015/)\n",
      "# Homework 2"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## 1) Bacteria growth\n",
      "The bacteria _P. pythonicus_ replicates every one hour, in a 100 ml tube. Being a very unfriendly bacteria, they reach stationary phase when there are 1,000,000 or more bacteria in the tube.  \n",
      "  \n",
      "**a)** Write a program that will calculate the number of bacteria after one hour, two hours, etc, until stationarity is reached. The program will receive the starter size (number of bacteria to begin with), and start calculating from there. At each time point, the following message should be printed:  \n",
      "< time > hours: < no. of bacteria > bacteria"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "starter = 5 # Replace ??? with a value of your choice.\n",
      "bacteria = starter\n",
      "time = 0\n",
      "\n",
      "while bacteria < 1000000:\n",
      "    print(time,'hours:',bacteria,'bacteria')\n",
      "    time = time + 1\n",
      "    bacteria = bacteria * 2   "
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0 hours: 5 bacteria\n",
        "1 hours: 10 bacteria\n",
        "2 hours: 20 bacteria\n",
        "3 hours: 40 bacteria\n",
        "4 hours: 80 bacteria\n",
        "5 hours: 160 bacteria\n",
        "6 hours: 320 bacteria\n",
        "7 hours: 640 bacteria\n",
        "8 hours: 1280 bacteria\n",
        "9 hours: 2560 bacteria\n",
        "10 hours: 5120 bacteria\n",
        "11 hours: 10240 bacteria\n",
        "12 hours: 20480 bacteria\n",
        "13 hours: 40960 bacteria\n",
        "14 hours: 81920 bacteria\n",
        "15 hours: 163840 bacteria\n",
        "16 hours: 327680 bacteria\n",
        "17 hours: 655360 bacteria\n"
       ]
      }
     ],
     "prompt_number": 1
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**b)** It turns out that the growth rate of _P. pythonicus_ is affected by temperature. It's replication time _r_, is a function of the temperature T, so that:  \n",
      "$r = \\frac{19 T (T - 70)}{2450} + 10$.  \n",
      "However, when the temperature is below 5, or over 50, the bacteria don't grow at all.  \n",
      "Write a program that will receive the starter size __and__ the growth temperature, and will calculate the time to reach stationarity, printing the number of bacteria at each time point (like in part a). If bacteria can't grow, print an appropriate message (and don't do any calculation)."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "starter = 5 # Replace ??? with a value of your choice.\n",
      "temp = 23 # Replace ??? with a value of your choice.\n",
      "\n",
      "# check temperature and calculate replication time\n",
      "if temp < 5 or temp > 50:\n",
      "    print(\"Bacteria can't grow in this temperature\")\n",
      "else:\n",
      "    r = (19 * temp * (temp - 70))/2450 + 10\n",
      "# calculate and growth\n",
      "    bacteria = starter\n",
      "    time = 0\n",
      "    while bacteria < 1000000:\n",
      "        print(time,'hours:',bacteria,'bacteria')\n",
      "        time = time + r\n",
      "        bacteria = bacteria * 2           "
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0 hours: 5 bacteria\n",
        "1.6167346938775502 hours: 10 bacteria\n",
        "3.2334693877551004 hours: 20 bacteria\n",
        "4.850204081632651 hours: 40 bacteria\n",
        "6.466938775510201 hours: 80 bacteria\n",
        "8.083673469387751 hours: 160 bacteria\n",
        "9.700408163265301 hours: 320 bacteria\n",
        "11.317142857142851 hours: 640 bacteria\n",
        "12.933877551020402 hours: 1280 bacteria\n",
        "14.550612244897952 hours: 2560 bacteria\n",
        "16.167346938775502 hours: 5120 bacteria\n",
        "17.784081632653052 hours: 10240 bacteria\n",
        "19.400816326530602 hours: 20480 bacteria\n",
        "21.017551020408153 hours: 40960 bacteria\n",
        "22.634285714285703 hours: 81920 bacteria\n",
        "24.251020408163253 hours: 163840 bacteria\n",
        "25.867755102040803 hours: 327680 bacteria\n",
        "27.484489795918353 hours: 655360 bacteria\n"
       ]
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## 2) Splicing out introns\n",
      "\n",
      "**a)** \n",
      "Here\u2019s a short section of genomic DNA:\n",
      "\n",
      "```\n",
      "ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGATCGATCGATCGATCGATCGATCGATCATGCTATCATCGATCGATATCGATGCATCGACTACTAT\n",
      "```\n",
      "\n",
      "It comprises two exons and an intron. The first exon runs from the start of the sequence to the sixty-third character, and the second exon runs from the ninety-first character to the end of the sequence. \n",
      "\n",
      "Write a program that will print just the coding regions of the DNA sequence."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "seq = 'ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGATCGATCGATCGATCGATCGATCGATCATGCTATCATCGATCGATATCGATGCATCGACTACTAT'\n",
      "\n",
      "## your code goes here\n",
      "exon1 = seq[:63]\n",
      "exon2 = seq[90:]\n",
      "coding = exon1 + exon2\n",
      "print(coding)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGATCGAATCATCGATCGATATCGATGCATCGACTACTAT\n"
       ]
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**b)** \n",
      "Using the sequence from **a**, write a program that will calculate what percentage of the DNA sequence is coding."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "## your code goes here\n",
      "percent_coding = len(coding)/len(seq)*100\n",
      "print(percent_coding,\"%\",\"of the sequence is coding\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "77.23577235772358 % of the sequence is coding\n"
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**c)**\n",
      "Using the data from **a**, write a program that will print out the original genomic DNA sequence with coding bases in uppercase and non-coding bases in lowercase."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "## your code goes here\n",
      "intron = seq[63:90]\n",
      "print(exon1 + intron.lower() + exon2)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGATCGAtcgatcgatcgatcgatcgatcatgctATCATCGATCGATATCGATGCATCGACTACTAT\n"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## 3) Processing DNA in a file\n",
      "\n",
      "The list `sequences` contains a number of DNA sequences as strings. Each sequence starts with the same 14 base pair fragment \u2013 a sequencing adapter that should have been removed. \n",
      "\n",
      "Write a program that will trim this adapter and print the cleaned sequences to the screen. The program will then print the length of each cleaned sequence to the screen."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "sequences = ['ATTCGATTATAAGCTCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATC', \\\n",
      "'ATTCGATTATAAGCACTGATCGATCGATCGATCGATCGATGCTATCGTCGT', \\\n",
      "'ATTCGATTATAAGCATCGATCACGATCTATCGTACGTATGCATATCGATATCGATCGTAGTC', \\\n",
      "'ATTCGATTATAAGCACTATCGATGATCTAGCTACGATCGTAGCTGTA', \\\n",
      "'ATTCGATTATAAGCACTAGCTAGTCTCGATGCATGATCAGCTTAGCTGATGATGCTATGCA']\n",
      "\n",
      "## your code goes here\n",
      "for seq in sequences:\n",
      "    cleaned_seq = seq[14:]\n",
      "    print(cleaned_seq)\n",
      "    print(len(cleaned_seq))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "TCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATC\n",
        "42\n",
        "ACTGATCGATCGATCGATCGATCGATGCTATCGTCGT\n",
        "37\n",
        "ATCGATCACGATCTATCGTACGTATGCATATCGATATCGATCGTAGTC\n",
        "48\n",
        "ACTATCGATGATCTAGCTACGATCGTAGCTGTA\n",
        "33\n",
        "ACTAGCTAGTCTCGATGCATGATCAGCTTAGCTGATGATGCTATGCA\n",
        "47\n"
       ]
      }
     ],
     "prompt_number": 22
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## 4) Multiple exons from genomic DNA\n",
      "\n",
      "The string `genomic_dna` contains a section of genomic DNA.\n",
      "\n",
      "The list `exons` contains start/stop positions of exons. \n",
      "Each exon is a separate list (within the list of exons) with two elements: the start and stop positions. \n",
      "\n",
      "Write a program that will extract the exon segments from `genomic_dna` using the positions in `exons`, concatenate them, and print them to the screen."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "genomic_dna = 'TCGATCGTACCGTCGACGATGCTACGATCGTCGATCGTAGTCGATCATCGATCGATCGACTGATCGATCGATCGATCGATCGATATCGATCGATATCATCGATGCATCGATCATCGATCGATCGATCGATCGATCGATCATATGTCAGTCGATGCATCGTAGCATCGTATAGTAGCTACGTAGCTACGATCGATCGATCGATCGTAGCTAGCTAGCTAGATCGATCATCATCGTAGCTAGCTCGACTAGCTACGTACGATCGATGCATCGATCGTAGCTAGTACGATCGCGTAGCTAGCATGCTACGTAGATCGATCGATGCATGCTAGCTAGCTAGCTACGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGTAGCTAGCTACGATCGATGCTACGTAGATCGATCGCTAGTAGATCGATCGCTAGCTAGCTGACTAGTACGCTGCTAGTAGTCAGCTAGATCGATGCTAGTCA'\n",
      "exons = [[5, 58], [72, 133], [190, 276], [340, 398]] # [[start, stop], [start, stop], ...]\n",
      "\n",
      "## your code goes here\n",
      "coding = \"\"\n",
      "for exon in exons:\n",
      "    start = exon[0]\n",
      "    stop = exon[1]\n",
      "    exon_seq = genomic_dna[start:stop]\n",
      "    coding = coding + exon_seq\n",
      "print(coding)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "CGTACCGTCGACGATGCTACGATCGTCGATCGTAGTCGATCATCGATCGATCGCGATCGATCGATATCGATCGATATCATCGATGCATCGATCATCGATCGATCGATCGATCGACGATCGATCGATCGTAGCTAGCTAGCTAGATCGATCATCATCGTAGCTAGCTCGACTAGCTACGTACGATCGATGCATCGATCGTACGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGTAGCTAGCTACGATCG\n",
        "258\n"
       ]
      }
     ],
     "prompt_number": 9
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "\n",
      "## References\n",
      "\n",
      "Questions modified from [Python for Biologists](http://pythonforbiologists.com/index.php/introduction-to-python-for-biologists/lists-and-loops/), a great book by Martin Jones."
     ]
    }
   ],
   "metadata": {}
  }
 ]
}