{
 "metadata": {
  "name": "",
  "signature": "sha256:463e6fc1b91bae8bbeaf3fb943ed0adad3a6fca5b0eaec17f37fb02cae7f4c07"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "[![Py4Life](https://raw.githubusercontent.com/Py4Life/TAU2015/gh-pages/img/Py4Life-logo-small.png)](http://py4life.github.io/TAU2015/)\n",
      "## Lecture 2 - 18.3.2015\n",
      "### Last update: 17.3.2015\n",
      "### Tel-Aviv University / 0411-3122 / Spring 2015\n",
      "\n",
      "This notebook is still a draft."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from IPython.display import YouTubeVideo, HTML, Image"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Previously on Py4Life\n",
      "\n",
      "- Python\n",
      "- The IPython notebook\n",
      "- Variables (`int`, `float`, `bool`)\n",
      "- Operators (`+`, `-`, `*`, ..., `==`, `<`, ..., `and`, `or`, ...)\n",
      "- Conditional statements (`if`, `elif`, `else`)\n",
      "- While loops"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## In today's episode\n",
      "\n",
      "- Strings\n",
      "- Lists\n",
      "- Loops (`for`)\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Strings\n",
      "\n",
      "Strings are ordered collections of _characters_. \n",
      "\n",
      "### Ordered\n",
      "_Ordered_ means that the elements are numbered with _indexes_: 0, 1, 2, 3, 4...  \n",
      "Note that the first index is 0, __not__ 1!"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "YouTubeVideo('kQC82okzTXI')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Characters\n",
      "\n",
      "Characters are textual symbols, like letters (`ABCDE...`), numerals (`12345`), punctuation marks (`,.?:&`), and even things like newline (`\\n`) and whitespace (` `)."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "![keyboard](http://cdn1.tnwcdn.com/wp-content/blogs.dir/1/files/2014/01/type-786x305.jpg)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Back to strings\n",
      "\n",
      "Most commonly, strings are used to work with _text_. \n",
      "\n",
      "We can _assign_ and _print_ strings:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x = \"Py4Life\"\n",
      "y = 'I love python'\n",
      "print(x)\n",
      "print(y)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Strings are objects of type `str`:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "type(x)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can concatenate (join end to end) strings:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print(x + \"2015\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can convert strings to numbers and vice versa (when the string holds a valid number):"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x = \"4\"\n",
      "y = int(x)\n",
      "print(\"y+1 =\", y + 1)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Without the conversion, adding a number to a string gives an error message:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print(\"x+1 =\", x + 1)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x = str(y)\n",
      "print(\"x =\", x)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x = \"3.14\"\n",
      "y = float(x)\n",
      "print(\"y*2 =\", y * 2)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
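    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Conversion with `str` is also what lets us build a message by concatenation, since `+` cannot join a string and a number directly. A minimal sketch, with a made-up value `n`:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "n = 7 # made-up value for illustration\n",
      "message = 'The sequence has ' + str(n) + ' nucleotides'\n",
      "print(message)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },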
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Why do we care about text in programming?\n",
      "- Sequences\n",
      "- Data in formatted text files (lesson 4)\n",
      "- Free text"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Because we are biologists, strings are not just text, they are also __sequences__!\n",
      "\n",
      "![sequences](http://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/WPP_domain_alignment.PNG/650px-WPP_domain_alignment.PNG)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "dna = \"ATGCGTA\"\n",
      "print(dna)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Again, we can concatenate strings:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "upstream = \"AAA\"\n",
      "downstream = \"GGG\"\n",
      "dna = upstream + \"ATG\" + downstream\n",
      "print(dna)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can find the length of a string using the function `len`:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "n = len(dna)\n",
      "print(\"The length of the DNA variable is\", n)\n",
      "\n",
      "dna = dna + \"AGCTGA\"\n",
      "print(\"Now it is\", len(dna))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Just a moment, what was that...?"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print(dna)\n",
      "dna = dna + \"AGCTGA\"\n",
      "print(dna)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We reassigned `dna` using its own previous value. This also works with numbers:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x = 10\n",
      "x = x + 7\n",
      "print(x)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### String slicing\n",
      "We can extract subsets of a string by using _slicing_, with the corresponding indexes.  \n",
      "Remember: string indexes start from __0__!"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can access specific indexes of the string (_starting from 0_):"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "bacteria = 'Escherichia coli'"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# get the 1st and 6th letters\n",
      "print(bacteria[0])\n",
      "print(bacteria[5])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Indexes work from the tail as well, using negative numbers:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# get the last letter\n",
      "print(bacteria[-1])\n",
      "# get 5th letter from the end\n",
      "print(bacteria[-5])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can get a range of indexes using _\\[start:end\\]_"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# get the 3rd to 8th letters\n",
      "print(bacteria[2:8])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Notice that the _start_ position is included, but the _end_ position is not: we actually take the characters with indexes 2, 3, 4, 5, 6 and 7.  \n",
      "And what type of object do we get?"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "type(bacteria[2:8])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "There are shortcuts for taking the first and last characters:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# get the first 5 letters\n",
      "print(bacteria[0:5])\n",
      "# or simply:\n",
      "print(bacteria[:5])\n",
      "\n",
      "# get everything from the 4th letter to the end:\n",
      "print(bacteria[3:])\n",
      "\n",
      "# get the last 3 letters:\n",
      "print(bacteria[-3:])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
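    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "One way to see why the _end_ position is excluded (a small sketch, not part of the original exercises; the cut position `k` is an arbitrary choice): the slice `[start:end]` always has `end - start` characters, and cutting a string at any position `k` gives two parts that join back into the original."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "bacteria = 'Escherichia coli'\n",
      "k = 11 # arbitrary cut position, chosen to fall right after the genus name\n",
      "genus = bacteria[:k]\n",
      "rest = bacteria[k:]\n",
      "print(genus)\n",
      "print(len(bacteria[2:8])) # 8 - 2 = 6 characters\n",
      "print(genus + rest == bacteria) # True for any k"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },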
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## <span style=\"color:blue\">Class exercise 2A</span>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The sequence below (named _seq_) consists of 20 nucleotides.  \n",
      "1) Print the 2nd and 7th nucleotides.  \n",
      "2) Print the 2nd nucleotide from the end.  \n",
      "3) Slice the first half of the sequence.  \n",
      "4) Slice the second half of the sequence.  \n",
      "5) Slice the middle 10 nucleotides."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "seq = \"CAAGTAATGGCAGCCATTAA\"\n",
      "# print 2nd nucleotide\n",
      "print(seq[1])\n",
      "# print 7th nucleotide\n",
      "print(seq[6])\n",
      "# print 2nd nucleotide from the tail\n",
      "print(seq[-2])\n",
      "\n",
      "first_half = seq[:10]\n",
      "print(first_half)\n",
      "second_half = seq[10:]\n",
      "print(second_half)\n",
      "middle = seq[5:15]\n",
      "print(middle)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### String methods\n",
      "\n",
      "Strings have _methods_ (actions, commands) that we can apply to them. Methods are invoked using the '`.`' character.\n",
      "\n",
      "We can change a string to lowercase:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "dna = dna.lower()\n",
      "print(dna)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "And back to uppercase:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "dna = dna.upper()\n",
      "print(dna)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can replace characters:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "rna = dna.replace(\"T\", \"U\")\n",
      "print(rna)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#### Count\n",
      "We can count characters. \n",
      "\n",
      "For example, let's count the number of histidine (`H`) and proline (`P`) residues in the [AA](http://upload.wikimedia.org/wikipedia/commons/a/a9/Amino_Acids.svg) (amino-acid) sequence of [Human Insulin](http://www.uniprot.org/blast/?about=P01308):"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "insulin = 'MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN'\n",
      "print(\"# of histidine:\", insulin.count('H'))\n",
      "print(\"# of proline:\", insulin.count('P'))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#### Find\n",
      "We can find the position of a substring within a string using the `index` method.  \n",
      "For example, we can look for the character `D` in the insulin sequence."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "pos = insulin.index('D')\n",
      "print(pos)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "type(pos)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print(insulin[pos])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The result is the index (position) of the first `D` found in the sequence."
     ]
    },
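    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Strings also have a `find` method, which gives the same result as `index` when the substring is present. The two differ when the substring is missing: `find` returns `-1`, while `index` stops with a `ValueError`. A short sketch, using only the first residues of the insulin sequence to keep the example self-contained:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "peptide = 'MALWMRLLPLL' # the first 11 residues of the insulin sequence\n",
      "print(peptide.find('W')) # same result as peptide.index('W')\n",
      "print(peptide.find('D')) # -1: there is no D in this short fragment\n",
      "# peptide.index('D') would raise a ValueError here"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },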
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can also look for longer substrings, representing motifs. For example, let's find the position of the Insulin [B-chain](http://www.uniprot.org/blast/?about=P01308[25-54]) in the entire peptide:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "b_chain = \"FVNQHLCGSHLVEALYLVCGERGFFYTPKT\"\n",
      "position = insulin.index(b_chain)\n",
      "print(\"Position:\", position)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print(len(b_chain))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "found = insulin[position:position + len(b_chain)] # slicing (notice the ':')\n",
      "print(b_chain == found)\n",
      "print(\"Original:\", b_chain)\n",
      "print(\"Found:   \", found)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#### Split\n",
      "\n",
      "We can split a string on every occurrence of a separator character:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "names = \"melanogaster,simulans,yakuba,ananassae\"\n",
      "species = names.split(\",\")\n",
      "print(species)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "What do we get?"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "type(species)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
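    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A detail worth knowing, sketched on a made-up codon string: called without a separator, `split` breaks the string on any run of whitespace, whereas an explicit `' '` separator splits on every single space and can leave empty strings in the result."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "codons = 'ATG CGT  TAA' # made-up example; note the double space\n",
      "print(codons.split())    # ['ATG', 'CGT', 'TAA']\n",
      "print(codons.split(' ')) # ['ATG', 'CGT', '', 'TAA']"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },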
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Lists\n",
      "\n",
      "Lists are similar to strings in being sequential, only they can contain any type of data, not just characters.\n",
      "\n",
      "This includes `int`, `float`, `bool`, `str`, and even `list`.  \n",
      "A single list can even mix elements of different types.\n",
      "\n",
      "We define a list just like any other variable, using `[ ]` around the list and `,` to separate its _elements_."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# a list of strings\n",
      "apes = [\"Homo sapiens\", \"Pan troglodytes\", \"Pongo pygmaeus\"]\n",
      "print(apes)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "![Gorilla](http://upload.wikimedia.org/wikipedia/commons/thumb/c/c0/Western_Lowland_Gorilla_at_Bronx_Zoo_2_cropped.jpg/338px-Western_Lowland_Gorilla_at_Bronx_Zoo_2_cropped.jpg)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# a list of numbers\n",
      "nums = [7,13,2,400]\n",
      "print(nums)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# a mixed list\n",
      "mixed = [12,'Mus musculus',True]\n",
      "print(mixed)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "You can access list elements just like strings, using indexes (starting from 0):"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print(\"Human:\", apes[0])\n",
      "print(\"Orangutan:\", apes[-1]) # the last element is Pongo pygmaeus"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Lists are dynamic: you can change, append, remove and insert elements. Appending, removing and inserting are done using _list methods_, again invoked with the '.'."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "First, we can access and change list elements by index:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "new_apes = apes[:] # make a copy of the apes list\n",
      "new_apes[2] = 'Hylobates lar'\n",
      "print(new_apes)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This __does NOT__ work with strings though..."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print(dna)\n",
      "dna[5] = 'G'"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# add element to the end of the list\n",
      "apes.append(\"Gorilla gorilla\")\n",
      "print(apes)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# insert element at a given index\n",
      "apes.insert(2, \"Pan paniscus\")\n",
      "print(apes)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# remove element from list\n",
      "apes.remove(\"Pongo pygmaeus\")\n",
      "print(apes)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To remove a list item by index:"
     ]
    },
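    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "demo = apes[:] # work on a copy, so that apes stays intact for the cells below\n",
      "# option 1: remove the first element equal to demo[1]\n",
      "demo.remove(demo[1])\n",
      "# option 2: delete the element at index 1\n",
      "del demo[1]\n",
      "print(demo)"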
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can concatenate lists, just like strings:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print(apes + [\"Pongo pygmaeus\", \"Pongo abelii\"])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "![Orangutan](http://upload.wikimedia.org/wikipedia/commons/thumb/b/be/Orang_Utan%2C_Semenggok_Forest_Reserve%2C_Sarawak%2C_Borneo%2C_Malaysia.JPG/220px-Orang_Utan%2C_Semenggok_Forest_Reserve%2C_Sarawak%2C_Borneo%2C_Malaysia.JPG)\n",
      "\n",
      "Searching in lists is done using `index` (lists have no `find` method):"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "i = apes.index('Pan troglodytes')\n",
      "print(i)\n",
      "print(apes[i])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "You can also check if something is in a list (works as well for strings):"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "if 'Saguinus nigricollis' in apes:\n",
      "    print('Saguinus nigricollis is an ape')\n",
      "else:\n",
      "    print('Saguinus nigricollis is not an ape')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Lists of numbers\n",
      "\n",
      "Suppose we have a list of experimental measurements and we want to do basic statistics: count the number of results, calculate the average, and find the maximum and minimum."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "measurements = [33, 55,45,87,88,95,34,76,87,56,45,98,87,89,45,67,45,67,76,73,33,87,12,100,77,89,92]\n",
      "\n",
      "count = len(measurements)\n",
      "avg = sum(measurements) / len(measurements)\n",
      "maximum = max(measurements)\n",
      "minimum = min(measurements)\n",
      "\n",
      "print(count, \"measurements with average\", avg, \"maximum\", maximum, \"minimum\", minimum)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Sorting lists\n",
      "  \n",
      "We can sort lists using the `sorted` function, which returns a new sorted list and leaves the original unchanged.  \n",
      "If the list is made __entirely__ of numbers, then sorting is straightforward:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "sorted_measurements = sorted(measurements)\n",
      "print(sorted_measurements)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A list of strings will be sorted lexicographically (think about the way '<' and '>' work on strings):"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "sorted_apes = sorted(apes)\n",
      "print(sorted_apes)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
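    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "But beware of mixed lists! In Python 3, numbers and strings cannot be compared with each other, so sorting a list that mixes them raises a `TypeError`:"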
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "mixed = apes + measurements\n",
      "print(mixed)\n",
      "print(sorted(mixed))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### List of lists (nested lists)\n",
      "  \n",
      "List elements can be of any type, including lists!  \n",
      "For example:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "birds = ['Gallus gallus', 'Corvus corone', 'Passer domesticus']\n",
      "snakes = ['Ophiophagus hannah', 'Vipera palaestinae', 'Python bivittatus']\n",
      "animals = [apes,birds,snakes]\n",
      "print(animals)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We access lists of lists using double-indexes. For example, to get the 3rd snake:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print(animals[2][2])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Note that the elements of the outer list are __lists__ themselves, not strings. For example:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "type(animals[1])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### List slicing\n",
      "  \n",
      "We can slice lists just like we did with strings, to get partial lists.  \n",
      "For example:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# get the first 10 measurements\n",
      "print(measurements[:10])\n",
      "# get the last 3 measurements\n",
      "print(measurements[-3:])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## <span style=\"color:blue\">Class exercise 2B</span>\n",
      "Use the lists `birds` and `snakes` defined above to create a single list of strings with the animal names. Then add the string `Mus musculus` to the list. Finally, remove the `Corvus corone` from the list. Print the 2nd to 5th elements of the resulting list, sorted alphabetically."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# create list\n",
      "animals = birds + snakes\n",
      "# add Mus musculus\n",
      "animals.append('Mus musculus')\n",
      "# remove Corvus corone element\n",
      "animals.remove('Corvus corone')\n",
      "# print\n",
      "print(sorted(animals[1:5]))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Loops\n",
      "\n",
      "Say we want to print each element of our list:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print(apes[0], \"is an ape\")\n",
      "print(apes[1], \"is an ape\")\n",
      "print(apes[2], \"is an ape\")\n",
      "print(apes[3], \"is an ape\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "But this is very repetitive and relies on us knowing the number of elements in the list. What we need is a way to say something along the lines of \u201cfor each element in the list of apes, print out the element, followed by the words \u2018 is an ape\u2019\u201d. Python\u2019s loop syntax allows us to express those instructions like this:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "for ape in apes:\n",
      "    print(ape, \"is an ape\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "![Python loop](http://2.bp.blogspot.com/-7lXe1_Gou3k/UX92PWche3I/AAAAAAAAAFA/JxD4u8St-9g/s1600/python+loop.jpg)\n",
      "\n",
      "A more complex loop will go over each ape name and print some stats:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "for ape in apes:\n",
      "    name_length = len(ape)\n",
      "    first_letter = ape[0]\n",
      "    print(ape, \"is an ape. Its name starts with\", first_letter)\n",
      "    print(\"Its name has\", name_length, \"letters\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can also loop over a string. \n",
      "\n",
      "Let's go over the Insulin AA sequence and count the number of prolines manually:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "count = 0\n",
      "for aa in insulin:\n",
      "    if aa == \"P\":\n",
      "        count = count + 1\n",
      "print(\"# of prolines:\", count)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Can you remember another way of doing this?"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's count how many measurements (see above) are above the average:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print(measurements)\n",
      "print(avg)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "over = 0\n",
      "for x in measurements:\n",
      "    if x > avg:\n",
      "        over = over + 1\n",
      "print(over, \"measurements are over the average.\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## <span style=\"color:blue\">Class exercise 2C</span>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "1) Complete the code below to calculate the _ratio_ (fraction) of electrically-charged amino acids in the Insulin sequence."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "charged = ['R','H','K','D','E']\n",
      "\n",
      "charged_count = 0\n",
      "for aa in insulin:\n",
      "    if aa in charged:\n",
      "        charged_count += 1\n",
      "\n",
      "insulin_length = len(insulin)\n",
      "charged_ratio = charged_count/insulin_length\n",
      "print(\"Ratio of charged amino acids is:\",charged_ratio)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Using `range`\n",
      "\n",
      "Sometimes we want to loop over consecutive numbers.\n",
      "\n",
      "This is accomplished using the `range` function.\n",
      "\n",
      "`range` accepts one, two, or three arguments: the lower limit, the upper limit, and the step size.  \n",
      "The lower limit can be omitted (the default is 0), and so can the step (the default is 1).  \n",
      "The upper limit is __not__ included."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "for i in range(10): # aka range(0,10,1)\n",
      "    print(i)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "for i in range(10,20):\n",
      "    print(i, end=' ') # prints ending with space instead of newline"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "for i in range(100,1000,10):\n",
      "    print(i, end=' ')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
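    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To see the numbers a `range` produces without writing a loop, we can convert it to a list. A short sketch with arbitrary limits; the last line uses a negative step to count down:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print(list(range(5)))         # [0, 1, 2, 3, 4]\n",
      "print(list(range(2, 10, 3)))  # [2, 5, 8]\n",
      "print(list(range(10, 0, -2))) # [10, 8, 6, 4, 2]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },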
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's check if the number `n` is a prime number - that is, a number greater than 1 that is divisible only by 1 and itself:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "n = 97 # try other numbers\n",
      "divider = 1\n",
      "\n",
      "for k in range(2,n): # why start at 2? can we choose a different limit to range? a different step perhaps?\n",
      "    if n % k == 0:\n",
      "        divider = k\n",
      "if divider != 1:\n",
      "    print(n, \"is divided by\", divider)\n",
      "else:\n",
      "    print(n, \"is a prime number\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
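    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "One possible answer to the questions in the comment above, offered as a sketch rather than the definitive solution: any divisor larger than the square root of `n` is paired with one smaller than it, so checking up to `int(n ** 0.5)` is enough. The `break` statement (an addition that the cell above does not use) stops the loop at the first divisor found. The name `limit` is introduced here only for illustration."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "n = 97 # try other numbers\n",
      "divisor = 1\n",
      "limit = int(n ** 0.5) # largest candidate worth checking\n",
      "\n",
      "for k in range(2, limit + 1): # + 1 because the upper limit is not included\n",
      "    if n % k == 0:\n",
      "        divisor = k\n",
      "        break # one divisor is enough to show that n is not prime\n",
      "\n",
      "if n < 2:\n",
      "    print(n, 'is not a prime number')\n",
      "elif divisor != 1:\n",
      "    print(n, 'is divisible by', divisor)\n",
      "else:\n",
      "    print(n, 'is a prime number')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },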
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can also use `range()` to loop on a list. This is useful in some cases."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "for i in range(len(apes)):\n",
      "    print(apes[i])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## <span style=\"color:blue\">Class exercise 2D</span>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### 1) Restriction fragment lengths\n",
      "\n",
      "Here\u2019s a short DNA sequence:\n",
      "\n",
      "```ACTGATCGATTACGTATAGTAGAATTCTATCATACATATATATCGATGCGTTCAT```\n",
      "\n",
      "The sequence contains a recognition site for the EcoRI restriction enzyme, which cuts at the motif `G*AATTC` (the position of the cut is indicated by an asterisk). Write a program which will calculate the size of the two fragments that will be produced when the DNA sequence is digested with EcoRI.\n",
      "\n",
      "(from [Python for Biologists](http://pythonforbiologists.com/index.php/introduction-to-python-for-biologists/2-printing-and-manipulating-text/))"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "fragments = seq.split('GAATTC')\n",
      "f1_length = len(fragments[0]) + 1 # add 1 for the 'G'\n",
      "f2_length = len(fragments[1]) + 5 # add 5 for the 'AATTC'\n",
      "print('Fragment lengths of',f1_length,'and',f2_length,'will be produced.')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### 2) Complementing DNA\n",
      "\n",
      "Write a program that will print the complement of the sequence above.\n",
      "\n",
      "(from [Python for Biologists](http://pythonforbiologists.com/index.php/introduction-to-python-for-biologists/2-printing-and-manipulating-text/))"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "complement = ''\n",
      "for base in seq:\n",
      "    if base == 'A':\n",
      "        complement = complement + 'T'\n",
      "    elif base == 'T':\n",
      "        complement = complement + 'A' \n",
      "    elif base == 'G':\n",
      "        complement = complement + 'C'\n",
      "    elif base == 'C':\n",
      "        complement = complement + 'G'    \n",
      "    else:\n",
      "        print(\"Bad base:\", base)\n",
      "print(\"Complement:\", complement)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### 3) Loop over ape pictures\n",
      "Go over the `ape_pics` list and display the pics using the command `display(Image(url=<url string>))`. \n",
      "Before each pic print the name of that ape from the `apes` list."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "ape_pics = ['http://upload.wikimedia.org/wikipedia/commons/thumb/6/68/Akha_cropped_hires.JPG/330px-Akha_cropped_hires.JPG', 'http://upload.wikimedia.org/wikipedia/commons/thumb/6/62/Schimpanse_Zoo_Leipzig.jpg/330px-Schimpanse_Zoo_Leipzig.jpg', 'http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Bonobo_0155.jpg/330px-Bonobo_0155.jpg', 'http://upload.wikimedia.org/wikipedia/commons/thumb/c/c0/Western_Lowland_Gorilla_at_Bronx_Zoo_2_cropped.jpg/338px-Western_Lowland_Gorilla_at_Bronx_Zoo_2_cropped.jpg']"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from IPython.display import YouTubeVideo, HTML, Image, display\n",
      "for i in range(len(apes)):\n",
      "    print(apes[i])\n",
      "    display(Image(url=ape_pics[i]))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Extra resources\n",
      "\n",
      "- Python for Biologists: [Strings](http://pythonforbiologists.com/index.php/introduction-to-python-for-biologists/2-printing-and-manipulating-text/), [Lists and loops](http://pythonforbiologists.com/index.php/introduction-to-python-for-biologists/lists-and-loops/)\n",
      "- Software carpentry: [Strings](http://software-carpentry.org/v4/python/strings.html), [Lists](http://software-carpentry.org/v4/python/lists.html)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Fin\n",
      "This notebook is part of the _Python Programming for Life Sciences Graduate Students_ course given in Tel-Aviv University, Spring 2015.\n",
      "\n",
      "Part of this notebook was adapted from the [Lists and Loops](http://pythonforbiologists.com/index.php/introduction-to-python-for-biologists/lists-and-loops/) chapter in Martin Jones's _Python for Biologists_ book.\n",
      "\n",
      "The notebook was written using [Python](http://pytho.org/) 3.4.1 and [IPython](http://ipython.org/) 2.1.0 (download from [PyZo](http://www.pyzo.org/downloads.html)).\n",
      "\n",
      "The code is available at https://github.com//Py4Life/TAU2015/blob/master/lecture2.ipynb.\n",
      "\n",
      "The notebook can be viewed online at http://nbviewer.ipython.org/github//Py4Life/TAU2015/blob/master/lecture2.ipynb.\n",
      "\n",
      "The notebook is also available as a PDF at https://github.com/Py4Life/TAU2015/blob/master/lecture2.pdf?raw=true.\n",
      "\n",
      "This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.\n",
      "\n",
      "![Python logo](https://www.python.org/static/community_logos/python-logo.png)"
     ]
    }
   ],
   "metadata": {}
  }
 ]
}