{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "**Source of the materials**: Biopython cookbook (adapted)\n", "Status: Draft" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Appendix: Useful stuff about Python {#sec:appendix}\n", "===================================\n", "\n", "If you haven’t spent a lot of time programming in Python, many questions\n", "and problems that come up in using Biopython are often related to Python\n", "itself. This section tries to present some ideas and code that come up\n", "often (at least for us!) while using the Biopython libraries. If you\n", "have any suggestions for useful pointers that could go here, please\n", "contribute!\n", "\n", "What the heck is a handle? {#sec:appendix-handles}\n", "--------------------------\n", "\n", "Handles are mentioned quite frequently throughout this documentation,\n", "and are also fairly confusing (at least to me!). Basically, you can\n", "think of a handle as being a “wrapper” around text information.\n", "\n", "Handles provide (at least) two benefits over plain text information:\n", "\n", "1. They provide a standard way to deal with information stored in\n", " different ways. The text information can be in a file, or in a\n", " string stored in memory, or the output from a command line program,\n", " or at some remote website, but the handle provides a common way of\n", " dealing with information in all of these formats.\n", "\n", "2. They allow text information to be read incrementally, instead of all\n", " at once. This is really important when you are dealing with huge\n", " text files which would use up all of your memory if you had to load\n", " them all.\n", "\n", "Handles can deal with text information that is being read (e. g. reading\n", "from a file) or written (e. g. writing information to a file). In the\n", "case of a “read” handle, commonly used functions are `read()`, which\n", "reads the entire text information from the handle, and `readline()`,\n", "which reads information one line at a time. For “write” handles, the\n", "function `write()` is regularly used.\n", "\n", "The most common usage for handles is reading information from a file,\n", "which is done using the built-in Python function `open`. Here, we open a\n", "handle to the file [m\\_cold.fasta](data/m_cold.fasta) (also\n", "available online\n", "[here](http://biopython.org/DIST/docs/tutorial/examples/m_cold.fasta)):\n", "\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "\">gi|8332116|gb|BE037100.1|BE037100 MP14H09 MP Mesembryanthemum crystallinum cDNA 5' similar to cold acclimation protein, mRNA sequence\\n\"" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "handle = open(\"data/m_cold.fasta\", \"r\")\n", "handle.readline()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Handles are regularly used in Biopython for passing information to\n", "parsers. For example, since Biopython 1.54 the main functions in\n", "`Bio.SeqIO` and `Bio.AlignIO` have allowed you to use a filename instead\n", "of a handle:\n", "\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "gi|8332116|gb|BE037100.1|BE037100 1111\n" ] } ], "source": [ "from Bio import SeqIO\n", "for record in SeqIO.parse(\"data/m_cold.fasta\", \"fasta\"):\n", " print(record.id, len(record))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "On older versions of Biopython you had to use a handle, e.g.\n", "\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "gi|8332116|gb|BE037100.1|BE037100 1111\n" ] } ], "source": [ "from Bio import SeqIO\n", "handle = open(\"data/m_cold.fasta\", \"r\")\n", "for record in SeqIO.parse(handle, \"fasta\"):\n", " print(record.id, len(record))\n", "handle.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "This pattern is still useful - for example suppose you have a gzip\n", "compressed FASTA file you want to parse:\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", "import gzip\n", "from Bio import SeqIO\n", "handle = gzip.open(\"m_cold.fasta.gz\")\n", "for record in SeqIO.parse(handle, \"fasta\"):\n", " print(record.id, len(record))\n", "handle.close()\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "See Section \\[sec:SeqIO\\_compressed\\] for more examples like this,\n", "including reading bzip2 compressed files.\n", "\n", "### Creating a handle from a string\n", "\n", "One useful thing is to be able to turn information contained in a string\n", "into a handle. The following example shows how to do this using\n", "`cStringIO` from the Python standard library:\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A string\n", " with multiple lines.\n" ] } ], "source": [ "my_info = 'A string\\n with multiple lines.'\n", "print(my_info)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A string\n", "\n" ] } ], "source": [ "from io import StringIO\n", "my_info_handle = StringIO(my_info)\n", "first_line = my_info_handle.readline()\n", "print(first_line)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " with multiple lines.\n" ] } ], "source": [ "second_line = my_info_handle.readline()\n", "print(second_line)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 0 }