{ "metadata": { "name": "", "signature": "sha256:335876f4c7ce4b35357ada350e35c1201e78e3cb51723e1656f1912ddde7818c" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "In the [previous version of this notebook][prev] attempted to run the CPAN module for InterologWalk.\n", "There were problems installing this and getting it to run locally.\n", "It turned out that it had already been run over a large set of proteins at Edinburgh and that the output file was available, which makes this task much easier.\n", "\n", "Looking at this file and loading it:\n", "\n", "[prev]: http://nbviewer.ipython.org/github/ggray1729/opencast-bio/blob/5a37db51e897f54ca3eaecd1d3d7ab22531adaea/notebooks/Extracting%20InterologWalk%20features.ipynb" ] }, { "cell_type": "code", "collapsed": false, "input": [ "cd ../../InterologWalk/" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "/home/gavin/Documents/MRes/InterologWalk\n" ] } ], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "ls" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\u001b[0m\u001b[40m\u001b[m\u001b[00mIW_entrez.csv\u001b[0m\r\n" ] } ], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "!head IW_entrez.csv" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "5610\t3667\r", "\r\n", "5610\t3667\r", "\r\n", "197\t265\r", "\r\n", "197\t266\r", "\r\n", "197\t10117\r", "\r\n", "57326\t1147\r", "\r\n", "57326\t1147\r", "\r\n", "335\t3778\r", "\r\n", "335\t3778\r", "\r\n", "335\t3778\r", "\r\n" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "import csv" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating dictionary and feature object\n", "\n", "As was done with the [STRING notebook][string] we will create a `ocbio.ppipred.features` object to store the dictionary of interactions.\n", "This can then be pickled and loaded when assembling feature vectors.\n", "\n", "[string]: http://nbviewer.ipython.org/github/ggray1729/opencast-bio/blob/master/notebooks/Extracting%20STRING%20results-based%20features.ipynb" ] }, { "cell_type": "code", "collapsed": false, "input": [ "f = open(\"IW_entrez.csv\")\n", "featuredict = {}\n", "for line in csv.reader(f,delimiter=\"\\t\"):\n", " featuredict[frozenset(line)] = ['1']\n", "f.close()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "import sys" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "sys.path.append(\"../opencast-bio/\")" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "import ocbio.ppipred" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "features = ocbio.ppipred.features(featuredict,1)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Testing\n", "\n", "Testing with arbitrary keys:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "realkey = featuredict.keys()[0]\n", "fakekey = frozenset([\"1234\",\"4321\"])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "features[realkey]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 13, "text": [ "['1']" ] } ], "prompt_number": 13 }, { "cell_type": "code", "collapsed": false, "input": [ "features[fakekey]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 14, "text": [ "['0']" ] } ], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pickling\n", "\n", "Finally, we pickle this instance so that it can be accessed by the assembler to create feature vectors:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pickle" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 15 }, { "cell_type": "code", "collapsed": false, "input": [ "f = open(\"human.interologwalk.features.pickle\",\"wb\")\n", "pickle.dump(features,f)\n", "f.close()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 16 } ], "metadata": {} } ] }