{ "metadata": { "name": "", "signature": "sha256:54bca3dd18934c9795cdc1aeead88da84f0583ff1e83c382ac4e6e20f8d8dd8c" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Investigating the code and data used in the [paper by Qi in 2006][qi_evaluation_2006]. See if I can figure out what's going on.\n", "\n", "[qi_evaluation_2006]: http://onlinelibrary.wiley.com/doi/10.1002/prot.20865/full" ] }, { "cell_type": "code", "collapsed": false, "input": [ "ls" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\u001b[0m\u001b[40m\u001b[m\u001b[34;42m0yeast_gene_list\u001b[0m/ \u001b[40m\u001b[m\u001b[34;42m12homology-PPI\u001b[0m/ \u001b[40m\u001b[m\u001b[34;42m2tf-binding\u001b[0m/ \u001b[40m\u001b[m\u001b[34;42m5essentiality\u001b[0m/ \u001b[40m\u001b[m\u001b[34;42m8nature-compare-sequence\u001b[0m/ \u001b[40m\u001b[m\u001b[00mBatch_feature_summary_ExtractWrapper.pl\u001b[0m \u001b[40m\u001b[m\u001b[34;42mtrain-set\u001b[0m/\r\n", "\u001b[40m\u001b[m\u001b[34;42m10mips-phenotype\u001b[0m/ \u001b[40m\u001b[m\u001b[34;42m13domain-interaction\u001b[0m/ \u001b[40m\u001b[m\u001b[34;42m3gene-ontology\u001b[0m/ \u001b[40m\u001b[m\u001b[34;42m6HighExp-PPI\u001b[0m/ \u001b[40m\u001b[m\u001b[34;42m9mips-pclass\u001b[0m/ \u001b[40m\u001b[m\u001b[00mInvestigating qi_evaluation_2006.ipynb\u001b[0m\r\n", "\u001b[40m\u001b[m\u001b[34;42m11sequence-similarity\u001b[0m/ \u001b[40m\u001b[m\u001b[34;42m1gene-expression\u001b[0m/ \u001b[40m\u001b[m\u001b[34;42m4protein-expression\u001b[0m/ \u001b[40m\u001b[m\u001b[34;42m7genetic-interaction\u001b[0m/ \u001b[40m\u001b[m\u001b[00mBatch_feature_ExtractWrapper.pl\u001b[0m \u001b[40m\u001b[m\u001b[00mREADME\u001b[0m\r\n" ] } ], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Should start by looking at the README, probably." ] }, { "cell_type": "code", "collapsed": false, "input": [ "cat README" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "-------------------------------------\r", "\r\n", "\r", "\r\n", "copyright @ Yanjun Qi , qyj@cs.cmu.edu\r", "\r\n", "\r", "\r\n", "-------------------------------------\r", "\r\n", "\r", "\r\n", "Please cite: \r", "\r\n", "\r", "\r\n", "==> Major: \r", "\r\n", "Y. Qi, Z. Bar-Joseph, J. Klein-Seetharaman, \"Evaluation of different biological data and computational classification methods for use in protein interaction prediction\", PROTEINS: Structure, Function, and Bioinformatics. 63(3):490-500. 2006\r", "\r\n", "\r", "\r\n", "==> Two related papers: \r", "\r\n", "Y. Qi, J. Klein-Seetharaman, Z. Bar-Joseph, \"A mixture of feature experts approach for protein-protein interaction prediction\", BMC Bioinformatics 8 (S10):S6, 2007 \r", "\r\n", "Y. Qi, J. Klein-Seetharaman, Z. Bar-Joseph, \ufffdRandom Forest Similarity for Protein-Protein Interaction Prediction from Multiple source\ufffd, Pacific Symposium on Biocomputing 10: (PSB 2005) Jan. 2005. \r", "\r\n", "\r", "\r\n", "\r", "\r\n", "---------------------------------------\r", "\r\n", "\r", "\r\n", "\r", "\r\n", "You can find the detailed description in URL: \r", "\r\n", "\r", "\r\n", "http://www.cs.cmu.edu/~qyj/papers_sulp/proteins05_PPI.html\r", "\r\n", "http://www.cs.cmu.edu/~qyj/papers_sulp/proteins05_pages/feature-download.html\r", "\r\n", "\r", "\r\n", "\r", "\r\n", "---------------------------------------\r", "\r\n", "\r", "\r\n", "\r", "\r\n", "The files we used to extract features for yeast protein pairs \r", "\r\n", "were downloaded from different related papers. \r", "\r\n", "You should be able to find more recent/upated new version \r", "\r\n", "( than those shared in this directory ) from the internet.\r", "\r\n", "\r", "\r\n", "---------------------------------------\r", "\r\n", "\r", "\r\n", "HOW TO USE?\r", "\r\n", "\r", "\r\n", "- You could use codes and ORF_list in ./0yeast_gene_list\r", "\r\n", " to generate your reference protein pairs list \r", "\r\n", " \r", "\r\n", "- You could use the scripts \r", "\r\n", " \"Batch_feature_summary_ExtractWrapper.pl\" and \"Batch_feature_ExtractWrapper.pl\"\r", "\r\n", " to generate features for a file containing a list of yeast pairs. \r", "\r\n", " \r", "\r\n", "- In each line of this file, \r", "\r\n", " ORF1\tORF2\t1/0 \r", "\r\n", " The last item means this is positive protein-protein interaction pair(1) or not (0). \r", "\r\n" ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "So what we want to do is the second thing, generating features for a file containing a list of yeast pairs. So we should have a look at these scripts:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%bash\n", "head -n 40 ./Batch_feature_ExtractWrapper.pl" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "######################################################################3\r\n", "#\r\n", "# copyright @ Yanjun Qi , qyj@cs.cmu.edu\r\n", "# Please cite: \r\n", "# Y. Qi, Z. Bar-Joseph, J. Klein-Seetharaman, \"Evaluation of different biological data and computational classification methods for use in protein interaction prediction\", PROTEINS: Structure, Function, and Bioinformatics. 63(3):490-500. 2006\r\n", "# Y. Qi, J. Klein-Seetharaman, Z. Bar-Joseph, \"A mixture of feature experts approach for protein-protein interaction prediction\", BMC Bioinformatics 8 (S10):S6, 2007 \r\n", "# Y. Qi, J. Klein-Seetharaman, Z. Bar-Joseph, \ufffdRandom Forest Similarity for Protein-Protein Interaction Prediction from Multiple source\ufffd, Pacific Symposium on Biocomputing 10: (PSB 2005) Jan. 2005. \r\n", "# \r\n", "######################################################################3\r\n", "\r\n", "\r\n", "# This program is a yeast PPI feature extraction wrapper \r\n", "# perl command inputPairlist\r\n", "\r\n", "\r\n", "use strict; \r\n", "die \"Usage: command inputPairFile \\n\" if scalar(@ARGV) < 1;\r\n", "my ($inputPair ) = @ARGV;\r\n", "\r\n", "\r\n", "print \"\\n--------------------------- 1gene-expression ----------------------------------------\\n\"; \r\n", "\r\n", "# ------------------- 1gene-expression ------------------------------\r\n", "\r\n", "my $cmdPre = \"perl ./1gene-expression/get_gene_expression.pl \"; \r\n", "my $cmdPro = \"./1gene-expression/YeastGeneListOrfGeneName-106_pval_v9.0.txt ./1gene-expression/all_expression_fixed_s4_csv.txt ./1gene-expression/expressionYanjunSplit.txt 0.6 \"; \r\n", "\r\n", "my $cmd = $cmdPre.\" \".$inputPair.\" \".$cmdPro.\" \".$inputPair.\".genexp\" ; \r\n", "print \"$cmd\\n\"; \r\n", "system($cmd); \r\n", "\r\n", "\r\n", "\r\n", "print \"\\n-------------------------------------------------------------------\\n\"; \r\n", "\r\n", "# ------------------- 2tf-group-binding ------------------------------\r\n", "\r\n", "# perl -d get_tfGroupBinding.pl ./lists/sciencesubset.txt pvalbygene_nature04.txt 204_pvalbygene_nature04_TFs.groupIndex 0.05 ./lists/sciencesubset.tfgroup\r\n", "\r\n", "my $cmdPre = \"perl ./2tf-binding/get_tfGroupBinding.pl \"; \r\n" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, looks like it's a perl script that calls different perl scripts to create feature vectors for each protein interactions. Ok, so if this does that then where is the code for the various types of classification done in the paper?\n", "\n", "Had a look for them and couldn't find them." ] } ], "metadata": {} } ] }