{ "metadata": { "name": "", "signature": "sha256:c4ef5942519b0625d297c7d65f712b646926b88f0b6635d9bf3a00841197e06b" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#Software Overview \n", "\n", "This repository contains instructions for reproduction and extension of [Multi-tiered genomic analysis of head and neck cancer ties TP53 mutation to 3p loss]() by Gross et al. In general, code for data processing and computation is enclosed in standard Python modules, while high-level analysis was recorded in IPython Notebooks. The analysis for this project was relatively non-linear and has thus been split into a number of notebooks as described in [Analysis Notebooks](./Analysis_Notebooks#guide-to-running), but the results should be reproducible by running these notebooks. \n", "\n", "__As of July 1, 2014, all error bars are off due to a [Pandas bug](https://github.com/pydata/pandas/issues/7643). They now show the difference between the mean and the lower bound as the uncertainty for both the upper and lower bounds, rather than the true 95% confidence interval. Hopefully this will be addressed soon.__\n", "\n", "##Dependencies \n", "\n", "This code uses a number of features in the scientific Python stack as well as a small set of standard R libraries. Thus far, this code has only been tested in a Linux environment; it may take some modification to run on other operating systems.\n", "\n", "I highly recommend installing a scientific Python distribution such as [Anaconda](http://continuum.io/) or [Enthought](https://www.enthought.com/) to handle the majority of the Python dependencies in this project (other than rPy2 and matplotlib_venn). 
These are both free for academic use.\n", "\n", "###Python Dependencies \n", "* [NumPy and SciPy](http://www.scipy.org/), numeric calculations and statistics in Python \n", "* [matplotlib](http://matplotlib.org/), plotting in Python\n", "* [Pandas](http://pandas.pydata.org/), data frames for Python; handles the majority of data structures \n", "* [statsmodels](http://statsmodels.sourceforge.net/), used for statistics \n", "* [scikit-learn](http://scikit-learn.org/stable/), used for supervised learning\n", "* [rPy2](http://rpy.sourceforge.net/rpy2.html), communication between R and Python \n", " * __NOT IN DISTRIBUTIONS__ \n", " * I recommend installing with `pip install rpy2` \n", " * Needs R to be compiled with shared libraries \n", "* [matplotlib_venn](https://pypi.python.org/pypi/matplotlib-venn) \n", " * __NOT IN DISTRIBUTIONS__ \n", " * I recommend installing with `pip install matplotlib_venn` \n", " * Only used for Venn diagrams, not essential\n", " \n", " \n", "###R Dependencies\n", "* R needs to be compiled with shared libraries to communicate with Python (_this can be tricky_)\n", "* Packages\n", " * base\n", " * survival\n", " * MASS\n", " \n", "###Command Line Dependencies \n", "* curl (http://curl.haxx.se/) for fetching URLs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "