{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Survey of other packages" ] },
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because Python is an extendable language, it affords us to use domain specific packages. We have used Numpy for numerical computations, SciPy for special functions, statistics, and other scientific applications, Pandas for handling data sets, Bokeh for low-level plotting, HoloViews for high-level plotting, and Panel for dashboards.\n", "\n", "There are **plenty** of other Python-based packages that can be useful in computing in the biological sciences, and hopefully you will write (and share) some of your own for your applications.\n", "\n", "There are countless useful Python packages for scientific computing. Here, I am highlighting just a few. Actually, I am highlighting only ones I have come across and used in my own work. There are many, many more very high quality packages out there fore various domain specific applications that I am not covering here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data science\n", "
\n", "\n", "#### Dask\n", "\n", "[Dask](https://dask.org) allows for out-of-core computation with large data structures. For example, if your data set is too large to fit in RAM, thereby precluding you from using a Pandas data frame, you can use a Dask data frame, which will handle the out-of-core computing for you, and your data type will look an awful lot like a Pandas data frame. It also handles parallelization of calculations on large data sets.\n", "\n", "\n", "#### xarray\n", "\n", "[xarray](https://xarray.pydata.org/) extends the concepts of Pandas data frames to more dimensions. It is convenient for organizing, accessing, and computing with more complex data structures." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting\n", "
\n", "\n", "We have used Bokeh and HoloViews for plotting. Tthe landscape for Python plotting libraries is large. Here, I discuss a few other packages I have used.\n", "\n", "#### Altair\n", "\n", "[Altair](https://altair-viz.github.io) is a very nice plotting package that generates plots using [Vega-Lite](https://vega.github.io/vega-lite/). It is high level and declarative. The plots are rendered using JavaScript and have some interactivity.\n", "\n", "#### Matplotlib\n", "\n", "[Matplotlib](https://matplotlib.org) is really *the* main plotting library for Python. It is the most fully featured and most widely used. It has some high-level functionality, but is primarily a lower level library for building highly customizable graphics.\n", "\n", "#### Seaborn\n", "\n", "[Seaborn](https://seaborn.pydata.org) is a high-level statistical plotting package build on top of Matplotlib. I find its grammar clean and accessible; you can quickly make beautiful, informative graphics with it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bioinformatics\n", "
\n", "\n", "#### Bioconda\n", "\n", "[Bioconda](https://bioconda.github.io) is not a Python package, but is a channel for the conda package manager that has many (7000+) bioinformatics packages. Most of these packages are not available through the default conda channel. This allows use of conda to keep all of your bioinformatics packages installed and organized.\n", "\n", "#### Biopython\n", "\n", "[Biopython](https://biopython.org) is a widely used package for parsing bioinformatics files of various flavors, managing sequence alignments, etc.\n", "\n", "#### scikit-bio\n", "\n", "[scikit-bio](http://scikit-bio.org) has similar functionality as Biopython, but also includes some algorithms as well, for example for alignment and making phylogenetic trees." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Image processing\n", "
\n", "\n", "#### scikit-image\n", "\n", "We haven't covered image processing in the main portion of the lessons, but it is discussed in recitation 5. The main package used there is [scikit-image](https://scikit-image.org), which has many classic image processing operations included.\n", "\n", "\n", "#### DeepCell\n", "\n", "These days, the state-of-the-art image segmentation tools use deep learning methods. [DeepCell](https://www.deepcell.org) is developed at Caltech in the [Van Valen lab](http://www.vanvalen.caltech.edu), and is an excellent cell segmentation tool." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Machine learning\n", "
\n", "\n", "Python is widely used in machine learning applications, largely because it so easily wraps compiled code written in C or C++.\n", "\n", "#### scikit-learn\n", "\n", "[scikit-learn](https://scikit-learn.org/) is a widely used machine learning package for Python that does many standard machine learning tasks such as classification, clustering, dimensionality reduction, etc.\n", "\n", "#### TensorFlow\n", "\n", "[TensorFlow](https://www.tensorflow.org) is an extensive library for computation in machine learning developed by Google. It is especially effective for deep learning. It has a Python API.\n", "\n", "#### Keras\n", "\n", "In practice, you might rarely use TensorFlow's core functionality, but rather use [Keras](https://keras.io) to build deep learning models. Keras has an intuitive API and allows you to rapidly get up and running with deep learning.\n", "\n", "\n", "#### PyTorch\n", "\n", "[PyTorch](https://pytorch.org) is a library similar to TensorFlow." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Statistics\n", "
\n", "\n", "In addition to the scipy.stats package, there are many packages for statistical analysis in the Python ecosystem.\n", "\n", "#### statsmodels\n", "\n", "[statsmodels](https://www.statsmodels.org/) has extensive functionality for computing hypothesis tests, kernel density estimation, regression, time series analysis, and much more.\n", "\n", "#### PyMC3\n", "\n", "[PyMC3](https://docs.pymc.io) is a probabilistic programming package primarily used for performing Markov chain Monte Carlo. It relies on [Theano](http://deeplearning.net/software/theano/), which is no longer actively developed. [PyMC4](https://github.com/pymc-devs/pymc4) will use TensorFlow, but this will result in a new API.\n", "\n", "#### Stan/PyStan/CmdStanPy\n", "\n", "[Stan](https://mc-stan.org) is a probabilistic programming language that uses state-of-the-art algorithms for Markov chain Monte Carlo and Bayesian inference. It is its own language, and you can access Stan models through two Python interfaces, [PyStan](https://pystan.readthedocs.io) and [CmdStanPy](https://cmdstanpy.readthedocs.io). I prefer to use the latter, which is a much more lightweight interface.\n", "\n", "#### ArviZ\n", "\n", "[ArviZ](https://arviz-devs.github.io/arviz/) is a wonderful packages that generates output of various Bayesian inference packages in a unified format using [xarray](https://xarray.pydata.org/). Using ArviZ, you can use whatever MCMC package you like, and your downstream analysis will always use the same syntax." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## More...\n", "
\n", "\n", "### Pyserial\n", "\n", "[pySerial](https://pythonhosted.org/pyserial/) is a useful package for communication with external devices using a serial port. If you are designing your own instruments for research and wish to control them with your computer via Python, you will almost certainly use this package." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Numba\n", "\n", "[Numba](https://numba.pydata.org) is a Python package for [just-in-time compilation](https://en.wikipedia.org/wiki/Just-in-time_compilation). The result is often greatly accelerated Python code, even beyond what Numpy can provide. It particularly excels when you have loops in your Python code." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }