{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Advanced: Sending PatMatch output directly to Python \n", "\n", "The conclusion of the [previous notebook, 'PatMatch with more Python'](PatMatch%20with%20more%20Python.ipynb), alluded to the fact there was a way to further leverage the computational environment afforded by the Jupyter ecosystem to skip over needing to save a file to actually pass results from shell scripts into Python. This notebook will demonstrate several of those.\n", "\n", "## Preparing\n", "\n", "Similar to the previous notebook, in order to insure everything is all set, act as if this is a new session in this Jupyter environment, and run the next cell so that you can step through the preparation steps to get a sequence file and prepare it. Plus, you'll get the file for script to convert it to dataframe and import its the main function. We don't need to run PatMatch yet because we will be addressing that part of the workflow in this notebook. \n", "\n", "Repeating these steps if you had already done so this session will cause no harm, and so go ahead and run this cell." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!curl -O http://sgd-archive.yeastgenome.org/sequence/S288C_reference/chromosomes/fasta/chrmt.fsa\n", "!perl ../patmatch_1.2/unjustify_fasta.pl chrmt.fsa\n", "!curl -O https://raw.githubusercontent.com/fomightez/sequencework/master/patmatch-utilities/patmatch_results_to_df.py\n", "from patmatch_results_to_df import patmatch_results_to_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Running Patmatch and passing the results to Python without creating an output file intermediate\n", "\n", "#### Option 1\n", "\n", "First we'll do one of the several methods I have found to do this and show how to go to the next step entirely. This first example uses an approach illustrated [here](https://stackoverflow.com/a/42703609/8508004). The result that is returned is of the type `IPython.utils.text.SList`, which offers some handy utility attributes associated with it as detailed [here](http://ipython.readthedocs.io/en/stable/api/generated/IPython.utils.text.html#IPython.utils.text.SList)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "output = !perl ../patmatch_1.2/patmatch.pl -c \"DDWDWTAWAAGTARTADDDD\" chrmt.fsa.prepared " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That performed the matching step without generating an intermediate file. Next, run the next cell to combine this with use of main function of `patmatch_results_to_df.py` as illustrated in the [previous notebook, 'PatMatch with more Python'](PatMatch with more Python.ipynb)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#patmatch_results_to_df.py with the `.n` attribute of IPython.utils.text.SList\n", "my_pattern= \"DDWDWTAWAAGTARTADDDD\"\n", "df = patmatch_results_to_df(output.n, pattern=my_pattern, name=\"promoter\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This option has the distinction that all the steps in it can be combined in one cell of the notebook. This is because the `!` approach makes a temporary shell outside of the notebook environment to which it sends the command after the exclamation site. After that command is processed and any communication handled, that subshell where it ran is discarded. Given the transient nature of this environment you'll find `!cd` never seems to work as they discuss [here](http://ipython.readthedocs.io/en/stable/interactive/magics.html), search `!cd` to find the pertinent section where they also show you the line magics solution. ([Here](https://jakevdp.github.io/PythonDataScienceHandbook/01.05-ipython-and-shell-commands.html) is another good resource in this regard.)\n", "\n", "This option can be worked into a utility function that includes the preparation step. This is very handy if trying a lot of patterns and/or sequences because could easily be called in a loop and supplied with different parameters. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def patmatch_query(pattern, seq_file, name):\n", " '''\n", " function that will work in Jupyter lab notebooks to run Patmatch query.\n", " \n", " Requires pattern and sequence file. Plus a name to use to refer to \n", " instances of pattern in output.\n", " \n", " Returns dataframe of results.\n", " '''\n", " from patmatch_results_to_df import patmatch_results_to_df\n", "\n", " !perl ../patmatch_1.2/unjustify_fasta.pl {seq_file}\n", " output = !perl ../patmatch_1.2/patmatch.pl -c {pattern} {seq_file+\".prepared\"}\n", " #!rm {seq_file+\".prepared\"} # OPTIONAL deleting of prepared file.\n", " df = patmatch_results_to_df(output.n, pattern=pattern, name=name)\n", " return df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(I commented out deleting the preparation file in the function in the above cell because I want to use it later on in this notebook, but it would be useful to include normally to not generate a lot of uncessary files.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = patmatch_query(my_pattern, \"chrmt.fsa\", \"promoter\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "The additional options for passing the results demonstrated below instead rely on cell magics, and so the output needs to be captured from a cell before the next steps can be undertaken in an additional cell. While not a big deal that extra cells are involved, I find the `!` approach can be nice for streamlining things when making mini-pipelines/workflows using the Jupyter environment as a 'glue' to merge Python and command line use. \n", "\n", "The additional options for passing the results will be stepped through in a manner similar to 'Option 1' using different approaches for step 1 each time and step #2 varied to handle as necessary.\n", "\n", "#### Option 2\n", "\n", "In this option, [`%%capture` cell magics](http://ipython.readthedocs.io/en/stable/interactive/magics.html#cellmagic-capture) is used, and then using the attributes of the `utils.cpature` object you can easily get the stdout and/or stderr as a string, see [here]( http://ipython.readthedocs.io/en/stable/api/generated/IPython.utils.capture.html)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%capture out\n", "!perl ../patmatch_1.2/patmatch.pl -c \"DDWDWTAWAAGTARTADDDD\" chrmt.fsa.prepared" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#patmatch_results_to_df.py with the `.stdout` attribute of `utils.cpature`\n", "my_pattern= \"DDWDWTAWAAGTARTADDDD\"\n", "df = patmatch_results_to_df(out.stdout, pattern=my_pattern, name=\"promoter\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Option 3\n", "\n", "In this option, a varation of `%%bash` cell magic is used to send the output to a variable as illustrated [here](https://stackoverflow.com/a/24776049/8508004)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash --out output\n", "perl ../patmatch_1.2/patmatch.pl -c \"DDWDWTAWAAGTARTADDDD\" chrmt.fsa.prepared" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#patmatch_results_to_df.py by passing the stdout to a varible using %%bash cell magic\n", "my_pattern= \"DDWDWTAWAAGTARTADDDD\"\n", "df = patmatch_results_to_df(output, pattern=my_pattern, name=\"promoter\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "------\n", "\n", "The next advanced notebook, [Using brackets or other strange characters to make complex patterns on command line or with Python](Using%20brackets%20or%20other%20strange%20characters%20to%20make%20complex%20patterns%20on%20command%20line%20or%20with%20Python.ipynb), builds on one of the options here to show what you need to do when making more complex patterns for matching." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }