{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Test `alignparse.ccs.Summaries`\n", "This notebook tests the class and confirms that it works with and without `ccs` report files and `np` tags giving the number of passes.\n", "It is designed to be run with `nbval`.\n", "\n", "First, import Python modules:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import contextlib\n", "import os\n", "import tempfile\n", "import warnings\n", "\n", "import Bio.SeqIO\n", "\n", "import pandas as pd\n", "\n", "import alignparse.ccs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hide warnings that clutter output:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "warnings.simplefilter('ignore')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Data frame giving the `ccs` report file and the CCS FASTQ file for each run:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "run_names = ['recA_lib-1', 'recA_lib-2']\n", "ccs_dir = '../notebooks/input_files'\n", "\n", "ccs_df = pd.DataFrame(\n", " {'name': run_names,\n", " 'report': [f\"{ccs_dir}/{name}_report.txt\" for name in run_names],\n", " 'fastq': [f\"{ccs_dir}/{name}_ccs.fastq\" for name in run_names]\n", " })" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create an `alignparse.ccs.Summaries` object:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "recA_lib-1\n", "recA_lib-2\n" ] } ], "source": [ "summaries = alignparse.ccs.Summaries(ccs_df)\n", "\n", "for summary in summaries.summaries:\n", " print(summary.name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Confirm ZMW stats exist:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 5, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "summaries.has_zmw_stats()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get and plot the ZMW stats:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | name | status | number | fraction |
|---|---|---|---|---|
| 0 | recA_lib-1 | Success -- CCS generated | 139 | 0.837349 |
| 1 | recA_lib-1 | Failed -- Lacking full passes | 19 | 0.114458 |
| 2 | recA_lib-1 | Failed -- Draft generation error | 3 | 0.018072 |
| 3 | recA_lib-1 | Failed -- Min coverage violation | 1 | 0.006024 |
| 4 | recA_lib-1 | Failed -- CCS below minimum RQ | 2 | 0.012048 |
| 5 | recA_lib-1 | Failed -- Other reason | 2 | 0.012048 |
| 6 | recA_lib-2 | Success -- CCS generated | 124 | 0.794872 |
| 7 | recA_lib-2 | Failed -- Lacking full passes | 22 | 0.141026 |
| 8 | recA_lib-2 | Failed -- Draft generation error | 4 | 0.025641 |
| 9 | recA_lib-2 | Failed -- Min coverage violation | 2 | 0.012821 |
| 10 | recA_lib-2 | Failed -- CCS below minimum RQ | 2 | 0.012821 |
| 11 | recA_lib-2 | Failed -- Other reason | 2 | 0.012821 |