{ "cells": [ { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "## Creation of a SoS workflow from interactive analysis" ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "## Basic Syntax" ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "### Script format of function calls" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "res_file = 'test.pdf'" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "kernel": "SoS" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "null device \n", " 1 \n" ] } ], "source": [ "R(f'''\n", "pdf('{res_file}')\n", "plot(0, 0)\n", "dev.off()\n", "''', workdir='result')" ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "The function call above is equivalent to the following action in script format:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "kernel": "SoS" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "null device \n", " 1 \n" ] } ], "source": [ "R: expand=True, workdir='result'\n", " pdf('{res_file}')\n", " plot(0, 0)\n", " dev.off() " ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "Or, with a different sigil:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "kernel": "SoS" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "null device \n", " 1 \n" ] } ], "source": [ "R: expand='${ }', workdir='result'\n", " pdf('${res_file}')\n", " plot(0, 0)\n", " dev.off() " ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "[RNASeq_20 (QC)]\n", "\n", "parameter: fastq_files = list\n", "\n", "input: fastq_files, group_by=1\n", "depends: executable('fastqc')\n", "output: f'{_input:bn}_fastqc.html'\n", "\n", "print(f'Processing {_input}')\n", "\n", "task: walltime='30m'\n", "\n", "sh: expand=True\n", " fastqc {_input}" ] }, { "cell_type": "markdown", 
"metadata": { "kernel": "SoS" }, "source": [ "### Interactive data analysis" ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "Interactive data analysis can be performed in cells with different kernels as follows. Because SoS is an extension to Python 3, you can use arbitrary Python statements in SoS cells." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "excel_file = 'data/DEG.xlsx'\n", "csv_file = 'DEG.csv'\n", "figure_file = 'output.pdf'" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "kernel": "Bash" }, "outputs": [], "source": [ "%expand\n", "xlsx2csv {excel_file} > {csv_file}" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "kernel": "R" }, "outputs": [ { "data": { "text/html": [ "pdf: 2" ], "text/latex": [ "\\textbf{pdf:} 2" ], "text/markdown": [ "**pdf:** 2" ], "text/plain": [ "pdf \n", " 2 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%expand\n", "data <- read.csv('{csv_file}')\n", "pdf('{figure_file}')\n", "plot(data$log2FoldChange, data$stat)\n", "dev.off()" ] }, { "cell_type": "markdown", "metadata": { "kernel": "R" }, "source": [ "### Convert to SoS actions" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "excel_file = 'data/DEG.xlsx'\n", "csv_file = 'DEG.csv'\n", "figure_file = 'output.pdf'" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "sh: expand=True\n", " xlsx2csv {excel_file} > {csv_file}" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "kernel": "SoS" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "null device \n", " 1 \n" ] } ], "source": [ "R: expand=True\n", " data <- read.csv('{csv_file}')\n", " pdf('{figure_file}')\n", " plot(data$log2FoldChange, data$stat)\n", " dev.off()" ] }, { "cell_type": "markdown", "metadata": { "kernel": 
"R" }, "source": [ "### Conversion to a SoS Workflow" ] }, { "cell_type": "markdown", "metadata": { "kernel": "R" }, "source": [ "SoS workflows within a SoS Notebook are defined by sections marked by section headers (`[name: option]`). A `[global]` section should be used for definitions that will be used by all steps.\n", "\n", "You also need to convert scripts to SoS actions so that they can be executed as **complete** scripts. Remember also to change the cell type from subkernel to SoS." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "[global]\n", "excel_file = 'data/DEG.xlsx'\n", "csv_file = 'DEG.csv'\n", "figure_file = 'output.pdf'" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "[plot_1 (convert)]\n", "sh: expand=True\n", " xlsx2csv {excel_file} > {csv_file}" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "[plot_2 (plot)]\n", "R: expand=True\n", " data <- read.csv('{csv_file}')\n", " pdf('{figure_file}')\n", " plot(data$log2FoldChange, data$stat)\n", " dev.off()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "kernel": "SoS" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "null device \n", " 1 \n" ] }, { "data": { "text/html": [ "
INFO: Workflow plot (ID=e9a443b49c71e268) is executed successfully with 2 completed steps.
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%sosrun plot" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "kernel": "SoS" }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "#!/usr/bin/env sos-runner\n", "#fileformat=SOS1.0\n", "\n", "[RNASeq_20 (QC)]\n", "\n", "parameter: fastq_files = list\n", "\n", "input: fastq_files, group_by=1\n", "depends: executable('fastqc')\n", "output: f'{_input:bn}_fastqc_html'\n", "\n", "print(f'Processing {_input}')\n", "\n", "task: walltime='30m'\n", "\n", "sh: expand=True\n", " fastqc {_input}\n", "\n", "[global]\n", "excel_file = 'data/DEG.xlsx'\n", "csv_file = 'DEG.csv'\n", "figure_file = 'output.pdf'\n", "\n", "[plot_1 (convert)]\n", "sh: expand=True\n", " xlsx2csv {excel_file} > {csv_file}\n", "\n", "[plot_2 (plot)]\n", "R: expand=True\n", " data <- read.csv('{csv_file}')\n", " pdf('{figure_file}')\n", " plot(data$log2FoldChange, data$stat)\n", " dev.off()\n", "\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%preview --workflow" ] } ], "metadata": { "kernelspec": { "display_name": "SoS", "language": "sos", "name": "sos" }, "language_info": { "codemirror_mode": "sos", "file_extension": ".sos", "mimetype": "text/x-sos", "name": "sos", "nbconvert_exporter": "sos_notebook.converter.SoS_Exporter", "pygments_lexer": "sos" }, "sos": { "default_kernel": "SoS", "kernels": [ [ "Bash", "bash", "Bash", "#E6EEFF" ], [ "R", "ir", "R", "#DCDCDA" ], [ "SoS", "sos", "", "" ] ], "panel": { "displayed": false, "height": 0, "style": "side" }, "version": "0.16.11" } }, "nbformat": 4, "nbformat_minor": 2 }