{ "cells": [ { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "## Creation of a SoS workflow from interactive analysis" ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "## Basic Syntax" ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "### Script format of function calls" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "res_file = 'test.pdf'" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "kernel": "SoS" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "null device \n", " 1 \n" ] } ], "source": [ "R(f'''\n", "pdf('{res_file}')\n", "plot(0, 0)\n", "dev.off()\n", "''', workdir='result')" ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "is equivalent to" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "kernel": "SoS" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "null device \n", " 1 \n" ] } ], "source": [ "R: expand=True, workdir='result'\n", " pdf('{res_file}')\n", " plot(0, 0)\n", " dev.off() " ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "Or with a different sigil" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "kernel": "SoS" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "null device \n", " 1 \n" ] } ], "source": [ "R: expand='${ }', workdir='result'\n", " pdf('${res_file}')\n", " plot(0, 0)\n", " dev.off() " ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "[RNASeq_20 (QC)]\n", "\n", "parameter: fastq_files = list\n", "\n", "input: fastq_files, group_by=1\n", "depends: executable('fastqc')\n", "output: f'{_input:bn}_fastqc.html'\n", "\n", "print(f'Processing {_input}')\n", "\n", "task: walltime='30m'\n", "\n", "sh: expand=True\n", " fastqc {_input}" ] }, { "cell_type": "markdown", 
"metadata": { "kernel": "SoS" }, "source": [ "### Interactive data analysis" ] }, { "cell_type": "markdown", "metadata": { "kernel": "SoS" }, "source": [ "Interactive data analysis can be performed in cells with different kernels as follows. Because SoS is an extension to Python 3, you can use arbitrary Python statements in SoS cells." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "excel_file = 'data/DEG.xlsx'\n", "csv_file = 'DEG.csv'\n", "figure_file = 'output.pdf'" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "kernel": "Bash" }, "outputs": [], "source": [ "%expand\n", "xlsx2csv {excel_file} > {csv_file}" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "kernel": "R" }, "outputs": [ { "data": { "text/html": [ "pdf: 2" ], "text/latex": [ "\\textbf{pdf:} 2" ], "text/markdown": [ "**pdf:** 2" ], "text/plain": [ "pdf \n", " 2 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%expand\n", "data <- read.csv('{csv_file}')\n", "pdf('{figure_file}')\n", "plot(data$log2FoldChange, data$stat)\n", "dev.off()" ] }, { "cell_type": "markdown", "metadata": { "kernel": "R" }, "source": [ "### Convert to SoS actions" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "excel_file = 'data/DEG.xlsx'\n", "csv_file = 'DEG.csv'\n", "figure_file = 'output.pdf'" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "sh: expand=True\n", " xlsx2csv {excel_file} > {csv_file}" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "kernel": "SoS" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "null device \n", " 1 \n" ] } ], "source": [ "R: expand=True\n", " data <- read.csv('{csv_file}')\n", " pdf('{figure_file}')\n", " plot(data$log2FoldChange, data$stat)\n", " dev.off()" ] }, { "cell_type": "markdown", "metadata": { "kernel": 
"R" }, "source": [ "### Conversion to a SoS Workflow" ] }, { "cell_type": "markdown", "metadata": { "kernel": "R" }, "source": [ "SoS workflows within a SoS Notebook are defined by sections marked by section headers (`[name: option]`). A `[global]` section should be used for definitions that will be used by all steps.\n", "\n", "You also need to convert scripts to SoS actions so that they can be executed as **complete** scripts. Remember also to change the cell type from subkernel to SoS." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "[global]\n", "excel_file = 'data/DEG.xlsx'\n", "csv_file = 'DEG.csv'\n", "figure_file = 'output.pdf'" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "[plot_1 (convert)]\n", "sh: expand=True\n", " xlsx2csv {excel_file} > {csv_file}" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "kernel": "SoS" }, "outputs": [], "source": [ "[plot_2 (plot)]\n", "R: expand=True\n", " data <- read.csv('{csv_file}')\n", " pdf('{figure_file}')\n", " plot(data$log2FoldChange, data$stat)\n", " dev.off()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "kernel": "SoS" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "null device \n", " 1 \n" ] }, { "data": { "text/html": [ "