{ "cells": [ { "cell_type": "markdown", "id": "0a3b715a-560d-463f-a7f4-aea8ec65a262", "metadata": { "tags": [] }, "source": [ "# Snakemake Overview\n", "\n", "A Snakemake workflow is defined in terms of rules that are written in a file named Snakefile (similar to a Makefile with GNU Make). Rules consist of a name, input file(s), output file(s), and a shell command to generate the output from the input. Dependencies between rules are handled implicitly, by matching filenames of input files against output files.\n", "\n", "## A first workflow\n", "\n", "To illustrate the use of Snakemake, we will test whether a book follows Zipf's law, an empirical law stating that, given a large sample of words, the frequency of any word is inversely proportional to its rank in the frequency table.\n", "\n", "The first rule (named `count_words`) takes a book stored in a text file as input and generates a list of words sorted by their number of occurrences in the book. A second rule (named `fit_zipf`) fits the data from the previous step to check whether it follows Zipf's law. The final rule (named `plot_zipf`) generates a graph from the resulting data.\n", "\n", "These three rules are defined in a file named `Snakefile`, which provides the workflow description to Snakemake.\n", "\n", "
plot_zipf rule. By default, Snakemake executes the first rule in the Snakefile, so the rule that produces the final result should be the first rule.\n",
"