{ "cells": [ { "cell_type": "markdown", "id": "64a2805e-db3f-4a4b-995a-4feceb1cbdfd", "metadata": {}, "source": [ "This notebook shows how to use a checkpoint inside an input function in Snakemake 8.18.2. It is based on an [article](https://edwards.flinders.edu.au/how-to-use-snakemake-checkpoints/) written for an older version of Snakemake.\n", "\n", "Acknowledgments: I would like to thank [Wayne](https://stackoverflow.com/users/8508004/wayne) for his help with the implementation." ] }, { "cell_type": "code", "execution_count": 1, "id": "7974e5c5-26cf-4386-919b-f84dced59129", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "8.18.2\n" ] } ], "source": [ "! snakemake --version" ] }, { "cell_type": "markdown", "id": "49434f37-5ac4-403c-9639-342552ae7e61", "metadata": {}, "source": [ "# Case" ] }, { "cell_type": "code", "execution_count": 1, "id": "ccf705e8-214c-4b87-8f43-0341eb027957", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "OUTDIR = \"first_directory\"\n", "SNDDIR = \"second_directory\"\n", "\n", "SMP = None\n", "\n", "def get_file_names(wildcards):\n", " ck_output = checkpoints.make_five_files.get(**wildcards).output[0]\n", " global SMP\n", " SMP, = glob_wildcards(os.path.join(ck_output, \"{sample}.txt\"))\n", " return expand(os.path.join(ck_output, \"{SAMPLE}.txt\"), SAMPLE=SMP)\n", "\n", "def get_second_files(wildcards):\n", " ck_output = checkpoints.make_five_files.get(**wildcards).output[0]\n", " SMP2, = glob_wildcards(os.path.join(ck_output, \"{sample}.txt\"))\n", " return expand(os.path.join(SNDDIR, \"{SM}.tsv\"), SM=SMP2)\n", "\n", "\n", "rule all:\n", " input: \n", " get_second_files,\n", " expand(\"list_of_files_{SAMPLE}.txt\",SAMPLE=SMP)\n", "\n", "\n", "checkpoint make_five_files:\n", " output:\n", " directory(OUTDIR)\n", " params:\n", " o = OUTDIR\n", " shell:\n", " \"\"\"\n", " mkdir {output};\n", " for D in $(seq 1 5); do\n", " touch 
{params.o}/$RANDOM.txt\n", " done\n", " \"\"\"\n", "\n", "rule copy_files:\n", " input:\n", " get_file_names\n", " output:\n", " os.path.join(SNDDIR, \"{SAMPLE}.tsv\")\n", " shell:\n", " \"\"\"\n", " touch {output}\n", " \"\"\"\n", "\n", "\n", "rule list_all_files:\n", " input:\n", " expand(os.path.join(SNDDIR, \"{SAMPLE}.tsv\"), SAMPLE=SMP)\n", " output:\n", " out = \"list_of_files_{SAMPLE}.txt\"\n", " shell:\n", " \"\"\"\n", " echo {input} > {output.out}\n", " \"\"\"" ] } ], "source": [ "! cat Snakefile" ] }, { "cell_type": "markdown", "id": "ab91e06b-9db4-4560-ba84-687a3fa71236", "metadata": {}, "source": [ "In this Snakefile, the {SAMPLE} wildcard is filled from the global variable SMP. However, SMP is initially None, and the expand() calls in the all and list_all_files rules are evaluated when the Snakefile is parsed, before the checkpoint has run, so they expand to SAMPLE=None. SMP is updated to the list of file names in the OUTDIR directory only when get_file_names runs after the checkpoint has finished, which is too late for those expand() calls." ] }, { "cell_type": "code", "execution_count": 3, "id": "62be524f-82b8-4552-be03-2cb0e14d78f5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[33mAssuming unrestricted shared filesystem usage.\u001b[0m\n", "\u001b[33mBuilding DAG of jobs...\u001b[0m\n", "\u001b[33mcandidate job all\n", " wildcards: \u001b[0m\n", "\u001b[33mcandidate job make_five_files\n", " wildcards: \u001b[0m\n", "\u001b[33mselected job make_five_files\n", " wildcards: \u001b[0m\n", "\u001b[33mcandidate job list_all_files\n", " wildcards: SAMPLE=None\u001b[0m\n", "\u001b[33mcandidate job copy_files\n", " wildcards: SAMPLE=None\u001b[0m\n", "\u001b[33mcandidate job make_five_files\n", " wildcards: \u001b[0m\n", "\u001b[33mselected job make_five_files\n", " wildcards: \u001b[0m\n", "\u001b[33mselected job copy_files\n", " wildcards: SAMPLE=None\u001b[0m\n", "\u001b[33mselected job list_all_files\n", " wildcards: SAMPLE=None\u001b[0m\n", "\u001b[33mselected job all\n", " wildcards: \u001b[0m\n", "\u001b[33mcandidate job copy_files\n", " wildcards: 
SAMPLE=None\u001b[0m\n", "\u001b[33mselected job copy_files\n", " wildcards: SAMPLE=None\u001b[0m\n", "\u001b[33mcandidate job all\n", " wildcards: \u001b[0m\n", "\u001b[33mcandidate job copy_files\n", " wildcards: SAMPLE=10536\u001b[0m\n", "\u001b[33mselected job copy_files\n", " wildcards: SAMPLE=10536\u001b[0m\n", "\u001b[33mcandidate job copy_files\n", " wildcards: SAMPLE=16393\u001b[0m\n", "\u001b[33mselected job copy_files\n", " wildcards: SAMPLE=16393\u001b[0m\n", "\u001b[33mcandidate job copy_files\n", " wildcards: SAMPLE=16815\u001b[0m\n", "\u001b[33mselected job copy_files\n", " wildcards: SAMPLE=16815\u001b[0m\n", "\u001b[33mcandidate job copy_files\n", " wildcards: SAMPLE=2323\u001b[0m\n", "\u001b[33mselected job copy_files\n", " wildcards: SAMPLE=2323\u001b[0m\n", "\u001b[33mcandidate job copy_files\n", " wildcards: SAMPLE=6362\u001b[0m\n", "\u001b[33mselected job copy_files\n", " wildcards: SAMPLE=6362\u001b[0m\n", "\u001b[33mcandidate job list_all_files\n", " wildcards: SAMPLE=None\u001b[0m\n", "\u001b[33mselected job list_all_files\n", " wildcards: SAMPLE=None\u001b[0m\n", "\u001b[33mselected job all\n", " wildcards: \u001b[0m\n", "\u001b[33mNothing to be done (all requested files are present and up to date).\u001b[0m\n" ] } ], "source": [ "!snakemake -c 1 --debug-dag" ] }, { "cell_type": "markdown", "id": "ab27d67f-1333-447f-b153-692b31a8dc8a", "metadata": {}, "source": [ "The log shows that once make_five_files has run, the {SAMPLE} wildcard of copy_files takes the correct values (for example, 10536 and 16393). However, copy_files and list_all_files are also selected several times with SAMPLE=None, which is not the intended behavior. This is why None.tsv and list_of_files_None.txt appear in the directory listings below." 
] }, { "cell_type": "code", "execution_count": 5, "id": "d9b3b80d-8704-4e42-acf5-74d93e4f878a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SM2.ipynb Snakefile first_directory list_of_files_None.txt second_directory\n" ] } ], "source": [ "!ls" ] }, { "cell_type": "code", "execution_count": 2, "id": "d5c90c1e-d1c4-4558-a7d2-4dad4b4b6f37", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10536.txt 16393.txt 16815.txt 2323.txt 6362.txt\n" ] } ], "source": [ "!ls first_directory/" ] }, { "cell_type": "code", "execution_count": 4, "id": "d6fbf95b-931a-4416-ad02-2f3ce254f80e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10536.tsv 16393.tsv 16815.tsv 2323.tsv 6362.tsv None.tsv\n" ] } ], "source": [ "!ls second_directory/" ] }, { "cell_type": "markdown", "id": "6a6130e9", "metadata": {}, "source": [ "# Solution" ] }, { "cell_type": "markdown", "id": "e103c78f-706f-453a-aa15-9bf266c5de22", "metadata": {}, "source": [ "The fix is to drop the global variable and pass expand() an input function, get_txt_files, that returns the list of sample names found in the checkpoint output directory. Because the function calls checkpoints.make_five_files.get(), the file list is resolved only after the checkpoint has finished. The OUTDIR directory itself is also listed as an input of the all rule." 
] }, { "cell_type": "code", "execution_count": 6, "id": "97aa1f60", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "import os\n", "\n", "OUTDIR = \"first_directory\"\n", "\n", "\n", "def get_txt_files(wildcards):\n", " ck_output = checkpoints.make_five_files.get(**wildcards).output[0]\n", " print([file.split(\".\")[0] for file in os.listdir(ck_output) if file.endswith('.txt')])\n", " return [file.split(\".\")[0] for file in os.listdir(ck_output) if file.endswith('.txt')]\n", "\n", "\n", "rule all:\n", " input: \n", " OUTDIR,\n", " expand(f\"{OUTDIR}/\"+\"{SMP}.doc\", SMP=get_txt_files)\n", "\n", "checkpoint make_five_files:\n", " output:\n", " directory(OUTDIR)\n", " params:\n", " o = OUTDIR\n", " shell:\n", " \"\"\"\n", " mkdir {output};\n", " for D in $(seq 1 5); do\n", " touch {params.o}/$RANDOM.txt\n", " done\n", " \"\"\"\n", "\n", "rule copy_files:\n", " input:\n", " \"{SMP}.txt\"\n", " output:\n", " out = \"{SMP}.tsv\"\n", " shell:\n", " \"\"\"\n", " touch {output.out}\n", " \"\"\"\n", "\n", "rule copy2:\n", " input:\n", " \"{SMP}.tsv\"\n", " output:\n", " \"{SMP}.doc\"\n", " shell:\n", " \"\"\"\n", " touch {output}\n", " \"\"\"" ] } ], "source": [ "! 
cat Snakefile" ] }, { "cell_type": "code", "execution_count": 2, "id": "8ae96d3f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[33mAssuming unrestricted shared filesystem usage.\u001b[0m\n", "\u001b[33mBuilding DAG of jobs...\u001b[0m\n", "\u001b[33mUsing shell: /usr/bin/bash\u001b[0m\n", "\u001b[33mProvided cores: 1 (use --cores to define parallelism)\u001b[0m\n", "\u001b[33mRules claiming more threads will be scaled down.\u001b[0m\n", "\u001b[33mJob stats:\n", "job count\n", "--------------- -------\n", "all 1\n", "make_five_files 1\n", "total 2\n", "\u001b[0m\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", "\u001b[32mlocalcheckpoint make_five_files:\n", " output: first_directory\n", " jobid: 1\n", " reason: Forced execution\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[33mDAG of jobs will be updated after completion.\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", "\u001b[32mFinished job 1.\u001b[0m\n", "\u001b[32m1 of 2 steps (50%) done\u001b[0m\n", "['10576', '23600', '21071', '24382', '17800']\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", "\u001b[32mlocalrule copy_files:\n", " input: first_directory/10576.txt\n", " output: first_directory/10576.tsv\n", " jobid: 5\n", " reason: Forced execution\n", " wildcards: SMP=first_directory/10576\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", "\u001b[32mFinished job 5.\u001b[0m\n", "\u001b[32m2 of 12 steps (17%) done\u001b[0m\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", "\u001b[32mlocalrule 
copy2:\n", " input: first_directory/10576.tsv\n", " output: first_directory/10576.doc\n", " jobid: 4\n", " reason: Forced execution\n", " wildcards: SMP=first_directory/10576\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", "\u001b[32mFinished job 4.\u001b[0m\n", "\u001b[32m3 of 12 steps (25%) done\u001b[0m\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", "\u001b[32mlocalrule copy_files:\n", " input: first_directory/23600.txt\n", " output: first_directory/23600.tsv\n", " jobid: 7\n", " reason: Forced execution\n", " wildcards: SMP=first_directory/23600\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", "\u001b[32mFinished job 7.\u001b[0m\n", "\u001b[32m4 of 12 steps (33%) done\u001b[0m\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", "\u001b[32mlocalrule copy2:\n", " input: first_directory/23600.tsv\n", " output: first_directory/23600.doc\n", " jobid: 6\n", " reason: Forced execution\n", " wildcards: SMP=first_directory/23600\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", "\u001b[32mFinished job 6.\u001b[0m\n", "\u001b[32m5 of 12 steps (42%) done\u001b[0m\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", "\u001b[32mlocalrule copy_files:\n", " input: first_directory/21071.txt\n", " output: first_directory/21071.tsv\n", " jobid: 9\n", " reason: Forced execution\n", " wildcards: SMP=first_directory/21071\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 
2024]\u001b[0m\n", "\u001b[32mFinished job 9.\u001b[0m\n", "\u001b[32m6 of 12 steps (50%) done\u001b[0m\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", "\u001b[32mlocalrule copy2:\n", " input: first_directory/21071.tsv\n", " output: first_directory/21071.doc\n", " jobid: 8\n", " reason: Forced execution\n", " wildcards: SMP=first_directory/21071\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", "\u001b[32mFinished job 8.\u001b[0m\n", "\u001b[32m7 of 12 steps (58%) done\u001b[0m\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", "\u001b[32mlocalrule copy_files:\n", " input: first_directory/17800.txt\n", " output: first_directory/17800.tsv\n", " jobid: 13\n", " reason: Forced execution\n", " wildcards: SMP=first_directory/17800\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", "\u001b[32mFinished job 13.\u001b[0m\n", "\u001b[32m8 of 12 steps (67%) done\u001b[0m\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", "\u001b[32mlocalrule copy2:\n", " input: first_directory/17800.tsv\n", " output: first_directory/17800.doc\n", " jobid: 12\n", " reason: Forced execution\n", " wildcards: SMP=first_directory/17800\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", "\u001b[32mFinished job 12.\u001b[0m\n", "\u001b[32m9 of 12 steps (75%) done\u001b[0m\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", 
"\u001b[32mlocalrule copy_files:\n", " input: first_directory/24382.txt\n", " output: first_directory/24382.tsv\n", " jobid: 11\n", " reason: Forced execution\n", " wildcards: SMP=first_directory/24382\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", "\u001b[32mFinished job 11.\u001b[0m\n", "\u001b[32m10 of 12 steps (83%) done\u001b[0m\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", "\u001b[32mlocalrule copy2:\n", " input: first_directory/24382.tsv\n", " output: first_directory/24382.doc\n", " jobid: 10\n", " reason: Forced execution\n", " wildcards: SMP=first_directory/24382\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", "\u001b[32mFinished job 10.\u001b[0m\n", "\u001b[32m11 of 12 steps (92%) done\u001b[0m\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", "\u001b[32mlocalrule all:\n", " input: first_directory, first_directory/10576.doc, first_directory/23600.doc, first_directory/21071.doc, first_directory/24382.doc, first_directory/17800.doc\n", " jobid: 0\n", " reason: Forced execution\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:43:06 2024]\u001b[0m\n", "\u001b[32mFinished job 0.\u001b[0m\n", "\u001b[32m12 of 12 steps (100%) done\u001b[0m\n", "\u001b[33mComplete log: .snakemake/log/2024-08-22T174306.319827.snakemake.log\u001b[0m\n" ] } ], "source": [ "! snakemake -c 1 -F" ] }, { "cell_type": "code", "execution_count": 3, "id": "1b96dd98-7175-4d6c-8017-2dcd7bbe38e9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "first_directory Snakefile Untitled.ipynb\n" ] } ], "source": [ "! 
ls " ] }, { "cell_type": "code", "execution_count": 4, "id": "6f929305-7dd3-4630-bdaf-b998cb43c818", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10576.doc 17800.doc 21071.doc 23600.doc 24382.doc\n", "10576.tsv 17800.tsv 21071.tsv 23600.tsv 24382.tsv\n", "10576.txt 17800.txt 21071.txt 23600.txt 24382.txt\n" ] } ], "source": [ "! ls first_directory/" ] }, { "cell_type": "markdown", "id": "9b5df059-4a62-4cf1-83df-66b168788667", "metadata": {}, "source": [ "Now let's write the .tsv and .doc files to a separate directory instead of OUTDIR. Its name is set by the SNDDIR variable at the top of the Snakefile. The rule inputs and outputs are prefixed with OUTDIR or SNDDIR, so the SMP wildcard now matches only the sample name rather than the whole path." ] }, { "cell_type": "code", "execution_count": 7, "id": "f5f296b4-90a4-4579-9148-95254b2379ec", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "import os\n", "\n", "OUTDIR = \"first_directory\"\n", "SNDDIR = \"second_directory\"\n", "\n", "\n", "\n", "def get_txt_files(wildcards):\n", " ck_output = checkpoints.make_five_files.get(**wildcards).output[0]\n", " print([file.split(\".\")[0] for file in os.listdir(ck_output) if file.endswith('.txt')])\n", " return [file.split(\".\")[0] for file in os.listdir(ck_output) if file.endswith('.txt')]\n", "\n", "\n", "rule all:\n", " input: \n", " OUTDIR,\n", " expand(f\"{SNDDIR}/\"+\"{SMP}.doc\", SMP=get_txt_files)\n", "\n", "\n", "checkpoint make_five_files:\n", " output:\n", " directory(OUTDIR)\n", " params:\n", " o = OUTDIR\n", " shell:\n", " \"\"\"\n", " mkdir {output};\n", " for D in $(seq 1 5); do\n", " touch {params.o}/$RANDOM.txt\n", " done\n", " \"\"\"\n", "\n", "rule copy_files:\n", " input:\n", " f\"{OUTDIR}/\"+\"{SMP}.txt\"\n", " output:\n", " out = f\"{SNDDIR}/\"+\"{SMP}.tsv\"\n", " shell:\n", " \"\"\"\n", " touch {output.out}\n", " \"\"\"\n", "\n", "rule copy2:\n", " input:\n", " f\"{SNDDIR}/\"+\"{SMP}.tsv\"\n", " output:\n", " f\"{SNDDIR}/\"+\"{SMP}.doc\"\n", " shell:\n", " \"\"\"\n", " touch {output}\n", " \"\"\"\n", "\n", "\n" ] } ], "source": [ "! 
cat Snakefile" ] }, { "cell_type": "code", "execution_count": 14, "id": "8e2aa619-5712-444e-ae3c-22481b4a5ab1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[33mAssuming unrestricted shared filesystem usage.\u001b[0m\n", "\u001b[33mBuilding DAG of jobs...\u001b[0m\n", "\u001b[33mUsing shell: /usr/bin/bash\u001b[0m\n", "\u001b[33mProvided cores: 1 (use --cores to define parallelism)\u001b[0m\n", "\u001b[33mRules claiming more threads will be scaled down.\u001b[0m\n", "\u001b[33mJob stats:\n", "job count\n", "--------------- -------\n", "all 1\n", "make_five_files 1\n", "total 2\n", "\u001b[0m\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:10 2024]\u001b[0m\n", "\u001b[32mlocalcheckpoint make_five_files:\n", " output: first_directory\n", " jobid: 1\n", " reason: Forced execution\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[33mDAG of jobs will be updated after completion.\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", "\u001b[32mFinished job 1.\u001b[0m\n", "\u001b[32m1 of 2 steps (50%) done\u001b[0m\n", "['5114', '11823', '32502', '6262', '21833']\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", "\u001b[32mlocalrule copy_files:\n", " input: first_directory/5114.txt\n", " output: second_directory/5114.tsv\n", " jobid: 5\n", " reason: Forced execution\n", " wildcards: SMP=5114\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", "\u001b[32mFinished job 5.\u001b[0m\n", "\u001b[32m2 of 12 steps (17%) done\u001b[0m\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", 
"\u001b[32mlocalrule copy2:\n", " input: second_directory/5114.tsv\n", " output: second_directory/5114.doc\n", " jobid: 4\n", " reason: Forced execution\n", " wildcards: SMP=5114\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", "\u001b[32mFinished job 4.\u001b[0m\n", "\u001b[32m3 of 12 steps (25%) done\u001b[0m\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", "\u001b[32mlocalrule copy_files:\n", " input: first_directory/11823.txt\n", " output: second_directory/11823.tsv\n", " jobid: 7\n", " reason: Forced execution\n", " wildcards: SMP=11823\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", "\u001b[32mFinished job 7.\u001b[0m\n", "\u001b[32m4 of 12 steps (33%) done\u001b[0m\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", "\u001b[32mlocalrule copy2:\n", " input: second_directory/11823.tsv\n", " output: second_directory/11823.doc\n", " jobid: 6\n", " reason: Forced execution\n", " wildcards: SMP=11823\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", "\u001b[32mFinished job 6.\u001b[0m\n", "\u001b[32m5 of 12 steps (42%) done\u001b[0m\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", "\u001b[32mlocalrule copy_files:\n", " input: first_directory/32502.txt\n", " output: second_directory/32502.tsv\n", " jobid: 9\n", " reason: Forced execution\n", " wildcards: SMP=32502\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", "\u001b[32mFinished job 
9.\u001b[0m\n", "\u001b[32m6 of 12 steps (50%) done\u001b[0m\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", "\u001b[32mlocalrule copy2:\n", " input: second_directory/32502.tsv\n", " output: second_directory/32502.doc\n", " jobid: 8\n", " reason: Forced execution\n", " wildcards: SMP=32502\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", "\u001b[32mFinished job 8.\u001b[0m\n", "\u001b[32m7 of 12 steps (58%) done\u001b[0m\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", "\u001b[32mlocalrule copy_files:\n", " input: first_directory/21833.txt\n", " output: second_directory/21833.tsv\n", " jobid: 13\n", " reason: Forced execution\n", " wildcards: SMP=21833\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", "\u001b[32mFinished job 13.\u001b[0m\n", "\u001b[32m8 of 12 steps (67%) done\u001b[0m\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", "\u001b[32mlocalrule copy2:\n", " input: second_directory/21833.tsv\n", " output: second_directory/21833.doc\n", " jobid: 12\n", " reason: Forced execution\n", " wildcards: SMP=21833\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", "\u001b[32mFinished job 12.\u001b[0m\n", "\u001b[32m9 of 12 steps (75%) done\u001b[0m\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", "\u001b[32mlocalrule copy_files:\n", " input: first_directory/6262.txt\n", " output: 
second_directory/6262.tsv\n", " jobid: 11\n", " reason: Forced execution\n", " wildcards: SMP=6262\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", "\u001b[32mFinished job 11.\u001b[0m\n", "\u001b[32m10 of 12 steps (83%) done\u001b[0m\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", "\u001b[32mlocalrule copy2:\n", " input: second_directory/6262.tsv\n", " output: second_directory/6262.doc\n", " jobid: 10\n", " reason: Forced execution\n", " wildcards: SMP=6262\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", "\u001b[32mFinished job 10.\u001b[0m\n", "\u001b[32m11 of 12 steps (92%) done\u001b[0m\n", "\u001b[33mSelect jobs to execute...\u001b[0m\n", "\u001b[33mExecute 1 jobs...\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", "\u001b[32mlocalrule all:\n", " input: first_directory, second_directory/5114.doc, second_directory/11823.doc, second_directory/32502.doc, second_directory/6262.doc, second_directory/21833.doc\n", " jobid: 0\n", " reason: Forced execution\n", " resources: tmpdir=/tmp\u001b[0m\n", "\u001b[32m\u001b[0m\n", "\u001b[32m[Thu Aug 22 17:52:11 2024]\u001b[0m\n", "\u001b[32mFinished job 0.\u001b[0m\n", "\u001b[32m12 of 12 steps (100%) done\u001b[0m\n", "\u001b[33mComplete log: .snakemake/log/2024-08-22T175210.669728.snakemake.log\u001b[0m\n" ] } ], "source": [ "! snakemake -c 1 -F" ] }, { "cell_type": "code", "execution_count": 15, "id": "9ded16d3-9e70-4e11-a260-1b330eb43563", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "first_directory second_directory Snakefile Untitled.ipynb\n" ] } ], "source": [ "! 
ls " ] }, { "cell_type": "code", "execution_count": 16, "id": "aeb60d76-14eb-48cc-99d2-782131ceddf4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "11823.txt 21833.txt 32502.txt 5114.txt 6262.txt\n" ] } ], "source": [ "! ls first_directory/" ] }, { "cell_type": "code", "execution_count": 17, "id": "c59b96c4-e1fc-4b9a-a7b5-0195e4f273ae", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "11823.doc 21833.doc 32502.doc 5114.doc 6262.doc\n", "11823.tsv 21833.tsv 32502.tsv 5114.tsv 6262.tsv\n" ] } ], "source": [ "! ls second_directory/" ] }, { "cell_type": "code", "execution_count": null, "id": "60ca27bf-aebd-41f2-91ea-e17e5a43c612", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:snakemake-8-18-1]", "language": "python", "name": "conda-env-snakemake-8-18-1-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.5" } }, "nbformat": 4, "nbformat_minor": 5 }