Advanced workflow management with Snakemake

BDI Code Clinic – Advanced workflow management with Snakemake
18 October, 11am – noon, Microsoft Teams (a link will be circulated in advance)
To register, please visit: oxford.onlinesurveys.ac.uk/bdi-code-clinic-snakemake-18-november

Snakemake is a workflow management tool for running a series of related tasks in an efficient, reproducible and readable way. This tutorial will cover the following topics:

Running Snakemake on the BMRC cluster (Rescomp) – using profiles
Passing parameters to external scripts (R/Python)
Using conda environments
Using input functions
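
As a taste of the last topic: an input function is an ordinary Python function that Snakemake calls with the wildcards of a job and that returns the input file path(s) for that job. The following is only a minimal sketch, not the tutorial's code; the sample table SAMPLES, the file paths, the rule name and genome.fa are all hypothetical.

```python
from types import SimpleNamespace

# Hypothetical sample table: sample name -> raw fastq path (illustrative only).
SAMPLES = {
    "sampleA": "data/run1/sampleA.fastq",
    "sampleB": "data/run2/sampleB.fastq",
}


def get_fastq(wildcards):
    """Input function: return the fastq path for the sample named in the wildcards."""
    return SAMPLES[wildcards.sample]


# In a Snakefile, the function would be passed by name to a rule, for example:
#
# rule map_reads:
#     input: get_fastq
#     output: "mapped/{sample}.sam"
#     shell: "bwa mem genome.fa {input} > {output}"

# Outside Snakemake, a stand-in object with a 'sample' attribute is enough to try it.
example_path = get_fastq(SimpleNamespace(sample="sampleA"))
print(example_path)
```

Keeping the lookup in a function means the rule itself does not need to know where each sample's raw file lives.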

As in the previous session, we will use a mix of slides and code. This time, however, we will work with a ‘real-world’ workflow: starting from an example set of very small fastq files, we will run through common pre-processing steps (e.g. raw fastq > map to genome > mark duplicates > plot) using common bioinformatics software such as Samtools, BWA and GATK. We will also go through how to set up Snakemake to run the workflow on the cluster using profiles, and discuss a couple of issues specific to running Snakemake on the BMRC cluster.

By the end of the tutorial, you should be able to run a workflow that passes custom-formatted parameters to external scripts or programs, and to use conda environments to improve reproducibility.
Code and example data for the tutorial will be circulated closer to the date.
Prerequisites: This tutorial will be easier to follow if you can understand and run a basic Snakemake workflow. The code and PowerPoint slides from the introductory session are available at github.com/sraorao/snakemake_code_clinic. If you haven’t done so already, try working through the Snakefile_wildcards.smk workflow: fill in the empty rules marked ‘TODO’ using the comments in the file, then see Snakemake_wildcards_solution.smk for one possible solution.