Introduction to Workflow Management with Snakemake

Dr Srinivasa Rao (Nuffield Dept. of Surgical Sciences)

17 March, 10 am – 12 noon

Microsoft Teams Link:
Join on your computer or mobile app
Click here to join the meeting

To register, please visit:
oxford.onlinesurveys.ac.uk/introduction-to-workflow-management-with-snakemake-17-ma

Snakemake is a workflow management tool for running a set of related tasks (“rules” in Snakemake terminology) in an efficient, reproducible and readable way. It uses a simple vocabulary to define the expected input, output, parameters, script and resources for each rule. We will use a ‘real-world’ workflow, taking an example set of small FASTQ files and running them through common pre-processing steps (e.g. raw FASTQ > map to genome > mark duplicates > plot) with widely used bioinformatics software such as Samtools, BWA and GATK. We will also cover how to set up Snakemake to run the workflow on the cluster using profiles, and discuss a couple of issues specific to running Snakemake on the BMRC cluster.
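To give a feel for how rules with input and output patterns chain into a workflow, here is a minimal plain-Python sketch of the idea. It is an illustration only, not Snakemake's own code or API: the `Rule` class, the `match_output` and `plan` helpers, the rule names and the file paths are all hypothetical, and real Snakemake rules are written in a Snakefile and handle many more cases (multiple inputs, parameters, resources and so on).

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """A toy stand-in for a workflow rule (hypothetical, not Snakemake's class)."""
    name: str
    input: str   # input path pattern, may contain {wildcards}
    output: str  # output path pattern, may contain {wildcards}


def match_output(pattern, target):
    """Return the wildcard values if target matches pattern, else None."""
    # re.split with a capturing group alternates literal text and wildcard names,
    # e.g. "mapped/{sample}.bam" -> ["mapped/", "sample", ".bam"].
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        # Odd positions are wildcard names; even positions are literal text.
        regex += f"(?P<{part}>.+)" if i % 2 else re.escape(part)
    m = re.fullmatch(regex, target)
    return m.groupdict() if m else None


def plan(target, rules):
    """List the (rule name, output file) steps needed to build target, in run order."""
    for rule in rules:
        wildcards = match_output(rule.output, target)
        if wildcards is not None:
            # Fill the wildcard values into the input pattern, then plan that file first.
            needed = rule.input.format(**wildcards)
            return plan(needed, rules) + [(rule.name, target)]
    # No rule produces this file: assume it is an existing source file.
    return []


# Hypothetical rules mirroring: raw FASTQ > map to genome > mark duplicates > plot
RULES = [
    Rule("bwa_map", "data/{sample}.fastq", "mapped/{sample}.bam"),
    Rule("mark_duplicates", "mapped/{sample}.bam", "dedup/{sample}.bam"),
    Rule("plot", "dedup/{sample}.bam", "plots/{sample}.png"),
]

for step in plan("plots/A.png", RULES):
    print(step)
```

Asking for `plots/A.png` works backwards from the requested output: the `plot` rule matches with `sample` set to `A`, which in turn requires `dedup/A.bam`, and so on until a file with no producing rule is reached. Snakemake resolves targets in a broadly similar backwards fashion, which is why a single rule written with wildcards can serve any number of samples.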

Topics to be covered
1. Snakemake vocabulary and syntax
2. Various kinds of tasks (shell, R/Python scripts, conda environments)
3. Parameters and config
4. Wildcards
5. External scripts
6. Conda environments

Learning objectives
1. Construct a basic workflow
2. Use different modes of execution of tasks (shell, R/Python scripts)
3. Pass additional parameters and configuration
4. Use wildcards to generalize rules
5. Run workflows on the BMRC cluster
6. Use conda environments for reproducibility

Prior knowledge required
Some familiarity with Python, Conda and the bash command line is essential.

Intended Audience
This session is for those interested in writing reproducible workflows, for example (but not limited to) for the bioinformatic analysis of sequencing data.

Audience requirements
The first half of the session will focus on constructing a simple workflow, for which your own device (with the requisite software installed) is fine. The second half of the session will focus more on getting Snakemake to run on the BMRC cluster, so participants will need access to the cluster if this is of interest.

Pre-course work
Download the code (or clone the repository) from the GitHub page ahead of the session.

Type of session
A mix of presentation slides and code will be used. Participants are encouraged to follow along during the demo, after which some time will be spent solving a ‘problem set’ individually or in groups.

Software required
Snakemake (>= v5.26) and Conda (see here for how to install these on your device); bash, R and Python for basic workflows. Please note that if you plan to use the BMRC cluster for this tutorial, all of this software is available as modules, so you do not need to install it yourself.
Operating systems: Linux, macOS