Advanced workflow management with Snakemake

BDI Code Clinic – Advanced workflow management with Snakemake
18 November, 11am – noon, Microsoft Teams (a link will be circulated in advance)
To register, please visit: oxford.onlinesurveys.ac.uk/bdi-code-clinic-snakemake-18-november

Snakemake is a workflow management tool for running a series of related tasks in an efficient, reproducible and readable way. This tutorial will cover the following topics:

- Running Snakemake on the BMRC cluster (Rescomp) using profiles
- Passing parameters to external scripts (R/Python)
- Using conda environments
- Using input functions
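
To give a flavour of the last topic, here is a minimal sketch of an input function and a parameter-formatting function, written as plain Python so it can be tried outside Snakemake. It is not the tutorial's code: the sample names, file paths, rule name and genome.fa are hypothetical, chosen only for illustration. The sketch relies on the fact that Snakemake accepts a function taking the rule's wildcards in place of a fixed input or params value.

```python
from types import SimpleNamespace

# Hypothetical sample sheet: sample name -> fastq paths (illustrative only).
SAMPLES = {
    "sampleA": ["data/sampleA_R1.fastq.gz", "data/sampleA_R2.fastq.gz"],
    "sampleB": ["data/sampleB_R1.fastq.gz"],
}


def get_fastqs(wildcards):
    """Input function: return the fastq files for the sample named in the wildcards."""
    return SAMPLES[wildcards.sample]


def read_group(wildcards):
    """Params function: build a custom-formatted read-group string for the sample."""
    # The backslash-t is kept literal so the mapper can interpret it as a tab.
    return f"@RG\\tID:{wildcards.sample}\\tSM:{wildcards.sample}"


# In a Snakefile, a rule could refer to these functions, for example
# (hypothetical rule, assuming a reference called genome.fa):
#
# rule map_reads:
#     input: get_fastqs
#     output: "mapped/{sample}.sam"
#     params: rg=read_group
#     shell: "bwa mem -R '{params.rg}' genome.fa {input} > {output}"

if __name__ == "__main__":
    # Stand-in for the wildcards object Snakemake would pass to each function.
    wc = SimpleNamespace(sample="sampleA")
    print(get_fastqs(wc))
    print(read_group(wc))
```

Running the file directly prints the fastq list and read-group string for sampleA; inside a workflow, Snakemake would call both functions itself once per sample.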

As in the previous session, we will use a mix of slides and code. This time, however, we will work with a ‘real-world’ workflow: starting from an example set of very small fastq files, we will run through common pre-processing steps (e.g. raw fastq > map to genome > mark duplicates > plot) using common bioinformatics software such as Samtools, BWA and GATK. We will also show how to set up Snakemake to run the workflow on the cluster using profiles, and discuss a couple of issues specific to running Snakemake on the BMRC cluster.

By the end of the tutorial, you should be able to run a workflow that passes custom-formatted parameters to external scripts or programs and uses conda environments to improve reproducibility.
Code and example data for the tutorial will be circulated closer to the date.
Prerequisites: This tutorial will be easier to follow if you can understand and run a basic Snakemake workflow. The code and PowerPoint slides from the introductory session are available at github.com/sraorao/snakemake_code_clinic. If you haven’t done so already, try working on the Snakefile_wildcards.smk workflow: see if you can fill in the empty rules marked ‘TODO’ based on the comments in the file, then see Snakemake_wildcards_solution.smk for one possible solution.

Date: 18 November 2020, 11:00 (Wednesday, 6th week, Michaelmas 2020)
Venue: Venue to be announced
Speakers: Speaker to be announced
Organising department: Big Data Institute (NDPH)
Organiser: Sarah Laseke (Big Data Institute)
Organiser contact email address: sarah.laseke@ndph.ox.ac.uk
Host: Sarah Laseke (Big Data Institute)
Booking required?: Required
Booking url: https://oxford.onlinesurveys.ac.uk/bdi-code-clinic-snakemake-18-november
Audience: Members of the University only
This talk features in the following public collections:
- Talks of Interest to Medical Sciences
Editor: Sarah Laseke