BDI Seminar and Workshop: Hail: Scalable genomic association analysis
Dr Cotton Seed is one of the key architects behind Hail, an open-source, scalable framework for exploring and analyzing genomic data.

Hail was invented to empower the worldwide genetics community to harness the flood of genomes to discover the biology of human disease. Hail has been used for dozens of major studies and is the core analysis platform of large-scale genomics efforts. The functionalities in Hail are exposed through Python and backed by distributed algorithms built on top of Apache Spark to efficiently analyze gigabyte-scale data on a laptop or terabyte-scale data on a cluster, without the need to manually chop up data or manage job failures. Users can script pipelines or explore data interactively through Jupyter notebooks that flow between Hail with methods for genomics, PySpark with scalable SQL and machine learning algorithms, and pandas with scikit-learn and Matplotlib for results that fit on one machine. Hail also provides a flexible domain language to express complex quality control and analysis pipelines with concise, readable code.

WORKSHOP SCHEDULE:
10:00-11:00 Talk by Dr Cotton Seed: “Hail: Scalable Genomic Association Analysis”. Room: BDI Seminar room 1.

11:00-12:00 Hands-on tutorial introducing potential users to what they can do in Hail. Room: BDI Seminar room 0.

12:00-13:00 Lunch break

13:00-14:00 Hands-on tutorial, focussing on high-level, operational and strategic questions involved in implementing Hail. Room: BDI Seminar room 0.
Date: 6 March 2018, 10:00 (Tuesday, 8th week, Hilary 2018)
Venue: Venue to be announced
Speaker: Dr Cotton Seed (Broad Institute of Harvard and MIT)
Organising department: Big Data Institute (NDM)
Organiser: Carol Mulligan-John (University of Oxford)
Part of: BDI seminars
Booking required?: Not required
Audience: Members of the University only
Editors: Graham Bagley, Hannah Freeman