SIMON: open-source software for application of machine learning to biomedical data - Dr Adriana Tomic

This course covers how to use SIMON, recently developed open-source software for applying machine learning to biological and clinical data. SIMON combines an intuitive graphical user interface with a standardized, automated machine learning approach, so that non-technical researchers can identify patterns and extract knowledge from high-dimensional data and build thousands of high-quality predictive models using 180+ machine learning algorithms. Its standardized pipelines for machine learning and other statistical analysis methods help identify optimal algorithms, giving both non-technical and technical researchers a resource for finding crucial patterns in biomedical data.

Topics to be covered:
– Data preparation and integration
– Overfitting and how to avoid it
– Feature processing methods to avoid the ‘curse of dimensionality’
– How to deal with missing data using the built-in multi-set intersection algorithm
– Performing machine learning (including the automated ML option)
– Performance metrics, evaluation and selection of high-quality models
– Feature selection: scoring and elimination
– Exploratory analysis

Learning Objectives:
– Complete an end-to-end machine learning analysis using SIMON
– Learn how to prepare data for analysis
– Understand the importance of reducing dimensionality using appropriate methods
– Select appropriate machine learning algorithms
– Learn how to properly evaluate predictive models using performance metrics
– Select the most important features
– Perform exploratory analysis

Type of session:
Short presentation, followed by a practical, hands-on SIMON demo session using a provided biomedical dataset

Software required:
Docker (version 17.05 or later)
SIMON (latest version)
Type: Seminar Series
Series organiser: Sarah Laseke (Big Data Institute)
Timing: 5 and 12 May, 11 am to 12 noon
Organising department: Big Data Institute (NDPH)


Editor: Sarah Laseke