Robust Research – A practical guide – Verena Heise, DPhil, NDPH Intermediate Fellow, Nuffield Department of Population Health
Are most published research findings false? Why should we care? And is there anything we can do about it? In this talk I will give an overview of some practical solutions such as open science and good research practices that can help make our research findings more robust. While there are a number of solutions that can be implemented by individual researchers, there are wider issues, for example around skills training and incentives, that require cultural change. To lobby for this change we have started the cross-divisional initiative Reproducible Research Oxford and I will briefly outline our current and planned activities.
Same Data – Different Software – Different Results? Analytic Variability of Group fMRI Results – Alex Bowring, DPhil student
A wealth of analysis tools is available to fMRI researchers to extract patterns of task-related variation and, ultimately, understand cognitive function. However, this ‘methodological plurality’ comes with a drawback: while conceptually similar, two different analysis pipelines applied to the same dataset may not produce the same scientific results. Differences in methods, implementations across software, and even operating systems or software versions all contribute to this variability. Consequently, attention in the field has recently turned to reproducibility and data sharing.
In this work, our goal is to understand how the choice of software package impacts analysis results. We use publicly shared data from three published task fMRI neuroimaging studies, reanalyzing each study with the three main neuroimaging software packages, AFNI, FSL and SPM, using both parametric and nonparametric inference. We obtain all information on how to process, analyze, and model each dataset from the publications. We make quantitative and qualitative comparisons between our replications to gauge the scale of variability in our results and assess the fundamental differences between the software packages. Qualitatively, we find similarities between packages, backed up by Neurosynth association analyses that correlate similar words and phrases with all three packages’ unthresholded results for each of the studies we reanalyze. However, we also discover marked differences, such as Dice similarity coefficients ranging from 0.000 to 0.684 in comparisons of thresholded statistic maps between software. We discuss the challenges involved in reanalyzing the published studies, and highlight our efforts to make this research reproducible.
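The Dice similarity coefficient mentioned above measures the overlap between two thresholded statistic maps treated as binary masks: twice the number of voxels active in both maps, divided by the total number of voxels active in each. A minimal sketch (the function name and the toy masks are illustrative, not taken from the study's pipeline):

```python
import numpy as np

def dice_coefficient(mask_a, mask_b):
    """Dice similarity between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # convention: two empty masks overlap perfectly
    return 2.0 * np.logical_and(a, b).sum() / denom

# Toy thresholded maps: True = voxel survives the statistical threshold.
# In practice these would come from, e.g., FSL and SPM analyses of one study.
fsl_mask = np.array([1, 1, 0, 1, 0], dtype=bool)
spm_mask = np.array([1, 0, 0, 1, 1], dtype=bool)
print(round(dice_coefficient(fsl_mask, spm_mask), 3))  # 2*2/(3+3) ≈ 0.667
```

A coefficient of 1.0 means the two software packages identify identical sets of suprathreshold voxels; 0.0, as observed in some of the comparisons above, means the thresholded maps share no voxels at all.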