Strengths and weaknesses of large-sample machine learning for hydroclimatic extremes

Recent advances in large-sample hydrology using machine learning (ML) have enabled significant progress by overcoming the constraints of limited observations at individual locations. Large-sample models typically combine historical discharge records from thousands of stream gauges with a wide range of environmental and human-impact data. They learn relationships across diverse environments, which improves model performance. At the core of our models is the GRIT river network, a new global bifurcating river network that is essential for accurate flood modelling. Our research addresses three challenges: (1) evaluating the strengths and weaknesses of ML for estimating extreme events in both data-rich and data-sparse regions; (2) attributing the mechanisms that drive extreme events by comparing different explainability methods and exploring the effects of human impacts; and (3) improving event prediction and projection by assessing the extent to which ML models trained on climate model outputs can correct biases and deliver reliable predictions. We also discuss challenges in large-sample modelling, such as data biases, inconsistencies between explainability methods, and the difficulty of establishing causality. By integrating ML techniques with global datasets, large-sample hydrology offers new opportunities for understanding and managing hydroclimatic extremes.