Whether because of a lack of data or knowledge, or because of high sensitivity to some free parameters, simulations often produce prediction intervals so wide that “almost anything can happen”. The standard approach to tame this problem is to look at some “average” run or behaviour, and perhaps to rank policies by their average performance. Even when prediction intervals are large, these averages tend to converge quickly, giving the impression that such rankings are both easy to produce and informative. I would like to discuss how these averages are actually uninformative when, as is almost surely the case, the priors on the parameters are not well thought out (for example, when a uniform prior is used as a substitute for missing knowledge). An average over runs is weighted by those priors, so a ranking built on it inherits their arbitrariness. Rather, the best approach is to perform a thorough exploration of the parameter space, partition the problem into separate scenarios, identify the key parameters that divide them, and exploit the simulation’s cues to produce better policies.
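To make the argument concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration: the `payoff` function stands in for a real simulation, `theta` is a single hypothetical free parameter in [0, 1], and the two policies ("safe" and "risky") and both priors are invented. It shows the average-based ranking flipping when the prior changes, and then a plain parameter sweep recovering the boundary between the two scenarios.

```python
import numpy as np

def payoff(policy, theta):
    """Hypothetical toy 'simulation': payoff of a policy for parameter theta."""
    theta = np.asarray(theta, dtype=float)
    if policy == "safe":
        # The safe policy pays the same everywhere in parameter space.
        return np.ones_like(theta)
    # The risky policy pays off only in the high-theta regime.
    return np.where(theta > 0.6, 3.0, 0.0)

def average_ranking(theta_samples):
    """Rank policies by mean payoff over parameter draws from some prior."""
    means = {p: float(payoff(p, theta_samples).mean()) for p in ("safe", "risky")}
    best = max(means, key=means.get)
    return best, means

rng = np.random.default_rng(0)
n = 100_000

# Prior 1: uniform on [0, 1], used as a stand-in for "no knowledge".
uniform_best, uniform_means = average_ranking(rng.uniform(0.0, 1.0, n))

# Prior 2: Beta(2, 5), an equally arbitrary choice that favours low theta.
skewed_best, skewed_means = average_ranking(rng.beta(2.0, 5.0, n))

print("uniform prior:", uniform_best, uniform_means)
print("skewed prior: ", skewed_best, skewed_means)

# Parameter-space exploration: sweep theta on a grid and locate the region
# where the risky policy beats the safe one, without relying on any prior.
grid = np.linspace(0.0, 1.0, 101)
risky_wins = payoff("risky", grid) > payoff("safe", grid)
threshold = float(grid[risky_wins].min())
print("risky policy wins for theta at or above roughly", round(threshold, 2))
```

Both averages converge quickly with this many draws, yet they disagree about which policy is best. The sweep instead reports a scenario boundary in `theta`, which is the kind of output the approach argued for here would hand to a decision maker.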