This paper examines the extent to which reported standard errors in empirical studies can be used to generate accurate interval and density predictions for estimates of closely related parameters in other studies. We form groups of baseline and validation studies whose parameters are linked through a hierarchical Bayes model and examine the coverage frequencies of the resulting predictive intervals. We regard the reported standard errors as collectively well calibrated if the coverage frequencies for the validation-study estimates match the nominal coverage probability, or if the probability integral transforms (PITs) of the validation-study estimates under the density predictions appear to be uniformly distributed. The assessment depends crucially on assumptions about how similar the estimands are across studies within a group. We consider two applications. If the grouped studies are assumed to estimate the same parameter, the reported standard errors understate the uncertainty. However, there is a degree of parameter heterogeneity under which the standard errors can be regarded as well calibrated.
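The calibration check described above can be illustrated with a minimal sketch. It assumes a simple normal hierarchical model that is not necessarily the paper's specification: each group has one baseline estimate `b` and one validation estimate `v`, the underlying parameters are drawn independently from N(mu, tau^2) with a flat prior on mu, and each estimate equals its parameter plus normal noise with the reported standard error. Under these assumptions the predictive distribution of `v` given `b` is N(b, se_b^2 + se_v^2 + 2 tau^2), and tau = 0 corresponds to the case where grouped studies estimate the same parameter. The function names (`predictive_sd`, `coverage_and_pits`, `simulate_pairs`) and the simulated data are hypothetical illustrations, not results from the paper.

```python
import math
import random
from statistics import NormalDist


def predictive_sd(se_base, se_valid, tau):
    # Assumed model: theta_b, theta_v ~ iid N(mu, tau^2), flat prior on mu,
    # b ~ N(theta_b, se_base^2), v ~ N(theta_v, se_valid^2).
    # Then v | b ~ N(b, se_base^2 + se_valid^2 + 2 * tau^2).
    return math.sqrt(se_base**2 + se_valid**2 + 2.0 * tau**2)


def coverage_and_pits(pairs, tau=0.0, level=0.95):
    """Coverage frequency of central predictive intervals and the PITs.

    pairs: iterable of (b, se_b, v, se_v) tuples, one per group.
    tau: assumed between-study standard deviation (0 = same parameter).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # e.g. about 1.96 for 95%
    hits = 0
    pits = []
    pairs = list(pairs)
    for b, se_b, v, se_v in pairs:
        sd = predictive_sd(se_b, se_v, tau)
        # Does the validation estimate fall inside b +/- z * sd?
        if abs(v - b) <= z * sd:
            hits += 1
        # PIT: predictive CDF evaluated at the realized validation estimate.
        pits.append(NormalDist(mu=b, sigma=sd).cdf(v))
    return hits / len(pairs), pits


def simulate_pairs(n, mu, tau, se, seed=0):
    # Hypothetical data generator: true heterogeneity tau, common standard error se.
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        theta_b = rng.gauss(mu, tau)
        theta_v = rng.gauss(mu, tau)
        pairs.append((rng.gauss(theta_b, se), se, rng.gauss(theta_v, se), se))
    return pairs


if __name__ == "__main__":
    pairs = simulate_pairs(20000, mu=0.0, tau=0.5, se=0.2, seed=1)
    cov_same, _ = coverage_and_pits(pairs, tau=0.0)  # assume identical parameters
    cov_het, pits = coverage_and_pits(pairs, tau=0.5)  # allow heterogeneity
    print(f"coverage (tau=0): {cov_same:.3f}, coverage (tau=0.5): {cov_het:.3f}")
```

In this simulated setting, treating the parameters as identical (tau = 0) should produce coverage well below the nominal 95%, mirroring the sense in which standard errors "understate the uncertainty", while allowing the correct amount of heterogeneity should bring coverage close to nominal. Uniformity of the PITs could then be checked, for example, with a histogram or a Kolmogorov-Smirnov test.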