On 28th November OxTalks will move to the new Halo platform and will become 'Oxford Events' (full details are available on the Staff Gateway).
There will be an OxTalks freeze beginning on Friday 14th November. This means you will need to publish any of your known events to OxTalks by then as there will be no facility to publish or edit events in that fortnight. During the freeze, all events will be migrated to the new Oxford Events site. It will still be possible to view events on OxTalks during this time.
If you have any questions, please contact halo@digital.ox.ac.uk
The use of Artificial Intelligence (AI), or more generally data-driven algorithms, has become ubiquitous in today’s society. Yet, in many cases, and especially when stakes are high, humans still make final decisions. The critical question, therefore, is whether AI helps humans make better decisions compared to a human-alone or AI-alone system. We introduce a new methodological framework to empirically answer this question with a minimal set of assumptions. We measure a decision maker’s ability to make correct decisions using standard classification metrics based on the baseline potential outcome. We consider a single-blinded and unconfounded treatment assignment, where the provision of AI-generated recommendations is assumed to be randomized across cases with humans making final decisions. Under this study design, we show how to compare the performance of three alternative decision-making systems —- human-alone, human-with-AI, and AI-alone. Importantly, the AI-alone system includes any individualized treatment assignment, including those not used in the original study. We also show when AI recommendations should be provided to a human-decision maker, and when one should follow such recommendations. We apply the proposed methodology to our own randomized controlled trial evaluating a pretrial risk assessment instrument. We find that the risk assessment recommendations do not improve the classification accuracy of a judge’s decision to impose cash bail. Furthermore, we find that replacing a human judge with algorithms —- the risk assessment score and a large language model in particular —- leads to a worse classification performance.