This presentation discusses the transformation of assessment development using AI, computational psychometric models, and engineering techniques. The cycle of test construction, administration, and scoring is labor-intensive, time-consuming, and costly, yet it is necessary to support high standards of validity evidence and, in some cases, to meet legislative requirements. We will hear about how AI generally, and large computational language models in particular, have penetrated various phases of the test development cycle, with a focus on item generation and test design for different skills and domains.
These generative AI-based language models consist of multi-layered neural networks. Generative Pre-trained Transformer 3 (GPT-3) is a well-known machine learning model that was (pre)trained on a dataset of 500 billion words and can generate stories, blogs, news reports, and chats that are indistinguishable from those written by humans. The advantage of pre-trained language models such as GPT-3 is that, once they have been built on a vast corpus of data and have learned the connections between words, they can be tuned for various purposes with relatively little data. This can be useful, for instance, in automated item generation (AIG) and text generation. Unlike the older rule-based approach to AIG, in which template elements are swapped out, pre-trained language models can create original content (texts and items). Using computational psychometrics (a blend of AI and psychometrics; von Davier, 2015, 2017), we can then produce estimates of item difficulties with very little pilot data (Attali et al., 2022).
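To make the contrast concrete, the older rule-based approach to AIG can be sketched as a template with slots whose elements are swapped out. The template, slot names, and vocabulary below are hypothetical illustrations, not taken from any actual assessment; a minimal sketch, assuming a simple fill-in-the-slots item model:

```python
# A minimal, hypothetical sketch of rule-based automated item generation
# (AIG): an item template with slots whose elements are swapped out to
# produce item variants. Generative language models, by contrast, can
# produce original text rather than recombinations of fixed elements.
import itertools

# Hypothetical sentence template for a language-assessment item.
TEMPLATE = "The {adjective} {noun} {verb} quickly across the field."

# Hypothetical pools of interchangeable elements for each slot.
SLOTS = {
    "adjective": ["small", "young", "frightened"],
    "noun": ["dog", "horse", "rabbit"],
    "verb": ["ran", "moved", "darted"],
}

def generate_items(template, slots):
    """Yield one item per combination of slot values."""
    keys = list(slots)
    for values in itertools.product(*(slots[k] for k in keys)):
        yield template.format(**dict(zip(keys, values)))

items = list(generate_items(TEMPLATE, SLOTS))
print(len(items))   # 3 x 3 x 3 = 27 variants
print(items[0])     # "The small dog ran quickly across the field."
```

Every generated item shares the same surface structure, which is exactly the limitation the abstract notes: such templates recombine fixed elements rather than creating genuinely original content.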
These applications will be illustrated with the “Item Factory”, a newly launched intelligent system that uses human-in-the-loop AI for test development for the Duolingo assessments.