This talk concerns the challenge of evaluating intelligence in artificial systems such as GPT. While contemporary methods provide fine-grained assessments of task performance, they often fail to distinguish genuine intelligence from sophisticated mimicry, leading to familiar, long-standing debates about what to say about “Block Heads” and the Chinese Room. I think we can make headway on this stubborn issue by thinking more carefully about how various activities come to be performed, rather than focusing only on outward performance. In the context of LLMs, this suggests a path towards “deep benchmarking,” which requires attending to the mechanisms AI systems use to complete tasks, along with careful thinking about the conditions under which those mechanisms underwrite intelligent activity. Once we do that, how do things look with respect to chatbot intelligence? In my view, there are plausible mechanisms present in contemporary LLMs that underwrite intelligent activities, but those activities are a good way off from anything like semantic understanding, let alone AGI.