As AI systems grow more capable, it becomes increasingly urgent to ensure that they act consistently with human values – the challenge of “AI alignment.” In some respects, this challenge seems structurally analogous to that of fostering the moral development of human beings. For this reason, the world’s great religious and philosophical traditions, which over many centuries have learned valuable lessons about moral cultivation, may have something to offer to technologists seeking to train safe and trustworthy AI systems.
In Buddhism, the development of moral discipline centrally involves the cultivation of certain emotions. There is no agreement about whether computer systems, even if highly intelligent, can literally have emotions. But functional analogues of moral emotions may be useful for addressing practical alignment challenges.
It would, plausibly, be quite helpful if we could get AIs to act from lovingkindness and compassion. This talk, however, will focus on two less well-known moral emotions, which Buddhists call hiri and ottappa, commonly but imprecisely translated as “shame” and “embarrassment.” Current methodologies for AI alignment appear well suited to produce only the second of these; but for building trustworthy AI, the first, hiri, is likely to be more important.
In humans, recognition of our moral errors can produce painful emotions such as guilt and shame. But building computer systems that undergo functional analogues of guilt and shame could have serious disadvantages. There appears, however, to be a third, closely related emotion: remorse. The primary referent of hiri could then be defined as “anticipatory remorse-proneness.” In advanced spiritual practitioners, hiri manifests instead as a maturation and transcendence of anticipatory remorse-proneness. If an analogue of this last state can be developed in highly capable AI systems, it could go far towards ensuring that they do not pose lethal threats to humanity’s future.