As AI systems grow more capable, it becomes increasingly urgent to ensure that they act consistently with human values – the challenge of “AI alignment.” In some respects, this challenge seems structurally analogous to that of fostering the moral development of human beings. For this reason, the world’s great religious and philosophical traditions, which over many centuries have learned valuable lessons about moral cultivation, may have something to offer technologists seeking to train safe and trustworthy AI systems.
In Buddhism, the development of moral discipline centrally involves the cultivation of certain emotions. There is no agreement about whether computer systems, even if highly intelligent, can literally have emotions. But functional analogues of moral emotions may be useful for addressing practical alignment challenges.
It would plausibly be quite helpful if we could get AIs to act from lovingkindness and compassion. This talk, however, will focus on two less well-known moral emotions, which Buddhists call hiri and ottappa, commonly but imprecisely translated as “shame” and “embarrassment.” Current methodologies for AI alignment appear well suited to produce only the second of these; for building trustworthy AI, however, the first, hiri, is likely to be more important.
In humans, recognition of our moral errors can produce painful emotions such as guilt and shame. But trying to build computer systems that undergo functional analogues of guilt and shame could have serious disadvantages. A third, related emotion, remorse, may avoid these drawbacks; the primary referent of hiri can then be defined as “anticipatory remorse-proneness.” In advanced spiritual practitioners, however, hiri manifests not as remorse-proneness itself but as its maturation and transcendence. If an analogue of this latter state can be developed in highly capable AI systems, that could go far towards ensuring that they do not pose lethal threats to humanity’s future.