As AI systems grow more capable, it becomes increasingly urgent to ensure that they act consistently with human values – the challenge of “AI alignment.” In some respects, this challenge seems structurally analogous to that of fostering the moral development of human beings. For this reason, the world’s great religious and philosophical traditions, which over many centuries have learned valuable lessons about moral cultivation, may have something to offer technologists seeking to train safe and trustworthy AI systems.
In Buddhism, the development of moral discipline centrally involves the cultivation of certain emotions. There is no agreement about whether computer systems, even if highly intelligent, can literally have emotions. But functional analogues of moral emotions may be useful for addressing practical alignment challenges.
It would plausibly be quite helpful if we could get AIs to act from lovingkindness and compassion. This talk, however, will focus on two less well-known moral emotions, which Buddhists call hiri and ottappa, commonly but imprecisely translated as “shame” and “embarrassment.” Current methodologies for AI alignment appear well suited to produce only the second of these; for building trustworthy AI, however, the first, hiri, is likely to be more important.
In humans, recognition of our moral errors can produce painful emotions such as guilt and shame. But trying to build computer systems that undergo functional analogues of guilt and shame could have serious disadvantages. A third, related emotion, remorse, may avoid these drawbacks; the primary referent of hiri can then be defined as “anticipatory remorse-proneness.” In advanced spiritual practitioners, however, hiri manifests not as remorse-proneness itself but as its maturation and transcendence. If an analogue of this latter state can be developed in highly capable AI systems, that could go far towards ensuring that they do not pose lethal threats to humanity’s future.