Opponent-Shaping and Interference in General-Sum Games
In general-sum games, the interaction of self-interested learning agents commonly leads to collectively worst-case outcomes, such as defect-defect in the iterated prisoner's dilemma (IPD). To overcome this, some methods, such as Learning with Opponent-Learning Awareness (LOLA), shape their opponents' learning process. However, these methods are myopic, since they can only anticipate a small number of learning steps; asymmetric, since they treat other agents as naïve learners; and they require higher-order derivatives, computed via white-box access to an opponent's differentiable learning algorithm.
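To see why naïve self-interested learning fails here, consider the single-round prisoner's dilemma. The sketch below uses one standard payoff assignment (the specific numbers are an illustrative assumption, not taken from the talk abstract):

```python
# Payoffs (row player, column player) for one round of the
# prisoner's dilemma; C = cooperate, D = defect.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def best_response(opponent_action):
    """Return the row player's payoff-maximising action against a
    fixed opponent action."""
    return max(("C", "D"), key=lambda a: PAYOFFS[(a, opponent_action)][0])

# Defection dominates against either opponent action, so two naive
# self-interested learners converge to (D, D) with payoff (1, 1),
# even though mutual cooperation would give both players 3.
assert best_response("C") == "D" and best_response("D") == "D"
```

Opponent-shaping methods try to escape this equilibrium by influencing how the other agent's policy changes over time, rather than treating it as fixed.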

In this talk I will first introduce Model-Free Opponent Shaping (M-FOS), which overcomes all of these limitations. M-FOS learns in a meta-game in which each meta-step is an episode of the underlying ("inner") game. The meta-state consists of the inner policies, and the meta-policy produces a new inner policy to be used in the next episode. M-FOS then uses generic model-free optimisation methods to learn meta-policies that accomplish long-horizon opponent shaping.
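The meta-game structure described above can be sketched minimally as follows. This is an illustrative toy, not the M-FOS implementation: inner policies are reduced to a single cooperation probability, the opponent is a hypothetical naïve learner, and all function names are assumptions:

```python
import random

def play_episode(shaper_param, opponent_param, rounds=5):
    """One inner-game episode: a short iterated prisoner's dilemma.
    Each parameter is a probability of cooperating (a stand-in for a
    full inner policy). Returns the shaper's total payoff."""
    total = 0.0
    for _ in range(rounds):
        a = random.random() < shaper_param    # True = cooperate
        b = random.random() < opponent_param
        # Standard IPD payoffs for the row player (the shaper).
        total += {(True, True): 3, (True, False): 0,
                  (False, True): 5, (False, False): 1}[(a, b)]
    return total

def naive_opponent_update(opponent_param, shaper_param, lr=0.1):
    """Hypothetical naive learner: drifts toward matching the
    shaper's cooperation rate between episodes."""
    return opponent_param + lr * (shaper_param - opponent_param)

def meta_step(meta_state, meta_policy):
    """One meta-step = one inner episode. The meta-state is the pair
    of inner policies; the meta-policy maps it to the shaper's next
    inner policy, and the opponent then updates in response."""
    shaper_param, opponent_param = meta_state
    new_shaper = meta_policy(shaper_param, opponent_param)
    reward = play_episode(new_shaper, opponent_param)
    new_opponent = naive_opponent_update(opponent_param, new_shaper)
    return (new_shaper, new_opponent), reward
```

In M-FOS proper, the meta-policy would be trained with generic model-free optimisation on the meta-reward accumulated across many such meta-steps, which is what enables long-horizon shaping without higher-order derivatives.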

I will finish the talk with our recent results on adversarial (or cooperative) cheap talk: how can agents interfere with (or support) the learning process of other agents without being able to act in the environment?
Date: 7 February 2024, 12:30 (Wednesday, 4th week, Hilary 2024)
Venue: Please register to receive venue details
Speaker: Professor Jakob Foerster (Engineering Science, Oxford)
Host: Dr Caroline Green (Oxford)
Part of: Ethics in AI Lunchtime Seminars
Booking required?: Required
Booking url: https://www.oxford-aiethics.ox.ac.uk/ethics-ai-lunchtime-seminars-opponent-shaping-and-interference-general-sum-games
Booking email: aiethics@philosophy.ox.ac.uk
Cost: Free
Audience: Public
Editors: Marie Watson, Lauren Czerniawska