Incomplete Preferences and Dynamic Consistency: Theory and Applications to AI Safety


The AI shutdown problem is, roughly, the problem of getting future advanced (agentic) AI systems to shut down when and only when we want them to. Thornley (2023) proves theorems showing that satisfying some rather minimal rationality conditions precludes an agent from being both capable and shutdownable. By dropping the completeness axiom of expected utility theory, however, he is able to characterise an agent that is both capable and shutdownable. A seemingly relevant condition for an agent to be capably goal-directed is that it avoids sequences of actions that foreseeably leave it worse off. My project proposes a choice rule that, as I show, guarantees the dynamic consistency of agents with incomplete preferences.
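To illustrate the kind of failure at issue, here is a minimal Python sketch under assumptions of my own: the decision tree, the names STRICT_PREF and admissible_plans, and the screening rule are all illustrative, not the choice rule proposed in the talk. An agent with incomplete preferences that chooses myopically at each node can be walked through a "single-souring money pump": it trades A for an incomparable B, then trades B for a soured A_minus, ending up foreseeably worse off than it started. Applying maximality to whole plans instead screens out the dominated sequence.

```python
# Illustrative sketch only -- not the choice rule proposed in the talk.
# Incomplete preferences are modelled as a strict partial order, and
# maximality is applied to whole plans rather than to individual trades.

# Outcomes: A = the agent's initial holding, B = an incomparable
# alternative, A_minus = a strictly soured version of A.
STRICT_PREF = {("A", "A_minus")}  # completeness fails: A vs B is unranked

def strictly_preferred(x, y):
    """True iff outcome x is strictly preferred to outcome y."""
    return (x, y) in STRICT_PREF

# Available plans in a two-step decision tree:
# step 1: keep A or trade A for B;
# step 2 (if holding B): keep B or trade B for A_minus.
PLANS = {
    "keep A": "A",
    "trade to B, then keep B": "B",
    "trade to B, then trade to A_minus": "A_minus",
}

def admissible_plans(plans):
    """A plan is admissible iff no available plan's outcome is strictly
    preferred to its own outcome (maximality over whole plans)."""
    return {
        name: outcome
        for name, outcome in plans.items()
        if not any(strictly_preferred(other, outcome)
                   for other in plans.values())
    }

print(admissible_plans(PLANS))
# {'keep A': 'A', 'trade to B, then keep B': 'B'}
# The plan ending in A_minus is ruled out up front, even though each
# individual trade along it is between mutually incomparable options.
```

Evaluating complete plans rather than isolated trades is one standard way to protect agents with incomplete preferences from such sequences; how to secure dynamic consistency in this setting is precisely the question the proposed choice rule addresses.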