Incomplete Preferences and Dynamic Consistency: Theory and Applications to AI Safety


The AI shutdown problem is, roughly, the problem of getting future advanced (agentic) AI systems to shut down when and only when we want them to. Thornley (2023) proves theorems showing that satisfying some rather minimal rationality conditions precludes an agent from being both capable and shutdownable. By dropping the completeness axiom of expected utility theory, however, he is able to characterise an agent that is both capable and shutdownable. A seemingly relevant condition for an agent to be capably goal-directed is that it avoids sequences of actions that foreseeably leave it worse off. My project proposes a choice rule that, as I show, guarantees the dynamic consistency of agents with incomplete preferences.
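To illustrate the kind of failure at issue, here is a minimal Python sketch under assumptions of my own: the decision tree, the names STRICT_PREF and admissible_plans, and the screening rule are all illustrative, not the choice rule proposed in the talk. An agent with incomplete preferences that chooses myopically at each node can be walked through a "single-souring money pump": it trades A for an incomparable B, then trades B for a soured A_minus, ending up foreseeably worse off than it started. Applying maximality to whole plans instead screens out the dominated sequence.

```python
# Illustrative sketch only -- not the choice rule proposed in the talk.
# Incomplete preferences are modelled as a strict partial order, and
# maximality is applied to whole plans rather than to individual trades.

# Outcomes: A = the agent's initial holding, B = an incomparable
# alternative, A_minus = a strictly soured version of A.
STRICT_PREF = {("A", "A_minus")}  # completeness fails: A vs B is unranked

def strictly_preferred(x, y):
    """True iff outcome x is strictly preferred to outcome y."""
    return (x, y) in STRICT_PREF

# Available plans in a two-step decision tree:
# step 1: keep A or trade A for B;
# step 2 (if holding B): keep B or trade B for A_minus.
PLANS = {
    "keep A": "A",
    "trade to B, then keep B": "B",
    "trade to B, then trade to A_minus": "A_minus",
}

def admissible_plans(plans):
    """A plan is admissible iff no available plan's outcome is strictly
    preferred to its own outcome (maximality over whole plans)."""
    return {
        name: outcome
        for name, outcome in plans.items()
        if not any(strictly_preferred(other, outcome)
                   for other in plans.values())
    }

print(admissible_plans(PLANS))
# {'keep A': 'A', 'trade to B, then keep B': 'B'}
# The plan ending in A_minus is ruled out up front, even though each
# individual trade along it is between mutually incomparable options.
```

Evaluating complete plans rather than isolated trades is one standard way to protect agents with incomplete preferences from such sequences; how to secure dynamic consistency in this setting is precisely the question the proposed choice rule addresses.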