Economic decision-making through distributional dopaminergic codes

Effective economic decision-making depends on the brain’s capacity to integrate uncertainty about the state of the world and reward/loss outcomes. Various theories, such as expected utility and prospect theory, have been proposed to model economic preferences of the agents. However, a full characterization of the specific preference measures employed by agents has yet to be achieved. Moreover, while progress in neuroeconomics has been substantial, critical gaps persist in understanding how dopaminergic system computes such preference measures. Here, we present a framework motivated by an aspiration to bridge an analytical gap between system and cognitive level perspectives. We focus on multi-armed bandit tasks where we assume that the reward distribution in each arm is represented by the expected values of the encoding functions of dopamine neurons, a representational scheme referred to as distributed distributional coding (DDC). We extend the temporal difference (TD) learning to distributional DDC-TD learning and propose that the neural response of dopamine neurons can be characterized by the DDC-TD error. Using dopaminergic recordings from the ventral tegmental area (VTA) in mice, we show that unit activity broadly accords with the predictions of DDC-TD model. Building on these empirical observations, we go on to detail a neural network that utilizes DDC values associated with alternative choice options as input. In conjunction with a simple learning rule, this DDC network can infer a large class of preference/risk measures that an agent may adopt, including expected utility, conditional value at risk (expected tail gain), and compound utility-risk measures. A core feature of the network is that a dopaminergic preference computation emerges intrinsically within the network’s configuration, enabling any inferred preference measure to be computed from a mapping of the linear readouts of dopamine neurons. Thus, our framework provides a robust approach for inferring agents’ preference measures as well as detailing how dopaminergic system might compute these probabilistic measures.