25) The dopamine system implements a reward prediction error algorithm (Schultz – 1996, Sutton – 1988)
It used to be that the main thing anyone "knew" about the dopamine system was that it is important for motor control.
Parkinson's disease, which visibly manifests itself as motor tremors, is caused by the degeneration of dopamine neurons in the substantia nigra, so this was an understandable conclusion.
When Wolfram Schultz began recording from dopamine neurons in mice and monkeys, he had trouble finding correlations with his motor task. Was he doing something wrong? Was he even recording from the right cells?
Instead of toeing the line of dopamine = motor control, he set out to discover what this system really does. It turns out to be related to reward.
Schultz observed dopamine cells bursting at the onset of an unexpected reward. He also observed that, with training, this bursting shifts to a cue (e.g., a bell sound) indicating that a reward is forthcoming. When the reward cue occurs but no reward follows, the dopamine cells fall silent, dipping below their resting firing rate.
This pattern is quite interesting computationally. The dopamine signal mimics the error signal in a form of reinforcement learning called temporal difference learning, originally developed by Sutton. It is a powerful algorithm for learning to predict reward and to learn from errors in attaining it.
Temporal difference learning propagates reward prediction backward in time to the earliest reliable predictor, thus facilitating the process of attaining reward in the future.
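The three dopamine patterns Schultz observed fall out of the temporal difference error, delta = r + gamma*V(next state) - V(current state). The following is a minimal sketch, not Schultz's or Sutton's actual formulation: it assumes a toy trial with just a cue state and a reward-delivery state, and treats the TD error as the "dopamine" signal.

```python
gamma = 1.0   # no temporal discounting, for simplicity
alpha = 0.1   # learning rate
V_cue = 0.0   # learned value of the cue state
V_rew = 0.0   # learned value of the reward-delivery state

def run_trial(reward=1.0):
    """One cue -> reward trial; returns the TD errors (the 'dopamine' signal)."""
    global V_cue, V_rew
    # Cue onset: the cue itself arrives unpredictably, so its appearance
    # produces an error equal to the value it signals (a burst, once learned).
    d_onset = gamma * V_cue - 0.0
    # Transition from cue to reward time.
    d_cue = gamma * V_rew - V_cue
    V_cue += alpha * d_cue
    # Reward delivery (or omission, if reward=0).
    d_rew = reward - V_rew
    V_rew += alpha * d_rew
    return [d_onset, d_cue, d_rew]

first = run_trial()              # naive animal: big error at reward time
for _ in range(500):             # training trials
    run_trial()
trained = run_trial()            # trained: error has moved back to the cue
omitted = run_trial(reward=0.0)  # omission: negative error (dopamine dip)

print("first:  ", [round(e, 2) for e in first])    # burst at reward
print("trained:", [round(e, 2) for e in trained])  # burst at cue
print("omitted:", [round(e, 2) for e in omitted])  # dip at expected reward
```

Run in this order, the three printed error profiles reproduce the top, middle, and bottom panels of the figure below: the positive error migrates from reward delivery back to the cue, and omitting the predicted reward drives the error negative at the expected delivery time.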
Figure: (Top) No conditioned stimulus cue is given, so the reward is unexpected and there is a big dopamine burst. (Middle) The animal learns to predict the reward based on the cue and the dopamine burst moves to the cue. (Bottom) The reward is predicted, but since no reward occurs there is a depression in dopamine release.
Source: Figure 2 of Schultz, 1999. (News in Physiological Sciences, Vol. 14, No. 6, 249-255, December 1999)
Implication: The mind, largely governed by reward-seeking behavior on a continuum between controlled and automatic processing, is implemented in an electro-chemical organ with distributed and modular function consisting of excitatory and inhibitory neurons communicating via ion-induced action potentials over convergent and divergent synaptic connections altered by timing-dependent correlated activity often driven by expectation errors. The cortex, a part of that organ organized via local competition and composed of functional column units whose spatial dedication determines representational resolution, is composed of many specialized regions forming specialized networks involved in perception (e.g., touch: parietal, vision: occipital), action (e.g., frontal), and memory (e.g., short-term: prefrontal, long-term: temporal), which depend on inter-regional connectivity for functional integration, population vector summation for representational specificity, dopamine signals for reinforcement learning, and recurrent connectivity for sequential learning.
[This post is part of a series chronicling history's top brain computation insights (see the first of the series for a detailed description). See the history category archive to see all of the entries thus far.]