A Brief Introduction to Reinforcement Learning

Computational models that are implemented, i.e., written out as equations or software, are an increasingly important tool for the cognitive neuroscientist.  This is because implemented models are, effectively, hypotheses that have been worked out to the point where they make quantitative predictions about behavior and/or neural activity.

In earlier posts, we outlined two computational models of learning hypothesized to occur in various parts of the brain, i.e., Hebbian-like LTP (here and here) and error-correction learning (here and here). The computational model described in this post contains hypotheses about how we learn to make choices based on reward.

The goal of this post is to introduce a third type of learning: Reinforcement Learning (RL).  RL is hypothesized by a number of cognitive neuroscientists to be implemented by the basal ganglia/dopamine system.  It has become somewhat of a hot topic in Cognitive Neuroscience and received a lot of coverage at this past year’s Computational Cognitive Neuroscience Conference.


Grand Challenges of Neuroscience: Day 2

Topic 2: Conflict and Cooperation

Generally, cognitive neuroscience aims to explain how mental processes such as believing, knowing, and inferring arise in the brain and affect behavior.  Two behaviors that have important effects on the survival of humans are cooperation and conflict.

According to the NSF committee convened last year, conflict and cooperation is an important focus area for future cognitive neuroscience work.  Although research in this area has typically been the domain of psychologists, it seems that the time is ripe to apply findings from neuroscience to ground psychological theories in the underlying biology.

Neuroscience has produced a large amount of information about the brain regions that are relevant to social interactions.  For example, the amygdala has been shown to be involved in strong emotional responses.  The “mirror” neuron system in the frontal lobe allows us to put ourselves in someone else’s shoes by allowing us to understand their actions as though they were our own.  Finally, the superior temporal gyrus and orbitofrontal cortex, normally involved in language and reward respectively, have also been shown to be involved in social behaviors.


The committee has left it up to us to come up with a way to study these phenomena! How can we study conflict and cooperation from a cognitive neuroscience perspective?

At least two general approaches come to mind. The first is fMRI studies in which social interactions are simulated (or carried out remotely) over a computer link to the experiment participant.  A range of studies of this sort have recently begun to appear investigating trust and decision-making in social contexts.

The second general approach that comes to mind is to use neurocomputational simulations of simple acting organisms with common or differing goals.  Over the past few years, researchers have been carrying out studies with multiple interacting "agents" that "learn" through the method of Reinforcement Learning.

Reinforcement Learning is an artificial intelligence algorithm that allows "agents" to develop behaviors through trial and error as they attempt to meet some goal, which provides reward in the form of positive numbers.  Each agent is a small program with a state (e.g., location, sensory input) and a memory, or "value function," which keeps track of how much numerical reward it expects to obtain by choosing each possible action.
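The idea above can be sketched in a few lines of code. This is a minimal, illustrative example only (the two-action "bandit" task, the class name, and all parameter values are assumptions for demonstration, not anything described in the post): the agent stores a value estimate for each action, mostly picks the action it currently values most, and nudges its estimates toward the rewards it actually receives.

```python
import random

random.seed(0)  # for reproducibility of this toy run

class Agent:
    """A minimal trial-and-error learner with a tabular value function."""

    def __init__(self, n_actions, alpha=0.1, epsilon=0.1):
        self.q = [0.0] * n_actions   # expected reward per action (the "value function")
        self.alpha = alpha           # learning rate: how fast estimates move
        self.epsilon = epsilon       # probability of exploring a random action

    def choose(self):
        # Mostly exploit the highest-valued action; occasionally explore.
        if random.random() < self.epsilon:
            return random.randrange(len(self.q))
        return max(range(len(self.q)), key=lambda a: self.q[a])

    def learn(self, action, reward):
        # Move the stored value estimate a small step toward the observed reward.
        self.q[action] += self.alpha * (reward - self.q[action])

# Toy environment: action 1 always pays reward 1, action 0 pays nothing.
agent = Agent(n_actions=2)
for _ in range(500):
    a = agent.choose()
    agent.learn(a, 1.0 if a == 1 else 0.0)

print(agent.q)  # the estimate for action 1 ends up well above action 0
```

Through nothing but trial and error, the agent's value function comes to reflect which action is worth taking, which is the core of the learning mechanism described above.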

Although normally thought to be of interest only to computer scientists, Reinforcement Learning has recently attracted the attention of cognitive neuroscientists because of emerging evidence that something like it might be used in the brain.

By providing these agents with a goal that can only be achieved through some measure of cooperation, or that puts them under competitive pressure, issues of conflict and cooperation can be studied in a perfectly controlled computer simulation environment.
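A hedged sketch of what such a simulation might look like: two independent learners are rewarded only when their choices match, so coordination has to emerge from trial and error. The coordination game and every parameter value here are illustrative assumptions, not a description of any particular published study.

```python
import random

random.seed(1)  # deterministic toy run

def make_agent(n_actions=2):
    # Each agent is just a value estimate per action.
    return [0.0] * n_actions

def choose(q, epsilon=0.1):
    # Mostly pick the best-valued action; occasionally explore.
    if random.random() < epsilon:
        return random.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])

def learn(q, action, reward, alpha=0.1):
    # Nudge the chosen action's value toward the reward received.
    q[action] += alpha * (reward - q[action])

a_q, b_q = make_agent(), make_agent()
for _ in range(1000):
    a, b = choose(a_q), choose(b_q)
    r = 1.0 if a == b else 0.0   # reward only when the agents cooperate
    learn(a_q, a, r)
    learn(b_q, b, r)

best_a = max(range(2), key=lambda i: a_q[i])
best_b = max(range(2), key=lambda i: b_q[i])
print(best_a == best_b)  # the two agents settle on the same action
```

Because the reward structure is fully under the experimenter's control, variants of this setup (shared versus conflicting goals, limited communication, and so on) can probe exactly the questions about conflict and cooperation raised above.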


History’s Top Brain Computation Insights: Day 18

[Figure: Reaction times for a visual search task illustrating controlled and automatic processing]

18) Behavior exists on a continuum between controlled and automatic processing (Schneider & Shiffrin – 1977)

During the 1970s those studying the cognitive computations underlying visual search were at an impasse. One group of researchers claimed that visual search was a flat search function (i.e., adding more distracters doesn't increase search time), while another group claimed that the function was linear (i.e., adding more distracters increases search time linearly).

Both groups had solid evidence supporting their view. What were the two groups doing differently that could explain such different results?

As a graduate student working with Shiffrin, Schneider sat the two groups down during a scientific conference to have them figure out why their results differed so much. Needless to say, little was accomplished as both sides talked past one another.

Several years later Schneider & Shiffrin came to the realization that the two groups were practicing their subjects differently. The group with the flat search function allowed their subjects to practice the search task many times before collecting data. In contrast, the group with the linear search function began collecting data as soon as their subjects could perform the task.

This realization led Schneider & Shiffrin to posit a distinction between automatic (flat search function) and controlled (linear search function) processing. In a landmark set of papers they clearly demonstrated this dual-process distinction along with the boundary conditions of controlled and automatic task performance.


History’s Top Brain Computation Insights: Day 8

[Figure: A dog being trained to jump on command over the course of 20 minutes]

8) Reward-based reinforcement learning can explain much of behavior (Skinner – 1938, Thorndike – 1911, Pavlov – 1905)

B. F. Skinner showed that reward governs much of human and animal behavior. He discovered operant conditioning, a method for manipulating behavior so powerful he could teach a pigeon to bowl (or a dog to jump on command; see figure). This was an expansion of Thorndike's Law of Effect, which says that behaviors associated with a reward will be reinforced, while behaviors associated with no reward will not. Both Skinner and Thorndike expanded on earlier work by Pavlov, who showed that some reflexes can be conditioned using paired stimulus presentation (e.g., repeatedly pairing a bell with food later causes salivation to the bell alone).

Implication: The mind, largely governed by reward-seeking behavior, is implemented in an organ with distributed and modular function consisting of excitatory and inhibitory neurons communicating via electro-chemical synaptic connections.

[This post is part of a series chronicling history's top brain computation insights (see the first of the series for a detailed description)]