SLIDE 1

Inter-individual variability in human feedback learning

Financial Education and Investor Behavior Conference

Rio de Janeiro - 7/12/2015

Stefano Palminteri, PhD

stefano.palminteri@gmail.com Institute of Cognitive Science (UCL, London) Laboratoire de Neurosciences Cognitives (ENS, Paris)

SLIDE 2

Thorndike, Skinner, Sutton, Barto etc…

Reinforcement learning (I)

Evolution

  • C. Elegans
  • A. Mellifera
  • M. Musculus
  • H. Sapiens

An evolutionarily pervasive psychological process: learning by trial and error to select actions that maximize the occurrence of pleasant events (rewards) and minimize the occurrence of unpleasant events (punishments)

SLIDE 3

Learning is driven by prediction errors, and choices are made by comparing action values.

Policy: P(A)t=1/(1+exp((V(B)t-V(A)t)/β))

Prediction error: PEt=R-V(A)t

Learning rule: V(A)t+1=V(A)t+αPEt

[Figures: P(A)t as a function of the value difference V(A)t-V(B)t (softmax curve); prediction error PEt and prediction V(A)t across 20 trials]

Reinforcement learning (II)

Q-learning or Rescorla-Wagner model (RW)

SLIDE 4

Fundamental dimensions: positive/negative and exploration/exploitation

Decision rule: P(A)t=1/(1+exp((V(B)t-V(A)t)/β))
Prediction error: PEt=R-V(A)t
Learning rule: V(A)t+1=V(A)t+αPEt

Reinforcement learning (III)

• 1 Positive prediction errors vs. 2 negative prediction errors
• 1 Exploit previous knowledge vs. 2 explore new options
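The decision rule, prediction error and learning rule on this slide fit in a few lines of Python. This is a minimal illustrative sketch, not the authors' code; the function names, parameter values (α=0.3, β=0.1) and two-option bandit setup are assumptions.

```python
import math
import random

def softmax_choice_prob(v_a, v_b, beta):
    """Policy: P(A)t = 1 / (1 + exp((V(B)t - V(A)t) / beta))."""
    return 1.0 / (1.0 + math.exp((v_b - v_a) / beta))

def rescorla_wagner(rewards_a, rewards_b, alpha=0.3, beta=0.1, seed=0):
    """Simulate a two-armed bandit with the Rescorla-Wagner rule."""
    rng = random.Random(seed)
    v = {"A": 0.0, "B": 0.0}
    choices = []
    for r_a, r_b in zip(rewards_a, rewards_b):
        p_a = softmax_choice_prob(v["A"], v["B"], beta)
        choice = "A" if rng.random() < p_a else "B"
        reward = r_a if choice == "A" else r_b
        pe = reward - v[choice]   # prediction error: PEt = R - V(A)t
        v[choice] += alpha * pe   # learning rule: V <- V + alpha * PEt
        choices.append(choice)
    return v, choices
```

With rewards_a always 1 and rewards_b always 0, v["A"] climbs toward 1 while v["B"] stays at 0, and choices increasingly favor A.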

SLIDE 5

The general idea: Can “low level” reinforcement learning biases explain ‘high level’ behavioral biases?

The framework (I)

Erev, Camerer, Schultz, etc.

[Spectrum: “low level” (motor learning) to “high level” (economy)]

Reinforcement learning processes have been shown to operate at different levels of human behavior.
SLIDE 6

[Diagram: agent-environment loop. Context (s1,…sj) and options (a1,…ai) → option values V(sj,ai) → selection → choice probabilities P(sj,ai) → action (a) → obtained outcomes → update (1)]

1 Learning from direct experience (“factual”)

Decision biases → learning biases!

The framework (II)

SLIDE 7

Today special question: good news/bad news effect

Optimism bias (I)

Sharot et al.: revising beliefs as a function of information that is:

  • Better than expected (PE>0)
  • Worse than expected (PE<0)

Insensitivity to negative prediction errors can generate:

  • Inflated likelihood estimates for desired events
  • Reduced likelihood estimates for undesired events

[Figure: Belief(t) + Information(t) → Belief(t+1); data split by PE<0 vs. PE>0]

"It is the peculiar and perpetual error of the human understanding to be more moved and excited by affirmatives than negatives; whereas it ought properly to hold itself indifferently disposed towards both alike" (p. 36).

Francis Bacon (1620)

SLIDE 8

Current hypothesis: asymmetric learning from positive and negative prediction errors as an atomic computational mechanism that generates and sustains optimistic beliefs (low → high level).
Current questions:
1) Is this learning asymmetry specific to abstract beliefs, or does it also apply to rewards?
2) Does this learning asymmetry depend on the stimuli having prior desirability, or does it still hold for neutral stimuli?
3) Is this learning asymmetry specific to fictive (simulated) experience, or does it also hold for actual outcomes?

Optimism bias (II)

SLIDE 9

[Figure: dependent variables: learning performance, motor bias, “conservatism”]

Experimental task, contingencies and dependent variables. Data from Palminteri et al., J Neurosci, 2009; Worbe et al., Arch Gen Psychiatry, 2011.

First study

[Predictions: symmetric option values vs. asymmetric option values, compared with the data]

SLIDE 10

  • The stimuli have no prior desirability
  • The outcomes are not hypothetical but real

First study (N=50)

[Figure: preferred choice rate (conditions 1 & 4) and correct choice rate (conditions 2 & 3), from 0.5 to 1, across 25 trials]

SLIDE 11

Formalism and predictions

Rescorla-Wagner model (RW):
Decision rule: P(A)t=1/(1+exp((V(B)t-V(A)t)/β))
Prediction error: PEt=R-V(A)t
Learning rule: V(A)t+1=V(A)t+αPEt

Asymmetric model (RW±):
V(A)t+1=V(A)t+α+PEt if PEt>0
V(A)t+1=V(A)t+α-PEt if PEt<0

Possible results concerning the learning rates: standard RL (α+=α-), optimistic RL (α+>α-), pessimistic RL (α+<α-).
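The RW± rule, one learning rate per prediction-error sign, can be sketched as follows; the function names and parameter values are illustrative assumptions, not taken from the studies.

```python
def rw_plus_minus_update(v, reward, alpha_pos, alpha_neg):
    """One RW± update: alpha_pos applies when PEt > 0, alpha_neg otherwise."""
    pe = reward - v
    alpha = alpha_pos if pe > 0 else alpha_neg
    return v + alpha * pe

def learned_value(rewards, alpha_pos, alpha_neg, v0=0.0):
    """Final value of a single option after a sequence of outcomes."""
    v = v0
    for r in rewards:
        v = rw_plus_minus_update(v, r, alpha_pos, alpha_neg)
    return v
```

On a strictly alternating 1/0 outcome stream, an optimistic learner (α+=0.4, α-=0.1) settles near 0.78 while a standard learner (α=0.25) settles near 0.43: the optimist overvalues a 50%-rewarded option.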

SLIDE 12

The computational and behavioral results

[Figure: model comparison (which model? Bayes and Popper criteria), parameter estimates, and behaviour (why?)]

Signs of optimistic reinforcement learning → optimism enforces itself.

SLIDE 13

A microscopic analysis of optimistic and realistic behavior

[Figure: typical subjects under each model: preferred choice rate and preferred minus not-preferred value difference]

A computational explanation for developing a “preferred option” even in poorly rewarding environments

SLIDE 14

The robustness of the result

Objections addressed:
  • Minimum outcome: “optimistic RL is expressed because not winning is not that bad, and would disappear with actual monetary punishments”
  • Learning phase: “optimistic RL is an artifact arising from subjects ‘deciding’ which is the best option after the first outcomes”
  • Contingencies: “optimistic RL is an artifact arising from subjects ‘giving up’ in symmetrical low-reward conditions”

SLIDE 15

Hypotheses concerning the learning rates (α+ vs. α-): either contingency reversal suppresses the asymmetry, or the asymmetry is robust across contingency types.

Limitations of the first studies:
  • The task included only stable environments
  • Thus, subjects did not incur big losses by behaving optimistically

Testing the inflexibility of optimistic subjects

SLIDE 16

The second study (N=20)

The task includes a reversal-learning condition (which should promote flexibility).
  • First set: learning is driven by positive prediction errors
  • Second set: learning is driven by negative prediction errors

SLIDE 17

The computational and behavioral results

SLIDE 18

The reversal learning curves

Slower, but flexible vs. quicker, but inflexible.
  • Optimistic learning is confirmed even when there are losses
  • Optimistic learning is confirmed even when it is maladaptive

SLIDE 19

Interim conclusions (I) and new questions

So far:

  • We demonstrated that, even in a simple task involving abstract, neutral items and direct reinforcement, subjects preferentially update their reward expectations following positive, rather than negative, prediction errors.
  • This is true even when this behavior is disadvantageous (reversal learning).
  • However, this tendency was quite variable across subjects.

New questions:
1) Is optimistic reinforcement learning associated with interindividual differences in the optimistic personality trait?
2) Is this interindividual variability associated with specific neuroanatomical and functional brain signatures?
3) Is this computational bias influenced by the individual's socioeconomic environment?

SLIDE 20

The link with optimistic personality trait

[Figure: behavioural and model-based correlations with the Life Orientation Test - Revised (LOT-R)]

External validity (I): Relation to psychometric measures of “optimism”

SLIDE 21

The neural bases of optimistic RL

[Figure: behavioural and model-based correlations with neuroanatomy (VBM) and neurophysiology (fMRI; policy and update signals)]

External validity (II): Relation to brain signatures (Neurocomputational phenotypes)

SLIDE 22

The effect of environmental harshness (preliminary)

[Figure: bias amplitude across different life trajectories]

External validity (III): Sensitivity to socioeconomic status

SLIDE 23

Extending the framework

[Diagram: the agent-environment loop as before (context (s1,…sj), options (a1,…ai), option values V(sj,ai), selection, choice probabilities P(sj,ai), action (a), obtained outcomes, update 1), now extended with forgone outcomes (update 2)]

1 Learning from direct experience (“factual”)
2 Learning from simulated experience (“counterfactual”)

Learning biases!

SLIDE 24

Remaining questions (among others):

  • Is counterfactual learning also biased?
  • Is this reinforcement learning bias a valuation bias or a confirmation bias?

New design:

  • The task includes counterfactual feedback: on each trial subjects see both the obtained outcome (RC) and the forgone outcome (RU)

The third study (N=20)

SLIDE 25

Formalism and predictions

Factual learning (as before):
PECt=RC-V(C)t
V(C)t+1=V(C)t+αC+PECt if PECt>0
V(C)t+1=V(C)t+αC-PECt if PECt<0

Counterfactual learning (new):
PEUt=RU-V(U)t
V(U)t+1=V(U)t+αU+PEUt if PEUt>0
V(U)t+1=V(U)t+αU-PEUt if PEUt<0

We know that αC+>αC- (optimistic, “egocentric” factual learning). Possible results for the counterfactual learning rates:
  • αU+=αU-: counterfactual feedback processing is unbiased
  • αU+>αU-: the bias is choice independent, a valuation bias (“allocentric” optimism)
  • αU+<αU-: the bias is choice dependent, a confirmation bias
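The confirmation-bias alternative can be written as a single-trial update. This is a sketch under assumed parameter values and names: the larger rate is applied to choice-confirming prediction errors (positive on the chosen option, negative on the unchosen one).

```python
def confirmation_update(v_chosen, v_unchosen, r_chosen, r_unchosen,
                        alpha_conf=0.4, alpha_disconf=0.1):
    """One trial of factual + counterfactual learning under a
    confirmation bias: alpha_conf applies to choice-confirming
    prediction errors, alpha_disconf to disconfirming ones."""
    pe_c = r_chosen - v_chosen      # factual PE:        PECt = RC - V(C)t
    pe_u = r_unchosen - v_unchosen  # counterfactual PE: PEUt = RU - V(U)t
    a_c = alpha_conf if pe_c > 0 else alpha_disconf
    a_u = alpha_disconf if pe_u > 0 else alpha_conf
    return v_chosen + a_c * pe_c, v_unchosen + a_u * pe_u
```

When both options pay off equally, the chosen option's value rises faster than the unchosen one's; when both fail, the unchosen option's value falls faster. Either way, the update favors the choice already made.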

SLIDE 26

Results

  • Counterfactual learning is also biased
  • The counterfactual learning bias is choice oriented (as a confirmation bias would be)

SLIDE 27

Final Conclusions

  • The good news/bad news (optimism) effect may be the consequence of a low-level reinforcement learning bias
  • The factual learning bias extends to counterfactual learning in the form of a confirmation bias
  • This computational bias is highly variable in the population, and this variability has external validity in terms of neural bases, personality traits and environmental influences

The bias holds across: abstract cues vs. real life; real experience vs. fictive experience; stable vs. volatile environments; reward omission vs. punishment reception.

SLIDE 28

Optimistic learning and “conservative” behavior

A link between optimism and the exploration/exploitation trade-off: a prerequisite for engaging in exploratory behavior is that the subject be capable of growing unsatisfied with the current state of affairs. An extreme “optimist” cannot. The idea is not new.

“Optimism,” said Cacambo, “what is that?” “Alas!” replied Candide, “it is the obstinacy of maintaining that everything is best when it is worst.”

Voltaire (1759)

SLIDE 29

  • Acknowledge the importance of learning biases (not only decision biases) in understanding normative and/or maladaptive behavior
  • Extend the framework to other learning modules (currently: observational learning, i.e. learning from others' behavior)
  • Develop tools to quantify such biases at the individual level (in terms of both behavioral tests and computational procedures) → computational profiling
  • Explore the predictive power of such computational profiling in terms of long-term clinical, professional and economic outcomes (longitudinal studies)
  • Design naturalistic, reinforcement learning-inspired behavioral interventions

Future directions

SLIDE 30

Thank you for your attention!

Collaborators :

  • Sarah-Jayne Blakemore (University College London, London)
  • Sacha Bourgeois-Gironde (University of Paris 2, Paris)
  • Maël Lebreton (University of Amsterdam, Amsterdam)
  • Germain Lefebvre (Ecole Normale Supérieure, Paris)
  • Florent Meyniel (Neurospin, Gif/Yvette)
  • Lou Safra (Ecole Normale Supérieure, Paris)
  • Coralie Chevallier (Ecole Normale Supérieure, Paris)


SLIDE 31

Supplementary slides

SLIDE 32

Replication sample for counterfactual bias

Data kindly provided by Giorgio Coricelli & Mateus Joffily

SLIDE 33

So far:

  • We demonstrated that optimistic “low level” reinforcement learning, as measured by our task/model, is correlated with positive life orientation, specific brain structure and function, and economic life trajectory
  • The learning bias generalizes from “factual” to “counterfactual” learning

New questions:
1) Why do these biases exist? → (self-preservation against depressive realism)
2) Why this variability?

Interim conclusions (II) and new questions/ideas

SLIDE 34

Working hypothesis:

Multiple near-optimal solutions are maintained in the population to ensure adaptability of the group with respect to changing (and diverse) environments.

Variability is noise: a single statistically “optimal” model, with each subject (1, 2, …, n) a noisy variant of it (Model*, Model**, …).

Variability is adaptive: an “optimal” repertoire of models, with different subjects implementing different models (subject 1: model 1; subject 2: model 2; subject 3: model 3; subject n: model 4).

The good side of variability (preliminary)

SLIDE 35

Moving from a private learning context to a social foraging context (ongoing project), to define which environmental properties may favor a mix of populations:
  • Introduce a cost of exploration (switching patch)
  • Introduce consumption of the resources (i.e. instability)

[Task schematic: on each trial agents choose to switch (Sw) or stay (St); outcomes are reward (R) or nothing (N); rewards deplete a collective reserve that replenishes (+1), so P(reward) depends on consumption, on forgone vs. sampled patches, and on whether the environment is stable or unstable across trials]

More ecological situation

SLIDE 36

Simulation: 1000 generations of 50 individuals. We compare the cumulative payoff of mixed, RW and RW± populations as a function of exploration cost and speed of resource consumption.

  • A mixed population can outperform both “extreme” populations if the environment presents a mixture of stable and unstable phases
  • Ultimately, this mixture of strategies may be adaptive

[Figure: payoff maps with “isopayoff” line]

Simulation results (preliminary)
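A toy version of this kind of comparison can be sketched as follows. Everything here (the two-armed setup, parameter values, reward schedules and the mid-run reversal) is an illustrative assumption, not the actual simulation behind these results.

```python
import math
import random

def run_agent(alpha_pos, alpha_neg, reward_probs, n_trials=200, beta=0.1, seed=1):
    """Cumulative payoff of a two-armed RW(±) agent.
    reward_probs maps a trial index to (p_reward_A, p_reward_B),
    so contingency reversals can be injected."""
    rng = random.Random(seed)
    v = [0.0, 0.0]
    total = 0
    for t in range(n_trials):
        p_a, p_b = reward_probs(t)
        prob_a = 1.0 / (1.0 + math.exp((v[1] - v[0]) / beta))  # softmax decision rule
        c = 0 if rng.random() < prob_a else 1
        r = 1 if rng.random() < (p_a if c == 0 else p_b) else 0
        pe = r - v[c]                                          # prediction error
        v[c] += (alpha_pos if pe > 0 else alpha_neg) * pe      # RW(±) update
        total += r
    return total

stable = lambda t: (0.75, 0.25)                                 # fixed contingencies
volatile = lambda t: (0.75, 0.25) if t < 100 else (0.25, 0.75)  # mid-run reversal
```

Averaging payoffs of RW agents (alpha_pos == alpha_neg) and RW± agents (alpha_pos > alpha_neg) over many seeds, in both environments, gives a rough feel for why neither strategy dominates when stable and unstable phases alternate.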