Bayesian Updating

Peter Bossaerts, Caltech

8/3/12

Goals

• Relation With Reinforcement Learning
• To highlight core characteristics of Bayesian updating:
  1. Optimal Integration of Prior belief and Evidence (via Likelihood)
  2. Optimality: Martingales
  3. Model-Based Learning Approach
  4. Integration of Hypotheses (“Marginalization”)
  5. Polyvalent Uncertainty
• Humans Are Not Bayesians?
  • Monty Hall


Reinforcement Learning

• Most of the examples in psychology/neuroscience are about formation of beliefs about events/stimuli that have a (fixed) affective value (reward/loss).
• In such a context, psychologists/neuroscientists usually talk about reinforcement learning.
• One distinguishes two types (Daw, Niv, Dayan 2005):
  • Model-free:
    Pure Pavlovian: TD learning (see before)
    Instrumental: Q learning (Watkins-Dayan 1992)
  • Model-based
    Example: Bayesian learning
• I will casually talk about model-free learning as “reinforcement learning” (RL) while identifying model-based learning as “Bayesian.”

1. Integration of Prior belief and Evidence (via Likelihood)

• Posterior ∝ Prior × Likelihood (normalized so that the posterior probabilities sum to one)
• (Compare to Prediction Error based learning: New Belief = Old Belief + Learning Rate × Prediction Error)
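
The two update rules on this slide can be put side by side in a few lines. This is only a sketch: the function names and the two-urn numbers are illustrative assumptions, not taken from the lecture.

```python
def bayes_update(prior, likelihoods):
    """Posterior over hypotheses: prior * likelihood, then normalize."""
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def delta_rule_update(old_belief, outcome, learning_rate):
    """Prediction-error update: new = old + learning_rate * (outcome - old)."""
    return old_belief + learning_rate * (outcome - old_belief)


# Hypothetical numbers: two urns with equal prior; the observed ball colour
# has probability 0.7 under urn A and 0.3 under urn B.
posterior = bayes_update([0.5, 0.5], [0.7, 0.3])

# Same observation coded as outcome = 1.0, with an arbitrary learning rate.
new_belief = delta_rule_update(0.5, 1.0, 0.1)
```

The Bayesian step needs a likelihood for every hypothesis; the prediction-error step only needs the outcome and a learning rate, which is why the choice of learning rate matters so much later in the lecture.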


Sensorimotor Learning Example (Körding/Wolpert, Nature 2004)

• Prior: unobserved lateral shift
• Noisy observation
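
For intuition, here is a minimal sketch of how a prior and a noisy observation combine, assuming both the prior over the shift and the observation noise are Gaussian (the conjugate case). The function name and the numbers are illustrative assumptions, not values from the study.

```python
def gaussian_posterior(prior_mean, prior_var, observation, noise_var):
    """Combine a Gaussian prior with one Gaussian-noise observation."""
    # Weight on the observation: large when the prior is vague or the
    # observation is precise.
    weight = prior_var / (prior_var + noise_var)
    post_mean = prior_mean + weight * (observation - prior_mean)
    post_var = prior_var * noise_var / (prior_var + noise_var)
    return post_mean, post_var


# Hypothetical values: equally reliable prior and observation, so the
# posterior mean lies halfway between them.
post_mean, post_var = gaussian_posterior(
    prior_mean=1.0, prior_var=0.25, observation=2.0, noise_var=0.25
)
```

Written this way, the posterior mean has the same form as a prediction-error update, with the weight playing the role of the learning rate.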

Results

[Figure]

Results (c’d)

[Figure]

(Note: Posterior MEAN only; recent evidence: trade-off posterior mean-variance)

Drawing from an Urn: Conservatism

• Bet whether right urn was selected…

[Figure, panels a and b. Series: Observed, Bayesian, Robust Bayesian; axes from 0.0 to 1.0. Second panel b: Update (−0.10 to 0.10) for Observed, Bayesian and Robust Bayesian, split by “See Orange Ball” and “See Green Ball”.]

(D’Acremont ea, under review)


2. Bayesian Beliefs Form A Martingale

• What is a martingale? E[X(t+1) | Past Data] = X(t).
• “One cannot predict direction+magnitude of changes in X.”
• (Still possible: predict E[(X(t+1)-X(t))^2 | Past Data]!)
• Fundamental concept in stochastic process theory (and mathematical finance)

Doob’s Lemma

• Bayesian beliefs form a martingale.
• That is: E[Posterior(outcome) | Past Data] = Prior(outcome).
• Intuition: If this were violated, one could predict changes in one’s own beliefs, which means that one’s own beliefs have not been updated “enough.”
• This is the essence of “rational learning.”
• Remarks:
  • Martingale Convergence Theorem: Bayesian beliefs are expected to converge.
  • When beliefs are a martingale, updates “maximize surprise,” and hence beliefs incorporate as much information as possible (information theory).
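
The martingale property can be checked exactly in a simple case. The sketch below assumes a Beta(a, b) belief about a coin's probability of heads (a standard conjugate setup chosen for illustration, not the model used in the lecture) and averages the posterior mean over the learner's own predictive probabilities.

```python
from fractions import Fraction


def expected_posterior_mean(a, b):
    """E[posterior mean | past data] for a Beta(a, b) belief about a coin."""
    prior_mean = Fraction(a, a + b)           # also the predictive P(heads)
    mean_if_heads = Fraction(a + 1, a + b + 1)
    mean_if_tails = Fraction(a, a + b + 1)
    # Average over the next outcome, using the learner's own predictive odds.
    return prior_mean * mean_if_heads + (1 - prior_mean) * mean_if_tails


# The expected posterior mean equals the prior mean for any a, b > 0.
for a, b in [(1, 1), (2, 5), (30, 7)]:
    assert expected_posterior_mean(a, b) == Fraction(a, a + b)
```

Note that the expectation is taken with the learner's own predictive probabilities; an observer who knows the true coin would weight the two branches differently.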


Why are Bayesian beliefs a martingale?

• Because Bayesians update based on the likelihood (ratio): likelihood of observed data (“stimulus/signal”) given one hypothesis compared to likelihood of observed data given alternatives.
• (Contrast this with standard prediction-error based learning schemes like Rescorla-Wagner, which are based on: PE = Outcome - Prediction)
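
With two hypotheses, likelihood-ratio updating is easiest to see in log-odds form, where each observation simply adds its log likelihood ratio. The following is a sketch with hypothetical function names and likelihood values.

```python
import math


def update_log_odds(log_odds, lik_h1, lik_h0):
    """Add the log likelihood ratio of one observation to the log odds of H1."""
    return log_odds + math.log(lik_h1 / lik_h0)


def log_odds_to_prob(log_odds):
    """Convert log odds of H1 back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical task: equal prior (log odds 0); two independent signals, each
# with likelihood 0.7 under H1 and 0.3 under H0.
log_odds = 0.0
for _ in range(2):
    log_odds = update_log_odds(log_odds, 0.7, 0.3)
belief_h1 = log_odds_to_prob(log_odds)  # odds (7/3)^2 = 49/9, i.e. 49/58
```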

Still, prediction-error learning models can be made to “emulate” Bayesian learning

• Nicest example (I think): Sutton 1992.
• He sets the learning rate (“gain”) such that one expects to minimize the size of the subsequent prediction errors.
• Sutton proves that this is the same as to minimize the correlation (over time) of the prediction error.
  • If prediction errors are positively correlated, one’s learning rate is TOO LOW;
  • If negatively correlated, the learning rate is TOO HIGH.
• If predictions form a martingale, changes in predictions are uncorrelated
• So, Sutton attempts to generate a martingale…
• (Sutton’s algorithm works very well!!)
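
The link between learning rate and the sign of the prediction-error correlation can be seen in a small simulation. This is not Sutton's algorithm, only an illustration with a fixed learning rate; the environment (a hidden random walk observed with Gaussian noise), the function names and all parameter values are assumptions.

```python
import random


def lag1_autocorrelation(xs):
    """Sample autocorrelation of a sequence at lag 1."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


def prediction_errors(learning_rate, n_steps=20000, drift_sd=0.1,
                      noise_sd=1.0, seed=0):
    """Track a noisy random walk with a fixed-rate delta rule; return the PEs."""
    rng = random.Random(seed)
    hidden, prediction, errors = 0.0, 0.0, []
    for _ in range(n_steps):
        hidden += rng.gauss(0.0, drift_sd)           # hidden state drifts
        outcome = hidden + rng.gauss(0.0, noise_sd)  # noisy observation
        error = outcome - prediction                 # PE = Outcome - Prediction
        prediction += learning_rate * error
        errors.append(error)
    return errors


corr_low = lag1_autocorrelation(prediction_errors(0.01))   # rate too low
corr_high = lag1_autocorrelation(prediction_errors(0.6))   # rate too high
corr_mid = lag1_autocorrelation(prediction_errors(0.095))  # near steady-state Kalman gain
```

Under these assumptions `corr_low` comes out positive, `corr_high` negative, and `corr_mid` close to zero, matching the TOO LOW / TOO HIGH bullets above.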


Back to Urn Betting…

• Martingale test accepted…

[Figure, panel a: Update (−0.04 to 0.04) against Sample Size (2 to 10). Panel b: Covariance (−0.010 to 0.010) against Sample Size (2 to 10).]

…despite conservatism

• …because participants used a robust prior, not the “true” (announced) prior, unlike in Körding-Wolpert.
• (Robust prior: mixtures-of-binomials)

[Figure: Probability Density (0.0 to 2.0) over the range 0.0 to 1.0. Labels: High range, Low range, Expected prior, Less conservatism, More conservatism.]


Remarks

• Truth is more complicated: Bayesian beliefs are a martingale only from the perspective of the learner.
• Specifically, they may not be a martingale from the perspective of an observer who knows more (e.g., which urn is more likely to be correct?)
• Doob’s result can be extended (Bossaerts, REStud 2004)...

Neurobiological basis?

• Yang-Shadlen (Nature, 2007): recordings in monkey parietal cortex show updating based on likelihood ratio
• (In their task, information is not I.I.D. conditional on correct target location.)

[Figure, panel a: trial timeline. 0 ms: target on, 1st shape on; 500 ms: 2nd shape on; 1,000 ms: 3rd shape on; 1,500 ms: 4th shape on; 2,000 ms: shapes off; 2,400–2,600 ms: fixation off, saccade. Shapes carry assigned weights +∞, 0.9, 0.7, 0.5, 0.3, –0.3, –0.5, –0.7, –0.9, –∞, favouring red or favouring green.]


Results…

[Figure, panel a: Response (sp s–1) against Time (ms) for Tin and Tout, with markers for targets and 1st shape on, 2nd, 3rd and 4th shape on, and all shapes off. Panels b and c: Response (sp s–1) against logLR (ban) for Tin, Epochs 1 to 4; annotated values 6.2 ± 0.7, 5.8 ± 0.7, 4.9 ± 0.5, 6.2 ± 0.5.]

3. Bayesian Learning Is Model-Based

• Bayesian learning is about “inverting beliefs” (Laplace) to assess the veracity of underlying “causes”
• This requires a model (of the hidden causes); S(t) (medication) and Y(t) (symptoms) are not just correlated, but S(t) causes X(t) (infection) which causes Y(t).
• This contrasts with Reinforcement Learning which only involves observables (certain S(t) (medication) help Y(t) (symptoms), but the RL agent does not care to probe why)
• (But model-based learning does not need Bayesian updating…)

[Diagram with nodes St, Xt, Yt]


Neurobiological Foundation?

• Reversal Task: Does the (human) brain record that when one option goes bad, the other must be better? (Hampton ea, JN 2006; three-option case: Beierholm ea, NeuroImage 2011)

More Challenging… see correlation study in Class 3

• Underlying correlation changes
• Do humans learn by trial and error (reinforcement) or by explicitly tracking correlation (Bayesian)? (Wunderlich ea, Neuron 2011)


Choices…

[Figure. Series: Subject, Model, Complete Info]

Brain Activation…

[Figure: brain slice at z = 7. Panels: Correlation, Correlation Prediction Error]


4. Bayesians Follow Evidence For ALL Hypotheses

• …as opposed to “attention gating” (hypothesis testing): pick one hypothesis and accept it until evidence gathers against it.
• Bayesians “marginalize” across hypotheses.
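
The difference between marginalizing and gating can be made concrete with a toy example. Everything here is hypothetical (two candidate causes called "color" and "motion", two options "Top" and "Bottom", made-up probabilities); it is a sketch of the idea, not the analysis from any cited study.

```python
def marginal_reward_probs(hypothesis_probs, reward_probs):
    """Average each option's reward probability over ALL hypotheses."""
    options = next(iter(reward_probs.values())).keys()
    return {
        option: sum(hypothesis_probs[h] * reward_probs[h][option]
                    for h in hypothesis_probs)
        for option in options
    }


def gated_reward_probs(hypothesis_probs, reward_probs):
    """Attention gating: keep only the currently most probable hypothesis."""
    best = max(hypothesis_probs, key=hypothesis_probs.get)
    return reward_probs[best]


# Hypothetical beliefs: "color" is the more likely cause, but it barely
# separates the options, while "motion" separates them strongly.
beliefs = {"color": 0.6, "motion": 0.4}
reward_probs = {
    "color": {"Top": 0.55, "Bottom": 0.45},
    "motion": {"Top": 0.10, "Bottom": 0.90},
}

marginal = marginal_reward_probs(beliefs, reward_probs)
gated = gated_reward_probs(beliefs, reward_probs)
marginal_choice = max(marginal, key=marginal.get)  # "Bottom"
gated_choice = max(gated, key=gated.get)           # "Top"
```

With these numbers the marginalizing agent follows the less likely hypothesis because that hypothesis makes the more confident prediction, while the gating agent follows the more likely one.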

The Task

• Two modalities (“dimensions”) may “cause” reward; choose Top or Bottom (Wunderlich ea, J Neurophys 2011)


Analysis: Weight on each dimension

Subject could choose based on motion even if she is more confident that color is right, because confidence in choice conditional on motion is higher…

[Figure: weight on each DIMENSION (color, motion) by trial (50 to 300). COLOR: green, red. MOTION: right, left.]

Activation…

• To be able to weigh appropriately the evidence for the two dimensions in final choice, you need a signal of confidence (left) or uncertainty (right) for the two dimensions (summed here)

[Figure, panels A and B: brain slices at z = 35, z = 10, x = 0, x = 2.]


5. Polyvalent Uncertainty

• In Reinforcement Learning, there is only uncertainty about the relation between S(t) and Y(t).
• For Bayesians, there is uncertainty about X(t), about Y(t) given X(t), and even about whether the relation (S(t), X(t)) changes…

[Diagrams: nodes St, Xt, Yt; nodes St, Yt]

Uncertainty…

• Irreducible uncertainty or risk: Decision Maker (DM) knows that the chance of heads on a fair coin is 0.5; DM doesn’t know whether the next toss will be heads or tails. (Concerns the relation between X(t) and Y(t))
• Estimation uncertainty or ambiguity: DM is given a new coin and doesn’t know whether it is fair; DM needs to learn the probability of heads. (Concerns how sure one is of X(t))
• Unexpected uncertainty or jump risk (or “volatility”): Unknown to DM, the coin is replaced with another (possibly unfair) coin. (Concerns whether X(t) has changed)
• Model or “Knightian” uncertainty: Is the coin being replaced regularly or are coin tosses correlated? (Concerns the nature of X(t))
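
The first two kinds of uncertainty can be separated numerically in the coin example. The sketch assumes a Beta prior over the probability of heads (an illustrative modelling choice) and reports risk evaluated at the posterior mean alongside the posterior variance of the probability itself.

```python
def beta_bernoulli_uncertainty(heads, tails, prior_a=1.0, prior_b=1.0):
    """Return (posterior mean, risk at the mean, estimation variance)."""
    a = prior_a + heads
    b = prior_b + tails
    mean = a / (a + b)
    # Irreducible uncertainty (risk): variance of a single toss if the
    # probability of heads were known to equal the posterior mean.
    risk = mean * (1.0 - mean)
    # Estimation uncertainty: posterior variance of the probability of heads.
    estimation_var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, risk, estimation_var


new_coin = beta_bernoulli_uncertainty(heads=0, tails=0)
familiar_coin = beta_bernoulli_uncertainty(heads=50, tails=50)
```

For a coin that looks fair, the risk term stays at 0.25 no matter how many tosses are observed, while the estimation term shrinks with data. Unexpected and model uncertainty are not captured by this fixed-coin sketch.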


Remarks

• By suitably changing the learning rate, even the RL agent can behave as if she cares about the separate underlying sources of uncertainty
• E.g.:
  • When the environment becomes inherently less predictable, then learning rate should be lower
  • When the environment becomes more unstable (“volatile”), then the learning rate should be higher
• The distinguishing features really are:
  • Is the agent behaviorally sensitive to separate sources of uncertainty (e.g., ambiguity averse)?
  • Does the brain form explicit representations of the separate sources of uncertainty?

Take unexpected uncertainty or “volatility”

• “Jumps,” e.g., binary gamble: reward probability reverts with probability v
• v could be called “volatility” (don’t be confused: it means something else in finance); Behrens ea. 2007
• Intuitive:
  • As v increases, INCREASE learning rate (older data become obsolete)
  • As v decreases, DECREASE learning rate
• Learning rate: effect of last prediction error on new prediction
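
One way to see why the learning rate should rise with volatility and fall with irreducible noise is the steady-state gain of a scalar Kalman filter. This is a Gaussian random-walk analogue chosen for illustration, not the jump model of Behrens ea; `drift_var` and `noise_var` are hypothetical names for the variance of changes in the hidden state and of the observation noise.

```python
import math


def steady_state_gain(drift_var, noise_var):
    """Steady-state Kalman gain for a random walk observed with noise."""
    # Predictive variance P solves P = P * noise_var / (P + noise_var) + drift_var.
    p = (drift_var + math.sqrt(drift_var ** 2 + 4.0 * drift_var * noise_var)) / 2.0
    # The gain is the optimal weight on the latest prediction error.
    return p / (p + noise_var)


baseline = steady_state_gain(drift_var=0.01, noise_var=1.0)
more_volatile = steady_state_gain(drift_var=0.10, noise_var=1.0)
more_noisy = steady_state_gain(drift_var=0.01, noise_var=4.0)
```

In this analogue the gain goes up when the hidden state changes faster (`more_volatile > baseline`) and down when outcomes become noisier (`more_noisy < baseline`).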


Reversal Learning Task With Changing Reversal Rate

• Estimated volatility tracks reversal rate
• Learning rates track volatility (optimally) (Behrens ea 2007)

Brain Activation

• Volatility correlates with ACC activation in “monitoring” period (after outcome is revealed and before subsequent decision period)
• (Could also be learning rate, …consistent with animal studies where lesions to ACC lead to impairment in adjusting “memory” of learning)
• ACC activates also as a function of TOTAL uncertainty (“variance of reward”), which combines “volatility” and “irreducible uncertainty” (which was DIFFERENT across stable and volatile periods)


Role of Norepinephrine (NE) and Acetylcholine (ACh) In Expected/Unexpected Uncertainty

• Uncertainty about cue validity is irreducible
• Uncertainty about the right cue can be reduced over time: estimation uncertainty (Yu-Dayan: unexpected uncertainty)

Evolution Over Time

In Yu-Dayan algorithm, estimation uncertainty stays high for 10 trials after perceived context switch rather than gradually decreasing; gamma = prob(cue is correct), so irreducible uncertainty = gamma*(1 - gamma)


Relation with pupil dilation

• …which is thought to correlate with NE fluctuations
• See Preuschoff ea pupil dilation study in earlier class
• (Unexpected uncertainty = risk prediction error)
• See also Nassar ea, Nature Neuroscience June 2012.

Final Remark: Humans Are Not Bayesian!?

• “Monty Hall”
• Most people cannot solve this problem correctly
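
For reference, the Bayesian solution takes only a few lines. The sketch assumes the standard rules (three doors, the host never opens the player's door or the car's door, and picks at random when two doors are available); the function names are illustrative.

```python
from fractions import Fraction


def host_likelihood(car, picked, opened):
    """P(host opens `opened` | car location, player's pick)."""
    if opened == car or opened == picked:
        return Fraction(0)      # the host never reveals the car or the pick
    if car == picked:
        return Fraction(1, 2)   # two goat doors available: host picks at random
    return Fraction(1)          # only one goat door left: host is forced


def monty_hall_posterior(picked, opened):
    """Posterior over the car's location after the host opens a door."""
    prior = [Fraction(1, 3)] * 3
    unnormalized = [prior[car] * host_likelihood(car, picked, opened)
                    for car in range(3)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Player picks door 0, host opens door 2.
posterior = monty_hall_posterior(picked=0, opened=2)
# posterior == [1/3, 2/3, 0]: switching to door 1 wins with probability 2/3.
```

The common error is to ignore the likelihood of the host's action, which differs across hypotheses even though the prior is uniform.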
