Bayesian Updating

Bayesian Updating, Peter Bossaerts, Caltech - PDF document



  1. 8/3/12
     Bayesian Updating
     Peter Bossaerts, Caltech

     Goals
     - Relation With Reinforcement Learning
     - To highlight core characteristics of Bayesian updating:
       1. Optimal Integration of Prior belief and Evidence (via Likelihood)
       2. Optimality: Martingales
       3. Model-Based Learning Approach
       4. Integration of Hypotheses (“Marginalization”)
       5. Polyvalent Uncertainty
     - Humans Are Not Bayesians?
     - Monty Hall

  2. Reinforcement Learning
     - Most of the examples in psychology/neuroscience are about formation of beliefs about events/stimuli that have a (fixed) affective value (reward/loss).
     - In such a context, psychologists/neuroscientists usually talk about reinforcement learning.
     - One distinguishes two types (Daw, Niv, Dayan 2005):
       - Model-free:
         - Pure Pavlovian: TD learning (see before)
         - Instrumental: Q learning (Watkins-Dayan 1992)
       - Model-based:
         - Example: Bayesian learning
     - I will casually talk about model-free learning as “reinforcement learning” (RL) while identifying model-based learning as “Bayesian.”

     1. Integration of Prior belief and Evidence (via Likelihood)
     - Posterior ∝ Prior * Likelihood
     - (Compare to Prediction Error based learning: New Belief = Old Belief + Learning Rate * Prediction Error)
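To make the contrast between the two update rules concrete, here is a minimal sketch in Python. The two-urn setup, the numbers, and the function names are illustrative assumptions, not taken from the slides.

```python
def bayes_update(prior, likelihood):
    """Posterior over hypotheses: prior * likelihood, then renormalized."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())
    return {h: u / evidence for h, u in unnormalized.items()}


def delta_rule_update(old_belief, outcome, learning_rate):
    """Prediction-error update: new = old + learning_rate * (outcome - old)."""
    prediction_error = outcome - old_belief
    return old_belief + learning_rate * prediction_error


# Hypothetical setup: urn A holds 70% orange balls, urn B holds 30%.
prior = {"A": 0.5, "B": 0.5}
likelihood_orange = {"A": 0.7, "B": 0.3}  # P(orange | urn)

# Bayesian learner: belief about WHICH URN, after seeing one orange ball.
posterior = bayes_update(prior, likelihood_orange)
print(posterior)

# Prediction-error learner: belief about P(orange), outcome coded as 1.0.
print(delta_rule_update(old_belief=0.5, outcome=1.0, learning_rate=0.1))
```

The Bayesian update needs a likelihood for each hypothesis (a model), whereas the prediction-error update needs only the observed outcome and a learning rate.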

  3. Sensorimotor Learning Example (Körding/Wolpert, Nature 2004)
     - Prior unobserved lateral shift
     - Noisy observation
     [Figure: task setup]

     Results
     [Figure: results]

  4. Results (cont'd)
     (Note: Posterior MEAN only; recent evidence: trade-off posterior mean-variance)

     Drawing from an Urn: Conservatism
     - Bet whether right urn was selected…
     [Figure, panels a and b: Observed against Bayesian beliefs (both axes 0.0 to 1.0), with Bayesian and Robust Bayesian predictions; Update (-0.10 to 0.10) after See Orange Ball and See Green Ball, for Observed, Bayesian and Robust Bayesian]
     (D'Acremont et al., under review)

  5. 2. Bayesian Beliefs Form A Martingale
     - What is a martingale? E[X(t+1) | Past Data] = X(t).
     - “One cannot predict direction+magnitude of changes in X.”
     - (Still possible: predict E[(X(t+1)-X(t))^2 | Past Data]!)
     - Fundamental concept in stochastic process theory (and mathematical finance)

     Doob's Lemma
     - Bayesian beliefs form a martingale.
     - That is: E[Posterior(outcome) | Past Data] = Prior(outcome).
     - Intuition: If this were violated, one could predict changes in one's own beliefs, which means that one's own beliefs have not been updated “enough.”
     - This is the essence of “rational learning.”
     - Remarks:
       - Martingale Convergence Theorem: Bayesian beliefs are expected to converge.
       - When beliefs are a martingale, updates “maximize surprise,” and hence beliefs incorporate as much information as possible (information theory).
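The martingale property can be checked by exact enumeration in a small case. The sketch below assumes a hypothetical two-urn problem with one ball drawn; the function names and numbers are illustrative and not from the slides. It averages the posterior over the two possible observations, weighting each by the learner's own predictive probability, and recovers the prior.

```python
def posterior_urn_a(prior_a, p_orange_a, p_orange_b, saw_orange):
    """Posterior probability of urn A after one draw (Bayes' rule)."""
    like_a = p_orange_a if saw_orange else 1.0 - p_orange_a
    like_b = p_orange_b if saw_orange else 1.0 - p_orange_b
    return prior_a * like_a / (prior_a * like_a + (1.0 - prior_a) * like_b)


def expected_posterior(prior_a, p_orange_a, p_orange_b):
    """E[posterior | past data], using the learner's own predictive probabilities."""
    p_orange = prior_a * p_orange_a + (1.0 - prior_a) * p_orange_b
    after_orange = posterior_urn_a(prior_a, p_orange_a, p_orange_b, True)
    after_green = posterior_urn_a(prior_a, p_orange_a, p_orange_b, False)
    return p_orange * after_orange + (1.0 - p_orange) * after_green


# Hypothetical numbers: prior 0.3 on urn A; A is 70% orange, B is 30% orange.
prior_a = 0.3
print(expected_posterior(prior_a, 0.7, 0.3), "vs prior", prior_a)
```

Each individual posterior differs from the prior; only the predictive-weighted average coincides with it, which is why the direction of the next belief change cannot be predicted.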

  6. Why are Bayesian beliefs a martingale?
     - Because Bayesians update based on the likelihood (ratio): likelihood of observed data (“stimulus/signal”) given one hypothesis compared to likelihood of observed data given alternatives.
     - (Contrast this with standard prediction-error based learning schemes like Rescorla-Wagner, which are based on: PE = Outcome - Prediction)

     Still, prediction-error learning models can be made to “emulate” Bayesian learning
     - Nicest example (I think): Sutton 1992.
       - He sets the learning rate (“gain”) such that one expects to minimize the size of the subsequent prediction errors.
       - Sutton proves that this is the same as minimizing the correlation (over time) of the prediction error.
       - If prediction errors are positively correlated, one's learning rate is TOO LOW;
       - If negatively correlated, the learning rate is TOO HIGH.
     - If predictions form a martingale, changes in predictions are uncorrelated.
     - So, Sutton attempts to generate a martingale…
     - (Sutton's algorithm works very well!!)
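The link between learning rate and the correlation of prediction errors can be seen in a small simulation. This is only an illustration of the diagnostic, not Sutton's algorithm: it assumes a hypothetical hidden state that follows a random walk (unit-variance steps) observed with unit-variance noise, and a fixed learning rate. Under those assumptions the steady-state optimal gain is about 0.618.

```python
import random


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    centered = [x - mean for x in xs]
    variance = sum(c * c for c in centered)
    covariance = sum(centered[i] * centered[i + 1] for i in range(n - 1))
    return covariance / variance


def prediction_errors(learning_rate, n_steps=20000, seed=0):
    """Track a drifting hidden state with a fixed-gain prediction-error rule."""
    rng = random.Random(seed)
    state, prediction = 0.0, 0.0
    errors = []
    for _ in range(n_steps):
        state += rng.gauss(0.0, 1.0)           # hidden state drifts (assumed)
        outcome = state + rng.gauss(0.0, 1.0)  # noisy observation (assumed)
        error = outcome - prediction           # PE = Outcome - Prediction
        errors.append(error)
        prediction += learning_rate * error
    return errors


for rate in (0.1, 0.618, 0.95):
    print(rate, round(lag1_autocorrelation(prediction_errors(rate)), 2))
```

With the low rate the errors come out positively correlated, with the high rate negatively correlated, and near the optimal gain roughly uncorrelated, matching the two rules of thumb above.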

  7. Back to Urn Betting…
     - Martingale test accepted…
     [Figure, panels a and b: Covariance (-0.04 to 0.04) and Update (-0.010 to 0.010) against Sample Size (2 to 10)]

     … despite conservatism
     - … because participants used a robust prior, not the “true” (announced) prior, unlike in Körding-Wolpert.
     - (Robust prior: mixtures-of-binomials)
     [Figure: Density (0.0 to 2.0) against Probability (0.0 to 1.0); curves labeled High range, Low range and Expected prior; annotations More conservatism and Less conservatism]

  8. Remarks
     - Truth is more complicated: Bayesian beliefs are a martingale only from the perspective of the learner.
     - Specifically, they may not be a martingale from the perspective of an observer who knows more (e.g., which urn is more likely to be correct?)
     - Doob's result can be extended (Bossaerts, REStud 2004)...

     Neurobiological basis?
     - Yang-Shadlen (Nature, 2007): recordings in monkey parietal cortex show updating based on likelihood ratio
     - (In their task, information is not I.I.D. conditional on correct target location.)
     [Figure, panel a: task timeline: Fixation; Target on; 0 ms: 1st shape on; 500 ms: 2nd shape on; 1,000 ms: 3rd shape on; 1,500 ms: 4th shape on; 2,000 ms: shapes off; 2,400–2,600 ms: Fixation off, saccade. Assigned weights of the Shapes: +∞, 0.9, 0.7, 0.5, 0.3 (Favouring red) and -0.3, -0.5, -0.7, -0.9, -∞ (Favouring green)]
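Updating on the likelihood ratio amounts to adding log likelihood ratios to the log odds. The sketch below assumes that the assigned shape weights are base-10 log likelihood ratios in favour of red (the "ban" unit), that the prior odds are even, and that the shapes are conditionally independent. The last assumption is a simplification the slide explicitly says does not hold in the actual task. Function names and the example sequence are hypothetical.

```python
def prob_red_from_weights(weights, prior_log10_odds=0.0):
    """P(red) from summed base-10 log likelihood ratios (assumed weights)."""
    log_odds = prior_log10_odds + sum(weights)
    return 1.0 / (1.0 + 10.0 ** (-log_odds))


def prob_red_sequential(weights, prior=0.5):
    """Same belief, obtained by applying Bayes' rule one shape at a time."""
    p = prior
    for w in weights:
        likelihood_ratio = 10.0 ** w  # P(shape | red) / P(shape | green)
        p = p * likelihood_ratio / (p * likelihood_ratio + (1.0 - p))
    return p


# Hypothetical sequence of four shapes, using finite weights from the figure.
shapes = [0.9, -0.3, 0.5, -0.7]
print(prob_red_from_weights(shapes), prob_red_sequential(shapes))
```

Both routes give the same belief, so a neuron whose firing tracks the running sum of weights would carry the same information as the posterior.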

  9. Results…
     [Figure, panels a to c: Response (sp s^-1) against Time (ms) for T in and T out, aligned to Targets and 1st shape on, 2nd shape on, 3rd shape on, 4th shape on, All shape off; Response (sp s^-1) against logLR for T in (ban) in Epochs 1 to 4, with values 6.2 ± 0.7, 5.8 ± 0.7, 4.9 ± 0.5, 6.2 ± 0.5]

     3. Bayesian Learning Is Model-Based
     - Bayesian learning is about “inverting beliefs” (Laplace) to assess the veracity of underlying “causes”
     - This requires a model (of the hidden causes); S(t) (medication) and Y(t) (symptoms) are not just correlated, but S(t) causes X(t) (infection) which causes Y(t).
     [Diagram: S t, X t, Y t in a causal chain]
     - This contrasts with Reinforcement Learning, which only involves observables (certain S(t) (medication) help Y(t) (symptoms), but the RL agent does not care to probe why)
     - (But model-based learning does not need Bayesian updating…)

  10. Neurobiological Foundation?
     - Reversal Task: Does the (human) brain record that when one option goes bad, the other must be better? (Hampton et al., JN 2006; three-option case: Beierholm et al., NeuroImage 2011)

     More Challenging… see correlation study in Class 3
     - Underlying correlation changes
     - Do humans learn by trial and error (reinforcement) or by explicitly tracking correlation (Bayesian)? (Wunderlich et al., Neuron 2011)

  11. Choices…
     [Figure: Subject against Complete Info Model]

     Brain Activation…
     [Figure: activation for Correlation and Correlation Prediction Error; z = 7]

  12. 4. Bayesians Follow Evidence For ALL Hypotheses
     - … as opposed to “attention gating” (hypothesis testing): pick one hypothesis and accept it until evidence gathers against it.
     - Bayesians “marginalize” across hypotheses.

     The Task.
     - Two modalities (“dimensions”) may “cause” reward; choose Top or Bottom (Wunderlich et al., J Neurophys 2011)

  13. Analysis: Weight on each dimension
     Subject could choose based on motion even if she is more confident that color is right, because confidence in choice conditional on motion is higher…
     [Figure: weight on each DIMENSION (color, motion) over trials 0 to 300; COLOR: green/red; MOTION: right/left]

     Activation…
     - To be able to weigh appropriately the evidence for the two dimensions in final choice, you need a signal of confidence (left) or uncertainty (right) for the two dimensions (summed here)
     [Figure, panels A and B: x = 2, x = 0, z = 35, z = 10]
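The point that a subject may follow motion while believing more in color can be sketched numerically. The setup below is hypothetical (the beliefs, the reward probabilities and all names are assumptions for illustration): the learner holds a belief about which dimension causes reward, and each dimension implies a probability that Top is rewarded on the current trial.

```python
def marginal_reward_prob(hypothesis_probs, reward_prob_given_hypothesis):
    """P(Top rewarded), marginalized over which dimension is relevant."""
    return sum(
        hypothesis_probs[h] * reward_prob_given_hypothesis[h]
        for h in hypothesis_probs
    )


# Hypothetical beliefs: color is the more likely cause of reward...
beliefs = {"color": 0.6, "motion": 0.4}
# ...but color barely discriminates on this trial, while motion points firmly to Bottom.
p_top_rewarded = {"color": 0.55, "motion": 0.10}

# "Attention gating": commit to the single most likely hypothesis.
best_hypothesis = max(beliefs, key=beliefs.get)
gated_choice = "Top" if p_top_rewarded[best_hypothesis] > 0.5 else "Bottom"

# Bayesian: weigh the evidence from both hypotheses.
p_top = marginal_reward_prob(beliefs, p_top_rewarded)
marginal_choice = "Top" if p_top > 0.5 else "Bottom"

print(gated_choice, marginal_choice, p_top)
```

Here the gated rule picks Top, following color, while the marginalized rule picks Bottom, because the confident motion signal outweighs the weak color signal even at a lower hypothesis probability.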
