 
8/3/12

Bayesian Updating
Peter Bossaerts, Caltech

Goals
- Relation with reinforcement learning
- To highlight core characteristics of Bayesian updating:
  1. Optimal integration of prior belief and evidence (via likelihood)
  2. Optimality: martingales
  3. Model-based learning approach
  4. Integration of hypotheses ("marginalization")
  5. Polyvalent uncertainty
- Humans are not Bayesians?
  - Monty Hall

Reinforcement Learning
- Most of the examples in psychology/neuroscience are about formation of beliefs about events/stimuli that have a (fixed) affective value (reward/loss).
- In such a context, psychologists/neuroscientists usually talk about reinforcement learning.
- One distinguishes two types (Daw, Niv, Dayan 2005):
  - Model-free:
    - Pure Pavlovian: TD learning (see before)
    - Instrumental: Q-learning (Watkins-Dayan 1992)
  - Model-based:
    - Example: Bayesian learning
- I will casually talk about model-free learning as "reinforcement learning" (RL) while identifying model-based learning as "Bayesian."

1. Integration of Prior Belief and Evidence (via Likelihood)
- Posterior ∝ Prior × Likelihood (normalized so that the posterior sums to one)
- (Compare to prediction-error-based learning: New Belief = Old Belief + Learning Rate × Prediction Error)

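To make the contrast between the two update rules concrete, here is a minimal sketch in Python. The two-urn setup, the ball proportions, and the learning rate of 0.1 are illustrative assumptions and do not come from the slides.

```python
def bayes_update(prior, likelihoods):
    """Posterior over hypotheses: prior times likelihood, renormalized."""
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def delta_rule_update(belief, outcome, learning_rate):
    """Prediction-error update: new = old + rate * (outcome - old)."""
    return belief + learning_rate * (outcome - belief)


# Hypothetical urns: urn A holds 70% orange balls, urn B holds 30%.
prior = [0.5, 0.5]               # P(urn A), P(urn B)
likelihood_orange = [0.7, 0.3]   # P(orange | urn A), P(orange | urn B)

# Bayesian (model-based) belief about WHICH URN generated the ball.
posterior = bayes_update(prior, likelihood_orange)
print(posterior)

# Model-free belief about the observable itself: P(next ball is orange).
# An orange draw is coded as outcome 1.0; the learning rate is assumed.
rl_belief = delta_rule_update(0.5, 1.0, learning_rate=0.1)
print(rl_belief)
```

The Bayesian update needs the likelihood of the ball under each urn, while the delta rule only needs the observed outcome and a learning rate.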
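Sensorimotor Learning Example (Körding/Wolpert, Nature 2004)
- Prior: unobserved lateral shift
- Noisy observation
[Figure: task illustration, not recoverable from the extraction.]

Results
[Figure: results plots, not recoverable from the extraction.]
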
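The combination of a prior over the lateral shift with a noisy observation can be sketched as follows. This assumes, for illustration only, that both the prior and the observation noise are Gaussian; the function name and all numbers are hypothetical.

```python
def gaussian_posterior(prior_mean, prior_var, observation, noise_var):
    """Combine a Gaussian prior with one observation under Gaussian noise."""
    # Weight on the observation: large when the prior is vague or the
    # observation is precise, small when the observation is noisy.
    weight = prior_var / (prior_var + noise_var)
    post_mean = prior_mean + weight * (observation - prior_mean)
    post_var = prior_var * noise_var / (prior_var + noise_var)
    return post_mean, post_var


# Hypothetical numbers (cm): prior shift centered on 1.0, observed shift 2.0.
mean_sharp, var_sharp = gaussian_posterior(1.0, 0.25, 2.0, noise_var=0.25)
mean_blurry, var_blurry = gaussian_posterior(1.0, 0.25, 2.0, noise_var=1.0)

print(mean_sharp, var_sharp)     # precise feedback: estimate moves halfway
print(mean_blurry, var_blurry)   # noisy feedback: estimate stays near prior
```

Under these assumptions the posterior mean has the same form as a prediction-error update, with the learning rate set by the relative precision of prior and observation.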
Results (c'd)
- (Note: posterior MEAN only; recent evidence: trade-off posterior mean-variance)

Drawing from an Urn: Conservatism
- Bet whether right urn was selected…
[Figure, panels a and b. Recoverable labels: axes "Observed", "Bayesian" (0.0 to 1.0), and "Update" (−0.10 to 0.10); series "Observed", "Bayesian", "Robust Bayesian"; conditions "See Orange Ball" and "See Green Ball".]
(D'Acremont ea, under review)

2. Bayesian Beliefs Form a Martingale
- What is a martingale? E[X(t+1) | Past Data] = X(t).
- "One cannot predict direction+magnitude of changes in X."
- (Still possible: predict E[(X(t+1) − X(t))^2 | Past Data]!)
- Fundamental concept in stochastic process theory (and mathematical finance)

Doob's Lemma
- Bayesian beliefs form a martingale.
- That is: E[Posterior(outcome) | Past Data] = Prior(outcome).
- Intuition: If this were violated, one could predict changes in one's own beliefs, which means that one's own beliefs have not been updated "enough."
- This is the essence of "rational learning."
- Remarks:
  - Martingale Convergence Theorem: Bayesian beliefs are expected to converge.
  - When beliefs are a martingale, updates "maximize surprise," and hence beliefs incorporate as much information as possible (information theory).

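The martingale property can be checked exactly in a small example. The sketch below assumes a hypothetical two-urn problem (urn A with 70% orange balls, urn B with 30%, prior belief 2/5 in urn A) and uses exact fractions so that the equality is not blurred by rounding.

```python
from fractions import Fraction


def posterior_a(prior_a, p_orange_a, p_orange_b, saw_orange):
    """Return (posterior belief in urn A, probability of this observation)."""
    like_a = p_orange_a if saw_orange else 1 - p_orange_a
    like_b = p_orange_b if saw_orange else 1 - p_orange_b
    evidence = prior_a * like_a + (1 - prior_a) * like_b
    return prior_a * like_a / evidence, evidence


# Hypothetical numbers, chosen only for illustration.
prior_a = Fraction(2, 5)
p_orange_a, p_orange_b = Fraction(7, 10), Fraction(3, 10)

expected_posterior = Fraction(0)
expected_sq_change = Fraction(0)
for saw_orange in (True, False):
    post, prob = posterior_a(prior_a, p_orange_a, p_orange_b, saw_orange)
    # Average over what the learner itself expects to see next.
    expected_posterior += prob * post
    expected_sq_change += prob * (post - prior_a) ** 2

print(expected_posterior == prior_a)   # direction of change is unpredictable
print(expected_sq_change)              # size of change is predictable
```

The expectation is taken with the learner's own predictive probabilities, which is why the expected posterior equals the prior while the expected squared change stays positive.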
Why Are Bayesian Beliefs a Martingale?
- Because Bayesians update based on the likelihood (ratio): likelihood of observed data ("stimulus/signal") given one hypothesis compared to likelihood of observed data given alternatives.
- (Contrast this with standard prediction-error-based learning schemes like Rescorla-Wagner, which are based on: PE = Outcome − Prediction)

Still, prediction-error learning models can be made to "emulate" Bayesian learning
- Nicest example (I think): Sutton 1992.
  - He sets the learning rate ("gain") such that one expects to minimize the size of the subsequent prediction errors.
  - Sutton proves that this is the same as to minimize the correlation (over time) of the prediction error.
    - If prediction errors are positively correlated, one's learning rate is TOO LOW;
    - If negatively correlated, the learning rate is TOO HIGH.
- If predictions form a martingale, changes in predictions are uncorrelated.
- So, Sutton attempts to generate a martingale…
- (Sutton's algorithm works very well!!)

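The link between learning rate and the correlation of prediction errors can be illustrated with a small simulation. This is not Sutton's algorithm: it is only a sketch of the diagnostic, using a fixed-gain delta rule on an assumed process (a hidden quantity that drifts as a random walk with unit variance, observed with unit-variance noise). For this assumed process the gain that leaves prediction errors uncorrelated works out to roughly 0.618.

```python
import random


def pe_autocorrelation(learning_rate, n_steps=20000, seed=0):
    """Lag-1 autocorrelation of prediction errors under a fixed-gain delta rule."""
    rng = random.Random(seed)
    state, prediction = 0.0, 0.0
    errors = []
    for _ in range(n_steps):
        state += rng.gauss(0.0, 1.0)            # hidden quantity drifts (assumed)
        outcome = state + rng.gauss(0.0, 1.0)   # noisy observation (assumed)
        error = outcome - prediction            # PE = Outcome - Prediction
        prediction += learning_rate * error     # delta-rule update
        errors.append(error)
    mean = sum(errors) / len(errors)
    centered = [e - mean for e in errors]
    cov = sum(a * b for a, b in zip(centered, centered[1:]))
    var = sum(c * c for c in centered)
    return cov / var


low = pe_autocorrelation(0.1)      # too low: errors positively correlated
high = pe_autocorrelation(1.5)     # too high: errors negatively correlated
tuned = pe_autocorrelation(0.618)  # near the well-tuned gain: close to zero
print(low, high, tuned)
```

A gain-adaptation scheme can use the sign of this correlation as a signal: raise the learning rate when successive errors share a sign, lower it when they alternate.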
Back to Urn Betting…
- Martingale test accepted…
[Figure, panels a and b. Recoverable labels: "Covariance" and "Update" plotted against "Sample Size" (2 to 10).]

…despite conservatism
- …because participants used a robust prior, not the "true" (announced) prior, unlike in Körding-Wolpert.
- (Robust prior: mixtures-of-binomials)
[Figure. Recoverable labels: "Density" against "Probability" (0.0 to 1.0); curves "High range", "Low range", "Expected prior"; annotations "More conservatism", "Less conservatism".]

Remarks
- Truth is more complicated: Bayesian beliefs are a martingale only from the perspective of the learner.
- Specifically, they may not be a martingale from the perspective of an observer who knows more (e.g., which urn is more likely to be correct?)
- Doob's result can be extended (Bossaerts, REStud 2004)...

Neurobiological basis?
- Yang-Shadlen (Nature, 2007): recordings in monkey parietal cortex show updating based on likelihood ratio
- (In their task, information is not I.I.D. conditional on correct target location.)
[Figure: task timeline and shape weights. Recoverable timeline labels: Fixation; Target on; 1st shape on (time axis starts at 0 ms); 500 ms: 2nd shape on; 1,000 ms: 3rd shape on; 1,500 ms: 4th shape on; 2,000 ms: shapes off; 2,400 to 2,600 ms: fixation off, saccade. Assigned weights for the shapes: +∞ (favouring red), 0.9, 0.7, 0.5, 0.3, –0.3, –0.5, –0.7, –0.9, –∞ (favouring green).]

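Updating on likelihood ratios amounts to adding log likelihood ratios. The sketch below assumes that each shape's weight is a log10 likelihood ratio (in bans) in favour of red, and that the weights simply add. The slide itself cautions that evidence in the actual task is not I.I.D. conditional on the correct target, so plain summation is an idealization. The function names and the example sequence are hypothetical, and the infinite weights are left out.

```python
def prob_red_from_sum(weights):
    """P(red) from summed log10 likelihood ratios, starting from even odds."""
    total = sum(weights)
    return 10 ** total / (1 + 10 ** total)


def prob_red_sequential(weights, prior_red=0.5):
    """Same quantity, obtained by updating the odds one shape at a time."""
    odds = prior_red / (1 - prior_red)
    for w in weights:
        odds *= 10 ** w          # multiply odds by this shape's likelihood ratio
    return odds / (1 + odds)


# Hypothetical sequence of four shapes drawn from the finite weights.
shapes = [0.9, -0.3, 0.5, -0.7]
p_sum = prob_red_from_sum(shapes)
p_seq = prob_red_sequential(shapes)
print(p_sum, p_seq)
```

Both routes give the same belief, which is why a running sum of log likelihood ratios is a convenient quantity for a neuron to track.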
Results…
[Figure, panels a to c. Recoverable labels: "Response (sp s^-1)" against "Time (ms)" for T in and T out; Epochs 1 to 4 aligned to 1st, 2nd, 3rd, and 4th shape on, with "logLR for T in" marked + and –; panel c plots "Response (sp s^-1)" against "logLR (ban)" from −4 to 4. Values printed in panel c: 6.2 ± 0.7, 5.8 ± 0.7, 4.9 ± 0.5, 6.2 ± 0.5.]

3. Bayesian Learning Is Model-Based
- Bayesian learning is about "inverting beliefs" (Laplace) to assess the veracity of underlying "causes."
- This requires a model (of the hidden causes): S(t) (medication) and Y(t) (symptoms) are not just correlated, but S(t) causes X(t) (infection), which causes Y(t).
- This contrasts with Reinforcement Learning, which only involves observables (certain S(t) (medication) help Y(t) (symptoms), but the RL agent does not care to probe why).
- (But model-based learning does not need Bayesian updating…)
[Diagram: S_t → X_t → Y_t]

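"Inverting" the model S(t) → X(t) → Y(t) means reasoning from observed symptoms back to the hidden infection state. The sketch below does this with Bayes' rule for a binary infection variable. All probabilities are made-up illustrative numbers, and the function name is hypothetical.

```python
def p_infected_given_symptoms(p_infected_given_s, p_sympt_infected, p_sympt_healthy):
    """Posterior P(X = infected | Y = symptoms, S) by inverting the model."""
    joint_infected = p_infected_given_s * p_sympt_infected
    joint_healthy = (1 - p_infected_given_s) * p_sympt_healthy
    return joint_infected / (joint_infected + joint_healthy)


# Assumed model of the hidden cause (not from the slides):
# medication lowers the chance of infection; infection produces symptoms.
P_INFECTED = {"medicated": 0.2, "unmedicated": 0.6}   # P(X | S)
P_SYMPT_INFECTED, P_SYMPT_HEALTHY = 0.9, 0.1          # P(Y | X)

post_medicated = p_infected_given_symptoms(
    P_INFECTED["medicated"], P_SYMPT_INFECTED, P_SYMPT_HEALTHY)
post_unmedicated = p_infected_given_symptoms(
    P_INFECTED["unmedicated"], P_SYMPT_INFECTED, P_SYMPT_HEALTHY)
print(post_medicated, post_unmedicated)
```

A model-free learner would only tabulate how often medication is followed by symptoms; the hidden variable X never appears in its computations.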
Neurobiological Foundation?
- Reversal Task: Does the (human) brain record that when one option goes bad, the other must be better? (Hampton ea, JN 2006; three-option case: Beierholm ea, NeuroImage 2011)

More Challenging… see correlation study in Class 3
- Underlying correlation changes
- Do humans learn by trial and error (reinforcement) or by explicitly tracking correlation (Bayesian)? (Wunderlich ea, Neuron 2011)

Choices…
[Figure. Recoverable labels: "Subject" and "Complete Info Model".]

Brain Activation…
[Figure. Recoverable labels: "Correlation" and "Correlation Prediction Error"; orientation markers A, R; slice z = 7.]

4. Bayesians Follow Evidence for ALL Hypotheses
- …as opposed to "attention gating" (hypothesis testing): pick one hypothesis and accept it until evidence gathers against it.
- Bayesians "marginalize" across hypotheses.

The Task.
- Two modalities ("dimensions") may "cause" reward; choose Top or Bottom (Wunderlich ea, J Neurophys 2011)

Analysis: Weight on Each Dimension
- Subject could choose based on motion even if she is more confident that color is right, because confidence in the choice conditional on motion is higher…
[Figure. Recoverable labels: "COLOR" (green / red), "DIMENSION" (color / motion), and "MOTION" (right / left) plotted against trial (0 to 300).]

Activation…
- To be able to weigh appropriately the evidence for the two dimensions in final choice, you need a signal of confidence (left) or uncertainty (right) for the two dimensions (summed here).
[Figure, panels A and B. Recoverable slice labels: x = 2, x = 0, z = 35, z = 10.]

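The difference between marginalizing and gating in the two-dimension task can be sketched with one trial. The numbers are hypothetical: the subject believes color is the relevant dimension with probability 0.6, color weakly favors Top, and motion strongly favors Bottom.

```python
def marginal_p_top(p_color_relevant, p_top_given_color, p_top_given_motion):
    """Bayesian: weigh both dimensions by the belief that each is relevant."""
    return (p_color_relevant * p_top_given_color
            + (1 - p_color_relevant) * p_top_given_motion)


def gated_p_top(p_color_relevant, p_top_given_color, p_top_given_motion):
    """Hypothesis testing: commit to the more likely dimension, ignore the other."""
    if p_color_relevant >= 0.5:
        return p_top_given_color
    return p_top_given_motion


# Hypothetical trial: color is the more likely hypothesis but says little,
# motion is the less likely hypothesis but points firmly to Bottom.
p_color, p_top_color, p_top_motion = 0.6, 0.55, 0.10

p_bayes = marginal_p_top(p_color, p_top_color, p_top_motion)
p_gated = gated_p_top(p_color, p_top_color, p_top_motion)

bayes_choice = "Top" if p_bayes > 0.5 else "Bottom"
gated_choice = "Top" if p_gated > 0.5 else "Bottom"
print(p_bayes, bayes_choice)   # follows motion despite favoring color overall
print(p_gated, gated_choice)   # follows color only
```

The marginalizing chooser ends up acting on motion even though color is the more probable hypothesis, which is the pattern described on the analysis slide.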