SLIDE 1

Introduction to Bayesian Methods from a Cognitive Perspective

Tejas D Kulkarni (tejask@mit.edu) MIT 9.S915

SLIDE 2

Everyday Inductive Leaps

  • How do we learn so much from so little data?
  • Properties of natural kinds
  • One-shot recognition of novel objects
  • Meanings of words
  • Future outcomes of dynamic processes
  • Hidden causal properties of objects
  • Causes of a person's actions (beliefs, goals)
  • Causal laws governing a domain

SLIDE 3

Learning concepts and words

[Figure: three example objects, each labeled "tufa", shown alongside an array of other novel objects.]

Can you pick out the tufas?

SLIDE 4

Why Probability?

  • Our internal models of reality are often incomplete. Therefore we need a mathematical language for handling uncertainty.
  • Probability theory is a framework that extends logic to include reasoning about uncertain information.
  • Probability need not have anything to do with randomness. "Probabilities do not describe reality -- only our information about reality." - E.T. Jaynes
  • Bayesian statistics describes epistemological uncertainty (epistemology: the study of the nature and scope of knowledge) using the mathematical language of probability.
  • Start with prior beliefs and update them using data to give posterior beliefs.

SLIDE 5

Fundamentals

Given: D = {x1, x2, ..., xn}
Prior probability: P(H)
Likelihood: P(D | H = h)
Posterior:

P(H = h | D) = P(D | H = h) P(H = h) / Σ_i P(D | H = h_i) P(H = h_i)
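
A minimal sketch of this update over a discrete hypothesis space (plain Python; the two-dice example is purely illustrative and not from the slides):

def posterior(priors, likelihood, data):
    """Posterior P(H = h | D) for each hypothesis h, via Bayes' rule."""
    unnorm = {h: likelihood(data, h) * p for h, p in priors.items()}
    z = sum(unnorm.values())                       # denominator: sum_i P(D | h_i) P(h_i)
    return {h: v / z for h, v in unnorm.items()}

# Illustrative use: is a die 6-sided or 20-sided, given three rolls that are all <= 6?
die_lik = lambda rolls, h: (1.0 / h) ** len(rolls)
print(posterior({6: 0.5, 20: 0.5}, die_lik, [3, 5, 1]))   # -> {6: ~0.974, 20: ~0.026}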

SLIDE 6

Hypothesis Testing: Coin Flipping

Data (D): H H T H T

H1 => fair coin, H2 => always heads
P(D | H1) = 1/2^5        P(D | H2) = 0
P(H1) = 0.5              P(H2) = 0.5

Posterior odds: P(H1 | D) / P(H2 | D) = [P(D | H1) P(H1)] / [P(D | H2) P(H2)] = ∞ (a single tail rules out H2)

SLIDE 7

Hypothesis Testing: Coin Flipping

Data (D): H H H H H

H1 => fair coin, H2 => always heads
P(D | H1) = 1/2^5        P(D | H2) = 1
P(H1) = 999/1000         P(H2) = 1/1000

Posterior odds: P(H1 | D) / P(H2 | D) = [(1/2^5)(999/1000)] / [(1)(1/1000)] ≈ 31.2, so the fair coin is still favored.

SLIDE 8

Hypothesis Testing: Coin Flipping

Data (D): H H H H H H H H H H

H1 => fair coin, H2 => always heads
P(D | H1) = 1/2^10       P(D | H2) = 1
P(H1) = 999/1000         P(H2) = 1/1000

Posterior odds: P(H1 | D) / P(H2 | D) = [(1/2^10)(999/1000)] / [(1)(1/1000)] ≈ 0.98; after ten heads in a row the odds are roughly even and beginning to favor the trick coin.
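
The three posterior-odds values above can be checked with a few lines of Python (a quick sketch using the slides' own numbers):

def posterior_odds(lik1, lik2, prior1, prior2):
    """P(H1 | D) / P(H2 | D) = [P(D | H1) P(H1)] / [P(D | H2) P(H2)]."""
    num, den = lik1 * prior1, lik2 * prior2
    return float("inf") if den == 0 else num / den

# H1 = fair coin, H2 = always heads
print(posterior_odds(0.5 ** 5, 0.0, 0.5, 0.5))         # H H T H T, equal priors  -> inf
print(posterior_odds(0.5 ** 5, 1.0, 0.999, 0.001))     # H H H H H                -> ~31.2
print(posterior_odds(0.5 ** 10, 1.0, 0.999, 0.001))    # ten heads in a row       -> ~0.98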

SLIDE 9

Example: Vision as Inverse Graphics

[Figure: generative model for faces. Per-part shape S_i and texture T_i latents (nose, eyes, outline, mouth), together with light L, affine pose A, and a face-id, drive a shading simulator that renders the image I.]

Inference Problem:

P(S, T, L, A | I) ∝ P(I | S, T, L, A) P(L) P(S) P(T) P(A)
                  ∝ N(I − O; 0, 0.1) P(L) P(A) Π_i P(S_i) P(T_i)
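
A sketch of the forward direction of this model in Python. Only the structure (sample latents from their priors, render, score the observed image under a Gaussian likelihood) follows the slides; render_face is a toy stand-in for the shading simulator, and the 50-dimensional part codes, 3-dimensional light, and 6-dimensional affine pose are assumed sizes:

import numpy as np

PARTS = ("nose", "eyes", "outline", "mouth")

def render_face(shape, texture, light, affine):
    """Toy stand-in for the shading simulator: a fixed linear projection of the latent
    codes to a 64x64 'image'. A real system would use a morphable-model / graphics renderer."""
    w = np.random.default_rng(0).standard_normal((64 * 64, 50 * 2 * len(PARTS) + 9))
    z = np.concatenate([shape[p] for p in PARTS] + [texture[p] for p in PARTS] + [light, affine])
    return (w @ z).reshape(64, 64)

def random_draw(rng):
    """A 'random draw' as on the next slides: sample every latent from its prior."""
    shape = {p: rng.standard_normal(50) for p in PARTS}
    texture = {p: rng.standard_normal(50) for p in PARTS}
    return shape, texture, rng.standard_normal(3), rng.standard_normal(6)

def log_score(image, shape, texture, light, affine, sigma=0.1):
    """Unnormalized log posterior: Gaussian pixel likelihood plus the priors on S_i, T_i
    (priors on light/affine omitted here; slide 32 fixes those variables anyway)."""
    rendered = render_face(shape, texture, light, affine)
    log_lik = -0.5 * np.sum((image - rendered) ** 2) / sigma ** 2
    log_prior = sum(-0.5 * np.sum(v ** 2) for d in (shape, texture) for v in d.values())
    return log_lik + log_prior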

SLIDE 10

Example: Vision as Inverse Graphics

[Figure: a random draw from the priors over the model's latent variables, and the image rendered from it.]

SLIDE 11

Example: Vision as Inverse Graphics

[Figure: another random draw from the priors and its rendered image.]

SLIDE 12

Example: Vision as Inverse Graphics

[Figure: another random draw from the priors and its rendered image.]

SLIDE 13

Example: Vision as Inverse Graphics

[Figure: another random draw from the priors and its rendered image.]

SLIDE 14

[Figure: the same generative model with the image observed and the latent variables (shape, texture, light, affine) marked as unknown.]

SLIDE 15

[Figure: the face generative model diagram, repeated.]

SLIDE 16

[Figure: the face generative model diagram, repeated.]

SLIDE 17

[Figure: the face generative model diagram, repeated.]

SLIDE 18

Example: Vision as Inverse Graphics

Aldrian et al., Inverse Rendering with a Morphable Model: A Multilinear Approach, ECCV 2011

SLIDE 19

Optimal Predictions in Everyday Cognition

  • How well do cognitive judgements compare with optimal statistical inferences in real-world settings?
  • In Griffiths & Tenenbaum [06], people were asked to predict the duration or extent of everyday phenomena such as human life spans and the grosses of movies.
  • Across the experiments, the phenomenon and the amount of data given for each phenomenon were parametrically varied, so that the predictions of an optimal Bayesian model could be tested against the reported human predictions.

SLIDE 20

Optimal Predictions in Everyday Cognition

  • Let t_total denote, e.g., the total amount of time a person will live, and t his or her current age.
  • The Bayesian predictor computes a probability distribution over t_total given t by applying Bayes' rule:

    P(t_total | t) ∝ p(t | t_total) p(t_total)

  • The likelihood p(t | t_total) is the probability of first encountering a person at age t given that their total life span is t_total.
  • For example, when we are equally likely to meet a person at any point in their life, the likelihood is uniform: p(t | t_total) = 1/t_total
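
A numerical sketch of this predictor. The uniform likelihood and the posterior-median readout follow the setup above; the Gaussian prior over life spans below is only an assumed placeholder (Griffiths & Tenenbaum fit empirical priors for each phenomenon):

import numpy as np

def predict_total(t, prior, grid):
    """Posterior over t_total given current age t, with likelihood p(t | t_total) = 1/t_total."""
    lik = np.where(grid >= t, 1.0 / grid, 0.0)     # zero below t: t_total cannot be less than t
    post = lik * prior(grid)
    post /= post.sum()
    return grid[np.searchsorted(np.cumsum(post), 0.5)]   # posterior median prediction

grid = np.arange(1.0, 121.0)                              # candidate life spans 1..120
gaussian_prior = lambda x: np.exp(-0.5 * ((x - 75.0) / 16.0) ** 2)   # assumed, not fitted
for t in (18, 39, 61, 83, 96):
    print(t, predict_total(t, gaussian_prior, grid))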

SLIDE 21

Sample Questions

SLIDE 22

Comparing model with humans ...

SLIDE 23
Comparing model with humans ...

  • These results are inconsistent with claims that cognitive judgments are based on non-Bayesian heuristics that are insensitive to priors (Kahneman et al., 1982; Tversky & Kahneman, 1974).
  • The results are also inconsistent with simpler Bayesian prediction models that adopt a single uninformative prior, p(t_total) ∝ 1/t_total, regardless of the phenomenon to be predicted (Gott, 1993, 1994; Jaynes, 2003; Jeffreys, 1961; Ledford et al., 2001).
  • Why is the variance high for the Pharaoh experiment?

SLIDE 24
Comparing model with humans ...

  • Given an unfamiliar prediction task, people might be able to identify the appropriate form of the distribution by making an analogy to more familiar phenomena in the same broad class, even if they do not have sufficient direct experience to set the parameters of that distribution accurately.
  • If participants predicted the reign of the pharaoh by drawing an analogy to modern monarchs and adjusting the mean reign duration downward by some uncertain but insufficient factor, that would be entirely consistent with the pattern of errors observed.
  • Such a strategy of prediction by analogy could be an adaptive way of making judgments that would otherwise lie beyond people's limited base of knowledge and experience.

Ref: http://web.mit.edu/cocosci/Papers/Griffiths-Tenenbaum-PsychSci06.pdf

SLIDE 25
Graphical Models: Bayes Nets

  • A compact way to represent probabilities.
  • A mental model of the causal information flow that gives rise to data/observations.

SLIDE 26

Graphical Models: Bayes Nets

[Figure: the sprinkler network. Variables: C = cloudy, S = sprinkler, R = rain, W = wet grass.]

Joint: P(C, S, R, W) = P(C) P(S | C) P(R | C, S) P(W | C, S, R)

Space required to represent the probability table: O(2^N)

SLIDE 27

Graphical Models: Bayes Nets

Joint (with conditional independence): P(C, S, R, W) = P(C) P(S | C) P(R | C) P(W | S, R)

Space required to represent the probability tables (K is the maximum fan-in of a node): O(N · 2^K)

SLIDE 28

Inference

Suppose we observe that the grass is wet. There are two possible causes: (1) it is raining, or (2) the sprinkler is on. Which is more likely?

P(S = 1 | W = 1) = P(S = 1, W = 1) / P(W = 1)
                 = Σ_{c,r} P(C = c, S = 1, R = r, W = 1) / P(W = 1)
                 = 0.2781 / 0.6471 ≈ 0.43

P(R = 1 | W = 1) = P(R = 1, W = 1) / P(W = 1)
                 = Σ_{c,s} P(C = c, S = s, R = 1, W = 1) / P(W = 1)
                 = 0.4581 / 0.6471 ≈ 0.71

where P(W = 1) = Σ_{c,s,r} P(C = c, S = s, R = r, W = 1) = 0.6471.

So rain is the more likely explanation.
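
The numbers above can be reproduced by brute-force enumeration of the joint. The CPTs below are the standard sprinkler-network tables from Kevin Murphy's Bayes-net tutorial (linked in the references); with them the code recovers the marginals quoted on this slide and the explaining-away value on the next one:

from itertools import product

P_C = {0: 0.5, 1: 0.5}
P_S = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}                # P(S | C)
P_R = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}}                # P(R | C)
P_W1 = {(0, 0): 0.0, (1, 0): 0.9, (0, 1): 0.9, (1, 1): 0.99}    # P(W = 1 | S, R)

def joint(c, s, r, w):
    pw = P_W1[(s, r)]
    return P_C[c] * P_S[c][s] * P_R[c][r] * (pw if w == 1 else 1.0 - pw)

def marginal(**fixed):
    """Sum the joint over every assignment consistent with the fixed values."""
    return sum(joint(c, s, r, w)
               for c, s, r, w in product((0, 1), repeat=4)
               if all(dict(C=c, S=s, R=r, W=w)[k] == v for k, v in fixed.items()))

pw1 = marginal(W=1)                                   # 0.6471
print(marginal(S=1, W=1) / pw1)                       # P(S=1 | W=1)       ~ 0.430
print(marginal(R=1, W=1) / pw1)                       # P(R=1 | W=1)       ~ 0.708
print(marginal(S=1, R=1, W=1) / marginal(R=1, W=1))   # P(S=1 | W=1, R=1)  ~ 0.19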

SLIDE 29

Bayes Nets: Explaining Away

  • Two causes (R = 1 and S = 1) were competing to explain the data. Therefore, S and R become conditionally dependent given that W is observed (even though they are marginally independent).
  • Suppose the grass is wet (W = 1) but we also know that it is raining (R = 1). Then the posterior probability that the sprinkler is on goes down:

    P(S = 1 | W = 1, R = 1) = 0.19

  • Remember from earlier:

    P(S = 1 | W = 1) = Σ_{c,r} P(C = c, S = 1, R = r, W = 1) / P(W = 1) = 0.2781 / 0.6471 ≈ 0.43

SLIDE 30

More complex models ...

SLIDE 31

Inference

  • There are many inference strategies for generative models: MCMC, variational inference, message passing, particle filtering, etc.
  • Today we will discuss an algorithm that is simple and general (though not always efficient).

SLIDE 32

Inference

  • Simplest MCMC algorithm: Metropolis-Hastings.
  • For simplicity, let us fix the light and affine variables.

[Figure: the face generative model with the light and affine variables held fixed.]

S_Nose ∼ randn(50)    T_Nose ∼ randn(50)
. . .
S_Mouth ∼ randn(50)   T_Mouth ∼ randn(50)

P(I | S, T) ∝ Normal(O − R; 0, σ_0)

SLIDE 33

MCMC

  • Simplest MCMC algorithm: Metropolis-Hastings (sketched in code below).
  • For simplicity, let us fix the light and affine variables.

S_Nose ∼ randn(50)    T_Nose ∼ randn(50)
. . .
S_Mouth ∼ randn(50)   T_Mouth ∼ randn(50)

P(I | S, T) ∝ Normal(O − R; 0, σ_0)

Repeat until convergence:
(1) Pick x to be one of the S_i or T_i, and sample a proposal x′ ∼ randn(50).
(2) Compute the ratio r = [p(x′) q(x | x′)] / [p(x) q(x′ | x)].
(3) Accept x′ with probability α = min{1, r}; otherwise keep x′ = x.
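
A minimal Metropolis-Hastings sketch for this setup, reusing the hypothetical random_draw / render_face / log_score functions from the SLIDE 9 example. The block-wise updates and the independent randn(50) proposal follow the slide; with that independence proposal q(x′ | x) = N(x′; 0, I), so the correction terms in r reduce to the proposal densities at x and x′:

import numpy as np

def mh_step(x, log_target, rng):
    """One Metropolis-Hastings step with an independent N(0, I) proposal.
    Acceptance ratio: r = [p(x') q(x)] / [p(x) q(x')]."""
    x_new = rng.standard_normal(x.shape)
    log_q = lambda v: -0.5 * np.sum(v ** 2)            # log N(v; 0, I), up to a constant
    log_r = log_target(x_new) - log_target(x) + log_q(x) - log_q(x_new)
    return x_new if np.log(rng.uniform()) < log_r else x

def run_chain(image, shape, texture, light, affine, n_sweeps, rng):
    """Sweep over the per-part blocks, proposing a new 50-dim code for one S_i or T_i at a
    time and accepting/rejecting against the full unnormalized posterior (light/affine fixed)."""
    for _ in range(n_sweeps):
        for latents in (shape, texture):               # dicts: part -> 50-dim code
            for part in latents:
                def log_target(v, latents=latents, part=part):
                    old = latents[part]
                    latents[part] = v                  # temporarily plug in the proposal
                    score = log_score(image, shape, texture, light, affine)
                    latents[part] = old
                    return score
                latents[part] = mh_step(latents[part], log_target, rng)
    return shape, texture

# Illustrative run on a synthetic image rendered from a known draw:
rng = np.random.default_rng(1)
true_shape, true_texture, light, affine = random_draw(rng)
image = render_face(true_shape, true_texture, light, affine)
shape, texture, _, _ = random_draw(rng)                # fresh initialization
run_chain(image, shape, texture, light, affine, n_sweeps=20, rng=rng)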

SLIDE 34
Towards more human-like learning

  • Bayesian modeling allows us to move beyond parameter estimation and infer the structure of the model itself.
  • Depending on the data, humans use different structural forms to build abstractions.
  • Structure learning can be cast as a posterior inference problem: find the most likely model form/structure under the observations.

Kemp et al., The discovery of structural form, PNAS 2008

SLIDE 35

Towards more human-like learning

Kemp et al., The discovery of structural form, PNAS 2008

SLIDE 36

Towards more human-like learning

Kemp et al., The discovery of structural form, PNAS 2008

SLIDE 37

Growing Abstractions

  • Humans can grow abstractions in an arbitrary way to fit data.
  • Bayesian non-parametric models naturally increase the number of parameters with the data, so there is no issue of over- or under-fitting a fixed model size.
  • Many non-parametric processes: Chinese Restaurant Process (sketched below), Gaussian Process, HDP, Indian Buffet Process, etc.

K = ? (the number of components is not fixed in advance)

Ref: http://mlg.eng.cam.ac.uk/zoubin/talks/uai05tutorial-b.pdf
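
A quick sketch of one of these, the Chinese Restaurant Process: each new data point joins an existing cluster with probability proportional to the cluster's size, or starts a new cluster with probability proportional to alpha, so the number of clusters K grows with the data instead of being fixed in advance (illustrative code, not from the slides):

import numpy as np

def crp(n_customers, alpha, rng):
    """Sample a seating arrangement (cluster assignment) from a Chinese Restaurant Process."""
    tables, assignments = [], []                  # tables[k] = number of customers at table k
    for _ in range(n_customers):
        weights = np.array(tables + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(tables):
            tables.append(0)                      # open a new table (cluster)
        tables[k] += 1
        assignments.append(int(k))
    return assignments, len(tables)

rng = np.random.default_rng(0)
for n in (10, 100, 1000):
    print(n, crp(n, alpha=1.0, rng=rng)[1])       # K grows with n, roughly like alpha * log(n)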

SLIDE 38

Growing Abstractions

[Figure panels: DP Mixture; Infinite HMM]

Ref: http://mlg.eng.cam.ac.uk/zoubin/talks/uai05tutorial-b.pdf

SLIDE 39

Probabilistic Programs

Ref: Slide idea from Noah Goodman's PLOT presentation

[Figure: bar plot of P(n) for n = 0, 1, 2, 3.]

P(n) = C(3, n) · 0.3^n · 0.7^(3−n)

Probabilistic Program:
(assume a (flip 0.3))
(assume b (flip 0.3))
(assume c (flip 0.3))
(+ a b c)

Executions: (a=1, b=0, c=0), (a=0, b=0, c=0), (a=1, b=0, c=1), ...

Theorem: Any computable distribution can be represented by a Church expression (Freer & Roy, 2012)
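
The Church program above defines a distribution over the value of (+ a b c); a few lines of Python confirm by simulation that it matches the binomial formula P(n) = C(3, n) 0.3^n 0.7^(3−n) (an illustrative check, not part of the slides):

import random
from collections import Counter
from math import comb

flip = lambda p: 1 if random.random() < p else 0

runs = Counter(flip(0.3) + flip(0.3) + flip(0.3) for _ in range(100_000))   # executions of the program
for n in range(4):
    print(n, runs[n] / 100_000, comb(3, n) * 0.3 ** n * 0.7 ** (3 - n))     # empirical vs. exact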

SLIDE 40

Example: Probabilistic Programs

SLIDE 41

Example: Probabilistic Programs

ASSUME road_width (uniform_discrete 5 8)          // arbitrary units
ASSUME road_height (uniform_discrete 70 150)
ASSUME lane_pos_x (uniform_continuous -1.0 1.0)   // uncentered renderer
ASSUME lane_pos_y (uniform_continuous -5.0 0.0)   // coordinate system
ASSUME lane_pos_z (uniform_continuous 1.0 3.5)
ASSUME lane_size (uniform_continuous 0.10 0.35)
ASSUME eps (gamma 1 1)
ASSUME theta_left (list 0.13 ... 0.03)
ASSUME theta_right (list 0.03 ... 0.02)
ASSUME theta_road (list 0.05 ... 0.07)
ASSUME theta_lane (list 0.01 ... 0.21)
ASSUME surfaces (render_surfaces lane_pos_x lane_pos_y lane_pos_z road_width road_height lane_size)
ASSUME data (load_image "frame201.png")
OBSERVE (incorporate_stochastic_likelihood theta_left theta_right theta_road theta_lane data surfaces eps) True

SLIDE 42

Example: Probabilistic Programs

Method                                                 Accuracy
Aly et al. [1]                                         68.31%
GPGP (Best Single Appearance)                          64.56%
GPGP (Maximum Likelihood over Multiple Appearances)    74.60%

SLIDE 43

Example: Probabilistic Programs

[Figure: panels labeled "superimposed", "sampled", (d). When the model's assumptions are violated the posterior is broad; when they are satisfied the posterior is narrower.]

SLIDE 44
Probabilistic Programming

  • Our conceptual knowledge about the world is productive (requiring abstractions) and graded (requiring the handling of uncertainty).
  • Programming languages have an unparalleled ability to be expressive. Stochastic primitives give languages the ability to handle uncertainty. This makes stochastic lambda calculus an appealing framework for expressing and instantiating concepts in human modeling and AI.
  • A more human-like approach to learning is to infer the most likely program given observations.
  • Challenges remain in scaling up general-purpose inference.

SLIDE 45

References

  • http://www.scholarpedia.org/article/Bayesian_statistics
  • http://bayes.wustl.edu/etj/articles/backward.look.pdf
  • http://www.johndcook.com/blog/2008/02/26/what-a-probability-means/
  • Josh Tenenbaum, presentation on "Bayesian Models of Human Learning and Inference"
  • http://web.mit.edu/cocosci/Papers/Griffiths-Tenenbaum-PsychSci06.pdf
  • http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html
