SLIDE 1

Introduction to Bayesian Methods from a Cognitive Perspective

Tejas D Kulkarni (tejask@mit.edu) MIT 9.S915

SLIDE 2

Everyday Inductive Leaps

  • How do we learn so much from so little data?
  • Properties of natural kinds
  • One-shot recognition of novel objects
  • Meanings of words
  • Future outcomes of dynamic processes
  • Hidden causal properties of objects
  • Causes of a person's actions (beliefs, goals)
  • Causal laws governing a domain

SLIDE 3

Learning concepts and words

[Figure: three example objects, each labeled "tufa", shown alongside an array of other novel objects.]

Can you pick out the tufas?

SLIDE 4

Why Probability?

  • Our internal models of reality are often incomplete. Therefore we need a mathematical language for handling uncertainty.
  • Probability theory is a framework that extends logic to include reasoning about uncertain information.
  • Probability need not have anything to do with randomness. "Probabilities do not describe reality -- only our information about reality." - E.T. Jaynes
  • Bayesian statistics describes epistemological uncertainty (epistemology: the study of the nature and scope of knowledge) using the mathematical language of probability.
  • Start with prior beliefs and update them using data to give posterior beliefs.

SLIDE 5

Fundamentals

Given: D = {x1, x2, ..., xn}
Prior probability: P(H)
Likelihood: P(D | H = h)
Posterior:

P(H = h | D) = P(D | H = h) P(H = h) / Σ_i P(D | H = h_i) P(H = h_i)
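
A minimal sketch of this update over a discrete hypothesis space (plain Python; the two-dice example is purely illustrative and not from the slides):

def posterior(priors, likelihood, data):
    """Posterior P(H = h | D) for each hypothesis h, via Bayes' rule."""
    unnorm = {h: likelihood(data, h) * p for h, p in priors.items()}
    z = sum(unnorm.values())                       # denominator: sum_i P(D | h_i) P(h_i)
    return {h: v / z for h, v in unnorm.items()}

# Illustrative use: is a die 6-sided or 20-sided, given three rolls that are all <= 6?
die_lik = lambda rolls, h: (1.0 / h) ** len(rolls)
print(posterior({6: 0.5, 20: 0.5}, die_lik, [3, 5, 1]))   # -> {6: ~0.974, 20: ~0.026}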

SLIDE 6

Hypothesis Testing: Coin Flipping

Data (D): H H T H T

H1 => fair coin, H2 => always heads
P(D | H1) = 1/2^5        P(D | H2) = 0
P(H1) = 0.5              P(H2) = 0.5

Posterior odds: P(H1 | D) / P(H2 | D) = [P(D | H1) P(H1)] / [P(D | H2) P(H2)] = ∞ (a single tail rules out H2)

SLIDE 7

Hypothesis Testing: Coin Flipping

Data (D): H H H H H

H1 => fair coin, H2 => always heads
P(D | H1) = 1/2^5        P(D | H2) = 1
P(H1) = 999/1000         P(H2) = 1/1000

Posterior odds: P(H1 | D) / P(H2 | D) = [(1/2^5)(999/1000)] / [(1)(1/1000)] ≈ 31.2, so the fair coin is still favored.

SLIDE 8

Hypothesis Testing: Coin Flipping

Data (D): H H H H H H H H H H

H1 => fair coin, H2 => always heads
P(D | H1) = 1/2^10       P(D | H2) = 1
P(H1) = 999/1000         P(H2) = 1/1000

Posterior odds: P(H1 | D) / P(H2 | D) = [(1/2^10)(999/1000)] / [(1)(1/1000)] ≈ 0.98; after ten heads in a row the odds are roughly even and beginning to favor the trick coin.
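
The three posterior-odds values above can be checked with a few lines of Python (a quick sketch using the slides' own numbers):

def posterior_odds(lik1, lik2, prior1, prior2):
    """P(H1 | D) / P(H2 | D) = [P(D | H1) P(H1)] / [P(D | H2) P(H2)]."""
    num, den = lik1 * prior1, lik2 * prior2
    return float("inf") if den == 0 else num / den

# H1 = fair coin, H2 = always heads
print(posterior_odds(0.5 ** 5, 0.0, 0.5, 0.5))         # H H T H T, equal priors  -> inf
print(posterior_odds(0.5 ** 5, 1.0, 0.999, 0.001))     # H H H H H                -> ~31.2
print(posterior_odds(0.5 ** 10, 1.0, 0.999, 0.001))    # ten heads in a row       -> ~0.98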

SLIDE 9

Example: Vision as Inverse Graphics

[Figure: generative model for faces. Per-part shape S_i and texture T_i latents (nose, eyes, outline, mouth), together with light L, affine pose A, and a face-id, drive a shading simulator that renders the image I.]

Inference Problem:

P(S, T, L, A | I) ∝ P(I | S, T, L, A) P(L) P(S) P(T) P(A)
                  ∝ N(I − O; 0, 0.1) P(L) P(A) Π_i P(S_i) P(T_i)
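
A sketch of the forward direction of this model in Python. Only the structure (sample latents from their priors, render, score the observed image under a Gaussian likelihood) follows the slides; render_face is a toy stand-in for the shading simulator, and the 50-dimensional part codes, 3-dimensional light, and 6-dimensional affine pose are assumed sizes:

import numpy as np

PARTS = ("nose", "eyes", "outline", "mouth")

def render_face(shape, texture, light, affine):
    """Toy stand-in for the shading simulator: a fixed linear projection of the latent
    codes to a 64x64 'image'. A real system would use a morphable-model / graphics renderer."""
    w = np.random.default_rng(0).standard_normal((64 * 64, 50 * 2 * len(PARTS) + 9))
    z = np.concatenate([shape[p] for p in PARTS] + [texture[p] for p in PARTS] + [light, affine])
    return (w @ z).reshape(64, 64)

def random_draw(rng):
    """A 'random draw' as on the next slides: sample every latent from its prior."""
    shape = {p: rng.standard_normal(50) for p in PARTS}
    texture = {p: rng.standard_normal(50) for p in PARTS}
    return shape, texture, rng.standard_normal(3), rng.standard_normal(6)

def log_score(image, shape, texture, light, affine, sigma=0.1):
    """Unnormalized log posterior: Gaussian pixel likelihood plus the priors on S_i, T_i
    (priors on light/affine omitted here; slide 32 fixes those variables anyway)."""
    rendered = render_face(shape, texture, light, affine)
    log_lik = -0.5 * np.sum((image - rendered) ** 2) / sigma ** 2
    log_prior = sum(-0.5 * np.sum(v ** 2) for d in (shape, texture) for v in d.values())
    return log_lik + log_prior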

SLIDE 10

Example: Vision as Inverse Graphics

[Figure: a random draw from the priors over the model's latent variables, and the image rendered from it.]

SLIDE 11

Example: Vision as Inverse Graphics

[Figure: another random draw from the priors and its rendered image.]

SLIDE 12

Example: Vision as Inverse Graphics

[Figure: another random draw from the priors and its rendered image.]

SLIDE 13

Example: Vision as Inverse Graphics

[Figure: another random draw from the priors and its rendered image.]

SLIDE 14

[Figure: the same generative model with the image observed and the latent variables (shape, texture, light, affine) marked as unknown.]

SLIDE 15

[Figure: the face generative model diagram, repeated.]

SLIDE 16

[Figure: the face generative model diagram, repeated.]

SLIDE 17

[Figure: the face generative model diagram, repeated.]

SLIDE 18

Example: Vision as Inverse Graphics

Aldrian et al., Inverse Rendering with a Morphable Model: A Multilinear Approach, ECCV 2011

SLIDE 19

Optimal Predictions in Everyday Cognition

  • How well do cognitive judgements compare with optimal statistical inferences in real-world settings?
  • In Griffiths & Tenenbaum [06], people were asked to predict the duration or extent of everyday phenomena such as human life spans and the grosses of movies.
  • Across the experiments, the phenomenon and the amount of data given for each phenomenon were parametrically varied, so that the predictions of an optimal Bayesian model could be tested against the reported human predictions.

SLIDE 20

Optimal Predictions in Everyday Cognition

  • Let t_total denote, e.g., the total amount of time a person will live, and t his or her current age.
  • The Bayesian predictor computes a probability distribution over t_total given t by applying Bayes' rule:

    P(t_total | t) ∝ p(t | t_total) p(t_total)

  • The likelihood p(t | t_total) is the probability of first encountering a person at age t given that their total life span is t_total.
  • For example, when we are equally likely to meet a person at any point in their life, the likelihood is uniform: p(t | t_total) = 1/t_total
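
A numerical sketch of this predictor. The uniform likelihood and the posterior-median readout follow the setup above; the Gaussian prior over life spans below is only an assumed placeholder (Griffiths & Tenenbaum fit empirical priors for each phenomenon):

import numpy as np

def predict_total(t, prior, grid):
    """Posterior over t_total given current age t, with likelihood p(t | t_total) = 1/t_total."""
    lik = np.where(grid >= t, 1.0 / grid, 0.0)     # zero below t: t_total cannot be less than t
    post = lik * prior(grid)
    post /= post.sum()
    return grid[np.searchsorted(np.cumsum(post), 0.5)]   # posterior median prediction

grid = np.arange(1.0, 121.0)                              # candidate life spans 1..120
gaussian_prior = lambda x: np.exp(-0.5 * ((x - 75.0) / 16.0) ** 2)   # assumed, not fitted
for t in (18, 39, 61, 83, 96):
    print(t, predict_total(t, gaussian_prior, grid))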

SLIDE 21

Sample Questions

SLIDE 22

Comparing model with humans ...

SLIDE 23
Comparing model with humans ...

  • These results are inconsistent with claims that cognitive judgments are based on non-Bayesian heuristics that are insensitive to priors (Kahneman et al., 1982; Tversky & Kahneman, 1974).
  • The results are also inconsistent with simpler Bayesian prediction models that adopt a single uninformative prior, p(t_total) ∝ 1/t_total, regardless of the phenomenon to be predicted (Gott, 1993, 1994; Jaynes, 2003; Jeffreys, 1961; Ledford et al., 2001).
  • Why is the variance high for the Pharaoh experiment?

SLIDE 24
Comparing model with humans ...

  • Given an unfamiliar prediction task, people might be able to identify the appropriate form of the distribution by making an analogy to more familiar phenomena in the same broad class, even if they do not have sufficient direct experience to set the parameters of that distribution accurately.
  • If participants predicted the reign of the pharaoh by drawing an analogy to modern monarchs and adjusting the mean reign duration downward by some uncertain but insufficient factor, that would be entirely consistent with the pattern of errors observed.
  • Such a strategy of prediction by analogy could be an adaptive way of making judgments that would otherwise lie beyond people's limited base of knowledge and experience.

Ref: http://web.mit.edu/cocosci/Papers/Griffiths-Tenenbaum-PsychSci06.pdf

SLIDE 25
Graphical Models: Bayes Nets

  • A compact way to represent probabilities.
  • A mental model of the causal information flow that gives rise to data/observations.

SLIDE 26

Graphical Models: Bayes Nets

[Figure: the sprinkler network. Variables: C = cloudy, S = sprinkler, R = rain, W = wet grass.]

Joint: P(C, S, R, W) = P(C) P(S | C) P(R | C, S) P(W | C, S, R)

Space required to represent the probability table: O(2^N)

SLIDE 27

Graphical Models: Bayes Nets

Joint (with conditional independence): P(C, S, R, W) = P(C) P(S | C) P(R | C) P(W | S, R)

Space required to represent the probability tables (K is the maximum fan-in of a node): O(N · 2^K)

SLIDE 28

Inference

Suppose we observe that the grass is wet. There are two possible causes: (1) it is raining, or (2) the sprinkler is on. Which is more likely?

P(S = 1 | W = 1) = P(S = 1, W = 1) / P(W = 1)
                 = Σ_{c,r} P(C = c, S = 1, R = r, W = 1) / P(W = 1)
                 = 0.2781 / 0.6471 ≈ 0.43

P(R = 1 | W = 1) = P(R = 1, W = 1) / P(W = 1)
                 = Σ_{c,s} P(C = c, S = s, R = 1, W = 1) / P(W = 1)
                 = 0.4581 / 0.6471 ≈ 0.71

where P(W = 1) = Σ_{c,s,r} P(C = c, S = s, R = r, W = 1) = 0.6471.

So rain is the more likely explanation.
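
The numbers above can be reproduced by brute-force enumeration of the joint. The CPTs below are the standard sprinkler-network tables from Kevin Murphy's Bayes-net tutorial (linked in the references); with them the code recovers the marginals quoted on this slide and the explaining-away value on the next one:

from itertools import product

P_C = {0: 0.5, 1: 0.5}
P_S = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}                # P(S | C)
P_R = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}}                # P(R | C)
P_W1 = {(0, 0): 0.0, (1, 0): 0.9, (0, 1): 0.9, (1, 1): 0.99}    # P(W = 1 | S, R)

def joint(c, s, r, w):
    pw = P_W1[(s, r)]
    return P_C[c] * P_S[c][s] * P_R[c][r] * (pw if w == 1 else 1.0 - pw)

def marginal(**fixed):
    """Sum the joint over every assignment consistent with the fixed values."""
    return sum(joint(c, s, r, w)
               for c, s, r, w in product((0, 1), repeat=4)
               if all(dict(C=c, S=s, R=r, W=w)[k] == v for k, v in fixed.items()))

pw1 = marginal(W=1)                                   # 0.6471
print(marginal(S=1, W=1) / pw1)                       # P(S=1 | W=1)       ~ 0.430
print(marginal(R=1, W=1) / pw1)                       # P(R=1 | W=1)       ~ 0.708
print(marginal(S=1, R=1, W=1) / marginal(R=1, W=1))   # P(S=1 | W=1, R=1)  ~ 0.19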

SLIDE 29

Bayes Nets: Explaining Away

  • Two causes (R = 1 and S = 1) were competing to explain the data. Therefore, S and R become conditionally dependent given that W is observed (even though they are marginally independent).
  • Suppose the grass is wet (W = 1) but we also know that it is raining (R = 1). Then the posterior probability that the sprinkler is on goes down:

    P(S = 1 | W = 1, R = 1) = 0.19

  • Remember from earlier:

    P(S = 1 | W = 1) = Σ_{c,r} P(C = c, S = 1, R = r, W = 1) / P(W = 1) = 0.2781 / 0.6471 ≈ 0.43

SLIDE 30

More complex models ...

SLIDE 31

Inference

  • There are many inference strategies for generative models: MCMC, variational inference, message passing, particle filtering, etc.
  • Today we will discuss an algorithm that is simple and general (though not always efficient).

SLIDE 32

Inference

  • Simplest MCMC algorithm: Metropolis-Hastings.
  • For simplicity, let us fix the light and affine variables.

[Figure: the face generative model with the light and affine variables held fixed.]

S_Nose ∼ randn(50)    T_Nose ∼ randn(50)
. . .
S_Mouth ∼ randn(50)   T_Mouth ∼ randn(50)

P(I | S, T) ∝ Normal(O − R; 0, σ_0)

SLIDE 33

MCMC

  • Simplest MCMC algorithm: Metropolis-Hastings (sketched in code below).
  • For simplicity, let us fix the light and affine variables.

S_Nose ∼ randn(50)    T_Nose ∼ randn(50)
. . .
S_Mouth ∼ randn(50)   T_Mouth ∼ randn(50)

P(I | S, T) ∝ Normal(O − R; 0, σ_0)

Repeat until convergence:
(1) Pick x to be one of the S_i or T_i, and sample a proposal x′ ∼ randn(50).
(2) Compute the ratio r = [p(x′) q(x | x′)] / [p(x) q(x′ | x)].
(3) Accept x′ with probability α = min{1, r}; otherwise keep x′ = x.
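
A minimal Metropolis-Hastings sketch for this setup, reusing the hypothetical random_draw / render_face / log_score functions from the SLIDE 9 example. The block-wise updates and the independent randn(50) proposal follow the slide; with that independence proposal q(x′ | x) = N(x′; 0, I), so the correction terms in r reduce to the proposal densities at x and x′:

import numpy as np

def mh_step(x, log_target, rng):
    """One Metropolis-Hastings step with an independent N(0, I) proposal.
    Acceptance ratio: r = [p(x') q(x)] / [p(x) q(x')]."""
    x_new = rng.standard_normal(x.shape)
    log_q = lambda v: -0.5 * np.sum(v ** 2)            # log N(v; 0, I), up to a constant
    log_r = log_target(x_new) - log_target(x) + log_q(x) - log_q(x_new)
    return x_new if np.log(rng.uniform()) < log_r else x

def run_chain(image, shape, texture, light, affine, n_sweeps, rng):
    """Sweep over the per-part blocks, proposing a new 50-dim code for one S_i or T_i at a
    time and accepting/rejecting against the full unnormalized posterior (light/affine fixed)."""
    for _ in range(n_sweeps):
        for latents in (shape, texture):               # dicts: part -> 50-dim code
            for part in latents:
                def log_target(v, latents=latents, part=part):
                    old = latents[part]
                    latents[part] = v                  # temporarily plug in the proposal
                    score = log_score(image, shape, texture, light, affine)
                    latents[part] = old
                    return score
                latents[part] = mh_step(latents[part], log_target, rng)
    return shape, texture

# Illustrative run on a synthetic image rendered from a known draw:
rng = np.random.default_rng(1)
true_shape, true_texture, light, affine = random_draw(rng)
image = render_face(true_shape, true_texture, light, affine)
shape, texture, _, _ = random_draw(rng)                # fresh initialization
run_chain(image, shape, texture, light, affine, n_sweeps=20, rng=rng)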

SLIDE 34
Towards more human-like learning

  • Bayesian modeling allows us to move beyond parameter estimation and infer the structure of the model itself.
  • Depending on the data, humans use different structural forms to build abstractions.
  • Structure learning can be cast as a posterior inference problem: find the most likely model form/structure under the observations.

Kemp et al., The discovery of structural form, PNAS 2008

SLIDE 35

Towards more human-like learning

Kemp et al., The discovery of structural form, PNAS 2008

SLIDE 36

Towards more human-like learning

Kemp et al., The discovery of structural form, PNAS 2008

SLIDE 37

Growing Abstractions

  • Humans can grow abstractions in an arbitrary way to fit data.
  • Bayesian non-parametric models naturally increase the number of parameters with the data, so there is no issue of over- or under-fitting a fixed model size.
  • Many non-parametric processes: Chinese Restaurant Process (sketched below), Gaussian Process, HDP, Indian Buffet Process, etc.

K = ? (the number of components is not fixed in advance)

Ref: http://mlg.eng.cam.ac.uk/zoubin/talks/uai05tutorial-b.pdf
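
A quick sketch of one of these, the Chinese Restaurant Process: each new data point joins an existing cluster with probability proportional to the cluster's size, or starts a new cluster with probability proportional to alpha, so the number of clusters K grows with the data instead of being fixed in advance (illustrative code, not from the slides):

import numpy as np

def crp(n_customers, alpha, rng):
    """Sample a seating arrangement (cluster assignment) from a Chinese Restaurant Process."""
    tables, assignments = [], []                  # tables[k] = number of customers at table k
    for _ in range(n_customers):
        weights = np.array(tables + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(tables):
            tables.append(0)                      # open a new table (cluster)
        tables[k] += 1
        assignments.append(int(k))
    return assignments, len(tables)

rng = np.random.default_rng(0)
for n in (10, 100, 1000):
    print(n, crp(n, alpha=1.0, rng=rng)[1])       # K grows with n, roughly like alpha * log(n)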

SLIDE 38

Growing Abstractions

[Figure panels: DP Mixture; Infinite HMM]

Ref: http://mlg.eng.cam.ac.uk/zoubin/talks/uai05tutorial-b.pdf

SLIDE 39

Probabilistic Programs

Ref: Slide idea from Noah Goodman's PLOT presentation

[Figure: bar plot of P(n) for n = 0, 1, 2, 3.]

P(n) = C(3, n) · 0.3^n · 0.7^(3−n)

Probabilistic Program:
(assume a (flip 0.3))
(assume b (flip 0.3))
(assume c (flip 0.3))
(+ a b c)

Executions: (a=1, b=0, c=0), (a=0, b=0, c=0), (a=1, b=0, c=1), ...

Theorem: Any computable distribution can be represented by a Church expression (Freer & Roy, 2012)
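
The Church program above defines a distribution over the value of (+ a b c); a few lines of Python confirm by simulation that it matches the binomial formula P(n) = C(3, n) 0.3^n 0.7^(3−n) (an illustrative check, not part of the slides):

import random
from collections import Counter
from math import comb

flip = lambda p: 1 if random.random() < p else 0

runs = Counter(flip(0.3) + flip(0.3) + flip(0.3) for _ in range(100_000))   # executions of the program
for n in range(4):
    print(n, runs[n] / 100_000, comb(3, n) * 0.3 ** n * 0.7 ** (3 - n))     # empirical vs. exact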

SLIDE 40

Example: Probabilistic Programs

SLIDE 41

Example: Probabilistic Programs

ASSUME road_width (uniform_discrete 5 8)          // arbitrary units
ASSUME road_height (uniform_discrete 70 150)
ASSUME lane_pos_x (uniform_continuous -1.0 1.0)   // uncentered renderer
ASSUME lane_pos_y (uniform_continuous -5.0 0.0)   // coordinate system
ASSUME lane_pos_z (uniform_continuous 1.0 3.5)
ASSUME lane_size (uniform_continuous 0.10 0.35)
ASSUME eps (gamma 1 1)
ASSUME theta_left (list 0.13 ... 0.03)
ASSUME theta_right (list 0.03 ... 0.02)
ASSUME theta_road (list 0.05 ... 0.07)
ASSUME theta_lane (list 0.01 ... 0.21)
ASSUME surfaces (render_surfaces lane_pos_x lane_pos_y lane_pos_z road_width road_height lane_size)
ASSUME data (load_image "frame201.png")
OBSERVE (incorporate_stochastic_likelihood theta_left theta_right theta_road theta_lane data surfaces eps) True

SLIDE 42

Example: Probabilistic Programs

Method                                                 Accuracy
Aly et al. [1]                                         68.31%
GPGP (Best Single Appearance)                          64.56%
GPGP (Maximum Likelihood over Multiple Appearances)    74.60%

SLIDE 43

Example: Probabilistic Programs

[Figure: panels labeled "superimposed", "sampled", (d). When the model's assumptions are violated the posterior is broad; when they are satisfied the posterior is narrower.]

SLIDE 44
Probabilistic Programming

  • Our conceptual knowledge about the world is productive (requiring abstractions) and graded (requiring the handling of uncertainty).
  • Programming languages have an unparalleled ability to be expressive. Stochastic primitives give languages the ability to handle uncertainty. This makes stochastic lambda calculus an appealing framework for expressing and instantiating concepts in human modeling and AI.
  • A more human-like approach to learning is to infer the most likely program given observations.
  • Challenges remain in scaling up general-purpose inference.

SLIDE 45

References

  • http://www.scholarpedia.org/article/Bayesian_statistics
  • http://bayes.wustl.edu/etj/articles/backward.look.pdf
  • http://www.johndcook.com/blog/2008/02/26/what-a-probability-means/
  • Josh Tenenbaum, presentation on "Bayesian Models of Human Learning and Inference"
  • http://web.mit.edu/cocosci/Papers/Griffiths-Tenenbaum-PsychSci06.pdf
  • http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html
