SLIDE 1

Probability & Statistics Intro / Review

NEU 560 Jonathan Pillow Lecture 6, part II

SLIDE 2

continuous probability distribution

takes values in a continuous space, e.g., $\mathbb{R}^n$

probability density function (pdf): $P(x) \ge 0$, $\int P(x)\, dx = 1$

SLIDE 3

discrete probability distribution

takes a finite (or countably infinite) number of values, e.g., $x \in \{1, 2, \dots, N\}$

probability mass function (pmf): $P(x_i) \ge 0$, $\sum_i P(x_i) = 1$

SLIDE 4

some friendly neighborhood distributions

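Continuous

Gaussian: $P(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$

multivariate Gaussian: $P(x; \mu, \Lambda) = \frac{1}{(2\pi)^{n/2}\, |\Lambda|^{1/2}} \exp\left(-\frac{1}{2} (x - \mu)^T \Lambda^{-1} (x - \mu)\right)$, with $x \in \mathbb{R}^n$

exponential: $P(x; a) = a e^{-ax}$, for $x \ge 0$

Discrete

Bernoulli: coin flipping

binomial: $P(k; n, p) = \binom{n}{k} p^k (1 - p)^{n-k}$ (sum of n coin flips)

Poisson: $P(k; \lambda) = \frac{\lambda^k}{k!} e^{-\lambda}$ (sum of n coin flips with P(heads) = λ/n, in the limit n → ∞)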
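
The Poisson-as-limit-of-binomial statement can be checked numerically. This is a minimal sketch, not part of the original slides: the rate `lam = 3.0`, the helper names, and the range of k are illustrative assumptions.

```python
import math

def binomial_pmf(k, n, p):
    """P(k; n, p) = C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(k; lambda) = lambda^k / k! * exp(-lambda)."""
    return lam**k / math.factorial(k) * math.exp(-lam)

lam = 3.0  # illustrative rate (an assumption, not from the slides)

def max_gap(n, lam=lam, kmax=15):
    # Largest pointwise difference between Binomial(n, lam/n) and Poisson(lam)
    # over k = 0..kmax
    return max(
        abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam))
        for k in range(kmax + 1)
    )

# The gap shrinks as the number of coin flips n grows with P(heads) = lam / n
for n in (20, 200, 2000):
    print(n, max_gap(n))
```

The printed gaps should decrease with n, which is the sense in which the binomial approaches the Poisson.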

SLIDE 5

joint density

[Figure: joint density over x and y; both axes span −3 to 3]

positive
sums to 1

SLIDE 6

marginalization (“integration”)

[Figure: joint density and a marginal; both axes span −3 to 3]
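
For a discrete joint, marginalization replaces the integral with a sum: P(x) = Σ_y P(x, y). A minimal sketch, assuming a made-up 3×3 joint pmf (the table values are hypothetical, chosen only so that they sum to 1):

```python
# Hypothetical joint pmf P(x, y): rows index x, columns index y.
joint = [
    [0.10, 0.05, 0.05],
    [0.05, 0.30, 0.05],
    [0.05, 0.05, 0.30],
]

def marginal_x(joint):
    # P(x) = sum over y of P(x, y): collapse each row
    return [sum(row) for row in joint]

def marginal_y(joint):
    # P(y) = sum over x of P(x, y): collapse each column
    return [sum(col) for col in zip(*joint)]

p_x = marginal_x(joint)
p_y = marginal_y(joint)
print(p_x, p_y)
```

Each marginal is itself a valid pmf: non-negative and summing to 1.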

SLIDE 7

marginalization (“integration”)

[Figure: joint density and a marginal; both axes span −3 to 3]

SLIDE 8

conditionalization (“slicing”)

[Figure: joint density with a conditional slice; axes span −3 to 3]

(“joint divided by marginal”)
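
“Slicing” followed by “joint divided by marginal” can be sketched for a discrete joint as P(y | x_i) = P(x_i, y) / P(x_i). The table below is a hypothetical example, not data from the slides.

```python
# Hypothetical joint pmf P(x, y): rows index x, columns index y.
joint = [
    [0.10, 0.05, 0.05],
    [0.05, 0.30, 0.05],
    [0.05, 0.05, 0.30],
]

def conditional_y_given_x(joint, i):
    # Slice: take row i of the joint (x fixed at x_i).
    row = joint[i]
    # Normalize: divide the slice by the marginal P(x_i).
    p_xi = sum(row)
    return [p / p_xi for p in row]

for i in range(len(joint)):
    print(i, conditional_y_given_x(joint, i))
```

The raw slice does not sum to 1; dividing by the marginal is what turns it into a distribution over y.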

SLIDE 9

conditionalization (“slicing”)

[Figure: joint density with a conditional slice; axes span −3 to 3]

(“joint divided by marginal”)

SLIDE 10

conditionalization (“slicing”)

[Figure: joint density shown with the marginal P(y) and a conditional; axes span −3 to 3]

SLIDE 11

conditional densities

[Figure: joint density with conditional densities; axes span −3 to 3]

SLIDE 12

conditional densities

[Figure: joint density with conditional densities; axes span −3 to 3]

SLIDE 13

Bayes’ Rule

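Conditional Densities

$P(x \mid y) = \frac{P(x, y)}{P(y)}$, $P(y \mid x) = \frac{P(x, y)}{P(x)}$

Bayes’ Rule

$P(x \mid y) = \frac{P(y \mid x)\, P(x)}{P(y)}$

posterior: $P(x \mid y)$
likelihood: $P(y \mid x)$
prior: $P(x)$
marginal probability of y (“normalizer”): $P(y)$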
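
A small discrete instance of Bayes’ rule, as a sketch only: the prior and likelihood tables are hypothetical numbers, and `posterior` is an illustrative helper name.

```python
# Hypothetical prior P(x) over three values of x
prior = [0.5, 0.3, 0.2]

# Hypothetical table likelihood[i][j] = P(y_j | x_i); each row sums to 1
likelihood = [
    [0.9, 0.1],
    [0.5, 0.5],
    [0.2, 0.8],
]

def posterior(prior, likelihood, j):
    # Numerator of Bayes' rule: likelihood times prior, for the observed y_j
    unnormalized = [likelihood[i][j] * prior[i] for i in range(len(prior))]
    # Denominator: marginal probability of y_j (the "normalizer")
    p_y = sum(unnormalized)
    return [u / p_y for u in unnormalized]

post = posterior(prior, likelihood, 1)
print(post)
```

Observing y_1 shifts probability toward the x values under which y_1 is more likely, relative to the prior.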

SLIDE 14

Terminology question:

When do we call $P(y \mid x)$ a likelihood?

A: when considered as a function of x (i.e., with y held fixed)

Note: as a function of x, it doesn’t integrate to 1.

Q: What’s it called as a function of y, for fixed x?

A: conditional distribution or sampling distribution
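
The distinction shows up directly in a table of P(y | x): summing over y for fixed x gives 1, while summing over x for fixed y generally does not. A minimal sketch with hypothetical numbers:

```python
# Hypothetical table cond[i][j] = P(y_j | x_i)
cond = [
    [0.9, 0.1],
    [0.5, 0.5],
    [0.2, 0.8],
]

# As a function of y for fixed x (one row): a conditional / sampling distribution
row_sums = [sum(row) for row in cond]

# As a function of x for fixed y (one column): a likelihood, not normalized
col_sums = [sum(col) for col in zip(*cond)]

print(row_sums, col_sums)
```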

SLIDE 15

Expectations (“averages”)

Expectation is the weighted average of a function (of a random variable) according to the distribution (of that random variable):

continuous (pdf): $E[f(x)] = \int f(x)\, P(x)\, dx$
discrete (pmf): $E[f(x)] = \sum_i f(x_i)\, P(x_i)$

Corresponds to taking a weighted average of the values f(x), weighted by how probable they are under P(x).
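
In the discrete case the expectation is a short weighted sum. A sketch assuming a fair six-sided die as the distribution (an illustrative choice, not from the slides):

```python
# Assumed example: a fair six-sided die
values = [1, 2, 3, 4, 5, 6]
pmf = [1 / 6] * 6

def expectation(f, values, pmf):
    # E[f(x)] = sum_i f(x_i) P(x_i)
    return sum(f(x) * p for x, p in zip(values, pmf))

mean = expectation(lambda x: x, values, pmf)
second_moment = expectation(lambda x: x**2, values, pmf)
print(mean, second_moment)
```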

SLIDE 16

Expectations (“averages”)

Expectation is the weighted average of a function (of a random variable) according to the distribution (of that random variable):

continuous (pdf): $E[f(x)] = \int f(x)\, P(x)\, dx$
discrete (pmf): $E[f(x)] = \sum_i f(x_i)\, P(x_i)$

Monte Carlo evaluation of an expectation:

1. draw samples from the distribution: $x^{(i)} \sim P(x)$, for i = 1 to N
2. average: $E[f(x)] \approx \frac{1}{N} \sum_{i=1}^{N} f(x^{(i)})$
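
The two Monte Carlo steps can be sketched as follows. The choices here are assumptions for illustration: P(x) is a standard Gaussian, f(x) = x², N = 100,000, and the seed is fixed so the run is repeatable. For this choice the exact answer is E[x²] = 1.

```python
import random

def monte_carlo_expectation(f, sampler, n):
    # Step 1: draw n samples x_i ~ P(x) via sampler()
    # Step 2: average f over the samples
    return sum(f(sampler()) for _ in range(n)) / n

rng = random.Random(0)  # fixed seed, for reproducibility only

estimate = monte_carlo_expectation(
    f=lambda x: x**2,
    sampler=lambda: rng.gauss(0.0, 1.0),  # standard Gaussian samples
    n=100_000,
)
print(estimate)
```

The estimate fluctuates around the true value with an error that shrinks roughly as 1/√N.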

SLIDE 17

Expectations (“averages”)

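Expectation is the weighted average of a function (of a random variable) according to the distribution (of that random variable):

continuous (pdf): $E[f(x)] = \int f(x)\, P(x)\, dx$
discrete (pmf): $E[f(x)] = \sum_i f(x_i)\, P(x_i)$

It’s really just a dot product! Thus, expectation is a linear function:

$E[a f(x) + b g(x)] = a E[f(x)] + b E[g(x)]$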
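
Linearity follows from writing the discrete expectation as a dot product between the vector of function values and the pmf vector. A sketch with a hypothetical pmf and arbitrary f, g, a, b:

```python
# Hypothetical discrete distribution
values = [0, 1, 2, 3]
pmf = [0.1, 0.2, 0.3, 0.4]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def expectation(f):
    # E[f(x)] as a dot product: (f evaluated on the support) . (pmf)
    return dot([f(x) for x in values], pmf)

def f(x):
    return x**2

def g(x):
    return 3 * x + 1

a, b = 2.0, -0.5  # arbitrary coefficients

lhs = expectation(lambda x: a * f(x) + b * g(x))
rhs = a * expectation(f) + b * expectation(g)
print(lhs, rhs)
```

Both sides agree up to floating-point rounding because the dot product is linear in its first argument.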

SLIDE 18

Expectations (“averages”)

The two most important expectations (also known as “moments”):

Mean: $E[x]$ (average value of the RV)
Variance: $E[(x - E[x])^2]$ (average squared distance between x and its mean)

Note: expectations don’t always exist!

e.g., the Cauchy distribution has no mean!
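
Mean and variance can be computed directly from a pmf. A sketch using the binomial pmf with n = 10 and p = 0.3 (illustrative parameters), for which the known closed forms are mean = np and variance = np(1 − p):

```python
import math

n, p = 10, 0.3  # illustrative parameters (assumptions)

# Binomial pmf: P(k; n, p) = C(n, k) p^k (1 - p)^(n - k), for k = 0..n
pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Mean: E[x]
mean = sum(k * pk for k, pk in enumerate(pmf))

# Variance: E[(x - E[x])^2]
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(mean, variance)
```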

SLIDE 19

independence

[Figure: joint density of x and y; both axes span −3 to 3]

SLIDE 20

independence

Definition: x, y are independent iff $P(x, y) = P(x)\, P(y)$

[Figure: joint density of x and y; both axes span −3 to 3]

SLIDE 21

independence

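Definition: x, y are independent iff $P(x, y) = P(x)\, P(y)$

In linear algebra terms: the joint (as a matrix over x and y) is the outer product of the two marginals (as vectors)

[Figure: joint density of x and y; both axes span −3 to 3]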
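
The outer-product view gives a direct test for independence of a discrete joint: compute both marginals, form their outer product, and compare with the joint. A sketch with hypothetical tables; `is_independent` and its tolerance are illustrative choices.

```python
def marginals(joint):
    p_x = [sum(row) for row in joint]        # sum over y
    p_y = [sum(col) for col in zip(*joint)]  # sum over x
    return p_x, p_y

def outer(u, v):
    # Outer product: entry (i, j) is u[i] * v[j]
    return [[a * b for b in v] for a in u]

def is_independent(joint, tol=1e-12):
    p_x, p_y = marginals(joint)
    product = outer(p_x, p_y)
    return all(
        abs(joint[i][j] - product[i][j]) <= tol
        for i in range(len(joint))
        for j in range(len(joint[0]))
    )

# Built as an outer product, so independent by construction
independent_joint = outer([0.2, 0.8], [0.5, 0.3, 0.2])

# Mass concentrated on the diagonal, so x and y are dependent
dependent_joint = [[0.4, 0.1], [0.1, 0.4]]

print(is_independent(independent_joint), is_independent(dependent_joint))
```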

SLIDE 22

independence

Definition: x, y are independent iff $P(x, y) = P(x)\, P(y)$

Alternative definition: $P(y \mid x) = P(y)$ for all x

[Figure: joint density with conditional slices; axes span −3 to 3]

All conditionals are the same!

SLIDE 23

independence

Definition: x, y are independent iff $P(x, y) = P(x)\, P(y)$

Alternative definition: $P(y \mid x) = P(y)$ for all x

[Figure: joint density with conditional slices; axes span −3 to 3]

All conditionals are the same!

SLIDE 24

Correlation vs. Dependence

1. Correlation

Mean of y|x changes systematically with x

[Figure: two joint densities, labeled “positive correlation” and “negative correlation”; axes span −3 to 3]

SLIDE 25

Correlation vs. Dependence

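1. Correlation

Mean of y|x changes systematically with x

[Figure: two joint densities, labeled “positive correlation” and “negative correlation”; axes span −3 to 3]

2. Dependence

arises whenever $P(x, y) \neq P(x)\, P(y)$
quantified by mutual information:

$MI(x, y) = D_{KL}\big(P(x, y)\,\|\,P(x)\, P(y)\big) = \iint P(x, y) \log \frac{P(x, y)}{P(x)\, P(y)}\, dx\, dy$

(the KL divergence between the joint and the product of its marginals)

MI=0 ⇒ independence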
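
For a discrete joint, mutual information is the KL divergence between the joint and the product of its marginals, with the integral replaced by a sum. A sketch using natural logs (so the result is in nats) and hypothetical 2×2 tables:

```python
import math

def mutual_information(joint):
    p_x = [sum(row) for row in joint]        # marginal over y
    p_y = [sum(col) for col in zip(*joint)]  # marginal over x
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # terms with P(x, y) = 0 contribute nothing
                mi += p * math.log(p / (p_x[i] * p_y[j]))
    return mi

# Joint equal to the product of its marginals: MI should be 0
independent_joint = [[0.1, 0.1], [0.4, 0.4]]

# Mass concentrated on the diagonal: MI should be positive
dependent_joint = [[0.4, 0.1], [0.1, 0.4]]

mi_independent = mutual_information(independent_joint)
mi_dependent = mutual_information(dependent_joint)
print(mi_independent, mi_dependent)
```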

SLIDE 26

Correlation vs. Dependence

Q: Can you draw a distribution that is uncorrelated but dependent?

SLIDE 27

Correlation vs. Dependence

Q: Can you draw a distribution that is uncorrelated but dependent?

“Bowtie” dependencies in natural scenes (uncorrelated but dependent):

[Figure: conditional histogram P(filter 2 output | filter 1 output); axes: filter 1 output, filter 2 output]

Flower image: [Schwartz & Simoncelli 2001]
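
A much simpler uncorrelated-but-dependent example than the bowtie, offered as a sketch and not taken from the slides: let x be uniform on {−1, 0, 1} and set y = x². The covariance is exactly zero, yet knowing x determines y. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Assumed toy example: x uniform on {-1, 0, 1}, y = x**2
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def expect(f):
    # E[f(x, y)] under the joint pmf
    return sum(p * f(x, y) for (x, y), p in joint.items())

# Covariance: E[xy] - E[x] E[y]
covariance = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Dependence check: compare P(y = 1) with P(y = 1 | x = 0)
p_y1 = sum(p for (x, y), p in joint.items() if y == 1)
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)
p_y1_given_x0 = joint.get((0, 1), Fraction(0)) / p_x0

print(covariance, p_y1, p_y1_given_x0)
```

Because P(y = 1 | x = 0) differs from P(y = 1), the conditionals are not all the same, so x and y are dependent even though they are uncorrelated.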

SLIDE 28

Is this distribution independent?

[Figure: joint density of x and y; both axes span −3 to 3]

SLIDE 29

Is this distribution independent?

[Figure: joint density of x and y; both axes span −3 to 3]

SLIDE 30

Is this distribution independent?

[Figure: joint density with conditional slices; axes span −3 to 3]

No! Conditionals over y are different for different x!

SLIDE 31

FUN FACT:

Independent Gaussian is the only distribution that is both:

1. independent (equal to the product of its marginals)
2. spherically symmetric: $P(Ux) = P(x)$ for any orthogonal matrix U

Corollary: a circular scatter / contour plot is not sufficient to show independence!

SLIDE 32

Summary

continuous & discrete distributions
marginalization (splatting)
conditionalization (slicing)
Bayes’ rule (prior, likelihood, posterior)
Expectations
Independence & Correlation
