
Slide 1

Advanced Mathematical Methods

Part II – Statistics: Statistical Inference

Mel Slater

http://www.cs.ucl.ac.uk/staff/m.slater/Teaching/Statistics/

[Portraits: R.A. Fisher, Karl Pearson, Thomas Bayes]

Slide 2

Outline

  • Bayes’ Theorem
  • Examples of Inference
  • Estimation
  • Likelihood Ratios
  • Classical Statistical Testing
  • Gallery of Standard Tests

Slide 3

Statistical Inference

To make informed probability statements about hypotheses in the light of evidence.

H1, H2, …, Hn are n mutually exclusive and exhaustive ‘events’
  • one and only one is true

E is some event. We require P(Hj|E)
  • If we had an initial P(Hj)
  • How should the probability of Hj be updated if evidence E was known to have occurred?

Slide 4

Bayes’ Theorem

It is easy to see that

P(Hj ∩ E) = P(Hj|E)P(E) = P(E|Hj)P(Hj)

and therefore

P(Hj|E) = P(E|Hj)P(Hj) / P(E)

and so, expanding P(E) over the mutually exclusive and exhaustive Hi:

P(Hj|E) = P(E|Hj)P(Hj) / Σi P(E|Hi)P(Hi)

Slide 5

Bayes’ Theorem and Inference

If we interpret the Hj as hypotheses and E as evidence, then Bayes’ Theorem gives a method for statistical inference.

P(Hj) is called the prior probability
P(E|Hj) is the likelihood
P(Hj|E) is the posterior probability

posterior ∝ likelihood × prior
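The posterior ∝ likelihood × prior update for a finite set of hypotheses can be sketched as follows (a minimal illustration in Python rather than the MATLAB used later in the slides; the prior and likelihood numbers are invented for the example):

```python
# Bayes' Theorem for n mutually exclusive, exhaustive hypotheses:
# P(Hj|E) = P(E|Hj) P(Hj) / sum_i P(E|Hi) P(Hi)

def bayes_update(prior, likelihood):
    """Return posterior probabilities P(Hj|E) given priors P(Hj)
    and likelihoods P(E|Hj)."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnorm)            # P(E), by the law of total probability
    return [u / total for u in unnorm]

# Hypothetical example: three hypotheses, then evidence E is observed.
prior = [0.5, 0.3, 0.2]           # P(H1), P(H2), P(H3)
likelihood = [0.1, 0.4, 0.5]      # P(E|H1), P(E|H2), P(E|H3)
posterior = bayes_update(prior, likelihood)
print([round(p, 4) for p in posterior])
```

Note how H2 and H3, initially less probable, overtake H1 after the evidence: the normalising denominator is exactly the sum in Bayes’ Theorem.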

Slide 6

Bayes’ Theorem for Distributions

Suppose X is a r.v. with probability density dependent on a parameter set θ (e.g., mean and variance).

f(θ|x) ∝ f(x|θ) f(θ)

  • f(θ) ~ prior probability distribution for θ
  • f(x|θ) ~ the likelihood
  • x may be a vector of r.v.s
Slide 7

Example - ESP

Recall the ‘extra sensory’ hypothesis example and J.B. Rhine from the 1930s.

Simplification of the Rhine experiments:
  • Target person (T) is in Lab1, selecting cards at random from a pack – each card has a square or a circle.
  • Subject (S) is in Lab2 and, on a signal, guesses which card T is looking at and writes it down.
  • There is no possible communication between Lab1 and Lab2 (etc.).
  • X = number of correct guesses the subject makes in n trials.

Slide 8

Example ESP

X ~ binomial(p, n), where p = P(correct guess).

If we ‘knew nothing’ about p, we would assign p ~ uniform(0,1).

We require f(p|x): the posterior distribution of p given that we know that x correct guesses were made.

Slide 9

Example ESP

f(p|x) ∝ f(x|p) f(p) ∝ p^x (1−p)^(n−x)

This is the Beta distribution with parameters a = x+1 and b = n−x+1
  • See notes

Suppose n = 500 and x = 280. From MATLAB (betacdf) we find
  • P(p > 0.5) = 0.9964

We would conclude that something ‘beyond chance’ must be happening. (This is the type of data that Rhine obtained again and again!)
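The slides use MATLAB’s betacdf; an equivalent check can be sketched in Python, assuming SciPy is available:

```python
from scipy.stats import beta

# Posterior for the ESP example: p|x ~ Beta(x+1, n-x+1) under a uniform prior.
n, x = 500, 280
a, b = x + 1, n - x + 1            # Beta(281, 221)

# P(p > 0.5 | data): survival function of the posterior at 0.5,
# i.e. the SciPy counterpart of 1 - betacdf(0.5, a, b) in MATLAB.
p_beyond_chance = beta.sf(0.5, a, b)
print(round(p_beyond_chance, 4))   # approximately 0.9964, as on the slide
```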

Slide 10

Example ESP Continued

From MATLAB, using betainv, we can find for this example that
  • P(0.5162 < p < 0.6029) = 0.95

Since this range does not include the value ½, which would be expected if chance alone were operating, we may decide to reject the hypothesis that the result was a coincidence.

This is called an ‘interval estimate’ for p.
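MATLAB’s betainv corresponds to SciPy’s percent-point function; a sketch of the interval computation (again assuming SciPy):

```python
from scipy.stats import beta

# 95% central interval estimate for p from the posterior Beta(281, 221).
# betainv(q, a, b) in MATLAB is beta.ppf(q, a, b) in SciPy.
n, x = 500, 280
a, b = x + 1, n - x + 1
lo = beta.ppf(0.025, a, b)         # lower 2.5% point
hi = beta.ppf(0.975, a, b)         # upper 2.5% point
print(round(lo, 4), round(hi, 4))  # approximately (0.5162, 0.6029)
```

Since the whole interval lies above 0.5, the ‘chance alone’ value is excluded.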

Slide 11

Point Estimators

For the p ~ Beta(a, b) distribution it can be shown that:

E(p|x) = (x+1)/(n+2)
Var(p|x) = (x+1)(n−x+1) / ((n+2)²(n+3))

E(p|x) could serve as a single-value estimator, with the variance as an indication of the margin of error. For n = 500 and x = 280:
  • E(p|x) = 0.5598
  • Standard deviation(p|x) = 0.0221
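The posterior mean and standard deviation quoted for the ESP data can be checked directly from the Beta(x+1, n−x+1) moments (a small Python sketch for the n = 500, x = 280 example):

```python
import math

n, x = 500, 280

# Posterior moments of p | x for the Beta(x+1, n-x+1) posterior.
mean = (x + 1) / (n + 2)
var = (x + 1) * (n - x + 1) / ((n + 2) ** 2 * (n + 3))
sd = math.sqrt(var)

print(round(mean, 4), round(sd, 4))  # 0.5598 0.0221
```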

Slide 12

Properties of the Estimator for p

For large n,
  • E(p|x) ≈ x/n
  • Var(p|x) → 0
  • Probability and frequency ratio become identical.

Note that E((x+1)/(n+2) | p) ≠ p
  • This estimator is called biased.

In general, if t is an estimator for a parameter θ based on n observations, then
  • E(t|θ) = θ : t is unbiased
  • Var(t|θ) → 0 as n → ∞ : t is consistent.
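The bias is easy to exhibit: since E(x|p) = np for x ~ Binomial(n, p), the estimator t = (x+1)/(n+2) has E(t|p) = (np+1)/(n+2), which differs from p except at p = ½. A quick Python check (the values of p and n are arbitrary):

```python
# Bias of the estimator t = (x+1)/(n+2): since E(x|p) = n*p,
# E(t|p) = (n*p + 1)/(n + 2), which equals p only when p = 1/2.
def expected_t(n, p):
    return (n * p + 1) / (n + 2)

p = 0.7
for n in (10, 100, 10000):
    print(n, round(expected_t(n, p), 4))
# The bias shrinks as n grows: E(t|p) -> p, so t is biased but consistent.
```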
Slide 13

The Posterior Beta Distribution

The pdf shows how most of the probability is concentrated above 0.5.

The full posterior distribution is the ideal way to examine the situation.

Slide 14

Recasting as a Statistical Test

In our example we state a null hypothesis and an alternative:
  • H0: p = ½
  • H1: p > ½

We will decide to reject H0 if, on our ‘posterior distribution’, we find that P(p > ½ | data) exceeds some high value, e.g.:
  • reject H0 if P(p > ½ | x) > 0.95
Slide 15

Statistical Test

We know p|x ~ Beta(x+1, n−x+1). Using MATLAB notation, the test is:
  • Reject H0 if 1 − betacdf(0.5, x+1, n−x+1) > 0.95, or equivalently
  • betacdf(0.5, x+1, n−x+1) < 0.05

If we find in advance a value x0 such that
  • betacdf(0.5, x0+1, n−x0+1) = 0.05

then the test becomes
  • Reject H0 if the observed X >= x0
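In Python, with SciPy’s beta.cdf standing in for MATLAB’s betacdf, the rejection rule reads:

```python
from scipy.stats import beta

def reject_h0(x, n, alpha=0.05):
    """Reject H0: p = 1/2 when the posterior Beta(x+1, n-x+1)
    puts less than alpha of its mass at or below 1/2."""
    return beta.cdf(0.5, x + 1, n - x + 1) < alpha

print(reject_h0(280, 500))  # True: the ESP data reject H0
print(reject_h0(255, 500))  # False: 255/500 is well within chance
```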
Slide 16

Classical Statistical Testing

The 0.05 is called the ‘significance level’ – it is equal to the probability of rejecting H0 when in fact it is ‘true’.

The significance level is usually denoted by α = P(Type I error).

P(Type II error) = β, and Power = 1 − β.

              H0 ‘true’        H1 ‘true’
H0 decided    correct          Type II error
H1 decided    Type I error     power

Slide 17

Classical Statistical Tests

These rely on the Central Limit Theorem:
  • Testing Population Means
  • Testing Difference of Means
  • Testing Variances
  • Testing Ratios of Variances
  • Testing Goodness of Fit
  • Relationship between variables

Slide 18

Central Limit Theorem

See notes for ‘proof’.

X is any r.v. with finite mean µ and variance σ².

x1, x2, …, xn are n independent observations on X.

The sample mean is

x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ

For n ‘large’,

x̄ ~ N(µ, σ²/n)

Slide 19

Central Limit Theorem

If X itself is Normal then the result is exact for any n.

In practice, n > 30 is usually taken as the interpretation of ‘large’.

This can be illustrated very easily with simulation (see exercises).
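The simulation can be sketched with NumPy (the exponential parent distribution here is an arbitrary choice, picked because it is markedly non-Normal; any finite-variance r.v. would do):

```python
import numpy as np

rng = np.random.default_rng(0)

# Parent distribution: Exponential(1), so mu = 1 and sigma^2 = 1.
n, reps = 50, 10_000
samples = rng.exponential(scale=1.0, size=(reps, n))
means = samples.mean(axis=1)       # one sample mean per replication

# CLT: the sample mean should be approximately N(mu, sigma^2/n).
print(round(means.mean(), 3))      # close to mu = 1
print(round(means.std(), 3))       # close to sigma/sqrt(n) ~ 0.141
```

A histogram of `means` would look convincingly bell-shaped even though the parent is heavily skewed.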

Slide 20

Hypothesis about a Mean

X is any r.v. with finite mean µ and variance σ².

We assume that σ² is known, but that µ is unknown.

The problem is to make inferences about µ from x1, x2, …, xn
  • n independent observations on X.
Slide 21

Hypothesis about a mean

Assume we know ‘nothing’ about µ. The ‘pseudo’ pdf for µ would then be
  • f(µ) = k, −∞ < µ < ∞

– This is not a true pdf!

From Bayes’ Theorem:

f(µ|x̄) ∝ f(x̄|µ) f(µ)

and from the CLT we therefore know that

µ|x̄ ~ N(x̄, σ²/n)

Slide 22

Estimators for µ

An estimator for µ:

E(µ|x̄) = x̄,  Var(µ|x̄) = σ²/n

From the normal distribution it is also easy to show, for example, that a 95% interval estimate is:

x̄ − 1.96 σ/√n  <  µ  <  x̄ + 1.96 σ/√n
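With σ known, the 95% interval is just x̄ ± 1.96 σ/√n; a minimal Python sketch (the data values and σ are invented for illustration):

```python
import math

# Hypothetical sample from a population with known sigma = 2.
data = [4.0, 6.0, 5.0, 5.0]
sigma = 2.0
n = len(data)

xbar = sum(data) / n                   # sample mean = posterior mean of mu
half = 1.96 * sigma / math.sqrt(n)     # half-width of the 95% interval
lo, hi = xbar - half, xbar + half
print(round(lo, 2), round(hi, 2))      # 3.04 6.96
```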

Slide 23

Hypothesis Test for µ

H0: µ = µ0
H1: µ = µ1 > µ0
  • significance level here is 0.05