

SLIDE 1

Review

DS GA 1002 Statistical and Mathematical Models

http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall16
Carlos Fernandez-Granda

SLIDE 2

Probability and statistics

◮ Probability: framework for dealing with uncertainty

◮ Statistics: framework for extracting information from data by making probabilistic assumptions

SLIDE 3

Probability

◮ Probability basics: probability spaces, conditional probability, independence, conditional independence

◮ Random variables: pmf, cdf, pdf, important distributions, functions of random variables

◮ Multivariate random variables: joint pmf, joint cdf, joint pdf, marginal distributions, conditional distributions, independence, joint distribution of discrete/continuous random variables
SLIDE 4

Probability

◮ Expectation: definition, mean, median, variance, Markov and Chebyshev inequalities, covariance, correlation coefficient, covariance matrix, conditional expectation

◮ Random processes: definition, mean, autocovariance, important processes (iid, Gaussian, Poisson, random walk), Markov chains

◮ Convergence: types of convergence, law of large numbers, central limit theorem, convergence of Markov chains

◮ Simulation: motivation, inverse-transform sampling, rejection sampling, Markov-chain Monte Carlo

SLIDE 5

Statistics

◮ Descriptive statistics: histogram, empirical mean/variance, order statistics, empirical covariance, principal component analysis

◮ Statistical estimation: frequentist perspective, mean square error, consistency, confidence intervals

◮ Learning models: method of moments, maximum likelihood, empirical cdf, kernel density estimation

SLIDE 6

Statistics

◮ Hypothesis testing: definitions (null/alternative hypothesis, Type I/II errors), significance level, power, p value, parametric testing, power function, likelihood-ratio test, permutation test, multiple testing, Bonferroni’s method

◮ Bayesian statistics: prior, likelihood, posterior, posterior mean/mode

◮ Linear regression: linear models, least squares, geometric interpretation, probabilistic interpretation, overfitting

SLIDE 7

Random walk with a drift

We define the random walk X as the discrete-state discrete-time random process

X(0) := 0,
X(i) := X(i − 1) + S(i) + 1,   i = 1, 2, . . .

where

S(i) = +1 with probability 1/2,
       −1 with probability 1/2,

is an iid sequence of steps.

SLIDES 8–11

Random walk with a drift

What is the mean of this random process?

E[X(i)] = E[ Σ_{j=1}^{i} (S(j) + 1) ]

        = Σ_{j=1}^{i} E[S(j)] + i

        = i

since E[S(j)] = 0.
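The drift computation above can be checked with a quick Monte Carlo simulation. This is a sketch, not part of the original deck; the step probabilities and the +1 drift term follow the definition on slide 7.

```python
import random

def random_walk_with_drift(n_steps, rng):
    """X(0) = 0, X(i) = X(i-1) + S(i) + 1 with S(i) = +/-1 equiprobable."""
    x = 0
    for _ in range(n_steps):
        s = 1 if rng.random() < 0.5 else -1
        x += s + 1
    return x

rng = random.Random(0)
i = 50
trials = 20000
mean_est = sum(random_walk_with_drift(i, rng) for _ in range(trials)) / trials
print(mean_est)  # should be close to E[X(i)] = i = 50
```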

SLIDE 12

Random walk with a drift

What is the autocovariance? Use the fact that the autocovariance of the random walk without drift W that we studied in the lecture notes is

R_W(i, j) = min{i, j}

SLIDES 13–21

Random walk with a drift

X(i) = W(i) + i,   E[W(i)] = 0

R_X(i, j) := E[X(i) X(j)] − E[X(i)] E[X(j)]

           = E[(W(i) + i)(W(j) + j)] − E[W(i) + i] E[W(j) + j]

           = E[W(i) W(j)] + i E[W(j)] + j E[W(i)] + ij − i E[W(j)] − j E[W(i)] − ij

           = E[W(i) W(j)]          (since E[W(i)] = E[W(j)] = 0)

           = min{i, j}
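The result can be sanity-checked empirically. The sketch below (an assumption-free restatement of the process; the choices i = 5, j = 12 are arbitrary) estimates Cov(X(i), X(j)) from simulated paths and compares it to min{i, j}.

```python
import random

def walk_path(n_steps, rng):
    """Path of X(i) = X(i-1) + S(i) + 1, with S(i) = +/-1 equiprobable, X(0) = 0."""
    path = [0]
    for _ in range(n_steps):
        s = 1 if rng.random() < 0.5 else -1
        path.append(path[-1] + s + 1)
    return path

rng = random.Random(1)
i, j, trials = 5, 12, 50000
xi, xj = [], []
for _ in range(trials):
    p = walk_path(j, rng)
    xi.append(p[i])
    xj.append(p[j])
mi = sum(xi) / trials
mj = sum(xj) / trials
cov = sum(a * b for a, b in zip(xi, xj)) / trials - mi * mj
print(cov)  # should be close to min{i, j} = 5
```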

SLIDE 22

Random walk with a drift

Compute the first-order pmf of X(i). Recall that the first-order pmf of the random walk W equals

p_{W(i)}(x) = (i choose (i+x)/2) · (1/2)^i   if i + x is even and −i ≤ x ≤ i,
              0                              otherwise
SLIDES 23–30

Random walk with a drift

p_{X(i)}(x) = P(X(i) = x)

            = P(W(i) = x − i)

            = p_{W(i)}(x − i)

            = (i choose x/2) · (1/2)^i   if x is even and 0 ≤ x ≤ 2i,
              0                          otherwise
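The closed form above can be verified exactly by enumerating all step sequences. A small sketch (the choice i = 8 is arbitrary, chosen so that full enumeration stays cheap):

```python
from itertools import product
from math import comb

def exact_pmf_X(i):
    """Enumerate all 2^i step sequences S = +/-1 and tabulate X(i) = sum(S) + i."""
    pmf = {}
    for steps in product((-1, 1), repeat=i):
        x = sum(steps) + i
        pmf[x] = pmf.get(x, 0) + 0.5 ** i
    return pmf

i = 8
pmf = exact_pmf_X(i)
# formula from the slides: p_{X(i)}(x) = C(i, x/2) (1/2)^i for even x in [0, 2i]
for x in range(0, 2 * i + 1, 2):
    formula = comb(i, x // 2) * 0.5 ** i
    assert abs(pmf.get(x, 0) - formula) < 1e-12
print("pmf matches the closed form for i =", i)
```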
SLIDE 31

Random walk with a drift

Does the process satisfy the Markov condition?

p_{X(i+1) | X(1), ..., X(i)}(x_{i+1} | x_1, . . . , x_i) = p_{X(i+1) | X(i)}(x_{i+1} | x_i)

SLIDES 32–34

Random walk with a drift

p_{X(i+1) | X(1), ..., X(i)}(x_{i+1} | x_1, . . . , x_i)

= P(x_i + S(i + 1) + 1 = x_{i+1})

= p_{X(i+1) | X(i)}(x_{i+1} | x_i)

so the process is Markov.
SLIDE 35

Random walk with a drift

We observe that X(10) = 16 and X(20) = 30. What is the best estimator for X(21) in terms of probability of error?

SLIDES 36–38

Random walk with a drift

p_{X(21) | X(10), X(20)}(x | 16, 30) = p_{X(21) | X(20)}(x | 30)

                                     = 1/2   if x = 32,
                                       1/2   if x = 30,
                                       0     otherwise

Both values are equally likely, so either x = 30 or x = 32 minimizes the probability of error.
SLIDE 39

Markov chain

Consider a Markov chain X with transition matrix

T_X := [ a      1 ]
       [ 1 − a  0 ],

where a is a constant between 0 and 1. We label the two states 0 and 1.

The transition matrix T_X has two eigenvectors

q1 := [ 1/(1 − a), 1 ]^T,   q2 := [ 1, −1 ]^T

The corresponding eigenvalues are λ1 := 1 and λ2 := a − 1.
SLIDE 40

Markov chain

For what values of a is the Markov chain irreducible?

SLIDE 41

Markov chain

For what values of a is the Markov chain periodic?

SLIDES 42–44

Markov chain

Express the stationary distribution of X in terms of a.

pstat = ( 1 / ((q1)_1 + (q1)_2) ) q1

      = ( 1 / (2 − a) ) [ 1, 1 − a ]^T
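The stationary distribution can be checked by applying T_X to it once. A minimal sketch, assuming an arbitrary value a = 0.3 (not from the slides):

```python
# check that pstat = (1/(2-a)) [1, 1-a] is fixed by T_X = [[a, 1], [1-a, 0]]
a = 0.3
T = [[a, 1.0], [1.0 - a, 0.0]]          # T[r][c]: P(next = r | current = c)
pstat = [1.0 / (2 - a), (1 - a) / (2 - a)]

# one application of T: (T p)_r = sum_c T[r][c] p[c]
Tp = [sum(T[r][c] * pstat[c] for c in range(2)) for r in range(2)]
print(Tp, pstat)  # the two vectors should coincide
```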

SLIDE 45

Markov chain

Does the Markov chain always converge in probability for all values of a? Justify that this is the case or provide a counterexample.

SLIDE 46

Markov chain

Express the conditional pmf of X(i) conditioned on X(1) = 0 as a function of a and i. (Hint: Computing q1 + q2 could be a helpful first step.) Evaluate the expression at a = 0 and a = 1. Does the result make sense?

SLIDES 47–52

Markov chain

We have

q1 + q2 = [ 1/(1 − a), 1 ]^T + [ 1, −1 ]^T = [ (2 − a)/(1 − a), 0 ]^T

p_{X(0)} = [ 1, 0 ]^T = ( (1 − a)/(2 − a) ) (q1 + q2)

SLIDES 53–58

Markov chain

p_{X(i)} = T_X^i p_{X(0)}

         = T_X^i ( (1 − a)/(2 − a) ) (q1 + q2)

         = ( (1 − a)/(2 − a) ) ( λ1^i q1 + λ2^i q2 )

         = ( (1 − a)/(2 − a) ) ( [ 1/(1 − a), 1 ]^T + (a − 1)^i [ 1, −1 ]^T )

         = ( 1/(2 − a) ) [ 1 − (a − 1)^(i+1), (1 − a)(1 − (a − 1)^i) ]^T
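The closed form can be checked against direct matrix iteration. A sketch, assuming arbitrary values a = 0.4 and i = 7 (not from the slides):

```python
a = 0.4
T = [[a, 1.0], [1.0 - a, 0.0]]

def apply(T, p):
    return [T[0][0] * p[0] + T[0][1] * p[1],
            T[1][0] * p[0] + T[1][1] * p[1]]

p = [1.0, 0.0]                     # p_{X(0)}: start in state 0
i = 7
for _ in range(i):
    p = apply(T, p)

closed = [(1 - (a - 1) ** (i + 1)) / (2 - a),
          (1 - a) * (1 - (a - 1) ** i) / (2 - a)]
print(p, closed)  # should agree
```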
SLIDE 59

Markov chain

For a = 1 we have

p_{X(i)} = [ 1, 0 ]^T

The chain never leaves state 0.

SLIDES 60–61

Markov chain

For a = 0 we have

p_{X(i)} = (1/2) [ 1 − (−1)^(i+1), 1 − (−1)^i ]^T

         = [ 0, 1 ]^T   if i is odd,
           [ 1, 0 ]^T   if i is even.

The chain alternates deterministically between the two states.
SLIDE 62

Sampling from multivariate distributions

We are interested in generating samples from the joint distribution of two random variables X and Y. If we generate a sample x according to the pdf fX and a sample y according to the pdf fY, are these samples a realization of the joint distribution of X and Y? Explain your answer with a simple example.

SLIDE 63

Sampling from multivariate distributions

Now, assume that X is discrete and Y is continuous. Propose a method to generate a sample from the joint distribution using the pmf of X and the conditional cdf of Y given X, using two independent samples from a distribution that is uniform between 0 and 1. Assume that the conditional cdf is invertible.

SLIDES 64–66

Sampling from multivariate distributions

1. Obtain two independent samples u1 and u2 from the uniform distribution.

2. Set x to equal the smallest value a such that pX(a) ≠ 0 and u1 ≤ FX(a).

3. Define Fx(·) := FY | X(· | x) and set y := Fx^{−1}(u2).
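The procedure can be made concrete with a runnable sketch. The specific distributions are an assumption for illustration: X is Bernoulli(0.3) and Y | X = x is exponential with rate lam[x], so F_{Y|X}(y | x) = 1 − exp(−lam[x] y) is invertible as the slide requires.

```python
import math
import random

p_X = {0: 0.7, 1: 0.3}       # pmf of X (assumed for illustration)
lam = {0: 2.0, 1: 0.5}       # conditional rate of Y given X (assumed)

def sample_joint(rng):
    u1, u2 = rng.random(), rng.random()
    # step 2: discrete inverse transform via the cdf of X
    x, F = 0, 0.0
    for a in sorted(p_X):
        if p_X[a] > 0:
            F += p_X[a]
            if u1 <= F:
                x = a
                break
    # step 3: invert the conditional cdf of Y given X = x
    y = -math.log(1 - u2) / lam[x]
    return x, y

rng = random.Random(0)
samples = [sample_joint(rng) for _ in range(100000)]
frac_x1 = sum(x for x, _ in samples) / len(samples)
print(frac_x1)  # should be close to p_X[1] = 0.3
```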

SLIDE 67

Sampling from multivariate distributions

Explain how to generate samples from a random variable with pdf

fW(w) = 0.1 λ1 exp(−λ1 w) + 0.9 λ2 exp(−λ2 w),   w ≥ 0,

where λ1 and λ2 are positive constants, using two iid uniform samples between 0 and 1.

SLIDES 68–70

Sampling from multivariate distributions

Let us define a Bernoulli random variable X with parameter 0.9, such that if X = 0 then Y is exponential with parameter λ1 and if X = 1 then Y is exponential with parameter λ2.

The marginal distribution of Y is

fY(w) = pX(0) fY | X(w | 0) + pX(1) fY | X(w | 1)

      = 0.1 λ1 exp(−λ1 w) + 0.9 λ2 exp(−λ2 w)

SLIDES 71–72

Sampling from multivariate distributions

1. We obtain two independent samples u1 and u2 from the uniform distribution.

2. If u1 ≤ 0.1 we set

   w := (1/λ1) log(1/(1 − u2)),

   otherwise we set

   w := (1/λ2) log(1/(1 − u2)).
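A sketch of the mixture sampler above. The rates lam1 = 2.0 and lam2 = 0.5 are assumed values for illustration; the empirical mean should approach the mixture of exponential means, 0.1/λ1 + 0.9/λ2.

```python
import math
import random

lam1, lam2 = 2.0, 0.5   # assumed rates, not from the slides

def sample_w(rng):
    u1, u2 = rng.random(), rng.random()
    lam = lam1 if u1 <= 0.1 else lam2
    return (1 / lam) * math.log(1 / (1 - u2))

rng = random.Random(0)
n = 200000
mean_w = sum(sample_w(rng) for _ in range(n)) / n
expected = 0.1 / lam1 + 0.9 / lam2   # = 1.85
print(mean_w, expected)
```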

SLIDE 73

Convergence

Let U be a random variable uniformly distributed between 0 and 1. If we define the discrete random process X by

X(i) = U for all i,

does X converge to 1 − U in probability?

SLIDE 74

Convergence

Does X converge to 1 − U in distribution?

SLIDE 75

Convergence

You draw some iid samples x1, x2, . . . from a Cauchy random variable. Will the empirical mean (1/n) Σ_{i=1}^{n} xi converge in probability as n grows large? Explain why briefly and, if the answer is yes, state what it converges to.

SLIDE 76

Convergence

You draw m iid samples x1, x2, . . . , xm from a Cauchy random variable. Then you draw iid samples y1, y2, . . . uniformly from {x1, x2, . . . , xm} (each yi is equal to each element of {x1, x2, . . . , xm} with probability 1/m). Will the empirical mean (1/n) Σ_{i=1}^{n} yi converge in probability as n grows large? Explain why very briefly and, if the answer is yes, state what it converges to.

SLIDE 77

Earthquake

We are interested in learning a model for the occurrence of earthquakes. We decide to model the time between earthquakes as an exponential random variable with parameter λ. Compute the maximum-likelihood estimate of λ given t1, t2, . . . , tn, which are interarrival times for past earthquakes. Assume that the data are iid.

SLIDES 78–83

Earthquake

L(λ) := f_{T(1), ..., T(n)}(t1, . . . , tn)

      = Π_{i=1}^{n} λ exp(−λ ti)

      = λ^n exp(−λ Σ_{i=1}^{n} ti)

log L(λ) = n log λ − λ Σ_{i=1}^{n} ti

SLIDES 84–89

Earthquake

d log L_{t1,...,tn}(λ) / dλ = n/λ − Σ_{i=1}^{n} ti

d² log L_{t1,...,tn}(λ) / dλ² = −n/λ² < 0,

so the stationary point is a maximum:

λ_ML = 1 / ( (1/n) Σ_{i=1}^{n} ti )
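The estimator can be checked on simulated data. A sketch: draw iid exponential interarrival times with a known rate (lam_true = 1.5 is an assumed value) and verify that λ_ML = n / Σ ti, i.e. one over the empirical mean, recovers it.

```python
import math
import random

rng = random.Random(0)
lam_true = 1.5              # assumed ground-truth rate
n = 100000
# inverse-transform sampling of Exponential(lam_true)
times = [-math.log(1 - rng.random()) / lam_true for _ in range(n)]
lam_ml = n / sum(times)     # = 1 / empirical mean
print(lam_ml)               # should be close to 1.5
```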

SLIDE 90

Earthquake

Find an approximate 0.95 confidence interval based on the central limit theorem for the value of λ. Assume that you know a bound b on the standard deviation (i.e. the variance of the exponential, 1/λ², is bounded by b²) and express your answer using the Q function. (Hint: Express the ML estimate in terms of the empirical mean.) (See solutions.)

SLIDE 91

Earthquake

What is the posterior distribution of the parameter Λ if we model it as a random variable with a uniform distribution between 0 and u? Express your answer in terms of the sum Σ_{i=1}^{n} ti, u and the marginal pdf of the data evaluated at t1, t2, . . . , tn,

c := f_{T(1), ..., T(n)}(t1, . . . , tn).

SLIDES 92–95

Earthquake

f_{Λ | T(1), ..., T(n)}(λ | t1, . . . , tn) = fΛ(λ) λ^n exp(−λ Σ_{i=1}^{n} ti) / f_{T(1), ..., T(n)}(t1, . . . , tn)

                                           = (1/(u c)) λ^n exp(−λ Σ_{i=1}^{n} ti)

for 0 ≤ λ ≤ u and zero otherwise.
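The posterior can be inspected numerically. In the sketch below the data values and u = 3.0 are assumptions; it normalizes λ^n exp(−λ Σ ti) on [0, u] with a Riemann sum and checks that the posterior mode sits at n / Σ ti (the maximizer of the likelihood) whenever that value lies inside [0, u].

```python
import math

times = [0.8, 1.3, 0.4, 2.1, 0.9]   # assumed toy data
n, s, u = len(times), sum(times), 3.0

grid = [u * k / 100000 for k in range(1, 100001)]
unnorm = [lam ** n * math.exp(-lam * s) for lam in grid]
Z = sum(unnorm) * (u / 100000)       # Riemann-sum normalizer (plays the role of u * c)
post = [v / Z for v in unnorm]

mode = grid[max(range(len(grid)), key=lambda k: post[k])]
print(mode, n / s)  # should be close
```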
SLIDE 96

Earthquake

[Plot of the posterior pdf f_{Λ | T(1), ..., T(n)}(λ | t1, . . . , tn) as a function of λ.]

SLIDE 97

Earthquake

Explain how you would use the answer in the previous question to construct a confidence interval for the parameter.

SLIDE 98

Chad

You hate a coworker and want to predict when he is in the office from the temperature. The data are:

Chad:     61 65 59 61 61 65 61 63 63 59
No Chad:  68 70 68 64 64

You model his presence using a random variable C which is equal to 1 if he is there and 0 if he is not. Estimate pC.

SLIDE 99

Chad

The empirical pmf is

pC(0) = 5/15 = 1/3,   pC(1) = 10/15 = 2/3.

SLIDE 100

Chad

You model the temperature using a random variable T. Sketch the kernel density estimator of the conditional distribution of T given C using a rectangular kernel with width equal to 2.

slide-101
SLIDE 101

Chad

55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 0.00 0.05 0.10 0.15 0.20 fT|C(t|0) fT|C(t|1)
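The rectangular-kernel estimate can be computed directly from the data. One caveat: the half-open window [t − 1, t + 1) used below is an assumption, chosen so the values match the numbers the deck reports on the following slides (e.g. fT|C(68 | 0) = 0.2 and fT|C(64 | 1) = 0.1).

```python
chad = [61, 65, 59, 61, 61, 65, 61, 63, 63, 59]   # temperatures with C = 1
no_chad = [68, 70, 68, 64, 64]                     # temperatures with C = 0

def kde_rect(t, data, width=2.0):
    """Rectangular-kernel estimate: fraction of points in [t - w/2, t + w/2),
    divided by the window width (half-open boundary is an assumption)."""
    h = width / 2.0
    inside = sum(1 for x in data if t - h <= x < t + h)
    return inside / (len(data) * width)

print(kde_rect(68, no_chad))  # 0.2
print(kde_rect(68, chad))     # 0.0
print(kde_rect(64, no_chad))  # 0.2
print(kde_rect(64, chad))     # 0.1
```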

SLIDES 102–103

Chad

If T = 68° what is the ML estimate of C?

fT|C(68 | 0) = 0.2,   fT|C(68 | 1) = 0,

so the ML estimate is C = 0.

SLIDES 104–111

Chad

If T = 64° what is the MAP estimate of C?

pC|T(0 | 64) = pC(0) fT|C(64 | 0) / ( pC(0) fT|C(64 | 0) + pC(1) fT|C(64 | 1) )

             = (1/3 · 0.2) / (1/3 · 0.2 + 2/3 · 0.1)

             = 1/2

pC|T(1 | 64) = 1 − pC|T(0 | 64) = 1/2

Both values are equally probable, so either is a MAP estimate.
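The posterior computation above is short enough to reproduce directly, using the empirical prior from slide 99 and the kernel density values the deck reports at T = 64:

```python
pC = {0: 1 / 3, 1: 2 / 3}    # empirical prior
f64 = {0: 0.2, 1: 0.1}       # fT|C(64 | c) from the kernel density estimates

evidence = sum(pC[c] * f64[c] for c in (0, 1))
post = {c: pC[c] * f64[c] / evidence for c in (0, 1)}
print(post)  # both entries should be approximately 0.5
```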

SLIDE 112

Chad

What happens if the temperature is 57°? Explain how using parametric estimation may alleviate this problem.

SLIDE 113

3-point shooting

The New York Knicks hire you as a data analyst. Your first task is to come up with a way to determine whether a 3-point shooter is any good. You will use the following graph of the function g(θ, n) = θ^n.

[Graph of g(θ, n) against θ ∈ [0.5, 1.0] for n = 4, 9, 14, 19, 24, with g values marked between 0.005 and 0.950.]

SLIDES 114–126

3-point shooting

1. Interpret g(θ, n).

2. The coach tells you: I want to make sure that the guy has a shooting percentage over 80%. What is your null hypothesis?

3. What number of shots does a player need to make in a row for you to reject the null hypothesis with a confidence level of 5%? 14

4. A player makes 9 shots in a row. What is the corresponding p value? Do you declare him as a good shooter? ≈ 0.14

5. What is the probability that you do not declare a player who has a shooting percentage of 90% as a good shooter? 1 − g(0.9, 14) ≈ 0.76

6. You apply the test on 10 players. You adapt the threshold applying Bonferroni’s method. What is the new threshold? n = 24

7. With the correction, what is the probability that you do not declare a player who has a shooting percentage of 90% as a good shooter? 1 − g(0.9, 24) ≈ 0.92

8. What is the advantage of adapting the threshold? What is the disadvantage?
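The numerical answers above can be reproduced from g(θ, n) = θ^n. A sketch, taking the null hypothesis to be θ ≤ 0.8, so that making n shots in a row has p value g(0.8, n) at the boundary of the null; the deck's ≈ 0.14 and ≈ 0.76 appear to be read off the graph, while direct computation gives 0.13 and 0.77.

```python
def g(theta, n):
    return theta ** n

# 3. smallest n with g(0.8, n) <= 0.05
n_reject = next(n for n in range(1, 100) if g(0.8, n) <= 0.05)
print(n_reject)                      # 14

# 4. p value after 9 makes in a row
print(round(g(0.8, 9), 2))           # 0.13

# 5. Type II error at theta = 0.9 with the n = 14 rule
print(round(1 - g(0.9, n_reject), 2))

# 6. Bonferroni over 10 players: significance threshold 0.05 / 10
n_bonf = next(n for n in range(1, 100) if g(0.8, n) <= 0.05 / 10)
print(n_bonf)                        # 24

# 7. Type II error at theta = 0.9 with the n = 24 rule
print(round(1 - g(0.9, n_bonf), 2))  # 0.92
```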