HYPOTHESIS TESTING PART I RECAP & OUTLOOK BAYESIAN PARAMETER - - PowerPoint PPT Presentation




slide-1
SLIDE 1

HYPOTHESIS TESTING

INTRODUCTION TO DATA ANALYSIS PART I

slide-2
SLIDE 2

RECAP & OUTLOOK

BAYESIAN PARAMETER ESTIMATION

▸ model $M$ captures prior beliefs about data-generating process
  ▸ prior over latent parameters
  ▸ likelihood of data
▸ Bayesian posterior inference using observed data $D_{obs}$
▸ compare posterior beliefs to some parameter value of interest

FREQUENTIST HYPOTHESIS TESTING

▸ model $M$ captures a hypothetically assumed data-generating process
  ▸ fix parameter value of interest
  ▸ likelihood of data
▸ single out some aspect of the data as most important (test statistic)
▸ look at distribution of test statistic given the assumed model (sampling distribution)
▸ check likelihood of test statistic applied to the observed data $D_{obs}$

slide-3
SLIDE 3

CAVEAT: FREQUENTIST HYPOTHESIS TESTING

▸ there are at least three flavors of frequentist hypothesis testing
  ▸ Fisher
  ▸ Neyman-Pearson
  ▸ modern hybrid NHST [null-hypothesis significance testing]
▸ not every textbook is clear on these differences and/or which flavor it endorses
▸ there is also no unanimity of practice between or within research fields

slide-4
SLIDE 4

LEARNING GOALS

▸ understand basic idea of frequentist hypothesis testing
▸ understand what a p-value is
  ▸ definition, one- vs two-sided
  ▸ test statistic & sampling distribution
  ▸ relation to confidence intervals
  ▸ significance levels & $\alpha$-error

slide-5
SLIDE 5

p-value

slide-6
SLIDE 6

PRELIMINARIES

▸ research hypothesis: theoretically implied answer to a main question of interest for research
  ▸ e.g., truth-judgements of sentences with presupposition failure at chance level? (King of France)
  ▸ e.g., faster reactions in reaction time trials than in go/no-go trials? (Mental Chronometry)
▸ null hypothesis: specific assumption made for purposes of analysis
  ▸ fix parameter value in a data-generating model for technical reasons
  ▸ analogy: useful assumption in mathematical proof (e.g., in reductio ad absurdum)
▸ alternative hypothesis: the antagonist of the null hypothesis, specified to relate the null hypothesis to the research hypothesis

slide-7
SLIDE 7

P-VALUE

slide-8
SLIDE 8

Binomial Model

slide-9
SLIDE 9

BAYESIAN BINOMIAL MODEL (AS ORIGINALLY INTRODUCED)

$$\theta \sim \mathrm{Beta}(\dots)$$

$$k \sim \mathrm{Binomial}(\theta, N)$$

[graphical model with nodes $\theta$, $N$, $k$]

slide-10
SLIDE 10

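BAYESIAN BINOMIAL MODEL (EXTENDED)

$$\theta \sim \mathrm{Beta}(\dots)$$

$$x_i \sim \mathrm{Bernoulli}(\theta)$$

$$k = \sum_{i=1}^{N} x_i$$

[graphical model with nodes $x_i$, $N$, $\theta$, $k$]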

slide-11
SLIDE 11

FREQUENTIST BINOMIAL MODEL

$$x_i \sim \mathrm{Bernoulli}(\theta_0)$$ [likelihood of “raw” data]

$$k = \sum_{i=1}^{N} x_i$$ [test statistic (derived from “raw” data)]

FACT: The sampling distribution of $k$ is:

$$k \sim \mathrm{Binomial}(\theta_0, N)$$

[graphical model with nodes $x_i$, $N$, $\theta_0$, $k$; dotted line = “working assumption”]
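The FACT above can be checked numerically. The following is a minimal sketch, assuming $\theta_0 = 0.5$ and $N = 24$ purely for illustration; the helper names `binom_pmf` and `simulate_k` are hypothetical and not part of the slides. It repeatedly draws $N$ Bernoulli outcomes, sums them to get $k$, and compares the relative frequency of each $k$ to the Binomial probability.

```python
import random
from math import comb

def binom_pmf(k, theta, n):
    """Probability of exactly k successes in n trials with success rate theta."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def simulate_k(theta, n, rng):
    """Draw n Bernoulli(theta) outcomes x_i and return the test statistic k = sum of x_i."""
    return sum(1 if rng.random() < theta else 0 for _ in range(n))

theta_0, N = 0.5, 24      # illustrative values (assumption, not from the slides)
rng = random.Random(1)    # fixed seed so that the run is reproducible
n_sim = 20000             # number of simulated data sets

counts = [0] * (N + 1)
for _ in range(n_sim):
    counts[simulate_k(theta_0, N, rng)] += 1

# largest gap between simulated relative frequency and Binomial probability
max_gap = max(abs(counts[k] / n_sim - binom_pmf(k, theta_0, N))
              for k in range(N + 1))
print(f"largest absolute deviation: {max_gap:.4f}")
```

With this many simulated data sets the deviation should be small, which is what we expect if summing Bernoulli outcomes yields a Binomial sampling distribution.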

slide-12
SLIDE 12

FREQUENTIST BINOMIAL MODEL

[graphical model with nodes $x_i$, $N$, $\theta_0$, $k$]

▸ null-hypothesis: $\theta = \theta_0$
▸ test statistic: $k$, derived from “raw” data $\vec{x}$
  ▸ the most important (numerical) aspect of the data for the current testing purposes
▸ sampling distribution: likelihood of observing a particular value of $k$ in this model
▸ notice: the observed data $D_{obs}$ has not yet made any appearance

remark: sometimes summary statistics of $D_{obs}$ other than the test statistic might be used in the model

slide-13
SLIDE 13

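FREQUENTIST BINOMIAL MODEL

[graphical model with nodes $x_i$, $N$, $\theta_0$, $k$]

▸ likelihood of data: random variable $\mathcal{D}^{|H_0}$

$$P(\mathcal{D}^{|H_0} = \langle x_1, \dots, x_N \rangle) = \prod_{i=1}^{N} \mathrm{Bernoulli}(x_i, \theta_0)$$

▸ sampling distribution: random variable $T^{|H_0}$

$$P(T^{|H_0} = k) = \mathrm{Binomial}(k, \theta_0, N)$$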
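To see how the two random variables relate, one can sum the likelihood of the “raw” data over all sequences that yield the same value of the test statistic. The sketch below does this by brute-force enumeration; the values $\theta_0 = 0.3$ and $N = 6$ and all function names are assumptions chosen for illustration only.

```python
from itertools import product
from math import comb, isclose

def bernoulli(x, theta):
    """Probability of a single outcome x in {0, 1} with success rate theta."""
    return theta if x == 1 else 1 - theta

def likelihood_raw(xs, theta):
    """Likelihood of one particular sequence: product of Bernoulli terms."""
    p = 1.0
    for x in xs:
        p *= bernoulli(x, theta)
    return p

def binomial(k, theta, n):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

theta_0, N = 0.3, 6   # small illustrative values (assumption)

# add up the raw-data likelihood of every sequence sharing the same k
prob_T = [0.0] * (N + 1)
for xs in product([0, 1], repeat=N):
    prob_T[sum(xs)] += likelihood_raw(xs, theta_0)

for k in range(N + 1):
    print(k, round(prob_T[k], 6), round(binomial(k, theta_0, N), 6))
```

The two printed columns agree for every $k$: the Binomial coefficient simply counts how many raw sequences share a given number of successes.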

slide-14
SLIDE 14

Binomial p-values

slide-15
SLIDE 15

BINOMIAL TEST

▸ 24/7 example: $N = 24$ and $k = 7$
  ▸ $t(D_{obs}) = 7$
  ▸ $P(T^{|H_0} = k) = \mathrm{Binomial}(k, \theta_0, N)$
▸ p-value definition:

$$p(D_{obs}) = P(T^{|H_0} \succeq^{H_{0,a}} t(D_{obs}))$$

[slide annotations: “we know this” (the sampling distribution of $T^{|H_0}$), “we know this” ($t(D_{obs})$), “???” (the ordering $\succeq^{H_{0,a}}$)]

What counts as “more extreme evidence against the null hypothesis” is a context-sensitive notion. It depends on both the null hypothesis and the alternative hypothesis, because only when put together do they address the research question in the background.

slide-16
SLIDE 16

BINOMIAL TEST

▸ compare two research questions
  1. Is the coin fair?
    ▸ $H_0: \theta = 0.5$
    ▸ $H_a: \theta \neq 0.5$
  2. Is the coin biased towards heads?
    ▸ $H_0: \theta = 0.5$
    ▸ $H_a: \theta < 0.5$
▸ we still use a point-valued null-hypothesis for technical reasons
▸ the alternative hypothesis is important to fix the meaning of $\succeq^{H_{0,a}}$

slide-17
SLIDE 17

BINOMIAL TEST

▸ Case 1: Is the coin fair?
  ▸ $H_0: \theta = 0.5$
  ▸ $H_a: \theta \neq 0.5$
▸ which values of $k$ are more extreme evidence against $H_0$?

slide-18
SLIDE 18

BINOMIAL TEST

▸ Case 1: Is the coin fair?
  ▸ $H_0: \theta = 0.5$
  ▸ $H_a: \theta \neq 0.5$
▸ which values of $k$ are more extreme evidence against $H_0$?
  ▸ anything that’s even less likely to occur

slide-19
SLIDE 19

BINOMIAL TEST
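For Case 1, the two-sided p-value of the 24/7 example can be computed directly from the Binomial sampling distribution. The following is a minimal sketch, with hypothetical function names, that reads “more extreme” as “no more likely than the observed $k$ under $H_0$”; the small relative tolerance is an added safeguard against floating-point ties and is an assumption of this sketch.

```python
from math import comb

def binom_pmf(k, theta, n):
    """Probability of exactly k successes in n trials with success rate theta."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def p_value_two_sided(k_obs, theta_0, n):
    """Sum the probabilities of all outcomes no more likely than k_obs under H0."""
    p_obs = binom_pmf(k_obs, theta_0, n)
    return sum(binom_pmf(k, theta_0, n)
               for k in range(n + 1)
               if binom_pmf(k, theta_0, n) <= p_obs * (1 + 1e-7))

# 24/7 example: N = 24 trials, k = 7 observed, H0: theta = 0.5
p = p_value_two_sided(7, 0.5, 24)
print(round(p, 4))  # → 0.0639
```

Because the sampling distribution is symmetric for $\theta_0 = 0.5$, the outcomes counted here are $k \leq 7$ together with $k \geq 17$.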

slide-20
SLIDE 20

BINOMIAL TEST

▸ Case 2: Is the coin biased towards heads?
  ▸ $H_0: \theta = 0.5$
  ▸ $H_a: \theta < 0.5$
▸ which values of $k$ are more extreme evidence against $H_0$?

slide-21
SLIDE 21

BINOMIAL TEST

▸ Case 2: Is the coin biased towards heads?
  ▸ $H_0: \theta = 0.5$
  ▸ $H_a: \theta < 0.5$
▸ which values of $k$ are more extreme evidence against $H_0$?
  ▸ anything even more in favor of $H_a$

slide-22
SLIDE 22

BINOMIAL TEST
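For Case 2, with $H_a: \theta < 0.5$, outcomes “even more in favor of $H_a$” are values of $k$ at or below the observed one. A minimal sketch of the resulting one-sided p-value for the 24/7 example follows; the function names are hypothetical.

```python
from math import comb

def binom_pmf(k, theta, n):
    """Probability of exactly k successes in n trials with success rate theta."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def p_value_lower_tail(k_obs, theta_0, n):
    """One-sided p-value for Ha: theta < theta_0, i.e. P(T <= k_obs) under H0."""
    return sum(binom_pmf(k, theta_0, n) for k in range(k_obs + 1))

# 24/7 example: N = 24 trials, k = 7 observed, H0: theta = 0.5
p = p_value_lower_tail(7, 0.5, 24)
print(round(p, 4))  # → 0.032
```

For this symmetric null hypothesis the one-sided p-value is exactly half of the two-sided one, since only the lower tail counts as more extreme.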

slide-23
SLIDE 23

p-value revisit

slide-24
SLIDE 24

P-VALUE

slide-25
SLIDE 25

significance and $\alpha$-errors

slide-26
SLIDE 26

SIGNIFICANCE LEVELS

▸ standardly we fix a significance level $\alpha$ before the test
▸ common values of $\alpha$ are:
  ▸ $\alpha = 0.05$
  ▸ $\alpha = 0.01$
  ▸ $\alpha = 0.001$
▸ if the p-value for the observed data is no larger than the pre-established significance level ($p \leq \alpha$), we say that the test result was significant
▸ a significant test result is conventionally regarded as “strong enough” evidence against the null-hypothesis, so that we can reject the null hypothesis as a viable explanation of the data
▸ non-significant results are interpreted differently in different approaches (more later)
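As an illustration of the decision rule, the sketch below checks the one-sided p-value of the 24/7 example ($N = 24$, $k = 7$, $H_0: \theta = 0.5$, $H_a: \theta < 0.5$) against the three common significance levels. The rule “significant if and only if $p \leq \alpha$” and the function names are assumptions of this sketch.

```python
from math import comb

def binom_pmf(k, theta, n):
    """Probability of exactly k successes in n trials with success rate theta."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def is_significant(p_value, alpha):
    """Decision rule assumed here: significant if and only if p <= alpha."""
    return p_value <= alpha

# one-sided p-value P(T <= 7) under H0: theta = 0.5 with N = 24
p_one_sided = sum(binom_pmf(k, 0.5, 24) for k in range(0, 8))

results = {alpha: is_significant(p_one_sided, alpha)
           for alpha in (0.05, 0.01, 0.001)}
for alpha, significant in results.items():
    print(alpha, significant)
```

The same data are significant at $\alpha = 0.05$ but not at the stricter levels, which is why the level has to be fixed before the test.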

slide-27
SLIDE 27
$\alpha$-ERROR

▸ an $\alpha$-error (aka type-I error) occurs when we reject a true null hypothesis
▸ by definition this type of error occurs, in the long run, with a proportion of no more than $\alpha$
▸ it is in this way that frequentist statistics subscribes to and cherishes a regime of long-term error control on research results
▸ Bayesian approaches (usually) are not concerned with long-term error control
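The long-run bound can be made concrete for a two-sided Binomial test. The sketch below, using assumed values $\theta_0 = 0.5$, $N = 24$, $\alpha = 0.05$ and hypothetical function names, collects every outcome $k$ that would lead to rejecting $H_0$ and adds up the probability of those outcomes when $H_0$ is in fact true.

```python
from math import comb

def binom_pmf(k, theta, n):
    """Probability of exactly k successes in n trials with success rate theta."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def p_value_two_sided(k_obs, theta_0, n):
    """Sum the probabilities of all outcomes no more likely than k_obs under H0."""
    p_obs = binom_pmf(k_obs, theta_0, n)
    return sum(binom_pmf(k, theta_0, n)
               for k in range(n + 1)
               if binom_pmf(k, theta_0, n) <= p_obs * (1 + 1e-7))

theta_0, N, alpha = 0.5, 24, 0.05   # illustrative values (assumption)

# outcomes k for which the test rejects H0 at level alpha
rejection_region = [k for k in range(N + 1)
                    if p_value_two_sided(k, theta_0, N) <= alpha]

# probability of landing in the rejection region although H0 is true
alpha_error_rate = sum(binom_pmf(k, theta_0, N) for k in rejection_region)

print(rejection_region)
print(round(alpha_error_rate, 4))  # → 0.0227
```

Because $k$ is discrete, the actual error rate here stays strictly below $\alpha = 0.05$ rather than hitting it exactly.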