Inference concepts (DAAG Chapter 4)




slide-1
SLIDE 1

Inference concepts

DAAG Chapter 4

slide-2
SLIDE 2

Learning objectives

  • Point estimation
  • Confidence intervals and hypothesis tests
  • Contingency tables
  • One-way and two-way comparisons, ANOVA
  • Response curves
  • Nested structures, pseudoreplication
  • Maximum likelihood estimation
  • Bayesian estimation
slide-3
SLIDE 3

Inference

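  • Interested in population quantities

– Parameters $\theta$ (e.g. $\mu$, $\sigma^2$)

  • Collect sample X
  • Use a sample statistic $\hat{\theta}(X)$ to estimate a population quantity $\theta$
  • The sampling distribution of $\hat{\theta}$ implies $p(\hat{\theta} \mid \theta)$
  • We use $p(\hat{\theta} \mid \theta)$ for inference about $\theta$, or
  • We use $p(\theta \mid \hat{\theta})$ for inference about $\theta$ (Bayesian)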

slide-4
SLIDE 4

Point estimation

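  • What is the population mean $\mu$?

– A point estimate of $\mu$ is the sample mean $\bar{x}$

  • Look to the sampling distribution $p(\bar{x} \mid \mu)$

– According to the CLT, $\bar{x} \mid \mu \sim N(\mu, \sigma^2/n)$
– The standard error of the mean (SEM) is thus $\sigma/\sqrt{n}$
– Can approximate SEM ≈ $s/\sqrt{n}$

  • The sampling distribution of $t = (\bar{x} - \mu)/(s/\sqrt{n})$ is $p(t \mid \mu)$, with $t \mid \mu \sim t_{n-1}$

– Includes variability from $\bar{x}$ and $s \approx \sigma$
– $t$ is the number of SEM units between $\bar{x}$ and $\mu$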
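The quantities on this slide can be computed directly from a sample. The following is a minimal sketch in Python (standard library only). The sample values and the comparison value `mu0` are made up for illustration and do not come from the DAAG data.

```python
import math
import statistics

# Hypothetical sample (illustrative values, not from the slides)
x = [48.2, 51.7, 49.9, 53.1, 50.4, 47.8, 52.3, 50.6]
n = len(x)

x_bar = statistics.mean(x)   # point estimate of mu
s = statistics.stdev(x)      # sample standard deviation (n - 1 denominator)
sem = s / math.sqrt(n)       # approximate standard error of the mean, s / sqrt(n)

mu0 = 50.0                   # hypothetical value of mu to compare against
t = (x_bar - mu0) / sem      # number of SEM units between x_bar and mu0

print(f"x_bar = {x_bar:.3f}, s = {s:.3f}, SEM = {sem:.3f}, t = {t:.3f}")
```

Because `s` is itself estimated from the data, `t` is referred to a t distribution with `n - 1` degrees of freedom rather than to a normal distribution.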

slide-5
SLIDE 5

Hypothesis tests

  • Use $p(\hat{\theta} \mid \theta)$ for inference about $\theta$
  • In hypothesis testing,

– Begin by assuming $\theta = \theta_0$ (null hypothesis)
– What is the sampling distribution $p(\hat{\theta} \mid \theta_0)$?
– Imagine we sample $\hat{\theta}$ from $p(\hat{\theta} \mid \theta_0)$. What values are likely? What values are unlikely?

  • Our answer determines the rejection region of the test

– Now, collect a sample and compute $\hat{\theta}_{obs}$

  • Is $\hat{\theta}_{obs}$ in the rejection region? If so, reject our initial hypothesis that $\theta = \theta_0$

slide-6
SLIDE 6

Hypothesis tests

  • How to decide what is an unlikely value?

– Formulate an alternative hypothesis

  • $\theta > \theta_0$ or $\theta < \theta_0$ or $\theta \neq \theta_0$

– Decide on a Type 1 error rate $\alpha$ (false rejection)
– $\alpha$, together with the alternative hypothesis, implies a rejection region (“unlikely value”)

  • If we don’t want to decide $\alpha$, compute the p-value

– Smallest $\alpha$ that would result in rejection of the null hypothesis
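The link between the rejection region and the p-value can be sketched for a test on a mean. This example assumes a known standard deviation `sigma`, so that the sampling distribution under the null hypothesis is normal. That is a simplifying assumption of this sketch, and the function name `z_test` and all input values are hypothetical.

```python
import math
from statistics import NormalDist

def z_test(x_bar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """z test of H0: mu = mu0, assuming sigma is known (an assumption of this sketch)."""
    std_normal = NormalDist()
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    if alternative == "greater":      # H1: mu > mu0, reject for large z
        p_value = 1.0 - std_normal.cdf(z)
        reject = z > std_normal.inv_cdf(1.0 - alpha)
    elif alternative == "less":       # H1: mu < mu0, reject for small z
        p_value = std_normal.cdf(z)
        reject = z < std_normal.inv_cdf(alpha)
    else:                             # H1: mu != mu0, reject for large |z|
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
        reject = abs(z) > std_normal.inv_cdf(1.0 - alpha / 2.0)
    return z, p_value, reject

z, p_value, reject = z_test(x_bar=50.5, mu0=50.0, sigma=2.0, n=64)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject at alpha = 0.05: {reject}")
```

Checking whether the statistic falls in the rejection region gives the same decision as checking whether `p_value < alpha`, which is the sense in which the p-value is the smallest α that leads to rejection.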

slide-7
SLIDE 7

Confidence intervals

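  • Consider $p(\hat{\theta} - \theta)$, the sampling distribution of $\hat{\theta} - \theta$
  • Given a probability (e.g. 95% or 99%), we can compute an interval for $\hat{\theta} - \theta$ from $p(\hat{\theta} - \theta)$
  • For $\mu$, use $\bar{x} - \mu \sim N(0, \sigma^2/n)$ or $(\bar{x} - \mu)/(s/\sqrt{n}) \sim t_{n-1}$
  • Results in confidence intervals for $\mu$:

$\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$

or

$\bar{x} \pm t_{\alpha/2,\,n-1}\,s/\sqrt{n}$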
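The normal-based interval can be computed with the Python standard library. This is a minimal sketch that assumes a known `sigma`, and the helper name `z_interval` and the input values are hypothetical. The t-based interval has the same form with a t quantile in place of the normal quantile, for example from `scipy.stats.t.ppf` if SciPy is available.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, level=0.95):
    """Interval x_bar +/- z_{alpha/2} * sigma / sqrt(n), with sigma assumed known."""
    alpha = 1.0 - level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # upper alpha/2 quantile of N(0, 1)
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

lo95, hi95 = z_interval(x_bar=50.5, sigma=2.0, n=64, level=0.95)
lo99, hi99 = z_interval(x_bar=50.5, sigma=2.0, n=64, level=0.99)
print(f"95% CI: ({lo95:.3f}, {hi95:.3f})")
print(f"99% CI: ({lo99:.3f}, {hi99:.3f})")
```

A higher confidence level gives a wider interval, because a larger share of the sampling distribution has to be covered.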

slide-8
SLIDE 8

A short comment…

  • Use hypothesis tests sparingly, and for good reason.

– Multiple comparisons can result in false alarms
– Ask directed questions

  • Consider alternatives to hypothesis tests

– Hypothesis tests provide little or no information about $\theta$

  • What is the probability of the null hypothesis?

– Confidence intervals (or Bayesian posterior distributions) provide much more information

  • Always report means (point estimates) and standard errors when reporting hypothesis tests

slide-9
SLIDE 9

Contingency tables

  • Comparing two or more categorical variables
  • Common question: are the variables independent? Which categories have more or fewer units than expected?

              Men            Women          Totals
Brown Eyes    42             39             81 (81/174)
Blue Eyes     35             38             73 (73/174)
Other         12              8             20 (20/174)
Totals        89 (89/174)    85 (85/174)    174
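One common way to address the independence question is a chi-square test, which compares each observed count with the count expected under independence (row total × column total / grand total). The sketch below uses the cell counts from the table and computes the margins from them. The choice of a chi-square test is an assumption of this example, since the slide does not name a method.

```python
import math

# Observed counts from the table: rows = eye colour, columns = (Men, Women)
observed = {
    "Brown Eyes": (42, 39),
    "Blue Eyes": (35, 38),
    "Other": (12, 8),
}

row_totals = {row: sum(counts) for row, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
grand_total = sum(row_totals.values())

chi2 = 0.0
expected = {}
for row, counts in observed.items():
    # Expected count under independence: row total * column total / grand total
    expected[row] = tuple(row_totals[row] * col_totals[j] / grand_total for j in range(2))
    for obs, exp in zip(counts, expected[row]):
        chi2 += (obs - exp) ** 2 / exp

df = (len(observed) - 1) * (2 - 1)   # (rows - 1) * (columns - 1) = 2
# For df = 2 the chi-square upper tail probability has the closed form exp(-x / 2)
p_value = math.exp(-chi2 / 2.0)

print(f"chi-square = {chi2:.3f}, df = {df}, p-value = {p_value:.3f}")
```

The per-cell differences between observed and expected counts show which categories have more or fewer units than independence would predict.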

slide-10
SLIDE 10

One-way comparisons

  • Data: tinting
  • Experiment: time to discriminate a target for different window tinting levels

[Figure: time (ms), roughly 50 to 200, by tinting level (no, lo, hi)]

slide-11
SLIDE 11

One way ANOVA

Analysis of Variance Table

Response: it
           Df Sum Sq Mean Sq F value Pr(>F)
tint        2   6597  3298.4  2.1769 0.1164
Residuals 179 271220  1515.2
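The F value and p-value in this table follow from the sums of squares and degrees of freedom. The sketch below recomputes them from the table entries. It uses the closed form of the F upper tail probability that holds when the numerator has 2 degrees of freedom, so no statistics library is needed.

```python
# Values copied from the one-way ANOVA table
ss_tint, df_tint = 6597, 2
ss_resid, df_resid = 271220, 179

ms_tint = ss_tint / df_tint        # mean square for tint
ms_resid = ss_resid / df_resid     # residual mean square
f_value = ms_tint / ms_resid       # F = MS_tint / MS_residual

# Upper tail of F(2, d2): P(F > f) = (1 + 2 f / d2) ** (-d2 / 2)
# (this closed form is specific to 2 numerator degrees of freedom)
p_value = (1.0 + df_tint * f_value / df_resid) ** (-df_resid / 2.0)

print(f"F = {f_value:.4f}, p-value = {p_value:.4f}")
```

Small differences from the printed table come from rounding of the sums of squares.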

slide-12
SLIDE 12

Two-way comparisons

  • There are other factors that might influence time to discriminate a target, e.g. age

[Figure: it, roughly 50 to 200, by tinting level (no, lo, hi), in two panels: Younger and Older]

slide-13
SLIDE 13

Two way ANOVA

Analysis of Variance Table

Response: it
           Df Sum Sq Mean Sq F value    Pr(>F)
tint        2   6597    3298  3.0965   0.04765 *
agegp       1  81612   81612 76.6164 1.567e-15 ***
Residuals 178 189607    1065
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

slide-14
SLIDE 14

Interaction plots

[Figure: interaction plot of mean of it, roughly 40 to 90, against tint (no, lo, hi), with one line per agegp (Older, Younger)]

slide-15
SLIDE 15

Two-way ANOVA: interaction

Analysis of Variance Table

Response: it
            Df Sum Sq Mean Sq F value    Pr(>F)
tint         2   6597    3298  3.1109   0.04702 *
agegp        1  81612   81612 76.9729 1.466e-15 ***
tint:agegp   2   2999    1499  1.4141   0.24590
Residuals  176 186609    1060
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
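The interaction line can also be read as a comparison of two nested models: the additive model (tint + agegp) and the model that adds tint:agegp. The sketch below takes the residual sums of squares from the two-way ANOVA tables with and without interaction and forms the F statistic from the drop in residual sum of squares. The variable names are illustrative.

```python
# Residual lines copied from the two ANOVA tables
rss_additive, df_additive = 189607, 178         # it ~ tint + agegp
rss_interaction, df_interaction = 186609, 176   # it ~ tint + agegp + tint:agegp

extra_ss = rss_additive - rss_interaction   # 2998; the table shows 2999 (rounding)
extra_df = df_additive - df_interaction     # 2

f_value = (extra_ss / extra_df) / (rss_interaction / df_interaction)

# Upper tail of F(2, d2): P(F > f) = (1 + 2 f / d2) ** (-d2 / 2)
# (closed form valid because the numerator has 2 degrees of freedom)
p_value = (1.0 + extra_df * f_value / df_interaction) ** (-df_interaction / 2.0)

print(f"F = {f_value:.4f}, p-value = {p_value:.4f}")
```

The result agrees with the tint:agegp row up to rounding: adding the interaction removes little residual variation relative to the residual mean square.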

slide-16
SLIDE 16

Response curves

  • Sometimes a response should be handled as a regression problem rather than ANOVA

[Figure: scatterplot with axes angle and distance; tick ranges 3.0 to 4.5 and 0.6 to 1.2]

slide-17
SLIDE 17

Pseudoreplication

slide-18
SLIDE 18

Nested structures

  • If the scale of your effect doesn’t match the scale of your experimental unit, don’t pretend that it does.

Q: How many experimental units do we have for comparing treatment to control?

slide-19
SLIDE 19

Maximum likelihood estimation

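  • Likelihood is the probability of data given a population, parameterized by $\theta$
  • The value of $\theta$ that maximizes the likelihood is the maximum likelihood estimate $\hat{\theta}$.

$$y_i = \mu + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2), \quad i = 1, 2, \ldots, n$$

$$L(y; \mu, \sigma) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - \mu)^2}{2\sigma^2}\right)$$

$$\log L(y; \mu, \sigma) = -\frac{n}{2}\log(2\pi\sigma^2) - \sum_{i=1}^{n} \frac{(y_i - \mu)^2}{2\sigma^2}$$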
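For the normal model on this slide, with independent observations of mean μ and variance σ², the log-likelihood is maximized at the sample mean and at the mean squared deviation with an n denominator. The sketch below checks this numerically by comparing the log-likelihood at those estimates with nearby parameter values. The data values are hypothetical.

```python
import math

def log_likelihood(y, mu, sigma2):
    """Normal log-likelihood: -(n/2) log(2 pi sigma2) - sum((y_i - mu)^2) / (2 sigma2)."""
    n = len(y)
    sum_sq = sum((yi - mu) ** 2 for yi in y)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - sum_sq / (2.0 * sigma2)

# Hypothetical data (not from the slides)
y = [4.9, 5.6, 5.1, 4.4, 5.3, 5.0, 4.7, 5.8]
n = len(y)

mu_hat = sum(y) / n                                    # MLE of mu: the sample mean
sigma2_hat = sum((yi - mu_hat) ** 2 for yi in y) / n   # MLE of sigma^2: divides by n, not n - 1

best = log_likelihood(y, mu_hat, sigma2_hat)
# Log-likelihood at parameter values shifted away from the MLE
nearby = [
    log_likelihood(y, mu_hat + d_mu, sigma2_hat * scale)
    for d_mu in (-0.1, 0.0, 0.1)
    for scale in (0.8, 1.0, 1.25)
    if not (d_mu == 0.0 and scale == 1.0)
]

print(f"mu_hat = {mu_hat:.3f}, sigma2_hat = {sigma2_hat:.4f}, log L = {best:.4f}")
print("largest nearby log L:", round(max(nearby), 4))
```

The MLE of σ² is slightly smaller than the usual sample variance, which divides by n − 1.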

slide-20
SLIDE 20

Bayesian estimation

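$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)}$$

It is often difficult to get $p(y)$ directly, but $p(y)$ is just a normalizing constant:

$$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)$$

So we use various tricks to generate samples from $p(y \mid \theta)\, p(\theta)$.

The most popular trick is MCMC (Markov chain Monte Carlo).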
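One simple MCMC method is random-walk Metropolis, which needs the posterior only up to its normalizing constant. The sketch below is a minimal illustration and not a method named in the slides. It samples the posterior of a normal mean with a known data standard deviation and a normal prior, a case where the exact posterior is available for comparison. The data, prior settings, step size and iteration counts are all hypothetical choices.

```python
import math
import random
import statistics

# Hypothetical data and settings (not from the slides)
y = [4.9, 5.6, 5.1, 4.4, 5.3, 5.0, 4.7, 5.8]
sigma = 0.5                        # data standard deviation, assumed known
prior_mean, prior_sd = 0.0, 10.0   # weak normal prior on mu

def log_unnormalized_posterior(mu):
    """log p(y | mu) + log p(mu), dropping constants that do not depend on mu."""
    log_lik = -sum((yi - mu) ** 2 for yi in y) / (2.0 * sigma ** 2)
    log_prior = -((mu - prior_mean) ** 2) / (2.0 * prior_sd ** 2)
    return log_lik + log_prior

def metropolis(n_iter, step, start, rng):
    """Random-walk Metropolis with a symmetric normal proposal."""
    samples = []
    current = start
    current_lp = log_unnormalized_posterior(current)
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_unnormalized_posterior(proposal)
        # Accept with probability min(1, posterior ratio); p(y) cancels in the ratio
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

rng = random.Random(1)
draws = metropolis(n_iter=20000, step=0.3, start=0.0, rng=rng)[2000:]  # drop burn-in

mc_mean = statistics.mean(draws)
mc_sd = statistics.pstdev(draws)

# Exact posterior for this conjugate normal-normal case, for comparison
post_precision = len(y) / sigma ** 2 + 1.0 / prior_sd ** 2
post_mean = (sum(y) / sigma ** 2 + prior_mean / prior_sd ** 2) / post_precision
post_sd = math.sqrt(1.0 / post_precision)

print(f"MCMC:  mean = {mc_mean:.3f}, sd = {mc_sd:.3f}")
print(f"Exact: mean = {post_mean:.3f}, sd = {post_sd:.3f}")
```

In realistic problems no closed form is available, so the draws themselves are used to summarize the posterior, for example through their mean, standard deviation or quantiles.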