The Aggregate Association Index Eric J. Beh School of Mathematical - - PowerPoint PPT Presentation



slide-1
SLIDE 1

The Aggregate Association Index

Eric J. Beh

School of Mathematical and Physical Sciences University of Newcastle, Australia

COMPSTAT 2010, Paris, France – August 24

slide-2
SLIDE 2

The 2x2 Contingency Table

Cross-classify a sample of size n according to two dichotomous variables

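            Column 1    Column 2    Total
Row 1       p11         p12         p1•
Row 2       p21         p22         p2•
Total       p•1         p•2         1

Define

$$P_1 = \frac{p_{11}}{p_{1\bullet}}$$

Then the chi-squared statistic, written as a function of P1, is

$$X^2(P_1 \mid p_{1\bullet}, p_{\bullet 1}) = n \left( \frac{P_1 - p_{\bullet 1}}{p_{2\bullet}} \right)^2 \frac{p_{1\bullet}\, p_{2\bullet}}{p_{\bullet 1}\, p_{\bullet 2}}$$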

“Let us blot out the contents of the table, leaving only the marginal frequencies . . . [they] by themselves supply no information on . . . the proportionality of the frequencies in the body of the table . . . ”

– Fisher (1935)

slide-3
SLIDE 3

Bounds of P1

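Duncan & Davis (1953) Bounds

$$L_1 = \max\left(0, \frac{n_{\bullet 1} - n_{2\bullet}}{n_{1\bullet}}\right) \le P_1 \le \min\left(\frac{n_{\bullet 1}}{n_{1\bullet}}, 1\right) = U_1$$

100(1 – α)% Confidence Bounds

$$L_\alpha^{*} = p_{\bullet 1} - p_{2\bullet}\sqrt{\frac{\chi^2_\alpha}{n}\,\frac{p_{\bullet 1}\, p_{\bullet 2}}{p_{1\bullet}\, p_{2\bullet}}} < P_1 < p_{\bullet 1} + p_{2\bullet}\sqrt{\frac{\chi^2_\alpha}{n}\,\frac{p_{\bullet 1}\, p_{\bullet 2}}{p_{1\bullet}\, p_{2\bullet}}} = U_\alpha^{*}$$

$$L_\alpha = \max\left(0, L_\alpha^{*}\right) < P_1 < \min\left(1, U_\alpha^{*}\right) = U_\alpha$$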
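A minimal Python sketch of both sets of bounds, using only the standard library. The function names are hypothetical, and the chi-squared critical value with 1 degree of freedom is obtained as the square of a standard normal quantile. The margins in the usage lines (row totals 13 and 17, first column total 12) are those implied by the twin-data example later in the talk.

```python
from statistics import NormalDist


def duncan_davis_bounds(n1_row, n2_row, n1_col):
    """Duncan & Davis (1953) bounds (L1, U1) on P1, from the margins alone."""
    lower = max(0.0, (n1_col - n2_row) / n1_row)
    upper = min(n1_col / n1_row, 1.0)
    return lower, upper


def chi2_critical_1df(alpha):
    """Upper-alpha critical value of chi-squared with 1 degree of freedom."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z


def confidence_bounds(n1_row, n2_row, n1_col, alpha=0.05):
    """100(1 - alpha)% bounds (L_alpha, U_alpha) on P1, clipped to [0, 1]."""
    n = n1_row + n2_row
    p1_row, p2_row = n1_row / n, n2_row / n
    p1_col = n1_col / n
    p2_col = 1.0 - p1_col
    ratio = (p1_col * p2_col) / (p1_row * p2_row)
    half_width = p2_row * (chi2_critical_1df(alpha) / n * ratio) ** 0.5
    return max(0.0, p1_col - half_width), min(1.0, p1_col + half_width)


# Margins implied by the twin-data example: rows 13 and 17, first column 12.
print(duncan_davis_bounds(13, 17, 12))
print(confidence_bounds(13, 17, 12, alpha=0.05))
```

For these margins the Duncan & Davis upper bound is 12/13, which agrees with the range 0 ≤ P1 ≤ 0.9231 quoted in the example.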

slide-4
SLIDE 4

Aggregate Association Index (AAI)

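[Figure: the chi-squared statistic X²(P1) plotted against P1 (horizontal axis from 0.0 to 1.0), with L1, Lα, Uα and U1 marked on the P1 axis and the critical value χ²α shown; the region above χ²α is labelled "Statistically significant association".]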

If the area under X²(P1) but above χ²α is large, then there may be evidence to suggest that there is a significant association (at the α level of significance) between the two dichotomous variables.

slide-5
SLIDE 5

Aggregate Association Index (AAI)

$$A_\alpha = 100\left(1 - \frac{\chi^2_\alpha\left[(L_\alpha - L_1) + (U_1 - U_\alpha)\right] + \int_{L_\alpha}^{U_\alpha} X^2(P_1 \mid p_{1\bullet}, p_{\bullet 1})\, dP_1}{\int_{L_1}^{U_1} X^2(P_1 \mid p_{1\bullet}, p_{\bullet 1})\, dP_1}\right)$$

[Figure: X²(P1) plotted against P1, with L1, Lα, Uα, U1 and the critical value χ²α marked; the region above χ²α is labelled "Statistically significant association".]
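The index can be sketched in a few lines of Python. Because X²(P1) is quadratic in P1, both integrals have a closed form, so no numerical integration is needed. The function name and argument names below are hypothetical, the confidence bounds are clipped to [0, 1], and the critical value is taken as the square of a standard normal quantile (1 degree of freedom). This is an illustration under those assumptions, not necessarily the author's implementation.

```python
from statistics import NormalDist


def aggregate_association_index(n1_row, n2_row, n1_col, alpha=0.05):
    """AAI (in percent) computed from the margins of a 2x2 table."""
    n = n1_row + n2_row
    p1_row, p2_row = n1_row / n, n2_row / n
    p1_col = n1_col / n
    p2_col = 1.0 - p1_col

    # X^2(P1) = scale * (P1 - p1_col)^2, a parabola with its minimum at p1_col.
    scale = n * p1_row * p2_row / (p1_col * p2_col) / p2_row ** 2

    def integral(a, b):
        """Closed-form integral of X^2(P1) over [a, b]."""
        return scale / 3.0 * ((b - p1_col) ** 3 - (a - p1_col) ** 3)

    # Duncan & Davis bounds on P1.
    l1 = max(0.0, (n1_col - n2_row) / n1_row)
    u1 = min(n1_col / n1_row, 1.0)

    # Confidence bounds: the values of P1 where X^2(P1) equals the critical value.
    chi2_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    half_width = (chi2_alpha / scale) ** 0.5
    l_alpha = max(0.0, p1_col - half_width)
    u_alpha = min(1.0, p1_col + half_width)

    below = chi2_alpha * ((l_alpha - l1) + (u1 - u_alpha)) + integral(l_alpha, u_alpha)
    return 100.0 * (1.0 - below / integral(l1, u1))


# Margins implied by the twin-data example: rows 13 and 17, first column 12.
print(round(aggregate_association_index(13, 17, 12, alpha=0.05), 2))
```

With the twin-data margins this gives approximately 61.83, the value of A0.05 reported in the example.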

slide-6
SLIDE 6

Example – Fisher’s Twin Data

Fisher's data studies 30 criminal twins and classifies them according to whether they are a monozygotic twin or a dizygotic twin. The table also classifies whether their same sex twin has been convicted of a criminal offence.

Pearson chi-squared statistic is 13.032.

• p-value = 0.0003 → there is evidence of a strong association between the two variables.
• The product moment correlation = 0.6591 → positive association
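These three quantities can be reproduced with a short Python sketch. The cell counts used here (10 and 3 for monozygotic twins, 2 and 15 for dizygotic twins) are an assumption: they are the commonly cited counts for Fisher's twin data and match the margins used later in the talk, but the table itself does not appear in this text. The function name is hypothetical, and the p-value uses the fact that a chi-squared variable with 1 degree of freedom is a squared standard normal.

```python
import math


def chi_squared_2x2(n11, n12, n21, n22):
    """Pearson chi-squared, its 1-df p-value, and the product moment correlation."""
    n = n11 + n12 + n21 + n22
    row1, row2 = n11 + n12, n21 + n22
    col1, col2 = n11 + n21, n12 + n22
    determinant = n11 * n22 - n12 * n21
    margin_product = row1 * row2 * col1 * col2

    statistic = n * determinant ** 2 / margin_product
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    correlation = determinant / math.sqrt(margin_product)
    return statistic, p_value, correlation


# Assumed cell counts for Fisher's twin data (see the note above).
statistic, p_value, correlation = chi_squared_2x2(10, 3, 2, 15)
print(round(statistic, 3), round(p_value, 4), round(correlation, 4))
```

Under the assumed counts, the statistic, p-value and correlation agree with the reported 13.032, 0.0003 and 0.6591 to the precision shown.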

slide-7
SLIDE 7

Example – Fisher’s Twin Data

But, as Fisher (1935) did, suppose we “blot out” the cells of the table.

Question: What information do the margins provide in understanding the extent to which the variables are associated?

We shall calculate the aggregate association index.

slide-8
SLIDE 8

Example – Fisher’s Twin Data

$$X^2(P_1) = 30 \left( \frac{30 P_1 - 12}{17} \right)^2 \frac{221}{216}, \qquad \text{where } 0 \le P_1 \le 0.9231$$

A0.05 = 61.83

If we consider the 5% level of significance, the margins provide strong evidence that there may exist a significant association between twin type and conviction status.

slide-9
SLIDE 9

Direction of the Association

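$$A_\alpha = A_\alpha^{+} + A_\alpha^{-}$$

[Figure: the regions corresponding to $A_\alpha^{-}$ and $A_\alpha^{+}$ marked on the plot of X²(P1).]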

slide-10
SLIDE 10

Fisher’s Twin Data ( . . . revisited)

$$A_{0.05} = 61.83, \qquad A_{0.05}^{+} = 46.43, \qquad A_{0.05}^{-} = 15.40$$

Therefore, based solely on the marginal information, we can determine that the variables are three times more likely to be positively associated than negatively associated.
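The split into a positive and a negative part can be illustrated with a Python sketch. The slides do not spell out how the two parts are defined, so the definition used here is an assumption: the negative part is the area above the critical value for P1 between L1 and Lα, the positive part is the area above it for P1 between Uα and U1, and each is expressed as a percentage of the total area under X²(P1). The function name is hypothetical.

```python
from statistics import NormalDist


def directional_aai(n1_row, n2_row, n1_col, alpha=0.05):
    """Return (A_minus, A_plus) in percent, under the assumed definition above."""
    n = n1_row + n2_row
    p1_row, p2_row = n1_row / n, n2_row / n
    p1_col = n1_col / n
    p2_col = 1.0 - p1_col

    # X^2(P1) = scale * (P1 - p1_col)^2.
    scale = n * p1_row * p2_row / (p1_col * p2_col) / p2_row ** 2

    def integral(a, b):
        """Closed-form integral of X^2(P1) over [a, b]."""
        return scale / 3.0 * ((b - p1_col) ** 3 - (a - p1_col) ** 3)

    l1 = max(0.0, (n1_col - n2_row) / n1_row)
    u1 = min(n1_col / n1_row, 1.0)

    chi2_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    half_width = (chi2_alpha / scale) ** 0.5
    l_alpha = max(0.0, p1_col - half_width)
    u_alpha = min(1.0, p1_col + half_width)

    total = integral(l1, u1)
    # Area between the curve and the critical value on each side of p1_col.
    a_minus = 100.0 * (integral(l1, l_alpha) - chi2_alpha * (l_alpha - l1)) / total
    a_plus = 100.0 * (integral(u_alpha, u1) - chi2_alpha * (u1 - u_alpha)) / total
    return a_minus, a_plus


# Margins of the twin data: rows 13 and 17, first column 12.
a_minus, a_plus = directional_aai(13, 17, 12, alpha=0.05)
print(round(a_minus, 2), round(a_plus, 2), round(a_plus / a_minus, 2))
```

Under this definition the twin-data margins give roughly 15.40 and 46.43, which sum to the overall index of about 61.83 and have a ratio of about three.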

slide-11
SLIDE 11

Discussion

• The index provides an indication of the extent to which two dichotomous variables are statistically significantly associated, given only the marginal information.
• The index is not meant to infer the individual-level correlation of the variables, but to provide a measure reflecting how likely it is that the two variables are associated.

Further Issues:

• Investigate the applicability of the index for G (>1) 2x2 tables, including incorporating covariate information (ecological inference)
• Has links with the correspondence analysis of aggregate data
• Link with Fisher’s exact test