Visualizing categorical data & inference



slide-1
SLIDE 1

Visualizing categorical data & inference

Applied Multivariate Statistics – Spring 2012

slide-2
SLIDE 2

Goals

  • Chi-Square test of independence
  • R: mosaic plot, cotabplot (with shading)

Appl. Multivariate Statistics - Spring 2012
slide-3
SLIDE 3

Start simple: Two binary variables

  • Education and Marriage (Kiser and Schaefer, 1949)
  • Two questions:
      • How to visualize the data (especially with more than two variables)?
      • Are the variables dependent? If so, why?


Education     Married Once   Married More Than Once   Total
College            550                 61               611
No College         681                144               825
Total             1231                205              1436

slide-4
SLIDE 4

Visualizing categorical data: Mosaic Plot


Education     Married Once   Married More Than Once   Total
College            550                 61               611
No College         681                144               825
Total             1231                205              1436

Area proportional to table entry
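A minimal R sketch of the plot on this slide, assuming the `vcd` package (named on the last slide) is installed. The object name `marriage` and the dimension labels are illustrative choices, not taken from the slides.

```r
# Education/marriage counts from the table above (Kiser and Schaefer, 1949)
marriage <- as.table(matrix(
  c(550,  61,
    681, 144),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Education = c("College", "No College"),
    Married   = c("Once", "More Than Once")
  )
))

# The margins should reproduce the "Total" row and column of the table
addmargins(marriage)

# Mosaic plot: the area of each tile is proportional to its cell count
library(vcd)
mosaic(marriage)
```

The first dimension (`Education`) is split first, so the tile widths or heights reflect the share of each education group before the split by marriage history.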

slide-5
SLIDE 5

Chi-Square Test of Independence

         A=1    A=2    Total
B=1      n11    n12    n1*
B=2      n21    n22    n2*
Total    n*1    n*2    n


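Observed values: $O_{ij} = n_{ij}$ (row $i$ is the level of B, column $j$ is the level of A)

H0: A and B are independent; therefore

$$P(A = j \cap B = i) = P(A = j) \cdot P(B = i) \approx \hat{P}(A = j) \cdot \hat{P}(B = i) = \frac{n_{*j}}{n} \cdot \frac{n_{i*}}{n} = \hat{\pi}_{ij}$$

Expected values in cells if H0 is true:

$$E_{ij} = n \cdot \hat{\pi}_{ij} = \frac{n_{i*} \, n_{*j}}{n}$$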
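The expected counts under H0 can be checked numerically on the education/marriage table from Slide 3. This is a sketch in base R; the names `observed`, `p_row`, `p_col`, `pi_hat` and `expected` are illustrative.

```r
# Observed counts O_ij (rows: education, columns: marriage history)
observed <- matrix(
  c(550,  61,
    681, 144),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Education = c("College", "No College"),
    Married   = c("Once", "More Than Once")
  )
)

n <- sum(observed)

# Estimated marginal probabilities from the row and column totals
p_row <- rowSums(observed) / n
p_col <- colSums(observed) / n

# Under independence, each cell probability is the product of its margins
pi_hat <- outer(p_row, p_col)

# Expected counts E_ij = n * pi_hat_ij
expected <- n * pi_hat
round(expected, 1)
```

The expected counts keep the same row and column totals as the observed table; only the allocation within the table changes.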

slide-6
SLIDE 6

Chi-Square Test of Independence

         A=1    A=2    Total
B=1      n11    n12    n1*
B=2      n21    n22    n2*
Total    n*1    n*2    n


How different are the observed and expected values? Most popular measure: the Pearson chi-square statistic

$$X^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Each summand is the contribution of one cell to the misfit.
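As a rough illustration, the per-cell contributions and their sum can be computed for the education/marriage table from Slide 3. The sketch uses base R only; `contrib`, `X2` and `X2_builtin` are illustrative names. Note that `chisq.test` applies a continuity correction to 2x2 tables by default, so it is switched off here to match the formula.

```r
observed <- matrix(
  c(550,  61,
    681, 144),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Education = c("College", "No College"),
    Married   = c("Once", "More Than Once")
  )
)

# Expected counts under independence: (row total * column total) / n
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Contribution of each cell to the misfit
contrib <- (observed - expected)^2 / expected
round(contrib, 2)

# Pearson chi-square statistic: sum of all cell contributions
X2 <- sum(contrib)

# Cross-check against base R, without the Yates continuity correction
X2_builtin <- unname(chisq.test(observed, correct = FALSE)$statistic)
```

For this table the statistic comes out at roughly 16, with the largest contribution from the "College / More Than Once" cell.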

If H0 is true, $X^2$ approximately follows a chi-square distribution with $(I-1)(J-1)$ degrees of freedom (provided n is large and there are no empty cells), so p-values can be computed.
Alternative: permutation test. It is more computationally intensive but does not rely on the large-sample approximation.
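Both routes to a p-value can be sketched in base R on the education/marriage table from Slide 3. The permutation scheme below (expand the table to one record per person, shuffle one variable, recompute the statistic) is one common way to do it and is an assumption here, as are the seed, the 2000 permutations and all variable names.

```r
observed <- matrix(
  c(550,  61,
    681, 144),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Education = c("College", "No College"),
    Married   = c("Once", "More Than Once")
  )
)

# Observed Pearson statistic (no continuity correction)
X2 <- unname(chisq.test(observed, correct = FALSE)$statistic)

# Asymptotic p-value from the chi-square distribution
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_asymptotic <- pchisq(X2, df = df, lower.tail = FALSE)

# Permutation test: one record per person, then shuffle one variable
long <- as.data.frame(as.table(observed))
edu  <- rep(long$Education, long$Freq)
mar  <- rep(long$Married, long$Freq)

set.seed(1)
perm_stat <- replicate(2000, {
  chisq.test(table(edu, sample(mar)), correct = FALSE)$statistic
})

# Share of permuted statistics at least as large as the observed one
# (the +1 terms count the observed table itself)
p_perm <- (1 + sum(perm_stat >= X2)) / (1 + length(perm_stat))

c(asymptotic = p_asymptotic, permutation = p_perm)
```

The permutation p-value can never be smaller than 1 / (number of permutations + 1), so very small p-values need more permutations to resolve.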

slide-7
SLIDE 7

Mosaic plot with shading


p-value of independence test: highly significant

Shading marks:
  • Surprisingly small observed cell count
  • Surprisingly large observed cell count
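A shaded mosaic plot of the education/marriage table from Slide 3 can be sketched as follows, assuming the `vcd` package is installed. The shading is driven by the Pearson residuals, which are printed first; `marriage` and `res` are illustrative names. To my understanding, vcd's default shading colours cells with large positive residuals blue and large negative residuals red, and prints the p-value of the independence test under the legend.

```r
library(vcd)

marriage <- as.table(matrix(
  c(550,  61,
    681, 144),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Education = c("College", "No College"),
    Married   = c("Once", "More Than Once")
  )
))

# Pearson residuals (O - E) / sqrt(E): negative means fewer observations
# than expected under independence, positive means more
res <- chisq.test(marriage, correct = FALSE)$residuals
round(res, 2)

# Mosaic plot with residual-based shading
mosaic(marriage, shade = TRUE)
```

Here the two "More Than Once" cells carry the residuals of largest magnitude: fewer college graduates than expected, and more people without college education than expected.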

slide-8
SLIDE 8

Conditional plots: Mosaic plot per group

slide-9
SLIDE 9

Case study: UC Berkeley admissions
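A possible way to reproduce this case study is with the `UCBAdmissions` table that ships with R (Admit x Gender x Dept) and the `vcd` package; whether the slides used exactly these calls is an assumption. The first plot ignores the department, the second draws one shaded mosaic per department.

```r
library(vcd)

# Three-way contingency table from the datasets package
data("UCBAdmissions")
dim(UCBAdmissions)   # Admit x Gender x Dept

# Marginal view: collapse over departments (Gender x Admit)
mosaic(margin.table(UCBAdmissions, c(2, 1)), shade = TRUE)

# Conditional view: one mosaic plot per department, with shading
cotabplot(~ Admit + Gender | Dept, data = UCBAdmissions, shade = TRUE)
```

Comparing the two plots shows whether an association between gender and admission in the collapsed table is still visible within the individual departments.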

slide-10
SLIDE 10

Concepts to know

  • Chi-Square test of independence

slide-11
SLIDE 11

R commands to know

  • mosaic (with shading)
  • cotabplot (with shading)

(both in package “vcd”)
