
Categorical data: Reasoning by diagrams (R.W. Oldford)



  1. Categorical data Reasoning by diagrams R.W. Oldford

  2. Crossed data - tables

     The main data structure for crossed categorical data is a table. Each variate has a finite number of values (categories):

         city <- c("Kitchener", "Waterloo")
         housing <- c("House", "Apartment", "Residence")

     All combinations of one value from each variate are possible (the variates are crossed), and we record the number of times each combination occurs:

         # fake data (random, so the counts differ from run to run)
         counts <- rpois(6, lambda = 50)

     Arranged in a rectangular array:

         vacancy <- matrix(counts, nrow = length(city), ncol = length(housing),
                           byrow = TRUE,
                           dimnames = list(city = city, housing = housing))

     And now coerced to be an object of class table:

         vacancy <- as.table(vacancy)
         vacancy
         ##            housing
         ## city        House Apartment Residence
         ##   Kitchener    52        53        46
         ##   Waterloo     47        64        43
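A minimal sketch of the same construction, assuming an arbitrary seed (`set.seed(2024)`, not part of the slides) so that the fake counts are reproducible. It checks that `byrow = TRUE` places the first three counts in the Kitchener row and that the result is still an ordinary array, now carrying the class `"table"`.

```r
# Arbitrary seed (an assumption, not from the slides) so the fake counts repeat
set.seed(2024)

city    <- c("Kitchener", "Waterloo")
housing <- c("House", "Apartment", "Residence")
counts  <- rpois(6, lambda = 50)   # fake data, as on the slide

vacancy <- as.table(matrix(counts,
                           nrow = length(city), ncol = length(housing),
                           byrow = TRUE,
                           dimnames = list(city = city, housing = housing)))

# byrow = TRUE: counts[1:3] fill the Kitchener row, counts[4:6] the Waterloo row
print(vacancy["Kitchener", ])
print(counts[1:3])

# A table is an ordinary array with the class "table" attached
print(class(vacancy))
print(is.array(vacancy))
```

Without `byrow = TRUE`, `matrix()` would fill column by column, and the same six counts would land in different cells.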

  3. Crossed data - tables

     The table can be a many-way array formed by crossing many categorical variates:

         term <- c("Fall", "Winter", "Spring")
         # more fake counts
         counts <- seq(from = 10, to = 180, by = 10)
         vacancy <- array(counts,
                          dim = c(length(city), length(housing), length(term)),
                          dimnames = list(city = city, housing = housing, term = term))
         vacancy <- as.table(vacancy)
         vacancy
         ## , , term = Fall
         ##
         ##            housing
         ## city        House Apartment Residence
         ##   Kitchener    10        30        50
         ##   Waterloo     20        40        60
         ##
         ## , , term = Winter
         ##
         ##            housing
         ## city        House Apartment Residence
         ##   Kitchener    70        90       110
         ##   Waterloo     80       100       120
         ##
         ## , , term = Spring
         ##
         ##            housing
         ## city        House Apartment Residence
         ##   Kitchener   130       150       170
         ##   Waterloo    140       160       180

     Note that when filling the array, the earlier indices change more quickly than the later ones.
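The fill-order rule can be made concrete with a small position formula. The sketch below assumes a hypothetical helper `cell_position()` (not an R function) that computes where cell (i, j, k) sits among the counts when the first index varies fastest, and checks it against every cell of the 2 x 3 x 3 table from the slide.

```r
city    <- c("Kitchener", "Waterloo")
housing <- c("House", "Apartment", "Residence")
term    <- c("Fall", "Winter", "Spring")
counts  <- seq(from = 10, to = 180, by = 10)

vacancy <- as.table(array(counts,
                          dim = c(length(city), length(housing), length(term)),
                          dimnames = list(city = city, housing = housing, term = term)))

# Hypothetical helper: linear position of cell (i, j, k) in an array with
# dimensions d, filled column-major (first index fastest, last index slowest)
cell_position <- function(i, j, k, d) {
  i + d[1] * (j - 1) + d[1] * d[2] * (k - 1)
}

d <- dim(vacancy)
print(vacancy["Waterloo", "Residence", "Fall"])   # cell (2, 3, 1)
print(counts[cell_position(2, 3, 1, d)])          # position 6, the same value: 60

# Check every cell against the formula
ok <- TRUE
for (k in seq_len(d[3])) for (j in seq_len(d[2])) for (i in seq_len(d[1])) {
  ok <- ok && vacancy[i, j, k] == counts[cell_position(i, j, k, d)]
}
print(ok)
```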

  4. Crossed data - tables

     The order of dimensions can be rearranged with the R function aperm(...):

         aperm(vacancy, perm = c(3, 2, 1))
         ## , , city = Kitchener
         ##
         ##         housing
         ## term     House Apartment Residence
         ##   Fall      10        30        50
         ##   Winter    70        90       110
         ##   Spring   130       150       170
         ##
         ## , , city = Waterloo
         ##
         ##         housing
         ## term     House Apartment Residence
         ##   Fall      20        40        60
         ##   Winter    80       100       120
         ##   Spring   140       160       180
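A short sketch of what `aperm()` does to the indexing: cell (i, j, k) of the original becomes cell (k, j, i) of the permuted table. It also assumes, per the `aperm` documentation, that when the dimnames are named, `perm` may be given as dimension names instead of numbers.

```r
city    <- c("Kitchener", "Waterloo")
housing <- c("House", "Apartment", "Residence")
term    <- c("Fall", "Winter", "Spring")
vacancy <- as.table(array(seq(from = 10, to = 180, by = 10),
                          dim = c(2, 3, 3),
                          dimnames = list(city = city, housing = housing, term = term)))

flipped <- aperm(vacancy, perm = c(3, 2, 1))
print(names(dimnames(flipped)))   # "term" "housing" "city"

# Cell (i, j, k) of the original is cell (k, j, i) of the permuted table
print(flipped["Winter", "Apartment", "Kitchener"])   # 90
print(vacancy["Kitchener", "Apartment", "Winter"])   # 90

# With named dimnames, perm may also be given by name
by_name <- aperm(vacancy, perm = c("term", "housing", "city"))
print(all(by_name == flipped))

# This permutation is its own inverse, so applying it again restores the original
print(all(aperm(flipped, perm = c(3, 2, 1)) == vacancy))
```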

  5. Crossed data - constructing tables from data

     Suppose we have an existing data frame with categorical variates:

         SAheart[1:3, ]
         ##   sbp tobacco  ldl adiposity famhist typea obesity alcohol age chd
         ## 1 160   12.00 5.73     23.11 Present    49   25.30   97.20  52   1
         ## 2 144    0.01 4.41     28.61  Absent    55   28.87    2.06  63   1
         ## 3 118    0.08 3.48     32.28 Present    52   29.14    3.81  46   0

     Create the table directly from individual factors (like famhist) or from the unique values of a variate (like chd):

         table(SAheart$chd, SAheart$famhist, dnn = c("chd", "famhist"))
         ##    famhist
         ## chd Absent Present
         ##   0    206      96
         ##   1     64      96

     Or, by cross-tabulation ("cross tabs", or xtabs):

         xtabs(~ chd + famhist, data = SAheart)   # Note the formula
         ##    famhist
         ## chd Absent Present
         ##   0    206      96
         ##   1     64      96
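The slide does not say which package supplies `SAheart`, so this sketch substitutes a small hypothetical data frame `df` (six made-up individuals) to show that `table()` and `xtabs()` agree. It also shows the reverse trip: `as.data.frame()` turns a table into one row per cell with a `Freq` count column, and `xtabs()` with that column on the left-hand side of the formula rebuilds the table.

```r
# Hypothetical stand-in for SAheart: six made-up individuals
df <- data.frame(
  chd     = c(1, 0, 0, 1, 0, 0),
  famhist = factor(c("Present", "Absent", "Present", "Absent", "Absent", "Absent"),
                   levels = c("Absent", "Present"))
)

t1 <- table(df$chd, df$famhist, dnn = c("chd", "famhist"))
t2 <- xtabs(~ chd + famhist, data = df)
print(t1)
print(all(t1 == t2))   # both routes give the same counts

# The reverse direction: one row per cell, with the count in a Freq column
freq <- as.data.frame(t2)
print(freq)

# With a left-hand side, xtabs sums that count column back into a table
t3 <- xtabs(Freq ~ chd + famhist, data = freq)
print(all(t3 == t2))
```

Declaring the factor levels explicitly (as for `famhist` here) keeps a category in the table even if it happens to have a zero count.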

  6. Crossed data - working with tables

     Consider the three-way table HairEyeColor (a 4 x 4 x 2 array):

         ## , , Sex = Male
         ##
         ##        Eye
         ## Hair    Brown Blue Hazel Green
         ##   Black    32   11    10     3
         ##   Brown    53   50    25    15
         ##   Red      10   10     7     7
         ##   Blond     3   30     5     8
         ##
         ## , , Sex = Female
         ##
         ##        Eye
         ## Hair    Brown Blue Hazel Green
         ##   Black    36    9     5     2
         ##   Brown    66   34    29    14
         ##   Red      16    7     7     7
         ##   Blond     4   64     5     8

     The names of its variates (its dimnames), in order, are:

         names(dimnames(HairEyeColor))
         ## [1] "Hair" "Eye"  "Sex"

     These names are used to create interesting sub-tables or alternative tables.
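A small sketch of why the dimension names are useful: they identify each variate, and functions such as `apply()` accept a dimension name wherever a dimension number is expected. `HairEyeColor` ships with R in the datasets package; the variable names below are illustrative only.

```r
# HairEyeColor ships with R (package datasets)
print(dim(HairEyeColor))                # 4 4 2
print(names(dimnames(HairEyeColor)))    # "Hair" "Eye" "Sex"
print(dimnames(HairEyeColor)$Sex)       # "Male" "Female"

# Because the dimensions are named, apply() accepts a name in place of a number
by_name   <- apply(HairEyeColor, "Sex", sum)
by_number <- apply(HairEyeColor, 3, sum)
print(by_name)                          # Male 279, Female 313
print(all(by_name == by_number))
```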

  7. Crossed data - working with tables

     Selecting slices (conditioning):

         HairEyeColor["Black", , ]
         ##        Sex
         ## Eye     Male Female
         ##   Brown   32     36
         ##   Blue    11      9
         ##   Hazel   10      5
         ##   Green    3      2

         HairEyeColor[, "Green", ]
         ##        Sex
         ## Hair    Male Female
         ##   Black    3      2
         ##   Brown   15     14
         ##   Red      7      7
         ##   Blond    8      8

         HairEyeColor["Black", "Blue", ]
         ##   Male Female
         ##     11      9

         HairEyeColor["Black", "Green", "Male"]
         ## [1] 3
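Slicing drops any dimension that is reduced to a single value. A minimal sketch of keeping it with `drop = FALSE`, and of conditioning on several values of one variate at once:

```r
s1 <- HairEyeColor["Black", , ]                  # Hair dropped: Eye x Sex
s2 <- HairEyeColor["Black", , , drop = FALSE]    # Hair kept, with a single level
print(dim(s1))   # 4 2
print(dim(s2))   # 1 4 2

# Several values of one variate can be selected at once
s3 <- HairEyeColor[c("Black", "Red"), "Green", ]
print(s3)
```

Keeping the dimension with `drop = FALSE` means the slice still records which value of Hair it was conditioned on.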

  8. Crossed data - working with tables

     Collapsing dimensions (marginalizing, projecting):

         # Zero dimensional
         margin.table(HairEyeColor)
         ## [1] 592

         # 1 dimensional -- here margin 1 ("Hair") is preserved
         margin.table(HairEyeColor, margin = 1)
         ## Hair
         ## Black Brown   Red Blond
         ##   108   286    71   127

         # 2 dimensional -- here margins 1 and 2 ("Hair", "Eye") are preserved
         margin.table(HairEyeColor, margin = c(1, 2))
         ##        Eye
         ## Hair    Brown Blue Hazel Green
         ##   Black    68   20    15     5
         ##   Brown   119   84    54    29
         ##   Red      26   17    14    14
         ##   Blond     7   94    10    16

         # Note: except for the 0-dimensional case, these are the same as
         # using "apply" with "sum"
         apply(HairEyeColor, MARGIN = 1, FUN = sum)
         ## Black Brown   Red Blond
         ##   108   286    71   127
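A quick check of the claims on this slide: `margin.table()` and `apply(..., sum)` give the same sums, but only `margin.table()` keeps the `"table"` class. Collapsing over Sex is also the same as adding the two Sex slices cell by cell.

```r
m1 <- margin.table(HairEyeColor, margin = 1)
a1 <- apply(HairEyeColor, MARGIN = 1, FUN = sum)
print(all(m1 == a1))           # same sums
print(inherits(m1, "table"))   # margin.table keeps the table class
print(inherits(a1, "table"))   # apply returns a plain named vector

# Collapsing over Sex is the same as adding the two Sex slices cell by cell
m12 <- margin.table(HairEyeColor, margin = c(1, 2))
print(all(m12 == HairEyeColor[, , "Male"] + HairEyeColor[, , "Female"]))
```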

  9. Crossed data - working with tables

     Summing along every margin (each variate gains a new value, Sum):

         # Every margin is summed
         addmargins(HairEyeColor)
         ## , , Sex = Male
         ##
         ##        Eye
         ## Hair    Brown Blue Hazel Green Sum
         ##   Black    32   11    10     3  56
         ##   Brown    53   50    25    15 143
         ##   Red      10   10     7     7  34
         ##   Blond     3   30     5     8  46
         ##   Sum      98  101    47    33 279
         ##
         ## , , Sex = Female
         ##
         ##        Eye
         ## Hair    Brown Blue Hazel Green Sum
         ##   Black    36    9     5     2  52
         ##   Brown    66   34    29    14 143
         ##   Red      16    7     7     7  37
         ##   Blond     4   64     5     8  81
         ##   Sum     122  114    46    31 313
         ##
         ## , , Sex = Sum
         ##
         ##        Eye
         ## Hair    Brown Blue Hazel Green Sum
         ##   Black    68   20    15     5 108
         ##   Brown   119   84    54    29 286
         ##   Red      26   17    14    14  71
         ##   Blond     7   94    10    16 127
         ##   Sum     220  215    93    64 592
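A minimal sketch confirming the shape of the result: every variate gains one extra level named `"Sum"`, so the 4 x 4 x 2 table becomes 5 x 5 x 3, and each Sum cell is an ordinary marginal total.

```r
full <- addmargins(HairEyeColor)
print(dim(full))                  # 5 5 3: one extra "Sum" level per variate
print(dimnames(full)$Sex)         # "Male" "Female" "Sum"
print(full["Sum", "Sum", "Sum"])  # grand total, 592

# Each Sum cell is a marginal total, e.g. brown-eyed males over all hair colours
print(full["Sum", "Brown", "Male"] == sum(HairEyeColor[, "Brown", "Male"]))
```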

  10. Crossed data - working with tables

      Summing along a single margin:

          # Produce marginal sums over the values of dimension 2 ("Eye")
          # for each pair (i, k) of the remaining variates "Hair" and "Sex"
          addmargins(HairEyeColor, margin = 2)
          ## , , Sex = Male
          ##
          ##        Eye
          ## Hair    Brown Blue Hazel Green Sum
          ##   Black    32   11    10     3  56
          ##   Brown    53   50    25    15 143
          ##   Red      10   10     7     7  34
          ##   Blond     3   30     5     8  46
          ##
          ## , , Sex = Female
          ##
          ##        Eye
          ## Hair    Brown Blue Hazel Green Sum
          ##   Black    36    9     5     2  52
          ##   Brown    66   34    29    14 143
          ##   Red      16    7     7     7  37
          ##   Blond     4   64     5     8  81
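A short check that only the Eye dimension gains a `"Sum"` level here, and that the new column holds, for each (Hair, Sex) pair, the total over Eye, which is the same as `apply()` over dimensions 1 and 3:

```r
eye_sums <- addmargins(HairEyeColor, margin = 2)
print(dim(eye_sums))   # 4 5 2: only Eye gains a "Sum" level

# The new column holds, for each (Hair, Sex) pair, the total over Eye
print(all(eye_sums[, "Sum", ] == apply(HairEyeColor, c(1, 3), sum)))
```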

  11. Crossed data - working with tables

      Summing along two margins:

          # Produce marginal sums over both dimensions 1 and 2 ("Hair" and "Eye")
          # for each value of "Sex"
          addmargins(HairEyeColor, margin = c(1, 2))
          ## , , Sex = Male
          ##
          ##        Eye
          ## Hair    Brown Blue Hazel Green Sum
          ##   Black    32   11    10     3  56
          ##   Brown    53   50    25    15 143
          ##   Red      10   10     7     7  34
          ##   Blond     3   30     5     8  46
          ##   Sum      98  101    47    33 279
          ##
          ## , , Sex = Female
          ##
          ##        Eye
          ## Hair    Brown Blue Hazel Green Sum
          ##   Black    36    9     5     2  52
          ##   Brown    66   34    29    14 143
          ##   Red      16    7     7     7  37
          ##   Blond     4   64     5     8  81
          ##   Sum     122  114    46    31 313
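A minimal sketch showing that with `margin = c(1, 2)` Hair and Eye gain a `"Sum"` level but Sex does not, and that the Sum/Sum corner of each Sex slice is that slice's total:

```r
both <- addmargins(HairEyeColor, margin = c(1, 2))
print(dim(both))   # 5 5 2: Hair and Eye gain "Sum", Sex does not

# The Sum/Sum corner of each Sex slice is that slice's total
print(both["Sum", "Sum", ])                                # Male 279, Female 313
print(all(both["Sum", "Sum", ] == apply(HairEyeColor, 3, sum)))
```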


  13. Crossed data - working with tables

      Proportions (depends on which margin is fixed):

          # No margins fixed, just total ... single multinomial
          round(prop.table(HairEyeColor), 3)
          ## , , Sex = Male
          ##
          ##        Eye
          ## Hair    Brown  Blue Hazel Green
          ##   Black 0.054 0.019 0.017 0.005
          ##   Brown 0.090 0.084 0.042 0.025
          ##   Red   0.017 0.017 0.012 0.012
          ##   Blond 0.005 0.051 0.008 0.014
          ##
          ## , , Sex = Female
          ##
          ##        Eye
          ## Hair    Brown  Blue Hazel Green
          ##   Black 0.061 0.015 0.008 0.003
          ##   Brown 0.111 0.057 0.049 0.024
          ##   Red   0.027 0.012 0.012 0.012
          ##   Blond 0.007 0.108 0.008 0.014

      Possible generative model: multinomial. Here the counts $n_{ijk}$ have fixed total $n = n_{+++} = \sum_{ijk} n_{ijk} = 592$, and

      $$\Pr(\text{Data}) = \binom{n}{n_{111}\; n_{211}\; \cdots\; n_{442}}\, p_{111}^{n_{111}} \cdots p_{442}^{n_{442}}$$

      with $p_{+++} = \sum_{i=1}^{4} \sum_{j=1}^{4} \sum_{k=1}^{2} p_{ijk} = 1$.
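A sketch of the single-multinomial model in R, assuming the cell probabilities are estimated by the observed proportions (`prop.table()`) and using an arbitrary seed for the simulation. It evaluates $\log \Pr(\text{Data})$ with `dmultinom()`, checks it against the formula term by term, and draws one new table whose total is always $n = 592$.

```r
set.seed(1)   # arbitrary seed, assumed only to make the simulation repeatable

x <- as.vector(HairEyeColor)                   # the 32 counts n_ijk, in storage order
n <- sum(x)                                    # fixed total n_+++ = 592
p_hat <- as.vector(prop.table(HairEyeColor))   # estimated p_ijk (they sum to 1)

# Log of Pr(Data) under the multinomial, evaluated at p_hat
ll <- dmultinom(x, prob = p_hat, log = TRUE)

# The same value from the formula: log n! - sum log n_ijk! + sum n_ijk log p_ijk
ll_formula <- lgamma(n + 1) - sum(lgamma(x + 1)) + sum(x * log(p_hat))
print(c(ll, ll_formula))

# Simulating a new table from this model: the total is always n
sim <- array(rmultinom(1, size = n, prob = p_hat),
             dim = dim(HairEyeColor), dimnames = dimnames(HairEyeColor))
print(sum(sim))   # 592
```

Working on the log scale avoids underflow: with 592 observations the probability itself is far too small to represent comfortably.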

  14. Crossed data - working with tables

      Proportions (depends on which margin is fixed):

          # One margin (the third here, i.e. Sex) is fixed ... as many
          # multinomials as there are values of Sex
          round(prop.table(HairEyeColor, margin = 3), 2)
          ## , , Sex = Male
          ##
          ##        Eye
          ## Hair    Brown Blue Hazel Green
          ##   Black  0.11 0.04  0.04  0.01
          ##   Brown  0.19 0.18  0.09  0.05
          ##   Red    0.04 0.04  0.03  0.03
          ##   Blond  0.01 0.11  0.02  0.03
          ##
          ## , , Sex = Female
          ##
          ##        Eye
          ## Hair    Brown Blue Hazel Green
          ##   Black  0.12 0.03  0.02  0.01
          ##   Brown  0.21 0.11  0.09  0.04
          ##   Red    0.05 0.02  0.02  0.02
          ##   Blond  0.01 0.20  0.02  0.03

      Possible generative model:
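With the Sex margin fixed, each Sex slice is its own multinomial with its own fixed total. A sketch of that product-multinomial setup, assuming the within-Sex proportions as cell probabilities and an arbitrary seed for the simulation:

```r
set.seed(1)   # arbitrary seed, assumed only to make the simulation repeatable

p_cond <- prop.table(HairEyeColor, margin = 3)
print(apply(p_cond, 3, sum))     # 1 for each Sex: one multinomial per slice

sex_totals <- apply(HairEyeColor, 3, sum)   # Male 279, Female 313, held fixed

# Simulate each Sex slice from its own multinomial with its own fixed total
sim <- HairEyeColor * 0
for (s in names(sex_totals)) {
  sim[, , s] <- rmultinom(1, size = sex_totals[[s]], prob = as.vector(p_cond[, , s]))
}
print(apply(sim, 3, sum))        # reproduces 279 and 313
```

Both `as.vector(p_cond[, , s])` and the slice assignment use column-major order, so each simulated count lands in the cell whose probability generated it.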
