
Dependencies in Interval-valued Symbolic Data - PowerPoint PPT Presentation

Dependencies in Interval-valued Symbolic Data. Lynne Billard, University of Georgia (lynne@stat.uga.edu). Tribute to Professor Edwin Diday: Paris, France; 5 September 2007.


  1. Dependencies in Interval-valued Symbolic Data. Lynne Billard, University of Georgia (lynne@stat.uga.edu). Tribute to Professor Edwin Diday: Paris, France; 5 September 2007

  2. Naturally occurring Symbolic Data -- Mushrooms

  3. Patient Records – Single Hospital, Cardiology

     Patient     Hospital   Age  Smoker  …
     Patient 1   Fontaines  74   heavy
     Patient 2   Fontaines  78   light
     Patient 3   Beaune     69   no
     Patient 4   Beaune     73   heavy
     Patient 5   Beaune     80   light
     Patient 6   Fontaines  70   heavy
     Patient 7   Fontaines  82   heavy
     ⋮           ⋮          ⋮    ⋮

  4. Patient Records by Hospital -- aggregate the patient records of slide 3 over patients. Result: Symbolic Data

     Hospital   Age       Smoker
     Fontaines  [70, 82]  {light ¼, heavy ¾}
     Beaune     [69, 80]  {no, light, heavy}
     ⋮          ⋮         ⋮
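The aggregation on slide 4 can be sketched in Python. This is a minimal illustration, not the author's code. It assumes Age is summarized by its observed [min, max] range and Smoker by relative frequencies, which reproduces the Fontaines row. The slide lists Beaune's smoker categories without weights, and the sketch simply gives each of them 1/3. The helper name `aggregate_by_hospital` is hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Patient records from slide 3: (patient, hospital, age, smoker).
RECORDS = [
    ("Patient 1", "Fontaines", 74, "heavy"),
    ("Patient 2", "Fontaines", 78, "light"),
    ("Patient 3", "Beaune", 69, "no"),
    ("Patient 4", "Beaune", 73, "heavy"),
    ("Patient 5", "Beaune", 80, "light"),
    ("Patient 6", "Fontaines", 70, "heavy"),
    ("Patient 7", "Fontaines", 82, "heavy"),
]


def aggregate_by_hospital(records):
    """Turn classical patient records into one symbolic object per hospital.

    Assumption for this sketch: Age -> interval [min, max],
    Smoker -> relative frequency of each category (exact fractions).
    """
    groups = {}
    for _, hospital, age, smoker in records:
        groups.setdefault(hospital, []).append((age, smoker))
    result = {}
    for hospital, rows in groups.items():
        ages = [age for age, _ in rows]
        counts = Counter(smoker for _, smoker in rows)
        freqs = {s: Fraction(c, len(rows)) for s, c in counts.items()}
        result[hospital] = {"age": (min(ages), max(ages)), "smoker": freqs}
    return result


for hospital, obj in aggregate_by_hospital(RECORDS).items():
    print(hospital, obj["age"], dict(obj["smoker"]))
```

The point of the example is that each hospital, not each patient, becomes one observation, and its values are now an interval and a distribution rather than single points.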

  5. Histogram-valued Data -- Weight by Age Distribution:

  6. Logical dependency rule. E.g., $Y_1$ = age, $Y_2$ = # children.
     Classical: $Y_a = (10, 0)$, $Y_b = (20, 2)$, $Y_c = (18, 1)$.
     Aggregation → Symbolic: $\xi = [10, 20] \times \{0, 1, 2\}$.
     (Figure: ξ in the ($Y_1$, $Y_2$) plane, with $Y_1$ from 10 to 20 and $Y_2$ = 0, 1, 2.)
     I.e., ξ implies that the classical value $Y_d = (10, 2)$ is possible.
     Need rule ν: {If $Y_1 < 15$, then $Y_2 = 0$}.
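The slide's argument can be checked point by point: every classical observation lies inside ξ, but so does $Y_d = (10, 2)$, and only the rule ν excludes it. The sketch below assumes ξ is the product of the age interval [10, 20] and the set {0, 1, 2}; the function names `in_xi` and `satisfies_rule` are hypothetical.

```python
# Symbolic object xi = [10, 20] x {0, 1, 2} from slide 6 (age, # children).
XI_AGE = (10, 20)
XI_CHILDREN = {0, 1, 2}


def in_xi(y1, y2):
    """True if the classical point (y1, y2) lies inside the description xi."""
    return XI_AGE[0] <= y1 <= XI_AGE[1] and y2 in XI_CHILDREN


def satisfies_rule(y1, y2):
    """Logical dependency rule nu: if Y1 < 15, then Y2 = 0."""
    return not (y1 < 15) or y2 == 0


points = {"Ya": (10, 0), "Yb": (20, 2), "Yc": (18, 1), "Yd": (10, 2)}
for name, (y1, y2) in points.items():
    print(name, "in xi:", in_xi(y1, y2), "satisfies nu:", satisfies_rule(y1, y2))
```

All four points are in ξ, and only `Yd` fails ν. This is why the aggregated description needs the rule attached to it.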

  7. Interval-valued data: $Y_1$ = # At-Bats, $Y_2$ = # Hits

     Team u  Y1: # At-Bats  Y2: # Hits    Team u  Y1: # At-Bats  Y2: # Hits
     1       (289, 538)     (75, 162)     11      (212, 492)     (57, 151)
     2       (88, 422)      (49, 149)     12      (177, 245)     (189, 238)
     3       (189, 223)     (201, 254)    13      (342, 614)     (121, 206)
     4       (184, 476)     (46, 148)     14      (120, 439)     (35, 102)
     5       (283, 447)     (86, 115)     15      (80, 468)      (55, 115)
     6       (24, 26)       (133, 141)    16      (75, 110)      (75, 110)
     7       (168, 445)     (37, 135)     17      (116, 557)     (95, 163)
     8       (123, 148)     (137, 148)    18      (197, 507)     (52, 53)
     9       (256, 510)     (78, 124)     19      (167, 203)     (48, 232)
     10      (101, 126)     (101, 132)

     ξ(2): $Y_2 = 149$ is not possible when $Y_1 < 149$, since a team cannot have more hits than at-bats.
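The same check can be applied to every team. Here is a minimal sketch that assumes the rule is $Y_2 \le Y_1$ (hits never exceed at-bats). It sorts each team's rectangle into three groups: the whole rectangle violates the rule, some of it does, or none of it does. The helper name `classify` is hypothetical.

```python
# Slide 7 data: team u -> ((at-bats low, high), (hits low, high)).
TEAMS = {
    1: ((289, 538), (75, 162)), 2: ((88, 422), (49, 149)),
    3: ((189, 223), (201, 254)), 4: ((184, 476), (46, 148)),
    5: ((283, 447), (86, 115)), 6: ((24, 26), (133, 141)),
    7: ((168, 445), (37, 135)), 8: ((123, 148), (137, 148)),
    9: ((256, 510), (78, 124)), 10: ((101, 126), (101, 132)),
    11: ((212, 492), (57, 151)), 12: ((177, 245), (189, 238)),
    13: ((342, 614), (121, 206)), 14: ((120, 439), (35, 102)),
    15: ((80, 468), (55, 115)), 16: ((75, 110), (75, 110)),
    17: ((116, 557), (95, 163)), 18: ((197, 507), (52, 53)),
    19: ((167, 203), (48, 232)),
}


def classify(at_bats, hits):
    """Compare a team's rectangle [a, b] x [c, d] with the rule Y2 <= Y1."""
    (a, b), (c, d) = at_bats, hits
    if c > b:   # even the fewest hits exceed the most at-bats
        return "all points violate"
    if d > a:   # the corner (a, d) has more hits than at-bats
        return "some points violate"
    return "consistent"


for u, (at_bats, hits) in TEAMS.items():
    status = classify(at_bats, hits)
    if status != "consistent":
        print(u, status)
```

Only team 6 lies entirely in the impossible region. Several others, including team 2 (ξ(2)), only partly cross the line $Y_2 = Y_1$.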

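  8. Observation ξ(2)
     (Figure: the rectangle for team 2, $Y_1 \in (88, 422)$ and $Y_2 \in (49, 149)$, cut by the line $Y_2 = \alpha Y_1$ into regions R1, R2, R3, R4; axis marks at $Y_1$ = 88, 149, 422 and $Y_2$ = 49, 88, 149.)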
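To make "not possible" quantitative, one can compute how much of the ξ(2) rectangle lies above the cutting line. This is a sketch under two assumptions that the slide does not state: α = 1 (so the line is hits = at-bats), and values are spread uniformly over the rectangle, so that an area ratio is meaningful. The function name `violating_area` is hypothetical, and the sketch does not reproduce the slide's labelling of R1 to R4.

```python
def violating_area(a, b, c, d):
    """Area of {(y1, y2) in [a, b] x [c, d] : y2 > y1}, i.e. above Y2 = Y1."""
    area = 0.0
    # For y1 in [a, min(b, c)], the whole vertical strip [c, d] is above the line.
    left = min(b, c) - a
    if left > 0:
        area += left * (d - c)
    # For y1 in [max(a, c), min(b, d)], y2 runs over (y1, d]: integrate d - y1.
    lo, hi = max(a, c), min(b, d)
    if hi > lo:
        area += d * (hi - lo) - (hi ** 2 - lo ** 2) / 2
    # For y1 >= d nothing in the rectangle lies above the line.
    return area


a, b, c, d = 88, 422, 49, 149   # team 2: at-bats (88, 422), hits (49, 149)
bad = violating_area(a, b, c, d)
total = (b - a) * (d - c)
print(bad, total, bad / total)
```

For team 2 the impossible triangle has area 1860.5 out of 33400, about 5.6% of the rectangle.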

  9. Dependencies between Variables – Interval-valued Variables. E.g., Regression Analysis:
     Dependent variable: $Y = (Y_1, \ldots, Y_q)$, e.g., $q = 1$.
     Predictor/regression variable: $X = (X_1, \ldots, X_p)$.
     Multiple regression model: $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + e$.
     Error: $e$, with $E(e) = 0$, $\mathrm{Var}(e) = \sigma^2$, $\mathrm{Cov}(e_i, e_k) = 0$ for $i \neq k$.

  10. Multiple Regression Model: $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + e$. In vector terms, $Y = X\beta + e$, with
      Observation matrix: $Y' = (Y_1, \ldots, Y_n)$
      Design matrix:
      $$X = \begin{pmatrix} 1 & X_{11} & \cdots & X_{1p} \\ \vdots & \vdots & & \vdots \\ 1 & X_{n1} & \cdots & X_{np} \end{pmatrix}$$
      Regression coefficient matrix: $\beta' = (\beta_0, \beta_1, \ldots, \beta_p)$
      Error matrix: $e' = (e_1, \ldots, e_n)$

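  11. Model: $Y = X\beta + e$. The least squares estimator of $\beta$ is
      $$\hat\beta = (X'X)^{-1} X'Y.$$
      When $p = 1$,
      $$\hat\beta_1 = \frac{\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^n (X_i - \bar X)^2} = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}, \qquad \hat\beta_0 = \bar Y - \hat\beta_1 \bar X,$$
      where $\bar Y = \frac{1}{n} \sum_{i=1}^n Y_i$ and $\bar X = \frac{1}{n} \sum_{i=1}^n X_i$.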
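For $p = 1$ the two routes on this slide give the same estimate. The sketch below computes $\hat\beta$ once from the centered-sums formula and once from the normal equations $(X'X)^{-1}X'Y$, using the explicit 2×2 inverse. The data are hypothetical points placed exactly on $y = 2 + 3x$, and the function names are illustrative only.

```python
def simple_ls(x, y):
    """Least squares for p = 1 using the centered-sums formulas."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx            # = Cov(X, Y) / Var(X); the 1/n factors cancel
    b0 = ybar - b1 * xbar
    return b0, b1


def normal_equations_p1(x, y):
    """Same estimate from (X'X)^{-1} X'Y with design rows (1, x_i)."""
    n = len(x)
    sx, sxx = sum(x), sum(xi * xi for xi in x)
    sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx   # determinant of X'X = [[n, sx], [sx, sxx]]
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 8.0, 11.0, 14.0]    # hypothetical data on y = 2 + 3x
print(simple_ls(x, y), normal_equations_p1(x, y))
```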

  12. Model: $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + e$. Or, write it as
      $$Y - \bar Y = \beta_1 (X_1 - \bar X_1) + \cdots + \beta_p (X_p - \bar X_p) + e,$$
      where $\bar X_j = \frac{1}{n} \sum_{i=1}^n X_{ij}$, $j = 1, \ldots, p$. Then
      $$\beta_0 \equiv \bar Y - (\beta_1 \bar X_1 + \cdots + \beta_p \bar X_p).$$

  13. Model: $Y - \bar Y = \beta_1 (X_1 - \bar X_1) + \cdots + \beta_p (X_p - \bar X_p) + e$. The least squares estimator of $\beta$ is
      $$\hat\beta = [(X - \bar X)'(X - \bar X)]^{-1} (X - \bar X)'(Y - \bar Y),$$
      where
      $$(X - \bar X)'(X - \bar X) = \begin{pmatrix} \Sigma (X_1 - \bar X_1)^2 & \cdots & \Sigma (X_1 - \bar X_1)(X_p - \bar X_p) \\ \vdots & & \vdots \\ \Sigma (X_p - \bar X_p)(X_1 - \bar X_1) & \cdots & \Sigma (X_p - \bar X_p)^2 \end{pmatrix} = \Big( \sum_i (X_{ij_1} - \bar X_{j_1})(X_{ij_2} - \bar X_{j_2}) \Big), \quad j_1, j_2 = 1, \ldots, p,$$
      $$(X - \bar X)'(Y - \bar Y) = \Big( \sum_i (X_{ij} - \bar X_j)(Y_i - \bar Y) \Big), \quad j = 1, \ldots, p.$$
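The centered form can be sketched for general $p$. The code builds $(X - \bar X)'(X - \bar X)$ and $(X - \bar X)'(Y - \bar Y)$ as on this slide, solves for the slopes, and then recovers $\beta_0$ from slide 12. Plain Gauss-Jordan elimination stands in for the matrix inverse; this is a substitution made for illustration, not a claim about how the estimates were computed in the talk. The data are hypothetical values generated from $Y = 1 + 2X_1 + 3X_2$, and the helper names are illustrative.

```python
def solve(A, rhs):
    """Solve A beta = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    M = [list(row) + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for k in range(col, n + 1):
                    M[r][k] -= f * M[col][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def centered_ls(columns, y):
    """Slopes from the centered normal equations, then beta_0 = Ybar - sum(beta_j Xbar_j)."""
    n, p = len(y), len(columns)
    xbar = [sum(col) / n for col in columns]
    ybar = sum(y) / n
    dx = [[v - xbar[j] for v in columns[j]] for j in range(p)]
    dy = [v - ybar for v in y]
    # (X - Xbar)'(X - Xbar): p x p matrix of centered sums of products.
    sxx = [[sum(u * v for u, v in zip(dx[j1], dx[j2])) for j2 in range(p)]
           for j1 in range(p)]
    # (X - Xbar)'(Y - Ybar): vector of length p.
    sxy = [sum(u * v for u, v in zip(dx[j], dy)) for j in range(p)]
    slopes = solve(sxx, sxy)
    beta0 = ybar - sum(b * m for b, m in zip(slopes, xbar))
    return beta0, slopes


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]   # hypothetical: beta = (1, 2, 3)
print(centered_ls([x1, x2], y))
```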

  14. Interval-valued data: $Y_{uj} = [a_{uj}, b_{uj}]$, $j = 1, \ldots, p$, $u \in E = \{w_1, \ldots, w_m\}$.
      Bertrand and Goupil (2000): the symbolic sample mean is
      $$\bar Y_j = \frac{1}{2m} \sum_{u \in E} (b_{uj} + a_{uj}),$$
      and the symbolic sample variance is
      $$S_j^2 = \frac{1}{3m} \sum_{u \in E} (b_{uj}^2 + b_{uj} a_{uj} + a_{uj}^2) - \frac{1}{4m^2} \Big[ \sum_{u \in E} (b_{uj} + a_{uj}) \Big]^2.$$
      Notice, e.g., with $m = 1$ and $Y$ = Weight:
      $Y_1 = [132, 138] \rightarrow \bar Y = 135$, $S^2 = 3$;
      $Y_2 = [129, 141] \rightarrow \bar Y = 135$, $S^2 = 12$.
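The two formulas translate directly into code. Here is a minimal sketch, with each interval given as an `(a, b)` pair; the function names are illustrative. Running it on the slide's single-interval weights reproduces the stated values: both have mean 135, but the wider interval has four times the variance.

```python
def symbolic_mean(intervals):
    """Bertrand and Goupil (2000) symbolic sample mean of intervals [a_u, b_u]."""
    m = len(intervals)
    return sum(a + b for a, b in intervals) / (2 * m)


def symbolic_variance(intervals):
    """Bertrand and Goupil (2000) symbolic sample variance S^2."""
    m = len(intervals)
    first = sum(b * b + b * a + a * a for a, b in intervals) / (3 * m)
    second = sum(a + b for a, b in intervals) ** 2 / (4 * m * m)
    return first - second


for obs in ([(132, 138)], [(129, 141)]):
    print(obs, symbolic_mean(obs), symbolic_variance(obs))
```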

  15. Can rewrite
      $$S_j^2 = \frac{1}{3m} \sum_{u \in E} [(a_{uj} - \bar Y_j)^2 + (a_{uj} - \bar Y_j)(b_{uj} - \bar Y_j) + (b_{uj} - \bar Y_j)^2].$$
      Then, by analogy, for interval-valued variables $Y_1$ and $Y_2$, the empirical covariance function is
      $$\mathrm{Cov}(Y_1, Y_2) = \frac{1}{3m} \sum_{u \in E} G_1 G_2 [Q_1 Q_2]^{1/2},$$
      where, for $j = 1, 2$,
      $$Q_j = (a_{uj} - \bar Y_j)^2 + (a_{uj} - \bar Y_j)(b_{uj} - \bar Y_j) + (b_{uj} - \bar Y_j)^2,$$
      $$G_j = \begin{cases} -1, & \text{if } \bar Y_{uj} \le \bar Y_j, \\ 1, & \text{if } \bar Y_{uj} > \bar Y_j, \end{cases} \qquad \bar Y_{uj} = (a_{uj} + b_{uj})/2.$$
      Notice special cases:
      (i) $\mathrm{Cov}(Y_1, Y_1) \equiv S_1^2$;
      (ii) if $a_{uj} = b_{uj} = y_{uj}$ for all $u$, i.e., classical data, then $\mathrm{Cov}(Y_1, Y_2) = \frac{1}{m} \sum_{u \in E} (y_{u1} - \bar Y_1)(y_{u2} - \bar Y_2)$.
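Here is a sketch of this covariance function, with both special cases checked in the test. Each variable is a list of `(a, b)` pairs for the same $m$ objects. $G_j$ is taken from the object midpoint against the symbolic mean, exactly as defined above. The names `q_term` and `symbolic_cov` are hypothetical. The example reuses the first three pulse-rate and systolic intervals from slide 23 only as convenient inputs.

```python
import math


def symbolic_mean(intervals):
    m = len(intervals)
    return sum(a + b for a, b in intervals) / (2 * m)


def q_term(a, b, mean):
    """Q_j for one object: (a - mean)^2 + (a - mean)(b - mean) + (b - mean)^2 (never negative)."""
    return (a - mean) ** 2 + (a - mean) * (b - mean) + (b - mean) ** 2


def symbolic_cov(y1, y2):
    """Empirical covariance of two interval-valued variables (slide 15 form)."""
    m = len(y1)
    mean1, mean2 = symbolic_mean(y1), symbolic_mean(y2)
    total = 0.0
    for (a1, b1), (a2, b2) in zip(y1, y2):
        g1 = 1 if (a1 + b1) / 2 > mean1 else -1   # G_1 from the object midpoint
        g2 = 1 if (a2 + b2) / 2 > mean2 else -1   # G_2 likewise
        total += g1 * g2 * math.sqrt(q_term(a1, b1, mean1) * q_term(a2, b2, mean2))
    return total / (3 * m)


pulse = [(44, 68), (60, 72), (56, 90)]
systolic = [(90, 110), (90, 130), (140, 180)]
print(symbolic_cov(pulse, systolic))
```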

  16. Back to Bertrand and Goupil (2000). The sample variance
      $$S_j^2 = \frac{1}{3m} \sum_{u \in E} (b_{uj}^2 + b_{uj} a_{uj} + a_{uj}^2) - \frac{1}{4m^2} \Big[ \sum_{u \in E} (a_{uj} + b_{uj}) \Big]^2$$
      is the total variance. Take the Total Sum of Squares as $\text{Total } SS_j = m S_j^2$. Then we can show
      $$\text{Total } SS_j = \text{Within Objects } SS_j + \text{Between Objects } SS_j,$$
      where

  17. $$S_j^2 = \frac{1}{3m} \sum_{u \in E} [(a_{uj} - \bar Y_j)^2 + (a_{uj} - \bar Y_j)(b_{uj} - \bar Y_j) + (b_{uj} - \bar Y_j)^2],$$
      $$\text{Within Objects } SS_j = \frac{1}{3} \sum_{u \in E} [(a_{uj} - \bar Y_{uj})^2 + (a_{uj} - \bar Y_{uj})(b_{uj} - \bar Y_{uj}) + (b_{uj} - \bar Y_{uj})^2],$$
      $$\text{Between Objects } SS_j = \sum_{u \in E} [(a_{uj} + b_{uj})/2 - \bar Y_j]^2,$$
      with $\bar Y_{uj} = (a_{uj} + b_{uj})/2$ and $\bar Y_j = \frac{1}{2m} \sum_{u \in E} (a_{uj} + b_{uj})$.
      Classical data: $a_{uj} = b_{uj} = Y_{uj}$ → Within Objects $SS_j = 0$.
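The decomposition can be checked numerically. The sketch below computes the three sums of squares for one interval-valued variable, using the pulse-rate column of slide 23 as example input; the name `ss_decomposition` is hypothetical. For a single object, the within term reduces to $(b - a)^2/12$. This is the Uniform variance that slide 21 appeals to, and the test checks that identity too.

```python
def ss_decomposition(intervals):
    """Return (Total SS, Within Objects SS, Between Objects SS) for one variable."""
    m = len(intervals)
    mean = sum(a + b for a, b in intervals) / (2 * m)   # symbolic mean Ybar_j
    # Total SS = m * S_j^2, written with deviations from Ybar_j.
    total = sum((a - mean) ** 2 + (a - mean) * (b - mean) + (b - mean) ** 2
                for a, b in intervals) / 3
    within = 0.0
    between = 0.0
    for a, b in intervals:
        mid = (a + b) / 2                               # object midpoint Ybar_uj
        within += ((a - mid) ** 2 + (a - mid) * (b - mid) + (b - mid) ** 2) / 3
        between += (mid - mean) ** 2
    return total, within, between


PULSE = [(44, 68), (60, 72), (56, 90), (70, 112), (54, 72), (70, 100),
         (72, 100), (76, 98), (86, 96), (86, 100), (63, 75)]
total, within, between = ss_decomposition(PULSE)
print(total, within + between)
```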

  18. So, for $Y_j$, we have the Sum of Squares (SS) decomposition
      $$\text{Total } SS_j = \text{Within Objects } SS_j + \text{Between Objects } SS_j.$$
      Likewise, for $(Y_i, Y_j)$, we have the Sum of Products (SP) decomposition
      $$\text{Total } SP_{ij} = \text{Within Objects } SP_{ij} + \text{Between Objects } SP_{ij}.$$


  20. With $Q_j$, $G_j$ and $\bar Y_{uj}$ defined as on slide 15 for $\mathrm{Cov}(Y_1, Y_2) = \frac{1}{3m} \sum_{u \in E} G_1 G_2 [Q_1 Q_2]^{1/2}$, the (Total) SP part can be replaced by
      $$\text{Total } SP = \frac{1}{6} \sum_{u} [2(a - \bar Y)(c - \bar X) + (a - \bar Y)(d - \bar X) + (b - \bar Y)(c - \bar X) + 2(b - \bar Y)(d - \bar X)],$$
      where, for object $u$, $Y = [a, b]$ and $X = [c, d]$.

  21. How is this obtained? Recall that for a Uniform distribution, $Y \sim U(a, b)$, $\mathrm{Var}(Y) = (b - a)^2/12$. By analogy, we can show, for $u = 1, \ldots, m$ observations,
      $$\text{Within SP} = \frac{1}{12} \sum_{u=1}^m (a_u - b_u)(c_u - d_u),$$
      $$\text{Between SP} = \sum_{u=1}^m \Big( \frac{a_u + b_u}{2} - \bar Y_1 \Big) \Big( \frac{c_u + d_u}{2} - \bar Y_2 \Big),$$
      where $Y_{u1} = [a_u, b_u]$, $Y_{u2} = [c_u, d_u]$, and
      $$\bar Y_1 = \frac{1}{m} \sum_{u=1}^m \frac{a_u + b_u}{2}, \qquad \bar Y_2 = \frac{1}{m} \sum_{u=1}^m \frac{c_u + d_u}{2}.$$

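  22. Hence, from Total SP = Within SP + Between SP, with Within SP and Between SP as on slide 21,
      $$\text{Total SP} = \frac{1}{6} \sum_{u=1}^m [2(a_u - \bar Y_1)(c_u - \bar Y_2) + (a_u - \bar Y_1)(d_u - \bar Y_2) + (b_u - \bar Y_1)(c_u - \bar Y_2) + 2(b_u - \bar Y_1)(d_u - \bar Y_2)].$$
      (Write $a_u - b_u = (a_u - \bar Y_1) - (b_u - \bar Y_1)$ and $(a_u + b_u)/2 - \bar Y_1 = [(a_u - \bar Y_1) + (b_u - \bar Y_1)]/2$, and likewise for $c_u, d_u$. Then each like-end product, such as $(a_u - \bar Y_1)(c_u - \bar Y_2)$, gets weight $1/12 + 1/4 = 1/3$, and each cross-end product gets weight $-1/12 + 1/4 = 1/6$.)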
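The identity Total SP = Within SP + Between SP can also be confirmed numerically. The sketch below computes all three sums from their slide formulas for paired intervals $Y_{u1} = [a_u, b_u]$ and $Y_{u2} = [c_u, d_u]$. It uses the systolic and diastolic columns of slide 23 purely as example inputs; the function name `sp_parts` is hypothetical.

```python
def sp_parts(y1, y2):
    """Within, Between and Total SP for paired intervals [a_u, b_u], [c_u, d_u]."""
    m = len(y1)
    mean1 = sum(a + b for a, b in y1) / (2 * m)
    mean2 = sum(c + d for c, d in y2) / (2 * m)
    pairs = list(zip(y1, y2))
    within = sum((a - b) * (c - d) for (a, b), (c, d) in pairs) / 12
    between = sum(((a + b) / 2 - mean1) * ((c + d) / 2 - mean2)
                  for (a, b), (c, d) in pairs)
    total = sum(2 * (a - mean1) * (c - mean2) + (a - mean1) * (d - mean2)
                + (b - mean1) * (c - mean2) + 2 * (b - mean1) * (d - mean2)
                for (a, b), (c, d) in pairs) / 6
    return within, between, total


SYSTOLIC = [(90, 110), (90, 130), (140, 180), (110, 142), (90, 100), (134, 142),
            (130, 160), (110, 190), (138, 180), (110, 150), (60, 100)]
DIASTOLIC = [(50, 70), (70, 90), (90, 100), (80, 108), (50, 70), (80, 110),
             (76, 90), (70, 110), (90, 110), (78, 100), (140, 150)]
within, between, total = sp_parts(SYSTOLIC, DIASTOLIC)
print(within, between, total, within + between)
```

Setting `y2 = y1` turns Total SP into the Total SS of slide 17. The test below checks this.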

  23. Y = Pulse Rate, X1 = Systolic Pressure, X2 = Diastolic Pressure

      u   Y: Pulse Rate  X1: Systolic Pressure  X2: Diastolic Pressure
      1   [44, 68]       [90, 110]              [50, 70]
      2   [60, 72]       [90, 130]              [70, 90]
      3   [56, 90]       [140, 180]             [90, 100]
      4   [70, 112]      [110, 142]             [80, 108]
      5   [54, 72]       [90, 100]              [50, 70]
      6   [70, 100]      [134, 142]             [80, 110]
      7   [72, 100]      [130, 160]             [76, 90]
      8   [76, 98]       [110, 190]             [70, 110]
      9   [86, 96]       [138, 180]             [90, 110]
      10  [86, 100]      [110, 150]             [78, 100]
      11  [63, 75]       [60, 100]              [140, 150]

      Rule: X2 = Diastolic Pressure < Systolic Pressure = X1
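The rule can be screened against each observation's (X1, X2) rectangle before any covariance or regression is computed. This minimal sketch treats the rule as strict ($X_2 < X_1$). It flags rectangles that violate the rule everywhere, and those that violate it somewhere. The name `rule_check` is hypothetical.

```python
# Slide 23: u -> (systolic X1 interval, diastolic X2 interval).
PRESSURE = {
    1: ((90, 110), (50, 70)), 2: ((90, 130), (70, 90)), 3: ((140, 180), (90, 100)),
    4: ((110, 142), (80, 108)), 5: ((90, 100), (50, 70)), 6: ((134, 142), (80, 110)),
    7: ((130, 160), (76, 90)), 8: ((110, 190), (70, 110)), 9: ((138, 180), (90, 110)),
    10: ((110, 150), (78, 100)), 11: ((60, 100), (140, 150)),
}


def rule_check(systolic, diastolic):
    """Compare an observation's rectangle with the strict rule X2 < X1."""
    (s_lo, s_hi), (d_lo, d_hi) = systolic, diastolic
    if d_lo >= s_hi:   # every point has X2 >= X1
        return "violates everywhere"
    if d_hi >= s_lo:   # the corner (s_lo, d_hi) has X2 >= X1
        return "violates somewhere"
    return "consistent"


for u, (x1, x2) in PRESSURE.items():
    print(u, rule_check(x1, x2))
```

Observation 11 (diastolic [140, 150] against systolic [60, 100]) breaks the rule throughout its rectangle. Observations 2 and 8 only touch the line $X_2 = X_1$ at a single corner, so they count as violations only under the strict reading of the rule.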
