The Aggregate Prediction Index and Non-Symmetric Correspondence Analysis of Aggregate Data: The 2 x 2 Table

Eric J. Beh, School of Mathematical and Physical Sciences, University of Newcastle, Australia
Rosaria Lombardo, Economics Faculty, Second University of Naples, Italy
CARME 2011, Rennes, France, February 8-11

The 2 x 2 Contingency Table

Cross-classify a sample of size $n$ according to two dichotomous variables. The question marks indicate cells that are treated as unknown.

         Column 1           Column 2           Total
Row 1    $p_{11}$ (?)       $p_{12}$ (?)       $p_{1\bullet}$
Row 2    $p_{21}$ (?)       $p_{22}$ (?)       $p_{2\bullet}$
Total    $p_{\bullet 1}$    $p_{\bullet 2}$    1

“Let us blot out the contents of the table, leaving only the marginal frequencies . . . [they] by themselves supply no information on . . . the proportionality of the frequencies in the body of the table . . . ” – Fisher (1935)

Symmetric association – Pearson chi-squared statistic

Define
$$P_1 = \frac{p_{11}}{p_{1\bullet}}, \qquad P_2 = \frac{p_{21}}{p_{2\bullet}}$$

Aggregate Association Index (Beh; 2010 CS&DA)
$$X^2(P_1 \mid p_{1\bullet}, p_{\bullet 1}) = n \left( \frac{P_1 - p_{\bullet 1}}{p_{2\bullet}} \right)^2 \frac{p_{1\bullet}\, p_{2\bullet}}{p_{\bullet 1}\, p_{\bullet 2}}$$
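As a rough check of the chi-squared statistic written in terms of $P_1$ and the margins, the sketch below compares it with the usual cell-count formula for a 2 x 2 table. The function names and the counts are hypothetical and chosen only for illustration; the margin-based form assumed here is $X^2 = n\,((P_1 - p_{\bullet 1})/p_{2\bullet})^2\, p_{1\bullet} p_{2\bullet} / (p_{\bullet 1} p_{\bullet 2})$.

```python
def chi_squared_from_cells(n11, n12, n21, n22):
    """Pearson chi-squared statistic of a 2x2 table computed from its cell counts."""
    n = n11 + n12 + n21 + n22
    row1, row2 = n11 + n12, n21 + n22
    col1, col2 = n11 + n21, n12 + n22
    return n * (n11 * n22 - n12 * n21) ** 2 / (row1 * row2 * col1 * col2)


def chi_squared_from_p1(P1, p1_row, p1_col, n):
    """The same statistic written in terms of P1 = p11 / p1. and the margins."""
    p2_row, p2_col = 1.0 - p1_row, 1.0 - p1_col
    return n * ((P1 - p1_col) / p2_row) ** 2 * (p1_row * p2_row) / (p1_col * p2_col)


# Hypothetical counts, not taken from the presentation.
n11, n12, n21, n22 = 20, 30, 10, 40
n = n11 + n12 + n21 + n22

x2_cells = chi_squared_from_cells(n11, n12, n21, n22)
x2_margins = chi_squared_from_p1(
    P1=n11 / (n11 + n12),
    p1_row=(n11 + n12) / n,
    p1_col=(n11 + n21) / n,
    n=n,
)
print(x2_cells, x2_margins)  # the two values should agree up to rounding
```

Once the margins are fixed, the statistic depends on the unknown cells only through $P_1$, which is what makes it usable when the body of the table is blotted out.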

Bounds & Accounting Identity

Duncan & Davis (1953) Bounds
$$L_1 = \max\left(0, \frac{n_{\bullet 1} - n_{2\bullet}}{n_{1\bullet}}\right) \le P_1 \le \min\left(\frac{n_{\bullet 1}}{n_{1\bullet}}, 1\right) = U_1$$
$$L_2 = \max\left(0, \frac{n_{\bullet 1} - n_{1\bullet}}{n_{2\bullet}}\right) \le P_2 \le \min\left(\frac{n_{\bullet 1}}{n_{2\bullet}}, 1\right) = U_2$$

The Accounting Identity (King, 1997; and others)
$$n_{1\bullet} P_1 + n_{2\bullet} P_2 = n_{\bullet 1}$$
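The Duncan & Davis bounds and the accounting identity can be sketched in a few lines. The function names and the margins used below are hypothetical; the bounds assumed are $L_1 = \max(0, (n_{\bullet 1} - n_{2\bullet})/n_{1\bullet})$, $U_1 = \min(n_{\bullet 1}/n_{1\bullet}, 1)$ and the analogous pair for $P_2$.

```python
def duncan_davis_bounds(n1_row, n2_row, n1_col):
    """Bounds on P1 and P2 that follow from the marginal counts alone."""
    L1 = max(0.0, (n1_col - n2_row) / n1_row)
    U1 = min(n1_col / n1_row, 1.0)
    L2 = max(0.0, (n1_col - n1_row) / n2_row)
    U2 = min(n1_col / n2_row, 1.0)
    return (L1, U1), (L2, U2)


def p2_from_p1(P1, n1_row, n2_row, n1_col):
    """Solve the accounting identity n1. P1 + n2. P2 = n.1 for P2."""
    return (n1_col - n1_row * P1) / n2_row


# Hypothetical margins: row totals 40 and 60, first column total 70.
(L1, U1), (L2, U2) = duncan_davis_bounds(40, 60, 70)
print(L1, U1, L2, U2)

# Moving P1 across its range moves P2 across its own range in the opposite direction.
P2_at_L1 = p2_from_p1(L1, 40, 60, 70)
P2_at_U1 = p2_from_p1(U1, 40, 60, 70)
```

Because the identity is linear, fixing $P_1$ determines $P_2$, so the whole table is a function of the single unknown $P_1$.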

Non-Symmetric Correspondence Analysis

Define
$$\pi_{ij} = \frac{p_{ij}}{p_{i\bullet}} - p_{\bullet j}$$
as the difference between the unconditional marginal prediction $p_{\bullet j}$ (column marginal proportion) and the conditional prediction $p_{ij}/p_{i\bullet}$ (row profiles).

Rows → Predictor Variable
Columns → Response Variable

Goodman-Kruskal tau index (1954). For a 2x2 contingency table:
$$\tau = \frac{\tau_{num}}{1 - \sum_{j=1}^{2} p_{\bullet j}^2} = \frac{\sum_{i=1}^{2} \sum_{j=1}^{2} p_{i\bullet}\, \pi_{ij}^2}{1 - \sum_{j=1}^{2} p_{\bullet j}^2}$$

Light & Margolin (1971):
$$C = (n - 1)\,\tau \sim \chi^2_1$$

NSCA (D’Ambra & Lauro, 1989) decomposes the numerator, $\tau_{num}$.
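A minimal sketch of the Goodman-Kruskal tau index and the C-statistic, assuming $\pi_{ij} = p_{ij}/p_{i\bullet} - p_{\bullet j}$, $\tau = \sum_i \sum_j p_{i\bullet} \pi_{ij}^2 / (1 - \sum_j p_{\bullet j}^2)$ and $C = (n-1)\tau$ for the 2 x 2 case. The function name, the table of proportions and the sample size are hypothetical.

```python
def goodman_kruskal_tau(p):
    """Goodman-Kruskal tau for a table of proportions p, rows predicting columns."""
    row = [sum(r) for r in p]
    col = [sum(r[j] for r in p) for j in range(len(p[0]))]
    # Numerator: weighted squared differences between row profiles and column margins.
    tau_num = sum(
        row[i] * (p[i][j] / row[i] - col[j]) ** 2
        for i in range(len(p))
        for j in range(len(col))
    )
    return tau_num / (1.0 - sum(c ** 2 for c in col))


# Hypothetical proportions and sample size, chosen only for illustration.
n = 100
p = [[0.20, 0.30],
     [0.10, 0.40]]

tau = goodman_kruskal_tau(p)
C = (n - 1) * tau  # 2x2 case of the Light & Margolin statistic
print(tau, C)
```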

Non-Symmetric Correspondence Analysis

Decomposition of $\pi_{ij}$:
$$\pi_{ij} = x_i\, \rho\, y_j$$
Akin to the SVD and BMD of a general two-way contingency table (Lancaster, 1969).

Orthonormality
$$x_i = (-1)^{i+1} \left( \frac{p_{1\bullet}}{p_{2\bullet}} \right)^{\frac{2i-3}{2}}, \quad i = 1, 2, \qquad p_{1\bullet} x_1 + p_{2\bullet} x_2 = 0, \qquad p_{1\bullet} x_1^2 + p_{2\bullet} x_2^2 = 1$$
$$y_j = \frac{(-1)^{j+1}}{\sqrt{2}}, \quad j = 1, 2, \qquad y_1 + y_2 = 0, \qquad y_1^2 + y_2^2 = 1$$
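The decomposition can be checked numerically. The sketch below assumes $x_i = (-1)^{i+1} (p_{1\bullet}/p_{2\bullet})^{(2i-3)/2}$, $y_j = (-1)^{j+1}/\sqrt{2}$ and a scalar $\rho = (P_1 - p_{\bullet 1})\sqrt{2 p_{1\bullet}/p_{2\bullet}}$, which is one reading of the slide formulas; the function name and the proportions are hypothetical.

```python
import math


def nsca_2x2(p11, p12, p21, p22):
    """Return x, y, rho, the matrix pi and the row margins for a 2x2 table of proportions."""
    p1_row, p2_row = p11 + p12, p21 + p22
    p1_col, p2_col = p11 + p21, p12 + p22
    # Row scores, orthonormal with respect to the row margins.
    x = [(-1) ** (i + 1) * (p1_row / p2_row) ** ((2 * i - 3) / 2) for i in (1, 2)]
    # Column scores, orthonormal with unit weights.
    y = [(-1) ** (j + 1) / math.sqrt(2) for j in (1, 2)]
    # Assumed form of the scalar linking pi_ij to x_i and y_j.
    rho = (p11 / p1_row - p1_col) * math.sqrt(2 * p1_row / p2_row)
    pi = [[p11 / p1_row - p1_col, p12 / p1_row - p2_col],
          [p21 / p2_row - p1_col, p22 / p2_row - p2_col]]
    return x, y, rho, pi, (p1_row, p2_row)


# Hypothetical proportions, chosen only for illustration.
x, y, rho, pi, row = nsca_2x2(0.2, 0.2, 0.1, 0.5)

# Rebuild pi_ij from x_i * rho * y_j and compare with the direct definition.
rebuilt = [[x[i] * rho * y[j] for j in range(2)] for i in range(2)]
print(pi)
print(rebuilt)
```

In a 2 x 2 table there is a single non-trivial dimension, so one triple $(x, \rho, y)$ reproduces every $\pi_{ij}$ exactly.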

Bounds

Under the hypothesis of independence, $\rho$ is asymptotically a standard normal random variable and can be expressed as a function of $P_1$ and of the marginal information:
$$\rho = \frac{\pi_{ij}}{x_i\, y_j} = (P_1 - p_{\bullet 1}) \sqrt{\frac{2\, p_{1\bullet}}{p_{2\bullet}}}$$

Applying the Duncan & Davis (1953) bounds on $P_1$ gives
$$-\min\left( p_{\bullet 1} \sqrt{\frac{2\, p_{1\bullet}}{p_{2\bullet}}},\; p_{\bullet 2} \sqrt{\frac{2\, p_{2\bullet}}{p_{1\bullet}}} \right) \le \rho \le \min\left( p_{\bullet 1} \sqrt{\frac{2\, p_{2\bullet}}{p_{1\bullet}}},\; p_{\bullet 2} \sqrt{\frac{2\, p_{1\bullet}}{p_{2\bullet}}} \right)$$
which only requires the marginal information.

NSCA and Classical Coordinates

Some insight into the asymmetric association may be gained using NSCA, by constructing a classical plot or biplot graphical display.

For a classical plot
$$f_i = x_i\, \rho = (-1)^{i+1} \left( \frac{p_{1\bullet}}{p_{2\bullet}} \right)^{\frac{2i-3}{2}} \rho, \qquad g_j = y_j\, \rho = \frac{(-1)^{j+1}}{\sqrt{2}}\, \rho$$

These coordinates may be expressed in terms of $P_1$ and the marginal proportions:
$$f_1 = x_1 \rho = \sqrt{2}\, (P_1 - p_{\bullet 1}), \qquad g_1 = y_1 \rho = (P_1 - p_{\bullet 1}) \sqrt{\frac{p_{1\bullet}}{p_{2\bullet}}}$$
$$f_2 = x_2 \rho = -\sqrt{2}\, (P_1 - p_{\bullet 1})\, \frac{p_{1\bullet}}{p_{2\bullet}}, \qquad g_2 = y_2 \rho = -(P_1 - p_{\bullet 1}) \sqrt{\frac{p_{1\bullet}}{p_{2\bullet}}}$$
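The classical-plot coordinates can be evaluated directly from $P_1$ and the margins. This sketch assumes the closed forms $f_1 = \sqrt{2}(P_1 - p_{\bullet 1})$, $f_2 = -\sqrt{2}(P_1 - p_{\bullet 1})\, p_{1\bullet}/p_{2\bullet}$ and $g_1 = -g_2 = (P_1 - p_{\bullet 1})\sqrt{p_{1\bullet}/p_{2\bullet}}$, which follow from one reading of the slide formulas; the function name and the inputs are hypothetical.

```python
import math


def classical_coordinates(P1, p1_row, p1_col):
    """Row coordinates (f1, f2) and column coordinates (g1, g2) of the classical plot."""
    p2_row = 1.0 - p1_row
    d = P1 - p1_col  # departure of the row-1 profile from the column margin
    f = (math.sqrt(2) * d, -math.sqrt(2) * d * p1_row / p2_row)
    g = (d * math.sqrt(p1_row / p2_row), -d * math.sqrt(p1_row / p2_row))
    return f, g


# Hypothetical inputs, chosen only for illustration.
P1, p1_row, p1_col = 0.5, 0.4, 0.3
f, g = classical_coordinates(P1, p1_row, p1_col)
print(f, g)

# Weighted by the row margins, the row coordinates are centred at the origin,
# and their weighted sum of squares equals rho squared.
weighted_mean = p1_row * f[0] + (1 - p1_row) * f[1]
weighted_ss = p1_row * f[0] ** 2 + (1 - p1_row) * f[1] ** 2
rho = (P1 - p1_col) * math.sqrt(2 * p1_row / (1 - p1_row))
```

Every coordinate is proportional to $P_1 - p_{\bullet 1}$, so all points collapse onto the origin when the row profile equals the column margin.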

NSCA and the Biplot

To depict the asymmetric relationship between row and column categories, consider a row metric preserving biplot (Kroonenberg & Lombardo, 1999).

The biplot coordinates for the $i$th row are
$$f_i = x_i\, \rho = (-1)^{i+1} \left( \frac{p_{1\bullet}}{p_{2\bullet}} \right)^{\frac{2i-3}{2}} \rho$$
and for the $j$th column
$$g_j = y_j = \frac{(-1)^{j+1}}{\sqrt{2}}$$

The row isometric biplot is used to project the column coordinates onto the line defined by the row coordinates: the shorter the distance, the stronger the predictability.

Bounds can be computed for the coordinates. For example, applying the bounds on $P_1$ to $f_1 = \sqrt{2}\,(P_1 - p_{\bullet 1})$ gives
$$-\sqrt{2}\, \min\left( p_{\bullet 1},\; \frac{p_{2\bullet}\, p_{\bullet 2}}{p_{1\bullet}} \right) \le f_1 \le \sqrt{2}\, \min\left( \frac{p_{\bullet 1}\, p_{2\bullet}}{p_{1\bullet}},\; p_{\bullet 2} \right)$$

Bounds of $P_1$

100(1 – $\alpha$)% Confidence Bounds under the null hypothesis of independence
$$L_\alpha^* = p_{\bullet 1} - z_{\alpha/2} \sqrt{ \frac{p_{2\bullet}}{p_{1\bullet}} \cdot \frac{1 - \sum_{j} p_{\bullet j}^2}{2\,(n - 1)} } \;<\; P_1 \;<\; p_{\bullet 1} + z_{\alpha/2} \sqrt{ \frac{p_{2\bullet}}{p_{1\bullet}} \cdot \frac{1 - \sum_{j} p_{\bullet j}^2}{2\,(n - 1)} } = U_\alpha^*$$
$$L_\alpha = \max(0, L_\alpha^*) \le P_1 \le \min(1, U_\alpha^*) = U_\alpha$$

Given $\alpha$ and the aggregate data, there is a significant asymmetric association between the two dichotomous variables if
$$L_1 \le P_1 \le L_\alpha \quad \text{or} \quad U_\alpha \le P_1 \le U_1$$
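A sketch of the confidence bounds, assuming a half-width of $z_{\alpha/2}\sqrt{(p_{2\bullet}/p_{1\bullet})(1 - \sum_j p_{\bullet j}^2)/(2(n-1))}$ around $p_{\bullet 1}$, which is the value at which $C = (n-1)\tau$ reaches the critical value of a chi-squared variable with one degree of freedom. The function name is hypothetical, and the row margin of 13/30 used below is an assumption (the commonly reported margin of Fisher's twin table); only $n = 30$ and $p_{\bullet 1} = 12/30$ appear in the text.

```python
import math
from statistics import NormalDist


def alpha_bounds(p1_row, p1_col, n, alpha=0.05):
    """Bounds (L_alpha, U_alpha) on P1 under independence, clipped to [0, 1]."""
    p2_row, p2_col = 1.0 - p1_row, 1.0 - p1_col
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 normal quantile
    half_width = z * math.sqrt(
        (p2_row / p1_row) * (1.0 - p1_col ** 2 - p2_col ** 2) / (2.0 * (n - 1))
    )
    return max(0.0, p1_col - half_width), min(1.0, p1_col + half_width)


# Assumed margins: 13 of 30 in row 1, 12 of 30 in column 1.
L_alpha, U_alpha = alpha_bounds(p1_row=13 / 30, p1_col=12 / 30, n=30)
print(round(L_alpha, 3), round(U_alpha, 3))
```

Under these assumptions the interval comes out close to the range 0.19 ≤ P_1 ≤ 0.60 quoted for the twin data.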

Aggregate Prediction Index (API)

[Figure: two panels, "Chi-squared Statistic" and "C – Statistic", plotted against $P_1$ from 0.0 to 1.0 with a vertical axis from 0 to 30. The critical value $\chi^2_\alpha$ is drawn as a horizontal line, the points $L_1$, $L_\alpha$, $p_{\bullet 1}$, $U_\alpha$ and $U_1$ are marked on the $P_1$ axis, and the regions of statistically significant association are labelled.]

Consider a plot of the chi-squared statistic versus $P_1$. If the area under $C$ but above $\chi^2_\alpha$ is large, then there is evidence that the row categories are good predictors of the column categories.

Aggregate Prediction Index (API)

[Figure: the chi-squared statistic and the C-statistic plotted against $P_1$, as on the previous slide.]

This area may be calculated by
$$API_\alpha = 100 \left( 1 - \frac{ \chi^2_\alpha \left[ (L_\alpha - L_1) + (U_1 - U_\alpha) \right] + \int_{L_\alpha}^{U_\alpha} C(P_1 \mid p_{1\bullet}, p_{\bullet 1})\, dP_1 }{ \int_{L_1}^{U_1} C(P_1 \mid p_{1\bullet}, p_{\bullet 1})\, dP_1 } \right)$$
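The index can be approximated numerically. The sketch below assumes $C(P_1) = (n-1)\tau$ with the table rebuilt from $P_1$ and the margins, the Duncan & Davis bounds expressed in proportions, and the index $API_\alpha = 100\,(1 - [\chi^2_\alpha((L_\alpha - L_1) + (U_1 - U_\alpha)) + \int_{L_\alpha}^{U_\alpha} C\, dP_1] / \int_{L_1}^{U_1} C\, dP_1)$. All function names and the margins are hypothetical, the midpoint rule is just one convenient integrator, and the sketch assumes $L_1 \le L_\alpha$ and $U_\alpha \le U_1$.

```python
import math
from statistics import NormalDist


def c_statistic(P1, p1_row, p1_col, n):
    """C = (n - 1) * tau for the 2x2 table implied by P1 and the margins."""
    p2_row, p2_col = 1.0 - p1_row, 1.0 - p1_col
    P2 = (p1_col - p1_row * P1) / p2_row  # accounting identity in proportions
    tau_num = (p1_row * ((P1 - p1_col) ** 2 + ((1 - P1) - p2_col) ** 2)
               + p2_row * ((P2 - p1_col) ** 2 + ((1 - P2) - p2_col) ** 2))
    return (n - 1) * tau_num / (1.0 - p1_col ** 2 - p2_col ** 2)


def integrate(func, a, b, steps=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (k + 0.5) * h) for k in range(steps))


def aggregate_prediction_index(p1_row, p1_col, n, alpha=0.05):
    p2_row, p2_col = 1.0 - p1_row, 1.0 - p1_col
    # Duncan & Davis bounds on P1, written with proportions.
    L1 = max(0.0, (p1_col - p2_row) / p1_row)
    U1 = min(p1_col / p1_row, 1.0)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    chi2_alpha = z ** 2  # critical value of chi-squared with 1 degree of freedom
    half_width = z * math.sqrt(
        (p2_row / p1_row) * (1.0 - p1_col ** 2 - p2_col ** 2) / (2.0 * (n - 1))
    )
    L_a = max(0.0, p1_col - half_width)
    U_a = min(1.0, p1_col + half_width)

    def curve(P1):
        return c_statistic(P1, p1_row, p1_col, n)

    below = chi2_alpha * ((L_a - L1) + (U1 - U_a)) + integrate(curve, L_a, U_a)
    return 100.0 * (1.0 - below / integrate(curve, L1, U1))


# Hypothetical margins, not the twin data.
api = aggregate_prediction_index(p1_row=0.5, p1_col=0.5, n=200)
print(round(api, 2))
```

Because the sketch rests on a reconstruction of the formulas, its output for a given set of margins need not match values reported in the presentation.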

Example – Fisher’s Twin Data

Fisher's data concern 30 criminal twins, classified according to whether they are monozygotic or dizygotic twins. The table also classifies whether the twins have been convicted of a criminal offence.

The Goodman–Kruskal tau index is 0.434.
The C-statistic is 12.597, with p-value = 0.0004 → the type of twin is a good predictor of the conviction status of a criminal.
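The reported values can be reproduced under one assumption: the cell counts themselves are not in the text, so the sketch below uses the commonly cited version of Fisher's twin table (monozygotic: 10 convicted, 3 not convicted; dizygotic: 2 convicted, 15 not convicted). The p-value uses the fact that the upper tail of a chi-squared variable with one degree of freedom equals `erfc(sqrt(x / 2))`.

```python
import math

# Assumed cell counts (rows: monozygotic, dizygotic; columns: convicted, not convicted).
counts = [[10, 3],
          [2, 15]]
n = sum(sum(r) for r in counts)

p = [[c / n for c in r] for r in counts]
row = [sum(r) for r in p]
col = [p[0][j] + p[1][j] for j in range(2)]

# Goodman-Kruskal tau with twin type (rows) predicting conviction status (columns).
tau_num = sum(row[i] * (p[i][j] / row[i] - col[j]) ** 2
              for i in range(2) for j in range(2))
tau = tau_num / (1.0 - sum(c ** 2 for c in col))

C = (n - 1) * tau
p_value = math.erfc(math.sqrt(C / 2.0))  # chi-squared upper tail, 1 degree of freedom

print(round(tau, 3), round(C, 3), round(p_value, 4))
```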

Example – Fisher’s Twin Data

But, as Fisher (1935) did, suppose we “blot out” the cells of the table.

Question: What information do the margins provide in understanding the extent to which the variables are associated?

We shall
• consider the non-symmetric correspondence analysis using only the aggregate data, and
• calculate the aggregate prediction index

Example – Fisher’s Twin Data

[Figure: the C-statistic $C(P_1)$ plotted against $P_1$ over 0.00 to 0.92 on an axis running from 0.00 to 1.00, annotated with its expression for these margins, where $p_{\bullet 1} = 12/30 = 0.4$.]

$API_{0.05} = 56.85$

If we consider the 5% level of significance, the margins provide strong evidence that there may exist a significant prediction of conviction status given twin type.

No prediction when 0.19 ≤ $P_1$ ≤ 0.60

Example – Fisher’s Twin Data

[Figure: classical plot proposed in CA of aggregate data (Beh, 2008), with the points Row=monoz., Row=dizy., Column=conv. and Column=non conv.]

No prediction if 0.19 ≤ $P_1$ ≤ 0.60

Example – Fisher’s Twin Data

[Figure: row isometric biplot, with the points Row=monoz., Row=dizy., Column=conv. and Column=non conv.]

No prediction if 0.19 ≤ $P_1$ ≤ 0.60
Inverse prediction if 0.0 ≤ $P_1$ ≤ 0.19
Direct prediction when 0.60 ≤ $P_1$ ≤ 0.92
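The three ranges quoted for the twin data can be written as a small lookup. The function name is hypothetical, the thresholds 0.19, 0.60 and 0.92 are taken from the text, and assigning the shared end points 0.19 and 0.60 to "no prediction" is an assumption of this sketch.

```python
def prediction_type(P1, lower=0.19, upper=0.60, p1_max=0.92):
    """Classify a value of P1 using the ranges quoted for the twin data."""
    if not 0.0 <= P1 <= p1_max:
        raise ValueError("P1 lies outside the range allowed by the margins")
    if P1 < lower:
        return "inverse prediction"
    if P1 <= upper:
        return "no prediction"
    return "direct prediction"


for value in (0.10, 0.40, 0.77):
    print(value, prediction_type(value))
```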


