LOGISTIC BIPLOTS FOR BINARY, NOMINAL AND ORDINAL DATA
José Luis Vicente Villardón
Universidad de Salamanca, Spain
villardon@usal.es
http://biplot.usal.es
SUMMARY
Classical Biplot methods allow for the simultaneous representation of individuals and continuous variables in a given data matrix. When variables are binary, nominal or ordinal, a classical linear biplot representation is not suitable. We propose a linear biplot representation based on logistic response models. The coordinates of individuals and variables are computed to have logistic responses along the biplot dimensions. The method is related to logistic regression in the same way that Classical Biplot Analysis (CBA) is related to linear regression; thus we refer to the method as Logistic Biplot (LB). In the same way as Linear Biplots are related to Principal Components Analysis, Logistic Biplots are related to Latent Trait Analysis or Item Response Theory. The geometry of these biplots is studied: for nominal data, the biplot partitions the representation space into one prediction region per category; for ordinal data, we obtain a prediction direction with points separating each category. The usefulness of the proposal is illustrated using data on SNPs (Single Nucleotide Polymorphisms) from the HAPMAP project.
1.- INTRODUCTION
2.- CLASSICAL BIPLOT
2.1 Linear biplot based on alternating regressions/interpolations
2.2 Geometry of linear regression biplots
3.- LOGISTIC BIPLOT FOR BINARY DATA
3.1 Formulation
3.2 Parameter estimation
3.3 Geometry of logistic biplots
4.- LOGISTIC BIPLOT FOR CATEGORICAL & ORDINAL DATA
4.1 Formulation
4.2 Parameter estimation
4.3 Geometry of logistic biplots
5.- APPLICATION: HAPMAP
1.- INTRODUCTION
Item Response Theory (BAKER, 1992)
Biplot (GABRIEL, 1971)
CONTINUOUS DATA
- Alternating regressions (GABRIEL & ZAMIR, 1979)
- Regression/calibration
- Interpolation (GOWER & HAND, 1996)
CATEGORICAL DATA
- Multiple Correspondence Analysis (MCA)
- Prediction regions (GOWER & HAND, 1996)
- Generalised bilinear models (FALGUEROLLES et al., 1995)
- Segmented bilinear models (GABRIEL, 1999)
LOGISTIC BIPLOT
2.- CLASSICAL BIPLOTS
2.2 Geometry of linear regression biplots
- Let L be the space spanned by the columns of A, usually two-dimensional. We complete L with a third dimension for the j-th variable and fit the regression plane.
- A linear response surface in the three-dimensional space is obtained. Let us call it H.
- The set of points in H predicting a fixed value is given by the intersection between H and the plane normal to the third axis passing through the fixed value. That intersection is a straight line.
- The points in L predicting different values of the variable are also on parallel straight lines.
- The biplot axis can be completed with scales.
The R-squared of each regression is interpreted as a measure of the "quality of the representation" in the sense commonly used in Correspondence Analysis.
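The geometry above can be sketched numerically. The following is a minimal illustration, not the author's implementation: it assumes the row markers A are taken from a truncated SVD of the centred data, regresses each variable on A, and places a scale mark for a given value on the biplot axis. The function names `linear_regression_biplot` and `scale_marker` are hypothetical.

```python
import numpy as np

def linear_regression_biplot(X, n_dims=2):
    """Row markers from an SVD, then one linear regression per variable."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    A = U[:, :n_dims] * s[:n_dims]                 # row markers (assumed choice)
    Z = np.column_stack([np.ones(len(A)), A])      # intercept + coordinates
    coef, *_ = np.linalg.lstsq(Z, X, rcond=None)   # shape (n_dims + 1, J)
    resid = X - Z @ coef
    r2 = 1.0 - (resid ** 2).sum(axis=0) / (Xc ** 2).sum(axis=0)
    return A, coef[0], coef[1:].T, r2              # markers, b0, B (J x n_dims), R-squared

def scale_marker(b0, b, value):
    """Point on the axis with direction b whose prediction b0 + b'a equals `value`."""
    b = np.asarray(b, dtype=float)
    return (value - b0) * b / (b @ b)
```

All points of L whose projection onto b falls on the same mark predict the same value, which is why the prediction lines are parallel and perpendicular to the biplot axis.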
ALGORITHM
Interpolation
3.3 Geometry of logistic biplots
- If we fix the markers in A and fit a logistic model for a two-dimensional representation, we obtain a logistic response surface H. In this case the third axis shows a scale for the expected probabilities.
- Although the response surface is nonlinear, the intersections of H with the planes normal to the probability axis are also straight lines on H.
- The points in L predicting different probabilities are also on parallel straight lines.
- The direction of bj is given by (bj1, bj2), the parameter estimates of the logistic biplot.
- The biplot axis bj is completed with marks for the projection points predicting given probabilities; the main difference with the linear biplot is that equally spaced marks do not correspond to equally spaced probabilities.
- To simplify the graphical representation we propose to add marks for fixed values of the predictions, for example .25, .50 and .75. This looks like a symmetrical box-plot, and no labels are necessary.
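As a small sketch of how such marks could be placed, assume the fitted binary model for variable j is logit(p) = b0 + bj1*x + bj2*y; the intercept name `b0` and both function names are assumptions made for illustration. The point on the direction b that predicts probability p then sits at (logit(p) - b0) * b / (b'b).

```python
import numpy as np

def logistic_axis_markers(b0, b, probs=(0.25, 0.50, 0.75)):
    """Points on the direction b that predict each probability in `probs`."""
    b = np.asarray(b, dtype=float)
    probs = np.asarray(probs, dtype=float)
    logits = np.log(probs / (1.0 - probs))
    t = (logits - b0) / (b @ b)          # signed position along b
    return t[:, None] * b[None, :]       # one row per marker

def predicted_probability(b0, b, point):
    """Expected probability at a point (or rows of points) of the biplot."""
    b = np.asarray(b, dtype=float)
    return 1.0 / (1.0 + np.exp(-(b0 + np.dot(point, b))))

markers = logistic_axis_markers(-0.5, [1.0, 2.0])
```

Because logit(.25) = -logit(.75), the .25 and .75 marks are equidistant from the .50 mark, which gives the symmetrical box-plot appearance.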
1. LOGISTIC BIPLOT FOR NOMINAL DATA
Let X (I x J) be a data matrix containing the values of J categorical variables, each with K_j (j = 1, …, J) categories, for I individuals, and let G (I x L) be the corresponding indicator matrix with L columns, one per category. The last category of each variable is used as a baseline. Let p_i(jk) be the expected probability that category k of variable j is present for individual i. In the multinomial logistic latent trait model we assume that the log-odds of each response (relative to the last category) follow a linear model in the coordinates of the individuals, where a_is and b_(jk)s (i = 1, …, I; j = 1, …, J; k = 1, …, K_j - 1; s = 1, …, S) are the model parameters. In matrix form, with O (I x L) the matrix containing the expected odds, the model defines a biplot for the odds. Although the biplot for the odds may be useful, it is more interpretable in terms of predicted probabilities and categories.
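The displayed equations of this slide did not survive in the transcript. One form consistent with the surrounding definitions is sketched here; the intercept b_(jk)0 is an assumption, since the text lists only a_is and b_(jk)s as parameters.

```latex
\log\frac{p_{i(jk)}}{p_{i(jK_j)}} = b_{(jk)0} + \sum_{s=1}^{S} b_{(jk)s}\, a_{is},
\qquad k = 1, \dots, K_j - 1,

p_{i(jk)} = \frac{\exp\!\left(b_{(jk)0} + \sum_{s} b_{(jk)s}\, a_{is}\right)}
                 {1 + \sum_{l=1}^{K_j - 1} \exp\!\left(b_{(jl)0} + \sum_{s} b_{(jl)s}\, a_{is}\right)},
\qquad
p_{i(jK_j)} = \frac{1}{1 + \sum_{l=1}^{K_j - 1} \exp\!\left(b_{(jl)0} + \sum_{s} b_{(jl)s}\, a_{is}\right)}.
```

The second line inverts the log-odds to give the expected probabilities, with the baseline category taking the remaining probability mass.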
The points in L predicting different probabilities are no longer on parallel straight lines (see the figure with the response surfaces). Predictions on the logistic biplot are therefore not made in the same way as in linear biplots: the surfaces now define a prediction region for each category, as shown in the contour graph.
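A rough way to visualise the prediction regions of one nominal variable is to evaluate the category probabilities on a grid of the representation space and record the most probable category at each grid point. The sketch below assumes a baseline-category multinomial logistic model with intercepts `B0` and slopes `B`; both names and the parameter values in the usage line are hypothetical.

```python
import numpy as np

def category_probabilities(points, B0, B):
    """points: (n, S); B0: (K-1,); B: (K-1, S). The last category is the baseline."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    eta = np.asarray(B0) + points @ np.asarray(B).T        # log-odds vs baseline
    eta = np.column_stack([eta, np.zeros(len(points))])    # baseline log-odds = 0
    eta -= eta.max(axis=1, keepdims=True)                  # numerical stability
    expo = np.exp(eta)
    return expo / expo.sum(axis=1, keepdims=True)

def prediction_regions(B0, B, lim=3.0, n=61):
    """Most probable category at each point of a square grid on the biplot plane."""
    g = np.linspace(-lim, lim, n)
    xx, yy = np.meshgrid(g, g)
    grid = np.column_stack([xx.ravel(), yy.ravel()])
    labels = category_probabilities(grid, B0, B).argmax(axis=1)
    return xx, yy, labels.reshape(xx.shape)

# Hypothetical parameters for a variable with three categories.
xx, yy, regions = prediction_regions(np.zeros(2), np.array([[2.0, 0.0], [0.0, 2.0]]))
```

Plotting `regions` as a filled contour over `xx`, `yy` would give a picture in the spirit of the contour graph mentioned above.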
3. LOGISTIC BIPLOT FOR ORDINAL DATA
Let X (I x J) be a data matrix containing the values of J ordinal variables, each with K_j (j = 1, …, J) categories, for I individuals, and let G (I x L) be the cumulative indicator matrix with L columns. The last category of each variable is used as a baseline. Let p_i(j≤k) = P(x_ij ≤ k). The ordinal logistic latent trait model for the cumulative probabilities is linear in the logit scale, so its equations define a biplot that shares the geometry of the binary case for each category. Observe that each category has a different constant but the same slopes; the prediction direction is therefore common to all categories and only the prediction markers differ. The parameters b define the direction of the projection; the representation subspace can be divided into prediction regions, one for each category, delimited by parallel straight lines.
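The model equation itself is missing from the transcript. A cumulative logit form consistent with the description (one constant per category, common slopes) is given below as a sketch; the symbol d_jk for the constants and the sign convention are assumptions.

```latex
\operatorname{logit}\, p_{i(j \le k)}
  = \log\frac{p_{i(j \le k)}}{1 - p_{i(j \le k)}}
  = d_{jk} + \sum_{s=1}^{S} b_{js}\, a_{is},
\qquad k = 1, \dots, K_j - 1,

p_{i(jk)} = p_{i(j \le k)} - p_{i(j \le k-1)},
\qquad p_{i(j \le 0)} = 0, \quad p_{i(j \le K_j)} = 1.
```

The second line recovers the expected probability of each category from the cumulative probabilities.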
Ordinal Data: Cumulative probabilities
Ordinal Data: Expected probabilities
Ordinal Data: Back Projection onto the biplot
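The three steps named in these slide titles can be sketched as follows, assuming a cumulative logit model logit P(x ≤ k) = d_k + b'a with increasing constants d_k. The sign convention, the grid search for the category boundaries and all function names are assumptions, not the author's procedure.

```python
import numpy as np

def ordinal_probabilities(z, d):
    """z: projections b'a; d: increasing constants (K-1,). Returns P(x = k), k = 1..K."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    d = np.asarray(d, dtype=float)
    # Cumulative probabilities P(x <= k) for k = 1..K-1.
    cum = 1.0 / (1.0 + np.exp(-(d[None, :] + z[:, None])))
    # Pad with P(x <= 0) = 0 and P(x <= K) = 1, then difference.
    cum = np.column_stack([np.zeros(len(z)), cum, np.ones(len(z))])
    return np.diff(cum, axis=1)

def back_projection(b, d, lim=6.0, n=1201):
    """Points on the direction b where the most probable category changes."""
    b = np.asarray(b, dtype=float)
    z = np.linspace(-lim, lim, n)
    pred = ordinal_probabilities(z, d).argmax(axis=1)
    change = np.nonzero(np.diff(pred))[0]
    z_cut = 0.5 * (z[change] + z[change + 1])      # midpoint of the grid cell
    return z_cut[:, None] * b[None, :] / (b @ b)   # point a with b'a = z_cut

cuts = back_projection([1.0, 0.0], [-1.0, 1.0])
```

Lines through the returned points and perpendicular to b delimit the prediction regions. A category that is never the most probable one produces no region of its own in this sketch.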
4. PARAMETER ESTIMATION
- Alternating generalized regressions and interpolations (Maximum Likelihood).
- Marginal Maximum Likelihood (as in Item Response Theory).
- Separation problem (Maximum Likelihood does not converge): Penalized Maximum Likelihood.
- Iterative Majorization.
- Other methods.
- Heuristic approach for big data matrices: External Logistic Biplots (logistic fits on the Principal Coordinates).
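To make the last two ideas concrete, here is a minimal sketch of an external logistic biplot for binary data: principal coordinates are computed first, and a logistic regression is then fitted for each variable on those coordinates. A ridge penalty on the slopes stands in for penalized maximum likelihood so that the fit stays finite under separation. The simple matching coefficient, the distance d^2 = 1 - s, the ridge form of the penalty and all function names are assumptions made for illustration.

```python
import numpy as np

def principal_coordinates(X, n_dims=2):
    """Classical scaling of the rows of a binary matrix (simple matching, assumed)."""
    X = np.asarray(X, dtype=float)
    n, J = X.shape
    S = (X @ X.T + (1 - X) @ (1 - X).T) / J     # share of matching variables
    D2 = 1.0 - S                                # squared distances (assumed form)
    C = np.eye(n) - np.ones((n, n)) / n         # centring matrix
    G = -0.5 * C @ D2 @ C                       # double-centred inner products
    evals, evecs = np.linalg.eigh(G)
    top = np.argsort(evals)[::-1][:n_dims]
    return evecs[:, top] * np.sqrt(np.clip(evals[top], 0.0, None))

def ridge_logistic(A, y, penalty=0.5, n_iter=50):
    """Newton-Raphson fit of logit(p) = b0 + A b with a ridge penalty on b only."""
    Z = np.column_stack([np.ones(len(y)), A])
    P = penalty * np.eye(Z.shape[1])
    P[0, 0] = 0.0                               # the intercept is not penalised
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (y - p) - P @ beta
        hess = Z.T @ (Z * (p * (1 - p))[:, None]) + P
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def external_logistic_biplot(X, n_dims=2, penalty=0.5):
    """Row markers A and, per variable, the vector (b0, b1, ..., b_n_dims)."""
    X = np.asarray(X, dtype=float)
    A = principal_coordinates(X, n_dims)
    B = np.array([ridge_logistic(A, X[:, j], penalty) for j in range(X.shape[1])])
    return A, B
```

Each variable is fitted separately once A is fixed, so the cost grows linearly with the number of variables, which is what makes this route attractive for big data matrices.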
5. APPLICATIONS
- HAPMAP data.
- Sugar cane germplasm.
- Innovation profiles in Portugal.
- Irregular workforce in Spain.
APPLICATION TO HAPMAP DATA
Whole set / Bonferroni selection / R2 selection
Interpretation of the biplot
Characterization of the groups
http://biplot.usal.es/ClassicalBiplot/index.html
MULTBIPLOT (MULTivariate analysis using BIPLOTs)
References
BAKER, F. B. (1992). Item Response Theory: Parameter Estimation Techniques. Marcel Dekker, New York.
GABRIEL, K. R. (1998). Generalised bilinear regression. Biometrika, 85: 689-700.
GOWER, J. C. & HAND, D. (1996). Biplots. Chapman & Hall, London.
DEMEY, J., VICENTE-VILLARDÓN, J. L., GALINDO, M. P. & ZAMBRANO, A. (2008). Identifying Molecular Markers Associated With Classification Of Genotypes Using External Logistic Biplots. Bioinformatics, 24(24): 2832-2838.
PATINO, M. C., VICENTE, P., VICENTE-VILLARDÓN, J. L. & GALINDO, P. (2010). Identifying Immigrant Women Patterns by External Logistic Biplots. Sociological Methodology (under review).
VERBOON, P. & HEISER, W. J. (1994). Resistant Lower Rank Approximation of Matrices by Iterative Majorization. Computational Statistics & Data Analysis, 18: 457-467.
VICENTE, M. P., NORONHA, T. & NIJKAMP, P. (2011). Institutional capacity to dynamically innovate: an application to the Portuguese case. Technological Forecasting & Social Change, 18(1): 3-12.
VICENTE-VILLARDÓN, J. L., GALINDO, M. P. & BLAZQUEZ, A. (2006). Logistic Biplots. In "Multiple Correspondence Analysis and Related Methods". Greenacre, M. & Blasius, J. (Eds.). Chapman and Hall, Boca Raton.
ZAMBRANO, A. Y., MARTÍNEZ, G., GUTIÉRREZ, E., MANZANILLA, E., VICENTE-VILLARDÓN, J. L. & DEMEY, J. (2007). Marcador RAPD asociado a la resistencia a Fusarium oxysporum en Musa. Interciencia, 32(11): 775-779.