

SLIDE 1

LOGISTIC BIPLOTS FOR BINARY, NOMINAL AND ORDINAL DATA

José Luis Vicente Villardón Universidad de Salamanca. Spain. villardon@usal.es http://biplot.usal.es

SLIDE 2

SUMMARY

Classical biplot methods allow for the simultaneous representation of individuals and continuous variables in a given data matrix. When variables are binary, nominal or ordinal, a classical linear biplot representation is not suitable. We propose a linear biplot representation based on logistic response models. The coordinates of individuals and variables are computed to have logistic responses along the biplot dimensions. The method is related to logistic regression in the same way that Classical Biplot Analysis (CBA) is related to linear regression; thus we refer to the method as Logistic Biplot (LB). In the same way that linear biplots are related to Principal Components Analysis, logistic biplots are related to Latent Trait Analysis or Item Response Theory. The geometry of these kinds of biplots is studied. For nominal data, the linear biplot results in a partition of the representation that divides the space into a prediction region for each category; for ordinal data, we obtain a prediction direction with points separating each category. The usefulness of the proposal is illustrated using data on SNPs (Single Nucleotide Polymorphisms) from the HAPMAP project.

SLIDE 3

1.- INTRODUCTION
2.- CLASSICAL BIPLOT
2.1.- Linear biplot based on alternating regressions/interpolations
2.2.- Geometry of linear regression biplots
3.- LOGISTIC BIPLOT FOR BINARY DATA
3.1.- Formulation
3.2.- Parameter Estimation
3.3.- Geometry of logistic biplots
4.- LOGISTIC BIPLOT FOR CATEGORICAL & ORDINAL DATA
4.1.- Formulation
4.2.- Parameter Estimation
4.3.- Geometry of logistic biplots
5.- APPLICATION: HAPMAP

SLIDE 4

1.- INTRODUCTION

  • Item Response Theory (BAKER, 1992)

CONTINUOUS DATA

Biplot (GABRIEL, 1971)

  • Alternating regressions (GABRIEL & ZAMIR, 1979)
  • Regression/calibration
  • Interpolation (GOWER & HAND, 1996)

CATEGORICAL DATA

  • Multiple Correspondence Analysis (MCA)
  • Prediction regions (GOWER & HAND, 1996)
  • Generalised bilinear models (FALGUEROLLES et al., 1995)
  • Segmented bilinear models (GABRIEL, 1999)

LOGISTIC BIPLOT

SLIDE 5

2.- CLASSICAL BIPLOTS

SLIDE 6
SLIDE 7

2.2 Geometry of linear regression biplots

SLIDE 8
  • Let L be the space spanned by the columns of A, usually two-dimensional. We complete L with a third dimension for the j-th variable and fit the regression plane.
  • A linear response surface in the three-dimensional space is obtained. Let us call it H.
  • The set of points in H predicting a fixed value is given by the intersection between H and the plane normal to the third axis passing through the fixed value. That intersection is a straight line.
  • The points in L predicting different values of the variable also lie on parallel straight lines.
  • The biplot axis can be completed with scales.
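
The geometry above can be illustrated numerically. The following is a minimal sketch, assuming simulated row markers `A` and a simulated variable `x` (both hypothetical, not taken from the slides): it fits the regression plane and places scale marks on the biplot axis at the points whose predictions equal chosen values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 2))   # hypothetical row markers spanning L
x = 3.0 + A @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=50)  # simulated variable

# Regression of the variable on the two biplot dimensions (the plane H)
design = np.column_stack([np.ones(len(A)), A])
b0, b1, b2 = np.linalg.lstsq(design, x, rcond=None)[0]
b = np.array([b1, b2])         # direction of the biplot axis

def marker(value):
    """Point on the biplot axis (direction b) whose prediction equals `value`."""
    return (value - b0) * b / (b @ b)

# Scale marks for a few fixed values of the variable
marks = {v: marker(v) for v in (1.0, 2.0, 3.0, 4.0, 5.0)}
```

In this linear case equally spaced values give equally spaced marks along the axis, and every point on the line through a mark perpendicular to `b` predicts the same value.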
SLIDE 9

The R² of each regression is interpreted as a measure of the "quality of the representation" of the variable, in the sense commonly used in Correspondence Analysis.
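
As a small illustration of this quality measure, the sketch below computes the R² of each column regression on the biplot dimensions. The markers `A`, the coefficients `B_true` and the data `X` are simulated for the example and are assumptions, not data from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(40, 2))                    # hypothetical row markers
B_true = np.array([[2.0, -1.0], [0.1, 0.1]])    # one well-represented variable, one poorly represented
X = A @ B_true.T + rng.normal(size=(40, 2))     # simulated continuous variables

# One least-squares regression per column of X on the biplot dimensions
design = np.column_stack([np.ones(len(A)), A])
coef, *_ = np.linalg.lstsq(design, X, rcond=None)
residuals = X - design @ coef

ss_res = (residuals ** 2).sum(axis=0)
ss_tot = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
r2 = 1.0 - ss_res / ss_tot                      # quality of representation per variable
```

A variable with R² close to 1 is well predicted from the plane; a variable with R² close to 0 should not be interpreted on this biplot.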

SLIDE 10
SLIDE 11
SLIDE 12
SLIDE 13

ALGORITHM

SLIDE 14

Interpolation

SLIDE 15
  • If we fix the markers in A and fit a logistic model for a two-dimensional representation, we obtain a logistic response surface H. In this case the third axis shows a scale for the expected probabilities.
  • Although the response surface is nonlinear, the intersections of H with the planes normal to the probability axis are also straight lines on H.
  • The points in L predicting different probabilities also lie on parallel straight lines.
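
The reason the level sets stay straight can be written out in one step. This is a sketch assuming an intercept, written here as b_{j0} (a symbol not shown in the extracted slides), and generic coordinates (x, y) in L:

```latex
\operatorname{logit}(p_{ij}) = \log\frac{p_{ij}}{1-p_{ij}}
  = b_{j0} + b_{j1}\,a_{i1} + b_{j2}\,a_{i2}
\qquad\Longrightarrow\qquad
\{(x,y) : p = p^{*}\} = \{(x,y) : b_{j1}\,x + b_{j2}\,y = \operatorname{logit}(p^{*}) - b_{j0}\}
```

For every fixed probability p* the set is a straight line with normal vector (b_{j1}, b_{j2}), so lines for different probabilities are parallel; only their spacing changes, because logit(p*) is not linear in p*.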
SLIDE 16
  • The direction of bj is given by (bj1, bj2), the parameter estimates of the logistic biplot.
  • The biplot axis bj is completed with marks for the projection points predicting given probabilities; the main difference from the linear biplot is that equally spaced marks do not correspond to equally spaced probabilities.
  • To simplify the graphical representation we propose to add marks for fixed values of the predictions, for example .25, .50 and .75. This will look like a symmetrical box-plot and no labels are necessary.
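
The marks described above can be computed directly from the fitted parameters. A minimal sketch, assuming an intercept `b0` and hypothetical estimates for (bj1, bj2); the numeric values are invented for illustration:

```python
import numpy as np

b0 = -0.4                      # hypothetical intercept of variable j
b = np.array([1.2, 0.8])       # hypothetical slopes (bj1, bj2)

def logit(p):
    return np.log(p / (1.0 - p))

def prob_marker(p):
    """Point on the biplot axis whose predicted probability equals p."""
    return (logit(p) - b0) * b / (b @ b)

def predicted_prob(point):
    """Predicted probability for a point of the representation."""
    return 1.0 / (1.0 + np.exp(-(b0 + point @ b)))

marks = {p: prob_marker(p) for p in (0.25, 0.50, 0.75)}
```

Because logit(.25) = -logit(.75), the .25 and .75 marks sit at equal distances on either side of the .50 mark, which is what gives the box-plot-like appearance.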

SLIDE 17
SLIDE 18
SLIDE 19
SLIDE 20
  • 1. LOGISTIC BIPLOT FOR NOMINAL DATA

Let X (I × J) be a data matrix containing the values of J categorical variables, each with Kj (j = 1, …, J) categories, for I individuals, and let G (I × L) be the corresponding indicator matrix with L columns. The last category of each variable will be used as a baseline. Let pi(jk) be the expected probability that category k of variable j is present for individual i. In the multinomial logistic latent trait model we assume that the log-odds of each response (relative to the last category) follow a linear model in the parameters ais and b(jk)s (i = 1, …, I; j = 1, …, J; k = 1, …, Kj-1; s = 1, …, S).

In matrix form, with O (I × L) the matrix containing the expected odds, the model defines a biplot for the odds. Although the biplot for the odds may be useful, it is more interpretable in terms of predicted probabilities and categories.
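
A form of the model consistent with the definitions above can be written as follows. This is a sketch under assumptions: the intercept b_{(jk)0}, the vector b_0 of intercepts and the reading of O on the log-odds scale are not shown in the extracted slides.

```latex
\log\frac{p_{i(jk)}}{p_{i(jK_j)}} = b_{(jk)0} + \sum_{s=1}^{S} b_{(jk)s}\,a_{is},
\qquad k = 1,\dots,K_j-1

p_{i(jk)} = \frac{\exp\bigl(b_{(jk)0} + \sum_{s} b_{(jk)s}\,a_{is}\bigr)}
                 {1 + \sum_{l=1}^{K_j-1}\exp\bigl(b_{(jl)0} + \sum_{s} b_{(jl)s}\,a_{is}\bigr)}

\mathbf{O} = \mathbf{1}\,\mathbf{b}_0^{\top} + \mathbf{A}\,\mathbf{B}^{\top}
```

The last line is the bilinear (biplot) decomposition: rows of A are the markers of the individuals and rows of B are the markers of the categories.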

SLIDE 21

The points in L predicting different probabilities are no longer on parallel straight lines (see the figure with the response surfaces). Predictions on the logistic biplot are therefore not made in the same way as in the linear biplot: the surfaces now define prediction regions for each category, as shown in the contour graph.
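
The prediction regions can be reproduced numerically by assigning each point of the plane to its most probable category. A minimal sketch for a single nominal variable with three categories; the parameter matrix `B` is hypothetical and the last category is the baseline:

```python
import numpy as np

# Hypothetical parameters: one row per non-baseline category,
# columns are (intercept, slope on dimension 1, slope on dimension 2).
B = np.array([[0.5, 2.0, 0.0],
              [-0.5, 0.0, 2.0]])

def category_probs(points):
    """Multinomial logistic probabilities; the baseline category has log-odds 0."""
    eta = B[:, 0] + points @ B[:, 1:].T
    eta = np.column_stack([eta, np.zeros(len(points))])
    eta -= eta.max(axis=1, keepdims=True)        # numerical stability
    e = np.exp(eta)
    return e / e.sum(axis=1, keepdims=True)

# Evaluate the model on a grid covering the representation space
grid = np.linspace(-3.0, 3.0, 61)
xx, yy = np.meshgrid(grid, grid)
points = np.column_stack([xx.ravel(), yy.ravel()])
probs = category_probs(points)
regions = probs.argmax(axis=1).reshape(xx.shape)  # predicted category at each grid point
```

Plotting `regions` as a filled contour gives the partition of the plane into one prediction region per category; the boundaries between regions are straight segments because they are the sets where two linear log-odds are equal.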

SLIDE 22


SLIDE 23


SLIDE 24


SLIDE 25
  • 3. LOGISTIC BIPLOT FOR ORDINAL DATA

Let X (I × J) be a data matrix containing the values of J ordinal variables, each with Kj (j = 1, …, J) categories, for I individuals, and let G (I × L) be the cumulative indicator matrix with L columns. The last category of each variable will be used as a baseline. Let pi(j≤k) = P(xij ≤ k). The ordinal logistic latent trait model is formulated for these cumulative probabilities. Its equations define a biplot in the logit scale that shares the geometry of the binary case for each category. Observe that each category has a different constant but the same slopes; this means that the prediction direction is common to all categories and only the prediction markers differ. The parameters b define the direction of the projection; the representation subspace can be divided into prediction regions, one for each category, delimited by parallel straight lines.
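
One way to write the cumulative model described here is given below. This is a sketch under assumptions: the threshold symbol d_{jk} and the sign convention are not shown in the extracted slides and may differ from the author's.

```latex
\operatorname{logit}\bigl(p_{i(j\le k)}\bigr)
  = \log\frac{P(x_{ij}\le k)}{1-P(x_{ij}\le k)}
  = d_{jk} + \sum_{s=1}^{S} b_{js}\,a_{is},
\qquad k = 1,\dots,K_j-1

p_{i(jk)} = p_{i(j\le k)} - p_{i(j\le k-1)},
\qquad p_{i(j\le 0)} = 0,\quad p_{i(j\le K_j)} = 1
```

The constants d_{jk} change with the category while the slopes b_{js} do not, which is why all categories of a variable share one prediction direction.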

SLIDE 26

Ordinal Data: Cumulative probabilities

SLIDE 27

Ordinal Data: Expected probabilities

SLIDE 28

Ordinal Data: Back Projection onto the biplot
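
One plausible reading of the back projection, offered as a sketch rather than as the method shown in the missing figure: walk along the common prediction direction, compute the expected category probabilities, and record where the most probable category changes. The thresholds `d`, the slopes `b` and the sign convention (cumulative logit = d_k + b·a) are all assumptions made for this example.

```python
import numpy as np

d = np.array([-2.0, 0.5, 2.5])   # hypothetical increasing thresholds (4 categories)
b = np.array([1.0, 1.0])         # hypothetical slopes shared by all categories

def category_probs(points):
    """Expected category probabilities from the cumulative logistic model."""
    eta = d + (points @ b)[:, None]              # cumulative logits, one column per threshold
    cum = 1.0 / (1.0 + np.exp(-eta))
    cum = np.column_stack([np.zeros(len(points)), cum, np.ones(len(points))])
    return np.diff(cum, axis=1)                  # differences of cumulative probabilities

# Points along the prediction direction through the origin
z = np.linspace(-5.0, 5.0, 2001)
unit = b / np.linalg.norm(b)
line_points = z[:, None] * unit

predicted = category_probs(line_points).argmax(axis=1)
change = np.nonzero(np.diff(predicted))[0]       # grid positions where the prediction changes
cut_points = line_points[change]                 # approximate separating points on the axis
```

The lines through each cut point perpendicular to `b` delimit the prediction regions on the biplot. With other parameter values some category may never be the most probable one, in which case fewer cut points appear.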

SLIDE 29
  • 4. PARAMETER ESTIMATION
  • Alternating generalized regressions and interpolations (Maximum Likelihood).
  • Marginal Maximum Likelihood (as in Item Response Theory).
  • Separation problem (Maximum Likelihood does not converge): Penalized Maximum Likelihood.
  • Iterative Majorization.
  • Other methods.
  • Heuristic approach for big data matrices: External Logistic Biplots (logistic fits on the Principal Coordinates).
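
The first item of the estimation list, combined with a penalty to cope with separation, might be sketched as follows for binary data. This is not the author's implementation: the ridge penalty (applied to the intercepts as well, for brevity), the SVD start, the Newton steps with step halving and all function names are assumptions chosen for illustration, and the data are simulated.

```python
import numpy as np

def sigmoid(z):
    return 0.5 * (1.0 + np.tanh(0.5 * z))   # numerically stable logistic function

def pen_loglik(M, y, beta, offset, lam):
    """Bernoulli log-likelihood minus a ridge penalty on beta."""
    eta = offset + M @ beta
    return np.sum(y * eta - np.logaddexp(0.0, eta)) - 0.5 * lam * beta @ beta

def ridge_logistic(M, y, beta, offset, lam, n_steps=5):
    """A few Newton steps, with step halving, for one ridge-penalized logistic fit."""
    for _ in range(n_steps):
        p = sigmoid(offset + M @ beta)
        grad = M.T @ (y - p) - lam * beta
        hess = (M * (p * (1.0 - p))[:, None]).T @ M + lam * np.eye(M.shape[1])
        step = np.linalg.solve(hess, grad)
        old, t = pen_loglik(M, y, beta, offset, lam), 1.0
        while t > 1e-8 and pen_loglik(M, y, beta + t * step, offset, lam) < old:
            t *= 0.5                         # shrink the step until the fit does not get worse
        if t > 1e-8:
            beta = beta + t * step
    return beta

def objective(X, A, B, lam):
    """Penalized log-likelihood of the whole bilinear model."""
    eta = B[:, 0] + A @ B[:, 1:].T
    fit = np.sum(X * eta - np.logaddexp(0.0, eta))
    return fit - 0.5 * lam * (np.sum(A ** 2) + np.sum(B ** 2))

def logistic_biplot(X, n_dims=2, lam=1.0, n_outer=15):
    n_rows, n_cols = X.shape
    U, d, _ = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    A = U[:, :n_dims] * d[:n_dims]           # starting row markers
    B = np.zeros((n_cols, n_dims + 1))       # column 0 holds the intercepts
    history = []
    for _ in range(n_outer):
        design = np.column_stack([np.ones(n_rows), A])
        for j in range(n_cols):              # regression step: A fixed, update b_j
            B[j] = ridge_logistic(design, X[:, j], B[j], np.zeros(n_rows), lam)
        for i in range(n_rows):              # interpolation step: B fixed, update a_i
            A[i] = ridge_logistic(B[:, 1:], X[i], A[i], B[:, 0], lam)
        history.append(objective(X, A, B, lam))
    return A, B, history

# Simulated binary data from a two-dimensional latent structure
rng = np.random.default_rng(0)
A_true = rng.normal(size=(60, 2))
B_true = 2.0 * rng.normal(size=(8, 2))
X = (rng.random((60, 8)) < sigmoid(A_true @ B_true.T)).astype(float)
A_hat, B_hat, history = logistic_biplot(X)
```

Each inner fit can only keep or improve its own penalized log-likelihood, so `history` is non-decreasing. The penalty keeps the estimates finite even when a row or column is perfectly separated, which is the situation where plain maximum likelihood fails to converge.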

  • 5. APPLICATIONS
  • HAPMAP data.
  • Sugar cane germplasm.
  • Innovation profiles in Portugal.
  • Irregular workforce in Spain.
SLIDE 30

APPLICATION TO HAPMAP DATA

SLIDE 31
SLIDE 32
SLIDE 33

Panels: Whole Set | Bonferroni Selection | R2 Selection

SLIDE 34

Interpretation of the biplot

SLIDE 35

Characterization of the groups

SLIDE 36

http://biplot.usal.es/ClassicalBiplot/index.html

MULTBIPLOT (MULTivariate analysis using BIPLOTs)

SLIDE 37
SLIDE 38
SLIDE 39
SLIDE 40
SLIDE 41

References

  • BAKER, F. B. (1992). Item Response Theory: Parameter Estimation Techniques. Marcel Dekker, New York.
  • GABRIEL, K. R. (1998). Generalised bilinear regression. Biometrika, 85: 689-700.
  • GOWER, J. C. & HAND, D. (1996). Biplots. Chapman & Hall, London.
  • DEMEY, J., VICENTE-VILLARDÓN, J. L., GALINDO, M. P. & ZAMBRANO, A. (2008). Identifying Molecular Markers Associated with Classification of Genotypes Using External Logistic Biplots. Bioinformatics, 24(24): 2832-2838.
  • PATINO, M. C., VICENTE, P., VICENTE-VILLARDÓN, J. L. & GALINDO, P. (2010). Identifying Immigrant Women Patterns by External Logistic Biplots. Sociological Methodology (under review).
  • VERBOON, P. & HEISER, W. J. (1994). Resistant Lower Rank Approximation of Matrices by Iterative Majorization. Computational Statistics & Data Analysis, 18: 457-467.
  • VICENTE, M. P., NORONHA, T. & NIJKAMP, P. (2011). Institutional capacity to dynamically innovate: an application to the Portuguese case. Technological Forecasting & Social Change, 18(1): 3-12.
  • VICENTE-VILLARDÓN, J. L., GALINDO, M. P. & BLAZQUEZ, A. (2006). Logistic Biplots. In Greenacre, M. & Blasius, J. (Eds.), Multiple Correspondence Analysis and Related Methods. Chapman and Hall, Boca Raton.
  • ZAMBRANO, A. Y., MARTÍNEZ, G., GUTIÉRREZ, E., MANZANILLA, E., VICENTE-VILLARDÓN, J. L. & DEMEY, J. (2007). Marcador RAPD asociado a la resistencia a Fusarium oxysporum en Musa. Interciencia, 32(11): 775-779.

GRACIAS – MERCI – THANK YOU