Extensions of CCA and PLS to unravel relationships between two data sets

SLIDE 1

Extensions of CCA and PLS to unravel relationships between two data sets

S. Déjean (1), I. González (2), K-A. Lê Cao (3)

  • 1. Institut de Mathématiques de Toulouse, UMR 5219, Université de Toulouse et CNRS, sebastien.dejean@math.univ-toulouse.fr
  • 2. Plateforme Biopuces, Genopôle Toulouse Midi-Pyrénées, Institut National des Sciences Appliquées, ignacio.gonzalez@insa-toulouse.fr
  • 3. ARC Centre of Excellence in Bioinformatics, Institute for Molecular Bioscience, University of Queensland, Australia, k.lecao@uq.edu.au

UseR Conference 2009, Rennes, July 8-10

SLIDE 2

UseR Conference 2009, Rennes, July 8-10

  • S. Déjean, I. González, K-A Lê Cao – integrOmics


History

Once upon a time in Toulouse, a city in the south-west of France, two groups of scientists lived side by side without talking to each other. But one day they decided to talk, and to work together. They supervised Ph.D. students, wrote articles and built R packages...

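[Figure: the « Stat » group, illustrated by formulas such as $\frac{1}{n}\sum_{i=1}^{n} X_i$ and $(X'X)^{-1}X'Y$, meets the « Bio » group, illustrated by DNA, RNA and sequences such as ATGCC and TACCAGT. Labels shown: ofw, CCA, integrOmics; SAGMB, BMC Bioinformatics, J. Biol. Syst., J. Stat. Soft.]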

SLIDE 3


Why integrOmics?

Biological system: Transcript-omics, Prote-omics, Metabol-omics, Lipid-omics, Gen-omics, ...-omics

  • Each « -omics » data set can be studied separately, but
  • A large part of the relevant information can be extracted from the joint analysis of two or more data sets, so

⇒ Integrate omics data project, in short: integrOmics

math.univ-toulouse.fr/biostat

Methodology, Applications, Software

SLIDE 4


Methodology

CCA / regularization

  • Maximize the correlation between linear combinations of variables in X and Y
  • Requires inversion of XX' and YY'
  • Regularization of CCA

PLS / selection

  • Maximize the covariance between linear combinations of variables in X and Y
  • Selection obtained through Lasso penalization of the loading vectors

Two ways to deal with the 'large p - small n' problem in the classical framework provided by Canonical Correlation Analysis and Partial Least Squares regression.
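The two criteria can be written compactly as follows. This is a sketch in standard notation, not a transcription of the slide: the weight vectors a, b, u, v and the penalty functions P are names assumed here, samples are taken to be the rows of X and Y, and the sparse PLS line is one common formulation rather than necessarily the exact one used in integrOmics.

```latex
% CCA: first pair of canonical vectors maximizes the correlation
(a_1, b_1) = \arg\max_{a \in \mathbb{R}^p,\; b \in \mathbb{R}^q} \operatorname{cor}(Xa,\, Yb)

% PLS: same linear combinations, but covariance under unit-norm constraints
(u_1, v_1) = \arg\max_{\|u\| = \|v\| = 1} \operatorname{cov}(Xu,\, Yv)

% Sparse PLS (one common formulation): Lasso-type penalties on the loading vectors
\min_{u,\, v} \; \| X'Y - u v' \|_F^2 + P_{\lambda_1}(u) + P_{\lambda_2}(v)
```

The correlation criterion is scale-free, which is why it needs the inverse covariance matrices; the covariance criterion does not, which is why PLS remains computable when p exceeds n.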

[Figure: data matrices X (n × p) and Y (n × q)]

Regularization: $(XX')^{-1} \Rightarrow (XX' + \lambda I)^{-1}$
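The effect of the ridge term can be tried out with a few lines of base R. The sketch below is not the integrOmics implementation: the function name `ridge_cca` and its arguments `lambda1` and `lambda2` are hypothetical, and it assumes samples are stored in rows, so the matrices the slide writes as XX' and YY' are computed here as scaled versions of `crossprod(X)` and `crossprod(Y)`.

```r
# Ridge-regularized CCA in base R (illustrative sketch, hypothetical names).
# X: n x p matrix, Y: n x q matrix, samples in rows (assumption).
ridge_cca <- function(X, Y, lambda1 = 0, lambda2 = 0) {
  X <- scale(X, center = TRUE, scale = FALSE)
  Y <- scale(Y, center = TRUE, scale = FALSE)
  n <- nrow(X)
  # Covariance blocks, with the ridge term added on the diagonal
  Cxx <- crossprod(X) / (n - 1) + lambda1 * diag(ncol(X))
  Cyy <- crossprod(Y) / (n - 1) + lambda2 * diag(ncol(Y))
  Cxy <- crossprod(X, Y) / (n - 1)
  # Cholesky factors: Cxx = t(Rx) %*% Rx, Cyy = t(Ry) %*% Ry
  Rx <- chol(Cxx)
  Ry <- chol(Cyy)
  # K = Rx^{-T} Cxy Ry^{-1}; its singular values are the canonical correlations
  K <- backsolve(Rx, Cxy, transpose = TRUE)
  K <- t(backsolve(Ry, t(K), transpose = TRUE))
  s <- svd(K)
  list(cor   = s$d,
       xcoef = backsolve(Rx, s$u),   # canonical weights for X
       ycoef = backsolve(Ry, s$v))   # canonical weights for Y
}

# Usage: with n > p and no penalty this should agree with stats::cancor
set.seed(1)
X <- matrix(rnorm(50 * 3), 50, 3)
Y <- matrix(rnorm(50 * 2), 50, 2)
res <- ridge_cca(X, Y)

# 'Large p - small n' case: a positive penalty makes the factorization possible
Xw <- matrix(rnorm(10 * 30), 10, 30)
Yw <- matrix(rnorm(10 * 20), 10, 20)
res_w <- ridge_cca(Xw, Yw, lambda1 = 0.1, lambda2 = 0.1)
```

With p or q larger than n and both penalties at zero, the covariance blocks are singular, so the Cholesky step is expected to fail or to be numerically meaningless; that is the situation the regularization is meant to handle.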

SLIDE 5


Applications

  • I. González, S. Déjean, P.G.P. Martin, O. Gonçalves, P. Besse, A. Baccini (2009). Highlighting Relationships Between Heterogeneous Biological Data Through Graphical Displays Based On Regularized Canonical Correlation Analysis. Journal of Biological Systems, 17(2), 173-199
  • E. Yergeau, S.A. Schoondermark-Stolk, E.L. Brodie, S. Déjean, T.Z. DeSantis, O. Gonçalves, Y.M. Piceno, G.L. Andersen and G.A. Kowalchuk (2008). Environmental microarray analyses of Antarctic soil microbial communities. The International Society for Microbial Ecology Journal, 3, 340-351
  • S. Combes, I. González, S. Déjean, A. Baccini, N. Jehl, H. Juin, L. Cauquil, B. Gabinaud, F. Lebas, C. Larzul (2008). Relationships between sensory and physicochemical measurements in meat of rabbit from three different breeding systems using canonical correlation analysis. Meat Science, 80(3), 835-841
  • K. A. Lê Cao, D. Rossouw, C. Robert-Granié, P. Besse (2008). A sparse PLS for variable selection when integrating Omics data. Statistical Applications in Genetics and Molecular Biology, 7(1), article 35

SLIDE 6


IntegrOmics R package

Software

SLIDE 7


Using integrOmics

  • Start from two matrices, X and Y
  • Preliminary view of the correlation matrices

R> imgCor(X, Y, type = "separate")

  • Classical CCA

R> res.rcc = rcc(X, Y)

  • Regularized CCA

R> res.rcc = rcc(X, Y, 0.05, 0.01)

  • PLS

R> res.pls = pls(X, Y)

  • Sparse PLS

R> res.pls = spls(X, Y, mode=c("regression", "canonical"),
+ keep.X=c(10, 10, 10), keep.Y=c(10, 10, 10))
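How arguments such as keep.X and keep.Y could translate into sparse loading vectors can be illustrated in base R. The sketch below is an assumption-laden illustration, not the integrOmics code: it computes only the first component, uses soft-thresholding with a threshold chosen so that a fixed number of loadings stay non-zero, and the names `soft_keep` and `sparse_pls_comp1` are hypothetical.

```r
# Soft-thresholding that keeps the `keep` largest |entries| of w
# (hypothetical helper; assumes no ties among the absolute values).
soft_keep <- function(w, keep) {
  if (keep >= length(w)) return(w)
  thr <- sort(abs(w), decreasing = TRUE)[keep + 1]  # largest magnitude set to zero
  sign(w) * pmax(abs(w) - thr, 0)
}

# First sparse PLS component by alternating soft-thresholded updates
# of the two loading vectors (illustrative sketch only).
sparse_pls_comp1 <- function(X, Y, keep_x, keep_y, max_iter = 500, tol = 1e-6) {
  X <- scale(X)
  Y <- scale(Y)
  M <- crossprod(X, Y)                 # p x q cross-product matrix X'Y
  s <- svd(M, nu = 1, nv = 1)          # start from the non-sparse PLS solution
  u <- s$u[, 1]
  v <- s$v[, 1]
  for (i in seq_len(max_iter)) {
    u_new <- soft_keep(drop(M %*% v), keep_x)
    u_new <- u_new / sqrt(sum(u_new^2))
    v_new <- soft_keep(drop(crossprod(M, u_new)), keep_y)
    v_new <- v_new / sqrt(sum(v_new^2))
    done <- sum((u_new - u)^2) < tol && sum((v_new - v)^2) < tol
    u <- u_new
    v <- v_new
    if (done) break
  }
  list(loadings_x = u, loadings_y = v,
       variates_x = drop(X %*% u), variates_y = drop(Y %*% v))
}

# Usage on simulated data: keep 10 X-variables and 5 Y-variables
set.seed(42)
Xs <- matrix(rnorm(20 * 50), 20, 50)
Ys <- matrix(rnorm(20 * 30), 20, 30)
fit <- sparse_pls_comp1(Xs, Ys, keep_x = 10, keep_y = 5)
```

Selecting by count rather than by penalty value is a design choice made here for readability: the number of retained variables is easier to reason about than a Lasso parameter.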

SLIDE 8


Graphical display

[Figure, two panels. Variables plot: axes Comp 1 and Comp 2, with points labelled by variable name (for example C14.0, C16.0, C22.6n.3, ACOTH, CYP4A10, PMDCI). Individuals plot: axes Dimension 1 and Dimension 2, with points labelled by diet (lin, sun, fish, ref, coc) and a legend WT / PPAR.]

R> plotVar(res.rcc, X.label=TRUE, Y.label=TRUE)

R> plotIndiv(res.rcc, ind.names=nutrimouse$diet)

SLIDE 9


Future work

  • Methodologies to deal with more than 2 data sets
  • Functional statistics to deal with metabolomics or proteomics data
  • Provide new graphical displays using graphs

[Figures: illustrative plots whose content is not recoverable, and a schematic labelled n, p1, p2, p3, p4]

SLIDE 10


Summary

[Timeline figure, 2003 to 2009, with the following milestones]

  • First steps of collaborations between biologists and statisticians in Toulouse around Omics technologies
  • First article « state of the art »
  • Ph.D I. González (A. Baccini, J.R. Léon)
  • Ph.D K-A. Lê Cao (P. Besse, C. Robert-Granié)
  • R packages: ofw, CCA, integrOmics
  • Conference Rennes