A Unified Regularized Group PLS Algorithm Scalable to Big Data - PowerPoint PPT Presentation

slide-1
SLIDE 1

A Unified Regularized Group PLS Algorithm Scalable to Big Data

Pierre Lafaye de Micheaux¹, Benoit Liquet², Matthew Sutton³
21 October, 2016

¹ CREST, ENSAI.
² Université de Pau et des Pays de l'Adour, LMAP.
³ Queensland University of Technology, Brisbane, Australia.

slide-2
SLIDE 2

Contents

  • 1. Motivation: Integrative Analysis for group data
  • 2. Application to an HIV vaccine study
  • 3. PLS approaches: SVD, PLS-W2A, canonical, regression
  • 4. Sparse Models
      ◮ Lasso penalty
      ◮ Group penalty
      ◮ Group and Sparse Group PLS
  • 5. R package: sgPLS
  • 6. Regularized PLS Scalable to BIG-DATA
  • 7. Concluding remarks

slide-3
SLIDE 3

Integrative Analysis

  • Wikipedia. Data integration "involves combining data residing in different sources and providing users with a unified view of these data. This process becomes significant in a variety of situations, which include both commercial and scientific domains."
  • Systems Biology. Integrative Analysis: analysis of heterogeneous types of data from inter-platform technologies.
  • Goal. Combine multiple types of data:
      ◮ Contribute to a better understanding of biological mechanisms.
      ◮ Have the potential to improve the diagnosis and treatment of complex diseases.

slide-4
SLIDE 4

Example: Data definition

  • X: an n × p matrix (n observations, p variables)
  • Y: an n × q matrix (n observations, q variables)

  ◮ "Omics." Y matrix: gene expression; X matrix: SNPs (single nucleotide polymorphisms). Many other types exist, such as proteomic and metabolomic data.
  ◮ "Neuroimaging." Y matrix: behavioral variables; X matrix: brain activity (e.g., EEG, fMRI, NIRS).
  ◮ "Neuroimaging Genetics." Y matrix: DTI (Diffusion Tensor Imaging); X matrix: SNPs.

slide-8
SLIDE 8

Data: Constraints and Aims

  ◮ Main constraint: collinearity among the variables, or situations with p > n or q > n. But p and q are supposed to be not too large.

  ◮ Two Aims:
      • 1. Symmetric situation. Analyze the association between two blocks of information. Analysis focused on shared information.
      • 2. Asymmetric situation. X matrix = predictors and Y matrix = response variables. Analysis focused on prediction.

  ◮ Partial Least Squares Family: dimension reduction approaches.
  ◮ PLS finds pairs of latent vectors ξ = Xu, ω = Yv with maximal covariance, e.g., ξ = u1 × SNP1 + u2 × SNP2 + · · · + up × SNPp.
  ◮ Handles both the symmetric and the asymmetric situation.
  ◮ Matrix decomposition of X and Y into successive latent variables. Latent variables are not directly observed but are rather inferred (through a mathematical model) from other variables that are observed (directly measured). They capture an underlying phenomenon (e.g., health).

slide-13
SLIDE 13

PLS and sparse PLS

Classical PLS

  ◮ Output of PLS: H pairs of latent variables (ξh, ωh), h = 1, . . . , H.
  ◮ Reduction method (H ≪ min(p, q)). But no variable selection for extracting the most relevant (original) variables from each latent variable.

sparse PLS

  ◮ sparse PLS selects the relevant SNPs.
  ◮ Some coefficients uℓ are equal to 0:

      ξh = u1 × SNP1 + u2 × SNP2 + u3 × SNP3 + · · · + up × SNPp,  with u2 = u3 = 0.

  ◮ The sPLS components are linear combinations of the selected variables.

slide-15
SLIDE 15

Group structures within the data

  ◮ Natural example: categorical variables form a group of dummy variables in a regression setting.

  ◮ Genomics: genes within the same pathway have similar functions and act together in regulating a biological system.
      ֒→ These genes can add up to have a larger effect,
      ֒→ and so can be detected as a group (i.e., at a pathway or gene set/module level).

We consider that the variables are divided into groups:

  ◮ Example: p SNPs grouped into K genes (Xj = SNPj):

      X = [ SNP1, . . . , SNPk (gene1) | SNPk+1, SNPk+2, . . . , SNPh (gene2) | . . . | SNPl+1, . . . , SNPp (geneK) ]

  ◮ Example: p genes grouped into K pathways/modules (Xj = genej):

      X = [ X1, X2, . . . , Xk (M1) | Xk+1, Xk+2, . . . , Xh (M2) | . . . | Xl+1, Xl+2, . . . , Xp (MK) ]

slide-18
SLIDE 18

Group PLS

Aim: select groups of variables taking into account the data structure.

  ◮ PLS components:

      ξh = u1 × X1 + u2 × X2 + u3 × X3 + · · · + up × Xp

  ◮ sparse PLS components (sPLS): individual weights can be zeroed, e.g.,

      ξh = u1 × X1 + u2 × X2 + u3 × X3 + · · · + up × Xp,  with u2 = u3 = 0

  ◮ group PLS components (gPLS): weights are zeroed module by module, e.g.,

      ξh = [u1 X1 + u2 X2] (module1) + [u3 X3 + u4 X4 + u5 X5] (module2) + · · · + [up−1 Xp−1 + up Xp] (moduleK),

      with u1 = u2 = 0 (module1 dropped) and up−1 = up = 0 (moduleK dropped).

  ֒→ Select groups of variables: either all the variables within a group are selected, or none of them are.
  ... but gPLS does not achieve sparsity within each group ...

slide-22
SLIDE 22

Sparse Group PLS

Aim: combine sparsity of groups and sparsity within each group.
Example: X matrix = genes. We might be interested in identifying particularly important genes in pathways of interest.

  ◮ sparse PLS components (sPLS): individual weights zeroed, e.g., u2 = u3 = 0.
  ◮ group PLS components (gPLS): whole modules zeroed, e.g., u1 = u2 = 0 (module1) and up−1 = up = 0 (moduleK).
  ◮ sparse group PLS components (sgPLS): whole modules zeroed and, within the retained modules, some individual weights zeroed, e.g.,

      ξh = [u1 X1 + u2 X2] (module1) + [u3 X3 + u4 X4 + u5 X5] (module2) + · · · + [up−1 Xp−1 + up Xp] (moduleK),

      with u1 = u2 = 0 and up−1 = up = 0 (modules dropped), and u4 = u5 = 0 (sparsity within module2).

slide-24
SLIDE 24

Aims in a regression setting

  ◮ Select groups of variables taking into account the data structure; either all the variables within a group are selected, or none of them are.

  ◮ Combine sparsity of groups and sparsity within each group; only relevant variables within a group are selected.

slide-25
SLIDE 25

Illustration: Dendritic Cells in Addition to Antiretroviral Treatment (DALIA) trial

  ◮ Evaluation of the safety and the immunogenicity of a vaccine on n = 19 HIV-1 infected patients.
  ◮ The vaccine was injected on weeks 0, 4, 8 and 12 while patients received antiretroviral therapy. An interruption of the antiretrovirals was performed at week 24.
  ◮ After vaccination, a deep evaluation of the immune response was performed at week 16.
  ◮ Repeated measurements of the main immune markers and gene expression were performed every 4 weeks until the end of the trial.

slide-26
SLIDE 26

DALIA trial: Question ?

First results obtained using group of genes

◮ Significant change of gene expression among 69 modules over

time before antiretroviral treatment interruption.

Big Data PLS Methods JSTAR 2016, Rennes 12/54

slide-27
SLIDE 27

DALIA trial: Question ?

First results obtained using group of genes

◮ Significant change of gene expression among 69 modules over

time before antiretroviral treatment interruption.

◮ How does the gene abundance of these 69 modules as measured

at week 16 correlate with immune markers measured at week 16?

Big Data PLS Methods JSTAR 2016, Rennes 12/54

slide-28
SLIDE 28

sPLS, gPLS and sgPLS

  ◮ Response variables Y = immune markers composed of q = 7 cytokines (IL21, IL2, IL13, IFNg, Luminex score, TH1 score, CD4).
  ◮ Predictor variables X = expression of p = 5399 genes extracted from the 69 modules.
  ◮ The structure of the data (modules) is used for gPLS and sgPLS: each gene belongs to one of the 69 modules.
  ◮ Asymmetric situation.

slide-29
SLIDE 29

Results: Modules and number of genes selected

p = 5399; 24 modules selected by gPLS or sgPLS on 3 scores.

slide-30
SLIDE 30

Results: Modules and number of genes selected


slide-31
SLIDE 31

Results: Venn diagram


slide-32
SLIDE 32

Results: Venn diagram

  ◮ sgPLS selects slightly more genes than sPLS (487 and 420 genes selected, respectively).
  ◮ But sgPLS selects fewer modules than sPLS (21 and 64 groups of genes selected, respectively).
  ◮ Note: all 21 groups of genes selected by sgPLS were included in those selected by sPLS.
  ◮ sgPLS selects slightly more modules than gPLS (4 more, 14/21 in common).
  ◮ However, gPLS leads to more genes selected than sgPLS (944).
  ◮ In this application, the sgPLS approach led to a parsimonious selection of modules and genes that appears very relevant biologically.

Chaussabel's functional modules: http://www.biir.net/public_wikis/module_annotation/V2_Trial_8_Modules

slide-34
SLIDE 34

Stability of the variable selection (100 bootstrap samples)

[Figure: module selection frequencies (0.0–1.0) on component 1 for the sPLS, gPLS and sgPLS procedures; the 69 modules (M1.1 to M8.59) are grouped by functional annotation (e.g., Apoptosis/Survival, Cell Cycle, Cytotoxic/NK Cell, Inflammation, Monocytes, Neutrophils, Plasma Cells, T cells) and flagged as Selected or Not Selected.]

Stability of the variable selection assessed on 100 bootstrap samples of the DALIA-1 trial data, for the gPLS, sgPLS and sPLS procedures respectively. For each procedure, the modules selected on the original sample are separated from those that were not.

slide-35
SLIDE 35

Now some mathematics ...


slide-36
SLIDE 36

PLS family

PLS = Partial Least Squares or Projection to Latent Structures.
Four main methods coexist in the literature:
  (i) Partial Least Squares Correlation (PLSC), also called PLS-SVD;
  (ii) PLS in mode A (PLS-W2A, for Wold's Two-Block, Mode A PLS);
  (iii) PLS in mode B (PLS-W2B), also called Canonical Correlation Analysis (CCA);
  (iv) Partial Least Squares Regression (PLSR, or PLS2).

  ◮ (i), (ii) and (iii) are symmetric while (iv) is asymmetric.
  ◮ Different objective functions to optimise.
  ◮ Good news: all use the singular value decomposition (SVD).

slide-38
SLIDE 38

Singular Value Decomposition (SVD)

Definition 1
Let M be a p × q matrix of rank r:

    M = U∆Vᵀ = ∑_{ℓ=1}^{r} δℓ uℓvℓᵀ,     (1)

  ◮ U = (uℓ): p × p and V = (vℓ): q × q are two orthogonal matrices which contain the normalised left (resp. right) singular vectors;
  ◮ ∆ = diag(δ1, . . . , δr, 0, . . . , 0) contains the ordered singular values δ1 ≥ δ2 ≥ · · · ≥ δr > 0.

Note: fast and efficient algorithms exist to compute the SVD.
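A quick base-R illustration of this decomposition and its rank-1 truncation on a toy matrix (a sketch; the error formula it checks is Theorem 2 below):

    # SVD of a small matrix and its best rank-1 approximation (base R).
    set.seed(1)
    M <- matrix(rnorm(12), nrow = 4)               # toy 4 x 3 matrix

    s  <- svd(M)                                   # M = U diag(d) V^T
    M1 <- s$d[1] * s$u[, 1] %*% t(s$v[, 1])        # delta_1 u_1 v_1^T

    # Frobenius error of the rank-1 truncation = sqrt(sum of remaining delta^2)
    all.equal(norm(M - M1, "F"), sqrt(sum(s$d[-1]^2)))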

slide-39
SLIDE 39

Connection between SVD and maximum covariance

We were able to describe the optimization problem of the four PLS methods as:

    (u*, v*) = argmax_{‖u‖₂=‖v‖₂=1} Cov(Xh−1u, Yh−1v),  h = 1, . . . , H,

where the matrices Xh and Yh are obtained recursively from Xh−1 and Yh−1. The four methods differ by the deflation process, chosen so that the above scores or weight vectors satisfy given constraints.

The solution at step h is obtained by computing only the first triplet (δ1, u1, v1) of singular elements of the SVD of Mh−1 = Xh−1ᵀYh−1:

    (u*, v*) = (u1, v1).

Why is this useful?
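A minimal base-R sketch of this fact (toy standardized data; one component, h = 1):

    # First weight pair via the SVD of M = X^T Y maximizes Cov(Xu, Yv).
    set.seed(2)
    n <- 100
    X <- scale(matrix(rnorm(n * 5), n)); Y <- scale(matrix(rnorm(n * 3), n))

    s <- svd(crossprod(X, Y), nu = 1, nv = 1)      # first singular triplet only
    cov(X %*% s$u, Y %*% s$v)                      # maximal over unit-norm (u, v)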

slide-43
SLIDE 43

SVD properties

Theorem 2 (Eckart–Young, 1936)
The (truncated) SVD of a given matrix M (of rank r) provides the best reconstitution (in a least squares sense) of M by a matrix of lower rank k:

    min_{A of rank k} ‖M − A‖²F = ‖M − ∑_{ℓ=1}^{k} δℓuℓvℓᵀ‖²F = ∑_{ℓ=k+1}^{r} δℓ².

If the minimum is searched over matrices A of rank 1, which are of the form ũṽᵀ where ũ, ṽ are non-zero vectors, we obtain

    min_{ũ,ṽ} ‖M − ũṽᵀ‖²F = ∑_{ℓ=2}^{r} δℓ² = ‖M − δ1u1v1ᵀ‖²F.

slide-44
SLIDE 44

SVD properties

Thus, solving

    argmin_{ũ,ṽ} ‖Mh−1 − ũṽᵀ‖²F     (2)

and norming the resulting vectors gives us u1 and v1. This is another approach to solving the PLS optimization problem.

slide-45
SLIDE 45

Towards sparse PLS

  ◮ Shen and Huang (2008) connected (2) (in a PCA context) to least squares minimisation in regression:

      ‖Mh−1 − ũṽᵀ‖²F = ‖vec(Mh−1) − (Iq ⊗ ũ)ṽ‖²₂ = ‖vec(Mh−1) − (ṽ ⊗ Ip)ũ‖²₂,

    where vec(Mh−1) plays the role of the response y in a regression.
  ֒→ Possible to use many existing variable selection techniques based on regularization penalties.

We propose iterative alternating algorithms to find normed vectors ũ/‖ũ‖ and ṽ/‖ṽ‖ that minimise the penalised sum-of-squares criterion

    ‖Mh−1 − ũṽᵀ‖²F + Pλ(ũ, ṽ),

for various penalization terms Pλ(ũ, ṽ).

  ֒→ We obtain several sparse versions (in terms of the weights u and v) of the four methods (i)–(iv).

slide-47
SLIDE 47

Sparse PLS models

For cases (i)–(iv):

  ◮ Aim: obtain sparse weight vectors uh and vh.
  ◮ Associated component scores (i.e., latent variables): ξh := Xh−1uh and ωh := Yh−1vh, h = 1, . . . , H, for a small number H of components.
  ◮ Recursive procedure with an objective function involving Xh−1 and Yh−1
      ֒→ decomposition (approximation) of the original matrices X and Y:

          X = ΞHCHᵀ + FX,H,    Y = ΩHDHᵀ + FY,H,     (3)

      where ΞH = (ξh) and ΩH = (ωh).
  ◮ For the regression mode, we have the multivariate linear regression model Y = XBPLS + E, with BPLS = UH(CHᵀUH)⁻¹DHᵀ, where E is a matrix of residuals.

slide-48
SLIDE 48

Example case (ii): PLS-W2A

Definition 3
The objective function at step h is

    (uh, vh) = argmax_{‖u‖₂=‖v‖₂=1} Cov(Xh−1u, Yh−1v),

subject to the constraints Cov(ξh, ξj) = Cov(ωh, ωj) = 0, 1 ≤ j < h. In order to satisfy these constraints:

    Xh = Pξh⊥ Xh−1 and Yh = Pωh⊥ Yh−1,    (X0 = X, Y0 = Y),

where uh (resp. vh) is the first left (resp. right) singular vector obtained by applying an SVD to Mh−1 := Xh−1ᵀYh−1, h = 1, . . . , H, and ξh = Xh−1uh, ωh = Yh−1vh.
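A minimal base-R sketch of one PLS-W2A step, with the projector Pξ⊥ implemented as I − ξξᵀ/ξᵀξ (toy data; all names are illustrative):

    # One PLS-W2A step: SVD weights, scores, then projection deflation.
    set.seed(3)
    n <- 50
    X <- scale(matrix(rnorm(n * 6), n)); Y <- scale(matrix(rnorm(n * 4), n))

    s     <- svd(crossprod(X, Y), nu = 1, nv = 1)
    xi    <- X %*% s$u                             # xi_1 = X u_1
    omega <- Y %*% s$v                             # omega_1 = Y v_1

    deflate <- function(A, sc) A - sc %*% crossprod(sc, A) / drop(crossprod(sc))
    X1 <- deflate(X, xi)                           # X_1 = P_{xi perp} X_0
    Y1 <- deflate(Y, omega)                        # Y_1 = P_{omega perp} Y_0 (mode A)

    max(abs(crossprod(X1, xi)))                    # ~ 0: constraint satisfied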

slide-49
SLIDE 49

Regression mode (iv): PLSR, PLS2

  ◮ The aim of this asymmetric model is prediction.
  ◮ PLS2 finds latent variables that model X and simultaneously predict Y.
  ◮ The difference with PLS-W2A is the deflation step:

      Xh = Pξh⊥ Xh−1 and Yh = Pξh⊥ Yh−1.

slide-50
SLIDE 50

The algorithm

Main steps of the iterative algorithm:

  1. X0 = X, Y0 = Y, h = 1.
  2. Mh−1 := Xh−1ᵀYh−1.
  3. SVD: extraction of the first pair of singular vectors uh and vh.
  4. Sparsity step: produces sparse weights usparse and vsparse.
  5. Latent variables: ξh = Xh−1usparse and ωh = Yh−1vsparse.
  6. Slope coefficients:
      ◮ ch = Xh−1ᵀξh / ξhᵀξh for both modes;
      ◮ dh = Yh−1ᵀξh / ξhᵀξh for "PLSR regression mode";
      ◮ eh = Yh−1ᵀωh / ωhᵀωh for "PLS mode A".
  7. Deflation:
      ◮ Xh = Xh−1 − ξhchᵀ for both modes;
      ◮ Yh = Yh−1 − ξhdhᵀ for "PLSR regression mode";
      ◮ Yh = Yh−1 − ωhehᵀ for "PLS mode A".
  8. If h = H stop, else h = h + 1 and go to step 2.
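A compact R sketch of these steps in regression mode (the sparsity step is a plug-in function; with the default `identity` it reduces to ordinary PLS; a sketch, not the authors' bigsgPLS implementation):

    # Skeleton of the iterative algorithm (regression mode).
    # `sparsify` is a placeholder for the sPLS/gPLS/sgPLS thresholding rules.
    pls_skeleton <- function(X, Y, H = 2, sparsify = identity) {
      Xi <- matrix(0, nrow(X), H)                     # component scores
      for (h in seq_len(H)) {
        s  <- svd(crossprod(X, Y), nu = 1, nv = 1)    # steps 2-3
        u  <- sparsify(s$u); u <- u / sqrt(sum(u^2))  # step 4, re-normed
        xi <- X %*% u                                 # step 5
        ch <- crossprod(X, xi) / drop(crossprod(xi))  # step 6
        dh <- crossprod(Y, xi) / drop(crossprod(xi))
        X  <- X - xi %*% t(ch)                        # step 7: deflation
        Y  <- Y - xi %*% t(dh)
        Xi[, h] <- xi
      }
      Xi
    }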

slide-51
SLIDE 51

Introducing sparsity

“Sparsity” implies many zeros in a vector or a matrix. (Credits: Jun Liu, Shuiwang Ji, and Jieping Ye)


slide-52
SLIDE 52

Introducing sparsity

Let θ be the model parameters to be estimated. A commonly employed method for estimating θ is

    min [loss(θ) + λ penalty(θ)].

This is equivalent to the constrained form:

    min loss(θ) subject to penalty(θ) ≤ z (for some z).

Example: loss(θ) = 0.5‖θ − v‖²₂ for some fixed vector v.

slide-53
SLIDE 53

Why does L1 induce sparsity?

Analysis in 1D (comparison with L2).

L1: minimise 0.5(θ − v)² + λ|θ| (nondifferentiable at 0):
    If v ≥ λ, θ̂ = v − λ; if v ≤ −λ, θ̂ = v + λ; else θ̂ = 0 (sparsity!).

L2: minimise 0.5(θ − v)² + λθ² (differentiable at 0):
    θ̂ = v / (1 + 2λ). No sparsity here.
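These closed forms can be checked in a few lines of R (illustrative values):

    # Soft-thresholding (L1) vs. ridge shrinkage (L2) in 1D, lambda = 1.
    soft  <- function(v, lambda) sign(v) * pmax(abs(v) - lambda, 0)
    ridge <- function(v, lambda) v / (1 + 2 * lambda)

    v <- seq(-2, 2, by = 0.5)
    rbind(L1 = soft(v, 1), L2 = ridge(v, 1))   # L1 zeroes small |v|; L2 never does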

slide-54
SLIDE 54

Why does L1 induce sparsity?

Understanding from the projection


slide-55
SLIDE 55

Why does L1 induce sparsity?

Understanding from constrained optimization


slide-56
SLIDE 56

sparse PLS (sPLS)

In sPLS, the optimisation problem to solve is

    min_{uh,vh} ‖Mh − uhvhᵀ‖²F + Pλ1,h(uh) + Pλ2,h(vh),

  ◮ ‖Mh − uhvhᵀ‖²F = ∑_{i=1}^{p} ∑_{j=1}^{q} (mij − uihvjh)²,
  ◮ Mh = XhᵀYh for each iteration h,
  ◮ Pλ1,h(uh) = ∑_{i=1}^{p} 2λ1h|ui| and Pλ2,h(vh) = ∑_{j=1}^{q} 2λ2h|vj|.

Iterative solution: apply the soft-thresholding function gsoft(x, λ) = sign(x)(|x| − λ)₊

  ◮ to the vector Mhvh componentwise to get uh;
  ◮ to the vector Mhᵀuh componentwise to get vh.
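A minimal R sketch of these alternating updates (λ's fixed; tuning and convergence checks omitted):

    # sPLS weights by alternating componentwise soft-thresholding.
    gsoft <- function(x, lambda) sign(x) * pmax(abs(x) - lambda, 0)

    spls_weights <- function(M, lambda1, lambda2, n_iter = 50) {
      v <- svd(M, nu = 1, nv = 1)$v                 # SVD initialisation
      for (i in seq_len(n_iter)) {
        u <- gsoft(M %*% v, lambda1)
        u <- u / max(sqrt(sum(u^2)), 1e-12)         # re-norm (guarded)
        v <- gsoft(crossprod(M, u), lambda2)
        v <- v / max(sqrt(sum(v^2)), 1e-12)
      }
      list(u = u, v = v)                            # sparse weight vectors
    }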

slide-58
SLIDE 58

group PLS (gPLS)

  ◮ X and Y can be divided respectively into K and L sub-matrices (groups) X(k): n × pk and Y(l): n × ql.
  ◮ Following the same idea as Yuan and Lin (2006), we use group lasso penalties:

      Pλ1(u) = λ1 ∑_{k=1}^{K} √pk ‖u(k)‖₂ and Pλ2(v) = λ2 ∑_{l=1}^{L} √ql ‖v(l)‖₂,

    where u(k) (resp. v(l)) is the weight sub-vector associated with the k-th (resp. l-th) block. In gPLS, the optimisation problem to solve is

      min ∑_{k=1}^{K} ∑_{l=1}^{L} ‖M(k,l) − u(k)v(l)ᵀ‖²F + Pλ1(u) + Pλ2(v),

  ◮ with M(k,l) = X(k)ᵀY(l).

Remark: if the k-th block is composed of only one variable, then ‖u(k)‖₂ = √(u(k))² = |u(k)|, and we recover the lasso penalty.
slide-59
SLIDE 59

group PLS (gPLS)

Previous objective function can be written as

K

  • k=1
  • M(k,·) − u(k)vT2

F + λ1

√pku(k)2

  • + Pλ2(v)

where M(k,·) = X(k)YT. We can optimize (for v fixed) over groupwise components of u separately. First term above expands as:

trace[M(k,·)M(k,·)T] − 2trace[u(k)vTM(k,·)T] + trace[u(k)u(k)T]

Optimal u(k) thus optimizes

trace[u(k)u(k)T] − 2trace[u(k)vTM(k,·)T] + λ1 √pku(k)2.

This objective function is convex, so the optimal solution is character- ized by subgradient equations (subdifferential equals to 0).

Big Data PLS Methods JSTAR 2016, Rennes 36/54

slide-60
SLIDE 60

Subdifferential

Subderivative, subgradient, and subdifferential generalize the derivative to functions which are not differentiable (e.g., |x| is nondifferentiable at 0). The subdifferential of a function is set-valued.

[Figure: a convex function (blue), nondifferentiable at x0; the slope of each red line through (x0, f(x0)) is a subderivative at x0.]

The set [a, b] of all subderivatives is called the subdifferential of the function f at x0. If f is convex and its subdifferential at x0 contains exactly one subderivative, then f is differentiable at x0.

slide-61
SLIDE 61

We have

    a = lim_{x→x0⁻} [f(x) − f(x0)] / (x − x0) and b = lim_{x→x0⁺} [f(x) − f(x0)] / (x − x0).

Example: consider the convex function f(x) = |x|. The subdifferential at the origin is the interval [a, b] = [−1, 1]. The subdifferential at any point x0 < 0 is the singleton set {−1}, while the subdifferential at any point x0 > 0 is the singleton set {1}.

slide-62
SLIDE 62

For group k, u(k) must satisfy that the subdifferential is null:

    −2u(k) + 2M(k,·)v = λ1√pk θ,     (4)

where θ is a subgradient of ‖·‖₂ evaluated at u(k):

    θ = u(k)/‖u(k)‖₂ if u(k) ≠ 0;  θ ∈ {θ : ‖θ‖₂ ≤ 1} if u(k) = 0.

We can see that the subgradient equations (4) are satisfied with u(k) = 0 if

    ‖M(k,·)v‖₂ ≤ 2⁻¹λ1√pk.     (5)

For u(k) ≠ 0, equation (4) gives

    −2u(k) + 2M(k,·)v = λ1√pk u(k)/‖u(k)‖₂.     (6)

Combining equations (5) and (6), we find:

    u(k) = (1 − λ1√pk / (2‖M(k,·)v‖₂))₊ M(k,·)v,  k = 1, . . . , K,     (7)

where (a)₊ = max(a, 0).

slide-63
SLIDE 63

In the same vein, optimisation over v for a fixed u is also obtained by optimising over groupwise components:

    v(l) = (1 − λ2√ql / (2‖M(·,l)ᵀu‖₂))₊ M(·,l)ᵀu,  l = 1, . . . , L.     (8)

We thus obtain the following theorem.

slide-64
SLIDE 64

group PLS (gPLS)

Theorem 4
The solution of the group PLS optimisation problem is given by:

    u(k) = (1 − λ1√pk / (2‖M(k,·)v‖₂))₊ M(k,·)v  (for fixed v),

    v(l) = (1 − λ2√ql / (2‖M(·,l)ᵀu‖₂))₊ M(·,l)ᵀu  (for fixed u).

Note: we iterate until convergence of u(k) and v(l), using alternately one of the above formulas.
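A sketch of the groupwise update of u in R (for fixed v; `groups` is an assumed vector mapping each of the nrow(M) variables to its group):

    # gPLS update of u following Theorem 4 (whole group kept or zeroed).
    gpls_update_u <- function(M, v, lambda1, groups) {
      Mv <- M %*% v
      u  <- numeric(nrow(M))
      for (k in unique(groups)) {
        idx <- which(groups == k)
        nrm <- sqrt(sum(Mv[idx]^2))                 # ||M^(k,.) v||_2
        if (nrm > 0) {
          shrink <- 1 - lambda1 * sqrt(length(idx)) / (2 * nrm)
          u[idx] <- max(shrink, 0) * Mv[idx]        # (a)_+ groupwise shrinkage
        }
      }
      u
    }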

slide-65
SLIDE 65

sparse group PLS: sparsity within groups

  ◮ Following Simon et al. (2013), we introduce sparse group lasso penalties:

      Pλ1(u) = (1 − α1)λ1 ∑_{k=1}^{K} √pk ‖u(k)‖₂ + α1λ1‖u‖₁,

      Pλ2(v) = (1 − α2)λ2 ∑_{l=1}^{L} √ql ‖v(l)‖₂ + α2λ2‖v‖₁.

slide-66
SLIDE 66

sparse group PLS (sgPLS)

Theorem 5
The solution of the sparse group PLS optimisation problem is given by: u(k) = 0 if

    ‖gsoft(M(k,·)v, λ1α1/2)‖₂ ≤ λ1(1 − α1)√pk,

otherwise

    u(k) = ½ (1 − λ1(1 − α1)√pk / ‖gsoft(M(k,·)v, λ1α1/2)‖₂) gsoft(M(k,·)v, λ1α1/2).

Similarly, v(l) = 0 if

    ‖gsoft(M(·,l)ᵀu, λ2α2/2)‖₂ ≤ λ2(1 − α2)√ql,

otherwise

    v(l) = ½ (1 − λ2(1 − α2)√ql / ‖gsoft(M(·,l)ᵀu, λ2α2/2)‖₂) gsoft(M(·,l)ᵀu, λ2α2/2).

The proof is similar (see our paper in Bioinformatics, 2016).
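The corresponding update of u in R, again a sketch under the same assumed `groups` structure (gsoft as in the sPLS sketch above):

    # sgPLS update of u following Theorem 5: a group is zeroed unless its
    # soft-thresholded block survives the group-level threshold.
    gsoft <- function(x, lambda) sign(x) * pmax(abs(x) - lambda, 0)

    sgpls_update_u <- function(M, v, lambda1, alpha1, groups) {
      Mv <- M %*% v
      u  <- numeric(nrow(M))
      for (k in unique(groups)) {
        idx <- which(groups == k)
        g   <- gsoft(Mv[idx], lambda1 * alpha1 / 2)
        nrm <- sqrt(sum(g^2))
        thr <- lambda1 * (1 - alpha1) * sqrt(length(idx))
        if (nrm > thr) u[idx] <- 0.5 * (1 - thr / nrm) * g
      }
      u
    }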

slide-67
SLIDE 67

R package: sgPLS

  ◮ The sgPLS package implements the sPLS, gPLS and sgPLS methods:
    http://cran.r-project.org/web/packages/sgPLS/index.html
  ◮ Includes some functions for choosing the tuning parameters related to the predictor matrix for the different sparse PLS models (regression mode).
  ◮ Some simple code to perform an sgPLS fit:

    model.sgPLS <- sgPLS(X, Y, ncomp = 2, mode = "regression",
                         keepX = c(4, 4), keepY = c(4, 4),
                         ind.block.x = ind.block.x, ind.block.y = ind.block.y,
                         alpha.x = c(0.5, 0.5), alpha.y = c(0.5, 0.5))

  ◮ The last version also includes sparse group Discriminant Analysis.

slide-68
SLIDE 68

Regularized PLS scalable for BIG-DATA

What happens in a MASSIVE DATA SET context?

Massive datasets: the size of the data is large, and analysing it takes a significant amount of time and computer memory. Emerson & Kane (2012) consider a dataset large if it exceeds 20% of the RAM (Random Access Memory) of a given machine, and massive if it exceeds 50%.

slide-70
SLIDE 70

Case of many observations: two massive data sets X: n × p and Y: n × q, due to a large number of observations. We suppose here that n is very large, but not p nor q.

The PLS algorithm is mainly based on the SVD of Mh−1 = Xh−1ᵀYh−1.

Dimension of Mh−1: p × q !! This matrix fits into memory, but X and Y do not.

slide-73
SLIDE 73

Computation of M = XᵀY by chunks

Split the n rows of X and Y into G blocks X(g), Y(g):

    M = XᵀY = ∑_{g=1}^{G} X(g)ᵀY(g).

All terms fit (successively) into memory!
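The idea in plain R, on in-memory matrices for illustration (with filebacked data, each block would be read from disk instead):

    # Chunked computation of M = t(X) %*% Y over G = 10 row blocks.
    set.seed(4)
    n <- 1e4
    X <- matrix(rnorm(n * 4), n); Y <- matrix(rnorm(n * 3), n)

    chunks <- split(seq_len(n), cut(seq_len(n), breaks = 10))
    M <- Reduce(`+`, lapply(chunks, function(idx) crossprod(X[idx, ], Y[idx, ])))

    all.equal(M, crossprod(X, Y))               # same result, bounded memory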

slide-74
SLIDE 74

Computation of M = XᵀY by chunks using R

  ◮ No need to load the big matrices X and Y.
  ◮ Use memory-mapped files (called "filebacking") through the bigmemory package to allow matrices to exceed the RAM size.
  ◮ A big.matrix is created, which supports the use of shared memory for efficiency in parallel computing.
  ◮ foreach: package for running the computation of M by chunks in parallel.

Regularized PLS algorithm:

  ◮ Computation of the components ("scores"): Xu (n × 1) and Yv (n × 1).
  ◮ Easy to compute by chunks and store in a big.matrix object; the sketch below illustrates the chunking pattern.
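A hedged sketch of this setup with bigmemory, foreach and doParallel (it assumes descriptor files X.desc and Y.desc were created beforehand with filebacked.big.matrix; chunk and core counts are illustrative):

    # Parallel chunked M = t(X) %*% Y on filebacked big.matrix objects.
    library(bigmemory)
    library(foreach)
    library(doParallel)
    registerDoParallel(cores = 2)

    n      <- nrow(attach.big.matrix("X.desc"))           # no data loaded in RAM
    chunks <- split(seq_len(n), cut(seq_len(n), breaks = 20))

    M <- foreach(idx = chunks, .combine = "+", .packages = "bigmemory") %dopar% {
      Xb <- attach.big.matrix("X.desc")                   # re-attach per worker
      Yb <- attach.big.matrix("Y.desc")
      crossprod(Xb[idx, ], Yb[idx, ])                     # only this block in RAM
    }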

slide-76
SLIDE 76

Illustration of group PLS with Big Data

  ◮ Simulated data: X (5GB) and Y (5GB);
  ◮ n = 560,000 observations, p = 400 and q = 500;
  ◮ Linked by two latent variables, made up of sparse linear combinations of the original variables;
  ◮ Both X and Y have a group structure: 20 groups of 20 variables for X and 25 groups of 20 variables for Y;
  ◮ Only 4 groups in each data set are relevant, and 5 variables in each of these groups are not relevant.

slide-77
SLIDE 77

Figure 1: Comparison of gPLS and BIG-gPLS (for small n = 1, 000)


slide-78
SLIDE 78

Figure 2: Use of BIG-gPLS. Left: small n. Right: Large n. Blue: truth. Red: Recovered.


slide-79
SLIDE 79

Regularised PLS Discriminant Analysis

A categorical response variable becomes a dummy (indicator) matrix in the PLS algorithms.

slide-80
SLIDE 80

Concluding Remarks and Take Home Message

We were able to derive a simple unified algorithm that performs the standard, sparse, group and sparse group versions of the four classical PLS algorithms (i)–(iv), and also PLS-DA.

We used big-memory objects, and a simple trick that makes our procedure scalable to big data (large n). We also parallelized the code for faster computation. This will soon be made available in our new R package: bigsgPLS.

Eager to apply this to real neuroimaging data sets! We are currently working on a batch version of this algorithm, as well as a large-n and large-p version of it.

slide-81
SLIDE 81

References

  ◮ Yuan M. and Lin Y. (2006). Model Selection and Estimation in Regression with Grouped Variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1), 49–67.
  ◮ Simon N., Friedman J., Hastie T. and Tibshirani R. (2013). A Sparse-Group Lasso. Journal of Computational and Graphical Statistics, 22(2), 231–245.
  ◮ Liquet B., Lafaye de Micheaux P., Hejblum B. and Thiébaut R. (2016). Group and Sparse Group Partial Least Square Approaches Applied in Genomics Context. Bioinformatics, 32(1), 35–42.
  ◮ Lafaye de Micheaux P., Liquet B. and Sutton M. A Unified Parallel Algorithm for Regularized Group PLS Scalable to Big Data (in progress).

Thank you! Questions?