Orthogonal grey simultaneous component analysis to distinguish - - PowerPoint PPT Presentation


SLIDE 1

Orthogonal grey simultaneous component analysis to distinguish common and distinctive information in coupled data

Martijn Schouteden Katrijn Van Deun Iven Van Mechelen

SLIDE 2

Outline

  • Introduction

– Coupled data
– Research questions

  • Method

– Simultaneous component method
– Problem
– Solution: DISCO-GSCA

  • Illustration

– Results

  • Conclusion
SLIDE 5

Introduction

  • Coupled data: data that consist of different data blocks, which all contain information about the same entities

– E.g. (Smilde et al., 2005):

  • Data blocks = GC/MS and LC/MS
  • Variables = E. coli metabolites
  • Objects = conditions

[Figure: the LC/MS block (metabolites 1 … J1) and the GC/MS block (metabolites 1 … J2), both with rows for the same conditions 1 … I]
SLIDE 6
  • Finding the mechanisms that underlie the coupled data
  • RESEARCH QUESTIONS: which mechanisms are

– common to both data blocks, and
– distinctive for a single data block?

E.g., which metabolome processes are measured by both separation techniques, and which are measured by only one of the two?

SLIDE 9

Simultaneous Component Analysis

  • Finding underlying mechanisms in

– ONE data block: Principal Component Analysis (PCA; Jolliffe, 2002)
– More data blocks: Simultaneous Component Analysis (SCA; Van Deun et al., 2009)

SLIDE 10

Simultaneous Component Analysis

[Figure: the LC/MS block (I × J1) and the GC/MS block (I × J2), with rows 1 … I for the same conditions]

SLIDE 11

Simultaneous Component Analysis

[Figure: the LC/MS and GC/MS blocks placed side by side into one matrix with rows 1 … I and columns 1 … J1+J2]

SLIDE 12

Simultaneous Component Analysis

[Figure: the concatenated I × (J1+J2) data matrix of the LC/MS and GC/MS blocks, denoted X_conc]

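SLIDE 14

Simultaneous Component Analysis

$$\underbrace{\mathbf{X}_{conc}}_{I \times (J_1+J_2)} = \underbrace{\mathbf{T}}_{I \times R}\;\underbrace{\mathbf{P}_{conc}'}_{R \times (J_1+J_2)} + \underbrace{\mathbf{E}_{conc}}_{I \times (J_1+J_2)}$$

Data = Scores × Loadings + Error, with the LC/MS and GC/MS parts side by side:

$$\mathbf{P}_{conc}' = \left[\mathbf{P}_{LC}' \,\middle|\, \mathbf{P}_{GC}'\right], \qquad \mathbf{E}_{conc} = \left[\mathbf{E}_{LC} \,\middle|\, \mathbf{E}_{GC}\right]$$

Objective:

$$\min_{\mathbf{T},\,\mathbf{P}_{conc}} \left\| \mathbf{X}_{conc} - \mathbf{T}\mathbf{P}_{conc}' \right\|^2$$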
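A minimal sketch of the SCA step, assuming the blocks are already preprocessed (the slides do not specify centering or scaling) and assuming the usual identification constraint T'T = I. Under these assumptions the objective above is minimized by the truncated singular value decomposition of X_conc (Eckart-Young theorem). The function name `sca` and the random data are illustrative only.

```python
import numpy as np


def sca(blocks, R):
    """Simultaneous component analysis of coupled blocks via truncated SVD.

    blocks: list of I x J_k arrays that share the same I rows (objects).
    Returns scores T (I x R, with T'T = I), the concatenated loadings
    P_conc ((J_1 + ... + J_K) x R) and the per-block loadings.
    """
    X_conc = np.hstack(blocks)                       # I x (J_1 + ... + J_K)
    U, s, Vt = np.linalg.svd(X_conc, full_matrices=False)
    T = U[:, :R]                                     # orthonormal scores
    P_conc = Vt[:R].T * s[:R]                        # loadings absorb the singular values
    # Split P_conc row-wise into one loading matrix per block.
    splits = np.cumsum([b.shape[1] for b in blocks])[:-1]
    P_blocks = np.split(P_conc, splits, axis=0)
    return T, P_conc, P_blocks


# Illustrative data: I = 10 objects, J1 = 6 and J2 = 4 variables.
rng = np.random.default_rng(0)
X_lc = rng.standard_normal((10, 6))
X_gc = rng.standard_normal((10, 4))
T, P_conc, (P_lc, P_gc) = sca([X_lc, X_gc], R=3)
residual = np.hstack([X_lc, X_gc]) - T @ P_conc.T
```

The residual sum of squares equals the sum of the discarded squared singular values, which is what makes the SVD the least-squares solution.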

SLIDE 17
  • Distinctive mechanisms = simultaneous components that underlie only one data block
  • Common mechanisms = simultaneous components that underlie both data blocks
  • E.g.,

$$\mathbf{X}_{conc} = \mathbf{T}\mathbf{P}_{conc}' = \mathbf{T}\left[\mathbf{P}_{LC}' \,\middle|\, \mathbf{P}_{GC}'\right]$$

A component whose loadings are zero for all LC/MS variables (and nonzero for the GC/MS variables) is a distinctive component for GC/MS.

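SLIDE 21

Problem

  • Distinctive mechanisms = simultaneous components that underlie only one data block
  • Common mechanisms = simultaneous components that underlie both data blocks
  • E.g., a target pattern with two distinctive components (D1, D2) and one common component (C):

$$\left(\mathbf{P}_{conc}^{target}\right)' = \left[\left(\mathbf{P}_{LC}^{target}\right)' \,\middle|\, \left(\mathbf{P}_{GC}^{target}\right)'\right] = \left[\begin{array}{ccc|ccc} x & \cdots & x & 0 & \cdots & 0 \\ 0 & \cdots & 0 & x & \cdots & x \\ x & \cdots & x & x & \cdots & x \end{array}\right] \begin{array}{l} D_1 \\ D_2 \\ C \end{array}$$

However… with the SC method, obtaining such a pattern is outside the user's control…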

SLIDE 22

Solution: DISCO-GSCA

  • Predecessors:

– DISCO-SCA (Schouteden et al., 2010)
– Grey Component Analysis (GCA; Westerhuis et al., 2007)

SLIDE 25

Solution: DISCO-GSCA

  • Impose the target structure to a user-defined degree λ:

$$\min_{\mathbf{T},\,\mathbf{P}_{conc}} \left\| \mathbf{X}_{conc} - \mathbf{T}\mathbf{P}_{conc}' \right\|^2 + \lambda \left\| \mathbf{W} \circ \left( \mathbf{P}_{conc}^{target} - \mathbf{P}_{conc} \right) \right\|^2 \quad \text{subject to} \quad \mathbf{T}'\mathbf{T} = \mathbf{I}$$

where ∘ denotes the elementwise product with the weight matrix W.

[Figure: the penalty term written out for the stacked LC/MS and GC/MS loadings p_jr, their target values, and the weights of W (shown as ones), combined by the elementwise product]

SLIDE 27

Solution: DISCO-GSCA

  • Model selection: 3 steps

– FIRST: Select the number of simultaneous components

  • (SCA, Van Deun et al., 2009)

– SECOND: characterize these components

  • i.e., how many of them are common/distinctive?
  • (DISCO-SCA, Schouteden et al., 2010)

– THIRD: define λ

  • L-curve (Hansen, 1992)
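SLIDE 32
  • Data: E. coli
  • Model:

– 5 simultaneous components
– Target:

  • 1 common component
  • 2 distinctive components for GC/MS
  • 2 distinctive components for LC/MS

– λ = 1

$$\mathbf{P}_{conc}^{target} = \left[\begin{array}{c} \mathbf{P}_{GC}^{target} \\ \hline \mathbf{P}_{LC}^{target} \end{array}\right] = \left[\begin{array}{ccccc} x & x & 0 & 0 & x \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ x & x & 0 & 0 & x \\ \hline 0 & 0 & x & x & x \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & x & x & x \end{array}\right]$$

with columns GC1, GC2, LC1, LC2 and C (GC/MS variables above the line, LC/MS variables below).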
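A minimal sketch of how a target like this one could be encoded, assuming (the slides do not state it) that the zeros are the only constrained entries: W is 1 where the target is 0 and 0 for the free x entries, so the free entries do not enter the penalty. The helper `make_target` and the block sizes are hypothetical, since the slides do not give the numbers of metabolites.

```python
import numpy as np


def make_target(block_sizes, labels):
    """Build a zero/free target matrix and its weight matrix W.

    block_sizes: number of variables per block, in row order (e.g. GC/MS, LC/MS).
    labels: one entry per component, either "C" (common) or the index of the
            block for which the component is distinctive.
    Returns P_target (J x R, all zeros) and W (1 = constrained to 0, 0 = free).
    """
    J, R = sum(block_sizes), len(labels)
    starts = np.cumsum([0] + list(block_sizes))
    W = np.zeros((J, R))
    for r, lab in enumerate(labels):
        if lab == "C":
            continue                      # common: free in every block
        for k in range(len(block_sizes)):
            if k != lab:                  # distinctive: zero outside its own block
                W[starts[k]:starts[k + 1], r] = 1.0
    return np.zeros((J, R)), W


# Hypothetical sizes: 4 GC/MS and 3 LC/MS variables; columns GC1, GC2, LC1, LC2, C.
P_target, W = make_target([4, 3], [0, 0, 1, 1, "C"])
```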

SLIDE 34

Results

Proportion of variance accounted for by DISCO-GSCA (λ = 1)

         GC1    GC2    LC1    LC2    C      Total
GC/MS    0.17   0.14   0.04   0.03   0.12   0.50
LC/MS    0.03   0.03   0.14   0.31   0.11   0.62
Xconc    0.12   0.10   0.08   0.13   0.12   0.54

Proportion of variance accounted for by SCA

         GC1    GC2    LC1    LC2    C      Total
GC/MS    0.14   0.13   0.06   0.11   0.11   0.54
LC/MS    0.10   0.04   0.12   0.24   0.12   0.62
Xconc    0.12   0.10   0.08   0.15   0.11   0.57
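One way such a table could be computed, assuming (the slides do not give the formula) that the share of component r in block k is ‖t_r p_kr'‖² / ‖X_k‖², which equals ‖p_kr‖² / ‖X_k‖² when the scores are orthonormal. For a least-squares SCA solution the per-component shares add up to the block total; for DISCO-GSCA, whose loadings are pulled towards the target, they need not add up exactly. The function name `vaf_table` is hypothetical.

```python
import numpy as np


def vaf_table(blocks, T, P_blocks):
    """Per-block, per-component and total proportions of variance accounted for.

    Assumes T'T = I, so component r contributes ||p_kr||^2 / ||X_k||^2 in block k.
    Returns (per_component, total): arrays of shape (K, R) and (K,).
    """
    per_component, total = [], []
    for X_k, P_k in zip(blocks, P_blocks):
        ss = np.sum(X_k ** 2)
        per_component.append(np.sum(P_k ** 2, axis=0) / ss)
        total.append(1.0 - np.sum((X_k - T @ P_k.T) ** 2) / ss)
    return np.array(per_component), np.array(total)


# Illustrative check on an SCA (truncated SVD) solution of two random blocks.
rng = np.random.default_rng(2)
blocks = [rng.standard_normal((12, 5)), rng.standard_normal((12, 4))]
X_conc = np.hstack(blocks)
U, s, Vt = np.linalg.svd(X_conc, full_matrices=False)
T, P_conc = U[:, :3], Vt[:3].T * s[:3]
P_blocks = [P_conc[:5], P_conc[5:]]
per_component, total = vaf_table(blocks, T, P_blocks)
```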

SLIDE 37

GC/MS:

1) Processes that are active when the carbon source is succinate instead of glucose
2) Processes that are active in the E. coli wild type and when the oxygen tension is not maintained

DISCO-GSCA (λ=1)

SLIDE 39

LC/MS:

1) Processes that are active in pH+ environments and at low phosphate concentrations
2) Processes that are active in the E. coli wild type and in a pH+ environment

DISCO-GSCA (λ=1)

SLIDE 40

Common: General time-related processes

DISCO-GSCA (λ=1)

SLIDE 43

Conclusion

  • DISCO-GSCA

– Method to find common & distinctive mechanisms in coupled data
– Imposes a target matrix on a simultaneous component solution, to a user-defined degree (λ)

  • Makes it possible to find an optimal trade-off between attaining the target structure and losing fit

SLIDE 44

References

– Jolliffe, I. T. (2002). Principal component analysis. New York: Springer-Verlag.
– Schouteden, M., Van Deun, K., Van Mechelen, I., & Pattyn, S. (2010). SCA and rotation to distinguish common and distinctive information in coupled data. Manuscript submitted for publication.
– Smilde, A. K., van der Werf, M. J., Bijlsma, S., van der Werff-van der Vat, B. J. C., & Jellema, R. H. (2005). Fusion of mass spectrometry-based metabolomics data. Analytical Chemistry, 77, 6729-6736.
– Van Deun, K., Smilde, A. K., van der Werf, M. J., Kiers, H. A. L., & Van Mechelen, I. (2009). A structured overview of simultaneous component based data integration. BMC Bioinformatics, 10(1), 246-261.
– Westerhuis, J. A., Derks, P. P. A., Hoefsloot, H. C. J., & Smilde, A. K. (2007). Grey component analysis. Journal of Chemometrics, 21, 474-485.

SLIDE 45

! Thanks For Your Attention !

SLIDE 46

Extra

  • Predecessors of DISCO-GSCA

– DISCO-SCA (Schouteden et al., 2010)
– Grey Component Analysis (GCA; Westerhuis et al., 2007)

  • Model Selection

– Step 1: selection of the number of components
– Step 2: selection of the target matrix
– Step 3: selection of λ

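SLIDE 50

DISCO-SCA

  • DISCO-SCA: rotates the simultaneous components towards the target structure
  • Loss function:

$$\min_{\mathbf{B}} \left\| \mathbf{W} \circ \left( \mathbf{P}_{conc}^{target} - \mathbf{P}_{conc}\mathbf{B} \right) \right\|^2 \quad \text{subject to} \quad \mathbf{B}\mathbf{B}' = \mathbf{I}$$

with $\mathbf{T}^{rotated} = \mathbf{T}\mathbf{B}$ and $\mathbf{P}_{conc}^{rotated} = \mathbf{P}_{conc}\mathbf{B}$.

  • ☺ Consequences (as compared to SCA):

– Fit remains:

$$\left\| \mathbf{X}_{conc} - \mathbf{T}^{rotated}\left(\mathbf{P}_{conc}^{rotated}\right)' \right\|^2 = \left\| \mathbf{X}_{conc} - \mathbf{T}\mathbf{B}\mathbf{B}'\mathbf{P}_{conc}' \right\|^2 = \left\| \mathbf{X}_{conc} - \mathbf{T}\mathbf{P}_{conc}' \right\|^2$$

– Target structure is better obtained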
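The slides do not state how the rotation B is found. A minimal sketch, assuming W contains only zeros and ones: the weighted criterion can then be decreased by a standard majorization (missing-data imputation) scheme, in which each step fills the unweighted entries with the current rotated loadings and solves an ordinary orthogonal Procrustes problem. This is a swapped-in technique, not necessarily the DISCO-SCA algorithm, and the function names are hypothetical. The last lines illustrate the "fit remains" property above.

```python
import numpy as np


def weighted_loss(P, B, W, P_target):
    """Weighted squared distance between the rotated loadings and the target."""
    return np.sum((W * (P_target - P @ B)) ** 2)


def rotate_to_target(P, W, P_target, n_iter=1000, tol=1e-12):
    """Orthogonal B that decreases ||W o (P_target - P B)||^2 for a 0/1 weight matrix W."""
    B = np.eye(P.shape[1])
    losses = [weighted_loss(P, B, W, P_target)]
    for _ in range(n_iter):
        # Majorization: impute the unweighted entries with the current rotated loadings.
        Z = W * P_target + (1.0 - W) * (P @ B)
        # Orthogonal Procrustes: B = U V' from the SVD of P'Z.
        U, _, Vt = np.linalg.svd(P.T @ Z)
        B = U @ Vt
        losses.append(weighted_loss(P, B, W, P_target))
        if losses[-2] - losses[-1] < tol:
            break
    return B, losses


rng = np.random.default_rng(3)
X = rng.standard_normal((15, 8))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
T, P = U[:, :3], Vt[:3].T * s[:3]          # SCA solution to be rotated
W = np.zeros((8, 3))
W[4:, 0] = 1.0                              # component 1: distinctive for the first 4 variables
W[:4, 1] = 1.0                              # component 2: distinctive for the last 4 variables
B, losses = rotate_to_target(P, W, np.zeros((8, 3)))
fit_before = np.sum((X - T @ P.T) ** 2)
fit_after = np.sum((X - (T @ B) @ (P @ B).T) ** 2)   # unchanged because B B' = I
```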
SLIDE 51
  • Consequences:

– Rotation is sometimes not powerful enough: the difference between the rotated component loadings and the target matrix can remain too large.

  • Solution: DISCO-GSCA
SLIDE 52

GCA

  • GCA: imposes a target structure on a component solution:

$$\min_{\mathbf{T},\,\mathbf{P}} \left\| \mathbf{X} - \mathbf{T}\mathbf{P}' \right\|^2 + \lambda \left\| \mathbf{W} \circ \left( \mathbf{P}^{target} - \mathbf{P} \right) \right\|^2$$

  • Needed:

– Extension towards coupled data
– A common and distinctive target structure
– An orthogonality restriction on the components (as opposed to correlations between components that should not share any information)

  • Solution: DISCO-GSCA

SLIDE 55

Model Selection

  • Data = E. coli data set
  • Model selection: 3 steps

– FIRST: select the number of simultaneous components

  • (SCA, Van Deun et al., 2009)

– SECOND: characterize these components (i.e., select target)

  • (DISCO-SCA, Schouteden et al., 2010)

– THIRD: define λ

  • (Hansen, 1992)
SLIDE 56
  • STEP 1: define the number of simultaneous components

– Simultaneous component scree-plot (Van Deun et al., 2009)
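A minimal sketch of the quantities behind such a scree plot, assuming (the slides do not specify) that the plot shows the proportion of variance of X_conc accounted for by SCA solutions with R = 1, 2, … components. Under that assumption these values follow directly from the singular values, and the elbow is then judged visually. The function name `sca_scree` and the data are hypothetical.

```python
import numpy as np


def sca_scree(X_conc, R_max):
    """Proportion of variance accounted for by SCA with R = 1 .. R_max components."""
    s = np.linalg.svd(X_conc, compute_uv=False)
    return np.cumsum(s[:R_max] ** 2) / np.sum(s ** 2)


rng = np.random.default_rng(4)
X_conc = rng.standard_normal((30, 12))
vaf = sca_scree(X_conc, R_max=8)
for R, v in enumerate(vaf, start=1):
    print(f"R = {R}: VAF = {v:.2f}")
```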

SLIDE 60
  • STEP 2: characterization of the components (Schouteden et al., 2010)

– (Non-)congruence criterion FOR EACH POSSIBLE TARGET MATRIX

  • = sum of the percentages of variance accounted for by the (rotated) distinctive components in the ‘wrong’ data blocks

– Taking the number of distinctive components into account

[Figure: three candidate target matrices P_conc^target for five components, with two, three, and four distinctive components, respectively (x in the block a component underlies, 0 in the other block)]
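A minimal sketch of the criterion for one candidate target, assuming the rotated scores are orthonormal so that the share of component r in block k is ‖p_kr‖² / ‖X_k‖². The slides do not say how the number of distinctive components is taken into account, so the hypothetical function below returns both the raw sum and that sum divided by the number of distinctive components. The rotation towards each candidate target is not shown.

```python
import numpy as np


def non_congruence(blocks, P_blocks, labels):
    """Percentage of variance that distinctive components account for in the 'wrong' blocks.

    blocks: list of I x J_k data blocks; P_blocks: matching J_k x R rotated loadings.
    labels: per component, "C" (common) or the index of the block it is distinctive for.
    Returns (raw sum, sum per distinctive component).
    """
    total, n_distinctive = 0.0, 0
    for r, lab in enumerate(labels):
        if lab == "C":
            continue
        n_distinctive += 1
        for k, (X_k, P_k) in enumerate(zip(blocks, P_blocks)):
            if k != lab:  # variance in a block the component should not underlie
                total += 100.0 * np.sum(P_k[:, r] ** 2) / np.sum(X_k ** 2)
    total = float(total)
    return total, (total / n_distinctive if n_distinctive else 0.0)


# Toy loadings: component 0 loads only on block 0, component 1 only on block 1.
blocks = [np.ones((2, 2)), np.ones((2, 2))]
P_blocks = [np.array([[1.0, 0.0], [1.0, 0.0]]), np.array([[0.0, 2.0], [0.0, 0.0]])]
print(non_congruence(blocks, P_blocks, [0, 1]))   # labels that match the loadings
print(non_congruence(blocks, P_blocks, [1, 0]))   # swapped labels
```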

SLIDE 64

STEP 3: define λ with the L-curve (Hansen, 1992)

[Figure: L-curve with the solutions for λ = 0.1, λ = 1, and λ = 10 marked]
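A minimal sketch of how the points of such an L-curve could be computed, assuming the two axes are the fit term ‖X_conc − TP_conc'‖² and the target penalty ‖W ∘ (P_conc^target − P_conc)‖² of the DISCO-GSCA criterion (the slides do not label the axes). The compact solver uses a hypothetical alternating least-squares scheme. The corner is located with a simple maximum-distance-to-chord heuristic on the log-log curve, a swapped-in stand-in for Hansen's maximum-curvature criterion rather than the authors' procedure.

```python
import numpy as np


def fit_and_penalty(X, W, P_target, lam, n_iter=300):
    """Run a hypothetical ALS for DISCO-GSCA; return (fit term, penalty term)."""
    R = P_target.shape[1]
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T, P = U[:, :R], Vt[:R].T * s[:R]
    W2 = W ** 2
    for _ in range(n_iter):
        P = (X.T @ T + lam * W2 * P_target) / (1.0 + lam * W2)   # closed-form P-step
        U, _, Vt = np.linalg.svd(X @ P, full_matrices=False)
        T = U @ Vt                                              # Procrustes T-step
    fit = np.sum((X - T @ P.T) ** 2)
    penalty = np.sum((W * (P_target - P)) ** 2)
    return fit, penalty


def l_curve_corner(fits, penalties):
    """Index of the point farthest from the chord joining the end points (log-log)."""
    x, y = np.log(fits), np.log(penalties)
    p0, p1 = np.array([x[0], y[0]]), np.array([x[-1], y[-1]])
    d = (p1 - p0) / np.linalg.norm(p1 - p0)
    rel = np.column_stack([x, y]) - p0
    dist = np.abs(rel[:, 0] * d[1] - rel[:, 1] * d[0])   # perpendicular distance
    return int(np.argmax(dist))


rng = np.random.default_rng(5)
X = rng.standard_normal((20, 10))
W = np.zeros((10, 3))
W[5:, 0] = 1.0
W[:5, 1] = 1.0
lams = [0.01, 0.1, 1.0, 10.0, 100.0]
points = [fit_and_penalty(X, W, np.zeros((10, 3)), lam) for lam in lams]
fits, penalties = map(np.array, zip(*points))
corner = lams[l_curve_corner(fits, penalties)]
```

Plotting log(fits) against log(penalties) gives the L-shaped curve; the chosen λ sits near its bend.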