SVD-based Functional ANOVA for Measurement Evaluation of MALDI-TOF Mass Spectrometry of Polymers

SLIDE 1

SVD-based Functional ANOVA for Measurement Evaluation of MALDI-TOF Mass Spectrometry of Polymers*

Z.Q. John Lu (John.lu@nist.gov)

Statistical Engineering Division, ITL
National Institute of Standards and Technology

*Acknowledgement: Presentation based in part on collaboration with Charles Guttman (NIST), Stephanie Wetzel (NIST), Jennifer Huckett (Iowa State).

SLIDE 2

5/28/04 Interface2004 2

  • I. Background
  • 1. Statistics in biology: the beginning of mathematical statistics (R. A. Fisher); statistical genetics; genomics (microarrays), proteomics, systems biology (Norbert Wiener).
  • 2. High-throughput experiments and modern metrology: large data (many variables, high p) and, unfortunately, not many samples (low n)!

SLIDE 3

A G E P Box view of statistical learning?

  • H. Kitano 2002, Science, Systems Biology

SLIDE 4

  • 3. Biomarker hunting from serum mass spectra
  • Qu, Y., B. Adam, et al. (2003): Biometrics, 59, 143-151.
– p = 48,538, n = 248 (167 cancer patients and 81 controls)
– Test samples = 45
  • Claim: data too large for PCA or SVD
– Opts to use a wavelet transform and the K-L information criterion for variable selection, then applies Fisher's discriminant analysis. Evaluating the classifier: sensitivity = 98%, specificity = 99%.
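
One way to probe the "too large for PCA or SVD" claim: with n samples and p much larger than n, the centered data matrix has rank at most n - 1, so a thin SVD needs only n components. The sketch below uses synthetic data; the sizes, the function name `thin_pca`, and the random matrix are assumptions for illustration and are not the analysis of Qu et al.

```python
import numpy as np

def thin_pca(X):
    """PCA of an n x p matrix (rows = samples) via the thin SVD."""
    Xc = X - X.mean(axis=0)             # sweep out the column means
    # full_matrices=False gives U (n x k), s (k,), Vt (k x p), k = min(n, p)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                      # sample coordinates on the principal axes
    return scores, s, Vt

rng = np.random.default_rng(0)
n, p = 50, 5000                         # hypothetical sizes with n << p
X = rng.normal(size=(n, p))             # synthetic stand-in for spectra
scores, s, Vt = thin_pca(X)
print(scores.shape, Vt.shape)           # (50, 50) (50, 5000)
```

The cost is driven by the smaller dimension n, which is why a low-n, high-p data set need not rule out SVD on size alone.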

SLIDE 5

[Figure: SELDI Serum Protein Profile Analysis, Prostate Cancer. Panels for samples N1-N6, B1-B3, C1-C6; horizontal axis ticks at 2000, 6000, 10000. Source: Bao-ling Adam et al., 2004 Interface.]

SLIDE 6

  • II. Statistical and Metrological Issues
  • High-dimensional pattern recognition and regression modeling (prediction)
– What are the underlying assumptions for pattern recognition methods such as the simple hyperplane classifier?
– Hidden and intrinsic low-dimensional predictors and classifying variables?
– Need methods that are robust, data- and user-friendly, algorithmic, fast, and that scale well
  • Repeatability and reproducibility controls (Baggerly et al. 2004, Bioinformatics)
  • Experimental design to validate conclusions: sample size requirements

SLIDE 7

Biostatistics to the rescue?

  • Boguski and McIntosh 2003: “The analysis of proteomics data is currently informal and relies heavily on expert opinion”.
  • Recommend: better study design to avoid confounding factors
  • Study of biological variability and reproducibility

SLIDE 8

  • III. MALDI-TOF Mass Spectrometry of Synthetic Polymers

1. The molecular mass (weight) distribution (MMD) of a synthetic polymer is studied using MALDI-TOF MS. The MMDs and the moments derived from MALDI are compared with the values determined by traditional methods, including size exclusion chromatography, light scattering, and osmometry.

2. Evaluate effects from experimental design of various instrument settings and sample preparation.

SLIDE 9

[Figure: mass spectra plotted against m/z (axis ticks 6000 to 12000), with points drawn as the digits 1 through 8.]

SLIDE 10

[Figure: raw data, detector voltage (axis ticks 0.0 to 0.05) vs. m/z (axis ticks 6000 to 12000).]

SLIDE 11

[Figure: raw data, normalized intensity (axis ticks 0.0 to 0.025) vs. m/z.]

SLIDE 12

Classical Polymer Approaches

– Mn = number-average molecular weight: first moment
– Mw = weight-average molecular weight: 2nd moment / 1st moment
– Mz = z-average molecular weight: third moment / 2nd moment
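
The three averages above can be computed directly from a spectrum. The sketch below assumes that the signal at each mass M_i is proportional to the number of molecules n_i at that mass, and reads "first moment" as the first moment divided by the total signal; the function name and the toy numbers are hypothetical.

```python
import numpy as np

def molecular_moments(mass, intensity):
    """Return (Mn, Mw, Mz) from masses M_i and non-negative signal n_i."""
    mass = np.asarray(mass, dtype=float)
    w = np.asarray(intensity, dtype=float)
    m0 = w.sum()                    # zeroth moment: sum n_i
    m1 = (w * mass).sum()           # first moment:  sum n_i M_i
    m2 = (w * mass ** 2).sum()      # second moment: sum n_i M_i^2
    m3 = (w * mass ** 3).sum()      # third moment:  sum n_i M_i^3
    return m1 / m0, m2 / m1, m3 / m2

# Toy distribution (hypothetical numbers, not measured data)
mass = [1000.0, 2000.0, 3000.0]
intensity = [1.0, 2.0, 1.0]
Mn, Mw, Mz = molecular_moments(mass, intensity)
print(Mn, Mw)                       # 2000.0 2250.0
```

For any non-degenerate distribution the ordering Mn <= Mw <= Mz holds, which gives a quick sanity check on computed values.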

SLIDE 13

[Figure: molecular moments. M (axis ticks 8400 to 9000) vs. run number, with points labeled n, w, z.]

SLIDE 14

[Figure: average molecular moments. M (axis ticks 8400 to 9000) vs. setting number (axis ticks 2 to 8), with points labeled n, w, z.]

SLIDE 15

Proposed Analysis Methods

  • Standard Functional ANOVA
– Analysis on the points in each spectrum at each m/z value
– Does not take into account the continuity of the data
  • Singular Value Decomposition-based
– Singular value decomposition of fitted run spectra and setting mean spectra
– Does take into account the continuity of the data
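
The pointwise approach in the first bullet can be sketched as a one-way ANOVA F statistic computed separately at every m/z index. This is a minimal illustration on synthetic data, not the presentation's own code; the function name `pointwise_f`, the layout of 8 settings with 3 repeats, and the planted effect are assumptions.

```python
import numpy as np

def pointwise_f(spectra, setting):
    """One-way ANOVA F statistic at each m/z index.

    spectra: (runs x m/z) array; setting: group label for each run.
    """
    spectra = np.asarray(spectra, dtype=float)
    setting = np.asarray(setting)
    groups = np.unique(setting)
    n, k = spectra.shape[0], groups.size
    grand = spectra.mean(axis=0)
    ss_between = np.zeros(spectra.shape[1])
    ss_within = np.zeros(spectra.shape[1])
    for g in groups:
        block = spectra[setting == g]
        mean_g = block.mean(axis=0)
        ss_between += block.shape[0] * (mean_g - grand) ** 2
        ss_within += ((block - mean_g) ** 2).sum(axis=0)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

rng = np.random.default_rng(1)
setting = np.repeat(np.arange(8), 3)      # hypothetical: 8 settings x 3 repeats
spectra = rng.normal(size=(24, 80))       # synthetic spectra, 80 mass points
spectra[setting == 0, 40] += 10.0         # planted setting effect at index 40
F = pointwise_f(spectra, setting)
print(F.argmax())                         # 40
```

Each m/z index is tested in isolation here, which is exactly the limitation the slide notes: neighboring points along the mass axis are treated as unrelated.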

SLIDE 16

SVD Analysis Results: I

[Figure panels: (1) Mean-swept mass spectra: deviations vs. index of mass (axis ticks 10 to 80). (2) SVD spectrum of raw data (+) and of noise (o): singular values vs. dimension. (3) Columns of V matrix (PCA): coefficients, with curves labeled 1, 2, 3.]
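
The quantities named in these panels (mean-swept spectra, singular values of data versus noise, columns of V) can be reproduced in outline on synthetic data. Everything in this sketch is an assumption for illustration: the sizes, the Gaussian peak shape, and the noise level are not taken from the presentation.

```python
import numpy as np

rng = np.random.default_rng(2)
runs, n_mass = 24, 80                               # hypothetical sizes
grid = np.linspace(0.0, 1.0, n_mass)
peak = np.exp(-0.5 * ((grid - 0.5) / 0.1) ** 2)     # smooth synthetic spectrum
amp = 1.0 + 0.2 * rng.normal(size=(runs, 1))        # run-to-run amplitude change
noise = 0.01 * rng.normal(size=(runs, n_mass))
X = amp * peak + noise                              # rows = runs, columns = mass

Xc = X - X.mean(axis=0)                             # sweep out the mean spectrum
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
s_noise = np.linalg.svd(noise - noise.mean(axis=0), compute_uv=False)

# Rows of Vt are the columns of V: directions along the mass axis
print(np.round(s[:3], 3), np.round(s_noise[:3], 3))
```

In this toy setting the leading singular value stands well above the noise spectrum and the first column of V follows the peak shape, which is the kind of separation the "+" versus "o" panel is meant to display.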

SLIDE 17

SVD Analysis Results: II

[Figure panels: (1) Time plot of the 3-d representational points: SVD coordinates vs. area (TopRight, TopLeft, SideFront, SideBack, SideRight, SideLeft, BottomRight, BottomLeft), with curves labeled 1, 2, 3. (2) Scatter plot of coordinates 1 and 2. (3) Scatter plot of coordinates 1 and 3. (4) Scatter plot of coordinates 2 and 3. Scatter-plot points are labeled 1 through 8.]

SLIDE 18

  • IV. Statistical Theory
  • Determine the significance of singular values
  • Develop MANOVA for SVD-derived U vectors for reproducibility tests
  • Hidden structure and hidden dimension: how do they determine the sample size requirement?
  • Curse of dimensionality: over-hyped phenomenon (?) Lu (1999, J. Multivariate Analysis)
  • SVD-based high-dimensional regression such as multivariate locally weighted regression: popular tool for multivariate calibration in chemometrics
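
The slide leaves the significance test for singular values open. One generic option, shown here only as a stand-in and not as the author's method, is a permutation check in the spirit of parallel analysis: shuffle each column independently to destroy structure shared across columns, then compare the observed singular values with the shuffled ones. The function name, the number of permutations, and the synthetic data are assumptions, and the comparison for singular values beyond the first is a rough heuristic.

```python
import numpy as np

def permutation_sv_pvalues(X, n_perm=200, seed=0):
    """Permutation p-values for the singular values of column-centered X."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    s_obs = np.linalg.svd(Xc, compute_uv=False)
    exceed = np.zeros_like(s_obs)
    for _ in range(n_perm):
        # Shuffle every column on its own to break cross-column structure
        Xp = np.column_stack([rng.permutation(col) for col in Xc.T])
        exceed += np.linalg.svd(Xp, compute_uv=False) >= s_obs
    return s_obs, (exceed + 1) / (n_perm + 1)

rng = np.random.default_rng(3)
u = rng.normal(size=(24, 1))
v = rng.normal(size=(1, 80))
X = u @ v + 0.5 * rng.normal(size=(24, 80))   # rank-one signal plus noise
s_obs, pvals = permutation_sv_pvalues(X, n_perm=200)
print(np.round(pvals[:3], 3))
```

A small p-value for the first singular value indicates structure beyond what independent columns would produce; it says nothing by itself about which experimental factor causes it.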

SLIDE 19

Related Papers:

  • Z.Q. Lu (1999). Nonparametric Regression with Singular Design. Journal of Multivariate Analysis, 70, 177-201. (What is the curse of dimensionality?)
  • Z.Q. Lu, J. Huckett, S.J. Wetzel, C.M. Guttman (2004). Functional ANOVA for Mass Spectral Data Analysis in Synthetic Polymers. Paper in preparation. To be submitted to the Journal of the American Statistical Association. (The paper on statistics for synthetic polymers)
  • Z.Q. Lu (2004). Singular Value Decomposition Methods in Statistics and High-dimensional Data Analysis. Paper in preparation. (The SVD theory paper in statistics)