A FRAMEWORK FOR STATISTICAL ANALYSIS OF MASS SPECTROMETRY IMAGING EXPERIMENTS
Kylie Bemis
Purdue University, Department of Statistics
OUTLINE
- Statement of the problem
- Biotechnological problem
- Statistical and computational problem
- Statement of contributions
- Open-source software
- matter: Rapid prototyping with data on disk
- Cardinal: Statistical toolbox for mass spectrometry imaging experiments
- Statistical methods
- Spatial shrunken centroids
- Evaluation and case studies
- Summary
- Conclusions
- Future work
MASS SPECTROMETRY IMAGING
Investigate spatial distribution of analytes
- Scan with laser/spray
- Collect mass spectra
- Reconstruct ion images
- Data “cube”
R. Graham Cooks and lab
BIOTECHNOLOGICAL PROBLEM
- Rapidly advancing technology
- Increasing mass resolutions
- Greater mass accuracy and range
- More features (larger P)
- Increasing spatial resolutions
- Approaching 1 µm resolution
- More pixels (larger N)
- More complex experiments
- 3D experiments
- Time-course experiments
- Increasing sample size
- More biological replicates
- More pixels (larger N)
STATISTICAL & COMPUTATIONAL PROBLEM
- Complex, high-dimensional data
- Spatial x, y dimensions
- Potentially z, t dimensions
- Mass spectral features (m/z values)
- Correlation structures
- Spatial (and possibly temporal)
- Between mass spectral features
- Increasing mass+spatial resolutions
- Larger(-than-memory) datasets
- Can range from 100 MB to 100 GB
- Experimental design
- Variation across samples+slides
- What counts as a replicate?
PROBLEM STATEMENT
- Biotechnological problem
- Mass spectrometry (MS) imaging has advanced at a rapid pace
- Computational tools have not advanced at a comparable pace
- Lack of free, open-source tools for statistical analysis
- Need for classification/segmentation with statistical inference:
- Classification: Classify pixels based on their mass spectral profiles into pre-defined classes (such as healthy/disease status)
- Segmentation: Assign pixels to newly discovered segments with relatively homogeneous and distinct mass spectral profiles
- Select a subset of informative mass spectral features
- Statistical and computational problem
- MS imaging experiments result in complex, high-dimensional data
- Spatial structure in datasets with large P and large N
- Statistical computing on larger-than-memory data is a challenge
STATEMENT OF CONTRIBUTIONS
- Statistical methods: spatial shrunken centroids
- Classification and segmentation for MS imaging experiments
- Probabilistic model using spatial information
- Selection of most informative mass spectral features
- Open-source software: Cardinal
- Free, open-source R package for MS imaging experiments
- Full pipeline including processing, visualization, and statistical analysis
- For experimentalists, provides accessible statistical methods
- For statisticians, provides infrastructure for method development
- Open-source software: matter
- Free, open-source R package for rapid prototyping with data-on-disk
- Flexible statistical computing and method development for larger-than-memory datasets
- Enables Cardinal to scale to high-resolution, high-throughput MS imaging experiments
- Evaluation and case studies
- Public datasets and reproducible results in CardinalWorkflows
- Community impact of this work
PROBLEM: LARGER-THAN-MEMORY DATA
challenges statistical method development
[Figure: 3D ion images (x, y, z) at m/z = 715.03 for t = 4, t = 8, and t = 11]
- MS imaging experiments rapidly advancing
- Increasing mass and spatial resolutions
- Larger sample sizes, multiple files
- Growing data size poses difficulty for statistics
- Need to test methods on larger-than-memory data
- Need to work with domain-specific formats
- Current R solutions are inflexible
Cardinal help Google group
CONTRIBUTION: MATTER
open-source statistical computing with data on disk
[Diagram: File 1, File 2, and File 3 in storage; a matter object made of Atom 1 through Atom 6]
- Work with larger-than-memory datasets on disk in R
- Emphasizes flexibility with a minimal memory footprint
- Adaptable to more datasets than bigmemory and ff
- Potentially slower computation
- Designed for statistical method development in R
- Rapid prototyping with minimal additional effort
- Works with many existing algorithms
- Efficient calculation of summary statistics
Infrastructure for statistical computing on large data
NEED TO WORK WITH MS IMAGING FILES
e.g., “processed” and “continuous” imzML
[Diagram: binary file layouts]
“processed” imzML: UUID, mzArray 1, intensityArray 1, mzArray 2, intensityArray 2, mzArray 3, intensityArray 3
“continuous” imzML: UUID, mzArray, intensityArray 1, intensityArray 2, intensityArray 3, intensityArray 4, intensityArray 5
- Open-source format for MS imaging experiments
- XML metadata file defines binary data file structure
- Binary data schema is incompatible with bigmemory and ff
- Prefer to avoid additional file conversion
- Need random access into different parts of the file
- Often one-sample-per-file
- Need to seamlessly work with multiple files in an experiment
- Each file can be very large
- matter solves these problems
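The need for random access into part of a binary file can be illustrated with a minimal base R sketch. It assumes a toy file laid out like the “continuous” case (a 16-byte UUID, one shared m/z array, then one intensity array per pixel, all as 8-byte doubles). The file, the helper `read_spectrum`, and the fixed offsets are hypothetical; in a real imzML experiment the offsets come from the XML metadata, and this is not matter's own API.

```r
# Toy "continuous"-style binary file: 16-byte UUID, shared m/z array,
# then one intensity array per pixel (all values stored as 8-byte doubles).
n_features <- 5L
n_pixels <- 4L
mz <- seq(100, 500, length.out = n_features)
intensity <- matrix(as.numeric(seq_len(n_features * n_pixels)),
                    nrow = n_features, ncol = n_pixels)

path <- tempfile(fileext = ".ibd")
con <- file(path, "wb")
writeBin(as.raw(1:16), con)                     # stand-in for the UUID
writeBin(mz, con, size = 8L)                    # shared m/z array
writeBin(as.vector(intensity), con, size = 8L)  # pixel 1, pixel 2, ...
close(con)

# Hypothetical helper: read one pixel's spectrum without loading the file.
read_spectrum <- function(path, pixel, n_features) {
  # Skip the UUID, the m/z array, and all earlier pixels
  offset <- 16 + 8 * n_features + 8 * n_features * (pixel - 1)
  con <- file(path, "rb")
  on.exit(close(con))
  seek(con, where = offset, origin = "start")
  readBin(con, what = "double", n = n_features, size = 8L)
}

spectrum3 <- read_spectrum(path, pixel = 3L, n_features = n_features)
```

Only the requested spectrum is read into memory, which is the property the bullets above ask for; a “processed” file would need a per-pixel offset table instead of the fixed stride assumed here.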
FLEXIBLE ACCESS TO DATA ON DISK
any binary format, any file structure
[Diagram: File 1 holds metadata and Columns A to D; File 2 holds metadata and Columns E to H; the matter matrix is built from Columns A, C, F, and H]
- User-defined file structure
- Data can come from anywhere
- Any part of a file
- Any combination of files
- Representation in R can be different from on disk
- Access as ordinary R vector/matrix
- No need to worry about data size or memory management
EXAMPLE: LINEAR REGRESSION
with a 1.2 GB simulated dataset and biglm

Method                Memory Used   Memory Overhead   Time
R matrices + lm       7 GB          1.4 GB            33 sec
R matrices + biglm    2.7 GB        1.3 GB            158 sec
bigmemory + biglm     1.7 GB        397 MB            21 sec
matter + biglm        466 MB        319 MB            42 sec

[Chart: Memory Used (MB) and Memory Overhead (MB) for each method]
- 1.2 GB dataset
- N = 15,000,000 observations
- P = 9 variables
- Linear regression
- Using biglm package
- Specifically for large datasets
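Why regression can run with bounded memory is easiest to see in a small sketch. The base R code below accumulates the cross-products X'X and X'y one chunk of rows at a time and solves the normal equations at the end. This is an illustration of the general idea only: biglm uses its own, more numerically careful update, and the simulated data, chunk size, and variable names here are assumptions made for the example.

```r
set.seed(1)
n <- 10000L
p <- 3L
X <- cbind(1, matrix(rnorm(n * p), nrow = n))   # design matrix with intercept
beta_true <- c(2, -1, 0.5, 3)
y <- as.vector(X %*% beta_true + rnorm(n))

# Accumulate X'X and X'y chunk by chunk, so that only one chunk of rows
# (not the whole dataset) has to be held in memory at a time.
chunk_size <- 1000L
xtx <- matrix(0, ncol(X), ncol(X))
xty <- numeric(ncol(X))
for (start in seq(1L, n, by = chunk_size)) {
  rows <- start:min(start + chunk_size - 1L, n)
  chunk <- X[rows, , drop = FALSE]
  xtx <- xtx + crossprod(chunk)
  xty <- xty + as.vector(crossprod(chunk, y[rows]))
}
beta_chunked <- solve(xtx, xty)

# Reference fit with everything in memory
beta_lm <- unname(coef(lm(y ~ X - 1)))
```

With data on disk, each `chunk` would be read from the file instead of sliced from an in-memory matrix; the accumulated quantities stay P x P regardless of N.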
EXAMPLE: PRINCIPAL COMPONENTS ANALYSIS
with a 1.2 GB simulated dataset and irlba

Method                Memory Used   Memory Overhead   Time
R matrices + svd      3.6 GB        2.4 GB            62 sec
R matrices + irlba    2.3 GB        961 MB            9 sec
bigmemory + irlba     3.5 GB        962 MB            9 sec
matter + irlba        522 MB        427 MB            171 sec

[Chart: Memory Used (MB) and Memory Overhead (MB) for each method]
- 1.2 GB dataset
- N = 15,000,000 observations
- P = 10 variables
- PCA
- Using irlba package
- Not specifically for large datasets
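A rough sketch of how PCA can be computed without holding all N rows in memory, assuming P is small enough that a P x P matrix fits easily. It accumulates column sums and cross-products over chunks, forms the covariance matrix, and takes its eigendecomposition. This is a swapped-in, simpler technique, not the algorithm irlba uses (irlba is an iterative truncated SVD); the simulated data and chunk size are illustrative assumptions.

```r
set.seed(2)
n <- 5000L
p <- 4L
X <- matrix(rnorm(n * p), nrow = n) %*% diag(c(3, 2, 1, 0.5))

# Accumulate column sums and X'X over chunks of rows
chunk_size <- 500L
col_sum <- numeric(p)
cross <- matrix(0, p, p)
for (start in seq(1L, n, by = chunk_size)) {
  rows <- start:min(start + chunk_size - 1L, n)
  chunk <- X[rows, , drop = FALSE]
  col_sum <- col_sum + colSums(chunk)
  cross <- cross + crossprod(chunk)
}

# Covariance from the accumulated sums: (X'X - n * mean mean') / (n - 1)
center <- col_sum / n
covariance <- (cross - n * tcrossprod(center)) / (n - 1)
eig <- eigen(covariance, symmetric = TRUE)
sdev_chunked <- sqrt(eig$values)   # standard deviations of the PCs

# Reference: in-memory PCA
sdev_ref <- prcomp(X)$sdev
```

For MS imaging data with tens of thousands of features, a P x P matrix is no longer small, which is one reason an iterative method such as irlba is the more natural fit there.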
EXAMPLE: PRINCIPAL COMPONENTS ANALYSIS
with a 2.85 GB microbial time-course experiment
Oetjen et al, Gigascience, 2015
[Figure: 3D ion images (x, y, z) at m/z 262 and PC1 scores for t = 4, t = 8, and t = 11; PC1 loadings]
- 3D microbial time-course
- 2.85 GB on disk
- 17,672 pixels
- 40,299 features
- 234 MB to compute 3 PCs
- 79 MB memory overhead
- 418 sec per PC
EXAMPLE: VISUALIZATION
of a 26.45 GB mouse pancreas experiment
Oetjen et al, Gigascience, 2015
[Figure: 3D ion images (x, y, z) at m/z 5086, m/z 3121, and m/z 3922; mean spectrum]
- 3D mouse pancreas
- 26.45 GB on disk
- 497,225 pixels
- 13,312 features
- 1.25 GB used in-memory
- 223 MB to calculate mean spectrum
cannot load at all without matter
PROBLEM: LACK OF SOFTWARE
for statistical analysis of MS imaging experiments
- Few free, open-source tools exist
- Most incapable of handling large datasets from multiple files
- Lack of extensibility by statisticians and computer scientists
- Little focus on statistical analysis and experimental design
- Focus on visualization of molecular ion images and mass spectra
- Some computational algorithms without statistical inference
- MSiReader
- Free, open-source
- Requires Matlab
- SCiLS
- Commercial, proprietary
- Requires Bruker instruments
CONTRIBUTION: CARDINAL
open-source statistical software for MS imaging
K. D. Bemis, A. Harry, L. S. Eberlin, C. Ferreira, S. M. van de Ven, P. Mallick, M. Stolowitz, O. Vitek. “Cardinal: an R package for statistical analysis of mass spectrometry-based imaging experiments”. Bioinformatics, 31:2418, 2015
- Free, open-source
- R-based
- Available on Bioconductor
- Source code on GitHub
www.cardinalmsi.org
- >1,800 unique downloads since public release on April 17, 2015
- Winner of the 2015 John M. Chambers Statistical Software Award
- Last release on May 4, 2016 with Bioconductor 3.3
SOFTWARE FOR MSI EXPERIMENTS
- Format support
- imzML (continuous & processed) and Analyze 7.5
- Visualization
- Plotting of mass spectra and molecular images
- Spectral processing
- Normalization, smoothing, baseline reduction, peak picking
- Image processing
- Contrast enhancement, spatial smoothing
- Statistical analysis
- PCA, PLS, spatial shrunken centroids (classification & segmentation)
focus on experiments, not just datasets
EFFICIENT, MODULAR DATA STRUCTURES
- iSet
- Virtual class for imaging experiments
- MSImageSet
- Mass spectrometry imaging experiments
- MSImageData
- Efficient storage of mass spectra and reconstruction of images
- MSImageProcess
- Tracks pre-processing applied to mass spectra
- IAnnotatedDataFrame
- Tracks pixel-level metadata
- MIAPE-Imaging
- Minimum Information About a Proteomics [Imaging] Experiment
- ResultSet
- Stores results of statistical analyses on imaging experiments
VISUALIZATION TOOLS
Plot top 9 ion images from segmentation (across all segments)
library(CardinalWorkflows)
data(cardinal, cardinal_analyses)
top <- topLabels(cardinal.sscg, model=list(r=1, k=10, s=3), n=9)
image(cardinal, mz=top$mz, plusminus=0.5,
      normalize.image="linear", contrast.enhance="histogram",
      layout=c(3,3))
VISUALIZATION TOOLS
Recreate painting from overlay of ion images
image(cardinal, mz=c(207.08, 235, 255.25, 265.17, 649.17), plusminus=0.5,
      normalize.image="linear", contrast.enhance="histogram",
      col=c("red", "darkred", "gray", "black", "brown"),
      superpose=TRUE)
SPECTRAL AND IMAGE PROCESSING
smoothSignal(Brain_1, plot=TRUE)
reduceBaseline(Brain_1, plot=TRUE)
peakPick(Brain_1, plot=TRUE)
[Figure: ion images of Brain_1 at m/z = 9984.72, one for each of the three image() calls]
image(Brain_1, mz=9984.7)
image(…, contrast.enhance="histogram")
image(…, smooth.image="gaussian")
APPLY FUNCTIONS OVER IMAGES
with pixelApply and featureApply
- All pre-processing methods in Cardinal
- Can take user-specified functions for custom processing
- Are wrappers around pixelApply or featureApply
- pixelApply and featureApply
- Allow applying arbitrary functions over imaging experiments
- Allow conditioning on groups of pixels and/or features
standardize <- function(x) x / sum(x)
# TIC normalization
pixelApply(data, .fun=standardize)
# Standardize samples
featureApply(data, .fun=standardize, .pixel.groups=sample)
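To make the two apply directions concrete without requiring Cardinal, here is a base R analogue on a plain matrix whose rows are m/z features and whose columns are pixels. It only mimics the idea of applying a function per pixel or per feature, optionally within groups of pixels; it is not how Cardinal implements pixelApply or featureApply, and the toy matrix and `sample_id` grouping are made up for the example.

```r
# Toy intensity matrix: rows are m/z features, columns are pixels.
spectra <- matrix(c(1, 3, 6,
                    2, 2, 4,
                    5, 5, 0,
                    1, 0, 1), nrow = 3)
sample_id <- c("a", "a", "b", "b")   # hypothetical pixel grouping

standardize <- function(x) x / sum(x)

# Per-pixel (column-wise): TIC-normalize each mass spectrum.
tic_normalized <- apply(spectra, 2, standardize)

# Per-feature (row-wise), conditioned on pixel groups: standardize each
# feature's intensities separately within each sample.
by_sample <- t(apply(spectra, 1, function(feature) {
  ave(feature, sample_id, FUN = standardize)
}))
```

After this runs, every column of `tic_normalized` sums to 1, and within each row of `by_sample` the values for sample "a" and for sample "b" each sum to 1.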
ANALYZE BIGGER EXPERIMENTS
using data-on-disk with matter
Work with arbitrarily large datasets from any number of files
Example: 26.45 GB dataset on a 16 GB laptop (Oetjen et al, Gigascience, 2015)
mouse <- readMSIData("3D_Mouse_Pancreas.imzML")
pData(mouse)$TIC <- pixelApply(mouse, sum)
image3D(mouse, TIC ~ x * y * z)
[Figure: 3D image (x, y, z) of TIC]
PROBLEM: NEED FOR STATISTICAL INFERENCE
incorporating the spatial information in the experiment
- Few statistical methods being developed for MS imaging
- Current algorithms do not do statistical inference
- Feature selection is post-hoc and heuristic
- Existing statistical methods are inappropriate or inefficient
- Spatial statistics methods do not yet scale to many features
- Few other methods can incorporate spatial information
CONTRIBUTION: SPATIAL SHRUNKEN CENTROIDS
spatially-aware classification/segmentation with feature selection
- Combines spatial information & feature selection
- Spatially-aware distance from spatially-aware clustering (Alexandrov and Kobarg, 2011)
- Statistical regularization from nearest shrunken centroids (Tibshirani, Hastie, et al., 2003)
- Improved image classification & segmentation
- Data-driven selection of appropriate number of segments
- Selects most important ions for distinguishing class/segment
- Probability model characterizes uncertainty
[Figure: t-statistics versus m/z (200 to 800) for the brain, liver, and heart segments]
t-statistics show important ions for the brain, heart, and liver segments
K. D. Bemis, A. Harry, L. S. Eberlin, C. Ferreira, S. M. van de Ven, P. Mallick, M. Stolowitz, O. Vitek. “Probabilistic segmentation of mass spectrometry images helps select important ions and characterize confidence in the resulting segments”. Molecular & Cellular Proteomics, 2016
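SPATIAL SMOOTHING
from spatially-aware clustering
Considers mass spectra from neighboring pixels (Alexandrov & Kobarg, Bioinformatics, 2011)
Spatially-aware (SA) weights: weights depend on the distance from neighborhood center
$$\alpha_{\delta_i\delta_j} = \exp\left(-\frac{\delta_i^2 + \delta_j^2}{2\sigma^2}\right)$$
Spatially-aware structurally-adaptive (SASA) weights: weights of neighbors also depend on their spectral similarity
$$\alpha_{\delta_i\delta_j}(x_{ijm}, x_{i'j'm'}) = \exp\left(-\frac{\delta_i^2 + \delta_j^2}{2\sigma^2}\right) \cdot \sqrt{\beta_{\delta_i\delta_j}(x_{ijm})\,\beta_{\delta_i\delta_j}(x_{i'j'm'})}$$
$$\beta_{\delta_i\delta_j}(x_{ijm}) = \exp\left\{-\frac{1}{2\lambda^2}\,\lVert x_{(i+\delta_i)(j+\delta_j)m} - x_{ijm} \rVert^2\right\}$$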
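As a small numerical illustration of the SA and SASA weights, the base R sketch below evaluates the Gaussian spatial weight over a (2r + 1) x (2r + 1) neighborhood and the spectral-similarity factor for one neighbor. The function names and the values of sigma and lambda are arbitrary choices for the example, not settings taken from the cited methods.

```r
# Spatially-aware (SA) Gaussian weights over a (2r + 1) x (2r + 1) neighborhood
sa_weights <- function(r, sigma) {
  offsets <- -r:r
  outer(offsets, offsets,
        function(di, dj) exp(-(di^2 + dj^2) / (2 * sigma^2)))
}

# SASA similarity factor for one neighbor spectrum versus the center spectrum
sasa_beta <- function(neighbor, center, lambda) {
  exp(-sum((neighbor - center)^2) / (2 * lambda^2))
}

w <- sa_weights(r = 1, sigma = 1)                       # 3 x 3 weight matrix
b_same <- sasa_beta(c(1, 2, 3), c(1, 2, 3), lambda = 1) # identical spectra
b_diff <- sasa_beta(c(1, 2, 3), c(3, 2, 1), lambda = 1) # dissimilar spectra
```

The center weight is 1 and weights fall off with distance; the similarity factor is 1 for identical spectra and shrinks toward 0 as spectra diverge, so dissimilar neighbors contribute less.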
FEATURE SELECTION
from nearest shrunken centroids
Tibshirani, Hastie, et al. Statistical Science, 2003
Classification: Start with labeled classes
Segmentation: Initialize segments with spatially-aware clustering
Calculate t-statistics (class centroid $\bar{x}_{kp}$, global centroid $\bar{x}_p$, pooled sd $\hat{\tau}_p$):
$$t_{kp} = \frac{\bar{x}_{kp} - \bar{x}_p}{\hat{\tau}_p \cdot \sqrt{\frac{1}{N_k} - \frac{1}{\sum_{k=1}^{K} N_k}}}$$
Shrink t-statistics (shrinkage parameter $s$):
$$t'_{kp} = \operatorname{sign}(t_{kp})\,(|t_{kp}| - s)_+, \quad \text{where } t_+ = t \text{ if } t > 0, \text{ and } t_+ = 0 \text{ if } t \le 0$$
Calculate shrunken centroids:
$$\bar{x}'_{kp} = \bar{x}_p + t'_{kp}\,\hat{\tau}_p \cdot \sqrt{\frac{1}{N_k} - \frac{1}{\sum_{k=1}^{K} N_k}}$$
Uninformative features are removed
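The shrinkage step is a soft threshold, which a few lines of base R make concrete. The t-statistics and the value of s below are invented for illustration; the function name `soft_threshold` is likewise just a label for this sketch.

```r
# Soft thresholding: move each t-statistic toward 0 by s, and set it to
# exactly 0 if its absolute value does not exceed s.
soft_threshold <- function(t, s) sign(t) * pmax(abs(t) - s, 0)

t_stats <- c(-5, -1, 0.5, 2, 8)          # made-up t-statistics for 5 features
shrunken <- soft_threshold(t_stats, s = 2)
selected <- shrunken != 0                # features that survive shrinkage
```

Features whose shrunken t-statistic is 0 for a class have a shrunken centroid equal to the global centroid, so they no longer help distinguish that class; this is how uninformative features drop out.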
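PROPOSAL:
spatial distance to shrunken centroids
Key contribution: Calculate spatially-aware distance to shrunken centroids
$$d(x_{ijm}, \bar{x}'_k) = \sum_{-r \le \delta_i, \delta_j \le r} \alpha_{\delta_i\delta_j}(x_{ijm}) \cdot \lVert x_{(i+\delta_i)(j+\delta_j)m} - \bar{x}'_k \rVert^2$$
(sum over the spatial neighborhood; $\alpha$: SA or SASA weights; $x$: mass spectrum; $\bar{x}'_k$: class centroid)
SA weights:
$$\alpha_{\delta_i\delta_j} = \exp\left(-\frac{\delta_i^2 + \delta_j^2}{2\sigma^2}\right)$$
Modified SASA weights:
$$\beta_{\delta_i\delta_j}(x_{ijm}) = \exp\left\{-\frac{1}{2\lambda^2}\,\lVert x_{(i+\delta_i)(j+\delta_j)m} - x_{ijm} \rVert^2\right\}$$
$$\alpha_{\delta_i\delta_j}(x_{ijm}) = \exp\left(-\frac{\delta_i^2 + \delta_j^2}{2\sigma^2}\right) \cdot \beta_{\delta_i\delta_j}(x_{ijm})$$
Allows feature selection + spatial smoothing
Bemis, et al. Molecular & Cellular Proteomics, 2016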
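A minimal base R sketch of the spatially-aware distance with SA weights, for a single sample stored as an array indexed by (x, y, feature). The function name, the array layout, and the choice to skip neighbors that fall outside the image are assumptions of this sketch rather than details given on the slide.

```r
# spectra: array with dim (nx, ny, n_features); centroid: length n_features
spatial_distance <- function(spectra, i, j, centroid, r, sigma) {
  total <- 0
  for (di in -r:r) {
    for (dj in -r:r) {
      ni <- i + di
      nj <- j + dj
      # Assumption: neighbors outside the image are skipped
      if (ni < 1 || nj < 1 ||
          ni > dim(spectra)[1] || nj > dim(spectra)[2]) next
      alpha <- exp(-(di^2 + dj^2) / (2 * sigma^2))   # SA weight
      total <- total + alpha * sum((spectra[ni, nj, ] - centroid)^2)
    }
  }
  total
}

spectra <- array(1, dim = c(3, 3, 2))   # every pixel has spectrum (1, 1)
d_same <- spatial_distance(spectra, 2, 2, centroid = c(1, 1), r = 1, sigma = 1)
d_far  <- spatial_distance(spectra, 2, 2, centroid = c(0, 0), r = 1, sigma = 1)
```

Because the distance pools squared differences from the whole neighborhood, a single noisy pixel surrounded by pixels close to a centroid still ends up close to that centroid, which is the spatial smoothing effect.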
CALCULATING CLASS OR SEGMENT MEMBERSHIP
Calculate discriminant scores, using spatially-aware distance to shrunken centroids (spatial distance $d$, pooled sd $\hat{\tau}_p$, prior probabilities $\pi_k$):
$$D(x_{ijm}, \bar{x}'_k) = \frac{1}{\hat{\tau}_p^2}\, d(x_{ijm}, \bar{x}'_k) - 2 \log \pi_k$$
Calculate posterior probabilities of class or segment membership:
$$\hat{p}_k(x_{ijm}) = \frac{e^{-(1/2) D(x_{ijm}, \bar{x}'_k)}}{\sum_{l=1}^{K} e^{-(1/2) D(x_{ijm}, \bar{x}'_l)}}$$
Assign pixel to class or segment with max posterior probability
Classification: Done
Segmentation: Iterate until no change in segments
Tibshirani, Hastie, et al. Statistical Science, 2003
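The step from distances to posterior probabilities can be sketched in base R as follows. For simplicity the sketch treats the pooled variance as a single scalar `tau2`, which is a simplification made for this example; the distances and priors are invented values.

```r
# d: spatially-aware distances from one pixel to each of K centroids
# prior: prior probabilities of the K classes or segments
posterior <- function(d, prior, tau2 = 1) {
  score <- d / tau2 - 2 * log(prior)      # discriminant scores D_k
  z <- exp(-0.5 * (score - min(score)))   # subtract min for numerical stability
  z / sum(z)                              # posterior probabilities
}

p <- posterior(d = c(2, 6, 10), prior = c(1, 1, 1) / 3)
assigned <- which.max(p)                  # class or segment with max posterior
```

Subtracting the minimum score before exponentiating does not change the ratios, so the result matches the formula while avoiding underflow when distances are large.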
IMPROVED SEGMENTATION
from statistical regularization and spatial information
[Figure: segmentations from K-means, PCA + K-means, SA + K-means, SASA + K-means, SA + Shrunken Centroids, and SASA + Shrunken Centroids]
Alexandrov & Kobarg, Bioinformatics, 2011: r=2, k=6
Spatial shrunken centroids: r=2, k=20, s=6 (6 segments)
SA = Spatially Aware; SASA = Spatially Aware Structurally Adaptive
DATA-DRIVEN MODEL SELECTION
for unsupervised experiments through statistical regularization
[Figure: SA + Shrunken Centroids segmentations for r = 2, k = 20 at s = 0, s = 3, s = 6, and s = 9]
[Plot: predicted # of classes versus shrinkage parameter (s) for r = 1, k = 15; r = 2, k = 15; r = 1, k = 20; r = 2, k = 20]
Empirical relationship exists between sparsity in the # of features and # of segments
r=2, k=20, s=6: 6 segments
SELECTION OF MOLECULAR FEATURES
that distinguish each segment for improved interpretability
SA + Shrunken Centroids, r=2, s=6, k=20: 6 segments
[Figure: t-statistics versus m/z (200 to 800) for the brain and liver segments; ion images at m/z = 834.5 and m/z = 537.25]
VISUALIZE UNCERTAINTY
probabilistic model characterizes uncertainty in segmentation
[Figure: segmentations of low-noise (MALDI rat), medium-noise (DESI mouse), and high-noise (MALDI mouse) datasets, with panels labeled "Optimal # of segments", "Reduced to 2 segments", "Optimal sparsity", and "Higher sparsity"; panel parameters: r = 2, k = 10, s = 0; r = 2, k = 5, s = 0; r = 3, k = 10, s = 28; r = 3, k = 5, s = 35; r = 2, k = 10, s = 5; r = 2, k = 10, s = 25]
[Plots: predicted # of classes versus shrinkage parameter (s) for each dataset, with k = 5 and k = 10]
FACILITATES INTERPRETABILITY
for supervised experiments through feature selection and probability
[Figure: ion images at m/z = 885.67 for samples UH0505_12 and UH9812_03 with cancer/normal classification; intensity versus m/z and t-statistic versus m/z (200 to 1000) for cancer and normal, r = 3, k = 2, s = 20]
r=3, s=20: Selected by cross-validation
Graham Cooks and lab, Livia Eberlin
EVALUATION AND CASE STUDIES
- Cardinal and spatial shrunken centroids widely tested
- Evaluated on both experimental data and controlled standards
- Public datasets and reproducible results provided in CardinalWorkflows
- Community support and feedback have been valuable
- >1,800 users of Cardinal; public feedback has been extremely enthusiastic
- Google help group provides insight into usage and needed improvements
CONTROLLED EXAMPLE: CARDINAL PAINTING
Graham Cooks and lab
[Figure: segmentations from SA + Shrunken Centroids and SASA + Shrunken Centroids; panels labeled r=1, s=3, k=10 and r=2, s=3, k=10]
[Plots: predicted # of classes versus shrinkage parameter (s) for each method, with r = 1, k = 10; r = 2, k = 10; r = 1, k = 15; r = 2, k = 15]
CONTROLLED EXAMPLE: FARMHOUSE PAINTING
Graham Cooks and lab
[Figure: segmentations from SA + Shrunken Centroids and SASA + Shrunken Centroids; panels labeled r=1, s=3, k=10 and r=2, s=3, k=10]
[Plots: predicted # of classes versus shrinkage parameter (s) for each method, with r = 1, k = 10; r = 2, k = 10; r = 1, k = 15; r = 2, k = 15]
CONTROLLED EXAMPLE: SPOTTED PATTERN TEST
Quantify limits of detection in MS imaging experiments
Mark Stolowitz, Stephanie van de Ven
- Statistical approaches to MS imaging experiments are necessary
- Cardinal enables systematic study of crucial experimental design questions
S. M. van de Ven, K. D. Bemis, K. Lau, R. Adusumilli, U. Kota, M. Stolowitz, O. Vitek, P. Mallick, S. S. Gambhir. “Protein biomarkers on tissue as imaged via MALDI mass spectrometry: A systematic approach to study the limits of detection”. Proteomics, 2016
CONCLUSIONS AND FUTURE WORK
- Statistical methods: spatial shrunken centroids
- Regularized classification and segmentation for MS imaging experiments
- Further investigate relationship between sparsity and number of segments
- Open-source software: Cardinal
- Free, open-source statistical software for MS imaging experiments
- More statistical methods and parallel computation
- Open-source software: matter
- Enables statistical method development with larger-than-memory datasets
- Extension to sparse datasets and “processed” imzML format
- General conclusions
- Combined contributions enable scalable statistical methods for MS imaging
- Development of statistically-focused computational infrastructure alongside new statistical methods is vital in rapidly advancing areas
ACKNOWLEDGEMENTS
Purdue EMBL
Theodore Alexandrov
Purdue
Graham Cooks and lab Livia Eberlin Kevin Kerian Christina Ferreira
Stanford
Parag Mallick Mark Stolowitz Uma Kota Stephanie van de Ven April Harry
- Advanced BioImaging Systems
- Canary Center at Stanford for Cancer Early Detection
- NIH-R21
Olga Vitek
Northeastern
- NSF-GRFP
- NSF-SI2-SSE
- NSF-BIO-DBI
- Sy and Laurie Sternberg Interdisciplinary Chair
Robert Ness Meena Choi Mike Cheng Ting Huang
ADDITIONAL SLIDES
SHRUNKEN T-STATISTICS MEASURE IMPORTANCE OF FEATURES IN DISTINGUISHING A SEGMENT
Tibshirani, Hastie, et al., Statistical Science, 2003
Measure difference between segment mean spectrum and overall mean spectrum
Proposed: Also use spectra from nearby pixels when comparing to mean spectrum
[Figure: centroid (mean spectrum) of the brain segment of PIGII_206, intensity versus m/z (750 to 900), and the corresponding t-statistics versus m/z]
STATISTICAL REGULARIZATION REMOVES UNINFORMATIVE FEATURES
Tibshirani, Hastie, et al., Statistical Science, 2003
Shrink t-statistics toward 0 with a shrinkage penalty (regularization)
Shrink mean spectra accordingly: uninformative features are dropped
[Figure: shrunken t-statistics versus m/z and shrunken centroid (intensity versus m/z), 750 to 900]