Fitting Large-Scale Spatial Models with Applications to Microarray - PowerPoint PPT Presentation

Fitting Large-Scale Spatial Models with Applications to Microarray Data Analysis

Stephan R. Sain, Department of Mathematics, University of Colorado at Denver
Reinhard Furrer, Geophysical Statistics Project, National Center for Atmospheric Research

Outline
• Microarrays and Climate
• An Additive Spatial Model
• Model Fitting
• Examples

Introduction
• Many spatial problems are inherently multivariate.
  – More than one measurement or observation at each spatial location.
  – Advent of GIS, modern computing, and methodological advances.
• Many spatial problems involve a large number of spatial locations.
  – Problems in constructing and working with design and covariance matrices.
• We propose a simple multivariate spatial model and discuss some strategies for fitting it and examining the results.

Microarrays and Climate
• Combining observed data and climate models
  – Examine model behavior as well as predictions of climate change.
  – Precipitation and temperature for sixteen models on a 5° grid.
    ⇒ 2 × 36 × 72 ≈ 5000 observations per chip/model.
• Microarray analysis
  – Build a profile of differentially expressed genes relating to cerebral vascular malformations.
  – Roughly 20 chips in three disease groups (control, AVM, CCM), each chip a 640 × 640 array.
    ⇒ 2 × 16 × 12K ≈ 400K observations per chip.
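The observation counts quoted on this slide can be checked with a few lines of arithmetic. This is only a sanity check: the variable names are illustrative, and reading the leading factor 2 in the climate case as the two variables (precipitation and temperature) is an assumption.

```python
# Climate: a 5-degree grid gives 36 latitude bands and 72 longitude bands.
n_lat = 180 // 5
n_lon = 360 // 5
climate_obs = 2 * n_lat * n_lon      # factor 2 assumed to be the two variables

# Microarray: the slide quotes 2 x 16 x 12K observations on a 640 x 640 chip.
array_cells = 640 * 640
microarray_obs = 2 * 16 * 12_000

print(climate_obs)      # 5184, roughly 5000
print(array_cells)      # 409600
print(microarray_obs)   # 384000, roughly 400K
```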

A Multivariate, Additive Spatial Model
• Let $Y = [\,Y_1' \ \dots \ Y_k'\,]'$, where each $Y_i$ denotes one spatial variable (one climate model, one microarray chip, etc.).
• Then
$$Y = X\beta + h + \epsilon,$$
where
  – $X\beta$ represents fixed effects
  – $h$ represents a random, zero-mean spatial process
  – $\epsilon$ represents a random error process orthogonal to $h$.

A Multivariate, Additive Spatial Model
• The structure of $X$ includes both chip specific and gene specific (across chip) terms:
$$X = \begin{bmatrix} 1\,R\,C & 0 & \cdots & 0 & G \\ 0 & 1\,R\,C & & \vdots & G \\ \vdots & & \ddots & 0 & \vdots \\ 0 & \cdots & 0 & 1\,R\,C & G \end{bmatrix},$$
where
  – $1$ is a vector of 1s
  – $R$ and $C$ represent row and column effects
  – $G$ indicates gene effects.
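A minimal sketch of how a design matrix with this block structure could be assembled, assuming NumPy is available. The toy dimensions (two 3 × 3 chips, three genes), the Latin-square gene layout, and the use of treatment contrasts (first level dropped) are all assumptions made for illustration; none of them come from the slides.

```python
import numpy as np

def indicator(levels, n_levels):
    """0/1 indicator columns for levels 1..n_levels-1 (treatment contrasts)."""
    levels = np.asarray(levels)
    return (levels[:, None] == np.arange(1, n_levels)[None, :]).astype(float)

n_side, k = 3, 2                                  # toy 3 x 3 chips, two chips
rows, cols = np.divmod(np.arange(n_side**2), n_side)
genes = (rows + cols) % n_side                    # hypothetical gene layout

ones = np.ones((n_side**2, 1))
chip_block = np.hstack([ones, indicator(rows, n_side), indicator(cols, n_side)])
G = indicator(genes, n_side)

# Chip-specific blocks on the diagonal, one shared column block of G on the right.
X = np.hstack([np.kron(np.eye(k), chip_block), np.vstack([G] * k)])
print(X.shape)   # (18, 12)
```

Each chip contributes its own intercept, row, and column columns, while the gene columns are shared by all chips, which is what ties the chips together in the fit.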

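A Multivariate, Additive Spatial Model
• Further, the model suggests
$$E[Y] = X\beta, \qquad \mathrm{Var}[Y] = \Sigma_h + \Sigma_\epsilon,$$
where
$$\Sigma_h = \begin{bmatrix} K_1 & 0 & \cdots & 0 \\ 0 & K_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & K_k \end{bmatrix}, \qquad \Sigma_\epsilon = \begin{bmatrix} \sigma_1^2 I & 0 & \cdots & 0 \\ 0 & \sigma_2^2 I & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \sigma_k^2 I \end{bmatrix},$$
and
  – $K_i = K(\theta_i)$ represents a chip specific spatial covariance matrix parameterized by $\theta_i$
  – $\sigma_i^2$ are chip specific variances (nugget).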

Backfitting
• Ideally, one could use REML to fit the covariance parameters; estimates of $\beta$ and predictions of the random effects then follow directly:
$$\hat\beta = (X' V^{-1} X)^{-1} X' V^{-1} Y \quad \text{(generalized least-squares)}$$
$$\hat h = \Sigma_h V^{-1} (Y - X\hat\beta)$$
where $V = \Sigma_h + \Sigma_\epsilon$.
• Direct computation of the design and covariance matrices is impractical, not to mention the matrix computations...
• Quick overview of backfitting from additive models...
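For a problem small enough to hold $V$ in memory, the two formulas can be evaluated directly. The sketch below assumes NumPy, a one-dimensional set of locations, a hypothetical two-column $X$, and an exponential covariance of the form sill · exp(−d / range); the parameter values are borrowed from the single-chip fit reported later in the talk and are treated as known rather than estimated by REML.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
s = np.arange(n, dtype=float)                    # 1-D locations (toy example)
X = np.column_stack([np.ones(n), s / n])         # hypothetical fixed effects

range_, sill, nugget = 1.528, 0.487, 0.061       # assumed known
Sigma_h = sill * np.exp(-np.abs(s[:, None] - s[None, :]) / range_)
V = Sigma_h + nugget * np.eye(n)                 # V = Sigma_h + Sigma_eps

# Simulate Y = X beta + h + eps with Var[Y] = V.
Y = X @ np.array([1.0, -2.0]) + rng.multivariate_normal(np.zeros(n), V)

# Generalized least squares, using linear solves instead of explicit inverses.
beta_hat = np.linalg.solve(X.T @ np.linalg.solve(V, X), X.T @ np.linalg.solve(V, Y))

# Prediction of the spatial process from the GLS residuals.
resid = Y - X @ beta_hat
h_hat = Sigma_h @ np.linalg.solve(V, resid)
```

At this size the dense solves are trivial; the point of the slides is that the same computation is out of reach when $V$ has hundreds of thousands of rows per chip.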

Backfitting
• We iteratively estimate the fixed effects and the spatial process.
• Algorithm:
  [1] Let $\hat h^{(0)}$ be an initial guess and put $j = 1$
  [2a] $\hat\beta^{(j)} = (X'X)^{-1} X' \big(Y - \hat h^{(j-1)}\big)$
  [2b] Estimate covariance parameters, then $\hat h^{(j)} = \Sigma_h V^{-1} \big(Y - X\hat\beta^{(j)}\big)$
  [3] Put $j = j + 1$ and repeat [2a] and [2b] until convergence
• To prove equivalence at convergence, plug [2b] into [2a]. Straightforward manipulations lead to the generalized least-squares estimator.
• Equivalent to universal kriging.
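The equivalence with generalized least squares can be checked numerically on a toy problem. The following sketch assumes NumPy and, unlike step [2b] on the slide, keeps the covariance parameters fixed instead of re-estimating them in every iteration; the locations, design matrix, and parameter values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
s = np.arange(n, dtype=float)
X = np.column_stack([np.ones(n), s / n])         # hypothetical fixed effects
Sigma_h = 0.5 * np.exp(-np.abs(s[:, None] - s[None, :]) / 1.5)
V = Sigma_h + 0.5 * np.eye(n)                    # nugget 0.5 (assumed)
Y = X @ np.array([1.0, -2.0]) + rng.multivariate_normal(np.zeros(n), V)

S = Sigma_h @ np.linalg.inv(V)                   # smoother used in step [2b]
P = np.linalg.solve(X.T @ X, X.T)                # (X'X)^{-1} X' used in step [2a]

h = np.zeros(n)                                  # [1] initial guess
for j in range(1, 501):
    beta = P @ (Y - h)                           # [2a]
    h_new = S @ (Y - X @ beta)                   # [2b], covariance held fixed
    converged = np.max(np.abs(h_new - h)) < 1e-12
    h = h_new
    if converged:                                # [3]
        break

# Direct GLS estimate for comparison.
beta_gls = np.linalg.solve(X.T @ np.linalg.solve(V, X), X.T @ np.linalg.solve(V, Y))
print(j, np.allclose(beta, beta_gls))
```

Only ordinary least squares and one smoothing step are needed per iteration, which is what makes the scheme attractive when $X'V^{-1}X$ cannot be formed directly.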

Backfitting
• Goal: computation in R on a computer with 2 GBytes RAM.
• We perform the regression step iteratively on the chip specific effects and the gene effects:
  [2a'] $\hat\beta^{(j)}_{\text{Chip}} = \big((1\,R\,C)'(1\,R\,C)\big)^{-1} (1\,R\,C)' \big(Y - \hat h^{(j-1)}\big)$
  [2a''] $\hat\beta^{(j)}_{\text{Gene}} = (G'G)^{-1} G' \big(Y - \hat h^{(j-1)}\big)$
  [2a'''] Repeat [2a'] and [2a''] until convergence
• We need to reconstruct the individual design matrices for each step.
• Calculation time for the entire model ≈ 2 minutes per iteration (Xeon i686, Linux).
• Quick convergence: $\mathrm{MSE}\big(\hat\beta^{(j)} - \hat\beta^{(j-1)}\big) < 10^{-4}$ for $j \ge 4$.
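The inner loop [2a'] to [2a'''] can be sketched as an alternating regression. Two assumptions are made here that the slide leaves implicit: each regression is applied to the partial residual with the other block's current fit removed, and `numpy.linalg.lstsq` replaces the explicit inverse because full dummy coding is rank deficient. The 4 × 4 chip, the gene layout, and the helper names are hypothetical.

```python
import numpy as np

def dummies(levels, n_levels):
    """Full 0/1 indicator matrix, one column per level."""
    return (np.asarray(levels)[:, None] == np.arange(n_levels)[None, :]).astype(float)

def ls_fit(A, y):
    """Least-squares fitted values; lstsq copes with rank-deficient A."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef

n_side = 4
rows, cols = np.divmod(np.arange(n_side**2), n_side)
genes = (rows + cols) % n_side
genes[[0, 1]] = genes[[1, 0]]                    # make genes and columns non-orthogonal

chip = np.hstack([np.ones((n_side**2, 1)), dummies(rows, n_side), dummies(cols, n_side)])
G = dummies(genes, n_side)

rng = np.random.default_rng(2)
z = rng.normal(size=n_side**2)                   # stands in for Y - h^(j-1)

fit_chip = np.zeros_like(z)
fit_gene = np.zeros_like(z)
for sweep in range(1, 101):
    old = fit_chip + fit_gene
    fit_chip = ls_fit(chip, z - fit_gene)        # [2a']  chip effects
    fit_gene = ls_fit(G, z - fit_chip)           # [2a''] gene effects
    if np.max(np.abs(fit_chip + fit_gene - old)) < 1e-12:
        break                                    # [2a''']

joint = ls_fit(np.hstack([chip, G]), z)          # regression on the full design
print(sweep, np.allclose(fit_chip + fit_gene, joint))
```

The alternating fit reproduces the fitted values of the joint regression while only ever working with one small design block at a time.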

Sparse Matrix Manipulation
• Matrices are stored in a sparse format.
• The design matrix $X$ contains only {0, 1}, and $\Sigma_h$ has many zeros due to tapering.
• Illustration with a “small” sub-design matrix $C$ (base R and the SparseM library):
[Table: sparse and full formats under sum and treatment contrasts, comparing storage of $C$, time to calculate $C'C$, storage of $C'C$, and time to solve $(C'C)x = v$.]
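Why a sparse format pays off for a 0/1 design matrix can be seen without any sparse-matrix library. The sketch below is plain Python, not the SparseM implementation: it assumes full indicator coding of the column effects for one 640 × 640 chip, so each row of $C$ holds a single 1, and all names are illustrative.

```python
from collections import Counter

n_side = 640
# Spot (r, c) has its single 1 in column c, so storing c per row describes C fully.
col_of_row = [c for r in range(n_side) for c in range(n_side)]

n_rows = len(col_of_row)
full_cells = n_rows * n_side        # entries a dense n_rows x 640 matrix would hold
sparse_cells = n_rows               # one stored column index per row

# C'C is diagonal: entry (c, c) is the number of spots in column c.
ctc_diag = Counter(col_of_row)

# Solving (C'C) x = v therefore reduces to an element-wise division.
v = [float(c + 1) for c in range(n_side)]
x = [v[c] / ctc_diag[c] for c in range(n_side)]

print(full_cells, sparse_cells)     # 262144000 409600
```

Under these assumptions the dense representation needs 640 times as many stored entries, and forming and solving with $C'C$ never touches a full matrix.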

Covariance Tapering
• Introduce sparseness structure in the covariance matrix $K_i$ with a taper function.
• A carefully chosen taper preserves the asymptotic optimality of kriging (Furrer et al. 2004, submitted).
[Figure: covariance against lag. Exponential (green), spherical (yellow), tapered = exponential × spherical (red).]
• Tapering with a range of 2 (eight neighbors):
  – Nonzero elements in $K_i$: 3,614,762 (0.002%)
  – Nonzero elements in the Cholesky factor of $K_i$: 67,070,820 (0.040%)
  – With 12 or more neighbors, more than $2^{30}$ nonzero elements in the Cholesky factor.
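A small sketch of the tapered covariance and of the sparsity figures above, in plain Python. The exponential parameterisation sill · exp(−d / range) and its default values (taken from the single-chip fit reported later in the talk) are assumptions; the spherical function is the standard one, which is exactly zero at and beyond its range.

```python
import math

def exponential(d, range_=1.528, sill=0.487):
    """Exponential covariance; this parameterisation is an assumption."""
    return sill * math.exp(-d / range_)

def spherical(d, range_=2.0):
    """Spherical correlation, exactly zero at and beyond its range."""
    if d >= range_:
        return 0.0
    u = d / range_
    return 1.0 - 1.5 * u + 0.5 * u**3

def tapered(d):
    """Tapered covariance = exponential x spherical."""
    return exponential(d) * spherical(d)

# Neighbours on a unit grid that keep a nonzero tapered covariance.
neighbours = [(dx, dy) for dx in range(-3, 4) for dy in range(-3, 4)
              if (dx, dy) != (0, 0) and tapered(math.hypot(dx, dy)) > 0.0]
print(len(neighbours))                       # 8

# Share of nonzero entries for a 640 x 640 chip, using the counts from the slide.
n = 640 * 640
print(f"{100 * 3_614_762 / n**2:.4f}%")      # 0.0022%
print(f"{100 * 67_070_820 / n**2:.4f}%")     # 0.0400%
```

With a taper range of 2 on a unit grid, only the four axis neighbours and the four diagonal neighbours lie strictly inside the range, which matches the "eight neighbors" on the slide.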

Microarray Example – A Single Chip
• Few missing values (1.5%).
• Add blurring to compensate for rounding.
• ...

Microarray Example – A Single Chip
• Estimate the covariance structure.
• Fit an exponential covariance: (range, sill, nugget) = (1.528, 0.487, 0.061).
• Taper with a spherical covariance with range 2.
[Figure: covariance against lag (0 to 7). Empirical horizontal (+), empirical vertical (x), and empirical off-axis (o) values, with the fitted exponential, the taper covariance, and the tapered covariance (exp × spher).]

Microarray Example – A Single Chip
Row/Column effects
[Figure: two panels, column effects and row effects, each plotted against index 0 to 600 with a vertical axis from −0.2 to 0.2.]

Microarray Example – A Single Chip
Gene effects (normed on the right)
[Figure: two scatter plots of gene effects, mismatch against perfect match.]

Microarray Example – A Single Chip
QQ-plots of gene effects (normed on the right)
[Figure: two QQ-plots, sample quantiles against theoretical quantiles, with the 1%, 5%, 25%, 75%, 95%, and 99% quantiles marked.]

Microarray Example – A Single Chip
Y = mean + row effects + column effects + spatial process + gene effects + error

Microarray Example – More Chips...
• The algorithm extends directly:
  [2a∗] $\hat\beta^{(j)}_{\text{Chip } i} = \big((1\,R\,C)'(1\,R\,C)\big)^{-1} (1\,R\,C)' \big(Y_{\text{Chip } i} - \hat h^{(j-1)}_{\text{Chip } i}\big), \quad i = 1, \dots, k$
  [2a∗∗] $\hat\beta^{(j)}_{\text{Gene}} = (G'G)^{-1} G' \big(Y_{\text{Chip } i} - \hat h^{(j-1)}_{\text{Chip } i}\big)$
  [2b∗] For $i = 1, \dots, k$, estimate covariance parameters, then
  $\hat h^{(j)}_{\text{Chip } i} = K_i \big(K_i + \sigma_i^2 I\big)^{-1} \big(Y_{\text{Chip } i} - (1\,R\,C)\,\hat\beta^{(j)}_{\text{Chip } i} - G\,\hat\beta^{(j)}_{\text{Gene}}\big)$
• Careful programming in R and a few Fortran routines allow the calculation on a Xeon processor with 2 GBytes RAM. Results within a few minutes: ≈ 2 × k minutes per iteration.

Microarray Example – Two Chips
Y − mean = row effects + column effects + spatial process + gene effects + error
[Figure: panels for Chip 1, Difference, and Chip 2.]

Microarray Example – Two Chips
QQ-plots of gene effects
[Figure: QQ-plot, sample quantiles against theoretical quantiles, with the 1%, 5%, 25%, 75%, 95%, and 99% quantiles marked.]

Microarray Example – Two Chips
Difference in gene effects
