SLIDE 1 Characterization, Modeling, and Simulation of Mouse Microarray Data
David S. Lalush
Bioinformatics Research Center
North Carolina State University
SLIDE 2 Acknowledgments
– Jeff Tucker (NIEHS)
– Pierre Bushel (NIEHS)
– Bruce Weir (NCSU)
- Funded by K01 HG02428, National Human Genome Research Institute
SLIDE 3 Outline
- Microarray Simulation Project
- Characterization of Microarray Images
- Results of Characterization
- Simulations
- Conclusion
SLIDE 4 Outline
- Microarray Simulation Project
- Characterization of Microarray Images
- Results of Characterization
- Simulations
- Conclusion
SLIDE 5
Microarray in Diagnosis
Diagram: Type I tumors and Type II tumors are each run on a microarray, giving a gene expression pattern for each type.
SLIDE 6
Microarray in Diagnosis
Diagram: An unknown tumor is run on a microarray, giving a gene expression pattern. Type I or type II? Probability of misclassification?
SLIDE 7 Research Focus
- Evaluating classification methods
- Studying variability in microarray data
Problems:
- Many replications are required to evaluate error rates.
- Microarray experiments are expensive.
- True patterns are unknown in real data.
SLIDE 8 Microarray Simulation
- Creating a realistic simulation of microarray data
- Accounting for various sources of variability in the system
Advantages:
- Generates many replications cheaply.
- True patterns are known.
- Can control sources of variability.
SLIDE 9
Microarray System
Stages: Slide Printing, Sample Preparation, Hybridization, Scanning, Image Processing, Data Analysis
SLIDE 10
Simulation Model
Model components: Sample, Slide, Pin, Array Printing and Hybridization, Scanning
SLIDE 11 Simulation Model
Model components: Sample, Slide, Pin, Array Printing and Hybridization, Scanning
Sample:
- Gene expression variation modeled as multivariate normal
- Global expression variations modeled as normal
SLIDE 12 Simulation Model
Model components: Sample, Slide, Pin, Array Printing and Hybridization, Scanning
Slide:
- Background level modeled as normal (dye-dependent)
- Defects modeled as 2D causal Markov random field
SLIDE 13 Simulation Model
Model components: Sample, Slide, Pin, Array Printing and Hybridization, Scanning
Pin:
- Spot size, shape, and orientation modeled as normal
- Spot defects modeled with 2D causal Markov random field
SLIDE 14 Simulation Model
Model components: Sample, Slide, Pin, Array Printing and Hybridization, Scanning
Array Printing and Hybridization:
- Instantiates spots based on properties from sample, slide, and pin
SLIDE 15 Simulation Model
Model components: Sample, Slide, Pin, Array Printing and Hybridization, Scanning
Scanning:
- Creates discretized image based on spots, SNR, gain, resolution, and blur parameters
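The scanning step can be illustrated with a small sketch. Everything here is an assumption made for illustration, not the model used in the project: a single circular spot, a box blur standing in for the scanner blur, additive Gaussian noise with SNR taken to mean peak signal divided by noise standard deviation, and a multiplicative gain. The function name `scan_spot` and its parameters are hypothetical.

```python
import math
import random

def scan_spot(size=21, radius=6.0, intensity=100.0, gain=2.0,
              snr=10.0, blur=1, seed=0):
    """Render one circular spot as a blurred, noisy pixel grid (illustrative only)."""
    rng = random.Random(seed)
    c = (size - 1) / 2.0
    # Ideal image: constant intensity inside the disk, zero outside.
    ideal = [[intensity if math.hypot(x - c, y - c) <= radius else 0.0
              for x in range(size)] for y in range(size)]
    # Box blur of half-width `blur` stands in for the scanner blur (assumption).
    blurred = [[0.0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            vals = [ideal[j][i]
                    for j in range(max(0, y - blur), min(size, y + blur + 1))
                    for i in range(max(0, x - blur), min(size, x + blur + 1))]
            blurred[y][x] = sum(vals) / len(vals)
    # Additive Gaussian noise; SNR is assumed to be peak signal / noise s.d.
    sigma = intensity / snr
    # Apply the gain and clip at zero so pixel values stay non-negative.
    return [[max(0.0, gain * (v + rng.gauss(0.0, sigma))) for v in row]
            for row in blurred]

image = scan_spot()
```

Here `size` plays the role of resolution; a finer grid for the same physical spot would mean a larger `size` and `radius`.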
SLIDE 16 Characterization
- Characterization of existing microarray images
– Spot properties (size, shape, uniformity)
– Pin properties (spot uniformity)
– Slide properties (background, signal-to-noise)
– Gene properties (mean, variance, covariance)
SLIDE 17 Outline
- Microarray Simulation Project
- Characterization of Microarray Images
- Results of Characterization
- Simulations
- Conclusion
SLIDE 18 Characterization
- Characterization of mouse kidney dataset
– Six mice
– Four slides each (2x2 fluor flip)
– 24 slides in all
– 5520 spots in 16 blocks, 4x4 block pattern
SLIDE 19 Characterization of Spots
SLIDE 20 Characterization of Spots
- Step 2: Spot Morphology Measures
– Cast rays from centroid
– Measures: radius, area, eccentricity
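A minimal sketch of these morphology measures, assuming the spot has already been segmented into a binary mask. The ray step size, the number of rays, and the moment-based definition of eccentricity are assumptions for illustration; the slides do not say how eccentricity was computed. The name `spot_morphology` is hypothetical.

```python
import math

def spot_morphology(mask, n_rays=36):
    """Measure a binary spot mask given as a list of rows of 0/1."""
    pixels = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    area = len(pixels)
    cx = sum(x for x, _ in pixels) / area
    cy = sum(y for _, y in pixels) / area
    # Cast rays from the centroid and record how far each travels inside the spot.
    lengths = []
    for k in range(n_rays):
        angle = 2.0 * math.pi * k / n_rays
        r = 0.0
        while True:
            x = int(round(cx + (r + 0.5) * math.cos(angle)))
            y = int(round(cy + (r + 0.5) * math.sin(angle)))
            inside = 0 <= y < len(mask) and 0 <= x < len(mask[0]) and mask[y][x]
            if not inside:
                break
            r += 0.5
        lengths.append(r)
    radius = sum(lengths) / n_rays
    # Eccentricity from second central moments (an assumed definition).
    sxx = sum((x - cx) ** 2 for x, _ in pixels) / area
    syy = sum((y - cy) ** 2 for _, y in pixels) / area
    sxy = sum((x - cx) * (y - cy) for x, y in pixels) / area
    half_diff = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam_max = (sxx + syy) / 2.0 + half_diff
    lam_min = (sxx + syy) / 2.0 - half_diff
    ecc = math.sqrt(max(0.0, 1.0 - lam_min / lam_max)) if lam_max > 0 else 0.0
    return {"radius": radius, "area": area, "eccentricity": ecc}

# Example: a disk of radius 5 pixels on a 15x15 grid.
disk = [[1 if (x - 7) ** 2 + (y - 7) ** 2 <= 25 else 0 for x in range(15)]
        for y in range(15)]
measures = spot_morphology(disk)
```

For a round spot the eccentricity is close to zero; an elongated spot pushes it toward one.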
SLIDE 21 Characterization of Spots
- Step 3: Spot Intensity Measures
– Mean and standard deviation of spot pixels
– Mean and standard deviation of background pixels
SLIDE 22 Characterization of Spots
- Step 4: Secondary Intensity Measures
Separability:
$$\text{Separability} = \frac{\text{signal} - \text{background}}{\sqrt{\sigma_{\text{signal}}^{2} + \sigma_{\text{background}}^{2}}}$$
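A short sketch of the Step 3 and Step 4 measures for one spot. It assumes separability is the difference between the signal and background means divided by the square root of the summed variances; the square root and the use of population standard deviations are assumptions. The pixel values are made up for illustration.

```python
import math
import statistics

def separability(spot_pixels, background_pixels):
    """(signal - background) / sqrt(sd_signal^2 + sd_background^2), assumed form."""
    signal = statistics.mean(spot_pixels)
    background = statistics.mean(background_pixels)
    sd_signal = statistics.pstdev(spot_pixels)
    sd_background = statistics.pstdev(background_pixels)
    return (signal - background) / math.sqrt(sd_signal ** 2 + sd_background ** 2)

# Hypothetical pixel intensities for one spot and its local background.
spot = [90, 110, 100, 95, 105]
background = [10, 12, 8, 11, 9]
value = separability(spot, background)
```

A spot whose value falls at or below 1 is hard to distinguish from its background under this measure.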
SLIDE 23 Characterization of Spots
- Step 4: Secondary Intensity Measures
Spot Uniformity: a ratio involving the spot signal and σ_signal, the standard deviation of the spot pixels
SLIDE 24 Characterization of Spot Defects
- Spots often exhibit characteristic nonuniformities
– Low center
– Spot breaks
SLIDE 25
Characterization of Spot Defects
Consider each spot to have two regions: a normal region and a defect region.
SLIDE 26
Characterization of Spot Defects
State 0: N State 1: D
Each region acts as a hidden state. Each state has its own distribution of emitted intensities.
SLIDE 27
Characterization of Spot Defects
The probability of a pixel being in a given state depends on its neighbors.
Example: for a pixel X with neighbors in states N, N, D, D, the model gives P(X | N, N, D, D).
SLIDE 28 Characterization of Spot Defects
State 0: N State 1: D
Region Model (2D causal MRF):
- 16 parameters for state transition
- 2 parameters for intensity of D region pixels relative to N region (mean, s.d.)
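With four binary neighbors there are 2^4 = 16 neighbor configurations, which matches the 16 transition parameters. The sketch below samples such a field in raster order. The choice of neighbors (left, upper-left, up, upper-right), the treatment of off-grid neighbors as N, the normal model for relative intensity, and all parameter values are assumptions for illustration; the function names are hypothetical.

```python
import random

def sample_region_labels(height, width, p_defect, rng):
    """Sample N/D labels (0/1) in raster order from a causal neighborhood.

    p_defect maps (left, upper-left, up, upper-right) to P(pixel = D),
    so it has 16 entries. Off-grid neighbors count as N (assumption).
    """
    field = [[0] * width for _ in range(height)]

    def state(y, x):
        return field[y][x] if 0 <= y < height and 0 <= x < width else 0

    for y in range(height):
        for x in range(width):
            nbrs = (state(y, x - 1), state(y - 1, x - 1),
                    state(y - 1, x), state(y - 1, x + 1))
            field[y][x] = 1 if rng.random() < p_defect[nbrs] else 0
    return field

def emit_intensities(labels, normal_level, rel_mean, rel_sd, rng):
    """D pixels get an intensity relative to the N level (2 parameters)."""
    return [[normal_level * (rng.gauss(rel_mean, rel_sd) if s else 1.0)
             for s in row] for row in labels]

# Illustrative parameters only: defects start rarely but tend to continue.
params = {(a, b, c, d): 0.02 + 0.22 * (a + b + c + d)
          for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1)}
rng = random.Random(0)
labels = sample_region_labels(20, 20, params, rng)
image = emit_intensities(labels, 100.0, 0.5, 0.1, rng)
```

Because each pixel depends only on pixels already visited, one pass over the grid is enough to generate a sample.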
SLIDE 29 Characterization of Spot Defects
State 0: N State 1: D
Applying the Region Model
Pixel is in D region if:
- It is in the spot
- It is below the spot average intensity in BOTH channels
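The two-condition rule above can be written directly. This sketch assumes the two channel images and a binary spot mask are available as nested lists of equal shape; the function name and the sample values are hypothetical.

```python
def defect_mask(ch1, ch2, spot_mask):
    """Mark pixels that are in the spot and below the spot mean in BOTH channels."""
    inside = [(y, x) for y, row in enumerate(spot_mask)
              for x, v in enumerate(row) if v]
    mean1 = sum(ch1[y][x] for y, x in inside) / len(inside)
    mean2 = sum(ch2[y][x] for y, x in inside) / len(inside)
    out = [[0] * len(row) for row in spot_mask]
    for y, x in inside:
        if ch1[y][x] < mean1 and ch2[y][x] < mean2:
            out[y][x] = 1
    return out

# Hypothetical 3x3 example: only the center pixel is low in both channels.
spot_mask = [[0, 1, 1], [1, 1, 1], [0, 1, 0]]
ch1 = [[5, 90, 95], [100, 20, 98], [5, 97, 5]]
ch2 = [[3, 80, 40], [85, 15, 82], [3, 84, 3]]
d_region = defect_mask(ch1, ch2, spot_mask)
```

The pixel at row 0, column 2 is low in the second channel only, so it stays in the N region.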
SLIDE 30 Characterization of Spot Defects
State 0: N State 1: D
Applying the Region Model
- Smooth region boundary
- Compute the 18 parameters for each spot
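One way the 18 parameters might be estimated for a labeled spot is by counting. The sketch below assumes the same causal neighborhood (left, upper-left, up, upper-right), treats neighbors outside the image as N, and omits the boundary-smoothing step. Neighbor configurations that never occur in the spot are simply absent from the result. The function name and data are hypothetical.

```python
import statistics
from collections import Counter

def fit_region_model(labels, intensity, spot_mask):
    """Return (transition estimates, relative D mean, relative D s.d.)."""
    h, w = len(labels), len(labels[0])

    def state(y, x):
        return labels[y][x] if 0 <= y < h and 0 <= x < w else 0

    seen, defect = Counter(), Counter()
    normal_vals, defect_vals = [], []
    for y in range(h):
        for x in range(w):
            if not spot_mask[y][x]:
                continue
            nbrs = (state(y, x - 1), state(y - 1, x - 1),
                    state(y - 1, x), state(y - 1, x + 1))
            seen[nbrs] += 1
            defect[nbrs] += labels[y][x]
            (defect_vals if labels[y][x] else normal_vals).append(intensity[y][x])
    # P(D | neighbor configuration), estimated as a relative frequency.
    transitions = {n: defect[n] / seen[n] for n in seen}
    # D-region intensity expressed relative to the N-region mean.
    normal_mean = statistics.mean(normal_vals)
    relative = [v / normal_mean for v in defect_vals]
    return transitions, statistics.mean(relative), statistics.pstdev(relative)

# Hypothetical 3x3 spot with a defect in the lower-right corner.
labels = [[0, 0, 0], [0, 1, 1], [0, 1, 1]]
intensity = [[100, 100, 100], [100, 50, 60], [100, 40, 50]]
mask = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
transitions, rel_mean, rel_sd = fit_region_model(labels, intensity, mask)
```

This sketch requires at least one N pixel and one D pixel in the spot; a spot with no defect region would need separate handling.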
SLIDE 31 Characterization of Background
– Modeled as stationary across slide
– Marks, scratches, bright spots, other features
– Modeled with 2D Markov random field
SLIDE 32 Characterization of Background
- Classify all background pixels as normal or defect
– Defect is 2σ above background mean
- Compute statistics on normal background
- Apply 2D MRF to model defect state
– Similar to region model
– Intensities are modeled as beta distribution
- Measures taken only by slide
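The first two steps of this procedure can be sketched as follows. The sketch assumes the mean and σ used for the threshold are computed over all background pixels of the slide before the defects are removed; the slides do not state this. The function name and pixel values are hypothetical.

```python
import statistics

def split_background(pixels, k=2.0):
    """Split background pixels into normal and defect at mean + k * sd."""
    mean = statistics.mean(pixels)
    sd = statistics.pstdev(pixels)
    threshold = mean + k * sd
    normal = [p for p in pixels if p <= threshold]
    defect = [p for p in pixels if p > threshold]
    # Statistics of the normal background only, after defects are set aside.
    normal_stats = (statistics.mean(normal), statistics.pstdev(normal))
    return normal, defect, normal_stats

# Hypothetical background: mostly flat, with one bright mark.
bg_pixels = [10] * 18 + [11, 60]
normal, defect, (normal_mean, normal_sd) = split_background(bg_pixels)
```

The resulting normal/defect labels are what a 2D MRF for the defect state would then be fitted to.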
Plot: distribution of relative defect intensity (x-axis: Relative Defect Intensity; y-axis: Probability)
SLIDE 33 Characterization of Gene Expression
- Multivariate normal distribution for each sample (test or reference)
– Mean vector
– Covariance matrix
- Linear model to account for global effects from slide to slide and dye effects
Sample = (mean gene expression) + slope * (slide perturbation) + (variable expression)
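A small sketch of drawing one sample from this linear model. Several choices are assumptions for illustration: the slope is taken to be per gene, the slide perturbation is a single normal draw shared by all genes on the slide, and the correlated "variable expression" term is generated with a Cholesky factor rather than by diagonalizing the covariance matrix. All names and numbers are hypothetical.

```python
import math
import random

def cholesky(cov):
    """Lower-triangular L with L * L^T = cov (cov must be positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low

def simulate_sample(mean, slope, cov, slide_sd, rng):
    """mean + slope * (slide perturbation) + (variable expression)."""
    n = len(mean)
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Correlated gene-level variation with covariance `cov`.
    variable = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    # One global perturbation per slide, shared by every gene.
    perturbation = rng.gauss(0.0, slide_sd)
    return [mean[i] + slope[i] * perturbation + variable[i] for i in range(n)]

# Three hypothetical genes; genes 0 and 1 are strongly correlated.
mean = [8.0, 7.5, 6.0]
slope = [1.0, 0.9, 1.1]
cov = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 0.5]]
sample = simulate_sample(mean, slope, cov, 0.3, random.Random(0))
```

For a matrix as large as the one in the real dataset, a dense factorization like this would be costly, which is one motivation for modeling only the strongest correlations.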
SLIDE 34 Characterization of Gene Expression
- Problem: Covariance matrix is BIG (5200x5200)
– In simulation, we will have to diagonalize it.
- Model the most significant correlations
– Compute correlations between each pair of genes on each slide
– Cluster genes by correlation distance
– Each gene in a cluster has greater than 0.48 absolute correlation with every other gene in the cluster
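The cluster property (every pair within a cluster exceeds 0.48 absolute correlation) can be illustrated with a simple greedy pass. This is a stand-in for whatever clustering procedure was actually used, which the slides do not specify; the function names and expression profiles are hypothetical, and profiles are assumed to have nonzero variance.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def cluster_genes(profiles, threshold=0.48):
    """Greedy clustering: a gene joins the first cluster in which it exceeds
    the absolute-correlation threshold with EVERY current member."""
    clusters = []
    for g, profile in enumerate(profiles):
        for members in clusters:
            if all(abs(pearson(profile, profiles[m])) > threshold for m in members):
                members.append(g)
                break
        else:
            clusters.append([g])
    return clusters

# Hypothetical expression profiles across four slides.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.1, 3.9, 6.2, 8.0],    # strongly correlated with gene 0
    [4.0, 3.0, 2.0, 1.0],    # anti-correlated with gene 0
    [1.0, -1.0, 1.0, -1.0],  # |r| with gene 0 is about 0.45
]
clusters = cluster_genes(profiles)
```

Only correlations inside each cluster then need to be modeled, so the full covariance matrix reduces to a set of much smaller blocks.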
SLIDE 35 Analyzing Characterization Data
– By slide (fixed)
– By pin (random)
- Which properties varied more?
– By slide
– By pin
– By spot
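To make the question "which properties varied more?" concrete, the sketch below splits the total sum of squares of one measurement into slide, pin, slide-pin interaction, and spot (within-cell) shares. It assumes a balanced layout and reports plain sum-of-squares shares, which is a simplification of a mixed-model ANOVA with slide fixed and pin random. The function name and data are hypothetical.

```python
def variance_shares(data):
    """data[s][p] is a list of spot measurements for slide s and pin p (balanced)."""
    n_s, n_p, n_r = len(data), len(data[0]), len(data[0][0])
    grand = sum(v for s in data for p in s for v in p) / (n_s * n_p * n_r)
    cell = [[sum(c) / n_r for c in s] for s in data]
    slide_mean = [sum(row) / n_p for row in cell]
    pin_mean = [sum(cell[s][p] for s in range(n_s)) / n_s for p in range(n_p)]
    ss_slide = n_p * n_r * sum((m - grand) ** 2 for m in slide_mean)
    ss_pin = n_s * n_r * sum((m - grand) ** 2 for m in pin_mean)
    ss_inter = n_r * sum((cell[s][p] - slide_mean[s] - pin_mean[p] + grand) ** 2
                         for s in range(n_s) for p in range(n_p))
    ss_spot = sum((v - cell[s][p]) ** 2
                  for s in range(n_s) for p in range(n_p) for v in data[s][p])
    total = ss_slide + ss_pin + ss_inter + ss_spot
    return {"slide": ss_slide / total, "pin": ss_pin / total,
            "interaction": ss_inter / total, "spot": ss_spot / total}

# Hypothetical radii: 2 slides x 2 pins x 2 spots each.
radii = [[[10, 12], [11, 13]], [[14, 16], [15, 17]]]
shares = variance_shares(radii)
```

In this toy example most of the variation sits between slides; in the real data the corresponding shares are what the results slides report.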
SLIDE 36 Analyzing Characterization Data
- Spot morphology measures
- Spot secondary intensity measures
- Spot defect model parameters
- Background defect model parameters (by slide measurement only, no ANOVA)
Only spots with separability > 1 used in ANOVA
SLIDE 37 Outline
- Microarray Simulation Project
- Characterization of Microarray Images
- Results of Characterization
- Simulations
- Conclusion
SLIDE 38
Results
Sometimes the images have their own story to tell.
SLIDE 39 Results: Spot Morphology
- Most variation (75% for size measures) was attributed to variation by spot
- Pins behaved similarly (mostly)
- Slides showed some differences in last eight slides (mice five and six)
SLIDE 40 Results: Spot Morphology
Plot: Spot size vs. Pin Number (x-axis: Pin Number; y-axis: Radius (pixels))
SLIDE 41 Results: Spot Morphology
Plot: Spot size vs. Slide Number (x-axis: Slide Number, grouped by Mouse 1 to Mouse 6; y-axis: Radius (pixels))
SLIDE 42 Results: Spot Intensities
- Most variation in separability (83-90%) was attributed to variation by spot
- Spot uniformity varied considerably by slide, mostly due to last eight slides
SLIDE 43 Results: Spot Intensities
Plot: Spot uniformity (532 nm) vs. Slide Number (x-axis: Slide Number, grouped by Mouse 1 to Mouse 6; y-axis: Uniformity (532 nm))
SLIDE 44 Results: Spot Defect MRF
- The 16 region transition probability parameters varied by pin
– Model the MRF as a property of a pin, not a slide
- The mean intensity of defect region was strongly dependent on the pin.
- Mean intensity of defect region varied considerably by slide.
SLIDE 45 Results: Spot Defect MRF
Plot: Defect region intensity vs. Slide Number (x-axis: Slide Number, grouped by Mouse 1 to Mouse 6; y-axis: Low region mean intensity relative to normal region mean)
SLIDE 46 Results: Background MRF
- Last eight slides had more intense background defects
- Last eight also had higher probabilities of generating a defect
SLIDE 47 Results: Background MRF
Plot: Background defect intensity vs. Slide Number (x-axis: Slide Number, grouped by Mouse 1 to Mouse 6; y-axis: Intensity of Background Defects Relative to Background Mean)
SLIDE 48 Results: General
- Slide-pin interactions were small (<5% of variance in all cases)
- Therefore, modeling of slide and pin effects separately is justified.
SLIDE 49 Results: Summary
- Characterization shows differences in the properties of slides for mice five and six:
– Spots were more likely to be broken.
– Spot breaks were more severe.
– Background defects were more numerous.
– Background defects were more intense.
Did this impact the estimated mouse-to-mouse variation?
SLIDE 50
Results
Images: Slide 2 (Mouse 1) and Slide 19 (Mouse 5)
SLIDE 51 Outline
- Microarray Simulation Project
- Characterization of Microarray Images
- Results of Characterization
- Simulations
- Conclusion
SLIDE 52
Simulations
Images: Slide 2 (Mouse 1) and a simulation generated from mouse 1-4 properties
SLIDE 53
Simulations
Images: Slide 2 (Mouse 1) and a simulation generated from mouse 1-4 properties
SLIDE 54
Simulations
Images: Slide 19 (Mouse 5) and a simulation generated from mouse 5 and 6 properties
SLIDE 55
Simulations
Images: Slide 19 (Mouse 5) and a simulation generated from mouse 5 and 6 properties
SLIDE 56 Outline
- Microarray Simulation Project
- Characterization of Microarray Images
- Results of Characterization
- Simulations
- Conclusion
SLIDE 57 Conclusions
- Characterization of microarray images can reveal important effects
– In the mouse kidney set, the slides from two mice may have been handled differently.
- Realistic simulation of microarray images may allow us to estimate the effects of variations due to parts of the microarray system.
SLIDE 58 To Do List
- Noncausal MRF for spot and background defects
- Multiscale modeling of large defects
- Simulation study to estimate effects of spot uniformity and background defects