Marker Based Infinitesimal Model for Quantitative Trait Analysis
Shizhong Xu Department of Botany and Plant Sciences University of California Riverside, CA 92521
Outline: Quantitative trait and the infinitesimal model; Infinitesimal ...
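Marker model ($p$ markers):
\[ y_j = \sum_{k=1}^{p} Z_{jk}\gamma_k + \varepsilon_j \]
Infinitesimal model (genome of length $L$):
\[ y_j = \int_0^L Z_j(\lambda)\,\gamma(\lambda)\,d\lambda + \varepsilon_j \]
Bin model ($m$ bins):
\[ y_j = \sum_{k=1}^{m} \bar{Z}_j(k)\,\bar{\gamma}(k)\,\Delta_k + \varepsilon_j = \sum_{k=1}^{m} \bar{Z}_{jk}\gamma_k + \varepsilon_j \]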
[Diagram: dense markers along the genome, grouped into bins]
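Bin genotype (average over the $p_k$ markers in bin $k$):
\[ \bar{Z}_{jk} = \frac{1}{p_k}\sum_{h=1}^{p_k} Z_j(h) \]
Bin model, with $\bar{Z}_{jk} = \bar{Z}_j(k)$ and $\gamma_k = \bar{\gamma}(k)\Delta_k$:
\[ y_j = \sum_{k=1}^{m} \bar{Z}_{jk}\gamma_k + \varepsilon_j \]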
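The bin genotype above is just a per-bin average of marker codes. The following is a minimal sketch of that step, assuming the markers are stored in an n-by-p NumPy array and that a hypothetical `bin_ids` vector assigns each marker to a bin; neither name comes from the slides, and every bin is assumed to contain at least one marker.

```python
import numpy as np

def bin_genotypes(Z, bin_ids):
    """Average marker genotypes within each bin.

    Z       : (n, p) array of marker genotype codes, one row per individual.
    bin_ids : length-p integer array, bin index (0..m-1) of each marker.
    Returns an (n, m) array whose column k is the mean of the markers in bin k.
    Assumes every bin index from 0 to m-1 occurs at least once.
    """
    Z = np.asarray(Z, dtype=float)
    bin_ids = np.asarray(bin_ids)
    m = int(bin_ids.max()) + 1
    Zbar = np.empty((Z.shape[0], m))
    for k in range(m):
        # Mean over the columns (markers) that belong to bin k.
        Zbar[:, k] = Z[:, bin_ids == k].mean(axis=1)
    return Zbar

# Toy example: 3 individuals, 5 markers, two bins (markers 0-2 and 3-4).
Z = np.array([[ 1, 1, 0, -1, -1],
              [ 0, 0, 0,  1,  0],
              [-1, 0, 1,  1,  1]])
Zbar = bin_genotypes(Z, [0, 0, 0, 1, 1])
```

The columns of `Zbar` then play the role of the new "markers" in the bin model.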
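\[ \bar{Z}_{jk} = \frac{1}{\Delta_k}\int_{\text{bin } k} Z_j(\lambda)\,d\lambda \]
$\Delta_k$ (bin size): number of crossovers, inversely related to linkage disequilibrium
$\lim_{\Delta_k \to 0} \operatorname{var}(\bar{Z}_{jk}) = \frac{1}{2}$, high linkage disequilibrium ($F_2$)
$\lim_{\Delta_k \to \infty} \operatorname{var}(\bar{Z}_{jk}) = 0$, low linkage disequilibrium
Larger $\operatorname{var}(\bar{Z}_{jk})$ means higher power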
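With $\Delta_k$ in Morgans (Haldane map function, $F_2$ coding with $\operatorname{var}(Z) = 1/2$):
\[ \lim_{\Delta_k \to 0} \operatorname{var}(\bar{Z}_{jk}) = \lim_{\Delta_k \to 0} \frac{2\Delta_k - 1 + e^{-2\Delta_k}}{4\Delta_k^2} = \lim_{\Delta_k \to 0} \frac{e^{-2\Delta_k}}{2} = \frac{1}{2} \]
\[ \lim_{\Delta_k \to \infty} \operatorname{var}(\bar{Z}_{jk}) = \lim_{\Delta_k \to \infty} \frac{2\Delta_k - 1 + e^{-2\Delta_k}}{4\Delta_k^2} = \lim_{\Delta_k \to \infty} \frac{1 - e^{-2\Delta_k}}{4\Delta_k} = 0 \]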
$\operatorname{var}(\bar{Z}_{jk}) \le 0.5$: choose $\Delta_k$ so that $\operatorname{var}(\bar{Z}_{jk})$ is as close to 0.5 as possible, but with the number of bins small enough to be handled by a program for a given sample size.
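To see how quickly the variance of a bin genotype decays with bin size, the closed form (2Δ − 1 + e^(−2Δ)) / (4Δ²) can be evaluated directly. This is a sketch under assumptions that the slides only partly state: an F2 population coded 1, 0, −1 (variance 1/2), the Haldane map function, bin size in Morgans, and markers dense enough that the bin average behaves like an integral. The function name is hypothetical.

```python
import math

def var_bin_genotype(delta):
    """Variance of the bin-averaged genotype for a bin of size `delta` Morgans.

    Assumes F2 coding (1, 0, -1), the Haldane map function, and a
    continuous (very dense) marker map within the bin.
    """
    if delta < 0:
        raise ValueError("bin size must be non-negative")
    if delta < 1e-4:
        # Taylor expansion around zero avoids floating-point cancellation.
        return 0.5 - delta / 3.0 + delta ** 2 / 6.0
    # (2*delta - 1 + exp(-2*delta)) / (4*delta^2), written with expm1.
    return (2.0 * delta + math.expm1(-2.0 * delta)) / (4.0 * delta ** 2)

# Variance for bins of 1 cM, 20 cM, 1 Morgan and 24 Morgans.
variances = {d: var_bin_genotype(d) for d in (0.01, 0.2, 1.0, 24.0)}
```

Small bins keep the variance near 0.5, while bins spanning many Morgans drive it toward zero, which is the trade-off described above.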
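\[ y_j = \sum_{k=1}^{m} \bar{Z}_{jk}\gamma_k + \varepsilon_j \]
Unweighted:
\[ \bar{Z}_{jk} = \frac{1}{p_k}\sum_{h=1}^{p_k} Z_j(h); \qquad \gamma_k = \sum_{h=1}^{p_k} \gamma(h) \]
Weighted (marker weights $w_h$):
\[ Z^*_{jk} = \frac{1}{p_k}\sum_{h=1}^{p_k} w_h Z_j(h); \qquad \gamma^*_k = \sum_{h=1}^{p_k} w_h^{-1}\gamma(h) \]
\[ y_j = \sum_{k=1}^{m} Z^*_{jk}\gamma^*_k + \varepsilon_j \]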
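Define
\[ c_k = \frac{1}{p_k}\sum_{h=1}^{p_k} |\hat{b}_h| = \operatorname{mean}(|\hat{b}_h|), \]
where $\hat{b}_h$ is the least squares estimate of the effect of marker $h$.
The weight for marker $h \in k$ is defined as
\[ w_h = \frac{|\hat{b}_h|}{c_k} = \frac{|\hat{b}_h|}{\operatorname{mean}(|\hat{b}_h|)} = \frac{p_k\,|\hat{b}_h|}{\sum_{h=1}^{p_k} |\hat{b}_h|} \]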
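The weighting step can be sketched for a single bin as follows. This is an illustration, not the author's implementation: it assumes that the "least squares estimate" is the slope of a single-marker regression of y on each marker, that every marker in the bin is polymorphic, and that the weight is |b̂_h| divided by the bin mean of |b̂_h|. The function and variable names are hypothetical.

```python
import numpy as np

def adaptive_bin_genotype(Z_bin, y):
    """Weighted genotype of one bin.

    Z_bin : (n, p_k) genotypes of the markers in the bin.
    y     : length-n phenotype vector.
    Assumption: b_hat[h] is the single-marker least squares slope.
    Returns (Z_star, w): the weighted bin genotype and the marker weights.
    """
    Z_bin = np.asarray(Z_bin, dtype=float)
    y = np.asarray(y, dtype=float)
    Zc = Z_bin - Z_bin.mean(axis=0)          # centre each marker
    yc = y - y.mean()                        # centre the phenotype
    b_hat = (Zc * yc[:, None]).sum(axis=0) / (Zc ** 2).sum(axis=0)
    c = np.abs(b_hat).mean()                 # mean absolute effect in the bin
    if c == 0:
        raise ValueError("all estimated marker effects are zero")
    w = np.abs(b_hat) / c                    # weights sum to p_k
    Z_star = (Z_bin * w).mean(axis=1)        # (1/p_k) * sum_h w_h Z_j(h)
    return Z_star, w

# Toy bin: 50 individuals, 4 markers, only the first marker affects y.
rng = np.random.default_rng(0)
Z_bin = rng.choice([-1, 0, 1], size=(50, 4))
y = 2.0 * Z_bin[:, 0] + rng.normal(size=50)
Z_star, w = adaptive_bin_genotype(Z_bin, y)
```

Because the weights sum to the number of markers in the bin, the weighted genotype reduces to the plain average when all estimated effects have the same magnitude.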
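With $Z^*_j(h) = w_h Z_j(h)$:
\[ \operatorname{var}(Z^*_{jk}) = \frac{1}{p_k^2}\left[\sum_{h=1}^{p_k} \operatorname{var}\big(Z^*_j(h)\big) + 2\sum_{h=1}^{p_k-1}\sum_{l=h+1}^{p_k} \operatorname{cov}\big(Z^*_j(h), Z^*_j(l)\big)\right] \]
\[ = \frac{1}{2p_k^2}\left[\sum_{h=1}^{p_k} w_h^2 + 2\sum_{h=1}^{p_k-1}\sum_{l=h+1}^{p_k} w_h w_l (1 - 2r_{hl})\right] \]
\[ = \frac{1}{2p_k^2}\sum_{h=1}^{p_k} w_h^2, \quad \text{when no linkage disequilibrium } (r_{hl} = 1/2, \text{ so } 1 - 2r_{hl} = 0) \]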
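\[ \gamma^*_k = \sum_{h=1}^{p_k} w_h^{-1}\gamma(h) = c_k \sum_{h=1}^{p_k} \frac{\gamma(h)}{|\hat{b}_h|}, \quad \text{where } c_k = \frac{1}{p_k}\sum_{h=1}^{p_k} |\hat{b}_h| \text{ (a constant)} \]
$\sum_{h=1}^{p_k} |\hat{b}_h| \neq 0$ as long as one $\hat{b}_h \neq 0$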
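\[ MSE = \frac{1}{n}\sum_{j=1}^{n} (y_j - \hat{y}_j)^2, \quad \text{Mean Squared Error} \]
\[ MSY = \frac{1}{n}\sum_{j=1}^{n} (y_j - \bar{y})^2, \quad \text{Phenotypic Variance} \]
\[ R^2 = \frac{MSY - MSE}{MSY}, \quad \text{Squared Correlation} \]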
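These three summaries are straightforward to compute once predictions are available. The sketch below assumes `y` holds observed phenotypes and `y_hat` holds predictions (for example from cross validation); the function name and toy numbers are illustrative only.

```python
import numpy as np

def prediction_summary(y, y_hat):
    """Return (MSE, MSY, R-square) as defined on the slide."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    mse = np.mean((y - y_hat) ** 2)       # mean squared prediction error
    msy = np.mean((y - y.mean()) ** 2)    # phenotypic variance (divisor n)
    r2 = (msy - mse) / msy                # proportion of variance predicted
    return mse, msy, r2

# Toy data: four observations and their predictions.
mse, msy, r2 = prediction_summary([1, 2, 3, 4], [1.5, 2.0, 2.5, 4.0])
```

An MSE above MSY gives a negative R-square, meaning the predictions do worse than the phenotypic mean.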
[Figure: effect plotted against position (cM, 0 to 2500). The first panel shows the true values; the remaining panels show the estimates for twelve bin sizes:]
(a) Δ = 1 cM, m = 2400, p = 50
(b) Δ = 2 cM, m = 1200, p = 100
(c) Δ = 5 cM, m = 480, p = 250
(d) Δ = 10 cM, m = 240, p = 500
(e) Δ = 20 cM, m = 120, p = 1000
(f) Δ = 40 cM, m = 60, p = 2000
(g) Δ = 100 cM, m = 24, p = 5000
(h) Δ = 150 cM, m = 16, p = 7500
(i) Δ = 300 cM, m = 8, p = 15000
(j) Δ = 600 cM, m = 4, p = 30000
(k) Δ = 1200 cM, m = 2, p = 60000
(l) Δ = 2400 cM, m = 1, p = 120000
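The panel settings follow a simple pattern: in every panel Δ × m = 2400 and p / Δ = 50. The sketch below reproduces them under the assumption, inferred from the panel titles rather than stated in the slides, that the simulated genome is 2400 cM long with 50 markers per cM, so that p is the number of markers per bin. The constant and function names are hypothetical.

```python
GENOME_CM = 2400      # inferred: delta * m = 2400 in every panel
MARKERS_PER_CM = 50   # inferred: p / delta = 50 in every panel

def bin_layout(delta_cm):
    """Return (number of bins m, markers per bin p) for a bin size in cM."""
    m, remainder = divmod(GENOME_CM, delta_cm)
    if remainder:
        raise ValueError("bin size must divide the genome length")
    return m, MARKERS_PER_CM * delta_cm

bin_sizes = (1, 2, 5, 10, 20, 40, 100, 150, 300, 600, 1200, 2400)
layout = {delta: bin_layout(delta) for delta in bin_sizes}
```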
[Figure: true values compared with the estimates for Δ = 20 cM, m = 120, p = 1000; effect plotted against position (cM).]
[Figure 1 panels, mean squared error plotted against bin size (cM): (a) σ² = 10, h² = 0.638; (b) σ² = 20, h² = 0.581; (c) σ² = 50, h² = 0.457; (d) σ² = 100, h² = 0.337. Curves: n = 200, n = 300, n = 400, n = 500, n = 1000.]
Figure 1. Mean squared error expressed as a function of bin size for Design I. The mean squared errors were obtained from 100 replicated simulations. The overall proportion of the phenotypic variance contributed by the 20 simulated QTL was calculated using $h^2 = 64.41 / (64.41 + 26.53 + \sigma^2)$. Each panel represents one of the four scenarios and contains the results for five different sample sizes (n). The phenotypic variance of the simulated trait is indicated by the light horizontal line in each panel.
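As a quick arithmetic check, reading the caption's formula as h² = 64.41 / (64.41 + 26.53 + σ²) reproduces the h² values printed on the four panels. The constant names below are hypothetical, and the source does not say what the 26.53 term represents.

```python
QTL_VARIANCE = 64.41   # numerator of the caption's formula
OTHER_TERM = 26.53     # second denominator term; its meaning is not stated in the source

def heritability(residual_variance):
    """h^2 for a given residual error variance sigma^2."""
    return QTL_VARIANCE / (QTL_VARIANCE + OTHER_TERM + residual_variance)

# The four residual variances used in panels (a) to (d).
h2 = {s2: round(heritability(s2), 3) for s2 in (10, 20, 50, 100)}
```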
[Figure 6 plot: mean squared error plotted against bin size (log10 cM).]
Figure 6. Mean squared error for the simulated data under Design IV (low linkage disequilibrium) plotted against the bin size. The sample size of the simulated population was $n = 500$. The residual error variance was $\sigma^2 = 20$, corresponding to $h^2 = 0.777$. The filled circles indicate the MSE under the infinitesimal model, while the open circles indicate the MSE under the adaptive infinitesimal model. The dashed horizontal line represents the phenotypic variance of the simulated trait (89.71).
Yu et al. 2011, PLoS One 6(3): e17595
Figure 5. The MSE (curve in the left panel) and the R-square (curve in the right panel)
bins). The black dashed horizontal line in the left panel is the phenotypic variance. The red dashed horizontal line in the left panel is the MSE of the natural bin analysis (bins without breakpoints within a bin). The red dashed horizontal line in the right panel is the R-square of the natural bin analysis. R-square increased from 0.42 to 0.55.
[Figure 7 plot: mean squared error plotted against bin size (log10 bp).]
Figure 7. Mean squared error for the carcass trait of beef cattle plotted against the bin size. The filled circles indicate the MSE under the infinitesimal model, while the open circles indicate the MSE under the adaptive infinitesimal model. The dashed horizontal line represents the phenotypic variance of the carcass trait (670.36). The blue horizontal line, along with the two dotted lines, represents the MSE and the standard deviation of the MSE when the bin size was one (one marker per bin). The sample size was $n = 921$ and the number of SNP markers was $p = 40809$. Bin size is expressed as log10 bp. For example, the largest bin size, $\log_{10}\text{bp} = 8.5$, means that the bin contains $10^{8.5} \approx 3.2 \times 10^{8}$ base pairs.
[Figure 7 repeated (MSE plotted against bin size, log10 bp) with numeric labels on the points: 0.294, 0.3, 0.298, 0.281, 0.305, 0.326, 0.324, 0.324, 0.333, 0.306, 0.204, 0.122; and 0.001, 0.001, 0.026, 0.017, 0.024, 0.026, 0.039, 0.085, 0.075, 0.087.]
Marker analysis (p = 40809): MSE ≈ 600, R^2 = (MSY − MSE)/MSY ≈ 0.09 with MSY ≈ 670.
Table 1. Mean squared error (MSE) and R-square values obtained from the 10-fold cross validation analysis for the beef carcass trait using five competing models and the proposed bin model.

Model           MSE (2)    R-square
eBayes          648.11     0.0332
G-Blup          632.46     0.0565
BayesB-1        655.59     0.0220
BayesB-2 (1)    658.19     0.0182
Lasso           603.75     0.0994
Bin model       447.10     0.3330

(1) The Pi value for BayesB-2 is set at 0.95.
(2) The phenotypic variance of the beef carcass trait is 670.36. An MSE smaller than 670.36 indicates that the model has predictive ability.
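The R-square column of Table 1 can be recovered from the MSE column and the phenotypic variance in the footnote, using R² = (MSY − MSE) / MSY. The snippet below is only a consistency check on the reported numbers; the dictionary layout and names are illustrative.

```python
PHENOTYPIC_VARIANCE = 670.36  # from the table footnote

# model: (reported MSE, reported R-square)
table1 = {
    "eBayes":    (648.11, 0.0332),
    "G-Blup":    (632.46, 0.0565),
    "BayesB-1":  (655.59, 0.0220),
    "BayesB-2":  (658.19, 0.0182),
    "Lasso":     (603.75, 0.0994),
    "Bin model": (447.10, 0.3330),
}

def r_square(mse, msy=PHENOTYPIC_VARIANCE):
    """Proportion of phenotypic variance explained by the predictions."""
    return (msy - mse) / msy

# Recompute R-square to four decimals for comparison with the table.
recomputed = {model: round(r_square(mse), 4) for model, (mse, _) in table1.items()}
```

All six recomputed values agree with the reported column to four decimals.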