Marker Based Infinitesimal Model for Quantitative Trait Analysis



slide-1
SLIDE 1

Marker Based Infinitesimal Model for Quantitative Trait Analysis

Shizhong Xu Department of Botany and Plant Sciences University of California Riverside, CA 92521

slide-2
SLIDE 2

Outline

  • Quantitative trait and the infinitesimal model
  • Infinitesimal model using marker information
  • Adaptive infinitesimal model
  • Simulation studies
  • Rice and beef cattle data analyses
slide-3
SLIDE 3

Outline

  • Quantitative trait and the infinitesimal model
  • Infinitesimal model using marker information
  • Adaptive infinitesimal model
  • Simulation studies
  • Rice and beef cattle data analyses
slide-4
SLIDE 4

Quantitative Trait

slide-5
SLIDE 5

Quantitative Genetics Model Phenotype = Genotype + Environment

slide-6
SLIDE 6

Infinitesimal Model

  • Infinite number of genes
  • Infinitely small effect of each gene
  • Effect of an individual gene is not recognizable
  • Collective effect of all genes is studied using pedigree information (genetic relationship)

  • Best linear unbiased prediction (BLUP)
slide-7
SLIDE 7

Outline

  • Quantitative trait and the infinitesimal model
  • Infinitesimal model using marker information
  • Adaptive infinitesimal model
  • Simulation studies
  • Rice and beef cattle data analyses
slide-8
SLIDE 8

Marker Based Infinitesimal Model

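Finite number of markers:

$$y_j = \sum_{k=1}^{p} Z_{jk}\,\gamma_k + \varepsilon_j$$

Infinite number of markers:

$$y_j = \sum_{k=1}^{\infty} Z_{jk}\,\gamma_k + \varepsilon_j$$

Continuous genome of length L:

$$y_j = \int_0^L Z_j(\lambda)\,\gamma(\lambda)\,d\lambda + \varepsilon_j$$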

slide-9
SLIDE 9

Different from Longitudinal Data Analysis

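Infinitesimal model (scalar response, integration over genome position):

$$y_j = \int_0^L Z_j(\lambda)\,\gamma(\lambda)\,d\lambda + \varepsilon_j$$

Longitudinal data analysis: the response $y_j(t)$ and the model terms are themselves functions of time $t$.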

slide-10
SLIDE 10

Numerical Integration

slide-11
SLIDE 11

Bin Effect Model

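$$y_j = \sum_{k=1}^{\infty} Z_{jk}\,\gamma_k + \varepsilon_j$$

$$y_j = \int_0^L Z_j(\lambda)\,\gamma(\lambda)\,d\lambda + \varepsilon_j$$

$$y_j \approx \sum_{k=1}^{m} Z_j(\lambda_k)\,\gamma(\lambda_k)\,\Delta_k + \varepsilon_j$$

$$y_j = \sum_{k=1}^{m} Z_{jk}\,\gamma_k + \varepsilon_j$$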
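The step from the integral to a finite sum over bins can be illustrated numerically. The following is a minimal sketch, not the author's implementation: the genotype function `z_example`, the effect function `gamma_example`, the genome length of 25 and all function names are hypothetical choices made for illustration, and a midpoint rule is assumed for the approximation.

```python
import numpy as np

def genetic_value_dense(z, gamma, length, n_grid=200_000):
    """Very fine midpoint sum standing in for the integral of Z_j(lambda) * gamma(lambda)."""
    step = length / n_grid
    lam = (np.arange(n_grid) + 0.5) * step
    return float(np.sum(z(lam) * gamma(lam)) * step)

def genetic_value_bins(z, gamma, length, m):
    """Sum over m equal bins of Z_j(lambda_k) * gamma(lambda_k) * Delta_k (midpoint rule)."""
    edges = np.linspace(0.0, length, m + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])   # lambda_k, the bin midpoints
    delta = np.diff(edges)                 # Delta_k, the bin sizes
    return float(np.sum(z(mid) * gamma(mid) * delta))

def z_example(lam):
    """Hypothetical individual: genotype code switches from +1 to -1 at position 7.3."""
    return np.where(lam < 7.3, 1.0, -1.0)

def gamma_example(lam):
    """Hypothetical smooth effect function peaking at position 10."""
    return np.exp(-0.5 * ((lam - 10.0) / 2.0) ** 2)

reference = genetic_value_dense(z_example, gamma_example, 25.0)
for m in (5, 50, 2500):
    approx = genetic_value_bins(z_example, gamma_example, 25.0, m)
    print(m, approx, abs(approx - reference))
```

With few bins the sum is a coarse approximation; as the number of bins grows, the bin sum approaches the integral, which is the trade-off between model dimension and accuracy that the bin model exploits.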

   

            

   

slide-12
SLIDE 12

Bin Effects

[Figure: dense markers along the genome grouped into bins.]

$$Z_{jk} = \frac{1}{p_k}\sum_{h=1}^{p_k} Z_j(h)$$

slide-13
SLIDE 13

Recombination Breakpoint Data

Marker:

$$Z_{jk} = \frac{1}{p_k}\sum_{h=1}^{p_k} Z_j(h)$$

Breakpoint:

$$Z_{jk} = \frac{1}{\Delta_k}\int_{\Delta_k} Z_j(\lambda)\,d\lambda$$
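The two ways of coding a bin genotype can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names, the argument layout (sorted breakpoint positions plus one genotype code per segment) and the example values are hypothetical and are not taken from the slides.

```python
import numpy as np

def bin_genotype_from_markers(z_markers):
    """Marker data: average the genotype codes of the p_k markers in the bin."""
    return float(np.mean(z_markers))

def bin_genotype_from_breakpoints(bin_start, bin_end, breakpoints, genotypes):
    """Breakpoint data: length-weighted average of the genotype over the bin.

    breakpoints: sorted positions strictly inside (bin_start, bin_end)
    genotypes:   one genotype code per segment, len(breakpoints) + 1 values
    """
    edges = np.concatenate(([bin_start], breakpoints, [bin_end]))
    lengths = np.diff(edges)               # segment lengths inside the bin
    return float(np.dot(lengths, genotypes) / (bin_end - bin_start))

# Four markers in a bin, three coded +1 and one coded -1
print(bin_genotype_from_markers([1, 1, -1, 1]))
# Bin from 0 to 10 with one breakpoint at 4: genotype +1 on [0, 4), -1 on [4, 10]
print(bin_genotype_from_breakpoints(0.0, 10.0, [4.0], [1, -1]))
```

A bin without any breakpoint simply takes the genotype code of its single segment, so the breakpoint formula reduces to the ordinary marker genotype in that case.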

 

  

 

slide-14
SLIDE 14

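$$Z_{jk} = \frac{1}{\Delta_k}\int_{\Delta_k} Z_j(\lambda)\,d\lambda$$

Example (figure lost in extraction; the surviving values are 8, 2, 1, 10 and 10): $Z_{jk} = 0.8$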

slide-15
SLIDE 15
slide-16
SLIDE 16

What Does a Bin Effect Represent?

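$$y_j = \sum_{k=1}^{m} Z_{jk}\,\gamma_k + \varepsilon_j$$

$$Z_{jk} = \frac{1}{\Delta_k}\int_{\Delta_k} Z_j(\lambda)\,d\lambda, \qquad \gamma_k = \int_{\Delta_k} \gamma(\lambda)\,d\lambda$$

where $\Delta_k$ = size of bin k and $\lambda$ is treated as a uniform variable within the bin.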

slide-17
SLIDE 17

Assumptions of the Infinitesimal Model

  • High linkage disequilibrium within a bin
  • Homogeneous genetic effect within a bin
slide-18
SLIDE 18

High Linkage Disequilibrium

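$$Z_{jk} = \frac{1}{\Delta_k}\int_{\Delta_k} Z_j(\lambda)\,d\lambda$$

$\Delta_k$ = number of crossovers, inversely related to linkage disequilibrium

$$\lim_{\Delta_k \to 0} \operatorname{var}(Z_{jk}) = \frac{1}{2}, \quad \text{high linkage disequilibrium (F}_2\text{)}$$

$$\lim_{\Delta_k \to \infty} \operatorname{var}(Z_{jk}) = 0, \quad \text{low linkage disequilibrium}$$

Larger $\operatorname{var}(Z_{jk})$ means higher power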

slide-19
SLIDE 19

Range of Var(Z)

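$$\lim_{\Delta_k \to 0} \operatorname{var}(Z_{jk}) = \lim_{\Delta_k \to 0} \frac{2\Delta_k - 1 + e^{-2\Delta_k}}{4\Delta_k^2} = \lim_{\Delta_k \to 0} \frac{1 - e^{-2\Delta_k}}{4\Delta_k} = \frac{1}{2}$$

$$\lim_{\Delta_k \to \infty} \operatorname{var}(Z_{jk}) = \lim_{\Delta_k \to \infty} \frac{2\Delta_k - 1 + e^{-2\Delta_k}}{4\Delta_k^2} = \lim_{\Delta_k \to \infty} \frac{1}{2\Delta_k} = 0$$

$\operatorname{var}(Z_{jk}) \le 0.5$. Choose $\Delta_k$ so that $\operatorname{var}(Z_{jk})$ is as close to 0.5 as possible, but with the number of bins small enough to be handled by a program for a given sample size.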
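The range of var(Z) can be checked numerically. The sketch below assumes an F2 population with genotype codes 1, 0 and -1, no crossover interference (Haldane map function) and a bin size measured in Morgans; under those assumptions the variance of the bin average is (2Δ - 1 + exp(-2Δ)) / (4Δ²). The function name and the small-Δ series switch are illustrative choices, not part of the slides.

```python
import math

def var_bin_genotype(delta):
    """var(Z_jk) for a bin of size delta (Morgans), F2 coding, Haldane map assumed."""
    if delta <= 0:
        raise ValueError("bin size must be positive")
    if delta < 1e-3:
        # Taylor series 1/2 - delta/3 + delta^2/6 avoids cancellation near zero
        return 0.5 - delta / 3.0 + delta ** 2 / 6.0
    return (2.0 * delta - 1.0 + math.exp(-2.0 * delta)) / (4.0 * delta ** 2)

for delta in (1e-6, 0.01, 0.1, 1.0, 10.0, 1000.0):
    print(delta, var_bin_genotype(delta))
```

The printed values fall from about 0.5 for very small bins toward 0 for very large bins, which is why small bins give more power but also more bins to fit.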

slide-20
SLIDE 20

Outline

  • Quantitative trait and the infinitesimal model
  • Infinitesimal model using marker information
  • Adaptive infinitesimal model
  • Simulation studies
  • Rice and beef cattle data analyses
slide-21
SLIDE 21

Adaptive Model Relaxes the Two Assumptions

  • High linkage disequilibrium within a bin
      • prevent var(Z) from being zero
  • Homogeneous genetic effect within a bin
      • make all effects positive
slide-22
SLIDE 22

Redefine the Bin Size by the Number of Markers Within a Bin

$$y_j = \sum_{k=1}^{m} Z_{jk}\,\gamma_k + \varepsilon_j$$

$$Z_{jk} = \frac{1}{p_k}\sum_{h=1}^{p_k} Z_j(h), \qquad \gamma_k = \sum_{h=1}^{p_k} \gamma(h)$$

where $p_k$ = number of markers in bin k

slide-23
SLIDE 23

Weighted Average Effect of a Bin

Unweighted:

$$Z_{jk} = \frac{1}{p_k}\sum_{h=1}^{p_k} Z_j(h); \qquad \gamma_k = \sum_{h=1}^{p_k} \gamma(h)$$

Weighted:

$$Z^*_{jk} = \frac{1}{p_k}\sum_{h=1}^{p_k} w_h\,Z_j(h); \qquad \gamma^*_k = \sum_{h=1}^{p_k} w_h^{-1}\,\gamma(h)$$

$$y_j = \sum_{k=1}^{m} Z^*_{jk}\,\gamma^*_k + \varepsilon_j$$

slide-24
SLIDE 24

Weight System

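Define

$$c_k = \frac{1}{p_k}\sum_{h=1}^{p_k} |\hat{b}_h| = \operatorname{mean}(|\hat{b}_h|)$$

where $\hat{b}_h$ is the least squares estimate of marker h within bin k.

The weight for marker h is defined as

$$w_h = \frac{\hat{b}_h}{c_k} = \frac{\hat{b}_h}{\operatorname{mean}(|\hat{b}_h|)} = \frac{p_k\,\hat{b}_h}{\sum_{l=1}^{p_k} |\hat{b}_l|}$$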
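The weight system can be sketched in a few lines. This is a hedged illustration, not the author's code: it assumes that the least squares estimate of each marker comes from a single-marker simple regression, and that the weight is the signed estimate divided by the mean absolute estimate within the bin. The function names `single_marker_ls`, `bin_weights` and `weighted_bin_genotype` are hypothetical.

```python
import numpy as np

def single_marker_ls(z_bin, y):
    """Least squares slope of y on each marker separately (assumed single-marker regression).

    z_bin: (n, p_k) genotype codes of the markers in one bin; every marker must vary.
    """
    z_bin = np.asarray(z_bin, dtype=float)
    y = np.asarray(y, dtype=float)
    zc = z_bin - z_bin.mean(axis=0)
    yc = y - y.mean()
    return (zc.T @ yc) / np.sum(zc ** 2, axis=0)

def bin_weights(b_hat):
    """w_h = b_hat_h / c_k, with c_k the mean absolute estimate within the bin."""
    b_hat = np.asarray(b_hat, dtype=float)
    c_k = np.mean(np.abs(b_hat))
    if c_k == 0.0:
        raise ValueError("all estimates are zero; weights are undefined")
    return b_hat / c_k

def weighted_bin_genotype(z_bin, w):
    """Z*_jk = (1 / p_k) * sum_h w_h * Z_j(h), one value per individual."""
    z_bin = np.asarray(z_bin, dtype=float)
    return z_bin @ w / z_bin.shape[1]

w = bin_weights([2.0, -1.0, 1.0])
print(w)                                   # weights with mean absolute value 1
print(weighted_bin_genotype([[1, 1, 1], [1, -1, -1]], w))
```

Because the weights carry the sign of the estimated effect, markers with opposite effects no longer cancel inside the bin average.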

  

   

 

slide-25
SLIDE 25

Weighted Var(Z*) > 0

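With $Z^*_j(h) = w_h Z_j(h)$ and $\theta_{hl}$ the recombination fraction between markers h and l:

$$\operatorname{var}(Z^*_{jk}) = \frac{1}{p_k^2}\left[\sum_{h=1}^{p_k} \operatorname{var}\big(Z^*_j(h)\big) + 2\sum_{h<l} \operatorname{cov}\big(Z^*_j(h), Z^*_j(l)\big)\right]$$

$$= \frac{1}{p_k^2}\left[\frac{1}{2}\sum_{h=1}^{p_k} w_h^2 + 2\sum_{h<l} \frac{1}{2}\,w_h w_l\,(1 - 2\theta_{hl})\right]$$

$$= \frac{1}{2p_k^2}\sum_{h=1}^{p_k} w_h^2 > 0, \quad \text{when no linkage disequilibrium } (1 - 2\theta_{hl} = 0)$$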
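The variance of the weighted bin genotype can be evaluated directly from the weights and the pairwise recombination fractions. The sketch below assumes F2 genotype codes 1, 0 and -1, so that each marker has variance 1/2 and two markers with recombination fraction θ have covariance (1 - 2θ)/2. The function name and the example values are hypothetical.

```python
import numpy as np

def var_weighted_bin(w, theta):
    """var(Z*_jk) for weights w and a symmetric matrix of recombination fractions.

    theta[h, l] is the recombination fraction between markers h and l,
    with zeros on the diagonal (a marker never recombines with itself).
    """
    w = np.asarray(w, dtype=float)
    theta = np.asarray(theta, dtype=float)
    cov = 0.5 * (1.0 - 2.0 * theta)        # F2 covariance matrix of the marker genotypes
    return float(w @ cov @ w) / w.size ** 2

w = np.array([1.5, -0.75, 0.75])
no_ld = 0.5 * (1.0 - np.eye(3))            # theta = 1/2 between every pair of markers
print(var_weighted_bin(w, no_ld))          # equals sum(w^2) / (2 * p_k^2)
print(var_weighted_bin(np.ones(3), np.zeros((3, 3))))   # complete linkage, unit weights
```

With no linkage disequilibrium the cross terms vanish and the variance stays strictly positive as long as at least one weight is nonzero.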

    

                                  

    

slide-26
SLIDE 26

Homogenization of Marker Effects Within Bin

$$\gamma^*_k = \sum_{h=1}^{p_k} w_h^{-1}\,\gamma(h) = c_k \sum_{h=1}^{p_k} \frac{\gamma(h)}{\hat{b}_h}$$

where $c_k = \frac{1}{p_k}\sum_{h=1}^{p_k} |\hat{b}_h|$ (a constant)

The slide closes with a positivity condition, "... > 0 as long as one ...", which is truncated in the source.

slide-27
SLIDE 27

Outline

  • Quantitative trait and the infinitesimal model
  • Infinitesimal model using marker information
  • Adaptive infinitesimal model
  • Simulation studies
  • Rice and beef cattle data analyses
slide-28
SLIDE 28

Measurement of Prediction (Cross Validation)

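$$MSE = \frac{1}{n}\sum_{j=1}^{n} (y_j - \hat{y}_j)^2, \quad \text{Mean Squared Error}$$

$$MSY = \frac{1}{n}\sum_{j=1}^{n} (y_j - \bar{y})^2, \quad \text{Phenotypic Variance}$$

$$R^2 = \frac{MSY - MSE}{MSY}, \quad \text{Squared Correlation}$$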
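These three quantities are straightforward to compute from cross-validated predictions. The sketch below is illustrative only: the fold assignment, the function names and in particular `ridge_fit_predict` are assumptions. Ridge regression is used purely as a stand-in predictor and is not the empirical Bayes method used in the talk.

```python
import numpy as np

def cv_metrics(y, y_hat):
    """Return (MSE, MSY, R-square) with R-square = (MSY - MSE) / MSY."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    mse = float(np.mean((y - y_hat) ** 2))
    msy = float(np.mean((y - y.mean()) ** 2))
    return mse, msy, (msy - mse) / msy

def kfold_predictions(Z, y, fit_predict, n_folds=10, seed=0):
    """Predict every individual from a model trained on the other folds."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_hat = np.empty(n)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        y_hat[test_idx] = fit_predict(Z[train_idx], y[train_idx], Z[test_idx])
    return y_hat

def ridge_fit_predict(Z_train, y_train, Z_test, lam=1.0):
    """Stand-in predictor (assumption): ridge regression on centered bin genotypes."""
    mu = y_train.mean()
    z_bar = Z_train.mean(axis=0)
    Zc = Z_train - z_bar
    A = Zc.T @ Zc + lam * np.eye(Zc.shape[1])
    beta = np.linalg.solve(A, Zc.T @ (y_train - mu))
    return mu + (Z_test - z_bar) @ beta
```

A call such as `cv_metrics(y, kfold_predictions(Z, y, ridge_fit_predict))` returns the three values for an n by m matrix `Z` of bin genotypes. An MSE equal to MSY corresponds to predicting every individual by the mean, that is, R-square of zero.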

 

       

 

slide-29
SLIDE 29

Simulation Experiment

  • Genome size = 2,500 cM
  • Number of markers = 120,000
  • Marker interval = 0.02 cM
  • Cross validation (MSE)
  • Design I = 20 QTL
  • Design II = Clustered polygenic model
  • Design III = Polygenic model
  • Design IV = Design I with genome size 2,500 × 100 cM (low linkage disequilibrium)
slide-30
SLIDE 30

True QTL Effect

[Figure: true QTL effects (true values) plotted against genome position. x-axis: Position (cM), up to 2500; y-axis: Effect.]

slide-31
SLIDE 31

Estimated Bin Effects

[Figure: estimated bin effects plotted against genome position, one panel per bin size. x-axis: Position (cM), up to 2500; y-axis: Effect. Δ = bin size, m = number of bins, p = number of markers per bin.]

  (a) Δ = 1 cM, m = 2400, p = 50
  (b) Δ = 2 cM, m = 1200, p = 100
  (c) Δ = 5 cM, m = 480, p = 250
  (d) Δ = 10 cM, m = 240, p = 500
  (e) Δ = 20 cM, m = 120, p = 1000
  (f) Δ = 40 cM, m = 60, p = 2000
  (g) Δ = 100 cM, m = 24, p = 5000
  (h) Δ = 150 cM, m = 16, p = 7500
  (i) Δ = 300 cM, m = 8, p = 15000
  (j) Δ = 600 cM, m = 4, p = 30000
  (k) Δ = 1200 cM, m = 2, p = 60000
  (l) Δ = 2400 cM, m = 1, p = 120000

slide-32
SLIDE 32

True and Estimated QTL Effect

[Figure: two panels plotted against Position (cM), y-axis: Effect. One panel shows the true QTL effects (true values); the other shows the estimated bin effects for Δ = 20 cM, m = 120, p = 1000.]

slide-33
SLIDE 33

[Figure: four panels of mean squared error plotted against bin size (cM, 20 to 100), one curve per sample size: n = 200, 300, 400, 500, 1000.]

  (a) σ² = 10, h² = 0.638
  (b) σ² = 20, h² = 0.581
  (c) σ² = 50, h² = 0.457
  (d) σ² = 100, h² = 0.337

Figure 1. Mean squared error expressed as a function of bin size for Design I. The mean squared errors were obtained from 100 replicated simulations. The overall proportion of the phenotypic variance contributed by the 20 simulated QTL was calculated using $h^2 = 64.41 / (64.41 + 26.53 + \sigma^2)$. Each panel contains the result of five different sample sizes (n). The phenotypic variance of the simulated trait is indicated by the light horizontal line in each panel (each panel represents one of the four different scenarios).
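The four h² values in the panel titles follow from the caption's formula once the residual variance is plugged in. The short check below takes the constants 64.41 and 26.53 from the caption as given (the caption does not say what the 26.53 term represents); the function and argument names are hypothetical.

```python
def qtl_variance_proportion(residual_var, qtl_var=64.41, other_var=26.53):
    """h^2 = qtl_var / (qtl_var + other_var + residual_var), constants from the caption."""
    return qtl_var / (qtl_var + other_var + residual_var)

for residual_var in (10, 20, 50, 100):
    print(residual_var, round(qtl_variance_proportion(residual_var), 3))
```

The printed proportions can be compared directly with the values shown in panels (a) to (d).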

slide-34
SLIDE 34

[Figure: mean squared error (y-axis, 50 to 150) plotted against bin size (x-axis, log10 cM, 4.0 to 5.5).]

Figure 6. Mean squared error for the simulated data under Design IV (low linkage disequilibrium) plotted against the bin size. The sample size of the simulated population was n = 500. The residual error variance was σ² = 20, corresponding to h² = 0.777. The filled circles indicate the MSE under the infinitesimal model while the open circles indicate the MSE under the adaptive infinitesimal model. The dashed horizontal line represents the phenotypic variance of the simulated trait (89.71).

slide-35
SLIDE 35

Outline

  • Quantitative trait and the infinitesimal model
  • Infinitesimal model using marker information
  • Adaptive infinitesimal model
  • Simulation studies
  • Rice and beef cattle data analyses
slide-36
SLIDE 36

Rice Tiller Number (Yu et al. 2011)

  • Number of recombinant inbred lines: 210
  • Number of SNP: 270,820
  • Number of natural bins: 1619
  • Number of artificial bins: vary from small to large
  • Method: Empirical Bayes (eBayes)
  • Cross validation: MSE and R-square
slide-37
SLIDE 37

Yu et al. 2011, PLoS One 6(3): e17595

slide-38
SLIDE 38

Figure 5. The MSE (curve in the left panel) and the R-square (curve in the right panel) of the rice tiller number trait analysis, expressed as a function of bin size (artificial bins). The black dashed horizontal line in the left panel is the phenotypic variance. The red dashed horizontal line in the left panel is the MSE of the natural bin (without breakpoints within a bin) analysis. The red dashed horizontal line in the right panel is the R-square of the natural bin analysis. R-square increased from 0.42 to 0.55.

slide-39
SLIDE 39

Beef Cattle Data Analysis

  • Trait = carcass weight
  • Number of beef cattle = 922
  • Number of SNP markers = 40809
  • Number of chromosomes = 29
  • Methods = unweighted and weighted
slide-40
SLIDE 40

[Figure: mean squared error (y-axis, 400 to 700) plotted against bin size (x-axis, log10 bp, 5.0 to 8.5).]

Figure 7. Mean squared error for the carcass trait of beef cattle plotted against the bin size. The filled circles indicate the MSE under the infinitesimal model while the open circles indicate the MSE under the adaptive infinitesimal model. The dashed horizontal line represents the phenotypic variance of the trait (670.36). The blue horizontal line along with the two dotted lines represents the MSE and the standard deviation of the MSE in the situation where the bin size was one (one marker per bin). The sample size was n = 921 and the number of SNP markers was p = 40809. The bin size was defined as log10 bp. For example, the largest bin size, log10 bp = 8.5, means that the bin contains $10^{8.5} \approx 3.2 \times 10^{8}$ base pairs.

slide-41
SLIDE 41
slide-42
SLIDE 42
slide-43
SLIDE 43

slide-44
SLIDE 44

[Figure: MSE (y-axis, 400 to 700) plotted against bin size (x-axis, log10 bp, 5.0 to 8.5), with values annotated along the two curves.]

Annotated values, first series: 0.294, 0.3, 0.298, 0.281, 0.305, 0.326, 0.324, 0.324, 0.333, 0.306, 0.204, 0.122

Annotated values, second series: -0.002, 0.001, 0.001, -0.001, 0.026, 0.017, 0.024, 0.026, 0.039, 0.085, 0.075, 0.087

Marker analysis (p = 40809): MSE ≈ 600, R^2 = (670 - MSE)/670 ≈ 0.09

slide-45
SLIDE 45

Table 1. Mean squared error (MSE) and R-square values obtained from the 10-fold cross validation analysis for the beef carcass trait using five competing models and the proposed bin model.

  Model          MSE (2)    R-square
  eBayes         648.11     0.0332
  G-Blup         632.46     0.0565
  BayesB-1       655.59     0.0220
  BayesB-2 (1)   658.19     0.0182
  Lasso          603.75     0.0994
  Bin model      447.10     0.3330

(1) The Pi value for BayesB-2 is set at 0.95.
(2) The phenotypic variance of the beef carcass trait is 670.36. An MSE smaller than 670.36 indicates that the model has predictive ability.
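The R-square column of Table 1 can be reproduced from the MSE column and the phenotypic variance using R-square = (MSY - MSE) / MSY. The snippet below is a small consistency check; the dictionary and variable names are hypothetical, and the numbers are copied from the table.

```python
PHENOTYPIC_VARIANCE = 670.36  # MSY of the beef carcass trait, from the table footnote

# MSE values copied from Table 1
table_mse = {
    "eBayes": 648.11,
    "G-Blup": 632.46,
    "BayesB-1": 655.59,
    "BayesB-2": 658.19,
    "Lasso": 603.75,
    "Bin model": 447.10,
}

def r_square(mse, msy=PHENOTYPIC_VARIANCE):
    """Proportion of phenotypic variance explained in cross validation."""
    return (msy - mse) / msy

for model, mse in table_mse.items():
    print(f"{model:10s} {mse:8.2f} {r_square(mse):.4f}")
```

The recomputed values agree with the tabulated R-square column to within rounding.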

slide-46
SLIDE 46

Outline

  • Quantitative trait and the infinitesimal model
  • Infinitesimal model using marker information
  • Adaptive infinitesimal model
  • Simulation studies
  • Rice and beef cattle data analyses
slide-47
SLIDE 47

Acknowledgements

  • Zhiqiu Hu (postdoc)
  • Qifa Zhang (rice data)
  • Zhiquan Wang (beef data)
  • USDA Grant 2007-02784
slide-48
SLIDE 48

Thank You!