
Statistical Analysis of Genetic and Phenotypic Data for Breeders



  1. Statistical Analysis of Genetic and Phenotypic Data for Breeders: Hands on Practical Sessions (BayesB)
  Paulino Pérez (1), perpdgo@gmail.com; José Crossa (2), j.crossa@CGIAR.org
  (1) ColPos-México; (2) CIMMyT-México
  June, 2015.
  CIMMYT, México-SAGPDB Statistical Analysis of Genetic and Phenotypic Data for Breeders: Hands on Practical 1/15

  2. Contents
     1. BayesB model and others
     2. BayesB model
     3. Examples

  3. BayesB model and others
  Several analytical approaches exist for genome-based prediction of breeding values from the linear model
  $$y_i = \mathbf{x}_i'\boldsymbol{\beta} + e_i, \quad i = 1, \dots, n, \qquad (1)$$
  where $y_i$ is the phenotype of line $i$, $\mathbf{x}_i$ its vector of marker covariates, $\boldsymbol{\beta}$ the vector of marker effects and $e_i$ a residual. The approaches differ with respect to the assumptions made about the marker effects:
  - Ridge regression-GBLUP: all marker effects are normally distributed and all markers have identical variances.
  - BayesA and BayesB were introduced in genomic selection (GS) by Meuwissen et al. (2001).
  - BayesA: markers are assumed to have different variances, each following a scaled inverted chi-squared ($\chi^{-2}$) distribution.
  - BayesB: the variance of some markers is zero with probability $\pi$, and the variance of the remaining markers follows a scaled inverted chi-squared ($\chi^{-2}$) distribution with $\nu$ degrees of freedom and scale parameter $s$.
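Under the Ridge regression-GBLUP assumptions, every $\beta_j \sim N(0, \sigma^2_\beta)$ and the variances are known. In that case the posterior mean of $\boldsymbol{\beta}$ in model (1) equals the ridge regression estimate $\hat{\boldsymbol{\beta}} = (X'X + \lambda I)^{-1}X'y$ with $\lambda = \sigma^2_e/\sigma^2_\beta$. The base-R sketch below illustrates this on toy data. The randomly generated 0/1 markers, the sample sizes and the variance values are assumptions chosen for illustration; they are not the wheat data used later.

```r
# Ridge regression / GBLUP-style shrinkage on simulated (toy) marker data.
set.seed(2)
n <- 200; p <- 50
varE <- 0.5            # residual variance (assumed)
varB <- 1 / p          # common marker-effect variance (assumed)

# Random 0/1 marker codes, centered and standardized column by column
X <- scale(matrix(rbinom(n * p, size = 1, prob = 0.5), nrow = n, ncol = p))
beta <- rnorm(p, mean = 0, sd = sqrt(varB))   # every marker has an effect
y <- as.vector(X %*% beta) + rnorm(n, sd = sqrt(varE))
yc <- y - mean(y)                             # center y (no intercept column in X)

lambda <- varE / varB                         # shrinkage parameter sigma2_e / sigma2_beta
XtX <- crossprod(X)
betaRidge <- as.vector(solve(XtX + diag(lambda, p), crossprod(X, yc)))
betaOLS   <- as.vector(solve(XtX, crossprod(X, yc)))

print(round(cor(betaRidge, beta), 3))         # agreement with the true effects
print(sum(betaRidge^2) < sum(betaOLS^2))      # TRUE: ridge shrinks the effects toward zero
```

The larger $\lambda$ is, the more strongly all effects are shrunk by the same amount. This common shrinkage is the assumption that BayesA and BayesB relax.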

  4. BayesB model and others: advantages and disadvantages of BayesB
  Advantages
  - Simulation studies have shown that BayesB provides more accurate predictions of genetic values than BayesA and ridge regression.
  - It can be used effectively for variable selection.
  Disadvantages
  - It requires computer-intensive MCMC techniques.
  - Computation times are very long in computer simulations of genomic selection breeding schemes.

  5. BayesB model
  BayesB uses a mixture distribution with a point mass at zero, so that the (conditional) prior distribution of the marker effects is
  $$\beta_j \mid \sigma^2_j, \pi = \begin{cases} 0 & \text{with probability } \pi, \\ N(0, \sigma^2_j) & \text{with probability } 1 - \pi. \end{cases} \qquad (2)$$
  The prior assigned to $\sigma^2_j$, $j = 1, \dots, p$, is the same for all markers, namely a scaled inverted chi-squared distribution $\chi^{-2}(df, S)$. The model can be implemented using the Metropolis algorithm.
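The base-R sketch below draws marker effects from prior (2) to show what it implies. It assumes the parameterisation $\sigma^2_j = S/\chi^2_{df}$ for the scaled inverted chi-squared distribution, which has prior mean $S/(df-2)$. The values df = 5, S = 0.01 and π = 0.95 (a 5% chance of a non-zero effect) are illustrative assumptions, not values taken from these slides.

```r
# Draws from the BayesB prior (2) for p markers (illustrative values).
set.seed(1)
p       <- 1000
dfPrior <- 5       # degrees of freedom of the scaled inverted chi-squared prior (assumed)
S       <- 0.01    # scale parameter (assumed)
pi0     <- 0.95    # probability that a marker effect is exactly zero (assumed)

# Marker-specific variances: sigma2_j = S / chi^2_df (assumed parameterisation)
sigma2 <- S / rchisq(p, df = dfPrior)

# Mixture: 0 with probability pi0, otherwise a draw from N(0, sigma2_j)
isZero <- runif(p) < pi0
beta <- ifelse(isZero, 0, rnorm(p, mean = 0, sd = sqrt(sigma2)))

print(mean(isZero))        # close to pi0
print(sum(beta != 0))      # number of markers with non-zero effects
```

Because each $\sigma^2_j$ has its own heavy-tailed prior, the few non-zero effects can be much larger than under a single common variance.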

  6. BayesB model
  The mixture prior can also be written as
  $$p(\beta) = \gamma \, \frac{1}{\sqrt{2\pi\sigma^2_\beta}} \exp\!\left(-\frac{\beta^2}{2\sigma^2_\beta}\right) + (1-\gamma)\,\delta(0),$$
  where $\gamma = 1 - \pi$ is the probability that a marker has a non-zero effect and $\delta(0)$ is a point mass at zero.
  [Figure 1: Example of prior for $\beta_j$, $j = 1, \dots, p$; $p(\beta)$ plotted against $\beta$ over the range −3 to 3.]
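A short calculation shows how the point mass shrinks the prior variance of a marker effect. Write $\gamma$ for the probability of a non-zero effect, so that $\gamma = 1 - \pi$ in the notation of equation (2), and condition on $\sigma^2_\beta$:

```latex
\begin{aligned}
E(\beta) &= \gamma \cdot 0 + (1-\gamma) \cdot 0 = 0, \\
\operatorname{Var}(\beta) = E(\beta^2) &= \gamma\,\sigma^2_\beta + (1-\gamma) \cdot 0 = \gamma\,\sigma^2_\beta .
\end{aligned}
```

For example, with $\gamma = 0.05$ and $\sigma^2_\beta = 1$ the prior variance of each effect is only 0.05, even though the non-zero effects themselves have variance 1.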

  7. Examples
  The data are for n = 599 wheat lines evaluated in 4 environments by the CIMMyT wheat improvement program. The dataset includes p = 1279 molecular markers ($x_{ij}$, $i = 1, \dots, n$, $j = 1, \dots, p$), coded as 0/1. Pedigree information is also available. Let's load the dataset in R:
  1. Load R.
  2. Install the BGLR package (if not yet installed). This package is not yet available on CRAN, so you have to install it from https://r-forge.r-project.org/R/?group_id=1525:
         install.packages("BGLR", repos="http://R-Forge.R-project.org")
  3. Load the package: library(BGLR)
  4. Load the data: data(wheat)

  8. Examples: R code
      rm(list=ls())
      setwd(tempdir())
      library(BGLR)
      data(wheat)
      n<-599      # must be <= 599 = nrow(wheat.X)
      p<-300      # must be <= 1279 = ncol(wheat.X)
      nQTL<-30    # must be <= p
      X<-wheat.X[1:n,1:p]
      ## Centering and standardization of each marker column
      for(i in 1:p) {
        X[,i]<-(X[,i]-mean(X[,i]))/sd(X[,i])
      }
      # Simulation: nQTL randomly chosen markers get non-zero effects
      b0<-rep(0,p)
      whichQTL<-sample(1:p,size=nQTL,replace=FALSE)
      b0[whichQTL]<-rnorm(length(whichQTL), sd=sqrt(1/length(whichQTL)))
      signal<-as.vector(X%*%b0)          # simulated genetic values
      error<-rnorm(n=n,sd=sqrt(0.5))     # residuals with variance 0.5
      y<-signal+error
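The sketch below repeats the simulation with base R only, since wheat.X ships with BGLR. It replaces the wheat markers with randomly generated 0/1 codes, an assumption made only so the code runs without the package. It also checks the share of phenotypic variance due to the simulated signal. With standardized, roughly independent markers, var(signal) is close to the sum of squared QTL effects, whose expectation is nQTL × 1/nQTL = 1. With residual variance 0.5, the expected heritability is therefore about 1/(1 + 0.5) ≈ 0.67. With the real wheat markers, linkage disequilibrium between QTL can move var(signal) away from this value.

```r
# Same simulation as above, with random 0/1 markers standing in for wheat.X
set.seed(123)
n <- 599; p <- 300; nQTL <- 30
Xsim <- matrix(rbinom(n * p, size = 1, prob = 0.5), nrow = n, ncol = p)
Xsim <- scale(Xsim)                       # center and standardize each column

b0 <- rep(0, p)
whichQTL <- sample(1:p, size = nQTL, replace = FALSE)
b0[whichQTL] <- rnorm(nQTL, sd = sqrt(1 / nQTL))   # total QTL variance about 1

signal <- as.vector(Xsim %*% b0)
error  <- rnorm(n = n, sd = sqrt(0.5))
y <- signal + error

h2 <- var(signal) / var(y)                # realized proportion of variance explained
print(round(h2, 2))
```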

  9. Examples (continued)
      # Short chain for illustration only
      nIter=500; burnIn=400; thin=10; saveAt=''; S0=NULL; weights=NULL; R2=0.5
      # BayesB on the marker matrix, prior probability of a non-zero effect = 0.05
      ETA<-list(list(X=X,model="BayesB",probIn=0.05))
      fit_BB=BGLR(y=y,ETA=ETA,nIter=nIter,burnIn=burnIn,thin=thin,
                  saveAt=saveAt,df0=5,S0=S0,weights=weights,R2=R2)
      plot(fit_BB$yHat,y)
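With nIter = 500, burnIn = 400 and thin = 10 the run is deliberately short. Assume, as we understand BGLR to do, that posterior means are averaged over the draws taken after burnIn at every thin-th iteration. The hypothetical helper below, which is not a BGLR function, lists the iterations that contribute. Here only 10 draws are averaged, which is why longer chains are advisable for real analyses.

```r
# Hypothetical helper: iterations retained after burn-in and thinning
keptIterations <- function(nIter, burnIn, thin) {
  it <- seq_len(nIter)
  it[it > burnIn & it %% thin == 0]
}

kept <- keptIterations(nIter = 500, burnIn = 400, thin = 10)
print(length(kept))   # 10
print(range(kept))    # 410 500

# An illustrative longer run (values chosen for this example only)
print(length(keptIterations(nIter = 12000, burnIn = 2000, thin = 10)))   # 1000
```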

  10. Examples (continued)
  Notes:
  - In this model the proportion of markers with zero effects should be fixed by the user; BayesCπ offers the possibility of estimating this proportion from the data.
  - In BGLR, the prior proportion of markers whose effects are different from zero is set with the parameter probIn. Here probIn = 0.05, which means that a priori about 5% of the markers are expected to have non-zero effects. That is roughly 300 × 0.05 = 15 of the p = 300 markers used in the code, or 1279 × 0.05 ≈ 64 if all markers were used. This is an expected proportion, not an upper bound.
  - The hyperparameters of the prior $\chi^{-2}(\sigma^2_j \mid df, S)$, $j = 1, \dots, p$, are set using rules based on the heritability; see de los Campos et al. (2012) for details. Here $h^2 = 0.5$ (R2=0.5 in the code), but you can change it according to your knowledge of the trait's heritability. It is also possible to set these hyperparameters directly.
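Treat probIn as a fixed prior inclusion probability, as these notes do. The number of markers with non-zero effects is then a priori Binomial(p, probIn). The short base-R computation below gives its mean and spread for the 300 markers used in the code and for all 1279 markers; the variable names are our own. It also shows that more than 64 non-zero markers is quite possible, so probIn does not act as a hard limit.

```r
# Prior number of non-zero marker effects when each marker is included
# independently with probability probIn (fixed-probIn assumption)
probIn <- 0.05
pUsed <- c(example = 300, all = 1279)

expected <- pUsed * probIn                       # binomial mean
sdCount  <- sqrt(pUsed * probIn * (1 - probIn))  # binomial standard deviation
print(rbind(expected = expected, sd = round(sdCount, 2)))

# Prior probability of more than 64 non-zero effects when all 1279 markers are used
pMore64 <- pbinom(64, size = 1279, prob = probIn, lower.tail = FALSE)
print(round(pMore64, 3))
```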

  11. Examples (continued)
      # Predictions
      > fit_BB$yHat
      # Error variance
      > fit_BB$varE
      [1] 0.5973773
      # Marker effects
      > fit_BB$ETA[[1]]$b
      # Variance for markers
      > fit_BB$ETA[[1]]$varB
  [Figure: scatter plot of the observed phenotypes y (vertical axis, about −4 to 4) against the predictions fit_BB$yHat (horizontal axis, about −3 to 3).]
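Variable selection is one of the advertised strengths of BayesB. One simple check after fitting the simulated data is how many of the true QTL (whichQTL) are among the markers with the largest estimated effects. The helper qtlRecovery below is hypothetical and not part of BGLR. The toy vector of estimates is made up to show how it works.

```r
# Hypothetical helper (not part of BGLR): number of true QTL among the
# k markers with the largest absolute estimated effects
qtlRecovery <- function(bHat, whichQTL, k = length(whichQTL)) {
  topK <- order(abs(bHat), decreasing = TRUE)[seq_len(k)]
  sum(topK %in% whichQTL)
}

# Toy estimates: markers 2 and 5 have the largest effects
bHat <- c(0.01, 0.80, -0.02, 0.05, -0.60, 0.00)
print(qtlRecovery(bHat, whichQTL = c(2, 5)))          # 2
print(qtlRecovery(bHat, whichQTL = c(1, 6)))          # 0
print(qtlRecovery(bHat, whichQTL = c(2, 5), k = 3))   # 2
```

After the BGLR run above, it could be applied as qtlRecovery(fit_BB$ETA[[1]]$b, whichQTL). In the same way, cor(fit_BB$yHat, signal) gives the correlation between the predictions and the simulated genetic values.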
