
Genomic Prediction and Selection for Multi-Environments



  1. Genomic Prediction and Selection for Multi-Environments
      J. Crossa (1), j.crossa@CGIAR.org; P. Pérez (2), perpdgo@gmail.com; G. de los Campos (3), gcampos@gmail.com
      (1) CIMMYT, México; (2) ColPos, México; (3) Michigan, USA
      June 2015. CIMMYT, México-SAGPDB

  2. Contents
      1. The problem
      2. Models
      3. Model fitting
      4. Cross validation
      5. Application examples (Part 1)
      6. Model extensions with environmental covariates

  3. The problem
      - In most agronomic traits, the effects of genes are modulated by environmental conditions, generating genotype-by-environment interaction (G × E).
      - Plant breeders have developed multiple methods for accounting for, and exploiting, G × E in multi-environment trials.
      - Genomic selection is gaining ground in plant breeding, but most applications so far are based on single-environment, single-trait models.
      - Preliminary evidence (e.g., Burgueño et al., 2012) suggests that there is great scope for improving prediction accuracy using multi-environment models.
      - These ideas can be taken one step further by incorporating information on environmental covariates.

  4. The problem (continued)

  5. The problem (continued)

  6. Models
      - Model 1 (EL, Environment + Line, no pedigree): $y_{ij} = \mu + E_i + L_j + e_{ij}$
      - Model 2 (EA, Environment + Line, with markers): $y_{ij} = \mu + E_i + g_j + e_{ij}$
      - Model 3 (Environment, Line, and marker × environment interaction): $y_{ij} = \mu + E_i + g_j + Eg_{ij} + e_{ij}$
      Here $y_{ij}$ is the phenotype of line $j$ in environment $i$.

  7. Models: assumptions
      It is assumed that $E_i \sim N(0, \sigma^2_E)$ and $g \sim N(0, \sigma^2_g G)$, where $G$ is the genomic relationship matrix. The interaction term between genotypes and environments, $Eg$, is assumed to follow
      $Eg \sim N(0, \sigma^2_{Eg} (Z_g G Z_g^T) \circ (Z_E Z_E^T))$,
      where $Z_g$ connects genotypes with phenotypes, $Z_E$ connects phenotypes with environments, and $\circ$ stands for the Hadamard (element-wise) product of two matrices.
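To make the interaction covariance concrete, the following minimal R sketch builds $(Z_g G Z_g^T) \circ (Z_E Z_E^T)$ for a toy data set. The matrix `G`, the three line IDs, and the two environment IDs are made-up values for illustration; they are not data from the presentation.

```r
# Toy example (hypothetical data): 3 lines, each evaluated in 2 environments.
ids <- c("L1", "L2", "L3")
G <- matrix(c(1.0, 0.5, 0.2,
              0.5, 1.0, 0.3,
              0.2, 0.3, 1.0),
            nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# One record per line-by-environment combination (6 records in total).
dat <- data.frame(
  VAR = factor(rep(ids, times = 2), levels = ids),
  ENV = factor(rep(c("E1", "E2"), each = 3))
)

# Incidence matrices connecting records to lines and to environments.
Zg <- model.matrix(~ VAR - 1, data = dat)
ZE <- model.matrix(~ ENV - 1, data = dat)

# Hadamard (element-wise) product of the two record-level matrices.
ZGZ  <- Zg %*% G %*% t(Zg)
ZEZE <- tcrossprod(ZE)
K <- ZGZ * ZEZE

print(round(K, 2))
```

In the resulting `K`, two records from the same environment inherit the genomic covariance in `G`, while two records from different environments have zero covariance. This is what makes the interaction effects specific to each environment.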

  8. Model fitting: description of data objects
      - Y, a data frame containing the elements described below;
      - Y$yield (n × 1), a numeric vector with centered and standardized yield;
      - Y$VAR (n × 1), a factor giving the IDs of the varieties;
      - Y$ENV (n × 1), a factor giving the IDs of the environments;
      - A, a symmetric positive semi-definite matrix containing the pedigree or marker-based relationships (dimensions: number of lines by number of lines). We assume that rownames(A) and colnames(A) are identical and give the IDs of the lines.
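As a rough illustration of the data objects described above, the sketch below builds a toy `Y` and `A` and checks the stated assumptions. The values, the IDs, and the helper `check_inputs()` are hypothetical and only meant to show the expected layout.

```r
# Toy objects (hypothetical values): 3 lines evaluated in 2 environments.
ids <- c("L1", "L2", "L3")
A <- diag(3)                      # placeholder relationship matrix
dimnames(A) <- list(ids, ids)

raw_yield <- c(4.1, 5.3, 3.8, 6.0, 4.7, 5.1)
Y <- data.frame(
  yield = as.numeric(scale(raw_yield)),   # centered and standardized
  VAR   = factor(rep(ids, times = 2), levels = ids),
  ENV   = factor(rep(c("E1", "E2"), each = 3))
)

# Hypothetical helper: stops with an error if an assumption is violated.
check_inputs <- function(Y, A) {
  stopifnot(
    is.data.frame(Y),
    all(c("yield", "VAR", "ENV") %in% names(Y)),
    is.numeric(Y$yield),
    identical(rownames(A), colnames(A)),
    isSymmetric(unname(A)),
    # positive semi-definite up to numerical tolerance
    min(eigen(A, symmetric = TRUE, only.values = TRUE)$values) > -1e-8,
    all(as.character(Y$VAR) %in% rownames(A))
  )
  invisible(TRUE)
}

ok <- check_inputs(Y, A)
```

Running such a check before model fitting catches mismatched line IDs early, since the incidence matrices rely on the IDs in `Y$VAR` matching `rownames(A)`.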

  9. Model fitting: Model 1 (EL, Environment + Line, no pedigree)

          library(BGLR)

          # Incidence matrix for the main effects of environments.
          ZE <- model.matrix(~factor(Y$ENV) - 1)

          # Incidence matrix for the main effects of lines
          # (levels ordered as in rownames(A)).
          Y$VAR <- factor(x = Y$VAR, levels = rownames(A), ordered = TRUE)
          ZVAR <- model.matrix(~Y$VAR - 1)

          # Model fitting
          ETA <- list(
            ENV = list(X = ZE, model = "BRR"),
            VAR = list(X = ZVAR, model = "BRR")
          )
          fm1 <- BGLR(y = Y$yield, ETA = ETA, saveAt = "M1_", nIter = 6000, burnIn = 1000)

  10. Model fitting: Model 2 (EA, Environment + Line, with markers)

          # Genomic relationship matrix from centered and standardized markers,
          # rescaled to an average diagonal of 1.
          X <- scale(X, center = TRUE, scale = TRUE)
          G <- tcrossprod(X) / ncol(X)
          G <- G / mean(diag(G))

          # Cholesky factor of G (G = L L'), so that ZL ZL' = ZVAR G ZVAR'.
          # Note that chol() requires G to be positive definite.
          L <- t(chol(G))
          ZL <- ZVAR %*% L

          ETA <- list(
            ENV = list(X = ZE, model = "BRR"),
            Grm = list(X = ZL, model = "BRR")
          )
          fm2 <- BGLR(y = Y$yield, ETA = ETA, saveAt = "M2_", nIter = 6000, burnIn = 1000)
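The reason Model 2 can be fitted as a ridge regression ("BRR") on `ZL` is that independent effects with incidence matrix `ZL` imply a covariance proportional to `ZL ZL' = ZVAR G ZVAR'`, which matches the assumption $g \sim N(0, \sigma^2_g G)$. The sketch below checks this identity numerically on simulated markers. The marker data, the dimensions, and the small constant added to the diagonal of `G` (so that `chol()` is guaranteed to succeed) are assumptions made for this example only.

```r
set.seed(1)
n <- 5    # number of lines (toy value)
p <- 50   # number of markers (toy value)

# Simulated 0/1 markers; monomorphic columns are dropped because they
# cannot be standardized.
X <- matrix(rbinom(n * p, size = 1, prob = 0.5), nrow = n,
            dimnames = list(paste0("L", 1:n), NULL))
X <- X[, apply(X, 2, sd) > 0, drop = FALSE]

X <- scale(X, center = TRUE, scale = TRUE)
G <- tcrossprod(X) / ncol(X)
G <- G / mean(diag(G))

# Assumption for this sketch: centering makes G singular, so a small
# constant is added to the diagonal before taking the Cholesky factor.
diag(G) <- diag(G) + 1e-6
L <- t(chol(G))

# Each line is observed twice (e.g., in two environments).
VAR  <- factor(rep(rownames(G), times = 2), levels = rownames(G))
ZVAR <- model.matrix(~ VAR - 1)
ZL   <- ZVAR %*% L

# Largest absolute difference between ZL ZL' and ZVAR G ZVAR'.
max_diff <- max(abs(tcrossprod(ZL) - ZVAR %*% G %*% t(ZVAR)))
print(max_diff)
```

The difference is at the level of floating-point rounding, so the two parameterizations describe the same covariance structure.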

  11. Model fitting: Model 3 (Environment, Line, and marker × environment interaction)

          # Covariance structure of the interaction term:
          # Hadamard product of (Zg G Zg') and (ZE ZE').
          ZGZ <- tcrossprod(ZL)
          ZEZE <- tcrossprod(ZE)
          K <- ZGZ * ZEZE
          diag(K) <- diag(K) + 1/200
          K <- K / mean(diag(K))

          ETA <- list(
            ENV = list(X = ZE, model = "BRR"),
            Grm = list(X = ZL, model = "BRR"),
            EGrm = list(K = K, model = "RKHS")
          )
          fm3 <- BGLR(y = Y$yield, ETA = ETA, saveAt = "M3_", nIter = 6000, burnIn = 1000)

  12. Cross validation
      1. CV1: prediction of the performance of newly developed lines (i.e., lines that have not been evaluated in any field trial).
      2. CV2: prediction in incomplete field trials; here the aim is to predict the performance of lines that have been evaluated in some environments but not in others.
      See Figure 1 on the next slide.

  13. Cross validation (continued)
      Figure 1: Two hypothetical cross-validation schemes (CV1 and CV2) for five lines (Lines 1-5) and five environments (E1-E5). Source: Jarquín et al. (2014).

  14. Application examples (Part 1): wheat dataset (CIMMYT)
      Data for n = 599 wheat lines evaluated in 4 environments, from the CIMMYT wheat improvement program. The dataset includes p = 1279 molecular markers ($x_{ij}$, $i = 1, \dots, n$, $j = 1, \dots, p$), coded as 0/1. Pedigree information is also available.
      Figure 2: Grain yield by environment. Panels: yield by environment (environments 1, 2, 4, and 5) and histogram of Y$yield (frequency).

  15. Application examples (Part 1): data preparation

          # Load genotypic data
          load("pedigree_markers.RData")

          # Load phenotypic data
          pheno <- read.table(file = "599_yield_raw-1.prn", header = TRUE)
          pheno <- pheno[, c(2, 5, 6)]

          # Average yield for each environment-by-line combination
          index <- paste(pheno$env, pheno$gen1, sep = "@")
          yavg <- tapply(pheno$GY, index, "mean")

          # Recover the environment and line IDs from the names of yavg
          tmp <- names(yavg)
          tmp2 <- strsplit(tmp, "@")
          gen <- character()
          env <- character()
          for (i in 1:length(tmp2)) {
            env[i] <- tmp2[[i]][1]
            gen[i] <- tmp2[[i]][2]
          }

          # Assemble the data frame and sort it by environment, then by line
          Y <- data.frame(yield = yavg, VAR = gen, ENV = env)
          index <- order(as.character(Y$ENV), as.character(Y$VAR))
          Y <- Y[index, ]

  16. Application examples (Part 1): data preparation (continued)

          # Sort A by line ID and apply the same ordering to the rows of X
          index <- order(colnames(A))
          A <- A[index, index]
          X <- X[index, ]
          save(Y, A, X, file = "standarized_data.RData")

  17. Application examples (Part 1): code for the cross-validation schemes

          # CV=1: assigns lines to folds
          # CV=2: assigns entries of a line to folds
          CV <- 2
          nFolds <- 5
          sets <- rep(NA, nrow(Y))
          set.seed(123)
          IDs <- as.character(unique(Y$VAR))

          if (CV == 1) {
            folds <- sample(1:nFolds, size = length(IDs), replace = TRUE)
            for (i in 1:nrow(Y)) {
              sets[i] <- folds[which(IDs == Y$VAR[i])]
            }
          }

          if (CV == 2) {
            IDy <- as.character(Y$VAR)
            for (i in IDs) {
              tmp <- which(IDy == i)
              ni <- length(tmp)
              tmpFold <- sample(1:nFolds, size = ni, replace = ni > nFolds)
              sets[tmp] <- tmpFold
            }
          }
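One way to check that the fold assignment behaves as intended is to wrap the logic in a function and inspect how many distinct folds each line falls into. The sketch below does this on a toy vector of line IDs. The function `make_folds()` and the toy IDs are hypothetical, and the CV1 branch uses `match()` as a vectorized stand-in for the loop on the slide.

```r
# Hypothetical wrapper around the fold-assignment logic of the slide.
make_folds <- function(VAR, CV, nFolds = 5, seed = 123) {
  set.seed(seed)
  VAR <- as.character(VAR)
  IDs <- unique(VAR)
  sets <- rep(NA_integer_, length(VAR))
  if (CV == 1) {
    # One fold per line: all records of a line share the same fold.
    folds <- sample(seq_len(nFolds), size = length(IDs), replace = TRUE)
    sets <- folds[match(VAR, IDs)]
  } else if (CV == 2) {
    # Records of the same line are spread over different folds.
    for (id in IDs) {
      idx <- which(VAR == id)
      ni <- length(idx)
      sets[idx] <- sample(seq_len(nFolds), size = ni, replace = ni > nFolds)
    }
  } else {
    stop("CV must be 1 or 2")
  }
  sets
}

# Toy layout: 20 lines, each with 4 records (e.g., 4 environments).
VAR <- rep(paste0("L", 1:20), times = 4)

sets1 <- make_folds(VAR, CV = 1)
sets2 <- make_folds(VAR, CV = 2)

# Number of distinct folds that each line appears in.
folds_per_line1 <- tapply(sets1, VAR, function(s) length(unique(s)))
folds_per_line2 <- tapply(sets2, VAR, function(s) length(unique(s)))
```

Under CV1 every line sits in exactly one fold, so masking a fold removes all records of its lines (newly developed lines). Under CV2, with 4 records per line and 5 folds, each line is spread over 4 different folds, so a masked line is still observed in other environments.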

  18. Application examples (Part 1): fitting the model and extracting results

          ###################################################
          # Model 1
          ###################################################
          library(BGLR)

          # Incidence matrix for the main effects of environments.
          ZE <- model.matrix(~factor(Y$ENV) - 1)

          # Incidence matrix for the main effects of lines.
          Y$VAR <- as.factor(Y$VAR)
          ZVAR <- model.matrix(~Y$VAR - 1)

          # Model fitting
          ETA <- list(
            ENV = list(X = ZE, model = "BRR"),
            VAR = list(X = ZVAR, model = "BRR")
          )

          y <- Y$yield
          testing <- (sets == 1)   # fold 1 is used as the testing set
          y[testing] <- NA         # mask the phenotypes of the testing set

          fm1 <- BGLR(y = y, ETA = ETA, saveAt = "M1_", nIter = 6000, burnIn = 1000)
          unlink("*.dat")          # remove the .dat files written to the working directory

          # Extract the predictions
          predictions <- data.frame(Env = Y$ENV[testing],
                                    Individual = Y$VAR[testing],
                                    y = Y$yield[testing],
                                    yHat = fm1$yHat[testing])

  19. Application examples (Part 1): fitting the model and extracting results (continued)

          # write.table(predictions, file = paste("predictions.csv", sep = ""),
          #             row.names = FALSE, sep = ",")

          # doBy version: correlation between observed and predicted values
          # within each environment
          library(doBy)
          predictions <- orderBy(~Env, data = predictions)
          lapplyBy(~Env, data = predictions, function(x) { cor(x$yHat, x$y) })

      Output:

          $`1`
          [1] 0.01630911

          $`2`
          [1] 0.6108203

          $`4`
          [1] 0.564435

          $`5`
          [1] 0.289207
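If the doBy package is not available, the same within-environment correlations can be computed with base R. The sketch below is one possible alternative, not the method used on the slide. The function name `cor_by_env()` and the toy `predictions` data frame are hypothetical.

```r
# Hypothetical base-R alternative: correlation between observed (y) and
# predicted (yHat) values within each environment.
cor_by_env <- function(predictions) {
  by_env <- split(predictions, predictions$Env, drop = TRUE)
  sapply(by_env, function(d) cor(d$yHat, d$y))
}

# Toy predictions (made-up values): perfect agreement in environment 1,
# perfectly reversed ranking in environment 2.
predictions <- data.frame(
  Env  = c(1, 1, 1, 2, 2, 2),
  y    = c(1, 2, 3, 1, 2, 3),
  yHat = c(2, 4, 6, 3, 2, 1)
)

acc <- cor_by_env(predictions)
print(acc)
```

The result is a named numeric vector with one correlation per environment, which is convenient for collecting results across folds and models.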

  20. Application examples (Part 1): results for one fold
      Figure 3: Results from CV1 (correlation for models M1, M2, and M3).

  21. Application examples (Part 1): results for one fold (continued)
      Figure 4: Results from CV2 (correlation for models M1, M2, and M3).
