
Genomic Prediction and Selection for Multi-Environments
J. Crossa (1), j.crossa@CGIAR.org
P. Pérez (2), perpdgo@gmail.com
G. de los Campos (3), gcampos@gmail.com
(1) CIMMyT-México, (2) ColPos-México, (3) Michigan-USA
June, 2015. CIMMYT, México-SAGPDB

Contents
1. The problem
2. Models
3. Model fitting
4. Cross validation
5. Application examples (Part 1)
6. Model extensions with environmental covariates

The problem
- In most agronomic traits, the effects of genes are modulated by environmental conditions, generating genotype-by-environment interaction (G × E).
- Researchers working in plant breeding have developed multiple methods for accounting for, and exploiting, G × E in multi-environment trials.
- Genomic selection is gaining ground in plant breeding. Most applications so far are based on single-environment/single-trait models.
- Preliminary evidence (e.g., Burgueño et al., 2012) suggests that there is great scope for improving prediction accuracy using multi-environment models.
- The ideas can be taken one step further by incorporating information on environmental covariates.


Models
Here $y_{ij}$ is the response of line $j$ in environment $i$.
- Model 1 (EL, Environment + Line, no pedigree): $y_{ij} = \mu + E_i + L_j + e_{ij}$
- Model 2 (EA, Environment + Line, with markers): $y_{ij} = \mu + E_i + g_j + e_{ij}$
- Model 3 (Environment, Line, and marker-by-environment interaction): $y_{ij} = \mu + E_i + g_j + Eg_{ij} + e_{ij}$

Models: Assumptions
It is assumed that $E_i \sim N(0, \sigma^2_E)$ and $g \sim N(0, \sigma^2_g G)$, where $G$ is the genomic relationship matrix. $Eg_{ij}$ is the interaction term between genotypes and environments, with
$Eg \sim N\left(0, \sigma^2_{Eg}\,(Z_g G Z_g^T) \circ (Z_E Z_E^T)\right)$,
where $Z_g$ connects genotypes with phenotypes, $Z_E$ connects phenotypes with environments, $\sigma^2_{Eg}$ is the interaction variance, and $\circ$ stands for the Hadamard (element-wise) product of two matrices.
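The Hadamard-product covariance above can be checked on a toy example. The sketch below is only an illustration: the sizes (3 lines, 2 environments, 10 markers) and the simulated marker matrix are assumptions, not part of the wheat data. It shows that records from different environments get zero covariance, and that within an environment the block equals G.

```r
# Toy illustration (hypothetical sizes): 3 lines observed in 2 environments.
set.seed(1)
lines <- factor(rep(c("l1", "l2", "l3"), times = 2))
envs  <- factor(rep(c("e1", "e2"), each = 3))

# Incidence matrices: records x lines and records x environments.
Zg <- model.matrix(~ lines - 1)
ZE <- model.matrix(~ envs - 1)

# A made-up 0/1 marker matrix (3 lines x 10 markers) and its relationship matrix.
X <- matrix(rbinom(30, size = 1, prob = 0.5), nrow = 3)
X <- scale(X, center = TRUE, scale = FALSE)
G <- tcrossprod(X) / ncol(X)

# Covariance structure of the interaction term: element-wise (Hadamard) product.
ZGZ  <- Zg %*% G %*% t(Zg)
ZEZE <- tcrossprod(ZE)
K    <- ZGZ * ZEZE

# Off-diagonal environment blocks are zero; diagonal blocks equal G.
print(round(K, 2))
```

In other words, the interaction term lets the same line have a different (but genomically correlated) deviation in each environment.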

Model fitting: Description of data objects
- Y, a data frame containing the elements described below;
- Y$yield (n x 1), a numeric vector with centered and standardized yield;
- Y$VAR (n x 1), a factor giving the IDs for the varieties;
- Y$ENV (n x 1), a factor giving the IDs for the environments;
- A, a symmetric positive semi-definite matrix containing the pedigree or marker-based relationships (dimensions: number of lines by number of lines). We assume that rownames(A) = colnames(A) gives the IDs of the lines.
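Before fitting, it can help to confirm that the objects agree with this description. The following is a minimal sketch using toy objects; the helper name `check_inputs` and the toy values are hypothetical and not part of the original material.

```r
# Hypothetical toy objects mimicking the layout described above.
ids <- c("L1", "L2", "L3")
A <- diag(3)
dimnames(A) <- list(ids, ids)
Y <- data.frame(
  yield = c(0.3, -1.1, 0.8, 0.5, -0.4, -0.1),
  VAR   = factor(rep(ids, times = 2)),
  ENV   = factor(rep(c("E1", "E2"), each = 3))
)

# Hypothetical helper: stops with an error if the objects are inconsistent.
check_inputs <- function(Y, A) {
  stopifnot(
    is.numeric(Y$yield),
    is.factor(Y$VAR), is.factor(Y$ENV),
    isSymmetric(A),
    identical(rownames(A), colnames(A)),
    all(as.character(Y$VAR) %in% rownames(A))
  )
  # Positive semi-definite up to a small numerical tolerance.
  ev <- eigen(A, symmetric = TRUE, only.values = TRUE)$values
  stopifnot(min(ev) > -1e-8)
  invisible(TRUE)
}

check_inputs(Y, A)
```

A line ID in Y$VAR that is missing from rownames(A) makes the check fail, which is the situation that would otherwise produce NA factor levels later on.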

Model fitting: Model 1 (EL, Environment + Line, no pedigree)

    library(BGLR)

    # Incidence matrix for main effects of environments.
    ZE <- model.matrix(~factor(Y$ENV) - 1)

    # Incidence matrix for main effects of lines
    # (levels follow rownames(A), so columns are ordered as in A).
    Y$VAR <- factor(x = Y$VAR, levels = rownames(A), ordered = TRUE)
    ZVAR <- model.matrix(~Y$VAR - 1)

    # Model fitting
    ETA <- list(
      ENV = list(X = ZE, model = "BRR"),
      VAR = list(X = ZVAR, model = "BRR")
    )
    fm1 <- BGLR(y = Y$yield, ETA = ETA, saveAt = "M1_", nIter = 6000, burnIn = 1000)

Model fitting: Model 2 (EA, Environment + Line, with markers)

    # Genomic relationship matrix from centered and scaled markers.
    X <- scale(X, center = TRUE, scale = TRUE)
    G <- tcrossprod(X) / ncol(X)
    G <- G / mean(diag(G))

    # Cholesky factor of G, so that tcrossprod(ZL) = ZVAR G ZVAR'.
    # Note: chol() requires G to be positive definite.
    L <- t(chol(G))
    ZL <- ZVAR %*% L

    ETA <- list(
      ENV = list(X = ZE, model = "BRR"),
      Grm = list(X = ZL, model = "BRR")
    )
    fm2 <- BGLR(y = Y$yield, ETA = ETA, saveAt = "M2_", nIter = 6000, burnIn = 1000)
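The reason for multiplying the incidence matrix by the Cholesky factor can be verified numerically: independent effects on the columns of ZL imply a covariance proportional to ZVAR G ZVAR'. The sketch below uses simulated markers (an assumption, not the wheat data). Because centering the marker columns makes G rank-deficient, a small constant is added to the diagonal here so that chol() succeeds; that constant is a choice made for this illustration only.

```r
set.seed(2)
n_lines <- 4

# Made-up marker scores (4 lines x 50 markers), centered and scaled.
X <- matrix(rnorm(n_lines * 50), nrow = n_lines)
X <- scale(X, center = TRUE, scale = TRUE)
G <- tcrossprod(X) / ncol(X)
G <- G / mean(diag(G))

# Assumption for this sketch: small ridge so that G is positive definite.
diag(G) <- diag(G) + 1e-6

# Each line observed in 2 environments (8 records).
VAR  <- factor(rep(paste0("L", 1:n_lines), times = 2))
ZVAR <- model.matrix(~ VAR - 1)

# Cholesky trick: G = L L', hence ZL ZL' = ZVAR G ZVAR'.
L  <- t(chol(G))
ZL <- ZVAR %*% L

max_diff <- max(abs(tcrossprod(ZL) - ZVAR %*% G %*% t(ZVAR)))
print(max_diff)
```

The printed difference should be at the level of floating-point rounding, which is why a ridge-regression term on ZL behaves like a random effect with covariance structure G.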

Model fitting: Model 3 (Environment, Line, and marker-by-environment interaction)

    # Covariance structure of the interaction: Hadamard product of the
    # genomic and environmental kernels.
    ZGZ <- tcrossprod(ZL)
    ZEZE <- tcrossprod(ZE)
    K <- ZGZ * ZEZE

    # Add a small constant to the diagonal, then rescale to mean diagonal 1.
    diag(K) <- diag(K) + 1/200
    K <- K / mean(diag(K))

    ETA <- list(
      ENV = list(X = ZE, model = "BRR"),
      Grm = list(X = ZL, model = "BRR"),
      EGrm = list(K = K, model = "RKHS")
    )
    fm3 <- BGLR(y = Y$yield, ETA = ETA, saveAt = "M3_", nIter = 6000, burnIn = 1000)

Cross validation
1. CV1: Prediction of performance of newly developed lines (i.e., lines that have not been evaluated in any field trials).
2. CV2: Prediction in incomplete field trials; here the aim is to predict performance of lines that have been evaluated in some environments but not in others.
See Figure 1.

Cross validation (continued)
Figure 1: Two hypothetical cross-validation schemes (CV1 and CV2) for five lines (Lines 1-5) and five environments (E1-E5). Source: Jarquín et al. (2014).

Application examples (Part 1): Wheat dataset (CIMMyT)
Data for n = 599 wheat lines evaluated in 4 environments, from the CIMMyT wheat improvement program. The dataset includes p = 1279 molecular markers ($x_{ij}$, $i = 1, \dots, n$, $j = 1, \dots, p$), coded as 0/1. Pedigree information is also available.
[Figure 2: Grain yield by environment. Panels: Yield by Environment (environments 1, 2, 4, 5); histogram of Y$yield (Frequency).]

Application examples (Part 1): Data preparation

    # Load genotypic data
    load("pedigree_markers.RData")

    # Load phenotypic data
    pheno = read.table(file = "599_yield_raw-1.prn", header = TRUE)
    pheno = pheno[, c(2, 5, 6)]

    # Average yield for each environment-by-line combination
    index = paste(pheno$env, pheno$gen1, sep = "@")
    yavg = tapply(pheno$GY, index, "mean")

    # Recover environment and line IDs from the names of the averages
    tmp = names(yavg)
    tmp2 = strsplit(tmp, "@")
    gen = character()
    env = character()
    for (i in 1:length(tmp2)) {
      env[i] = tmp2[[i]][1]
      gen[i] = tmp2[[i]][2]
    }

    # Assemble the data frame and sort by environment, then by line
    Y = data.frame(yield = yavg, VAR = gen, ENV = env)
    index = order(as.character(Y$ENV), as.character(Y$VAR))
    Y = Y[index, ]

Application examples (Part 1): Data preparation (continued)

    # Sort A and X by line ID (the same index is applied to the rows of X)
    index = order(colnames(A))
    A = A[index, index]
    X = X[index, ]
    save(Y, A, X, file = "standarized_data.RData")

Application examples (Part 1): Code for cross-validation schemes

    # CV=1: assigns lines to folds
    # CV=2: assigns entries of a line to folds
    CV <- 2
    nFolds <- 5
    sets <- rep(NA, nrow(Y))
    set.seed(123)
    IDs <- as.character(unique(Y$VAR))

    if (CV == 1) {
      # All records of a line share the fold drawn for that line
      folds <- sample(1:nFolds, size = length(IDs), replace = TRUE)
      for (i in 1:nrow(Y)) {
        sets[i] <- folds[which(IDs == Y$VAR[i])]
      }
    }

    if (CV == 2) {
      # Records of a line are spread over folds
      IDy <- as.character(Y$VAR)
      for (i in IDs) {
        tmp = which(IDy == i)
        ni = length(tmp)
        tmpFold <- sample(1:nFolds, size = ni, replace = ni > nFolds)
        sets[tmp] <- tmpFold
      }
    }
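The difference between the two schemes can be made concrete by checking how many distinct folds each line ends up in. The sketch below wraps the fold-assignment logic above in a function and applies it to a toy balanced layout; the function name `make_folds` and the toy design (20 lines in 4 environments) are assumptions for illustration.

```r
# Hypothetical wrapper around the fold-assignment logic shown above.
make_folds <- function(Y, CV, nFolds = 5) {
  sets <- rep(NA_integer_, nrow(Y))
  IDy  <- as.character(Y$VAR)
  IDs  <- unique(IDy)
  if (CV == 1) {
    # One fold per line; every record of the line inherits it.
    folds <- sample(seq_len(nFolds), size = length(IDs), replace = TRUE)
    sets  <- folds[match(IDy, IDs)]
  } else if (CV == 2) {
    # Records of a line are assigned to folds separately.
    for (i in IDs) {
      tmp <- which(IDy == i)
      ni  <- length(tmp)
      sets[tmp] <- sample(seq_len(nFolds), size = ni, replace = ni > nFolds)
    }
  } else {
    stop("CV must be 1 or 2")
  }
  sets
}

# Toy balanced layout: 20 lines, each observed in 4 environments.
Y <- expand.grid(VAR = paste0("L", 1:20), ENV = paste0("E", 1:4))

set.seed(123)
sets1 <- make_folds(Y, CV = 1)
sets2 <- make_folds(Y, CV = 2)

# Number of distinct folds in which each line appears.
folds_per_line1 <- tapply(sets1, Y$VAR, function(s) length(unique(s)))
folds_per_line2 <- tapply(sets2, Y$VAR, function(s) length(unique(s)))
print(table(folds_per_line1))
print(table(folds_per_line2))
```

Under CV1 every line sits in exactly one fold, so a test line has no records in the training set. Under CV2, with no more records per line than folds, the records of a line fall in different folds, so a test record always has sibling records from other environments in training.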

Application examples (Part 1): Fitting the model and extracting results

    ###################################################
    # Model 1
    ###################################################

    # Incidence matrix for main effects of environments.
    ZE <- model.matrix(~factor(Y$ENV) - 1)

    # Incidence matrix for main effects of lines.
    Y$VAR <- as.factor(Y$VAR)
    ZVAR <- model.matrix(~Y$VAR - 1)

    # Model fitting
    ETA <- list(
      ENV = list(X = ZE, model = "BRR"),
      VAR = list(X = ZVAR, model = "BRR")
    )

    # Mask the records of fold 1 so that they are predicted, not fitted
    y = Y$yield
    testing = (sets == 1)
    y[testing] = NA

    fm1 <- BGLR(y = y, ETA = ETA, saveAt = "M1_", nIter = 6000, burnIn = 1000)
    unlink("*.dat")

    # Extract the predictions
    predictions = data.frame(Env = Y$ENV[testing],
                             Individual = Y$VAR[testing],
                             y = Y$yield[testing],
                             yHat = fm1$yHat[testing])

Application examples (Part 1): Fitting the model and extracting results (continued)

    library(doBy)

    # write.table(predictions, file = paste("predictions.csv", sep = ""),
    #             row.names = FALSE, sep = ",")

    # doBy version: correlation between predicted and observed yield by environment
    predictions = orderBy(~Env, data = predictions)
    lapplyBy(~Env, data = predictions, function(x) { cor(x$yHat, x$y) })

Output:

    > lapplyBy(~Env,data=predictions,function(x){cor(x$yHat,x$y)})
    $`1`
    [1] 0.01630911

    $`2`
    [1] 0.6108203

    $`4`
    [1] 0.564435

    $`5`
    [1] 0.289207
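The same per-environment summary can be obtained without the doBy package. The sketch below is a base-R alternative under stated assumptions: the `predictions` table here is simulated with the same column names as above, so its correlations are not the wheat results.

```r
# Made-up predictions table with the same columns as `predictions` above.
set.seed(42)
predictions <- data.frame(
  Env        = rep(c("1", "2", "4", "5"), each = 25),
  Individual = rep(paste0("L", 1:25), times = 4),
  y          = rnorm(100)
)
predictions$yHat <- 0.5 * predictions$y + rnorm(100)

# One correlation per environment, using only base R.
cor_by_env <- sapply(
  split(predictions, predictions$Env),
  function(d) cor(d$yHat, d$y)
)
print(round(cor_by_env, 3))
```

Repeating the masking, fitting, and summary steps for each fold (sets == 1, ..., nFolds) and averaging the per-environment correlations gives the cross-validated accuracy for a model.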

Application examples (Part 1): Results for one fold
[Figure 3: Results from CV1. Correlation (y-axis) for models M1, M2, and M3.]

Application examples (Part 1): Results for one fold (continued)
[Figure 4: Results from CV2. Correlation (y-axis) for models M1, M2, and M3.]
