SLIDE 1

Genomic Prediction and Selection for Multi-Environments

  • J. Crossa¹, j.crossa@CGIAR.org
  • P. Pérez², perpdgo@gmail.com
  • G. de los Campos³, gcampos@gmail.com

¹CIMMyT-México ²ColPos-México ³Michigan-USA.

June, 2015.

CIMMYT, México-SAGPDB Genomic Prediction and Selection for Multi-Environments 1/24

SLIDE 2

Contents

1. The problem
2. Models
3. Model fitting
4. Cross validation
5. Application examples (Part 1)
6. Model extensions with environmental covariates

SLIDE 3

The problem

In most agronomic traits, the effects of genes are modulated by environmental conditions, generating genotype × environment interaction (G×E). Plant breeders have developed multiple methods for accounting for and exploiting G×E in multi-environment trials. Genomic selection is gaining ground in plant breeding, but most applications so far are based on single-environment, single-trait models. Preliminary evidence (e.g., Burgueño et al., 2012) suggests that there is great scope for improving prediction accuracy using multi-environment models. The idea can be taken one step further by incorporating information on environmental covariates.

SLIDE 4

The problem

Continued...

SLIDE 5

The problem

Continued...

SLIDE 6

Models

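Model 1 (EL, Environment + Line, no pedigree):
$y_{ij} = \mu + E_i + L_j + e_{ij}$

Model 2 (EA, Environment + Line, with markers):
$y_{ij} = \mu + E_i + g_j + e_{ij}$

Model 3 (Environment + Line with markers + marker × environment interaction):
$y_{ij} = \mu + E_i + g_j + Eg_{ij} + e_{ij}$

where $y_{ij}$ is the phenotype of line $j$ in environment $i$, $\mu$ is the overall mean, and $e_{ij}$ is the residual.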

SLIDE 7

Models

Assumptions

It is assumed that $E_i \sim N(0, \sigma^2_E)$ and $g \sim N(0, \sigma^2_g G)$, with $G$ being the genomic relationship matrix. $Eg_{ij}$ is the interaction term between genotypes and environments, with $Eg \sim N\left(0, \left[(Z_g G Z_g^T) \circ (Z_E Z_E^T)\right]\sigma^2_{Eg}\right)$, where $Z_g$ connects phenotypes with genotypes, $Z_E$ connects phenotypes with environments, and $\circ$ stands for the Hadamard (element-wise) product between two matrices.
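To see what the Hadamard product does in Model 3's interaction covariance, here is a small base-R sketch with a toy layout of 3 lines observed in 2 environments (one record per combination). The relationship matrix `G` is made up for illustration. For two records in the same environment the kernel equals $G_{jj'}$, and for records in different environments it is 0, so the interaction term lets the genomic line effects differ from one environment to another.

```r
# Toy layout: 3 lines x 2 environments, one record per line-environment pair
line_id <- factor(rep(c("L1", "L2", "L3"), times = 2))
env_id  <- factor(rep(c("E1", "E2"), each = 3))

# Made-up genomic relationship matrix for the 3 lines (illustration only)
G <- matrix(c(1.0, 0.5, 0.2,
              0.5, 1.0, 0.3,
              0.2, 0.3, 1.0), nrow = 3)

# Incidence matrices: records -> lines (Zg) and records -> environments (ZE)
Zg <- model.matrix(~ line_id - 1)
ZE <- model.matrix(~ env_id - 1)

ZGZ  <- Zg %*% G %*% t(Zg)  # genomic covariance between records
ZEZE <- tcrossprod(ZE)      # 1 if two records share an environment, 0 otherwise
K    <- ZGZ * ZEZE          # Hadamard (element-wise) product

print(round(K, 2))
```

The within-environment blocks of `K` reproduce `G`, and the between-environment blocks are zero.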

SLIDE 8

Model fitting

Description of Data Objects

  • Y, a data frame containing the elements described below;
  • Y$yield (n × 1), a numeric vector with centered and standardized yield;
  • Y$VAR (n × 1), a factor giving the IDs of the varieties (lines);
  • Y$ENV (n × 1), a factor giving the IDs of the environments;
  • A, a symmetric positive semi-definite matrix containing the pedigree- or marker-based relationships (dimensions: number of lines × number of lines). We assume that rownames(A) = colnames(A) give the IDs of the lines;
  • X, the marker matrix (lines × markers), used to build the genomic relationship matrix G in Model 2.
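The sketch below builds toy versions of `Y` and `A` with the structure described above and checks a few of the stated assumptions. The helper `check_inputs()` and all simulated values are hypothetical, for illustration only; the real `Y` and `A` come from the data files loaded later.

```r
set.seed(1)
line_ids <- paste0("line", 1:4)
env_ids  <- c("E1", "E2")

# Y: one record per line-environment combination
Y <- expand.grid(VAR = line_ids, ENV = env_ids, stringsAsFactors = TRUE)
Y$yield <- as.numeric(scale(rnorm(nrow(Y))))  # centered and standardized

# A: a symmetric positive semi-definite relationship matrix among the lines
M <- matrix(rbinom(4 * 10, 1, 0.5), nrow = 4)
A <- tcrossprod(scale(M, scale = FALSE)) / ncol(M)
rownames(A) <- colnames(A) <- line_ids

# Hypothetical helper: checks the assumptions listed on this slide
check_inputs <- function(Y, A, tol = 1e-8) {
  stopifnot(is.numeric(Y$yield), is.factor(Y$VAR), is.factor(Y$ENV))
  stopifnot(isSymmetric(A), identical(rownames(A), colnames(A)))
  # positive semi-definite: no eigenvalue below -tol
  stopifnot(min(eigen(A, symmetric = TRUE, only.values = TRUE)$values) > -tol)
  # every line ID in Y must have a row/column in A
  stopifnot(all(levels(Y$VAR) %in% rownames(A)))
  invisible(TRUE)
}

check_inputs(Y, A)
```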

SLIDE 9

Model fitting

Model 1 (EL, Environment + Line, no pedigree)

library(BGLR)
# incidence matrix for main effects of environments
ZE <- model.matrix(~factor(Y$ENV) - 1)
# incidence matrix for main effects of lines
Y$VAR <- factor(x = Y$VAR, levels = rownames(A), ordered = TRUE)
ZVAR <- model.matrix(~Y$VAR - 1)
# model fitting
ETA <- list(ENV = list(X = ZE, model = "BRR"),
            VAR = list(X = ZVAR, model = "BRR"))
fm1 <- BGLR(y = Y$yield, ETA = ETA, saveAt = "M1_", nIter = 6000, burnIn = 1000)

SLIDE 10

Model fitting

Model 2 (EA, Environment + Line, with markers)

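# genomic relationship matrix from standardized markers
# (monomorphic markers must be removed first: scale() returns NaN for them)
X <- scale(X, center = TRUE, scale = TRUE)
G <- tcrossprod(X) / ncol(X)
G <- G / mean(diag(G))
# column-centering X makes G singular; a small ridge (1e-4, an assumed value)
# keeps chol() from failing
diag(G) <- diag(G) + 1e-4
# lower-triangular Cholesky factor: L %*% t(L) = G
L <- t(chol(G))
# ZVAR (from Model 1) has one column per line, in the order of rownames(A)
ZL <- ZVAR %*% L
ETA <- list(ENV = list(X = ZE, model = "BRR"),
            Grm = list(X = ZL, model = "BRR"))
fm2 <- BGLR(y = Y$yield, ETA = ETA, saveAt = "M2_", nIter = 6000, burnIn = 1000)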
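A small base-R sketch of these steps on simulated 0/1 markers (10 lines, 50 markers; the data are made up). It checks two points: after column-centering, the rows of `X` sum to zero, so `G` has rank at most n − 1 and a plain `chol(G)` may fail; adding a small value to the diagonal (1e-4 here is an assumed choice) gives a positive-definite matrix whose lower Cholesky factor `L` satisfies $L L^T = G$. That identity is what makes `ZVAR %*% L` work: with i.i.d. normal coefficients (the `BRR` prior), the resulting line effects have covariance proportional to $Z G Z^T$.

```r
set.seed(42)
n <- 10; p <- 50
X <- matrix(rbinom(n * p, 1, 0.5), nrow = n)
X <- X[, apply(X, 2, var) > 0]   # drop monomorphic markers (scale() would give NaN)

X <- scale(X, center = TRUE, scale = TRUE)
G <- tcrossprod(X) / ncol(X)
G <- G / mean(diag(G))

# Centering makes the rows of X sum to zero, so G is rank-deficient
qr(G)$rank

# Small ridge (assumed value) so that the Cholesky factorization exists
G_ridge <- G + diag(1e-4, n)
L <- t(chol(G_ridge))            # lower-triangular factor
max(abs(L %*% t(L) - G_ridge))   # numerically zero
```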

SLIDE 11

Model fitting

Model 3 (Environment + Line with markers + marker × environment interaction)

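# Z_g G Z_g': genomic covariance between records
ZGZ <- tcrossprod(ZL)
# Z_E Z_E': 1 if two records share an environment, 0 otherwise
ZEZE <- tcrossprod(ZE)
# Hadamard (element-wise) product
K <- ZGZ * ZEZE
# small constant on the diagonal, then scale to an average diagonal of 1
diag(K) <- diag(K) + 1/200
K <- K / mean(diag(K))
ETA <- list(ENV = list(X = ZE, model = "BRR"),
            Grm = list(X = ZL, model = "BRR"),
            EGrm = list(K = K, model = "RKHS"))
fm3 <- BGLR(y = Y$yield, ETA = ETA, saveAt = "M3_", nIter = 6000, burnIn = 1000)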

SLIDE 12

Cross validation

1. CV1: Prediction of the performance of newly developed lines (i.e., lines that have not been evaluated in any field trial).
2. CV2: Prediction in incomplete field trials; here the aim is to predict the performance of lines that have been evaluated in some environments but not in others. See Figure 1 on the next slide.

SLIDE 13

Cross validation

Continued...

Figure 1: Two hypothetical cross-validation schemes (CV1 and CV2) for five lines (Lines 1-5) and five environments (E1-E5), source: Jarquín et al. (2014).

SLIDE 14

Application examples (Part 1)

Example Wheat dataset (CIMMyT)

Data for n = 599 wheat lines from the CIMMyT wheat improvement program, evaluated in 4 environments. The dataset includes p = 1279 molecular markers, $x_{ij}$ ($i = 1, \ldots, n$; $j = 1, \ldots, p$), coded as 0/1. Pedigree information is also available.

Figure 2: Grain yield by environment (panels: yield by environment for environments 1, 2, 4 and 5; histogram of Y$yield, frequency vs. Y$yield).

SLIDE 15

Application examples (Part 1)

Data preparation...

# load genotypic data
load("pedigree_markers.RData")
# load phenotypic data
pheno = read.table(file = "599_yield_raw-1.prn", header = TRUE)
pheno = pheno[, c(2, 5, 6)]
# average grain yield (GY) within each environment-line combination
index = paste(pheno$env, pheno$gen1, sep = "@")
yavg = tapply(pheno$GY, index, "mean")
# split the "env@line" keys back into environment and line IDs
tmp = names(yavg)
tmp2 = strsplit(tmp, "@")
gen = character()
env = character()
for (i in 1:length(tmp2)) {
  env[i] = tmp2[[i]][1]
  gen[i] = tmp2[[i]][2]
}
Y = data.frame(yield = yavg, VAR = gen, ENV = env)
# sort records by environment, then by line
index = order(as.character(Y$ENV), as.character(Y$VAR))
Y = Y[index, ]
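The averaging and key-splitting above can also be done in one step with base R's `aggregate()`. The sketch uses a made-up `pheno` table with the same three column names as above (`env`, `gen1`, `GY`); the values are illustrative only. Because `aggregate()` keeps environment and line as separate columns, no string splitting is needed.

```r
# Made-up replicate-level records with the column names used on the slide
pheno <- data.frame(env  = c(1, 1, 1, 1, 2, 2),
                    gen1 = c("g1", "g1", "g2", "g2", "g1", "g2"),
                    GY   = c(5.0, 6.0, 4.0, 4.5, 7.0, 3.0))

# Mean yield per environment-line combination
Y <- aggregate(GY ~ env + gen1, data = pheno, FUN = mean)
names(Y) <- c("ENV", "VAR", "yield")

# Same ordering as the slide: by environment, then by line
Y <- Y[order(as.character(Y$ENV), as.character(Y$VAR)), ]
Y
```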

SLIDE 16

Application examples (Part 1)

Continued...

# reorder A and X by line ID (assumes the rows of X follow the order of A)
index = order(colnames(A))
A = A[index, index]
X = X[index, ]
save(Y, A, X, file = "standarized_data.RData")

SLIDE 17

Application examples (Part 1)

Code for the cross-validation schemes...

# CV = 1: assigns lines to folds
# CV = 2: assigns entries of a line to folds
CV <- 2
nFolds <- 5
sets <- rep(NA, nrow(Y))
set.seed(123)
IDs <- as.character(unique(Y$VAR))
if (CV == 1) {
  folds <- sample(1:nFolds, size = length(IDs), replace = TRUE)
  for (i in 1:nrow(Y)) {
    sets[i] <- folds[which(IDs == Y$VAR[i])]
  }
}
if (CV == 2) {
  IDy <- as.character(Y$VAR)
  for (i in IDs) {
    tmp = which(IDy == i)
    ni = length(tmp)
    tmpFold <- sample(1:nFolds, size = ni, replace = ni > nFolds)
    sets[tmp] <- tmpFold
  }
}
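The same fold-assignment logic can be wrapped in a function and checked on a toy layout like Figure 1 (5 lines × 5 environments). The function name `assign_folds()` is hypothetical; it is a vectorized sketch of the slide's code, not part of any package. The checks confirm the defining property of each scheme: under CV1 all records of a line land in the same fold, while under CV2 the records of a line (here 5 records, 5 folds) are spread over distinct folds.

```r
# Hypothetical wrapper around the slide's fold-assignment logic
assign_folds <- function(VAR, CV, nFolds = 5) {
  VAR  <- as.character(VAR)
  IDs  <- unique(VAR)
  sets <- rep(NA_integer_, length(VAR))
  if (CV == 1) {
    # CV1: draw one fold per line; all records of the line inherit it
    folds <- sample(1:nFolds, size = length(IDs), replace = TRUE)
    sets  <- folds[match(VAR, IDs)]
  } else if (CV == 2) {
    # CV2: spread the records of each line over different folds
    for (id in IDs) {
      idx <- which(VAR == id)
      ni  <- length(idx)
      sets[idx] <- sample(1:nFolds, size = ni, replace = ni > nFolds)
    }
  } else {
    stop("CV must be 1 or 2")
  }
  sets
}

# Toy layout as in Figure 1: 5 lines x 5 environments
toy <- expand.grid(VAR = paste0("Line", 1:5), ENV = paste0("E", 1:5))
set.seed(123)
cv1 <- assign_folds(toy$VAR, CV = 1)
cv2 <- assign_folds(toy$VAR, CV = 2)
table(toy$VAR, cv2)
```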

SLIDE 18

Application examples (Part 1)

Fitting model and extracting results...

library(BGLR)
###################################################
# Model 1
###################################################
# incidence matrix for main effects of environments
ZE <- model.matrix(~factor(Y$ENV) - 1)
# incidence matrix for main effects of lines
Y$VAR <- as.factor(Y$VAR)
ZVAR <- model.matrix(~Y$VAR - 1)
# model fitting
ETA <- list(ENV = list(X = ZE, model = "BRR"),
            VAR = list(X = ZVAR, model = "BRR"))
# mask the phenotypes of fold 1 (testing set) and fit on the rest
y = Y$yield
testing = (sets == 1)
y[testing] = NA
fm1 <- BGLR(y = y, ETA = ETA, saveAt = "M1_", nIter = 6000, burnIn = 1000)
# remove the sample files written by BGLR
unlink("*.dat")
# extract the predictions for the testing set
predictions = data.frame(Env = Y$ENV[testing],
                         Individual = Y$VAR[testing],
                         y = Y$yield[testing],
                         yHat = fm1$yHat[testing])

SLIDE 19

Application examples (Part 1)

Continued...

library(doBy)
# write.table(predictions, file = paste("predictions.csv", sep = ""),
#             row.names = FALSE, sep = ",")
# doBy version: correlation between predicted and observed yield in each environment
predictions = orderBy(~Env, data = predictions)
lapplyBy(~Env, data = predictions, function(x) { cor(x$yHat, x$y) })

> lapplyBy(~Env, data = predictions, function(x) { cor(x$yHat, x$y) })
$`1`
[1] 0.01630911
$`2`
[1] 0.6108203
$`4`
[1] 0.564435
$`5`
[1] 0.289207
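If the doBy package is not available, the per-environment correlations can be computed with base R's `split()` and `sapply()`. The `predictions` data frame below is made up for illustration; only its column names (`Env`, `y`, `yHat`) follow the slide.

```r
predictions <- data.frame(
  Env  = rep(c("1", "2"), each = 4),
  y    = c(1.0, 2.0, 3.0, 4.0,  2.0, 1.0, 4.0, 3.0),
  yHat = c(1.1, 1.9, 3.2, 3.8,  1.0, 2.0, 3.0, 4.0)
)

# Pearson correlation between predicted and observed values within each environment
corByEnv <- sapply(split(predictions, predictions$Env),
                   function(d) cor(d$yHat, d$y))
corByEnv
```

`split()` returns one data frame per environment, and `sapply()` collapses the correlations into a named numeric vector, which is easier to tabulate across folds than the list returned by `lapplyBy()`.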

SLIDE 20

Application examples (Part 1)

Results for one fold...

Figure 3: Results from CV1 (correlation for models M1, M2 and M3).

SLIDE 21

Application examples (Part 1)

Continued...

Figure 4: Results from CV2 (correlation for models M1, M2 and M3).

SLIDE 22

Model extensions with environmental covariates

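This model is obtained by extending model EA (Model 2) to incorporate environmental covariates (ECs).

Model 4 (EAW): $y_{ij} = \mu + E_i + a_j + t_{ij} + e_{ij}$,

where $a_j$ is the marker-based effect of line $j$ ($g_j$ in Model 2), $t_{ij} = \sum_{q=1}^{Q} W_{ijq}\gamma_q$ represents a regression on the ECs, $W_{ijq}$ is the value of the $q$-th EC for the combination of environment $i$ and line $j$, and $\gamma_q$ is the effect of the $q$-th EC.

Assumptions: $\gamma_q \sim N(0, \sigma^2_\gamma)$ independently, so that $t = W\gamma \sim N(0, \sigma^2_t W W^T)$ with $\sigma^2_t = \sigma^2_\gamma$.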
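Writing out the covariance of $t$ shows why a regression on the ECs with independent effects is equivalent to a random effect with covariance proportional to $W W^T$. Here $\gamma = (\gamma_1, \ldots, \gamma_Q)^T$, and $W$ stacks the rows $(W_{ij1}, \ldots, W_{ijQ})$ of all environment-line combinations:

```latex
\begin{aligned}
t &= W\gamma, \qquad \gamma \sim N(0, \sigma^2_\gamma I_Q),\\
E[t] &= W\,E[\gamma] = 0,\\
\operatorname{Cov}(t) &= W\,\operatorname{Cov}(\gamma)\,W^T = \sigma^2_\gamma\, W W^T,\\
\operatorname{Cov}(t_{ij}, t_{i'j'}) &= \sigma^2_\gamma \sum_{q=1}^{Q} W_{ijq}\, W_{i'j'q}.
\end{aligned}
```

So the covariance between two records is proportional to the inner product of their EC profiles.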

SLIDE 23

Model extensions with environmental covariates

Continued...

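Model 5 (EAW-A×W): $y_{ij} = \mu + E_i + a_j + t_{ij} + at_{ij} + e_{ij}$

Assumptions: $at \sim N\left(0, \left[(Z_p G Z_p^T) \circ (W W^T)\right]\sigma^2_{at}\right)$, where $Z_p$ connects phenotypes with lines and $\circ$ denotes the Hadamard product.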
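Model 5's interaction covariance uses the same Hadamard construction as Model 3, with the environment kernel $Z_E Z_E^T$ replaced by the EC kernel $W W^T$. The base-R sketch below builds it for a toy layout; the EC values, the relationship matrix `G`, and the choice to standardize each EC and divide $W W^T$ by the number of ECs are assumptions made for illustration, not steps taken from the slides. Because both factors are positive semi-definite, so is their Hadamard product (Schur product theorem), which makes `K_AW` a valid covariance matrix up to the scale $\sigma^2_{at}$.

```r
set.seed(7)
# Toy layout: 3 lines x 2 environments, one record per combination
line_id <- factor(rep(c("L1", "L2", "L3"), times = 2))

# Made-up genomic relationship matrix for the 3 lines
G <- matrix(c(1.0, 0.4, 0.1,
              0.4, 1.0, 0.2,
              0.1, 0.2, 1.0), nrow = 3)

# Made-up environmental covariates (Q = 4) for each of the 6 records
W <- matrix(rnorm(6 * 4), nrow = 6)
W <- scale(W)                        # assumption: standardize each EC

Zp    <- model.matrix(~ line_id - 1) # records -> lines
ZpGZp <- Zp %*% G %*% t(Zp)          # genomic covariance between records
Omega <- tcrossprod(W) / ncol(W)     # EC kernel, proportional to W W^T
K_AW  <- ZpGZp * Omega               # Hadamard product: Cov(at) / sigma^2_at
round(K_AW, 3)
```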

SLIDE 24

Model extensions with environmental covariates

References

Burgueño, J., G. de los Campos, K. Weigel, and J. Crossa (2012). Genomic prediction of breeding values when modeling genotype × environment interaction using pedigree and dense molecular markers. Crop Science, 52: 707-719.

Jarquín, D., J. Crossa, X. Lacaze, P. Cheyron, J. Daucourt, J. Lorgeou, F. Piraux, et al. (2014). A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theoretical and Applied Genetics, 127(3): 595-607.
