Genomic Prediction and Selection for Multi-Environments with Big Data

SLIDE 1

Genomic Prediction and Selection for Multi-Environments with Big Data using the BGLR statistical package

Biometrics and Statistics Unit, Global Maize and Wheat Programs

June 2015

CIMMyT Genomic Prediction and Selection for Multi-Environments with Big Data using the BGLR 1/26

SLIDE 2

Contents

1. BGLR
2. Prediction in multi-environments
3. Models
4. Cross validation
5. Application examples

SLIDE 3

BGLR

BGLR

Novel software for whole-genome regression and prediction of continuous and discrete traits, censored or uncensored.

Suitable for big-p, small-n problems.

Many non-parametric and parametric models are implemented in a consistent manner.

A large collection of Bayesian models is included:

  • Bayesian ridge regression.
  • Bayesian LASSO.
  • BayesA, BayesB, BayesC-π.
  • Reproducing Kernel Hilbert Spaces.
  • Reproducing Kernel Hilbert Spaces with Kernel-Averaging.

SLIDE 5

BGLR

BGLR in a nutshell

Data equation: $y = \eta + \varepsilon$, where $\eta = 1\mu + \sum_j X_j \beta_j + \sum_l u_l$.

Priors: Different priors can be assigned to the regression coefficients $\beta_j$ and the random effects $u_l$, which leads to different models.

Model fitting: MCMC algorithms (Gibbs sampler and Metropolis-Hastings), implemented efficiently.
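
As a rough illustration of the Gibbs-sampling idea, the Python sketch below fits a Bayesian ridge regression (normal prior on every coefficient) to simulated data by drawing the intercept and each coefficient in turn from its full conditional distribution. This is not BGLR code: the function name `gibbs_ridge`, the simulated data, and the decision to hold both variances fixed are assumptions made to keep the example short. BGLR itself is an R package and also samples the variance parameters.

```python
import random


def gibbs_ridge(y, X, var_e, var_b, n_iter=600, burn_in=100, seed=1):
    """Single-site Gibbs sampler for y = 1*mu + X b + e, with b_j ~ N(0, var_b).

    var_e (residual variance) and var_b (prior variance) are held fixed,
    which is a simplification of this sketch. Returns posterior means.
    """
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    mu, b = 0.0, [0.0] * p
    resid = list(y)  # resid = y - mu - X b, starting from mu = 0 and b = 0
    xtx = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    mu_sum, b_sum, kept = 0.0, [0.0] * p, 0
    for it in range(n_iter):
        # Intercept (flat prior): full conditional is N(mean(y - X b), var_e / n).
        new_mu = rng.gauss(sum(r + mu for r in resid) / n, (var_e / n) ** 0.5)
        resid = [r + mu - new_mu for r in resid]
        mu = new_mu
        for j in range(p):
            # Full conditional of b_j given the intercept and all other coefficients.
            rhs = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n))
            c = xtx[j] + var_e / var_b
            new_b = rng.gauss(rhs / c, (var_e / c) ** 0.5)
            for i in range(n):
                resid[i] += X[i][j] * (b[j] - new_b)
            b[j] = new_b
        if it >= burn_in:  # accumulate draws after burn-in
            kept += 1
            mu_sum += mu
            for j in range(p):
                b_sum[j] += b[j]
    return mu_sum / kept, [s / kept for s in b_sum]


# Simulated toy data (made up for illustration only).
sim = random.Random(0)
true_mu, true_b = 5.0, [2.0, -1.0, 0.5]
X = [[sim.gauss(0, 1) for _ in true_b] for _ in range(50)]
y = [true_mu + sum(x * b for x, b in zip(row, true_b)) + sim.gauss(0, 0.1)
     for row in X]

mu_hat, b_hat = gibbs_ridge(y, X, var_e=0.01, var_b=10.0)
print(round(mu_hat, 2), [round(v, 2) for v in b_hat])
```

With a weak prior and little noise, the posterior means should land close to the simulated values; changing the prior on the coefficients is what turns this scheme into the other models listed earlier.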

SLIDE 6

Prediction in multi-environments

Prediction in multi-environments

In most agronomic traits, the effects of genes are modulated by environmental conditions, generating genotype × environment interaction (G×E). Researchers working in plant breeding have developed multiple methods for accounting for and exploiting G×E in multi-environment trials. Genomic selection is gaining ground in plant breeding, but most applications so far are based on single-environment, single-trait models. Preliminary evidence (e.g., Burgueño et al., 2012) suggests that there is great scope for improving prediction accuracy using multi-environment models. The ideas can be taken one step further by incorporating information on environmental covariates.

SLIDE 9

Models

Models

Model 1 (EL, Environment + Line, no pedigree): $y_{ij} = \mu + E_i + L_j + e_{ij}$

Model 2 (EA, Environment + Line, with markers): $y_{ij} = \mu + E_i + g_j + e_{ij}$

Model 3 (Environment, Line, and marker × environment interaction): $y_{ij} = \mu + E_i + g_j + Eg_{ij} + e_{ij}$

SLIDE 10

Models

Assumptions

It is assumed that $E_i \sim N(0, \sigma^2_E)$ and $g \sim N(0, \sigma^2_g G)$, with $G$ being the genomic relationship matrix, and $Eg_{ij}$ the interaction term between genotypes and environments, with $Eg \sim N(0, (Z_g G Z_g^T) \circ (Z_E Z_E^T))$. Here $Z_g$ connects genotypes with phenotypes, $Z_E$ connects phenotypes with environments, and $\circ$ stands for the Hadamard product between two matrices.
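
To make the interaction covariance concrete, here is a small Python sketch that builds $(Z_g G Z_g^T) \circ (Z_E Z_E^T)$ for a toy layout. The record layout, the values in `G`, and the helper names are invented for illustration; real analyses would use a marker-derived G and a matrix library rather than nested lists.

```python
def incidence(labels, levels):
    """0/1 design matrix: row r has a 1 in the column matching labels[r]."""
    return [[1 if lab == lev else 0 for lev in levels] for lab in labels]


def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def hadamard(A, B):
    """Element-by-element (Hadamard) product of two equally sized matrices."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


# Hypothetical toy data: 2 genotypes, each observed in 2 environments (4 records).
genotype_of_record = ["g1", "g2", "g1", "g2"]
env_of_record = ["E1", "E1", "E2", "E2"]
G = [[1.0, 0.5],
     [0.5, 1.0]]  # made-up genomic relationship matrix

Z_g = incidence(genotype_of_record, ["g1", "g2"])  # records x genotypes
Z_E = incidence(env_of_record, ["E1", "E2"])       # records x environments

K_g = matmul(matmul(Z_g, G), transpose(Z_g))  # genomic covariance between records
K_E = matmul(Z_E, transpose(Z_E))             # 1 if two records share an environment
K_gE = hadamard(K_g, K_E)                     # interaction covariance structure

for row in K_gE:
    print(row)
```

The result is block-diagonal by environment: two records covary through the interaction term only when they come from the same environment, and then in proportion to their genomic relationship. In BGLR, a matrix of this kind would typically be supplied as the kernel `K` of an RKHS term.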

SLIDE 11

Cross validation

Cross validation

1. CV1: Prediction of performance of newly developed lines (i.e., lines that have not been evaluated in any field trials).

2. CV2: Prediction in incomplete field trials; here the aim is to predict the performance of lines that have been evaluated in some environments but not in others. See Figure 1 on the next slide.
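
One possible way to set up the two schemes is sketched below in Python: CV1 assigns whole lines to folds, so a test line has no records in training, while CV2 assigns individual line-environment records to folds, so a test line is usually still observed elsewhere. The function names, the 5 × 5 layout, and the fold count are assumptions for illustration, not the partitioning code used in the study.

```python
import random


def cv1_folds(records, n_folds, seed=0):
    """CV1: every record of a given line goes to the same fold."""
    rng = random.Random(seed)
    lines = sorted({line for line, _ in records})
    rng.shuffle(lines)
    fold_of_line = {line: k % n_folds for k, line in enumerate(lines)}
    return [fold_of_line[line] for line, _ in records]


def cv2_folds(records, n_folds, seed=0):
    """CV2: individual (line, environment) records are spread over folds."""
    rng = random.Random(seed)
    order = list(range(len(records)))
    rng.shuffle(order)
    folds = [0] * len(records)
    for k, idx in enumerate(order):
        folds[idx] = k % n_folds
    return folds


# Hypothetical balanced layout: five lines, each evaluated in five environments.
lines = ["Line1", "Line2", "Line3", "Line4", "Line5"]
envs = ["E1", "E2", "E3", "E4", "E5"]
records = [(line, env) for line in lines for env in envs]

cv1 = cv1_folds(records, n_folds=5)
cv2 = cv2_folds(records, n_folds=5)

# Records masked (treated as unobserved) when fold 0 is the testing set.
print("CV1 test:", [rec for rec, f in zip(records, cv1) if f == 0])
print("CV2 test:", [rec for rec, f in zip(records, cv2) if f == 0])
```

For each fold, the phenotypes of the testing records are set to missing, the model is fitted to the rest, and predictions are compared with the masked values.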

SLIDE 12

Cross validation

Continue...

Figure 1: Two hypothetical cross-validation schemes (CV1 and CV2) for five lines (Lines 1-5) and five environments (E1-E5), source: Jarquín et al. (2014).

SLIDE 13

Application examples

Example 1: Wheat dataset (Ravi, Jessica et al.)

The phenotypic information consists of wheat grain yield in 5 mega-environments.

Table 1. Number of lines evaluated in each environment.

The problem is to predict 9,000 unobserved individuals in all the environments.

SLIDE 14

Application examples

Continue...

Table 2. Phenotypic correlations between environments.

SLIDE 16

Application examples

Continue...

For model fitting we used the coefficient of parentage (COP) and genotyping-by-sequencing (GBS) markers.

1. COP: We computed a relationship matrix (A). The matrix has about 50k × 50k = 2,500,000,000 entries. With BROWSE, the program took several days to finish. With an ad hoc version of the R package pedigreemm, we obtained the matrix in about 3 hours.

2. Markers: Information for about 21,000 individuals and 14,000 individuals was available.
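
The size of the relationship matrix explains why its computation is demanding. The back-of-the-envelope Python calculation below assumes each entry is stored as an 8-byte double (the storage R uses for numeric matrices) and takes 50,000 as the number of individuals, following the "about 50k" figure above.

```python
n_individuals = 50_000   # "about 50k" individuals in the pedigree
bytes_per_entry = 8      # assumption: double-precision storage

# Full square matrix.
full_entries = n_individuals * n_individuals
full_gb = full_entries * bytes_per_entry / 1e9

# A is symmetric, so only the diagonal and one triangle are unique.
unique_entries = n_individuals * (n_individuals + 1) // 2
unique_gb = unique_entries * bytes_per_entry / 1e9

print(f"full matrix: {full_entries:,} entries, {full_gb:.1f} GB")
print(f"upper triangle: {unique_entries:,} entries, {unique_gb:.1f} GB")
```

Held densely, the full matrix needs roughly 20 GB of memory, and about half of that if only one triangle is kept.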

SLIDE 18

Application examples

Benchmark: Predicting 2014 using previous records

Figure 2: Predictions in testing

SLIDE 19

Application examples

The real problem

SLIDE 22

Application examples

Example 2: Biparental tropical maize populations (Xuecai et al.)

Genotypic and phenotypic information for about 20 biparental populations.

Low-density (about 200) and high-density (about 60,000) markers.

Individuals evaluated in several environments.

SLIDE 23

Application examples

Continue...

Figure 3: Results from CV1

SLIDE 24

Application examples

Continue...

Figure 4: Results from CV2

SLIDE 25

Application examples

Collaborators in this work

  • J. Crossa
  • Juan Burgueño
  • G. de los Campos
  • Jessica Rutkoski
  • Ravi Singh
  • Enrique Autrique
  • Jesse Poland
  • Juan Carlos Alarcón
  • Susan Dreisigacker
  • Paulino Pérez
  • X. Zhang
  • K. Semagn
  • Y. Beyene
  • R. Babu
  • F. San Vicente
  • M. Olsen
  • Newman Samayoua

SLIDE 26

Application examples

References

Burgueño, J., G. de los Campos, K. Weigel, and J. Crossa. (2012). Genomic prediction of breeding values when modeling genotype × environment interaction using pedigree and dense molecular markers. Crop Science, 52: 707-719.

Jarquín, D., J. Crossa, X. Lacaze, P. Cheyron, J. Daucourt, J. Lorgeou, F. Piraux, et al. (2014). A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theoretical and Applied Genetics, 127 (3): 595-607.
