Computations in Animal Breeding
Ignacy Misztal and Romdhane Rekaya, University of Georgia
Animal Breeding
- Selection of more productive animals as parents
- Effect: the next generation is more productive
- Tools: artificial insemination + embryo transfer
– A dairy sire can have >100,000 daughters!
– Selection of “good” sires is important!
Information for selection
- DNA (active research)
- Data collected on large populations of farm animals
– Records contain a combination of genetic, environmental, and systematic effects
– Need statistical methodology to obtain the best prediction of genetic effects
Dairy
- Mostly family farms (50-1000 cows)
- Mostly Holsteins
- Recording on:
– Production (milk, fat, protein yields)
– Conformation (size, legs, udder, …)
– Secondary traits (somatic cell count in milk, calving ease, reproduction)
- Information on >20 million Holsteins (kept by USDA and breed associations)
- Semen market is global
Poultry
- Integrated system
- Large companies
- Hierarchical breeding structure
- Final product - crossbreds
- Useful records on up to 200k birds
- Recording on:
– Number and size of eggs
– Growth (weights at certain ages)
– Fertility
– …
Swine
- Becoming more integrated
- Hierarchical breeding structure
- Final product - crossbreds
- Recording on:
– Growth
– Litter size
– Meat quality
– …
- Populations > 200k animals
Beef cattle
- Mostly family farms (5 to 50,000 animals)
- Many breeds
- Final product – purebreds and crossbreds
- Recording on:
– Growth
– Fertility
– Meat quality
- Data size up to 2 million animals
Results of genetic selection
- Milk yield 2 times higher
- Chicken
– Time to maturity over 2 times shorter
– Feed efficiency 2 times higher
- Swine
- Beef
- Fish
Genetic value of individual
On a population level: g = a + “rest”
Var(a) = A·σₐ², where A is the matrix of additive relationships among animals and σₐ² is the additive genetic variance
A is dense; A⁻¹ is sparse and easy to set up (see the sketch below)
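A⁻¹ can be set up directly from the pedigree with Henderson's rules, never forming A itself. A minimal Python sketch (dense storage for clarity, inbreeding ignored, pedigree layout invented for illustration):

```python
import numpy as np

def a_inverse(pedigree):
    """Henderson's rules for A^{-1}, ignoring inbreeding (minimal sketch).
    pedigree[i] = (sire, dam) with None for unknown parents; animals are
    coded 0..n-1 with parents appearing before their progeny."""
    n = len(pedigree)
    Ainv = np.zeros((n, n))            # real programs store this sparse
    for i, (sire, dam) in enumerate(pedigree):
        known = [p for p in (sire, dam) if p is not None]
        alpha = {2: 2.0, 1: 4.0 / 3.0, 0: 1.0}[len(known)]
        Ainv[i, i] += alpha
        for p in known:
            Ainv[i, p] -= alpha / 2.0
            Ainv[p, i] -= alpha / 2.0
        for p in known:
            for r in known:
                Ainv[p, r] += alpha / 4.0
    return Ainv

# animals 0, 1: unrelated founders; animal 2: offspring of (0, 1)
print(a_inverse([(None, None), (None, None), (0, 1)]))
```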
Example of a model
- Litter weight =
  – contemporary group +
  – age class +
  – genetic group +
  – animal +
  – e
- Var(e) = I·σₑ², Var(animal) = A·σₐ²
- σₑ², σₐ²: variance components
Mixed model
y = Xβ + Zu + e
- y: vector of records
- β: vector of fixed effects
- u: vector of random effects
- e: vector of residuals
- X, Z: design matrices
- Fixed effects: usually few levels, lots of information
- Random effects: usually many levels, little information
Mixed model equations
X′R⁻¹X β̂ + X′R⁻¹Z û = X′R⁻¹y
Z′R⁻¹X β̂ + (Z′R⁻¹Z + G⁻¹) û = Z′R⁻¹y
- R: block diagonal with small blocks
- G: block diagonal with small or large blocks
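For the single-trait case with R = Iσₑ² and G = Aσₐ², the equations simplify to the familiar σₑ²-scaled form. A toy sketch with invented numbers (the A⁻¹ is the 3-animal example above):

```python
import numpy as np

# Toy single-trait MME, scaled by se2:
#   [ X'X        X'Z         ] [b]   [X'y]
#   [ Z'X   Z'Z + Ainv * k   ] [u] = [Z'y],   k = se2 / sa2
X = np.array([[1., 0.], [1., 0.], [0., 1.]])    # 2 contemporary groups
Z = np.eye(3)                                   # 3 animals, 1 record each
y = np.array([4.5, 3.8, 5.1])
Ainv = np.array([[1.5, .5, -1.], [.5, 1.5, -1.], [-1., -1., 2.]])
k = 2.0

lhs = np.block([[X.T @ X, X.T @ Z],
                [Z.T @ X, Z.T @ Z + Ainv * k]])
rhs = np.concatenate([X.T @ y, Z.T @ y])
print(np.linalg.solve(lhs, rhs))                # [BLUE of b, BLUP of u]
```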
Matrices in Mixed Model
- Symmetric
- Semi-positive definite
- Sparse
– 3-200 nonzeros per row (on average)
- Can be constructed as a sum of outer products (see the sketch below):
  Σᵢ Wᵢ′ Qᵢ Wᵢ
  – Wᵢ: 1-20 rows × 20k-300 million columns, <100 nonzeros
  – Qᵢ: small square matrix (less than 20 × 20)
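A sketch of that accumulation with made-up record structures: each Wᵢ touches only a few global equations, so its contribution is a small dense block scattered into the large matrix (kept dense here for brevity):

```python
import numpy as np

def accumulate_lhs(n_eq, records):
    """Builds the coefficient matrix as sum_i W_i' Q_i W_i. Each record
    supplies idx (the few global equations it touches), W (a small dense
    t x len(idx) block) and Q (t x t). Structures are invented; a real
    program would accumulate into sparse storage."""
    A = np.zeros((n_eq, n_eq))
    for idx, W, Q in records:
        A[np.ix_(idx, idx)] += W.T @ Q @ W
    return A

# one 2-trait record touching equations 0, 3 and 4 of a 6-equation system
rec = ([0, 3, 4],
       np.array([[1., 1., 0.], [1., 0., 1.]]),
       np.array([[2., -1.], [-1., 2.]]))
print(accumulate_lhs(6, [rec]))
```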
Models
- Sire model: y = cg + … + sire + e
  – 1000k animals ≈ 50k equations
- Animal model: y = cg + … + animal + e
  – 1000k animals ≈ 1200k equations
- Multiple trait model:
  y₁ = cg₁ + … + animal₁ + … + e₁
  …
  yₙ = cgₙ + … + animalₙ + … + eₙ
  – 1000k animals: >3000k equations
- Random regression (longitudinal) model:
  y = cg + … + Σᵢ fᵢ(x)·animalᵢ + … + e
  – 1000k animals: >6000k equations
Tasks
- Estimate variance components
– Usually sample of 5-50k animals
- Solve mixed model equations
– Populations up to 20 million animals
– Up to 60 unknowns per animal
Data structures for sparse matrices
- Linked list
- Triples (i, j, value) in a hash table
- IJA (row pointers to column indices and values; see the sketch below)
- Matrix not stored (matrix-free)
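For illustration, the IJA layout and a matrix-vector product computed straight from it (a minimal sketch; array contents invented):

```python
import numpy as np

# IJA (compressed row) layout of the matrix
#   [[4, 0, 1],
#    [0, 3, 0],
#    [1, 0, 5]]
ia = np.array([0, 2, 3, 5])            # row i occupies ja[ia[i]:ia[i+1]]
ja = np.array([0, 2, 1, 0, 2])         # column indices
a = np.array([4., 1., 3., 1., 5.])     # values

def matvec(ia, ja, a, x):
    """y = A x computed straight from the three IJA arrays."""
    y = np.zeros(len(ia) - 1)
    for i in range(len(y)):
        for k in range(ia[i], ia[i + 1]):
            y[i] += a[k] * x[ja[k]]
    return y

print(matvec(ia, ja, a, np.ones(3)))   # [5. 3. 6.]
```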
Solving strategies
- Sparse factorization
- Iteration
– Gauss-Seidel
– Gauss-Seidel + second-order Jacobi
– PCG
- Preconditioners from diagonal to incomplete factorization
Gauss-Seidel Iteration & SOR
Ax = b
- Simple
- Stable and self-correcting
- Converges for semi-positive definite A
- For balanced cross-classified models converges in one round
- Small memory requirements
- Hard to implement matrix-free
- Slow convergence for complicated models
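A minimal dense sketch of Gauss-Seidel/SOR (relax = 1 gives plain Gauss-Seidel; production codes sweep the sparse structure, or the data itself, rather than a stored matrix):

```python
import numpy as np

def gauss_seidel(A, b, x=None, relax=1.0, rounds=100, tol=1e-10):
    """Gauss-Seidel / SOR iteration for Ax = b on a dense matrix."""
    n = len(b)
    x = np.zeros(n) if x is None else x.copy()
    for _ in range(rounds):
        x_old = x.copy()
        for i in range(n):
            r = b[i] - A[i] @ x + A[i, i] * x[i]   # residual excluding x_i
            x[i] = (1 - relax) * x[i] + relax * r / A[i, i]
        if np.linalg.norm(x - x_old) < tol * (np.linalg.norm(x) + 1):
            break
    return x

A = np.array([[4., 1.], [1., 3.]])
print(gauss_seidel(A, np.array([1., 2.])))   # close to np.linalg.solve
```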
Preconditioned Conjugate Gradient
- Large memory requirements
- Tricky implementation
- Easy to implement matrix-free
- Usually converges a few times faster than SOR, even with a diagonal preconditioner!
Matrix-free iteration
- Let A = Σᵢ Wᵢ′Wᵢ
- nonzeros(W) << nonzeros(A)
- W easily generated from data
- Ax = Σᵢ Wᵢ′(Wᵢx)
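A sketch of matrix-free PCG under these assumptions: apply_A accumulates Ax = Σᵢ Wᵢ′(Wᵢx) without ever storing A, and a diagonal preconditioner is used (all data invented):

```python
import numpy as np

def pcg_matrix_free(apply_A, b, M_diag, rounds=1000, tol=1e-12):
    """PCG where A is never stored: apply_A(x) returns A x, e.g. summed
    as W_i'(W_i x) in one pass through the data; M_diag preconditions."""
    x = np.zeros_like(b)
    r = b - apply_A(x)
    z = r / M_diag
    p = z.copy()
    rz = r @ z
    for _ in range(rounds):
        Ap = apply_A(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = r / M_diag
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# toy A = W'W built from "data" rows; W'W itself is never formed
W = np.array([[1., 1., 0.], [0., 1., 1.], [1., 0., 2.]])
apply_A = lambda x: W.T @ (W @ x)
b = np.array([1., 2., 3.])
print(pcg_matrix_free(apply_A, b, M_diag=(W * W).sum(axis=0)))
```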
Methodologies to estimate variance components
- Restricted Maximum Likelihood (REML)
- Markov chain Monte Carlo (MCMC)
REML
- Φ: variance components (in R and G)
- C*: LHS converted to full rank
- Maximization strategies:
  – Derivative-free (uses sparse factorization)
  – First derivative (expectation-maximization; uses sparse inverse)
  – Second derivative: D² and E(D²) hard to compute, but [D² + E(D²)]/2 simpler (average information REML)
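For the derivative-free route, the objective can be written down directly. A dense sketch of −2 × the REML log-likelihood for y = Xb + Zu + e (function name and interface are illustrative; real programs evaluate the determinants through sparse factorization):

```python
import numpy as np

def reml_neg2_loglik(su2, se2, y, X, Z, A):
    """-2 x REML log-likelihood (up to a constant) of y = Xb + Zu + e,
    u ~ N(0, A*su2), e ~ N(0, I*se2). A derivative-free REML program
    minimizes this over (su2, se2)."""
    n = len(y)
    V = su2 * (Z @ A @ Z.T) + se2 * np.eye(n)
    Vi = np.linalg.inv(V)
    XtViX = X.T @ Vi @ X
    # P = Vi - Vi X (X'Vi X)^{-1} X'Vi projects out the fixed effects
    P = Vi - Vi @ X @ np.linalg.solve(XtViX, X.T @ Vi)
    return (np.linalg.slogdet(V)[1]
            + np.linalg.slogdet(XtViX)[1]
            + y @ P @ y)
```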
Sparse matrix inversion
- Takahashi method (a toy version of the recurrence follows below):
  – Can obtain inverse elements only for the elements where Lᵢⱼ ≠ 0
  – Inverses obtained for sparse matrices as large as 1000k × 1000k
  – Cost ≈ 2 × sparse factorization
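A dense toy version of the Takahashi recurrence Z = D⁻¹L⁻¹ + (I − L′)Z on A = LDL′ (a sparse implementation would compute and keep only the entries of Z on the nonzero pattern of L):

```python
import numpy as np

def takahashi_inverse(A):
    """Z = A^{-1} from the LDL' factorization via the backward
    Takahashi recurrence; dense storage for illustration only."""
    n = A.shape[0]
    R = np.linalg.cholesky(A)      # A = R R', R lower triangular
    d = np.diag(R) ** 2            # D of the LDL' factorization
    L = R / np.diag(R)             # unit lower triangular L
    Z = np.zeros((n, n))
    for i in range(n - 1, -1, -1):         # backward over rows
        for j in range(n - 1, i - 1, -1):  # columns j >= i, descending
            s = 1.0 / d[i] if i == j else 0.0
            for k in range(i + 1, n):      # below-diagonal entries of L
                s -= L[k, i] * Z[k, j]     # Z[k, j] already available
            Z[i, j] = Z[j, i] = s          # store symmetrically
    return Z

A = np.array([[4., 1., 0.], [1., 5., 2.], [0., 2., 6.]])
print(np.allclose(takahashi_inverse(A), np.linalg.inv(A)))  # True
```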
REML, properties
- Derivative-free methods reliable only for simple problems
- Derivative methods
  – Difficult formulas; nearly impossible for nonstandard models
  – High computing cost (≈ quadratic)
  – Easy determination of termination
Bayesian Methods and MCMC
- Samples approximate the marginal posteriors p(σₑ²|y), p(σₚ²|y), p(σₐ²|y)
MCMC, properties
- Much simpler formulas
- Can accommodate:
  – Large models
  – Complicated models
- Can take months if not optimized
- Details important; deciding when to stop is hard
- (a minimal Gibbs sampler is sketched below)
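To make the sampling concrete, a minimal Gibbs sampler for a toy sire model y_ij = μ + s_i + e_ij, with simulated data and flat priors on the variances assumed (real programs add fixed effects, pedigrees, and multiple traits):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated toy sire model: y_ij = mu + s_i + e_ij
q, m = 200, 10                                   # sires, records per sire
s_true = rng.normal(0.0, np.sqrt(2.0), q)        # true sire variance = 2
y = 10.0 + s_true[:, None] + rng.normal(0.0, np.sqrt(4.0), (q, m))

mu, s = 0.0, np.zeros(q)
ss2, se2 = 1.0, 1.0                              # starting values
keep = []
for it in range(5000):
    # mu | rest: normal around the data mean corrected for sire effects
    mu = rng.normal((y - s[:, None]).mean(), np.sqrt(se2 / y.size))
    # s_i | rest: normal, shrunk toward 0 by the variance ratio
    prec = m / se2 + 1.0 / ss2
    s = rng.normal((y - mu).sum(axis=1) / se2 / prec, np.sqrt(1.0 / prec))
    # variances | rest: scaled inverse chi-square draws
    ss2 = (s @ s) / rng.chisquare(q)
    e = y - mu - s[:, None]
    se2 = (e * e).sum() / rng.chisquare(y.size)
    if it >= 1000:                               # discard burn-in
        keep.append((ss2, se2))

print(np.mean(keep, axis=0))                     # approx. (2, 4)
```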
Optimization in sampling methods
- 10k-1 million samples
- Slow if equations regenerated each round
- If the equations can be represented as:
  [R ⊗ X₁ + G ⊗ X₂] x = R ⊗ y
  – R, G: estimated small matrices; X₁, X₂, y: constant
- X₁, X₂ and y can be created and stored once (see the sketch below)
- Requires tricks if models differ by trait or if traits are missing
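A sketch of the bookkeeping this enables, taking the slide's Kronecker shorthand at face value (shapes and names here are illustrative assumptions): X₁, X₂ are built from the data once, and each sampling round only refreshes the small R and G:

```python
import numpy as np

def rebuild_lhs(R, G, X1, X2):
    """Refresh the LHS for newly sampled small R and G; the large constant
    blocks X1, X2 (and the RHS) were assembled from the data once."""
    return np.kron(R, X1) + np.kron(G, X2)
```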
Software
- SAS (fixed models or small mixed models)
- Custom
- Packages
– PEST, VCE (Groeneveld et al.)
– ASREML (Gilmour)
– DMU (Jensen et al.)
– MATVEC (Wang et al.)
– Blupf90 etc. (Misztal et al.)
Larger data, more complicated models, simpler computing?
Computing platforms
- Past
  – Mainframes
  – Supercomputers
  – Sparse computations vectorize!
- Current
  – PCs + workstations
  – Windows, Linux/Unix
  – Parallel and vector processing not important
Random regression model on a parallel processor
(Madsen et al., 1999; Lidauer et al., 1999)
Goal: compute Σᵢ Wᵢ′Wᵢx, where the Wᵢ are large sparse vectors
- Approaches:
  a) Distribute the accumulation of Σᵢ (Wᵢ′Wᵢ)x to separate processors
  b) Optimize the scalar algorithm first, to Σᵢ Wᵢ′(Wᵢx)
- If Wᵢ has 30 nonzeros:
  a) 900 multiplications
  b) 60 multiplications
- Scalar optimization more important than brute-force parallelization (see the check below)
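The 900-vs-60 count is just two orderings of the same product; a quick check with an invented row of 3 nonzeros:

```python
import numpy as np

w = np.zeros(1000)
w[[3, 40, 777]] = [1., 2., 1.]         # a row with 3 nonzeros
x = np.random.default_rng(0).normal(size=1000)

y_a = np.outer(w, w) @ x   # a) forms w'w first: ~nnz^2 multiplications
y_b = w * (w @ x)          # b) two dot products: ~2*nnz multiplications
print(np.allclose(y_a, y_b))           # True: same result, far less work
```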
Other Models
- Censored
- Survival
- Threshold
- …
Issues
- 0.99 problem
- Sophistication of statistics vs. understanding of the problem vs. data editing
- Undesired responses to selection
  – Less fitness, …
  – Aggressiveness (swine, poultry)
- Challenges of molecular genetics
Molecular Genetics
- Attempts to identify effects of genes on individual traits
- Simple statistical methodologies
- Methodology for joint analyses with phenotypic and DNA data is difficult
- Active research area
Conclusions
- Animal breeding is compute-intensive
  – Large systems of equations
  – Sparse matrices
- Research has large economic value