Computations in Animal Breeding
Ignacy Misztal and Romdhane Rekaya, University of Georgia

  1. Computations in Animal Breeding Ignacy Misztal and Romdhane Rekaya University of Georgia

  2. Animal Breeding • Selection of more productive animals as parents • Effect: the next generation is more productive • Tools: artificial insemination + embryo transfer – A dairy sire can have >100,000 daughters! – Selection of “good” sires is important!

  3. Information for selection • DNA (active research) • Data collected on large populations of farm animals – Records contain a combination of genetic, environmental, and systematic effects – Statistical methodology is needed to obtain the best prediction of the genetic effects

  4. Dairy • Mostly family farms (50-1000 cows) • Mostly Holsteins • Recording on: – Production (milk, fat, protein yields) – Conformation (size, legs, udder, ...) – Secondary traits (somatic cell count in milk, calving ease, reproduction) • Information on >20 million Holsteins (kept by USDA and breed associations) • The semen market is global

  5. Molecular genetics

  6. Poultry • Integrated system • Large companies • Hierarchical breeding structure • Final product - crossbreds • Useful records on up to 200k birds • Recording on: – Number and size of eggs – Growth (weights at certain ages) – Fertility – …

  7. Swine • Becoming more integrated • Hierarchical breeding structure • Final product - crossbreds • Recording on: – Growth – Litter size – Meat quality – … • Populations > 200k animals

  8. Beef cattle • Mostly family farms (5-50,000 animals) • Many breeds • Final product – purebreds and crossbreds • Recording on: – Growth – Fertility – Meat quality • Data size up to 2 million animals

  9. Results of genetic selection • Milk yield 2 times higher • Chicken – Time to maturity over 2 times shorter – Feed efficiency 2 times higher • Swine • Beef • Fish

  10. Genetic value of individual • On a population level: g = a + “rest” • Var(a) = A σ²_a, where A is the matrix of relationships among animals and σ²_a is the additive variance • A is dense; A⁻¹ is sparse and easy to set up
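
The last point is what makes animal models computable at scale: A⁻¹ can be written down directly from the pedigree, without ever forming A. A minimal sketch of the standard rules (Henderson's rules, ignoring inbreeding); the function name and pedigree format are illustrative, not from the slides:

import numpy as np

def a_inverse(pedigree):
    """Build A^{-1} directly from a pedigree, ignoring inbreeding.

    pedigree: list of (animal, sire, dam) with 0 meaning 'parent unknown';
    animals are numbered 1..n with parents listed before offspring.
    A dense matrix is returned here only for clarity; in practice just the
    nonzero triples would be stored (A^{-1} is very sparse).
    """
    n = len(pedigree)
    ainv = np.zeros((n, n))
    for animal, sire, dam in pedigree:
        known = [p for p in (sire, dam) if p != 0]
        # 1 / (Mendelian sampling variance ratio): 2, 4/3 or 1
        alpha = {2: 2.0, 1: 4.0 / 3.0, 0: 1.0}[len(known)]
        i = animal - 1
        ainv[i, i] += alpha
        for p in known:
            ainv[i, p - 1] += -alpha / 2.0
            ainv[p - 1, i] += -alpha / 2.0
        for p in known:
            for q in known:
                ainv[p - 1, q - 1] += alpha / 4.0
    return ainv

# Tiny example: animals 1 and 2 are unrelated parents of animal 3.
print(a_inverse([(1, 0, 0), (2, 0, 0), (3, 1, 2)]))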

  11. Example of a model • Litter weight = – contemporary group + – age class + – genetic group + – animal + – e • Var(e) = I σ²_e, Var(animal) = A σ²_a • σ²_e, σ²_a – variance components

  12. Mixed model • y = Xβ + Zu + e • y – vector of records • β – vector of fixed effects • u – vector of random effects • e – vector of residuals • X, Z – design matrices • Fixed effects – usually few levels, lots of information • Random effects – usually many levels, little information
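
In matrix notation, with the usual distributional assumptions for this setting (standard, stated here for completeness):

\[
y = X\beta + Zu + e, \qquad u \sim N(0, G), \quad e \sim N(0, R), \quad \mathrm{Cov}(u, e') = 0, \qquad \mathrm{Var}(y) = ZGZ' + R
\]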

  13. Mixed model equations • R block diagonal with small blocks • G block diagonal with small or large blocks
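
For reference, Henderson's mixed model equations, whose coefficient matrix is the sparse symmetric system the rest of the talk is concerned with solving:

\[
\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix}
\begin{bmatrix} \hat\beta \\ \hat u \end{bmatrix}
=
\begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}
\]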

  14. Matrices in Mixed Model • Symmetric • Semi-positive definite • Sparse – 3-200 nonzeros per row (on average) • Can be constructed as a sum of outer products Σ Wᵢ' Qᵢ Wᵢ – Wᵢ: (1-20) x (20k-300 million), < 100 nonzeros – Qᵢ: small square matrix (at most 20 x 20)
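
A minimal sketch of that construction, accumulating one record's contribution at a time into a sparse triple list; the scipy calls are real, but the per-record data layout is illustrative:

import numpy as np
from scipy.sparse import coo_matrix

def build_coefficient_matrix(records, n_eq):
    """Accumulate A = sum_i W_i' Q_i W_i from per-record contributions.

    records: iterable of (cols, W_i, Q_i) where
      cols – equation numbers the record touches (only a few),
      W_i  – small dense block, shape (k_i, len(cols)),
      Q_i  – small square matrix, shape (k_i, k_i).
    """
    rows, cols_out, vals = [], [], []
    for cols, W, Q in records:
        block = W.T @ Q @ W                  # small dense outer product
        for a, ra in enumerate(cols):
            for b, cb in enumerate(cols):
                rows.append(ra)
                cols_out.append(cb)
                vals.append(block[a, b])
    # duplicate (row, col) entries are summed when converting formats
    return coo_matrix((vals, (rows, cols_out)), shape=(n_eq, n_eq)).tocsr()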

  15. Models • Sire model: y = cg + … + sire + e – 1000k animals ≈ 50k equations • Animal model: y = cg + … + animal + e – 1000k animals ≈ 1200k equations • Multiple trait model: y_1 = cg_1 + … + animal_1 + … + e_1, …, y_n = cg_n + … + animal_n + … + e_n – 1000k animals > 3000k equations • Random regression (longitudinal) model: y = cg + … + Σ f_i(x) animal_i + … + e – 1000k animals > 6000k equations

  16. Tasks • Estimate variance components – Usually sample of 5-50k animals • Solve mixed model equations – Populations up to 20 million animals – Up to 60 unknowns per animal

  17. Data structures for sparse matrices • Linked list • Triples (i, j, value) stored in a hash table • IJA – compressed rows (row pointers to column indices and values) • Matrix not stored (matrix-free)
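
A minimal sketch of the IJA (compressed sparse row) layout, which keeps only three arrays; the scipy names are real, the example matrix is made up:

import numpy as np
from scipy.sparse import csr_matrix

# IJA storage of A = [[4, 1, 0],
#                     [1, 3, 2],
#                     [0, 2, 5]]
ia = np.array([0, 2, 5, 7])                   # row pointers: row k occupies ia[k]:ia[k+1]
ja = np.array([0, 1, 0, 1, 2, 1, 2])          # column indices of the nonzeros
a  = np.array([4., 1., 1., 3., 2., 2., 5.])   # the nonzero values themselves

A = csr_matrix((a, ja, ia), shape=(3, 3))
print(A @ np.ones(3))                         # matrix-vector product straight from IJA: [5. 6. 7.]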

  18. Data structures

  19. Solving strategies • Sparse factorization • Iteration – Gauss-Seidel – Gauss-Seidel + 2nd order Jacobi – PCG • Preconditioners range from diagonal to incomplete factorization

  20. Gauss-Seidel Iteration & SOR • Simple • Stable and self-correcting • Converges for semi-positive definite A • For balanced cross-classified models converges in one round • Small memory requirements • Hard to implement matrix-free • Slow convergence for complicated models
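
A minimal sketch of Gauss-Seidel, with SOR obtained through a relaxation factor, updating the solution in place equation by equation; the function name, test matrix, and convergence test are illustrative:

import numpy as np

def gauss_seidel(A, b, omega=1.0, tol=1e-8, max_rounds=1000):
    """Gauss-Seidel (omega = 1) or SOR (omega > 1) for A x = b, A dense here."""
    n = len(b)
    x = np.zeros(n)
    for _ in range(max_rounds):
        x_old = x.copy()
        for i in range(n):
            # residual of equation i, using the newest available values of x
            r = b[i] - A[i] @ x + A[i, i] * x[i]
            x[i] = (1 - omega) * x[i] + omega * r / A[i, i]
        if np.linalg.norm(x - x_old) < tol * (np.linalg.norm(x) + 1e-30):
            break
    return x

A = np.array([[4., 1., 0.], [1., 3., 2.], [0., 2., 5.]])
b = np.array([1., 2., 3.])
print(gauss_seidel(A, b), np.linalg.solve(A, b))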

  21. Preconditioned Conjugate Gradient • Large memory requirements • Tricky implementation • Easy to implement matrix-free • Usually converges a few times faster than SOR even with a diagonal preconditioner! • Matrix-free iteration: let A = Σ Wᵢ'Wᵢ, with nonzeros(W) << nonzeros(A) and W easily generated from data; then Ax = Σ Wᵢ'(Wᵢ x)
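
A minimal sketch of matrix-free PCG with a diagonal (Jacobi) preconditioner, where A is only ever touched through record-by-record products Wᵢ'(Wᵢ x); the data layout and function names are illustrative:

import numpy as np

def make_operator(w_rows, n_eq):
    """A = sum_i w_i w_i', applied without forming A; w_rows = [(indices, values), ...]."""
    diag = np.zeros(n_eq)
    for idx, val in w_rows:
        diag[idx] += val * val                 # diagonal of A, for the preconditioner
    def apply_A(x):
        Ax = np.zeros(n_eq)
        for idx, val in w_rows:
            Ax[idx] += (val @ x[idx]) * val    # w_i * (w_i' x): ~2*nnz(w_i) flops
        return Ax
    return apply_A, diag

def pcg(apply_A, b, diag, tol=1e-10, max_iter=500):
    x = np.zeros_like(b)
    r = b.copy()                               # r0 = b - A*0
    z = r / diag
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = r / diag
        rz, rz_old = r @ z, rz
        p = z + (rz / rz_old) * p
    return x

A weighted version (A = Σ Wᵢ'QᵢWᵢ) only changes the line that accumulates Ax.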

  22. Methodologies to estimate variance components • Restricted Maximum Likelihood (REML) • Markov chain Monte Carlo (MCMC)

  23. REML • Φ – variance components (in R and G) • C* – LHS converted to full rank • Maximizations: – Derivative free (use sparse factorization) – First derivative (expectation maximization; use sparse inverse) – Second derivative: D² and E(D²) hard to compute, but [D² + E(D²)]/2 is simpler – Average Information REML
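
As an illustration of the first-derivative (EM) route for a single-trait animal model with Var(e) = Iσ²_e and Var(u) = Aσ²_a, a standard textbook form of the round-t updates (not taken from the slides) is

\[
\hat\sigma_a^{2\,(t+1)} = \frac{\hat u' A^{-1} \hat u + \hat\sigma_e^{2\,(t)}\,\mathrm{tr}\!\left(A^{-1} C^{uu}\right)}{q},
\qquad
\hat\sigma_e^{2\,(t+1)} = \frac{y'y - \hat\beta' X'y - \hat u' Z'y}{n - \mathrm{rank}(X)},
\]

where C^{uu} is the u-block of the inverse of the mixed-model coefficient matrix built with the round-t variances, q is the number of animals, and n the number of records. The trace term is why the sparse inverse of the next slide matters.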

  24. Sparse matrix inversion • Takahashi method: can obtain inverse elements only at the positions where L ≠ 0 • Inverses obtained for sparse matrices as large as 1000k x 1000k • Cost ≈ 2 x sparse factorization
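
The identity behind the method: if A = LDL' with unit lower-triangular L, then

\[
A^{-1} = D^{-1}L^{-1} + (I - L')\,A^{-1},
\]

so, sweeping backwards from the last row, the elements of A⁻¹ lying in the sparsity pattern of L' can be computed from already-known elements of A⁻¹ and the nonzeros of L, without ever forming L⁻¹.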

  25. REML, properties • Derivative free methods reliable only for simple problems • Derivative methods – Difficult formulas • Nearly impossible for nonstandard models – High computing cost ≈ quadratic – Easy determination of termination

  26. Bayesian Methods and MCMC

  27. Samples

  28. Approximate p(σ²_e | y)

  29. Approximate p(σ²_p | y)

  30. Approximate p(σ²_a | y)

  31. MCMC, properties – Much simpler formulas – Can accommodate • large models • complicated models – Can take months if not optimized – Details are important, and it is hard to decide when to stop
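
A minimal single-trait Gibbs sampler sketch for y = Xβ + Zu + e with u ~ N(0, Iσ²_u) and e ~ N(0, Iσ²_e), sampling the location effects in one block and the two variances from scaled inverse chi-square full conditionals. The priors (ν = 4, S² = 1) and all names are illustrative, and a real implementation would use A⁻¹ and sparse solvers:

import numpy as np

def gibbs(y, X, Z, n_iter=2000, nu=4.0, s2=1.0, seed=1):
    rng = np.random.default_rng(seed)
    n, p, q = len(y), X.shape[1], Z.shape[1]
    W = np.hstack([X, Z])
    var_u, var_e = 1.0, 1.0
    keep = []
    for it in range(n_iter):
        # 1) theta = (beta, u) | variances, y  ~  N(C^{-1} W'y, C^{-1} var_e)
        C = W.T @ W
        C[p:, p:] += np.eye(q) * (var_e / var_u)
        L = np.linalg.cholesky(C)
        mean = np.linalg.solve(C, W.T @ y)
        theta = mean + np.linalg.solve(L.T, rng.standard_normal(p + q)) * np.sqrt(var_e)
        u = theta[p:]
        e = y - W @ theta
        # 2) variances | rest  ~  scaled inverse chi-square
        var_u = (u @ u + nu * s2) / rng.chisquare(q + nu)
        var_e = (e @ e + nu * s2) / rng.chisquare(n + nu)
        keep.append((var_u, var_e))
    return np.array(keep)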

  32. Optimization in sampling methods • 10k-1 million samples • Slow if the equations are regenerated each round • If the equations can be represented with LHS = R ⊗ X1 + G ⊗ X2 and RHS = R ⊗ y, where R, G are the estimated small matrices and X1, X2, y are constant, then X1, X2 and y can be created and stored once • Requires tricks if the model differs by trait or traits are missing
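
A minimal sketch of that caching idea: build the constant blocks once, then re-assemble the full system each sampling round from the newly drawn small matrices with Kronecker products. Here the small matrices are assumed to enter through their inverses, as is usual for multiple-trait mixed-model equations, and all names and shapes are illustrative:

import numpy as np

def assemble(R, G, X1, X2, y_blk):
    """Re-assemble LHS and RHS for one sampling round from cached X1, X2, y_blk.

    X1, X2 : constant n x n blocks, built from the data/pedigree once
    y_blk  : constant right-hand-side vector of length t*n
    R, G   : small t x t matrices re-sampled every round
    """
    Rinv, Ginv = np.linalg.inv(R), np.linalg.inv(G)
    lhs = np.kron(Rinv, X1) + np.kron(Ginv, X2)
    rhs = np.kron(Rinv, np.eye(X1.shape[0])) @ y_blk
    return lhs, rhs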

  33. Software • SAS (fixed models or small mixed models) • Custom • Packages – PEST, VCE (Groeneveld et al.) – ASREML (Gilmour) – DMU (Jensen et al.) – MATVEC (Wang et al.) – BLUPF90 etc. (Misztal et al.)

  34. Larger data, more complicated models, simpler computing?

  35. Computing platforms • Past – Mainframe – Supercomputers • Sparse computations vectorize! • Current – PCs + workstations – Windows/Linux, Unix • Parallel and vector processing not important

  36. Random regression model on a parallel processor (Madsen et al., 1999; Lidauer et al., 1999) • Goal: compute Σ (Wᵢ'Wᵢ x); Wᵢ – large sparse vectors • Approaches: a) distribute the computation of Σ (Wᵢ'Wᵢ)x to separate processors, b) optimize the scalar algorithm first, to Σ Wᵢ'(Wᵢ x) • If Wᵢ has 30 nonzeros: a) 900 multiplications, b) 60 multiplications • Scalar optimization more important than brute-force parallelization
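
A small illustration of the two orderings, counting multiplications for a single record; the 30-nonzero row is made up to match the slide's example:

import numpy as np

n_eq, nnz = 1000, 30
idx = np.arange(nnz)                       # positions of w_i's nonzeros
w = np.ones(nnz)                           # their values
x = np.random.default_rng(0).random(n_eq)

# a) form the outer product w_i w_i' first: nnz*nnz = 900 multiplications
outer = np.outer(w, w)                     # 30 x 30 block of W_i'W_i
contrib_a = outer @ x[idx]

# b) scalar-optimized order w_i * (w_i' x): nnz + nnz = 60 multiplications
contrib_b = w * (w @ x[idx])

print(np.allclose(contrib_a, contrib_b))   # True: same contribution, far fewer flops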

  37. Other Models • Censored • Survival • Threshold • …..

  38. Issues • 0.99 problem • Sophistication of statistics vs. understanding of problem vs. data editing • Undesired responses of selection – Less fitness,…. – Aggressiveness (swine, poultry) • Challenges of molecular genetics

  39. Molecular Genetics • Attempts to identify effects of genes on individual traits • Simple statistical methodologies • Methodology for joint analyses with phenotypic and DNA data difficult • Active research area

  40. Conclusions • Animal breeding is compute intensive – Large systems of equations – Sparse matrices • Research has large economic value
