Bayesian Haplotype Inference via the Dirichlet Process



  1. Bayesian Haplotype Inference via the Dirichlet Process. Eric Xing, Michael Jordan, Roded Sharan. Presented by Amrudin Agovic.

  2. Motivation
     - 99.9% of human DNA is shared across individuals
     - The remaining 0.1% accounts for the differences
     - We need to determine what that 0.1% is
     - Goal: find the genes responsible for diseases

  3. Background
     - Humans have 23 pairs of chromosomes in their cells
     - 23 chromosomes come from the father and 23 from the mother
     - Certain parts of the genome are inherited unchanged
     - Other genetic information is recombined

  4. Background
     - Allele: the genetic coding that occupies a position on the chromosome
     - Genotype: the unordered pairs of alleles in a region (one allele from each chromosome)
     - Phase: the association of each allele with its chromosome (not given)
     - SNP: Single Nucleotide Polymorphism, a difference in one nucleotide (A, C, G, T)
     - Haplotype: the set of associated SNP alleles in a region of a chromosome. A haplotype is inherited as a unit.
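
To make the missing-phase problem concrete, the sketch below enumerates every unordered haplotype pair that is consistent with a given genotype. The function name `compatible_haplotype_pairs` and the representation of a genotype as a list of allele pairs are assumptions made for illustration; they do not come from the slides.

```python
from itertools import product

def compatible_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with a genotype.

    `genotype` is a list of unordered allele pairs, one per SNP locus,
    e.g. [("A", "A"), ("C", "T")]. (Hypothetical representation.)
    """
    # Only heterozygous loci are ambiguous: either allele may sit on either chromosome.
    het = [i for i, (x, y) in enumerate(genotype) if x != y]
    pairs = set()
    for flips in product([False, True], repeat=len(het)):
        flip_at = dict(zip(het, flips))
        h1, h2 = [], []
        for i, (x, y) in enumerate(genotype):
            if flip_at.get(i, False):
                x, y = y, x
            h1.append(x)
            h2.append(y)
        # A frozenset makes the pair unordered, so mirrored assignments collapse.
        pairs.add(frozenset(["".join(h1), "".join(h2)]))
    return pairs

if __name__ == "__main__":
    for pair in compatible_haplotype_pairs([("A", "A"), ("C", "T"), ("G", "T")]):
        print(sorted(pair))
```

With k heterozygous loci (k ≥ 1) there are 2^(k-1) candidate pairs, which is why phase cannot simply be read off the genotype.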

  5. Background

  6. Dirichlet Process Representation. Let:
     - $G_0(\phi)$ be a base measure for the Dirichlet process
     - $A^{(k)} := [A_1^{(k)}, \ldots, A_J^{(k)}]$ be a founding haplotype configuration (ancestral template) at loci $t = [1, \ldots, J]$
     - $\theta^{(k)}$ be the mutation rate of the ancestor
     - $\phi$ be the parameter associated with a mixture component, where $\phi_k = \{A^{(k)}, \theta^{(k)}\}$

  7. Dirichlet Process Representation
     - Use the Chinese Restaurant Process
     - Associate each population haplotype with a table
     - For each table, sample $\phi_k = \{A^{(k)}, \theta^{(k)}\}$
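
The Chinese Restaurant Process mentioned above can be sketched as follows. This is a minimal illustration of the generic process only: the concentration parameter `alpha` and the function name are assumptions, and the per-table draw of the ancestor and mutation rate is omitted.

```python
import random

def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Seat customers one at a time and return (assignments, table_sizes)."""
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of customers at table k
    assignments = []   # assignments[i] = table index of customer i
    for _ in range(n_customers):
        # An existing table k is chosen with weight n_k, a new table with weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[k] += 1        # join an existing table
        assignments.append(k)
    return assignments, table_sizes

if __name__ == "__main__":
    assignments, sizes = chinese_restaurant_process(20, alpha=1.0)
    print(assignments)
    print(sizes)
```

In the haplotype setting each customer would be an individual haplotype and each table an ancestral template, so the number of ancestors does not have to be fixed in advance.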

  8. The Model

  9. Assumptions
     - $G_0(A, \theta) = p(A)\,p(\theta)$
     - $p(A)$ is a uniform distribution over all haplotypes
     - $p(\theta)$ is $\mathrm{Beta}(\alpha_h, \beta_h)$

  10. Distributions
      - Considering mutations at all alleles:
      - Integrating out $\theta$:

  11. Noisy Observation Model
      - The observed genotype at a locus is determined by the paternal and maternal alleles
      - If the genotype disagrees with those alleles, the disagreement is penalized
      - $\gamma$ has a Beta prior

  12. Pedigree-Haplotyper

  13. Inference: Gibbs Sampling
      - $\gamma$ and $\theta$ are integrated out
      - Sample $c_{i_t}$, $a_j^{(k)}$, $h_{i_t,j}^{(k)}$
      - Step 1: given the current values of the hidden haplotypes, sample $c_{i_t}$ and $a_j^{(k)}$

  14. Gibbs Sampling
      - Step 2: given the ancestral assignment and the ancestral pool, sample the haplotypes
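
The conditional distributions shown on these slides are not preserved, so the sketch below uses the standard collapsed Gibbs update for a Dirichlet process mixture as a stand-in for the assignment part of step 1. It is an assumption that the talk's sampler has this form: an existing ancestor k is weighted by its count n_k times the likelihood of the haplotype under that ancestor, and a new ancestor by a concentration parameter `alpha` times the likelihood under the base measure. All names are hypothetical.

```python
import math
import random

def sample_assignment(table_counts, log_liks, log_lik_new, alpha, rng):
    """Sample an ancestor index for one haplotype.

    table_counts[k]: haplotypes currently assigned to ancestor k,
                     excluding the one being resampled (must be > 0).
    log_liks[k]:     log-likelihood of the haplotype under ancestor k.
    log_lik_new:     log-likelihood under a freshly drawn ancestor.
    Returns len(table_counts) when a new ancestor is chosen.
    """
    log_w = [math.log(n) + ll for n, ll in zip(table_counts, log_liks)]
    log_w.append(math.log(alpha) + log_lik_new)
    # Subtract the maximum before exponentiating to avoid underflow of all weights.
    m = max(log_w)
    weights = [math.exp(lw - m) for lw in log_w]
    return rng.choices(range(len(weights)), weights=weights)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_assignment([5, 3], [-2.0, -1.0], -6.0, alpha=1.0, rng=rng))
```

The likelihood terms could come from a collapsed match/mismatch model with the Beta prior integrated out, which is consistent with the slide's note that γ and θ are integrated out.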

  15. Metropolis-Hastings
      - With a long list of loci and a uniform prior $p(a)$, the probability of sampling a new ancestor is very small
      - This leads to slow mixing
      - Remedy: sample the ancestor assignment using a proposal distribution

  16. Metropolis-Hastings
      - In the acceptance probability, the proposal factor cancels out
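
One common way for the proposal factor to cancel is to draw the proposed assignment from the prior. In that case the Metropolis-Hastings ratio p(c*) L(c*) q(c) / (p(c) L(c) q(c*)) reduces to the likelihood ratio L(c*) / L(c). The sketch below assumes that setting; the slides do not preserve the exact proposal, and the function name is hypothetical.

```python
import math
import random

def mh_accept(log_lik_current, log_lik_proposed, rng):
    """Accept or reject a proposal that was drawn from the prior.

    With q equal to the prior, the prior and proposal terms cancel and the
    acceptance probability is min(1, L(proposed) / L(current)).
    """
    log_ratio = log_lik_proposed - log_lik_current
    # Proposals that do not lower the likelihood are always accepted.
    return log_ratio >= 0 or rng.random() < math.exp(log_ratio)

if __name__ == "__main__":
    rng = random.Random(0)
    accepted = sum(mh_accept(-10.0, -11.0, rng) for _ in range(10000))
    print(accepted / 10000)  # close to exp(-1), about 0.37
```

Working with log-likelihoods keeps the comparison stable when many loci make the raw likelihoods extremely small.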

  17. Experiments
      - Simulated data: haplotypes randomly paired to form genotypes
      - Performance compared to PHASE

  18. Experiments
      - Two real data sets: 129 individuals, and 90 individuals from 4 populations
      - Dataset 1:

  19. Experiments
      - Dataset 2: small sample size, a tougher data set
      - Haplotyper outperforms PHASE

  20. Conclusions
      - The algorithm outperforms PHASE on two data sets, by a large margin on one of them
      - The strength of the proposed approach is its flexibility
      - It can be extended to incorporate aspects of evolutionary dynamics and other factors
      - Illustrated example: pedigree information
