Nonparametric combinatorial sequence models


  1. Nonparametric combinatorial sequence models
     Fabian L. Wauthier, UC Berkeley, with Nebojsa Jojic (MSR) and Michael I. Jordan (UCB)
     30th March, 2011
     Fabian L. Wauthier: Nonparametric combinatorial sequence models, 1

  8. Biological motivation: Sequence variability

     Y N Q S E A G S H I I Q R M Y G C D
     Y N Q S E A G S H T L Q R M Y G C D
     Y N Q S E D G S H T I Q I M Y G C D

     ◮ Suppose we are given aligned sequences.
     ◮ Interest in understanding sequence variability:
       • Functional properties, domains, ancestral inference, etc.
     ◮ Many simplifying assumptions in previous work:
       • Site independence: Kingman coalescents, phylogenetic trees.
       • Full site dependence: Mixture models.
       • Sequential stochastic process: HMMs, changepoint models.

     Our interest: sequences where these assumptions do not hold
     ◮ Partial, long-range site dependencies

  15. Example: MHC I proteins
      [Figure credit: Freeman and Company, 2007]
      ◮ MHC I proteins present peptide chains to T-cell receptors.
      ◮ Peptides originating from virus protein ⇒ destruction of cell.
      ◮ Variability: duplication + mutation + fitness pressure.

      Our interest: model sequence variability, not its origins.

  20. Example: MHC I proteins
      [Figure credit: Freeman and Company, 2007]
      ◮ Binding site decomposes into pockets (Sidney et al., 2008). Expect partial site linkage.
        ⇒ Full site (in)dependence inappropriate
      ◮ Variability due to evolutionary pressure on 3D binding site. Variable sites are discontiguous ⇒ long-range dependencies.
        ⇒ Markovian analysis inappropriate

  26. Our model: high level
      Main idea: Each sequence is composed of smaller components.
      1. Sites grouped into discontiguous, aligned components (gray).
      2. Components of a sequence assigned a PSSM (colors).
      3. Symbols sampled from assigned PSSMs.

      Cf. Probabilistic index map (Jojic and Caspi, CVPR 2004; Jojic et al., UAI 2004)
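
The three steps on the slide above can be read as a generative procedure. The Python sketch below is only an illustration under stated assumptions, not the authors' implementation: the site grouping is fixed by hand, each group gets a small fixed pool of randomly drawn PSSMs, and PSSMs are assigned uniformly at random. The names `random_pssm`, `sample_sequences`, `site_groups`, and `pssms_per_group` are hypothetical.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def random_pssm(n_sites, rng):
    """Return one categorical distribution over ALPHABET per site.

    Assumption: weights are drawn uniformly and normalised; the talk does
    not specify a prior over PSSM entries.
    """
    pssm = []
    for _ in range(n_sites):
        weights = [rng.random() for _ in ALPHABET]
        total = sum(weights)
        pssm.append([w / total for w in weights])
    return pssm

def sample_sequences(site_groups, pssms_per_group, n_seqs, seed=0):
    """Sample sequences component by component.

    site_groups must partition the site indices 0..n_sites-1; a group may
    be discontiguous, e.g. [0, 3, 7].
    """
    rng = random.Random(seed)
    n_sites = sum(len(group) for group in site_groups)
    # A pool of candidate PSSMs for every site group (step 1 is the grouping itself).
    pools = [
        [random_pssm(len(group), rng) for _ in range(pssms_per_group)]
        for group in site_groups
    ]
    sequences, assignments = [], []
    for _ in range(n_seqs):
        seq = [None] * n_sites
        chosen = []
        for g, sites in enumerate(site_groups):
            # Step 2: assign one PSSM to this component of this sequence.
            k = rng.randrange(pssms_per_group)
            chosen.append(k)
            # Step 3: sample each symbol of the component from the assigned PSSM.
            for row, site in zip(pools[g][k], sites):
                seq[site] = rng.choices(ALPHABET, weights=row)[0]
        sequences.append("".join(seq))
        assignments.append(chosen)
    return sequences, assignments

# Three discontiguous site groups over 8 aligned sites (made-up example).
seqs, assign = sample_sequences([[0, 3, 7], [1, 2], [4, 5, 6]],
                                pssms_per_group=2, n_seqs=4)
```

Two sequences that share a PSSM index for a group are drawn from the same distributions at that group's sites, even when those sites are far apart in the sequence. This is one way to picture the partial, long-range dependencies the talk is after.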

  30. Missing information
      Do not know how many site groups/PSSMs there are!
      ◮ Our approach: put a prior distribution on these unknowns.
      ◮ Our model: a Chinese Restaurant Franchise (CRF) conditioned on a Chinese Restaurant Process (CRP).
        1. CRP: induces prior on number of site groups.
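
To make the CRP item concrete, here is a minimal Python sketch of the standard Chinese Restaurant Process used to partition sites into groups. It assumes sites play the role of customers and site groups the role of tables, which is one reading of the slide. The concentration parameter `alpha` and the function name `crp_partition` are not from the talk. The sketch covers only the CRP, not the CRF that the slide conditions on it.

```python
import random

def crp_partition(n_sites, alpha, seed=0):
    """Partition sites 0..n_sites-1 into groups with a standard CRP.

    Each new site joins an existing group with probability proportional to
    the group's current size, or opens a new group with probability
    proportional to alpha (assumed > 0).
    """
    rng = random.Random(seed)
    groups = []
    for site in range(n_sites):
        # One weight per existing group, plus alpha for a new group.
        weights = [len(group) for group in groups] + [alpha]
        k = rng.choices(range(len(groups) + 1), weights=weights)[0]
        if k == len(groups):
            groups.append([site])   # open a new site group
        else:
            groups[k].append(site)  # join an existing site group
    return groups

# 18 sites, matching the length of the aligned example sequences.
groups = crp_partition(18, alpha=1.0)
```

The number of groups is not fixed in advance: it is `len(groups)` after sampling, and larger `alpha` tends to give more groups. Because group membership ignores site position, the sampled groups can be discontiguous.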
