
Information-Theoretic Analysis of Molecular (Co)Evolution Using Graphics Processing Units


  1. Information-Theoretic Analysis of Molecular (Co)Evolution Using Graphics Processing Units
     Michael Waechter, Kathrin Jaeger, Stephanie Weissgraeber, Sven Widmer, Michael Goesele, and Kay Hamacher
     [Title graphic: excerpt of a multiple sequence alignment]
     June 18, 2012 | ECMLS 2012 | Michael Waechter & Stephanie Weissgraeber

  2. Motivation
     ● Large number of Multiple Sequence Alignments (MSAs) available, some of them very large
     ● E.g., HIV protease [1]: > 45,000 sequences of length > 1400
     ● Put them to use for coevolutionary and structural analysis
     ● But: Our computations take > 25 days
     [1] Pan et al.: “The HIV positive selection mutation database”

  3. Outline
     ● In this talk we will show…
         ● MSA analysis using Mutual Information
         ● GPU parallelization & speed improvements
         ● 3-point Mutual Information contributions
         ● an application to a well-known protein
         ● that the use of this is beneficial

  4. Introduction – Mutual Information
     ● Given an MSA:
         Sequence 1: AEERYAEYKEAFTLFDSDGD...
         Sequence 2: TEEQGRQFRQMFEMFDKNGD...
         Sequence 3: TDEQQRQYRQMFETFDKDGN...
         Sequence 4: TKEQVEEFKQAFSMFDTDGD...
         Sequence 5: SEEQVAEFKEAFDRFDKNKD...
         Sequence 6: SKEQVAKFKEAFDRIDKNKD...
         Sequence 7: SPEQVAEFKQAFSRFDKNGD...
         Sequence 8: SEEQVAKFKAAFSRFDTNGD...
     ● Mutual Information between two columns (correlation → coevolution)
     ● Iteration over all column pairs → MI matrix
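The formula on this slide did not survive in the transcript. A minimal sketch, assuming the standard definition MI(X;Y) = Σ p(x,y) · log2( p(x,y) / (p(x) · p(y)) ) with probabilities estimated as plain column frequencies, could look as follows. The function names, the base-2 logarithm, and the absence of gap handling or pseudocounts are assumptions made for illustration, not details taken from the talk. The input is the eight-sequence excerpt shown on the slide.

```python
from collections import Counter
from math import log2

# The eight example sequences from the slide (first 20 alignment columns).
MSA = [
    "AEERYAEYKEAFTLFDSDGD",
    "TEEQGRQFRQMFEMFDKNGD",
    "TDEQQRQYRQMFETFDKDGN",
    "TKEQVEEFKQAFSMFDTDGD",
    "SEEQVAEFKEAFDRFDKNKD",
    "SKEQVAKFKEAFDRIDKNKD",
    "SPEQVAEFKQAFSRFDKNGD",
    "SEEQVAKFKAAFSRFDTNGD",
]


def column(msa, i):
    """Return alignment column i as a list of residues."""
    return [seq[i] for seq in msa]


def mutual_information(col_x, col_y):
    """Empirical mutual information (in bits) between two MSA columns."""
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def mi_matrix(msa):
    """Symmetric MI matrix over all column pairs (hypothetical helper)."""
    length = len(msa[0])
    cols = [column(msa, i) for i in range(length)]
    matrix = [[0.0] * length for _ in range(length)]
    for i in range(length):
        for j in range(i, length):
            matrix[i][j] = matrix[j][i] = mutual_information(cols[i], cols[j])
    return matrix
```

In this excerpt the third column is fully conserved ("E" in every sequence), so its MI with any other column is zero, which matches the later remark that there is no coevolution without variation.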

  5. Introduction – Shuffling Null-Model
     ● MI is sensitive to underlying amino acid distribution
     ● Computational Normalization: Shuffling Null-Model [2]
     ● Is MI distinguishable from “random evolution” MI?
     [2] K. Hamacher: “Relating sequence evolution of HIV1-protease to its underlying molecular mechanics”

  6. Introduction – Shuffling Null-Model
     ● Compute original MI
     ● Iterate 10,000 times:
         ● Shuffle each MSA column
         ● Compute rand. MI matrix
     ● Normalize original MI using random MI
     [Figure: MSA excerpt shown before and after column shuffling]
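The steps on this slide can be sketched for a single column pair as below. The normalization formula itself is not preserved in the transcript, so the z-score used here (observed MI minus the null mean, divided by the null standard deviation) is an assumption, as are the function names and the default iteration count, which is smaller than the 10,000 used in the talk.

```python
import random
from collections import Counter
from math import log2
from statistics import mean, pstdev


def mutual_information(col_x, col_y):
    """Empirical mutual information (in bits) between two columns."""
    n = len(col_x)
    count_x, count_y = Counter(col_x), Counter(col_y)
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in Counter(zip(col_x, col_y)).items()
    )


def shuffled_column(col, rng):
    """Return a shuffled copy; the amino acid composition is unchanged."""
    copy = list(col)
    rng.shuffle(copy)
    return copy


def null_model_zscore(col_x, col_y, iterations=1000, seed=0):
    """Assumed normalization: z-score of observed MI against shuffled MI."""
    rng = random.Random(seed)
    observed = mutual_information(col_x, col_y)
    null = [
        mutual_information(shuffled_column(col_x, rng), shuffled_column(col_y, rng))
        for _ in range(iterations)
    ]
    mu, sigma = mean(null), pstdev(null)
    if sigma == 0.0:
        # No variation in the null distribution (e.g. a conserved column).
        return 0.0
    return (observed - mu) / sigma
```

Because shuffling only permutes residues within a column, each column keeps its amino acid distribution while any pairing between columns is destroyed, which is what lets the null model correct for MI's sensitivity to that distribution.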

  7. Massive parallelism
     ● Highly compute-intensive
     ● HIV-1 protease on single core:
         ● MI computation for all column pairs: ~3.5 min
         ● Repeat for 10,000 iterations: > 25 days
     ● But:
         ● Computation of each MI matrix entry independent of all others
         ● Shuffling of each MSA column independent of all others
     ● Parallelizable (to hundreds of thousands of threads)
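The figures on this slide can be checked with a little arithmetic. The sketch below uses only the numbers quoted in the talk; taking 1400 as the MSA length is an assumption, since the slides only give it as a lower bound.

```python
MI_PASS_MINUTES = 3.5   # slide: "~3.5 min" for one full MI matrix on a single core
ITERATIONS = 10_000     # slide: shuffling null-model iterations
MSA_LENGTH = 1400       # slide: "> 1400" columns, used here as a lower bound

# Time for the MI passes alone, ignoring the cost of shuffling.
total_days = MI_PASS_MINUTES * ITERATIONS / (60 * 24)   # about 24.3 days

# Number of independent MI matrix entries (unordered column pairs).
column_pairs = MSA_LENGTH * (MSA_LENGTH - 1) // 2       # 979,300
```

About 24 days for the MI passes alone, with shuffling on top, is roughly in line with the quoted "> 25 days". Close to a million independent matrix entries is also the scale at which hundreds of thousands of threads can be kept busy.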

  8. GPU Implementation
     ● Iterate 10,000 times:
         ● Shuffling
             – Map MSA columns to blocks of threads
             – Shuffle columns (GPU-suited algorithm)
             – Synchronize
         ● MI computation
             – Map MI matrix entries to blocks of threads (suitable for MSA access pattern)
             – Compute MI matrix entries
             – Synchronize
     ● Combine results & normalize orig. MI with randomized MI
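The slides do not say how MI matrix entries are assigned to thread blocks. One plausible reading, sketched here in plain Python standing in for a CUDA launch grid, is a tiling of the upper triangle of the matrix so that each block only needs two small groups of MSA columns. The tile layout, the function names, and the parameters are all assumptions for illustration and not the authors' kernel.

```python
def block_entries(block_x, block_y, tile, n_cols):
    """Column pairs (i, j) with i < j handled by one hypothetical thread block."""
    rows = range(block_x * tile, min((block_x + 1) * tile, n_cols))
    cols = range(block_y * tile, min((block_y + 1) * tile, n_cols))
    return [(i, j) for i in rows for j in cols if i < j]


def launch_grid(n_cols, tile):
    """Map each upper-triangle tile (block_x <= block_y) to its matrix entries."""
    n_tiles = -(-n_cols // tile)  # ceiling division
    return {
        (bx, by): block_entries(bx, by, tile, n_cols)
        for bx in range(n_tiles)
        for by in range(bx, n_tiles)
    }
```

With this layout every block reads at most 2 × tile columns of the MSA, and since no entry depends on another, blocks only need to synchronize once all entries of an iteration are done.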

  9. Speed Results
     MSA              Size                              GeForce GTX 480   4 threads on Core i7-960   Speed-up
     Calmodulin       753 sequences of length 264       1.1 min           13.4 min                   ~12x
     HIV-1 protease   > 45,000 seqs. of length > 1400   1.85 days         7.3 days                   ~4x
     ● Problem size dependent
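The quoted speed-ups follow directly from the timings; a short check using only the numbers on the slide (the variable names are illustrative):

```python
# (GPU time, CPU time) per MSA, each pair in the same unit as on the slide.
timings = {
    "Calmodulin": (1.1, 13.4),        # minutes
    "HIV-1 protease": (1.85, 7.3),    # days
}

speed_up = {name: cpu / gpu for name, (gpu, cpu) in timings.items()}
# Calmodulin: about 12.2x, HIV-1 protease: about 3.9x
```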

  10. Implications
      ● One order of magnitude speed-up
      ● Quickly redo previous steps (e.g., alignment) and recompute MI
      ● New analysis tool feasible: 3-point MI: Coevolution of a ‘3-clique’ of MSA columns
      ● Can we deduce more information from 3-point MI than we could from 2-point MI alone?
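The transcript does not define the 3-point quantity. As an illustration only, the sketch below uses total correlation, H(X) + H(Y) + H(Z) − H(X,Y,Z), which is one common generalization of MI to three variables; the talk may use a different definition. It also counts how many column triples there are for the calmodulin MSA (length 264, from the speed results), which shows why the 3-point analysis depends on the speed-up.

```python
from collections import Counter
from math import comb, log2


def entropy(symbols):
    """Empirical Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def total_correlation(col_x, col_y, col_z):
    """Assumed 3-point measure: H(X) + H(Y) + H(Z) - H(X, Y, Z)."""
    joint = list(zip(col_x, col_y, col_z))
    return entropy(col_x) + entropy(col_y) + entropy(col_z) - entropy(joint)


# Work per null-model iteration for an MSA of length 264 (calmodulin):
pairs = comb(264, 2)     # 34,716 column pairs
triples = comb(264, 3)   # 3,031,864 column triples
```

Going from pairs to triples multiplies the number of entries per iteration by roughly 87 for this MSA.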

  11. Calmodulin
      ● 149 amino acids
      ● Ca²⁺ binding → conformational change
      ● Regulates various signaling pathways

  12. Coevolution in Calmodulin – 2-point MI
      ● Finding coevolving pairs of amino acids
      ● Structural or functional connection
      ● Here: Coevolution within N- and C-terminus
          ● Ca²⁺ binding
          ● Propagation of conformational change
      ● Conserved inner helix
          ● No coevolution without variation

  13. Coevolution in Calmodulin – 3-point MI
      ● ‘3-cliques’ of amino acids
      ● Higher-order correlations
      ● Concerted motions
      ● Binding sites

  14. Coevolution in Calmodulin – 3-point MI
      ● ‘3-cliques’ of amino acids
      ● Color indicates the frequency with which an amino acid contributes to the ‘3-cliques’ set
      ● Key residues for important functions

  15. Conclusions
      ● MI for coevolutionary analysis
      ● GPU implementation ~10x faster on typical MSAs
      ● 3-point MI analysis possible in acceptable time
      ● 3-point MI does reveal new insights
      ● Next step could be k-point MI
      ● It may be possible to detect key residues in unknown proteins

  16. What happened since?
      ● Multi-GPU parallelization:
          ● Distribute Shuffling Null-Model iterations among GPUs
          ● First tests: 32 GPUs → ~32x speed-up (on top of basic GPU speed-up!)
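Because the null-model iterations are independent of each other, they can be divided among devices and only their summary statistics need to be merged at the end. The sketch below shows one way to do that split and merge. The function names and the choice of merging counts, sums, and sums of squares are assumptions; the slides do not describe how results are combined.

```python
def split_iterations(total, n_devices):
    """Distribute `total` iterations as evenly as possible over the devices."""
    base, extra = divmod(total, n_devices)
    return [base + (1 if d < extra else 0) for d in range(n_devices)]


def partial_stats(values):
    """Per-device summary: (count, sum, sum of squares) of null MI values."""
    return (len(values), sum(values), sum(v * v for v in values))


def combine(partials):
    """Merge per-device summaries into the overall mean and standard deviation."""
    n = sum(p[0] for p in partials)
    s = sum(p[1] for p in partials)
    ss = sum(p[2] for p in partials)
    mean = s / n
    variance = max(ss / n - mean * mean, 0.0)  # guard against rounding below 0
    return mean, variance ** 0.5


shares = split_iterations(10_000, 32)
ideal_speed_up = 10_000 / max(shares)   # bounded by the busiest device
```

For 10,000 iterations on 32 devices the busiest device runs 313 iterations, so the ideal speed-up is just under 32x, consistent with the first tests reported on the slide.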

  17. Thank you.
      Please visit tinyurl.com/tud-comic for code & documentation or contact us.
