
Bayesian Multi-Way Models for Data Translation in Computational Biology



  1. Bayesian Multi-Way Models for Data Translation in Computational Biology. Tommi Suvitaival. Aalto University, School of Science, Department of Information and Computer Science. Doctoral Dissertations 171/2014. ISBN 978-952-60-5932-7 (printed), ISBN 978-952-60-5933-4 (pdf), ISSN-L 1799-4934, ISSN 1799-4934 (printed), ISSN 1799-4942 (pdf). www.aalto.fi

     Inference of differences between samples is a fundamental problem in computational biology. Molecular measurements of biological organisms produce high-dimensional data but the number of test subjects in experiments is limited. In this thesis, computational methods are presented for finding differences between high-dimensional observations and for extensions of this problem. Since the effects and side-effects of new drug treatments are unknown and potentially dangerous, model organisms are used to study human diseases and their treatments. The computational translation of the outcome of an experiment from the model organism to humans is a problem, which is addressed in this thesis. Presented data translation methods identify responses to experimental treatments that are conserved across organisms.

  2. Introduction
     ◮ Molecular measurements of biological organisms to study response to:
       ◮ disease
       ◮ medical treatment
       ◮ environment
     ◮ Measurements can be made:
       ◮ in vivo: cell extracts from humans or model organisms
       ◮ in vitro: cell lines grown in laboratory
     Hilvo et al., Cancer Res. 2011

  3. Molecular activity in biological cell
     Watson & Crick, Nature 1953; Joyce & Palsson, Nat. Rev. Mol. Cell Biol. 2006

  4. Machine learning for computational biology
     ◮ Molecular measurements:
       ◮ Large data sets
       ◮ Uncertainty/noise
       ⇒ Automated and robust data-driven analysis tools needed
     ◮ Bayesian approach to probability:
       ◮ Take uncertainty into account
       ◮ Describe the generative process of the data
       ⇒ Integration of multiple measurement sources
       ◮ Incorporate existing knowledge by specifying:
         ◮ the model structure
         ◮ priors
     [Figure: posterior probability density of a covariate effect, with zero marked on the effect axis]
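The posterior density of a covariate effect pictured on slide 4 can be illustrated with a minimal sketch. It is not taken from the dissertation: it assumes a single measured variable, Gaussian noise with a known standard deviation, a flat prior on the baseline level, and a zero-centred Gaussian prior on the effect. The function name `effect_posterior`, its parameters, and the toy numbers are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def effect_posterior(treated, untreated, noise_sd=1.0, prior_sd=1.0):
    """Posterior of a treatment effect under a Normal(0, prior_sd^2) prior.

    Assumptions (illustrative only): known Gaussian noise, flat prior on the
    baseline, so the data enter through the difference of the group means.
    """
    diff = fmean(treated) - fmean(untreated)
    # Sampling variance of the difference of two group means
    like_var = noise_sd ** 2 * (1 / len(treated) + 1 / len(untreated))
    prior_prec = 1 / prior_sd ** 2
    like_prec = 1 / like_var
    # Conjugate normal-normal update: precisions add, mean is precision-weighted
    post_var = 1 / (prior_prec + like_prec)
    post_mean = post_var * like_prec * diff  # prior mean is zero
    return NormalDist(post_mean, math.sqrt(post_var))


# Toy data (made up for illustration)
treated = [1.2, 0.8, 1.5, 1.1]
untreated = [0.1, -0.2, 0.3, 0.0]
post = effect_posterior(treated, untreated)
# Posterior probability that the effect is positive
p_positive = 1 - post.cdf(0.0)
print(post.mean, post.stdev, p_positive)
```

The posterior mean is pulled from the raw group difference toward zero, which is how the prior expresses the expectation that most effects are small; with few samples the pull is stronger and the posterior wider.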

  5. Computational medicine & contributions
     ◮ Model organisms for studying effects of:
       ◮ genomic mutations
       ◮ new medical treatments, potentially dangerous

  6. Computational medicine & contributions
     ◮ Model organisms for studying effects of:
       ◮ genomic mutations
       ◮ new medical treatments, potentially dangerous
     ◮ Dissertation: statistical modeling of effects in molecular measurement data with
       ◮ high-dimensional, noisy measurements
       ◮ multiple measurement types
       ◮ multiple organisms

  7. Computational medicine & contributions
     ◮ Model organisms for studying effects of:
       ◮ genomic mutations
       ◮ new medical treatments, potentially dangerous
     ◮ Dissertation: statistical modeling of effects in molecular measurement data with
       ◮ high-dimensional, noisy measurements
       ◮ multiple measurement types
       ◮ multiple organisms
     Kaski, MLAB 2013

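  8. P I: Multi-Way Model for “n < p”
     [Figure: (1) Data: a data space of 100...300 metabolites, with samples grouped by covariates a (healthy / diseased) and b (untreated / treated). (2) Model: an ANOVA part with covariates a, b and effects α, β, αβ; latent variables x^lat; a factor analysis (FA) part with V and µ; observed data x; n samples. (3) Result.]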
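The model diagram on slide 8 names an ANOVA part (effects α, β, αβ), latent variables x^lat, and a factor analysis part (V, µ) that produces the observed data x. One plausible reading of that diagram, written as a generative simulation, is sketched below. The exact way the parts connect, the number of latent components, the effect sizes, and the noise levels are all assumptions made for illustration, and the function name `simulate_multiway` is hypothetical.

```python
import random


def simulate_multiway(n_per_cell=3, p=100, seed=0):
    """Simulate data from a two-way ANOVA acting on latent factors (sketch).

    Assumed structure: x_lat = alpha_a + beta_b + (alpha*beta)_ab + noise,
    and x = mu + V x_lat + noise. All numeric values are made up.
    """
    rng = random.Random(seed)
    k = 2  # number of latent components (assumed)
    # Effects on the latent variables; level 0 is the baseline with zero effect
    alpha = [[0.0, 0.0], [2.0, 0.0]]    # covariate a: healthy / diseased
    beta = [[0.0, 0.0], [0.0, -1.0]]    # covariate b: untreated / treated
    alpha_beta = {(1, 1): [1.0, 0.0]}   # interaction: diseased and treated
    # Factor loadings V (p x k) and per-variable means mu (length p)
    V = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(p)]
    mu = [rng.gauss(0, 1) for _ in range(p)]
    samples = []
    for a in (0, 1):
        for b in (0, 1):
            inter = alpha_beta.get((a, b), [0.0] * k)
            for _ in range(n_per_cell):
                # Latent variable: sum of covariate effects plus unit noise
                x_lat = [alpha[a][j] + beta[b][j] + inter[j] + rng.gauss(0, 1)
                         for j in range(k)]
                # Observed data: mean + loadings times latent variable + noise
                x = [mu[d] + sum(V[d][j] * x_lat[j] for j in range(k))
                     + rng.gauss(0, 0.5) for d in range(p)]
                samples.append((a, b, x_lat, x))
    return samples


data = simulate_multiway()
n, p = len(data), len(data[0][3])
print(n, p)  # far fewer samples than variables, the "n < p" setting
```

Placing the covariate effects on a few latent components rather than on each of the p variables is what keeps the number of effect parameters small when n < p.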

  9. P II–III: Multi-Way Models for Multi-Peak Metabolomics
     a) Peak clustering based on shapes
     b) Inference of covariate effects based on intensity
     [Figure: a) Data: intensity versus retention time for samples i = 1, 2, 3, 4 with covariate levels a_i = 1, 1, 2, 2; Result: peaks j assigned to clusters k = 1, 2. b) Data: peak intensities (peaks × samples) for Cluster 1 and Cluster 2, by covariate level; Result: posterior probability density of the covariate effect for each cluster.]
     LIPID MAPS 2014

  10. P IV: Multi-Way Model for Multiple Sources
      ◮ paired samples
      ◮ no matched variables, different dimensionalities
      [Figure: (1) Data: data space 1 and data space 2, with samples grouped by covariates a (healthy / diseased) and b (untreated / treated). (2) Model with three parts: ANOVA (covariates a, b; effects α, β), CCA (z, W_x, W_y, Ψ_x, Ψ_y) and FA (x^lat, y^lat, V_x, V_y, µ_x, µ_y), generating observed x and y for n samples. (3) Results in three columns (Shared, X-specific, Y-specific), plotted against n samples (20, 50, 100, 200, 500).]
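The setting on slide 10, paired samples measured in two data spaces with no matched variables and different dimensionalities, can be illustrated with a small simulation in the spirit of probabilistic CCA: one shared latent variable per sample generates both views. This is a simplified sketch, not the dissertation's model; it omits the ANOVA and FA parts and the view-specific components, and the name `simulate_paired_views`, the dimensions, and the noise level are hypothetical.

```python
import random


def simulate_paired_views(n=20, p_x=5, p_y=8, seed=1):
    """Generate two paired views driven by one shared latent variable (sketch)."""
    rng = random.Random(seed)
    # View-specific projections of the shared latent variable (assumed 1-D)
    W_x = [rng.gauss(0, 1) for _ in range(p_x)]
    W_y = [rng.gauss(0, 1) for _ in range(p_y)]
    X, Y = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # shared latent variable of one paired sample
        # Each view sees z through its own projection plus independent noise
        X.append([w * z + rng.gauss(0, 0.3) for w in W_x])
        Y.append([w * z + rng.gauss(0, 0.3) for w in W_y])
    return X, Y


X, Y = simulate_paired_views()
print(len(X), len(X[0]), len(Y[0]))  # same samples, different dimensionalities
```

Only the sample pairing links the two views here: row i of X and row i of Y share the same z, while the columns of X and Y have no correspondence.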

  11. P V: Cross-Organism Toxicogenomics
      Model: observed data (views 1, 2, 3) ≈ latent variables (treatments × components) × factor loadings (view-specific; legend: real numbers, zero)
      Views: Human in vitro, Rat in vitro, Rat in vivo
      Result: → Multi-level cross-organism drug responses: Organ-level (Pathological findings), Factors, Molecular level (GO terms)
      [Figure: components A–H shown for views 1–3, linked to pathological findings and GO terms. Pathological findings include: Swelling; Hematopoiesis, extramedullary; Increased mitosis; Hypertrophy; Anisonucleosis; Cellular infiltration; Change, eosinophilic; Single cell necrosis. GO terms include: oxidation-reduction process; small molecule biosynthetic, catabolic and metabolic process; G1/S transition of mitotic cell cycle; microtubule-based movement; mitotic chromosome condensation; cell cycle DNA replication; cell division; chromosome organization; DNA packaging; DNA strand elongation involved in DNA replication; mitotic sister chromatid segregation; negative regulation of mitosis; telomere maintenance via recombination; transcription-coupled nucleotide-excision repair; cell cycle phase transition; cellular response to stimulus; macromolecule metabolic process.]

  12. P VI–VII: Cross-Organism Multi-Way Model
      a) Organism X and Organism Y, each with its own data space and covariate (b^X, b^Y):
         ◮ no matched variables, different dimensionalities
         ◮ no paired samples
         ◮ time series (healthy, diseased): varying lengths, unknown alignments
      b) matching clusters based on their effect time profiles
      [Figure: disease effect over time for each organism, with levels a = 1, ..., 5 and b = 1, ..., 5.]

  13. Summary
      New machine learning models for:
      ◮ P I: Small sample size, high dimensionality (n < p)
      ◮ P II–III: Incorporating prior information about the measurement process
      ◮ P IV–V: Multiple data sources with co-occurring samples
      ◮ P VI–VII: Multiple data sources without co-occurring samples
