Bayesian Multi-Way Models for Data Translation in Computational Biology



slide-1
SLIDE 1

Bayesian Multi-Way Models for Data Translation in Computational Biology

Tommi Suvitaival

DOCTORAL DISSERTATIONS
Aalto-DD 171/2014

Aalto University
School of Science
Department of Information and Computer Science
www.aalto.fi

ISBN 978-952-60-5932-7 (printed)
ISBN 978-952-60-5933-4 (pdf)
ISSN-L 1799-4934
ISSN 1799-4934 (printed)
ISSN 1799-4942 (pdf)

Inference of differences between samples is a fundamental problem in computational biology. Molecular measurements of biological organisms produce high-dimensional data but the number of test subjects in the experiments is limited. In this thesis, computational methods are presented for finding differences between high-dimensional observations and for extensions of this problem. Since the effects and side-effects of new drug treatments are unknown and potentially dangerous, model organisms are used to study human diseases and their treatments. The computational translation of the outcome of an experiment from the model organism to humans is a problem, which is addressed in this thesis. Presented data translation methods identify responses to experimental treatments that are conserved across organisms.

slide-2
SLIDE 2

Introduction

◮ Molecular measurements of biological organisms to study response to:
  ◮ disease
  ◮ medical treatment
  ◮ environment

◮ Measurements can be made:
  ◮ in vivo: cell extracts from humans or model organisms
  ◮ in vitro: cell lines grown in laboratory

Figure source: Hilvo et al., Cancer Res. 2011

slide-3
SLIDE 3

Molecular activity in biological cell

Figure sources:
Watson & Crick, Nature 1953
Joyce & Palsson, Nat. Rev. Mol. Cell Biol. 2006

slide-4
SLIDE 4

Machine learning for computational biology

◮ Molecular measurements:
  ◮ Large data sets
  ◮ Uncertainty/noise
  ⇒ Automated and robust data-driven analysis tools needed

◮ Bayesian approach to probability:
  ◮ Take uncertainty into account
  ◮ Describe the generative process of the data
  ⇒ Integration of multiple measurement sources

◮ Incorporate existing knowledge by specifying:
  ◮ the model structure
  ◮ priors

[Figure: posterior probability density of a covariate effect]
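As an illustration of what a posterior density of a covariate effect expresses, the following minimal sketch uses a deliberately simple conjugate setting that is not taken from the dissertation: a single scalar effect with a zero-mean normal prior and observations with known noise level. The function names `posterior_effect` and `prob_positive`, and the parameters `noise_sd` and `prior_sd`, are hypothetical.

```python
import math

def posterior_effect(diffs, noise_sd=1.0, prior_sd=1.0):
    """Posterior mean and sd of a scalar effect delta.

    Assumed toy model (not the dissertation's):
      prior:      delta ~ N(0, prior_sd^2)
      likelihood: diffs[i] ~ N(delta, noise_sd^2), independently
    """
    n = len(diffs)
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / noise_sd ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    # The prior mean is zero, so only the data term contributes to the mean.
    post_mean = post_var * sum(diffs) / noise_sd ** 2
    return post_mean, math.sqrt(post_var)

def prob_positive(mean, sd):
    """Posterior probability that the effect is greater than zero."""
    return 0.5 * (1.0 + math.erf(mean / (sd * math.sqrt(2.0))))

# Four observed treated-minus-untreated differences (made-up numbers).
mean, sd = posterior_effect([1.0, 1.0, 1.0, 1.0])
p_up = prob_positive(mean, sd)
```

With few samples the prior pulls the estimate towards zero and the posterior stays wide, which is the sense in which the Bayesian approach takes uncertainty into account.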

slide-7
SLIDE 7

Computational medicine & contributions

◮ Model organisms for studying effects of:
  ◮ genomic mutations
  ◮ new medical treatments, potentially dangerous

◮ Dissertation: statistical modeling of effects in molecular measurement data with
  ◮ high-dimensional, noisy measurements
  ◮ multiple measurement types
  ◮ multiple organisms

Figure source: Kaski, MLAB 2013

slide-8
SLIDE 8

P I: Multi-Way Model for “n < p”

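(1) Data:
[Figure: samples in a two-way design with two covariates (labelled a and b in the figure), with levels healthy / diseased and untreated / treated; data space: 100...300 metabolites]

(2) Model:
[Figure: graphical model combining factor analysis (FA) and ANOVA; symbols n, A, B, x, V, µ, xlat, a, b, α, β, αβ]

(3) Result:
[Figure]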
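The symbols listed for the model (µ, V, xlat, α, β, αβ) suggest a factor-analysis layer on top of ANOVA-type effects in a low-dimensional latent space. The sketch below simulates one sample from such a model under that reading. The exact distributions, the noise parameters `latent_sd` and `noise_sd`, and the function name `simulate_sample` are assumptions made for illustration, not the model specification from the dissertation.

```python
import random

def simulate_sample(a, b, alpha, beta, alpha_beta, V, mu,
                    latent_sd=1.0, noise_sd=0.1, rng=random):
    """Draw one observation x for covariate levels a and b (0-based).

    alpha[a], beta[b] and alpha_beta[a][b] are length-K effect vectors,
    V is a p-by-K loading matrix (list of rows), mu a length-p mean.
    """
    K = len(alpha[a])
    # ANOVA part: main effects and interaction act on the K latent factors.
    x_lat = [alpha[a][k] + beta[b][k] + alpha_beta[a][b][k]
             + rng.gauss(0.0, latent_sd) for k in range(K)]
    # FA part: project the latent factors onto the p observed variables.
    x = [mu[j] + sum(V[j][k] * x_lat[k] for k in range(K))
         + rng.gauss(0.0, noise_sd) for j in range(len(mu))]
    return x, x_lat

# Toy setting: K = 1 latent factor, p = 3 observed variables (made-up values).
alpha = [[0.0], [2.0]]                      # effect of covariate a
beta = [[0.0], [1.0]]                       # effect of covariate b
alpha_beta = [[[0.0], [0.0]],
              [[0.0], [-0.5]]]              # interaction effect
V = [[1.0], [0.5], [0.0]]
mu = [0.0, 0.0, 10.0]
x, x_lat = simulate_sample(1, 1, alpha, beta, alpha_beta, V, mu)
```

Because the effects live in the K-dimensional latent space rather than on each of the p variables separately, the number of effect parameters does not grow with p, which is what makes the "n < p" setting tractable in a model of this kind.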

slide-9
SLIDE 9

P II–III: Multi-Way Models for Multi-Peak Metabolomics

a) Peak clustering based on shapes
[Figure: for samples i = 1...4 with covariate levels a_i = 1, 1, 2, 2, the peaks j of each sample are plotted as intensity against retention time; result: each peak is assigned to a cluster k (1 or 2)]

b) Inference of covariate effects based on intensity
[Figure: peak intensities of Cluster 1 and Cluster 2 across samples and covariate levels; result: posterior probability density of the covariate effect for each cluster]

Figure source: LIPID MAPS 2014

slide-10
SLIDE 10

P IV: Multi-Way Model for Multiple Sources

(1) Data:
[Figure: data space 1 and data space 2 with paired samples, no matched variables and different dimensionalities; two covariates (labelled a and b in the figure) with levels healthy / diseased and untreated / treated]

(2) Model:
[Figure: graphical model combining FA, CCA and ANOVA; symbols n, A, B, x, y, µx, Vx, µy, Vy, z, a, b, xlat, Ψx, Wx, ylat, Ψy, Wy, α, β]

(3) Result:
[Figure: panels for shared, X-specific and Y-specific effects, plotted against the number of samples n (20, 50, 100, 200, 500)]

slide-11
SLIDE 11

P V: Cross-Organism Toxicogenomics

Data & Model:

[Figure: observed data ≈ latent variables × factor loadings, for three views (human in vitro, rat in vitro, rat in vivo) and their treatments; factor loadings are either real numbers or zero, so that a component is active in A) all views, B) a subset of views, or C) a single view]

Result: → Multi-level cross-organism drug responses

[Figure: factors A–H, each shown across views 1–3, linked to organ-level pathological findings and molecular-level GO terms]

Molecular level (GO terms): oxidation-reduction process; small molecule biosynthetic process; small molecule catabolic process; small molecule metabolic process; DNA replication; G1/S transition of mitotic cell cycle; microtubule-based movement; mitosis; mitotic chromosome condensation; regulation of transcription involved in G1/S phase of mitotic cell cycle; cell cycle; cell division; chromosome organization; DNA packaging; DNA replication initiation; DNA strand elongation involved in DNA replication; interphase; mitotic cell cycle; mitotic sister chromatid segregation; negative regulation of mitosis; nucleotide-excision repair, DNA gap filling; telomere maintenance via recombination; telomere maintenance via semi-conservative replication; transcription-coupled nucleotide-excision repair; organelle fission; negative regulation of metaphase/anaphase transition of cell cycle; cell cycle phase transition; cellular response to stimulus; DNA metabolic process; macromolecule metabolic process; negative regulation of organelle organization; regulation of mitotic metaphase/anaphase transition; protein modification by small protein conjugation or removal; cell part morphogenesis

Organ level (pathological findings): Swelling; Degeneration, granular, eosinophilic; Nodule, hepatodiaphragmatic; Hematopoiesis, extramedullary; Increased mitosis; Degeneration, acidophilic, eosinophilic; Hypertrophy; Anisonucleosis; Cellular infiltration, mononuclear cell; Cellular infiltration; Change, eosinophilic; Vacuolization, cytoplasmic; Change, basophilic; Single cell necrosis
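The distinction between components active in all views, a subset of views, or a single view can be read off the factor loadings: a component is inactive in a view when its whole block of loadings for that view is zero. The sketch below shows this bookkeeping only; it is not the inference procedure of the dissertation, and the function name `component_activity`, the tolerance `tol` and the toy loadings are hypothetical.

```python
def component_activity(loadings_by_view, tol=1e-8):
    """Classify each component by the views in which it is active.

    loadings_by_view maps a view name to a K-by-D_view matrix (list of
    rows); row k holds the loadings of component k in that view.
    """
    views = list(loadings_by_view)
    n_components = len(loadings_by_view[views[0]])
    result = []
    for k in range(n_components):
        # A component is active in a view if any of its loadings is non-zero.
        active = [v for v in views
                  if any(abs(w) > tol for w in loadings_by_view[v][k])]
        if len(active) == len(views):
            kind = "all views"
        elif len(active) > 1:
            kind = "subset of views"
        elif len(active) == 1:
            kind = "single view"
        else:
            kind = "inactive"
        result.append((kind, active))
    return result

# Toy loadings for three components; the views have different dimensionalities.
loadings = {
    "human in vitro": [[0.5, -1.0], [0.0, 0.0], [0.0, 0.0]],
    "rat in vitro":   [[1.2], [0.3], [0.0]],
    "rat in vivo":    [[0.7, 0.1, 0.2], [0.4, 0.0, 0.0], [0.9, 0.0, 0.0]],
}
activity = component_activity(loadings)
```

In this toy example the first component is shared by all three views, the second by the two rat views, and the third is specific to rat in vivo. Components shared across organisms are the ones that carry a cross-organism response.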

slide-12
SLIDE 12

P VI–VII: Cross-Organism Multi-Way Model

◮ Two organisms (Organism X, Organism Y) with data spaces X and Y: no matched variables, different dimensionalities
◮ No paired samples
◮ Time series: varying lengths, unknown alignments
◮ Covariate b: healthy / diseased
◮ Effects shown: time effect, disease effect
◮ Matching clusters based on their profiles

[Figure: panels a) and b), with levels 1...5 shown for a and b in each organism]

slide-13
SLIDE 13

Summary

New machine learning models for:

◮ P I: Small sample size, high dimensionality (n < p)
◮ P II–III: Incorporating prior information about the measurement process
◮ P IV–V: Multiple data sources with co-occurring samples
◮ P VI–VII: Multiple data sources without co-occurring samples
