

SLIDE 1

Latent Variable models for GWAs

Oliver Stegle

Machine Learning and Computational Biology Research Group, Max-Planck-Institutes Tübingen, Germany

September 2011

O. Stegle, Latent variable models for GWAs, Tübingen


SLIDE 4

Motivation

Why latent variables?

Causal influences on phenotypes

◮ Genotype
  ◮ Primary variable of interest
◮ Known confounding factors
  ◮ Covariates
  ◮ Population structure
◮ Unknown (latent) confounders
  ◮ Sample handling
  ◮ Sample history
  ◮ Subtle environmental perturbations

[Figure: the genome (SNPs of each individual) and the phenome (phenotype vectors y1, y2, ..., yN) linked by an unknown relationship "?", with covariates, population structure and confounders as further influences on the phenotypes.]


SLIDE 6

Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM)

Outline

◮ Motivation
◮ Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM)
◮ Modeling hidden confounders in GWAs
  ◮ Model
  ◮ Applications
◮ Modeling unobserved cellular phenotypes in genetic analyses
  ◮ Model
  ◮ Applications
◮ A unifying view
◮ Summary

SLIDE 7

Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM)

Manifolds and dimension reduction

[Figure: manifold learning illustration, from Olivier Grisel, generated using the Modular Data Processing toolkit and matplotlib.]

SLIDE 8

Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM)

Linear dimension reduction

◮ Map G-dimensional data onto a K-dimensional manifold, K ≪ G:

\[ \underbrace{Y}_{N \times G} = \underbrace{H}_{N \times K}\,\underbrace{W}_{K \times G} + \underbrace{\Psi}_{N \times G} \]

◮ H: latent factors in low-dimensional space
◮ W: weights for factors on data dimensions
◮ Ψ: noise, $\psi_{n,g} \sim \mathcal{N}(0, \sigma^2)$
◮ Challenge: neither W nor H is known!
◮ Depending on the assumptions on W and H:
  ◮ Principal component analysis (PCA)
  ◮ Independent component analysis (ICA)
  ◮ ...
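The generative model on this slide can be made concrete with a small simulation. The following is a minimal NumPy sketch; the function name `simulate_linear_latent`, the standard-normal draws for H and W, and all sizes are illustrative assumptions, not part of the talk.

```python
import numpy as np

def simulate_linear_latent(N=50, G=200, K=3, sigma=0.1, seed=0):
    """Draw Y = H W + Psi with H (N x K), W (K x G), Psi (N x G)."""
    rng = np.random.default_rng(seed)
    H = rng.standard_normal((N, K))             # latent factors (assumed N(0, 1))
    W = rng.standard_normal((K, G))             # factor weights (assumed N(0, 1))
    Psi = sigma * rng.standard_normal((N, G))   # i.i.d. Gaussian noise, variance sigma^2
    return H @ W + Psi, H, W

Y, H, W = simulate_linear_latent()
# The noise-free part H @ W has rank at most K, although Y itself is N x G.
rank_signal = np.linalg.matrix_rank(H @ W)
```

In practice only Y is observed; the challenge named above is recovering H and W from it.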


SLIDE 11

Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM)

Linear dimension reduction

PCA

PCA corresponds to a noise-free version of the model:

\[ \underbrace{Y}_{N \times G} = \underbrace{H}_{N \times K}\,\underbrace{W}_{K \times G} \]

◮ PCA components (H) correspond to directions of maximum data variance in the original dataset:
  ◮ Covariance matrix: $C = Y^{T} Y$ ($G \times G$, for centered $Y$ with samples as rows)
  ◮ Eigenvalues and eigenvectors: $C v_i = \lambda_i v_i$
  ◮ Projection matrix: $P = [v_1, \dots, v_K]$
  ◮ Principal components: $H_n = P^{T} Y_n$
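A minimal NumPy sketch of the eigendecomposition route to PCA listed above. It assumes samples are stored as rows of a centered N x G matrix, so the G x G covariance is computed as `Yc.T @ Yc`; the function name `pca` and the toy data are illustrative, not from the talk.

```python
import numpy as np

def pca(Y, K):
    """Project the rows of Y (N x G) onto the top-K principal directions."""
    Yc = Y - Y.mean(axis=0)                 # center each data dimension
    C = Yc.T @ Yc                           # G x G (unnormalized) covariance
    eigvals, eigvecs = np.linalg.eigh(C)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:K]   # indices of the K largest
    P = eigvecs[:, order]                   # G x K projection matrix [v_1, ..., v_K]
    H = Yc @ P                              # N x K principal components
    return H, P, eigvals[order]

rng = np.random.default_rng(0)
Y = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 10))  # exactly rank 2
H, P, lam = pca(Y, K=2)
Yc = Y - Y.mean(axis=0)
recon_error = np.abs(H @ P.T - Yc).max()    # noise-free data is reconstructed
```

Because the toy data have rank 2 and no noise, two components reconstruct the centered data up to floating-point error.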


SLIDE 16

Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM)

Linear dimension reduction

Bayesian PCA and GPLVM

Assumption: data dimensions or sample dimension independent given H and W.

Probabilistic PCA [Tipping and Bishop, 1999]

\[ p(Y \mid H, W) = \prod_{n=1}^{N} \mathcal{N}\big(y_n \mid h_n W,\; \sigma^2 I\big) \]
\[ p(H) = \prod_{n=1}^{N} \mathcal{N}\big(h_n \mid 0,\; \sigma_h^2 I\big) \]
\[ p(Y \mid W) = \prod_{n=1}^{N} \mathcal{N}\big(y_n \mid 0,\; \underbrace{\sigma_h^2 W^{T} W}_{G \times G} + \sigma^2 I\big) \]

GPLVM [Lawrence, 2005]

\[ p(Y \mid H, W) = \prod_{g=1}^{G} \mathcal{N}\big(y_{:,g} \mid H w_{:,g},\; \sigma^2 I\big) \]
\[ p(W) = \prod_{g=1}^{G} \mathcal{N}\big(w_{:,g} \mid 0,\; \sigma_h^2 I\big) \]
\[ p(Y \mid H) = \prod_{g=1}^{G} \mathcal{N}\big(y_{:,g} \mid 0,\; \underbrace{\sigma_h^2 H H^{T}}_{N \times N} + \sigma^2 I\big) \]
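The GPLVM marginal follows from a standard Gaussian identity. A short derivation sketch in the slide's symbols, assuming the weights and the noise are independent:

```latex
\begin{align*}
y_{:,g} &= H\, w_{:,g} + \psi_{:,g}, \qquad
  w_{:,g} \sim \mathcal{N}(0, \sigma_h^2 I_K), \quad
  \psi_{:,g} \sim \mathcal{N}(0, \sigma^2 I_N) \\
\mathbb{E}[y_{:,g}] &= H\, \mathbb{E}[w_{:,g}] + \mathbb{E}[\psi_{:,g}] = 0 \\
\operatorname{Cov}[y_{:,g}] &= H\, \mathbb{E}[w_{:,g} w_{:,g}^{T}]\, H^{T} + \sigma^2 I_N
  = \sigma_h^2\, H H^{T} + \sigma^2 I_N
\end{align*}
```

A linear map of a Gaussian vector plus independent Gaussian noise is again Gaussian, so these two moments fully determine $p(y_{:,g} \mid H)$. The probabilistic PCA marginal is obtained the same way with the roles of H and W exchanged.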


SLIDE 18

Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM)

GPLVM

◮ Marginal likelihood:

\[ p(Y \mid H) = \prod_{g=1}^{G} \mathcal{N}\big(y_{:,g} \mid 0,\; \sigma_h^2 H H^{T} + \sigma^2 I\big) \]

◮ Inference of the most probable "hidden factors" H:

\[ \hat{H} = \operatorname*{argmax}_{H}\; \log \prod_{g=1}^{G} \mathcal{N}\big(y_{:,g} \mid 0,\; \sigma_h^2 H H^{T} + \sigma^2 I\big) \]
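The objective being maximized can be written down directly. The sketch below evaluates the log marginal likelihood for a given H and compares two candidates on simulated data: a PCA-style guess built from the leading eigenvectors of Y Yᵀ, and a random H. It does not run an optimizer; the function name, the rescaling of the eigenvectors, and the fixed values for `sigma_h2` and `sigma2` are assumptions made for illustration.

```python
import numpy as np

def gplvm_log_marginal(Y, H, sigma_h2, sigma2):
    """log prod_g N(y_:,g | 0, sigma_h2 * H H^T + sigma2 * I) for Y of shape N x G."""
    N, G = Y.shape
    C = sigma_h2 * (H @ H.T) + sigma2 * np.eye(N)   # N x N covariance shared by all genes
    _, logdet = np.linalg.slogdet(C)
    quad = np.sum(Y * np.linalg.solve(C, Y))        # sum_g y_g^T C^{-1} y_g
    return -0.5 * (G * N * np.log(2.0 * np.pi) + G * logdet + quad)

rng = np.random.default_rng(1)
N, G, K = 30, 100, 2
H_true = rng.standard_normal((N, K))
Y = H_true @ rng.standard_normal((K, G)) + 0.1 * rng.standard_normal((N, G))

# Candidate 1: leading eigenvectors of Y Y^T, rescaled so that H H^T ~ Y Y^T / G.
eigvals, eigvecs = np.linalg.eigh(Y @ Y.T)
H_pca = eigvecs[:, -K:] * np.sqrt(eigvals[-K:] / G)
# Candidate 2: a random H of the same shape.
H_rand = rng.standard_normal((N, K))

ll_pca = gplvm_log_marginal(Y, H_pca, sigma_h2=1.0, sigma2=0.01)
ll_rand = gplvm_log_marginal(Y, H_rand, sigma_h2=1.0, sigma2=0.01)
```

The PCA-style candidate scores far higher than the random one, which is why PCA is a natural starting point for the argmax over H.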


SLIDE 22

Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM)

Are linear relationships sufficient?

[Figure: non-linear manifold example, from Olivier Grisel, generated using the Modular Data Processing toolkit and matplotlib.]

◮ Non-linear generalizations: introduce a general kernel function $K_{H,H}$ instead of the linear covariance $H H^{T}$:

\[ \hat{H} = \operatorname*{argmax}_{H}\; \log \prod_{g=1}^{G} \mathcal{N}\big(y_{:,g} \mid 0,\; \sigma_h^2 K_{H,H} + \sigma^2 I\big) \]
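To illustrate swapping the linear covariance for a kernel, here is a hedged NumPy sketch. The slide only says "general kernel function"; the squared-exponential (RBF) kernel, the `lengthscale` parameter and the function names are assumptions chosen as one common example.

```python
import numpy as np

def rbf_kernel(H, lengthscale=1.0):
    """K[i, j] = exp(-||h_i - h_j||^2 / (2 * lengthscale^2)) for the rows h_i of H."""
    sq_norms = np.sum(H**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (H @ H.T)
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)

def log_marginal(Y, K_HH, sigma_h2, sigma2):
    """log prod_g N(y_:,g | 0, sigma_h2 * K_HH + sigma2 * I) for Y of shape N x G."""
    N, G = Y.shape
    C = sigma_h2 * K_HH + sigma2 * np.eye(N)
    _, logdet = np.linalg.slogdet(C)
    quad = np.sum(Y * np.linalg.solve(C, Y))
    return -0.5 * (G * N * np.log(2.0 * np.pi) + G * logdet + quad)

rng = np.random.default_rng(0)
H = rng.standard_normal((20, 2))
Y = rng.standard_normal((20, 5))
K_rbf = rbf_kernel(H)     # non-linear covariance between samples
K_lin = H @ H.T           # linear covariance: recovers the linear model
ll_rbf = log_marginal(Y, K_rbf, sigma_h2=1.0, sigma2=0.1)
ll_lin = log_marginal(Y, K_lin, sigma_h2=1.0, sigma2=0.1)
```

Only the covariance matrix changes between the two calls; the likelihood code is shared, which is the point of the kernel formulation.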


SLIDE 25

Modeling hidden confounders in GWAs

Confounders in eQTL studies

◮ Confounders in eQTL studies
  ◮ Experimental procedures
  ◮ Gene regulation
  ◮ (Translation)
◮ It is standard to take known factors (gender, population structure) into account.
◮ It is key to account for hidden factors as well.
  ◮ Sample preparation
  ◮ Sample history

[Figure: DNA (SNPs) → transcription → mRNA → translation → proteins. Gene regulation (alternative splicing, transcription factors, small RNAs) and external influences (experimental procedures, environment), both observed and hidden, act on this pathway.]


SLIDE 28

Modeling hidden confounders in GWAs

Motivating examples

HapMap II: 3 populations, 90 individuals each, 40K genes, 3 million SNPs (here: a small region of chromosome 7).

[Figure: eQTL LOD scores for SLC35B4 against position in chr. 7 (about 1.3354 to 1.3374 × 10^8), with a 0.01% FPR threshold. Panel (a): standard eQTL LOD. Panel (b): VBFA eQTL LOD, accounting for hidden factors.]


SLIDE 30

Modeling hidden confounders in GWAs: Model

Association model

Direct effects of confounders

◮ Start with the standard association model.
◮ Include a few (K) global hidden factors (confounders) in the model.
◮ Factors $H = \{h_{:,1}, \dots, h_{:,K}\}$ need to be learned from the expression data.
◮ How to control model complexity, i.e. choose the number of factors?

\[ y_{:,g} = \underbrace{\sum_{n=1}^{N} (\theta_{n,g}\, s_n)}_{\text{genetic}} + \underbrace{\sum_{k=1}^{K} w_{g,k}\, h_k}_{\text{hidden factors}} + \underbrace{\psi}_{\text{noise}} \]

[Figure: genome (SNPs per individual) and phenome (phenotype vectors y1, y2, ..., yN), with covariates, population structure and confounders influencing the phenotypes.]


SLIDE 32

Modeling hidden confounders in GWAs: Model

Association model

Gaussian process formulation

◮ Data likelihood of the linear generative model, assuming Gaussian noise:

\[ p(Y \mid X, H, \theta, W, \Theta_K) = \prod_{g=1}^{G} \mathcal{N}\Big(y_{:,g} \,\Big|\, \sum_{s=1}^{S} x_s \theta_s + \sum_{k=1}^{K} w_{g,k}\, h_k,\; \sigma_e^2 I\Big) \]

◮ Specify Gaussian priors on the factor weights:

\[ P(w_{g,k}) = \mathcal{N}\big(w_{g,k} \mid 0,\; \sigma_h^2\big) \]

◮ Integrating out the weights yields a Gaussian process marginal likelihood, factorizing over genes given H:

\[ P(Y \mid X, H, \Theta_K) = \prod_{g=1}^{G} \mathcal{N}\Big(y_{:,g} \,\Big|\, \sum_{s=1}^{S} x_s \theta_s,\; \sigma_h^2 \sum_{k=1}^{K} h_k h_k^{T} + \sigma_e^2 I\Big) \]
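The effect of integrating out the weights can be checked numerically. This is a small Monte Carlo sketch under stated assumptions: the genetic mean is set to zero so that only the covariance is tested, and the sizes and parameter values are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 6, 2
sigma_h, sigma_e = 1.5, 0.5
H = rng.standard_normal((N, K))     # hidden factors, one column h_k per factor

# Analytic covariance after integrating out w_{g,k} ~ N(0, sigma_h^2):
# sigma_h^2 * sum_k h_k h_k^T + sigma_e^2 * I, where sum_k h_k h_k^T = H H^T.
C_analytic = sigma_h**2 * (H @ H.T) + sigma_e**2 * np.eye(N)

# Empirical covariance over many simulated genes with zero genetic effect.
n_genes = 200_000
W = sigma_h * rng.standard_normal((K, n_genes))
Y = H @ W + sigma_e * rng.standard_normal((N, n_genes))
C_empirical = (Y @ Y.T) / n_genes

max_abs_diff = np.abs(C_empirical - C_analytic).max()
```

The empirical and analytic matrices agree up to sampling error, which shrinks as `n_genes` grows.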


SLIDE 37

Modeling hidden confounders in GWAs: Model

Association model

Including population structure

\[ P(Y \mid X, H, \Theta_K) = \prod_{g=1}^{G} \mathcal{N}\Big(y_{:,g} \,\Big|\, x_i \theta_i,\; \underbrace{\sigma_h^2 \sum_{k=1}^{K} h_k h_k^{T} + \sigma_g^2 X X^{T} + \sigma_e^2 I}_{N \times N}\Big) \]

◮ Include an additional covariance term, $\sigma_g^2 X X^{T}$, for population structure.
◮ Association test for a single SNP $x_i$.
◮ Challenge: refitting $\sigma_g$, $\sigma_h$ and H for every SNP.


SLIDE 39

Modeling hidden confounders in GWAs: Model

Association model

◮ Fit the parameters once on the null model:

\[ \hat{H}, \hat{\sigma}_g, \hat{\sigma}_h, \hat{\sigma}_e = \operatorname*{argmax}_{H,\, \sigma_h,\, \sigma_g,\, \sigma_e} P(Y \mid X, H, \Theta_K) \]

◮ Association testing for fixed hyperparameters:

\[ p(y_{:,g} \mid x_i, H, \Theta_K) = \mathcal{N}\Big(y_{:,g} \,\Big|\, x_i \theta_i,\; \alpha^2 \big(\hat{\sigma}_h^2 \hat{H} \hat{H}^{T} + \hat{\sigma}_g^2 X X^{T}\big) + \sigma_e^2 I\Big) \]
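Once the covariance is fixed, each single-SNP test reduces to generalized least squares. The sketch below is a simplified illustration, not the talk's implementation: it freezes the whole covariance C, including the scale α² and the noise variance σ_e² that appear without hats on the slide, and it uses simulated stand-ins for the fitted factors and genotypes. The function name `lod_score` and all numeric values are assumptions.

```python
import numpy as np

def lod_score(y, x, C):
    """LOD for H1: y ~ N(x * theta, C) against H0: y ~ N(0, C), with C held fixed."""
    Cinv_y = np.linalg.solve(C, y)
    Cinv_x = np.linalg.solve(C, x)
    theta_hat = (x @ Cinv_y) / (x @ Cinv_x)   # generalized least squares estimate
    # With C fixed, the maximized log-likelihood ratio is 0.5 * theta_hat * x^T C^{-1} y.
    log_lr = 0.5 * theta_hat * (x @ Cinv_y)
    return log_lr / np.log(10.0), theta_hat

rng = np.random.default_rng(0)
N, K, S = 200, 3, 50
H_hat = rng.standard_normal((N, K))                  # stand-in for factors fitted on the null model
X = rng.integers(0, 3, size=(N, S)).astype(float)    # genotypes coded 0/1/2
X -= X.mean(axis=0)
C = 1.0 * (H_hat @ H_hat.T) + 0.01 * (X @ X.T) + 0.5 * np.eye(N)

# One simulated gene: a true effect at SNP 0 plus correlated noise drawn from N(0, C).
L = np.linalg.cholesky(C)
y = 1.5 * X[:, 0] + L @ rng.standard_normal(N)

lods = np.array([lod_score(y, X[:, i], C)[0] for i in range(S)])
best_snp = int(np.argmax(lods))
```

Because C is shared across SNPs, a real implementation would factorize it once and reuse the factorization for every test, which is what makes fitting on the null model worthwhile.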


SLIDE 42

Modeling hidden confounders in GWAs: Applications

Evaluation on real data

[Figure: schematic for gene YJL213W showing its promoter and a SNP (S), illustrating cis versus trans association.]

[Figure: (a) Yeast cis eQTLs. (b) Yeast trans eQTLs.]


SLIDE 44

Modeling hidden confounders in GWAs: Applications

Evaluation on real data

Application to HapMap II expression analysis

[Figure: (a) Probes with a VB eQTL in the pooled population. (f) cis VB eQTL location and strength relative to gene start.]


SLIDE 46

Modeling unobserved cellular phenotypes in genetic analyses: Model

Association studies

Regulatory and external factors

◮ Confounding factors
◮ Accounting for hidden confounders:

\[ y_{g,j} = \underbrace{\sum_{n} b_{n,g}\,(\theta_{n,g}\, s_{n,j})}_{\text{genetic}} + \underbrace{v_g f_j}_{\text{known factors}} + \underbrace{w_g x_j}_{\text{hidden factors}} + \underbrace{\psi_{g,j}}_{\text{noise}} \]

◮ Accounting for regulatory factors?

[Figure: DNA (SNPs) → transcription → mRNA → translation → proteins. Gene regulation (alternative splicing, transcription factors, small RNAs) and external influences (experimental procedures, environment), both observed and hidden, act on this pathway.]


SLIDE 48

Modeling unobserved cellular phenotypes in genetic analyses: Model

Association studies

Regulatory factors

◮ Account for regulatory factors:
  ◮ Transcription factors
  ◮ Pathway components
◮ Hypothesis: intermediate cellular factors mediate the observed association signals of target genes.
◮ Measuring X?
  ◮ Difficult and expensive.
◮ Learn the unobserved factors X.

[Figure: DNA (SNPs) → transcription → mRNA → translation → proteins, with gene regulation (alternative splicing, small RNAs) and external influences (experimental procedures, environment), both observed and hidden.]

[Parts et al., 2011]


SLIDE 51

Modeling unobserved cellular phenotypes in genetic analyses: Model

Association studies

Transcription factor effects

◮ Inference of regulatory factors using a bilinear model:

\[ \underbrace{Y}_{\text{Expr.}} = \underbrace{W}_{\text{Weights}} \cdot \underbrace{X}_{\text{Factors}} + \underbrace{\Psi}_{\text{Noise}} \]

◮ W is sparse; each factor regulates only a subset of all genes.
◮ Incorporate biological prior knowledge so that the factors are interpretable:
  ◮ Transcription factor binding affinities.
◮ Inferred factors summarize co-expression clusters.

[Figure: DNA (SNPs) → transcription → mRNA → translation → proteins, with gene regulation (alternative splicing, small RNAs) and external influences (experimental procedures, environment), both observed and hidden.]

SLIDE 52

Modeling unobserved cellular phenotypes in genetic analyses: Model

Association studies

Transcription factor effects

[Figure: promoter region of gene YJL213W, annotated with the transcription factor PHO4.]

SLIDE 53

Modeling unobserved cellular phenotypes in genetic analyses: Model

Association studies

Transcription factor effects

[Figure: PHO4 example. Panels labeled "Gene expression" (5493 genes × 218 segregants, with known PHO4 targets from Yeastract marked) and "PHO4 factor activation" (across the 218 segregants).]

SLIDE 54

Modeling unobserved cellular phenotypes in genetic analyses: Model

Sparse factor analysis

Probabilistic model

◮ Graphical model for $Y = W \cdot X + \Psi$.
◮ Indicators $z_{g,k}$ determine the sparsity pattern:

\[ P(w_{g,k} \mid z_{g,k} = 0) = \mathcal{N}\big(w_{g,k} \mid 0,\; \sigma_0^2\big) \]
\[ P(w_{g,k} \mid z_{g,k} = 1) = \mathcal{N}\big(w_{g,k} \mid 0,\; \sigma_1^2\big) \]

◮ Prior knowledge is encoded in $\pi_{g,k} = P(z_{g,k} = 1)$.
◮ Standard conjugate priors for the remaining random variables.
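The two-component prior on the weights can be sketched in a few lines. This is an illustrative NumPy example, not the talk's inference code: it assumes a narrow variance `var0` for switched-off links and a broad variance `var1` for active ones, a uniform prior link probability of 0.1, and hypothetical function names.

```python
import numpy as np

def normal_pdf(w, var):
    """Density of N(w | 0, var)."""
    return np.exp(-0.5 * w**2 / var) / np.sqrt(2.0 * np.pi * var)

def posterior_on(w, pi, var0, var1):
    """P(z = 1 | w) when w | z=0 ~ N(0, var0), w | z=1 ~ N(0, var1), P(z=1) = pi."""
    on = pi * normal_pdf(w, var1)
    off = (1.0 - pi) * normal_pdf(w, var0)
    return on / (on + off)

def sample_weights(Pi, var0, var1, rng):
    """Draw indicators Z ~ Bernoulli(Pi), then weights W from the matching Gaussian."""
    Z = rng.random(Pi.shape) < Pi
    std = np.where(Z, np.sqrt(var1), np.sqrt(var0))
    return Z, std * rng.standard_normal(Pi.shape)

rng = np.random.default_rng(0)
G, K = 1000, 5
Pi = np.full((G, K), 0.1)    # hypothetical prior link probabilities pi_{g,k}
Z, W = sample_weights(Pi, var0=1e-4, var1=1.0, rng=rng)

p_small = posterior_on(0.0, 0.5, 1e-4, 1.0)   # a weight at zero favors z = 0
p_large = posterior_on(2.0, 0.1, 1e-4, 1.0)   # a large weight favors z = 1
```

In the model on the slide, entries of `Pi` would come from prior knowledge such as transcription factor binding affinities rather than a constant.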


slide-57
SLIDE 57

Modeling unobserved cellular phenotypes in genetic analyses Model

Sparse factor analysis

Inference

◮ Scale is challenging: thousands of genes, hundreds of TFs.

◮ Efficient deterministic approximate inference:

  1. Variational Bayesian inference for the core model.

  2. Expectation Propagation for the sparsity submodel.


slide-59
SLIDE 59

Modeling unobserved cellular phenotypes in genetic analyses Model

Sparse factor analysis

Comparison to MCMC on (small!) simulated dataset

Recovery of the true simulated network structure.

Stegle et al., in preparation


slide-60
SLIDE 60

Modeling unobserved cellular phenotypes in genetic analyses Model

Sparse factor analysis

Comparison to MCMC on (small!) simulated dataset

Recovery of the true simulated factor activations.

Stegle et al., in preparation


slide-61
SLIDE 61

Modeling unobserved cellular phenotypes in genetic analyses Applications

Application to yeast

Dataset

◮ Applied the approach to 108 yeast strains (crosses), genotyped and expression profiled in 2 conditions (ethanol and glucose).

◮ Alternative types of prior information available:

◮ TF binding affinities (Yeastract).

◮ Pathway information (KEGG).

[Smith and Kruglyak, 2008]


slide-62
SLIDE 62

Modeling unobserved cellular phenotypes in genetic analyses Applications

Application to yeast

Factor associations

Biological hypotheses

◮ Genetic variation (SNPs) may regulate factor activations.

◮ Similarly for the environment condition (glucose/ethanol).

◮ Interaction effects between SNPs, factors and genes.

[Figure: schematic of DNA (SNPs) → transcription → mRNA → translation → proteins, annotated with gene regulation (alternative splicing, small RNAs) and external influences (experimental procedures, environment), both observed and hidden.]


slide-66
SLIDE 66

Modeling unobserved cellular phenotypes in genetic analyses Applications

Application to yeast

Factor associations

◮ Genome-wide association density.

[Figure: number of associations (log scale) along chromosomal position (chromosomes I to XVI); one panel for factors learned with Yeastract, KEGG and freeform priors, one panel for trans genes.]


slide-67
SLIDE 67

Modeling unobserved cellular phenotypes in genetic analyses Applications

Application to yeast

Factor interactions

◮ Interaction tests between all gene/SNP/factor triplets.

[Figure: (a) gene expression matrix (5493 genes × 218 segregants) with the PHO4 factor activation and known targets from Yeastract; (b) association of the PHO4 factor activation with genotype across 2956 segregating loci, shown as segregant counts over PHO4 activation for BY and RM genotypes, association LOD = 21.5 (PHO84 locus); (c) genotype-factor interaction, shown as YJL213W expression against PHO4 activation for BY/RM genotypes in glucose (Glu) and ethanol (Eth), interaction LOD = 35.0 (PHO84 locus).]
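The slides do not specify the test statistic behind the interaction tests. One simple option, shown here only as a hedged sketch, is a LOD score that compares a linear model with additive SNP and factor effects against one that adds a SNP × factor product term. The function names and the Gaussian linear model are assumptions made for illustration, not the method used in the talk.

```python
import numpy as np


def _rss(design, y):
    """Residual sum of squares of the least-squares fit of y on design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def interaction_lod(expr, snp, factor):
    """LOD score for a SNP x factor interaction effect on one gene.

    expr, snp, factor: 1-D arrays of equal length (one entry per sample).
    Null model:        expr ~ 1 + snp + factor
    Alternative model: expr ~ 1 + snp + factor + snp * factor
    With Gaussian residuals and maximum-likelihood variance estimates,
    the log10 likelihood ratio is (n / 2) * log10(RSS_null / RSS_alt).
    """
    expr = np.asarray(expr, dtype=float)
    snp = np.asarray(snp, dtype=float)
    factor = np.asarray(factor, dtype=float)
    n = expr.shape[0]
    ones = np.ones(n)
    null = np.column_stack([ones, snp, factor])
    alt = np.column_stack([ones, snp, factor, snp * factor])
    return 0.5 * n * np.log10(_rss(null, expr) / _rss(alt, expr))
```

Testing all gene/SNP/factor triplets would call `interaction_lod` once per triplet. At that scale, multiple-testing correction and a vectorized implementation become important.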


slide-68
SLIDE 68

Modeling unobserved cellular phenotypes in genetic analyses Applications

Application to yeast

Factor interactions

◮ Example interactions.

(a) YAP1-IRA2 interaction (b) PHO4-PHO84 interaction (c) MAT-YOX1 interaction


slide-69
SLIDE 69

Modeling unobserved cellular phenotypes in genetic analyses Applications

Application to yeast

Factor interactions

◮ Genome-wide interaction density.

[Figure: number of interacting genes along chromosomal position (chromosomes I to XVI) for Yeastract, KEGG and freeform factors and for the environment; labeled loci: OAF1, IRA2, CIN5, PHO84, HAP1, MKT1, AMN1.]


slide-70
SLIDE 70

A unifying view

Outline

Motivation
Dimension reduction and the Gaussian Process Latent Variable Model (GPLVM)
Modeling hidden confounders in GWAs
  Model
  Applications
Modeling unobserved cellular phenotypes in genetic analyses
  Model
  Applications
A unifying view
Summary


slide-71
SLIDE 71

A unifying view

Matrix variate normal distributions

◮ Confounders induce structure between samples.

◮ Gene regulation induces structure between genes.

◮ Matrix variate models for joint correction.

[Figure: data matrix S (samples × genes) with known and hidden confounders.]


slide-74
SLIDE 74

A unifying view

Matrix variate normal distributions

[Figure: data matrix S (samples × genes), with heatmaps of the row covariance $K$ (samples × samples) and the column covariance $\Lambda^{-1}$ (genes × genes).]



slide-79
SLIDE 79

A unifying view

Matrix variate distributions

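◮ Matrix variate distribution with column covariance $\Lambda^{-1}$ and row covariance $K$:

$$p(Y \mid \mu = 0, \Lambda^{-1}, K) \propto \exp\left(-\tfrac{1}{2}\,\mathrm{tr}\left(\Lambda Y^{T} K^{-1} Y\right)\right)$$

◮ Equivalent to a Kronecker covariance structure on the vectorized data:

$$p(\mathrm{vec}\,Y \mid \mu = 0, \Lambda^{-1}, K) = \mathcal{N}\left(\mathrm{vec}\,Y \mid 0, \Lambda^{-1} \otimes K\right)$$

Drawing samples from a matrix variate normal (MN), with $Y$ of size $R \times G$:

  1. Sample $Y \sim \prod_{r=1}^{R} \prod_{g=1}^{G} \mathcal{N}(0, 1)$
  2. $Y = \mathrm{chol}(K) \cdot Y$
  3. $Y = Y \cdot \mathrm{chol}(\Lambda^{-1})^{T}$

Here chol(·) denotes the lower-triangular Cholesky factor.

Kronecker matrix product:

$$A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1H}B \\ a_{21}B & a_{22}B & \cdots & a_{2H}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{G1}B & a_{G2}B & \cdots & a_{GH}B \end{pmatrix}$$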
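The three sampling steps can be sketched in NumPy as follows. The function names are hypothetical and `col_cov` stands for $\Lambda^{-1}$. `np.linalg.cholesky` returns the lower-triangular factor, which is the convention the recipe needs.

```python
import numpy as np


def sample_matrix_normal(K, col_cov, rng):
    """Draw one R x G sample with row covariance K and column covariance col_cov.

    K:       (R, R) positive-definite row (sample) covariance.
    col_cov: (G, G) positive-definite column (gene) covariance, i.e. Lambda^{-1}.
    """
    n_rows, n_cols = K.shape[0], col_cov.shape[0]
    Z = rng.standard_normal((n_rows, n_cols))   # step 1: i.i.d. N(0, 1) entries
    Y = np.linalg.cholesky(K) @ Z               # step 2: impose row covariance
    return Y @ np.linalg.cholesky(col_cov).T    # step 3: impose column covariance


def kron_covariance(K, col_cov):
    """Covariance of vec(Y) under column stacking: Lambda^{-1} kron K."""
    return np.kron(col_cov, K)
```

With column stacking (`Y.flatten(order="F")`), the empirical covariance of many draws should approach `kron_covariance(K, col_cov)`. For large R and G, forming the full Kronecker product is usually avoided because it is an RG × RG matrix.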


slide-80
SLIDE 80

A unifying view

Simulated example

◮ Recovery of simulated network. Weak confounding variation (20% variance in row covariance).

[Figure: precision-recall curves (both axes 0.0 to 1.0) for GLasso, Kronecker GLasso and Ideal GLasso, with network panels for the ground truth, GLasso, Kronecker GLasso and Ideal GLasso; annotation: "better recovery of the true graph".]



slide-83
SLIDE 83

Summary

Summary

◮ Latent variables can have a dramatic effect in GWAs.

◮ In expression studies, confounders can be estimated from data.

◮ Cellular features can be learnt from the expression profiles.

◮ Duality of regulatory relationships and confounders in matrix variate normal models.


slide-84
SLIDE 84

Summary

References I

N. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. The Journal of Machine Learning Research, 6:1783–1816, 2005.

L. Parts, O. Stegle, J. Winn, and R. Durbin. Joint genetic analysis of gene expression data with inferred cellular phenotypes. PLoS Genetics, 7(1):e1001276, 2011.

E. Smith and L. Kruglyak. Gene–environment interaction in yeast gene expression. PLoS Biology, 6(4):e83, 2008.

M. Tipping and C. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society. Series B, Statistical Methodology, pages 611–622, 1999.
