SLIDE 1

eQTL mapping Dataset The model Experiments Conclusions

Estimating the contribution of non-genetic factors to gene expression using Gaussian Process Latent Variable Models

Nicolò Fusi and Neil Lawrence

Learning and Inference in Computational Systems Biology

31st March 2010

SLIDE 2

Outline

1 eQTL mapping
2 Dataset
3 The model
4 Experiments
5 Conclusions

SLIDE 4

Expression Quantitative Trait Loci - eQTL

  • Transcript abundance is regulated by polymorphisms in regulatory elements.
  • Statistical methods can be used to discover which polymorphisms affect the expression level of a gene.
  • This mapping is sometimes obscured by non-genetic factors.
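As a rough illustration of what such a statistical method can look like, the sketch below regresses one gene's expression on each SNP in turn and reports the slope and the fraction of variance explained. This is a generic single-SNP linear regression on simulated data, not the method of the talk. The function name `snp_association`, the 0/1/2 genotype coding, the sample sizes and the causal SNP index are all assumptions made for the example.

```python
import numpy as np

def snp_association(genotypes, expression):
    """Regress one gene's expression on each SNP; return slope and r^2 per SNP."""
    G = genotypes - genotypes.mean(axis=0)   # centre each SNP column
    y = expression - expression.mean()       # centre the expression vector
    sxx = (G ** 2).sum(axis=0)               # per-SNP sum of squares
    sxy = G.T @ y                            # per-SNP cross product with expression
    slopes = sxy / sxx                       # least-squares slope for each SNP
    r2 = sxy ** 2 / (sxx * (y ** 2).sum())   # fraction of variance explained
    return slopes, r2

# Simulated data (hypothetical): 200 individuals, 50 SNPs coded as 0/1/2.
rng = np.random.default_rng(0)
n_individuals, n_snps = 200, 50
genotypes = rng.integers(0, 3, size=(n_individuals, n_snps)).astype(float)
causal = 7                                   # index of the SNP that drives expression
expression = 1.5 * genotypes[:, causal] + rng.normal(scale=0.5, size=n_individuals)

slopes, r2 = snp_association(genotypes, expression)
best = int(np.argmax(r2))                    # SNP with the strongest association
```

On data like this the strongest association falls on the simulated causal SNP. On real data, unmodelled non-genetic variation inflates the residual noise and can hide such signals.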

SLIDE 8

Single Nucleotide Polymorphisms

  • A single nucleotide polymorphism (SNP) is a variation in the DNA sequence that affects only one nucleotide.
  • SNPs make up about 90% of all human genetic variation.
  • SNPs capture 84% of the total genetic variation in gene expression.

SLIDE 11

The HapMap dataset

  • A multi-country effort to identify and catalog genetic similarities and differences in human beings.
  • 3.1 million human single nucleotide polymorphisms have been genotyped.
  • 270 individuals from 4 geographically diverse populations (HapMap phase II).

SLIDE 14

Project GENEVAR - GENe Expression VARiation

  • Gene expression data from EBV-transformed lymphoblastoid cell lines (Stranger et al., Nature Genetics 2007).
  • 270 individuals from HapMap phase I and II.
  • 47,293 gene probes.

SLIDE 18

Confounding factors

Several studies have shown that non-genetic factors can obscure associations:

  • Known factors: age, sex, ethnicity, ...
  • Batch effects: optical effects
  • Unknown factors

SLIDE 21

Modelling non-genetic factors

  • Our model is inspired by Stegle et al., Lecture Notes in Computer Science (2006).
  • We model non-genetic factors as unobserved latent variables.
  • Gene expression levels are described as a linear function of SNP data and non-genetic factors:

$$Y = SV + XW + \mu \mathbf{1}^\top + \epsilon$$
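To make the shapes in the linear model concrete, here is a minimal simulation sketch. It assumes a layout in which the rows of Y index individuals and the columns index genes, so the mean term is written as the outer product of a ones vector with µ (the transpose of the slide's µ1⊤, which matches the opposite layout). All sizes, scales and the 0/1/2 genotype coding are hypothetical choices, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: individuals, genes, SNPs, latent non-genetic factors.
N, D, K, Q = 100, 20, 30, 3

S = rng.integers(0, 3, size=(N, K)).astype(float)  # observed SNP data (0/1/2 coding assumed)
X = rng.normal(size=(N, Q))                        # unobserved non-genetic factors
V = rng.normal(scale=0.1, size=(K, D))             # SNP weights
W = rng.normal(size=(Q, D))                        # latent-factor weights
mu = rng.normal(size=D)                            # per-gene mean expression
eps = rng.normal(scale=0.1, size=(N, D))           # observation noise

# Y = S V + X W + mean term + noise, one row per individual.
mean_term = np.outer(np.ones(N), mu)
Y = S @ V + X @ W + mean_term + eps
```

In a real analysis only Y and S are observed. X, V, W and µ have to be inferred or integrated out, which is what the following slides address.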

SLIDE 30

Dual Probabilistic Principal Component Analysis

We learn the parameters by:

  • Marginalizing W, V, µ and ε
  • Maximizing the log-likelihood with respect to the latent variables (X)

For a particular choice of priors over W and V, this approach is equivalent to probabilistic Principal Component Analysis.
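The equivalence with PCA can be checked numerically in a simplified setting. The sketch below drops the SNP term, fixes the prior variance on W to 1, and uses the closed-form maximum-likelihood solution commonly quoted for dual probabilistic PCA (eigenvectors of the scaled inner-product matrix). Treating that closed form as the one the slide has in mind is an assumption. The function name `dual_ppca` and the simulated data are illustrative only.

```python
import numpy as np

def dual_ppca(Y, q):
    """Closed-form ML latent positions for the covariance X X^T + sigma2 * I."""
    N, D = Y.shape
    Yc = Y - Y.mean(axis=0)                       # remove the per-gene mean
    evals, evecs = np.linalg.eigh(Yc @ Yc.T / D)  # N x N inner-product matrix
    order = np.argsort(evals)[::-1]               # sort eigenvalues in descending order
    evals, evecs = evals[order], evecs[:, order]
    sigma2 = evals[q:].mean()                     # noise variance: mean discarded eigenvalue
    scale = np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return evecs[:, :q] * scale, sigma2           # latent positions, defined up to rotation

# Simulated data (hypothetical): 40 individuals, 200 genes, 2 latent factors.
rng = np.random.default_rng(2)
X_true = rng.normal(size=(40, 2))
Y = X_true @ rng.normal(size=(2, 200)) + 0.1 * rng.normal(size=(40, 200))

X_hat, sigma2 = dual_ppca(Y, q=2)
```

The columns of `X_hat` span the same subspace as the leading principal component scores of the centred data. With additional covariance terms such as the SNP term, this closed form no longer applies directly and X would typically be found by numerical optimisation.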

SLIDE 31

Dual Probabilistic Principal Component Analysis

We put Gaussian priors over W, V and µ:

$$P(W) = \prod_{i=1}^{D} \mathcal{N}(w_i \mid 0, \alpha_w I)$$

$$P(V) = \prod_{i=1}^{D} \mathcal{N}(v_i \mid 0, \alpha_v I)$$

$$P(\mu) = \mathcal{N}(\mu \mid 0, \alpha_\mu I)$$

SLIDE 33

Dual Probabilistic Principal Component Analysis

The likelihood of Y can then be written as

$$P(Y \mid W, V, X, S, \mu) = \prod_{j=1}^{D} \mathcal{N}(y_j \mid W x_j + V s_j + \mu, \sigma^2 I)$$

Marginalizing W, V, µ and ε we obtain the marginal likelihood

$$P(Y \mid X) = \prod_{j=1}^{D} \mathcal{N}(y_j \mid 0, \alpha_w XX^\top + \alpha_v SS^\top + \alpha_\mu + \sigma^2 I)$$
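A minimal sketch of how the marginal log-likelihood could be evaluated numerically is given below. It assumes Y has one row per individual and one column per gene, and it reads the scalar α_µ term as α_µ times an all-ones matrix, which is what marginalizing a shared mean would give. That reading, the function name and the simulated inputs are assumptions for illustration.

```python
import numpy as np

def marginal_log_likelihood(Y, X, S, alpha_w, alpha_v, alpha_mu, sigma2):
    """Sum over genes of log N(y_j | 0, C), with C shared by all genes."""
    N, D = Y.shape
    C = (alpha_w * (X @ X.T) + alpha_v * (S @ S.T)
         + alpha_mu * np.ones((N, N))            # assumed form of the alpha_mu term
         + sigma2 * np.eye(N))
    L = np.linalg.cholesky(C)                    # C = L L^T
    A = np.linalg.solve(L, Y)                    # L^{-1} Y, so sum(A**2) = tr(Y^T C^{-1} Y)
    log_det = 2.0 * np.log(np.diag(L)).sum()     # log |C| from the Cholesky factor
    quad = (A ** 2).sum()
    return -0.5 * (D * N * np.log(2 * np.pi) + D * log_det + quad)

# Small simulated example (hypothetical sizes and hyperparameters).
rng = np.random.default_rng(4)
Y = rng.normal(size=(15, 8))
X = rng.normal(size=(15, 2))
S = rng.integers(0, 3, size=(15, 5)).astype(float)

ll = marginal_log_likelihood(Y, X, S, alpha_w=1.0, alpha_v=0.5, alpha_mu=0.1, sigma2=0.2)
```

Because the covariance is the same for every gene, a single Cholesky factorisation serves all D columns. Maximizing this quantity with respect to X (and the hyperparameters) is the learning step described on the earlier slide.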

SLIDE 35

Population structure

SLIDE 36

Accounting for population structure

$$C = \alpha_w XX^\top + \alpha_v SS^\top + \alpha_\mu + \sigma^2 I$$

$$C = \alpha_w XX^\top + \alpha_v SS^\top + \alpha_p PP^\top + \alpha_\mu + \sigma^2 I$$

$$C = \alpha_w XX^\top + \alpha_v SS^\top + \alpha_p PP^\top + \alpha_g GG^\top + \alpha_\mu + \sigma^2 I$$
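The additive structure of these covariances can be sketched as a small helper that sums one weighted term per source of variation. The slide does not define P and G, so the example treats them as generic design matrices built from one-hot encodings of made-up group labels; that choice, the function names and all numeric values are assumptions. As in the marginal likelihood, the α_µ term is read here as α_µ times an all-ones matrix.

```python
import numpy as np

def build_covariance(terms, alpha_mu, sigma2):
    """terms: list of (alpha, F) pairs; each pair adds alpha * F F^T to C."""
    N = terms[0][1].shape[0]
    C = alpha_mu * np.ones((N, N)) + sigma2 * np.eye(N)
    for alpha, F in terms:
        C = C + alpha * (F @ F.T)
    return C

def one_hot(labels):
    """Indicator matrix with one column per distinct label."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    return (labels[:, None] == classes[None, :]).astype(float)

rng = np.random.default_rng(5)
N = 6
X = rng.normal(size=(N, 2))                         # latent factors
S = rng.integers(0, 3, size=(N, 4)).astype(float)   # SNP data
P = one_hot([0, 0, 1, 1, 2, 2])                     # hypothetical group labels for P
G = one_hot([0, 1, 0, 1, 0, 1])                     # hypothetical group labels for G

base = [(1.0, X), (0.5, S)]
C_base = build_covariance(base, alpha_mu=0.1, sigma2=0.2)
C_pop = build_covariance(base + [(0.3, P)], alpha_mu=0.1, sigma2=0.2)
C_full = build_covariance(base + [(0.3, P), (0.2, G)], alpha_mu=0.1, sigma2=0.2)
```

With one-hot indicators, P P^T is 1 for pairs of individuals in the same group and 0 otherwise, so the added term raises the expected covariance only within a group.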

SLIDE 40

eQTL scan using data from HapMap and GENEVAR

At each locus we compute the log-odds score:

$$L_i = \log_{10} \left\{ \prod_n \frac{P(Y_m \mid s_{n,j}, \theta_{i,n})}{P(Y_m \mid \theta_{\mathrm{bkg}})} \right\} \qquad (1)$$

The significance of an association is evaluated via permutation testing.
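The scan and the permutation test can be sketched with a simplified stand-in for the score above. Here both the genetic and the background model are Gaussian with maximum-likelihood parameters, which gives the classical LOD score (n/2)·log10(RSS_bkg / RSS_snp). This is a swapped-in simplification, not the exact likelihoods used in the talk. Function names, sample sizes and the number of permutations are assumptions.

```python
import numpy as np

def lod_scores(genotypes, y):
    """log10 likelihood ratio of a per-SNP linear model against a mean-only background."""
    n = y.shape[0]
    G = genotypes - genotypes.mean(axis=0)
    yc = y - y.mean()
    rss0 = (yc ** 2).sum()                     # background model: mean only
    sxy = G.T @ yc
    sxx = (G ** 2).sum(axis=0)
    rss1 = rss0 - sxy ** 2 / sxx               # residual sum of squares after fitting each SNP
    return 0.5 * n * np.log10(rss0 / rss1)

def permutation_pvalue(genotypes, y, n_perm=200, seed=0):
    """Empirical p-value of the best LOD score under shuffled expression values."""
    rng = np.random.default_rng(seed)
    observed = lod_scores(genotypes, y).max()
    null_max = np.array([lod_scores(genotypes, rng.permutation(y)).max()
                         for _ in range(n_perm)])
    p_value = (1 + (null_max >= observed).sum()) / (1 + n_perm)
    return observed, p_value

# Simulated data (hypothetical): 150 individuals, 40 SNPs, SNP 12 is causal.
rng = np.random.default_rng(3)
genotypes = rng.integers(0, 3, size=(150, 40)).astype(float)
y = 1.0 * genotypes[:, 12] + rng.normal(scale=0.7, size=150)

lod = lod_scores(genotypes, y)
observed, p_value = permutation_pvalue(genotypes, y)
```

Shuffling the expression values breaks any link with the genotypes while keeping both marginal distributions intact. Taking the maximum score over all SNPs in each permutation accounts for testing many loci at once.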

SLIDE 41

Traditional eQTL scan

SLIDE 42

eQTL scan accounting for non-genetic factors

SLIDE 46

Conclusions

  • We presented a model that explicitly accounts for non-genetic factors.
  • Using this model we can detect a higher number of significant associations.
  • Many extensions are possible (future work).