BTRY 7210: Topics in Quantitative Genomics and Genetics Jason Mezey - - PowerPoint PPT Presentation



slide-1
SLIDE 1

BTRY 7210: Topics in Quantitative Genomics and Genetics

Jason Mezey
Biological Statistics and Computational Biology (BSCB)
Department of Genetic Medicine
jgm45@cornell.edu
March 12, 2015

slide-2
SLIDE 2

Lecture 4: What we can infer about eQTL and by leveraging eQTL?

slide-3
SLIDE 3
Areas where we can use genome-wide data analysis to learn about eQTL

  • Identifying causal genotype candidates

  • Leveraging eQTL to identify new regulatory relationships

  • Learning about the conditional dependencies of eQTL

  • Inferring the possible impacts of eQTL on complex phenotypes

slide-4
SLIDE 4

Reminder: an eQTL

[Figure: an eQTL. ERAP2 expression (axis range 3.5 to 6.0) plotted by rs27290 genotype (A/A, A/G, G/G)]

$A_1 \rightarrow A_2 \Rightarrow \Delta Y$
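An eQTL means that swapping one allele for the other at a locus shifts expression on average. A minimal sketch of how such a shift is commonly estimated, assuming an additive coding of genotype (0, 1, or 2 copies of the second allele) and a simple linear regression; the data are simulated, and the effect size, noise level, and function name `fit_eqtl` are illustrative assumptions, not values from the lecture:

```python
import random

def fit_eqtl(dosage, expression):
    """Least-squares fit of expression = intercept + slope * dosage."""
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, expression))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical simulated data (NOT the ERAP2 / rs27290 measurements).
rng = random.Random(0)
true_effect = 0.8  # assumed shift in expression per copy of the second allele
dosage = [rng.choice([0, 1, 2]) for _ in range(500)]
expression = [4.0 + true_effect * g + rng.gauss(0.0, 0.3) for g in dosage]

intercept, slope = fit_eqtl(dosage, expression)
print(f"estimated expression change per allele copy: {slope:.2f}")
```

The slope plays the role of the change in expression per allele substitution; a genome-wide analysis would repeat a test like this for each genotype-gene pair.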

slide-5
SLIDE 5

Can we identify the causal genotype / polymorphism?

  • Given the right study conditions and available data, yes (sometimes)!

[Figure, panels A to C: ERAP2 eQTL association on chr5 (96.1 to 96.5 Mb). Recoverable labels: −log10(p) in Geuvadis Europeans and in HapMap LWK; minor allele frequency; phased haplotypes (A/A, A/B, B/A, B/B) for Geuvadis individuals sorted by genotype, zoomed to 96.251 to 96.254 Mb; candidate variant rs2927608 G>A within an NRSF binding motif]
slide-6
SLIDE 6

Can we identify the causal genotype / polymorphism?

  • Given the right study conditions and available data, yes (sometimes)!

[Figure: expression of ERAP2, LRRC37A2, NT5C3B, SNHG5, and WBSCR27 in Geuvadis individuals (sorted by genotype). Labeled variants: rs10233171 T>C, rs1059307 G>T, 25.1kb CNV (esv3640682-3), rs9897034 C>T, rs2927608 G>A. Labeled sequence motifs: SP1, Nkx2-5, YY1, and NRSF binding motifs]
slide-7
SLIDE 7

Can we identify the causal genotype / polymorphism?

  • The only case of “perfect” experimental validation of an eQTL to date (Lee et al. 2014 Science; 343:119):

[Figure: luciferase reporter assays (relative fold change, minor vs. major allele) for SLFN5: rs11080327, ARL5B: rs2130531, and CLEC4F: rs35856355; and SLFN5 expression, log2(IFN /baseline), in wt HEK293 vs. SLFN5 A>G HEK293 cells]

slide-8
SLIDE 8
eQTL network analysis using probabilistic graphical models

  • We assume a graphical model G = (V, E) describes the joint distribution, where each random variable is represented as a vertex and each edge represents a conditional relationship:

$\Pr(Y) = \Pr(Y_1, Y_2, \ldots, Y_k)$

$V = \{Y_1, Y_2, \ldots, Y_k\}$

$E(Y_i, Y_j) \iff \Pr(Y_i \mid Y_j) \neq \Pr(Y_i)$

[Figure: example graphs over the nodes yj, yk, and z]
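To make the edge criterion concrete, here is a small sketch that reads an edge between two variables as "conditioning on one changes the distribution of the other". It checks that criterion on a joint probability table for two discrete variables. The function name `is_dependent`, the tolerance, and both toy tables are assumptions made for illustration:

```python
from itertools import product

def is_dependent(joint, tol=1e-9):
    """joint[(yi, yj)] holds Pr(Yi = yi, Yj = yj).

    Returns True if Pr(Yi | Yj) differs from Pr(Yi) for some pair of values,
    i.e. if an edge between Yi and Yj would be drawn under this reading.
    """
    yi_vals = sorted({a for a, _ in joint})
    yj_vals = sorted({b for _, b in joint})
    p_yi = {a: sum(joint.get((a, b), 0.0) for b in yj_vals) for a in yi_vals}
    p_yj = {b: sum(joint.get((a, b), 0.0) for a in yi_vals) for b in yj_vals}
    for a, b in product(yi_vals, yj_vals):
        if p_yj[b] == 0:
            continue  # conditional probability undefined for this value
        conditional = joint.get((a, b), 0.0) / p_yj[b]
        if abs(conditional - p_yi[a]) > tol:
            return True
    return False

# Toy table that factorizes as Pr(Yi) * Pr(Yj): no edge.
independent_table = {(0, 0): 0.28, (0, 1): 0.12, (1, 0): 0.42, (1, 1): 0.18}
# Toy table where the two variables tend to agree: edge.
dependent_table = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

print(is_dependent(independent_table), is_dependent(dependent_table))
```

With expression data the probabilities are not known and dependence has to be tested statistically, so this only illustrates the definition.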

slide-9
SLIDE 9

Causal network graphical modeling

  • We could discover network relationships with directed models, which would imply the direction of causality. However, there is an equivalence class problem.
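One way to see the equivalence class problem: for two expression traits under a linear Gaussian model, the graph yj → yk and the graph yk → yj reach exactly the same maximized likelihood, so the data alone cannot choose a direction. The sketch below assumes that model family and uses simulated data; the function names are illustrative:

```python
import math
import random

def gaussian_max_loglik(n, variance):
    """Maximized Gaussian log-likelihood for n values with MLE variance."""
    return -0.5 * n * (math.log(2 * math.pi * variance) + 1)

def directed_loglik(parent, child):
    """Maximized log-likelihood of the linear Gaussian model parent -> child."""
    n = len(parent)
    mp, mc = sum(parent) / n, sum(child) / n
    var_p = sum((p - mp) ** 2 for p in parent) / n
    var_c = sum((c - mc) ** 2 for c in child) / n
    cov = sum((p - mp) * (c - mc) for p, c in zip(parent, child)) / n
    residual_var = var_c - cov ** 2 / var_p  # variance left after regressing child on parent
    return gaussian_max_loglik(n, var_p) + gaussian_max_loglik(n, residual_var)

# Simulated so that yj truly drives yk.
rng = random.Random(1)
yj = [rng.gauss(0.0, 1.0) for _ in range(300)]
yk = [0.7 * a + rng.gauss(0.0, 1.0) for a in yj]

loglik_j_to_k = directed_loglik(yj, yk)
loglik_k_to_j = directed_loglik(yk, yj)
print(loglik_j_to_k, loglik_k_to_j)
```

Both factorizations depend on the data only through the determinant of the sample covariance matrix, which is why the two values agree up to rounding even though the simulation has a definite causal direction.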

slide-10
SLIDE 10

By leveraging an eQTL, we can reduce some of the equivalencies

[Figure, panels A to D: candidate network models that include an eQTL, labeled: unidentified or equivalent models; resolvable in theory (not in practice); resolvable in theory and in practice; no cis-trans regulation]
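A sketch of why an eQTL helps orient an edge: genotype is not caused by expression, so it can serve as an anchor. If the genotype affects a trans gene only through a cis gene, the genotype-trans association should vanish once the cis gene is conditioned on, while the genotype-cis association should survive conditioning on the trans gene. The simulation below assumes that chain (genotype → cis → trans) with linear effects and uses partial correlation as the conditioning tool; all parameter values are made up for illustration:

```python
import math
import random

def corr(x, y):
    """Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(42)
n = 5000
# Genotype coded as the number of minor alleles (assumed frequency 0.3).
genotype = [(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(n)]
cis = [1.0 * g + rng.gauss(0.0, 1.0) for g in genotype]
trans = [0.8 * c + rng.gauss(0.0, 1.0) for c in cis]

r_gt_given_cis = partial_corr(genotype, trans, cis)
r_gc_given_trans = partial_corr(genotype, cis, trans)
print(f"genotype-trans given cis: {r_gt_given_cis:.3f}")
print(f"genotype-cis given trans: {r_gc_given_trans:.3f}")
```

The asymmetry between the two partial correlations is what distinguishes cis → trans from trans → cis here. With weak effects or small samples the same contrast can be too noisy to use, which is one reading of "resolvable in theory (not in practice)".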

slide-11
SLIDE 11

Some examples...

[Figure: cis-gene to trans-gene relationships for eQTL, by dataset. Datasets: Skin (MuTHER), Skin (GTEx), Lung (TCGA Ctrl), Lung (GTEx), Blood (GTEx), Blood (DGN), B Cells (MuTHER), B Cells (Geuvadis), Adipose (MuTHER), Adipose (GTEx), Breast (GTEx), Breast (TCGA Ctrl). Genes shown: KRTAP212, MRPS18B, SLC41A2, SELK, CLEC5A, PAPPA, P2RY2, SNRPC, MAPK8IP1, DCAKD, LINC00937, PRSS53, LOC100130264, CDYL2, ITGB3, ERAP2, LRRC37A2. Legend: relationship present / relationship not present; positive correlation / negative correlation; cis-gene, trans-gene, euQTL]

slide-12
SLIDE 12
Conceptually: what are the impacts of eQTL we are measuring?

  • In most human genome-wide eQTL studies, expression is measured in a “tissue” (very broadly defined, e.g. blood, skin...) sampled under uncontrolled conditions (i.e. in vivo)

  • This means each sample contains an unknown mixture of cell populations and is subject to unknown factors (e.g. exposure of individuals in the study to X, etc.)

  • Many eQTL cannot be replicated, which suggests that many are conditionally dependent, i.e. they exist only under specific (often unknown) conditions

  • Are such results useful? When are they useful?
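A small simulation can illustrate how an unmeasured condition makes an eQTL hard to replicate. It assumes a hypothetical binary condition (for example a cell state or an exposure) under which the genotype effect is present, and absent otherwise. All values are invented for illustration:

```python
import random

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

rng = random.Random(7)
n = 4000
genotype = [(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(n)]
# Hidden condition, present in about half of the samples (an assumption).
condition = [rng.random() < 0.5 for _ in range(n)]
# The eQTL effect (1.0 per allele) exists only when the condition holds.
expression = [(1.0 * g if z else 0.0) + rng.gauss(0.0, 1.0)
              for g, z in zip(genotype, condition)]

def stratum_slope(flag):
    xs = [g for g, z in zip(genotype, condition) if z == flag]
    ys = [y for y, z in zip(expression, condition) if z == flag]
    return slope(xs, ys)

slope_on = stratum_slope(True)
slope_off = stratum_slope(False)
slope_pooled = slope(genotype, expression)
print(f"condition present: {slope_on:.2f}, absent: {slope_off:.2f}, pooled: {slope_pooled:.2f}")
```

The pooled estimate lands between the two strata and scales with how common the condition is in the sample, so a second study with a different (unknown) mix of conditions can fail to detect the same eQTL.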
slide-13
SLIDE 13

If we find that eQTL replicate for a known condition...

[Figure: replication of cis-eQTL across datasets. cis-eQTL genes: ERAP2, WBSCR27, LRRC37A2, ZNF266, FAM118A, CPNE1, NT5C3B, XKR9, SNHG5, KANSL1AS1. Datasets: Blood (DGN), Blood (GTEx), Fibroblast (GTEx), Breast (GTEx), Breast (TCGA Ctrl), Thyroid (GTEx), Skin (GTEx), Nerve (GTEx), Muscle (GTEx), Lung (GTEx), Skin (MuTHER), Lung (TCGA Ctrl), B Cells (MuTHER), B Cells (Geuvadis), Artery (GTEx), Adipose (MuTHER), Adipose (GTEx). Legend: dataset-wide Bonferroni significant; locally Bonferroni significant; gene not measured. Annotations: e.g. all conditions; e.g. for a specific tissue]

[Figure: the cis-gene to trans-gene relationship panel from the previous slide, repeated]
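The figure legend distinguishes two Bonferroni significance levels. The sketch below shows one plausible reading, which is an assumption and not confirmed by the slides: "dataset-wide" corrects for every test run in a dataset, while "locally" corrects only for the tests in the region around one gene. The test counts are hypothetical:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def classify(p_value, n_tests_dataset, n_tests_local, alpha=0.05):
    """Label a p-value by the strictest correction it passes (assumed scheme)."""
    if p_value <= bonferroni_threshold(alpha, n_tests_dataset):
        return "dataset-wide"
    if p_value <= bonferroni_threshold(alpha, n_tests_local):
        return "local"
    return "not significant"

# Hypothetical counts: 1e9 genotype-gene tests overall, 1e3 tests near one gene.
N_DATASET, N_LOCAL = 10**9, 10**3
for p in (1e-12, 1e-6, 1e-3):
    print(p, classify(p, N_DATASET, N_LOCAL))
```

A replication attempt for a known eQTL only needs the local correction, because the hypothesis was fixed in advance by the discovery dataset.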

slide-14
SLIDE 14
What are known conditional dependencies useful for?

  • If we are sure that eQTL (ideally we know the causal alleles) are responsible for a change in gene expression, we at least know this is functional genetic variation, i.e. it at LEAST impacts gene expression under certain conditions

  • Such eQTL may also have effects on many other phenotypes, e.g. many (many) studies are finding that eQTL and GWAS hits co-locate in the genome, suggesting a common genetic basis

  • If an eQTL is specific to a tissue (or to a specific condition), this might help draw a connection to a specific disease
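As a first-pass illustration of eQTL and GWAS hits co-locating, the sketch below pairs hits that fall on the same chromosome within a fixed distance. The window size, hit names, and coordinates are all hypothetical, and positional proximity is only a screening step, not a formal test of a shared causal variant:

```python
def colocated(eqtl_hits, gwas_hits, window_bp=100_000):
    """Return (eqtl_name, gwas_name) pairs on the same chromosome within window_bp."""
    pairs = []
    for e_name, e_chrom, e_pos in eqtl_hits:
        for g_name, g_chrom, g_pos in gwas_hits:
            if e_chrom == g_chrom and abs(e_pos - g_pos) <= window_bp:
                pairs.append((e_name, g_name))
    return pairs

# Made-up hits as (name, chromosome, position in bp).
eqtl_hits = [("eqtl_1", "chr5", 96_250_000), ("eqtl_2", "chr17", 44_600_000)]
gwas_hits = [("gwas_a", "chr5", 96_300_000), ("gwas_b", "chr2", 10_000_000)]

pairs = colocated(eqtl_hits, gwas_hits)
print(pairs)
```

Restricting the eQTL list to a single tissue or condition before the comparison is one way to look for the tissue-specific disease connections mentioned in the last bullet.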

slide-15
SLIDE 15

That’s it for today!