Predicting tissue-specific effects of rare genetic variants



slide-1
SLIDE 1

Predicting tissue-specific effects of rare genetic variants

Farhan Damani Biological Data Sciences 2016

slide-2
SLIDE 2

Goal: develop a framework to predict tissue-specific regulatory effects of rare variants

slide-3
SLIDE 3

Rare variants are abundant and potentially high-impact

Rare variants are defined as those with minor allele frequency < 1%

[Figure: number of variants by minor allele frequency. Eynard et al. BMC Genetics 2015]

Enriched for deleterious functional classes

[Figure: CADD score and DAF. Kircher et al. Nature Genetics 2014]

Slide: Alexis Battle
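
The 1% cutoff above can be illustrated with a minimal sketch. It assumes genotypes are coded as alternate-allele counts (0, 1, or 2) per diploid individual at a biallelic site; the function names and the in-cohort frequency estimate are hypothetical choices for illustration, not the pipeline used in the talk.

```python
def minor_allele_frequency(genotypes):
    """Return the minor allele frequency at one biallelic site.

    `genotypes` holds alternate-allele counts per diploid individual
    (0, 1, or 2); None marks a missing call and is skipped.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this site")
    alt_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_freq, 1.0 - alt_freq)


def is_rare(genotypes, threshold=0.01):
    """True if the site is polymorphic with MAF below `threshold`."""
    maf = minor_allele_frequency(genotypes)
    return 0.0 < maf < threshold


# One heterozygous carrier among 114 individuals: MAF = 1 / 228.
singleton = [0] * 113 + [1]
print(minor_allele_frequency(singleton), is_rare(singleton))
```

In a cohort of this size the in-cohort estimate is coarse, so a frequency from a larger reference panel could be substituted for `minor_allele_frequency` without changing the filter.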

slide-4
SLIDE 4

Tissue-specific functionality

  • Understanding tissue-specific consequences of noncoding genetic variation is critical to understanding complex traits

Aguet et al. bioRxiv 2016

[Figure: overlap of functional common variants, by cell type and tissue type]

Backenroth et al. bioRxiv 2016

slide-5
SLIDE 5

Challenges

  • Even fewer reliable labels in the tissue-specific setting
  • Each individual tissue has low sample size (RNA-seq)
  • Limited samples for each rare SNV
slide-6
SLIDE 6

GTEx Project Data

[Figure: 44 tissues; 522 individuals (RNA-seq samples); 148 individuals (WGS)]

  • WGS from 148 donors
  • 114 of European ancestry used here
  • 8555 RNA-seq samples from 44 tissues from 522 donors
slide-7
SLIDE 7

Expression outliers

Li et al. The impact of rare variation. bioRxiv http://biorxiv.org/content/early/2016/09/09/074443

What are expression outliers?

Enrichment of functional variants among outliers
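
One common way to make "expression outlier" concrete is a z-score rule, sketched below. The slide does not state the exact definition, so the median-across-tissues summary, the threshold of 2, and the function names are all assumptions for illustration.

```python
import numpy as np


def zscore_per_gene(expr):
    """Standardize each gene across individuals, separately per tissue.

    expr has shape (n_tissues, n_individuals, n_genes); NaN marks a
    tissue that was not sampled for that individual.
    """
    mean = np.nanmean(expr, axis=1, keepdims=True)
    std = np.nanstd(expr, axis=1, keepdims=True)
    std = np.where(std > 0, std, np.nan)  # avoid dividing by zero
    return (expr - mean) / std


def call_outliers(expr, threshold=2.0):
    """Flag (individual, gene) pairs whose median z-score is extreme.

    The median across tissues and the threshold are assumed choices.
    """
    z = zscore_per_gene(expr)
    median_z = np.nanmedian(z, axis=0)  # shape (n_individuals, n_genes)
    return np.abs(median_z) >= threshold, median_z


rng = np.random.default_rng(0)
expr = rng.normal(size=(5, 50, 3))   # 5 tissues, 50 individuals, 3 genes
expr[:, 7, 1] += 8.0                 # individual 7 is extreme for gene 1
flags, median_z = call_outliers(expr)
print(flags[7, 1])
```

Restricting the median to a subset of tissues gives a tissue-group version of the same call, at the cost of a noisier estimate.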

slide-8
SLIDE 8

Genomic features

(1) Regulatory elements

(2) Variant predictor summary statistics

  • Variant effect predictor
  • CADD
  • DANN
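
As a rough illustration of "variant predictor summary statistics", the sketch below collapses the rare variants near one gene in one individual into a fixed-length feature record. The `RareVariant` fields, the use of the maximum as the summary, and the zero defaults are hypothetical; the talk does not specify which statistics were used.

```python
from dataclasses import dataclass


@dataclass
class RareVariant:
    # Hypothetical per-variant annotations for illustration only.
    cadd: float
    dann: float
    in_promoter: bool
    in_enhancer: bool


def summarize_features(variants):
    """Summarize all rare variants near a gene into one feature dict."""
    if not variants:
        return {"n_variants": 0, "max_cadd": 0.0, "max_dann": 0.0,
                "any_promoter": 0, "any_enhancer": 0}
    return {
        "n_variants": len(variants),
        "max_cadd": max(v.cadd for v in variants),
        "max_dann": max(v.dann for v in variants),
        "any_promoter": int(any(v.in_promoter for v in variants)),
        "any_enhancer": int(any(v.in_enhancer for v in variants)),
    }


nearby = [
    RareVariant(cadd=12.5, dann=0.91, in_promoter=False, in_enhancer=True),
    RareVariant(cadd=3.2, dann=0.40, in_promoter=False, in_enhancer=False),
]
features = summarize_features(nearby)
```

A fixed-length record per (individual, gene) pair is what lets samples with different numbers of rare variants share one set of coefficients.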
slide-9
SLIDE 9

Genomic features

ENCODE Project Consortium. PLoS Biology 2011.

  • Tissue-specific promoters/enhancers
  • Conservation scores
  • Transcription factor binding sites
  • CpG sites
  • ChromHMM
slide-10
SLIDE 10

Related work on tissue-shared effects

Li et al. The impact of rare variation. bioRxiv http://biorxiv.org/content/early/2016/09/09/074443

slide-11
SLIDE 11

Learning tissue-specific effects as individual tasks

Brain, Muscles, Epithelial, Digestive, Artery+Fats

λ1, λ2, λ3, λ4, λ5

slide-12
SLIDE 12

Learning tissue-specific effects as individual tasks

Brain, Muscles, Epithelial, Digestive, Artery+Fats

λ1, λ2, λ3, λ4, λ5

Expression outliers are noisier when based on smaller sets of tissues

slide-13
SLIDE 13

Graphical model

[Figure: plate diagram with nodes g, r, e, q and plates N and M; legend: unobserved, observed]

Boxes represent replicates…

  • M tissues
  • N individual-by-gene samples

slide-14
SLIDE 14

Graphical model

Sample-level component

[Figure: nodes g and r inside plate N; legend: unobserved, observed]

  • Genomic annotations
  • Presence of rare regulatory variant
  • Genomic annotations coefficients

slide-15
SLIDE 15

Graphical model

Sample-level component

[Figure: nodes g, r, e, q inside plate N; legend: unobserved, observed]

  • Leak probability
  • Presence of common variant
  • Gene expression
  • Expression-covariate parameter

slide-16
SLIDE 16

Graphical model

Tissue-specific influence

[Figure: nodes g, r, e, q with plates N and M; legend: unobserved, observed]

  • Tissue-specific genomic annotations coefficient
  • Tissue-specific transfer parameter
  • Global genomic annotations coefficient

slide-17
SLIDE 17

Graphical model

Global influence

[Figure: nodes g, r, e, q with plates N and M; legend: unobserved, observed]

  • Global genomic annotations coefficient
  • Global transfer parameter

slide-18
SLIDE 18

Graphical model

[Figure: nodes g, r, e, q with plates N and M; legend: unobserved, observed]

We want to infer p(regulatory variant | data) …
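
For a single sample, a posterior of this kind can be computed exactly by enumerating the two states of the latent variable. The sketch below assumes a logistic prior on r given the annotations g and a noisy-OR likelihood for a binary expression-outlier indicator e given r and a common-variant indicator q. These functional forms and every parameter name (`beta`, `leak`, `act_r`, `act_q`) are assumptions suggested by the slide labels, not the talk's stated equations.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def prior_regulatory(g, beta):
    """p(r = 1 | g): logistic function of the annotations (assumed form)."""
    return sigmoid(sum(b * x for b, x in zip(beta, g)))


def noisy_or(r, q, leak, act_r, act_q):
    """p(e = 1 | r, q): noisy-OR over two binary causes plus a leak term."""
    p_off = (1.0 - leak) * (1.0 - act_r) ** r * (1.0 - act_q) ** q
    return 1.0 - p_off


def posterior_regulatory(g, e, q, beta, leak, act_r, act_q):
    """p(r = 1 | g, e, q), obtained by enumerating r in {0, 1}."""
    prior1 = prior_regulatory(g, beta)
    joint = {}
    for r, prior in ((0, 1.0 - prior1), (1, prior1)):
        p_e1 = noisy_or(r, q, leak, act_r, act_q)
        likelihood = p_e1 if e == 1 else 1.0 - p_e1
        joint[r] = prior * likelihood
    return joint[1] / (joint[0] + joint[1])


# With a flat prior (beta = 0), observing an outlier raises the posterior.
post_outlier = posterior_regulatory([1.0, 0.0], 1, 0, [0.0, 0.0], 0.05, 0.8, 0.3)
post_normal = posterior_regulatory([1.0, 0.0], 0, 0, [0.0, 0.0], 0.05, 0.8, 0.3)
```

Because r is binary, the normalizing sum has only two terms, which is why inference over it can be exact rather than approximate.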

slide-19
SLIDE 19

Objective function

slide-20
SLIDE 20

Objective function

slide-21
SLIDE 21

Hyperparameter setting

  • Categorical distribution

(transfer parameters) (leak probability) Bootstrap estimation:

slide-22
SLIDE 22

Optimizing the objective using EM

  • Expectation step: exact inference
  • Maximization step: coordinate gradient descent, NoisyOr update
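
The EM loop can be sketched on a deliberately simplified, single-tissue version of the model: a latent binary r with a logistic prior on annotations G, and a binary outlier indicator e. Two substitutions are made for brevity and should not be read as the talk's method: plain gradient ascent stands in for coordinate gradient descent, and a closed-form Bernoulli emission update stands in for the NoisyOr update. All names and the simulated data are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def e_step(G, e, beta, theta):
    """Exact posterior p(r = 1 | g, e); theta[k] = p(e = 1 | r = k)."""
    prior1 = sigmoid(G @ beta)
    joint1 = prior1 * np.where(e == 1, theta[1], 1.0 - theta[1])
    joint0 = (1.0 - prior1) * np.where(e == 1, theta[0], 1.0 - theta[0])
    return joint1 / (joint0 + joint1)


def m_step(G, e, post, beta, lr=0.5, n_grad_steps=10):
    """Update beta by gradient ascent and theta in closed form."""
    for _ in range(n_grad_steps):
        grad = G.T @ (post - sigmoid(G @ beta)) / len(e)
        beta = beta + lr * grad
    theta = np.array([
        np.sum((1.0 - post) * e) / np.sum(1.0 - post),
        np.sum(post * e) / np.sum(post),
    ])
    return beta, theta


def log_likelihood(G, e, beta, theta):
    """Observed-data log-likelihood with r marginalized out."""
    prior1 = sigmoid(G @ beta)
    p_e1 = prior1 * theta[1] + (1.0 - prior1) * theta[0]
    return float(np.sum(np.where(e == 1, np.log(p_e1), np.log(1.0 - p_e1))))


def fit_em(G, e, n_iter=50):
    beta = np.zeros(G.shape[1])
    theta = np.array([0.1, 0.8])  # initial guess: outliers likelier if r = 1
    for _ in range(n_iter):
        post = e_step(G, e, beta, theta)
        beta, theta = m_step(G, e, post, beta)
    return beta, theta, e_step(G, e, beta, theta)


# Simulated data: an intercept column plus one annotation score.
rng = np.random.default_rng(0)
n = 500
G = np.column_stack([np.ones(n), rng.normal(size=n)])
r = rng.random(n) < sigmoid(G @ np.array([-1.0, 2.0]))
e = np.where(r, rng.random(n) < 0.8, rng.random(n) < 0.1).astype(int)
beta_hat, theta_hat, post = fit_em(G, e)
```

The initial `theta` breaks the symmetry between the two latent states; starting both emission probabilities at the same value would leave the states indistinguishable.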

slide-23
SLIDE 23

Results

slide-24
SLIDE 24

Allelic imbalance presents strong evidence for regulatory variation

  • Strong evidence of causal cis-regulatory impact
  • Almost all rare variants in our cohort are heterozygous

Battle et al. Genome Research 2013

Zhang et al. Nature Methods 2009: “we found that the variation of allelic ratios in gene expression among different cell lines was primarily explained by genetic variations…”

Yan et al. Science 2002: “We estimated that this approach could confidently identify variations when the differences between expression of the two alleles differed by more than 20%.”
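
Allelic imbalance at a heterozygous site is often summarized as the deviation of the reference-allele read fraction from 0.5. The sketch below uses that definition together with a simple percentile rank; the minimum read depth, the rank definition, and the function names are assumptions for illustration rather than the measures used in the talk.

```python
def allelic_imbalance(ref_reads, alt_reads, min_reads=10):
    """Return |ref / (ref + alt) - 0.5|, or None if coverage is too low.

    `min_reads` is an assumed depth filter, not a value from the talk.
    """
    total = ref_reads + alt_reads
    if total < min_reads:
        return None
    return abs(ref_reads / total - 0.5)


def percentile_rank(value, population):
    """Percentage of `population` values less than or equal to `value`."""
    if not population:
        raise ValueError("population must not be empty")
    return 100.0 * sum(1 for x in population if x <= value) / len(population)


imbalance = allelic_imbalance(30, 10)          # 0.75 - 0.5 = 0.25
rank = percentile_rank(imbalance, [0.0, 0.1, 0.25, 0.4])
```

A value of 0 means both alleles are expressed equally, and 0.5 means only one allele is observed.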

slide-25
SLIDE 25

Posteriors are predictive of allelic imbalance

slide-26
SLIDE 26

Brain Muscle

slide-27
SLIDE 27

Epithelial Artery+Fats

slide-28
SLIDE 28

Digestive

slide-29
SLIDE 29

Our predictions are also confident

slide-30
SLIDE 30

Rare regulatory variant near GCAT

  • Brain: 91.2 percentile allelic imbalance
  • Muscle: 24.75 percentile allelic imbalance

[Figure: P(regulatory variant | data) for Brain and Muscle]

slide-31
SLIDE 31

Conclusion

  • We developed a framework for regulatory rare variant prediction
  • We compared our predictions to measured allelic imbalance
  • Presents an opportunity for researchers with WGS and (limited) RNA-seq to reliably identify functional rare variants

slide-32
SLIDE 32

Thank you!

Battle Lab: Yungil Kim, Ben Strober, Alexis Battle

Montgomery Lab: Xin Li, Joe Davis, Emily Tsang, Zachary Zappala, Stephen Montgomery

GTEx Consortium, Pistritto Fellowship, NIH NIMH, Searle Scholar Program

slide-33
SLIDE 33
slide-34
SLIDE 34

Tissue groups with similar behavior

slide-35
SLIDE 35

Case 1: Extreme expression across tissues

[Figure: gene expression (z-score) by tissue type]

slide-36
SLIDE 36

Model predictions

p(regulatory variant | data) Multi-task: Brain Multi-task: Not Brain RIVER Shared Logistic Regression

slide-37
SLIDE 37

Case 2: Extreme expression in brain tissues

[Figure: gene expression (z-score) by tissue type]

slide-38
SLIDE 38

Model predictions

p(regulatory variant | data) Multi-task: Brain Multi-task: Not Brain RIVER Shared Logistic Regression