Big Data: Genomic Reference Databases to Empower Mendelian Diagnosis



slide-1
SLIDE 1

Big Data: Genomic Reference Databases to Empower Mendelian Diagnosis

Anne O’Donnell-Luria, MD, PhD

Associate Director for Rare Disease Genomics, Broad Institute of MIT and Harvard

Clinical Geneticist, Boston Children’s Hospital

Clinical Pathology Update August 16, 2018

Twitter: AnneOtation

slide-2
SLIDE 2
  • NIH-funded center launched in early 2016 to discover new disease-gene relationships underlying Mendelian disease
  • We work with collaborators who have existing cohorts of patient samples consented for genetic studies and prescreened for some known causes of disease
  • CMG covers the cost of exome sequencing and supports analysis
  • Diagnoses and gene discoveries are pursued and published by the collaborator
  • Commitment to data sharing

seqr analysis software https://cmg.broadinstitute.org/

slide-3
SLIDE 3

Trio exome sequencing

Exons: target <2% of the genome

Bamshad et al., Nature Reviews Genetics (2011) 12, 745-755.

slide-6
SLIDE 6

Clinical exome sequencing is a new tool in our diagnostic toolbox

  • Sequence ~20,000 human genes
  • 10,000 – 30,000 protein coding variants
slide-7
SLIDE 7

What’s in an exome

  • Every genome contains many rare, potentially functional variants
  • ~500 rare missense variants (1/3 of which are predicted damaging by in silico predictors)

  • ~100 LoF variants: ~20 homozygous, ~20 rare
  • ~100 rare variants in known disease genes
  • ~50 reported disease-causing mutations (!)
  • 1-2 de novo coding mutations
  • Unknown number of sequencing errors

How can we identify the pathogenic genetic variant(s) in the sea of benign variants?

slide-8
SLIDE 8

Making sense of one exome requires tens of thousands of exomes (or genomes) to reveal rare variants

vs

Harnessing the power of allele frequency

slide-9
SLIDE 9

Five-fold reduction in number of very rare variants with large reference databases

  • Number of variants remaining in an exome after applying a 0.1% filter across all populations
  • Both size and ancestral diversity increase filtering power

[Figure: number of variants remaining after filtering, with 6K versus 60K people, by ancestry: East Asian, South Asian, Latino, African, European]

Lek et al., Nature, 2016
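A minimal sketch of the 0.1% filter described on this slide, assuming each variant carries a table of per-population allele frequencies. The variant IDs, population labels, and frequencies below are hypothetical illustration values, not ExAC or gnomAD data.

```python
# Hypothetical per-population allele frequencies for three made-up variants.
variants = {
    "var_A": {"AFR": 0.0002, "EAS": 0.0, "EUR": 0.0001},
    "var_B": {"AFR": 0.0300, "EAS": 0.0, "EUR": 0.0004},  # common in one ancestry only
    "var_C": {"AFR": 0.0, "EAS": 0.0, "EUR": 0.0},
}

def passes_filter(pop_afs, threshold=0.001):
    """Keep a variant only if it is below the threshold in every population."""
    return max(pop_afs.values()) < threshold

# Variants that remain after applying the 0.1% filter across all populations.
rare = [vid for vid, afs in variants.items() if passes_filter(afs)]
print(rare)
```

Checking every population rather than a single pooled frequency is what lets ancestral diversity add filtering power: var_B looks rare in two populations but is removed because it is common in a third.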

slide-10
SLIDE 10

Publicly available reference population databases

[Figure: individuals in dataset]

slide-11
SLIDE 11

Publicly available reference population databases

One of the first reference databases

Exomes and low-coverage genomes from sequenced individuals of diverse ancestries

http://www.internationalgenome.org/1000-genomes-browsers/

slide-12
SLIDE 12

Publicly available reference population databases

One of the first reference databases

Exome-sequenced individuals of European and African ancestry, many from common disease cohorts

http://evs.gs.washington.edu/EVS

slide-13
SLIDE 13

Publicly available reference population databases

First aggregated exome reference database with representation of 5 ancestries

Became the standard reference database for molecular diagnostic labs

http://exac.broadinstitute.org/

slide-14
SLIDE 14

Publicly available reference population databases

Largest whole-genome dataset, from the TOPMed project

Restrictions prevent sharing of ancestry or download of the complete dataset

Related individuals in dataset

https://bravo.sph.umich.edu

slide-15
SLIDE 15

Publicly available reference population databases

http://gnomad.broadinstitute.org/

slide-16
SLIDE 16

The genome aggregation database (gnomAD)

  • Data provided by 107 PIs for >138,000 individuals, including 123,136 exomes & 15,496 whole genomes
  • Illumina data, processed through the same pipeline, called jointly
  • Sites VCF of the entire dataset available for download -> can annotate your dataset with allele frequencies
  • Individual-level data not shared & phenotype data not available
  • Cases and controls from common disease studies. No Mendelian disease studies knowingly included.
  • New population (e.g. >5K Ashkenazi Jewish samples)
  • Report the population with the highest allele frequency for each variant (popmax AF)
  • 55% male; mean age 54 years

http://gnomad.broadinstitute.org http://gnomad-beta.broadinstitute.org
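The popmax AF idea above can be sketched as follows, assuming per-population allele counts (AC) and allele numbers (AN) have already been extracted from a sites file. The population labels and counts are hypothetical, and the gnomAD pipeline may define popmax with additional rules that are not reproduced here.

```python
def popmax_af(counts):
    """Return (population, AF) for the population with the highest allele frequency.

    counts maps a population label to (allele count AC, allele number AN).
    Populations with AN == 0 (no genotyped chromosomes) are skipped.
    """
    best_pop, best_af = None, 0.0
    for pop, (ac, an) in counts.items():
        if an == 0:
            continue
        af = ac / an
        if af > best_af:
            best_pop, best_af = pop, af
    return best_pop, best_af

# Hypothetical counts for one variant (not real gnomAD values).
example = {"AFR": (3, 24_000), "ASJ": (12, 10_000), "EUR": (40, 126_000), "EAS": (0, 0)}
pop, af = popmax_af(example)
print(pop, af)
```

Reporting the highest population frequency guards against a variant appearing rare in the pooled dataset while being common in one ancestry group.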

slide-17
SLIDE 17

Ancestry across gnomAD

Ancestry and sex are inferred from principal component analysis (PCA), rather than self-reported

Sample QC removes:
  • Low-quality samples
  • Sex chromosome abnormalities
  • First- and second-degree relatives

Laurent Francioli

PCA computed from 52K SNPs

Populations matched from 40K known-ancestry samples

  • African (12,942)
  • Latino (18,237)
  • Ashkenazi Jewish (5,081)
  • East Asian (9,472)
  • Finnish European (13,046)
  • European (63,416)
  • South Asian (15,450)

slide-18
SLIDE 18

http://gnomad.broadinstitute.org http://gnomad-beta.broadinstitute.org

Konrad Karczewski Matthew Solomonson Ben Weisburd Nick Watts

slide-19
SLIDE 19

gnomad.broadinstitute.org

Also check out gnomad-beta.broadinstitute.org

http://gnomad-beta.broadinstitute.org/gene/CFTR

slide-20
SLIDE 20

http://gnomad-beta.broadinstitute.org/gene/CFTR

slide-21
SLIDE 21

gnomAD variant page

CFTR Phe508del chr7:117199644 ATCT / A

http://gnomad-beta.broadinstitute.org/variant/7-117199644-ATCT-A

Raw read data supporting a variant is available

slide-22
SLIDE 22

gnomAD variant page

CFTR Phe508del chr7:117199644 ATCT / A

European carrier frequency 1:41

63,284 x (1/41) = 1,543

http://gnomad-beta.broadinstitute.org/variant/7-117199644-ATCT-A

slide-23
SLIDE 23

gnomAD variant page

CFTR Phe508del chr7:117199644 ATCT / A

Expect to see 9 homozygotes in 63,000 Europeans

  • Carrier frequency as predicted
  • Severe pediatric-onset disease cases depleted (but not entirely removed)

Do you think the homozygote is a real variant?

  • Review the read data
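The carrier and homozygote expectations quoted for CFTR Phe508del can be reproduced with a short sketch, assuming Hardy-Weinberg proportions and approximating the allele frequency as half the carrier frequency. Both are simplifying assumptions, and the function name is hypothetical.

```python
def expected_counts(n_individuals, carrier_frequency):
    """Expected carriers and homozygotes under Hardy-Weinberg proportions.

    Approximates the allele frequency q as carrier_frequency / 2,
    which is reasonable when q is small.
    """
    q = carrier_frequency / 2
    carriers = n_individuals * carrier_frequency
    homozygotes = n_individuals * q ** 2
    return carriers, homozygotes

# Numbers from the slides: 63,284 Europeans, carrier frequency 1 in 41.
carriers, homozygotes = expected_counts(63_284, 1 / 41)
print(f"expected carriers: {carriers:.1f}")
print(f"expected homozygotes: {homozygotes:.1f}")
```

Under these assumptions the homozygote expectation rounds to 9, so a much smaller observed count is what "depleted (but not entirely removed)" refers to.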
slide-24
SLIDE 24

Homozygous CFTR Phe508del

[Figure: reference sequence, coverage, and raw read data]

CFTR Phe508del homozygote

Large databases allow us to identify these potentially interesting individuals

slide-25
SLIDE 25

Considerations for gnomAD IGV visualization of variants

  • Low confidence loss of function (LC LOF)
  • Poorly aligned regions (ex: low copy repeat)
  • Multinucleotide variants (MNVs)
  • Homopolymer runs
  • Complex indels
  • Somatic mosaicism
slide-26
SLIDE 26

Low confidence loss of function variants

  • LOFTEE flags variants that are unlikely to cause loss of function, for example:

  • Dubious transcript annotation
  • Protein truncating variant near end of the gene
slide-27
SLIDE 27

Poorly aligned regions

[Figure: sequence, coverage, and paired-end reads]

  • Multiple variants in region
  • Different allele balances
  • Raises concern about variants called in this region

slide-29
SLIDE 29
Considerations for gnomAD variants: Homopolymer runs

  • Homopolymer G
  • Indels in these regions are enriched for PCR artifacts
  • But these regions are also enriched for true variants

[Figure: sequence, coverage, and paired-end reads]

slide-30
SLIDE 30

Multinucleotide variants

  • Two variants within 1 codon: considered separately in the VCF but should be interpreted together
  • Multinucleotide variants (MNV)
  • Variant 1: T>C, Ser>Pro (missense)
  • Variant 2: C>A, Ser>* (nonsense)
  • MNP: TC>CA, Ser>Gln (missense)
  • These are flagged in ExAC; working on them for gnomAD
  • Can see a similar situation with complex indels (a deletion and an insertion that maintain the frame)

[Figure: sequence, coverage, and paired-end reads]
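The Ser>Pro, Ser>*, and Ser>Gln outcomes listed above can be checked with a small sketch that translates one codon under each variant alone and under both together. The slide does not give the third base of the codon, so the reference codon TCA is an assumption (TCG would behave the same way); the helper name is hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def apply_snvs(codon, snvs):
    """Apply (position, alt_base) substitutions to a codon; positions are 0-based."""
    bases = list(codon)
    for pos, alt in snvs:
        bases[pos] = alt
    return "".join(bases)

ref = "TCA"        # assumed serine codon
variant_1 = (0, "C")  # T>C at the first codon position
variant_2 = (1, "A")  # C>A at the second codon position

for label, snvs in [("variant 1 alone", [variant_1]),
                    ("variant 2 alone", [variant_2]),
                    ("both together", [variant_1, variant_2])]:
    alt = apply_snvs(ref, snvs)
    print(f"{label}: {ref}>{alt}, {CODON_TABLE[ref]}>{CODON_TABLE[alt]}")
```

Interpreted separately, the second variant looks like a stop-gain; interpreted together on the same haplotype, the codon encodes glutamine, which is why the two calls need to be read as one event.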

slide-31
SLIDE 31

Somatic mosaicism

  • See skewed allele balance
  • Many of these are filtered but not all

[Figure: sequence, coverage, and paired-end reads]

slide-32
SLIDE 32

When a variant is absent from gnomAD, it’s important to determine if that region is covered

Unable to find variant in gnomAD. Possible reasons:

1) This is not the position in the canonical transcript displayed on the browser
2) Position is not covered in gnomAD
3) Variant is not in gnomAD

Look up chromosome coordinate at http://mutalyzer.nl

slide-33
SLIDE 33

Look for the closest variant

Pro273Thr is not present but Pro273Pro is present

65K chromosomes or 32.5K people genotyped at this position

Looking for: chr6:1611497 C > A Pro273Thr

slide-34
SLIDE 34

2015

Evaluating rare variant pathogenicity

slide-35
SLIDE 35

Richards et al., Genet Med, 2015

slide-36
SLIDE 36

Identification of constrained genes

Kaitlin Samocha Mark Daly Konrad Karczewski

Daniel MacArthur

slide-37
SLIDE 37

Identification of constrained genes in ExAC

[Figure: two panels, CONSTRAINED and TOLERANT, each showing Individuals 1 to 6 along a TIME axis]

Kaitlin Samocha

slide-38
SLIDE 38

pLI identifies known haploinsufficient genes for pediatric-onset conditions

JAG1

Alagille syndrome (dominant congenital disorder affecting liver, heart and eyes)

slide-39
SLIDE 39

Probability of loss-of-function (LOF) intolerance: pLI scores

  • Haploinsufficiency: in a diploid organism, having only one functional copy of a gene is insufficient to sustain a wild-type phenotype and leads to an “abnormal” phenotype.
  • pLI >0.9 is considered evidence of haploinsufficiency
  • 3,230 genes have a pLI score >0.9
  • 70% have not been assigned a phenotype in OMIM
  • We predict that loss of function variation in these genes will result in disease or embryonic lethality

slide-40
SLIDE 40

pLI does not identify haploinsufficient genes for adult-onset conditions

Breast and ovarian cancer

BRCA1

Majority of disease impact is post-fertility

slide-41
SLIDE 41

pLI does not identify genes for recessive conditions

Cystic fibrosis (recessive disorder affecting lungs and pancreas)

CFTR

slide-42
SLIDE 42

Missense constraint

JAG1

  • Missense constrained genes have a Z-score > 3
  • ~1800 missense constrained genes
  • Regional missense constraint

https://www.biorxiv.org/content/early/2017/06/12/148353
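The two cutoffs quoted in this part of the talk (pLI > 0.9 and missense Z-score > 3) can be applied together in a short sketch. The gene names and scores below are hypothetical placeholders, not values from ExAC or gnomAD, and the function name is an assumption.

```python
PLI_THRESHOLD = 0.9         # pLI > 0.9: considered evidence of haploinsufficiency
MISSENSE_Z_THRESHOLD = 3.0  # Z-score > 3: missense constrained

def constraint_flags(pli, missense_z):
    """Flag a gene as LoF-intolerant and/or missense-constrained."""
    return {
        "lof_intolerant": pli > PLI_THRESHOLD,
        "missense_constrained": missense_z > MISSENSE_Z_THRESHOLD,
    }

# Hypothetical (pLI, missense Z) pairs for two placeholder genes.
genes = {"GENE_A": (0.99, 4.1), "GENE_B": (0.00, 0.2)}
flags = {name: constraint_flags(pli, z) for name, (pli, z) in genes.items()}
for name, result in flags.items():
    print(name, result)
```

As the adult-onset and recessive examples in the talk show, a gene that fails both flags can still be a disease gene, so these flags support prioritization rather than exclusion.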

slide-43
SLIDE 43

ClinVar has a growing catalog of variant interpretations but VUSes remain a major challenge

# of variants in ClinVar

Zach Zappala

slide-44
SLIDE 44

Constraint on the browser

Also on http://exac.broadinstitute.org

http://gnomad-beta.broadinstitute.org/gene/KMT2C

Haploinsufficiency results in Kleefstra syndrome

slide-45
SLIDE 45

Constraint on the browser

Also on http://exac.broadinstitute.org

http://gnomad-beta.broadinstitute.org/gene/KMT2C

Kleefstra syndrome

slide-47
SLIDE 47

Gene expression on the browser

Also on http://exac.broadinstitute.org

http://gnomad-beta.broadinstitute.org/gene/KMT2C

Kleefstra syndrome

slide-49
SLIDE 49
slide-50
SLIDE 50

Example gene expression across tissues: KMT2C

https://www.gtexportal.org/home/gene/KMT2C

slide-52
SLIDE 52

Can a male with OTC be an organ donor?

  • Adult male with OTC deficiency (urea cycle defect) presented with brain herniation in the setting of illness
  • Declared brain dead
  • Had requested organ donation: are organs from someone with OTC deficiency safe for transplantation?
  • Literature review
  • Has been done successfully except for liver (would result in OTC deficiency in recipient)
  • Case reports of deaths in cases where undiagnosed OTC carrier females were liver donors
  • Ask the experts
  • Look at expression in different tissues
slide-53
SLIDE 53
slide-54
SLIDE 54
slide-55
SLIDE 55

Bohring-Opitz syndrome (BOS): Severe dominant disorder caused by protein truncating variants (PTVs) in ASXL1

  • Well-established severe autosomal dominant pediatric-onset disorder
  • Profound intellectual disability & characteristic facial features
  • We would not expect to see any individuals with this disorder in ExAC or gnomAD

Collaboration with University of Utah/ARUP: Colleen Carlston, Hunter Underhill, Tatiana Tvrdik, Rong Mao

slide-56
SLIDE 56

Clinical exome sequencing result: De novo dominant ASXL1 p.R404* nonsense pathogenic variant

chr20:31021211 C>T

  • Are there patients with Bohring-Opitz syndrome in ExAC? No
  • Does this variant cause Bohring-Opitz syndrome? Yes

slide-57
SLIDE 57

There are numerous PTVs in ASXL1 in ExAC

Carlston*, O’Donnell-Luria*, et al., Hum Mut, 2017

PTVs found in ExAC, excluding individuals from the TCGA cohort

slide-58
SLIDE 58

Read support for ASXL1 p.R404* shows skewed allele balance

[Figure: reference DNA sequence, coverage, and reads split into alt allele and ref allele, with a 50% marker]

slide-59
SLIDE 59

Most ExAC ASXL1 PTVs show skewed allele balance

[Figure: alt allele versus ref allele proportions, with a 50% marker]

Carlston*, O’Donnell-Luria*, et al., Hum Mut, 2017

slide-60
SLIDE 60

ExAC PTVs in ASXL1 show skewed allele balance compared to other rare variants in ASXL1

[Figure: frequency versus allele balance percentage]
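One way to put a number on "skewed allele balance" is an exact binomial test of the alt-read count against the 50% expected for a germline heterozygote. This is a sketch of a generic approach, not the method used in the cited paper; the read counts are hypothetical and the function name is an assumption.

```python
from math import comb

def binom_two_sided_p(alt_reads, depth, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed alt-read count under a true allele fraction p.
    """
    probs = [comb(depth, k) * p**k * (1 - p) ** (depth - k) for k in range(depth + 1)]
    observed = probs[alt_reads]
    # Small relative tolerance so floating-point ties are counted.
    total = sum(pr for pr in probs if pr <= observed * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical read counts at a heterozygous call with 40x coverage.
p_skewed = binom_two_sided_p(alt_reads=6, depth=40)     # 15% alt reads
p_balanced = binom_two_sided_p(alt_reads=20, depth=40)  # 50% alt reads
print(f"skewed: {p_skewed:.2e}, balanced: {p_balanced:.2f}")
```

A very small p-value says the call is unlikely to be a clean germline heterozygote, but it does not distinguish somatic mosaicism from mapping or sequencing artifacts, so the read data still need review.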

slide-61
SLIDE 61

Clonal Hematopoiesis of Indeterminate Potential (CHIP)

  • A well-described phenomenon of aging
  • Somatic mutations in certain genes provide a growth advantage to hematopoietic stem cells
  • ASXL1 PTVs are known driver mutations in hematopoietic cancer

[Figure: % of people with CHIP mutations versus age]

Increase in the risk of all-cause mortality, highest for hematologic cancer but also for solid tumors, coronary heart disease & ischemic stroke.

Genovese et al., NEJM, 2014; Jaiswal et al., NEJM, 2014; Artomov, bioRxiv, 2016

slide-62
SLIDE 62

If PTVs in ExAC are due to clonal hematopoiesis of indeterminate potential (CHIP), then the ASXL1 PTVs should be seen at higher frequency with increasing age

[Figure: axes labeled Frequency, % of people with CHIP, and Age; panel label: General Population]

  • This is consistent with ExAC ASXL1 PTVs arising by somatic mosaicism and clonal expansion, so they are not germline.
  • The germline p.R404* variant is pathogenic in the patient for BOS.

Carlston*, O’Donnell-Luria*, et al., Hum Mut, 2017

slide-63
SLIDE 63

We can learn interesting biology from reference population databases starting from a single variant and a clinical question

slide-64
SLIDE 64

Frequency filtering

Nicky Whiffin, Imperial College London

James Ware, Imperial College London

Daniel MacArthur, Broad/MGH/HMS

Eric Minikel, HMS/Broad

slide-65
SLIDE 65

Central tenet

  • The frequency of a pathogenic variant in a reference sample that is not selected for the condition should not exceed the prevalence of the condition.

Possible exceptions

  • Founder mutations and bottlenecked populations
  • Balancing selection
  • Penetrance needs to be considered
slide-66
SLIDE 66

Disease-specific allele frequency (AF) thresholds for autosomal dominant disease

$$\text{maximum credible population allele frequency} = \frac{\text{disease prevalence} \times \text{heterogeneity}}{\text{penetrance}}$$

The inputs (disease prevalence, heterogeneity, penetrance) describe the genetic architecture.

Whiffin*, Minikel* et al. Genetics in Medicine (2017)

slide-67
SLIDE 67

Hypertrophic cardiomyopathy (HCM) specific AF threshold

$$\text{maximum credible population allele frequency} = \frac{\text{disease prevalence} \times \text{heterogeneity}}{\text{penetrance}} = \frac{0.5 \times \frac{1}{500} \times 3\%}{50\%} = 6 \times 10^{-5}$$

  • Disease prevalence term: 0.5 x 1/500
  • Heterogeneity: 3%. Most common pathogenic allele: MYBPC3:c.1504C>T causes 2.2% (1.6-3.0%) of European HCM cases
  • Penetrance: 50%

Whiffin*, Minikel* et al. Genetics in Medicine (2017)
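The HCM arithmetic on this slide can be reproduced with a short function. Reading the 0.5 multiplier as a per-chromosome conversion (two chromosomes per individual in a dominant disorder) is an assumption based on the worked numbers; the function and parameter names are hypothetical.

```python
import math

def max_credible_af(prevalence, heterogeneity, penetrance, chromosome_factor=0.5):
    """Maximum credible population allele frequency for a dominant disorder.

    prevalence        -- fraction of individuals affected (e.g. 1/500)
    heterogeneity     -- largest share of cases attributable to one allele
    penetrance        -- probability that a carrier is affected
    chromosome_factor -- assumed conversion from individuals to chromosomes
    """
    return chromosome_factor * prevalence * heterogeneity / penetrance

# Slide values for HCM: prevalence 1/500, heterogeneity 3%, penetrance 50%.
hcm_af = max_credible_af(prevalence=1 / 500, heterogeneity=0.03, penetrance=0.5)
print(f"{hcm_af:.1e}")
```

Lower penetrance raises the threshold, because more unaffected carriers are expected in a reference population.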

slide-68
SLIDE 68

Online calculator: cardiodb.org/alleleFrequencyApp

James Ware

slide-69
SLIDE 69

Allele frequencies: not exactly what they appear to be

Reference population databases sample the general population so we need to apply statistical estimates of uncertainty. We have the ability to estimate the upper limit on the CI.

Lek et al., Nature, 2016

slide-70
SLIDE 70

Precomputed across 5 ExAC populations: Filtering AF

[Flowchart: a user-defined maximum credible population allele frequency (AF), from disease prevalence x heterogeneity / penetrance (genetic architecture), is compared with the pre-computed filtering allele frequency (AF), from population-specific ExAC variant counts. YES: RETAIN VARIANT, may be pathogenic. NO: DISCARD VARIANT, too common to be pathogenic.]

  • Allele count (AC) at the upper bound of the one-tailed 95% CI
  • Specified as the maximum credible AF given the sample size (AN)
  • Computed for 5 main populations (AFR, AMR, EAS, EUR, SAS)

  • Highest filtering AF reported

Rarity necessary but not sufficient for pathogenicity
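The "upper bound of the one-tailed 95% CI" idea can be sketched by asking how many alleles could plausibly be observed in a sample of AN chromosomes if the true frequency equalled the maximum credible AF. Modeling the allele count as Poisson is an assumption made for this sketch, the AN of 65,000 chromosomes is a hypothetical sample size, and the function names are not from any published tool.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), computed by direct summation."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total

def max_tolerated_ac(max_credible_af, an, confidence=0.95):
    """Smallest allele count whose cumulative probability reaches `confidence`.

    Observed counts above this value are hard to reconcile with the
    maximum credible AF at the given sample size (AN chromosomes).
    """
    lam = max_credible_af * an
    k = 0
    while poisson_cdf(k, lam) < confidence:
        k += 1
    return k

# HCM threshold from the talk (6e-5) with a hypothetical AN of 65,000 chromosomes.
limit = max_tolerated_ac(6e-5, 65_000)
print(limit)
```

Working with a count limit rather than a raw frequency accounts for sampling noise: a variant seen a few times in a small sample is not automatically "too common".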

slide-71
SLIDE 71

Example: Looking up a ClinVar VUS

ClinVar Entry

slide-72
SLIDE 72

MYBPC3

ClinVar Entry

Every variant page in ExAC has a Filtering AF (coming soon for gnomAD)

Filtering AF: 0.0007

slide-73
SLIDE 73

Filtering AF for HCM

[Flowchart: the user-defined maximum credible population AF (from the prior HCM calculation, AF = 6e-5) is compared with the pre-computed filtering AF from population-specific ExAC variant counts (0.0007 = 7e-4). YES: RETAIN VARIANT, may be pathogenic. NO: DISCARD VARIANT, too common to be pathogenic.]

slide-74
SLIDE 74

Filtering AF for HCM

[Flowchart repeated: maximum credible population AF (6e-5) compared with filtering AF (7e-4), with an X marked on the flowchart. YES: RETAIN VARIANT, may be pathogenic. NO: DISCARD VARIANT, too common to be pathogenic.]
SLIDE 75

Classification by ACMG criteria

MYBPC3 c.961G>A, p.Val321Met

Using the frequency filter approach, we can say:

  • BS1: too common in controls

BUT we still need to consider other evidence (if there is any)

Other criteria met:

  • BP5: alternate cause found in several cases
  • No segregation data available
  • No functional data available

Likely benign

Richards et al., Genetics in Medicine, 2015

slide-76
SLIDE 76
  • Dominant, complete penetrance
  • Prevalence 1:10,000
  • Only 1 gene causes phenotype
  • Most common pathogenic variant accounts for 20% of cases

slide-77
SLIDE 77
  • Dominant, 50% penetrance
  • Prevalence 1:10,000
  • Only 1 gene causes phenotype
  • Most common pathogenic variant accounts for 20% of cases
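As a worked sketch, the two dominant scenarios (complete penetrance and 50% penetrance) can be run through the same form used in the HCM example, (0.5 x prevalence x heterogeneity) / penetrance. Treating "most common pathogenic variant accounts for 20% of cases" as the heterogeneity term and keeping the 0.5 factor from the HCM arithmetic are assumptions made here; these results are not stated on the slides.

\[
\text{complete penetrance:}\quad \frac{0.5 \times \frac{1}{10{,}000} \times 0.20}{1.0} = 1 \times 10^{-5}
\]

\[
\text{50\% penetrance:}\quad \frac{0.5 \times \frac{1}{10{,}000} \times 0.20}{0.5} = 2 \times 10^{-5}
\]

Halving the penetrance doubles the threshold, since twice as many carriers are needed to produce the same number of affected individuals.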

slide-78
SLIDE 78
  • Recessive, fully penetrant
  • Only 1 gene causes phenotype
  • Most common pathogenic variant accounts for 20% of cases

slide-79
SLIDE 79

ClinGen disease expert panel working groups drafting guidelines for disease specific allele frequency thresholds

https://www.clinicalgenome.org/

slide-80
SLIDE 80

Conclusions

  • Reference population databases are critically important to evaluate variant rarity, which is necessary but not sufficient for pathogenicity for rare disease
  • Constrained genes show less variation among humans than expected and are enriched for genes that result in disease when mutated
  • Frequency filtering is a more stringent, statistically based approach to set allele frequency cutoffs for variant filtering and interpretation
  • The power of reference population datasets will increase as they grow in size and diversity

  • gnomAD v3 with ~60,000 genomes anticipated by early 2019
slide-81
SLIDE 81

Frequently asked questions

  • Phenotypes: Very limited phenotype information and regulatory restrictions on sharing; need for phenotype-genotype databases (biobanks)
  • Subsets: Non-cancer, non-neuro coming with next gnomAD release in Fall 2018
  • Constraint on gnomAD: Coming with next gnomAD release
  • Genes that do not have constraint: mainly an annotation issue (Gencode) or too many variants (synonymous and missense), often related to mapping issues (pseudogenes)

slide-82
SLIDE 82

http://geno2mp.gs.washington.edu/

Rare variants (<1% AF) from ~8,000 samples

slide-83
SLIDE 83

Variant seen in patients with nervous system abnormalities but also in unaffected relatives

slide-84
SLIDE 84

There is power in big data when deployed in publicly available, intuitive user interfaces.

Thank you to all the groups that contribute data to gnomAD and other public resources.

slide-85
SLIDE 85

Acknowledgements

Konrad Karczewski, Jessica Alföldi, Laura Gauthier, Laurent Francioli, Monkol Lek, Kristen Laricchia

Broad Data Sciences Platform

Eric Banks, Charlotte Tolonen, Christopher Llanwarne, Dave Shiga, Fengmei Zhao, Jeff Gentry, Jose Soto, Kathleen Tibbetts, Khalid Shakir, Kristian Cibulskis, Miguel Covarrubias, Ryan Poplin, Ruchi Munshi, Sam Novod, Thibault Jeandet, Valentin Ruano-Rubio, Yossi Farjoun

Intel GenomicsDB

Google Cloud

team (QC)

Cotton Seed, Tim Poterba, Jonathan Bloom, Jacqueline Goldstein, Dan King, Ben Neale

Matthew Solomonson

Daniel MacArthur

Ben Weisburd, Eric Minikel, Kaitlin Samocha, Mark Daly

gnomAD PIs

gnomad.broadinstitute.org/about

NIGMS R01 GM104371, NIDDK U54 DK105566, Broad Institute

Mike Wilson, Beryl Cummings, Grace Tiao

Email: odonnell@broadinstitute.org

Nick Watts, Qingbo Wang

Collaboration with University of Utah/ARUP: Colleen Carlston, Hunter Underhill, Tatiana Tvrdik, Rong Mao

UCL: Nicky Whiffin, James Ware

slide-86
SLIDE 86
slide-87
SLIDE 87

Publicly available reference population databases

Contains exomes from individuals of Middle Eastern ancestry from phenotypically diverse rare disease cohorts

http://igm.ucsd.edu/gme/

slide-88
SLIDE 88

Publicly available reference population databases

Exome data from Geisinger/Regeneron with rare variants binned at 0.1% allele frequency

Ancestry not shared (Western Pennsylvania)

Phenotype information may be available

>25% of cohort related

https://discovehrshare.com/

slide-89
SLIDE 89

Publicly available reference population databases

Whole genome data from Human Longevity, Inc.

Rare variants binned at 0.1% allele frequency

Ancestry unavailable

Cannot download the dataset but can upload your VCF for annotation

https://search.hli.io

slide-90
SLIDE 90

Mutational model accurately predicts synonymous variation

  • We used our mutational model to predict the expected number of variants in the ~61K individuals in ExAC

[Figure: observed versus expected variant counts in three panels. Synonymous: r2 = 0.96. Missense: r2 = 0.89. Loss-of-function: r2 = 0.35.]

Samocha et al., Nat Genetics, 2014; Lek et al., Nature, 2016