OMIM 1966-1998 ONLINE MENDELIAN INHERITANCE IN MAN = database of - - PowerPoint PPT Presentation


SNP COURSE Nov. 15, 2017 MONOGENETIC DISEASES AND APPLICATIONS OF NEXT GENERATION SEQUENCING IN HUMAN GENETICS ANNEMIEKE VERKERK j.verkerk@erasmusmc.nl http://www.nature.com/scitable/topicpage/calculation-of-complex-disease-risk-756 1985


slide-1
SLIDE 1

SNP COURSE

  • Nov. 15, 2017

MONOGENETIC DISEASES

AND

APPLICATIONS OF NEXT GENERATION SEQUENCING IN HUMAN GENETICS

ANNEMIEKE VERKERK j.verkerk@erasmusmc.nl

http://www.nature.com/scitable/topicpage/calculation-of-complex-disease-risk-756
slide-2
SLIDE 2

OMIM

ONLINE MENDELIAN INHERITANCE IN MAN

= database of human disorders: phenotype descriptions and gene descriptions

Around 6600 monogenetic diseases:
_5108 phenotype descriptions, molecular basis known
_1596 phenotype descriptions, molecular basis unknown

http://omim.org (update from October 2017)

1985

1966-1998

slide-3
SLIDE 3

[Figure: OMIM data. x-axis: years 1988 to 2017; y-axis: 1000 to 6000; legend: phenotype with known gene function, phenotype without known gene function. Annotations on the plot: 7 years: 2270; 10 years: 1800; start of exome seq; 12 years: 1000; first working draft of human genome.]

slide-4
SLIDE 4

TYPES OF DISEASES II

“Simple” diseases: monogenetic

  • one disease – one gene defect

caused by a mutation in DNA: large effect
more severe phenotype

  • often early onset

relatively rare
Mendelian inheritance
e.g. osteogenesis imperfecta

“Complex” diseases: more than one gene involved

  • genetic background plays a role
  • environment plays a role

combinations of variations in DNA are risk factors: small effect
mild phenotype
late onset
common
complex inheritance
e.g. osteoporosis

slide-5
SLIDE 5

solving Mendelian Disorders

LINKAGE ANALYSIS / POSITIONAL CLONING
NEXT GENERATION SEQUENCING
linkage analysis

slide-6
SLIDE 6

LINKAGE ANALYSIS

_look for the chromosomal region that is shared in all patients and segregates with the disease
_works for large families
_a lot of meioses are needed to end up with a small shared region in the patients
_does not work for:

  • small families with rare AD diseases
  • de novo cases
  • cases with the “same disease” but different genes involved (locus heterogeneity)

_solve with NGS techniques: EXOME sequencing

slide-7
SLIDE 7

Are monogenetic diseases really that simple?

slide-8
SLIDE 8

AUTOSOMAL RECESSIVE

compound heterozygous variants in affected siblings, heterozygous variants in parents

homozygous variants in affected siblings, heterozygous variants in parents
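A minimal sketch of how the two recessive patterns above could be checked in a variant table, assuming genotypes are coded as the number of non-reference alleles (0, 1 or 2). The field names (`gene`, `affected`, `father`, `mother`) and the example variants are hypothetical; real exome pipelines work on VCF files and need more care (phasing, genotype quality).

```python
from collections import defaultdict

# Genotypes are coded as the number of non-reference alleles: 0, 1 or 2.
# All field names and gene names below are illustrative assumptions.

def recessive_candidates(variants):
    """Return (homozygous genes, compound heterozygous genes) for one family.

    Homozygous model: affected sibs carry 2 copies, both parents carry 1.
    Compound heterozygous model: affected sibs are heterozygous for two
    different variants in the same gene, one inherited from each parent.
    """
    homozygous = set()
    from_father = defaultdict(list)
    from_mother = defaultdict(list)

    for v in variants:
        sibs, father, mother = v["affected"], v["father"], v["mother"]
        if all(g == 2 for g in sibs) and father == 1 and mother == 1:
            homozygous.add(v["gene"])
        elif all(g == 1 for g in sibs):
            if father == 1 and mother == 0:
                from_father[v["gene"]].append(v["id"])
            elif father == 0 and mother == 1:
                from_mother[v["gene"]].append(v["id"])

    # A gene fits the compound model only with a hit from each parent.
    compound = {gene for gene in from_father if gene in from_mother}
    return homozygous, compound


example = [
    {"id": "v1", "gene": "GENE_A", "affected": [2, 2], "father": 1, "mother": 1},
    {"id": "v2", "gene": "GENE_B", "affected": [1, 1], "father": 1, "mother": 0},
    {"id": "v3", "gene": "GENE_B", "affected": [1, 1], "father": 0, "mother": 1},
    {"id": "v4", "gene": "GENE_C", "affected": [1, 1], "father": 1, "mother": 0},
]
homozygous_genes, compound_het_genes = recessive_candidates(example)
print(homozygous_genes, compound_het_genes)
```

GENE_C is dropped in this example because only one parent contributes a variant, so the affected siblings would still have one normal copy.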

slide-9
SLIDE 9

De NOVO

heterozygous variants in affected sibling

  • ref. seq in parents
slide-10
SLIDE 10

X - linked

X-chr. heterozygous variants in carrier females
X-chr. variants hemizygous in affected males

slide-11
SLIDE 11

AUTOSOMAL DOMINANT

heterozygous non reference variant in all affected family members

slide-12
SLIDE 12

confusing ??

Mani et al, Science 315, 1278, 2007

slide-13
SLIDE 13

complicating factors

_phenocopies: it looks like the “same disease”, but is it?
_locus heterogeneity
_technical issues
_incomplete penetrance
_phenotype definition

slide-14
SLIDE 14

PHENOCOPY IN DOMINANT INHERITANCE

Example: osteoporosis is a disease in the elderly population
_low bone mineral density
_impaired bone quality
_fractures

30% of women and 12% of men of >60 years are affected
_interaction between genes and environment

but it is also seen as a monogenetic disorder in families

slide-15
SLIDE 15

Is one of them a phenocopy?

[Pedigree figure; ages shown: 74, 59, 45, 40, 39]

slide-16
SLIDE 16

PROBLEM OF LOCUS HETEROGENEITY

= the disease in your group of patients is caused by more than 1 gene

for exome sequencing this is a problem:
_the patient population is not homogeneous
_you have to search for variants in more than 1 gene
_you need enough samples to find the gene mutation

slide-17
SLIDE 17

KABUKI SYNDROME

_very rare: 1/30.000 – 1/50.000, 400 cases reported worldwide
_exome sequencing of 10 cases
_expectation: to find the same causal gene in all cases
_a mutation in MLL2 was found in only 4 cases, the most severe ones
_due to >1 gene being involved and also technical issues

slide-18
SLIDE 18

NON-PENETRANT MUTATION CARRIERS
MODIFIERS in AD disease

Retinitis Pigmentosa: mutations in PRPF31

Asymptomatic mutation carrier (non-penetrant mutation):
high expression of the gene due to a polymorphism in the promoter of the unaffected allele
= mutated allele + high expression normal allele

Symptomatic mutation carrier:
low expression of the gene due to absence of the polymorphism in the promoter of the unaffected allele
= mutated allele + low expression normal allele

slide-19
SLIDE 19

PARKINSON DISEASE
DIGENIC AND INCOMPLETE PENETRANCE

_autosomal recessive form due to mutations in DJ-1 or PINK1
_heterozygous PINK1 mutation from mother
_heterozygous DJ-1 mutation from father

DJ-1: wt/A39S

incomplete penetrance?

slide-20
SLIDE 20

NON-PENETRANT MUTATION CARRIERS
MODIFIERS in AR disease

Cystic Fibrosis: mutations in CFTR
_classic CF mutation p.R117H + TGTTTTT
_classic CF mutation p.R117H + TTTTTTT

http://genetics.emory.edu/docs/Emory_Human_Genetics_Cystic_Fibrosis_PolyT_TG_Tracts.pdf http://www.cftr2.org/r117h.php

CF + infertility risk in males females asymptomatic males CF - infertility

slide-21
SLIDE 21

Phenotype Definition

_first paper using NGS (exome sequencing) to find the cause of a Mendelian disorder, published in Nature Genetics in 2010 by Ng et al.
_Miller syndrome (facial and limb abnormalities)
_very rare disease, only 30 cases described in literature
_gene 1: DHODH, causing Miller syndrome
_gene 2: DNAH5, causing pulmonary problems

slide-22
SLIDE 22

NGS DATA WHAT DO YOU GET

a large file with variants
+/- 25.000 variants present in 1 person

slide-23
SLIDE 23

genotype file: 25.000 – 80.000 variants, depending on the number of samples sequenced

contains: genotypes, variant information

slide-24
SLIDE 24

HOW TO FIND THE RIGHT VARIANT ? MUTATION ?

slide-25
SLIDE 25

HOW TO FIND THE RIGHT VARIANT / MUTATION …

after exome sequencing you are left with different kinds of variants

  • 1. where in the genome
  • - exonic
  • - intronic
  • - intergenic
  • - ncRNA
  • 2. what kind of change
  • - STOP gain
  • - STOP loss
  • - SYNONYMOUS SNPs
  • - NON-SYNONYMOUS SNPs
  • - SPLICE SITE VARIANTS
  • - SMALL INSERTIONS or DELETIONS

steps are needed to filter out normal variants -- keep LoF variants
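As a rough illustration of sorting variants by the kind of change, the sketch below splits an annotated list into loss-of-function candidates, non-synonymous variants that need a closer look, and the rest. The consequence labels and variant records are hypothetical; real annotation tools use their own vocabularies, so the label set would need to be adapted.

```python
# Consequence labels are illustrative assumptions, not the output of a
# specific annotation tool.
LOF_CONSEQUENCES = {"stop_gain", "stop_loss", "splice_site", "frameshift_indel"}

def split_by_consequence(variants):
    """Split variants into LoF candidates, variants to review, and the rest."""
    lof, review, other = [], [], []
    for v in variants:
        if v["consequence"] in LOF_CONSEQUENCES:
            lof.append(v["id"])
        elif v["consequence"] == "nonsynonymous":
            review.append(v["id"])  # effect on the protein still unknown
        else:
            other.append(v["id"])   # e.g. synonymous, intronic, intergenic
    return lof, review, other


annotated = [
    {"id": "v1", "region": "exonic", "consequence": "stop_gain"},
    {"id": "v2", "region": "exonic", "consequence": "synonymous"},
    {"id": "v3", "region": "exonic", "consequence": "nonsynonymous"},
    {"id": "v4", "region": "intronic", "consequence": "intronic"},
    {"id": "v5", "region": "exonic", "consequence": "frameshift_indel"},
]
lof_ids, review_ids, other_ids = split_by_consequence(annotated)
print(lof_ids, review_ids, other_ids)
```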

slide-26
SLIDE 26

annotated file

slide-27
SLIDE 27
  • 3. filter according to the genetic model

e.g. for a dominant disease:
_keep the heterozygous variants in the patients
_keep the homozygous reference variants in the unaffecteds
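The dominant example could be sketched as follows, assuming a fully penetrant disease and genotypes coded as the number of non-reference alleles. The sample names, field names and variants are hypothetical.

```python
def fits_dominant_model(genotypes, affected):
    """True if every patient is heterozygous and every unaffected is hom. ref.

    genotypes: dict of sample name -> number of non-reference alleles (0/1/2)
    affected:  set of sample names that have the disease
    Assumes full penetrance and no phenocopies, which may not hold in practice.
    """
    for sample, alt_count in genotypes.items():
        if sample in affected and alt_count != 1:
            return False
        if sample not in affected and alt_count != 0:
            return False
    return True


affected_samples = {"patient1", "patient2"}
family_variants = [
    {"id": "v1", "genotypes": {"patient1": 1, "patient2": 1, "unaffected1": 0}},
    {"id": "v2", "genotypes": {"patient1": 1, "patient2": 0, "unaffected1": 0}},
    {"id": "v3", "genotypes": {"patient1": 1, "patient2": 1, "unaffected1": 1}},
]
dominant_candidates = [
    v["id"] for v in family_variants
    if fits_dominant_model(v["genotypes"], affected_samples)
]
print(dominant_candidates)
```

With incomplete penetrance, the check on unaffected family members would have to be relaxed, since a healthy relative can still carry the variant.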

  • 4. filter out common variants present in different databases

WHICH DATABASES CAN YOU USE?

  • - dbSNP
  • - 1000 genomes
  • - goNL (750 genomes)
  • - Washington Exome Sequencing Project database (6500 exomes)
  • - ExAC database (60.706 exomes)

HOW TO FIND THE RIGHT VARIANT / MUTATION …

slide-28
SLIDE 28

dbSNP

contains:
_genetic variation; human + other species
_SNPs (99.7%), short insertion/deletion polymorphisms (0.2%), short tandem repeats and other things (0.1%)
_large submissions from the HapMap Project, 1000 Genomes Project, goNL, Washington ESP
_build 150 with 171.000.000 human SNPs in Feb. 2017
_neutral variants as well as disease-causing clinical mutations!

slide-29
SLIDE 29

Washington ESP database

contains:
_genetic variation obtained by exome sequencing
_from 6503 human samples (4300 European-Americans, 2203 African-Americans)
_normal individuals, but also individuals with heart, lung and blood disorders
_> 2.000.000 variants (50% Eur-Am, 50% Afr-Am)
_dbSNP build 132 contains a subset of ESP; build 138 contains the complete set of ESP

slide-30
SLIDE 30

ExAC database
Exome Aggregation Consortium

_combines variants from different consortia worldwide
_from 92.000 exomes
_65.000 available through their website

slide-31
SLIDE 31
slide-32
SLIDE 32

frequency in databases

slide-33
SLIDE 33

HOW TO USE THE VARIANT FREQUENCY

a SNP has a Minor Allele Frequency

... ATGCCGATCGCT ...
... ATGCCGATCGCT ...
... ATGCCGATCGCT ...
... ATGCCGATCGCT ...
... ATGCCGATCGCT ...
... ATGCCGATCGCT ...
... ATGCCGATCGCT ...
... ATGCCGATCGCT ...
... ATGCCGATCACT ...
... ATGCCGATCACT ...

SNP: G | A
80% G allele: ancestral allele
20% A allele: minor allele

variation / SNP with MAF of 0.20 (MAF of 20%)
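The minor allele frequency in this example can be recomputed from the ten sequences shown on the slide. This is only a sketch: it assumes the sequences are already aligned and of equal length, and the function and variable names are made up for illustration.

```python
from collections import Counter

# The ten aligned sequences from the slide: eight with G, two with A.
sequences = ["ATGCCGATCGCT"] * 8 + ["ATGCCGATCACT"] * 2

def allele_frequencies(seqs, position):
    """Frequency of each base observed at one aligned position."""
    counts = Counter(seq[position] for seq in seqs)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

# First position (0-based) where the sequences are not all identical.
snp_position = next(
    i for i in range(len(sequences[0]))
    if len({seq[i] for seq in sequences}) > 1
)
frequencies = allele_frequencies(sequences, snp_position)
maf = min(frequencies.values())  # valid here because only two alleles occur
print(snp_position, frequencies, maf)
```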

slide-34
SLIDE 34

HOW TO USE THE VARIANT FREQUENCY

_population frequencies
_frequency of monogenetic disorders is very low in the population

example: the frequency of the disease is 1 per 10.000 births = 0,0001 (= 0,01%)
variants with a frequency > 0,0001 can be discarded (to be on the safe side: > 0,0005)
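The rule of thumb above can be sketched as a filter over database frequencies. The database names used as dictionary keys, the field names and the example frequencies are made up for illustration; a missing frequency (`None`) is treated here as "not seen in that database", which is an assumption.

```python
def rare_variants(variants, max_freq=0.0005):
    """Keep variants whose frequency is at most max_freq in every database.

    max_freq defaults to the 'safe side' cutoff from the example (0,0005).
    """
    kept = []
    for v in variants:
        known = [f for f in v["db_freqs"].values() if f is not None]
        if all(f <= max_freq for f in known):
            kept.append(v["id"])
    return kept


candidates = [
    {"id": "v1", "db_freqs": {"dbSNP": None, "1000G": None, "ExAC": None}},
    {"id": "v2", "db_freqs": {"dbSNP": 0.2, "1000G": 0.18, "ExAC": 0.21}},
    {"id": "v3", "db_freqs": {"dbSNP": None, "1000G": None, "ExAC": 0.00003}},
    {"id": "v4", "db_freqs": {"dbSNP": 0.0001, "1000G": None, "ExAC": 0.002}},
]
rare_ids = rare_variants(candidates)
print(rare_ids)
```

The cutoff follows the example's rule for a disease frequency of 1 per 10.000. For a recessive disorder the disease allele can be more common than the disease itself (under Hardy-Weinberg, an allele frequency q gives a disease frequency of about q squared), so a looser cutoff may be needed there.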

slide-35
SLIDE 35
  • 5. synonymous variants (no change in amino-acid) can be filtered out

but can they?

HOW TO FIND THE RIGHT VARIANT / MUTATION …

slide-36
SLIDE 36

DIFFERENT TYPES OF SNPs IN CODING REGIONS

synonymous SNP
_variation on DNA level
_no change in amino acid: TTT or TTC both code for Phenylalanine
_no change in protein

non-synonymous SNP
_variation on DNA level
_gives change in amino acid: TGG (Tryptophan) or TGC (Cysteine)
_change in protein
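A small sketch of the distinction, using only the codons from the slide plus the stop codon TGA (added here to show a stop gain). The table is deliberately incomplete and the function name is hypothetical; a real tool would use the full genetic code.

```python
# Partial codon table: the four codons from the slide plus one stop codon.
CODON_TABLE = {
    "TTT": "Phe",
    "TTC": "Phe",
    "TGG": "Trp",
    "TGC": "Cys",
    "TGA": "Stop",
}

def classify_codon_change(ref_codon, alt_codon):
    """Classify a single-codon change by its effect on the amino acid."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "Stop":
        return "stop gain"
    if ref_aa == "Stop":
        return "stop loss"
    return "non-synonymous"


print(classify_codon_change("TTT", "TTC"))
print(classify_codon_change("TGG", "TGC"))
print(classify_codon_change("TGG", "TGA"))
```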

slide-37
SLIDE 37

Example of a functional synonymous SNP - 1

2013

slide-38
SLIDE 38

Example of a functional synonymous SNP - 1

Gene: ABCA12
Mutations cause a severe form of recessive skin disease (congenital ichthyosiform erythroderma)
_DNA sequencing in a family: no obvious mutation
_cDNA sequencing: 163 bp homozygous deletion in the RNA sequence
_SYNONYMOUS variant: c.3456G>A, TCG > TCA (Ser > Ser)

Normal sequence: GTTCCTGTATTTTTCGGACTACAGCTTCT
Mutated sequence: GTTCCTGTATTTTTCAGACTACAGCTTCT
Creation of a novel acceptor splice site
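To see what the ABCA12 variant does at sequence level, the two sequences from the slide can be compared base by base. The sketch below assumes the changed base is the third base of the Ser codon (TCG > TCA, as given on the slide) and only checks for a newly created "AG" dinucleotide; it is not a splice site predictor.

```python
normal = "GTTCCTGTATTTTTCGGACTACAGCTTCT"
mutated = "GTTCCTGTATTTTTCAGACTACAGCTTCT"

# All positions (0-based) where the two sequences differ.
differences = [
    (i, ref, alt)
    for i, (ref, alt) in enumerate(zip(normal, mutated))
    if ref != alt
]
position, ref_base, alt_base = differences[0]

# Assumption: the changed base is the third base of its codon.
ref_codon = normal[position - 2:position + 1]
alt_codon = mutated[position - 2:position + 1]

# An acceptor splice site ends in AG; check whether the change creates one.
creates_ag = mutated[position:position + 2] == "AG" and \
    normal[position:position + 2] != "AG"

print(differences, ref_codon, alt_codon, creates_ag)
```

The new AG sits directly after a run of pyrimidines (TTTTTC), which resembles the stretch usually found in front of an acceptor site.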

slide-39
SLIDE 39

Example of a functional synonymous SNP - 2

2011

slide-40
SLIDE 40

Example of a functional synonymous SNP - 2

Gene: IRGM
Involved disease: Crohn’s disease
_SYNONYMOUS variant: c.313C>T, CTG > TTG (Leu > Leu)
_“Normal” sequence CTG: higher affinity binding of miRNA196
_TTG: lower affinity binding of miRNA196, higher protein expression, induces inflammation, increased risk for Crohn’s disease

slide-41
SLIDE 41

HOW TO FIND THE RIGHT VARIANT / MUTATION …

after all the filter steps you hope you do not have that many variants left to select from:
_coding: stop, splice site variants, non-synonymous, some indels

  • 6. what to do with the non-synonymous variants?

what effect will they have on protein function?

Prediction tools for deleteriousness/pathogenicity:
_SIFT
_PolyPhen
_MutationTaster
_GERP++
_PhyloP
_CADD

These tools look at the amino acid substitution, at conservation in other species, or integrate diverse annotation tools into a single score.

http://mendel.stanford.edu/SidowLab/downloads/gerp/index.html
http://www.mutationtaster.org/

slide-42
SLIDE 42

http://siftdna.org/www/Extended_SIFT_chr_coords_submit.html

SIFT

_predicts whether an amino acid substitution affects protein function
INPUT: only one base is needed
Sample format: 22,30163533,1,A/C
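A query line in the sample format can be built or checked with a small parser. The slide gives only the sample line, so the meaning of the four fields (chromosome, position, orientation, reference/alternative allele) is an assumption here, as are the class and function names.

```python
from typing import NamedTuple

class SiftQuery(NamedTuple):
    # Field meanings are assumed from the sample line, not taken from the slide.
    chromosome: str
    position: int
    orientation: int
    ref_allele: str
    alt_allele: str

def parse_sift_line(line):
    """Parse one line such as '22,30163533,1,A/C' into a SiftQuery."""
    chromosome, position, orientation, alleles = line.strip().split(",")
    ref_allele, alt_allele = alleles.split("/")
    return SiftQuery(chromosome, int(position), int(orientation),
                     ref_allele, alt_allele)

def format_sift_line(query):
    """Turn a SiftQuery back into the comma-separated input format."""
    return (f"{query.chromosome},{query.position},{query.orientation},"
            f"{query.ref_allele}/{query.alt_allele}")


query = parse_sift_line("22,30163533,1,A/C")
print(query)
print(format_sift_line(query))
```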

slide-43
SLIDE 43

http://genetics.bwh.harvard.edu/pph2/

PolyPhen-2

_searches for impact on the structure of a protein using physical and comparative considerations

INPUT: whole protein sequence
Indicate:
_position of the changed aa in the protein
_changed aa (R > H)

slide-44
SLIDE 44

variant prediction tools

slide-45
SLIDE 45

THINGS TO CONSIDER IN TECHNIQUE I

exome sequencing seems very easy, but is all sequence present?
_still not all genes or exons are recognized, and these are therefore not present in the sequence capture kit
_not all targeted exons are well captured
_not all targeted sequence can be aligned back to the ref genome
_regulating DNA areas are not present in the sequence capture kit
_copy number variation is hard to recognize
_deletions/amplifications are not easy to call accurately

slide-46
SLIDE 46

THINGS TO CONSIDER IN ANALYSIS I

_loss of function (LOF) variants are expected to be present in genes causing genetic diseases
_BUT they are also present in every “normal/healthy” individual:

  • 100 LOF variants per human genome
  • 80 in heterozygous state
  • 20 in homozygous state

Science 335, 2012

>1000 rare homozygous LOF variants in 3222 inbred individuals of Pakistani origin from the UK: predicted to be pathogenic, but without clinical phenotype

LOF variant types:
_stop codons
_splice site disrupting variants
_frame shifts
_insertions/deletions

slide-47
SLIDE 47

normal individuals carry SEVERE MENDELIAN CHILDHOOD DISEASE MUTATIONS

Nat biotechnology 34, 2016

slide-48
SLIDE 48

THINGS TO CONSIDER IN ANALYSIS II

lots of exome sequencing of rare diseases / cases
_UK: Deciphering Developmental Disorders (DDD): sequence 1000 exomes to find genes for mental disability
_2016: scaled up to the 100.000 Genomes Project: rare disease + cancer
_Canada: FORGE project: 200 different disorders / rare diseases in Canadian children

Will probably result in lots of mutations in “private” genes
Confirmation in other patients will be difficult
Functional proof needed

slide-49
SLIDE 49

https://www.mousephenotype.org/

International Knockout Mouse Consortium

slide-50
SLIDE 50

OPTIONS NOW: exome sequencing

_linkage is still useful:

  • for identification of your candidate gene region(s)
  • focus on restricted regions for mutation finding in exome data

_most mutations for Mendelian disorders are found in the coding region of the genome
_NGS techniques keep improving; exome coverage of the kits is constantly improving
_WES is becoming less expensive / affordable and it works!
_WGS ….

slide-51
SLIDE 51 http://r2blog.com/r2s-picture-phun/
slide-52
SLIDE 52

INTERESTED IN SEQUENCING? you are welcome to contact us

Annemieke: j.verkerk@erasmusmc.nl
André: a.g.uitterlinden@erasmusmc.nl
Robert: r.kraaij@erasmusmc.nl