SNP COURSE
- Nov. 15, 2017
MONOGENETIC DISEASES
AND
APPLICATIONS OF NEXT GENERATION SEQUENCING IN HUMAN GENETICS
ANNEMIEKE VERKERK j.verkerk@erasmusmc.nl
http://www.nature.com/scitable/topicpage/calculation-of-complex-disease-risk-756
ONLINE MENDELIAN INHERITANCE IN MAN (OMIM)
Around 6600 monogenetic diseases:
5108 - phenotype descriptions, molecular basis known
1596 - phenotype descriptions, molecular basis unknown
http://omim.org Update from October 2017
= database of human disorders: phenotype descriptions and gene descriptions
1985
1966-1998
[Figure: OMIM data by year, 1988-2017 (axis 1000 to 6000): phenotype with known gene function vs. phenotype without known gene function. Annotations: 12 years: 1000; first working draft of human genome; 10 years: 1800; start of exome seq; 7 years: 2270.]
TYPES OF DISEASES II
“Simple” diseases: monogenetic
_caused by a mutation in DNA: large effect
_more severe phenotype
_relatively rare
_Mendelian inheritance
_e.g. osteogenesis imperfecta
“Complex” diseases: more than one gene involved
_combinations of variations in DNA are risk factors: small effect
_mild phenotype, late onset
_common
_complex inheritance
_e.g. osteoporosis
solving Mendelian
LINKAGE ANALYSIS / POSITIONAL CLONING
NEXT GENERATION SEQUENCING
linkage analysis
_look for the chromosomal region that is shared in all patients and segregates with the disease
_works for large families
_a lot of meioses are needed to end up with a small shared region in the patients
_does not work for:
  small families with rare AD diseases
  de novo cases
  cases with the “same disease” but different genes involved (locus heterogeneity)
_solve with NGS techniques: EXOME sequencing
AUTOSOMAL RECESSIVE
_compound heterozygous variants in affected siblings, heterozygous variants in parents
_homozygous variants in affected siblings, heterozygous variants in parents
DE NOVO
_heterozygous variant in the affected sibling (not present in the parents)
X-LINKED
_X-chr. variants heterozygous in carrier females
_X-chr. variants hemizygous in affected males
AUTOSOMAL DOMINANT
_heterozygous non-reference variant in all affected family members
confusing ??
Mani et al, Science 315, 1278, 2007
_phenocopies: it looks like the “same disease”, but is it?
_locus heterogeneity
_technical issues
_incomplete penetrance
_phenotype definition
IN DOMINANT INHERITANCE
Example: osteoporosis is a disease in the elderly population
_low bone mineral density
_impaired bone quality
_fractures
30% of women and 12% of men of >60 years are affected
_interaction between genes and environment
but it is also seen as a monogenetic disorder in families
[Pedigree figure: family members aged 74, 59, 45, 40 and 39]
Is one of them a phenocopy?
PROBLEM OF LOCUS HETEROGENEITY
= the disease in your group of patients is caused by more than 1 gene
for exome sequencing this is a problem:
_the patient population is not homogeneous
_you have to search for variants in more than 1 gene
_you need enough samples to find the gene mutation
KABUKI SYNDROME
_very rare: 1/30,000 to 1/50,000; 400 cases reported worldwide
_exome sequencing of 10 cases
_expectation: to find the same causal gene in all cases
_a mutation in MLL2 was found in only 4 cases, the most severe ones
_due to >1 gene being involved and also technical issues
NON-PENETRANT MUTATION CARRIERS
Retinitis Pigmentosa: mutations in PRPF31
_asymptomatic mutation carrier (non-penetrant mutation): high expression of the gene due to a polymorphism in the promoter of the unaffected allele = mutated allele + high-expression normal allele
_symptomatic mutation carrier: low expression of the gene due to absence of the polymorphism in the promoter of the unaffected allele = mutated allele + low-expression normal allele
MODIFIERS in AD disease
PARKINSON DISEASE: DIGENIC AND INCOMPLETE PENETRANCE
_autosomal recessive form due to mutations in DJ-1 or PINK1
_heterozygous PINK1 mutation from mother
_heterozygous DJ-1 mutation from father
DJ-1:wt/A39S
incomplete penetrance?
NON-PENETRANT MUTATION CARRIERS MODIFIERS in AR disease
Cystic Fibrosis: mutations in CFTR
_classic CF mutation, p.R117H + TGTTTTT
_classic CF mutation, p.R117H + TTTTTTT
http://genetics.emory.edu/docs/Emory_Human_Genetics_Cystic_Fibrosis_PolyT_TG_Tracts.pdf
http://www.cftr2.org/r117h.php
CF + infertility risk in males; females asymptomatic; males CF - infertility
_first paper on NGS (exome sequencing) solving a Mendelian disorder, published in Nature Genetics in 2010 by Ng et al.
_Miller syndrome (facial and limb abnormalities)
_very rare disease, only 30 cases described in the literature
_gene 1: DHODH, causing Miller syndrome
_gene 2: DNAH5, causing pulmonary problems
a large file with variants
_+/- 25,000 variants present in 1 person
_25,000 to 80,000 variants, depending on the number of samples sequenced
genotype file: genotypes + variant information
after exome sequencing you are left with different kinds of variants
steps are needed to filter out normal variants and keep LoF variants
HOW TO FIND THE RIGHT VARIANT / MUTATION …
annotated file
e.g. for a dominant disease:
_keep the variants that are heterozygous in the patients
_keep the variants that are homozygous reference in the unaffecteds
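The dominant-disease filter described above can be sketched in a few lines of Python. This is only an illustration: the variant table, the sample names and the genotype coding ("0/0", "0/1", "1/1") are assumptions made for the example, not the format of any particular pipeline.

```python
# Hypothetical genotype table: one entry per variant, one genotype per sample.
# Assumed coding: "0/0" = homozygous reference, "0/1" = heterozygous,
# "1/1" = homozygous non-reference.
variants = [
    {"id": "var1", "genotypes": {"pat1": "0/1", "pat2": "0/1", "unaff1": "0/0"}},
    {"id": "var2", "genotypes": {"pat1": "0/1", "pat2": "0/0", "unaff1": "0/0"}},
    {"id": "var3", "genotypes": {"pat1": "0/1", "pat2": "0/1", "unaff1": "0/1"}},
]
affected = ["pat1", "pat2"]
unaffected = ["unaff1"]


def fits_dominant_model(variant, affected, unaffected):
    """True if all patients are heterozygous and all unaffecteds are hom. reference."""
    gt = variant["genotypes"]
    return (all(gt[s] == "0/1" for s in affected)
            and all(gt[s] == "0/0" for s in unaffected))


candidates = [v["id"] for v in variants
              if fits_dominant_model(v, affected, unaffected)]
print(candidates)
```

Note that such a strict filter assumes full penetrance and no phenocopies; a non-penetrant carrier among the "unaffecteds" would remove the true variant.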
WHICH DATABASES CAN YOU USE?
HOW TO FIND THE RIGHT VARIANT / MUTATION …
dbSNP
contains:
_genetic variation; human + other species
_SNPs (99.7%), short insertion/deletion polymorphisms (0.2%), short tandem repeats and other things (0.1%)
_large submissions included from the HapMap Project, 1000 Genomes Project, GoNL, Washington ESP
_build 150 with 171,000,000 human SNPs in Feb. 2017
_neutral variants as well as disease-causing clinical mutations!
Washington ESP database
contains:
_genetic variation obtained by exome sequencing
_from 6503 human samples (4300 European-Americans, 2203 African-Americans)
_healthy individuals, but also individuals with heart, lung and blood disorders
_>2,000,000 variants: 50% Eur-Am, 50% Afr-Am
_dbSNP build 132 contains a subset of ESP
_dbSNP build 138 contains the complete set of ESP
Exome Aggregation Consortium (ExAC)
_combines variants from different consortia worldwide
_from 92,000 exomes
_65,000 available through their website
HOW TO USE THE VARIANT FREQUENCY
frequency in databases
a SNP has a Minor Allele Frequency (MAF)
... ATGCCGATCGCT ...
... ATGCCGATCGCT ...
... ATGCCGATCGCT ...
... ATGCCGATCGCT ...
... ATGCCGATCGCT ...
... ATGCCGATCGCT ...
... ATGCCGATCGCT ...
... ATGCCGATCGCT ...
... ATGCCGATCACT ...
... ATGCCGATCACT ...
SNP: G = 80% G allele = ancestral allele
SNP: A = 20% A allele = minor allele
variation / SNP with MAF of 0.20 = MAF of 20%
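As a small worked example, the MAF of the ten example sequences shown above can be computed by counting the alleles at the one position where the sequences differ. The variable names are chosen for illustration only.

```python
from collections import Counter

# The ten example sequences from the slide: 8 carry G, 2 carry A at the SNP.
sequences = ["ATGCCGATCGCT"] * 8 + ["ATGCCGATCACT"] * 2

# Find the position(s) where not all sequences carry the same base.
variable_positions = [i for i in range(len(sequences[0]))
                      if len({seq[i] for seq in sequences}) > 1]
snp_position = variable_positions[0]  # 0-based index within the sequence

# Count alleles at the SNP and take the least frequent one as the minor allele.
allele_counts = Counter(seq[snp_position] for seq in sequences)
minor_allele, minor_count = min(allele_counts.items(), key=lambda kv: kv[1])
maf = minor_count / sum(allele_counts.values())

print(snp_position, dict(allele_counts), minor_allele, maf)
```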
_population frequencies
_frequency of monogenetic disorders is very low in the population
HOW TO USE THE VARIANT FREQUENCY
example: the frequency of the disease is 1 per 10,000 births = 0.0001 (= 0.01%)
variants with a frequency > 0.0001 can be discarded (to be on the safe side: > 0.0005)
but can they?
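A minimal sketch of this frequency filter follows, with made-up variant names and database frequencies. The last step is an added illustration of one possible reason for caution, not necessarily the point intended on the slide: under Hardy-Weinberg assumptions, a fully penetrant recessive disease caused by a single allele with a disease frequency of 1 per 10,000 would have an allele frequency of sqrt(0.0001) = 0.01, well above both cut-offs.

```python
import math

disease_frequency = 1 / 10_000   # from the slide: 1 per 10,000 births
strict_cutoff = 0.0001           # equal to the disease frequency
safe_cutoff = 0.0005             # the "safe side" cut-off from the slide

# Hypothetical variants with made-up allele frequencies from a database.
database_frequency = {"varA": 0.00002, "varB": 0.0003, "varC": 0.02}

# Keep only variants at or below the chosen cut-off.
kept_strict = [v for v, f in database_frequency.items() if f <= strict_cutoff]
kept_safe = [v for v, f in database_frequency.items() if f <= safe_cutoff]
print(kept_strict, kept_safe)

# Assumption for illustration: recessive disease, one causal allele,
# Hardy-Weinberg equilibrium, so disease frequency = q^2.
recessive_allele_frequency = math.sqrt(disease_frequency)
print(recessive_allele_frequency, recessive_allele_frequency > safe_cutoff)
```

In that recessive scenario the causal allele itself would be discarded by either cut-off, so the threshold has to match the inheritance model.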
HOW TO FIND THE RIGHT VARIANT / MUTATION …
DIFFERENT TYPES OF SNPs IN CODING REGIONS
synonymous SNP
_variation on DNA level, no change in amino acid
_TTT or TTC: both code for Phenylalanine
_no change in protein
non-synonymous SNP
_variation on DNA level gives a change in amino acid
_TGG or TGC: Tryptophan -> Cysteine
_change in protein
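The distinction between synonymous and non-synonymous SNPs can be checked with the standard genetic code. The sketch below builds the codon table from the usual compact TCAG ordering; the function name `classify` and its labels are illustrative choices, and stop-loss changes are simply reported as non-synonymous here.

```python
from itertools import product

# Standard genetic code in TCAG order ("*" = stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify(ref_codon, alt_codon):
    """Classify a single-codon change by its effect on the amino acid."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop gained"
    return "non-synonymous"


print(classify("TTT", "TTC"))  # Phe -> Phe
print(classify("TGG", "TGC"))  # Trp -> Cys
print(classify("TGG", "TGA"))  # Trp -> stop
```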
Example of a functional synonymous SNP - 1
2013
Gene: ABCA12
Mutations cause a severe form of recessive skin disease (congenital ichthyosiform erythroderma)
_DNA sequencing in a family: no obvious mutation
_cDNA sequencing: 163 bp homozygous deletion in the RNA sequence
_SYNONYMOUS variant: c.3456G>A, TCG -> TCA: Ser -> Ser
Normal sequence:  GTTCCTGTATTTTTCGGACTACAGCTTCT
Mutated sequence: GTTCCTGTATTTTTCAGACTACAGCTTCT
Creation of a novel acceptor splice site
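To see where the novel acceptor site comes from, the normal and mutated sequences can be compared base by base. The sketch below only locates the changed base and checks that the change creates an "AG" dinucleotide preceded by a pyrimidine-rich stretch; it is not a splice-site predictor, and the 7-base upstream window is an arbitrary choice for illustration.

```python
normal = "GTTCCTGTATTTTTCGGACTACAGCTTCT"
mutated = "GTTCCTGTATTTTTCAGACTACAGCTTCT"

# List every position (0-based) where the two sequences differ.
differences = [(i, n, m)
               for i, (n, m) in enumerate(zip(normal, mutated)) if n != m]
position, ref_base, alt_base = differences[0]

# An acceptor splice site ends in the dinucleotide AG.
creates_ag = (mutated[position:position + 2] == "AG"
              and normal[position:position + 2] != "AG")

# Bases directly upstream of the new AG (arbitrary 7-base window).
upstream = mutated[position - 7:position]
pyrimidine_fraction = sum(base in "CT" for base in upstream) / len(upstream)

print(differences, creates_ag, upstream, round(pyrimidine_fraction, 2))
```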
Example of a functional synonymous SNP - 2
2011
Gene: IRGM
Involved disease: Crohn’s disease
_SYNONYMOUS variant: c.313C>T, CTG -> TTG: Leu -> Leu
_“Normal” sequence CTG: higher affinity binding of miRNA196
_TTG: lower affinity binding of miRNA196 -> higher protein expression -> induces inflammation -> increased risk for Crohn’s disease
after all the filter steps you hope you do not have that many variants left to select from:
_coding
_stop
_splice site variants
_non-synonymous
_some indels
what effect will they have on protein function?
Prediction tools for deleteriousness/pathogenicity: SIFT, PolyPhen, MutationTaster, GERP++, PhyloP, CADD
HOW TO FIND THE RIGHT VARIANT / MUTATION …
http://mendel.stanford.edu/SidowLab/downloads/gerp/index.html
http://www.mutationtaster.org/
what the tools look at:
_amino acid substitution
_conservation in other species
_integration of diverse annotation tools into a single score
http://siftdna.org/www/Extended_SIFT_chr_coords_submit.html
SIFT
_predicts whether an amino acid substitution affects protein function
INPUT: only one base is needed
Sample format: 22,30163533,1,A/C
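The sample input line above can be built and parsed with a few lines of Python. The field names used here (chromosome, coordinate, orientation, two alleles) are an interpretation of the sample format, labeled as an assumption; the helper functions are hypothetical and not part of SIFT.

```python
def make_input_line(chromosome, coordinate, orientation, allele1, allele2):
    """Build one comma-separated line in the style of the sample format.

    Field meanings are assumed: chromosome, coordinate, orientation, alleles.
    """
    return f"{chromosome},{coordinate},{orientation},{allele1}/{allele2}"


def parse_input_line(line):
    """Split a line such as '22,30163533,1,A/C' back into its fields."""
    chromosome, coordinate, orientation, alleles = line.strip().split(",")
    allele1, allele2 = alleles.split("/")
    return {
        "chromosome": chromosome,
        "coordinate": int(coordinate),
        "orientation": int(orientation),
        "alleles": (allele1, allele2),
    }


line = make_input_line("22", 30163533, 1, "A", "C")
record = parse_input_line(line)
print(line)
print(record)
```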
http://genetics.bwh.harvard.edu/pph2/
PolyPhen-2
_searches for impact on structure
_comparative considerations
_STRUCTURE
INPUT: whole protein sequence
Indicate:
_position of the changed aa in the protein
_changed aa (R -> H)
variant prediction tools
THINGS TO CONSIDER IN TECHNIQUE I
exome sequencing seems very easy, but is all sequence present?
_still not all genes or exons are recognized, and these are therefore not present in the sequence capture kit
_not all targeted exons are well captured
_not all targeted sequence can be aligned back to the reference genome
_regulatory DNA regions are not present in the sequence capture kit
_copy number variation is hard to recognize
_deletions/amplifications are not easy to call accurately
THINGS TO CONSIDER IN ANALYSIS I
_loss of function (LOF) variants are expected to be present in genes causing genetic diseases
_BUT they are also present in every “normal/healthy” individual: 100 LOF variants per human genome, 80 in heterozygous state, 20 in homozygous state
Science 335, 2012
>1000 rare homozygous LOF variants in 3222 consanguineous individuals of Pakistani origin from the UK, predicted to be pathogenic, but without clinical phenotype
_stop codons
_splice site disrupting variants
_frame shifts
_insertions/deletions
normal individuals carry SEVERE MENDELIAN CHILDHOOD DISEASE MUTATIONS
Nat biotechnology 34, 2016
THINGS TO CONSIDER IN ANALYSIS II
lots of exome sequencing of rare diseases / cases
_UK: Deciphering Developmental Disorders (DDD): sequence 1000 exomes to find genes for mental disability
  2016: scaled up to the 100,000 Genomes Project: rare disease + cancer
_in Canada: FORGE project: 200 different disorders / rare diseases in Canadian children
Will probably result in lots of mutations in “private” genes
Confirmation in other patients will be difficult
Functional proof needed
https://www.mousephenotype.org/
International Knockout Mouse Consortium
OPTIONS NOW: exome sequencing
_linkage is still useful:
_mutations for Mendelian disorders are found in the coding region of the genome
_NGS techniques keep improving: exome coverage of the kits is constantly improving
_WES is becoming less expensive / affordable, and it works!
_WGS ….
INTERESTED IN SEQUENCING? you are welcome to contact us
Annemieke: j.verkerk@erasmusmc.nl
André: a.g.uitterlinden@erasmusmc.nl
Robert: r.kraaij@erasmusmc.nl