DUMMIES APPLICATIONS OF NEXT GENERATION SEQUENCING IN HUMAN GENETICS - - PowerPoint PPT Presentation



slide-1
SLIDE 1 www.woesvanhaaften.com/weblog/joost-van-haaften/

GENETICS FOR

DUMMIES

APPLICATIONS OF NEXT GENERATION SEQUENCING IN HUMAN GENETICS

ANNEMIEKE VERKERK j.verkerk@erasmusmc.nl

slide-2
SLIDE 2

HUMAN GENOMES

_in the Human Genome Project, a full genome was sequenced from the DNA of a mix of male and female individuals
_this sequence served as the human genome reference sequence and does not represent anyone’s personal genome
_it took 13 years and cost $3 – 5 billion
_with the new techniques of NGS, sequencing became much faster and people wanted to sequence personal genomes
_a lot of data: stored in databases, visualised in genome browsers

slide-3
SLIDE 3

_the first personal genome sequenced with the new method was that of James Watson
  - generated within 4 months
  - cost $1.5 million
  - published in Nature in April 2008
_but... Craig Venter’s personal genome was sequenced first, although with the previous-generation machines
  - cost $100 million
  - published in PLoS Biology in Sept 2007
despite all sequencing advances, we are still learning how to read all this data...

CONTINUED RIVALRY

slide-4
SLIDE 4

James Watson did not want to know the status of his APOE gene, because the sequence within it could indicate his risk for Alzheimer’s disease; it is therefore not present in the genome browser

PRIVACY ISSUES

and this also started the debate about privacy issues on human genome sequences and the “right” to know or not to know

slide-5
SLIDE 5

chr 19

[Figure: APOE gene locus on chr 19 with the SNPs rs429358 and rs7412 (alleles T or C), which define the E2 and E4 alleles; E4 = increased risk for Alzheimer]
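The two SNPs named on this slide together define the common APOE alleles. A minimal sketch of that lookup follows, assuming phased haplotypes and forward-strand alleles as commonly reported (rs429358 T/C, rs7412 C/T). The E3 allele is not legible in the slide text and is added here from general knowledge. The function names are hypothetical.

```python
# Sketch: translate phased (rs429358, rs7412) haplotypes into APOE allele names.
# Assumption: alleles are given on the forward strand and phase is known.
APOE_HAPLOTYPES = {
    ("T", "T"): "E2",
    ("T", "C"): "E3",
    ("C", "C"): "E4",
}


def apoe_allele(rs429358, rs7412):
    """Return the APOE allele name for one haplotype, or 'unknown'."""
    return APOE_HAPLOTYPES.get((rs429358.upper(), rs7412.upper()), "unknown")


def apoe_genotype(hap1, hap2):
    """Return the APOE genotype (e.g. 'E3/E4') for two haplotypes."""
    alleles = sorted(apoe_allele(*hap) for hap in (hap1, hap2))
    return "/".join(alleles)


# One chromosome carries T at rs429358 and C at rs7412, the other C and C.
print(apoe_genotype(("T", "C"), ("C", "C")))  # E3/E4
```

With unphased genotype calls, a double heterozygote cannot be resolved without further assumptions, which is why the sketch takes haplotypes as input.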

slide-6
SLIDE 6

STORE DATA IN

GENOME BROWSERS

_sequencing produces a lot of data
_to share it with others, 3 genome browsers were developed to store it and make it visible

NCBI Map Viewer, Ensembl, UCSC

slide-7
SLIDE 7

USE OF NGS DATA

_find (rare) variants that can be used in genome-wide association studies
_solve Mendelian disorders that were until now impossible to solve, especially rare diseases and diseases in small families
_solve de novo diseases (newly arisen in the child, parents are healthy)
_find somatic mutations in cancer
_learn and understand the sequence/structure of DNA

slide-8
SLIDE 8

solving Mendelian disorders

[Diagram: disease, function, gene, location on genome/chromosome; linkage analysis / positional cloning]

slide-9
SLIDE 9

_you look for the chromosomal region that is shared by all patients (in a family) and segregates with the disease
_you follow polymorphic markers in the pedigree; these can be CA repeats or SNPs
_you need large families with (10) affected individuals to get a small region where you can search for the mutated gene
_you need a lot of meioses to end up with a small region
_was very time-consuming with the old techniques (using CA repeats)
_faster with SNPs as markers (SNP arrays)
_small families with rare diseases cannot be solved in this way

LINKAGE ANALYSIS

slide-10
SLIDE 10

LINKAGE ANALYSIS

[Figure: pedigree with CA-repeat marker alleles (ca10, ca12, ca14, ca15, ca16, ca18) shown for each family member]

This chromosome segregates with the disease in this family and contains a gene mutation = principle of a linkage study
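
The principle of a linkage study can be sketched in a few lines: look for the marker allele that every affected family member carries and no unaffected member carries. This is only an illustration, assuming a fully penetrant dominant disease and a single marker. The pedigree below is made up and does not reproduce the slide's figure. The function name is hypothetical.

```python
def segregating_alleles(pedigree):
    """Return marker alleles carried by all affected and by no unaffected members.

    pedigree: list of dicts with keys 'alleles' (set of marker alleles)
    and 'affected' (bool). Assumes full penetrance (an assumption of this sketch).
    """
    affected = [set(p["alleles"]) for p in pedigree if p["affected"]]
    unaffected = [set(p["alleles"]) for p in pedigree if not p["affected"]]
    if not affected:
        return set()
    shared = set.intersection(*affected)
    for alleles in unaffected:
        shared -= alleles
    return shared


# Hypothetical family, CA-repeat alleles named as on the slide.
family = [
    {"id": "father", "alleles": {"ca10", "ca16"}, "affected": True},
    {"id": "mother", "alleles": {"ca15", "ca18"}, "affected": False},
    {"id": "child1", "alleles": {"ca10", "ca15"}, "affected": True},
    {"id": "child2", "alleles": {"ca16", "ca18"}, "affected": False},
    {"id": "child3", "alleles": {"ca10", "ca18"}, "affected": True},
]
print(segregating_alleles(family))  # {'ca10'}
```

A real linkage analysis scores many markers statistically (LOD scores) rather than applying a strict all-or-none rule.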
slide-11
SLIDE 11

MEIOSIS

GERM CELL DIVISION

during production of germ cells, paired chromosomes exchange pieces of DNA in prophase I of meiosis = recombination or crossing over, to create genetic diversity
germ cells end up with single chromatids from recombined chromosomes

slide-12
SLIDE 12

recombination and segregation: how it works in a family

[Figure: pedigree; A = affected, X = recombination]

the region still needs to be sequenced

slide-13
SLIDE 13

_you can determine the complete DNA sequence of all exons in one individual in one go
_only have to look in exons, because most Mendelian diseases have a mutation in the coding sequence
_costs keep going down, more extensive platforms are developed = affordable
_especially suited for
  - small families and sporadic cases (without family history)
  - de novo (dominant) cases (were impossible to solve, mapping with linkage not possible)
  - continuing stuck positional cloning projects with large linkage areas

NEW WAY OF GENE / MUTATION FINDING

= EXOME SEQUENCING

slide-14
SLIDE 14

_published in Nature Genetics in 2010 by Ng et al.
_4 patients from 3 families with Miller syndrome (facial and limb abnormalities)
_very rare disease, only 30 cases described in literature
_linkage analysis was not an option, families are very small

FIRST PAPER ON EXOME SEQUENCING

slide-15
SLIDE 15

3 families
  - family 1: 2 affected individuals
  - family 2: 1 affected individual
  - family 3: 1 affected individual
assumed mode of inheritance: autosomal recessive

slide-16
SLIDE 16

Exome Sequencing: 40-fold coverage of 37 Mb = all coding sequence
with data-analysis interest in:
_variants that cause amino acid changes (NS = non-synonymous changes)
_splice site mutations (SS = splice site changes)
_short insertions and deletions in coding regions
_new variants not present in the dbSNP database
_variants predicted to be damaging
for data-analysis: filtered out all variants that were not relevant
_mainly synonymous changes: change on DNA level, but the amino acid does not change
_intronic changes
_known SNPs

slide-17
SLIDE 17

first only family 1 was analyzed
for the recessive model it was required for each sibling to have two variant alleles in the same gene: 2800 genes
_filtered out all known SNPs present in dbSNP129 and 8 HapMap samples: problem reduced to 9 genes
_compared these 9 genes to variants in the two other unrelated patients from families 2 and 3: problem reduced to 1 gene: DHODH = dihydroorotate dehydrogenase
BUT: it was not a classical homozygous recessive mutation, but a compound heterozygous mutation: the parents carried different mutations in the same gene!
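
The filtering logic of this slide can be sketched as set operations: per patient, keep the genes with at least two novel variant alleles, then intersect across patients. This is a simplified illustration, not the pipeline used in the paper. The gene names, variant IDs, and function names are hypothetical.

```python
from collections import defaultdict


def recessive_candidate_genes(variants, known_ids):
    """Genes with >= 2 novel variant alleles in one patient.

    variants: iterable of (gene, variant_id, alt_allele_count), where
    alt_allele_count is 1 (heterozygous) or 2 (homozygous).
    known_ids: variant IDs to discard (e.g. already in a SNP database).
    """
    alleles_per_gene = defaultdict(int)
    for gene, variant_id, alt_count in variants:
        if variant_id in known_ids:
            continue  # known variant: filtered out
        alleles_per_gene[gene] += alt_count
    return {gene for gene, n in alleles_per_gene.items() if n >= 2}


def shared_candidates(patients, known_ids):
    """Genes that fit the recessive model in every patient."""
    gene_sets = [recessive_candidate_genes(v, known_ids) for v in patients.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


# Hypothetical data: two siblings plus one unrelated patient.
patients = {
    "fam1_sib1": [("GENE_A", "v1", 1), ("GENE_A", "v2", 1),
                  ("GENE_B", "v3", 2), ("GENE_C", "v4", 1)],
    "fam1_sib2": [("GENE_A", "v1", 1), ("GENE_A", "v2", 1), ("GENE_B", "v3", 2)],
    "fam2_pat": [("GENE_A", "v5", 1), ("GENE_A", "v6", 1), ("GENE_B", "v3", 1)],
}
known = {"v3"}
print(shared_candidates(patients, known))  # {'GENE_A'}
```

Counting two different heterozygous variants in one gene is what lets the compound heterozygous case through. The sketch cannot tell whether both variants sit on the same parental chromosome; that needs parental genotypes.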

slide-18
SLIDE 18

Unexpected bonus:
_recessive mutations were also found in another gene, DNAH5 (= dynein, axonemal, heavy chain 5), but only in the 2 siblings from family 1, not in the patients from families 2 and 3
_the patients from family 1 had additional clinical problems with pulmonary infections (not part of Miller syndrome), and this second gene, DNAH5, was responsible for that

slide-19
SLIDE 19

[Figure: OMIM data, 1988-2017: number of phenotypes (y-axis, 1000 to 6000) with known gene function and without known gene function. Annotated events: first working draft of human genome; start of exome seq. Annotated intervals: 12 years: 1000; 10 years: 1800; 7 years: 2270]

slide-20
SLIDE 20

it all seems very easy but...
_still not all genes or exons are recognized, and these are therefore not present in the sequence capture kit
_sometimes even known genes are not present in the capture kit, but this is improving
_not all targeted exons are well captured
_not all targeted sequence can be aligned back to the reference genome
_not all aligned sequences can be accurately called

PROBLEMS… IN TECHNIQUE I

slide-21
SLIDE 21

_regulatory DNA regions are not (yet) present in the sequence capture kit (whole genome sequencing better?)
_copy number variation is (still) hard to recognize
_mitochondrial DNA is sometimes present, sometimes not
  - mitochondrial probes are not present in the capture kit
  - mitochondrial sequence is a “by-product”
_micro RNAs are not all recognized and are not in the capture kit

A WHOLE EXOME IS NOT THAT WHOLE

PROBLEMS… IN TECHNIQUE II

slide-22
SLIDE 22

= the disease in your group of patients is caused by more than 1 gene
for exome sequencing this is a problem:
_the patient population is not homogeneous
_you are searching for variants in more than 1 gene
_you need enough samples to find the gene mutation
_especially important for sporadic cases

PROBLEM OF LOCUS HETEROGENEITY

slide-23
SLIDE 23

= the disease in your group of patients is caused by more than 1 variant in the same gene
for exome sequencing this is an advantage:
_the variants are spread over the same gene in different patients
_the risk of missing a variant is spread over different sequencing targets, which increases the chance of finding the causal gene

ADVANTAGE OF ALLELIC HETEROGENEITY

slide-24
SLIDE 24

How to find the right variant (mutation)...
_after exome sequencing you are left with around 25,000 variants per person
_filtering steps are needed to filter out normal variants
many databases with DNA variants can be used to filter out common variants:
  - dbSNP
  - 1000 Genomes Project
  - Exome Sequencing Project (ESP6500), University of Washington
  - ExAC (from 60,706 individuals combined from 17 different databases)
non-coding variants can be filtered out
new variants that are synonymous (no change in amino acid) can be filtered out
after that you hope you do not have that many left to select from:
coding - stop - nonsynonymous - splice site variants - some indels
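
The filtering steps on this slide can be sketched as a simple pass over an annotated variant list. The field names (`pop_freq`, `consequence`), the consequence labels, and the 1% frequency cut-off are assumptions made for this illustration, not values from the slides.

```python
# Consequence classes worth keeping, following the slide's last line
# (stop, nonsynonymous, splice site, some indels). Labels are hypothetical.
KEEP_CONSEQUENCES = {"stop_gained", "missense", "splice_site",
                     "frameshift", "inframe_indel"}


def filter_variants(variants, max_pop_freq=0.01):
    """Drop common, non-coding and synonymous variants.

    variants: list of dicts with 'pop_freq' (highest frequency seen in the
    population databases, or None if the variant is new) and 'consequence'.
    max_pop_freq: assumed cut-off above which a variant counts as common.
    """
    kept = []
    for v in variants:
        freq = v.get("pop_freq")
        if freq is not None and freq > max_pop_freq:
            continue  # common variant
        if v["consequence"] not in KEEP_CONSEQUENCES:
            continue  # non-coding or synonymous
        kept.append(v)
    return kept


example = [
    {"id": "v1", "pop_freq": 0.25, "consequence": "missense"},
    {"id": "v2", "pop_freq": None, "consequence": "synonymous"},
    {"id": "v3", "pop_freq": None, "consequence": "stop_gained"},
    {"id": "v4", "pop_freq": 0.0005, "consequence": "missense"},
    {"id": "v5", "pop_freq": None, "consequence": "intronic"},
]
kept = filter_variants(example)
print([v["id"] for v in kept])  # ['v3', 'v4']
```

The cut-off is the design choice that matters most: for a rare recessive disease a carrier frequency well below 1% may still be plausible, so variants already present in a database are not automatically harmless.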

PROBLEMS … IN ANALYZING

slide-25
SLIDE 25

MAIN ISSUE

TO FIND THE RIGHT VARIANT

_you will end up with a list of variants that are the potential cause of the disease you are studying
_there will be false positives (false negatives?)
_every healthy person has around 100 LOF variants in their genome, including non-penetrant disease mutations: which one is it??
_you will need to investigate functional issues
  - are the variants conserved in other species
  - do the variants have an effect on the protein
_a mapped region, even a large one, can help restrict the number of variants

slide-26
SLIDE 26

dbSNP AGAIN

_use of dbSNP can be helpful but also dangerous
_variants associated with disease are also in there
_mutations of (“more common”) recessive disorders are present, e.g. mutations in the CFTR gene
_very rare variants are probably not in there
_sequence data of cohorts is deposited in dbSNP, e.g. from the Exome Sequencing Project (ESP), Univ of Washington: exome sequencing data from people with heart, lung and blood disorders
_it would be good to have your own internal control set that you know best yourself
_ExAC database is used by many researchers (http://exac.broadinstitute.org/)

slide-27
SLIDE 27

NGS DATA

WHAT DO YOU GET

[Screenshot: variant table with columns for variant, location and genotypes]

slide-28
SLIDE 28

HOW TO FIND THE RIGHT VARIANT

[Screenshot: annotation columns for prediction programs, conservation and population frequency]

slide-29
SLIDE 29

EXAMPLE OF RECESSIVE DISORDER WITH CONSANGUINITY SOLVED BY LINKAGE ANALYSIS AND EXOME SEQUENCING

Malpuech-Michels-Mingarelli-Carnevale syndrome

AJHG 87, 2010

slide-30
SLIDE 30

Malpuech-Michels-Mingarelli-Carnevale syndrome

_disorder with developmental delay, ocular and abdominal defects, skeletal anomalies, facial characteristics, normal intelligence to mild intellectual disability, hearing problems
_phenotypic variability but also overlap between the cases described by the 4 mentioned authors
_rare: 10-15 small families reported in literature
_assumed autosomal recessive inheritance
strategy used:
_linkage analysis/homozygosity mapping in 2 families with related parents
_sequenced exome of 1 patient

slide-31
SLIDE 31

linkage/homozygosity mapping identified a region of 1.8 Mb (chr3) in family 1:
  - homozygous in the 2 patients
  - not present in the unaffected children
  - heterozygous in the parents (and in one unaffected)
the same region, but larger (24 Mb), was present in family 2 (confirmation of the region)
_the exome of patient 101 was sequenced (smallest homozygous region)
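
Homozygosity mapping looks for long stretches of consecutive homozygous markers in a patient with related parents. A minimal sketch of finding such runs in one individual's SNP genotypes follows. The positions, genotypes, and the `min_snps` threshold are made up for illustration, and the function name is hypothetical.

```python
def homozygous_runs(genotypes, min_snps=3):
    """Return (start, end) positions of runs of consecutive homozygous SNPs.

    genotypes: list of (position, genotype) sorted by position, where the
    genotype is a two-letter string such as 'AA' or 'AG'.
    min_snps: assumed minimum number of SNPs for a run to be reported.
    """
    runs = []
    start = last = None
    count = 0
    for pos, gt in genotypes:
        if gt[0] == gt[1]:  # homozygous call
            if start is None:
                start, count = pos, 0
            count += 1
            last = pos
        else:  # a heterozygous call ends the current run
            if start is not None and count >= min_snps:
                runs.append((start, last))
            start = None
    if start is not None and count >= min_snps:
        runs.append((start, last))
    return runs


snps = [(100, "AA"), (200, "GG"), (300, "AG"), (400, "CC"),
        (500, "TT"), (600, "GG"), (700, "AA"), (800, "CT")]
print(homozygous_runs(snps))  # [(400, 700)]
```

In practice the runs from all patients are overlapped, and regions that are also homozygous in unaffected relatives are excluded, as described on the slide.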

slide-32
SLIDE 32

exome data of 1 patient: filtered according to a recessive model: mutation on both alleles
_only looked in the 1.8 Mb region, which contains 20 genes
_filtered on:
  - sufficiently good quality
  - not present in dbSNP
  - not present in data of the 1000 Genomes Project
4 variants left:
  - (1 was non-conserved intronic)
  - (1 was non-conserved intergenic)
  - 1 was non-synonymous, non-conserved
  - 1 was non-synonymous, conserved in other species: in MASP1 = mannan-binding lectin serine protease 1, predicted to be damaging by the SIFT and PolyPhen programs
Sanger sequencing identified a STOP mutation in family 2 in MASP1 exon 6

slide-33
SLIDE 33

some functional evidence: specific modeling software shows that the NS variant has an effect on the 3D model of the protein
[Figure labels: stop codon; missense = amino acid change]

slide-34
SLIDE 34

EXAMPLE OF AUTOSOMAL DOMINANT DISORDER SOLVED BY EXOME SEQUENCING

Nat Genet 42, 2010

slide-35
SLIDE 35

KABUKI SYNDROME

_disorder with skeletal phenotypes and intellectual disability
_phenotypic variability
_very rare: 1/30,000 – 1/50,000; 400 cases reported worldwide
_not solvable by positional cloning/linkage analysis
_most cases are de novo; in some instances transmission from parent to child suggested an autosomal dominant disorder

slide-36
SLIDE 36

strategy used:
_sequenced exomes of 10 unrelated patients
_they could initially NOT find the mutation
problems encountered:
_more than one gene was involved: locus heterogeneity
_the gene capture missed a number of exons

[...] optimal

later resolved: MLL2 gene, 54 exons

slide-37
SLIDE 37

LOSS OF FUNCTION VARIANTS IN HUMAN PROTEIN-CODING GENES

in normal people

Science 335, 2012

slide-38
SLIDE 38

_loss of function variants severely disrupt the function of protein-coding genes through mutations in the DNA
  - stop codons (stopping translation of the protein)
  - splice site disrupting variants
  - frame shifts
  - small insertions/deletions
  - larger deletions
_normally expected to be present in genes causing genetic diseases
_BUT, every healthy individual has around 100 LOF variants
  - ± 20 stop codons, 6 in homozygous state
  - ± 80 deletions/insertions/frameshifts, up to 14 in homozygous state

slide-39
SLIDE 39

Classes of LOF variant affecting protein-coding regions

MacArthur D G , Tyler-Smith C Hum. Mol. Genet. 2010;19:R125-R130
slide-40
SLIDE 40

WHAT IS IN THERE

_mutations in heterozygous state that would cause disease in homozygous state = recessive diseases
_LOF variants in poorly evolutionarily conserved genes: small or no effect
_LOF variants in genes that belong to multi-gene families, suggesting that proteins in the same family have similar functions
_a minority are associated with reduced expression of the corresponding gene
question: what effect do these variants have on human phenotypes?
this was a first report about LOF variants in normal individuals, in 2012
now it is found that Mendelian disease variants, causing severe disorders, are also present in healthy individuals

slide-41
SLIDE 41

normal individuals carry SEVERE MENDELIAN CHILDHOOD DISEASE MUTATIONS

Nat Biotechnol 34, 2016

slide-42
SLIDE 42

HYPOTHESIS

_analysed existing sequence data (WES and WGS)
identified:
_13 individuals with mutations in 8 disease genes, normally causing severe Mendelian childhood disorders:
  - 5 in homozygous state, for AR diseases
  - 3 in heterozygous state, for AD diseases
hypothesis: these individuals carry a genetic modulator or suppressor that protects against Mendelian disease

slide-43
SLIDE 43

SOME ISSUES

_for 6 individuals the health record was checked: no mention of disease
_for 5 of the 13 the genotypes could be confirmed; for the others no DNA was available any more
_because of the original consent, these individuals could not be re-contacted
_search for counterbalancing variants hampered by unavailability of Whole Genome Sequence data, except in 2
_data sharing is important to investigate this further
_linking participants to their medical records
_consent for re-contacting needed

slide-44
SLIDE 44

OTHER EXAMPLES

_>1000 rare homozygous LOF variants in 3222 individuals of Pakistani descent from the UK with related parents: predicted to be pathogenic, but without clinical phenotype
_56 LOF mutations in ASXL1 in the ExAC database, in individuals without severe genetic childhood disorders
  - de novo (dominant) mutations in ASXL1 cause severe Bohring-Opitz syndrome

slide-45
SLIDE 45

LARGEST GENOMIC DATABASE

New initiative: Million Veteran Program USA
_participants donate blood for DNA extraction + give access to their electronic health records + agree to be contacted about participating in future research
http://www.va.gov/opa/pressrel/pressrelease.cfm?id=2806
http://www.research.va.gov/mvp/
2011: start
Aug 2016: 500,000 participants
July 2017: 580,000 participants
2020:

slide-46
SLIDE 46
slide-47
SLIDE 47

1953 1990

slide-48
SLIDE 48 germaglitter.blogspot.com
slide-49
SLIDE 49

INTERESTED IN SEQUENCING? you are welcome to contact us

Annemieke: j.verkerk@erasmusmc.nl
André: a.g.uitterlinden@erasmusmc.nl
Robert: r.kraaij@erasmusmc.nl