Introduction to Complex Genetics: Concepts and Tools: part A (André G Uitterlinden) - PowerPoint PPT Presentation

SLIDE 1

MolMed Course “Genetics for Dummies” Rotterdam, 7 November, 2018

Introduction to Complex Genetics: Concepts and Tools: part A

André G Uitterlinden

Genetic Laboratory, Department of Internal Medicine

Department of Epidemiology, Department of Clinical Chemistry

www.glimdna.org

Professor Trifonius Zonnebloem / Professor Cuthbert Calculus / Professeur Tryphon Tournesol

Our website…

SLIDE 2

ROTTERDAM – OLDEBARNEVELDSTRAAT - MULTATULI

Portrait by Mathieu Ficheroux, 1974

Viewed from the moon we are all equal

SLIDE 3

We differ from each other…

DNA variation causes differences in:

  • Development
  • Appearance
  • Behaviour
  • Ageing
  • Diseases
SLIDE 4

AGING RESEARCH

SLIDE 5

From DNA to RNA to Protein....

[Figure: gene structure with exons; DNA base pair sequence A A C C G C A T A A G G paired with T T G G C G T A T T C C; mRNA; protein. Levels of study: “Genetics”, “Genomics”, “Proteomics”]
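The DNA to RNA to protein flow can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard genetic code and reads the slide's 12-base fragment as a coding strand from its first base. The function names are hypothetical, chosen only for this example.

```python
# Minimal sketch of DNA -> mRNA -> protein with the standard genetic code.
# Reading the slide's fragment from its first base is an assumption.

BASES = "TCAG"
# Standard code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna: str) -> str:
    """Base-paired strand, written position by position (not reversed)."""
    return dna.translate(COMPLEMENT)


def transcribe(coding_strand: str) -> str:
    """mRNA carries the coding-strand sequence with U in place of T."""
    return coding_strand.replace("T", "U")


def translate(coding_strand: str, frame: int = 0) -> str:
    """One-letter amino acids for each complete codon in the frame ('*' = stop)."""
    seq = coding_strand[frame:]
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


fragment = "AACCGCATAAGG"
print(complement(fragment))   # TTGGCGTATTCC
print(transcribe(fragment))   # AACCGCAUAAGG
print(translate(fragment))    # NRIR
```

The complement printed here is the second strand shown on the slide. Real genes are read from a start codon (ATG) within an annotated reading frame, which this sketch does not model.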

SLIDE 6

Why do we study DNA variation?

*Biology:

  • Mechanism: understand cause of disease
  • Treatment: finding new potential drug targets

*Prediction:

  • (Early) diagnostics with a stable marker: understand how DNA variation contributes to variation in:
  • Risk of disease (vulnerability): “personalized medicine”
  • “Response-to-treatment” (medication, diet): “pharmacogenetics”

SLIDE 7

“The Human Genome Project”

*26 June 2000: press conference by Bill Clinton & Tony Blair: “working draft”, 95% sequenced

*14 April 2003: finished: 99% sequenced

>> Cheaper and faster!! Costs: $2.7 billion (instead of the estimated $3 billion); timing: 1990-2003 (instead of 2005)

Bill Clinton, Tony Blair, Craig Venter, Francis Collins

What will DNA tell us about this stain on a dress?

SLIDE 8

[Figure: a long stretch of DNA sequence, including a (CA)n tandem repeat, illustrating how variable human DNA is]

“SNP=Single Nucleotide Polymorphism”

DNA variants are:

*Frequent in the genome (based on 500k WGS/WES):

  • Many: >150 million variable loci in the human genome (~3%)
  • Types: “SNPs”, in/dels, CNVs, VNTRs
  • Databases: dbSNP, HapMap, 1000 Genomes (1KG), “local” NGS efforts, ...

*Frequent in the population (by allele frequency):

  • > 5% = common polymorphism
  • 1-5% = less common variant
  • < 1% = rare variant/mutation

COMPLEX GENETICS: HUMAN DNA IS HIGHLY VARIABLE

“IN/DEL=Insertion/Deletion” “CNV=Copy Number Variation” “VNTR=Variable Number of Tandem Repeats”
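The frequency classes above can be applied to genotype counts as follows. This is a small sketch that assumes the cut-offs refer to the minor allele frequency (MAF), with exactly 5% counted as "less common" and exactly 1% counted as "less common". The genotype counts and function names are hypothetical.

```python
def allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Frequency of allele B from genotype counts (each person carries two alleles)."""
    return (n_ab + 2 * n_bb) / (2 * (n_aa + n_ab + n_bb))


def minor_allele_frequency(freq: float) -> float:
    """Fold a frequency so that it always refers to the less frequent allele."""
    return min(freq, 1.0 - freq)


def classify_variant(maf: float) -> str:
    """Frequency classes from the slide (boundary handling is an assumption)."""
    if maf > 0.05:
        return "common polymorphism"
    if maf >= 0.01:
        return "less common variant"
    return "rare variant/mutation"


# Hypothetical counts for three SNPs: (AA, AB, BB)
for counts in [(640, 320, 40), (9800, 198, 2), (9990, 10, 0)]:
    maf = minor_allele_frequency(allele_frequency(*counts))
    print(counts, round(maf, 4), classify_variant(maf))
```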

SLIDE 9

[Figure: a long stretch of DNA sequence with a T-C “SNP” and a “HAPLOTYPE” highlighted]

SLIDE 10

SNPs, alleles, genotypes and haplotypes

[Figure: two homologous chromosome strands, one from the father and one from the mother, each showing A C G T A G C A; the SNP (Single Nucleotide Polymorphism) position, its alleles, the resulting genotype and a haplotype are labelled]
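A genotype records which two alleles a person carries at a SNP. A haplotype records which alleles sit together on one chromosome. The sketch below uses hypothetical haplotypes and function names. It shows that the genotype is identical whichever parent contributed which allele, so genotype data alone do not reveal the haplotypes (the "phase").

```python
from typing import List, Tuple


def genotypes(paternal: str, maternal: str) -> List[Tuple[str, ...]]:
    """Unordered genotype at each position: the two alleles, sorted."""
    if len(paternal) != len(maternal):
        raise ValueError("haplotypes must have equal length")
    return [tuple(sorted(pair)) for pair in zip(paternal, maternal)]


def heterozygous_sites(paternal: str, maternal: str) -> List[int]:
    """0-based positions where the two inherited alleles differ (SNP sites)."""
    return [i for i, (p, m) in enumerate(zip(paternal, maternal)) if p != m]


father = "ACGTAGCA"  # hypothetical haplotype inherited from the father
mother = "ACGTGGCA"  # hypothetical maternal haplotype: G instead of A at index 4

print(heterozygous_sites(father, mother))   # [4]
print(genotypes(father, mother)[4])         # ('A', 'G')
# Swapping the parental origin gives the same genotypes: phase is lost.
print(genotypes(father, mother) == genotypes(mother, father))  # True
```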

SLIDE 11

Single Nucleotide Polymorphisms (SNPs) are common and have subtle effects

  • Reference: ..AACCGCATAAGG.. (paired strand ..TTGGCGTATTCC..), codon 222 encodes alanine (Ala)
  • Variant: ..AACCGTATAAGG.. (paired strand ..TTGGCATATTCC..), codon 222 encodes valine (Val)
  • DNA: C677T; protein: Ala222Val
  • Population frequency: 65% vs. 35%
  • Consequences: enzyme activity → Hcy (homocysteine) level → disease risk
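The 65%/35% population frequency can be turned into expected genotype frequencies under Hardy-Weinberg equilibrium. This is a worked sketch that assumes the two figures are the C and T allele frequencies and that mating is random with respect to this SNP. The function name is hypothetical.

```python
def hardy_weinberg(p: float) -> dict:
    """Expected genotype frequencies for allele frequencies p (C) and q = 1 - p (T)."""
    q = 1.0 - p
    return {"CC": p * p, "CT": 2 * p * q, "TT": q * q}


for genotype, freq in hardy_weinberg(0.65).items():
    print(genotype, round(freq, 4))
```

Under these assumptions, roughly 42% of people would be CC, 46% CT and 12% TT. This means about one in eight people would carry two copies of the variant allele.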

SLIDE 12

“Simple” versus “Complex” Disease

“Simple”/Monogenic Disease

  • severe phenotype
  • early onset
  • rare
  • Mendelian inheritance
  • e.g.: cystic fibrosis, osteogenesis imperfecta
  • Cause: mutations (+ polymorphisms)

“Complex” Disease

  • mild phenotype
  • late onset
  • common
  • complex inheritance
  • e.g.: diabetes, asthma, osteoporosis, etc.
  • Cause: polymorphisms (+ mutations)

SLIDE 13

DNA polymorphisms as markers

Pro:

  • DNA (essentially) does not change with age (=stable) and so allows risk analysis at a very early stage (=prognostic)

  • DNA is (essentially) identical in all tissues (=accessible)
  • Allows risk analysis for many characteristics (=comprehensive)

Contra:

  • Individual effects of most common variants are small
  • Not all contributing variants (common or rare) are known
SLIDE 14

This is what happens when there are NO POLYMORPHISMS

Why are DNA polymorphisms important?

Evolution, Forensics, Disease

Slide stolen from Prof Axel Themmen

DTC fun

SLIDE 15

Quantitating the Genetic Contribution to Complex Traits using Twins

Genes shared: monozygotic (MZ) twins 100%, dizygotic (DZ) twins 50%

Resemblance (correlation between twin 1 and twin 2), e.g.: MZ r = 0.7; DZ r = 0.3

Heritability: H² ≈ 2 × (r_MZ - r_DZ)

H² of bone-related and other phenotypes:

  • BMD: 50-80%
  • Turnover: 40-70%
  • Geometry: 70-85%
  • QUS: ~80%
  • Fracture: hip fx 3-70%, wrist fx 54%
  • Height: 80-90%
  • Menopause: 60%
  • BMI: 60-70%
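Here is a worked version of the twin formula, using the illustrative correlations r_MZ = 0.7 and r_DZ = 0.3. This is Falconer's classic estimator. It assumes that MZ and DZ pairs share their environments to the same extent. The function name is hypothetical.

```python
def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Heritability estimate H2 ~ 2 * (r_MZ - r_DZ) from twin correlations.

    MZ twins share ~100% and DZ twins ~50% of their genes, so doubling the
    difference in resemblance estimates the genetic share of the variance."""
    return 2.0 * (r_mz - r_dz)


print(round(falconer_h2(0.7, 0.3), 2))  # 0.8
```

With noisy correlations the estimate can fall outside 0 to 1. Larger twin studies therefore usually fit variance-component (e.g., ACE) models instead.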

SLIDE 16

Twin Studies Demonstrate Heritability

Heritable diseases and traits:

Diabetes, Breast cancer, Osteoarthrosis, Menopause, Height, Infidelity, Entrepreneurship, Paget’s Disease, Depression, Eye colour, Osteoporosis, Longevity, Eye diseases, Telomere length, Rheumatoid arthritis, Lung cancer, BMI, Weight, Menarche, Cholesterol, Uric acid, Infectious disease susceptibility, Ankylosing spondylitis, Myocardial Infarction, Skin colour, Stroke, Baldness, Smoking behaviour, etc.

SLIDE 17

Osteoporotic fracture is a “complex” phenotype:

  • Clinical expression: fracture risk (hip fx, wrist fx, vertebral fx, etc.)
  • Risk factors: bone strength (BMD, quality, geometry), impact force, fall risk
  • Age, sex, age-at-menopause, height, OA, etc.
  • Environmental factors: diet, exercise, ...
  • “OMICS”: DNA, RNA, methylation, microbiome, etc.

Genetic data: cause; dynamic genomics data: cause/effect?

SLIDE 18

The next challenge: Environmental Factors

The field needs: standardization, harmonization, replication

Example, dietary calcium intake: HOLLAND > 1100 mg/day vs. BELGIUM < 500 mg/day, at a geographical distance of < 100 km

Photo: Barbara Obermayer-Pietsch; Photo: Stuart Ralston

SLIDE 19

Time needed for analysing 1 SNP in 7,000 DNA samples

  • 1996: 6 months (RFLP, Eppendorf tubes)
  • 1999: 3 months (RFLP, 96-well plates)
  • 2001: 1 week (SBE, 384-well plates)
  • 2003: 1 day (TaqMan, manual)
  • 2004: 6 hrs (TaqMan, Caliper pipetting robot)
  • 2005: 3 hrs (TaqMan, Deerac, “Fast” PCR)
  • 2007: 6 sec (Illumina 550K array, 600 DNAs/week)
  • 2010: < 0.0006 sec (Illumina HiSeq2000 sequencers)

The influence of “technology-push”
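The arithmetic behind "technology-push" can be made explicit by converting each timing into genotypes per second and a speed-up relative to 1996. This is a rough sketch that treats a month as 30 days (an assumption). The 2010 value is an upper bound on time, so its speed-up is a lower bound.

```python
# Timings from the slide, in seconds. Taking a month as 30 days is an assumption.
DAY = 24 * 3600
N_SAMPLES = 7_000  # one SNP genotyped in 7,000 DNA samples

timings_s = {
    1996: 6 * 30 * DAY,   # RFLP, Eppendorf tubes
    1999: 3 * 30 * DAY,   # RFLP, 96-well plates
    2001: 7 * DAY,        # SBE, 384-well plates
    2003: 1 * DAY,        # TaqMan (manual)
    2004: 6 * 3600,       # TaqMan + pipetting robot
    2005: 3 * 3600,       # TaqMan, "Fast" PCR
    2007: 6,              # Illumina 550K array
    2010: 0.0006,         # HiSeq2000; slide gives "< 0.0006 sec", an upper bound
}

for year, seconds in timings_s.items():
    rate = N_SAMPLES / seconds               # genotypes per second
    speedup = timings_s[1996] / seconds      # fold faster than 1996
    print(f"{year}: {rate:.3g} genotypes/s, {speedup:,.0f}x faster than 1996")
```

Between 1996 and 2007 alone this amounts to a speed-up of roughly 2.6 million-fold.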

SLIDE 20

INCREASED LEVELS OF GENETIC RESOLUTION IN GENOME ANALYSIS BY HIGHER COVERAGE OF NUCLEOTIDES ANALYSED BY NEWER DNA ANALYSIS TECHNOLOGIES

(TOP ARE OLDER TECHNIQUES, BOTTOM ARE THE LATEST)

Technique: genome coverage (over TIME)

  • TaqMan: 0%
  • SNP Array: 0.1%
  • SNP Array + Imputation: 0.5%
  • Whole Exome Sequence (WES): 1%
  • Whole Genome Sequence (WGS): 95%

WES and WGS: Next Generation Sequencing

SLIDE 21

Progress (in DNA research) is technology-driven:

[Figure: visibility and/or activity of arrays, sequencing and NGS over time]

SLIDE 22

Erasmus MC Genomics Core Facility: SERVICES

BIOBANKING/DNA isolation, GENOTYPING, 2nd and 3rd GENERATION SEQUENCING, BIOINFORMATICS, TRANSCRIPTOMICS, EPIGENETICS, HIGH THROUGHPUT ARRAYS, MICROBIOME

EXAMPLES: Rotterdam Study, GenR, Parelsnoer, BBMRI, GEFOS, many more; benchmarking with top institutes of the world; collaborations in large consortia; core facility for BBMRI; GWAS, imputation, methylation analysis, exome and transcriptome, microbiome analysis; functional studies in cell lines

SERVICE PRICES*:

  • DNA isolation: 6 euro
  • WES (50x): 350 euro
  • GSA Array (800k): 28 euro
  • RNA Seq: 300 euro
  • EPIC array (850k): 245 euro
  • 16S: 24 euro
  • Metagenomics: 200 euro
  • Support: >500 euro

*Prices are for standard service; inquire for other options (July 2018)

WWW.GLIMDNA.ORG

SLIDE 23

28 euro for GSA array

Recently, the price of DNA analysis has gone down

700,000 DNA variants on the GSA array: GWAS, Clinical, pharmacogenetics, HLA, forensic, mitochondrial, ancestry, blood groups, etc.

SLIDE 24

EU-GSA project time line at the Human Genomics Facility, Erasmus MC

  • Oct 2015: introduction of the GSA concept by Illumina at ASHG
  • Nov 2015 – Feb 2016: Illumina invites genotyping centres with its offer (order minimum 150k samples!!…)
  • Dec 2015 – March 2016: HuGeF approaches people through its network
  • March 2016: collaboration with Bonn and Paris to place a single order
  • April 2016: HuGeF places first order of 150k (initial approval by the Executive Board, RvB)
  • Jan 2016 – July 2016: contact more people to reach 500k samples
  • Sept 2016: final order of 500k samples (final approval Exec Board)
  • Dec 2016: lab is optimized for high-throughput running, and data production can start
  • Sept 2017: many more orders, >185 signed contracts (134 studies/cohorts)

The hunt for more samples began……

SLIDE 25

28 euro for GSA array

In 2016, the cost of DNA analysis went down

Arrays are preferred in large-scale application (compared to sequencing)

  • 30-100x (!) cheaper
  • Only relevant DNA variants
  • Customizable
  • Very high throughput
  • Easy data analysis and automation
  • DTC companies prefer arrays
  • Fewer ethical issues

700,000 DNA variants on the GSA array: GWAS, Clinical, pharmacogenetics, HLA, forensic, mitochondrial, ancestry, blood groups, etc.

SLIDE 26

EU GSA consortium (coordinating center: HuGe-F, Erasmus MC): 1,093,522 samples

  • Europe: 1,004,992 (of which Netherlands: 168,992)
  • Canada/USA: 28,209
  • Australia: 37,219
  • Asia: 21,952
  • South America: 1,150
  • Africa: 0

By end 2018 there will be many SNP array datasets...

Existing:

  • academic data: 1 million samples (global)
  • UK Biobank: 0.5 million samples (UK)
  • Million Veteran Program (MVP): 1 million samples (USA)
  • FinnGen: 0.5 million samples (Finland)
  • 23andMe: >2 million samples (USA-centric)
  • Avera, Kaiser Permanente: 0.6 million samples (USA)

New:

  • GSA sales 2016/2017/2018: >20 million samples (USA-centric)
  • EU-GSA: 1.1 million samples (global)

TOTAL: ~25 million samples with SNP array data……

SLIDE 27

[Figure: gene and its regulatory sequences]

But not only for more GWAS: a shift in attention from regulatory variants to clinical/coding variants?

Efforts with a focus on genes/coding variants:

  • WES/WGS
  • exome chip
  • new arrays with enhanced clinical content (e.g., GSA, PMRA)

SLIDE 28

New SNP arrays have a very “rich” content: the goodies

SLIDE 29

The Genetic Architecture of Diseases/Traits: study designs to identify “risk” alleles

[Figure: effect size (small to big) against frequency of the genetic variant (rare to common)]

  • Rare variants with big effects (rare, monogenic): Whole Exome Sequencing (WES)
  • Common variants with small effects (common, complex): Genome-Wide Association Study (GWAS)
  • Few “big” effects of common alleles (ApoE, CFH)
  • Next-Generation Sequencing (WES/WGS of reference sets) + arrays/imputation

SLIDE 30

“Top-down” / hypothesis-free

*Genome Wide Linkage Analysis (resolution: 5-20 million bp)

  • Pedigrees
  • Sib-pairs
  • Human, mouse

*Genome Wide Association (GWA) Analysis (resolution: 5k-50k bp)

  • 100K-2000K SNP analysis in cases/controls

*Genome Wide Sequencing (resolution: 1 bp)

  • full exome and full genome

“Bottom-up” / up-front hypothesis

*Association Analyses of Candidate Gene Polymorphisms, based on biology (resolution: 1 bp)

*Candidate Sequencing (resolution: 1 bp)

  • Selected regions (e.g., exons, gene regions)

[Table columns: type of approach, resolution, and effectiveness (+ or +/-) for common risk alleles and for rare risk alleles]

SLIDE 31

Replication is needed (a few reasons):

*To convince yourself, your colleagues and society that the observation is true and generalizable

*Methodology (in one centre) is different and/or flawed:

  • transformed cell lines are genomically unstable
  • wrong/mixed cell lines (HeLa!)
  • bad antibodies
  • complicated and/or different methods
  • human error, fraud

*Effect sizes are small (GWAS, omics data)

*The model system is not representative of humans, e.g.:

  • worm/insect/mouse biology is not similar to human biology
  • only one (inbred mouse) strain is used (n=1 human, …a strange one!)
  • only one iPS cell line is used (n=1 human)
  • a small and/or strange human sample is used (cases only; an isolated population; etc.)

=> Replication in an independent sample/lab (not doing the experiment 6 times!)

=> “Triangulation” (Davey Smith, Munafò, Nature 2018)

SOLUTION:

  • Provide replication, and publish it in one and the same paper, with colleagues
  • Acknowledge contributions (e.g., with many authors)
SLIDE 32

Grades of Evidence and the Reproducibility Crisis

Level of evidence (from very good to not so good):

  • Large-scale collaborative prospective meta-analysis of individual-level data in consortia
  • Meta-analysis of published data
  • >2 large studies (n > 1000 each)
  • 1-3 smaller studies
  • 1 small study (n < 500), NO replication
  • Expert opinion…

Science disciplines along the same scale: Complex Genetics, Physics, Astronomy, Sociology, Psychology, Medicine, Cell Biology

  • The biomedical community publishes 2.5 million papers per year
  • Only 40% of papers describe results that can be replicated (the “reproducibility crisis”)
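Meta-analysis pools effect estimates from several studies. A common choice in consortium GWAS is the inverse-variance weighted fixed-effect model, sketched below. Whether a given consortium used exactly this model is not stated here, and the study estimates are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_meta(betas: Sequence[float],
                      ses: Sequence[float]) -> Tuple[float, float, float]:
    """Inverse-variance weighted fixed-effect meta-analysis.

    Returns (pooled beta, pooled standard error, two-sided p-value)."""
    weights = [1.0 / se ** 2 for se in ses]      # more precise studies weigh more
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided normal p-value
    return beta, se, p


# Three hypothetical studies of the same SNP, each individually only modest (z = 2-4)
beta, se, p = fixed_effect_meta([0.10, 0.20, 0.15], [0.05, 0.05, 0.05])
print(f"pooled beta = {beta:.3f}, se = {se:.4f}, p = {p:.2g}")
```

The pooled standard error shrinks as studies are added. This is why large collaborative meta-analyses sit at the top of the evidence scale.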
SLIDE 33

Collaboration doesn’t come easy…..

>> Donald Trump’s view on EUROPE…. ?

(From: Yanko Tsvetkov, alphadesigner.com)

[Figure: map of Europe covered with “Wall!!” labels]

SLIDE 34

A “Culture” Change in doing Research:

GLOBAL COLLABORATIONS IN COMPLEX GENETICS

Example: the “GIANT” consortium:

>2,000,000 participants…

SUNLIGHT consortium, GSA consortium

SLIDE 35

Genome-Wide Association Study (GWAS)

  • DNA collection: e.g., 1000 cases vs. 1000 controls
  • Genotyping on arrays (Illumina, Affymetrix): genotypes AA, AB or BB for SNP1, SNP2, SNP3, ... SNP550,000 across the chromosomes
  • DATA ANALYSIS (e.g., PLINK): each dot in the plot is one SNP tested in, e.g., 2000 subjects
  • Select SNPs for replication
  • Combine GWAS: meta-analysis of all data

  • Effects per SNP are usually small
  • We are looking at common variants
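The core test behind each dot in a case/control GWAS can be sketched as an allelic 2×2 chi-square test on allele counts. Tools such as PLINK offer this and more refined tests, such as regression with covariates. The function below is a hypothetical, simplified stand-in, not PLINK's implementation, and the allele counts are made up. The genome-wide threshold of 5 × 10⁻⁸ corresponds to a Bonferroni correction for roughly one million independent common variants.

```python
import math

# Bonferroni: 0.05 / ~1 million independent common variants = 5e-8
GENOME_WIDE_ALPHA = 0.05 / 1_000_000


def allelic_test(case_b: int, case_a: int, ctrl_b: int, ctrl_a: int):
    """Allelic 2x2 chi-square test (1 df) for one SNP, cases vs. controls.

    Arguments are allele counts (each subject contributes two alleles).
    Returns (chi2, p_value, odds_ratio for allele B)."""
    a, b, c, d = case_b, case_a, ctrl_b, ctrl_a
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2.0))   # upper tail of chi-square with 1 df
    odds_ratio = (a * d) / (b * c)
    return chi2, p, odds_ratio


# Hypothetical SNP in 1000 cases and 1000 controls (2000 alleles per group)
chi2, p, odds_ratio = allelic_test(1200, 800, 1000, 1000)
print(f"chi2 = {chi2:.1f}, p = {p:.2g}, OR = {odds_ratio:.2f}, "
      f"-log10(p) = {-math.log10(p):.1f}, genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

The -log10(p) value is what a Manhattan plot shows on its y-axis for each SNP.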
SLIDE 36

NHGRI GWA Catalog (www.genome.gov/GWAStudies, www.ebi.ac.uk/fgpt/gwas/): published genome-wide associations through 12/2012; published GWA at p ≤ 5 × 10⁻⁸ for 17 trait categories

With GWAS for common variants we have:

*Genotyped only 1 million (~0.03%) of the nucleotides in the human genome
*By imputation to reference data: ~50-500 million nucleotides
*Selected for “universal/cosmopolitan” variants
*Explained, up till now, 2-70% of genetic variance per disease
*But this will grow due to larger sample sizes…
*Have not analysed all phenotypes/diseases…

As of 11/19/13, the catalog includes 1751 publications and 11,912 SNPs.

SLIDE 37

As of 12 May 2018:

  • 3,379 publications
  • 61,620 unique SNP-trait associations (www.ebi.ac.uk/GWAS)

GWAS: drinking from the firehose

SLIDE 38

A “Dubai” plot: GWAS of human iris colour (n = 5,974)

[Figure: P-value (-log10) by chromosome/position, with a single extreme peak at the HERC2/OCA2 gene, 12 kb on Chr. 15q11, P < 1 × 10⁻²⁰⁶]

Rotterdam Study: Kayser et al., Am J Hum Genet, 2008

SLIDE 39

A “Holland” plot: GWAS for BMD in the Rotterdam Study (N = 5,000)

[Figure: Manhattan plot for LUMBAR SPINE BMD, genome-wide significance threshold 5 × 10⁻⁸]

  • Rotterdam Study
  • ERF Study
  • Twins UK
  • deCODE Genetics
  • Framingham Study

Rivadeneira et al., Nat Genet., 2009

SLIDE 40

A real Manhattan plot: “height” in the GIANT consortium (genome-wide significance threshold 5 × 10⁻⁸)

Lango Allen, Estrada, Rivadeneira et al., Nature, 2010

  • 180,000 subjects
  • 180 loci identified
  • 10-15% variance explained
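Why do 180 loci explain only 10-15% of the variance in height? For one additive SNP, the variance explained is 2p(1 - p)β² divided by the trait variance. Here p is the effect-allele frequency and β is the per-allele effect. The sketch below uses a hypothetical function name and hypothetical values. It assumes Hardy-Weinberg proportions and ignores correlation (LD) between SNPs.

```python
def variance_explained(p: float, beta: float, trait_var: float = 1.0) -> float:
    """Fraction of trait variance explained by one additive SNP.

    p: effect-allele frequency; beta: effect per allele copy, in trait units."""
    return 2.0 * p * (1.0 - p) * beta ** 2 / trait_var


# A common allele (30%) shifting height by 0.05 SD per copy (hypothetical values)
print(f"{variance_explained(0.3, 0.05):.3%}")
```

Even a clearly genome-wide significant common variant of this size accounts for only about 0.1% of the variance. Many loci are therefore needed to explain a meaningful share.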
SLIDE 41

Human Genomic Life Course Epidemiology: Bone as an Example...

[Figure: bone mineral density from birth to death (age 25, 50, 75, 100 yr) in men and women: bone growth, peak BMD, bone loss, and osteoporosis (low BMD, fractures). Studies along the life course: GenR (offspring, with maternal and paternal genotypes), EPOS, CALEUR, AGGO, the Rotterdam Study and the B-Proof intervention. Also shown: environmental factors, bone phenotypes, DNA/RNA collections for OMICS studies, and “chronological” vs. “biological” ageing.]

SLIDE 42

“ERGO: Erasmus Rotterdam Gezondheid Ouderen” (Erasmus Rotterdam Health of the Elderly): “The Rotterdam Study”

  • OPTIMAL EPIDEMIOLOGICAL DESIGN: a single-centre, longitudinal, population-based cohort study of normal elderly Dutch, started in 1990, with 25 years of follow-up
  • LARGE: total = 15,000 men and women aged ≥ 45 yrs
  • VERY DEEP PHENOTYPING: 5 follow-up measurements with ~1,500 per subject each time: height, BMI, brain MRI, DXA, cholesterol level, blood pressure, glucose, etc.
  • ETHNICALLY HOMOGENEOUS: 99% White Caucasian
  • EXTENSIVE GENOMICS DATA AVAILABLE: GWAS, RNA expression (array + NGS), DNA methylation (450K), Whole Exome Sequencing, 16S microbiome, telomeres, mitochondrial DNA

SLIDE 43

The Rotterdam Study comprises three cohorts (RS-I,-II,-III), examined every three to four years

[Figure: omics data available per cohort: GWAS, Exome, Methylation; GWAS, Methylation; GWAS, RNA, Methylation]

SLIDE 44

Microbiome as a new source of biological variation:

10x more microbial cells than human cells in and on our body:

  • Archaea
  • Bacteria
  • Protozoa
  • Viruses
  • Fungi (molds, yeasts)

SLIDE 45
Microbiota profiling:

  • Microscope
  • culture-based techniques
  • 16S rRNA amplicon
  • DGGE
  • arrays (HITChip)
  • sequencing
  • shotgun sequencing (metagenomics)
  • Affymetrix microbiome array

This paper has been cited 7,135 times

SLIDE 46

A new source of physiological variability: the human “microbiome”

  • Bacteria, yeast, virus, unicellular
  • Everywhere in and on our body

(stool, nose, skin, ear, eye, urine, mouth, aerosols,..)

  • Easy to type by 16S NGS (€20)
  • changes with age, diet, etc.

Radjabzadeh, Kraaij et al. (ms in preparation)

STOOL (not GUT!) MICROBIOME: 362 (out of 900) OTUs in 800 (out of 1,700) faecal samples from the Rotterdam Study, based on 16S

SLIDE 47

Differential abundance of bacterial genera across RS and GenR

(FDR < 0.05; the x-axis represents the coefficient (transformed beta); the minus sign is used only for plotting.) Radjabzadeh, Kraaij et al. (ms in preparation)

Rotterdam Study Generation R
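"FDR < 0.05" means that at most about 5% of the genera called differentially abundant are expected to be false positives. The standard way to control this is the Benjamini-Hochberg procedure, sketched below. That the cited analysis used exactly this procedure is an assumption, and the p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """BH-adjusted p-values (q-values), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                         # largest p-value down to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min                        # keeps q-values monotone
    return adjusted


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]  # hypothetical genera
q = benjamini_hochberg(pvals)
print([i for i, qv in enumerate(q) if qv < 0.05])  # [0, 1]
```

Three raw p-values just below 0.05 (0.039, 0.041, 0.042) do not survive the correction. This is the kind of filtering that an FDR threshold applies across hundreds of tested OTUs.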

SLIDE 48

MANY OMICS IN BIOBANKED TISSUES, BUT MOSTLY BLOOD…

SLIDE 49

“Translational Genomics”

Progress in translating DNA research into the hospital… We are here…?