
Next Generation Sequencing: The basics (Wilfred van IJcken, Erasmus)

Next Generation Sequencing: The basics. Wilfred van IJcken, Erasmus MC Center for Biomics. Biomedical Research Techniques (XVIth ed.), Nov 6.


  1. Next Generation Sequencing: The basics. Wilfred van IJcken, Erasmus MC Center for Biomics. Biomedical Research Techniques (XVIth ed.), Nov 6

  2. Learning objectives
     Next generation sequencing (NGS): The basics
     - Background
     - Illumina sequencing technology
     - Terminology
     Next presentation
     - Research applications
     - Diagnostic applications
     - Future directions

  3. What is next generation sequencing?
     - Sequencing technology developed after Sanger
     - Millions of reads in parallel (MPS = massively parallel sequencing)
     - Shorter (<400 bp) sequencing reads
     - Enables analysis of complex mixtures of DNA or RNA
     - Enables a genome-wide approach
     - Different vendors with different approaches

  4. NGS systems on the market
     - Categories: desktop, high throughput, special
     - Different characteristics: sequencing technology, read length, speed, output, applications, run cost

  5. Illumina systems (figure): MiniSeq, MiSeq, NextSeq 500, HiSeq 2500, HiSeq 4000, NovaSeq 6000, HiSeq X Ten. Figure labels: data amount (8 Gb to 6 Tb per run), run costs, purchase cost.

  6. NGS flow: Intake, Isolate, Library, Sequence, Report
     Figure keywords: ID, sex, disease, phenotype; blood, plasma, saliva, FFPE, cells; DNA or RNA; yield, quality, amount; select region of interest, enzymes, PCR, capture; chemistry, signal detection; variation; match phenotype?

  7. DNA library prep

  8. Sequencing by Synthesis: cluster generation (figure labels: flowcell, lane)

  9. Bridge amplification

  10. Sequencing incorporated

  11. Sequencing and basecalling (Read 1): image acquisition in each cycle (1 2 3 4 5 6 7 8 9) for the bases A, G, T, C; base calling gives C A A G T A A C …

  12. Single-end, paired-end, index read
      - Single read = sequence from one side of the fragment
      - Paired end = sequence from both sides of the fragment
      - Index read (example in figure: GATCG)
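The difference between a single read and a paired-end read can be sketched in a few lines of Python. This is a simplified illustration, not an instrument model: the fragment sequence and the function names are hypothetical, and it assumes the second read is reported as the reverse complement of the far end of the fragment, as is usual for paired-end data.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_reads(fragment: str, read_length: int, paired: bool = False):
    """Return (read1,) for a single read, or (read1, read2) for paired end."""
    read1 = fragment[:read_length]  # sequence from one side of the fragment
    if not paired:
        return (read1,)
    # Read 2 starts at the other side and runs inward on the opposite strand.
    read2 = reverse_complement(fragment)[:read_length]
    return (read1, read2)


fragment = "GATCGTTACCGGATTACA"  # hypothetical library fragment
single = simulate_reads(fragment, 5)
pair = simulate_reads(fragment, 5, paired=True)
print(single)
print(pair)
```

With a read length shorter than half the fragment, the two reads of a pair cover opposite ends and leave the middle of the fragment unsequenced.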

  13. Indexing enables sample multiplexing
      - Index = different nucleic acid code per sample
        - introduced during sample prep
        - read during index read
      - Enables multiple samples in one flowcell lane
      - Example indexes for Patients 1 to 4 (figure): GATCG, CGTGA, ATCGG, TCTCT
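How an index separates samples that share a lane can be shown with a minimal demultiplexing sketch. The pairing of index to patient below assumes the indexes appear on the slide in patient order; the insert sequences, the `demultiplex` helper and the "Undetermined" bin name are illustrative assumptions, and only exact index matches are accepted here.

```python
from collections import defaultdict

# Index-to-sample sheet. The pairing is an assumption based on slide order.
SAMPLE_SHEET = {
    "GATCG": "Patient 1",
    "CGTGA": "Patient 2",
    "ATCGG": "Patient 3",
    "TCTCT": "Patient 4",
}


def demultiplex(reads, sample_sheet):
    """Group (index_read, insert_read) pairs by sample using exact index matches."""
    bins = defaultdict(list)
    for index_seq, insert_seq in reads:
        sample = sample_sheet.get(index_seq, "Undetermined")
        bins[sample].append(insert_seq)
    return dict(bins)


# Hypothetical reads from one flowcell lane: (index read, sequence read).
lane_reads = [
    ("GATCG", "ACGTTGCA"),
    ("TCTCT", "TTGACCGA"),
    ("GATCG", "GGCATTAC"),
    ("AAAAA", "CCCCGGGG"),  # index not in the sample sheet
]

by_sample = demultiplex(lane_reads, SAMPLE_SHEET)
for sample, inserts in sorted(by_sample.items()):
    print(sample, inserts)
```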

  14. Sequence Index 1

  15. Sequence Index 2

  16. Sequence Read 2: image acquisition in each cycle (1 2 3 4 5 6 7 8 9); base calling gives C A A G T A A C …

  17. Summary sequencing technology: Read 1, Index 1, Index 2, Read 2

  18. Simplified RNA sample preparation (figure labels: RNA, reverse transcriptase, DNA, Adaptor 1, Adaptor 2)

  19. Output file from basecalling
      - Many file types: qseq, fastq, etc.
      - Each system has its own format.
      - Large file sizes: >400 million reads per lane
      - Fields labeled in the example record: Instrument, Run ID, Lane, Tile, X-coord, Y-coord, Index #, Read #, Sequence (C A A G T A A C …), Q-score (ASCII character), PF (0,1)
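To make the "ASCII character Q-score" field concrete, here is a minimal FASTQ reader that converts each quality character into a numeric Q-score. It is a sketch under two assumptions: records are exactly four lines each, and qualities use the Phred+33 offset (older Illumina pipelines used an offset of 64). The record content and the `parse_fastq` name are made up for illustration.

```python
import io

PHRED_OFFSET = 33  # assumption: Phred+33 encoding


def parse_fastq(handle):
    """Yield (read_id, sequence, q_scores) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        q_scores = [ord(char) - PHRED_OFFSET for char in quality]
        yield header[1:], sequence, q_scores  # drop the leading '@'


# Hypothetical one-record file held in memory.
example = io.StringIO("@read1\nCAAGTAAC\n+\nIIIIIII5\n")
records = list(parse_fastq(example))
for read_id, sequence, q_scores in records:
    print(read_id, sequence, q_scores)
```

Streaming one record at a time matters at the file sizes mentioned above, since hundreds of millions of reads per lane rarely fit comfortably in memory.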

  20. Data analysis not trivial due to data volumes and complexity
      Columns: data, total volume, final volume, comment
      HiSeq 2000, 200G run:
      - Image data: 32 TB, 0
      - Intensity data: 2 TB, 0 (optionally transferred)
      - Base call / quality score data: 0.25 TB, 0.25 TB (1 byte/base (raw), assuming qseq generation offline)
      - Alignment output: 6 TB (3 TB), 1.2 TB (remove intermediate files)
      GA IIx, 50G run:
      - Image data: 6.9 TB, 0 (optionally transferred)
      - Intensity data: 0.93 TB, 0.93 TB
      - Base call / quality score data: 0.17 TB, 0.17 TB
      - Alignment output: 1.2 TB, 1.2 TB
      Example yield: 150 M reads x 8 lanes x 100 bp x 2 (paired end) = 240 Gbp
      Storage and compute needed (core facilities)
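The yield calculation on this slide is easy to reproduce. The sketch below uses the slide's own numbers; the variable names are illustrative, and the storage estimate assumes 1 byte per base and decimal terabytes (1 TB = 10^12 bytes).

```python
# Numbers taken from the slide.
reads_per_lane = 150_000_000
lanes = 8
read_length_bp = 100
reads_per_fragment = 2  # paired end

total_bases = reads_per_lane * lanes * read_length_bp * reads_per_fragment
total_gbp = total_bases / 1e9

# Assumption: raw base calls stored at 1 byte per base, decimal terabytes.
bytes_per_base = 1
raw_terabytes = total_bases * bytes_per_base / 1e12

print(f"Yield: {total_gbp:.0f} Gbp")
print(f"Raw base calls: {raw_terabytes:.2f} TB")
```

The result of about 0.24 TB is close to the 0.25 TB listed for base call and quality score data of the HiSeq 2000 run.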

  21. Terminology
      - Next generation sequencing, AKA:
        - Deep sequencing
        - MPS = massively parallel sequencing
      - Cluster
      - # of sequencing cycles (1 2 3 4 5 6 7 8 9 …) = read length
      - Read: T G C T A C G A T …

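  22. Alignment, Mapping
      Reference sequence (the two highlighted positions are set apart by spaces):
        AAAACGCGCTTAGCCTTT T TTCGACTGTCGAGTGGA A CGCCGCTAGCTAGGCGC
      Consensus sequence:
        AAAACGCGCTTAGCCTTT T TTCGACTGTCGAGTGGA T CGCCGCTAGCTAGGCGC
      Aligned reads:
                  TAGCCTTT T TTCGACTGTCGAGTGGA T CGCCG
                   AGCCTTT T TTCGACTGTCGAGTGGA T CGCCGC
                    GCCTTT G TTCGACTGTCGAGTGGA T CGCCGCT
                     CCTTT G TTCGACTGTCGAGTGGA T CGCCGCTA
      Heterozygous SNP: first highlighted position, where the reads show T or G.
      Mismatch: second highlighted position, where the reads show T and the reference has A.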
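The alignment on this slide can be turned into a small pileup to show how the two highlighted positions stand out. This is a deliberately naive sketch: the 0-based start offsets were inferred by matching each read against the reference, the function names are hypothetical, and any column with more than one observed base is labeled heterozygous without the quality or statistical checks a real variant caller would apply.

```python
from collections import Counter

# Sequences from the slide; adjacent literals mirror the slide's spacing.
REFERENCE = "AAAACGCGCTTAGCCTTT" "T" "TTCGACTGTCGAGTGGA" "A" "CGCCGCTAGCTAGGCGC"

# (0-based start on the reference, read). Start offsets are inferred.
ALIGNED_READS = [
    (10, "TAGCCTTT" "T" "TTCGACTGTCGAGTGGATCGCCG"),
    (11, "AGCCTTT" "T" "TTCGACTGTCGAGTGGATCGCCGC"),
    (12, "GCCTTT" "G" "TTCGACTGTCGAGTGGATCGCCGCT"),
    (13, "CCTTT" "G" "TTCGACTGTCGAGTGGATCGCCGCTA"),
]


def pileup(reference, aligned_reads):
    """Count the bases observed at every reference position."""
    columns = [Counter() for _ in reference]
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            columns[start + offset][base] += 1
    return columns


def describe_sites(reference, columns):
    """Return positions where the reads disagree with each other or the reference."""
    sites = {}
    for pos, (ref_base, counts) in enumerate(zip(reference, columns)):
        if not counts:
            continue  # position not covered by any read
        if len(counts) > 1:
            sites[pos] = ("heterozygous", dict(counts))
        elif ref_base not in counts:
            sites[pos] = ("mismatch", dict(counts))
    return sites


sites = describe_sites(REFERENCE, pileup(REFERENCE, ALIGNED_READS))
for pos, (label, counts) in sorted(sites.items()):
    print(pos, REFERENCE[pos], label, counts)
```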

  23. Read depth, AKA depth of coverage (depth labels in the figure: 1, 5, 7)
        AAAACGCGCTTAGCCTTT T TTCGACTGTCGAGTGGA T CGCCGCTAGCTAGGCGC
                  TAGCCTTT T TTCGACTGTCGAGTGGA T CGCCG
                   AGCCTTT T TTCGACTGTCGAGTGGA T CGCCGC
                    GCCTTT G TTCGACTGTCGAGTGGA T CGCCGCT
                     CCTTT G TTCGACTGTCGAGTGGA T CGCCGCTA
                                GACTGTCGAGTGGA T CGCCGCTAGCTAGG
                                  CTGTCGAGTGGA T CGCCGCTAGCTAGG
      - Average read depth can differ a lot from the read depth at a given position!
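The closing remark about average versus per-position depth can be checked directly on the reads from this slide. In the sketch below the start offsets are inferred by matching each read to the 54-base sequence shown above, and the function name is illustrative.

```python
REFERENCE_LENGTH = 54  # length of the sequence shown on the slide

# (0-based start, read). Start offsets are inferred, not given on the slide.
ALIGNED_READS = [
    (10, "TAGCCTTT" "T" "TTCGACTGTCGAGTGGATCGCCG"),
    (11, "AGCCTTT" "T" "TTCGACTGTCGAGTGGATCGCCGC"),
    (12, "GCCTTT" "G" "TTCGACTGTCGAGTGGATCGCCGCT"),
    (13, "CCTTT" "G" "TTCGACTGTCGAGTGGATCGCCGCTA"),
    (22, "GACTGTCGAGTGGATCGCCGCTAGCTAGG"),
    (24, "CTGTCGAGTGGATCGCCGCTAGCTAGG"),
]


def depth_per_position(reference_length, aligned_reads):
    """Count how many reads cover each reference position."""
    depth = [0] * reference_length
    for start, read in aligned_reads:
        for pos in range(start, start + len(read)):
            depth[pos] += 1
    return depth


depth = depth_per_position(REFERENCE_LENGTH, ALIGNED_READS)
average_depth = sum(depth) / len(depth)

print("depth per position:", depth)
print(f"average depth: {average_depth:.2f}")
print("minimum:", min(depth), "maximum:", max(depth))
```

Some positions are not covered at all while others are covered by every read, so a single average says little about whether a particular base is well supported.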

  24. Accuracy, error rate, quality score
      - Single base error rate = total number of mismatched bases found in mapped sequence reads from a sequencing run, divided by the mappable yield.
      - Quality scores (Q scores / Phred scores)
        - derived from an examination of the intensity peaks around each base
        - range from 0 to 41; higher corresponds to higher quality
        - $Q = -10 \log_{10} p$, where $p$ is the basecall error probability
      Quality score | Probability of incorrect base call | Base call accuracy
      10 (Q10) | 1 in 10 | 90%
      20 (Q20) | 1 in 100 | 99%
      30 (Q30) | 1 in 1000 | 99.9%
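The relation between Q-score and error probability, and the definition of the single base error rate, can be written out as a short worked example. The function names are illustrative; the formulas follow the definitions on this slide.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Invert Q = -10 * log10(p) to get the basecall error probability p."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Q = -10 * log10(p), where p is the basecall error probability."""
    return -10 * math.log10(p)


def single_base_error_rate(mismatched_bases: int, mappable_bases: int) -> float:
    """Mismatched bases in mapped reads divided by the mappable yield."""
    return mismatched_bases / mappable_bases


for q in (10, 20, 30):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:g}, accuracy {100 * (1 - p):.1f}%")
```

Because the scale is logarithmic, every 10 points of Q-score corresponds to a tenfold lower error probability.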

  25. Traditional vs NextGen Sequencing
      - Sanger sequencing: 1 sequence read per basepair
      - NGS: multiple sequence reads per basepair

  26. Erasmus Center for Biomics Genomics core facility at ErasmusMC www.biomics.nl w.vanijcken@erasmusmc.nl LNA
