Principles and Applications of Modern DNA Sequencing - PowerPoint PPT Presentation



SLIDE 1

Principles and Applications of Modern DNA Sequencing

EEEB GU4055

Session 11: Genome Assembly

SLIDE 2

Today's topics

  • 1. de Bruijn Graphs and Euler.
  • 2. Kmers.
  • 3. Challenges in Genome Assembly.
  • 4. Empirical Example.

SLIDE 3

Kmers and de Bruijn graphs

Reads start and end at different positions, covering all or nearly all of the genome. Decomposing reads into smaller kmers makes it more likely that we have uniformly sized bits covering the entire genome. This is useful for building a graph.
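
A minimal sketch of the decomposition step, assuming reads are plain Python strings. The function name `get_kmers` and the example read are illustrative assumptions, not taken from the course notebook.

```python
def get_kmers(read, k):
    """Return every substring of length k in read, in left-to-right order."""
    # A read of length n contains n - k + 1 kmers (none if k > n).
    return [read[i:i + k] for i in range(len(read) - k + 1)]


read = "ATGGCGTA"  # hypothetical 8 bp read
print(get_kmers(read, 5))
```

Each kmer overlaps the next by k-1 bases, which is the property the graph-building step relies on.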

SLIDE 4

Kmers and de Bruijn graphs

Shortest possible superstring that contains all substrings of length k.

SLIDE 5

Kmers and de Bruijn graphs

A Hamiltonian graph requires comparing/aligning kmers, which is hard when the number and size of kmers are large. de Bruijn graphs join identical matching (k-1)mers, such that kmers form the edges of the graph -- a much simpler computation.
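
A rough sketch of the de Bruijn construction, assuming error-free kmers from a single sequence. Nodes are (k-1)mers and each kmer contributes one edge from its prefix to its suffix; the sequence is then recovered by walking an Eulerian path (Hierholzer's algorithm, chosen here for illustration). All function names and the toy sequence are assumptions; real assemblers also handle sequencing errors, reverse complements, and repeats.

```python
from collections import defaultdict


def get_kmers(seq, k):
    """Return all kmers of seq in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_de_bruijn(kmers):
    """Map each (k-1)mer prefix to the list of (k-1)mer suffixes it points to."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])  # one edge per kmer
    return graph


def eulerian_path(graph):
    """Return the nodes of a path that uses every edge exactly once."""
    edges = {node: list(targets) for node, targets in graph.items()}
    balance = defaultdict(int)  # out-degree minus in-degree
    for node, targets in edges.items():
        balance[node] += len(targets)
        for target in targets:
            balance[target] -= 1
    # Start at the node with one extra outgoing edge, if there is one.
    start = next((n for n, b in balance.items() if b == 1), next(iter(edges)))
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if edges.get(node):
            stack.append(edges[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(seq_kmers):
    """Spell out the sequence along the Eulerian path."""
    path = eulerian_path(build_de_bruijn(seq_kmers))
    return path[0] + "".join(node[-1] for node in path[1:])


print(assemble(get_kmers("ATGGCGT", 3)))
```

No kmer is ever aligned against another here: edges come from exact dictionary lookups on (k-1)mers, which is why this is cheaper than the all-against-all comparison of the Hamiltonian approach.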

SLIDE 6

[3] Action: Write a function to get all 5 mers from the

When poll is active, respond at PollEv.com/dereneaton004

SLIDE 7

[6,7,8] Use functions to accomplish the designated tasks... Compare your functions and results with at least two of your

SLIDE 8

Genome Assembly

SLIDE 9

De novo Genome Assembly

De novo genome assembly is computationally demanding. It requires reads that cover the full genome many times (e.g., 50X). The end goal is to assemble scaffolds that match to chromosomes -- the real *bits* of the genome.
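
To make the coverage requirement concrete, here is a back-of-the-envelope sketch using the usual definition, coverage = (number of reads x read length) / genome size. The 500 Mb genome and 150 bp read length are hypothetical values chosen for illustration, and `reads_needed` is an assumed helper name.

```python
def reads_needed(genome_size, coverage, read_length):
    """Smallest number of reads whose total bases reach the target coverage."""
    total_bases = genome_size * coverage
    # Integer ceiling division: round up to a whole read.
    return -(-total_bases // read_length)


genome_size = 500_000_000  # hypothetical 500 Mb genome
read_length = 150          # hypothetical short-read length in bp
print(reads_needed(genome_size, 50, read_length))  # reads for 50X coverage
```

This is an average only; actual coverage varies along the genome, so some regions will sit well below the nominal depth.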

SLIDE 10

Combining short and long-read technologies

Short read assemblies are highly fragmented. Long read technologies are highly error prone. Combining the two technologies -- while obtaining high-coverage of both -- is currently the gold standard.

SLIDE 11

Caveats: Long reads require high-molecular-weight (HMW) DNA, sometimes a lot.

Specialized DNA extraction kits and protocols are used to isolate long (unbroken) DNA fragments. More expensive and time-consuming, but worth it.

SLIDE 12

Eucalyptus: (500Mb size, 170X ONT; 200X Illumina)

SLIDE 13

Scaffolding: Hi-C Proximity Ligation

Chromosome conformation capture (3C) describes the organization and structure of the genome within a cell. Better than microscopy, it can tell us how close together (potentially interacting) some regions of the genome are (such as promoters and enhancers). Hi-C: A high-throughput version of 3C, based on a library preparation that builds chimeric reads, followed by short-read sequencing of paired-end reads. Creates a contact map of interactions correlated to spatial distance.
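
A toy sketch of how a contact map can be tallied, assuming each Hi-C read pair has already been mapped to two coordinates on a single sequence. The function name `contact_map`, the bin size, and the example pairs are all made-up illustrations; real pipelines work across many contigs and normalize the counts.

```python
def contact_map(pairs, genome_length, bin_size):
    """Count read pairs linking each pair of fixed-size genomic bins."""
    n_bins = -(-genome_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the map symmetric
    return matrix


# Hypothetical mapped positions (0-based) of the two ends of each read pair.
pairs = [(5, 15), (12, 95), (18, 22), (90, 99)]
for row in contact_map(pairs, genome_length=100, bin_size=10):
    print(row)
```

High counts between two bins suggest the regions sit close together in the nucleus, which is the signal scaffolders use to order and orient contigs.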

SLIDE 14

Scaffolding: Hi-C Proximity Ligation

Restriction digestion; streptavidin bead extraction; paired-seq.

SLIDE 15

Scaffolding: Amaranthus Hi-C Assembly
