Genomics Sequencing tech Sequencing tech: next generation What do - - PowerPoint PPT Presentation



SLIDE 1

Genomics

SLIDE 2

Sequencing tech

SLIDE 3

Sequencing tech: next generation

SLIDE 4
SLIDE 5

What do we get from sequencing?

SLIDE 6

How to analyze these reads?

SLIDE 7

Mutation identification: Mapping

(Figure labels: Cancer, Heart Disease, Brain Disease)

SLIDE 8

Genome projects: Assembly

SLIDE 9

Use sequencing for other types of data

X-seq technology

SLIDE 10

RNA-seq

SLIDE 11

Assembly

SLIDE 12

Assembly

Computational Challenge: assemble individual short fragments (reads) into a single genomic sequence (“superstring”)

SLIDE 13

Shortest common superstring

Problem: Given a set of strings, find a shortest string that contains all of them.
Input: strings s1, s2, …, sn
Output: a string s that contains all of s1, s2, …, sn as substrings, such that the length of s is minimized.
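A brute-force sketch of the problem above, assuming a small input. Once strings contained in other strings are removed, some ordering of the remaining strings, merged with maximal overlaps between neighbours, gives a shortest superstring, so trying all n! orderings is exact but only feasible for a handful of strings. The names `overlap` and `shortest_common_superstring` are our own, not from the slides.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def shortest_common_superstring(strings: list[str]) -> str:
    """Exact SCS by trying every ordering (exponential; small inputs only)."""
    uniq = set(strings)
    # A string contained in another string never changes the answer, so drop it.
    items = sorted(s for s in uniq if not any(s != t and s in t for t in uniq))
    if not items:
        return ""
    best = None
    for order in permutations(items):
        sup = order[0]
        for prev, nxt in zip(order, order[1:]):
            # Append only the part of nxt not already covered by the overlap.
            sup += nxt[overlap(prev, nxt):]
        if best is None or len(sup) < len(best):
            best = sup
    return best


print(shortest_common_superstring(["ACG", "CGT", "GTA"]))  # ACGTA
```

Because the number of orderings grows as n!, this only serves to make the problem statement concrete; it is not a practical assembler.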

SLIDE 14

Shortest common superstring

SLIDE 15

Any ideas?

SLIDE 16

Directed Graph

SLIDE 17

Overlap Graph

SLIDE 18

Example

SLIDE 19

Shortest common superstring problem is hard

SLIDE 20

Shortest common superstring problem is hard

SLIDE 21

Is there a better or more feasible way?

SLIDE 22

Matching a superstring to a set of short reads

Assume we have a set S of reads, each of length k (k-mers). Goal: find a string that can be split exactly into the set S, i.e., whose length-k substrings are exactly the reads in S.
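One way to test the goal, reading "split into S" as the overlapping length-k windows of the string (our interpretation): compare the string's k-mer composition with S as multisets. The helper names `composition` and `splits_into` and the example reads are hypothetical.

```python
from collections import Counter


def composition(text: str, k: int) -> Counter:
    """Multiset of all length-k substrings of text (sliding window, step 1)."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def splits_into(text: str, reads: list[str]) -> bool:
    """True if the k-mers of text are exactly the reads (all reads have length k)."""
    k = len(reads[0])
    return composition(text, k) == Counter(reads)


reads = ["ATG", "TGG", "GGC", "GCG", "CGT", "GTG", "TGC", "GCA"]
print(splits_into("ATGGCGTGCA", reads))  # True
print(splits_into("ATGGCA", reads))      # False
```

Using a `Counter` rather than a `set` keeps repeated k-mers, which matter for repetitive genomes.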

SLIDE 23

Overlap graph approach


SLIDE 24

Overlap graph approach is hard

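A sketch of the overlap-graph approach for k-mers, under our own assumptions: each read is a node, an edge joins u to v when the last k-1 letters of u equal the first k-1 letters of v, and a string is recovered from a path that visits every node exactly once (a Hamiltonian path). The plain backtracking search below illustrates why the approach is hard: in the worst case it explores exponentially many partial paths. Names and example reads are hypothetical.

```python
from typing import Optional


def overlap_graph(kmers: list[str]) -> dict[int, list[int]]:
    """Edge i -> j when the last k-1 letters of kmers[i] equal the first k-1 of kmers[j]."""
    return {
        i: [j for j, b in enumerate(kmers) if i != j and a[1:] == b[:-1]]
        for i, a in enumerate(kmers)
    }


def hamiltonian_string(kmers: list[str]) -> Optional[str]:
    """Backtracking search for a path visiting every k-mer node exactly once.

    Worst-case running time grows exponentially with the number of k-mers.
    """
    graph = overlap_graph(kmers)
    n = len(kmers)

    def extend(path: list[int], used: set[int]) -> Optional[list[int]]:
        if len(path) == n:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in used:
                used.add(nxt)
                found = extend(path + [nxt], used)
                if found is not None:
                    return found
                used.remove(nxt)  # backtrack
        return None

    for start in range(n):
        path = extend([start], {start})
        if path is not None:
            # Spell the string: first k-mer, then the last letter of each later one.
            return kmers[path[0]] + "".join(kmers[i][-1] for i in path[1:])
    return None


reads = ["ATG", "TGG", "GGC", "GCG", "CGT", "GTG", "TGC", "GCA"]
print(hamiltonian_string(reads))  # ATGGCGTGCA
```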

SLIDE 25

There is an alternative way

SLIDE 26

De Bruijn Graph

SLIDE 27

De Bruijn Graph

SLIDE 28

What is the goal now?

SLIDE 29

Overlap graph vs De Bruijn graph

(Figure: De Bruijn graph with nodes AT, GT, CG, CA, GC, TG, GG; the path visits every EDGE once)
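A minimal sketch of De Bruijn graph construction, assuming the usual convention: nodes are (k-1)-mers, and each k-mer contributes one edge from its prefix to its suffix. Repeated k-mers become parallel edges, which is why a multigraph is needed. The example 3-mers are our own choice, picked so that their 2-mers are exactly the seven node labels on the slide.

```python
from collections import defaultdict


def de_bruijn_graph(kmers: list[str]) -> dict[str, list[str]]:
    """Nodes are (k-1)-mers; each k-mer adds one edge prefix -> suffix.

    Repeated k-mers add parallel edges, so the result is a multigraph.
    """
    graph: dict[str, list[str]] = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


reads = ["ATG", "TGG", "GGC", "GCG", "CGT", "GTG", "TGC", "GCA"]
for node, targets in sorted(de_bruijn_graph(reads).items()):
    print(node, "->", ", ".join(targets))
```

Each read is now an edge rather than a node, so reconstructing the string means finding a path that uses every edge exactly once.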

SLIDE 30

MultiEdge

SLIDE 31

MultiGraph

SLIDE 32

Some definitions

SLIDE 33

Eulerian walk/path

zero or

SLIDE 34

Eulerian walk/path

SLIDE 35

Proof? Algorithm?

SLIDE 36
Assume all nodes are balanced (in-degree equals out-degree) and the graph is connected.

  • a. Start with an arbitrary vertex v and form an arbitrary cycle with unused edges until a dead end is reached. Since the graph is Eulerian, this dead end is necessarily the starting point, i.e., vertex v.

SLIDE 37
  • b. If the cycle from (a) is not an Eulerian cycle, it must contain a vertex w that has untraversed edges. Perform step (a) again, using w as the starting point. Once again, we will end up at the starting vertex w.

SLIDE 38
  • c. Combine the cycles from (a) and (b) into a single cycle and iterate step (b).
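Steps (a) to (c) are usually implemented as Hierholzer's algorithm. In the sketch below (our own code, assuming a directed multigraph given as an edge list that is balanced and connected), walking forward until stuck is step (a); popping finished vertices off the stack backs up to a vertex w that still has unused edges (step b), and the detour from w is spliced into the output in place (step c).

```python
from collections import defaultdict


def eulerian_cycle(edges: list[tuple[str, str]]) -> list[str]:
    """Hierholzer-style Eulerian cycle in a directed (multi)graph.

    Assumes every vertex is balanced and all edges lie in one connected component.
    Returns the vertex sequence, with the start vertex repeated at the end.
    """
    out_edges = defaultdict(list)
    balance = defaultdict(int)  # out-degree minus in-degree
    for u, v in edges:
        out_edges[u].append(v)
        balance[u] += 1
        balance[v] -= 1
    if any(balance.values()):
        raise ValueError("not every vertex is balanced")
    if not edges:
        return []
    stack = [edges[0][0]]  # step (a): start at an arbitrary vertex
    cycle = []
    while stack:
        v = stack[-1]
        if out_edges[v]:
            stack.append(out_edges[v].pop())  # follow an unused edge
        else:
            # Dead end: v is finished. Backing up reaches a vertex that may still
            # have unused edges (step b); its detour lands in place (step c).
            cycle.append(stack.pop())
    cycle.reverse()
    if len(cycle) != len(edges) + 1:
        raise ValueError("edges are not all in one connected component")
    return cycle


edges = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D"), ("D", "A")]
print(eulerian_cycle(edges))  # ['A', 'D', 'A', 'B', 'C', 'A']
```

Every edge is pushed and popped once, so the running time is linear in the number of edges, in contrast to the exponential search needed for Hamiltonian paths.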

SLIDE 39
Eulerian path

  • A vertex v is semibalanced if |in-degree(v) - out-degree(v)| = 1.

  • If a graph has an Eulerian path starting at s and ending at t, then all its vertices are balanced, with the possible exception of s and t (when s ≠ t, s has one extra outgoing edge and t one extra incoming edge).

  • Add an edge between the two semibalanced vertices, from t to s: now all vertices are balanced (assuming there was an Eulerian path to begin with). Find an Eulerian cycle, then remove the edge you added by cutting the cycle there. What remains, read from s to t, is the Eulerian path you wanted.
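The add-an-edge trick can be sketched on top of an Eulerian cycle routine, assuming the graph is a directed multigraph given as an edge list. `eulerian_cycle` and `eulerian_path` are hypothetical names, and the cycle routine (Hierholzer's algorithm) is included so the block runs on its own. Cutting the cycle at the added edge t -> s leaves a walk from s to t that uses every original edge once.

```python
from collections import defaultdict


def eulerian_cycle(edges: list[tuple[str, str]]) -> list[str]:
    """Hierholzer's algorithm; assumes a balanced, connected directed multigraph."""
    out_edges = defaultdict(list)
    for u, v in edges:
        out_edges[u].append(v)
    if not edges:
        return []
    stack, cycle = [edges[0][0]], []
    while stack:
        v = stack[-1]
        if out_edges[v]:
            stack.append(out_edges[v].pop())  # follow an unused edge
        else:
            cycle.append(stack.pop())  # dead end: vertex finished
    cycle.reverse()
    if len(cycle) != len(edges) + 1:
        raise ValueError("edges are not all in one connected component")
    return cycle


def eulerian_path(edges: list[tuple[str, str]]) -> list[str]:
    """Eulerian path: add t -> s, find an Eulerian cycle, then cut it at that edge."""
    balance = defaultdict(int)  # out-degree minus in-degree
    for u, v in edges:
        balance[u] += 1
        balance[v] -= 1
    if all(b == 0 for b in balance.values()):
        return eulerian_cycle(edges)  # already balanced: a cycle is also a path
    starts = [x for x, b in balance.items() if b == 1]  # s: out = in + 1
    ends = [x for x, b in balance.items() if b == -1]  # t: in = out + 1
    if len(starts) != 1 or len(ends) != 1 or any(abs(b) > 1 for b in balance.values()):
        raise ValueError("graph has no Eulerian path")
    s, t = starts[0], ends[0]
    cycle = eulerian_cycle(edges + [(t, s)])  # every vertex is now balanced
    steps = list(zip(cycle, cycle[1:]))
    i = steps.index((t, s))  # one copy of the added edge
    # Rotate so the walk starts right after t -> s; dropping that edge leaves s ... t.
    rotated = steps[i + 1:] + steps[:i]
    return [s] + [v for _, v in rotated]


# Usage on the De Bruijn graph of a small 3-mer set (example data chosen by us).
reads = ["ATG", "TGG", "GGC", "GCG", "CGT", "GTG", "TGC", "GCA"]
path = eulerian_path([(r[:-1], r[1:]) for r in reads])
print(path[0] + "".join(node[-1] for node in path[1:]))  # ATGCGTGGCA
```

The spelled string ATGCGTGGCA differs from ATGGCGTGCA, yet both have exactly this 3-mer composition: a De Bruijn graph can admit several Eulerian paths, and each one is a valid reconstruction from the reads alone.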