Sequence Alignment, Mark Voorhies, 5/20/2015

SLIDE 1

Sequence Alignment

Mark Voorhies 5/20/2015

Mark Voorhies Sequence Alignment

SLIDE 3

Exercise: Scoring an ungapped alignment

Given two sequences and a scoring matrix, find the offset that yields the best scoring ungapped alignment.

    def score(S, x, y):
        """Return alignment score for subsequences x and y
        for scoring matrix S (represented as a dict)"""
        assert len(x) == len(y)
        return sum(S[i][j] for (i, j) in zip(x, y))

    def subseqs(x, y, i):
        """Return subsequences of x and y for offset i."""
        if i > 0:
            y = y[i:]
        elif i < 0:
            x = x[-i:]
        # Trim both to the length of the overlap
        L = min(len(x), len(y))
        return x[:L], y[:L]

    def ungapped(S, x, y):
        """Return best offset, score, and alignment between
        sequences x and y for scoring matrix S
        (represented as a dict)."""
        best = None
        best_score = None
        for i in range(-len(x) + 1, len(y)):
            (sx, sy) = subseqs(x, y, i)
            s = score(S, sx, sy)
            # Explicit None check: Python 3 cannot compare int to None
            if best_score is None or s > best_score:
                best_score = s
                best = i
        return best, best_score, subseqs(x, y, best)
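To try the offset search on real input, a scoring matrix in the nested-dict form is needed. The sketch below is one way to build one and run the search. The helper names `make_matrix` and `best_ungapped` and the +1/-1 match/mismatch values are assumptions for illustration, not part of the slides; the offset convention follows the exercise code (positive offsets trim the start of y, negative offsets trim the start of x).

```python
def make_matrix(alphabet="ACGT", match=1, mismatch=-1):
    """Build a nested-dict scoring matrix: S[a][b] is the score for a vs b.
    The +1/-1 values are illustrative assumptions."""
    return {a: {b: (match if a == b else mismatch) for b in alphabet}
            for a in alphabet}


def best_ungapped(S, x, y):
    """Try every offset and return (offset, score, (sub_x, sub_y))."""
    best = None
    for i in range(-len(x) + 1, len(y)):
        # Positive offsets trim y, negative offsets trim x
        sx, sy = (x, y[i:]) if i > 0 else (x[-i:], y)
        overlap = min(len(sx), len(sy))
        sx, sy = sx[:overlap], sy[:overlap]
        s = sum(S[a][b] for a, b in zip(sx, sy))
        # Strict ">" keeps the first offset that reaches the best score
        if best is None or s > best[1]:
            best = (i, s, (sx, sy))
    return best


S = make_matrix()
print(best_ungapped(S, "GATTACA", "TTAC"))
```

With these inputs the best overlap skips the first two bases of `GATTACA`, so the reported offset is negative.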

SLIDE 5

Dotplots

1. Unbiased view of all ungapped alignments of two sequences
2. Noise can be filtered by applying a smoothing window to the diagonals.
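A dotplot and its diagonal smoothing can be sketched in a few lines. This is a minimal illustration, assuming an exact-match dot (1 where the two letters are identical) and a simple count threshold inside the window; the function names, the window size of 3, and the threshold of 2 are hypothetical choices, not values from the slides.

```python
def dotplot(x, y):
    """Return a len(x) by len(y) grid with 1 where x[i] == y[j], else 0."""
    return [[1 if a == b else 0 for b in y] for a in x]


def smooth_diagonals(dots, window=3, threshold=2):
    """Mark cell (i, j) when at least `threshold` of the `window` cells
    centered on it along its diagonal are matches."""
    n = len(dots)
    m = len(dots[0]) if dots else 0
    half = window // 2
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # Count matches along the diagonal, skipping cells off the grid
            hits = sum(dots[i + k][j + k]
                       for k in range(-half, half + 1)
                       if 0 <= i + k < n and 0 <= j + k < m)
            out[i][j] = 1 if hits >= threshold else 0
    return out


for row in smooth_diagonals(dotplot("GATTACA", "GATTTACA")):
    print("".join("#" if v else "." for v in row))
```

Each diagonal of the grid corresponds to one offset of the ungapped alignment exercise, so runs that survive the filter are candidate ungapped alignments. Isolated single-letter matches are dropped because their window contains only one hit.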
SLIDE 6

Exercise: Scoring a gapped alignment

1. Given two equal length gapped sequences (where "-" represents a gap) and a scoring matrix, calculate an alignment score with a -1 penalty for each base aligned to a gap.
2. Write a new scoring function with separate penalties for opening a zero length gap (e.g., G = -11) and extending an open gap by one base (e.g., E = -1).

$$S_{\mathrm{gapped}}(x, y) = S(x, y) + \sum_{i}^{\mathrm{gaps}} \left(G + E \cdot \mathrm{len}(i)\right)$$
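One possible shape for both parts of the exercise is sketched below. It assumes a nested-dict scoring matrix with +1/-1 match/mismatch values (an assumption for illustration), that no column pairs a gap with a gap, and that a gap in one sequence directly followed by a gap in the other counts as two separate gaps. The function names are hypothetical.

```python
def score_gapped(S, x, y, gap=-1):
    """Part 1: charge `gap` for every base aligned to '-'."""
    assert len(x) == len(y)
    total = 0
    for a, b in zip(x, y):
        if a == "-" or b == "-":
            total += gap
        else:
            total += S[a][b]
    return total


def score_affine(S, x, y, G=-11, E=-1):
    """Part 2: S(x, y) plus (G + E * len(gap)) for each run of gaps."""
    assert len(x) == len(y)
    total = 0
    open_in = None  # which sequence holds the currently open gap, if any
    for a, b in zip(x, y):
        if a == "-" or b == "-":
            side = "x" if a == "-" else "y"
            if side != open_in:
                total += G      # opening penalty, once per run
                open_in = side
            total += E          # extension penalty, once per gapped base
        else:
            total += S[a][b]
            open_in = None
    return total


# Illustrative +1/-1 matrix (an assumption, not from the slides)
S = {a: {b: (1 if a == b else -1) for b in "ACGT"} for a in "ACGT"}
print(score_gapped(S, "AC--GT", "ACTTGT"))   # one gap of length 2
print(score_affine(S, "AC--GT", "ACTTGT"))
print(score_affine(S, "A-C-G", "ATCTG"))     # two gaps of length 1
```

Comparing the last two calls shows why the two penalties are separated: with G much larger in magnitude than E, one long gap costs far less than several short ones.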

SLIDE 15

How many ways can we align two sequences?

Binomial formula:

$$\binom{k}{r} = \frac{k!}{(k-r)!\,r!}$$

$$\binom{2n}{n} = \frac{(2n)!}{n!\,n!}$$

Stirling's approximation:

$$x! \approx \sqrt{2\pi}\; x^{x+\frac{1}{2}}\, e^{-x}$$

$$\binom{2n}{n} \approx \frac{2^{2n}}{\sqrt{\pi n}}$$
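The quality of the approximation is easy to check numerically. The sketch below compares the exact central binomial coefficient with the Stirling-based estimate for a few values of n; the function names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
import math


def exact_count(n):
    """Exact value of (2n choose n)."""
    return math.comb(2 * n, n)


def stirling_count(n):
    """Estimate 2^(2n) / sqrt(pi * n) obtained from Stirling's approximation."""
    return 4.0 ** n / math.sqrt(math.pi * n)


for n in (1, 5, 10, 50):
    exact = exact_count(n)
    approx = stirling_count(n)
    print(n, exact, f"{approx:.4g}", f"ratio={approx / exact:.4f}")
```

The ratio approaches 1 as n grows, while the count itself grows roughly as 4^n, which is why enumerating every alignment quickly becomes impractical.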

SLIDE 16

Dynamic Programming

SLIDE 17

Needleman-Wunsch

A G C G G T A
G A G C G G A

SLIDE 19

Needleman-Wunsch

            A   G   C   G   G   T   A
        0  -1  -2  -3  -4  -5  -6  -7
    G  -1  -1
    A  -2
    G  -3
    C  -4
    G  -5
    G  -6
    A  -7

Candidate alignments for the first cell:

    A     -A     A-
    G     G-     -G

SLIDE 28

Needleman-Wunsch

            A   G   C   G   G   T   A
        0  -1  -2  -3  -4  -5  -6  -7
    G  -1  -1   0  -1  -2  -3  -4  -5
    A  -2   0  -1  -1  -2  -3  -4  -3
    G  -3  -1   1   0   0  -1  -2  -3
    C  -4  -2   0   2   1   0  -1  -2
    G  -5  -3  -1   1   3   2   1   0
    G  -6  -4  -2   0   2   4   3   2
    A  -7  -5  -3  -1   1   3   3   4
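The fill step can be sketched as below. The slides do not state the scoring scheme; the values in the example matrix are consistent with match = +1, mismatch = -1, and a linear gap penalty of -1, so those are used here as assumptions. The function name `nw_fill` and the layout (first sequence across, second sequence down) are also illustrative choices.

```python
def nw_fill(x, y, match=1, mismatch=-1, gap=-1):
    """Return the (len(y) + 1) by (len(x) + 1) Needleman-Wunsch score
    matrix with x across the top and y down the side."""
    rows, cols = len(y) + 1, len(x) + 1
    F = [[0] * cols for _ in range(rows)]
    # First row and column: cumulative gap penalties
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        F[i][0] = i * gap
    # Each remaining cell is the best of its three predecessors
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if x[j - 1] == y[i - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # align x[j-1] with y[i-1]
                          F[i - 1][j] + gap,     # gap in x
                          F[i][j - 1] + gap)     # gap in y
    return F


F = nw_fill("AGCGGTA", "GAGCGGA")
for row in F:
    print(" ".join(f"{v:3d}" for v in row))
```

The bottom-right cell holds the score of the best global alignment. Filling the matrix takes time proportional to the product of the two sequence lengths, in contrast to the exponential number of possible alignments.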

SLIDE 29

Homework

Implement Needleman-Wunsch global alignment with zero gap opening penalties. Try attacking the problem in this order:

1. Initialize and fill in a dynamic programming matrix by hand (e.g., try reproducing the example from my slides on paper).
2. Write a function to create the dynamic programming matrix and initialize the first row and column.
3. Write a function to fill in the rest of the matrix.
4. Rewrite the initialize and fill steps to store pointers to the best sub-solution for each cell.
5. Write a backtrace function to read the optimal alignment from the filled-in matrix.

If that isn't enough to keep you occupied, read the dynamic programming references from the class website. Try to articulate in your own words the logic for the speed-ups and trade-offs in the Myers and Miller approach.
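For steps 4 and 5, one possible shape is sketched below; it is not the only reasonable design. It assumes match = +1, mismatch = -1, and a linear gap penalty of -1 (values not stated in the slides), stores one pointer per cell as a string, and breaks ties arbitrarily, so other equally good alignments may exist. The name `nw_align` is hypothetical.

```python
def nw_align(x, y, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_x, aligned_y) for a global alignment."""
    rows, cols = len(y) + 1, len(x) + 1
    F = [[0] * cols for _ in range(rows)]
    P = [[None] * cols for _ in range(rows)]  # pointer to best predecessor
    for j in range(1, cols):
        F[0][j], P[0][j] = j * gap, "left"
    for i in range(1, rows):
        F[i][0], P[i][0] = i * gap, "up"
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if x[j - 1] == y[i - 1] else mismatch
            # max over (score, move) pairs; ties fall back to the move name
            F[i][j], P[i][j] = max((F[i - 1][j - 1] + s, "diag"),
                                   (F[i - 1][j] + gap, "up"),
                                   (F[i][j - 1] + gap, "left"))
    # Backtrace from the bottom-right cell to the origin
    ax, ay = [], []
    i, j = rows - 1, cols - 1
    while P[i][j] is not None:
        move = P[i][j]
        if move == "diag":
            ax.append(x[j - 1]); ay.append(y[i - 1]); i -= 1; j -= 1
        elif move == "up":
            ax.append("-"); ay.append(y[i - 1]); i -= 1
        else:
            ax.append(x[j - 1]); ay.append("-"); j -= 1
    return F[-1][-1], "".join(reversed(ax)), "".join(reversed(ay))


score, ax, ay = nw_align("AGCGGTA", "GAGCGGA")
print(score)
print(ax)
print(ay)
```

Storing a full pointer matrix costs memory proportional to the product of the two sequence lengths, which is a useful baseline to keep in mind when reading about the linear-space trade-offs mentioned above.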