Intro to Alignment Algorithms: Global and Local



slide-1
SLIDE 1

Intro to Alignment Algorithms: Global and Local

Algorithmic Functions of Computational Biology Professor Istrail

slide-2
SLIDE 2

Sequence Comparison

Biomolecular sequences

  • DNA sequences (strings over the 4-letter alphabet {A, C, G, T})
  • RNA sequences (strings over the 4-letter alphabet {A, C, G, U})
  • Protein sequences (strings over the 20-letter alphabet of amino acids)

Sequence similarity helps in the discovery of genes and in the prediction of the structure and function of proteins.

slide-3
SLIDE 3

The Basic Similarity Analysis Algorithm

Global Similarity

  • Scoring Schemes
  • Edit Graphs
  • Alignment = Path in the Edit Graph
  • The Principle of Optimality
  • The Dynamic Programming Algorithm
  • The Traceback


slide-4
SLIDE 4

The Sequence Alignment Problem

  • Input: two sequences over the same alphabet and a scoring scheme
  • Output: an alignment of the two sequences of maximum score

Example:

GCGCATTTGAGCGA
TGCGTTAGGGTGACCA

A possible alignment:

-GCGCATTTGAGCGA--
TGCG--TTAGGGTGACC

Each column is a match, a mismatch, or an indel.

slide-5
SLIDE 5

CSCI2820 - Class 4

Mismatch, Deletion, Insertion

mismatch:
TCAGGGGGCTATT
AGTCCTCCGATAA

deletion (in template):
TCAGGGGGCTATT
AGTCC-CCGATAA

insertion (in template):
TCAGGGGG-CTATT
AGTCCCCCCGATAA

slide-6
SLIDE 6

Consider two sequences

$X = x_1 x_2 \dots x_m$
$Y = y_1 y_2 \dots y_n$

over the alphabet $\Sigma = \{A, C, G, T\}$, where each $x_i$ and $y_j$ belongs to $\Sigma$.

slide-7
SLIDE 7

Scoring Schemes

Unit-score scheme δ:

      A  C  G  T
  A   1  0  0  0
  C   0  1  0  0
  G   0  0  1  0
  T   0  0  0  1

slide-8
SLIDE 8

Alignment

ACG
|||
AGG

A is aligned with A, C is aligned with G, and G is aligned with G.

Under the unit-score scheme:

Score = δ(A,A) + δ(C,G) + δ(G,G) = 1 + 0 + 1 = 2

slide-9
SLIDE 9

Gaps

Without gaps (score 7):

ACATGGAAT
ACAGGAAAT

Optimal alignment with gaps (score 8):

ACATGG-AAT
ACA-GGAAAT

Optimal alignment of AAAGGG and GGGAAA (score 3):

---AAAGGG
GGGAAA---

“-” is the gap symbol.

slide-10
SLIDE 10

δ(x,y) = the score for aligning x with y
δ(-,y) = the score for aligning - with y
δ(x,-) = the score for aligning x with -

slide-11
SLIDE 11

Alignment Score

A-CG-G
ATCGTG

Score = δ(A,A) + δ(-,T) + δ(C,C) + δ(G,G) + δ(-,T) + δ(G,G)

The alignment score is the sum of the scores of the pairwise aligned symbols.
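The column-by-column sum can be sketched in a few lines of Python. This is only an illustration: the function names `unit_delta` and `alignment_score` are hypothetical, and scoring every gap column as 0 is an assumption chosen so that the results agree with the scores quoted on the Gaps slide.

```python
GAP = "-"

def unit_delta(a: str, b: str) -> int:
    """Unit-score scheme: 1 for identical letters, 0 otherwise.

    Assumption: a column containing the gap symbol scores 0.
    """
    if a == GAP or b == GAP:
        return 0
    return 1 if a == b else 0

def alignment_score(row_x: str, row_y: str, delta=unit_delta) -> int:
    """Sum delta over the aligned columns of two equal-length rows."""
    if len(row_x) != len(row_y):
        raise ValueError("both rows of an alignment must have the same length")
    return sum(delta(a, b) for a, b in zip(row_x, row_y))

if __name__ == "__main__":
    print(alignment_score("ACG", "AGG"))                # 2
    print(alignment_score("ACATGGAAT", "ACAGGAAAT"))    # 7
    print(alignment_score("ACATGG-AAT", "ACA-GGAAAT"))  # 8
    print(alignment_score("---AAAGGG", "GGGAAA---"))    # 3
```

Passing a different `delta` (for example a lookup into a PAM matrix) changes the scoring scheme without touching the summation.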

slide-12
SLIDE 12

Margaret Dayhoff & PAM Similarity Matrices

ARTEMIS Summer 2008 Professor Istrail

slide-13
SLIDE 13
  • Dr. Margaret Oakley Dayhoff

The Mother & Father of Bioinformatics


slide-14
SLIDE 14

Scoring Scheme

Dayhoff PAM scoring matrices define δ.

Partial alignment for Monkey and Trout somatotropin proteins:

PTIPLSRLFDNAMLRAHRLHQ
SAIENQRLFNIAVSRVQHLHL

[Table: fragment of a PAM scoring matrix with columns A R N D C Q E G H I L K M F P S T W Y V. Surviving rows:]

8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8
8 3 -3 0 0 -3 -1 0 1 -3 -1 -3 -2 -2 -4 1 1 1 -7 -4 0

slide-15
SLIDE 15

Scoring Functions

Scoring function = a sum of terms, one for each pair of aligned residues and one for each gap.

Meaning: the log of the relative likelihood that the sequences are related, compared to being unrelated.

  • Identities and conservative substitutions contribute positive terms.
  • Non-conservative substitutions contribute negative terms.

Mutations = substitutions, insertions, deletions
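The "log of the relative likelihood" reading is usually written as a log-odds ratio. The symbols below are assumptions introduced for illustration and do not appear in the slides: $q_{ab}$ is the probability of seeing residues $a$ and $b$ aligned in related sequences, and $p_a$, $p_b$ are their background frequencies.

```latex
\delta(a,b) = \log \frac{q_{ab}}{p_a \, p_b},
\qquad
\text{Score} = \sum_{\text{columns } (a,b)} \delta(a,b)
```

Under this reading, $\delta(a,b) > 0$ when the pair occurs more often in related sequences than chance predicts, and $\delta(a,b) < 0$ when it occurs less often, which matches the positive and negative terms listed above.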

slide-16
SLIDE 16

Global alignment problem

  • Input: Sequences X and Y of length m and n respectively, and a similarity matrix
  • Output: An optimal global alignment of X and Y
      – Global alignments require that all bases in both sequences be aligned

slide-17
SLIDE 17

Local alignment problem

  • Input: Sequences X and Y of length m and n respectively, and a similarity matrix
  • Output: An optimal local alignment of X and Y
      – Local alignments do not require using all bases in either sequence in the alignment
  • Applicable when looking for subsequences of similarity

slide-18
SLIDE 18

The Edit Graph

Suppose that we want to align AGT with AT.

We are going to construct a graph in which alignments between the two sequences correspond to paths between the Begin and End nodes of the graph. This is the Edit Graph.

slide-19
SLIDE 19

[Figure: grid with the sequence AGT along positions 1, 2, 3 and the sequence AT along positions 1, 2]

AGT has length 3.
AT has length 2.
The Edit Graph has (3+1)*(2+1) = 12 nodes.

slide-20
SLIDE 20

[Figure: the grid with Begin and End nodes marked; columns labeled A, G, T (1 to 3), rows labeled A, T (1 to 2)]

AGT indexes the columns, and AT indexes the rows of this “table”.

slide-21
SLIDE 21

[Figure: the grid for AGT and AT with directed edges, Begin and End nodes marked]

The graph is directed. The nodes (i,j) will hold values.

slide-22
SLIDE 22

[Figure: the edit graph for AGT and AT with Begin and End nodes]

slide-23
SLIDE 23

[Figure: the edit graph for AGT and AT with Begin and End nodes; each directed edge carries a pair of aligned letters]

Directed edges are labeled with pairs of aligned letters.

slide-24
SLIDE 24

Alignment = Path in the Edit Graph

[Figure: the edit graph for AGT and AT, showing the Begin-to-End path for the alignment below]

AGT
A-T

Every path from Begin to End corresponds to an alignment.
Every alignment corresponds to a path between Begin and End.
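The path/alignment correspondence can be made concrete with a small Python sketch. The move letters are a hypothetical encoding chosen here, not notation from the slides: "D" is a diagonal edge (x_i aligned with y_j), "X" consumes a letter of X against a gap, and "Y" consumes a letter of Y against a gap.

```python
def path_to_alignment(x: str, y: str, moves: str) -> tuple[str, str]:
    """Turn a Begin-to-End path (a string of D/X/Y moves) into an alignment."""
    i = j = 0
    top, bottom = [], []
    for move in moves:
        if move == "D":      # diagonal edge: x[i] aligned with y[j]
            top.append(x[i]); bottom.append(y[j]); i += 1; j += 1
        elif move == "X":    # x[i] aligned with a gap
            top.append(x[i]); bottom.append("-"); i += 1
        elif move == "Y":    # a gap aligned with y[j]
            top.append("-"); bottom.append(y[j]); j += 1
        else:
            raise ValueError(f"unknown move {move!r}")
    if i != len(x) or j != len(y):
        raise ValueError("path does not end at the End node")
    return "".join(top), "".join(bottom)

def count_paths(m: int, n: int) -> int:
    """Number of Begin-to-End paths in the edit graph of an m-by-n problem."""
    paths = [[1] * (n + 1) for _ in range(m + 1)]  # one path along each border
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            paths[i][j] = paths[i - 1][j - 1] + paths[i - 1][j] + paths[i][j - 1]
    return paths[m][n]

if __name__ == "__main__":
    print(path_to_alignment("AGT", "AT", "DXD"))  # ('AGT', 'A-T')
    print(count_paths(3, 2))                      # 25
```

Even for AGT and AT there are 25 paths, and the count grows quickly with sequence length, which is why the paths are not enumerated one by one in practice.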

slide-25
SLIDE 25

The Principle of Optimality

The optimal answer to a problem is expressed in terms of optimal answers to its sub-problems.

slide-26
SLIDE 26

Dynamic Programming

Part 1: Compute the optimal alignment score first.
Part 2: Construct an optimal alignment.

We are looking for the optimal alignment = the maximum-score path in the Edit Graph from the Begin vertex to the End vertex.

Given: Two sequences X and Y
Find: An optimal alignment of X with Y

slide-27
SLIDE 27

The DP Matrix S(i,j)

[Figure: the grid for AGT and AT with the nodes S(1,0) and S(2,1) marked]

slide-28
SLIDE 28

The DP Matrix

Matrix S = [S(i,j)]

S(i,j) = the score of the maximum-score path from the Begin vertex to the vertex (i,j)

The optimal path to (i,j) must pass through one of the vertices (i-1,j-1), (i-1,j), (i,j-1).

[Figure: vertex (i,j) with its predecessors (i-1,j-1), (i-1,j) and (i,j-1), and the optimal path to (i,j)]

slide-29
SLIDE 29

Optimal path

Case: the optimal path to (i,j) passes through (i-1,j).

Optimal path to (i-1,j) + δ(xi, -)

S(i-1,j) + δ(xi, -)

slide-30
SLIDE 30

Optimal path

Case: the optimal path to (i,j) passes through (i-1,j-1).

Optimal path to (i-1,j-1) + δ(xi, yj)

S(i-1,j-1) + δ(xi, yj)

slide-31
SLIDE 31

Optimal path

Case: the optimal path to (i,j) passes through (i,j-1).

Optimal path to (i,j-1) + δ(-, yj)

S(i,j-1) + δ(-, yj)

slide-32
SLIDE 32

The Basic ALGORITHM

$$
S(i,j) = \max
\begin{cases}
S(i-1,\,j-1) + \delta(x_i, y_j) \\
S(i-1,\,j) + \delta(x_i, -) \\
S(i,\,j-1) + \delta(-, y_j)
\end{cases}
$$
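A minimal Python sketch of this recurrence follows. The function names are hypothetical, and the boundary conditions are an assumption not spelled out on the slide: S(0,0) = 0, and the first row and column accumulate gap scores.

```python
def unit_delta(a: str, b: str) -> int:
    """Unit-score scheme; assumption: any column with a gap scores 0."""
    if a == "-" or b == "-":
        return 0
    return 1 if a == b else 0

def global_table(x: str, y: str, delta=unit_delta, gap: str = "-"):
    """Fill S(i,j) for i = 0..len(x), j = 0..len(y) with the max-of-three rule."""
    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):          # first column: x_1..x_i against gaps
        S[i][0] = S[i - 1][0] + delta(x[i - 1], gap)
    for j in range(1, n + 1):          # first row: gaps against y_1..y_j
        S[0][j] = S[0][j - 1] + delta(gap, y[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + delta(x[i - 1], y[j - 1]),  # x_i with y_j
                S[i - 1][j] + delta(x[i - 1], gap),           # x_i with a gap
                S[i][j - 1] + delta(gap, y[j - 1]),           # a gap with y_j
            )
    return S

if __name__ == "__main__":
    for row in global_table("AGT", "AT"):
        print(row)
```

The optimal global score is the bottom-right entry `S[len(x)][len(y)]`; the table takes time and space proportional to m times n.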

slide-33
SLIDE 33

Optimal Alignment and Traceback

[Figure: the edit graph for AGT and AT with the computed values S(i,j) and the traceback path]

Values S(i,j) under the unit-score scheme with gaps scoring 0:

        -  A  G  T
   -    0  0  0  0
   A    0  1  1  1
   T    0  1  1  2

Optimal alignment:

AGT
A-T
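The traceback step can be sketched as follows, assuming the same table-filling rule and boundary conditions as above. The names are hypothetical, and the tie-breaking order (diagonal first, then a gap in Y, then a gap in X) is an arbitrary choice; other orders may return a different alignment with the same score.

```python
def unit_delta(a: str, b: str) -> int:
    """Unit-score scheme; assumption: any column with a gap scores 0."""
    if a == "-" or b == "-":
        return 0
    return 1 if a == b else 0

def align_global(x: str, y: str, delta=unit_delta, gap: str = "-"):
    """Return (score, aligned_x, aligned_y) for one optimal global alignment."""
    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] + delta(x[i - 1], gap)
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] + delta(gap, y[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(S[i - 1][j - 1] + delta(x[i - 1], y[j - 1]),
                          S[i - 1][j] + delta(x[i - 1], gap),
                          S[i][j - 1] + delta(gap, y[j - 1]))
    # Traceback: walk from End (m, n) back to Begin (0, 0), at each node
    # following an edge whose score explains the value stored in S[i][j].
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + delta(x[i - 1], y[j - 1]):
            top.append(x[i - 1]); bottom.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + delta(x[i - 1], gap):
            top.append(x[i - 1]); bottom.append(gap); i -= 1
        else:
            top.append(gap); bottom.append(y[j - 1]); j -= 1
    return S[m][n], "".join(reversed(top)), "".join(reversed(bottom))

if __name__ == "__main__":
    print(align_global("AGT", "AT"))  # (2, 'AGT', 'A-T')
```

The equality tests are exact because the scores here are integers; with floating-point scores it would be safer to store a back-pointer for each cell while filling the table.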

slide-34
SLIDE 34

The Basic ALGORITHM: Local Similarity

$$
S(i,j) = \max
\begin{cases}
0 \\
S(i-1,\,j-1) + \delta(x_i, y_j) \\
S(i-1,\,j) + \delta(x_i, -) \\
S(i,\,j-1) + \delta(-, y_j)
\end{cases}
$$

The 0 term is what we add to the global recurrence.
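A minimal Python sketch of the local variant (widely known as the Smith-Waterman recurrence) is shown below. Several details are assumptions not stated on the slide: the first row and column are 0, the answer is the maximum over all cells rather than the bottom-right corner, and the scoring scheme (+1 match, -1 mismatch, -1 gap) is a hypothetical one picked only so that the 0 term has an effect.

```python
def simple_delta(a: str, b: str) -> int:
    """Hypothetical scheme for illustration: +1 match, -1 mismatch, -1 gap."""
    if a == "-" or b == "-":
        return -1
    return 1 if a == b else -1

def local_score(x: str, y: str, delta=simple_delta, gap: str = "-") -> int:
    """Best local alignment score: the max-of-three recurrence plus a 0 option."""
    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]  # borders stay 0
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(
                0,                                            # start afresh here
                S[i - 1][j - 1] + delta(x[i - 1], y[j - 1]),  # x_i with y_j
                S[i - 1][j] + delta(x[i - 1], gap),           # x_i with a gap
                S[i][j - 1] + delta(gap, y[j - 1]),           # a gap with y_j
            )
            best = max(best, S[i][j])
    return best

if __name__ == "__main__":
    # The shared region ACGT is found even though the flanks do not match.
    print(local_score("GGACGTCC", "TTACGTAA"))  # 4
```

With a scheme that has no negative scores (such as the unit-score scheme with free gaps), the 0 option never wins, so local alignment is only informative when mismatches and gaps are penalized.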

slide-35
SLIDE 35

Protein global alignment

X = hlsek
Y = nlsak

  • X and Y represent a protein subsequence from the BRCA2 (early onset) protein in human and chimpanzee
  • Global alignments are used when the two sequences being compared represent a similar biological sequence

slide-36
SLIDE 36

Margaret Dayhoff’s PAM 100 similarity matrix (partial)

       A   N   E   H   L   K   S   *
  A    4  -1   0  -3  -3  -3   1  -9
  N   -1   5   1   2  -4   1   1  -9
  E    0   1   5  -1  -5  -1  -1  -9
  H   -3   2  -1   7  -3  -2  -2  -9
  L   -3  -4  -5  -3   6  -4  -4  -9
  K   -3   1  -1  -2  -4   5  -1  -9
  S    1   1  -1  -2  -4  -1   4  -9
  *   -9  -9  -9  -9  -9  -9  -9   1

slide-37
SLIDE 37

Global alignment table S(i,j) for X = hlsek (rows) and Y = nlsak (columns):

         -    n    l    s    a    k
   -     0   -9  -18  -27  -36  -45
   h    -9    2   -7  -16  -25  -34
   l   -18   -7    8   -1  -10  -19
   s   -27  -16   -1   12    3   -6
   e   -36  -25  -10    3   12    3
   k   -45  -34  -19   -6    3   17
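The table for hlsek and nlsak can be reproduced with a short Python sketch. Two assumptions are made here: the * row and column of the partial PAM 100 matrix are used as the gap score (-9, which matches the steps of 9 along the table border), and the A/E entry is taken as 0. All names are hypothetical.

```python
LETTERS = "ANEHLKS*"
ROWS = [
    [4, -1, 0, -3, -3, -3, 1, -9],     # A
    [-1, 5, 1, 2, -4, 1, 1, -9],       # N
    [0, 1, 5, -1, -5, -1, -1, -9],     # E
    [-3, 2, -1, 7, -3, -2, -2, -9],    # H
    [-3, -4, -5, -3, 6, -4, -4, -9],   # L
    [-3, 1, -1, -2, -4, 5, -1, -9],    # K
    [1, 1, -1, -2, -4, -1, 4, -9],     # S
    [-9, -9, -9, -9, -9, -9, -9, 1],   # * (used as the gap here)
]
PAM100 = {(a, b): ROWS[i][j]
          for i, a in enumerate(LETTERS) for j, b in enumerate(LETTERS)}

def delta(a: str, b: str) -> int:
    """Look up a pair in the partial matrix; '-' is mapped to the '*' entry."""
    def key(c: str) -> str:
        return "*" if c == "-" else c.upper()
    return PAM100[(key(a), key(b))]

def dp_table(x: str, y: str):
    """Global alignment table with rows indexed by x and columns by y."""
    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] + delta(x[i - 1], "-")
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] + delta("-", y[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(S[i - 1][j - 1] + delta(x[i - 1], y[j - 1]),
                          S[i - 1][j] + delta(x[i - 1], "-"),
                          S[i][j - 1] + delta("-", y[j - 1]))
    return S

if __name__ == "__main__":
    table = dp_table("hlsek", "nlsak")
    for label, row in zip("-hlsek", table):
        print(label, " ".join(f"{value:4d}" for value in row))
```

The bottom-right value, 17, is the score of the gap-free alignment hlsek / nlsak: 2 + 6 + 4 + 0 + 5.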