Pairwise Sequence Alignment



  1. Pairwise Sequence Alignment: Today’s Goal

         > DNA Sequence 1
         ACTGCGATTGACGTACGATCATCGTACGATCATCATGCTGAGCTATCATCATCGTACTGA
         TCGTAGACTACGTAGCTAGCATGCAGTCTGATGACGTCATGCTGACGTAGCATGC
         > DNA Sequence 2
         GACTAGCAGCGAGAGATCTCTCGAGTATGCGAGAGCTGATGCATCTACGTATGCAGTCGT
         GCTAATGCGAGCGTATACGCGGGCATGTAGAGACTTCCTAGTAC

     How related are two sequences?

         > Protein Sequence 1
         KGLAHDGHNADFLKAMGGPIAFPIDADPFIDFKLHMNI
         > Protein Sequence 2
         LHASDGFKHSADFHNAIFDPAFLKADFPIMADSFN

  2. Alignment

         CGTAGCAGC
         TGTAGTTCAGC

         CGTAG--CAGC
          ||||  ||||
         TGTAGTTCAGC

     There’s more than one way to align a pair of sequences.

     [Figure: a grid of many different gapped alignments of CGTTACATG with TGTCACGT]

  3. Scoring Alignments

     Match: +5    Mismatch: -4    Gap: -6

         CGCGTTA        ACTCGATCG        CGTAGCAGCT
         CGGGTCA        ACTTCG           CATACAGGACT

         CGCGTTA        ACTCGATCG        CGTAGCAG--CT
         || || |        |||   |||        | || |||  ||
         CGGGTCA        ACT---TCG        CATA-CAGGACT

     Use the optimal (best scoring) alignment.

     [Figure: a grid of many different gapped alignments of CGTTACATG with TGTCACGT]
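
The scoring scheme on this slide can be applied column by column to any alignment that has already been written out with gaps. A minimal sketch, assuming `-` marks a gap and using the slide's values (match +5, mismatch -4, gap -6); the function name `score_alignment` is made up for illustration:

```python
MATCH, MISMATCH, GAP = 5, -4, -6  # scores from the slide


def score_alignment(top: str, bottom: str) -> int:
    """Sum the column scores of an already-gapped alignment ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += GAP        # a residue facing a gap
        elif a == b:
            total += MATCH      # identical residues
        else:
            total += MISMATCH   # two different residues
    return total


# The three alignments shown on the slide.
print(score_alignment("CGCGTTA", "CGGGTCA"))            # 17
print(score_alignment("ACTCGATCG", "ACT---TCG"))        # 12
print(score_alignment("CGTAGCAG--CT", "CATA-CAGGACT"))  # 18
```

For example, the first alignment has 5 matches and 2 mismatches, so it scores 5·5 - 2·4 = 17.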

  4. Pairwise Sequence Alignment

     Pairwise Alignment Problem: Given two sequences, determine their optimal (i.e., best scoring) alignment.

     How many different alignments?

     [Figure: a grid of many different gapped alignments of CGTTACATG with TGTCACGT]
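
The slide asks the question without giving a number. One common way to count, sketched below, treats an alignment as a sequence of columns, each being residue over residue, residue over gap, or gap over residue. This convention is an assumption: it counts a gap in one sequence followed by a gap in the other as different from the reverse order, and other conventions give smaller counts. The name `count_alignments` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(m: int, n: int) -> int:
    """Number of column-by-column alignments of sequences of lengths m and n."""
    if m == 0 or n == 0:
        return 1  # only gaps are left to place: exactly one way
    return (
        count_alignments(m - 1, n)        # last column: residue over gap
        + count_alignments(m, n - 1)      # last column: gap over residue
        + count_alignments(m - 1, n - 1)  # last column: residue over residue
    )


# Small cases grow quickly: 3, 13, 63 for lengths 1, 2, 3.
print([count_alignments(k, k) for k in (1, 2, 3)])

# CGTTACATG (length 9) against TGTCACGT (length 8), as in the figure.
print(count_alignments(9, 8))
```

The rapid growth is the reason for not simply enumerating every alignment and scoring each one.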

  5. The Elegance of Alignment

     The problem of finding the best alignment of two sequences has two important properties:
     (1) The solution can be found by looking at the solutions to subproblems
     (2) Subproblems often overlap

     Indeed, to find the best alignment of two sequences, we need only look at 3 slightly smaller alignments (i.e., remove the last character from one sequence or from both).

         AGCGTTA        AGCGTT     A
                   =>            +
         ACGTGA         ACGTGA     -

  6. The Elegance of Alignment

     The best alignment of AGCGTTA with ACGTGA ends in one of three ways:

         AGCGTT     A        AGCGTTA     -        AGCGTT     A
                  +                    +                   +
         ACGTGA     -        ACGTG       A        ACGTG      A

  7. The Elegance of Alignment

     Each of the three cases is the best alignment of a smaller pair plus one final column:

         AGCGTT-   A        AGCGTTA   -        AGCGTT   A
         | |||   +          | |||   +          | |||  + |
         A-CGTGA   -        A-CGTG-   A        A-CGTG   A
         4 - 6 = -2         4 - 6 = -2         10 + 5 = 15

     The three smaller pairs break down in the same way, and the same subproblems show up more than once:

         AGCGTT  / ACGTGA  =>  AGCGT / ACGTGA,   AGCGTT  / ACGTG,  AGCGT  / ACGTG
         AGCGTTA / ACGTG   =>  AGCGTT / ACGTG,   AGCGTTA / ACGT,   AGCGTT / ACGT
         AGCGTT  / ACGTG   =>  AGCGT / ACGTG,    AGCGTT  / ACGT,   AGCGT  / ACGT
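
The three-way decomposition can be written directly as a recursion on prefix lengths, with a cache so that each overlapping subproblem is solved only once. This is a sketch under the slide's scores (match +5, mismatch -4, gap -6); the names `best_score` and `solve` are illustrative, not from the slides.

```python
from functools import lru_cache

MATCH, MISMATCH, GAP = 5, -4, -6  # scores from the slides


def best_score(x: str, y: str) -> int:
    """Best global alignment score of x and y via memoized recursion."""

    @lru_cache(maxsize=None)
    def solve(i: int, j: int) -> int:
        # Best score for aligning the prefixes x[:i] and y[:j].
        if i == 0:
            return j * GAP  # nothing left of x: y[:j] faces only gaps
        if j == 0:
            return i * GAP  # nothing left of y: x[:i] faces only gaps
        pair = MATCH if x[i - 1] == y[j - 1] else MISMATCH
        return max(
            solve(i - 1, j) + GAP,       # last column: x[i-1] over a gap
            solve(i, j - 1) + GAP,       # last column: a gap over y[j-1]
            solve(i - 1, j - 1) + pair,  # last column: x[i-1] over y[j-1]
        )

    return solve(len(x), len(y))


print(best_score("AGCGTT", "ACGTGA"))   # 4
print(best_score("AGCGTTA", "ACGTG"))   # 4
print(best_score("AGCGTT", "ACGTG"))    # 10
print(best_score("AGCGTTA", "ACGTGA"))  # 15 = max(4 - 6, 4 - 6, 10 + 5)
```

Without the cache, subproblems such as AGCGTT / ACGTG would be recomputed every time they reappear in the tree.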

  8. The Elegance of Alignment

     The problem of finding the best alignment of two sequences has two important properties:
     (1) The solution can be found by looking at the solutions to subproblems
     (2) Subproblems often overlap

     The method for determining the best alignment is known as a dynamic programming algorithm.

     Score Table (AGCGTTA across the columns, ACGTGA down the rows)

               A  G  C  G  T  T  A
            A
            C
            G
            T
            G
            A

  9. Score Table

     Each entry of the table stands for one subproblem: the best alignment of a prefix of AGCGTTA (columns) with a prefix of ACGTGA (rows). Two entries are highlighted on the slide:

         AGCGT        A
         ACG          ACGTG

  10. How is each block in the table determined?

      • Each entry depends on 3 previous entries (because of the problem’s “elegance”)
      • Each entry also depends on scores used (match, mismatch, gap)

      Each entry is the max of 3:
      - Score in block to the left minus gap penalty
      - Score in block above minus gap penalty
      - Score in block diagonally left/above plus match/mismatch score

      [Figure: the three-case decomposition of AGCGTTA / ACGTGA from slide 7, repeated]
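
The rule above can be turned into a short table-filling loop. The sketch below assumes the slide's scores (match +5, mismatch -4, gap -6, so "minus gap penalty" means adding -6) and puts the first sequence across the columns and the second down the rows; `fill_table` is a made-up name.

```python
MATCH, MISMATCH, GAP = 5, -4, -6  # scores from the slides


def fill_table(x: str, y: str) -> list[list[int]]:
    """table[i][j] = best score for aligning y[:i] (rows) with x[:j] (columns)."""
    rows, cols = len(y) + 1, len(x) + 1
    table = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):            # top row: x[:j] against gaps only
        table[0][j] = table[0][j - 1] + GAP
    for i in range(1, rows):            # left column: y[:i] against gaps only
        table[i][0] = table[i - 1][0] + GAP
    for i in range(1, rows):
        for j in range(1, cols):
            pair = MATCH if y[i - 1] == x[j - 1] else MISMATCH
            table[i][j] = max(
                table[i][j - 1] + GAP,       # block to the left
                table[i - 1][j] + GAP,       # block above
                table[i - 1][j - 1] + pair,  # block diagonally left/above
            )
    return table


for row in fill_table("AGCGTTA", "ACGTGA"):
    print(" ".join(f"{value:4d}" for value in row))
```

For AGCGTTA and ACGTGA the bottom-right entry comes out as 15, the score of the best alignment.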

  11. Alignment Score Table

      The top row and left column hold the scores for aligning a prefix against gaps only:

                     A    G    C    G    T    T    A
                0   -6  -12  -18  -24  -30  -36  -42
           A   -6
           C  -12
           G  -18
           T  -24
           G  -30
           A  -36

      The first entry (A against A) is then filled in:

                     A    G    C    G    T    T    A
                0   -6  -12  -18  -24  -30  -36  -42
           A   -6    5
           C  -12
           G  -18
           T  -24
           G  -30
           A  -36

  12. Alignment Score Table

                     A    G    C    G    T    T    A
                0   -6  -12  -18  -24  -30  -36  -42
           A   -6    5   -1
           C  -12
           G  -18
           T  -24
           G  -30
           A  -36

      How do we re-create the alignment?

                     A    G    C    G    T    T    A
                0   -6  -12  -18  -24  -30  -36  -42
           A   -6    5   -1   -7  -13  -19  -25  -31
           C  -12   -1    1    4   -2   -8  -14  -20
           G  -18   -7    4   -2    9    3   -3   -9
           T  -24  -13   -2    0    3   14    8    2
           G  -30  -19   -8   -6    5    8   10    4
           A  -36  -25  -14  -12   -1    2    4   15

  13. How do we re-create the alignment?

      (Same score table as on slide 12.) The alignment is rebuilt from its last column backwards, starting at the bottom-right entry (15):

          A
          |
          A

      then

          TA
           |
          GA

  14. How do we re-create the alignment?

      (Same score table as on slide 12.) The completed traceback gives:

          AGCGTTA
          | ||| |
          A-CGTGA

      Let’s recap, shall we?

      • The problem of finding the best alignment for two sequences has a couple of interesting properties:
        (1) The best alignment can be determined using the best alignments of subproblems
        (2) Subproblems often overlap
      • Because of these properties, we can fill in a table of solutions to subproblems
      • Each table entry is determined from 3 of the preceding entries
      • The filled-in table tells us the best alignment!
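
The recap can be sketched end to end: fill the table, then walk back from the last entry, at each step asking which of the 3 neighbouring entries produced the current score. The scores are the ones from the slides; the tie-breaking order (diagonal first, then a gap in the second sequence, then a gap in the first) is an assumption, since the slides do not say how ties are resolved. `global_align` is a hypothetical name.

```python
MATCH, MISMATCH, GAP = 5, -4, -6  # scores from the slides


def global_align(x: str, y: str) -> tuple[int, str, str]:
    """Return (score, aligned x, aligned y) for one best global alignment."""
    n, m = len(x), len(y)
    # t[i][j] = best score for aligning x[:i] with y[:j]
    t = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        t[i][0] = i * GAP
    for j in range(1, m + 1):
        t[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = MATCH if x[i - 1] == y[j - 1] else MISMATCH
            t[i][j] = max(t[i - 1][j - 1] + pair, t[i - 1][j] + GAP, t[i][j - 1] + GAP)

    # Traceback from the last entry; columns are collected in reverse order.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = MATCH if i > 0 and j > 0 and x[i - 1] == y[j - 1] else MISMATCH
        if i > 0 and j > 0 and t[i][j] == t[i - 1][j - 1] + pair:
            top.append(x[i - 1]); bottom.append(y[j - 1])  # residue over residue
            i, j = i - 1, j - 1
        elif i > 0 and t[i][j] == t[i - 1][j] + GAP:
            top.append(x[i - 1]); bottom.append("-")       # gap in y
            i -= 1
        else:
            top.append("-"); bottom.append(y[j - 1])       # gap in x
            j -= 1
    return t[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, a, b = global_align("AGCGTTA", "ACGTGA")
print(score)  # 15
print(a)      # AGCGTTA
print(b)      # A-CGTGA
```

Note that this sketch indexes the table with the first sequence on the rows, which is the transpose of the layout drawn on the slides; the scores are the same.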

  15. How big is our table?

      (Same score table as on slide 12: AGCGTTA across the columns, ACGTGA down the rows.)

      Global vs. Local

          TGGTAGATTCCCACGAGATCTACCGAGTATGAGTAGGGGGACGTTCGCTCGG
          GCCTCTAACACACTGCACGAGATCAACCGAGATATGAGTAATACAGCGGTACGGG

      Global Alignment Score: 60

          ---TGGTAGATTC-C--CACGAGATCTACCGAG-TATGAGTAGGGGGAC-GTTCGCT-C-GG
             |  || |  | |  ||||||||| |||||| ||||||||     || |  || | | ||
          GCCT-CTA-ACACACTGCACGAGATCAACCGAGATATGAGTA---ATACAG--CGGTACGGG

      Local Alignment Score: 105

          CACGAGATCTACCGAG-TATGAGTA
          ||||||||| |||||| ||||||||
          CACGAGATCAACCGAGATATGAGTA
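
The slide leaves the table-size question open. For sequences of lengths m and n the table drawn on the earlier slides has (m + 1) × (n + 1) entries, one extra row and column for the all-gap border. As a side note that goes beyond the slides: if only the final score is wanted and not the alignment itself, each row depends only on the row above it, so two rows are enough. The sketch below assumes the slides' scores; `table_entries` and `global_score_two_rows` are illustrative names.

```python
MATCH, MISMATCH, GAP = 5, -4, -6  # scores from the slides


def table_entries(x: str, y: str) -> int:
    """Number of entries in the full score table, border included."""
    return (len(x) + 1) * (len(y) + 1)


def global_score_two_rows(x: str, y: str) -> int:
    """Best global alignment score, keeping only two table rows in memory."""
    prev = [j * GAP for j in range(len(y) + 1)]  # border row: gaps only
    for i in range(1, len(x) + 1):
        curr = [i * GAP]                         # border column entry
        for j in range(1, len(y) + 1):
            pair = MATCH if x[i - 1] == y[j - 1] else MISMATCH
            curr.append(max(prev[j - 1] + pair, prev[j] + GAP, curr[j - 1] + GAP))
        prev = curr
    return prev[-1]


print(table_entries("AGCGTTA", "ACGTGA"))          # 56
print(global_score_two_rows("AGCGTTA", "ACGTGA"))  # 15
```

The full table is still needed to re-create the alignment by traceback, which is why the slides keep all of it.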

  16. Local Alignment

      AGATCAC across the columns, CGACAG down the rows. The top row and left column start at 0, and the first entry (C against A) is also 0:

                     A    G    A    T    C    A    C
                0    0    0    0    0    0    0    0
           C    0    0
           G    0
           A    0
           C    0
           A    0
           G    0

  17. Local Alignment

                     A    G    A    T    C    A    C
                0    0    0    0    0    0    0    0
           C    0    0    0    0    0    5    0    5
           G    0    0    5    0    0    0    1    0
           A    0    5    0   10    4    0    5    0
           C    0    0    1    4    6    9    3   10
           A    0    5    0    6    0    3   14    8
           G    0    0   10    4    2    0    8   10
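
The local table above differs from the global one in two ways that can be read off the numbers: no entry drops below 0, and the alignment is read back from the largest entry (14) rather than from the bottom-right corner. A minimal sketch of that procedure, assuming the same scores as before (match +5, mismatch -4, gap -6) and stopping the traceback at the first 0; `local_align` is a made-up name, and the table here is indexed with the first sequence on the rows.

```python
MATCH, MISMATCH, GAP = 5, -4, -6  # scores from the slides


def local_align(x: str, y: str) -> tuple[int, str, str]:
    """Return (score, aligned part of x, aligned part of y) for one best local alignment."""
    n, m = len(x), len(y)
    t = [[0] * (m + 1) for _ in range(n + 1)]  # border row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = MATCH if x[i - 1] == y[j - 1] else MISMATCH
            # Same three neighbours as the global rule, but never below 0.
            t[i][j] = max(0, t[i - 1][j - 1] + pair, t[i - 1][j] + GAP, t[i][j - 1] + GAP)
            if t[i][j] > best:
                best, best_pos = t[i][j], (i, j)

    # Traceback from the largest entry until an entry of 0 is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and t[i][j] > 0:
        pair = MATCH if x[i - 1] == y[j - 1] else MISMATCH
        if t[i][j] == t[i - 1][j - 1] + pair:
            top.append(x[i - 1]); bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif t[i][j] == t[i - 1][j] + GAP:
            top.append(x[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


score, a, b = local_align("AGATCAC", "CGACAG")
print(score)  # 14
print(a)      # GATCA
print(b)      # GA-CA
```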

  18. Local Alignment

      (Same table as on slide 17.) The traceback starts at the largest entry (14) and rebuilds the alignment from its last column backwards:

          A
          |
          A

      then

          CA
          ||
          CA

  19. Local Alignment

      (Same table as on slide 17.) The completed traceback gives:

          GATCA
          || ||
          GA-CA

      Linear Gap Penalty

      With linear gap scoring, every gap position has the same score c, so a gap of length k costs k times c:

          AGGCTACGATCGATCGG
          | || |   ||| || |
          A-GCCA---TCG-TC-G
           c    ccc   c  c
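
The linear gap penalty can be checked on the slide's example by listing the gap runs in the lower sequence and charging c per gap position. In this sketch the value c = -6 is an assumption borrowed from the gap score on slide 3 (slide 19 only calls it c), and the function names are hypothetical.

```python
import re


def gap_runs(row: str) -> list[int]:
    """Lengths of the maximal runs of '-' in one row of an alignment."""
    return [len(run) for run in re.findall(r"-+", row)]


def linear_gap_cost(row: str, c: int) -> int:
    """Linear scoring: every gap position costs c, so a run of length k costs k * c."""
    return sum(k * c for k in gap_runs(row))


bottom = "A-GCCA---TCG-TC-G"  # lower row of the slide's alignment
print(gap_runs(bottom))             # [1, 3, 1, 1]
print(linear_gap_cost(bottom, -6))  # -36
```

Under linear scoring the run of three gaps costs exactly as much as three separate single gaps, which is what the six identical c labels on the slide indicate.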
