An introduction to multiple alignments


  1. An introduction to multiple alignments
     Original version by Cédric Notredame, updated by Laurent Falquet
     Swiss Institute of Bioinformatics / Institut Suisse de Bioinformatique, CN+LF-2006.01

     Overview
     - Multiple alignments: how-to, goal, problems, use
     - Patterns: PROSITE database, syntax, use
     - PSI-BLAST: BLAST, matrices, use
     - [ Profiles/HMMs ] …

  2. Overview
     - What are multiple alignments?
     - How can I use my alignments?
     - How does the computer align the sequences?
     - The progressive alignment algorithm
     - What are the difficulties?
     - Prerequisites?
     - How can we compare sequences?
     - How can we align sequences?

     Sometimes two sequences are not enough
     The man with TWO watches NEVER knows the exact time.

  3. What is a multiple sequence alignment?
     - What can it do for me?
     - How can I produce one of these?
     - How can I use it?

         chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
         wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
         trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
         mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
               ***. ::: .: .. . : . . * . *: *

         chite AATAKQNYIRALQEYERNGG-
         wheat ANKLKGEYNKAIAAYNKGESA
         trybr AEKDKERYKREM---------
         mouse AKDDRIRYDNEMKSWEEQMAE
               * : .* . :

     What is a multiple sequence alignment?
     - Structural/biochemical criteria: residues playing a similar role end up in the same column.
     - Evolutionary criteria: residues having the same ancestor end up in the same column.
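The symbols under the alignment in slide 3 mark how well each column is conserved. As a minimal sketch, the function below derives such a line from the first alignment block. It assumes the Clustal convention (`*` identical, `:` strongly similar, `.` weakly similar) and uses the residue groups commonly documented for Clustal; both the group lists and the function name `conservation_line` are assumptions for illustration, not something the slides specify.

```python
# Residue groups as commonly documented for the Clustal conservation line
# (assumption: the slides do not list them).
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
        "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]

def conservation_line(rows):
    """Return one symbol per alignment column: '*', ':', '.' or ' '."""
    symbols = []
    for column in zip(*rows):
        residues = set(column)
        if "-" in residues:
            symbols.append(" ")   # any gap: no symbol
        elif len(residues) == 1:
            symbols.append("*")   # fully identical column
        elif any(residues <= set(group) for group in STRONG):
            symbols.append(":")   # all residues in one strong group
        elif any(residues <= set(group) for group in WEAK):
            symbols.append(".")   # all residues in one weak group
        else:
            symbols.append(" ")
    return "".join(symbols)

# First alignment block from slide 3 (chite, wheat, trybr, mouse).
alignment = [
    "---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD",
    "--DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE",
    "KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP",
    "-----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP",
]
line = conservation_line(alignment)
print(line)
```

Columns containing a gap get no symbol here, which is why the leading columns of the block are blank.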

  4. How can I use a multiple alignment?

         chite   ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
         wheat   --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
         trybr   KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
         unknown -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
                 ***. ::: .: .. . : . . * . *: *

         chite   AATAKQNYIRALQEYERNGG-
         wheat   ANKLKGEYNKAIAAYNKGESA
         trybr   AEKDKERYKREM---------
         unknown AKDDRIRYDNEMKSWEEQMAE
                 * : .* . :

     Less than 30% identity, BUT conserved where it MATTERS.
     Extrapolation [diagram labels: Unknown sequence, Homology?, SwissProt]
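The "less than 30% id" remark in slide 4 refers to pairwise percent identity. The sketch below shows one way to compute it from aligned rows. The denominator used here (columns where both sequences have a residue) is an assumption: several conventions exist and the slides do not say which one was used, so the printed values need not match the slide's figure. The function name `percent_identity` is illustrative.

```python
from itertools import combinations

def percent_identity(a, b):
    """Percent identical residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

# Both alignment blocks from slide 4, concatenated per sequence.
alignment = {
    "chite":   "---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD"
               "AATAKQNYIRALQEYERNGG-",
    "wheat":   "--DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE"
               "ANKLKGEYNKAIAAYNKGESA",
    "trybr":   "KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP"
               "AEKDKERYKREM---------",
    "unknown": "-----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP"
               "AKDDRIRYDNEMKSWEEQMAE",
}

# Print the identity of every pair of sequences.
for (name1, seq1), (name2, seq2) in combinations(alignment.items(), 2):
    print(f"{name1:8s} {name2:8s} {percent_identity(seq1, seq2):5.1f}%")
```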

  5. How can I use a multiple alignment?
     [Alignment of chite, wheat, trybr and mouse, as in slide 3]
     Extrapolation: PROSITE patterns
         P-K-R-[PA]-x(1)-[ST]…

     How can I use a multiple alignment?

         chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
         wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
         trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
         mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-IQGKLKLVNEAWKNLSP
               ***. ::: .: .. . : . . * . *: *

         chite AATAKQNYIRALQEYERNGG-
         wheat ANKLKGEYNKAIAAYNKGESA
         trybr AEKDKERYKREM---------
         mouse AKDDRIRYDNEMKSWEEQMAE
               * : .* . :

     Annotations on the alignment: L?   K>R
     Extrapolation: PROSITE patterns, PROSITE profiles
     [Figure: profile diagram showing alternative residues per position]
     PROSITE profiles are:
     - More sensitive
     - More specific
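A PROSITE pattern such as the fragment shown in slide 5 can be scanned against sequences by translating it into a regular expression. The sketch below is a simplified translator, assuming the usual PROSITE conventions (`-` separates positions, `x` is any residue, `[...]` lists allowed residues, `{...}` lists forbidden ones, `(n)` or `(n,m)` is a repeat count, `<` and `>` anchor the termini). Only the part of the pattern visible on the slide is used; the slide elides the rest. The helper name `prosite_to_regex` is hypothetical.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a Python regular expression.

    Simplified sketch: does not handle anchors placed inside brackets.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # forbidden residues
        element = element.replace("(", "{").replace(")", "}")   # repeat counts
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # terminus anchors
        parts.append(element)
    return "".join(parts)

# Visible prefix of the pattern on slide 5 (the remainder is elided there).
regex = re.compile(prosite_to_regex("P-K-R-[PA]-x(1)-[ST]"))

# Ungapped starts of the four sequences from the alignment.
sequences = {
    "chite": "ADKPKRPLSAYMLWLN",
    "wheat": "DPNKPKRAPSAFFVFM",
    "trybr": "KKDSNAPKRAMTSFMF",
    "mouse": "KPKRPRSAYNIYVSES",
}
for name, seq in sequences.items():
    match = regex.search(seq)
    print(name, match.group() if match else "no match")
```

Patterns must be matched against ungapped sequences, since the gap character is not a residue.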

  6. PROSITE profile (see also HMMs)
     A substitution cost for every amino acid, at every position.

     How can I use a multiple alignment?
     [Alignment of chite, wheat, trybr and mouse, as in slide 3]
     Phylogeny [figure: tree of chite, wheat, trybr, mouse]
     - Evolution
     - Paralogy/Orthology
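The idea of "a cost for every amino acid, at every position" can be illustrated with a small position-specific scoring matrix. The sketch below is not how PROSITE builds its profiles: it uses plain log-odds scores with a flat pseudocount and a uniform background (all assumptions), applied to a gap-free slice of the alignment. The names `build_profile` and `score` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(rows, pseudocount=1.0):
    """Return one {amino acid: log-odds score} dict per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    profile = []
    for column in zip(*rows):
        counts = Counter(r for r in column if r != "-")
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def score(profile, segment):
    """Sum the per-position scores of a segment as long as the profile."""
    return sum(column[aa] for column, aa in zip(profile, segment))

# Gap-free slice (columns 7 to 14) of the alignment in slide 3.
rows = ["PKRPLSAY", "PKRAPSAF", "PKRAMTSF", "PKRPRSAY"]
profile = build_profile(rows)

print(round(score(profile, "PKRPLSAY"), 2))  # a member of the family
print(round(score(profile, "GGGGGGGG"), 2))  # an unrelated segment
```

Unlike a pattern, the profile gives every residue a score at every position, so a near miss is penalized rather than rejected outright.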

  7. How can I use a multiple alignment?
     [Alignment of chite, wheat, trybr and mouse, as in slide 3]
     Uses: phylogeny, structure prediction
     Column constraint / Evolution constraint / Structure constraint

     How can I use a multiple alignment?
     [Alignment of chite, wheat, trybr and mouse, as in slide 3]
     Uses: phylogeny, structure prediction
     - PsiPred or PhD for secondary structure prediction: 75% accurate.
     - Threading: is improving but is not yet as good.

  8. How can I use a multiple alignment?
     [Alignment of chite, wheat, trybr and mouse, as in slide 3]
     Uses: phylogeny, structure prediction
     Caution! Automatic multiple sequence alignment methods are not always perfect…

  9. The problem: why is it difficult to compute a multiple sequence alignment?
     - Biology: what is a good alignment?
     - Computation: what is the good alignment?
     [First block of the alignment of chite, wheat, trybr and mouse, as in slide 3]

     The problem: why is it difficult to compute a multiple sequence alignment?
     CIRCULAR PROBLEM…
     [Figure: cycle linking "Good sequences" and "Good alignment"]

  10. The problem
      - Same as the pairwise alignment problem.
      - We do NOT know how sequences evolve.
      - We do NOT understand the relation between structures and sequences.
      - We would NOT recognize the "correct" alignment if we had it IN FRONT of our eyes…

      The Charlie Chaplin paradox

  11. What do I need to know to make a good multiple alignment?
      - How do sequences evolve?
      - How does the computer align the sequences?
      - How can I choose my sequences?
      - What is the best program?
      - How can I use my alignment?

      An alignment is a story
      [Figure labels: Deletion, Insertion, Mutation, Mutations + Selection]
      Sequences shown in the figure:
          ADKPKRPLSAYMLWLN
          ADKPKRPKPRLSAYMLWLN
          ADKPRRPLS-YMLWLN
      Alignment shown in the figure:
          ADKPRRP---LS-YMLWLN
          ADKPKRPKPRLSAYMLWLN
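The "story" told by the two aligned sequences in slide 11 can be read off column by column. As a minimal sketch, the function below counts substitutions and gap runs between two aligned rows. The function name `describe_events` is illustrative. Note the limitation: without the ancestral sequence, a gap run cannot be classified as an insertion in one sequence or a deletion in the other.

```python
def describe_events(top, bottom):
    """Count mismatched columns and runs of gapped columns in a pairwise alignment."""
    mismatches = 0
    gap_runs = 0
    in_gap = False
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            if not in_gap:
                gap_runs += 1   # a new indel event starts here
            in_gap = True
        else:
            in_gap = False
            if a != b:
                mismatches += 1  # a substitution
    return mismatches, gap_runs

# The two aligned rows from slide 11.
top = "ADKPKRPKPRLSAYMLWLN"
bottom = "ADKPRRP---LS-YMLWLN"
mismatches, gap_runs = describe_events(top, bottom)
print(f"{mismatches} substitution(s), {gap_runs} indel event(s)")
```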

  12. Homology
      - Same sequences -> same origin? -> same function? -> same 3D fold?
      [Figure: % sequence identity plotted against length; labels: "Same 3D fold", "Twilight zone" at 30%, length 100]

      Residues and mutations
      - All residues are equal, but some more than others…
      [Figure: Venn diagram of amino acid properties; labels: Small, Aliphatic, Aromatic, Hydrophobic, Polar]
      Accurate matrices are data driven rather than knowledge driven.
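"Data driven" matrices derive substitution scores from residue pairs observed in trusted alignments, typically as a log-odds ratio of observed to expected pair frequencies. The sketch below applies that idea to a toy alignment. It is a simplified illustration under stated assumptions, not the PAM or BLOSUM procedure, and the function name `log_odds_scores` and the toy rows are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def log_odds_scores(rows):
    """Score each observed residue pair as log2(observed / expected frequency)."""
    pair_counts = Counter()
    residue_counts = Counter()
    for column in zip(*rows):
        residues = [r for r in column if r != "-"]
        residue_counts.update(residues)
        for a, b in combinations(residues, 2):
            pair_counts[tuple(sorted((a, b)))] += 1  # unordered pair
    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    scores = {}
    for (a, b), n in pair_counts.items():
        observed = n / total_pairs
        pa = residue_counts[a] / total_residues
        pb = residue_counts[b] / total_residues
        # An unordered pair of different residues can arise in two orders.
        expected = pa * pb if a == b else 2 * pa * pb
        scores[(a, b)] = math.log2(observed / expected)
    return scores

# Toy alignment (hypothetical data, for illustration only).
rows = ["AAK", "AAR", "AAK"]
scores = log_odds_scores(rows)
for pair, value in sorted(scores.items()):
    print(pair, round(value, 2))
```

A positive score means the pair is seen in aligned columns more often than chance predicts, which is the sense in which such a matrix is learned from data rather than from chemical intuition.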
