1
Alignments in Practice
BLAST and CLUSTAL
Introduction to Bioinformatics
Dortmund, 16.-20.07.2007
Lectures: Sven Rahmann
Exercises: Udo Feldkamp, Michael Wurst
2
Overview
- Dot Plots
- Nucleotide BLAST
- Protein BLAST
- BLAST Statistics
- BLAT
- CLUSTAL
- JalView
3
Dotter – Tool for Dot Plots
- http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html
- Dotlet: a Java applet for Dot Plots
4
Dot Plots
- Hemoglobin Alpha against Hemoglobin Beta
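A dot plot can be sketched in a few lines. The example below is a minimal illustration, not how Dotter or Dotlet work internally: it marks cell (i, j) whenever the two windows starting at position i of the first sequence and position j of the second are identical. The function names, the `window` parameter, and the toy sequences are assumptions made for this sketch; real tools score windows with a substitution matrix instead of requiring exact identity.

```python
def dot_plot(seq_a, seq_b, window=1):
    """Return a boolean matrix: cell (i, j) is True when the length-`window`
    substrings starting at seq_a[i] and seq_b[j] are identical."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    return [
        [seq_a[i:i + window] == seq_b[j:j + window] for j in range(cols)]
        for i in range(rows)
    ]


def render(matrix):
    """Draw the matrix as text: '*' for a match, '.' otherwise."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in matrix
    )


# Toy sequences (hypothetical, not hemoglobin); diagonals show shared segments.
print(render(dot_plot("HEAGAWGHEE", "PAWHEAE", window=2)))
```

Larger windows suppress random single-residue matches, so diagonals of true similarity stand out more clearly.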
5
EBI Alignment Service
6
BLAST
- URL: http://www.ncbi.nlm.nih.gov/BLAST/
- Basic Local Alignment Search Tool
7
Choose the right BLAST
8
Nucleotide BLAST Interface
9
BLAST Parameters
- Expect threshold:
– low [0.01] = strict
– high [100] = loose
- Word size: speed vs. sensitivity
– high = faster
– low = slower, but more sensitive
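The word-size trade-off can be made concrete with a toy seeding step. This is a simplified sketch, assuming exact word matches only (protein BLAST additionally accepts similar "neighborhood" words); `word_hits` and the two sequences are hypothetical names and data chosen for illustration.

```python
def word_hits(query, subject, word_size):
    """Return (query_pos, subject_pos) pairs where both sequences contain
    the same exact word of length `word_size`."""
    # Index every word of the subject by its start positions.
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    # Look up every word of the query in that index.
    hits = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            hits.append((i, j))
    return hits


query = "ACGTACGTTAGC"          # toy data
subject = "TTACGTAGGTTAGCA"     # toy data
for w in (3, 5, 7):
    print(w, len(word_hits(query, subject, w)))
```

Every hit at a large word size is also a hit at any smaller one, so shorter words can only produce more seeds: more candidates to extend (slower), but fewer true similarities missed (more sensitive).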
10
Protein BLAST
11
Protein BLAST Parameters
12
Translated BLAST
- protein query against nucleotide database
– nucleotide sequence not unique
– also consider reverse complement
- nucleotide query against protein database
– consider all 6 reading frames
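The six reading frames can be sketched as follows: three offsets on the given strand and three on its reverse complement. This is an illustrative sketch, not BLAST's implementation; it assumes the standard genetic code, writes stop codons as `*`, and writes codons containing unknown letters as `X`. All function names are chosen for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Complement each base, then reverse the strand."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; ignore a trailing partial codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Return translations for frames +1, +2, +3, -1, -2, -3."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frames("ATGGCCATTGTAATGGGCCGC").items():
    print(name, protein)
```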
13
BLAST Output
14
BLAST Output II
- Database + accession link
- Bit score
- E-value
- Description
15
BLAST Statistics
- How good / reliable is a hit found by BLAST?
- Raw score :=
score of the alignment according to scoring matrix and gap penalties
- Bit score :=
raw score normalized for the scoring system, expressed in log2 units (bits)
- E-value :=
expected number of hits with this score or better in a hypothetical database of random proteins of the same size
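The relation between the three quantities can be sketched with the Karlin-Altschul formulas: bit score S' = (λS − ln K) / ln 2 and E = m · n · 2^(−S'), where m is the query length and n the database length. The sketch below is a simplified illustration that ignores BLAST's edge-effect (effective length) corrections. The values λ = 0.267 and K = 0.041 are assumptions, commonly quoted for BLOSUM62 with gap costs 11/1; the lengths are made up.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to bits: (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits, query_len, db_len):
    """Expected number of chance hits with at least this bit score."""
    return query_len * db_len * 2.0 ** (-bits)


# Illustrative parameters (assumed, see text above).
LAMBDA, K = 0.267, 0.041
for raw in (40, 80, 160):
    bits = bit_score(raw, LAMBDA, K)
    print(raw, round(bits, 1), e_value(bits, query_len=300, db_len=10**8))
```

Because the bit score already absorbs λ and K, hits scored with different matrices become comparable, and each additional bit halves the E-value.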
16
More on Statistics
- Null model :=
random model describing sequences without intentional signal (here: pair of random sequences without intentional similarity)
- (single) p-value for observed score s :=
Prob(Score >= s) in the null model
- (multiple) p-value :=
Prob(Score >= s at least once)
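Both p-values can be computed directly. A minimal sketch under two stated assumptions: the comparisons are independent, so the multiple p-value is 1 − (1 − p)^n, and the number of chance hits is approximately Poisson-distributed, so an E-value converts to a p-value as 1 − exp(−E). Function names are chosen for this example.

```python
import math


def multiple_p_value(single_p, num_comparisons):
    """Prob(score >= s at least once) over independent comparisons:
    1 - (1 - p)^n, computed stably for small p (requires single_p < 1)."""
    return -math.expm1(num_comparisons * math.log1p(-single_p))


def p_value_from_e_value(e):
    """Poisson approximation: Prob(at least one chance hit) = 1 - exp(-E)."""
    return -math.expm1(-e)


print(multiple_p_value(1e-6, 10**6))
print(p_value_from_e_value(0.01), p_value_from_e_value(5.0))
```

For small E the p-value and the E-value are nearly equal; they diverge only when E approaches or exceeds 1, where the p-value saturates at 1.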
17
BLAT
- BLAST-Like Alignment Tool
- index-based
- developed at UC Santa Cruz
- especially for searching in whole genomes
- very fast
- limited to nearly exact matches
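The index-based idea can be sketched as follows: the genome is indexed once by its non-overlapping k-mers, and each query is then scanned with overlapping k-mers that are looked up in the index. This is a toy illustration of the principle, not BLAT's actual code; the function names, k = 4, and the sequences are assumptions for the example.

```python
from collections import defaultdict


def build_index(genome, k):
    """Index the non-overlapping k-mers of the genome by start position."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def find_seeds(query, index, k):
    """Look up every overlapping k-mer of the query.
    Returns (query_pos, genome_pos) pairs of exact k-mer matches."""
    seeds = []
    for q in range(len(query) - k + 1):
        for g in index.get(query[q:q + k], ()):
            seeds.append((q, g))
    return seeds


genome = "ACGTACGGTTCA"   # toy data
index = build_index(genome, k=4)
print(find_seeds("GGACGGTT", index, k=4))
```

The index is built once and reused for every query, which is where the speed comes from. Requiring exact k-mer hits is also why the approach is limited to nearly exact matches.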
18
UCSC Genome Browser + BLAT
19
CLUSTAL
20
What Clustal Did (“Output file”)
21
Clustal Results (pretty)
22
Clustal Results (“alignment file”)
CLUSTAL W (1.83) multiple sequence alignment


FOS_RAT         MMFSGFNADYEASSSRCSSASPAGDSLSYYHSPADSFSSMGSPVNTQDFCADLSVSSANF 60
FOS_MOUSE       MMFSGFNADYEASSSRCSSASPAGDSLSYYHSPADSFSSMGSPVNTQDFCADLSVSSANF 60
FOS_HUMAN       MMFSGFNADYEASSSRCSSASPAGDSLSYYHSPADSFSSMGSPVNAQDFCTDLAVSSANF 60
FOS_CHICK       MMYQGFAGEYEAPSSRCSSASPAGDSLTYYPSPADSFSSMGSPVNSQDFCTDLAVSSANF 60
FOS_ZEBRAFISH   MMFTSLNADCDASS-RCSTASPSGDSVGYY------------PLNQTQEFTDLSVSSASF 47
                **: .: .: :*.* ***:***:***: **            *:*  :  :**:****.*

FOS_RAT         IPTVTAISTSPDLQWLVQPTLVSSVAPSQTRAPHPYGLPTPS-TGAYARAGVVKTMSGGR 119
FOS_MOUSE       IPTVTAISTSPDLQWLVQPTLVSSVAPSQTRAPHPYGLPTQS-AGAYARAGMVKTVSGGR 119
FOS_HUMAN       IPTVTAISTSPDLQWLVQPALVSSVAPSQTRAPHPFGVPAPS-AGAYSRAGVVKTMTGGR 119
FOS_CHICK       VPTVTAISTSPDLQWLVQPTLISSVAPSQNRG-HPYGVPAPAPPAAYSRPAVLKAP-GGR 118
FOS_ZEBRAFISH   VPTVTAISSCPDLQWMVQP-MISSAAPS-------NGAAQSYNPSSYPKMRVTGAK---- 95
                :*******:.*****:*** ::**.***        * .    ..:*.:  :  :

FOS_RAT         AQSIGRRGKVEQLSPEEEEKRRIRRERNKMAAAKCRNRRRELTDTLQAETDQLEDEKSAL 179
FOS_MOUSE       AQSIGRRGKVEQLSPEEEEKRRIRRERNKMAAAKCRNRRRELTDTLQAETDQLEDEKSAL 179
FOS_HUMAN       AQSIGRRGKVEQLSPEEEEKRRIRRERNKMAAAKCRNRRRELTDTLQAETDQLEDEKSAL 179
FOS_CHICK       GQSIGRRGKVEQLSPEEEEKRRIRRERNKMAAAKCRNRRRELTDTLQAETDQLEEEKSAL 178
FOS_ZEBRAFISH   --TSNKRSRSEQLSPEEEEKKRVRRERSKMAAAKCRNRRRELTDTLQAETDQLEDEKSAL 153
                  : .:*.: **********:*:****.**************************:*****

FOS_RAT         QTEIANLLKEKEKLEFILAAHRPACKIPNDLGFPEE----MSVTS-LDLTGGLPEATTPE 234
FOS_MOUSE       QTEIANLLKEKEKLEFILAAHRPACKIPDDLGFPEE----MSVAS-LDLTGGLPEASTPE 234
FOS_HUMAN       QTEIANLLKEKEKLEFILAAHRPACKIPDDLGFPEE----MSVAS-LDLTGGLPEVATPE 234
FOS_CHICK       QAEIANLLKEKEKLEFILAAHRPACKMPEELRFSEE----LAAATALDLG----APSPAA 230
FOS_ZEBRAFISH   QNDIANLLKEKERLEFILAAHKPICKIPADASFPEPSSSPMSSISVPEIVTTSVVSSTPN 213
                * :*********:********:* **:* :  *.*     ::  :  ::       :..
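The symbols under each alignment block summarize column conservation: `*` for identical residues, `:` for residues from one strongly conserved group, `.` for one weakly conserved group, and a blank otherwise. The sketch below recomputes that line for a set of aligned sequences. The group lists are an assumption, reproduced from memory of the Clustal W documentation, and the function names are chosen for this example.

```python
# Residue groups as assumed for Clustal W (strong -> ':', weak -> '.').
STRONG = ("STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW")
WEAK = ("CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
        "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY")


def column_symbol(column):
    """Return the conservation symbol for one alignment column."""
    residues = set(column)
    if "-" in residues:
        return " "                       # any gap: no symbol
    if len(residues) == 1:
        return "*"                       # fully conserved
    if any(residues <= set(group) for group in STRONG):
        return ":"
    if any(residues <= set(group) for group in WEAK):
        return "."
    return " "


def conservation_line(aligned):
    """Build the symbol line for equally long aligned sequences."""
    return "".join(column_symbol(col) for col in zip(*aligned))


# First 14 columns of the FOS alignment (rat, mouse, human, chick, zebrafish).
block = ["MMFSGFNADYEASS", "MMFSGFNADYEASS", "MMFSGFNADYEASS",
         "MMYQGFAGEYEAPS", "MMFTSLNADCDASS"]
print(repr(conservation_line(block)))  # → '**: .: .: :*.*'
```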
23
Clustal Guide Tree
24
Clustal Guide Tree
- Guide tree is not a phylogenetic tree, just a computational device
- Cladogram: edge lengths have no meaning
- Phylogram: edge lengths correspond to distances
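How a guide tree arises from pairwise distances can be sketched with a simple clustering. Note the substitution: Clustal W builds its guide tree with neighbor joining on distances from pairwise alignments, whereas this sketch uses the simpler UPGMA (average linkage) and returns only the merge order as nested tuples, without edge lengths. The function names and the toy distances are assumptions for illustration.

```python
from itertools import combinations


def distance(a, b):
    """1 - fraction of identical residues over columns without gaps.
    Assumes at least one gap-free column."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    matches = sum(x == y for x, y in pairs)
    return 1.0 - matches / len(pairs)


def upgma(names, dist):
    """names: list of labels; dist: {frozenset({a, b}): distance}.
    Returns the merge order as nested tuples."""
    clusters = {name: 1 for name in names}      # cluster -> number of leaves
    dist = dict(dist)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Distance to the new cluster: size-weighted average of the old ones.
        for other in clusters:
            dist[frozenset((merged, other))] = (
                size_a * dist[frozenset((a, other))]
                + size_b * dist[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


toy = {frozenset(("A", "B")): 0.1,
       frozenset(("A", "C")): 0.5,
       frozenset(("B", "C")): 0.6}
print(upgma(["A", "B", "C"], toy))
```

The resulting tree only fixes the order in which sequences and profiles are aligned, which is why it should not be read as a phylogeny.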
25
JalView: Alignment Editor (start from the CLUSTAL web site)
26
Simple JalView Window
- Simple alignment editor (Java applet)
- Complex alignment editor (Java application)
– Web Start, or
– Download installer
27
Starting or Installing JalView
www.jalview.org
28
Multiple Alignment @ BiBiServ
29
For Windows/Mac: QAlign2
- URL: http://gi.cebitec.uni-bielefeld.de/QAlign/
- Live Demo of QAlign2