This Week's Plan: BLAST, Scoring (CSE 527 Computational Biology)

This Week’s Plan
• BLAST
• Scoring
• Weekly Bio Interlude: PCR & Sequencing

CSE 527 Computational Biology
Autumn 2006
Lectures 4-5: BLAST, Alignment score significance, PCR and DNA sequencing

Topoisomerase I
A Protein Structure
http://www.rcsb.org/pdb/explore.do?structureId=1a36

BLAST: Basic Local Alignment Search Tool
Altschul, Gish, Miller, Myers, Lipman, J Mol Biol 1990
• The most widely used comp bio tool
• Which is better: a long mediocre match, or a few nearby, short, strong matches with the same total score?
– score-wise, exactly equivalent
– biologically, the latter may be more interesting, & is common
– at least, if we must miss some, we would rather miss the former
• BLAST is a heuristic emphasizing the latter
– speed/sensitivity tradeoff: BLAST may miss the former, but gains greatly in speed

Sequence Evolution
"Nothing in Biology Makes Sense Except in the Light of Evolution" – Theodosius Dobzhansky, 1973
• Changes happen at random
• Deleterious/neutral/advantageous changes are unlikely/possibly/likely to spread widely in a population
• Changes are less likely to be tolerated in positions involved in many/close interactions, e.g.
– enzyme binding pocket
– protein/protein interaction surface
– …

BLAST: What
• Input:
– a query sequence (say, 300 residues)
– a data base to search for other sequences similar to the query (say, 10^6 - 10^9 residues)
– a score matrix σ(r,s), giving the cost of substituting r for s (& perhaps gap costs)
– various score thresholds & tuning parameters
• Output:
– “all” matches in the data base above threshold
– “E-value” of each

BLAST: How
Idea: find parts of the data base near a good match to some short subword of the query
• Break the query into overlapping words w_i of small fixed length (e.g. 3 aa or 11 nt)
• For each w_i, find (empirically, ~50) “neighboring” words v_ij with score σ(w_i, v_ij) > thresh_1
• Look up each v_ij in the database (via a prebuilt index), i.e., find exact matches to a short, high-scoring word
• Extend each such “seed match” (bidirectional)
• Report those scoring > thresh_2, calculate E-values
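The first and third steps of "BLAST: How" can be sketched in a few lines. This is only an illustration, not BLAST's actual implementation: the function names (overlapping_words, build_index, seed_matches) are made up for this sketch, the index is a plain in-memory dictionary, and the neighboring-word step is skipped, so only exact query words are looked up.

```python
from collections import defaultdict


def overlapping_words(seq, w):
    """Return (offset, word) for every length-w window of seq."""
    return [(i, seq[i:i + w]) for i in range(len(seq) - w + 1)]


def build_index(db, w):
    """Map each length-w word of the database to the offsets where it occurs."""
    index = defaultdict(list)
    for pos, word in overlapping_words(db, w):
        index[word].append(pos)
    return index


def seed_matches(query, db, w):
    """Exact-word seeds as (query_offset, db_offset) pairs.

    Simplification (an assumption of this sketch): neighboring words are
    not generated, so only words that occur verbatim in the query are found.
    """
    index = build_index(db, w)
    return [(qi, di)
            for qi, word in overlapping_words(query, w)
            for di in index.get(word, [])]


if __name__ == "__main__":
    # Query and database taken from the slides' example, word length 2.
    print(seed_matches("deadly", "ddgearlyk", 2))
```

In a real search the index would be built once for the whole database and reused across queries, which is where most of the speed comes from.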

BLOSUM 62

   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4

BLAST: Example
query: deadly
neighboring words scoring ≥ 7 (thresh 1):
  de (11) -> de ee dd dq dk
  ea ( 9) -> ea
  ad (10) -> ad sd
  dl (10) -> dl di dm dv
  ly (11) -> ly my iy vy fy lf
DB: ddgearlyk . . .
hits ≥ 10 (thresh 2):
  ddge   10
  early  18

BLAST Refinements
• “Two hit heuristic”: need 2 nearby, nonoverlapping, gapless hits before trying to extend either
• “Gapped BLAST”: run a heuristic version of Smith-Waterman, bi-directional from the hit, until the score drops by a fixed amount below the max
• PSI-BLAST: for proteins, iterated search, using a “weight matrix” pattern from the initial pass to find weaker matches in subsequent passes

Significance of Alignments
• Is “42” a good score?
• Compared to what?
• Usual approach: compared to a specific “null model”, such as “random sequences”
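The numbers in the "deadly" example can be reproduced with a short script. This is a sketch under a few assumptions: the matrix is the BLOSUM 62 table from the slide pasted in as text, residues are written in upper case, neighbors are found by brute force over all 400 two-letter words (a real implementation would be smarter), and the helper names are made up here.

```python
from itertools import product

# BLOSUM 62 as printed on the slide; rows and columns share one residue order.
BLOSUM62_TEXT = """
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
"""


def parse_matrix(text):
    """Turn the whitespace-separated table into a {(row, col): score} dict."""
    rows = [line.split() for line in text.strip().splitlines()]
    cols = rows[0]
    return {(r[0], c): int(v) for r in rows[1:] for c, v in zip(cols, r[1:])}


SIGMA = parse_matrix(BLOSUM62_TEXT)
ALPHABET = "ARNDCQEGHILKMFPSTWYV"


def word_score(u, v):
    """Ungapped score of two equal-length words: sum of per-position scores."""
    return sum(SIGMA[a, b] for a, b in zip(u, v))


def neighbors(word, thresh):
    """All words v with word_score(word, v) >= thresh, best first (brute force)."""
    cands = ("".join(p) for p in product(ALPHABET, repeat=len(word)))
    hits = [v for v in cands if word_score(word, v) >= thresh]
    return sorted(hits, key=lambda v: -word_score(word, v))


if __name__ == "__main__":
    query = "DEADLY"
    for i in range(len(query) - 1):
        w = query[i:i + 2]
        print(w, word_score(w, w), "->", " ".join(neighbors(w, 7)))
    # Ungapped scores of the two hits against DB "DDGEARLYK".
    print("DDGE", word_score("DEAD", "DDGE"))
    print("EARLY", word_score("EADLY", "EARLY"))
```

The brute-force enumeration is fine for 2- or 3-letter protein words (400 or 8000 candidates) but would not scale to the 11 nt words mentioned earlier.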

Hypothesis Testing: A Very Simple Example
• Given: A coin, either fair (p(H)=1/2) or biased (p(H)=2/3)
• Decide: which
• How? Flip it 5 times. Suppose outcome D = HHHTH
• Null Model/Null Hypothesis M_0: p(H)=1/2
• Alternative Model/Alt Hypothesis M_1: p(H)=2/3
• Likelihoods:
– P(D | M_0) = (1/2) (1/2) (1/2) (1/2) (1/2) = 1/32
– P(D | M_1) = (2/3) (2/3) (2/3) (1/3) (2/3) = 16/243
• Likelihood Ratio: $\frac{p(D \mid M_1)}{p(D \mid M_0)} = \frac{16/243}{1/32} = \frac{512}{243} \approx 2.1$
I.e., the data are ≈ 2.1x more likely under the alt model than under the null model

Hypothesis Testing, II
• Log of likelihood ratio is equivalent, often more convenient
– add logs instead of multiplying…
• “Likelihood Ratio Tests”: reject null if LLR > threshold
– LLR > 0 disfavors null, but a higher threshold gives stronger evidence against it
• Neyman-Pearson Theorem: For a given error rate, LRT is as good a test as any.

p-values
• the p-value of such a test is the probability, assuming that the null model is true, of seeing data as extreme or more extreme than what you actually observed
• e.g., we observed 4 heads; p-value is prob of seeing 4 or 5 heads in 5 tosses of a fair coin
• Why interesting? It measures the probability that we would be making a mistake in rejecting the null.
• Usual scientific convention is to reject null only if p-value is < 0.05; sometimes demand p << 0.05
• can analytically find p-value for simple problems like coins; often turn to simulation/permutation tests for more complex situations; as below

A Likelihood Ratio Test for Alignment
• Defn: two proteins are homologous if they are alike because of shared ancestry; similarity by descent
• suppose among proteins overall, residue x occurs with frequency p_x
• then in a random alignment of 2 random proteins, you would expect to find x aligned to y with prob p_x p_y
• suppose among homologs, x & y align with prob p_xy
• are seqs X & Y homologous? Which is more likely, that the alignment reflects chance or homology? Use a likelihood ratio test:
$\sum_i \log \frac{p_{x_i y_i}}{p_{x_i}\, p_{y_i}}$
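The coin example is small enough to check exactly. The sketch below recomputes the two likelihoods, the likelihood ratio, its base-2 log, and the p-value for "4 or 5 heads in 5 tosses of a fair coin". The function names are made up for illustration, exact fractions are used to avoid rounding, and the choice of log base 2 is an assumption of this sketch (the slide does not fix a base).

```python
from fractions import Fraction
from math import comb, log2


def likelihood(outcome, p_heads):
    """Probability of one exact H/T sequence under independent flips."""
    result = Fraction(1)
    for flip in outcome:
        result *= p_heads if flip == "H" else 1 - p_heads
    return result


def p_value_at_least(k, n, p_heads):
    """P(at least k heads in n flips): the binomial upper tail."""
    return sum(comb(n, j) * p_heads**j * (1 - p_heads)**(n - j)
               for j in range(k, n + 1))


D = "HHHTH"
NULL, ALT = Fraction(1, 2), Fraction(2, 3)

lr = likelihood(D, ALT) / likelihood(D, NULL)           # likelihood ratio
llr = log2(float(lr))                                   # log-likelihood ratio
p_val = p_value_at_least(D.count("H"), len(D), NULL)    # tail prob. under null

if __name__ == "__main__":
    print("P(D|M0) =", likelihood(D, NULL))
    print("P(D|M1) =", likelihood(D, ALT))
    print("LR =", lr, "LLR =", llr)
    print("p-value =", p_val)
```

Here the likelihood ratio favors the biased coin, yet the p-value works out to 6/32, well above 0.05, so five flips are not enough to reject the null under the usual convention.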

Non-ad hoc Alignment Scores
• Take alignments of homologs and look at frequency of x-y alignments vs freq of x, y overall
• Issues
– biased samples
– evolutionary distance
• BLOSUM approach
– large collection of trusted alignments (the BLOCKS DB)
– subsetted by similarity, e.g. BLOSUM62 => 62% identity
– scores: $\frac{1}{\lambda} \log_2 \frac{p_{xy}}{p_x\, p_y}$

ad hoc Alignment Scores?
• Make up any scoring matrix you like
• Somewhat surprisingly, under pretty general assumptions**, it is equivalent to the scores constructed as above from some set of probabilities p_xy, so you might as well understand what they are
** e.g., average scores should be negative, but you probably want that anyway, otherwise local alignments turn into global ones, and some score must be > 0, else best match is empty

BLOSUM 62
(same matrix as on the earlier BLOSUM 62 slide)

Overall Alignment Significance, I
A Theoretical Approach: EVD
• If X_i is a random variable drawn from, say, a normal distribution with mean 0 and std. dev. 1, what can you say about the distribution of y = max{ X_i | 1 ≤ i ≤ N }?
• Answer: it’s approximately an Extreme Value Distribution (EVD)
$P(y \le z) \approx \exp(-K N e^{-\lambda z})$ (*)
• For ungapped local alignment of seqs x, y, N ~ |x|*|y|
• λ, K depend on scores, etc., or can be estimated by curve-fitting random scores to (*). (cf. reading)
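Formula (*) turns directly into a significance calculation. The sketch below evaluates the EVD approximation for a given score. Everything numeric in the usage example is an assumption for illustration: K = 0.13 and lam = 0.32 are placeholder values, not parameters given in the lecture, and the sequence lengths reuse the "say, 300 residues" and "10^6 residues" figures from the BLAST: What slide. The quantity K·N·e^(-λz) is the expected number of chance hits scoring at least z, which is what BLAST reports as an E-value.

```python
from math import exp, expm1


def expected_hits(score, K, N, lam):
    """K * N * exp(-lam * score): expected number of chance hits >= score."""
    return K * N * exp(-lam * score)


def evd_cdf(z, K, N, lam):
    """Formula (*): P(y <= z) ~= exp(-K N e^{-lam z})."""
    return exp(-expected_hits(z, K, N, lam))


def evd_p_value(score, K, N, lam):
    """P(y > score) = 1 - P(y <= score), computed stably via expm1."""
    return -expm1(-expected_hits(score, K, N, lam))


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    K, lam = 0.13, 0.32
    N = 300 * 10**6          # |query| * |database|
    for score in (42, 60, 80):
        print(score,
              expected_hits(score, K, N, lam),
              evd_p_value(score, K, N, lam))
```

For small expected-hit counts the p-value and the expected count are nearly equal, since 1 - e^(-E) ≈ E when E is small; for large counts the p-value saturates at 1.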
