The Plan BLAST CSE 427 Scoring Computational Biology Another - PowerPoint PPT Presentation

The Plan • BLAST CSE 427 • Scoring Computational Biology • Another Bio Interlude: PCR & Sequencing BLAST Alignment score significance PCR and DNA sequencing 1 2 A Protein Structure: (Dihydrofolate Reductase) Sequence Evolution Nothing in Biology Makes Sense Except in the Light of Evolution – Theodosius Dobzhansky , 1973 • Changes happen at random • Deleterious/neutral/advantageous changes unlikely/possibly/likely spread widely in a population • Changes are less likely to be tolerated in positions involved in many/close interactions, e.g. – enzyme binding pocket – protein/protein interaction surface – … 3 4 1

BLAST: BLAST: What Basic Local Alignment Search Tool Altschul, Gish, Miller, Myers, Lipman, J Mol Biol 1990 • Input: • The most widely used comp bio tool – a query sequence (say, 300 residues) • Which is better: long mediocre match or a few – a data base to search for other sequences similar to the nearby, short, strong matches with the same total query (say, 10 6 - 10 9 residues) score? – a score matrix σ (r,s), giving cost of substituting r for s (& – score-wise, exactly equivalent perhaps gap costs) – biologically, later may be more interesting, & is common – various score thresholds & tuning parameters – at least, if must miss some, rather miss the former • Output: • BLAST is a heuristic emphasizing the later – “all” matches in data base above threshold – speed/sensitivity tradeoff: BLAST may miss former, but – “E-value” of each gains greatly in speed 5 6 BLAST: How BLAST: Example ≥ 7 (thresh 1 ) Idea: find parts of data base near a good match to some query deadly short subword of the query • Break query into overlapping words w i of small fixed de (11) -> de ee dd dq dk length (e.g. 3 aa or 11 nt) ea ( 9) -> ea • For each w i , find (empirically, ~50) “neighboring” words ad (10) -> ad sd v ij with ungapped score σ (w i , v ij ) > thresh 1 dl (10) -> dl di dm dv • Look up each v ij in database (via prebuilt index) -- ly (11) -> ly my iy vy fy lf i.e., exact match to short, high-scoring word ddgearlyk . . . DB • Extend each such “seed match” (bidirectional) • Report those scoring > thresh 2 , calculate E-values ddge 10 hits ≥ 10 (thresh 2 ) early 18 7 8 2

BLOSUM 62 BLAST Refinements A R N D C Q E G H I L K M F P S T W Y V A 4 -1 -2 -2 0 -1 -1 0 -2 -1 -1 -1 -1 -2 -1 1 0 -3 -2 0 R -1 5 0 -2 -3 1 0 -2 0 -3 -2 2 -1 -3 -2 -1 -1 -3 -2 -3 • “Two hit heuristic” -- need 2 nearby, nonoverlapping, N -2 0 6 1 -3 0 0 0 1 -3 -3 0 -2 -3 -2 1 0 -4 -2 -3 D -2 -2 1 6 -3 0 2 -1 -1 -3 -4 -1 -3 -3 -1 0 -1 -4 -3 -3 gapless hits before trying to extend either C 0 -3 -3 -3 9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 • “Gapped BLAST” -- run heuristic version of Smith- Q -1 1 0 0 -3 5 2 -2 0 -3 -2 1 0 -3 -1 0 -1 -2 -1 -2 E -1 0 0 2 -4 2 5 -2 0 -3 -3 1 -2 -3 -1 0 -1 -3 -2 -2 Waterman, bi-directional from hit, until score drops by G 0 -2 0 -1 -3 -2 -2 6 -2 -4 -4 -2 -3 -3 -2 0 -2 -2 -3 -3 fixed amount below max H -2 0 1 -1 -3 0 0 -2 8 -3 -3 -1 -2 -1 -2 -1 -2 -2 2 -3 I -1 -3 -3 -3 -1 -3 -3 -4 -3 4 2 -3 1 0 -3 -2 -1 -3 -1 3 L -1 -2 -3 -4 -1 -2 -3 -4 -3 2 4 -2 2 0 -3 -2 -1 -2 -1 1 K -1 2 0 -1 -3 1 1 -2 -1 -3 -2 5 -1 -3 -1 0 -1 -3 -2 -2 • PSI-BLAST -- For proteins, iterated search, using M -1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5 0 -2 -1 -1 -1 -1 1 “weight matrix” pattern from initial pass to find weaker F -2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6 -4 -2 -2 1 3 -1 P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7 -1 -1 -4 -3 -2 matches in subsequent passes S 1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4 1 -3 -2 -2 T 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5 -2 -2 0 W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11 2 -3 Y -2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7 -1 10 V 0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4 Hypothesis Testing: Significance of Alignments A Very Simple Example • Is “42” a good score? • Given: A coin, either fair (p(H)=1/2) or biased (p(H)=2/3) • Decide: which • Compared to what? • How? Flip it 5 times. Suppose outcome D = HHHTH • Null Model/Null Hypothesis M 0 : p(H)=1/2 • Usual approach: compared to a specific “null model”, • Alternative Model/Alt Hypothesis M 1 : p(H)=2/3 such as “random sequences” • Likelihoods: – P(D | M 0 ) = (1/2) (1/2) (1/2) (1/2) (1/2) = 1/32 – P(D | M 1 ) = (2/3) (2/3) (2/3) (1/3) (2/3) = 16/243 p ( D | M 1 ) p ( D | M 0 ) = 16/ 243 1/ 32 = 512 243 � 2.1 • Likelihood Ratio: I.e., alt model is ≈ 2.1x more likely than null model, given data 11 12 3

Hypothesis Testing, II p-values • Log of likelihood ratio is equivalent, often more • the p-value of such a test is the probability, assuming that the null model is true, of seeing data as extreme or more extreme convenient that what you actually observed – add logs instead of multiplying… • e.g., we observed 4 heads; p-value is prob of seeing 4 or 5 • “Likelihood Ratio Tests”: reject null if LLR > threshold heads in 5 tosses of a fair coin – LLR > 0 disfavors null, but higher threshold gives stronger • Why interesting? It measures probability that we would be evidence against making a mistake in rejecting null. • Neyman-Pearson Theorem: For a given error rate, • Usual scientific convention is to reject null only if p-value is < 0.05; sometimes demand p << 0.05 LRT is as good a test as any (subject to some fine print). • can analytically find p-value for simple problems like coins; often turn to simulation/permutation tests for more complex situations; as below 13 14 A Likelihood Ratio Test for Alignment Non- ad hoc Alignment Scores • Take alignments of homologs and look at frequency • Defn: two proteins are homologous if they are alike because of of x-y alignments vs freq of x, y overall shared ancestry; similarity by descent • Issues • suppose among proteins overall, residue x occurs with frequency p x – biased samples • then in a random alignment of 2 random proteins, you would expect – evolutionary distance p x y to find x aligned to y with prob p x p y 1 • suppose among homologs , x & y align with prob p xy � log 2 • BLOSUM approach p x p y • are seqs X & Y homologous? Which is – large collection of trusted alignments more likely, that the alignment reflects log p x i y i (the BLOCKS DB), � chance or homology? Use a likelihood – subsetted by similarity, e.g. ratio test. BLOSUM62 => 62% identity p x i p y i i – e.g. http://blocks.fhcrc.org/blocks-bin/getblock.pl?IPB013598 15 16 4

The Plan BLAST CSE 427 Scoring Computational Biology Another - PowerPoint PPT Presentation

The Plan BLAST CSE 427 Scoring Computational Biology Another Bio Interlude: PCR & Sequencing BLAST Alignment score significance PCR and DNA sequencing 1 2 A Protein Structure: (Dihydrofolate Reductase) Sequence Evolution

PHASE IA PLAN ULTIMATE PLAN 13 PHASE IB PLAN ULTIMATE PLAN 14 ULTIMATE PLAN ULTIMATE PLAN

NEW COURTHOUSE 1 ST FLOOR PLAN ANNEX 3rd FLOOR PLAN 2nd FLOOR PLAN BASEMENT FLOOR PLAN ANNEX

Medical Plan Comparison Central Care Plan Medical / Prescription Benefit Summary Advantage HDHP/HSA

Master Plan Open House #3 Preferred Alternative Master Plan Master Plan Process What is a

Site Plan May 2009 Site Plan February 2010 Site Plan May 5, 2010 Site Plan

To Plan or Not To Plan Failure to plan is a plan to fail Questions to ask yourself Who is this

Retirement Plan Changes Update Overview Retirement Plan Options at W&M No changes to

Impr Improvement ement Plan Orienta Plan Orientation tion Building Your Plan for Academic

San Pedro Community Plan San Pedro Community Plan Presentation Overview Community Plan

Local Development Plan Local Development Plan Local Development Plan Local Development Plan

Project Area Vilas Park Vilas Park Master Plan Vilas Park Master Plan Plan Maestro De Vilas Park

( ) + ( ) + ( ) + What is an Area Plan ( ) + What is an Area Plan A Comprehensive

803 SOUTH AVENUE AERIAL MAP SITE PLAN CRAWL SPACE & FOUNDATION PLAN FIRST FLOOR PLAN

Site plan Refused application Proposed Block plan Proposed site layout plan N Ground floor

DISTRICT PLANS 2 Financial Plan Instructional Plan Strategic Plan STRATEGIC PLAN GOALS 3

MESA/POWDERHORN PLAN UPDATE Meeting 3 October 23, 2012 Agenda 2 Plan Purpose Plan Area

Defining Defining Material and Geometry Material and Geometry Makoto Asai (SLAC Computing

Childrens Services Needs Analysis Evidence Base The APA Self Assessment and outcomes The

THE LOCAL STRATEGY TO IMPROVE LANCASTER COUNTYS WATER QUALITY January 16, 2019 Local plan

Partnership Strategies Amja jath Hayath Head - Procurement STAR CEMENT UAE

A Probabilistic Approach to Description of Molecular Biological Processes on DNA and Their

A Simple Method for Preparing Reference Slides of Seed Article in Journal of Range Management

Agilistas are killing the planet is Agile, Scrum, kanban or

Is It Latency or Do You Just Suck? Latency Can Kill: Precision and Deadline in Online Games

The Plan BLAST CSE 427 Scoring Computational Biology Another - PowerPoint PPT Presentation

The Plan BLAST CSE 427 Scoring Computational Biology Another Bio Interlude: PCR & Sequencing BLAST Alignment score significance PCR and DNA sequencing 1 2 A Protein Structure: (Dihydrofolate Reductase) Sequence Evolution

PHASE IA PLAN ULTIMATE PLAN 13 PHASE IB PLAN ULTIMATE PLAN 14 ULTIMATE PLAN ULTIMATE PLAN

NEW COURTHOUSE 1 ST FLOOR PLAN ANNEX 3rd FLOOR PLAN 2nd FLOOR PLAN BASEMENT FLOOR PLAN ANNEX

Medical Plan Comparison Central Care Plan Medical / Prescription Benefit Summary Advantage HDHP/HSA

Master Plan Open House #3 Preferred Alternative Master Plan Master Plan Process What is a

Site Plan May 2009 Site Plan February 2010 Site Plan May 5, 2010 Site Plan

To Plan or Not To Plan Failure to plan is a plan to fail Questions to ask yourself Who is this

Retirement Plan Changes Update Overview Retirement Plan Options at W&amp;M No changes to

Impr Improvement ement Plan Orienta Plan Orientation tion Building Your Plan for Academic

San Pedro Community Plan San Pedro Community Plan Presentation Overview Community Plan

Local Development Plan Local Development Plan Local Development Plan Local Development Plan

Project Area Vilas Park Vilas Park Master Plan Vilas Park Master Plan Plan Maestro De Vilas Park

( ) + ( ) + ( ) + What is an Area Plan ( ) + What is an Area Plan A Comprehensive

803 SOUTH AVENUE AERIAL MAP SITE PLAN CRAWL SPACE &amp; FOUNDATION PLAN FIRST FLOOR PLAN

Site plan Refused application Proposed Block plan Proposed site layout plan N Ground floor

DISTRICT PLANS 2 Financial Plan Instructional Plan Strategic Plan STRATEGIC PLAN GOALS 3

MESA/POWDERHORN PLAN UPDATE Meeting 3 October 23, 2012 Agenda 2 Plan Purpose Plan Area

Defining Defining Material and Geometry Material and Geometry Makoto Asai (SLAC Computing

Childrens Services Needs Analysis Evidence Base The APA Self Assessment and outcomes The

THE LOCAL STRATEGY TO IMPROVE LANCASTER COUNTYS WATER QUALITY January 16, 2019 Local plan

Partnership Strategies Amja jath Hayath Head - Procurement STAR CEMENT UAE

A Probabilistic Approach to Description of Molecular Biological Processes on DNA and Their

A Simple Method for Preparing Reference Slides of Seed Article in Journal of Range Management

Agilistas are killing the planet is Agile, Scrum, kanban or

Is It Latency or Do You Just Suck? Latency Can Kill: Precision and Deadline in Online Games

Retirement Plan Changes Update Overview Retirement Plan Options at W&M No changes to

803 SOUTH AVENUE AERIAL MAP SITE PLAN CRAWL SPACE & FOUNDATION PLAN FIRST FLOOR PLAN