Heuristic search: FastA and BLAST
COMPSCI 260, Spring 2016

Previous lectures
• Global alignment
• Local alignment
• Dynamic programming algorithms: O(mn) time

Database searches
• O(mn) algorithms are very efficient for aligning a single pair of sequences
• But they are too slow for searching large databases of DNA or protein sequences

[Figure: NCBI genomic data]

Database searches
• INPUT:
  – Database
  – Query: RGIKW
• OUTPUT:
  – Sequences similar to the query
• What does "similar" mean?

Heuristics
• Heuristic methods can be used to perform fast approximations
  – Tradeoff: accuracy of solution vs. speed
• We have seen another tradeoff:
  – Speed vs. space
• Popular heuristic algorithms are the result of our willingness to let the solution lose accuracy in return for a speedup
• Loss of accuracy: false positives or (more commonly) false negatives

Heuristics
• False positives are results that the algorithm returns as successful (in our case, as high-scoring alignments) but that really aren't
• False negatives are just the opposite: truly high-scoring alignments that the algorithm overlooks
• Tradeoff: sensitivity vs. selectivity

The history of the alignment problem
• 1970, global alignment: Needleman-Wunsch algorithm (linear gap)
• 1982, global alignment: affine gap (Gotoh's solution in quadratic time)
• 1981, exact local alignment: Smith-Waterman algorithm
• 1985, heuristic local alignment: FastN + FastP = FastA
• 1990, heuristic local alignment: BLAST 1.0 (Basic Local Alignment Search Tool)
• 1997, heuristic local alignment: BLAST 2.0 (Gapped BLAST)
• The heuristic methods run much faster, at the expense of possibly missing some significant hits (i.e., getting false negatives) due to the heuristics employed

Heuristic local alignment
• Heuristic local alignment algorithms are usually seed-and-extend approaches: small exact matches are found, which are then extended to obtain long inexact matches
• Preprocessing: for every W-mer (e.g. W = 3), list every location in the database where it occurs
  – W-mer: a string of length W
• Query:
  – Generate the W-mers of the query and look them up in the database
  – Process the results to obtain longer, inexact matches
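The preprocessing and lookup steps above can be sketched in a few lines of Python. This is a minimal illustration, not the data structure of any particular tool: the function names `build_wmer_index` and `find_seeds` and the two toy database sequences are assumptions made up for the example; only the query RGIKW and W = 3 come from the slides.

```python
from collections import defaultdict

def build_wmer_index(database, w):
    """Preprocessing: map every W-mer to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in database.items():
        for pos in range(len(seq) - w + 1):
            index[seq[pos:pos + w]].append((seq_id, pos))
    return index

def find_seeds(query, index, w):
    """Query: look up each W-mer of the query; return (query offset, sequence id, db offset)."""
    seeds = []
    for q_pos in range(len(query) - w + 1):
        for seq_id, db_pos in index.get(query[q_pos:q_pos + w], []):
            seeds.append((q_pos, seq_id, db_pos))
    return seeds

# Hypothetical toy database, for illustration only.
database = {"seq1": "MKRGIKWLA", "seq2": "GIKRGA"}
index = build_wmer_index(database, w=3)
seeds = find_seeds("RGIKW", index, w=3)
for seed in seeds:
    print(seed)
```

Each seed is only an exact match of length W; the "extend" half of seed-and-extend (turning seeds into longer, inexact matches) is not shown here.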

FastA
• FastA (which stands for Fast-All) is a combination of FastN (nucleotide) and FastP (protein)
• It was the first good heuristic local alignment program, and it was capable of finding
  – DNA:DNA
  – DNA:protein (by inferring translation) and
  – protein:protein alignments
• The original paper: David J. Lipman and William R. Pearson, "Rapid and sensitive protein similarity searches" (Science, 1985)

FastA paper: Abstract
An algorithm was developed which facilitates the search for similarities between newly determined amino acid sequences and sequences already available in databases. Because of the algorithm's efficiency on many microcomputers, sensitive protein database searches may now become a routine procedure for molecular biologists. The method efficiently identifies regions of similar sequence and then scores the aligned identical and differing residues in those regions by means of an amino acid replaceability matrix. This matrix increases sensitivity by giving high scores to those amino acid replacements which occur frequently in evolution. The algorithm has been implemented in a computer program designed to search protein databases very rapidly. For example, comparison of a 200-amino-acid sequence to the 500,000 residues in the National Biomedical Research Foundation library would take less than 2 minutes on a minicomputer, and less than 10 minutes on a microcomputer (IBM PC).

FastA
• General idea:
  – Choose regions of the two sequences that look promising (have some degree of similarity)
  – Compute local alignment using dynamic programming in these regions
• Assumption: a good alignment probably has some exact matches
• The algorithm treats these exact matches as anchors or seeds of a larger alignment with some gaps

FastA
• Assumption: a good alignment probably has some exact matches
• Is this true?
• Two sequences of 9 aa each, with 7 identities
  – There must be a stretch of 3 aa perfectly conserved
• Two sequences of 9 aa with at most 1 mismatch
  – There must be a stretch of 4 aa perfectly conserved
• More generally: pigeonhole principle
  – If you have 2 pigeons and 3 holes, there must be at least one hole with no pigeon
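The two 9-aa claims above follow from the same counting argument: m mismatches cut an ungapped alignment of length n into at most m + 1 exact runs, so the longest run has at least ceil((n - m) / (m + 1)) residues. The sketch below checks that bound by brute force; the function names are hypothetical and the check assumes an ungapped alignment of two equal-length sequences.

```python
from itertools import combinations

def guaranteed_run(length, mismatches):
    """Pigeonhole bound: ceil(matches / (mismatches + 1))."""
    matches = length - mismatches
    segments = mismatches + 1
    return -(-matches // segments)  # ceiling division on integers

def worst_case_run(length, mismatches):
    """Brute force: smallest possible 'longest exact run' over all mismatch placements."""
    worst = length
    for positions in combinations(range(length), mismatches):
        longest = run = 0
        for i in range(length):
            run = 0 if i in positions else run + 1
            longest = max(longest, run)
        worst = min(worst, longest)
    return worst

print(guaranteed_run(9, 2), worst_case_run(9, 2))  # 7 identities out of 9
print(guaranteed_run(9, 1), worst_case_run(9, 1))  # 1 mismatch out of 9
```

For both slide examples the brute-force worst case equals the bound, which is why a ktup of 3 (respectively 4) cannot miss such a pair.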

FastA – Step 1
• The algorithm begins by looking for occurrences of exact matches of length k between the two sequences
• These are referred to as k-tuples, and their length is set by the FastA parameter ktup
• This search is a relatively fast one
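One common way to make this search fast is to hash the k-tuples of one sequence and stream the other sequence past the table. The sketch below does that for a single pair of sequences; the function name `ktuple_hits` and the toy sequences are assumptions for illustration, not part of the FastA program.

```python
from collections import defaultdict

def ktuple_hits(x, y, ktup):
    """Return every (i, j) with x[i:i+ktup] == y[j:j+ktup] (0-based offsets)."""
    positions = defaultdict(list)
    for j in range(len(y) - ktup + 1):
        positions[y[j:j + ktup]].append(j)   # lookup table built once over y
    hits = []
    for i in range(len(x) - ktup + 1):
        for j in positions.get(x[i:i + ktup], []):
            hits.append((i, j))
    return hits

hits = ktuple_hits("ACGTAC", "TACG", ktup=2)
print(hits)
```

Building the table takes time linear in the length of y, and the scan over x costs one lookup per position plus the number of hits reported.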

FastA – Step 2
• Score diagonals with k-word matches, to identify the 10 best diagonals
• Construct an (m × n) grid and place a dot at every (i, j) that begins a matching k-tuple
• The end result will be a table with some runs of diagonal dots
• This table is called a dot matrix
• Why diagonal?
  – Consider the case where k = 2 and we have a match of 5 characters in a row starting at position (i, j). Not only will there be a dot in the (i, j) position, but also in the (i + 1, j + 1), (i + 2, j + 2) and (i + 3, j + 3) positions. This will result in a run of 4 diagonal dots.
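The "run of 4 diagonal dots" can be reproduced on a small example. The sketch below builds the full dot matrix for k = 2 on two made-up DNA strings that share the 5-character stretch ACGTA; the sequences and the helper names `dot_matrix` and `render` are assumptions for illustration.

```python
def dot_matrix(x, y, k):
    """Boolean grid: cell (i, j) is True when x[i:i+k] == y[j:j+k]."""
    rows = len(x) - k + 1
    cols = len(y) - k + 1
    return [[x[i:i + k] == y[j:j + k] for j in range(cols)] for i in range(rows)]

def render(matrix):
    """Draw the grid with '*' for a dot and '.' for an empty cell."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)

# Both strings contain ACGTA starting at offset 2 (hypothetical toy data).
matrix = dot_matrix("GGACGTATT", "CCACGTACC", k=2)
print(render(matrix))
dots = [(i, j) for i, row in enumerate(matrix) for j, cell in enumerate(row) if cell]
```

The dots at (2, 2), (3, 3), (4, 4) and (5, 5) form the diagonal run; the isolated dot at (2, 6) is a chance match of the 2-tuple AC.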

[Figure: dot matrix, k = 1 (DNA sequences of size 1875 and 2013 bp)]

FastA – Step 2
• Score diagonals with k-word matches, to identify the 10 best diagonals
• Construct an (m × n) grid and place a dot at every (i, j) that begins a matching k-tuple
• The contiguous diagonal runs of dots in the dot plot represent exact matches, so we want to find long diagonal runs
• We want to do this very efficiently (time and space)
• It can be done without generating the entire matrix
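One way to avoid the full matrix is to note that every dot (i, j) lies on the diagonal d = i - j, so it is enough to keep one counter per diagonal. The sketch below uses the plain number of k-tuple hits as the diagonal score, which is a simplifying assumption; the function name `best_diagonals` and the toy sequences are also made up for the example.

```python
from collections import Counter, defaultdict

def best_diagonals(x, y, k, top=10):
    """Count k-tuple hits per diagonal d = i - j and return the `top` best as (d, count)."""
    positions = defaultdict(list)
    for j in range(len(y) - k + 1):
        positions[y[j:j + k]].append(j)
    counts = Counter()
    for i in range(len(x) - k + 1):
        for j in positions.get(x[i:i + k], []):
            counts[i - j] += 1       # one counter per diagonal, no m x n grid
    return counts.most_common(top)

diagonals = best_diagonals("GGACGTATT", "CCACGTACC", k=2)
print(diagonals)
```

Memory use is proportional to the number of diagonals that receive at least one hit (at most m + n - 1), rather than to m × n.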

FastA – Outline
1. Identify common k-words between X and Y
2. Score diagonals with k-word matches, to identify the 10 best diagonals
3. Rescore initial regions with a substitution matrix
  – Each of the ten diagonal runs with highest scores (identified in step 2) is further processed
  – Within each of these diagonal runs, an optimal local alignment is computed using a substitution matrix
  – These alignments are called initial regions
  – The score of the best sub-alignment found in this phase is reported as init1
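Step 3 can be pictured as a maximum-scoring-segment search along one diagonal: walk the diagonal, add up substitution scores, and restart whenever the running total drops to zero or below. The sketch below assumes an ungapped segment and a toy scoring function (+2 match, -1 mismatch) standing in for a real substitution matrix such as PAM or BLOSUM; `rescore_diagonal` and `toy_score` are hypothetical names.

```python
def toy_score(a, b):
    """Stand-in for a substitution matrix lookup (assumption: +2 match, -1 mismatch)."""
    return 2 if a == b else -1

def rescore_diagonal(x, y, diagonal, score):
    """Best-scoring ungapped segment on diagonal d = i - j.

    Returns (best score, start offset in x, end offset in x, exclusive)."""
    i = max(diagonal, 0)
    j = i - diagonal
    best = (0, i, i)
    running, start = 0, i
    while i < len(x) and j < len(y):
        running += score(x[i], y[j])
        if running <= 0:
            running, start = 0, i + 1    # restart after a non-positive prefix
        elif running > best[0]:
            best = (running, start, i + 1)
        i += 1
        j += 1
    return best

init1, start, end = rescore_diagonal("GGACGTATT", "CCACGTACC", 0, toy_score)
print(init1, "GGACGTATT"[start:end])
```

Applying this to each of the ten best diagonals and taking the maximum would give a value playing the role of init1 in this simplified setting.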

FastA – Outline
1. Identify common k-words between X and Y
2. Score diagonals with k-word matches, to identify the 10 best diagonals
3. Rescore initial regions with a substitution matrix
4. Join initial regions using gaps

FastA – Step 4: Join initial regions using gaps
• Two offset diagonals can be joined with a gap, if the resulting alignment has a higher score
  – Separate gap open and gap extension penalties are used (affine gap)
  – Idea: find the best-scoring combination of diagonals
  – The score of this alignment is reported as initn
  – initn is used to rank the library sequences
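Finding the best-scoring combination of diagonals can be viewed as a chaining problem: order the initial regions, then use a small dynamic program in which each region either starts a chain or extends the best compatible earlier chain, paying a gap penalty for the diagonal shift. The sketch below is one possible formulation under stated assumptions, not FastA's actual joining rule: regions are ungapped segments given as (start_i, start_j, length, score), the join penalty is gap_open + gap_extend × (diagonal shift), and the residues between two joined regions are not rescored.

```python
def join_regions(regions, gap_open, gap_extend):
    """Best chain score over initial regions (a stand-in for initn).

    regions: list of (start_i, start_j, length, score) ungapped diagonal segments."""
    order = sorted(regions, key=lambda r: (r[0], r[1]))
    best = [r[3] for r in order]               # best chain ending at each region
    for b, (bi, bj, blen, bscore) in enumerate(order):
        for a in range(b):
            ai, aj, alen, _ = order[a]
            # Region a must end before region b starts in both sequences.
            if ai + alen <= bi and aj + alen <= bj:
                shift = abs((bi - bj) - (ai - aj))
                penalty = gap_open + gap_extend * shift if shift else 0
                best[b] = max(best[b], best[a] + bscore - penalty)
    return max(best, default=0)

# Hypothetical initial regions: the first two are compatible, the third is not.
regions = [(0, 0, 5, 10), (8, 6, 4, 8), (20, 2, 3, 6)]
initn = join_regions(regions, gap_open=4, gap_extend=1)
print(initn)
```

Here joining the first two regions costs 4 + 1 × 2 = 6 and yields 10 + 8 - 6 = 12, which beats either region alone, so the join is accepted.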

FastA – Outline
1. Identify common k-words between X and Y
2. Score diagonals with k-word matches, to identify the 10 best diagonals
3. Rescore initial regions with a substitution score matrix
4. Join initial regions using gaps
5. Perform dynamic programming to find the final alignments

FastA – Step 5: Local alignment in the highest-scoring region
• Last step of FastA: perform local alignment using dynamic programming around the highest-scoring region
• Do we fill in the entire DP matrix (Smith-Waterman)?
• No, we can apply banded Smith-Waterman

FastA – Step 5: Local alignment in the highest-scoring region
• Banded Smith-Waterman
• Idea: a high-quality alignment will stay close to the diagonal
  – If we are only interested in high-quality alignments, we can skip filling in cells that can't possibly lead to a high-quality alignment
• Region to be aligned covers ±w characters from the highest-scoring diagonals
• With long sequences, this region is typically very small compared to the whole n × m matrix
• The score of the resulting alignment is reported as opt
[Figure: the dynamic programming matrix is filled only for the green region]
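The banded recurrence can be sketched as ordinary Smith-Waterman in which only cells with |(i - j) - d| ≤ w are filled, where d is the chosen diagonal. The version below is a minimal illustration under simplifying assumptions: a single diagonal, a linear gap penalty instead of affine gaps, a toy +2/-1 scoring function in place of a substitution matrix, and score only (no traceback). The names `banded_smith_waterman` and `toy_score` are hypothetical.

```python
def toy_score(a, b):
    """Stand-in for a substitution matrix lookup (assumption: +2 match, -1 mismatch)."""
    return 2 if a == b else -1

def banded_smith_waterman(x, y, diagonal, w, score, gap):
    """Best local alignment score using only cells with |(i - j) - diagonal| <= w.

    i and j are 1-based DP indices; gap is a positive linear gap penalty."""
    minus_inf = float("-inf")
    prev = {}                                  # row i - 1, in-band columns only
    best = 0
    for i in range(1, len(x) + 1):
        curr = {}
        lo = max(1, i - diagonal - w)
        hi = min(len(y), i - diagonal + w)
        for j in range(lo, hi + 1):
            # The diagonal predecessor is in the band or on the zero border.
            diag = prev.get(j - 1, 0) + score(x[i - 1], y[j - 1])
            up = prev.get(j, minus_inf) - gap        # may fall outside the band
            left = curr.get(j - 1, minus_inf) - gap  # may fall outside the band
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best

opt_narrow = banded_smith_waterman("ACGTTACG", "ACGTACG", 0, 0, toy_score, gap=2)
opt_band = banded_smith_waterman("ACGTTACG", "ACGTACG", 0, 1, toy_score, gap=2)
print(opt_narrow, opt_band)
```

With w = 0 only the main diagonal is filled and no gap is possible; widening the band to w = 1 lets the alignment skip the extra T with one gap. Each row touches at most 2w + 1 cells, so the work is O(w · n) instead of O(mn).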

Properties of FastA
• Fast compared to local alignment using dynamic programming only
  – Only a narrow region of the full matrix is aligned
• For DNA sequence comparisons, the ktup parameter can range from 1 to 6
• Increasing ktup decreases the number of hits
  – Increases specificity (the method does not produce many incorrect results)
  – Decreases sensitivity (produces fewer of the correct results)

Properties of FastA
• FastA looks for initial exact matches to the query sequence
  – But two proteins can have very different amino acid sequences and still be biologically similar
  – This may lead to a lack of sensitivity for diverged sequences
• FastA determines a highest-scoring region, not all high-scoring alignments between two sequences. Hence, it may miss instances of repeats or multiple domains shared by two proteins
