Profile HMMs for Sequence Families
COMP 571
Luay Nakhleh, Rice University

Sequence Families
Functional biological sequences typically come in families.
Sequences in a family have diverged during evolution, but normally maintain the same or a related function.
Thus, identifying that a sequence belongs to a family tells us something about its function.

HMM Profile
Consensus modeling of the family using a probabilistic model.
Built from a given multiple alignment (assumed to be correct).

Sequences from a Globin Family
[Figure: alignment of 7 globins. The 8 alpha helices are shown as A-H above the alignment.]

Ungapped Score Matrices
A natural probabilistic model for a conserved region would be to specify independent probabilities $e_i(a)$ of observing amino acid $a$ in position $i$.
The probability of a new sequence $x$ according to this model is

$$P(x \mid M) = \prod_{i=1}^{L} e_i(x_i)$$
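The product above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code: the function name `ungapped_probability`, the three-letter alphabet, and the emission values are all made up for the example.

```python
from math import prod

def ungapped_probability(x, emissions):
    """Return P(x | M) as the product over positions i of e_i(x_i).

    emissions[i] maps each residue a to e_i(a); residues missing from
    the table are treated as having probability 0.
    """
    if len(x) != len(emissions):
        raise ValueError("sequence length must equal model length")
    return prod(e_i.get(a, 0.0) for e_i, a in zip(emissions, x))

# Hypothetical two-position model over a toy three-letter alphabet.
toy_emissions = [
    {"A": 0.7, "C": 0.2, "G": 0.1},
    {"A": 0.1, "C": 0.8, "G": 0.1},
]
p_ac = ungapped_probability("AC", toy_emissions)  # 0.7 * 0.8
```

For realistic model lengths the product underflows quickly, which is one practical reason to work with sums of logarithms instead.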

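Log-odds Ratio
We are interested in the ratio of the probability of $x$ under the model to the probability of $x$ under the random model, which gives the log-odds score

$$S = \sum_{i=1}^{L} \log \frac{e_i(x_i)}{q_{x_i}}$$

where $q_a$ is the probability of residue $a$ under the random model.
The values $\log \frac{e_i(a)}{q_a}$ form a position specific score matrix (PSSM).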
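A small sketch of how such a score could be computed, assuming strictly positive emission probabilities and a uniform background. The helper names `build_pssm` and `log_odds_score` and all numbers are hypothetical.

```python
from math import log

def build_pssm(emissions, background):
    """Turn per-position emissions e_i(a) into log-odds scores log(e_i(a) / q_a)."""
    return [{a: log(p / background[a]) for a, p in e_i.items()}
            for e_i in emissions]

def log_odds_score(x, pssm):
    """Sum the position specific scores along an ungapped sequence x."""
    if len(x) != len(pssm):
        raise ValueError("sequence length must equal PSSM length")
    return sum(column[a] for column, a in zip(pssm, x))

# Toy model and uniform background over a three-letter alphabet (assumed values).
toy_emissions = [
    {"A": 0.7, "C": 0.2, "G": 0.1},
    {"A": 0.1, "C": 0.8, "G": 0.1},
]
toy_background = {"A": 1 / 3, "C": 1 / 3, "G": 1 / 3}

pssm = build_pssm(toy_emissions, toy_background)
score_ac = log_odds_score("AC", pssm)  # log(2.1) + log(2.4)
score_gg = log_odds_score("GG", pssm)  # negative: worse than random
```

A positive score means the sequence is more probable under the model than under the random model; a negative score means the opposite.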

Adding Indels to Obtain a Profile HMM
[Figure: profile HMM architecture with match states, insertion states, and "silent" deletion states.]
Profile HMMs generalize pairwise alignment.

Deriving Profile HMMs from Multiple Alignments
Essentially, we want to build a model representing the consensus sequence for a family, rather than the sequence of any particular member.
Two approaches: non-probabilistic profiles and profile HMMs.

Non-probabilistic Profiles
Gribskov, McLachlan, and Eisenberg (1987).
There is no underlying probabilistic model; instead, position specific scores are assigned for each match state and gap penalty.
The score for each consensus position is set to the average of the standard substitution scores from all the residues in the corresponding multiple sequence alignment column.

Non-probabilistic Profiles
The score for residue $a$ in column 1 is the average of $s(a,b)$ over the residues $b$ observed in that column, where $s(a,b)$ is a standard substitution matrix.

Non-probabilistic Profiles
They also set gap penalties for each column using a heuristic equation that decreases the cost of a gap according to the length of the longest gap observed in the multiple alignment spanning the column.

Problem with the Approach
If we had an alignment with 100 sequences, all with a cysteine (C) at some position, the probability distribution for that column for an "average" profile would be exactly the same as would be derived from a single sequence.
This doesn't correspond to our expectation that the likelihood of a cysteine should go up as we see more confirming examples.
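The effect is easy to reproduce with a toy calculation. The sketch below assumes the column score is a plain average of substitution scores over non-gap residues; the function name and the small substitution table are hypothetical stand-ins for a real matrix.

```python
def average_profile_score(a, column, s, gap="-"):
    """Average substitution score of residue a against the residues in one column."""
    residues = [b for b in column if b != gap]
    if not residues:
        raise ValueError("column contains only gaps")
    return sum(s[(a, b)] for b in residues) / len(residues)

# Hypothetical symmetric substitution scores for two residues.
toy_s = {("C", "C"): 9, ("C", "S"): -1, ("S", "C"): -1, ("S", "S"): 4}

score_from_one = average_profile_score("C", ["C"], toy_s)
score_from_hundred = average_profile_score("C", ["C"] * 100, toy_s)
# Both values are identical: averaging discards how many sequences agree.
```

Because the average is normalized by the number of residues, the amount of evidence behind a column never enters the score.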

Similar Problem with Gaps
Scores for a deletion in columns 2 and 4 would be set to the same value.
It is more reasonable to set the probability of a new gap opening to be higher in column 4.

Basic Profile HMM Parameterization
A profile HMM defines a probability distribution over the whole space of sequences.
The aim of parameterization is to make this distribution peak around members of the family.
Parameters: the probabilities and the length of the model.

Model Length
A simple rule that works well in practice is that columns that are more than half gap characters should be modeled by inserts.
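The half-gap rule can be sketched as follows. The function name `match_columns`, the `-` gap symbol, and the toy alignment are assumptions made for illustration; a column with exactly half gaps is kept as a match column here, since the rule only speaks of "more than half".

```python
def match_columns(alignment, gap="-"):
    """Mark each alignment column as match (True) or insert (False).

    A column is treated as an insert column when more than half of its
    characters are gaps.
    """
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    flags = []
    for c in range(n_cols):
        n_gaps = sum(1 for row in alignment if row[c] == gap)
        flags.append(2 * n_gaps <= n_seqs)
    return flags

toy_alignment = ["AC-G", "A--G", "ACTG", "A--G"]
flags = match_columns(toy_alignment)  # column 3 has 3 gaps out of 4
model_length = sum(flags)             # number of match states
```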

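Probability Values
The transition and emission probabilities are estimated from the observed frequencies:

$$a_{k\ell} = \frac{A_{k\ell}}{\sum_{\ell'} A_{k\ell'}}, \qquad e_k(a) = \frac{E_k(a)}{\sum_{a'} E_k(a')}$$

$k, \ell$: indices over states.
$a_{k\ell}, e_k(a)$: transition and emission probabilities.
$A_{k\ell}, E_k(a)$: transition and emission frequencies.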

Problem with the Approach
Transitions and emissions that don't appear in the training data set would acquire zero probability (they would never be allowed).
Solution: add pseudo-counts to the observed frequencies.
The simplest pseudo-count is Laplace's rule: add one to each frequency.
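A minimal sketch of emission estimation with pseudo-counts for a single match column. The function name, the four-letter DNA alphabet, and the example column are assumptions chosen to keep the example short; setting `pseudocount=0` gives the plain frequency estimate.

```python
from collections import Counter

def emission_probs(column, alphabet, pseudocount=1.0):
    """Estimate e_k(a) for one match column, adding a pseudo-count to every residue.

    With pseudocount=1.0 this is Laplace's rule; characters outside the
    alphabet (such as gaps) are ignored.
    """
    counts = Counter(a for a in column if a in alphabet)
    total = sum(counts[a] + pseudocount for a in alphabet)
    return {a: (counts[a] + pseudocount) / total for a in alphabet}

# A column in which only C was ever observed.
laplace = emission_probs(list("CCCCC"), "ACGT")             # C: 6/9, others: 1/9
no_pseudo = emission_probs(list("CCCCC"), "ACGT", 0.0)      # C: 1.0, others: 0.0
```

With Laplace's rule the unseen residues keep a small non-zero probability, so a sequence carrying one of them at this position is penalized rather than excluded outright.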

Example

Example: Full Profile HMM

Searching with Profile HMMs
One of the main purposes of developing profile HMMs is to use them to detect potential membership in a family.
We can either use the Viterbi algorithm to get the most probable alignment, or the forward algorithm to calculate the full probability of the sequence summed over all possible paths.

Viterbi Algorithm
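
For reference, a commonly used log-odds form of the Viterbi recurrences for a profile HMM is sketched below. The notation is an assumption and may differ from the lecture's: $V^{M}_{j}(i)$, $V^{I}_{j}(i)$, and $V^{D}_{j}(i)$ denote the best log-odds score of a path that accounts for $x_1 \dots x_i$ and ends in match, insert, or delete state $j$, and $q_a$ is the background probability of residue $a$.

```latex
\begin{aligned}
V^{M}_{j}(i) &= \log\frac{e_{M_j}(x_i)}{q_{x_i}} + \max
  \begin{cases}
    V^{M}_{j-1}(i-1) + \log a_{M_{j-1}M_j} \\
    V^{I}_{j-1}(i-1) + \log a_{I_{j-1}M_j} \\
    V^{D}_{j-1}(i-1) + \log a_{D_{j-1}M_j}
  \end{cases} \\[4pt]
V^{I}_{j}(i) &= \log\frac{e_{I_j}(x_i)}{q_{x_i}} + \max
  \begin{cases}
    V^{M}_{j}(i-1) + \log a_{M_{j}I_j} \\
    V^{I}_{j}(i-1) + \log a_{I_{j}I_j} \\
    V^{D}_{j}(i-1) + \log a_{D_{j}I_j}
  \end{cases} \\[4pt]
V^{D}_{j}(i) &= \max
  \begin{cases}
    V^{M}_{j-1}(i) + \log a_{M_{j-1}D_j} \\
    V^{I}_{j-1}(i) + \log a_{I_{j-1}D_j} \\
    V^{D}_{j-1}(i) + \log a_{D_{j-1}D_j}
  \end{cases}
\end{aligned}
```

Delete states are silent, so the $V^{D}$ recurrence has no emission term and does not advance the sequence index $i$.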

Forward Algorithm
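
The forward computation replaces the maximum over predecessor states with a sum. The sketch below is one possible implementation under stated assumptions, not the lecture's formulation: it uses raw probabilities rather than log-odds for readability (so it will underflow on long sequences), treats $M_0$ as the begin state and $M_{L+1}$ as the end state, and uses a hypothetical parameter layout described in the docstring.

```python
def forward_probability(x, L, e_match, e_insert, t):
    """Probability of x summed over all paths through a profile HMM of length L.

    e_match[j][a]  : emission probability of match state M_j, j = 1..L (index 0 unused)
    e_insert[j][a] : emission probability of insert state I_j, j = 0..L
    t[j][key]      : transitions leaving column j, j = 0..L, with keys
                     "MM" (M_j -> M_{j+1}), "MI" (M_j -> I_j), "MD" (M_j -> D_{j+1}),
                     and likewise "IM", "II", "ID", "DM", "DI", "DD".
                     Missing keys count as probability 0.
    """
    n = len(x)
    fM = [[0.0] * (n + 1) for _ in range(L + 1)]
    fI = [[0.0] * (n + 1) for _ in range(L + 1)]
    fD = [[0.0] * (n + 1) for _ in range(L + 1)]
    fM[0][0] = 1.0  # begin state, nothing emitted yet

    def tr(j, key):
        return t[j].get(key, 0.0)

    for i in range(n + 1):
        for j in range(L + 1):
            if j >= 1 and i >= 1:  # match state emits x[i-1]
                fM[j][i] = e_match[j].get(x[i - 1], 0.0) * (
                    fM[j - 1][i - 1] * tr(j - 1, "MM")
                    + fI[j - 1][i - 1] * tr(j - 1, "IM")
                    + fD[j - 1][i - 1] * tr(j - 1, "DM"))
            if i >= 1:  # insert state emits x[i-1] and stays in column j
                fI[j][i] = e_insert[j].get(x[i - 1], 0.0) * (
                    fM[j][i - 1] * tr(j, "MI")
                    + fI[j][i - 1] * tr(j, "II")
                    + fD[j][i - 1] * tr(j, "DI"))
            if j >= 1:  # delete state is silent: same i, previous column
                fD[j][i] = (fM[j - 1][i] * tr(j - 1, "MD")
                            + fI[j - 1][i] * tr(j - 1, "ID")
                            + fD[j - 1][i] * tr(j - 1, "DD"))
    # Transition into the end state M_{L+1}.
    return fM[L][n] * tr(L, "MM") + fI[L][n] * tr(L, "IM") + fD[L][n] * tr(L, "DM")

# Toy length-2 model over {A, C}; all numbers are made up.
toy_match = [{}, {"A": 0.9, "C": 0.1}, {"A": 0.2, "C": 0.8}]
toy_insert = [{"A": 0.5, "C": 0.5}] * 3

# Model without indels: the only path is Begin -> M1 -> M2 -> End.
t_no_indel = [{"MM": 1.0}, {"MM": 1.0}, {"MM": 1.0}]
p_no_indel = forward_probability("AC", 2, toy_match, toy_insert, t_no_indel)

# Model that may skip M1 through D1 with probability 0.1.
t_with_del = [{"MM": 0.9, "MD": 0.1}, {"MM": 1.0, "DM": 1.0}, {"MM": 1.0, "DM": 1.0}]
p_skip_first = forward_probability("C", 2, toy_match, toy_insert, t_with_del)
```

In the first toy model the result reduces to the ungapped product of match emissions; in the second, the one-residue sequence can only be explained by the path that deletes the first match position.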

Questions?
