Efficient representation of uncertainty in multiple sequence - - PowerPoint PPT Presentation
Efficient representation of uncertainty in multiple sequence alignments using directed acyclic graphs
Joseph L. Herman, Ádám Novák, Rune Lyngsø, Adrienn Szabó, István Miklós and Jotun Hein
Describing the Problem
Black box: alignment #1, #2, …, #1000000. What do we do with a million alignments?!
Sampling Procedure
Point Estimate
Solutions
You can:
- Take one sample alignment, chosen by:
  - Maximum likelihood
    - In this case, MAP
  - Maximizing positional homologies
  - Minimizing the gaps
  - …
- Or… try to process the samples you have, and obtain a good estimate!
Columns
[Figure: alignment columns encoded as column vectors, with even points marked]
Objective
An alignment is a path through the DAG, from its start to its end.
Example
Crossovers
Why do we care about ESS?
[Equations for a set of samples and their mean; symbols lost in extraction]
Bayesian alignment methods use MCMC, so each sample is highly correlated with the previous one. As a result, the ESS is much smaller than the sample size.
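To make the effect of correlation concrete, here is a minimal sketch of one common ESS estimator, ESS = N / (1 + 2 Σ ρ_k), where ρ_k is the lag-k autocorrelation. This is a textbook estimator and not necessarily the one used in the talk. The function names, the truncation rule (stop at the first non-positive autocorrelation) and the toy AR(1) chain are all assumptions made for illustration.

```python
import random

def autocorr(x, lag):
    """Sample autocorrelation of the sequence x at the given lag (x must not be constant)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var

def effective_sample_size(x):
    """Estimate ESS = N / (1 + 2 * sum of positive-lag autocorrelations).

    Assumption: the sum is truncated at the first non-positive autocorrelation,
    a simple heuristic to avoid summing noise at large lags.
    """
    n = len(x)
    tau = 1.0
    for lag in range(1, n):
        rho = autocorr(x, lag)
        if rho <= 0:
            break
        tau += 2.0 * rho
    return n / tau

# Toy stand-in for an MCMC trace: an AR(1) chain where each point
# depends strongly on the previous one.
random.seed(0)
chain = [0.0]
for _ in range(1999):
    chain.append(0.9 * chain[-1] + random.gauss(0.0, 1.0))

print(len(chain), round(effective_sample_size(chain)))
```

For a strongly autocorrelated chain like this one, the estimate comes out far below the nominal 2000 samples, which is the situation the slide describes.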
Crossovers and ESS
Suppose all alignments shared the even points A and B in their paths:
Number of original alignments = 4
Number of possible alignments = 4 × 3 × 3 = 36
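The counting argument can be checked with a small sketch. It assumes each sampled alignment is given as a list of column labels, merges the samples into a DAG, and counts start-to-end paths by memoized recursion. The labels (`S`, `E`, `p1`, `q1`, and so on) and the helper names are hypothetical; the four toy samples are chosen so that the three segments contain 4, 3 and 3 distinct sub-paths, matching the numbers on the slide.

```python
from collections import defaultdict

def build_dag(alignments):
    """Merge sampled alignments (lists of column labels) into an adjacency map."""
    edges = defaultdict(set)
    for path in alignments:
        for u, v in zip(path, path[1:]):
            edges[u].add(v)
    return edges

def count_paths(edges, start, end):
    """Count distinct paths from start to end in the DAG."""
    memo = {}

    def visit(node):
        if node == end:
            return 1
        if node not in memo:
            memo[node] = sum(visit(nxt) for nxt in edges.get(node, ()))
        return memo[node]

    return visit(start)

# Hypothetical samples: all four pass through the shared points A and B.
samples = [
    ["S", "p1", "A", "q1", "B", "r1", "E"],
    ["S", "p2", "A", "q2", "B", "r2", "E"],
    ["S", "p3", "A", "q3", "B", "r3", "E"],
    ["S", "p4", "A", "q1", "B", "r1", "E"],
]
dag = build_dag(samples)
print(len(samples), count_paths(dag, "S", "E"))  # 4 36
```

Every path in the DAG recombines segments observed in the samples, so 4 sampled alignments stand in for 36 candidate alignments.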
ESS
Equivalence classes
Approximate Inference
- The independence assumption
- The accuracy: the nearest point w.r.t. the KL divergence
- Consequences
Pair marginals
Each site is conditionally independent of all earlier sites, given the immediately preceding one (a first-order Markov assumption).
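Written out, this assumption corresponds to a first-order Markov factorization over the columns of an alignment. The notation below ($c_1, \dots, c_m$ for the successive columns) is an assumption introduced here for illustration and is not taken from the slides.

```latex
P(c_1, c_2, \dots, c_m) \approx P(c_1) \prod_{i=2}^{m} P(c_i \mid c_{i-1})
```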
Pair HMMs
Mean Field Approximation
Approximation
Mean Field vs Pair Marginals
[Figure: pair marginal vs. mean field]
Estimating Pair Marginals:
- For each column 2 possible preceding columns.
Estimating Mean field:
- Only approximate one parameter
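The two estimation tasks can be sketched from a set of sampled alignments as follows. This is one plausible reading of the slide and not the authors' implementation: the mean-field estimate is taken to be the fraction of samples containing each column, and the pair marginal is taken to be the fraction of samples in which two columns are adjacent. The function names and the toy samples are hypothetical.

```python
from collections import Counter

def mean_field(samples):
    """One number per column: the fraction of samples that contain it."""
    n = len(samples)
    counts = Counter(col for path in samples for col in set(path))
    return {col: c / n for col, c in counts.items()}

def pair_marginals(samples):
    """One number per (preceding column, column) pair seen adjacent in a sample."""
    n = len(samples)
    counts = Counter(pair for path in samples for pair in zip(path, path[1:]))
    return {pair: c / n for pair, c in counts.items()}

# Hypothetical samples, each a list of column labels.
samples = [
    ["S", "p1", "A", "q1", "B", "r1", "E"],
    ["S", "p2", "A", "q2", "B", "r2", "E"],
    ["S", "p3", "A", "q3", "B", "r3", "E"],
    ["S", "p4", "A", "q1", "B", "r1", "E"],
]
mf = mean_field(samples)
pm = pair_marginals(samples)
print(mf["q1"], pm[("A", "q1")], pm[("S", "p1")])  # 0.5 0.5 0.25
```

The pair-marginal table has one entry per observed predecessor of each column, so it needs more parameters than the mean-field table, which has a single entry per column.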
Mean Field vs Pair Marginals
Just the distribution estimation error
Mean Field vs Pair Marginals
More realistic case!
Point estimation
The quality of an alignment can be defined in terms of:
The authors argue that we only care about the positives, since the sample size is small!
Point Estimate
Loss function
Simplifying the space of loss functions
Simplifying a specific class
The dynamic programming problem
Suppose every edge of the graph carries a weight, however it was assigned.
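Given edge weights, a point estimate can be read off as the highest-scoring path through the DAG. The sketch below is a generic memoized dynamic program for the maximum-weight path and is not taken from the talk. The graph layout (`dict` of node to list of `(next_node, weight)` pairs), the function name and the toy weights are assumptions.

```python
def best_path(edges, start, end):
    """Return (score, path) for the maximum-weight path from start to end.

    edges maps each node to a list of (next_node, weight) pairs and must
    describe a DAG. Nodes that cannot reach end score -inf.
    """
    memo = {}

    def visit(node):
        if node == end:
            return 0.0, [end]
        if node not in memo:
            best = (float("-inf"), [])
            for nxt, weight in edges.get(node, ()):
                score, tail = visit(nxt)
                if score + weight > best[0]:
                    best = (score + weight, [node] + tail)
            memo[node] = best
        return memo[node]

    return visit(start)

# Hypothetical weighted DAG with two competing routes from S to E.
edges = {
    "S": [("a", 1.0), ("b", 2.0)],
    "a": [("E", 5.0)],
    "b": [("E", 1.0)],
}
print(best_path(edges, "S", "E"))  # (6.0, ['S', 'a', 'E'])
```

Each node is solved once, so the running time is linear in the number of edges. For very long alignments, an iterative pass in topological order would avoid Python's recursion limit.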