Efficient representation of uncertainty in multiple sequence alignments using directed acyclic graphs (PowerPoint presentation)


SLIDE 1

Efficient representation of uncertainty in multiple sequence alignments using directed acyclic graphs

JOSEPH L HERMAN, ÁDÁM NOVÁK, RUNE LYNGSØ, ADRIENN SZABÓ, ISTVÁN MIKLÓS AND JOTUN HEIN

SLIDE 2

Describing the Problem

A black box emits sampled alignments #1, #2, …, #1000000. What do we do with a million alignments?!

SLIDE 3

Sampling Procedure

SLIDE 4

Point Estimate

SLIDE 5

Solutions

You can:

  • Take the single sample alignment with maximum likelihood (in this case, the MAP alignment)
  • Maximize positional homologies
  • Minimize the gaps
  • Or… try to process the samples you have and obtain a good estimate!
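The first option, taking the single best-scoring sample, can be sketched in a few lines. This is a minimal sketch: the sample tuples, scores, and the helper name `map_sample` are illustrative, not the talk's actual data or code.

```python
# Hypothetical sketch: given MCMC samples paired with log-posterior
# scores, the simplest point estimate is the best-scoring (MAP) sample.

def map_sample(samples):
    """Return the sample whose log-posterior score is highest.

    `samples` is a list of (alignment, log_posterior) pairs; both the
    alignment encoding and the scores here are made up for illustration.
    """
    return max(samples, key=lambda pair: pair[1])[0]

samples = [("A-C/AGC", -12.3), ("AC-/AGC", -10.1), ("-AC/AGC", -15.8)]
print(map_sample(samples))  # prints the alignment scored -10.1
```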
SLIDE 6

Columns

[Figure: alignment columns encoded as column vectors of per-sequence indices, e.g. (2, 4, 6, 2) advancing to (2, 4, 7, 2); "even points" are columns shared across the sampled alignments]

SLIDE 7

Objective

An alignment is a path through the column graph, from the initial column vector to the final column vector, e.g. (2, 2, 2, 2).
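The column-vector encoding can be sketched as follows, assuming the alignment is given as equal-length gapped rows; the helper name `column_vectors` is illustrative, not from the paper.

```python
def column_vectors(rows):
    """Encode an alignment (list of equal-length gapped strings) as the
    sequence of index vectors visited: entry i counts how many residues
    of sequence i have been consumed after each column."""
    counts = [0] * len(rows)
    path = [tuple(counts)]               # start at the zero vector
    for col in range(len(rows[0])):
        for i, row in enumerate(rows):
            if row[col] != '-':          # this sequence advances here
                counts[i] += 1
        path.append(tuple(counts))
    return path

aln = ["AG-C", "A-GC"]
print(column_vectors(aln))
# [(0, 0), (1, 1), (2, 1), (2, 2), (3, 3)]
```

The path always starts at the all-zero vector and ends at the vector of ungapped sequence lengths, which is what makes the paths of different samples comparable.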

SLIDE 8

Example

SLIDE 9

Crossovers

SLIDE 10

Why do we care about ESS?

For N independent samples x_1, x_2, ⋯, x_N:

  • Var(x̄) = σ²/N, so the estimate improves with every sample!

For N correlated samples x_1, x_2, ⋯, x_N:

  • Var(x̄) = σ²/N_eff, where N_eff = N / (1 + 2 Σ_k ρ_k) ≤ N

Bayesian alignment methods use MCMC, so each sample is highly correlated with the previous one, and the ESS is much smaller than the raw sample size.
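The shrinkage of the effective sample size can be checked with a small plain-Python sketch. The estimator below (truncating the autocorrelation sum at the first non-positive term) is a common heuristic, not the talk's method, and the AR(1)-style test chain is made up for illustration.

```python
import random

def ess(chain, max_lag=None):
    """Effective sample size: N / (1 + 2 * sum of autocorrelations).
    Truncates the sum at the first non-positive autocorrelation,
    a common heuristic for noisy lag estimates."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain) / n
    if var == 0:
        return float(n)
    max_lag = max_lag or n - 1
    acsum = 0.0
    for k in range(1, max_lag + 1):
        rho = sum((chain[t] - mean) * (chain[t + k] - mean)
                  for t in range(n - k)) / (n * var)
        if rho <= 0:
            break
        acsum += rho
    return n / (1 + 2 * acsum)

# A strongly autocorrelated chain (each value 95% carried over from the
# last) has an ESS far below its raw length.
random.seed(0)
x, chain = 0.0, []
for _ in range(2000):
    x = 0.95 * x + random.gauss(0, 1)
    chain.append(x)
print(len(chain), round(ess(chain)))
```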

SLIDE 11

Crossovers and ESS

Suppose all sampled alignments shared the even points A and B in their paths:

Number of original alignments = 4; number of possible recombined alignments = 4 × 3 × 3 = 36
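The counting argument can be checked with a tiny sketch. The segment labels below are made up to match the slide's counts (4 distinct first segments, 3 distinct middle segments, 3 distinct final segments); the point is that any segments can be glued together at the shared points.

```python
from itertools import product

# If every sampled path passes through shared points A and B, the paths
# split into three segments, and any combination of segments is itself
# a valid alignment path (a "crossover").
first  = {"s->A v1", "s->A v2", "s->A v3", "s->A v4"}   # 4 distinct
middle = {"A->B v1", "A->B v2", "A->B v3"}              # 3 distinct
final  = {"B->e v1", "B->e v2", "B->e v3"}              # 3 distinct

recombined = set(product(first, middle, final))
print(len(recombined))  # 4 * 3 * 3 = 36
```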

SLIDE 12

ESS

SLIDE 13

Equivalence classes

SLIDE 14

Approximate Inference

  • The independence assumption
  • Accuracy: the closest approximating distribution with respect to the KL divergence
  • Consequences
SLIDE 15

Pair marginals

Each column is independent of all earlier columns except the immediately preceding one (a first-order Markov assumption on columns).
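The first-order assumption can be sketched as follows: estimate the transition probabilities between consecutive columns from the sampled paths, then score any path as a product of conditionals. The toy paths and helper names are made up for illustration.

```python
from collections import Counter
from math import prod  # Python 3.8+

def fit_pair_marginals(paths):
    """Estimate P(next column | current column) from sampled paths,
    following the first-order Markov assumption on columns."""
    trans, totals = Counter(), Counter()
    for path in paths:
        for a, b in zip(path, path[1:]):
            trans[(a, b)] += 1
            totals[a] += 1
    return {ab: c / totals[ab[0]] for ab, c in trans.items()}

def path_prob(path, cond):
    """Probability of a path under the chain factorization
    q(x) = prod_i q(x_i | x_{i-1}), taking the first column as given."""
    return prod(cond.get((a, b), 0.0) for a, b in zip(path, path[1:]))

samples = [("s", "A", "e"), ("s", "A", "e"), ("s", "B", "e")]
cond = fit_pair_marginals(samples)
print(path_prob(("s", "A", "e"), cond))  # 2/3 * 1.0
```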

SLIDE 16

Pair HMMs

SLIDE 17

Mean Field Approximation

Approximation

SLIDE 18

Mean Field vs Pair Marginals

Pair marginals: q(x) = ∏ᵢ q(xᵢ | xᵢ₋₁)  vs  Mean field: q(x) = ∏ᵢ q(xᵢ)
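The gap between the two approximations can be seen on a toy example (all numbers illustrative): with just two correlated variables, the chain (pair-marginal) factorization reproduces the joint exactly, while the fully factorized mean field does not.

```python
from math import log

# Toy joint over two correlated binary variables (illustrative numbers).
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def kl(p, q):
    """KL divergence KL(p || q) over a shared discrete support."""
    return sum(px * log(px / q[x]) for x, px in p.items() if px > 0)

# Marginals of each variable.
p1 = {a: sum(px for (x1, _), px in p.items() if x1 == a) for a in (0, 1)}
p2 = {b: sum(px for (_, x2), px in p.items() if x2 == b) for b in (0, 1)}

# Mean field: fully factorized q(x1) q(x2).
mf = {(a, b): p1[a] * p2[b] for a in (0, 1) for b in (0, 1)}

# Pair marginal (chain): q(x1) q(x2 | x1); with only two variables
# this is exact.
pm = {(a, b): p1[a] * (p[(a, b)] / p1[a]) for a in (0, 1) for b in (0, 1)}

print(round(kl(p, pm), 6), round(kl(p, mf), 6))
```

With more variables the chain factorization is no longer exact, but it still captures the nearest-neighbour correlations that the mean field throws away.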

SLIDE 19

Mean Field vs Pair Marginals

Estimating pair marginals:

  • For each column: 2 possible preceding columns

Estimating the mean field:

  • Only one parameter to approximate

SLIDE 20

Mean Field vs Pair Marginals

Just the distribution estimation error

SLIDE 21

Mean Field vs Pair Marginals

SLIDE 22

More realistic case!

SLIDE 23

Point estimation

The quality of an alignment can be defined in terms of the homologies it predicts correctly or incorrectly.

They argue that we only care about the positives, since the sample size is small!

SLIDE 24

Point Estimate

SLIDE 25

Loss function

SLIDE 26

Simplifying the space of loss functions

SLIDE 27

Simplifying a specific class

SLIDE 28

The dynamic programming problem

Suppose each edge of the DAG is assigned a weight.
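Given edge weights, the dynamic program amounts to a maximum-weight path computation over a topologically ordered DAG. The sketch below is a generic version of that idea; the graph, weights, and helper names are illustrative, not the paper's implementation.

```python
from collections import defaultdict

def best_path(edges, source, sink):
    """Maximum-weight source->sink path in a DAG, by dynamic programming
    over a topological order. `edges` maps u -> list of (v, weight)."""
    # Kahn's algorithm for a topological order.
    indeg = defaultdict(int)
    nodes = {source, sink}
    for u, outs in edges.items():
        nodes.add(u)
        for v, _ in outs:
            nodes.add(v)
            indeg[v] += 1
    order, stack = [], [n for n in nodes if indeg[n] == 0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v, _ in edges.get(u, []):
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    # DP: best score reaching each node, with backpointers.
    score = {n: float("-inf") for n in nodes}
    back = {}
    score[source] = 0.0
    for u in order:
        if score[u] == float("-inf"):
            continue
        for v, w in edges.get(u, []):
            if score[u] + w > score[v]:
                score[v] = score[u] + w
                back[v] = u
    path, n = [sink], sink
    while n != source:
        n = back[n]
        path.append(n)
    return score[sink], path[::-1]

edges = {"s": [("A", 2.0), ("B", 1.0)],
         "A": [("e", 1.0)],
         "B": [("e", 3.0)]}
print(best_path(edges, "s", "e"))  # (4.0, ['s', 'B', 'e'])
```

Because the alignment DAG is acyclic, this runs in time linear in the number of edges, which is what makes summarizing a million samples tractable.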

SLIDE 29

A sample run

SLIDE 30

Some results