The median problem for the reversal distance in circular bacterial genomes

SLIDE 1

The median problem for the reversal distance in circular bacterial genomes

E. Ohlebusch, M.I. Abouelhoda, K. Hockel, J. Stallkamp

University of Ulm, Germany

CPM 2005

SLIDE 2

Median Problem

Given three genomes G1, G2, and G3, find a genome G such that $d_m = \sum_{i=1}^{3} d(G, G_i)$ is minimized for a distance measure d.

[Figure: the genomes G1, G2, G3 and their median G]

Needed: a distance between two genomes $G = (\pi_1, \ldots, \pi_n)$ and $G' = (\rho_1, \ldots, \rho_n)$ on the same set of genes $\{1, \ldots, n\}$.
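To make the definition concrete, here is a minimal Python sketch that finds a median by exhaustive search over all signed permutations of {1, ..., n}. It is exponential (n! · 2^n candidates) and only meant for tiny n. The distance d is passed in as a function, and the names `signed_permutations`, `brute_force_median`, and `hamming` are hypothetical helpers, not part of the slides.

```python
from itertools import permutations, product
from typing import Callable, Iterator, Optional, Sequence, Tuple

Genome = Tuple[int, ...]


def signed_permutations(n: int) -> Iterator[Genome]:
    """Yield every signed permutation of the genes 1..n (n! * 2^n genomes)."""
    for perm in permutations(range(1, n + 1)):
        for signs in product((1, -1), repeat=n):
            yield tuple(s * g for s, g in zip(signs, perm))


def brute_force_median(genomes: Sequence[Genome],
                       d: Callable[[Genome, Genome], int]) -> Tuple[Genome, int]:
    """Return a genome G minimizing d_m = sum_i d(G, G_i), together with d_m."""
    n = len(genomes[0])
    best: Optional[Tuple[Genome, int]] = None
    for g in signed_permutations(n):
        score = sum(d(g, gi) for gi in genomes)
        if best is None or score < best[1]:  # keep the first minimum found
            best = (g, score)
    assert best is not None
    return best


def hamming(a: Genome, b: Genome) -> int:
    """Toy stand-in distance: number of positions holding different signed genes."""
    return sum(x != y for x, y in zip(a, b))


print(brute_force_median([(1, 2), (1, 2), (2, 1)], hamming))  # ((1, 2), 2)
```

The toy `hamming` distance is only a placeholder; any of the distances on the following slides (breakpoint or reversal distance) can be plugged in as `d` instead.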

SLIDE 3

Rearrangements

◮ genomes are subject to rearrangements
◮ rearrangements are less frequent than local changes
◮ they provide information about the evolutionary distance between genomes
◮ they affect large parts of the DNA
◮ they change the order and/or orientation of the genes involved

SLIDE 4

Example: Transposition

1 2 3 4 5 6  →  3 1 2 4 5 6 (the block 1 2 is moved behind gene 3)
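A transposition cuts out a block of genes and reinserts it elsewhere; equivalently, it exchanges two adjacent blocks. A small Python sketch on a linear list of genes, using a hypothetical helper `transpose(genome, i, j, k)` that swaps `genome[i:j]` and `genome[j:k]` (0-based indices):

```python
def transpose(genome, i, j, k):
    """Exchange the adjacent blocks genome[i:j] and genome[j:k] (0-based, i <= j <= k)."""
    if not 0 <= i <= j <= k <= len(genome):
        raise ValueError("need 0 <= i <= j <= k <= len(genome)")
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


# The slide's example: the block (1 2) and the gene 3 trade places.
print(transpose([1, 2, 3, 4, 5, 6], 0, 2, 3))  # [3, 1, 2, 4, 5, 6]
```

Unlike a reversal, a transposition leaves the orientation of every gene unchanged.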

SLIDE 5

Example: Reversal

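1 2 −3 4 5 6 −7  →  4 5 6 −7 3 −2 −1 (the block 1 2 −3 is reversed into 3 −2 −1, which flips the orientation of every gene in it; the circular genome is read starting at gene 4)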
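A reversal takes a block of genes, reverses its order, and flips the sign of every gene in it. The Python sketch below (hypothetical helpers `reverse_block` and `same_circular`) reproduces the slide's example, assuming the result is a circular genome, so two gene lists describe the same genome when one is a rotation of the other. Reading the genome on the complementary strand is ignored here for brevity.

```python
def reverse_block(genome, i, j):
    """Reverse genome[i..j] (0-based, inclusive) and flip the sign of each gene in it."""
    return genome[:i] + [-g for g in reversed(genome[i:j + 1])] + genome[j + 1:]


def same_circular(a, b):
    """True if gene list b is a rotation of gene list a (same strand only)."""
    if len(a) != len(b):
        return False
    return not a or any(a[r:] + a[:r] == b for r in range(len(a)))


g = [1, 2, -3, 4, 5, 6, -7]
r = reverse_block(g, 0, 2)
print(r)                                            # [3, -2, -1, 4, 5, 6, -7]
print(same_circular(r, [4, 5, 6, -7, 3, -2, -1]))  # True
```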

SLIDE 6

Rearrangement Distance

◮ minimum number of rearrangements needed to transform genome G into genome G′
◮ advantage: good estimate of the evolutionary distance
◮ drawback: its complexity is not known, and we cannot compute it efficiently [Hartman 2003]

SLIDE 7

Reversal Distance

◮ minimum number of reversals needed to transform G into G′
◮ advantage: can be computed in O(n) time [Bader, Moret, Yan 2001; Bergeron, Mixtacki, Stoye 2004]
◮ drawback: other operations (e.g., transpositions) are not considered
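The definition can be checked directly on tiny examples. The Python sketch below is not the linear-time algorithm cited above: it is a plain breadth-first search over all signed permutations reachable by reversals, so it is exponential and only usable for a handful of genes. It assumes linear genomes, and the function name `reversal_distance_bfs` is hypothetical.

```python
from collections import deque


def reversal_distance_bfs(source, target):
    """Minimum number of signed reversals turning source into target (exhaustive BFS)."""
    source, target = tuple(source), tuple(target)
    n = len(source)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        g = queue.popleft()
        if g == target:
            return dist[g]
        for i in range(n):
            for j in range(i, n):
                # reverse the block g[i..j] and flip the orientation of its genes
                h = g[:i] + tuple(-x for x in reversed(g[i:j + 1])) + g[j + 1:]
                if h not in dist:
                    dist[h] = dist[g] + 1
                    queue.append(h)
    raise ValueError("target is not a signed permutation of the same genes")


print(reversal_distance_bfs([1, 2], [2, 1]))  # 3
```

Swapping two genes while keeping both orientations takes three reversals, which also illustrates the drawback above: a single transposition would do.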

SLIDE 8

Breakpoints

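◮ $G = (\pi_1, \ldots, \pi_n)$ and $G' = (\gamma_1, \ldots, \gamma_n)$ on the same set of genes $\{1, \ldots, n\}$
◮ two adjacent genes $\pi_i \pi_{i+1}$ determine a breakpoint in G with respect to G′ ⇔ neither does $\pi_i$ immediately precede $\pi_{i+1}$ in G′, nor does $-\pi_{i+1}$ immediately precede $-\pi_i$ in G′
◮ example:

[Figure: G containing +7 +6 +5 and +1 −3 −2 +4; G′ containing +7 +6 +5 and +1 +2 +3 +4]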
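The breakpoint definition translates almost literally into code. In this Python sketch (hypothetical helper `breakpoints`), an adjacency (a, b) of G′ also counts in its other-strand reading (−b, −a), and for circular genomes the pair (πn, π1) is included as well. The small genomes in the example are chosen for illustration.

```python
def breakpoints(g, h, circular=True):
    """Pairs (g[i], g[i+1]) of g that are breakpoints with respect to h."""
    def adjacent_pairs(genome):
        n = len(genome)
        count = n if circular else n - 1  # circular: also pair the last gene with the first
        return [(genome[i], genome[(i + 1) % n]) for i in range(count)]

    # (a, b) is no breakpoint if a immediately precedes b in h,
    # or, reading the other strand, if -b immediately precedes -a in h.
    adjacencies = set()
    for a, b in adjacent_pairs(h):
        adjacencies.add((a, b))
        adjacencies.add((-b, -a))
    return [(a, b) for a, b in adjacent_pairs(g) if (a, b) not in adjacencies]


print(breakpoints([1, -3, -2, 4], [1, 2, 3, 4]))  # [(1, -3), (-2, 4)]
```

Here (−3, −2) is not a breakpoint because +2 +3 read on the other strand is −3 −2; the length of the returned list is the breakpoint distance of the next slide.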

SLIDE 9

Breakpoint Distance

◮ number of breakpoints between two genomes/permutations
◮ advantage: easy to compute
◮ drawback: only a rough estimate of the number of rearrangements [Moret, Siepel, Tang, Liu 2002]

SLIDE 10

Bad and Good News

The median problem is NP-hard for both the breakpoint and the reversal distance [Caprara 1999; Pe′er, Shamir 1998]! Using biological constraints, however, can simplify the problem significantly.

SLIDE 11

Circular Bacterial Genomes

Predominant: reversals centered around the origin/terminus of replication [Eisen et al. 2000; Tillier, Collins 2000]

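[Figure: a circular genome with genes 1 to 6, origin O and terminus T, before and after the reversal ρ(3) centered around O; genes 1 to 3 change orientation, genes 4 to 6 do not]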

◮ ρ(3): reversal centered around the origin (analogously ρ(i))
◮ genes keep their distance to the origin/terminus
◮ the genes involved change their orientation

SLIDE 12

Example: Chlamydiae (pneumoniae, trachomatis)

[Figure: comparison of the genomes of C. pneumoniae and C. trachomatis; both axes give genome positions, ranging up to about 1.2e+06 and 1.4e+06]

SLIDE 13

Genome Representation

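◮ bit vector:
  ◮ 1: right side
  ◮ 0: left side
◮ orientation vector:
  ◮ +: forward if on the right-hand side; reverse if on the left-hand side
  ◮ −: reverse if on the right-hand side; forward if on the left-hand side
◮ a genome is represented by its bit vector (a reversal around the origin or terminus moves each affected gene to the other side and flips its strand, so the orientation vector never changes)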

SLIDE 14

Genome as Bit Vector

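Genes +1 +2 −3 −4 −5 +6 −7 −8 +9 +10, with origin O and terminus T.

Two-sided layout (left side from position 10 down to 1 | right side from position 1 up to 10; 0 marks an empty slot):

(+10, 0, 0, 0, +6, −5, 0, 0, +2, 0 | +1, 0, −3, −4, 0, 0, −7, −8, +9, 0)

◮ bit vector: (1, 0, 1, 1, 0, 0, 1, 1, 1, 0)
◮ orientation vector: (+, −, −, −, +, −, −, −, +, −)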
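The conversion from the two-sided layout to the two vectors can be sketched in Python as follows. Assumptions, read off the example rather than stated on the slides: the gene at distance k from the origin is numbered k, the left side is listed from position n down to 1, 0 marks an empty slot, and a positive sign means the forward strand. The hypothetical helper `rho_layout(left, right, i)` applies ρ(i) by moving the genes at positions 1..i to the other side with flipped strands; the last lines show that this complements the first i bits and leaves the orientation vector unchanged.

```python
def to_vectors(left, right):
    """Return (bit vector, orientation vector) for a two-sided layout.

    left:  genes on the left side, listed from position n down to 1 (0 = empty)
    right: genes on the right side, listed from position 1 up to n (0 = empty)
    """
    left_by_pos = left[::-1]  # re-index the left side from position 1 to n
    bits, orient = [], []
    for pos, r in enumerate(right):
        if r != 0:  # gene on the right side
            bits.append(1)
            orient.append('+' if r > 0 else '-')  # '+': forward on the right
        else:       # gene on the left side
            gene = left_by_pos[pos]
            bits.append(0)
            orient.append('+' if gene < 0 else '-')  # '+': reverse on the left
    return bits, orient


def rho_layout(left, right, i):
    """Reversal centred on the origin covering positions 1..i (hypothetical helper)."""
    left_by_pos, new_right = left[::-1], right[:]
    new_left_by_pos = left_by_pos[:]
    for pos in range(i):
        # each gene changes side and strand; its distance to the origin is kept
        new_right[pos], new_left_by_pos[pos] = -left_by_pos[pos], -right[pos]
    return new_left_by_pos[::-1], new_right


left = [10, 0, 0, 0, 6, -5, 0, 0, 2, 0]
right = [1, 0, -3, -4, 0, 0, -7, -8, 9, 0]
bits, orient = to_vectors(left, right)
print(bits)              # [1, 0, 1, 1, 0, 0, 1, 1, 1, 0]
print(''.join(orient))   # +---+---+-
bits3, orient3 = to_vectors(*rho_layout(left, right, 3))
print(bits3)             # [0, 1, 0, 1, 0, 0, 1, 1, 1, 0]
print(orient3 == orient)  # True
```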

SLIDE 15

Only around Origin

procedure rd_O(G, G′)
  determine the breakpoints (i1, i1 + 1), . . . , (ik, ik + 1) between G and G′
  if Gρ(i1) · · · ρ(ik) = G′ then return k
  else return k + 1
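A direct Python transcription of rd_O on bit vectors, as a sketch. The slides do not spell out breakpoints for bit vectors; here we assume that (i, i + 1) is a breakpoint when exactly one of the positions i and i + 1 holds the same bit in G and G′, and that ρ(i) complements the first i bits (the genes at positions 1..i switch sides). `breakpoints_O` and `rho` are hypothetical helper names.

```python
def breakpoints_O(g, h):
    """1-based positions i such that (i, i + 1) is a breakpoint between bit vectors g and h."""
    d = [a ^ b for a, b in zip(g, h)]  # 1 where g and h differ
    return [i + 1 for i in range(len(d) - 1) if d[i] != d[i + 1]]


def rho(g, i):
    """Reversal around the origin covering positions 1..i: complement the first i bits."""
    return [1 - b for b in g[:i]] + list(g[i:])


def rd_O(g, h):
    """Procedure rd_O: apply rho(i) at every breakpoint, add one reversal if G still != G'."""
    g, h = list(g), list(h)
    bps = breakpoints_O(g, h)
    for i in bps:
        g = rho(g, i)
    return len(bps) if g == h else len(bps) + 1


print(rd_O([1, 0, 1, 1, 0, 0, 1, 1, 1, 0], [1] * 10))  # 6
```

For the bit vector of the previous slide against the all-ones vector there are five breakpoints; after the five reversals every bit still differs, so one more reversal is counted.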

SLIDE 16

Correctness

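◮ a reversal ρ(i) does not change any existing breakpoint except the one at position (i, i + 1)
◮ (i, i + 1) is a breakpoint ⇒ ρ(i) removes this breakpoint
◮ (i, i + 1) is NO breakpoint ⇒ ρ(i) creates a new breakpoint
◮ hence every reversal removes at most one breakpoint, so k breakpoints require at least k reversals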
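The first three claims can be checked exhaustively for short bit vectors. The Python sketch below keeps the same assumptions as before (breakpoint (i, i + 1) when exactly one of the two positions agrees between G and G′; ρ(i) complements the first i bits) and verifies, for all pairs of bit vectors of length n, that ρ(i) toggles exactly the breakpoint at (i, i + 1) and nothing else. All function names are hypothetical.

```python
from itertools import product


def breakpoint_positions(g, h):
    """Positions i (1-based) such that (i, i + 1) is a breakpoint between bit vectors g and h."""
    d = [a ^ b for a, b in zip(g, h)]  # 1 where g and h differ
    return {i + 1 for i in range(len(d) - 1) if d[i] != d[i + 1]}


def rho(g, i):
    """Reversal around the origin covering positions 1..i: complement the first i bits."""
    return tuple(1 - b for b in g[:i]) + tuple(g[i:])


def rho_toggles_only_breakpoint_i(n):
    """Check the correctness claim for every pair of bit vectors of length n."""
    for g in product((0, 1), repeat=n):
        for h in product((0, 1), repeat=n):
            before = breakpoint_positions(g, h)
            for i in range(1, n):
                after = breakpoint_positions(rho(g, i), h)
                # symmetric difference {i}: breakpoint (i, i+1) removed or created, nothing else
                if before ^ after != {i}:
                    return False
    return True


print(rho_toggles_only_breakpoint_i(5))  # True
```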

SLIDE 17

Some Definitions

Definition

Let $G = (b_1, b_2, b_3, \ldots, b_n)$ and $G' = (b'_1, b'_2, b'_3, \ldots, b'_n)$ be two circular genomes. An interval $[i..j]$ of indices (where $1 \le i \le j \le n$) is called a strip if $b_k = b'_k$ for all $i \le k \le j$, $b_{i-1} \ne b'_{i-1}$ if $i \ne 1$, and $b_{j+1} \ne b'_{j+1}$ if $j \ne n$.

Definition

Let $b_1, b_2, b_3 \in \{0, 1\}$.

$$\mathrm{majority}(b_1, b_2, b_3) = \begin{cases} 1 & \text{if } \sum_{j=1}^{3} b_j \ge 2 \\ 0 & \text{otherwise} \end{cases}$$
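Both definitions take a few lines each. A Python sketch with hypothetical function names, returning strips as 1-based (i, j) pairs; the boundary conditions at i = 1 and j = n mean that a strip may touch the origin or the terminus end of the vector.

```python
def strips(g, h):
    """Maximal intervals [i..j] (1-based) on which bit vectors g and h agree."""
    result, start = [], None
    for k, (a, b) in enumerate(zip(g, h), start=1):
        if a == b and start is None:
            start = k                       # a new strip begins
        elif a != b and start is not None:
            result.append((start, k - 1))   # the current strip ended at k - 1
            start = None
    if start is not None:
        result.append((start, len(g)))      # strip reaching position n
    return result


def majority(b1, b2, b3):
    """1 if at least two of the three bits are 1, else 0."""
    return 1 if b1 + b2 + b3 >= 2 else 0


print(strips([1, 0, 1, 1, 0, 0, 1, 1, 1, 0], [1] * 10))  # [(1, 1), (3, 4), (7, 9)]
```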

SLIDE 18

Reversal Distance

procedure rd(G, G′)
  if G and G′ do not have a breakpoint then
    if G = G′ then return 0 else return 1
  else
    choose a strip [i..j]
    kl := rd_O(G[1..i − 1], G′[1..i − 1])
    kr := rd_T(G[j + 1..n], G′[j + 1..n])
    return (kl + kr)
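A Python sketch of rd on bit vectors. Assumptions: rd_O uses the same breakpoint and ρ(i) conventions as before, rd_T is taken to be the mirror image of rd_O (reversals around the terminus complement the last bits instead of the first ones), and the strip chosen is simply the first one.

```python
def rd_O(g, h):
    """Origin-only distance (procedure rd_O), worked on the difference vector."""
    d = [a ^ b for a, b in zip(g, h)]  # 1 where the bits differ
    bps = [i for i in range(len(d) - 1) if d[i] != d[i + 1]]
    for i in bps:
        d = [1 - x for x in d[:i + 1]] + d[i + 1:]  # rho(i + 1) flips the same prefix of d
    return len(bps) + (1 if any(d) else 0)


def rd_T(g, h):
    """Terminus-only distance, assumed to be the mirror image of rd_O."""
    return rd_O(list(g)[::-1], list(h)[::-1])


def rd(g, h):
    """Procedure rd: no breakpoint -> 0 or 1; otherwise split at the first strip."""
    d = [a ^ b for a, b in zip(g, h)]
    if len(set(d)) <= 1:  # G and G' have no breakpoint
        return 1 if d and d[0] == 1 else 0
    i = d.index(0)        # the first strip starts at 0-based index i ...
    j = i
    while j + 1 < len(d) and d[j + 1] == 0:
        j += 1            # ... and ends at 0-based index j
    # g[:i] and g[j+1:] are G[1..i-1] and G[j+1..n] in the slide's 1-based notation
    return rd_O(g[:i], h[:i]) + rd_T(g[j + 1:], h[j + 1:])


print(rd([1, 0, 1, 1, 0, 0, 1, 1, 1, 0], [1] * 10))  # 5
```

On this example, allowing reversals around the terminus as well saves one reversal compared with the origin-only distance 6, and choosing the strip [3..4] or [7..9] instead of [1..1] gives the same value 5.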

[Figure: the strip [i..j] splits the genome; rd_O handles the part between the origin O and the strip, rd_T the part between the strip and the terminus T]

SLIDE 19

The Problem

◮ Input: three genomes G1, G2, and G3, represented by their bit vectors
◮ Output: a median G, which minimizes $d_m = \sum_{i=1}^{3} \mathrm{rd}(G, G_i)$
◮ Restrictions:
  ◮ same set of genes in all three genomes
  ◮ only reversals around the origin/terminus of replication
◮ the median can be computed in O(n) time

SLIDE 20

Around Origin Only

procedure median_O(G1, G2, G3)   /⋆ Gj = (b^j_1, b^j_2, b^j_3, . . . , b^j_n) ⋆/
  d := 0
  for i := n downto 1 do
    b := majority(b^1_i, b^2_i, b^3_i)
    if there is a j, 1 ≤ j ≤ 3, such that b^j_i ≠ b then
      Gj := Gj ρ(i)
      d := d + 1
  return (G1, d)
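A Python sketch of median_O on bit vectors, again assuming that ρ(i) complements the first i bits. Since at most one of the three bits at position i can disagree with the majority, looping over all three genomes applies at most one reversal per position. After position i has been processed, all three genomes agree there, and later reversals only touch smaller positions, so the returned G1 agrees with the transformed G2 and G3 everywhere.

```python
def majority(b1, b2, b3):
    """1 if at least two of the three bits are 1, else 0."""
    return 1 if b1 + b2 + b3 >= 2 else 0


def median_O(g1, g2, g3):
    """Procedure median_O on bit vectors; returns (transformed G1, number of reversals d)."""
    gs = [list(g1), list(g2), list(g3)]  # work on copies
    d = 0
    for i in range(len(gs[0]) - 1, -1, -1):  # 0-based i is position i + 1, from n down to 1
        b = majority(gs[0][i], gs[1][i], gs[2][i])
        for g in gs:                      # at most one genome can disagree with b
            if g[i] != b:
                g[:i + 1] = [1 - x for x in g[:i + 1]]  # G_j := G_j rho(i + 1)
                d += 1
    return gs[0], d


print(median_O([1, 0, 1], [1, 1, 1], [0, 0, 1]))  # ([0, 0, 1], 2)
```

In this example one reversal is applied to G2 at position 2 and one to G1 at position 1.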

SLIDE 21

Around Origin Only: Example (slides 21 to 26)

[Figure sequence: three circular genomes G1, G2, and G3 on the genes 1 to 5, each with origin O and terminus T, shown with their bit vectors; the successive slides show the genomes and bit vectors as median_O processes the positions]
SLIDE 27

With Common Bit

procedure median_cb(G1, G2, G3)
  determine a common bit i of G1, G2, and G3
  (Gl, dl) := median_O(G1[1..i − 1], G2[1..i − 1], G3[1..i − 1])
  (Gr, dr) := median_T(G1[i + 1..n], G2[i + 1..n], G3[i + 1..n])
  return (Gl G1[i] Gr, dl + dr)

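[Figure: the common bit i splits the genomes; median_O is applied to the part between the origin O and position i, median_T to the part between position i and the terminus T]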
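A Python sketch of median_cb. It re-declares median_O so that it runs on its own, takes the first common bit it finds, and assumes (as with rd_T) that median_T is the mirror image of median_O, i.e., median_O applied to the reversed bit vectors.

```python
def median_O(g1, g2, g3):
    """Procedure median_O on bit vectors; returns (transformed G1, number of reversals)."""
    gs = [list(g1), list(g2), list(g3)]
    d = 0
    for i in range(len(gs[0]) - 1, -1, -1):
        b = 1 if gs[0][i] + gs[1][i] + gs[2][i] >= 2 else 0  # majority
        for g in gs:
            if g[i] != b:
                g[:i + 1] = [1 - x for x in g[:i + 1]]  # rho(i + 1)
                d += 1
    return gs[0], d


def median_T(g1, g2, g3):
    """Assumed mirror image of median_O: reversals around the terminus."""
    m, d = median_O(list(g1)[::-1], list(g2)[::-1], list(g3)[::-1])
    return m[::-1], d


def median_cb(g1, g2, g3):
    """Procedure median_cb: split at a common bit and solve both sides independently."""
    g1, g2, g3 = list(g1), list(g2), list(g3)
    # first (0-based) position where all three genomes carry the same bit;
    # raises StopIteration if there is no common bit
    i = next(k for k in range(len(g1)) if g1[k] == g2[k] == g3[k])
    gl, dl = median_O(g1[:i], g2[:i], g3[:i])
    gr, dr = median_T(g1[i + 1:], g2[i + 1:], g3[i + 1:])
    return gl + [g1[i]] + gr, dl + dr


print(median_cb([1, 0, 1], [1, 1, 1], [0, 0, 1]))  # ([0, 0, 1], 2)
```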

SLIDE 28

Without Common Bit

procedure median_ncb(G1, G2, G3)
  if two genomes coincide, say Gi = Gj with i ≠ j then
    return (Gi, 1)
  else if one of the genomes is the inverse of another, say Gi = inv(Gj) with i ≠ j then
    return (Gi, 1 + rd(Gi, Gk)) where k ∈ {1, 2, 3} \ {i, j}
  else   /⋆ Gi ≠ Gj and Gi ≠ inv(Gj) for all i ≠ j ⋆/
    (G′, d′) := median_cb(inv(G1), G2, G3)
    d′_1 := rd(inv(G1), G2) + rd(inv(G1), G3)
    if d′_1 = d′ then return (G1, d′)
    else return (G′, d′)
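Putting the pieces together, here is a self-contained Python sketch of the whole median computation on bit vectors. Assumptions beyond the slides: rd_O is evaluated in closed form (number of breakpoints, plus one if the last position still differs, which is exactly what applying ρ(i) at every breakpoint leaves behind), rd_T and median_T mirror their origin counterparts, inv complements every bit (the whole genome reversed around the origin), and the driver `median`, which calls median_cb when a common bit exists and median_ncb otherwise, is a hypothetical wrapper.

```python
from itertools import combinations


def rd_O(g, h):
    """Origin-only distance: #breakpoints, plus 1 if the last position still differs."""
    d = [a ^ b for a, b in zip(g, h)]
    k = sum(d[i] != d[i + 1] for i in range(len(d) - 1))
    return k + (1 if d and d[-1] == 1 else 0)


def rd_T(g, h):
    """Terminus-only distance, assumed to mirror rd_O."""
    return rd_O(g[::-1], h[::-1])


def rd(g, h):
    """Procedure rd: split at the first strip, rd_O on the left, rd_T on the right."""
    d = [a ^ b for a, b in zip(g, h)]
    if len(set(d)) <= 1:  # no breakpoint
        return 1 if d and d[0] == 1 else 0
    i = d.index(0)
    j = i
    while j + 1 < len(d) and d[j + 1] == 0:
        j += 1
    return rd_O(g[:i], h[:i]) + rd_T(g[j + 1:], h[j + 1:])


def median_O(g1, g2, g3):
    gs = [list(g1), list(g2), list(g3)]
    d = 0
    for i in range(len(gs[0]) - 1, -1, -1):
        b = 1 if gs[0][i] + gs[1][i] + gs[2][i] >= 2 else 0  # majority
        for g in gs:
            if g[i] != b:
                g[:i + 1] = [1 - x for x in g[:i + 1]]  # rho(i + 1)
                d += 1
    return gs[0], d


def median_T(g1, g2, g3):
    m, d = median_O(g1[::-1], g2[::-1], g3[::-1])
    return m[::-1], d


def common_bit(g1, g2, g3):
    """0-based index of the first position where all three bits agree, or None."""
    return next((k for k in range(len(g1)) if g1[k] == g2[k] == g3[k]), None)


def median_cb(g1, g2, g3):
    i = common_bit(g1, g2, g3)  # callers guarantee that a common bit exists
    gl, dl = median_O(g1[:i], g2[:i], g3[:i])
    gr, dr = median_T(g1[i + 1:], g2[i + 1:], g3[i + 1:])
    return gl + [g1[i]] + gr, dl + dr


def inv(g):
    """Complement every bit: the whole genome reversed around the origin."""
    return [1 - b for b in g]


def median_ncb(g1, g2, g3):
    gs = [list(g1), list(g2), list(g3)]
    for a, b in combinations(range(3), 2):
        if gs[a] == gs[b]:
            return gs[a], 1
    for a, b in combinations(range(3), 2):
        if gs[a] == inv(gs[b]):
            (c,) = {0, 1, 2} - {a, b}
            return gs[a], 1 + rd(gs[a], gs[c])
    g1_inv = inv(gs[0])  # inv(G1), G2, G3 now share a common bit
    g_new, d_new = median_cb(g1_inv, gs[1], gs[2])
    d1 = rd(g1_inv, gs[1]) + rd(g1_inv, gs[2])
    return (gs[0], d_new) if d1 == d_new else (g_new, d_new)


def median(g1, g2, g3):
    """Hypothetical driver: median_cb if a common bit exists, else median_ncb."""
    g1, g2, g3 = list(g1), list(g2), list(g3)
    if common_bit(g1, g2, g3) is not None:
        return median_cb(g1, g2, g3)
    return median_ncb(g1, g2, g3)


print(median([0, 0, 0], [1, 1, 0], [1, 0, 1]))  # ([1, 1, 0], 2)
```

The example call takes the last branch of median_ncb: the three genomes have no common bit and no equal or inverse pair. `median([1, 0, 1], [1, 1, 1], [0, 0, 1])` (common bit at position 3) and `median([0, 0], [1, 0], [0, 1])` (G2 is the inverse of G3) also return medians with distance sum 2.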

SLIDE 29

Summary

◮ the general median problem is NP-hard for both the reversal and the breakpoint distance
◮ circular bacterial genomes: reversals are centered around the origin/terminus of replication
◮ using this biological constraint leads to an O(n) algorithm

SLIDE 30

Future Work

◮ shortcomings:
  ◮ the position of the origin/terminus has to be known
  ◮ restriction to reversals
◮ future work:
  ◮ including transpositions
  ◮ including reversals affecting only one single gene (at any position) [Lefebvre, El-Mabrouk, Tillier, Sankoff 2003]

SLIDE 31

Thanks

◮ to my supervisor Prof. Ohlebusch
◮ to my colleagues Mohamed I. Abouelhoda and Jan Stallkamp
◮ to you for your attention
