slide-1
SLIDE 1

Group theoretic formalization of double-cut-and-join model of chromosomal rearrangement

Sangeeta Bhatia

PhD Supervisor: Prof. Andrew Francis

Centre for Research in Mathematics University of Western Sydney

7th November 2013

slide-2
SLIDE 2

Rare is better – large scale mutations

◮ Large-scale genome rearrangements, such as insertions or deletions of genes, gene duplications, and inversions of genes, make good phylogenetic markers precisely because they are rare.

◮ Our focus: determining a measure of difference between species based on such large-scale genome rearrangements.

◮ Our tool: algebra/group theory.

slide-5
SLIDE 5

An example – Double cut and join

◮ Genome representation – graph.

◮ Rearrangement events:

Inversion of a section

Translocation of a section

Fission/Fusion of strands

slide-9
SLIDE 9

Double-cut-and-join: genome representation

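◮ A “gene” or region has two extremities: a head and a tail.

◮ Store “adjacencies”, i.e. which gene extremities are adjacent in the genome.

◮ Example: a linear chromosome with genes 1, 3, 2 and a circular chromosome with genes 5, 4.

[Figure: linear chromosome 1t 1h, 3t 3h, 2t 2h; circular chromosome 5h, 4t 5t, 4h]

{1t, {1h, 3t}, {3h, 2t}, 2h, {5h, 4t}, {5t, 4h}}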
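The adjacency representation in the example above can be built mechanically from a list of chromosomes. The following is a minimal sketch, not the authors' code: the function name `adjacency_set`, the `(genes, is_circular)` input format, and the use of a negative number for a reversed gene are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical input format: each chromosome is (genes, is_circular).
Chromosome = Tuple[List[int], bool]


def adjacency_set(chromosomes: List[Chromosome]) -> Set[FrozenSet[str]]:
    """Return the adjacencies (2-element sets) and telomeres (1-element sets).

    A positive gene g is read tail-to-head (gt, gh); a negative gene -g is
    read head-to-tail (gh, gt).
    """
    result: Set[FrozenSet[str]] = set()
    for genes, is_circular in chromosomes:
        ends: List[str] = []
        for g in genes:
            tail, head = f"{abs(g)}t", f"{abs(g)}h"
            ends.extend([tail, head] if g > 0 else [head, tail])
        # Extremities of consecutive genes are adjacent.
        for k in range(1, len(ends) - 1, 2):
            result.add(frozenset({ends[k], ends[k + 1]}))
        if is_circular:
            # A circular chromosome closes up: last extremity meets the first.
            result.add(frozenset({ends[-1], ends[0]}))
        else:
            # A linear chromosome has two telomeres.
            result.add(frozenset({ends[0]}))
            result.add(frozenset({ends[-1]}))
    return result


# The genome of the example: linear chromosome 1, 3, 2 and circular chromosome 5, 4.
genome = adjacency_set([([1, 3, 2], False), ([5, 4], True)])
```

Using `frozenset` for each vertex reflects that an adjacency {1h, 3t} is unordered.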

slide-10
SLIDE 10

Double cut and join – the cut

Start: 1t 1h, 2t 2h, 3t 3h, 4t 4h

Cut: the adjacencies {1h, 2t} and {3h, 4t} are broken, leaving the pieces 1t 1h; 2t 2h, 3t 3h; 4t 4h.

slide-11
SLIDE 11

Double cut and join operation — inversion

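Cut: the adjacencies {1h, 2t} and {3h, 4t} are broken.

Join: {1h, 3h} and {2t, 4t}, giving 1t 1h, 3h 3t, 2h 2t, 4t 4h (the segment containing genes 2 and 3 is inverted).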

slide-12
SLIDE 12

Double cut and join operation — excision

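Cut: the adjacencies {1h, 2t} and {3h, 4t} are broken.

Join: {1h, 4t} and {3h, 2t}, giving the linear chromosome 1t 1h, 4t 4h and the excised circular chromosome 2t 2h, 3t 3h (closed by 3h, 2t).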

slide-13
SLIDE 13

Circularization/Linearization

Linear: 1t 1h, 2t 2h, 3t 3h, 4t 4h

Circular: 4h, 1t 1h, 2t 2h, 3t 3h, 4t (the telomeres 4h and 1t are joined)

slide-14
SLIDE 14

Fusion/Fission

Fused: 1t 1h, 2t 2h, 3t 3h, 4t 4h

Split: 1t 1h, 2t 2h and 3t 3h, 4t 4h (the adjacency {2h, 3t} is cut)
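The operations on the preceding slides (inversion, excision, circularization, fission) all follow one pattern: remove at most two vertices of the adjacency set and re-pair the freed extremities. Here is a rough sketch of that pattern; the helper names `vertex` and `dcj` are hypothetical, and the validity checks are a simplification chosen for illustration.

```python
from typing import FrozenSet, Iterable, Set

Vertex = FrozenSet[str]  # an adjacency {a, b} or a telomere {a}


def vertex(*extremities: str) -> Vertex:
    return frozenset(extremities)


def dcj(genome: Set[Vertex], cut: Iterable[Vertex], join: Iterable[Vertex]) -> Set[Vertex]:
    """Replace the vertices in `cut` by those in `join` (illustrative helper)."""
    cut, join = list(cut), list(join)
    if len(cut) > 2 or not all(v in genome for v in cut):
        raise ValueError("cut at most two vertices that are present in the genome")
    # The operation may only re-pair the extremities it has freed.
    if sorted(e for v in cut for e in v) != sorted(e for v in join for e in v):
        raise ValueError("cut and join must involve the same extremities")
    return (genome - set(cut)) | set(join)


linear = {vertex("1t"), vertex("1h", "2t"), vertex("2h", "3t"),
          vertex("3h", "4t"), vertex("4h")}

# Inversion of the segment containing genes 2 and 3.
inverted = dcj(linear, [vertex("1h", "2t"), vertex("3h", "4t")],
               [vertex("1h", "3h"), vertex("2t", "4t")])

# Excision of the same segment as a circular chromosome.
excised = dcj(linear, [vertex("1h", "2t"), vertex("3h", "4t")],
              [vertex("1h", "4t"), vertex("3h", "2t")])

# Circularization joins the two telomeres; fission cuts one adjacency.
circular = dcj(linear, [vertex("1t"), vertex("4h")], [vertex("1t", "4h")])
split = dcj(linear, [vertex("2h", "3t")], [vertex("2h"), vertex("3t")])
```

The same two cuts lead to either an inversion or an excision; only the way the four freed extremities are re-paired differs.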

slide-15
SLIDE 15

Distance under the DCJ model – Adjacency graph

[Figure: adjacency graph AG(A, B)]

Vertices of genome A: 1h, 2t3t, 2h4t, 3h, 4h5t, 5h1t

Vertices of genome B: 1h2t, 4t3t, 2h3h, 1t4h, 5h5t

slide-17
SLIDE 17

DCJ operator — Our re-formulation

◮ We assign a numeric label to each gene extremity. Let i be a gene. Then

\[ i_t \mapsto 2i - 1, \qquad i_h \mapsto 2i \]

◮ Thus if there are n genes, we get 2n labels. Let us call this set X.

◮ A genome on n genes is a permutation π on the set X such that π(i) = j ⟺ π(j) = i.

slide-19
SLIDE 19

DCJ operator — Our re-formulation

◮ For example, for the genome {1t, (1h, 2h), 2t}, the labels are 1t → 1, 1h → 2, 2t → 3, 2h → 4, and it is encoded as

\[ \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 4 & 3 & 2 \end{pmatrix} \]
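The labelling and the encoding of a genome as a permutation can be sketched in a few lines. This is only an illustration under assumed names (`label`, `genome_permutation`, `is_genome`); extremities are written as strings such as "1h", and the permutation is stored as a dictionary.

```python
from typing import Dict, Iterable, Tuple

Perm = Dict[int, int]


def label(extremity: str) -> int:
    """Map the tail 'it' to 2i - 1 and the head 'ih' to 2i."""
    gene, end = int(extremity[:-1]), extremity[-1]
    if end not in ("t", "h"):
        raise ValueError(f"not a gene extremity: {extremity!r}")
    return 2 * gene - 1 if end == "t" else 2 * gene


def genome_permutation(n: int, adjacencies: Iterable[Tuple[str, str]]) -> Perm:
    """Encode a genome on n genes; extremities not listed stay fixed (telomeres)."""
    pi = {x: x for x in range(1, 2 * n + 1)}
    for a, b in adjacencies:
        i, j = label(a), label(b)
        pi[i], pi[j] = j, i
    return pi


def is_genome(pi: Perm) -> bool:
    """Check the defining property: pi(i) = j exactly when pi(j) = i."""
    return all(pi[pi[i]] == i for i in pi)


# The genome {1t, (1h, 2h), 2t}: one adjacency, two telomeres.
pi = genome_permutation(2, [("1h", "2h")])
```

Here `pi` maps 1, 2, 3, 4 to 1, 4, 3, 2, matching the two-row notation above.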

slide-22
SLIDE 22

DCJ operator — Our re-formulation

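For i, j ∈ X,

\[
D_{ij}(\pi) =
\begin{cases}
(i\ j)\,\pi\,(i\ j) & \text{if } \pi = \dots (k\ i)(l\ j) \text{ and } k \neq i \text{ or } j \neq l, \\
(i\ j)\,\pi & \text{if } i \text{ and } j \text{ are fixed in } \pi \text{ or } \pi = \dots (i\ j),
\end{cases}
\]

where a fixed point i is written as the trivial cycle (i i).

◮ Clearly, Dij = Dji.

◮ Also, Dij² is the identity.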
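To make the two cases concrete, here is a small sketch of the operator acting on a genome stored as a dictionary. The names `from_pairs` and `dcj_operator` are hypothetical. One assumption is made where the slide leaves room for interpretation: when π already contains (i j), the second case (cutting the adjacency) is applied rather than the conjugation.

```python
from typing import Dict, Iterable, Tuple

Perm = Dict[int, int]


def from_pairs(size: int, pairs: Iterable[Tuple[int, int]]) -> Perm:
    """Build an involution on 1..size from its 2-cycles."""
    pi = {x: x for x in range(1, size + 1)}
    for i, j in pairs:
        pi[i], pi[j] = j, i
    return pi


def dcj_operator(pi: Perm, i: int, j: int) -> Perm:
    """Apply D_ij to the genomic permutation pi (illustrative sketch)."""
    if i == j:
        raise ValueError("i and j must be distinct")
    out = dict(pi)
    if pi[i] == j:
        # pi contains (i j): (i j)pi cuts the adjacency, leaving two telomeres.
        out[i], out[j] = i, j
        return out
    if pi[i] == i and pi[j] == j:
        # Both fixed: (i j)pi joins the two telomeres.
        out[i], out[j] = j, i
        return out
    # Otherwise conjugate by (i j), i.e. exchange the roles of i and j.
    def swap(x: int) -> int:
        return j if x == i else i if x == j else x
    return {swap(x): swap(y) for x, y in pi.items()}


pi_a = from_pairs(8, [(1, 8), (2, 3), (4, 5), (6, 7)])
step = dcj_operator(pi_a, 2, 8)  # the 2-cycles become (1, 2), (8, 3)
```

Because conjugation and multiplication by (i j) are both undone by repeating them, applying `dcj_operator` twice with the same i, j returns the original genome.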

slide-23
SLIDE 23

KEY RESULTS

slide-26
SLIDE 26

Key result # 1 – Structure of the group of Dijs

◮ Let Γn be the set of genomic permutations on n regions. Dij is a bijection on Γn.

◮ Let D be the subgroup of S_Γn (the symmetric group on Γn) generated by the Dij operators. Let the cardinality of Γn be γ. If γ/2 is even, then D is the alternating group of degree γ. Otherwise it is the symmetric group of degree γ.

◮ Conjecture: γ/2 is even ∀ n > 2.
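The conjecture can be checked numerically for small n. The sketch below assumes that Γn consists of all permutations π of the 2n labels with π(i) = j ⟺ π(j) = i, including the identity (every extremity a telomere), so that γ is the number of involutions on 2n points. It uses the standard recurrence T(m) = T(m−1) + (m−1)·T(m−2) for that count; the function name `gamma` is made up for the example.

```python
def gamma(n: int) -> int:
    """Number of permutations of 2n points that square to the identity."""
    prev, cur = 1, 1  # involution counts on 0 points and on 1 point
    for m in range(2, 2 * n + 1):
        # Point m is either fixed, or paired with one of the other m - 1 points.
        prev, cur = cur, cur + (m - 1) * prev
    return cur


for n in range(1, 9):
    g = gamma(n)
    parity = "even" if (g // 2) % 2 == 0 else "odd"
    print(f"n = {n}: gamma = {g}, gamma/2 is {parity}")
```

Under this assumption γ is 2, 10, 76, 764 for n = 1, 2, 3, 4, so γ/2 is odd for n = 1, 2 and even for n = 3, 4. A finite check of this kind supports the conjecture but does not prove it.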

slide-27
SLIDE 27

Key result # 2 – Characterization of cycles and paths of AG(A, B)

Theorem

Let A and B be genomes and let α be a k-cycle in the product πAπB. If α contains a point that is fixed in πA or πB, then the extremities in α form a path of length k in AG(A, B). If α does not contain any point that is fixed in πA or πB, then let β be the cycle in πAπB that contains πB(i) for any i ∈ α. Then αβ is a cycle in AG(A, B).

slide-28
SLIDE 28

Characterization of cycles and paths of AG(A, B) – example

πA = (1, 10)(2)(3, 5)(4, 7)(6)(8, 9)

πB = (1, 8)(2, 3)(4, 6)(5, 7)(9, 10)

Genome A: 1h ↔ (2), 2t3t ↔ (3, 5), 2h4t ↔ (4, 7), 3h ↔ (6), 4h5t ↔ (8, 9), 5h1t ↔ (1, 10)

Genome B: 1h2t ↔ (2, 3), 4t3t ↔ (5, 7), 2h3h ↔ (6, 4), 1t4h ↔ (1, 8), 5h5t ↔ (10, 9)

πA πB = (1, 9)(8, 10)(2, 5, 4, 6, 7, 3)
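The product in this example can be recomputed directly. In the sketch below, the product πAπB is read as "apply πB first, then πA", which is the convention that reproduces the cycles shown on the slide; the helper names `from_cycles`, `compose` and `nontrivial_cycles` are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Perm = Dict[int, int]


def from_cycles(size: int, cycles: Iterable[Tuple[int, ...]]) -> Perm:
    """Build a permutation of 1..size from disjoint cycles."""
    perm = {x: x for x in range(1, size + 1)}
    for cyc in cycles:
        for k, x in enumerate(cyc):
            perm[x] = cyc[(k + 1) % len(cyc)]
    return perm


def compose(p: Perm, q: Perm) -> Perm:
    """Return the product p q, applying q first and then p."""
    return {x: p[q[x]] for x in q}


def nontrivial_cycles(p: Perm) -> List[Tuple[int, ...]]:
    """List the cycles of length > 1, each starting at its smallest point."""
    seen, out = set(), []
    for start in sorted(p):
        cyc, x = [], start
        while x not in seen:
            seen.add(x)
            cyc.append(x)
            x = p[x]
        if len(cyc) > 1:
            out.append(tuple(cyc))
    return out


pi_A = from_cycles(10, [(1, 10), (3, 5), (4, 7), (8, 9)])
pi_B = from_cycles(10, [(1, 8), (2, 3), (4, 6), (5, 7), (9, 10)])
cycles = nontrivial_cycles(compose(pi_A, pi_B))
```

The 6-cycle contains the points 2 and 6, which are fixed in πA, so by the theorem it corresponds to a path in AG(A, B); the cycles (1, 9) and (8, 10) contain no fixed points and together form a cycle of AG(A, B).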

slide-35
SLIDE 35

Key result # 3 – DCJ Distance

\[ d_{DCJ}(\pi_A, \pi_B) = \frac{l(\pi_A \pi_B)}{2} + \frac{E}{2} \]

where l(πAπB) is the length of πAπB and E is the number of cycles in πAπB that move two fixed points of πA or of πB.
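The formula can be tried on the two pairs of genomes that appear in these slides. The sketch below rests on one assumption that the slide does not spell out: the length l of a permutation is taken to be the minimum number of transpositions needed to write it, that is, the sum of (cycle length − 1) over its cycles. The function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Perm = Dict[int, int]


def from_pairs(size: int, pairs: Iterable[Tuple[int, int]]) -> Perm:
    """Build an involution on 1..size from its 2-cycles."""
    pi = {x: x for x in range(1, size + 1)}
    for i, j in pairs:
        pi[i], pi[j] = j, i
    return pi


def dcj_distance(pi_a: Perm, pi_b: Perm) -> int:
    """Evaluate l/2 + E/2 for the product pi_a pi_b (pi_b applied first)."""
    product = {x: pi_a[pi_b[x]] for x in pi_b}
    fixed_a = {x for x in pi_a if pi_a[x] == x}
    fixed_b = {x for x in pi_b if pi_b[x] == x}
    length, e, seen = 0, 0, set()
    for start in product:
        if start in seen:
            continue
        cyc, x = set(), start
        while x not in cyc:
            cyc.add(x)
            x = product[x]
        seen |= cyc
        length += len(cyc) - 1  # assumed meaning of "length"
        if len(cyc & fixed_a) == 2 or len(cyc & fixed_b) == 2:
            e += 1
    half, remainder = divmod(length + e, 2)
    if remainder:
        raise ValueError("l + E is odd; inputs may not be genomes on the same labels")
    return half


pi_A = from_pairs(10, [(1, 10), (3, 5), (4, 7), (8, 9)])
pi_B = from_pairs(10, [(1, 8), (2, 3), (4, 6), (5, 7), (9, 10)])
pi_a = from_pairs(8, [(1, 8), (2, 3), (4, 5), (6, 7)])
pi_b = from_pairs(8, [(1, 2), (3, 4), (5, 6), (7, 8)])
```

For πA, πB from the adjacency-graph example, l = 7 and E = 1, so the distance is 4. For the pair πa = (1, 8)(2, 3)(4, 5)(6, 7), πb = (1, 2)(3, 4)(5, 6)(7, 8), l = 6 and E = 0, so the distance is 3.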

slide-36
SLIDE 36

Key result # 4 – Number of sorting scenarios

Let πA and πB be genomic permutations on n regions such that πBπA encodes a cycle in the adjacency graph AG(A, B). Then the number of optimal sorting scenarios between πA and πB is n^{n−2}.
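The count n^{n−2} is the same number that appears in a classical fact about permutations: an n-cycle can be written as a product of n − 1 transpositions in exactly n^{n−2} ways. The brute-force sketch below checks that fact for small n. It is offered only as a sanity check of the number, under the assumption that optimal scenarios correspond to such factorizations; the function name is made up.

```python
from itertools import combinations, product


def count_minimal_factorizations(n: int) -> int:
    """Count words of n - 1 transpositions whose product is a fixed n-cycle."""
    points = range(n)
    target = tuple((x + 1) % n for x in points)  # the cycle 0 -> 1 -> ... -> n-1 -> 0
    transpositions = list(combinations(points, 2))
    count = 0
    for word in product(transpositions, repeat=n - 1):
        perm = list(points)  # perm[x] is the current image of x
        for a, b in word:
            # Follow the permutation built so far by the transposition (a b).
            perm = [b if y == a else a if y == b else y for y in perm]
        if tuple(perm) == target:
            count += 1
    return count


for n in (3, 4, 5):
    print(n, count_minimal_factorizations(n), n ** (n - 2))
```

The search space grows as (n(n−1)/2)^(n−1), so this direct enumeration is only practical for very small n.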

slide-42
SLIDE 42

An example

Let πa = (1, 8)(2, 3)(4, 5)(6, 7), πb = (1, 2)(3, 4)(5, 6)(7, 8)

D28(πa) = (1, 2)(8, 3)(4, 5)(6, 7)

D48D28(πa) = (1, 2)(4, 3)(8, 5)(6, 7)

D68D48D28(πa) = (1, 2)(3, 4)(5, 6)(7, 8)

D68D48D28(πa) = (6, 8)(4, 8)(2, 8)πa(2, 8)(4, 8)(6, 8)

Equivalent factorizations into three transpositions:

(6, 8)(4, 8)(2, 8) = (6, 8)(2, 8)(2, 4) = (6, 8)(2, 4)(4, 8)

(4, 6)(2, 6)(6, 8) = (4, 6)(2, 8)(2, 6) = (4, 6)(6, 8)(2, 8)

(2, 4)(6, 8)(4, 8) = (2, 4)(4, 6)(6, 8) = (2, 4)(4, 8)(4, 6)

(2, 8)(2, 4)(4, 6) = (2, 8)(2, 6)(2, 4) = (2, 8)(4, 6)(2, 6)

(2, 6)(2, 4)(6, 8) = (2, 6)(6, 8)(2, 4)

(4, 8)(2, 8)(4, 6) = (4, 8)(4, 6)(2, 8)
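The scenarios of this example can be enumerated by brute force. The sketch below stores a genome as a set of adjacencies and applies each step as a conjugation by (i j), which simply exchanges the labels i and j. Following the list above, it only tries transpositions of the even labels 2, 4, 6, 8; that restriction and the names `relabel` and `scenarios` are choices made for the illustration.

```python
from itertools import combinations, product
from typing import FrozenSet

Genome = FrozenSet[FrozenSet[int]]


def relabel(genome: Genome, i: int, j: int) -> Genome:
    """Conjugate by (i j): exchange the labels i and j in every adjacency."""
    swap = {i: j, j: i}
    return frozenset(frozenset(swap.get(x, x) for x in adj) for adj in genome)


pi_a: Genome = frozenset(map(frozenset, [(1, 8), (2, 3), (4, 5), (6, 7)]))
pi_b: Genome = frozenset(map(frozenset, [(1, 2), (3, 4), (5, 6), (7, 8)]))

steps = list(combinations([2, 4, 6, 8], 2))  # the six candidate transpositions
scenarios = []
for word in product(steps, repeat=3):
    genome = pi_a
    for i, j in word:  # apply the three steps in order
        genome = relabel(genome, i, j)
    if genome == pi_b:
        scenarios.append(word)

print(len(scenarios))
```

The search finds 16 three-step sequences, in agreement with n^{n−2} for n = 4, and the sequence D28, D48, D68 used above is one of them.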

slide-43
SLIDE 43

To summarize

Genomes

Graphs, comparison graphs

Permutations, group theory

Distance between genomes

slide-46
SLIDE 46

Future work

◮ Of particular interest: evolution of mitochondrial DNA, which is circular.

◮ Model important rearrangement events in circular chromosomes.

◮ A translocation event, i.e. movement of a section of the genome to a different location in the genome, can be modeled as a combination of two double-cut-and-join events.

◮ Determine DCJ distance when the different events carry weights/probabilities.

Thank you!