Exact Computation of Graph Edit Distance for Uniform and Non-Uniform Metric Edit Costs

SLIDE 1


Background DF-GEDu CSI_GEDnu Experiments Conclusions and Future Work References

Exact Computation of Graph Edit Distance for Uniform and Non-Uniform Metric Edit Costs

David B. Blumenthal & Johann Gamper GbRPR, Anacapri, 18 May 2017

  • D. B. Blumenthal & J. Gamper: Exact Computation of Graph Edit Distance for Uniform and Non-Uniform Metric Edit Costs

SLIDE 2

Overview

▶ graph edit distance: flexible distance measure for labelled graphs
▶ supports uniform and non-uniform edit costs
▶ exact computation is NP-hard
▶ existing exact algorithms
  ▶ A⋆-GED (Riesen, Fankhauser, and Bunke 2007)
  ▶ BLP-GED (Lerouge et al. 2016)
  ▶ DF-GED: node-based DFS, designed for non-uniform edit costs (Abu-Aisheh et al. 2015)
  ▶ CSI_GED: edge-based DFS, supports uniform edit costs only (Gouda and Hassaan 2016)
▶ contributions
  (1) DF-GEDu: speed-up of DF-GED for uniform edit costs
  (2) CSI_GEDnu: generalised version of CSI_GED that supports non-uniform edit costs

SLIDE 3

Two Communities

▶ Pattern Recognition
▶ Database Technologies
  ▶ a lot of work on graph edit distance exists
  ▶ publications in venues such as VLDB, ICDE, SIGMOD, TKDE, CIKM
  ▶ main focus: filtering and lower bounds
  ▶ slightly different definitions
  ▶ main difference: restriction to uniform edit costs

SLIDE 4

Graph Edit Distance

▶ labelled undirected graph: 4-tuple $G = (V^G, E^G, \ell^G_V, \ell^G_E)$
▶ label functions: $\ell^G_V : V^G \to \Sigma_V$ for nodes, $\ell^G_E : E^G \to \Sigma_E$ for edges
▶ edit path between $G$ and $H$: sequence of edit operations starting at $G$ and ending at $H' \simeq H$
▶ edit operations: deleting, inserting, relabelling
▶ edit costs: $c_V : \Sigma_V \times \Sigma_V \to \mathbb{R}$ for operations on nodes, $c_E : \Sigma_E \times \Sigma_E \to \mathbb{R}$ for operations on edges
▶ uniform edit costs: $c_V(\alpha, \beta),\, c_E(\alpha, \beta) = \begin{cases} 1 & \alpha \neq \beta \\ 0 & \alpha = \beta \end{cases}$
▶ graph edit distance $\lambda(G, H)$: minimum cost of edit path between $G$ and $H$

SLIDE 5

Node Maps

▶ $V^{G+|H|}$: $V^G$ plus $|V^H|$ isolated dummy nodes
▶ node map: injective partial function $\pi : V^{G+|H|} \to V^{H+|G|}$ with $V^G \subseteq \mathrm{dom}(\pi)$ and $V^H \subseteq \mathrm{img}(\pi)$
▶ edit path induced by node map: let $i \in V^G$, $k \in V^H$, $ij \in E^G$, $kl \in E^H$
  ▶ $\pi(i) = k$ ⇝ change node label from $\ell^G_V(i)$ to $\ell^H_V(k)$
  ▶ $\pi(i) = k_\varepsilon$ ⇝ delete node $i$
  ▶ $\pi^{-1}(k) = i_\varepsilon$ ⇝ insert node $k$
  ▶ $\pi(i)\pi(j) = kl$ ⇝ change edge label from $\ell^G_E(ij)$ to $\ell^H_E(kl)$
  ▶ $\pi(i)\pi(j) \notin E^H$ ⇝ delete edge $ij$
  ▶ $\pi^{-1}(k)\pi^{-1}(l) \notin E^G$ ⇝ insert edge $kl$
▶ alternative definition of $\lambda(G, H)$: minimum cost $g(\pi)$ of edit path induced by a node map $\pi$
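
The edit path induced by a node map can be made concrete with a short Python sketch. It assumes uniform edit costs by default, graphs without self-loops stored as dictionaries, and `None` standing in for the dummy nodes and dummy labels; the function names and the data layout are illustrative choices, not taken from the slides.

```python
def uniform_cost(a, b):
    """Uniform edit cost: 0 if the labels agree, 1 otherwise (None = dummy label)."""
    return 0 if a == b else 1


def induced_cost(g_nodes, g_edges, h_nodes, h_edges, node_map,
                 c_v=uniform_cost, c_e=uniform_cost):
    """Cost g(pi) of the edit path induced by a complete node map.

    g_nodes, h_nodes: dict node -> label
    g_edges, h_edges: dict frozenset({u, v}) -> label (undirected, no self-loops)
    node_map: dict sending every node of G to a node of H, or to None (deletion);
              it is assumed to be injective on the non-None targets.
    """
    cost = 0
    # Node relabellings and node deletions.
    for i, label in g_nodes.items():
        k = node_map[i]
        cost += c_v(label, h_nodes[k] if k is not None else None)
    # Node insertions: nodes of H that are not the image of any node of G.
    used = {k for k in node_map.values() if k is not None}
    for k, label in h_nodes.items():
        if k not in used:
            cost += c_v(None, label)
    # Edge relabellings and edge deletions.
    covered = set()
    for edge, label in g_edges.items():
        i, j = tuple(edge)
        k, l = node_map[i], node_map[j]
        image = frozenset((k, l)) if k is not None and l is not None else None
        if image in h_edges:
            cost += c_e(label, h_edges[image])
            covered.add(image)
        else:
            cost += c_e(label, None)
    # Edge insertions: edges of H that no edge of G was mapped onto.
    for edge, label in h_edges.items():
        if edge not in covered:
            cost += c_e(None, label)
    return cost
```

Minimising `induced_cost` over all node maps would give $\lambda(G, H)$ by the alternative definition above.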

SLIDE 6

DF-GED: Node-Based DFS

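[Figure: DFS search tree over node maps. The root is the empty node map $\emptyset$. Each level assigns one node of the heuristically sorted $V^G$ ($i_1, i_2, i_3, i_4$) to a node of $V^H$ ($j_1, j_2, j_3$) or to the dummy node $j_\varepsilon$. Inner nodes = incomplete node maps, e.g. $\pi = \{i_1 \to j_1\}$ and $\pi = \{i_1 \to j_1, i_2 \to j_\varepsilon\}$; leafs = complete node maps ⇝ $UB = g(\pi)$. Children are compared with the upper bound: $g(\pi_1) + h(\pi_1) \leq UB$, $g(\pi_2) + h(\pi_2) \leq UB$, $g(\pi_3) + h(\pi_3) > UB$.]

▶ $g(\pi)$: cost of partial edit path induced by $\pi$
▶ $h(\pi)$: lower bound for induced cost from $\pi$ to a leaf, i.e., complete node map rooted at $\pi$ ⇝ has to be computed at each inner node of the DFS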
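
The pruning rule from the search tree can be sketched as a small depth-first branch and bound in Python. This is not the authors' DF-GED implementation: it assumes uniform edit costs, processes $V^G$ in dictionary order rather than heuristically sorted, and uses a deliberately weak size-based lower bound in place of $h(\pi)$. The name `ged_dfs` and the graph layout (labels must not be `None`, edges are `frozenset` pairs) are assumptions made for illustration.

```python
def ged_dfs(g_nodes, g_edges, h_nodes, h_edges):
    """Exact GED under uniform costs via depth-first search over node maps."""
    order = list(g_nodes)      # processing order of V^G (not heuristically sorted)
    best = float("inf")        # current upper bound UB

    def pair_edge_cost(i, k, pi):
        # Edge operations between the new assignment i -> k and earlier ones.
        cost = 0
        for j, l in pi.items():
            eg = g_edges.get(frozenset((i, j)))
            eh = None
            if k is not None and l is not None:
                eh = h_edges.get(frozenset((k, l)))
            if eg is not None and eh is not None:
                cost += eg != eh          # relabel edge
            elif eg is not None or eh is not None:
                cost += 1                 # delete or insert edge
        return cost

    def leaf_cost(pi):
        # Insert every unmatched node of H and every edge touching one.
        unused = set(h_nodes) - set(pi.values())
        return len(unused) + sum(1 for e in h_edges if e & unused)

    def search(depth, pi, g):
        nonlocal best
        if depth == len(order):           # leaf: complete node map
            best = min(best, g + leaf_cost(pi))
            return
        i = order[depth]
        used = {k for k in pi.values() if k is not None}
        for k in [k for k in h_nodes if k not in used] + [None]:
            node_cost = 1 if k is None else int(g_nodes[i] != h_nodes[k])
            g_new = g + node_cost + pair_edge_cost(i, k, pi)
            remaining = len(order) - depth - 1
            unused = len(h_nodes) - len(used) - (k is not None)
            h = abs(remaining - unused)   # weak stand-in for h(pi)
            if g_new + h < best:          # otherwise prune this subtree
                pi[i] = k
                search(depth + 1, pi, g_new)
                del pi[i]

    search(0, {}, 0)
    return best
```

The stand-in bound only counts the node insertions or deletions forced by the size difference of the unassigned node sets, so it prunes far less than a label-aware bound would.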

SLIDE 7

Our Speed-Up DF-GEDu for Uniform Edit Costs

▶ $h(\pi)$: defined as
  $$h(\pi) = \mathrm{MLA}\big(\ell^G_V(V^{G+|H|} - \pi) \times \ell^H_V(V^{H+|G|} - \pi),\, c_V\big) + \mathrm{MLA}\big(\ell^G_E(E^G - \pi) \times \ell^H_E(E^H - \pi),\, c_E\big)$$
  where $\ell^G_V(V^{G+|H|} - \pi)$ is the multiset with unassigned labels from nodes in $V^{G+|H|}$ and $\ell^G_E(E^G - \pi)$ is the multiset with unassigned labels from edges in $E^G$
▶ computation for non-uniform edit costs requires cubic time

Lemma

For uniform edit costs, $h(\pi)$ can be computed in linear time.

1. at initialisation, sort node and edge labels
2. compute $\mathrm{MLA}(A \times B, c)$ as $\Gamma(A, B) = \max\{|A|, |B|\} - |A \cap B|$
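
The two steps of the lemma can be sketched in Python as follows. The function `gamma_sorted` assumes both label multisets were sorted once at initialisation and then needs a single linear merge pass; `gamma` is an unsorted reference version based on `collections.Counter`. Both names are hypothetical and only illustrate the formula for $\Gamma(A, B)$.

```python
from collections import Counter


def gamma_sorted(a_sorted, b_sorted):
    """Gamma(A, B) = max(|A|, |B|) - |A ∩ B| for two sorted label lists.

    The multiset intersection is counted in one merge pass, so the running
    time is linear in |A| + |B| once the lists are sorted.
    """
    i = j = common = 0
    while i < len(a_sorted) and j < len(b_sorted):
        if a_sorted[i] == b_sorted[j]:
            common += 1
            i += 1
            j += 1
        elif a_sorted[i] < b_sorted[j]:
            i += 1
        else:
            j += 1
    return max(len(a_sorted), len(b_sorted)) - common


def gamma(a, b):
    """Reference version for unsorted multisets, using Counter intersection."""
    common = sum((Counter(a) & Counter(b)).values())
    return max(len(a), len(b)) - common


# Sort once, as in step 1 of the lemma, then reuse the sorted lists.
labels_g = sorted(["a", "b", "a"])
labels_h = sorted(["c", "a"])
print(gamma_sorted(labels_g, labels_h))  # one shared "a": max(3, 2) - 1 = 2
```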

SLIDE 8

Valid Edge Maps (I)

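▶ $\overrightarrow{E^G}$: one oriented edge $(i, j)$ for each undirected $ij \in E^G$
▶ $\overleftrightarrow{E^H}$: both $(k, l)$ and $(l, k)$ for each $kl \in E^H$
▶ edge map: mapping $\varphi : \overrightarrow{E^G} \to \overleftrightarrow{E^H} \cup \{e_\varepsilon\}$
▶ induces relation $\pi_\varphi$ on $V^G \times V^H$: if $\varphi(i, j) = (k, l)$, then $(i, k) \in \pi_\varphi$ and $(j, l) \in \pi_\varphi$
▶ valid edge map: $\varphi$ is valid iff $\pi_\varphi$ is partial injective function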
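
A minimal Python sketch of the validity test, assuming an edge map is stored as a dictionary from oriented edges of $G$ to oriented edges of $H$, with `None` playing the role of the dummy edge $e_\varepsilon$. The function names are illustrative, and the sketch does not check that the images really are edges of $H$.

```python
def induced_node_relation(edge_map):
    """Relation pi_phi on V^G x V^H induced by an edge map."""
    relation = set()
    for (i, j), image in edge_map.items():
        if image is not None:            # None stands for the dummy edge
            k, l = image
            relation.add((i, k))
            relation.add((j, l))
    return relation


def is_valid_edge_map(edge_map):
    """True iff the induced relation is a partial injective function."""
    relation = induced_node_relation(edge_map)
    sources = {i for i, _ in relation}
    targets = {k for _, k in relation}
    # Function: no node of G is related to two nodes of H.
    # Injective: no node of H is related to two nodes of G.
    return len(sources) == len(relation) and len(targets) == len(relation)
```

For example, mapping $(1, 2)$ to $(a, b)$ and $(2, 3)$ to $(a, c)$ is rejected, because node $2$ would have to be sent to both $b$ and $a$.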

SLIDE 9

Valid Edge Maps (II)

▶ partial edit path induced by valid edge map: let $i \in V^G$, $k \in V^H$, $(i, j) \in \overrightarrow{E^G}$, $(k, l), (l, k) \in \overleftrightarrow{E^H}$
  ▶ $\varphi(i, j) = (l, k)$ ⇝ change edge label from $\ell^G_E(ij)$ to $\ell^H_E(kl)$
  ▶ $\varphi(i, j) = e_\varepsilon$ ⇝ delete edge $ij$
  ▶ $\varphi^{-1}[\{(k, l), (l, k)\}] = \emptyset$ ⇝ insert edge $kl$
  ▶ $\pi_\varphi(i) = k$ ⇝ change node label from $\ell^G_V(i)$ to $\ell^H_V(k)$

Theorem

$\lambda(G, H) = \min\{g(\varphi) + \Gamma(V^G - \pi_\varphi, V^H - \pi_\varphi) \mid \varphi \text{ is valid edge map}\}$ holds for uniform edit costs, where $g(\varphi)$ is the cost of the partial edit path induced by edge map $\varphi$.

▶ can compute $\lambda(G, H)$ by traversing space of all valid edge maps
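
To illustrate the theorem, the following Python sketch enumerates every edge map of two tiny graphs by brute force, keeps the valid ones, and returns the minimum of $g(\varphi) + \Gamma$. It is an exhaustive enumeration, not the pruned DFS of CSI_GED, and it assumes uniform edit costs and graphs without self-loops. The name `ged_via_edge_maps`, the use of `None` for the dummy edge, and the convention that each undirected edge is stored under one arbitrary orientation are assumptions.

```python
from collections import Counter
from itertools import product


def gamma(a, b):
    """Gamma(A, B) = max(|A|, |B|) - |A ∩ B| on label multisets."""
    return max(len(a), len(b)) - sum((Counter(a) & Counter(b)).values())


def ged_via_edge_maps(g_nodes, g_edges, h_nodes, h_edges):
    """Minimum of g(phi) + Gamma over all valid edge maps (uniform costs).

    g_edges, h_edges: dict (u, v) -> label, one orientation per undirected edge.
    """
    g_list = list(g_edges)
    targets = [None]                      # None = dummy edge
    for k, l in h_edges:
        targets += [(k, l), (l, k)]       # both orientations of each H edge
    best = float("inf")
    for images in product(targets, repeat=len(g_list)):
        # Build the induced node relation and reject invalid edge maps.
        pi, is_function = {}, True
        for (i, j), image in zip(g_list, images):
            if image is None:
                continue
            for src, dst in zip((i, j), image):
                if pi.setdefault(src, dst) != dst:
                    is_function = False
        mapped = set(pi.values())
        if not is_function or len(mapped) != len(pi):
            continue
        # g(phi): edge operations plus relabelling of the mapped nodes.
        cost, covered = 0, set()
        for (i, j), image in zip(g_list, images):
            if image is None:
                cost += 1                                  # delete edge ij
            else:
                key = image if image in h_edges else image[::-1]
                cost += g_edges[(i, j)] != h_edges[key]    # relabel edge
                covered.add(key)
        cost += len(h_edges) - len(covered)                # insert edges
        cost += sum(g_nodes[i] != h_nodes[k] for i, k in pi.items())
        # Gamma on the labels of the nodes not touched by pi_phi.
        rest_g = [lab for i, lab in g_nodes.items() if i not in pi]
        rest_h = [lab for k, lab in h_nodes.items() if k not in mapped]
        best = min(best, cost + gamma(rest_g, rest_h))
    return best
```

The number of edge maps grows exponentially with $|E^G|$, so this enumeration is only meant for checking the formula on very small inputs.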

SLIDE 10

CSI_GED: Edge-Based DFS

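[Figure: DFS search tree over edge maps. The root is the empty edge map $\emptyset$. Each level assigns one oriented edge of $\overrightarrow{E^G}$ ($(i_1, i_2), (i_1, i_4), (i_2, i_3), (i_3, i_4)$) to an edge of $\overleftrightarrow{E^H}$ ($(j_1, j_2), (j_2, j_1), (j_2, j_3), (j_3, j_2)$) or to the dummy edge $e_\varepsilon$. Inner nodes = incomplete edge maps, e.g. with $\pi_\varphi = \{i_1 \to j_1, i_2 \to j_2\}$; leafs = complete edge maps ⇝ $UB = g(\varphi) + \Gamma(V^G - \pi_\varphi, V^H - \pi_\varphi)$. Children are compared with the upper bound: $g'(\varphi_1) \leq UB$, $g'(\varphi_2) > UB$.]

▶ $g(\varphi)$: cost of partial edit path induced by $\varphi$
▶ $g'(\varphi)$: lower bound for induced cost of complete edge map rooted at $\varphi$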

SLIDE 11

Our Generalisation CSI_GEDnu

Theorem

$\lambda(G, H) = \min\{g(\varphi) + \mathrm{MLA}(\ell^G_V(V^{G+|H|} - \pi_\varphi) \times \ell^H_V(V^{H+|G|} - \pi_\varphi),\, c_V) \mid \varphi \text{ is valid edge map}\}$ holds for non-uniform metric edit costs.

▶ can use CSI_GED’s DFS framework for non-uniform edit costs
▶ at leafs, use MLA instead of $\Gamma$ to compute UB
▶ increased complexity at leafs (cubic instead of linear)
▶ no increased complexity at inner nodes of search tree
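
The leaf computation can be illustrated with a brute-force Python sketch. It reads MLA as the minimum-cost assignment between the two label multisets after padding each with dummy labels (`None`), which is an interpretation rather than a definition given on the slides. The permutation search is exponential and only suitable for tiny inputs; a cubic-time assignment solver such as the Hungarian algorithm would be used in practice. The cost function `capped_distance` is a made-up example of a non-uniform metric cost on numeric labels.

```python
from itertools import permutations


def mla_brute_force(a, b, cost):
    """Minimum-cost assignment between label multisets a and b.

    a is padded with |b| dummy labels and b with |a| dummy labels, mirroring
    V^{G+|H|} and V^{H+|G|}; cost(None, None) is expected to be 0.
    """
    a_pad = list(a) + [None] * len(b)
    b_pad = list(b) + [None] * len(a)
    return min(
        sum(cost(x, y) for x, y in zip(a_pad, perm))
        for perm in permutations(b_pad)
    )


def capped_distance(x, y):
    """Hypothetical non-uniform metric cost on numeric labels."""
    if x is None and y is None:
        return 0.0
    if x is None or y is None:
        return 1.0                      # insertion or deletion
    return min(abs(x - y), 2.0)         # substitution, never above delete + insert


def uniform(x, y):
    """Uniform cost, for comparison with Gamma."""
    return 0 if x == y else 1


print(mla_brute_force([0.0, 5.0], [0.5, 5.25, 9.0], capped_distance))  # 1.75
print(mla_brute_force(["a", "a", "b"], ["a", "c"], uniform))           # 2
```

In the second call the result agrees with $\Gamma(A, B) = \max\{3, 2\} - 1 = 2$, which is the uniform special case that the linear-time formula covers.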

SLIDE 12

Setup

▶ used the datasets AIDS and FINGERPRINTS (Riesen and Bunke 2008)
▶ formed groups of size four containing graphs of fixed size and ran all algorithms for all pairs of graphs in one test group
▶ set time limit of 1000 seconds
▶ recorded the runtime, the number of timeouts, and the deviation of an algorithm’s upper bound after 1000 seconds from the best upper bound

SLIDE 13

Results for Non-Uniform Metric Edit Costs

[Figure: for each dataset, three plots over the number of nodes (groups 3 ± 1, 9 ± 1, 15 ± 1, 21 ± 1): runtime in sec. (log scale from 10^-4 to 10^3), timeouts, and deviation in %. Algorithms compared: CSI_GEDnu and DF-GED.]

(a) Results for FINGERPRINTS

(b) Results for AIDS

SLIDE 14

Results for Uniform Edit Costs

[Figure: for each dataset, three plots over the number of nodes (groups 3 ± 1, 9 ± 1, 15 ± 1, 21 ± 1): runtime in sec. (log scale from 10^-4 to 10^3), timeouts, and deviation in %. Algorithms compared: CSI_GEDnu, CSI_GED, DF-GED, and DF-GEDu.]

(a) Results for FINGERPRINTS

(b) Results for AIDS

SLIDE 15

Upshot of the Results

▶ uniform edit costs
  ▶ our speed-up DF-GEDu always outperforms DF-GED
  ▶ CSI_GED and our generalisation CSI_GEDnu perform similarly
▶ general observation: no clear winner between node-based and edge-based algorithms
  ▶ FINGERPRINTS: DF-GED and DF-GEDu perform better
  ▶ AIDS: CSI_GEDnu and CSI_GED perform better
  ▶ CSI_GED and CSI_GEDnu are more stable than DF-GED and DF-GEDu: their deviation is small even if DF-GED and DF-GEDu perform better
▶ no prior knowledge about dataset and both uniform and non-uniform edit costs relevant ⇝ CSI_GEDnu is the algorithm of choice

SLIDE 16

Future Work

▶ identify characteristics of datasets for which the node-based or the edge-based approaches perform better
▶ develop meta-algorithm based on these characteristics
▶ combine techniques from both communities in order to come up with a significantly faster algorithm

SLIDE 17

References

Abu-Aisheh, Zeina et al. (2015). “An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems”. In: ICPRAM 2015. Ed. by Maria De Marsico, Mário A. T. Figueiredo, and Ana L. N. Fred. Vol. 1. SciTePress, pp. 271–278.

Gouda, Karam and Mosab Hassaan (2016). “CSI_GED: An Efficient Approach for Graph Edit Similarity Computation”. In: 32nd IEEE International Conference on Data Engineering. IEEE Computer Society, pp. 265–276.

Lerouge, Julien et al. (2016). “Exact Graph Edit Distance Computation Using a Binary Linear Program”. In: S+SSPR 2016. Vol. 10029. LNCS. Heidelberg: Springer, pp. 485–495.

Riesen, Kaspar and Horst Bunke (2008). “IAM Graph Database Repository for Graph Based Pattern Recognition and Machine Learning”. In: S+SSPR 2008. Ed. by Niels da Vitoria Lobo et al. Vol. 5342. LNCS. Springer, pp. 287–297.

Riesen, Kaspar, Stefan Fankhauser, and Horst Bunke (2007). “Speeding Up Graph Edit Distance Computation with a Bipartite Heuristic”. In: MLG 2007. Ed. by Paolo Frasconi, Kristian Kersting, and Koji Tsuda, pp. 21–24. URL: http://mlg07.dsi.unifi.it/pdf/02_Riesen.pdf.
