Graph Matching Networks for Learning the Similarity of Graph Structured Objects

SLIDE 1

Yujia Li, Chenjie Gu, Thomas Dullien*, Oriol Vinyals, Pushmeet Kohli

Graph Matching Networks for Learning the Similarity of Graph Structured Objects


SLIDE 2

Graph Matching Networks — Yujia Li

Graph structured data appear in many applications

  • Molecules
  • Scene Graphs*
  • Programs**
  • Binaries

Image credit: *Johnson et al., Image Retrieval using Scene Graphs. **Brockschmidt et al., Generative Code Modeling with Graphs.

SLIDE 3

Graph structured data appear in many applications

  • Molecules: Drug Discovery
  • Scene Graphs*: Semantic Image Retrieval
  • Programs**: Code Search
  • Binaries: Software Vulnerabilities

Image credit: *Johnson et al., Image Retrieval using Scene Graphs. **Brockschmidt et al., Generative Code Modeling with Graphs.

SLIDE 4

Finding similar graphs

Query Graph / Candidate Graphs

  • Nodes and edges can have attributes
  • The notion of “similarity” varies across problems
  • Reasoning about both the graph structure and the semantics
  • Graph structures vary a lot

SLIDE 5

The binary function similarity search problem

EXE

00000000: 7f45 4c46 0201 0100 .ELF....
00000008: 0000 0000 0000 0000 ........
00000010: 0300 3e00 0100 0000 ..>.....
00000018: 4005 0000 0000 0000 @.......
00000020: 4000 0000 0000 0000 @.......
00000028: 7819 0000 0000 0000 x.......
00000030: 0000 0000 4000 3800 ....@.8.
00000038: 0900 4000 1e00 1d00 ..@.....
00000040: 0600 0000 0400 0000 ........
00000048: 4000 0000 0000 0000 @.......
00000050: 4000 0000 0000 0000 @.......

contains vulnerability?

SLIDE 6

The binary function similarity search problem

EXE

contains vulnerability?

binary analysis

000000000000064a <f>:
 64a: 55             push %rbp
 64b: 48 89 e5       mov %rsp,%rbp
 64e: 89 7d fc       mov %edi,-0x4(%rbp)
 651: 83 7d fc 00    cmpl $0x0,-0x4(%rbp)
 655: 7e 09          jle 660 <f+0x16>
 657: 8b 45 fc       mov -0x4(%rbp),%eax
 65a: 0f af 45 fc    imul -0x4(%rbp),%eax
 65e: eb 06          jmp 666 <f+0x1c>
 660: 8b 45 fc       mov -0x4(%rbp),%eax
 663: 83 c0 01       add $0x1,%eax
 666: 5d             pop %rbp
 667: c3             retq

Basic blocks:

  • push %rbp; mov %rsp,%rbp; mov %edi,-0x4(%rbp); cmpl $0x0,-0x4(%rbp); jle 660 <f+0x16>
  • mov -0x4(%rbp),%eax; imul -0x4(%rbp),%eax; jmp 666 <f+0x1c>
  • mov -0x4(%rbp),%eax; add $0x1,%eax
  • pop %rbp; retq

graph sizes in our dataset: from 10 to 10^3
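The disassembly of `f` on this slide can be turned into a small control-flow graph by splitting it at jump targets and after jumps. The following is a minimal Python sketch under that assumption; the block boundaries are read off the addresses in the listing, and the names `BLOCKS`, `build_edges`, and `EDGES` are illustrative rather than taken from the talk.

```python
# Basic blocks of f, keyed by start address (read off the disassembly listing).
BLOCKS = {
    0x64a: ["push %rbp", "mov %rsp,%rbp", "mov %edi,-0x4(%rbp)",
            "cmpl $0x0,-0x4(%rbp)", "jle 660"],
    0x657: ["mov -0x4(%rbp),%eax", "imul -0x4(%rbp),%eax", "jmp 666"],
    0x660: ["mov -0x4(%rbp),%eax", "add $0x1,%eax"],
    0x666: ["pop %rbp", "retq"],
}


def build_edges(blocks):
    """Derive control-flow edges from the last instruction of each block."""
    starts = sorted(blocks)
    edges = []
    for idx, start in enumerate(starts):
        op, _, arg = blocks[start][-1].partition(" ")
        fallthrough = starts[idx + 1] if idx + 1 < len(starts) else None
        if op == "retq":
            continue  # function exit: no successors
        if op == "jmp":
            edges.append((start, int(arg, 16)))  # unconditional jump
        elif op.startswith("j"):
            edges.append((start, int(arg, 16)))  # conditional jump taken
            if fallthrough is not None:
                edges.append((start, fallthrough))  # conditional jump not taken
        elif fallthrough is not None:
            edges.append((start, fallthrough))  # plain fall-through
    return edges


EDGES = build_edges(BLOCKS)
for src, dst in EDGES:
    print(f"{src:#x} -> {dst:#x}")
```

The result is a graph with four nodes and four edges; each node carries its instruction list as an attribute, which is the kind of input a graph similarity model would consume.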

SLIDE 7

The binary function similarity search problem

EXE

contains vulnerability?

binary analysis

search in a library of binaries with known vulnerabilities: similar / not similar

SLIDE 9

Most existing approaches

Mostly hand-engineered algorithms / heuristics with limited learning:

Graph hashes (graph → descriptor): widely used in security applications

  • human-designed hash functions that encode graph structure
  • good at exact matches, not so good at estimating similarity

Graph kernels (pair of graphs → similarity): popular in various graph-level prediction tasks

  • human-designed kernels as a measure of similarity between graphs
  • the design of kernels is important for performance
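To illustrate why a hand-designed graph hash is good at exact matches but poor at estimating similarity, here is a minimal sketch of a Weisfeiler-Lehman style hash. The slide does not say which hash functions are used in practice, so this particular scheme, the function name `wl_graph_hash`, and the three toy graphs are assumptions made for illustration.

```python
import hashlib


def wl_graph_hash(adj, iterations=3):
    """Weisfeiler-Lehman style hash of an undirected graph.

    adj maps each node to a list of its neighbours.
    """
    # Start from node degrees, which do not depend on node names.
    labels = {v: str(len(adj[v])) for v in adj}
    for _ in range(iterations):
        new_labels = {}
        for v in adj:
            # Combine a node's label with the sorted labels of its neighbours.
            neigh = sorted(labels[u] for u in adj[v])
            payload = labels[v] + "|" + ",".join(neigh)
            new_labels[v] = hashlib.sha256(payload.encode()).hexdigest()[:16]
        labels = new_labels
    # The graph descriptor is a hash of the sorted multiset of node labels.
    summary = ",".join(sorted(labels.values()))
    return hashlib.sha256(summary.encode()).hexdigest()[:16]


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
relabeled_path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wl_graph_hash(path) == wl_graph_hash(relabeled_path))  # same structure
print(wl_graph_hash(path) == wl_graph_hash(star))            # different structure
```

The two copies of the path collide as intended, but the hash of the star is simply a different string: it says nothing about how far the star is from the path, which is the gap that learned similarity aims to fill.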

SLIDE 10

Different graph similarity estimation paradigms

Graph embedding

  • Graph → descriptor
  • Measure distance on descriptors
  • Fast hashing based retrieval

Graph matching

  • Compute distance jointly on the pair of graphs
  • More computation for better accuracy

SLIDE 11

Graph similarity learning

Learn a similarity (or distance) function

d(G1, G2) → small when G1, G2 are similar
d(G1, G3) → large when G1, G3 are not similar

SLIDE 12

Graph similarity learning

Learn a similarity (or distance) function

d(G1, G2) → small when G1, G2 are similar
d(G1, G3) → large when G1, G3 are not similar

Supervised learning on labeled pairs or triplets

  • Pairs: t = +1 ⇒ G1, G2 similar ⇒ d(G1, G2) ↓; t = -1 ⇒ G1, G2 not similar ⇒ d(G1, G2) ↑
  • Triplets: G1, G2 similar, G1, G3 not similar ⇒ d(G1, G2) ↓, d(G1, G3) ↑
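The slide states which way each distance should move but not the loss used to get there. One common way to express those targets is a pair of margin-based hinge losses; the sketch below assumes that choice, and the function names and the default `margin=1.0` are illustrative.

```python
def pair_loss(d, t, margin=1.0):
    """Hinge loss on a labeled pair.

    d is the distance d(G1, G2); t is +1 for a similar pair, -1 otherwise.
    The loss is zero once similar pairs have d <= 1 - margin and
    dissimilar pairs have d >= 1 + margin.
    """
    return max(0.0, margin - t * (1.0 - d))


def triplet_loss(d_pos, d_neg, margin=1.0):
    """Hinge loss on a triplet.

    d_pos is d(G1, G2) for the similar pair, d_neg is d(G1, G3) for the
    dissimilar pair. The loss is zero once d_neg exceeds d_pos by the margin.
    """
    return max(0.0, d_pos - d_neg + margin)


print(pair_loss(2.0, +1))      # similar pair that is far apart: penalized
print(pair_loss(3.0, -1))      # dissimilar pair that is far apart: no penalty
print(triplet_loss(2.0, 0.5))  # wrong ordering: penalized
```

Minimizing either loss by gradient descent lowers d(G1, G2) for similar graphs and raises it for dissimilar ones, matching the arrows on the slide.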

SLIDE 13

Learning graph embeddings with Graph Neural Nets

d(G1, G2) = Euclidean/Hamming distance(embed(G1), embed(G2))

SLIDE 14

Learning graph embeddings with Graph Neural Nets

d(G1, G2) = Euclidean/Hamming distance(embed(G1), embed(G2))

embed(G) = Input Graph → Message Passing → Aggregate over Graph

SLIDE 15

Graph embedding model details

  • Messages
  • Node updates
  • Aggregation: sum pooling, attention pooling, etc.
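The three stages named on this slide can be sketched in a few lines of plain Python. The extracted slide text does not include the formulas, so the concrete forms below are assumptions: a single tanh layer stands in for the message and node-update functions, weights are random rather than trained, and sum pooling is used for aggregation. All names (`message_passing_step`, `embed`, `path_graph`, `W_msg`, `W_upd`) are hypothetical.

```python
import math
import random


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def message_passing_step(h, edges, e_feat, W_msg, W_upd):
    """One propagation step. h maps node -> vector; edges are directed (j, i) pairs."""
    dim = len(next(iter(h.values())))
    incoming = {i: [0.0] * dim for i in h}
    for (j, i) in edges:
        # Message from j to i, computed from both endpoints and the edge feature.
        m = [math.tanh(v) for v in matvec(W_msg, h[i] + h[j] + e_feat[(j, i)])]
        incoming[i] = [a + b for a, b in zip(incoming[i], m)]
    # Node update from the current state and the summed incoming messages.
    return {i: [math.tanh(v) for v in matvec(W_upd, h[i] + incoming[i])] for i in h}


def embed(h, edges, e_feat, W_msg, W_upd, steps=3):
    for _ in range(steps):
        h = message_passing_step(h, edges, e_feat, W_msg, W_upd)
    dim = len(next(iter(h.values())))
    # Sum pooling over nodes gives one vector for the whole graph.
    return [sum(h[i][k] for i in h) for k in range(dim)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(0)
DIM = 4
W_msg = rand_matrix(DIM, 2 * DIM + 1, rng)  # input: h_i, h_j, 1-d edge feature
W_upd = rand_matrix(DIM, 2 * DIM, rng)      # input: h_i, summed messages


def path_graph(order):
    """Path over the nodes in `order`, edges in both directions, unit features."""
    edges = []
    for a, b in zip(order, order[1:]):
        edges += [(a, b), (b, a)]
    h0 = {v: [1.0] * DIM for v in order}
    e_feat = {e: [1.0] for e in edges}
    return h0, edges, e_feat


g1 = embed(*path_graph([0, 1, 2, 3]), W_msg, W_upd)
g2 = embed(*path_graph([3, 1, 0, 2]), W_msg, W_upd)  # same path, nodes renamed
print(euclidean(g1, g2))  # close to zero: the embedding ignores node names
```

Because messages are summed and the final pooling is a sum, renaming the nodes leaves the embedding unchanged up to floating-point rounding, so the distance between the two paths is essentially zero.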

SLIDE 16

Graph Matching Networks

h1, h2 = embed-and-match(G1, G2)
d(G1, G2) = Euclidean/Hamming distance(h1, h2)

SLIDE 17

Graph Matching Networks

h1, h2 = embed-and-match(G1, G2)
d(G1, G2) = Euclidean/Hamming distance(h1, h2)

  • Attention
  • Weighted difference

SLIDE 18

Graph Matching Networks

h1, h2 = embed-and-match(G1, G2)
d(G1, G2) = Euclidean/Hamming distance(h1, h2)

Total cross-graph message

Effectively: match node i to the closest node in the other graph and take the difference.
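The "match to the closest node and take the difference" step can be sketched as a softmax attention over the nodes of the other graph followed by a subtraction. The formulas are not present in the extracted slide text, so the sketch assumes dot-product similarity for the attention scores; `softmax` and `cross_graph_messages` are illustrative names.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_graph_messages(H1, H2):
    """For each node vector h_i in H1, return h_i - sum_j a_ij * h_j,
    where j ranges over the nodes of H2 and a_ij is a softmax attention weight."""
    out = []
    for h_i in H1:
        # Assumed similarity: dot product between node vectors.
        scores = [sum(a * b for a, b in zip(h_i, h_j)) for h_j in H2]
        attn = softmax(scores)
        # Attention-weighted average of the other graph's nodes (the soft match).
        matched = [sum(a * h_j[k] for a, h_j in zip(attn, H2)) for k in range(len(h_i))]
        out.append([x - y for x, y in zip(h_i, matched)])
    return out


# Every node in H1 has an exact counterpart in H2, in a different order.
H1 = [[10.0, 0.0], [0.0, 10.0]]
H2 = [[0.0, 10.0], [10.0, 0.0]]
print(cross_graph_messages(H1, H2))  # both messages are (almost) zero vectors

# No good counterpart: the message carries the mismatch.
print(cross_graph_messages([[10.0, 0.0]], [[0.0, 10.0]]))
```

When a node finds a near-identical partner, the attention concentrates on it and the message vanishes; when it does not, the residual is large, which gives the model a direct signal about where the two graphs differ.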

SLIDE 19

Other variants

Other variants of GNNs for embedding:

  • e.g. Graph Convolutional Networks (GCNs), a simpler variant that does not model edge features

Siamese networks:

  • instead of using Euclidean or Hamming distance, learn a distance score through a neural net
  • d(G1, G2) = MLP(concat(embed(G1), embed(G2)))
  • learn the embedding model and the scoring MLP jointly
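The Siamese scoring rule d(G1, G2) = MLP(concat(embed(G1), embed(G2))) can be sketched with a one-hidden-layer MLP. In this sketch the weights are set by hand so the arithmetic is easy to follow, whereas in the setup described above they would be learned jointly with the embedding model; `mlp_score`, `siamese_distance`, and `PARAMS` are hypothetical names.

```python
def mlp_score(x, W1, b1, w2, b2):
    """One hidden ReLU layer followed by a linear output unit."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def siamese_distance(e1, e2, params):
    """Score a pair of graph embeddings by feeding their concatenation to the MLP."""
    return mlp_score(e1 + e2, *params)


# Hand-set weights for 2-d embeddings (illustrative only, not trained).
PARAMS = (
    [[1.0, 0.0, -1.0, 0.0],   # hidden unit 0: relu(e1[0] - e2[0])
     [0.0, 1.0, 0.0, -1.0]],  # hidden unit 1: relu(e1[1] - e2[1])
    [0.0, 0.0],               # hidden biases
    [1.0, 1.0],               # output weights
    0.0,                      # output bias
)

print(siamese_distance([3.0, 1.0], [1.0, 2.0], PARAMS))
print(siamese_distance([1.0, 2.0], [3.0, 1.0], PARAMS))
```

The two calls give different scores for the same pair in swapped order, which shows one consequence of this design: unlike Euclidean or Hamming distance, a score learned on a concatenation is not guaranteed to be symmetric.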

SLIDE 20

  • Graph Matching → Similarity score
  • Graph Embedding → Similarity score
  • Siamese Network → Similarity score

SLIDE 21

Experiments

Graph edit distance learning

  • Data: synthetic graphs
  • Similarity: small edit distance → similar

Control-flow graph based binary function similarity search

  • Data: compile ffmpeg with different compilers and optimization levels
  • Similarity: binary functions associated with the same original function → similar

Mesh graph retrieval

  • Data: mesh graphs for 100 object classes (COIL-DEL dataset)
  • Similarity: mesh for the same object class → similar
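For the first experiment, training data with known relative edit distances can be generated without solving graph edit distance exactly. The sketch below shows one way to do that, assuming random graphs with n nodes and edge probability p, with variants created by swapping a few edges; the slides do not specify the generator, so the procedure, the function names, and the defaults `k_pos=1` and `k_neg=2` are illustrative.

```python
import random
from itertools import combinations


def random_graph(n, p, rng):
    """Random undirected graph: each possible edge is present with probability p."""
    return {e for e in combinations(range(n), 2) if rng.random() < p}


def substitute_edges(edges, n, k, rng):
    """Return a copy with k existing edges swapped for k previously absent edges."""
    present = sorted(edges)
    absent = sorted(set(combinations(range(n), 2)) - edges)
    k = min(k, len(present), len(absent))
    removed = set(rng.sample(present, k))
    added = set(rng.sample(absent, k))
    return (edges - removed) | added


def make_triplet(n, p, rng, k_pos=1, k_neg=2):
    """(anchor, positive, negative): the positive is fewer edits away than the negative."""
    g = random_graph(n, p, rng)
    return g, substitute_edges(g, n, k_pos, rng), substitute_edges(g, n, k_neg, rng)


rng = random.Random(0)
g, pos, neg = make_triplet(20, 0.2, rng)
print(len(g ^ pos), len(g ^ neg))  # edges that differ from the anchor
```

Each substitution removes one edge and adds one, so the positive differs from the anchor in two edge slots and the negative in four, which gives the ordering needed for pair or triplet supervision.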

SLIDE 22

Synthetic task: graph edit distance learning

  • Training and evaluating on graphs of size n and edge density (probability) p
  • Measuring pair classification AUC / triplet prediction accuracy
  • Learned models do better than the WL kernel
  • Matching model does better than the embedding model
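The two metrics named on this slide can both be computed directly from predicted distances. The following is a minimal sketch, assuming labels of +1 for similar pairs and -1 for dissimilar pairs as on the earlier supervision slide; the function names and the toy numbers are illustrative.

```python
def pair_auc(distances, labels):
    """AUC for pair classification.

    Probability that a similar pair (label +1) receives a smaller distance
    than a dissimilar pair (label -1); ties count as half.
    """
    pos = [d for d, t in zip(distances, labels) if t == 1]
    neg = [d for d, t in zip(distances, labels) if t == -1]
    wins = sum((dp < dn) + 0.5 * (dp == dn) for dp in pos for dn in neg)
    return wins / (len(pos) * len(neg))


def triplet_accuracy(triplet_distances):
    """Fraction of triplets with d(G1, G2) < d(G1, G3).

    Each entry is (d_pos, d_neg) for one (G1, G2, G3) triplet.
    """
    correct = sum(1 for d_pos, d_neg in triplet_distances if d_pos < d_neg)
    return correct / len(triplet_distances)


print(pair_auc([0.1, 0.4, 0.35, 0.8], [1, 1, -1, -1]))
print(triplet_accuracy([(0.2, 0.9), (0.7, 0.3)]))
```

Both metrics depend only on the ordering of distances, so they allow a fair comparison between models whose distance scales differ, such as a kernel baseline and a learned embedding.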

SLIDE 23

Results on binary function similarity search

Hand-engineered baseline (graph hashing + locality sensitive hashing) vs GNN embedding vs GMN. Graph topology only vs jointly over structures and features.

SLIDE 24

Results on binary function similarity search

1) learned approaches are better than the hand-engineered solution
2) matching is better than embedding alone
3) joint modeling of structure and features is better than structure alone
4) performance improves with more graph propagation steps

SLIDE 25

More ablation studies

GMNs are consistently better than the alternatives. Siamese vs matching: fusing the two graphs early is better than fusing them only at the end.

SLIDE 26

Learned attention patterns

We never supervise the cross-graph attention, but the model still learns some interesting attention patterns.

SLIDE 27

Learned attention patterns

When the two graphs are identical, the learned attention pattern may (but does not always) correspond to node matching.

After 10 message passing steps. Model trained on the edit distance learning task.

SLIDE 28

Learned attention patterns

Otherwise the attention pattern is less interpretable.

After 10 message passing steps. Model trained on the edit distance learning task.

SLIDE 29

Conclusions and future directions

Takeaways:

  • graph similarity can be learned.
  • learned graph embedding models are good and efficient models for this.
  • graph matching networks are even better.

Future directions:

  • make cross-graph attention and matching more efficient
  • explore new architectures that can utilize the new capability of learned graph similarity

SLIDE 30

Yujia Li, Chenjie Gu, Thomas Dullien*, Oriol Vinyals, Pushmeet Kohli

Graph Matching Networks for Learning the Similarity of Graph Structured Objects
