Graph Mining Marco Serafini COMPSCI 532 Lecture 11 Classes of - - PowerPoint PPT Presentation

SLIDE 1

Graph Mining

Marco Serafini

COMPSCI 532 Lecture 11

SLIDE 2

SLIDE 3

Classes of Graph Systems

  • Graph Computation
      • Think like a vertex
      • Linear algebra
  • Graph Search
      • Find instances of path expressions
  • Graph Mining
      • Mine patterns of interest and their matches

SLIDE 4

Applications of Graph Mining

  • Web and Advertising
      • Link spam detection
      • Identify sub-markets
      • Attributed edges in knowledge bases
  • Biology
      • DNA motif detection
      • Protein-protein interaction
  • Social computing
      • Friend recommendation
      • Community detection

SLIDE 5

Graph Mining - Concepts

[Figure: three panels labeled Input graph, Pattern, and Embeddings]

SLIDE 6

Graph Exploration

  • Enumerate (& prune) embeddings
  • Aggregate by pattern

[Figure: embeddings explored from the input graph]

SLIDE 7

Challenges

  • Exponential number of embeddings

[Chart: # unique embeddings (log scale) by size of embedding. Exponential!!!]

Size of embedding    # unique embeddings
1                    4K
2                    22K
3                    335K
4                    7.8M
5                    117M
6                    1.7B

SLIDE 8

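API Example: Clique finding

    // Keep only candidate embeddings that are cliques.
    boolean filter(Embedding e) {
        return isClique(e);
    }

    // Emit every embedding that passed the filter.
    void process(Embedding e) {
        output(e);
    }

    // Stop growing embeddings once they reach maxsize vertices.
    boolean shouldExpand(Embedding embedding) {
        return embedding.getNumVertices() < maxsize;
    }

    // The expansion added one vertex; the result is a clique only if
    // that vertex brought an edge to each of the other vertices.
    boolean isClique(Embedding e) {
        return e.getNumEdgesAddedWithExpansion() == e.getNumberOfVertices() - 1;
    }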

SLIDE 9

Model - Think Like an Embedding

  • 1. Start from a set of initial embeddings
  • 2. Candidates: expand by 1 vertex/edge
  • 3. Filter uninteresting candidates (Filter: false → Discard)
  • 4. Produce outputs (Process: true → Save)

[Figure: exploration step i and exploration step i+1, each with input and output embeddings]
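The four steps above can be sketched as a level-by-level loop. This is a minimal single-machine illustration, not the system's implementation: the class name ExplorationSketch, the in-memory adjacency map, and the use of a "seen" set to drop duplicate vertex sets are all assumptions made for the example (the deck removes duplicates with canonicality checks instead). The filter is the clique check from the API example.

```java
import java.util.*;

/** Hypothetical sketch of "think like an embedding" exploration with a clique filter. */
final class ExplorationSketch {
    final Map<Integer, Set<Integer>> adj = new TreeMap<>(); // assumed in-memory graph
    final int maxSize;
    final List<List<Integer>> outputs = new ArrayList<>();

    ExplorationSketch(int[][] edges, int maxSize) {
        this.maxSize = maxSize;
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new TreeSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new TreeSet<>()).add(e[0]);
        }
    }

    // Clique check: the newly added (last) vertex must touch every earlier vertex.
    boolean filter(List<Integer> e) {
        int last = e.get(e.size() - 1);
        for (int i = 0; i < e.size() - 1; i++) {
            if (!adj.get(e.get(i)).contains(last)) return false;
        }
        return true;
    }

    void process(List<Integer> e) { outputs.add(e); }

    boolean shouldExpand(List<Integer> e) { return e.size() < maxSize; }

    void run() {
        // 1. Start from a set of initial embeddings (here: single vertices).
        List<List<Integer>> frontier = new ArrayList<>();
        for (int v : adj.keySet()) frontier.add(Collections.singletonList(v));
        while (!frontier.isEmpty()) {
            Set<Set<Integer>> seen = new HashSet<>(); // simplification: dedup by vertex set
            List<List<Integer>> next = new ArrayList<>();
            for (List<Integer> e : frontier) {
                if (!shouldExpand(e)) continue;
                // 2. Candidates: expand by one neighboring vertex.
                Set<Integer> candidates = new TreeSet<>();
                for (int u : e) candidates.addAll(adj.get(u));
                candidates.removeAll(e);
                for (int v : candidates) {
                    List<Integer> c = new ArrayList<>(e);
                    c.add(v);
                    if (!seen.add(new TreeSet<>(c))) continue; // same vertex set already explored
                    if (!filter(c)) continue;                  // 3. Filter: false -> discard
                    process(c);                                // 4. Process: save output
                    next.add(c);                               // input of the next step
                }
            }
            frontier = next;
        }
    }

    public static void main(String[] args) {
        int[][] edges = {{1, 2}, {1, 3}, {2, 3}, {3, 4}};
        ExplorationSketch s = new ExplorationSketch(edges, 3);
        s.run();
        System.out.println(s.outputs);
    }
}
```

On this toy graph (a triangle 1, 2, 3 plus the edge 3, 4) with a maximum size of 3, the run should report the four edges followed by the triangle, five cliques in total.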

SLIDE 10

Avoiding redundant work

  • Problem: Automorphic embeddings
      • Automorphisms == subgraph equivalences
      • Redundant work

[Figure: Worker 1 holds embedding 1 2 3, Worker 2 holds embedding 3 2 1; the two are equal (==)]

SLIDE 11

Avoiding redundant work

  • Solution: Decentralized Embedding Canonicality
      • No coordination
      • Efficient

[Figure: Worker 1 holds embedding 1 2 3, isCanonical(e) → true; Worker 2 holds the equal embedding 3 2 1, isCanonical(e) → false]

SLIDE 12

Embedding Canonicality

  • isCanonical(e) iff, at every step, the neighbor with the smallest ID is added

[Figure: input graph with vertices 1 2 3 6 4 5, with embedding e marked]

Initial embedding (e):

  • 1 - 3 - 6

Expansions:

  • 1 - 3 - 6 - 5 → canonical
  • 1 - 3 - 6 - 4 → canonical
  • 1 - 3 - 6 - 2 → not canonical (canonical order is 1 - 2 - 3 - 6)
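The three expansions above can be reproduced with a small incremental check. This is a sketch under stated assumptions, not the deck's code: the slide's graph figure did not survive, so the edge set below (1-2, 1-3, 3-6, 4-6, 5-6) is a hypothetical one chosen to be consistent with the listed results, and the class and method names are invented for illustration. The rule implemented is one common way to make "add the smallest-ID neighbor at every step" checkable locally: the first vertex must have the smallest ID, and every embedding vertex that comes after the new vertex's first neighbor must have a smaller ID than the new vertex.

```java
import java.util.*;

/** Hypothetical sketch of a local, coordination-free canonicality check. */
final class CanonicalityCheck {
    // Assumed edge set; the original figure is not available.
    static final Map<Integer, Set<Integer>> ADJ = new HashMap<>();
    static {
        int[][] edges = {{1, 2}, {1, 3}, {3, 6}, {4, 6}, {5, 6}};
        for (int[] e : edges) {
            ADJ.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            ADJ.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
    }

    /** Is appending v to the (already canonical) embedding e still canonical? */
    static boolean isCanonicalExpansion(int[] e, int v) {
        if (e[0] > v) return false; // the first vertex must have the smallest ID
        boolean foundNeighbor = false;
        for (int u : e) {
            if (!foundNeighbor) {
                // Look for the first embedding vertex adjacent to v.
                foundNeighbor = ADJ.getOrDefault(u, Collections.emptySet()).contains(v);
            } else if (u > v) {
                // v could have been added before u, so this order is not canonical.
                return false;
            }
        }
        return foundNeighbor; // v must be connected to the embedding at all
    }

    public static void main(String[] args) {
        int[] e = {1, 3, 6};
        for (int v : new int[]{5, 4, 2}) {
            System.out.println("1-3-6-" + v + " canonical? " + isCanonicalExpansion(e, v));
        }
    }
}
```

Because the check reads only the embedding and the graph, each worker can run it on its own, which matches the "no coordination" point made earlier in the deck.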

SLIDE 13

Efficient Pattern Aggregation

  • Goal: Aggregate automorphic patterns to single key
      • Find canonical pattern
      • No known polynomial solution

[Figure: 3x expensive graph canonization, producing the canonical pattern]

SLIDE 14

Efficient Pattern Aggregation

  • Solution: 2-level pattern aggregation
      • 1. Embeddings → quick patterns
      • 2. Quick patterns → canonical pattern

[Figure: 1) Quick patterns: 3x linear matching to quick pattern; 2) Canonical pattern: 2x expensive graph canonization]
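The two levels can be illustrated with a toy aggregator. Everything here is an assumption for the sake of the example: the class name, the unlabeled input graph, the quick pattern defined as "edges rewritten in terms of vertex positions within the embedding", and a brute-force search over relabelings standing in for a real graph canonization routine. The point it tries to show is only the cost structure: the expensive step runs once per distinct quick pattern, not once per embedding.

```java
import java.util.*;

/** Hypothetical sketch of 2-level pattern aggregation. */
final class TwoLevelAggregation {
    static int canonizations = 0; // how many times the expensive step ran

    // Level 1 (cheap): describe the embedding's edges by vertex position.
    static List<List<Integer>> quickPattern(int[] emb, Set<List<Integer>> edges) {
        List<List<Integer>> pattern = new ArrayList<>();
        for (int i = 0; i < emb.length; i++) {
            for (int j = i + 1; j < emb.length; j++) {
                int a = Math.min(emb[i], emb[j]), b = Math.max(emb[i], emb[j]);
                if (edges.contains(Arrays.asList(a, b))) pattern.add(Arrays.asList(i, j));
            }
        }
        return pattern;
    }

    // Level 2 (expensive): smallest edge-list string over all relabelings of k positions.
    static String canonicalPattern(List<List<Integer>> quick, int k) {
        canonizations++;
        return search(quick, new int[k], new boolean[k], 0);
    }

    private static String search(List<List<Integer>> quick, int[] perm, boolean[] used, int pos) {
        if (pos == perm.length) {
            List<String> relabeled = new ArrayList<>();
            for (List<Integer> e : quick) {
                int a = perm[e.get(0)], b = perm[e.get(1)];
                relabeled.add(Math.min(a, b) + "-" + Math.max(a, b));
            }
            Collections.sort(relabeled);
            return String.join(";", relabeled);
        }
        String best = null;
        for (int label = 0; label < perm.length; label++) {
            if (used[label]) continue;
            used[label] = true;
            perm[pos] = label;
            String s = search(quick, perm, used, pos + 1);
            if (best == null || s.compareTo(best) < 0) best = s;
            used[label] = false;
        }
        return best;
    }

    // Count embeddings per canonical pattern; assumes all embeddings have the same size.
    static Map<String, Integer> aggregate(List<int[]> embeddings, Set<List<Integer>> edges) {
        Map<List<List<Integer>>, Integer> quickCounts = new HashMap<>();
        for (int[] emb : embeddings) quickCounts.merge(quickPattern(emb, edges), 1, Integer::sum);
        int k = embeddings.get(0).length;
        Map<String, Integer> canonicalCounts = new TreeMap<>();
        for (Map.Entry<List<List<Integer>>, Integer> en : quickCounts.entrySet()) {
            canonicalCounts.merge(canonicalPattern(en.getKey(), k), en.getValue(), Integer::sum);
        }
        return canonicalCounts;
    }

    public static void main(String[] args) {
        // Assumed toy graph with edges 1-2, 2-3, 3-4, 3-5 and four 3-vertex embeddings.
        Set<List<Integer>> edges = new HashSet<>(Arrays.asList(
            Arrays.asList(1, 2), Arrays.asList(2, 3), Arrays.asList(3, 4), Arrays.asList(3, 5)));
        List<int[]> embeddings = Arrays.asList(
            new int[]{1, 2, 3}, new int[]{2, 3, 4}, new int[]{2, 3, 5}, new int[]{3, 4, 5});
        System.out.println(aggregate(embeddings, edges) + ", canonizations = " + canonizations);
    }
}
```

In the toy input, three embeddings visit the middle vertex second and one visits it first, so there are two quick patterns. Both are the same 3-vertex path, so they collapse into one canonical key after two canonizations rather than four.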

SLIDE 15

Handling Exponential growth

  • Goal: handle trillions+ different embeddings?
  • Solution: Overapproximating DAGs (ODAGs)
      • Compress into less restrictive superset
      • Deal with spurious embeddings

[Figure: input graph with vertices 1 to 5, its embedding list, and the resulting ODAG]

Canonical embeddings (embedding list):

  • 1 4 2
  • 1 4 3
  • 1 4 5
  • 2 3 4
  • 2 4 5
  • 3 4 5

ODAG levels: {1, 2, 3}, {3, 4}, {2, 3, 4, 5}
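A small sketch may help show why an ODAG is a superset and how spurious embeddings can be dealt with. It is an illustration only: the class OdagSketch, its per-position maps, and the validity filter are hypothetical, and the edge set (1-4, 2-3, 2-4, 3-4, 4-5) is inferred from the six canonical embeddings on the slide rather than read from the figure. The structure keeps, for each position, which vertex was followed by which; enumerating all paths then yields the original embeddings plus extra ones, and re-running a validity check (distinct vertices plus a canonicality rule) removes the extras.

```java
import java.util.*;

/** Hypothetical sketch of an overapproximating DAG over fixed-size embeddings. */
final class OdagSketch {
    // levels.get(i): vertex at position i -> vertices seen right after it at position i + 1
    final List<Map<Integer, Set<Integer>>> levels = new ArrayList<>();

    void add(int[] emb) {
        for (int i = 0; i < emb.length; i++) {
            if (levels.size() <= i) levels.add(new TreeMap<>());
            Set<Integer> next = levels.get(i).computeIfAbsent(emb[i], k -> new TreeSet<>());
            if (i + 1 < emb.length) next.add(emb[i + 1]);
        }
    }

    /** All root-to-leaf paths: a superset of the embeddings that were added. */
    List<int[]> enumerate() {
        List<int[]> out = new ArrayList<>();
        if (levels.isEmpty()) return out;
        for (int v : levels.get(0).keySet()) walk(0, v, new int[levels.size()], out);
        return out;
    }

    private void walk(int pos, int v, int[] path, List<int[]> out) {
        path[pos] = v;
        if (pos == levels.size() - 1) { out.add(path.clone()); return; }
        for (int next : levels.get(pos).get(v)) walk(pos + 1, next, path, out);
    }

    static Map<Integer, Set<Integer>> graph(int[][] edges) {
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        return adj;
    }

    /** Spurious-path filter: distinct vertices plus an incremental canonicality rule. */
    static boolean isValid(int[] e, Map<Integer, Set<Integer>> adj) {
        for (int n = 1; n < e.length; n++) {
            int v = e[n];
            if (e[0] > v) return false;            // first vertex must have the smallest ID
            boolean found = false;
            for (int i = 0; i < n; i++) {
                if (e[i] == v) return false;       // repeated vertex
                if (!found) found = adj.get(e[i]).contains(v);
                else if (e[i] > v) return false;   // v should have been added earlier
            }
            if (!found) return false;              // v is not connected to the prefix
        }
        return true;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> adj = graph(new int[][]{{1, 4}, {2, 3}, {2, 4}, {3, 4}, {4, 5}});
        int[][] canonical = {{1, 4, 2}, {1, 4, 3}, {1, 4, 5}, {2, 3, 4}, {2, 4, 5}, {3, 4, 5}};
        OdagSketch odag = new OdagSketch();
        for (int[] e : canonical) odag.add(e);
        int valid = 0;
        List<int[]> paths = odag.enumerate();
        for (int[] p : paths) if (isValid(p, adj)) valid++;
        System.out.println(paths.size() + " paths, " + valid + " valid");
    }
}
```

For the slide's six embeddings the sketch stores the three levels {1, 2, 3}, {3, 4}, {2, 3, 4, 5}. Enumeration produces ten paths, among them 2 4 2, 2 4 3, 3 4 2 and 3 4 3, which the filter rejects, leaving the original six.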

SLIDE 16

Variants of Graph Mining Systems

  • G-Miner
      • For each embedding, decide how to expand
      • Easier to implement graph search
  • Systems for random walks
      • ASAP: Random walks for approximate subgraph enumeration
      • KnightKing: Random walks for node embeddings and graph neural networks
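For readers unfamiliar with the primitive these systems build on, here is a minimal uniform random walk. It is a generic sketch and does not reflect the API of ASAP or KnightKing; the class name, the adjacency-list representation, and the uniform choice of the next neighbor are assumptions made for illustration.

```java
import java.util.*;

/** Hypothetical sketch of a uniform random walk over an adjacency list. */
final class RandomWalkSketch {
    static List<Integer> walk(Map<Integer, List<Integer>> adj, int start, int steps, Random rng) {
        List<Integer> path = new ArrayList<>();
        path.add(start);
        int current = start;
        for (int i = 0; i < steps; i++) {
            List<Integer> neighbors = adj.getOrDefault(current, Collections.emptyList());
            if (neighbors.isEmpty()) break; // dead end: stop early
            // Pick the next vertex uniformly at random among the neighbors.
            current = neighbors.get(rng.nextInt(neighbors.size()));
            path.add(current);
        }
        return path;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> adj = new HashMap<>();
        adj.put(1, Arrays.asList(2, 3));
        adj.put(2, Arrays.asList(1, 3));
        adj.put(3, Arrays.asList(1, 2));
        System.out.println(walk(adj, 1, 10, new Random(42)));
    }
}
```

Each walk touches only a bounded number of vertices, which is why sampling walks can stand in for exhaustive enumeration when an approximate answer is acceptable.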