Finding Dense Subgraphs via Low-Rank Bilinear Optimization



slide-1
SLIDE 1

Finding Dense Subgraphs via Low-Rank Bilinear Optimization

Ioannis Mitliagkas, Dimitris Papailiopoulos

with: Alex Dimakis, Constantine Caramanis

  • UT Austin

slide-2
SLIDE 2

Densest k-Subgraph (DkS)

  • Given: a graph and a parameter k
  • Find: k vertices containing the most edges
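To make the problem statement concrete, here is a minimal brute-force sketch that checks every k-subset. The function name and the toy graph are hypothetical and only illustrate the definition; this exhaustive search is not the method proposed in the talk and is only feasible for tiny graphs.

```python
from itertools import combinations

def densest_k_subgraph_bruteforce(edges, n, k):
    """Return (best_vertex_set, edge_count) by checking all C(n, k) subsets."""
    edge_set = {frozenset(e) for e in edges}
    best_set, best_count = None, -1
    for subset in combinations(range(n), k):
        # Count the edges with both endpoints inside the candidate subset.
        count = sum(1 for pair in combinations(subset, 2)
                    if frozenset(pair) in edge_set)
        if count > best_count:
            best_set, best_count = set(subset), count
    return best_set, best_count

# Hypothetical toy graph: a 4-clique on {0, 1, 2, 3} plus a path 3-4-5.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
best_set, best_count = densest_k_subgraph_bruteforce(toy_edges, n=6, k=4)
```

With k = 4 the search recovers the planted clique and its 6 edges.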

slide-3
SLIDE 3
Densest k-Subgraph (DkS)

Applications

  • Community mining: communities = large dense components
  • Link spam detection: dense parts of the web = spam
  • Computational biology: complex patterns in gene annotation graphs

slide-4
SLIDE 4

Densest k-Subgraph (DkS)

There is a 5-subgraph with 10 edges.

  • Q: Can you find it?

slide-6
SLIDE 6

Densest k-Subgraph (DkS)

  • NP-hard
  • Hard to approximate* [Khot, 2004]

*Except in specific cases: [Arora et al. 95] give a (1+ε)-approximation for linear-size subgraphs of dense graphs

slide-7
SLIDE 7

Worst-Case Analysis

slide-9
SLIDE 9

Worst-Case Analysis

After long effort [Feige, 2001], [Bhaskara et al., STOC ’10], the best known ratio is $O(n^{1/4+\epsilon})$:

  • 10-factor approx. for graphs with 10K nodes
  • 100-factor approx. for graphs with 100 million nodes
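As a quick arithmetic check, assuming the two quoted factors come from a ratio that scales roughly as $n^{1/4}$ (ignoring the $n^{\epsilon}$ term and constants), the examples work out as follows:

```latex
\[
(10^{4})^{1/4} = 10, \qquad (10^{8})^{1/4} = 100 .
\]
```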
slide-12
SLIDE 12

Known DkS guarantees are not useful in practice… under worst-case analysis

  • Q1: Provable, graph-dependent bounds?
  • Q2: DkS on billion-scale graphs?

slide-13
SLIDE 13

Beyond the Worst Case

New DkS algorithm:

  • Graph-dependent bounds
  • In practice: nearly-linear time for many real-world graphs
  • Scalable and parallelizable: implementation in MapReduce+Python, up to billion-edge graphs on 800 cores on Amazon EC2

slide-16
SLIDE 16

Our Low-Rank Framework

DkS on a graph

  • Hard to solve
  • Hard to approximate

Low-rank approximation

[Figure: a graph with unit edge weights (1, 1, 1, ...) next to its low-rank approximation, whose weights are perturbed (1.1, 0.9, 1.2, 0.7, 1.4, 0.6, 1.3, ...)]

DkS on constant rank graph

  • Nearly-linear time solvable (!)

Low-rank DkS is related to original DkS
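The idea of replacing the adjacency matrix with a low-rank surrogate can be sketched for the simplest case, rank 1. The code below is an illustration under assumptions, not the talk's implementation: it uses plain power iteration as a stand-in eigen-solver, handles only d = 1, and the function names and the K4 example are hypothetical. Power iteration finds the eigenvalue of largest magnitude, which can be ambiguous for bipartite graphs.

```python
def leading_eigenpair(adj, iters=200):
    """Power iteration for the largest-magnitude eigenpair of a symmetric matrix."""
    n = len(adj)
    # Deterministic, non-uniform start vector (an arbitrary choice).
    v = [1.0 + i / n for i in range(n)]
    for _ in range(iters):
        w = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v (v has unit norm).
    lam = sum(v[i] * adj[i][j] * v[j] for i in range(n) for j in range(n))
    return lam, v

def rank_one_approximation(adj):
    """Return lam * v v^T as a dense list-of-lists matrix."""
    lam, v = leading_eigenpair(adj)
    n = len(adj)
    return [[lam * v[i] * v[j] for j in range(n)] for i in range(n)]

# Hypothetical example: the complete graph on 4 vertices.
K4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
lam, v = leading_eigenpair(K4)
A1 = rank_one_approximation(K4)
```

For K4 the leading eigenvalue is 3 with a uniform eigenvector, so every entry of the rank-1 surrogate is 0.75: edges are slightly down-weighted and the zero diagonal picks up weight, similar in spirit to the perturbed weights in the figure.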

slide-17
SLIDE 17

Results: Theory

slide-20
SLIDE 20

Graph-dependent Guarantees

Theorems:

  • The algorithm computes, in time $O(n^{d+2}/\delta)$, a k-subgraph with density
    $\mathrm{OPT}_d \ge 0.5 \cdot (1-\delta) \cdot \mathrm{OPT} - 2|\lambda_{d+1}|$
  • If the largest d eigenvalues of the adjacency matrix are positive, our algorithm computes, in time $O(|E| \cdot \log n + n/\epsilon^{d})$, a k-subgraph with density
    $\mathrm{OPT}_d \ge (1-\epsilon) \cdot \mathrm{OPT} - 2|\lambda_{d+1}|$

larger d => better approximation, slower computation
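Rearranging the second guarantee gives an upper bound on OPT in terms of quantities the algorithm itself produces, which is one way to read the term "graph-dependent". This is plain algebra on the stated inequality, assuming $0 < \epsilon < 1$:

```latex
\[
\mathrm{OPT}_d \ge (1-\epsilon)\,\mathrm{OPT} - 2|\lambda_{d+1}|
\quad\Longrightarrow\quad
\mathrm{OPT} \le \frac{\mathrm{OPT}_d + 2|\lambda_{d+1}|}{1-\epsilon}.
\]
```

The bound is tight when $|\lambda_{d+1}|$ is small relative to $\mathrm{OPT}_d$, that is, when the spectrum of the graph decays quickly.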

slide-21
SLIDE 21

Performance in Practice

slide-22
SLIDE 22

[Figure: density vs. subgraph size k on the com-LiveJournal graph (4M nodes, 35M edges). Trivial upper bound = k-1.]

slide-23
SLIDE 23

Legend: Blue: TPower (JMLR ’13); Green: GreedyFeige (Algorithmica ’01); Yellow: GreedyRavi (OR ’94)

slide-24
SLIDE 24

Big Gap

slide-25
SLIDE 25

d=1 spannogram

slide-26
SLIDE 26

d=2 spannogram

slide-27
SLIDE 27

d=5 spannogram

slide-28
SLIDE 28

Smaller Gap

slide-29
SLIDE 29

80% OPT

Graph-dependent bound

$\mathrm{OPT}_d + \lambda_{d+1}$

slide-30
SLIDE 30

How we do it

slide-35
SLIDE 35

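DkS via Quadratic Optimization

[Figure: adjacency matrix with rows and columns indexed by vertex; the highlighted entries are the edges in the subgraph]

DkS: [Equation: DkS written as a quadratic optimization]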
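The formula itself is not preserved in this transcript. As a sketch under the common assumption that a vertex subset is encoded by a 0/1 indicator vector x, the quadratic form x^T A x counts every edge inside the subgraph twice. The helper name and the toy graph below are hypothetical.

```python
def quadratic_form(adj, subset):
    """Compute x^T A x for the 0/1 indicator vector x of `subset`."""
    n = len(adj)
    x = [1 if i in subset else 0 for i in range(n)]
    return sum(x[i] * adj[i][j] * x[j] for i in range(n) for j in range(n))

# Hypothetical toy graph: triangle 0-1-2 plus a pendant edge 2-3.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
triangle_value = quadratic_form(adj, {0, 1, 2})  # 3 edges, each counted twice
```

Under this encoding, DkS amounts to maximizing the quadratic form over indicator vectors with exactly k ones.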

slide-39
SLIDE 39

DkS via Bilinear Optimization

[Equations: the DkS objective and its bilinear counterpart, DBkS]

Lemma: a ρ-approximation for DBkS yields a ½ρ-approximation for DkS
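The factor ½ in the lemma can be checked by brute force on a tiny graph. The sketch below assumes that DBkS maximizes 1_X^T A 1_Y over two vertex sets X and Y of size k each, and that DkS is the special case X = Y; the slide's exact formulas are not preserved, and the function names and example graph are hypothetical.

```python
from itertools import combinations

def bilinear_value(adj, xs, ys):
    """1_X^T A 1_Y: sum of adjacency entries with row in X and column in Y."""
    return sum(adj[i][j] for i in xs for j in ys)

def brute_force_optima(adj, k):
    """Return (DkS optimum, DBkS optimum) as raw bilinear values."""
    subsets = list(combinations(range(len(adj)), k))
    dks = max(bilinear_value(adj, s, s) for s in subsets)
    dbks = max(bilinear_value(adj, x, y) for x in subsets for y in subsets)
    return dks, dbks

# Hypothetical example: the 4-cycle 0-1-2-3-0, which is bipartite.
C4 = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
dks, dbks = brute_force_optima(C4, k=2)
```

On this bipartite example the bilinear optimum (X = {0, 2}, Y = {1, 3}) is exactly twice the quadratic optimum (a single edge), so the factor ½ cannot be improved in general.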

slide-40
SLIDE 40

DkS via Bilinear Optimization

DBkS: [Equation: DBkS objective]

[Figure: graph with unit edge weights (1, 1, 1, ...)]

slide-44
SLIDE 44

Low-Rank Approximation

[Figure: the graph with its edge weights replaced by their low-rank approximations (1.1, 0.9, 1.2, 0.7, 1.4, 0.6, 1.3, ...)]

DBkS: Efficiently solvable

slide-45
SLIDE 45

How the Low-Rank Solver Works

Naïvely: check all $\binom{n}{k}$ subgraphs

Rank-1 case:

  • Q: Maximize the product of two numbers
  • A: Maximize each number individually

slide-46
SLIDE 46

How the Rank-1 Solver Works

[Figure: the vector v = (1, 2, 3, 4), shown once for x and once for y]

top-k set: the k largest coordinates of a vector, e.g., if k = 2, then the top-2 set = {3, 4}

  • Intuition: x, y pick the top-k set of v.
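The rank-1 intuition can be sketched in a few lines. This is an illustration under an assumption, not the authors' code: the rank-1 matrix is taken to be lam * v v^T with lam > 0, so the bilinear objective is (1_X^T v)(v^T 1_Y) and both factors should have the same sign. The function names are hypothetical.

```python
def top_k_indices(vec, k):
    """Indices of the k largest entries of vec."""
    order = sorted(range(len(vec)), key=lambda i: vec[i], reverse=True)
    return set(order[:k])

def rank_one_solver(v, k):
    """Maximize (1_X^T v) * (v^T 1_Y) over |X| = |Y| = k.

    Assumes the rank-1 matrix is lam * v v^T with lam > 0, so both sums
    should share a sign: pick the top-k of v or the top-k of -v.
    """
    top = top_k_indices(v, k)
    bottom = top_k_indices([-x for x in v], k)
    top_sum = sum(v[i] for i in top)
    bottom_sum = sum(v[i] for i in bottom)
    best = top if abs(top_sum) >= abs(bottom_sum) else bottom
    return best, sum(v[i] for i in best) ** 2

# The example vector from the slide, with 0-based indices: v = (1, 2, 3, 4), k = 2.
best, value = rank_one_solver([1, 2, 3, 4], k=2)
```

With 0-based indexing the result is {2, 3}, which corresponds to the slide's top-2 set {3, 4}.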
slide-47
SLIDE 47

How the Rank-2 Solver Works

[Figure: two example vectors spanning a 2-dimensional subspace]

  • Intuition: x, y pick the top-k set of a vector from a 2-dimensional span.
  • Q: How many top-k sets are there in a 2-dimensional span?

Theorem: # top-k sets in a d-dimensional span: [Equation: bound on the count]

Spannogram: traverses all of them efficiently

Based on Spannogram [Asteris, Papail., Karystinos, ISIT2011]
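One way to see why a 2-dimensional span contains only a few top-k sets: parametrize its directions as cos(phi) * v1 + sin(phi) * v2 and note that the ordering of the coordinates can only change at angles where two coordinates tie. The sketch below enumerates those critical angles and evaluates one direction per interval. It illustrates the idea for d = 2 only; it is not the authors' Spannogram implementation, and the function names and example vectors are hypothetical.

```python
import math

def top_k_set(vec, k):
    """Indices of the k largest entries of vec, as a frozenset."""
    order = sorted(range(len(vec)), key=lambda i: vec[i], reverse=True)
    return frozenset(order[:k])

def top_k_sets_in_2d_span(v1, v2, k):
    """Collect the top-k sets of cos(phi)*v1 + sin(phi)*v2 over all angles phi."""
    n = len(v1)
    two_pi = 2 * math.pi
    # Coordinates i and j tie when cos(phi)*a + sin(phi)*b = 0.
    critical = set()
    for i in range(n):
        for j in range(i + 1, n):
            a, b = v1[i] - v1[j], v2[i] - v2[j]
            if a == 0 and b == 0:
                continue  # identical coordinates never change relative order
            phi = math.atan2(-a, b) % two_pi
            critical.add(phi)
            critical.add((phi + math.pi) % two_pi)
    angles = sorted(critical)
    if not angles:
        return {top_k_set(v1, k)}
    candidates = set()
    for idx, lo in enumerate(angles):
        # The last interval wraps around past 2*pi back to the first angle.
        hi = angles[idx + 1] if idx + 1 < len(angles) else angles[0] + two_pi
        mid = 0.5 * (lo + hi)
        c, s = math.cos(mid), math.sin(mid)
        candidates.add(top_k_set([c * x + s * y for x, y in zip(v1, v2)], k))
    return candidates

# Hypothetical example vectors.
v1, v2 = [1, 2, 3, 4], [5, 2, 7, 1]
candidates = top_k_sets_in_2d_span(v1, v2, k=2)
```

Each pair of coordinates contributes at most two critical angles, so the number of candidate sets is at most n(n-1), far fewer than the number of all k-subsets once n grows.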

slide-49
SLIDE 49

How the Rank-2 Solver Works

[Figure: two example vectors spanning a 2-dimensional subspace]

  • Intuition: x, y pick the top-k set of a vector from a 2-dimensional span.

Randomized algorithm

Take random points: $s_1, \ldots, s_{1/\epsilon^{d}} \in \mathrm{span}(v_1, \ldots, v_d)$

Practically linear time

slide-50
SLIDE 50

Implementation

slide-52
SLIDE 52
MapReduce Implementation

  • git.io/spannogram

slide-53
SLIDE 53

Billion-scale Graphs

[Figure: subgraph density (y-axis, 200 to 1000) vs. number of edges |E| (x-axis, 10^4 to 10^10) on random graphs G(n, 1/2) with k = 3√n, comparing G-Feige, G-Ravi, TPower, and Spannogram]

slide-54
SLIDE 54

Conclusions

slide-60
SLIDE 60
Conclusions

  • New combinatorial approx. algorithm for DkS.
  • Graph-dependent spectral bounds: OPT within 70% in most experiments.
  • Bound could be trivial in the worst case.
  • Empirically outperforms previous state of the art.
  • Highly scalable implementation.

slide-61
SLIDE 61

Thank you

slide-62
SLIDE 62
slide-63
SLIDE 63

Backup slides

slide-64
SLIDE 64

Other experiments

slide-66
SLIDE 66

Randomized Algorithm

  • Step 1: Take random points: $s_1, \ldots, s_{1/\epsilon^{d}} \in \mathrm{span}(v_1, \ldots, v_d)$
  • Step 2: Find the largest k entries of each point
  • Step 3: Compute the density of the corresponding subgraph

Practically linear time
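The three steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the released MapReduce code: density is taken to be the average degree inside the subgraph (consistent with the trivial upper bound of k-1 quoted earlier), each random point is also tried with its sign flipped, and the toy example uses the degree vector as a stand-in basis instead of actual eigenvectors. All names are hypothetical.

```python
import random

def density(adj_sets, subset):
    """1_S^T A 1_S / k: the average degree inside the subgraph on `subset`."""
    inside = sum(1 for u in subset for w in adj_sets[u] if w in subset)
    return inside / len(subset)

def randomized_low_rank_dks(adj_sets, basis, k, num_samples, seed=0):
    rng = random.Random(seed)
    n = len(adj_sets)
    best_set, best_density = None, -1.0
    for _ in range(num_samples):
        # Step 1: a random point in span(v_1, ..., v_d).
        coeffs = [rng.gauss(0.0, 1.0) for _ in basis]
        point = [sum(c * vec[i] for c, vec in zip(coeffs, basis)) for i in range(n)]
        # Try the point and its negation (a choice made for this sketch).
        for sign in (1.0, -1.0):
            # Step 2: the k largest entries.
            order = sorted(range(n), key=lambda i: sign * point[i], reverse=True)
            candidate = set(order[:k])
            # Step 3: density of the corresponding subgraph.
            d = density(adj_sets, candidate)
            if d > best_density:
                best_set, best_density = candidate, d
    return best_set, best_density

# Hypothetical toy graph: a 5-clique on {0..4} plus a path 4-5-6-7.
edges = [(i, j) for i in range(5) for j in range(i + 1, 5)] + [(4, 5), (5, 6), (6, 7)]
adj_sets = {u: set() for u in range(8)}
for u, w in edges:
    adj_sets[u].add(w)
    adj_sets[w].add(u)
# Stand-in basis (assumption): the degree vector instead of eigenvectors.
degree_vector = [float(len(adj_sets[u])) for u in range(8)]
best_set, best_density = randomized_low_rank_dks(
    adj_sets, [degree_vector], k=5, num_samples=4)
```

On this toy graph the planted 5-clique is recovered with density 4 = k-1. Each sample costs one linear pass to form the point plus a top-k selection, which is in line with the "practically linear time" remark when the number of samples is treated as a constant.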