Finding Dense Subgraphs via Low-Rank Bilinear Optimization
Ioannis Mitliagkas, Dimitris Papailiopoulos
with: Alex Dimakis, Constantine Caramanis
UT Austin

Densest k-Subgraph (DkS)
Given: a graph and a parameter k
Find: the k vertices containing the most edges
Applications:
Community mining: communities are large, dense components
Link spam detection: dense parts of the web can indicate spam
Computational biology: complex patterns in gene annotation graphs
Example: there is a 5-vertex subgraph with 10 edges (a 5-clique).
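To make the DkS objective concrete, here is a minimal brute-force sketch on a hypothetical 8-vertex toy graph built around a 5-clique. The helper names `build_adj`, `edges_inside` and `densest_k_subgraph_bruteforce` are illustrative and not from the talk; exhaustive search over all k-subsets is only feasible for tiny graphs.

```python
from itertools import combinations


def build_adj(edges):
    """Adjacency sets for an undirected simple graph given as an edge list."""
    adj = {}
    for u, w in edges:
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set()).add(u)
    return adj


def edges_inside(adj, S):
    """Number of edges with both endpoints in the vertex set S."""
    S = set(S)
    # Each inside edge is seen once from each endpoint, hence the // 2.
    return sum(1 for u in S for w in adj[u] if w in S) // 2


def densest_k_subgraph_bruteforce(adj, k):
    """Exact DkS by checking every k-subset of the vertices."""
    best = max(combinations(sorted(adj), k), key=lambda S: edges_inside(adj, S))
    return set(best), edges_inside(adj, best)


# Toy graph: a 5-clique on {0, ..., 4} plus a sparse path through 5, 6, 7.
clique = [(i, j) for i in range(5) for j in range(i + 1, 5)]
graph = build_adj(clique + [(0, 5), (5, 6), (6, 7), (7, 1)])
S, m = densest_k_subgraph_bruteforce(graph, 5)
print(S, m)  # {0, 1, 2, 3, 4} 10
```

A 5-vertex subgraph can have at most 5·4/2 = 10 edges, so any 5-subgraph with 10 edges is necessarily a 5-clique.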
DkS is NP-hard and hard to approximate* [Khot, 2004]
*Except in specific cases: a (1+ε)-approximation for linear-size subgraphs of dense graphs [Arora et al., 95]
After a long effort [Feige, 2001], [Bhaskara et al., STOC '10], the best known ratio is about $n^{1/4}$:
a 100-factor approximation for graphs with 100 million nodes.
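As a quick check of the 100-factor figure, assume the best known ratio is roughly $n^{1/4}$ (the Bhaskara et al. bound is $O(n^{1/4+\epsilon})$; the $\epsilon$ is ignored here) and plug in $n = 10^8$:

```latex
n^{1/4}\Big|_{n = 10^{8}} = \left(10^{8}\right)^{1/4} = 10^{8/4} = 10^{2} = 100
```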
Known DkS guarantees are not useful in practice… under worst-case analysis.
Q1: Provable, graph-dependent bounds?
Q2: DkS on billion-scale graphs?
New DkS algorithm:
Graph-dependent bounds
In practice: nearly-linear time for many real-world graphs
Parallelizable
Scalable implementation in MapReduce+Python: up to billion-edge graphs on 800 cores on Amazon EC2
[Figure: DkS on a graph (0/1 adjacency matrix) → low-rank approximation (real-valued entries) → DkS on a constant-rank graph]
Low-rank DkS is related to the original DkS.
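A minimal sketch of the low-rank step, assuming a small, dense, symmetric adjacency matrix stored as nested lists. It uses power iteration with deflation to get the d eigenpairs of largest magnitude and forms $A_d = \sum_{i \le d} \lambda_i v_i v_i^\top$. This is a simple stand-in, not the talk's implementation: it converges poorly when two eigenvalues have equal magnitude (e.g. bipartite graphs, whose smallest eigenvalue is $-\lambda_1$), and at scale one would use a Lanczos-type sparse solver such as SciPy's `scipy.sparse.linalg.eigsh`. For a symmetric matrix, the spectral-norm error of this rank-d approximation is $|\lambda_{d+1}|$, the quantity that appears in the guarantees. The function names are hypothetical.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product for nested-list matrices."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_eigenpairs(A, d, iters=500):
    """d eigenpairs of largest |lambda| of a symmetric A, via power iteration + deflation."""
    n = len(A)
    B = [list(map(float, row)) for row in A]
    pairs = []
    for _ in range(d):
        x = [1.0 + i / n for i in range(n)]  # fixed, non-degenerate starting vector
        nx = math.sqrt(dot(x, x))
        x = [t / nx for t in x]  # keep x at unit norm throughout
        for _ in range(iters):
            y = matvec(B, x)
            norm = math.sqrt(dot(y, y))
            if norm == 0.0:
                break
            x = [t / norm for t in y]
        lam = dot(x, matvec(B, x))  # Rayleigh quotient of the unit vector x
        pairs.append((lam, x))
        # Deflate: subtract the found component so the next pass finds the next one.
        B = [[B[i][j] - lam * x[i] * x[j] for j in range(n)] for i in range(n)]
    return pairs


def low_rank_approx(A, d):
    """A_d = sum of lambda_i v_i v_i^T over the d eigenpairs found above."""
    n = len(A)
    Ad = [[0.0] * n for _ in range(n)]
    for lam, v in top_eigenpairs(A, d):
        for i in range(n):
            for j in range(n):
                Ad[i][j] += lam * v[i] * v[j]
    return Ad


# Toy graph: disjoint K4 on {0, ..., 3} and K3 on {4, 5, 6}.
A = [[1.0 if i != j and (i < 4) == (j < 4) else 0.0 for j in range(7)] for i in range(7)]
print([round(lam, 6) for lam, _ in top_eigenpairs(A, 2)])  # [3.0, 2.0]
A2 = low_rank_approx(A, 2)
print(round(A2[0][1], 6), round(A2[4][5], 6))  # 0.75 0.666667
```

As on the slide, the 0/1 entries turn into real numbers: every pair inside the K4 block becomes 0.75, and every pair inside the K3 block becomes 2/3.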
Theorems:
Our algorithm computes in time $O(n^{d+2}/\delta)$ a k-subgraph with density $\mathrm{OPT}_d \geq 0.5\,(1-\delta)\,\mathrm{OPT} - 2|\lambda_{d+1}|$.
If the largest d eigenvalues of the adjacency matrix are positive, our algorithm computes it in time $O(|E| \cdot \log n + n/\epsilon^{d})$.
Larger d ⇒ better approximation, slower computation.
[Figure: density vs. subgraph size k on a graph with 4M nodes and 35M edges. Curves: TPower (JMLR '13, blue), GreedyFeige (Algorithmica '01, green), GreedyRavi (OR '94, yellow) and the spannogram for d = 1, 2, 5, plotted against the trivial upper bound k − 1 and a curve labeled $\mathrm{OPT}_d + \lambda_{d+1}$. Annotations: "Big gap" between the baselines and the trivial bound; "Smaller gap" once the spannogram curves are added; "80% OPT, graph-dependent bound".]
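[Figure: adjacency matrix with rows and columns indexed by vertices; the counted entries are the edges in the subgraph]
DkS: $\max_{x \in \mathcal{S}_k} \frac{1}{k}\, x^\top A x$, where $\mathcal{S}_k$ is the set of 0/1 vectors with exactly k ones
DBkS: $\max_{x, y \in \mathcal{S}_k} \frac{1}{k}\, x^\top A y$, where the row set x and the column set y may differ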
Lemma: a ρ-approximation for DBkS yields a ½ρ-approximation for DkS.
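A minimal sketch comparing the two objectives on a toy matrix, assuming both are written as quadratic or bilinear forms over 0/1 vectors with exactly k ones (any constant normalization such as 1/k is omitted, since it does not change the maximizer). Because y = x is allowed, DBkS relaxes DkS, so its optimum is at least as large; the lemma bounds what is lost in approximation ratio when going back. The names are hypothetical, and exhaustive search over all pairs of k-sets is for illustration only.

```python
from itertools import combinations


def indicator(n, S):
    """0/1 vector with ones exactly on the index set S."""
    return [1 if i in S else 0 for i in range(n)]


def bilinear(A, x, y):
    """x^T A y for a square matrix A stored as nested lists."""
    n = len(A)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))


def dks_opt(A, k):
    """Max over k-sets S of 1_S^T A 1_S (twice the number of edges inside S)."""
    n = len(A)
    return max(bilinear(A, indicator(n, S), indicator(n, S))
               for S in combinations(range(n), k))


def dbks_opt(A, k):
    """Max over pairs of k-sets (X, Y) of 1_X^T A 1_Y; X = Y is allowed."""
    n = len(A)
    vecs = [indicator(n, S) for S in combinations(range(n), k)]
    return max(bilinear(A, x, y) for x in vecs for y in vecs)


# The 4-cycle 0-2-1-3-0, i.e. the complete bipartite graph K_{2,2}.
C4 = [[0, 0, 1, 1],
      [0, 0, 1, 1],
      [1, 1, 0, 0],
      [1, 1, 0, 0]]
print(dks_opt(C4, 2), dbks_opt(C4, 2))  # 2 4
```

On the 4-cycle, the best single 2-set is one edge (counted twice, value 2), while DBkS can take the two sides of the bipartition as rows and columns (value 4).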
[Figure: DBkS on the 0/1 adjacency matrix and on its real-valued low-rank approximation]
DBkS on a low-rank matrix: efficiently solvable
Naïvely: check all $\binom{n}{k}$ subgraphs.
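To get a sense of scale for the naive approach, even a toy instance with n = 100 and k = 5 (values chosen only for illustration) has tens of millions of candidate sets, and the count grows at least like $(n/k)^k$:

```latex
\binom{100}{5} = \frac{100 \cdot 99 \cdot 98 \cdot 97 \cdot 96}{5!}
             = \frac{9\,034\,502\,400}{120} = 75\,287\,520,
\qquad
\binom{n}{k} \ge \left(\frac{n}{k}\right)^{k}.
```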
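Rank-1 case: for $A_1 = \lambda_1 v_1 v_1^\top$, the objective is $x^\top A_1 y = \lambda_1 (v_1^\top x)(v_1^\top y)$.
Q: How do we maximize the product of two numbers? A: Maximize each number individually.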
Top-k set: the indices of the k largest coordinates of a vector; e.g., for the vector (1, 2, 3, 4) and k = 2, the top-2 set is {3, 4}.
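A minimal sketch of the rank-1 case, assuming the low-rank matrix is $A_1 = \lambda v v^\top$, so that $x^\top A_1 y = \lambda (v^\top x)(v^\top y)$. Each factor is a sum of k entries of v, so it is extremal at the top-k set of v (largest sum) or of $-v$ (smallest sum). Since the objective is bilinear in the two sums, checking the four combinations covers either sign of λ. For λ > 0 and a nonnegative v, such as the leading eigenvector of a connected graph, this reduces to "maximize each number individually": take the top-k set for both x and y. The names are hypothetical, and indices are 0-based, so the slide's top-2 set {3, 4} of (1, 2, 3, 4) appears here as {2, 3}.

```python
def top_k_set(v, k):
    """0-based indices of the k largest coordinates of v (ties broken by index)."""
    order = sorted(range(len(v)), key=lambda i: v[i], reverse=True)
    return set(order[:k])


def rank1_dbks(lam, v, k):
    """Maximize lam * (v.x) * (v.y) over 0/1 vectors x, y with exactly k ones each.

    Each sum v.x is largest on the top-k set of v and smallest on the top-k
    set of -v; a bilinear function of the two sums peaks at one of the four
    resulting combinations.
    """
    top = top_k_set(v, k)
    bottom = top_k_set([-t for t in v], k)

    def value(X, Y):
        return lam * sum(v[i] for i in X) * sum(v[i] for i in Y)

    candidates = [(top, top), (bottom, bottom), (top, bottom), (bottom, top)]
    X, Y = max(candidates, key=lambda pair: value(*pair))
    return X, Y, value(X, Y)


print(top_k_set([1, 2, 3, 4], 2))                       # {2, 3}
print(rank1_dbks(1.0, [0.9, -0.2, 0.5, -1.0], 2)[:2])   # ({0, 2}, {0, 2})
```

With a negative λ, the best choice instead pairs the top-k set with the bottom-k set, which the same four-way check picks up automatically.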
Theorem: a d-dimensional span contains only polynomially many distinct top-k sets (for fixed d); the Spannogram traverses all of them efficiently.
Intuition: x and y pick the top-k set of a vector from a 2-dimensional span.
Based on the Spannogram [Asteris, Papailiopoulos, Karystinos, ISIT 2011].
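To illustrate the d = 2 intuition, here is a crude sketch that sweeps the unit circle of coefficients on a fixed grid of angles and records which top-k sets occur for vectors in the span of two given vectors. It is only a stand-in for the Spannogram, which computes the exact angles where the ordering of coordinates changes; a grid can miss sets that live between grid points. The resolution `num_angles` and all names are assumptions.

```python
import math


def top_k_set(v, k):
    """0-based indices of the k largest coordinates of v, as a hashable set."""
    order = sorted(range(len(v)), key=lambda i: v[i], reverse=True)
    return frozenset(order[:k])


def top_k_sets_in_2d_span(V, k, num_angles=3600):
    """Distinct top-k sets of s(phi) = V (cos phi, sin phi) over a grid of angles.

    V holds one (a, b) row per coordinate, i.e. the two spanning vectors as columns.
    """
    found = set()
    for j in range(num_angles):
        phi = 2 * math.pi * j / num_angles
        cos_phi, sin_phi = math.cos(phi), math.sin(phi)
        s = [a * cos_phi + b * sin_phi for a, b in V]
        found.add(top_k_set(s, k))
    return found


# Four points on the unit circle: only 4 of the comb(4, 2) = 6 pairs ever appear.
V = [(1, 0), (0, 1), (-1, 0), (0, -1)]
sets = top_k_sets_in_2d_span(V, 2)
print(len(sets), math.comb(4, 2))  # 4 6
```

Here the "opposite" pairs {0, 2} and {1, 3} can never be a top-2 set, which is a small instance of why the span contains far fewer candidate sets than $\binom{n}{k}$.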
Randomized algorithm: take random points $s_1, \dots, s_{1/\epsilon^d} \in \mathrm{span}(v_1, \dots, v_d)$.
Practically linear time
[Figure: scaling with the number of edges |E| (from $10^4$ to $10^{10}$) for G-Feige, G-Ravi, TPower and the Spannogram, with $k = 3\sqrt{n}$]
OPT within 70% in most experiments.
Step 1: Take random points $s_1, \dots, s_{1/\epsilon^d} \in \mathrm{span}(v_1, \dots, v_d)$.
Step 2: Find the largest k entries of each point.
Step 3: Compute the density of the corresponding subgraph.
Practically linear time.
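A minimal single-machine sketch of Steps 1 to 3. It makes three assumptions: the d leading eigenvectors have already been computed and are passed in as rows of `V`, the random points use Gaussian coefficients, and density is measured as $2|E(S)|/k$ to match the trivial upper bound k − 1. The names are hypothetical, and this is not the talk's MapReduce implementation. Because the $\lceil 1/\epsilon^d \rceil$ samples are independent, the loop maps naturally onto a map step (score each sample) followed by a reduce step (keep the best).

```python
import heapq
import math
import random


def density(adj, S):
    """Average degree inside S, 2|E(S)|/|S|; a k-clique scores k - 1."""
    S = set(S)
    twice_edges = sum(1 for u in S for w in adj[u] if w in S)  # each edge counted twice
    return twice_edges / len(S)


def randomized_spannogram(adj, V, k, eps, seed=0):
    """Best k-subgraph found from ceil(1/eps**d) random points in the span of V's columns.

    adj: dict vertex -> set of neighbours, with vertices 0..n-1.
    V:   n rows; row u holds the d leading-eigenvector coordinates of vertex u.
    """
    rng = random.Random(seed)
    n, d = len(V), len(V[0])
    best_set, best_density = None, -math.inf
    for _ in range(math.ceil(1 / eps ** d)):
        # Step 1: random point s = V c (Gaussian coefficients c are an assumption).
        c = [rng.gauss(0.0, 1.0) for _ in range(d)]
        s = [sum(row[j] * c[j] for j in range(d)) for row in V]
        # Step 2: the k largest entries of s give a candidate vertex set.
        S = heapq.nlargest(k, range(n), key=s.__getitem__)
        # Step 3: score the induced subgraph and keep the densest one seen.
        rho = density(adj, S)
        if rho > best_density:
            best_set, best_density = set(S), rho
    return best_set, best_density


# Toy graph: disjoint K5 on {0..4} and K3 on {5, 6, 7}; its two leading
# eigenvectors are the normalized indicator vectors of the two cliques.
adj = {u: {w for w in range(5) if w != u} for u in range(5)}
adj.update({u: {w for w in range(5, 8) if w != u} for u in range(5, 8)})
V = [(1 / math.sqrt(5), 0.0)] * 5 + [(0.0, 1 / math.sqrt(3))] * 3
print(randomized_spannogram(adj, V, k=5, eps=0.1))  # ({0, 1, 2, 3, 4}, 4.0)
```

Each sample costs about O(nd) to form s, O(n log k) for the top-k selection and time proportional to the degrees inside S for the density. This fits the "practically linear time" claim when d and 1/ε are small.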