Scalable Hashing-Based Network Discovery
Tara Safavi, Chandra Sripada, Danai Koutra University of Michigan, Ann Arbor
Networks are everywhere (airport connections, Internet routing, paper citations), but they are not always directly observed.
…
See Network Structure Inference, A Survey, Brugere, Gallagher, Berger-Wolf
…
Examples of data where the network is not directly observed: brain scans, gene sequences, stock patterns
Traditional method
Time series A, B, C → all-pairs correlation → drop edges below threshold θ → network
[Figure: A, B, C with pairwise correlations 0.8, 0.4, 0.3; only the 0.8 edge survives thresholding]
Widely used in many domains, interpretable, but…
All-pairs correlation: O(N²) comparisons
Threshold θ: how to set?
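The traditional method can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes Pearson correlation as the pairwise measure, and the names `pearson` and `correlation_network` and the toy values for A, B, C are made up for the example.

```python
from itertools import combinations
from math import sqrt

def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def correlation_network(series, theta):
    """Keep edge (i, j) only if corr(i, j) >= theta.

    Every one of the N(N-1)/2 pairs is visited, hence O(N^2) comparisons.
    """
    edges = {}
    for (i, u), (j, v) in combinations(series.items(), 2):
        r = pearson(u, v)
        if r >= theta:
            edges[(i, j)] = r
    return edges

# Toy series (hypothetical values): A and B move together, C does not.
series = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "B": [1.1, 2.3, 2.9, 4.2, 4.8, 6.1],
    "C": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
}
edges = correlation_network(series, theta=0.5)
print(edges)  # only the A-B edge remains
```

The result depends directly on `theta`, which is the "how to set?" question raised on the slide.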
New hashing-based method
Time series A, B, C → binarize → hash function → buckets → bucket pairwise similarity → network
Contrast with the traditional method: all-pairs correlation needs O(N²) comparisons, and the threshold θ is arbitrary.
Pipeline
Binarize → hash into buckets → bucket pairwise similarity
Pipeline step: Binarize¹
¹ Ratanamahatana et al., 2005
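A minimal sketch of the binarize step. The slides only say "Binarize" and cite Ratanamahatana et al.; the rule used here (bit 1 where the value exceeds the series mean, 0 otherwise) is an assumption in the spirit of that clipped representation, and the function name and input values are hypothetical.

```python
def binarize(series):
    """Clip a real-valued series to bits: 1 where the value exceeds the
    series mean, else 0 (assumed rule, see the note above)."""
    mean = sum(series) / len(series)
    return [1 if v > mean else 0 for v in series]

# Made-up values; the mean is 11/7, so the first, second and fourth points map to 1.
bits = binarize([2.0, 3.0, 1.0, 4.0, 0.5, 0.5, 0.0])
print(bits)  # [1, 1, 0, 1, 0, 0, 0]
```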
Similarity of two binarized series: each run of consecutive agreeing bits contributes a geometric sum
x: 1 1 0 1 0 0 0
y: 1 1 1 1 0 0 1
$\left[(1+\alpha)^0 + (1+\alpha)^1\right] + \left[(1+\alpha)^0 + (1+\alpha)^1 + (1+\alpha)^2\right]$
(x and y agree on positions 1 to 2 and on positions 4 to 6)
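The run-based sum for x and y can be reproduced with a short function. This is a sketch under assumptions: the name `abc_similarity` is hypothetical, and α = 0.1 is an illustrative value, since the slides do not fix α.

```python
def abc_similarity(x, y, alpha):
    """Sum (1 + alpha)**a over each run of agreeing bits,
    for a = 0 .. run_length - 1."""
    total, run = 0.0, 0
    for xb, yb in zip(x, y):
        if xb == yb:
            total += (1 + alpha) ** run  # a-th consecutive agreement adds (1 + alpha)^a
            run += 1
        else:
            run = 0                      # a disagreement ends the current run
    return total

x = [1, 1, 0, 1, 0, 0, 0]
y = [1, 1, 1, 1, 0, 0, 1]
alpha = 0.1  # illustrative only
s = abc_similarity(x, y, alpha)
# Runs of length 2 and 3: (1 + 1.1) + (1 + 1.1 + 1.21)
print(round(s, 2))  # 5.41
```

With α = 0 the sum reduces to the plain count of agreeing positions; larger α rewards longer consecutive agreement.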
ABC distance triangle inequality: sketch of proof
x: 1 1 1 0 0 1
y: 0 1 1 0 0 1
z: 1 0 1 0 0 1
Step from length n to n + 1
Append (a) to existing run
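The triangle inequality can also be checked by brute force on short sequences. This is a sketch, not the proof: it assumes the distance is the maximum possible similarity (a sequence compared with itself) minus the run-based similarity, which is consistent with the normalizer (1 + α)^n − 1 appearing elsewhere in the slides; the function names and α = 0.5 are hypothetical choices.

```python
from itertools import product

def abc_similarity(x, y, alpha):
    """Run-based similarity: the a-th consecutive agreement adds (1 + alpha)^a."""
    total, run = 0.0, 0
    for xb, yb in zip(x, y):
        if xb == yb:
            total += (1 + alpha) ** run
            run += 1
        else:
            run = 0
    return total

def abc_distance(x, y, alpha):
    """Assumed definition: self-similarity (the maximum) minus s(x, y)."""
    return abc_similarity(x, x, alpha) - abc_similarity(x, y, alpha)

def triangle_holds(n, alpha, tol=1e-9):
    """Check d(x, z) <= d(x, y) + d(y, z) for all binary triples of length n."""
    seqs = list(product([0, 1], repeat=n))
    return all(
        abc_distance(x, z, alpha)
        <= abc_distance(x, y, alpha) + abc_distance(y, z, alpha) + tol
        for x in seqs for y in seqs for z in seqs
    )

# The three sequences shown on the slide.
x = (1, 1, 1, 0, 0, 1)
y = (0, 1, 1, 0, 0, 1)
z = (1, 0, 1, 0, 0, 1)
alpha = 0.5  # illustrative only
slide_ok = abc_distance(y, z, alpha) <= abc_distance(y, x, alpha) + abc_distance(x, z, alpha)
all_ok = triangle_holds(5, alpha)  # 32 sequences, 32**3 triples
print(slide_ok, all_ok)
```

An exhaustive check over one length and one α is only supporting evidence; the proof sketched on the slide is what covers the general case.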
Background
Original data + hash function → hash signatures → hash table buckets
x: 1 1 1 0 0 1
y: 0 1 1 0 0 1
z: 1 0 1 0 0 1
g = h2 & h4
Hash signatures: x → [1, 0], y → [1, 0], z → [0, 0]
Hash table buckets: [1, 0] → {x, y}; [0, 0] → {z}
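Original data + hash function, with windows of k = 2 bits
x: 1 1 1 0 0 1
y: 0 1 1 0 0 1
z: 1 0 1 0 0 1
k = 2, g = h2 & h4
Hash signatures: x → [11, 00], y → [11, 00], z → [01, 00]
Hash table buckets: [11, 00] → {x, y}; [01, 00] → {z}
Sensitivity of the hash family: $\left(d_1,\ d_2,\ 1 - \frac{\alpha d_1}{(1+\alpha)^n - 1},\ 1 - \frac{\alpha d_2}{(1+\alpha)^n - 1}\right)$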
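Both hashing examples can be reproduced with one small function. This sketch reads h2 and h4 as "start at bit 2" and "start at bit 4" (1-indexed), an interpretation that matches the signatures shown; the names `window_signature` and `bucketize` are hypothetical.

```python
from collections import defaultdict

def window_signature(bits, starts, k):
    """Concatenate the k-bit windows beginning at each 1-indexed start position."""
    return tuple(
        "".join(str(b) for b in bits[s - 1:s - 1 + k]) for s in starts
    )

def bucketize(data, starts, k):
    """Group series names by hash signature; only series sharing a bucket
    are later compared pairwise."""
    buckets = defaultdict(list)
    for name, bits in data.items():
        buckets[window_signature(bits, starts, k)].append(name)
    return dict(buckets)

data = {
    "x": [1, 1, 1, 0, 0, 1],
    "y": [0, 1, 1, 0, 0, 1],
    "z": [1, 0, 1, 0, 0, 1],
}
single = bucketize(data, starts=(2, 4), k=1)   # single-bit sampling
windows = bucketize(data, starts=(2, 4), k=2)  # windows of k = 2 bits
print(single)   # {('1', '0'): ['x', 'y'], ('0', '0'): ['z']}
print(windows)  # {('11', '00'): ['x', 'y'], ('01', '00'): ['z']}
```

In both cases x and y land in the same bucket and z in its own, so only the pair (x, y) would be scored for similarity.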
Data → feature selection → train (labels) → test (predicted health)
Example feature vector (F1 to F5): 6.5, .3, 1.4, .7, .03; label: Healthy = 1
Average accuracy is the same; runtime is not!
Total time: 5 min
Total time: >1 hr
Thank you + questions
Lacking consecutiveness and/or metrics
Distributional assumptions
User-set thresholds