 
Linear Algebraic Graph Algorithms for Back End Processing
Jeremy Kepner, Nadya Bliss, and Eric Robinson
MIT Lincoln Laboratory
This work is sponsored by the Department of Defense under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government.
MIT Lincoln Laboratory Slide-1
Outline
• Introduction
  – Post Detection Processing
  – Sparse Matrix Duality
  – Approach
• Power Law Graphs
• Graph Benchmark
• Results
• Summary
MIT Lincoln Laboratory Slide-2
Statistical Network Detection
[Figure: track graph linking Event A and Event B, with 1st, 2nd, and 3rd neighbor sites labeled]
Problem: Forensic Back-Tracking
• Currently, significant analyst effort is dedicated to manually identifying links between threat events and their immediate precursor sites
  – Days of manual effort to fully explore candidate tracks
  – Correlations missed unless recurring sites are recognized by analysts
  – Precursor sites may be low-value staging areas
  – Manual analysis will not support further backtracking from staging areas to potentially higher-value sites
Concept: Statistical Network Detection
• Develop graph algorithms to identify adversary nodes by estimating connectivity to known events
  – Tracks describe a graph between known sites or events, which act as sources
  – Unknown sites are detected by the aggregation of threat propagated over many potential connections
Planned system capability (over major urban area)
• 1M tracks/day (100,000 at any time)
• 100M tracks in 100-day database
• 1M nodes (starting/ending points)
• 100 events/day (10,000 events in database)
Computationally demanding graph processing
  – ~10^6 seconds based on benchmarks & scale
  – ~10^3 seconds needed for effective CONOPS (1000x improvement)
MIT Lincoln Laboratory Slide-3
Graphs as Matrices
[Figure: example graph with vertices 1 to 7, its adjacency matrix A^T, and the product A^T x]
• Graphs can be represented as sparse matrices
  – Multiply by adjacency matrix → step to neighbor vertices
  – Work-efficient implementation from sparse data structures
• Most algorithms reduce to products on semi-rings: C = A “+”.“x” B
  – “x”: associative, distributes over “+”
  – “+”: associative, commutative
  – Examples: +.*, min.+, or.and
MIT Lincoln Laboratory Slide-4
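The semiring product on the slide above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 4-vertex graph, the dense list-of-lists storage, and the helper names `semiring_matvec` and `transpose` are hypothetical and do not come from the slides (a work-efficient version would use a sparse data structure).

```python
def semiring_matvec(A, x, add, mul, zero):
    """Return y with y[i] = add over j of mul(A[i][j], x[j]), starting from `zero`."""
    y = []
    for row in A:
        acc = zero
        for a, xj in zip(row, x):
            acc = add(acc, mul(a, xj))
        y.append(acc)
    return y


def transpose(A):
    """Transpose a dense matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


# Hypothetical directed graph: edges 0->1, 0->2, 1->3, 2->3.
# Convention assumed here: A[i][j] = 1 means an edge from i to j.
A = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
AT = transpose(A)

# or.and semiring: one breadth-first step from vertex 0.
AT_bool = [[bool(v) for v in row] for row in AT]
frontier = [True, False, False, False]
step1 = semiring_matvec(AT_bool, frontier,
                        add=lambda a, b: a or b,
                        mul=lambda a, b: a and b,
                        zero=False)

# +.* semiring: count the paths of length 2 that start at vertex 0.
x = [1, 0, 0, 0]
paths1 = semiring_matvec(AT, x, lambda a, b: a + b, lambda a, b: a * b, 0)
paths2 = semiring_matvec(AT, paths1, lambda a, b: a + b, lambda a, b: a * b, 0)
```

Swapping only the `add`, `mul`, and `zero` arguments changes the algorithm (reachability versus path counting) while the product itself stays the same, which is the point of the semiring formulation.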
Distributed Array Mapping
Adjacency Matrix Types: RANDOM, TOROIDAL, POWER LAW (PL), PL SCRAMBLED
Distributions: 1D BLOCK, 2D BLOCK, 2D CYCLIC, EVOLVED, ANTI-DIAGONAL
Sparse matrix duality provides a natural way of exploiting distributed data distributions
MIT Lincoln Laboratory Slide-5
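The slide names the distributions without defining them. As a rough sketch, the functions below use the conventional definitions of 1D block, 2D block, and 2D cyclic mappings; these definitions, the function names, and the toy "hub row" matrix are assumptions made for illustration, and the EVOLVED and ANTI-DIAGONAL mappings are not covered.

```python
from collections import Counter


def owner_1d_block(i, j, n, p):
    """Row-block mapping: contiguous bands of rows go to each of p processors."""
    block = -(-n // p)  # ceil(n / p)
    return i // block


def owner_2d_block(i, j, n, pr, pc):
    """2D block mapping onto a pr x pc processor grid."""
    row_block = -(-n // pr)
    col_block = -(-n // pc)
    return (i // row_block, j // col_block)


def owner_2d_cyclic(i, j, pr, pc):
    """2D cyclic mapping: rows and columns are dealt out round-robin."""
    return (i % pr, j % pc)


def load(nonzeros, owner):
    """Count how many nonzeros each processor owns under a mapping."""
    return Counter(owner(i, j) for i, j in nonzeros)


# Hypothetical 8x8 matrix whose only nonzeros form one dense "hub" row,
# a crude stand-in for a high-degree vertex in a power law graph.
n = 8
hub = [(0, j) for j in range(n)]

load_1d = load(hub, lambda i, j: owner_1d_block(i, j, n, 4))
load_cyclic = load(hub, lambda i, j: owner_2d_cyclic(i, j, 2, 2))
```

In this toy case the 1D block mapping places all eight nonzeros on one processor, while the 2D cyclic mapping splits them across two, which hints at why the choice of distribution matters for skewed matrices.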
Algorithm Comparison
Algorithm (Problem)        Canonical Complexity     Array-Based Complexity   Critical Path (for array)
Bellman-Ford (SSSP)        Θ(mn)                    Θ(mn)                    Θ(n)
Generalized B-F (APSP)     NA                       Θ(n^3 log n)             Θ(log n)
Floyd-Warshall (APSP)      Θ(n^3)                   Θ(n^3)                   Θ(n)
Prim (MST)                 Θ(m + n log n)           Θ(n^2)                   Θ(n)
Borůvka (MST)              Θ(m log n)               Θ(m log n)               Θ(log^2 n)
Edmonds-Karp (Max Flow)    Θ(m^2 n)                 Θ(m^2 n)                 Θ(mn)
Push-Relabel (Max Flow)    Θ(mn^2) (or Θ(n^3))      O(mn^2)                  ?
Greedy MIS (MIS)           Θ(m + n log n)           Θ(mn + n^2)              Θ(n)
Luby (MIS)                 Θ(m + n log n)           Θ(m log n)               Θ(log n)
(n = |V| and m = |E|.)
Majority of selected algorithms can be represented with array-based constructs with equivalent complexity.
MIT Lincoln Laboratory Slide-6
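As one example of an array-based construct from the table, Bellman-Ford can be written as repeated min.+ vector-matrix products. The sketch below is an assumption-laden illustration, not the authors' implementation: the weight matrix `W`, the names `min_plus_vecmat` and `bellman_ford_array`, and the dense storage are hypothetical. With dense lists each product costs Θ(n^2); the Θ(mn) entry in the table would require sparse storage. Negative-cycle detection is omitted.

```python
INF = float("inf")


def min_plus_vecmat(d, W):
    """min.+ product: d_new[j] = min over i of (d[i] + W[i][j])."""
    n = len(W)
    return [min(d[i] + W[i][j] for i in range(n)) for j in range(n)]


def bellman_ford_array(W, source):
    """Single-source shortest paths via at most n-1 min.+ products.

    W[i][j] is the weight of edge i->j, INF if there is no edge,
    and W[i][i] must be 0 so that known distances are carried forward.
    """
    n = len(W)
    d = [INF] * n
    d[source] = 0
    for _ in range(n - 1):
        d_next = min_plus_vecmat(d, W)
        if d_next == d:  # converged early
            break
        d = d_next
    return d


# Hypothetical weighted graph on 4 vertices.
W = [
    [0,   4,   1,   INF],
    [INF, 0,   INF, 1],
    [INF, 2,   0,   5],
    [INF, INF, INF, 0],
]
dist = bellman_ford_array(W, 0)
```

Each loop iteration is a single array operation, so the critical path is the number of iterations, Θ(n) in the worst case, matching the last column of the Bellman-Ford row.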
A few DoD Applications using Graphs
FORENSIC BACKTRACKING
• Identify key staging and logistic sites from persistent surveillance of vehicle tracks
[Figure: track graph between Event A and Event B]
DATA FUSION
• Bayes nets for fusing imagery and ladar for better on-board tracking
[Figure: 2D/3D fused imagery]
TOPOLOGICAL DATA ANALYSIS
• Higher dimension graph analysis to determine sensor net coverage [Jadbabaie]
Application
• Subspace reduction
• Identifying staging areas
• Feature aided 2D/3D fusion
• Finding cycles on complexes
Key Algorithm
• Minimal Spanning Trees
• Betweenness Centrality
• Bayesian belief propagation
• Single source shortest path
Key Semiring Operation
• X +.* A +.* X^T
• A +.* B
• A +.* B (A, B tensors)
• D min.+ A (A tensor)
MIT Lincoln Laboratory Slide-7
Approach: Graph Theory Benchmark
• Scalable benchmark specified by graph community
• Goal
  – Stress parallel computer architecture
• Key data
  – Very large Kronecker graph
• Key algorithm
  – Betweenness Centrality
• Computes number of shortest paths each vertex is on
  – Measure of vertex “importance”
  – Poor efficiency on conventional computers
MIT Lincoln Laboratory Slide-8
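To make the key algorithm concrete, here is a minimal sketch of betweenness centrality for an unweighted graph using Brandes' accumulation scheme. It is not the benchmark's reference code: the adjacency-list input, the function name, and the 4-vertex path graph are assumptions for illustration. When several shortest paths tie, each vertex is credited with the fraction of them that pass through it, and ordered pairs (s, t) are counted, so an undirected graph scores twice the usual halved value.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes-style betweenness for an unweighted graph.

    adj maps each vertex to a list of its out-neighbors.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}   # number of shortest s->v paths
        dist = {v: -1 for v in adj}   # BFS distance from s, -1 = unvisited
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        sigma[s], dist[s] = 1, 0
        order = []                    # vertices in order of discovery
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest vertices back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Hypothetical undirected path graph 0 - 1 - 2 - 3, stored symmetrically.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
scores = betweenness_centrality(path)
```

The per-source breadth-first search followed by a reverse sweep is irregular, pointer-chasing work, which is consistent with the slide's remark about poor efficiency on conventional computers.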
Outline
• Introduction
• Power Law Graphs
  – Kronecker Model
  – Analytic Results
• Graph Benchmark
• Results
• Summary
MIT Lincoln Laboratory Slide-9
Power Law Graphs
[Figure panels: Social Network Analysis, Anomaly Detection, Target Identification]
• Many graph algorithms must operate on power law graphs
• Most nodes have a few edges
• A few nodes have many edges
MIT Lincoln Laboratory Slide-10
Modeling of Power Law Graphs
[Figure panels: Adjacency Matrix; Vertex In Degree Distribution (Number of Vertices vs. In Degree, with power law fit)]
• Real world data (internet, social networks, …) has connections on all scales (i.e., power law)
• Can be modeled with Kronecker graphs: $G^{\otimes k} = G^{\otimes (k-1)} \otimes G$
  – Where “⊗” denotes the Kronecker product of two matrices
MIT Lincoln Laboratory Slide-11
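One way to see how a Kronecker graph spreads degrees over many scales is to compute its in-degree distribution without forming the matrix. The sketch below relies on the fact that the column sums of a Kronecker product are the Kronecker product of the column sums; the seed in-degree vector `[3, 1, 1, 1]` (a 4-vertex star) and the function names are hypothetical choices for illustration.

```python
from collections import Counter


def kron_vec(u, v):
    """Kronecker product of two vectors."""
    return [a * b for a in u for b in v]


def kron_power_in_degrees(seed_in_degrees, k):
    """In-degrees of the k-th Kronecker power of a seed graph.

    Uses colsum(B kron C) = colsum(B) kron colsum(C), so the
    N^k x N^k adjacency matrix never has to be built.
    """
    deg = [1]
    for _ in range(k):
        deg = kron_vec(deg, seed_in_degrees)
    return deg


# Hypothetical seed: a star with one hub (in-degree 3) and three leaves.
seed = [3, 1, 1, 1]
histogram = Counter(kron_power_in_degrees(seed, 3))
```

For this seed and k = 3, the 64 vertices fall into four degree classes (1, 3, 9, 27), with the vertex count shrinking as the degree grows: many low-degree vertices and a single high-degree hub.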
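Kronecker Products and Graph
Kronecker Product
• Let B be an $N_B \times N_B$ matrix
• Let C be an $N_C \times N_C$ matrix
• Then the Kronecker product of B and C will produce an $N_B N_C \times N_B N_C$ matrix A:
  $$A = B \otimes C = \begin{pmatrix} b_{1,1} C & \cdots & b_{1,N_B} C \\ \vdots & \ddots & \vdots \\ b_{N_B,1} C & \cdots & b_{N_B,N_B} C \end{pmatrix}$$
Kronecker Graph (Leskovec 2005 & Chakrabarti 2004)
• Let G be an $N \times N$ adjacency matrix
• Kronecker exponent to the power k is:
  $$G^{\otimes k} = G^{\otimes (k-1)} \otimes G$$
MIT Lincoln Laboratory Slide-12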
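The block definition of the Kronecker product translates directly into index arithmetic. The following is a small sketch using dense lists of lists; the function names and the 2x2 seed matrix `G` are hypothetical, and a real Kronecker graph generator would work with sparse storage.

```python
def kron(B, C):
    """Kronecker product: block (i, j) of the result is B[i][j] * C."""
    nb_rows, nb_cols = len(B), len(B[0])
    nc_rows, nc_cols = len(C), len(C[0])
    A = [[0] * (nb_cols * nc_cols) for _ in range(nb_rows * nc_rows)]
    for i in range(nb_rows):
        for j in range(nb_cols):
            for p in range(nc_rows):
                for q in range(nc_cols):
                    A[i * nc_rows + p][j * nc_cols + q] = B[i][j] * C[p][q]
    return A


def kron_power(G, k):
    """k-th Kronecker power, following G^(k) = G^(k-1) kron G, for k >= 1."""
    result = G
    for _ in range(k - 1):
        result = kron(result, G)
    return result


# Hypothetical 2x2 seed adjacency matrix with 3 nonzeros.
G = [
    [1, 1],
    [1, 0],
]
G2 = kron_power(G, 2)
G3 = kron_power(G, 3)
nnz_G3 = sum(sum(row) for row in G3)
```

The dimension grows as N^k and the number of nonzeros as nnz(G)^k, so a small seed matrix can describe a very large sparse graph.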
Kronecker Product of a Bipartite Graph
[Figure: $B(5,1) \otimes B(3,1) \overset{P}{=} B(15,1) \cup B(3,5)$, where $\overset{P}{=}$ means "equal with the right permutation"]
• Fundamental result [Weichsel 1962] is that the Kronecker product of two complete bipartite graphs is two complete bipartite graphs
• More generally: [equation not recovered from the slide]
MIT Lincoln Laboratory Slide-13
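The example on this slide can be checked numerically. The sketch below builds the adjacency matrices of B(5,1) and B(3,1), forms their Kronecker product, and confirms that it splits into two connected components that are each complete bipartite. All function names are hypothetical, and the check covers only this one example, not the general statement.

```python
def complete_bipartite(n, m):
    """Adjacency matrix of B(n, m): vertices 0..n-1 on one side, n..n+m-1 on the other."""
    size = n + m
    return [[1 if (i < n) != (j < n) else 0 for j in range(size)]
            for i in range(size)]


def kron(B, C):
    """Kronecker product of two dense matrices (lists of rows)."""
    return [[b * c for b in row_b for c in row_c]
            for row_b in B for row_c in C]


def components(A):
    """Connected components of an undirected graph given by a symmetric matrix."""
    n = len(A)
    seen = [False] * n
    comps = []
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack, comp = [s], []
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in range(n):
                if A[v][w] and not seen[w]:
                    seen[w] = True
                    stack.append(w)
        comps.append(comp)
    return comps


def bipartite_shape(A, comp):
    """Return sorted part sizes if comp induces a complete bipartite graph, else None."""
    root = comp[0]
    # In a complete bipartite graph, the root's side is exactly its non-neighbors.
    left = {v for v in comp if A[root][v] == 0}
    for u in comp:
        for v in comp:
            expected = 1 if (u in left) != (v in left) else 0
            if A[u][v] != expected:
                return None
    return tuple(sorted((len(left), len(comp) - len(left))))


product = kron(complete_bipartite(5, 1), complete_bipartite(3, 1))
shapes = sorted(bipartite_shape(product, c) for c in components(product))
```

The two recovered shapes, (1, 15) and (3, 5), correspond to B(15,1) and B(3,5); the component search plays the role of the permutation P that groups the vertices of each piece together.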