KTrussExplorer: Exploring the Design Space of K-truss Decomposition Optimizations on GPUs
Safaa Diab, Mhd Ghaith Olabi, Izzat El Hajj
American University of Beirut
HPEC Graph Challenge, September 23, 2020

Overview
KTrussExplorer is a highly parameterized framework for exploring different combinations of k-truss decomposition optimizations on GPUs.

Supported features:
• Edge-centric parallelization
• Undirected or directed graphs
• Directed by index or by degree
• Tiling the adjacency matrix
• Parallelizing intersections
• Removing or marking weak edges
• Recomputing for all or affected edges

Contributions:
• A survey of optimizations
• A framework for exploring the design space (github.com/ielhajj/ktruss-explorer)
• A view of the design space
• Unexplored combinations faster than prior champions
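For orientation, here is a minimal sequential sketch of the baseline that these optimizations modify: repeatedly compute each edge's support (the number of triangles containing it) and delete edges whose support is below k-2, until nothing changes. It assumes an undirected simple graph given as an edge list over vertices 0..n-1. The name `kTruss` is hypothetical, and this is a plain C++ illustration of the edge-centric idea, not KTrussExplorer's CUDA code.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;

// Hypothetical helper (not the framework's API): returns the surviving edges
// of the k-truss of an undirected simple graph with vertices 0..n-1.
std::vector<Edge> kTruss(int n, std::vector<Edge> edges, int k) {
    assert(k >= 2);
    bool changed = true;
    while (changed) {
        changed = false;
        // Rebuild sorted adjacency lists from the edges that are still alive.
        std::vector<std::vector<int>> adj(n);
        for (auto [u, v] : edges) {
            adj[u].push_back(v);
            adj[v].push_back(u);
        }
        for (auto& list : adj) std::sort(list.begin(), list.end());

        // Edge-centric step: support(u,v) = |adj[u] ∩ adj[v]|, computed by a
        // merge of two sorted lists. On a GPU, one thread would handle one edge.
        std::vector<Edge> kept;
        for (auto [u, v] : edges) {
            int support = 0;
            auto a = adj[u].begin(), b = adj[v].begin();
            while (a != adj[u].end() && b != adj[v].end()) {
                if (*a < *b) ++a;
                else if (*b < *a) ++b;
                else { ++support; ++a; ++b; }
            }
            // An edge stays only if it lies in at least k-2 triangles.
            if (support >= k - 2) kept.push_back({u, v});
            else changed = true;  // weak edge deleted in this round
        }
        edges = std::move(kept);
    }
    return edges;
}
```

For example, on K4 (vertices 0 to 3) plus a pendant edge {3, 4}, `kTruss(5, g, 4)` keeps the six K4 edges and drops the pendant edge, while `kTruss(5, g, 5)` removes everything.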

Methodology
• Software: KTrussExplorer kernels are implemented in CUDA.
• System: Evaluation is on one Volta V100 GPU with 16 GB of memory.
• Datasets: We evaluate with all graphs in the Graph Challenge collection, except Friendster, graph500-scale24-ef16, and graph500-scale25-ef16, which do not fit in the limited device memory.
• Search space: The design space is searched exhaustively, except for very large graphs.

Graph Directedness
Undirected graph: every edge is stored in both directions ({0,1}, {0,2}, {1,0}, {1,2}, {2,0}, {2,1} for the triangle 0-1-2), and each stored edge adds +1 to its own support.
• Less synchronization (no atomics)
• Stop counting early
Directed graph: each edge is stored once ({0,1}, {0,2}, {1,2}); the triangle is found once and adds +1 to the support of all three of its edges.
• Less redundancy
[Figure: Speedup of Directed over Undirected (log scale, 0.25 to 64) vs. Average Number of Triangles per Edge (10^-6 to 10^2), k = 3. Above 1: directed is faster; below 1: undirected is faster.]
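The trade-off on this slide can be illustrated by computing per-edge support both ways. This is a sequential C++ sketch with hypothetical names (`supportUndirected`, `supportDirected`); it directs edges by index and uses `std::set_intersection` for the list intersections. Both functions return the same supports; they differ in how many times each triangle is found and in which thread writes each counter.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // support map keys always have first < second

// Common elements of two sorted lists.
static std::vector<int> common(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

// Undirected: each edge is stored as both {u,v} and {v,u}. Every stored entry
// counts its own triangles and writes only its own counter (no atomics
// needed), but each triangle is rediscovered by all six stored entries.
std::map<Edge, int> supportUndirected(int n, const std::vector<Edge>& edges) {
    std::vector<std::vector<int>> adj(n);
    for (auto [u, v] : edges) {
        assert(u != v);  // simple graph assumed
        adj[u].push_back(v);
        adj[v].push_back(u);
    }
    for (auto& l : adj) std::sort(l.begin(), l.end());
    std::map<Edge, int> support;
    for (int u = 0; u < n; ++u)
        for (int v : adj[u])  // {u,v} and {v,u} compute the same value
            support[{std::min(u, v), std::max(u, v)}] =
                static_cast<int>(common(adj[u], adj[v]).size());
    return support;
}

// Directed by index: only u -> v with u < v is stored. Triangle u < v < w is
// found exactly once, by edge (u, v), which then credits all three edges;
// on a GPU those shared counters would need atomic increments.
std::map<Edge, int> supportDirected(int n, const std::vector<Edge>& edges) {
    std::vector<std::vector<int>> out(n);
    std::map<Edge, int> support;
    for (auto [u, v] : edges) {
        assert(u != v);
        out[std::min(u, v)].push_back(std::max(u, v));
        support[{std::min(u, v), std::max(u, v)}] = 0;
    }
    for (auto& l : out) std::sort(l.begin(), l.end());
    for (int u = 0; u < n; ++u)
        for (int v : out[u])
            for (int w : common(out[u], out[v])) {
                ++support[{u, v}];
                ++support[{u, w}];
                ++support[{v, w}];
            }
    return support;
}
```

On the triangle 0-1-2 plus an extra edge {2, 3}, both versions give support 1 for the triangle edges and 0 for {2, 3}; the undirected version performs six intersections that find the triangle, the directed version one.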

Directing Edges by Degree
• Directed by index: keep edges from the vertex with the lower index to the vertex with the higher index.
• Directed by degree: keep edges from the vertex with the lower degree to the vertex with the higher degree.
  Advantage: shrinks large adjacency lists to reduce load imbalance.
[Figure: Speedup of Directed by Degree over Directed by Index (0.5 to 8) vs. Maximum Vertex Degree (10^0 to 10^6), k = 3. Above 1: directed by degree is faster; below 1: directed by index is faster.]
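A small sketch of the two orientation rules, assuming that ties in degree are broken by vertex index (the slide does not say how ties are handled). `orient` and `maxOutDegree` are hypothetical helpers. Any strict total order on vertices gives an acyclic orientation in which every triangle is found exactly once, so switching from index order to degree order changes only how work is distributed.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Orients each undirected edge from the lower-ranked to the higher-ranked
// endpoint. byDegree == false: rank by vertex index.
// byDegree == true: rank by degree, ties broken by index (assumed).
std::vector<std::vector<int>> orient(int n, const std::vector<std::pair<int, int>>& edges,
                                     bool byDegree) {
    std::vector<int> deg(n, 0);
    for (auto [u, v] : edges) {
        assert(u != v);  // simple graph assumed
        ++deg[u];
        ++deg[v];
    }
    auto lower = [&](int a, int b) {
        if (byDegree && deg[a] != deg[b]) return deg[a] < deg[b];
        return a < b;
    };
    std::vector<std::vector<int>> out(n);
    for (auto [u, v] : edges) {
        if (lower(u, v)) out[u].push_back(v);
        else out[v].push_back(u);
    }
    for (auto& l : out) std::sort(l.begin(), l.end());  // sorted for merge-based intersections
    return out;
}

// Length of the longest directed adjacency list (a proxy for load imbalance).
std::size_t maxOutDegree(const std::vector<std::vector<int>>& out) {
    std::size_t m = 0;
    for (const auto& l : out) m = std::max(m, l.size());
    return m;
}
```

On a star with center 0 and leaves 1 to 5, directing by index leaves vertex 0 with all 5 out-neighbors, while directing by degree points every edge from a leaf to the center, so no list has more than one entry.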

Tiling
Example graph (8 vertices), logical adjacency lists:
0: 1 5 6 7
1: 0 3 5
2: 4 5 7
3: 1 5
4: 2 7
5: 0 1 2 3
6: 0 7
7: 0 2 4 6

CSR representation without tiling:
srcPtr: 0 4 7 10 12 14 18 20 24
dstIdx: 1 5 6 7 0 3 5 4 5 7 1 5 2 7 0 1 2 3 0 7 0 2 4 6

Tiled CSR representation with tiling (the 8 × 8 adjacency matrix is split into 4 × 4 tiles, stored in the order rows 0-3/columns 0-3, rows 0-3/columns 4-7, rows 4-7/columns 0-3, rows 4-7/columns 4-7):
srcPtr: 0 1 3 3 4 7 8 11 12 13 17 18 20 21 21 22 24
dstIdx: 1 0 3 1 5 6 7 5 4 5 7 5 2 0 1 2 3 0 0 2 7 7 4 6

Benefits of tiling:
• Better locality
• Partitioning intersections into smaller sub-intersections
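The tiled layout of the example can be derived from the untiled CSR with a short routine. This sketch assumes square T × T tiles, a vertex count divisible by T, and the tile order implied by the example (row tiles outermost, then column tiles, then rows within a tile). `CSR` and `buildTiledCSR` are illustrative names, not the framework's types. With T = 4 it reproduces the tiled srcPtr and dstIdx arrays listed for the example graph.

```cpp
#include <cassert>
#include <vector>

struct CSR {
    std::vector<int> srcPtr;  // offsets into dstIdx, one segment per row (or per row-in-tile)
    std::vector<int> dstIdx;  // neighbor indices, sorted within each segment
};

// Builds a tiled CSR: the segment for row r inside tile (rowTile, colTile)
// holds r's neighbors whose column lies in [colTile*T, (colTile+1)*T).
CSR buildTiledCSR(const CSR& g, int T) {
    int n = static_cast<int>(g.srcPtr.size()) - 1;
    assert(T > 0 && n % T == 0);  // sketch assumes n is a multiple of T
    int tiles = n / T;
    CSR t;
    t.srcPtr.push_back(0);
    for (int rb = 0; rb < tiles; ++rb)            // row tile
        for (int cb = 0; cb < tiles; ++cb)        // column tile
            for (int r = rb * T; r < (rb + 1) * T; ++r) {
                for (int i = g.srcPtr[r]; i < g.srcPtr[r + 1]; ++i)
                    if (g.dstIdx[i] / T == cb) t.dstIdx.push_back(g.dstIdx[i]);
                // Close this row's segment (possibly empty) within the tile.
                t.srcPtr.push_back(static_cast<int>(t.dstIdx.size()));
            }
    return t;  // srcPtr has tiles * n + 1 entries
}
```

Empty segments appear as repeated srcPtr values (for example "3 3" for row 2 in the first tile), which is what later lets an intersection skip a tile without reading dstIdx.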

Benefits of Tiling
[Figure: Memory access pattern without tiling (bad locality) vs. memory access pattern with tiling (good locality), shown on the 8 × 8 adjacency matrix of the example graph.]

Benefits of Tiling
[Figure: Intersection without tiling vs. intersection with tiling. With tiling, the intersection is partitioned into per-tile sub-intersections; in the example, sub-intersection 1 and sub-intersection 2 are trivially empty.]
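Because neighbors that fall in different column tiles can never match, an intersection splits into independent per-tile sub-intersections, and any tile in which either list has no entries is trivially empty and can be skipped without reading anything. The sketch below works on a logical per-tile view of two rows; `splitIntoTiles`, `tiledIntersect`, and `TiledResult` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Splits a sorted neighbor list into per-column-tile segments of width T
// (a logical view of one row of a tiled adjacency matrix with n columns).
std::vector<std::vector<int>> splitIntoTiles(const std::vector<int>& row, int n, int T) {
    assert(T > 0);
    std::vector<std::vector<int>> segs((n + T - 1) / T);
    for (int c : row) segs[c / T].push_back(c);
    return segs;
}

struct TiledResult {
    int common;            // |N(u) ∩ N(v)|
    int subIntersections;  // tiles that actually needed a merge
};

// Sums per-tile sub-intersections; tiles where either segment is empty are
// trivially empty and skipped.
TiledResult tiledIntersect(const std::vector<std::vector<int>>& a,
                           const std::vector<std::vector<int>>& b) {
    TiledResult r{0, 0};
    for (std::size_t c = 0; c < a.size() && c < b.size(); ++c) {
        if (a[c].empty() || b[c].empty()) continue;  // trivially empty
        ++r.subIntersections;
        std::vector<int> tmp;
        std::set_intersection(a[c].begin(), a[c].end(), b[c].begin(), b[c].end(),
                              std::back_inserter(tmp));
        r.common += static_cast<int>(tmp.size());
    }
    return r;
}
```

With the example graph and T = 4, intersecting N(0) = {1, 5, 6, 7} with N(5) = {0, 1, 2, 3} performs one sub-intersection (columns 0-3) and skips columns 4-7, where vertex 5 has no neighbors.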

Benefits of Tiling
[Figure: Speedup of Tiling (0.8 to 1.8) vs. Average Vertex Degree (1 to 32), k = 3, shown next to the intersection-with-tiling diagram. Above 1: tiling is faster; below 1: no tiling is faster.]

Parallelizing Intersections
[Figure: A tiled intersection split into sub-intersection 1 and sub-intersection 2, next to a plot of Speedup of Parallelizing Intersections (0.6 to 1.3) vs. Number of Edges (10 to 10^9), k = 3. Above 1: parallelization is faster; below 1: no parallelization is faster.]
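One plausible way to parallelize intersections, assumed here rather than taken from the slides, is to launch one thread per (edge, column tile) pair, so that each thread performs a single sub-intersection and adds its partial count to the edge's support. The sketch emulates that flattened thread index space with a sequential loop; `std::lower_bound` stands in for the per-tile offsets a tiled CSR would store in srcPtr, and `supportPerEdgeTileParallel` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Each loop iteration plays the role of one GPU thread handling one
// (edge, column tile) pair. Several threads update the same edge, so a real
// GPU version would use an atomic add on support[e].
std::vector<int> supportPerEdgeTileParallel(int n, int T,
                                            const std::vector<std::vector<int>>& adj,  // sorted
                                            const std::vector<std::pair<int, int>>& edges) {
    assert(T > 0);
    int tiles = (n + T - 1) / T;
    std::vector<int> support(edges.size(), 0);
    long long workItems = static_cast<long long>(edges.size()) * tiles;
    for (long long tid = 0; tid < workItems; ++tid) {
        std::size_t e = static_cast<std::size_t>(tid / tiles);
        int c = static_cast<int>(tid % tiles);
        auto [u, v] = edges[e];
        // Restrict both neighbor lists to column tile c: [c*T, (c+1)*T).
        auto ua = std::lower_bound(adj[u].begin(), adj[u].end(), c * T);
        auto ub = std::lower_bound(adj[u].begin(), adj[u].end(), (c + 1) * T);
        auto va = std::lower_bound(adj[v].begin(), adj[v].end(), c * T);
        auto vb = std::lower_bound(adj[v].begin(), adj[v].end(), (c + 1) * T);
        std::vector<int> common;
        std::set_intersection(ua, ub, va, vb, std::back_inserter(common));
        support[e] += static_cast<int>(common.size());  // partial count for this tile
    }
    return support;
}
```

The result does not depend on T, since the per-tile counts always sum to the full intersection; T only changes how the work is split, which is the knob whose benefit the plot shows varying with graph size.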

Removing Deleted Edges Intermediately
• Mark deleted edges: weak edges stay in srcPtr/dstIdx and are marked as deleted (x).
  Advantage: no overhead to remove edges.
• Remove deleted edges (for select iterations): weak edges are removed from srcPtr/dstIdx.
  Advantage: shorter intersections.
[Figure: Speedup of Removing Deleted Edges Intermediately (0.4 to 2.2) vs. Number of Edges (10 to 10^9), k = k_max. Above 1: removing deleted edges intermediately is faster; below 1: not removing them is faster.]
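Removing deleted edges amounts to compacting srcPtr/dstIdx. A common way to do this in parallel is count, prefix-sum, scatter; the sequential sketch below follows that pattern, assuming deletions are recorded as one flag per dstIdx entry. `CSR` and `removeDeleted` are hypothetical names, and the framework may implement removal differently.

```cpp
#include <cassert>
#include <vector>

struct CSR {
    std::vector<int> srcPtr, dstIdx;
};

// "Remove" strategy: physically drops flagged entries so later intersections
// are shorter. The "mark" strategy would keep g unchanged and skip flagged
// entries during intersections instead.
CSR removeDeleted(const CSR& g, const std::vector<bool>& deleted) {
    assert(deleted.size() == g.dstIdx.size());
    int n = static_cast<int>(g.srcPtr.size()) - 1;
    CSR out;
    out.srcPtr.assign(n + 1, 0);
    // 1) Count surviving entries per row (stored one slot to the right).
    for (int r = 0; r < n; ++r)
        for (int i = g.srcPtr[r]; i < g.srcPtr[r + 1]; ++i)
            if (!deleted[i]) ++out.srcPtr[r + 1];
    // 2) Prefix sum turns the counts into the new row offsets.
    for (int r = 0; r < n; ++r) out.srcPtr[r + 1] += out.srcPtr[r];
    // 3) Scatter surviving entries to their new positions.
    out.dstIdx.resize(out.srcPtr[n]);
    for (int r = 0; r < n; ++r) {
        int pos = out.srcPtr[r];
        for (int i = g.srcPtr[r]; i < g.srcPtr[r + 1]; ++i)
            if (!deleted[i]) out.dstIdx[pos++] = g.dstIdx[i];
    }
    return out;
}
```

The compaction itself costs a full pass over the arrays, which is the overhead the "mark" strategy avoids; doing it only on select iterations trades that cost against shorter intersections afterward.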

Recomputing Support for All or Affected Edges
[Figure: Undirected and directed example graphs (vertices 0-5) with edges grouped into: edges that are not affected and whose threads do not need to recount; edges that are not affected but whose threads need to recount on behalf of affected edges; edges that are affected and whose threads need to recount; weak edges that were deleted.]
Graphs performing better with only affected edges reprocessed:
• graph500-scale20-ef16
• graph500-scale21-ef16
• graph500-scale23-ef16
For further investigation:
• Recomputing for affected edges on select iterations (later iterations)

Marking Affected Edges
Edge categories (example graph with vertices 0-5): edges that are not affected and whose threads do not need to recount; edges that are not affected but whose threads need to recount on behalf of affected edges; edges that are affected and whose threads need to recount; weak edges that were deleted.

Pseudocode for marking affected edges:
parallel for e = {u, v} ∈ E do
    if e is deleted then
        mark u as affected, mark v as affected
parallel for e = {u, v} ∈ E do
    if e is not deleted and (u is affected or v is affected) then
        mark e as affected
        if u is not affected then mark u as needs to recount
        else if v is not affected then mark v as needs to recount
parallel for e = {u, v} ∈ E do
    if e is not deleted and e is not affected then
        if u needs to recount or v needs to recount then
            mark e as needs to recount
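A sequential C++ transcription of the three passes of the marking pseudocode, assuming edges are stored as an edge list with one deleted flag per edge. The `Marks` struct and `markAffected` are hypothetical names. Each pass reads only flags written by earlier passes, and concurrent writes within a pass only set a flag to true, which is why every pass can run as one parallel loop over edges.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct Marks {
    std::vector<bool> vertexAffected, vertexRecount;  // per vertex
    std::vector<bool> edgeAffected, edgeRecount;      // per edge
};

// Note: std::vector<bool> is bit-packed, so a truly parallel version would use
// byte or word flags to avoid races between neighboring bits.
Marks markAffected(int n, const std::vector<std::pair<int, int>>& edges,
                   const std::vector<bool>& deleted) {
    assert(deleted.size() == edges.size());
    Marks m{std::vector<bool>(n), std::vector<bool>(n),
            std::vector<bool>(edges.size()), std::vector<bool>(edges.size())};
    // Pass 1: both endpoints of a deleted edge are affected.
    for (std::size_t e = 0; e < edges.size(); ++e)
        if (deleted[e]) {
            m.vertexAffected[edges[e].first] = true;
            m.vertexAffected[edges[e].second] = true;
        }
    // Pass 2: surviving edges touching an affected vertex are affected; the
    // unaffected endpoint (if any) must recount on their behalf.
    for (std::size_t e = 0; e < edges.size(); ++e) {
        auto [u, v] = edges[e];
        if (!deleted[e] && (m.vertexAffected[u] || m.vertexAffected[v])) {
            m.edgeAffected[e] = true;
            if (!m.vertexAffected[u]) m.vertexRecount[u] = true;
            else if (!m.vertexAffected[v]) m.vertexRecount[v] = true;
        }
    }
    // Pass 3: surviving, unaffected edges at a recounting vertex recount too.
    for (std::size_t e = 0; e < edges.size(); ++e) {
        auto [u, v] = edges[e];
        if (!deleted[e] && !m.edgeAffected[e] && (m.vertexRecount[u] || m.vertexRecount[v]))
            m.edgeRecount[e] = true;
    }
    return m;
}
```

For edges {0,1} (deleted), {1,2}, {2,3}, {3,4}, {0,5}: vertices 0 and 1 become affected, edges {1,2} and {0,5} are marked affected, vertices 2 and 5 need to recount, and edge {2,3} is marked as needing to recount while {3,4} is left alone.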

Comparison with Prior Champions
[Figure: Speedup over the 2018 champions (Bisson & Fatica) (0.25 to 8) vs. Number of Edges (10 to 10^9), k = 3.]
