SLIDE 1

Coordinating More Than 3 Million CUDA Threads for Social Network Analysis

Adam McLaughlin

SLIDE 2

Applications of interest…

  • Computational biology
  • Social network analysis
  • Urban planning
  • Epidemiology
  • Hardware verification


SLIDE 3

Applications of interest…

  • Computational biology
  • Social network analysis
  • Urban planning
  • Epidemiology
  • Hardware verification
  • Common denominator:

Graph Analysis


SLIDE 4

Challenges in Network Analysis

  • Size
    – Networks cannot be manually inspected
  • Varying structural properties
    – Small-world, scale-free, meshes, road networks
  • Not a one-size-fits-all problem
  • Unpredictable
    – Data-dependent memory access patterns


SLIDE 5

Betweenness Centrality

  • Determines the importance of a vertex in a network
    – Requires the solution of the all-pairs shortest paths (APSP) problem
  • Applications are manifold
  • Computationally demanding
    – O(mn) time complexity for a graph with n vertices and m edges


SLIDE 6

Defining Betweenness Centrality

  • Formally, the BC score of a vertex is defined as:

$$BC(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

  • $\sigma_{st}$ is the number of shortest paths from $s$ to $t$
  • $\sigma_{st}(v)$ is the number of those paths passing through $v$

[Figure: example graph in which $\sigma_{st} = 2$ and $\sigma_{st}(v) = 1$]
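Reading the figure's annotation through the definition: of the two shortest paths between $s$ and $t$, exactly one passes through $v$, so this single pair contributes

    \frac{\sigma_{st}(v)}{\sigma_{st}} = \frac{1}{2}

to $BC(v)$; the full score sums this ratio over every pair of distinct vertices.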

SLIDE 7

Brandes’s Algorithm

  1. Shortest path calculation (downward)
  2. Dependency accumulation (upward)
    – Dependency:

$$\delta_s(v) = \sum_{w \in \mathrm{succ}(v)} \frac{\sigma_{sv}}{\sigma_{sw}} \bigl(1 + \delta_s(w)\bigr)$$

    – Redefine BC scores as:

$$BC(v) = \sum_{s \neq v} \delta_s(v)$$
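For reference before the GPU strategies, a minimal sequential sketch of the two phases over a CSR graph (R row offsets, C adjacencies, both directions stored for an undirected graph); names and layout are illustrative, not the talk's code:

    #include <queue>
    #include <vector>

    // Brandes's algorithm for a single source s, accumulating into bc.
    // Phase 1 (downward): BFS computes depths d and path counts sigma.
    // Phase 2 (upward): walk vertices deepest-first, applying the
    // dependency recurrence, then add delta into the BC scores.
    void brandes_one_source(int s, const std::vector<int>& R,
                            const std::vector<int>& C,
                            std::vector<double>& bc)
    {
        int n = (int)R.size() - 1;
        std::vector<int> d(n, -1), sigma(n, 0), order;
        std::vector<double> delta(n, 0.0);
        std::queue<int> q;
        d[s] = 0; sigma[s] = 1; q.push(s);
        while (!q.empty()) {                        // 1. shortest paths
            int v = q.front(); q.pop(); order.push_back(v);
            for (int k = R[v]; k < R[v + 1]; ++k) {
                int w = C[k];
                if (d[w] < 0) { d[w] = d[v] + 1; q.push(w); }
                if (d[w] == d[v] + 1) sigma[w] += sigma[v];
            }
        }
        for (int i = (int)order.size() - 1; i >= 0; --i) {  // 2. dependencies
            int v = order[i];
            for (int k = R[v]; k < R[v + 1]; ++k) {
                int w = C[k];
                if (d[w] == d[v] + 1)               // w is a successor of v
                    delta[v] += (double)sigma[v] / sigma[w] * (1.0 + delta[w]);
            }
            if (v != s) bc[v] += delta[v];          // BC(v) = sum of deltas
        }
    }

Running this once per source gives the O(mn) total; the rest of the deck is about making each of these n traversals parallel.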


SLIDE 8

Prior GPU Implementations

  • Vertex and Edge Parallelism [Jia et al. (2011)]
    – Same coarse-grained strategy
    – Edge-parallel approach better utilizes the GPU
  • GPU-FAN [Shi and Zhang (2011)]
    – Reported 11-19% speedup over Jia et al.
    – Results were limited in scope
    – Devotes the entire GPU to fine-grained parallelism
  • Both use large O(m) or O(n²) predecessor arrays
    – Our approach: eliminate this array
  • Both use O(n² + m) graph traversals
    – Our approach: trade off memory bandwidth and excess work

SLIDE 9

Coarse-grained Parallelization Strategy


SLIDE 10

Fine-grained Parallelization Strategy

  • Edge-parallel downward traversal

[Figure: BFS levels d = 0 through d = 4; frontier (d = 0) highlighted]

  • Threads are assigned to each edge
    – Only a subset is active
  • Balanced amount of work per thread (see the kernel sketch below)
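A minimal CUDA sketch of one edge-parallel level, assuming a COO edge list (F[e] → C[e]), a depth array d initialized to -1 everywhere except d[source] = 0, and a host loop that resets done to true and relaunches with depth + 1 until done stays true. All names are illustrative, not the talk's actual code:

    // One thread per edge; a thread does work only if its edge leaves the
    // current BFS level. Every level touches all m edges, which trades
    // extra work for coalesced, well-balanced memory traffic.
    __global__ void edge_parallel_step(const int *F, const int *C, int m,
                                       int *d, unsigned long long *sigma,
                                       int depth, bool *done)
    {
        int e = blockIdx.x * blockDim.x + threadIdx.x;
        if (e >= m) return;
        int u = F[e], w = C[e];             // edge u -> w
        if (d[u] != depth) return;          // only frontier edges are active
        if (d[w] == -1) {                   // w discovered: benign race,
            d[w] = depth + 1;               // all writers store depth + 1
            *done = false;
        }
        if (d[w] == depth + 1)              // w is a successor of u
            atomicAdd(&sigma[w], sigma[u]); // count shortest paths into w
    }
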
SLIDE 11

Fine-grained Parallelization Strategy

  • Edge-parallel downward traversal

[Figure: as above; frontier now at d = 1]

SLIDE 12

Fine-grained Parallelization Strategy

  • Edge-parallel downward traversal

[Figure: as above; frontier now at d = 2]

SLIDE 13

Fine-grained Parallelization Strategy

  • Edge-parallel downward traversal

[Figure: as above; frontier now at d = 3]

SLIDE 14

Fine-grained Parallelization Strategy

  • Edge-parallel downward traversal

[Figure: as above; frontier now at d = 4]

SLIDE 15

Fine-grained Parallelization Strategy

  • Work-efficient downward traversal

[Figure: BFS levels d = 0 through d = 4; frontier (d = 0) highlighted]

  • Threads are assigned to vertices in the frontier
    – Use an explicit queue
  • Variable number of edges to traverse per thread (see the kernel sketch below)
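A minimal CUDA sketch of the same level in the work-efficient style, assuming a CSR graph (R offsets, C adjacencies) and an explicit queue Q holding the qlen frontier vertices, with Qnext/qnext_len collecting the next frontier; names are illustrative:

    // One thread per frontier vertex; each thread scans its own adjacency
    // list, so only useful edges are touched but per-thread work varies.
    __global__ void work_efficient_step(const int *R, const int *C,
                                        const int *Q, int qlen,
                                        int *d, unsigned long long *sigma,
                                        int depth, int *Qnext, int *qnext_len)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= qlen) return;
        int v = Q[i];                              // a frontier vertex
        for (int k = R[v]; k < R[v + 1]; ++k) {    // variable work per thread
            int w = C[k];
            if (atomicCAS(&d[w], -1, depth + 1) == -1) {
                int slot = atomicAdd(qnext_len, 1); // claim w exactly once
                Qnext[slot] = w;                    // and enqueue it
            }
            if (d[w] == depth + 1)
                atomicAdd(&sigma[w], sigma[v]);
        }
    }

The atomicCAS claims each newly discovered vertex exactly once, so only edges incident to the frontier are ever touched; the price is that a thread's work grows with its vertex's degree.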

SLIDE 16

Fine-grained Parallelization Strategy

  • Work-efficient downward traversal

[Figure: as above; frontier now at d = 1]

SLIDE 17

Fine-grained Parallelization Strategy

  • Work-efficient downward traversal

[Figure: as above; frontier now at d = 2]

SLIDE 18

Fine-grained Parallelization Strategy

  • Work-efficient downward traversal

[Figure: as above; frontier now at d = 3]

SLIDE 19

Fine-grained Parallelization Strategy

  • Work-efficient downward traversal

[Figure: as above; frontier now at d = 4]

SLIDE 20

Motivation for Hybrid Methods

  • No one method of parallelization works best


  • High diameter: only do useful work
  • Low diameter: leverage memory bandwidth (a dispatch sketch follows)
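A host-side sketch of the dispatch this motivates, keyed on the sampled diameter estimate from the next slide; the log-based threshold here is a placeholder, not the talk's tuned value:

    #include <cmath>

    // High diameter => small frontiers => the queue-based method does only
    // useful work. Low diameter => huge frontiers => the edge-parallel
    // method converts spare bandwidth into speed. Threshold illustrative.
    bool use_work_efficient(int est_diameter, int n)
    {
        return est_diameter > 4 * (int)std::log2((double)n);
    }
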
SLIDE 21

Sampling Approach

  • Idea: processing one source vertex takes O(m + n) time
    – Can process a small sample of vertices fast!
  • Estimate the diameter of the graph's connected components
    – Store the maximum BFS distance found from each of the first k vertices
    – diameter ≈ median(distances)
  • Completes useful work rather than preprocessing the graph! (A sketch follows.)
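A host-side sketch of the estimator; bfs_max_depth is a hypothetical helper standing in for a device BFS that returns the deepest level reached from a source:

    #include <algorithm>
    #include <vector>

    int bfs_max_depth(int s);  // hypothetical: device BFS, deepest level

    // Run BFS from the first k sources (each O(m + n) work), keep the
    // maximum distance each one finds, and estimate the diameter as the
    // median of those values.
    int estimate_diameter(int k, int n)
    {
        std::vector<int> depths;
        for (int s = 0; s < k && s < n; ++s)
            depths.push_back(bfs_max_depth(s));
        std::sort(depths.begin(), depths.end());
        return depths[depths.size() / 2];  // diameter ≈ median(distances)
    }

Each sample is itself a full shortest-path pass, so its σ and δ values feed straight into the BC scores rather than being thrown away as preprocessing.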


SLIDE 22

Experimental Setup

  • Single-node
    – CPU (4 cores)
      • Intel Core i7-2600K
      • 3.4 GHz, 8 MB cache
    – GPU
      • NVIDIA GeForce GTX Titan
      • 14 SMs, 837 MHz, 6 GB GDDR5
      • Compute Capability 3.5
  • Multi-node (KIDS)
    – CPUs (2 x 4 cores)
      • Intel Xeon X5560
      • 2.8 GHz, 8 MB cache
    – GPUs (3)
      • NVIDIA Tesla M2090
      • 16 SMs, 1.3 GHz, 6 GB GDDR5
      • Compute Capability 2.0
    – InfiniBand QDR network
  • All times are reported in seconds
SLIDE 23

Benchmark Data Sets

Name             | Vertices  | Edges      | Diam. | Significance
-----------------|-----------|------------|-------|----------------------
af_shell9        | 504,855   | 8,542,010  | 497   | Sheet metal forming
caidaRouterLevel | 192,244   | 609,066    | 25    | Internet router level
cnr-2000         | 325,527   | 2,738,969  | 33    | Web crawl
com-amazon       | 334,863   | 925,872    | 46    | Product co-purchasing
delaunay_n20     | 1,048,576 | 3,145,686  | 444   | Random triangulation
kron_g500-logn20 | 524,288   | 21,780,787 | 6     | Kronecker graph
loc-gowalla      | 196,591   | 1,900,654  | 15    | Geosocial
luxembourg.osm   | 114,599   | 119,666    | 1,336 | Road network
rgg_n_2_20       | 1,048,576 | 6,891,620  | 864   | Random geometric
smallworld       | 100,000   | 499,998    | 9     | Logarithmic diameter

SLIDE 24

Scaling Results (rgg)

  • Random geometric graphs
  • Sampling beats GPU-FAN by 12x at all scales

SLIDE 25

Scaling Results (rgg)

  • Random geometric graphs
  • Sampling beats GPU-FAN by 12x at all scales
  • Similar amount of time to process a graph 4x as large!

SLIDE 26

Scaling Results (Delaunay)

  • Sparse meshes
  • Speedup grows with graph scale

SLIDE 27

Scaling Results (Delaunay)

  • Sparse meshes
  • Speedup grows with graph scale
  • When edge-parallel is best, it's best by a matter of ms

SLIDE 28

Scaling Results (Delaunay)

  • Sparse meshes
  • Speedup grows with graph scale
  • When edge-parallel is best, it's best by a matter of ms
  • When sampling is best, it's best by a matter of days
SLIDE 29

Benchmark Results

  • Road networks and meshes see ~10x improvement
    – af_shell: 2.5 days → 5 hours
  • Modest improvements otherwise
  • 2.71x average speedup

SLIDE 30

Multi-GPU Results

  • Linear speedups when graphs are sufficiently large
  • 10+ GTEPS for 192 GPUs
  • Scaling isn't unique to graph structure
    – Abundant coarse-grained parallelism

SLIDE 31

A Back of the Envelope Calculation…


  • 192 Tesla M2090 GPUs
  • 16 Streaming Multiprocessors per GPU
  • Maximum of 1024 Threads per Block
  • 192 ∗ 16 ∗ 1024 = 3,145,728
  • Over 3 million CUDA Threads!
SLIDE 32

Conclusions

  • Work-efficient approach obtains up to 13x speedup for high-diameter graphs
  • Tradeoff between work-efficiency and DRAM utilization maximizes performance
    – Average speedup is 2.71x across all graphs
  • Our algorithms easily scale to many GPUs
    – Linear scaling on up to 192 GPUs
  • Our results are consistent across network structures


SLIDE 33

Questions?


  • Contact: Adam McLaughlin, Adam27X@gatech.edu
  • Advisor: David A. Bader, bader@cc.gatech.edu
  • Source code:
    – https://github.com/Adam27X/hybrid_BC
    – https://github.com/Adam27X/graph-utils

SLIDE 34

Backup


SLIDE 35

Contributions

  • A work-efficient algorithm for computing Betweenness Centrality on the GPU
    – Works especially well for high-diameter graphs
  • On-line hybrid approaches that coordinate threads based on graph structure
  • An average speedup of 2.71x over the best existing methods
  • A distributed implementation that scales linearly to up to 192 GPUs
  • Results that are performance portable across the gamut of network structures

SLIDE 36

Brandes’s Algorithm

  • Let vertex 1 be the source, $s$

[Figure: example graph with BFS levels d = 0 through d = 4]

  • First, downward traversal from $s$
  • Obtain the number of shortest paths from $s$ to each vertex ($\sigma_{ss} = 1$)

SLIDE 37

Brandes’s Algorithm

  • Downward traversal from $s$

[Figure: example graph with BFS levels d = 0 through d = 4]

  • $\sigma_{17} = \sigma_{15} + \sigma_{16}$
  • $\sigma_{1\cdot} = [1, 1, 1, 1, 1, 1, 2, 2, 2]$

$$\sigma_{sw} = \sum_{v \in \mathrm{pred}(w)} \sigma_{sv}$$

SLIDE 38

Brandes’s Algorithm

  • Upward dependency accumulation toward $s$

[Figure: example graph with BFS levels d = 0 through d = 4]

  • $\delta_{16} = \frac{\sigma_{16}}{\sigma_{17}}(1 + \delta_{17}) + \frac{\sigma_{16}}{\sigma_{18}}(1 + \delta_{18})$
  • $\delta_{1\cdot} = [8, 0, 0, 5, \tfrac{3}{2}, \tfrac{3}{2}, 1, 0, 0]$

$$\delta_s(v) = \sum_{w \in \mathrm{succ}(v)} \frac{\sigma_{sv}}{\sigma_{sw}} \bigl(1 + \delta_s(w)\bigr)$$
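Plugging the downward-pass values ($\sigma_{16} = 1$, $\sigma_{17} = \sigma_{18} = 2$, $\delta_{17} = 1$, $\delta_{18} = 0$) into the recurrence confirms the 3/2 entry:

    \delta_{16} = \frac{\sigma_{16}}{\sigma_{17}}\,(1 + \delta_{17})
                + \frac{\sigma_{16}}{\sigma_{18}}\,(1 + \delta_{18})
                = \frac{1}{2}(1 + 1) + \frac{1}{2}(1 + 0) = \frac{3}{2}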

SLIDE 39

Fine-grained Parallelization Strategy

  • Vertex-parallel downward traversal

[Figure: BFS levels d = 0 through d = 4; frontier (d = 0) highlighted]

  • Threads are assigned to each vertex
    – Only a subset is active
  • Variable number of edges to traverse per thread (see the kernel sketch below)
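A minimal CUDA sketch of one vertex-parallel level over a CSR graph (R offsets, C adjacencies); names are illustrative:

    // One thread per vertex; a thread works only if its vertex sits on the
    // current level, then scans its whole adjacency list, so threads on
    // high-degree vertices straggle (the load imbalance noted above).
    __global__ void vertex_parallel_step(const int *R, const int *C, int n,
                                         int *d, unsigned long long *sigma,
                                         int depth, bool *done)
    {
        int v = blockIdx.x * blockDim.x + threadIdx.x;
        if (v >= n || d[v] != depth) return;   // only frontier vertices act
        for (int k = R[v]; k < R[v + 1]; ++k) {
            int w = C[k];
            if (d[w] == -1) { d[w] = depth + 1; *done = false; }
            if (d[w] == depth + 1)
                atomicAdd(&sigma[w], sigma[v]);
        }
    }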

SLIDE 40

Fine-grained Parallelization Strategy

  • Vertex-parallel downward traversal

[Figure: as above; frontier now at d = 1]

SLIDE 41

Fine-grained Parallelization Strategy

  • Vertex-parallel downward traversal

[Figure: as above; frontier now at d = 2]

SLIDE 42

Fine-grained Parallelization Strategy

  • Vertex-parallel downward traversal

[Figure: as above; frontier now at d = 3]

SLIDE 43

Fine-grained Parallelization Strategy

  • Vertex-parallel downward traversal

[Figure: as above; frontier now at d = 4]