CS 744: Powergraph
Shivaram Venkataraman Fall 2020
ADMINISTRIVIA
- Midterm update tonight
- Course project reminders: email your group number; you can join the corresponding Piazza group
- Discussion slots start from next week
Scalable Storage Systems
Datacenter Architecture
Resource Management
Computational Engines: Machine Learning, SQL, Streaming (e.g., Naiad, Spark Streaming), Graph
Applications
GRAPH DATA
Datasets → Applications
1. Social network ("friend graph") → recommendations, PageRank
2. Internet → web pages connected by links; hosts are connected
3. Paper cites paper, which cites another paper, etc. (citation graphs)
4. Software dependencies (e.g., Spark → Akka, an actor framework)
GRAPH ANALYTICS
Perform computations on graph-structured data Examples PageRank Shortest path Connected components …
(contrast: SQL queries over tabular data)
PREGEL: PROGRAMMING MODEL
Message combiner(Message m1, Message m2):
    return Message(m1.value() + m2.value());

void PregelPageRank(Message msg):
    float total = msg.value();
    vertex.val = 0.15 + 0.85 * total;
    foreach(nbr in out_neighbors):
        SendMsg(nbr, vertex.val / num_out_nbrs);
(1) This vertex gets messages from its neighbors
(2) The combiner coalesces the messages into one
(3) Computation using the combined message, repeated until convergence
(4) Send messages to neighbors
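The four steps above can be sketched as a short loop. This is an illustrative single-machine Python sketch of the Pregel-style synchronous PageRank on the slide, not Pregel's actual API; the graph representation and function name are assumptions.

```python
# Sketch of Pregel-style synchronous PageRank (names are illustrative).
def pregel_pagerank(out_nbrs, num_iters=20):
    """out_nbrs: dict mapping each vertex -> list of its out-neighbors."""
    verts = list(out_nbrs)
    rank = {v: 1.0 for v in verts}
    for _ in range(num_iters):
        # Combiner: sum all messages destined for the same vertex,
        # i.e. combiner(m1, m2) = m1 + m2 as on the slide.
        combined = {v: 0.0 for v in verts}
        for u in verts:
            if out_nbrs[u]:
                share = rank[u] / len(out_nbrs[u])
                for v in out_nbrs[u]:
                    combined[v] += share
        # Compute step: vertex.val = 0.15 + 0.85 * total (as on the slide).
        for v in verts:
            rank[v] = 0.15 + 0.85 * combined[v]
    return rank
```

On a 3-cycle every vertex keeps rank 1.0, since each vertex receives exactly what it sends.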
NATURAL GRAPHS
(1) The degree distribution is skewed: most vertices have small degree, while a few vertices have very high degree
(2) High-degree vertices lead to skew in communication, memory (state), and computation
(3) Hard to partition such graphs
POWERGRAPH
Programming Model: Gather-Apply-Scatter Better Graph Partitioning with vertex cuts Distributed execution (Sync, Async)
GATHER-APPLY-SCATTER
Gather: accumulate info from neighbors
Apply: apply the accumulated value to the vertex
Scatter: update adjacent edges and vertices
// gather_nbrs: IN_NBRS
gather(Du, D(u,v), Dv):
    return Dv.rank / #outNbrs(v)

sum(a, b):
    return a + b

apply(Du, acc):
    rnew = 0.15 + 0.85 * acc
    Du.delta = (rnew - Du.rank) / #outNbrs(u)
    Du.rank = rnew

// scatter_nbrs: OUT_NBRS
scatter(Du, D(u,v), Dv):
    if (|Du.delta| > ε) Activate(v)
    return delta
- gather returns an accumulator value; accumulators can be combined (similar to a reduction in Spark)
- scatter can activate a neighboring vertex, which lets us process only the necessary vertices in the next iteration
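Putting gather, sum, apply, and scatter together, the GAS PageRank on the slide can be sketched as follows. This is a single-machine Python sketch under assumed data structures (`in_nbrs`, `out_degree`), not PowerGraph's distributed implementation.

```python
# Illustrative sketch of Gather-Apply-Scatter PageRank with activation.
def gas_pagerank(in_nbrs, out_degree, eps=1e-3, max_iters=50):
    """in_nbrs: vertex -> list of in-neighbors; out_degree: vertex -> out-degree."""
    rank = {v: 1.0 for v in in_nbrs}
    active = set(in_nbrs)
    for _ in range(max_iters):
        if not active:
            break
        next_active = set()
        for u in sorted(active):
            # Gather: accumulate Dv.rank / #outNbrs(v) over in-neighbors,
            # combining partial accumulators with sum(a, b) = a + b.
            acc = sum(rank[v] / out_degree[v] for v in in_nbrs[u])
            # Apply: update the vertex value and remember the change.
            rnew = 0.15 + 0.85 * acc
            delta = abs(rnew - rank[u])
            rank[u] = rnew
            # Scatter: activate out-neighbors only if the change was significant,
            # so only the necessary vertices run in the next iteration.
            if delta > eps:
                next_active.update(v for v in in_nbrs if u in in_nbrs[v])
        active = next_active
    return rank
```

On a 3-cycle the ranks never change, so the active set empties after one iteration.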
EXECUTION MODEL, CACHING
Active Queue
Delta caching:
- Cache the accumulator value for each vertex
- Optionally, scatter returns a delta
- Accumulate deltas into the cached value
Could run into race conditions when adjacent vertices update shared state, even on a single machine (motivates the sync vs. async execution models).
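The delta-caching idea above can be sketched minimally: keep a cached accumulator per vertex, and fold in deltas returned by scatter instead of re-running a full gather. This is a hypothetical sketch; the class and method names are assumptions, not PowerGraph's API.

```python
# Hypothetical sketch of delta caching: cache the accumulator per vertex
# and accumulate deltas returned by scatter, avoiding a full gather.
class DeltaCache:
    def __init__(self):
        self.acc = {}  # vertex -> cached accumulator value

    def get(self, v, gather_fn):
        # Full gather only on a cache miss.
        if v not in self.acc:
            self.acc[v] = gather_fn(v)
        return self.acc[v]

    def accumulate(self, v, delta):
        # Scatter returned a delta: fold it into the cached accumulator
        # with the same commutative sum used by gather.
        if v in self.acc:
            self.acc[v] += delta

cache = DeltaCache()
calls = []
acc = cache.get('u', lambda v: calls.append(v) or 0.5)   # miss: runs gather
cache.accumulate('u', 0.25)                              # fold in a neighbor's delta
acc2 = cache.get('u', lambda v: calls.append(v) or 0.0)  # hit: no gather
```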
SYNC VS ASYNC
Sync Execution
- Gather for all active vertices, followed by Apply, then Scatter
- Barrier after each minor-step
Async Execution
- Execute active vertices as cores become available
- No barriers! Optionally serializable
In sync execution, G(v) reads neighbor state, A(v) updates the vertex state, and S(v) updates edge state; the barrier after each minor-step ensures that A(v)'s update is visible to G(v) in the next minor-step.
DISTRIBUTED EXECUTION
- Symmetric system, no coordinator
- Load the graph into each machine
- Communicate across machines to spread updates and read state
GRAPH PARTITIONING
Edge cuts: every vertex is placed on a machine; edges might span across machines. Natural graphs have lots of edges across machines.
Vertex cuts: every edge is placed on a machine; vertices might be mirrored across machines. Better balance for natural graphs!
RANDOM, GREEDY, OBLIVIOUS
Three distributed approaches:
- Random placement
- Coordinated greedy placement
- Oblivious greedy placement
- Random: stream through the edges and send each edge to a random machine
- Coordinated greedy: send each edge to a machine that already has its vertices
- Oblivious: greedy, but done in parallel, so you don't have perfect knowledge of the vertex → machine assignment
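The greedy placement heuristic above can be sketched in a few lines. This is an illustrative sketch of greedy vertex-cut edge placement under assumed inputs (an edge stream and a machine count), not PowerGraph's exact heuristic.

```python
# Sketch of greedy vertex-cut placement: send each edge to a machine that
# already holds one of its endpoints, breaking ties by current load.
def greedy_place(edges, num_machines):
    placed = {m: set() for m in range(num_machines)}  # machine -> vertices seen
    load = [0] * num_machines
    assignment = {}
    for (u, v) in edges:  # stream through the edges
        # Machines that already have u or v are preferred candidates.
        candidates = [m for m in range(num_machines)
                      if u in placed[m] or v in placed[m]]
        if not candidates:
            candidates = list(range(num_machines))
        m = min(candidates, key=lambda m: load[m])  # least-loaded candidate
        assignment[(u, v)] = m
        placed[m].update((u, v))
        load[m] += 1
    return assignment
```

Streaming a small graph shows the effect: edges sharing vertices cluster on one machine, while a disconnected edge lands on the least-loaded one.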
OTHER FEATURES
Async serializable engine
- Prevent adjacent vertices from running simultaneously
- Acquire locks for all adjacent vertices
Fault tolerance
- Checkpoint at the end of each super-step (for sync)
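One common way to acquire locks on a vertex and all of its neighbors without deadlock is to grab them in a fixed global order. This is a sketch of that general technique, not PowerGraph's actual locking protocol; all names here are assumptions.

```python
import threading

# Sketch: to run a vertex update without racing with its neighbors,
# acquire locks for the vertex and all adjacent vertices in a fixed
# global order (sorted vertex ids), which avoids deadlock.
locks = {}  # vertex -> threading.Lock

def lock_for(v):
    return locks.setdefault(v, threading.Lock())

def run_serializable(v, nbrs, apply_fn):
    scope = sorted({v, *nbrs})  # global order over the lock scope
    acquired = [lock_for(u) for u in scope]
    for lk in acquired:
        lk.acquire()
    try:
        return apply_fn(v)  # safe: no adjacent vertex can run concurrently
    finally:
        for lk in reversed(acquired):
            lk.release()
```

Since every thread acquires locks in the same sorted order, no cycle of waiting threads can form.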
SUMMARY
- Gather-Apply-Scatter programming model
- Vertex cuts to handle power-law graphs
- Balance computation, minimize communication
DISCUSSION
https://forms.gle/rKB5hcJgT4NQsFgq8
Consider the PageRank implementation in Spark vs. synchronous PageRank in PowerGraph.
- Activate ensures no wasteful computation
- Finer-grained communication in PowerGraph
- Better partitioning!
- Delta caching avoids recomputation
NEXT STEPS
Next class: GraphX
- Partitioning in Spark → co-partitioning across iterations
- PowerGraph has methods to pick which vertices go in a partition