

  1. CS 744: PowerGraph, Shivaram Venkataraman, Fall 2020

  2. ADMINISTRIVIA
     - Midterm update tonight
     - Course project reminders: for discussion groups, email your group number and discussion id; you can join the corresponding Piazza group
     - Extra OH slot: starts from next week

  3. Applications: Machine Learning, SQL, Streaming (Spark Streaming, Naiad), Graph
     Computational Engines
     Scalable Storage Systems
     Resource Management
     Datacenter Architecture

  4. GRAPH DATA
     Datasets and applications:
     1. Social networks: "friend" graphs, friend recommendation
     2. Web pages and the Internet: link graph between pages (PageRank), hosts that are connected
     3. Citations: paper A cites paper B
     4. Actor / movie graphs
     5. Software dependencies: e.g., Spark depends on Akka

  5. GRAPH ANALYTICS
     Perform computations on graph-structured data (rather than queries on tabular data)
     Examples: PageRank, shortest paths, connected components, ...

  6. PREGEL: PROGRAMMING MODEL
     Each vertex receives the combined message from its in-neighbors (a combiner coalesces the
     incoming messages into one), updates its own state, and sends messages to its out-neighbors.
     The computation repeats in supersteps until convergence.

     Message combiner(Message m1, Message m2):
         // coalesce messages destined for the same vertex
         return Message(m1.value() + m2.value());

     void PregelPageRank(Message msg):
         // combined message from the in-neighbors
         float total = msg.value();
         // update the state of this vertex
         vertex.val = 0.15 + 0.85 * total;
         // send messages to the out-neighbors
         foreach(nbr in out_neighbors):
             SendMsg(nbr, vertex.val / num_out_nbrs);
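
     A minimal single-machine sketch of the superstep structure in Python (my own illustration,
     not the paper's code); the toy graph, the fixed iteration count, and the variable names are
     assumptions for illustration only.

     # Pregel-style supersteps for PageRank on a toy graph.
     graph = {0: [1, 2], 1: [2], 2: [0]}            # vertex -> out-neighbors (hypothetical)
     rank = {v: 1.0 for v in graph}

     def combiner(m1, m2):
         # Coalesce messages destined for the same vertex.
         return m1 + m2

     for superstep in range(30):                    # barrier between supersteps
         inbox = {v: 0.0 for v in graph}
         # Each vertex sends rank / out-degree to its out-neighbors.
         for v, nbrs in graph.items():
             for nbr in nbrs:
                 inbox[nbr] = combiner(inbox[nbr], rank[v] / len(nbrs))
         # Each vertex applies the combined message to its local state.
         rank = {v: 0.15 + 0.85 * inbox[v] for v in graph}

     print(rank)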

  7. NATURAL GRAPHS
     The degree distribution of natural graphs is skewed (power-law):
     - most vertices have a very small degree
     - some vertices have a very high degree
     High-degree vertices lead to skew in (1) computation, (2) communication, and (3) memory
     (vertex state), and make such graphs hard to partition.

  8. POWERGRAPH
     - Programming model and execution: Gather-Apply-Scatter
     - Better graph partitioning with vertex cuts
     - Distributed execution (sync, async)

  9. GATHER-APPLY-SCATTER
     Gather: accumulate information from neighbors. gather runs over the gather neighbors
     (IN_NBRS here) and returns a value into an accumulator; sum combines accumulators,
     similar to a reduction in Spark.
     Apply: apply the accumulated value to the vertex state.
     Scatter: update adjacent edges and vertices. Scatter can activate a neighboring vertex,
     which lets the engine process only the necessary vertices in the next iteration, and it
     can optionally return a delta (used for delta caching, next slide).

     // gather_nbrs: IN_NBRS
     gather(Du, D(u,v), Dv):
         return Dv.rank / #outNbrs(v)

     sum(a, b):
         return a + b

     apply(Du, acc):
         rnew = 0.15 + 0.85 * acc
         Du.delta = (rnew - Du.rank) / #outNbrs(u)
         Du.rank = rnew

     // scatter_nbrs: OUT_NBRS
     scatter(Du, D(u,v), Dv):
         // activate the neighbor only if this vertex's rank changed enough
         if (|Du.delta| > ε) Activate(v)
         return delta
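
     To make the three phases concrete, here is a minimal single-machine Python sketch of
     synchronous GAS PageRank (an illustration, not PowerGraph's distributed implementation);
     the toy graph, the activation threshold EPS, and the iteration cap are assumptions.

     out_nbrs = {0: [1, 2], 1: [2], 2: [0]}         # hypothetical toy graph
     in_nbrs  = {0: [2], 1: [0], 2: [0, 1]}
     rank = {v: 1.0 for v in out_nbrs}
     active = set(out_nbrs)
     EPS = 1e-3                                     # assumed activation threshold

     for _ in range(50):
         if not active:
             break
         new_rank = dict(rank)
         next_active = set()
         for u in active:
             # Gather + sum: accumulate rank / out-degree over in-neighbors.
             acc = sum(rank[v] / len(out_nbrs[v]) for v in in_nbrs[u])
             # Apply: update the vertex value.
             new_rank[u] = 0.15 + 0.85 * acc
             # Scatter: activate out-neighbors only if the change is significant.
             if abs(new_rank[u] - rank[u]) > EPS:
                 next_active.update(out_nbrs[u])
         rank, active = new_rank, next_active       # barrier between iterations

     print(rank)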

  10. EXECUTION MODEL, CACHING
     On a single machine, the engine keeps a queue of active vertices and runs gather, apply,
     and scatter for each of them; done carelessly, concurrent vertex programs could run into
     race conditions on shared vertex/edge state.
     Delta caching: cache the accumulator value for each vertex. Scatter optionally returns a
     delta, and deltas are accumulated into the cached accumulator so future gather operations
     can be skipped (deltas can be accumulated sync or async).
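
     A minimal sketch of the delta-caching idea for the PageRank gather (the function names and
     the cache layout are mine, not PowerGraph's API; graph shapes as in the sketches above).

     # vertex -> cached accumulator value
     cache = {}

     def cached_gather(u, in_nbrs, out_nbrs, rank):
         if u not in cache:
             # Full gather only when no cached accumulator exists yet.
             cache[u] = sum(rank[v] / len(out_nbrs[v]) for v in in_nbrs[u])
         return cache[u]

     def scatter_with_delta(u, old_rank, new_rank, out_nbrs):
         # u's contribution to each out-neighbor changed by delta; fold it into the
         # neighbors' cached accumulators instead of recomputing their gathers.
         delta = (new_rank - old_rank) / len(out_nbrs[u])
         for v in out_nbrs[u]:
             if v in cache:
                 cache[v] += delta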

  11. SYNC VS ASYNC
     Sync execution: gather for all active vertices, then apply, then scatter, with a barrier
     after each minor-step. The barrier ensures that vertex and mirror state updates only become
     visible in the next step.
     Async execution: execute active vertices as cores become available; no barriers, so updates
     are visible to neighbors immediately; optionally serializable.
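
     To contrast the two modes, a small sketch (my own illustration, not the paper's engine) in
     which the only difference is whether a vertex's new rank is visible to gathers in the same
     pass (async-style, no barrier) or only after the pass completes (sync-style, barrier).

     def pagerank_pass(rank, in_nbrs, out_nbrs, asynchronous):
         # Sync-style: gathers read a snapshot taken at the barrier, so updates made during
         # this pass only become visible in the next step.
         # Async-style: gathers read the live values, so updates are visible immediately.
         src = rank if asynchronous else dict(rank)
         for u in rank:
             acc = sum(src[v] / len(out_nbrs[v]) for v in in_nbrs[u])
             rank[u] = 0.15 + 0.85 * acc
         return rank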

  12. DISTRIBUTED EXECUTION
     Symmetric system, no coordinator. Load a partition of the graph onto each machine, and
     communicate across machines to spread updates and read remote state.

  13. GRAPH PARTITIONING
     Edge cut: every vertex is placed on a machine; edges might span machines.
     Vertex cut: every edge is placed on a machine; vertices might span machines (mirrors).
     For natural graphs, vertex cuts give better balance, since the many edges of high-degree
     vertices can be spread across machines.
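
     To make the vertex-cut bookkeeping concrete, a small sketch (with an assumed toy edge list)
     that assigns each edge to a machine and then counts how many machines each vertex is
     replicated on, i.e., its mirrors.

     import random
     from collections import defaultdict

     edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]    # hypothetical toy graph
     num_machines = 2

     # Vertex cut: every edge lives on exactly one machine ...
     placement = {e: random.randrange(num_machines) for e in edges}

     # ... so a vertex is replicated on every machine that holds one of its edges.
     replicas = defaultdict(set)
     for (u, v), m in placement.items():
         replicas[u].add(m)
         replicas[v].add(m)

     rep_factor = sum(len(ms) for ms in replicas.values()) / len(replicas)
     print("average replication factor:", rep_factor)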

  14. RANDOM, GREEDY, OBLIVIOUS
     Three distributed approaches to placing edges (streaming through the edge list):
     - Random placement: send each edge to a random machine.
     - Coordinated greedy placement: send an edge to a machine that already has one of its
       vertices (see the sketch below).
     - Oblivious greedy placement: run the greedy rule on each machine in parallel, without
       perfect knowledge of the other machines' placements.
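
     A minimal sketch of the greedy heuristic, assuming a simplified rule (prefer machines that
     already hold an endpoint, break ties by current load); this illustrates the idea rather
     than the paper's exact placement rules.

     from collections import defaultdict

     def greedy_place(edges, num_machines):
         machine_of = {}                                  # edge -> machine
         holds = defaultdict(set)                         # vertex -> machines holding it
         load = [0] * num_machines                        # edges assigned per machine

         for (u, v) in edges:
             # Prefer machines that already hold u or v; otherwise consider all machines.
             candidates = (holds[u] | holds[v]) or set(range(num_machines))
             m = min(candidates, key=lambda c: load[c])   # break ties by load
             machine_of[(u, v)] = m
             holds[u].add(m)
             holds[v].add(m)
             load[m] += 1
         return machine_of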

  15. OTHER FEATURES
     Async serializable engine: prevent adjacent vertex programs from running simultaneously by
     acquiring locks for all adjacent vertices.
     Fault tolerance: for the sync engine, checkpoint at the end of each super-step.
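
     A small sketch of the locking idea for the serializable async engine, assuming locks are
     taken in a fixed (sorted vertex-id) order to avoid deadlock; the ordering rule is an
     assumption for illustration, not necessarily the engine's exact protocol.

     import threading
     from collections import defaultdict

     locks = defaultdict(threading.Lock)      # one lock per vertex

     def run_vertex_program(u, neighbors, program):
         # Acquire locks on u and all adjacent vertices in a fixed order, so adjacent
         # vertex programs can never run at the same time and acquisition cannot deadlock.
         scope = sorted({u, *neighbors[u]})
         for v in scope:
             locks[v].acquire()
         try:
             program(u)
         finally:
             for v in reversed(scope):
                 locks[v].release()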

  16. SUMMARY
     - Gather-Apply-Scatter programming model
     - Vertex cuts to handle power-law graphs
     - Balance computation, minimize communication

  17. DISCUSSION https://forms.gle/rKB5hcJgT4NQsFgq8

  18. Consider the PageRank implementation in Spark vs. synchronous PageRank in PowerGraph.
     What are some reasons why PowerGraph might be faster?
     - Activate ensures no wasteful computation
     - Fine-grained communication in PowerGraph
     - Better graph partitioning
     - Delta caching avoids recomputation

  19. NEXT STEPS
     Next class: GraphX
     - Partitioning in Spark: co-partitioning across iterations
     - PowerGraph has methods to pick which partition vertices go in
