
PerfDebug: Performance Debugging of Computation Skew in Dataflow Systems

Jason Teoh, Muhammad Ali Gulzar, Harry Xu, Miryung Kim. University of California, Los Angeles.


  1. PerfDebug: Performance Debugging of Computation Skew in Dataflow Systems. Jason Teoh, Muhammad Ali Gulzar, Harry Xu, Miryung Kim. University of California, Los Angeles.

  2. Motivating Example. Diagram labels: Web Server, Server Logs, Cron, Anomaly Detection. Day 1: 20 GB.

  3. Motivating Example. Diagram labels: Web Server, Server Logs, Cron, Anomaly Detection. Day 1 (20 GB): execution time 28 s.

  4. Motivating Example. Diagram labels: Web Server, Server Logs, Cron, Anomaly Detection. Day 1 (20 GB): execution time 28 s. Day 2 (20 GB): execution time 25 s.

  5. Motivating Example. Diagram labels: Web Server, Server Logs, Cron, Anomaly Detection. Day 1 (20 GB): execution time 28 s. Day 2 (20 GB): execution time 25 s. Day 3 (20 GB): execution time 92 s.


  7. Motivating Example. Diagram labels: Web Server, Server Logs, Cron, Anomaly Detection. Day 1 (20 GB): execution time 28 s. Day 2 (20 GB): execution time 25 s. Day 3 (20 GB): execution time 92 s. Question posed: "Why does my job run slowly for day 3’s data?"

  8. Data Skew in Distributed Processing. Diagram labels: Worker1, Worker2, Worker3. Uneven distribution of data across partitions, tasks, or workers can lead to performance delays.

  9. Computation Skew. User-defined function: commonDefs = { “Hello World”: ..., “Big Data”: ..., “Debugging”: ..., ... }; if (commonDefs.contains(term)) { return commonDefs.get(term) } else { r = new RedisClient(…); return r.get(term) }. Latency per term: Hello World 2 ms, Big Data 1 ms, Debugging 3 ms, PerfDebug 442 ms. Uneven distribution of computation due to interactions between data and application code.
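
The lookup on slide 9 can be sketched in Python to show how a single record can dominate latency. This is a minimal illustration, not the authors' code: `slow_remote_get` is a hypothetical stand-in for the Redis call, the `time.sleep` delay is invented, and the dictionary values are placeholders because the slide elides the real definitions.

```python
import time

# Placeholder definitions; the slide elides the real values.
COMMON_DEFS = {
    "Hello World": "definition of Hello World",
    "Big Data": "definition of Big Data",
    "Debugging": "definition of Debugging",
}


def slow_remote_get(term):
    """Hypothetical stand-in for the remote lookup on the slide (r.get(term))."""
    time.sleep(0.05)  # assumed delay, chosen only to make the skew visible
    return f"remote definition of {term}"


def lookup(term):
    """Fast path for common terms, slow path for everything else."""
    if term in COMMON_DEFS:
        return COMMON_DEFS[term]
    return slow_remote_get(term)


def timed_lookup(term):
    """Return (result, elapsed seconds) for one record."""
    start = time.perf_counter()
    result = lookup(term)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for term in ["Hello World", "Big Data", "Debugging", "PerfDebug"]:
        _, elapsed = timed_lookup(term)
        print(f"{term}: {elapsed * 1000:.2f} ms")
```

All four records are the same size, so partition sizes would look balanced; only the code path taken for "PerfDebug" differs, which is what separates computation skew from data skew.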

  10. Computation Skew: Why is it challenging? • Requires insight into how application code interacts with data. • Occurs across multiple stages. • Affected applications are inherently expensive to run. • Isolating individual records that impact performance is difficult with existing tools.

  11. Performance Debugging of Computation Skew. Input: Spark program, input data. Output: individual records responsible for computation skew. PerfDebug phases: Computation Skew Detection; Data Provenance + Record-Level Latency; Expensive Record Identification.

  12. PerfDebug Approach. Input: Spark program, input data. Output: individual records responsible for computation skew. PerfDebug phases: Computation Skew Detection; Data Provenance + Record-Level Latency; Expensive Record Identification.

  13. Computation Skew Detection. • PerfDebug monitors task-level metrics such as latency, garbage collection, and serialization using the SparkListener API. • If potential computation skew is found, PerfDebug reruns the user program in debugging mode to collect additional information.
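
One way to picture the detection step is to compare each task's metrics against the rest of its stage. The rule below (flag tasks whose latency exceeds a multiple of the stage median) is an assumption made for illustration; the slides do not state PerfDebug's actual criterion. `TaskMetrics` and `find_skewed_tasks` are hypothetical names and the numbers are invented; in a real job the values would be collected through a SparkListener.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskMetrics:
    """Per-task numbers of the kind the slide lists (hypothetical container)."""
    task_id: int
    latency_ms: float
    gc_ms: float
    serialization_ms: float


def find_skewed_tasks(tasks, factor=3.0):
    """Return tasks whose latency exceeds factor x the stage median.

    The threshold rule is an assumption, not PerfDebug's documented criterion.
    """
    if not tasks:
        return []
    typical = median(t.latency_ms for t in tasks)
    return [t for t in tasks if t.latency_ms > factor * typical]


if __name__ == "__main__":
    stage = [
        TaskMetrics(0, 900.0, 40.0, 10.0),
        TaskMetrics(1, 1000.0, 35.0, 12.0),
        TaskMetrics(2, 950.0, 42.0, 11.0),
        TaskMetrics(3, 9200.0, 600.0, 15.0),  # invented straggler
    ]
    for t in find_skewed_tasks(stage):
        print(f"task {t.task_id} looks skewed: {t.latency_ms} ms")
```

A median-based threshold is used here only because it is not pulled upward by the straggler itself; any other outlier test could be substituted.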


  15. Capture Data Provenance. Stage 1 operators: lines, map, reduceByKey (map-side). Stage 2 operators: reduceByKey (reduce-side), map. Titian [VLDB 2016] provides data provenance using provenance tables at the start/end of stages to track input-output record mappings.

  16. Capture Data Provenance. Stage 1 operators: lines, map, reduceByKey (map-side). Stage 2 operators: reduceByKey (reduce-side), map. Provenance tables (Input ID → Output ID): start of Stage 1: offset1 → id1; offset2 → id2; offset3 → id3; … End of Stage 1: {id1, id3} → (0, 100); {id2} → (0, 200); … Start of Stage 2: (0, 100) → 100; (1, 100) → 100; (0, 200) → 200; … End of Stage 2: 100 → output1; 200 → output2; … Titian [VLDB 2016] provides data provenance using provenance tables at the start/end of stages to track input-output record mappings.
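
Backward tracing over the provenance tables on slide 16 can be sketched as a chain of lookups from an output back to input offsets. The dictionaries below copy only the rows visible on the slide (elided rows stay elided), and `trace_back` is a hypothetical helper, not Titian's API.

```python
# Each table maps Output ID -> list of Input IDs (the reverse of the slide's
# Input ID -> Output ID columns), so lookups walk from outputs toward inputs.
STAGE2_END = {"output1": [100], "output2": [200]}
STAGE2_START = {100: [(0, 100), (1, 100)], 200: [(0, 200)]}
STAGE1_END = {(0, 100): ["id1", "id3"], (0, 200): ["id2"]}
STAGE1_START = {"id1": ["offset1"], "id2": ["offset2"], "id3": ["offset3"]}

# Ordered from the final output back to the job input.
TABLES = [STAGE2_END, STAGE2_START, STAGE1_END, STAGE1_START]


def trace_back(output_id, tables=TABLES):
    """Return the input offsets that contributed to output_id.

    IDs missing from a table (rows the slide elides) are dropped.
    """
    frontier = [output_id]
    for table in tables:
        next_frontier = []
        for record_id in frontier:
            next_frontier.extend(table.get(record_id, []))
        frontier = next_frontier
    return sorted(frontier)


if __name__ == "__main__":
    print(trace_back("output1"))
    print(trace_back("output2"))
```

With the visible rows, `output1` traces back to `offset1` and `offset3`. The row `(1, 100)` also feeds `output1`, but its Stage 1 entries are not shown on the slide, so the sketch cannot follow it further.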


  21. Measure UDF Latency. The Stage 1 and Stage 2 provenance tables are unchanged from slide 16. PerfDebug extends Titian by capturing summed UDF execution times.

  22. Measure UDF Latency. Two measured latencies, 7 ms and 3 ms, are shown in Stage 1; the provenance tables are unchanged from slide 16. PerfDebug extends Titian by capturing summed UDF execution times.

  23. Measure UDF Latency. The end-of-Stage-1 table gains a UDF Latency column (Input ID → Output ID: UDF Latency): {id1, id3} → (0, 100): 7 + 3 = 10 ms; {id2} → (0, 200); … The other tables are unchanged from slide 16. PerfDebug extends Titian by capturing summed UDF execution times.

  24. Measure UDF Latency. Provenance tables with UDF latency (Input ID → Output ID: UDF Latency): start of Stage 1: offset1 → id1; offset2 → id2; offset3 → id3; … End of Stage 1: {id1, id3} → (0, 100): 10 ms; {id2} → (0, 200): 20 ms; … Start of Stage 2: (0, 100) → 100; (1, 100) → 100; (0, 200) → 200; … End of Stage 2: 100 → output1: 30 ms; 200 → output2: 40 ms; … PerfDebug extends Titian by capturing summed UDF execution times.
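
The summed UDF latency column can be approximated by timing each UDF call and accumulating the elapsed time per record. The sketch below is an assumption about the general idea, not PerfDebug's instrumentation: `run_stage` and its injectable `clock` parameter are hypothetical, the example UDFs are placeholders, and aggregation across records (as in reduceByKey) is not modeled.

```python
import time


def run_stage(records, udfs, clock=time.perf_counter):
    """Apply udfs in order to each record and sum the time spent in them.

    Returns a list of (input_record, output_value, summed_latency) tuples,
    with latency in whatever unit the clock reports (seconds by default).
    """
    rows = []
    for record in records:
        value = record
        total = 0.0
        for udf in udfs:
            start = clock()
            value = udf(value)
            total += clock() - start
        rows.append((record, value, total))
    return rows


if __name__ == "__main__":
    # Placeholder UDFs standing in for the user-defined functions of a stage.
    udfs = [str.strip, str.upper]
    for record, value, latency in run_stage(["  a ", "b"], udfs):
        print(f"{record!r} -> {value!r}: {latency * 1e3:.4f} ms")
```

Passing a fake clock makes the accumulation easy to check: two UDF calls that take 7 and 3 time units yield a summed latency of 10, mirroring the 7 + 3 = 10 ms row on slide 23.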
