
PerfDebug: Performance Debugging of Computation Skew in Dataflow Systems
Jason Teoh, Muhammad Ali Gulzar, Harry Xu, Miryung Kim
University of California, Los Angeles

Motivating Example
Diagram labels: Web Server, Server Logs, Cron, Anomaly Detection
• Day 1 (20GB): Execution Time: 28 s
• Day 2 (20GB): Execution Time: 25 s
• Day 3 (20GB): Execution Time: 92 s
Why does my job run slowly for day 3’s data?

Data Skew in Distributed Processing
[Figure: Worker1, Worker2, Worker3]
Uneven distribution of data across partitions, tasks, or workers can lead to performance delays.

Computation Skew
User-defined function:

    commonDefs = {
      “Hello World”: ...,
      “Big Data”: ...,
      “Debugging”: ...,
      ...
    }
    if (commonDefs.contains(term)) {
      return commonDefs.get(term)
    } else {
      r = new RedisClient(…)
      return r.get(term)
    }

Term           Latency
Hello World    2 ms
Big Data       1 ms
Debugging      3 ms
PerfDebug      442 ms

Uneven distribution of computation due to interactions between data and application code.
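The effect of the user-defined function can be reproduced with a small sketch. This is an illustration under stated assumptions, not the code from the talk: `COMMON_DEFS`, `slow_remote_lookup`, and `time_records` are hypothetical names, and a 50 ms `time.sleep` stands in for the remote Redis call.

```python
import time

# Hypothetical in-memory table of common definitions (the fast path).
COMMON_DEFS = {"Hello World": "...", "Big Data": "...", "Debugging": "..."}


def slow_remote_lookup(term):
    # Assumption: a sleep simulates the cost of opening a remote
    # connection and fetching the term; no real Redis client is used.
    time.sleep(0.05)
    return f"definition of {term}"


def lookup(term):
    # Same branching as the slide's UDF: cached terms return at once,
    # every other term pays for the remote call.
    if term in COMMON_DEFS:
        return COMMON_DEFS[term]
    return slow_remote_lookup(term)


def time_records(terms):
    # Measure the UDF latency of each record in milliseconds.
    latencies = {}
    for term in terms:
        start = time.perf_counter()
        lookup(term)
        latencies[term] = (time.perf_counter() - start) * 1000.0
    return latencies


latencies = time_records(["Hello World", "Big Data", "Debugging", "PerfDebug"])
slowest = max(latencies, key=latencies.get)
print(slowest)
```

All four records have the same size, so partition sizes look balanced, yet one record dominates the running time because it takes the slow branch.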

Computation Skew
Why is it challenging?
• Requires insight on how application code interacts with data.
• Occurs across multiple stages.
• Affected applications are inherently expensive to run.
• Isolating individual records that impact performance is difficult with existing tools.

Performance Debugging of Computation Skew
Input: Spark program, input data
Output: Individual records responsible for computation skew

PerfDebug Approach
• Computation Skew Detection
• Data Provenance + Record-Level Latency
• Expensive Record Identification

Computation Skew Detection
• PerfDebug monitors task-level metrics such as latency, garbage collection, and serialization using the SparkListener API.
• If potential computation skew is found, PerfDebug reruns the user program in debugging mode to collect additional information.
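The slides do not state which rule turns task-level metrics into a skew signal. The sketch below assumes one simple possibility: flag a stage when any task's latency exceeds a multiple of the stage's median task latency. The function name, the `ratio` threshold, and the metric values are all hypothetical, and the metrics are plain Python data rather than values read from a SparkListener.

```python
from statistics import median


def find_straggler_tasks(task_latency_ms, ratio=3.0):
    """Return IDs of tasks whose latency exceeds ratio x the median latency.

    Assumption: this median-based rule is only an illustration; the source
    does not describe PerfDebug's actual detection criterion.
    """
    typical = median(task_latency_ms.values())
    return sorted(t for t, ms in task_latency_ms.items() if ms > ratio * typical)


# Made-up per-task latencies for one stage, in milliseconds.
stage_metrics = {"task-0": 410.0, "task-1": 395.0, "task-2": 402.0, "task-3": 4100.0}

stragglers = find_straggler_tasks(stage_metrics)
# A non-empty result would trigger the rerun in debugging mode.
needs_debug_rerun = bool(stragglers)
print(stragglers, needs_debug_rerun)
```

A median is used here instead of a mean so that a single very slow task does not raise the baseline it is compared against.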


Capture Data Provenance
Titian [VLDB 2016] provides data provenance using provenance tables at the start/end of stages to track input-output record mappings.

Stage 1: lines → map → reduceByKey (map-side)

  Start of stage
    Input ID      Output ID
    offset1       id1
    offset2       id2
    offset3       id3
    …             …

  End of stage
    Input ID      Output ID
    {id1, id3}    (0, 100)
    {id2}         (0, 200)
    …             …

Stage 2: reduceByKey (reduce-side) → map

  Start of stage
    Input ID      Output ID
    (0, 100)      100
    (1, 100)      100
    (0, 200)      200
    …             …

  End of stage
    Input ID      Output ID
    100           output1
    200           output2
    …             …
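To show how such tables connect an output back to its inputs, here is a minimal backward-tracing sketch over the example rows from the slides. It is not Titian's implementation: the list-of-pairs representation and the names `inputs_of` and `trace_back` are assumptions made for illustration, and rows elided with "…" on the slides are simply absent.

```python
# Each table is a list of (input_id, output_id) rows copied from the slides.
# Grouped inputs such as {id1, id3} are stored as frozensets.
stage1_start = [("offset1", "id1"), ("offset2", "id2"), ("offset3", "id3")]
stage1_end = [(frozenset({"id1", "id3"}), (0, 100)), (frozenset({"id2"}), (0, 200))]
stage2_start = [((0, 100), 100), ((1, 100), 100), ((0, 200), 200)]
stage2_end = [(100, "output1"), (200, "output2")]


def inputs_of(table, output_ids):
    # Collect every input ID whose row produced one of the given output IDs.
    found = set()
    for input_id, output_id in table:
        if output_id in output_ids:
            if isinstance(input_id, frozenset):
                found.update(input_id)  # unpack a group of input records
            else:
                found.add(input_id)
    return found


def trace_back(output_id):
    # Walk the tables from the end of stage 2 to the start of stage 1.
    ids = {output_id}
    for table in (stage2_end, stage2_start, stage1_end, stage1_start):
        ids = inputs_of(table, ids)
    return ids


print(sorted(trace_back("output1")))
print(sorted(trace_back("output2")))
```

The ID (1, 100) has no matching row in the stage 1 tables shown on the slides, so the trace for output1 only reaches the inputs that are listed.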

Measure UDF Latency
PerfDebug extends Titian by capturing summed UDF execution times.

Stage 1: lines → map → reduceByKey (map-side)

  Start of stage
    Input ID      Output ID
    offset1       id1
    offset2       id2
    offset3       id3
    …             …

  End of stage
    Input ID      Output ID    UDF Latency
    {id1, id3}    (0, 100)     10 ms (7 + 3 = 10 ms)
    {id2}         (0, 200)     20 ms
    …             …            …

Stage 2: reduceByKey (reduce-side) → map

  Start of stage
    Input ID      Output ID
    (0, 100)      100
    (1, 100)      100
    (0, 200)      200
    …             …

  End of stage
    Input ID      Output ID    UDF Latency
    100           output1      30 ms
    200           output2      40 ms
    …             …            …
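The summing step (7 + 3 = 10 ms in the slides' example) can be sketched as follows. This is a simplified illustration, not PerfDebug's code: `sum_udf_latency` is a hypothetical helper, and which of id1 and id3 took 7 ms versus 3 ms is an assumption, since the slides show only the two values and their sum.

```python
def sum_udf_latency(groups, record_latency_ms):
    """Sum per-record UDF latencies for each output row of a stage.

    groups maps an output ID to the input record IDs combined into it;
    record_latency_ms maps an input record ID to its measured UDF latency.
    """
    return {
        output_id: sum(record_latency_ms[i] for i in input_ids)
        for output_id, input_ids in groups.items()
    }


# Per-record latencies in ms. Assumption: id1 -> 7 and id3 -> 3 is one
# possible assignment; only the total of 10 ms is fixed by the slides.
record_latency_ms = {"id1": 7, "id3": 3, "id2": 20}

# Stage 1 end-of-stage grouping from the provenance table.
groups = {(0, 100): ["id1", "id3"], (0, 200): ["id2"]}

stage1_latency = sum_udf_latency(groups, record_latency_ms)
print(stage1_latency)
```

Storing the sum alongside each provenance row keeps the latency information at the same granularity as the input-output mapping.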
