BigDebug: Debugging Primitives for Interactive Big Data - PowerPoint PPT Presentation

BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark
Muhammad Ali Gulzar, Matteo Interlandi, Seunghyun Yoo, Sai Deep Tetali, Tyson Condie, Todd Millstein, Miryung Kim
University of California, Los Angeles

Developing Big Data Analytics
• Big data analytics is becoming increasingly important.
• Big data analytics is built using data-intensive computing platforms such as MapReduce, Hadoop, and Apache Spark.
[Figure: Apache Pig and MapReduce logos]

Apache Spark: Next Generation MapReduce
• Apache Spark is up to 100X faster than Hadoop MapReduce.
• It is open source, and over 800 developers have contributed to its development.
• 200+ companies are currently using Spark.
• Spark also provides libraries such as Spark SQL and MLlib.

Running a MapReduce Job on a Cluster
A user submits a job.
The job is distributed to workers in the cluster.
Each worker performs pipelined transformations on a partition with millions of records.
[Figure: pipeline of Filter, Map, and Reduce]
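The execution model on this slide can be approximated on plain Scala collections. The sketch below is only an illustration, not Spark code: `PartitionedJob`, `runTask`, `merge`, `runJob`, and the `partitionSize` parameter are hypothetical names, and each "worker" is just a function call over one slice of the input.

```scala
object PartitionedJob {
  // One task: pipelines Filter -> Map over a single partition, record by record,
  // and pre-aggregates the counts locally.
  def runTask(partition: Seq[String]): Map[String, Int] =
    partition.iterator
      .filter(_.trim.nonEmpty)                       // Filter: drop blank lines
      .map(line => (line.split("\\s+")(1), 1))       // Map: (candidate, 1)
      .foldLeft(Map.empty[String, Int]) { case (acc, (key, n)) =>
        acc.updated(key, acc.getOrElse(key, 0) + n)  // local Reduce
      }

  // Reduce: merge the partial counts produced by all tasks.
  def merge(partials: Seq[Map[String, Int]]): Map[String, Int] =
    partials.flatten.groupBy(_._1).map { case (key, kvs) => (key, kvs.map(_._2).sum) }

  // The "job": split the input into partitions and run one task per partition.
  def runJob(records: Seq[String], partitionSize: Int): Map[String, Int] =
    merge(records.grouped(partitionSize).map(runTask).toSeq)
}
```

Because each task iterates lazily over its partition, no intermediate collection is built between Filter and Map, which is the sense in which the transformations are pipelined.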

Motivating Scenario: Election Record Analysis
• Alice writes a Spark program that runs correctly on a local machine (100 MB of data) but crashes on the cluster (1 TB).
• Alice cannot see the crash-inducing intermediate result.
• Alice cannot identify which input from the 1 TB of data causes the crash.
• When a crash occurs, all intermediate results are thrown away.

Sample record:
    VoterID  Candidate  State  Time
    9213     Sanders    TX     1440023087

Program:
    val log = "s3n://poll.log"
    val text_file = spark.textFile(log)
    val count = text_file
      // keep records newer than the given timestamp (field 3)
      .filter(line => line.split("\\s+")(3).toInt > 1440012701)
      // emit (candidate, 1) for each remaining record (field 1)
      .map(line => (line.split("\\s+")(1), 1))
      .reduceByKey(_ + _).collect()

Error on the cluster:
    Task 31 failed 3 times; aborting job
    ERROR Executor: Exception in task 31 in stage 0 (TID 31)
    java.lang.NumberFormatException
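The failure mode can be reproduced locally without a cluster. The following is a minimal sketch on plain Scala collections, assuming a single malformed timestamp in the input; the second sample record and the names `MotivatingCrash`, `records`, and `countVotes` are invented for illustration.

```scala
import scala.util.{Failure, Success, Try}

object MotivatingCrash {
  // Fields: VoterID Candidate State Time.
  // The second record is hypothetical and has a malformed timestamp.
  val records: Seq[String] = Seq(
    "9213 Sanders TX 1440023087",
    "9K23 Cruz TX 14400A3645"
  )

  // Same logic as the filter/map/reduceByKey pipeline, on a local collection.
  def countVotes(lines: Seq[String]): Map[String, Int] =
    lines
      .filter(line => line.split("\\s+")(3).toInt > 1440012701)
      .map(line => (line.split("\\s+")(1), 1))
      .groupBy(_._1)
      .map { case (candidate, pairs) => (candidate, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit =
    Try(countVotes(records)) match {
      case Success(counts) => println(counts)
      case Failure(e)      => println(s"Job failed: ${e.getClass.getName}")
    }
}
```

As in the cluster run, the exception names the failure type but not the record that triggered it, and the count for the well-formed record is lost.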

BigDebug: Interactive Debugger Features
1. Simulated Breakpoint
2. Guarded Watchpoint
3. Crash Culprit Identification
4. Backward and Forward Tracing

    $> Crash inducing input records:
    9K23 Cruz TX 1440023645
    2FSD Cruz KS 1440026456
    9909 Cruz KS 1440023768

    Crashing at transformation 2
    Crashing Record: "Sanders"
    ArrayIndexOutOfBoundsException

Outline
• Interactive Debugging Primitives
  1. Simulated Breakpoint
  2. On-Demand Watchpoint
  3. Crash Culprit Identification
  4. Backward and Forward Tracing
  5. Fine-Grained Latency Alert
• Performance Evaluation

Why Do Traditional Debugging Primitives Not Work for Apache Spark?
Enabling interactive debugging requires us to rethink the features of a traditional debugger such as GDB.
• Pausing the entire computation on the cloud could reduce throughput.
• It is clearly infeasible for a user to inspect billions of records through a regular watchpoint.
• Even launching remote JVM debuggers on individual worker nodes cannot scale for big data computing.

Spark Program with Transformations
[Figure: dataflow of transformations: ReduceByKey, Map, Flatmap, Map, ReduceByKey, Filter]

Spark Program Scheduled as Stages
[Figure: transformations ReduceByKey, Map, Flatmap, Map, ReduceByKey, Filter grouped into Stage 1, Stage 2, and Stage 3]

Materialization Points in Spark
[Figure: transformations ReduceByKey, Map, Flatmap, Map, ReduceByKey, Filter grouped into Stage 1, Stage 2, and Stage 3, with a "Stored data records" label]

1. Simulated Breakpoint
[Figure: a breakpoint placed in the dataflow of ReduceByKey, Map, Flatmap, Map, ReduceByKey, Filter across Stage 1, Stage 2, and Stage 3, with a "Stored data records" label]
A simulated breakpoint replays computation from the latest materialization point where data is stored in memory.

1. Simulated Breakpoint: Realtime Code Fix
[Figure: a breakpoint placed in the dataflow of ReduceByKey, Map, Flatmap, Map, ReduceByKey, Filter across Stage 1, Stage 2, and Stage 3]
The simulated breakpoint allows a user to fix code after the breakpoint.
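The replay idea behind the simulated breakpoint and the realtime code fix can be sketched on plain Scala collections. This is an assumption-laden illustration, not BigDebug's API: `SimulatedBreakpoint`, `materialized`, `replay`, `original`, and `fixed` are hypothetical names, and the in-memory `Vector` stands in for the records stored at the latest materialization point.

```scala
object SimulatedBreakpoint {
  // (candidate, state): a hypothetical record shape for illustration.
  type Record = (String, String)

  // Output of the stage before the breakpoint, kept in memory.
  // This plays the role of the latest materialization point.
  val materialized: Vector[Record] =
    Vector(("Sanders", "TX"), ("Cruz", "TX"), ("Cruz", "KS"))

  // Replays only the computation after the breakpoint. The transformation is a
  // parameter, so a corrected function can be swapped in without recomputing
  // the earlier stage.
  def replay(afterBreakpoint: Record => (String, Int)): Map[String, Int] =
    materialized
      .map(afterBreakpoint)
      .groupBy(_._1)
      .map { case (key, pairs) => (key, pairs.map(_._2).sum) }

  // Original code after the breakpoint: mistakenly counts by state.
  val original: Record => (String, Int) = { case (_, state) => (state, 1) }

  // User-supplied fix: counts by candidate.
  val fixed: Record => (String, Int) = { case (candidate, _) => (candidate, 1) }
}
```

Calling `SimulatedBreakpoint.replay(SimulatedBreakpoint.fixed)` reruns only the post-breakpoint step over the stored records, which is why a fix does not require restarting the job from its input.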

2. On-Demand Guarded Watchpoint
[Figure: a watchpoint placed in the dataflow of ReduceByKey, Map, Flatmap, Map across Stage 1 and Stage 2]
Guard: state.equals("TX") || state.equals("CA")
A watchpoint captures individual data records matching a user-provided guard.

2. On-Demand Guarded Watchpoint
[Figure: a watchpoint placed in the dataflow of ReduceByKey, Map, Flatmap, Map across Stage 1 and Stage 2]
Guard: state.equals("CA")
A watchpoint captures individual data records matching a user-provided guard.
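A guarded watchpoint can be pictured as a pass-through step that records only the tuples satisfying a predicate, where the predicate can be replaced while the job runs. The sketch below uses plain Scala and hypothetical names (`GuardedWatchpoint`, `Vote`, `Watchpoint`, `observe`, `updateGuard`); it is not BigDebug's interface.

```scala
import scala.collection.mutable.ArrayBuffer

object GuardedWatchpoint {
  final case class Vote(voterId: String, candidate: String, state: String, time: Long)

  // Passes every record through unchanged and captures only those
  // that satisfy the current guard.
  final class Watchpoint(initialGuard: Vote => Boolean) {
    private var guard: Vote => Boolean = initialGuard
    val captured: ArrayBuffer[Vote] = ArrayBuffer.empty

    // On-demand update: later records are checked against the new guard.
    def updateGuard(newGuard: Vote => Boolean): Unit = {
      guard = newGuard
    }

    def observe(v: Vote): Vote = {
      if (guard(v)) captured += v
      v
    }
  }
}
```

A usage sketch: create `new Watchpoint(v => v.state == "TX" || v.state == "CA")`, map the records through `observe`, then narrow the guard with `updateGuard(_.state == "CA")` to reduce how many records are captured from that point on.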

Crash in Apache Spark
A job failure in Spark throws away the intermediate results of correctly computed stages.
[Figure: dataflow of Map, ReduceByKey, Map, Flatmap, ReduceByKey, Filter across Stage 1, Stage 2, and Stage 3]
    Task 31 failed 3 times; aborting job
    ERROR Executor: Exception in task 31 in stage 0 (TID 31)
    java.lang.NumberFormatException
To recover from a crash, a user needs to find the input causing the crash and re-execute the whole job.

3. Crash Culprit Identification
[Figure: dataflow of Map, ReduceByKey, Map, Flatmap, ReduceByKey, Filter across Stage 1, Stage 2, and Stage 3]
    Crash occurred at transformation 3
    Crashing Record: "Sanders"
    ArrayIndexOutOfBoundsException
    Skipping the record. Continuing processing.
A user can see the crash-causing intermediate record and trace the original inputs leading to the crash.

3. Crash Culprit Remediation
[Figure: dataflow of Map, ReduceByKey, Map, Flatmap, ReduceByKey, Filter across Stage 1, Stage 2, and Stage 3]
A user can either correct the crashed record, skip the crash culprit, or supply a code fix to repair the crash culprit.
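Crash culprit identification and the "skip" remediation can be sketched by wrapping the user function in `scala.util.Try`, so that a failing record is reported instead of aborting the whole run. The names `CrashCulprit`, `Culprit`, `runSkipping`, and `candidateOf` are hypothetical and the code runs on plain Scala collections, not on Spark.

```scala
import scala.util.{Failure, Success, Try}

object CrashCulprit {
  // A record that made the user function throw, together with the error.
  final case class Culprit(record: String, error: Throwable)

  // Applies f to every record. Failures are collected as culprits and skipped,
  // so the results of the remaining records are kept.
  def runSkipping[A](records: Seq[String])(f: String => A): (Seq[A], Seq[Culprit]) = {
    val attempts = records.map(r => (r, Try(f(r))))
    val results  = attempts.collect { case (_, Success(value)) => value }
    val culprits = attempts.collect { case (r, Failure(e)) => Culprit(r, e) }
    (results, culprits)
  }

  // User function from the running example: extract the candidate field.
  def candidateOf(line: String): String = line.split("\\s+")(1)
}
```

The other two remediations follow the same shape under these assumptions: to correct a record, pass the repaired strings from `culprits` through `runSkipping` again; to supply a code fix, rerun only the culprit records with a corrected `f`.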

4. Backward and Forward Tracing
[Figure: dataflow of Map, ReduceByKey, Map, Flatmap, ReduceByKey, Filter across Stage 1, Stage 2, and Stage 3]
A user can also issue tracing queries on intermediate records in real time.

Titian: Data Provenance for Spark [PVLDB2016]
Titian instruments Spark jobs with tracing agents to generate fine-grained tracing tables.
[Figure: Tracing Tables 1, 2, and 3, each with Input and Output columns of record identifiers, joined in two steps (Step 1, Step 2)]
Titian logically reconstructs the mapping from output to input records by recursively joining the provenance tables.
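The recursive join over tracing tables can be illustrated with a small fold. This is a sketch of the idea only, not Titian's implementation: `ProvenanceTrace`, `TracingTable`, `traceBackward`, and `traceForward` are hypothetical names, each table is assumed to be a list of (input id, output id) pairs for one step, and the sample identifiers are made up.

```scala
object ProvenanceTrace {
  // One tracing table: (input identifier, output identifier) pairs for one step.
  type TracingTable = Seq[(String, String)]

  // Backward tracing: starting from the last table, repeatedly join the wanted
  // output identifiers against a table to obtain the inputs that produced them.
  def traceBackward(tables: Seq[TracingTable], outputs: Set[String]): Set[String] =
    tables.foldRight(outputs) { (table, wanted) =>
      table.collect { case (in, out) if wanted(out) => in }.toSet
    }

  // Forward tracing: the same join in the opposite direction.
  def traceForward(tables: Seq[TracingTable], inputs: Set[String]): Set[String] =
    tables.foldLeft(inputs) { (known, table) =>
      table.collect { case (in, out) if known(in) => out }.toSet
    }

  // Illustrative tables, ordered from the first step to the last.
  val table1: TracingTable = Seq("0" -> "a", "25" -> "b", "1" -> "c")
  val table2: TracingTable = Seq("a" -> "y", "b" -> "w", "c" -> "z")
}
```

With these sample tables, tracing output `y` backward goes through `a` to input `0`, and tracing input `25` forward goes through `b` to output `w`.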
