A System-Wide Debugging Assistant Powered by Natural Language Processing



slide-1
SLIDE 1

A System-Wide Debugging Assistant Powered by Natural Language Processing

Pradeep Dogga*, Karthik Narasimhan†, Anirudh Sivaraman‡, Ravi Netravali*

slide-2
SLIDE 2

Distributed Systems are complex

Diagram labels: Request A, Load Balancer, Response A

slide-3
SLIDE 3

Debugging is hard - abstraction gap

Users to Developer: "Application is not loading some content!"

slide-4
SLIDE 4

Painful debugging process

Issue reported to the developer: "Application is not loading some content!"

  • Is it a bug or feature request?
  • Which team is relevant for this?
  • Find root-cause

Preliminary Diagnosis

slide-5
SLIDE 5

Painful debugging process – Finding root cause

Issue reported to the developer: "Application is not loading some content!"

  • Hypothesis: Corrupt key-value store? Check logs from API calls to key-value store. Wrong hypothesis!
  • Hypothesis: Routing loop at switch? Check traffic logs from that switch. Correct hypothesis! (Identified a loop)

Largely manual and error-prone

Steps shown: Query Generation, Active Debugging

slide-6
SLIDE 6

Painful debugging process – Generate Fix

Developer

  • Change switch configuration file
  • Verify application behavior

Fix

slide-7
SLIDE 7

Systems debugging tools

Application Logs

slide-8
SLIDE 8

Systems debugging tools

Network Metrics: Marple (SIGCOMM 17)

slide-9
SLIDE 9

Systems debugging tools

Distributed systems tracing: Canopy (SOSP 17), Pivot Tracing (SOSP 15)

slide-10
SLIDE 10

Debugging remains difficult

Did I debug this scenario before?

  • Still manual and error-prone: Which tool? When? How?
  • Debugging intuitions are hard-won!
slide-11
SLIDE 11

Can we use a data-driven approach to automate steps in end-to-end debugging?

slide-12
SLIDE 12

Large amounts of debugging data

Two big classes of data:

  • Quantitative/Structured: logs from tools, performance metrics, source code
  • Unstructured/Natural Language: user issues, documentation and comments, past bug reports

slide-13
SLIDE 13

Related Work

  • Program Analysis and Synthesis: NLP for code generation, Deep API learning (FSE 16)
  • Program Debugging: Net2Text, English queries => SQL queries (NSDI 18)
  • Big Code: initiative to perform statistical program analysis on large amounts of code

Limitations:

  • Only ingest data from a single subsystem
  • Assume a single-step prediction
slide-14
SLIDE 14

A System-Wide Debugging Assistant Powered by Natural Language Processing

NL Debugging Assistant

Inputs (diagram labels): System-wide concern, Feedback, Issues/Bug Reports, Code/Configuration Files, End Host Logs, Application Logs, Network Metrics

Suggestion:

  • Label
  • Folder/Module
  • Use tcpdump
  • Issue query X with Marple

slide-15
SLIDE 15

Automating steps in end-to-end debugging

Preliminary Diagnosis -> Generating Debugging queries -> Active Debugging -> Fix!

Developer

slide-16
SLIDE 16

Preliminary Diagnosis

Debugging Assistant

Inputs: issue text ("Menu panel not closing when not detached") and source code

Suggestion: /src/lib/menu

  • Automate : Label assignment and Module prediction
  • Category : Text classification and document retrieval
  • Challenge : Learn joint representations of data from both unstructured text and structured source code.

slide-17
SLIDE 17

Label Prediction – Preliminary Evaluation

  • 165,966 labeled issues from the top 98 open-source GitHub repositories (ranked by stars)
  • Bag-of-words representation of issue text

Example issue: "Menu panel not being closed when not detached"

Vocabulary: Menu, panel, not, tool, closed, css, being, detached, when
Counts: 1, 1, 2, 0, 1, 0, 1, 1, 1

Bag-of-words vector -> FFN -> Label1, Label2, Label3, Label4, Label5
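
The bag-of-words step and the FFN can be sketched in a few lines of Python. This is a minimal illustration and not the authors' model. It reads "FFN" as a feed-forward network and assumes one hidden ReLU layer, a softmax over five labels, and random untrained weights; the slide specifies none of these. The vocabulary and the example issue come from the slide. The names `bag_of_words`, `init_ffn` and `forward` are hypothetical.

```python
import math
import random
from collections import Counter

# Vocabulary taken from the slide's example; a real one would be built from the corpus.
VOCAB = ["menu", "panel", "not", "tool", "closed", "css", "being", "detached", "when"]
LABELS = ["Label1", "Label2", "Label3", "Label4", "Label5"]


def bag_of_words(text, vocab=VOCAB):
    """Count how often each vocabulary word occurs in the issue text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]


def init_ffn(n_in, n_hidden, n_out, seed=0):
    """Randomly initialised weights (untrained, for illustration only)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    return w1, w2


def forward(x, w1, w2):
    """One hidden ReLU layer followed by a softmax over the labels (assumed shape)."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


issue = "Menu panel not being closed when not detached"
features = bag_of_words(issue)
w1, w2 = init_ffn(len(VOCAB), 8, len(LABELS))
probs = forward(features, w1, w2)
predicted = LABELS[probs.index(max(probs))]
```

Because the weights are untrained, `predicted` is arbitrary here. The sketch only shows how the issue text becomes a count vector and then a score per label. A multi-label setup would likely replace the softmax with independent per-label outputs.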

slide-18
SLIDE 18

Results

Figure: Label prediction performance (Precision, Recall, F1-score; axis from 0.1 to 0.9)

slide-19
SLIDE 19

Source Code Folder Prediction – Preliminary Evaluation

  • 240,138 issues with corresponding fixes from GitHub repositories

Issue text ("Menu panel not being closed when not detached") -> FFN -> Relevance Score

Fix in: /src/lib/menu
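
Folder prediction can be read as ranking candidate folders by a relevance score. The sketch below shows only that ranking step. It swaps the slide's FFN for a plain token-overlap (Jaccard) score, so it is a stand-in and not the evaluated model. The candidate folders other than /src/lib/menu are hypothetical, as are the function names.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens from issue text or a folder path."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(issue, folder):
    """Stand-in relevance score: Jaccard overlap between issue and path tokens."""
    a, b = tokens(issue), tokens(folder)
    return len(a & b) / len(a | b) if a | b else 0.0


def rank_folders(issue, folders):
    """Return folders sorted from most to least relevant."""
    return sorted(folders, key=lambda f: relevance(issue, f), reverse=True)


issue = "Menu panel not being closed when not detached"
# Hypothetical candidate folders; only /src/lib/menu appears on the slide.
candidates = ["/src/lib/toolbar", "/src/lib/menu", "/docs", "/src/css"]
ranking = rank_folders(issue, candidates)
top_folder = ranking[0]
```

A learned scorer would replace `relevance` while the ranking logic stays the same. That separation is the reason the score is kept as its own function.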

slide-20
SLIDE 20

Results

Figure: Folder prediction performance (Precision, Recall, F1-score; axis from 0.1 to 0.9)

slide-21
SLIDE 21

Automating steps in end-to-end debugging

Preliminary Diagnosis -> Generating Debugging queries -> Active Debugging -> Fix!

Developer

slide-22
SLIDE 22

Generating debugging queries

Debugging Assistant

Inputs: issue report ("Application loading content slowly") and system logs

Suggestion: Issue debugging query: ‘stream = filter(T, (switch == 2) ); R = map(stream, [qin], [qin]);’

Developer: Found large queue depths due to a flow!

  • Automate : Query generation for use with debugging tools
  • Category : Language generation
  • Challenge : Understand system logs, source code semantics and language syntax
slide-23
SLIDE 23

Template-based query prediction

Distributed reddit setup: Linux Router, Reddit Frontend, Memcache, Cassandra & Zookeeper, Postgres DB, four P4 Switches

Fault Injector: pick a fault to inject from:

  • Shut down Cassandra host
  • Create congestion on reddit-switch link with other traffic

  • A platform to let users interact with the system and collect data for query generation.
  • Network debugging tool for performance queries (Marple)
slide-24
SLIDE 24

Template-based query prediction

Distributed reddit setup: Linux Router, Reddit Frontend, Memcache, Cassandra & Zookeeper, Postgres DB, four P4 Switches

Marple query: stream = filter(T, (switch == 4) ); R = map(stream, [qin], [qin]);

Diagram labels: Marple, P4 program, Queue depths

slide-25
SLIDE 25

Template-based query prediction

  • Predict the correct template and switch to diagnose the root-cause
  • Issue reports were collected on the testbed from one user, for faults injected with the fault injector.

Issue text ("Application loading content slowly") -> FFN -> Relevance Score -> Template1, Switch 10
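
Template-based prediction reduces query generation to two choices, a template and a switch, which are then filled into a query string. The sketch below assumes that scoring has already happened and shows only the selection and fill step. The template text is the Marple query from the slides with a `{switch}` placeholder added for illustration. The scores, the template id and the helper names are hypothetical.

```python
# Template text copied from the Marple query shown on the slides;
# the {switch} placeholder and the template id are illustrative.
TEMPLATES = {
    "Template1": "stream = filter(T, (switch == {switch}) ); R = map(stream, [qin], [qin]);",
}


def best_pair(scores):
    """Pick the (template, switch) pair with the highest relevance score."""
    return max(scores, key=scores.get)


def render(template_id, switch):
    """Fill the chosen template with the chosen switch id."""
    return TEMPLATES[template_id].format(switch=switch)


# Hypothetical scores, standing in for the output of a trained model.
scores = {("Template1", 2): 0.15, ("Template1", 4): 0.70, ("Template1", 10): 0.15}
template_id, switch = best_pair(scores)
query = render(template_id, switch)
```

With these made-up scores the rendered string is the switch-4 query shown earlier in the deck. Restricting output to known templates keeps every generated query syntactically valid, at the cost of only covering anticipated query shapes.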

slide-26
SLIDE 26

Results

Figure: Query generation performance (Precision, Recall, F1-score; axis from 0.1 to 0.9)

slide-27
SLIDE 27

Automating steps in end-to-end debugging

Preliminary Diagnosis -> Generating Debugging queries -> Active Debugging -> Fix!

Developer

slide-28
SLIDE 28

Active (interactive) debugging

Debugging Assistant

Inputs: issue report ("Application loading content slowly") and system logs

Suggestion: Issue query 1 with Marple

Developer: Did not find any issues with queues

  • Automate : Iterative query generation by incorporating feedback
  • Category : Sequential decision making
slide-29
SLIDE 29

Active (interactive) debugging

Debugging Assistant

Inputs: issue report ("Application loading content slowly") and system logs

Suggestion: Issue query 2 with Marple

Developer: Done: Found an issue in routing!

  • Automate : Iterative query generation by incorporating feedback
  • Category : Sequential decision making
  • Challenge : Developer-assistant interface to leverage developer’s experience
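
The two active-debugging slides describe a loop: suggest a query, take the developer's feedback, then suggest the next query. One possible shape for that loop is sketched below, assuming a greedy policy that demotes the remaining queries on a subsystem once the developer reports nothing found there. The policy, the candidate queries, their scores and the `penalty` parameter are all assumptions made for illustration; the slides do not specify a decision procedure.

```python
def active_debug(candidates, get_feedback, penalty=0.5, max_steps=5):
    """Greedy loop: issue the best-scoring query, then rescore using feedback.

    candidates: dict mapping query name -> (subsystem, initial score).
    get_feedback: callable(query) -> True if the developer found the root cause.
    Returns the confirming query (or None) and the list of queries issued.
    """
    scores = {q: score for q, (_, score) in candidates.items()}
    issued = []
    while scores and len(issued) < max_steps:
        query = max(scores, key=scores.get)
        del scores[query]
        issued.append(query)
        if get_feedback(query):
            return query, issued
        # Negative feedback: demote remaining queries on the same subsystem.
        subsystem = candidates[query][0]
        for other in scores:
            if candidates[other][0] == subsystem:
                scores[other] *= penalty
    return None, issued


# Hypothetical candidates and scores, loosely following the slides' scenario.
candidates = {
    "queue depth at switch 2": ("queues", 0.5),
    "queue depth at switch 4": ("queues", 0.4),
    "path taken by packets": ("routing", 0.3),
}
# Simulated developer: only the routing query exposes the problem.
found, issued = active_debug(candidates, lambda q: q == "path taken by packets")
```

In this run the first queue query finds nothing, the second queue query is demoted below the routing query, and the routing query is issued next. That mirrors the "query 1, then query 2" sequence on the slides. The `get_feedback` callable marks where a real developer-assistant interface would plug in.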
slide-30
SLIDE 30

Challenges & Future Work

  • Need to determine the optimal model for leveraging information from text and traces to generate queries syntactically
  • Data collection, training time: need to develop novel systems and algorithmic techniques
  • End-to-end evaluation: evaluate the impact of the assistant on the debugging experience with real issues
  • Developer study on systems with reasonable complexity
slide-31
SLIDE 31

Conclusion

Our work paints a vision for an end-to-end debugging assistant which can:

  • Process natural language inputs and various system logs
  • Leverage multiple domain-specific debugging tools
  • Automate the three steps in debugging
slide-32
SLIDE 32

Thank you!

Contact: dogga@cs.ucla.edu http://web.cs.ucla.edu/~dogga