Provenance for System Troubleshooting Marc Chiarini Harvard SEAS - - PowerPoint PPT Presentation




SLIDE 1

Provenance for System Troubleshooting

Marc Chiarini Harvard SEAS TaPP '11

SLIDE 2

A Day in the Life...

  • Wake up 3am via page from a heartless machine: hot backup has failed.
  • Start troubleshooting (in pajamas, thankfully!)
  • Log files indicate unable to contact storage appliance, ¾ into backup.
  • Storage appliance working fine and reachable now.

  • Where to look next? (Coffee first!)
SLIDE 3

System-level Provenance

  • Directed acyclic graph tells us what digital objects interacted during provenance collection, and when.
  • Examples:

– File F read by process P
– File Z written by P
– Z read by process Q
– Pipe I written by Q
– I read by process R
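
The example chain above can be sketched as a small graph in Python. This is only an illustration, not the prototype's data model: the event tuple layout and the `build_graph` and `ancestors` helpers are assumptions made here. Edges point from an object to what it was derived from, so walking them answers "what could have influenced R?"

```python
from collections import defaultdict

# The five example events from the slide, as (process, action, object).
# The tuple layout is an assumption made for this sketch.
EVENTS = [
    ("P", "read", "F"),
    ("P", "wrote", "Z"),
    ("Q", "read", "Z"),
    ("Q", "wrote", "I"),
    ("R", "read", "I"),
]

def build_graph(events):
    """Return a mapping: node -> set of nodes it was directly derived from."""
    derived_from = defaultdict(set)
    for process, action, obj in events:
        if action == "read":
            derived_from[process].add(obj)    # process consumed the object
        elif action == "wrote":
            derived_from[obj].add(process)    # object was produced by process
        else:
            raise ValueError(f"unknown action: {action}")
    return derived_from

def ancestors(graph, node):
    """Collect every node reachable by following derived-from edges."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in graph.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

graph = build_graph(EVENTS)
print(sorted(ancestors(graph, "R")))  # ['F', 'I', 'P', 'Q', 'Z']
```

Everything upstream of R shows up, including file F, which R never touched directly.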

SLIDE 4

Potential Dependency

  • Define dependency as the transmission of information from a passive object (file, pipe, etc.) to an active object (process), that is necessary to the proper functioning of the process.
  • Transitive dependencies also exist between active objects.
  • For troubleshooting, provenance graph edges represent potential dependencies. We don't look at data or programs, so won't talk about causality.
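
One way to picture transitive potential dependencies between processes is the sketch below. It assumes timestamped read/write events (the timestamps, the extra process S, and both helper functions are hypothetical) and treats a read as potentially depending on every process that wrote the object earlier. The closure over-approximates: it only says information could have flowed, never that it did.

```python
# Hypothetical timestamped events: (time, process, action, object).
EVENTS = [
    (1, "P", "read", "F"),
    (2, "P", "wrote", "Z"),
    (3, "Q", "read", "Z"),
    (4, "Q", "wrote", "I"),
    (5, "R", "read", "I"),
    (6, "S", "wrote", "Z"),  # S writes Z only after Q has already read it
]

def direct_process_deps(events):
    """Map each process to the processes whose earlier writes it read."""
    writers = {}  # object -> processes that have written it so far
    deps = {}
    for _, process, action, obj in sorted(events):
        deps.setdefault(process, set())
        if action == "wrote":
            writers.setdefault(obj, set()).add(process)
        elif action == "read":
            deps[process] |= writers.get(obj, set()) - {process}
    return deps

def transitive_closure(deps):
    """Extend direct process dependencies to all reachable processes."""
    closure = {}
    for start in deps:
        seen, stack = set(), [start]
        while stack:
            for upstream in deps.get(stack.pop(), ()):
                if upstream not in seen:
                    seen.add(upstream)
                    stack.append(upstream)
        closure[start] = seen - {start}
    return closure

closure = transitive_closure(direct_process_deps(EVENTS))
print(sorted(closure["R"]))  # ['P', 'Q']
```

Because S wrote Z after Q's read, S never appears as a potential dependency of Q or R.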

SLIDE 5

Troubleshooting Example

[Figure: provenance graph for the troubleshooting example. Node labels: network manager, dhclient, resolv.conf, dhclient socket endpoint, D-Bus, resolver, NetMan socket endpoint, dhclient.conf, other inputs, many other inputs.]
SLIDE 6

Graph Reduction

  • Real graph is much too large.
  • Reduction is necessary to support reasonable queries.
  • Want to turn potential dependencies into actual dependencies with high confidence and eliminate non-dependencies in the graph.
  • Impossible to identify all true dependencies; would require enumerating all failures.
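
A minimal sketch of such a reduction, assuming each edge already carries a confidence score. The node names are borrowed from the troubleshooting example, but the edges, the two extra paths, the scores, and both thresholds are invented for illustration and are not taken from the prototype.

```python
# Hypothetical edges: (dependent, dependency, confidence in [0, 1]).
EDGES = [
    ("dhclient", "dhclient.conf", 0.95),
    ("dhclient", "/var/log/syslog", 0.05),
    ("resolver", "resolv.conf", 0.90),
    ("resolver", "/tmp/scratch", 0.40),
]

def reduce_graph(edges, keep_at_or_above=0.8, drop_below=0.1):
    """Split edges into likely-actual and still-uncertain dependencies.

    Edges scoring below drop_below are treated as non-dependencies and
    removed. Both thresholds are arbitrary choices for this sketch.
    """
    actual, uncertain = [], []
    for dependent, dependency, confidence in edges:
        if confidence >= keep_at_or_above:
            actual.append((dependent, dependency))
        elif confidence >= drop_below:
            uncertain.append((dependent, dependency))
        # anything lower is dropped from the reduced graph
    return actual, uncertain

actual, uncertain = reduce_graph(EDGES)
print(actual)     # [('dhclient', 'dhclient.conf'), ('resolver', 'resolv.conf')]
print(uncertain)  # [('resolver', '/tmp/scratch')]
```

Keeping an explicit "uncertain" bucket reflects the point that not every true dependency can be identified up front.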

SLIDE 7

In Our Favor...

  • There are known dependencies, e.g., configuration files for system services.
  • We can label files residing in well-known log directories, e.g., /var/log, with low probability.
  • We can label files residing in library directories, e.g., /usr/lib, with high probability.
  • We can label with high probability, files that are opened by a program on every
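
The directory-based heuristics could be sketched roughly as follows. The numeric scores, the default for unclassified paths, the known-configuration table, and its `/etc/dhcp/dhclient.conf` entry are all assumptions made for illustration; the slides give only the qualitative labels "low" and "high".

```python
from pathlib import PurePosixPath

LOG_DIRS = (PurePosixPath("/var/log"),)
LIB_DIRS = (PurePosixPath("/usr/lib"),)

# Hypothetical table of known (process, config file) dependencies.
KNOWN_CONFIGS = {("dhclient", "/etc/dhcp/dhclient.conf")}

def is_under(path, directory):
    """True if path lies somewhere below directory (component-wise match)."""
    return directory in PurePosixPath(path).parents

def label_edge(process, path):
    """Return an illustrative dependency probability for process -> path."""
    if (process, path) in KNOWN_CONFIGS:
        return 1.0   # known dependency
    if any(is_under(path, d) for d in LOG_DIRS):
        return 0.05  # log files: low probability
    if any(is_under(path, d) for d in LIB_DIRS):
        return 0.9   # libraries: high probability
    return 0.5       # no heuristic applies; left undecided

print(label_edge("dhclient", "/var/log/syslog"))   # 0.05
print(label_edge("dhclient", "/usr/lib/libc.so"))  # 0.9
```

Matching on path components rather than string prefixes keeps a directory such as `/var/logs` from being mistaken for `/var/log`.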

SLIDE 8

Other challenges

  • Building a tool that improves the sysadmin's mental model of her systems via exploration, documentation, visualization, etc.
  • Give the sysadmin an intuitive way to query the provenance graph and limit the scope of query responses (regexps may not cut it!).
  • How do we integrate troubleshooting workflow artifacts (e.g., past symptoms and graph query results) with troubleshooting
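
As one possible reading of "limit the scope of query responses", the sketch below bounds an ancestor query by hop count and an optional node filter. It reuses the F, P, Z, Q, I, R chain from the system-level provenance example; the `scoped_ancestors` function and its parameters are hypothetical, not the prototype's query interface.

```python
from collections import deque

# Derived-from edges for the example chain: R <- I <- Q <- Z <- P <- F.
GRAPH = {
    "R": ["I"],
    "I": ["Q"],
    "Q": ["Z"],
    "Z": ["P"],
    "P": ["F"],
}

def scoped_ancestors(graph, start, max_depth, keep=lambda node: True):
    """Breadth-first walk returning (node, depth) pairs within max_depth hops.

    keep is a predicate that filters which nodes are reported; filtered
    nodes are still traversed so that nodes behind them can be reached.
    """
    results = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the requested scope
        for parent in graph.get(node, ()):
            if parent in seen:
                continue
            seen.add(parent)
            queue.append((parent, depth + 1))
            if keep(parent):
                results.append((parent, depth + 1))
    return results

print(scoped_ancestors(GRAPH, "R", max_depth=3))
# [('I', 1), ('Q', 2), ('Z', 3)]

processes = {"P", "Q", "R"}
print(scoped_ancestors(GRAPH, "R", max_depth=4, keep=lambda n: n in processes))
# [('Q', 2), ('P', 4)]
```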

SLIDE 9

Questions?

Prototype will be available in late fall 2011.

http://www.eecs.harvard.edu/syrah/pass/

chiarini@seas.harvard.edu