
Provenance for System Troubleshooting, Marc Chiarini, Harvard SEAS (PowerPoint presentation)



  1. Provenance for System Troubleshooting
     Marc Chiarini, Harvard SEAS
     TaPP '11

  2. A Day in the Life...
     ● Wake up at 3 a.m. to a page from a heartless machine: the hot backup has failed.
     ● Start troubleshooting (in pajamas, thankfully!).
     ● Log files indicate the backup was unable to contact the storage appliance ¾ of the way through.
     ● The storage appliance is working fine and is reachable now.
     ● Where to look next? (Coffee first!)

  3. System-level Provenance
     ● A directed acyclic graph (DAG) tells us which digital objects interacted during provenance collection, and when.
     ● Examples:
       – File F read by process P
       – File Z written by P
       – Z read by process Q
       – Pipe I written by Q
       – I read by process R
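The example edges on the System-level Provenance slide can be turned into a small graph and queried for an object's upstream history. This is a minimal sketch, not the prototype's implementation: it assumes edges point in the direction information flows, uses a hypothetical `proc:`/`file:`/`pipe:` naming scheme to tell object kinds apart, and ignores the timestamps a real provenance graph would carry.

```python
from collections import defaultdict

# The slide's example, as (source, destination) edges in the direction
# information flows. The "proc:"/"file:"/"pipe:" prefixes are hypothetical.
EDGES = [
    ("file:F", "proc:P"),  # file F read by process P
    ("proc:P", "file:Z"),  # file Z written by P
    ("file:Z", "proc:Q"),  # Z read by process Q
    ("proc:Q", "pipe:I"),  # pipe I written by Q
    ("pipe:I", "proc:R"),  # I read by process R
]


def build_graph(edges):
    """Map each node to the set of nodes it sent information to."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)
    return graph


def ancestors(graph, node):
    """Return every node with a directed path to `node` (its upstream history)."""
    # Invert the graph so we can walk against the flow of information.
    reverse = defaultdict(set)
    for src, dsts in graph.items():
        for dst in dsts:
            reverse[dst].add(src)
    seen, stack = set(), [node]
    while stack:
        for parent in reverse[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


graph = build_graph(EDGES)
print(sorted(ancestors(graph, "proc:R")))
```

Walking backward from process R recovers everything that could have influenced it: pipe I, process Q, file Z, process P, and file F.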

  4. Potential Dependency
     ● Define a dependency as the transmission of information from a passive object (file, pipe, etc.) to an active object (process) that is necessary for the process to function properly.
     ● Transitive dependencies also exist between active objects.
     ● For troubleshooting, provenance graph edges represent potential dependencies. We don't look at data or programs, so we won't talk about causality.
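Under this definition, a process-to-process dependency arises when one process reads a passive object that another process wrote. A minimal sketch, reusing the example edges from the System-level Provenance slide and assuming a hypothetical `proc:`/`file:`/`pipe:` naming scheme, derives those edges and then follows them transitively:

```python
from collections import defaultdict

# (source, destination) edges in the direction information flows; the
# "proc:" prefix (hypothetical) marks active objects, everything else is passive.
EDGES = [
    ("file:F", "proc:P"),
    ("proc:P", "file:Z"),
    ("file:Z", "proc:Q"),
    ("proc:Q", "pipe:I"),
    ("pipe:I", "proc:R"),
]


def is_process(node):
    return node.startswith("proc:")


def process_dependencies(edges):
    """Map each process to the processes it directly depends on via a passive object."""
    writers = defaultdict(set)  # passive object -> processes that wrote it
    for src, dst in edges:
        if is_process(src) and not is_process(dst):
            writers[dst].add(src)
    deps = defaultdict(set)
    for src, dst in edges:
        if not is_process(src) and is_process(dst):
            # dst read src, so it potentially depends on whoever wrote src.
            deps[dst] |= writers[src]
    return deps


def transitive_dependencies(deps, process):
    """All processes that `process` potentially depends on, directly or transitively."""
    result, stack = set(), [process]
    while stack:
        for dep in deps.get(stack.pop(), ()):
            if dep not in result:
                result.add(dep)
                stack.append(dep)
    return result


deps = process_dependencies(EDGES)
print(sorted(transitive_dependencies(deps, "proc:R")))
```

Here R potentially depends on Q (through pipe I) and, transitively, on P (through file Z). The sketch ignores timing; a fuller version would presumably also check that a write happened before the corresponding read, and the result is still only a set of potential dependencies.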

  5. Troubleshooting Example
     [Figure: provenance graph linking dhclient.conf, dhclient, a dhclient socket endpoint, D-Bus, a NetMan socket endpoint, network manager, resolv.conf, and resolver; several processes also have "other inputs".]

  6. Graph Reduction
     ● The real graph is much too large.
     ● Reduction is necessary to support reasonable queries.
     ● We want to turn potential dependencies into actual dependencies with high confidence and eliminate non-dependencies from the graph.
     ● It is impossible to identify all true dependencies; doing so would require enumerating all failures.
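One simple way to picture reduction: attach a confidence to each potential dependency, drop edges below a threshold, and keep only what lies upstream of the failing object. This is an illustration, not the method used in the talk; it assumes the confidences already exist, and the file names, the numbers, and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical confidences that each potential dependency (source -> destination)
# is an actual dependency. How these numbers are obtained is outside this sketch.
EDGE_CONFIDENCE = {
    ("/etc/dhcp/dhclient.conf", "dhclient"): 0.95,
    ("/usr/lib/libc.so", "dhclient"): 0.9,
    ("/var/log/syslog", "dhclient"): 0.05,
    ("dhclient", "/etc/resolv.conf"): 0.8,
    ("/etc/resolv.conf", "resolver"): 0.9,
    ("/tmp/scratch", "resolver"): 0.1,
}


def reduce_graph(edge_confidence, symptom, threshold=0.5):
    """Drop low-confidence edges, then keep only edges upstream of the symptom."""
    reverse = defaultdict(set)
    for (src, dst), p in edge_confidence.items():
        if p >= threshold:
            reverse[dst].add(src)
    # Walk backward from the failing object to collect the reduced subgraph.
    kept_edges, seen, stack = set(), {symptom}, [symptom]
    while stack:
        node = stack.pop()
        for parent in reverse[node]:
            kept_edges.add((parent, node))
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return kept_edges


for edge in sorted(reduce_graph(EDGE_CONFIDENCE, "resolver")):
    print(edge)
```

With the default threshold, the log and scratch files fall away and four edges remain. Since identifying every true dependency is impossible, any such threshold trades missed dependencies against graph size.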

  7. In Our Favor...
     ● There are known dependencies, e.g., configuration files for system services.
     ● We can assign a low dependency probability to files residing in well-known log directories, e.g., /var/log.
     ● We can assign a high dependency probability to files residing in library directories, e.g., /usr/lib.
     ● We can assign a high dependency probability to files that are opened by a program on every
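These heuristics could supply the confidences that a reduction step needs. The sketch below is a minimal illustration that assumes hypothetical numeric values for "low" and "high" (the slide gives none) and a hypothetical set of known configuration files. It covers only the path-based rules, since the slide's final heuristic is cut off.

```python
# Hypothetical probability values; the slide says only "low" and "high".
KNOWN_DEPENDENCY = 1.0
HIGH = 0.9
LOW = 0.1
UNKNOWN = 0.5

# Hypothetical examples of known configuration files for system services.
KNOWN_CONFIG_FILES = {"/etc/dhcp/dhclient.conf", "/etc/resolv.conf"}
LOG_DIRS = ("/var/log/",)
LIBRARY_DIRS = ("/usr/lib/",)


def dependency_prior(path):
    """Return a rough probability that reading `path` is a real dependency."""
    if path in KNOWN_CONFIG_FILES:
        return KNOWN_DEPENDENCY
    if path.startswith(LOG_DIRS):  # str.startswith accepts a tuple of prefixes
        return LOW
    if path.startswith(LIBRARY_DIRS):
        return HIGH
    return UNKNOWN


for p in ["/etc/resolv.conf", "/var/log/syslog", "/usr/lib/libssl.so", "/home/a/notes"]:
    print(p, dependency_prior(p))
```

The trailing slashes in the prefixes keep a path such as /var/logfoo from being mistaken for a log file.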

  8. Other Challenges
     ● Building a tool that improves the sysadmin's mental model of her systems via exploration, documentation, visualization, etc.
     ● Giving the sysadmin an intuitive way to query the provenance graph and limit the scope of query responses (regexps may not cut it!).
     ● How do we integrate troubleshooting workflow artifacts (e.g., past symptoms and graph query results) with troubleshooting
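Query scoping is listed as an open challenge. As one illustration of going beyond regular expressions, this sketch bounds an upstream query by hop count and filters results with an arbitrary predicate. The edges, the predicate, and the function name `upstream` are hypothetical.

```python
from collections import defaultdict

# Hypothetical provenance edges (source -> destination, direction of information flow).
EDGES = [
    ("/etc/dhcp/dhclient.conf", "dhclient"),
    ("/usr/lib/libc.so", "dhclient"),
    ("dhclient", "/etc/resolv.conf"),
    ("/etc/resolv.conf", "resolver"),
    ("/var/log/syslog", "resolver"),
]


def upstream(edges, start, max_depth, keep):
    """Nodes within `max_depth` hops upstream of `start` that satisfy `keep`.

    Traversal continues through rejected nodes, so the predicate only trims
    the answer, not the search.
    """
    reverse = defaultdict(set)
    for src, dst in edges:
        reverse[dst].add(src)
    found, seen, frontier = set(), {start}, [start]
    for _ in range(max_depth):  # breadth-first, one hop per iteration
        next_frontier = []
        for node in frontier:
            for parent in reverse[node]:
                if parent in seen:
                    continue
                seen.add(parent)
                next_frontier.append(parent)
                if keep(parent):
                    found.add(parent)
        frontier = next_frontier
    return found


# "Which non-log files within three hops could have affected the resolver?"
def not_log_file(node):
    return node.startswith("/") and not node.startswith("/var/log/")


print(sorted(upstream(EDGES, "resolver", 3, not_log_file)))
```

The traversal still passes through nodes the predicate rejects (dhclient here), so filtering narrows the answer without hiding dependencies that lie beyond them.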

  9. Questions?
     The prototype will be available in late fall 2011.
     http://www.eecs.harvard.edu/syrah/pass/
     chiarini@seas.harvard.edu
