
A Graphical representation for identifier structure in application logs - PowerPoint PPT Presentation

UC Berkeley. A Graphical representation for identifier structure in application logs. Ari Rabkin, Wei Xu, Avani Wildani, Armando Fox, David Patterson and Randy Katz. SLAML, October 3, 2010.


  1. UC Berkeley • A Graphical representation for identifier structure in application logs • Ari Rabkin, Wei Xu, Avani Wildani, Armando Fox, David Patterson and Randy Katz • SLAML, October 3, 2010

  2. Motivation & Summary • Log analysis is fundamentally constrained by the information content of the underlying logs • Need tools to help developers spot flaws in their logging • We propose a compact graph-based representation for log structure • Differs from previous work in analyzing logging behavior, not logs of particular executions

  3. Focus on identifiers • We focus on identifiers in logs – Variable fields that refer to entities in a system. – Can be operationally defined as variable fields with increasingly many possible strings [Xu 09] • Previous work has modeled logs as sets of concurrent state machines. [Fu 09, Tan 08] – Identifiers tie together messages that correlate to the same state machine

  4. Some defects • Imagine a transaction processing system. 3:45 Starting transaction t123 
 3:46 Transaction failed 
 3:50 Starting transaction t123 
 3:51 Finished trans that was started at 3:50.

  5. Missing IDs • Imagine a transaction processing system. 3:45 Starting transaction t123 
 3:46 Transaction failed [Annotation: No ID]
 3:50 Starting transaction t123 
 3:51 Finished trans that was started at 3:50.

  6. Inconsistent IDs • Imagine a transaction processing system. 3:45 Starting transaction t123 
 3:46 Transaction failed 
 3:50 Starting transaction t123 
 3:51 Finished trans that was started at 3:50. [Annotation: Inconsistent identification]

  7. Ambiguous IDs • Imagine a transaction processing system. 3:45 Starting transaction t123 
 3:46 Transaction failed 
 3:50 Starting transaction t123 
 3:51 Finished trans that was started at 3:50. [Annotation on this slide: Ambiguous identification]

  8. Goals • Seek a compact representation for logs • Make common logging flaws visible • Facilitate comparison across related logs • Not depend on details of particular execution traces

  9. A real example • [Figure: Hadoop datanode logs from Yahoo! M45 cluster]

  10. Definitions • Definitions: – A log message is a string. – Each log message is associated with a specific message type. – All messages of a type are structurally identical. (same set of identifier fields) – Identifiers belong to identifier classes.

  11. Assumptions • Assumptions – Have representative sample of logs – Can find message type from message – Can extract identifiers from messages – Have identifier class for each identifier field in a message type

  12. Core structure • Ex: "Starting task t123 on node n" contains a Task ID (t123) and a Host name (n) • [Figure: message type "Starting task…" linked to identifier classes "Task ID" and "Host name"] • Formally: a graph with V = {identifier classes} ∪ {message types} and E = {(i, m) | message type m includes an identifier of class i}
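The formal definition on slide 12 can be sketched in a few lines of Python. This is only an illustration, not the authors' prototype: the input format (a list of `(message_type, {identifier_class: value})` pairs), the sample values, and the name `build_graph` are all assumptions made for this example.

```python
# Hypothetical parsed input: (message_type, {identifier_class: value}) pairs.
parsed = [
    ("Starting task", {"Task ID": "t123", "Host name": "n1"}),
    ("Starting task", {"Task ID": "t124", "Host name": "n2"}),
    ("Task finished", {"Task ID": "t123"}),
]

def build_graph(messages):
    """Return (identifier_classes, message_types, edges).

    edges holds (i, m) pairs: message type m includes an identifier of class i.
    """
    id_classes, msg_types, edges = set(), set(), set()
    for msg_type, ids in messages:
        msg_types.add(msg_type)
        for id_class in ids:
            id_classes.add(id_class)
            edges.add((id_class, msg_type))
    return id_classes, msg_types, edges

id_classes, msg_types, edges = build_graph(parsed)
for id_class, msg_type in sorted(edges):
    print(f"{id_class} -- {msg_type}")
```

Because edges are a set, repeated messages of the same type add nothing new; the graph reflects logging behavior rather than a particular trace.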

  13. Subsumption • Sometimes, one identifier includes another. • Model this by adding a graph edge between two identifier classes if one includes the other. • Call this subsumption – E.g., URLs subsume host names • [Figure: edge between "Host name" and "URL"]

  14. Frequency • Can encode frequency information on diagram (legend: Rare, Medium, Common) • Scaled relative to most-frequent message or identifier • γ-correction: scale by sqrt(freq / max(freq))
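As a small worked example of the γ-correction on slide 14, the sketch below applies sqrt(freq / max(freq)) to some made-up message counts. The counts and the names `gamma_scale` and `scaled` are assumptions for illustration only.

```python
import math

def gamma_scale(freq, max_freq):
    """Scale a frequency relative to the most frequent item, with a square-root correction."""
    return math.sqrt(freq / max_freq)

# Hypothetical message counts, not taken from any real log.
counts = {"Starting task": 400, "Task finished": 100, "Abnormal failure": 1}
max_count = max(counts.values())
scaled = {name: gamma_scale(c, max_count) for name, c in counts.items()}

for name, value in scaled.items():
    print(f"{name}: {value:.2f}")
```

The square root compresses the range: a message that is 400 times rarer than the most common one still gets a value of about 0.05 rather than 0.0025, so rare items remain visible in the diagram.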

  15. Ubiquity • Can show information about joint ID- message statistics • Want to distinguish (ab)normal messages • Defn: The ubiquity of identifier class C for message type T is the fraction of identifiers belonging to class C appearing in messages of type T. • Orthogonal to frequency of message
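One possible reading of the ubiquity definition on slide 15 is: of all distinct identifier values of class C seen anywhere in the log, what fraction appear in at least one message of type T? The sketch below implements that reading. The input format, the sample data, and the function name `ubiquity` are assumptions for illustration.

```python
def ubiquity(messages, id_class, msg_type):
    """Fraction of distinct identifiers of id_class that appear in messages of msg_type."""
    all_ids, ids_in_type = set(), set()
    for mtype, ids in messages:
        value = ids.get(id_class)
        if value is None:
            continue  # this message carries no identifier of the class
        all_ids.add(value)
        if mtype == msg_type:
            ids_in_type.add(value)
    return len(ids_in_type) / len(all_ids) if all_ids else 0.0

# Hypothetical log: every task starts, only one fails abnormally.
log = [
    ("Starting task", {"Task ID": "t1"}),
    ("Starting task", {"Task ID": "t2"}),
    ("Starting task", {"Task ID": "t3"}),
    ("Starting task", {"Task ID": "t4"}),
    ("Abnormal failure", {"Task ID": "t3"}),
]

u_start = ubiquity(log, "Task ID", "Starting task")
u_fail = ubiquity(log, "Task ID", "Abnormal failure")
print(u_start, u_fail)
```

Under this reading, a message type that every task goes through has ubiquity 1.0, while a failure message touching one task in four has ubiquity 0.25, regardless of how often each message is printed.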

  16. Drawing ubiquity • Line thickness proportional to ubiquity • [Figure: "Task ID" linked to message types "Starting task…" and "Abnormal failure"]

  17. Diagramming defects • Missing ID: [Figure: nodes Message 1, Message 2] • Inconsistent IDs: [Figure: nodes Message 1, Message 2, ID 1, ID 2]
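The two defect patterns on slide 17 can also be checked mechanically on the graph. The sketch below is a heuristic reading of the slides, not a method stated by the authors: a message type with no identifier edge is flagged as missing an ID, and two message types that each have identifiers but share no identifier class are flagged as potentially inconsistent. The edge set is modeled on the transaction example from slides 4 to 7, and the class and function names are hypothetical.

```python
# Graph for the transaction example: (identifier_class, message_type) edges.
msg_types = {"Starting transaction", "Transaction failed", "Finished trans"}
edges = {
    ("Transaction ID", "Starting transaction"),
    ("Start time", "Finished trans"),  # identified by time, not by transaction ID
}

def missing_id_types(msg_types, edges):
    """Message types that are not linked to any identifier class."""
    linked = {m for _, m in edges}
    return sorted(msg_types - linked)

def classes_of(msg_type, edges):
    """Identifier classes linked to the given message type."""
    return {i for i, m in edges if m == msg_type}

def share_no_class(m1, m2, edges):
    """True if both message types carry identifiers but have no class in common."""
    c1, c2 = classes_of(m1, edges), classes_of(m2, edges)
    return bool(c1) and bool(c2) and not (c1 & c2)

print(missing_id_types(msg_types, edges))
print(share_no_class("Starting transaction", "Finished trans", edges))
```

A flagged pair is only a candidate: two message types may legitimately describe unrelated entities, so a human still has to decide whether the mismatch is a logging flaw.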

  18. Our prototype • Have a prototype that converts logs into .dot files for rendering with GraphViz • Pluggable parsers • Omit message strings; output alongside
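To give a sense of what converting a log structure graph into a .dot file might look like, here is a minimal sketch that emits Graphviz DOT text. It is not the prototype's actual output format: the node shapes, the graph name `logstructure`, and the function name `to_dot` are assumptions, and node names are assumed not to contain double quotes.

```python
def to_dot(id_classes, msg_types, edges):
    """Render the identifier/message graph as undirected Graphviz DOT text."""
    lines = ["graph logstructure {"]
    for c in sorted(id_classes):
        lines.append(f'  "{c}" [shape=ellipse];')  # identifier classes (shape is an assumption)
    for m in sorted(msg_types):
        lines.append(f'  "{m}" [shape=box];')      # message types (shape is an assumption)
    for c, m in sorted(edges):
        lines.append(f'  "{c}" -- "{m}";')
    lines.append("}")
    return "\n".join(lines)

dot_text = to_dot(
    {"Task ID", "Host name"},
    {"Starting task"},
    {("Task ID", "Starting task"), ("Host name", "Starting task")},
)
print(dot_text)
```

Saved to a file such as `graph.dot`, the output can be rendered with the standard Graphviz command `dot -Tpng graph.dot -o graph.png`. Ubiquity could be drawn by adding a `penwidth` attribute to each edge.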

  19. A real example, part 2 • [Figure: Hadoop datanode logs from Yahoo! M45 cluster]

  20. Inconsistent identifiers • [Figure: two diagrams, labeled Old and New] • Logs from Chukwa, an open-source log collection system [Boulon 08, Rabkin 10]

  21. Ambiguous identifiers Logs from SCADS, an experimental system at Berkeley

  22. Ambiguous identifiers Logs from SCADS, an experimental system at Berkeley

  23. Comparing logs • Comparing Hadoop JobTracker logs • [Figure: two diagrams, 15-node cluster at Berkeley and M45 cluster (professional management); annotation: Missing ID/message]

  24. Conclusions • Aspects of log structure can be encoded in succinct diagrams. • Our choice of representation captures: – missing identifiers, inconsistent identifiers, and ambiguous identifiers – How much detail about different topics – Ratio of routine vs peculiar messages + types • Usable on real systems, even with limited understanding of system and logs • No need for temporal information

  25. Questions?

  26. A note on parsing • I used semi-hand-written parsers. • Wrote rules to tag identifiers: – e.g., "job_..." is a job ID • Tokenized lines, identified line by token sequence + constants – Special cases for numbers • Explored using program analysis to extract messages – Came out ugly, but cleanable. – Need to fix names – Need to merge some categories
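The rule-based approach described on slide 26 (tag identifiers by pattern, then identify the message type from the remaining token sequence) might look roughly like the sketch below. The regular expressions, the class names, the `<NUM>` placeholder, and the sample line are all hypothetical; the slides only say that strings like "job_..." are treated as job IDs and that numbers get special handling.

```python
import re

# Hypothetical tagging rules: (pattern, identifier class).
ID_RULES = [
    (re.compile(r"^job_\w+$"), "Job ID"),
    (re.compile(r"^t\d+$"), "Task ID"),
]

def tag_token(token):
    """Return the identifier class for a token, 'NUM' for plain numbers, else None."""
    for pattern, id_class in ID_RULES:
        if pattern.match(token):
            return id_class
    if re.fullmatch(r"\d+", token):  # special case for numbers
        return "NUM"
    return None

def parse_line(line):
    """Split a line into (message type template, {identifier class: value})."""
    template, ids = [], {}
    for token in line.split():
        tag = tag_token(token)
        if tag is None:
            template.append(token)       # constant text helps identify the message type
        else:
            template.append(f"<{tag}>")  # variable field replaced by a placeholder
            if tag != "NUM":
                ids[tag] = token
    return " ".join(template), ids

msg_type, ids = parse_line("Starting job job_2010_0001 with 4 maps")
print(msg_type)
print(ids)
```

Lines that differ only in their variable fields collapse to the same template, which then serves as the message type.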

  27. Related work • Xu 09 • State machines • Entropy as metric?
