Improvement of Log Pattern Extracting Algorithm Using Text - PowerPoint PPT Presentation

Improvement of Log Pattern Extracting Algorithm Using Text Similarity
ZHAO Yining, Computer Network Information Center, Chinese Academy of Sciences
HPBDC18, 2018/05/21

Content
- CNGrid & LARGE
- Why Log Patterns & Extracting Algorithm
- Algorithm of Identical Word Rate
- Text Similarity Based Approach
  - Improved Extracting Formation & LCS
  - Experiment Result
- Modified Log Comparing Model
- Summary & Future Work

CNGrid & LARGE
- China National HPC Environment
  - 2 operating centers (Beijing / Hefei)
  - 19 sites (200PF + 162PB)
  - Portal with micro-service architecture
  - Application-oriented global scheduling & predicting
  - Resource evaluation standard & comprehensive evaluation index

CNGrid & LARGE
- LARGE: Log Analyzing fRamework in Grid Environment

Log Patterns & Extracting Algorithm
- We want to be alerted to logs that match certain patterns, but...
  - there are too many logs for humans to read
  - patterns need to be summarized before alert rules can be defined
- A set of log patterns, in our context:
  - the patterns are different from each other
  - the patterns cover all logs in the original set
  - the set is significantly smaller than the original log set
- The process of using log patterns
  - filter out and remove frequent normal logs
  - use a log pattern extraction algorithm to get the set of patterns
  - manually check the set and pick out abnormal patterns
  - define rules to generate alerts for these patterns

Algorithm of Identical Word Rate
- Algorithm of identical word rate (IWR): a straightforward approach
  - identical words
    - two words that are identical
    - and in the same position in the two original logs
  - identical word rate
    - IWR = (number of identical words) / (total words)
    - predefined threshold t
    - if IWR is greater than t, the two logs belong to the same pattern
- Process of the IWR algorithm
  - set the threshold t and an initially empty pattern set P
  - for each new incoming log, compute its IWR with each pattern in P
  - if a pattern matches, skip to the next log; if none matches, add the log to P
- Significant limitation
  - Logs of different lengths have an IWR of ZERO!
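A minimal Python sketch of the IWR procedure above, to make the limitation concrete. Whitespace tokenization, keeping the first log of each pattern as its representative, and the function names are assumptions for illustration; the sample logs are hypothetical.

```python
from __future__ import annotations


def iwr(a: list[str], b: list[str]) -> float:
    """Identical word rate: share of positions holding the same word in both logs."""
    # Assumption: following the stated limitation, logs of different
    # lengths (or empty logs) score 0.
    if len(a) != len(b) or not a:
        return 0.0
    identical = sum(1 for x, y in zip(a, b) if x == y)
    return identical / len(a)


def extract_patterns_iwr(logs: list[str], t: float) -> list[list[str]]:
    """Return one representative token list per pattern (the first log seen)."""
    patterns: list[list[str]] = []
    for line in logs:
        words = line.split()  # assumption: whitespace tokenization
        if any(iwr(words, p) > t for p in patterns):
            continue  # matched an existing pattern: skip to the next log
        patterns.append(words)  # no pattern matched: the log starts a new one
    return patterns


# Hypothetical sample logs for illustration.
sample_logs = [
    "Connection closed by 10.0.0.1",
    "Connection closed by 10.0.0.2",
    "Connection closed by user 10.0.0.3",
]
print(len(extract_patterns_iwr(sample_logs, t=0.6)))  # 2
```

The second log matches the first (IWR = 3/4), but the third, which only adds one word, starts a new pattern because its length differs.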

Text Similarity Based Approach (1)
- Using text similarity to resolve the problem
  - $S = P \times O$
  - S: similarity, P: proportion of common words, O: order factor
- For two logs $l_1$ and $l_2$, let $L_1$ and $L_2$ be their word sets
  - define P: $P(l_1, l_2) = \frac{2\,|L_1 \cap L_2|}{|L_1| + |L_2|}$
  - define O: $O(l_1, l_2) = \frac{\mathrm{SeqSim}(l_1, l_2)}{|L_1 \cap L_2|}$
  - hence S: $S(l_1, l_2) = P \times O = \frac{2\,\mathrm{SeqSim}(l_1, l_2)}{|L_1| + |L_2|}$, since the $|L_1 \cap L_2|$ terms cancel
- In this way, logs of different lengths can be compared
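A worked example, using two hypothetical log lines and assuming SeqSim counts the common words that appear in the same relative order (here all five): $l_1$ = "session opened for user root" has $|L_1| = 5$, $l_2$ = "session opened for user root by uid 0" has $|L_2| = 8$, and $|L_1 \cap L_2| = 5$.

$$
\begin{aligned}
P(l_1, l_2) &= \frac{2 \times 5}{5 + 8} = \frac{10}{13} \approx 0.77 \\
O(l_1, l_2) &= \frac{5}{5} = 1 \\
S(l_1, l_2) &= P \times O = \frac{2 \times 5}{5 + 8} \approx 0.77
\end{aligned}
$$

IWR would score this pair 0 because the lengths differ, while S rates them as similar.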

Text Similarity Based Approach (2)
- Using the longest common subsequence (LCS) to define $\mathrm{SeqSim}(l_1, l_2)$
  - $S(l_1, l_2) = \frac{2\,|\mathrm{LCS}(l_1, l_2)|}{|L_1| + |L_2|}$
  - $l_1$ and $l_2$ are in the same pattern if $S(l_1, l_2) \ge t$, where t is the predefined threshold
- The process of the improved log pattern extracting algorithm
  - set the threshold value t, and set the initial log pattern set P to the empty set
  - for each new log l from the input log set L, compute $S_i(l, p_i)$ between l and every $p_i \in P$ using an LCS algorithm
  - if no $S_i(l, p_i) \ge t$, add l to P
  - after all logs in L have been checked, return P
- Higher time cost for a single comparison
  - but fewer total comparisons, since fewer patterns are produced
  - the extra cost can be offset by choosing a better LCS algorithm
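A minimal Python sketch of the improved extraction loop, assuming whitespace tokenization and a textbook dynamic-programming LCS over words. One assumption differs from the slide's notation: |L| is taken as the token count rather than the word-set size. The two coincide when no word repeats, and token counts keep S at or below 1 when words do repeat. Function names and sample logs are hypothetical.

```python
from __future__ import annotations


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists.

    Classic O(len(a) * len(b)) dynamic program, keeping only one previous row.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def similarity(a: list[str], b: list[str]) -> float:
    """S = 2 * |LCS(a, b)| / (|a| + |b|), with |a|, |b| taken as token counts."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # two empty logs are treated as identical
    return 2 * lcs_length(a, b) / total


def extract_patterns(logs: list[str], t: float) -> list[list[str]]:
    """A log joins a pattern when S >= t with it; otherwise it becomes a new pattern."""
    patterns: list[list[str]] = []
    for line in logs:
        words = line.split()  # assumption: whitespace tokenization
        if not any(similarity(words, p) >= t for p in patterns):
            patterns.append(words)
    return patterns


sample_logs = [
    "Connection closed by 10.0.0.1",
    "Connection closed by 10.0.0.2",
    "Connection closed by user 10.0.0.3",
]
print(len(extract_patterns(sample_logs, t=0.6)))  # 1
```

The third log, one word longer than the first, now joins the same pattern (S = 2·3/9 ≈ 0.67 ≥ 0.6). The quadratic DP is the simplest choice; a faster LCS algorithm could be substituted in `lcs_length` without touching the loop.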

Text Similarity Based Approach (3)
- Experiment result
  - [Figure: numbers of extracted patterns]

Text Similarity Based Approach (3)
- Experiment result
  - [Figure: time costs of candidate algorithms, in milliseconds]

Modified Pattern Comparing Model (1)
- The original model has a high time cost when searching for patterns
  - it has to visit every pattern until a matching one is found
- Use a hash map to accelerate matching
  - divide the pattern set into subsets by initial word
  - skip the majority of patterns, which sit in irrelevant subsets
- Matching process:
  1. get the initial word of the log
  2. hash the word
  3. find the desired subset in the hash map
  4. compare the log with the patterns in that subset
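A minimal Python sketch of the hash-map lookup, assuming each pattern is stored as a representative token list and that S is the LCS-based similarity described earlier (re-implemented here so the block is self-contained). Python's `dict` performs the hashing of steps 2 and 3. The class name `InitialWordIndex` and its `comparisons` counter are hypothetical illustration aids.

```python
from __future__ import annotations


def lcs_length(a: list[str], b: list[str]) -> int:
    # O(len(a) * len(b)) dynamic program, one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def similarity(a: list[str], b: list[str]) -> float:
    total = len(a) + len(b)
    return 2 * lcs_length(a, b) / total if total else 1.0


class InitialWordIndex:
    """Pattern set split into subsets keyed by each pattern's initial word."""

    def __init__(self, t: float) -> None:
        self.t = t
        self.subsets: dict[str, list[list[str]]] = {}
        self.comparisons = 0  # counts similarity computations, for illustration

    def match(self, log: str) -> list[str] | None:
        words = log.split()
        if not words:
            return None
        # Steps 1-3: take the initial word; the dict hashes it and finds the subset.
        subset = self.subsets.get(words[0], [])
        # Step 4: compare only with the patterns in that subset.
        for pattern in subset:
            self.comparisons += 1
            if similarity(words, pattern) >= self.t:
                return pattern
        return None

    def add(self, log: str) -> None:
        words = log.split()
        if words:
            self.subsets.setdefault(words[0], []).append(words)


index = InitialWordIndex(t=0.6)
for p in ("Connection closed by 10.0.0.1",
          "Accepted password for root from 10.0.0.1",
          "Failed password for root from 10.0.0.1"):
    index.add(p)
print(index.match("Connection closed by 10.0.0.9") is not None, index.comparisons)  # True 1
```

Only the one pattern starting with "Connection" is compared; the other two are skipped. A pattern whose first word varies would be missed by this lookup.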

Modified Pattern Comparing Model (2)
- This approach cannot deal with patterns whose initial words are not fixed
  - build a separate unfixed pattern set
- In the real system, we split the pattern set into 4 parts:
  - fixed alert pattern set
  - unfixed alert pattern set
  - fixed normal pattern set
  - unfixed normal pattern set
- When a new log arrives, it is compared against the 4 sets in turn to decide how it is processed
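A minimal Python sketch of the four-set cascade. The slides do not say how a pattern is judged fixed or unfixed, nor what happens to a log that matches no set, so here the caller picks the set when adding a pattern and `classify` returns `None` for unmatched logs. The set names as strings and the helpers `empty_sets`, `add_pattern`, and `classify` are hypothetical; S is the LCS-based similarity, re-implemented for self-containment.

```python
from __future__ import annotations


def lcs_length(a: list[str], b: list[str]) -> int:
    # O(len(a) * len(b)) dynamic program, one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def similarity(a: list[str], b: list[str]) -> float:
    total = len(a) + len(b)
    return 2 * lcs_length(a, b) / total if total else 1.0


# The four pattern sets, in the order a new log is checked against them.
ORDER = ("fixed alert", "unfixed alert", "fixed normal", "unfixed normal")


def empty_sets() -> dict:
    """Fixed sets map an initial word to patterns; unfixed sets are plain lists."""
    return {name: ({} if name.startswith("fixed") else []) for name in ORDER}


def add_pattern(sets: dict, name: str, pattern: str) -> None:
    words = pattern.split()
    if name.startswith("fixed"):
        sets[name].setdefault(words[0], []).append(words)
    else:
        sets[name].append(words)


def classify(sets: dict, log: str, t: float) -> str | None:
    """Return the name of the first set holding a matching pattern, else None."""
    words = log.split()
    for name in ORDER:
        if name.startswith("fixed"):
            # Hash-map lookup: only patterns sharing the log's initial word.
            candidates = sets[name].get(words[0], []) if words else []
        else:
            # Unfixed patterns cannot be bucketed, so scan them all.
            candidates = sets[name]
        if any(similarity(words, p) >= t for p in candidates):
            return name
    return None


example = empty_sets()
add_pattern(example, "fixed alert", "Failed password for root from 10.0.0.1")
add_pattern(example, "unfixed normal", "node01: job finished")
print(classify(example, "node07: job finished", t=0.6))  # unfixed normal
```

Checking the alert sets first means a log matching both an alert and a normal pattern is reported as an alert; whether the real system relies on this ordering is an assumption.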

Modified Pattern Comparing Model (3)
- Real time cost comparison between the original & modified models
  - [Figure: four bar charts (cron, maillog, secure, messages), time cost in milliseconds, original model vs. modified model]

Summary & Future Work
- Log patterns are used as the basis for log recognition
- The IWR algorithm cannot match logs of different lengths
- Text similarity and LCS are used to improve the extraction algorithm
- The log comparing model is modified to accelerate the matching process
- Future work: log pattern based analyses in CNGrid
  - log pattern associations
  - log flow feature modeling
