SLIDE 1

Automatic Event Log Abstraction to Support Forensic Investigation

Hudan Studiawan, Ferdous Sohel, Christian Payne

College of Science, Health, Engineering and Education Murdoch University, Perth, Australia The Australasian Information Security Conference (AISC 2020) Swinburne University of Technology, Melbourne, Victoria, Australia

SLIDE 2

CORE Student Travel Award

We acknowledge that we have received a CORE Student Travel Award.

SLIDE 3

Outline

  • Introduction
  • Existing Methods
  • The Proposed Method
  • Event Log Preprocessing
  • Grouping based on Word Count
  • Graph Model for Log Messages
  • Grouping with Automatic Graph Clustering
  • Extraction of Event Log Abstraction
  • Experimental Results
  • Conclusion and Future Work

SLIDE 4

Introduction

  • Abstraction of event logs is the creation of a template that contains the most common words representing all members in a group of event log entries
  • Abstraction helps forensic investigators to obtain an overall view of the main events in a log file

Input log file: auth.log

Output abstractions:
#1 Mar * * nssal * removing removable location: *
#2 Mar 8 * nssal * Invalid user * from *
#3 Mar 8 * nssal * Failed password for * from * port * ssh2 …
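As a minimal illustration of this template idea (a sketch, not the paper's exact procedure), the tokens shared by every equal-length message in a group are kept, and tokens that vary across the group become `*`:

```python
def abstract(messages):
    """Build an abstraction: keep tokens common to all equal-length
    messages, replace position-wise differing tokens with '*'."""
    tokens = [m.split() for m in messages]
    assert len({len(t) for t in tokens}) == 1, "messages must have equal word count"
    out = []
    for column in zip(*tokens):
        # A column is one token position across all messages in the group
        out.append(column[0] if len(set(column)) == 1 else "*")
    return " ".join(out)

# Hypothetical log messages for illustration (not from the paper's dataset)
print(abstract([
    "Invalid user admin from 203.0.113.7",
    "Invalid user guest from 203.0.113.9",
]))
# Invalid user * from *
```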

SLIDE 5

Existing Methods

Existing log abstraction methods require user-supplied input parameters, and identifying the best parameter values is time-consuming.

  • SLCT (Vaarandi, 2003): one mandatory parameter and 14 optional ones
  • LogCluster (Vaarandi and Pihelgas, 2015): one mandatory parameter and 26 optional ones
  • IPLoM (Makanju et al., 2012): five mandatory parameters
  • LogSig (Tang et al., 2011): one mandatory parameter
  • Drain (He et al., 2017): three mandatory parameters
  • Model training required (Thaler et al., 2017)

SLIDE 6

The Proposed Method

Raw event logs → Automatic log preprocessing → Grouping based on word count → Refine grouping with automatic graph clustering → Get the event log abstraction per cluster

SLIDE 7

Event Log Preprocessing

  • We parse the log files using the nerlogparser, a log parsing tool based on named entity recognition
  • It supports fully automatic parsing because it provides a pre-trained model
  • We then extract unique messages from the log entries

Input: Jan 18 9:31:32 victoria dhclient: DHCPACK from 1..2.2

Process: automatic parsing with the nerlogparser tool

Output: timestamp: Jan 18 9:31:32 | hostname: victoria | service: dhclient | message: DHCPACK from 1..2.2
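The nerlogparser API itself is not shown on this slide; as a rough stand-in for the parsing step, a regex sketch that splits a syslog-style line into the same four fields (the real tool uses named entity recognition with a pre-trained model instead of patterns):

```python
import re

# Illustrative only: a regex approximation of syslog parsing.
SYSLOG = re.compile(
    r"(?P<timestamp>\w{3} +\d+ [\d:]+) "
    r"(?P<hostname>\S+) "
    r"(?P<service>[^:\[]+)(?:\[\d+\])?: "
    r"(?P<message>.*)"
)

# Hypothetical entry for illustration (timestamp and IP are made up)
entry = "Jan 18 09:31:32 victoria dhclient: DHCPACK from 203.0.113.7"
fields = SYSLOG.match(entry).groupdict()
print(fields["service"], "|", fields["message"])
# dhclient | DHCPACK from 203.0.113.7
```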

SLIDE 8

Grouping based on Word Count

  • We split the discovered unique messages on the space character, then count the number of words
  • An abstraction is extracted from the always-occurring words in a group of log entries having the same length

Cluster #1:
Jan 18 9:31:32 victoria dhclient: DHCPACK from 1..2.2
Jan 18 1:56:4 victoria dhclient: DHCPACK from 1..2.2
Feb 6 13:31:12 victoria dhclient: DHCPACK from 1..2.5
Abstraction #1: * * * victoria dhclient: DHCPACK from *

Cluster #2:
Feb 6 12:56:48 victoria init: Switching to runlevel:
Jan 18 17:13:49 victoria init: Switching to runlevel: 6
Feb 6 13:3:53 victoria init: Switching to runlevel: 6
Abstraction #2: * * * victoria init: Switching to runlevel: *
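This grouping step — split each unique message on spaces and bucket by word count — can be sketched as follows (an illustration, not the paper's implementation):

```python
from collections import defaultdict

def group_by_word_count(messages):
    """Group unique messages by their number of words."""
    groups = defaultdict(list)
    for m in sorted(set(messages)):   # unique messages, stable order
        groups[len(m.split())].append(m)
    return dict(groups)

# Hypothetical messages for illustration
g = group_by_word_count([
    "DHCPACK from 203.0.113.7",
    "DHCPACK from 203.0.113.7",      # duplicate collapses away
    "Switching to runlevel: 6",
])
print(g)
```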

SLIDE 9

Graph Model for Log Messages

The log entries have very diverse vocabularies, so we need to refine the discovered groups based on string similarity.

  • We use an automatic graph-based clustering
  • Vertex: a unique message; edge: the weighted Hamming similarity between two messages

[Figure: similarity graph whose vertices are the unique messages "DHCPACK from 1..2.2", "DHCPACK from 1..2.5", "DHCPACK from 192.168.56.1", and "Sending on Socket/fallback", with weighted Hamming similarities (.83) as edge weights]
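The slide does not give the exact weighting, so here is an unweighted sketch of a position-wise Hamming similarity between two messages (the paper uses a weighted variant of this measure):

```python
def hamming_similarity(a, b):
    """Fraction of token positions on which two messages agree;
    0.0 when their word counts differ. Unweighted sketch only."""
    ta, tb = a.split(), b.split()
    if len(ta) != len(tb):
        return 0.0
    same = sum(x == y for x, y in zip(ta, tb))
    return same / len(ta)

# Hypothetical messages: 2 of 3 tokens match
print(round(hamming_similarity(
    "DHCPACK from 203.0.113.7",
    "DHCPACK from 203.0.113.9"), 2))
# 0.67
```

Pairs of messages whose similarity is high become weighted edges of the graph; dissimilar pairs stay unconnected.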

SLIDE 10

Grouping with Automatic Graph Clustering
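The clustering algorithm itself is not detailed on this slide; as a stand-in sketch (an assumption, not necessarily the paper's algorithm), one simple way to cluster such a similarity graph is to drop weak edges and take connected components:

```python
def connected_components(vertices, edges, threshold=0.5):
    """Cluster a weighted similarity graph by keeping only edges at or
    above `threshold` and returning the connected components.
    Illustrative stand-in for the paper's automatic graph clustering."""
    adj = {v: set() for v in vertices}
    for u, v, w in edges:
        if w >= threshold:            # keep only strong-similarity edges
            adj[u].add(v)
            adj[v].add(u)
    seen, clusters = set(), []
    for v in vertices:
        if v in seen:
            continue
        stack, comp = [v], set()      # depth-first traversal of one component
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters
```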

SLIDE 11

Building Micro-clusters

SLIDE 12

Extraction of Abstraction: Merging Abstractions

  • We extract an abstraction from each micro-cluster
  • Merging is needed because abstractions from different micro-clusters can be very similar to each other
  • We form pair combinations (Ai, Aj) from all abstractions to be compared
  • Two abstractions Ai and Aj are checked for merging based on the weighted Hamming similarity between them
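The pairwise comparison and merge step above can be sketched like this (a simplification: here two comparable abstractions are merged by wildcarding the positions where they differ; the paper's similarity test is the weighted Hamming measure):

```python
from itertools import combinations

def merge(a, b):
    """Merge two equal-length abstractions by replacing every
    position where they differ with '*'."""
    return " ".join(x if x == y else "*"
                    for x, y in zip(a.split(), b.split()))

abstractions = ["Invalid user * from *", "Invalid user admin from *"]
for ai, aj in combinations(abstractions, 2):
    if len(ai.split()) == len(aj.split()):   # only comparable pairs
        print(merge(ai, aj))
# Invalid user * from *
```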

SLIDE 13

Example of Merging Abstractions

Example 1:
Abstraction #1: Invalid user * from *
Abstraction #2: Invalid user admin from *

Example 2:
Abstraction #1: Invalid user * from 2.27.148.45
Abstraction #2: Invalid user * from *

SLIDE 14

Extraction of Abstraction: Final Abstractions

  • In all previous steps, we consider only the message field in a log entry
  • In the final step, we consider all other fields such as timestamp, hostname, and service name

Cluster #1:
Jan 18 9:31:32 victoria dhclient: DHCPACK from 1..2.2
Jan 18 1:56:4 victoria dhclient: DHCPACK from 1..2.2
Feb 6 13:31:12 victoria dhclient: DHCPACK from 1..2.5
Abstraction #1: * * * victoria dhclient: DHCPACK from *

Cluster #2:
Feb 6 12:56:48 victoria init: Switching to runlevel:
Jan 18 17:13:49 victoria init: Switching to runlevel: 6
Feb 6 13:3:53 victoria init: Switching to runlevel: 6
Abstraction #2: * * * victoria init: Switching to runlevel: *

SLIDE 15

Experimental Results: Datasets

  • For all datasets except DFRWS 2016, we recovered the directory /var/log/ from the forensic disk images
  • We retrieved some common log files such as authentication logs, kernel logs, and system logs

SLIDE 16

Parameter Settings

SLIDE 17

Comparison of Performance

  • IPLoM shows good performance because the bijective relationship in a group of log entries can accurately capture the most frequently occurring words
  • LogSig's clustering is performed based on a local search algorithm and can get trapped in local optima; therefore, it cannot cluster log messages precisely

SLIDE 18

Comparison of Performance

  • Drain performs well because it considers the first few words in a log entry as contributing most significantly to its abstraction. These words are used to construct a fixed-depth tree.
  • LogMine performs over-clustering for all datasets because the clustering process is conducted incrementally: if a log entry's similarity to an existing cluster representative is less than the given threshold, it is placed in a new cluster.
  • Spell employs the longest common subsequence (LCS) technique to obtain the abstractions. LCS cannot capture a potential abstraction whose common parts are separate substrings.
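To make the LCS comparison concrete, here is a small token-level LCS sketch of the kind Spell builds its templates on (an illustration, not Spell's actual code; the example messages are hypothetical):

```python
def lcs_tokens(a, b):
    """Longest common subsequence of two messages' token lists,
    computed with the classic dynamic-programming table."""
    ta, tb = a.split(), b.split()
    dp = [[[] for _ in range(len(tb) + 1)] for _ in range(len(ta) + 1)]
    for i, x in enumerate(ta):
        for j, y in enumerate(tb):
            if x == y:
                dp[i + 1][j + 1] = dp[i][j] + [x]
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j], key=len)
    return dp[-1][-1]

print(lcs_tokens("Failed password for root", "Failed password for admin"))
# ['Failed', 'password', 'for']
```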

SLIDE 19

Over-clustering vs Under-clustering

  • The most important procedure in discovering event log abstractions is the clustering step
  • If the clustering is performed well, then good abstractions will be produced
  • We need to get the best cluster composition from event logs

SLIDE 20

Conclusion and Future Work

  • This paper proposes an automatic method of event log abstraction
  • Being automatic, there is no need for a forensic investigator to supply any parameters
  • This is a significant improvement, as existing approaches either need many user inputs or require model training
  • Future work will focus on integrating the automatic abstraction with event reconstruction and anomaly detection

SLIDE 21

References

  • Risto Vaarandi. 2003. A data clustering algorithm for mining patterns from event logs. In Proceedings of the IEEE Workshop on IP Operations and Management. 119-126.
  • Risto Vaarandi and Mauno Pihelgas. 2015. LogCluster - a data clustering and pattern mining algorithm for event logs. In Proceedings of the 11th International Conference on Network and Service Management. 1-7.
  • Adetokunbo Makanju, A. Nur Zincir-Heywood, and Evangelos E. Milios. 2012. A lightweight algorithm for message type extraction in system application logs. IEEE Transactions on Knowledge and Data Engineering 24, 11 (2012), 1921-1936.
  • Liang Tang, Tao Li, and Chang-Shing Perng. 2011. LogSig: Generating system events from raw textual logs. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management. 785-794.
  • Pinjia He, Jieming Zhu, Zibin Zheng, and Michael R. Lyu. 2017. Drain: An online log parsing approach with fixed depth tree. In Proceedings of the IEEE International Conference on Web Services. 33-40.
  • Stefan Thaler, Vlado Menkovski, and Milan Petković. 2017. Towards a neural language model for signature extraction from forensic logs. In Proceedings of the 5th International Symposium on Digital Forensic and Security. 1-6.

SLIDE 22

Thank you

AISC 2020
