
Uncovering Priority Anomalies Using Pattern Discovery as a Roadmap for Contextual Analysis
Thomas Henretty, henretty@reservoir.com
Reservoir Labs, New York, NY, www.reservoir.com
FloCon 2020, Savannah, GA, 9 January 2020

Presentation Outline
Part 1: Background
• Tensor Decomposition Basics
• Pattern Discovery in Network Flows
• MITRE ATT&CK Framework
Part 2: Anomaly Ranking
• Decompositions as Documents
• Topic Modeling for Anomaly Ranking
• Other Techniques
Part 3: Graphs and Databases
• Constructing a Targeted Query
Pattern Discovery: tensor decomposition provides a model for Zeek log data that allows behaviors to be separated as coherent patterns.

PART 1: BACKGROUND

Tensors: Representing Multidimensional Data
Real-world data is:
• Multidimensional
• Heterogeneous
• Large
• Sparse
[Figure: example tensors]
  – Email data: Sender x Receiver x Keyword x Time period
  – Network traffic data: Src x Dest x Time
  – Physical access data: Person x Location x Time
  – Environmental sensor monitoring: Time x Location x Type (voltage, light, humidity, temperature)
  – Social network graph: Person x Person x Relation
(Image source: Wikipedia)

Basic CP Tensor Decomposition
• CP tensor decomposition
  – Multidimensional analog of matrix factorization
  – Breaks a tensor into R components
  – Components represent (quantitatively) correlated data
  – The tensor can be reconstructed from a subset of the components
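A CP decomposition with R components approximates a three-way tensor X as X ≈ Σ_r λ_r (a_r ∘ b_r ∘ c_r), a sum of R weighted outer products of factor vectors. The minimal pure-Python sketch below covers only the reconstruction step, to show how keeping a subset of components rebuilds part of the tensor. The names `cp_reconstruct`, `weights`, and `factors` are hypothetical; fitting the decomposition itself (e.g., by alternating least squares) is assumed to be done by a tensor library and is not shown.

```python
from itertools import product


def cp_reconstruct(weights, factors, components=None):
    """Rebuild a dense 3-way tensor from CP components (illustrative sketch).

    weights:    list of R scalars (lambda_r)
    factors:    (A, B, C), each a list of R factor vectors;
                A[r] has length I, B[r] length J, C[r] length K
    components: optional iterable of component indices to keep;
                None keeps all R components
    Returns X as nested lists with X[i][j][k] = sum_r w_r * A[r][i] * B[r][j] * C[r][k].
    """
    A, B, C = factors
    keep = range(len(weights)) if components is None else components
    I, J, K = len(A[0]), len(B[0]), len(C[0])
    X = [[[0.0] * K for _ in range(J)] for _ in range(I)]
    for r in keep:
        # Add the weighted outer product a_r o b_r o c_r for component r.
        for i, j, k in product(range(I), range(J), range(K)):
            X[i][j][k] += weights[r] * A[r][i] * B[r][j] * C[r][k]
    return X
```

Passing `components=[0]` reconstructs the tensor from the first component alone, which is how a single pattern can be isolated from the rest of the traffic.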

Example Component: Suspicious DNS Traffic
[Figure: component over Time x Source IP x Destination IP x Port]

Tensor Library for Cybersecurity

Tensor Decompositions in MITRE ATT&CK
Relevant techniques in the MITRE ATT&CK framework
• Depend on the data decomposed
• Focus on network flows
  – Netflow: techniques detected via the Netflow/Enclave Netflow data source
  – Zeek logs: Netflow + Network Protocol Analysis + Network Intrusion Detection
Relevant tactics when decomposing Zeek logs (techniques covered out of total):
• Initial Access (3 of 11 techniques)
• Execution (3 of 34)
• Persistence (5 of 62)
• Privilege Escalation (1 of 32)
• Defense Evasion (5 of 69)
• Credential Access (3 of 21)
• Discovery (4 of 23)
• Lateral Movement (4 of 18)
• Collection (0 of 13)
• Command and Control (20 of 22)
• Exfiltration (3 of 9)
• Impact (4 of 16)
Coverage increases substantially when host data (e.g., Sysflow, Event Log, …) is added.
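The per-tactic counts above can be tallied to get an overall picture of Zeek-log coverage. The sketch below copies the slide's numbers into a table; the names `ZEEK_COVERAGE`, `coverage_summary`, and `tactics_by_coverage` are hypothetical. One caveat: ATT&CK assigns some techniques to more than one tactic, so the summed counts are (tactic, technique) pairs, not unique techniques.

```python
# Per-tactic (covered, total) technique counts for Zeek-log tensor
# decompositions, copied from the slide above.
ZEEK_COVERAGE = {
    "Initial Access": (3, 11),
    "Execution": (3, 34),
    "Persistence": (5, 62),
    "Privilege Escalation": (1, 32),
    "Defense Evasion": (5, 69),
    "Credential Access": (3, 21),
    "Discovery": (4, 23),
    "Lateral Movement": (4, 18),
    "Collection": (0, 13),
    "Command and Control": (20, 22),
    "Exfiltration": (3, 9),
    "Impact": (4, 16),
}


def coverage_summary(table):
    """Sum covered and total counts across tactics.

    ATT&CK maps some techniques to several tactics, so these sums count
    (tactic, technique) pairs rather than unique techniques.
    """
    covered = sum(c for c, _ in table.values())
    total = sum(t for _, t in table.values())
    return covered, total, covered / total


def tactics_by_coverage(table):
    """Tactic names sorted from highest to lowest fraction covered."""
    return sorted(table, key=lambda name: table[name][0] / table[name][1],
                  reverse=True)
```

With these counts, `coverage_summary` gives 55 of 330 pairs (about 16.7%), and `tactics_by_coverage` puts Command and Control (20 of 22) first and Collection (0 of 13) last.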

Tensor Decomposition Coverage in ATT&CK
Covered: data can be converted to tensors, decomposed, and anomalies identified
[Figure legend: covered by Zeek log tensor decompositions; covered by host data tensor decompositions]

Example Detection of an ATT&CK Technique
Tactic and Technique
• Discovery – Network Service Scanning
Context
• SCinet 2019: the network for the Supercomputing conference
• All IP addresses public (no firewalls)
• No authentication / authorization
• ~8 million flows per hour
Details
• Large number of external hosts scanning SCinet
• ~176K flows on port 23
• Potential coordination
• Scan evaded other scan detection tools
[Figure: scanning occurred over one hour on port 23, with many scanners outside SCinet and many targets inside SCinet]

PART 2: ANOMALY DETECTION

Need to Automate Anomaly Detection
• Often 100+ components are needed to characterize network traffic
• Most components are benign
• Each component can take minutes or hours to investigate manually
• Components are trailheads for further investigation, but which components are interesting?
• The challenge is to identify and rank the components that represent anomalous behavior

Topic Modeling for Component Classification
Latent Dirichlet Allocation (LDA)
• Well-known Bayesian topic modeling algorithm
• Learns a topic model from a corpus of documents
• Infers the topic mixture of new documents
• Supports online updates of the topic model
• Commonly used in other applications
  – Bioinformatics
  – Image, video, and sound processing
  – Collaborative filtering
Mapping tensor decompositions to LDA concepts
• Component (as vector) = "document"
• Label = "word"
• Score = "word count"
• Topic = recognizable pattern of network behavior
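The mapping above (component = document, label = word, score = word count) can be sketched as a small conversion step. This assumes nonnegative component scores (e.g., from a nonnegative CP decomposition) and a hypothetical `scale` factor to turn real-valued scores into the integer counts LDA expects; the function names are illustrative, and the LDA fit itself is left to whatever topic-modeling library is in use.

```python
def component_to_document(component, scale=100):
    """Convert one decomposition component into an LDA bag of words.

    component: dict mapping (mode, label) -> nonnegative score, e.g.
               {("src_ip", "1.2.3.4"): 0.8, ("dst_port", "53"): 0.6}
    scale:     hypothetical factor turning real-valued scores into counts
    Returns a dict mapping word -> positive integer count.
    """
    doc = {}
    for (mode, label), score in component.items():
        count = int(round(score * scale))
        if count > 0:
            # Prefix the mode so the same IP as source vs. destination
            # becomes two distinct words.
            doc[f"{mode}={label}"] = count
    return doc


def to_token_list(doc):
    """Expand word counts into a flat token list, a common LDA input shape."""
    return [word for word, n in sorted(doc.items()) for _ in range(n)]
```

Labels whose scaled score rounds to zero are dropped, so near-zero factor entries do not add noise words to the document.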

LDA Dominant Topic Approach

Hierarchical LDA Approach
• Learn topics in a tree
  – Coarse-grained behavior at the root, fine-grained behavior at the leaves
  – A topic is a weighted mixture of root-to-leaf paths in the tree
• Otherwise the same approach as dominant topic

Limitations of Dominant Topic Approaches

Component Reconstruction Approach
• Addresses mathematical limitations of the dominant topic approach
• Infer topic mixtures for unseen components and reconstruct them from the known topics
• Compare each reconstruction to the unseen component and rank by reconstruction error
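A minimal sketch of the ranking step, assuming an LDA library has already produced the known topic-word distributions and the inferred topic mixture (theta) for each unseen component. The reconstruction is the theta-weighted sum of topic distributions; the choice of Euclidean (L2) error and all function names here are assumptions, not the presentation's stated metric.

```python
import math


def reconstruction_error(doc_counts, theta, topics):
    """L2 distance between a component's normalized word distribution and
    its reconstruction from known topics.

    doc_counts: dict word -> count (the unseen component as a document)
    theta:      list of K topic weights inferred for this document
    topics:     list of K dicts word -> probability (known topics)
    """
    total = sum(doc_counts.values())
    observed = {w: c / total for w, c in doc_counts.items()}
    vocab = set(observed)
    for topic in topics:
        vocab |= set(topic)
    err = 0.0
    for w in vocab:
        # Reconstructed probability of word w under the topic mixture.
        recon = sum(th * topic.get(w, 0.0) for th, topic in zip(theta, topics))
        err += (observed.get(w, 0.0) - recon) ** 2
    return math.sqrt(err)


def rank_by_reconstruction_error(docs, thetas, topics):
    """Indices of unseen components, worst-reconstructed (most anomalous) first."""
    errors = [reconstruction_error(d, th, topics) for d, th in zip(docs, thetas)]
    return sorted(range(len(docs)), key=lambda i: errors[i], reverse=True)
```

A component built from words no known topic explains (for example, a brand-new port or host) reconstructs poorly and rises to the top of the ranking.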

Decomposition Difference Approach
• Compute a similarity matrix between current and historical decomposition components
• A component that is dissimilar to every historical component represents anomalous behavior
• Rank by maximum similarity: the lower a component's best match, the more anomalous it is
Example similarity matrix (rows: unseen components; columns: historical components)
             Hist 1  Hist 2  Hist 3  Hist 4  Hist 5
  Unseen 1    .00     .01     .04     .01     .99
  Unseen 2    .95     .02     .01     .00     .02    (matches a historical component)
  Unseen 3    .00     .01     .00     .00     .03    (does not match any historical component)
  Unseen 4    .02     .98     .05     .03     .01
  Unseen 5    .00     .02     .01     .97     .01
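The approach can be sketched in a few lines. The slide does not name a similarity measure, so cosine similarity between components flattened into vectors is an assumption here, as are the function names. The usage below reuses the example matrix, where Unseen 3 (index 2) has the lowest best match.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_matrix(current, historical):
    """Rows: current (unseen) components; columns: historical components."""
    return [[cosine(c, h) for h in historical] for c in current]


def rank_by_max_similarity(sim):
    """Row indices ordered most anomalous first (lowest best match)."""
    best = [max(row) for row in sim]
    return sorted(range(len(sim)), key=lambda i: best[i])


example = [
    [.00, .01, .04, .01, .99],
    [.95, .02, .01, .00, .02],
    [.00, .01, .00, .00, .03],
    [.02, .98, .05, .03, .01],
    [.00, .02, .01, .97, .01],
]
print(rank_by_max_similarity(example))  # [2, 1, 4, 3, 0]
```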

Approximate Convex Hull Approach
• Compute the approximate convex hull of historical decomposition components
• If a component is a convex combination (nonnegative weights summing to one) of historical components, it lies inside the hull, and we have seen all aspects of the behavior it represents
• Identify anomalous components outside the hull and compute their distance to the hull
• Rank by distance to the hull
[Figure: convex hull of known components; points inside the hull are known behavior, points outside are anomalous behavior]
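Explicitly building a convex hull is impractical for high-dimensional component vectors, so one way to approximate the distance to the hull is to solve the projection problem iteratively. The sketch below swaps in the Frank-Wolfe algorithm with exact line search, which only needs the historical components as vertices; this is an assumed technique, not necessarily the one the presentation uses, and `distance_to_hull` is a hypothetical name. A distance near zero means the component lies inside the hull.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def distance_to_hull(x, vertices, max_iter=1000, tol=1e-12):
    """Approximate Euclidean distance from x to the convex hull of vertices.

    Minimizes f(p) = 0.5 * ||p - x||^2 over the hull with Frank-Wolfe:
    each step moves toward the vertex that best decreases f.
    """
    # Start from the historical component nearest to x.
    p = list(min(vertices, key=lambda v: math.dist(v, x)))
    for _ in range(max_iter):
        r = [pi - xi for pi, xi in zip(p, x)]  # gradient of f at p
        # Linear minimization oracle: vertex minimizing <v, gradient>.
        s = min(vertices, key=lambda v: _dot(v, r))
        d = [si - pi for si, pi in zip(s, p)]
        gap = -_dot(r, d)  # Frank-Wolfe duality gap; <= tol means converged
        if gap <= tol:
            break
        # Exact line search along d, clipped to stay inside the hull.
        gamma = min(1.0, gap / _dot(d, d))
        p = [pi + gamma * di for pi, di in zip(p, d)]
    return math.dist(p, x)
```

For the triangle with vertices (0, 0), (1, 0), (0, 1), the point (1, 1) is outside the hull at distance √0.5 ≈ 0.707, while (0.2, 0.2) is inside and its estimated distance approaches zero as iterations increase.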

Epsilon Ball Approach
• Treat each component as a vector and compare it to historical components
• Count the historical components inside a hypersphere of radius E around it
• Rank by that count (fewer historical neighbors means more anomalous)
[Figure: examined component with a radius-E hypersphere; historical components inside the sphere indicate known behavior, an empty sphere indicates anomalous behavior]
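A direct sketch of the counting and ranking steps, assuming Euclidean distance between component vectors; `eps` plays the role of the radius E, and its value is a tuning choice the slide leaves open. The function names are hypothetical.

```python
import math


def epsilon_ball_counts(examined, historical, eps):
    """For each examined component, count historical components within distance eps."""
    return [sum(1 for h in historical if math.dist(c, h) <= eps)
            for c in examined]


def rank_by_epsilon_ball(examined, historical, eps):
    """Indices of examined components, fewest neighbors (most anomalous) first."""
    counts = epsilon_ball_counts(examined, historical, eps)
    return sorted(range(len(examined)), key=lambda i: counts[i])
```

This brute-force version compares every pair; for large histories a spatial index would be the natural replacement, but the ranking logic stays the same.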

Comparison of Anomaly Detection Approaches
Approach | Execution time | Parametric | Detects anomalous variations of historical behavior | Detects anomalous behavior unrelated to historical behavior
LDA – Dom Topic | High | Yes | Yes | No
HLDA – Dom Topic | High | No | Yes | No
LDA – Component Reconstruct | High | Yes | Yes | Yes
HLDA – Component Reconstruct | High | No | Yes | Yes
Decomp Diff | Low | Yes | Somewhat | Yes
Approximate Convex Hull | Low | No | No | Yes
Epsilon Ball | Low | Yes | Somewhat | Yes

PART 3: GRAPHS AND DATABASES

Graphs and Databases in Context
• Components tell only a small part of the story
  – E.g., timestamp, source IP, destination IP
  – Example: a component represents beaconing behavior between two IP addresses. Is it C2 traffic? Hourly batch jobs? Hourly log transfers?
• More information is necessary to make a malicious / benign decision
  – E.g., user, asset type, network topology, known behaviors, threat intel, …
  – The needed information is stored in an external DB / graph / … or as enriched data in a SIEM
• Use the anomalous component as a trailhead into the investigation
  – Generate targeted queries to provide context and assist decision making
  – Massively reduces the scope of graph / database analysis

Generating Targeted Queries
• Use component labels with nonzero scores to generate a "WHERE" clause
  – E.g., "SELECT * WHERE ts=(00:00, 01:00, …), src_ip=1.2.3.4, dst_ip=5.6.7.8"
  – Example component: beaconing between two IP addresses (C2 traffic? hourly batch jobs? hourly log transfers?)
• Problem: the data was binned before conversion to a tensor
• Solution, part 1: generate backtracking data when building the tensor
  – Map tensor entries to lines in the original log
• Solution, part 2: reconstruct into a tensor and get the subset of relevant log entries
  – Original entries provide more context: exact timestamps, flow IDs, …
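The backtracking idea can be sketched as an index built alongside the binned tensor: every tensor entry remembers which original log lines fell into it. The record layout (line number, epoch timestamp, source IP, destination IP), the one-hour bin, and the function names are assumptions for illustration. Part 2 is simplified here to selecting entries whose labels all have nonzero scores in the component, rather than a full reconstruction.

```python
from collections import defaultdict


def build_tensor_with_backtracking(records, bin_seconds=3600):
    """Bin flow records into a sparse count tensor, remembering source lines.

    records: iterable of (line_no, ts_epoch_seconds, src_ip, dst_ip)
    Returns (tensor, backtrack):
      tensor:    dict (time_bin, src_ip, dst_ip) -> flow count
      backtrack: dict (time_bin, src_ip, dst_ip) -> list of log line numbers
    """
    tensor = defaultdict(int)
    backtrack = defaultdict(list)
    for line_no, ts, src, dst in records:
        key = (int(ts // bin_seconds), src, dst)
        tensor[key] += 1
        backtrack[key].append(line_no)
    return dict(tensor), dict(backtrack)


def lines_for_component(backtrack, time_bins, src_ips, dst_ips):
    """Original log lines for entries whose labels all appear in the component."""
    hits = []
    for (t, s, d), lines in backtrack.items():
        if t in time_bins and s in src_ips and d in dst_ips:
            hits.extend(lines)
    return sorted(hits)
```

The returned line numbers point back to the unbinned log entries, which carry the exact timestamps and flow IDs that binning discarded.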

Generating Targeted Queries
• Use enriched data to filter false positives
  – E.g., "SELECT * WHERE ts=(00:00, 01:00, …), src_ip=1.2.3.4, dst_ip=5.6.7.8 AND src_ip NOT 'batch_server' AND src_ip NOT 'log_transfer_hourly'"
• Issue further queries based on the results of the targeted query
  – Query within the returned data, or use it as a guide for further focused queries
• A targeted query massively reduces the amount of graph / DB / SIEM data to investigate
  – Not "boiling the ocean" by running analytics over the entire graph / DB / SIEM
  – Tensor decompositions are highly optimized and run on ten-billion-scale logs in reasonable time (high minutes / low hours)
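A runnable version of the slide's pseudo-SQL might generate a parameterized query from the component's nonzero labels plus enrichment-based exclusions. The sketch below uses Python's built-in `sqlite3` module for demonstration; the table name `conn_log` and the columns `ts_bin`, `src_ip`, `dst_ip`, and `src_role` (an enrichment tag standing in for the slide's "batch_server" and "log_transfer_hourly" labels) are hypothetical. Values are passed as `?` parameters rather than pasted into the SQL string; table and column names are assumed to come from trusted code.

```python
import sqlite3


def build_targeted_query(table, filters, exclude=None):
    """Build a parameterized SELECT from a component's nonzero labels.

    filters: dict column -> list of label values with nonzero score
    exclude: optional dict column -> list of values to filter out
             (e.g., hosts that enrichment tags as known batch servers)
    Returns (sql, params) for use with a DB-API cursor.
    """
    clauses, params = [], []
    for col, values in filters.items():
        clauses.append(f"{col} IN ({', '.join('?' * len(values))})")
        params.extend(values)
    for col, values in (exclude or {}).items():
        clauses.append(f"{col} NOT IN ({', '.join('?' * len(values))})")
        params.extend(values)
    where = " AND ".join(clauses) if clauses else "1=1"
    return f"SELECT * FROM {table} WHERE {where}", params


# Usage with a small in-memory table of binned connection records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conn_log (ts_bin TEXT, src_ip TEXT, dst_ip TEXT, src_role TEXT)")
conn.executemany("INSERT INTO conn_log VALUES (?, ?, ?, ?)", [
    ("00:00", "1.2.3.4", "5.6.7.8", "workstation"),
    ("01:00", "1.2.3.4", "5.6.7.8", "workstation"),
    ("00:00", "10.0.0.5", "5.6.7.8", "batch_server"),
    ("02:00", "1.2.3.4", "5.6.7.8", "workstation"),
])
sql, params = build_targeted_query(
    "conn_log",
    {"ts_bin": ["00:00", "01:00"], "src_ip": ["1.2.3.4", "10.0.0.5"], "dst_ip": ["5.6.7.8"]},
    exclude={"src_role": ["batch_server", "log_transfer_hourly"]},
)
rows = conn.execute(sql, params).fetchall()
```

Here the batch server's row is filtered out by the enrichment tag and the 02:00 row falls outside the component's time bins, leaving two rows for the analyst to examine.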
