
Improved Hunt Seeding with Specific Anomaly Scoring

Brenden Bishop January 8, 2019

Brenden Bishop, Improved Hunt Seeding with Specific Anomaly Scoring


1 Introduction
   First things first
   Framing the problem
2 Finding Anomalies
   Density estimation
   Scoring
3 Example
4 Conclusion

First things first

New presentation, who dis?

  • My formal training was in quantitative psychology and statistics at The Ohio State University (graduated 2017)
  • Started at Columbus Collaboratory, working on a variety of projects, with quite a bit of prototyping
  • Love cyber projects because, by and large, one can actually measure all the stuff required to answer the question


Hunting

  • Hunting has become an integral component of mature cyber security operations
  • Network defenders spend a portion of their time hunting for vulnerabilities, misconfigurations, or previously unnoticed security events
  • The practice has evolved beyond grepping randomly through logs
  • Hunts can now be seeded using ML/AI/statistical models, leading to a directed search rather than a random walk

Framing the problem

Sounds simple enough, but...


Challenges

Frequent challenges when finding anomalies:

1 “Find anything strange on the network” is not sufficiently specific (neither is “Find any lateral movement”)
   Statistics requires problem identification, consideration of the available variables, and an understanding of how observations arise
2 Cyber and statistics/data science folks can talk past one another
3 Unsupervised learning is prone to a high false alarm rate; machine learning/artificial intelligence/automated inference are not immune


Addressing challenges

1 Scope problems appropriately (e.g., “Find strange outbound connections to cloud storage”)
2 Cyber and statistics/AI/ML experts must iterate collaboratively; interdisciplinary teams are best positioned to innovate
3 Turn big data into manageable data and, where possible, turn unsupervised problems into supervised ones. Collect data and validate models (practice security as a science)

The remainder of the talk essentially focuses on item three


Good news, everyone

Cyber security data is particularly well suited to statistical inference:

  • Logs are typically a census of network activity; we have the population, not a sample
  • Probability measures offer single-number summaries of all available information; anomalies are events with low probability
  • Building an anomaly scoring model is tantamount to estimating a probability distribution
  • Models can be validated during the course of regular hunting
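Because the logs are a census, the relative frequency of an event in the logs is already a probability estimate for that event. The Python sketch below illustrates the idea with a hypothetical list of (source computer, destination computer) pairs; the function names and the threshold are illustrative assumptions, and a real scoring model would condition on many more variables.

```python
from collections import Counter


def empirical_probabilities(events):
    """Relative frequency of each distinct event in a complete log census."""
    counts = Counter(events)
    total = sum(counts.values())
    return {event: n / total for event, n in counts.items()}


def rare_events(events, threshold):
    """Distinct events whose empirical probability is below threshold, rarest first."""
    probs = empirical_probabilities(events)
    rare = [(event, p) for event, p in probs.items() if p < threshold]
    return sorted(rare, key=lambda item: item[1])


# Hypothetical (source computer, destination computer) pairs
logins = [("C1", "C2")] * 6 + [("C3", "C2")] * 3 + [("C9", "C7")]
print(rare_events(logins, threshold=0.2))  # [(('C9', 'C7'), 0.1)]
```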


Some fundamentals

1 Network activity can be quantified (e.g., time, bytes sent, bytes received, protocol, connection type)
2 Quantified information can be stored in a numeric matrix, with each row representing a single multivariate observation
3 The observations are realizations from some probability distribution
4 Anomalies are aberrant rows from low-density regions
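Items 1 and 2 can be sketched as follows, assuming hypothetical connection records with `time`, `bytes_sent`, `bytes_received`, and `protocol` fields (the field names and values are illustrative, not from any particular log source). Numeric fields are copied as-is and the categorical protocol becomes one indicator column per level, so each record becomes one row of the matrix.

```python
import numpy as np

# Hypothetical connection records; field names are illustrative only
records = [
    {"time": 3600, "bytes_sent": 512, "bytes_received": 2048, "protocol": "tcp"},
    {"time": 3605, "bytes_sent": 64, "bytes_received": 64, "protocol": "udp"},
    {"time": 3610, "bytes_sent": 900000, "bytes_received": 128, "protocol": "tcp"},
]


def to_matrix(records, numeric_fields, categorical_field):
    """Build an n x p matrix: numeric fields as-is, one indicator column per category."""
    levels = sorted({r[categorical_field] for r in records})
    rows = []
    for r in records:
        numeric = [float(r[f]) for f in numeric_fields]
        indicators = [1.0 if r[categorical_field] == lvl else 0.0 for lvl in levels]
        rows.append(numeric + indicators)
    columns = list(numeric_fields) + [f"{categorical_field}={lvl}" for lvl in levels]
    return np.array(rows), columns


X, columns = to_matrix(records, ["time", "bytes_sent", "bytes_received"], "protocol")
print(columns)   # ['time', 'bytes_sent', 'bytes_received', 'protocol=tcp', 'protocol=udp']
print(X.shape)   # (3, 5)
```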

Density estimation

Estimation

  • Statisticians have been improving density estimation for around a century
  • Kernel density estimators allow nonparametric estimation of any p-dimensional probability distribution
  • In practice, though, estimation can become quite burdensome whenever p is larger than about 5
  • One promising approach that circumvents this effective dimensionality constraint is the use of vine copulas
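To make the kernel idea concrete, here is a minimal product-Gaussian kernel density estimator in Python with NumPy. It assumes Scott's rule-of-thumb bandwidth per dimension and is only an illustration of plain multivariate KDE, not the vine-copula estimator the talk relies on. Every query point is compared with every data point, and accuracy degrades quickly as p grows, which is the burden noted above.

```python
import numpy as np


def fit_kde(X):
    """Store the data and per-dimension bandwidths (Scott's rule: sigma_j * n^(-1/(p+4)))."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    h = X.std(axis=0, ddof=1) * n ** (-1.0 / (p + 4))
    return X, h


def kde_logpdf(model, Y):
    """Log of the product-Gaussian-kernel density estimate at each row of Y."""
    X, h = model
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    n, p = X.shape
    # Standardized differences between every query point and every data point: (m, n, p)
    z = (Y[:, None, :] - X[None, :, :]) / h
    log_kernel = -0.5 * np.sum(z ** 2, axis=2) - 0.5 * p * np.log(2 * np.pi) - np.sum(np.log(h))
    # log((1/n) * sum_i K_h(y - x_i)), computed stably with the log-sum-exp trick
    m = log_kernel.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_kernel - m).sum(axis=1, keepdims=True))).ravel() - np.log(n)


rng = np.random.default_rng(0)
model = fit_kde(rng.normal(size=(500, 2)))
scores = kde_logpdf(model, [[0.0, 0.0], [4.0, 4.0]])
print(scores)  # the point far from the data gets a much lower log-density
```

A constant column would produce a zero bandwidth, so such columns should be dropped before fitting.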


Vine copulas in a nutshell

  • Copulas can partition multivariate densities into the product of their marginals and a component which captures all dependencies
  • Vine copulas split the dependency portion into p(p-1)/2 bivariate copula densities, decoupling convergence speed and dimension
  • tl;dr One can estimate complicated multivariate distributions fairly accurately and quickly
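For reference, the first line below is Sklar's decomposition of a p-dimensional density into its marginals f_j (with CDFs F_j) and a copula density c. The second is the three-variable pair-copula construction from Aas et al. [2009]: for p = 3 there are 3(3-1)/2 = 3 bivariate copulas, and the conditional copula is written under the usual simplifying assumption that it does not depend on the value of x_2.

```latex
f(x_1,\dots,x_p) = c\bigl(F_1(x_1),\dots,F_p(x_p)\bigr)\prod_{j=1}^{p} f_j(x_j)

f(x_1,x_2,x_3) = f_1(x_1)\,f_2(x_2)\,f_3(x_3)\;
  c_{12}\bigl(F_1(x_1),F_2(x_2)\bigr)\;
  c_{23}\bigl(F_2(x_2),F_3(x_3)\bigr)\;
  c_{13\mid 2}\bigl(F_{1\mid 2}(x_1\mid x_2),\,F_{3\mid 2}(x_3\mid x_2)\bigr)
```

Each c is a bivariate density that can be estimated on its own (nonparametrically, in Nagler and Czado [2016]), so the estimation problem is driven by two-dimensional pieces rather than by p.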


Scoring

  • Possessing an estimate of a distribution allows one to evaluate the estimated density at novel values
  • One can assign a probability to each log record and sort low-probability events to the top
  • The rarest events can be given to a hunter, beginning iterative evaluation of the model
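Once each record has a score, ranking is the easy part. A minimal sketch, assuming the scores are log-densities already produced by some fitted model (working in logs avoids underflow when densities are tiny); `seed_hunt` and the record IDs are hypothetical.

```python
import heapq


def seed_hunt(records, scores, k):
    """Return the k records with the lowest log-density score, rarest first."""
    # Pair each score with its index so ties are broken by position, not by record content
    ranked = heapq.nsmallest(k, zip(scores, range(len(records))))
    return [(records[i], s) for s, i in ranked]


records = ["evt-a", "evt-b", "evt-c", "evt-d"]   # hypothetical record IDs
scores = [-2.1, -9.7, -3.0, -6.4]                # e.g., log-densities from a fitted model
print(seed_hunt(records, scores, k=2))           # [('evt-b', -9.7), ('evt-d', -6.4)]
```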


Raw data

  • We’ll use a subset of publicly available data from Kent [2015]
  • The full data represents 58 consecutive days of events from the Los Alamos National Laboratory corporate internal network (csr.lanl.gov/data/cyber1/)
  • The data are de-identified, even the time variable
  • Say one is looking for anomalous, successful authentication events
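Selecting successful authentication events could look like the sketch below. It assumes the authentication file is comma-separated with nine fields in the order listed in `AUTH_FIELDS` (time, source and destination user@domain, source and destination computer, authentication type, logon type, orientation, outcome); that order and the short field names are assumptions to check against the dataset page, and the two sample rows are made up for illustration.

```python
import csv
import io

# Assumed column order for the authentication file; verify against the dataset documentation
AUTH_FIELDS = [
    "time", "src_user", "dst_user", "src_computer", "dst_computer",
    "auth_type", "logon_type", "auth_orientation", "outcome",
]


def successful_auth_events(lines):
    """Yield successful authentication events as dicts keyed by AUTH_FIELDS."""
    for row in csv.reader(lines):
        if len(row) != len(AUTH_FIELDS):
            continue  # skip malformed lines
        event = dict(zip(AUTH_FIELDS, row))
        if event["outcome"] == "Success":
            event["time"] = int(event["time"])  # de-identified integer time offset
            yield event


# Made-up sample rows in the assumed format
sample = io.StringIO(
    "1,U1@DOM1,U1@DOM1,C1,C2,Kerberos,Network,LogOn,Success\n"
    "1,U2@DOM1,U2@DOM1,C3,C3,NTLM,Network,LogOn,Fail\n"
)
events = list(successful_auth_events(sample))
print(len(events))  # 1
```

Because the function is a generator, it can stream a large file line by line without loading it into memory.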


Wrangle data and analyze

  • Dummy code the login-type and authentication-type factors, and engineer any other desired features
  • The wrangled data set is 13-dimensional and binary
  • Employ a continuous convolution to allow for kernel density estimation
  • Use the kdevine or vinecopular R libraries to estimate the density
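The dummy coding and convolution steps can be sketched in Python. The idea behind continuous convolution is that adding independent continuous noise to a discrete variable yields a continuous one that kernel methods can handle, while the original value remains recoverable. Plain Uniform noise of a chosen width is used here as a simplified stand-in; Nagler [2018] analyzes a more general noise family, so the `width` parameter and the category values below are assumptions for illustration.

```python
import numpy as np


def dummy_code(values):
    """One indicator column per observed level (full dummy coding)."""
    levels = sorted(set(values))
    X = np.array([[1.0 if v == lvl else 0.0 for lvl in levels] for v in values])
    return X, levels


def continuous_convolution(X, rng, width=1.0):
    """Add independent Uniform(-width/2, width/2) noise to every (discrete) column.

    Simplified stand-in for continuous convolution: the jittered data are continuous,
    and rounding recovers the original values whenever width < 1.
    """
    return X + rng.uniform(-width / 2, width / 2, size=X.shape)


rng = np.random.default_rng(42)
logon_types = ["Network", "Interactive", "Network", "Batch"]  # hypothetical values
auth_types = ["Kerberos", "NTLM", "Kerberos", "Kerberos"]     # hypothetical values
L, logon_levels = dummy_code(logon_types)
A, auth_levels = dummy_code(auth_types)
X = np.hstack([L, A])                 # binary design matrix, one column per level
Z = continuous_convolution(X, rng)    # continuous version suitable for KDE
print(logon_levels, auth_levels, X.shape)
```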


Just that easy


Were you talking just now?

  • With minimal investment, defenders can easily build probability models for any logs they want, without being bound by existing tools
  • Models can be generated on the fly, as one-offs for a given hunt
  • Models can be refined and tuned as hunters examine the outputs, and iterative development continues
  • If at some point a model is found to have a satisfactory hit rate (its anomalies are interesting), one can create an automatic detector
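One simple way to turn a validated model into an automatic detector is to fix a log-density threshold from baseline traffic and alert on anything rarer. In the sketch below the class name, the quantile-based threshold, and `alert_rate` are assumptions for illustration, not the author's method; a standard normal log-density stands in for a fitted model's scoring function.

```python
import numpy as np


class LowDensityDetector:
    """Flag records whose log-density falls below a quantile of baseline scores.

    score_fn: any fitted model's log-density function (hypothetical interface).
    alert_rate: fraction of baseline records that would have been flagged.
    """

    def __init__(self, score_fn, baseline, alert_rate=0.001):
        self.score_fn = score_fn
        self.threshold = np.quantile(score_fn(baseline), alert_rate)

    def flag(self, records):
        """Boolean array: True where a record is rarer than the threshold."""
        return self.score_fn(records) < self.threshold


def std_normal_logpdf(x):
    """Stand-in scoring function: log-density of a standard normal."""
    x = np.asarray(x, dtype=float)
    return -0.5 * x ** 2 - 0.5 * np.log(2 * np.pi)


rng = np.random.default_rng(1)
detector = LowDensityDetector(std_normal_logpdf, rng.normal(size=10_000), alert_rate=0.01)
print(detector.flag([0.1, -0.5, 4.2]))  # only the value 4.2 is flagged
```

The `alert_rate` ties the detector's expected alert volume to the baseline, which is one practical way to respect a hunting team's capacity.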


Thank you, kindly.

References

  • K. Aas, C. Czado, A. Frigessi, and H. Bakken. Pair-copula constructions of multiple dependence. Insurance: Mathematics and Economics, 44(2):182–198, 2009.
  • A. D. Kent. Comprehensive, Multi-Source Cyber-Security Events. Los Alamos National Laboratory, 2015.
  • T. Nagler. Kernel methods for vine copula estimation. 2014.
  • T. Nagler. A generic approach to nonparametric function estimation with mixed data. Statistics & Probability Letters, 137:326–330, 2018.
  • T. Nagler and C. Czado. Evading the curse of dimensionality in nonparametric density estimation with simplified vine copulas. Journal of Multivariate Analysis, 151:69–89, 2016.
