David Rodriguez March 28, 2019
Masquerading Malicious DNS Traffic Bayesian Inference, Rainier, Spark
The Outline
Part 1 Masquerading DNS Traffic
Part 2 Time Series Modeling
Part 3 Rainier + Spark
Part 4 Anomaly Detection
Cisco Umbrella DNS Resolution
Part 1 DNS Resolution
DNS Records: Web Server IP Address, Mail Server, Many More
Cisco Umbrella: 180 Billion Per Day
Part 1 Protection 101
Phishing, Compromised Account, Malvertising, Ransomware, Worms, Virus
Part 1 Definition
Masquerading Traffic = Masquerading Users + Compromised Websites
Part 1 Masquerading Users
PDF Viewer, Text Editor, Browsing Internet, Email, SSH Keys
Part 1 Compromised Websites
Compromised Server, Malicious Webpage, Backdoor, Vulnerability, Typical Visitors, Phished, Browser Redirect
Part 1 Masquerading DNS Traffic
Atypical Visitor, Typical Visitor, DNS Traffic
Part 1 Emotet Campaign
Phishing Email
User Clicks Links or Opens Attachments to Email
Links or Macros Make DNS Requests
Malware Downloaded
Emotet Runs Code in Process and Registers Computer with C2 Server
Masquerading Traffic
Part 1 Emotet Campaign
Part 2 Time-Series Analysis
Expected Non-Zero Volume, Expected Zero Volume, Extreme Outliers
Part 2 Time-Series Analysis
Probability of Demand, Expected Demand when non-zero
Part 2 Croston’s Method
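Croston's method treats an intermittent series as two pieces, the probability of demand in a period and the expected demand when it is non-zero. The sketch below is a minimal plain-Python version of the textbook method, not the pipeline used in the talk. The function name `croston`, the smoothing weight `alpha`, and the example counts are assumptions made for illustration.

```python
def croston(demand, alpha=0.1):
    """Minimal Croston sketch (illustrative, not the talk's implementation).

    Returns (probability_of_demand, expected_nonzero_demand, forecast_per_period).
    """
    size = None        # smoothed size of non-zero demand
    interval = None    # smoothed number of periods between non-zero demands
    periods_since = 1  # periods elapsed since the last non-zero demand

    for value in demand:
        if value > 0:
            if size is None:
                # Initialise both estimates from the first non-zero demand.
                size, interval = float(value), float(periods_since)
            else:
                # Exponential smoothing, updated only when demand occurs.
                size += alpha * (value - size)
                interval += alpha * (periods_since - interval)
            periods_since = 1
        else:
            periods_since += 1

    if size is None:
        # No demand was ever observed.
        return 0.0, 0.0, 0.0

    probability = 1.0 / interval
    return probability, size, size * probability


# Hypothetical hourly query counts for one domain.
hourly_queries = [0, 0, 4, 0, 0, 4, 0, 0, 4]
probability, nonzero_level, forecast = croston(hourly_queries, alpha=0.5)
```

Both estimates are updated only in periods with non-zero volume, which keeps long runs of zeros from dragging the expected non-zero level toward zero.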
Spark Volume Pipeline: Spark Table, Join, Spark Historical Table, Spark Table
Note: Trended Data Store
Part 2 Bayesian Approach
Probability Distribution, Probability of Demand, Expected Demand when non-zero
Part 2 Bayesian Approach
[Figure: Zero Distribution, Non-Zero Distribution, Outliers]
Part 2 Mixture Models
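One common way to combine a zero distribution with a non-zero distribution is a zero-inflated count model. The slides do not say which distributions were used, so the Poisson component below is an assumption. `zip_pmf`, `tail_probability`, `zero_weight`, and `rate` are hypothetical names chosen for this sketch.

```python
import math


def zip_pmf(k, zero_weight, rate):
    """Zero-inflated Poisson pmf: extra mass at zero plus a Poisson component.

    zero_weight is the mixture weight of the zero distribution (assumed name),
    rate is the Poisson mean of the non-zero distribution (assumed name).
    """
    poisson = math.exp(-rate) * rate ** k / math.factorial(k)
    if k == 0:
        return zero_weight + (1.0 - zero_weight) * poisson
    return (1.0 - zero_weight) * poisson


def tail_probability(k, zero_weight, rate):
    """P(X >= k) under the mixture; small values flag candidate outliers."""
    return 1.0 - sum(zip_pmf(i, zero_weight, rate) for i in range(k))


# Hypothetical parameters: 70% of hours are structurally zero,
# and active hours average 3 queries.
surprise = tail_probability(12, zero_weight=0.7, rate=3.0)
```

A count whose tail probability is very small under the fitted mixture is a candidate outlier, in the sense of the regions outside both distributions.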
Part 2 Discrete Models
Part 2 Continuous Models
Part 3 MCMC Methods
Observations, Proposed Distribution, Sampling From Distribution, Rejection of Samples
Part 3 MCMC Methods
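The propose, sample, and reject loop can be sketched with a random-walk Metropolis sampler. This is a generic illustration, not Rainier's sampler. The Poisson likelihood, the flat prior on the rate, and the names `log_posterior`, `metropolis`, and `step` are assumptions.

```python
import math
import random


def log_posterior(rate, observations):
    """Poisson log-likelihood (up to a constant) with a flat prior on rate > 0."""
    if rate <= 0:
        return float("-inf")
    return sum(observations) * math.log(rate) - len(observations) * rate


def metropolis(observations, n_samples=5000, step=0.5, start=1.0, seed=0):
    """Random-walk Metropolis: propose, then accept or reject each sample."""
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current, observations)
    samples = []
    for _ in range(n_samples):
        # Proposed distribution: a Gaussian centred on the current value.
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, observations)
        # Accept with probability min(1, posterior ratio); otherwise reject
        # the proposal and keep the current value.
        accept_probability = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_probability:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


# Hypothetical daily counts; discard early samples as burn-in.
counts = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3]
draws = metropolis(counts)[1000:]
posterior_mean = sum(draws) / len(draws)
```

Rejected proposals repeat the current value in the chain, which is what makes the retained samples follow the posterior rather than the proposal distribution.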
Part 3 Rainier ~ README
Depending on your background, you might think of Rainier as aspiring to be either “Stan, but on the JVM” or “TensorFlow, but for small data”.
Part 3 Rainier Methods
Part 3 PyMC Methods
Part 3 Rainier + Spark
JVM: Rainier, Spark
Part 3 Rainier + Spark
Spark Jobs: Hourly Aggregations, Daily Aggregations, Rainier Simulations
150 Million Paid-Level Domains
Filtering Heuristics
Part 4 Window Based
Window 1 Window 2
Rainier
Window 1 Window 2
Simulated Parameter Values, Distribution of Parameter Values Difference
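The window comparison can be illustrated by drawing parameter values for each window and examining the distribution of their difference. The sketch below swaps Rainier's simulations for a conjugate Gamma-Poisson posterior so it runs with the standard library alone. The prior values, the function names, and the example counts are all assumptions.

```python
import random


def posterior_rate_samples(counts, n_samples, rng, prior_shape=1.0, prior_rate=1.0):
    """Draw Poisson-rate samples from the conjugate Gamma posterior.

    With a Gamma(prior_shape, prior_rate) prior and Poisson counts, the
    posterior is Gamma(prior_shape + sum(counts), prior_rate + len(counts)).
    """
    shape = prior_shape + sum(counts)
    rate = prior_rate + len(counts)
    # random.gammavariate takes (shape, scale), and scale = 1 / rate.
    return [rng.gammavariate(shape, 1.0 / rate) for _ in range(n_samples)]


def probability_of_increase(window_1, window_2, n_samples=4000, seed=0):
    """Share of simulated differences (window 2 minus window 1) above zero."""
    rng = random.Random(seed)
    samples_1 = posterior_rate_samples(window_1, n_samples, rng)
    samples_2 = posterior_rate_samples(window_2, n_samples, rng)
    differences = [b - a for a, b in zip(samples_1, samples_2)]
    return sum(d > 0 for d in differences) / n_samples


# Hypothetical daily query counts for one domain in two consecutive weeks.
week_1 = [2, 3, 2, 4, 3, 2, 3]
week_2 = [9, 11, 10, 12, 9, 10, 11]
score = probability_of_increase(week_1, week_2)
```

A score near 0.5 means the two windows look alike, while a score near 1 means the second window's rate is almost certainly higher, which is the kind of shift an anomaly detector would flag.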
Part 4 Window Simulations
Week 1, Week 2, Week 3, Week 4
Part 4 Outlier Window
Part 4 Local Outlier to Global
Closing Recap
Masquerading DNS Traffic, Time Series Modeling, Rainier + Spark, Anomaly Detection
Closing Glossed Over Details
Outliers, Goodness of Fit
A Review of Croston's method for intermittent demand forecasting
https://www.researchgate.net/publication/254044245_A_Review_of_Croston's_method_for_intermittent_demand_forecasting
Rainier
https://github.com/stripe/rainier
PyMC3
https://docs.pymc.io/
Emotet
https://www.us-cert.gov/ncas/alerts/TA18-201A
Bokeh Plots
https://bokeh.pydata.org/en/latest/
Twitter Chill
https://github.com/twitter/chill
Closing References
Website
davidrdgz.github.io
Github
@davidrdgz
@davidrdgz
davrodr3 at cisco.com