FIGHTING DOMAIN GENERATION ALGORITHMS (DGAS) WITH MACHINE LEARNING




slide-1
SLIDE 1

GPU Technical Conference: Spring 2018 – San Jose, CA Speakers: Greg McCullough and Aaron Sant-Miller

MARCH 28, 2018

FIGHTING DOMAIN GENERATION ALGORITHMS (DGAS) WITH MACHINE LEARNING

Collaboration space, Alexandria, VA

slide-2
SLIDE 2

CYBER ATTACKS ARE HARD TO DETECT AND REQUIRE MULTIPLE MODELS, INFORMED BY CYBER EXPERTISE


Booz Allen Hamilton

The Challenges

  • 1. Increasing reliance on IT systems, and the development of new systems, expands the attack surface every day.
  • 2. The cyber domain and our adversaries are rapidly evolving, where the defenses of yesterday are quickly outdated.
  • 3. The technical depth of the domain is significant, demanding high end technical talent to just understand the problem.

Today, the average cyber breach is detected more than 250 days after the intrusion. That leaves adversaries 250 days to steal data, compromise the network, and create more open attack vectors to disrupt the mission.

Booz Allen’s Cyber Precog: Network speed alerting through cyber-informed ML model ensembling

  • 1. Optimized DL Edge Models – live at the edge, examine all DNS traffic, and flag logs that may have a malicious domain
  • 2. Bayesian Behavioral Models – develop behavioral baselines for endpoints, and alert analysts when an endpoint navigates to a dangerous domain and deviates from its established behavioral baseline

Effective cyber defense with machine learning and automation is not built on data science skill alone. Cyber expertise must be fused with data science and software development tradecraft.

Our DGA use case: Proven, deployed, and operational capability and service offering. We’ll walk through an adware campaign we caught last week for one of our partners.

We can effectively fight cyber adversaries with intelligent automation and machine learning, decreasing the time to intrusion detection.

slide-3
SLIDE 3

AGENDA

  • WHO WE ARE: BOOZ ALLEN CYBER AND DATA SCIENCE
  • THE CHALLENGES OF CYBER DEFENSE
  • DGAS AND AI-ENABLED DEFENSIVE TACTICS
  • DEEP LEARNING ON MALICIOUS DOMAINS
  • ADAPTIVE BAYESIAN LEARNING FOR BETTER ALERTING
  • CYBER PRECOG: VIDEO DEMONSTRATION
  • BOOZ ALLEN CYBER: OUR AI-ENABLED FUTURE STATE


slide-4
SLIDE 4

BOOZ ALLEN HAMILTON: WHO WE ARE


Greg McCullough: Director of Cyber Machine Intelligence
Aaron Sant-Miller: Lead Data Scientist

Greg McCullough is the Director of Cyber Machine Intelligence Capability Development at Booz Allen Hamilton. He has over ten years of experience developing cyber capabilities across the Defense market, while building, deploying, and scaling government custom products and solutions focused on securing networks and IT systems. Most recently, he has driven compliance automation and key cyber integrations across the entire Federal market. He holds a BS in Computer Science from Butler University, a BS in Electrical Engineering from Purdue University, and an MS in Computer Science from George Washington University.

Aaron Sant-Miller is a Lead Data Scientist at Booz Allen Hamilton with a specialization in applied mathematics, machine learning, and statistical modeling. He has architected, developed, and deployed data science solutions and machine learning suites across a wide range of domains, including tax fraud detection, climate science trend forecasting, cybersecurity risk scoring, and professional athlete performance prediction. Aaron’s current areas of research are focused on Bayesian modeling design, synthetic data generation, and neural network-based time series modeling. He holds a BS and an MS in Applied and Computational Mathematics and Statistics from the University of Notre Dame.

About Booz Allen Hamilton Cyber

For more than 100 years, business, government, and military leaders have turned to Booz Allen Hamilton to solve their most complex problems. We are at the forefront of the cyber frontier, relentlessly pursuing innovative solutions that make the world a safer place to live, serve, and do business. With decades of mission intelligence combined with the most advanced tools available, we protect industry and government against the attacks of today, and prepare them for the threats of tomorrow. To learn more, visit BoozAllen.com.

slide-5
SLIDE 5

BOOZ ALLEN DELIVERS SOLUTIONS WITH A FUSION OF CYBER EXPERTISE AND DATA SCIENCE TRADECRAFT


Booz Allen Cyber ML Capability Offerings

Data science

  • Analytics driven by statistical rigor
  • Computational optimization
  • Machine learning model engineering

Cybersecurity

  • Cyber defense operations
  • Cyber engineering and integration
  • Cybersecurity compliance

Booz Allen works to fuse capability offerings across domains to maximize solution impact

slide-6
SLIDE 6

EVOLVING CHALLENGES IN CYBERSECURITY DEMAND CREATIVE AND INTELLIGENT DEFENSIVE POSTURE


Cyber attacks can cause significant damage

Cyber compromises are having real financial and physical impacts at an organizational and individual level. Creative adversaries have the ability to compromise an endpoint, access a network, steal and ransom data or accounts, and dangerously expose personal information to the open market. Many recent high profile attacks demonstrate this impact.

An Evolving Landscape of Challenges

  • Attack surfaces are rapidly expanding – growing dependence on IT systems and rapidly evolving novel technologies expose our networks in new ways while increasing our dependence on vulnerable systems
  • The work force is saturated – adding more bodies to defensive efforts no longer improves defense due to a lack of cyber talent and diminishing returns from increased human labor and manual defensive tactics
  • Organizations are inundated with cyber tools – well-funded organizations have the money to buy new cyber tools and do so, but they are unable to effectively manage or integrate the capabilities of these tools
  • Attackers are talented and increasingly sophisticated – adversaries are getting more creative, developing dynamic attacks that can circumvent existing rules-driven and structurally-defined cyber defenses

An evolving landscape demands innovation and creative, new defensive tactics to advance defensive posture in a challenging and impactful cyber warzone.

--- This is the Booz Allen Cyber Mission ---
slide-7
SLIDE 7


DGAS EXEMPLIFY TRANSFORMATIVE ADVERSARIAL TACTICS THAT DEMAND INNOVATIVE AND ADAPTIVE CYBER DEFENSE


New tactics demand new defenses

Adversaries have developed creative tactics that easily circumvent rules-based defenses. To counter more adaptive attack methods, we must develop our own adaptive and innovative techniques to prevent attacks that transform every minute. Machine learning and AI enable our defenses to evolve and react to new tactics in real time, hardening our defenses.

Figure labels: Adversaries vs. Rules: Compromise; Adversaries vs. AI Defense: Security

Domain Generation Algorithms (DGAs) are algorithms that can rapidly create a large number of domain names that act as a midpoint between a user and malware.

➢ Ever-changing and adaptive: Algorithms can rapidly generate new domains of new structures with regularity
➢ Inconspicuous at the surface-level: Algorithms can concatenate dictionary words or normative character patterns
➢ Large in number and historically tagged: Large pools of known DGAs are available and have been reverse engineered

To defend against DGAs:

  • Defenses must understand underlying domain characteristics, but also evolve and adapt rapidly

We have at our disposal:

  • Large amounts of tagged data from uncovered and reverse engineered DGAs

Adaptable defense counters adaptive offense

This is an ideal use case for AI-powered cyber defense
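
To make the DGA definition on this slide concrete, here is a minimal, hypothetical sketch of how a date-seeded generator might derive domain names. It is not taken from the presentation or from any real malware family; the function name `toy_dga`, the seed string, and the hash-to-letters scheme are illustrative assumptions only.

```python
import hashlib
from datetime import date
from typing import List


def toy_dga(seed: str, day: date, count: int = 5,
            length: int = 12, tld: str = ".com") -> List[str]:
    """Hypothetical toy generator: derives `count` domains from a seed and a date.

    Anyone who knows the seed and the date can recompute the same list,
    which is why a static blocklist struggles to keep up with this pattern.
    """
    domains = []
    for i in range(count):
        material = f"{seed}|{day.isoformat()}|{i}".encode("utf-8")
        digest = hashlib.sha256(material).hexdigest()
        # Map each hex digit (0-15) to a letter a-p so the label is alphabetic.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:length])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in toy_dga("example-seed", date(2018, 3, 28)):
        print(name)
```

Changing the date (or the seed) yields a completely different list, which illustrates the "ever-changing and adaptive" property, while the fixed structure of the output is the kind of regularity a character-level model can learn.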

slide-8
SLIDE 8


Proven DL capabilities are the building blocks

Academic research and our Booz Allen deployments have proven the efficacy of these models in implementation and test. When trained at scale, deep neural networks can learn the underlying framework used by a DGA to build out a breadth of malicious domains, moving beyond memorization of “known bads” toward an understanding of adversarial toolkits

CNNS AND LSTMS ARE PROVEN SOLUTIONS, WHERE GPUS ENABLE INLINE MODEL INFERENCE AT NETWORK SPEED

  • 1. Yu et al. (2017). “Inline DGA Detection with Deep Networks.” IEEE International Conference on Data Mining Workshops (ICDMW). http://doi.org/10.1109/ICDMW.2017.96

Proven Model Architectures [1]

Both the LSTM and the CNN use simple, lightweight architectures (see Yu et al. 2017)

  • Capable of powerful performance in holdout test
  • Simplicity allows for rapid inference at network speed

Training Approach

Fuses multiple approaches into a complete learning scheme

  • 1. Offline training: Bambenek DGA Dataset (4M)
  • 2. Automated Update: Open-web intel collection
  • 3. Network Tailoring

Optimized Hardware Deployment

  • Lives on one NVIDIA DGX-1, across 8 GPUs
  • Deployed and scaled using MXNet framework
  • Proven to handle 3.5 GB/s throughput

Performance

  • 97 percent holdout balanced accuracy
  • Proven detection in network deployments
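
The slide reports balanced accuracy without defining it. Under the usual definition it is the mean of the true positive rate and the true negative rate, which keeps a heavily benign-skewed test set from inflating the score. The sketch below is a generic illustration with made-up labels, not the evaluation code or data behind the 97 percent figure.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of recall on the malicious class (1) and the benign class (0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    true_positive_rate = tp / (tp + fn)
    true_negative_rate = tn / (tn + fp)
    return (true_positive_rate + true_negative_rate) / 2


# Illustrative labels only: 2 malicious domains among 10.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_benign = [0] * 10
print(balanced_accuracy(labels, always_benign))
```

A model that labels everything benign is right on 8 of these 10 samples, yet its balanced accuracy is only 0.5, which is why the metric suits DGA detection where malicious domains are rare.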
slide-9
SLIDE 9

If an endpoint is compromised, its behavior will change as a result of the intrusion. Cyber MI must flag potential compromise and alert when behavioral models notice a simultaneous change.


Network traffic off sensor

Database Layer (e.g. Timescale / PostgreSQL & MapD): All traffic is held for periods that depend on its associated risk of malicious action

Analytic Layer (e.g. CNN and Behavioral Model):

  • DGA Detection: Models flag logs that reflect potential compromise
  • Behavioral Models: Bayesian models flag endpoints that break from endpoint norm

Application Layer: Cyber Precog allows analysts to investigate and flag legitimate alerts

Existing SIEM (e.g. Splunk): All model outputs and Precog alerts integrate seamlessly with existing SIEMs

Flows between layers: flagged traffic, model outputs, model alerts, analyst inputs

BOOZ ALLEN’S CYBER PRECOG COMBINES EDGE MODELS WITH ADAPTIVE BEHAVIORAL MODELS TO CURATE TAILORED ALERTS

High false positive rates have stigmatized ML in cyber

Historically, machine learning in cyber has been stigmatized due to high false positive rates of ML-enabled alerting systems. As the adversarial tactics are rapidly changing, models that train offline and are slow to update rarely perform well and often provide outdated and incorrect alerts. The prevalence of poorly deployed systems stigmatized ML among SMEs.

Booz Allen’s Cyber Precog: Combining network speed alerting with adaptive, endpoint behavioral learning

  • 1. Optimized Edge Models – live at the edge, examine all DNS traffic, and flag logs that may have a malicious domain
  • 2. Bayesian Behavioral Models – develop behavioral baselines for endpoints, and alert analysts when an endpoint traverses to a flagged domain and deviates from its established behavioral baseline
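
The slides do not say which Bayesian model Cyber Precog uses, so the following is only a sketch of the two-signal alerting idea under stated assumptions: a Gamma-Poisson model of an endpoint's hourly DNS query count stands in for the behavioral baseline, and an alert fires only when the edge model's score is high and the observed count is improbable under that baseline. The class name `EndpointBaseline`, the function `should_alert`, the priors, and both thresholds are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class EndpointBaseline:
    """Assumed Gamma(alpha, beta) prior on an endpoint's hourly query rate."""
    alpha: float = 1.0  # prior shape (illustrative)
    beta: float = 1.0   # prior rate (illustrative)

    def update(self, count: int) -> None:
        # Conjugate update after observing one more hourly count.
        self.alpha += count
        self.beta += 1.0

    def _log_pmf(self, k: int) -> float:
        # Posterior predictive is negative binomial with r=alpha, p=beta/(beta+1).
        r = self.alpha
        p = self.beta / (self.beta + 1.0)
        return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
                + r * math.log(p) + k * math.log1p(-p))

    def tail_probability(self, count: int) -> float:
        # P(next count >= count) under the posterior predictive distribution.
        below = sum(math.exp(self._log_pmf(j)) for j in range(count))
        return max(0.0, 1.0 - below)


def should_alert(dga_score: float, baseline: EndpointBaseline, count: int,
                 score_threshold: float = 0.9,
                 tail_threshold: float = 0.01) -> bool:
    """Alert only when the domain looks generated AND behavior is unusual."""
    return (dga_score >= score_threshold
            and baseline.tail_probability(count) <= tail_threshold)


baseline = EndpointBaseline()
for hourly_count in [10, 12, 9, 11, 10]:  # made-up history
    baseline.update(hourly_count)
print(should_alert(0.97, baseline, 60), should_alert(0.97, baseline, 10))
```

Requiring both signals is what lets this kind of design suppress false positives: a suspicious-looking domain visited during otherwise normal behavior does not page an analyst by itself.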

slide-10
SLIDE 10

DEMONSTRATION


slide-11
SLIDE 11


ADVERSARIES ARE SMART – COMBATTING DGAS IS ONLY ONE PIECE OF THE CYBERSECURITY PUZZLE THAT NEEDS ADAPTIVE DEFENSE

  • Beaconing
  • PowerShell Scripting
  • Network Scanning
  • DNS Exfiltration
  • Port/Protocol Anomalies
  • Graph/Network Connection Anomalies
  • DGA Detection

The solution demands many pieces

Comprehensive, adaptive cyber defensive posture requires collaborative work between ML engineers and SMEs.

  • 1. Cyber talent and domain expertise to shape MI solutions
  • 2. Rapid innovation to keep pace with adversarial advancement
  • 3. Network optimization for large, fast data
  • 4. Integrated, broad suite covering many diverse use cases

Diagram stages:

  • 1. Cyber talent as the drivers and shapers of MI
  • 2. Rapid prototyping: Required to keep pace with adversaries
  • 3. Network Optimization: Tailored solutions on high velocity data
  • 4. A Scaled, ML-enabled Cyber Defensive Suite (Illustrative)

Flows between stages: Proven Prototypes; Optimized, operationalized capabilities; Identified gaps and needs, responses to new adversary tactics

Diagram labels: AI Defense, Cyber Talent

Optimized, MI-informed cyber defensive posture:

➢ Leverages MI and AI
➢ Ensembles models in a cyber informed manner
➢ Demands domain acumen for both analysis and design
slide-12
SLIDE 12

BACK UP


slide-13
SLIDE 13

BOOZ ALLEN HAMILTON CYBER


slide-14
SLIDE 14

MODEL ARCHITECTURE DEEP DIVE


Model Architectures

Model Layers

  • Embedding Layer – Learns a dense vector representation of vectorized domain names. Names that are more similar to each other are closer in vector space.
  • Dropout Regularization Layer – Prevents model overfitting by randomly setting a subset of neuron activations to zero during training.
  • Convolution Layer – Convolves filters over the embedded inputs to form an activation map, which represents the locations of discovered features in the data embedding.
  • Long Short-Term Memory (LSTM) Layer – Allows the model to learn relevant features (patterns of characters) from domain names and capture dependencies between non-adjacent characters.
  • Dense – Connects all nodes in the preceding layer (also used to perform final classification into two classes).
  • Sigmoid Activation – Simple, classical transformation to assign a probability that a domain is malicious.

Training Approach

  • Adam Neural Network Optimization – Combines the benefits of adaptive gradient optimization (per-parameter learning rates) and root mean square propagation (per-parameter learning rates adapted based on the average of recent gradient magnitudes for each weight). It does so by computing exponential moving averages of the gradient (first moment) and of the squared gradient (second moment, i.e. the uncentered variance) and using them to scale each parameter's update.
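
As a small illustration of the Adam description above, the sketch below applies the standard Adam update rule to a single scalar parameter. It is a generic textbook version, not the training code used for these models; the function name `adam_minimize` and the quadratic test objective are assumptions chosen for brevity, and the default hyperparameters are the commonly cited ones.

```python
import math
from typing import Callable


def adam_minimize(grad: Callable[[float], float], x0: float, steps: int = 1000,
                  lr: float = 0.1, beta1: float = 0.9, beta2: float = 0.999,
                  eps: float = 1e-8) -> float:
    """Minimize a 1-D function given its gradient, using the Adam update."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # moving average of the gradient
        v = beta2 * v + (1 - beta2) * g * g    # moving average of the squared gradient
        m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)  # per-parameter scaled step
    return x


# Illustrative objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
print(adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0))
```

Dividing by the square root of the second-moment estimate is what gives each parameter its own effective learning rate; in a real network the same update is applied element-wise to every weight.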

➢ CNN: Vectorized Domain -> Embedding -> Convolution (1D) -> Dropout -> Flatten -> Dense -> Dense -> Sigmoid Activation
➢ LSTM: Vectorized Domain -> Embedding -> LSTM -> Dropout -> Dense -> Dense -> Sigmoid Activation
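
Both pipelines start from a "Vectorized Domain", which the slides do not spell out. One common approach, assumed here, is to map each character to an integer index and left-pad to a fixed length so the embedding layer receives equal-length inputs. The alphabet, the padding side, and the maximum length of 75 are illustrative choices, not values taken from the presentation.

```python
import string
from typing import List

# Assumed character set; index 0 is reserved for padding.
ALPHABET = string.ascii_lowercase + string.digits + "-._"
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
UNKNOWN_INDEX = len(ALPHABET) + 1  # any character outside the alphabet
MAX_LEN = 75  # assumed fixed input length


def vectorize_domain(domain: str, max_len: int = MAX_LEN) -> List[int]:
    """Turn a domain name into a fixed-length list of character indices."""
    chars = domain.lower()[:max_len]  # truncate overly long names
    indices = [CHAR_TO_INDEX.get(ch, UNKNOWN_INDEX) for ch in chars]
    return [0] * (max_len - len(indices)) + indices  # left-pad with zeros


print(vectorize_domain("ab.c", max_len=6))
```

The resulting integer sequence is what an embedding layer would consume, with a vocabulary size of `len(ALPHABET) + 2` to cover the padding and unknown indices.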