ACCELERATING CYBER THREAT DETECTION WITH GPU
Joshua Patterson | Director of Applied Solutions Engineering | GTC Israel 2017 | @datametrician
2
RULES & PEOPLE DON’T SCALE
Right now, financial services reports it takes an average of 98 days to detect an advanced threat, but retailers say it can be about seven months. Once the security community moves beyond the mantras “encrypt everything” and “secure the perimeter,” it can begin developing intelligent prioritization and response plans to various kinds of breaches – with a strong focus on integrity. The challenge lies in efficiently scaling these technologies for practical deployment, and making them reliable for large networks. This is where the security community should focus its efforts.
http://www.wired.com/2015/12/the-cia-secret-to-cybersecurity-that-no-one-seems-to-get/
Current methods are too slow
3
ATTACKS ARE MORE SOPHISTICATED
How Hackers Hijacked a Bank’s Entire Online Operation
https://www.wired.com/2017/04/hackers-hijacked-banks-entire-online-operation/
4
FIRST PRINCIPLES OF CYBER SECURITY
Where the industry must go
- 1. Indication of compromise needs to improve as attacks are becoming more sophisticated, subtle, and hidden in the massive volume and velocity of data. Combining machine learning, graph analysis, and applied statistics, and integrating these methods with deep learning, is essential to reduce false positives, detect threats faster, and empower analysts to be more efficient.
- 2. Event management is an accelerated analytics problem: the volume and velocity of data from devices require a new approach that combines all data sources to allow for more intelligent/advanced threat hunting and exploration at scale across machine data.
- 3. Visualization will be a key part of daily operations, which will allow analysts to label and train deep learning models faster and validate machine learning predictions.
5
FIRST PRINCIPLES OF CYBER SECURITY
Where the industry must go
- 1. Indication of compromise needs to improve as attacks are becoming more sophisticated, subtle, and hidden in the massive volume and velocity of data. Combining machine learning, graph analysis, and applied statistics, and integrating these methods with deep learning, is essential to reduce false positives, detect threats faster, and empower analysts to be more efficient.
6
7
DATA PLATFORM-AS-A-SERVICE
SCALE
- Handles 1M events/second
- Auto-scales the cluster
HIGH AVAILABILITY
- Offers HA with no data loss
- Always-on architecture
- Data replication
SECURITY
- Data platform security has been implemented with VPCs in AWS
- Dashboard access using NVIDIA LDAP
SELF SERVICE
- Log-to-analytics
- Kibana, JDBC access
- Accessing data using BI tools
8
ARCHITECTURE
V1
9
DATA PLATFORM STATS
10
ANOMALY DETECTION
11
ANOMALY DETECTION USING DEEP LEARNING
Data Platform
AD AI Framework (Keras + TensorFlow)
NGC/NGN GPU Cluster NGC/NGN GPU Cluster
GPU Cloud
Anomaly Detection
Top Features
- Automated Alerts & Dashboards
- Early Detection
- Self Service
- Better accuracy & less noise
12
ANOMALY DETECTION FRAMEWORK
- Raw Dataset (columns: Time, X1, X2)
- Feature Learning Algorithm: Recurrent Neural Network (RNN), Autoencoders (AE) (columns: Time, X1, X2, X’, X’’)
- Unsupervised Learning: Multivariate Gaussian; Supervised Learning: Logistic Regression (columns: Time, X1, X2, X’, X’’, Y; example row: Y = 1)
- Anomaly Post-processing: Univariate Analysis (columns: Time, X1, X2, Y, Anomaly Description; example row: Y = 1, Anomaly Description = X1)
- Anomalies: Email alerts, Dashboards
- Anomaly Detection: Feedback from user
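The unsupervised step in the framework names a multivariate Gaussian. A minimal sketch of that idea follows, assuming just two features per time step (like X1 and X2 on the slide) and a hand-picked log-density threshold. The function names, sample data, and threshold are hypothetical and are not the AD platform's code.

```python
import math

def fit_gaussian(rows):
    """Estimate the mean vector and 2x2 covariance from (x1, x2) rows."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    s11 = sum((r[0] - m1) ** 2 for r in rows) / n
    s22 = sum((r[1] - m2) ** 2 for r in rows) / n
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in rows) / n
    return (m1, m2), (s11, s12, s22)

def log_density(point, mean, cov):
    """Log of the bivariate Gaussian density at `point`."""
    s11, s12, s22 = cov
    det = s11 * s22 - s12 * s12
    d1, d2 = point[0] - mean[0], point[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form 2x2 inverse.
    maha = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return -0.5 * maha - math.log(2 * math.pi) - 0.5 * math.log(det)

def is_anomaly(point, mean, cov, threshold=-10.0):
    """Flag points whose log-density falls below the (assumed) threshold."""
    return log_density(point, mean, cov) < threshold

# Hypothetical "normal" activity used to fit the model.
normal_rows = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.8), (1.1, 2.2), (1.0, 1.9), (0.8, 2.0)]
mean, cov = fit_gaussian(normal_rows)

print(is_anomaly((1.0, 2.0), mean, cov))  # a point at the centre of the data
print(is_anomaly((3.0, 0.5), mean, cov))  # a point far from the training data
```

In practice the threshold would be tuned on labeled or user-confirmed anomalies, which is where the feedback loop in the framework comes in.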
13
ANOMALY DETECTION BENEFITS WITH DEEP LEARNING
Top Features
- Automated Alerts & Dashboards
- Early Detection
- Self Service
- Better accuracy & less noise
14
ANOMALY DETECTION TRAINING
- Evolution
  - V0: Manual Feature Creation
  - V1: Automatic Feature Creation using DL (Theano)
  - V2: Multi-GPU support + TensorFlow Serving (Keras + TensorFlow)
- CPU vs GPU
- Learnings:
  - Manual feature extraction does not scale
  - Dataset preparation is the long pole
  - Training on CPU takes longer than the data collection rate
15
INFERENCING V1
- Use Case: Detecting anomalies in users’ activity
- Inferencing flow from 10k feet: Live Streaming Data -> AD Platform -> ETL aggregations for inferencing
- Started with Python scripts for windowed aggregation
Python Script Performance (chart, by window size): 10 mins: 73; 30 mins: 103; 60 mins: 154
- Learnings: Hard to scale for near real time. The AD platform runs inferencing every 3 mins because we are limited by the speed of data processing
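The windowed aggregation step can be sketched in plain Python. This is a minimal illustration, assuming events arrive as (timestamp in seconds, user, count) tuples; the tuple layout, the `aggregate_window` name, and the sample data are hypothetical.

```python
from collections import defaultdict

WINDOWS_MINUTES = (10, 30, 60)  # the window sizes shown in the chart

def aggregate_window(events, now, window_minutes):
    """Sum per-user activity for events inside the trailing window."""
    cutoff = now - window_minutes * 60
    totals = defaultdict(int)
    for ts, user, count in events:
        if cutoff < ts <= now:
            totals[user] += count
    return dict(totals)

# Hypothetical event stream: (timestamp in seconds, user, activity count).
events = [
    (100, "alice", 1),
    (1900, "alice", 2),
    (3100, "bob", 5),
    (3500, "alice", 1),
]
now = 3600

# One feature dictionary per window size, ready to feed to inferencing.
features = {w: aggregate_window(events, now, w) for w in WINDOWS_MINUTES}
print(features)
```

Each window rescans the full event list, which is the kind of cost that makes a script-based approach hard to scale toward real time.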
16
INFERENCING V2
- V2: To improve performance, we started using Presto with data on S3 in JSON format
- Live data will be streamed from Kafka to S3. We use Presto for our data warehousing needs
- Presto is an open-source distributed SQL query engine optimized for low-latency, ad-hoc analysis of data*
Improved Performance (chart, by window size):
Window  | Presto on JSON | Presto on Parquet
10 mins | 20             | 4
30 mins | 25             | 6
60 mins | 30             | 8
- Learnings: Presto with Parquet has the best performance, but we need to batch data at 30-second intervals, so it is not completely real time
17
FIRST PRINCIPLES OF CYBER SECURITY
Where the industry must go
- 1. Indication of compromise needs to improve as attacks are becoming more sophisticated, subtle, and hidden in the massive volume and velocity of data. Combining machine learning, graph analysis, and applied statistics, and integrating these methods with deep learning, is essential to reduce false positives, detect threats faster, and empower analysts to be more efficient.
- 2. Event management is an accelerated analytics problem: the volume and velocity of data from devices require a new approach that combines all data sources to allow for more intelligent/advanced threat hunting and exploration at scale across machine data.
18
GPU ACCELERATION
Accelerate the Pipeline, Not Just Deep Learning
- GPUs for deep learning = proven
- Where else and how else can we use GPU acceleration?
- Dashboards
- Accelerating data pipeline
- Stream processing
- Building better models faster
- First: GPU databases
Pipeline stages: Data Ingestion, Data Processing, Visualization, Model Training, Inferencing
19
MOVING TO BIG DATA IS A START
Spark outperforms traditional SIEM
Big Data Solution: 10-node cluster, ~$60k in hardware
vs
SIEM: Production SIEM of Fortune 500 Enterprise
Data: 450+ columns, ~250 million events per day
Spark vs SIEM Benchmarks from Accenture Labs - Strata NY, Bsides LV
20
MOVING TO BIG DATA IS A START
Spark outperforms traditional SIEM
Typical Scenario | Time Period | SIEM | Big Data | Speed Up
1. Show all network communication from one host (IP) to multiple hosts (IPs) | 1 Day | 3h 20m 13s | 1m 44s | 114 Times Faster
 | 1 Week | Not Feasible* | 4m 05s |
2. Retrieve failed logon attempts in Active Directory | 1 Day | 18m 26s | 1m 37s | 10 Times Faster
 | 1 Week | 2h 13m 45s | 3m 10s | 41 Times Faster
3. Search for Malware (exe) in Symantec logs | 1 Day | 3h 24m 36s | 1m 37s | 125 Times Faster
 | 1 Week | Not Feasible* | 3m 22s |
4. View all proxy logs for a specific domain | 1 Day | 4h 30m 13s | 2m 54s | 92 Times Faster
 | 1 Week | Not Feasible* | 1m 09s** |
Spark vs SIEM Benchmarks from Accenture Labs - Strata NY, Bsides LV
21
GPU DATABASES ARE EVEN FASTER
1.1 Billion Taxi Ride Benchmarks
Time in Milliseconds
Query   | MapD DGX-1 | MapD 4 x P100 | Redshift 6-node | Spark 11-node
Query 1 | 21         | 30            | 1560            | 10190
Query 2 | 80         | 99            | 1250            | 8134
Query 3 | 150        | 269           | 2250            | 19624
Query 4 | 372        | 696           | 2970            | 85942
Source: MapD Benchmarks on DGX from internal NVIDIA testing following guidelines of Mark Litwintschik’s blogs (@marklit82): Redshift, 6-node ds2.8xlarge cluster & Spark 2.1, 11 x m3.xlarge cluster w/ HDFS
22
MAPD
MapD Core MapD Immerse
LLVM
LLVM creates one custom function that runs at speeds approaching hand-written functions. LLVM enables generic targeting of different architectures + run simultaneously on CPU/GPU.
Streaming
Speed eliminates need to pre-index or aggregate data. Compute resides on GPUs, freeing CPUs to parse + ingest. Finally, newest data can be combined with billions of rows of “near historical” data.
Backend Rendering
Data goes from compute (CUDA) to graphics (OpenGL) pipeline without copy and comes back as compressed PNG (~100 KB) rather than raw data (> 1GB).
23
MAPD ARCHITECTURE
Visualization Libraries
JavaScript libraries that allow users to build custom web-based visualization apps powered by a MapD Core database based on DC.js.
LLVM
MapD Core SQL queries are compiled with a just-in-time (JIT) LLVM based compiler, and run as NVIDIA GPU machine code.
Distributed Scale-out
MapD Core has native distributed scale-out capabilities. MapD Core users can query and visualize larger datasets with much smaller cluster sizes than traditional solutions.
High Availability
MapD Core has high availability functionality that provides durability and redundancy. Ingest and queries are load balanced across servers for additional throughput.
Open Source Commercial
24
MAPD + IMMERSE VS ELASTIC + KIBANA
Elastic + Kibana
- Fantastic for complex search
- Scales easily (up to a point)
- Indexing consumes more storage (~4-6x)
- Kibana for KPI dashboarding?
MapD Core
- Very fast OLAP queries
- JIT LLVM query compiler
- GPUs for compute
- CPUs for parse + ingest
- Limited join support (for now)
Immerse
- c3/d3 + crossfilter = nice dashboards
- Backend rendering
25
ARCHITECTURE
V1
26
ARCHITECTURE
V2 (with MapD)
27
INFERENCING V3
- V3: Explored GPU databases like MapD to improve the performance of querying streaming live data
- MapD offers constant query response times
- MapD has some SQL limitations. We use Presto as an interface and built a “MapD -> Presto” connector for full ANSI SQL features
GPU Database Performance: Execution Time (seconds) by window size
Window  | Presto on JSON | Presto on Parquet | MapD | Presto + MapD
10 mins | 20             | 4                 | 0.1  | 1.2
30 mins | 25             | 6                 | 0.1  | 1.2
60 mins | 30             | 8                 | 0.1  | 1.2
28
FIRST PRINCIPLES OF CYBER SECURITY
Where the industry must go
- 1. Indication of compromise needs to improve as attacks are becoming more sophisticated, subtle, and hidden in the massive volume and velocity of data. Combining machine learning, graph analysis, and applied statistics, and integrating these methods with deep learning, is essential to reduce false positives, detect threats faster, and empower analysts to be more efficient.
- 2. Event management is an accelerated analytics problem: the volume and velocity of data from devices require a new approach that combines all data sources to allow for more intelligent/advanced threat hunting and exploration at scale across machine data.
- 3. Visualization will be a key part of daily operations, which will allow analysts to label and train deep learning models faster and validate machine learning predictions.
29
MAPD VS KIBANA
Dashboards Comparison + Performance Test Method
30
DASHBOARD PERFORMANCE
MapD Immerse vs Elastic Kibana
Chart: Time to Fully Load (seconds) vs. Days of Data (1 to 31) for MapD Immerse (DGX), MapD Immerse (P2), and Elastic Kibana. Chart annotations: < 9s, < 12s
31
VISUALIZATION WITH GPU
Less hardware, more performance, more scale
32
VISUALIZATION WITH GPU
Less hardware, more performance, more scale
- 1/10th the hardware
- 1-2 orders of magnitude more performance
33
VISUALIZATION WITH GPU
Less hardware, more performance, more scale
- 1/10th the hardware
- 1-2 orders of magnitude more performance
- Real time visualization of 100K+ nodes, 1M+ edges
- 50-100x faster clustering than other solutions
34
LISTS DO NOT VISUALLY SCALE
Text search is a great starting point!
Does not scale
Do not see the 30K+ events, nor the IPs, users, nor how they relate…
35
BAR CHARTS HIDE RELATIONSHIPS
Good for summaries!
But not: individual items
But not: behaviors, relationships, patterns, outliers, …
36
GRAPHS: A KEY MISSING VIEW
Unified Model
- Shows entities, events, and relationships
- Multipurpose: connect, see, interact
Visual
- Inspect individual items
- See behavior, patterns, and outliers
- Scale to enterprise workloads
37
DIFFERENT GRAPHS, DIFFERENT QUESTIONS
Uni
Ex: Network mapping “Is it safe to reboot this?”
Hyper
Ex: Incident response “Did this escalate?”
Multi
Ex: SSH trails “Is a user crossing zones?”
Diagram node labels: ip, user, event
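One way to read the three graph shapes is as different edge lists built from the same log events. The sketch below is an interpretation of the node labels on the slide (ip, user, event), not Graphistry's API; the event fields and function names are hypothetical.

```python
# Hypothetical log events; field names are assumptions for illustration.
events = [
    {"id": "e1", "user": "alice", "src_ip": "10.0.0.1", "dst_ip": "10.0.0.2"},
    {"id": "e2", "user": "bob", "src_ip": "10.0.0.2", "dst_ip": "10.0.0.3"},
]

def uni_graph(events):
    """One node type: ip -> ip edges (e.g. network mapping)."""
    return [(e["src_ip"], e["dst_ip"]) for e in events]

def multi_graph(events):
    """Several node types: user -> ip edges alongside ip -> ip edges."""
    edges = []
    for e in events:
        edges.append((("user", e["user"]), ("ip", e["src_ip"])))
        edges.append((("ip", e["src_ip"]), ("ip", e["dst_ip"])))
    return edges

def hyper_graph(events):
    """Each event becomes its own node, linked to every entity it mentions."""
    edges = []
    for e in events:
        event_node = ("event", e["id"])
        edges.append((event_node, ("user", e["user"])))
        edges.append((event_node, ("ip", e["src_ip"])))
        edges.append((event_node, ("ip", e["dst_ip"])))
    return edges

print(uni_graph(events))
print(multi_graph(events))
print(hyper_graph(events))
```

Tagging nodes with their type keeps a user named like an IP address from colliding with the real IP node, and lets a renderer color or filter by entity type.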
38
GRAPHISTRY
Graph Visualization
Optimized Networking | GPU Analysis and ML | GPU Rendering
- Hunting: Daily Anomalies
- SecOps: Shadow IT Use
- IR: Killchain Analysis
- Fraud: Tracking Embezzlers
- Threat Intel: Botnet Analysis
https://www.youtube.com/watch?v=NG4HaIpZ2K4&feature=youtu.be
39
CURRENT WORK
40
EXPAND GPU USAGE
More Data, Less Hardware
Chart: Peak Double Precision TFLOPS, 2008 to 2017, NVIDIA GPU vs. x86 CPU
Scaling up and out with GPU co-processors
41
EXPAND GPU USAGE
AD Platform to Cyber Security
Cyber Security Analytics Platform
42
GPU DATA FRAME
Pre-GOAI
Apps: H2O.ai, Graphistry, Anaconda, Gunrock, BlazingDB, MapD
Data moves between applications (APP A, APP B) through the CPU, with a Copy & Convert at every hand-off.
Evolution of Performance
No Copy & Converts - Full Interoperability
Apps: H2O.ai, Anaconda, Gunrock, Graphistry, BlazingDB, MapD
Shared format: GPU Data Frame (Apache Arrow)
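The difference between "Copy & Convert" and a shared data frame can be illustrated on the CPU with Python's buffer protocol. This is only an analogy, assuming a single numeric column; it does not use the GPU Data Frame or Apache Arrow APIs, and the "APP A" / "APP B" roles are borrowed from the diagram labels.

```python
from array import array

# A columnar buffer of doubles owned by "APP A".
column = array("d", [1.0, 2.0, 3.0])

# Copy & Convert hand-off: "APP B" builds its own list from the data.
copied = [float(x) for x in column]

# Shared-format hand-off: "APP B" reads the same memory through a view.
shared = memoryview(column)

# "APP A" updates the column in place.
column[0] = 10.0

print(copied[0])  # the copy still holds the old value
print(shared[0])  # the view sees the update, with no copy made
```

The copy costs time and memory proportional to the data size at every hand-off, while the view costs neither, which is the motivation for agreeing on one in-memory format.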
43
CYBERWORKS
CYBERWORKS SIEM SDK
Goals
- Open Source Ecosystem & Select ISVs
- Integration Points w/ leading security vendors
- FireEye
- Splunk
- Palo Alto Networks
Purpose
A platform that lets analysts hunt and analyze data faster and at greater scale than traditional big data tools, to find unknown and zero-day threats. It will accelerate the threat detection ecosystem and harden cyber defense utilizing GPU ISVs and Deep Learning Frameworks.
Purpose Built SDK For SIEM Analytics
44
CYBERWORKS ACTIVITIES
Continuous Improvement
- Use GPU accelerated databases to analyze data to improve hunting today, as well as enrich and label data for Deep Learning.
- Connect accelerated DBs to Splunk for event management, hunting, and exploration.
- Use Graphistry and MapD to visualize the data for anomaly and threat detection in new ways. The goal is to GPU accelerate parts of Splunk through partnership and connect/bolt on GPUDBs/Graphistry.
- Use ML and Graph Analytics for feature extraction and behavioral analytics, an ensemble approach to detection.
- Expand Deep Learning training as more data is labeled/classified and threats are caught faster, building off DL techniques used in GFN, other groups, and external ISVs.
- Generalize Deep Learning for supervised and unsupervised anomaly and threat detection (Insider, APT, DDoS, etc.) while building our own cyber security deep learning accelerator. Use best practices from Driveworks and other accelerators and SDKs as a reference architecture. Leverage DL from other parts of the firm to accelerate development as well.
- While using Splunk Cloud to protect NVIDIA, we create a redundant path of data to enable R&D.
nvGRAPH
45
CYBERWORKS ARCHITECTURE
SecOps Data Sources Ingest Storage Stream Processing Batch Processing Serving Layer Notebook Visualization Graph Processing
cuSTINGER
Graph Visualization
Interactivity Query Speed Gunrock
Deep Learning Machine Learning
46
CYBERWORKS HARDWARE
Scale out Cluster DGX Cluster NAS SIEM Notebooks End User 3rd Party Apps
Messaging Queue
Accelerating your SIEM
47
GPU OPEN ANALYTICS INITIATIVE
http://gpuopenanalytics.com/
GPU Data Frame (GDF)
Pipeline stages: Ingest/Parse, Exploratory Analysis, Feature Engineering, ML/DL Algorithms, Grid Search, Scoring, Model Export
48
ANACONDA
Python ETL on GPU
A Python open-source just-in-time optimizing compiler that uses LLVM to produce native machine instructions. Primary contributor to PyGDF.
Dask is a flexible parallel computing library for analytic computing with dynamic task scheduling and big data collections. Primary contributor to Dask_GDF.
Jeremy Howard
Deep learning researcher & educator. Founder: fast.ai; Faculty: USF & Singularity University; // Previously - CEO: Enlitic; President: Kaggle; CEO Fastmail
“Rewrote @scikit_learn PolynomialFeatures in @ContinuumIO Numba. Got a 40x speedup (would be bigger with more data!) 12 lines of code”
49
BLAZINGDB
Scale out Data Warehousing
50
GRAPHISTRY
Graph Visualization
Optimized Networking | GPU Analysis and ML | GPU Rendering
- Hunting: Daily Anomalies
- SecOps: Shadow IT Use
- IR: Killchain Analysis
- Fraud: Tracking Embezzlers
- Threat Intel: Botnet Analysis
51
GUNROCK
GPU-Accelerated Graph Analytics Library
- Multi-GPU optimized algorithms
- Reduced cost and increased performance
- Performance constantly improving
52
H2O.AI
H2O4GPU - GPU Machine Learning Library
53
BETTER DATA PIPELINES
User Defined Functions at Scale
https://github.com/gpuopenanalytics libgdf, pygdf, dask_gdf
54
BETTER DATA PIPELINES
HIVE to BlazingDB
55
BETTER DATA PIPELINES
More Models
nvGRAPH
https://github.com/h2oai/h2o4gpu
# edges = E * 2^S ~34M
56
JOIN THE REVOLUTION
Everyone Can Help!
- APACHE ARROW: https://arrow.apache.org/ @ApacheArrow
- APACHE PARQUET: https://parquet.apache.org/ @ApacheParquet
- GPU Open Analytics Initiative: http://gpuopenanalytics.com/ @Gpuoai
Integrations, feedback, documentation support, pull requests, new issues, or donations welcomed!
Joshua Patterson @datametrician