[PPT] - ACCELERATING CYBER THREAT DETECTION WITH GPU Joshua Patterson

SLIDE 1

Joshua Patterson | Director of Applied Solutions Engineering | GTC Israel 2017 @datametrician

ACCELERATING CYBER THREAT DETECTION WITH GPU

SLIDE 2

RULES & PEOPLE DON’T SCALE

Right now, financial services firms report that it takes an average of 98 days to detect an advanced threat, while retailers say it can take about seven months. Once the security community moves beyond the mantras “encrypt everything” and “secure the perimeter,” it can begin developing intelligent prioritization and response plans for various kinds of breaches, with a strong focus on integrity. The challenge lies in efficiently scaling these technologies for practical deployment and making them reliable for large networks. This is where the security community should focus its efforts.

http://www.wired.com/2015/12/the-cia-secret-to-cybersecurity-that-no-one-seems-to-get/

Current methods are too slow

SLIDE 3

ATTACKS ARE MORE SOPHISTICATED

How Hackers Hijacked a Bank’s Entire Online Operation

https://www.wired.com/2017/04/hackers-hijacked-banks-entire-online-operation/

SLIDE 4

FIRST PRINCIPLES OF CYBER SECURITY

Where the industry must go

1. Indication of compromise needs to improve as attacks become more sophisticated, subtle, and hidden in the massive volume and velocity of data. Combining machine learning, graph analysis, and applied statistics, and integrating these methods with deep learning, is essential to reduce false positives, detect threats faster, and help analysts work more efficiently.

2. Event management is an accelerated analytics problem: the volume and velocity of data from devices require a new approach that combines all data sources to allow more intelligent, advanced threat hunting and exploration at scale across machine data.

3. Visualization will be a key part of daily operations, allowing analysts to label data and train deep learning models faster, and to validate machine learning predictions.

SLIDE 5

FIRST PRINCIPLES OF CYBER SECURITY

Where the industry must go

1. Indication of compromise needs to improve as attacks become more sophisticated, subtle, and hidden in the massive volume and velocity of data. Combining machine learning, graph analysis, and applied statistics, and integrating these methods with deep learning, is essential to reduce false positives, detect threats faster, and help analysts work more efficiently.

SLIDE 6

SLIDE 7

DATA PLATFORM-AS-A-SERVICE

SCALE
Handles 1M events/second
Auto-scales the cluster

HIGH AVAILABILITY
Offers HA with no data loss
Always-on architecture
Data replication

SECURITY
Data platform security has been implemented with VPCs in AWS
Dashboard access using NVIDIA LDAP

SELF SERVICE
Log-to-analytics
Kibana, JDBC access
Accessing data using BI tools

SLIDE 8

ARCHITECTURE

V1

SLIDE 9

DATA PLATFORM STATS

SLIDE 10

ANOMALY DETECTION

SLIDE 11

ANOMALY DETECTION USING DEEP LEARNING

Data Platform -> AD AI Framework (Keras + TensorFlow) -> NGC/NGN GPU Cluster (GPU Cloud) -> Anomaly Detection

Top Features
Automated Alerts & Dashboards
Early Detection
Self Service
Better accuracy & less noise

SLIDE 12

ANOMALY DETECTION FRAMEWORK

1. Raw Dataset: columns Time, X1, X2
2. Feature Learning Algorithm: Recurrent Neural Network (RNN), Autoencoders (AE); adds learned features (Time, X1, X2, X’, X’’)
3. Anomaly Detection: unsupervised learning (multivariate Gaussian) and supervised learning (logistic regression); adds an anomaly label (Time, X1, X2, X’, X’’, Y), e.g., Y = 1
4. Anomaly Post-processing: univariate analysis adds a description (Time, X1, X2, Y, Anomaly Description), e.g., Y = 1 described by X1
5. Anomalies: email alerts, dashboards

Feedback from user
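The unsupervised stage scores each row with a multivariate Gaussian and flags rows whose density falls below a threshold. A minimal sketch in plain Python, assuming numeric feature rows such as (X1, X2) and a threshold picked by the analyst; the function names and the threshold rule are illustrative assumptions, not the framework's actual code:

```python
import math

def fit_gaussian(rows):
    """Estimate the mean vector and (population) covariance matrix of feature rows."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / n
            for j in range(d)] for i in range(d)]
    return mu, cov

def cholesky(a):
    """Lower-triangular factor L with a = L L^T (a must be positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def log_density(x, mu, low):
    """Log of the multivariate normal density at x, given the Cholesky factor of the covariance."""
    d = len(x)
    diff = [x[j] - mu[j] for j in range(d)]
    z = []  # forward substitution: solve low @ z = diff
    for i in range(d):
        z.append((diff[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    mahalanobis = sum(v * v for v in z)  # (x - mu)^T Sigma^-1 (x - mu)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + mahalanobis)

def is_anomaly(x, mu, low, log_epsilon):
    """Flag x when its density p(x) falls below epsilon (compared in log space)."""
    return log_density(x, mu, low) < log_epsilon

# Toy training rows (X1, X2) that are strongly correlated.
train = [(1.0, 2.0), (2.0, 3.9), (3.0, 6.1), (4.0, 8.0), (5.0, 10.2)]
mu, cov = fit_gaussian(train)
low = cholesky(cov)
# Hypothetical threshold: the lowest density observed on the training rows.
log_eps = min(log_density(r, mu, low) for r in train)
print(is_anomaly((3.0, 6.0), mu, low, log_eps))   # False: follows the X1/X2 relationship
print(is_anomaly((3.0, 10.0), mu, low, log_eps))  # True: each value is in range, the pair is not
```

The full covariance is what catches the second point: X1 = 3.0 and X2 = 10.0 each lie inside the training range, so a per-feature model with the same threshold rule would not flag it, but the pair breaks the learned correlation.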

SLIDE 13

ANOMALY DETECTION BENEFITS WITH DEEP LEARNING

Top Features

Automated Alerts & Dashboards
Early Detection
Self Service
Better accuracy & less noise

SLIDE 14

ANOMALY DETECTION TRAINING

Evolution
V0: Manual feature creation
V1: Automatic feature creation using DL (Theano)
V2: Multi-GPU support + TensorFlow Serving (Keras + TensorFlow)

CPU vs GPU

Learnings:
Manual feature extraction does not scale
Dataset preparation is the long pole
Training on CPU cannot keep up with the rate of data collection

SLIDE 15

INFERENCING V1

Use case: detecting anomalies in user activity
Inferencing flow from 10k feet: live streaming data -> AD platform -> ETL aggregations for inferencing
Started with Python scripts for windowed aggregation

Python Script Performance (execution time by aggregation window)
10 mins: 73
30 mins: 103
60 mins: 154

Learnings: Hard to scale for near real time. The AD platform runs inferencing only every 3 mins because it is limited by the speed of data processing.
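A minimal sketch of the kind of per-user windowed aggregation these scripts performed, assuming events arrive as (timestamp_seconds, user) tuples; the event shape and the function name are hypothetical:

```python
from collections import Counter

WINDOWS_MIN = (10, 30, 60)  # aggregation windows shown on the slide, in minutes

def windowed_counts(events, now, windows_min=WINDOWS_MIN):
    """Count events per user in each trailing window (now - w minutes, now].

    `events` is a list of (timestamp_seconds, user) tuples. Every run rescans
    the full list once per window, so cost grows with data volume.
    """
    counts = {}
    for w in windows_min:
        start = now - w * 60
        counts[w] = Counter(user for ts, user in events if start < ts <= now)
    return counts

events = [(100, "alice"), (1500, "alice"), (3000, "bob"), (3500, "alice")]
counts = windowed_counts(events, now=3600)
print(counts[10]["alice"], counts[30]["alice"], counts[60]["alice"])  # 1 1 3
```

Recomputing aggregates like these from scratch on every run is consistent with the learning above: throughput is bounded by data-processing speed, not by the model.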

SLIDE 16

INFERENCING V2

V2: To improve performance, we started using Presto with data on S3 in JSON format

Live data will be streamed from Kafka to S3. We use Presto for our data warehousing needs

Presto is an open-source distributed SQL query engine optimized for low-latency, ad-hoc analysis of data*

Improved Performance (execution time, seconds)
Presto on JSON: 20 (10 mins), 25 (30 mins), 30 (60 mins)
Presto on Parquet: 4 (10 mins), 6 (30 mins), 8 (60 mins)

Learnings: Presto with Parquet has the best performance, but we need to batch data at 30-second intervals, so it is not completely real time
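The 30-second batching trades latency for columnar efficiency. A minimal sketch, assuming JSON-like event dicts with a `ts` field in seconds: it groups rows into 30-second batches and pivots each batch into a column-oriented layout, the same row-to-column idea Parquet applies on disk. It is a plain-Python illustration, not the Kafka-to-S3 pipeline itself; all names are hypothetical.

```python
from collections import defaultdict

BATCH_SECONDS = 30  # batching interval from the slide

def micro_batches(rows, batch_seconds=BATCH_SECONDS):
    """Group event dicts into fixed batches keyed by batch start time (seconds)."""
    batches = defaultdict(list)
    for row in rows:
        start = row["ts"] - row["ts"] % batch_seconds
        batches[start].append(row)
    return dict(sorted(batches.items()))

def to_columns(batch):
    """Pivot row-oriented dicts into {column: [values]}; assumes a shared schema."""
    keys = list(batch[0])
    return {k: [row[k] for row in batch] for k in keys}

rows = [
    {"ts": 5, "user": "alice", "bytes": 100},
    {"ts": 29, "user": "bob", "bytes": 250},
    {"ts": 31, "user": "alice", "bytes": 75},
]
for start, batch in micro_batches(rows).items():
    cols = to_columns(batch)
    # A columnar scan reads only the column the query needs.
    print(start, sum(cols["bytes"]))
# 0 350
# 30 75
```

A row arriving just after a batch opens waits up to 30 seconds before it becomes queryable, which is the latency cost noted in the learnings.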

SLIDE 17

FIRST PRINCIPLES OF CYBER SECURITY

Where the industry must go

1. Indication of compromise needs to improve as attacks become more sophisticated, subtle, and hidden in the massive volume and velocity of data. Combining machine learning, graph analysis, and applied statistics, and integrating these methods with deep learning, is essential to reduce false positives, detect threats faster, and help analysts work more efficiently.

2. Event management is an accelerated analytics problem: the volume and velocity of data from devices require a new approach that combines all data sources to allow more intelligent, advanced threat hunting and exploration at scale across machine data.

SLIDE 18

GPU ACCELERATION

Accelerate the Pipeline, Not Just Deep Learning

GPUs for deep learning = proven
Where else and how else can we use GPU acceleration?

Dashboards
Accelerating data pipeline
Stream processing
Building better models faster
First: GPU databases

Data Ingestion -> Data Processing -> Visualization -> Model Training -> Inferencing

SLIDE 19

MOVING TO BIG DATA IS A START

Spark outperforms traditional SIEM

SIEM vs Big Data Solution

Data: production SIEM of a Fortune 500 enterprise, 450+ columns, ~250 million events per day

Big Data Solution: 10-node cluster, ~$60k in hardware

Spark vs SIEM Benchmarks from Accenture Labs - Strata NY, Bsides LV

SLIDE 20

MOVING TO BIG DATA IS A START

Spark outperforms traditional SIEM

Typical scenarios (time period: SIEM vs Big Data, speed-up)

1. Show all network communication from one host (IP) to multiple hosts (IPs)
1 day: SIEM 3h 20m 13s, Big Data 1m 44s (114 times faster)
1 week: SIEM not feasible*, Big Data 4m 05s

2. Retrieve failed logon attempts in Active Directory
1 day: SIEM 18m 26s, Big Data 1m 37s (10 times faster)
1 week: SIEM 2h 13m 45s, Big Data 3m 10s (41 times faster)

3. Search for malware (exe) in Symantec logs
1 day: SIEM 3h 24m 36s, Big Data 1m 37s (125 times faster)
1 week: SIEM not feasible*, Big Data 3m 22s

4. View all proxy logs for a specific domain
1 day: SIEM 4h 30m 13s, Big Data 2m 54s (92 times faster)
1 week: SIEM not feasible*, Big Data 1m 09s**

Spark vs SIEM Benchmarks from Accenture Labs - Strata NY, Bsides LV

SLIDE 21

GPU DATABASES ARE EVEN FASTER

1.1 Billion Taxi Ride Benchmarks

Time in milliseconds (lower is better):
Query 1: MapD DGX-1 21; MapD 4 x P100 30; Redshift 6-node 1,560; Spark 11-node 10,190
Query 2: MapD DGX-1 80; MapD 4 x P100 99; Redshift 6-node 1,250; Spark 11-node 8,134
Query 3: MapD DGX-1 150; MapD 4 x P100 269; Redshift 6-node 2,250; Spark 11-node 19,624
Query 4: MapD DGX-1 372; MapD 4 x P100 696; Redshift 6-node 2,970; Spark 11-node 85,942

Source: MapD benchmarks on DGX from internal NVIDIA testing, following the guidelines of Mark Litwintschik’s blogs (@marklit82): Redshift, 6-node ds2.8xlarge cluster; Spark 2.1, 11 x m3.xlarge cluster w/ HDFS
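The gap between systems is easier to read as ratios. A short calculation over the times from this benchmark chart; the value-to-system mapping assumes the chart's legend order, and the ratios are derived here rather than shown on the original slide:

```python
# Query times in milliseconds for Queries 1-4, as read from the chart.
# The value-to-system mapping assumes the chart's legend order.
TIMES_MS = {
    "MapD DGX-1": [21, 80, 150, 372],
    "MapD 4 x P100": [30, 99, 269, 696],
    "Redshift 6-node": [1560, 1250, 2250, 2970],
    "Spark 11-node": [10190, 8134, 19624, 85942],
}

def speedups(baseline, target="MapD DGX-1"):
    """Per-query ratio baseline_time / target_time, rounded to one decimal."""
    return [round(b / t, 1) for b, t in zip(TIMES_MS[baseline], TIMES_MS[target])]

for system in ("Redshift 6-node", "Spark 11-node"):
    print(system, speedups(system))
```

For Query 1 this works out to roughly 74x over the 6-node Redshift cluster and 485x over the 11-node Spark cluster; against Redshift the gap narrows to about 8x on Query 4.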

SLIDE 22

MAPD

MapD Core / MapD Immerse

LLVM / Backend Rendering / Streaming

LLVM creates one custom function per query that runs at speeds approaching hand-written functions. LLVM enables generic targeting of different architectures, so queries can run simultaneously on CPU and GPU. The speed eliminates the need to pre-index or aggregate data. Compute resides on GPUs, freeing CPUs to parse and ingest. The newest data can be combined with billions of rows of “near historical” data. Data goes from the compute (CUDA) pipeline to the graphics (OpenGL) pipeline without a copy and comes back as a compressed PNG (~100 KB) rather than raw data (> 1 GB).
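The LLVM point is that a query becomes one specialized function instead of a generic plan interpreted operator by operator for every row. The sketch below illustrates that idea in plain Python, with Python source generation standing in for LLVM code generation (a deliberate simplification: MapD emits LLVM IR for CPU/GPU, not Python). The table, columns, and function names are hypothetical.

```python
import operator

# Toy query: SELECT SUM(bytes) FROM events WHERE port = 22 AND bytes > 100
ROWS = [
    {"port": 22, "bytes": 500},
    {"port": 443, "bytes": 900},
    {"port": 22, "bytes": 40},
    {"port": 22, "bytes": 1200},
]

# Interpreted plan: a generic predicate tree walked for every row.
PLAN = ("and", ("eq", "port", 22), ("gt", "bytes", 100))
OPS = {"eq": operator.eq, "gt": operator.gt}

def eval_pred(node, row):
    if node[0] == "and":
        return eval_pred(node[1], row) and eval_pred(node[2], row)
    op, column, value = node
    return OPS[op](row[column], value)

def interpreted_sum(rows, plan, column="bytes"):
    return sum(r[column] for r in rows if eval_pred(plan, r))

# Compiled plan: generate one function specialized to this query, once.
def compile_sum(port, min_bytes):
    src = (
        "def query(rows):\n"
        "    total = 0\n"
        "    for r in rows:\n"
        f"        if r['port'] == {int(port)} and r['bytes'] > {int(min_bytes)}:\n"
        "            total += r['bytes']\n"
        "    return total\n"
    )
    namespace = {}
    exec(src, namespace)  # constants are baked in; no tree walking per row
    return namespace["query"]

query = compile_sum(22, 100)
print(interpreted_sum(ROWS, PLAN), query(ROWS))  # 1700 1700
```

Both paths return the same answer; the compiled one simply removes the per-row dispatch, which is the overhead a JIT query compiler eliminates at much larger scale.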

SLIDE 23

MAPD ARCHITECTURE

Visualization Libraries

JavaScript libraries, based on DC.js, that allow users to build custom web-based visualization apps powered by a MapD Core database.

LLVM

MapD Core SQL queries are compiled with a just-in-time (JIT) LLVM based compiler, and run as NVIDIA GPU machine code.

Distributed Scale-out

MapD Core has native distributed scale-out capabilities. MapD Core users can query and visualize larger datasets with much smaller cluster sizes than traditional solutions.

High Availability

MapD Core has high availability functionality that provides durability and redundancy. Ingest and queries are load balanced across servers for additional throughput.

Open Source Commercial

SLIDE 24

MAPD + IMMERSE VS ELASTIC + KIBANA

Elastic + Kibana

Fantastic for complex search
Scales easily (up to a point)
Indexing consumes more storage (~4-6x)
Kibana for KPI dashboarding?

MapD Core

Very fast OLAP queries
JIT LLVM query compiler
GPUs for compute
CPUs for parse + ingest
Limited join support (for now)

Immerse

c3/d3 + crossfilter = nice dashboards
Backend rendering

SLIDE 25

ARCHITECTURE

V1

SLIDE 26

ARCHITECTURE

V2 (with MapD)

SLIDE 27

INFERENCING V3

V3: Explored GPU databases like MapD to improve query performance on live streaming data
MapD offers constant query response times
MapD has some SQL limitations. We use Presto as an interface and built a “MapD -> Presto” connector for full ANSI SQL features

GPU Database Performance (execution time, seconds)
10 mins: Presto on JSON 20; Presto on Parquet 4; MapD 0.1; Presto + MapD 1.2
30 mins: Presto on JSON 25; Presto on Parquet 6; MapD 0.1; Presto + MapD 1.2
60 mins: Presto on JSON 30; Presto on Parquet 8; MapD 0.1; Presto + MapD 1.2

SLIDE 28

FIRST PRINCIPLES OF CYBER SECURITY

Where the industry must go

1. Indication of compromise needs to improve as attacks become more sophisticated, subtle, and hidden in the massive volume and velocity of data. Combining machine learning, graph analysis, and applied statistics, and integrating these methods with deep learning, is essential to reduce false positives, detect threats faster, and help analysts work more efficiently.

2. Event management is an accelerated analytics problem: the volume and velocity of data from devices require a new approach that combines all data sources to allow more intelligent, advanced threat hunting and exploration at scale across machine data.

3. Visualization will be a key part of daily operations, allowing analysts to label data and train deep learning models faster, and to validate machine learning predictions.

SLIDE 29

MAPD VS KIBANA

Dashboards Comparison + Performance Test Method

SLIDE 30

DASHBOARD PERFORMANCE

MapD Immerse vs Elastic Kibana

Chart: time to fully load (seconds) vs days of data (1 to 31) for MapD Immerse (DGX), MapD Immerse (P2), and Elastic Kibana; annotated MapD load times: < 9 s and < 12 s

SLIDE 31

VISUALIZATION WITH GPU

Less hardware, more performance, more scale

SLIDE 32

VISUALIZATION WITH GPU

Less hardware, more performance, more scale

1/10th the hardware
1-2 orders of magnitude more performance

SLIDE 33

VISUALIZATION WITH GPU

Less hardware, more performance, more scale

1/10th the hardware
1-2 orders of magnitude more performance
Real-time visualization of 100K+ nodes and 1M+ edges
50-100x faster clustering than other solutions

SLIDE 34

LISTS DO NOT VISUALLY SCALE

Text search is a great starting point! But it does not scale: you do not see the 30K+ events, the IPs, the users, or how they relate…

SLIDE 35

BAR CHARTS HIDE RELATIONSHIPS

Good for summaries! But not for individual items, and not for behaviors, relationships, patterns, outliers, …

SLIDE 36

GRAPHS: A KEY MISSING VIEW

Unified Model
Shows entities, events, and relationships
Multipurpose: connect, see, interact

Visual
Inspect individual items
See behavior, patterns, and outliers
Scale to enterprise workloads

SLIDE 37

DIFFERENT GRAPHS, DIFFERENT QUESTIONS

Uni (nodes: ip). Ex: Network mapping, “Is it safe to reboot this?”

Hyper (nodes: event, ip, user). Ex: Incident response, “Did this escalate?”

Multi (nodes: ip, user). Ex: SSH trails, “Is a user crossing zones?”
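The same event log can be turned into different graph shapes. A minimal sketch, assuming event records with hypothetical `src_ip`, `dst_ip`, and `user` fields: the uni graph links IPs directly, while the hypergraph view makes each event a node connected to every entity it mentions (similar in spirit to Graphistry's hypergraph view, but not its API).

```python
def uni_edges(events):
    """Uni graph: one node type (ip); an edge per src_ip -> dst_ip connection."""
    return {(e["src_ip"], e["dst_ip"]) for e in events}

def hyper_edges(events, entity_fields=("src_ip", "dst_ip", "user")):
    """Hypergraph view: each event becomes a node linked to the entities it mentions."""
    edges = []
    for i, e in enumerate(events):
        event_node = f"event:{i}"
        for field in entity_fields:
            if e.get(field) is not None:
                # Prefix with the entity type so an IP and a user never collide.
                kind = "ip" if field.endswith("ip") else field
                edges.append((event_node, f"{kind}:{e[field]}"))
    return edges

events = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "user": "alice"},
    {"src_ip": "10.0.0.2", "dst_ip": "10.0.0.3", "user": "alice"},
]
print(sorted(uni_edges(events)))
print(len(hyper_edges(events)))  # 6: two events x three entities each
```

In the hypergraph, `user:alice` and `ip:10.0.0.2` each connect both events, which is how a trail across hosts shows up as a visible path.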

SLIDE 38

GRAPHISTRY

Graph Visualization

Optimized Networking, GPU Analysis and ML, GPU Rendering

Hunting: Daily Anomalies
SecOps: Shadow IT Use
IR: Killchain Analysis
Fraud: Tracking Embezzlers
Threat Intel: Botnet Analysis

https://www.youtube.com/watch?v=NG4HaIpZ2K4&feature=youtu.be

SLIDE 39

CURRENT WORK

SLIDE 40

EXPAND GPU USAGE

More Data, Less Hardware

Chart: peak double-precision performance (TFLOPS), NVIDIA GPU vs x86 CPU, 2008-2017

Scaling up and out with GPU co-processors

SLIDE 41

EXPAND GPU USAGE

AD Platform to Cyber Security

Cyber Security Analytics Platform

SLIDE 42

GPU DATA FRAME

Evolution of Performance

Pre-GOAI: H2O.ai, Graphistry, Anaconda, Gunrock, BlazingDB, and MapD pass data through the CPU, with a copy & convert at every hand-off between apps (App A, App B)

GPU Data Frame (Apache Arrow): H2O.ai, Anaconda, Gunrock, Graphistry, BlazingDB, and MapD share one data frame on the GPU; no copy & converts, full interoperability
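The difference between copy & convert and a shared data frame can be shown with Python's buffer protocol, used here only as a CPU-side analogy for the GPU Data Frame and Apache Arrow (neither is used in the sketch). Two hypothetical apps read one column: the first receives a converted copy, the second a zero-copy view of the same memory.

```python
from array import array

# One column of a data frame, stored as contiguous float64 values.
column = array("d", [1.0, 2.0, 3.0, 4.0])

def app_a_copy_convert(col):
    """Pre-GOAI style: convert to the app's own format (a Python list), i.e. copy."""
    return list(col)

def app_b_zero_copy(col):
    """Shared-format style: a memoryview over the same buffer, no copy made."""
    return memoryview(col)

copied = app_a_copy_convert(column)
view = app_b_zero_copy(column)

column[0] = 100.0          # a producer updates the shared column in place
print(copied[0], view[0])  # 1.0 100.0: only the zero-copy view sees the change
```

Each copy & convert costs memory bandwidth and time proportional to the data size; a shared columnar format lets every consumer read the producer's buffer directly.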

SLIDE 43

CYBERWORKS

CYBERWORKS SIEM SDK

Goals

Open Source Ecosystem & Select ISVs
Integration Points w/ leading security vendors
FireEye
Splunk
Palo Alto Networks

Purpose: A platform that allows analysts to hunt and analyze data at scale, faster than traditional big data solutions, to find unknown and zero-day threats. It will accelerate the threat detection ecosystem and harden cyber defense by using GPU ISVs and deep learning frameworks.

Purpose Built SDK For SIEM Analytics

SLIDE 44

CYBERWORKS ACTIVITIES

Continuous Improvement

Use GPU-accelerated databases to analyze data and improve hunting today, as well as to enrich and label data for deep learning.
Connect accelerated DBs to Splunk for event management, hunting, and exploration.
Use Graphistry and MapD to visualize the data for anomaly and threat detection in new ways.
The goal is to GPU-accelerate parts of Splunk through partnership and to connect/bolt on GPU DBs and Graphistry.
Use ML and graph analytics for feature extraction and behavioral analytics: an ensemble approach to detection.
Expand deep learning training as more data is labeled/classified and threats are caught faster, building on DL techniques used in GFN, other groups, and external ISVs.
Generalize deep learning for supervised and unsupervised anomaly and threat detection (insider, APT, DDoS, etc.) while building our own cyber security deep learning accelerator.
Use best practices from DriveWorks and other accelerators and SDKs as a reference architecture.
Leverage DL from other parts of the company to accelerate development as well.
While using Splunk Cloud to protect NVIDIA, we create a redundant path of data to enable R&D.

nvGRAPH

SLIDE 45

CYBERWORKS ARCHITECTURE

SecOps Data Sources, Ingest, Storage, Stream Processing, Batch Processing, Serving Layer, Notebook, Visualization, Graph Processing

cuSTINGER

Graph Visualization

Interactivity, Query Speed, Gunrock

Deep Learning, Machine Learning

SLIDE 46

CYBERWORKS HARDWARE

Scale-out Cluster, DGX Cluster, NAS, SIEM, Notebooks, End User, 3rd Party Apps

Messaging Queue

Accelerating your SIEM

SLIDE 47

GPU OPEN ANALYTICS INITIATIVE

http://gpuopenanalytics.com/

GPU Data Frame (GDF)

Ingest/Parse -> Exploratory Analysis -> Feature Engineering -> ML/DL Algorithms -> Grid Search -> Scoring -> Model Export

SLIDE 48

ANACONDA

Python ETL on GPU

Numba: A Python open-source just-in-time optimizing compiler that uses LLVM to produce native machine instructions. Primary contributor to PyGDF.

Dask: A flexible parallel computing library for analytic computing with dynamic task scheduling and big data collections. Primary contributor to Dask_GDF.

Jeremy Howard (Deep learning researcher & educator. Founder: fast.ai; Faculty: USF & Singularity University; previously CEO: Enlitic; President: Kaggle; CEO: Fastmail):
“Rewrote @scikit_learn PolynomialFeatures in @ContinuumIO Numba. Got a 40x speedup (would be bigger with more data!) 12 lines of code”

SLIDE 49

BLAZINGDB

Scale out Data Warehousing

SLIDE 50

GRAPHISTRY

Graph Visualization

Optimized Networking, GPU Analysis and ML, GPU Rendering

Hunting: Daily Anomalies
SecOps: Shadow IT Use
IR: Killchain Analysis
Fraud: Tracking Embezzlers
Threat Intel: Botnet Analysis

SLIDE 51

GUNROCK

GPU-Accelerated Graph Analytics Library

Multi-GPU optimized algorithms
Reduced cost and increased performance
Performance constantly improving

SLIDE 52

H2O.AI

H2O4GPU - GPU Machine Learning Library

SLIDE 53

BETTER DATA PIPELINES

User Defined Functions at Scale

https://github.com/gpuopenanalytics libgdf, pygdf, dask_gdf
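dask_gdf runs user-defined functions partition by partition across a data frame. The sketch below mimics that pattern on the CPU with the standard library: it splits a column into partitions and maps a hypothetical UDF over them concurrently. The function name echoes Dask's `map_partitions`, but this is an analogy for the execution model, not the pygdf or dask_gdf API.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(values, n_parts):
    """Split a column into at most n_parts contiguous partitions."""
    if not values:
        return []
    size = -(-len(values) // n_parts)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]

def bytes_to_kb(part):
    """Hypothetical UDF, applied to one whole partition at a time."""
    return [b / 1024 for b in part]

def map_partitions(values, udf, n_parts=4):
    """Apply udf to each partition concurrently, then concatenate results in order."""
    parts = partition(values, n_parts)
    if not parts:
        return []
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        results = list(pool.map(udf, parts))
    return [x for part in results for x in part]

column = [1024, 2048, 512, 4096, 3072]
print(map_partitions(column, bytes_to_kb, n_parts=2))  # [1.0, 2.0, 0.5, 4.0, 3.0]
```

Threads keep the sketch short; under CPython's GIL they show the partitioning pattern rather than a real speed-up, which on GPUs comes from running each partition's UDF as a compiled kernel.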

SLIDE 54

BETTER DATA PIPELINES

HIVE to BlazingDB

SLIDE 55

BETTER DATA PIPELINES

More Models

nvGRAPH

https://github.com/h2oai/h2o4gpu

# edges = E * 2^S ~34M
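The edge-count formula matches the convention of R-MAT/Kronecker graph generators such as Graph500's, where E is the edge factor and S the scale (2^S vertices). The slide does not state the values used; assuming the Graph500 default edge factor E = 16 with S = 21 gives 16 × 2^21 = 33,554,432, about 34M edges. A minimal R-MAT generator sketch, assuming Graph500's quadrant probabilities (a, b, c) = (0.57, 0.19, 0.19):

```python
import random

def rmat_edges(scale, edge_factor, a=0.57, b=0.19, c=0.19, seed=0):
    """Generate edge_factor * 2**scale directed edges with the R-MAT recursion.

    At each of `scale` levels, pick one quadrant of the adjacency matrix with
    probabilities a, b, c, and d = 1 - a - b - c; the quadrant sets one bit of
    the source and one bit of the destination vertex id.
    """
    rng = random.Random(seed)
    n_edges = edge_factor * 2 ** scale
    edges = []
    for _ in range(n_edges):
        src = dst = 0
        for _ in range(scale):
            r = rng.random()
            src_bit = 1 if r >= a + b else 0  # quadrants c and d: lower half
            dst_bit = 1 if (a <= r < a + b) or r >= a + b + c else 0  # quadrants b and d
            src = (src << 1) | src_bit
            dst = (dst << 1) | dst_bit
        edges.append((src, dst))
    return edges

edges = rmat_edges(scale=10, edge_factor=16)
print(len(edges))    # 16384 = 16 * 2**10
print(16 * 2 ** 21)  # 33554432, about 34M edges at scale 21
```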

SLIDE 56

JOIN THE REVOLUTION

Everyone Can Help! APACHE ARROW APACHE PARQUET GPU Open Analytics Initiative

https://arrow.apache.org/ @ApacheArrow https://parquet.apache.org/ @ApacheParquet http://gpuopenanalytics.com/ @Gpuoai Integrations, feedback, documentation support, pull requests, new issues, or donations welcomed!

SLIDE 57

Joshua Patterson @datametrician