Pattern Detection in Computer Networks Using Robust Principal Component Analysis



slide-1
SLIDE 1

Pattern Detection in Computer Networks Using Robust Principal Component Analysis

CS525 URBAN NETWORKS: METHODS AND ANALYSIS 4-19-2017

Randy Paffenroth

Associate Professor of Mathematical Sciences, and Associate Professor of Computer Science Data Science Program Worcester Polytechnic Institute

slide-2
SLIDE 2

Urban networks vs. Computer networks

By Howchou (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

By Hibernia Networks (Hibernia Networks) [Public domain], via Wikimedia Commons

slide-3
SLIDE 3
What am I going to talk about?

  • Robust Principal Component Analysis as applied to analysis of computer networks.
  • In effect, I am interested in “semi-supervised” learning where much of the data is unlabeled and has to “speak for itself”.
  • Attempt to justify why I think this is an interesting way to think about network analysis.
  • Show some examples in this area.
  • Beware! I am a mathematician and, morally, I can’t give a talk without any equations :-)

slide-4
SLIDE 4

Theory and Practice

slide-5
SLIDE 5

Where we find our inspiration for practice...

  • Stuxnet, Flame, Target Inc., Neiman Marcus, Affinity Gaming, Dairy Queen...
  • Et tu, Dairy Queen!? This is when things got personal...
  • "Axiom" 1: Unless some sensor, or collection of sensors, is affected by an attack then you can't detect it.
  • I.e. either the marginal or joint probability density function of the sensors must be different in a statistically meaningful way, conditioned on the absence or presence of an attack.
  • "Axiom" 2: The most dangerous attacks are those for which you don’t have a signature.
  • Virus detection and intrusion detection systems (IDS) do a good job of detecting attacks for which a signature is known, but have nothing to say if the attack has no signature.
  • "Theorem": Therefore the most dangerous attacks can only be detected by sensors which were not designed to detect that threat.
  • You have to get lucky and have a sensor that detects the new attack even though it was not designed to do so.
  • "Corollary": You want lots of sensors!
  • But, how do you fuse them? Even once you have a way of fusing the data, how do you avoid being overwhelmed with false alarms?
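"Axiom" 1 can be made concrete with a small experiment. The sketch below is only an illustration under assumed data: the packet-rate numbers, the function name `permutation_pvalue`, and the choice of a difference-of-means permutation test are all hypothetical and are not taken from the talk. It compares a sensor's readings with and without an attack, once for an attack that shifts the sensor and once for an attack that leaves it untouched.

```python
import numpy as np

def permutation_pvalue(quiet, attack, n_perm=2000, seed=0):
    """Permutation test on the difference of means (one simple choice of statistic)."""
    rng = np.random.default_rng(seed)
    quiet = np.asarray(quiet, dtype=float)
    attack = np.asarray(attack, dtype=float)
    observed = abs(attack.mean() - quiet.mean())
    pooled = np.concatenate([quiet, attack])
    hits = 0
    for _ in range(n_perm):
        # Shuffle the labels: under "no difference" the split is arbitrary.
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[quiet.size:].mean() - shuffled[:quiet.size].mean())
        hits += int(diff >= observed)
    return (hits + 1) / (n_perm + 1)

# Hypothetical sensor readings (e.g. packet rates); all numbers are made up.
rng = np.random.default_rng(42)
quiet = rng.normal(100.0, 10.0, size=200)          # no attack
loud_attack = rng.normal(130.0, 10.0, size=200)    # attack that shifts the sensor
silent_attack = rng.normal(100.0, 10.0, size=200)  # attack the sensor does not respond to

p_loud = permutation_pvalue(quiet, loud_attack)
p_silent = permutation_pvalue(quiet, silent_attack)
print(f"p-value, sensor affected by attack:   {p_loud:.4f}")
print(f"p-value, sensor unaffected by attack: {p_silent:.4f}")
```

A small p-value says the two conditional distributions differ in a statistically meaningful way. When the attack does not change the sensor's distribution, no test on that sensor alone can be expected to flag it, which is the point of the axiom.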

slide-6
SLIDE 6

Advanced Persistent Threats

[Diagram labels: Reconnaissance, Point of entry, Command and control, Botnet, Pivoting, Exfiltration]

slide-7
SLIDE 7

What do we mean by a sensor?

[Figure: sensor response versus time, with "No attack" and "Attack" intervals marked]

slide-8
SLIDE 8

What do we mean by a sensor?

[Figure: sensor response versus time, with "No attack" and "Attack" intervals marked]

slide-9
SLIDE 9

What do we mean by a sensor?

[Figure: sensor response versus time, with "No attack" and "Attack" intervals marked]

slide-10
SLIDE 10

What kinds of sensors?

  • Already talked about packet rates.
  • Port, CPU, memory activity, etc.
  • Intrusion Detection Systems
  • Bro, Snort, Suricata, etc.
  • More "complicated" sensors such as those inspired by information theory.

  • Packet payload entropy
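As an illustration of an information-theoretic sensor, here is a minimal sketch of a packet payload entropy calculation. The function name `payload_entropy` and the sample payloads are assumptions made for this example; the talk does not say how its entropy sensor was computed.

```python
import math
from collections import Counter

def payload_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not payload:
        return 0.0
    counts = Counter(payload)
    n = len(payload)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical payloads: constant padding, plain text, and every byte value once.
padding = b"\x00" * 64
text = b"GET /index.html HTTP/1.1"
uniform = bytes(range(256))

for name, data in [("padding", padding), ("text", text), ("uniform", uniform)]:
    print(f"{name:8s} {payload_entropy(data):.3f} bits/byte")
```

Compressed or encrypted payloads tend to sit near the 8 bits/byte ceiling, while text and padding sit well below it, so a shift in this value over time is one more sensor reading that can be placed in the data matrix.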

Butun, Ismail, Salvatore D. Morgera, and Ravi Sankar. "A survey of intrusion detection systems in wireless sensor networks." Communications Surveys & Tutorials, IEEE 16.1 (2014): 266-282.

Moosavi, M. R., et al. "Entropy based fuzzy rule weighting for hierarchical intrusion detection." Iranian Journal of Fuzzy Systems 11.3 (2014): 77-94.

slide-11
SLIDE 11

Best with an example!

slide-12
SLIDE 12

Data matrix

slide-13
SLIDE 13

First order anomaly

slide-14
SLIDE 14

Sparse correlations? Latent signal model...

$Y = AU + BV + N$

slide-15
SLIDE 15

A simple second order anomaly

slide-16
SLIDE 16

Second order theory!

In our work we focused on analyzing the second order statistics of the data by way of its covariance or normalized cross correlation matrix.

Interesting questions:

  • Correlation versus covariance?
  • More refined calculations such as Maximum Likelihood covariance estimation (e.g. using convex optimization).
  • Well defined for missing data and different data types (e.g. point-biserial correlation).
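The "correlation versus covariance" question can be seen on a toy data matrix. The sketch below assumes a layout with one row per sensor and one column per time sample; that layout, the synthetic data, and the variable names are assumptions for illustration, not details from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_samples = 4, 500

# Hypothetical data matrix: rows are sensors, columns are time samples.
Y = rng.normal(size=(n_sensors, n_samples))
Y[1] = 0.8 * Y[0] + 0.2 * Y[1]  # sensor 1 mostly follows sensor 0
Y[2] *= 50.0                    # sensor 2 is measured in much larger units

cov = np.cov(Y)        # covariance: depends on each sensor's units
corr = np.corrcoef(Y)  # normalized cross correlation: unit-free, entries in [-1, 1]

# The correlation matrix is the covariance rescaled by the standard deviations.
std = np.sqrt(np.diag(cov))
rescaled = cov / np.outer(std, std)

print(np.round(cov, 2))
print(np.round(corr, 2))
```

In the covariance matrix the large-unit sensor dominates through its variance alone, while in the correlation matrix only the coupling between sensors 0 and 1 stands out. Which one is the better input for anomaly detection depends on whether the sensors' scales carry meaning.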

slide-17
SLIDE 17

Second order anomaly

slide-18
SLIDE 18

Standing on the shoulders of giants

  • Over the past 4-5 years there has been a flurry of activity on this problem, much of which we suspect the current audience is aware of.
  • Ideas such as matrix completion, robust principal component analysis, and robust matrix completion have generated a lot of interest, including among us!

  • Z. Zhou, X. Li, J. Wright, E. Candès, and Y. Ma, “Stable Principal Component Pursuit,” ISIT 2010: Proceedings of IEEE International Symposium on Information Theory, 2010.
  • E. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?,” J. ACM, vol. 58, pp. 11:1–11:37, June 2011.
  • E. Candès and Y. Plan, “Matrix Completion With Noise,” Proceedings of the IEEE, vol. 98, no. 6, p. 11, 2009.
  • E. Candès and B. Recht, “Exact matrix completion via convex optimization,” Foundations of Computational Mathematics, vol. 9, pp. 717–772, December 2009.
  • C. Eckart and G. Young, “The approximation of one matrix by another of lower rank,” Psychometrika, vol. 1, no. 3, pp. 211–218, 1936.

Matrix completion: The Netflix problem!

Robust principal component analysis

  • R. Paffenroth, P. Du Toit, R. Nong, L. Scharf, A. Jayasumana, and V. Bandara, “Space-time signal processing for distributed pattern detection in sensor networks,” IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 1, February 2013.
  • P. Du Toit, R. Paffenroth, and R. Nong, “Stability of Principal Component Pursuit with Point-wise Error Constraints,” in preparation, 2012.

What I am interested in :-)

slide-19
SLIDE 19

Singular values

$M = L + S$
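A quick numerical look at why singular values matter for a matrix that is a low-rank part L plus a sparse part S. The sizes, rank, corruption rate, and magnitudes below are arbitrary choices for this sketch and are not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
n, rank = 40, 2

# Hypothetical low-rank part: product of two thin random factors.
L = rng.normal(size=(n, rank)) @ rng.normal(size=(rank, n))

# Hypothetical sparse part: about 5% of entries carry a large spike.
S = np.zeros((n, n))
mask = rng.random((n, n)) < 0.05
S[mask] = rng.choice([-10.0, 10.0], size=mask.sum())

M = L + S

sv_L = np.linalg.svd(L, compute_uv=False)
sv_M = np.linalg.svd(M, compute_uv=False)

print("leading singular values of L:", np.round(sv_L[:5], 3))
print("leading singular values of M:", np.round(sv_M[:5], 3))
print("rank of L:", np.linalg.matrix_rank(L), " rank of M:", np.linalg.matrix_rank(M))
```

L has only `rank` non-negligible singular values, while a handful of sparse spikes is enough to make M look full rank. Plain PCA on M sees the spikes smeared across every component, which is the motivation for splitting M into L and S explicitly.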

slide-20
SLIDE 20

The appropriate structures appear all over the place in real data!

Insurance Satisfaction Surveys

Elisa Rosales

Singular Values of Matrices

slide-21
SLIDE 21

The appropriate structures appear all over the place in real data!

Amazon product communities

SKAION Internet Attack (e.g., DDoS) simulations

Rakesh Biradar

Singular Values of Matrices

slide-22
SLIDE 22

Abilene Internet2 Backbone

slide-23
SLIDE 23

Abilene Internet2 Backbone

slide-24
SLIDE 24

Abilene Internet2 Backbone

slide-25
SLIDE 25

Abilene Internet2 Backbone

slide-26
SLIDE 26

Abilene Internet2 Backbone

slide-27
SLIDE 27

Enough math for the moment, let's try a really practical example

  • DARPA Lincoln Lab Intrusion Detection Evaluation Data Set

➢ IPsweep of the AFB from a remote site
➢ Probe of live IPs to look for the sadmind daemon running on Solaris hosts
➢ Break-ins via the sadmind vulnerability, both successful and unsuccessful on those hosts
➢ Installation of the trojan mstream DDoS software on three hosts at the AFB
➢ Launching the DDoS

https://www.ll.mit.edu/ideval/data/2000/LLS_DDOS_1.0.html

slide-28
SLIDE 28

Feature generation

Raw PCAP files Derived features

slide-29
SLIDE 29

Important idea... don't blindly follow theory

$$L, S = \arg\min_{L_0, S_0} \|L_0\|_* + \lambda \|S_0\|_1 \quad \text{s.t.} \quad P_\Omega(L_0 + S_0) = P_\Omega(M)$$
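For readers who want to experiment, here is a minimal sketch of one common way to solve the problem of minimizing the nuclear norm of L plus λ times the entrywise 1-norm of S subject to L + S = M: an alternating-directions (ADMM-style) iteration built from singular value thresholding and soft thresholding. It assumes every entry of M is observed, so the projection onto observed entries is the identity. The function names, the default λ = 1/sqrt(max(m, n)), the step size, and the synthetic test matrix are assumptions of this sketch, not necessarily what was used in the talk.

```python
import numpy as np

def shrink(X, tau):
    """Soft thresholding: move every entry toward zero by tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Singular value thresholding: soft-threshold the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def rpca(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split M into low-rank L and sparse S (fully observed M assumed)."""
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))        # a commonly quoted default
    if mu is None:
        mu = m * n / (4.0 * np.abs(M).sum())  # a commonly used step size
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                      # Lagrange multiplier
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S

# Synthetic check: rank-2 matrix plus about 5% large sparse corruptions.
rng = np.random.default_rng(0)
n = 60
L0 = rng.normal(size=(n, 2)) @ rng.normal(size=(2, n))
S0 = np.zeros((n, n))
mask = rng.random((n, n)) < 0.05
S0[mask] = rng.choice([-10.0, 10.0], size=mask.sum())
M = L0 + S0

L_hat, S_hat = rpca(M)
rel_err = np.linalg.norm(L_hat - L0) / np.linalg.norm(L0)
print(f"relative error in recovered L: {rel_err:.2e}")
```

The single knob λ trades the two terms against each other: a small λ lets S absorb almost everything, a large λ pushes everything into L. That is why it is worth tuning λ on data rather than accepting the theoretical default blindly.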

slide-30
SLIDE 30

Lincoln Labs DARPA Intrusion Detection Data Set - PCA

  • IP sweep from a remote site,
  • a probe of live IP addresses looking for a running Sadmind daemon,
  • and then an exploitation of a Sadmind vulnerability.
slide-31
SLIDE 31

Lincoln Labs DARPA Intrusion Detection Data Set - Comparison

PCA – Too many false negatives

RPCA – Too many false positives

slide-32
SLIDE 32

Lincoln Labs DARPA Intrusion Detection Data Set - Comparison

Too “thick”, too “thin”, just right λ

slide-33
SLIDE 33

Key idea

  • Semi-supervised learning
  • PCA and RPCA have many parameters
  • Far too many to train on reasonably sized collections of attacks
  • Only train a few important parameters on supervised training data
    – Like λ
  • Gives better generalization and less over-fitting
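The idea of training only λ on labeled data can be sketched as a one-dimensional search: run the unsupervised decomposition for a few candidate values of λ, score the flagged entries against a small set of labels, and keep the best value. Everything below is a hypothetical illustration: the helper names, the F1 score as the selection criterion, the 0.5 flagging threshold, the candidate grid, and the synthetic data are assumptions of this sketch, and the compact `rpca` routine assumes a fully observed matrix.

```python
import numpy as np

def shrink(X, tau):
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca(M, lam, max_iter=500, tol=1e-6):
    """Compact ADMM-style split of M into low-rank L and sparse S."""
    mu = M.size / (4.0 * np.abs(M).sum())
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * shrink(s, 1.0 / mu)) @ Vt
        S = shrink(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(M):
            break
    return L, S

def f1_score(pred, truth):
    """F1 of boolean predictions against boolean labels (0.0 if nothing is hit)."""
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    return 0.0 if tp == 0 else float(2 * tp / (2 * tp + fp + fn))

def select_lambda(M, labels, candidates, threshold=0.5):
    """Pick the lambda whose sparse component best matches the labeled anomalies."""
    scores = {}
    for lam in candidates:
        _, S = rpca(M, lam)
        scores[lam] = f1_score(np.abs(S) > threshold, labels)
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical labeled training matrix: rank-2 background plus labeled spikes.
rng = np.random.default_rng(3)
n = 50
background = rng.normal(size=(n, 2)) @ rng.normal(size=(2, n))
labels = rng.random((n, n)) < 0.05
M = background.copy()
M[labels] += rng.choice([-8.0, 8.0], size=labels.sum())

candidates = [0.01, 0.05, 1.0 / np.sqrt(n), 0.5, 2.0]
best_lam, scores = select_lambda(M, labels, candidates)
for lam, score in scores.items():
    print(f"lambda = {lam:.3f}  F1 = {score:.3f}")
print("selected lambda:", round(best_lam, 3))
```

Only one number is fitted to the labels here; the low-rank and sparse structure itself is still learned from unlabeled data. In practice the selected λ would then be applied to traffic that was not part of the labeled set.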

slide-34
SLIDE 34

Key idea

  • Semi-supervised learning

Training data for λ

Algorithm not trained on this attack vector!

slide-35
SLIDE 35

Other fun problems: LANDER

The LANDER project measures the number of “active” (i.e. responding to pings) hosts on subnets across the Internet.

Same structure appears! Can be used to pick out all LG DACOM subnets in Europe.

Subnets in anomaly: [1, 210, 44, 0] [1, 210, 173, 0] [1, 219, 34, 0] [1, 210, 206, 0] [1, 218, 60, 0] [1, 218, 121, 0] [1, 218, 173, 0]

[Figure: number of responding hosts versus test round number]

slide-36
SLIDE 36

Other fun problems: CAIDA

Here is a small section of the 1.1 petabyte (and growing) CAIDA data set. It contains measurements of the worldwide Internet connectivity and latency (traceroute).

Same structure appears!

[Figure: normalized latency versus time]

slide-37
SLIDE 37

Big Data

By Holger Motzkau 2010, Wikipedia/Wikimedia Commons (cc-by-sa-3.0), CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=11115505

Math Computer Science

slide-38
SLIDE 38

Equivalent formulation

slide-39
SLIDE 39

Big Data

Original algorithm. Rank=2, probability of corruption=2%, observations=10m

and new algorithm!

  • R. Paffenroth, R. Nong, P. Du Toit, On covariance structure in noisy, big data. Proceedings Vol. 8857, Signal and Data Processing of Small Targets, October 2013, Oliver E. Drummond; Richard D. Teichgraeber, Editors.

slide-40
SLIDE 40

Big Data

Hey, wait a minute...

slide-41
SLIDE 41

How can this be? Math helps...


slide-42
SLIDE 42

How can this be? Implementation helps...

Think about as distributed databases.

slide-43
SLIDE 43

Distributed databases.

Ali Benamara

slide-44
SLIDE 44

One meta-thought: The "Iron man" approach Person AND Machine

http://www.independent.co.uk/arts-entertainment/films/reviews/iron-man-3-review-a-big-hand-for-downey-jr-but-movie-lacks-dramatic-mettle-8588873.html

slide-45
SLIDE 45

Questions?

$Y = AU + BV + N$