10 steps forward & 5 steps backward DeepSec 2011 Sourabh Satish - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Behavioral Security: 10 steps forward & 5 steps backward

DeepSec 2011

Sourabh Satish

Distinguished Engineer/Chief Architect, Symantec

slide-2
SLIDE 2

Agenda

1) Threat Landscape
2) Behavioral Security Overview
3) Traditional rules based behavioral security
4) Machine Learning – Supervised and Unsupervised
5) Machine Learning for behavioral security
6) Real world examples
7) Conclusion

slide-3
SLIDE 3

Threat Landscape

Motivation?

slide-4
SLIDE 4

Threat Landscape

2010-2011 Trends

  • Social Networking + social engineering = compromise
  • Attack Kits get a caffeine boost
  • Targeted Attacks continued to evolve
  • Hide and Seek (zero-day vulnerabilities and rootkits)
  • Mobile Threats increase

slide-5
SLIDE 5

Threat Landscape

Why is it hard to stop attacks?

Many reasons, one being: malware authors have switched tactics

From: a mass distribution of relatively few threats, e.g.

  • Storm made its way onto millions of machines across the globe

To: a micro distribution model, e.g.

  • The average Vundo variant is distributed to 18 Symantec users!
  • The average Harakit variant is distributed to 1.6 Symantec users!

286M+ distinct new threats discovered last year!

What are the odds a security vendor will discover all these threats?

slide-6
SLIDE 6

Analyzing the Problem

“Unique” threats are unique at the byte-level

  • Hacker develops threat
  • Hacker uses tool to obfuscate executable
  • Tool generates clones that differ at the byte-level

Changes at the byte-level evade traditional file-based pattern-matching engines

[Figure: a readable threat description ("This is my first virus that I plan to use to steal key and passwords from unsuspecting victims.") next to obfuscated clones shown as unreadable text]
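
The byte-level point above can be illustrated with a minimal sketch. It assumes a stand-in "obfuscator" (the hypothetical `make_clone`) that only appends junk bytes, and uses SHA-256 as the file fingerprint; real packers and obfuscators transform far more than a trailing byte.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 fingerprint of a file's bytes."""
    return hashlib.sha256(data).hexdigest()


def make_clone(payload: bytes, padding: bytes) -> bytes:
    # Hypothetical stand-in for an obfuscation tool: the payload is
    # untouched, only trailing junk bytes differ between clones.
    return payload + padding


original = b"steal_passwords();"
clones = [make_clone(original, bytes([i])) for i in range(3)]

# Every clone carries the same payload but has its own fingerprint.
fingerprints = {sha256_hex(clone) for clone in clones}
print(len(fingerprints), "distinct fingerprints for", len(clones), "clones")
```

A signature keyed on any one of these fingerprints would miss the other clones, although all of them contain the same payload.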

slide-7
SLIDE 7

Examples of Threat Cloning

Malware Generators & Obfuscators

slide-8
SLIDE 8

To the Cloud…

slide-9
SLIDE 9

Examples of Threat Cloning

Misleading Applications

  • Re-Skinning – Binary file is unchanged except for user-visible strings
  • Number of clones: 49

slide-10
SLIDE 10
Analyzing the Problem

Are these “unique” threats really unique?

  • Bytes change. But how about the behaviors of these threats?
  • Password Stealers will continue to steal passwords
  • Spam Bots will continue to send spam
  • Rogue AntiVirus will continue to pop up misleading messages

…behaviors don’t change…

slide-11
SLIDE 11

Solving the Problem

Behavior-based Detection

An engine that ignores what the threat looks like, but detects threats based on what the threat does

slide-12
SLIDE 12
Clarifying the terminology

Detection vs. Prevention vs. Protection

  • Detection is “after” the fact

– After the sample has run on the system, you analyze the impact, conclude whether the action taken was malicious, then remediate the threat and reverse its persistent system changes.

  • Prevention is “before” the fact

– You conclude that the action a sample is about to take is malicious and prevent it from happening in the first place. You remediate the threat, and only a minimal restore of system settings is needed.

  • Protection

– Both detection based and prevention based technologies can offer protection.

  • Challenges:

– Detection based approach: Can all changes be reversed? File modified on disk?
– Prevention based approach: Which action do you block and inspect? What is the performance overhead?

  • Debatable!

– Blocked the 5th event and hence prevented the 6th, most impactful event!
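
The difference between the two approaches can be sketched on a toy event stream. Everything here is an assumption made for illustration: the event strings, the `is_malicious` verdict and the two runner functions are hypothetical, not part of any product.

```python
from typing import List, Tuple

Event = str


def is_malicious(event: Event) -> bool:
    # Hypothetical verdict; a real product would use rules or a model.
    return event.startswith(("autostart:", "inject:"))


def run_with_prevention(events: List[Event]) -> Tuple[List[Event], List[Event]]:
    """Judge each action before it happens; stop at the first malicious one."""
    applied: List[Event] = []
    for event in events:
        if is_malicious(event):
            return applied, [event]  # blocked, later events never happen
        applied.append(event)
    return applied, []


def run_with_detection(events: List[Event]) -> Tuple[List[Event], List[Event]]:
    """Let the sample run, then convict it and list the changes to reverse."""
    applied = list(events)
    convicted = any(is_malicious(event) for event in applied)
    to_reverse = applied if convicted else []
    return applied, to_reverse


events = ["read:config", "write:temp.dat", "autostart:run_key", "inject:explorer.exe"]
prevented = run_with_prevention(events)
detected = run_with_detection(events)
print("prevention applied", len(prevented[0]), "events, blocked", prevented[1])
print("detection must reverse", len(detected[1]), "events")
```

In this sketch prevention leaves two harmless changes behind, while detection has to undo everything the sample did, which is where the "can all changes be reversed?" question comes from.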

slide-13
SLIDE 13

Legacy rules based behavioral security

slide-14
SLIDE 14
The Legacy Solution

Rules based behavioral security

  • Rules to identify malicious activity and take action

slide-15
SLIDE 15

The legacy solution

Rules based behavioral security

  • Simple and intuitive model (Expert System)

– Domain experts know how to distinguish between good and bad
– They analyze the malware, spot the trends/patterns and write rules
– Product ships with a default set of rules, and rules are updated regularly
– The product may also let users express new rules in the product

  • Applicability

– Many security products, especially enterprise products, use this model
– Maybe the only answer for some threat scenarios

  • Pros

– Broader coverage for variants, precise reasoning for detection, name the threat, relevant actions

  • Cons

– Scalability, domain expertise

Low error rate?
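
A rules based engine of the kind described on this slide might look roughly like the following sketch. The rule names, behavior flags and actions are all hypothetical; the point is only that each expert-written rule carries a threat name, a condition and an action.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Behavior = Dict[str, bool]


@dataclass
class Rule:
    name: str                           # threat name reported on a match
    action: str                         # e.g. "block" or "alert"
    matches: Callable[[Behavior], bool]  # condition written by a domain expert


# Hypothetical default rule set; a product would ship and update many more.
RULES: List[Rule] = [
    Rule("Hypothetical.PasswordStealer", "block",
         lambda b: b.get("reads_browser_credentials", False)
         and b.get("does_network", False)),
    Rule("Hypothetical.Persistence", "alert",
         lambda b: b.get("registers_autostart", False)
         and not b.get("has_ui", False)),
]


def evaluate(behavior: Behavior, rules: List[Rule] = RULES) -> Optional[Rule]:
    """Return the first rule that matches the observed behavior, if any."""
    for rule in rules:
        if rule.matches(behavior):
            return rule
    return None


observed = {"reads_browser_credentials": True, "does_network": True}
hit = evaluate(observed)
print(hit.name, hit.action)
```

The pros listed above show up directly: a match names the threat and carries a relevant action. The cons show up too, since every rule has to be written and maintained by hand.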

slide-16
SLIDE 16
Addressing the challenge

Scalability

  • Fact:

– Behavioral variants are far fewer than file variants

  • New SHA256 = a file variant OR really a new malware?

– Same malware may be packed differently
– Same malware may be skinned differently

  • Answer:

– Analyze the threat?

AUTOMATION, COLLECT DATA, DATA MINING
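
One way to picture "behavioral variants vs. file variants" is to group samples by what they do instead of by their hash. The sample identifiers and behavior labels below are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Sample = Tuple[str, FrozenSet[str]]  # (file fingerprint, observed behaviors)

# Hypothetical data: four distinct fingerprints, two behavior profiles.
samples: List[Sample] = [
    ("sha256_aaa", frozenset({"autostart", "http", "inject"})),
    ("sha256_bbb", frozenset({"autostart", "http", "inject"})),
    ("sha256_ccc", frozenset({"autostart", "http", "inject"})),
    ("sha256_ddd", frozenset({"popup", "http"})),
]


def group_by_behavior(items: List[Sample]) -> Dict[FrozenSet[str], List[str]]:
    """Collect the file fingerprints that share an identical behavior profile."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for fingerprint, behaviors in items:
        groups[behaviors].append(fingerprint)
    return dict(groups)


groups = group_by_behavior(samples)
print(len(samples), "file variants,", len(groups), "behavioral variants")
```

Exact-match grouping is the simplest possible choice; real behavior logs are noisier, which is why the talk goes on to data mining rather than equality checks.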

slide-17
SLIDE 17

Machine Learning - Basics

slide-18
SLIDE 18
Machine Learning

Learning by Example

  • A new approach to AI is to get the computer to program itself by showing it examples (data or past experiences) of the behavior we want!

– This is the learning approach to AI
– Often hand programming is not possible or not feasible, as with face detectors, handwriting readers, etc.

[Figure labels: Name, Face]

slide-19
SLIDE 19
Machine Learning

What is Machine Learning?

  • Central Question

“How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?”

  • What is the learning problem?

A process learns with respect to <T, P, E> if it

  • Improves its performance P
  • At task T
  • Through experience E

“The Discipline of Machine Learning”, T. Mitchell (2006)

  • Machine Learning algorithms discover the relationships between the variables of a system (input, output and hidden) from direct samples of the system

slide-20
SLIDE 20
Machine Learning

Building Blocks

  • Computer Science (AI)

– How can we build machines that solve problems, and which problems are inherently tractable/intractable?

  • Statistics

– What can be inferred from data plus a set of modeling assumptions, with what reliability?

  • Cognitive Science

– How does the mind process information in faculties such as perception, language, memory, reasoning and emotion?

  • Information Theory

– How can we quantify, process, store and communicate data efficiently?

ML builds on all these questions but is a distinct question

slide-21
SLIDE 21

Machine Learning

Categories of Machine Learning

  • Supervised Learning

– Given examples of inputs and corresponding desired outputs, predict outputs on future inputs
– Given input-output pairs <x_i, y_i>, learn a function f with f(x_i) = y_i for all i that makes a good guess at y for unseen x
– Labeled data
– Example: Classification, Regression

  • Unsupervised Learning

– Given only inputs, automatically discover representations, features, structure, etc.
– Unlabeled data
– Example: Clustering, Outlier detection

  • Semi Supervised Learning

– Learning from a combination of labeled and unlabeled data
– Applied where there is little labeled data and an abundance of unlabeled data
– Example: supervised learning problems like video indexing, bioinformatics

  • Reinforcement Learning

– Given a sequence of inputs, actions from a fixed set, and scalar rewards/punishments, learn to select action sequences that maximize expected reward
– Example: Robotics
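
As a small illustration of the supervised case (learn from labeled pairs, then guess the label of an unseen input), here is a sketch of a 1-nearest-neighbour classifier over binary feature vectors. The feature order, training data and labels are invented for the example, and 1-NN is just one simple algorithm chosen here; the slides do not prescribe it.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[int], str]  # (feature vector x_i, label y_i)


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_1nn(training: List[Example], x: Sequence[int]) -> str:
    """Return the label of the training example closest to x."""
    _, nearest_label = min(training, key=lambda example: hamming(example[0], x))
    return nearest_label


# Hypothetical features: (does_network, registers_autostart, modifies_pe_files, has_ui)
training: List[Example] = [
    ((1, 1, 0, 0), "malicious"),
    ((1, 0, 1, 0), "malicious"),
    ((0, 0, 0, 1), "clean"),
    ((0, 1, 0, 1), "clean"),
]

print(predict_1nn(training, (1, 1, 1, 0)))
print(predict_1nn(training, (0, 0, 1, 1)))
```

Neither query vector appears in the training set, so the answers are guesses for unseen x in the sense used on the slide.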

slide-22
SLIDE 22

Machine Learning

Steps

1) Pick a feature representation for your task

– Inputs and outputs, feature identification (power to discriminate)

2) Compile data
3) Choose a machine learning algorithm
4) Train the algorithm
5) Evaluate the results

Probably: go to (1)

[Figure: data mining process diagram. Stages: Selection & Sampling, Preprocessing & Cleaning, Transformation & Reduction, Data Mining, Interpretation/Evaluation. Artifacts: Database/data warehouse, Target data, Cleaned data, Transformed data, Patterns/model, Knowledge, Performance system]
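
Step 5 (evaluate the results) can be sketched as counting hits and misses on held-out labeled data. The predictor, feature names and test set below are hypothetical; detection rate and false positive rate are used because they are the two numbers a security product usually has to trade off.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, int]
Labeled = Tuple[Features, str]


def confusion(predict: Callable[[Features], str], test_set: List[Labeled]) -> Dict[str, int]:
    """Count true/false positives and negatives, with 'malicious' as positive."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for features, label in test_set:
        if predict(features) == "malicious":
            counts["tp" if label == "malicious" else "fp"] += 1
        else:
            counts["fn" if label == "malicious" else "tn"] += 1
    return counts


def rates(counts: Dict[str, int]) -> Tuple[float, float]:
    """Return (detection rate, false positive rate); 0.0 when a class is empty."""
    positives = counts["tp"] + counts["fn"]
    negatives = counts["fp"] + counts["tn"]
    detection = counts["tp"] / positives if positives else 0.0
    false_positive = counts["fp"] / negatives if negatives else 0.0
    return detection, false_positive


def toy_predict(features: Features) -> str:
    # Hypothetical trained model: autostart without any UI looks malicious.
    if features["registers_autostart"] and not features["has_ui"]:
        return "malicious"
    return "clean"


held_out: List[Labeled] = [
    ({"registers_autostart": 1, "has_ui": 0}, "malicious"),
    ({"registers_autostart": 1, "has_ui": 1}, "malicious"),
    ({"registers_autostart": 0, "has_ui": 1}, "clean"),
    ({"registers_autostart": 1, "has_ui": 0}, "clean"),
    ({"registers_autostart": 0, "has_ui": 0}, "clean"),
]

counts = confusion(toy_predict, held_out)
detection_rate, false_positive_rate = rates(counts)
print(counts, detection_rate, false_positive_rate)
```

A poor result at this step is what sends you back to step 1: the feature representation, not the algorithm, is often what needs to change.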

slide-23
SLIDE 23
Tools

Many choices

  • WEKA (University of Waikato)

– Java based, freely available, lots of algorithms built in
– Does not scale well to large data sets

  • Orange

– Native + Python, drag-and-drop UI AND automation friendly
– Comparable algorithms

  • Input file formats: ARFF file vs. TSV file
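
To make the two input formats concrete, the sketch below writes the same tiny data set as ARFF text and as plain tab-separated text. The attribute names and rows are invented. The ARFF writer covers only nominal attributes, and the TSV writer emits just a header row of names; any extra header conventions a particular tool expects for TSV are not reproduced here.

```python
from typing import List, Sequence, Tuple

Attribute = Tuple[str, Sequence[str]]  # (name, allowed nominal values)


def to_arff(relation: str, attributes: List[Attribute], rows: List[Sequence[str]]) -> str:
    """Render nominal data as ARFF: relation, attribute declarations, then data."""
    lines = ["@relation " + relation, ""]
    for name, values in attributes:
        lines.append("@attribute " + name + " {" + ",".join(values) + "}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"


def to_tsv(attributes: List[Attribute], rows: List[Sequence[str]]) -> str:
    """Render the same data as tab-separated text with one header row."""
    header = "\t".join(name for name, _ in attributes)
    body = ["\t".join(row) for row in rows]
    return "\n".join([header] + body) + "\n"


# Hypothetical behavioral features and labels.
attributes: List[Attribute] = [
    ("has_ui", ["0", "1"]),
    ("does_network", ["0", "1"]),
    ("class", ["clean", "malicious"]),
]
rows = [["1", "0", "clean"], ["0", "1", "malicious"]]

arff_text = to_arff("behavior", attributes, rows)
tsv_text = to_tsv(attributes, rows)
print(arff_text)
print(tsv_text)
```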

slide-24
SLIDE 24

Machine Learning for behavioral security

slide-25
SLIDE 25
Machine Learning for behavioral security

Overview

  • Goal

– Train a model to provide automated, meaningful information about unknown samples

  • Identify class/label (Supervised Machine Learning) → Classification
  • Identify association (Unsupervised Machine Learning) → Clustering

  • Application of information extracted

– Classify the sample or provide information to analysts for labeling and writing definitions for detection
– Real time protection

slide-26
SLIDE 26
Machine Learning for behavioral security

Overall process

  • Steps

– Collect samples
– Set up a VM with *monitoring framework*
– Push and run samples in a farm of virtual machines
– Collect sample behavior data
– Recycle the VM
– Extract information into a format suitable for data mining
– Train the models
– Test and deploy the models

slide-27
SLIDE 27
Supervised Machine Learning

For real-time protection

  • Monitoring framework

– Data Collection
– User mode hooking API: Detours (Microsoft)

  • Hook the APIs
  • Collect the data in the context of the API hook

– API info (name, parameters), called-from API, state of the process, etc.
– Log the information

  • Extract features: Logs → ARFF files

– API called
– Has UI/window
– Does network communication

  • IRC
  • HTTP

– Registered in AutoStart locations
– Creates Windows Tasks (jobs)
– Modifies PE files
– Creates PE files
– Injects into trusted processes
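
The "logs to features" step can be sketched as a pass over the API log that sets one flag per feature. The log format (API name, a tab, then a detail string) and the mapping from API names to features are assumptions made for this example; the actual framework's log layout is not given in the slides.

```python
from typing import Dict, Iterable, List

FEATURES = ["has_ui", "does_network", "registered_autostart", "injects_into_process"]


def extract_features(log_lines: Iterable[str]) -> List[int]:
    """Turn hypothetical 'API<TAB>detail' log lines into a binary feature vector."""
    features: Dict[str, int] = dict.fromkeys(FEATURES, 0)
    for line in log_lines:
        api, _, detail = line.partition("\t")
        if api == "CreateWindowExW":
            features["has_ui"] = 1
        elif api in ("connect", "InternetOpenUrlW"):
            features["does_network"] = 1
        elif api == "RegSetValueExW" and "\\currentversion\\run" in detail.lower():
            # Assumes the hook logged the resolved registry key path.
            features["registered_autostart"] = 1
        elif api == "WriteProcessMemory":
            features["injects_into_process"] = 1
    return [features[name] for name in FEATURES]


log = [
    "InternetOpenUrlW\thttp://example.invalid/c2",
    "RegSetValueExW\tHKCU\\Software\\Microsoft\\Windows\\CurrentVersion\\Run\\updater",
    "WriteProcessMemory\texplorer.exe",
]
vector = extract_features(log)
print(dict(zip(FEATURES, vector)))
```

Each resulting vector, together with a label, would become one data row of the training file.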

slide-28
SLIDE 28

Supervised Machine Learning

For real-time protection

slide-29
SLIDE 29

Example

Data to Models

…click here if demo GODs act up!..

slide-30
SLIDE 30
Lab to field

Apply Classifiers

  • Monitoring and Blocking hook points

– May or may not be the same

  • Some hook points are merely for state/information collection

– Work done in API hooks

  • Collect information
  • Transform information into a feature vector
  • Evaluate against the model
  • Allow or Deny
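
The work done inside a hook (collect, transform, evaluate, allow or deny) can be simulated in a few lines. This is a sketch only: the API-to-feature mapping, the choice of blocking hook points and the `toy_model` are hypothetical, and a real hook would run natively inside the monitored process, not in Python.

```python
from typing import Callable, Dict, List

FEATURE_ORDER = ["does_network", "registered_autostart", "injects_into_process"]

# Hypothetical mapping from hooked API to the feature it sets.
API_TO_FEATURE = {
    "connect": "does_network",
    "RegSetValueExW": "registered_autostart",
    "WriteProcessMemory": "injects_into_process",
}

# Only some hook points evaluate and block; the rest just collect state.
BLOCKING_HOOKS = {"WriteProcessMemory"}


def toy_model(vector: List[int]) -> bool:
    # Stand-in for a trained classifier: True means "malicious".
    return sum(vector) >= 2


def on_api_call(state: Dict[str, int], api_name: str,
                model: Callable[[List[int]], bool] = toy_model) -> str:
    """Handle one hooked call for a process and return 'allow' or 'deny'."""
    feature = API_TO_FEATURE.get(api_name)
    if feature is not None:
        state[feature] = 1                      # collect information
    if api_name not in BLOCKING_HOOKS:
        return "allow"                          # monitoring-only hook point
    vector = [state.get(name, 0) for name in FEATURE_ORDER]  # feature vector
    return "deny" if model(vector) else "allow"  # evaluate against the model


process_state: Dict[str, int] = {}
verdicts = [on_api_call(process_state, api) for api in ("connect", "WriteProcessMemory")]
print(verdicts)
```

Note that the pending call is counted in the feature vector before the verdict, so the decision covers the action that is about to happen, which matches the prevention model from earlier in the talk.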

slide-31
SLIDE 31
Lab to field

Apply Classifiers

  • Which APIs to hook?

– Higher level API (CreateProcess @ kernel32.dll)
– Lower level API (NtCreateProcess? NtCreateThread?, Ldrpxxx?)
– Higher level APIs (exports by kernel32) provide fine grain control
– Many high level APIs map to few lower level APIs (functionally)
– Lower level APIs provide a more comprehensive view

  • Block Action:

– Failing an API

  • Out parameter
  • Return code

– Terminate thread/process

slide-32
SLIDE 32

Machine Learning for behavioral security

Reality check…

slide-33
SLIDE 33
Automation

Reality check

  • Practical Challenges

– Samples fail to run in automation

  • Good samples fail to run in automation

– More commonly than malicious samples
– Dependency, configuration, etc.
– GUI automation

  • Malicious samples deliberately fail to run in automation

– VM aware
– Automation aware

  • Check own file name (example: sysdate.exe)
  • Check parent process (Threat: Trojan.Tracur)
  • Check application settings (Threat: Adware.InstantBuzz)
  • Check commonly used applications (MS Office)
  • Samples may be stale: C & C down

– System state sensitivity

  • Valid samples: missing dependencies like Java, .NET, etc.
  • Malicious samples: missing targeted applications like Adobe Reader, QuickTime, etc.

slide-34
SLIDE 34
Machine Learning

Reality check

  • Machine Learning Challenges

– Imbalanced data sets
– Missing features
– Anomalous feature values

  • Outlier or deliberately manufactured?
  • Some tricks observed in malware*:

– Non-standard ImageBase
– Large values in .DATA/SizeOfRawData
– Bogus values in LoaderFlags

*Scan of the Month 33: Anti Reverse Engineering Uncovered, by Nicolas Brulez - 0x90(at)Rstack(dot)org
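
A pipeline can at least flag such header values before they reach the learner. The sketch below works on an already-parsed header (a plain dictionary, hypothetical layout) and checks the three tricks listed above. The set of "default" image bases assumes common 32-bit linker defaults, and the example values are made up; a flag here is a feature for further analysis, not a verdict, since legitimate files can also deviate.

```python
from typing import Dict, List

# Common 32-bit linker defaults for EXE and DLL images (an assumption of this sketch).
DEFAULT_IMAGE_BASES = {0x00400000, 0x10000000}


def header_anomalies(header: Dict, file_size: int) -> List[str]:
    """List suspicious values in a parsed PE header given as a dictionary."""
    anomalies: List[str] = []
    if header["ImageBase"] not in DEFAULT_IMAGE_BASES:
        anomalies.append("non-standard ImageBase")
    if header["LoaderFlags"] != 0:
        anomalies.append("non-zero LoaderFlags")
    for name, raw_size in header["sections"].items():
        # A section cannot hold more raw bytes than the file contains.
        if raw_size > file_size:
            anomalies.append("SizeOfRawData of " + name + " exceeds file size")
    return anomalies


ordinary = {"ImageBase": 0x00400000, "LoaderFlags": 0,
            "sections": {".text": 4096, ".data": 512}}
odd = {"ImageBase": 0x00DE0000, "LoaderFlags": 0xABDBFFDE,
       "sections": {".text": 4096, ".data": 0x7FFFFFFF}}

print(header_anomalies(ordinary, file_size=8192))
print(header_anomalies(odd, file_size=8192))
```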

slide-35
SLIDE 35
Stealthy Malware

Malicious Payloads

  • NPTs (Non Process Threat)

– Trusted process -> Malicious Behavior
– File vs. Process
– Code vs. Data

  • Malicious PDF → Browser or Adobe Reader
  • Malicious JAR files → Browser or java.exe
  • Malicious MSI files → msiexec.exe

– DLLs

  • Regsvr32
  • Rundll32
  • Svchost.exe
  • IE/Explorer Extensions

How to automate these? How/where is protection enforced? What is remediated?

slide-36
SLIDE 36

Conclusion & Food for thought!

slide-37
SLIDE 37
Recap

Scaling to the Malware population

  • Volume of malware by unique file fingerprint != New Malware

– Behaviorally, malware is not evolving at every instance
– Scalability can be handled with automation
– Be aware of the pitfalls of automation
– Automation + domain knowledge
– Use domain experts effectively

  • Challenge

– What if the malware is a valid application with a configuration file?
– Solution: Opportunity for creative feature engineering?

slide-38
SLIDE 38

Thank You!

Sourabh Satish

sourabh_satish@symantec.com
