PlatPal: Detecting Malicious Documents with Platform Diversity



slide-1
SLIDE 1

PlatPal: Detecting Malicious Documents with Platform Diversity

Meng Xu and Taesoo Kim Georgia Institute of Technology

1

slide-2
SLIDE 2

Malicious Documents On the Rise

2

slide-3
SLIDE 3

3

slide-4
SLIDE 4

4

slide-5
SLIDE 5

Adobe Components Exploited

  • Element parser
  • JavaScript engine
  • Font manager
  • System dependencies

137 CVEs in 2015, 227 CVEs in 2016

slide-6
SLIDE 6

Maldoc Formula

  • Flexibility of doc spec
  • A large attack surface
  • Less caution from users
  • More opportunities to profit

slide-11
SLIDE 11

Battle against Maldoc - A Survey

Category  Focus       Work                Year  Detection
Static    JavaScript  PJScan              2011  Lexical analysis
Static    JavaScript  Vatamanu et al.     2012  Token clustering
Static    JavaScript  Lux0r               2014  API reference classification
Static    JavaScript  MPScan              2013  Shellcode and opcode sig
Static    Metadata    PDF Malware Slayer  2012  Linearized object path
Static    Metadata    Srndic et al.       2013  Hierarchical structure
Static    Metadata    PDFrate             2012  Content meta-features
Static    Both        Maiorca et al.      2016  Many heuristics combined
Dynamic   JavaScript  MDScan              2011  Shellcode and opcode sig
Dynamic   JavaScript  PDF Scrutinizer     2012  Known attack patterns
Dynamic   JavaScript  ShellOS             2011  Memory access patterns
Dynamic   JavaScript  Liu et al.          2014  Common attack behaviors
Dynamic   Memory      CWXDetector         2012  Violation of invariants

slide-13
SLIDE 13

Reliance on External PDF Parser

Category  Focus       Work                Year  Detection                     External Parser?
Static    JavaScript  PJScan              2011  Lexical analysis              Yes
Static    JavaScript  Vatamanu et al.     2012  Token clustering              Yes
Static    JavaScript  Lux0r               2014  API reference classification  Yes
Static    JavaScript  MPScan              2013  Shellcode and opcode sig      No
Static    Metadata    PDF Malware Slayer  2012  Linearized object path        Yes
Static    Metadata    Srndic et al.       2013  Hierarchical structure        Yes
Static    Metadata    PDFrate             2012  Content meta-features         Yes
Static    Both        Maiorca et al.      2016  Many heuristics combined      Yes
Dynamic   JavaScript  MDScan              2011  Shellcode and opcode sig      Yes
Dynamic   JavaScript  PDF Scrutinizer     2012  Known attack patterns         Yes
Dynamic   JavaScript  ShellOS             2011  Memory access patterns        Yes
Dynamic   JavaScript  Liu et al.          2014  Common attack behaviors       No
Dynamic   Memory      CWXDetector         2012  Violation of invariants       No

Parser-confusion attacks (Carmony et al., NDSS’16)

slide-15
SLIDE 15

Reliance on Machine Learning

Category  Focus       Work                Year  Detection                     Machine Learning?
Static    JavaScript  PJScan              2011  Lexical analysis              Yes
Static    JavaScript  Vatamanu et al.     2012  Token clustering              Yes
Static    JavaScript  Lux0r               2014  API reference classification  Yes
Static    JavaScript  MPScan              2013  Shellcode and opcode sig      No
Static    Metadata    PDF Malware Slayer  2012  Linearized object path        Yes
Static    Metadata    Srndic et al.       2013  Hierarchical structure        Yes
Static    Metadata    PDFrate             2012  Content meta-features         Yes
Static    Both        Maiorca et al.      2016  Many heuristics combined      Yes
Dynamic   JavaScript  MDScan              2011  Shellcode and opcode sig      No
Dynamic   JavaScript  PDF Scrutinizer     2012  Known attack patterns         No
Dynamic   JavaScript  ShellOS             2011  Memory access patterns        No
Dynamic   JavaScript  Liu et al.          2014  Common attack behaviors       No
Dynamic   Memory      CWXDetector         2012  Violation of invariants       No

Automatic classifier evasions (Xu et al., NDSS’16)

slide-17
SLIDE 17

Reliance on Known Attacks

Category  Focus       Work                Year  Detection                     Known Attacks?
Static    JavaScript  PJScan              2011  Lexical analysis              Yes
Static    JavaScript  Vatamanu et al.     2012  Token clustering              Yes
Static    JavaScript  Lux0r               2014  API reference classification  Yes
Static    JavaScript  MPScan              2013  Shellcode and opcode sig      Yes
Static    Metadata    PDF Malware Slayer  2012  Linearized object path        Yes
Static    Metadata    Srndic et al.       2013  Hierarchical structure        Yes
Static    Metadata    PDFrate             2012  Content meta-features         Yes
Static    Both        Maiorca et al.      2016  Many heuristics combined      Yes
Dynamic   JavaScript  MDScan              2011  Shellcode and opcode sig      Yes
Dynamic   JavaScript  PDF Scrutinizer     2012  Known attack patterns         Yes
Dynamic   JavaScript  ShellOS             2011  Memory access patterns        Yes
Dynamic   JavaScript  Liu et al.          2014  Common attack behaviors       Yes
Dynamic   Memory      CWXDetector         2012  Violation of invariants       No

How about zero-day attacks?

slide-19
SLIDE 19

Reliance on Detectable Discrepancy (between benign and malicious docs)

Category  Focus       Work                Year  Detection                     Discrepancy?
Static    JavaScript  PJScan              2011  Lexical analysis              Yes
Static    JavaScript  Vatamanu et al.     2012  Token clustering              Yes
Static    JavaScript  Lux0r               2014  API reference classification  Yes
Static    JavaScript  MPScan              2013  Shellcode and opcode sig      No
Static    Metadata    PDF Malware Slayer  2012  Linearized object path        Yes
Static    Metadata    Srndic et al.       2013  Hierarchical structure        Yes
Static    Metadata    PDFrate             2012  Content meta-features         Yes
Static    Both        Maiorca et al.      2016  Many heuristics combined      Yes
Dynamic   JavaScript  MDScan              2011  Shellcode and opcode sig      No
Dynamic   JavaScript  PDF Scrutinizer     2012  Known attack patterns         No
Dynamic   JavaScript  ShellOS             2011  Memory access patterns        Yes
Dynamic   JavaScript  Liu et al.          2014  Common attack behaviors       Yes
Dynamic   Memory      CWXDetector         2012  Violation of invariants       No

Mimicry and reverse mimicry attacks (Srndic et al., Oakland’14 and Maiorca et al., AsiaCCS’13)

slide-20
SLIDE 20

Highlights of the Survey

Prior works rely on the following, and each reliance exposes them to a class of attack:

  • External PDF parsers: parser-confusion attacks
  • Machine learning: automatic classifier evasion
  • Known attack signatures: zero-day attacks
  • Detectable discrepancy: mimicry and reverse mimicry

slide-21
SLIDE 21

Motivations for PlatPal

Prior works rely on

  • External PDF parsers
  • Machine learning
  • Known attack signatures
  • Detectable discrepancy

What PlatPal aims to achieve

  • Uses Adobe’s parser
  • Uses only simple heuristics
  • Capable of detecting zero-days
  • Does not assume discrepancy
  • Complementary to prior works
slide-27
SLIDE 27

A Motivating Example

  • A CVE-2013-2729 PoC against Adobe Reader 10.1.4

SHA-1: 74543610d9908698cb0b4bfcc73fc007bfeb6d84

27

slide-28
SLIDE 28

28

slide-29
SLIDE 29

29

slide-30
SLIDE 30

Platform Diversity as A Heuristic

30

When the same document is opened across different platforms:

  • A benign document “behaves” the same
  • A malicious document “behaves” differently
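
The heuristic above can be sketched as a comparison of per-platform behavior records. This is a minimal illustration rather than PlatPal's implementation: the `Trace` type, the event strings, and the rule that a one-sided crash counts as a divergence are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Trace:
    """Hypothetical record of what one platform observed for a document."""
    platform: str
    events: List[str] = field(default_factory=list)  # ordered behavior events
    crashed: bool = False


def diverges(a: Trace, b: Trace) -> bool:
    """Return True when the two platforms observed different behavior."""
    if a.crashed != b.crashed:
        return True
    return a.events != b.events


# Made-up traces: the benign document behaves the same on both platforms,
# the malicious one spawns a process on Windows and crashes on MacOS.
benign_win = Trace("windows", ["parse", "render"])
benign_mac = Trace("macos", ["parse", "render"])
malicious_win = Trace("windows", ["parse", "run_script", "spawn_process"])
malicious_mac = Trace("macos", ["parse", "run_script"], crashed=True)

print(diverges(benign_win, benign_mac))        # False
print(diverges(malicious_win, malicious_mac))  # True
```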
slide-31
SLIDE 31

Similar Ideas

  • Two variants placed in disjoint memory partitions [N-Variant Systems]
  • Two variants with stacks growing in different directions [Orchestra]
  • Multiple variants with randomized heap object locations [DieHard]
  • Multiple versions of the same program [Varan, Mx]

slide-32
SLIDE 32

Questions for PlatPal

  • What is a “behavior”?
  • What is a divergence?
  • How to trace them?
  • How to compare them?
slide-33
SLIDE 33

PlatPal Basic Setup

[Diagram: Adobe Reader running in Virtual Machine 1 (labelled Windows Host) and in Virtual Machine 2 (labelled MacOS Host), with a “?” between the two.]

slide-35
SLIDE 35

PlatPal Dual-Level Tracing

[Diagram: Virtual Machine 1 (labelled Windows Host) and Virtual Machine 2 (labelled MacOS Host), with a “?” between the two. Each VM contains:]

  • Internal Tracer, inside Adobe Reader: traces of PDF processing
  • External Tracer, observing syscalls: impacts on host platform

slide-36
SLIDE 36

PlatPal Internal Tracer

[Diagram: the Internal Tracer inside Adobe Reader covers COS object parsing, PD tree construction, script execution, other actions, and element rendering.]

  • Implemented as an Adobe Reader plugin.
  • Hooks critical functions and callbacks during the PDF processing lifecycle.
  • Very fast and stable across Adobe Reader versions.

slide-37
SLIDE 37

PlatPal External Tracer

[Diagram: inside the virtual machine, the External Tracer observes Adobe Reader’s syscalls and records impacts on the host platform: filesystem operations, network activities, program executions, and normal exit or crash.]

  • Implemented based on NtTrace (for Windows) and DTrace (for MacOS).
  • Resembles high-level system impacts in the same manner as the Cuckoo guest agent.
  • Starts tracing only after the document is loaded into Adobe Reader.
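
One way to picture how low-level calls become platform-neutral, high-level impacts is a lookup from call names to categories. The sketch below is an assumption for illustration only: the slides do not show the tracers' actual rule sets, and `CATEGORY_BY_CALL` and `summarize` are hypothetical names.

```python
from typing import Iterable, Set

# Illustrative mapping only; the real tracers' rules are not shown in the slides.
CATEGORY_BY_CALL = {
    # Windows native API names
    "NtCreateFile": "filesystem",
    "NtWriteFile": "filesystem",
    "NtCreateUserProcess": "program_execution",
    # MacOS / POSIX syscall names
    "open": "filesystem",
    "write": "filesystem",
    "connect": "network",
    "execve": "program_execution",
    "posix_spawn": "program_execution",
}


def summarize(calls: Iterable[str]) -> Set[str]:
    """Collapse raw call names into the set of high-level impact categories."""
    return {CATEGORY_BY_CALL[c] for c in calls if c in CATEGORY_BY_CALL}


# Made-up call sequences for the same document on the two platforms.
windows_calls = ["NtCreateFile", "NtWriteFile", "NtCreateUserProcess"]
macos_calls = ["open", "write"]

print(sorted(summarize(windows_calls)))  # ['filesystem', 'program_execution']
print(sorted(summarize(macos_calls)))    # ['filesystem']
```

Comparing the two category sets, rather than the raw call names, is what makes the traces comparable across operating systems in this sketch.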

slide-38
SLIDE 38

PlatPal Automated Workflow

PlatPal <file-to-check>

The Windows VM and the MacOS VM each run the same steps in order:

  • Restore Clean Snapshot
  • Launch Adobe Reader
  • Attach External Tracer
  • Open PDF
  • Drive PDF by Internal Tracer
  • Dump Traces

Then: Compare Traces
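
The workflow can be sketched as a small driver that runs the same ordered steps on each platform and then compares the dumped traces. Everything here is hypothetical scaffolding: `run_step` stands in for real VM control, and `fake_run_step` returns made-up trace lines so the sketch runs without any virtual machine.

```python
from typing import Callable, List

# Step names taken from the workflow slide, in order.
STEPS = [
    "restore_clean_snapshot",
    "launch_adobe_reader",
    "attach_external_tracer",
    "open_pdf",
    "drive_pdf_by_internal_tracer",
    "dump_traces",
]

RunStep = Callable[[str, str, str], List[str]]


def analyze_on(platform: str, pdf: str, run_step: RunStep) -> List[str]:
    """Run the per-VM steps in order and collect the trace lines they emit."""
    trace: List[str] = []
    for step in STEPS:
        trace.extend(run_step(platform, step, pdf))
    return trace


def platpal(pdf: str, run_step: RunStep) -> bool:
    """Return True when the Windows and MacOS traces differ."""
    win = analyze_on("windows", pdf, run_step)
    mac = analyze_on("macos", pdf, run_step)
    return win != mac


def fake_run_step(platform: str, step: str, pdf: str) -> List[str]:
    # Stand-in for real VM control: only the dump step yields trace lines.
    if step != "dump_traces":
        return []
    if pdf == "evil.pdf" and platform == "windows":
        return ["parse", "script", "exec:cmd.exe"]
    return ["parse", "script"]


print(platpal("benign.pdf", fake_run_step))  # False
print(platpal("evil.pdf", fake_run_step))    # True
```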

slide-39
SLIDE 39

Evaluate PlatPal

  • Robustness against benign samples: does a benign document “behave” the same?
  • Effectiveness against malicious samples: does a malicious document “behave” differently?
  • Speed and resource usage
slide-40
SLIDE 40

Robustness

Sample Type      Number of Samples  Divergence Detected? (i.e., False Positive)
Plain PDF        966                No
Embedded fonts   34                 No
JavaScript code  32                 No
AcroForm         17                 No
3D objects       2                  No

  • 1000 samples from Google search.
  • 30 samples that use advanced features in PDF standards, from PDF learning sites.

slide-41
SLIDE 41

Effectiveness

  • 320 malicious samples from VirusTotal with CVE labels.
  • Restricted to CVEs published after 2013.
  • Each sample is run on the most recent version of Adobe Reader available when its CVE was published.

slide-42
SLIDE 42

Effectiveness

Analysis Results of 320 Maldoc Samples

  • Divergence: 65%
  • Both Crash: 11%
  • No Divergence: 24%

slide-43
SLIDE 43

Effectiveness

Analysis Results of 320 Maldoc Samples: 65%, 11%, and 24% (No Divergence)

Breakdown of the 77 potentially false positives (the No Divergence samples)

  • Chart slices: 26%, 3%, 25%, 47%
  • Legend: targets old versions, mis-classified by AV vendor, no malicious activity triggered, unknown
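
As a quick consistency check on the chart, the percentages can be converted back into sample counts. The short calculation below assumes only the figures on the slide (320 samples; slices of 65%, 11%, and 24%). The 24% slice rounds to 77, which matches the 77 samples in the breakdown.

```python
TOTAL_SAMPLES = 320
slices_percent = [65, 11, 24]  # slice sizes as shown on the slide

# Convert each percentage back into an approximate number of samples.
counts = {pct: round(TOTAL_SAMPLES * pct / 100) for pct in slices_percent}

print(counts)                # {65: 208, 11: 35, 24: 77}
print(sum(counts.values()))  # 320
```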

slide-44
SLIDE 44

Time and Resource Usage

Average Analysis Time Breakdown (unit: seconds)

Item               Windows  MacOS
Snapshot restore   9.7      12.6
Document parsing   0.5      0.6
Script execution   10.5     5.1
Element rendering  7.3      6.2
Total              23.7     22.1

Resource Usage

  • 2GB memory per running virtual machine.
  • 60GB disk space for Windows and MacOS snapshots, each of which corresponds to one of the 6 Adobe Reader versions.

slide-45
SLIDE 45

Evaluation Highlights

  • Confirms our fundamental assumption in general: a benign document “behaves” the same; a malicious document “behaves” differently.
  • PlatPal is subject to the pitfalls of dynamic analysis, i.e., the environment must be prepared to lure out the malicious behaviors.
  • Incurs reasonable analysis time, which makes PlatPal practical.

slide-46
SLIDE 46

Further Analysis

  • What could be the root causes of these divergences?

46

slide-50
SLIDE 50

Diversified Factors across Platforms

Shellcode Creation

  • Syscall semantics: both the syscall number and the register set used to hold syscall arguments are different
  • Calling convention: rcx, rdx, r8 for first 3 args (Windows); rdi, rsi, rdx for first 3 args (MacOS)
  • Library dependencies: e.g., LoadLibraryA (Windows); e.g., dlopen (MacOS)

Memory Management

  • Memory layout: the offset from the attack point (e.g., overflowed buffer) to the target address (e.g., vtable entries) is different
  • Heap management: Segment heap (Windows); Magazine malloc (MacOS)

Platform Features

  • Executable format: COM, PE, NE (Windows); Mach-O (MacOS)
  • Filesystem semantics: \ as separator, prefixed drive letter C:\ (Windows); / as separator, no prefixed drive letter (MacOS)
  • Config and info hub: registry (Windows); proc (MacOS)
  • Expected programs: MS Office, IE, etc. (Windows); Safari, etc. (MacOS)
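
The filesystem-semantics factor can be illustrated with Python's `pathlib`, which models both path conventions without touching the disk. The path string below is hypothetical. The point is only that a path written for Windows has no directory structure under POSIX rules, so a payload that hard-codes it is unlikely to behave the same way on both platforms.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical drop location, written the way a Windows-targeted payload might.
dropped = r"C:\Users\victim\AppData\payload.exe"

win = PureWindowsPath(dropped)
mac = PurePosixPath(dropped)

# Windows rules: drive letter plus backslash-separated components.
print(win.drive, win.name)             # C: payload.exe
print(len(win.parts))                  # 5

# POSIX rules: no drive, and the whole string is a single file name.
print(repr(mac.drive), len(mac.parts))  # '' 1
print(mac.name == dropped)              # True
```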

slide-51
SLIDE 51

Back to The Motivating Example

  • 1. Allocate 1000 300-byte chunks
  • 2. Free 1 in every 10
  • 3. Load a 300-byte malicious BMP image
  • 4. Corrupt heap metadata due to a buffer overflow
  • 5. Free BMP image, but what is actually freed is slot 9
  • 6. A 300-byte vtable is allocated on slot 9, which is attacker controlled
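
Steps 1 to 3 depend on the allocator reusing freed slots in a predictable order. The toy model below is a sketch of why the same grooming sequence can place the image in a different slot under a different heap implementation. `ToyHeap` and its "lifo" and "fifo" reuse policies are assumptions for illustration; they do not describe the Windows segment heap or the MacOS magazine malloc.

```python
from collections import deque


class ToyHeap:
    """Toy fixed-size-chunk heap; not a model of any real allocator."""

    def __init__(self, reuse: str):
        self.reuse = reuse         # "lifo" or "fifo" reuse of freed slots
        self.slots = []            # slot index -> owner tag (None when free)
        self.free_slots = deque()  # freed slot indices, in order of freeing

    def alloc(self, tag: str) -> int:
        """Reuse a freed slot if one exists, otherwise grow the heap."""
        if self.free_slots:
            if self.reuse == "lifo":
                slot = self.free_slots.pop()
            else:
                slot = self.free_slots.popleft()
            self.slots[slot] = tag
            return slot
        self.slots.append(tag)
        return len(self.slots) - 1

    def free(self, slot: int) -> None:
        self.slots[slot] = None
        self.free_slots.append(slot)


def groom(heap: ToyHeap) -> int:
    """Steps 1-3: fill 1000 chunks, free 1 in every 10, then place the image."""
    chunks = [heap.alloc("filler") for _ in range(1000)]
    for slot in chunks[9::10]:
        heap.free(slot)
    return heap.alloc("bmp")


print(groom(ToyHeap("lifo")))  # 999
print(groom(ToyHeap("fifo")))  # 9
```

An exploit tuned to find the image at one slot would miss it under the other policy, which is the kind of divergence the memory-management factors describe.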

slide-52
SLIDE 52

Another Case Study

52

CVE-2014-0521 PoC Example

slide-57
SLIDE 57

Apply Diversity to Stop Attacks

Attack stages: Vulnerability Discovery, Exploitation, Malicious Activities, Success

  • Platform-specific bugs, e.g., bugs in system library (CVE-2015-2426)
  • Adobe implementation bugs
  • Memory corruption, e.g., bugs in element parser (CVE-2013-2729)
  • Logic bugs, e.g., bugs in JavaScript API (CVE-2014-0521)
  • Malicious activities: execute shellcode, load executables, steal sensitive info, drop other exploits, other activities

Callout: Attacks that cannot be detected with platform diversity

slide-58
SLIDE 58

Bypass PlatPal ?

58

An attacker has to simultaneously compromise all platforms in order to bypass PlatPal.

slide-59
SLIDE 59

Platform-agnostic Attacks

  • Heap feng-shui: predict the address of the next allocation and de-allocation.
  • Heap spray and NOP-sled: relieve attackers of the need for precise memory addresses.
  • Polyglot shellcode trampoline: find operations that are meaningful on one platform and NOP on the other.
slide-60
SLIDE 60

Limitations of PlatPal

  • User-interaction driven attacks
  • Social engineering attacks, e.g., fake password prompt
  • Other non-determinism that causes divergences, e.g., JavaScript gettime or RNG functions

slide-61
SLIDE 61

Potential Deployment of PlatPal

  • Not suitable for on-device analysis.
  • Best suited for cloud storage providers, which can scan for maldocs among existing files or new uploads.
  • Also fits the model of online malware scanning services like VirusTotal.
  • As a complementary scheme, PlatPal can be integrated with prior works to provide better prediction accuracy.

slide-62
SLIDE 62

Conclusion

  • It is feasible to harvest platform diversity for malicious document detection.
  • PlatPal raises no false alarms in benign samples and detects a variety of behavioral discrepancies in malicious samples.
  • PlatPal is scalable with various ways to deploy and integrate.

https://github.com/sslab-gatech/platpal (Source code will be released soon)