On-demand Inter-process Information Flow Tracking Yang Ji, Sangho - - PowerPoint PPT Presentation

SLIDE 1

RAIN: Refinable Attack Investigation with On-demand Inter-process Information Flow Tracking

Yang Ji, Sangho Lee, Evan Downing, Weiren Wang, Mattia Fazzini, Taesoo Kim, Alessandro Orso, and Wenke Lee

ACM CCS 2017 Oct 31, 2017

SLIDE 2

More and more data breaches

[Chart: Data breaches, 2013-H1 to 2017-H1 (source: Breach Level Index by Gemalto); series: number of data breaches and number of breached records (millions)]

SLIDE 3

Is attack investigation accurate?

[Figure: a process reads files A, B, and C and issues four sends]

“Hmm, I only want C!”

A, B, or C?

Dependency confusion!

SLIDE 4

“Let me change the offer price.”

[Figure: a process with a recv, several writes (one of them to a file archive), and a read; which outputs depend on the received data is unknown (?)]

Is this file affected?

Dependency confusion!

SLIDE 5

Related work

[Comparison axes: accuracy, runtime efficiency, analysis efficiency]

  • System-call-based
    • DTrace, Protracer, LSM, Hi-Fi
  • Dynamic Information Flow Tracking (DIFT)
    • Panorama, Dtracker
  • DIFT + record replay
    • Arnold

SLIDE 6

RAIN

[Comparison axes: accuracy, runtime efficiency, analysis efficiency]

  • We use
    • Record replay
    • Graph-based pruning
    • Selective DIFT
  • We achieve
    • High accuracy
    • Runtime efficiency
    • Greatly improved analysis efficiency

SLIDE 7

Threat model

  • Trusts the OS
    • RAIN tracks user-level attacks.
  • Tracks explicit channels
    • Side and covert channels are out of scope.
  • Records all attacks from their inception
    • Hardware trojans and OS backdoors are out of scope.

SLIDE 8

Architecture

  • Target host
    • The RAIN kernel module and a customized libc produce logs
  • Analysis host
    • Provenance graph builder: builds a coarse-level graph from the logs
    • Triggering and reachability analysis: prunes the coarse-level graph to a sub-graph
    • Replay and selective DIFT: refines the pruned sub-graph into a refined sub-graph

SLIDE 9

OS-level record replay

  1. Records external inputs (files, sockets, randomness)
  2. Captures thread switching at the pthread interface instead of the internal data the threads produce
  3. Records system-wide executions

[Figure: external inputs (file, socket, randomness) enter a process group; internal data exchanged via IPC between Thread 1 and Thread 2 is not recorded, only the thread switching (via pthread)]

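Why recording only external inputs is enough can be illustrated with a small Python sketch. It assumes a hypothetical `InputLog` class that wraps each nondeterministic input source (here a random seed and a byte string standing in for socket data); it is not how Arnold or RAIN record executions, which happens at the OS level. Because the computation is deterministic given its inputs, feeding it the logged inputs during replay reproduces the recorded result.

```python
import os
import random
from typing import Callable, List


class InputLog:
    """Hypothetical log of external inputs supporting record and replay (illustration only)."""

    def __init__(self) -> None:
        self.entries: List[bytes] = []
        self.replaying = False
        self.cursor = 0

    def external_input(self, source: Callable[[], bytes]) -> bytes:
        if not self.replaying:
            # Record mode: consult the real (nondeterministic) source and log its result.
            data = source()
            self.entries.append(data)
            return data
        # Replay mode: hand back the logged bytes; the real source is never called.
        data = self.entries[self.cursor]
        self.cursor += 1
        return data

    def start_replay(self) -> None:
        self.replaying = True
        self.cursor = 0


def program(log: InputLog) -> int:
    """Deterministic computation whose only nondeterminism comes from external inputs."""
    seed = log.external_input(lambda: os.urandom(4))  # randomness
    # Hypothetical stand-in for data received from a socket (random bytes here).
    packet = log.external_input(lambda: bytes(random.getrandbits(8) for _ in range(8)))
    return sum(seed) * 31 + sum(packet)


log = InputLog()
recorded = program(log)   # record run
log.start_replay()
replayed = program(log)   # replay run reuses the logged inputs
print(recorded == replayed)  # True
```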
SLIDE 10

Coarse-level logging and graph building

  • Keeps logging system-call events
  • Constructs a graph that represents:
    • processes, files, and sockets as nodes
    • events as causality edges

[Figure: P1 (/usr/bin/firefox) with read edges from B (/docs/report.doc) and C (/tmp/errors.zip) and a send edge to A (attacker site)]

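Coarse-level graph building can be sketched in a few lines of Python, assuming a hypothetical `Event` record with a timestamp, process, system call, and object; this is not RAIN's log format. Which calls count as inbound or outbound (including treating `mmap` as inbound) is an assumption of the sketch. The sample events are one plausible reading of this slide's figure: Firefox reads the two files and then sends to the attacker site, whose endpoint name is made up.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Event(NamedTuple):
    """Hypothetical coarse-level system-call record."""
    time: int
    process: str
    syscall: str  # e.g. "read", "write", "send", "recv", "mmap"
    obj: str      # file path or socket endpoint


# Assumed classification: inbound calls move data from the object into the process,
# outbound calls move data from the process into the object.
INBOUND = {"read", "recv", "mmap"}
OUTBOUND = {"write", "send"}


def build_graph(events: List[Event]) -> Dict[str, List[Tuple[str, str, int]]]:
    """Map each node to its outgoing causality edges as (successor, syscall, time)."""
    graph: Dict[str, List[Tuple[str, str, int]]] = defaultdict(list)
    for ev in events:
        if ev.syscall in INBOUND:
            graph[ev.obj].append((ev.process, ev.syscall, ev.time))
        elif ev.syscall in OUTBOUND:
            graph[ev.process].append((ev.obj, ev.syscall, ev.time))
    return dict(graph)


events = [
    Event(1, "/usr/bin/firefox", "read", "/docs/report.doc"),
    Event(2, "/usr/bin/firefox", "read", "/tmp/errors.zip"),
    Event(3, "/usr/bin/firefox", "send", "attacker-site"),  # hypothetical endpoint name
]
g = build_graph(events)
for node, out_edges in g.items():
    print(node, "->", out_edges)
```

Note that the graph alone cannot say whether report.doc, errors.zip, or both reached the send; that is exactly the dependency confusion the later refinement step addresses.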
SLIDE 11

Pruning

  • Does every recorded execution need replay and DIFT? No!
  • Prunes the graph based on the results of trigger analysis:
    • Upstream
    • Downstream
    • Point-to-point
    • Interference

SLIDE 12

Upstream

[Figure: provenance graph connecting the nodes below through send, read, write, and mmap edges]

A: Attacker site, B: /docs/report.doc, C: /tmp/errors.zip, D: /docs/ctct1.csv, E: /docs/ctct2.pdf, F: /docs/loss.csv
P1: /usr/bin/firefox, P2: /usr/bin/TextEditor, P3: /bin/gzip

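Upstream analysis can be sketched as backward reachability: starting from a node of interest, such as the send to the attacker site, keep every node that can reach it and prune the rest. The Python below is a minimal sketch; the function name `upstream` and the edge list (which reuses the slide's node labels with an assumed wiring) are hypothetical. It ignores event timestamps, so it may keep more nodes than a time-aware analysis would.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (from, to) in the direction data flows


def upstream(edges: List[Edge], start: str) -> Set[str]:
    """Return every node from which `start` is reachable (its ancestors)."""
    preds: Dict[str, List[str]] = {}
    for src, dst in edges:
        preds.setdefault(dst, []).append(src)
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for p in preds.get(node, []):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen


# Hypothetical wiring that reuses the slide's labels.
edges = [
    ("B", "P1"), ("C", "P1"), ("P1", "A"),   # firefox reads two files, sends to the attacker
    ("D", "P2"), ("P2", "E"),                # unrelated editing activity
    ("E", "P3"), ("P3", "F"),                # unrelated compression activity
]
keep = upstream(edges, "A")
print(sorted(keep))  # ['B', 'C', 'P1']
```

Only P1, B, and C would then be candidates for replay and DIFT; P2, P3, D, E, and F are pruned.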
SLIDE 13

Downstream

[Figure: provenance graph connecting the nodes below through read and write edges]

A: Tampered file /docs/ctct.csv, B: Seasonal report docs/s1.csv, C: Seasonal report docs/s2.csv, D: Budget report docs/bgt.csv, E: Half-year report docs/h2.pdf
P1: Spreadsheet editor, P2: Auto-budget program, P3: Auto-report program

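Downstream analysis is the forward counterpart of upstream analysis: starting from a tampered input, keep every node it can reach. In this hypothetical sketch the edges reuse the slide's labels, but their wiring is assumed, and `X`, `Q`, and `Y` stand for unrelated activity that the analysis prunes.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (from, to) in the direction data flows


def downstream(edges: List[Edge], start: str) -> Set[str]:
    """Return every node reachable from `start` (its descendants)."""
    succs: Dict[str, List[str]] = {}
    for src, dst in edges:
        succs.setdefault(src, []).append(dst)
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s in succs.get(node, []):
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return seen


# Hypothetical wiring that reuses the slide's labels.
edges = [
    ("A", "P1"), ("P1", "B"), ("P1", "C"),  # editor reads the tampered file, writes reports
    ("B", "P2"), ("P2", "D"),               # auto-budget program
    ("C", "P3"), ("P3", "E"),               # auto-report program
    ("X", "Q"), ("Q", "Y"),                 # unrelated activity (hypothetical)
]
affected = downstream(edges, "A")
print(sorted(affected))  # ['B', 'C', 'D', 'E', 'P1', 'P2', 'P3']
```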
SLIDE 14

Point-to-point

[Figure: provenance graph connecting the nodes below through read, write, and send edges; two points are marked 1 and 2]

A: Tampered file /docs/ctct.csv, B: Seasonal report docs/s1.csv, C: Seasonal report docs/s2.csv, D: Budget report docs/bgt.csv, E: Half-year report docs/h2.pdf, F: Document archive server
P1: Spreadsheet editor, P2: Auto-budget program, P3: Auto-report program, P4: Firefox browser

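One way to read point-to-point analysis is to keep only the nodes lying on some path from a source to a sink, which is the intersection of the source's descendants and the sink's ancestors. A minimal Python sketch under that interpretation; the helper names and the edge list (slide labels with an assumed wiring, from tampered file A to archive server F) are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (from, to) in the direction data flows


def reachable(adj: Dict[str, List[str]], start: str) -> Set[str]:
    """Nodes reachable from `start` in `adj`, including `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def point_to_point(edges: List[Edge], source: str, sink: str) -> Set[str]:
    """Nodes on at least one path from `source` to `sink`."""
    fwd: Dict[str, List[str]] = {}
    bwd: Dict[str, List[str]] = {}
    for src, dst in edges:
        fwd.setdefault(src, []).append(dst)
        bwd.setdefault(dst, []).append(src)
    # Descendants of the source that are also ancestors of the sink.
    return reachable(fwd, source) & reachable(bwd, sink)


# Hypothetical wiring that reuses the slide's labels.
edges = [
    ("A", "P1"), ("P1", "B"), ("P1", "C"),
    ("B", "P2"), ("P2", "D"),
    ("C", "P3"), ("P3", "E"),
    ("E", "P4"), ("P4", "F"),
]
print(sorted(point_to_point(edges, "A", "F")))  # ['A', 'C', 'E', 'F', 'P1', 'P3', 'P4']
```

B, P2, and D are reachable from A but never lead to F, so they are pruned even though a pure downstream analysis would keep them.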
SLIDE 15

Interference

  • Insight: only inbound and outbound files that interfere within a process can produce causality.
  • We determine interference from the time order of inbound and outbound IO events.

[Figure: two examples of IO events ordered in time: P2 with files B and D (write and read at t1 < t2); P3 with files C, E, and F (read, mmap, and write at t1 < t2 < t3)]

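The time-order test can be sketched in Python, assuming the only criterion is the one stated on the slide: within a single process, an inbound event can influence an outbound event only if it happens earlier. The `IOEvent` record and `interfering_pairs` function are hypothetical names for illustration.

```python
from typing import List, NamedTuple, Set, Tuple


class IOEvent(NamedTuple):
    """One IO event of a single process (hypothetical record)."""
    time: int
    obj: str
    inbound: bool  # True for read/recv/mmap-style input, False for write/send-style output


def interfering_pairs(events: List[IOEvent]) -> Set[Tuple[str, str]]:
    """(input, output) object pairs where the inbound event precedes the outbound one."""
    pairs: Set[Tuple[str, str]] = set()
    for inp in events:
        if not inp.inbound:
            continue
        for out in events:
            if not out.inbound and inp.time < out.time:
                pairs.add((inp.obj, out.obj))
    return pairs


events = [
    IOEvent(1, "X", inbound=False),  # write X before anything is read
    IOEvent(2, "B", inbound=True),   # read B
    IOEvent(3, "D", inbound=False),  # write D after reading B
]
print(interfering_pairs(events))  # {('B', 'D')}
```

The write to X happened before B was read, so no causality from B to X is possible and that edge can be pruned without replay.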
SLIDE 16

Refinement - selective DIFT

  • Replays only the necessary parts of the execution and performs DIFT on them

  • Aggregation
  • Upstream
  • Downstream
  • Point-to-point


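To show why DIFT removes the dependency confusion of a coarse-level graph, here is a toy taint tracker over a made-up instruction trace. It is not the libdft or Dytan API; real taint engines propagate tags per byte of registers and memory at the instruction level. In this sketch each variable carries the set of input labels that flowed into it, copies propagate the set, and arithmetic takes the union of its operands' sets.

```python
from typing import Dict, List, Set, Tuple

Instr = Tuple[str, ...]  # (opcode, operands...) in a made-up toy instruction format


def run_dift(trace: List[Instr]) -> List[Set[str]]:
    """Propagate taint labels through `trace`; return the labels reaching each send."""
    shadow: Dict[str, Set[str]] = {}  # variable -> set of input labels (shadow memory)
    sent: List[Set[str]] = []
    for op, *args in trace:
        if op == "read":            # dst <- input labeled `label`
            dst, label = args
            shadow[dst] = {label}
        elif op == "mov":           # dst <- src: copy the source's labels
            dst, src = args
            shadow[dst] = set(shadow.get(src, set()))
        elif op == "add":           # dst <- a + b: union of both operands' labels
            dst, a, b = args
            shadow[dst] = shadow.get(a, set()) | shadow.get(b, set())
        elif op == "send":          # output: record which inputs flowed here
            (src,) = args
            sent.append(set(shadow.get(src, set())))
    return sent


trace = [
    ("read", "x", "A"), ("read", "y", "B"), ("read", "z", "C"),
    ("mov", "out1", "x"), ("send", "out1"),
    ("add", "out2", "y", "x"), ("send", "out2"),
    ("send", "z"),
]
print([sorted(labels) for labels in run_dift(trace)])  # [['A'], ['A', 'B'], ['C']]
```

A coarse-level graph would connect A, B, and C to every send; here each send is tied only to the inputs that actually reached it.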
SLIDE 17

Upstream refinement

[Figure: provenance graph with nodes A to F and P1 to P3 connected by send, read, write, and mmap edges]

SLIDE 18

Implementation summary

  • RAIN is built on top of:
    • Arnold, the record replay framework
    • Dtracker (Libdft) and Dytan, the taint engines

Host            Module                 LoC     Language
Target host     Kernel module          2,200   C (diff)
Target host     Trace logistics        1,100   C
Analysis host   Provenance graph       6,800   C++
Analysis host   Trigger/Pruning        1,100   Python
Analysis host   Selective refinement     900   Python
Analysis host   DIFT Pin tools         3,500   C/C++ (diff)

SLIDE 19

Evaluations

  • Runtime performance
  • Accuracy
  • Analysis cost
  • Storage footprint

SLIDE 20

Runtime overhead on SPEC CPU2006: 3.22%

[Chart: per-benchmark results for Logging and Logging+Recording on SPEC CPU2006: bzip2, perlbench, calculix, gamess, bwaves, sjeng, omnetpp, mcf, astar, h264ref, hmmer, xalancbmk, gobmk, libquantum, sphinx3, milc, zeusmp, gromacs, leslie3d, namd, lbm, dealII, soplex, povray, GemsFDTD, tonto, wrf, gcc]

SLIDE 21

Multi-thread runtime overhead on SPLASH-3: 5.35%

[Chart: per-benchmark results on SPLASH-3: ocean-c, ocean-n, fmm, radiosity, water-n, water-s, barnes, volrend]

SLIDE 22

IO-intensive applications: overhead below 50%

[Chart: Logging vs. Logging+Recording for kernel copy, movie download, libc compilation, and a Firefox session]

SLIDE 23

High analysis accuracy

Dependency confusion rate, for scenarios from the red team exercise of the DARPA Transparent Computing program:

Scenario             Coarse-level   Fine-level
Screengrab           90.50%         0%
Cameragrab           32%            0%
Audiograb            39.70%         0%
NetRecon             84.70%         13%
Motivating example   67.00%         0%

SLIDE 24

Pruning effectiveness: ~94.2% reduction

Taint workload (#processes):

Scenario             None (no pruning)   RAIN
Screengrab                  99              5
Cameragrab                 141             19
Audiograb                  310             11
NetRecon                   138             13
Motivating example         720             34

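The headline figure can be checked against the table, assuming it is the reduction in total taint workload summed over all five scenarios (the slide does not say how it is computed). The sketch also prints the mean of the per-scenario reductions, as an alternative reading.

```python
# Taint workload (#processes) per scenario, as listed on the slide.
without_pruning = {"Screengrab": 99, "Cameragrab": 141, "Audiograb": 310,
                   "NetRecon": 138, "Motivating example": 720}
with_rain = {"Screengrab": 5, "Cameragrab": 19, "Audiograb": 11,
             "NetRecon": 13, "Motivating example": 34}

total_before = sum(without_pruning.values())  # 1408 processes
total_after = sum(with_rain.values())         # 82 processes
reduction = 1 - total_after / total_before

# Alternative reading: average of the per-scenario reductions.
mean_reduction = sum(1 - with_rain[s] / without_pruning[s]
                     for s in without_pruning) / len(without_pruning)

print(f"aggregate: {reduction:.1%}, per-scenario mean: {mean_reduction:.1%}")
# aggregate: 94.2%, per-scenario mean: 92.8%
```

Only the aggregate reading reproduces the slide's ~94.2%.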
SLIDE 25

Storage cost: ~4 GB per day (1.5 TB per year)

Storage overhead (MB):

Per-day desktop      4000
Libc compilation      740
Motivating example    200.6
NetRecon              166.1
Audiograb             133.6
Cameragrab            105
Screengrab            113.9

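The per-year estimate follows from the per-day desktop figure. A quick check, assuming decimal units (1 GB = 1,000 MB and 1 TB = 1,000,000 MB); with binary units the yearly total would come out somewhat lower.

```python
per_day_mb = 4000                        # "Per-day desktop" value, in MB
per_year_mb = per_day_mb * 365           # 1,460,000 MB
per_year_tb = per_year_mb / 1_000_000    # decimal terabytes

print(f"{per_day_mb / 1000:.0f} GB/day -> {per_year_tb:.2f} TB/year")  # 4 GB/day -> 1.46 TB/year
```

The slide rounds 1.46 TB up to 1.5 TB.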
SLIDE 26

Discussion

  • Limitations
    • RAIN trusts the OS, which therefore needs kernel integrity protection.
    • Over-tainting
  • Directions
    • Hypervisor-based RAIN
    • Further reducing storage overhead

SLIDE 27

Conclusion

  • RAIN adopts a multi-level provenance system to facilitate fine-grained analysis that enables accurate attack investigation.
  • RAIN has low runtime overhead and a significantly reduced analysis cost.