RAIN: Refinable Attack Investigation with On-demand Inter-process Information Flow Tracking. Yang Ji, Sangho Lee, Evan Downing, Weiren Wang, Mattia Fazzini, Taesoo Kim, Alessandro Orso, and Wenke Lee. ACM CCS 2017, Oct 31, 2017.


slide-1
SLIDE 1

RAIN: Refinable Attack Investigation with On-demand Inter-process Information Flow Tracking

Yang Ji, Sangho Lee, Evan Downing, Weiren Wang, Mattia Fazzini, Taesoo Kim, Alessandro Orso, and Wenke Lee

ACM CCS 2017 Oct 31, 2017

slide-3
SLIDE 3

More and more data breaches

2

[Figure: bar chart of data breaches per half-year, 2013-H1 through 2017-H1. Series: number of data breaches; number of breached records (millions). Source: Breach Level Index by Gemalto.]

slide-12
SLIDE 12

Is attack investigation accurate?

3

[Diagram: three read events on files A, B, and C, followed by four send events.]

“Hmm, I only want C!”

A, B, or C?

Dependency confusion!

slide-19
SLIDE 19

4

“Let me change the offer price.”

[Diagram: a file archive with events recv, write, and read, plus three further writes to files marked "?".]

Is this file affected?

Dependency confusion!

slide-26
SLIDE 26

Related work

Accuracy Runtime Efficiency Analysis Efficiency

  • System-call-based
      • DTrace, ProTracer, LSM, Hi-Fi
  • Dynamic Information Flow Tracking (DIFT)
      • Panorama, Dtracker
  • DIFT + Record replay
      • Arnold

4

slide-29
SLIDE 29

RAIN

Accuracy Runtime Efficiency Analysis Efficiency

  • We use
      • Record replay
      • Graph-based pruning
      • Selective DIFT
  • We achieve
      • High accuracy
      • Runtime efficiency
      • Highly improved analysis efficiency

5

slide-30
SLIDE 30

Threat model

  • Trusts the OS
      • RAIN tracks user-level attacks.
  • Tracks explicit channels
      • Side and covert channels are out of scope.
  • Records all attacks from their inception
      • Hardware trojans and OS backdoors are out of scope.

8

slide-37
SLIDE 37

9

Target host: RAIN customized kernel and customized libc, producing logs
Analysis host:
  1. Provenance graph builder: logs to coarse-level graph
  2. Triggering and reachability analysis (prune): coarse-level graph to pruned sub-graph
  3. Replay and selective DIFT (refine): pruned sub-graph to refined sub-graph

Architecture

slide-46
SLIDE 46

OS-level record replay

  1. Records external inputs
  2. Captures the thread switching from the pthread interface, not the produced internal data
  3. Records system-wide executions

[Figure: external inputs (file, socket, randomness) enter a process group containing Thread 1 and Thread 2. Inside the group, IPC is internal data, and thread switching is captured via pthread.]

10
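The record-and-replay idea on this slide can be illustrated with a toy sketch. It is not RAIN's or Arnold's implementation: the `Recorder`, `Replayer`, and `workload` names are hypothetical, and Python's `random` module stands in for a non-deterministic external input. It only shows why logging external inputs is enough to reproduce internal data.

```python
import random

class Recorder:
    """Runs the workload live and logs every external input it consumes."""
    def __init__(self):
        self.log = []

    def external_input(self, produce):
        value = produce()          # non-deterministic source (file, socket, randomness)
        self.log.append(value)     # only the input is logged, not derived data
        return value

class Replayer:
    """Feeds the logged inputs back in the same order; the live source is ignored."""
    def __init__(self, log):
        self._log = iter(log)

    def external_input(self, produce):
        return next(self._log)

def workload(env):
    # Hypothetical computation: the internal data is fully determined by the inputs.
    a = env.external_input(lambda: random.randint(0, 10**6))
    b = env.external_input(lambda: random.randint(0, 10**6))
    return a * 31 + b

recorder = Recorder()
original = workload(recorder)
replayed = workload(Replayer(recorder.log))
print(original == replayed)
```

Because the workload is deterministic given its inputs, the replayed run recomputes the same internal value without that value ever being stored.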

slide-51
SLIDE 51

Coarse-level logging and graph building

  • Keeps logging system-call events
  • Constructs a graph to represent:
  • the processes, files, and sockets as nodes
  • the events as causality edges

11

[Graph: B -Read-> P1, C -Read-> P1, P1 -Send-> A]

A: Attacker site B: /docs/report.doc C: /tmp/errors.zip P1: /usr/bin/firefox
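A minimal sketch of the graph construction described above, under stated assumptions: the event tuple layout, the `INBOUND`/`OUTBOUND` sets, and the timestamps are hypothetical and not taken from RAIN's log format. Node names follow the slide's legend.

```python
from collections import defaultdict

# Hypothetical system-call log: (timestamp, process, syscall, object).
EVENTS = [
    (1, "P1", "read", "B"),   # firefox reads /docs/report.doc
    (2, "P1", "read", "C"),   # firefox reads /tmp/errors.zip
    (3, "P1", "send", "A"),   # firefox sends to the attacker site
]

INBOUND = {"read", "recv", "mmap"}   # data flows object -> process
OUTBOUND = {"write", "send"}         # data flows process -> object

def build_graph(events):
    """Map each source node to its (destination, syscall, timestamp) causality edges."""
    graph = defaultdict(list)
    for ts, proc, syscall, obj in events:
        if syscall in INBOUND:
            graph[obj].append((proc, syscall, ts))
        elif syscall in OUTBOUND:
            graph[proc].append((obj, syscall, ts))
    return dict(graph)

graph = build_graph(EVENTS)
for src, edges in graph.items():
    for dst, syscall, ts in edges:
        print(f"{src} -{syscall}@{ts}-> {dst}")
```

At this level the graph records only that P1 read B and C before sending to A. It cannot say which file's bytes were sent, which is the dependency confusion the later refinement step addresses.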

slide-54
SLIDE 54

Pruning

  • Does every recorded execution need replay and DIFT? No!
  • Prunes the data in the graph based on trigger analysis results
      • Upstream
      • Downstream
      • Point-to-point
      • Interference

12

slide-57
SLIDE 57

[Figure: provenance graph over nodes A-F and processes P1-P3, with edges labeled Send, Read (x4), Write (x2), and Mmap.]

A: Attacker site B: /docs/report.doc C: /tmp/errors.zip D: /docs/ctct1.csv E: /docs/ctct2.pdf F: /docs/loss.csv P1: /usr/bin/firefox P2: /usr/bin/TextEditor P3: /bin/gzip

Upstream

13
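Upstream pruning can be read as backward reachability from the node of interest. The sketch below assumes an edge layout that is only one plausible reading of the flattened figure (edge directions are a guess), and the `G`, `H`, and `P4` nodes are hypothetical additions that show what gets pruned.

```python
from collections import deque

# Assumed causality edges (source, destination); not confirmed by the slide.
EDGES = [
    ("D", "P2"), ("P2", "B"), ("B", "P1"),               # D read by P2, P2 writes B, B read by P1
    ("E", "P3"), ("F", "P3"), ("P3", "C"), ("C", "P1"),  # E read and F mmapped by P3, P3 writes C
    ("P1", "A"),                                         # P1 sends to the attacker site
    ("G", "P4"), ("P4", "H"),                            # hypothetical unrelated activity
]

def upstream(edges, target):
    """Return every node with a causal path into `target` (backward BFS)."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)
    seen, queue = {target}, deque([target])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    seen.discard(target)
    return seen

kept = upstream(EDGES, "A")
print(sorted(kept))
```

Only the nodes in `kept` would need replay and DIFT; the unrelated `G`, `P4`, and `H` are dropped before any expensive analysis runs.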

slide-60
SLIDE 60

[Figure: provenance graph over files A-E and processes P1-P3, with edges labeled Read (x4) and Write (x4).]

A: Tampered file /docs/ctct.csv B: Seasonal report docs/s1.csv C: Seasonal report docs/s2.csv D: Budget report docs/bgt.csv E: Half-year report docs/h2.pdf P1: Spreadsheet editor P2: Auto-budget program P3: Auto-report program

Downstream

14

slide-64
SLIDE 64

Point-to-point

[Figure: provenance graph over nodes A-F and processes P1-P4, with edges labeled Read (x5), Write (x4), and Send; two annotations are numbered 1 and 2.]

A: Tampered file /docs/ctct.csv B: Seasonal report docs/s1.csv C: Seasonal report docs/s2.csv D: Budget report docs/bgt.csv E: Half-year report docs/h2.pdf F: Document archive server P1: Spreadsheet editor P2: Auto-budget program P3: Auto-report program P4: Firefox browser

15
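One way to sketch point-to-point pruning is to intersect the downstream set of the source with the upstream set of the sink. The edge list is an assumed reading of the flattened figure, and nodes `X` and `Y` are hypothetical off-path files added for illustration.

```python
# Assumed causality edges (source, destination); directions are a guess.
EDGES = [
    ("A", "P1"), ("P1", "B"), ("P1", "C"),
    ("B", "P2"), ("P2", "D"),
    ("C", "P3"), ("D", "P3"), ("P3", "E"),
    ("E", "P4"), ("P4", "F"),
    ("X", "P4"),   # hypothetical: upstream of F, but not derived from A
    ("P1", "Y"),   # hypothetical: derived from A, but never reaches F
]

def reach(edges, start, forward=True):
    """Nodes reachable from `start`, following edges forward or backward."""
    adjacency = {}
    for src, dst in edges:
        a, b = (src, dst) if forward else (dst, src)
        adjacency.setdefault(a, []).append(b)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def point_to_point(edges, source, sink):
    """Keep only nodes that are both downstream of `source` and upstream of `sink`."""
    return reach(edges, source, forward=True) & reach(edges, sink, forward=False)

on_path = point_to_point(EDGES, "A", "F")
print(sorted(on_path))
```

The intersection drops `X` and `Y`, so only executions that could carry data from the tampered file A to the archive server F remain candidates for refinement.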

slide-73
SLIDE 73

Interference

  • Insight: only inbound and outbound files that interfere in a process can possibly produce causality.
  • We determine interference according to the time order of inbound and outbound IO events.

[Figure: P2 with files D and B: Write at t1, Read at t2, t1 < t2. P3 with files C, E, and F: Read at t1, Mmap at t2, Write at t3, t1 < t2 < t3.]

16
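The time-order check can be sketched as follows. The mapping of events to files is an assumption based on the figure labels (P2 writes B before reading D; P3 reads E and maps F before writing C), and the tuple layout is hypothetical.

```python
# Hypothetical per-process IO logs: (timestamp, direction, file).
P2_EVENTS = [(1, "out", "B"), (2, "in", "D")]                   # Write t1, Read t2
P3_EVENTS = [(1, "in", "E"), (2, "in", "F"), (3, "out", "C")]   # Read t1, Mmap t2, Write t3

def interfering_pairs(events):
    """Return (inbound_file, outbound_file) pairs where the input precedes the output."""
    pairs = set()
    for t_in, dir_in, file_in in events:
        if dir_in != "in":
            continue
        for t_out, dir_out, file_out in events:
            if dir_out == "out" and t_in < t_out:
                pairs.add((file_in, file_out))
    return pairs

print(interfering_pairs(P2_EVENTS))   # no pair: D was read only after B was written
print(interfering_pairs(P3_EVENTS))   # E and F both precede the write to C
```

Under this reading, the edge from D through P2 to B can be pruned without replay, while both inputs of P3 remain possible causes of C.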

slide-74
SLIDE 74

Refinement - selective DIFT

  • Replays and conducts DIFT on only the necessary part of the execution

  • Aggregation
  • Upstream
  • Downstream
  • Point-to-point

17
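To show what DIFT adds over the coarse graph, here is a toy taint-propagation sketch. It is not how Dtracker, libdft, or Dytan work internally (those instrument binaries at the instruction level). The `Tainted` class and the file contents are hypothetical.

```python
class Tainted:
    """A value that carries the set of source labels it was derived from."""
    def __init__(self, value, labels=()):
        self.value = value
        self.labels = frozenset(labels)

    def __add__(self, other):
        # Taint propagation rule: the result depends on both operands.
        return Tainted(self.value + other.value, self.labels | other.labels)

# The process reads three files during replay (hypothetical contents).
a = Tainted(b"alpha", {"A"})
b = Tainted(b"bravo", {"B"})
c = Tainted(b"charlie", {"C"})

# Only data derived from C ends up in the outgoing buffer.
header = Tainted(b"POST ")          # constant, carries no taint
outgoing = header + c

print(sorted(outgoing.labels))
```

The coarse graph would link the send to A, B, and C alike; the taint labels on the outgoing buffer narrow it to C.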

slide-78
SLIDE 78

[Figure: provenance graph over nodes A-F and processes P1-P3, with edges labeled Send, Read (x4), Write (x2), and Mmap.]

Upstream refinement

18

slide-79
SLIDE 79

Implementation summary

  • RAIN is built on top of:
  • Arnold, the record replay framework
  • Dtracker (Libdft) and Dytan, the taint engines

Host            Module                 LoC
Target host     Kernel module          2,200 C (Diff)
Target host     Trace logistics        1,100 C
Analysis host   Provenance graph       6,800 C++
Analysis host   Trigger/Pruning        1,100 Python
Analysis host   Selective refinement   900 Python
Analysis host   DIFT Pin tools         3,500 C/C++ (Diff)

19

slide-80
SLIDE 80

Evaluations

  • Runtime performance
  • Accuracy
  • Analysis cost
  • Storage footprint

20

slide-81
SLIDE 81

21

Runtime overhead: 3.22% on SPEC CPU2006

[Figure: per-benchmark runtime overhead (axis 0% to 160%) for two configurations, Logging and Logging+Recording. Benchmarks: bzip2, perlbench, calculix, gamess, bwaves, sjeng, omnetpp, mcf, astar, h264ref, hmmer, xalancbmk, gobmk, libquantum, sphinx3, milc, zeusmp, gromacs, leslie3d, namd, lbm, dealII, soplex, povray, GemsFDTD, tonto, wrf, gcc.]

slide-82
SLIDE 82

Multi-thread runtime overhead: 5.35% SPLASH-3

22

[Figure: per-benchmark runtime overhead (axis 0% to 140%). Benchmarks: ocean-c, ocean-n, fmm, radiosity, water-n, water-s, barnes, volrend.]

slide-83
SLIDE 83

IO-intensive applications: less than 50% overhead

23

[Figure: bar chart (axis ticks 0.2 to 1.6) for two configurations, Logging and Logging+Recording. Workloads: kernel copy, movie download, libc compilation, Firefox session.]

slide-84
SLIDE 84

24

High analysis accuracy

Scenarios from the red team exercise of the DARPA Transparent Computing program

Dependency confusion rate (chart axis 0% to 100%), values in the order extracted from the chart:

Scenario             Coarse-level   Fine-level
Screengrab           90.50%         0%
Cameragrab           32%            0%
Audiograb            39.70%         0%
NetRecon             84.70%         13%
Motivating Example   67.00%         0%

slide-85
SLIDE 85

Pruning effectiveness: ~94.2% reduction

25

Taint workload: #processes (chart axis 100 to 800), values in the order extracted from the chart:

Scenario         None   RAIN
Screengrab       99     5
Cameragrab       141    19
Audiograb        310    11
NetRecon         138    13
Motive Example   720    34
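As a sanity check on the headline number, the aggregate reduction can be recomputed from the chart values. This assumes the first five extracted values belong to the "None" series and the last five to "RAIN"; the result does not depend on which scenario each value belongs to.

```python
# Processes that need taint analysis, in the order extracted from the chart.
none_workload = [99, 141, 310, 138, 720]   # without pruning
rain_workload = [5, 19, 11, 13, 34]        # after RAIN's pruning

total_none = sum(none_workload)
total_rain = sum(rain_workload)
reduction = 1 - total_rain / total_none

print(f"{total_rain} of {total_none} processes remain, a {reduction:.1%} reduction")
```

The aggregate over all five scenarios rounds to the ~94.2% on the slide; averaging the per-scenario reductions instead gives a slightly lower figure.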

slide-86
SLIDE 86

Storage cost: ~4GB per day (1.5TB per year)

26

Storage overhead (MB), chart axis 500 to 4500, values in the order extracted from the chart:

Workload           Storage (MB)
Per day desktop    4000
Libc compilation   740
Motive Example     200.6
NetRecon           166.1
Audiograb          133.6
Cameragrab         105
Screengrab         113.9

slide-87
SLIDE 87

Discussion

  • Limitations
      • RAIN trusts the OS, which therefore needs kernel integrity protection.
      • Over-tainting issue
  • Directions
      • Hypervisor-based RAIN
      • Further reduce storage overhead

27

slide-88
SLIDE 88

Conclusion

  • RAIN adopts a multi-level provenance system to facilitate fine-grained analysis that enables accurate attack investigation.
  • RAIN has low runtime overhead, as well as significantly improved analysis cost.

28