Random Sampling applied to Rapid Disk Analysis



SLIDE 1

Rapid Disk Analysis The Math The Aftermath Conclusions

Random Sampling applied to Rapid Disk Analysis

System & Network Engineering, Research Project
Nicolas Canceill
July 4, 2013

SLIDE 2

1. Rapid Disk Analysis
2. The Math
3. The Aftermath
4. Conclusions

SLIDE 3

Introduction

Background
  • Assoc. Prof. S. Garfinkel, Naval Postgraduate School
      • Advanced Forensics Format
      • The Sleuth Kit
      • Better analysis for digital evidence
      • “Searching a 1TB hard drive in 10 minutes” (ACM 2013)

Research
  • E. van Eijk, Z. Geradts, Nederlands Forensisch Instituut
      • Stability? Scalability? Precision?

SLIDE 4

1. Rapid Disk Analysis
2. The Math
3. The Aftermath
4. Conclusions

SLIDE 5

Rapid Analysis: Why?

Traditionally: investigation was “leisurely”
  • Reading a 1TB hard drive: about 3.5h
  • The cost of “seek”: 1 × 36GB ≈ 100,000 × 64KiB (one sequential read vs. many random reads, for the same time)

New challenges
  • Large installations: computer rooms, datacenters...
  • Forensic control at checkpoints: border crossings, airports...
  • “The bomb will go off in the next hour!”
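A quick arithmetic check of the figures on this slide, as a sketch. It assumes decimal units (1TB = 10^12 bytes, 1GB = 10^9 bytes), which the slide does not state, and the function names are illustrative only.

```python
TB = 10**12   # assumption: decimal terabyte
GB = 10**9    # assumption: decimal gigabyte
KiB = 1024


def implied_throughput(total_bytes: int, hours: float) -> float:
    """Average sequential throughput in bytes per second."""
    return total_bytes / (hours * 3600)


def sampled_fraction(n_reads: int, read_size: int, disk_size: int) -> float:
    """Fraction of the disk covered by n_reads reads of read_size bytes."""
    return n_reads * read_size / disk_size


# Throughput implied by "1TB in about 3.5h".
throughput = implied_throughput(TB, 3.5)

# Volume covered by 100,000 random reads of 64KiB each.
random_bytes = 100_000 * 64 * KiB
fraction = sampled_fraction(100_000, 64 * KiB, TB)

print(f"sequential throughput: {throughput / 1e6:.1f} MB/s")
print(f"random reads cover:    {random_bytes / GB:.2f} GB")
print(f"fraction of a 1TB disk: {fraction:.4%}")
```

Under these assumptions the sequential rate comes out at about 79MB/s, and the 100,000 random reads cover about 6.55GB, a bit under a fifth of the 36GB that a sequential read covers on the slide.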

SLIDE 6

Rapid Analysis: What for?

Profit
  • Indications

Data analysis
  • Determine free/wiped space
  • Characterize data based on signatures
  • Hash sectors to look for specific data

SLIDE 7

Rapid Analysis: How?

Data characteristics
  • Described (header/trailer)
  • Encoded/formatted
  • Sectorized and distributed

Analysis strategies
  • Simplify: hashing
  • Tolerate: extract signature
  • Reduce: random sampling

SLIDE 8

Research scope

Research question
How can random sampling help forensically investigate hard disk drives?
  • What kind of indications may be provided?
  • Which parameters are in play?
  • Which degree of certainty may be achieved?

SLIDE 9

1. Rapid Disk Analysis
2. The Math
3. The Aftermath
4. Conclusions

SLIDE 10

Analysis process

Built on top of S. Garfinkel’s frag_find tool

Input
  • Image file to search
  • Data set / signature set to look for
  • Parameters: hashing, sampling, tolerance

Process
  • Build Bloom filter (hashing)
  • Select sample
  • For each block in sample: filter (and compare)
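The process above can be sketched in a few lines. This is not the frag_find implementation: a plain Python set stands in for the Bloom filter, the image is held in memory, tolerance is not modeled, and the names (`block_hashes`, `sampled_search`) and the 512-byte block size are assumptions made for illustration.

```python
import hashlib
import random

BLOCK_SIZE = 512  # assumption: one block is one 512-byte sector


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> set:
    """Hash every full block of the target data (stand-in for the Bloom filter)."""
    return {
        hashlib.sha1(data[i:i + block_size]).digest()
        for i in range(0, len(data) - block_size + 1, block_size)
    }


def sampled_search(image: bytes, target: bytes, sample_size: int,
                   block_size: int = BLOCK_SIZE, seed=None) -> list:
    """Return indices of sampled image blocks whose hash matches a target block."""
    known = block_hashes(target, block_size)          # step 1: build the filter
    n_blocks = len(image) // block_size
    rng = random.Random(seed)
    # step 2: select the sample, without replacement
    sample = rng.sample(range(n_blocks), min(sample_size, n_blocks))
    hits = []
    for idx in sorted(sample):                        # step 3: filter each block
        block = image[idx * block_size:(idx + 1) * block_size]
        if hashlib.sha1(block).digest() in known:
            hits.append(idx)
    return hits
```

Calling `sampled_search(image, target, sample_size=1000)` returns the (possibly empty) list of sampled block indices that matched. An empty list does not prove absence; it only bounds it, as quantified by the error rate.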

SLIDE 11

Random sampling: Basic model

Using a random sample of a statistical population to estimate/predict characteristics

Simple scenario
“Is this hard drive empty/wiped?”
  • M non-empty blocks out of N
  • n sampled blocks out of N

Error rate
The probability to sample only empty blocks:

$$E = \prod_{i=1}^{n} \frac{N - (i - 1) - M}{N - (i - 1)}$$
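The product is the probability that a sample of n blocks, drawn without replacement from N, avoids a given set of M blocks. A minimal sketch of its evaluation follows; the function name `miss_probability` is an illustrative choice, not taken from the project's code.

```python
def miss_probability(N: int, M: int, n: int) -> float:
    """E = prod_{i=1..n} (N - (i-1) - M) / (N - (i-1)).

    N: total number of blocks
    M: number of blocks in the set that the sample may miss
    n: number of blocks sampled without replacement
    """
    if not (0 <= M <= N and 0 <= n <= N):
        raise ValueError("need 0 <= M <= N and 0 <= n <= N")
    e = 1.0
    for i in range(1, n + 1):
        remaining = N - (i - 1)          # blocks not yet sampled
        e *= (remaining - M) / remaining
        if e == 0.0:                     # sample is larger than N - M: a hit is certain
            break
    return e
```

For example, `miss_probability(10, 1, 5)` is 0.5 up to floating-point rounding: half of the 5-block samples from a 10-block disk miss a single marked block.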

SLIDE 12

Random sampling: Data layout

Data is sectorized: [figure]
Data is not always aligned: [figure]

SLIDE 13

Random sampling: Advanced model

A more realistic scenario
“Does this hard drive contain the target block?”
  • All possible offsets: overlap transactions by B − F
  • All possible transactions: $N = \frac{C}{T - (B - F)}$
  • All target transactions: $M = \frac{D}{T}$

Error rate
The probability to miss all target blocks:

$$E = \prod_{i=1}^{n} \frac{\frac{C}{T - (B - F)} - (i - 1) - \frac{D}{T}}{\frac{C}{T - (B - F)} - (i - 1)}$$
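A sketch of how the advanced model might be evaluated numerically. The symbols are used exactly as on the slide. Reading C as the disk capacity, T as the transaction size, D as the length of the target data, and B and F as the two sizes whose difference gives the overlap, all in the same unit, is an assumption of this sketch. So are the function names and the absence of any rounding of N and M.

```python
def transaction_counts(C: float, T: float, B: float, F: float, D: float):
    """Return (N, M): all possible transactions and all target transactions."""
    stride = T - (B - F)   # consecutive transactions overlap by B - F
    if stride <= 0:
        raise ValueError("T must be larger than the overlap B - F")
    N = C / stride
    M = D / T
    return N, M


def advanced_miss_probability(C: float, T: float, B: float, F: float,
                              D: float, n: int) -> float:
    """E = prod_{i=1..n} (N - (i-1) - M) / (N - (i-1)) with N, M as above."""
    N, M = transaction_counts(C, T, B, F, D)
    if n > N:
        raise ValueError("cannot sample more transactions than exist")
    e = 1.0
    for i in range(1, n + 1):
        remaining = N - (i - 1)
        # clamp at zero: once fewer than M transactions remain, a hit is certain
        e *= max(remaining - M, 0.0) / remaining
    return e
```

With no overlap (B = F), the expression reduces to the basic model with N = C/T. A larger overlap shrinks the stride and so increases the number of possible transactions N.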

SLIDE 14

Experimental protocol

Experimental image set
  • Parameters: image size, sector size, % of empty sectors, length of target data, offset size
  • Input: random files and NSRL Reference Data Set

Experimental process
  • Parameters: image size, sector size, transaction size, sampling fraction
  • Randomly select a master file signature
  • Generate several images (length of target data, % of empty sectors)
  • Successively run several timed searches

SLIDE 15

1. Rapid Disk Analysis
2. The Math
3. The Aftermath
4. Conclusions

SLIDE 16

Results: statistical distribution

[Figure: presence of target data (0.1 to 0.6) against number of transactions (log scale, 10^0 to 10^4)]

SLIDE 17

Results: block-to-transaction scaling

[Figure: average error variance (0.2 to 1) against sample size in blocks (log scale, 10^0 to 10^4), one curve per transaction size: 2, 4 and 8 blocks]

SLIDE 18

Results: precision scaling

[Figure: average error variance (2·10^-2 to 0.14) against number of transactions (log scale, 10^0 to 10^4), one curve per image size: 2MB, 4MB, 10MB and 20MB]

SLIDE 19

Results: time scaling

[Figure: average search time in seconds (log scale, 10^-3 to 10^-1) against number of sampled blocks (log scale, 10^0 to 10^5), one curve per image size: 200kB, 400kB, 1MB, 2MB, 4MB, 10MB, 20MB, 40MB and 100MB]

SLIDE 20

Results: time overhead

[Figure: average search time in seconds (log scale, 10^-3.8 to 10^-3.2) against number of transactions (log scale, 10^0 to 10^2), one curve per image size: 2MB, 4MB, 10MB and 20MB]

SLIDE 21

1. Rapid Disk Analysis
2. The Math
3. The Aftermath
4. Conclusions

SLIDE 22

Contributions

Main findings
  • Parameters analyzed:
      • Image characteristics: image size, sector size, data alignment, size of target data
      • Sampling settings: sample size, transaction size, tolerance
  • Scalability:
      • Sample size scales with time: S ∼ t
      • Error rate scales with time: E ∼ 1/√t

Public material
  • Fork of S. Garfinkel’s tools on GitHub
  • Most of the experimental scripts on Gist

SLIDE 23

Research answers

What kind of indications may be provided?
  • Presence/absence of target data or signature

Which parameters are in play?
  • Disk and data characteristics
  • Sampling parameters

Which degree of certainty may be achieved?
  • Certainty scales well with time
  • Insight about target disk will improve certainty

Random sampling is a powerful, scalable, adaptive technique for fast HDD analysis
Efficiency relies on suitable sampling settings, and limited insight on target HDD

SLIDE 24

Further research

Improving insight into the target
  • Pre-determine sector size, data alignment
  • Look for optimal block-to-transaction ratio
  • One step further: pre-sampling

Automate decision process
  • Optimal time spending
  • Automatic settings balance
  • Simple user-side: time or certainty
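One way the "time or certainty" choice could be exposed, sketched under the basic model only: given an acceptable error rate, find the smallest sample size that reaches it. This is an illustration of the idea rather than part of the presented work, and `min_sample_size` is a hypothetical name.

```python
def min_sample_size(N: int, M: int, max_error: float):
    """Smallest n such that prod_{i=1..n} (N-(i-1)-M)/(N-(i-1)) <= max_error.

    Returns None when no sample size can reach max_error (only when M == 0).
    """
    if not 0.0 < max_error <= 1.0:
        raise ValueError("max_error must be in (0, 1]")
    if not 0 <= M <= N:
        raise ValueError("need 0 <= M <= N")
    e = 1.0                      # error rate for an empty sample
    for n in range(0, N + 1):
        if e <= max_error:
            return n
        if n == N:
            break
        # extend the sample by one block: factor for i = n + 1
        e *= (N - n - M) / (N - n)
    return None
```

For instance, `min_sample_size(10, 1, 0.55)` returns 5. Since sample size scales with time in the reported results, the returned n translates into a time budget, and the loop can be inverted to report the certainty reached for a fixed budget.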

SLIDE 25

Appendix 1: Bloom Filter (a)

Hash-based filtering technique

Initialize
  • An array of n bits set to zero
  • k different hash functions uniformly mapping to [0, n − 1]

Add an element
  • Apply functions to compute k integers in [0, n − 1]
  • Set k corresponding bits to 1

Query an element
  • Apply functions to compute k integers in [0, n − 1]
  • Check if k corresponding bits are all 1
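The three operations above map directly onto a small class. This is a minimal sketch, not the filter used by frag_find. Deriving the k hash functions by prefixing SHA-256 with an index is an assumption made here for brevity, and the class and method names are illustrative.

```python
import hashlib


class BloomFilter:
    def __init__(self, n_bits: int, k: int):
        if n_bits <= 0 or k <= 0:
            raise ValueError("n_bits and k must be positive")
        self.n_bits = n_bits
        self.k = k
        self.bits = bytearray((n_bits + 7) // 8)   # n bits, all set to zero

    def _positions(self, item: bytes):
        """Yield k bit positions in [0, n_bits - 1] for item."""
        for j in range(self.k):
            # assumption: the j-th hash function is SHA-256 salted with j
            digest = hashlib.sha256(j.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item: bytes) -> None:
        """Set the k bits corresponding to item."""
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        """True if all k bits are set: item was added, or is a false positive."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

After `bf = BloomFilter(1024, 3)` and `bf.add(b"sector-a")`, the test `b"sector-a" in bf` is always true. A query for an element that was never added is usually false but may be true, which is why the analysis process follows the filter with a comparison step.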

SLIDE 26

Appendix 1: Bloom Filter (b)

[Figure: average error variance (1·10^-2 to 6·10^-2) against number of transactions (log scale, 10^0 to 10^4), one curve per Bloom filter size: 8 bits and 32 bits]

SLIDE 27

Appendix 1: Bloom Filter (c)

[Figure: average building and search time in seconds (0.2 to 1.2) against number of transactions (log scale, 10^0 to 10^4), one curve per Bloom filter size: 8, 16, 24, 30, 31 and 32 bits]

SLIDE 28

Appendix 2: Data layout (a)

Optimal transaction size depends on sector size
Best case: [figure]
Worst case: [figure]

SLIDE 29

Appendix 2: Data layout (b)

Optimal transaction size depends on data layout
