Random Sampling applied to Rapid Disk Analysis

Rapid Disk Analysis The Math The Aftermath Conclusions Random Sampling applied to Rapid Disk Analysis System & Network Engineering — Research Project Nicolas Canceill July 4, 2013 1/28

Outline
  1. Rapid Disk Analysis
  2. The Math
  3. The Aftermath
  4. Conclusions

Introduction
  Background
    Assoc. Prof. S. Garfinkel, Naval Postgraduate School
    Advanced Forensics Format
    The Sleuth Kit
    Better analysis for digital evidence
    “Searching a 1TB hard drive in 10 minutes” (ACM 2013)
  Research
    E. van Eijk, Z. Geradts, Nederlands Forensisch Instituut
    Stability? Scalability? Precision?

Rapid Analysis: Why?
  Traditionally: investigation was “leisurely”
    Reading a 1TB hard drive: about 3.5h
    The cost of “seek”: 1 × 36GB ≈ 100,000 × 64KiB
  New challenges
    Large installations: computer rooms, datacenters...
    Forensics control at checkpoints: border crossings, airports...
    “The bomb will go off in the next hour!”
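As a rough check of the two figures on this slide, the sketch below derives the sustained throughput implied by reading 1TB in about 3.5h, and compares the amount of data moved by one 36GB sequential read with 100,000 reads of 64KiB. It assumes decimal units for TB and GB, and it reads the “≈” as "takes about the same time", which the slide does not spell out.

```python
# Assumption: decimal TB/GB, binary KiB.
TB = 10**12
GB = 10**9
KIB = 1024

# Throughput implied by "1TB in about 3.5h".
full_read_seconds = 3.5 * 3600
throughput_mb_s = TB / full_read_seconds / 10**6  # about 79 MB/s

# Data moved by each side of "1 x 36GB ~ 100,000 x 64KiB".
sequential_bytes = 1 * 36 * GB
random_bytes = 100_000 * 64 * KIB

# If both take about the same time, seeking costs this factor in throughput.
ratio = sequential_bytes / random_bytes  # about 5.5

print(f"implied sequential throughput: {throughput_mb_s:.1f} MB/s")
print(f"sequential vs. random data moved: {ratio:.2f}x")
```

Under that reading, random access moves several times less data per unit of time, which is why the sample size has to be chosen carefully.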

Rapid Analysis: What for?
  Profit
  Indications
  Data analysis
  Determine free/wiped space
  Characterize data based on signatures
  Hash sectors to look for specific data

Rapid Analysis: How?
  Data characteristics
    Described (header/trailer)
    Encoded/formatted
    Sectorized and distributed
  Analysis strategies
    Simplify: hashing
    Tolerate: extract signature
    Reduce: random sampling

Research scope
  Research question
    How can random sampling help forensically investigate hard disk drives?
      What kind of indications may be provided?
      Which parameters are in play?
      Which degree of certainty may be achieved?

Analysis process
  Built on top of S. Garfinkel’s frag_find tool
  Input
    Image file to search
    Data-set/Signatures-set to look for
    Parameters: hashing, sampling, tolerance
  Process
    Build Bloom filter (hashing)
    Select sample
    For each block in sample: filter (and compare)
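The three process steps can be sketched as follows. This is an illustration only, not the frag_find code: a plain Python set stands in for the Bloom filter, the 512-byte block size and SHA-256 are assumptions, and the names `block_hashes` and `sampled_search` are hypothetical.

```python
import hashlib
import random

BLOCK_SIZE = 512  # assumed block size in bytes


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash every full, aligned block of the target data (step 1)."""
    last_start = len(data) - block_size
    return {
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, last_start + 1, block_size)
    }


def sampled_search(image, targets, fraction, block_size=BLOCK_SIZE, seed=None):
    """Return byte offsets of sampled image blocks whose hash is in `targets`."""
    rng = random.Random(seed)
    n_blocks = len(image) // block_size
    n_sample = max(1, round(fraction * n_blocks))
    # Step 2: select the sample (without replacement), read in disk order.
    sample = sorted(rng.sample(range(n_blocks), n_sample))
    hits = []
    # Step 3: filter each sampled block against the target hashes.
    for index in sample:
        offset = index * block_size
        digest = hashlib.sha256(image[offset:offset + block_size]).digest()
        if digest in targets:
            hits.append(offset)
    return hits


if __name__ == "__main__":
    target = bytes(range(256)) * 4                    # two identical blocks
    image = bytes(BLOCK_SIZE * 10) + target + bytes(BLOCK_SIZE * 10)
    print(sampled_search(image, block_hashes(target), fraction=0.5, seed=1))
```

With a real Bloom filter in place of the set, a hit would only be probable, which is why the slide mentions a comparison step after filtering.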

Random sampling: Basic model
  Using a random sample of a statistical population to estimate/predict characteristics
  Simple scenario
    “Is this hard drive empty/wiped?”
    M non-empty blocks out of N
    n sampled blocks out of N
  Error rate
    The probability to sample only empty blocks:
    $E = \prod_{i=1}^{n} \frac{N - (i - 1) - M}{N - (i - 1)}$
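The product can be evaluated directly. The sketch below assumes, as the formula implies, that M counts the blocks that would reveal data and that the n blocks are drawn without replacement; the function name `miss_probability` is hypothetical.

```python
def miss_probability(N, M, n):
    """Probability that n blocks sampled without replacement from N
    all avoid the M blocks of interest: prod (N-(i-1)-M) / (N-(i-1))."""
    if not (0 <= M <= N and 0 <= n <= N):
        raise ValueError("need 0 <= M <= N and 0 <= n <= N")
    if n > N - M:
        return 0.0  # more samples than empty blocks: a hit is certain
    p = 1.0
    for i in range(1, n + 1):
        p *= (N - (i - 1) - M) / (N - (i - 1))
    return p


if __name__ == "__main__":
    # Hypothetical disk: one million blocks, one thousand of them non-empty.
    for n in (100, 1_000, 10_000):
        print(n, miss_probability(1_000_000, 1_000, n))
```

The error drops quickly as n grows, which is what makes small samples useful for the "is it wiped?" question.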

Random sampling: Data layout
  Data is sectorized: [figure not preserved]
  Data is not always aligned: [figure not preserved]

Random sampling: Advanced model
  A more realistic scenario
    “Does this hard drive contain the target block?”
    All possible offsets: overlap transactions by B − F
    All possible transactions: $N = \left[ \frac{C}{T - (B - F)} \right]$
    All target transactions: $M = \left[ \frac{D}{T} \right]$
  Error rate
    The probability to miss all target blocks:
    $E = \prod_{i=1}^{n} \frac{\left[ \frac{C}{T - (B - F)} \right] - (i - 1) - \left[ \frac{D}{T} \right]}{\left[ \frac{C}{T - (B - F)} \right] - (i - 1)}$
    Here $[\cdot]$ stands for rounding to an integer; the direction (floor or ceiling) is not recoverable from the slide text.
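A minimal sketch of the advanced model follows. The slide does not define C, D, T, B and F, so the code treats them as plain numbers in a common unit, and it takes the rounding direction as a parameter that defaults to floor; both are assumptions, as are the function names.

```python
import math


def transaction_counts(C, D, T, B, F, rounding=math.floor):
    """Return (N, M): all possible transactions and all target transactions.
    `rounding` is an assumption; pass math.ceil to round up instead."""
    stride = T - (B - F)  # transactions overlap by B - F
    if stride <= 0:
        raise ValueError("T must exceed the overlap B - F")
    return rounding(C / stride), rounding(D / T)


def miss_probability(N, M, n):
    """Probability that n sampled transactions miss all M target ones."""
    if n > N - M:
        return 0.0
    p = 1.0
    for i in range(1, n + 1):
        p *= (N - (i - 1) - M) / (N - (i - 1))
    return p


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the formulas.
    N, M = transaction_counts(C=1000, D=100, T=10, B=4, F=2)
    print(N, M, miss_probability(N, M, n=20))
```

The only change from the basic model is how N and M are counted; the product itself is the same.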

Experimental protocol
  Experimental image set
    Parameters: image size, sector size, % of empty sectors, length of target data, offset size
    Input: random files and NSRL Reference Data Set
  Experimental process
    Parameters: image size, sector size, transaction size, sampling fraction
    Randomly select a master file signature
    Generate several images (length of target data, % of empty sectors)
    Successively run several timed searches

Results: statistical distribution
  [Figure: presence of target data (0 to 0.6) against number of transactions (10^0 to 10^4)]

Results: block-to-transaction scaling
  [Figure: average error variance (0 to 1) against sample size in blocks (10^0 to 10^4), one curve per transaction size: 2, 4 and 8 blocks]

Results: precision scaling
  [Figure: average error variance (0 to 0.14) against number of transactions (10^0 to 10^4), one curve per image size: 2MB, 4MB, 10MB and 20MB]

Results: time scaling
  [Figure: average search time in seconds (10^-3 to 10^-1) against number of sampled blocks (10^0 to 10^5), one curve per image size: 200kB, 400kB, 1MB, 2MB, 4MB, 10MB, 20MB, 40MB and 100MB]

Results: time overhead
  [Figure: average search time in seconds (10^-3.8 to 10^-3.2) against number of transactions (10^0 to 10^2), one curve per image size: 2MB, 4MB, 10MB and 20MB]

Contributions
  Main findings
    Parameters analyzed:
      Image characteristics: image size, sector size, data alignment, size of target data
      Sampling settings: sample size, transaction size, tolerance
    Scalability:
      Sample size scales with time: $S \sim t$
      Error rate scales with time: $E \sim \frac{1}{\sqrt{t}}$
  Public material
    Fork of S. Garfinkel’s tools on GitHub
    Most of the experimental scripts on Gist

Research answers
  What kind of indications may be provided?
    Presence/absence of target data or signature
  Which parameters are in play?
    Disk and data characteristics
    Sampling parameters
  Which degree of certainty may be achieved?
    Certainty scales well with time
    Insight about target disk will improve certainty
  Random sampling is a powerful, scalable, adaptive technique for fast HDD analysis
  Efficiency relies on suitable sampling settings, and limited insight on target HDD

Further research
  Improving insight of target
    Pre-determine sector size, data alignment
    Look for optimal block-to-transaction ratio
    One step further: pre-sampling
  Automate decision process
    Optimal time spending
    Automatic settings balance
    Simple user-side: time or certainty

Appendix 1: Bloom Filter (a)
  Hash-based filtering technique
  Initialize
    An array of n bits set to zero
    k different hash functions uniformly mapping to [0, n − 1]
  Add an element
    Apply functions to compute k integers in [0, n − 1]
    Set k corresponding bits to 1
  Query an element
    Apply functions to compute k integers in [0, n − 1]
    Check if k corresponding bits are all 1
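The initialize, add and query operations map onto a small class. This is a minimal sketch: deriving the k hash functions from SHA-256 with a counter prefix is an assumption made for brevity, not the scheme used in the tools discussed here.

```python
import hashlib


class BloomFilter:
    def __init__(self, n_bits, k):
        # Initialize: n bits set to zero, k hash functions.
        self.n_bits = n_bits
        self.k = k
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        # Assumed hash family: SHA-256 of a 4-byte counter plus the item,
        # reduced modulo n_bits to a bit index.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        # Add: set the k corresponding bits to 1.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # Query: the item may be present only if all k bits are 1.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


if __name__ == "__main__":
    bf = BloomFilter(n_bits=1 << 16, k=3)
    bf.add(b"sector contents")
    print(b"sector contents" in bf)
```

A query never misses an element that was added, but it can report an element that was not, so a positive answer still needs a direct comparison.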

Appendix 1: Bloom Filter (b)
  [Figure: average error variance (0 to 6·10^-2) against number of transactions (10^0 to 10^4), one curve per Bloom filter size: 8 bits and 32 bits]

Appendix 1: Bloom Filter (c)
  [Figure: average building and search time in seconds (0 to 1.2) against number of transactions (10^0 to 10^4), one curve per Bloom filter size: 8, 16, 24, 30, 31 and 32 bits]

Appendix 2: Data layout (a)
  Optimal transaction size depends on sector size
  Best case: [figure not preserved]
  Worst case: [figure not preserved]

Appendix 2: Data layout (b)
  Optimal transaction size depends on data layout
