Secure Indexing/Search for Regulatory-Compliant Record Retention

There is a need for trustworthy record keeping. The drivers are corporate misconduct, the digital information explosion (email, instant messaging, files), soaring discovery costs (spending on eDiscovery growing at 65% CAGR), and a focus on compliance (e.g., HIPAA). The average F500 company has 125 non-frivolous lawsuits at any given time, and IDC forecasts 60B business emails annually. Sources: IDC, Network World (2003), Socha/Gelbmann (2004).

Q. Zhu, W. W. Hsu: Fossilized Index: The Linchpin of Trustworthy Non-Alterable Electronic Records. SIGMOD 2006, pp. 395-406.

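What is trustworthy record keeping? It means establishing solid proof of events that have occurred. [Figure: a timeline in which Alice commits a record to a storage device, the record is later regretted and an adversary acts on it, and later still Bob queries the device.] Bob should get back Alice's original data.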

This leads to a unique threat model. The record is created properly, the commit is trustworthy, and the query is trustworthy (the record is queried properly). Between commit and query, however, the adversary has super-user privileges: access to the storage device and access to any keys. The adversary could be Alice herself.

Traditional schemes do not work: we cannot rely on Alice's signature, because the adversary may be Alice herself and has access to any keys, so an altered record can simply be signed again.

WORM storage helps address the problem. With Write Once Read Many (WORM) storage, a new record can be written, but it cannot be overwritten or deleted, so the adversary cannot delete Alice's record.

WORM can be built on top of conventional rewritable magnetic disk, with write-once semantics enforced through software: file modification and premature deletion operations are disallowed.
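A minimal sketch of write-once semantics enforced in software, assuming a Python dict stands in for files on a rewritable disk and a single fixed retention period governs deletion. `SoftwareWorm`, its methods, and the injectable `clock` are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Dict


class SoftwareWorm:
    """Write-once semantics enforced in software over a rewritable store.

    Hypothetical sketch: the dict stands in for files on an ordinary disk,
    and one retention period applies to every file.
    """

    def __init__(self, retention_seconds: float, clock: Callable[[], float] = time.time):
        self._files: Dict[str, bytes] = {}
        self._committed_at: Dict[str, float] = {}
        self._retention = retention_seconds
        self._clock = clock

    def write(self, name: str, content: bytes) -> None:
        # A name can be written exactly once: modification is disallowed.
        if name in self._files:
            raise PermissionError(f"{name}: overwrite/modification disallowed")
        self._files[name] = bytes(content)
        self._committed_at[name] = self._clock()

    def read(self, name: str) -> bytes:
        return self._files[name]

    def delete(self, name: str) -> None:
        # Deletion is allowed only after the retention period has expired.
        age = self._clock() - self._committed_at[name]
        if age < self._retention:
            raise PermissionError(f"{name}: premature deletion disallowed")
        del self._files[name]
        del self._committed_at[name]
```

Injecting `clock` only makes the retention check testable without waiting; a caller would normally rely on the default `time.time`.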

An index is required due to the high volume of records. [Figure: a timeline in which Alice commits a record and updates the index, the record is later regretted and an adversary acts, and Bob then queries through the index.]

In effect, records can be hidden or altered by modifying the index: an adversary can hide record B from the index, or replace B with B'. [Figure: index entries A and B, with B replaced by B'.] The index must also be secured (fossilized).

A B-tree for an increasing sequence can be created on WORM. [Figure: example B-tree stored on WORM.]
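A hedged sketch of why increasing keys fit WORM: if every node is an append-only list (standing in for a WORM file that only supports appends), inserting keys in increasing order only ever appends to nodes on the rightmost path or creates new nodes, so no committed entry moves. `AppendOnlyBTree` is hypothetical, and locating the current root through an in-memory `spine` list is an assumption the slide does not address.

```python
import bisect
from typing import List, Optional


class AppendOnlyBTree:
    """B+tree-like index for strictly increasing keys; nodes are append-only.

    Hypothetical sketch: node layout, names, and the in-memory `spine`
    (rightmost path, spine[0] = leaf, spine[-1] = root) are assumptions.
    """

    def __init__(self, fanout: int = 3):
        self.fanout = fanout
        self.nodes: List[dict] = []   # node id -> {"leaf": bool, "entries": [...]}
        self.spine: List[int] = []
        self.last: Optional[int] = None

    def _new(self, leaf: bool, entries: list) -> int:
        self.nodes.append({"leaf": leaf, "entries": entries})
        return len(self.nodes) - 1

    def _first_key(self, node_id: int):
        node = self.nodes[node_id]
        return node["entries"][0] if node["leaf"] else node["entries"][0][0]

    def insert(self, key) -> None:
        if self.last is not None and key <= self.last:
            raise ValueError("keys must arrive in increasing order")
        self.last = key
        if not self.spine:
            self.spine = [self._new(True, [key])]
            return
        leaf = self.nodes[self.spine[0]]
        if len(leaf["entries"]) < self.fanout:
            leaf["entries"].append(key)                 # append only
            return
        # Rightmost leaf is full: start a new node and link it upward.
        new_id, level = self._new(True, [key]), 0
        while True:
            old_id = self.spine[level]
            self.spine[level] = new_id
            if level + 1 == len(self.spine):            # old_id was the root
                root = self._new(False, [(self._first_key(old_id), old_id), (key, new_id)])
                self.spine.append(root)
                return
            parent = self.nodes[self.spine[level + 1]]
            if len(parent["entries"]) < self.fanout:
                parent["entries"].append((key, new_id))  # append only
                return
            new_id, level = self._new(False, [(key, new_id)]), level + 1

    def search(self, key) -> bool:
        if not self.spine:
            return False
        node = self.nodes[self.spine[-1]]
        while not node["leaf"]:
            keys = [k for k, _ in node["entries"]]
            i = bisect.bisect_right(keys, key) - 1      # last child whose first key <= key
            if i < 0:
                return False
            node = self.nodes[node["entries"][i][1]]
        return key in node["entries"]


tree = AppendOnlyBTree(fanout=3)
for k in [2, 4, 7, 11, 13, 19, 23]:
    tree.insert(k)
print(tree.search(13), tree.search(12))   # True False
```

The `ValueError` marks the limit of the approach: an out-of-order key would belong inside an already committed node, which is where ordinary B+tree insertion needs splits and relocation.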

A B+tree index is insecure, even on WORM. [Figure: example B+tree over keys 2, 4, 7, 11, 13, 19, 23, 25, 26, 27, 29, 30, 31.] The path to an element depends on elements inserted later, so an adversary can attack it.

Is this a real threat? Would someone want to delete a record a day after it is created? Consider intrusion detection logging: once an adversary gains control, he would like to delete the records of his initial attack. Records can also be regretted moments after creation: email best practice is that email must be committed before it is delivered.

Several levels of indexing. [Figure: document 1 contains "…query…"; document 3 contains "…query… …data… …base… …index…".] An inverted index maps keywords to posting lists: Query: 1 3 11 17; Data: 3 9; Base: 3 19; Worm: 7 36; Index: 3. To find documents containing the keywords "Query", "Data", and "Base", retrieve the lists for Query, Data, and Base, and intersect the document IDs in the lists.
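The conjunctive query above can be sketched as a merge-based intersection of sorted posting lists, using the lists from the slide. Processing the shortest lists first is a common heuristic rather than something the slide specifies, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def intersect(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Merge-intersect two sorted posting lists in O(len(a) + len(b))."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def conjunctive_query(postings: Dict[str, List[int]], terms: Sequence[str]) -> List[int]:
    """Documents containing every term; shortest lists first keep intermediates small."""
    lists = sorted((postings.get(t, []) for t in terms), key=len)
    if not lists:
        return []
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result


# Posting lists from the slide.
postings = {"Query": [1, 3, 11, 17], "Data": [3, 9], "Base": [3, 19], "Worm": [7, 36], "Index": [3]}
print(conjunctive_query(postings, ["Query", "Data", "Base"]))   # [3]
```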

GHT: a Generalized Hash Tree fossilized index. The tree grows from the root down to the leaves without relocating committed entries. It is "balanced" without requiring dynamic adjustments to its structure. As a hash-based scheme, it is a dynamic hashing scheme that does not require rehashing.

A GHT is defined by {M, K, H}. M = {m_0, m_1, …}, where m_i is the size of a tree node (number of buckets) at level i. K = {k_0, k_1, …}, where k_i is the growth factor for level i: the tree has k_i times as many nodes at level i+1 as at level i. H = {h_0, h_1, …}, where h_i is the hash function for level i; different H values lead to different GHT variants. The running example uses m_0 = m_1 = … = 4 and k_0 = k_1 = … = 2.
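A small helper shows how M and K fix the shape of each level, assuming level 0 is a single root node: level i then has k_0 × k_1 × … × k_(i-1) nodes, each with m_i buckets. `level_shape` is a hypothetical name.

```python
from math import prod
from typing import Sequence, Tuple


def level_shape(M: Sequence[int], K: Sequence[int], level: int) -> Tuple[int, int]:
    """Return (nodes, buckets) at `level` of a GHT with parameters M and K.

    Assumes a single root at level 0; level i+1 has k_i times as many nodes
    as level i, and every node at level i holds m_i buckets.
    """
    nodes = prod(K[:level])          # k_0 * ... * k_{level-1}; empty product = 1 for the root
    return nodes, nodes * M[level]


# The slide's example: m_i = 4 and k_i = 2 at every level.
M, K = [4] * 5, [2] * 5
for i in range(4):
    print(i, level_shape(M, K, i))   # prints 0 (1, 4) / 1 (2, 8) / 2 (4, 16) / 3 (8, 32)
```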

Standard (default) GHT: the thin tree. [Figure: a root node hashed by h_0, two level-1 nodes hashed by h_1, and four level-2 nodes hashed by h_2; m_0 = m_1 = … = 4, k_0 = k_1 = … = 2.]

Which hash functions does the thin tree use? With h_0 = x mod 4 and h_1 = x mod 8, what about h_2: x mod 16?

No: in the thin tree, h_0 = x mod 4 and h_1 = h_2 = h_3 = … = x mod 8.
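One reading of why x mod 8 suffices at every level below the root, consistent with the insertion example later in the deck: after a collision in some node, the next hash only has to choose among the k × m = 2 × 4 = 8 buckets of that node's two children, so its value splits into (child, entry) = (h div 4, h mod 4). This is an interpretation, not the paper's stated definition, and `thin_slot` is a hypothetical helper.

```python
def thin_slot(x: int, level: int, m: int = 4, k: int = 2) -> tuple:
    """(child, entry) chosen at `level` of a thin GHT, relative to the current node.

    Assumes h_0 = x mod m (a bucket in the root) and h_i = x mod (k*m) for i >= 1,
    so m = 4, k = 2 gives h_0 = x mod 4 and h_i = x mod 8.
    """
    if level == 0:
        return 0, x % m                  # the root level has a single node
    h = x % (k * m)
    return h // m, h % m                 # which child, then which bucket in it


print([thin_slot(14, i) for i in range(4)])   # [(0, 2), (1, 2), (1, 2), (1, 2)]
```

Because h_1 = h_2 = …, the same relative choice repeats at every level; the path still differs level by level because each choice is made inside a different node.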

GHT variant: the fat tree. It can tolerate non-ideal hash functions better, because there are many more potential target buckets at each level. Hashing at different levels is independent, so different levels can be allocated to different disks and accessed in parallel. However, it is expensive to maintain the child pointers in each node, since the number of pointers grows exponentially. With m_0 = m_1 = … = 4 and k_0 = k_1 = … = 2, the hash functions are h_0 = x mod 4, h_1 = x mod 8, h_2 = x mod 16, and in general h_i = x mod (4 × 2^i).
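For contrast, a sketch of the fat-tree functions h_i = x mod (4 × 2^i), assuming the buckets at a level are numbered consecutively across its nodes. A hash value then addresses a bucket at its level directly, regardless of which buckets were probed above it, which matches the slide's point that hashing at different levels is independent. `fat_slot` is a hypothetical helper.

```python
def fat_slot(x: int, level: int, m: int = 4, k: int = 2) -> tuple:
    """(node, entry) at `level` of a fat GHT, with h_i = x mod (m * k**i).

    With m = 4 and k = 2 this is h_i = x mod (4 * 2**i): the value indexes
    every bucket at the level, independent of the path from the root.
    """
    h = x % (m * k ** level)
    return h // m, h % m                 # node index within the level, bucket in that node


print([fat_slot(13, i) for i in range(3)])   # [(0, 1), (1, 1), (3, 1)]
```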

GHT (standard) insertion. Each bucket is identified by a triple (level, child: left or right, entry within the node). [Figure: buckets (0, 0, 1), (1, 1, 2), and (2, 0, 1) in a thin tree.]

GHT insertion. Insert a key whose hash values at the various levels are h_0(key) = 1, h_1(key) = 6, h_2(key) = 1, and h_3(key) = 3. Bucket (0, 0, 1) is occupied, so there is a collision, and the key moves down to bucket (1, 1, 2) and then to (2, 0, 1), which are also occupied.

The key is finally placed in the empty bucket (3, 0, 3). If the hash functions are uniform, the tree grows top-down in a balanced fashion.
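The insertion walk-through can be reproduced with a minimal thin-tree sketch. Assumptions: each bucket is keyed by the path of child choices from the root plus the entry within the node, the per-level hash values are passed in directly rather than computed, and a level-i hash (i ≥ 1) picks a child and an entry as (h div 4, h mod 4). The class and the occupying keys "a", "b", "c" are hypothetical.

```python
from typing import Dict, List, Tuple

Slot = Tuple[Tuple[int, ...], int]   # (path of child choices from the root, bucket in node)


class ThinGHT:
    """Write-once thin GHT with m buckets per node and k children per node.

    `hashes[i]` is h_i(key), supplied by the caller (hypothetical interface).
    A bucket is written once and its entry is never relocated.
    """

    def __init__(self, m: int = 4, k: int = 2):
        self.m, self.k = m, k
        self.buckets: Dict[Slot, object] = {}

    def insert(self, key, hashes: List[int]) -> Tuple[int, int, int]:
        path: Tuple[int, ...] = ()
        for level, h in enumerate(hashes):
            if level == 0:
                child, entry = 0, h % self.m
            else:
                h %= self.k * self.m
                child, entry = h // self.m, h % self.m
                path += (child,)          # descend into that child of the current node
            slot = (path, entry)
            if slot not in self.buckets:  # first empty bucket on the probe path
                self.buckets[slot] = key
                return level, child, entry
        raise RuntimeError("no empty bucket within the supplied hash levels")


tree = ThinGHT()
tree.insert("a", [1])                      # lands in (0, 0, 1)
tree.insert("b", [1, 6])                   # collides, lands in (1, 1, 2)
tree.insert("c", [1, 6, 1])                # collides twice, lands in (2, 0, 1)
print(tree.insert("new", [1, 6, 1, 3]))    # (3, 0, 3)
```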

GHT search. Searching for a key with the hash values shown above is similar to insertion: probe (0, 0, 1), (1, 1, 2), (2, 0, 1), and (3, 0, 3) in turn. Search must also deal with duplicate key values. GHT supports only point queries; it cannot support range search.
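A hedged sketch of search with duplicates, assuming the same bucket layout (path of child choices plus entry) and a hand-built set of buckets that mirrors the example with a key "x" stored twice. Collecting every match until the first empty bucket is one plausible policy, not necessarily the paper's exact procedure; `thin_ght_search` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Slot = Tuple[Tuple[int, ...], int]   # (path of child choices from the root, bucket in node)


def thin_ght_search(buckets: Dict[Slot, object], key, hashes: List[int],
                    m: int = 4, k: int = 2) -> List[Tuple[int, int, int]]:
    """Return every (level, child, entry) holding `key` along its probe path.

    The probe sequence is the one insertion uses. Probing continues past a
    match, because a duplicate inserted later lands deeper on the same path,
    and stops at the first empty bucket, because insertion never skips one.
    """
    hits: List[Tuple[int, int, int]] = []
    path: Tuple[int, ...] = ()
    for level, h in enumerate(hashes):
        if level == 0:
            child, entry = 0, h % m
        else:
            h %= k * m
            child, entry = h // m, h % m
            path += (child,)
        stored = buckets.get((path, entry))
        if stored is None:
            break
        if stored == key:
            hits.append((level, child, entry))
    return hits


# Buckets from the slide's example, with "x" stored twice (a duplicate key).
buckets = {((), 1): "a", ((1,), 2): "b", ((1, 0), 1): "x", ((1, 0, 0), 3): "x"}
print(thin_ght_search(buckets, "x", [1, 6, 1, 3]))   # [(2, 0, 1), (3, 0, 3)]
```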

Summary. Trustworthy record keeping is important, but efficient retrieval must also be ensured. Existing indexing structures may be manipulated. GHT is a "trustworthy" index structure: once a record is committed, it cannot be manipulated!

Most business records are unstructured and are searched by an inverted index, which maps keywords to posting lists (Query: 1 3 11 17; Data: 3 9; Base: 3 19; Worm: 7 36; Index: 3). There is one WORM file for each posting list.

S. Mitra, W. W. Hsu, M. Winslett: Trustworthy Keyword Search for Regulatory-Compliant Record Retention. VLDB 2006, pp. 1001-1012.

The index must be updated as new documents arrive. [Figure: document 79 contains Query and Data, so 79 is appended to their posting lists, e.g., Query: 1 3 11 17 79 and Data: 3 9 79.] With 500 keywords in a document, that is 500 disk seeks, or about 1 second per document.
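The cost model behind "500 keywords = 500 disk seeks" can be sketched as an index that appends each new document ID to every keyword's posting list immediately, counting one seek per list touched. `PerDocumentIndex` and the 2 ms per-seek figure are assumptions; 2 ms is simply what 1 second per 500 seeks works out to.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class PerDocumentIndex:
    """Inverted index with one append-only posting list per keyword.

    Each list stands in for a separate WORM file, so appending one document ID
    is modeled as one disk seek (hypothetical cost model).
    """

    def __init__(self):
        self.lists: Dict[str, List[int]] = defaultdict(list)
        self.seeks = 0

    def add_document(self, doc_id: int, keywords: Iterable[str]) -> None:
        for term in sorted(set(keywords)):
            self.lists[term].append(doc_id)   # append to this keyword's WORM file
            self.seeks += 1                   # a different file, so another seek


SEEK_MS = 2.0   # hypothetical: 1 s / 500 seeks, the rate the slide's numbers imply
idx = PerDocumentIndex()
idx.add_document(79, ["Query", "Data"])
before = idx.seeks
idx.add_document(80, [f"term{i}" for i in range(500)])   # hypothetical 500-keyword document
print(idx.seeks - before, (idx.seeks - before) * SEEK_MS / 1000)   # 500 1.0
```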

Amortize the cost by updating in batches. [Figure: documents 79 to 83 are held in a buffer, and the buffered postings for Query (79 81 83) are appended to its posting list together.] This needs only 1 seek per keyword per batch, but the buffer must be large for infrequent terms to benefit: over 100,000 documents must be buffered to achieve 2 docs/sec.
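A sketch of batched updating under the same one-seek-per-append assumption: postings are buffered per keyword, and each keyword's list is appended to once per flush. `BatchedIndex` and `batch_size` are hypothetical; documents 79, 81, and 83 contain Query as on the slide, while the other keyword assignments are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class BatchedIndex:
    """Buffers postings and appends them to each keyword's list once per batch.

    One append per keyword per flush is modeled as one disk seek (assumption).
    """

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.buffer: Dict[str, List[int]] = defaultdict(list)
        self.buffered_docs = 0
        self.lists: Dict[str, List[int]] = defaultdict(list)
        self.seeks = 0

    def add_document(self, doc_id: int, keywords: Iterable[str]) -> None:
        for term in set(keywords):
            self.buffer[term].append(doc_id)
        self.buffered_docs += 1
        if self.buffered_docs >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        for term, doc_ids in self.buffer.items():
            self.lists[term].extend(doc_ids)   # one append per keyword per batch
            self.seeks += 1
        self.buffer.clear()
        self.buffered_docs = 0


idx = BatchedIndex(batch_size=5)
docs = {79: ["Query", "Data"], 80: ["Data"], 81: ["Query"], 82: ["Base"], 83: ["Query"]}
for doc_id, words in docs.items():
    idx.add_document(doc_id, words)
print(idx.lists["Query"], idx.seeks)   # [79, 81, 83] 3
```

Appending each document's postings one at a time would cost 6 seeks for these five documents; batching needs 3. The trade-off is the one the next slide raises: until a flush, buffered postings are not yet in the index.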

The index is not updated immediately. [Figure: a timeline in which Alice commits a record that waits in the buffer before reaching the index; meanwhile the adversary can alter the record or omit it.] Prevailing practice is that email must be committed before it is delivered.
