Bonsai: Balanced Lineage Authentication Ashish Gehani - - PowerPoint PPT Presentation

SLIDE 1

Bonsai: Balanced Lineage Authentication

Ashish Gehani

Bonsai:Balanced Lineage Authentication – p. 1/19

SLIDE 2

What is data lineage?

[Figure: (a) Primitive operation: an Operation takes Input 1 ... Input n and produces an Output. (b) Compound operation tree.]

SLIDE 3

Why track lineage?

GIS - Data origins
Material science - Component pedigree
Biology - Experiment reproducibility
Grid - Debugging

SLIDE 4

Why certify lineage?

Reproduction costly
  PDB - $200,000 / protein
  Fermilab Collision Detector - 1 month, multiple TB / datum
Reliability
Accreditation
Ownership
Auditability

SLIDE 5

What’s been done?

LFS - Inputs, Outputs, Options → SQL
PASS - Runtime environs → Berkeley DB
Trio - Tracks data accuracy using lineage
CMCS - Chemistry toolkit → WebDAV
Chimera - Workflow scripts
myGrid - Biology Grid workflows
Vesta - Incremental builds
ESSW - Earth Science data management

SLIDE 6

What’s the problem?

Single trust domain
  Chimera, myGrid, Vesta, ESSW
Centralized service
  LFS, PASS, Trio, CMCS
No assurance
  Unsigned
  Incomplete

SLIDE 7

What granularity?

What to audit?
  Processes, System calls, File system?
Fine grain → High overhead
Coarse grain → False positives
File system:
  Pro - Intermediate complexity
  Pro - Captures most persistent change
  Con - Can’t track data from: Network, Keyboard, Pipes, Memory maps

SLIDE 8

Certification approach

[Figure: a Producer's Output is handed to a Consumer as Input; the question is whether Input = Output.]

No global TCB
Require commitments
Check agreement of:
  Producer output
  Consumer input
Trusted user in subtree / path → Tampering detectable

SLIDE 9

Metadata generation

Intercede on calls for:

exec(), fork(), exit(), open(), close(), read(), write()

Maintain process table entries for:

accessed, modified files

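[Figure: timeline of a process execution. File 1 and File 2 are each opened with open(), read, and closed with close(); File 3 is opened with open(), written, and closed with close(). The resulting entry for File 3 lists File 1, File 2, and the Owner.]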
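The bookkeeping described on this slide can be sketched roughly as follows. This is a minimal illustration, not the presenter's implementation: the `LineageTracker` class, its `on_*` hook names, and the choice to emit a record when a written file is closed are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessEntry:
    """Per-process table entry (hypothetical layout)."""
    owner: str
    open_files: dict = field(default_factory=dict)  # fd -> path
    accessed: set = field(default_factory=set)      # files read so far
    modified: set = field(default_factory=set)      # files written so far


class LineageTracker:
    """Toy stand-in for the interception layer; hook names are assumptions."""

    def __init__(self):
        self.table = {}    # pid -> ProcessEntry
        self.records = []  # (output, inputs, owner) tuples

    def on_exec(self, pid, owner):
        self.table[pid] = ProcessEntry(owner=owner)

    def on_open(self, pid, fd, path):
        self.table[pid].open_files[fd] = path

    def on_read(self, pid, fd):
        entry = self.table[pid]
        entry.accessed.add(entry.open_files[fd])

    def on_write(self, pid, fd):
        entry = self.table[pid]
        entry.modified.add(entry.open_files[fd])

    def on_close(self, pid, fd):
        entry = self.table[pid]
        path = entry.open_files.pop(fd)
        # Assumption: a lineage record is emitted when a written file closes.
        if path in entry.modified:
            inputs = sorted(entry.accessed - {path})
            self.records.append((path, inputs, entry.owner))
            entry.modified.discard(path)

    def on_exit(self, pid):
        self.table.pop(pid, None)


# Replay the sequence in the figure: read File 1 and File 2, write File 3.
tracker = LineageTracker()
tracker.on_exec(100, owner="alice")
for fd, name in ((3, "file1"), (4, "file2")):
    tracker.on_open(100, fd, name)
    tracker.on_read(100, fd)
    tracker.on_close(100, fd)
tracker.on_open(100, 5, "file3")
tracker.on_write(100, 5)
tracker.on_close(100, 5)
tracker.on_exit(100)
```

After the replay, `tracker.records` holds one entry naming `file3` as the output, `file1` and `file2` as its inputs, and `alice` as the owner.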

SLIDE 10

Minimal representation

[Figure: record layout with fields Executor, Signature, Output, Input 1 ... Input n; each file entry holds Net Address, Inode, Time.]

Executor: 32 bit IPv4 address, 32 bit user ID
Signature: 160 bits [SIGN_{K_E}(E, O, I_1, ..., I_n)]
Input / Output File:
  32 bit IPv4 address
  32 bit inode
  32 bit time (Seconds since 1/1/70)
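To make the field widths concrete, here is a small sketch that packs one record with Python's `struct` module. The field order (executor, signature, output, inputs), network byte order, and the function names are assumptions for illustration; the slide only gives the widths.

```python
import socket
import struct

EXECUTOR_FMT = "!4sI"   # 32-bit IPv4 address, 32-bit user ID
FILE_FMT = "!4sII"      # 32-bit IPv4 address, 32-bit inode, 32-bit time
SIGNATURE_BYTES = 20    # 160 bits


def pack_executor(ip, uid):
    return struct.pack(EXECUTOR_FMT, socket.inet_aton(ip), uid)


def pack_file(ip, inode, seconds):
    # seconds = seconds since 1/1/70, as on the slide
    return struct.pack(FILE_FMT, socket.inet_aton(ip), inode, seconds)


def pack_record(executor, signature, output, inputs):
    """Concatenate the fields; the ordering here is an assumption."""
    if len(signature) != SIGNATURE_BYTES:
        raise ValueError("signature must be 160 bits (20 bytes)")
    parts = [pack_executor(*executor), signature, pack_file(*output)]
    parts.extend(pack_file(*f) for f in inputs)
    return b"".join(parts)


def record_size(n_inputs):
    """Bytes needed for a record with n_inputs input files."""
    return (struct.calcsize(EXECUTOR_FMT) + SIGNATURE_BYTES
            + struct.calcsize(FILE_FMT) * (1 + n_inputs))


record = pack_record(
    executor=("10.0.0.1", 1000),
    signature=bytes(SIGNATURE_BYTES),  # placeholder, not a real signature
    output=("10.0.0.1", 42, 1_000_000_000),
    inputs=[("10.0.0.2", 7, 999_999_000), ("10.0.0.3", 9, 999_999_500)],
)
```

Under these assumptions a record costs 8 bytes for the executor, 20 for the signature, and 12 per file, so a two-input record occupies 64 bytes.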

SLIDE 11

Workload

Berkeley NOW file system traces
Month of activity
Access patterns stable
Instruction - 20 workstations in teaching lab
Research - 13 desktops of research group
Web - 1 web server running Postgres
Windows - 8 Windows desktops

SLIDE 12

Cumulative lineage

Current paradigm
Entire tree migrates with data
Metadata grows rapidly:

Workload \ Steps   1        2        3        4        5
Instruction        0.4 KB   3 KB     31 KB    253 KB   2 MB
Research           0.2 KB   0.8 KB   2 KB     8 KB     29 KB
Web                1 KB     39 KB    1 MB     29 MB    813 MB
Windows            0.2 KB   0.8 KB   2 KB     9 KB     30 KB

SLIDE 13

Operational impact

Time (in ms) to read tree in open():

Workload \ Steps   1      2      3      4
Instruction        0.04   0.05   0.11   1.72
Research           0.05   0.05   0.04   0.04
Web                0.06   0.13   6.42   997.5
Windows            0.07   0.04   0.04   0.04

Time (in ms) to write tree in close():

Workload \ Steps   1      2      3      4
Instruction        0.20   0.28   0.32   0.84
Research           0.16   0.19   2.39   3.1
Web                0.16   0.24   4.82   579.14
Windows            0.16   0.50   5.34   3.17

SLIDE 14

In actu

Larger representation
Unless certification available for:
  DHCP bindings
  inode mappings
  Clock synchronization

SLIDE 15

Decentralized lineage

Proposed paradigm
Remote pointers replace branches
Metadata remains small:

Workload      Storage
Instruction   0.4 KB
Research      0.2 KB
Web           1 KB
Windows       0.2 KB
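A rough way to see why the cumulative paradigm grows quickly while the decentralized one stays flat is a toy model with uniform fan-in. This is only an idealization: the uniform `fan_in`, the reading of "steps" as tree depth, and the assumption that a record's input entries double as the remote pointers are not taken from the slides, and the model is not expected to reproduce the measured trace numbers.

```python
EXECUTOR_BYTES = 8    # 32-bit IPv4 address + 32-bit user ID
SIGNATURE_BYTES = 20  # 160 bits
FILE_BYTES = 12       # 32-bit IPv4 address + inode + time


def record_bytes(fan_in):
    """Size of one record: executor, signature, output, and fan_in inputs."""
    return EXECUTOR_BYTES + SIGNATURE_BYTES + FILE_BYTES * (1 + fan_in)


def cumulative_bytes(fan_in, steps):
    """Whole tree travels with the data: one record per node, `steps` levels deep."""
    records = sum(fan_in ** level for level in range(steps))
    return records * record_bytes(fan_in)


def decentralized_bytes(fan_in):
    """Only the root record is stored; its input entries point at remote nodes."""
    return record_bytes(fan_in)


# Hypothetical fan-in of 2 inputs per operation.
growth = [cumulative_bytes(2, steps) for steps in range(1, 6)]
flat = decentralized_bytes(2)
```

With a fan-in of 2, `growth` is 64, 192, 448, 960, and 1984 bytes for one to five steps, while `flat` stays at 64 bytes regardless of depth.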

SLIDE 16

Verifying lineage

Algorithm: CHECKLINEAGE(D)
  {E, S, O, I1, ..., In} ← GETROOT(D)
  OUTPUT(E)
  PE ← PKILOOKUP(E)
  if I1, ..., In = {}
    then
      Result ← VERIFY(PE, S, E, O)
      if Result = FALSE
        then CheckFailed
    else
      Result ← VERIFY(PE, S, E, O|I1|...|In)
      if Result = TRUE
        then for i ← 1 to n
               do CHECKLINEAGE(Ii)    ← Reliability drops
        else CheckFailed
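The recursive check above can be sketched in runnable form as follows. Several pieces are stand-ins, not the presenter's design: HMAC-SHA1 (which happens to yield 160 bits) replaces the public-key signature, a dictionary `keys` plays the role of PKILOOKUP, a dictionary `store` plays the role of GETROOT, and a missing record is treated as a failed check. The two branches of the pseudocode collapse into one because the signed message with no inputs is simply E|O.

```python
import hashlib
import hmac
from dataclasses import dataclass


class CheckFailed(Exception):
    pass


@dataclass(frozen=True)
class Record:
    executor: str
    signature: bytes
    output: str
    inputs: tuple


def _message(executor, output, inputs):
    return "|".join((executor, output) + tuple(inputs)).encode()


def sign(key, executor, output, inputs):
    # Stand-in for SIGN_{K_E}: a keyed hash, NOT a real public-key signature.
    return hmac.new(key, _message(executor, output, inputs), hashlib.sha1).digest()


def make_record(key, executor, output, inputs=()):
    inputs = tuple(inputs)
    return Record(executor, sign(key, executor, output, inputs), output, inputs)


def check_lineage(d, store, keys):
    """Return the executors met while verifying d's whole tree, or raise CheckFailed."""
    record = store.get(d)                       # GETROOT(D)
    if record is None or record.output != d:    # producer output must match consumer input
        raise CheckFailed(f"no matching record for {d}")
    key = keys.get(record.executor)             # PKILOOKUP(E)
    if key is None:
        raise CheckFailed(f"unknown executor {record.executor}")
    expected = sign(key, record.executor, record.output, record.inputs)
    if not hmac.compare_digest(record.signature, expected):   # VERIFY
        raise CheckFailed(f"bad signature on {d}")
    executors = [record.executor]               # OUTPUT(E)
    for i in record.inputs:                     # each lookup may fail remotely
        executors.extend(check_lineage(i, store, keys))
    return executors


keys = {"alice": b"key-a", "bob": b"key-b"}
store = {
    "f1": make_record(keys["alice"], "alice", "f1"),
    "f2": make_record(keys["bob"], "bob", "f2"),
    "f3": make_record(keys["alice"], "alice", "f3", ("f1", "f2")),
}
verified = check_lineage("f3", store, keys)

# Tampering with an ancestor's inputs invalidates its signature.
tampered = dict(store)
tampered["f2"] = Record("bob", store["f2"].signature, "f2", ("f0",))
try:
    check_lineage("f3", tampered, keys)
    tamper_detected = False
except CheckFailed:
    tamper_detected = True
```

Each recursive call corresponds to fetching one more record, which is where the slide notes that reliability drops: the deeper the tree, the more lookups have to succeed.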

SLIDE 17

Increasing availability

Traditional strategy:
  Form virtual topology
  Flood neighbors
Inefficient use of storage

SLIDE 18

Bonsai

Prune lineage tree

[Figure: lineage tree annotated "Pruned levels λ". Legend: Stored locally; Pruned (must be recovered from remote node).]
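One simple way to picture pruning is to split a lineage tree by depth: records near the root stay with the data, and deeper ones have to be fetched from remote nodes. The sketch below assumes a hypothetical parameter `keep_levels` (the number of levels retained locally; how it relates to the slide's λ is not stated here) and a hypothetical `inputs_of` mapping from each file to its inputs.

```python
def split_by_depth(root, inputs_of, keep_levels):
    """Partition a lineage tree into locally stored and remotely held nodes.

    inputs_of maps a node to the nodes it was derived from (hypothetical
    representation). Nodes at depth < keep_levels are kept locally.
    """
    depth_of = {}
    frontier = [(root, 0)]
    while frontier:
        node, depth = frontier.pop()
        # If a node is reachable along several paths, keep its shallowest depth.
        if node in depth_of and depth_of[node] <= depth:
            continue
        depth_of[node] = depth
        frontier.extend((child, depth + 1) for child in inputs_of.get(node, ()))
    local = sorted(n for n, d in depth_of.items() if d < keep_levels)
    remote = sorted(n for n, d in depth_of.items() if d >= keep_levels)
    return local, remote


# f5 was derived from f3 and f4; f3 was derived from f1 and f2.
inputs_of = {"f5": ["f3", "f4"], "f3": ["f1", "f2"]}
local, remote = split_by_depth("f5", inputs_of, keep_levels=2)
```

Here `local` is `f3`, `f4`, `f5` and `remote` is `f1`, `f2`. Raising `keep_levels` stores more records with the data and leaves fewer to recover from other nodes.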

SLIDE 19

Simplest pruning

Trade verification reliability for storage
