
Bonsai: Balanced Lineage Authentication Ashish Gehani - PowerPoint PPT Presentation



  1. Bonsai: Balanced Lineage Authentication Ashish Gehani Bonsai:Balanced Lineage Authentication – p. 1/19

  2. What is data lineage?
     [Figure: (a) primitive operation, in which Input 1 ... Input n feed an Operation that produces an Output; (b) compound operation tree]

  3. Why track lineage?
     - GIS: data origins
     - Material science: component pedigree
     - Biology: experiment reproducibility
     - Grid: debugging

  4. Why certify lineage?
     - Reproduction is costly
       - PDB: $200,000 / protein
       - Fermilab Collision Detector: 1 month, multiple TB / datum
     - Reliability
     - Accreditation
     - Ownership
     - Auditability

  5. What’s been done?
     - LFS: inputs, outputs, options → SQL
     - PASS: runtime environs → Berkeley DB
     - Trio: tracks data accuracy using lineage
     - CMCS: chemistry toolkit → WebDAV
     - Chimera: workflow scripts
     - myGrid: biology Grid workflows
     - Vesta: incremental builds
     - ESSW: Earth science data management

  6. What’s the problem?
     - Single trust domain: Chimera, myGrid, Vesta, ESSW
     - Centralized service: LFS, PASS, Trio, CMCS
     - No assurance: unsigned, incomplete

  7. What granularity? What to audit?
     - Processes, system calls, file system?
     - Fine grain → high overhead
     - Coarse grain → false positives
     - File system:
       - Pro: intermediate complexity
       - Pro: captures most persistent change
       - Con: can’t track data from network, keyboard, pipes, memory maps

  8. Certification approach?
     - No global TCB
     - Require commitments
     - Check agreement of: producer output, consumer input
     - Trusted user in subtree / path → tampering detectable
     [Figure labels: Producer, Output, Input, Consumer; Input = Output]

  9. Metadata generation
     - Intercede on calls for: exec(), fork(), exit(), open(), close(), read(), write()
     - Maintain process table entries for: accessed, modified files
     [Figure: process execution timeline with open() and close() calls on File 1, File 2, and File 3, marked Read or Write; labels: Owner, Process, Time]
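The bookkeeping on slide 9 can be sketched as a small in-memory process table. This is only an illustration under stated assumptions: the class and method names (`ProcessTable`, `on_open`, and so on) are hypothetical, the hooks are called by hand rather than by real system call interception, and the lineage record is emitted at exit for simplicity, whereas the slides describe writing lineage in close().

```python
from dataclasses import dataclass, field


@dataclass
class ProcessEntry:
    """Hypothetical per-process entry: owner plus files read and written."""
    owner: int
    open_files: dict = field(default_factory=dict)  # fd -> path
    accessed: set = field(default_factory=set)      # paths that were read
    modified: set = field(default_factory=set)      # paths that were written


class ProcessTable:
    """Toy stand-in for the table maintained while intercepting calls."""

    def __init__(self):
        self.entries = {}

    def on_exec(self, pid, owner):
        self.entries[pid] = ProcessEntry(owner=owner)

    def on_fork(self, parent_pid, child_pid):
        # Assumption: the child keeps the parent's owner and open descriptors.
        parent = self.entries[parent_pid]
        self.entries[child_pid] = ProcessEntry(
            owner=parent.owner, open_files=dict(parent.open_files)
        )

    def on_open(self, pid, fd, path):
        self.entries[pid].open_files[fd] = path

    def on_read(self, pid, fd):
        entry = self.entries[pid]
        entry.accessed.add(entry.open_files[fd])

    def on_write(self, pid, fd):
        entry = self.entries[pid]
        entry.modified.add(entry.open_files[fd])

    def on_close(self, pid, fd):
        self.entries[pid].open_files.pop(fd, None)

    def on_exit(self, pid):
        # Emit one record: who ran the process, what it read, what it wrote.
        entry = self.entries.pop(pid)
        return {
            "owner": entry.owner,
            "inputs": sorted(entry.accessed),
            "outputs": sorted(entry.modified),
        }
```

Calling the hooks in the order a process would trigger them (exec, open, read or write, close, exit) yields one record whose inputs are the files read and whose outputs are the files written.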

  10. Minimal representation
      - Record fields: Executor, Signature, Output, Input 1, ..., Input n
      - Executor: 32 bit IPv4 address, 32 bit user ID
      - Signature: 160 bits, $\mathrm{SIGN}_{K_E}(E, O, I_1, \ldots, I_n)$
      - Input / Output File: 32 bit IPv4 address (net address), 32 bit inode, 32 bit time (seconds since 1/1/70)
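The field widths on slide 10 imply a fixed byte layout, which can be sketched with Python's `struct` module. The function names, the network byte order, and the field ordering within the record are assumptions made for illustration; the slide gives only the widths. The signature is treated as an opaque 20-byte (160-bit) value.

```python
import socket
import struct

EXECUTOR_FMT = "!4sI"   # 32-bit IPv4 address + 32-bit user ID = 8 bytes
FILE_FMT = "!4sII"      # 32-bit IPv4 address + 32-bit inode + 32-bit time = 12 bytes
SIGNATURE_BYTES = 20    # 160 bits


def pack_executor(ip, uid):
    """Pack an executor identity (host address, user ID) into 8 bytes."""
    return struct.pack(EXECUTOR_FMT, socket.inet_aton(ip), uid)


def pack_file_id(ip, inode, seconds):
    """Pack a file identity (host address, inode, seconds since 1/1/70) into 12 bytes."""
    return struct.pack(FILE_FMT, socket.inet_aton(ip), inode, seconds)


def record_size(n_inputs):
    """Bytes for one record: executor + signature + output + n inputs."""
    return (
        struct.calcsize(EXECUTOR_FMT)
        + SIGNATURE_BYTES
        + struct.calcsize(FILE_FMT) * (1 + n_inputs)
    )


def pack_record(executor, signature, output, inputs):
    """Concatenate already-packed fields into one lineage record."""
    if len(signature) != SIGNATURE_BYTES:
        raise ValueError("signature must be 160 bits (20 bytes)")
    return executor + signature + output + b"".join(inputs)
```

Under this layout a record with n inputs occupies 8 + 20 + 12(1 + n) bytes, so the size of a single record depends only on the number of inputs.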

  11. Workload
      - Berkeley NOW file system traces
      - Month of activity
      - Access patterns stable
      - Instruction: 20 workstations in teaching lab
      - Research: 13 desktops of research group
      - Web: 1 web server running Postgres
      - Windows: 8 Windows desktops

  12. Cumulative lineage
      - Current paradigm
      - Entire tree migrates with data
      - Metadata grows rapidly:

      | Workload    | 1 step | 2 steps | 3 steps | 4 steps | 5 steps |
      |-------------|--------|---------|---------|---------|---------|
      | Instruction | 0.4 KB | 3 KB    | 31 KB   | 253 KB  | 2 MB    |
      | Research    | 0.2 KB | 0.8 KB  | 2 KB    | 8 KB    | 29 KB   |
      | Web         | 1 KB   | 39 KB   | 1 MB    | 29 MB   | 813 MB  |
      | Windows     | 0.2 KB | 0.8 KB  | 2 KB    | 9 KB    | 30 KB   |
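To make "grows rapidly" on slide 12 concrete, the step-to-step growth factors can be computed from the slide's own figures. This is a rough sketch: the slide values are rounded, and 1 MB = 1024 KB is an assumption, so the ratios are approximate. For the Web workload they come out at roughly 26x to 39x per step.

```python
# Metadata sizes from slide 12, converted to KB (assuming 1 MB = 1024 KB).
SIZES_KB = {
    "Instruction": [0.4, 3, 31, 253, 2 * 1024],
    "Research": [0.2, 0.8, 2, 8, 29],
    "Web": [1, 39, 1 * 1024, 29 * 1024, 813 * 1024],
    "Windows": [0.2, 0.8, 2, 9, 30],
}


def growth_ratios(sizes):
    """Return the factor by which metadata grows from each step to the next."""
    return [later / earlier for earlier, later in zip(sizes, sizes[1:])]


ratios = {name: growth_ratios(sizes) for name, sizes in SIZES_KB.items()}

for name, values in ratios.items():
    print(name, [round(v, 1) for v in values])
```

Every ratio is greater than 1, so the cumulative tree grows multiplicatively with each processing step rather than by a fixed increment.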

  13. Operational impact
      Time (in ms) to read tree in open():

      | Workload    | 1 step | 2 steps | 3 steps | 4 steps |
      |-------------|--------|---------|---------|---------|
      | Instruction | 0.04   | 0.05    | 0.11    | 1.72    |
      | Research    | 0.05   | 0.05    | 0.04    | 0.04    |
      | Web         | 0.06   | 0.13    | 6.42    | 997.5   |
      | Windows     | 0.07   | 0.04    | 0.04    | 0.04    |

      Time (in ms) to write tree in close():

      | Workload    | 1 step | 2 steps | 3 steps | 4 steps |
      |-------------|--------|---------|---------|---------|
      | Instruction | 0.20   | 0.28    | 0.32    | 0.84    |
      | Research    | 0.16   | 0.19    | 2.39    | 3.1     |
      | Web         | 0.16   | 0.24    | 4.82    | 579.14  |
      | Windows     | 0.16   | 0.50    | 5.34    | 3.17    |

  14. In actu
      - Larger representation
      - Unless certification available for: DHCP bindings, inode mappings, clock synchronization

  15. Decentralized lineage
      - Proposed paradigm
      - Remote pointers replace branches
      - Metadata remains small:

      | Workload    | Storage |
      |-------------|---------|
      | Instruction | 0.4 KB  |
      | Research    | 0.2 KB  |
      | Web         | 1 KB    |
      | Windows     | 0.2 KB  |

  16. Verifying lineage
      Algorithm: CHECKLINEAGE(D)
        {E, S, O, I_1, ..., I_n} ← GETROOT(D)
        OUTPUT(E)
        P_E ← PKILOOKUP(E)
        if I_1, ..., I_n = {}
          then Result ← VERIFY(P_E, S, E, O)
               if Result = FALSE then CheckFailed
          else Result ← VERIFY(P_E, S, E, O | I_1 | ... | I_n)
               if Result = TRUE
                 then for i ← 1 to n do CHECKLINEAGE(I_i)   ← Reliability drops
                 else CheckFailed
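The recursive check on slide 16 can be sketched in Python as follows. Several things here are stand-ins, not the presentation's method: HMAC-SHA1 with a shared per-executor key replaces the public-key signature and PKI lookup (it happens to produce 160 bits, but it is symmetric), a plain dictionary replaces GETROOT and any remote fetch, and all names (`sign`, `check_lineage`, `store`, `keys`) are hypothetical. The two branches of the slide's algorithm collapse into one because signing an empty input list covers the leaf case.

```python
import hashlib
import hmac


def sign(key, executor, output, inputs):
    """Stand-in for SIGN: HMAC-SHA1 over executor, output, and inputs."""
    message = "\x00".join([executor, output, *inputs]).encode()
    return hmac.new(key, message, hashlib.sha1).digest()


def check_lineage(file_id, store, keys, executors=None):
    """Verify the record for file_id, then recurse into each of its inputs.

    store maps file_id -> (executor, signature, output, inputs);
    keys maps executor -> key (a stand-in for the PKI lookup).
    Returns the executors encountered, mirroring OUTPUT(E) on the slide.
    Raises ValueError when a signature does not verify (CheckFailed).
    """
    if executors is None:
        executors = []
    if file_id not in store:
        # A record that cannot be fetched makes the lineage unverifiable.
        raise LookupError(f"no lineage record for {file_id}")
    executor, signature, output, inputs = store[file_id]
    executors.append(executor)
    expected = sign(keys[executor], executor, output, inputs)
    if not hmac.compare_digest(expected, signature):
        raise ValueError(f"CheckFailed at {file_id}")
    for input_id in inputs:
        check_lineage(input_id, store, keys, executors)
    return executors
```

Each recursive call may need a record held elsewhere, which is where a missing or unreachable record surfaces as a `LookupError` in this sketch.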

  17. Increasing availability
      - Traditional strategy: form virtual topology, flood neighbors
      - Inefficient use of storage

  18. Bonsai
      - Prune lineage tree
      [Figure labels: λ levels stored locally; Pruned: must be recovered from remote node]
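One way to picture the pruning on slide 18 is a function that keeps the top λ levels of a lineage tree locally and replaces everything deeper with remote pointers. This is a sketch under assumptions: the `Node` and `RemotePointer` types and the `prune` and `count_local` helpers are hypothetical, and reading λ as "number of levels kept locally" is an interpretation of the figure labels, not a confirmed detail of the design.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A lineage record held locally, with its input subtrees."""
    file_id: str
    inputs: list = field(default_factory=list)


@dataclass
class RemotePointer:
    """Placeholder for a pruned subtree that must be fetched remotely."""
    file_id: str


def prune(node, levels):
    """Keep `levels` levels of the tree locally; replace the rest with pointers."""
    if levels <= 0:
        return RemotePointer(node.file_id)
    return Node(node.file_id, [prune(child, levels - 1) for child in node.inputs])


def count_local(node):
    """Count the records that remain stored locally after pruning."""
    if isinstance(node, RemotePointer):
        return 0
    return 1 + sum(count_local(child) for child in node.inputs)
```

A larger λ keeps more records local, so fewer remote fetches are needed during verification, at the cost of more storage per file.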

  19. Simplest pruning
      - Trade verification reliability for storage
