Toward Eidetic Distributed File Systems



  1. Toward Eidetic Distributed File Systems • Xianzheng Dou, Jason Flinn, Peter M. Chen

  2. Rich file system features • Modern file systems store more than just data – Versioning: retention of past state – Provenance-aware: connections between file data • Problem: – High costs for providing these rich features Xianzheng Dou 1

  3. Versioning FS tradeoffs • Frequency of versioning – Less frequent: lower storage cost – More frequent: higher storage cost • Systems shown along this spectrum: Ext4; Versionfs, WAFL; Elephant FS; CVFS, Wayback • Any past user-level state? • Any past file system state and any transient state

  10. Provenance FS tradeoffs • Details of connection information – Lower granularity: lower storage cost – Higher granularity: higher storage cost • Systems shown along this spectrum: Ext4; Connections; PASS • Complete byte-level provenance?

  15. Background: eidetic systems [OSDI’14] • Recall any past user-level state – By pervasive deterministic record and replay • Provenance at the byte granularity – Intra-process lineage: dynamic information tracking – Inter-process lineage: data flow dependency graph • Figure: record and replay using logs of non-deterministic events
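The record-and-replay idea on the background slide can be illustrated with a toy sketch. The slides do not describe the mechanism at this level, so everything here is an assumption made for illustration: the names `Recorder`, `Replayer`, `nondet`, and `workload` are hypothetical, and the example works on two user-level values rather than on the operating-system events a real eidetic system would log. It shows only the core property: if every non-deterministic input is logged, rerunning the same deterministic code against the log reproduces the same state.

```python
import random
import time


class Recorder:
    """Runs live and appends every non-deterministic value to a log."""

    def __init__(self):
        self.log = []

    def nondet(self, source):
        value = source()          # consult the live, non-deterministic source
        self.log.append(value)    # remember what it returned
        return value


class Replayer:
    """Returns logged values in order instead of consulting the live source."""

    def __init__(self, log):
        self._values = iter(log)

    def nondet(self, source):
        # The live source is ignored on purpose: replay must see exactly
        # the values observed during recording.
        return next(self._values)


def workload(env):
    """Deterministic computation over two non-deterministic inputs."""
    stamp = env.nondet(time.time)
    noise = env.nondet(random.random)
    return f"{stamp:.6f}:{noise:.6f}"


recorder = Recorder()
recorded_output = workload(recorder)
replayed_output = workload(Replayer(recorder.log))
```

In this sketch the log holds two floats, while the output could be arbitrarily large; that asymmetry is what the later slides rely on when they treat logs as the unit of storage.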

  17. A clean-sheet design of FS • Eidetic systems prototype – Graft eidetic support onto an existing FS – Consider only local storage • An eidetic distributed file system – A small number of personal devices + cloud servers • New design choices – Fundamental unit of persistent storage – File transfer

  18. Traditional distributed FS

  22. Eidetic distributed file systems

  24. Fundamental unit • What is the fundamental unit of persistent storage? – Fundamental unit: logs of non-determinism – Files are only considered as caches • Figure label: Replay

  30. File persistency • When is a file considered persistent on the server? – As long as the logs generating the data are persistent
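The two design points above (logs are the fundamental unit, files are only caches, and a file counts as persistent once its generating logs are persistent) can be sketched as a small data structure. This is a minimal sketch under stated assumptions, not the authors' implementation: `EideticStore`, `persist_log`, `toy_replay`, and the version name `report@v1` are hypothetical, and a "log" is modeled as a list of strings that a stand-in replay function turns back into file bytes.

```python
from typing import Callable, Dict, List

Log = List[str]  # hypothetical stand-in for a log of non-deterministic events


class EideticStore:
    """Logs are the source of truth; file contents are an evictable cache."""

    def __init__(self, replay: Callable[[Log], bytes]):
        self._replay = replay
        self._logs: Dict[str, Log] = {}      # persistent state
        self._cache: Dict[str, bytes] = {}   # regenerable state
        self.replays = 0                     # how often data was regenerated

    def persist_log(self, version: str, log: Log) -> None:
        self._logs[version] = list(log)

    def is_persistent(self, version: str) -> bool:
        # Persistence depends on the log, not on cached file data.
        return version in self._logs

    def evict(self, version: str) -> None:
        self._cache.pop(version, None)

    def read(self, version: str) -> bytes:
        if version not in self._cache:
            # Cache miss: regenerate the file contents from the log.
            self._cache[version] = self._replay(self._logs[version])
            self.replays += 1
        return self._cache[version]


def toy_replay(log: Log) -> bytes:
    """Stand-in for deterministic replay: rebuilds bytes from logged inputs."""
    return "".join(log).upper().encode()


store = EideticStore(toy_replay)
store.persist_log("report@v1", ["hello ", "world"])
first = store.read("report@v1")
store.evict("report@v1")             # dropping the cached file loses nothing
second = store.read("report@v1")     # regenerated from the log
```

Evicting the cached bytes changes only how long the next read takes; the version remains persistent because its log is still stored.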

  33. Updating server cache • Should the server cache the file version? – Probability of future access – Costs for regeneration
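The slide names two inputs to the caching decision but gives no formula. One plausible way to combine them, offered purely as an assumption, is an expected-cost comparison: cache a version when the expected cost of regenerating it later exceeds the cost of keeping it. The function name `should_cache`, the `storage_cost` parameter, and the shared cost unit are all hypothetical.

```python
def should_cache(p_future_access: float,
                 regeneration_cost: float,
                 storage_cost: float) -> bool:
    """Return True when caching looks cheaper than regenerating on demand.

    All costs are in the same (hypothetical) unit, for example seconds of
    server time or dollars. The rule is an assumed heuristic, not one
    stated in the slides.
    """
    if not 0.0 <= p_future_access <= 1.0:
        raise ValueError("p_future_access must be a probability")
    expected_regeneration = p_future_access * regeneration_cost
    return expected_regeneration > storage_cost


# A version that is likely to be read and expensive to replay: keep it.
hot_and_expensive = should_cache(0.9, 120.0, 5.0)

# A version that is rarely read and cheap to replay: regenerate on demand.
cold_and_cheap = should_cache(0.01, 2.0, 5.0)
```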

  35. File transfer methods • How are files transferred to the server? – By value (file data) – Or by replay • Compare computation costs with communication costs
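The comparison of computation and communication costs can be made concrete with a small estimate. The sketch below assumes that elapsed time is the only cost and that it is the sum of transfer time and replay time; the names `Update` and `choose_upload` and all numbers are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Update:
    file_bytes: int        # size of the new file data
    log_bytes: int         # size of the log that generated it
    replay_seconds: float  # estimated time for the server to replay the log


def choose_upload(update: Update, bandwidth_bytes_per_s: float) -> str:
    """Pick the transfer method with the lower estimated elapsed time."""
    by_value = update.file_bytes / bandwidth_bytes_per_s
    by_replay = (update.log_bytes / bandwidth_bytes_per_s
                 + update.replay_seconds)
    return "by_value" if by_value <= by_replay else "by_replay"


# Large output produced from a small log: shipping the log and replaying wins.
image_edit = Update(file_bytes=50_000_000, log_bytes=20_000, replay_seconds=3.0)

# Tiny output: shipping the data itself wins.
text_edit = Update(file_bytes=4_000, log_bytes=20_000, replay_seconds=1.0)

image_choice = choose_upload(image_edit, bandwidth_bytes_per_s=1_000_000)
text_choice = choose_upload(text_edit, bandwidth_bytes_per_s=1_000_000)
```

With these illustrative numbers, the image edit is sent by replay (about 3 s instead of 50 s) and the text edit is sent by value.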

  40. Read path • How should a client read a particular version?

  42. Available transfer methods • How should a client read a particular version? – By value – By replay on the client – By replay on the server – From peers

  50. Choosing the right method • How should a client read a particular version? • Server has the most complete knowledge • Metrics – User waiting time – Monetary cost – Client energy consumption
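The slides list four read methods and three metrics but do not say how the metrics are combined. A weighted sum is one simple possibility, shown here only as an assumption. The names `Estimate` and `choose_read_method`, the method labels, and every number are hypothetical; in the design described above, the server would be the party holding these estimates.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Estimate:
    wait_seconds: float   # user waiting time
    dollars: float        # monetary cost
    client_joules: float  # client energy consumption


def choose_read_method(estimates: Dict[str, Estimate],
                       weights: Estimate) -> str:
    """Return the method with the lowest weighted cost (assumed policy)."""

    def score(e: Estimate) -> float:
        return (weights.wait_seconds * e.wait_seconds
                + weights.dollars * e.dollars
                + weights.client_joules * e.client_joules)

    return min(estimates, key=lambda name: score(estimates[name]))


# Illustrative estimates for one file version.
estimates = {
    "by_value": Estimate(40.0, 0.004, 30.0),
    "replay_on_client": Estimate(6.0, 0.0001, 90.0),
    "replay_on_server": Estimate(8.0, 0.002, 5.0),
    "from_peer": Estimate(12.0, 0.0, 20.0),
}

latency_first = Estimate(1.0, 0.0, 0.0)   # only waiting time matters
battery_first = Estimate(0.0, 0.0, 1.0)   # only client energy matters

fastest = choose_read_method(estimates, latency_first)
gentlest = choose_read_method(estimates, battery_first)
```

Changing the weights changes the answer: with these numbers, a latency-focused policy picks replay on the client, while a battery-focused policy picks replay on the server.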

  51. Feasibility • Eidetic system overheads – Record 4 years of workstation data on a 4 TB hard disk – Under 8% performance overhead on most benchmarks • Applications (log size vs. diff size) – Logs are smaller: image/audio editing, LaTeX, make, slides editing – Diffs are smaller: text editing • File sharing – Most files are not shared
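As a rough reading of the storage figure above, the implied average logging rate can be worked out directly. This assumes decimal units (1 TB = 10^12 bytes), 365-day years, and that the whole disk is available for recorded data, so it is an upper bound implied by the slide rather than a measured rate.

```latex
\[
\frac{4\ \text{TB}}{4\ \text{years}} = 1\ \text{TB/year},
\qquad
\frac{10^{12}\ \text{bytes}}{365\ \text{days}}
\approx 2.74 \times 10^{9}\ \text{bytes/day}
\approx 2.74\ \text{GB/day}.
\]
```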

  52. Conclusions • A new point in the design space of – Versioning file systems – Provenance-aware file systems • Hypothesis – More effective in versioning and provenance – Achieving reasonable overheads • Under implementation

  53. Thank you!
