Toward Eidetic Distributed File Systems: Xianzheng Dou, Jason Flinn, Peter M. Chen



slide-1
SLIDE 1

Toward Eidetic Distributed File Systems

Xianzheng Dou, Jason Flinn, Peter M. Chen

slide-2
SLIDE 2

Rich file system features

  • Modern file systems store more than just data

– Versioning: retention of past state
– Provenance-aware: connections between file data

  • Problem:

– High costs for providing these rich features

Xianzheng Dou 1

slide-3
SLIDES 3-9

Versioning FS tradeoffs

  • Frequency of versioning

– Less frequent: lower storage cost
– More frequent: higher storage cost

  • Systems placed on this spectrum, in the order the slide builds add them:

– Ext4
– Versionfs, WAFL
– Elephant FS
– CVFS, Wayback

  • Any past user-level state?
  • Any past file system state and any transient state
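The frequency tradeoff can be made concrete with a toy model. This sketch is not from the talk: `stored_bytes` and the write trace are hypothetical, and it assumes every retained version is kept as a full copy with no deltas or deduplication.

```python
def stored_bytes(write_sizes, every_n_writes):
    """Bytes kept if a full copy of the file is retained after every
    `every_n_writes` writes (assumption: append-only file, no deltas)."""
    size = 0   # current file size after each append
    kept = 0   # total bytes across all retained versions
    for i, n in enumerate(write_sizes, start=1):
        size += n
        if i % every_n_writes == 0:
            kept += size
    return kept

trace = [100] * 8                    # eight hypothetical 100-byte appends
per_write = stored_bytes(trace, 1)   # version on every write
per_four = stored_bytes(trace, 4)    # version on every fourth write
```

Under these assumptions, versioning on every write keeps three times as many bytes as versioning on every fourth write, which is the cost pressure the spectrum describes.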

slide-10
SLIDES 10-14

Provenance FS tradeoffs

  • Details of connection information

– Lower granularity: lower storage cost
– Higher granularity: higher storage cost

  • Systems placed on this spectrum, in the order the slide builds add them:

– Ext4
– Connections
– PASS

  • Complete byte-level provenance?
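To illustrate why finer granularity costs more, the sketch below records provenance for a simple concatenation at two granularities. It is an illustration only: `concat_with_provenance` and the file names are hypothetical and do not describe how PASS or the authors' system work.

```python
def concat_with_provenance(inputs):
    """Concatenate named byte strings and record provenance two ways.

    `inputs` maps a hypothetical file name to its contents."""
    out = bytearray()
    file_level = set()   # coarse: names of contributing files only
    byte_level = []      # fine: one (source_name, source_offset) per output byte
    for name, data in inputs.items():
        file_level.add(name)
        for offset, b in enumerate(data):
            out.append(b)
            byte_level.append((name, offset))
    return bytes(out), file_level, byte_level

out, coarse, fine = concat_with_provenance({"a.txt": b"abc", "b.txt": b"de"})
```

The file-level record grows with the number of inputs, while the byte-level record grows with the size of the output.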

slide-15
SLIDES 15-16

Background: eidetic systems [OSDI’14]

  • Recall any past user-level state

– By pervasive deterministic record and replay

  • Provenance at the byte granularity

– Intra-process lineage: dynamic information tracking
– Inter-process lineage: data flow dependency graph

[Figure: record and replay, driven by logs of non-deterministic events]
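A minimal sketch of deterministic record and replay, assuming the only source of non-determinism is a single input function. The names `run`, `record`, and `replay` are hypothetical, and `random.randrange` stands in for a real non-deterministic event such as a system call result.

```python
import random

def run(get_input, steps=5):
    """A deterministic computation whose only non-determinism is get_input()."""
    state = 0
    for _ in range(steps):
        state = state * 31 + get_input()
    return state

def record(steps=5):
    """Run once, logging every non-deterministic value as it is consumed."""
    log = []
    def source():
        v = random.randrange(1000)   # stand-in for a non-deterministic event
        log.append(v)
        return v
    return run(source, steps), log

def replay(log):
    """Re-run the same computation, feeding it the logged values instead."""
    it = iter(log)
    return run(lambda: next(it), steps=len(log))

original_state, log = record()
replayed_state = replay(log)
```

Because everything else in `run` is deterministic, replaying the log reproduces the recorded state exactly, so only the log needs to be stored.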

slide-17
SLIDE 17

A clean-sheet design of FS

  • Eidetic systems prototype

– Graft eidetic support onto an existing FS
– Consider only local storage

  • An eidetic distributed file system

– A small number of personal devices + cloud servers

  • New design choices

– Fundamental unit of persistent storage
– File transfer

slide-18
SLIDES 18-21

Traditional distributed FS

[Figure: traditional distributed FS, shown in four builds]

slide-22
SLIDES 22-23

Eidetic distributed file systems

[Figure: eidetic distributed file system, shown in two builds]

slide-24
SLIDES 24-29

Fundamental unit

  • What is the fundamental unit of persistent storage?

[Figure: shown in six builds; one build is labeled "Replay"]

– Fundamental unit: logs of non-determinism
– Files are only considered as caches
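The idea that logs are the durable unit and files are caches can be sketched as a toy store. This is an assumption-laden illustration: `EideticStore` is hypothetical, and logged appends stand in for logged non-deterministic events (a real system would re-execute the recorded program rather than re-apply writes).

```python
class EideticStore:
    """Toy store: the log is the durable truth; file contents are a cache."""

    def __init__(self):
        self.log = []     # durable: (path, appended bytes), in order
        self.cache = {}   # droppable: (path, version) -> contents

    def append(self, path, data):
        """Log one append and return the new version number of `path`."""
        self.log.append((path, data))
        return sum(1 for p, _ in self.log if p == path)

    def read(self, path, version):
        key = (path, version)
        if key not in self.cache:              # miss: regenerate by replay
            self.cache[key] = self._replay(path, version)
        return self.cache[key]

    def _replay(self, path, version):
        """Rebuild `path` as it was after its first `version` appends."""
        out, seen = b"", 0
        for p, data in self.log:
            if p == path:
                if seen == version:
                    break
                out += data
                seen += 1
        return out

store = EideticStore()
v1 = store.append("notes.txt", b"hello ")
v2 = store.append("notes.txt", b"world")
store.cache.clear()   # dropping cached files loses nothing
```

Any version can still be read after the cache is cleared, because it is regenerated from the log.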

slide-30
SLIDES 30-32

File persistency

  • When is a file considered persistent on the server?

– As long as logs generating the data are persistent

slide-33
SLIDES 33-34

Updating server cache

  • Should the server cache the file version?

– Probability of future access
– Costs for regeneration
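One way to combine the two factors is an expected-cost rule. The sketch below is an assumption, not the authors' policy: `should_cache` is hypothetical, and the `storage_cost` term is added here so that the two factors have something to be weighed against.

```python
def should_cache(p_future_access, regen_cost, storage_cost):
    """Cache a file version when the expected cost of regenerating it on
    demand exceeds the cost of keeping it. All inputs are hypothetical
    estimates; the two costs must share the same unit."""
    if not 0.0 <= p_future_access <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return p_future_access * regen_cost > storage_cost

hot_and_expensive = should_cache(0.9, regen_cost=50.0, storage_cost=5.0)
cold_and_cheap = should_cache(0.01, regen_cost=2.0, storage_cost=5.0)
```

A version that is likely to be read and slow to regenerate is cached; a version that is rarely read and cheap to replay is not.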

slide-35
SLIDES 35-39

File transfer methods

  • How are files transferred to the server?

Compare computation costs with communication costs

  • by value (file data)
  • or by replay
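The comparison of computation and communication costs can be sketched as a time estimate for each method. `transfer_plan` and all numbers are hypothetical, and time is used as the only cost for simplicity.

```python
def transfer_plan(file_bytes, log_bytes, bandwidth_bps, replay_seconds):
    """Pick the faster way to get a new file version to the server.

    by value : ship the file data
    by replay: ship the log, then recompute the file on the server
    """
    by_value = file_bytes * 8 / bandwidth_bps
    by_replay = log_bytes * 8 / bandwidth_bps + replay_seconds
    if by_value <= by_replay:
        return "by value", by_value
    return "by replay", by_replay

# Hypothetical case: 100 MB output, 1 MB log, 8 Mbit/s uplink, 10 s of replay.
method, seconds = transfer_plan(100e6, 1e6, 8e6, 10.0)
```

With these numbers, shipping the data takes 100 s while shipping the log and replaying takes 11 s, so replay wins; for a small file whose log is no smaller than its data, sending by value wins.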
slide-40
SLIDES 40-41

Read path

  • How should a client read a particular version?

slide-42
SLIDES 42-49

Available transfer methods

  • How should a client read a particular version?

– By value
– By replay on the client
– By replay on the server
– From peers

slide-50
SLIDE 50

Choosing the right method

  • How should a client read a particular version?
  • Server has the most complete knowledge
  • Metrics

– User waiting time
– Monetary cost
– Client energy consumption
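A minimal sketch of how a server might choose among the transfer methods using the three metrics. The weighted-sum rule, the `Estimate` type, and every number are assumptions for illustration; the talk does not specify how the metrics are combined.

```python
from dataclasses import dataclass

@dataclass
class Estimate:
    wait_s: float    # user waiting time
    dollars: float   # monetary cost
    joules: float    # client energy consumption

def choose_method(estimates, weights=(1.0, 1.0, 1.0)):
    """Return the name of the method with the lowest weighted cost."""
    w_wait, w_money, w_energy = weights
    def cost(e):
        return w_wait * e.wait_s + w_money * e.dollars + w_energy * e.joules
    return min(estimates, key=lambda name: cost(estimates[name]))

# Hypothetical estimates for one read request.
estimates = {
    "by value": Estimate(wait_s=12.0, dollars=0.02, joules=30.0),
    "by replay on the client": Estimate(wait_s=4.0, dollars=0.00, joules=80.0),
    "by replay on the server": Estimate(wait_s=6.0, dollars=0.05, joules=10.0),
    "from peers": Estimate(wait_s=9.0, dollars=0.00, joules=25.0),
}
fastest = choose_method(estimates, weights=(1.0, 0.0, 0.0))
greenest = choose_method(estimates, weights=(0.0, 0.0, 1.0))
```

Changing the weights changes the answer: optimizing only for waiting time favors replay on the client here, while optimizing only for client energy favors replay on the server.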

slide-51
SLIDE 51

Feasibility

  • Eidetic system overheads

– Record 4 years of workstation data on a 4 TB hard disk
– Under 8% performance overhead on most benchmarks

  • Applications (log size vs. diff size)

– Logs are smaller: image/audio editing, LaTeX, make, slides editing
– Diffs are smaller: text editing

  • File sharing

– Most files are not shared
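The storage figure implies an average daily budget, which is simple arithmetic. This sketch assumes decimal terabytes and ignores leap days; it is a back-of-the-envelope calculation, not a measured log rate.

```python
DISK_BYTES = 4e12    # 4 TB, decimal terabytes assumed
DAYS = 4 * 365       # 4 years, ignoring leap days

bytes_per_day = DISK_BYTES / DAYS
gb_per_day = bytes_per_day / 1e9
print(f"{gb_per_day:.2f} GB/day")
```

Under these assumptions the budget works out to roughly 2.7 GB of recorded data per day.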

slide-52
SLIDE 52

Conclusions

  • A new point in the design space of

– Versioning file systems
– Provenance-aware file systems

  • Hypothesis

– More effective in versioning and provenance
– Achieving reasonable overheads

  • Under implementation

slide-53
SLIDE 53

Thank you!