
CORFU: A SHARED LOG DESIGN FOR FLASH CLUSTERS


  1. RUNYU ZHENG CORFU: A SHARED LOG DESIGN FOR FLASH CLUSTERS

  4. Motivation - How to Agree on Total Order?
     ‣ Shared Log entries: E E C S 5 9 1 ! (slots numbered 0 to 9)
     ‣ What’s slot 1? It’s ‘E’
     ‣ What’s slot 5? It’s ‘9’

  5. Motivation - How to Build a Shared Log?
     ‣ Server + SSD
     ‣ Performance limited by the server’s bandwidth

  7. Motivation - How to Build a Shared Log?
     ‣ CORFU (figure labels: slot 0, slot 5)
     ‣ Clients communicate directly with the flash units
     ‣ Increased throughput

  11. Design - Client API
     ‣ append(entry b): append b to the log and get back its position l
     ‣ read(log position l): get the entry at position l
     ‣ trim(log position l): mark position l for garbage collection (GC in the figure)
     ‣ fill(log position l): mark position l as a hole (H in the figure)
     ‣ Example log: E E C S 5 9 1 (slots numbered 0 to 9)
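The four client calls listed on this slide can be illustrated with a minimal, single-process sketch. It is not the CORFU implementation: the class name `SharedLog` and the `TRIMMED` and `HOLE` markers are hypothetical, and the log lives in one in-memory dictionary rather than on flash units.

```python
from typing import Dict

TRIMMED = "GC"  # hypothetical marker for a trimmed position
HOLE = "H"      # hypothetical marker for a filled hole


class SharedLog:
    """Toy in-memory shared log (an illustration, not the CORFU code)."""

    def __init__(self) -> None:
        self._slots: Dict[int, str] = {}
        self._tail = 0  # next free log position

    def append(self, entry: str) -> int:
        """Store the entry at the tail and return its position."""
        pos = self._tail
        self._slots[pos] = entry
        self._tail += 1
        return pos

    def read(self, pos: int) -> str:
        """Return whatever is stored at pos; unwritten positions raise."""
        if pos not in self._slots:
            raise KeyError(f"position {pos} is unwritten")
        return self._slots[pos]

    def trim(self, pos: int) -> None:
        """Mark pos as garbage-collected."""
        self._slots[pos] = TRIMMED

    def fill(self, pos: int) -> None:
        """Mark an unwritten position as a hole."""
        if pos in self._slots:
            raise ValueError(f"position {pos} is already written")
        self._slots[pos] = HOLE
        self._tail = max(self._tail, pos + 1)


log = SharedLog()
positions = [log.append(ch) for ch in "EECS591!"]
slot_1 = log.read(1)  # 'E', as on the motivation slide
slot_5 = log.read(5)  # '9'
```

In this sketch a trimmed position simply reads back as the `GC` marker; a real system would more likely return an error for it.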

  17. Design - Architecture
     ‣ Controller (for every flash unit)
     ‣ Map log pos -> flash page (maintained by clients)
     ‣ Tail-finding mechanism
     ‣ Replication (a single pos maps to multiple flash units)

  20. Detail - Controller for Flash Unit
     ‣ Write-once semantics
        ‣ a slot that has not been trimmed can be written only once (figure: entry B written to one of flash pages 00 to 04)
     ‣ Seal => used when the map changes
        ‣ set the epoch number (e.g. epoch #1)
        ‣ reject requests that carry a smaller epoch
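A rough sketch of the controller behaviour on this slide, under stated assumptions: the names `FlashUnit`, `WriteOnceError`, and `StaleEpochError` are made up for illustration, every request carries the client's epoch, and a trimmed page rejects further writes (the slide does not spell out that last point).

```python
class WriteOnceError(Exception):
    """Raised when a page is written a second time (hypothetical name)."""


class StaleEpochError(Exception):
    """Raised when a request carries an epoch older than the unit's."""


class FlashUnit:
    """Toy controller for one flash unit; an illustration only."""

    def __init__(self) -> None:
        self.epoch = 0
        self.pages = {}       # page number -> data
        self.trimmed = set()  # pages that were garbage-collected

    def _check_epoch(self, epoch: int) -> None:
        if epoch < self.epoch:
            raise StaleEpochError(f"epoch {epoch} < sealed epoch {self.epoch}")

    def write(self, epoch: int, page: int, data: str) -> None:
        self._check_epoch(epoch)
        # Assumption: trimmed pages also refuse writes.
        if page in self.pages or page in self.trimmed:
            raise WriteOnceError(f"page {page:02d} was already written")
        self.pages[page] = data

    def read(self, epoch: int, page: int) -> str:
        self._check_epoch(epoch)
        if page not in self.pages:
            raise KeyError(f"page {page:02d} is unwritten or trimmed")
        return self.pages[page]

    def trim(self, epoch: int, page: int) -> None:
        self._check_epoch(epoch)
        self.pages.pop(page, None)
        self.trimmed.add(page)

    def seal(self, new_epoch: int) -> None:
        """Move to a newer epoch; older requests are rejected afterwards."""
        if new_epoch <= self.epoch:
            raise StaleEpochError("seal must move the epoch forward")
        self.epoch = new_epoch


def rejected(exc_type, fn, *args) -> bool:
    """Helper: True if fn(*args) raises exc_type."""
    try:
        fn(*args)
    except exc_type:
        return True
    return False


unit = FlashUnit()
unit.write(0, 0, "B")                                        # first write succeeds
second_write = rejected(WriteOnceError, unit.write, 0, 0, "X")
unit.seal(1)                                                  # map change: epoch #1
stale_write = rejected(StaleEpochError, unit.write, 0, 1, "C")
unit.write(1, 1, "C")                                         # current epoch is accepted
```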

  21. Detail - Map
     ‣ Maps log positions to flash pages
     ‣ The map is maintained by clients
        ‣ clients need to agree on a single map
     ‣ Changing the map
        ‣ consensus algorithm => same map among all clients
        ‣ happens infrequently (on failure, or when more log positions are needed)
        ‣ epoch + seal => requests that use an old map get rejected
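One possible shape for such a map, sketched under assumptions the slide does not state: the map is a list of segments, each covering a range of log positions striped round-robin over a set of flash units, and each segment uses flash units of its own. The names `Segment`, `Projection`, `locate`, and `extend` are hypothetical. The consensus step that makes all clients adopt the same new map is not modelled here.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Segment:
    start: int               # first log position covered by this segment
    length: int              # number of log positions covered
    units: Tuple[str, ...]   # flash units striped over (assumed unique per segment)


@dataclass(frozen=True)
class Projection:
    epoch: int
    segments: Tuple[Segment, ...]

    def locate(self, pos: int) -> Tuple[str, int]:
        """Return (flash unit, page) for a log position."""
        for seg in self.segments:
            if seg.start <= pos < seg.start + seg.length:
                offset = pos - seg.start
                unit = seg.units[offset % len(seg.units)]   # round-robin striping
                page = offset // len(seg.units)
                return unit, page
        raise KeyError(f"position {pos} is not covered by epoch {self.epoch}")

    def extend(self, length: int, units: Iterable[str]) -> "Projection":
        """Add positions on new units; the result is a new map with a higher epoch."""
        end = max((s.start + s.length for s in self.segments), default=0)
        new_segment = Segment(end, length, tuple(units))
        return Projection(self.epoch + 1, self.segments + (new_segment,))


old_map = Projection(epoch=0, segments=(Segment(0, 4, ("A", "B")),))
new_map = old_map.extend(4, ["C", "D"])  # more log positions are needed
```

Existing positions keep their location across the change (`new_map.locate(3)` still points at unit B), while the higher epoch lets sealed flash units tell old-map requests from new-map ones.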

  23. Detail - Tail-Finding Mechanism
     ‣ Solution 1: Let the clients find the tail
        ‣ relies on the flash units' write-once semantics (figure: entry B written to one of flash pages 00 to 04)
        ‣ contention + congestion => bad performance
     ‣ Solution 2: A sequencer assigns log positions
        ‣ a position that is assigned but never written leaves a hole => fill command
        ‣ only an optimization; correctness cannot rely on the sequencer
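The second solution can be sketched as follows. This is a toy model with hypothetical names (`WriteOnceLog`, `Sequencer`, `append`, `fill`): the sequencer is a plain counter, and the write-once check of the storage layer is what keeps the log correct even when the counter is stale.

```python
HOLE = "H"  # hypothetical marker written by fill


class WriteOnceLog:
    """Stand-in for the flash units: each position accepts one write."""

    def __init__(self) -> None:
        self.slots = {}

    def write(self, pos: int, data: str) -> bool:
        if pos in self.slots:
            return False  # already written: reject
        self.slots[pos] = data
        return True


class Sequencer:
    """Hands out log positions; purely an optimization."""

    def __init__(self, start: int = 0) -> None:
        self.next_pos = start

    def next(self) -> int:
        pos = self.next_pos
        self.next_pos += 1
        return pos


def append(log: WriteOnceLog, seq: Sequencer, data: str) -> int:
    """Ask the sequencer for a position; retry if that position is taken."""
    while True:
        pos = seq.next()
        if log.write(pos, data):
            return pos


def fill(log: WriteOnceLog, pos: int) -> bool:
    """Plug a hole left by a client that reserved pos but never wrote it."""
    return log.write(pos, HOLE)


log = WriteOnceLog()
seq = Sequencer()
first = append(log, seq, "E")   # position 0
abandoned = seq.next()          # position 1 reserved, client never writes
third = append(log, seq, "C")   # position 2
filled = fill(log, abandoned)   # another client fills the hole

# A restarted sequencer with a stale counter cannot overwrite anything:
stale_seq = Sequencer(start=0)
fourth = append(log, stale_seq, "S")  # skips 0, 1, 2 and lands on 3
```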

  24. Detail - Replication
     ‣ The map maps a log position to multiple flash pages (on different flash units)
     ‣ f+1 replicas; data becomes visible only after it reaches all replicas
     ‣ How to write?
        ‣ Chain replication (write to the replicas in a deterministic order)
     ‣ Example: pos 1 maps to page00 on A, page11 on B, page23 on C
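A minimal sketch of the chain write, reusing the slide's example of pos 1 mapping to page00 on A, page11 on B, and page23 on C. The names `Replica`, `chain_write`, and `chain_read` are hypothetical, and reading from the last replica in the chain is an assumption made here so that an entry becomes visible only once every replica has it.

```python
from typing import List, Optional, Tuple


class Replica:
    """Toy write-once flash unit holding pages in memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.pages = {}

    def write(self, page: int, data: str) -> None:
        if page in self.pages:
            raise ValueError(f"page {page:02d} on {self.name} already written")
        self.pages[page] = data

    def read(self, page: int) -> Optional[str]:
        return self.pages.get(page)


Chain = List[Tuple[Replica, int]]  # (replica, page) pairs in write order


def chain_write(chain: Chain, data: str) -> None:
    """Write to every replica in the fixed chain order, head first."""
    for replica, page in chain:
        replica.write(page, data)


def chain_read(chain: Chain) -> Optional[str]:
    """Read from the last replica (assumption): None means not yet visible."""
    replica, page = chain[-1]
    return replica.read(page)


a, b, c = Replica("A"), Replica("B"), Replica("C")
pos1_chain: Chain = [(a, 0), (b, 11), (c, 23)]

a.write(0, "E")                       # a writer that stops after the head
visible_early = chain_read(pos1_chain)

b.write(11, "E")                      # the write continues down the chain
c.write(23, "E")
visible_late = chain_read(pos1_chain)
```

Because every writer starts at the same head replica, the head's write-once check decides which of two competing writers wins a position.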

  25. Evaluation
