CORFU: A SHARED LOG DESIGN FOR FLASH CLUSTERS

SLIDE 1

CORFU: A SHARED LOG DESIGN FOR FLASH CLUSTERS

RUNYU ZHENG

SLIDE 4

Motivation - How to Agree on Total Order?

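[Figure: a shared log holding the entries E, E, C, S, 5, 9, 1, ! in slots numbered from 0. One client asks "What’s slot 1?" and is told ‘E’; another asks "What’s slot 5?" and is told ‘9’. The structure is labeled "Shared Log".]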

SLIDE 5

Motivation - How to Build a Shared Log?

Server + SSD

  • Performance is limited by the server’s bandwidth
SLIDE 7

Motivation - How to Build a Shared Log?

  • Clients communicate directly with the flash units
  • Increased throughput

[Figure: CORFU, with slot 0 and slot 5 labeled]

SLIDE 11

Design - Client API

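  • append(entry b): append entry b to the log and return its log position l
  • read(log position l): return the entry at position l
  • trim(log position l): mark position l for garbage collection
  • fill(log position l): mark position l as a hole

[Figure: the example log, with one slot marked GC (garbage collected) and one marked H (hole)]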
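The four calls can be illustrated with a toy, single-process log. This is only a sketch of the interface listed above, not the distributed implementation: the class name `SharedLog`, the `LogError` exception, and the choice to raise an error when reading a trimmed or filled position are all assumptions made for illustration.

```python
class LogError(Exception):
    """Raised when a position cannot be read or filled (hypothetical)."""


class SharedLog:
    """Toy in-memory log; a stand-in for the real distributed system."""

    _TRIMMED = object()  # marker: position was garbage collected
    _JUNK = object()     # marker: position was filled to flag a hole

    def __init__(self):
        self._slots = {}  # log position -> entry or marker
        self._tail = 0    # next position append() will hand out

    def append(self, entry):
        """Store entry at the tail and return its log position."""
        pos = self._tail
        self._slots[pos] = entry
        self._tail += 1
        return pos

    def read(self, pos):
        """Return the entry at pos, or raise if there is no data there."""
        if pos not in self._slots:
            raise LogError(f"position {pos} is unwritten")
        value = self._slots[pos]
        if value is self._TRIMMED:
            raise LogError(f"position {pos} was trimmed")
        if value is self._JUNK:
            raise LogError(f"position {pos} is a filled hole")
        return value

    def trim(self, pos):
        """Mark pos as garbage collected; its data is no longer readable."""
        self._slots[pos] = self._TRIMMED

    def fill(self, pos):
        """Mark an unwritten position as a hole so readers can skip it."""
        if pos in self._slots:
            raise LogError(f"position {pos} is already written")
        self._slots[pos] = self._JUNK
        self._tail = max(self._tail, pos + 1)


log = SharedLog()
positions = [log.append(ch) for ch in "EECS591!"]
print(positions)    # positions 0 through 7
print(log.read(1))  # E
print(log.read(5))  # 9
```

The usage lines replay the motivating example: every client that reads slot 1 sees 'E' and every client that reads slot 5 sees '9', because append fixes one total order.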

SLIDE 17

Design - Architecture

  • Controller (for every flash unit)
  • Map: log position -> flash page (maintained by clients)
  • Tail-finding mechanism
  • Replication (a single position maps to multiple flash units)

SLIDE 20

Detail - Controller for Flash Unit

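[Figure: flash pages 00 to 04, with entry B written to one page; the unit is at epoch #1]

  • Write-once semantics
      • a slot that has not been trimmed can be written only once
  • Seal => used when the map changes
      • sets the unit’s epoch number
      • the unit then rejects requests that carry a smaller epoch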
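The controller rules above (write-once pages, plus a seal that raises the epoch) can be sketched as a small class. The names `FlashUnit` and `FlashUnitError`, the method signatures, and the convention that every request carries the caller's epoch as its first argument are assumptions for illustration; trimming is left out on purpose.

```python
class FlashUnitError(Exception):
    """Raised when the unit refuses a request (hypothetical)."""


class FlashUnit:
    """Toy controller enforcing write-once pages and epoch checks."""

    def __init__(self):
        self.pages = {}  # page number -> data; absent means unwritten
        self.epoch = 0   # requests with a smaller epoch are rejected

    def _check_epoch(self, request_epoch):
        if request_epoch < self.epoch:
            raise FlashUnitError(
                f"stale epoch {request_epoch}, unit is at {self.epoch}"
            )

    def write(self, request_epoch, page, data):
        """Write a page exactly once; a second write is an error."""
        self._check_epoch(request_epoch)
        if page in self.pages:
            raise FlashUnitError(f"page {page} is already written")
        self.pages[page] = data

    def read(self, request_epoch, page):
        """Return the data stored in a written page."""
        self._check_epoch(request_epoch)
        if page not in self.pages:
            raise FlashUnitError(f"page {page} is unwritten")
        return self.pages[page]

    def seal(self, new_epoch):
        """Raise the epoch so clients holding an older map are rejected."""
        if new_epoch <= self.epoch:
            raise FlashUnitError("seal must increase the epoch")
        self.epoch = new_epoch


unit = FlashUnit()
unit.write(0, 0, "B")
unit.seal(1)
try:
    unit.write(0, 1, "C")  # the client still uses the old epoch
except FlashUnitError as err:
    print("rejected:", err)
unit.write(1, 1, "C")      # the same write succeeds with the new epoch
```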

SLIDE 21

Detail - Map

  • Maps log positions to flash pages
  • The map is maintained by the clients
      • they need to agree on a single map
  • Changing the map
      • a consensus algorithm ensures all clients adopt the same map
      • happens infrequently (on failure, or when more log positions are needed)
      • epoch + seal => requests that use an old map are rejected
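One way to picture the map and a map change is the sketch below. It is an illustration under stated assumptions, not the deck's actual layout: the `Extent` and `Projection` types, round-robin striping inside an extent, and the dictionary `unit_epochs` standing in for seal commands sent to real flash units are all hypothetical. The consensus step is not modeled; the sketch only shows that a new map gets a new epoch and that sealed units stop accepting the old one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    start: int    # first log position covered (inclusive)
    end: int      # end of the covered range (exclusive)
    units: tuple  # flash unit names; assumed not shared between extents


@dataclass(frozen=True)
class Projection:
    epoch: int
    extents: tuple

    def locate(self, pos):
        """Return (unit name, page number) for a log position."""
        for ext in self.extents:
            if ext.start <= pos < ext.end:
                offset = pos - ext.start
                # Assumed layout: stripe positions round-robin over the units.
                unit = ext.units[offset % len(ext.units)]
                return unit, offset // len(ext.units)
        raise IndexError(f"position {pos} is not mapped")


def reconfigure(old, new_extent, unit_epochs):
    """Seal the old units, then return a map that adds more positions."""
    new_epoch = old.epoch + 1
    for ext in old.extents:
        for name in ext.units:
            unit_epochs[name] = new_epoch  # stands in for a seal command
    return Projection(new_epoch, old.extents + (new_extent,))


def unit_accepts(unit_epochs, name, request_epoch):
    """A unit rejects any request that carries a smaller epoch."""
    return request_epoch >= unit_epochs.get(name, 0)


old_map = Projection(0, (Extent(0, 8, ("A", "B")),))
epochs = {}
new_map = reconfigure(old_map, Extent(8, 16, ("C", "D")), epochs)
print(new_map.locate(5))                         # ('B', 2)
print(new_map.locate(9))                         # ('D', 0)
print(unit_accepts(epochs, "A", old_map.epoch))  # False
```

Existing positions keep their location across the change, and a client that still holds the old map learns about the change the first time a sealed unit rejects it.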
SLIDE 23

Detail - Tail-Finding Mechanism

[Figure: flash pages 00 to 04, with entry B written to one page]

  • Solution 1: let the clients find the tail
      • uses the write-once semantics
      • contention + congestion => poor performance
  • Solution 2: a sequencer assigns log positions
      • a client that takes a position but never writes it leaves a hole => fill command
      • only an optimization; correctness cannot rely on the sequencer
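Both solutions can be sketched against a toy write-once log. Every name here (`WriteOnceLog`, `Sequencer`, `append_by_probing`, `append_with_sequencer`, `fill`, the `JUNK` marker) is hypothetical, and the log is a plain dictionary rather than a cluster of flash units. The point of the sketch is that the sequencer only suggests positions; the write-once check is what keeps two writers from sharing a slot.

```python
JUNK = "<junk>"  # assumed marker written by fill()


class WriteOnceLog:
    """Toy log: each position accepts exactly one write."""

    def __init__(self):
        self.slots = {}

    def write(self, pos, data):
        """Return True if this call wrote the slot, False if it was taken."""
        if pos in self.slots:
            return False
        self.slots[pos] = data
        return True


def append_by_probing(log, data, start=0):
    """Solution 1: probe positions until a write-once write succeeds."""
    pos = start
    while not log.write(pos, data):
        pos += 1  # many clients probing at once is the contention problem
    return pos


class Sequencer:
    """Solution 2: a counter that hands out the next free position."""

    def __init__(self):
        self._next = 0

    def next_position(self):
        pos = self._next
        self._next += 1
        return pos


def append_with_sequencer(log, sequencer, data):
    """Ask for a position, but still rely on write-once for safety."""
    pos = sequencer.next_position()
    while not log.write(pos, data):
        pos = sequencer.next_position()  # stale hint, ask again
    return pos


def fill(log, pos):
    """Mark a reserved-but-unwritten position as a hole."""
    return log.write(pos, JUNK)


log = WriteOnceLog()
seq = Sequencer()
append_with_sequencer(log, seq, "a")  # lands in position 0
abandoned = seq.next_position()       # a client reserves 1, then crashes
append_with_sequencer(log, seq, "b")  # lands in position 2
fill(log, abandoned)                  # another client patches the hole
print(sorted(log.slots.items()))
```

If the sequencer restarts and loses its counter, appends still land in free positions, only more slowly, which is why it can be treated as an optimization.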
SLIDE 24

Detail - Replication

  • The map sends each log position to multiple flash pages (on different flash units)
  • f+1 replicas; data becomes visible only after it reaches all replicas
  • How to write?
      • Chain replication (write to the replicas in a deterministic order)

Example: position 1 maps to page 00 on A, page 11 on B, and page 23 on C
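A minimal sketch of the chain write for the example position follows. The `Replica` class and the `chain_write` and `chain_read` helpers are hypothetical names, and serving reads from the last replica in the chain is an assumption used here to model "visible only after it reaches all replicas".

```python
class Replica:
    """Toy flash unit with write-once pages."""

    def __init__(self, name):
        self.name = name
        self.pages = {}

    def write(self, page, data):
        """Return True if this call wrote the page, False if it was taken."""
        if page in self.pages:
            return False
        self.pages[page] = data
        return True


def chain_write(chain, data):
    """Write to every replica in the fixed chain order.

    chain is a list of (replica, page) pairs. Whoever writes the head
    first owns the position, so a False result means the caller lost
    the race and must append at a different position.
    """
    head, head_page = chain[0]
    if not head.write(head_page, data):
        return False
    for replica, page in chain[1:]:
        replica.write(page, data)
    return True


def chain_read(chain):
    """Read from the last replica (assumption); None means not yet visible."""
    tail, tail_page = chain[-1]
    return tail.pages.get(tail_page)


a, b, c = Replica("A"), Replica("B"), Replica("C")
# Position 1 from the example: page 00 on A, page 11 on B, page 23 on C.
chain_for_pos_1 = [(a, 0), (b, 11), (c, 23)]

print(chain_read(chain_for_pos_1))         # None
print(chain_write(chain_for_pos_1, "x"))   # True
print(chain_write(chain_for_pos_1, "y"))   # False
print(chain_read(chain_for_pos_1))         # x
```

Because every writer starts at the same head, two clients racing for one position cannot both succeed, and a reader never sees data that has not reached the end of the chain.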

SLIDE 25

Evaluation