SLIDE 1

Paxos wrapup

Doug Woos

SLIDE 2

Logistics notes

Whence video lecture?
Problem Set 3 out on Friday

SLIDE 3

Paxos Made Moderately Complex Made Simple

SLIDE 4

When to run for office

When should a leader try to get elected?

  • At the beginning of time
  • When the current leader seems to have failed

The paper describes an algorithm based on pinging the leader and timing out. If you get preempted, don’t immediately try for election again!
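
A minimal sketch of that policy, with made-up helpers (`ping_leader`, `start_election`) and made-up timeout constants; the paper's actual algorithm differs in detail:

```python
import random
import time

PING_INTERVAL = 0.5   # seconds between pings (illustrative value)
PING_TIMEOUT = 2.0    # silence this long => suspect the leader has failed
BACKOFF_BASE = 1.0    # base wait after losing an election

def monitor_leader(ping_leader, start_election):
    """Run for office at startup, then whenever the leader seems dead.

    `ping_leader()` returns True if the current leader responded in time;
    `start_election()` returns True if we won and False if we were
    preempted.  Both are hypothetical hooks, not part of any real library.
    """
    last_heard = float("-inf")   # forces an election "at the beginning of time"
    while True:
        if ping_leader():
            last_heard = time.time()
        elif time.time() - last_heard > PING_TIMEOUT:
            if not start_election():
                # Preempted by a higher ballot: back off with jitter instead
                # of immediately running again, so two would-be leaders
                # don't keep preempting each other forever.
                time.sleep(BACKOFF_BASE * (1 + random.random()))
            last_heard = time.time()
        time.sleep(PING_INTERVAL)
```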

SLIDE 5

Reconfiguration

All replicas must agree on who the leaders and acceptors are. How do we do this?

SLIDE 6

Reconfiguration

All replicas must agree on who the leaders and acceptors are. How do we do this?

  • Use the log!
  • Commit a special reconfiguration command
  • New config applies after WINDOW slots
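
A toy sketch of that rule, not the PMMC code: a replica applies commands in slot order and switches configurations WINDOW slots after the slot in which the reconfiguration command was decided. Names like `Reconfig` and `apply_log` are made up for illustration.

```python
WINDOW = 2  # matches the lecture's example value

class Reconfig:
    """Special log command naming the new leaders and acceptors."""
    def __init__(self, leaders, acceptors):
        self.leaders = leaders
        self.acceptors = acceptors

def apply_log(log, config):
    """Apply decided commands in slot order (log maps slot -> command).

    A Reconfig decided in slot s only takes effect at slot s + WINDOW, so
    every replica that executes slot s agrees on which configuration
    governs the later slots.  (For simplicity this sketch allows only one
    reconfiguration in flight at a time.)
    """
    pending = None  # (slot at which the new config applies, new config)
    for slot in sorted(log):
        if pending and slot >= pending[0]:
            config = pending[1]
            pending = None
        cmd = log[slot]
        if isinstance(cmd, Reconfig):
            pending = (slot + WINDOW, (cmd.leaders, cmd.acceptors))
        else:
            print(f"slot {slot}: execute {cmd!r} under config {config}")
    return config

# Example: with WINDOW = 2, a reconfig decided in slot 2 governs slot 4 onward.
log = {1: "Put k1 v1", 2: Reconfig(["L2"], ["A1", "A2", "A3"]),
       3: "App k2 v2", 4: "Get k1"}
apply_log(log, (["L1"], ["A1", "A2", "A3"]))
```
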
SLIDE 7

Replicas

[Diagram: a replica’s log with slots Op1–Op6 holding commands such as reconfig(L, A), Put k1 v1, and App k2 v2; slot_out and slot_in markers shown on the Replica and Leader, with WINDOW = 2]

SLIDE 8

Reconfiguration

What if we need to reconfigure now and client requests aren’t coming in?

SLIDE 9

Reconfiguration

What if we need to reconfigure now and client requests aren’t coming in?

  • Commit no-ops until WINDOW is cleared
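
A sketch of that fix, assuming a hypothetical `propose(slot, command)` hook that runs Paxos for a single slot; the names and the default window value are illustrative:

```python
NOOP = "no-op"

def flush_window(propose, reconfig_slot, next_free_slot, window=2):
    """Propose no-ops so a reconfig decided at `reconfig_slot` takes effect.

    The new configuration only applies from slot reconfig_slot + window, so
    if clients go quiet we fill every earlier empty slot with no-ops; once
    those slots are decided, the window is cleared and the new leaders and
    acceptors take over.  `propose(slot, command)` is a hypothetical hook.
    """
    for slot in range(next_free_slot, reconfig_slot + window):
        propose(slot, NOOP)
```
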
SLIDE 10

Other complications

State simplifications

  • Can track much less information, esp. on replicas

Garbage collection

  • Unbounded memory growth is bad
  • Lab 3: track finished slots across all instances; garbage collect when everyone is ready (see the sketch after this list)

Read-only commands

  • Can’t just read from replica (why?)
  • But, don’t need their own slot
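
For the garbage-collection point above, here is one way to phrase the Lab 3 idea as code; the message plumbing (how each server learns what its peers have executed) is omitted and the names are illustrative:

```python
def gc_watermark(done, num_servers):
    """Highest slot that *every* server has reported executing.

    `done` maps server id -> highest executed slot, as piggybacked on
    normal messages.  Until all servers have reported, nothing is safe
    to collect.
    """
    if len(done) < num_servers:
        return 0
    return min(done.values())

def collect(log, done, num_servers):
    """Discard per-slot state that everyone is finished with."""
    watermark = gc_watermark(done, num_servers)
    for slot in [s for s in log if s <= watermark]:
        del log[slot]
    return watermark

# Example: servers s1..s3 have executed up to slots 3, 2, 2 respectively,
# so slots 1 and 2 can be dropped everywhere.
log = {1: "Put k1 v1", 2: "App k2 v2", 3: "Get k1"}
collect(log, {"s1": 3, "s2": 2, "s3": 2}, 3)   # log is now {3: "Get k1"}
```
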
SLIDE 11

Data center architecture

Doug Woos

SLIDE 12

The Internet

Theoretically: a huge, decentralized infrastructure
In practice: an awful lot of it is in Amazon’s data centers

  • Most of the rest is in Google’s, Facebook’s, etc.
SLIDE 13

The Internet

SLIDE 14

The Internet

SLIDE 15

Data centers

10k–100k servers
100 PB–1 EB storage
100s of Tb/s bandwidth

  • More than the core of the Internet

10–100 MW power

  • 1–2% of global energy consumption

100s of millions of dollars

SLIDE 16

Servers in racks

19” wide, 1.75” tall (1U) (a convention dating to 1922!)
~40 servers per rack

  • Commodity HW

Connected to switch at top

  • ToR switch
SLIDE 17

Racks in rows

SLIDE 18

Rows in hot/cold pairs

SLIDE 19

Hot/cold pairs in data centers

SLIDE 20

Where is the cloud?

Amazon, in the US:

  • Northern Virginia
  • Ohio
  • Oregon
  • Northern California

Why those locations?

SLIDE 21

Early data center networks

3 layers of switches

  • Edge (ToR)
  • Aggregation
  • Core
SLIDE 22

Early data center networks

3 layers of switches

  • Edge (ToR)
  • Aggregation
  • Core

[Diagram: the same three-layer topology, with links marked as optical or electrical]

SLIDE 23

Early data center limitations

Cost

  • Core and aggregation routers = high capacity, low volume
  • Expensive!

Fault-tolerance

  • Failure of a single core or aggregation router = large bandwidth loss

Bisection bandwidth limited by capacity of largest available router

  • Google’s DC traffic ~doubles every year!
SLIDE 24

Clos networks (1953)

How can I replace a big switch with many small switches?

[Diagram: a single big switch, next to one small switch]

SLIDE 25

Clos networks (1953)

How can I replace a big switch with many small switches?

[Diagram: the big switch replaced by a network of four small switches]

SLIDE 26

Fat-tree architecture

To reduce costs, thin out the top of the fat-tree
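
One way to make “thinning out the top” concrete is the oversubscription ratio: the worst-case traffic the servers under a switch could send upward, divided by the uplink capacity actually provisioned. The numbers below are illustrative, not from the lecture:

```python
def oversubscription(servers, server_gbps, uplinks, uplink_gbps):
    """Worst-case demand from below divided by capacity toward the core."""
    return (servers * server_gbps) / (uplinks * uplink_gbps)

# e.g. a 40-server rack with 10 Gb/s NICs but only four 40 Gb/s uplinks
# is 2.5:1 oversubscribed; a full fat-tree would keep this ratio at 1:1.
print(oversubscription(40, 10, 4, 40))   # -> 2.5
```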

SLIDE 27

Multipath routing

Lots of bandwidth, split across many paths
Round-robin load balancing between any two racks?

  • TCP works better if packets arrive in-order

ECMP: hash on packet header to determine route
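
A minimal sketch of the ECMP idea: hash fields that are constant for a TCP flow (the 5-tuple), so every packet of one connection takes the same path and arrives in order, while different flows spread across the available paths. The hash and field choice here are illustrative, not any particular switch’s implementation:

```python
import zlib

def ecmp_route(src_ip, dst_ip, src_port, dst_port, proto, paths):
    """Pick one of `paths` by hashing the flow's 5-tuple."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return paths[zlib.crc32(key) % len(paths)]

paths = ["core-1", "core-2", "core-3", "core-4"]
# Same flow -> same path every time; a different flow may hash elsewhere.
print(ecmp_route("10.0.1.5", "10.0.9.7", 51515, 80, "tcp", paths))
print(ecmp_route("10.0.1.5", "10.0.9.7", 51516, 80, "tcp", paths))
```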

SLIDE 28

Data center scaling

“Moore’s Law is over”

  • Moore: processor speed doubles every 18 mo
  • Chips still getting faster, but more slowly
  • Limitations: chip size (communication latency), transistor size, power dissipation

Network link bandwidth still scaling

  • 40 Gb/s common, 100 Gb/s coming
  • 10-100 µs cross-DC latency

Services scaling out across the data center

SLIDE 29

Local storage

Old: magnetic disks (“spinning rust”)
Now: solid-state storage (flash)
Future: NVRAM

SLIDE 30

Persistence

When should we consider data persistent?

  • In DRAM on one node?
  • On multiple nodes?
  • In same data center? Different data centers?
  • Different switches? Different power supplies?
  • In storage on one node? etc.