

SLIDE 1

Transparent Checkpoint of Closed Distributed Systems in Emulab

Anton Burtsev, Prashanth Radhakrishnan, Mike Hibler, and Jay Lepreau
University of Utah, School of Computing

SLIDE 2

Emulab

• Public testbed for network experimentation
• Complex networking experiments within minutes
SLIDE 3

Emulab — precise research tool

• Realism:
  – Real dedicated hardware
    • Machines and networks
  – Real operating systems
  – Freedom to configure any component of the software stack
  – Meaningful real-world results
• Control:
  – Closed system
    • Controlled external dependencies and side effects
  – Control interface
  – Repeatable, directed experimentation


SLIDE 4

Goal: more control over execution

• Stateful swap-out
  – Demand for physical resources exceeds capacity
  – Preemptive experiment scheduling
    • Long-running
    • Large-scale experiments
  – No loss of experiment state
• Time-travel
  – Replay experiments
    • Deterministically or non-deterministically
  – Debugging and analysis aid


SLIDE 5

Challenge

• Both controls should preserve fidelity of experimentation
• Both rely on transparency of distributed checkpoint


SLIDE 6

Transparent checkpoint

• Traditionally, semantic transparency:
  – Checkpointed execution is one of the possible correct executions
• What if we want to preserve performance correctness?
  – Checkpointed execution is one of the correct executions closest to a non-checkpointed run
• Preserve measurable parameters of the system
  – CPU allocation
  – Elapsed time
  – Disk throughput
  – Network delay and bandwidth


SLIDE 7

Traditional view

• Local case
  – Transparency = smallest possible downtime
  – Several milliseconds [Remus]
  – Background work
  – Harms realism
• Distributed case
  – Lamport checkpoint
    • Provides consistency
  – Packet delays, timeouts, traffic bursts, replay buffer overflows


SLIDE 8

Main insight

• Conceal checkpoint from the system under test
  – But still stay on the real hardware as much as possible
• "Instantly" freeze the system
  – Time and execution
  – Ensure atomicity of checkpoint
    • Single non-divisible action
• Conceal checkpoint by time virtualization


SLIDE 9

Contributions

• Transparency of distributed checkpoint
• Local atomicity
  – Temporal firewall
• Execution control mechanisms for Emulab
  – Stateful swap-out
  – Time-travel
• Branching storage


SLIDE 10

Challenges and implementation


SLIDE 11

Checkpoint essentials

• State encapsulation
  – Suspend execution
  – Save running state of the system
• Virtualization layer


SLIDE 12

Checkpoint essentials

• State encapsulation
  – Suspend execution
  – Save running state of the system
• Virtualization layer (sequence sketched below)
  – Suspends the system
  – Saves its state
  – Saves in-flight state
  – Disconnects/reconnects to the hardware
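A minimal, illustrative sketch of the checkpoint sequence just listed. All function names here (suspend_guest, save_inflight_state, and so on) are hypothetical placeholders for the virtualization layer's hooks, not the actual Emulab or Xen interfaces.

    /* checkpoint_sketch.c -- illustrative only; the real logic sits in the
     * virtualization layer below the guest OS. */
    #include <stdio.h>

    /* Hypothetical hooks into the virtualization layer. */
    static void suspend_guest(void)       { puts("suspend execution"); }
    static void save_inflight_state(void) { puts("save in-flight packets and I/O"); }
    static void save_running_state(void)  { puts("save memory, CPU, and device state"); }
    static void disconnect_hardware(void) { puts("detach from physical devices"); }
    static void reconnect_hardware(void)  { puts("reattach to physical devices"); }
    static void resume_guest(void)        { puts("resume execution"); }

    /* Checkpoint: freeze first, so everything saved belongs to one atomic
     * snapshot of the system. */
    static void do_checkpoint(void)
    {
        suspend_guest();
        save_inflight_state();
        save_running_state();
        disconnect_hardware();
    }

    /* Swap-in/restore reverses the tail of the sequence. */
    static void do_restore(void)
    {
        reconnect_hardware();
        resume_guest();
    }

    int main(void)
    {
        do_checkpoint();
        do_restore();
        return 0;
    }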


SLIDE 13

First challenge: atomicity

• Permanent encapsulation is harmful
  – Too slow
  – Some state is shared
    • Encapsulated upon checkpoint
• Externally to VM
  – Full memory virtualization
  – Needs declarative description of shared state
• Internally to VM
  – Breaks atomicity


SLIDE 14

Atomicity in the local case

• Temporal firewall
  – Selectively suspends execution and time
  – Provides atomicity inside the firewall
• Execution control in the Linux kernel
  – Kernel threads
  – Interrupts, exceptions, IRQs
• Conceals checkpoint
  – Time virtualization (see the sketch below)
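To make the idea of time virtualization concrete, here is a minimal user-level sketch: the clock exposed to the system under test is real time minus the accumulated checkpoint downtime, so a freeze does not show up as a jump in time. In the real system this happens in the kernel behind the temporal firewall; the names and the user-space placement below are illustrative assumptions.

    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>

    /* Total time the system has spent frozen in checkpoints (nanoseconds). */
    static int64_t paused_ns = 0;

    static int64_t real_now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    }

    /* The clock the system under test sees: real time minus time spent frozen,
     * so a checkpoint looks like a single instant. */
    static int64_t virtual_now_ns(void)
    {
        return real_now_ns() - paused_ns;
    }

    /* Called by the checkpoint code around the freeze window. */
    static void account_freeze(int64_t start_ns, int64_t end_ns)
    {
        paused_ns += end_ns - start_ns;
    }

    int main(void)
    {
        int64_t r0 = real_now_ns();
        int64_t v0 = virtual_now_ns();

        /* Simulate a 200 ms checkpoint freeze. */
        int64_t s = real_now_ns();
        struct timespec pause = { 0, 200 * 1000 * 1000 };
        nanosleep(&pause, NULL);
        account_freeze(s, real_now_ns());

        printf("real elapsed:    %lld ns\n", (long long)(real_now_ns() - r0));
        printf("virtual elapsed: %lld ns\n", (long long)(virtual_now_ns() - v0));
        return 0;
    }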


SLIDE 15

Second challenge: synchronization

[Figure: a partially suspended system; an unsuspended node hits a timeout waiting for a frozen peer]

• Lamport checkpoint
  – No synchronization
  – System is partially suspended
• Preserves consistency
  – Logs in-flight packets
    • Once logged, it's impossible to remove
• Unsuspended nodes
  – Time-outs


SLIDE 16

Synchronized checkpoint

• Synchronize clocks across the system
• Schedule checkpoint (see the sketch below)
• Checkpoint all nodes at once
• Almost no in-flight packets
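A minimal sketch of how a node might arm such a synchronized checkpoint, assuming the testbed's control plane has already synchronized clocks and broadcast an agreed wall-clock deadline. checkpoint_local_node() is a hypothetical stand-in for the freeze described on the earlier slides.

    #include <errno.h>
    #include <stdio.h>
    #include <time.h>

    /* Hypothetical hook: freeze this node (execution and time). */
    static void checkpoint_local_node(void)
    {
        puts("node frozen for checkpoint");
    }

    /* Sleep on the shared wall clock until an agreed absolute instant, then
     * checkpoint.  With clocks synchronized across the experiment, every node
     * fires at (nearly) the same moment, so almost nothing is in flight. */
    static void checkpoint_at(time_t deadline)
    {
        struct timespec until = { .tv_sec = deadline, .tv_nsec = 0 };
        int rc;
        do {
            rc = clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &until, NULL);
        } while (rc == EINTR);          /* retry if interrupted by a signal */
        checkpoint_local_node();
    }

    int main(void)
    {
        /* The control plane would broadcast the deadline; here we simply pick
         * "two seconds from now" for illustration. */
        checkpoint_at(time(NULL) + 2);
        return 0;
    }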


SLIDE 17

Bandwidth-delay product

• Large number of in-flight packets
• Slow links dominate the log (see the worked example below)
• Faster links wait for the entire log to complete
• Per-path replay?
  – Unavailable at Layer 2
  – Accurate replay engine on every node
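The bandwidth-delay product bounds how much data is in flight on a link. The figures below are illustrative assumptions, not measurements from the paper: a fast low-delay link holds a few kilobytes, while a slow high-delay link can hold hundreds of kilobytes that must then be replayed through that same slow link, stalling everything else.

    #include <stdio.h>

    int main(void)
    {
        /* bandwidth (Mbit/s) x one-way delay (ms) -> bytes in flight */
        double mbps[]     = { 1000.0, 10.0  };   /* illustrative links */
        double delay_ms[] = {    0.1, 400.0 };

        for (int i = 0; i < 2; i++) {
            double bytes = mbps[i] * 1e6 / 8.0 * (delay_ms[i] / 1e3);
            printf("%6.1f Mbps x %5.1f ms -> %8.0f bytes in flight\n",
                   mbps[i], delay_ms[i], bytes);
        }
        /* Prints roughly 12,500 bytes for the fast link and 500,000 bytes for
         * the slow one -- and draining the slow link's log at 10 Mbps alone
         * takes about 400 ms, which the faster links must wait out. */
        return 0;
    }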


SLIDE 18

Checkpoint the network core

• Leverage Emulab delay nodes
  – Emulab links are no-delay
  – Link emulation done by delay nodes
• Avoid replay of in-flight packets
• Capture all in-flight packets in the core
  – Checkpoint delay nodes


SLIDE 19

Efficient branching storage

• To be practical, stateful swap-out has to be fast
• Mostly read-only FS
  – Shared across nodes and experiments
• Deltas accumulate across swap-outs
• Based on LVM
  – Many optimizations


SLIDE 20

Evaluation


SLIDE 21

Evaluation plan

• Transparency of the checkpoint
• Measurable metrics
  – Time virtualization
  – CPU allocation
  – Network parameters


SLIDE 22

Time virtualization

Measurement loop:
  do {
      usleep(10 ms)
      gettimeofday()
  } while ()
  sleep + overhead = 20 ms

• Checkpoint every 5 sec (24 checkpoints)
• Timer accuracy is 28 μsec
• Checkpoint adds ±80 μsec error
(a runnable version of this probe follows below)
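A runnable version of the probe sketched on the slide, filled out with the includes and timing arithmetic it needs. The iteration count and output format are my own choices; the ~20 ms "sleep + overhead" figure is the one reported on the slide.

    #include <stdio.h>
    #include <sys/time.h>
    #include <unistd.h>

    int main(void)
    {
        struct timeval prev, now;
        gettimeofday(&prev, NULL);

        for (int i = 0; i < 1000; i++) {
            usleep(10 * 1000);                    /* usleep(10 ms)   */
            gettimeofday(&now, NULL);             /* gettimeofday()  */

            long us = (now.tv_sec - prev.tv_sec) * 1000000L
                    + (now.tv_usec - prev.tv_usec);
            printf("%ld\n", us);                  /* ~20,000 us: sleep + overhead */
            prev = now;
        }
        return 0;
    }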


SLIDE 23

CPU allocation

Measurement loop:
  do {
      stress_cpu()
      gettimeofday()
  } while ()
  stress + overhead = 236.6 ms

• Checkpoint every 5 sec (29 checkpoints)
• Normally within 9 ms of the average
• Checkpoint adds 27 ms error
  – ls /root: 7 ms overhead
  – xm list: 130 ms


SLIDE 24

Network transparency: iperf

Setup:
  – 1 Gbps, 0-delay network
  – iperf between two VMs
  – tcpdump inside one of the VMs
  – averaging over 0.5 ms

• Checkpoint every 5 sec (4 checkpoints)
• No TCP window change
• No packet drops
• Throughput drop is due to background activity
• Average inter-packet time: 18 μsec
• Checkpoint adds: 330 to 5801 μsec


SLIDE 25

Network transparency: BitTorrent

Setup:
  – 100 Mbps, low delay
  – 1 BitTorrent server + 3 clients
  – 3 GB file

• Checkpoint every 5 sec (20 checkpoints)
• Checkpoint preserves average throughput


SLIDE 26

Conclusions

• Transparent distributed checkpoint
  – Precise research tool
  – Fidelity of distributed system analysis
• Temporal firewall
  – General mechanism to change perception of time for the system
  – Conceals various external events
• Future work is time-travel


SLIDE 27

Thank you

aburtsev@flux.utah.edu


SLIDE 28

Backup


SLIDE 29

Branching storage

• Copy-on-write as a redo log (toy sketch below)
• Linear addressing
• Free block elimination
• Read-before-write elimination
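A toy sketch of the first two ideas on this list: a linear block map over a read-only base image, with writes redirected to a per-branch redo log, and full-block writes going straight to the log so the old data is never read first. Block counts and sizes are arbitrary placeholders; the real implementation is layered on LVM.

    #include <stdio.h>
    #include <string.h>

    #define NBLOCKS    16
    #define BLOCK_SIZE  8

    static char base[NBLOCKS][BLOCK_SIZE];   /* shared, read-only base image  */
    static char redo[NBLOCKS][BLOCK_SIZE];   /* per-branch redo log (delta)   */
    static int  in_redo[NBLOCKS];            /* linear map: block -> location */

    /* Full-block write goes straight to the redo log; since the whole block is
     * overwritten, the old contents are never read from the base first
     * ("read before write elimination"). */
    static void write_block(int b, const char *data)
    {
        memcpy(redo[b], data, BLOCK_SIZE);
        in_redo[b] = 1;
    }

    /* Read returns the newest copy: the redo log if the block was ever written
     * on this branch, the shared base image otherwise. */
    static void read_block(int b, char *out)
    {
        memcpy(out, in_redo[b] ? redo[b] : base[b], BLOCK_SIZE);
    }

    int main(void)
    {
        char buf[BLOCK_SIZE];

        memcpy(base[3], "original", BLOCK_SIZE);
        read_block(3, buf);
        printf("before: %.8s\n", buf);

        write_block(3, "modified");
        read_block(3, buf);
        printf("after:  %.8s\n", buf);
        return 0;
    }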

SLIDE 30

Branching storage