Instant OS Updates via Userspace Checkpoint-and-Restart (PowerPoint PPT Presentation)


SLIDE 1

Instant OS Updates via Userspace Checkpoint-and-Restart

Sanidhya Kashyap, Changwoo Min, Byoungyoung Lee, Taesoo Kim, Pavel Emelyanov

SLIDE 2

OS updates are prevalent

SLIDE 3

And OS updates are unavoidable

  • Prevent known, state-of-the-art attacks
    – Security patches
  • Adopt new features
    – New I/O scheduler features
  • Improve performance
    – Performance patches

SLIDE 4
SLIDE 6

Unfortunately, system updates come at a cost

  • Unavoidable downtime ($109k per minute, plus hidden costs such as losing customers)
  • Potential risk of system failure

SLIDE 8

Example: memcached

  • Facebook's memcached servers incur a downtime of 2-3 hours per machine
    – Warming the cache (e.g., 120 GB) over the network

Our approach updates the OS in 3 seconds with 32 GB of data (v3.18 to v3.19, Ubuntu/Fedora releases)

SLIDE 11

Existing practices for OS updates

  • Dynamic kernel patching (e.g., kpatch, ksplice)
    – Problem: only supports minor patches
  • Rolling update (e.g., Google, Facebook, etc.)
    – Problem: inevitable downtime, and requires careful planning

Losing application state is inevitable
→ Restoring memcached takes 2-3 hours

Goals of this work:

  • Support all types of patches
  • Minimal downtime to update to the new OS
  • No kernel source modification
SLIDE 17

Problems of typical OS update

[Diagram: Memcached runs on the OS; the update stops the service, soft-reboots into the new OS, then starts the service again]

  • Stop/start service: 2-3 hours of downtime (cache warm-up)
  • Soft reboot: 2-10 minutes of downtime

Is it possible to keep the application state?

SLIDE 24

OS updates lose application state

KUP: kernel update with application checkpoint-and-restore (C/R)

[Diagram: KUP's life cycle: stop service → checkpoint → in-kernel switch to the new OS → restore → start service]

1-10 minutes of downtime

Challenge: how to further decrease the potential downtime?

SLIDE 29

Techniques to decrease the downtime (checkpoint, restore, in-kernel switch):

  1) Incremental checkpoint
  2) On-demand restore
  3) FOAM: a snapshot abstraction
  4) PPP: reuse memory without an explicit dump

SLIDE 34

Incremental checkpoint

  • Reduces downtime (by up to 83.5%)
  • Problem: multiple snapshots increase the restore time

[Timeline: a naive checkpoint takes one snapshot S1 with the whole dump as downtime; an incremental checkpoint takes snapshots S1, S2, S3 while the application keeps running, so only the final pass S4 counts as downtime. Si = snapshot instance]
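The idea above can be sketched as a toy simulation (not KUP's implementation): only the pages dirtied since the previous iteration are dumped, so the final stop-the-world pass copies little data, while restore must merge every snapshot — which is exactly the problem the later slides address.

```python
# Illustrative simulation of incremental checkpointing: each iteration
# dumps only the pages dirtied since the previous one, so the final
# stop-the-world pass (the actual downtime) copies far less data.

def incremental_checkpoint(memory, dirty_log):
    """memory: {page_no: bytes}; dirty_log: one set of dirtied pages per
    iteration. Returns the list of snapshots S1..Sn."""
    snapshots = [dict(memory)]          # S1: full dump while the app runs
    for dirtied in dirty_log:           # S2..Sn: only the dirtied pages
        snapshots.append({p: memory[p] for p in dirtied})
    return snapshots

def restore(snapshots):
    """Merge all snapshots; newer pages override older ones."""
    merged = {}
    for snap in snapshots:
        merged.update(snap)
    return merged

# 4-page working set, 2 iterations: all pages go to S1, pages 2 and 4 to S2.
mem = {1: b"A", 2: b"B", 3: b"C", 4: b"D"}
snaps = incremental_checkpoint(mem, [{2, 4}])
assert len(snaps[1]) == 2               # final pass dumps only 2 of 4 pages
assert restore(snaps) == mem            # restore reproduces the full state
```

Note how restore has to walk every snapshot to find each page's newest copy: this per-page merging is the restore-time cost the deck attributes to multiple snapshots.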

SLIDE 35

On-demand restore

  • Rebind the memory once the application accesses it
    – Only map the memory region from the snapshot and restart the application
  • Decreases the downtime (by up to 99.6%)
  • Problem: incompatible with incremental checkpoint
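A minimal sketch of the underlying mechanism, assuming a file-backed snapshot: mapping the snapshot with mmap is cheap, and each page is only brought in when first touched. This illustrates demand paging, not KUP's actual restore path.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

# Write a tiny two-page "snapshot" file (stand-in for a real snapshot).
fd, path = tempfile.mkstemp()
os.write(fd, b"A" * PAGE + b"B" * PAGE)

# Mapping the snapshot is a few cheap operations: no page is copied
# into the process until it is first accessed.
with open(path, "rb") as f:
    snap = mmap.mmap(f.fileno(), 2 * PAGE, prot=mmap.PROT_READ)
    # Touching a byte faults the page in on demand.
    assert snap[0:1] == b"A"
    assert snap[PAGE:PAGE + 1] == b"B"
    snap.close()
os.close(fd)
os.remove(path)
```

The application restarts as soon as the mapping exists; the data transfer happens lazily afterwards, which is where the large downtime reduction comes from.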

SLIDE 37

Problem: both techniques together result in inefficient application C/R

  • During restore, each page must be mapped individually
    – Individual lookups to find the relevant pages across snapshots
    – Individual page mappings to enable on-demand restore
  • Example: an application has a working set of 4 pages, and the incremental checkpoint runs 2 iterations
    – 1st iteration → all 4 pages (1, 2, 3, 4) are dumped to S1
    – 2nd iteration → the 2 dirtied pages (2, 4) are dumped to S2
  • Increases the restoration downtime (by 42.5%)
SLIDE 38

New abstraction: file-offset based address mapping (FOAM)

  • Flat address-space representation for the snapshot
    – One-to-one mapping between the address space and the snapshot
    – No explicit lookups for pages across snapshots
    – A few map operations map the entire snapshot into the address space
  • Uses a sparse-file representation
    – Relies on the concept of holes, supported by modern file systems
  • Simplifies incremental checkpoint and on-demand restore
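The FOAM layout can be sketched as follows, assuming a file-backed snapshot (the helper name `dump_page` is illustrative): a page's file offset equals its offset in the address space, so locating a page is pure arithmetic, unwritten ranges stay as file-system holes, and an incremental pass simply overwrites pages in place instead of creating a new snapshot.

```python
import os
import tempfile

PAGE = 4096

# FOAM-style sketch: the page at relative virtual offset V is stored at
# file offset V, so the snapshot is a flat image of the address space.
fd, path = tempfile.mkstemp()

def dump_page(voff, data):
    os.pwrite(fd, data, voff)          # file offset == address-space offset

dump_page(0 * PAGE, b"A" * PAGE)       # 1st iteration: pages 0 and 3
dump_page(3 * PAGE, b"D" * PAGE)       # offsets 1-2 are never written: holes
dump_page(3 * PAGE, b"Y" * PAGE)       # 2nd iteration: dirtied page 3 is
                                       # overwritten in place, no lookup

assert os.fstat(fd).st_size == 4 * PAGE        # flat address-space image
assert os.pread(fd, 1, 1 * PAGE) == b"\x00"    # a hole reads back as zeros
assert os.pread(fd, 1, 3 * PAGE) == b"Y"       # newest version wins in place
os.close(fd)
os.remove(path)
```

Because there is only ever one file with one-to-one offsets, on-demand restore reduces to mapping this single file, which is why FOAM makes the two earlier techniques compose.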
SLIDE 39

Techniques to decrease the downtime (checkpoint, restore, in-kernel switch):

  1) Incremental checkpoint
  2) On-demand restore
  3) FOAM: a snapshot abstraction
  4) PPP: reuse memory without an explicit dump

SLIDE 46

Redundant data copy

[Diagram: running → checkpoint → in-kernel switch → restore → running. Memcached's pages (1-4) in RAM are dumped to snapshot S1 under the old OS, then read back into RAM under the new OS]

  • Application C/R copies data back and forth (dump data at checkpoint, read data at restore)
  • Not a good fit for applications with huge memory

Is it possible to avoid the memory copy?

SLIDE 52

Avoid redundant data copy across reboot

[Diagram: Memcached's pages (1-4) stay in RAM; the old OS reserves the memory at checkpoint, the new OS reserves the same memory after the in-kernel switch, and the restored application implicitly maps the region, so the memory is in use again without any copy]

  • Reserve the application's memory across reboot
  • Inherently rebind the memory without any copy

Challenge: how to notify the newer OS without modifying its source?

SLIDE 54

Persist physical pages (PPP) without OS modification

  • Reserve the virtual-to-physical mapping information
    – Static instrumentation of the OS binary
    – Inject our own memory-reservation function, then continue booting the OS
  • Handle page faults for the restored application
    – Dynamic kernel instrumentation
    – Inject our own page-fault handler function for memory binding
  • No explicit memory copy
  • Does not require any kernel source modification
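A toy model of the PPP idea (purely illustrative, not kernel code — all names here are invented for the sketch): physical frames survive the in-kernel switch untouched, and the injected page-fault handler rebinds each reserved frame to the restored process on first access instead of copying it.

```python
# Toy model of PPP: "physical" frames survive the kexec reboot, and the
# injected fault handler rebinds each reserved frame on first touch.

phys = {0: b"P0", 1: b"P1", 2: b"P2", 3: b"P3"}          # RAM: frame -> data
reserved = {0x1000: 0, 0x2000: 1, 0x3000: 2, 0x4000: 3}  # vaddr -> frame

class RestoredProcess:
    def __init__(self, reservation):
        self.reservation = reservation
        self.page_table = {}                  # empty right after restore

    def access(self, vaddr):
        if vaddr not in self.page_table:      # page fault on first touch
            # Injected handler: bind the preserved frame; never copy it.
            self.page_table[vaddr] = self.reservation[vaddr]
        return phys[self.page_table[vaddr]]

# After the in-kernel switch, `phys` and `reserved` are intact.
proc = RestoredProcess(reserved)
assert proc.access(0x2000) == b"P1"   # first touch faults and rebinds
assert len(proc.page_table) == 1      # only accessed pages are bound
```

The model captures the two PPP pieces: the reservation table is what the static instrumentation preserves across the reboot, and the fault-time rebinding is what the dynamic instrumentation injects.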
SLIDE 55

Implementation

  • Application C/R → CRIU
    – Works at the namespace level
  • In-kernel switch → the kexec system call
    – A mini boot loader that bypasses the BIOS while booting
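The cycle can be sketched as the following command sequence, built here as a dry run since actually executing it requires root and reboots the machine; the PID, paths, and image directory are placeholder assumptions.

```python
# Sketch of KUP's update cycle as criu/kexec invocations, constructed
# but not executed (running them needs root and reboots the machine).

def kup_update_cmds(pid, images_dir, kernel, initrd):
    return [
        # 1) Checkpoint the application tree with CRIU.
        ["criu", "dump", "-t", str(pid), "-D", images_dir, "--shell-job"],
        # 2) Load the new kernel and switch to it with kexec,
        #    skipping the BIOS/firmware stage of a full reboot.
        ["kexec", "-l", kernel, "--initrd=" + initrd, "--reuse-cmdline"],
        ["kexec", "-e"],
        # 3) Once the new kernel is up, restore the application.
        ["criu", "restore", "-D", images_dir, "--shell-job"],
    ]

cmds = kup_update_cmds(4242, "/var/lib/kup/images",
                       "/boot/vmlinuz-new", "/boot/initrd-new")
assert cmds[0][0] == "criu" and cmds[2] == ["kexec", "-e"]
```

In a real deployment these steps would be driven by an orchestrator (e.g., via `subprocess.run`) with the restore step arranged to run on first boot of the new kernel.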

SLIDE 56

Evaluation

  • How effective is KUP's approach compared to in-kernel hot patching?
  • What is the effective performance of each technique during the update?

SLIDE 57

KUP can support major and minor updates in Ubuntu

  • KUP supports 23 minor and 4 major updates (v3.17–v4.1)
  • However, kpatch can only update 2 versions
    – kpatch failure scenarios include, e.g., layout changes in data structures

SLIDE 67

Updating OS with memcached

  • PPP has the least degradation
  • Storage also affects the performance

[Chart: memcached bandwidth (MB) over a 190-250 sec timeline during the update, comparing Basic, Incremental checkpoint, On-demand restore, and FOAM on both SSD and RP-RAMFS, plus PPP]
SLIDE 68

Limitations

  • KUP does not support checkpoint-and-restore of all socket implementations
    – TCP, UDP, and netlink are supported
  • Restoration can fail
    – e.g., when a system call is removed or its interface is modified
SLIDE 69

Demo

SLIDE 71

Summary

  • KUP: a simple update mechanism with application checkpoint-and-restore (C/R)
  • Employs various techniques:
    – A new data abstraction for application C/R
    – A fast in-kernel switching technique
    – A simple mechanism to persist the memory

Thank you!

SLIDE 72

Backup Slides

SLIDE 73

Handling in-kernel states

  • Handles namespaces and cgroups
  • The ptrace() syscall handles blocking system calls, timers, registers, etc.
  • Parasite code fetches/puts the application's state
  • The /proc file system exposes the information required for application C/R
  • A new mode (TCP_REPAIR) allows handling TCP connections

SLIDE 74

What cannot be checkpointed

  • X11 applications
  • Tasks with debugger attached
  • Tasks running in compat mode (32 bit)
SLIDE 75

Possible changes after application C/R

  • Per-task statistics
  • Namespace IDs
  • Process start time
  • Mount point IDs
  • Socket IDs (st_ino)
  • VDSO
SLIDE 76

Suitable applications

  • Suitable for all kinds of applications
  • The PPP approach supports all types of applications
    – But it may fail to restore on the previous kernel
  • FOAM is not a good candidate for write-intensive applications
    – But it gives more confidence in safely restoring the application on the previous kernel

SLIDE 79

PPP works effectively

[Chart: downtime (sec, 10-90) vs. working-set size (GB, 8-72) with 50% writes, comparing FOAM - SSD, FOAM - RP-RAMFS (hits an out-of-memory error at large WSS), and PPP]

  • FOAM on SSD → slow
  • FOAM on RP-RAMFS → space inefficient