Dynamic and Adaptive Updates of Non-Quiescent Subsystems in Commodity OS Kernels

Kristis Makris <kristis.makris@asu.edu>, Arizona State University
Kyung Dong Ryu <kryu@us.ibm.com>, IBM T.J. Watson Research Center

March 23, 2007


SLIDE 1

March 23, 2007 DynAMOS -- EuroSys '07

Dynamic and Adaptive Updates of Non-Quiescent Subsystems in Commodity OS Kernels

Kristis Makris <kristis.makris@asu.edu>, Arizona State University
Kyung Dong Ryu <kryu@us.ibm.com>, IBM T.J. Watson Research Center

SLIDE 2

Overview

• Motivation
• Dynamic Kernel Updates Categorization
• System Architecture
• Adaptive Function Cloning
• Synchronized Updates
• Applications
• Conclusion

SLIDE 3

Motivation

• Dynamic kernel updates are essential
• Existing updating methods are inadequate
• Two approaches
  – Build adaptable OS
    ▪ Specially crafted (K42, VINO, Synthetix)
    ▪ Require OS and application restructuring
  – Dynamic code instrumentation
    ▪ No kernel source modification (KernInst, GILK)
    ▪ Basic block code interposition
    ▪ Currently limited
      – No procedure replacement
      – No autonomous kernel adaptability
      – No safe, complete subsystem update guarantees

SLIDE 4

Dynamic Updates Categorization (1)

• Updating variable values
  – Update an entry in system call table
  – Update owner (uid) of an inode
    ▪ Needs synchronized update
  – Count number of system calls of a process
    ▪ Needs state tracking
• Updating datatypes
  – Add new fields in Linux PCB for process checkpointing
    ▪ Update all functions that use the old datatype, or
    ▪ Maintain new fields in separate data structure
      – Does not need state transfer

SLIDE 5

Dynamic Updates Categorization (2)

• Updating single function
  – Correct a defect
• Updating kernel threads
  – Update memory paging subsystem
    ▪ Needs update during infinite loop
• Updating function groups
  – Update pipefs subsystem
    ▪ Needs synchronized update

SLIDE 6

Our Approach

• DynAMOS
  – Prototype for i386 Linux 2.2-2.6
• Dynamic code instrumentation
  – No kernel source modification or reboot
  – Procedure replacement
• Adaptive updates
  – Concurrent execution of multiple versions
  – State tracking
  – Autonomous kernel adaptability
• Safe updates of complete subsystems
  – Quiescence detection
  – Update synchronization (non-quiescent subsystems)
  – Datatype updates
  – State transfer

SLIDE 7

DynAMOS System Architecture

[Diagram labels: update source; gcc; object file; ld; vmlinux; kernel source; make; insert module; new function images; original function images; unmodified kernel in memory]

SLIDE 8

DynAMOS System Architecture

[Diagram labels: DynAMOS kernel module; load DynAMOS; new function images; original function images; unmodified kernel in memory]

SLIDE 9

DynAMOS System Architecture

[Diagram labels: Update tool; /dev/dynamos; DynAMOS kernel module; version manager; initiate update; new function images; original function images; unmodified kernel in memory]

SLIDE 10

DynAMOS System Architecture

[Diagram labels: Update tool; /dev/dynamos; DynAMOS kernel module; version manager; disassembler; image relocation; prepare update; copy; new function images; cloned new function images; original function images; unmodified kernel in memory]

SLIDE 11

DynAMOS System Architecture

[Diagram labels: Update tool; /dev/dynamos; DynAMOS kernel module; version manager; new function images; cloned new function images; original function images; unmodified kernel in memory]

SLIDE 12

DynAMOS System Architecture

[Diagram labels: Update tool; /dev/dynamos; DynAMOS kernel module; version manager; activate update; redirection; new function images; cloned new function images; original function images; unmodified kernel in memory]

SLIDE 13

Execution Flow Redirection

Step 1

• Apply Linger-Longer scheduler
  – Unobtrusive fine-grain cycle stealing
  – Implemented in schedule_LL as a scheduling policy

[Diagram labels: caller (... call schedule ...); schedule]

SLIDE 14

Execution Flow Redirection

Step 2

• Trampoline installation
  – Disable processor interrupts
  – Flush I-cache
• Indirect jump
  – Don’t modify page permissions

[Diagram labels: caller (... call schedule ...); schedule; trampoline (jmp *); redirection handler]

SLIDE 15

Execution Flow Redirection

Step 2

• Bookkeeping
  – Maintain use counters
• User-defined adaptation handler
  – Execute if available
  – Select active version of function

[Diagram labels: caller (... call schedule ...); schedule; trampoline; redirection handler (preserve state, perform bookkeeping, execute adaptation handler, restore state); adaptation handler (call, ret)]

SLIDE 16

Execution Flow Redirection

Step 3

• Jump to active function

[Diagram labels: caller (... call schedule ...); schedule; trampoline (jmp *); redirection handler; adaptation handler; schedule_clone; schedule_LL_clone]

SLIDE 17

Execution Flow Redirection

Step 4

• Jump back

[Diagram labels: caller (... call schedule ...); schedule; trampoline (jmp *); redirection handler; adaptation handler; jump to active function; schedule_clone; schedule_LL_clone; jump back]

SLIDE 18

Execution Flow Redirection

Step 5

• Preserve state, perform bookkeeping, restore state
• ret: return to caller

[Diagram labels: caller (... call schedule ...); schedule; trampoline; redirection handler; adaptation handler; jump to active function; schedule_clone; schedule_LL_clone; jump back]

SLIDE 19

Adaptive Function Cloning Benefits

• No processor state saved on stack
  – Function arguments accessed directly
• Autonomous kernel determination of update timeliness
  – Using adaptation handler
• Function-level updates
  – Basic blocks can be bypassed (no control-flow graph needed)
  – Function modifications developed in original source language

SLIDE 20

Function Relocation Issues

• Replace ret (1-byte) with jmp * (6-byte) back to handler
  – Adjust inbound (jmp) and outbound (call) relative offsets
• Safely detect
  – Backward branches: jmp to code overwritten by trampoline
  – Outbound branches: jmp to code outside function image
  – Indirect outbound branches: jmp * from indirection table
  – Data-in-code
    ▪ Need user verification
  – Multiple entry-points: e.g. produced by Intel C Compiler

SLIDE 21

Performance

• Small memory footprint (42k)
• Indirect addressing (jmp *) hurts branch prediction
  – Can use direct addressing (jmp)
  – Overhead not correlated to path length
  – Mostly 1-8%

SLIDE 22

Quiescence Detection

• Needed to
  – Atomically update function groups
    ▪ e.g. Count number of processes using a filesystem
  – Safely reverse updates
• Implemented by
  – Usage counters
    ▪ On entry and exit
  – Stack walk-through
    ▪ For non-returning calls (do_exit in Linux; no ret instruction)
    ▪ Examine stack and program counter of all processes
    ▪ Default kernel compilation (works without frame pointers)

SLIDE 23

Non-quiescent Subsystems

Adaptively enlarge pipefs 4k copy buffer during large data transfers

pipe_read() {
    ...
    acquire Sem
    while (buffer_empty) {
        ...
        release Sem
    L1: sleep                  /* wait for new data in buffer */
        acquire Sem
    }
    read from data buffer
    release Sem
    return
}

pipe_write() {
    ...
    acquire Sem
    while (buffer_full) {
        ...
        release Sem
    L2: sleep                  /* wait for more room in buffer */
        acquire Sem
    }
    write in data buffer
    release Sem
    return
}

Reader and writer are synchronized with each other

SLIDE 24

Non-quiescent Subsystems

[Same pipe_read() and pipe_write() listing as on the previous slide, with regions annotated "quiescent" and "non-quiescent; sleeping"]

• Subsystem may never quiesce
• Cannot update atomically

SLIDE 25

Synchronized update of pipefs

Phase 1

pipe_read() {
    acquire Sem
    while (4k_buffer_empty) {
        release Sem
    L1: sleep
        acquire Sem
    }
    read data from 4k_buffer
    release Sem
    return
}

pipe_read_v3() {
    acquire Sem
    while (1mb_buffer_empty) {
        release Sem
    L1: sleep
        acquire Sem
    }
    read data from 1mb_buffer
    release Sem
    return
}

SLIDE 26

Synchronized update of pipefs

Phase 2

• Semantically equivalent version at sou…
• Wait for pipe_read to become inactive

[pipe_read() and pipe_read_v3() as on the Phase 1 slide]

pipe_read_v2() {
    acquire Sem
    while (4k_buffer_empty) {
        release Sem
    L1: sleep
        acquire Sem
        if (must_update) {
            phase = 3
            STATE TRANSFER
            goto new
        }
    }
    read data from 4k_buffer
    release Sem
    return
new:
}

SLIDE 27

Synchronized update of pipefs

Phase 2

Inline updated version

[pipe_read() and pipe_read_v3() as on the Phase 1 slide]

pipe_read_v2() {
    acquire Sem
    while (4k_buffer_empty) {
        release Sem
    L1: sleep
        acquire Sem
        if (must_update) {
            phase = 3
            STATE TRANSFER
            goto new
        }
    }
    read data from 4k_buffer
    release Sem
    return

    while (1mb_buffer_empty) {
        release Sem
        sleep
        acquire Sem
new:
    }
    read data from 1mb_buffer
    release Sem
    return
}

SLIDE 28

Synchronized update of pipefs

Phase 3

[pipe_read(), pipe_read_v2() with the updated loop inlined, and pipe_read_v3() as on the Phase 2 slides]

SLIDE 29

Synchronized update of pipefs

Phase 3

[pipe_read_v2() and pipe_read_v3() as on the previous slides]

pipe_read_adaptation_handler() {
    if (phase == 3)
        activate pipe_read_v3
    else
        activate pipe_read_v2
    if (this process read more than 64k)
        must_update = 1
}

• Sleep in original version
• Awake in new version
• Multi-phase approach
• Adaptive update
• 30-90% improvement in Linux 2.6
• 3.2% overhead when not adapting

SLIDE 30

Adaptive Memory Paging For Efficient Gang Scheduling

• Kernel thread update (kswapd), Linux 2.2
  – Infinite loop
  – Awakened by other subsystems
  – Goes back to sleep
    ▪ e.g. calls interruptible_sleep_on in Linux
• To update
  – Activate interruptible_sleep_on_v2
    ▪ Save state, exit
    ▪ Start new version of kernel thread, restore state

SLIDE 31

Kernel-Assisted Process Checkpointing

• Datatype update for EPCKPT in Linux 2.4
  – Compact datatypes in commodity kernel. No extra room
    ▪ struct task_struct: semaphores, pipes, memory mapped files
    ▪ struct file: checkpoint filename
• Shadow data structures
  – Instantiation (do_fork, sys_open): map memory address of original variable to shadow using hash table
  – Removal (do_exit, fput): free shadow too
  – Already instantiated variables
    ▪ Shadow missing: idempotent use of new fields
  – Update only functions that use new fields
    ▪ No state transfer needed

SLIDE 32

Related Work

• K42
  – Specially designed with hot-swappable capabilities
  – Guarantees quiescence
• Ginseng
  – User-level software updates; requires recompilation
• KernInst, GILK, Detours, ATOM, EEL
  – Do not facilitate adaptive execution
  – Do not safely replace complete subsystems

SLIDE 33

On-going and Future Work

• Automatically produce updates given a patch
  – Apply MOSIX, Superpages: parallel applications
  – Apply Nooks: OS reliability
  – Upgrade Linux kernel
• Multiprocessor support
  – Safely install trampoline: freeze other processors using single-byte trap instruction (ud2)
• Kernel module port
  – FreeBSD, OpenSolaris

SLIDE 34

Conclusion

• Dynamic Kernel Updates
  – Dynamic code instrumentation
  – Commodity operating system (prototype for i386 Linux 2.2-2.6)
• Adaptive function cloning
  – Concurrent execution of multiple function versions
• Safe updates of non-quiescent subsystems
  – Scheduler, kernel threads, synchronized updates
• Datatype updates
• Demonstrated updates
  – Synchronized pipefs adaptation, process checkpointing, adaptive memory paging for efficient gang-scheduling, unobtrusive fine-grain cycle stealing, public security fixes
• Small memory footprint (42k), 1-8% overhead

SLIDE 35

Björn's questions

• How to handle false positives produced by “stack walk-through” approach?
• Datatype updates: is it possible to add new fields in the middle of a struct or only at the end?
• I didn't understand why they need indirect addressing in the trampoline.