SE350: Operating Systems, Lecture 5: Multithreaded Kernels

SLIDE 1

SE350: Operating Systems

Lecture 5: Multithreaded Kernels

SLIDE 2

Outline

  • Use cases for multithreaded programs
  • Kernel vs. user-mode threads
  • Concurrency’s problems
SLIDE 3

Recall: Why Processes & Threads?

Goals:

  • Multiprogramming: Run multiple applications concurrently
  • Protection: Don’t want bad applications to crash system!

Solution:

  • Process: unit of execution and allocation
  • Virtual Machine abstraction: give process illusion it owns machine (i.e., CPU, memory, and I/O device multiplexing)

Challenge:

  • Process creation & switching expensive
  • Need concurrency within same app (e.g., web server)

Solution:

  • Thread: Decouple allocation and execution
  • Run multiple threads within same process

SLIDE 4

Multithreaded Processes

  • PCBs could point to multiple TCBs
  • Switching threads within one block is simple thread switch
  • Switching threads across blocks requires changes to memory and I/O address tables

SLIDE 5

Example Multithreaded Programs

  • Embedded systems
    • Elevators, planes, medical systems, smart watches
    • Single program, concurrent operations
  • Most modern OS kernels
    • Internally concurrent to deal with concurrent requests by multiple users/applications
    • But no protection needed within kernel
  • Database servers
    • Access to shared data by many concurrent users
    • Also background utility processing must be done
SLIDE 6

Example Multithreaded Programs (cont.)

  • Network servers
    • Concurrent requests from network
    • Again, single program, multiple concurrent operations
    • File server, web server, and airline reservation systems
  • Parallel programming (more than one physical CPU)
    • Split program into multiple threads for parallelism
    • This is called multiprocessing
  • Some multiprocessors are actually uniprogrammed
    • Multiple threads in one address space but one program at a time
SLIDE 7

A Typical Use Case

Client Browser:

  • fork process for each tab
  • create thread to render page
  • run GET in separate thread
  • spawn multiple outstanding GETs
  • as they complete, render portion

Web Server:

  • fork process for each client connection
  • create threads to get request and issue response
  • create threads to read data, access DB, etc.
  • join and respond

SLIDE 8

Kernel Use Cases

  • Thread for each user process
  • Thread for sequence of steps in processing I/O
  • Threads for device drivers
SLIDE 9

Device Drivers

  • Device-specific code in kernel that interacts directly with device hardware
  • Supports standard, internal interface
    • Same kernel I/O system can interact easily with different device drivers
    • Special device-specific configuration supported with ioctl() syscall
  • Device drivers are typically divided into two pieces
  • Top half: accessed in call path from system calls
    • Implements a set of standard, cross-device calls like open(), close(), read(), write(), ioctl(), etc.
    • This is kernel’s interface to device driver
    • Top half will start I/O to device, may put thread to sleep until finished
  • Bottom half: runs as interrupt routine
    • Gets input or transfers next block of output
    • May wake sleeping threads if I/O now complete
SLIDE 10

Life Cycle of An I/O Request

[Figure: life cycle of an I/O request, from User Program through Kernel I/O Subsystem, Device Driver Top Half, Device Driver Bottom Half, to Device Hardware, and back]

SLIDE 11

Multithreaded Kernel

  • User programs use syscalls to create, join, yield, exit threads
  • Kernel handles scheduling and context switching
  • Simple, but a lot of transitions between user and kernel mode

[Figure: kernel address space with kernel threads (TCBs and stacks, sharing kernel heap/code/globals) alongside user-level Processes 1 and 2, each with a PCB and per-thread TCBs and stacks]

SLIDE 12

Kernel vs. User-Mode Threads

  • We have been talking about kernel-supported threads
    • Each user-level thread maps to one kernel thread
    • Every thread can run or block independently
    • One process may have several threads waiting on different events
    • Examples: Windows, Linux
  • Downside of kernel-supported threads: a bit expensive
    • Need to cross into kernel mode to schedule
  • Solution: user-supported threads
SLIDE 13

Basic Cost of System Calls

  • Min syscall has ~ 25x cost of function call
  • Scheduling could be many times more
  • Streamline system processing as much as possible
  • Other optimizations seek to process as much of syscall in user space as possible (e.g., Linux vDSO)

SLIDE 14

User-Mode Threads

  • Lighter-weight option
    • Many user-level threads are mapped to single kernel thread
    • User program provides scheduler and thread package
    • Examples: Solaris Green Threads, GNU Portable Threads
  • Downside of user-mode threads
    • Multiple threads may not run in parallel on multicore
    • When one thread blocks on I/O, all threads block
  • Option: Scheduler Activations
    • Have kernel inform user level when thread blocks …
SLIDE 15

Classification

  • Most operating systems have either
  • One or many address spaces
  • One or many threads per address space

  • One address space, one thread: MS/DOS, early Macintosh
  • One address space, many threads: Embedded systems (Geoworks, VxWorks, JavaOS, Pilot(PC), etc.)
  • Many address spaces, one thread per address space: Traditional UNIX
  • Many address spaces, many threads: Mach, OS/2, Linux, Windows 10, Win NT to XP, Solaris, HP-UX, OS X

SLIDE 16

Putting it Together: Process

A (Unix) process includes:

  • Memory
  • I/O state (e.g., file, socket contexts)
  • CPU state (PC, SP, registers…)
  • Sequential stream of instructions

  A(int tmp) {
    if (tmp < 2)
      B();
    printf(tmp);
  }
  B() { C(); }
  C() { A(2); }
  A(1);
  …

[Figure: (Unix) process resources, with the stack for this call chain; process state stored in OS]

SLIDE 17

Putting it Together: Processes

  • Switch overhead: high
  • CPU state: low
  • Memory/IO state: high
  • Process creation: high
  • Protection
    • CPU: yes
    • Memory/IO: yes
  • Sharing overhead: high (involves at least one context switch)

[Figure: N processes, each with its own CPU state, memory, and I/O state; the OS’s CPU scheduler runs one process at a time on a single core]

SLIDE 18

Putting it Together: Threads

  • Switch overhead: medium
  • CPU state: low
  • Thread creation: medium
  • Protection
    • CPU: yes
    • Memory/IO: no
  • Sharing overhead: low(ish) (thread switch overhead low)

[Figure: N processes, each with shared memory and I/O state and multiple threads (one CPU state each); the OS’s CPU scheduler runs one thread at a time on a single core]

SLIDE 19

Putting it Together: Multi-Cores

  • Switch overhead: low (only CPU state)
  • Thread creation: low
  • Protection
    • CPU: yes
    • Memory/IO: no
  • Sharing overhead: low (thread switch overhead low, may not need to switch at all!)

[Figure: same processes and threads, now on four cores; the CPU scheduler runs four threads at a time]

SLIDE 20

Hyperthreading

  • Superscalar processors can execute multiple instructions that are independent
  • Multiprocessors can execute multiple independent threads
  • Fine-grained multithreading executes two independent threads by switching between them
  • Hyperthreading duplicates register state to make second (hardware) “thread” (virtual core)
  • From OS’s point of view, virtual cores are separate CPUs
  • OS can schedule as many threads at a time as there are virtual cores (but, sub-linear speedup!)
  • See: http://www.cs.washington.edu/research/smt/index.html

[Figure: execution timelines (in cycles) for superscalar, multiprocessor, fine-grained multithreading, and simultaneous multithreading architectures; colored blocks show instructions executed by Thread 1 and Thread 2]

SLIDE 21

Putting it Together: Hyperthreading

  • Switch overhead between hardware threads: very low (done in hardware)
  • Contention for ALUs/FPUs may hurt performance

[Figure: four physical cores (PCores), each with two hardware threads (VCores); the CPU scheduler runs eight threads at a time]

SLIDE 22

Recall: Thread Abstraction

  • Illusion: Infinite number of processors
  • Each thread runs on dedicated virtual processor
  • Reality: few processors, multiple threads running at variable speed
  • To map arbitrary set of threads to fixed set of cores, kernel implements scheduler

[Figure: programmer abstraction of one dedicated processor per thread vs. physical reality of two processors multiplexed among running and ready threads]

SLIDE 23

Programmer vs. Processor View

Programmer’s View:

    . . .
    x = x + 1;
    y = y + x;
    z = x + 5y;
    . . .

  • Possible Execution #1: the thread runs the three statements back to back:
    x = x + 1; y = y + x; z = x + 5y;
  • Possible Execution #2: thread is suspended after x = x + 1; other thread(s) run; thread is resumed and runs y = y + x; z = x + 5y;
  • Possible Execution #3: thread is suspended after x = x + 1; y = y + x; other thread(s) run; thread is resumed and runs z = x + 5y;

SLIDE 24

Possible Interleavings

[Figure: three different executions of Threads 1, 2, and 3, each interleaving the threads’ instructions in a different order]

SLIDE 25

Correctness with Concurrent Threads

  • If threads can be scheduled in any way, programs must work under all conditions
    • Can you test for this?
    • How can you know if your program works?
  • Independent Threads
    • No state shared with other threads
    • Deterministic ⇒ Input state determines results
    • Reproducible ⇒ Can recreate starting conditions, I/O
    • Scheduling order doesn’t matter (if switch() works!!!)
  • Cooperating Threads
    • Shared state between multiple threads
    • Non-deterministic
    • Non-reproducible
  • Non-deterministic and non-reproducible means that bugs can be intermittent
    • Sometimes called “Heisenbugs”
SLIDE 26

Interactions Complicate Debugging

  • Is any program truly independent?
    • Every process shares file system, OS resources, network, etc.
    • E.g., buggy device driver makes thread 1 crash “independent” thread 2
  • Non-deterministic errors are extremely difficult to find
    • E.g., memory layout of kernel + user programs
    • Depends on scheduling, which depends on timer/other things
  • Original UNIX had a bunch of non-deterministic errors
    • E.g., something which does interesting I/O
    • User typing of letters used to help generate secure keys
SLIDE 27

Why Allow Cooperating Threads?

  • Advantage 1: Sharing resources
    • One computer, many users
    • One bank balance, many ATMs
      • What if ATMs were only updated at night?
    • Embedded systems (robot control: coordinate arm & hand)
  • Advantage 2: Speedup
    • Overlap I/O and computation
      • Many different file systems do read-ahead
    • Multiprocessors: chop up program into parallel pieces
  • Advantage 3: Modularity
    • More important than you might think
    • Chop large problem up into simpler pieces
SLIDE 28

High-level Example: Web Server

  • Server must handle many requests
  • Non-cooperating version:

  serverLoop() {
    connection = AcceptCon();
    fork(ServiceWebPage, connection);
  }

  • What are some disadvantages of this technique?
SLIDE 29

Threaded Web Server

  • Instead, use single process
  • Multithreaded (cooperating) version:

  serverLoop() {
    connection = AcceptCon();
    thread_create(ServiceWebPage, connection);
  }

  • Looks almost the same, but has many advantages
  • Can share file caches kept in memory
  • Threads are cheaper to create than processes (lower per-request overhead)
  • What about Denial of Service (DoS) attacks?
SLIDE 30

Thread Pools

  • Problem with previous version: unbounded number of threads
    • When web-site becomes too popular, throughput sinks
  • Instead, allocate bounded “pool” of worker threads, representing maximum level of multiprogramming

  master() {
    allocThreads(worker, queue);
    while (TRUE) {
      con = AcceptCon();
      Enqueue(queue, con);
      wakeUp(queue);
    }
  }

  worker(queue) {
    while (TRUE) {
      con = Dequeue(queue);
      if (con == null)
        sleepOn(queue);
      else
        ServiceWebPage(con);
    }
  }

[Figure: client requests flow to the master thread, which enqueues connections on a queue; worker threads in the pool dequeue them and send responses]

SLIDE 31

ATM Bank Server

  • ATM server requirements:
  • Service a set of requests
  • Do so without corrupting database
  • Don’t hand out too much money
SLIDE 32

ATM bank server example

  • Suppose we wanted to implement server process to handle requests from ATM network

  BankServer() {
    while (TRUE) {
      ReceiveRequest(&op, &acctId, &amount);
      ProcessRequest(op, acctId, amount);
    }
  }

  ProcessRequest(op, acctId, amount) {
    if (op == deposit) Deposit(acctId, amount);
    else if …
  }

  Deposit(acctId, amount) {
    acct = GetAccount(acctId);  /* may use disk I/O */
    acct->balance += amount;
    StoreAccount(acct);         /* involves disk I/O */
  }

  • How could we speed this up?
  • More than one request being processed at once
  • Event driven (overlap computation and I/O)
  • Multiple threads (multi-proc, or overlap comp and I/O)
SLIDE 33

Can Threads Make This Easier?

  • Threads yield overlapped I/O and computation without having to “deconstruct” code into non-blocking fragments
  • One thread per request
  • Request proceeds to completion, blocking as required

  Deposit(acctId, amount) {
    acct = GetAccount(acctId);  /* may use disk I/O */
    acct->balance += amount;
    StoreAccount(acct);         /* involves disk I/O */
  }

  • Unfortunately, shared state can get corrupted:

  Thread 1                        Thread 2
  load r1, acct->balance
                                  load r1, acct->balance
                                  add r1, amount2
                                  store r1, acct->balance
  add r1, amount1
  store r1, acct->balance

SLIDE 34

Problem Is At The Lowest Level

  • When threads work on separate data, order of scheduling does not change results

      Thread A: x = 1;              Thread B: y = 2;

  • Scheduling order matters when threads work on shared data

      Thread A: x = 1; x = y + 1;   Thread B: y = 2; y = y * 2;

  • What are possible values of x? (initially, y = 12)

      One interleaving: A: x = 1;  A: x = y + 1;  B: y = 2;  B: y = y * 2;  →  x = 13

SLIDE 35

Problem Is At The Lowest Level

  • When threads work on separate data, order of scheduling does not change results

      Thread A: x = 1;              Thread B: y = 2;

  • Scheduling order matters when threads work on shared data

      Thread A: x = 1; x = y + 1;   Thread B: y = 2; y = y * 2;

  • What are possible values of x? (initially, y = 12)

      Another interleaving: B: y = 2;  B: y = y * 2;  A: x = 1;  A: x = y + 1;  →  x = 5

SLIDE 36

Problem Is At The Lowest Level

  • When threads work on separate data, order of scheduling does not change results

      Thread A: x = 1;              Thread B: y = 2;

  • Scheduling order matters when threads work on shared data

      Thread A: x = 1; x = y + 1;   Thread B: y = 2; y = y * 2;

  • What are possible values of x? (initially, y = 12)

      Another interleaving: B: y = 2;  A: x = 1;  A: x = y + 1;  B: y = y * 2;  →  x = 3

SLIDE 37

Summary

  • Processes have two parts
    • Threads (Concurrency)
    • Address Spaces (Protection)
  • Various textbooks talk about processes
    • When this concerns concurrency, talking about thread portion
    • When this concerns protection, talking about address space portion
  • Concurrent threads are a very useful abstraction
    • Allow transparent overlapping of computation and I/O
    • Allow use of parallel processing when available
  • Concurrent threads introduce problems when accessing shared data
    • Programs must be insensitive to arbitrary interleavings
    • Without careful design, shared variables can become completely inconsistent
SLIDE 38

Questions?


SLIDE 39

Acknowledgment

  • Slides by courtesy of Anderson, Culler, Stoica, Silberschatz, Joseph, and Canny