Windows Operating System Internals - by David A. Solomon and Mark E. Russinovich with Andreas Polze

Unit OS B: Comparing the Linux and Windows Kernels


Roadmap for Section B

A Brief History of Windows and Linux
Comparing the Windows and Linux kernel architectures
Linux: becoming more like Windows
Benchmarks and other lies
What does the future hold?


Scope

We’re going to look at the technology of the kernels
We’re not going to look at:

Cost
Support
Applications
Management
Use as a desktop system


The History of Linux

The real history of Linux starts in 1969, when Ken Thompson developed the first version of UNIX at Bell Labs

After Dennis Ritchie, designer of the C programming language, joined the project, it debuted to the research community in an academic paper in 1974
Bell Labs released the first commercial version in 1976 as UNIX Version 6 (V6)

UNIX spread throughout universities and in 1978 Bell Labs released UNIX Time-Sharing System, a version with portability in mind


Linux History Continued

Because Bell Labs distributed UNIX with source code, the early 1980s saw three major branches grow on the UNIX tree:

UNIX System III from Bell Labs’ UNIX Support Group (USG)
UNIX Berkeley Software Distribution (BSD) from the University of California at Berkeley
Microsoft’s XENIX

The UNIX market fragmented further in the 1980s, despite the IEEE’s POSIX standard and the X/Open Group’s Portability Guide


Linus and Linux

In 1991 Linus Torvalds took a college computer science course that used the Minix operating system

Minix is a “toy” UNIX-like OS written by Andrew Tanenbaum as a learning workbench
Linus wanted to make Minix more usable, but Tanenbaum wanted to keep it ultra-simple

Linus went in his own direction and began working on Linux

In October 1991 he announced Linux v0.02
In March 1994 he released Linux v1.0


The History of Windows (NT)

The history of Windows really begins in the mid-1970s, when Dick Hustvedt, Peter Lipman and David Cutler designed the VMS operating system for Digital’s 32-bit VAX processor

Digital shipped VMS v1.0 in 1978

Cutler moved to Seattle to open DECWest and worked on the Digital Mica OS for a new CPU codenamed Prism

12 engineers went with him and the facility grew to 200
In 1988 Digital cancelled the project


The History of Windows Continued

Bill Gates wanted a UNIX rival

He hired Cutler and 20 Digital engineers in 1989
The new project was called NT OS/2 because it focused on OS/2 backward compatibility

With the success of Windows 3.0’s 1990 release, Gates refocused the project on Windows compatibility

The project was renamed Windows NT
Microsoft released Windows NT 3.1 in August 1993


Windows and Linux

Both Linux and Windows are based on foundations developed in the mid-1970s

[Timeline, 1970 to 2000, UNIX/Linux: UNIX born, UNIX public, UNIX V6, Linux v1.0, v2.0, v2.1, v2.2, v2.3, v2.4, v2.6]

[Timeline, 1970 to 2000, VMS/Windows: VMS v1.0, Windows NT 3.1, NT 4.0, Windows 2000, Windows XP, Server 2003]

Comparing the Architectures

Both Linux and Windows are monolithic

All core operating system services run in a shared address space in kernel-mode
All core operating system services are part of a single module

Linux: vmlinuz
Windows: ntoskrnl.exe

Windowing is handled differently:

Windows has a kernel-mode Windowing subsystem
Linux has a user-mode X-Windowing system


Kernel Architectures

[Diagram, Linux, split into User Mode and Kernel Mode. Labels: Application; X-Windows; System Services; Process Management, Memory Management, I/O Management, etc.; Device Drivers; Hardware Dependent Code]

[Diagram, Windows, split into User Mode and Kernel Mode. Labels: Application; System Services; Win32 Windowing; Process Management, Memory Management, I/O Management, etc.; Device Drivers; Hardware Dependent Code]


Linux Kernel

Linux is a monolithic but modular system

All kernel subsystems form a single piece of code with no protection between them

Modularity is supported in two ways:

Compile-time options
Most kernel components can be built as a dynamically loadable kernel module (DLKM)

DLKMs

Built separately from the main kernel
Loaded into the kernel at runtime and on demand (infrequently used components take up kernel memory only when needed)
Kernel modules can be upgraded incrementally
Support for minimal kernels that automatically adapt to the machine and load only those kernel components that are used


Windows Kernel

Windows is a monolithic but modular system

No protection among pieces of kernel code and drivers

Support for Modularity is somewhat weak:

Windows Drivers allow for dynamic extension of kernel functionality
Windows XP Embedded has special tools / packaging rules that allow coarse-grained configuration of the OS

Windows Drivers are dynamically loadable kernel modules

Significant amount of code runs as drivers (including network stacks such as TCP/IP and many services)
Built independently from the kernel
Can be loaded on-demand
Dependencies among drivers can be specified


Comparing Portability

Both Linux and Windows kernels are portable

Mainly written in C
Have been ported to a range of processor architectures

Windows

i486, MIPS, PowerPC, Alpha, IA-64, x86-64
Only x86-64 and IA-64 currently supported
> 64MB memory required

Linux

Alpha, ARM, ARM26, CRIS, H8300, i386, IA-64, M68000, MIPS, PA-RISC, PowerPC, S/390, SuperH, SPARC, VAX, v850, x86-64
DLKMs allow for minimal kernels for microcontrollers
> 4MB memory required


Comparing Layering, APIs, Complexity

Windows

Kernel exports about 250 system calls (accessed via ntdll.dll)
Layered Windows/POSIX subsystems
Rich Windows API (17 500 functions on top of native APIs)

Linux

Kernel supports about 200 different system calls
Layered BSD, Unix Sys V, POSIX shared system libraries
Compact APIs (1742 functions in Single Unix Specification Version 3; not including X Window APIs)


Comparing Architectures

Processes and scheduling
SMP support
Memory management
I/O
File Caching
Security


Process Management

Windows Process

Address space, handle table, statistics and at least one thread
No inherent parent/child relationship

Threads

Basic scheduling unit
Fibers - cooperative user-mode threads

Linux Process is called a Task

Basic address space, handle table, statistics
Parent/child relationship
Basic scheduling unit

Threads

No threads per se
Tasks can act like Windows threads by sharing handle table, PID and address space
PThreads – cooperative user-mode threads


Scheduling Priorities

Windows

Two scheduling classes

“Real time” (fixed) - priority 16-31
Dynamic - priority 1-15

Higher priorities are favored

Priorities of dynamic threads get boosted on wakeups
Thread priorities are never lowered below their base priority

Linux

Has 3 scheduling classes:

Normal – priority 100-139
Fixed Round Robin – priority 0-99
Fixed FIFO – priority 0-99

Lower priorities are favored

Priorities of normal threads go up (decay) as they use CPU
Priorities of interactive threads go down (boost)


Scheduling Priorities (cont)

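[Diagram, Windows priority scale: Fixed 16-31, Dynamic up to 15, arrow labeled I/O]

[Diagram, Linux priority scale up to 140: Fixed FIFO and Fixed Round-Robin up to 99, Normal from 100, arrows labeled CPU and I/O]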


Linux Scheduling Details

Most threads use a dynamic priority policy

Normal class - similar to the classic UNIX scheduler
A newly created thread starts with a base priority
Threads that block frequently (I/O bound) will have their priority gradually increased
Threads that always exhaust their time slice (CPU bound) will have their priority gradually decreased

“Nice value” sets a thread’s base priority

Larger values = less priority, lower values = higher priority
Valid nice values are in the range of -20 to +19
Nonprivileged users can only specify positive nice values (that is, lower their own priority)

Dynamic priority policy threads have static priority zero

Execute only when there are no runnable real-time threads
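
The relation between nice values and the normal priority range (100-139) can be sketched in a few lines. This is a minimal illustration that assumes the mapping priority = 120 + nice, with nice running from -20 to +19; the function name and constants are made up for the example and do not come from the slides.

```python
import os

MAX_RT_PRIO = 100             # assumption: priorities 0-99 belong to the fixed classes
NICE_MIN, NICE_MAX = -20, 19  # valid nice range on Linux


def nice_to_static_priority(nice: int) -> int:
    """Map a nice value to a priority in the normal range 100-139 (lower is favored)."""
    if not NICE_MIN <= nice <= NICE_MAX:
        raise ValueError(f"nice must be in [{NICE_MIN}, {NICE_MAX}], got {nice}")
    return MAX_RT_PRIO + 20 + nice


if __name__ == "__main__":
    # os.getpriority returns the nice value of the calling process on Unix.
    current = os.getpriority(os.PRIO_PROCESS, 0)
    print("nice", current, "-> priority", nice_to_static_priority(current))
```

A default process (nice 0) lands in the middle of the normal range, and the dynamic boost or decay described above moves the effective priority around that base.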


Real-Time Scheduling on Linux

Linux supports two static priority scheduling policies:

Round-robin and FIFO (first in, first out)

Selected with the sched_setscheduler() system call
Use static priority values in the range of 1 to 99
Executed strictly in order of decreasing static priority

FIFO policy lets a thread run to completion

Thread needs to indicate completion by calling sched_yield()

Round-robin lets threads run for up to one time slice

Then switches to the next thread with the same static priority

RT threads can easily starve lower-priority threads from executing

Root privileges or the CAP_SYS_NICE capability are required for the selection of a real-time scheduling policy

Long-running system calls can cause priority inversion

Same as in Windows; but compare RTLinux
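
The policies above can be inspected from user space. The following is a small sketch using Python's `os` wrappers around the Linux scheduling calls; the helper names `describe_policy` and `try_make_realtime` are hypothetical, and the attempt to switch policy is expected to fail without root or CAP_SYS_NICE.

```python
import os

POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",  # the normal, dynamic-priority class
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}


def describe_policy(policy: int) -> str:
    """Report the static priority range the kernel accepts for a policy."""
    lo = os.sched_get_priority_min(policy)
    hi = os.sched_get_priority_max(policy)
    return f"{POLICY_NAMES.get(policy, str(policy))}: static priority {lo}..{hi}"


def try_make_realtime(priority: int = 1) -> bool:
    """Try to move the calling process to SCHED_FIFO; return False if refused."""
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except OSError:
        # Usually PermissionError (EPERM) without root or CAP_SYS_NICE.
        return False


if __name__ == "__main__":
    for policy in POLICY_NAMES:
        print(describe_policy(policy))
    if try_make_realtime():
        # Drop back to the normal class so the demo does not keep RT priority.
        os.sched_setscheduler(0, os.SCHED_OTHER, os.sched_param(0))
        print("real-time policy was granted")
    else:
        print("real-time policy was refused")
```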


Windows Scheduling Details

Most threads run in variable priority levels

Priorities 1-15
A newly created thread starts with a base priority
Threads that complete I/O operations experience priority boosts (but never higher than 15)
A thread’s priority will never be below base priority

The Windows API function SetThreadPriority() sets the priority value for a specified thread

This value, together with the priority class of the thread's process, determines the thread's base priority level
Windows will dynamically adjust priorities for non-realtime threads
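
How the process priority class and the per-thread value combine can be modeled with a small lookup. This is a sketch, not Windows code: the string keys stand in for the Win32 constants, the numeric table reflects the commonly documented values and should be treated as an assumption, and `boosted` is a simplified model of the "never higher than 15" rule.

```python
# Base priority of a thread at relative priority NORMAL, per process priority class.
CLASS_BASE = {
    "IDLE": 4, "BELOW_NORMAL": 6, "NORMAL": 8,
    "ABOVE_NORMAL": 10, "HIGH": 13, "REALTIME": 24,
}
# Relative thread priorities as set through SetThreadPriority().
THREAD_OFFSET = {
    "LOWEST": -2, "BELOW_NORMAL": -1, "NORMAL": 0,
    "ABOVE_NORMAL": 1, "HIGHEST": 2,
}


def base_priority(priority_class: str, thread_priority: str) -> int:
    """Combine process class and thread priority into a base priority level."""
    realtime = priority_class == "REALTIME"
    if thread_priority == "IDLE":
        return 16 if realtime else 1
    if thread_priority == "TIME_CRITICAL":
        return 31 if realtime else 15
    return CLASS_BASE[priority_class] + THREAD_OFFSET[thread_priority]


def boosted(base: int, boost: int) -> int:
    """Apply a wakeup boost: dynamic threads are capped at 15, fixed ones never boosted."""
    if base >= 16:
        return base
    return min(15, base + boost)


if __name__ == "__main__":
    base = base_priority("NORMAL", "ABOVE_NORMAL")
    print("base", base, "after a boost of 2:", boosted(base, 2))
```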


Real-Time Scheduling on Windows

Windows supports static round-robin scheduling policy for threads with priorities in real-time range (16-31)

Threads run for up to one quantum
Quantum is reset to full turn on preemption
Priorities never get boosted

RT threads can starve important system services

Such as CSRSS.EXE
SeIncreaseBasePriorityPrivilege required to elevate a thread’s priority into real-time range (this privilege is assigned to members of Administrators group)

System calls and DPC/APC handling can cause priority inversion


Scheduling Timeslices

Windows

The thread timeslice (quantum) is 10ms-120ms

When quanta can vary, has one of 2 values

Reentrant and preemptible

[Diagram: Fixed: 120ms; variable: 20ms Background, 60ms Foreground]

Linux

The thread quantum is 10ms-200ms

Default is 100ms
Varies across entire range based on priority, which is based on interactivity level

Reentrant and preemptible

[Diagram: quantum scale from 10ms to 200ms with 100ms marked]


Multiprocessor Support

Windows

Supports symmetric multiprocessing (SMP)

Up to 32 processors on 32-bit Windows
Up to 64 processors on 64-bit Windows
All CPUs can take interrupts

Supports Non-Uniform Memory Access systems

Scheduler favors the node a thread prefers to run on
Memory manager tries to allocate memory on the node a thread prefers to run on

Supports Hyperthreading

Scheduler favors idle physical processors when it has a choice
Doesn’t count logical CPUs against licensing limits

Linux

Supports SMP

No upper CPU limit: set as kernel build constant
All CPUs can take interrupts

Supports Non-Uniform Memory Access systems

Scheduler favors the node a thread last ran on
Memory manager tries to allocate memory on the node a thread is running on

Supports Hyperthreading

Scheduler favors idle physical processors when it has a choice


Virtual Memory Management

Windows

32-bit versions split user-mode/kernel-mode from 2GB/2GB to 3GB/1GB

Demand-paged virtual memory

32 or 64-bits
Copy-on-write
Shared memory
Memory mapped files

[Diagram: 4GB address space with User below 2GB and System from 2GB to 4GB]

Linux

Splits user-mode/kernel-mode from 1GB/3GB to 3GB/1GB

2.6 has “4/4 split” option where kernel has its own address space

Demand-paged virtual memory

32-bits and/or 64-bits
Copy-on-write
Shared memory
Memory mapped files

[Diagram: 4GB address space with User below 3GB and System from 3GB to 4GB]


Physical Memory Management

Windows

Per-process working sets

Working set tuner adjusts sets according to memory needs using the “clock” algorithm

No “swapper”

[Diagram labels: Process LRU, Reused Page]

Linux

Global working set management uses “clock” algorithm
No “swapper” (the working set trimmer code is called the swap daemon, however)

[Diagram labels: LRU, Reused Page, Other Process LRU]
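
Both systems are described as using the “clock” algorithm. The sketch below shows the textbook second-chance version of that idea on a fixed set of frames; it is a teaching model, not either kernel's implementation, and the class name `ClockCache` and the choice to mark a freshly loaded page as referenced are assumptions of the example.

```python
class ClockCache:
    """Fixed number of page frames managed by the clock (second-chance) policy."""

    def __init__(self, frames: int):
        self.pages = [None] * frames        # page held by each frame
        self.referenced = [False] * frames  # per-frame reference bit
        self.where = {}                     # page -> frame index
        self.hand = 0                       # clock hand position

    def access(self, page) -> bool:
        """Touch a page. Returns True on a hit, False on a page fault."""
        if page in self.where:
            self.referenced[self.where[page]] = True
            return True
        # Sweep: clear reference bits until a frame without one is found.
        while self.referenced[self.hand]:
            self.referenced[self.hand] = False
            self.hand = (self.hand + 1) % len(self.pages)
        victim = self.pages[self.hand]
        if victim is not None:
            del self.where[victim]
        self.pages[self.hand] = page
        self.where[page] = self.hand
        self.referenced[self.hand] = True   # assumption: new pages start referenced
        self.hand = (self.hand + 1) % len(self.pages)
        return False


if __name__ == "__main__":
    cache = ClockCache(frames=3)
    faults = sum(not cache.access(p) for p in [1, 2, 3, 1, 4, 2, 5])
    print("faults:", faults, "resident:", cache.pages)
```

A per-process working set corresponds to one such structure per process, while global management corresponds to a single structure shared by all processes.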


I/O Management

Windows

Centered around the file object
Layered driver architecture throughout driver types
Most I/O supports asynchronous operation
Internal interrupt request level (IRQL) controls interruptability
Interrupts are split between an Interrupt Service Routine (ISR) and a Deferred Procedure Call (DPC)
Supports plug-and-play

Linux

Centered around the vnode
No layered I/O model
Most I/O is synchronous

Only sockets and direct disk I/O support asynchronous I/O

Internal interrupt request level (IRQL) controls interruptability
Interrupts are split between an ISR and soft IRQ or tasklet
Supports plug-and-play


File Caching

Windows

Single global common cache
Virtual file cache

Caching is at file vs. disk block level
Files are memory mapped into kernel memory

Cache allows for zero-copy file serving

[Diagram: File Cache, File System Driver, Disk Driver]

Linux

Single global common cache
Virtual file cache

Caching is at file vs. disk block level
Files are memory mapped into kernel memory

Cache allows for zero-copy file serving

[Diagram: File Cache, File System Driver, Disk Driver]


Security

Windows

Very flexible security model based on Access Control Lists
Users are defined with

Privileges
Member groups

Security can be applied to any Object Manager object

Files, processes, synchronization objects, …

Supports auditing

Linux

Two models:

Standard UNIX model
Access Control Lists (SELinux)

Users are defined with:

Capabilities (privileges)
Member groups

Security is implemented on an object-by-object basis

Has no built-in auditing support
Version 2.6 includes Linux Security Module framework for add-on security models


Monitoring - Linux procfs

Linux supports a number of special filesystems

Like special files, they are of a more dynamic nature and tend to have side effects when accessed

Prime example is procfs (mounted at /proc)

Provides access to and control over various aspects of Linux (e.g., scheduling and memory management)

/proc/meminfo contains detailed statistics on the current memory usage of Linux
Content changes as memory usage changes over time

Services for Unix implements procfs on Windows
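
Because /proc/meminfo is plain text, monitoring tools can read it like any other file. The following sketch parses its `Key: value kB` lines; the function name `parse_meminfo` is hypothetical, and the set of keys present varies by kernel version, so the example only looks up `MemTotal`.

```python
import os


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse 'Key:  value [kB]' lines; values given in kB are converted to bytes."""
    stats = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields:
            continue  # skip lines that do not follow the expected layout
        value = int(fields[0])
        if len(fields) > 1 and fields[1] == "kB":
            value *= 1024
        stats[key.strip()] = value
    return stats


if __name__ == "__main__":
    path = "/proc/meminfo"
    if os.path.exists(path):
        with open(path) as f:
            info = parse_meminfo(f.read())
        print("MemTotal bytes:", info.get("MemTotal"))
```

Reading the file twice a few seconds apart shows the dynamic nature described above: the content is generated by the kernel at read time.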


Windows’ Evolution Towards Linux

Services for Unix 3.5 - really targeted at POSIX, not Linux

POSIX threads, full POSIX subsystem (Interix)
X Window clients+server (X-Win32 LX)
NFS, NIS, PAM
proc-file system for Windows

Configurability / Module Management

Windows XP Embedded Target Designer/Component Designer/Component Management Database

Editions targeting new Application Domains

Windows Compute Cluster Server 2003

POSIX compatibility in Windows actually predates Linux and was one of the original design goals


Linux’s Evolution Towards Windows

I/O processing
Kernel reentrancy
Kernel preemptibility
Per-processor memory allocation
O(1) scheduler and per-CPU ready queues
Zero-Copy SendFile
Wake-One socket semantics
Asynchronous I/O
Light-weight synchronization


I/O Processing

Linux 2.2 had the notion of bottom halves (BH) for low-priority interrupt processing

Fixed number of BHs
Only one BH of a given type could be active on an SMP system

Linux 2.4 introduced tasklets, which are non-preemptible procedures called with interrupts enabled
Tasklets are the equivalent of Windows Deferred Procedure Calls (DPCs)


Kernel Reentrancy

Mark Russinovich’s April 1999 Windows NT Magazine article, “Linux and the Enterprise”, pointed out that much of the Linux 2.2 kernel was not reentrant

Ingo Molnar stated in rebuttal:

“his example is a clear red herring.”

A month later he made all major paths reentrant

[Diagram: kernel execution on cpu 1 and cpu 2, non-reentrant versus reentrant, showing the time saved]


Kernel Preemptibility

A preemptible kernel is more responsive to high-priority tasks

Through the base release of v2.4 Linux was only cooperatively preemptible

There are well-defined safe places where a thread running in the kernel can be preempted

The kernel is preemptible in v2.4 patches and v2.6
Windows NT has always been preemptible


Per-CPU Memory Allocation

Keeping accesses to memory localized to a CPU minimizes CPU cache thrashing

Cache thrashing hurts performance on enterprise SMP workloads

Linux 2.4 introduced per-CPU kernel memory buffers
Windows introduced per-CPU buffers in an NT 4 Service Pack in 1997

[Diagram: CPUs 0 and 1, each with its own buffer cache]


Scheduling

The Linux 2.4 scheduler is O(n)

If there are 10 active tasks, it scans 10 of them in a list in order to decide which should execute next
This means long scans and long durations under the scheduler lock

[Diagram: ready list holding tasks with priorities 103, 112, 112, 101; the highest-priority task is found by scanning the list]


Scheduling

Linux 2.6 has a revamped O(1) scheduler from Ingo Molnar that:

Calculates a task’s priority at the time it makes a scheduling decision
Has per-CPU ready queues where the tasks are pre-sorted by priority

[Diagram: per-priority queues holding tasks 101, 103, 112, 112; the next task comes from the highest-priority non-empty queue]
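
The idea of pre-sorted ready queues can be sketched as one FIFO queue per priority level plus a bitmap of non-empty levels. This is an illustrative model for a single CPU, not the kernel's code: the class name, the 140-level constant (taken from the priority ranges quoted earlier) and the use of a Python integer as the bitmap are assumptions. Picking the next task does not depend on how many tasks are ready.

```python
from collections import deque

NUM_PRIORITIES = 140  # 0-99 fixed, 100-139 normal; lower numbers are favored


class ReadyQueues:
    """One FIFO queue per priority level plus a bitmap of non-empty levels."""

    def __init__(self):
        self.queues = [deque() for _ in range(NUM_PRIORITIES)]
        self.bitmap = 0  # bit p is set when queues[p] is non-empty

    def enqueue(self, task, priority: int) -> None:
        self.queues[priority].append(task)
        self.bitmap |= 1 << priority

    def pick_next(self):
        """Return the next task from the best non-empty queue, or None if idle."""
        if self.bitmap == 0:
            return None
        # Index of the lowest set bit = numerically smallest ready priority.
        priority = (self.bitmap & -self.bitmap).bit_length() - 1
        queue = self.queues[priority]
        task = queue.popleft()
        if not queue:
            self.bitmap &= ~(1 << priority)
        return task


if __name__ == "__main__":
    rq = ReadyQueues()
    for name, prio in [("a", 103), ("b", 112), ("c", 112), ("d", 101)]:
        rq.enqueue(name, prio)
    print([rq.pick_next() for _ in range(4)])
```

By contrast, an O(n) scheduler would walk a single list of all ready tasks on every decision, which is what makes long ready lists expensive.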


Scheduling

Windows NT has always had an O(1) scheduler based on pre-sorted thread priority queues

Server 2003 introduced per-CPU ready queues

Linux load balances queues
Windows does not

Not seen as an issue in performance testing by Microsoft
Applications where it might be an issue are expected to use affinity


Zero-Copy Sendfile

Linux 2.2 introduced Sendfile to efficiently send file data over a socket

I pointed out that the initial implementation incurred a copy operation, even if the file data was cached

Linux 2.4 introduced zero-copy Sendfile
Windows NT pioneered zero-copy file sending with TransmitFile, the Sendfile equivalent, in Windows NT 4

[Diagram, 1-Copy: File Data Buffer, Network Adapter Buffer, Network Driver, Network]

[Diagram, 0-Copy: File Data Buffer, Network Driver, Network]
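
From user space, Sendfile is a single call that hands the kernel a file descriptor and a socket. The sketch below uses Python's `os.sendfile` wrapper on Linux; the helper name `send_file` is hypothetical, and whether the transfer is truly copy-free depends on the kernel version and the network path, which this example does not verify.

```python
import os
import socket
import tempfile


def send_file(sock: socket.socket, path: str) -> int:
    """Send a whole file over a socket with sendfile; returns bytes sent."""
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            # The kernel moves data from the file to the socket; the payload
            # is never read into a buffer owned by this process.
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:
                break
            sent += n
    return sent


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello from the file cache\n")
    left, right = socket.socketpair()
    print("sent", send_file(left, tmp.name), "bytes")
    left.close()
    print(right.recv(4096))
    right.close()
    os.unlink(tmp.name)
```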


Wake-one Socket Semantics

Linux 2.2 kernel had the thundering herd or overscheduling problem

In a network server application there are typically several threads waiting for a new connection
In v2.2 when a new connection came in all the waiters would race to get it

Ingo Molnar’s response:

5/2/99: “here he again forgets to _prove_ that overscheduling happens in Linux.”
5/7/99: “as of 2.3.1 my wake-one implementation and waitqueues rewrite went in”

In Linux 2.4 only one thread wakes up to claim the new connection
Windows NT has always had wake-1 semantics


Asynchronous I/O

Linux 2.2 only supported asynchronous I/O on socket connect operations and ttys
Linux 2.6 adds asynchronous I/O for direct-disk access

AIO model includes efficient management of asynchronous I/O

Also added alternate epoll model

Useful for database servers managing their database on a dedicated raw partition
Database servers that manage a file-based database suffer from synchronous I/O

Windows I/O is inherently asynchronous
Windows has had completion ports since NT 3.5

More advanced form of AIO
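
The epoll model mentioned above can be tried from Python on Linux through `select.epoll`. This is a minimal sketch with a hypothetical helper, `wait_readable`. Note that epoll reports readiness (the socket can now be read without blocking) and is not the same thing as the AIO interface or as Windows completion ports, which report that an operation has finished.

```python
import select
import socket


def wait_readable(socks, timeout: float = 1.0):
    """Return the sockets that have data ready to read, using one epoll wait."""
    ep = select.epoll()
    try:
        by_fd = {s.fileno(): s for s in socks}
        for fd in by_fd:
            ep.register(fd, select.EPOLLIN)
        # poll() returns (fd, event mask) pairs for descriptors that are ready.
        return [by_fd[fd] for fd, events in ep.poll(timeout) if events & select.EPOLLIN]
    finally:
        ep.close()


if __name__ == "__main__":
    a, b = socket.socketpair()
    c, d = socket.socketpair()
    a.send(b"ping")                # only b has something to read
    ready = wait_readable([b, d])
    print("ready sockets:", len(ready), "first message:", ready[0].recv(16))
    for s in (a, b, c, d):
        s.close()
```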


Light-Weight Synchronization

Linux 2.6 introduces Futexes

There’s only a transition to kernel-mode when there’s contention

Windows has always had CriticalSections

Same behavior

Futexes go further:

Allow for prioritization of waits
Work interprocess as well


A Look at the Future

The kernel architectures are fundamentally similar

There are differences in the details
Linux implementation is adopting more of the good ideas used in Windows

For the next 2-4 years Windows has and will maintain an edge

Linux is still behind on the cutting edge of performance tricks
Large performance team and lab at Microsoft has direct ties into the kernel developers

As time goes on the technological gap will narrow

Open Source Development Labs (OSDL) will feed performance test results to the kernel team
IBM and other vendors have Linux technology centers
Squeezing performance out of the OS gets much harder as the OS gets more tuned


Linux Technology Unknowns

Linux kernel forking

Red Hat has already done it: Red Hat Enterprise Server v3.0 is Linux 2.4 with some Linux 2.6 features

Backward compatibility philosophy

Linus Torvalds makes decisions on kernel APIs and architecture based on technical reasons, not business reasons


Further Reading

Transaction Processing Performance Council: www.tpc.org
SPEC: www.spec.org
NT vs Linux benchmarks: www.kegel.com/nt-linux-benchmarks.html
The C10K problem: http://www.kegel.com/c10k.html
Linus Torvalds’s home: http://www.osdl.org/
Linux Kernel Archives: http://www.kernel.org/
Linux history: http://www.firstmonday.dk/issues/issue5_11/moon/
Veritest Netbench result: http://www.veritest.com/clients/reports/microsoft/ms_netbench.pdf
Mark Russinovich’s 1999 article, “Linux and the Enterprise”: http://www.winntmag.com/Articles/Index.cfm?ArticleID=5048
The Open Group's Single UNIX Specification: http://www.unix.org/version3/