Multiple processor systems
Continuous need for faster computers
Multiprocessors: shared memory model, access time nanoseconds (ns)
Multicomputers: message-passing multicomputer, access time microseconds (µs)
Distributed systems: wide-area distributed system, access time milliseconds (ms)
Contention for the bus
Only one CPU can access the memory at any time
If the bus is busy, the requesting CPU must wait
Limits the number of CPUs
Internal caches reduce the number of memory accesses
Need: cache-coherence protocol (several copies exist)
Allows n CPUs to connect to k memories using n*k crosspoints
Nonblocking network
A CPU can always connect, assuming that the memory is available
Characteristics
1. Single address space visible to all CPUs
2. Access to remote memory via LOAD and STORE commands
3. Access to remote memory slower than to local
”What to do with all the transistors on the chip?”
Add cache -> after a point does not improve performance
Add a CPU, called a core -> multicore chip
May share cache or each has its own (or both)
Failure in a shared component can bring down more cores
System on a chip: the design also has special-purpose cores (for video, audio, crypto, network) alongside the CPUs
For the software, symmetric multicore chips are similar to UMA multiprocessors
”How to use them with the software?”
Parallel coding, lack of algorithms
Synchronization, race conditions, deadlocks
Benefits
Each CPU has its own Operating System
Partition the memory for private use
Each CPU and its OS operate independently
Master-Slave Multiprocessors
One CPU is the master and others are slaves
Only master may run Operating System
Symmetric Multiprocessors (SMP)
One copy of the OS and any CPU may run it
Balances processes and memory dynamically
Synchronization issues in the OS code itself
Critical regions must be protected (disabling interrupts is not enough)
System calls handled by the private OS using its own private data structures
No sharing of processes, no load balancing
No sharing of memory pages, no reallocation of free pages
No buffer cache for shared file systems to avoid inconsistencies
Not used any more
Only one instance of OS and its data structures
Master allocates load (processes/threads) to other CPUs
Shared memory, pages can be allocated to all processes
Shared buffer cache, inconsistencies do not occur
Master may become bottleneck!
It executes all system calls for all processes
One copy of the OS, any CPU can run it
All CPUs can execute system calls
Access to critical regions must be controlled
Mutexes to control access to multiple independent critical regions
Mutexes to control access to critical tables (used in several critical regions)
Only one CPU can access a particular part of OS at any time!
Deadlocks might freeze the system
Correct mutex implementation is not simple!
Disabled interrupts do not work with multiple processors
Test and Set Lock (TSL) without locking the bus does not work (see fig above)
Test and Set with locking the bus before the command and releasing it after works
The waiting CPU spins (loops in testing) fast, waiting for the spin lock
Instead of spinning, the CPU could switch to another thread if the lock cannot be acquired
In some cases the CPU must wait
waits to acquire the ready list
In other cases a choice exists
spinning wastes CPU cycles
switching uses up CPU cycles also
could be possible to make a separate decision each time a locked mutex is encountered
What is scheduled?
User-level threads: the OS schedules processes
Kernel-level threads: the OS schedules threads
Where to run it? Which CPU?
Schedule threads independently or in groups?
Timesharing
Space sharing
Gang scheduling
Single scheduling data structure for all CPUs
Automatic load balancing
Smart scheduling: a thread in a critical region is not switched off
Affinity scheduling: make an effort to have a thread run on the same CPU again
Multiple related threads at same time across multiple CPUs
Simple model:
All threads are allocated CPUs at the same time
They hold on to the CPU even while waiting for I/O
Released only when the thread is finished
More complex alternatives exist
Groups of related threads scheduled as a unit (a gang)
All members of a gang run simultaneously on different timeshared CPUs
All gang members start and end time slices together
Definition:
Tightly-coupled CPUs that do not share memory
Also known as
Hardware
Topology
Communication
Network interfaces
Communication software
Low-level: from memory to network
User-level: send & receive, blocking vs. nonblocking
Remote Procedure Call, RPC
Distributed shared memory
Scheduling & load balancing
Topologies: single switch, ring, grid, double torus, cube, 4D hypercube
Store-and-forward
Each switch that receives the packet forwards it to the next one
Flexible and efficient; latency grows with size
Circuit switching
Path established first during setup phase
All packets travel the same path
Packets copied several times during transmission
Minimum services provided
send and receive commands
These are blocking (synchronous) calls
(a) Blocking send call (b) Nonblocking send call
The client makes a (normal) procedure call
The client stub marshals the parameters for the network
The network passes the request to the server
The server stub unmarshals the parameters
and makes a normal procedure call on the server
Implementation Issues
Cannot pass pointers
call by reference becomes copy-restore (but might fail)
Weakly typed languages
client stub cannot determine size
Not always possible to determine parameter types
Cannot use global variables
may get moved to a remote machine
RFC 1057
RPC v2 : RFC 1831
Example: thread P on System A, with globals X, Y, Z, calls … Y = FNCT(X,Y) …; the RPC package on System B sets a := X, b := Y and runs FNCT(a,b) { c := {comp}; return c }
Can be implemented in
hardware
user-level software
Each node has its own set of processes
Local scheduling decisions
Global decision on allocation of processes
Gang scheduling over multicomputers is possible
Processor allocation algorithms (for load balancing)
A graph-theoretic deterministic algorithm
A sender-initiated distributed heuristic algorithm
Overloaded node
A receiver-initiated distributed heuristic algorithm
Underloaded node
Consolidating servers
Each service in its own virtual machine
Still not dependent on others
Each has its own OS, libraries, configuration files
Can fail independently, no effect to other virtual machines
Fewer physical machines
Reduced hardware and energy costs
Hardware (or hypervisor) failure fails all services on that server
Other benefits:
Checkpointing and migrating straightforward (memory image)
Running legacy systems (no hardware available any more)
Software development (testing on several OS)
Virtual machine must act just like the real hardware
Booting the machine, installing operating system, etc.
Hypervisor provides this illusion
Hypervisor emulates the hardware by ”interpreting” the machine code instructions
Hardware support necessary for type 1 hypervisors
Privileged instructions (trap if run in user mode)
Sensitive instructions (executed only in kernel mode)
Virtualizable only when sensitive instructions are a subset of the privileged ones
Intel 386 not virtualizable:
Some sensitive instructions ignored in user mode
Some instructions can read sensitive data without trap
Execution of privileged instructions
Hardware detects!
Trap to hypervisor
Check whether the instruction is from the guest OS or a user application
Execute the instruction for the guest OS
Emulate the actual behavior for the user applications
No hardware support needed
Hypervisor itself runs in user mode
Executing a program (or OS) in the virtual machine is done using binary translation
First, the hypervisor scans the code for basic blocks without any jumps etc. changing the program counter
Replace each sensitive instruction and the last instruction
Modified block cached within the hypervisor and executed
Binary translation can be used on type 1 hypervisors also to avoid traps
Use only a modified guest operating system
Do the modifications (done in binary translation) already to the code of the guest OS
The modified OS cannot be run directly on the hardware on its own any more; it needs the hypervisor
Create a special Virtual Machine Interface
The implementation under the interface can change (VMILinux using VMIL)
Modified kernel calls procedures of the VMI for any sensitive operations
The VMI forms a low-level layer interfacing with the hardware
(just change the procedure implementations)
The difference between a multicomputer and a distributed system depends on the viewpoint
Distributed systems are loosely coupled
Covered in our Distributed Systems course
Achieving uniformity with middleware