SLIDE 1
CSCI341 Lecture 38, Introduction to Multicore Architectures
GOAL: PERFORMANCE
Recall: Power as the overriding issue. Performance, heat, power efficiency.
SLIDE 2
SLIDE 3
PIPELINING
“Exploits potential parallelism among instructions.”
“Instruction-level parallelism”
SLIDE 4
PROCESS-LEVEL PARALLELISM
Utilizing multiple processors by running independent programs simultaneously.
SLIDE 5
PARALLEL PROCESSING PROGRAM
Executing one program upon multiple processors simultaneously.
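A minimal sketch of one program split across several workers: the input is cut into chunks, each worker sums its own chunk, and the partial sums are combined at the end. The names split_into_chunks and parallel_sum are illustrative, not from the lecture. One assumption to note: in standard CPython, threads do not execute Python bytecode on several cores at once, so this shows the structure of a parallel program rather than a real speedup. ProcessPoolExecutor offers the same map interface and would use separate processes instead.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, parts):
    """Split values into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(values), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks each take one leftover element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def parallel_sum(values, workers=4):
    """Give each worker one chunk, then combine the partial sums."""
    chunks = split_into_chunks(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)


data = list(range(1_000_000))
total = parallel_sum(data, workers=4)
print(total == sum(data))
```

The final sum over partial_sums is a serial step: every worker has to finish before the result can be combined.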
SLIDE 6
MULTI-PROCESSOR ARCHITECTURES
A system with at least two processors.
SLIDE 7
MULTI-CORE ARCHITECTURES
A system with multiple processors (“cores”) within a single integrated circuit.
SLIDE 8
SEQUENTIAL VS. CONCURRENT
SLIDE 9
THE PROBLEM
(Not about the hardware.) It is difficult to write software that uses multiple processors to complete tasks faster. Why?
SLIDE 10
MUST YIELD THE BENEFIT
The parallel implementation must be faster, especially as the number of processors increases. Otherwise, what’s the point? Single-processor instruction-level parallelism has evolved. (See superscalar & out-of-order execution.)
SLIDE 11
COMPLICATIONS
- scheduling
- load balancing
- time for synchronization
- communication overhead
- Amdahl’s law
Example: multiple journalists writing a story.
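Amdahl's law from the list above can be sketched numerically. If a fraction p of a program can run in parallel and the rest is serial, the speedup on n processors is 1 / ((1 - p) + p / n). The function name amdahl_speedup and the 90% parallel fraction are assumptions chosen for illustration.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Assumed workload: 90% of the run time can be parallelized.
for n in (1, 2, 4, 8, 16, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

With 10% of the work serial, the speedup can never reach 10, no matter how many processors are added.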
SLIDE 12
SMP
Shared Memory Multiprocessor
Multiple processors, single memory address space. All cores have access to all data.
(Multi-core architectures generally use this approach.)
SLIDE 13
SMP
SLIDE 14
SYNCHRONIZATION
Coordinating operations on shared data between multiple processors. Common solution: locks.
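A small sketch of the lock idea, using Python threads as stand-ins for processors that share one address space. The Counter class and run_workers helper are hypothetical names. The lock ensures that only one thread at a time performs the read-modify-write on the shared value.

```python
import threading


class Counter:
    """Shared data protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread may hold the lock, so the update is never interleaved.
        with self._lock:
            self.value += 1


def run_workers(counter, workers=4, increments=10_000):
    """Start several threads that all update the same counter."""

    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


final_value = run_workers(Counter())
print(final_value)
```

The time threads spend waiting for the lock is the synchronization cost that eats into the parallel speedup.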
SLIDE 15
MESSAGE PASSING
What if each processor has its own address space?
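One way to picture this: each node keeps private state and the only way to cooperate is to send and receive messages. The sketch below imitates that with a thread and two queues. This is a simplification, since real message-passing nodes would be separate processes or machines. The names worker and send_and_collect, and the "stop" message, are assumptions made for the example.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive numbers, keep a private total, and reply when told to stop."""
    local_total = 0  # private state, never read directly by the sender
    while True:
        message = inbox.get()
        if message == "stop":
            outbox.put(local_total)
            return
        local_total += message


def send_and_collect(numbers):
    """Send each number as a message, then wait for the reply message."""
    inbox, outbox = queue.Queue(), queue.Queue()
    node = threading.Thread(target=worker, args=(inbox, outbox))
    node.start()
    for n in numbers:
        inbox.put(n)
    inbox.put("stop")
    result = outbox.get()
    node.join()
    return result


reply = send_and_collect([1, 2, 3, 4])
print(reply)
```

No lock appears in this code because nothing is shared: all coordination happens through the messages themselves.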
SLIDE 16
MESSAGE PASSING
In practice, this manifests as clusters of individual machines. But there’s a cost to administering these individual physical machines.
SLIDE 17
VIRTUAL MACHINES
An additional layer of abstraction on top of hardware. Multiple cluster nodes on top of hardware, each capable of sending/receiving messages.
SLIDE 18
SO MUCH MORE...
- Multithreading
- MIMD (Multiple Instruction / Multiple Data Streams)
- Vector architectures (see Cray)
- GPUs
SLIDE 19
AND MORE...
Storage & I/O (Chapter 6) One simple approach: memory-mapped I/O
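A toy simulation of the memory-mapped I/O idea: ordinary store instructions reach a device because one address is routed to the device instead of RAM. Everything here is hypothetical, including the ConsoleDevice and Bus classes and the address 0xFFFF0000. Real systems define their own address maps.

```python
class ConsoleDevice:
    """Hypothetical output device that records each byte written to it."""

    def __init__(self):
        self.output = []

    def write(self, value):
        self.output.append(chr(value & 0xFF))


class Bus:
    """Routes loads and stores either to RAM or to the mapped device."""

    DEVICE_ADDRESS = 0xFFFF0000  # assumed address, chosen for illustration

    def __init__(self, ram_size, device):
        self.ram = bytearray(ram_size)
        self.device = device

    def store_byte(self, address, value):
        if address == self.DEVICE_ADDRESS:
            self.device.write(value)  # the store becomes device output
        else:
            self.ram[address] = value & 0xFF

    def load_byte(self, address):
        if address == self.DEVICE_ADDRESS:
            return 0  # the device in this sketch is write-only
        return self.ram[address]


console = ConsoleDevice()
bus = Bus(ram_size=1024, device=console)
bus.store_byte(0x10, 42)  # ordinary memory write
for ch in b"hi":
    bus.store_byte(Bus.DEVICE_ADDRESS, ch)  # same operation, but reaches the device
printed = "".join(console.output)
print(printed)
```

The program needs no special I/O instructions here, only the loads and stores it already uses for memory.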
SLIDE 20
AND MORE...
Many instructions are loads/stores... how can we exploit the memory hierarchy?
SLIDE 21
PRINCIPLE OF LOCALITY
- Temporal
- Spatial
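The two kinds of locality can be seen in a small cache simulation. This is a sketch of a direct-mapped cache with assumed parameters (64 lines, 16-byte blocks). The hit_rate function and the three access patterns are illustrative and not taken from the lecture.

```python
def hit_rate(addresses, num_lines=64, block_size=16):
    """Simulate a direct-mapped cache and return the fraction of hits."""
    tags = [None] * num_lines  # which block each cache line currently holds
    hits = 0
    for address in addresses:
        block = address // block_size
        index = block % num_lines
        if tags[index] == block:
            hits += 1
        else:
            tags[index] = block  # miss: bring the block into the cache
    return hits / len(addresses)


# Spatial locality: consecutive byte addresses share a block.
sequential = list(range(4096))
# Temporal locality: the same few addresses are reused many times.
repeated = [0, 4, 8, 12] * 1024
# No locality: every access is a new block, and all of them map to line 0.
scattered = [i * 1024 for i in range(4096)]

for name, pattern in (("sequential", sequential),
                      ("repeated", repeated),
                      ("scattered", scattered)):
    print(name, hit_rate(pattern))
```

The sequential and repeated patterns hit almost every time, while the scattered pattern never hits in this configuration.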
SLIDE 22
PRINCIPLE OF LOCALITY
Memory closest to the processor is the fastest (and the most expensive).
SLIDE 23
HIERARCHY
Access time:   < 3 ns      < 70 ns    < 20m ns
Cost:          $2000/GB    $20/GB     $0.25/GB
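To see why the hierarchy pays off, the first two access times on this slide can be plugged into the usual average memory access time formula: hit time + miss rate x miss penalty. This is only a sketch. It treats 3 ns as the fast level and 70 ns as the penalty for going to the slower level, and the 5% miss rate is an assumption, not a figure from the lecture.

```python
def average_access_time(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time: hit time plus the expected miss cost."""
    return hit_time_ns + miss_rate * miss_penalty_ns


FAST_NS = 3.0             # upper bound for the fastest level on the slide
SLOW_NS = 70.0            # upper bound for the next level on the slide
ASSUMED_MISS_RATE = 0.05  # hypothetical: 5% of accesses miss the fast level

amat_ns = average_access_time(FAST_NS, ASSUMED_MISS_RATE, SLOW_NS)
print(amat_ns)
```

Under these assumptions the average comes to about 6.5 ns, far closer to the fast level than to the slow one, while most of the capacity can sit in the cheaper memory.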
SLIDE 24
HIERARCHY
SLIDE 25
HOMEWORK
- Reading 32
- Final exam program