Multicore Processors
Raul Queiroz Feitosa
Parts of these slides are from the support material provided by W. Stallings
Objective
“This chapter provides an overview of multicore systems.” (Stallings)
Outline
Hardware Performance Issues
Software Performance Issues
Multicore Organization
Intel Core Architecture
Hardware performance issues
Chip Density
Microprocessor performance has increased due to:
a) Improved organization, e.g., pipelining, superscalar execution, multithreading, …
b) Increased clock frequency
Both were made possible by increasing chip density!
By 2018 → 30 trillion transistors on a 300 mm² die.
By 2015 → 100 billion transistors on a 300 mm² die.
Source: IEEE Spectrum, 2017
Pollack’s Rule
“Performance is roughly proportional to the square root of the increase in complexity.”
Single-thread computer:
Complexity   Power   Performance
1            1       1
4            4       2
25           25      5
Four small cores:
Complexity   Power   Performance
4 × 1        4 × 1   4
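The comparison in the Pollack's Rule table can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names are hypothetical, complexity and power are in units of one small core, and the multicore figure assumes the workload is perfectly parallel.

```python
import math


def pollack_performance(complexity: float) -> float:
    """Relative performance of a single core under Pollack's rule."""
    return math.sqrt(complexity)


def multicore_performance(cores: int, complexity_per_core: float) -> float:
    """Aggregate performance of identical cores.

    Assumption: the work is perfectly parallel, so per-core
    performance simply adds up.
    """
    return cores * pollack_performance(complexity_per_core)


budget = 4  # total complexity (and power) budget

one_big_core = pollack_performance(budget)                # one core of complexity 4
four_small_cores = multicore_performance(4, budget / 4)   # four cores of complexity 1

print(one_big_core, four_small_cores)  # prints "2.0 4.0"
```

For the same complexity and power budget, the four small cores reach twice the performance of the single complex core, but only when the software can keep all four busy.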
Hardware performance issues
Source: Henk Poleyy, 2014
Hardware performance issues
Power
Power requirements grow exponentially with chip density and clock frequency!
Hardware performance issues
Increased Complexity
Software Performance Issues
Small amounts of serial code impact performance.
According to Amdahl’s law:
$$\text{Speedup} = \frac{\text{time to execute program on a single processor}}{\text{time to execute program on } N \text{ parallel processors}} = \frac{1}{(1 - f) + \frac{f}{N}}$$
where f is the fraction of code infinitely parallelizable with no scheduling overhead and N is the number of processors.
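Amdahl's law is easy to evaluate numerically. The sketch below is illustrative only: the function name `amdahl_speedup` and the sample value f = 0.9 are assumptions, not taken from the slides.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Speedup on n processors when a fraction f of the code is
    perfectly parallelizable (Amdahl's law)."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)


# Even with 90% parallel code, the speedup stays below 1 / (1 - f) = 10
# no matter how many processors are added.
for n in (2, 4, 8, 16, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

The serial fraction (1 - f) sets a hard ceiling on the speedup, which is why small amounts of serial code matter so much as the core count grows.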
Software Performance Issues
Small amounts of serial code impact performance due to communication, distribution of work and cache coherence overheads
Figure label: percentage of sequential code
Software Performance Issues
More recently, software engineers have developed applications that effectively exploit multiprocessor architectures, e.g., database applications.
Multicore Organization
In view of:
1. Increasing chip density.
2. Diminishing gains with complexity increase.
3. Power requirements grow exponentially with chip density and clock frequency.
4. Memory transistors have a power density one order of magnitude lower than that of logic.
5. Applications that exploit multiprocessor architectures.
What to do with extra transistors made available by the semiconductor industry?
Reduce complexity, so that multiple complete processors fit on a single chip
Reduce clock frequency and increase the proportion of the chip occupied by cache to reduce power requirements
Multicore Organization
Main variables in a multicore organization:
Number of core processors on chip
Number of levels of cache on chip
Amount of shared cache
Multicore Organization Alternatives
Dedicated L1 Cache (ARM11 MPCore)
Dedicated L2 Cache (AMD Opteron)
Shared L2 Cache (Intel Core Duo)
Shared L3 Cache (Intel Core i7)
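The alternatives differ mainly in which on-chip cache levels are private to a core and which level, if any, is shared. The sketch below encodes that as data. The class and field names are hypothetical, and the level assignments are an assumption based on the organization diagrams in Stallings, since the figures themselves are not reproduced in this text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CacheOrganization:
    example: str                      # processor cited as an example
    private_levels: Tuple[int, ...]   # on-chip cache levels dedicated to each core
    shared_level: Optional[int]       # on-chip level shared by all cores, if any


# Assumed level assignments (see the lead-in above).
ALTERNATIVES = [
    CacheOrganization("ARM11 MPCore", (1,), None),
    CacheOrganization("AMD Opteron", (1, 2), None),
    CacheOrganization("Intel Core Duo", (1,), 2),
    CacheOrganization("Intel Core i7", (1, 2), 3),
]


def describe(org: CacheOrganization) -> str:
    """One-line summary of a cache organization."""
    private = ", ".join(f"L{level}" for level in org.private_levels)
    if org.shared_level is None:
        shared = "no shared on-chip cache"
    else:
        shared = f"shared L{org.shared_level}"
    return f"{org.example}: private {private}; {shared}"


for org in ALTERNATIVES:
    print(describe(org))
```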
Private × shared L2 Cache
Advantages of shared L2 Cache
Constructive interference reduces overall miss rate
Data shared by multiple cores is not replicated at cache level
With proper frame replacement algorithms, the amount of shared cache dedicated to each core is dynamic
   Threads with less locality can have more cache
Easy inter-process communication through shared memory
Cache coherency confined to L1
Advantages of private L2 Cache
Dedicated L2 cache gives each core more rapid access
Shared L3 cache may also improve performance
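The claim that shared data is not replicated in a shared cache can be illustrated with a toy simulation. The sketch below is only a rough model under stated assumptions: fully associative LRU caches, two cores that walk the same six blocks, and equal total capacity (two private caches of 4 lines versus one shared cache of 8 lines). The class `LRUCache` and the access trace are made up for illustration, and access latency, the main advantage of a private L2, is not modeled.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines = OrderedDict()  # block -> True, oldest entry first
        self.misses = 0

    def access(self, block: int) -> None:
        if block in self.lines:
            self.lines.move_to_end(block)   # hit: mark as most recently used
            return
        self.misses += 1                    # miss: bring the block in
        self.lines[block] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used block


# Both cores repeatedly walk the same 6 shared blocks, interleaved.
trace = [(core, block)
         for _ in range(10)
         for block in range(6)
         for core in (0, 1)]

private = [LRUCache(4), LRUCache(4)]  # one private cache per core
shared = LRUCache(8)                  # same total capacity, shared by both cores

for core, block in trace:
    private[core].access(block)
    shared.access(block)

private_misses = sum(cache.misses for cache in private)
shared_misses = shared.misses
print(private_misses, shared_misses)  # prints "120 6"
```

In this trace each private cache must hold its own copy of the six blocks and thrashes, while the shared cache holds one copy that both cores hit. Real workloads are far less extreme, so the numbers only show the direction of the effect.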
Intel Core Architecture
Intel i3, i5, i7, i9
Source: Intel 10th Gen Intel Core Desktop Processors