Multithreading
Architectural State and Context Switches
Architectural State: The architectural state of a thread is everything that defines the state of a running program: the contents of the register file, the current ...
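To make the idea of saving and restoring architectural state concrete, here is a minimal sketch of a context switch on a toy machine. Everything in it is hypothetical: `ToyCPU`, `ArchState`, `context_switch`, and the register count are made up for illustration. The register file comes from the slide text; the program counter is included as an assumption because the slide's list is cut off, and memory is left out of the model entirely.

```python
from dataclasses import dataclass, field
from typing import List

NUM_REGS = 8  # hypothetical register count for the toy machine


@dataclass
class ArchState:
    """Saved per-thread state: register file plus PC (the PC is an assumption)."""
    regs: List[int] = field(default_factory=lambda: [0] * NUM_REGS)
    pc: int = 0


class ToyCPU:
    """Hypothetical single-context processor holding one thread's state."""

    def __init__(self) -> None:
        self.regs = [0] * NUM_REGS
        self.pc = 0


def context_switch(cpu: ToyCPU, outgoing: ArchState, incoming: ArchState) -> None:
    # Save the running thread's state out of the processor.
    outgoing.regs = list(cpu.regs)
    outgoing.pc = cpu.pc
    # Load the next thread's state (copied, so the saved copy stays intact).
    cpu.regs = list(incoming.regs)
    cpu.pc = incoming.pc


# Usage: thread 0 is running, then the processor switches to thread 1.
cpu = ToyCPU()
t0 = ArchState()
t1 = ArchState(regs=[7] * NUM_REGS, pc=100)
cpu.regs[0] = 42
cpu.pc = 4
context_switch(cpu, t0, t1)
```

After the switch, `t0` holds what thread 0 had in the processor and the processor holds a copy of thread 1's state. The cost of a switch in this model is proportional to the amount of state copied, which is one way to read the motivation for keeping several contexts in hardware.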
ISA dictate that instructions execute one-at-a-time.
state
(Sometimes)
[Brown ’10] [Choi ’08]
memory
the dependence graph requires communication.
for (i = 1..5) { s[i] = a[i] + b[i] + c[i] + …; }
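The loop above has no dependence between iterations, so each element can be computed by a different thread. A minimal sketch of that idea follows, assuming only three input arrays (the slide's sum is elided after `c[i]`), made-up input values, and 0-based indices instead of 1..5. The helper name `element_sum` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Made-up inputs; the slide does not give values.
a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50]
c = [100, 200, 300, 400, 500]


def element_sum(i: int) -> int:
    # Each iteration reads only index i, so iterations are independent.
    return a[i] + b[i] + c[i]


# One task per loop iteration; map() returns results in index order.
with ThreadPoolExecutor(max_workers=5) as pool:
    s = list(pool.map(element_sum, range(5)))

print(s)  # [111, 222, 333, 444, 555]
```

In standard CPython builds the interpreter lock keeps CPU-bound threads from running simultaneously, so this shows the structure of the decomposition rather than a speedup.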
Par_msum(matrix A, matrix B) {
    matrix R;
    R[upleft]   = Spawn(msum(A[upleft],   B[upleft]));
    R[upright]  = Spawn(msum(A[upright],  B[upright]));
    R[lowleft]  = Spawn(msum(A[lowleft],  B[lowleft]));
    R[lowright] = Spawn(msum(A[lowright], B[lowright]));
    Wait(the_barrier);
}
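The spawn-and-wait pattern in Par_msum can be sketched in runnable form as below. This is an illustration under assumptions, not the slide author's implementation: matrices are lists of rows with even dimensions, `Spawn` is modeled with `ThreadPoolExecutor.submit`, `Wait(the_barrier)` is modeled with `concurrent.futures.wait`, and the helper `quadrants` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def msum(x, y):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def quadrants(m):
    """Split a matrix with even dimensions into four named quadrants."""
    h, w = len(m) // 2, len(m[0]) // 2
    return {
        "upleft": [row[:w] for row in m[:h]],
        "upright": [row[w:] for row in m[:h]],
        "lowleft": [row[:w] for row in m[h:]],
        "lowright": [row[w:] for row in m[h:]],
    }


def par_msum(A, B):
    qa, qb = quadrants(A), quadrants(B)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Spawn: one task per quadrant.
        futures = {name: pool.submit(msum, qa[name], qb[name]) for name in qa}
        # Wait: block until all four tasks have finished.
        wait(list(futures.values()))
    R = {name: f.result() for name, f in futures.items()}
    # Stitch the quadrant results back into one matrix.
    top = [l + r for l, r in zip(R["upleft"], R["upright"])]
    bottom = [l + r for l, r in zip(R["lowleft"], R["lowright"])]
    return top + bottom


A = [[1, 2], [3, 4]]
B = [[10, 20], [30, 40]]
result = par_msum(A, B)
print(result)  # [[11, 22], [33, 44]]
```

The four tasks touch disjoint quadrants, so the only synchronization needed is the single wait before the results are combined.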
available parallelism.
find ILP
that is profitable to exploit
very poor cache performance
ILP
[Figure: percent of total issue cycles (axis ticks 10 to 100) by category: processor busy, itlb miss, dtlb miss, icache miss, dcache miss, branch mispred., control hazards, load delays, short int, long int, short fp, long fp, mem conflict]
[Figure: multiple PCs, instruction stream]
[Figures: five panels, each plotting issue slots against time (proc cycles)]
superscalar design.
thread.
threads.
[Figure: two pipeline block diagrams with the labels PC, Fetch Unit, Instruction Cache, 8, Decode, Register Renaming, integer instruction queue, floating point instruction queue, integer reg's, fp reg's, int. units, int/ld-store units, fp units, Data Cache]
[Figure: throughput (instructions per cycle, ticks 1 to 5) versus number of threads (2, 4, 6, 8); series: Unmodified Superscalar]
[Figure: instructions per cycle (ticks 1 to 5) versus number of threads (2, 4, 6, 8); series: Improved Baseline, Unmodified superscalar]
time
memory
1 MHz machine with no caches.
code?