Multithreaded processors
Hung-Wei Tseng
Simultaneous Multi-Threading (SMT)
Fetch instructions from different threads/processes to fill the unused part of the pipeline.
shared
[Figure: SMT pipeline. Blocks shown: Instruction Fetch (one each for T0, T1, T2, T3), Instruction Decode, Register renaming logic, Schedule, Execution Units, Data Cache, ROB (one each for T0, T1, T2, T3).]
each cycle to fill the unused part of the pipeline
T1 1: lw   $t1, 0($a0)
T1 2: lw   $a0, 0($t1)
T2 1: sll  $t0, $a1, 2
T2 2: add  $t1, $a0, $t0
T1 3: addi $a1, $a1, -1
T1 4: bne  $a1, $zero, LOOP
T2 3: lw   $v0, 0($t1)
T2 4: addi $t1, $t1, 4
T2 5: add  $v0, $v0, $t2
T2 6: jr   $ra
[Figure: pipeline diagram for the instructions above, first cycles (IF and ID stages).]
Can execute 6 instructions before the bne is resolved.
[Figure: pipeline timing diagram for the interleaved T1/T2 instructions. Stages shown: IF, ID, Ren, Sch, EXE, MEM, C.]
Register Files
problem
be high.
cubic!
power by 2x
more cores into a single chip!
multiple narrower-issue processors.
performance on CMP/SMT.
processors can share
same address space.
variables in memory
thread programming
bus.
bus must run at the same speed
[Figure: Core 0, Core 1, Core 2, and Core 3 connected by a bus to a shared cache (Shared $).]
its own local cache
[Figure: Core 0, Core 1, Core 2, and Core 3, each with its own local cache (Local $), connected by a bus to a shared cache (Shared $).]
memory address in the system when the processors need the value at the same time
[Figure: cache coherence state diagram with three states: Invalid, Shared, Exclusive. Transition labels: read miss (processor); write miss (processor); write miss (bus); write request (processor); write miss (bus), write back data; read miss (bus), write back data; read miss/hit; read/write miss (bus); write hit.]
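The state diagram can be read as a small transition function. The sketch below assumes the common three-state write-back invalidate protocol that these labels suggest; the exact arrows are not recoverable from the slide, so the mapping of labels to transitions is an assumption. The names `BlockState`, `Event`, and `next_state` are hypothetical, and the model tracks a single block only (it ignores replacement of one block by another).

```c
/* States from the diagram. */
typedef enum { INVALID, SHARED, EXCLUSIVE } BlockState;

/* Events seen by one cache for one block (illustrative names).
 * CPU_* come from the local processor; BUS_* are snooped from the bus. */
typedef enum {
    CPU_READ,        /* local read, hit or miss */
    CPU_WRITE,       /* local write: miss, hit, or write request */
    BUS_READ_MISS,   /* another cache missed on a read of this block */
    BUS_WRITE_MISS   /* another cache wants to write this block */
} Event;

/* Returns the next state. Sets *write_back to 1 when this cache holds
 * the only up-to-date copy and must write the data back. */
BlockState next_state(BlockState s, Event e, int *write_back)
{
    *write_back = 0;
    switch (s) {
    case INVALID:
        if (e == CPU_READ)  return SHARED;     /* read miss (processor) */
        if (e == CPU_WRITE) return EXCLUSIVE;  /* write miss (processor) */
        return INVALID;                        /* bus traffic: nothing to do */
    case SHARED:
        if (e == CPU_WRITE)      return EXCLUSIVE; /* write request (processor) */
        if (e == BUS_WRITE_MISS) return INVALID;   /* write miss (bus) */
        return SHARED;                             /* reads keep the copy shared */
    case EXCLUSIVE:
        if (e == BUS_READ_MISS)  { *write_back = 1; return SHARED; }
        if (e == BUS_WRITE_MISS) { *write_back = 1; return INVALID; }
        return EXCLUSIVE;                          /* local read/write hit */
    }
    return s;
}
```

For example, a block in Exclusive that sees a read miss on the bus moves to Shared and reports a write back, which matches the "read miss (bus), write back data" label.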
belongs to the same cache block as 0x1000?
[Figure: Core 0 to Core 3, each with a local cache (Local $), connected by a bus to a shared cache (Shared $). Initially all four local caches hold 0x1000 in the Shared state. A "Write miss 0x1000" appears on the bus and the other three caches show 0x1000 as Invalid.]
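The invalidation step in this figure can be sketched as a loop over the four local caches. This is a minimal model under stated assumptions: it tracks one block only, the writer is assumed to end up in the Exclusive state (the figure does not show the writer's final state), and `write_block` and `valid_copies` are hypothetical helper names.

```c
#define NUM_CORES 4

typedef enum { INVALID, SHARED, EXCLUSIVE } BlockState;

/* state[i] is core i's local-cache state for one block (e.g. 0x1000).
 * The writer broadcasts a write miss on the bus; every other cache
 * snoops it and drops its copy. */
void write_block(BlockState state[NUM_CORES], int writer)
{
    for (int i = 0; i < NUM_CORES; i++) {
        state[i] = (i == writer) ? EXCLUSIVE : INVALID;
    }
}

/* Counts how many local caches still hold a valid copy of the block. */
int valid_copies(const BlockState state[NUM_CORES])
{
    int n = 0;
    for (int i = 0; i < NUM_CORES; i++) {
        if (state[i] != INVALID) {
            n++;
        }
    }
    return n;
}
```

Starting from four Shared copies, a call such as `write_block(st, 3)` leaves exactly one valid copy, held by the writer.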
[Figure: same four-core system with local caches, bus, and shared cache. Labels shown for 0x1000 in the local caches: Shared, Invalid, Shared, Invalid, then Invalid. Bus events shown: Read miss 0x1000, Write back 0x1000, Fetch 0x1000.]
thread 1: while(1) printf("%d ", a);
thread 2: while(1) a++;
which belongs to the same block as 0x1000?
[Figure: same four-core system with local caches, bus, and shared cache. Labels shown for 0x1000 in the local caches: Shared, Invalid, Shared, Invalid, Invalid, Invalid, then Invalid. Bus event shown: Write miss 0x1004.]
processors.
However, Y is invalidated because X and Y are in the same block!
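Whether two variables collide this way depends only on their addresses and the block size. The sketch below assumes a 64-byte block, which the slides do not state; `BLOCK_SIZE`, `same_block`, `packed_pair`, and `padded_pair` are hypothetical names used for illustration.

```c
#include <stdint.h>

/* Assumed block size in bytes; the slides do not give one. */
#define BLOCK_SIZE 64u

/* Block number of an address: discard the offset within the block. */
static uint64_t block_of(uint64_t addr)
{
    return addr / BLOCK_SIZE;
}

/* Returns 1 if both addresses fall in the same cache block. */
int same_block(uint64_t a, uint64_t b)
{
    return block_of(a) == block_of(b);
}

/* Packed layout: x and y are adjacent, so they can land in one block
 * and a write to x can invalidate other cores' copies of y. */
struct packed_pair {
    int x;
    int y;
};

/* Padded layout: y starts a full block after x, so the two fields
 * can never share a block of BLOCK_SIZE bytes. */
struct padded_pair {
    int x;
    char pad[BLOCK_SIZE - sizeof(int)];
    int y;
};
```

Under this assumption, `same_block(0x1000, 0x1004)` is true, which is the situation on the slide: a write miss to 0x1004 invalidates the block that also holds 0x1000. Padding trades memory for fewer invalidations.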
processes or multiple “threads”
computer at the same time.
threads.
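thread 1:
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

int loop;
void* modifyloop(void *x);   /* defined under thread 2 */

int main()
{
    pthread_t thread;
    loop = 1;
    pthread_create(&thread, NULL, modifyloop, NULL);
    pthread_join(thread, NULL);   /* pthread_join takes the pthread_t by value, not its address */
    while(loop)
    {
        continue;
    }
    fprintf(stderr,"finished\n");
    return 0;
}

thread 2:
void* modifyloop(void *x)
{
    sleep(1);
    loop = 0;
    return NULL;
}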