Example Task: Doing a load of laundry (Wash, Dry, Fold) - PowerPoint PPT Presentation



SLIDE 1

Example

  • Task: Doing a load of laundry
    – Wash, Dry, Fold
    – Each laundry load takes T hours
  • Completing n tasks requires nT hours

[Timeline figure: WDF groups placed back to back, one load completing every T hours at T, 2T, ..., 9T]


SLIDE 2

Parallel Processing

  • M independent machines
    – Wash, Dry, Fold
    – Do M laundry loads concurrently
  • Completing n tasks takes T × ceiling(n/M) hours

[Timeline figure: M WDF groups run concurrently in each T-hour round at T, 2T, 3T]

Requires M units to achieve speedup

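As a quick sanity check, the ceiling formula above can be evaluated directly. A minimal Python sketch (the example values of n, M, and T are illustrative, not from the slides):

```python
import math

def parallel_time(n, M, T):
    """Hours to complete n laundry loads on M independent machines,
    each load taking T hours: T * ceiling(n / M) from the slide."""
    return T * math.ceil(n / M)

# 8 loads on 3 machines at 1 hour per load: ceil(8/3) = 3 rounds
print(parallel_time(8, 3, 1))   # 3
```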

SLIDE 3

Pipelining

  • Divide each task into component microtasks
  • Each microtask requires unit time (the same for all microtasks)
    – One microtask performed per stage
  • With p stages per task, executed sequentially: n tasks require np time units.

[Timeline figure: microtasks W1 D1 F1 W2 D2 F2 W3 D3 F3 executed one per time step over steps 1-9]


SLIDE 4

Pipelining

[Pipeline diagram: tasks 1-4 overlapped across time steps 1-6; stage W of task i starts at step i]

Task i begins immediately after task i-1 completes its first stage. For a p-stage pipeline, task n completes at time step n + p - 1 (vs. n × p sequential). In steady state, p tasks are concurrently active.


Pipeline Fill: steps 1-2. Steady State: steps 3-4. Pipeline Flush: steps 5-6.
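The timing claims above can be checked with a small simulation. An illustrative Python sketch (not from the slides), assuming one new task enters per step and unit time per stage:

```python
def finish_step(n, p):
    """Completion step of task n in a p-stage pipeline: n + p - 1."""
    return n + p - 1

def schedule(n, p):
    """Map (task i, stage s) -> time step. Task i starts one step after
    task i-1, so stage s of task i runs at step i + s - 1."""
    return {(i, s): i + s - 1 for i in range(1, n + 1)
                              for s in range(1, p + 1)}

sched = schedule(4, 3)                             # 4 loads, stages W/D/F
assert max(sched.values()) == finish_step(4, 3)    # last fold at step 6
assert finish_step(4, 3) < 4 * 3                   # vs. np = 12 sequential
```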

SLIDE 5

Pipelining

[Pipeline diagram: the W, D, F stages of successive tasks overlapped across time steps 1-9]

Task i begins immediately after task i-1 completes its first stage. For a p-stage pipeline, task n completes at time step n + p - 1. In steady state, p tasks are concurrently active. The latency of one task is still p time steps: latency is not changed by pipelining.


By reducing the time for n tasks from np to n + p - 1, pipelining increases task throughput.

SLIDE 6

Latency and Throughput

  • Latency of a task:
    – Time elapsed between start and finish of the task
    – Assume W, D, F take 1 hour each: in all designs the latency is the same (3 hours)
  • Throughput: number of tasks completed per unit time
    – Non-pipelined design: T = p time units per task, so Throughput = 1/p
      (1 task completes every p time units)
    – Pipelined design with a p-stage pipeline (unit time per stage):
      n tasks complete in n + p - 1 time, so Throughput = n/(n + p - 1) = 1/(1 + (p - 1)/n),
      which approaches 1 as n >> p (1 task completes per time unit).
      Speedup = T_non-pipelined / T_pipelined = np/(n + p - 1); for n >> p, Speedup approaches p.
    – Parallel processing design with M machines: every T = p time units, M tasks complete,
      so Throughput = M/p and Speedup = M.

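The throughput and speedup expressions above can be evaluated numerically. An illustrative Python sketch (the chosen values of n are examples, not from the slides):

```python
def pipelined_throughput(n, p):
    """n tasks in n + p - 1 steps: n / (n + p - 1) tasks per step."""
    return n / (n + p - 1)

def pipelined_speedup(n, p):
    """np / (n + p - 1); approaches p as n >> p."""
    return (n * p) / (n + p - 1)

p = 3                                                    # stages: W, D, F
print(pipelined_speedup(3, p))                           # 9/5 = 1.8 for n = 3
assert abs(pipelined_speedup(10_000, p) - p) < 1e-3      # -> p for n >> p
assert abs(pipelined_throughput(10_000, p) - 1) < 1e-3   # -> 1 task per step
```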

SLIDE 7

Multi-Cycle Implementation

[Datapath diagram: PC, IR, MDR, A, B, C, and ALUout registers; MEM, REG FILE, and ALU; control signals PCWrite, IRWrite, MEMRead, ALUop, MDRWrite, AWrite, ALUWrite, and BWrite generated by a state machine decoder]


SLIDE 8

Multi-Cycle Design State Machine Model

LD (5 cycles), ADD (4 cycles)

  • Instruction Fetch (S0): IR = IM[PC]; PC = PC + 4
  • Instruction Decode (S1): generate control signals;
    A = REG[$rs]; B = REG[$rt]; ALUout = PC + Shift(SE(offset))
  • Execute:
    – R-R: p = A; q = B; ALUout = p op q
    – lw:  p = A; q = SE(d); ALUout = p + q
    – sw:  p = A; q = SE(d); ALUout = p + q
    – beq: p = A; q = B; Z = (p == q); if (Z == 1) PC = ALUout
  • Memory access / Write-back:
    – R-R: REG[$rd] = ALUout
    – lw:  MDR = DM[ALUout], then REG[$rt] = MDR
    – sw:  DM[ALUout] = B

[State diagram: S0 -> S1, then per-instruction paths through S2/S5/S6/S7/S3 and back to S0]

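The state machine can be sketched as a transition table. A hypothetical Python encoding: the state names follow the slide, but the per-opcode paths and the `cycles` helper are assumptions made for illustration, chosen to match the stated cycle counts:

```python
# Hypothetical encoding of the multi-cycle control FSM. S0 = fetch,
# S1 = decode; the opcode-specific states are an assumption based on
# the slide's cycle counts (LD takes 5 cycles, ADD takes 4).
NEXT = {
    ("S0", None):  "S1",   # fetch -> decode (all instructions)
    ("S1", "lw"):  "S2",   # decode -> address calculation
    ("S2", "lw"):  "S6",   # address calc -> memory read
    ("S6", "lw"):  "S7",   # memory read -> register write
    ("S7", "lw"):  "S0",   # back to fetch: 5 cycles total
    ("S1", "add"): "S5",   # decode -> execute
    ("S5", "add"): "S3",   # execute -> register write
    ("S3", "add"): "S0",   # back to fetch: 4 cycles total
}

def cycles(op):
    """Count cycles from fetch until the FSM returns to fetch."""
    state, count = "S0", 0
    while True:
        count += 1
        state = NEXT.get((state, op)) or NEXT[(state, None)]
        if state == "S0":
            return count

assert cycles("lw") == 5    # LD: 5 cycles, matching the slide
assert cycles("add") == 4   # ADD: 4 cycles
```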