 
Pipelining

PIPELINING: what Seymour Cray taught the laundry industry
How to correctly pipeline circuits…

"I've got 3 months worth of laundry to do tonight…"
"Funny, considering that he's only got one outfit…"

• Acknowledgement: The following slides were provided by Prof. Ward in September 2004.
• The PowerPoint was reformatted and two more slides were added in September 2007 by Jens Sparsø.
• The slides are used in DTU course 02154 Digital Systems Engineering (fall 2008).
• Because of my (Joachim Rodrigues) position at DTU, I took the liberty of using the slides in EITF35.

02340 Lecture 3 / Acknowledgement: Slides from MIT course 6.004 provided by Prof. Ward, September 2004

One load at a time

Forget EITF35… let's solve a "Real Problem".

Everyone knows that the real reason MIT students put off doing laundry for so long is not because they procrastinate, are lazy, or even have better things to do. The fact is, doing one load at a time is not smart.

INPUT: dirty laundry
Step 1: Device: Washer. Function: Fill, Agitate, Spin. Washer PD (propagation delay) = 30 mins
Step 2: Device: Dryer. Function: Heat, Spin. Dryer PD = 60 mins
OUTPUT: 6 more weeks

Total = Washer PD + Dryer PD = 30 + 60 = 90 mins
Doing N loads of laundry

Here's how they do laundry at Harvard, the "combinational" way.
[Figure: Steps 1 to 4 …]

(Of course, this is just an urban legend. No one at Harvard actually does laundry. The butlers all arrive on Wednesday morning, pick up the dirty laundry and return it all pressed and starched in time for afternoon tea.)

Total = N * (Washer PD + Dryer PD) = N * 90 mins

Doing N loads… the MIT way

MIT students "pipeline" the laundry process. That's why we wait!
[Figure: Steps 1 to 4 …]

Total = N * Max(Washer PD, Dryer PD) = N * 60 mins

Actually, it's more like N*60 + 30 if we account for the startup transient correctly. When doing pipeline analysis, we're mostly interested in the "steady state", where we assume we have an infinite supply of inputs.

Some definitions

Latency: The delay from when an input is established until the output associated with that input becomes valid.
  (Harvard laundry = 90 mins)
  (MIT laundry = 120 mins, assuming that each wash is started as soon as possible and the load waits (wet) in the washer until the dryer is available: 30 mins washing + 30 mins waiting + 60 mins drying.)

Throughput: The rate at which inputs or outputs are processed.
  (Harvard laundry = 1/90 outputs/min)
  (MIT laundry = 1/60 outputs/min)

Okay, back to circuits…

[Figure: input X feeds F and G; F(X) and G(X) feed H, which produces P(X).]

For combinational logic: latency = t_PD, throughput = 1/t_PD.

We can't get the answer faster, but are we making effective use of our hardware at all times?

F and G are "idle", just holding their outputs stable while H performs its computation.
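To make the startup transient and the steady state concrete, here is a minimal Python sketch that simulates both laundry schedules for N loads. It uses only the slides' numbers (washer 30 mins, dryer 60 mins) plus the latency slide's assumption that a washed load waits, wet, in the washer until the dryer is free; the function names are illustrative, not from the course. With these assumptions the last MIT load should finish at N*60 + 30, and once the pipeline is full each load takes 120 mins from start to finish while a new load completes every 60 mins.

```python
from typing import List, Tuple

WASHER_PD = 30  # minutes (from the slides)
DRYER_PD = 60   # minutes (from the slides)

Schedule = List[Tuple[int, int]]  # (start, finish) in minutes, one entry per load


def harvard_schedule(n: int) -> Schedule:
    """'Combinational' laundry: each load is washed and dried before the next starts."""
    per_load = WASHER_PD + DRYER_PD
    return [(k * per_load, (k + 1) * per_load) for k in range(n)]


def mit_schedule(n: int) -> Schedule:
    """Pipelined laundry: start each wash as soon as the washer is free.

    Assumption (from the latency slide): a washed load waits, wet, in the
    washer until the dryer is free, so the washer stays occupied until then.
    """
    schedule: Schedule = []
    washer_free = 0  # earliest time the washer accepts a new load
    dryer_free = 0   # earliest time the dryer accepts a new load
    for _ in range(n):
        start = washer_free
        dry_start = max(start + WASHER_PD, dryer_free)  # wait (wet) for the dryer
        finish = dry_start + DRYER_PD
        washer_free = dry_start  # washer is blocked until the load moves to the dryer
        dryer_free = finish
        schedule.append((start, finish))
    return schedule


def report(name: str, schedule: Schedule) -> None:
    total = schedule[-1][1]                       # time until the last load is done
    latency = schedule[-1][1] - schedule[-1][0]   # start-to-finish time of the last load
    print(f"{name}: N={len(schedule)}, total={total} min, latency={latency} min")


if __name__ == "__main__":
    for n in (1, 4, 10):
        report("Harvard", harvard_schedule(n))
        report("MIT    ", mit_schedule(n))
```

For a single load both schedules take 90 mins; the pipelined schedule only pays off once there is a steady supply of loads, which is exactly the "steady state" the slides focus on.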
Pipelined Circuits

Use registers to hold H's input stable!

[Figure: X feeds F (15) and G (20); their outputs are registered and feed H (25), whose output is registered to give P(X).]

Now F and G can be working on input X_{i+1} while H is performing its computation on X_i. We've created a 2-stage pipeline: if we have a valid input X during clock cycle j, P(X) is valid during clock cycle j+2.

Suppose F, G, H have propagation delays of 15, 20, 25 ns and we are using ideal zero-delay registers:

                     latency    throughput
unpipelined          45 ns      1/45 per ns
2-stage pipelined    50 ns      1/25 per ns
                     (worse)    (better)

Unpipelined, the slower of F and G (20 ns) plus H (25 ns) gives 45 ns. Pipelined, the clock period must cover the slowest stage (H, 25 ns), so the latency is 2 * 25 = 50 ns and the throughput is 1/25 per ns.

Pipeline diagrams

Clock cycle   i       i+1          i+2            i+3            i+4
Input         X_i     X_{i+1}      X_{i+2}        X_{i+3}        …
F Reg                 F(X_i)       F(X_{i+1})     F(X_{i+2})     …
G Reg                 G(X_i)       G(X_{i+1})     G(X_{i+2})     …
H Reg                              H(X_i)         H(X_{i+1})     H(X_{i+2})

The results associated with a particular set of input data move diagonally through the diagram, progressing through one pipeline stage each clock cycle.

Pipeline diagrams (alternative view)   (Slide added by J. Sparsø)

Clock cycle   i       i+1               i+2                       i+3                       i+4
              X_i     F(X_i), G(X_i)    H(X_i)
                      X_{i+1}           F(X_{i+1}), G(X_{i+1})    H(X_{i+1})
                                        X_{i+2}                   F(X_{i+2}), G(X_{i+2})    H(X_{i+2})
…

• Each row shows the processing of a particular set of input data. (In a processor, the processing of an instruction. You'll see plenty…)

Pipeline Conventions

DEFINITION:
A K-Stage Pipeline ("K-pipeline") is an acyclic circuit having exactly K registers on every path from an input to an output.
A COMBINATIONAL CIRCUIT is thus a 0-stage pipeline.

CONVENTION:
Every pipeline stage, hence every K-Stage pipeline, has a register on its OUTPUT (not on its input).

ALWAYS:
The CLOCK common to all registers must have a period sufficient to cover propagation over combinational paths PLUS (input) register t_PD PLUS (output) register t_SETUP.

The LATENCY of a K-pipeline is K times the period of the clock common to all registers.
The THROUGHPUT of a K-pipeline is the frequency of the clock.
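The timing rules on these slides can be checked with a short Python sketch; it is an illustration, not part of the original course material, and its function names are hypothetical. pipeline_timing applies the conventions directly: clock period = slowest stage + register t_PD + register t_SETUP, latency = K * period, throughput = 1/period. With ideal registers (the defaults of 0) it should reproduce the 45 ns vs. 50 ns table for F, G, H = 15, 20, 25 ns. pipeline_diagram rebuilds the register-contents table from the rule that the register at the output of stage s holds the result for X_{i+c-s} during cycle i+c. The register delays in the last example (t_PD = 1 ns, t_SETUP = 2 ns) are hypothetical values chosen only to show their effect.

```python
from fractions import Fraction
from typing import Dict, List, Sequence, Tuple

# Propagation delays in ns (from the slides). F and G work in parallel on X; H combines them.
T_F, T_G, T_H = 15, 20, 25


def combinational_timing(t_pd: int) -> Tuple[int, Fraction]:
    """Combinational logic: latency = t_PD, throughput = 1/t_PD."""
    return t_pd, Fraction(1, t_pd)


def pipeline_timing(stage_delays: Sequence[int], reg_tpd: int = 0,
                    reg_tsetup: int = 0) -> Tuple[int, int, Fraction]:
    """Return (clock period, latency, throughput) of a K-pipeline.

    stage_delays[k] is the longest combinational path inside stage k. The common
    clock must cover the slowest stage PLUS register t_PD PLUS register t_SETUP;
    latency is K periods and throughput is the clock frequency.
    """
    period = max(stage_delays) + reg_tpd + reg_tsetup
    return period, len(stage_delays) * period, Fraction(1, period)


def label(d: int) -> str:
    """Name of input X_{i+d}."""
    return "X_i" if d == 0 else f"X_(i+{d})"


def pipeline_diagram(regs: Sequence[Tuple[str, int]], cycles: int) -> Dict[str, List[str]]:
    """Contents of each register during clock cycles i, i+1, ..., i+cycles-1.

    regs lists (function name, stage number). The register at the output of
    stage s holds the result for input X_{i+c-s} during cycle i+c (empty before).
    """
    rows = {"Input": [label(c) for c in range(cycles)]}
    for name, s in regs:
        rows[f"{name} Reg"] = [f"{name}({label(c - s)})" if c >= s else ""
                               for c in range(cycles)]
    return rows


if __name__ == "__main__":
    lat, thr = combinational_timing(max(T_F, T_G) + T_H)
    print(f"unpipelined:         latency = {lat} ns, throughput = {thr} per ns")
    _, lat, thr = pipeline_timing([max(T_F, T_G), T_H])  # ideal zero-delay registers
    print(f"2-stage pipelined:   latency = {lat} ns, throughput = {thr} per ns")
    # Hypothetical register timing (t_PD = 1 ns, t_SETUP = 2 ns), not from the slides.
    period, lat, _ = pipeline_timing([max(T_F, T_G), T_H], reg_tpd=1, reg_tsetup=2)
    print(f"with real registers: period = {period} ns, latency = {lat} ns")

    cycles = 5
    rows = pipeline_diagram([("F", 1), ("G", 1), ("H", 2)], cycles)
    header = ["Clock cycle", "i"] + [f"i+{c}" for c in range(1, cycles)]
    print("".join(f"{h:<13}" for h in header))
    for name, cells in rows.items():
        print("".join(f"{v:<13}" for v in [name] + cells))
```

Fraction keeps throughputs exact (1/45, 1/25) instead of rounding them to floats. Note that F and G share stage 1, so the 2-stage pipeline is described by the stage delays [max(15, 20), 25] rather than by three separate stages.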