Let's Build a Processor

• Almost ready to move into Chapter 5 and start building a processor
• First, let's review Boolean logic and build the ALU we'll need (material from Appendix B)
[Figure: ALU symbol with 32-bit inputs a and b, an operation control input, and a 32-bit result]
© 2004 Morgan Kaufmann Publishers

Review: Boolean Algebra & Gates
• Problem: Consider a logic function with three inputs: A, B, and C.
  – Output D is true if at least one input is true
  – Output E is true if exactly two inputs are true
  – Output F is true only if all three inputs are true
• Show the truth table for these three functions.
• Show the Boolean equations for these three functions.
• Show an implementation consisting of inverters, AND, and OR gates.
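One way to check a hand-written truth table for this problem is to enumerate it. The sketch below is not from the slides: the function names `outputs` and `truth_table` are made up for illustration, and E is written as one possible sum-of-products form using only inverters, AND, and OR.

```python
from itertools import product

def outputs(a, b, c):
    """Return (D, E, F) for 1-bit inputs a, b, c."""
    na, nb, nc = 1 - a, 1 - b, 1 - c                  # inverters
    d = a | b | c                                     # at least one input true
    e = (a & b & nc) | (a & nb & c) | (na & b & c)    # exactly two inputs true
    f = a & b & c                                     # all three inputs true
    return d, e, f

def truth_table():
    """All eight input rows as (A, B, C, D, E, F) tuples."""
    return [(a, b, c) + outputs(a, b, c)
            for a, b, c in product((0, 1), repeat=3)]

print("A B C | D E F")
for a, b, c, d, e, f in truth_table():
    print(a, b, c, "|", d, e, f)
```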

An ALU (arithmetic logic unit)
• Let's build an ALU to support the andi and ori instructions
  – we'll just build a 1-bit ALU, and use 32 of them
[Figure: 1-bit ALU symbol with inputs a, b, and operation and output result, beside a truth table with columns op, a, b, res]
• Possible implementation (sum-of-products):
[Figure: gate-level sum-of-products circuit]

Review: The Multiplexor
• Selects one of the inputs to be the output, based on a control input
[Figure: multiplexor with data inputs A (selected when S = 0) and B (selected when S = 1), select input S, and output C]
Note: we call this a 2-input mux even though it has 3 inputs!
• Let's build our ALU using a MUX:
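A minimal sketch of the mux-based 1-bit ALU, assuming an encoding where operation 0 selects AND and operation 1 selects OR (the slide text does not state the encoding; the names `mux2` and `alu1_logic` are illustrative):

```python
def mux2(sel, in0, in1):
    """2-input multiplexor: output is in0 when sel = 0, in1 when sel = 1."""
    return ((1 - sel) & in0) | (sel & in1)

def alu1_logic(op, a, b):
    """1-bit logic ALU: both gates always compute, the mux picks one result."""
    and_out = a & b
    or_out = a | b
    return mux2(op, and_out, or_out)   # assumed encoding: 0 = AND, 1 = OR

for op in (0, 1):
    for a in (0, 1):
        for b in (0, 1):
            print("op", op, "a", a, "b", b, "->", alu1_logic(op, a, b))
```

Note that both the AND and the OR result are computed on every input; the control input only decides which one reaches the output.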

Different Implementations
• Not easy to decide the "best" way to build something
  – don't want too many inputs to a single gate
  – don't want to have to go through too many gates
  – for our purposes, ease of comprehension is important
• Let's look at a 1-bit ALU for addition:
  c_out = a·b + a·c_in + b·c_in
  sum = a xor b xor c_in
[Figure: 1-bit adder with inputs a, b, and CarryIn and outputs Sum and CarryOut]
• How could we build a 1-bit ALU for add, and, and or?
• How could we build a 32-bit ALU?
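The two adder equations can be checked exhaustively against ordinary integer addition. This is only a sketch; `full_adder` is an illustrative name.

```python
from itertools import product

def full_adder(a, b, c_in):
    """1-bit full adder built directly from the two slide equations."""
    c_out = (a & b) | (a & c_in) | (b & c_in)   # c_out = a·b + a·c_in + b·c_in
    total = a ^ b ^ c_in                        # sum = a xor b xor c_in
    return total, c_out

# The pair (c_out, sum) read as a 2-bit number must equal a + b + c_in.
for a, b, c_in in product((0, 1), repeat=3):
    s, c_out = full_adder(a, b, c_in)
    assert 2 * c_out + s == a + b + c_in
```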

Building a 32-bit ALU
[Figure: 32 copies of the 1-bit ALU (ALU0 to ALU31) with inputs a0..a31 and b0..b31 and outputs Result0..Result31; all share the Operation control, and each CarryOut feeds the CarryIn of the next ALU. Beside it, the 1-bit ALU: a mux whose inputs 0, 1, and 2 select the AND, OR, and adder results]
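The chaining in the figure can be sketched as a loop over bit positions. The mux numbering (0 = AND, 1 = OR, 2 = add) follows the figure; the function names `alu1` and `alu32` and the integer-based bit extraction are assumptions of this sketch.

```python
def alu1(op, a, b, c_in):
    """1-bit ALU: returns (result, carry_out). op: 0 = AND, 1 = OR, 2 = add."""
    total = a ^ b ^ c_in
    c_out = (a & b) | (a & c_in) | (b & c_in)
    result = (a & b, a | b, total)[op]          # the output mux
    return result, c_out

def alu32(op, a, b, c_in=0):
    """Ripple 32 one-bit ALUs: CarryOut of bit i feeds CarryIn of bit i + 1."""
    result, carry = 0, c_in
    for i in range(32):
        bit, carry = alu1(op, (a >> i) & 1, (b >> i) & 1, carry)
        result |= bit << i
    return result, carry                        # carry is CarryOut of ALU31

print(alu32(2, 5, 7))
```

The carry returned for AND and OR is computed but has no meaning for those operations.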

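What about subtraction (a – b)?
• Two's complement approach: just negate b and add.
• How do we negate?
• A very clever solution: invert every bit of b (Binvert) and set the CarryIn of the least significant bit to 1, since –b = (not b) + 1
[Figure: 1-bit ALU with a Binvert control that selects b or its inverse ahead of the AND, OR, and adder; mux inputs 0, 1, and 2 are selected by Operation]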
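A word-level sketch of the same trick, assuming 32-bit operands held in Python integers (`MASK` and `add_sub32` are illustrative names): Binvert flips b, and the same control bit is fed in as the carry-in.

```python
MASK = 0xFFFFFFFF  # keep values to 32 bits

def add_sub32(a, b, binvert):
    """Add when binvert = 0; subtract (a - b) when binvert = 1."""
    b_eff = (b & MASK) ^ (MASK if binvert else 0)   # invert every bit of b
    carry_in = binvert                              # CarryIn = 1 completes the negation
    return ((a & MASK) + b_eff + carry_in) & MASK

print(add_sub32(7, 5, 1))        # 7 - 5
print(hex(add_sub32(5, 7, 1)))   # 5 - 7, shown as a 32-bit two's complement pattern
```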

Adding a NOR function
• Can also choose to invert a. How do we get "a NOR b"?
[Figure: 1-bit ALU with Ainvert and Binvert controls ahead of the AND, OR, and adder; mux inputs 0, 1, and 2 are selected by Operation]
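One answer follows from De Morgan's law: a NOR b = (not a) AND (not b), so asserting both Ainvert and Binvert and selecting the AND output gives NOR. A minimal check, with `alu1_logic` as an illustrative name and operation 0 assumed to select AND:

```python
def alu1_logic(ainvert, binvert, op, a, b):
    """Logic part of the 1-bit ALU with optional inversion of each input."""
    a_eff = a ^ ainvert                 # xor with 1 inverts, xor with 0 passes through
    b_eff = b ^ binvert
    return (a_eff & b_eff) if op == 0 else (a_eff | b_eff)

for a in (0, 1):
    for b in (0, 1):
        nor = 1 - (a | b)
        assert alu1_logic(1, 1, 0, a, b) == nor
```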

Tailoring the ALU to the MIPS
• Need to support the set-on-less-than instruction (slt)
  – remember: slt is an arithmetic instruction
  – produces a 1 if rs < rt and 0 otherwise
  – use subtraction: (a – b) < 0 implies a < b
• Need to support test for equality (beq $t5, $t6, Label)
  – use subtraction: (a – b) = 0 implies a = b

Supporting slt
• Can we figure out the idea?
[Figure: two versions of the 1-bit ALU, each with Ainvert, Binvert, CarryIn, Operation, and a Less input on mux input 3. One, with a CarryOut, is used for all other bits; the other, used for the most significant bit, adds a Set output and overflow detection]

Supporting slt
[Figure: 32-bit ALU built from ALU0 to ALU31 with shared Binvert, Ainvert, and Operation lines; the Set output of ALU31 feeds the Less input of ALU0, the Less inputs of ALU1 to ALU31 are 0, and ALU31 also produces Overflow]
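The wiring can be sketched bit by bit: subtract with Binvert = 1 and CarryIn = 1, take the adder output of bit 31 as Set, and compare the carry into and out of bit 31 for overflow. The function name `slt32` is illustrative. As in the figure, Set is taken straight from the adder output, so this sketch gives the wrong answer in the cases where the subtraction overflows.

```python
def slt32(a, b):
    """Return (set_bit, overflow) for a - b on 32-bit two's complement inputs."""
    carry = 1                                   # CarryIn of ALU0 is 1 for subtraction
    carry_in = sum_bit = 0
    for i in range(32):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ 1                 # Binvert = 1
        carry_in = carry
        sum_bit = ai ^ bi ^ carry_in
        carry = (ai & bi) | (ai & carry_in) | (bi & carry_in)
    set_bit = sum_bit                           # adder output of ALU31, routed to Less of ALU0
    overflow = carry_in ^ carry                 # carry into bit 31 differs from carry out
    return set_bit, overflow

print(slt32(3, 5))            # 3 < 5
print(slt32(0x80000000, 1))   # most negative value minus 1 overflows
```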

Test for equality
• Notice control lines (Ainvert, Bnegate, Operation):
  0000 = and
  0001 = or
  0010 = add
  0110 = subtract
  0111 = slt
  1100 = NOR
• Note: Zero is a 1 when the result is zero!
[Figure: 32-bit ALU built from ALU0 to ALU31 with shared Bnegate, Ainvert, and Operation lines; Set of ALU31 feeds Less of ALU0; a Zero output is derived from Result0 to Result31; ALU31 also produces Overflow]
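The six control codes can be decoded at word level as follows. This is a sketch under assumptions: the 4-bit code is read as Ainvert, Bnegate, then the 2-bit Operation, values are held as 32-bit Python integers, and slt ignores overflow; `alu` and `MASK` are illustrative names.

```python
MASK = 0xFFFFFFFF

def alu(control, a, b):
    """Return (result, zero) for a 4-bit control code."""
    ainvert = (control >> 3) & 1
    bnegate = (control >> 2) & 1                # drives both Binvert and CarryIn
    op = control & 0b11
    a_eff = (a & MASK) ^ (MASK if ainvert else 0)
    b_eff = (b & MASK) ^ (MASK if bnegate else 0)
    total = (a_eff + b_eff + bnegate) & MASK    # adder output
    if op == 0:
        result = a_eff & b_eff                  # and (NOR when both inputs inverted)
    elif op == 1:
        result = a_eff | b_eff                  # or
    elif op == 2:
        result = total                          # add or subtract
    else:
        result = total >> 31                    # slt: sign of a - b lands in bit 0
    zero = int(result == 0)                     # Zero is 1 when the result is zero
    return result, zero

print(alu(0b0110, 7, 7))   # subtract equal values: Zero is asserted, as beq needs
```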

Conclusion
• We can build an ALU to support the MIPS instruction set
  – key idea: use a multiplexor to select the output we want
  – we can efficiently perform subtraction using two's complement
  – we can replicate a 1-bit ALU to produce a 32-bit ALU
• Important points about hardware
  – all of the gates are always working
  – the speed of a gate is affected by the number of inputs to the gate
  – the speed of a circuit is affected by the number of gates in series (on the "critical path" or the "deepest level of logic")
• Our primary focus is comprehension; however,
  – clever changes to organization can improve performance (similar to using better algorithms in software)
  – we saw this in multiplication; let's look at addition now

Problem: ripple carry adder is slow
• Is a 32-bit ALU as fast as a 1-bit ALU?
• Is there more than one way to do addition?
  – two extremes: ripple carry and sum-of-products
• Can you see the ripple? How could you get rid of it?
  c1 = b0·c0 + a0·c0 + a0·b0
  c2 = b1·c1 + a1·c1 + a1·b1      c2 = ____
  c3 = b2·c2 + a2·c2 + a2·b2      c3 = ____
  c4 = b3·c3 + a3·c3 + a3·b3      c4 = ____
• Not feasible! Why?

Carry-lookahead adder
• An approach in between our two extremes
• Motivation:
  – If we didn't know the value of carry-in, what could we do?
  – When would we always generate a carry? gi = ai·bi
  – When would we propagate the carry? pi = ai + bi
• Did we get rid of the ripple?
  c1 = g0 + p0·c0
  c2 = g1 + p1·c1      c2 = ____
  c3 = g2 + p2·c2      c3 = ____
  c4 = g3 + p3·c3      c4 = ____
• Feasible! Why?
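Substituting each carry into the next gives every carry as a two-level function of the g, p, and c0 signals, with no dependence on an earlier carry. The sketch below writes out those expanded forms for 4 bits and checks them against integer addition; `cla4_carries` is an illustrative name and the operands are assumed to be 4-bit Python integers.

```python
def cla4_carries(a, b, c0):
    """Return (c1, c2, c3, c4) using only generate/propagate signals and c0."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]       # gi = ai·bi
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]     # pi = ai + bi
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    return c1, c2, c3, c4

print(cla4_carries(0b1111, 0b0001, 0))
```

Each expression is a single OR of AND terms, so all four carries are available after the same small number of gate delays.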

Use principle to build bigger adders
• Can't build a 16-bit adder this way... (too big)
• Could use ripple carry of 4-bit CLA adders
• Better: use the CLA principle again!
[Figure: four 4-bit ALUs (ALU0 to ALU3) producing Result0–3, Result4–7, Result8–11, and Result12–15; each sends block signals Pi and Gi to a carry-lookahead unit, which returns the carries C1 to C4; C4 is CarryOut]
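A sketch of the second level, assuming each 4-bit block reports a block propagate P (all four bits propagate) and a block generate G (a carry is generated somewhere in the block and propagated out). The names `block_pg` and `add16` are illustrative, and each block's internal 4-bit sum is abstracted with integer addition.

```python
def block_pg(a4, b4):
    """Block propagate and generate for one 4-bit slice."""
    g = [(a4 >> i) & (b4 >> i) & 1 for i in range(4)]
    p = [((a4 >> i) | (b4 >> i)) & 1 for i in range(4)]
    P = p[0] & p[1] & p[2] & p[3]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return P, G

def add16(a, b, c0=0):
    """16-bit add from four 4-bit blocks plus a carry-lookahead unit."""
    slices = [((a >> 4 * k) & 0xF, (b >> 4 * k) & 0xF) for k in range(4)]
    (P0, G0), (P1, G1), (P2, G2), (P3, G3) = (block_pg(x, y) for x, y in slices)
    # Same pattern as the bit-level equations, one level up.
    C1 = G0 | (P0 & c0)
    C2 = G1 | (P1 & G0) | (P1 & P0 & c0)
    C3 = G2 | (P2 & G1) | (P2 & P1 & G0) | (P2 & P1 & P0 & c0)
    C4 = (G3 | (P3 & G2) | (P3 & P2 & G1) | (P3 & P2 & P1 & G0)
          | (P3 & P2 & P1 & P0 & c0))
    result = 0
    for k, ((x, y), cin) in enumerate(zip(slices, (c0, C1, C2, C3))):
        result |= ((x + y + cin) & 0xF) << (4 * k)
    return result, C4

print(add16(0x0FFF, 0x0001))
```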

ALU Summary
• We can build an ALU to support MIPS addition
• Our focus is on comprehension, not performance
• Real processors use more sophisticated techniques for arithmetic
• Where performance is not critical, hardware description languages allow designers to completely automate the creation of hardware!

Chapter Five

The Processor: Datapath & Control
• We're ready to look at an implementation of the MIPS
• Simplified to contain only:
  – memory-reference instructions: lw, sw
  – arithmetic-logical instructions: add, sub, and, or, slt
  – control flow instructions: beq, j
• Generic implementation:
  – use the program counter (PC) to supply instruction address
  – get the instruction from memory
  – read registers
  – use the instruction to decide exactly what to do
• All instructions except j use the ALU after reading the registers. Why? memory-reference? arithmetic? control flow?
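The "use the instruction to decide exactly what to do" step starts by splitting the 32-bit word into fields. The sketch below is not from the slides: it relies on the standard MIPS field positions and opcode values (lw 0x23, sw 0x2B, beq 0x04, j 0x02, R-type 0x00), and the names `decode` and `instruction_class` are made up for illustration.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into its fixed fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,
        "rs": (instr >> 21) & 0x1F,     # first register to read
        "rt": (instr >> 16) & 0x1F,     # second register to read (or load target)
        "rd": (instr >> 11) & 0x1F,     # destination for R-type instructions
        "funct": instr & 0x3F,          # selects the R-type operation
        "imm": instr & 0xFFFF,          # offset for lw, sw, beq
    }

def instruction_class(instr):
    """Classify into the three groups of the simplified subset."""
    op = decode(instr)["opcode"]
    if op in (0x23, 0x2B):
        return "memory-reference"       # lw, sw
    if op == 0x00:
        return "arithmetic-logical"     # add, sub, and, or, slt
    if op in (0x04, 0x02):
        return "control flow"           # beq, j
    return "unsupported"

word = 0x012A4020                       # add $t0, $t1, $t2
print(decode(word), instruction_class(word))
```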

More Implementation Details
• Abstract / simplified view:
[Figure: abstract datapath with the PC, instruction memory, register file, ALU, data memory, and two adders, one of which has a constant input of 4]
• Two types of functional units:
  – elements that operate on data values (combinational)
  – elements that contain state (sequential)

State Elements
• Unclocked vs. clocked
• Clocks used in synchronous logic
  – when should an element that contains state be updated?
[Figure: clock waveform labeled with rising edge, falling edge, and clock period (cycle time)]

An unclocked state element
• The set-reset latch
  – output depends on present inputs and also on past inputs
[Figure: set-reset latch with inputs R and S and outputs Q and its complement]
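A small simulation shows how the output can depend on past inputs. It assumes the latch is the usual pair of cross-coupled NOR gates (the figure itself is not recoverable from the text), and it settles the feedback loop by re-evaluating the gates a few times; `sr_latch` is an illustrative name.

```python
def sr_latch(s, r, q, q_bar):
    """Settle a cross-coupled NOR pair and return the new (Q, Q_bar)."""
    for _ in range(4):                      # a few passes are enough to settle
        q_new = 1 - (r | q_bar)             # Q     = NOR(R, Q_bar)
        q_bar_new = 1 - (s | q_new)         # Q_bar = NOR(S, Q)
        if (q_new, q_bar_new) == (q, q_bar):
            break
        q, q_bar = q_new, q_bar_new
    return q, q_bar

state = (0, 1)
state = sr_latch(1, 0, *state)   # set
print(state)
state = sr_latch(0, 0, *state)   # inputs released: the value is remembered
print(state)
state = sr_latch(0, 1, *state)   # reset
print(state)
```

With S = R = 0 the same inputs produce different outputs depending on what was stored earlier, which is what makes this a state element.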

Latches and Flip-flops
• Output is equal to the stored value inside the element (don't need to ask for permission to look at the value)
• Change of state (value) is based on the clock
• Latches: state changes whenever the inputs change and the clock is asserted
  – "asserted" means logically true, which could mean electrically low
• Flip-flop: state changes only on a clock edge (edge-triggered methodology)
• A clocking methodology defines when signals can be read and written
  – wouldn't want to read a signal at the same time it was being written

D-latch
• Two inputs:
  – the data value to be stored (D)
  – the clock signal (C) indicating when to read & store D
• Two outputs:
  – the value of the internal state (Q) and its complement
[Figure: D latch circuit with inputs C and D and outputs Q and its complement, with a timing diagram of D, C, and Q]
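A behavioral sketch of the D latch, assuming C is asserted when it is 1; the class name `DLatch` and its `update` method are illustrative, not part of the slides.

```python
class DLatch:
    """Level-sensitive latch: Q follows D while C is asserted, else holds."""

    def __init__(self):
        self.q = 0

    def update(self, c, d):
        if c:                       # clock asserted: latch is transparent
            self.q = d
        return self.q, 1 - self.q   # (Q, complement of Q)

latch = DLatch()
print(latch.update(1, 1))   # C asserted: D is stored
print(latch.update(0, 0))   # C deasserted: D is ignored, old value held
```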

D flip-flop
• Output changes only on the clock edge
[Figure: D flip-flop built from two D latches in series, with a timing diagram of D, C, and Q]
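A behavioral sketch of a flip-flop made from two latches, assuming a falling-edge arrangement in which the first latch is open while C = 1 and the second while C = 0. The class name `DFlipFlop` is illustrative, and the edge polarity is an assumption of this sketch.

```python
class DFlipFlop:
    """Two latches in series; Q changes only when C falls from 1 to 0."""

    def __init__(self):
        self.master = 0   # first latch, open while C = 1
        self.slave = 0    # second latch, open while C = 0

    def update(self, c, d):
        if c:
            self.master = d             # D is tracked, Q is held
        else:
            self.slave = self.master    # value captured before the edge reaches Q
        return self.slave

ff = DFlipFlop()
for c, d in [(1, 1), (0, 1), (0, 0), (1, 0), (0, 1)]:
    print("C", c, "D", d, "-> Q", ff.update(c, d))
```

Because the two latches are never open at the same time, a change on D while C is low cannot reach Q until the next falling edge.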
