CS35101 Ch3.Part1 27 Steinfadt SP08 KSU
CS 35101 Computer Architecture, Spring 2008
Chapter 3 Part 2 (3.4-3.6, Appendix B)
Taken from Mary Jane Irwin (www.cse.psu.edu/~mji) and Kevin Schaffer [adapted from D. Patterson slides]
Heads Up
Last week’s material
MIPS arithmetic
- Reading assignment – 3.1-3.3
Exam 1 on 2/21 Thursday
This week’s material
MIPS arithmetic and ALU design
- Reading assignment – 3.4-3.5, B.1-B.5
Overflow Detection
Overflow occurs when the result is too large to represent in the number of bits allocated:
- adding two positives yields a negative, or
- adding two negatives gives a positive, or
- subtracting a negative from a positive gives a negative, or
- subtracting a positive from a negative gives a positive
On your own: prove you can detect overflow by: carry into MSB xor carry out of MSB
Examples (4-bit two's complement):
  0111 (7) + 0011 (3) = 1010 (-6)
  1100 (-4) + 1011 (-5) = 0111 (7)
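The carry-based check can be tried out in software. The following is a minimal sketch, not hardware: the function names `add_with_overflow` and `to_signed` and the default 4-bit width are assumptions made for illustration.

```python
def add_with_overflow(a, b, bits=4):
    """Add two `bits`-wide two's complement bit patterns.

    Returns (result_pattern, overflow), where overflow is computed as
    (carry into the MSB) XOR (carry out of the MSB).
    """
    mask = (1 << bits) - 1
    low_mask = mask >> 1                  # every bit below the MSB
    a &= mask
    b &= mask
    # Carry into the MSB: add only the lower bits and look at bit (bits - 1).
    carry_in_msb = ((a & low_mask) + (b & low_mask)) >> (bits - 1)
    total = a + b
    # Carry out of the MSB: bit `bits` of the full sum.
    carry_out_msb = total >> bits
    return total & mask, bool(carry_in_msb ^ carry_out_msb)


def to_signed(pattern, bits=4):
    """Interpret a bit pattern as a two's complement signed value."""
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern
```

With 4-bit operands, `add_with_overflow(0b0111, 0b0011)` reports overflow and a result pattern that reads as -6, while adding -1 and -1 produces a carry out but no overflow.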
What operations does an ALU need to handle?
Adds: add, addi, addiu, addu
Subs: sub, subu
Multiply / Divide: mult, multu, div, divu
Logical Ops: and, andi, nor, or, ori, xor, xori
Branch / Comps: beq, bne, slt, slti, sltiu, sltu
Assume that the immediates are handled before they reach the ALU
Adds: add, addu
Subs: sub, subu
Multiply / Divide: mult, multu, div, divu
Logical Ops: and, nor, or, xor
Branch / Comps: beq, bne, slt, sltu
Multiply and Divide will get their own hardware
Adds: add, addu
Subs: sub, subu
Logical Ops: and, nor, or, xor
Branch / Comps: beq, bne, slt, sltu
Checking for equality can be done with arithmetic functions (a = b if a - b = 0)
Building a 1-bit ALU
What are the opcodes for the remaining functions that the ALU must support?
What about the function codes?
What is the leading hex value for each Function Code?
0 add
1 addu
2 sub
3 subu
4 and
5 or
6 xor
7 nor
a slt
b sltu
MIPS Arithmetic and Logic Instructions
Bit positions: 31 25 20 15 5 0
R-type: op | Rs | Rt | Rd | funct
I-type: op | Rs | Rt | Immed 16

Type    op      funct
ADDI    001000  xx
ADDIU   001001  xx
SLTI    001010  xx
SLTIU   001011  xx
ANDI    001100  xx
ORI     001101  xx
XORI    001110  xx
LUI     001111  xx

Type    op      funct
ADD     000000  100000
ADDU    000000  100001
SUB     000000  100010
SUBU    000000  100011
AND     000000  100100
OR      000000  100101
XOR     000000  100110
NOR     000000  100111

Type    op      funct
        000000  101000
        000000  101001
SLT     000000  101010
SLTU    000000  101011
        000000  101100
Design Trick: Divide & Conquer
Break the problem into simpler problems, solve them, and glue together the solution
Example: assume the immediates have been taken care of before the ALU
- now down to 10 operations, which can be encoded in 4 bits
0 add
1 addu
2 sub
3 subu
4 and
5 or
6 xor
7 nor
a slt
b sltu
Next up: Appendix B, The Basics of Logic Design
Combinational vs. Sequential
Combinational logic has no memory; outputs depend entirely on inputs
Sequential logic has memory; outputs depend on both inputs and the current contents of memory
Memory in sequential logic is called state
Combinational Logic
Truth tables
Logic equations
Gates
Truth Tables
Gives values of outputs for each combination of inputs
A logic block with n inputs is defined by a truth table with $2^n$ entries
[Truth table figure: columns A, B, C, D grouped under Inputs and Outputs]
Logic Equations
OR (logical sum): A + B
AND (logical product): A · B
NOT (logical complement): A'
Boolean Algebra
Identity laws
- A + 0 = A
- A · 1 = A
Zero and One laws
- A + 1 = 1
- A · 0 = 0
Inverse laws
- A + A' = 1
- A · A' = 0
Boolean Algebra (cont'd)
Commutative laws
- A + B = B + A
- A · B = B · A
Associative laws
- A + (B + C) = (A + B) + C
- A · (B · C) = (A · B) · C
Distributive laws
- A · (B + C) = (A · B) + (A · C)
- A + (B · C) = (A + B) · (A + C)
Gates
AND gate
OR gate
Inverter (NOT gate)
Inversion Bubbles
Inverters are so commonly used that designers
have developed a shorthand notation
Instead of using explicit inverters, you can attach
bubbles to the inputs or outputs of other gates
Universal Gates
Any combinational function can be built
from AND, OR and NOT gates
However, there are universal gates that
alone can implement any function
NAND and NOR are two such gates
NAND and NOR are AND and OR gates with inverted outputs
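To see why NAND alone is enough, NOT, AND and OR can each be written using only NAND. This is a small illustrative sketch; the function names are hypothetical and bits are modeled as the integers 0 and 1.

```python
def nand(a, b):
    """NAND gate on single bits (0 or 1)."""
    return 1 - (a & b)


def not_(a):
    # Tie both NAND inputs together to get an inverter.
    return nand(a, a)


def and_(a, b):
    # AND is NAND followed by an inverter.
    return not_(nand(a, b))


def or_(a, b):
    # De Morgan: A + B = (A' . B')'
    return nand(not_(a), not_(b))
```

The same construction works with NOR by swapping the roles of AND and OR.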
Decoder
A decoder asserts exactly one of its $2^n$ outputs for each combination of its n inputs
The n inputs are interpreted as an n-bit binary number
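Decoder (cont'd)
Inputs      | Outputs
In2 In1 In0 | Out7 Out6 Out5 Out4 Out3 Out2 Out1 Out0
0   0   0   | 0    0    0    0    0    0    0    1
0   0   1   | 0    0    0    0    0    0    1    0
0   1   0   | 0    0    0    0    0    1    0    0
0   1   1   | 0    0    0    0    1    0    0    0
1   0   0   | 0    0    0    1    0    0    0    0
1   0   1   | 0    0    1    0    0    0    0    0
1   1   0   | 0    1    0    0    0    0    0    0
1   1   1   | 1    0    0    0    0    0    0    0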
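A decoder's behavior is easy to model in a few lines. In this sketch the function name `decoder` and the list-based output (index i stands for Out_i) are assumptions chosen for illustration.

```python
def decoder(value, n):
    """Model an n-to-2**n decoder.

    `value` is the n-bit input read as a binary number. The result is a list
    of 2**n output bits in which index i corresponds to Out_i; exactly one
    entry is 1.
    """
    if not 0 <= value < (1 << n):
        raise ValueError("input does not fit in n bits")
    return [1 if i == value else 0 for i in range(1 << n)]
```

For a 3-to-8 decoder, the input 101 (decimal 5) asserts Out5 and nothing else.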
Multiplexor
A multiplexor selects one of its $2^n$ data inputs, based on the value of its n selector inputs, to become its output
The n selector inputs are interpreted as an n-bit binary number
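One way to picture a multiplexor is as a decoder on the selector inputs whose lines gate the data inputs, followed by an OR. The sketch below follows that picture; the name `mux` and the convention that selector bits are listed most significant first are assumptions for illustration.

```python
def mux(data, select_bits):
    """Model a 2**n-to-1 multiplexor for single-bit data inputs.

    `data` holds the 2**n data inputs; `select_bits` holds the n selector
    bits, most significant first.
    """
    n = len(select_bits)
    if len(data) != 1 << n:
        raise ValueError("need exactly 2**n data inputs")
    # Read the selector bits as an n-bit binary number.
    index = 0
    for bit in select_bits:
        index = (index << 1) | bit
    # AND each data input with its decoder line, then OR the results.
    out = 0
    for i, d in enumerate(data):
        line = 1 if i == index else 0
        out |= d & line
    return out
```

For example, with data inputs `[0, 1, 1, 0]` the selector value 10 (decimal 2) passes the third input through.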
Arrays of Logic Elements
Two-Level Logic
Any combinational function can be expressed in a
canonical two-level representation
Sum of products is a logical sum (OR) of logical
products (AND)
Product of sums is just the opposite
Sum of Products
Full Adder
[Figure: full adder with inputs a, b, CarryIn and outputs Sum, CarryOut]
A B CIN | COUT SUM
0 0 0   | 0    0
0 0 1   | 0    1
0 1 0   | 0    1
0 1 1   | 1    0
1 0 0   | 0    1
1 0 1   | 1    0
1 1 0   | 1    0
1 1 1   | 1    1
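The full adder truth table corresponds to two short logic equations, and chaining full adders gives a ripple carry adder. This is a minimal sketch under the assumption that bits are stored least significant first; the function names are hypothetical.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (least significant bit first).

    Each full adder's carry out feeds the next adder's carry in.
    Returns (sum_bits, carry_out).
    """
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry
```

Adding 0111 and 0011 as 4-bit patterns yields 1010 with no carry out of the top bit.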
Subtraction
Subtraction is implemented by negating the second operand before addition
To negate a number in two's complement, invert the bits and add one
a - b = a + (-b) = a + (~b + 1) = a + ~b + 1
We can take advantage of the carry in to the LSB in order to add one to the result
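The identity a - b = a + ~b + 1 can be checked directly on fixed-width bit patterns. In this sketch the name `subtract` and the default 32-bit width are assumptions; the constant 1 plays the role of the carry in to the LSB.

```python
def subtract(a, b, bits=32):
    """Compute a - b on `bits`-wide patterns using only inversion and addition."""
    mask = (1 << bits) - 1
    inverted_b = ~b & mask           # invert every bit of b
    carry_in = 1                     # the "+ 1" enters through the LSB carry in
    return (a + inverted_b + carry_in) & mask
```

With 4-bit operands, `subtract(3, 7, bits=4)` gives the pattern 1100, which is -4 in two's complement.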
Adder/Subtractor
Delay in Ripple Carry Adders
Ripple carry adders are simple, but slow
The critical path (longest path any signal takes) goes through all the full adders
Therefore the delay is O(k) for a k-bit adder
Design trick: throw hardware at it (Carry Lookahead; info located in Appendix B.6)
Clocks
A clock is a logic signal that oscillates between
0 and 1 with a fixed frequency
When the clock transitions from 0 to 1, this is
called a rising (or positive) edge; a transition from 1 to 0 is a falling (or negative) edge
Logic can be built to respond to the value of the clock (level-sensitive) or to its edges (edge-sensitive)
Clocks (cont'd)
Clock period (or clock cycle time) is the inverse of the clock
frequency
Example: a clock with a period of 500 ps has a frequency of
2 GHz
Latches
Latches are level-sensitive
storage elements
The simplest type of latch
is the S-R latch (set-reset latch)
Q is the currently stored
value
Clocked Latches
A D latch (data latch) is an
example of a clocked latch
When the clock (C) is high,
the data input (D) is copied to the output
When the clock is low, the output remains unchanged
Latch Example (rising edge trigger)
Flip-Flops
Flip-flops are edge-sensitive storage elements
A D flip-flop updates its
stored value only on a rising (or falling) clock edge
A register consists of
multiple D flip-flops with a common clock
[Figure: D flip-flop with a falling-edge trigger]
Flip-Flop Example (falling edge trigger)
Setup and Hold Time
Inputs must be stable before (setup time) and after (hold
time) the clock edge
Failure to meet setup and hold time requirements may
result in unpredictable behavior
Mixed Logic 1
Data from state element 1 propagates through the
combinational logic
At the clock edge, the output is sampled and stored into
state element 2
Mixed Logic 2
With edge-sensitive clocking, it's possible to write back into
the same state element
Register Files
Array of registers
Multiple ports allow multiple simultaneous reads and writes
Used to implement general-purpose registers in MIPS
Register File Write Port
Register File Read Ports
Multiplication
More complicated than addition
- Can be accomplished via shifting and adding
        0010   (multiplicand)
      x 1011   (multiplier)
        ----
        0010
       0010    (partial product
      0000      array)
     0010
    --------
    00010110   (product)
Double precision product produced
More time and more area to compute
MIPS Multiply Instruction
Multiply produces a double precision product
mult $s0, $s1 # hi||lo = $s0 * $s1
op | rs | rt | rd | shamt | funct
Low-order word of the product is left in processor register lo and the high-order word is left in register hi
Instructions mfhi rd and mflo rd are provided to move the product to (user accessible) registers in the register file
Multiplies are done by fast, dedicated hardware
and are much more complex (and slower) than adders
Hardware dividers are even more complex and
even slower; ditto for hardware square root
Shift-Add Multiplier (1)
[Figure: datapath with a 64-bit Multiplicand register (shift left), a 64-bit ALU, a 64-bit Product register (write), a 32-bit Multiplier register (shift right), and a control test]
Multiplication Algorithm (1)
Repeat the following steps 32 times:
- If Multiplier bit 0 is 1, add the multiplicand to the product and place the result in the Product register
- Shift the Multiplicand register left 1 bit
- Shift the Multiplier register right 1 bit
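The first algorithm can be traced in software. This sketch treats the operands as unsigned and models the double-width Multiplicand and Product registers with a mask; the function name and the `bits` parameter (32 in the slides) are assumptions for illustration.

```python
def shift_add_multiply_v1(multiplicand, multiplier, bits=32):
    """Unsigned shift-add multiply with a double-width multiplicand register."""
    single = (1 << bits) - 1
    double = (1 << (2 * bits)) - 1
    mcand = multiplicand & single    # starts in the right half of a 2*bits register
    mplier = multiplier & single
    product = 0
    for _ in range(bits):
        if mplier & 1:                           # test Multiplier bit 0
            product = (product + mcand) & double
        mcand = (mcand << 1) & double            # shift Multiplicand left
        mplier >>= 1                             # shift Multiplier right
    return product
```

Running it with 4-bit operands 0010 and 1011 gives 00010110 (decimal 22).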
Shift-Add Multiplier (2)
[Figure: datapath with a 32-bit Multiplicand register, a 32-bit ALU, a 64-bit Product register (shift right, write), a 32-bit Multiplier register (shift right), and a control test]
Multiplication Algorithm (2)
Repeat the following steps 32 times:
- If Multiplier bit 0 is 1, add the multiplicand to the left half of the product and place the result in the left half of the Product register
- Shift the Product register right 1 bit
- Shift the Multiplier register right 1 bit
Shift-Add Multiplier (3)
[Figure: datapath with a 32-bit Multiplicand register, a 32-bit ALU, a 64-bit Product register (shift right, write), and a control test]
Multiplication Algorithm (3)
Repeat the following steps 32 times:
- If Product bit 0 is 1, add the multiplicand to the left half of the product and place the result in the left half of the Product register
- Shift the Product register right 1 bit
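The third version can be sketched the same way. The code below assumes, as the test on Product bit 0 suggests, that the multiplier is preloaded into the right half of the Product register; operands are treated as unsigned, and the function name and `bits` parameter are hypothetical.

```python
def shift_add_multiply_v3(multiplicand, multiplier, bits=32):
    """Unsigned shift-add multiply that shares the Product register.

    Assumption: the right half of the Product register initially holds the
    multiplier, so no separate Multiplier register is needed.
    """
    single = (1 << bits) - 1
    mcand = multiplicand & single
    product = multiplier & single
    for _ in range(bits):
        if product & 1:                  # test Product bit 0
            # Add into the left half. The sum may briefly need one extra
            # bit (the adder's carry out), which the shift below absorbs.
            product += mcand << bits
        product >>= 1                    # shift Product right
    return product
```

Compared with the first version, the adder and the Multiplicand register are only `bits` wide, and the multiplier bits are consumed as the product shifts right.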
Parallel Multipliers
Parallel multipliers trade additional hardware for
faster multiplication
A carry save adder reduces a sum of three operands to a sum of two operands
Multiple levels of carry save adders combined with
a final normal adder can add multiple operands in logarithmic time
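A carry save adder is just a row of independent full adders, so its effect on whole words can be written with bitwise operations. The sketch below is illustrative only: the names `carry_save_add` and `add_many` are assumptions, and the simple grouping of operands in threes stands in for a real adder tree.

```python
def carry_save_add(x, y, z):
    """Reduce three operands to two whose sum is unchanged.

    Returns (sum_word, carry_word) with x + y + z == sum_word + carry_word.
    No carry propagates between bit positions.
    """
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


def add_many(operands):
    """Add non-negative integers using levels of carry save adders."""
    ops = list(operands)
    while len(ops) > 2:
        groups = len(ops) // 3
        next_level = []
        for g in range(groups):
            next_level.extend(carry_save_add(*ops[3 * g:3 * g + 3]))
        next_level.extend(ops[3 * groups:])   # leftovers pass through unchanged
        ops = next_level
    return sum(ops)                           # final normal (carry-propagating) add
```

Each level shrinks the operand count by roughly a third, which is where the logarithmic number of levels comes from.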
Tree of Adders
Tree of Carry Save Adders
Division
Division is just a bunch of quotient digit guesses and left shifts and subtracts
[Figure: long-division layout with dividend, divisor, quotient, partial remainder array, and remainder]
MIPS Divide Instruction
Divide generates the remainder in hi and the quotient in lo
div $s0, $s1 # lo = $s0 / $s1
             # hi = $s0 mod $s1
op | rs | rt | rd | shamt | funct
Instructions mflo rd and mfhi rd are provided to move the quotient and remainder to (user accessible) registers in the register file
As with multiply, divide ignores overflow, so software must determine if the quotient is too large. Software must also check the divisor to avoid division by 0.
Division
Once again we go back to the long-hand decimal
algorithm to understand the binary algorithm
The dividend is divided by the divisor which
produces a quotient and a remainder
Shift-Subtract Divider 1
Division Algorithm 1
Repeat the following steps 33 times:
- Subtract the Divisor register from the Remainder register and place the result in the Remainder register
- If Remainder is greater than or equal to zero, shift the Quotient left and insert a 1
- If Remainder is less than zero, add the Divisor back to the Remainder; also shift the Quotient left and insert a 0
- Shift the Divisor register right 1 bit
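The steps can be followed in a short simulation. This sketch handles unsigned operands only and assumes the usual starting state for this style of divider: the dividend in the Remainder register and the divisor in the left half of a double-width Divisor register. The function name and the `bits` parameter (32 in the slides, giving 33 iterations) are hypothetical.

```python
def shift_subtract_divide(dividend, divisor, bits=32):
    """Unsigned restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("software must check for a zero divisor")
    remainder = dividend               # Remainder register starts with the dividend
    divisor_reg = divisor << bits      # assumption: divisor starts in the left half
    quotient = 0
    for _ in range(bits + 1):
        remainder -= divisor_reg
        if remainder >= 0:
            quotient = (quotient << 1) | 1
        else:
            remainder += divisor_reg   # restore the previous remainder
            quotient = quotient << 1
        divisor_reg >>= 1
    return quotient & ((1 << bits) - 1), remainder
```

With 4-bit operands (5 iterations), dividing 7 by 2 leaves a quotient of 3 and a remainder of 1.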
Shift-Subtract Divider 2
Real Numbers
There are too many real numbers to represent
accurately in a computer
Instead we use approximations to real
numbers
Fixed-point numbers split a number into a whole part and a fractional part (radix point is fixed)
- Does not require special hardware
Floating-point numbers allow the radix point to move by encoding its position explicitly
- More flexible
Binary Floating-Point
Conversion from decimal to binary
- Use repeated division for the whole part
- Use repeated multiplication for the fraction
From binary to decimal
- Multiply each bit by its weight
- Bits after the radix point have weights $2^{-1}$, $2^{-2}$, etc.
Normalization ensures that there is exactly one bit to the left of the radix point
- Results in a unique representation for every number
Floating-Point Formats
In the past each architecture had its own floating-point format
- This made it difficult to exchange data
Today virtually all computers use the IEEE floating-point standard
- Sometimes non-standard formats are used for extra precision
- Sometimes only parts of the standard are implemented in order to improve performance
Representing Big (and Small) Numbers
What if we want to encode the approx. age of the earth?
  4,600,000,000 or $4.6 \times 10^{9}$
or the weight in kg of one a.m.u. (atomic mass unit)?
  0.00000000000000000000000000166 or $1.66 \times 10^{-27}$
There is no way we can encode either of the above in a 32-bit integer.
Floating point representation: $(-1)^{\text{sign}} \times F \times 2^{E}$
Still have to fit everything in 32 bits (single precision)
  s     | E (exponent) | F (fraction)
  1 bit | 8 bits       | 23 bits
The base (2, not 10) is hardwired in the design of the FPALU
More bits in the fraction (F) or the exponent (E) is a trade-off between precision (accuracy of the number) and range (size of the number)
IEEE Floating-Point Standard
Format
- Sign bit
- Exponent (biased)
- Fraction (with a hidden bit)
Two sizes
- Single-precision has an 8-bit exponent and a 23-bit fraction (32-bit total); bias is 127
- Double-precision has an 11-bit exponent and a 52-bit fraction (64-bit total); bias is 1023
IEEE FP Numbers
The value of a normal number is
$(-1)^{\text{sign}} \times (1 + F) \times 2^{E - \text{bias}}$
- sign is the sign bit (s)
- F is the fraction
- E is the biased (stored) exponent, so E - bias is the unbiased exponent
The 1 in front of the fraction is the hidden bit; the 1.F term is called the significand
Note that the sign bit determines the sign of the number as a whole; the exponent has its own sign
IEEE 754 FP Standard Encoding
F is stored in normalized form where the msb in the fraction is 1 (so there is no need to store it!); this is called the hidden bit
To simplify sorting FP numbers, E comes before F in the word and E is represented in excess (biased) notation
Single Precision      | Double Precision      | Object Represented
E (8)    F (23)       | E (11)    F (52)      |
0        0            | 0         0           | true zero (0)
0        nonzero      | 0         nonzero     | ± denormalized number
1-254    anything     | 1-2046    anything    | ± floating point number
255      0            | 2047      0           | ± infinity
255      nonzero      | 2047      nonzero     | not a number (NaN)
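The single precision encoding rules amount to a small decision procedure on the E and F fields. The sketch below illustrates it; the function names are hypothetical, and the standard library `struct` module is used only to obtain the 32-bit pattern of a Python float.

```python
import struct


def float_to_bits(x):
    """Return the 32-bit single precision pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def classify_single(bits):
    """Classify a 32-bit pattern by its exponent and fraction fields."""
    exponent = (bits >> 23) & 0xFF     # 8-bit biased exponent E
    fraction = bits & 0x7FFFFF         # 23-bit fraction F
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 255:
        return "infinity" if fraction == 0 else "NaN"
    return "normal"
```

For example, `classify_single(float_to_bits(1.0))` reports a normal number, while the pattern 0x00000001 is denormalized.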
Special Numbers
IEEE FP also defines classes of special numbers
- Denormalized numbers
- Zero
- Infinity
- Not a Number (NaN)
Zero
All normalized FP numbers have a 1 before the radix point; this makes it impossible to exactly represent zero
IEEE uses a special encoding for zero
- Biased exponent is zero
- Fraction is zero
Due to the signed-magnitude representation, there is both a positive and negative zero
Denormalized Numbers
It is also difficult to represent numbers that are close to zero in normalized form
Denormalized numbers are stored unnormalized and therefore do not have a hidden bit
IEEE also uses a special encoding for denormals
- Biased exponent is zero
- Fraction is not zero
Denormals help prevent underflow
Also known as subnormal numbers
$1.0000\ldots000_2 \times 2^{-126}$ vs. $0.0000\ldots001_2 \times 2^{-126}$, or $1.0_2 \times 2^{-149}$
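The two single precision values being compared can be produced from their bit patterns. This is a small sketch using the standard library `struct` module; the helper name `bits_to_float` and the variable names are assumptions.

```python
import struct


def bits_to_float(bits):
    """Interpret a 32-bit integer pattern as a single precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


# Biased exponent 1, fraction 0: the smallest positive normal number.
smallest_normal = bits_to_float(0x00800000)

# Biased exponent 0, fraction 00...01: the smallest positive denormal.
smallest_denormal = bits_to_float(0x00000001)
```

Without denormals, everything between zero and `smallest_normal` would have to round to one of those two values.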
Underflow
Underflow occurs when a number is too small in
magnitude to be represented
This occurs when the exponent is less than the
minimum representable value
Be careful not to confuse negative overflow with
underflow
Underflow is unique to floating-point; integer
arithmetic can never underflow
Infinity
IEEE has an encoding for infinity
- Biased exponent is maximum (255 for single)
- Fraction is zero
- Sign determines positive or negative infinity
Infinities result from overflow or division of a non-zero by zero
Arithmetic with infinities is supported where it makes sense, e.g., x + ∞ = ∞
Not a Number (NaN)
In IEEE an undefined operation results in a special value called Not a Number (NaN)
- Biased exponent is maximum (255 for single)
- Fraction is not zero
- Sign is ignored
Examples of undefined operations
- Dividing zero by zero
- Adding infinities of different signs
- Square root of a negative number
Any operation on a NaN results in a NaN
Conversion to IEEE FP
Convert decimal real number to binary
- Repeated division for the integer part
- Repeated multiplication for the fractional part
Normalize the binary real number
- Move the radix point left, increase exponent
- Move the radix point right, decrease exponent
Add bias to exponent and convert to binary
Remove hidden bit from significand
Determine sign
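The conversion steps can be followed literally for single precision. The sketch below makes several simplifying assumptions that are not in the slides: it accepts a decimal string, handles only nonzero values that land in the normal range, and truncates the fraction to 23 bits instead of rounding. The function name `to_ieee_single` is hypothetical.

```python
from fractions import Fraction


def to_ieee_single(text):
    """Encode a decimal string as a 32-bit single precision pattern."""
    value = Fraction(text)
    if value == 0:
        raise ValueError("zero uses a special encoding")
    sign = 1 if value < 0 else 0                  # determine sign
    value = abs(value)
    whole = int(value)
    frac = value - whole
    int_bits = []                                 # repeated division by 2
    while whole > 0:
        whole, bit = divmod(whole, 2)
        int_bits.insert(0, bit)
    frac_bits = []                                # repeated multiplication by 2
    while frac != 0 and len(frac_bits) < 160:
        frac *= 2
        bit = int(frac)
        frac_bits.append(bit)
        frac -= bit
    if int_bits:                                  # radix point moves left
        exponent = len(int_bits) - 1
        significand = int_bits + frac_bits
    else:                                         # radix point moves right
        first_one = frac_bits.index(1)
        exponent = -(first_one + 1)
        significand = frac_bits[first_one:]
    biased = exponent + 127                       # add bias
    if not 1 <= biased <= 254:
        raise ValueError("outside the normal single precision range")
    kept = (significand[1:] + [0] * 23)[:23]      # drop hidden bit, truncate
    fraction = int("".join(str(b) for b in kept), 2)
    return (sign << 31) | (biased << 23) | fraction
```

For instance, -6.625 is -110.101 in binary, or -1.10101 x 2^2, which encodes as 0xC0D40000.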
Conversion from IEEE FP
Add hidden bit to fraction
Convert exponent to a decimal number
Subtract bias from exponent
Unnormalize significand
Convert significand to decimal by multiplying bits by their weights
Negate if sign bit is a 1
Floating-Point Addition
Adjust radix point of smaller number so it is aligned with larger number
Add the significands
Normalize
Round
Floating Point Addition
Addition (and subtraction)
$(\pm F_1 \times 2^{E_1}) + (\pm F_2 \times 2^{E_2}) = \pm F_3 \times 2^{E_3}$
Step 0: Restore the hidden bit in F1 and in F2
Step 1: Align fractions by right shifting F2 by E1 - E2 positions (assuming E1 ≥ E2), keeping track of (three of) the bits shifted out in a round bit, a guard bit, and a sticky bit
Step 2: Add the resulting F2 to F1 to form F3
Step 3: Normalize F3 (so it is in the form 1.XXXXX …)
- If F1 and F2 have the same sign → F3 ∈ [1,4) → 1 bit right shift F3 and increment E3 (needed when F3 ≥ 2)
- If F1 and F2 have different signs → F3 may require many left shifts, each time decrementing E3
Step 4: Round F3 and possibly normalize F3 again
Step 5: Rehide the most significant bit of F3 before storing the result
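The steps can be traced on single precision bit fields. The sketch below is deliberately restricted: it assumes both operands are positive normal numbers, that the result does not overflow, and that rounding is round-to-nearest-even. The three extra low-order bits stand in for the guard, round and sticky bits, and all function names are hypothetical.

```python
import struct


def f32_fields(x):
    """Split x (rounded to single precision) into (sign, E, F) fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def shift_right_sticky(value, count):
    """Shift right, OR-ing every bit shifted out into the lowest bit."""
    if count == 0:
        return value
    lost = value & ((1 << count) - 1)
    return (value >> count) | (1 if lost else 0)


def fp_add_positive(a, b):
    """Add two positive normal singles; returns the 32-bit result pattern."""
    _, e1, f1 = f32_fields(a)
    _, e2, f2 = f32_fields(b)
    if e1 < e2:                                  # make operand 1 the larger exponent
        e1, f1, e2, f2 = e2, f2, e1, f1
    s1 = ((1 << 23) | f1) << 3                   # step 0: hidden bit, 3 extra bits
    s2 = ((1 << 23) | f2) << 3
    s2 = shift_right_sticky(s2, e1 - e2)         # step 1: align
    s3, e3 = s1 + s2, e1                         # step 2: add
    if s3 >= 1 << 27:                            # step 3: sum is 2.0 or more
        s3 = shift_right_sticky(s3, 1)
        e3 += 1
    extra = s3 & 0b111                           # step 4: round to nearest even
    s3 >>= 3
    if extra > 0b100 or (extra == 0b100 and s3 & 1):
        s3 += 1
        if s3 == 1 << 24:                        # rounding overflowed: renormalize
            s3 >>= 1
            e3 += 1
    return (e3 << 23) | (s3 & 0x7FFFFF)          # step 5: rehide the leading 1
```

For example, `fp_add_positive(1.5, 2.25)` returns 0x40700000, the single precision pattern for 3.75.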
MIPS Floating Point Instructions
MIPS has a separate Floating Point Register File ($f0, $f1, …, $f31) (whose registers are used in pairs for double precision values) with special instructions to load to and store from them
lwc1 $f1,54($s2) #$f1 = Memory[$s2+54]
swc1 $f1,58($s4) #Memory[$s4+58] = $f1
And supports IEEE 754 single
add.s $f2,$f4,$f6 #$f2 = $f4 + $f6
and double precision operations
add.d $f2,$f4,$f6 #$f2||$f3 = $f4||$f5 + $f6||$f7
similarly for sub.s, sub.d, mul.s, mul.d, div.s, div.d
MIPS Floating Point Instructions, Cont'd
And floating point single precision comparison operations
c.x.s $f2,$f4 #if($f2 < $f4) cond=1; else cond=0 (shown for x = lt)
where x may be eq, neq, lt, le, gt, ge
and branch operations
bc1t 25 #if(cond==1) go to PC+4+25
bc1f 25 #if(cond==0) go to PC+4+25
And double precision comparison operations
c.x.d $f2,$f4 #if($f2||$f3 < $f4||$f5) cond=1; else cond=0
Floating-Point Multiplication
Add the exponents and subtract the bias
Multiply the significands
Normalize
Round
Compare signs to determine the sign of the result