SLIDE 1 Foundations of Global Networked Computing: Building a Modern Computer From First Principles
IWKS 3300: NAND to Tetris Spring 2019 John K. Bennett
This course is based upon the work of Noam Nisan and Shimon Schocken. More information can be found at www.nand2tetris.org.
Boolean Arithmetic
SLIDE 2 Counting Systems
decimal | binary | 3-bit register
0       | 0      | 000
1       | 1      | 001
2       | 10     | 010
3       | 11     | 011
4       | 100    | 100
5       | 101    | 101
6       | 110    | 110
7       | 111    | 111
8       | 1000   | (overflow)
9       | 1001   | (overflow)
10      | 1010   | (overflow)
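The table can be reproduced with a short Python sketch (an illustration, not part of the course materials): a 3-bit register holds only values 0 through 7, so 8, 9 and 10 do not fit. The function name to_register is hypothetical.

```python
REGISTER_BITS = 3  # width of the register in the table

def to_register(n, bits=REGISTER_BITS):
    """Return the bits-wide register contents for n, or None if n does not fit."""
    if not 0 <= n < 2 ** bits:
        return None  # n needs more bits than the register has
    return format(n, "0{}b".format(bits))

# Print decimal, binary and 3-bit register columns for 0..10
for n in range(11):
    reg = to_register(n)
    print(n, format(n, "b"), reg if reg is not None else "overflow")
```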
SLIDE 3 Number Representation
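General case: $(x_n x_{n-1} \ldots x_1 x_0)_b = \sum_{i=0}^{n} x_i \cdot b^i$
Base 10: $(9038)_{ten} = 9 \cdot 10^3 + 0 \cdot 10^2 + 3 \cdot 10^1 + 8 \cdot 10^0 = 9038$
Base 2: $(10011)_{two} = 1 \cdot 2^4 + 0 \cdot 2^3 + 0 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 = 19$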
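The general-case formula can be evaluated directly. This sketch (function name from_digits is hypothetical) uses Horner's rule, which computes the same sum of x_i · b^i with one multiply per digit.

```python
def from_digits(digits, base):
    """Evaluate (x_n ... x_1 x_0)_base; digits are listed most significant first."""
    value = 0
    for d in digits:
        if not 0 <= d < base:
            raise ValueError("digit {} is out of range for base {}".format(d, base))
        value = value * base + d  # Horner's rule: same result as summing x_i * base**i
    return value

print(from_digits([9, 0, 3, 8], 10))    # 9038
print(from_digits([1, 0, 0, 1, 1], 2))  # 19
```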
SLIDE 4 Binary Representation of Numeric Values
Sign magnitude (e.g., -2 = 1010)
One's complement (invert: e.g., -2 = 1101)
Two's complement (invert and add 1: e.g., -2 = 1110)
Adding 3 + (-2) with ordinary 4-bit binary addition (carry out of the leftmost bit discarded):
Sign magnitude: 0011 + 1010 = 1101 = -5 (wrong)
One's complement: 0011 + 1101 = 0000 = 0 (wrong)
Two's complement: 0011 + 1110 = 0001 = 1 (correct)
All three are invertible (-(-n) = n), but only two's complement has a single zero (0 = -0). Sign magnitude (0000 and 1000) and one's complement (0000 and 1111) each have two zeros (0 != -0).
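A Python sketch of the three 4-bit encodings and of plain binary addition, assuming the carry out of bit 3 is simply dropped (no end-around carry for one's complement). Only the two's complement sum comes out right.

```python
BITS = 4
MASK = (1 << BITS) - 1  # 0b1111

def sign_magnitude(n):
    """Sign bit followed by |n| in the remaining bits."""
    return (1 << (BITS - 1)) | -n if n < 0 else n

def ones_complement(n):
    """Negative numbers: invert every bit of |n|."""
    return ~(-n) & MASK if n < 0 else n

def twos_complement(n):
    """Negative numbers: invert and add 1, which equals n mod 2**BITS."""
    return n & MASK

def add4(a, b):
    """Ordinary 4-bit binary addition; the carry out of the top bit is dropped."""
    return (a + b) & MASK

for name, enc in [("sign magnitude", sign_magnitude),
                  ("one's complement", ones_complement),
                  ("two's complement", twos_complement)]:
    a, b = enc(3), enc(-2)
    print(name, format(a, "04b"), "+", format(b, "04b"), "=", format(add4(a, b), "04b"))
```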
SLIDE 5 Representing 2’s Complement Negative Numbers (4-bit system)
The representation of every non-negative number begins with a "0".
The representation of every negative number begins with a "1".
To negate a two's complement number:
Inside the CPU: complement (invert) every bit and add 1 (easy to do with full adders).
On paper: starting from the right, leave all trailing 0s and the first 1 intact, then flip all the remaining bits to the left.
n | +n   | -n
0 | 0000 | 0000
1 | 0001 | 1111
2 | 0010 | 1110
3 | 0011 | 1101
4 | 0100 | 1100
5 | 0101 | 1011
6 | 0110 | 1010
7 | 0111 | 1001
Example: 2 - 5 = 2 + (-5): 0010 + 1011 = 1101 = -3
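The two negation methods can be checked against each other in a short Python sketch (function names are hypothetical). The "on paper" rule is implemented with x & -x, which isolates the rightmost 1 bit.

```python
BITS = 4
MASK = (1 << BITS) - 1

def negate_cpu(x):
    """Invert all bits, then add 1."""
    return (~x + 1) & MASK

def negate_on_paper(x):
    """Keep trailing 0s and the first 1 (from the right); flip the rest."""
    if x == 0:
        return 0                        # no 1 bit to find: -0 is 0
    lowest_one = x & -x                 # the rightmost 1 bit
    keep = (lowest_one << 1) - 1        # that bit plus the 0s to its right
    return (x & keep) | (~x & MASK & ~keep)

def to_signed(x):
    """Interpret a 4-bit pattern as a two's complement value."""
    return x - (1 << BITS) if x & (1 << (BITS - 1)) else x

# 2 - 5 computed as 2 + (-5)
print(format((2 + negate_cpu(5)) & MASK, "04b"))  # 1101
```

Both functions agree on all 16 patterns; the CPU form is the one that maps onto adders, because "add 1" can come from a carry-in.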
SLIDE 6 Overflow in 2's Complement Binary Addition
Assuming 4-bit numbers (3 bits plus sign):
No overflow: 1001 + 0101 = 1110 (-7 + 5 = -2)
Carry out of the sign bit, but we don't care: 1011 + 0111 = (1)0010 (-5 + 7 = 2)
Overflow (we do care): 0101 + 0111 = 1100 (5 + 7 gives -4 instead of 12)
How it works:
Like decimal addition.
When the signs are opposite, a carry into or out of the sign bit can be ignored.
When the signs are the same, a result whose sign differs from the operands' sign (for example, a carry into the sign bit when both operands are positive) indicates overflow, which must be handled.
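A Python sketch of 4-bit addition with flags. It uses the standard hardware test: overflow occurs exactly when the carry into the sign bit differs from the carry out of it, which is equivalent to the same-sign rule above. The function name add_with_flags is hypothetical.

```python
BITS = 4
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)

def to_signed(x):
    return x - (1 << BITS) if x & SIGN else x

def add_with_flags(a, b):
    """Return (sum, carry_out, overflow) for two 4-bit patterns."""
    low = (a & (SIGN - 1)) + (b & (SIGN - 1))  # add the 3 magnitude bits only
    carry_into_sign = (low >> (BITS - 1)) & 1
    total = a + b
    carry_out = (total >> BITS) & 1
    return total & MASK, carry_out, carry_into_sign != carry_out

print(add_with_flags(0b1001, 0b0101))  # (14, 0, False): -7 + 5 = -2
print(add_with_flags(0b1011, 0b0111))  # (2, 1, False):  -5 + 7 = 2
print(add_with_flags(0b0101, 0b0111))  # (12, 0, True):   5 + 7 overflows
```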
SLIDE 7 Building an Adder chip
Adder: a chip designed to add two (two's complement) integers.
Proposed implementation:
Half adder: designed to add 2 bits
Full adder: designed to add 3 bits
Adder: designed to add two n-bit numbers
[Figure: 16-bit adder with 16-bit inputs a and b and a 16-bit output]
SLIDE 8 Half adder (designed to add 2 bits)
Implementation: based on two gates that you’ve seen before.
[Figure: half adder chip with inputs a, b and outputs sum, carry]
a b | sum carry
0 0 |  0    0
0 1 |  1    0
1 0 |  1    0
1 1 |  0    1
SLIDE 9
How To Build A Half Adder
Truth table for Half Adder
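A B | A XOR B | Σ (A + B) | Cout
0 0 |    0    |     0     |  0
0 1 |    1    |     1     |  0
1 0 |    1    |     1     |  0
1 1 |    0    |     0     |  1
[Figure: half adder with inputs A, B and outputs Σ, Cout; the gates inside are left as a question (??)]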
SLIDE 10
How To Build A Half Adder
Truth table for XOR and Half Adder
A B | A XOR B | Σ (A + B) | Cout | A AND B
0 0 |    0    |     0     |  0   |    0
0 1 |    1    |     1     |  0   |    0
1 0 |    1    |     1     |  0   |    0
1 1 |    0    |     0     |  1   |    1
Σ = A XOR B, Cout = A AND B
[Figure: half adder built from an XOR gate (Σ) and an AND gate (Cout)]
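The two equations translate directly into a Python sketch, using Python's bitwise ^ (XOR) and & (AND) on single bits:

```python
def half_adder(a, b):
    """Add two bits: sum = a XOR b, carry = a AND b."""
    return a ^ b, a & b

# Reproduce the truth table: A, B, sum, carry
for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(a, b, s, c)
```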
SLIDE 11 Full adder (designed to add with carry in)
Implementation: based on half-adder gates.
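a b c | sum carry
0 0 0 |  0    0
0 0 1 |  1    0
0 1 0 |  1    0
0 1 1 |  0    1
1 0 0 |  1    0
1 0 1 |  0    1
1 1 0 |  0    1
1 1 1 |  1    1
[Figure: full adder chip with inputs a, b, c and outputs sum, carry]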
SLIDE 12
Implementation of Full Adder
Truth table for Full Adder
Cin A B | Σ Cout
0   0 0 | 0  0
0   0 1 | 1  0
0   1 0 | 1  0
0   1 1 | 0  1
1   0 0 | 1  0
1   0 1 | 0  1
1   1 0 | 0  1
1   1 1 | 1  1
[Figure: full adder with inputs A, B, Cin and outputs Σ, Cout]
SLIDE 13
How To Build A Full Adder
Truth table for Full Adder
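Cin A B | Σ1 Cout1 | Cout2 | Σ Cout
0   0 0 | 0  0     | 0     | 0  0
0   0 1 | 1  0     | 0     | 1  0
0   1 0 | 1  0     | 0     | 1  0
0   1 1 | 0  1     | 0     | 0  1
1   0 0 | 0  0     | 0     | 1  0
1   0 1 | 1  0     | 1     | 0  1
1   1 0 | 1  0     | 1     | 0  1
1   1 1 | 0  1     | 0     | 1  1
[Figure: the first half adder adds A and B (producing Σ1 and Cout1); the second half adder adds Σ1 and Cin (producing Σ and Cout2); how Cout1 and Cout2 combine into Cout is left as a question (??)]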
SLIDE 14
How To Build A Full Adder
Truth table for Full Adder: the same as on Slide 13. It shows that Cout is 1 exactly when Cout1 or Cout2 is 1, so Cout = Cout1 OR Cout2.
[Figure: two half adders (A + B, then Σ1 + Cin) with an OR gate combining Cout1 and Cout2 into Cout]
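A Python sketch of the two-half-adder construction, checked against ordinary integer addition of three bits:

```python
def half_adder(a, b):
    return a ^ b, a & b

def full_adder(a, b, cin):
    """Two half adders plus an OR gate for the carry."""
    s1, cout1 = half_adder(a, b)    # first half adder: A + B
    s, cout2 = half_adder(s1, cin)  # second half adder: sum1 + Cin
    return s, cout1 | cout2         # Cout = Cout1 OR Cout2
```

A small design note: Cout1 and Cout2 are never both 1 (when A = B = 1, Σ1 is 0, so the second half adder cannot carry), which is why a single OR is enough.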
SLIDE 15 n-bit Adder (designed to add two 16-bit numbers)
Simple Implementation: series of full-adder gates.
[Figure: 16-bit adder with 16-bit inputs a and b and a 16-bit output]
Example, added column by column from the right: a = … 1011, b = … 010, a + b = … 1101
SLIDE 16
Chained Carry Adder
Chained carry makes addition time approximately equal to number of bits times the propagation delay of a Full Adder
[Figure: three full adders chained, each Cout feeding the next stage's Cin]
Full adder propagation delay = 3 gpd (gate propagation delays) for the carry output, so a 16-bit adder would take 48 gpd to complete an add.
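A behavioral Python sketch of a chained-carry (ripple) adder. The loop makes the dependency explicit: stage i cannot finish until stage i-1 has produced its carry. The 3-gpd-per-stage figure is the one stated above; the code only multiplies it out.

```python
def full_adder(a, b, cin):
    s1 = a ^ b
    return s1 ^ cin, (a & b) | (s1 & cin)

def ripple_add(x, y, width=16, cin=0):
    """Add two width-bit numbers by chaining full adders, LSB first.

    Returns (sum, carry_out)."""
    carry = cin
    result = 0
    for i in range(width):  # each stage waits for the previous carry
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry

GPD_PER_STAGE = 3  # carry-output delay of one full adder, as stated above
print(ripple_add(0b1011, 0b0010, width=4))  # (13, 0), i.e. 1101
print(16 * GPD_PER_STAGE)                   # 48
```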
SLIDE 17 Carry Look Ahead Basics
If we understand how carry works, we can compute the carries in advance. This is called "Carry Look-Ahead." For any bit position:
If A = 1 and B = 1, then Cout = 1, i.e., a carry is generated into the next bit position regardless of the value of Cin. This is called "Carry Generate."
If one input is 1 and the other is 0, then Cout equals Cin, i.e., the value of Cin is propagated to the next bit position. This is called "Carry Propagate."
If A = 0 and B = 0, then Cout = 0 regardless of the value of Cin. This is called "Carry Stop."
[Figure: chain of three full adders]
SLIDE 18
Carry Generate, Propagate and Stop
Truth table for Full Adder
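A B | Carry behavior at bit i
0 0 | Carry Stop (CSi): Cout = 0
0 1 | Carry Propagate (CPi): Cout = Cin
1 0 | Carry Propagate (CPi): Cout = Cin
1 1 | Carry Generate (CGi): Cout = 1
Because CGi and CPi depend only on the A and B bits of position i, every carry can be computed from them directly: no need for a carry chain.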
SLIDE 19
Carry Look Ahead Basics
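The equation to compute Cin at bit position i is as follows (+ denotes OR, · denotes AND):
$\mathrm{Cin}_i = CG_{i-1} + CP_{i-1} \cdot CG_{i-2} + CP_{i-1} \cdot CP_{i-2} \cdot CG_{i-3} + \cdots + CP_{i-1} \cdot CP_{i-2} \cdots CP_1 \cdot CG_0$
This form assumes $\mathrm{Cin}_0 = 0$; if $\mathrm{Cin}_0$ can be 1, add the term $CP_{i-1} \cdot CP_{i-2} \cdots CP_0 \cdot \mathrm{Cin}_0$.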
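A Python sketch that evaluates the look-ahead equation term by term, with propagate taken as A XOR B (one input 1, the other 0, as defined on Slide 17). It also includes the Cin_0 term, which the equation on this slide leaves out because it is zero when Cin_0 = 0. Python evaluates the terms in a loop, but each carry is a flat OR of ANDs over the generate/propagate signals and never reads the previous carry. Function names are hypothetical.

```python
def cla_carries(a, b, width, cin0=0):
    """Return [Cin_0, Cin_1, ..., Cin_width] from generate/propagate signals."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # CG_i = A_i AND B_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # CP_i = A_i XOR B_i
    carries = [cin0]
    for i in range(1, width + 1):
        c = 0
        for j in range(i - 1, -1, -1):
            term = g[j]                 # CP_{i-1} ... CP_{j+1} . CG_j
            for k in range(j + 1, i):
                term &= p[k]
            c |= term
        term = cin0                     # CP_{i-1} ... CP_0 . Cin_0
        for k in range(i):
            term &= p[k]
        carries.append(c | term)
    return carries

def cla_add(a, b, width=16, cin0=0):
    """Sum bit i = A_i XOR B_i XOR Cin_i; returns (sum, carry_out)."""
    c = cla_carries(a, b, width, cin0)
    s = 0
    for i in range(width):
        s |= (((a >> i) ^ (b >> i) ^ c[i]) & 1) << i
    return s, c[width]
```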
SLIDE 20
Practical Considerations
$\mathrm{Cin}_i = CG_{i-1} + CP_{i-1} \cdot CG_{i-2} + CP_{i-1} \cdot CP_{i-2} \cdot CG_{i-3} + \cdots + CP_{i-1} \cdot CP_{i-2} \cdots CP_1 \cdot CG_0$
Very wide gates (more than about 8 inputs) are impractical, so we would likely implement the wide ANDs and ORs as trees of narrower gates, with depth proportional to log n. This is still faster than chained carry, even for 16 bits (and much faster for 32- or 64-bit adders).
SLIDE 21
Practical Implementation Note
Use Cin0 to add one (for example, when negating a two's complement number: invert the bits, then add 1). Cin0 then becomes a control signal that can be turned on by the control part of the microprocessor.
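A behavioral Python sketch of the idea: a 16-bit adder with a Cin0 input, where setting Cin0 = 1 supplies the "+1" for negation and subtraction. The adder16 model and its cin0 parameter are hypothetical, and it uses Python's + rather than gates.

```python
MASK16 = 0xFFFF

def adder16(a, b, cin0=0):
    """Hypothetical 16-bit adder with a carry-in control line."""
    return ((a & MASK16) + (b & MASK16) + cin0) & MASK16  # carry out of bit 15 dropped

def negate(x):
    """-x = !x + 1: invert x, add 0, and raise Cin0 for the +1."""
    return adder16(~x & MASK16, 0, cin0=1)

def subtract(x, y):
    """x - y = x + !y + 1, again using Cin0 for the +1."""
    return adder16(x, ~y & MASK16, cin0=1)

print(format(negate(4), "016b"))  # 1111111111111100
```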
SLIDE 22 The ALU (of the Hack platform)
[Figure: the Hack ALU, built from half adders, full adders and a 16-bit adder; 16-bit inputs x and y; control bits zx, nx, zy, ny, f, no; outputs out (16 bits), zr and ng]
out(x, y, control bits) = x+y, x-y, y-x, 0, 1, -1, x, y, -x, -y, !x, !y, x+1, y+1, x-1, y-1, x&y, x|y
SLIDE 23 ALU logic (Hack platform)
Implementation: build a logic gate architecture that “executes” the control bit “instructions”: if zx==1 then set x to 0 (bit-wise), etc.
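A behavioral Python model of the control-bit "instructions", following the Hack ALU specification from the nand2tetris course (zx, nx, zy, ny, f, no; zr = 1 when out is 0, ng = 1 when out is negative). It is a sketch of the behavior, not the gate-level chip.

```python
MASK16 = 0xFFFF

def hack_alu(x, y, zx, nx, zy, ny, f, no):
    """x, y: 16-bit patterns; control bits are 0 or 1. Returns (out, zr, ng)."""
    if zx:
        x = 0                    # zero the x input
    if nx:
        x = ~x & MASK16          # bit-wise negate x
    if zy:
        y = 0                    # zero the y input
    if ny:
        y = ~y & MASK16          # bit-wise negate y
    out = (x + y) & MASK16 if f else x & y  # f selects add or AND
    if no:
        out = ~out & MASK16      # bit-wise negate the output
    zr = 1 if out == 0 else 0
    ng = (out >> 15) & 1         # sign bit of the result
    return out, zr, ng

# A few functions: x+y (000010), x-y (010011), x|y (010101)
print(hack_alu(5, 3, 0, 0, 0, 0, 1, 0))  # (8, 0, 0)
```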
SLIDE 24
ALU Details – How might we build this?
SLIDE 25
Example: How Do We Get “-x”
SLIDE 26
Example: How Do We Get "-x" out of the ALU (assume x = 4)
x = x = 0000 0000 0000 0100
y = !0 = 1111 1111 1111 1111 (that is, -1)
f: x + y = 0000 0000 0000 0011
no: out = !(x + y) = 1111 1111 1111 1100 = -4 (two's complement)
This works because !z = -z - 1 for any z, so !(x + !0) = !(x - 1) = -(x - 1) - 1 = -x.
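The same three steps in a short Python sketch (function name hypothetical), which can be checked against -x for every 16-bit input:

```python
MASK16 = 0xFFFF

def neg_via_alu(x):
    """out = !(x + !0), the steps in the example above."""
    y = ~0 & MASK16        # y = !0 = 1111 1111 1111 1111 (-1)
    s = (x + y) & MASK16   # x + y, i.e. x - 1
    return ~s & MASK16     # !(x - 1), which equals -x

print(format(neg_via_alu(4), "016b"))  # 1111111111111100
```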
SLIDE 27 Straightforward ALU Implementation
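[Figure: straightforward ALU implementation: x passes through zx and nx stages, y through zy and ny stages, f selects between x + y and x & y, and no optionally negates the result]
How could this design be improved?
(zr and ng implementation not shown)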
SLIDE 28 Faster ALU Implementation
[Figure: faster ALU implementation with the same inputs, control bits and outputs, using two blocks labeled 4:1]
What else could we do?
A 2:1 multiplexer as a function generator (would require an architecture change)?
A CLA adder?
SLIDE 29 What About Multiplication?
[Figure: 4-bit × 4-bit multiplication laid out as shifted rows of partial products, indexed 00 through 33]
SLIDE 30 Carry Save Addition
The idea is to perform several additions in sequence, keeping the carries and the sum separate. This means that all of the columns can be added in parallel without relying on the result of the previous column, creating a two-output "adder" whose time delay is independent of the size of its inputs. The sum and carry can then be recombined using one normal carry-aware addition (ripple or CLA) to form the correct result.
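A Python sketch of carry save addition on whole integers: each column is a full adder, so the sum word is the bit-wise XOR of the three inputs and the carry word is the bit-wise majority shifted one column left. Python integers are unbounded; a hardware version would fix the width. Function names are hypothetical.

```python
def carry_save_add(a, b, c):
    """One layer of full adders with no carry chain.

    Returns (partial_sum, carries) with a + b + c == partial_sum + carries."""
    partial_sum = a ^ b ^ c                       # sum bit of each column
    carries = ((a & b) | (a & c) | (b & c)) << 1  # majority bit, moved one column left
    return partial_sum, carries

def add_many(values):
    """Reduce non-negative ints three at a time with CSAs, then one normal add."""
    values = list(values)
    while len(values) > 2:
        a, b, c = values.pop(), values.pop(), values.pop()
        values.extend(carry_save_add(a, b, c))
    return sum(values)  # final carry-propagate addition (ripple or CLA in hardware)
```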
SLIDE 31 CSA Uses Full Adders
“Wallace Tree” Addition
CSA chain: depth = 7; 7 adders used, plus a final add with carry
CSA adder tree: depth = 4; 7 adders, plus a final add with carry
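A Python sketch that counts levels and 3:2 adders for a linear CSA chain versus a Wallace-style tree. The figure's operand count is not in the text; nine operands is an assumption that happens to reproduce depth 7 versus depth 4 with 7 adders each. Every 3:2 adder turns three operands into two, so both layouts need n - 2 adders; only the depth differs.

```python
def chain_depth(n):
    """Linear CSA chain: each adder waits for the previous one."""
    adders = max(n - 2, 0)
    return adders, adders  # (depth, adders)

def wallace_depth(n):
    """Tree: each level combines as many groups of 3 operands as it can."""
    depth = adders = 0
    while n > 2:
        groups = n // 3
        adders += groups
        n = 2 * groups + n % 3  # each 3:2 adder turns 3 operands into 2
        depth += 1
    return depth, adders

print(chain_depth(9), wallace_depth(9))  # (7, 7) (4, 7)
```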
SLIDE 32
A 4-bit Example (carry propagating to the right)
(or carry look-ahead)
SLIDE 33 Example: An 8-bit Carry Save Array Multiplier
A parallel multiplier for unsigned operands. It is composed of 2-input AND gates for producing the partial products, a series of carry save adders for adding them and a ripple-carry adder for producing the final product.
[Figure: generating partial products; each FA has 3 inputs and 2 outputs]
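A behavioral Python sketch of the same structure for unsigned 8-bit operands: one AND per bit pair forms the partial products, carry save adders reduce them three at a time, and a bit-by-bit ripple adder produces the final product. It models the data flow, not the exact array wiring of the figure, and the function names are hypothetical.

```python
def carry_save_add(a, b, c):
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1

def ripple_add(x, y):
    """Final carry-propagate adder, one full adder per bit."""
    result, carry, i = 0, 0, 0
    while (x >> i) or (y >> i) or carry:
        a, b = (x >> i) & 1, (y >> i) & 1
        result |= (a ^ b ^ carry) << i
        carry = (a & b) | (a & carry) | (b & carry)
        i += 1
    return result

def multiply_unsigned(a, b, width=8):
    partials = []
    for i in range(width):
        b_i = (b >> i) & 1
        row = 0
        for j in range(width):
            row |= (((a >> j) & 1) & b_i) << j  # one AND gate per bit
        partials.append(row << i)               # shift row i left by i columns
    while len(partials) > 2:                    # carry save reduction
        x, y, z = partials.pop(), partials.pop(), partials.pop()
        partials.extend(carry_save_add(x, y, z))
    while len(partials) < 2:
        partials.append(0)
    return ripple_add(partials[0], partials[1])

print(multiply_unsigned(13, 11))  # 143
```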
SLIDE 34 The ALU in the CPU (a sneak preview of the Hack computer)
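[Figure: the ALU inside the Hack CPU: the D register feeds one ALU input; a mux controlled by the a-bit selects either the A register or M (the selected RAM register) for the other; control bits c1, c2, … , c6 drive the ALU]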
SLIDE 35
Perspective
The ALU as implemented is all combinational logic.
Adder design is very basic (but we could use CLA adders).
It generally pays to optimize adders (computers add a lot).
The Hack ALU is very basic: no multiplication, no division (expensive hardware).
Modern microprocessors provide hardware multiplication, but they generally did not through the '80s.
It is possible to do very fast multiplication in hardware, but it is expensive (in number of gates), and multiplication is not as frequent as addition. In computing (hardware or software), always optimize for the common case.
SLIDE 36 Historical End-note: Leibniz (1646-1716)
“The binary system may be used in place of the decimal system; express all numbers by unity and by nothing”
1679: built a mechanical calculator (+, -, *, /)
CHALLENGE: “All who are occupied with the reading or writing of scientific literature have assuredly very often felt the want of a common scientific language, and regretted the great loss of time and trouble caused by the multiplicity of languages employed in scientific literature.”
SOLUTION: “Characteristica Universalis”: a universal, formal, and decidable language of reasoning
The dream’s end: Turing and Gödel in the 1930s (there are undecidable problems).
Leibniz’s medallion for the Duke of Brunswick