SLIDE 1 Foundations of Global Networked Computing: Building a Modern Computer From First Principles
IWKS 3300: NAND to Tetris Spring 2019 John K. Bennett
This course is based upon the work of Noam Nisan and Shimon Schocken. More information can be found at (www.nand2tetris.org).
Computer Architecture
SLIDE 2
A Brief History of Computer Architecture
SLIDE 3
Abacus (2700–2300 BC; Sumeria) Sexagesimal (base-60) number system
SLIDE 4
Blaise Pascal (1623-62)
SLIDE 5
Jacquard Loom (1801)
SLIDE 6
AVL Jacquard Loom (2016)
SLIDE 7
Atanasoff–Berry Computer (1937-42; vacuum tubes)
SLIDE 8
Zuse Z3 (1941-43; relays)
SLIDE 9
ENIAC (1943-46; vacuum tubes)
SLIDE 10
ENIAC
SLIDE 11
ENIAC
SLIDE 12
ENIAC
SLIDE 13
ILLIAC I / ORDVAC (1951)
Ordnance Discrete Variable Automatic Computer
SLIDE 14
R1- Rice Research Computer (1959)
SLIDE 15
IBM 360 (1964)
SLIDE 16
R2- Rice Research Computer (1973)
SLIDE 17
Hexadecimal Calculator
SLIDE 18
Cray 1 (1975)
SLIDE 19
Cray 1 (1975)
SLIDE 20
Cray 2 (1985)
SLIDE 21
The Xerox Alto, 1973
First personal workstation; first wide deployment of:
- Bit-map graphics
- Mouse
- WYSIWYG editing
Hosted the invention of:
- Local-area networking
- Laser printing
- All of modern client / server distributed computing
SLIDE 22
MITS Altair 8800 (1975)
SLIDE 23
Basic on an 8-bit Computer
SLIDE 24
Apple I (1975-6)
SLIDE 25
Apple II (1977)
SLIDE 26
TRS 80 (1977)
SLIDE 27
Apple Lisa (1983)
SLIDE 28
IBM PC (1981)
SLIDE 29
UW Eden Node Machine (1982)
SLIDE 30
IBM PC XT (1983)
SLIDE 31
Apple Macintosh (1984)
SLIDE 32
Sun 1-3 Workstations (1982-85)
SLIDE 33
iPhone (2007)
SLIDE 34
Amazon Kindle (2007)
SLIDE 35
iPad (2010)
SLIDE 36 Von Neumann Machine (circa 1940)
Diagram: a CPU (Arithmetic Logic Unit (ALU), registers, control) connected to Memory (data + instructions), an input device, and an output device.
John Von Neumann (and others) ... made it possible.
Gordon Moore, Andy Grove (and others) ... made it small and fast.
SLIDE 37
Harvard Mark 1 (circa 1940)
Howard Aiken
SLIDE 38 Processing logic: fetch-execute cycle
Diagram: a CPU (Arithmetic Logic Unit (ALU), registers, control) connected to Memory (data + instructions), an input device, and an output device.
Executing the current instruction involves one or more of the following tasks:
- Have the ALU compute some function out = f (register values)
- Write the ALU output to selected registers
- As a side-effect of this computation, determine what instruction to fetch and execute next.
SLIDE 39 The Hack chip-set and hardware platform
Elementary Gates
- Nand
- Not
- And
- Or
- Xor
- Mux
- DMux
- Not16
- And16
- Or16
- Mux16
- Or8Way
- Mux4Way16
- Mux8Way16
- DMux4Way
- DMux8Way
Combinational Chips
- HalfAdder
- FullAdder
- Add16
- Inc16
- ALU
Sequential Chips
- DFF
- Bit
- Register
- RAM8
- RAM64
- RAM512
- RAM4K
- RAM16K
- PC
Computer Architecture
Status: the three groups above are done; Computer Architecture is the topic of this lecture.
SLIDE 40 The Hack Computer
Main parts of the Hack computer:
Instruction memory (ROM)
Memory (RAM):
- Data memory
- Screen (memory map)
- Keyboard (memory map)
CPU
Computer (the framework that holds everything together).
- A 16-bit Harvard platform (data and instructions are separate)
- The instruction memory and the data memory are physically separate
- Screen: 256 rows by 512 columns, black and white
- Keyboard: standard (memory-mapped to a specific RAM address)
- Designed to execute programs written in the Hack machine language
- Can be easily built from the chip-set that we have built so far in the course
SLIDE 41 Where We Are Headed
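Diagram: the CPU sits between the Instruction Memory (ROM32K) and the Data Memory (Memory). The CPU sends pc to the instruction memory and receives instruction; it sends outM, writeM, and addressM to the data memory and receives inM. The reset input feeds the CPU.
CHIP Computer {
    IN reset;
    PARTS:
    // implementation missing
}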
SLIDE 42 Instruction Memory
Diagram: the ROM32K chip, with a 15-bit address input and a 16-bit output.
Function:
- The ROM is pre-loaded with a program written in the Hack machine language.
- The ROM chip always emits a 16-bit number: out = ROM32K[address]
- This number is interpreted as the current instruction.
ROM[2^n] is implemented using 8 ROM[2^(n-3)]’s, an 8:1 multiplexor, and a 1:8 demultiplexor.
SLIDE 43 Data Memory
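Function:
- When read, the RAM16K chip always emits a 16-bit value: out = RAM16K[address]
- A 16-bit value is required to write to the RAM16K chip: if load is asserted, RAM16K[address] = in when the clock ticks.
RAM[2^n] is implemented using 8 RAM[2^(n-3)]’s (or 4 RAM[2^(n-2)]’s), a multiplexor (8:1 or 4:1), and a demultiplexor (1:8 or 1:4).
Diagram: the RAM16K chip, with inputs in (16 bits), address, and load, and output out (16 bits).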
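The recursive construction described on this slide can be sketched in Python. This is an illustrative software model, not the course's HDL: the class names `Register16` and `RamBank` are hypothetical, only the eight-way case is modeled, and using the high three address bits as the part selector is an assumption (the low bits would work equally well).

```python
class Register16:
    """A single 16-bit word (hypothetical stand-in for the Register chip)."""

    def __init__(self):
        self.value = 0

    def read(self, address=0):
        return self.value

    def write(self, address, value):
        self.value = value & 0xFFFF  # keep only 16 bits


class RamBank:
    """RAM of 2**n words built from eight parts of 2**(n-3) words each."""

    def __init__(self, address_bits):
        assert address_bits >= 3 and address_bits % 3 == 0
        self.sub_bits = address_bits - 3
        if self.sub_bits == 0:
            make_part = Register16
        else:
            make_part = lambda: RamBank(self.sub_bits)
        self.parts = [make_part() for _ in range(8)]

    def _split(self, address):
        # Assumption: the high 3 bits select the part, the rest go to the part.
        return address >> self.sub_bits, address & ((1 << self.sub_bits) - 1)

    def read(self, address):
        part, offset = self._split(address)  # plays the role of the 8:1 multiplexor
        return self.parts[part].read(offset)

    def write(self, address, value):
        part, offset = self._split(address)  # plays the role of the 1:8 demultiplexor
        self.parts[part].write(offset, value)


ram512 = RamBank(9)  # 8 x RAM64, each 8 x RAM8, each 8 registers
ram512.write(300, 0x1234)
print(hex(ram512.read(300)))
```

The model ignores the clock entirely; it only shows how an address is split between part selection and the address passed down to the selected part.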
SLIDE 44 Data Memory
Low-level (hardware) read/write logic:
- To read RAM[k]: set address to k, probe out
- To write RAM[k]=x: set address to k, set in to x, set load to 1, run the clock
High-level (OS) read/write logic:
- To read RAM[k]: use the OS command out = peek(k)
- To write RAM[k]=x: use the OS command poke(k,x)
peek and poke are OS commands whose implementation should effect the same behavior as the low-level commands. More about peek and poke later in the course, when we write the OS.
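As a rough illustration of the high-level view, peek and poke can be modeled over a Python list. This is only a sketch: the real commands are written later in the course as part of the OS, and the `ram` list and its 32K size are assumptions made for this example.

```python
RAM_SIZE = 32 * 1024   # assumption: model the full 15-bit address space
ram = [0] * RAM_SIZE   # hypothetical stand-in for the data memory


def peek(k):
    """Return the word stored at RAM[k]."""
    return ram[k]


def poke(k, x):
    """Store x at RAM[k], truncated to 16 bits."""
    ram[k] = x & 0xFFFF


poke(100, 7)
print(peek(100))
```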
SLIDE 45 Screen
The Screen chip emulates basic RAM chip functionality:
- read logic: out = Screen[address]
- write logic: if load then Screen[address] = in
Side effect: Continuously refreshes a 256 by 512 black-and-white screen device.
Diagram: the Screen chip, with inputs in (16 bits), address, and load, and output out (16 bits), driving the physical screen.
The bit contents of the Screen chip are called the “screen memory map”.
Simulated screen: When loaded into the hardware simulator, the built-in Screen.hdl chip opens a screen window (the simulated 256 by 512 B&W screen); the simulator then refreshes this window from the screen memory map several times each second.
SLIDE 46 Screen Memory Map
How to set the (row,col) pixel of the screen to black or to white:
Low-level (machine language): Set the col%16 bit of the word found at
Screen[row*32+col/16] to 1 or to 0 (col/16 is integer division)
High-level: Use the OS command drawPixel(row,col)
(effects the same operation, discussed later in the course, when we write the OS).
Figure: the screen memory map. Each screen row of 512 pixels maps to 32 consecutive 16-bit words: row 0 to words 0-31, row 1 to words 32-63, ..., row 255 to words 8160-8191. The physical screen is refreshed from the map several times each second.
In the Hack platform, the screen is implemented as an 8K 16-bit RAM chip. Within the overall memory, the screen memory map starts at address 16384 (0x4000).
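The low-level pixel rule on this slide can be tried out with a short Python sketch. The `screen` list and the `draw_pixel` signature are assumptions for illustration (this is not the OS drawPixel command), and bit positions are counted from the least significant bit, as in the Hack specification.

```python
SCREEN_WORDS = 8192          # 8K words of 16 bits each
WORDS_PER_ROW = 32           # 512 pixels / 16 bits per word
screen = [0] * SCREEN_WORDS  # hypothetical stand-in for the Screen chip


def draw_pixel(row, col, black=True):
    """Set pixel (row, col) to black (1) or white (0); return the word index."""
    if not (0 <= row < 256 and 0 <= col < 512):
        raise ValueError("pixel outside the 256 by 512 screen")
    word = row * WORDS_PER_ROW + col // 16   # col // 16 is integer division
    mask = 1 << (col % 16)                   # the col%16 bit of that word
    if black:
        screen[word] |= mask
    else:
        screen[word] &= ~mask & 0xFFFF
    return word


print(draw_pixel(1, 17))   # index of the word that holds pixel (1, 17)
```

To reach the same word through the Memory chip, add the screen base address 16384 to the returned index.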
SLIDE 47 Keyboard
Keyboard chip: a single memory-mapped 16-bit register
- Input: scan-code (16-bit value) of the currently pressed key, or 0 if no key is pressed
- Output: same
Diagram: the Keyboard chip, with a 16-bit output, connected to the physical keyboard.
How to read the keyboard:
Low-level (hardware): probe the contents of the Keyboard chip at RAM location:
24576 (0x6000)
High-level: use the OS command keyPressed()
(effects the same operation, discussed later in the course, when we write the OS).
Special keys:
The keyboard is implemented as a built-in Keyboard.hdl chip. When this Java chip is loaded into the simulator, it connects to the regular keyboard and pipes the scan-code of the currently pressed key to the keyboard memory map.
The simulated keyboard enabler button
Simulated keyboard:
SLIDE 48 Memory Physical Implementation
Access logic:
Access to any address from 0 to 16,383 results in accessing the RAM16K chip-part
Access to any address from 16,384 to 24,575 results in accessing the Screen chip-part
Access to address 24,576 results in accessing the keyboard chip-part
Access to any other address is invalid.
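Diagram: the Memory chip, with inputs in (16 bits), address (15 bits), and load, and output out (16 bits). Internally it contains RAM16K (16K mem. chip) at addresses 0 to 16383, Screen (8K mem. chip) at addresses 16384 to 24575, and Keyboard (one register) at address 24576; the Screen and Keyboard parts connect to the physical screen and keyboard.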
The Memory chip is essentially a package that integrates the three chip-parts RAM16K, Screen, and Keyboard into a single, contiguous address space. This packaging effects the programmer’s view of the memory, as well as the necessary I/O side-effects.
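The access logic above amounts to an address decoder. The following Python sketch models it in software; the class layout and attribute names are hypothetical, and ignoring program writes to the keyboard register is an assumption of this sketch rather than something the slides state.

```python
RAM_WORDS = 16384      # addresses 0..16383
SCREEN_BASE = 16384    # addresses 16384..24575
SCREEN_WORDS = 8192
KEYBOARD = 24576       # a single register


class Memory:
    """Software model of the Memory chip's single address space."""

    def __init__(self):
        self.ram = [0] * RAM_WORDS
        self.screen = [0] * SCREEN_WORDS
        self.keyboard = 0  # would be set by the (simulated) keyboard device

    def read(self, address):
        if 0 <= address < SCREEN_BASE:
            return self.ram[address]
        if SCREEN_BASE <= address < KEYBOARD:
            return self.screen[address - SCREEN_BASE]
        if address == KEYBOARD:
            return self.keyboard
        raise IndexError(f"invalid address {address}")

    def write(self, address, value):
        value &= 0xFFFF
        if 0 <= address < SCREEN_BASE:
            self.ram[address] = value
        elif SCREEN_BASE <= address < KEYBOARD:
            self.screen[address - SCREEN_BASE] = value
        elif address == KEYBOARD:
            pass  # assumption: writes to the keyboard register are ignored
        else:
            raise IndexError(f"invalid address {address}")


mem = Memory()
mem.write(16384, 0xFFFF)   # lands in the first word of the screen map
mem.keyboard = 65          # pretend a key with scan-code 65 is pressed
print(mem.read(24576))
```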
SLIDE 49 Memory: Programmer’s View
Using the memory:
To record or recall values (e.g. variables, objects, arrays), use the first 16K words of the memory: (0x0000-3FFF)
To write to the screen (or read the screen), use the next 8K words of the memory: (0x4000-5FFF)
To read which key is currently pressed, use the next word of the memory: (0x6000).
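Diagram: the Memory chip shown as three regions (data, screen memory map, keyboard map); the screen and keyboard maps connect to the physical screen and keyboard.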
SLIDE 50 LogicCircuit Keyboard Configuration
SLIDE 51 LogicCircuit Configuration for the Hack Screen
SLIDE 52 LogicCircuit Configuration of the Hack Instruction Memory
SLIDE 53 LogicCircuit Configuration of the Hack Data Memory.
SLIDE 54 CPU Operation
Diagram: the CPU chip. Inputs: inM (16 bits, from data memory), instruction (16 bits, from instruction memory; a Hack machine language instruction like M=D+M, represented as a 16-bit value), and reset (1 bit). Outputs: outM (16 bits), writeM (1 bit), and addressM (15 bits), to data memory; pc (15 bits), to instruction memory.
CPU internal components (invisible in this chip diagram): ALU and 3 registers: A, D, and PC.
CPU execute logic: The CPU executes the instruction according to the Hack language specification:
- The D and A values, if they appear in the instruction, are read from (or written to) the respective CPU-resident registers.
- The M value, if there is one in the instruction operand, is read from inM.
- If the instruction’s result includes M, then the ALU output is placed in outM, the value of the CPU-resident A register is placed in addressM, and writeM is asserted.
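The routing of the ALU output can be sketched as a small Python function. The function name and the returned dictionary are hypothetical; the mapping of the three destination bits d1, d2, d3 to A, D, and M follows the Hack specification. This sketch always drives outM and addressM and lets writeM alone decide whether memory changes, which is one possible reading of the slide.

```python
def execute_outputs(dest_bits, alu_out, a_reg):
    """Map the 3 destination bits and the ALU output to the CPU's write signals."""
    d1 = (dest_bits >> 2) & 1   # store the result in A
    d2 = (dest_bits >> 1) & 1   # store the result in D
    d3 = dest_bits & 1          # store the result in M, i.e. RAM[A]
    return {
        "load_a": bool(d1),
        "load_d": bool(d2),
        "writeM": bool(d3),
        "outM": alu_out & 0xFFFF,      # ALU output, 16 bits
        "addressM": a_reg & 0x7FFF,    # current A register, 15 bits
    }


# M=D+M with D=3, inM=4, A=100: the ALU output is 3 + 4 = 7, dest bits are 001
signals = execute_outputs(0b001, 3 + 4, 100)
print(signals["writeM"], signals["outM"], signals["addressM"])
```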
SLIDE 55 CPU Operation
Diagram: the CPU chip interface, with inputs inM, instruction, and reset, and outputs outM, writeM, addressM, and pc.
CPU internal components (invisible in this chip diagram): ALU and 3 registers: A, D, and PC.
CPU fetch logic: Recall that:
1. The instruction may include a jump directive (if the jump bits are non-zero).
2. The ALU emits two control bits, indicating if the ALU output is zero or less than zero.
If reset==0: the CPU uses this information (the jump bits and the ALU control bits) as follows: if there should be a jump, the PC is set to the value of A; else, PC is set to PC+1.
If reset==1: the PC is set to 0 (thus restarting the computer).
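The fetch logic can be written out as a short Python sketch. The function names are hypothetical, and `zr` and `ng` stand for the two ALU control bits (output is zero, output is negative). The meaning of the jump bits (j1: jump if the output is negative, j2: if zero, j3: if positive) is taken from the Hack specification, and the sketch assumes the current instruction is a C-instruction.

```python
def should_jump(jump_bits, zr, ng):
    """Decide whether to jump, given the 3 jump bits and the ALU flags."""
    j1 = (jump_bits >> 2) & 1   # jump if ALU output < 0
    j2 = (jump_bits >> 1) & 1   # jump if ALU output == 0
    j3 = jump_bits & 1          # jump if ALU output > 0
    positive = not zr and not ng
    return bool((j1 and ng) or (j2 and zr) or (j3 and positive))


def next_pc(pc, a_reg, jump_bits, zr, ng, reset):
    """Compute the next program counter value (15 bits)."""
    if reset:
        return 0                      # restart the computer
    if should_jump(jump_bits, zr, ng):
        return a_reg & 0x7FFF         # jump to the address held in A
    return (pc + 1) & 0x7FFF          # fall through to the next instruction


print(next_pc(10, 100, 0b111, zr=False, ng=False, reset=False))  # unconditional jump
```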
SLIDE 56 The C-instruction Revisited
Symbolic: dest = comp; jump
Binary: 1 1 1 a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3
Fields: comp = a c1 c2 c3 c4 c5 c6; dest = d1 d2 d3; jump = j1 j2 j3
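Splitting a C-instruction into its fields is a matter of shifting and masking. The Python sketch below uses a hypothetical function name and assumes the three leading bits are all 1, as in the layout on this slide. The example word 1111000010001000 encodes M=D+M under the Hack specification (comp bits 000010 with a=1, dest bits 001, no jump).

```python
def decode_c_instruction(word):
    """Split a 16-bit C-instruction into its a, comp, dest, and jump fields."""
    if (word >> 13) != 0b111:
        raise ValueError("not a C-instruction")
    return {
        "a": (word >> 12) & 0b1,          # selects A or M as the ALU operand
        "comp": (word >> 6) & 0b111111,   # c1..c6
        "dest": (word >> 3) & 0b111,      # d1 d2 d3
        "jump": word & 0b111,             # j1 j2 j3
    }


fields = decode_c_instruction(0b1111000010001000)  # M=D+M
print(fields)
```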
SLIDE 57 CPU Implementation
Cycle: Execute, then Fetch.
Execute logic: Decode, then Execute.
Fetch logic: If there should be a jump, set PC to A; else set PC to PC+1.
Resetting the computer: Set reset to 1, then set it to 0.
C-instruction: dest = comp; jump
binary: 1 1 1 a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3 (comp = a c1..c6; dest = d1 d2 d3; jump = j1 j2 j3)
CPU Schematic (diagram: the ALU, two Mux chips, the A, D, and PC registers, and a “decode” bar; inputs instruction, inM, and reset; outputs ALU output (outM), writeM, addressM, and pc):
- Includes most of the CPU’s execution logic.
- The CPU’s control logic is hinted: each circled “c” represents one or more control bits, taken from the instruction.
- The “decode” bar does not represent a chip, but rather indicates that the instruction bits are decoded (somehow).
SLIDE 58 CPU Implementation Issue #1: How to interpret instructions like A=A+1;JMP or A=M;JMP?
The CPU as diagrammed in the book uses the initial value of the A-register for the jump address, which is perhaps counterintuitive. An alternative implementation would use the new value being loaded into A (computed from A or M) as the jump address. One could implement this alternative by adding a multiplexor that selects the A-register input value for the jump address during instructions that change the A-register.
HDL implementation of this alternative:
// If A is changing, use its new value for the target address
Mux16 (sel=loadA, a=aReg, b=aIn, out=jmpAddr);
PC (in=jmpAddr, reset=reset, inc=true, load=jmp, out[0..14]=pc);
Diagram: the CPU schematic, modified as described above.
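The difference between the two interpretations can be made concrete with a small Python model. The function names are hypothetical; `mux16` mirrors the Mux16 chip's behavior (out = b if sel else a), and `load_a`, `a_reg`, and `a_in` correspond to the loadA, aReg, and aIn pins in the HDL fragment on this slide.

```python
def mux16(a, b, sel):
    """Model of Mux16: out = b if sel is set, else a."""
    return b if sel else a


def jump_address(a_reg, a_in, load_a, use_new_value):
    """Return the address loaded into the PC when a jump is taken."""
    if use_new_value:
        # Alternative: if A is changing, jump to the value being loaded into it.
        return mux16(a_reg, a_in, load_a)
    # Book behavior: always jump to the value A held before the instruction.
    return a_reg


# A=A+1;JMP with A initially 10: the value being loaded into A is 11
print(jump_address(10, 11, load_a=True, use_new_value=False))
print(jump_address(10, 11, load_a=True, use_new_value=True))
```

For instructions that do not write A (load_a false), both variants yield the same address.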
SLIDE 59 CPU Implementation Issue #2: How to handle potential instruction execution asynchrony?
The CPU as diagrammed in the book glosses over the order in which the A, D, and PC registers, and memory, change. An as-described implementation of the Hack architecture using a simulator (or hardware) that notices this asynchrony can create problems, e.g., if the PC changes as the result of a jump before the instruction has completely finished executing. One way to address this issue would be to add an instruction register that would remain stable throughout the instruction execution cycle.
Diagram: the CPU schematic with an added Instruction Register.
SLIDE 60 Computer Interface
Diagram: the Computer chip, with a single input, reset, connected to the keyboard and the screen.
SLIDE 61 Computer Implementation
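Diagram: the CPU sits between the Instruction Memory (ROM32K) and the Data Memory (Memory). The CPU sends pc to the instruction memory and receives instruction; it sends outM, writeM, and addressM to the data memory and receives inM. The reset input feeds the CPU.
CHIP Computer {
    IN reset;
    PARTS:
    // implementation missing
}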
Implementation: Simple, the chip-parts do all the work.
SLIDE 62
LogicCircuit Implementation of the Hack Computer
SLIDE 63
Perspective: What We Have Left Out
- Caching
- Instruction pipelining
- More I/O units
- Special-purpose processors (I/O, graphics, communications, …)
- Multi-core / parallelism
- Efficiency considerations
- Energy consumption considerations
- And a bunch of other stuff (take a good computer architecture course!)