CS 3410 Computer Science Cornell University [K. Bala, A. Bracy, E. Sirer, and H. Weatherspoon]

Combinational logic
• Output computed directly from inputs
• System has no internal state
• Nothing depends on the past!
[Figure: N inputs → combinational circuit → M outputs]
Need:
• to record data
• to build stateful circuits
• a state-holding device

A basic processor
• fetches
• decodes
• executes
one instruction at a time
[Figure: the PC addresses Prog Mem, the CPU executes the current instruction, and a +4 adder feeds the PC]
Instructions live in Program Memory, e.g.:
00100000000000100000000000001010
00100000000000010000000000000000
00000000001000100001100000101010
PC = Program Counter, address of current instruction
“Next Instruction Address” = PC + 4
When should we update the PC? As fast and as often as possible?

Clock helps coordinate state changes
• Fixed period
• Frequency = 1/period
[Figure: clock waveform alternating between high (1) and low (0), with the rising edge, falling edge, and clock period marked]
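
The relation Frequency = 1/period can be checked with a few lines of Python. This is only a minimal sketch: the function name `frequency_hz` and the sample periods are illustrative assumptions, not values from the slides.

```python
def frequency_hz(period_s: float) -> float:
    """Return the clock frequency in hertz for a clock period in seconds."""
    if period_s <= 0:
        raise ValueError("clock period must be positive")
    return 1.0 / period_s


# Illustrative periods (assumed values): half a second and one nanosecond.
for period in (0.5, 1e-9):
    print(f"period = {period} s -> frequency = {frequency_hz(period):.3g} Hz")
```

A shorter period means a higher frequency, which is why speeding up the clock requires shortening whatever work has to fit inside one period.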

State changes at clock edge
[Figure: clock waveforms marking positive edge-triggered and negative edge-triggered state changes]
Need to design edge-triggered storage
Positive edge-triggered D Flip-Flop (DFF):
• Data captured when clock low
• Output changes only on rising edge
(could also design it to be negative edge-triggered)

Signals must be stable prior to rising edge
Positive edge-triggered D Flip-Flop:
• Output changes only on rising edge
• Data captured when clock low
[Figure: inputs → DFF → combinational circuit → DFF → outputs; the clk timeline marks t_combinational and the phases "compute" and "get set"]
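
The two bullet points above can be mimicked by a small behavioral model. This is a hedged sketch, not a gate-level design: the class name `DFlipFlop`, the `step` method, and the choice to start with Q = 0 are assumptions made for illustration.

```python
class DFlipFlop:
    """Behavioral model of a positive edge-triggered D flip-flop."""

    def __init__(self):
        self.q = 0           # visible output (assumed to start at 0)
        self._captured = 0   # value of D sampled while the clock is low
        self._prev_clk = 0   # clock level seen on the previous step

    def step(self, clk: int, d: int) -> int:
        """Apply one clock level and one D value, then return Q."""
        if clk == 0:
            self._captured = d        # data captured when clock low
        elif self._prev_clk == 0:     # clk is 1 and was 0: rising edge
            self.q = self._captured   # output changes only on rising edge
        self._prev_clk = clk
        return self.q


ff = DFlipFlop()
print(ff.step(clk=0, d=1))  # 0: D is captured, but Q has not changed yet
print(ff.step(clk=1, d=0))  # 1: rising edge publishes the captured value
print(ff.step(clk=1, d=0))  # 1: D changing while the clock is high has no effect
```

Because D is ignored once the clock is high, the value must already be stable before the rising edge, which is the requirement stated in the slide title.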

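[Figure: the PC addresses Prog Mem, the CPU executes the current instruction, and a +4 adder feeds the PC]
[Timing diagram: on successive clk edges the PC steps through x100, x104, x108 while insn steps through ADD, SUB, XOR; the SUB value is a 32 bit encoding of a subtract instruction]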
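
The timing diagram can be replayed in software. The sketch below assumes that ADD, SUB, and XOR sit at addresses 0x100, 0x104, and 0x108 (read off the diagram) and uses mnemonic strings in place of 32-bit encodings; `PROGRAM_MEMORY` and `run` are hypothetical names.

```python
# Assumed layout: one 4-byte instruction at each address in the diagram.
PROGRAM_MEMORY = {0x100: "ADD", 0x104: "SUB", 0x108: "XOR"}


def run(pc: int, cycles: int) -> list:
    """Return the (PC, instruction) pair observed in each clock cycle."""
    trace = []
    for _ in range(cycles):
        insn = PROGRAM_MEMORY[pc]      # fetch the current instruction
        trace.append((hex(pc), insn))  # decode/execute would happen here
        pc = pc + 4                    # the PC register loads PC + 4 at the clock edge
    return trace


print(run(0x100, 3))  # [('0x100', 'ADD'), ('0x104', 'SUB'), ('0x108', 'XOR')]
```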

[Figure: the PC addresses Prog Mem, the CPU executes the current instruction, and a +4 adder feeds the PC]
If we wanted to make the clock faster, what would we need to speed up?
(A) the +4 adder
(B) the time it takes to read Program Memory
(C) the time it takes to execute an instruction
(D) B or C
(E) A, B & C

Clocks
State
• Storing 1 bit
• Storing N bits:
  – Registers
  – Memory

• D flip-flops in parallel
• shared clock
• Additional (optional) inputs: writeEnable, reset, …
[Figure: four DFFs (D0 to D3) sharing clk, also drawn as a single 4-bit reg with a 4-bit input and a 4-bit output]
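
A register built this way can be sketched as a list of bits that all update on the same clock edge. The class name `Register`, the method `clock_edge`, and the priority of reset over writeEnable are assumptions for illustration only.

```python
class Register:
    """N one-bit storage cells that share a clock (behavioral sketch)."""

    def __init__(self, width: int = 4):
        self.width = width
        self.bits = [0] * width   # bits[i] plays the role of flip-flop D_i

    @property
    def value(self) -> int:
        return sum(bit << i for i, bit in enumerate(self.bits))

    def clock_edge(self, d: int, write_enable: int = 1, reset: int = 0) -> int:
        """Model one active clock edge and return the stored value."""
        if reset:                 # assumption: reset wins over writeEnable
            self.bits = [0] * self.width
        elif write_enable:        # every flip-flop loads its own bit of d
            self.bits = [(d >> i) & 1 for i in range(self.width)]
        return self.value


reg = Register(width=4)
print(reg.clock_edge(0b1010))                  # 10
print(reg.clock_edge(0b0101, write_enable=0))  # 10 (write suppressed)
print(reg.clock_edge(0, reset=1))              # 0
```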

Register File
• N read/write registers
• Indexed by register number
[Figure: 32 x 32 Register File, Single-Read-Port, Single-Write-Port, with 32-bit D_W in, 32-bit Q_R out, 1-bit W, 5-bit R_W, and 5-bit R_R]

Register File
• N read/write registers
• Indexed by register number
How to write to one register in the register file?
• Need a decoder
[Figure: 32-bit D feeds Reg 0, Reg 1, … Reg 30, Reg 31; a 5-to-32 decoder driven by the 5-bit R_W selects one register; for addi r5, r0, 10 the decoder input is 00101]

3-to-8 decoder (example input R_W = 101), truth table with outputs still to be filled in:
i2 i1 i0 | o0 o1 o2 o3 o4 o5 o6 o7
 0  0  0 |
 0  0  1 |
 0  1  0 |
 0  1  1 |
 1  0  0 |
 1  0  1 |
 1  1  0 |
 1  1  1 |

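3-to-8 decoder (example input R_W = 101), completed truth table:
i2 i1 i0 | o0 o1 o2 o3 o4 o5 o6 o7
 0  0  0 |  1  0  0  0  0  0  0  0
 0  0  1 |  0  1  0  0  0  0  0  0
 0  1  0 |  0  0  1  0  0  0  0  0
 0  1  1 |  0  0  0  1  0  0  0  0
 1  0  0 |  0  0  0  0  1  0  0  0
 1  0  1 |  0  0  0  0  0  1  0  0
 1  1  0 |  0  0  0  0  0  0  1  0
 1  1  1 |  0  0  0  0  0  0  0  1
[Figure: gate-level examples for o0 and o5, each an AND of i2, i1, i0 with the appropriate inputs inverted]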
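
The truth table can be generated rather than written by hand. The following is a minimal sketch; `decode` and `o5` are hypothetical helper names, and `o5` spells out the single AND gate that drives output 5.

```python
def decode(index: int, n_bits: int = 3) -> list:
    """Return the one-hot outputs o0..o(2**n_bits - 1) for a binary index."""
    n_outputs = 1 << n_bits
    if not 0 <= index < n_outputs:
        raise ValueError("index does not fit in n_bits")
    return [1 if i == index else 0 for i in range(n_outputs)]


def o5(i2: int, i1: int, i0: int) -> int:
    """Gate-level view of one output: o5 = i2 AND (NOT i1) AND i0."""
    return i2 & (1 - i1) & i0


print(decode(0b101))  # [0, 0, 0, 0, 0, 1, 0, 0]
print(o5(1, 0, 1))    # 1
```

The same function with `n_bits=5` models the 5-to-32 decoder used on the register file's write port.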

Register File
• N read/write registers
• Indexed by register number
How to write to one register in the register file?
• Need a decoder
• Write enable signal prevents unintended writes
[Figure: 32-bit D feeds Reg 0, Reg 1, … Reg 30, Reg 31; a 5-to-32 decoder driven by the 5-bit R_W and gated by W selects one register; example: addi r5, r0, 10]

Register File
• N read/write registers
• Indexed by register number
[Figure: Reg 0, Reg 1, … Reg 30, Reg 31, each 32 bits wide]
How to read from one register? Need:
(A) Encoder
(B) Decoder
(C) Or Gate
(D) Multiplexor

Register File
• N read/write registers
• Indexed by register number
How to read from one register?
• Need a multiplexor
[Figure: Reg 0, Reg 1, … Reg 30, Reg 31 feed a MUX selected by the 5-bit R_A, producing the 32-bit output Q_A]

Register File
• N read/write registers
• Indexed by register number
How to read from two registers?
• Need 2 multiplexors!
[Figure: Reg 0, Reg 1, … Reg 30, Reg 31 feed two MUXes, one selected by the 5-bit R_A producing 32-bit Q_A, the other selected by the 5-bit R_B producing 32-bit Q_B]

Register File
• N read/write registers
• Indexed by register number
Implementation:
• D flip flops to store bits
• Decoder for each write port
• Mux for each read port
[Figure: 32-bit D feeds Reg 0, Reg 1, … Reg 30, Reg 31 through a 5-to-32 decoder driven by the 5-bit R_W and W; two MUXes selected by the 5-bit R_A and R_B produce the 32-bit outputs Q_A and Q_B]

Register File
• N read/write registers
• Indexed by register number
Implementation:
• D flip flops to store bits
• Decoder for each write port
• Mux for each read port
[Figure: 32 x 32 Register File, Dual-Read-Port, Single-Write-Port, with 32-bit D_W in, 32-bit Q_A and Q_B out, 1-bit W, and 5-bit R_W, R_A, R_B]
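
The three implementation bullets map onto a short behavioral model. This is a sketch under stated assumptions: the class `RegisterFile` and its helpers `_decode` and `_mux` are hypothetical names, each list entry stands in for a row of D flip-flops, and timing is reduced to one `write` call per clock edge.

```python
class RegisterFile:
    """Dual-read-port, single-write-port register file (behavioral sketch)."""

    def __init__(self, n_regs: int = 32, width: int = 32):
        self.regs = [0] * n_regs       # each entry stands for `width` D flip-flops
        self.mask = (1 << width) - 1   # keeps stored values `width` bits wide

    def _decode(self, index: int) -> list:
        """Write-port decoder: one select line per register."""
        return [int(i == index) for i in range(len(self.regs))]

    def _mux(self, select: int) -> int:
        """Read-port mux: pass the selected register through."""
        return self.regs[select]

    def read(self, r_a: int, r_b: int) -> tuple:
        return self._mux(r_a), self._mux(r_b)   # one mux per read port

    def write(self, r_w: int, d_w: int, w: int) -> None:
        """Model one clock edge: update the decoded register only if w is 1."""
        for i, selected in enumerate(self._decode(r_w)):
            if selected and w:
                self.regs[i] = d_w & self.mask


rf = RegisterFile()
rf.write(r_w=5, d_w=10, w=1)   # destination write of an addi r5, r0, 10
rf.write(r_w=6, d_w=99, w=0)   # write enable low: r6 keeps its old value
print(rf.read(5, 6))           # (10, 0)
```

Adding a read port means adding another mux; adding a write port means adding another decoder, which is why the port count drives the hardware cost.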

MIPS register file
• 32 x 32-bit registers
• r0 wired to zero
• Write port indexed via R_W
  – on falling edge when WE=1
• Read ports indexed via R_A, R_B
[Figure: register file r0, r1, r2, … r31 with 32-bit W in, 32-bit A and B out, 1-bit WE, and 5-bit R_W, R_A, R_B]
Registers
• Numbered from 0 to 31.
• Can be referred to by number: $0, $1, $2, … $31
• By convention, each register also has a name:
  – $16 - $23 → $s0 - $s7, $8 - $15 → $t0 - $t7
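
The naming convention and the hard-wired r0 can be written down directly. The sketch below covers only the two name ranges listed on the slide and falls back to the numeric form for every other register; `register_name` and `read_register` are hypothetical helpers.

```python
def register_name(number: int) -> str:
    """Map a register number to its conventional name, where the slide gives one."""
    if not 0 <= number <= 31:
        raise ValueError("MIPS registers are numbered from 0 to 31")
    if 8 <= number <= 15:
        return f"$t{number - 8}"    # $8 - $15  -> $t0 - $t7
    if 16 <= number <= 23:
        return f"$s{number - 16}"   # $16 - $23 -> $s0 - $s7
    return f"${number}"             # other names are not listed on the slide


def read_register(regs: list, number: int) -> int:
    """r0 is wired to zero, so reading register 0 ignores whatever is stored."""
    return 0 if number == 0 else regs[number]


print(register_name(16), register_name(15), register_name(0))  # $s0 $t7 $0
print(read_register([123] * 32, 0))                            # 0
```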

If we wanted to support 64 registers, what would change?
(A) W, A, B 32 → 64
(B) R_W, R_A, R_B 5 → 6
(C) W 32 → 64, R_W 5 → 6
(D) A & B only
[Figure: register file r0, r1, … r31 with 32-bit W in, 32-bit A and B out, 1-bit WE, and 5-bit R_W, R_A, R_B]

Register File tradeoffs
+ Very fast (a few gate delays for both read and write)
+ Adding extra ports is straightforward
– Doesn’t scale
  e.g. 32Mb register file with 32 bit registers (1M registers)
  Need 32x 1M-to-1 multiplexor and 32x 20-to-1M decoder
  How many logic gates/transistors?
[Figure: 8-to-1 mux with inputs a through h and select lines s2, s1, s0]
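
The numbers in the scaling example follow from two small calculations. This sketch assumes "M" means 2**20; `register_file_shape` is a hypothetical helper that returns the register count and the number of index bits the decoder and muxes must handle.

```python
def register_file_shape(total_bits: int, register_width: int) -> tuple:
    """Return (number of registers, index bits needed to select one)."""
    n_registers = total_bits // register_width
    index_bits = (n_registers - 1).bit_length()   # ceil(log2(n_registers))
    return n_registers, index_bits


MEGA = 1 << 20  # assumption: "M" is read as 2**20

print(register_file_shape(32 * MEGA, 32))  # (1048576, 20): 1M registers, 20-bit index
print(register_file_shape(32 * 32, 32))    # (32, 5): the 32 x 32 register file
```

Each read port then needs a mux with one input per register for every output bit, so the selection logic grows with the register count, not just the storage.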

Clocks
State
• Storing 1 bit
• Storing N bits:
  – Registers
  – Memory

• Storage Cells + bus
• Inputs: Address, Data (for writes)
• Outputs: Data (for reads)
• Also need R/W signal (not shown)
• N address bits → 2^N words total
• M data bits → each word M bits
[Figure: memory block with an N-bit Address input and an M-bit Data connection]
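
The last two bullets give the capacity of a memory directly from its port widths. A minimal sketch, with `memory_capacity` as a hypothetical helper name:

```python
def memory_capacity(n_address_bits: int, m_data_bits: int) -> tuple:
    """Return (number of words, total number of bits stored)."""
    words = 2 ** n_address_bits          # N address bits -> 2^N words
    return words, words * m_data_bits    # each word is M bits wide


print(memory_capacity(2, 2))    # (4, 8): a 4 x 2 memory
print(memory_capacity(5, 32))   # (32, 1024): same shape as a 32 x 32 register file
```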

• Storage Cells + bus
• Decoder selects a word line
• R/W selector determines access type
• Word line is then coupled to the data lines
note: w/ a tri-state buffer, not a huge mux!
[Figure: Address feeds a Decoder that drives the word lines; the R/W signal and the data lines connect to the storage cells]

E.g. How do we design a 4 x 2 Memory Module?
(i.e. 4 word lines that are each 2 bits wide)?
[Figure: 4 x 2 Memory. A 2-bit Address drives a 2-to-4 decoder whose outputs 0 to 3 are the word lines; each word line enables a row of two D flip-flops (D, Q, enable). D_in[1] and D_in[2] enter at the top and D_out[1] and D_out[2] leave at the bottom along the bit lines. Write Enable and Output Enable are the control inputs.]
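
One way to make the 4 x 2 design concrete is a behavioral model in which the decoder picks a word line and the control signals gate the access. This is a sketch, not the slide's circuit: `Memory4x2`, `decoder_2_to_4`, and the use of `None` to represent undriven (tri-stated) outputs are all assumptions.

```python
def decoder_2_to_4(address: int) -> list:
    """2-to-4 decoder: exactly one of the four word lines is high."""
    if not 0 <= address < 4:
        raise ValueError("address must fit in 2 bits")
    return [int(address == row) for row in range(4)]


class Memory4x2:
    """Four words of two bits each, one row of storage cells per word line."""

    def __init__(self):
        self.cells = [[0, 0] for _ in range(4)]

    def write(self, address: int, d_in: tuple, write_enable: int) -> None:
        """A row loads D_in only when its word line and Write Enable are both high."""
        for row, word_line in enumerate(decoder_2_to_4(address)):
            if word_line and write_enable:
                self.cells[row] = list(d_in)

    def read(self, address: int, output_enable: int):
        """Return (D_out[1], D_out[2]), or None when the outputs are not driven."""
        if not output_enable:
            return None   # assumption: None stands for a tri-stated output
        for row, word_line in enumerate(decoder_2_to_4(address)):
            if word_line:
                return tuple(self.cells[row])


mem = Memory4x2()
mem.write(address=2, d_in=(1, 0), write_enable=1)
mem.write(address=3, d_in=(1, 1), write_enable=0)   # ignored: Write Enable is low
print(mem.read(2, output_enable=1))   # (1, 0)
print(mem.read(3, output_enable=1))   # (0, 0)
print(mem.read(2, output_enable=0))   # None
```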

• 32-bit address
• 32-bit data (but byte addressed)
• Enable + 2 bit memory control (mc)
  00: read word (4 byte aligned)
  01: write byte
  10: write halfword (2 byte aligned)
  11: write word (4 byte aligned)
[Figure: memory with 32-bit D_in, 32-bit D_out, 32-bit addr, 2-bit mc, and enable E; each address from 0x00000000 to 0xffffffff holds 1 byte, and the example byte 0x05 appears next to address 0x0000000b]
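
The four mc encodings can be exercised with a small model. The slide does not state a byte order, so this sketch assumes little-endian layout (the byte at the lowest address is the least significant); the class `ByteMemory`, its `access` method, and the sparse dictionary storage are likewise illustrative assumptions.

```python
class ByteMemory:
    """Byte-addressed memory driven by enable + 2-bit memory control (mc)."""

    WRITE_SIZES = {0b01: 1, 0b10: 2, 0b11: 4}   # bytes written for each mc value

    def __init__(self):
        self.data = {}   # sparse storage: address -> byte, unwritten bytes read as 0

    def access(self, enable: int, mc: int, addr: int, d_in: int = 0):
        """Perform one access; return the word for a read, None otherwise."""
        if not enable:
            return None
        size = 4 if mc == 0b00 else self.WRITE_SIZES[mc]
        if addr % size:
            raise ValueError("address is not aligned for this access")
        if mc == 0b00:   # read word, assembled in little-endian order (assumption)
            return sum(self.data.get(addr + i, 0) << (8 * i) for i in range(4))
        for i in range(size):   # write byte, halfword, or word
            self.data[addr + i] = (d_in >> (8 * i)) & 0xFF
        return None


mem = ByteMemory()
mem.access(1, 0b11, 0x8, 0x11223344)   # write word at 0x8
mem.access(1, 0b01, 0xB, 0x05)         # write the byte 0x05 at 0xb
print(hex(mem.access(1, 0b00, 0x8)))   # 0x5223344
```

Under the little-endian assumption the byte at 0xb is the most significant byte of the word at 0x8, so the word reads back with 0x05 in its top byte.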

In past semesters we have covered the rest of this lecture in the beginning of the Caches Lecture. So if you have no recollection of covering this, it might be because once again we didn’t. :)

Typical SRAM Cell
[Figure: SRAM cell connected through pass-through transistors, gated by the word line, to the complementary bit lines B and !B]
Each cell stores one bit, and requires 4 – 8 transistors (6 is typical)

SRAM
• A few transistors (~6) per cell
• Used for working memory (caches)
• But for even higher density…

Dynamic-RAM (DRAM)
• Data values require constant refresh
[Figure: DRAM cell with a word line, a bit line, and a capacitor connected to Gnd]
Each cell stores one bit, and requires 1 transistor

Dynamic-RAM (DRAM)
• Data values require constant refresh
[Figure: DRAM cell with a word line, a bit line, a pass-through transistor, and a capacitor connected to Gnd]
Each cell stores one bit, and requires 1 transistor

Single transistor vs. many gates
• Denser, cheaper ($30/1GB vs. $30/2MB)
• But more complicated, and has analog sensing
Also needs refresh
• Read and write back…
• …every few milliseconds
• Organized in 2D grid, so can do rows at a time
• Chip can do refresh internally
Hence… slower and energy inefficient

Register File tradeoffs
+ Very fast (a few gate delays for both read and write)
+ Adding extra ports is straightforward
– Expensive, doesn’t scale
– Volatile
Volatile Memory alternatives: SRAM, DRAM, …
– Slower
+ Cheaper, and scales well
– Volatile
Non-Volatile Memory (NV-RAM): Flash, EEPROM, …
+ Scales well
– Limited lifetime; degrades after 100000 to 1M writes
