CS 3410 Computer Science Cornell University [K. Bala, A. Bracy, E. Sirer, and H. Weatherspoon]

SLIDE 1

CS 3410 Computer Science Cornell University

[K. Bala, A. Bracy, E. Sirer, and H. Weatherspoon]

SLIDE 2

Combinational logic

  • Output computed directly from inputs
  • System has no internal state
  • Nothing depends on the past!

Need:

  • to record data
  • to build stateful circuits
  • a state-holding device

[Figure: N inputs → combinational circuit → M outputs]

SLIDE 3

Instructions live in Program Memory.
PC = Program Counter, the address of the current instruction.
“Next Instruction Address” = PC + 4

[Figure: the PC addresses Prog Mem, which supplies the current instruction that the CPU executes; a +4 adder feeds the next address back into the PC. Example instruction words in Program Memory:
00100000000000100000000000001010
00100000000000010000000000000000
00000000001000100001100000101010]

A basic processor

  • fetches
  • decodes
  • executes

one instruction at a time.

When should we update the PC? As fast and as often as possible?
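The fetch step above can be sketched in a few lines of Python. This is only an illustration: the three instruction words are the ones shown on the slide, but the starting address 0x100, the dictionary used as Program Memory, and the helper names `decode_fields` and `run_fetch_loop` are assumptions made for the example. The field boundaries follow the standard 32-bit MIPS instruction layout.

```python
# Instruction words copied from the slide; the addresses are assumed for illustration.
PROGRAM_MEMORY = {
    0x100: 0b00100000000000100000000000001010,
    0x104: 0b00100000000000010000000000000000,
    0x108: 0b00000000001000100001100000101010,
}


def decode_fields(word):
    """Split a 32-bit MIPS instruction word into its bit fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21
        "rt": (word >> 16) & 0x1F,      # bits 20..16
        "rd": (word >> 11) & 0x1F,      # bits 15..11 (meaningful for R-type)
        "imm": word & 0xFFFF,           # bits 15..0  (meaningful for I-type)
        "funct": word & 0x3F,           # bits 5..0   (meaningful for R-type)
    }


def run_fetch_loop(memory, start_pc):
    """Fetch one instruction at a time, advancing the PC by 4 after each fetch."""
    pc = start_pc
    trace = []
    while pc in memory:
        trace.append((pc, decode_fields(memory[pc])))
        pc = pc + 4  # "Next Instruction Address" = PC + 4
    return trace


trace = run_fetch_loop(PROGRAM_MEMORY, 0x100)
for pc, fields in trace:
    print(hex(pc), fields)
```

Read with the MIPS encoding, the first two words have opcode 8 (addi) and the third is an R-type word with funct 0x2A (slt). The sketch says nothing about when the PC should change, which is the question the slide raises.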

SLIDE 4

Clock helps coordinate state changes

  • Fixed period
  • Frequency = 1/period

[Figure: clock waveform alternating between 0 and 1, labeled with clock period, clock high, clock low, rising edge, and falling edge]
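As a quick worked example of Frequency = 1/period, the snippet below converts a period into a frequency. The 0.5 ns period is an assumed value chosen for illustration; it does not come from the slides.

```python
def frequency_hz(period_s):
    """Frequency = 1 / period, with the period given in seconds."""
    return 1.0 / period_s


period_s = 0.5e-9  # assumed example: a 0.5 ns clock period
freq = frequency_hz(period_s)
print(f"{freq / 1e9:.1f} GHz")
```

Halving the period doubles the frequency, so a faster clock leaves less time between consecutive rising edges.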

SLIDE 5

State changes at clock edge

[Figure: clock waveforms marking positive edge-triggered and negative edge-triggered updates]

Need to design edge-triggered storage

Positive edge-triggered D Flip-Flop:

  • Data captured when clock low
  • Output changes only on rising edge

(could also design it to be negative edge-triggered)

[Figure: DFF block symbol]

SLIDE 6

Signals must be stable prior to rising edge

Positive edge-triggered D Flip-Flop:

  • Output changes only on rising edge
  • Data captured when clock low

[Figure: inputs → DFF → combinational circuit → DFF → outputs, with a clk timing diagram labeled get, compute, set, and t_combinational]

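The two bullet points can be mirrored in a small behavioral model. This is a sketch, not a gate-level design: the class name `DFlipFlop`, the `step` method, and the internal `_master` value are hypothetical names, and the model assumes ideal timing (no setup or hold constraints).

```python
class DFlipFlop:
    """Behavioral sketch of a positive edge-triggered D flip-flop."""

    def __init__(self):
        self._master = 0    # follows D while the clock is low
        self.q = 0          # output; changes only on a rising edge
        self._prev_clk = 0  # previous clock level, used to detect edges

    def step(self, clk, d):
        """Apply one clock level and data value; return the output Q."""
        if clk == 0:
            self._master = d       # data captured when clock low
        elif self._prev_clk == 0:  # clk is 1 and was 0: rising edge
            self.q = self._master  # output changes only on rising edge
        self._prev_clk = clk
        return self.q


dff = DFlipFlop()
samples = [(0, 1), (1, 0), (1, 0), (0, 0), (1, 1)]  # (clk, d) pairs
outputs = [dff.step(clk, d) for clk, d in samples]
print(outputs)
```

In this trace D changes while the clock is high on the second and fifth samples, and those changes never reach Q. That is the behavior behind "signals must be stable prior to rising edge".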
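SLIDE 7

[Figure: the PC addresses Prog Mem, which supplies the current instruction that the CPU executes; a +4 adder feeds the next address back into the PC]

[Timing diagram: clk, PC (x100, x104, x108), and insn (ADD, SUB, XOR), where each insn is a 32-bit encoding, e.g. SUB is a 32-bit encoding of a subtract instruction]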

SLIDE 8

If we wanted to make the clock faster, what would we need to speed up?

(A) the +4 adder
(B) the time it takes to read Program Memory
(C) the time it takes to execute an instruction
(D) B or C
(E) A, B & C

[Figure: PC, Prog Mem, +4 adder, current instruction, CPU executes]

SLIDE 9

Clocks

State

  • Storing 1 bit
  • Storing N bits:

    – Registers
    – Memory

SLIDE 10

  • D flip-flops in parallel
  • shared clock
  • Additional (optional) inputs: writeEnable, reset, …

[Figure: four DFFs in parallel with data inputs D0, D1, D2, D3 and a shared clk, drawn as a "4-bit reg" block with a 4-bit input, a 4-bit output, and clk]
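A register built this way can be sketched as a list of stored bits that all update on the same clock edge. The class name `Register`, the method `rising_edge`, and the choice to let reset take priority over writeEnable are assumptions made for this example; the slide only lists those inputs as optional.

```python
class Register:
    """N-bit register modeled as D flip-flops in parallel on a shared clock."""

    def __init__(self, width=4):
        self.width = width
        self.bits = [0] * width  # one stored bit per flip-flop

    def rising_edge(self, d, write_enable=1, reset=0):
        """Apply one rising clock edge and return the stored bits."""
        if len(d) != self.width:
            raise ValueError("expected %d data bits" % self.width)
        if reset:
            self.bits = [0] * self.width  # assumed: reset wins over write
        elif write_enable:
            self.bits = list(d)           # every flip-flop captures its D input
        return list(self.bits)


reg = Register(width=4)
first = reg.rising_edge([1, 0, 1, 1])                 # stored
held = reg.rising_edge([0, 0, 0, 0], write_enable=0)  # ignored, value held
cleared = reg.rising_edge([1, 1, 1, 1], reset=1)      # cleared to zero
print(first, held, cleared)
```

With writeEnable low the register keeps its old value even though the clock still ticks, which is what makes it usable as storage.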

SLIDE 11

Register File

  • N read/write registers
  • Indexed by register number

[Figure: Single-Read-Port Single-Write-Port 32 x 32 Register File with ports DW (32 bits), QR (32 bits), W (1 bit), RW (5 bits), RR (5 bits)]

SLIDE 12

Register File

  • N read/write registers
  • Indexed by register number

addi r5, r0, 10

How to write to one register in the register file?

  • Need a decoder

[Figure: a 5-to-32 decoder takes the 5-bit RW (here 00101) and selects one of Reg 0, Reg 1, …, Reg 30, Reg 31; the 32-bit D input goes to the registers]

SLIDE 13

3-to-8 decoder (3-bit input RW, e.g. 101)

i2 i1 i0 | o0 o1 o2 o3 o4 o5 o6 o7
 0  0  0 |
 0  0  1 |
 0  1  0 |
 0  1  1 |
 1  0  0 |
 1  0  1 |
 1  1  0 |
 1  1  1 |

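SLIDE 14

3-to-8 decoder (3-bit input RW, e.g. 101)

i2 i1 i0 | o0 o1 o2 o3 o4 o5 o6 o7
 0  0  0 |  1  0  0  0  0  0  0  0
 0  0  1 |  0  1  0  0  0  0  0  0
 0  1  0 |  0  0  1  0  0  0  0  0
 0  1  1 |  0  0  0  1  0  0  0  0
 1  0  0 |  0  0  0  0  1  0  0  0
 1  0  1 |  0  0  0  0  0  1  0  0
 1  1  0 |  0  0  0  0  0  0  1  0
 1  1  1 |  0  0  0  0  0  0  0  1

[Figure: gate-level detail computing individual outputs, such as o5, from i2, i1, i0]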
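The truth table translates directly into code. The sketch below gives one gate-style formulation, in which each output is the AND of the three inputs or their complements. The function name `decoder_3_to_8` is hypothetical.

```python
def decoder_3_to_8(i2, i1, i0):
    """Return [o0, ..., o7]; exactly one output is 1 for any input."""
    n2, n1, n0 = 1 - i2, 1 - i1, 1 - i0  # inverted inputs
    return [
        n2 & n1 & n0,  # o0: input 000
        n2 & n1 & i0,  # o1: input 001
        n2 & i1 & n0,  # o2: input 010
        n2 & i1 & i0,  # o3: input 011
        i2 & n1 & n0,  # o4: input 100
        i2 & n1 & i0,  # o5: input 101
        i2 & i1 & n0,  # o6: input 110
        i2 & i1 & i0,  # o7: input 111
    ]


outputs_101 = decoder_3_to_8(1, 0, 1)
print(outputs_101)
```

For the input 101 used on the slide, only o5 is asserted. The 5-to-32 decoder in the register file follows the same pattern with five inputs and 32 AND terms.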

SLIDE 15

Register File

  • N read/write registers
  • Indexed by register number

addi r5, r0, 10

How to write to one register in the register file?

  • Need a decoder
  • Write enable signal prevents unintended writes

[Figure: a 5-to-32 decoder, driven by the 5-bit RW and gated by the write enable W, selects one of Reg 0, Reg 1, …, Reg 30, Reg 31 to load the 32-bit D input]
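The write path can be sketched as a decoder whose outputs are ANDed with the write enable, so that at most one register loads the data value. The function names `decode` and `write_register` are hypothetical, and the registers are modeled as a plain Python list rather than flip-flops.

```python
def decode(index, n_outputs):
    """One-hot decoder: output `index` is 1, all others are 0."""
    return [1 if i == index else 0 for i in range(n_outputs)]


def write_register(regs, rw, w, d):
    """Load d into register rw, but only when the write enable w is 1."""
    select = decode(rw, len(regs))
    for i, selected in enumerate(select):
        if selected & w:  # per-register enable = decoder output AND W
            regs[i] = d
    return regs


regs = [0] * 32
write_register(regs, rw=0b00101, w=1, d=10)  # the write done by addi r5, r0, 10
write_register(regs, rw=0b00110, w=0, d=99)  # W = 0, so nothing is written
print(regs[5], regs[6])
```

Without the W gate, the decoder would always select some register, so every clock edge would overwrite whichever register RW happened to name.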

SLIDE 16

Register File

  • N read/write registers
  • Indexed by register number

How to read from one register? Need:

(A) Encoder
(B) Decoder
(C) Or Gate
(D) Multiplexor

[Figure: Reg 0, Reg 1, …, Reg 30, Reg 31, each with a 32-bit output]

SLIDE 17

Register File

  • N read/write registers
  • Indexed by register number

How to read from one register?

  • Need a multiplexor

[Figure: the 32-bit outputs of Reg 0, Reg 1, …, Reg 30, Reg 31 feed a MUX selected by the 5-bit RA, producing the 32-bit output QA]
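One way to picture the read multiplexor is as a tree of 2-to-1 muxes, one level per select bit. This is a sketch under that assumption; the slides do not say how the mux is built internally, and the names `mux2` and `mux_tree` are hypothetical.

```python
def mux2(a, b, s):
    """2-to-1 mux: pass a when s is 0, b when s is 1."""
    return b if s else a


def mux_tree(inputs, select_bits):
    """Select one input; select_bits is least-significant bit first.

    len(inputs) must equal 2 ** len(select_bits).
    """
    level = list(inputs)
    for s in select_bits:
        # Each level halves the candidates by pairing neighbors.
        level = [mux2(level[i], level[i + 1], s) for i in range(0, len(level), 2)]
    return level[0]


registers = [100 + i for i in range(32)]  # assumed contents for illustration
ra = 5
ra_bits = [(ra >> k) & 1 for k in range(5)]  # 5-bit RA, LSB first
qa = mux_tree(registers, ra_bits)
print(qa)
```

Reading never changes the stored values, so a second read port only needs a second mux over the same register outputs.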

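SLIDE 18

Register File

  • N read/write registers
  • Indexed by register number

How to read from two registers?

  • Need 2 multiplexors!

[Figure: the 32-bit outputs of Reg 0, Reg 1, …, Reg 30, Reg 31 feed two MUXes; one is selected by the 5-bit RA and produces the 32-bit QA, the other is selected by the 5-bit RB and produces the 32-bit QB]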

SLIDE 19

Register File

  • N read/write registers
  • Indexed by register number

Implementation:

  • D flip flops to store bits
  • Decoder for each write port
  • Mux for each read port

[Figure: a 5-to-32 decoder (5-bit RW, write enable W, 32-bit D) writes Reg 0, Reg 1, …, Reg 30, Reg 31; two MUXes selected by the 5-bit RA and RB produce the 32-bit outputs QA and QB]

SLIDE 20

Register File

  • N read/write registers
  • Indexed by register number

Implementation:

  • D flip flops to store bits
  • Decoder for each write port
  • Mux for each read port

[Figure: Dual-Read-Port Single-Write-Port 32 x 32 Register File with ports DW (32 bits), QA (32 bits), QB (32 bits), W (1 bit), RW (5 bits), RA (5 bits), RB (5 bits)]

SLIDE 21

MIPS register file

  • 32 x 32-bit registers
  • r0 wired to zero
  • Write port indexed via RW
    – on falling edge when WE=1
  • Read ports indexed via RA, RB

Registers

  • Numbered from 0 to 31.
  • Can be referred to by number: $0, $1, $2, … $31
  • By convention, each register also has a name:
    – $16 - $23 → $s0 - $s7, $8 - $15 → $t0 - $t7

[Figure: register file block holding r1, r2, … r31, with 32-bit outputs A and B, 32-bit input W, 1-bit WE, and 5-bit indices RW, RA, RB]
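Putting the pieces together, a dual-read-port, single-write-port register file can be sketched as follows. The class and method names are hypothetical, and the clock is abstracted away: each call to `write` stands for one write edge (the falling edge, per the slide) on which WE is sampled.

```python
class RegisterFile:
    """Sketch of a 32 x 32-bit register file with r0 wired to zero."""

    def __init__(self, n_regs=32, width=32):
        self.mask = (1 << width) - 1  # keep values within the register width
        self.regs = [0] * n_regs

    def read(self, ra, rb):
        """Two read ports: return (A, B) for register indices RA and RB."""
        return self.regs[ra], self.regs[rb]

    def write(self, rw, w, we):
        """One write port: store W into register RW when WE is 1."""
        if we and rw != 0:  # r0 is wired to zero, so writes to it are dropped
            self.regs[rw] = w & self.mask


rf = RegisterFile()
rf.write(rw=5, w=10, we=1)   # addi r5, r0, 10
rf.write(rw=0, w=99, we=1)   # no effect: r0 stays zero
rf.write(rw=6, w=7, we=0)    # no effect: WE is 0
a, b = rf.read(ra=5, rb=0)
print(a, b)
```

The list index plays the role of both the write decoder and the read muxes here; in hardware those are separate circuits, one decoder per write port and one mux per read port.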

SLIDE 22

If we wanted to support 64 registers, what would change?

(A) W, A, B 32 → 64
(B) Rw, Ra, Rb 5 → 6
(C) W 32 → 64, Rw 5 → 6
(D) A & B only

[Figure: register file block holding r0, r1, … r31, with 32-bit outputs A and B, 32-bit input W, 1-bit WE, and 5-bit indices RW, RA, RB]

SLIDE 23

Register File tradeoffs

+ Very fast (a few gate delays for both read and write)
+ Adding extra ports is straightforward
– Doesn’t scale
  e.g. 32Mb register file with 32 bit registers (1M registers)
  Need 32x 1M-to-1 multiplexor and 32x 20-to-1M decoder
  How many logic gates/transistors?

[Figure: 8-to-1 mux with data inputs a, b, c, d, e, f, g, h and select inputs s2, s1, s0]
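The numbers in the scaling example follow from simple arithmetic, sketched below. The only assumption is that 1 Mb means 2**20 bits; the variable names are illustrative.

```python
total_bits = 32 * 2**20      # 32 Mb register file, taking 1 Mb = 2**20 bits
register_width = 32          # bits per register

n_registers = total_bits // register_width  # number of registers
index_bits = n_registers.bit_length() - 1   # log2 of a power of two

print(n_registers, index_bits)
```

So the file holds 2**20 (1M) registers, and selecting one of them takes a 20-bit index, which is where the 1M-to-1 multiplexor and the 20-to-1M decoder come from.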

SLIDE 24

Clocks

State

  • Storing 1 bit
  • Storing N bits:

    – Registers
    – Memory

SLIDE 25

  • Storage Cells + bus
  • Inputs: Address, Data (for writes)
  • Outputs: Data (for reads)
  • Also need R/W signal (not shown)
  • N address bits → 2^N words total
  • M data bits → each word M bits

[Figure: memory block with an N-bit Address input and an M-bit Data port]
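The interface described above can be sketched as a small class parameterized by N and M. The names `Memory` and `access` are hypothetical, and the R/W signal is modeled as a boolean `write` argument.

```python
class Memory:
    """Sketch of a memory with N address bits and M data bits per word."""

    def __init__(self, n_addr_bits, m_data_bits):
        self.n_words = 2 ** n_addr_bits           # N address bits -> 2^N words
        self.word_mask = (1 << m_data_bits) - 1   # each word is M bits wide
        self.cells = [0] * self.n_words

    def access(self, address, write, data=0):
        """Write data when write is true; otherwise return the stored word."""
        if not 0 <= address < self.n_words:
            raise ValueError("address out of range")
        if write:
            self.cells[address] = data & self.word_mask
            return None
        return self.cells[address]


mem = Memory(n_addr_bits=4, m_data_bits=8)  # 16 words of 8 bits each
mem.access(3, write=True, data=0x1FF)       # only the low 8 bits are kept
value = mem.access(3, write=False)
print(mem.n_words, hex(value))
```

Each extra address bit doubles the number of words, while each extra data bit only widens every word by one bit.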

SLIDE 26

  • Storage Cells + bus
  • Decoder selects a word line
  • R/W selector determines access type
  • Word line is then coupled to the data lines

note: w/ a tri-state buffer, not a huge mux!

[Figure: Address feeds a Decoder that drives the word lines; R/W selects the access type on the data lines]

SLIDE 27

E.g. How do we design a 4 x 2 Memory Module? (i.e. 4 word lines that are each 2 bits wide)?

[Figure: 4 x 2 Memory. A 2-to-4 decoder driven by the 2-bit Address selects one of four rows (0 to 3); each row holds two D flip-flop cells (D, Q, enable); Din[1] and Din[2] feed the cells and Dout[1] and Dout[2] read them out, controlled by Write Enable and Output Enable]

SLIDE 28

E.g. How do we design a 4 x 2 Memory Module? (i.e. 4 word lines that are each 2 bits wide)?

[Figure: the same 4 x 2 memory: 2-to-4 decoder on the 2-bit Address, rows 0 to 3 of enabled cells, Din[1], Din[2], Dout[1], Dout[2], Write Enable, Output Enable]

SLIDE 29

E.g. How do we design a 4 x 2 Memory Module? (i.e. 4 word lines that are each 2 bits wide)?

[Figure: the same 4 x 2 memory with the word lines highlighted: the decoder outputs that each enable one row of cells]

SLIDE 30

E.g. How do we design a 4 x 2 Memory Module? (i.e. 4 word lines that are each 2 bits wide)?

[Figure: the same 4 x 2 memory with the bit lines highlighted: the lines that connect Din[1], Din[2] and Dout[1], Dout[2] to one cell in every row]
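The 4 x 2 module can be sketched behaviorally: a 2-to-4 decoder raises one word line, Write Enable lets the selected row load Din, and Output Enable lets the selected row drive Dout. The class and method names are hypothetical, `None` stands in for an undriven (high-impedance) bit line, and the rule that a simultaneous write is visible on the output is a modeling choice, not something the slides specify.

```python
class Memory4x2:
    """Sketch of a 4-word by 2-bit memory with word lines and bit lines."""

    def __init__(self):
        self.cells = [[0, 0] for _ in range(4)]  # 4 rows of 2 one-bit cells

    def access(self, address, din, write_enable, output_enable):
        """Return [Dout[1], Dout[2]], or [None, None] if nothing is driven."""
        word_lines = [1 if i == address else 0 for i in range(4)]  # 2-to-4 decoder
        dout = [None, None]  # tri-state: bit lines float unless a row drives them
        for row, selected in enumerate(word_lines):
            if selected and write_enable:
                self.cells[row] = list(din)     # selected row captures Din
            if selected and output_enable:
                dout = list(self.cells[row])    # selected row drives the bit lines
        return dout


m = Memory4x2()
m.access(address=2, din=[1, 0], write_enable=1, output_enable=0)
read_back = m.access(address=2, din=[0, 0], write_enable=0, output_enable=1)
other_row = m.access(address=1, din=[0, 0], write_enable=0, output_enable=1)
floating = m.access(address=2, din=[0, 0], write_enable=0, output_enable=0)
print(read_back, other_row, floating)
```

Because only the selected row ever drives the shared bit lines, no large output mux is needed, which matches the earlier note about tri-state buffers.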

SLIDE 31

  • 32-bit address
  • 32-bit data (but byte addressed)
  • Enable + 2 bit memory control (mc)

    00: read word (4 byte aligned)
    01: write byte
    10: write halfword (2 byte aligned)
    11: write word (4 byte aligned)

[Figure: memory block with 32-bit addr, 2-bit mc, enable E, 32-bit Din, and 32-bit Dout, beside a column of byte addresses running from 0x00000000 up to 0xffffffff; each address holds 1 byte, e.g. the value 0x05]
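The four mc encodings can be sketched as a byte-addressed memory with alignment checks. Several things here are assumptions made for illustration: the class and method names, the sparse dictionary storage, raising an error on a misaligned access, and above all the little-endian byte order, which the slide does not specify.

```python
class ByteMemory:
    """Sketch of a byte-addressed memory driven by a 2-bit control code mc."""

    SIZES = {0b00: 4, 0b01: 1, 0b10: 2, 0b11: 4}  # bytes touched per mc value

    def __init__(self):
        self.data = {}  # address -> byte; bytes never written read as 0

    def access(self, addr, mc, din=0, enable=1):
        """Perform one access; return the word for a read, else None."""
        if not enable:
            return None
        size = self.SIZES[mc]
        if addr % size != 0:
            raise ValueError("misaligned access")
        if mc == 0b00:  # read word
            # Assumed little-endian: the lowest address holds the low byte.
            return sum(self.data.get(addr + i, 0) << (8 * i) for i in range(4))
        for i in range(size):  # write byte, halfword, or word
            self.data[addr + i] = (din >> (8 * i)) & 0xFF
        return None


mem = ByteMemory()
mem.access(0x8, 0b11, din=0x11223344)  # write word
mem.access(0x9, 0b01, din=0xAA)        # write byte
after_byte = mem.access(0x8, 0b00)     # read word
mem.access(0xA, 0b10, din=0xBEEF)      # write halfword
after_half = mem.access(0x8, 0b00)
print(hex(after_byte), hex(after_half))
```

A byte write changes only one of the four bytes in the enclosing word, which is why the memory needs the control code in addition to a single write enable.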

SLIDE 32

In past semesters we have covered the rest of this lecture at the beginning of the Caches Lecture. So if you have no recollection of covering this, it might be because once again we didn’t. :)

SLIDE 33

Typical SRAM Cell

[Figure: SRAM cell storing B and !B, connected to the bit lines through Pass-Through Transistors that are gated by the word line]

Each cell stores one bit, and requires 4 – 8 transistors (6 is typical)

SLIDE 34

SRAM

  • A few transistors (~6) per cell
  • Used for working memory (caches)
  • But for even higher density…


SLIDE 35

Dynamic-RAM (DRAM)

  • Data values require constant refresh

[Figure: DRAM cell with a Capacitor tied to Gnd, connected to the bit line through a transistor gated by the word line]

Each cell stores one bit, and requires 1 transistor

SLIDE 36

Dynamic-RAM (DRAM)

  • Data values require constant refresh

[Figure: DRAM cell with a Capacitor tied to Gnd, connected to the bit line through Pass-Through Transistors gated by the word line]

Each cell stores one bit, and requires 1 transistor

SLIDE 37

Single transistor vs. many gates

  • Denser, cheaper ($30/1GB vs. $30/2MB)
  • But more complicated, and has analog sensing

Also needs refresh

  • Read and write back…
  • …every few milliseconds
  • Organized in 2D grid, so can do rows at a time
  • Chip can do refresh internally

Hence… slower and energy inefficient


SLIDE 38

Register File tradeoffs

+ Very fast (a few gate delays for both read and write)
+ Adding extra ports is straightforward
– Expensive, doesn’t scale
– Volatile

Volatile Memory alternatives: SRAM, DRAM, …

– Slower
+ Cheaper, and scales well
– Volatile

Non-Volatile Memory (NV-RAM): Flash, EEPROM, …

+ Scales well
– Limited lifetime; degrades after 100000 to 1M writes