SLIDE 1

Binary Access Memory:

An Optimized Lookup Table for Successive Approximation Applications

Benjamin Hershberg*, Skyler Weaver*, Seiji Takeuchi†, Koichi Hamashita†, Un-Ku Moon*
*School of Electrical Engineering and Computer Science, Oregon State University
†Asahi Kasei EMD Corporation, Atsugi, Japan

SLIDE 2

Presentation Overview

  • Introduction & Motivation
  • Binary Access Memory (BAM)

– Basic idea of BAM
– Global pre-fetching
– Local pre-charging
– Asynchronous BAM

  • Conclusion
SLIDE 3

Introduction & Motivation

SLIDE 4

Typical SAR Error Correction

  • Popular SAR error correction methods

– Radix calibration
– Trimming
– Lookup table (LUT)

  • Outside the loop
  • Inside the loop

[Figure: conventional SAR ADC loop - Vin, comparator, DAC feedback, m-bit SAR code]

SLIDE 5

Generalized SAR Error Correction

  • Remaps each SAR code to some DAC code
  • Payoff: enables new ways of implementing binary search
  • Drawback: power, latency

[Figure: generalized SAR loop - the n-bit SAR code is remapped through a lookup table (BAM) to an m-bit DAC code]
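
As a behavioral model of this dataflow, here is a minimal Python sketch of a SAR loop with the remapping stage inserted; the names (sar_convert, dac, lut) are illustrative assumptions, not from the design:

  # Behavioral model: each trial SAR code is remapped through the
  # lookup table before being applied to the DAC.
  def sar_convert(vin, dac, lut, n_bits):
      code = 0
      for step in range(n_bits):
          trial = code | (1 << (n_bits - 1 - step))  # propose the next bit
          if vin >= dac(lut[trial]):                 # compare against remapped DAC level
              code = trial                           # keep the bit
      return code

The extra power and latency come from the lut[trial] access sitting inside every step of the feedback loop.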

SLIDE 6

Lookup Table Implementation

  • RAM – Random Access Memory

– But, binary search is not a random access pattern!

  • BAM – Binary Access Memory

– Exploit probabilistic aspects of binary search to reduce the latency and power requirements of the lookup table...

SLIDE 7

BAM memory organization

SLIDE 8

SRAM

[Figure: conventional SRAM organization - a 4x4 array of 8-word blocks addressed by ADDR[6:0], split into row-select, column-select, and in-block word-select fields (ADDR[6:5], ADDR[4:3], ADDR[2:0])]

SLIDE 9

Useful Properties of Binary Search

Property 1

  • A binary search is a one-way journey down the search tree

[Figure: successive-approximation search tree - signal level vs. steps 1-7, descending one node per step]
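
A minimal sketch of this property (names are illustrative), using heap-style node indexing where node k's children are 2k and 2k+1:

  # Each comparator decision moves strictly downward, one level per
  # step: a node, once left behind, is never revisited.
  def search_path(decide, n_levels):
      node = 1                           # root of the search tree
      path = [node]
      for _ in range(n_levels - 1):
          node = 2 * node + decide(node) # decide(node) returns 0 or 1
          path.append(node)
      return path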

SLIDE 12

How can we improve the organization of data words in the memory?

  • SRAM: organizes data according to similarity in address code
  • BAM: organizes data according to similarity in location within the search tree
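
The two grouping rules can be sketched in Python (block geometry and helper names are assumptions for illustration):

  # SRAM grouping: words share a block when their high-order address
  # bits match, i.e. when their address codes are numerically close.
  def sram_block(addr, word_bits=3):
      return addr >> word_bits

  # BAM grouping: words share a block when their nodes are close in
  # the search tree (same level, same small subtree).
  def bam_block(node):
      level = node.bit_length()          # heap index -> tree level
      return (level, node >> 2)          # nodes sharing a grandparent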

SLIDE 13

Useful Properties of Binary Search

Property 2

  • The probability that a node will be visited during a binary search follows a non-uniform distribution.

[Figure: search tree with per-node visit probabilities - 1.00 at the root, 0.5 at level 2, 0.25 at level 3, 0.125 at level 4; most likely at the top of the tree, least likely at the bottom]
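
Property 2 in one line, assuming a uniformly distributed input so that each comparator decision is a fair coin flip:

  # The root (level 1) is always visited; each level below is reached
  # through only one of two equally likely branches.
  def visit_probability(level):
      return 2.0 ** (1 - level)          # 1.0, 0.5, 0.25, 0.125, ...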

SLIDE 14

How can we improve the organization of data words in the memory?

  • Make the nodes with the highest probability of being visited the “easiest” to access.
  • Minimize average energy/bit and average latency (see the sketch below).
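
Read as an optimization, this rule is a rearrangement argument. A sketch under assumed inputs (per-node visit probabilities and per-slot access costs), not the paper's procedure:

  # Pair the most-visited nodes with the cheapest memory slots to
  # minimize the expected access cost (energy or latency).
  def cheapest_placement(node_probs, slot_costs):
      nodes = sorted(range(len(node_probs)), key=lambda i: -node_probs[i])
      slots = sorted(range(len(slot_costs)), key=lambda j: slot_costs[j])
      placement = dict(zip(nodes, slots))
      expected = sum(node_probs[n] * slot_costs[s]
                     for n, s in placement.items())
      return placement, expected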
SLIDE 15

BAM memory organization

[Figure: BAM memory organized into Level 1, Level 2, and Level 3 blocks]

SLIDE 16

Basic Operation – Step 1

[Figure: Step 1 - the first decision bit is resolved in the Level 1 block; the diagram lists the walking-1 memory addresses of all Level 1, 2, and 3 nodes]

SLIDE 17

Basic Operation – Step 2

[Figure: Step 2 - two decision bits resolved, still within the Level 1 block]

SLIDE 18

Basic Operation – Step 3

[Figure: Step 3 - three decision bits resolved; the Level 1 block is complete]

SLIDE 19

Basic Operation – Step 4

[Figure: Step 4 - the search moves down into the Level 2 block]

SLIDE 20

Basic Operation – Final Result

[Figure: final result - all seven decision bits resolved across Levels 1-3, forming the final memory address]

SLIDE 21

Basic Operation – Decode

[Figure: the completed walk, with the address fields contributed by Levels 1, 2, and 3]

  • Simple decoding options
  • Level Select

– Determined by location of the ‘walking 1’

  • Block Select

– Use parent level’s address bits (either specific select lines from the parent level’s decoder or raw address bits will work)

  • Block Decode

– Use own level’s address bits
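
A behavioral sketch of the address format these options imply, assuming the encoding visible in the diagrams (decision bits, then a walking 1, then zero padding):

  # Build a 7-bit address: [decision bits so far][walking 1][zeros].
  def bam_address(decisions, total_bits=7):
      addr = 0
      for b in decisions:                   # decision bits, MSB first
          addr = (addr << 1) | b
      addr = (addr << 1) | 1                # the walking 1 marks the depth
      return addr << (total_bits - len(decisions) - 1)

  # Level Select: the walking 1 is the lowest set bit of the address.
  def level_select(addr, total_bits=7):
      walking_one = addr & -addr            # isolate the lowest set bit
      return total_bits - walking_one.bit_length() + 1

  # bam_address([]) == 0b1000000 (level 1); bam_address([1, 0]) == 0b1010000 (level 3)

Block Select then comes from the bits above the walking 1 (the parent's decisions), and Block Decode from the level's own address bits, matching the bullets above.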

SLIDE 22

Reduced Number of Block Switches

  • In BAM, the number of block switches per conversion always equals the number of levels

[Figure: 3-level BAM organization with its walking-1 memory addresses]

  3-level memory depth and organization   Avg. SRAM block switches   Avg. BAM block switches
  7-bit (3x2x2)                           4.5                        3
  9-bit (3x3x3)                           5.5                        3
  12-bit (4x4x4)                          7.5                        3
  14-bit (4x4x6)                          8.5                        3
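
The BAM column is immediate: one switch per level, by construction. For intuition on the SRAM column, here is a rough Monte-Carlo sketch; the block model is an assumption, so it illustrates the trend rather than reproducing the table's exact numbers:

  import random

  # Assumed SRAM model: a block switch occurs whenever consecutive
  # trial codes differ in their high-order (block-select) bits.
  def average_sram_switches(n_bits=7, word_bits=3, trials=100_000):
      total = 0
      for _ in range(trials):
          code, prev_block, switches = 0, None, 0
          for step in range(n_bits):
              trial = code | (1 << (n_bits - 1 - step))
              block = trial >> word_bits
              if prev_block is not None and block != prev_block:
                  switches += 1
              prev_block = block
              if random.random() < 0.5:    # fair comparator decision
                  code = trial
          total += switches
      return total / trials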

SLIDE 23

Pre-fetching

SLIDE 24

Useful Properties of Binary Search

Property 3

  • Only the two child nodes directly below the current node have a chance of being accessed on the next step.
  • Reduce latency by pre-fetching both possible child nodes during the parent’s step.

[Figure: search tree - from the current node, each child is reached with probability 0.5]

SLIDE 25

Pre-fetch Top Level Changes

  • Reduce effective access latency
  • Store both child words at the parent’s address

[Figure: SAR loop with pre-fetch - an (m-1)-bit address selects a 2n-bit double word from the BAM, and the comparator decision picks the n-bit DAC code]
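
A behavioral sketch of the double-word table (structure and names assumed; heap-style node indexing as before):

  # Each parent address stores BOTH children's remapped codes, so the
  # next step's word is already fetched when the comparator decides.
  def build_prefetch_table(lut, n_bits):
      table = {}
      for parent in range(1, 1 << (n_bits - 1)):    # internal nodes
          table[parent] = (lut[2 * parent], lut[2 * parent + 1])
      return table

  def next_dac_code(table, parent, decision):
      # The decision only selects between two already-fetched words,
      # so the effective access latency shrinks to a mux delay.
      return table[parent][decision]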

SLIDE 26

Sub-Block Re-Structuring for Pre-Fetch

[Figure: Level 1 sub-block without pre-fetch - eight single words on local bit lines (LBL) behind a 3-to-8 decoder driven by ADDR[2:0], with LSB[2:0] block select and enable]

Level 1 Sub-Block (No Prefetch)

SLIDE 27

Sub-Block Re-Structuring for Pre-Fetch

[Figure: Level 1 sub-block with pre-fetch - three double words, one single word, and an empty slot on the local bit lines (LBL) behind a 2-to-4 decoder driven by ADDR[1:0]]

Level 1 Sub-Block (Prefetch)
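
In behavioral form (an assumed model of the two figures), the restructuring drops one address bit and widens the output to a child pair:

  # Without pre-fetch: ADDR[2:0] selects one of eight single words.
  def read_single(words, addr3):
      return words[addr3 & 0b111]

  # With pre-fetch: ADDR[1:0] selects one of four double words; the
  # dropped LSB (the pending comparator decision) picks within the pair.
  def read_double(double_words, addr2):
      return double_words[addr2 & 0b11]    # (left word, right word)

This is why the SAR sends an (m-1)-bit address in the pre-fetch diagram on the previous slide.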

SLIDE 28

Pre-charging

SLIDE 29

Pre-Charging

  • With pre-fetching implemented, there is now a word which is guaranteed to be the first accessed after a sub-block switch.

[Figure: pre-fetch walk through the tree - after each sub-block switch, the first double word requested is known in advance; per-level walking-1 addresses shown]

SLIDE 30

Pre-Charging

  • Worst-case latency occurs when switching to a new sub-block

– In some designs, output glitching can also occur
– Solution: pre-charging

  • When not selected, a sub-block’s ‘off state’ is to pre-charge its local bit lines to the double word which will always be requested first

[Figure: access timing - block switch, inner block decode, acquire data at local block output, buffer data to system output]
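
A behavioral sketch of this ‘off state’ (class and method names are illustrative):

  # While deselected, a sub-block idles with its local bit lines (LBL)
  # already driven to the double word guaranteed to be requested first,
  # so data is valid as soon as the block is selected.
  class SubBlock:
      def __init__(self, double_words, first_addr):
          self.double_words = double_words
          self.first_addr = first_addr
          self.lbl = double_words[first_addr]   # pre-charged off state

      def on_select(self):
          return self.lbl                       # ready with no new access

      def inner_access(self, addr):
          self.lbl = self.double_words[addr]    # later steps decode normally
          return self.lbl

      def on_deselect(self):
          self.lbl = self.double_words[self.first_addr]  # re-arm pre-charge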

SLIDE 33

Asynchronous BAM

SLIDE 34

Useful Properties of Binary Search

Property 4

  • Each node can be visited during only one step number, and that step number is known for every node.

[Figure: search tree, steps 1-7, with one node highlighted]

For the highlighted node (step 2): when STEP = 2, P(is current node) = 2^-(STEP-1); when STEP != 2, P(is current node) = 0.

  • Use this knowledge to generate an asynchronous DONE signal
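
One way to turn this into a DONE signal, assuming the step-parity encoding suggested by the next two slides (hypothetical names):

  # Words used at odd steps store DONE = 1, words used at even steps
  # store DONE = 0 (assumed convention). Since a word can only appear
  # at its own step, a toggle of the output DONE bit marks fresh data.
  def done_bit(step):
      return step & 1

  def word_arrived(prev_done, new_done):
      return prev_done != new_done        # asynchronous 'data ready'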

SLIDE 35

Asynchronous BAM

[Figure: steps 1-7 with the DONE bit value stored at each step, highlighting the steps whose words store a ‘0’ DONE bit]

SLIDE 36

Asynchronous BAM

[Figure: steps 1-7 with the DONE bit value stored at each step, highlighting the steps whose words store a ‘1’ DONE bit]

SLIDE 37

Asynchronous BAM

[Figure: the Level 1 pre-fetch sub-block extended with a DONE-bit column on an extra local bit line; each stored word carries the DONE bit for its step (Steps 1-3)]

SLIDE 38

Asynchronous BAM

  • BAM access latency depends on where you are in the conversion

– Early steps have low latency due to tree structuring.

  • Fast early steps, slower later steps

– Asynchronous BAM
– Compatible with incomplete-settling, metastability-reduction, and other well-known SAR techniques

SLIDE 39

Conclusion

SLIDE 40

Conclusion

  • Binary Access Memory (BAM)

– Improves performance (speed, power) by customizing the lookup table for binary-search-tree memory access patterns
– Key concepts:

  • Blocks organized by location in tree rather than code
  • Most frequently accessed blocks are easiest to access
  • Prefetch the two possible ‘next’ codes in advance
  • Precharge bit-lines for fast access on block switch steps
  • Encode a DONE clock bit for asynchronous operation
SLIDE 41

Binary Access Memory:

An Optimized Lookup Table for Successive Approximation Applications

Benjamin Hershberg*, Skyler Weaver*, Seiji Takeuchi†, Koichi Hamashita†, Un-Ku Moon*
*School of Electrical Engineering and Computer Science, Oregon State University
†Asahi Kasei EMD Corporation, Atsugi, Japan