IEE5008 Autumn 2012 Memory Systems: PIPELINED SRAM, Pranav Arya

SLIDE 1

Pranav Arya 2012

IEE5008 – Autumn 2012
Memory Systems
PIPELINED SRAM

Pranav Arya
EECS Int’l Graduate Program
National Chiao Tung University
pranav_arya7@yahoo.co.in

SLIDE 2

NCTU IEE5008 Memory Systems 2012 Pranav Arya

Outline

Introduction
  • Cache organization
  • Cache implementation
Pipelined SRAM
  • Wave pipeline cache
  • Pipelined-burst cache
Conclusion
Reference

SLIDE 3

Introduction

Processors – fast and faster
  • Parallelism – ILP, thread-level parallelism
  • Multicore architectures
Memory – not so fast
  • More varieties
  • Relatively less speed improvement

SLIDE 4

Introduction (contd.)

Development trends: Processor vs Memory [1]

Figure 1: Comparison of performance improvement of processors and memory over the years [1]

SLIDE 5

Introduction (contd.)

Cache
  • Nearest to the processing unit; fastest
  • Relies on the principle of locality of reference
Cache parameters
  • Organization
  • Content management
  • Consistency management

SLIDE 6

Introduction: Cache Organization

Three ways to organize a cache
  • Direct mapped
  • Fully associative
  • Set associative

Figure 2: Various cache organizations [1]
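
To make the three organizations concrete, the following minimal sketch splits a byte address into tag, set index and block offset. The function name `split_address`, the 1 KiB cache size and the 16-byte block size are illustrative assumptions, not values from the slides. Direct mapped is treated as associativity 1, and fully associative as a single set that holds every block.

```python
def split_address(addr, cache_bytes, block_bytes, ways):
    """Return (tag, set_index, offset) for a byte address.

    ways == 1                          -> direct mapped
    ways == cache_bytes // block_bytes -> fully associative (one set)
    anything in between                -> set associative
    """
    num_blocks = cache_bytes // block_bytes
    num_sets = num_blocks // ways
    offset = addr % block_bytes          # byte position inside the block
    block_number = addr // block_bytes   # which memory block is addressed
    set_index = block_number % num_sets  # which set the block may live in
    tag = block_number // num_sets       # identifies the block within its set
    return tag, set_index, offset


# Hypothetical 1 KiB cache with 16-byte blocks (64 blocks in total).
ADDR = 0x1234
direct = split_address(ADDR, 1024, 16, ways=1)
two_way = split_address(ADDR, 1024, 16, ways=2)
fully = split_address(ADDR, 1024, 16, ways=64)
print(direct, two_way, fully)
```

As associativity grows, index bits move into the tag: the set index shrinks to nothing in the fully associative case, where every stored tag has to be compared on each access.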

SLIDE 7

Cache Organization (contd.)

Multilevel cache hierarchy (L1, L2 and L3)
  • L1: on chip
  • L2: off chip
  • L3: off chip; shared

Figure 3: Cache hierarchy
Figure 4: 8-core Nehalem processor and its cache hierarchy [7]

SLIDE 8

Cache Implementation

Organization Physical implementation Control and timing

SLIDE 9

Physical Implementation

Figure 5: Different implementations of the SRAM cell [1]

SLIDE 10

Control and Timing

SRAM control signals and timing signals
Timing operations
Two types based on timing
  • Asynchronous SRAM
  • Synchronous SRAM

SLIDE 11

Asynchronous Operation

Address transition detection (ATD) based operation

Figure 6: 2-way set-associative asynchronous SRAM and its timing diagram [1]
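
In ATD-based operation there is no clock: a change on any address line is detected and used to start the internal access. The sketch below is a behavioural illustration only. It compares successive address samples and flags a pulse when any bit toggles. The function name `atd_pulses` and the sample values are assumptions, and no circuit timing is modelled.

```python
def atd_pulses(address_samples):
    """Return one flag per sample: True where the address differs from the previous sample."""
    pulses = []
    previous = None
    for addr in address_samples:
        # XOR is non-zero when at least one address bit has toggled.
        changed = previous is not None and (addr ^ previous) != 0
        pulses.append(changed)
        previous = addr
    return pulses


# Hypothetical address bus sampled over six time steps.
samples = [0x0A, 0x0A, 0x0B, 0x0B, 0x0B, 0x1B]
pulses = atd_pulses(samples)
print(pulses)
```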

SLIDE 12

Synchronous SRAM

Clock-based operation
  • Completely synchronized (single clock)
  • Partial synchronization (two clocks)
Pipelined operation
  • Wave pipeline mode
  • Pipelined-burst mode

SLIDE 13

Synchronous SRAM: Wave Pipeline Cache

Early implementation model
High capacity, high speed
Operation based on clock signal
  • Internal clock – for internal circuitry
  • External clock – for addressing
  • Combined – external clock for addressing, internal clock for the SRAM core

SLIDE 14

Wave Pipeline Cache: Example 1

Fully pipelined 512-kb SRAM, 2-ns cycle time

Figure 7: Block diagram of the 512-kb CMOS pipelined SRAM [2]

SLIDE 15

Wave Pipeline: Example 1 (contd.)

8-stage pipeline synchronized to a clock signal

Figure 8: Pipelined operation of the 512-kb SRAM [2]
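
The gain from pipelining is that a new access can start every cycle even though a single access takes longer than one cycle. The sketch below uses the 2-ns cycle time from the slide and the 3.8-ns access time from the title of [2]. It assumes a simplified model in which the first result appears after one access time and each later result follows one cycle later. The function names and the count of 100 accesses are illustrative.

```python
def unpipelined_time_ns(n_accesses, access_ns):
    """Total time if each access must finish before the next one starts."""
    return n_accesses * access_ns


def pipelined_time_ns(n_accesses, access_ns, cycle_ns):
    """Total time if a new access is launched every cycle (simplified model)."""
    if n_accesses == 0:
        return 0.0
    # First result after one full access time, then one result per cycle.
    return access_ns + (n_accesses - 1) * cycle_ns


ACCESS_NS = 3.8  # access time, taken from the title of [2]
CYCLE_NS = 2.0   # cycle time, from the slide
N = 100          # hypothetical number of back-to-back reads

slow = unpipelined_time_ns(N, ACCESS_NS)
fast = pipelined_time_ns(N, ACCESS_NS, CYCLE_NS)
print(f"unpipelined: {slow:.1f} ns, pipelined: {fast:.1f} ns")
```

For long streams the ratio of the two totals approaches access time divided by cycle time, so the benefit depends on how far the cycle time can be pushed below the access time.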

SLIDE 16

Wave Pipeline: Example 2

Need for the SRAM to connect directly to a high-frequency CPU bus line
Two-stage wave pipeline
  • First stage – clock-triggered asynchronous SRAM core operation
  • Second stage – clock-triggered synchronous data output

SLIDE 17

Wave Pipeline: Example 2 (contd.)

Figure 9: Block diagram of a 16-Mb synchronous SRAM and its wave pipeline operation [3]

SLIDE 18

Wave Pipeline: Improvements

Issue with early designs
  • Synchronizing the output data with the system clock at high frequency – successive data waves overlap
  • Reason – access time is sensitive to variations in voltage, temperature and process
Solution to the synchronization issue – dual-sensing latches
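
The overlap problem can be stated as a timing constraint. The data for one request stays valid until at most the longest path delay, while the data for the next request, launched one cycle later, may arrive as early as one cycle plus the shortest path delay. The sketch below checks that simplified condition. The function names, the lumped `margin_ns` term and all delay values are hypothetical and do not come from [4].

```python
import math


def min_cycle_ns(d_max_ns, d_min_ns, margin_ns):
    """Smallest cycle time at which successive data waves stay separated.

    Requires cycle + d_min >= d_max + margin, i.e. the next wave must not
    arrive before the current one has been safely captured.
    """
    return (d_max_ns - d_min_ns) + margin_ns


def waves_overlap(d_max_ns, d_min_ns, margin_ns, cycle_ns):
    """True when the chosen cycle time is too short for the delay spread."""
    return cycle_ns < min_cycle_ns(d_max_ns, d_min_ns, margin_ns)


def waves_in_flight(d_max_ns, cycle_ns):
    """Number of requests travelling through the unlatched path at once."""
    return math.ceil(d_max_ns / cycle_ns)


# Hypothetical delays: nominal conditions versus a wider spread caused by
# voltage, temperature and process variation.
nominal = min_cycle_ns(d_max_ns=6.0, d_min_ns=4.5, margin_ns=0.5)
varied = min_cycle_ns(d_max_ns=7.0, d_min_ns=4.0, margin_ns=0.5)
print(nominal, varied, waves_in_flight(6.0, 2.0))
print(waves_overlap(7.0, 4.0, 0.5, cycle_ns=2.5))
```

In this model the cycle time is limited by the spread between the slowest and fastest paths rather than by the total delay, which is why variation in access time translates directly into a lower usable clock frequency.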

SLIDE 19

Wave Pipeline: Example 3 – Sensing Latches

Figure 10: Dual-sensing scheme [4]
Figure 11: Dual-sensing latch circuit diagram [4]

SLIDE 20

Example 3 – Sensing Latches (contd.)

Use of two clocking signals
  • Clock – for addressing and driving the internal circuit
  • Clock’ – to mux and latch out the data

Figure 12: Data wave diagram after latching [4]
Figure 13: Dependence of cycle time and maximum access time [4]

SLIDE 21

Pipelined-Burst SRAM

Used in most modern SRAM architectures
Burst-mode read and write operations
X-1-1-1 operations
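
In the X-1-1-1 notation, the first word of a four-word burst takes X clock cycles and each of the remaining three words takes one cycle. The sketch below counts the cycles for a burst and lists the word addresses it touches. The linear wrap-around order is an assumption (interleaved orders also exist), and the function names are illustrative.

```python
def burst_cycles(first_word_cycles, burst_length=4):
    """Cycles for one X-1-1-1 style burst: X for the first word, 1 for each later word."""
    return first_word_cycles + (burst_length - 1)


def linear_burst_addresses(start_word, burst_length=4):
    """Word addresses of a linear burst that wraps inside its aligned block (assumed order)."""
    base = start_word - (start_word % burst_length)
    return [base + (start_word + i) % burst_length for i in range(burst_length)]


# Example with X = 4, i.e. a 4-1-1-1 burst.
with_burst = burst_cycles(4)
# For comparison only: four separate accesses at 4 cycles each.
without_burst = 4 * 4
order = linear_burst_addresses(6)
print(with_burst, without_burst, order)
```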

SLIDE 22

Pipelined-Burst SRAM: Example 1

Features:
  • 4-1-1-1 pipelined-burst scheme
  • Burst read of four 32-bit words
  • Data prefetched for write operation

Figure 14: Synchronous pipelined-burst SRAM block diagram [5]

SLIDE 23

Example 1 (contd.)

Read and write in bursts
Idle cycles in read-after-write (RAW) and write-after-read (WAR) conditions

Figure 15: Timing diagram for the pipelined-burst SRAM shown in Figure 14 [5]

SLIDE 24

Example 2: Some Improvements in Design

Added double-late address-data buffers (DLWBs)

Figure 16: Synchronous pipelined-burst SRAM block diagram using DLWBs [5]

SLIDE 25

Example 2 (contd.)

Figure 17: Synchronous pipelined-burst SRAM block diagram using DLWBs [5]

SLIDE 26

Example 2 (contd.)

Figure 18: Timing diagram for pipelined-burst SRAM using DLWBs [5]
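
The general idea behind buffering write addresses and data can be illustrated with a small behavioural model. A pending write is held in a buffer and committed to the array later, and a read to the same address is answered from the buffer, so the array does not have to switch between reading and writing on every change of direction. This is a generic late-write sketch, not the circuit described in [5]. The class name `LateWriteSram`, the two-entry depth and the forwarding policy are all assumptions.

```python
from collections import deque


class LateWriteSram:
    """Toy SRAM model with a small buffer of pending (address, data) writes."""

    def __init__(self, depth=2):
        self.array = {}         # committed contents of the memory array
        self.pending = deque()  # buffered writes, oldest first
        self.depth = depth      # assumed buffer depth (hypothetical)

    def write(self, addr, data):
        # When the buffer is full, retire the oldest write into the array.
        if len(self.pending) == self.depth:
            old_addr, old_data = self.pending.popleft()
            self.array[old_addr] = old_data
        self.pending.append((addr, data))

    def read(self, addr):
        # Forward from the buffer first; the newest matching write wins.
        for p_addr, p_data in reversed(self.pending):
            if p_addr == addr:
                return p_data
        return self.array.get(addr)


sram = LateWriteSram()
sram.write(0x10, 0xAAAA)
sram.write(0x14, 0xBBBB)
forwarded = sram.read(0x10)  # served from the buffer; the array is still empty
sram.write(0x18, 0xCCCC)     # retires the write to 0x10 into the array
print(hex(forwarded), sram.array)
```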

SLIDE 27

Conclusion

SRAM performance improvements are achievable through pipelining
Various pipelining schemes are available
Wave pipelining shows variable performance due to clock synchronization issues
Pipelined-burst SRAM is better since data reads and writes occur in bursts – faster data operations on SRAM blocks

SLIDE 28

Reference

1. B. Jacob, S. W. Ng, D. T. Wang. Memory Systems: Cache, DRAM, Disk.
2. Terry I. Chappell, Barbara A. Chappell, Stanley E. Schuster, James W. Allan, Stephen P. Klepner, Rajiv V. Joshi and Robert L. Franch. A 2-ns Cycle, 3.8-ns Access 512-kb CMOS ECL SRAM with a Fully Pipelined Architecture, IEEE Journal of Solid-State Circuits, Vol. 26, No. 11, November 1991
3. Kazuyuki Nakamura, Shigeru Kuhara, Tohru Kimura, Masahide Takada, Hisamitsu Suzuki, Hiroshi Yoshida, and Tohru Yamazaki. A 220-MHz Pipelined 16-Mb BiCMOS SRAM with PLL Proportional Self-Timing Generator, IEEE Journal of Solid-State Circuits, Vol. 29, No. 11, November 1994
4. Suguru Tachibana, Hisayuki Higuchi, Koichi Takasugi, Katsuro Sasaki, Toshiaki Yamanaka, and Yoshinobu Nakagome. A 2.6ns Wave-Pipelined CMOS SRAM with Dual-Sensing-Latch Circuits, IEEE Journal of Solid-State Circuits, Vol. 30, No. 4, April 1995
5. Kazuyuki Nakamura, Koichi Takeda, Hideo Toyoshima, Kenji Noda, Hiroaki Ohkubo, Tetsuya Uchida, Toshiyuki Shimizu, Toshiro Itani, Ken Tokashiki, Koji Kishimoto. A 500MHz 4Mb CMOS Pipeline-Burst Cache SRAM with Point-to-Point Noise Reduction Coding I/O, IEEE Journal of Solid-State Circuits, Vol. 32, No. 11, November 1997
6. Cangsang Zhao, Uddalak Bhattacharya, Martin Denham, Jim Kolousek, Yi Lu, Yong-Gee Ng, Novat Nintunze, Kamal Sarkez, and Hemmige D. Varadarajan. An 18-Mb, 12.3-GB/s CMOS Pipeline-Burst Cache SRAM with 1.54 Gb/s/pin, IEEE Journal of Solid-State Circuits, Vol. 34, No. 11, November 1999
7. D. Molka, D. Hackenberg, R. Schone, and M.S. Muller. Memory Performance and Cache Coherency Effects on an Intel Nehalem Multiprocessor System, 18th International Conference on Parallel Architectures and Compilation Techniques, September 2009

SLIDE 29

THANK YOU