IEE5008 Autumn 2012 Memory Systems
PIPELINED SRAM
Pranav Arya
EECS Intl Graduate Program, National Chiao Tung University
pranav_arya7@yahoo.co.in
NCTU IEE5008 Memory Systems 2012 Pranav Arya
Outline
Introduction
- Cache organization
- Cache implementation
Pipelined SRAM
- Wave pipeline cache
- Pipelined-burst cache
Conclusion
Reference
Introduction
Processors: fast and getting faster
- Parallelism: ILP, thread-level parallelism
- Multicore architectures
Memory: not so fast
- More varieties
- Relatively less speed improvement
Introduction (contd.)
Development trends: Processor vs Memory [1]
Figure 1: Comparison of performance improvement of processors and memory over the years [1]
Introduction (contd.)
Nearest to processing unit; fastest
Principle of locality of reference
Cache parameters
- Organization
- Content management
- Consistency management
Introduction: Cache Organization
Three ways to organize cache
- Direct mapped
- Fully associative
- Set associative
Figure 2: Various cache organizations [1]
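The three organizations differ mainly in how many sets the cache is divided into. The sketch below is a minimal illustration, not taken from [1]: it splits a byte address into tag, set index and block offset. The function name `split_address` and the cache dimensions (8 blocks of 16 bytes) are hypothetical.

```python
def split_address(addr, block_bytes, num_blocks, ways):
    """Split a byte address into (tag, set_index, block_offset).

    ways == 1          -> direct mapped (one block per set)
    ways == num_blocks -> fully associative (a single set)
    anything between   -> set associative
    """
    num_sets = num_blocks // ways
    offset = addr % block_bytes          # byte position inside the block
    block_number = addr // block_bytes   # which memory block the byte is in
    set_index = block_number % num_sets  # set the block must be placed in
    tag = block_number // num_sets       # remaining bits identify the block
    return tag, set_index, offset


# Hypothetical cache: 8 blocks of 16 bytes each.
ADDR = 0x1234
direct = split_address(ADDR, block_bytes=16, num_blocks=8, ways=1)
two_way = split_address(ADDR, block_bytes=16, num_blocks=8, ways=2)
fully = split_address(ADDR, block_bytes=16, num_blocks=8, ways=8)
```

With fewer sets the index shrinks and the tag grows; in the fully associative case the index is always 0 and the whole block number acts as the tag.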
Cache Organization (contd.)
Multilevel cache hierarchy (L1, L2 and L3)
- L1: on chip
- L2: off chip
- L3: off chip; shared
Figure 3: Cache hierarchy
Figure 4: 8-core Nehalem processor and its cache hierarchy [7]
Cache Implementation
Organization
Physical implementation
Control and timing
Physical Implementation
Figure 5: Different implementations of SRAM cell [1]
Control and Timing
SRAM control signals and timing signals
Timing operations
Two types based on timing
- Asynchronous SRAM
- Synchronous SRAM
Asynchronous Operation
ATD (address transition detection) based operation
Figure 6: 2-way set associative asynchronous SRAM and its timing diagram [1]
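Assuming ATD here stands for address transition detection, the idea is that an asynchronous SRAM has no clock and instead starts an access whenever any address bit changes. The sketch below is a behavioural illustration only, not the circuit from [1]: it flags a transition by comparing each address sample with the previous one. The function name `atd_pulses` and the sample values are hypothetical.

```python
def atd_pulses(address_samples):
    """Return one flag per sample: True where the address differs from the previous sample."""
    pulses = []
    previous = None
    for addr in address_samples:
        # XOR is non-zero when at least one address bit has toggled.
        changed = previous is not None and (addr ^ previous) != 0
        pulses.append(changed)
        previous = addr
    return pulses


# Hypothetical address bus samples taken at regular intervals.
samples = [0x10, 0x10, 0x14, 0x14, 0x14, 0x03]
pulses = atd_pulses(samples)
```

Each True entry marks a point where the SRAM would begin a new access cycle in this simplified model.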
Synchronous SRAM
Clock-based operation
- Completely synchronized (single clock)
- Partial synchronization (two clocks)
Pipelined operation
- Wave pipeline mode
- Pipelined-burst mode
Synchronous SRAM: Wave Pipeline Cache
Early implementation model
High capacity, high speed
Operation based on clock signal
- Internal clock: for internal circuitry
- External clock: for addressing
- Combined: external clock for addressing, internal clock for the SRAM core
Wave Pipeline Cache: Example 1
Fully pipelined 512-kb SRAM, 2-ns cycle time
Figure 7: Block diagram of 512-kb CMOS pipelined SRAM [2]
Wave Pipeline: Example 1 (contd.)
8-stage pipeline synchronized to a clock signal
Figure 8: Pipelined operation for the 512-kb SRAM [2]
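To see why a deep pipeline helps throughput, the sketch below counts clock cycles for a run of back-to-back accesses. It is an idealised model, not the timing reported in [2]: it assumes one stage per clock cycle, a new access issued every cycle, and no stalls. The function names and the access count of 100 are hypothetical.

```python
def pipelined_cycles(num_accesses, stages):
    """Cycles to finish back-to-back accesses in a full pipeline (one stage per cycle)."""
    if num_accesses == 0:
        return 0
    # The first access needs `stages` cycles; each later one completes a cycle after the last.
    return stages + (num_accesses - 1)


def unpipelined_cycles(num_accesses, stages):
    """Cycles when each access must finish all stages before the next one starts."""
    return num_accesses * stages


STAGES = 8  # pipeline depth stated on the slide
pipe = pipelined_cycles(100, STAGES)
flat = unpipelined_cycles(100, STAGES)
```

Under these assumptions a single access is no faster, but a long run of accesses approaches one result per clock cycle.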
Wave Pipeline: Example 2
Need for SRAM to connect directly to the high-frequency CPU bus line
Two-stage wave pipeline
- First stage: clock-triggered asynchronous SRAM core operation
- Second stage: clock-triggered synchronous data output
Wave Pipeline: Example 2 (contd.)
Figure 9: Block diagram of a 16-Mb synchronous SRAM and its wave pipeline operation [3]
Wave Pipeline: Improvements
Issue with early designs
- Synchronization of system clock with output data at high frequency: overlap of data waves
- Reason: sensitivity of access time to variations in voltage, temperature and process
Solution to synchronization issue: dual sensing latches
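The overlap problem can be stated as a simple timing inequality. In a wave pipeline, one data wave stays valid at the output from its latest possible arrival until the earliest possible arrival of the next wave, so the cycle time must cover the spread in access time plus the window needed to sample the data. The sketch below is a simplified model that ignores clock skew; it is not the analysis from [4]. All names and the picosecond figures are hypothetical.

```python
def min_cycle_time_ps(t_access_max, t_access_min, t_window):
    """Smallest cycle time (ps) at which consecutive data waves stay separate."""
    # Valid window per wave = cycle + t_access_min - t_access_max, which must be >= t_window.
    return (t_access_max - t_access_min) + t_window


def waves_overlap(cycle_ps, t_access_max, t_access_min, t_window):
    """True when the cycle is too short to sample one wave before the next arrives."""
    return cycle_ps < min_cycle_time_ps(t_access_max, t_access_min, t_window)


# Hypothetical access-time spread caused by voltage, temperature and process variation.
T_MAX, T_MIN, WINDOW = 3000, 2000, 500
limit = min_cycle_time_ps(T_MAX, T_MIN, WINDOW)
too_fast = waves_overlap(1400, T_MAX, T_MIN, WINDOW)
safe = waves_overlap(2000, T_MAX, T_MIN, WINDOW)
```

The wider the access-time spread, the longer the minimum cycle, which is why reducing sensitivity to that spread allows a faster clock.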
Wave Pipeline: Example 3-Sensing Latches
Figure 10: Dual-sensing scheme [4]
Figure 11: Dual-sensing latch circuit diagram [4]
Example 3-Sensing Latches (contd.)
Use of two clocking signals
- Clock for addressing and driving internal circuit
- Clock’ to mux and latch out the data
Figure 12: Data wave diagram after latching [4]
Figure 13: Dependence of cycle time and maximum access time [4]
Pipelined-Burst SRAM
Used in most modern SRAM architectures
Burst mode read and write operations
X-1-1-1 operations: X cycles for the first word, then one cycle for each of the next three words
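The benefit of an X-1-1-1 burst can be worked out by counting cycles. The sketch below is illustrative arithmetic only: it compares a four-word burst against four separate accesses that each pay the full first-access latency. The function names and the chosen values of X are hypothetical.

```python
def burst_cycles(first_access_cycles, burst_length=4):
    """Cycles for one X-1-1-1 style burst: X for the first word, 1 for each later word."""
    return first_access_cycles + (burst_length - 1)


def non_burst_cycles(first_access_cycles, burst_length=4):
    """Cycles when every word is fetched as a separate access costing X cycles."""
    return first_access_cycles * burst_length


# Hypothetical first-access latencies of 4 and 3 cycles.
burst_4 = burst_cycles(4)
single_4 = non_burst_cycles(4)
burst_3 = burst_cycles(3)
single_3 = non_burst_cycles(3)
```

The first-access latency is paid once per burst rather than once per word, so the saving grows with both X and the burst length.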
Pipelined-Burst SRAM: Example 1
Features:
- 4-1-1-1 pipelined-burst scheme
- Burst read of four 32-bit words; data prefetched for write operation
Figure 14: Synchronous pipelined-burst SRAM block diagram [5]
Example 1 (contd.)
Read and write in bursts
Idle cycles in RAW and WAR conditions
Figure 15: Timing diagram for the pipelined-burst SRAM shown in Figure 14 [5]
Example 2: Some Improvements in design
Added double-late address-data buffers (DLWBs)
Figure 16: Synchronous pipelined-burst SRAM block diagram using DLWBs [5]
Example 2 (contd.)
Figure 17: Synchronous pipelined-burst SRAM block diagram using DLWBs [5]
Example 2 (contd.)
Figure 18: Timing diagram for pipelined-burst SRAM using DLWBs [5]
Conclusion
SRAM performance improvements are achievable through pipelining
Various schemes are available for pipelining
Wave pipeline shows variable performance due to clock synchronization issues
Pipelined-burst SRAM is better because data reads and writes occur in bursts, giving faster data operations on SRAM blocks
Reference
1. B. Jacob, S. W. Ng, D. T. Wang. Memory Systems: Cache, DRAM, Disk.
2. Terry I. Chappell, Barbara A. Chappell, Stanley E. Schuster, James W. Allan, Stephen P. Klepner, Rajiv V. Joshi and Robert L. Franch. A 2-ns Cycle, 3.8-ns Access 512-kb CMOS ECL SRAM with a Fully Pipelined Architecture, IEEE Journal of Solid-State Circuits, Vol. 26, No. 11, November 1991
3. Kazuyuki Nakamura, Shigeru Kuhara, Tohru Kimura, Masahide Takada, Hisamitsu Suzuki, Hiroshi Yoshida, and Tohru Yamazaki. A 220-MHz Pipelined 16-Mb BiCMOS SRAM with PLL Proportional Self-Timing Generator, IEEE Journal of Solid-State Circuits, Vol. 29, No. 11, November 1994
4. Suguru Tachibana, Hisayuki Higuchi, Koichi Takasugi, Katsuro Sasaki, Toshiaki Yamanaka, and Yoshinobu Nakagome. A 2.6-ns Wave-Pipelined CMOS SRAM with Dual-Sensing-Latch Circuits, IEEE Journal of Solid-State Circuits, Vol. 30, No. 4, April 1995
5. Kazuyuki Nakamura, Koichi Takeda, Hideo Toyoshima, Kenji Noda, Hiroaki Ohkubo, Tetsuya Uchida, Toshiyuki Shimizu, Toshiro Itani, Ken Tokashiki, Koji Kishimoto. A 500-MHz 4-Mb CMOS Pipeline-Burst Cache SRAM with Point-to-Point Noise Reduction Coding I/O, IEEE Journal of Solid-State Circuits, Vol. 32, No. 11, November 1997
6. Cangsang Zhao, Uddalak Bhattacharya, Martin Denham, Jim Kolousek, Yi Lu, Yong-Gee Ng, Novat Nintunze, Kamal Sarkez, and Hemmige D. Varadarajan. An 18-Mb, 12.3-GB/s CMOS Pipeline-Burst Cache SRAM with 1.54 Gb/s/pin, IEEE Journal of Solid-State Circuits, Vol. 34, No. 11, November 1999
7. D. Molka, D. Hackenberg, R. Schone, and M.S. Muller. Memory Performance and Cache Coherency Effects on an Intel Nehalem Multiprocessor System, 18th International Conference on Parallel Architectures and Compilation Techniques, September 2009