IEE5008 Autumn 2012 Memory Systems PIPELINED SRAM Pranav Arya

IEE5008 – Autumn 2012 Memory Systems
PIPELINED SRAM
Pranav Arya
EECS Int’l Graduate Program
National Chiao Tung University
pranav_arya7@yahoo.co.in
Pranav Arya 2012

Outline
• Introduction
• Cache organization
• Cache implementation
• Pipelined SRAM
  • Wave pipeline cache
  • Pipelined-burst cache
• Conclusion
• References
2 Pranav Arya NCTU IEE5008 Memory Systems 2012

Introduction
• Processors: fast and getting faster
  • Parallelism: instruction-level parallelism (ILP), thread-level parallelism
  • Multicore architectures
• Memory: not as fast
  • More varieties
  • Comparatively smaller speed improvements

Introduction (contd.)
• Development trends: processor vs. memory [1]
Figure 1: Comparison of the performance improvement of processors and memory over the years [1]

Introduction (contd.)
• Cache: the memory level nearest to the processing unit, and the fastest
• Relies on the principle of locality of reference
• Cache parameters:
  • Organization
  • Content management
  • Consistency management

Introduction: Cache Organization
• Three ways to organize a cache:
  • Direct mapped
  • Fully associative
  • Set associative
Figure 2: Various cache organizations [1]
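To make the three organizations concrete, here is a minimal Python sketch of a cache with LRU replacement. It is an illustration and is not taken from [1]. A single class covers all three cases:
- One way per set gives a direct-mapped cache.
- A single set gives a fully associative cache.
- Anything in between is set associative.

The class name, the 16-byte block size and the access pattern are illustrative assumptions.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Minimal LRU cache model (illustrative sketch, hypothetical class).

    ways=1 gives a direct-mapped cache; num_sets=1 gives a fully associative cache.
    """

    def __init__(self, num_sets: int, ways: int, block_bytes: int) -> None:
        self.num_sets = num_sets
        self.ways = ways
        self.block_bytes = block_bytes
        # One ordered dict of tags per set; insertion order tracks recency (oldest first).
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def split(self, addr: int) -> tuple[int, int]:
        """Split a byte address into (tag, set index); the block offset is discarded."""
        block = addr // self.block_bytes
        return block // self.num_sets, block % self.num_sets

    def access(self, addr: int) -> bool:
        """Look up one address; return True on a hit and update the LRU state either way."""
        tag, index = self.split(addr)
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # evict the least recently used tag
        lines[tag] = None
        return False


if __name__ == "__main__":
    # Same 4-block capacity (16-byte blocks), three organizations.
    configs = {"direct mapped": (4, 1), "2-way": (2, 2), "fully associative": (1, 4)}
    pattern = [0, 64] * 4   # two blocks that collide in the direct-mapped cache
    for name, (sets, ways) in configs.items():
        cache = SetAssociativeCache(sets, ways, block_bytes=16)
        for addr in pattern:
            cache.access(addr)
        print(f"{name}: {cache.hits} hits, {cache.misses} misses")
```

All three caches hold four blocks. The two alternating addresses map to the same set, so they evict each other on every access in the direct-mapped cache (0 hits, 8 misses). The 2-way and fully associative caches keep both blocks resident after the first two misses (6 hits, 2 misses).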

Cache Organization (contd.)
• Multilevel cache hierarchy (L1, L2 and L3):
  • L1: on chip
  • L2: off chip
  • L3: off chip; shared
Figure 3: Cache hierarchy
Figure 4: 8-core Nehalem processor and its cache hierarchy [7]
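One common way to reason about a multilevel hierarchy is the average memory access time:

AMAT = t_L1 + m_L1 × (t_L2 + m_L2 × (t_L3 + m_L3 × t_mem))

Here t is the hit time and m is the local miss rate of each level. The sketch below evaluates that recurrence. The latencies and miss rates in the example are hypothetical values chosen for illustration; they are not measurements from [7].

```python
def amat(levels, memory_latency):
    """Average memory access time for a multilevel cache.

    levels: list of (hit_time, local_miss_rate) tuples, ordered from L1 outward.
    memory_latency: access time of main memory, in the same units as hit_time.
    """
    time = memory_latency
    # Work inward from main memory: each level adds its hit time plus
    # its local miss rate times the cost of going one level further out.
    for hit_time, miss_rate in reversed(levels):
        time = hit_time + miss_rate * time
    return time


# Hypothetical numbers (cycles): L1, L2, L3 hit times and local miss rates.
hierarchy = [(1, 0.10), (10, 0.50), (40, 0.50)]
print(amat(hierarchy, memory_latency=200))  # prints 9.0
```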

Cache Implementation
• Organization
• Physical implementation
• Control and timing

Physical Implementation
Figure 5: Different implementations of the SRAM cell [1]

Control and Timing
• SRAM control and timing signals
• Timing operations
• Two types of SRAM, based on timing:
  • Asynchronous SRAM
  • Synchronous SRAM

Asynchronous Operation
• Operation based on address transition detection (ATD)
Figure 6: 2-way set-associative asynchronous SRAM and its timing diagram [1]
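A common way to build ATD is to XOR each address bit with a delayed copy of itself and then OR the results. Any address change therefore produces a pulse that can start the internal read. The Python sketch below models that behaviour on sampled address values.

The following are simplifying assumptions, not details of the circuit in [1]:
- Time is modelled as discrete samples.
- The `delay` parameter (the pulse width in samples) is hypothetical.
- The address is assumed to be stable before the first sample.

```python
def atd_pulses(addresses, delay=2):
    """Behavioral model of address transition detection (ATD).

    addresses: address values sampled at equal time steps.
    delay: length of the hypothetical delay line, in samples (sets the pulse width).
    Returns a list of booleans: True while the ATD pulse is active.
    """
    pulses = []
    for t, addr in enumerate(addresses):
        # Before the delay line has filled, assume the address was stable at its first value.
        delayed = addresses[t - delay] if t >= delay else addresses[0]
        # XOR flags every bit that differs from its delayed copy; a nonzero result
        # is the OR of the per-bit pulses, i.e. one chip-wide ATD pulse.
        pulses.append((addr ^ delayed) != 0)
    return pulses


samples = [0x10, 0x10, 0x10, 0x14, 0x14, 0x14, 0x14, 0x20, 0x20, 0x20]
print(atd_pulses(samples))
# [False, False, False, True, True, False, False, True, True, False]
```

Each address change yields a pulse lasting `delay` samples. This is why the pulse width is set by the delay element in this model.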

Synchronous SRAM
• Clock-based operation:
  • Completely synchronized (single clock)
  • Partially synchronized (two clocks)
• Pipelined operation:
  • Wave-pipeline mode
  • Pipelined-burst mode

Synchronous SRAM: Wave Pipeline Cache
• Early implementation model
• High capacity, high speed
• Operation based on a clock signal:
  • Internal clock: drives the internal circuitry
  • External clock: used for addressing
  • Combined: external clock for addressing, internal clock for the SRAM core

Wave Pipeline Cache: Example 1
• Fully pipelined 512-kb SRAM with a 2-ns cycle time
Figure 7: Block diagram of the 512-kb CMOS pipelined SRAM [2]

Wave Pipeline: Example 1 (contd.)
• Eight-stage pipeline synchronized to a clock signal
Figure 8: Pipelined operation of the 512-kb SRAM [2]
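The sketch below prints an idealized occupancy table for a k-stage pipeline in which each stage takes one clock cycle:
- Access i enters stage 0 in cycle i.
- n accesses therefore finish in k + n - 1 cycles.
- Once the pipeline is full, one access completes every cycle.

This is a generic model with hypothetical function names, and it does not attempt to reproduce the internal stage timing of [2]. At the 2-ns cycle time quoted above, completing one access per cycle corresponds to a peak rate of 500 million accesses per second.

```python
def pipeline_occupancy(n_accesses, n_stages):
    """Idealized occupancy table for a pipeline with one clock cycle per stage.

    Returns one row per clock cycle; row[s] is the index of the access
    occupying stage s in that cycle, or None if the stage is empty.
    """
    total_cycles = n_stages + n_accesses - 1
    table = []
    for cycle in range(total_cycles):
        # Access i enters stage 0 in cycle i and advances one stage per cycle.
        row = [cycle - s if 0 <= cycle - s < n_accesses else None for s in range(n_stages)]
        table.append(row)
    return table


def peak_rate_mhz(cycle_ns):
    """Peak access rate: one access completes per cycle once the pipeline is full."""
    return 1000.0 / cycle_ns


for cycle, row in enumerate(pipeline_occupancy(n_accesses=3, n_stages=8)):
    print(cycle, ["." if a is None else a for a in row])
print(peak_rate_mhz(2.0), "MHz")  # 2-ns cycle -> 500.0 MHz
```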

Wave Pipeline: Example 2
• Motivation: the SRAM must connect directly to a high-frequency CPU bus
• Two-stage wave pipeline:
  • First stage: clock-triggered asynchronous SRAM core operation
  • Second stage: clock-triggered synchronous data output

Wave Pipeline: Example 2 (contd.)
Figure 9: Block diagram of a 16-Mb synchronous SRAM and its wave-pipeline operation [3]

Wave Pipeline: Improvements
• Issue with early designs: at high frequency, the output data become hard to synchronize with the system clock because successive data waves overlap
• Cause: the access time is sensitive to variations in voltage, temperature and process
• Solution to the synchronization issue: dual-sensing latches
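The overlap problem can be stated with the usual first-order wave-pipelining constraint. Consecutive waves stay apart only if the clock period T covers the spread between the slowest and fastest path delays plus the latching overhead, roughly:

T ≥ (D_max - D_min) + t_setup + t_hold + 2·t_skew

The Python sketch below evaluates that bound and the number of waves in flight. All delay values are hypothetical. The formula is a simplified textbook model rather than the analysis in the cited papers, and the slow-corner case assumes only the slowest path stretches.

```python
import math


def min_wave_cycle_ns(d_max_ns, d_min_ns, setup_ns=0.0, hold_ns=0.0, skew_ns=0.0):
    """First-order lower bound on the clock period of a wave-pipelined path.

    Successive data waves stay separated only if the period covers the spread
    between the slowest and fastest path delays plus the latching overhead.
    """
    return (d_max_ns - d_min_ns) + setup_ns + hold_ns + 2.0 * skew_ns


def waves_in_flight(d_max_ns, cycle_ns):
    """Number of data waves simultaneously inside the path for a given period."""
    return math.ceil(d_max_ns / cycle_ns)


# Hypothetical nominal corner: 6.0 ns slowest path, 4.5 ns fastest path.
print(min_wave_cycle_ns(6.0, 4.5, setup_ns=0.25, hold_ns=0.25))  # 2.0
print(waves_in_flight(6.0, 2.0))                                  # 3
# Hypothetical slow corner: the slowest path stretches to 7.0 ns, so a clock
# fixed at 2.0 ns no longer satisfies the bound and successive waves can overlap.
print(min_wave_cycle_ns(7.0, 4.5, setup_ns=0.25, hold_ns=0.25))  # 3.0
```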

Wave Pipeline: Example 3, Dual-Sensing Latches
Figure 10: Dual-sensing scheme [4]
Figure 11: Dual-sensing latch circuit diagram [4]

Example 3: Dual-Sensing Latches (contd.)
• Uses two clock signals:
  • Clock: addresses the array and drives the internal circuitry
  • Clock′: multiplexes and latches out the data
Figure 12: Data-wave diagram after latching [4]
Figure 13: Relationship between cycle time and maximum access time [4]
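One way to picture why two latches can help is to capture consecutive data waves alternately into two latches. Each latch then holds its value for two clock cycles instead of one. This widens the window in which an output multiplexer (here assumed to be driven by Clock′) can select valid data.

The Python sketch below is a generic ping-pong latch model, assumed for illustration. The function name is hypothetical, and the model is not the dual-sensing-latch circuit of [4].

```python
def ping_pong_latch_trace(waves):
    """Generic two-latch (ping-pong) capture model, assumed for illustration.

    Even-numbered data waves are captured into latch A, odd-numbered ones into
    latch B. Returns, for each cycle, (latch_a, latch_b, mux_output), where the
    output multiplexer selects the latch that was just loaded.
    """
    latches = [None, None]
    trace = []
    for cycle, data in enumerate(waves):
        target = cycle % 2            # alternate between latch A (0) and latch B (1)
        latches[target] = data        # the other latch keeps its previous wave
        trace.append((latches[0], latches[1], latches[target]))
    return trace


for row in ping_pong_latch_trace(["D0", "D1", "D2", "D3"]):
    print(row)
```

In the trace, D1 is still held in latch B during the cycle in which D2 is captured into latch A. Each wave stays valid in its latch for two cycles.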

Pipelined-Burst SRAM
• Used in most modern SRAM architectures
• Burst-mode read and write operations
• X-1-1-1 operation: the first word of a burst takes X clock cycles, and each following word takes one more cycle
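The X-1-1-1 notation lends itself to quick arithmetic: the cycle count of a burst is the sum of its entries. The sketch below compares a 4-1-1-1 burst with a non-burst 4-4-4-4 pattern, in which every word pays the full initial latency. The 4-4-4-4 pattern is a hypothetical baseline for comparison only, and the function names are illustrative.

```python
def burst_cycles(timing: str) -> int:
    """Total clock cycles for a burst written in X-1-1-1 style notation."""
    return sum(int(field) for field in timing.split("-"))


def cycles_per_word(timing: str) -> float:
    """Average clock cycles per transferred word."""
    return burst_cycles(timing) / len(timing.split("-"))


print(burst_cycles("4-1-1-1"), cycles_per_word("4-1-1-1"))  # 7 1.75
print(burst_cycles("4-4-4-4"), cycles_per_word("4-4-4-4"))  # 16 4.0 (hypothetical non-burst)
```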

Pipelined-Burst SRAM: Example 1
• Features:
  • 4-1-1-1 pipelined-burst scheme
  • Burst read of four 32-bit words; data prefetched for write operations
Figure 14: Synchronous pipelined-burst SRAM block diagram [5]
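Many commodity burst SRAMs generate the follow-on addresses of a four-word burst internally. They use one of two orders:
- Linear order: count upward, wrapping within the aligned block.
- Interleaved order: XOR the starting offset with the burst count.

Whether the design in [5] supports both orders is not stated here. The function below is a generic sketch of the two orders; word addresses and a four-word burst are assumed, and the function name is hypothetical.

```python
def burst_addresses(start: int, length: int = 4, interleaved: bool = False) -> list[int]:
    """Word addresses of one burst, for linear or interleaved burst order.

    The burst stays inside the aligned block of `length` words that contains `start`.
    """
    assert length > 0 and length & (length - 1) == 0, "burst length must be a power of two"
    base = start & ~(length - 1)       # aligned block address
    offset = start & (length - 1)      # starting word within the block
    if interleaved:
        # Interleaved order: XOR the starting offset with the burst count.
        return [base | (offset ^ i) for i in range(length)]
    # Linear order: count upward and wrap around inside the block.
    return [base | ((offset + i) % length) for i in range(length)]


print([hex(a) for a in burst_addresses(0x101)])                    # ['0x101', '0x102', '0x103', '0x100']
print([hex(a) for a in burst_addresses(0x101, interleaved=True)])  # ['0x101', '0x100', '0x103', '0x102']
```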

Example 1 (contd.)
• Reads and writes are performed in bursts
• Idle cycles occur under read-after-write (RAW) and write-after-read (WAR) conditions
Figure 15: Timing diagram for the pipelined-burst SRAM shown in Figure 14 [5]
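A rough way to see the cost of RAW and WAR turnarounds is to count data-bus cycles for a sequence of bursts, adding idle cycles whenever the transfer direction changes. The sketch below rests on these assumptions:
- Each burst occupies four data cycles.
- Each turnaround costs one idle cycle.
- The function name is illustrative.

Both cycle counts are hypothetical parameters, and the actual penalties in [5] may differ.

```python
def bus_cycles(ops: str, data_cycles_per_burst: int = 4,
               raw_idle: int = 1, war_idle: int = 1) -> int:
    """Count data-bus cycles for a sequence of bursts ('R' = read, 'W' = write).

    Idle cycles are inserted when a read follows a write (RAW) or a write
    follows a read (WAR). All cycle counts are hypothetical parameters.
    """
    total = 0
    previous = None
    for op in ops:
        if previous == "W" and op == "R":
            total += raw_idle      # read-after-write turnaround
        elif previous == "R" and op == "W":
            total += war_idle      # write-after-read turnaround
        total += data_cycles_per_burst
        previous = op
    return total


print(bus_cycles("RRRR"))   # 16: no direction changes
print(bus_cycles("RRWWR"))  # 22: one WAR and one RAW turnaround
```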

Example 2: Design Improvements
• Added double-late address/data buffers (DLWBs)
Figure 16: Synchronous pipelined-burst SRAM block diagram using DLWBs [5]

Example 2 (contd.)
Figure 17: Synchronous pipelined-burst SRAM block diagram using DLWBs [5]

Example 2 (contd.)
Figure 18: Timing diagram for the pipelined-burst SRAM using DLWBs [5]
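The sketch below shows the general idea behind late-write buffering, under stated assumptions. A write's address and data are parked in a small buffer and are committed to the array only when a later write pushes them out. Reads check the buffer first, so they always see the newest data. Deferring the array write in this way is intended to let reads and writes share the pipeline without the turnaround idle cycles noted for Example 1.

The two-entry depth is an assumption chosen to mirror the "double-late" name. The class and method names are hypothetical, and this is not the DLWB circuit of [5].

```python
from collections import deque


class LateWriteBufferSRAM:
    """Behavioral sketch of an SRAM with a small late-write buffer (hypothetical model).

    Writes are parked in a FIFO and committed to the array only when a newer
    write pushes them out; reads search the buffer first so they return the
    most recent data for an address.
    """

    def __init__(self, depth: int = 2) -> None:
        self.depth = depth          # two entries assumed, to mirror "double-late"
        self.array = {}             # committed contents: address -> data
        self.pending = deque()      # buffered (address, data) writes, oldest first

    def write(self, addr: int, data) -> None:
        self.pending.append((addr, data))
        if len(self.pending) > self.depth:
            old_addr, old_data = self.pending.popleft()
            self.array[old_addr] = old_data   # commit the oldest buffered write

    def read(self, addr: int):
        # Address compare against the buffer, newest entry first (bypass path).
        for buffered_addr, data in reversed(self.pending):
            if buffered_addr == addr:
                return data
        return self.array.get(addr)


sram = LateWriteBufferSRAM()
sram.write(1, "a")
sram.write(2, "b")
print(sram.read(1), 1 in sram.array)   # a False  (served from the buffer)
sram.write(3, "c")                     # pushes the write to address 1 into the array
print(sram.array)                      # {1: 'a'}
```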

Conclusion
• Pipelining can improve SRAM performance
• Various pipelining schemes are available
• The wave pipeline shows variable performance because of clock-synchronization issues
• The pipelined-burst SRAM performs better: data are read and written in bursts, which speeds up data operations on SRAM blocks

References
1. B. Jacob, S. W. Ng, and D. T. Wang, Memory Systems: Cache, DRAM, Disk.
2. T. I. Chappell, B. A. Chappell, S. E. Schuster, J. W. Allan, S. P. Klepner, R. V. Joshi, and R. L. Franch, "A 2-ns Cycle, 3.8-ns Access 512-kb CMOS ECL SRAM with a Fully Pipelined Architecture," IEEE Journal of Solid-State Circuits, vol. 26, no. 11, November 1991.
3. K. Nakamura, S. Kuhara, T. Kimura, M. Takada, H. Suzuki, H. Yoshida, and T. Yamazaki, "A 220-MHz Pipelined 16-Mb BiCMOS SRAM with PLL Proportional Self-Timing Generator," IEEE Journal of Solid-State Circuits, vol. 29, no. 11, November 1994.
4. S. Tachibana, H. Higuchi, K. Takasugi, K. Sasaki, T. Yamanaka, and Y. Nakagome, "A 2.6-ns Wave-Pipelined CMOS SRAM with Dual-Sensing-Latch Circuits," IEEE Journal of Solid-State Circuits, vol. 30, no. 4, April 1995.
5. K. Nakamura, K. Takeda, H. Toyoshima, K. Noda, H. Ohkubo, T. Uchida, T. Shimizu, T. Itani, K. Tokashiki, and K. Kishimoto, "A 500-MHz 4-Mb CMOS Pipeline-Burst Cache SRAM with Point-to-Point Noise Reduction Coding I/O," IEEE Journal of Solid-State Circuits, vol. 32, no. 11, November 1997.
6. C. Zhao, U. Bhattacharya, M. Denham, J. Kolousek, Y. Lu, Y.-G. Ng, N. Nintunze, K. Sarkez, and H. D. Varadarajan, "An 18-Mb, 12.3-GB/s CMOS Pipeline-Burst Cache SRAM with 1.54 Gb/s/pin," IEEE Journal of Solid-State Circuits, vol. 34, no. 11, November 1999.
7. D. Molka, D. Hackenberg, R. Schöne, and M. S. Müller, "Memory Performance and Cache Coherency Effects on an Intel Nehalem Multiprocessor System," 18th International Conference on Parallel Architectures and Compilation Techniques (PACT), September 2009.

THANK YOU
