ADVANCED MEMORY SYSTEMS Mahdi Nazm Bojnordi Assistant Professor - - PowerPoint PPT Presentation

SLIDE 1

ADVANCED MEMORY SYSTEMS

CS/ECE 6810: Computer Architecture

Mahdi Nazm Bojnordi

Assistant Professor School of Computing University of Utah

SLIDE 2

Programmable Controller

SLIDE 3

Limitations to Existing Memory Controllers

• Modern memory controllers are performance-critical and complex

[Figure: cores 1-4 and a shared cache (on-chip) reach DRAM banks 1-4 (off-chip) through the memory controller, which handles address mapping, command scheduling, power management, QoS maintenance, and refresh management.]

Multiple performance objectives

Application-specific optimizations

Patches and in-field updates
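Refresh management is one of the controller tasks in the figure. As a rough illustration of why it matters for performance, the fraction of time a rank is blocked by refresh is approximately tRFC/tREFI. The sketch below assumes DDR4-style values (tREFI = 7.8 µs in the normal temperature range, 3.9 µs in the extended range, and tRFC = 350 ns for an 8 Gb device); these numbers are illustrative and do not come from the slides.

```python
# Rough refresh-overhead estimate for one DRAM rank.
# Parameter values are illustrative DDR4-style assumptions, not from the slides.


def refresh_overhead(t_rfc_ns: float, t_refi_ns: float) -> float:
    """Fraction of time a rank is busy refreshing: on average one REF command,
    which occupies the rank for tRFC, is issued every tREFI."""
    if t_refi_ns <= 0 or t_rfc_ns < 0:
        raise ValueError("tREFI must be positive and tRFC non-negative")
    return t_rfc_ns / t_refi_ns


if __name__ == "__main__":
    # Normal temperature range: tREFI = 7.8 us; tRFC = 350 ns (8 Gb device).
    print(f"normal:   {refresh_overhead(350.0, 7800.0):.2%}")
    # Extended temperature range halves tREFI to 3.9 us, doubling the overhead.
    print(f"extended: {refresh_overhead(350.0, 3900.0):.2%}")
```

This prints about 4.49% and 8.97%. Because tRFC grows with device density, a controller that schedules refresh intelligently can recover a noticeable share of that lost time.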

SLIDE 4

Programmable Memory Controllers

• Programmability can make a memory controller higher-performance and more flexible

[Figure: cores 1-4 and a shared cache (on-chip) reach DRAM banks 1-4 (off-chip) through a programmable memory controller framework.]

Multiple performance objectives

Application-specific optimizations

Patches and in-field updates

SLIDE 5

Design Overview

• Key idea: judicious division of labor between specialized hardware and firmware
  • Request and transaction processing in firmware
  • Configurable timing validation in hardware

[Figure: PARDIS organization: request processor, transaction processor, and command logic.]
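In this split, firmware decides what to issue and hardware decides whether it is legal yet. The Python model below sketches only the hardware half, under simplifying assumptions: a single rank, three same-bank constraints (tRCD, tRP, tRAS), and illustrative cycle counts loaded at configuration time. The class and method names are hypothetical and this is not the PARDIS circuit; a real checker also tracks constraints such as tFAW, tRRD, and refresh timing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimingParams:
    """Same-bank timing constraints in controller cycles (configurable)."""
    tRCD: int  # ACT -> RD/WR to the same bank
    tRP: int   # PRE -> ACT to the same bank
    tRAS: int  # ACT -> PRE to the same bank


@dataclass
class BankState:
    open_row: Optional[int] = None
    last_act: Optional[int] = None
    last_pre: Optional[int] = None


class TimingValidator:
    """Hypothetical software model of a configurable timing checker.

    Firmware proposes commands; this checker only answers whether a command
    is legal at a given cycle and records it once issued.
    """

    def __init__(self, params: TimingParams, num_banks: int) -> None:
        self.p = params
        self.banks = [BankState() for _ in range(num_banks)]

    def can_issue(self, cmd: str, bank: int, cycle: int) -> bool:
        b = self.banks[bank]
        if cmd == "ACT":
            # The bank must be closed, and tRP must have elapsed since PRE.
            if b.open_row is not None:
                return False
            return b.last_pre is None or cycle - b.last_pre >= self.p.tRP
        if cmd == "PRE":
            # The row must have been active for at least tRAS.
            return b.last_act is None or cycle - b.last_act >= self.p.tRAS
        if cmd in ("RD", "WR"):
            # A column access needs an open row and tRCD after its ACT.
            return b.open_row is not None and cycle - b.last_act >= self.p.tRCD
        raise ValueError(f"unknown command: {cmd}")

    def issue(self, cmd: str, bank: int, cycle: int, row: Optional[int] = None) -> None:
        if cmd == "ACT" and row is None:
            raise ValueError("ACT needs a row address")
        if not self.can_issue(cmd, bank, cycle):
            raise RuntimeError(f"{cmd} to bank {bank} at cycle {cycle} violates timing")
        b = self.banks[bank]
        if cmd == "ACT":
            b.open_row, b.last_act = row, cycle
        elif cmd == "PRE":
            b.open_row, b.last_pre = None, cycle


if __name__ == "__main__":
    v = TimingValidator(TimingParams(tRCD=14, tRP=14, tRAS=32), num_banks=8)
    v.issue("ACT", bank=0, cycle=0, row=42)
    print(v.can_issue("RD", 0, 10))   # False: tRCD not yet satisfied
    print(v.can_issue("RD", 0, 14))   # True
    print(v.can_issue("PRE", 0, 20))  # False: tRAS not yet satisfied
```

Keeping the checker table-driven means new DRAM timings become a configuration change rather than a redesign, which is the flexibility the slide attributes to the hardware side.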

SLIDE 6

Request Processing

• A RISC ISA for operating on memory requests

[Figure: request processor with an ALU and processor memory; a memory request carries an address and metadata. Labels: address mapping, control flow, application hints.]
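Address mapping is one of the request-processing tasks listed in the figure. The Python sketch below shows the kind of routine firmware on a RISC ISA could compute using only shifts, ANDs, and XORs. The field layout (6-bit line offset, 7-bit column, 3-bit bank, 16-bit row) and the XOR-based bank permutation, a known technique usually called permutation-based page interleaving, are assumptions for illustration; they are not the PARDIS firmware.

```python
from typing import Tuple

# Hypothetical DRAM address layout, high bits to low bits (all widths assumed):
#   [ row (16) | bank (3) | column (7) | line offset (6) ]
OFFSET_BITS = 6   # 64-byte cache lines
COLUMN_BITS = 7   # 128 lines per 8 KiB row
BANK_BITS = 3     # 8 banks
ROW_BITS = 16     # 65,536 rows per bank


def bits(value: int, lsb: int, width: int) -> int:
    """Extract `width` bits starting at bit `lsb` (a shift followed by an AND)."""
    return (value >> lsb) & ((1 << width) - 1)


def map_address(addr: int) -> Tuple[int, int, int]:
    """Return (row, bank, column) using an XOR-based bank permutation."""
    column = bits(addr, OFFSET_BITS, COLUMN_BITS)
    bank_lsb = OFFSET_BITS + COLUMN_BITS
    raw_bank = bits(addr, bank_lsb, BANK_BITS)
    row = bits(addr, bank_lsb + BANK_BITS, ROW_BITS)
    # XOR the bank field with the low row bits so that rows which would all
    # land in one bank under plain interleaving are spread across banks.
    bank = raw_bank ^ bits(row, 0, BANK_BITS)
    return row, bank, column


if __name__ == "__main__":
    a = (0 << 16) | (2 << 13)  # row 0, bank field 2, column 0
    b = (1 << 16) | (2 << 13)  # row 1, same bank field
    print(map_address(a))  # (0, 2, 0)
    print(map_address(b))  # (1, 3, 0)
```

Without the XOR, both addresses would map to bank 2 and cause a row-buffer conflict; with it they land in different banks. Because the row is kept intact, the mapping remains one-to-one.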

SLIDE 7

Request Processing

• Queue management with instruction flags
  • R flag enqueues a request
  • T flag dequeues a transaction

[Figure: request processor firmware with a request queue and a transaction queue.]

• An instruction can be annotated with both R and T flags if needed

[Figure: example instruction stream ADD, XOR, SUB, AND with T and R flag annotations.]

SLIDE 8

Implementation

• Two five-stage pipelines and one configurable timing validation circuit

SLIDE 9

Emerging Technologies

SLIDE 10

DRAM Cell Structure

• One-transistor, one-capacitor

  • Realizing the capacitor is challenging

  • 1T-1C DRAM
  • Charge-based sensing
  • Volatile
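Charge-based sensing can be made concrete with the standard charge-sharing model. The bitline is precharged to V_DD/2; opening the access transistor shares the cell charge with the larger bitline capacitance, so charge conservation gives a bitline change of ΔV = (V_cell - V_DD/2) · C_s / (C_s + C_BL). The values below (1.2 V supply, 100 fF bitline, 25 fF or 10 fF cell) are illustrative assumptions.

```python
def bitline_swing(v_cell: float, v_dd: float, c_cell_fF: float, c_bitline_fF: float) -> float:
    """Voltage change on a bitline precharged to v_dd / 2 after charge sharing
    with a cell holding v_cell. Positive means a stored 1, negative a stored 0."""
    v_precharge = v_dd / 2
    return (v_cell - v_precharge) * c_cell_fF / (c_cell_fF + c_bitline_fF)


if __name__ == "__main__":
    VDD = 1.2     # supply voltage in volts (assumed)
    C_BL = 100.0  # bitline capacitance in fF (assumed)
    cases = [
        ("full '1', 25 fF cell", VDD, 25.0),
        ("full '0', 25 fF cell", 0.0, 25.0),
        ("leaky '1' (0.9 V), 25 fF cell", 0.9, 25.0),
        ("full '1', 10 fF cell", VDD, 10.0),
    ]
    for label, v_cell, c_cell in cases:
        mv = 1000 * bitline_swing(v_cell, VDD, c_cell, C_BL)
        print(f"{label}: {mv:+.0f} mV")
```

This prints about +120, -120, +60, and +55 mV. Both a smaller capacitor (harder to realize as cells shrink) and charge leakage (the reason DRAM is volatile and must be refreshed) shrink the signal the sense amplifier has to resolve.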
SLIDE 13

Memory Scaling in Jeopardy

Scaling of semiconductor memories is greatly challenged beyond 20 nm

Example: DRAM

A/R < 10

SLIDE 14

Why Is DRAM Slow?

• Logic VLSI process: optimized for better transistor performance

• DRAM VLSI process: optimized for low cost and low leakage

[Figure: logic and DRAM chips placed separately on a PCB.]

How to reduce the distance?

SLIDE 15

Processing-in-Memory

• Increasing bandwidth by placing processing units on the same die as DRAM

• Not a new concept!
  • Merged Logic and DRAM (MLD)
    • IBM, Mitsubishi, Samsung, Toshiba, etc.
  • Other efforts
    • FlexRAM
    • IRAM
    • Active Pages
    • …

SLIDE 16

Historical PIM Challenges

• Hard to program (no standard interface)
• Embedding logic on a modified DRAM process
  • Substantially larger transistors
    • Reduced memory capacity
  • Slower logic and lower performance
• Embedding DRAM on a modified logic process
  • Leaky transistors, high refresh rates, increased cost/bit
  • Increased manufacturing complexity

SLIDE 17

3D Die-Stacking

• Different devices are stacked on top of each other
• Layers are connected by through-silicon vias (TSVs)
• Why?
  • Communication between devices is bottlenecked by limited I/O pins
  • Integrating heterogeneous elements on a single wafer is expensive and suboptimal

[Figure: logic and DRAM chips side by side on a PCB versus a logic die and DRAM dies stacked vertically.]
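The pin bottleneck can be made concrete with a simple relation: peak bandwidth = data pins × transfers per second ÷ 8 bits per byte. The interface figures below, a 64-bit DDR4-3200 channel and a 1024-bit HBM2 stack at 2 Gb/s per pin, are standard-spec examples chosen for illustration and do not come from the slides.

```python
def peak_bandwidth_gb_s(data_pins: int, rate_gt_s: float) -> float:
    """Peak bandwidth in GB/s, assuming one bit per data pin per transfer."""
    return data_pins * rate_gt_s / 8


if __name__ == "__main__":
    # Off-package DDR4-3200 channel: 64 data pins at 3.2 GT/s.
    print(f"DDR4-3200 channel: {peak_bandwidth_gb_s(64, 3.2):.1f} GB/s")
    # In-package HBM2 stack: 1024 data pins at 2.0 GT/s.
    print(f"HBM2 stack:        {peak_bandwidth_gb_s(1024, 2.0):.1f} GB/s")
```

This prints 25.6 and 256.0 GB/s. The wide interface wins despite a lower per-pin rate, because stacking and short in-package wiring make a thousand-plus signals practical where a PCB-level package cannot afford them.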

SLIDE 18

3D Stacked Memory

• Hybrid Memory Cube (HMC)
  • A logic layer at the bottom

• High Bandwidth Memory (HBM)
  • Silicon interposer at the bottom

[Figure: HBM package with a stack of DRAM dice and a processor die on a silicon interposer above the package substrate; labels include interface, controller, bank, and in-package cache controller.]