

SLIDE 1

Memory Access Scheduler

Matthew Cohen, Alvin Lin 6.884 – Complex Digital Systems May 6th, 2005

Why Use Scheduling?

Sequential accesses to DRAM are wasteful

Improve latency and bandwidth of memory requests

Order requests to take advantage of DRAM characteristics

DRAM Bank FSM

[State diagram: Activate Row moves a bank from Idle to Row Active; Reads, Writes occur in Row Active; Bank Precharge returns the bank to Idle]

Memory Access Scheduling

Traditional Scheduling:

[Timing diagram: Bank 0 repeats Active, R (read), Precharge three times in sequence; Banks 1 to 3 are Idle]

Memory Access Scheduling:

[Timing diagram: Banks 0 to 3 each perform Active, R, Precharge, overlapped across banks, and are Idle otherwise]

Avoid data line conflicts (read/write)

Avoid control line conflicts
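The difference between the two timing diagrams can be approximated with a small model. This is only a sketch: the cycle counts `T_ACTIVATE`, `T_READ`, and `T_PRECHARGE` are hypothetical values picked for illustration, not figures from the slides, every request is assumed to need a full activate, read, precharge sequence, and only data line conflicts are modeled (control line conflicts are ignored).

```python
# Hypothetical cycle counts, chosen only for illustration (not from the slides).
T_ACTIVATE = 3   # cycles to activate a row
T_READ = 1       # cycles the shared data lines are busy for one read
T_PRECHARGE = 3  # cycles to precharge a bank


def in_order_time(requests):
    """Traditional scheduling: each access completes before the next starts."""
    return len(requests) * (T_ACTIVATE + T_READ + T_PRECHARGE)


def interleaved_time(requests):
    """Overlap activate/precharge across banks; serialize only the reads.

    `requests` is a list of bank numbers, served in the given order.
    """
    bank_free = {}   # bank -> first cycle at which that bank is idle again
    data_free = 0    # first cycle at which the shared data lines are free
    finish = 0
    for bank in requests:
        start = bank_free.get(bank, 0)
        # The read waits for the row to be active and for the data lines.
        read_start = max(start + T_ACTIVATE, data_free)
        data_free = read_start + T_READ
        bank_free[bank] = data_free + T_PRECHARGE
        finish = max(finish, bank_free[bank])
    return finish


if __name__ == "__main__":
    print(in_order_time([0, 1, 2, 3]), interleaved_time([0, 1, 2, 3]))
    print(in_order_time([0, 0, 0, 0]), interleaved_time([0, 0, 0, 0]))
```

Under these assumptions, requests spread over four banks finish much sooner when interleaved, while four requests to the same bank gain nothing, which is why the order of requests matters.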

SLIDE 2

High-Level Architecture

[Block diagram: CPU, Inst. Cache Controller, Data Cache Controller, Memory Scheduler, DRAM; paths labeled Instructions and Data]

Instruction and Data Cache

Separate I- and D-caches

Fully parameterizable sizes

Direct mapped caches

Write-through, no-write-allocate

Four words per cache line

[Cache line layout: V | Tag | Word 0 | Word 1 | Word 2 | Word 3]
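The cache properties listed above can be illustrated with a small behavioral model. This is a sketch, not the authors' Bluespec design: the class name `DirectMappedCache`, the word-addressed `memory` dictionary standing in for DRAM, and the blocking refill on a load miss are all assumptions made for the example.

```python
WORDS_PER_LINE = 4  # from the slide: four words per cache line


class DirectMappedCache:
    """Direct-mapped, write-through, no-write-allocate cache (behavioral sketch)."""

    def __init__(self, num_lines, memory):
        self.num_lines = num_lines
        self.memory = memory  # dict: word address -> value, stands in for DRAM
        self.valid = [False] * num_lines
        self.tags = [0] * num_lines
        self.data = [[0] * WORDS_PER_LINE for _ in range(num_lines)]

    def _split(self, word_addr):
        """Split a word address into (tag, line index, word offset)."""
        line_addr, offset = divmod(word_addr, WORDS_PER_LINE)
        tag, index = divmod(line_addr, self.num_lines)
        return tag, index, offset

    def load(self, word_addr):
        """Return (value, hit). A miss refills the whole four-word line."""
        tag, index, offset = self._split(word_addr)
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            base = word_addr - offset
            self.data[index] = [self.memory.get(base + i, 0)
                                for i in range(WORDS_PER_LINE)]
            self.tags[index] = tag
            self.valid[index] = True
        return self.data[index][offset], hit

    def store(self, word_addr, value):
        tag, index, offset = self._split(word_addr)
        self.memory[word_addr] = value            # write-through: always update memory
        if self.valid[index] and self.tags[index] == tag:
            self.data[index][offset] = value      # keep a cached copy consistent
        # On a store miss nothing is fetched: no-write-allocate.
```

A store to an uncached line updates memory only, so a later load of that address still misses and fetches the line.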

Incremental Design

Fully blocking, single word per line

Fully blocking, four words per line

Hit under miss

Miss under miss

Necessary for full benefits of scheduling

Non-Blocking Cache Architecture

On cache load miss, add request to Pending Request Buffer (PRB)

Place µP tag in Tag location, set Valid, issue read request to scheduler with tag = PRB index

If another read to same line, set tag and valid but issue no new read request

On return of data, match tag to PRB line, retrieve µP tag of valid entries, return data to µP

[PRB layout, one entry per line: BUFTAG | V0 V1 V2 V3 | Tag0 Tag1 Tag2 Tag3]

SLIDE 3

Non-Blocking Cache Architecture

On cache store request, search PRB

If already issued read to this line, stall

[PRB layout, one entry per line: BUFTAG | V0 V1 V2 V3 | Tag0 Tag1 Tag2 Tag3]
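The load-miss, data-return, and store rules for the PRB can be sketched as a small bookkeeping class. The field names follow the BUFTAG, V0..V3, Tag0..Tag3 layout on the slide; everything else is an assumption for illustration, including the class and method names, the use of the line address as BUFTAG, and the choice to stall when the PRB is full or when a second load targets a word slot that is already valid (the slides do not say how those cases are handled).

```python
WORDS_PER_LINE = 4


class PendingRequestBuffer:
    """Bookkeeping for outstanding line reads (hypothetical sketch)."""

    def __init__(self, num_lines):
        # One entry per PRB line: BUFTAG, V0..V3, Tag0..Tag3.
        self.buftag = [None] * num_lines  # line address being read; None = free
        self.valid = [[False] * WORDS_PER_LINE for _ in range(num_lines)]
        self.up_tag = [[None] * WORDS_PER_LINE for _ in range(num_lines)]

    def load_miss(self, word_addr, up_tag):
        """Return ("issue", idx), ("merged", idx), or ("stall", None)."""
        line, offset = divmod(word_addr, WORDS_PER_LINE)
        if line in self.buftag:
            idx = self.buftag.index(line)
            if self.valid[idx][offset]:
                return ("stall", None)    # assumption: word slot already in use
            action = "merged"             # same line: no new read request
        elif None in self.buftag:
            idx = self.buftag.index(None)
            self.buftag[idx] = line
            action = "issue"              # read goes to scheduler with tag = idx
        else:
            return ("stall", None)        # assumption: stall when the PRB is full
        self.valid[idx][offset] = True
        self.up_tag[idx][offset] = up_tag
        return (action, idx)

    def store_must_stall(self, word_addr):
        """A store stalls if a read to its line is already outstanding."""
        return word_addr // WORDS_PER_LINE in self.buftag

    def data_return(self, idx, line_words):
        """Match returned data to PRB line idx; return (µP tag, word) pairs."""
        replies = [(self.up_tag[idx][i], line_words[i])
                   for i in range(WORDS_PER_LINE) if self.valid[idx][i]]
        self.buftag[idx] = None
        self.valid[idx] = [False] * WORDS_PER_LINE
        self.up_tag[idx] = [None] * WORDS_PER_LINE
        return replies
```

Because the scheduler sees only the PRB index as its tag, the cache can merge several loads to one line into a single DRAM read and fan the data back out to each waiting µP tag when the line returns.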


Scheduler Overview

Cache misses are sent to the scheduler

Scheduler is responsible for interfacing with the DRAM

Requests may be honored out of order

Scheduler Tasks

Keep waiting buffers of pending memory requests

Prioritize accesses in waiting buffer

Respect timing of the DRAM

Capture data coming back from DRAM

Keep the DRAM busy!

SLIDE 4

Scheduler RTL Design

[Block diagram: Instructions and Data requests from the cache controllers enter Waiting Buffers for Bank 0 to Bank 3, which feed the DRAM; returned data goes back to the cache controllers]
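One way to picture per-bank waiting buffers is a cycle-by-cycle model that issues at most one access per cycle to a bank that is ready. This is a rough sketch and not the presented RTL: the round-robin priority, the mapping of addresses to banks by `addr % NUM_BANKS`, and the cycle counts are all assumptions. Because every access has the same latency here and only one is issued per cycle, reads never collide on the data lines in this simplified model.

```python
from collections import deque

NUM_BANKS = 4
# Hypothetical cycle counts, for illustration only.
T_ACTIVATE, T_READ, T_PRECHARGE = 3, 1, 3


class Scheduler:
    """Per-bank waiting buffers with round-robin issue (behavioral sketch)."""

    def __init__(self):
        self.waiting = [deque() for _ in range(NUM_BANKS)]  # one buffer per bank
        self.busy_until = [0] * NUM_BANKS   # cycle at which each bank is idle again
        self.next_bank = 0                  # round-robin starting point
        self.completed = []                 # (cycle data returns, request tag)

    def enqueue(self, addr, tag):
        # Assumption: low-order address bits select the bank.
        self.waiting[addr % NUM_BANKS].append(tag)

    def tick(self, cycle):
        """Issue at most one access this cycle; return its tag or None."""
        for i in range(NUM_BANKS):
            bank = (self.next_bank + i) % NUM_BANKS
            if self.waiting[bank] and self.busy_until[bank] <= cycle:
                tag = self.waiting[bank].popleft()
                done = cycle + T_ACTIVATE + T_READ        # data comes back here
                self.busy_until[bank] = done + T_PRECHARGE  # respect bank timing
                self.completed.append((done, tag))
                self.next_bank = (bank + 1) % NUM_BANKS
                return tag
        return None
```

With two requests queued for bank 0 and one for bank 1, the bank 1 request is issued while bank 0 is still busy, so it completes before the second bank 0 request even though it arrived later. The tag carried with each request is what lets the cache accept data out of order.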

Incremental Design

Blocking In-Order Scheduler

FIFOs as Waiting Buffers and In-Order Scheduling

Real Waiting Buffers and Interleaved Scheduling

Infinite Compile Time

Scheduler exploded in complexity

Huge amount of combinational logic

Memory access scheduling is a difficult problem

DRAM is not designed to work easily with scheduling

Architectural Exploration

Change cache size to adjust cache miss percentage

Change PRB size to allow for scheduling optimization

Larger sizes should yield better results but at higher cost

SLIDE 5

[Plot: Total Time to Make 6000 Random Accesses to 512 Addresses. x-axis: PRB Lines (log scale, 1 to 100); y-axis: Time (ns), 10000 to 60000; series: 128 Byte Cache, 256 Byte Cache, 512 Byte Cache]

Synthesis Results (Area = 196,117.6 µm²)

Conclusion

Memory is becoming the bottleneck for computer systems

In-order memory access is simple in logic but wasteful in performance

Memory access scheduling is much more efficient in theory, but complex in implementation

Acknowledgements

6884-bluespec

6884-staff

group1, for teaching us how to use Vector, even if you didn’t realize it…