Processing in Storage Class Memory
Joel Nider, Craig Mustard, Andrada Zoltan, Alexandra Fedorova
Embedding Processors in SCM
CPU Non-volatile RAM
Storage Latency Is Decreasing
Scaling Compute with Storage
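Figure: where compute can sit along the storage hierarchy, ordered by latency: CPU + registers, smart caches, PIM in RAM, SCM, smart disks / SSD, storage arrays. The hierarchy is divided into volatile and persistent tiers.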
Benefits of PIM on SCM
Figure: diagram labels: CPU, memory bus, DPU, SCM, DRAM.
Core Density
DPU Count   SCM Capacity   Ratio
64          4 GB           1:64 MB
128         8 GB           1:64 MB
256         16 GB          1:64 MB
512         32 GB          1:64 MB
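The DPU counts and SCM capacities listed above keep a fixed ratio of one DPU per 64 MB, so compute grows in step with capacity. A minimal sketch of that arithmetic follows; the function name `dpu_count` is hypothetical, and capacities are assumed to be binary gigabytes (1 GB = 1024 MB), which is what makes the slide's numbers come out exactly.

```python
MB_PER_DPU = 64  # ratio from the slide: one DPU per 64 MB of SCM


def dpu_count(capacity_gb: int, mb_per_dpu: int = MB_PER_DPU) -> int:
    """Number of DPUs in an SCM device of the given capacity.

    Assumes binary units (1 GB = 1024 MB); this is an illustration of the
    slide's ratio, not a vendor formula.
    """
    return capacity_gb * 1024 // mb_per_dpu


if __name__ == "__main__":
    # Capacities taken from the slide.
    for capacity_gb in (4, 8, 16, 32):
        print(f"{capacity_gb:>2} GB SCM -> {dpu_count(capacity_gb):>3} DPUs")
```

Doubling the capacity doubles the DPU count, which is the core-density point the slide makes.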
PIM Design Points
Inter-PIM Communication
Core Density
Instruction Set
Address Translation
UPMEM Architecture and Limitations
Figure: DPU block diagram. Labels: DPU, DRAM, DDR interface, control, SRAM, external bus.
Interleaved Multithreading
Figure: input data arriving on the memory bus is interleaved byte by byte across DPUs. Of the first 16 input bytes (A through P), DPU 0 receives A and I, DPU 1 receives B and J, and DPU 2 receives C and K.
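The distribution shown on these slides (DPU 0 holds A and I, DPU 1 holds B and J, DPU 2 holds C and K) is consistent with bytes being dealt out round-robin with a stride of 8. The sketch below models only that pattern; the names `interleave` and `deinterleave` and the 8-lane default are assumptions for illustration and are not part of the UPMEM SDK.

```python
def interleave(data: bytes, lanes: int = 8) -> list[bytes]:
    """Deal bytes out round-robin: byte i goes to lane i % lanes."""
    return [data[lane::lanes] for lane in range(lanes)]


def deinterleave(chunks: list[bytes]) -> bytes:
    """Inverse of interleave: rebuild the original byte order."""
    lanes = len(chunks)
    out = bytearray(sum(len(chunk) for chunk in chunks))
    for lane, chunk in enumerate(chunks):
        # Lane `lane` owns positions lane, lane + lanes, lane + 2*lanes, ...
        out[lane::lanes] = chunk
    return bytes(out)


if __name__ == "__main__":
    stream = b"ABCDEFGHIJKLMNOP"
    per_dpu = interleave(stream)
    for dpu, chunk in enumerate(per_dpu[:3]):
        print(f"DPU {dpu}: {chunk.decode()}")
    assert deinterleave(per_dpu) == stream
```

One consequence of this layout is that no single DPU sees a contiguous run of the input, so data that must be processed contiguously has to be rearranged before or after the transfer.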
Raw Performance: Throughput
Per DPU: 64 KB SRAM, 64 MB DRAM, 16 threads @ 2 KB per transfer
9 ranks x 64 DPUs = 576 DPUs
576 DPUs x 64 MB = 36 GB DRAM
36 GB in 0.16 s = 252 GB/s
Top speed of DDR4-2400 channel: 19 GB/s
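The throughput figures above can be cross-checked with a few lines of arithmetic. The sketch below takes the 252 GB/s aggregate figure as stated on the slide rather than recomputing it: 36 GB divided by 0.16 s gives 225 GB/s, so the elapsed time shown is presumably rounded. The DDR4-2400 peak assumes the standard 64-bit (8-byte) data bus. All function and constant names are hypothetical.

```python
RANKS = 9
DPUS_PER_RANK = 64
MB_PER_DPU = 64
STATED_AGGREGATE_GBPS = 252.0  # aggregate throughput as stated on the slide
DDR4_2400_MT_PER_S = 2400      # mega-transfers per second
DDR4_BUS_BYTES = 8             # assumption: standard 64-bit data bus


def total_dpus() -> int:
    return RANKS * DPUS_PER_RANK


def total_dram_gb() -> float:
    # Binary units: 1 GB = 1024 MB.
    return total_dpus() * MB_PER_DPU / 1024


def ddr4_peak_gbps() -> float:
    # Transfers per second times bytes per transfer, in decimal GB/s.
    return DDR4_2400_MT_PER_S * DDR4_BUS_BYTES / 1000


def per_dpu_gbps() -> float:
    return STATED_AGGREGATE_GBPS / total_dpus()


if __name__ == "__main__":
    print(f"DPUs:              {total_dpus()}")
    print(f"Total DRAM:        {total_dram_gb():.0f} GB")
    print(f"DDR4-2400 peak:    {ddr4_peak_gbps():.1f} GB/s")
    print(f"Per-DPU share:     {per_dpu_gbps():.4f} GB/s")
    print(f"Aggregate vs DDR4: {STATED_AGGREGATE_GBPS / ddr4_peak_gbps():.1f}x")
```

Each DPU contributes well under 1 GB/s on its own; the aggregate figure comes from all 576 DPUs reading their local DRAM in parallel instead of sharing one memory channel.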
Use Case: Compression
File       Size     DPUs
spamfile   84 MB    172
mozilla    50 MB    105
nci        30 MB    64
dickens    10 MB    35
sao        7 MB     21
xml        5 MB     15
world192   1 MB     4
plrabn12   0.5 MB   2
terror2    0.1 MB   1
Wishlist
Concurrent Memory Access
Data Triggered Functions
Mix Of Memory Types
Tuning For Performance
Future Directions
Hyperdimensional Computing
Regular Expression Search
?
Thank you for watching
Joel Nider, joel@ece.ubc.ca
Craig Mustard, craigm@ece.ubc.ca
Andrada Zoltan, zoltandrada@gmail.com
Alexandra Fedorova, sasha@ece.ubc.ca