SLIDE 1

Exploring Use-cases for Non-Volatile Memories in support of HPC Resilience

Onkar Patil1, Saurabh Hukerikar2, Frank Mueller1, Christian Engelmann2

  • 1Dept. of Computer Science, North Carolina State University
  • 2Computer Science and Mathematics Division, Oak Ridge National Laboratory

SLIDE 2

MOTIVATION

  • Exaflop computers → compute devices + memory devices + interconnects + cooling and power, all in close proximity!
  • Manufacturing processes are not foolproof → lower durability and reliability
  • Frequency of device failures and data corruptions ↑ → effectiveness and utility ↓
  • Future applications need to be more resilient
  • Maintain a balance between performance and power consumption
  • Minimize trade-offs
SLIDE 3

PROBLEM STATEMENT

  • Non-volatile memory (NVM) technologies → maintain the state of computation in the primary memory architecture
  • More potential as specialized hardware
  • Data retention → critical for improving the resilience of an application against crashes
  • Persistent memory regions to improve HPC resilience → key aspect of this project (see the sketch after the diagram below)

[Diagram: three NVM usage modes, each partitioning an application's static and dynamic data structures across DRAM and NVM: NVM-based main memory, data versioning, and application-directed checkpointing]
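The three modes appear above only as a block diagram. As a concrete illustration of what a persistent memory region looks like to the application, here is a minimal sketch in C. It uses PMDK's libpmem, which the slides do not name, and a hypothetical DAX-mounted NVM path /mnt/pmem/matrix; treat both as assumptions, not the project's actual implementation.

```c
/* Minimal sketch: keep a matrix in a persistent NVM region instead of DRAM.
 * Assumes PMDK's libpmem (not named on the slides) and a DAX-mounted NVM
 * file system at /mnt/pmem (hypothetical path); compile with -lpmem. */
#include <libpmem.h>
#include <stdio.h>

#define N 1000

int main(void) {
    size_t len = (size_t)N * N * sizeof(double);
    size_t mapped_len;
    int is_pmem;

    /* Map a file on the NVM device as the persistent region. */
    double *a = pmem_map_file("/mnt/pmem/matrix", len, PMEM_FILE_CREATE,
                              0666, &mapped_len, &is_pmem);
    if (a == NULL) {
        perror("pmem_map_file");
        return 1;
    }

    for (size_t i = 0; i < (size_t)N * N; i++)
        a[i] = 1.0;                     /* compute on NVM-resident data */

    if (is_pmem)
        pmem_persist(a, mapped_len);    /* flush CPU caches to NVM */
    else
        pmem_msync(a, mapped_len);      /* fall back to msync on non-pmem */

    pmem_unmap(a, mapped_len);          /* data survives process exit/crash */
    return 0;
}
```

After a crash, remapping the same file recovers the matrix contents, which is the data-retention property the slide highlights.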

SLIDE 4

RESULTS

  • Experimentation setup:
  • 16-node cluster: dual-socket, quad-core AMD Opteron, 128 GB of DRAM, and Intel SSDs ranging from 100 GB to 256 GB per node
  • DGEMM benchmark from the HPCC benchmark suite
  • Tested on 4, 8, and 16-node configurations for matrix sizes of 1000, 2000, and 3000 elements (see the FLOPS sketch below)
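As background for the numbers on the next two slides: DGEMM multiplies two n×n matrices, which costs about 2n³ floating-point operations, so GFLOPS = 2n³ / time / 10⁹. A minimal timing sketch follows; the naive triple loop is only a stand-in for the tuned HPCC kernel, not the benchmark's actual code.

```c
/* Illustrative DGEMM timing: a naive kernel standing in for the tuned
 * HPCC DGEMM; GFLOPS = 2*n^3 / elapsed_seconds / 1e9. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static void dgemm(int n, const double *a, const double *b, double *c) {
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)        /* ikj order for cache locality */
            for (int j = 0; j < n; j++)
                c[i*n + j] += a[i*n + k] * b[k*n + j];
}

int main(void) {
    int n = 1000;                          /* smallest tested matrix size */
    double *a = calloc((size_t)n * n, sizeof(double));
    double *b = calloc((size_t)n * n, sizeof(double));
    double *c = calloc((size_t)n * n, sizeof(double));
    if (!a || !b || !c) return 1;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    dgemm(n, a, b, c);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.3f GFLOPS\n", 2.0 * n * n * n / secs / 1e9);
    free(a); free(b); free(c);
    return 0;
}
```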

SLIDE 5

[Chart: GFLOPS in node scaling for StarDGEMM; log-scale GFLOPS axis vs. 4, 8, and 16 nodes; series: DRAM, PMEM_ONLY, PMEM_CPY, PMEM_VER]

  • DRAM-only allocation and NVM-based main memory perform better
  • An inefficient lookup algorithm slows the remaining modes

[Chart: Execution times in node scaling for StarDGEMM; log-scale time (sec) axis vs. 4, 8, and 16 nodes; series: DRAM, PMEM_ONLY, PMEM_CPY, PMEM_VER]

SLIDE 6

[Chart: GFLOPS for problem size scaling in StarDGEMM; log-scale GFLOPS axis vs. 1000, 2000, and 3000 elements; series: DRAM, PMEM_ONLY, PMEM_CPY, PMEM_VER]

  • All modes perform similarly and consistently under both node and data scaling
  • Execution time increases exponentially when multiple copies of memory are maintained (see the versioning sketch below)

[Chart: Execution time for problem size scaling in StarDGEMM; log-scale time (sec) axis vs. 1000, 2000, and 3000 elements; series: DRAM, PMEM_ONLY, PMEM_CPY, PMEM_VER]
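The cost of keeping multiple copies is easy to see in code: every committed version is a full copy of the working set, persisted to NVM. A minimal sketch follows, assuming libpmem and a pre-mapped region; the helper name version_commit and the slot layout are hypothetical, not the slides' PMEM_VER implementation.

```c
/* Sketch of the versioning cost: each commit copies and flushes the whole
 * working set to NVM, so k versions cost k full-copy passes.
 * version_commit and the slot layout are hypothetical (assumes libpmem). */
#include <libpmem.h>
#include <stddef.h>

#define MAX_VERSIONS 4

/* nvm_region must be a pmem mapping of at least MAX_VERSIONS * len bytes. */
void version_commit(char *nvm_region, const void *data, size_t len,
                    unsigned *next_slot) {
    char *slot = nvm_region + (size_t)(*next_slot % MAX_VERSIONS) * len;
    pmem_memcpy_persist(slot, data, len);  /* copy + flush: O(len) per version */
    *next_slot += 1;
}
```

This per-commit full-copy pass is consistent with the slowdown observed for PMEM_CPY and PMEM_VER in the charts above.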

SLIDE 7

CONCLUSION & FUTURE WORK

  • Conclusion:
  • Non-volatile memory devices can be used as specialized hardware to improve the resilience of the system
  • Future work:
  • Memory usage modes that keep applications efficient and maintain complete system state with minimal overhead
  • Support for more complex applications
  • Lightweight recovery mechanisms that work with the checkpointing schemes to reduce downtime and rollback time
  • Intelligent policies that help build resilient static and dynamic runtime systems

SLIDE 8

Evaluating Performance of Burst Buffer Models for Real-Application Workloads in HPC Systems

Harsh Khetawat, Frank Mueller, Christopher Zimmer

SLIDE 9

Introduction

  • Existing storage systems are becoming a bottleneck
  • Solution: burst buffers
  • Use burst buffers for:
    – Checkpoint/Restart I/O
    – Staging
    – Write-through cache for the parallel file system

[Image: Burst Buffers on Cori]

SLIDE 10

Placement

  • Burst buffer placement:
    – Co-located with compute nodes (Summit)
    – Co-located with I/O nodes (Cori)
    – Separate set of nodes
  • Trade-offs in choice of placement:
    – Capability: I/O models, staging, etc.
    – Predictability: impact on shared resources, runtime variability
    – Economic: infrastructure reuse, cost of storage device
  • I/O performance dependent on placement:
    – Choice of network topology

SLIDE 11

Idea

  • Simulate network and burst buffer architectures (see the model sketch below):
    – CODES simulation suite
    – Real-world I/O traces (Darshan)
    – Full multi-tenant system with mixed workloads (capability/capacity)
    – Supports network topologies
    – Local & external storage models
  • Combine network topologies and storage architectures
  • Performance under striping/protection schemes
  • Reproducible tool for HPC centers
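Before running full CODES simulations, a back-of-envelope model gives some intuition for why burst buffers reduce application stall time. The sketch below is not the CODES simulator; all bandwidth and size numbers are made-up placeholders, not measurements from Cori or Summit.

```c
/* Back-of-envelope burst buffer model: a checkpoint burst is absorbed at
 * burst-buffer bandwidth, then drained to the parallel file system in the
 * background. All numbers are illustrative placeholders. */
#include <stdio.h>

int main(void) {
    double burst_gb = 4096.0;   /* checkpoint size across the job   */
    double bb_gbps  = 1600.0;   /* aggregate burst-buffer bandwidth */
    double pfs_gbps = 200.0;    /* aggregate PFS bandwidth          */

    double absorb = burst_gb / bb_gbps;   /* app stall time with a BB    */
    double direct = burst_gb / pfs_gbps;  /* app stall time without a BB */
    double drain  = burst_gb / pfs_gbps;  /* background drain to the PFS */

    printf("stall with BB: %.1f s (vs %.1f s direct); drain: %.1f s\n",
           absorb, direct, drain);
    return 0;
}
```

With these placeholder numbers the application stalls for seconds instead of tens of seconds, while the drain to the parallel file system overlaps computation; the simulation study refines this picture with real traces, topologies, and striping schemes.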
SLIDE 12

Conclusion

  • Determine based on workload characteristics:
    – Burst buffer placement
    – Network topology
    – Performance of striping across burst buffers
    – Overhead of resilience schemes
  • Reproducible tool to:
    – Simulate specific workloads
    – Determine the best fit

SLIDE 13

Thank You