
Exploring Use-cases for Non-Volatile Memories in support of HPC Resilience - PowerPoint PPT Presentation

  1. Exploring Use-cases for Non-Volatile Memories in support of HPC Resilience Onkar Patil¹, Saurabh Hukerikar², Frank Mueller¹, Christian Engelmann² ¹Dept. of Computer Science, North Carolina State University ²Computer Science and Mathematics Division, Oak Ridge National Laboratory

  2. MOTIVATION • Exaflop computers → compute devices + memory devices + interconnects + cooling and power • Close proximity! • Manufacturing processes are not foolproof • Lower durability and reliability • Frequency of device failures and data corruptions ↑ → effectiveness and utility ↓ • Future applications need to be more resilient • Maintain a balance between performance and power consumption • Minimize trade-offs

  3. PROBLEM STATEMENT • Non-volatile memory (NVM) technologies → maintain the state of computation in the primary memory architecture • More potential as specialized hardware • Data retention → critical in improving the resilience of an application against crashes • Persistent memory regions to improve HPC resilience → key aspect of this project [Figure: three NVM usage modes (NVM-based Main Memory, Application-directed Checkpointing, Data Versioning), each mapping an application's static and dynamic data structures across DRAM and NVM; see the persistence sketch below]
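All three usage modes rest on keeping critical data structures in a persistent memory region. The following is a minimal sketch of application-directed persistence, assuming PMDK's libpmem and a hypothetical DAX-mounted path /mnt/pmem; the authors' actual runtime and allocator are not shown in the slides and may differ.

```c
/* Minimal sketch of application-directed persistence on NVM, assuming
 * PMDK's libpmem is available; the authors' actual runtime/allocator
 * may differ. Compile with -lpmem. */
#include <libpmem.h>
#include <stdio.h>
#include <string.h>

#define POOL_SIZE (64UL * 1024 * 1024)   /* 64 MiB persistent region */

int main(void)
{
    size_t mapped_len;
    int is_pmem;

    /* Map (or create) a file-backed persistent region; on real NVDIMMs
     * this is a DAX-mapped file, otherwise libpmem falls back to msync. */
    double *region = pmem_map_file("/mnt/pmem/app_state", POOL_SIZE,
                                   PMEM_FILE_CREATE, 0666,
                                   &mapped_len, &is_pmem);
    if (region == NULL) {
        perror("pmem_map_file");
        return 1;
    }

    /* Application-directed checkpoint: copy critical data structures
     * into the persistent region and flush them to the media. */
    double critical_state[1024] = { 0 };
    memcpy(region, critical_state, sizeof(critical_state));
    if (is_pmem)
        pmem_persist(region, sizeof(critical_state));
    else
        pmem_msync(region, sizeof(critical_state));

    pmem_unmap(region, mapped_len);
    return 0;
}
```

After a crash, the application would re-map the same file and read back its last persisted state instead of restarting from scratch, which is the resilience benefit the slide refers to.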

  4. RESULTS • Experimentation setup • 16-node cluster with dual-socket, quad-core AMD Opterons, 128 GB of DRAM, and Intel SSDs ranging from 100 GB to 256 GB • DGEMM benchmark of the HPCC benchmark suite • Tested for 4-, 8- and 16-node configurations with matrix sizes of 1000, 2000 and 3000 elements (the GFLOPS metric is sketched below)
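For reference, the GFLOPS figures on the following slides come from timing the DGEMM operation C = alpha*A*B + beta*C and dividing its 2n³ floating-point operation count by the elapsed time. HPCC calls an optimized BLAS dgemm; the naive triple loop below is only an illustrative stand-in.

```c
/* Sketch of how a DGEMM GFLOPS figure is derived: time
 * C = alpha*A*B + beta*C and divide 2*n^3 flops by the elapsed time.
 * HPCC uses an optimized BLAS dgemm; this naive loop is a stand-in. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    const int n = 1000;                 /* matrix order, as in the slides */
    const double alpha = 1.0, beta = 1.0;
    double *A = malloc(sizeof(double) * n * n);
    double *B = malloc(sizeof(double) * n * n);
    double *C = calloc(n * n, sizeof(double));
    for (int i = 0; i < n * n; i++) { A[i] = 1.0; B[i] = 2.0; }

    double t0 = now_sec();
    for (int i = 0; i < n * n; i++)     /* C := beta * C */
        C[i] *= beta;
    for (int i = 0; i < n; i++)         /* C += alpha * A * B */
        for (int k = 0; k < n; k++)
            for (int j = 0; j < n; j++)
                C[i * n + j] += alpha * A[i * n + k] * B[k * n + j];
    double t1 = now_sec();

    double gflops = 2.0 * n * n * n / (t1 - t0) / 1e9;
    printf("n=%d  time=%.3f s  %.3f GFLOPS\n", n, t1 - t0, gflops);

    free(A); free(B); free(C);
    return 0;
}
```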

  5. [Charts: GFLOPS and execution time under node scaling (4, 8, 16 nodes) for StarDGEMM, comparing the DRAM, PMEM_ONLY, PMEM_CPY and PMEM_VER modes; y-axes are GFLOPS and time (sec) on log scales] • DRAM-only allocation and NVM-based main memory perform better • An inefficient lookup algorithm slows the copying (PMEM_CPY) and versioning (PMEM_VER) modes

  6. [Charts: GFLOPS and execution time under problem-size scaling (1000, 2000, 3000 elements) for StarDGEMM, comparing the DRAM, PMEM_ONLY, PMEM_CPY and PMEM_VER modes; y-axes are GFLOPS and time (sec) on log scales] • All modes perform similarly and consistently for node and data scaling • Execution time increases exponentially when multiple copies of memory are maintained

  7. CONCLUSION & FUTURE WORK • Conclusion: • Non-volatile memory devices can be used as specialized hardware for improving the resilience of the system • Future work: • Memory usage modes that make applications efficient and maintain complete system state • Minimal overhead • Support for more complex applications • Lightweight recovery mechanisms that work with the checkpointing schemes • Reduced downtime and rollback time • Intelligent policies that help build a resilient static and dynamic runtime system

  8. Evaluating Performance of Burst Buffer Models for Real-Application Workloads in HPC Systems Harsh Khetawat, Frank Mueller, Christopher Zimmer

  9. Introduction • Existing storage systems are becoming a bottleneck • Solution: burst buffers • Use burst buffers for: – Checkpoint/Restart I/O – Staging – Write-through cache for the parallel FS [Figure: burst buffers on Cori] (see the write-through sketch below)
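As an illustration of the write-through pattern named above, here is a minimal sketch in which a checkpoint lands on a node-local burst buffer first and is then drained to the parallel file system. The paths (/bb, /lustre/project) and the checkpoint size are hypothetical placeholders, not part of the presentation.

```c
/* Minimal sketch of the write-through pattern: a checkpoint lands on
 * the node-local burst buffer first, then is drained to the parallel
 * file system. Paths and sizes are hypothetical placeholders. */
#include <stdio.h>
#include <stdlib.h>

static void write_file(const char *path, const void *buf, size_t len)
{
    FILE *f = fopen(path, "wb");
    if (!f) { perror(path); exit(1); }
    fwrite(buf, 1, len, f);
    fclose(f);
}

int main(void)
{
    size_t ckpt_len = 4UL * 1024 * 1024;        /* 4 MiB checkpoint */
    char *ckpt = calloc(1, ckpt_len);

    /* 1. Fast, low-latency write to the burst buffer (local NVMe/SSD). */
    write_file("/bb/rank0.ckpt", ckpt, ckpt_len);

    /* 2. Drain to the parallel FS; shown synchronously here for brevity.
     *    A real burst-buffer service overlaps this with the
     *    application's next compute phase. */
    write_file("/lustre/project/rank0.ckpt", ckpt, ckpt_len);

    free(ckpt);
    return 0;
}
```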

  10. Placement • Burst buffer placement: – Co-located with compute nodes (Summit) – Co-located with I/O nodes (Cori) – Separate set of nodes • Trade-offs in choice of placements – Capability – I/O models, staging, etc. – Predictability – Impact on shared resources, runtime variability – Economic – Infrastructure reuse, cost of storage device • I/O performance dependent on placement – Choice of network topology

  11. Idea • Simulate network and burst buffer architectures – CODES simulation suite – Real-world I/O traces (Darshan) – Full multi-tenant system with mixed workloads (capability/capacity) – Supports network topologies – Local & external storage models • Combine network topologies and storage architectures • Performance under striping/protection schemes • Reproducible tool for HPC centers
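To make the simulation idea concrete, the toy model below replays a few Darshan-style write bursts against simple bandwidth models for a burst buffer and a shared parallel FS. This is not the CODES API; the trace contents and bandwidth numbers are made up purely for illustration.

```c
/* Toy analytical model of the simulation idea: replay Darshan-style
 * write records against a burst-buffer bandwidth model and a slower
 * parallel-FS model. NOT the CODES API; all numbers are hypothetical. */
#include <stdio.h>

struct io_record { double start_s; double bytes; };

int main(void)
{
    /* Hypothetical per-node checkpoint bursts (start time, size). */
    struct io_record trace[] = {
        { 0.0, 8e9 }, { 60.0, 8e9 }, { 120.0, 8e9 },
    };
    const double bb_bw  = 5e9;   /* burst buffer: 5 GB/s per node */
    const double pfs_bw = 1e9;   /* parallel FS share: 1 GB/s     */

    double t_bb = 0.0, t_pfs = 0.0;
    for (unsigned i = 0; i < sizeof(trace) / sizeof(trace[0]); i++) {
        t_bb  += trace[i].bytes / bb_bw;   /* absorbed by burst buffer */
        t_pfs += trace[i].bytes / pfs_bw;  /* written straight to PFS  */
    }
    printf("I/O stall time: burst buffer %.1f s vs. parallel FS %.1f s\n",
           t_bb, t_pfs);
    return 0;
}
```

A full simulation replaces these constant-bandwidth assumptions with network topology, placement, striping, and contention models, which is exactly what the CODES-based tool described above is meant to capture.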

  12. Conclusion • Determine based on workload characteristics: – Burst buffer placement – Network topology – Performance of striping across burst buffers – Overhead of resilience schemes • Reproducible tool to: – Simulate specific workloads – Determine best fit

  13. Thank You
