
Improving I/O Performance of HPC Applications Using Intra-Job Scheduling


  1. Improving I/O Performance of HPC Applications Using Intra-Job Scheduling. Arnab K. Paul†, Olaf Faaland‡, Adam Moody‡, Elsa Gonsiorowski‡, Kathryn Mohror‡, Ali R. Butt†. †Virginia Tech, ‡Lawrence Livermore National Laboratory. PDSW-DISCS 2019; co-located with SC'19, Denver, CO.

  2. Motivation: The Increasing Gap. [Figure: processor performance vs. disk access time. Source: https://newsroom.intel.com/editorials/3d-xpoint-memory-storage/#gs.gqtcop]

  3. Motivation: I/O operations become a limiting factor in application efficiency.

  4. Motivation: I/O operations become a limiting factor in application efficiency. Improve I/O performance of HPC applications using intra-job scheduling.

  5. Lustre Parallel File System. [Diagram: Lustre clients connect over an Ethernet or InfiniBand network to the Management Server (MGS) and Management Target (MGT), the Metadata Server (MDS) and Metadata Target (MDT), DNE metadata servers and targets, and the Object Storage Servers and Targets (OSS & OSTs), with direct, parallel file access.]

  6. System Design. [Diagram labels: Job Statistics, Dataset, Machine Learning Modeling, Validation; models are stored.]

  7. System Design. [Diagram labels: Currently running jobs, New jobs, Model DB, Job scheduler, Current and new jobs' future requests.]
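The slide does not say how the job scheduler uses the predicted future requests. As one possible illustration only, the sketch below assumes each job's predicted I/O is a list of time slots in which it will burst, and picks a start delay for a new job that minimizes overlap with running jobs. The function names `overlap` and `best_start_offset` and the delay-based policy are hypothetical, not the authors' method.

```python
def overlap(running_bursts, new_bursts, offset):
    """Count time slots where the delayed new job would burst alongside a running job."""
    shifted = {t + offset for t in new_bursts}
    return len(shifted & set(running_bursts))


def best_start_offset(running_bursts, new_bursts, max_delay):
    """Return the smallest delay in 0..max_delay that minimizes predicted burst overlap.

    Assumption: bursts are given as integer time-slot indices (hypothetical format).
    """
    return min(
        range(max_delay + 1),
        key=lambda d: (overlap(running_bursts, new_bursts, d), d),
    )


# Running jobs are predicted to write in slots 0, 5, 10, 15;
# the new job would write in slots 0, 5, 10 if started immediately.
running = [0, 5, 10, 15]
new_job = [0, 5, 10]
delay = best_start_offset(running, new_job, max_delay=3)
```

Here a delay of one slot moves the new job's bursts to slots 1, 6, and 11, which no longer coincide with the running jobs' bursts.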

  8. Preliminary Results
     • Built a Lustre simulator on ns-3.
     • Results from time-series modeling show an accuracy of 95% in predicting job write bursts.
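The slides do not name the time-series model used. To make the idea of predicting write bursts concrete, here is a minimal sketch that swaps in a much simpler technique: it marks intervals whose write volume exceeds a threshold as bursts and extrapolates the next one from the median gap between past bursts. The threshold, the per-interval byte counts, and the function names are all assumptions for illustration.

```python
from statistics import median


def find_bursts(write_bytes, threshold):
    """Return indices of intervals whose write volume exceeds the threshold."""
    return [i for i, b in enumerate(write_bytes) if b > threshold]


def predict_next_burst(write_bytes, threshold):
    """Predict the interval index of the next burst from the median gap between past bursts.

    Returns None when fewer than two bursts have been observed.
    """
    bursts = find_bursts(write_bytes, threshold)
    if len(bursts) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(bursts, bursts[1:])]
    return bursts[-1] + int(median(gaps))


# Hypothetical job that checkpoints every 5 intervals (bursts at indices 2, 7, 12).
series = [0, 0, 100, 0, 0, 0, 0, 100, 0, 0, 0, 0, 100, 0]
next_burst = predict_next_burst(series, threshold=50)
```

A periodic extrapolation like this only suits jobs with regular checkpoint-style writes; irregular I/O would need a richer model.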

  9. Next Steps
     • Modify the scheduler to reduce I/O contention.
     • Measure the I/O performance of the jobs as well as the overall performance of the system.

  10. Thank You! Q & A. akpaul@vt.edu, http://research.cs.vt.edu/dssl/
