Large-scale Research Data Management @ UL HPC: Road to GDPR compliance

Prof. Pascal Bouvry, Dr. Sebastien Varrette, V. Plugaru, S. Peter, H. Cartiaux & C. Parisot
University of Luxembourg (UL), Luxembourg
Belval Campus, April 25th, 2018


  1. Title: Large-scale Research Data Management @ UL HPC: Road to GDPR compliance (Belval Campus, April 25th, 2018).

  2. Summary: 1. Introduction; 2. [GDPR] Challenges in Data-Intensive Research; 3. Conclusion.

  3-5. Introduction: Why HPC and BD? (HPC: High Performance Computing; BD: Big Data)
     Essential tools for Science, Society and Industry:
     → All scientific disciplines are becoming computational today: they require very high computing power and handle huge volumes of data.
     → Industry and SMEs increasingly rely on HPC to invent innovative solutions, while reducing cost and decreasing time to market.
     → HPC is a global race and a strategic priority; the EU takes up the challenge with EuroHPC and the IPCEI on HPC and Big Data (BD) Applications.
     "To out-compete you must out-compute. Increasing competition, heightened customer expectations and shortening product development cycles are forcing the pace of acceleration across all industries." (Andy Grant, Head of Big Data and HPC, Atos UK&I)

  6-10. Introduction: Different HPC Needs per Domain
     [Radar charts profiling the requirements of each domain: Material Science & Engineering; Biomedical Industry / Life Sciences; Deep Learning / Cognitive Computing; IoT / FinTech; and all research computing domains combined. Axes: #Cores, Flops/Core, Network Bandwidth, Network Latency, Storage Capacity, I/O Performance.]

  11. Introduction: High Performance Computing @ UL
     Started in 2007, under the responsibility of Prof. P. Bouvry & Dr. S. Varrette:
     → expert UL HPC team (S. Varrette, V. Plugaru, S. Peter, H. Cartiaux, C. Parisot)
     → 8,173,747 € cumulative investment in hardware
     Key numbers: 469 users; 662 computing nodes (10,132 cores, 346.652 TFlops) plus 50 accelerators (+76.22 TFlops); 9,232.4 TB storage; 130 (+71) servers; 5 sysadmins; 2 sites (Kirchberg / Belval). http://hpc.uni.lu

  12-13. Introduction: Sites / Data Centers
     Kirchberg and Belval: 2 sites, ≥ 4 server rooms (Biotech I, CDC/MSA, CS.43, AS.28).

  14. Introduction: UL HPC Computing Capacity
     5 clusters, 662 nodes, 10,132 cores, 346.652 TFlops, 34,512 GPU cores.

  15. Introduction: UL HPC Storage Capacity
     4 distributed/parallel file systems, 2,183 disks, 9,232.4 TB (incl. 2,116 TB for backup).

  16-18. Introduction: [Big] Data Management, FS Summary
     File System (FS): a logical manner to store, organize & access data.
     → (local) Disk FS: FAT32, NTFS, HFS+, ext4, {x,z,btr}fs, ...
     → Networked FS: NFS, CIFS/SMB, AFP
     → Parallel/Distributed FS: SpectrumScale/GPFS, Lustre (the typical FS for HPC / HTC, High Throughput Computing)
     Main characteristic of parallel/distributed file systems: capacity and performance increase with the number of servers.

     Name            Type                      Read* [GB/s]   Write* [GB/s]
     ext4            Disk FS                   0.426          0.212
     nfs             Networked FS              0.381          0.090
     gpfs (iris)     Parallel/Distributed FS   11.25          9.46
     lustre (iris)   Parallel/Distributed FS   12.88          10.07
     gpfs (gaia)     Parallel/Distributed FS   7.74           6.524
     lustre (gaia)   Parallel/Distributed FS   4.5            2.956
     * Maximum random read/write, per IOZone or IOR measurements, using concurrent nodes for networked FS.
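     The table values come from IOZone/IOR benchmark runs. Purely as an illustration of what such a measurement does, here is a minimal Python sketch that times a single sequential write/read stream; the /scratch path is a placeholder, and one cached stream will not reproduce the figures above:

         import os, time

         def sequential_bandwidth(path, size_mb=1024, block_mb=16):
             # Rough sequential write/read bandwidth in MB/s for the FS
             # holding 'path'. Single stream, no O_DIRECT: the OS page
             # cache inflates the read figure, unlike a real IOZone/IOR run.
             block = os.urandom(block_mb * 1024 * 1024)
             t0 = time.perf_counter()
             with open(path, "wb") as f:
                 for _ in range(size_mb // block_mb):
                     f.write(block)
                 f.flush()
                 os.fsync(f.fileno())     # make sure data reached the servers
             write_bw = size_mb / (time.perf_counter() - t0)
             t0 = time.perf_counter()
             with open(path, "rb") as f:
                 while f.read(block_mb * 1024 * 1024):
                     pass
             read_bw = size_mb / (time.perf_counter() - t0)
             os.remove(path)
             return write_bw, read_bw

         # hypothetical mount point, e.g.:
         # print(sequential_bandwidth("/scratch/users/jdoe/bench.tmp"))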

  19. Summary: 1. Introduction; 2. [GDPR] Challenges in Data-Intensive Research; 3. Conclusion.

  20. [GDPR] Challenges: Data-Intensive Computing
     Data volumes are increasing massively, and cluster and storage capacities are increasing massively with them, but disk speeds are not keeping pace, and seek speeds lag even further behind read/write speeds.
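     The sequential-vs-seek gap is easy to demonstrate. The following sketch (a hypothetical helper, not from the slides) times the same number of 4 KiB reads done sequentially and at random offsets on an existing file; on spinning disks the random pass is typically far slower, though caching blurs the effect on repeated runs:

         import os, random, time

         def seq_vs_random(path, block=4096, reads=1000):
             # Time 'reads' sequential vs random reads of 'block' bytes on
             # an existing file (assumed much larger than block * reads).
             size = os.path.getsize(path)
             with open(path, "rb") as f:
                 t0 = time.perf_counter()
                 for _ in range(reads):              # sequential access
                     f.read(block)
                 seq = time.perf_counter() - t0
                 t0 = time.perf_counter()
                 for _ in range(reads):              # seek-heavy access
                     f.seek(random.randrange(0, size - block))
                     f.read(block)
                 rnd = time.perf_counter() - t0
             return seq, rnd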

  21-22. [GDPR] Challenges: Speed Expectation on Data Transfer
     See http://fasterdata.es.net/ for expected transfer times as a function of data volume and network throughput.
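     The arithmetic behind such expectation charts is simple; as a sketch (the 0.8 efficiency factor is an assumption, not a figure from the slides):

         def transfer_time_hours(size_tb, rate_gbps, efficiency=0.8):
             # Hours needed to move size_tb terabytes over a rate_gbps link,
             # derated by an assumed end-to-end efficiency factor covering
             # protocol overhead and host limits.
             bits = size_tb * 1e12 * 8               # TB -> bits
             return bits / (rate_gbps * 1e9 * efficiency) / 3600

         # Even a fully efficient 1 Gbps link needs ~22 hours for 10 TB:
         # transfer_time_hours(10, 1, efficiency=1.0)  ->  ~22.2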

  23. [GDPR] Challenges: ULHPC Storage Performance, GPFS
     Self-Encrypting Disk (SED)-based storage.

  24. [GDPR] Challenges: ULHPC Storage Performance, Lustre
     Self-Encrypting Disk (SED)-based storage.
     [Plot: I/O bandwidth (MB/s, 0-13,000) vs. number of nodes (0-128), write and read curves; filesize 48G, 2 threads per node, blocksize 16M.]
