


  1. Pattern-Aware File Reorganization in MPI-IO
     Jun He (1), Huaiming Song (1), Xian-He Sun (1), Yanlong Yin (1), Rajeev Thakur (2)
     (1) Illinois Institute of Technology, Chicago, Illinois
     (2) Argonne National Laboratory, Argonne, Illinois
     PDSW’11

  2. Outline
     • Motivation
       o Examples
       o Basic idea
     • Design
       o System overview
       o Trace collecting
       o Pattern classification
       o I/O trace analyzer
       o Remapping table
       o MPI-IO remapping layer
     • Evaluation
       o Remapping overhead
       o Pattern variation
       o Benchmarks
     • Conclusion & Future Work

  3. Motivation

  4. Parallel File Systems
     [Figure: a typical parallel file system]
     • Important factors
       o Network overhead, IOPS: number of requests
       o Locality: contiguousness of accesses
       o …

  5. Mismatch
     • Logical data
       o Developer’s understanding, for programmability and runtime performance
       o -> Logical organization -> Access pattern
     • Physical data
       o Where the data blocks are stored
       o -> Physical data organization
     Good logical organization != good physical organization for better I/O performance

  6. A Tiny Example for Irregular Data
     Programmer’s view (also file system’s view): 0 1 2 3 4 5 6 7 8 9
     3 5 8 7 4 2 1 0 9 6
     Potential benefit:
     • Better spatial locality
     • Easier for some optimizations to take effect
     • Less disk head movement
     • …
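
The benefit claimed on slide 6 can be illustrated with a small sketch. It assumes that the second row (3 5 8 7 ...) is the order in which the application reads the ten blocks, which the slide text does not state explicitly, and it uses total distance between consecutive block positions as a crude, hypothetical stand-in for disk head movement.

```python
def seek_distance(positions):
    """Sum of distances between consecutive block positions (rough proxy for head movement)."""
    return sum(abs(b - a) for a, b in zip(positions, positions[1:]))

# Assumed access order, taken from the second row on the slide.
access_order = [3, 5, 8, 7, 4, 2, 1, 0, 9, 6]

# Default layout: block i is stored at position i, so positions equal block IDs.
default_positions = access_order

# Reorganized layout: blocks are stored in the order they are accessed.
new_position = {block: pos for pos, block in enumerate(access_order)}
reorganized_positions = [new_position[b] for b in access_order]

print(seek_distance(default_positions))      # 25
print(seek_distance(reorganized_positions))  # 9
```

Under this assumption the reorganized file is read strictly front to back, so every step moves by exactly one block.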

  7. An Example for a Regular 2-D Array
     Default organization
     [Figure: a 2-D array]

  8. Read a Subarray
     [Figure: a 2-D array]

  9. After Reorganizing

  10. A Messier One
      • Irregular data
      • Very complex data model
      • Computation which involves multiple data fields

  11. Pattern-Aware Reorganization
      • Be aware of repeating non-contiguous access patterns
        o n-d strided and irregular
      • Try to reorganize the data so that it is contiguous
        o Less network overhead
        o Fewer I/O operations
        o Better locality
        o Beneficial for other optimizations, e.g. data sieving…
      • Motivating scenarios
        o Application start-up
        o Data analysis, visualization
        o …
      • Where it does not apply
        o Patterns do not repeat from run to run.

  12. Design

  13. System Overview
      [Figure: Application, MPI-IO Remapping Layer, Remapping Table, I/O Trace Analyzer, I/O Traces, I/O Client]

  14. Trace Collecting
      • Wrap the original function call
        o Add recording function
        o Call original function inside
      • Process ID, MPI rank, file path, type of operation, offset, length, data type, time stamp, and file view
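
The wrapping idea on slide 14 can be sketched in a few lines. This is only an illustration: `read_at` is a hypothetical stand-in for a real MPI-IO call, the `rank` value is passed in by hand rather than obtained from an MPI runtime, and only a subset of the fields listed on the slide is recorded.

```python
import functools
import os
import time

TRACE = []  # in-memory trace log; a real collector would presumably write to a file

def traced(op_name, rank=0):
    """Wrap an I/O function so each call is recorded before the original runs."""
    def decorator(original):
        @functools.wraps(original)
        def wrapper(path, offset, length):
            # Recording function: capture the fields before doing the I/O.
            TRACE.append({
                "pid": os.getpid(),
                "rank": rank,            # hypothetical: supplied by the caller here
                "path": path,
                "op": op_name,
                "offset": offset,
                "length": length,
                "timestamp": time.time(),
            })
            # Call the original function inside the wrapper.
            return original(path, offset, length)
        return wrapper
    return decorator

@traced("read")
def read_at(path, offset, length):
    """Stand-in for the real I/O call: read `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

The application keeps calling `read_at` as before; the trace records accumulate in `TRACE` as a side effect.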

  15. Pattern Classification
      • Spatial pattern
        o Contiguous
        o Non-contiguous: fixed strided, 2d-strided, negative strided, random strided, kd-strided
        o Combination of contiguous and non-contiguous patterns
      • Request size
        o Fixed, variable
        o Small, medium, large
      • Repetition
        o Single occurrence
        o Repeating
      • Temporal intervals
        o Fixed
        o Random
      • I/O operation
        o Read only
        o Write only
        o Read/write

  16. I/O Trace Analyzer
      • Pattern matching
        o Sort traces by time
        o Separate by process
        o Find patterns
      • I/O signature
        {I/O operation, initial position, dimension, ([{offset pattern}, {request size pattern}, {pattern of number of repetitions}, {temporal pattern}], [...]), # of repetitions}
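
The three pattern-matching steps on slide 16 can be sketched for the simplest case, a 1-d strided pattern with fixed request size. This is a minimal illustration, not the analyzer from the talk: the trace record keys, the `min_reps` threshold, and the flattened signature tuple are all assumptions made for the example.

```python
from collections import defaultdict

def find_strided_patterns(traces, min_reps=3):
    """Detect 1-d strided runs per process.

    Returns simplified signatures:
    (rank, op, start_offset, request_size, hole_size, repetitions).
    """
    # Separate by process.
    by_rank = defaultdict(list)
    for t in traces:
        by_rank[t["rank"]].append(t)

    signatures = []
    for rank in sorted(by_rank):
        # Sort traces by time.
        recs = sorted(by_rank[rank], key=lambda t: t["timestamp"])
        i = 0
        while i < len(recs):
            first = recs[i]
            j = i + 1
            if j < len(recs):
                stride = recs[j]["offset"] - first["offset"]
                # Extend the run while operation, size, and stride stay fixed.
                while (j < len(recs)
                       and recs[j]["op"] == first["op"]
                       and recs[j]["length"] == first["length"]
                       and recs[j]["offset"] - recs[j - 1]["offset"] == stride):
                    j += 1
                hole = stride - first["length"]
                # Keep only repeating, non-contiguous runs.
                if j - i >= min_reps and hole > 0:
                    signatures.append(
                        (rank, first["op"], first["offset"], first["length"], hole, j - i))
            i = j
    return signatures
```

For example, four 4-byte reads by rank 0 at offsets 0, 12, 24, and 36 collapse into the single tuple `(0, "read", 0, 4, 8, 4)`, regardless of the order in which the records arrive.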

  17. I/O-signature-based Remapping Table
      Old                                                         | New
      File, {MPI_READ, offset0, 1, ([(hole size, 1), LEN, 1]), 4} | Offset0'
      Example, 1-d strided: four accesses of length LEN at Offset 0, Offset 1, Offset 2, Offset 3 are remapped to Offset 0', Offset 1', Offset 2', Offset 3'.
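
To make the table entry on slide 17 concrete, the sketch below expands one 1-d strided entry into the individual old-to-new offset pairs it stands for. The class and field names are hypothetical, and the entry is a simplified version of the full signature; the concrete numbers are made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Strided1D:
    """Simplified 1-d strided remapping entry (field names are hypothetical)."""
    op: str
    old_start: int  # offset0 in the old file
    hole: int       # hole size between consecutive accesses
    length: int     # LEN, the size of each access
    reps: int       # number of accesses covered by the entry
    new_start: int  # Offset0' in the reorganized file

    def expand(self):
        """Yield (old_offset, new_offset) for every access the entry covers."""
        stride = self.length + self.hole
        for k in range(self.reps):
            yield self.old_start + k * stride, self.new_start + k * self.length

entry = Strided1D("MPI_READ", old_start=0, hole=12, length=4, reps=4, new_start=0)
print(list(entry.expand()))
# [(0, 0), (16, 4), (32, 8), (48, 12)]
```

The four accesses are 16 bytes apart in the old file and back to back in the new one, which is the contiguity the reorganization is after.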

  18. MPI-IO Remapping Layer
      • Convert old offsets to new ones
      • Example: read m bytes of data from offset f.
      • Does this access fall in a 1-d strided pattern?
        o starting offset: off
        o read size: rsz
        o hole size: hsz
        o number of accesses of this pattern: n
      • The access matches the pattern if all three conditions hold:
        (f-off)/(rsz+hsz) < n    (1)
        (f-off)%(rsz+hsz) = 0    (2)
        m = rsz                  (3)
      • If so: newoff = off + rsz*(f-off)/(rsz+hsz)
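
The check and conversion on slide 18 translate almost directly into code. The sketch below uses the slide's symbols (`f`, `m`, `off`, `rsz`, `hsz`, `n`); the function name is hypothetical, and the guard against offsets before `off` is an addition that the slide does not state.

```python
def remap_offset(f, m, off, rsz, hsz, n):
    """Return the new offset for a read of m bytes at old offset f.

    Returns None when the access does not fall in the 1-d strided pattern.
    """
    if f < off:
        return None  # added guard: the access starts before the pattern does
    k, rem = divmod(f - off, rsz + hsz)
    # Conditions (1), (2), and (3) from the slide.
    if k < n and rem == 0 and m == rsz:
        return off + rsz * k
    return None

# Pattern: 4 reads of 4 bytes, separated by 12-byte holes, starting at offset 100.
print(remap_offset(132, 4, off=100, rsz=4, hsz=12, n=4))  # 108
print(remap_offset(133, 4, off=100, rsz=4, hsz=12, n=4))  # None
```

An access that fails any condition returns `None`, and the caller would then issue the request with its original offset.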

  19. Evaluation

  20. Experiment Environment
      • Dual 2.3 GHz Opteron quad-core processors
      • 8 GB memory
      • 250 GB 7200 RPM SATA hard drive
      • 100 GB PCI-E OCZ RevoDrive X2 SSD (read: up to 740 MB/s, write: up to 690 MB/s)
      • Ethernet/InfiniBand
      • Ubuntu 9.04 (Linux kernel 2.6.28-11-server)
      • PVFS2 2.8.1: stripe size 64 KB
      • MPICH2 1.3.1

  21. Remapping Overhead
      1-D Strided Remapping Table Performance (1,000,000 accesses)

      Table type    | Size (bytes) | Building time (sec) | Time of 1,000,000 lookups (sec)
      1-to-1        | 64,000,000   | 0.780287            | 0.489902
      I/O signature | 28           | 0.000000269         | 0.024771

      Who uses 1-to-1: PLFS uses a 1-to-1 mapping table in its index file. Most OS file systems also use a similar table to store free blocks on disk.
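
The size gap in the table on slide 21 follows from how the two table types store a pattern. The sketch below is a hypothetical illustration, not the code that was measured: a 1-to-1 table holds one explicit entry per access, while a signature holds a few integers and computes the new offset on demand. Function names and parameter values are made up.

```python
def build_one_to_one(off, rsz, hsz, n):
    """1-to-1 table: one explicit old->new entry per access, so it grows with n."""
    return {off + k * (rsz + hsz): off + k * rsz for k in range(n)}

def lookup_signature(f, off, rsz, hsz, n):
    """Signature lookup: constant-size description, new offset computed on demand."""
    k, rem = divmod(f - off, rsz + hsz)
    return off + k * rsz if 0 <= k < n and rem == 0 else None

table = build_one_to_one(off=0, rsz=4, hsz=12, n=1000)

# Both representations give the same answer for every access in the pattern.
assert all(lookup_signature(f, 0, 4, 12, 1000) == new for f, new in table.items())

print(len(table))  # 1000 entries, versus four integers for the signature
```

Growing `n` enlarges the dictionary linearly but leaves the signature unchanged, which matches the direction of the size and build-time columns above.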

  22. Request Size Variation
      • X: difference in request size. For example, 5% means the actual request size is 5% less than the one assumed.

  23. Variation of Starting Offset
      • X: difference in starting offset. For example, 5% means that the starting offset is moved to the 5% point of the whole access.

  24. R/W Performance on IOR
      • 4 I/O clients, 4 I/O servers. 64 processes with HDD and InfiniBand.

  25. Performance on MPI-TILE-IO
      • 4 I/O clients, 4 I/O servers. 64 processes with HDD and InfiniBand.
      • Elements in a tile: 1024x1024.

  26. Performance on MPI-TILE-IO with SSD
      • 4 I/O clients, 4 I/O servers. 64 processes with SSD and InfiniBand.
      • Elements in a tile: 1024x1024.

  27. Conclusion & Future Work
      Conclusion
      • Different file organizations lead to very different performance.
      • Bridging logical data and physical data
        Access pattern -> better organization -> better performance
      Future Work
      • Multiple replicas with different organizations.
      • More complicated access patterns, patterns with hints
      • File reorganization for emerging storage media, such as SSD

  28. Acknowledgement
      • Hui Jin and Spenser Gilliland (Illinois Institute of Technology)
      • Ce Yu (Tianjin University, China)
      • Samuel Lang (Argonne National Laboratory)
      • NSF grants CCF-0621435, CCF-0937877
      • Office of Advanced Scientific Computing Research, Office of Science, U.S. DOE, under Contract DE-AC02-06CH11357.
      Thanks!
