
Investigation into file size distribution and its effect on disk server performance (PowerPoint presentation)



  1. Investigation into file size distribution and its effect on disk server performance
     Brian Davies, STFC Rutherford Appleton Laboratory

  2. The Issue and Problem
     • Site needs to drain hardware for replacement
       – For ATLAS@RAL: 50 servers / 2 PB
     • Noticed that draining rates varied vastly by VO
       – Regular network line rate (800 MB/s for a 10 Gb NIC) for CMS/LHCb, much lower for ATLAS
       – ~1 day per server for LHCb/CMS
       – 2-3 weeks per server for ATLAS
     • From the draining tool (within the CASTOR SE) we noticed vastly more files per disk partition on a server for ATLAS than for CMS/LHCb
       – We think the problem might be the SE struggling to cope with too many candidate files for draining, which dramatically affects draining efficiency
       – ~5k files per partition for CMS/LHCb; ~125k for ATLAS
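As a rough consistency check, the drain figures above can be turned into times and rates. This is a minimal sketch that assumes (not stated on the slide) the 2 PB is spread evenly over the 50 servers, i.e. about 40 TB each, and uses decimal units; the function names are illustrative only.

```python
# Back-of-envelope drain-time arithmetic for the figures on this slide.
# Assumptions (not from the slides): the 2 PB is spread evenly over the
# 50 servers, and decimal units are used (1 PB = 1e15 B, 1 MB/s = 1e6 B/s).

SECONDS_PER_DAY = 86_400


def per_server_bytes(total_bytes: float, n_servers: int) -> float:
    """Average data volume held by one server."""
    return total_bytes / n_servers


def drain_days(volume_bytes: float, rate_bytes_per_s: float) -> float:
    """Days needed to copy volume_bytes off a server at a sustained rate."""
    return volume_bytes / rate_bytes_per_s / SECONDS_PER_DAY


def effective_rate_mb_s(volume_bytes: float, days: float) -> float:
    """Average rate (MB/s) implied by draining volume_bytes in `days` days."""
    return volume_bytes / (days * SECONDS_PER_DAY) / 1e6


if __name__ == "__main__":
    volume = per_server_bytes(2e15, 50)  # about 40 TB per server
    print(f"per server: {volume / 1e12:.0f} TB")
    print(f"at 800 MB/s: {drain_days(volume, 800e6):.2f} days")
    for weeks in (2, 3):
        rate = effective_rate_mb_s(volume, 7 * weeks)
        print(f"{weeks} weeks implies ~{rate:.0f} MB/s")
```

On those assumptions a server drains in roughly 14 hours at 800 MB/s, in line with the ~1 day quoted for LHCb/CMS, while taking 2-3 weeks for the same volume implies an average of only about 22-33 MB/s.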

  3. File size comparison per VO (single server comparison)

                                    LHCb      CMS       ATLAS     ATLAS     ATLAS
                                    (all)     (all)     (all)     (non-log) (log)
     # Files                        16305     14717     396887    181799    215088
     Size (TB)                      37.565    39.599    37.564    35.501    2.062
     # Files > 10 GB                1         24        75        75        0
     # Files > 1 GB                 8526      11902     9683      9657      26
     # Files < 100 MB               4434      2330      387204    134137    211381
     # Files < 10 MB                2200      569       265464    68792     196672
     # Files < 1 MB                 1429      294       85190     20587     64603
     # Files < 100 kB               243       91        6693      2124      4569
     # Files < 10 kB                6         13        635       156       479
     Ave. file size (GB)            2.30      2.69      0.0946    0.195     0.00959
     % space used by files > 1 GB   96.71     79.73     64.56
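The derived rows of this table can be reproduced from the raw counts, which also makes the ATLAS log-file share explicit. A minimal sketch, assuming decimal units (1 TB = 1000 GB), which reproduce the table's averages; `SERVER`, `mean_size_gb` and `share` are illustrative names.

```python
# Per-VO figures from the single-server table: (number of files, size in TB).
# Decimal units assumed (1 TB = 1000 GB); this reproduces the table's averages.
SERVER = {
    "LHCb": (16305, 37.565),
    "CMS": (14717, 39.599),
    "ATLAS": (396887, 37.564),
    "ATLAS non-log": (181799, 35.501),
    "ATLAS log": (215088, 2.062),
}


def mean_size_gb(n_files: int, size_tb: float) -> float:
    """Mean file size in GB."""
    return size_tb * 1000 / n_files


def share(part: float, whole: float) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


if __name__ == "__main__":
    for vo, (n, tb) in SERVER.items():
        print(f"{vo:14s} mean file size {mean_size_gb(n, tb):.4g} GB")
    n_all, tb_all = SERVER["ATLAS"]
    n_log, tb_log = SERVER["ATLAS log"]
    print(f"log files: {share(n_log, n_all):.1f}% of files, "
          f"{share(tb_log, tb_all):.1f}% of space")
    print(f"ATLAS/LHCb file-count ratio: {n_all / SERVER['LHCb'][0]:.1f}")
```

On this server the log files are about 54% of the ATLAS files but only about 5.5% of the ATLAS space, and the server holds roughly 24 times as many ATLAS files as LHCb files for a similar volume.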

  4. Single scope investigation for ATLAS
     • ATLAS scope rather than disk server analysis
       – srm://srm-atlas.gridpp.rl.ac.uk/castor/ads.rl.ac.uk/prod/atlas/stripInput/atlasdatadisk/rucio/*
     • Total number of files: 3670322 / 590.7 TB
     • Total number of log files: 1090251 / 4.254 TB
     • Log files (*log.tgz*) are ~30% of the files in the scope
       – They represent 0.7% of the disk space used by the scope (4.254 TB / 590.7 TB)
     • Mean file size of the log files is 3.9 MB
     • Median file size of the log files is 2.3 MB
       – Log file size varies from 6 kB to 10 GB
     • Removal of log files from the scope would:
       – increase the mean file size from 161 MB to 227 MB
       – almost double the median file size, from 22.87 MB to 45.63 MB
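The before/after means can be checked from the scope totals. This sketch assumes decimal units and takes the log-file count as 1,090,251, the figure in the per-type breakdown on the next slide (consistent with the ~30% share). Medians cannot be recomputed from totals, so `summarise`, a hypothetical helper, shows how they could be derived from a per-file listing of (name, size) pairs, excluding names that match the *log.tgz* pattern.

```python
import fnmatch
import statistics

# Scope totals from the slide; decimal units assumed (1 TB = 1e6 MB).
# The log-file count is taken from the per-type breakdown (1,090,251).
TOTAL_FILES, TOTAL_TB = 3_670_322, 590.7
LOG_FILES, LOG_TB = 1_090_251, 4.254


def mean_mb(volume_tb: float, n_files: int) -> float:
    """Mean file size in MB for a volume (TB) spread over n_files files."""
    return volume_tb * 1e6 / n_files


def summarise(files, exclude="*log.tgz*"):
    """Return (count, mean, median) of the sizes in a list of (name, size)
    pairs, skipping names that match the glob `exclude`. Hypothetical helper:
    it assumes a per-file listing of the scope is available."""
    sizes = [size for name, size in files if not fnmatch.fnmatch(name, exclude)]
    return len(sizes), statistics.mean(sizes), statistics.median(sizes)


if __name__ == "__main__":
    print(f"mean, all files:      {mean_mb(TOTAL_TB, TOTAL_FILES):.0f} MB")
    print(f"mean, log files only: {mean_mb(LOG_TB, LOG_FILES):.1f} MB")
    print("mean, without logs:   "
          f"{mean_mb(TOTAL_TB - LOG_TB, TOTAL_FILES - LOG_FILES):.0f} MB")
    toy = [("a.AOD.root", 800), ("a.log.tgz", 2), ("b.NTUP.root", 600)]
    print(summarise(toy))  # sizes kept: 800 and 600
```

The arithmetic gives about 161 MB with log files and about 227 MB without, matching the slide; note that subtracting the log files shrinks the file count by ~30% while barely changing the volume.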

  5. Further single scope file analysis

     File type   Volume (TB)   # Files    Mean file size (MB)
     ESD         38.224        11749      3253.38
     AOD         160.1         185706     862.12
     DAOD        4.449         6401       695.05
     NTUP        235.012       381327     616.30
     RAW         5.932         12831      462.32
     DESD_       55.584        584883     95.03
     DESDM       55.462        765206     72.48
     HIST        23.646        447134     52.88
     DRAW_       7.868         165557     47.52
     TAG         0.2           19277      10.38
     log         4.254         1090251    3.90

     • Only 127558 of the 2580071 non-log files are > 1 GB (~5%)
     • Min/Max file size = 280 B / 10.48 GB
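A short sketch that re-derives the mean sizes and each type's share of files and volume from the volume and file-count columns. Decimal units are assumed (1 TB = 1e6 MB), and `BREAKDOWN` and `table_rows` are illustrative names.

```python
# Per-type breakdown of the scope: type -> (volume in TB, number of files).
BREAKDOWN = {
    "ESD": (38.224, 11_749),
    "AOD": (160.1, 185_706),
    "DAOD": (4.449, 6_401),
    "NTUP": (235.012, 381_327),
    "RAW": (5.932, 12_831),
    "DESD_": (55.584, 584_883),
    "DESDM": (55.462, 765_206),
    "HIST": (23.646, 447_134),
    "DRAW_": (7.868, 165_557),
    "TAG": (0.2, 19_277),
    "log": (4.254, 1_090_251),
}


def table_rows(breakdown):
    """Return (type, mean MB, % of files, % of volume) tuples,
    sorted so the types with the most files come first."""
    total_tb = sum(tb for tb, _ in breakdown.values())
    total_n = sum(n for _, n in breakdown.values())
    rows = [
        (kind, tb * 1e6 / n, 100 * n / total_n, 100 * tb / total_tb)
        for kind, (tb, n) in breakdown.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for kind, mean, pct_files, pct_vol in table_rows(BREAKDOWN):
        print(f"{kind:6s} mean {mean:8.2f} MB  "
              f"{pct_files:5.1f}% of files  {pct_vol:5.1f}% of volume")
    print(f"files > 1 GB: {100 * 127_558 / 2_580_071:.1f}% of non-log files")
```

The counts sum to the 3670322 files and ~590.7 TB of the scope. On these figures, log, DESDM, DESD_ and HIST files together make up roughly 79% of the files but only about 23.5% of the volume.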

  6. Possible solutions
     • Fix/change CASTOR
       – Improve the draining process
       – A CASTOR upgrade is in progress at RAL, so the problem may be alleviated
     • tar/concatenate log.tgz files into larger files
       – Would need ATLAS experts to evaluate any issues this might cause
     • Have separate scopes for log files and data
       – Not mixing data and log files may lead to increased storage I/O rates and better maintenance, IF the site could separate storage for the scopes
       – May require more planning of separate storage solutions
     • Other options???
       – Next step: analyse an MC scope to see whether it is better or worse than data
       – See whether disk servers at T2/T2D sites follow a similar pattern; different workflows may make a big difference
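The tar/concatenate option could look something like the sketch below. It only covers the mechanics of bundling files on a local filesystem: the 1 GB target size, the `logbundle` prefix and the functions `plan_batches` and `write_bundles` are hypothetical, and it does not touch CASTOR or the ATLAS Rucio catalogue, which is where the expert evaluation mentioned above would be needed.

```python
import os
import tarfile
import tempfile


def plan_batches(paths_and_sizes, target_bytes):
    """Group (path, size) pairs, in input order, into batches whose total size
    stays at or below target_bytes. A file larger than the target on its own
    ends up alone in its batch."""
    batches, current, current_size = [], [], 0
    for path, size in paths_and_sizes:
        # Start a new batch if adding this file would overshoot the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


def write_bundles(paths, out_dir, target_bytes=1_000_000_000, prefix="logbundle"):
    """Write each batch as an uncompressed tar archive (the log.tgz members are
    already gzip-compressed). Returns the list of archive paths written."""
    sized = [(p, os.path.getsize(p)) for p in paths]
    written = []
    for i, batch in enumerate(plan_batches(sized, target_bytes)):
        out = os.path.join(out_dir, f"{prefix}_{i:05d}.tar")
        with tarfile.open(out, "w") as tar:
            for p in batch:
                tar.add(p, arcname=os.path.basename(p))
        written.append(out)
    return written


if __name__ == "__main__":
    # Demo on dummy files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        logs = []
        for i in range(5):
            path = os.path.join(tmp, f"job{i}.log.tgz")
            with open(path, "wb") as fh:
                fh.write(b"\0" * 100)
            logs.append(path)
        for archive in write_bundles(logs, tmp, target_bytes=250):
            with tarfile.open(archive) as tar:
                print(os.path.basename(archive), tar.getnames())
```

Batching in input order keeps the sketch simple; grouping by job or dataset instead would probably matter more in practice, since it decides how many archives must be read to retrieve related logs.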

  7. Thank You
     • Brian.Davies@stfc.ac.uk
     • http://gridpp-storage.blogspot.co.uk/2014/04/how-much-of-small-file-problem-do-we.html
