
Usage of dCache Resilient Pools for User Code Distribution



  1. Usage Of dCache Resilient Pools for User Code Distribution

  2. Recap
     ● Analyzers and some production groups lost access to a subset of their frequently changing code located on BlueArc and have to download files from the dCache scratch area. The thousands of jobs they started on the Grid tried to access the same file simultaneously. BlueArc access was controlled by locks (5 per experiment).
     ● After we dismounted BlueArc (1/18/2018), users could run anywhere. Jobs started on the OSG download the same files but at much lower transfer rates, creating even more complications for jobs that are waiting for access to the same pool.
     ● Temporary Solution (part 1)
       ○ We attempted to parallelize file access by distributing file replicas across many pools using the dCache resilient pool feature.
       ○ Each file under /pnfs/<experiment>/resilient was set to be replicated 20 times across the existing readWritePools.
       ○ We asked users to store files in the dCache resilient pool group.
     ● Temporary Solution (part 2)
       ○ We implemented a JobSub feature that lets users upload tar files to a special area on the resilient pools and handles cleanup.
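
The effect of replication on pool load can be illustrated with simple arithmetic. The sketch below is only an idealized estimate: it assumes readers are spread evenly across replicas, which dCache does not guarantee, and the job count of 1000 is a hypothetical figure, not one taken from the slides. The function name `readers_per_replica` is likewise made up for this illustration.

```python
import math


def readers_per_replica(concurrent_jobs: int, replicas: int) -> int:
    """Readers each replica serves if jobs were spread evenly (idealized)."""
    if concurrent_jobs < 0 or replicas < 1:
        raise ValueError("need concurrent_jobs >= 0 and replicas >= 1")
    # Round up: a partially loaded replica still serves whole readers.
    return math.ceil(concurrent_jobs / replicas)


# Hypothetical burst size; the slides only say "thousands of jobs".
burst = 1000
single_copy = readers_per_replica(burst, 1)    # one copy on a single pool
replicated = readers_per_replica(burst, 20)    # 20 replicas, as on the slide
print(f"single copy: {single_copy} readers, 20 replicas: {replicated} readers each")
```

Under this assumption, each replica sees roughly one twentieth of the readers that a single copy would, which is the parallelization the temporary solution aims for.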

  3. Current Status
     [Figure: Number of tar files (code) pulled from dCache during the last 7 days, by experiment (scratch vs. resilient).]
     [Figure: Number of tar files (code) pulled from dCache during the last 7 days, by experiment (direct upload vs. upload via jobsub).]
     Annotated values: 180k, 220k.

  4. Clean Up Issues
     ● Most experiments don't use the jobsub feature and don't clean up the area.
     ● [Figure: Tar files in resilient areas by experiment (old = not accessed during the last 3 months).]
     ● Factor of 20!
     ● Most users keep their code there, but some users keep LOG.TAR files on resilient pools!
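
A per-user list of old tar files, of the kind shown on this slide, could be produced with a scan like the one below. This is a minimal sketch under several assumptions: the resilient area is reachable as an ordinary POSIX directory tree, the reported access time reflects real reads (which may not hold on a dCache mount), and "old" means 90 days. The function name `find_stale_tars` is hypothetical and is not part of dCache or jobsub.

```python
import os
import time
from collections import defaultdict

STALE_AFTER_DAYS = 90  # assumption: "last 3 months" approximated as 90 days


def find_stale_tars(root, stale_after_days=STALE_AFTER_DAYS, now=None):
    """Return {owner_uid: [paths]} for .tar files not accessed since the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * 86400
    stale = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Case-insensitive match so LOG.TAR is caught; .tar.gz is not matched.
            if not name.lower().endswith(".tar"):
                continue
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_atime < cutoff:
                stale[st.st_uid].append(path)
    return dict(stale)
```

Grouping by owner UID keeps the output close to what a liaison would need: one list of candidate files per user. The scan only reports files and deletes nothing.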

  5. Moving Forward Proposal
     ● Contact each experiment's liaison with the list of users and old tar files and request cleanup.
     ● Continue to push users to use the jobsub feature.
     ● Start deleting files that are older than a month. (We cannot really do it ourselves.)
     ● Drop the replication factor from 20 to 10.
     ● The ultimate goal is to move to the Rapid Code Distribution Service.
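
The storage side of the proposed change in replication factor is straightforward to estimate: the space consumed is the sum of the file sizes multiplied by the number of replicas. The sketch below uses hypothetical file counts and sizes (100 tarballs of 500 MiB each), since the slides give no such figures, and the helper `replicated_footprint` is an illustrative name only.

```python
def replicated_footprint(file_sizes_bytes, replication_factor):
    """Total bytes consumed when every file is stored replication_factor times."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    return sum(file_sizes_bytes) * replication_factor


# Hypothetical workload: 100 tarballs of 500 MiB each.
sizes = [500 * 1024**2] * 100
at_20 = replicated_footprint(sizes, 20)
at_10 = replicated_footprint(sizes, 10)
print(f"factor 20: {at_20 / 1024**4:.2f} TiB, factor 10: {at_10 / 1024**4:.2f} TiB")
```

Halving the factor halves the footprint regardless of the actual sizes. The trade-off is that fewer replicas are available to serve simultaneous readers.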
