
XRootD Monitoring Report (A. Beche, D. Giordano)


  1. XRootD Monitoring Report (A. Beche, D. Giordano)

  2. Outline
     - Talk 1: XRootD Monitoring Dashboard
       - Context
       - Dataflow and deployment model
       - Database: storage & aggregation
       - User interface & use cases
       - Open issues & future work
       - Summary
     - Talk 2: Beyond XRootD monitoring
       - HTTP/WebDAV integration
       - Integration in the WLCG Transfers Dashboard

     10 – April - 14, A.Beche – Federated Workshop

  3. XRootD federation monitoring
     - Activity started during summer 2012
     - 4 sites for FAX, 11 for AAA

     [Figure: number of sites reporting over time; y-axis: # sites, 0 to 45]

     Monitoring data increased accordingly:

            July 2012   March 2014
     AAA    606k        43M
     FAX    15k         22M

  4. Why monitoring?
     - Understand data flows to estimate data traffic
     - Provide information for efficient operations
     - Identify access patterns and propose data placement strategies

  5. XRootD monitoring dataflow
     [Diagram]
     - Real time: Federation -> (UDP) -> GLED Collector -> (stomp) -> ActiveMQ -> (stomp) -> Consumer -> Raw
     - Asynchronous: Raw -> Stats, every 10 minutes
     - Web API, serving the Dashboard UI and external applications
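
The real-time leg of this dataflow can be sketched as a consumer step that decodes each broker message and appends it to the raw store. This is a minimal sketch, not the deployed consumer: the JSON encoding, the field names (`server_site`, `end_ts`, `bytes_read`), the `raw_transfers` table, and the use of SQLite in place of the production database are all assumptions made for illustration, and the broker subscription itself is left out.

```python
import json
import sqlite3

# Hypothetical raw table; SQLite stands in for the production database.
SCHEMA = """
CREATE TABLE raw_transfers (
    server_site TEXT,
    end_ts      INTEGER,  -- transfer end time, Unix seconds
    bytes_read  INTEGER
)
"""

def store_raw(conn, body):
    """Decode one message body (assumed to be JSON) and append it to the raw table."""
    rec = json.loads(body)
    conn.execute(
        "INSERT INTO raw_transfers VALUES (?, ?, ?)",
        (rec["server_site"], rec["end_ts"], rec["bytes_read"]),
    )

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Two message bodies as they might arrive from the broker, in arbitrary order.
for body in (
    '{"server_site": "SITE_A", "end_ts": 1250, "bytes_read": 10}',
    '{"server_site": "SITE_B", "end_ts": 30, "bytes_read": 5}',
):
    store_raw(conn, body)

row_count = conn.execute("SELECT COUNT(*) FROM raw_transfers").fetchone()[0]
total_bytes = conn.execute("SELECT SUM(bytes_read) FROM raw_transfers").fetchone()[0]
```

The consumer only appends; building statistics is left to the asynchronous 10-minute step, so the insert path stays cheap.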

  6. GLED deployment model
     - Collectors:
       - FAX EU: CERN (1 site)
       - FAX US: SLAC (9 Hz)
       - AAA: UCSD (16 Hz)
       - EOS: CERN (150 Hz)
       - EOS: CERN (10 Hz)
     - AMQ @ CERN: shared cluster, 5 nodes

     [Figures: federation monitoring data rate (Hz); EOS monitoring data rate (Hz)]

  7. Consolidated dataflow
     - Two uses of the raw data:
       - Dashboard monitoring
       - XRootD popularity
     - Both now share the same database:
       - Storage optimization
       - Consistency guaranteed

  8. Database
     [Figure: database usage growth* (GB, 0 to 700)]
     - AAA: ~300 GB, ~1B records
     - FAX: ~600 GB, ~2B records
     - Daily insert: 2 GB / 6M rows

     * Indexes excluded

     - Storage
       - Raw, statistics, metadata
       - Tables partitioned daily, no global indexes

  9. Database
     - Raw data aggregation:
       - Done using PL/SQL procedures
       - Events are unordered
       - Stateless: full re-computation of touched bins each time
     - Compute stats from raw data in 10 min bins
     - Aggregate 10 min stats into daily bins
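
The stateless scheme on this slide can be illustrated in a few lines of Python. This is a sketch under stated assumptions, not the PL/SQL procedures themselves: the event fields (`end_ts`, `bytes`), the function names, and the choice to bin each event by its end timestamp are hypothetical. The point it shows is that no running counters are kept: every 10-minute bin touched by a new batch is recomputed in full from the raw events, so late or unordered events are handled without special cases.

```python
from collections import defaultdict

BIN_SECONDS = 600    # 10-minute bins, as on the slide
DAY_SECONDS = 86400  # daily bins

def bin_start(ts, width=BIN_SECONDS):
    """Start timestamp of the bin that contains ts."""
    return ts - ts % width

def recompute_bins(raw_events, touched):
    """Recompute the given 10-minute bins from scratch, scanning all raw events."""
    stats = {b: {"transfers": 0, "bytes": 0} for b in touched}
    for ev in raw_events:
        b = bin_start(ev["end_ts"])
        if b in stats:
            stats[b]["transfers"] += 1
            stats[b]["bytes"] += ev["bytes"]
    return stats

def ingest(raw_events, stats_10min, batch):
    """Store a batch of (possibly unordered) events and refresh the bins it touches."""
    raw_events.extend(batch)
    touched = {bin_start(ev["end_ts"]) for ev in batch}
    stats_10min.update(recompute_bins(raw_events, touched))
    return touched

def daily_rollup(stats_10min):
    """Aggregate 10-minute stats into daily bins."""
    daily = defaultdict(lambda: {"transfers": 0, "bytes": 0})
    for b, s in stats_10min.items():
        day = bin_start(b, DAY_SECONDS)
        daily[day]["transfers"] += s["transfers"]
        daily[day]["bytes"] += s["bytes"]
    return dict(daily)

raw, stats = [], {}
ingest(raw, stats, [{"end_ts": 1250, "bytes": 10}, {"end_ts": 30, "bytes": 5}])
ingest(raw, stats, [{"end_ts": 1300, "bytes": 7}])  # late event, touches bin 1200 again
daily = daily_rollup(stats)
```

Recomputing a touched bin costs a scan of the raw data for that bin, but it avoids having to reason about which events were already counted.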

  10. Aggregation methods
      [Diagram: transfers on a 2pm to 7pm timeline]

      Easy method
                   2-3pm   3-4pm   4-5pm   5-6pm   6-7pm
      Transfers    1       0       0       2       1
      Bytes        10      0       0       15      20

  11. Aggregation methods
      [Diagram: transfers on a 2pm to 7pm timeline]

      Easy method
                   2-3pm   3-4pm   4-5pm      5-6pm        6-7pm
      Transfers    1       0       0          2            1
      Bytes        10      0       0          15           20

      Adopted method
                   2-3pm   3-4pm   4-5pm      5-6pm        6-7pm
      Transfers    1 (1)   1 (0)   2 (0)      3 (2)        1 (1)
      Bytes        8       1       14 (9+6)   15 (1+9+5)   5
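
One way to read the adopted method is that each transfer contributes to every bin it overlaps, rather than only to a single bin. The sketch below illustrates that idea; it is an assumption-laden example, not the dashboard's procedure. In particular, the slide does not say how bytes are split, so the constant-rate (proportional to overlap) rule, the function names, and the `(start, end, bytes)` tuple layout are hypothetical.

```python
def split_bytes(start, end, total_bytes, bin_width=3600):
    """Spread total_bytes over the bins overlapped by [start, end).

    Assumes a constant transfer rate, so each bin receives a share
    proportional to the time the transfer spent in it.
    """
    if end <= start:
        # Zero-length transfer: attribute everything to the bin containing it.
        return {start - start % bin_width: float(total_bytes)}
    shares = {}
    duration = end - start
    b = start - start % bin_width
    while b < end:
        overlap = min(end, b + bin_width) - max(start, b)
        shares[b] = total_bytes * overlap / duration
        b += bin_width
    return shares

def aggregate(transfers, bin_width=3600):
    """Per-bin count of active transfers and sum of their byte shares."""
    bins = {}
    for start, end, nbytes in transfers:
        for b, share in split_bytes(start, end, nbytes, bin_width).items():
            slot = bins.setdefault(b, {"active": 0, "bytes": 0.0})
            slot["active"] += 1
            slot["bytes"] += share
    return bins

# A 2-hour transfer of 20 bytes starting at 02:30 (times in seconds since midnight).
shares = split_bytes(9000, 16200, 20)
# The same transfer plus a short one that falls entirely inside the 03:00 bin.
bins = aggregate([(9000, 16200, 20), (11000, 12000, 6)])
```

With the easy method the 20 bytes would all land in one bin; here they are spread over the three hourly bins the transfer spans.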

  12. Visualization Interface

  13. Pre-defined set of views

  14. Use case example: understand site access patterns
      1. Which sites are reading from FNAL?
      2. Zoom in on a specific site to understand which users are reading
      3. Understand which files are read by a user

  15. Data popularity
      - XRootD monitoring provides information about file access patterns:
        - Including non-official collections (i.e. user files)
        - Helps to simplify the usage of disk resources and make it more efficient
      - Popularity data analytics built on this information:
        - Already adopted for CMS-EOS
        - Will be extended to the full AAA

  16. Archive recommendation for CMS-EOS
      - Helps to manage the disk space of EOS, including user space
        - No central bookkeeping system
      - Unused files: created > 4 months ago, no access in the last 3 months
        - ~500 TB of space occupied and not used <=> 30% of total for these areas

      [Figure: axes in % and TB]
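
The "unused file" rule on this slide is simple enough to write down as a predicate. The following is a sketch only: the function name and arguments are hypothetical, and months are approximated as 30 days (120 and 90 days), which the slide does not specify.

```python
from datetime import datetime, timedelta

# Thresholds from the slide, with months approximated as 30 days (assumption).
CREATED_BEFORE = timedelta(days=120)  # "created > 4 months ago"
IDLE_FOR = timedelta(days=90)         # "no access in the last 3 months"

def is_unused(created, last_access, now):
    """True if the file is old enough and has not been accessed recently.

    last_access may be None for a file that was never read.
    """
    old_enough = now - created > CREATED_BEFORE
    idle = last_access is None or now - last_access > IDLE_FOR
    return old_enough and idle

now = datetime(2014, 4, 10)
old_and_idle = is_unused(datetime(2013, 10, 1), datetime(2013, 12, 1), now)
too_recent = is_unused(datetime(2014, 2, 1), None, now)
still_read = is_unused(datetime(2013, 10, 1), datetime(2014, 3, 1), now)
never_read = is_unused(datetime(2013, 10, 1), None, now)
```

Both conditions are needed: a file created last month with no accesses yet is not a deletion candidate, and neither is an old file that is still being read.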

  17. Open issues
      - Missing servers:
        - dCache sites
      - Servers should provide their site name
        - CMS: only 5 sites:
          - anon, BUDAPEST, Hephy-Vienna, T2_US_USCD, UKI-LT2-Brunel
          - Naming convention is not coherent
        - ATLAS: GLED RPM to be deployed
      - GLED Collector improvements:
        - Reliability of the service:
          - Recovery time can be long due to time difference
          - GLED should be operated as a production service
        - Scalability:
          - To be fixed with automatic reconnection soon

  18. Future work
      - Strong requirement from ATLAS to understand efficiency:
        - Need the concept of error / failure
        - How could the XRootD server be instrumented to report it?
      - European GLED collector is up and running:
        - Only 1 pilot site is reporting to it (CNAF)
        - Should we keep it?
      - Data mining activity (not started yet):
        - Almost 2 years of raw data (1 TB)

  19. Data Mining
      - Extract further knowledge from the data…
        - Detect inefficiencies
        - Propose deletion strategies
        - Define data placement
      - … by
        - Understanding access patterns and data usage
        - Correlating data traffic and data access performance
        - Possibly automating some operations

  20. Application usage
      [Figures: application usage for FAX and AAA]

  21. Summary
      - Monitoring federations is a challenge
        - High rate of traffic & information
        - Challenge met by data aggregation and scalable technologies
      - Dashboard is not actively used
        - Fewer than 10 daily users (FAX), fewer than 15 (AAA)
        - Is any functionality missing?
      - Improvement work is ongoing
        - New requests are coming
      - XRootD monitoring is one piece of the entire data transfers puzzle
        - See next talk

  22. Beyond XRootD monitoring (A. Beche, D. Giordano)

  23. Outline
      - Talk 1: XRootD Monitoring Dashboard
        - Context
        - Dataflow and deployment model
        - Database: storage & aggregation
        - User interface & use cases
        - Open issues & future work
        - Summary
      - Talk 2: Beyond XRootD monitoring
        - HTTP/WebDAV integration
        - Integration in the WLCG Transfers Dashboard

  24. HTTP Federation is coming
      - HTTP protocol will be used in the future
        - XRootD servers can be accessed
        - See Fabrizio’s presentation on xrdhttp
      - Two kinds of access:
        - Pure HTTP access (through Apache)
        - HTTP gate to XRootD server
      - They cannot be monitored in the same way

  25. Monitoring XRootD access protocol
      - XRootD 4 now reports the user protocol:
        - The whole monitoring chain needs to be updated
        - Dashboard DB and UI are fully ready

      [Figure legend: HTTP, XRootD]

  26. HTTP/WebDAV federation monitoring
      [Diagram: XRootD Federation; sites with XRootD SE and JOB; GLED collector; ActiveMQ]

  27. HTTP/WebDAV federation monitoring
      [Diagram: XRootD Federation and HTTP Federation; sites with XRootD SE and JOB; GLED collector; ActiveMQ]

  28. HTTP/WebDAV federation monitoring
      [Diagram: XRootD Federation and HTTP Federation; sites with Xrd JOB, XRootD HTTP SE, JOB; GLED collector; ActiveMQ]

      29 November 2013, Alexandre Beche - ITTF

  29. HTTP/WebDAV federation monitoring
      [Diagram: XRootD Federation and HTTP Federation; sites with Xrd JOB, XRootD HTTP SE, JOB, Apache, JOB; GLED collector; ActiveMQ]

  30. HTTP/WebDAV federation monitoring
      [Diagram: XRootD Federation and HTTP Federation; sites with Xrd JOB, XRootD HTTP SE, JOB, Apache, JOB; GLED collector, with a "?" on the diagram; ActiveMQ]

  31. How to compare data from different applications?

  32. Data transfer & access monitoring tools
      [Diagram labels: WEB API/UI (three instances); WLCG, FAX, AAA, EOS, FTS]

  33. WLCG Transfers Dashboard: federated approach
      [Diagram labels: WLCG Transfers Dashboard API/UI; WEB API/UI (three instances); FAX, AAA, FTS, EOS]

  34. Some plots
      [Plots: FTS and XRootD; ATLAS, LHCb, CMS, ALICE]

  35. Summary
      - A lot of effort has been put into the XRootD monitoring workflow and dashboard in the last 2 years
        - Reliable system achieved
        - Lots of use cases covered
      - HTTP monitoring has already started
        - Will require a lot of effort to reach the level of XRootD monitoring
      - New WLCG Transfers Dashboard architecture
        - Highly extensible system
        - Cross-VO or cross-technology analysis

  36. Credits
      - Andreeva Julia
      - Cons Lionel
      - Giordano Domenico
      - Saiz Pablo
      - Tadel Matevz
      - Tuckett David
      - Vukotic Ilija
      - The AAA and FAX deployment team
      - …
