The State of FIFE Monitoring & Accounting, Kevin Retzke


  1. The State of FIFE Monitoring & Accounting. Kevin Retzke, FIFE Workshop, 20th-21st June 2016

  2. Fifemon is a comprehensive monitoring platform for all FIFE experiments, services, and stakeholders.
     [Diagram labels: Users, Experiment Stakeholders, Service Providers, Management]
     https://fifemon.fnal.gov/monitor
     FIFE Monitoring, FIFE Workshop 2016

  3. The Landscape program supports the development of unified comprehensive monitoring for Scientific Computing.
     [Figure label: HEP Cloud]
     https://landscape.fnal.gov

  4. State of FIFE Monitoring 2015
     [Figure labels: Fermigrid, Fifemon]

  5. A New Monitoring Paradigm
     • Leverage open-source monitoring technology
     • Focus on incorporating new data sources and new dashboards
     • Rapid development and iteration of tailored views for each target audience

  6. Fifemon Architecture
     • Data sources: Fifebatch, GPGrid, CMS Tier 1, CMS LPC, HEP Cloud, Data handling, dCache, BlueArc, Postgres, more...
     • Probes collect:
       – Job Details
       – Slot Details
       – System Metrics
       – Event Logs
     • Graphite: time-series aggregations, viewed in Grafana
     • Elasticsearch: raw documents, viewed in Kibana
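A probe feeding Graphite, as in the architecture above, could be sketched roughly as follows. This is a minimal illustration rather than Fifemon's actual probe code: it uses Graphite's plaintext protocol (one `path value timestamp` line per datapoint, sent to a Carbon listener, conventionally on port 2003), and the metric names, values, and helper functions are hypothetical.

```python
import socket
import time


def format_graphite_line(path, value, timestamp=None):
    """Format one datapoint as a Graphite plaintext line (path, value, timestamp)."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host, port=2003, timeout=5.0):
    """Send pre-formatted lines to a Carbon plaintext listener (port 2003 by convention)."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


# Hypothetical metric names and values, for illustration only.
sample = [
    format_graphite_line("example.batch.jobs.running", 1200, 1466400000),
    format_graphite_line("example.batch.slots.idle", 35, 1466400000),
]
```

Calling `send_metrics(sample, "graphite.example.org")` would push both datapoints; the host name is a placeholder and is not taken from the slides.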

  7. Statistics
     • 250 users
     • 15 data sources
     • 50 dashboards
     • 280K total metrics
     • 500K datapoints per hour
     • 70K log events per hour
     [Plots: Unique Users per Day; Dashboard Loads per Day]
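To put the hourly ingest figures above in per-second terms, a small conversion is enough. The helper name is hypothetical, and the result assumes the load is spread evenly over the hour, which the slides do not state.

```python
SECONDS_PER_HOUR = 3600


def per_second(rate_per_hour):
    """Convert an hourly rate to an average per-second rate."""
    return rate_per_hour / SECONDS_PER_HOUR


# Figures quoted on the Statistics slide.
datapoints_per_second = per_second(500_000)  # about 139 datapoints per second
log_events_per_second = per_second(70_000)   # about 19 log events per second
```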

  8. Usage
     [Pie chart: Requests by Group. Slice labels: Users, Exp. Management, Production, FIFE. Slice values (label pairing not preserved): 65%, 20%, 10%, 5%]
     [Pie chart: Top Dashboards. Exp. Summ. 6%, Exp. Eff. 6%; other slice labels: User Batch Details, Exp Batch Details, Overview; other slice values (label pairing not preserved): 50%, 30%, 8%]
     Statistics based on dashboard requests in last 60 days.

  9. Upcoming Features
     Near-Term
     • Federated SSO Auth
     • Grafana v3
     • Completed job details & resource usage
     • dCache
     • Running job logs
     • Outage notices and logs
     • Dashboard improvements
     Long-Term
     • Realtime job updates
     • Email reports
     • User and experiment areas
     • Alerting
     Preview and test: https://fifemon-pp.fnal.gov
     What do you want to see? https://fermi.service-now.com

  10. Beyond FIFE
     • CMS LPC cluster monitoring
     • Collaborating with OSG and wider scientific computing community
       – Increase Fermilab visibility
       – Feedback improvements
       – Better site monitoring: what resources are available offsite for FIFE jobs?
     Collaborative Project: https://fifemon.github.io
     [Image credit: http://www.lumaxart.com/]

  11. Accounting
     • OSG retiring Gratia, ruled it unmaintainable and inflexible
     • “GRÅCC” being developed by OSG and FNAL
       – Modular, microservice-based architecture
       – Primary data store: Elasticsearch
       – Primary frontend: Grafana
       – Compatible with existing probes
       – Alpha stage this summer, Production by end of year
     • Accounting data will be more readily accessible
       – Integrate FIFE accounting data into Fifemon...
     https://gracc.opensciencegrid.org
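With Elasticsearch as the primary data store, accounting summaries could be pulled with an aggregation query along these lines. This is only a sketch of the general approach: the field names (`EndTime`, `VOName`, `CoreHours`) and the helper function are assumptions for illustration, not a schema confirmed by the slides, and `calendar_interval` is the parameter name in recent Elasticsearch releases (older releases use `interval`).

```python
import json


def build_usage_query(start, end, time_field="EndTime",
                      group_field="VOName", sum_field="CoreHours"):
    """Build a request body that sums usage per group per day (field names are assumed)."""
    return {
        "size": 0,  # return aggregations only, no raw documents
        "query": {"range": {time_field: {"gte": start, "lt": end}}},
        "aggs": {
            "per_day": {
                "date_histogram": {"field": time_field, "calendar_interval": "day"},
                "aggs": {
                    "per_group": {
                        "terms": {"field": group_field, "size": 20},
                        "aggs": {"usage": {"sum": {"field": sum_field}}},
                    }
                },
            }
        },
    }


body = build_usage_query("2016-06-01", "2016-07-01")
print(json.dumps(body, indent=2))
```

The resulting body could be passed to a search request against whichever index holds the accounting records; the index name is left out here because the slides do not give one.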

  12. Over Two Years of FIFE History
     https://fifemon-pp.fnal.gov/dashboard/db/fife-history

  13. Fifemon Tutorial Tomorrow
     • Grafana basic usage, tips & tricks
     • Common workflows:
       – Checking your job status and resource usage
       – Checking your experiment’s status and usage
       – Checking batch system usage and resource availability
     • Q&A
