
OSG GRid ACCounting system :: GRACC
Derek Weitzel, Marian Zvada
Elastic Workshop @FNAL, September 30th, 2019

GRACC - Mapping Jobs to ES
● Each job is mapped to a document in ES with ~60 attributes.
● GRACC receives 1.2M records a day.
● Commodity hardware (and no SSDs)! ES proved too slow to visualize raw records over 30+ days.
● Records are summarized by bucketing jobs into 1-day periods on specific unique attributes and summing the usage.
● The summarized records are enriched with outside resource information.
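The summarization step can be sketched in a few lines of Python. This is a minimal illustration, not GRACC's actual code. The field names (`VOName`, `ProbeName`, `WallDuration`, `CoreHours`, and `EndTime` as a Unix timestamp) are hypothetical stand-ins for the real ~60-attribute schema. The `resource_info` lookup table is likewise a hypothetical stand-in for the outside resource data.

```python
from datetime import datetime, timezone

# Hypothetical field names for illustration only; real GRACC raw records
# carry ~60 attributes whose exact names are not given on the slide.
GROUP_KEYS = ("VOName", "ProbeName")
SUM_KEYS = ("WallDuration", "CoreHours")


def summarize_daily(records):
    """Bucket raw job records into 1-day periods keyed on GROUP_KEYS and sum usage."""
    summary = {}
    for rec in records:
        # EndTime is assumed to be a Unix timestamp in seconds (UTC).
        day = datetime.fromtimestamp(rec["EndTime"], tz=timezone.utc).date().isoformat()
        key = (day,) + tuple(rec[k] for k in GROUP_KEYS)
        bucket = summary.setdefault(key, {"Njobs": 0, **{k: 0.0 for k in SUM_KEYS}})
        bucket["Njobs"] += 1
        for k in SUM_KEYS:
            bucket[k] += rec.get(k, 0.0)
    return summary


def enrich(summary, resource_info):
    """Flatten summary rows and attach outside resource metadata keyed by ProbeName."""
    rows = []
    for key, totals in summary.items():
        row = dict(zip(("Day",) + GROUP_KEYS, key))
        row.update(totals)
        row.update(resource_info.get(row["ProbeName"], {}))
        rows.append(row)
    return rows


# Made-up example input: two jobs share a day/VO/probe bucket, one does not.
records = [
    {"EndTime": 1569801600, "VOName": "vo_a", "ProbeName": "ce1", "WallDuration": 3600, "CoreHours": 1.0},
    {"EndTime": 1569805200, "VOName": "vo_a", "ProbeName": "ce1", "WallDuration": 7200, "CoreHours": 2.0},
    {"EndTime": 1569805200, "VOName": "vo_b", "ProbeName": "ce1", "WallDuration": 1800, "CoreHours": 0.5},
]
summary = summarize_daily(records)
rows = enrich(summary, {"ce1": {"Site": "ExampleSite"}})
```

Thousands of raw documents per day collapse into one row per unique attribute combination. That reduction is what makes 30+ day dashboards tractable on slow storage.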

GRACC Big Picture
● Gratia probe: a piece of software that collects accounting data from the computer on which it runs and transmits it to a Gratia server.
● GRACC server: a server that collects Gratia accounting data from one or more sites and can share it with users via a web page. The GRACC server is hosted by the OSG.
● Reporter: a web service running on the GRACC server. Users can connect to the reporter via a web browser to explore the Gratia data.
● Collector: a web service running on the GRACC server that collects data from one or more Gratia probes. Users do not interact with the collector directly.

GRACC Components Architecture
● Gratia probes run on CEs and submit hosts.
● Each box in the architecture diagram is actually multiple processes.

GRACC Collector
● Program that listens for HTTP POSTs from Gratia probes.
● Parses a semi-XML format from the POST into JSON.
● Places the records onto the message bus for ingestion into ES.
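Here is a minimal sketch of the collector's parse-and-forward step. It assumes the POST body has already been reduced to one well-formed XML element; real Gratia payloads are only semi-XML and need more lenient handling. The record layout and the `urn:example:usage-record` namespace are hypothetical, as is the `publish` callable standing in for the message-bus client.

```python
import json
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def record_to_json(xml_text):
    """Flatten a single, well-formed usage-record element into a JSON string.

    Nested elements are flattened to their leaf tag names; if a tag repeats,
    the last value wins (a real parser would need a policy for this).
    """
    root = ET.fromstring(xml_text)
    doc = {}
    for elem in root.iter():
        if elem is root:
            continue
        text = (elem.text or "").strip()
        if text:
            doc[local_name(elem.tag)] = text
    return json.dumps(doc, sort_keys=True)


def handle_post(body, publish):
    """Parse one POSTed record and hand the JSON to a message-bus publish callable."""
    publish(record_to_json(body))


# Hypothetical, simplified record for illustration.
sample = """<JobUsageRecord xmlns="urn:example:usage-record">
  <JobIdentity><GlobalJobId>ce1#123.0</GlobalJobId></JobIdentity>
  <WallDuration>PT1H</WallDuration>
</JobUsageRecord>"""
```

Keeping parsing separate from publishing lets the collector stay stateless. Anything it cannot parse can be rejected at the HTTP layer before it reaches the bus.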

Message Bus
● The message bus is used by GRACC, Network Monitoring, and StashCache federation accounting.
● Hosted on a commercial provider: CloudAMQP.
● Monitored through Grafana alerts and CloudAMQP alerts.

ES Ingestion
● We use Logstash to receive from the message bus and insert into ES.
● Network ingestion uses a custom ingester, which is a constant source of trouble.
○ It is very difficult to write a correct message-bus-to-ES ingester.
○ There are many error conditions.
○ The ingester must correctly confirm to the message bus once a record has been ingested.
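The hard part of an ingester is deciding when to confirm a message. The sketch below shows one common pattern: acknowledge only after ES accepts a document, requeue on transient failure, and reject malformed input outright. It assumes an AMQP broker with per-message ack/nack; RabbitMQ, which CloudAMQP hosts, supports both. It also assumes a hypothetical `bulk_insert` callable that returns one success flag per document. The `ack`/`nack` callables are likewise placeholders for a real client.

```python
import json


class Ingester:
    """Confirm messages to the bus only after Elasticsearch has accepted them."""

    def __init__(self, bulk_insert, ack, nack):
        self.bulk_insert = bulk_insert  # hypothetical: list[dict] -> list[bool]
        self.ack = ack                  # ack(tag)
        self.nack = nack                # nack(tag, requeue=bool)

    def process_batch(self, messages):
        parsed = []
        for tag, body in messages:
            try:
                parsed.append((tag, json.loads(body)))
            except json.JSONDecodeError:
                # A malformed message will never succeed; reject it without
                # requeueing (the broker may dead-letter it if configured).
                self.nack(tag, requeue=False)
        if not parsed:
            return
        try:
            results = self.bulk_insert([doc for _, doc in parsed])
        except Exception:
            results = None  # ES unreachable or the whole request failed
        if results is None or len(results) != len(parsed):
            # Cannot tell what was stored: put every message back on the queue.
            for tag, _ in parsed:
                self.nack(tag, requeue=True)
            return
        for (tag, _), ok in zip(parsed, results):
            if ok:
                self.ack(tag)  # only now is it safe to drop the message
            else:
                self.nack(tag, requeue=True)
```

A real ingester also has to tell retryable bulk errors (for example HTTP 429 rejections) from permanent ones such as mapping conflicts. Requeueing a document that can never be indexed loops forever.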

Elastic
● Elasticsearch 5.6.5 (really old)
● Read-only ES interface with 2 layers of security
○ NGINX proxy that only allows GET requests (no POSTs or PUTs)
○ ReadonlyREST instance
● Backups
○ Daily snapshots to HDFS
● Grafana (4.6.3)
● Kibana (5.6.5)
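The GET-only policy enforced by the NGINX layer can be illustrated as a small Python WSGI wrapper. This is a sketch of the policy, not the deployed configuration. `backend` is a stand-in for the proxied Elasticsearch. HEAD is admitted alongside GET because nginx's `limit_except GET` directive also allows HEAD, if that is how the proxy is configured (an assumption).

```python
ALLOWED_METHODS = {"GET", "HEAD"}


def read_only(app):
    """Wrap a WSGI application so that only read methods reach it."""
    def wrapper(environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET").upper()
        if method not in ALLOWED_METHODS:
            # Refuse writes (POST, PUT, DELETE, ...) before they reach ES.
            start_response(
                "405 Method Not Allowed",
                [("Allow", "GET, HEAD"), ("Content-Type", "text/plain")],
            )
            return [b"read-only interface\n"]
        return app(environ, start_response)
    return wrapper


def backend(environ, start_response):
    """Stand-in for the proxied Elasticsearch endpoint."""
    start_response("200 OK", [("Content-Type", "application/json")])
    return [b'{"ok": true}']
```

Filtering on method alone is coarse, which is presumably why a second layer (ReadonlyREST) sits behind the proxy.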

Interfaces
● Grafana (prod)
○ Dashboards made for/by stakeholders
● Kibana (debug)
○ Used primarily for debugging and early prototyping
● Email reports
○ Periodic status updates
○ Query the read-only interface with custom queries
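The sketch below shows what an email report's custom query might look like: an Elasticsearch aggregation body plus a formatter for the response. The fields (`EndTime`, `VOName`, `CoreHours`), the 7-day window and the bucket size of 50 are assumptions for illustration. Sending the body to the read-only endpoint and mailing the text are left out.

```python
def weekly_vo_query(days=7):
    """Build an ES query body: total core hours per VO over the last `days` days."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"range": {"EndTime": {"gte": f"now-{days}d/d"}}},
        "aggs": {
            "by_vo": {
                "terms": {"field": "VOName", "size": 50},
                "aggs": {"core_hours": {"sum": {"field": "CoreHours"}}},
            }
        },
    }


def format_report(response):
    """Turn the aggregation response into a plain-text table for an email body."""
    lines = [f"{'VO':<18} {'Core hours':>10}"]
    for bucket in response["aggregations"]["by_vo"]["buckets"]:
        lines.append(f"{bucket['key']:<18} {bucket['core_hours']['value']:>10.1f}")
    return "\n".join(lines)
```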

GRACC Technical Specs
● Hardware hosted on an OpenStack platform: Elasticsearch cluster (ELK), Ceph storage
● 1 front-end VM (64 GB RAM, 2 TB data volume)
● 5 data-node VMs (32 GB RAM, 5 TB data volume)
● With this allocated volume size, we're good for another ~3 years.
[Figure: storage usage at end of Jan 2019, end of July 2019, and end of Sep 2019]
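The capacity claim comes down to simple arithmetic. The 25 TB total below assumes the 5 TB volume is per data node. Current usage and yearly growth are not on the slide, so they are left as parameters. Any numbers plugged in for them are illustrative only.

```python
def years_of_headroom(capacity_tb, used_tb, growth_tb_per_year, usable_fraction=1.0):
    """Years until the data volumes fill, assuming linear growth."""
    if growth_tb_per_year <= 0:
        return float("inf")  # no growth means the volumes never fill
    free_tb = max(capacity_tb * usable_fraction - used_tb, 0)
    return free_tb / growth_tb_per_year


DATA_NODES = 5
VOLUME_TB_PER_NODE = 5  # assumption: the 5 TB volume on the slide is per data node
capacity_tb = DATA_NODES * VOLUME_TB_PER_NODE  # 25 TB of data-node volume
```

Elasticsearch's disk watermarks (85% low watermark by default) stop shard allocation before a disk is full. A `usable_fraction` of about 0.85 is therefore a more realistic setting than 1.0.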

GRACC Monitoring
● check_mk with automated notifications
● Deployment fully Puppetized
● Docker containers (not for everything)

GRACC Monitoring Dashboards
● Status of ES health
● Status of nodes
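The ES health status could also be tied into check_mk's automated notifications; the slides do not say whether this is done. One way is a check_mk local check that turns the JSON from Elasticsearch's `_cluster/health` endpoint into a single status line. The service name `ES_cluster_health` is hypothetical.

```python
# Map Elasticsearch cluster-health colours to check_mk states
# (0 = OK, 1 = WARN, 2 = CRIT, 3 = UNKNOWN).
STATE = {"green": 0, "yellow": 1, "red": 2}


def check_mk_line(health):
    """Render one check_mk local-check line from a _cluster/health JSON dict."""
    status = health.get("status", "unknown")
    state = STATE.get(status, 3)
    nodes = health.get("number_of_nodes", 0)
    unassigned = health.get("unassigned_shards", 0)
    # Local-check format: <state> <service name> <metrics> <detail text>
    metrics = f"nodes={nodes}|unassigned_shards={unassigned}"
    detail = f"cluster status {status}, {nodes} nodes, {unassigned} unassigned shards"
    return f"{state} ES_cluster_health {metrics} {detail}"
```

Feeding it the parsed output of `curl -s localhost:9200/_cluster/health` (host and port assumed) yields a WARN line whenever replicas are unassigned.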

Transfer and Cache Accounting
In addition to jobs, we use GRACC for transfer and cache accounting.

TCP Transfer Statistics
● Finding network issues between submit hosts and worker nodes
● Using Filebeat to upload XferLogs from HTCondor

Wishlist
● Interested in roll-ups for summarization; not sure about enriching the records
● Some life-cycle management with Curator; this could be expanded
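The age-based part of life-cycle management reduces to parsing a date out of each index name. The sketch assumes hypothetical daily index names of the form `prefix-YYYY.MM.DD`. It only selects candidates and deletes nothing.

```python
import re
from datetime import date

# Hypothetical naming scheme: <prefix>-YYYY.MM.DD
INDEX_RE = re.compile(r"^(?P<prefix>.+)-(?P<y>\d{4})\.(?P<m>\d{2})\.(?P<d>\d{2})$")


def indices_older_than(index_names, today, keep_days):
    """Return index names whose date suffix is more than keep_days before today."""
    old = []
    for name in index_names:
        match = INDEX_RE.match(name)
        if not match:
            continue  # leave indices without a date suffix alone
        try:
            day = date(int(match["y"]), int(match["m"]), int(match["d"]))
        except ValueError:
            continue  # suffix looks like a date but is not a valid one
        if (today - day).days > keep_days:
            old.append(name)
    return sorted(old)
```

Curator's `age` filter with `source: name` and a `timestring` such as `'%Y.%m.%d'` performs the same selection declaratively.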

Concerns
● ES can be slow, but that is probably our hosting platform.
● We are scared of drive-by attacks.
● We have done disaster recovery exercises: it takes >48 hours to restore the platform and data from snapshots.
○ Likely days from tape…
● We inherit projects from others, and we are scared of ingesters.
○ Writing a good ingester from the message bus to ES is hard; there are so many error conditions.
