XRootD Monitoring Report (A. Beche, D. Giordano)

SLIDE 1

XRootD Monitoring Report

A.Beche D.Giordano

SLIDE 2

Outline

  • Talk 1: XRootD Monitoring Dashboard
      • Context
      • Dataflow and deployment model
      • Database: storage & aggregation
      • User interface & use cases
      • Open issues & future work
      • Summary
  • Talk 2: Beyond XRootD monitoring
      • HTTP/WebDAV integration
      • Integration in the WLCG Transfers Dashboard

10 – April - 14 A.Beche – Federated Workshop

SLIDE 3

XRootD federation monitoring

[Plot: number of sites reporting (y-axis: # sites)]

  • Activity started during summer 2012
  • 4 sites for FAX, 11 for AAA

Monitoring data increased accordingly:

            July 2012    March 2014
    AAA     606k         43M
    FAX     15k          22M

SLIDE 4

Why monitoring?

  • Understand data flows to estimate data traffic
  • Provide information for efficient operations
  • Identify access patterns and propose data placement strategies

SLIDE 5

XRootD monitoring dataflow

[Diagram: Federation -> (UDP, real time) -> GLED Collector -> (STOMP) -> ActiveMQ -> (STOMP, asynchronous) -> Consumer -> database (raw data; stats every 10 minutes) -> Web API -> Dashboard UI and external applications]
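The consumer step of this dataflow can be sketched as below. This is a minimal illustration, not the dashboard's actual consumer: it assumes JSON message bodies, and every field name (`server_site`, `client_site`, `start_time`, `end_time`, `read_bytes`) is hypothetical. The broker connection is left out; `on_message` stands for whatever callback a STOMP client would invoke, and `sink` stands for the database insert.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferRecord:
    # Hypothetical schema: the real message layout is not given in the slides.
    server_site: str
    client_site: str
    start_time: int  # Unix seconds
    end_time: int    # Unix seconds
    read_bytes: int


def parse_message(body: str) -> TransferRecord:
    """Turn one JSON message body into a typed record."""
    doc = json.loads(body)
    return TransferRecord(
        server_site=doc["server_site"],
        client_site=doc["client_site"],
        start_time=int(doc["start_time"]),
        end_time=int(doc["end_time"]),
        read_bytes=int(doc["read_bytes"]),
    )


class BatchingConsumer:
    """Collects parsed records and hands them to `sink` in batches."""

    def __init__(self, sink, batch_size=2):
        self.sink = sink              # e.g. a function doing a bulk insert
        self.batch_size = batch_size
        self._pending = []

    def on_message(self, body: str) -> None:
        self._pending.append(parse_message(body))
        if len(self._pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is pending, even if the batch is not full."""
        if self._pending:
            self.sink(list(self._pending))
            self._pending.clear()
```

Batching keeps the number of database round trips low when the message rate is high; `flush` would be called on shutdown so that no record is lost.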

SLIDE 6

GLED Deployment model

[Plots: EOS monitoring data rate (Hz); Federation monitoring data rate (Hz)]

Message broker: AMQ @ CERN (shared cluster, 5 nodes)

GLED collectors:

  • AAA: UCSD (16 Hz)
  • EOS: CERN (10 Hz)
  • EOS: CERN (150 Hz)
  • FAX US: SLAC (9 Hz)
  • FAX EU: CERN (1 site)

SLIDE 7

Consolidated dataflow

  • Two uses of these raw data:
      • Dashboard monitoring
      • XRootD popularity
  • Both now share the same database:
      • Storage optimization
      • Consistency guaranteed

SLIDE 8

Database

  • AAA: ~300 GB, ~1B records
  • FAX: ~600 GB, ~2B records
  • Daily insert: 2 GB / 6M rows
  • Storage
      • Raw, statistics, metadata
      • Tables daily partitioned, no global indexes

[Plot: Database usage growth* (GB)]

* Indexes excluded
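As a quick consistency check on these figures, the average row size can be worked out from each pair of numbers. The sketch below assumes decimal gigabytes (the slide does not say GB or GiB) and treats the rounded values on the slide as exact.

```python
GB = 10**9  # assumption: decimal gigabytes


def bytes_per_row(size_gb: float, rows: float) -> float:
    """Average storage per row, indexes excluded as on the slide."""
    return size_gb * GB / rows


daily = bytes_per_row(2, 6e6)    # daily insert: 2 GB / 6M rows
aaa = bytes_per_row(300, 1e9)    # AAA: ~300 GB, ~1B records
fax = bytes_per_row(600, 2e9)    # FAX: ~600 GB, ~2B records

print(f"daily insert: {daily:.0f} B/row, AAA: {aaa:.0f} B/row, FAX: {fax:.0f} B/row")
```

All three come out at roughly 300 bytes per row, so the quoted totals and the daily insert rate describe rows of about the same size.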

SLIDE 9

Database

  • Raw data aggregation:
      • Done using PL/SQL procedures
      • Events are unordered
      • Stateless: full re-computation of touched bins each time
  • Compute stats from raw data in 10 min bins
  • Aggregate 10 min stats in daily bins
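The stateless scheme can be illustrated in a few lines of Python. This is only a sketch of the idea, not the PL/SQL procedures used in production: it assumes each raw event is an `(end_time, bytes)` pair and is assigned to the bin containing its end time, and the function names are made up for the example. Because events arrive unordered, a late event simply marks its bin as touched, and that bin is rebuilt from the raw data rather than incremented.

```python
from collections import defaultdict

BIN_SECONDS = 600    # 10-minute bins
DAY_SECONDS = 86400


def bin_start(ts: int) -> int:
    """Start of the 10-minute bin containing timestamp ts."""
    return ts - ts % BIN_SECONDS


def recompute_touched_bins(raw_events, new_events, stats):
    """Store new_events, then rebuild every bin they touch from raw data.

    raw_events: list of (end_time, nbytes), in any order (grows in place)
    stats:      dict bin_start -> (transfers, bytes), updated in place
    """
    raw_events.extend(new_events)
    touched = {bin_start(ts) for ts, _ in new_events}
    # Stateless: old values of touched bins are discarded and recounted.
    rebuilt = {b: [0, 0] for b in touched}
    for ts, nbytes in raw_events:
        b = bin_start(ts)
        if b in rebuilt:
            rebuilt[b][0] += 1
            rebuilt[b][1] += nbytes
    for b, (count, total) in rebuilt.items():
        stats[b] = (count, total)
    return touched


def daily_from_bins(stats):
    """Second level: sum the 10-minute stats into daily bins."""
    daily = defaultdict(lambda: [0, 0])
    for b, (count, total) in stats.items():
        day = b - b % DAY_SECONDS
        daily[day][0] += count
        daily[day][1] += total
    return {day: tuple(values) for day, values in daily.items()}
```

Re-running the procedure on the same input gives the same result, which is what makes a failed or repeated run harmless.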

SLIDE 10

Aggregation methods

[Figure: transfers drawn on a timeline from 2pm to 7pm]

Easy method:

    Transfers   1    2    1
    Bytes       10   15   20

SLIDE 11

Aggregation methods

[Figure: transfers drawn on a timeline from 2pm to 7pm, aggregated with both methods]

Easy method:

    Transfers   1    2    1
    Bytes       10   15   20

Adopted method:

    Transfers   1 (1)   1 (0)   2 (0)      3 (2)        1 (1)
    Bytes       8       1       14 (9+6)   15 (1+9+5)   5
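The slide does not spell out the rule behind the adopted method. One plausible reading, sketched below as an assumption, is that each transfer is counted as active in every hourly bin it overlaps, counted as finished (the number in parentheses) in the bin where it ends, and that its bytes are spread over the bins in proportion to the overlap time, i.e. at a constant rate. Function names are illustrative.

```python
BIN_SECONDS = 3600  # hourly bins, as on the 2pm..7pm axis


def split_transfer(start: int, end: int, nbytes: float):
    """Return {bin_start: (active, finished, bytes)} for one transfer.

    Assumes a constant transfer rate between start and end.
    """
    if end <= start:
        raise ValueError("transfer must have a positive duration")
    duration = end - start
    shares = {}
    b = start - start % BIN_SECONDS
    while b < end:
        bin_end = b + BIN_SECONDS
        overlap = min(end, bin_end) - max(start, b)
        finished = 1 if end <= bin_end else 0
        shares[b] = (1, finished, nbytes * overlap / duration)
        b = bin_end
    return shares


def aggregate(transfers):
    """Sum per-bin shares over an iterable of (start, end, nbytes)."""
    totals = {}
    for start, end, nbytes in transfers:
        for b, (active, finished, share) in split_transfer(start, end, nbytes).items():
            a, f, n = totals.get(b, (0, 0, 0.0))
            totals[b] = (a + active, f + finished, n + share)
    return totals
```

With the easy method the same transfer would land entirely in one bin, which overstates the traffic of that bin for long transfers.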

SLIDE 12

Visualization Interface

SLIDE 13

Pre-defined set of views

SLIDE 14

Use case example

Understand site access patterns

  • 1. Which sites are reading from FNAL
  • 2. Zoom to a specific site to understand which users are reading
  • 3. Understand which files are read by a user
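The three drill-down steps amount to successive group-by queries on the transfer records. The sketch below works on an in-memory list; the record layout, the site names other than FNAL, the user names and the file paths are all invented for illustration.

```python
from collections import Counter

# Each record: (server_site, client_site, user, file_name, bytes_read).
# Hypothetical layout and sample values, for illustration only.
records = [
    ("FNAL", "SiteA", "alice", "/store/f1.root", 100),
    ("FNAL", "SiteA", "bob", "/store/f2.root", 50),
    ("FNAL", "SiteB", "alice", "/store/f1.root", 25),
    ("Other", "SiteA", "alice", "/store/f3.root", 10),
]


def reading_sites(records, server_site):
    """Step 1: bytes read from server_site, per client site."""
    totals = Counter()
    for server, client, _user, _file, nbytes in records:
        if server == server_site:
            totals[client] += nbytes
    return totals


def reading_users(records, server_site, client_site):
    """Step 2: zoom into one client site, bytes per user."""
    totals = Counter()
    for server, client, user, _file, nbytes in records:
        if server == server_site and client == client_site:
            totals[user] += nbytes
    return totals


def files_read_by(records, server_site, user):
    """Step 3: bytes per file read by one user from server_site."""
    totals = Counter()
    for server, _client, rec_user, file_name, nbytes in records:
        if server == server_site and rec_user == user:
            totals[file_name] += nbytes
    return totals


print(reading_sites(records, "FNAL").most_common())
```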

SLIDE 15

Data popularity

  • XRootD monitoring provides information about file access patterns:
      • Including non-official collections (i.e. user files)
      • Helps to simplify the usage of disk resources and make it more efficient
  • Popularity data analytics built on this information:
      • Already adopted for CMS-EOS
      • Will be extended to full AAA

SLIDE 16

Archive recommendation for CMS-EOS

  • Help to manage the disk space of EOS including user space
      • No central bookkeeping system
  • Unused files: created > 4 months ago, no access in the last 3 months:
      • ~500 TB of space occupied and not used <=> 30% of total for these areas
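The "unused file" criterion can be written as a simple predicate. This is a sketch under stated assumptions: months are approximated as 30 days, a file with no recorded access at all is treated as not accessed, and the function name is made up for the example.

```python
from datetime import datetime, timedelta

MIN_AGE = timedelta(days=4 * 30)    # "created > 4 months ago", 30-day months assumed
MAX_IDLE = timedelta(days=3 * 30)   # "no access in the last 3 months"


def is_unused(created: datetime, last_access, now: datetime) -> bool:
    """True if the file is old enough and has not been read recently.

    last_access may be None when monitoring has never seen an access.
    """
    old_enough = now - created > MIN_AGE
    idle = last_access is None or now - last_access > MAX_IDLE
    return old_enough and idle


now = datetime(2014, 4, 10)
print(is_unused(datetime(2013, 10, 1), datetime(2013, 12, 1), now))
```

The age condition keeps recently written files out of the recommendation even if nobody has read them yet.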

SLIDE 17

Open issues

  • Missing servers:
      • dCache sites
  • Servers should provide their site name.
      • CMS: only 5 sites:
          • anon, BUDAPEST, Hephy-Vienna, T2_US_USCD, UKI-LT2-Brunel
          • Naming convention is not coherent
      • ATLAS: GLED RPM to be deployed
  • GLED Collector improvements:
      • Reliability of the service:
          • Recovery time can be long due to the time difference
          • GLED should be operated as a production service
      • Scalability:
          • To be fixed with automatic reconnection soon
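The incoherent naming is easy to detect automatically. The sketch below checks the five reported CMS names against a pattern of the form `T<tier>_<country code>_<name>`; that pattern is an assumption about the intended convention, not something stated on the slide.

```python
import re

# Assumed convention: T<tier>_<two-letter country code>_<site name>.
SITE_NAME = re.compile(r"T[0-3]_[A-Z]{2}_[A-Za-z0-9_]+")


def follows_convention(name: str) -> bool:
    """True if the whole name matches the assumed pattern."""
    return SITE_NAME.fullmatch(name) is not None


reported = ["anon", "BUDAPEST", "Hephy-Vienna", "T2_US_USCD", "UKI-LT2-Brunel"]
nonconforming = [name for name in reported if not follows_convention(name)]
print(nonconforming)
```

A check like this could run in the consumer and flag servers whose reported site name needs to be fixed at the source.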

SLIDE 18

Future work

  • Strong requirement from ATLAS to understand efficiency:
      • Need the concept of error / failure
      • How could the XRootD server be instrumented to report it?
  • European GLED collector is up and running:
      • Only 1 pilot site is reporting to it (CNAF)
      • Should we keep it?
  • Data mining activity (not started yet):
      • Almost 2 years of raw data (1 TB)

SLIDE 19

Data Mining

  • Extract further knowledge from the data…
      • Detect inefficiencies
      • Propose deletion strategies
      • Define data placement
  • … by
      • Understanding access patterns and data usage
      • Correlating data traffic and data access performance
  • Possibility to automate some operations

SLIDE 20

Application usage

[Plots: dashboard application usage for FAX and AAA]

SLIDE 21

Summary

  • Monitoring federations is a challenge
      • High rate of traffic & information
      • Challenge met by data aggregation, scalable technologies
  • Dashboard is not actively used
      • Fewer than 10 daily users (FAX), fewer than 15 (AAA)
      • Is any functionality missing?
  • Improvement work is ongoing
      • New requests are coming
  • XRootD monitoring is one piece of the entire data transfers puzzle
      • See next talk

SLIDE 22

Beyond XRootD monitoring

A.Beche D.Giordano

SLIDE 23

Outline

  • Talk 1: XRootD Monitoring Dashboard
      • Context
      • Dataflow and deployment model
      • Database: storage & aggregation
      • User interface & use cases
      • Open issues & future work
      • Summary
  • Talk 2: Beyond XRootD monitoring
      • HTTP/WebDAV integration
      • Integration in the WLCG Transfers Dashboard

SLIDE 24

HTTP Federation is coming

  • HTTP protocol will be used in the future
      • XRootD servers can be accessed over HTTP
      • See Fabrizio’s presentation on xrdhttp
  • Two kinds of access:
      • Pure HTTP access (through Apache)
      • HTTP gate to XRootD server
  • They can’t be monitored in the same way

SLIDE 25

Monitoring XRootD access protocol

  • XRootD 4 will now report the user protocol:
      • The whole monitoring chain needs to be updated
      • Dashboard DB and UI are fully ready

[Figure legend: HTTP, XRootD]

SLIDE 26

HTTP/WebDAV federation monitoring

[Diagram elements: Site, GLED collector, ActiveMQ, job, XRootD Federation, XRootD, Site SE]

SLIDE 27

HTTP/WebDAV federation monitoring

[Diagram elements: Site, GLED collector, ActiveMQ, job, XRootD Federation, XRootD, Site SE, HTTP Federation, second Site]

SLIDE 28

HTTP/WebDAV federation monitoring

[Diagram elements: Site, GLED collector, ActiveMQ, two jobs, XRootD Federation, HTTP Federation, XRootD, Xrd HTTP, Site, Site SE]

29 November 2013 Alexandre Beche - ITTF

SLIDE 29

HTTP/WebDAV federation monitoring

[Diagram elements: Site, GLED collector, ActiveMQ, three jobs, XRootD Federation, HTTP Federation, XRootD, Xrd HTTP, Apache, Site, Site SE]

SLIDE 30

HTTP/WebDAV federation monitoring

[Diagram elements: Site, GLED collector, ActiveMQ, three jobs, XRootD Federation, HTTP Federation, XRootD, Xrd HTTP, Apache, Site, Site SE, and an open question mark "?"]

SLIDE 31

How to compare data from different applications?

SLIDE 32

Data transfers & accesses monitoring tools

[Diagram: three separate tools (WLCG, FAX, AAA), each with its own Web API / UI; data sources: FAX, EOS, AAA, EOS, FTS]

SLIDE 33

WLCG Transfers Dashboard

Federated approach

[Diagram: three tools (FTS, FAX, AAA), each with its own Web API / UI; data sources: FAX, EOS, AAA, EOS, FTS; a WLCG Transfers Dashboard API / UI federating the three]
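The federated approach can be pictured as one API that forwards the same query to each underlying tool and merges the answers. The sketch below is purely illustrative: the backends are plain Python callables standing in for the FTS, FAX and AAA Web APIs, and the query parameter and row fields are hypothetical.

```python
def federated_query(backends, **params):
    """Send the same query to every backend and tag each row with its source.

    backends: dict name -> callable(**params) returning a list of dict rows
    """
    merged = []
    for name, fetch in backends.items():
        for row in fetch(**params):
            merged.append({"source": name, **row})
    return merged


# Stand-ins for the per-tool Web APIs (hypothetical data and fields).
backends = {
    "FTS": lambda vo: [{"vo": vo, "bytes": 10}],
    "FAX": lambda vo: [{"vo": vo, "bytes": 20}] if vo == "atlas" else [],
    "AAA": lambda vo: [{"vo": vo, "bytes": 30}] if vo == "cms" else [],
}

rows = federated_query(backends, vo="atlas")
print(rows)
```

Keeping the source tag on every row is what allows cross-technology comparisons afterwards, and a new tool only needs to be added to the `backends` mapping.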

SLIDE 34

Some plots

[Plot legends: FTS, XRootD; ATLAS, CMS, LHCb, ALICE]
SLIDE 35

Summary

  • A lot of effort has been put into the XRootD monitoring workflow and dashboard in the last 2 years
      • Reliable system achieved
      • Lots of use cases covered
  • HTTP monitoring has already started
      • Will require a lot of effort to reach XRootD monitoring level
  • New WLCG Transfers Dashboard architecture
      • Highly extensible system
      • Cross-VO or cross-technology analysis

SLIDE 36

Credits

  • Andreeva Julia
  • Cons Lionel
  • Giordano Domenico
  • Saiz Pablo
  • Tadel Matevz
  • Tuckett David
  • Vukotic Ilija
  • The AAA and FAX deployment team
  • ….

SLIDE 37

Useful links

  • AAA Dashboard:
      • http://dashb-cms-xrootd-transfers.cern.ch
  • FAX Dashboard:
      • http://dashb-atlas-xrootd-transfers.cern.ch
  • CHEP materials:
      • https://indico.cern.ch/abstractDisplay.py?abstractId=101&confId=214784
      • https://indico.cern.ch/getFile.py/access?contribId=94&sessionId=6&resId=0&materialId=slides&confId=214784
      • https://indico.cern.ch/getFile.py/access?contribId=265&sessionId=6&resId=1&materialId=slides&confId=214784
  • Xbrowse framework:
      • https://twiki.cern.ch/twiki/bin/view/ArdaGrid/XbrowseFramework

Thanks for your attention
