SLIDE 1

Overload of Frontier lpad by MC Overlay

Elizabeth Gallas (Oxford)

ADC Weekly Meeting

April 15, 2014

SLIDE 2

Overview

  • Problem:
    • MC Overlay jobs cause Frontier overload on the grid
  • Aspects of the issue:
    • Conditions aspects of MC Overlay jobs
    • Conditions folders of interest
    • Conditions DB & COOL
    • Conditions deployment on the grid (Frontier & DB Releases)
    • MC Overlay task deployment on the grid
    • Reconstruction software

 → how these aspects, in combination, result in overload

Caveat … these are bits & pieces

  • which I am aware of and which seem relevant to the discussion
  • Limited time to collect metrics

 → This is an open discussion !

 → Corrections, additions: welcome !


SLIDE 3

MC Overlay jobs

  • Overlay real “zero bias” events on simulated events
  • An exception to the norm wrt Conditions:
    • Access data from multiple Conditions instances:
      • COMP200 (Run 1 real data conditions)
      • OFLP200 (MC conditions)

 → This is not thought to contribute to the problem

  • What seems exceptional and notable is that the conditions data volume needed by each job to reconstruct these events:
    • Is much greater (10-200x; estimates vary …) than the conditions volume of typical reco
    • Is greater than the event data volume of each job
      • Event volume: a few hundred events ? × 1.5 MB

 → the metadata is larger than the data itself … (rough arithmetic sketched below)
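As a back-of-envelope check of that claim, using the slide’s own estimates; the “typical reco” conditions figure below is a hypothetical placeholder for illustration, not a measurement:

```python
# Back-of-envelope only: n_events and event_size_mb are the slide's
# estimates; typical_conditions_mb is a HYPOTHETICAL placeholder.
n_events = 300                              # "a few hundred events ?"
event_size_mb = 1.5                         # MB per event
event_volume_mb = n_events * event_size_mb  # ~450 MB of event data/job

typical_conditions_mb = 50                  # hypothetical typical-reco figure
low  = 10  * typical_conditions_mb          # slide's 10x low end
high = 200 * typical_conditions_mb          # slide's 200x high end
print(f"event data : ~{event_volume_mb:.0f} MB per job")
print(f"conditions : ~{low}-{high} MB per job")
# Even at the 10x low end, the conditions volume rivals the event data.
```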


SLIDE 4

Conditions deployment on the grid

  • 2 modes of access (direct Oracle access demoted):
    • DB Release files or Frontier
  • MC Overlay can’t use just the default DB Release:
    • It doesn’t contain real data conditions
    • So it is using Frontier
  • Alastair: An unusual aspect of the overload is that it is actually bringing down Frontier servers: reboot required !
    → try to understand the cause of the comatosis (more later on this)
  • Could we use a DB Release (for some/all conditions)? From Misha:
    • Yes, sure, it’s possible to make a DBRelease for these data
    • DB access could also be mixed in any way (Frontier + DB Release)
    • Finding the folder list: the main role of the DBRelease-on-Demand system
      • it’s release- and jobOption-specific
    • Problem: how to distribute it
      • Before, HOTDISK was used
      • Now it should be CVMFS: requires a new approach
    • DB Release size … can’t be known without studying
  • Alastair:
    • CVMFS likes small files, not large ones …


SLIDE 5

Conditions DB and Athena IOVDbSvc

  • IOVDbSvc:
    • gets conditions in a time window wider than the actual request
    • So each conditions retrieval probably contains a bit more data than might be needed by the job
    • This mechanism is generally very effective in reducing subsequent queries in related time windows

 → Unsure if this mechanism is helping here

 → It depends on whether subsequent zero bias events in the same job are in the Run/LB range of the retrieved conditions (see the sketch below)
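A minimal sketch of the cache-ahead behaviour described above, under simplified assumptions (illustrative code, not the actual IOVDbSvc implementation; the class and parameter names are invented):

```python
# Illustrative cache-ahead retrieval; NOT the real IOVDbSvc code.
class IovWindowCache:
    def __init__(self, fetch, pad):
        self.fetch = fetch    # hypothetical callable: (since, until) -> rows
        self.pad = pad        # extra validity time cached around each request
        self.window = None    # (since, until, rows) currently held

    def get(self, t):
        # Hit: the requested time lies inside the cached window,
        # so no new database query is issued.
        if self.window and self.window[0] <= t < self.window[1]:
            return self.window[2]
        # Miss: retrieve a window wider than the actual request,
        # hoping later lookups land inside it.
        since, until = t - self.pad, t + self.pad
        rows = self.fetch(since, until)
        self.window = (since, until, rows)
        return rows
```

For ordinary reco, consecutive events usually fall inside the cached window; for zero bias overlay, each event may come from an unrelated Run/LB range, so nearly every get() can be a miss.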


SLIDE 6

MC Overlay task deployment

  • Assumptions about how these tasks are deployed:
    • Related jobs (same MC process id) are deployed to specific sites (or clouds), and each requires a unique set of zero bias events over all the jobs
    • Each of the “related” jobs:
      • is in clouds using the same Squids and/or Frontier
      • accesses the conditions needed for the zero bias events being overlaid
    • The conditions being accessed are always distinct

 → this completely undermines any benefit of Frontier caching (queries are always unique)

  • Multiply this by the hundreds/thousands of jobs in the task, each retrieving distinct conditions

 → obvious stress on the system (toy illustration below)
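A toy calculation to make the caching point concrete (job counts and query labels are illustrative, not measured):

```python
# Toy illustration: unique queries defeat Squid/Frontier caching.
def cache_hit_fraction(queries):
    seen, hits = set(), 0
    for q in queries:
        hits += q in seen     # a repeat is served from cache
        seen.add(q)
    return hits / len(queries)

# Typical reco: 1000 jobs repeating the same conditions query.
shared = ["conditions_run213486"] * 1000
# Overlay: 1000 jobs, each with a distinct zero bias IOV range.
distinct = [f"conditions_iov_{i}" for i in range(1000)]

print(cache_hit_fraction(shared))    # 0.999: almost everything cached
print(cache_hit_fraction(distinct))  # 0.0: every query reaches Oracle
```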


SLIDE 7

Query evaluation:

  • Folder of interest, identified via Frontier logs:
    • ATLAS_COOLONL_TDAQ.COMP200_F0063_IOVS
    • IOV: 1351385600000000000-1351390400000000000
  • Evaluate this specific query:
    • Run/LB range: run 213486, LB 612–700 (part of run 213486)
    • Folder: /TDAQ/OLC/BUNCHLUMIS in COOLONL_TDAQ/COMP200 (bunch-wise luminosity !)
    • IOV basis: TIME (not Run/LB)
    • Channel count: 4069
      • channels retrieved are generally fewer … depends on IOV
    • Payload:
      • RunLB (UInt63)
      • AverageRawInstLum (Float)
      • BunchRawInstLum (Blob64k) → LOB !! Large Object !!
      • Valid (UInt32)
  • The query retrieves 2583 rows, each including LOBs:
    • number of rows >> number of LBs (~80)
    • This is the nature of the folder being used (query shape sketched below)
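Schematically, the query has roughly this shape, reconstructed from the folder, IOV and payload names on this slide (the SQL that COOL actually emits through Frontier is more involved, e.g. it also picks up the IOV straddling the window start):

```python
# Rough reconstruction of the query shape; NOT the exact SQL COOL generates.
SQL = """
SELECT IOV_SINCE, IOV_UNTIL, CHANNEL_ID,
       RUNLB,                -- UInt63
       AVERAGERAWINSTLUM,    -- Float
       BUNCHRAWINSTLUM,      -- Blob64k: a LOB, retrieved row by row
       VALID                 -- UInt32
  FROM ATLAS_COOLONL_TDAQ.COMP200_F0063_IOVS
 WHERE IOV_SINCE < :until    -- :until = 1351390400000000000
   AND IOV_UNTIL > :since    -- :since = 1351385600000000000
 ORDER BY CHANNEL_ID, IOV_SINCE
"""
# With 4069 channels and time-based IOVs inside this ~80-LB window,
# the predicate matches 2583 rows, each carrying a BUNCHRAWINSTLUM LOB.
```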



SLIDE 8

… more about LOBs …

Folder accessed has a LOB payload (Large Object)

  • Back to COOL (and via Frontier):
    • LOB access from COOL is not the same as access to other payload column types:
      • There is some more back/forth communication between the client (Frontier) and Oracle
      • Rows are retrieved individually (see the sketch below)
    • Always a question: can LOB access be improved ?
    • Also: is there something about Frontier and LOBs that might cause the Frontier failure ?
      • It doesn’t happen with single jobs
      • Only seems to occur when loaded above a certain level

 → no individual query in these jobs results in data throughput beyond the system capacity

 → it is somehow triggered by load
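A sketch of the per-row LOB cost, shown with Python’s cx_Oracle purely for illustration (Frontier itself is a Java servlet, but any Oracle client faces the same pattern; the SQL reuses the reconstruction from slide 7):

```python
# Illustrative only: generic Oracle-client LOB pattern, not Frontier code.
import cx_Oracle

LOB_SQL = """SELECT RUNLB, AVERAGERAWINSTLUM, BUNCHRAWINSTLUM, VALID
               FROM ATLAS_COOLONL_TDAQ.COMP200_F0063_IOVS
              WHERE IOV_SINCE < :until AND IOV_UNTIL > :since"""

def fetch_rows(conn, since, until):
    cur = conn.cursor()
    cur.arraysize = 1000  # scalar columns arrive in efficient batches...
    cur.execute(LOB_SQL, since=since, until=until)
    for runlb, avg_lum, bunch_lum, valid in cur:
        # ...but BUNCHRAWINSTLUM (Blob64k) arrives as a LOB locator, and
        # this read() costs an extra client<->server exchange per row:
        # ~2583 extra round trips for the query evaluated on slide 7.
        yield runlb, avg_lum, bunch_lum.read(), valid
```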


SLIDE 9

No system has infinite capacity

  • General ATLAS Database Domain goal:
    • Develop and deploy systems which can deliver any data in databases needed by jobs
      • Even large volumes when needed
  • In reality: capacity, bandwidth, etc. … are not infinite
    • So consider ways to moderate requests but still satisfy the use cases
  • In this case, bunch-wise luminosity is being retrieved:
    • More channels are being retrieved than are being used
      • Inefficiency in the COOL callback mechanism
      • Improvement to the folders already planned for Run 2
  • Thanks to Eric, Mika (lumi experts), Andy (MC experts) for critical feedback
    • I asked in email off-thread … answers on the next slide:
    • Is bunch-wise luminosity really needed ?


SLIDE 10

Is bunch-wise lumi needed ?

  • Andy:
    • … not doing anything special for overlay for lumi info … running standard reco … must be the default for standard reco of data as well … What’s unusual for overlay is that each event can be from a different LB, whereas for data the events are mostly from the same LB within a job.
  • Eric:
    • Yes of course, and this could trigger the IOVDbSvc to constantly reload this information for every event from COOL.
    • … per-BCID luminosity … used by LAr as a part of standard reco since (early) 2012 … used to predict the LAr noise as a function of position in the bunch train from out-of-time pileup. I don’t know exactly what happens in the overlay job, but presumably it also accesses this information to find the right mix of events.


SLIDE 11

Attempt at a summary

Aspects of conditions implementation, usage and deployment all seem to conspire … no one smoking gun

  • DB caching mechanisms:
    • completely undermined by this pattern of access
  • Software: using default reconstruction for luminosity
    • Bunch-wise corrections are needed for real data reco (LAr)
      • NO problems with this in deployment: it should not change !
    • But is the default overkill for zero bias overlay ?
      • Would the BCID-averaged luminosity suffice (use a different folder) ?
      • Eliminates the need for LOB access in this use case
  • Conditions DB/COOL and Frontier:
    • COOL side: no obvious culprit … BLOB sizes vary
    • Frontier: evaluate the cause of failure with LOBs under high load
  • Task deployment:
    • DB Release option ? any other ideas ?
  • Please be patient:
    • Must find the best overall long-term solution for this case
    • Without undermining software which is critical for other use cases
    • Use this use case to study the bottlenecks
