SLIDE 1

Overload of Frontier lpad by MC Overlay

Elizabeth Gallas (Oxford)

ADC Weekly Meeting

April 15, 2014

SLIDE 2

Overview

  • Problem:
    • MC Overlay jobs cause Frontier overload on the grid
  • Aspects of the issue:
    • Conditions aspects of MC Overlay jobs
    • Conditions folders of interest
    • Conditions DB & COOL
    • Conditions deployment on the grid (Frontier & DB Releases)
    • MC Overlay task deployment on the grid
    • Reconstruction software

 → how these aspects, in combination, result in overload

Caveat … these are bits & pieces

  • which I am aware of and which seem relevant to the discussion
  • Limited time to collect metrics

 → This is an open discussion !

 → Corrections, additions: welcome !


SLIDE 3

MC Overlay jobs

  • Overlay real “zero bias” events on simulated events
  • An exception to the norm wrt Conditions:
    • Access data from multiple Conditions instances:
      • COMP200 (Run 1 real data conditions)
      • OFLP200 (MC conditions)

 → This is not thought to contribute to the problem

  • What seems exceptional and notable is that the conditions data volume needed by each job to reconstruct these events:
    • Is much greater (10-200x; estimates vary …) than the conditions volume of typical reco
    • Is greater than the event data volume of each job
      • Event volume: a few hundred events ? × 1.5 MB

 → the metadata is larger than the data itself … (rough arithmetic sketched below)
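As a back-of-envelope check of that claim, using the slide’s own estimates; the “typical reco” conditions figure below is a hypothetical placeholder for illustration, not a measurement:

```python
# Back-of-envelope only: n_events and event_size_mb are the slide's
# estimates; typical_conditions_mb is a HYPOTHETICAL placeholder.
n_events = 300                              # "a few hundred events ?"
event_size_mb = 1.5                         # MB per event
event_volume_mb = n_events * event_size_mb  # ~450 MB of event data/job

typical_conditions_mb = 50                  # hypothetical typical-reco figure
low  = 10  * typical_conditions_mb          # slide's 10x low end
high = 200 * typical_conditions_mb          # slide's 200x high end
print(f"event data : ~{event_volume_mb:.0f} MB per job")
print(f"conditions : ~{low}-{high} MB per job")
# Even at the 10x low end, the conditions volume rivals the event data.
```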


SLIDE 4

Conditions deployment on the grid

  • 2 modes of access (direct Oracle access demoted):
    • DB Release files or Frontier
  • MC Overlay can’t use just the default DB Release:
    • It doesn’t contain real data conditions
    • So it is using Frontier
  • Alastair: An unusual aspect of the overload is that it is actually bringing down Frontier servers: reboot required !
    → try to understand the cause of the comatosis (more later on this)
  • Could we use a DB Release (for some/all conditions)? From Misha:
    • Yes, sure, it’s possible to make a DBRelease for these data
    • DB access could also be mixed in any way (Frontier + DB Release)
    • Finding the folder list: the main role of the DBRelease-on-Demand system
      • it’s release- and jobOption-specific
    • Problem: how to distribute it
      • Before, HOTDISK was used
      • Now it should be CVMFS: requires a new approach
    • DB Release size … can’t be known without studying
  • Alastair:
    • CVMFS likes small files, not large ones …


SLIDE 5

Conditions DB and Athena IOVDbSvc

  • IOVDbSvc:
    • gets conditions in a time window wider than the actual request
    • So each conditions retrieval probably contains a bit more data than might be needed by the job
    • This mechanism is generally very effective in reducing subsequent queries in related time windows

 → Unsure if this mechanism is helping here

 → It depends on whether subsequent zero bias events in the same job are in the Run/LB range of the retrieved conditions (see the sketch below)
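A minimal sketch of the cache-ahead behaviour described above, under simplified assumptions (illustrative code, not the actual IOVDbSvc implementation; the class and parameter names are invented):

```python
# Illustrative cache-ahead retrieval; NOT the real IOVDbSvc code.
class IovWindowCache:
    def __init__(self, fetch, pad):
        self.fetch = fetch    # hypothetical callable: (since, until) -> rows
        self.pad = pad        # extra validity time cached around each request
        self.window = None    # (since, until, rows) currently held

    def get(self, t):
        # Hit: the requested time lies inside the cached window,
        # so no new database query is issued.
        if self.window and self.window[0] <= t < self.window[1]:
            return self.window[2]
        # Miss: retrieve a window wider than the actual request,
        # hoping later lookups land inside it.
        since, until = t - self.pad, t + self.pad
        rows = self.fetch(since, until)
        self.window = (since, until, rows)
        return rows
```

For ordinary reco, consecutive events usually fall inside the cached window; for zero bias overlay, each event may come from an unrelated Run/LB range, so nearly every get() can be a miss.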


SLIDE 6

MC Overlay task deployment

  • Assumptions about how these tasks are deployed:
    • Related jobs (same MC process id) are deployed to specific sites (or clouds), and each requires a unique set of zero bias events over all the jobs
    • Each of the “related” jobs:
      • is in clouds using the same Squids and/or Frontier
      • accesses the conditions needed for the zero bias events being overlaid
    • The conditions being accessed are always distinct

 → this completely undermines any benefit of Frontier caching (queries are always unique)

  • Multiply this by the hundreds/thousands of jobs in the task, each retrieving distinct conditions

 → obvious stress on the system (toy illustration below)
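A toy calculation to make the caching point concrete (job counts and query labels are illustrative, not measured):

```python
# Toy illustration: unique queries defeat Squid/Frontier caching.
def cache_hit_fraction(queries):
    seen, hits = set(), 0
    for q in queries:
        hits += q in seen     # a repeat is served from cache
        seen.add(q)
    return hits / len(queries)

# Typical reco: 1000 jobs repeating the same conditions query.
shared = ["conditions_run213486"] * 1000
# Overlay: 1000 jobs, each with a distinct zero bias IOV range.
distinct = [f"conditions_iov_{i}" for i in range(1000)]

print(cache_hit_fraction(shared))    # 0.999: almost everything cached
print(cache_hit_fraction(distinct))  # 0.0: every query reaches Oracle
```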


SLIDE 7

Query evaluation:

  • Folder of interest, identified via Frontier logs:
    • ATLAS_COOLONL_TDAQ.COMP200_F0063_IOVS
    • IOV: 1351385600000000000-1351390400000000000
  • Evaluate this specific query:
    • Run/LB range: run 213486, LB 612–700 (part of run 213486)
    • Folder: /TDAQ/OLC/BUNCHLUMIS in COOLONL_TDAQ/COMP200 (bunch-wise luminosity !)
    • IOV basis: TIME (not Run/LB)
    • Channel count: 4069
      • channels retrieved are generally fewer … depends on IOV
    • Payload:
      • RunLB (UInt63)
      • AverageRawInstLum (Float)
      • BunchRawInstLum (Blob64k) → LOB !! Large Object !!
      • Valid (UInt32)
  • The query retrieves 2583 rows, each including LOBs:
    • number of rows >> number of LBs (~80)
    • This is the nature of the folder being used (query shape sketched below)
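Schematically, the query has roughly this shape, reconstructed from the folder, IOV and payload names on this slide (the SQL that COOL actually emits through Frontier is more involved, e.g. it also picks up the IOV straddling the window start):

```python
# Rough reconstruction of the query shape; NOT the exact SQL COOL generates.
SQL = """
SELECT IOV_SINCE, IOV_UNTIL, CHANNEL_ID,
       RUNLB,                -- UInt63
       AVERAGERAWINSTLUM,    -- Float
       BUNCHRAWINSTLUM,      -- Blob64k: a LOB, retrieved row by row
       VALID                 -- UInt32
  FROM ATLAS_COOLONL_TDAQ.COMP200_F0063_IOVS
 WHERE IOV_SINCE < :until    -- :until = 1351390400000000000
   AND IOV_UNTIL > :since    -- :since = 1351385600000000000
 ORDER BY CHANNEL_ID, IOV_SINCE
"""
# With 4069 channels and time-based IOVs inside this ~80-LB window,
# the predicate matches 2583 rows, each carrying a BUNCHRAWINSTLUM LOB.
```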



SLIDE 8

… more about LOBs …

Folder accessed has a LOB payload (Large Object)

  • Back to COOL (and via Frontier):
    • LOB access from COOL is not the same as access to other payload column types:
      • There is some more back/forth communication between the client (Frontier) and Oracle
      • Rows are retrieved individually (see the sketch below)
    • Always a question: can LOB access be improved ?
    • Also: is there something about Frontier and LOBs that might cause the Frontier failure ?
      • It doesn’t happen with single jobs
      • Only seems to occur when loaded above a certain level

 → no individual query in these jobs results in data throughput beyond the system capacity

 → it is somehow triggered by load
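A sketch of the per-row LOB cost, shown with Python’s cx_Oracle purely for illustration (Frontier itself is a Java servlet, but any Oracle client faces the same pattern; the SQL reuses the reconstruction from slide 7):

```python
# Illustrative only: generic Oracle-client LOB pattern, not Frontier code.
import cx_Oracle

LOB_SQL = """SELECT RUNLB, AVERAGERAWINSTLUM, BUNCHRAWINSTLUM, VALID
               FROM ATLAS_COOLONL_TDAQ.COMP200_F0063_IOVS
              WHERE IOV_SINCE < :until AND IOV_UNTIL > :since"""

def fetch_rows(conn, since, until):
    cur = conn.cursor()
    cur.arraysize = 1000  # scalar columns arrive in efficient batches...
    cur.execute(LOB_SQL, since=since, until=until)
    for runlb, avg_lum, bunch_lum, valid in cur:
        # ...but BUNCHRAWINSTLUM (Blob64k) arrives as a LOB locator, and
        # this read() costs an extra client<->server exchange per row:
        # ~2583 extra round trips for the query evaluated on slide 7.
        yield runlb, avg_lum, bunch_lum.read(), valid
```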


SLIDE 9

No system has infinite capacity

  • General ATLAS Database Domain goal:
    • Develop and deploy systems which can deliver any data in databases needed by jobs
      • Even large volumes when needed
  • In reality: capacity, bandwidth, etc. … are not infinite
    • So consider ways to moderate requests but still satisfy the use cases
  • In this case, bunch-wise luminosity is being retrieved:
    • More channels are being retrieved than are being used
      • Inefficiency in the COOL callback mechanism
      • Improvement to the folders already planned for Run 2
  • Thanks to Eric, Mika (lumi experts), Andy (MC experts) for critical feedback
    • I asked in email off-thread … answers on the next slide:
    • Is bunch-wise luminosity really needed ?


SLIDE 10

Is bunch-wise lumi needed ?

  • Andy:
    • … not doing anything special for overlay for lumi info … running standard reco … must be the default for standard reco of data as well … What’s unusual for overlay is that each event can be from a different LB, whereas for data the events are mostly from the same LB within a job.
  • Eric:
    • Yes of course, and this could trigger the IOVDbSvc to constantly reload this information for every event from COOL.
    • … per-BCID luminosity … used by LAr as a part of standard reco since (early) 2012 … used to predict the LAr noise as a function of position in the bunch train from out-of-time pileup. I don’t know exactly what happens in the overlay job, but presumably it also accesses this information to find the right mix of events.


SLIDE 11

Attempt at a summary

Aspects of conditions implementation, usage and deployment all seem to conspire … no one smoking gun

  • DB caching mechanisms:
    • completely undermined by this pattern of access
  • Software: using default reconstruction for luminosity
    • Bunch-wise corrections are needed for real data reco (LAr)
      • NO problems with this in deployment: it should not change !
    • But is the default overkill for zero bias overlay ?
      • Would the BCID-averaged luminosity suffice (use a different folder) ?
      • Eliminates the need for LOB access in this use case
  • Conditions DB/COOL and Frontier:
    • COOL side: no obvious culprit … BLOB sizes vary
    • Frontier: evaluate the cause of failure with LOBs under high load
  • Task deployment:
    • DB Release option ? any other ideas ?
  • Please be patient:
    • Must find the best overall long-term solution for this case
    • Without undermining software which is critical for other use cases
    • Use this use case to study the bottlenecks
