ATLAS Full Dress Rehearsals & Common Computing Readiness Challenges: a forward look - PowerPoint PPT Presentation



SLIDE 1

ATLAS Full Dress Rehearsals & Common Computing Readiness Challenges
a forward look

ISGC 2008, 7-11 April 2008, Academia Sinica, Taipei, Taiwan

Kors Bos, CERN/NIKHEF, ATLAS


SLIDE 2

ALMOST READY TO GO: the ATLAS Main Control Room

SLIDE 3

Top view of the open detector with the ECT moved off the beam line (before lowering of the Small Wheel)

2-April-2008, DOE/NSF JOG#21, ATLAS status report


SLIDE 4

Installation Status

  • The detector is now open and in its ‘long shutdown’ configuration:
    – Forward Muon Wheels (Big Wheels) have been retracted against the end-wall structures
    – Endcap Toroids were moved off the beam line into parking positions; Endcap Calorimeters are in the open (3 m retracted) position
    – Both Shielding Disk JD/Small Wheels were lowered at the end of February
  • Inner detectors are all in place. All detectors, except Pixels, are also cabled and operational. Pixel electrical connections are well advanced and all, including cooling, should be finished in the coming week, followed by global tests and sign-off, foreseen for mid to end April. This is the critical path of ATLAS.
  • Endcap Calorimeter electronics refurbishing has been completed and work on the Barrel electronics is progressing well. All work is foreseen to be completed by early April.
  • Both Endcap Toroids have been tested individually up to 50% and 75% current respectively. The ATLAS detector must be closed (run configuration) before the overall magnet tests at full power.
  • All Muon Barrel Chambers are now mechanically installed, including the special chambers on the Endcap Toroids. Here the critical path is the commissioning of the RPC chambers and the installation of all CAEN power supplies (late delivery).
  • The Muon end-wall chamber installation is well advanced but not fully completed. There are still some chambers to be installed (about 3 weeks of work) on both sides A and C. This can also be done when the beam pipe is already fully closed.

SLIDE 5

Installation Schedule Version 9.3 for Completing the Detector


SLIDE 6

Updated information available at: http://hcc.web.cern.ch/hcc/

5-April-2008

The progress on the cooldown has been good!


SLIDE 7

3 principal activities for the next challenge in May

  • 1. T0 processing and data distribution
  • 2. T1 data re-processing
  • 3. T2 Simulation Production

Fully rely on srmv2 everywhere
Test now at real scale (need disk space now!)
Test the full show: shifts, communication, etc.


SLIDE 8

-1- T0 processing and data distribution

  • Starts on May 5 for 4 weeks
  • Use M6 data and data generator
  • Test of new small file merging schema
  • Simulate running of 10 hours @ 200 Hz per day
    – nominal is 50,000 seconds = 14 hours
  • Distribution of data to T1's and T2's
  • Request T1 storage classes ATLASDATADISK and ATLASDATATAPE for disk and tape
  • Request T1 storage space for full 4 weeks


SLIDE 9

[Diagram: SFO1–SFO5 feed the T0 (calibration, reconstruction, checksum, merging) and its reconstruction farm; RAW is archived to tape, and RAW, ESD and AOD are distributed to the T1's and on to the T2's]

SLIDE 10

ATLAS Data @ T0

  • Raw data arrives on disk and is archived to tape
  • initial processing provides ESD, AOD, TAG and NTUP
  • a fraction (10%) of RAW, ESD and AOD is made available on disk
  • RAW data is distributed by ratio over the T1's to go to tape
  • AOD is copied to each T1 to remain on disk
  • ESD follows the RAW to the T1
  • a second ESD copy is sent to the paired T1
  • we may change this distribution for early running


SLIDE 11

[Diagram: Tier-0 dataflow for ATLAS DATA. RAW lands in the t0atlas pool at the Tier-0; at the Tier-1's, RAW goes to ATLASDATATAPE and ESD/AOD/TAG to ATLASDATADISK; group-analysis AOD goes to ATLASGRP<name> and end-user analysis AOD to ATLASENDUSER at the Tier-2's/Tier-3's]


SLIDE 12

Data sample per day

10 hrs @ 200 Hz = 7.2 Mevents/day. In the T0:

  • 11.5 TB/day RAW to tape
  • 1.2 TB/day RAW to disk (10%)
  • 7.2 TB/day ESD to disk
  • 1.4 TB/day AOD to disk

10-day t0atlas buffer must be 98 TByte

T1       Share   Tape      Disk
BNL      25 %    2.9 TB    9 TB
IN2P3    15 %    1.7 TB    4 TB
SARA     15 %    1.7 TB    4 TB
RAL      10 %    1.2 TB    3 TB
FZK      10 %    1.2 TB    3 TB
CNAF      5 %    0.6 TB    2 TB
ASGC      5 %    0.6 TB    2 TB
PIC       5 %    0.6 TB    2 TB
NDGF      5 %    0.6 TB    2 TB
Triumf    5 %    0.6 TB    2 TB

RAW = 1.6 MB/ev, ESD = 1 MB/ev, AOD = 0.2 MB/ev
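
The per-day numbers above follow directly from the event rate and the per-event sizes quoted at the bottom of the slide. A minimal sketch of that arithmetic is shown below; it is an illustration added here, not part of the original presentation, and the variable names are mine.

```python
# Back-of-the-envelope check of the per-day volumes on this slide.
# Assumptions taken from the slide: 10 h/day of data taking at 200 Hz,
# RAW = 1.6 MB/ev, ESD = 1.0 MB/ev, AOD = 0.2 MB/ev, 10% of RAW kept on
# disk, and a t0atlas buffer sized for 10 days of disk-resident data.

events_per_day = 10 * 3600 * 200             # 7.2 million events/day
MB_PER_TB = 1e6                               # decimal units, as on the slide

sizes_mb = {"RAW": 1.6, "ESD": 1.0, "AOD": 0.2}
tb_per_day = {fmt: events_per_day * mb / MB_PER_TB for fmt, mb in sizes_mb.items()}

raw_disk_per_day = 0.10 * tb_per_day["RAW"]   # only 10% of RAW stays on disk
buffer_tb = 10 * (raw_disk_per_day + tb_per_day["ESD"] + tb_per_day["AOD"])

print(f"RAW to tape : {tb_per_day['RAW']:.1f} TB/day")   # ~11.5
print(f"RAW to disk : {raw_disk_per_day:.1f} TB/day")    # ~1.2
print(f"ESD to disk : {tb_per_day['ESD']:.1f} TB/day")   # 7.2
print(f"AOD to disk : {tb_per_day['AOD']:.1f} TB/day")   # ~1.4
print(f"10-day t0atlas buffer: {buffer_tb:.0f} TB")      # ~98

# Per-T1 tape need: each site archives its MoU share of the RAW stream.
shares = {"BNL": 0.25, "IN2P3": 0.15, "SARA": 0.15, "RAL": 0.10, "FZK": 0.10,
          "CNAF": 0.05, "ASGC": 0.05, "PIC": 0.05, "NDGF": 0.05, "Triumf": 0.05}
for site, share in shares.items():
    print(f"{site:6s} tape: {share * tb_per_day['RAW']:.1f} TB/day")
```

The Disk column of the table presumably combines each site's ESD share (and paired-T1 ESD copies) with its full AOD copy, so it is not derived in the sketch.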


SLIDE 13

Tape & Disk Space Requirements for the 4 weeks of CCRC

10 hrs @ 200 Hz = 7.2 Mevents/day; CCRC runs for 4 weeks = 28 days. In the T0:

  • 322 TB RAW to tape
  • 32 TB RAW to disk (10%)
  • 202 TB ESD to disk
  • 39 TB AOD to disk

atldata disk must be 273 TB

T1       Share   Tape     Disk
BNL      25 %    81 TB    252 TB
IN2P3    15 %    48 TB    112 TB
SARA     15 %    48 TB    112 TB
RAL      10 %    34 TB     84 TB
FZK      10 %    34 TB     84 TB
CNAF      5 %    17 TB     56 TB
ASGC      5 %    17 TB     56 TB
PIC       5 %    17 TB     56 TB
NDGF      5 %    17 TB     56 TB
Triumf    5 %    17 TB     56 TB

RAW = 1.6 MB/ev, ESD = 1 MB/ev, AOD = 0.2 MB/ev
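
These 4-week totals are simply the per-day figures of SLIDE 12 scaled by 28 days, per site and in total. The short sketch below (again my own illustration, not from the presentation) reproduces the table from those per-day numbers.

```python
# Scale the SLIDE 12 per-day figures to the 28-day CCRC period.
DAYS = 28

# T0 per-day volumes in TB, taken from SLIDE 12
t0_per_day_tb = {"RAW to tape": 11.5, "RAW to disk (10%)": 1.15,
                 "ESD to disk": 7.2, "AOD to disk": 1.4}
for item, per_day in t0_per_day_tb.items():
    print(f"{item:18s}: {per_day * DAYS:4.0f} TB")         # ~322, 32, 202, 39

# atldata disk holds everything except the tape copy of RAW
atldata_tb = DAYS * sum(v for k, v in t0_per_day_tb.items() if k != "RAW to tape")
print(f"atldata disk      : {atldata_tb:.0f} TB")           # ~273

# Per-T1 (tape TB/day, disk TB/day) from the SLIDE 12 table
t1_per_day = {"BNL": (2.9, 9), "IN2P3": (1.7, 4), "SARA": (1.7, 4),
              "RAL": (1.2, 3), "FZK": (1.2, 3), "CNAF": (0.6, 2),
              "ASGC": (0.6, 2), "PIC": (0.6, 2), "NDGF": (0.6, 2),
              "Triumf": (0.6, 2)}
for site, (tape, disk) in t1_per_day.items():
    print(f"{site:6s}: tape {tape * DAYS:5.1f} TB, disk {disk * DAYS:4.0f} TB")
```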


SLIDE 14

Throughput during February CCRC

  • Generated data files of realistic sizes
    – RAW to all T1's to tape
    – ESD and AOD to all T1's to disk
    – Ramped up to nominal rates
    – Full Computing Model with MoU shares
  • Relatively good throughput achieved
    – Sustained 700 MB/s for 2 days
    – Peaks above 1.1 GB/s for several hours
    – Errors understood and fixed

[Plot: throughput in MByte/sec per day]


SLIDE 15

U.S. Tier 2's


SLIDE 16

ATLAS Data @ T1

  • T1's are for data archive and (re-)processing
  • And for group analysis on ESD (and AOD data)
  • A share of RAW data goes to tape @ T1
  • Each T1 receives a copy of all AOD files
  • Each T1 receives a share of the ESD files
    – In total 3 copies of all ESD files world-wide

Space Token      Storage Type   Used for        Size
ATLASDATADISK    T0D1           ESD, AOD, TAG   By Share
ATLASDATATAPE    T1D0           RAW             By Share


SLIDE 17

ATLAS Data @ T2

  • T2's are for Monte Carlo Simulation Production
  • ATLAS assumes there is no tape storage available
  • Also used for Group analysis
    – Each physics group has its own space token ATLASGRP<name>
    – E.g. ATLASGRPHIGGS, ATLASGRPSUSY, ATLASGRPMINBIAS
    – Some initial volume for testing: 2 TB
  • T2's may request AOD datasets
    – Defined by the primary interest of the physics community
    – Another full copy of all AOD's should be available in the cloud
  • Also for End-User Analysis
    – Accounted as T3 activity, not under ATLAS control
    – Storage space not accounted as ATLAS
    – But almost all T2's (and even T1's) need space for the token ATLASENDUSER
    – Some initial value for testing: 2 TB

Space Token       Storage Type   Used For        Size [TB]
ATLASDATADISK     T0D1           AOD, TAG        2
ATLASGRP<name>    T0D1           Group Data      2
ATLASENDUSER      T0D1           End-User Data   2


SLIDE 18

-2- T1 re-processing

  • Not at full scale (yet)
  • M5 data staged back from tape per dataset
  • Conditions data on disk (140 files)
    – Each re-processing job opens ~35 of those files
  • M5 data file copied to local disk of the WN
  • Output ESD and AOD files
    – Kept on disk and archived on tape (T1D1 storage class)
    – ESD files copied to one or two other T1's
    – AOD files copied to all other T1's

Space Token          Storage Type   Used for                                            Size
ATLASDATADISKTAPE    T1D1           ESD, AOD, TAG from re-processing of detector data   By Share


SLIDE 19


SLIDE 20

Space requirements for M5 re-processing

  • M5 RAW data was distributed over the T1's
  • Total data volume 60 TB (3 days)
  • Only (small) ESD output from re-processing
  • So minimal requirements for the T1D1 pool
  • Re-processing must run 3 times faster than initial processing to achieve 3 passes per year
  • So, should aim to re-process M5 every day


SLIDE 21

-3- T2 Simulation Production

  • Simulation of physics and background for FDR-2
  • Need to produce ~30M events
  • Simulation → HITS (2.8 MB/ev); Digitization → RDO (2 MB/ev)
  • Reconstruction → ESD (1.1 MB/ev), AOD (0.2 MB/ev)
  • Simulation is done at the T2
  • HITS uploaded to T1 and kept on disk
  • In T1: digitization; RDOs sent to BNL for mixing
  • In T1: reconstruction → ESD, AOD
    – ESD, AOD archived to tape at T1
    – ESD copied to one or two other T1's
    – AOD copied to each other T1


SLIDE 22

Tape & Disk Space Requirements for the 4 weeks of CCRC

0.5 Mevents/day; FDR2 production: 8 weeks, 30M events. In total:

  • 84 TB HITS
  • 60 TB RDO → BNL
  • 33 TB ESD
  • 6 TB AOD

atlprod disk must be 39 TB

T1       Share   Tape    Disk
BNL      25 %    0 TB    100 TB
IN2P3    15 %    0 TB     18 TB
SARA     10 %    0 TB     12 TB
RAL      15 %    0 TB     18 TB
FZK      10 %    0 TB     12 TB
CNAF      5 %    0 TB      6 TB
ASGC      5 %    0 TB      6 TB
PIC       5 %    0 TB      6 TB
NDGF      5 %    0 TB      6 TB
Triumf    5 %    0 TB      6 TB

HITS = 2.8 MB/ev, RDO = 2 MB/ev, ESD = 1.1 MB/ev, AOD = 0.2 MB/ev
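
The totals above follow from the 30M-event target multiplied by the per-event sizes listed at the bottom of the slide, with the atlprod disk pool holding the reconstruction output (ESD + AOD). A small illustrative check, my own sketch rather than part of the presentation:

```python
# Check of the FDR-2 simulation-production volumes on this slide.
n_events = 30e6                                   # ~30M events for FDR-2
sizes_mb = {"HITS": 2.8, "RDO": 2.0, "ESD": 1.1, "AOD": 0.2}

totals_tb = {fmt: n_events * mb / 1e6 for fmt, mb in sizes_mb.items()}
for fmt, tb in totals_tb.items():
    print(f"{fmt}: {tb:.0f} TB")                  # 84, 60, 33, 6

# atlprod disk keeps the reconstruction output (ESD + AOD)
print(f"atlprod disk: {totals_tb['ESD'] + totals_tb['AOD']:.0f} TB")     # ~39

# Production-rate sanity check: 0.5 Mevents/day over 8 weeks
print(f"8 weeks at 0.5 Mevents/day ~ {0.5e6 * 56 / 1e6:.0f} M events")   # ~28M
```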


SLIDE 23

[Diagram: data flow for Simulation Production across Tier-0, Tier-1's and Tier-2's: Simulation (HITS) at the Tier-2's; Pile-up, Digitization (RDO), Reconstruction (ESD, AOD, TAG) and Mixing (RDO to BS) at the Tier-1's; space tokens ATLASMCDISK, ATLASMCTAPE and ATLASPROD]


SLIDE 24

Storage types @ T2 for simulation production

Space Token    Storage Type   Used For   Size [TB]
ATLASMCDISK    T0D1           HITS       Scales with #cpu's

Additional storage types @ T1 for simulation production

Space Token        Storage Type   Used for                       Size
ATLASMCDISK        T0D1           ESD, AOD from other T1's       By Share
ATLASMCTAPE        T1D0           HITS from MC (not yet)         By Share
ATLASMCDISKTAPE    T1D1           ESD, AOD from reconstruction   By Share

SLIDE 25

Full Dress Rehearsal (FDR) Challenge

  • First phase run during first 2 weeks of March
  • Use simulated data with a physics mix at 10^31 luminosity
  • Run 10 hrs @ 200 Hz from pit-1, injected into the output buffers (SFO's)
  • Full T0 operation:
    – Calibration; calibration database may take 24 hrs
    – Data quality; sign-off end of the next day
    – First-pass reconstruction may take 24 hrs
    – If necessary, re-calibration may take another 24 hrs
    – Subscribe data for transfers to T1's & T2's
  • We had planned to concentrate on T0 operations first
  • But it took longer than expected
  • Moreover, the generated data sample was smaller: no fakes
  • Next FDR (at higher luminosity) planned for the first week of June

Main goal of FDR: exercise the data processing chain from P1 to physics analysis

SLIDE 26

Full Dress Rehearsal – Phase I (Example)

Data sporadically included hot LAr cells and noisy/dead crates, spotted (all?) by data quality experts.

Ran fast ID alignment of distorted geometry using a dedicated calibration stream.

[Plots comparing Misaligned, ID-aligned and Ideal geometry]


SLIDE 27

Distribution of FDR data

  • FDR exports
    – 2 runs: day 1+2 together, and day 3
  • Very high failure rate in the first run
    – Certificate-to-pool mapping mistake at T0
    – Understood and fixed
  • Basically 100% success in the second run: all transfers done within minutes!
  • Although very useful for physics, not enough for throughput studies

[Plots: transfer activity per day for the two runs]


SLIDE 28

Computing Operations Planning

[Gantt chart, February through July (durations in weeks): CC Readiness Challenges (4, 4, 8); Production for FDR2 (2); FDR-2 + M6 (1); FDR-2 Re-processing (4); Mixing for FDR2 (2); FDR-1 (2); FDR-1 Re-processing (4); M5/6 Re-processing (4); Throughput Tests (1, 1, 1, 1); Functional Tests (1, 1, 1)]


SLIDE 29

Schedule: March – April

  • Week 13 (24/3 - 30/3): L1Calo + CTP, 3 days; starting work with Calos when ready
  • April, Week 14 (31/3 - 6/4): Calos + L1Calo + HLT, 1 week; start testing tdaq-01-09
  • Week 15 (7/4 - 13/4): Muons 11/4 (WE?); available for systems + CTP tests; decide/install new offline release by 7/4; April 7-10 Tile laser testing, unavailable
  • Week 16 (14/4 - 20/4): Muons 14-16/4, Tile 17/18; available for systems + CTP tests; ID standalone tests w/o CTP; no Tile LB for 10 days, EB available
  • Week 17 (21/4 - 27/4): TDAQ/HLT week, ID ROSs in use; ID standalone tests w/o CTP; BCM integration?

SLIDE 30

Schedule: May

  • April, Week 18 (28/4 - 29/4): L1Calo + Calo run?
  • May, Week 18 (30/4 - 4/5): 3 days TRT + 3 days SCT; sub-systems: transition to tdaq-01-09
  • May, Week 19 (5/5 – 11/5): 2 days ID combined running including Pixel DAQ, towards end of week after transition to 01-09; start of magnet test; ~HLT algos available
  • May, Week 20 (12/5 - 18/5): Calo + L1Calo + HLT; timing, calo DQ, debugging, high rate, algo tests; finish with a stable weekend run? Week days: morning expert work, evening calo + central desks; WE: 24/7 calos + central desks
  • May, Week 21 (19/5 - 25/5): Muon + L1Mu + HLT; same as above; finish with a stable weekend run? with calos? Week days: morning expert work, evening muon (calo?) + central desks; WE: 24/7 muon (calos?) + central desks
  • May, Week 22 (26/5 - 1/6): ID + DAQ + HLT, beam pipe closure; same as above; dedicated DAQ test after detector testing and before HLT testing; finish with a stable weekend run? with muons + calos? Week days: morning expert work, evening ID (muon/calo?) + central desks; WE: 24/7 ID (muon/calos?) + central desks


SLIDE 31

Schedule: June

  • June, Week 23 (2/6 – 8/6): No Tier-0! Magnet test; FDR-2
  • Week 24 (9/6 – 15/6): Magnet test
  • Week 25 (16/6 – 22/6): LHC cold? Magnet test
  • Week 26 (23/6 – 29/6): Magnet test
  • July, Week 27: ATLAS running?


SLIDE 32

The first Higgs in ATLAS … (4th April 2008)

7-April-2008, ATLAS Week Welcome