ALICE Detector Control System Management and Organization
Peter Chochula, Mateusz Lechman, for the ALICE Controls Coordination Team
Outline
The ALICE experiment at CERN
Organization of the controls activities
Design goals and strategy
European Organization for Nuclear Research
Conseil Européen pour la Recherche Nucléaire
Main function: to provide particle accelerators and other infrastructure needed for high-energy physics research
22 member states + wide cooperation: 105 nationalities
2500 employees + 12000 associated members of personnel
Main project: the Large Hadron Collider (LHC)
Strong formal foundation for fulfilling duties
Experiment level: Collaboration Board, Management Board, Technical Board, Finance Board, Offline Board, Physics Board
Project level: ACC team, individual sub-detector projects, DAQ, TRG, Offline groups, ...
Roles: Technical Coordinator, Project Leaders, Controls Board, Controls Coordinator
Brings together the sub-detector groups, the groups providing the external services (IT, gas, electricity, ...), the DAQ, Trigger and Offline systems, and the LHC machine
Composed of the ALICE Controls Coordinator + one representative per sub-detector
The principal steering group for the DCS project; reports to the Technical Coordinator
Over 100 developers from all around the world and from many different institutes
Many sub-detector teams had limited expertise in controls, so central support was essential
The ACC team:
Provides infrastructure, guidelines and tools, consultancy, and integration
Cooperates with other CERN experiments/groups
[Diagram: JCOP ecosystem: ALICE (sub-detectors; DAQ, TRG, Offline groups), ATLAS, CMS, LHCb, CERN (BE/ICS), IT services (database, network, cyber security), Electronics Pool, common vendors, and CERN infrastructure services (gas, cooling, ventilation)]
Coordination Board
defining the strategy for JCOP and steering its implementation
Technical working group
Defining and reviewing the architecture and the components: the SCADA, field bus, PLC brands, etc.
Setting the priorities for the availability of services and components in a way which is, as much as possible, compatible with the needs of all the experiments
Finding the resources for the implementation of the program of work
Identifying and resolving issues which jeopardize the completion of the program as agreed, in time, and with the available resources
Promoting technical discussions and training to ensure the buy-in of all parties to the agreed strategy
User Requirements Document (URD)
Overview drawings
Meetings and workshops
Description and requirements of sub-systems:
Functionality
Devices / equipment (including their location, link to …)
Parameters used for monitoring/control
Interlocks and safety aspects
Operational and supervisory aspects
Requirements on the control system:
Interlocks and safety aspects
Operational and supervisory aspects
For each phase:
Design, production and purchasing, installation, …
Examples:
Standard ways of measuring temperatures
Control of HV systems
Monitoring of LV power supplies
Standardization covers the full chain, from operator to electronics
Establish communication with all the involved parties
To overcome cultural differences: start coordinating early
HEP environment: original developers tend to drift away; (apart from a few exceptions) it is very difficult to ensure continuity for the control systems in the projects
In many small detector projects, controls is done only part-time
The DCS must:
follow the evolution of the experiment equipment
follow the evolution of the use of the system
follow the evolution of the users
19 autonomous detector systems
100 WINCC OA systems
>100 subsystems
1 000 000 supervised parameters
200 000 OPC items
100 000 frontend services
270 crates
1200 network attached devices
170 control computers
>700 embedded computers
User Interface Layer: intuitive human interface
Operations Layer: hierarchy and partitioning by FSM
Controls Layer: core, SCADA based
Device Abstraction Layer: OPC and FED servers
Field Layer: DCS devices
[Diagram: a WINCC OA SCADA system is built from managers (UI, Control, API, Data, Event, Drivers); the DIST manager connects systems into one distributed system]
100 WINCC OA systems, 2700 managers
A WINCC OA distributed system is created for each detector
The ALICE DCS is a distributed system consisting of autonomous distributed systems
Direct connections between detectors are not permitted
Central systems collect information and re-distribute it to the detectors
Violations and anomalies ('illegal' connections) are addressed
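A minimal C++ sketch of this data-exchange rule (names and structure hypothetical, not the actual ALICE software): detector systems publish values only to a central broker, which redistributes them to subscribers, so no detector ever connects to another directly.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: a central broker mediates all cross-detector
// data exchange, so detector systems never connect to each other.
class CentralBroker {
public:
    using Callback = std::function<void(const std::string& tag, double value)>;

    // A detector subscribes to a tag published by another detector.
    void subscribe(const std::string& tag, Callback cb) {
        subscribers_[tag].push_back(std::move(cb));
    }

    // A detector publishes a value; the broker redistributes it.
    void publish(const std::string& tag, double value) {
        for (auto& cb : subscribers_[tag]) cb(tag, value);
    }

private:
    std::map<std::string, std::vector<Callback>> subscribers_;
};
```

Any direct detector-to-detector link bypassing such a mediator is the 'illegal' connection the slide refers to.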
[Diagram: DCS computing infrastructure: DB servers (ORACLE, size: 5.4 TB), fileservers, worker nodes, servers for frontend services]
Standard components are used wherever possible
Fieldbuses and links: ETHERNET, EASYNET, CAN, JTAG, VME, RS232, PROFIBUS, custom links…
Many different communication channels may coexist within the same detector
[Diagram: standardized device → device driver → OPC server; the WINCC OA OPC client exchanges commands and status with the OPC server over a standardized interface (DCOM)]
200 000 OPC items in ALICE
[Diagram: custom device → device driver → ? → WINCC OA; for custom devices there is no standard path for commands and status]
Custom software is maintained by the institutes and runs across hundreds of embedded computers (ARM Linux)
[Diagram: the solution: a FED (DIM) server wraps the custom interface; a FED (DIM) client connects it to WINCC OA]
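As an illustration of the FED (DIM) server side, here is a minimal sketch using the DIM C++ API; the service and command names (TPC/FEE/...) and readSensor() are hypothetical placeholders, not actual ALICE names.

```cpp
#include <dis.hxx>   // DIM server API
#include <unistd.h>  // sleep()

// Hypothetical FED-style server: publishes one monitored value and
// accepts a configuration command from the supervisory layer.
class ConfigCommand : public DimCommand {
public:
    // "C" format: the command payload is a string
    ConfigCommand() : DimCommand("TPC/FEE/CONFIG", "C") {}
    void commandHandler() override {
        const char* cmd = getString();
        (void)cmd;
        // ... apply the configuration to the front-end electronics ...
    }
};

int main() {
    float temperature = 0.0f;
    DimService tempService("TPC/FEE/TEMPERATURE", temperature);
    ConfigCommand configCmd;

    DimServer::start("TPC_FEE_SERVER");  // register with the DIM name server

    while (true) {
        temperature = 25.0f;  // readSensor() stands in for real hardware access
        tempService.updateService();
        sleep(1);
    }
}
```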
[Diagrams: commands, data and synchronization exchanged between the FED server and the front-end electronics, over dedicated links (e.g. MXI) or over DIM via the DCS control board]
DCS control board (~750 used in ALICE)
500 FEE servers, 2 FED servers
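And the matching client side, again a hedged sketch with the same hypothetical service names: it subscribes to the published value and sends a command back, which is essentially what a FED client does on behalf of WINCC OA.

```cpp
#include <dic.hxx>   // DIM client API
#include <cstdio>
#include <unistd.h>

// Hypothetical FED-style client: subscribes to a published value and
// reacts in a callback; commands go back through DimClient::sendCommand.
class TempInfo : public DimInfo {
public:
    // -1.0f is the "no link" value delivered if the server is down
    TempInfo() : DimInfo("TPC/FEE/TEMPERATURE", -1.0f) {}
    void infoHandler() override {
        printf("temperature update: %.2f\n", getFloat());
    }
};

int main() {
    TempInfo temp;  // the subscription starts on construction
    DimClient::sendCommand("TPC/FEE/CONFIG", (char*)"default.cfg");
    while (true) pause();  // updates arrive asynchronously in infoHandler()
}
```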
Hierarchical approach: central control → detector → subsystem → device
Based on the CERN toolkit (SMI++); each node modelled as an FSM; integrated with WINCC OA
1 top DCS node
19 detector nodes, 100 subsystems, 5000 logical devices, 10000 leaves
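A minimal C++ sketch of the hierarchical FSM idea (illustrative only, not the SMI++ toolkit): commands propagate down the tree and each node derives its own state from its children.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <vector>

// Sketch of a hierarchical FSM node: commands propagate down the tree,
// states are summarized upward toward the top DCS node.
struct FsmNode {
    std::string name;
    std::string state = "OFF";
    std::vector<std::shared_ptr<FsmNode>> children;

    void command(const std::string& cmd) {
        for (auto& c : children) c->command(cmd);  // propagate down
        if (children.empty() && cmd == "GO_READY") state = "READY";
        state = summarize();
    }

    // A parent is READY only if every child is READY.
    std::string summarize() const {
        if (children.empty()) return state;
        bool allReady = std::all_of(children.begin(), children.end(),
            [](const std::shared_ptr<FsmNode>& c) { return c->state == "READY"; });
        return allReady ? "READY" : "MIXED";
    }
};
```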
Top-level detector states: OFF, STANDBY, STANDBY_CONFIGURED, BEAM_TUNING, READY
[Diagram: a simple device FSM: OFF, with a GO_ON transition to ON]
Some detectors require cooling before they turn on the low voltage, but the frontend will freeze if cooling is present without low voltage
Unconfigured chips might burn (high current) if powered, but the chips can be configured only once powered ON
[Diagram: OFF and ON states with the GO_ON transition, now subject to additional checks]
Checks before GO_ON: Am I authorized? Is cooling OK? Is the LHC OK? Are the magnets OK? Is a run in progress? Are the counting rates OK?
An originally simple operation becomes complex in the real experiment environment; cross-system dependencies are introduced.
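The guard list above maps naturally onto code. A hedged sketch (the predicates are hypothetical placeholders for real checks against DCS, LHC and run control): GO_ON runs only after every cross-system condition passes, and the first failing guard explains the refusal.

```cpp
#include <functional>
#include <string>
#include <vector>

// Sketch: GO_ON is executed only once every cross-system guard passes.
struct Guard {
    std::string description;
    std::function<bool()> check;
};

bool tryGoOn(const std::vector<Guard>& guards, std::string& refusal) {
    for (const auto& g : guards) {
        if (!g.check()) { refusal = g.description; return false; }
    }
    // ... perform the actual GO_ON transition here ...
    return true;
}

// Usage (all predicates are hypothetical placeholders):
// std::vector<Guard> guards = {
//     {"operator authorized", [] { return true; }},
//     {"cooling OK",          [] { return true; }},
//     {"LHC OK",              [] { return true; }},
//     {"magnets OK",          [] { return true; }},
//     {"run state OK",        [] { return true; }},
//     {"counting rates OK",   [] { return true; }},
// };
```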
A single operator controls ALICE; a failing part is removed from the hierarchy and handed over to a remote expert
Small room for developments and improvements during operation
Excluded parts can be operated in parallel
Certain LHC operations might be potentially dangerous for the detectors; detectors can be protected by modified settings (lower HV, …)
But: excluded parts do not receive the command!
[Diagram: the DCS hierarchy (DET: HV, LV, FEE, … down to individual channels); an excluded branch does not receive the protection command]
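Building on the FSM sketch above, a short illustration of the caveat (again purely illustrative): a command broadcast skips excluded branches, so they silently keep their previous, possibly unsafe settings.

```cpp
#include <memory>
#include <string>
#include <vector>

// Sketch of the exclusion caveat: excluded branches are skipped,
// so a protective command never reaches them.
struct PartitionNode {
    std::string name;
    bool excluded = false;   // excluded parts run outside central control
    std::string hvSetting = "NOMINAL";
    std::vector<std::shared_ptr<PartitionNode>> children;

    void broadcast(const std::string& cmd) {
        if (excluded) return;            // <-- the command stops here!
        if (cmd == "LOWER_HV") hvSetting = "SAFE";
        for (auto& c : children) c->broadcast(cmd);
    }
};
// An excluded channel keeps hvSetting == "NOMINAL" even after LOWER_HV,
// which is exactly the hazard the slide warns about.
```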
The original simple FSM layout got complex with time, with a potential risk of human errors in operation
A set of intuitive panels and embedded procedures replaced direct FSM operation
24/7 shift coverage during ALICE operation periods
High turnaround of operators, specific to HEP
Shifter training and on-call service provided by the ACC team
Requires clear, extensive documentation, understandable also by non-experts
On-call expert reachable during operation with beams
Remote access for interventions
In critical periods, detector shifts might be manned by experts
A very rare and punctual activity, e.g. a few hours when heavy-ion running starts
Fatal (high priority): imminent danger, immediate reaction required
Error (middle priority): severe condition which does not represent imminent danger but shall be treated without delay
Warning (low priority): early warning about a possible problem, does not represent any imminent danger
Reaction to DCS alerts (classes fatal and error) is one of the main duties of the central operator
Warnings:
Under the responsibility of subsystem shifters/experts
No reaction expected from the central operator
Displayed on a dedicated screen
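A small sketch of this triage rule (routing targets and types are hypothetical; the classes and their handling are taken from the slides): fatal and error alerts go to the central operator, warnings stay with the subsystem screen.

```cpp
#include <string>

// Sketch of the alert triage described above (illustrative types).
enum class AlertClass { Warning, Error, Fatal };

struct Alert {
    AlertClass cls;
    std::string message;
};

// Fatal and error alerts are for the central operator; warnings stay
// with the subsystem shifters on a dedicated screen.
std::string routeAlert(const Alert& a) {
    switch (a.cls) {
        case AlertClass::Fatal:   // imminent danger: react immediately
        case AlertClass::Error:   // severe: treat without delay
            return "CENTRAL_OPERATOR";
        case AlertClass::Warning: // early warning: no central reaction
        default:
            return "SUBSYSTEM_SCREEN";
    }
}
```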
Standard HW: provided by ACC; rules for accepting non-standard HW
Rules for accepting non-standard components
Access control and user privileges: 2 levels, operators and experts
File import and export rules
DCS services require numerous software and hardware assets
It is essential to ensure that reliable and accurate information about all of them (configuration items, CIs) is available
CIs are recorded in different configuration databases at CERN
Configuration Management System: integrated view on all the CIs
Repository for software
MS SharePoint: document management and collaboration (before, TWiki & custom ACC webpages were in use)
JIRA: issue tracking
Content includes:
Technical documentation for experts
Operational procedures
Training materials
DCS Computing Rules
Known Errors register
Operation reports
Publications
...
Standardization is the key to success
The experiment environment evolves rapidly; scalability and flexibility play an important role in DCS design
A stable central team contributes to the conservation of expertise
Central operation must cope with a large number of operators:
Adequate and flexible operation tools, automation
Easily accessible, explicit procedures
The experiment world is dynamic and volatile; it requires a major coordination effort
The ALICE DCS has provided excellent and uninterrupted service
The number of calls decreased significantly over time (from …), thanks to:
More automation
Better training and documentation
Better procedures
Better UIs that make operation more intuitive (hiding the underlying complexity)