Paired Redundant IOCs

SLIDE 1

Paired Redundant IOCs with Redundant Hardware

  • S. A. Baily and Eric Bjorklund

UNCLASSIFIED

Operated by Los Alamos National Security, LLC for the U.S. Department of Energy's NNSA

SLIDE 2

Why Does LANSCE Use Redundant IOCs for Its Timing System?

  • Since the 1990s we’ve had redundant hardware for the master timer system.
  • If the master timer stops putting out timing gates, all of our RF stands trip off.
    – It’s not difficult to recover a single stand, but recovering all stands takes a lot of time.
    – Historically, dropping all RF power abruptly at 120 Hz caused city-wide power outages.
    – Power dispatch calls when our electricity usage drops quickly (at lower repetition rates).
  • Without redundancy, maintenance opportunities on the timing system would be very limited.

SLIDE 3

Redundant IOC Software

  • Developed by DESY in collaboration with SLAC.
  • Maintained by DESY.

SLIDE 4

How Redundant IOC Software Works

  • Redundancy Monitoring Task (RMT)
    – Monitors state of health of network and drivers.
  • Continuous Control Executive
    – Synchronizes EPICS databases.
  • RMT Driver
    – Plugs in to RMT.
    – Reports state of health.
    – Reports synchronization status.
    – Accepts commands to enter master or slave states.
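The division of labor on this slide can be sketched in Python. This is only an illustrative model, not the DESY software (which is not written in Python). The names `RmtDriver`, `DriverStatus`, `get_status`, `command`, and `monitor_once` are hypothetical, and the rule "a master steps down when any driver is unhealthy" is an assumed simplification.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    MASTER = "master"
    SLAVE = "slave"


@dataclass
class DriverStatus:
    healthy: bool   # state of health reported by the driver
    in_sync: bool   # synchronization status reported by the driver


class RmtDriver:
    """Hypothetical driver that plugs in to the monitoring task."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.role = Role.SLAVE
        self.healthy = True
        self.in_sync = True

    def get_status(self) -> DriverStatus:
        return DriverStatus(self.healthy, self.in_sync)

    def command(self, role: Role) -> None:
        # Accept a command to enter the master or slave state.
        self.role = role


def monitor_once(drivers: list[RmtDriver], current: Role) -> Role:
    """One polling pass of the (simplified) monitoring task.

    Assumption: a master gives up mastership if any driver is unhealthy;
    otherwise the current role is kept. Promotion of a slave is not modelled.
    """
    statuses = [d.get_status() for d in drivers]
    new_role = current
    if current is Role.MASTER and not all(s.healthy for s in statuses):
        new_role = Role.SLAVE
    for d in drivers:
        d.command(new_role)
    return new_role
```

In this model each driver only reports and obeys; the decision lives in one place, which mirrors the slide's split between the monitoring task and its plug-in drivers.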

SLIDE 5

Why a Redundant IOC Pair?

  • Master Timer
    – Timing Pattern Generator (TPG)
      • VME-64 IOC with solution and zero crossing detector.
      • Generates scheduled timing patterns and timing event link.
      • Includes RF gates and triggerable beam sequences.
    – MRF event generators.
    – Gate Enable, Inhibit, and Countdown Controller (GEICCO)
      • cRIO IOC determines which beam events get sent.
      • Triggers actual beam gates (slave event generators in the TPG) in response to operator switches and counters.
    – cRIO with TTL Binary I/O.
  • A second pair provides redundancy.
  • A fiber optic switch (MRF-fanout concentrator) selects which system is used.

SLIDE 6

Hardware Setup

  • Two IOC pairs.
  • Two single points of failure.
    – RF input (may not matter)
    – Event link (unavoidable)
  • Systems monitor each other as part of synchronization.

[Diagram labels: GEICCO-1, TPG-1, RF Dist., RF Input, RF Dist., TPG-2, GEICCO-2, Event-link switch]

SLIDE 7

LANSCE Timing System IOC Functions

  • Timing Pattern Generator (TPG)
    – Generates scheduled timing patterns.
    – Includes RF gates and triggerable beam sequences.
      • MRF event generators (VME-64).
  • Gate Enable, Inhibit, and Countdown Controller (GEICCO)
    – Triggers actual beam gates (slave event generators in the TPG) in response to operator switches and counters.
      • cRIO with TTL Binary I/O.

SLIDE 8

Differs From Traditional Redundant IOC

  • Not two identical IOCs on the same bus.
  • Has redundant hardware.
  • IOCs should run continuously.
  • IOCs should fail over in pairs.

SLIDE 9

Broadly Applicable Improvements

  • Added support for syncing fields with SPC_DBADDR (waveform records).
    – Modified the Continuous Control Executive (CCEXEC) to support larger fields.
  • Made it possible to specify which records should be synced.
    – Configurable via info nodes.
    – Modified e2db so that CapFast can be used to configure info nodes.
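The idea of choosing synced records through info nodes can be illustrated with a small Python model. The slides do not give the actual info node name or value, so the tag `sync` with value `yes`, the record names, and the function `records_to_sync` are all hypothetical placeholders.

```python
# Each record carries optional info nodes, modelled here as a plain dict.
# Record names and the "sync" tag are invented for illustration only.
RECORDS = {
    "TPG:PATTERN":      {"type": "waveform", "info": {"sync": "yes"}},
    "TPG:MODE_CMD":     {"type": "mbbo",     "info": {"sync": "yes"}},
    "TPG:CPU_LOAD":     {"type": "ai",       "info": {}},  # per-IOC statistic
    "TPG:GATE_READBACK": {"type": "bi"},                   # read from local hardware
}


def records_to_sync(records: dict, tag: str = "sync") -> list[str]:
    """Return the names of records whose info node opts them in to syncing."""
    return sorted(
        name
        for name, rec in records.items()
        if rec.get("info", {}).get(tag) == "yes"
    )


if __name__ == "__main__":
    for name in records_to_sync(RECORDS):
        print(name)
```

An opt-in tag like this keeps per-IOC statistics and hardware read-backs out of the synchronized set by default.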

SLIDE 10

Why Not Sync All Records?

  • IOC stats should be individual.
  • Conserve bandwidth.
  • Redundant hardware read-back channels should come from the actual hardware.

SLIDE 11

Redundant Hardware Specific Improvements

  • Added an option to pause only the channel access server instead of the entire IOC.
  • Call post_event on the slave when the database becomes synced with the master (CCEXEC_SLV_INSYNC state) so that passive records can be made to process on the slave.
    – Uses event-driven fanout records.
    – Can control the processing order.
    – Particularly useful for commands.
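The post_event mechanism can be pictured with a minimal Python model: entering the in-sync state posts an event, and records registered on that event process in a chosen order. The event number, the class names `EventTable` and `Slave`, and the name of the not-yet-synced state are assumptions made for this sketch. Only `CCEXEC_SLV_INSYNC` comes from the slides.

```python
from collections import defaultdict
from typing import Callable

INSYNC_EVENT = 10  # hypothetical event number chosen for this sketch


class EventTable:
    """Stand-in for event-scanned records: callbacks keyed by event number."""

    def __init__(self) -> None:
        self._records: dict[int, list[tuple[int, Callable[[], None]]]] = defaultdict(list)

    def register(self, event: int, order: int, process: Callable[[], None]) -> None:
        self._records[event].append((order, process))

    def post_event(self, event: int) -> None:
        # Process records in ascending 'order', mimicking a fanout chain.
        for _, process in sorted(self._records[event], key=lambda item: item[0]):
            process()


class Slave:
    def __init__(self, events: EventTable) -> None:
        self.events = events
        self.state = "NOT_INSYNC"  # hypothetical name for the unsynced state

    def set_state(self, new_state: str) -> None:
        # Post the event only on the transition into the in-sync state.
        entered_sync = new_state == "CCEXEC_SLV_INSYNC" and self.state != new_state
        self.state = new_state
        if entered_sync:
            self.events.post_event(INSYNC_EVENT)
```

Registering two callbacks with orders 1 and 2 and then calling `set_state("CCEXEC_SLV_INSYNC")` runs them in that order exactly once, which is the property that makes the approach useful for commands.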

SLIDE 12

Small “surprises”

  • Database changes should be made live (as well as to the file).
    – If only the file is changed, a rebooted IOC will just resync with its master.
  • Needed individual zero crossing delay record values to make timing match as close as possible.
  • Failover cannot be made fast enough for our needs.
    – There’s a delay of approximately 1 second in addition to all the specified parameters.
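To see why a delay of roughly 1 second is too slow at 120 Hz, the arithmetic can be checked directly. The sketch below uses only the two numbers quoted on the slides (120 Hz and approximately 1 second); the function names are illustrative, and it assumes the worst case in which no timing gates are produced for the whole delay.

```python
def cycle_period_s(rep_rate_hz: float) -> float:
    """Time between consecutive machine cycles."""
    return 1.0 / rep_rate_hz


def cycles_elapsed(delay_s: float, rep_rate_hz: float) -> float:
    """Number of machine cycles that pass during a given delay."""
    return delay_s * rep_rate_hz


if __name__ == "__main__":
    rate_hz = 120.0
    delay_s = 1.0  # "approximately 1 second" from the slide
    print(f"cycle period: {cycle_period_s(rate_hz) * 1e3:.2f} ms")
    print(f"cycles during failover delay: {cycles_elapsed(delay_s, rate_hz):.0f}")
```

A 1 second delay spans on the order of a hundred cycles, whereas responding in time for the next cycle leaves only a few milliseconds.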

SLIDE 13

Managing a Pair of IOCs

  • One of the health monitoring tasks monitors the paired IOC.
  • Choose the IOC from which command-to-fail commands should be issued. The other monitors the switch position.
  • Critical hardware fault failovers are handled by the program in the cRIO.
    – The redundant IOC software then follows.

SLIDE 14

RMT Drivers

  • TPG
    – None, our software tasks
      • Monitors and reports its own health to GEICCO
      • Monitors GEICCO status/health (using asyn)
      • Monitors switch position (using asyn)
      • Issues command-to-fail to follow the switch position
  • GEICCO
    – Monitors LabVIEW status/health
      • Our program sets the switch position when the IOC becomes master
    – Monitors TPG status/health
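The coordination described here (GEICCO sets the switch, TPG follows it) can be sketched as a toy simulation. Everything below is an assumed simplification: the `Pair` class and both functions are hypothetical, and a command-to-fail is modelled as mastership simply moving to the TPG in the pair the switch now selects.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    index: int            # 1 or 2
    geicco_master: bool
    tpg_master: bool


def geicco_becomes_master(pairs: list[Pair], index: int) -> int:
    """GEICCO side: the GEICCO that becomes master selects its own pair.

    Returns the new switch position.
    """
    for p in pairs:
        p.geicco_master = (p.index == index)
    return index


def tpg_follow_switch(pairs: list[Pair], switch_position: int) -> bool:
    """TPG side: a master TPG whose pair is not selected issues command-to-fail.

    Returns True if a command-to-fail was issued.
    """
    must_fail = any(p.tpg_master and p.index != switch_position for p in pairs)
    if must_fail:
        for p in pairs:
            p.tpg_master = (p.index == switch_position)
    return must_fail
```

Starting with pair 1 as master, calling `geicco_becomes_master(pairs, 2)` followed by `tpg_follow_switch(pairs, 2)` leaves both IOCs of pair 2 as masters, so the two IOCs fail over together.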

SLIDE 15

Results

  • Redundant IOC software can be adapted to coordinate multiple IOCs.
    – Requires some custom software tasks to provide faster failover.
    – Our system can detect a dropped 120 Hz cycle, and respond in time for the next cycle.
  • New synchronization control feature (info nodes)
  • New pause-only channel access feature
  • New post_event feature
  • New support for SPC_DBADDR fields and large fields
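The slides do not say how a dropped cycle is detected. One plausible approach, shown here purely as an assumption, is to compare the spacing of observed timing gates against the nominal period. The function name, the timestamp list, and the 50% tolerance are all illustrative choices.

```python
def find_dropped_cycles(gate_times_s: list[float],
                        rep_rate_hz: float = 120.0,
                        tolerance: float = 0.5) -> list[float]:
    """Return the expected times of gates that appear to be missing.

    A gap longer than (1 + tolerance) nominal periods between two
    consecutive gates is treated as at least one dropped cycle; only the
    first missing gate in each gap is reported.
    """
    period = 1.0 / rep_rate_hz
    dropped = []
    for prev, cur in zip(gate_times_s, gate_times_s[1:]):
        if cur - prev > period * (1.0 + tolerance):
            dropped.append(prev + period)
    return dropped


if __name__ == "__main__":
    period = 1.0 / 120.0
    # Gates at cycles 0, 1, 2 and 4: cycle 3 is missing.
    times = [0 * period, 1 * period, 2 * period, 4 * period]
    print(find_dropped_cycles(times))
```

To respond in time for the next cycle, a check like this would have to run well inside one period, which at 120 Hz is about 8.3 ms.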
