SLIDE 1

Machine Learning Considerations

Auralee Edelen SLAC National Accelerator Laboratory Controls Modernization Workshop, FNAL 28 September, 2018

SLIDE 2

Overview

  • Some use-cases for ML

§ Online modeling
§ Virtual diagnostics/reconstruction problems → get previously inaccessible or cumbersome information from the machine
§ Anomaly detection and failure prediction
§ Tuning

  • Some practical considerations

§ Archive data and accessibility
§ Interfaces to control system
§ Computing needs

SLIDE 3

Online Modeling

  • Use a machine model during operation
  • Ideally:
  • Fast-executing, but accurate enough to be useful
  • Use measured inputs directly from machine
  • Combine a priori knowledge + learned parameters
  • Applications:
  • A tool for operators + virtual diagnostic
  • Predictive control
  • Help flag aberrant behavior
  • Bonus: control system development

One approach: faster modeling codes

  • Simpler models (tradeoff with accuracy), analytic calculations
  • Parallelization and GPU-acceleration of existing codes (HPSim/PARMILA, elegant)
  • Improvements to modeling algorithms (e.g. Lorentz-boosted frame)

  • V. Pogorelov, et al., IPAC15, MOPMA035
  • X. Pang, PAC13, MOPMA13
  • e.g. J. Galambos, et al., HPPA5, 2007
  • J.-L. Vay, Phys. Rev. Lett. 98 (2007) 130405

SLIDE 4

Online Modeling

Another approach: machine learning model

  • Once trained, neural networks can execute quickly
  • Train on data from slow, high-fidelity simulations
  • Train on measured data


[Diagram: simulation data + machine data → NN model]

SLIDE 5

Online Modeling

Another approach: machine learning model

An initial study at Fermilab (simulation + machine → NN model):

  • One PARMELA run with 2-D space charge: ~20 minutes
  • One neural network model evaluation: ~a millisecond

  • A. L. Edelen, et al. NAPAC16, TUPOA51
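The timing gap above is the whole argument for surrogate models: pay the simulation cost once, at training time. A minimal sketch of the idea with scikit-learn, where synthetic data stands in for PARMELA runs; the input/output dimensions and network size are illustrative assumptions, not the published model:

```python
# Sketch: train an NN surrogate on (settings -> beam outputs) pairs, then
# evaluate it in a fast forward pass. Synthetic data stands in for slow
# simulation runs; the dimensions and network size are illustrative.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# inputs: e.g. a few machine settings; outputs: a smooth nonlinear map
# standing in for simulated beam parameters
X = rng.uniform(-1.0, 1.0, size=(2000, 3))
y = np.column_stack([np.sin(X[:, 0]) + X[:, 1] ** 2,
                     X[:, 2] * np.cos(X[:, 1])])

scaler = StandardScaler().fit(X)
model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                     random_state=0).fit(scaler.transform(X), y)

# once trained, a forward pass is cheap enough for online use
pred = model.predict(scaler.transform(X[:5]))
print(pred.shape)   # (5, 2)
```

In practice the training set would come from many simulation runs (and, later, measured data), but the train-once / evaluate-in-milliseconds pattern is the same.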
SLIDE 6

Virtual Diagnostics

Predict what diagnostics might look like when they are unavailable or don’t exist.

[Diagram: measured machine inputs → online model (fast-executing simulation) → real-time prediction of beam characteristics or explicit diagnostic output]

SLIDE 7

Virtual Diagnostics

Predict what diagnostics might look like when they are unavailable or don’t exist.

e.g. GPU-accelerated HPSim at LANSCE (based on PARMILA):

  • X. Pang, et al., PAC13, MOPMA13
  • L. Rybarcyk, et al., IPAC15, MOPWI033
  • L. Rybarcyk, HB2016, WEPM4Y01
  • X. Pang, IPAC15, WEXC2
  • X. Pang and L. Rybarcyk, CPC 185, is. 3 (2014)
SLIDE 8

Virtual Diagnostics

Predict what diagnostics might look like when they are unavailable or don’t exist.

[Diagram: measured machine inputs and diagnostic measurements → online model (ML model) → real-time prediction of beam characteristics or explicit diagnostic output]

SLIDE 9

Virtual Diagnostics

Predict what diagnostics might look like when they are unavailable or don’t exist.

[Diagram: measured machine inputs → online model → diagnostic prediction; diagnostic measurements feed training updates back to the model]

SLIDE 10

Virtual Diagnostics

Predict what diagnostics might look like when they are unavailable or don’t exist. The real diagnostic may be:

  • moved to another location
  • destructive, cannot always use
  • blocked for update time

[Diagram: measured machine inputs → online model (ML model) → diagnostic prediction, with diagnostic measurements used for training]

SLIDE 11

Virtual Diagnostics

  • A. Sanchez-Gonzalez, et al. https://arxiv.org/pdf/1610.03378.pdf
  • Used archived data to learn correlation between fast and slow diagnostics
  • Looked at a variety of ML methods and different diagnostics
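The archived-data recipe is simple to prototype: treat the fast diagnostics as regression features and the slow diagnostic as the target. A sketch on synthetic stand-in data; no real machine variables are implied:

```python
# Sketch: learn the fast<->slow diagnostic correlation from archived data,
# then use the fast diagnostics as a proxy for the slow one. Synthetic
# stand-in data; no real machine variables implied.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
fast = rng.normal(size=(5000, 6))             # fast shot-by-shot signals
slow = fast @ rng.normal(size=6) + 0.1 * rng.normal(size=5000)

Xtr, Xte, ytr, yte = train_test_split(fast, slow, random_state=0)
proxy = Ridge().fit(Xtr, ytr)
print(round(proxy.score(Xte, yte), 2))        # held-out R^2
```

The cited work compares several ML methods; a regularized linear model is just the simplest baseline for this pattern.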
SLIDE 12

Virtual Diagnostics at Fermilab’s FAST Facility

[Figure: multi-slit emittance measurement layout: beam passes a mask, the screen image is fit to obtain a subset of phase space parameters; beamline continues to the high energy line and IOTA]

Multi-slit emittance measurement after the second capture cavity (X107 to X111) takes 10-15 seconds → can we get an online prediction of what this intercepting diagnostic would show? (the subject of this work)

SLIDE 13

Neural Network Model

Neural Network
  • Inputs: solenoid current; phases (gun, CC1, CC2); initial bunch properties (charge, length, εx,y, x-y corr.)
  • Outputs: transmission; average beam energy; transverse sigma matrix (εx,y, βx,y, αx,y)
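Operationally, a model like this reduces to a mapping from named machine readings to named beam-parameter predictions. A hypothetical wrapper sketch; the parameter names and the stand-in linear "model" are invented for illustration and are not the FAST implementation:

```python
# Sketch: wrap a trained surrogate as a virtual diagnostic mapping named
# machine readings to named beam-parameter predictions. The names and the
# stand-in linear "model" are hypothetical, not the FAST implementation.
import numpy as np

INPUT_NAMES = ["solenoid_current", "gun_phase", "cc1_phase", "cc2_phase"]
OUTPUT_NAMES = ["transmission", "mean_energy", "eps_x", "eps_y"]

class VirtualDiagnostic:
    def __init__(self, model):
        self.model = model  # anything with .predict(2-D array) -> 2-D array

    def predict(self, readings: dict) -> dict:
        x = np.array([[readings[k] for k in INPUT_NAMES]])
        return dict(zip(OUTPUT_NAMES, self.model.predict(x)[0]))

class ToyModel:                       # fixed linear map, just for the demo
    W = np.arange(16).reshape(4, 4) / 10.0
    def predict(self, x):
        return x @ self.W

vd = VirtualDiagnostic(ToyModel())
out = vd.predict({"solenoid_current": 1.0, "gun_phase": 0.0,
                  "cc1_phase": 0.0, "cc2_phase": 0.0})
print(sorted(out))  # ['eps_x', 'eps_y', 'mean_energy', 'transmission']
```

Keeping the input/output names explicit is what lets the same wrapper later read live PVs instead of a dict.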

SLIDE 14

Predicting Image Output Directly

[Figure: simulated screen images vs. NN predictions, and their difference]

  • A. L. Edelen, et al. IPAC18, WEPAF040
SLIDE 15

Failure Prediction (Prognostics) + Anomaly Detection

Anomaly Detection:
  • Detect deviations from normal operating conditions that may otherwise go unnoticed
  • Could be at device level or higher-level machine state

Machine Protection:
  • Catastrophic failures and faults are sometimes preceded by tell-tale signs
  • Can we predict these events and take compensatory action?

Replacement Cycles and Predictive Maintenance:
  • When will this device (and others) fail?
  • Historical lifetime data + detection of signals preceding long-term failure
  • How can we plan maintenance to reduce the number of times we need to stop operations to fix items as they fail?

SLIDE 16

“Some of the most dangerous malfunctions of the magnets are quenches, which occur when a part of the superconducting cable becomes normally-conducting.”

Aim: use a recurrent NN to identify quench precursors in voltage time series → predict future behavior, then classify it.

Initial study with small data set:
  • 425 quenches for 600 A magnets
  • Used archived data from 2008 to 2016
  • 16-32 previous values → predict a few time steps ahead
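The predict-then-classify recipe can be prototyped without a recurrent network: fit any regressor on sliding windows (here 32 past values → 4 ahead, echoing the window sizes above) and flag spans where the measured signal departs from the forecast. All data below are synthetic stand-ins, not LHC magnet data:

```python
# Sketch: predict a few steps ahead from a window of past values, then flag
# spans where the measured signal departs from the forecast. Synthetic data;
# the LHC study used recurrent NNs on real voltage time series.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
t = np.arange(3000)
signal = np.sin(2 * np.pi * t / 50) + 0.02 * rng.normal(size=t.size)
# inject a "precursor": an extra oscillation the healthy data never shows
signal[2500:] += 0.5 * np.sin(2 * np.pi * np.arange(500) / 7)

WINDOW, AHEAD = 32, 4                  # 32 past values -> predict 4 ahead
starts = range(2000 - WINDOW - AHEAD)  # train on the healthy region only
X = np.array([signal[i:i + WINDOW] for i in starts])
y = np.array([signal[i + WINDOW:i + WINDOW + AHEAD] for i in starts])
model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=1000,
                     random_state=0).fit(X, y)

def anomaly_score(i):
    """Mean absolute error between forecast and the actual next samples."""
    pred = model.predict(signal[i - WINDOW:i].reshape(1, -1))[0]
    return np.abs(pred - signal[i:i + AHEAD]).mean()

print(round(float(anomaly_score(2200)), 3),   # healthy region: small error
      round(float(anomaly_score(2990)), 3))   # precursor region: large error
```

A recurrent NN replaces the windowed MLP when the precursor signatures have longer or variable time scales, but the classify-by-forecast-error structure is the same.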

SLIDE 17

Anomaly detection example from SLAC: cathode QE drop

[Figure: FEL pulse energy and cathode QE over time, with zoomed view of the QE drop]

  • D. Sanzone
SLIDE 18

Tuning: Fast Switching Between Trajectories

JLab

  • 76 BPMs, 57 dipoles, 53 quadrupoles
  • Traditional approach has never worked (linear response matrix)
  • Rely on a few experts for steering tune-up
  • Want to specify small offsets in trajectory at some locations
  • Didn’t initially have an up-to-date machine model available

Learn responses (NN model) from tune-up data and dedicated study time: dipole + quadrupole settings → predict BPMs + transmission.

Train controller (NN policy) offline using NN model: desired trajectory → dipole settings (and penalize losses + large magnet settings).

Work with C. Tennant and D. Douglas, JLab
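The two-step pattern above (learn a response model, then choose settings offline against it) can be sketched with toy numbers. A general-purpose optimizer stands in here for the NN policy, and the settings penalty mirrors the "penalize large magnet settings" idea; none of the dimensions or values are from the JLab study:

```python
# Sketch of the two-step pattern: (1) learn a response model from tune-up
# data (settings -> BPM readings), (2) choose settings offline against that
# model while penalizing large magnet strengths. A general-purpose optimizer
# stands in for the NN policy; all dimensions and values are toy numbers.
import numpy as np
from scipy.optimize import minimize
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
R_true = rng.normal(size=(4, 6))              # 4 correctors -> 6 BPMs (toy)
settings = rng.uniform(-1, 1, size=(500, 4))  # "tune-up" data
bpms = settings @ R_true + 0.01 * rng.normal(size=(500, 6))

model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000,
                     random_state=0).fit(settings, bpms)

def cost(s, target):
    pred = model.predict(s.reshape(1, -1))[0]
    return np.sum((pred - target) ** 2) + 0.01 * np.sum(s ** 2)

# request the trajectory produced by a known setting, so it is reachable
target = np.array([0.3, -0.2, 0.1, 0.0]) @ R_true
res = minimize(cost, x0=np.zeros(4), args=(target,), method="Nelder-Mead")
err = float(np.abs(model.predict(res.x.reshape(1, -1))[0] - target).max())
print(round(err, 3))                          # small residual at the BPMs
```

Training an NN policy amortizes this per-target optimization into a single forward pass, which is what makes fast switching between trajectories practical.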

SLIDE 19

Fast Switching Between Trajectories

Preliminary Results:

Controller: from random initial states, on average within 0.2 mm of center immediately.

Model errors for BPMs:
  • Training set: 0.07 mm MAE, 0.09 mm STD
  • Validation set: 0.08 mm MAE, 0.07 mm STD
  • Test set: 0.08 mm MAE, 0.03 mm STD

Modeling example (one BPM randomly selected out of the data set to plot).

Main anticipated advantages of NN over the standard approach:
  • Adaptive control policy → adjust without interfering with operation for response measurements as often?
  • Handling of trajectories away from BPM center (nonlinear)

But, need to quantify this…

SLIDE 20

Tuning example from SLAC: FEL Taper Tuning

[Figures: simulated power and taper profile; measured pulse energy and taper profile]

Factor of 2 increase in power.

  • J. Wu

SLIDE 21

Some Practical Challenges

Training on Measured Data:
  • Observed parameter range in archived data
  • Undocumented manual changes (e.g. rotating a BPM)
  • Relevant-but-unlogged variables
  • Availability of diagnostics
  • Time on machine for characterization studies (schedule + expense)

Ideal case:

  • comprehensive, high-resolution data archive (e.g. including things like ambient temp./pressure)

  • excellent log of manual changes
SLIDE 22

Some Practical Challenges

Training on Measured Data:
  • Observed parameter range in archived data
  • Undocumented manual changes (e.g. rotating a BPM)
  • Relevant-but-unlogged variables
  • Availability of diagnostics
  • Time on machine for characterization studies (schedule + expense)

Training on Simulation Data:
  • Input/output parameters need to translate directly to what’s on the machine (quantitatively)
  • High-fidelity (e.g. PIC) → time-consuming to run
  • Retention + availability of prior results (optimize and throw the iterations away!)
  • How representative of the real machine behavior?

Ideal case:
  • comprehensive, high-resolution data archive (e.g. including things like ambient temp./pressure)
  • excellent log of manual changes

SLIDE 23

Some Practical Challenges

Deployment:
  • Initial training on HPC systems → deployment usually not on HPC
  • Execution on front-end: necessary speed + memory?
  • Subsequent training: on front-end or transfer to HPC?
  • May need some specialized local compute: e.g. GPUs
  • I/O for large amounts of data
  • Software compatibility for older systems: interface with machine + make use of modern ML software libraries

SLIDE 24

More Detail: Archive Wishlist

  • Consistent timestamps (known relation to central clock)
  • Easy flags for machine states
  • beam off – intentionally vs. beam off – unexpected
  • test state, startup / extensive tuning, normal operation
  • Ability to take data quickly / support storage of large quantities of data, when needed
  • e.g. RF waveform data
  • Most parameters logged (including environmental)
  • e.g. temperature sensors in enclosures…not useful if too sparse
  • Useful names/information in PV database
  • FNAL is already pretty good with this, e.g. model of BPM hardware included in database info for PV
  • Fast access to archive for pulling large amounts of data
  • Logging of data from image-based diagnostics → lots of useful info for ML
SLIDE 25

Other Miscellaneous Considerations

  • Read-back and write large numbers of PVs simultaneously (tens – hundreds), with low latency
  • Solid support for recent versions of python (either base install or containers, plus good interface to read/write to machine) → make use of a very large set of scientific tools + ML
  • ACNET has a lot of very nice features already
  • Main practical impediments to ML at FNAL wrt control system (at least as of six months ago):

§ Timestamp consistency / accuracy
§ Software environment (e.g. need to support modern versions of python and its libraries)
§ I/O + computing resources for deployment
§ Lack of diagnostics or archiving of key variables for some problems
§ Undocumented changes to machine setup → how best to link these to archive data
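On the I/O side, reading tens to hundreds of PVs with low latency is largely a batching problem. A sketch using a thread pool; `read_pv` is a hypothetical stand-in that only simulates network latency (a real EPICS setup would use something like pyepics' `caget_many` or a pvAccess client):

```python
# Sketch: batch-read many PVs concurrently with a thread pool. read_pv is a
# hypothetical stand-in that simulates a 50 ms network round trip; a real
# EPICS setup would use e.g. pyepics caget_many or a pvAccess client.
import time
from concurrent.futures import ThreadPoolExecutor

def read_pv(name: str) -> float:
    time.sleep(0.05)            # pretend channel-access latency
    return float(len(name))     # dummy value

def read_many(names, max_workers=32) -> dict:
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(names, pool.map(read_pv, names)))

pvs = [f"TOY:BPM{i:02d}:X" for i in range(64)]
t0 = time.monotonic()
readings = read_many(pvs)
elapsed = time.monotonic() - t0
print(len(readings), elapsed < 1.0)   # 64 reads overlap instead of serializing
```

The point is that round trips must overlap: 64 sequential 50 ms reads would take over 3 seconds, far too slow to feed an online model every pulse.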

SLIDE 26

Recap

  • Some use-cases for ML:

§ Online modeling
§ Virtual diagnostics/reconstruction problems → get previously inaccessible or cumbersome information from the machine
§ Anomaly detection and failure prediction
§ Tuning

  • Some practical considerations:

§ Archive data and accessibility
§ Interfaces to control system
§ Computing needs

SLIDE 27

Final Notes

  • ML encompasses a very flexible set of tools → far more powerful + accessible in recent years
  • Lots of opportunities to use ML to improve accelerator performance on both existing and future machines
  • Transferrable between machines to some degree → lots of potential for fruitful collaborations!
  • Growing community → two recent workshops on ML for accelerators
  • But, has requirements on control system / archive data / compute in order for success

Intelligent Controls for Particle Accelerators
30 – 31 January at Daresbury Lab
Agenda/Talks: https://tinyurl.com/y9rg3uht

Machine Learning for Particle Accelerators
27 February – 2 March at SLAC
Agenda/Talks: https://tinyurl.com/y988njbl