slide-1
SLIDE 1

AmpTools: a modular library to enable amplitude analysis

Matthew Shepherd Indiana University May 4, 2019 Joint GlueX and PANDA Workshop Ashburn, VA

slide-4
SLIDE 4
  • M. R. Shepherd

GlueX/PANDA Workshop May 4, 2019

Motivation

  • Needed a general framework to handle amplitude analysis for CLEO-c, BESIII, and GlueX
  • Feature rich
      • free parameters in amplitudes
      • multi-channel/multi-experiment fits; external constraints
  • Wanted to stimulate exploration of different phenomenological models
      • make data analysis accessible to theory colleagues
  • Separate physics from computing
      • allows one to focus on physics
  • Development driven by physics and practical needs rather than the desire to produce a one-size-fits-all software package (like ROOT)

slide-6
SLIDE 6

Ancient History and Acknowledgements

On Feb 1, 2006, at 11:08 PM, Ryan Mitchell wrote:

    Hi, Another keyfile is attached. To fix parameters this version uses an "x" instead of "*". Names in {} are added, which can come before or after the amplitude is specified (but all before the ";").

  • Package has been codeveloped with Ryan Mitchell (Indiana U.)

On May 27, 2009, at 4:51 PM, Hrayr Matevosyan wrote:

    Hi Guys, The GPU code now works with the fitter in both single and double precision. I want to have a walk-through the changes before I commit anything to CVS, but I attach the output of the runs over 1.8 mil points with CPU and GPU. CPU takes ~ 2.7 sec for each fit iteration, where GPU in double precision ~ 0.006 sec.

  • Data structure optimization and initial GPU implementation in CUDA by Hrayr Matevosyan (now at U. Adelaide)

slide-7
SLIDE 7

Design Goals and Implementation

slide-9
SLIDE 9

AmpTools Design Goals

  • Separate physics from computing
  • The “user” provides:
      • an algorithm (C++ class) to unpack four-vectors from a file
      • algorithms (C++ classes) to compute various physics amplitudes from four-vectors
      • a recipe (text file) for assembling the amplitudes into an intensity
  • AmpTools provides:
      • a general framework that makes no assumptions about experiment or physics model (other than quantum mechanics)
      • a set of core libraries optimized for unbinned likelihood fitting and parallel processing
          • MPI parallelization was always a part of the design: we knew the eventual problem size would exceed the RAM on one machine
          • GPU acceleration can be used (with or without MPI) to further accelerate computation on a single node
      • modular code that can also be used for MC generation and displaying fit results
      • a standard format for writing data I/O and amplitude calculation methods

slide-12
SLIDE 12

Underlying Assumptions

  • Use an unbinned maximum-likelihood fit with a likelihood for each data set with N events written as:

\[
L(\theta) = \frac{e^{-\mu}\mu^{N}}{N!} \prod_{i=1}^{N} P(x_i; \theta)
\]

  • A PDF that is constructed from an intensity and an acceptance function:

\[
P(x; \theta) = \frac{1}{\mu} I(x; \theta)\,\eta(x), \qquad \mu = \int I(x; \theta)\,\eta(x)\,dx
\]

  • And (as of now) an intensity that is formulated as a combination of coherent and incoherent sums of scalable, factorizable amplitudes:

\[
I(x) = \sum_{\sigma} \left| \sum_{\alpha} s_{\sigma,\alpha} V_{\sigma,\alpha} A_{\sigma,\alpha}(x) \right|^{2}, \qquad A_{\sigma,\alpha}(x) = \prod_{\gamma=1}^{n_{\sigma,\alpha}} a_{\sigma,\alpha,\gamma}(x)
\]
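The structure of the intensity (coherent sums inside, incoherent sum outside, each amplitude a product of factors) can be illustrated with a small standalone sketch. This is not AmpTools code: the `Term` struct and the function names are hypothetical, and the factors are assumed to be already evaluated for one event.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// One term of a coherent sum for a single event x (hypothetical layout).
struct Term {
    double scale;                               // s_{sigma,alpha}
    std::complex<double> V;                     // V_{sigma,alpha}
    std::vector<std::complex<double>> factors;  // a_{sigma,alpha,gamma}(x)
};

// A_{sigma,alpha}(x): product of the amplitude factors.
std::complex<double> amplitude(const Term& t) {
    std::complex<double> A(1.0, 0.0);
    for (const auto& a : t.factors) A *= a;
    return A;
}

// I(x): terms within one coherent sum add as complex numbers before
// squaring; different coherent sums add as real intensities.
double intensity(const std::vector<std::vector<Term>>& coherentSums) {
    double I = 0.0;
    for (const auto& sum : coherentSums) {
        std::complex<double> total(0.0, 0.0);
        for (const auto& t : sum) total += t.scale * t.V * amplitude(t);
        I += std::norm(total);  // std::norm returns |z|^2
    }
    return I;
}
```

Two terms with opposite production coefficients cancel when placed in the same coherent sum, but contribute independently when placed in different sums, which is the practical difference between the two kinds of sum.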

slide-13
SLIDE 13

Random Implementation Notes

  • Fit parameters, V, are complex and expressed using real and imaginary parts
  • Symmetrization for identical particles is handled by the framework (necessary if one wants to factorize amplitudes)
  • The log-likelihood values for each data set are summed together in multi-channel or multi-experiment fits
  • Background subtraction is implemented by subtracting the likelihood contribution of a background sample whose sum of weights is equal to the expected background
  • Minimization is done with MINUIT; many common MINUIT configuration functions are implemented in the interface
  • AmpTools itself is a collection of libraries; the user must provide:
      • a library with code to read data and compute amplitude factors
      • executables to do fits, generate MC, …
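The background-subtraction note can be sketched as a weighted sum of log-intensities in which the background sample enters with the opposite sign. This is a minimal illustration, not the AmpTools implementation: the struct and function names are hypothetical, the intensities are assumed to be already computed, and the normalization term of the likelihood is left out.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Intensity and weight for one event (hypothetical layout).
struct WeightedIntensity {
    double intensity;  // I(x) evaluated for this event
    double weight;     // event weight
};

// Sum of w_i * ln I(x_i) over a sample.
double sumWeightedLogs(const std::vector<WeightedIntensity>& sample) {
    double s = 0.0;
    for (const auto& e : sample) s += e.weight * std::log(e.intensity);
    return s;
}

// Data-dependent part of ln L with the background contribution subtracted.
// The background weights are assumed to sum to the expected background.
double backgroundSubtractedLogSum(const std::vector<WeightedIntensity>& data,
                                  const std::vector<WeightedIntensity>& background) {
    return sumWeightedLogs(data) - sumWeightedLogs(background);
}
```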

slide-14
SLIDE 14

Example DataReader

    DalitzDataReader::DalitzDataReader( const vector< string >& args ) :
      UserDataReader< DalitzDataReader >(args),
      m_eventCounter( 0 )
    {
      // the single argument is the name of the input ROOT file
      assert(args.size() == 1);
      string inFileName(args[0]);
      string inTreeName("nt");

      TH1::AddDirectory( kFALSE );
      gSystem->Load( "libTree" );

      ifstream fileexists( inFileName.c_str() );
      if (fileexists){
        m_inFile = new TFile( inFileName.c_str() );
        m_inTree = static_cast<TTree*>( m_inFile->Get( inTreeName.c_str() ) );
      }
      else{
        cout << "DalitzDataReader WARNING: Cannot find file... " << inFileName << endl;
        m_inFile = NULL;
        m_inTree = NULL;
      }

      if (m_inTree){
        // energy and momentum components of each of the three particles
        m_inTree->SetBranchAddress( "EnP1", &m_EnP1 );
        m_inTree->SetBranchAddress( "PxP1", &m_PxP1 );
        m_inTree->SetBranchAddress( "PyP1", &m_PyP1 );
        m_inTree->SetBranchAddress( "PzP1", &m_PzP1 );
        m_inTree->SetBranchAddress( "EnP2", &m_EnP2 );
        m_inTree->SetBranchAddress( "PxP2", &m_PxP2 );
        m_inTree->SetBranchAddress( "PyP2", &m_PyP2 );
        m_inTree->SetBranchAddress( "PzP2", &m_PzP2 );
        m_inTree->SetBranchAddress( "EnP3", &m_EnP3 );
        m_inTree->SetBranchAddress( "PxP3", &m_PxP3 );
        m_inTree->SetBranchAddress( "PyP3", &m_PyP3 );
        m_inTree->SetBranchAddress( "PzP3", &m_PzP3 );
        m_inTree->SetBranchAddress( "weight", &m_weight );
      }
    }

    Kinematics* DalitzDataReader::getEvent(){
      if( m_eventCounter < numEvents() ){
        m_inTree->GetEntry( m_eventCounter++ );

        // the order of the four-vectors here fixes the particle indices
        vector< TLorentzVector > particleList;
        particleList.push_back( TLorentzVector( m_PxP1, m_PyP1, m_PzP1, m_EnP1 ) );
        particleList.push_back( TLorentzVector( m_PxP2, m_PyP2, m_PzP2, m_EnP2 ) );
        particleList.push_back( TLorentzVector( m_PxP3, m_PyP3, m_PzP3, m_EnP3 ) );
        return new Kinematics( particleList, m_weight );
      }
      else{
        return NULL;  // signals that there are no more events
      }
    }

  • Inherits from: DataReader (through the UserDataReader template shown in the code)
  • Arguments to the constructor are a vector of strings that come from the configuration file
  • The getEvent() method provides four-vectors in a specific order

slide-15
SLIDE 15

Example Amplitude Factor

  • Inherits from: Amplitude (through the UserAmplitude template shown in the code)
  • Constructor receives a vector of strings from the configuration file
  • Provides a complex number from an array of four-vectors
  • Optional: parameter callback method to do expensive calculations common to all events when parameters change
  • Optional: user data cache for storing expensive quantities that are fixed for every fit iteration

    BreitWigner::BreitWigner( const vector< string >& args ) :
      UserAmplitude< BreitWigner >(args)
    {
      // arguments: mass, width, and the indices of the two daughters
      assert( args.size() == 4 );
      m_mass = AmpParameter(args[0]);
      m_width = AmpParameter(args[1]);
      m_daughter1 = atoi(args[2].c_str());
      m_daughter2 = atoi(args[3].c_str());

      // need to register any free parameters so the framework knows about them
      registerParameter( m_mass );
      registerParameter( m_width );
    }

    complex< GDouble >
    BreitWigner::calcAmplitude( GDouble** pKin ) const {
      // pKin[i] holds (E, px, py, pz); daughters are numbered from 1
      TLorentzVector P1(pKin[m_daughter1-1][1], pKin[m_daughter1-1][2],
                        pKin[m_daughter1-1][3], pKin[m_daughter1-1][0]);
      TLorentzVector P2(pKin[m_daughter2-1][1], pKin[m_daughter2-1][2],
                        pKin[m_daughter2-1][3], pKin[m_daughter2-1][0]);

      return complex<GDouble>(1.0,0.0) /
        complex<GDouble>((P1+P2).M2() - m_mass*m_mass, m_mass*m_width);
    }
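For readers without ROOT or AmpTools at hand, the arithmetic inside calcAmplitude can be reproduced with the standard library alone. The sketch below is an illustration only; `FourVector`, `invariantMassSquared`, and `breitWigner` are hypothetical names, and the formula is the same 1 / (m12² − m² + i·m·Γ) used on the slide.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Minimal four-vector, stored as (E, px, py, pz).
struct FourVector {
    double E, px, py, pz;
};

// Invariant mass squared of the two-particle system.
double invariantMassSquared(const FourVector& a, const FourVector& b) {
    const double E  = a.E  + b.E;
    const double px = a.px + b.px;
    const double py = a.py + b.py;
    const double pz = a.pz + b.pz;
    return E * E - px * px - py * py - pz * pz;
}

// Same expression as the slide's calcAmplitude, without barrier factors.
std::complex<double> breitWigner(const FourVector& p1, const FourVector& p2,
                                 double mass, double width) {
    return std::complex<double>(1.0, 0.0) /
           std::complex<double>(invariantMassSquared(p1, p2) - mass * mass,
                                mass * width);
}
```

On resonance (m12 equal to the mass) the real part of the denominator vanishes and the amplitude is purely imaginary with magnitude 1/(m·Γ).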

slide-16
SLIDE 16

Example Configuration File: dalitz1.cfg

    fit dalitz1

    reaction dalitz p1 p2 p3

    genmc dalitz DalitzDataReader phasespace.gen.root
    accmc dalitz DalitzDataReader phasespace.acc.root
    data dalitz DalitzDataReader physics.acc.root

    normintfile dalitz dalitz1.ni

    sum dalitz s1

    amplitude dalitz::s1::R12 BreitWigner 1.000 0.200 1 2
    amplitude dalitz::s1::R13 BreitWigner 1.500 0.150 1 3

    initialize dalitz::s1::R12 cartesian 1.0 0.0
    initialize dalitz::s1::R13 cartesian 1.0 0.0 real

slide-17
SLIDE 17

Excerpts from user-provided fitter:

    // ************************
    // parse the config file
    // ************************

    ConfigFileParser parser(cfgname);
    ConfigurationInfo* cfgInfo = parser.getConfigurationInfo();
    cfgInfo->display();

    // ************************
    // AmpToolsInterface
    // ************************

    AmpToolsInterface::registerAmplitude(BreitWigner());
    AmpToolsInterface::registerDataReader(DalitzDataReader());

    AmpToolsInterface ATI(cfgInfo);

    MinuitMinimizationManager* fitManager = ATI.minuitMinimizationManager();
    fitManager->setStrategy(1);
    fitManager->migradMinimization();

slide-18
SLIDE 18

Example Configuration File: dalitz3.cfg

    fit dalitz3

    reaction dalitz p1 p2 p3

    genmc dalitz DalitzDataReader phasespace.gen.root
    accmc dalitz DalitzDataReader phasespace.acc.root
    data dalitz DalitzDataReader physics.acc.root

    normintfile dalitz dalitz3.ni

    sum dalitz s1

    amplitude dalitz::s1::R12 BreitWigner [M12] [G12] 1 2
    amplitude dalitz::s1::R13 BreitWigner [M13] [G13] 1 3

    initialize dalitz::s1::R12 cartesian 1.0 0.0
    initialize dalitz::s1::R13 cartesian 1.0 0.0 real

    parameter M12 1.000
    parameter G12 0.200
    parameter M13 1.500
    parameter G13 0.150

slide-19
SLIDE 19

Scalability and Performance on Large Data Sets

slide-21
SLIDE 21

Performance Considerations

  • In the best case, each fit iteration requires computing n logarithms of the intensity, each with about m² terms, where n is the number of data events and m is the number of amplitudes
      • amplitude computations, if fixed, can be cached for each event
  • A fit with 80 amplitudes and 215K events can take over 24 hours on a single CPU
  • Putting parameters in amplitudes (e.g., mass of a resonance) increases cost by orders of magnitude
  • Two routes to speed:
      • parallelize likelihood calculation
      • reduce number of likelihood calculations or fit iterations
          • gradient calculation in MINUIT
          • alternate minimization algorithms

slide-24
SLIDE 24

Two Fits: An Easier One

  • Only production coefficients (couplings) are free parameters
  • Amplitude is fixed for every event and contains no free parameters
  • Common “mass independent analysis”
  • Functions used to normalize the PDF (which are numerically integrated using MC) factorize, enabling rapid recomputation with each iteration
  • User code is run only during the first fit iteration: acceleration of the core library benefits all users!

[Figure: J/ψ → γπ0π0, Events / 15 MeV/c² vs. Mass(ππ) [GeV/c²], panels (a) 0++ and (b) 2++ E1. From M. Ablikim et al. [BESIII Collaboration], PRD 92, 052003 (2015)]
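The factorization of the normalization can be sketched as follows: when the amplitudes carry no free parameters, the sums over accepted MC events of A_α A_β* are computed once, and each fit iteration only contracts that matrix with the current production coefficients. The code is an illustration with hypothetical names, not the AmpTools implementation.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;
using Matrix  = std::vector<std::vector<Complex>>;

// Computed once: NI[a][b] = sum over accepted MC events j of
// A_a(x_j) * conj(A_b(x_j)), where amps[j][a] = A_a(x_j).
Matrix normalizationIntegrals(const std::vector<std::vector<Complex>>& amps,
                              std::size_t nAmps) {
    Matrix NI(nAmps, std::vector<Complex>(nAmps, Complex(0.0, 0.0)));
    for (const auto& event : amps)
        for (std::size_t a = 0; a < nAmps; ++a)
            for (std::size_t b = 0; b < nAmps; ++b)
                NI[a][b] += event[a] * std::conj(event[b]);
    return NI;
}

// Computed every iteration: (1 / nGenerated) * sum_{a,b} V_a conj(V_b) NI[a][b].
// The cost depends on the number of amplitudes, not on the number of MC events.
double normalization(const std::vector<Complex>& V, const Matrix& NI,
                     double nGenerated) {
    Complex total(0.0, 0.0);
    for (std::size_t a = 0; a < V.size(); ++a)
        for (std::size_t b = 0; b < V.size(); ++b)
            total += V[a] * std::conj(V[b]) * NI[a][b];
    return total.real() / nGenerated;
}
```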

slide-27
SLIDE 27

Two Fits: A Harder One

  • Use the data to determine some parameter of the model: add a parameter for the f2(1270) mass
  • One (or more) amplitudes A now can change with each fit iteration
      • conceptually: efficiency for reconstructing an amplitude depends on a parameter in the fit and must be recomputed when the parameter changes
      • user-written code now dominates the run time: must provide users with tools to optimize their amplitude calculations (cache and parameter change callback functions)

\[
\ln L = \sum_{i=1}^{N_\mathrm{data}} \ln\left( \sum_{\alpha,\beta}^{N_\mathrm{amps}} V_\alpha V_\beta^{*} A_\alpha(\vec{x}_i) A_\beta(\vec{x}_i)^{*} \right) - \frac{1}{N^\mathrm{gen}_\mathrm{MC}} \sum_{\alpha,\beta}^{N_\mathrm{amps}} V_\alpha V_\beta^{*} \left( \sum_{j=1}^{N^\mathrm{acc}_\mathrm{MC}} A_\alpha(\vec{x}_j) A_\beta(\vec{x}_j)^{*} \right)
\]

  • The sum over accepted MC events (the last factor) is not fixed: it depends on parameters
  • \( N^\mathrm{acc}_\mathrm{MC} \approx 10\, N_\mathrm{data} \)

[Figure: J/ψ → γπ0π0, Events / 15 MeV vs. Mass [GeV/c²], with the f2(1270) contribution indicated]
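The idea behind a cache combined with a parameter-change callback can be sketched as below: per-event amplitudes are stored, and the expensive loop over events is repeated only when a parameter actually changes. The class and method names are hypothetical and do not reproduce the AmpTools interface.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

// Caches a Breit-Wigner amplitude per event; recomputes only on change.
class CachedBreitWigner {
public:
    // massSquared[i] is the two-body invariant mass squared for event i.
    explicit CachedBreitWigner(std::vector<double> massSquared)
        : m_s(std::move(massSquared)), m_cache(m_s.size()) {}

    // Hypothetical stand-in for a parameter-change callback.
    void setParameters(double mass, double width) {
        if (m_valid && mass == m_mass && width == m_width) return;  // cache hit
        m_mass = mass;
        m_width = width;
        for (std::size_t i = 0; i < m_s.size(); ++i)
            m_cache[i] = 1.0 / std::complex<double>(m_s[i] - mass * mass,
                                                    mass * width);
        m_valid = true;
        ++m_recomputations;
    }

    std::complex<double> amplitude(std::size_t i) const { return m_cache[i]; }
    int recomputations() const { return m_recomputations; }

private:
    std::vector<double> m_s;
    std::vector<std::complex<double>> m_cache;
    double m_mass = 0.0;
    double m_width = 0.0;
    bool m_valid = false;
    int m_recomputations = 0;
};
```

In a fit where MINUIT varies only the production coefficients for many consecutive calls, a scheme like this would skip the event loop entirely on those calls.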

slide-28
SLIDE 28

Generic Fitting Topology

[Diagram: a master node connected over the network to worker nodes 1 through n]

  • Master Node (CPU): minimizing algorithm runs here
  • Parameters and partial sums exchanged over network layer
  • Node 1, Node 2, …, Node n: each shown with GPU 1, GPU 2, CPU 1, CPU 2
      • Coarsely parallel: one MPI process per GPU
      • Optional GPU acceleration, finely parallel: typically one CUDA thread per event or amplitude
  • No constraint on GPU/CPU ratio or number of GPU or CPU per node
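The exchange of partial sums can be illustrated without MPI by a serial emulation: the events are split into one contiguous chunk per worker, each worker contributes its own sum of log-intensities, and the master adds the partial results. This is only a sketch of the reduction pattern with hypothetical function names; a real fit would run the chunks in separate MPI processes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Each "worker" w sums ln I(x_i) over its own contiguous chunk of events.
std::vector<double> partialLogSums(const std::vector<double>& intensities,
                                   std::size_t nWorkers) {
    std::vector<double> partial(nWorkers, 0.0);
    const std::size_t n = intensities.size();
    for (std::size_t w = 0; w < nWorkers; ++w) {
        const std::size_t begin = w * n / nWorkers;
        const std::size_t end   = (w + 1) * n / nWorkers;
        for (std::size_t i = begin; i < end; ++i)
            partial[w] += std::log(intensities[i]);
    }
    return partial;
}

// The "master" only needs one number per worker to form the total.
double masterReduce(const std::vector<double>& partial) {
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```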

slide-29
SLIDE 29

Parallelization

  • AmpTools provides MPI parallelization “for free” with little additional work by the user; useful templates and interfaces are provided:

    // single-process version
    AmpToolsInterface::registerAmplitude(BreitWigner());
    AmpToolsInterface::registerDataReader(DalitzDataReader());

    // MPI version
    AmpToolsInterfaceMPI::registerAmplitude(BreitWigner());
    AmpToolsInterfaceMPI::registerDataReader(DataReaderMPI<DalitzDataReader>());

  • MPI setup and management has to happen in the main function of the user fitting code; an example is provided in the Dalitz tutorial
  • GPU acceleration using CUDA by NVIDIA requires user development of amplitude calculations that can run on the GPU
      • under testing: automatic GPU acceleration for amplitudes with no free parameters
slide-30
SLIDE 30

Example GPU amplitude kernel

  • Use CUDA from NVIDIA
      • allows semi-painless mixing of “device” (GPU) and “host” (CPU) code
      • pointers/memory on device managed through cudaMalloc(...), cudaMemcpy(...), etc.
      • compiled with NVIDIA’s compiler, nvcc, and linked into standard C/C++ code
  • Users need to write routines to calculate their amplitudes on either the GPU or CPU
      • preprocessor macros can make it easier for users to write code
      • presently requires maintaining two consistent pieces of code: one in CUDA and one in C++

    // code to compute the Breit-Wigner amplitude for one event
    __global__ void
    GPUBreitWigner_kernel( GPU_AMP_PROTO, GDouble mass0, GDouble width0, GDouble spin ){

      int iEvent = GPU_THIS_EVENT;

      GDouble dV1[4] = GPU_P4(2);
      GDouble dV2[4] = GPU_P4(3);

      // invariant masses squared: E^2 minus the squared momentum components
      GDouble mass  = SQ( dV1[0] + dV2[0] );
      GDouble mass1 = SQ( dV1[0] );
      GDouble mass2 = SQ( dV2[0] );
      for( int i = 1; i <= 3; ++i ){
        mass  -= SQ( dV1[i] + dV2[i] );
        mass1 -= SQ( dV1[i] );
        mass2 -= SQ( dV2[i] );
      }

      // NOTE: q is not defined in this excerpt from the slide
      GDouble F = barrierFactor( q, spin );

      mass  = G_SQRT( mass );
      mass1 = G_SQRT( mass1 );
      mass2 = G_SQRT( mass2 );

      WCUComplex bwTop = { G_SQRT( mass0 * width0 / 3.1416 ), 0 };
      WCUComplex bwBot = { SQ( mass0 ) - SQ( mass ), -1.0 * mass0 * width0 };

      pcDevAmp[iEvent] = ( F * bwTop / bwBot );
    }

    // parallel invocation here, called from core C++ fitting code
    // NOTE: spin is an int here but a GDouble in the kernel signature above
    void
    GPUBreitWigner_exec( dim3 dimGrid, dim3 dimBlock, GPU_AMP_PROTO,
                         GDouble mass, GDouble width, int spin )
    {
      GPUBreitWigner_kernel<<< dimGrid, dimBlock >>>
        ( GPU_AMP_ARGS, mass, width, spin );
    }

slide-32
SLIDE 32

A Platform for Studying Performance

  • BigRed II at Indiana University: Cray XE6/XK7
      • 344 CPU-only nodes
          • 2 AMD Opteron 16-core Abu Dhabi x86_64 CPUs
      • 676 CPU/GPU compute nodes
          • 1 AMD Opteron 16-core Interlagos x86_64 CPU
          • 1 NVIDIA Tesla K20 GPU with a single Kepler GK110 GPU (4800 MB global memory)
      • Cray Gemini interconnect (critical for large scale parallel fitting)
  • Test fit: mock data from J/ψ → γπ0π0
      • 216K data events
      • 2M MC events
      • 39 free parameters for 80 amplitudes (some parity constraints exist)
      • typical convergence in 30K fit iterations
  • For small scale GPU applications one can get significant performance/cost using commodity hardware

slide-33
SLIDE 33

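First 17 s of an 8-core parallel fit (the easier one)

\[
\ln L = \sum_{i=1}^{N_\mathrm{data}} \ln\left( \sum_{\alpha,\beta}^{N_\mathrm{amps}} V_\alpha V_\beta^{*} A_\alpha(\vec{x}_i) A_\beta(\vec{x}_i)^{*} \right) - \frac{1}{N^\mathrm{gen}_\mathrm{MC}} \sum_{\alpha,\beta}^{N_\mathrm{amps}} V_\alpha V_\beta^{*} \left( \sum_{j=1}^{N^\mathrm{acc}_\mathrm{MC}} A_\alpha(\vec{x}_j) A_\beta(\vec{x}_j)^{*} \right)
\]

[Figure: annotated with the labels “user code: amplitude calculations” and “AmpTools code” (twice)]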

slide-34
SLIDE 34

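First few iterations of a 16-core CPU fit with a floating parameter (the harder one)

\[
\ln L = \sum_{i=1}^{N_\mathrm{data}} \ln\left( \sum_{\alpha,\beta}^{N_\mathrm{amps}} V_\alpha V_\beta^{*} A_\alpha(\vec{x}_i) A_\beta(\vec{x}_i)^{*} \right) - \frac{1}{N^\mathrm{gen}_\mathrm{MC}} \sum_{\alpha,\beta}^{N_\mathrm{amps}} V_\alpha V_\beta^{*} \left( \sum_{j=1}^{N^\mathrm{acc}_\mathrm{MC}} A_\alpha(\vec{x}_j) A_\beta(\vec{x}_j)^{*} \right)
\]

[Figure: annotated with the label “user code: amplitude calculations”]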

slide-36
SLIDE 36

Performance on BigRedII

[Figure, two panels: Fit Speed [Function Call Rate (Hz)] vs. Number of AMD Cores or NVIDIA K20 GPUs, with curves for CPU, GPU, CPU Strong Scaling Limit, and GPU Strong Scaling Limit]

  • “The Easier Fit”: big data set with fixed amplitudes
      • GPU “steps” are a correctable artifact of event padding to the nearest power of 2
  • “The Harder Fit”: same big data set with a free parameter in the amplitudes
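The origin of the GPU “steps” can be made concrete with a small sketch. It assumes that the padding rounds the per-GPU event count up to the next power of 2 and that events are divided evenly among GPUs; both are assumptions for illustration, and `paddedEventCount` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Smallest power of 2 that is greater than or equal to nEvents.
std::size_t paddedEventCount(std::size_t nEvents) {
    std::size_t padded = 1;
    while (padded < nEvents) padded *= 2;
    return padded;
}
```

Taking 216,000 as an illustrative event count, 4 GPUs hold 54,000 events each and 5 GPUs hold 43,200 each, but both pad to 65,536 events per GPU, so adding the fifth GPU does not reduce the work per GPU. The next drop (to 32,768) only comes at 7 GPUs.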

slide-37
SLIDE 37

Applications, Usability, and Getting Started

slide-38
SLIDE 38

Multiple Use Toolset

  • AmpTools itself is just a library of different utility objects
  • The ability to construct a function that computes the intensity for an event has broad utility
      • fit data
      • generate signal MC
          • use random accept/reject based on calculated intensity
      • plot fit results
          • compare intensity-weighted projections of MC to data
          • examine how background populates key variables
          • parallel processing or GPU acceleration to speed generation of projections
  • Users need to write their own executables to utilize the toolset; examples are provided to follow
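Accept/reject generation from a calculated intensity can be sketched in a few lines. The example is a generic illustration with hypothetical names, where each candidate “event” is reduced to a single number and `maxIntensity` is assumed to be an upper bound on the intensity over all candidates.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Keep each phase-space candidate x with probability I(x) / maxIntensity.
std::vector<double> acceptReject(const std::vector<double>& candidates,
                                 const std::function<double(double)>& intensity,
                                 double maxIntensity, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uniform(0.0, maxIntensity);
    std::vector<double> accepted;
    for (double x : candidates) {
        // a uniform draw below I(x) happens with probability I(x)/maxIntensity
        if (uniform(rng) < intensity(x)) accepted.push_back(x);
    }
    return accepted;
}
```

The accepted sample is then distributed according to the intensity, provided the candidates themselves were generated uniformly in phase space.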

slide-39
SLIDE 39

AmpTools enforces standardized class structures for amplitudes, which facilitate code sharing with theorists.

slide-43
SLIDE 43

Examples and Getting Started

  • Source code available from: github.com/mashephe/AmpTools
  • Package contains three modules:
      • AmpTools: the main AmpTools library
      • Tutorials: working examples
      • AmpPlotter: a GUI for viewing fit projections
  • Some documentation, including class reference: github.com/mashephe/AmpTools/wiki
  • Build based on “make”, “configured” by hand by the user
  • Recommend examining the Tutorials/Dalitz example, which should be fully functional after defining the location of AMPTOOLS, AMPPLOTTER, and DALITZ in your environment
      • see the shell script Tutorials/Dalitz/run/testfit, which will build all the code and run some executables
  • Some publications that used AmpTools:
      • BESIII Collab., “Amplitude analysis of the π0π0 system produced in radiative J/ψ decays,” Phys. Rev. D 92, 052003 (2015).
      • CLEO-c Collab., “Amplitude analyses of the decays χc1→ηπ+π- and χc1→η′π+π-,” Phys. Rev. D 84, 112009 (2011).

slide-45
SLIDE 45

Random Lessons Learned

  • Precision matters, especially in large fits
  • The common assumption of Gaussian uncertainties is sometimes very bad, especially for the tails of the confidence interval: consider MINUIT or MINOS errors a suggestion
  • Things you must do:
      • check the mean and variance of the fit output with ensembles of mock data generated to match real data
      • use bootstrap techniques (random sampling with replacement) to explore uncertainties on fit parameters
  • Once troublesome parameters are identified, one can examine how the likelihood depends on these parameters.

[Figure (a): plot with axes labeled g(πJ/ψ) and g(DD*)/g(πJ/ψ)]
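The bootstrap recipe (random sampling with replacement) can be sketched as follows. Here the sample mean stands in for a fit parameter; in a real analysis each replica would be refit, which is far more expensive. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

double mean(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0) / v.size();
}

// Sample standard deviation (n - 1 in the denominator); needs v.size() > 1.
double standardDeviation(const std::vector<double>& v) {
    const double m = mean(v);
    double sumSq = 0.0;
    for (double x : v) sumSq += (x - m) * (x - m);
    return std::sqrt(sumSq / (v.size() - 1));
}

// One estimate per replica; each replica has the same size as the data
// and is drawn from it with replacement.
std::vector<double> bootstrapMeans(const std::vector<double>& data,
                                   std::size_t nReplicas, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
    std::vector<double> estimates;
    for (std::size_t r = 0; r < nReplicas; ++r) {
        std::vector<double> resample(data.size());
        for (std::size_t i = 0; i < data.size(); ++i)
            resample[i] = data[pick(rng)];
        estimates.push_back(mean(resample));
    }
    return estimates;
}
```

The spread of the returned estimates, for example `standardDeviation(bootstrapMeans(data, 200, 42))`, gives an uncertainty that does not rely on the Gaussian assumption, and a histogram of the estimates shows any asymmetry directly.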

slide-49
SLIDE 49

Ideas Under Development

  • More easily accommodate alternate structures of the intensity
      • Sum of moments (sum of functions)
      • Construction from photon spin density matrix
      • Amplitudes that are naturally written as products of sums
  • User data cache for expensive numbers needed in amplitude calculation
  • Automatic GPU acceleration for parameter-free amplitudes
  • Tools to automatically generate skeletons for the various pieces of code that the user needs to develop
  • More robust configuration and build system for various GPU and MPI installations
  • [insert the most immediate need here]
      • development team is limited to a small fraction of the time of 2 people
      • contributions are welcome

slide-51
SLIDE 51

Summary

  • AmpTools is…
      • a modular framework for amplitude analysis designed to make it easy to implement arbitrary amplitudes in a robust and optimized way
      • being actively used by members of the GlueX, BESIII, and CLAS collaborations
      • developed as needs for particular features arise (driven by the most pressing physics needs rather than software ideas)
      • potentially useful and freely available for others to use
  • AmpTools isn’t…
      • a highly polished software tool: even building can be a little tricky
      • as well documented as it should be, although we try to maintain working examples and talking with developers is easy (a reference manual is under active development)
      • guaranteed to be the best solution for your particular amplitude analysis problem