
Hydra: A library for data analysis in massively parallel platforms. A. Augusto Alves Jr and Michael D. Sokoloff, University of Cincinnati, aalvesju@cern.ch. Presented at NVIDIA's GPU Technology Conference, May 8-11, 2017, Silicon Valley, US.


SLIDE 1

Hydra

A library for data analysis in massively parallel platforms

A. Augusto Alves Jr and Michael D. Sokoloff

University of Cincinnati aalvesju@cern.ch

Presented at NVIDIA’s GPU Technology Conference, May 8-11, 2017 - Silicon Valley, US

  • A. Augusto Alves Jr.

Hydra May 7, 2017 1 / 23

SLIDE 2

Outline

  • Design and goals of Hydra
  • Basic functionalities and main algorithms
  • Performance
      • Multidimensional numerical integration
      • Phase-space Monte Carlo generation
      • Interface to ROOT::Minuit2 and fitting
  • Summary

SLIDE 3

Motivation

The Large Hadron Collider (LHC) and other facilities acquire tens of petabytes of data annually. The collective effort to analyze this amount of data requires state-of-the-art software tools that:

  • Scale efficiently to face the increasing statistics from the experiments.
  • Meet the high precision requirements typically necessary to address High Energy Physics (HEP) problems.
  • Are efficient and flexible enough to face the different conditions of specific HEP experiments.
  • Are portable, scalable, and compatible with existing software and hardware standards.

SLIDE 4

Hydra

Hydra is a header-only, templated C++ library designed to perform common HEP data analyses on massively parallel platforms. It is implemented on top of the C++11 Standard Library and a variadic version of the Thrust library. Hydra is designed to run on Linux systems and to use OpenMP-, CUDA- and TBB-enabled devices. It is focused on portability, usability, performance and precision.

SLIDE 5

Design and features

The main design features are:

  • The library is structured using static polymorphism.
  • There is absolutely no need to write explicit back-end oriented code.
  • Clean and concise semantics. Interfaces are easy to use correctly and hard to use incorrectly.
  • The same source files written using Hydra and standard C++ compile for GPU or CPU, just exchanging the extension from .cu to .cpp and one or two compiler flags.
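One common way to keep a single source file valid for both a plain C++ compiler and nvcc is to hide the CUDA qualifiers behind a macro. The sketch below illustrates that general technique only; the macro SKETCH_HOST_DEVICE and the functor Gauss1D are hypothetical names, and the slides do not say whether Hydra uses this mechanism internally.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical macro (not part of Hydra): expands to the CUDA qualifiers
// only when the file is compiled by nvcc, and to nothing otherwise.
#ifdef __CUDACC__
#define SKETCH_HOST_DEVICE __host__ __device__
#else
#define SKETCH_HOST_DEVICE
#endif

// Unnormalized one-dimensional Gaussian. The source text is identical
// whether the file is built as .cpp (CPU) or as .cu (GPU).
struct Gauss1D {
    double mean;
    double sigma;

    SKETCH_HOST_DEVICE
    double operator()(double x) const {
        const double z = (x - mean) / sigma;
        return std::exp(-0.5 * z * z);
    }
};
```

With this pattern the choice of back-end is made by the compiler invocation, not by the code that defines the functor.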

SLIDE 6

Features

  • Generation of Phase-space Monte Carlo samples.
  • Sampling of multidimensional probability density functions.
  • Data fitting using binned and unbinned multidimensional datasets.
  • Evaluation of multidimensional functions over heterogeneous data sets.
  • Numerical integration of multidimensional functions.

SLIDE 7

Functors

Hydra adds features and type information to generic functors using the CRTP idiom. A generic functor with N parameters is represented like this:

    struct MyFunctor : public hydra::BaseFunctor<MyFunctor, double, N>
    {
        // MyFunctor constructor and other implementation details
        ...
        // The user always needs to implement the Evaluate() method
        template<typename T>
        __host__ __device__ inline
        double Evaluate(T* x) {
            // actual calculation
        }
    };

All functors deriving from hydra::BaseFunctor<Func,ReturnType,NPars> can be cached, used to perform fits and to compose more complex mathematical expressions.
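The CRTP idiom mentioned above can be illustrated in standard C++ without any Hydra headers. In this minimal sketch, BaseFunctorSketch and Line are hypothetical names; the base class is only loosely modelled on the template signature shown on the slide and is not Hydra's implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical CRTP base: it knows the derived type at compile time and
// forwards calls to Derived::Evaluate without any virtual function.
template<typename Derived, typename ReturnType, std::size_t NPars>
struct BaseFunctorSketch {
    typedef ReturnType return_type;
    static const std::size_t parameter_count = NPars;

    template<typename T>
    ReturnType operator()(T* x) const {
        return static_cast<const Derived&>(*this).Evaluate(x);
    }
};

// A functor with one parameter (the slope) that evaluates slope * x[0].
struct Line : BaseFunctorSketch<Line, double, 1> {
    double slope;

    explicit Line(double s) : slope(s) {}

    template<typename T>
    double Evaluate(T* x) const { return slope * x[0]; }
};
```

Because the base class carries the return type and the parameter count as compile-time information, generic code can inspect a functor's properties without running it.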

SLIDE 8

Arithmetic operations and composition with functors

All the basic arithmetic operators are overloaded. Composition is also possible. If A, B and C are Hydra functors, the code below is completely legal.

    ...
    // basic arithmetic operations
    auto A_plus_B  = A + B;
    auto A_minus_B = A - B;
    auto A_times_B = A * B;
    auto A_per_B   = A / B;
    // any composition of basic operations
    auto any_functor = (A - B) * (A + B) * (A / C);
    // C(A,B) is represented by:
    auto compose_functor = hydra::compose(C, A, B);
    ...

The functors resulting from arithmetic operations and composition can be cached as well. There is no intrinsic limit on the number of functors participating in arithmetic or composition expressions.
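A minimal sketch of how overloaded operators can build a new functor type from two existing ones is given here in plain C++. All names (FunctorTag, SumSketch, ProductSketch, ComposeSketch, compose_sketch, Square, Offset) are hypothetical, only + and * are shown, and the composition takes a single inner functor, which is simpler than the three-argument call on the slide.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical marker base so the operators apply to sketch functors only.
struct FunctorTag {};

template<typename L, typename R>
struct SumSketch : FunctorTag {
    L lhs; R rhs;
    SumSketch(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator()(double x) const { return lhs(x) + rhs(x); }
};

template<typename L, typename R>
struct ProductSketch : FunctorTag {
    L lhs; R rhs;
    ProductSketch(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator()(double x) const { return lhs(x) * rhs(x); }
};

template<typename Outer, typename Inner>
struct ComposeSketch : FunctorTag {
    Outer outer; Inner inner;
    ComposeSketch(const Outer& o, const Inner& i) : outer(o), inner(i) {}
    double operator()(double x) const { return outer(inner(x)); }
};

// True only when both operand types derive from FunctorTag.
template<typename L, typename R>
struct BothFunctors
    : std::integral_constant<bool, std::is_base_of<FunctorTag, L>::value &&
                                   std::is_base_of<FunctorTag, R>::value> {};

template<typename L, typename R>
typename std::enable_if<BothFunctors<L, R>::value, SumSketch<L, R> >::type
operator+(const L& l, const R& r) { return SumSketch<L, R>(l, r); }

template<typename L, typename R>
typename std::enable_if<BothFunctors<L, R>::value, ProductSketch<L, R> >::type
operator*(const L& l, const R& r) { return ProductSketch<L, R>(l, r); }

template<typename Outer, typename Inner>
ComposeSketch<Outer, Inner> compose_sketch(const Outer& o, const Inner& i) {
    return ComposeSketch<Outer, Inner>(o, i);
}

// Two leaf functors: x * x and x + c.
struct Square : FunctorTag {
    double operator()(double x) const { return x * x; }
};

struct Offset : FunctorTag {
    double c;
    explicit Offset(double value) : c(value) {}
    double operator()(double x) const { return x + c; }
};
```

Each expression has its own type, so nesting such as (A + B) * A is resolved entirely at compile time.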

SLIDE 9

Support for C++11 lambdas

Lambda functions are fully supported in Hydra. The user can define a C++11 lambda function and convert it into a Hydra functor using hydra::wrap_lambda():

    ...
    double two = 2.0;
    // define a simple lambda and capture "two" by value
    auto my_lambda = [=] __host__ __device__ (double* x)
    { return two * sin(x[0]); };
    // convert it into a Hydra functor
    auto my_lambda_wrapped = hydra::wrap_lambda(my_lambda);
    ...

CUDA 8.0 supports lambda functions in device and host code.
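The idea of turning a lambda into a named functor type can be sketched in host-only standard C++. LambdaWrapperSketch and wrap_lambda_sketch are hypothetical names chosen for this illustration; they show the general wrapping technique and are not the implementation of hydra::wrap_lambda().

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Hypothetical wrapper: stores the closure by value and exposes a fixed
// call signature, so generic code can treat it like any other functor.
template<typename Lambda>
struct LambdaWrapperSketch {
    Lambda fLambda;

    explicit LambdaWrapperSketch(Lambda lambda) : fLambda(std::move(lambda)) {}

    double operator()(double* x) const { return fLambda(x); }
};

// Helper that deduces the closure type, which cannot be spelled by hand.
template<typename Lambda>
LambdaWrapperSketch<Lambda> wrap_lambda_sketch(Lambda lambda) {
    return LambdaWrapperSketch<Lambda>(std::move(lambda));
}
```

Captured values travel with the closure, so a variable captured by value stays available after the enclosing scope ends.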

SLIDE 10

Data containers

  • hydra::Point represents multidimensional data points, including their coordinates, value and errors.
  • hydra::PointVector looks like an array of structs, but the data is stored as a structure of arrays.

    // two-dimensional point
    typedef hydra::Point<GReal_t, 2> point_t;
    // two-dimensional data set on the device
    hydra::PointVector<point_t, device> data_d(1e6);
    ...
    // get data from device
    hydra::PointVector<point_t, host> data_h(data_d);
    // fill a ROOT 2D histogram (TH2D needs the binning of both axes)
    TH2D hist("hist", "my histogram", 100, min, max, 100, min, max);
    for (auto row : data_h) {
        auto point(row);
        hist.Fill(point.GetCoordinate(0), point.GetCoordinate(1));
    }
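The difference between an array-of-structs interface and structure-of-arrays storage can be sketched in a few lines of standard C++. Point2DSketch and PointVectorSketch are hypothetical names; the sketch only illustrates the layout idea and says nothing about how hydra::PointVector is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// What the user sees: one point with two coordinates.
struct Point2DSketch {
    double x;
    double y;
};

// What is stored: one contiguous array per coordinate (structure of arrays).
class PointVectorSketch {
public:
    explicit PointVectorSketch(std::size_t n) : fX(n, 0.0), fY(n, 0.0) {}

    std::size_t size() const { return fX.size(); }

    // Assemble a point on the fly from the two coordinate arrays.
    Point2DSketch get(std::size_t i) const {
        Point2DSketch p = {fX[i], fY[i]};
        return p;
    }

    // Scatter a point back into the coordinate arrays.
    void set(std::size_t i, const Point2DSketch& p) {
        fX[i] = p.x;
        fY[i] = p.y;
    }

    // Direct access to one coordinate column, contiguous in memory.
    const std::vector<double>& xs() const { return fX; }

private:
    std::vector<double> fX;
    std::vector<double> fY;
};
```

Keeping each coordinate contiguous lets neighbouring threads read neighbouring memory addresses, which is the usual motivation for this layout on GPUs.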

SLIDE 11

Functionalities

  • Data fitting and Monte Carlo generation
      • Interface to ROOT::Minuit2 minimization package.
      • Phase-space generator.
      • Multidimensional p.d.f. sampling.
  • Parallel function evaluation over multidimensional datasets
  • Numerical integration
      • Flat Monte Carlo sampling.
      • Vegas-like self-adaptive importance sampling (Monte Carlo).
      • Gauss-Kronrod one-dimensional quadrature.
      • Genz-Malik multidimensional quadrature.

SLIDE 12

Vegas-like multidimensional numerical integration

The VEGAS algorithm is based on importance sampling. It samples the integrand and adapts itself, so that the points are concentrated in the regions that make the largest contribution to the integral. The Hydra implementation follows the corresponding GSL algorithm. There is no limit on the number of dimensions.

    // VegasState holds resources and configurations
    VegasState<N, device> State_d(_min, _max);
    State_d.SetIterations(iterations);
    State_d.SetMaxError(max_error);
    State_d.SetCalls(calls);
    State_d.SetTrainingCalls(tcalls);
    State_d.SetTrainingIterations(1);
    // Vegas integrator object
    Vegas<N, device> Vegas_d(State_d);
    // integrate a Gaussian
    Vegas_d.Integrate(Gaussian);
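The weighting step behind importance sampling can be shown with a one-dimensional toy integral in standard C++. This is not VEGAS: the sampling density below is fixed and chosen by hand, whereas VEGAS adapts its grid over several iterations. The function names are hypothetical and the integrand 3x² on [0, 1], whose exact integral is 1, is an assumption made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

// Toy integrand: f(x) = 3 x^2 on [0, 1]; the exact integral is 1.
inline double integrand(double x) { return 3.0 * x * x; }

// Flat Monte Carlo: x uniform in (0, 1], estimate = mean of f(x).
inline double integrate_flat(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double x = 1.0 - uniform(gen);  // maps [0, 1) to (0, 1]
        sum += integrand(x);
    }
    return sum / static_cast<double>(n);
}

// Importance sampling with the fixed density g(x) = 2x on (0, 1].
// Sampling by inversion: x = sqrt(u). Estimate = mean of f(x) / g(x).
inline double integrate_importance(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double u = 1.0 - uniform(gen);  // in (0, 1], so x > 0
        const double x = std::sqrt(u);
        sum += integrand(x) / (2.0 * x);
    }
    return sum / static_cast<double>(n);
}
```

For this integrand the variance of a single weighted sample is 0.125, compared with 0.8 for flat sampling, so the same number of calls gives a smaller statistical error.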

SLIDE 13

Vegas-like multidimensional numerical integration

Processing a Gaussian distribution in 10 dimensions.

[Figure, first panel: integral result as a function of the iteration number on the GPU, showing the iteration result and the cumulative result.]

[Figure, second panel: duration [ms] and GPU vs CPU speed-up as a function of the number of samples (×10^3), for GPU and CPU.]

System configuration:

  • GPU model: Tesla K40c
  • CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (one thread)

SLIDE 14

Phase-Space Monte Carlo

  • Describes the kinematics of a particle with a given four-momentum decaying to an N-particle final state.
  • No limitation on the number of particles in the final state.
  • Supports the generation of sequential decays.
  • Generation of weighted and unweighted samples.

    // Masses of the particles
    hydra::Vector4R Mother(mother_mass, 0.0, 0.0, 0.0);
    double Daughter_Masses[3]{daughter1_mass, daughter2_mass, daughter3_mass};
    // Create PhaseSpace object
    hydra::PhaseSpace<3> phsp(mother_mass, Daughter_Masses);
    // Allocate the container for the events
    hydra::Events<3, device> events(ndecays);
    // Generate
    phsp.Generate(Mother, events.begin(), events.end());
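Phase-space generators are commonly built from two-body decays, whose kinematics in the parent rest frame are fixed by the masses alone. The sketch below computes that two-body breakup momentum in standard C++. The function name is hypothetical, and the slides do not state which algorithm Hydra uses internally, so this illustrates the underlying kinematics only.

```cpp
#include <cassert>
#include <cmath>

// Momentum of either daughter in the rest frame of a parent of mass M
// decaying to two particles of masses m1 and m2 (natural units, c = 1):
//   p = sqrt((M^2 - (m1 + m2)^2) * (M^2 - (m1 - m2)^2)) / (2 M)
// Returns 0 when the decay is at or below threshold.
inline double breakup_momentum(double M, double m1, double m2) {
    const double sum = m1 + m2;
    if (M <= sum) {
        return 0.0;
    }
    const double diff = m1 - m2;
    const double arg = (M * M - sum * sum) * (M * M - diff * diff);
    return std::sqrt(arg) / (2.0 * M);
}
```

A quick consistency check is energy conservation: the two daughter energies computed from this momentum must add up to the parent mass.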

SLIDE 15

Phase-Space Monte Carlo

[Figure, first panel: Dalitz plot "dalitz", M(J/Ψ π) versus M(K π). Entries 1e+07, Mean x 2.312, Mean y 16.5, Std Dev x 1.105, Std Dev y 3.038.]

[Figure, second panel: duration [ms] (logarithmic scale) and GPU vs CPU speed-up as a function of the number of events (×10^6), for GPU and CPU.]

System configuration:

  • GPU model: Tesla K40c
  • CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (one thread)

SLIDE 16

Interface to Minuit2

ROOT::Minuit2 is widely used in particle physics to find the minimum value of a multi-parameter function (FCN) and to analyze the shape of the function around the minimum, and thus to compute a model's best-fit parameter values and uncertainties.

  • Hydra implements an interface to ROOT::Minuit2 that parallelizes the FCN calculation. This dramatically accelerates the calculation over large datasets.
  • The PDFs are normalized on-the-fly using analytical or numerical integration algorithms provided by Hydra.
  • Data is passed using hydra::PointVector.

SLIDE 17

Interface to Minuit2

Model = Ng ∗ Gaussian + Ne ∗ Exponential

    GaussAnalyticIntegral GaussIntegral(min, max);
    ExpAnalyticIntegral   ExpIntegral(min, max);
    auto Gaussian_PDF    = hydra::make_pdf(Gaussian, GaussIntegral);
    auto Exponential_PDF = hydra::make_pdf(Exponential, ExpIntegral);
    // add the pdfs to make an extended pdf model
    std::array<hydra::Parameter*, 2> yields{NGaussian, NExponential};
    auto Model = hydra::add_pdfs(yields, Gaussian_PDF, Exponential_PDF);
    Model.SetExtended(1);
    // get the FCN
    auto Model_FCN = hydra::make_loglikehood_fcn(Model, data_d);
    // pass the FCN to Minuit2
    ...
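The quantity a minimizer receives for an extended unbinned fit of this kind is the extended negative log-likelihood, (Ng + Ne) minus the sum over events of log(Ng·g(x) + Ne·e(x)). The serial sketch below evaluates it in standard C++ for a Gaussian plus an exponential, each normalized on the fit range. ModelSketch and extended_nll are hypothetical names, and the per-event sum is the part that a parallel back-end would compute as a reduction.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical model: n_gauss * Gaussian + n_exp * Exponential on [lo, hi].
struct ModelSketch {
    double lo, hi;          // fit range
    double mean, sigma;     // Gaussian shape parameters
    double tau;             // exponential slope, pdf ~ exp(-tau * x)
    double n_gauss, n_exp;  // yields

    // Gaussian normalized analytically on [lo, hi] using erf.
    double gauss_pdf(double x) const {
        const double pi = std::acos(-1.0);
        const double s = std::sqrt(2.0) * sigma;
        const double norm = sigma * std::sqrt(pi / 2.0) *
            (std::erf((hi - mean) / s) - std::erf((lo - mean) / s));
        const double z = (x - mean) / sigma;
        return std::exp(-0.5 * z * z) / norm;
    }

    // Exponential normalized analytically on [lo, hi].
    double exp_pdf(double x) const {
        const double norm = (std::exp(-tau * lo) - std::exp(-tau * hi)) / tau;
        return std::exp(-tau * x) / norm;
    }

    double operator()(double x) const {
        return n_gauss * gauss_pdf(x) + n_exp * exp_pdf(x);
    }
};

// Extended negative log-likelihood: total yield minus sum of log(model).
inline double extended_nll(const ModelSketch& m,
                           const std::vector<double>& data) {
    double sum_log = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        sum_log += std::log(m(data[i]));
    }
    return (m.n_gauss + m.n_exp) - sum_log;
}
```

For fixed shapes, scaling both yields by a common factor minimizes this function when the total yield equals the number of events, which gives a simple sanity check.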

SLIDE 18

Interface to Minuit2

Unbinned maximum likelihood fit to 20 million events.

[Figure: histogram "data", Yield (×10^3) versus X. Entries 2e+07, Mean 5.499, Std Dev 3.694.]

Timing:

  • Fit on GPU: 4.865 seconds
  • Fit on CPU: 299.867 seconds
  • Speed-up: ∼62x

System configuration:

  • GPU model: Tesla K40c
  • CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (one thread)

SLIDE 19

Summary

Hydra’s development has been supported by the National Science Foundation under the grant number PHY-1414736.

  • The project is hosted on GitHub: https://github.com/MultithreadCorner/Hydra
  • The package includes a suite of examples.
  • It is being used at CERN on analyses aiming to measure the Kaon mass using large datasets.

Acknowledgments

  • To Karen Tomko and Bradley Hittle from the Ohio Supercomputer Center.
  • To the University of Cincinnati LHCb group.

Please visit the project page, try it out, report bugs, make suggestions... Thanks!

SLIDE 20

Backup

SLIDE 21

Phase-Space Monte Carlo

OpenMP: scaling with number of threads

System configuration: CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz x 48

[Figure: duration [ms] as a function of the number of OpenMP threads.]

SLIDE 22

Phase-Space Monte Carlo

CUDA, OpenMP, TBB

GPU vs OpenMP

[Figure: duration [ms] (logarithmic scale) and GPU vs CPU speed-up as a function of the number of events (×10^6), for GPU and CPU.]

GPU vs TBB

[Figure: duration [ms] (logarithmic scale) and GPU vs CPU speed-up as a function of the number of events (×10^6), for GPU and CPU.]

SLIDE 23

Vegas-like multidimensional numerical integration

OpenMP: scaling with number of threads

System configuration: CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz x 48

[Figure: duration [ms] as a function of the number of OpenMP threads.]
