RICH Cherenkov angle status report March 2017
Christina Quast
March 6, 2017

Sections: Memory layout; Performance improvements

Nanoseconds per photon
Theoretical limit: 52.0 B / 340 GB/s = 0.153 ns
For 33554432 photons:
  Old solver float: 1000.26 ns
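The theoretical limit above is a pure memory-bandwidth bound: bytes moved per photon divided by the sustained bandwidth. A minimal sketch of that arithmetic follows. The function names are hypothetical, and 1 GB/s is assumed to mean 1e9 bytes per second, so bytes divided by GB/s comes out directly in nanoseconds.

```cpp
#include <cassert>

// Lower bound on the time per photon for a purely memory-bound kernel.
// Assumption: 1 GB/s = 1e9 B/s, so B / (GB/s) is a time in nanoseconds.
inline double bandwidth_bound_ns(double bytes_per_photon, double bandwidth_gb_per_s) {
    assert(bandwidth_gb_per_s > 0.0);
    return bytes_per_photon / bandwidth_gb_per_s;
}

// Total run time in milliseconds for a given photon count
// (1 ns = 1e-6 ms).
inline double total_time_ms(double ns_per_photon, long long photons) {
    return ns_per_photon * static_cast<double>(photons) * 1e-6;
}
```

With the slide's values, `bandwidth_bound_ns(52.0, 340.0)` reproduces the quoted 0.153 ns, and `total_time_ms` applied to 33554432 photons gives a little over 5 ms for the whole sample at that bound.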

Memory layout before (figure)

Memory layout before 2 (figure)

Memory layout after (figure)
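The layout figures did not survive in this text, so the following is only a sketch of one cache-line-friendly layout that is consistent with the accessors used in the code on the later slides (`emissPnt`, `centOfCurv`, `virtDetPoint`, `radius`, `sphReflPoint`, each indexed per photon, with a block size of 16). The type name `PhotonBlock` and the flat member names are hypothetical. It is an assumption that these thirteen float components are what the 52 B per photon in the bandwidth estimate refers to, although 13 x 4 B does match that figure.

```cpp
#include <cstddef>

// Hypothetical structure-of-arrays block holding DIM photons.
// Each component is stored as its own contiguous array, so for
// float and DIM = 16 every array fills exactly one 64-byte cache line.
template <typename T, std::size_t DIM = 16>
struct alignas(64) PhotonBlock {
    T emissPntX[DIM];      // emission point
    T emissPntY[DIM];
    T emissPntZ[DIM];
    T centOfCurvX[DIM];    // mirror centre of curvature
    T centOfCurvY[DIM];
    T centOfCurvZ[DIM];
    T virtDetPointX[DIM];  // virtual detection point
    T virtDetPointY[DIM];
    T virtDetPointZ[DIM];
    T radius[DIM];         // mirror radius
    T sphReflPointX[DIM];  // output: reflection point on the sphere
    T sphReflPointY[DIM];
    T sphReflPointZ[DIM];
};

using Block16f = PhotonBlock<float, 16>;

// 13 components x 16 photons x 4 bytes = 832 bytes = 13 cache lines.
static_assert(sizeof(Block16f) == 13 * 16 * sizeof(float), "no padding expected");
static_assert(alignof(Block16f) == 64, "block starts on a cache line");
static_assert(sizeof(Block16f) / 16 == 52, "52 bytes per photon");
```

Because every member array is exactly 64 bytes, each aligned vector load of one component touches a single cache line, which is the property the `load_a` calls in the solver rely on.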

Nanoseconds per photon
Theoretical limit: 52.0 B / 340 GB/s = 0.153 ns
For 33554432 photons, 1024 wg size, 256 threads:
  Old solver float: 1000.26 ns
  Agner Fog's Vectorclass: 1.04248 ns

--- a/QuarticSolverCacheline.h
+++ b/QuarticSolverCacheline.h
-  T reflPointX;
-  T reflPointY;
-  T reflPointZ;
+  T reflPointX __attribute__((__aligned__(64)));
+  T reflPointY __attribute__((__aligned__(64)));
+  T reflPointZ __attribute__((__aligned__(64)));
   reflPointX = ex + CoCX;
   reflPointY = ey + CoCY;
@@
   // TODO: align 64
   // FIXME: add const everywhere?
-  VECT emissionPointVecX;
+  VECT emissionPointVecX __attribute__((__aligned__(64)));
   emissionPointVecX.load_a(&data.emissPnt.x()[0]);
-  VECT emissionPointVecY;
+  VECT emissionPointVecY __attribute__((__aligned__(64)));
   emissionPointVecY.load_a(&data.emissPnt.y()[0]);
-  VECT emissionPointVecZ;
+  VECT emissionPointVecZ __attribute__((__aligned__(64)));
   emissionPointVecZ.load_a(&data.emissPnt.z()[0]);
-  VECT CoCX;
+  VECT CoCX __attribute__((__aligned__(64)));
   CoCX.load_a(&data.centOfCurv.x()[0]);
-  VECT CoCY;
+  VECT CoCY __attribute__((__aligned__(64)));
   CoCY.load_a(&data.centOfCurv.y()[0]);
-  VECT CoCZ;
+  VECT CoCZ __attribute__((__aligned__(64)));
   CoCZ.load_a(&data.centOfCurv.z()[0]);
@@
   VECT e2 = evecX*evecX + evecY*evecY + evecZ*evecZ;
   // vector from mirror centre of curvature to virtual detec
-  VECT virtDetPointVecX;
+  VECT virtDetPointVecX __attribute__((__aligned__(64)));
   virtDetPointVecX.load_a(&data.virtDetPoint.x()[0]);
-  VECT virtDetPointVecY;
+  VECT virtDetPointVecY __attribute__((__aligned__(64)));
   virtDetPointVecY.load_a(&data.virtDetPoint.y()[0]);
-  VECT virtDetPointVecZ;
+  VECT virtDetPointVecZ __attribute__((__aligned__(64)));
   virtDetPointVecZ.load_a(&data.virtDetPoint.z()[0]);
   // const Vector dvec( virtDetPoint - CoC );
@@ -220,7 +220,7 @@ namespace RichCacheline
-  VECT radius;
+  VECT radius __attribute__((__aligned__(64)));
   radius.load_a(&data.radius[0]);
-  VECT reflPointX;
-  VECT reflPointY;
-  VECT reflPointZ;
+  VECT reflPointX __attribute__((__aligned__(64)));
+  VECT reflPointY __attribute__((__aligned__(64)));
+  VECT reflPointZ __attribute__((__aligned__(64)));
--- a/main.cpp
+++ b/main.cpp
@@ -227,8 +227,8 @@ int main(int argc, char **argv)
-  VECTYPE::PhotonReflections<float> dataV0_vect;
-  VECTYPE::PhotonReflections<float> dataV1_vect;
+  VECTYPE::PhotonReflections<float> dataV0_vect __attribute__((__aligned__(64)));
+  VECTYPE::PhotonReflections<float> dataV1_vect __attribute__((__aligned__(64)));
diff --git a/vectype.h b/vectype.h
index 75c05bf..72db553 100644
--- a/vectype.h
+++ b/vectype.h
   template <typename T, std::size_t DIM = 16>
-  using PhotonReflections = std::vector<PhotonReflection<T, DIM>>;
+  using PhotonReflections = std::vector<PhotonReflection<T, DIM>, aligned_alloca
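An alignment attribute on a `std::vector` variable only aligns the vector object itself, not the heap buffer that holds the elements. That is why the element storage needs an aligning allocator as well. The allocator used in `vectype.h` is cut off on the slide, so the following is only a minimal sketch of what such an allocator can look like. The name `AlignedAllocator` and the helper `is_aligned` are hypothetical, and the sketch assumes C++17 for the aligned forms of `operator new` and `operator delete`.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Minimal allocator that hands out storage aligned to `Alignment` bytes.
// Assumption: C++17 (aligned operator new / operator delete).
template <typename T, std::size_t Alignment = 64>
struct AlignedAllocator {
    using value_type = T;

    static_assert((Alignment & (Alignment - 1)) == 0, "alignment must be a power of two");
    static_assert(Alignment >= alignof(T), "alignment must not be weaker than alignof(T)");

    AlignedAllocator() noexcept = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    // Explicit rebind is needed because of the non-type template parameter.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T), std::align_val_t(Alignment)));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t(Alignment));
    }
};

// All instances are interchangeable: the allocator carries no state.
template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return false; }

// True if `p` is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

A container declared as `std::vector<float, AlignedAllocator<float, 64>>` then starts its element buffer on a cache-line boundary, which aligned loads such as `load_a` require.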

Nanoseconds per photon
Theoretical limit: 52.0 B / 340 GB/s = 0.153 ns
For 33554432 photons, 1024 wg size, 256 threads:
  Old solver float: 1000.26 ns
  Agner Fog's Vectorclass: 1.04248 ns
  Aligned allocator: 0.946315 ns

Nanoseconds per photon
Theoretical limit: 52.0 B / 340 GB/s = 0.153 ns
For 33554432 photons, 1024 wg size, 256 threads:
  Old solver float: 1000.26 ns
  Agner Fog's Vectorclass: 1.04248 ns
  Aligned allocator: 0.946315 ns
  Const variables: 0.932545 ns

--- a/QuarticSolverCacheline.h
+++ b/QuarticSolverCacheline.h
@@ -81,8 +81,8 @@ namespace RichCacheline
-  const T divnorm = 1.0f/norm;
-  const T norm_sqrt = sqrt(norm);
+  const T divnorm = approx_recipr(norm);
+  const T norm_sqrt = approx_recipr(approx_rsqrt(norm));
   nx *= divnorm;
   ny *= divnorm;
   nz *= divnorm;
@@
-  const auto enorm = radius/e;
+  const auto enorm = radius*approx_recip
@@
-  VECT cosgamma2 = (evecDvec * evecDvec)/ed2;
+  VECT cosgamma2 = (evecDvec * evecDvec) * approx_recipr(ed2);
-  const VECT e = sqrt(e2);
-  const VECT d = sqrt(d2);
+  const VECT e = approx_recipr(approx_rsqrt(e2));
+  const VECT d = approx_recipr(approx_rsqrt(d2));
-  const VECT singamma = sqrt(1.0f - cosgamma2));
-  const VECT cosgamma = approx_recipr(approx_rsqrt(cosgamma2));
+  const VECT singamma = approx_recipr(approx_rsqrt(1.0f - cosgamma2));
+  const VECT cosgamma = approx_recipr(approx_rsqrt(cosgamma2));
@@
   const VECT maxval = std::numeric_limits<SKALART>::max();
-  const VECT inv_a0 = ((a0 > 0)? 1.0f/a0: maxval);
+  const VECT inv_a0 = ((a0 > 0)? approx_recipr(a0): maxval);
@@
-  const auto toberooted = (abs(R) + sqrt(abs(R2-Q3)) );
+  const auto toberooted = (abs(R) + approx_recipr(approx_rsqrt(abs(R2-Q3))));
   // FIXME: or store into a plain array first, then load?
   // FIXME: also for double?
@@
   const auto A = sgnR * rooted;
   PR(A);
-  const auto B = Q / A;
+  const auto B = Q * approx_recipr(A);
-  const auto u1 = -0.5 * (A + B) - rc / 3.0;
+  const auto u1 = -0.5 * (A + B) - rc * (1.0f / 3.0f);
   // FIXME: saturated or not?
   // const const auto u2 = UU * abs_saturated(A-B);
   const auto u2 = UU * abs(A-B);
-  const auto V = sqrt(u1*u1 + u2*u2);
+  const auto V = approx_recipr(approx_rsqrt(u1*u1 + u2*u2));
   // std::complex<TYPE> w3 = ( abs_satured(V) != 0.0 ? (TYPE)( qq * -0.125 ) / V :
   // std::complex<TYPE>(0,0) );
   // FIXME: why abs saturated when compared to 0.0?
-  const auto w3r = ((V != 0.0)? (qq * -0.125)/V : 0.0);
+  const auto w3r = ((V != 0.0)? (qq * -0.125)*approx_recipr(V) : 0.0);
   // TYPE res = std::real(w1) + std::real(w2) + std::real(w3) - (r4*a);
-  const auto res = sqrt((u1+V)*2) + w3r - (r4*a);
+  const auto res = approx_recipr(approx_rsqrt((u1+V)*2)) + w3r - (r4*a);
   // return the final result
   // FIXME: std::move ?
   const auto r = ((res > 1.0)? 1.0: ((res < -1.0)? -1.0: res));
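The diff replaces every `sqrt(x)` with `approx_recipr(approx_rsqrt(x))`, that is, the reciprocal of the reciprocal square root, and every division with a multiplication by an approximate reciprocal. The sketch below illustrates the same identity on a single float. To stay self-contained it uses the SSE estimate intrinsics `_mm_rsqrt_ss` and `_mm_rcp_ss` instead of the Vectorclass functions from the slides, and falls back to `std::sqrt` on non-x86 targets. The helper names `approx_sqrt` and `relative_error` are hypothetical.

```cpp
#include <cmath>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#endif

// sqrt(x) computed as 1 / (1 / sqrt(x)) from two hardware estimates.
// Each estimate is only accurate to roughly 12 bits, so the result
// carries a relative error on the order of 1e-3 at worst.
inline float approx_sqrt(float x) {
#if defined(__SSE__) || defined(_M_X64)
    const __m128 v = _mm_set_ss(x);
    const __m128 rsqrt = _mm_rsqrt_ss(v);   // approximately 1 / sqrt(x)
    const __m128 root = _mm_rcp_ss(rsqrt);  // approximately sqrt(x)
    return _mm_cvtss_f32(root);
#else
    return std::sqrt(x);  // portable fallback, exact rather than approximate
#endif
}

// Relative deviation of `approx` from a non-zero reference value.
inline float relative_error(float approx, float exact) {
    return std::fabs(approx - exact) / std::fabs(exact);
}
```

The trade is precision for throughput: the estimate instructions are much cheaper than a full-precision square root or division, but whether the reduced accuracy is acceptable for the Cherenkov angle has to be checked against the physics requirements, which the slides do not state.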

Nanoseconds per photon
Theoretical limit: 52.0 B / 340 GB/s = 0.153 ns
For 33554432 photons, 1024 wg size, 256 threads:
  Old solver float: 1000.26 ns
  Agner Fog's Vectorclass: 1.04248 ns
  Aligned allocator: 0.946315 ns
  Const variables: 0.932545 ns
  Approx. functions: 0.851242 ns
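To put the individual steps in proportion, the timings can be compared pairwise and against the bandwidth bound. The following helpers are a hypothetical sketch of that arithmetic and use only the numbers quoted on the slide.

```cpp
#include <cassert>

// Percentage of run time saved when going from `before_ns` to `after_ns`.
inline double percent_saved(double before_ns, double after_ns) {
    assert(before_ns > 0.0);
    return 100.0 * (before_ns - after_ns) / before_ns;
}

// How many times slower a measurement is than the theoretical limit.
inline double times_above_limit(double measured_ns, double limit_ns) {
    assert(limit_ns > 0.0);
    return measured_ns / limit_ns;
}
```

Applied to the table, the aligned allocator saves roughly 9 % over the plain Vectorclass version, the const variables a further 1.5 % or so, and the approximate functions close to another 9 %. The final 0.851242 ns is still about 5.6 times the 0.153 ns limit.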

--- a/QuarticSolverCacheline.h
+++ b/QuarticSolverCacheline.h
@@ -142,6 +142,18 @@ namespace RichCacheline {
+  __builtin_prefetch(&(((&data)+0)->radius[0]), 0, 3);
+  __builtin_prefetch(&(((&data+1)->emissPnt.x())[0]), 0, 3);
+  __builtin_prefetch(&(((&data+1)->emissPnt.y())[0]), 0, 3);
+  __builtin_prefetch(&(((&data+1)->emissPnt.z())[0]), 0, 3);
+  __builtin_prefetch(&(((&data+1)->centOfCurv.x())[0]), 0, 3);
+  __builtin_prefetch(&(((&data+1)->centOfCurv.y())[0]), 0, 3);
+  __builtin_prefetch(&(((&data+1)->centOfCurv.z())[0]), 0, 3);
+  __builtin_prefetch(&(((&data+1)->virtDetPoint.x())[0]), 0, 3);
+  __builtin_prefetch(&(((&data+1)->virtDetPoint.y())[0]), 0, 3);
+  __builtin_prefetch(&(((&data+1)->virtDetPoint.z())[0]), 0, 3);
   VECT emissionPointVecX __attribute__((__aligned__(64)));
   emissionPointVecX.load_a(&data.emissPnt.x()[0]);
   VECT emissionPointVecY __attribute__((__aligned__(64)));
@@
+  __builtin_prefetch(&data.sphReflPoint.x()[0], 1, 0);
+  __builtin_prefetch(&data.sphReflPoint.y()[0], 1, 0);
+  __builtin_prefetch(&data.sphReflPoint.z()[0], 1, 0);
   reflPointX.store_a(&data.sphReflPoint.x()[0]);
   reflPointY.store_a(&data.sphReflPoint.y()[0]);
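`__builtin_prefetch(addr, rw, locality)` is a GCC/Clang builtin: `rw` is 0 for an upcoming read and 1 for an upcoming write, and `locality` runs from 0 (no temporal reuse expected) to 3 (keep in all cache levels). The diff prefetches the inputs of the next block (`&data+1`) for reading with locality 3 and the output arrays for writing with locality 0. The sketch below shows the same prefetch-one-block-ahead pattern on a plain array. The function `sum_with_prefetch`, the macro `PREFETCH_READ`, and the block size of 16 floats are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Read prefetch with high temporal locality; a no-op on other compilers.
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(addr) __builtin_prefetch((addr), 0, 3)
#else
#define PREFETCH_READ(addr) ((void)0)
#endif

// 16 floats correspond to one 64-byte cache line.
constexpr std::size_t kBlock = 16;

// Sums the data block by block and requests the next block
// while the current one is being processed.
inline float sum_with_prefetch(const std::vector<float>& data) {
    float sum = 0.0f;
    const std::size_t n = data.size();
    for (std::size_t i = 0; i < n; i += kBlock) {
        if (i + kBlock < n) {
            PREFETCH_READ(&data[i + kBlock]);  // hint only, never changes results
        }
        const std::size_t end = (i + kBlock < n) ? i + kBlock : n;
        for (std::size_t j = i; j < end; ++j) {
            sum += data[j];
        }
    }
    return sum;
}
```

A prefetch is only a hint, so it cannot affect correctness. Whether it helps depends on how well the hardware prefetcher already handles the access pattern, which is why it is worth measuring as a separate step.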

