Vectorization of single particle Tube DistanceToIn


  1. Vectorization of single particle Tube DistanceToIn
     • A particle can hit a tube in three ways:
       – Top or bottom Z faces
         • There's no need to calculate both: only calculate the distance to the closest one (chosen by taking the absolute value of Z), since it is impossible to hit the other one
         • No benefit from vectorization
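The Z-face case can be sketched in scalar code as follows. This is a minimal illustration, not the USolids implementation: the function name `distanceToZFace` and the parameter `halfZ` (half-length of the tube, with faces assumed at z = ±halfZ) are hypothetical, and the caller is still assumed to check that the hit point lies between Rmin and Rmax and inside the phi range.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper for illustration; not the USolids API.
// z, dirZ: particle position and direction component along the tube axis.
// halfZ:   assumed half-length of the tube (faces at z = +halfZ and z = -halfZ).
// Returns the distance along the trajectory to the plane of the nearest Z face,
// or infinity if the particle cannot reach a Z face from outside.
inline double distanceToZFace(double z, double dirZ, double halfZ) {
    const double kInf = std::numeric_limits<double>::infinity();
    const double absZ = std::fabs(z);  // folds the top and bottom faces into one case

    // Already within the Z extent: any entry is through a cylinder or a phi plane.
    if (absZ <= halfZ) return kInf;

    // Moving away from the tube, or parallel to the faces.
    if (z * dirZ >= 0.0) return kInf;

    return (absZ - halfZ) / std::fabs(dirZ);
}
```

Because only one face is ever a candidate, there is a single division per particle and nothing to pair up in a vector, which matches the "no benefit from vectorization" remark.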

  2. – Inner or outer cylinders
       • Involves solving the set of equations:
         – $x'^2 + y'^2 = r^2$
         – $x + \text{dist} \cdot \text{dir.x} = x'$ (similarly for $y$)
         – Substituting gives a quadratic equation of the form $a t^2 + b t + c = 0$, where the unknown $t$ is dist
         – Can solve for both Rmin and Rmax at the same time using vectorization
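Substituting the trajectory into the circle equation gives a = dir.x² + dir.y², b = 2(x·dir.x + y·dir.y) and c = x² + y² − r², so only c differs between Rmin and Rmax. The sketch below shows that lane-wise structure with a plain two-element `std::array` standing in for a SIMD vector; it does not use Vc, and the names `Lanes`, `CylinderRoots` and `solveCylinders` are hypothetical.

```cpp
#include <array>
#include <cmath>
#include <limits>

// Hypothetical types for illustration; not the USolids or Vc API.
// Lane 0 holds the Rmin result, lane 1 the Rmax result.
using Lanes = std::array<double, 2>;

struct CylinderRoots {
    Lanes nearRoot;  // smaller root of the quadratic, per lane
    Lanes farRoot;   // larger root of the quadratic, per lane
};

inline CylinderRoots solveCylinders(double x, double y, double dirX, double dirY,
                                    double rmin, double rmax) {
    const double kInf = std::numeric_limits<double>::infinity();
    const Lanes r = {{rmin, rmax}};

    CylinderRoots out;
    out.nearRoot.fill(kInf);
    out.farRoot.fill(kInf);

    // a and b do not depend on the radius, so they are shared by both lanes.
    const double a = dirX * dirX + dirY * dirY;
    const double b = 2.0 * (x * dirX + y * dirY);
    if (a == 0.0) return out;  // travelling parallel to the axis: no intersection

    // Same arithmetic in both lanes; misses are handled by a select, not a branch,
    // which is the shape a two-lane SIMD version would take.
    for (int i = 0; i < 2; ++i) {
        const double c = x * x + y * y - r[i] * r[i];
        const double disc = b * b - 4.0 * a * c;
        const bool hit = disc >= 0.0;
        const double s = std::sqrt(hit ? disc : 0.0);
        out.nearRoot[i] = hit ? (-b - s) / (2.0 * a) : kInf;
        out.farRoot[i]  = hit ? (-b + s) / (2.0 * a) : kInf;
    }
    return out;
}
```

For a particle at (−10, 0) moving along +x with Rmin = 2 and Rmax = 5, the roots are 8 and 12 for the inner cylinder and 5 and 15 for the outer one. Which root counts as the entry point, and whether the hit lies within the Z and phi limits, is left to the caller in this sketch.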

  3. – Two phi planes
       • Calculate the intersection between two vectors: the trajectory of the particle and the vector of the phi plane
       • Can calculate both distances at the same time using vectorization
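In the XY projection each phi plane is a ray from the axis along (cos φ, sin φ), so the distance follows from a 2D line intersection. The sketch below computes both distances with identical per-lane arithmetic, again using a two-element `std::array` in place of a SIMD vector. The names `distanceToPhiPlanes`, `phiStart` and `phiEnd` are hypothetical, and the check that the hit lies between Rmin and Rmax and within the Z extent is assumed to happen elsewhere.

```cpp
#include <array>
#include <cmath>
#include <limits>

using Lanes = std::array<double, 2>;  // lane 0: starting phi plane, lane 1: ending phi plane

// Hypothetical helper for illustration; not the USolids or Vc API.
// Returns, per plane, the distance along the trajectory to the phi half-plane,
// or infinity if the plane is behind the particle, parallel to the trajectory,
// or hit on the opposite side of the axis.
inline Lanes distanceToPhiPlanes(double x, double y, double dirX, double dirY,
                                 double phiStart, double phiEnd) {
    const double kInf = std::numeric_limits<double>::infinity();
    const Lanes phi = {{phiStart, phiEnd}};
    Lanes dist;

    for (int i = 0; i < 2; ++i) {
        const double c = std::cos(phi[i]);
        const double s = std::sin(phi[i]);

        // Components along the plane normal (-s, c): offset of the point from
        // the plane, and rate at which the trajectory changes that offset.
        const double offset = y * c - x * s;
        const double rate = dirY * c - dirX * s;
        const bool crosses = rate != 0.0;
        const double d = crosses ? -offset / rate : kInf;

        // Position of the hit point along the in-plane direction (c, s);
        // a negative value means the hit is on the far side of the axis.
        const double along = (x + d * dirX) * c + (y + d * dirY) * s;

        const bool valid = crosses && d >= 0.0 && along >= 0.0;
        dist[i] = valid ? d : kInf;
    }
    return dist;
}
```

For example, a particle at (4, −3) with direction (−0.6, 0.8) reaches the φ = 0 plane after 3.75 and the φ = π/2 plane after about 6.67; reversing the direction gives infinity for both.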

  4. – An important observation: each time, we fill the vectors with only two elements
       • In AVX, two of the four double-precision slots in the vector will be empty!
         – Should not be a problem in most cases, alas…
         – DIVPD latency: 10-20 cycles
         – VDIVPD latency: 19-35 cycles
         – SQRTPD latency: 8-14 cycles
         – VSQRTPD latency: 16-28 cycles
         – (cycle counts for Haswell)
       • In which case we would want to emit SSE instructions, not AVX!
         – No ability to switch between instruction sets in Vc "on the fly"
         – Would need to compile for SSE (or wait until Vc 1.0 adds support for SIMD arrays of arbitrary length)
           » Would still need to be careful: AVX-SSE transition penalties (the vzeroupper instruction might help?)
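Since the instruction set is fixed at compile time, it can help to confirm which one a given build actually targets. The sketch below relies on the macros that GCC and Clang predefine for flags such as `-msse4.2` and `-mavx`; the function name `compiledSimdLevel` is hypothetical, and other compilers may not define all of these macros.

```cpp
#include <string>

// Hypothetical helper for illustration. Reports the x86 SIMD level this
// translation unit was compiled for, based on macros predefined by GCC and
// Clang (assumption: other compilers may define only some of them).
inline std::string compiledSimdLevel() {
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "scalar or non-x86";
#endif
}
```

Printing this value once at start-up in a benchmark makes it harder to mix up an SSE 4.2 build with an AVX build when comparing timings.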

  5. • Compiling for SSE 4.2 performed around 10-15% faster than AVX in this particular instance
     Speedups
     • Averaged over 1000 repetitions
     • 50% hit rate, no points inside
     • Turbo off, pinned to core #2
     • CPU governor set to maximum

  6. One-particle vectorized DistanceToIn vs USolids
     [Plot: speedup (0 to 4) versus number of particles (10 to 10000)]
