SLIDE 1

Outline

1.) N-Body Methods
2.) Dynamic Programming
3.) Sparse Linear Algebra
4.) Unstructured Grids
5.) Conclusion

SLIDE 2

N-Body Methods Overview

SLIDE 3

N-Body Methods

  • Particle simulation
    – Large number of simple entities
    – Each entity depends on every other entity
    – Regularly structured FP computations
  • Xeon Phi offers acceptable performance
    – Regularity leads to good auto-vectorization
    – Cache misses result in very high memory latency
      • L1 misses mostly lead to L2 misses
SLIDE 4

N-Body platform comparison

Source: [3]

SLIDE 5

Dynamic Programming Overview

SLIDE 6

Dynamic Programming

  • Problem is divided into smaller subproblems
    – Each smaller (and simpler) problem is solved first
    – Results are combined to solve larger problems
    – Often used for sequence alignment algorithms
  • Example: DNA sequence alignment
    – Xeon Phi 5110P vs. two GPU configurations (GeForce GTX 480, K20c)
      • Outperforms the GTX 480
      • Reaches 74.3% of the K20c's performance
    – Computation depends heavily on peak integer performance
      • Xeon Phi has ~57.3% of the K20c's peak integer performance
SLIDE 7

Sparse Linear Algebra Overview

SLIDE 8

Sparse Linear Algebra

  • Matrices with scattered non-zero entries
  • Linear system solvers, eigensolvers
  • Different matrix structures
    – Patterns can arise
    – Irregularities
      • Branch mispredictions
      • Vector registers contain zero entries
SLIDE 9

Case Study: Sparse Matrix-Vector Operations

  • Large sparse matrix A
  • Dense vector x
  • Multiply-add operations
  • Simple instructions, large quantity
  • Parallel execution
  • Access patterns

Image Source: [2]

SLIDE 10

Bottleneck Considerations

  • Bandwidth limits
    – Memory bandwidth (SIMD efficiency, vector register utilization)
    – Computation bandwidth (core utilization)
  • Memory latency

SLIDE 11

Algorithm Showcase

  • CSR / SELLPACK format
    – Column blocks
    – Finite-window sorting
  • Adaptive load balancing

Image Source: [1]

SLIDE 12

Evaluation: Effects of individual improvements

Source: [1]

SLIDE 13

Evaluation: Platform comparison

Source: [1]

SLIDE 14

Evaluation: Platform comparison

Matrices from the UFL Sparse Matrix Collection (/20); matrix 8 has many more non-zero entries than the rest.

[GFlop/s] Based on data from [2]

SLIDE 15

Conclusion: Sparse Linear Algebra on Xeon Phi

  • High potential in sparse linear algebra
    – Wide vector registers
    – High number of cores
  • Main problem points:
    – Irregular access patterns
    – Sparse data structures
    – High memory latency on L2 cache misses
  • Performance extremely dependent on the data
SLIDE 16

Unstructured Grids

Image Source: [4]

SLIDE 17

Image Source: [5]

SLIDE 18

Image Source: [6]

SLIDE 19

Unstructured Grids

  • Partitioning the grid (few connections between partitions)
  • Irregular access patterns
    => bad for auto-vectorization
  • Software prefetching of grid cells
  • Ideal prefetch amount:
    MM latency * MM bandwidth

SLIDE 20

Unstructured Grids

  • Data races cannot be determined statically
  • "Loop over edges and access data on edges" => data races

SLIDE 21

Airfoil Benchmark

  • 2D inviscid airfoil code
  • Considered bandwidth-bound
  • 2,800,000-cell mesh

Image Source: [7]

SLIDE 22

Competitors

[Chart: 2 × Xeon E5-2680 vs. Xeon Phi 5110P vs. Tesla K40, compared on price ($), number of cores, memory bandwidth, double-precision GFlops, and last-level cache]

SLIDE 23

Airfoil on Xeon Phi

  • At the highest level: Message Passing Interface (MPI)
  • At the lower level: OpenMP + vector intrinsics

SLIDE 24

Xeon Phi Benchmark graph

Image Source: [8]

SLIDE 25

Benchmark graph

Image Source: [8]

SLIDE 26

Unstructured Grids conclusion

  • Benchmark probably not fair (price difference)
  • Do manual vectorization: speedup of 1.7-1.82
  • Use MPI + OpenMP

SLIDE 27

Who did what

Philipp Bartels:

  • Architecture introduction: slides 18 and 1 to 8
  • Presentation team x2: slides 1 and 16 to 27

Eugen Seljutin:

  • Presentation team x2: slides 2 to 15

SLIDE 28

Credits

[1] Xing Liu et al.: Efficient Sparse Matrix-Vector Multiplication on x86-Based Many-Core Processors

[2] Erik Saule et al.: Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi

[3] Konstantinos Krommydas et al.: On the Characterization of OpenCL Dwarfs on Fixed and Reconfigurable Platforms

[4] http://en.wikipedia.org/wiki/Unstructured_grid

[5] http://view.eecs.berkeley.edu/wiki/Unstructured_Grids

[6] https://cfd.gmu.edu/~jcebral/gallery/vis04/index.html

SLIDE 29

Credits

[7] http://en.wikipedia.org/wiki/File:Airfoil_with_flow.png

[8] I. Z. Reguly et al.: Vectorizing Unstructured Mesh Computations for Many-core Architecture