Decoupled Access/Execute Metaprogramming Anton Lokhmotov, Lee - PowerPoint PPT Presentation

Decoupled Access/Execute Metaprogramming
Anton Lokhmotov, Lee Howes, Paul H.J. Kelly (Imperial); Alastair F. Donaldson (Oxford/Codeplay)
University of Birmingham, 3 July 2009

Recent meeting on accelerated computing at Imperial (35–40 attendees, summarised by affiliation):
- Computing (software optimisation, cognitive robotics, visual information processing, reconfigurable computing)
- Electrical Engineering (reconfigurable computing, design automation)
- Mechanical Engineering (multiscale flow dynamics, vibration technology)
- Earth Science and Engineering (applied modelling & computation)
- Physics (plasma, experimental solid state)
- Chemistry (computational, biological & biophysical)
- Biomedical Engineering
- Chemical Engineering
- Civil Engineering
- Aeronautics

A. Lokhmotov, L. Howes, A.F. Donaldson, P.H.J. Kelly Decoupled Access/Execute Metaprogramming

Berkeley motifs ("dwarfs"):
- Dense Linear Algebra
- Sparse Linear Algebra
- N-Body Methods
- Spectral Methods
- Structured Grids
- Unstructured Grids
- MapReduce
- Combinational Logic
- Graph Traversal
- Dynamic Programming
- Backtrack and Branch-and-Bound
- Graphical Models
- Finite State Machines

Why is accelerator programming challenging?
Accelerator hardware:
- hundreds of functional units
- software-managed memory hierarchies, e.g. host memory (main memory), device global memory (on-board), device local memory (on-chip)
Accelerator software:
- low-level, hence unproductive
- architecture-specific, hence non-portable

The fundamental software engineering challenge
How can we use accelerator technology while keeping software maintainable, composable, reusable and portable?

Decoupled Access/Execute (Æcute) model
Decoupled Access/Execute metaprogramming:
- kernel code is written for a uniform memory
- execute metadata describe execution constraints
- access metadata describe the memory access pattern
- the metadata are part of the kernel's interface specification
Goals:
- robust translation into efficient low-level code
- ample opportunities for optimisation
- convenience and flexibility

Execute metadata
Execute metadata for a kernel is a tuple $E = (I, R, P)$, where:
- $I \subset \mathbb{Z}^n$ is a finite, $n$-dimensional iteration space, for some $n > 0$;
- $R \subseteq I \times I$ is a precedence relation such that $(i_1, i_2) \in R$ iff iteration $i_1$ must be executed before iteration $i_2$;
- $P$ is a partition of $I$ into a set of non-empty, disjoint iteration subspaces: $P = \{I_k\}$, where $I = \bigcup_k I_k$ and $I_i \cap I_j = \emptyset$ for $i \neq j$.
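
The partition conditions can be checked mechanically. The C++ sketch below is our own illustration and is not part of Æcute. The helper names `iterationSpace`, `tile` and `isPartition` are hypothetical. The sketch enumerates a 2D iteration space and splits it into rectangular tiles, where tiles at the edges may be smaller. It then tests that the tiles are non-empty, pairwise disjoint and cover I exactly.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// A point (y, x) of a 2D iteration space, and a subspace as a list of points.
using Point = std::pair<int, int>;
using Subspace = std::vector<Point>;

// I = {(y, x) : y0 <= y < y1, x0 <= x < x1}.
std::set<Point> iterationSpace(int y0, int y1, int x0, int x1) {
    std::set<Point> space;
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x)
            space.insert({y, x});
    return space;
}

// Split the same rectangle into h x w tiles; edge tiles may be partial.
std::vector<Subspace> tile(int y0, int y1, int x0, int x1, int h, int w) {
    std::vector<Subspace> tiles;
    for (int ty = y0; ty < y1; ty += h)
        for (int tx = x0; tx < x1; tx += w) {
            Subspace t;
            for (int y = ty; y < std::min(ty + h, y1); ++y)
                for (int x = tx; x < std::min(tx + w, x1); ++x)
                    t.push_back({y, x});
            tiles.push_back(t);
        }
    return tiles;
}

// P is a partition of I iff every subspace is non-empty, no point lies in
// two subspaces (or outside I), and the union of all subspaces is I.
bool isPartition(const std::set<Point>& space, const std::vector<Subspace>& parts) {
    std::set<Point> seen;
    for (const Subspace& part : parts) {
        if (part.empty()) return false;
        for (const Point& p : part) {
            if (space.count(p) == 0) return false;     // point outside I
            if (!seen.insert(p).second) return false;  // subspaces overlap
        }
    }
    return seen == space;                              // union covers I
}
```

For example, take H = 10, W = 9, K = 1 and 3 × 4 tiles. The 8 × 7 iteration space then splits into ⌈8/3⌉ · ⌈7/4⌉ = 6 tiles, and dropping any one of them breaks the covering condition.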

Access metadata
Access metadata for a kernel is a tuple $A = (M_r, M_w)$, where $M$ is the set of memory locations and $\mathcal{P}(M)$ is its power set:
- $M_r : I \to \mathcal{P}(M)$ specifies the set of memory locations $M_r(i)$ that may be read on iteration $i \in I$;
- $M_w : I \to \mathcal{P}(M)$ specifies the set of memory locations $M_w(i)$ that may be written on iteration $i \in I$.
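
Access metadata let a tool check, rather than assume, that iterations are independent. The following is a minimal sketch under a few assumptions: memory locations are encoded as (array name, index) pairs, a 1D iteration space is used for brevity, and all names are hypothetical. If no iteration writes a location that another iteration reads or writes, the iterations carry no dependences, and an empty precedence relation $R = \emptyset$ is safe.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Hypothetical encoding of a memory location: (array name, element index).
using Location = std::pair<std::string, int>;
// Access metadata for a 1D iteration space: iteration -> set of locations.
using AccessFn = std::function<std::set<Location>(int)>;

// True if no iteration in [first, last) writes a location that a different
// iteration reads or writes; then no ordering between iterations is needed.
bool independent(int first, int last, const AccessFn& reads, const AccessFn& writes) {
    for (int i = first; i < last; ++i) {
        const std::set<Location> wi = writes(i);
        for (int j = first; j < last; ++j) {
            if (i == j) continue;
            const std::set<Location> rj = reads(j);
            const std::set<Location> wj = writes(j);
            for (const Location& loc : wi)
                if (rj.count(loc) > 0 || wj.count(loc) > 0)
                    return false;
        }
    }
    return true;
}

// Three-point stencil: iteration i may read in[i-1], in[i] and in[i+1].
AccessFn stencilReads(const std::string& in) {
    return [in](int i) { return std::set<Location>{{in, i - 1}, {in, i}, {in, i + 1}}; };
}

// Point-wise write: iteration i writes out[i].
AccessFn pointWrite(const std::string& out) {
    return [out](int i) { return std::set<Location>{{out, i}}; };
}
```

An out-of-place stencil, which reads one array and writes another, passes the check. The same stencil updating a single array in place fails, because iteration i writes a location that iterations i − 1 and i + 1 read.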

Example: 2D convolution
$$O_{y,x} = \sum_{u=-K}^{K} \sum_{v=-K}^{K} C_{u,v} \cdot I_{y+u,\,x+v}, \qquad K \le y < H-K,\ K \le x < W-K$$
where I is the input image, O the output image, C the coefficients, W the image width, H the image height and K the neighbourhood radius.
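
A plain reference implementation of the formula is useful when testing any tiled or offloaded version. This sketch departs from the slides in two ways: it uses `float` pixels instead of `rgb`, and it stores data in flat row-major `std::vector`s. The name `convolve2D` is hypothetical, and $C_{u,v}$ is stored at offset $(u+K)(2K+1) + (v+K)$.

```cpp
#include <cassert>
#include <vector>

// Reference 2D convolution:
//   O[y][x] = sum_{u=-K..K} sum_{v=-K..K} C[u][v] * I[y+u][x+v]
// for K <= y < H-K and K <= x < W-K. I and O are H x W, row-major.
// Pixels of O outside the iteration space are left untouched.
void convolve2D(const std::vector<float>& I, std::vector<float>& O,
                const std::vector<float>& C, int H, int W, int K) {
    const int D = 2 * K + 1;  // width of the coefficient matrix
    for (int y = K; y < H - K; ++y)
        for (int x = K; x < W - K; ++x) {
            float sum = 0.0f;
            for (int u = -K; u <= K; ++u)
                for (int v = -K; v <= K; ++v)
                    sum += C[(u + K) * D + (v + K)] * I[(y + u) * W + (x + v)];
            O[y * W + x] = sum;
        }
}
```

With an all-ones image and all-ones 3 × 3 coefficients, every interior output is 9. A filter whose only non-zero coefficient is $C_{0,1} = 1$ shifts the image one column to the left.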

Memory access of a single (y, x) iteration (K = 1)
$$O_{y,x} = \sum_{u=-K}^{K} \sum_{v=-K}^{K} C_{u,v} \cdot I_{y+u,\,x+v}$$
[Figure: iteration (y, x) reads a 3 × 3 region of I centred on (y, x) and all of C (3 × 3). It writes a 1 × 1 region of O, the element $O_{y,x}$.]

Æcute specification (h × w rectangular tiling)
$$O_{y,x} = \sum_{u=-K}^{K} \sum_{v=-K}^{K} C_{u,v} \cdot I_{y+u,\,x+v}$$
Execute metadata (I, R, P):
$$I = \{(y, x) : K \le y < H-K,\ K \le x < W-K\}$$
$$R = \emptyset \quad \text{(no iteration must precede another)}$$
$$P = \Big\{ \{(y, x) \in I : h(j-1) \le y-K < hj,\ w(i-1) \le x-K < wi\} : 1 \le j \le \lceil (H-2K)/h \rceil,\ 1 \le i \le \lceil (W-2K)/w \rceil \Big\}$$
Access metadata $(M_r, M_w)$, for each iteration $(y, x) \in I$:
$$M_r(y, x) = \{ I_{y+u,\,x+v},\ C_{u,v} : -K \le u, v \le K \}$$
$$M_w(y, x) = \{ O_{y,x} \}$$
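
The tiling determines how much data each tile must move. Assume every tile is full, that is, h divides H − 2K and w divides W − 2K. Then a tile of h × w iterations reads an (h + 2K) × (w + 2K) block of I: the tile itself plus a halo of width K. The helper names below are hypothetical, and `totalReads` is only exact under the full-tile assumption.

```cpp
#include <cassert>

// Integer ceiling division for positive operands.
constexpr int ceilDiv(int a, int b) { return (a + b - 1) / b; }

// Number of tiles in P: ceil((H-2K)/h) * ceil((W-2K)/w).
constexpr int tileCount(int H, int W, int K, int h, int w) {
    return ceilDiv(H - 2 * K, h) * ceilDiv(W - 2 * K, w);
}

// Elements of the input image read by one full h x w tile:
// the tile plus a halo of K elements on every side.
constexpr int tileReadFootprint(int h, int w, int K) {
    return (h + 2 * K) * (w + 2 * K);
}

// Total input elements transferred if each (full) tile fetches its own footprint.
constexpr int totalReads(int H, int W, int K, int h, int w) {
    return tileCount(H, W, K, h, w) * tileReadFootprint(h, w, K);
}
```

Consider H = W = 10, K = 1 and 4 × 4 tiles. There are 4 tiles, each reading 36 elements of I, so 144 elements are transferred for an image of 100 elements. Neighbouring tiles reread their shared halo strips, which is the kind of redundancy that exploiting data reuse can remove.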

Data: $I_{i,j}, O_{i,j}$ for $0 \le i < H$, $0 \le j < W$; $C_{i,j}$ for $-K \le i, j \le K$
C++:
    rgb I[H][W];
    rgb O[H][W];
    rgb C[2*K+1][2*K+1];
Æcute (data wrappers):
    Array2D<rgb> arrayI(&I[0][0], W, H);
    Array2D<rgb> arrayO(&O[0][0], W, H);
    Array2D<rgb> arrayC(&C[0][0], 2*K+1, 2*K+1);
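
Wrapping a built-in 2D array relies on row-major layout. For an array of H rows and W columns, element (y, x) lives at offset y * W + x from the pointer to its first element. The class below is a hypothetical stand-in that only mirrors the constructor shape used on the slide (pointer, width, height). The real Æcute `Array2D` may have a different interface.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical row-major 2D view over existing storage (not the Æcute class).
template <typename T>
class Array2D {
public:
    Array2D(T* data, int width, int height) : data_(data), width_(width), height_(height) {}

    // Element in row y, column x: row-major offset y * width + x.
    T& operator()(int y, int x) {
        return data_[static_cast<std::size_t>(y) * width_ + x];
    }
    int width() const { return width_; }
    int height() const { return height_; }

private:
    T* data_;     // non-owning pointer to the first element
    int width_;   // number of columns
    int height_;  // number of rows
};
```

Writes through the view update the underlying array, because the view stores only a pointer.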

Iteration space: $K \le y < H-K$, $K \le x < W-K$
C++:
    for (y = K; y < H-K; ++y)
      for (x = K; x < W-K; ++x)
        // Kernel code for each (y,x)
Æcute (execute metadata):
    IterationSpace1D y(K, H-K);
    IterationSpace1D x(K, W-K);
    IterationSpace2D iterYX(y, x);
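
In plain C++, tiling this iteration space (as `iterYX.tile(h, w)` requests) corresponds to splitting each loop into a loop over tiles and a loop within a tile. The sketch below assumes that partial tiles at the edges are allowed, and `forEachTiled` is a hypothetical name. Because the convolution's precedence relation R is empty, visiting the iterations in tile order does not change the result.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Visit every (y, x) with K <= y < H-K and K <= x < W-K, tile by tile.
// The outer two loops step over h x w tiles; the inner two loops cover one
// tile, clamped with std::min so that partial edge tiles are handled.
template <typename Body>
void forEachTiled(int H, int W, int K, int h, int w, Body body) {
    for (int ty = K; ty < H - K; ty += h)
        for (int tx = K; tx < W - K; tx += w)
            for (int y = ty; y < std::min(ty + h, H - K); ++y)
                for (int x = tx; x < std::min(tx + w, W - K); ++x)
                    body(y, x);
}
```

The tiled traversal visits exactly the points of the untiled loop nest, only in a different order.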

Access regions: implicit in C++, explicit in Æcute
C++:
    // Kernel code for each (y,x)
    rgb sum(0.0f, 0.0f, 0.0f);
    for (u = -K; u <= K; ++u)
      for (v = -K; v <= K; ++v)
        sum += C[K+u][K+v] * I[y+u][x+v]; // read from C and I
    O[y][x] = sum;                        // write to O
Æcute (access metadata):
    // Access descriptors
    Neighbourhood2D_R accessI(iterYX, arrayI, K);
    Point2D_W accessO(iterYX, arrayO);
    All_R accessC(iterYX, arrayC);

Kernel code
C++:
    // Kernel code for each (y,x)
    int u, v;
    rgb sum(0.0f, 0.0f, 0.0f);
    for (u = -K; u <= K; ++u)
      for (v = -K; v <= K; ++v)
        sum += C[K+u][K+v] * I[y+u][x+v];
    O[y][x] = sum;
Æcute (kernel method):
    void kernel(const IterationSpace2D::iterator &it) {
      int u, v;
      rgb sum(0.0f, 0.0f, 0.0f);
      for (u = -K; u <= K; ++u)
        for (v = -K; v <= K; ++v)
          sum += accessC(u, v) * accessI(it, u, v);
      accessO(it) = sum;
    }
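
The kernel reaches memory only through access descriptors. A runtime is therefore free to copy each tile's footprint into local memory first and give the kernel an accessor over that copy. The sketch below imitates this with two hypothetical accessors over a `float` image:
- `GlobalNeighbourhood` reads the original array.
- `StagedNeighbourhood` reads a staged copy of a rectangle.

This is not Æcute's implementation. In particular, its `(y, x, u, v)` call signature stands in for the slide's `accessI(it, u, v)`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical accessor over global (uniform) memory: row-major image of width W.
struct GlobalNeighbourhood {
    const std::vector<float>& img;
    int W;
    float operator()(int y, int x, int u, int v) const { return img[(y + u) * W + (x + v)]; }
};

// Hypothetical accessor over a copy of rows [y0, y1) and columns [x0, x1),
// standing in for a tile footprint staged into local memory. It translates
// global coordinates (y+u, x+v) into offsets within the copy.
struct StagedNeighbourhood {
    int y0, x0, rows, cols;
    std::vector<float> local;

    StagedNeighbourhood(const std::vector<float>& img, int W,
                        int y0_, int y1, int x0_, int x1)
        : y0(y0_), x0(x0_), rows(y1 - y0_), cols(x1 - x0_), local(rows * cols) {
        for (int r = 0; r < rows; ++r)  // copy the footprint one row at a time
            std::copy(img.begin() + (y0 + r) * W + x0,
                      img.begin() + (y0 + r) * W + x1,
                      local.begin() + r * cols);
    }
    float operator()(int y, int x, int u, int v) const {
        return local[(y + u - y0) * cols + (x + v - x0)];
    }
};

// Kernel body written once against the accessor interface; it never sees
// where the data lives. C holds C_{u,v} at (u+K)*(2K+1) + (v+K).
template <typename InAccess>
float convKernel(int y, int x, int K, const InAccess& in, const std::vector<float>& C) {
    const int D = 2 * K + 1;
    float sum = 0.0f;
    for (int u = -K; u <= K; ++u)
        for (int v = -K; v <= K; ++v)
            sum += C[(u + K) * D + (v + K)] * in(y, x, u, v);
    return sum;
}
```

The same `convKernel` body gives identical results with either accessor, because both return the same values in the same order. Only the accessor knows where the data lives.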

Bringing it all together
    // Data wrappers
    Array2D<rgb> arrayI(&I[0][0], W, H);
    Array2D<rgb> arrayO(&O[0][0], W, H);
    Array2D<rgb> arrayC(&C[0][0], 2*K+1, 2*K+1);
    // Execute metadata
    IterationSpace1D y(K, H-K);
    IterationSpace1D x(K, W-K);
    IterationSpace2D iterYX(y, x);
    // Access metadata
    Neighbourhood2D_R accessI(iterYX, arrayI, K);
    Point2D_W accessO(iterYX, arrayO);
    All_R accessC(iterYX, arrayC);
    // Filter initialisation and execution
    ConvolutionFilter2D conv(iterYX, accessI, accessO, accessC);
    iterYX.tile(h, w);
    conv.run();

Æcute metadata benefits
- data movement synthesis and optimisation (e.g. software pipelining and exploiting data reuse)
- machine-independent abstraction, machine-dependent tuning (via partitioning)
- potential for inter-kernel optimisations (e.g. loop fusion and array contraction)
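
Loop fusion with array contraction can be illustrated by hand. Consider a hypothetical two-kernel pipeline in which iteration i of the second kernel reads only the element of the temporary array that iteration i of the first kernel writes. This is exactly what point-wise access metadata would state. The two loops can therefore be merged and the temporary array reduced to a scalar. The function names are hypothetical, and the transformation below is written out manually rather than produced by any tool.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two kernels run one after the other: t = s * a, then out = t + b.
// The temporary array t is written by the first kernel and read by the second.
std::vector<float> twoKernels(const std::vector<float>& a, const std::vector<float>& b, float s) {
    std::vector<float> t(a.size()), out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) t[i] = s * a[i];      // kernel 1
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = t[i] + b[i]; // kernel 2
    return out;
}

// Fused version: iteration i of kernel 2 reads only t[i], which iteration i of
// kernel 1 writes, so the loops merge and t is contracted to a scalar.
std::vector<float> fusedContracted(const std::vector<float>& a, const std::vector<float>& b, float s) {
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float t = s * a[i];  // no temporary array is allocated
        out[i] = t + b[i];
    }
    return out;
}
```

The fused loop makes a single pass over the data and never allocates the intermediate array.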
