
Neural Network Assisted Tile Size Selection
Mohammed Rahman, Louis-Noël Pouchet and P. Sadayappan
Dept. of Computer Science and Engineering, Ohio State University
June 22, 2010, iWAPT 2010 Workshop, Berkeley, USA

Introduction: Overview
Situation:
◮ New advances in parametric tiling → more user code to be tuned
◮ The problem of tile size selection is complex and unsolved!
Our approach:
◮ Use machine learning to build a predictor of performance as a function of tile size, for a specific program
◮ Rely on the shape of the performance distribution to extract promising subspaces for empirical search
◮ Outcome: < 2% of the space traversed → 90+% of maximal speedup achieved

Problem Statement: Tiling
◮ Tiling partitions the computation into blocks
◮ Note that we consider only rectangular tiling here
◮ For tiling to be legal, the resulting partitioning must preserve the data dependences of the original program

Problem Statement: Parametric Tiling
Automatic parametric tiling [ICS’09,CGO’10]:
◮ Produces code where the tile dimensions are parameters
◮ Seamlessly finds and applies all transformations required to make the code tilable
◮ Actual tile sizes are given at run-time
  ◮ Very useful for tile size selection (no need to recompile)
◮ Recent progress has generalized the approach:
  ◮ Operates on arbitrary affine-control loops (imperfectly nested)
  ◮ Produces good quality code
  ◮ Even exposes pipeline parallelism if needed
◮ Software (from OSU): Pluto, PrimeTile/DynTile/PTile
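
To make "tile dimensions are parameters" concrete, here is a minimal hand-written sketch of a rectangularly tiled matrix product. It is not output from Pluto or PrimeTile; the function name tiled_matmul and the loop structure are illustrative assumptions. The point is only that Ti, Tj and Tk are ordinary run-time arguments, so a different tile size can be tried without recompiling.

```python
def tiled_matmul(A, B, n, Ti, Tj, Tk):
    """Multiply two n x n matrices (lists of lists) using rectangular tiles.

    Ti, Tj, Tk are run-time tile sizes (hypothetical parameter names).
    """
    C = [[0.0] * n for _ in range(n)]
    # Outer loops enumerate tile origins.
    for it in range(0, n, Ti):
        for jt in range(0, n, Tj):
            for kt in range(0, n, Tk):
                # Inner loops execute one tile; min() handles partial tiles
                # at the boundary when n is not a multiple of the tile size.
                for i in range(it, min(it + Ti, n)):
                    for j in range(jt, min(jt + Tj, n)):
                        acc = C[i][j]
                        for k in range(kt, min(kt + Tk, n)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C
```

Calling tiled_matmul(A, B, n, 8, 4, 500) and tiled_matmul(A, B, n, 1, 1, 1) yields the same product; only the traversal order, and therefore the memory behavior, changes.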

Problem Statement: Tile Size Selection
Problem: how to select the tile size that gives the best performance? Relevant factors include:
◮ data reuse within the execution of a tile;
◮ data reuse between tiles;
◮ the layout in memory of the data used in a tile;
◮ the relative penalty of misses at each level of the hierarchy, which is machine-dependent;
◮ the cache replacement policy;
◮ the interaction with other units, such as prefetching;
◮ the interaction with vectorization, to enable a profitable steady-state for the vectorized loop(s);
◮ ...

Problem Statement: Performance Distribution
Performance distribution of fdtd-2d and syr2k
[Figure: two plots, "dsyr2k: Performance Distribution with Tile Size Configurations" and "fdtd-2d: Performance Distribution with Tile Size Configurations"; x-axis: tile sizes (Ti:Tj:Tk), y-axis: execution time in seconds]
◮ Search space: 10648 possible tile sizes
  ◮ {1, 2, 4, 6, 8, 10, 12, 16, 30, 32, 40, 48, 64, 100, 128, 150, 200, 256, 300, 400, 500, 600}
◮ Machine: Core i7 (1 thread)
◮ 2 "standard" distribution shapes
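
The size of the search space follows from the candidate list: 22 values for each of the three tile dimensions gives 22^3 = 10648 combinations. A small sketch that enumerates the space (the names SIZES and tile_size_space are illustrative, not from the slides):

```python
from itertools import product

# Candidate sizes listed on the slide (22 values per tile dimension).
SIZES = [1, 2, 4, 6, 8, 10, 12, 16, 30, 32, 40, 48, 64, 100,
         128, 150, 200, 256, 300, 400, 500, 600]


def tile_size_space(sizes=SIZES, depth=3):
    """Return every (Ti, Tj, Tk) combination as a list of tuples."""
    return list(product(sizes, repeat=depth))


space = tile_size_space()
print(len(space))  # 22**3 = 10648
```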

Performance Prediction: Objectives
Correlate execution time with tile sizes
◮ (Static) performance models do exist...
◮ ... but fail to capture the interplay between all hardware components
◮ Usually better suited for well-known problems (e.g., uniform reuse + square tiles)
◮ Another view: pruning the space of poor-performing tile sizes
Our approach:
◮ Build a neural network to model the performance distribution
◮ Focus directly on the execution time
◮ ANN dedicated to a specific program + dataset size

Performance Prediction: Neural Network
Layout:
◮ Fully connected, multi-layer perceptron (MLP)
◮ Input layer: the tile sizes (Ti, Tj, Tk)
◮ Output layer: predicted execution time
◮ One hidden layer consisting of 30 hidden neurons
◮ Use Stuttgart Neural Network Simulator library
Training:
◮ Select 5% (530 tuples) from the search space of 10648
◮ Run the program on the machine using the tile size specified by the tuples
◮ Train with resilient back-propagation (rprop), using the actual execution time for a tuple
◮ Standard 10% cross-validation procedure
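
The layout above (3 inputs, 30 hidden neurons, 1 output, trained with a sign-based resilient update) can be sketched in plain Python. This is not the SNNS implementation used by the authors. The tanh hidden activation, the linear output, the log-scaled input encoding, the mean-squared-error loss and the simplified Rprop variant (no update on a gradient sign change) are all assumptions made for illustration, as are the names TinyMLP and encode.

```python
import math
import random


def encode(tile):
    """Assumed input scaling: log2 of each tile size, divided by 10."""
    return [math.log2(t) / 10.0 for t in tile]


class TinyMLP:
    """n_in inputs -> n_hidden tanh units -> 1 linear output."""

    def __init__(self, n_in=3, n_hidden=30, seed=0):
        rng = random.Random(seed)
        self.n_in, self.n_hidden = n_in, n_hidden
        # Flat parameters: (n_in weights + bias) per hidden unit,
        # then n_hidden output weights + 1 output bias.
        n_params = n_hidden * (n_in + 1) + n_hidden + 1
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(n_params)]
        self.step = [0.01] * n_params       # per-weight step sizes
        self.prev_grad = [0.0] * n_params

    def _forward(self, x):
        h = []
        for j in range(self.n_hidden):
            base = j * (self.n_in + 1)
            s = self.w[base + self.n_in]    # hidden bias
            for i in range(self.n_in):
                s += self.w[base + i] * x[i]
            h.append(math.tanh(s))
        out = self.n_hidden * (self.n_in + 1)
        y = self.w[out + self.n_hidden]     # output bias
        for j in range(self.n_hidden):
            y += self.w[out + j] * h[j]
        return h, y

    def predict(self, x):
        return self._forward(x)[1]

    def loss_and_grad(self, X, T):
        """Mean squared error and the (unnormalized) summed gradient."""
        grad = [0.0] * len(self.w)
        out = self.n_hidden * (self.n_in + 1)
        loss = 0.0
        for x, t in zip(X, T):
            h, y = self._forward(x)
            err = y - t
            loss += err * err
            grad[out + self.n_hidden] += 2 * err
            for j in range(self.n_hidden):
                grad[out + j] += 2 * err * h[j]
                dh = 2 * err * self.w[out + j] * (1 - h[j] ** 2)
                base = j * (self.n_in + 1)
                grad[base + self.n_in] += dh
                for i in range(self.n_in):
                    grad[base + i] += dh * x[i]
        return loss / len(X), grad

    def rprop_epoch(self, X, T, up=1.2, down=0.5, lo=1e-6, hi=0.5):
        """One full-batch epoch; returns the loss measured before the update."""
        loss, grad = self.loss_and_grad(X, T)
        for p, g in enumerate(grad):
            agreement = g * self.prev_grad[p]
            if agreement > 0:               # same sign: grow the step
                self.step[p] = min(self.step[p] * up, hi)
            elif agreement < 0:             # sign flip: shrink, skip update
                self.step[p] = max(self.step[p] * down, lo)
                g = 0.0
            # Only the sign of the gradient is used, never its magnitude.
            if g > 0:
                self.w[p] -= self.step[p]
            elif g < 0:
                self.w[p] += self.step[p]
            self.prev_grad[p] = g
        return loss
```

In use, X would hold encode(tile) for each sampled tile size and T the measured execution times; rprop_epoch is then called repeatedly, and predict(encode(tile)) is evaluated over the rest of the space.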

Performance Prediction: Performance Prediction [1/2]
[Figure: two plots, "fdtd-2d: Predicted versus Actual Performance" and "dsyr2k: Predicted versus Actual Performance"; series: ExTime (Actual), ExTime (Predicted); x-axis: tile sizes (Ti:Tj:Tk), y-axis: execution time in seconds]

Performance Prediction: Performance Prediction [2/2]
[Figure: two plots, "lu: Predicted versus Actual Performance" and "dgemm: Predicted versus Actual Performance"; series: ExTime (Actual), ExTime (Predicted); x-axis: tile sizes (Ti:Tj:Tk), y-axis: execution time in seconds]

Performance Prediction: Discussions
◮ For trmm, lu, 2d-jacobi, syr2k and doitgen, more than 90% of our search space is predicted with less than 10% deviation from the actual execution time
◮ Overall, 80% or more of the search space is predicted with less than 10% deviation
◮ The deviation is usually smaller for the best tile sizes
→ These ANNs are able to model the performance distribution
Openings:
◮ Program classifier w.r.t. performance distribution
◮ Training: do not "fit" the training points that closely?
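
The coverage figures above can be computed with a helper like the one below. The slides do not define "deviation" precisely; this sketch assumes relative deviation |predicted - actual| / actual, and the name fraction_within is illustrative.

```python
def fraction_within(actual, predicted, tolerance=0.10):
    """Fraction of points predicted with less than `tolerance` relative deviation.

    Assumes deviation = |predicted - actual| / actual, with actual > 0.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for a, p in zip(actual, predicted)
               if abs(p - a) < tolerance * a)
    return hits / len(actual)


# Deviations here are 5%, 25%, 4% and 7.5%, so three of four points qualify.
print(fraction_within([1.0, 2.0, 0.5, 4.0], [1.05, 2.5, 0.52, 4.3]))  # 0.75
```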

Tile Size Selection: Selecting the Best Tile Size
The performance distribution can drive the empirical search to focus on promising subspaces
Tile size selection:
◮ Random approach has a huge variability on some distribution shapes
◮ Exhaustive search is likely not needed
◮ Need for an intermediate solution
  ◮ Low number of empirical runs
  ◮ Good convergence, low variability
  ◮ General enough to work on arbitrary user codes

Tile Size Selection: Overview of the Algorithm
1 Generate a parametrically tiled code
2 Randomly select x% of the tile size space, and run them on the machine
3 Train an ANN using this data
4 Use the ANN to predict performance of the entire space
5 Collect y tile sizes that are predicted best and not already run
6 Run the y tile sizes on the machine, output the best found
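
Steps 2 to 6 can be sketched as a short driver. The callables run (executes the tiled program for one tile size and returns its time) and train (takes the measured timings and returns a predictor function) are hypothetical placeholders for the real program runs and ANN training. The sketch also assumes that "best found" means the best over all empirical runs, including the initial random sample.

```python
import random


def ann_assisted_search(space, run, train, x_percent=2.0, y=50, seed=0):
    """Return (best_tile, best_time) following steps 2-6 of the algorithm.

    space: list of tile-size tuples
    run:   hypothetical callable, tile -> measured execution time
    train: hypothetical callable, {tile: time} -> predictor(tile) -> time
    """
    rng = random.Random(seed)

    # Step 2: randomly sample x% of the space and run those tile sizes.
    n_samples = max(1, round(len(space) * x_percent / 100))
    timings = {tile: run(tile) for tile in rng.sample(space, n_samples)}

    # Step 3: train a predictor on the measured timings.
    predict = train(timings)

    # Steps 4-5: predict the rest of the space, keep the y best unseen tiles.
    unseen = [tile for tile in space if tile not in timings]
    unseen.sort(key=predict)

    # Step 6: run the y candidates and report the best measured time
    # (assumption: the initial random sample also counts).
    for tile in unseen[:y]:
        timings[tile] = run(tile)
    best = min(timings, key=timings.get)
    return best, timings[best]
```

With the 10648-point space and x_percent=2.0, y=50, this performs 213 + 50 empirical runs in total.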

Tile Size Selection: Experimental Setup
◮ Studied various kernels (perfectly/imperfectly nested, BLAS & stencils)
◮ Only focused on single-threaded execution, on an Intel Core i7
◮ Comparison: simple random search (R), ANN search (ANN)
◮ Repeat each experiment 100 times, for various sampling rates

Tile Size Selection: Experimental Results (y = 50)

Sampling  Search       doitgen  gemm     syr2k    lu       2d-jacobi  fdtd-2d
1%        R-best       100%     99.86%   98.15%   99.89%   99.91%     97.75%
1%        R-average    98.71%   96.29%   94.80%   92.19%   94.10%     84.15%
1%        R-worst      95.35%   69.64%   89.81%   40.63%   17.69%     31.02%
1%        ANN-best     100%     99.86%   100%     100%     99.91%     100%
1%        ANN-average  98.89%   96.35%   96.01%   92.62%   98.51%     84.50%
1%        ANN-worst    97.26%   82.93%   89.79%   79.68%   94.23%     66.53%
2%        R-best       99.97%   99.86%   98.71%   99.89%   100%       100%
2%        R-average    98.71%   96.42%   94.80%   92.87%   97.60%     84.10%
2%        R-worst      86.49%   67.89%   88.20%   45.29%   55.98%     27.30%
2%        ANN-best     100%     99.86%   100%     100%     100%       100%
2%        ANN-average  98.89%   96.76%   96.69%   95.34%   98.55%     88.61%
2%        ANN-worst    97.26%   89.83%   89.65%   85.80%   94.17%     60.65%
3%        R-best       99.97%   99.86%   98.71%   99.89%   100%       100%
3%        R-average    98.77%   96.47%   94.80%   94.27%   98.39%     85.47%
3%        R-worst      94.89%   63.58%   87.99%   61.24%   84.54%     47.99%
3%        ANN-best     99.97%   99.86%   100%     100%     100%       100%
3%        ANN-average  98.93%   97.14%   97.17%   95.34%   98.74%     91.45%
3%        ANN-worst    97.64%   91.01%   92.27%   85.80%   94.50%     63.34%
4%        R-best       99.97%   99.86%   98.71%   99.89%   100%       100%
4%        R-average    98.80%   96.65%   94.93%   92.19%   98.41%     85.55%
4%        R-worst      96.86%   69.73%   88.57%   52.03%   82.47%     43.74%
4%        ANN-best     100%     99.86%   100%     100%     100%       100%
4%        ANN-average  98.99%   97.67%   97.20%   95.79%   98.90%     93.55%
4%        ANN-worst    98.28%   93.65%   92.66%   85.80%   94.50%     79.26%

Tile Size Selection: Some Related Work
Epshteyn et al. [LCPC’05]:
◮ Search-oriented contribution
◮ Uses regression curves to approximate the performance distribution
◮ Uses active learning to select good candidates for empirical evaluation
◮ Good results for BLAS kernels
Yuki et al. [CGO’10]:
◮ Aims at selecting/combining between different static models
◮ Uses program features to characterize accesses, train ANN
◮ Results demonstrated for matrix-like kernels
