Code Region Based Auto-Tuning Enabled Compilers

SLIDE 1

Code Region Based Auto-Tuning Enabled Compilers

M. James Kalyan§†, Xiang Wang§, Ahmed Eltantawy§, Yaoqing Gao§

SLIDE 2

Motivation

[Diagram labels: Developer, Binary]

SLIDE 3

Motivation

[Diagram labels: Auto-Tuner, Binary]

SLIDE 4

Approach

[Diagram labels: Auto-Tuner, Tuning Aware Compiler, Binary]

Up to 19.6% speedup over standard optimization and 11.5% over coarse grained tuning

SLIDE 5

High-Level

SLIDE 6

Code Region Tuning

  • Any segment of IR that can be independently optimized
  • Loops
  • Modules
  • Basic Blocks

Code Region Based Auto-Tuning

What is a code region?

SLIDE 7

Tuning Parameters

  • Optimization pass selection/order
  • Loop unroll/peel count
  • Machine scheduling policy
  • Support for additional tuning parameters was limited by development time

[Diagram labels: Module 1 with Pass 1, Pass 2; Module 2 with Pass 1, Pass 3; Loop 1 and Loop 2 (repeated); Basic Block 1 and Basic Block 2, each with Policy 1, Policy 2]
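The gap between coarse and fine grained tuning of these parameters can be pictured as a difference in search-space size. The sketch below is only an illustration and not the authors' implementation: the `CodeRegion` type, the candidate unroll counts, and the helper `search_space_size` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical candidate values for one tuning parameter (loop unroll count).
UNROLL_CHOICES = (1, 2, 4, 8)


@dataclass(frozen=True)
class CodeRegion:
    """A tunable segment of IR, described by its kind and a unique name."""
    kind: str   # "module", "loop", or "basic_block"
    name: str


def search_space_size(regions, choices, fine_grained):
    """Count configurations: one shared choice (coarse) or one per region (fine)."""
    if fine_grained:
        return len(choices) ** len(regions)
    return len(choices)


loops = [CodeRegion("loop", f"loop{i}") for i in range(1, 4)]
coarse = search_space_size(loops, UNROLL_CHOICES, fine_grained=False)
fine = search_space_size(loops, UNROLL_CHOICES, fine_grained=True)
print(coarse, fine)  # 4 64
```

With three loops and four candidate values, coarse tuning explores 4 configurations while per-loop tuning explores 64, which is why the fine grained search needs a guided search technique rather than enumeration.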

SLIDE 8

Code Region Auto-Tuning

  • Prerequisites:
  • Identify the code regions of a given source and the possible optimizations on those code regions
  • Auto-tune: automatically make optimization decisions about the code regions
  • Apply the optimization decisions when compiling

This is what we call enabling the compiler for auto-tuning, which is a necessary step for code region based auto-tuning

How to enable auto-tuning on code regions?

SLIDE 9

Code Region Auto-Tuning

(for the diagrammatically inclined)

  • We hook into LLVM's pass analysis to record tuning opportunities (identify code regions)
  • The code regions are identified uniquely
  • The auto-tuner's search algorithms make decisions about what optimizations to apply (auto-tuning)
  • These decisions are recorded as a tuning configuration in an XML format
  • The tuning configuration is read by the compiler and the corresponding optimizations are overridden
  • The tuned binary is compiled and profiled; the performance is given as feedback to the search driver

Note: the dotted lines are executed once per tuning run
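To make the idea of an XML tuning configuration concrete, here is a minimal round-trip sketch using Python's standard `xml.etree.ElementTree`. The slides do not show the actual schema, so the element names (`tuning-config`, `region`, `param`), the region IDs, and the parameter names below are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def write_config(decisions):
    """Serialize {region_id: {parameter: value}} to an XML string (hypothetical schema)."""
    root = ET.Element("tuning-config")
    for region_id, params in sorted(decisions.items()):
        region = ET.SubElement(root, "region", id=region_id)
        for name, value in sorted(params.items()):
            ET.SubElement(region, "param", name=name, value=str(value))
    return ET.tostring(root, encoding="unicode")


def read_config(xml_text):
    """Parse the XML back into {region_id: {parameter: value-as-string}}."""
    root = ET.fromstring(xml_text)
    return {
        region.get("id"): {p.get("name"): p.get("value") for p in region.findall("param")}
        for region in root.findall("region")
    }


# Hypothetical decisions produced by a search step, keyed by unique region ID.
decisions = {
    "main.c:loop1": {"unroll-count": 4},
    "main.c:bb7": {"sched-policy": "policy2"},
}
xml_text = write_config(decisions)
parsed = read_config(xml_text)
print(parsed["main.c:loop1"]["unroll-count"])  # 4
```

Keying every decision by a unique region ID is what lets the compiler side look up and override the optimization for exactly one loop, module, or basic block.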

SLIDE 10

Methodology

  • We built our tuning mechanism using:
  • OpenTuner
  • LLVM 4.0
  • Search algorithms: OpenTuner's built-in AUC Bandit meta-technique cycling between:
  • Differential Evolution, Random Nelder-Mead, Greedy Hill Climbing
  • Results are shown on the industry benchmarks: CoreMark, HPCG, and Livermore Loops, running on an x86 CPU
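The measure-and-feedback loop behind such a search can be sketched with a toy greedy hill climber over per-region unroll counts. This is not OpenTuner's API and not its AUC Bandit meta-technique; it uses only the standard library, and the `measure` function is a synthetic stand-in for compiling and timing a real binary. The region names and the optimum inside `measure` are made up.

```python
import random

# Hypothetical candidate unroll counts for each tunable loop.
UNROLL_CHOICES = (1, 2, 4, 8)


def measure(config):
    """Synthetic stand-in for 'compile, run, and time the binary' (lower is better)."""
    # Made-up optimum: loop "a" prefers 4, loop "b" prefers 2.
    best = {"a": 4, "b": 2}
    return 10.0 + sum(abs(config[k] - best[k]) for k in config)


def hill_climb(regions, iterations, seed=0):
    """Greedy hill climbing: change one region's value, keep it only if faster."""
    rng = random.Random(seed)
    current = {r: UNROLL_CHOICES[0] for r in regions}
    current_time = measure(current)
    for _ in range(iterations):
        candidate = dict(current)
        region = rng.choice(regions)
        candidate[region] = rng.choice(UNROLL_CHOICES)
        candidate_time = measure(candidate)
        if candidate_time < current_time:
            current, current_time = candidate, candidate_time
    return current, current_time


config, runtime = hill_climb(["a", "b"], iterations=200)
print(config, runtime)
```

In a real setup each call to `measure` would cost a full compile and benchmark run, so the number of iterations is the dominant cost of tuning.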

SLIDE 11

Experimental Results (CoreMark)

Results for CoreMark on x86

Name | Description | Coarse Scope | Fine Scope | Best Speedup Over Coarse | Best Speedup Over -O2
Phase ordering | Ordering of optimization passes (LLVM IR) | All modules | Per module | 1.115x | 1.196x
Loop unrolling/peeling | Factor to unroll/peel loops by (LLVM IR) | All loops | Per loop | 1.036x | 1.106x
Machine scheduling policy | Scheduling rule for instructions (x86 machine IR) | All basic blocks | Per basic block | 1.001x | 1.003x
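Dividing the two speedup figures for each parameter gives a rough estimate of what coarse grained tuning alone achieves over -O2. This assumes both figures come from consistent measurements of the same fine grained result, which the slides do not state explicitly; the helper name `implied_coarse_over_o2` is made up for this sketch.

```python
# Best speedups from the CoreMark results: (over coarse, over -O2).
results = {
    "phase ordering": (1.115, 1.196),
    "loop unrolling/peeling": (1.036, 1.106),
    "machine scheduling policy": (1.001, 1.003),
}


def implied_coarse_over_o2(over_coarse, over_o2):
    """Since fine/O2 = (fine/coarse) * (coarse/O2), coarse/O2 = over_o2 / over_coarse."""
    return over_o2 / over_coarse


for name, (over_coarse, over_o2) in results.items():
    print(f"{name}: {implied_coarse_over_o2(over_coarse, over_o2):.3f}x")
```

Under that assumption, coarse grained tuning would account for roughly 1.07x over -O2 for both phase ordering and loop unrolling/peeling, with fine grained tuning contributing the remainder.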

SLIDE 12

Experimental Results (CoreMark)

[Figures: Loop Auto-Tuning and Module Auto-Tuning, each comparing Coarse and Fine tuning, with reference levels for -O2, Expected Speedup, and Potential Speedup]

Iteration time = time(configuration choice) + time(compile) + time(runtime) ≈ 45s
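The per-iteration cost of about 45 s translates directly into a wall-clock tuning budget. The small sketch below assumes iterations run strictly one after another; the iteration counts are hypothetical and do not come from the slides.

```python
def tuning_time_hours(iterations, seconds_per_iteration=45.0):
    """Total wall-clock tuning time when iterations run sequentially."""
    return iterations * seconds_per_iteration / 3600.0


# Hypothetical budgets: 80 iterations take one hour, 1000 take half a day.
print(tuning_time_hours(80))    # 1.0
print(tuning_time_hours(1000))  # 12.5
```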

SLIDE 13

Experimental Results (others)

  • HPCG
  • 5% speedup over coarse grained while tuning loops
  • Livermore Loops
  • 2% speedup over coarse grained while tuning loops

SLIDE 14

Related Work

  • Code Region Oblivious Auto-Tuning
  • Compiler as a black box
  • Compiler Auto-Tuning Survey (2018)
  • GCC flag tuning with CK-autotuning framework
  • Isolated Code Region Based Auto-Tuning
  • Predicting Unroll Factors Using Supervised Classification
  • Code Region Based Auto-Tuning
  • Region-Aware Multi-Objective Auto-Tuner for Parallel Programs (2017)
  • Code region based thread count tuning for parallelization

SLIDE 15

Limitations/Future Work

  • Have not identified/implemented many code regions or fine grained optimizations
  • Support more code region types and optimizations
  • Optimizations disrupt the IR and can lose track of code region IDs (CRIDs)
  • Auto-tuning stages
  • Iterative compiler auto-tuning is time-expensive and must be done per program
  • RNN/RL approach for predicting compiler configurations

A new host of challenges

SLIDE 16

Future Work: Predictive Tuning Challenges

  • Predict configurations for code regions of arbitrary type
  • Features to describe any code region (while minimizing noise)
  • Feature extraction (encompass code region and program info)
  • Label vectors of variable size (pass sequences)
  • Stage based tuning is a remaining issue

SLIDE 17

Summary

  • Problem:
  • Current compiler auto-tuning methods are missing out on performance peaks
  • Approach:
  • Enabled code region based (fine grained) tuning within the compiler
  • Results:
  • Observed speedup over standard optimization and coarse grained tuning