
Impact of different compiler options on energy consumption



  1. Impact of different compiler options on energy consumption
     James Pallister, University of Bristol / Embecosm
     Simon Hollis, University of Bristol
     Jeremy Bennett, Embecosm

  2. Motivation
     ● Compiler optimizations are claimed to have a large impact on software:
       – Performance
       – Energy
     ● No extensive study prior to this considering:
       – Different benchmarks
       – Many individual optimizations
       – Different platforms
     ● This work looks at the effect of many different optimizations across 10 benchmarks and 5 platforms.
     ● 238 optimization passes covered by 150 flags
       – Huge number of combinations

  3. This Talk
     ● This talk will cover:
       – Importance of benchmarks
       – Platforms
       – How to explore 2^150 combinations of options
       – Correlation between time and energy
       – How to predict the effect of the optimizations

  4. Importance of Benchmarks
     ● One benchmark can't trigger all optimizations
     ● Perform differently on different platforms
     ● Need a range of benchmarks
     ● Broad categories to be considered for a benchmark:
       – Integer
       – Floating point
       – Branching
       – Memory

  5. Existing Benchmark Suites Considered
     ● MiBench
     ● WCET
     ● DSPstone
     ● ParMiBench
     ● OpenBench
     ● LINPACK
     ● Livermore Fortran Kernels
     ● Dhry/Whet-stone

     ● Require embedded Linux
     ● Targeted at higher-end systems
     ● Multithreaded benchmarks typically for HPC
     ● Don't necessarily test all corners of the platform

  6. Our Benchmark List

  7. Choosing the Platforms
     ● Range of different features in the platforms chosen
       – Pipeline depth
       – Multi- vs single-core
       – FPU available?
       – Caching
       – On-chip vs off-chip memory

  8. Platforms Chosen
     ● ARM Cortex-M0
       – Small memory
       – Simple pipeline
     ● ARM Cortex-M3
       – Small memory
       – Simple pipeline, with forwarding logic, etc.
     ● ARM Cortex-A8
       – Large memory
       – Complex superscalar pipeline
       – SIMD/FPU
     ● XMOS L1
       – Small memory
       – Simple pipeline
       – Multiple threads
     ● Adapteva Epiphany
       – On-chip and off-chip memory
       – Simple superscalar pipeline
       – FPU
       – 16 cores

  9. Experimental Methodology
     ● Compiler optimizations have many non-linear interactions
     ● 238 optimization passes combined into 150 different options (GCC)
     ● 82 compiler options enabled by O3
     ● How to test all of these, while accounting for the interactions between optimizations?
     ● Fractional Factorial Designs
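
Testing every combination of 150 on/off options would need 2^150 (roughly 1.4 × 10^45) builds. A fractional factorial design runs only a balanced subset. The sketch below is one minimal way to build a two-level design, not the procedure used in the study: the generator choice (D = AB, E = AC), the function names, and the mapping of factors to GCC options are all assumptions made for illustration (only -ftree-vectorize appears in the slides).

```python
from itertools import product


def fractional_factorial(base_factors, generators):
    """Return the runs of a two-level fractional factorial design.

    base_factors: factor names that receive a full factorial,
                  giving 2 ** len(base_factors) runs.
    generators:   maps each additional factor to the base factors whose
                  product sets its level, e.g. {"D": ("A", "B")} means D = AB.
    Levels are coded -1 (option disabled) and +1 (option enabled).
    """
    runs = []
    for levels in product((-1, 1), repeat=len(base_factors)):
        run = dict(zip(base_factors, levels))
        for name, parents in generators.items():
            level = 1
            for parent in parents:
                level *= run[parent]
            run[name] = level
        runs.append(run)
    return runs


def to_gcc_flags(run, flag_names):
    """Translate one coded run into -f... / -fno-... options."""
    flags = []
    for factor, level in run.items():
        flag = flag_names[factor]
        flags.append(flag if level == 1 else flag.replace("-f", "-fno-", 1))
    return flags


# Illustrative factor-to-option mapping (an assumption, not from the slides).
FLAG_NAMES = {
    "A": "-ftree-vectorize",
    "B": "-funroll-loops",
    "C": "-finline-functions",
    "D": "-fgcse",
    "E": "-fomit-frame-pointer",
}

# A 2^(5-2) design: 5 options covered in 8 runs instead of 2**5 = 32.
design = fractional_factorial(["A", "B", "C"], {"D": ("A", "B"), "E": ("A", "C")})

for run in design:
    print(" ".join(to_gcc_flags(run, FLAG_NAMES)))
```

Each option is enabled in exactly half of the runs, so its main effect can be estimated by comparing the two halves. The price is aliasing: in this small design D cannot be separated from the A×B interaction, which is why the choice of generators matters when interactions are expected.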

  10. Hardware Measurements
      ● Current, voltage and power monitor
      ● 10 kSamples/s
      ● Low noise
      ● XMOS board to control and timestamp measurements
      ● Integrate to get energy consumption
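
The last step, integrating sampled power to get energy, can be sketched as follows. This assumes paired voltage and current samples taken at a uniform 10 kSamples/s; the trapezoidal rule and the function name `energy_joules` are illustrative choices, since the slides only say "integrate".

```python
def energy_joules(voltage_v, current_a, sample_rate_hz=10_000):
    """Integrate instantaneous power over time with the trapezoidal rule.

    voltage_v, current_a: equally long sequences of simultaneous samples.
    sample_rate_hz:       uniform sampling rate (10 kSamples/s on the slide).
    """
    if len(voltage_v) != len(current_a):
        raise ValueError("voltage and current traces must have equal length")
    power_w = [v * i for v, i in zip(voltage_v, current_a)]
    dt = 1.0 / sample_rate_hz
    # Area of each trapezoid between consecutive power samples.
    return sum((p0 + p1) / 2.0 * dt for p0, p1 in zip(power_w, power_w[1:]))


# Hypothetical trace: a constant 3.3 V at 10 mA for one second.
samples = 10_001  # 10,000 intervals of 0.1 ms each
energy = energy_joules([3.3] * samples, [0.010] * samples)
print(f"{energy:.6f} J")
```

With a constant 33 mW draw for one second the result is 33 mJ, which is a quick sanity check for the units.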

  11. Results
      ● Energy consumption ≈ Execution time
        – Generalization, not true in every case
      ● Optimization unpredictability
      ● No optimization is universally good across benchmarks and platforms

  12. Overview (figure panels: FDCT, Cortex-M0; FDCT, Cortex-A8)

  13. Overview (figure panels: FDCT, Cortex-M0; FDCT, Cortex-A8)

  14. Overview (figure panels: FDCT, Cortex-M0; FDCT, Cortex-A8)

  15. Overview

  16. Time ≈ Energy (figure: O1 flags, Blowfish, Cortex-M0)

  17. When Time ≠ Energy (figure: O3 flags, 2DFIR, Cortex-A8)
      ● Complex pipeline
      ● -ftree-vectorize
        – NEON SIMD unit
        – Much lower power

  18. Conclusion: Mostly, Time ≈ Energy
      ● Highly correlated
      ● Especially so for 'simple' pipelines
      ● Little scope for stalling or superscalar execution
      ● Complex pipelines:
        – Still a correlation
        – But more variability
        – SIMD, superscalar execution
      ● To get optimal energy consumption we need better than “go fast”
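
The relationship behind this conclusion is energy = average power × execution time. When average power barely changes between builds, energy tracks time. When an optimization lowers power (as the slides report for -ftree-vectorize on the Cortex-A8), the fastest build is no longer guaranteed to be the lowest-energy one. The sketch below illustrates this with made-up numbers; the times and powers are hypothetical and are not measurements from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical builds: execution time (s) and average power (W).
times_s = [1.00, 0.80, 0.85]
power_w = [0.50, 0.50, 0.40]  # the third build draws less power

# Energy (J) is average power multiplied by execution time.
energy_j = [t * p for t, p in zip(times_s, power_w)]

fastest = min(range(len(times_s)), key=times_s.__getitem__)
lowest_energy = min(range(len(energy_j)), key=energy_j.__getitem__)

print("time/energy correlation:", pearson(times_s, energy_j))
print("fastest build:", fastest, "lowest-energy build:", lowest_energy)
```

Here time and energy are still positively correlated, yet the build with the lowest energy is not the fastest one, which is the sense in which "go fast" is a good approximation but not the whole answer.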

  19. Optimization Unpredictability (figure: O1 flags, Cubic, Cortex-M0)
      ● Pairs of optimizations on top of O0
      ● Possibly higher order interactions occurring?
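
One way to quantify how a pair of optimizations interacts on top of O0 is to compare the combined effect with the sum of the individual effects. The sketch below assumes four energy measurements (O0 alone, O0 plus each flag, O0 plus both flags); the function name and the numbers are hypothetical and only illustrate the arithmetic.

```python
def pairwise_interaction(e_base, e_a, e_b, e_ab):
    """Two-way interaction of optimizations A and B relative to a baseline.

    e_base: energy with neither optimization (e.g. plain O0)
    e_a:    energy with only A enabled
    e_b:    energy with only B enabled
    e_ab:   energy with both enabled
    Returns 0 when the effects simply add; a non-zero value means the
    pair behaves differently together than the individual runs predict.
    """
    effect_a = e_a - e_base
    effect_b = e_b - e_base
    combined = e_ab - e_base
    return combined - (effect_a + effect_b)


# Hypothetical energies in millijoules.
interaction = pairwise_interaction(e_base=100.0, e_a=90.0, e_b=95.0, e_ab=92.0)
print(interaction)
```

In this made-up case A saves 10 mJ and B saves 5 mJ on their own, but together they save only 8 mJ, so the interaction term is 7 mJ. Interactions among three or more flags need additional runs to separate, which is where a small design runs out of data.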

  20. Conclusion: Which optimization to choose?
      For the general case, this question can't be answered
      ● Unpredictable interactions
      ● Many non-linear effects between optimizations?
      ● Evidence of higher order interactions
      ● Not enough data recorded in the fractional factorial design to model

  21. What does this mean? For the Compiler Writer
      ● Current optimizations targeted for performance
      ● Few (if any) optimizations in current compilers designed to reduce energy consumption
      ● Current optimization levels (O1, O2, etc.) are a good balance between compile time and performance/energy.
      ● Never completely optimal
      ● Machine learning
        – MILEPOST
        – Genetic algorithms

  22. Conclusion
      ● Time ≈ Energy
        – True for simple pipelines
        – Mostly true for complex pipelines
        – Good approximation
      ● Optimization unpredictability
        – Difficult to model the interactions between optimizations

  23. Questions?
      jp@cs.bris.ac.uk
      simon@cs.bris.ac.uk
      jeremy.bennett@embecosm.com
      All data at: www.jpallister.com/wiki

  24. The Best Three Optimizations for Energy

  25. Conclusion: Optimizations are common across architectures... Sometimes
      ● A few consistently good options for Epiphany
        – Simpler instruction set
        – Newer compiler
        – Many more registers than ARM
      ● Common options across all the ARM platforms for a particular benchmark
