

SLIDE 1

KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association SOFTWARE DESIGN AND QUALITY GROUP INSTITUTE FOR PROGRAM STRUCTURES AND DATA ORGANIZATION, FACULTY OF INFORMATICS

www.kit.edu

Using Invariant Analysis for Improving Instrumentation-based Performance Evaluation of SPECjvm2008 Benchmarks

Michael Kuperberg, Martin Krogmann, Ralf Reussner

Karlsruhe Institute of Technology

SLIDE 2

Software Design and Quality Group Institute for Program Structures and Data Organization 2 Oct 8th, 2010

Motivation

Cross-platform performance prediction [KKR2008a] for the systematic engineering of component-based software

Performance in our case: execution duration of component services

Performance prediction, e.g., for the following scenarios:

  • Relocation of an application to another execution platform
  • Sizing: choosing an appropriate execution platform to fulfil changed performance requirements

Kuperberg et al. - Invariant Analysis for Performance Evaluation

[Figure: components A, E, F and D on execution platforms 1 to 5; their performance on execution platform 4 is marked as unknown (?)]

SLIDE 3

Bytecode-based Performance Prediction

Context of the presented work: bytecode-based performance prediction [KKR2008a] for existing components:

  • Predict the performance of a component on another execution platform
  • Bytecode instruction counts as a performance metric
  • Counting must be performed at runtime, since static analysis or symbolic execution is not sufficient
  • Must be applicable to sourceless and legacy components

  • 1. Count bytecode instructions (e.g., IADD, NEWARRAY, LMUL, DUP)
  • 2. Benchmark bytecode instructions (IADD, LMUL, DUP, NEWARRAY)
  • 3. Predict performance: combine counts (number of instructions) and benchmark results (execution duration)
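Step 3 can be illustrated with a minimal sketch that assumes the simplest linear model: the predicted duration is the sum, over all instruction types, of the runtime count multiplied by the benchmarked duration of one such instruction on the target platform. The class `LinearPredictor` and its input format are hypothetical, and the model in [KKR2008a] may be more elaborate (e.g., parameter-dependent costs of NEWARRAY are not modelled here).

```java
import java.util.Map;

/**
 * Hypothetical sketch of the combination step, assuming a purely linear model:
 * predicted duration = sum over instruction types of (runtime count x benchmarked duration).
 */
final class LinearPredictor {

    private LinearPredictor() {}

    /**
     * @param counts        runtime counts per instruction type (e.g. "IADD" -> 1000)
     * @param nanosPerInstr benchmarked duration of one instruction of that type on the
     *                      target platform, in nanoseconds (hypothetical input format)
     * @return predicted execution duration in nanoseconds
     */
    static double predictNanos(Map<String, Long> counts, Map<String, Double> nanosPerInstr) {
        double total = 0.0;
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            Double cost = nanosPerInstr.get(e.getKey());
            if (cost == null) {
                // every counted instruction type needs a benchmark result for the target platform
                throw new IllegalArgumentException("no benchmark result for " + e.getKey());
            }
            total += e.getValue() * cost;
        }
        return total;
    }
}
```

For example, 1000 IADD at 0.5 ns, 10 LMUL at 2.0 ns and 100 DUP at 0.25 ns give a predicted 545 ns; the numbers are made up for illustration.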

SLIDE 4

ByCounter: Runtime Bytecode Instruction Counting using Application Instrumentation

ByCounter collects runtime counts of Java bytecode instructions and method invocations

  • Counts different instruction types individually
  • Configurable parameter recording for array-related instructions
  • Not constrained by timer accuracy and timer costs (cf. short methods)
  • Based on JVM-independent application instrumentation

[Figure: ByCounter takes the bytecode classes of the application (e.g., ... IINC meth1() IMUL meth2() ISTORE LLOAD LLOAD ...), a workload and settings, and produces aggregated instruction counts per method (e.g., Method a(): ... 27865*LLOAD 976*meth1() ...; Method b(): ...)]
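The "aggregated instruction counts" output can be pictured as a per-method table of instruction and invocation counts that accumulates the reports of every execution of that method. The sketch below assumes a hypothetical `CountingResult` class keyed by method name and opcode; it is not ByCounter's actual result format.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-method aggregation of instruction and method invocation counts. */
final class CountingResult {

    // method name -> (instruction or invoked method -> accumulated count)
    private final Map<String, Map<String, Long>> perMethod = new TreeMap<>();

    /** Adds the counts reported by one execution of {@code method}. */
    void add(String method, Map<String, Long> countsOfOneExecution) {
        Map<String, Long> acc = perMethod.computeIfAbsent(method, m -> new TreeMap<>());
        countsOfOneExecution.forEach((key, n) -> acc.merge(key, n, Long::sum));
    }

    /** Accumulated count of {@code key} in {@code method}, 0 if never reported. */
    long count(String method, String key) {
        return perMethod.getOrDefault(method, Map.of()).getOrDefault(key, 0L);
    }

    /** Renders the counts in the style of the figure, e.g. "Method a(): 27865*LLOAD 976*meth1()". */
    String render(String method) {
        StringBuilder sb = new StringBuilder("Method " + method + ":");
        perMethod.getOrDefault(method, Map.of())
                 .forEach((key, n) -> sb.append(' ').append(n).append('*').append(key));
        return sb.toString();
    }
}
```

Two reports for a(), one with 27000 LLOAD and 976 meth1() calls and one with 865 LLOAD, aggregate to "Method a(): 27865*LLOAD 976*meth1()".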

SLIDE 5

Overview of the ByCounter Process

Two phases: instrument the bytecode before execution, then execute the instrumented bytecode.

  • 1. Parse program bytecode
  • 2. Instrument parsed program representation
  • 3. Convert into executable bytecode
  • 4. Create testbed if needed (parameters, etc.)
  • 5. Replace original bytecode classes with instrumented bytecode classes
  • 6. Run instrumented bytecode, collect counting results

[Figure: original bytecode (... ILOAD IADD ...) is instrumented with counter increments (... ILOAD IINC C1 IADD IINC C8 ...); running it yields counting results (... 27865*ILOAD 11108*IADD 8764*meth1() ...)]
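Step 2 can be sketched on a simplified model. It assumes that bytecode is a list of opcode strings rather than ASM's representation, and that one counter increment (written here as the hypothetical marker `INC C<n>`) is inserted after every original instruction, mirroring the "ILOAD IINC C1 IADD IINC C8" pattern in the figure. The figure abbreviates the increment as IINC. IINC only operates on int local variables, so incrementing the long counters used by the Java implementation would need a short sequence such as LLOAD, LCONST_1, LADD, LSTORE.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model: one counter increment after every original instruction. */
final class PerInstructionInstrumenter {

    private PerInstructionInstrumenter() {}

    /**
     * @param original   original instruction sequence
     * @param counterIds mutable map that receives opcode -> counter number (1, 2, ...)
     * @return instrumented sequence with an "INC C<n>" marker after each instruction
     */
    static List<String> instrument(List<String> original, Map<String, Integer> counterIds) {
        List<String> out = new ArrayList<>();
        for (String insn : original) {
            out.add(insn);
            // each opcode type gets its own counter, numbered in order of first appearance
            int id = counterIds.computeIfAbsent(insn, k -> counterIds.size() + 1);
            out.add("INC C" + id);
        }
        return out;
    }

    /** Simulates one execution of the instrumented sequence: counter number -> value. */
    static Map<Integer, Long> run(List<String> instrumented) {
        Map<Integer, Long> counters = new LinkedHashMap<>();
        for (String insn : instrumented) {
            if (insn.startsWith("INC C")) {
                counters.merge(Integer.parseInt(insn.substring(5)), 1L, Long::sum);
            }
        }
        return counters;
    }
}
```

Instrumenting ILOAD, IADD, ILOAD, IADD yields eight instructions and, when run, counter C1 (ILOAD) and counter C2 (IADD) both reach 2. The sequence also shows that the inserted code doubles the number of executed instructions here, which is the overhead the PIBIS extension targets.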

SLIDE 6

Idea and Advantages of ByCounter

Idea: instrument the application, not the virtual machine

Insert counters into existing bytecode, preserve method signatures

Advantages:

  • Instrumentation is transparent to the application: no functional side effects (but: runtime overhead)
  • Method invocations performed by the bytecode of the instrumented method: configurable and extensible treatment
  • No dependence on native interfaces; works on any JVM
  • Idea applicable to Dalvik, CLR, etc.

Previous approaches: use modified JVMs or JVMTI etc.

Insufficient portability; not desirable in production environments

SLIDE 7

Example: SOR Part of the SciMark Benchmark in SPECjvm2008

No jumps, loops, method invocations or other control flow ⇒ the number of executed bytecode instructions:

  • is independent of the input parameter values of num_flops
  • is independent of the state of the invocation target
  • can be determined statically
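The argument above can be sketched as follows. If a method body contains no jumps, switches, invocations or throws, every execution runs each instruction exactly once, so the per-execution counts equal the static opcode frequencies. The sketch assumes the opcode-list model and an illustrative, incomplete set of control-flow opcodes. It does not reproduce the actual num_flops bytecode.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical check: static per-execution counts for straight-line method bodies. */
final class StraightLineAnalysis {

    private StraightLineAnalysis() {}

    // Illustrative, incomplete set; IF* and INVOKE* opcodes are matched by prefix below
    private static final Set<String> CONTROL_FLOW =
            Set.of("GOTO", "GOTO_W", "TABLESWITCH", "LOOKUPSWITCH", "ATHROW", "JSR", "JSR_W", "RET");

    static boolean isControlFlow(String opcode) {
        return CONTROL_FLOW.contains(opcode) || opcode.startsWith("IF") || opcode.startsWith("INVOKE");
    }

    /** Counts of one execution if the body is straight-line code, otherwise empty. */
    static Optional<Map<String, Long>> staticCounts(List<String> body) {
        Map<String, Long> counts = new TreeMap<>();
        for (String op : body) {
            if (isControlFlow(op)) {
                // counts may now depend on input values or on the invoked methods
                return Optional.empty();
            }
            counts.merge(op, 1L, Long::sum);
        }
        return Optional.of(counts);
    }
}
```

A body such as ILOAD, I2D, DSTORE, ILOAD, I2D, DSTORE, DLOAD, DLOAD, DMUL, DRETURN yields I2D = 2 for every execution, whereas any body containing IFEQ or INVOKEVIRTUAL yields no static result.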

SLIDE 8

Switching to Bytecode Instruction Sequences

Counting bytecode instructions individually:

  • is costly in terms of runtime overhead (CPU, memory)
  • limits scalability and leaves room for improvement

Solution: identify and use performance-invariant bytecode instruction sequences (PIBISes)

  • Decreases the amount of inserted instrumentation
  • Maintains the existing precision of counting results
  • Similar to basic blocks (and to dictionaries in data compression)

We extended ByCounter and studied the effects using workloads of the SPECjvm2008 benchmark
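The claim that precision is maintained follows from how the counts are recovered. Each PIBIS is always executed as a whole, so multiplying each PIBIS counter by the instruction frequencies inside that PIBIS gives exactly the per-instruction counts. The sketch assumes PIBISes are given as opcode lists (the "dictionary") and uses hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical post-processing: expand PIBIS execution counts into per-instruction counts. */
final class PibisExpansion {

    private PibisExpansion() {}

    /**
     * @param pibises    PIBIS id -> its instructions (the "dictionary")
     * @param executions PIBIS id -> number of executions (one counter per PIBIS)
     * @return per-instruction counts, as if every instruction had been counted individually
     */
    static Map<String, Long> expand(Map<Integer, List<String>> pibises, Map<Integer, Long> executions) {
        Map<String, Long> counts = new TreeMap<>();
        for (Map.Entry<Integer, Long> e : executions.entrySet()) {
            List<String> instructions = pibises.get(e.getKey());
            if (instructions == null) {
                throw new IllegalArgumentException("unknown PIBIS " + e.getKey());
            }
            for (String op : instructions) {
                // every instruction of the PIBIS ran once per PIBIS execution
                counts.merge(op, e.getValue(), Long::sum);
            }
        }
        return counts;
    }
}
```

With PIBIS 1 = ILOAD, ILOAD, IADD, ISTORE executed 1000 times and PIBIS 2 = NEWARRAY executed 3 times, the expansion gives ILOAD = 2000, IADD = 1000, ISTORE = 1000 and NEWARRAY = 3. Only 1,003 counter increments (1,000 + 3) are needed instead of 4,003 (4 × 1,000 + 3).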

SLIDE 9

PIBISes: Treatment in ByCounter

PIBISes are not identical to basic blocks:

  • As with basic blocks: no jumps etc. allowed
  • Additionally: a PIBIS may not contain instructions with parameter-dependent performance, which can change between executions (cf. the size parameter of NEWARRAY)

Extended ByCounter: identifies PIBISes

  • Instead of one counter increment for every single executed instruction: one increment per PIBIS execution
  • Note that some PIBISes still contain just one instruction
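A simplified partitioning into PIBISes can be sketched under several assumptions: the opcode-list model, jump targets given as a set of instruction indices, and illustrative (not exhaustive) sets of control-flow and parameter-dependent opcodes. A new PIBIS starts at every jump target. A control-flow or invocation instruction ends the current PIBIS as its last instruction, following the usual basic-block convention; this detail is an assumption. Every parameter-dependent instruction becomes a single-instruction PIBIS of its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical PIBIS partitioning of one method body. */
final class PibisPartitioner {

    private PibisPartitioner() {}

    // Illustrative set: duration depends on an operand value (e.g. the array size)
    private static final Set<String> PARAMETER_DEPENDENT = Set.of("NEWARRAY", "ANEWARRAY", "MULTIANEWARRAY");

    // Illustrative set; IF* and INVOKE* opcodes are matched by prefix below
    private static final Set<String> CONTROL_FLOW =
            Set.of("GOTO", "GOTO_W", "TABLESWITCH", "LOOKUPSWITCH", "ATHROW", "JSR", "JSR_W", "RET");

    private static boolean endsSequence(String op) {
        return CONTROL_FLOW.contains(op) || op.startsWith("IF") || op.startsWith("INVOKE");
    }

    /**
     * @param body        instructions of one method, in order
     * @param jumpTargets indices of instructions that are jump targets (assumed to be given)
     * @return the PIBISes, each a non-empty list of instructions
     */
    static List<List<String>> partition(List<String> body, Set<Integer> jumpTargets) {
        List<List<String>> result = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (int i = 0; i < body.size(); i++) {
            String op = body.get(i);
            if (jumpTargets.contains(i)) {
                flush(current, result);            // a jump target starts a new PIBIS
            }
            if (PARAMETER_DEPENDENT.contains(op)) {
                flush(current, result);
                result.add(List.of(op));           // isolated: duration varies between executions
                continue;
            }
            current.add(op);
            if (endsSequence(op)) {
                flush(current, result);            // control transfer ends the PIBIS
            }
        }
        flush(current, result);
        return result;
    }

    private static void flush(List<String> current, List<List<String>> result) {
        if (!current.isEmpty()) {
            result.add(new ArrayList<>(current));
            current.clear();
        }
    }
}
```

For ILOAD, ILOAD, IADD, ISTORE, ILOAD, NEWARRAY, ASTORE, ILOAD, IFEQ, ICONST_1, IRETURN with a jump target at index 10, the sketch produces five PIBISes: [ILOAD, ILOAD, IADD, ISTORE, ILOAD], [NEWARRAY], [ASTORE, ILOAD, IFEQ], [ICONST_1] and [IRETURN]. Three of them contain just one instruction.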

SLIDE 10

Implementation of ByCounter for Java

  • 1. Parse program bytecode
    • Analysable, easily modifiable representation, obtained using the ASM framework
  • 2. Instrument parsed program representation and run resulting bytecode
    • Insert counting instrumentation into the application
    • Counters are long-typed bytecode local variables (invisible outside the instrumented method)
    • Counters are initialised when method execution starts
    • Each execution of an instruction/PIBIS also increments its counter
    • Counters are reported at method exit points (written to a log file or reported to a central "collector" daemon)
  • Instrumented .class files: persistable, usable by any ClassLoader
  • Existing workloads, harnesses, scripts and configurations can be used
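At source level, the instrumentation described above roughly corresponds to the following hand-written equivalent. This is a hypothetical illustration under several assumptions. ByCounter inserts bytecode via ASM rather than editing source. One counter per source-level region stands in for the per-PIBIS counters. An in-memory `CountCollector` stands in for the log file or collector daemon. The parameter and return types are unchanged. The counters are local `long` variables that are initialised at method start, incremented on each region execution and reported at every exit point.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for the log file or the central collector daemon. */
final class CountCollector {
    static final Map<String, Long> TOTALS = new ConcurrentHashMap<>();

    private CountCollector() {}

    static void report(String key, long count) {
        TOTALS.merge(key, count, Long::sum);
    }
}

final class InstrumentationExample {

    private InstrumentationExample() {}

    /** Original method. */
    static int clampedSum(int[] values, int limit) {
        int s = 0;
        for (int v : values) {
            s += v;
            if (s > limit) {
                return limit;
            }
        }
        return s;
    }

    /** Hand-instrumented equivalent: same parameter and return types, same result. */
    static int clampedSumInstrumented(int[] values, int limit) {
        long loopCounter = 0;  // local counter, initialised at method start, invisible to callers
        long exitCounter = 0;  // counter for the early-exit region
        int s = 0;
        for (int v : values) {
            s += v;
            loopCounter++;
            if (s > limit) {
                exitCounter++;
                report(loopCounter, exitCounter);  // report at this exit point
                return limit;
            }
        }
        report(loopCounter, exitCounter);          // report at the normal exit point
        return s;
    }

    private static void report(long loopCounter, long exitCounter) {
        CountCollector.report("clampedSum:loop", loopCounter);
        CountCollector.report("clampedSum:earlyExit", exitCounter);
    }
}
```

Calling both variants with {1, 2, 3, 4} and the limits 5 and 100 returns identical results (5 and 10). The collector then holds 7 loop executions and 1 early exit. Exits via exceptions are not covered by this sketch; handling them would need, for example, a try/finally around the method body.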

SLIDE 11

Preliminary Results

Median execution durations in seconds:

  • Crypto.AES: uninstrumented 5.79, instrumented (original method) 6.09, instrumented (PIBIS analysis) 6.10
  • Derby: uninstrumented 55.30, instrumented (original method) 58.40, instrumented (PIBIS analysis) 56.90
  • MPEG Audio: uninstrumented 4.26, instrumented (original method) 139.10, instrumented (PIBIS analysis) 48.02

  • Median values based on 21 measurements using java.lang.System.nanoTime()
  • Durations include result aggregation and storage
  • JITting takes place (proof: -XX:+PrintCompilation JVM flag to enable logging)

Evaluation platform (running Mac OS X 10.6.4, 64 bit):

  • 2.8 GHz Intel Core 2 Duo, 4 GB of 1067 MHz DDR3 main memory
  • JVM 1.6.0_20 provided by Apple (default mode, equals -server)
  • -Xmx768M JVM flag to allocate 768 MB of heap memory
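The measurement procedure (median of 21 runs, each timed with System.nanoTime()) can be sketched as follows. The harness is an assumption for illustration and is not the authors' harness. Like the reported setup, it does not separate JIT warm-up from the measured runs. Whatever the task does, such as result aggregation and storage, is included in the duration.

```java
import java.util.Arrays;

/** Hypothetical timing harness: median wall-clock duration of repeated runs. */
final class MedianTimer {

    private MedianTimer() {}

    /** Runs {@code task} {@code runs} times and returns the median duration in seconds. */
    static double medianSeconds(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        return median(nanos) / 1e9;
    }

    /** Median of the values; for an even count, the mean of the two middle values. */
    static double median(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

With an odd number of runs such as 21, the median is a single measured value, so one outlier (e.g., a run hit by garbage collection) does not distort the result.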

SLIDE 12

Related Work

  • Concerning SPECjvm98:
    • [Gregg et al., 2002] used a modified JVM to benchmark methods and bytecode instructions; no research on counting overhead
    • [Lambert and Power, 2005] static/dynamic frequencies of basic blocks
    • [Li et al., 2000] complete system simulation: does not address bytecode-level basic blocks or precise bytecode counts
  • Concerning SPECjvm2008:
    • [Oi, 2009], [Oi, 2010] compared other performance metrics and different JVMs
    • [Shiv et al., 2009] impact of hardware architecture details on SPECjvm2008 performance in comparison to other SPEC benchmarks
  • JVM-internal basic block analysis for just-in-time compilation etc.
    • Analysis results are not available to platform-independent counting tools
  • Program optimisers, escape analysis and control flow graph analysis of basic blocks have different objectives

SLIDE 13

Assumptions and Limitations

  • Subsequences (i.e., sub-PIBISes) are irrelevant: PIBISes should be as large as possible
  • Bytecode supplied to ByCounter must be "final"
  • Complex class loading in application servers: still to be tested; ByCounter works as a JVM "instrumentation agent", too
  • JIT impact to be considered
  • Further evaluation needed (e.g., SPECjbb2005)
  • Instrumenting Java Platform API methods: to be determined

SLIDE 14

Future Work

Further potential for decreasing runtime overhead:

  • Identify performance-invariant methods: no need to report results on each execution (counts are constant)
  • Parallelise evaluation and aggregation of results on multi-core execution platforms

Combine with purity analysis:

  • To avoid counting code that would otherwise be "dead code"

Study the shape/contents of different PIBISes:

  • Also: their static/dynamic frequency

Compare overhead to JVMTI-based tools
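One possible reading of the first future-work item (performance-invariant methods) is sketched below under stated assumptions. For a method whose counts are known to be constant, the count map is materialised and stored only on the first execution. Later executions only increment an invocation counter. The totals are recovered as counts × invocations. The class and method names are hypothetical, and this is not an existing ByCounter feature.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical reporter for methods whose per-execution counts are constant. */
final class InvariantMethodReporter {

    // method -> counts of one execution (identical for every execution of an invariant method)
    private final Map<String, Map<String, Long>> countsPerExecution = new ConcurrentHashMap<>();
    // method -> number of executions
    private final Map<String, AtomicLong> invocations = new ConcurrentHashMap<>();

    /** Called at method exit; the supplier is only evaluated on the first execution. */
    void reportExecution(String method, Supplier<Map<String, Long>> countsOfOneExecution) {
        countsPerExecution.computeIfAbsent(method, m -> Map.copyOf(countsOfOneExecution.get()));
        invocations.computeIfAbsent(method, m -> new AtomicLong()).incrementAndGet();
    }

    /** Total counts = counts of one execution x number of executions. */
    Map<String, Long> totals(String method) {
        long n = invocations.getOrDefault(method, new AtomicLong()).get();
        Map<String, Long> result = new TreeMap<>();
        countsPerExecution.getOrDefault(method, Map.of())
                .forEach((op, c) -> result.put(op, c * n));
        return result;
    }
}
```

Reporting five executions of a method with 3 DMUL each evaluates the supplier once and yields a total of 15 DMUL. The per-execution cost after the first call shrinks to one map lookup and one atomic increment.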

SLIDE 15

Bibliography

[BLC2002a] Bruneton, E., Lenglet, R., and Coupaye, T. (2002). ASM: a code manipulation tool to implement adaptable systems. Adaptable and Extensible Component Systems. http://asm.ow2.org.

[GPW2002a] Gregg, D., Power, J., and Waldron, J. (2002). Benchmarking the Java virtual architecture: the SPECjvm98 benchmark suite. Java Microarchitectures, pages 1–18.

[HKRR2009a] Hauck, M., Kuperberg, M., Krogmann, K., and Reussner, R. (2009). Modelling Layered Component Execution Environments for Performance Prediction. Springer LNCS, 2009.

[KB2007a] Kuperberg, M. and Becker, S. (2007). Predicting Software Component Performance: On the Relevance of Parameters for Benchmarking Bytecode and APIs. Proceedings of the 12th International Workshop on Component Oriented Programming (WCOP 2007).

[KKR2008a] Kuperberg, M., Krogmann, K., and Reussner, R. (2008). Performance Prediction for Black-Box Components using Reengineered Parametric Behaviour. Springer LNCS, 2008.

[KKR2009a] Kuperberg, M., Krogmann, M., and Reussner, R. (2009). TimerMeter: Quantifying Properties of Software Timers for System Analysis. Proceedings of QEST 2009.

[KKR2010a] Krogmann, K., Kuperberg, M., and Reussner, R. (2010). Using Genetic Search for Reverse Engineering of Parametric Behaviour Models for Performance Prediction. IEEE Transactions on Software Engineering. Accepted for publication, to appear 2010.

[SPECjvm2008] SPECjvm2008 Benchmarks. SPEC Corporation. http://www.spec.org/jvm2008/

SLIDE 16

Conclusions

Runtime bytecode instruction counts using ByCounter: platform-independent dynamic performance metric

  • Successful usage in cross-platform performance prediction [KKR2008a]
  • Uses transparent instrumentation of application bytecode
  • Neither profilers nor JVM monitoring tools are instruction-precise

New: to decrease the overhead of ByCounter, identify and use performance-invariant bytecode instruction sequences. The evaluation shows a significant overhead decrease, e.g., 2.9x less runtime overhead for SPECjvm2008 MPEG Audio.
