ISA-Independent Workload Characterization and Implications for Specialized Architectures

Yakun Sophia Shao and David Brooks, Harvard University


SLIDE 1

ISA-Independent Workload Characterization and Implications for Specialized Architectures

Yakun Sophia Shao and David Brooks Harvard University {shao,dbrooks}@eecs.harvard.edu

SLIDE 2

Specialized architectures are decoupled from legacy ISAs.

Spectrum of Specialization: General-Purpose CPU, GPU, Fixed-Function ASIC

  • General-Purpose CPU end: low efficiency, high programmability, tied to a specific ISA
  • Fixed-Function ASIC end: high efficiency, low programmability, no ISA

SLIDE 3

Specialization requires workload intrinsic characteristics.

Specialized architecture is tailored to applications.

  • e.g. special data path, memory access patterns.

I want to design specialized architectures for applications. Where should I start?
You need to first understand their characteristics.

SLIDE 4

Specialization requires workload intrinsic characteristics.

Yeah, good point! What should I do to understand those characteristics? How about I run the program and collect performance-counter stats?
Hmmm… it’s what you used to do for CPU designs, but is what you get the true program characteristic?

SLIDE 5

Performance-Counter Based Workload Characterization

  • Metrics
    – IPC
    – Cache miss rates
    – Branch mis-prediction rates
    – …
  • Microarchitecture-dependent
    – What if there is a bigger cache/a better branch predictor?
    – Not program intrinsic characteristics
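
To make the "bigger cache" question concrete, here is a minimal sketch, assuming a toy direct-mapped cache model and a made-up address trace (neither comes from the talk). It shows the same trace producing very different miss rates depending only on cache size, which is why a miss rate is not a program intrinsic characteristic.

```python
def miss_rate(addresses, num_lines, line_bytes=64):
    """Miss rate of a toy direct-mapped cache with `num_lines` lines."""
    stored_block = [None] * num_lines   # block currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // line_bytes      # which cache-line-sized block is touched
        index = block % num_lines       # direct-mapped placement
        if stored_block[index] != block:
            stored_block[index] = block
            misses += 1
    return misses / len(addresses)

# Hypothetical trace: sweep over 128 distinct cache lines, repeated 10 times.
trace = [line * 64 for _ in range(10) for line in range(128)]

small = miss_rate(trace, num_lines=64)    # working set does not fit
large = miss_rate(trace, num_lines=256)   # working set fits
print(f"64-line cache:  miss rate {small:.2f}")
print(f"256-line cache: miss rate {large:.2f}")
```

The program behavior is identical in both runs; only the modeled hardware changed.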

SLIDE 6

Specialization requires workload intrinsic characteristics.

Oh, I also heard about microarchitecture-independent workload characterization. We can perform the profiling analysis just using the instruction trace.
Hmmm… that removes the microarchitecture dependency, but it is still tied to a specific ISA.

SLIDE 7

Specialization requires workload intrinsic characteristics.

“Tied to a specific ISA”? Will that be a problem?
Yes, for specialized architectures!

SLIDE 8

ISA impacts program behaviors.

Stack Overhead

  • Limited Registers
  • Additional Load/Store

Complex Operations

  • Memory Operands
  • Vector Operations

Calling Conventions

SLIDE 9

Specialization requires workload intrinsic characteristics.

I see. So is there a way to get ISA-independent program characteristics?
That’s a good question. I found a paper in ISPASS this year which seems to answer this question. Let’s take a look!

SLIDE 10

Paper Summary

Goal:

  • An analysis tool to characterize workloads’ ISA-independent characteristics for specialized architectures

Methods:

  • Leverage the compiler’s intermediate representation (IR)
  • Categorize characteristics into compute, memory, and control

Takeaways:

  • ISA-dependent characterization is misleading for specialization.
  • ISA-independent characterization allows designers to quickly identify opportunities for specialization.

SLIDE 11

Tool Overview

[Diagram: Program → IR Trace (ISA-independent) and x86 Trace (ISA-dependent) → Characterization for Specialized Architecture (Compute, Memory, Control) → Design of Specialized Architecture]

SLIDE 12

Program Representations

[Diagram: Program, ILDJIT, LLVM, IR Trace, x86 Trace]

SLIDE 13

Program Representations

  • SPEC CPU2000

[Diagram: Program, ILDJIT, LLVM, IR Trace, x86 Trace]

SLIDE 14

Program Representations

ILDJIT

  • A modular compilation framework
  • Performs machine-independent classical optimizations at the IR level
  • Uses LLVM’s back end to
    – Do machine-dependent optimizations
    – Generate machine code

[Diagram: Program, ILDJIT, LLVM, IR Trace, x86 Trace]

Campanoni, et al., A Highly Flexible, Parallel Virtual Machine: Design and Experience of ILDJIT, Software: Practice and Experience, 2010

SLIDE 15

Program Representations

ILDJIT IR

  • High-level IR
  • Machine-, ISA-, and system-library-independent
  • Features:
    – 80 instructions
    – Unlimited registers
    – Only loads/stores access memory
    – No vector operations
    – Parameters are passed by variables

[Diagram: Program, ILDJIT, LLVM, IR Trace, x86 Trace]

SLIDE 16

Program Representations

x86 Trace

  • Used for ISA-dependent analysis
  • Semantically equivalent to the IR code
  • Collected with Pin instrumentation

[Diagram: Program, ILDJIT, LLVM, IR Trace, x86 Trace]

SLIDE 17

Tool Overview

[Diagram: Program → IR Trace (ISA-independent) and x86 Trace (ISA-dependent) → Characterization for Specialized Architecture (Compute, Memory, Control) → Design of Specialized Architecture]

SLIDE 18

ISA-Independent Workload Characteristics

Compute

  • Opcode Diversity
  • Static Instructions (I-MEM)

Memory

  • Memory Footprint (D-MEM)
  • Global Address Entropy
  • Local Address Entropy

Control

  • Branch Instruction Counts
  • Branch Entropy
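
As a rough illustration of the counting-style characteristics in this list, the sketch below walks a tiny, made-up trace. The `Instr` record, the opcode names, and the exact definitions used here (distinct opcodes, distinct program counters, distinct data addresses, dynamic branch count) are assumptions for illustration only; the tool's actual trace format and metric definitions may differ.

```python
from collections import namedtuple

# Hypothetical trace record: program counter, opcode name, data address (or None).
Instr = namedtuple("Instr", ["pc", "opcode", "address"])

def characterize(trace):
    """Simple counting metrics over a dynamic instruction trace."""
    return {
        # Compute
        "opcode_diversity": len({i.opcode for i in trace}),
        "static_instructions": len({i.pc for i in trace}),
        # Memory
        "memory_footprint": len({i.address for i in trace if i.address is not None}),
        # Control
        "branch_count": sum(1 for i in trace if i.opcode == "branch"),
    }

# Made-up loop body, executed three times.
body = [
    Instr(0, "load", 0x100),
    Instr(1, "add", None),
    Instr(2, "store", 0x104),
    Instr(3, "branch", None),
]
trace = body * 3

stats = characterize(trace)
for name, value in stats.items():
    print(name, value)
```

The two entropy metrics are distribution-based rather than count-based, so they are left out of this sketch.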

SLIDE 19

Compute::Static Instructions

SLIDE 20

Compute::Static Instructions

So if you use the x86 trace instead of the IR trace…
I will think those stack operations are part of the “hot code”.

SLIDE 21

ISA-Independent Workload Characteristics

Compute

  • Opcode Diversity
  • Static Instructions (I-MEM)

Memory

  • Memory Footprint (D-MEM)
  • Global Address Entropy
  • Local Address Entropy

Control

  • Branch Instruction Counts
  • Branch Entropy

SLIDE 22

Memory::Entropy

Entropy: a measure of randomness

$$\text{Entropy} = -\sum_{i=1}^{N} p(x_i) \log_2 p(x_i)$$

Case 1: X is always a constant.

$$p(X) = 1, \quad \log_2 p(X) = 0, \quad \text{Entropy} = 0$$

Case 2: N possible outcomes of X are equally likely.

$$p(X) = \frac{1}{N}, \quad \log_2 p(X) = \log_2 N^{-1}$$

$$\text{Entropy} = -N \cdot \frac{1}{N} \cdot \log_2 N^{-1} = \log_2 N$$
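
The two cases can be checked numerically. This is a minimal sketch that estimates p(x_i) from observed frequencies in a stream of values, for example memory addresses; the sample streams are made up for illustration.

```python
import math
from collections import Counter

def entropy(samples):
    """Shannon entropy, in bits, of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = len(samples)
    # -p * log2(p) written as p * log2(1/p), with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())

constant = [0b00, 0b00, 0b00, 0b00]   # Case 1: X is always the same value
uniform = [0b00, 0b01, 0b10, 0b11]    # Case 2: N = 4 equally likely outcomes

print(entropy(constant))
print(entropy(uniform))
```

With N = 4 equally likely values, the result matches log2 N = 2 bits.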

SLIDE 23

Memory::Global Address Entropy

Temporal Locality

Address Stream A (less temporal locality): Entropy = 2
Address Stream B (more temporal locality): Entropy = 0

Address bits shown: 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 1

Yen, Draper, and Hill. Notary: Hardware Techniques to Enhance Signatures. MICRO 08

SLIDE 24

Memory::Global Address Entropy

Temporal Locality

Address Stream A (less temporal locality): Entropy = 2
Address Stream B (more temporal locality): Entropy = 0

Address bits shown: 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 1

Yen, Draper, and Hill. Notary: Hardware Techniques to Enhance Signatures. MICRO 08

SLIDE 25

Memory::Global Address Entropy

Temporal Locality

So if you use the x86 trace instead of the IR trace…
I will have a wrong locality estimate for workloads!

SLIDE 26

Memory::Local Address Entropy

Spatial Locality

[Plot: local entropy versus number of bits skipped, for address streams A and B]

Address Stream A: less spatial locality
Address Stream B: more spatial locality

Address bits shown: 0 0 0 0 0 1 0 0 1 0 0 0 1 1 0 0
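
One way to read the "bits skipped" axis is sketched below. It assumes that local entropy is the entropy of the address stream after dropping that many low-order bits, which is an interpretation of the plot rather than something the slide states. The two 4-bit streams (stride 4 and stride 1) are hypothetical and are not claimed to be the slide's streams A and B.

```python
import math
from collections import Counter

def entropy(samples):
    """Shannon entropy, in bits, of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def local_entropy(addresses, bits_skipped):
    """Entropy after dropping `bits_skipped` low-order address bits (assumed definition)."""
    return entropy([addr >> bits_skipped for addr in addresses])

# Hypothetical 4-bit address streams.
stride_4 = [0b0000, 0b0100, 0b1000, 0b1100]   # spread-out accesses
stride_1 = [0b0000, 0b0001, 0b0010, 0b0011]   # neighboring accesses

for skipped in range(1, 5):
    print(skipped, local_entropy(stride_4, skipped), local_entropy(stride_1, skipped))
```

Under this reading, the stream with neighboring accesses loses its entropy after only a few skipped bits, while the spread-out stream keeps it longer, which is how the curve separates more and less spatial locality.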

SLIDE 27

Memory::Local Address Entropy

Spatial Locality

So if you use the x86 trace instead of the IR trace…
I will think the program has more spatial locality than it really has.

SLIDE 28

ISA-Independent Workload Characteristics

Compute

  • Opcode Diversity
  • Static Instructions (I-MEM)

Memory

  • Memory Footprint (D-MEM)
  • Global Address Entropy
  • Local Address Entropy

Control

  • Branch Instruction Counts
  • Branch Entropy

Yokota, et al., Introducing Entropies for Representing Program Behavior and Branch Predictor Performance, 2007

SLIDE 29

Control::Branch Entropy

SLIDE 30

Control::Branch Entropy

So if you use the x86 trace instead of the IR trace…
I won’t get much wrong for control.

SLIDE 31

Tool Overview

[Diagram: Program → IR Trace (ISA-independent) and x86 Trace (ISA-dependent) → Characterization for Specialized Architecture (Compute, Memory, Control) → Design of Specialized Architecture]

SLIDE 32

ISA-Independent Workload Characteristics

Compute

  • Opcode Diversity
  • Static Instructions (I-MEM)

Memory

  • Memory Footprint (D-MEM)
  • Global Address Entropy
  • Local Address Entropy

Control

  • Branch Instruction Counts
  • Branch Entropy

Is there a way to compare those across workloads?
Yes, a Kiviat plot!
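
A Kiviat (radar) plot places one axis per characteristic around a circle and draws one polygon per workload. The sketch below only computes the polygon vertices, assuming each axis is normalized by the largest value seen across workloads; the workload names and all numbers are made up for illustration and are not results from the talk.

```python
import math

AXES = ["opcode diversity", "static instructions", "memory footprint",
        "global entropy", "local entropy", "branch count", "branch entropy"]

# Hypothetical metric values, one list per workload, in AXES order.
workloads = {
    "workload_a": [10, 200, 4096, 8.0, 3.0, 50, 0.2],
    "workload_b": [40, 800, 1024, 4.0, 6.0, 25, 0.8],
}

def kiviat_vertices(values, maxima):
    """Map metric values to (x, y) polygon vertices, one axis per metric."""
    n = len(values)
    vertices = []
    for k, (value, maximum) in enumerate(zip(values, maxima)):
        radius = value / maximum if maximum else 0.0   # normalize to [0, 1]
        angle = 2 * math.pi * k / n                    # evenly spaced axes
        vertices.append((radius * math.cos(angle), radius * math.sin(angle)))
    return vertices

# Per-axis maximum across all workloads.
maxima = [max(column) for column in zip(*workloads.values())]
polygons = {name: kiviat_vertices(vals, maxima) for name, vals in workloads.items()}

for name, polygon in polygons.items():
    print(name, [(round(x, 2), round(y, 2)) for x, y in polygon])
```

The vertex lists can be handed to any plotting library; workloads with similar polygon shapes have similar characteristics under this normalization.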

SLIDE 33

ISA-Independent Workload Characteristics

Compute

  • Opcode Diversity
  • Static Instructions (I-MEM)

Memory

  • Memory Footprint (D-MEM)
  • Global Address Entropy
  • Local Address Entropy

Control

  • Branch Instruction Counts
  • Branch Entropy

SLIDE 34

Workload Characterization

SLIDE 35

Conclusions

  • We demonstrate that ISA-dependent analysis can be misleading for specialized architectures.
  • We present an analysis tool to characterize ISA-independent characteristics for specialization.
  • We show that our tool provides opportunities for designers to compare workloads’ characteristics.