ISA-Independent Workload Characterization and Implications for Specialized Architectures

ISA-Independent Workload Characterization and Implications for Specialized Architectures. Yakun Sophia Shao and David Brooks, Harvard University


  1. ISA-Independent Workload Characterization and Implications for Specialized Architectures. Yakun Sophia Shao and David Brooks, Harvard University, {shao,dbrooks}@eecs.harvard.edu

  2. Specialized architectures are decoupled from legacy ISAs. Spectrum of Specialization: General-Purpose CPU, GPU, Fixed-Function ASIC. From the CPU end to the ASIC end: Low Efficiency to High Efficiency, High Programmability to Low Programmability, Tied to a Specific ISA to No ISA.

  3. Specialization requires workload intrinsic characteristics. Specialized architecture is tailored to applications. • e.g. special data path, memory access patterns. I want to design specialized architectures for applications. Where should I start first? You need to first understand their characteristics.

  4. Specialization requires workload intrinsic characteristics. Yeah, good point! What should I do to understand those characteristics? How about I run the program and collect performance-counter stats? Hmmm… it’s what you used to do for CPU designs, but is what you get the true program characteristic?

  5. Performance-Counter Based Workload Characterization • Metrics – IPC – Cache miss rates – Branch misprediction rates – … • Microarchitecture-dependent – What if there is a bigger cache/a better branch predictor? – Not program intrinsic characteristics

  6. Specialization requires workload intrinsic characteristics. Oh, I also heard about microarchitecture-independent workload characterization. We can perform the profiling analysis just using the instruction trace. Hmmm… that removes microarchitecture dependency. But it still ties to a specific ISA.

  7. Specialization requires workload intrinsic characteristics. “Ties to a specific ISA”? Will that be a problem? Yes, for specialized architectures!

  8. ISA impacts program behaviors. Stack Overhead • Limited Registers • Additional Load/Store. Complex Operations • Memory Operands • Vector Operations. Calling Conventions.

  9. Specialization requires workload intrinsic characteristics. I see. So is there a way to get ISA-independent program characteristics? That’s a good question. I found a paper in ISPASS this year which seems to answer this question. Let’s take a look!

  10. Paper Summary. Goal: • An analysis tool to characterize workloads’ ISA-independent characteristics for specialized architectures. Methods: • Leverage the compiler’s intermediate representation (IR) • Categorize characteristics into compute, memory, and control. Takeaways: • ISA-dependent characterization is misleading for specialization. • ISA-independent characterization allows designers to quickly identify opportunities for specialization.

  11. Tool Overview [Diagram labels: Program; ISA-Independent IR Trace; ISA-Dependent x86 Trace; Characterization for Specialized Architecture (Compute, Memory, Control); Design of Specialized Architecture]

  12. Program Representations [Diagram: Program → ILDJIT → IR Trace → LLVM → x86 Trace]

  13. Program Representations • SPEC CPU2000 [Diagram: Program → ILDJIT → IR Trace → LLVM → x86 Trace]

  14. Program Representations: ILDJIT • A modular compilation framework • Performs machine-independent classical optimizations at the IR level • Uses LLVM’s back end to – Do machine-dependent optimizations – Generate machine code. Campanoni, et al., A Highly Flexible, Parallel Virtual Machine: Design and Experience of ILDJIT, Software: Practice and Experience, 2010

  15. Program Representations: ILDJIT IR • High-level IR • Machine-, ISA-, and system-library-independent • Features: – 80 instructions – Unlimited registers – Only loads/stores access memory – No vector operations – Parameters are passed by variables

  16. Program Representations: x86 Trace • Used for ISA-dependent analysis • Semantically equivalent to the IR code • Collected with Pin instrumentation

  17. Tool Overview [Diagram labels: Program; ISA-Independent IR Trace; ISA-Dependent x86 Trace; Characterization for Specialized Architecture (Compute, Memory, Control); Design of Specialized Architecture]

  18. ISA-Independent Workload Characteristics. Compute: • Opcode Diversity • Static Instructions (I-MEM). Memory: • Memory Footprint (D-MEM) • Global Address Entropy • Local Address Entropy. Control: • Branch Instruction Counts • Branch Entropy
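The slide only names these metrics, so the following is a minimal sketch of how the count-based ones might be computed from a trace. The trace record layout, the `characterize` function, and the exact definitions (opcode diversity as the number of distinct opcodes, memory footprint as the number of distinct addresses touched) are assumptions for illustration, not the authors' tool.

```python
from collections import Counter

# Hypothetical trace records: (static_instruction_id, opcode, address or None).
# The field layout is an assumption for illustration only.
trace = [
    (0, "load", 0x1000),
    (1, "add", None),
    (2, "store", 0x1004),
    (3, "branch", None),
    (0, "load", 0x1008),
    (1, "add", None),
    (2, "store", 0x1004),
    (3, "branch", None),
]

def characterize(trace):
    """Compute simple count-based characteristics from a dynamic trace."""
    opcodes = Counter(op for _, op, _ in trace)
    return {
        # Assumed definition: number of distinct opcodes executed.
        "opcode_diversity": len(opcodes),
        # Number of distinct static instructions that were executed.
        "static_instructions": len({sid for sid, _, _ in trace}),
        # Assumed definition: number of distinct addresses touched.
        "memory_footprint": len({a for _, _, a in trace if a is not None}),
        # Dynamic count of branch instructions.
        "branch_count": opcodes["branch"],
    }

stats = characterize(trace)
print(stats)
```

Because every metric here is derived from the trace alone, the same function could be run on an IR trace and an x86 trace to compare the two views.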

  19. Compute::Static Instructions

  20. Compute::Static Instructions So if you use x86 trace instead of IR trace… I will think those stack operations are part of the “hot code”.

  21. ISA-Independent Workload Characteristics. Compute: • Opcode Diversity • Static Instructions (I-MEM). Memory: • Memory Footprint (D-MEM) • Global Address Entropy • Local Address Entropy. Control: • Branch Instruction Counts • Branch Entropy

  22. Memory::Entropy Entropy: a measure of the randomness. $\text{Entropy} = -\sum_{i=1}^{N} p(x_i) \log_2 p(x_i)$. Case 1: X is always a constant. $p(X) = 1$, $\log_2 p(X) = 0$, so $\text{Entropy} = 0$. Case 2: N possible outcomes of X occur equally. $p(X) = \frac{1}{N}$, $\log_2 p(X) = \log_2 N^{-1}$, so $\text{Entropy} = -N \cdot \frac{1}{N} \cdot \log_2 N^{-1} = \log_2 N$.
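The entropy formula and its two cases can be checked with a small sketch. The `entropy` helper below is a hypothetical name; it estimates each probability from sample frequencies, which is an assumption about how the measure would be applied to a trace.

```python
import math
from collections import Counter

def entropy(samples):
    """Shannon entropy, in bits, of the empirical distribution of samples."""
    counts = Counter(samples)
    n = len(samples)
    # Sum of p * log2(1/p), which equals -sum of p * log2(p).
    return sum((c / n) * math.log2(n / c) for c in counts.values())

# Case 1: X is always a constant, so the entropy is 0.
constant = entropy([7, 7, 7, 7])

# Case 2: N = 4 outcomes occur equally often, so the entropy is log2(4).
uniform = entropy([0, 1, 2, 3])

print(constant, uniform)
```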

  23. Memory::Global Address Entropy Temporal Locality. Address Stream A (less temporal locality): Entropy = 2. Address Stream B (more temporal locality): Entropy = 0. Addresses shown: 0000, 0001, 0010, 0011. Yen, Draper, and Hill. Notary: Hardware Techniques to Enhance Signatures. MICRO 08

  24. Memory::Global Address Entropy Temporal Locality. Address Stream A (less temporal locality): Entropy = 2. Address Stream B (more temporal locality): Entropy = 0. Addresses shown: 0000, 0001, 0010, 0011. Yen, Draper, and Hill. Notary: Hardware Techniques to Enhance Signatures. MICRO 08

  25. Memory::Global Address Entropy Temporal Locality. So if you use x86 trace instead of IR trace… I will have a wrong locality estimate for workloads!

  26. Memory::Local Address Entropy Spatial Locality. Address Stream A (less spatial locality), Address Stream B (more spatial locality). Addresses shown: 0000, 0100, 1000, 1100. [Plot: Local Entropy vs. # of Bits Skipped, for streams A and B]
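A rough sketch of how local entropy could be plotted against the number of bits skipped. It assumes that "skipping k bits" means dropping the k least-significant address bits before computing the entropy; the slide does not spell this out. `local_entropy` and `dense_stream` are hypothetical names, and `dense_stream` is made-up data added for contrast.

```python
import math
from collections import Counter

def entropy(samples):
    """Shannon entropy, in bits, of the empirical distribution of samples."""
    counts = Counter(samples)
    n = len(samples)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

def local_entropy(addresses, bits_skipped):
    """Entropy after dropping the low-order bits (assumed meaning of 'skipped')."""
    return entropy([a >> bits_skipped for a in addresses])

# The four addresses listed on the slide.
slide_stream = [0b0000, 0b0100, 0b1000, 0b1100]
# Hypothetical contrasting stream whose addresses are adjacent.
dense_stream = [0b0000, 0b0001, 0b0010, 0b0011]

slide_profile = [local_entropy(slide_stream, k) for k in range(5)]
dense_profile = [local_entropy(dense_stream, k) for k in range(5)]

print(slide_profile)
print(dense_profile)
```

Under this reading, a stream whose entropy falls quickly as bits are skipped touches addresses that sit close together, which suggests more spatial locality.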

  27. Memory::Local Address Entropy Spatial Locality. So if you use x86 trace instead of IR trace… I will think the program has more spatial locality than it really has.

  28. ISA-Independent Workload Characteristics. Compute: • Opcode Diversity • Static Instructions (I-MEM). Memory: • Memory Footprint (D-MEM) • Global Address Entropy • Local Address Entropy. Control: • Branch Instruction Counts • Branch Entropy. Yokota, et al., Introducing Entropies for Representing Program Behavior and Branch Predictor Performance, 07

  29. Control::Branch Entropy

  30. Control::Branch Entropy So if you use x86 trace instead of IR trace… I won’t get much wrong for control.

  31. Tool Overview [Diagram labels: Program; ISA-Independent IR Trace; ISA-Dependent x86 Trace; Characterization for Specialized Architecture (Compute, Memory, Control); Design of Specialized Architecture]

  32. ISA-Independent Workload Characteristics. Compute: • Opcode Diversity • Static Instructions (I-MEM). Memory: • Memory Footprint (D-MEM) • Global Address Entropy • Local Address Entropy. Control: • Branch Instruction Counts • Branch Entropy. Is there a way to compare those across workloads? Yes, Kiviat plot!

  33. ISA-Independent Workload Characteristics. Compute: • Opcode Diversity • Static Instructions (I-MEM). Memory: • Memory Footprint (D-MEM) • Global Address Entropy • Local Address Entropy. Control: • Branch Instruction Counts • Branch Entropy

  34. Workload Characterization
