Measuring instruction latencies with llvm


  1. Measuring instruction latencies with llvm
     Guillaume Chatelet, C. Courbet, B. De Backer, O. Sykora
     Google Compiler Research
     Confidential + Proprietary

  2. Why?
     ● Scheduling needs latencies and μOp decomposition
       ○ This talk is about latency measurement only.
     ● Vendors release some information
       ○ It may be incomplete or not in a machine-readable format.
     ● Updating LLVM TD files
       ○ is tedious and requires careful guesswork and analysis.
     ● Consequences
       ○ Scheduling information is incomplete for most X86 models.

  3. How it works
     ∀ processor, ∀ instruction:
         start_measure
         .rept 10000
         add rax, rax
         .endr
         end_measure
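The measurement loop above can be sketched in Python. This is an illustration under stated assumptions, not llvm-exegesis's implementation: `make_snippet` and `latency_per_instruction` are hypothetical helpers, `start_measure`/`end_measure` are the slide's placeholders for reading a cycle counter, and the cycle count in the usage line is made up. Dividing total cycles by the number of copies approximates latency only when each copy must wait for the previous one.

```python
def make_snippet(instruction: str, repetitions: int = 10000) -> str:
    """Build the unrolled snippet from the slide (hypothetical helper)."""
    return "\n".join([
        "start_measure",          # slide placeholder: start reading the cycle counter
        f".rept {repetitions}",   # the assembler repeats the body `repetitions` times
        f"    {instruction}",
        ".endr",
        "end_measure",            # slide placeholder: stop reading the cycle counter
    ])


def latency_per_instruction(total_cycles: float, repetitions: int) -> float:
    """Average cycles per copy; equals latency only for a serial dependency chain."""
    if repetitions <= 0:
        raise ValueError("repetitions must be positive")
    return total_cycles / repetitions


print(make_snippet("add rax, rax", repetitions=3))
print(latency_per_instruction(40000, 10000))  # hypothetical cycle count -> 4.0
```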

  4. How it works: actually subtler than this...
     ∀ processor, ∀ instruction:
         start_measure
         .rept 10000
         andn eax, ebx, edx   # no copy reads eax, so the processor can execute these in parallel
         .endr
         end_measure
     ● We need a way to make the execution sequential.
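Why `add rax, rax` chains while `andn eax, ebx, edx` does not can be checked in a few lines. The sketch assumes hand-written register sets and exact name matching (no aliasing); `chains_with_itself` is a hypothetical helper, and EFLAGS is left out because neither instruction reads it.

```python
def chains_with_itself(reads: set, writes: set) -> bool:
    """Copy N+1 must wait for copy N only if it reads a register copy N wrote."""
    return bool(reads & writes)


# add rax, rax: reads and writes rax, so every copy waits for the previous one.
print(chains_with_itself(reads={"rax"}, writes={"rax"}))         # True
# andn eax, ebx, edx: writes eax but reads only ebx and edx, so copies overlap.
print(chains_with_itself(reads={"ebx", "edx"}, writes={"eax"}))  # False
```

When the check fails, the loop measures something closer to throughput than latency, which is why a dependency has to be forced.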

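  5. MCInst in LLVM
     An instruction's operands fall into four kinds. Explicit operands appear in the MCInst operand list; implicit ones are fixed registers the instruction always reads or writes.
     ● Explicit inputs (e.g. GR16, the class of 16-bit general-purpose registers)
     ● Implicit input (e.g. EFLAGS)
     ● Explicit output
     ● Implicit output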
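The four operand kinds can be modelled as a small data structure. This is a hypothetical Python model: `Operand`, `InstrDesc`, and the `ADC16-like` entry are illustrative names with hand-written operand lists. In LLVM the corresponding description comes from the instruction's `MCInstrDesc`, whose API is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Operand:
    is_output: bool
    is_implicit: bool
    register: Optional[str] = None   # implicit operands name a fixed register, e.g. "EFLAGS"
    reg_class: Optional[str] = None  # explicit operands name a class, e.g. "GR16"


@dataclass
class InstrDesc:
    name: str
    operands: List[Operand] = field(default_factory=list)

    def inputs(self) -> List[Operand]:
        return [op for op in self.operands if not op.is_output]

    def outputs(self) -> List[Operand]:
        return [op for op in self.operands if op.is_output]


# Hypothetical 16-bit add-with-carry: reads the carry flag, writes the flags.
adc16 = InstrDesc("ADC16-like", [
    Operand(is_output=True, is_implicit=False, reg_class="GR16"),
    Operand(is_output=False, is_implicit=False, reg_class="GR16"),
    Operand(is_output=False, is_implicit=False, reg_class="GR16"),
    Operand(is_output=False, is_implicit=True, register="EFLAGS"),
    Operand(is_output=True, is_implicit=True, register="EFLAGS"),
])
print(len(adc16.inputs()), len(adc16.outputs()))  # 3 2
```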

  6. Sequential execution: create a dependency
     The current instruction must use an output of the previous instruction.
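The rule on this slide can be expressed as a check over a short snippet. A hypothetical sketch: `Inst`, `depends_on`, and `is_sequential` are illustrative names, register sets are written by hand, flags are ignored, and registers are compared by exact name. The first instruction is checked against the last because the snippet is repeated back to back.

```python
from typing import NamedTuple, Sequence


class Inst(NamedTuple):
    text: str
    reads: frozenset
    writes: frozenset


def depends_on(prev: Inst, cur: Inst) -> bool:
    """True if `cur` reads at least one register that `prev` writes."""
    return bool(prev.writes & cur.reads)


def is_sequential(snippet: Sequence[Inst]) -> bool:
    """Every instruction must depend on its predecessor; index -1 wraps
    around because the snippet is repeated."""
    return all(depends_on(snippet[i - 1], snippet[i]) for i in range(len(snippet)))


mul = Inst("imul rbx, rax", reads=frozenset({"rbx", "rax"}), writes=frozenset({"rbx"}))
add = Inst("add rax, rbx", reads=frozenset({"rax", "rbx"}), writes=frozenset({"rax"}))
other = Inst("imul rcx, rdx", reads=frozenset({"rcx", "rdx"}), writes=frozenset({"rcx"}))

print(is_sequential([mul, add]))    # True: add reads rbx, imul reads rax
print(is_sequential([mul, other]))  # False: imul rbx, rax ignores rcx
```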

  7. Implicit self cycle
     An implicit output of the instruction is also one of its implicit inputs, so each copy depends on the previous one.
     Possible cycle: [diagram]
     Possible instance: AAA

  8. Implicit self cycle through register aliasing
     The implicit output and implicit input are different registers that overlap, e.g. one is a sub-register of the other.
     Possible cycle: [diagram]
     Possible instance: AAA
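Both implicit self cycles reduce to one test: does some implicit output overlap some implicit input? A sketch, assuming a tiny hand-written overlap table loosely inspired by LLVM's notion of register units. The `UNITS` table and both helper names are hypothetical, and the example operand lists are made up rather than taken from AAA's actual definition.

```python
# Each register covers a set of storage "units"; two registers alias
# when their unit sets intersect. Hand-written subset for illustration.
UNITS = {
    "AL": {"A.lo8"},
    "AH": {"A.hi8"},
    "AX": {"A.lo8", "A.hi8"},
    "EAX": {"A.lo8", "A.hi8", "A.hi16"},
    "EFLAGS": {"FLAGS"},
}


def aliases(a: str, b: str) -> bool:
    return bool(UNITS[a] & UNITS[b])


def has_implicit_self_cycle(implicit_inputs, implicit_outputs) -> bool:
    """True if some implicit output overlaps some implicit input."""
    return any(aliases(i, o) for i in implicit_inputs for o in implicit_outputs)


# Direct case: the same register on both sides.
print(has_implicit_self_cycle(["EFLAGS"], ["EFLAGS"]))  # True
# Aliasing case: AX contains AL.
print(has_implicit_self_cycle(["AL"], ["AX"]))          # True
# AH and AL are disjoint halves of AX, so no cycle.
print(has_implicit_self_cycle(["AH"], ["AL"]))          # False
```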

  9. Possible explicit self cycle
     An explicit output and an explicit input can be assigned the same register.
     Possible cycle: [diagram]
     Possible instance: AND32ri EAX, EAX, 1
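An explicit self cycle can be found by looking for a register that belongs both to the output's class and to some input's class. This is a hypothetical search (`find_explicit_self_cycle`, `GR32`, and `IMM` are illustrative names), with register classes truncated to a few registers.

```python
from typing import List, Optional, Tuple

# Tiny hand-written register classes; real LLVM classes are much larger.
GR32 = ["EAX", "ECX", "EDX", "EBX"]
IMM: List[str] = []  # an immediate operand cannot hold a register


def find_explicit_self_cycle(output_class: List[str],
                             input_classes: List[List[str]]) -> Optional[Tuple[int, str]]:
    """Return (input index, register) for the first explicit input that can
    be given the same register as the explicit output, or None."""
    for index, cls in enumerate(input_classes):
        for reg in output_class:
            if reg in cls:
                return index, reg
    return None


# AND32ri: output GR32, inputs (GR32, imm32).
hit = find_explicit_self_cycle(GR32, [GR32, IMM])
if hit is not None:
    _, reg = hit
    print(f"AND32ri {reg}, {reg}, 1")  # AND32ri EAX, EAX, 1
```

For AND32ri the output and first input are tied in LLVM's x86 definitions (two-address form), so sharing the register is required rather than merely allowed; the search above does not model tied operands.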

  10. Possible cycle through another instruction
     The instruction's output feeds a second instruction whose output feeds back into the first.
     Possible cycle: [diagram]
     Possible instance:
         MMX_PMOVMSKBrr R10D, MM1
         MMX_MOVD64rr MM1, R10D

  11. Possible cycle through another instruction
     Possible cycle: [diagram]
     Possible instance:
         VCMPPSZ256rri K5, YMM31, YMM31, 1
         VFMSUBADD213PDZrk ZMM31, ZMM25, K5, ZMM29, ZMM9
     YMM31 is the lower 256 bits of ZMM31, so this cycle also goes through register aliasing.
     Keep in mind: this process is fully automated.
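A search for a partner instruction can be sketched over a toy opcode table. Assumptions: `OPCODES` is a hypothetical, heavily reduced table with one explicit output class and one explicit input class per opcode, `find_two_instruction_cycle` is an illustrative name, and registers are compared by exact name. Because of that last simplification, the VCMPPSZ256rri/VFMSUBADD213PDZrk pair, which links YMM31 to ZMM31, would additionally need an alias-aware register comparison.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, heavily reduced table; real operand lists come from LLVM.
OPCODES: Dict[str, Dict[str, List[str]]] = {
    "MMX_PMOVMSKBrr": {"out": ["R10D", "EAX"], "in": ["MM1", "MM0"]},
    "MMX_MOVD64rr":   {"out": ["MM1", "MM0"], "in": ["R10D", "EAX"]},
    "ADDPSrr":        {"out": ["XMM0", "XMM1"], "in": ["XMM0", "XMM1"]},
}


def find_two_instruction_cycle(opcode: str, table) -> Optional[Tuple[str, str]]:
    """Find another opcode B such that opcode -> B -> opcode forms a cycle."""
    a = table[opcode]
    for other, b in table.items():
        if other == opcode:
            continue
        a_to_b = [r for r in a["out"] if r in b["in"]]  # A writes, B reads
        b_to_a = [r for r in b["out"] if r in a["in"]]  # B writes, A reads
        if a_to_b and b_to_a:
            r1, r2 = a_to_b[0], b_to_a[0]
            return f"{opcode} {r1}, {r2}", f"{other} {r2}, {r1}"
    return None


# The search needs no human input, so it can run over every opcode.
for name in OPCODES:
    print(name, find_two_instruction_cycle(name, OPCODES))
```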

  12. Results
     > llvm-exegesis -opcode-name IMUL16rri8 -benchmark-mode latency
     ---
     asm_template:
       name: latency IMUL16rri8
     cpu_name: sandybridge
     llvm_triple: x86_64-grtev4-linux-gnu
     num_repetitions: 10000
     measurements:
       - { key: latency, value: 4.0115, debug_string: '' }
     error: ''
     ...
     ● Identified discrepancies between TD files and measurements
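Checking a measurement against a scheduling model can be sketched as follows. Assumptions: the report text is the one on the slide (with its indentation reconstructed); `measured_latency`, `is_discrepancy`, the 0.5-cycle tolerance, and the TD latencies 4 and 3 are hypothetical, not values read from LLVM's Sandy Bridge model. A real tool would use a YAML parser rather than a regex; the regex keeps the sketch to the standard library.

```python
import re

# The report from the slide.
REPORT = """\
---
asm_template:
  name: latency IMUL16rri8
cpu_name: sandybridge
llvm_triple: x86_64-grtev4-linux-gnu
num_repetitions: 10000
measurements:
  - { key: latency, value: 4.0115, debug_string: '' }
error: ''
...
"""


def measured_latency(report: str) -> float:
    """Pull the latency value out of the report."""
    match = re.search(r"key:\s*latency,\s*value:\s*([0-9.]+)", report)
    if match is None:
        raise ValueError("no latency measurement found")
    return float(match.group(1))


def is_discrepancy(measured: float, td_latency: int, tolerance: float = 0.5) -> bool:
    """Flag the opcode when the measurement is far from the TD-file value."""
    return abs(measured - td_latency) > tolerance


latency = measured_latency(REPORT)
print(latency)                     # 4.0115
print(is_discrepancy(latency, 4))  # False (hypothetical TD value 4)
print(is_discrepancy(latency, 3))  # True (hypothetical TD value 3)
```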

  13. What's next?
     ● Extend to memory operands
     ● Automate fixing of TD files
     ● Measure the effect of
       ○ immediate values: ±0, 1, ~1, 2^8, 2^16, 2^32, 2^64, ±∞, NaN, denormals
       ○ register values: SUB EAX, EAX, EAX (a zeroing idiom) vs SUB EAX, EAX, EBX
     ● Make it work on other CPUs (ARM under way, Power?)

  14. Try It Out! https://llvm.org/docs/CommandGuide/llvm-exegesis.html
