Measuring instruction latencies with llvm (Guillaume Chatelet)



SLIDE 1

Confidential + Proprietary

Measuring instruction latencies with llvm

Guillaume Chatelet

  • C. Courbet, B. De Backer, O. Sykora

Google Compiler Research

SLIDE 2

Why?

  • Scheduling needs latencies and μOp decomposition
      ○ This talk is about latency measurement only
  • Vendors release some information
      ○ It may be incomplete or not in a machine-readable format
  • Updating LLVM .td files
      ○ is tedious and requires careful guesswork and analysis
  • Consequences
      ○ Scheduling information is incomplete for most X86 models

SLIDE 3

How it works

∀ processor, ∀ instruction:

    start_measure
    .rept 10000
    add rax, rax
    .endr
    end_measure
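A minimal Python sketch of the arithmetic behind this loop, assuming the two measurement points yield cycle counter readings. The function names and the counter values are hypothetical and only illustrate the idea; they are not how the tool itself is implemented.

```python
def make_snippet(instruction, repetitions=10000):
    """Build the repeated-instruction body shown on the slide as assembly text."""
    return "\n".join([f".rept {repetitions}", instruction, ".endr"])


def cycles_per_instruction(start_cycles, end_cycles, repetitions=10000):
    """Average cycles spent per repeated instruction.

    This equals the instruction latency only if every copy has to wait
    for the previous one to finish.
    """
    return (end_cycles - start_cycles) / repetitions


print(make_snippet("add rax, rax"))
# Hypothetical counter readings taken before and after the snippet.
print(cycles_per_instruction(start_cycles=1_000, end_cycles=11_000))
```

Using a large repetition count amortizes the fixed cost of starting and stopping the measurement over many instructions.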

SLIDE 4

How it works - actually subtler than this...

∀ processor, ∀ instruction:

    start_measure
    .rept 10000
    andn eax, ebx, edx   # processor can execute these in parallel
    .endr
    end_measure

  • We need a way to make the execution sequential
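To see why the dependency matters, here is a toy Python model. It assumes an idealized machine with unlimited execution units and perfect register renaming, ignores flags, and uses a latency of 1 cycle purely for illustration. None of this comes from the talk; it only shows how the critical path differs between the two snippets.

```python
def critical_path_cycles(reads, writes, latency, repetitions):
    """Cycles needed to run `repetitions` copies of one instruction on an
    idealized machine where only true data dependencies cause waiting."""
    ready = {}   # register -> cycle at which its latest value is available
    finish = 0
    for _ in range(repetitions):
        # An instruction starts once all the registers it reads are ready.
        start = max((ready.get(r, 0) for r in reads), default=0)
        done = start + latency
        for w in writes:
            ready[w] = done
        finish = max(finish, done)
    return finish


# add rax, rax: each copy reads the result of the previous copy.
print(critical_path_cycles({"rax"}, {"rax"}, latency=1, repetitions=10000))

# andn eax, ebx, edx: no copy reads eax, so the copies are independent.
print(critical_path_cycles({"ebx", "edx"}, {"eax"}, latency=1, repetitions=10000))
```

In this model the first snippet takes one latency per copy, while the second finishes after a single latency, so dividing its total time by the repetition count says nothing about latency. A real processor has a finite number of execution units, so the second snippet would measure throughput rather than latency.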

SLIDE 5

MCInst in LLVM

  • Explicit inputs (e.g. GR16)
  • Implicit input (e.g. EFLAGS)
  • Explicit output
  • Implicit output

SLIDE 6

Sequential execution: Create Dependency

Current instruction must use an output of previous instruction

SLIDE 7

Implicit self cycle

Possible cycle: [figure]

Possible instance: AAA

SLIDE 8

Implicit self cycle - through register aliasing

Possible cycle: [figure]

Possible instance: AAA

SLIDE 9

Possible explicit self cycle

Possible cycle: [figure]

Possible instance: AND32ri EAX, EAX, 1
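The self cycles on the last three slides can be sketched as a small check: an instruction can depend on itself when one of the registers it writes overlaps one of the registers it reads. The Python below is a sketch under stated assumptions. The overlap table is a hand-written toy, and the operand lists are illustrative rather than taken from LLVM's instruction descriptions.

```python
# Toy overlap table (an assumption, not LLVM's register info):
# each register maps to every register that shares bits with it.
OVERLAPS = {
    "AL": {"AL", "AX", "EAX"},
    "AH": {"AH", "AX", "EAX"},
    "AX": {"AL", "AH", "AX", "EAX"},
    "EAX": {"AL", "AH", "AX", "EAX"},
    "EBX": {"EBX"},
}


def self_cycle_pairs(defs, uses):
    """Return (written, read) register pairs through which an instruction
    can depend on the previous copy of itself."""
    return [(d, u) for d in defs for u in uses if u in OVERLAPS.get(d, {d})]


# Implicit operands that alias: writes AX, reads AL (an AAA-like case).
print(self_cycle_pairs(defs=["AX"], uses=["AL"]))

# Explicit operands chosen to be the same register: AND32ri EAX, EAX, 1.
print(self_cycle_pairs(defs=["EAX"], uses=["EAX"]))

# Different registers: no self cycle, the copies stay independent.
print(self_cycle_pairs(defs=["EAX"], uses=["EBX"]))
```

Implicit operands are fixed by the instruction, so the cycle either exists or it does not. Explicit operands can be assigned freely, which is why the explicit case is only a "possible" cycle until the registers are picked to overlap.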

SLIDE 10

Possible cycle through another instruction

Possible cycle: [figure]

Possible instance:

    MMX_PMOVMSKBrr R10D, MM1
    MMX_MOVD64rr MM1, R10D
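When an instruction's output can never feed its own input, a second instruction has to close the loop. A minimal Python sketch of that search follows, assuming each instruction is reduced to one source and one destination register class. The `Instr` type and the class labels are hypothetical simplifications for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    reads: str    # register class of the source operand
    writes: str   # register class of the destination operand


def closing_instructions(target, candidates):
    """Candidates that consume what `target` produces and produce what
    `target` consumes, so the pair forms a dependency cycle."""
    return [
        c.name
        for c in candidates
        if c.reads == target.writes and c.writes == target.reads
    ]


# Register classes here are illustrative labels, not a full description.
pmovmskb = Instr("MMX_PMOVMSKBrr", reads="MMX", writes="GPR32")
candidates = [
    Instr("MMX_MOVD64rr", reads="GPR32", writes="MMX"),
    Instr("AND32ri", reads="GPR32", writes="GPR32"),
]
print(closing_instructions(pmovmskb, candidates))
```

A chain built this way times the round trip through both instructions, so the figure obtained covers the pair rather than the target instruction alone.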

SLIDE 11

Possible cycle through another instruction

Possible cycle: [figure]

Possible instance:

    VCMPPSZ256rri K5, YMM31, YMM31, 1
    VFMSUBADD213PDZrk ZMM31, ZMM25, K5, ZMM29, ZMM9

Keep in mind: This process is fully automated

SLIDE 12

Results

> llvm-exegesis -opcode-name IMUL16rri8 -benchmark-mode latency

    asm_template:
      name: latency IMUL16rri8
    cpu_name: sandybridge
    llvm_triple: x86_64-grtev4-linux-gnu
    num_repetitions: 10000
    measurements:
      - { key: latency, value: 4.0115, debug_string: '' }
    error: ''
    ...

  • Identified discrepancies between TD files and measurements
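One way such a discrepancy check could look, as a hedged Python sketch: extract the measured latency from the report text and compare it with the latency a scheduling model declares. The modeled value of 3 cycles and the tolerance are made-up inputs for illustration, and the regular expression only handles the report layout shown above.

```python
import re

REPORT = """\
cpu_name: sandybridge
num_repetitions: 10000
measurements:
  - { key: latency, value: 4.0115, debug_string: '' }
error: ''
"""


def measured_latency(report):
    """Extract the latency value from a report laid out like the one above."""
    match = re.search(r"key:\s*latency,\s*value:\s*([0-9.]+)", report)
    if match is None:
        raise ValueError("no latency measurement found")
    return float(match.group(1))


def disagrees(measured, modeled_cycles, tolerance=0.5):
    """True when the measurement is further from the modeled latency
    than measurement noise (the assumed tolerance) would explain."""
    return abs(measured - modeled_cycles) > tolerance


latency = measured_latency(REPORT)
# Hypothetical modeled latency, standing in for a value read from a TD file.
print(latency, disagrees(latency, modeled_cycles=3))
```

A tolerance is needed because measured values are not exact integers: the report above gives 4.0115 rather than 4.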

SLIDE 13

What's next?

  • Extend to memory operands
  • Automate fixing of TD files
  • Measure the effect of
      ○ immediate: ±0, 1, ~1, 2^8, 2^16, 2^32, 2^64, ±∞, NaN, denorm
      ○ register values: SUB EAX, EAX, EAX vs SUB EAX, EAX, EBX

  • Make it work on other CPUs (ARM under way, Power?)

SLIDE 14

Try It Out!

https://llvm.org/docs/CommandGuide/llvm-exegesis.html
