Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks



SLIDE 1

Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks

Charith Mendis Alex Renda Saman Amarasinghe Michael Carbin

SLIDE 2

Compilers need to search through code sequences

High-level code → Optimizing Compiler →

lea r14, [rbx-0x40]
lea rdx, [rbp+0x38]
cmp rdi, rax
…

SLIDE 3

Compilers need to search through code sequences

High-level code → Optimizing Compiler →

lea r14, [rbx-0x40]
lea rdx, [rbp+0x38]
cmp rdi, rax
…

How many cycles does it take to run? 40 cycles

Basic Block Throughput

SLIDE 4

Compilers need to search through code sequences

High-level code → Optimizing Compiler → Code 1, Code 2, …, Code n

Code 1 (40 cycles):
lea r14, [rbx-0x40]
lea rdx, [rbp+0x38]
cmp rdi, rax
…

Code 2 (44 cycles):
lea r14, [rbx-0x40]
sub rbp, 0x60
cmp rdi, rax
…

Code n (36 cycles):
lea r14, [rbx-0x40]
mov rbp, rbx
cmp rdi, rax
…

SLIDE 5

Compilers need to search through code sequences

High-level code → Optimizing Compiler → Code 1 (40 cycles), Code 2 (44 cycles), …, Code n (36 cycles)

Ground truth: slow

SLIDE 6

Compilers need to search through code sequences

High-level code → Optimizing Compiler → Code 1 (40 cycles), Code 2 (44 cycles), …, Code n (36 cycles)

Analytical model: fast
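The search described on these slides can be sketched as follows: the compiler scores each candidate sequence with a cost model and keeps the cheapest one. This is only an illustration, not code from Ithemal or from any compiler. `pick_fastest` and the lookup-table cost model are hypothetical names, and the cycle counts simply replay the slide's numbers (the pairing of 44 and 36 cycles with Code 2 and Code n is read off the slide order).

```python
from typing import Callable, Dict, Sequence, Tuple

BasicBlock = Tuple[str, ...]


def pick_fastest(candidates: Sequence[BasicBlock],
                 cost_model: Callable[[BasicBlock], float]) -> BasicBlock:
    """Return the candidate with the lowest predicted cycle count."""
    return min(candidates, key=cost_model)


code_1 = ("lea r14, [rbx-0x40]", "lea rdx, [rbp+0x38]", "cmp rdi, rax")
code_2 = ("lea r14, [rbx-0x40]", "sub rbp, 0x60", "cmp rdi, rax")
code_n = ("lea r14, [rbx-0x40]", "mov rbp, rbx", "cmp rdi, rax")

# Stand-in cost model: a lookup table holding the cycle counts from the slide.
# A real cost model would be hardware measurement, an analytical model,
# or a learned predictor.
CYCLES: Dict[BasicBlock, float] = {code_1: 40.0, code_2: 44.0, code_n: 36.0}

best = pick_fastest([code_1, code_2, code_n], CYCLES.__getitem__)
print("\n".join(best))
```

Because the cost model is called once per candidate, its speed bounds how many candidates the compiler can afford to consider, which is why the slides contrast slow ground-truth measurement with fast models.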

SLIDE 7

Analytical models are inaccurate

  • out-of-order
  • pipelined
  • super-scalar
  • bypassed
  • stateful components
  • complicated and inaccurate manuals
  • opaque implementations (vendor specific)

Analytical Model: ~20% error

SLIDE 8

Analytical models are inaccurate

Analytical Model

The prediction problem is highly non-linear

SLIDE 9

Motivating Example - Zero Idioms

vxorps xmm1, xmm2, xmm3
Throughput: 1 clock cycle (Intel Architecture Optimization Reference Manual, page 662 of 672)

vxorps xmm0, xmm0, xmm0
Special case (zero idiom). Throughput: 0.33 clock cycles (Intel Architecture Optimization Reference Manual, page 51 of 672)

SLIDE 10

Motivating Example - Zero Idioms

IACA

  • Intel Architecture Code Analyzer
  • Developed in-house at Intel

llvm-mca

  • Part of LLVM compiler infrastructure
  • Uses industry standard compiler (LLVM) scheduling models
  • e.g., more than 230 commits spread over 2 years for the x86 Haswell scheduling model

vxorps xmm0, xmm0, xmm0 (100 iterations)

Method      Estimate (cycles)
Measured    32
llvm-mca    100
IACA        24
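As a quick arithmetic check, the 100-iteration totals above can be converted to cycles per iteration and set beside the two figures quoted from the Intel manual (1 clock cycle in general, 0.33 clock cycles for the zero-idiom special case). The variable names are illustrative, and the totals are read here as cycle counts.

```python
ITERATIONS = 100

# Totals for 100 iterations of `vxorps xmm0, xmm0, xmm0`, as listed on the slide.
totals = {"Measured": 32, "llvm-mca": 100, "IACA": 24}

# Divide each total by the iteration count to get cycles per iteration.
per_iteration = {name: cycles / ITERATIONS for name, cycles in totals.items()}

for name, cycles in per_iteration.items():
    print(f"{name:9s} {cycles:.2f} cycles per iteration")
```

Read this way, the llvm-mca estimate corresponds to the general 1-cycle figure, while the measurement sits close to the 0.33-cycle special case.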

SLIDE 11

Analytical models are not portable

Analytical Model 1, Analytical Model 2, Analytical Model 3

SLIDE 12

Basic Block Throughput Estimation

mov ebx, [ecx]
add ebx, ecx

Approach                               Speed   Portability                          Accuracy
Execute in a microprocessor            Slow                                         Ground truth
Hand-written tools (llvm-mca, IACA)    Fast    Not portable; manual effort needed   ~20% error
Data-driven prediction (“Ithemal”)     Fast    Portable; only need to retrain       ~8% error

SLIDE 13

Motivating Example - Zero Idioms

IACA

  • Intel Architecture Code Analyzer
  • Developed in-house at Intel

llvm-mca

  • Part of LLVM compiler infrastructure
  • Uses industry standard compiler (LLVM) scheduling models
  • e.g., more than 230 commits spread over 2 years for the x86 Haswell scheduling model

Ithemal

  • Data-driven model

vxorps xmm0, xmm0, xmm0 (100 iterations)

Method      Estimate (cycles)
Measured    32
llvm-mca    100
IACA        24
Ithemal     35
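To make the comparison concrete, the relative error of each estimate against the measured 32 can be computed directly. The error definition used here, |estimate - measured| / measured, is an assumption chosen for illustration; the slides do not state which formula underlies their error figures.

```python
MEASURED = 32  # measured value for 100 iterations, from the slide

# Tool estimates for the same block, from the slide.
estimates = {"llvm-mca": 100, "IACA": 24, "Ithemal": 35}


def relative_error(estimate: float, measured: float) -> float:
    """Absolute deviation from the measurement, as a fraction of the measurement."""
    return abs(estimate - measured) / measured


errors = {tool: relative_error(value, MEASURED) for tool, value in estimates.items()}

# Print the tools from most to least accurate on this one block.
for tool, err in sorted(errors.items(), key=lambda item: item[1]):
    print(f"{tool:9s} {err:.1%}")
```

On this single block that works out to roughly 9% for Ithemal, 25% for IACA and over 200% for llvm-mca. One block is an anecdote, not a benchmark.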

SLIDE 14

Ithemal model architecture

Hierarchical Embeddings: token embeddings

Input:
mov ecx, 0x02
add ebx, ecx

Canonicalization:
(mov <S> CONST <D> ecx <E>)
(add <S> ebx ecx <D> ebx <E>)

Token Layer (Token Embedding Lookup Table):
Vmov V<S> VCONST V<D> Vecx V<E>
Vadd V<S> Vebx Vecx V<D> Vebx V<E>
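A minimal sketch of the canonicalization and lookup steps for the two instructions on this slide is given below. It is not Ithemal's implementation: the `OPERAND_ROLES` table, the `canonicalize` function and the vocabulary builder are hypothetical, cover only `mov` and `add` with register or immediate operands, and ignore memory operands entirely.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical operand roles: for each opcode, the operand positions that are
# read (sources) and written (destinations). Only the two opcodes on the slide.
OPERAND_ROLES: Dict[str, Tuple[Tuple[int, ...], Tuple[int, ...]]] = {
    "mov": ((1,), (0,)),    # mov dst, src   reads src, writes dst
    "add": ((0, 1), (0,)),  # add dst, src   reads both, writes dst
}

CONST_RE = re.compile(r"^(0x[0-9a-fA-F]+|\d+)$")


def canonicalize(instruction: str) -> List[str]:
    """Turn 'mov ecx, 0x02' into ['mov', '<S>', 'CONST', '<D>', 'ecx', '<E>']."""
    opcode, _, rest = instruction.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    # Every immediate value collapses into the single token CONST.
    operands = ["CONST" if CONST_RE.match(op) else op for op in operands]
    sources, dests = OPERAND_ROLES[opcode]
    return ([opcode, "<S>"] + [operands[i] for i in sources]
            + ["<D>"] + [operands[i] for i in dests] + ["<E>"])


def build_vocab(token_lists: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct token a row index into an embedding lookup table."""
    vocab: Dict[str, int] = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


block = ["mov ecx, 0x02", "add ebx, ecx"]
tokens = [canonicalize(instruction) for instruction in block]
vocab = build_vocab(tokens)
ids = [[vocab[t] for t in instruction_tokens] for instruction_tokens in tokens]

for instruction_tokens in tokens:
    print("(" + " ".join(instruction_tokens) + ")")
```

Each index in `ids` would select one row of the embedding table, giving the vectors written as Vmov, V<S>, VCONST and so on.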

SLIDE 15

Ithemal model architecture

Hierarchical Embeddings: instruction embeddings

Input:
mov ecx, 0x02
add ebx, ecx

Canonicalization:
(mov <S> CONST <D> ecx <E>)
(add <S> ebx ecx <D> ebx <E>)

Token Layer (Token Embedding Lookup Table):
Vmov V<S> VCONST V<D> Vecx V<E>
Vadd V<S> Vebx Vecx V<D> Vebx V<E>

Instruction Layer:
h∅ → LSTM chain over the token embeddings of the mov instruction → hmov
h∅ → LSTM chain over the token embeddings of the add instruction → hadd

SLIDE 16

Ithemal model architecture

Hierarchical Embeddings

Input:
mov ecx, 0x02
add ebx, ecx

Canonicalization:
(mov <S> CONST <D> ecx <E>)
(add <S> ebx ecx <D> ebx <E>)

Token Layer (Token Embedding Lookup Table):
Vmov V<S> VCONST V<D> Vecx V<E>
Vadd V<S> Vebx Vecx V<D> Vebx V<E>

Instruction Layer:
h∅ → LSTM chain over the token embeddings of the mov instruction → hmov
h∅ → LSTM chain over the token embeddings of the add instruction → hadd

Prediction Layer:
h∅ → LSTM chain over hmov, hadd → hblock
hblock → × → 87.35 (Throughput Prediction)
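The data flow on this slide (token embeddings, then one recurrent pass per instruction, then one recurrent pass over the instruction embeddings, then a scalar output) can be sketched in plain Python. Several things here are assumptions made to keep the example dependency-free: a plain tanh recurrent cell stands in for the LSTM cells, the "×" is read as a dot product with an output weight vector, the weights are random and untrained, and every class and function name is hypothetical. The printed number is therefore meaningless as a throughput estimate.

```python
import math
import random
from typing import Dict, List

DIM = 4  # embedding and hidden size, chosen arbitrarily for this sketch
rng = random.Random(0)


def rand_vec(n: int) -> List[float]:
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]


class TanhCell:
    """Plain tanh recurrent cell, a simplified stand-in for an LSTM cell."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.w_in = [rand_vec(dim) for _ in range(dim)]  # input weights
        self.w_h = [rand_vec(dim) for _ in range(dim)]   # recurrent weights

    def step(self, x: List[float], h: List[float]) -> List[float]:
        return [math.tanh(sum(w * v for w, v in zip(row_in, x))
                          + sum(w * v for w, v in zip(row_h, h)))
                for row_in, row_h in zip(self.w_in, self.w_h)]

    def run(self, inputs: List[List[float]]) -> List[float]:
        h = [0.0] * self.dim  # zero initial state (the slide's h∅)
        for x in inputs:
            h = self.step(x, h)
        return h              # final state summarizes the whole sequence


class HierarchicalModel:
    def __init__(self, vocab: List[str]) -> None:
        # Token layer: embedding lookup table, one vector per token.
        self.embedding: Dict[str, List[float]] = {t: rand_vec(DIM) for t in vocab}
        self.token_rnn = TanhCell(DIM)  # tokens -> instruction embedding
        self.instr_rnn = TanhCell(DIM)  # instruction embeddings -> block embedding
        self.w_out = rand_vec(DIM)      # assumed linear read-out for "×"

    def predict(self, block: List[List[str]]) -> float:
        h_instrs = [self.token_rnn.run([self.embedding[t] for t in instr])
                    for instr in block]
        h_block = self.instr_rnn.run(h_instrs)
        return sum(w * h for w, h in zip(self.w_out, h_block))


# Canonicalized tokens for the two instructions shown on the slide.
block = [
    ["mov", "<S>", "CONST", "<D>", "ecx", "<E>"],
    ["add", "<S>", "ebx", "ecx", "<D>", "ebx", "<E>"],
]
vocab = sorted({token for instr in block for token in instr})
model = HierarchicalModel(vocab)
prediction = model.predict(block)
print(prediction)
```

The same `token_rnn` weights are reused for every instruction, so the model accepts blocks with any number of instructions and instructions with any number of tokens.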

SLIDE 17

Ithemal halves the error rate

[Bar chart: Average Prediction Error (y-axis ticks 0.075, 0.15, 0.225, 0.3) for llvm-mca, IACA and Ithemal on Ivy Bridge; bar labels 0.089 and 0.181; one bar marked N/A]

SLIDE 18

Ithemal halves the error rate across multiple microarchitectures

[Bar chart: Average Prediction Error (y-axis ticks 0.075, 0.15, 0.225, 0.3) for llvm-mca, IACA and Ithemal on Ivy Bridge, Haswell and Skylake; bar labels 0.079, 0.089, 0.089, 0.167, 0.209, 0.239, 0.2, 0.181; one bar marked N/A]

SLIDE 19

Conclusion and Future Work

  • Potential to replace or augment traditional systems with data-driven counterparts
  • Can Ithemal be made more robust?
  • Continuous improvement in compiler optimization: cost-model-guided learnt optimizations

[Diagram: the Compiler suggests a transformation to the Cost Model; the Cost Model returns feedback to the Compiler]

SLIDE 20

Download and use!

  • Dataset: over 1 million timed basic blocks
  • Code: https://github.com/psg-mit/Ithemal
  • Try live demo: http://3.18.198.23/predict

Come visit our poster

Today (Tuesday, Jun 11th) from 06:30 to 09:00 PM at Pacific Ballroom #241