
Ithemal: Accurate, Portable and Fast Basic Block Throughput - PowerPoint PPT Presentation



  1. Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks. Charith Mendis, Alex Renda, Saman Amarasinghe, Michael Carbin

  2. Compilers need to search through code sequences. An optimizing compiler turns high-level code into basic blocks such as: lea r14, [rbx-0x40]; …; lea rdx, [rbp+0x38]; cmp rdi, rax

  3. Compilers need to search through code sequences. For a basic block produced by the optimizing compiler (lea r14, [rbx-0x40]; …; lea rdx, [rbp+0x38]; cmp rdi, rax), how many cycles does it take to run? That number is the basic block throughput: here, 40 cycles.

  4. Compilers need to search through code sequences. The optimizing compiler can emit many candidate basic blocks (Code 1, Code 2, …, Code n). Each begins with lea r14, [rbx-0x40] and ends with cmp rdi, rax, but they differ in between (e.g., lea rdx, [rbp+0x38], sub rbp, 0x60, or mov rbp, rbx). The candidates take 44, 40, and 36 cycles.

  5. Compilers need to search through code sequences (same candidates as slide 4). Measuring the ground-truth throughput of every candidate is slow.

  6. Compilers need to search through code sequences (same candidates as slide 4). An analytical model estimates each candidate's throughput quickly.
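Whether the cost comes from slow ground-truth measurement or a fast analytical model, the compiler's search only uses it to rank candidates. The sketch below assumes a cost model is any function from a candidate block to predicted cycles; `pick_fastest`, the variant names, and the lookup-table "model" are hypothetical illustrations, and the table does not reproduce the slide's pairing of code to cycle counts.

```python
from typing import Callable, Sequence, TypeVar

Block = TypeVar("Block")

def pick_fastest(candidates: Sequence[Block], cost_model: Callable[[Block], float]) -> Block:
    """Return the candidate the cost model predicts takes the fewest cycles."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return min(candidates, key=cost_model)

# Hypothetical stand-in for a cost model: a lookup table of predicted cycles.
# The names are placeholders; only the cycle values come from the slide.
predicted_cycles = {"variant_a": 44.0, "variant_b": 40.0, "variant_c": 36.0}

best = pick_fastest(list(predicted_cycles), predicted_cycles.__getitem__)
print(best)  # variant_c
```

The search loop is identical for either kind of cost model; what changes is how long each `cost_model` call takes and how often its ranking matches the hardware.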

  7. Analytical models are inaccurate: an analytical model only approximates (≈) the processor, with roughly 20% error. Sources of difficulty:
     • out-of-order execution
     • pipelining
     • super-scalar issue
     • bypassing
     • stateful components
     • complicated and inaccurate manuals
     • opaque implementations (vendor specific)

  8. Analytical models are inaccurate: an analytical model only approximates (≈) the processor, and the prediction problem is highly non-linear.

  9. Motivating Example - Zero Idioms
     • vxorps xmm1, xmm2, xmm3: throughput 1 clock cycle (Intel Architecture Optimization Reference Manual, p. 662 of 672)
     • vxorps xmm0, xmm0, xmm0: special case, throughput 0.33 clock cycles (Intel Architecture Optimization Reference Manual, p. 51 of 672)
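To see why such special cases burden an analytical model, here is a toy rule that hand-encodes the zero idiom using the two throughputs quoted on the slide. It assumes the idiom is recognized whenever both source operands of a register-form vxorps name the same register; the function name and the parsing are hypothetical and cover only this one instruction.

```python
import re

# Reciprocal throughputs (cycles per instruction) quoted on the slide.
VXORPS_THROUGHPUT = 1.0       # general case, e.g. vxorps xmm1, xmm2, xmm3
ZERO_IDIOM_THROUGHPUT = 0.33  # special case, e.g. vxorps xmm0, xmm0, xmm0

def vxorps_throughput(instr: str) -> float:
    """Toy analytical rule: identical source registers are treated as a zero idiom (assumption)."""
    m = re.fullmatch(r"\s*vxorps\s+(\w+)\s*,\s*(\w+)\s*,\s*(\w+)\s*", instr)
    if m is None:
        raise ValueError(f"not a register-form vxorps: {instr!r}")
    _dst, src1, src2 = m.groups()
    return ZERO_IDIOM_THROUGHPUT if src1 == src2 else VXORPS_THROUGHPUT

print(vxorps_throughput("vxorps xmm1, xmm2, xmm3"))
print(vxorps_throughput("vxorps xmm0, xmm0, xmm0"))
```

Every such exception has to be found in the manual and written by hand, and a model that misses one reports the general-case cost instead.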

  10. Motivating Example - Zero Idioms: vxorps xmm0, xmm0, xmm0, run for 100 iterations
     • llvm-mca: part of the LLVM compiler infrastructure; uses industry-standard compiler (LLVM) scheduling models (e.g., more than 230 commits spread over 2 years for the x86 Haswell scheduling model)
     • IACA: Intel Architecture Code Analyzer, developed in-house at Intel
     Cycles for 100 iterations:
       Measured   32
       llvm-mca   100
       IACA       24
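The numbers on this slide can be put on a per-iteration scale and compared against the measurement. The sketch below assumes relative error is |predicted - measured| / measured, a common choice that the slide itself does not define.

```python
ITERATIONS = 100
MEASURED_CYCLES = 32  # measured total for 100 iterations (from the slide)
ESTIMATES = {"llvm-mca": 100, "IACA": 24}  # estimated totals (from the slide)

def relative_error(predicted: float, measured: float) -> float:
    """Relative error of a prediction against a measurement (assumed definition)."""
    return abs(predicted - measured) / measured

print(f"measured: {MEASURED_CYCLES / ITERATIONS:.2f} cycles/iter")  # measured: 0.32 cycles/iter
for tool, cycles in ESTIMATES.items():
    per_iter = cycles / ITERATIONS
    err = relative_error(cycles, MEASURED_CYCLES)
    print(f"{tool}: {per_iter:.2f} cycles/iter, error {err:.1%}")
# llvm-mca: 1.00 cycles/iter, error 212.5%
# IACA: 0.24 cycles/iter, error 25.0%
```

The measured 0.32 cycles per iteration is close to the manual's 0.33-cycle special case, while llvm-mca's 1.00 matches the general 1-cycle vxorps throughput, which suggests it did not apply the zero-idiom case here.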
