Recent advances in the HPMPC and BLASFEO software packages
Gianluca Frison
Syscop group retreat
6 September 2016
Gianluca Frison Recent advances in the HPMPC and BLASFEO software packages
HPMPC library for High-Performance implementation
◮ HPMPC: optimization algorithms for MPC
◮ BLASFEO: linear algebra for embedded optimization
◮ invisible to the user: only one new argument, Np
◮ general constraints still to be done
◮ at the moment, the partial condensing happens in the feedback phase
◮ needs extensive testing and debugging
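To make the role of Np concrete, here is a minimal C sketch assuming that Np is the horizon length of the partially condensed problem, i.e. the N original stages are grouped into Np blocks, each condensed into a single stage. The helper name partial_cond_blocks and the rule of splitting the stages as evenly as possible are assumptions for illustration, not the HPMPC interface.

```c
#include <assert.h>

/* Hypothetical helper (not the HPMPC API): split a horizon of N stages
 * into Np blocks of (almost) equal length. On return, block_len[k] holds
 * the number of original stages condensed into stage k of the partially
 * condensed problem. */
void partial_cond_blocks(int N, int Np, int *block_len)
{
    assert(N > 0 && Np > 0 && Np <= N);
    int base = N / Np;  /* minimum number of stages per block */
    int extra = N % Np; /* the first `extra` blocks get one stage more */
    for (int k = 0; k < Np; k++)
        block_len[k] = base + (k < extra ? 1 : 0);
}
```

For example, N = 10 and Np = 4 gives blocks of 3, 3, 2 and 2 stages; Np = N leaves the problem uncondensed, while Np = 1 amounts to full condensing.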
◮ focus on best possible performance for small matrices
◮ use panel-major matrix format
◮ main loop of each LA kernel is the gemm loop
◮ LA kernels written as C functions with intrinsics
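The gemm loop at the heart of each kernel can be sketched in plain C as below: a 4 × 4 block D = C + A * B^T is accumulated by one rank-1 update per column of the panels A and B. This is a scalar reference under stated assumptions (panel height 4, element (i, l) of a panel stored at p[l*4 + i]); the name kernel_dgemm_nt_4x4_ref is hypothetical, and the array acc only stands in for the accumulation registers of a real intrinsics kernel.

```c
#include <assert.h>

#define BS 4 /* panel height: assumed 4 doubles, i.e. one 256-bit register */

/* Scalar reference sketch of a 4x4 'nt' gemm kernel, D = C + A * B^T,
 * where A and B are 4 x kmax panels and C, D are 4 x 4 panels, all in
 * panel-major layout (element (i, l) of a panel stored at p[l*BS + i]).
 * Function name and signature are hypothetical. */
void kernel_dgemm_nt_4x4_ref(int kmax, const double *A, const double *B,
                             const double *C, double *D)
{
    assert(kmax >= 0);
    /* acc[j][i] holds entry (i, j); stands in for accumulation registers */
    double acc[BS][BS] = {{0.0}};

    /* the gemm loop: one rank-1 update per column l of A and B */
    for (int l = 0; l < kmax; l++)
        for (int j = 0; j < BS; j++)
            for (int i = 0; i < BS; i++)
                acc[j][i] += A[l * BS + i] * B[l * BS + j];

    /* add C and store the result in D (column j of D starts at D[j*BS]) */
    for (int j = 0; j < BS; j++)
        for (int i = 0; i < BS; i++)
            D[j * BS + i] = C[j * BS + i] + acc[j][i];
}
```

In a version written with AVX intrinsics, each column acc[j] would fit in one 256-bit register, and the innermost update would become a broadcast of B[l*4 + j] followed by a fused multiply-add (for instance _mm256_broadcast_sd and _mm256_fmadd_pd on Haswell).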
◮ trade-off between performance and code size
◮ focus on code reuse
◮ use panel-major matrix format
◮ LA kernels coded in assembly using custom function calling
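Both designs rely on the panel-major format. A minimal sketch of the indexing, assuming a panel height PS = 4 and no extra padding of the columns (the real libraries may pad further): rows are grouped into panels of PS rows, each panel is stored column by column with leading dimension PS, and the panels follow one another in memory. The helpers pm_size, pm_index and pm_pack are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define PS 4 /* panel size: rows per panel (assumed 4 doubles for AVX) */

/* Number of doubles needed to store an m x n matrix in panel-major format:
 * ceil(m / PS) panels, each holding n columns of PS doubles. */
size_t pm_size(int m, int n)
{
    assert(m >= 0 && n >= 0);
    size_t panels = (size_t)(m + PS - 1) / PS;
    return panels * PS * (size_t)n;
}

/* Offset of element (i, j) in a panel-major matrix with n columns: skip
 * whole panels, then whole columns inside the panel, then rows. */
size_t pm_index(int i, int j, int n)
{
    assert(i >= 0 && j >= 0 && j < n);
    return (size_t)(i / PS) * PS * n + (size_t)j * PS + (size_t)(i % PS);
}

/* Pack a column-major m x n matrix A (leading dimension lda) into pA,
 * which must hold pm_size(m, n) doubles; unused rows of the last panel
 * are zero-padded. */
void pm_pack(int m, int n, const double *A, int lda, double *pA)
{
    size_t total = pm_size(m, n);
    for (size_t t = 0; t < total; t++)
        pA[t] = 0.0;
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            pA[pm_index(i, j, n)] = A[i + (size_t)j * lda];
}
```

For a 6 × 3 matrix, two panels are needed (24 doubles, with the last two rows of the second panel as zero padding), and element (5, 2) lands at offset 1·4·3 + 2·4 + 1 = 21. Keeping each panel contiguous lets a kernel stream through a PS × k panel with unit stride.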
◮ first 6 arguments passed in GP registers (rdi, rsi, rdx, rcx, r8, r9)
◮ the other arguments passed on the stack, one every 64 bits
◮ GP registers rbx, rbp, r12, r13, r14, r15 have to be saved on the stack
◮ the other GP registers can be freely modified
◮ no arguments can be passed in the FP registers
◮ the upper 128 bits of the 256-bit FP registers must be set to zero
◮ large overhead (a lot of state to be saved on the stack)
◮ FP registers cannot be used to pass arguments
◮ no use of the stack
◮ content of GP registers rdi, rsi, rdx, rcx, r8, r9 is left untouched
◮ ints and pointers passed in GP registers r10, r11, r12, r13, r14, …
◮ first n = 4, 8 or 12 FP registers used as accumulation registers
◮ remaining (16 − n) FP registers used for local FP operations
◮ procedures have very small overhead (about the same as 2 …)
◮ a procedure implements an 'atomic' operation on the FP registers
◮ the same procedure is called by many LA kernels
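The register budget behind n = 4, 8 or 12 can be worked out as follows, assuming 16 FP registers of 256 bits, each holding four doubles (as with AVX on Sandy Bridge and Haswell); the block shapes are one possible arrangement, given as an example:

```latex
\begin{aligned}
\text{doubles per register} &= 256 / 64 = 4,\\
\text{accumulated entries} &= 4n = 16,\ 32,\ 48 \quad (n = 4,\ 8,\ 12),\\
&\quad \text{e.g. result blocks of size } 4\times 4,\ 8\times 4,\ 12\times 4,\\
\text{registers left for local operations} &= 16 - n = 12,\ 8,\ 4.
\end{aligned}
```

Larger n means more accumulated work per loaded operand, at the price of fewer registers left for loading and broadcasting the operands.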
◮ trade-off between code size and number of call and ret (and …)
◮ level 0: all procedures, no macros
◮ level 1: gemm procedure, all others macros
◮ level 2: no procedures, all macros
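The code-size versus call/ret trade-off can be mimicked in C as a hypothetical analogue of the assembly procedures: each 'atomic' step exists both as a called function and as a macro, and a compile-time MACRO_LEVEL picks which one a kernel uses, following the three levels above. All names are made up for illustration; note that a C compiler may still inline the functions, whereas in hand-written assembly a procedure is always an explicit call.

```c
#include <assert.h>

#ifndef MACRO_LEVEL
#define MACRO_LEVEL 1 /* level 1: gemm step is a procedure, the rest are macros */
#endif

/* Procedure versions: one copy of the code, one call/ret per use. */
void gemm_step_proc(double acc[4], const double a[4], double b)
{
    for (int i = 0; i < 4; i++)
        acc[i] += a[i] * b; /* rank-1 update of a 4 x 1 accumulator */
}

void scale_step_proc(double acc[4], double alpha)
{
    for (int i = 0; i < 4; i++)
        acc[i] *= alpha;
}

/* Macro versions: expanded in place, no call overhead, larger code. */
#define GEMM_STEP_MACRO(acc, a, b) \
    do { for (int i_ = 0; i_ < 4; i_++) (acc)[i_] += (a)[i_] * (b); } while (0)
#define SCALE_STEP_MACRO(acc, alpha) \
    do { for (int i_ = 0; i_ < 4; i_++) (acc)[i_] *= (alpha); } while (0)

#if MACRO_LEVEL >= 2
#define GEMM_STEP GEMM_STEP_MACRO /* level 2: no procedures */
#else
#define GEMM_STEP gemm_step_proc  /* levels 0 and 1: gemm is a procedure */
#endif

#if MACRO_LEVEL >= 1
#define SCALE_STEP SCALE_STEP_MACRO /* levels 1 and 2: other steps are macros */
#else
#define SCALE_STEP scale_step_proc  /* level 0: all procedures */
#endif

/* y = alpha * A * x for a 4 x k panel A stored column by column. */
void kernel_dgemv_4_sketch(int k, double alpha, const double *A,
                           const double *x, double y[4])
{
    assert(k >= 0);
    double acc[4] = {0.0, 0.0, 0.0, 0.0};
    for (int l = 0; l < k; l++)
        GEMM_STEP(acc, &A[4 * l], x[l]);
    SCALE_STEP(acc, alpha);
    for (int i = 0; i < 4; i++)
        y[i] = acc[i];
}
```

Compiling with -DMACRO_LEVEL=0, 1 or 2 changes only how the steps are emitted, not the result.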
◮ Intel Haswell 64-bit
◮ Intel Sandy-Bridge 64-bit
◮ Intel Core 64-bit
◮ AMD Bulldozer 64-bit
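A hypothetical sketch of how one of these targets could be picked at compile time from the instruction-set macros that GCC and Clang predefine (for instance under -march=native): Haswell is identified by AVX2 with FMA3, Bulldozer by AVX with FMA4, Sandy Bridge by AVX alone, and Intel Core by SSE3. The target strings and the doubles_per_register helper are illustrative and are not claimed to match the libraries' build options.

```c
#include <assert.h>
#include <string.h>

/* Select a target label from compiler-predefined ISA macros. The order
 * matters: AVX2+FMA3 before AVX+FMA4 before plain AVX before SSE3. */
const char *select_target(void)
{
#if defined(__AVX2__) && defined(__FMA__)
    return "X64_INTEL_HASWELL";      /* AVX2 + FMA3 */
#elif defined(__AVX__) && defined(__FMA4__)
    return "X64_AMD_BULLDOZER";      /* AVX + FMA4 */
#elif defined(__AVX__)
    return "X64_INTEL_SANDY_BRIDGE"; /* AVX, no FMA */
#elif defined(__SSE3__) && defined(__x86_64__)
    return "X64_INTEL_CORE";         /* 128-bit SSE3 */
#else
    return "GENERIC";                /* portable C fallback */
#endif
}

/* Doubles held by one FP register on each target (hypothetical helper). */
int doubles_per_register(const char *target)
{
    assert(target != NULL);
    if (strcmp(target, "X64_INTEL_CORE") == 0)
        return 2; /* 128-bit SSE registers */
    if (strncmp(target, "X64_", 4) == 0)
        return 4; /* 256-bit AVX registers on the other three targets */
    return 1;     /* scalar fallback */
}
```

The register width ties back to the panel-major format: a natural panel height is the number of doubles one FP register holds.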