BLASFEO
Gianluca Frison
University of Freiburg
BLIS retreat September 19, 2017
Gianluca Frison BLASFEO
BLASFEO Gianluca Frison University of Freiburg BLIS retreat - - PowerPoint PPT Presentation
BLASFEO Gianluca Frison University of Freiburg BLIS retreat September 19, 2017 Gianluca Frison BLASFEO BLASFEO Basic Linear Algebra Subroutines For Embedded Optimization Gianluca Frison BLASFEO BLASFEO performance dgemm_nt 50 40
Gianluca Frison BLASFEO
Gianluca Frison BLASFEO
◮ BLASFEO HP ◮ OpenBLAS 0.2.19 ◮ MKL 2017.2.174 ◮ ATLAS 3.10.3 ◮ BLIS 0.1.6
10 20 30 40 50 50 100 150 200 250 300 Gflops matrix size n dgemm_nt BLASFEO_HP OpenBLAS MKL ATLAS BLIS 10 20 30 40 50 50 100 150 200 250 300 Gflops matrix size n dpotrf_l BLASFEO_HP OpenBLAS MKL ATLAS BLIS
Gianluca Frison BLASFEO
Gianluca Frison BLASFEO
Gianluca Frison BLASFEO
Gianluca Frison BLASFEO
Gianluca Frison BLASFEO
Gianluca Frison BLASFEO
2
5 10 15 20 25 50 100 150 200 250 300 Gflops matrix size n test dsyrk + dpotrf 2 4 6 8 10 12 14 5 10 15 20 Gflops matrix size n test dsyrk + dpotrf
Gianluca Frison BLASFEO
5 10 15 20 25 50 100 150 200 250 300 Gflops matrix size n test dsyrk + dpotrf 2 4 6 8 10 12 14 5 10 15 20 Gflops matrix size n test dsyrk + dpotrf
Gianluca Frison BLASFEO
5 10 15 20 25 50 100 150 200 250 300 Gflops matrix size n test dsyrk + dpotrf 2 4 6 8 10 12 14 5 10 15 20 Gflops matrix size n test dsyrk + dpotrf
Gianluca Frison BLASFEO
5 10 15 20 25 50 100 150 200 250 300 Gflops matrix size n test dsyrk + dpotrf 2 4 6 8 10 12 14 5 10 15 20 Gflops matrix size n test dsyrk + dpotrf
Gianluca Frison BLASFEO
5 10 15 20 25 50 100 150 200 250 300 Gflops matrix size n test dsyrk + dpotrf 2 4 6 8 10 12 14 5 10 15 20 Gflops matrix size n test dsyrk + dpotrf
Gianluca Frison BLASFEO
5 10 15 20 25 50 100 150 200 250 300 Gflops matrix size n test dsyrk + dpotrf 2 4 6 8 10 12 14 5 10 15 20 Gflops matrix size n test dsyrk + dpotrf
Gianluca Frison BLASFEO
Gianluca Frison BLASFEO
Gianluca Frison BLASFEO
linear algebra kernels linear algebra kernels
based solver HPMPC based solver linear algebra routines linear algebra routines Riccati solver for unconstrained MPC IPM solver for linear MPC IPM solver for linear MPC Riccati solver for unconstrained MPC high-level wrapper packing of matrices Part I Part II Part III
Gianluca Frison BLASFEO
5 10 15 20 25 50 100 150 200 250 300 Gflops matrix size n test dsyrk + dpotrf 2 4 6 8 10 12 14 5 10 15 20 Gflops matrix size n test dsyrk + dpotrf
Gianluca Frison BLASFEO
L =
2 =
L00 ∗ ∗ L10 L11 ∗ L20 L21 L22 = Q00 ∗ ∗ Q10 Q11 ∗ Q20 Q21 Q22 + A0 A1 A2 ·
AT
1
AT
2
1/ 2
= (Q00 + A0 · AT
0 )1/ 2
∗ ∗ (Q10 + A1 · AT
0 )L−T 00
(Q11 + A1 · AT
1 − L10 · LT 10)1/ 2
∗ (Q20 + A2 · AT
0 )L−T 00
(Q21 + A2 · AT
1 − L20 · LT 10)L−T 11
(Q22 + A2 · AT
2 − L20 · LT 20 − L21 · LT 21)1/ 2
◮ reduce number of function calls ◮ reduce number of load and store of the same data Gianluca Frison BLASFEO
Gianluca Frison BLASFEO
◮ LA kernels (register-blocking, SIMD) ◮ optimized data layout
◮ cache-blocking ◮ on-line data packing ◮ multi-thread (at least for now)
◮ optimized (panel-major) matrix format ◮ off-line data packing ◮ assembly subroutines: tailored LA kernels & code reuse Gianluca Frison BLASFEO
◮ level 1: O(n) storage, O(n) flops ◮ level 2: O(n2) storage, O(n2) flops ◮ level 3: O(n2) storage, O(n3) flops
◮ reuse factor O(1) ◮ memory-bounded
◮ reuse factor O(n) ◮ compute-bounded for large n ◮ disregard O(n2) terms ◮ typically memory-bounded for small n ◮ minimize O(n2) terms ⇒ BLASFEO playground! Gianluca Frison BLASFEO
Gianluca Frison BLASFEO
Gianluca Frison BLASFEO
Gianluca Frison BLASFEO
◮ extra linear storage for inverse of diagonal in factorizations Gianluca Frison BLASFEO
10 20 30 40 50 50 100 150 200 250 300 Gflops matrix size n dgemm_nt HP RF OB MKL Eig 10 20 30 40 50 50 100 150 200 250 300 Gflops matrix size n dgemm_nn HP RF OB MKL Eig 20 40 60 80 100 50 100 150 200 250 300 Gflops matrix size n sgemm_nt HP RF OB MKL Eig 20 40 60 80 100 50 100 150 200 250 300 Gflops matrix size n sgemm_nn HP RF OB MKL Eig
Gianluca Frison BLASFEO
10 20 30 40 50 50 100 150 200 250 300 Gflops matrix size n dpotrf_l HP RF OB MKL Eig 10 20 30 40 50 50 100 150 200 250 300 Gflops matrix size n dgetrf HP RF OB MKL 20 40 60 80 100 50 100 150 200 250 300 Gflops matrix size n spotrf_l HP RF OB MKL Eig 10 20 30 40 50 50 100 150 200 250 300 Gflops matrix size n dgelqf HP RF OB MKL
Gianluca Frison BLASFEO
0.5 1 1.5 2 2.5 3 3.5 4 4.5 50 100 150 200 250 300 Gflops matrix size n dgemm_nt HP RF OB Eig 2 4 6 8 10 12 14 16 18 50 100 150 200 250 300 Gflops matrix size n sgemm_nt HP RF OB Eig 1 2 3 4 5 6 7 8 50 100 150 200 250 300 Gflops matrix size n dgemm_nt HP RF OB 2 4 6 8 10 12 14 16 50 100 150 200 250 300 Gflops matrix size n sgemm_nt HP RF OB
Gianluca Frison BLASFEO
50 100 150 200 500 1000 1500 2000 2500 3000 3500 4000 Gflops matrix size n dpotrf_l MKL PL_64+MKL PL_64+HP 50 100 150 200 500 1000 1500 2000 2500 3000 3500 4000 Gflops matrix size n dpotrf_l MKL PL_64+MKL PL_64+HP 50 100 150 200 500 1000 1500 2000 2500 3000 3500 4000 Gflops matrix size n dpotrf_l MKL PL_128+MKL PL_128+HP 50 100 150 200 500 1000 1500 2000 2500 3000 3500 4000 Gflops matrix size n dpotrf_l MKL PL_128+MKL PL_128+HP
Gianluca Frison BLASFEO
Gianluca Frison BLASFEO