A Parallel Method for the Computation of Matrix Exponential based on - - PowerPoint PPT Presentation

a parallel method for the computation of matrix
SMART_READER_LITE
LIVE PREVIEW

A Parallel Method for the Computation of Matrix Exponential based on - - PowerPoint PPT Presentation

A Parallel Method for the Computation of Matrix Exponential based on Truncated Neumann Series V. S. Dimitrov 12 , V. Ariyarathna 3 , D. F. G. Coelho 1 , L. Rakai 1 , A. Madanayake 3 , R. J. Cintra 4 1 ECE Department, University of Calgary, Canada 2


slide-1
SLIDE 1

A Parallel Method for the Computation of Matrix Exponential based on Truncated Neumann Series

  • V. S. Dimitrov12, V. Ariyarathna3, D. F. G. Coelho1, L. Rakai1,
  • A. Madanayake3, R. J. Cintra4

1ECE Department, University of Calgary, Canada 2Computer Modelling Group, Canada 3ECE Department, University of Akron, USA 4Statistics Department, Universidade Federal de Pernambuco, Brazil

July 20, 2017

Coelho and Dimitrov (UofC) July 20, 2017 1 / 21

slide-2
SLIDE 2

Introduction

Problems in many areas require the solution of sets of linear, constant coefficient differential equations in the form: ˙ x(t) = Ax(t) = ⇒ x(t) = exp(tA)x0 When multiple inputs are used for the same system, it might be advantageous compute the matrix exponential.

Coelho and Dimitrov (UofC) July 20, 2017 2 / 21

slide-3
SLIDE 3

Methods for Matrix Exponential

Series expansion:

◮ Taylor; ◮ Padé; ◮ Scaling & Squaring.

Newton Interpolation; Cayley-Hamilton method; Eigenvectors decomposition.

Coelho and Dimitrov (UofC) July 20, 2017 3 / 21

slide-4
SLIDE 4

Matrix Exponential Series Expansion

The problem can be treated as the evaluation of a polynomial. Existing methods:

◮ Horner rule; ◮ Estrin method; ◮ Binary tree. Coelho and Dimitrov (UofC) July 20, 2017 4 / 21

slide-5
SLIDE 5

Conventions

Let A be a square matrix of size n × n. Let pN(·) be a polynomial of degree N − 1 over the real numbers. Let also gN(·) be a geometric series with N terms.

Coelho and Dimitrov (UofC) July 20, 2017 5 / 21

slide-6
SLIDE 6

Definition

The critical path associated with the computation of a matrix polynomial pN(A) is the largest chain of matrix multiplications (MM) in

  • rder to evaluate pN(A).

Definition (Critical Path for Matrix Polynomial)

Horner rule: N − 1 MM; Estrin method: 2 log2(N − 1) MM; Binary tree: 2 log2(N − 1) MM.

Coelho and Dimitrov (UofC) July 20, 2017 6 / 21

slide-7
SLIDE 7

Geometric Series

Geometric series of matrix arguments can be computed efficiently with the use of different polynomial factorizations. gN(A) =

  • (I + A2) · gN/2(A2),

if N ≡ 0 mod 2 I + (A + A2) · g(N−1)/2(A2), if N ≡ 1 mod 2.

gN(A) =   

(I + A + A2) · gN/3(A3), if N ≡ 0 mod 3, I + (A + A2 + A3) · g(N−1)/3(A3), if N ≡ 1 mod 3, I + A + (A2 + A3 + A4) · g(N−2)/3(A3), if N ≡ 2 mod 3.

In general, the use of basis P demands P′ log2(N) − 2 < 2 log2(N) − 2.

Coelho and Dimitrov (UofC) July 20, 2017 7 / 21

slide-8
SLIDE 8

Geometric Series

In general, the use of basis P demands P′ log2(N) − 2 < 2 log2(N) − 2. Examples: Basis 2: 2 log2(N) − 2; Basis 3: 1.89 . . . log2(N) − 2; Basis 5: 1.72 . . . log2(N) − 2; Basis 6: 1.92 . . . log2(N) − 2; Basis 26: 1.70 . . . log2(N) − 2.

Coelho and Dimitrov (UofC) July 20, 2017 8 / 21

slide-9
SLIDE 9

The Matrix Exponential as Several Neumann Series

We write the matrix exponential truncated series expansion pN(A) as a linear combination of different geometric series

  • n αkA, k = 0, 1, . . . , N − 1:

pN(A) =

N−1

  • n=0

gn+1(αnA) =

N−1

  • n=0

n−1

  • k=0

αk

nAk

  • =

N−1

  • n=0

N−1

  • k=n

αn

k

  • · An.

Coelho and Dimitrov (UofC) July 20, 2017 9 / 21

slide-10
SLIDE 10

The Matrix Exponential as Several Neumann Series

If the coefficients of pN(·) are p0, p1, . . . , pN−1, we have he system α0 + α1 + α2 + . . . + αN−1 = p0 α2

1 + α2 2 + . . . + α2 N−1 = p1

α3

2 + . . . + a3 N−1 = p2

. . . αN

N−1 = pN−1.

This system has several complex solutions that can be found by back substitution.

Coelho and Dimitrov (UofC) July 20, 2017 10 / 21

slide-11
SLIDE 11

A Numerical Example

Small degree polynomials does not require complex solutions. Considering N = 4, we have α0 = 0.451801 α1 = 0.420627 α2 = 0.344837 α3 = −0.217308

Coelho and Dimitrov (UofC) July 20, 2017 11 / 21

slide-12
SLIDE 12

Another Numerical Example

Table: Calculated coefficients for N = 9.

Coefficient Value α8 0.265650069930254 α7 0.270164634258582 α6 0.294213807881822 α5 0.320211897615876 α4 0.339931939777992 α3 0.312847569943447 β1 −0.803019919407973 β0 −0.046083639081852

Coelho and Dimitrov (UofC) July 20, 2017 12 / 21

slide-13
SLIDE 13

A Different Approach

If we modify the formulation to pN(A) =

N−1

  • n=0

gN(αnA) =

N−1

  • n=0

N−1

  • k=0

αn

k

  • · An.

we obtain α0 + α1 + α2 + . . . + αN−1 = 1 α2

0 + α2 1 + α2 2 + . . . + α2 N−1 = 1

2 . . . αN−1 + αN−1

1

+ αN−1

2

+ . . . + αN−1

N−1 =

1 (N − 1)!

Coelho and Dimitrov (UofC) July 20, 2017 13 / 21

slide-14
SLIDE 14

Algorithmic Example for N = 9

Pre Computation: B = A2, C = B2, broadcast A, B, C, β0, and β1 Processor 0 computes H9(A) ← β0A + β1B − (N − 4)I Processor 1 computes g4(α3A) ← (I + α3A)(I + α2

3B)

Processor 2 computes g5(α4A) ← I + (α4A + α2

4B)(I + α2 4B)

Processor 3 computes g6(α5A) ← (I + α5A)(I + α2

5B + α4 5C)

Processor 4 computes g7(α6A) ← I + (α6A + α2

6B)(I + α2 6B + α4 6C)

Processor 5 computes g8(α7A) ← (I + α7A)(I + α2

7B)(I + α4 7C)

Processor 6 computes g9(α8A) ← I + (α8A + α2

8B)(I + α2 8B)(I + α4 8C)

Return E9(A) = 9

n=3 gn+1(αnA) + H9(A)

Figure: Fragment of the algorithm for computing E9(A).

Coelho and Dimitrov (UofC) July 20, 2017 14 / 21

slide-15
SLIDE 15

Computing Time Trade-Off in Software

10-10 10-5 100 1 2 ×10-4 m=10 10-10 10-5 100 Error 1 2 Time (s) ×10-3 m=100 1 2 3 4 5 6 7 8 9 N 10-10 10-5 100 0.2 0.4 m=1000 Error Time expm time

Figure: Illustration of the accuracy versus computing time trade-off for different values of N and m.

Coelho and Dimitrov (UofC) July 20, 2017 15 / 21

slide-16
SLIDE 16

Hardware Realization

4 4 4 4 4 4

block

4 4

Addition

4 4 2 P/S S/P

Re−arrange

a12 a22

t = 1 t = 0

a21 a11

G4(α3A) G6(α5A) G5(α4A) G7(α6A) G8(α7A) G9(α8A) A · A

A2 · A2

E9(A)

H9(A)

Figure: Top level view of the implementation of the proposed algorithm.

Coelho and Dimitrov (UofC) July 20, 2017 16 / 21

slide-17
SLIDE 17

Hardware Realization

D D D D

u11 u21 u22 v12 v22 v11

t = 1 t = 0

w12 w22 w21 w11 u12 v21

Figure: Multiplication block for 2×2 matrices.

Coelho and Dimitrov (UofC) July 20, 2017 17 / 21

slide-18
SLIDE 18

Hardware Realization Results: FPGA

Table: Timing and resource consumption comparison for Xilinx xc6vlx240t-ff1156 FPGA Figure of merit Horner’s Rule New Algorithm Latency 16 6 (clock cycles) Critical path 14.392 11.160 delay (ns) Slice LUTs used 24537 90993

  • No. of adders

48 118

  • No. of multipliers

32 112

Coelho and Dimitrov (UofC) July 20, 2017 18 / 21

slide-19
SLIDE 19

Hardware Realization Results: ASIC

Table: ASIC synthesis results

Figure of merit Horner’s New Percentage Rule Algorithm Change T (ns) 3.556 1.350 ↓ 62.04 % Occupied area 1.355 4.080 ↑ 201.10% (A, mm2) Dynamic power 3199.088 2403.661 ↓ 24.86% (mW/GHz) AT (mm2 · ns) 4.8175 5.5083 ↑ 14.34% AT2 (mm2 · ns2) 17.1313 7.4363 ↓ 56.59% Max frequency 0.2812 0.7407 ↑ 163.40% (GHz) Latency 16 6 ↓ 62.5 % (Clock cycles) Total gate count 61009 131874 ↑ 116.15%

Coelho and Dimitrov (UofC) July 20, 2017 19 / 21

slide-20
SLIDE 20

Final Comments

Advantages:

◮ the proposed method reduce critical path;

Disadvantages:

◮ requires more processors and memory (software); ◮ requires more hardware resources such as LUT and gates (hardware).

Future works:

◮ consider different combinations of Neumann series for different

solutions (real solution possible?);

◮ consider more matrix functions and general polynomials; ◮ provide accurate error analysis. Coelho and Dimitrov (UofC) July 20, 2017 20 / 21

slide-21
SLIDE 21

Questions?

Coelho and Dimitrov (UofC) July 20, 2017 21 / 21