Numerical Analysis of the Matrix Logarithm Nick Higham School of - - PowerPoint PPT Presentation

numerical analysis of the matrix logarithm
SMART_READER_LITE
LIVE PREVIEW

Numerical Analysis of the Matrix Logarithm Nick Higham School of - - PowerPoint PPT Presentation

Numerical Analysis of the Matrix Logarithm Nick Higham School of Mathematics The University of Manchester higham@ma.man.ac.uk http://www.ma.man.ac.uk/~higham/ Computational Methods with Applications Harrachov 2007 Defining f(A)


slide-1
SLIDE 1

Numerical Analysis of the Matrix Logarithm

Nick Higham School of Mathematics The University of Manchester higham@ma.man.ac.uk http://www.ma.man.ac.uk/~higham/ Computational Methods with Applications Harrachov 2007

slide-2
SLIDE 2

Defining f(A) Applications Theory Methods

Outline

1 Definition of log(A) 2 Applications 3 Theory 4 Numerical methods

MIMS

Nick Higham Matrix Logarithm 2 / 42

slide-3
SLIDE 3

Defining f(A) Applications Theory Methods

Matrix Logarithm

A logarithm of A ∈ Cn×n is any matrix X such that eX = A. Existence. Representation, classification. Computation. Conditioning. First, approach via theory of matrix functions. . .

MIMS

Nick Higham Matrix Logarithm 3 / 42

slide-4
SLIDE 4

Defining f(A) Applications Theory Methods

Multiplicity of Definitions

There have been proposed in the literature since 1880 eight distinct definitions of a matric function, by Weyr, Sylvester and Buchheim, Giorgi, Cartan, Fantappiè, Cipolla, Schwerdtfeger and Richter. — R. F. Rinehart, The Equivalence of Definitions of a Matric Function,

  • Amer. Math. Monthly (1955)

MIMS

Nick Higham Matrix Logarithm 4 / 42

slide-5
SLIDE 5

Defining f(A) Applications Theory Methods

Jordan Canonical Form

Z −1AZ = J = diag(J1, . . . , Jp), Jk

  • mk×mk

=      λk 1 λk ... ... 1 λk      Definition f(A) = Zf(J)Z −1 = Zdiag(f(Jk))Z −1, f(Jk) =        f(λk) f ′(λk) . . . f (mk−1))(λk) (mk − 1)! f(λk) ... . . . ... f ′(λk) f(λk)        .

MIMS

Nick Higham Matrix Logarithm 5 / 42

slide-6
SLIDE 6

Defining f(A) Applications Theory Methods

Interpolation

Definition (Sylvester, 1883; Buchheim, 1886) Distinct e’vals λ1, . . . , λs, ni = max size of Jordan blocks for λi. Then f(A) = p(A), where p is unique Hermite interpolating poly of degree < s

i=1 ni satisfying

p(j)(λi) = f (j)(λi), j = 0: ni − 1, i = 1: s.

MIMS

Nick Higham Matrix Logarithm 7 / 42

slide-7
SLIDE 7

Defining f(A) Applications Theory Methods

Cauchy Integral Theorem

Definition f(A) = 1 2πi

  • Γ

f(z)(zI − A)−1 dz, where f is analytic on and inside a closed contour Γ that encloses λ(A).

MIMS

Nick Higham Matrix Logarithm 8 / 42

slide-8
SLIDE 8

Defining f(A) Applications Theory Methods

Equivalence of Definitions

Theorem The three definitions are equivalent, modulo analyticity assumption for Cauchy.

MIMS

Nick Higham Matrix Logarithm 9 / 42

slide-9
SLIDE 9

Defining f(A) Applications Theory Methods

Composite Functions

Theorem f(t) = g(h(t)) ⇒ f(A) = g(h(A)), provided latter matrix defined. Corollary exp(log(A)) = A when log(A) is defined.

MIMS

Nick Higham Matrix Logarithm 10 / 42

slide-10
SLIDE 10

Defining f(A) Applications Theory Methods

Outline

1 Definition of log(A) 2 Applications 3 Theory 4 Numerical methods

MIMS

Nick Higham Matrix Logarithm 11 / 42

slide-11
SLIDE 11

Defining f(A) Applications Theory Methods

Application: Markov Models

Time-homogeneous continuous-time Markov process with transition probability matrix P(t) ∈ Rn×n. Transition intensity matrix Q related to P by P(t) = eQt. Elements of Q satisfy qij ≥ 0, i = j,

n

  • j=1

qij = 0. Embeddability problem When does a given stochastic P have a real logarithm Q that is an intensity matrix?

MIMS

Nick Higham Matrix Logarithm 13 / 42

slide-12
SLIDE 12

Defining f(A) Applications Theory Methods

The Average Eye

First order character of optical system characterized by transference matrix T = S

δ 1

  • ∈ R5×5, where S ∈ R4×4 is

symplectic: STJS = J, where J =

  • −I2

I2

  • .

Average m−1 m

i=1 Ti is not a transference matrix.

Harris (2005) proposes the average exp(m−1 m

i=1 log(Ti)).

MIMS

Nick Higham Matrix Logarithm 14 / 42

slide-13
SLIDE 13

Defining f(A) Applications Theory Methods

The Average Eye

First order character of optical system characterized by transference matrix T = S

δ 1

  • ∈ R5×5, where S ∈ R4×4 is

symplectic: STJS = J, where J =

  • −I2

I2

  • .

Average m−1 m

i=1 Ti is not a transference matrix.

Harris (2005) proposes the average exp(m−1 m

i=1 log(Ti)).

For Hermitian pos def A and B, Arsigny et al. (2007) define the log-Euclidean mean E(A, B) = exp( 1

2(log(A) + log(B))).

MIMS

Nick Higham Matrix Logarithm 14 / 42

slide-14
SLIDE 14

Defining f(A) Applications Theory Methods

Outline

1 Definition of log(A) 2 Applications 3 Theory 4 Numerical methods

MIMS

Nick Higham Matrix Logarithm 15 / 42

slide-15
SLIDE 15

Defining f(A) Applications Theory Methods

Logs of A = I3

B =     , C =   2π − 1 1 −2π −2π   , D =   2π 1 −2π   , eB = eC = eD = I3. Λ(C) = Λ(D) = {0, 2πi, −2πi}.

MIMS

Nick Higham Matrix Logarithm 18 / 42

slide-16
SLIDE 16

Defining f(A) Applications Theory Methods

Principal Log and pth Root

Let A ∈ Cn×n have no eigenvalues on R− . Principal log X = log(A) denotes unique X such that eX = A. −π < Im

  • λ(X)
  • < π.

For next 2 slides only, allow Im

  • λ(X)
  • = π.

Principal pth root For integer p > 0, X = A1/p is unique X such that X p = A. −π/p < arg(λ(X)) < π/p.

MIMS

Nick Higham Matrix Logarithm 19 / 42

slide-17
SLIDE 17

Defining f(A) Applications Theory Methods

All Solutions of eX = A

Theorem (Gantmacher) A ∈ Cn×n nonsing with Jordan canonical form Z −1AZ = J = diag(J1, J2, . . . , Jp). All solutions to eX = A are given by X = Z U diag(L(j1)

1 , L(j2) 2 , . . . , L(jp) p ) U −1Z −1,

where L(jk)

k

= log(Jk(λk)) + 2 jk πi Imk, jk ∈ Z arbitrary, and U an arbitrary nonsing matrix that commutes with J.

MIMS

Nick Higham Matrix Logarithm 20 / 42

slide-18
SLIDE 18

Defining f(A) Applications Theory Methods

All Solutions of eX = A: Classified

Theorem A ∈ Cn×n nonsing: p Jordan blocks, s distinct ei’vals. eX = A has a countable infinity of solutions that are primary functions of A: Xj = Zdiag(L(j1)

1 , L(j2) 2 , . . . , L(jp) p )Z −1,

where λi = λk implies ji = jk. If s < p then eX = A has non-primary solutions Xj(U) = Z U diag(L(j1)

1 , L(j2) 2 , . . . , L(jp) p ) U −1Z −1,

where jk ∈ Z arbitrary, U arbitrary nonsing with UJ = JU, and for each j ∃ i and k s.t. λi = λk while ji = jk.

MIMS

Nick Higham Matrix Logarithm 21 / 42

slide-19
SLIDE 19

Defining f(A) Applications Theory Methods

Logs of A = I3

C =   2π − 1 1 −2π −2π   , D =   2π 1 −2π   , e0 = eC = eD = I3. Λ(C) = Λ(D) = {0, 2πi, −2πi}. U =   1 α 1 α 1   , α ∈ C, X = U diag(2πi, −2πi, 0)U−1 = 2πi   1 −2α 2α2 1 −α 1   .

MIMS

Nick Higham Matrix Logarithm 22 / 42

slide-20
SLIDE 20

Defining f(A) Applications Theory Methods

Two Facts on Commuting Matrices

Theorem If A, B ∈ Cn×n commute then ∃ a unitary U ∈ Cn×n such that U∗AU and U∗BU are both upper triangular.

MIMS

Nick Higham Matrix Logarithm 23 / 42

slide-21
SLIDE 21

Defining f(A) Applications Theory Methods

Two Facts on Commuting Matrices

Theorem If A, B ∈ Cn×n commute then ∃ a unitary U ∈ Cn×n such that U∗AU and U∗BU are both upper triangular. Theorem For A, B ∈ Cn×n, e(A+B)t = eAteBt for all t if and only if AB = BA.

MIMS

Nick Higham Matrix Logarithm 23 / 42

slide-22
SLIDE 22

Defining f(A) Applications Theory Methods

When Does log(BC) = log(B) + log(C)?

Theorem Let B, C ∈ Cn×n commute and have no ei’vals on R−. If for every ei’val λj of B and the corr. ei’val µj of C, | arg λj + arg µj| < π, then log(BC) = log(B) + log(C).

MIMS

Nick Higham Matrix Logarithm 24 / 42

slide-23
SLIDE 23

Defining f(A) Applications Theory Methods

When Does log(BC) = log(B) + log(C)?

Theorem Let B, C ∈ Cn×n commute and have no ei’vals on R−. If for every ei’val λj of B and the corr. ei’val µj of C, | arg λj + arg µj| < π, then log(BC) = log(B) + log(C).

  • Proof. log(B) and log(C) commute, since B and C do.

Therefore elog(B)+log(C) = elog(B)elog(C) = BC. Thus log(B) + log(C) is some logarithm of BC. Then Im(log λj + log µj) = arg λj + arg µj ∈ (−π, π), so log(B) + log(C) is the principal logarithm of BC.

MIMS

Nick Higham Matrix Logarithm 24 / 42

slide-24
SLIDE 24

Defining f(A) Applications Theory Methods

Outline

1 Definition of log(A) 2 Applications 3 Theory 4 Numerical methods

MIMS

Nick Higham Matrix Logarithm 25 / 42

slide-25
SLIDE 25

Defining f(A) Applications Theory Methods

Henry Briggs (1561–1630)

Arithmetica Logarithmica (1624) Logarithms to base 10 of 1–20,000 and 90,000–100,000 to 14 decimal places.

MIMS

Nick Higham Matrix Logarithm 26 / 42

slide-26
SLIDE 26

Defining f(A) Applications Theory Methods

Henry Briggs (1561–1630)

Arithmetica Logarithmica (1624) Logarithms to base 10 of 1–20,000 and 90,000–100,000 to 14 decimal places. Briggs must be viewed as one of the great figures in numerical analysis. —Herman H. Goldstine, A History of Numerical Analysis (1977)

MIMS

Nick Higham Matrix Logarithm 26 / 42

slide-27
SLIDE 27
slide-28
SLIDE 28
slide-29
SLIDE 29

Defining f(A) Applications Theory Methods

Briggs’ Log Method (1617)

log(ab) = log a + log b ⇒ log a = 2 log a1/2. Use repeatedly: log a = 2k log a1/2k. Write a1/2k = 1 + x and note log(1 + x) ≈ x. Briggs worked to base 10 and used log10 a ≈ 2k · log10 e · (a1/2k − 1).

MIMS

Nick Higham Matrix Logarithm 29 / 42

slide-30
SLIDE 30

Defining f(A) Applications Theory Methods

Matrix Logarithm

Take B = C in previous theorem: log A = log

  • A1/2 · A1/2

= 2 log A1/2, since arg λ(A1/2) ∈ (−π/2, π/2).

MIMS

Nick Higham Matrix Logarithm 30 / 42

slide-31
SLIDE 31

Defining f(A) Applications Theory Methods

Matrix Logarithm

Take B = C in previous theorem: log A = log

  • A1/2 · A1/2

= 2 log A1/2, since arg λ(A1/2) ∈ (−π/2, π/2). Use Briggs’ idea: log A = 2k log

  • A1/2k

.

MIMS

Nick Higham Matrix Logarithm 30 / 42

slide-32
SLIDE 32

Defining f(A) Applications Theory Methods

Matrix Logarithm

Take B = C in previous theorem: log A = log

  • A1/2 · A1/2

= 2 log A1/2, since arg λ(A1/2) ∈ (−π/2, π/2). Use Briggs’ idea: log A = 2k log

  • A1/2k

. Kenney & Laub’s (1989) inverse scaling and squaring method: Bring A close to I by repeated square roots. Approximate log A1/2k using an [m/m] Padé approximant rm(x) ≈ log(1 + x). Rescale to find log(A).

MIMS

Nick Higham Matrix Logarithm 30 / 42

slide-33
SLIDE 33

Defining f(A) Applications Theory Methods

Options

Apply ISS To original A: Cheng, H, Kenney & Laub (2001). Requires square roots of full matrices. To triangular Schur factor. To diagonal blocks within the Schur–Parlett method.

MIMS

Nick Higham Matrix Logarithm 31 / 42

slide-34
SLIDE 34

Defining f(A) Applications Theory Methods

Options

Apply ISS To original A: Cheng, H, Kenney & Laub (2001). Requires square roots of full matrices. To triangular Schur factor. To diagonal blocks within the Schur–Parlett method. ⋆ Use fixed Padé degree m. ⋆ Let m vary optimally with A.

MIMS

Nick Higham Matrix Logarithm 31 / 42

slide-35
SLIDE 35

Defining f(A) Applications Theory Methods

Options

Apply ISS To original A: Cheng, H, Kenney & Laub (2001). Requires square roots of full matrices. To triangular Schur factor. To diagonal blocks within the Schur–Parlett method. ⋆ Use fixed Padé degree m. ⋆ Let m vary optimally with A. ◮ MATLAB’s logm (m = 8).

MIMS

Nick Higham Matrix Logarithm 31 / 42

slide-36
SLIDE 36

Defining f(A) Applications Theory Methods

Options

Apply ISS To original A: Cheng, H, Kenney & Laub (2001). Requires square roots of full matrices. To triangular Schur factor. To diagonal blocks within the Schur–Parlett method. ⋆ Use fixed Padé degree m. ⋆ Let m vary optimally with A. ◮ MATLAB’s logm (m = 8). ◮ Improved logm.

MIMS

Nick Higham Matrix Logarithm 31 / 42

slide-37
SLIDE 37

Defining f(A) Applications Theory Methods

Padé Approximants

rkm = pkm/qkm is a [k/m] Padé approximant of f if pkm and qkm are polys of degree at most k and m and f(x) − rkm(x) = O

  • xk+m+1

. For f(x) = log(1 + x), r11(x) = 2x 2 + x , r22(x) = 6x + 3x2 6 + 6x + x2, r33(x) = 60x + 60x2 + 11x3 60 + 90x + 36x2 + 3x3.

MIMS

Nick Higham Matrix Logarithm 32 / 42

slide-38
SLIDE 38

Defining f(A) Applications Theory Methods

Padé Approximants

rkm = pkm/qkm is a [k/m] Padé approximant of f if pkm and qkm are polys of degree at most k and m and f(x) − rkm(x) = O

  • xk+m+1

. Theorem (Kenney & Laub, 1989) For X < 1, rmm(X) − log(I + X) ≤ |rmm(−X) − log(1 − X)|.

MIMS

Nick Higham Matrix Logarithm 32 / 42

slide-39
SLIDE 39

Defining f(A) Applications Theory Methods

Algorithmic Ingredients

log A = 2k log

  • A1/2k

≈ 2krm

  • A1/2k − I
  • .

For given A1/2k, error bound determines min m s.t. rm

  • suff. accurate.

Choose k and m = m(k) to minimize overall cost. Since

  • I − A1/2k+1

I + A1/2k+1 = I − A1/2k, I − A1/2k+1 ≈ 1 2I − A1/2k. Evaluate the partial fraction form rm(x) =

m

  • j=1

α(m)

j

x 1 + β(m)

j

x , where α(m)

j

weights and β(m)

j

Gauss–Legendre nodes.

MIMS

Nick Higham Matrix Logarithm 33 / 42

slide-40
SLIDE 40

Defining f(A) Applications Theory Methods

Schur–Parlett Algorithm

H & Davies (2003), funm: Compute Schur decomposition A = QTQ∗. Re-order T to block triangular form in which eigenvalues within a block are “close” and those of separate blocks are “well separated”. Evaluate Fii = f(Tii). Solve the Sylvester equations Tii Fij − Fij Tjj = FiiTij − TijFjj +

j−1

  • k=i+1

(FikTkj − TikFkj). Undo the unitary transformations.

MIMS

Nick Higham Matrix Logarithm 34 / 42

slide-41
SLIDE 41

Defining f(A) Applications Theory Methods

Schur–Parlett Algorithm

H & Davies (2003), funm: Compute Schur decomposition A = QTQ∗. Re-order T to block triangular form in which eigenvalues within a block are “close” and those of separate blocks are “well separated”. Evaluate Fii = log(Tii). Solve the Sylvester equations Tii Fij − Fij Tjj = FiiTij − TijFjj +

j−1

  • k=i+1

(FikTkj − TikFkj). Undo the unitary transformations.

MIMS

Nick Higham Matrix Logarithm 34 / 42

slide-42
SLIDE 42

Defining f(A) Applications Theory Methods

Function of 2 × 2 Block

f

  • λ1

t12 λ2

  • =

  f(λ1) t12 f(λ2) − f(λ1) λ2 − λ1 f(λ2)   . Inaccurate if λ1 ≈ λ2. Need a better way to compute the divided difference f[λ2, λ1].

MIMS

Nick Higham Matrix Logarithm 35 / 42

slide-43
SLIDE 43

Defining f(A) Applications Theory Methods

Log of 2 × 2 Block

log λ2 − log λ1 = log λ2 λ1

  • + 2πi U(log λ2 − log λ1)

= log 1 + z 1 − z

  • + 2πi U(log λ2 − log λ1),

where U = unwinding number, z = (λ2 − λ1)/(λ2 + λ1). atanh(z) := 1 2 log 1 + z 1 − z

  • ,

MIMS

Nick Higham Matrix Logarithm 36 / 42

slide-44
SLIDE 44

Defining f(A) Applications Theory Methods

Log of 2 × 2 Block

log λ2 − log λ1 = log λ2 λ1

  • + 2πi U(log λ2 − log λ1)

= log 1 + z 1 − z

  • + 2πi U(log λ2 − log λ1),

where U = unwinding number, z = (λ2 − λ1)/(λ2 + λ1). atanh(z) := 1 2 log 1 + z 1 − z

  • ,

f12 = t12 2 atanh(z) + 2πiU(log λ2 − log λ1) λ2 − λ1 .

MIMS

Nick Higham Matrix Logarithm 36 / 42

slide-45
SLIDE 45

Defining f(A) Applications Theory Methods

Numerical Experiment

◮ 67 test matrices, dimension 2–10. ◮ Evaluated X − log(A)F/ log(A)F. ◮ Notation: ◮ logm: MATLAB 7.4 (R2007a). ◮ logm_new: New version of logm. ◮ iss_schur: Schur decomp then ISS. ◮ iss: ISS on full A. ◮ cond(A) = lim

ǫ→0

max

E2≤ǫA2

log(A + E) − log(A)2 ǫ log(A)2 .

MIMS

Nick Higham Matrix Logarithm 37 / 42

slide-46
SLIDE 46

Defining f(A) Applications Theory Methods

Performance Profile

1 2 3 4 5 6 7 8 9 10 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 α p logm_new iss_schur logm iss MIMS

Nick Higham Matrix Logarithm 40 / 42

slide-47
SLIDE 47

Defining f(A) Applications Theory Methods

log(A)b

Hale, H & Trefethen, Computing Aα, log(A) and Related Matrix Functions by Contour Integrals, 2007. New methods for f(A)b where f has singularities in (−∞, 0] and A is a matrix with ei’vals on or near (0, ∞). Contour integrals + conformal map + repeated trapezium rule ⇒ geometric convergence.

MIMS

Nick Higham Matrix Logarithm 41 / 42

slide-48
SLIDE 48

Defining f(A) Applications Theory Methods

In Conclusion

Matrix logarithm and square root are archetypal examples of multivalued matrix functions (LambertW: Corless, Ding, H & Jeffrey,

2007).

Able to classify all logs. Non-primary logs of interest, but little is known. Improvements to inverse scaling and squaring alg and to logm. Exploiting structure?

MIMS

Nick Higham Matrix Logarithm 42 / 42

slide-49
SLIDE 49

Defining f(A) Applications Theory Methods

References I

  • R. J. Bradford, R. M. Corless, J. H. Davenport, D. J.

Jeffrey, and S. M. Watt. Reasoning about the elementary functions of complex analysis. Annals of Mathematics and Artificial Intelligence, 36:303–318, 2002.

  • S. H. Cheng, N. J. Higham, C. S. Kenney, and A. J.

Laub. Approximating the logarithm of a matrix to specified accuracy. SIAM J. Matrix Anal. Appl., 22(4):1112–1125, 2001.

MIMS

Nick Higham Matrix Logarithm 38 / 42

slide-50
SLIDE 50

Defining f(A) Applications Theory Methods

References II

  • R. M. Corless, H. Ding, N. J. Higham, and D. J. Jeffrey.

The solution of S exp(S) = A is not always the Lambert W function of A. In ISSAC ’07: Proceedings of the 2007 International Symposium on Symbolic and Algebraic Computation, pages 116–121, New York, 2007. ACM Press.

  • R. M. Corless and D. J. Jeffrey.

The unwinding number. ACM SIGSAM Bulletin, 30(2):28–35, June 1996.

MIMS

Nick Higham Matrix Logarithm 39 / 42

slide-51
SLIDE 51

Defining f(A) Applications Theory Methods

References III

  • E. D. Dolan and J. J. Moré.

Benchmarking optimization software with performance profiles.

  • Math. Programming, 91:201–213, 2002.
  • F. R. Gantmacher.

The Theory of Matrices, volume one. Chelsea, New York, 1959.

MIMS

Nick Higham Matrix Logarithm 40 / 42

slide-52
SLIDE 52

Defining f(A) Applications Theory Methods

References IV

  • N. Hale, N. J. Higham, and L. N. Trefethen.

Computing Aα, log(A) and related matrix functions by contour integrals. MIMS EPrint 2007.xxx, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, Aug. 2007.

  • N. J. Higham.

Evaluating Padé approximants of the matrix logarithm. SIAM J. Matrix Anal. Appl., 22(4):1126–1135, 2001.

MIMS

Nick Higham Matrix Logarithm 41 / 42

slide-53
SLIDE 53

Defining f(A) Applications Theory Methods

References V

  • N. J. Higham.

Functions of Matrices: Theory and Computation. 2008. Book to appear.

MIMS

Nick Higham Matrix Logarithm 42 / 42