Linear Algebra 1: Computing canonical forms in exact linear algebra
slide-1
SLIDE 1

Linear Algebra 1: Computing canonical forms in exact linear algebra

Clément PERNET,

LIG/INRIA-MOAIS, Grenoble Université, France

ECRYPT II: Summer School on Tools, Mykonos, Greece, June 1st, 2012

slide-2
SLIDE 2

Introduction: matrix normal forms

Given a transformation B = f(A),

◮ Identify invariants
◮ Unique representative of an equivalence class
◮ Simplify computations (structured form)

Different types:

Equivalence over a field: B = UA, where U is invertible

◮ Reduced echelon form:

$E = \begin{pmatrix} 1 & \ast & & \ast & & \ast \\ & & 1 & \ast & & \ast \\ & & & & 1 & \ast \end{pmatrix}$

◮ Gauss-Jordan elimination
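Over a finite field the elimination is exact, so no numerical pivoting is needed. As a minimal illustrative sketch (not the blocked algorithms discussed later), a Gauss-Jordan elimination over Z_p computing the reduced echelon form might look like this; all names are illustrative:

```python
def rref_mod_p(A, p):
    """Reduced row echelon form of A over Z_p (p prime), by Gauss-Jordan:
    each pivot is normalized to 1 and eliminated above AND below."""
    A = [row[:] for row in A]              # work on a copy
    m, n = len(A), len(A[0])
    pivot_row = 0
    for col in range(n):
        # any nonzero entry is an exact unit mod p, so take the first one
        piv = next((r for r in range(pivot_row, m) if A[r][col] % p), None)
        if piv is None:
            continue
        A[pivot_row], A[piv] = A[piv], A[pivot_row]
        inv = pow(A[pivot_row][col], -1, p)        # exact inverse mod p
        A[pivot_row] = [(x * inv) % p for x in A[pivot_row]]
        for r in range(m):                          # eliminate above and below
            if r != pivot_row and A[r][col] % p:
                f = A[r][col]
                A[r] = [(A[r][j] - f * A[pivot_row][j]) % p for j in range(n)]
        pivot_row += 1
    return A
```

The unique output matrix is the canonical representative of the left-equivalence class of A, exactly as described above.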

slide-3
SLIDE 3

Introduction: matrix normal forms

Given a transformation B = f(A),

◮ Identify invariants
◮ Unique representative of an equivalence class
◮ Simplify computations (structured form)

Different types:

Equivalence over a ring: B = UA, where det(U) = ±1

◮ Hermite normal form:

$H = \begin{pmatrix} p_1 & \ast & x_{1,2} & \ast & \ast & x_{1,3} & \ast \\ & & p_2 & \ast & \ast & x_{2,3} & \ast \\ & & & & & p_3 & \ast \end{pmatrix}, \quad \text{with } 0 \le x_{\ast,j} < p_j$

slide-4
SLIDE 4

Introduction: matrix normal forms

Given a transformation B = f(A),

◮ Identify invariants
◮ Unique representative of an equivalence class
◮ Simplify computations (structured form)

Different types:

Similarity over a field: B = U−1AU

◮ Frobenius normal form (or canonical rational form):

$F = \begin{pmatrix} C_{P_0} & & & \\ & C_{P_1} & & \\ & & \ddots & \\ & & & C_{P_k} \end{pmatrix}, \quad \text{with } P_{i+1} \mid P_i \text{ and } P_0 = \operatorname{MinPoly}(A).$

◮ Krylov method or ZigZag elimination

slide-5
SLIDE 5

Motivation

Equivalence over a field: Gaussian elimination

◮ Reduced echelon form, rank profile: polynomial system solving (Gröbner bases)
◮ Linear system solving: sieves, index calculus, ...

Equivalence over a ring: lattice reduction

◮ Hermite normal form: Z-modules and their saturation
◮ short vector problem:
  ◮ a hard problem
  ◮ helps improve computational complexities

slide-8
SLIDE 8

Complexities

Matrix multiplication: door to fast linear algebra

◮ over a field: O(n^ω), with ω ∈ ]2.3727, 3] the exponent of linear algebra
◮ over Z: O(n^ω M(log‖A‖)) = Õ(n^ω log‖A‖)

Equivalence over a field: Reduced echelon form

◮ Gauss-Jordan: O(n^ω)

Equivalence over Z: Hermite normal form

◮ [Kannan & Bachem 79]: ∈ P
◮ [Chou & Collins 82]: Õ(n^6 log‖A‖)
◮ [Domich et al. 87], [Iliopoulos 89]: Õ(n^4 log‖A‖)
◮ [Micciancio & Warinschi 01]: Õ(n^5 log²‖A‖); heuristically Õ(n^3 log‖A‖)
◮ [Storjohann & Labahn 96]: Õ(n^{ω+1} log‖A‖)

Similarity over a field: Frobenius normal form

◮ [Storjohann 00]: Õ(n^ω)
◮ [P. & Storjohann 07]: Las Vegas, without the transformation U: O(n^ω)



slide-11
SLIDE 11

Motivation

Algorithmic building blocks in the design of efficient computations of exact linear algebra normal forms.

In brief

Reductions to a building block:
◮ Matrix Mult, block recursive: $\sum_{i=1}^{\log n} n \left(\frac{n}{2^i}\right)^{\omega-1} = O(n^\omega)$ (Gauss, REF)
◮ Matrix Mult, iterative: $\sum_{k=1}^{n} n \left(\frac{n}{k}\right)^{\omega-1} = O(n^\omega)$ (Frobenius)
◮ Linear Sys over Z (Hermite normal form)

Size/dimension compromises:
◮ Hermite normal form: rank-1 updates reducing the determinant
◮ Frobenius normal form: degree k, dimension n/k, for k = 1, ..., n

Study originating from the design of the libraries:
◮ FFLAS-FFPACK: dense over word-size Zp
◮ M4RI: dense over GF(2) (see Martin Albrecht’s talk)
◮ LinBox: dense/sparse/black-box over Z, Zp

slide-12
SLIDE 12

Outline

Reduced Echelon forms and Gaussian elimination
◮ Gaussian elimination based matrix decompositions
◮ Relations between decompositions
◮ Algorithms
Hermite normal form
◮ Micciancio & Warinschi algorithm
◮ Double Determinant
◮ AddCol
Frobenius normal form
◮ Krylov method
◮ Algorithm
◮ Reduction to matrix multiplication


slide-14
SLIDE 14

Reduced echelon form and Gaussian elimination

Gaussian elimination ↔ reduction to reduced echelon form

◮ Extensively studied for numerical computations
◮ Specificities of exact computations:
  ◮ no partial/full pivoting
  ◮ rank profile matters
  ◮ size of coefficients (e.g. compressed in GF(2))
⇒ asymmetry

slide-15
SLIDE 15

LU decomposition

A = L · U

◮ L unit lower triangular
◮ U non-singular upper triangular

Exists for

◮ matrices having generic rank profile (every leading principal minor is nonzero)

slide-17
SLIDE 17

LUP, PLU decomposition

A = L · U · P (LUP), or A = P · L · U (PLU)

◮ P a permutation matrix

Exists for

◮ any non-singular matrix
◮ or any matrix with generic row rank profile

slide-20
SLIDE 20

LSP, LQUP, PLUQ decompositions

A = L · S · P (LSP), A = L · Q · U · P (LQUP), A = P · L · U · Q (PLUQ)

◮ S: semi-upper triangular
◮ Q: permutation matrix

Exists for

◮ any m × n matrix

slide-22
SLIDE 22

Echelon form decomposition

Row Echelon Form: XA = R
Column Echelon Form: AY = C

◮ X, Y: non-singular transformation matrices
◮ R, C: matrices in row/column echelon form

slide-23
SLIDE 23

Reduced echelon form decomposition

Row Reduced Echelon Form XA = R

= A R X

◮ X, Y: non-singular transformation matrices ◮ R, C: matrices in reduced row/col echelon form

slide-24
SLIDE 24

Reduced echelon form decomposition

Row Reduced Echelon Form: XA = R
Column Reduced Echelon Form: AY = C

◮ X, Y: non-singular transformation matrices
◮ R, C: matrices in reduced row/column echelon form

slide-25
SLIDE 25

CUP and PLE decompositions

A = U P C

◮ C: column echelon form ◮ E: row echelon form

Exists for

◮ any m × n matrix

slide-26
SLIDE 26

CUP and PLE decompositions

A = C · U · P (CUP), A = P · L · E (PLE)

◮ C: column echelon form
◮ E: row echelon form

Exists for

◮ any m × n matrix

slide-27
SLIDE 27

Relations: up to permutations

From LSP to LQUP

S = Q · U, turning A = L · S · P into A = L · Q · U · P

Fact

The first r = rank(A) values of the permutation Q are monotonically increasing.


slide-29
SLIDE 29

Relations: up to permutations

From LQUP to CUP

C = L · Q, turning A = L · Q · U · P into A = C · U · P

From LQUP to PLE

Using transposition: PLE(A^T) = CUP(A)^T

slide-30
SLIDE 30

Relations

From LQUP to PLUQ

P ↔ Q, L ← Q^T · L · Q

slide-31
SLIDE 31

Algorithms: main types

Three ways to group operations:

1. simple iterative
  ◮ apply the standard Gaussian elimination in dimension n
  ◮ main loop: for i = 1 to n
2. block algorithms
  2.1 block iterative (tile)
    ◮ apply Gaussian elimination in dimension n/k over blocks of size k
    ◮ main loop: for i = 1 to n/k
  2.2 block recursive
    ◮ apply Gaussian elimination in dimension 2 recursively on blocks of size n/2^i
    ◮ main loop: for i = 1 to 2
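To make the simple iterative variant concrete, here is a minimal sketch over Z_p (p prime) of the standard elimination loop, which also records the column rank profile; all names are illustrative, not from any of the libraries mentioned above:

```python
def plu_rank_profile(A, p):
    """Iterative Gaussian elimination over Z_p.

    Returns (rank, column rank profile). Any nonzero entry is an exact
    unit mod p, so the first nonzero candidate is taken as pivot.
    """
    A = [row[:] for row in A]          # work on a copy
    m, n = len(A), len(A[0])
    row, profile = 0, []
    for col in range(n):
        piv = next((r for r in range(row, m) if A[r][col] % p), None)
        if piv is None:
            continue                   # no pivot in this column
        A[row], A[piv] = A[piv], A[row]
        profile.append(col)
        inv = pow(A[row][col], -1, p)  # exact inverse mod p
        for r in range(row + 1, m):    # eliminate below the pivot
            f = (A[r][col] * inv) % p
            A[r] = [(A[r][k] - f * A[row][k]) % p for k in range(n)]
        row += 1
    return row, profile
```

The rank profile falls out for free here, which is one of the "specificities of exact computations" stressed earlier.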

slide-32
SLIDE 32

Type of algorithms

Data locality: prefer block algorithms
◮ cache aware: block iterative
◮ cache oblivious: block recursive

Base case efficiency: simple iterative
Asymptotic time complexity: block recursive
Parallelization: block iterative

slide-36
SLIDE 36

Block recursive Gaussian elimination

Author | Year | Computation | Requirement
Strassen | 69 | Inverse | generic rank profile
Bunch, Hopcroft | 74 | LUP | generic row rank profile
Ibarra, Moran, Hui | 82 | LSP, LQUP | none
Schönhage, Keller-Gehrig | 85 | StepForm | none
Storjohann | 00 | Echelon, RedEch | none
here | 11 | CUP, PLE, PLUQ | none

Comparison according to:

◮ no requirement on the input matrix
◮ rank sensitive complexity
◮ memory allocations
◮ constant factor in the time complexity

slide-37
SLIDE 37

Memory requirements:

Definition

In place = the output overwrites the input and the computation does not need extra memory (considering matrix multiplication C ← C + AB as a black box).

Remark: a unit lower triangular and an upper triangular matrix can be stored in the same m × n storage!


slide-43
SLIDE 43

Preliminaries

TRSM: TRiangular Solve with Matrix

$\begin{pmatrix} A & B \\ & C \end{pmatrix}^{-1}\begin{pmatrix} D \\ E \end{pmatrix} = \begin{pmatrix} A^{-1} & \\ & I \end{pmatrix}\begin{pmatrix} I & -B \\ & I \end{pmatrix}\begin{pmatrix} I & \\ & C^{-1} \end{pmatrix}\begin{pmatrix} D \\ E \end{pmatrix}$

1. Compute F = C^{−1}E (recursive call)
2. Compute G = D − B·F (MM)
3. Compute H = A^{−1}G (recursive call)
4. Return $\begin{pmatrix} H \\ F \end{pmatrix}$

◮ O(n^ω)
◮ In place
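A direct transcription of this block-recursive TRSM over Z_p, solving U X = B with U upper triangular; for readability this sketch is out of place and uses naive helpers (names are illustrative):

```python
def mat_mul(A, B, p):
    """Naive matrix product mod p (the MM building block)."""
    return [[sum(a * b for a, b in zip(row, col)) % p
             for col in zip(*B)] for row in A]

def mat_sub(A, B, p):
    return [[(a - b) % p for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def trsm_upper(U, B, p):
    """Solve U X = B over Z_p with U = [[A, B12],[0, C]] upper triangular:
    F = C^-1 E (recursion), G = D - B12 F (MM), H = A^-1 G (recursion)."""
    n = len(U)
    if n == 1:
        inv = pow(U[0][0], -1, p)
        return [[(inv * x) % p for x in B[0]]]
    k = n // 2
    A   = [row[:k] for row in U[:k]]
    B12 = [row[k:] for row in U[:k]]
    C   = [row[k:] for row in U[k:]]
    D, E = B[:k], B[k:]
    F = trsm_upper(C, E, p)                # recursive call
    G = mat_sub(D, mat_mul(B12, F, p), p)  # one matrix multiplication
    H = trsm_upper(A, G, p)                # recursive call
    return H + F                           # stack [H; F]
```

The recursion does all its real work in the single multiplication G = D − B12·F, which is how the O(n^ω) bound is inherited from matrix multiplication.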

slide-49
SLIDE 49

The CUP decomposition

1. Split A row-wise
2. Recursive call on A1
3. G ← A21 · U1^{−1} (TRSM)
4. H ← A22 − G × V (MM)
5. Recursive call on H
6. Row permutations
slide-50
SLIDE 50

Memory: LSP vs LQUP vs PLUQ vs CUP

Decomposition | In place
LSP | N
LQUP | N
PLUQ | Y
CUP | Y


slide-53
SLIDE 53

Echelon forms

From CUP to Column Echelon form

$Y = P^{T}\begin{pmatrix} U_1 & U_2 \\ & I_{n-r} \end{pmatrix}^{-1} = P^{T}\begin{pmatrix} U_1^{-1} & -U_1^{-1}U_2 \\ & I_{n-r} \end{pmatrix}$

so that A = C · U · P yields A · Y = C, a column echelon form.

Additional operations:
◮ −U_1^{−1}U_2: TRSM (triangular system solve), in place
◮ U_1^{−1}: TRTRI (triangular inverse), in place

slide-54
SLIDE 54

From CUP to Column Echelon form

TRTRI: triangular inverse

$\begin{pmatrix} U_1 & U_2 \\ & U_3 \end{pmatrix}^{-1} = \begin{pmatrix} U_1^{-1} & -U_1^{-1}U_2U_3^{-1} \\ & U_3^{-1} \end{pmatrix}$

1: if n = 1 then
2:   U ← U^{−1}
3: else
4:   U2 ← U1^{−1} · U2 (TRSM)
5:   U2 ← −U2 · U3^{−1} (TRSM)
6:   U1 ← U1^{−1} (TRTRI)
7:   U3 ← U3^{−1} (TRTRI)
8: end if
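The same recursion in executable form over Z_p; this sketch is out of place (it returns a fresh matrix) for readability, and all names are illustrative:

```python
def mat_mul(A, B, p):
    """Naive matrix product mod p."""
    return [[sum(a * b for a, b in zip(row, col)) % p
             for col in zip(*B)] for row in A]

def trtri(U, p):
    """Inverse of an invertible upper triangular U over Z_p, via
    [[U1, U2],[0, U3]]^-1 = [[U1^-1, -U1^-1 U2 U3^-1],[0, U3^-1]]."""
    n = len(U)
    if n == 1:
        return [[pow(U[0][0], -1, p)]]
    k = n // 2
    U1 = [r[:k] for r in U[:k]]
    U2 = [r[k:] for r in U[:k]]
    U3 = [r[k:] for r in U[k:]]
    V1, V3 = trtri(U1, p), trtri(U3, p)              # two recursive calls
    V2 = [[(-x) % p for x in row]
          for row in mat_mul(mat_mul(V1, U2, p), V3, p)]  # triangular products
    top = [V1[i] + V2[i] for i in range(k)]
    bottom = [[0] * k + V3[i] for i in range(n - k)]
    return top + bottom
```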

slide-58
SLIDE 58

Reduced Echelon forms

From Column Echelon form to Reduced Column Echelon form

$Z = Y\begin{pmatrix} M & \\ & I_{n-r} \end{pmatrix}^{-1}$ (M the pivot block of C), so that A · Z is in reduced column echelon form.

Similarly, from PLE to Row Echelon form. Again reduces to:
◮ U^{−1}X: TRSM, in place
◮ U^{−1}: TRTRI, in place
◮ U·L: TRTRM, in place

slide-65
SLIDE 65

From Echelon to Reduced Echelon

TRTRM: triangular-triangular multiplication

$\begin{pmatrix} U_1 & U_2 \\ & U_3 \end{pmatrix}\begin{pmatrix} L_1 & \\ L_2 & L_3 \end{pmatrix} = \begin{pmatrix} U_1L_1 + U_2L_2 & U_2L_3 \\ U_3L_2 & U_3L_3 \end{pmatrix}$

1: X1 ← U1 · L1 (TRTRM)
2: X1 ← X1 + U2 · L2 (MM)
3: X2 ← U2 · L3 (TRMM)
4: X3 ← U3 · L2 (TRMM)
5: X4 ← U3 · L3 (TRTRM)

◮ O(n^ω)
◮ In place

slide-69
SLIDE 69

Example: in place matrix inversion

1. CUP decomposition: A = L·U
2. Echelon: A·U^{−1} = L
3. Reduced Echelon: A·(U^{−1}L^{−1}) = I

slide-70
SLIDE 70

Experiments

[Memory profiles over Z_65521, n = 1000: ./test-invert peaks near 16 MB; ./test-redechelon peaks near 8 MB]


slide-72
SLIDE 72

Direct computation of the Reduced Echelon form

◮ Strassen 69: inverse of generic matrices
◮ Storjohann 00: Gauss-Jordan generalization for any rank profile

Matrix Inversion [Strassen 69]

$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} I & -A^{-1}B \\ & I \end{pmatrix}\begin{pmatrix} A^{-1} & \\ & (D - CA^{-1}B)^{-1} \end{pmatrix}\begin{pmatrix} I & \\ -CA^{-1} & I \end{pmatrix}$

1: Compute E = A^{−1} (recursive call)
2: Compute F = D − C·E·B (MM)
3: Compute G = F^{−1} (recursive call)
4: Compute H = −E·B (MM)
5: Compute J = H·G (MM)
6: Compute K = C·E (MM)
7: Compute L = E − J·K (MM)
8: Compute M = G·K (MM)
9: Return $\begin{pmatrix} L & J \\ -M & G \end{pmatrix}$
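A runnable sketch of this recursive block inversion over Z_p. As in Strassen's original, it assumes generic rank profile (every leading block it recurses on is invertible); helper names are illustrative:

```python
def mat_mul(A, B, p):
    return [[sum(a * b for a, b in zip(row, col)) % p
             for col in zip(*B)] for row in A]

def mat_sub(A, B, p):
    return [[(a - b) % p for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_neg(A, p):
    return [[(-x) % p for x in row] for row in A]

def strassen_inverse(M, p):
    """Recursive 2x2-block inversion over Z_p via the Schur complement;
    requires generic rank profile."""
    n = len(M)
    if n == 1:
        return [[pow(M[0][0], -1, p)]]
    k = n // 2
    A = [r[:k] for r in M[:k]]; B = [r[k:] for r in M[:k]]
    C = [r[:k] for r in M[k:]]; D = [r[k:] for r in M[k:]]
    E = strassen_inverse(A, p)                          # E = A^-1
    F = mat_sub(D, mat_mul(mat_mul(C, E, p), B, p), p)  # Schur complement
    G = strassen_inverse(F, p)                          # G = F^-1
    J = mat_mul(mat_neg(mat_mul(E, B, p), p), G, p)     # top-right block
    K = mat_mul(C, E, p)
    L = mat_sub(E, mat_mul(J, K, p), p)                 # top-left block
    N = mat_neg(mat_mul(G, K, p), p)                    # bottom-left block
    return ([L[i] + J[i] for i in range(k)] +
            [N[i] + G[i] for i in range(n - k)])
```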


slide-74
SLIDE 74

Strassen-Storjohann’s Gauss-Jordan elimination

Problem

Needs to perform operations of the form A ← A·B
⇒ not doable in place by a usual matrix multiplication algorithm

Workaround [Storjohann]:

1. Decompose B = L·U (LU)
2. A ← A·L (TRMM)
3. A ← A·U (TRMM)

slide-75
SLIDE 75

Rank sensitive time complexity

Fact

Algorithms LSP, CUP, LQUP, PLUQ, ... have a rank-sensitive computation time: O(m·n·r^{ω−2}).

[Figure: time vs. rank, n = 3000, PIII 1.6 GHz, 512 MB RAM, comparing the block and the iterative algorithm]

slide-76
SLIDE 76

Time complexity: comparing constants

O(n^ω) = C_ω·n^ω

Algorithm | Constant C_ω | C_3 | C_{log2 7} | in place
MM | C_ω | 2 | 6 | ×
TRSM | C_ω/(2^{ω−1}−2) | 1 | 4 | ✓
TRTRI | C_ω/((2^{ω−1}−2)(2^{ω−1}−1)) | 1/3 ≈ 0.33 | 8/5 = 1.6 | ✓
TRTRM, CUP, PLUQ, LQUP | C_ω/(2^{ω−1}−2) − C_ω/2^{ω−2} | 2/3 ≈ 0.66 | 14/5 = 2.8 | ✓
Echelon | C_ω/(2^{ω−2}−1) − 3C_ω/2^{ω−2} | 1 | 22/5 = 4.4 | ✓
RedEchelon | C_ω(2^{ω−1}+2)/((2^{ω−1}−2)(2^{ω−1}−1)) | 2 | 44/5 = 8.8 | ✓
StepForm | 5C_ω/(2^{ω−1}−1) + C_ω/((2^{ω−1}−1)(2^{ω−2}−1)) | 4 | 76/5 = 15.2 | ×
GJ* | C_ω/(2^{ω−2}−1) | 2 | 8 | ×

*: GJ: Gauss-Jordan algorithm of [Storjohann 00] computing the reduced echelon form

slide-77
SLIDE 77

Applications to standard linalg problems

Problem | Using | Constant | C_3 | C_{log2 7} | In place
Rank, RankProfile, IsSingular, Det, Solve | GJ | C_ω/(2^{ω−2}−1) | 2 | 8 | ×
 | CUP | C_ω/(2^{ω−1}−2) − C_ω/2^{ω−2} | 0.66 | 2.8 | ✓
Inverse | GJ | C_ω/(2^{ω−2}−1) | 2 | 8 | ×
 | CUP | C_ω(2^{ω−1}+2)/((2^{ω−1}−2)(2^{ω−1}−1)) | 2 | 8.8 | ✓

slide-78
SLIDE 78

Summary

[Diagram: pipeline costs — LU decomposition A = LU (2/3), TRTRI on U and on L (1/3 each), TRTRM (2/3); Gauss = LU decomposition; Gauss-Jordan additionally performs TRTRI and TRTRM]

◮ Det
◮ Rank
◮ RankProfile
◮ Solve
◮ Echelon form
◮ Nullspace basis
◮ Reduced Echelon form
◮ Inverse
slide-79
SLIDE 79

Outline

Reduced Echelon forms and Gaussian elimination
◮ Gaussian elimination based matrix decompositions
◮ Relations between decompositions
◮ Algorithms
Hermite normal form
◮ Micciancio & Warinschi algorithm
◮ Double Determinant
◮ AddCol
Frobenius normal form
◮ Krylov method
◮ Algorithm
◮ Reduction to matrix multiplication

slide-81
SLIDE 81

Computing Hermite Normal form

Equivalence over a ring: H = U·A, where det(U) = ±1

Hermite normal form:

$H = \begin{pmatrix} p_1 & \ast & x_{1,2} & \ast & \ast & x_{1,3} & \ast \\ & & p_2 & \ast & \ast & x_{2,3} & \ast \\ & & & & & p_3 & \ast \end{pmatrix}, \quad \text{with } 0 \le x_{\ast,j} < p_j$

Reduced Echelon form over a ring.

Improving the Micciancio-Warinschi algorithm:

◮ Õ(n^5 log‖A‖) (heuristically: Õ(n^3 log‖A‖))
◮ space: O(n^2 log‖A‖)
⇒ good on random matrices, common in number theory, crypto.

Implementation, reduction to building blocks:

◮ LinSys over Z
◮ CUP and MatMul over Z_p
slide-82
SLIDE 82

Naive algorithm

begin
  foreach i do
    (g, t_i, ..., t_n) = xgcd(A_{i,i}, A_{i+1,i}, ..., A_{n,i})
    L_i ← Σ_{j=i}^{n} t_j L_j
    for j = i + 1 ... n do
      L_j ← L_j − (A_{j,i}/g) L_i        /* eliminate */
    for j = 1 ... i − 1 do
      L_j ← L_j − ⌊A_{j,i}/g⌋ L_i        /* reduce */
end

producing the Hermite shape

$H = \begin{pmatrix} p_1 & \ast & x_{1,2} & \ast & \ast & x_{1,3} & \ast \\ & & p_2 & \ast & \ast & x_{2,3} & \ast \\ & & & & & p_3 & \ast \end{pmatrix}$
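A runnable sketch of this naive scheme for a nonsingular square integer matrix. It replaces the n-term xgcd of the column by repeated 2×2 unimodular row operations (same effect on the lattice of rows); all names are illustrative:

```python
def xgcd(a, b):
    """Extended gcd: returns (g, s, t) with s*a + t*b = g."""
    s0, s1, t0, t1 = 1, 0, 0, 1
    while b:
        q, a, b = a // b, b, a % b
        s0, s1 = s1, s0 - q * s1
        t0, t1 = t1, t0 - q * t1
    return a, s0, t0

def naive_hnf(A):
    """Row-style Hermite normal form of a nonsingular integer matrix:
    eliminate below each pivot with unimodular 2x2 row operations,
    then reduce the entries above the pivot."""
    A = [row[:] for row in A]
    n = len(A)
    for i in range(n):
        for j in range(i + 1, n):          # eliminate below the pivot
            if A[j][i] != 0:
                a, b = A[i][i], A[j][i]
                g, s, t = xgcd(a, b)
                u, v = a // g, b // g
                # det [[s, t], [-v, u]] = (s*a + t*b)/g = 1: unimodular
                ri = [s * A[i][k] + t * A[j][k] for k in range(n)]
                rj = [-v * A[i][k] + u * A[j][k] for k in range(n)]
                A[i], A[j] = ri, rj
        if A[i][i] < 0:                    # normalize the pivot sign
            A[i] = [-x for x in A[i]]
        for j in range(i):                 # reduce above: 0 <= entry < pivot
            q = A[j][i] // A[i][i]
            A[j] = [A[j][k] - q * A[i][k] for k in range(n)]
    return A
```

This is the quadratic-in-n-eliminations baseline the modular algorithms below improve on (the intermediate entries here can blow up badly).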


slide-84
SLIDE 84

Computing modulo the determinant [Domich et al. 87]

Property

For A non-singular: max_{i,j} H_{i,j} ≤ det H.

Example

[A 5 × 5 integer matrix A with single-digit entries and det A = 1944, whose Hermite form H has small entries except in its last columns]

Moreover, every computation can be done modulo d = det A (appending d·I_n to A leaves the Hermite form unchanged):

⇒ O(n^3) × M(n(log n + log‖A‖)) = Õ(n^4 log‖A‖)

slide-86
SLIDE 86

Computing modulo the determinant

◮ Pessimistic estimate on the arithmetic size
◮ d large, but most coefficients of H are small
◮ On average: only the last few columns are large
⇒ Compute H′ close to H but with a small determinant

[Micciancio & Warinschi 01]

$A = \begin{pmatrix} B & b \\ c^T & a_{n-1,n} \\ d^T & a_{n,n} \end{pmatrix}, \quad d_1 = \det\begin{pmatrix} B \\ c^T \end{pmatrix}, \quad d_2 = \det\begin{pmatrix} B \\ d^T \end{pmatrix}, \quad g = \gcd(d_1, d_2) = s d_1 + t d_2$

Then $\det\begin{pmatrix} B \\ s c^T + t d^T \end{pmatrix} = g$.
slide-87
SLIDE 87

Micciancio & Warinschi algorithm

begin
  1. Compute $d_1 = \det\begin{pmatrix} B \\ c^T \end{pmatrix}$, $d_2 = \det\begin{pmatrix} B \\ d^T \end{pmatrix}$        /* Double Det */
  2. (g, s, t) = xgcd(d1, d2)
  3. Compute H1, the HNF of $\begin{pmatrix} B \\ s c^T + t d^T \end{pmatrix}$ mod g        /* Modular HNF */
  4. Recover H2, the HNF of $\begin{pmatrix} B & b \\ s c^T + t d^T & s a_{n-1,n} + t a_{n,n} \end{pmatrix}$        /* AddCol */
  5. Recover H3, the HNF of $\begin{pmatrix} B & b \\ c^T & a_{n-1,n} \\ d^T & a_{n,n} \end{pmatrix}$        /* AddRow */
end



slide-90
SLIDE 90

Double Determinant

First approach: LU mod p1, ..., pk + CRT

◮ only one elimination for the first n − 2 rows
◮ 2 updates for the last rows (triangular back substitution)
◮ k large, such that $\prod_{i=1}^{k} p_i > n^{n/2}\|A\|^{n}$

Second approach: [Abbott, Bronstein, Mulders 99]

◮ Solve Ax = b
◮ δ = lcm(q1, ..., qn), where x_i = p_i/q_i; then δ is a large divisor of D = det A
◮ Compute D/δ by LU mod p1, ..., pk + CRT
◮ k small, such that $\prod_{i=1}^{k} p_i > n^{n/2}\|A\|^{n}/\delta$

slide-91
SLIDE 91

Double Determinant: improved

Property

Let x = [x1, ..., xn] be the solution of $\begin{pmatrix} A \\ c \end{pmatrix} x = d$. Then $y = \left[-\tfrac{x_1}{x_n}, \ldots, -\tfrac{x_{n-1}}{x_n}, \tfrac{1}{x_n}\right]$ is the solution of $\begin{pmatrix} A \\ d \end{pmatrix} y = c$.

◮ 1 system solve
◮ 1 LU for each p_i
⇒ d1 and d2 are computed at about the cost of one determinant

slide-92
SLIDE 92

AddCol

Problem

Find the vector e such that $\begin{pmatrix} H_1 & e \end{pmatrix} = U\begin{pmatrix} B & b \\ s c^T + t d^T & s a_{n-1,n} + t a_{n,n} \end{pmatrix}$, i.e.

$e = U\begin{pmatrix} b \\ s a_{n-1,n} + t a_{n,n} \end{pmatrix} = H_1\begin{pmatrix} B \\ s c^T + t d^T \end{pmatrix}^{-1}\begin{pmatrix} b \\ s a_{n-1,n} + t a_{n,n} \end{pmatrix}$

⇒ solve a system.

◮ the first n − 1 rows are small
◮ the last row is large

slide-93
SLIDE 93

AddCol

Idea: replace the last row by a random small one w^T:

$\begin{pmatrix} B \\ w^T \end{pmatrix} y = \begin{pmatrix} b \\ a_{n-1,n-1} \end{pmatrix}$

Let k be a basis of the kernel of B. Then x = y + α·k, where

$\alpha = \frac{a_{n-1,n-1} - (s c^T + t d^T)\cdot y}{(s c^T + t d^T)\cdot k}$

⇒ limits the expensive arithmetic to a few dot products

slide-94
SLIDE 94

Outline

Reduced Echelon forms and Gaussian elimination
◮ Gaussian elimination based matrix decompositions
◮ Relations between decompositions
◮ Algorithms
Hermite normal form
◮ Micciancio & Warinschi algorithm
◮ Double Determinant
◮ AddCol
Frobenius normal form
◮ Krylov method
◮ Algorithm
◮ Reduction to matrix multiplication


slide-97
SLIDE 97

Krylov Method

Definition (degree d Krylov matrix of one vector v)

K = [v, Av, ..., A^{d−1}v]

Property

$A \times K = K \times \begin{pmatrix} & & & \ast \\ 1 & & & \ast \\ & \ddots & & \vdots \\ & & 1 & \ast \end{pmatrix}$

⇒ if d = n, then K^{−1}AK = C_{P_A}, the companion matrix of the characteristic polynomial of A

⇒ [Keller-Gehrig, alg. 2] computes K in O(n^ω log n)
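A small sketch over Z_p of this property: build K = [v, Av, ..., A^{n−1}v] column by column and, when K is nonsingular, recover the last column c of the companion matrix by solving K c = A^n v, so that x^n = c_{n−1}x^{n−1} + ... + c_0 is the characteristic polynomial. All names are illustrative:

```python
def mat_vec(A, v, p):
    return [sum(a * x for a, x in zip(row, v)) % p for row in A]

def krylov_columns(A, v, d, p):
    """The columns v, Av, ..., A^{d-1}v."""
    cols, w = [], v[:]
    for _ in range(d):
        cols.append(w)
        w = mat_vec(A, w, p)
    return cols

def solve_mod_p(M, b, p):
    """Gaussian elimination solve for a nonsingular M over Z_p."""
    n = len(M)
    aug = [M[i][:] + [b[i]] for i in range(n)]
    for i in range(n):
        piv = next(r for r in range(i, n) if aug[r][i] % p)
        aug[i], aug[piv] = aug[piv], aug[i]
        inv = pow(aug[i][i], -1, p)
        aug[i] = [x * inv % p for x in aug[i]]
        for r in range(n):
            if r != i and aug[r][i] % p:
                f = aug[r][i]
                aug[r] = [(aug[r][j] - f * aug[i][j]) % p
                          for j in range(n + 1)]
    return [row[n] for row in aug]

def companion_column(A, v, p):
    """Last column c of the companion C = K^-1 A K: K c = A^n v."""
    n = len(A)
    cols = krylov_columns(A, v, n, p)
    K = [[cols[j][i] for j in range(n)] for i in range(n)]  # columns -> matrix
    return solve_mod_p(K, mat_vec(A, cols[-1], p), p)
```

For instance, over Z_5 with A = [[1,1],[0,1]] and v = [0,1], the result encodes x² = 2x + 4, i.e. x² − 2x + 1 = (x − 1)², consistent with the double eigenvalue 1.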

slide-98
SLIDE 98

Definition (degree k Krylov matrix of several vectors v_i)

K = [v1, Av1, ..., A^{k−1}v1, v2, ..., A^{k−1}v2, ..., vl, ..., A^{k−1}vl]

Property

A × K = K × H, where H is block upper-Hessenberg with companion-like diagonal blocks of degree k (the last one of degree ≤ k).
slide-99
SLIDE 99

Hessenberg poly-cyclic form

Fact

If (d1, ..., dl) is lexicographically maximal such that K = [v1, ..., A^{d1−1}v1, ..., vl, ..., A^{dl−1}vl] is non-singular, then A × K = K × H, with H in shifted Hessenberg form with diagonal companion blocks of dimensions d1, d2, ..., dl.

slide-100
SLIDE 100

Principle

k-shifted form:

[Figure: block form with companion diagonal blocks of degrees k, k, ≤ k]

slide-101
SLIDE 101

Principle

(k + 1)-shifted form:

[Figure: block form with companion diagonal blocks of degrees k + 1, k + 1, d_l]


slide-105
SLIDE 105

Principle

◮ Compute iteratively from the 1-shifted form to the d1-shifted form
◮ each diagonal block appears in increasing degree
◮ until the shifted Hessenberg form is obtained (diagonal companion blocks of dimensions d1, d2, ..., dl)

How to transform from the k-shifted to the (k + 1)-shifted form?

slide-108
SLIDE 108

Krylov normal extension

For any k-shifted form A_k (with dense columns c1, c2, c3), compute an n × (n + k) matrix K̄ from it, and form K by picking the first linearly independent columns of K̄.

slide-111
SLIDE 111

The algorithm

◮ Form K̄: just copy the columns of A_k
◮ Compute K: rank profile of K̄
◮ Apply the similarity transformation K^{−1} A_k K

slide-112
SLIDE 112

Example

[A sequence of animation frames stepping through the shifted-form iterations; the figures are not recoverable from the extraction]

Lemma

If #F > 2n², the transformation succeeds with high probability, and failure is detected.

How to use fast matrix arithmetic?


slide-128
SLIDE 128

Permutations: compressing the dense columns

[Figure: A_k × P = Q × (compressed form), and similarly K̄ × P′ = Q′ × (compressed form): permutations gather the dense columns c1, c2, c3 into one block]

slide-132
SLIDE 132

Reduction to Matrix multiplication

Similarity transformation: parenthesizing

$K^{-1}A_k K = Q'^{-1}\begin{pmatrix} I & \ast \\ & \ast \end{pmatrix} P'^{-1}\, Q\begin{pmatrix} I & \ast \\ & \ast \end{pmatrix} P\, Q'\begin{pmatrix} I & \ast \\ & \ast \end{pmatrix} P'$

$\Rightarrow O\!\left(k\left(\tfrac{n}{k}\right)^{\omega}\right)$

Overall complexity, summing over the iterations:

$\sum_{k=1}^{n} k\left(\frac{n}{k}\right)^{\omega} = n^{\omega}\sum_{k=1}^{n} \frac{1}{k^{\omega-1}} = \zeta(\omega-1)\, n^{\omega} = O(n^{\omega})$

slide-135
SLIDE 135

A new type of reduction

xI_n − A (degree 1, dimension n) → det(xI_n − A) (degree n, dimension 1)

Keller-Gehrig 2: intermediate steps of dimension n/2^i and degree 2^i

New algorithm: intermediate steps of dimension n/k and degree k

slide-136
SLIDE 136

Conclusion

Reductions to a building block

◮ Matrix Mult, block recursive: $\sum_{i=1}^{\log n} n\left(\frac{n}{2^i}\right)^{\omega-1} = O(n^\omega)$ (Gauss, REF)
◮ Matrix Mult, iterative: $\sum_{k=1}^{n} n\left(\frac{n}{k}\right)^{\omega-1} = O(n^\omega)$ (Frobenius)
◮ Linear Sys over Z (Hermite normal form)

Size/dimension compromises

◮ Hermite normal form: rank-1 updates reducing the determinant
◮ Frobenius normal form: degree k, dimension n/k, for k = 1, ..., n