SLIDE 1

On fast multiplication of a matrix by its transpose

Jean-Guillaume Dumas, Clément Pernet, Alexandre Sedoglavic

Luminy, March 3, 2020

Centre de Recherche en Informatique, Signal et Automatique de Lille

SLIDE 2

Strassen-Winograd fast multiplication algorithm

Outline

1. Strassen-Winograd fast multiplication algorithm
2. Fast matrix product by its transpose
3. Skew orthogonal matrices
4. Complexity bounds for block algorithms
5. Space and time efficient implementation
6. Minimality

Dumas-Pernet-Sedoglavic On fast multiplication of a matrix by its transpose JNCF 2020 2 / 23

SLIDE 6

Strassen-Winograd fast multiplication algorithm

2×2 matrix multiplication

[A11 A12; A21 A22] × [B11 B12; B21 B22] = [(A11·B11 + A12·B21) (A11·B12 + A12·B22); (A21·B11 + A22·B21) (A21·B12 + A22·B22)] = [C11 C12; C21 C22]

Classical algorithm: 8 multiplications, 4 additions
[Strassen 1969]: 7 multiplications, 18 additions
[Winograd 1973? 1977]: 7 multiplications, 15 additions
[Hopcroft-Kerr 1969]: 7 multiplications is minimal
[Bshouty 1995]: 15 additions is minimal (for a bilinear algorithm with 7 multiplications)
SLIDE 8

Strassen-Winograd fast multiplication algorithm

Matrix multiplication by its transpose A·A⊺

[A11 A12; A21 A22] × [A11⊺ A21⊺; A12⊺ A22⊺] = [(A11·A11⊺ + A12·A12⊺) (A21·A11⊺ + A22·A12⊺)⊺; (A21·A11⊺ + A22·A12⊺) (A21·A21⊺ + A22·A22⊺)] = [C11 C21⊺; C21 C22]

Divide & conquer algorithm: 6 multiplications, 3 additions
Here (over C, or over any finite field): 5 multiplications, 7.5 additions
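As a sketch, one divide & conquer step above (6 block products, 3 block additions, with C12 = C21⊺ for free) can be written with NumPy, assuming an even dimension; the four products involving a block and its own transpose would become recursive calls in the full algorithm:

```python
import numpy as np

def syrk_dc(A):
    """One divide & conquer step of C = A·Aᵀ: 6 block products, 3 block additions."""
    n = A.shape[0] // 2
    A11, A12 = A[:n, :n], A[:n, n:]
    A21, A22 = A[n:, :n], A[n:, n:]
    C11 = A11 @ A11.T + A12 @ A12.T   # recursive calls in the full algorithm
    C21 = A21 @ A11.T + A22 @ A12.T   # 2 general products
    C22 = A21 @ A21.T + A22 @ A22.T   # recursive calls in the full algorithm
    return np.block([[C11, C21.T], [C21, C22]])  # C12 = C21ᵀ by symmetry
```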

SLIDE 9

Fast matrix product by its transpose

SLIDE 10

Fast matrix product by its transpose

From the Strassen-Winograd fast multiplication algorithm

Require: A = [a11 a12; a21 a22] and B = [b11 b12; b21 b22]
Ensure: C = A·B
1. 8 additions: s1 ← a11 − a21, s2 ← a21 + a22, s3 ← s2 − a11, s4 ← a12 − s3; t1 ← b22 − b12, t2 ← b12 − b11, t3 ← b11 + t1, t4 ← b21 − t3.
2. 7 recursive multiplications: p1 ← a11·b11, p2 ← a12·b21, p3 ← a22·t4, p4 ← s1·t1, p5 ← s3·t3, p6 ← s4·b22, p7 ← s2·t2.
3. 7 final additions: c1 ← p1 + p5, c2 ← c1 + p4, c3 ← p1 + p2, c4 ← c2 + p3, c5 ← c2 + p7, c6 ← c1 + p7, c7 ← c6 + p6.
4. return C = [c3 c7; c4 c5].
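The schedule above transcribes directly to code; this sketch handles power-of-two sizes and falls back to the classical product below a threshold `leaf` (an illustrative parameter, as in the deck's threshold discussion):

```python
import numpy as np

def strassen_winograd(A, B, leaf=64):
    """Strassen-Winograd product: 7 multiplications, 15 additions per step."""
    n = A.shape[0]
    if n <= leaf or n % 2:
        return A @ B                      # classical product at the leaves
    m = n // 2
    a11, a12, a21, a22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
    b11, b12, b21, b22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]
    # 8 additions
    s1, s2 = a11 - a21, a21 + a22
    s3, s4 = (a21 + a22) - a11, a12 - ((a21 + a22) - a11)
    t1, t2 = b22 - b12, b12 - b11
    t3, t4 = b11 + t1, b21 - (b11 + t1)
    # 7 recursive multiplications
    p1 = strassen_winograd(a11, b11, leaf)
    p2 = strassen_winograd(a12, b21, leaf)
    p3 = strassen_winograd(a22, t4, leaf)
    p4 = strassen_winograd(s1, t1, leaf)
    p5 = strassen_winograd(s3, t3, leaf)
    p6 = strassen_winograd(s4, b22, leaf)
    p7 = strassen_winograd(s2, t2, leaf)
    # 7 final additions
    c1 = p1 + p5
    c2 = c1 + p4
    c3 = p1 + p2
    c4 = c2 + p3
    c5 = c2 + p7
    c6 = c1 + p7
    c7 = c6 + p6
    return np.block([[c3, c7], [c4, c5]])
```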

SLIDE 12

Fast matrix product by its transpose

Matrix product by its transpose

Require: A = [a11 a12; a21 a22], with A⊺ = [a11⊺ a21⊺; a12⊺ a22⊺]
Ensure: C = A·A⊺
1. 6 additions: s1 ← a11 − a21, s2 ← a21 + a22, s3 ← s2 − a11, t1 ← a22⊺ − a21⊺, t3 ← a11⊺ + t1, t4 ← a12⊺ − t3 (s4 and t2 are crossed out: t2 = −s1⊺, and all variants have sign discrepancies).
2. 6 multiplications (2 recursive, 4 general): p1 ← a11·a11⊺, p2 ← a12·a12⊺, p3 ← a22·t4, p4 ← s1·t1, p5 ← s3·t3, p7 ← s2·s1⊺ (p6 is crossed out).
3. 5 final additions: c1 ← p1 + p5, c2 ← c1 + p4, c3 ← p1 + p2, c4 ← c2 + p3, c5 ← c2 − p7 (c6 and c7 are crossed out).
4. return C = [c3 c4⊺; c4 c5] (c7 is crossed out: the upper-right block is c4⊺ by symmetry).

SLIDE 13

Fast matrix product by its transpose

Parameterized matrix product by its transpose

Require: A = [a11 a12; a21 a22] and Y such that Y·Y⊺ = −In
Ensure: C = A·A⊺
1. 4 additions and 2 multiplications by Y: s1 ← (a21 − a11)·Y, s2 ← a22 − a21·Y, s3 ← −a11·Y − s2, s4 ← s3 + a12 (t1, t3 and t4 are crossed out).
2. 5 multiplications (3 recursive, 2 general): p1 ← a11·a11⊺, p2 ← a12·a12⊺, p3 ← a22·s4⊺, p4 ← s1·s2⊺, p5 ← s3·s3⊺ (p7 is crossed out).
3. 5 final additions: c1 ← p1 + p5, c2 ← c1 + p4, c3 ← p1 + p2, c4 ← c2 + p3, c5 ← c2 + p4⊺.
4. return C = [c3 c4⊺; c4 c5].

SLIDE 14

Fast matrix product by its transpose

Fast matrix product by its transpose, using symmetries

Require: A = [a11 a12; a21 a22] and Y such that Y·Y⊺ = −In
Ensure: C = A·A⊺
1. 4 additions and 2 multiplications by Y: s1 ← (a21 − a11)·Y, s2 ← a22 − a21·Y, s3 ← −a11·Y − s2, s4 ← s3 + a12.
2. 5 multiplications (3 recursive, 2 general): p1 ← a11·a11⊺, p2 ← a12·a12⊺, p3 ← a22·s4⊺, p4 ← s1·s2⊺, p5 ← s3·s3⊺.
3. 2 complete and 3 symmetric additions: Low(c1) ← Low(p1) + Low(p5), c2 ← c1 + p4, Low(c3) ← Low(p1) + Low(p2), Low(c5) ← Low(c2) + Low(p4⊺), c4 ← c2 + p3.
4. return C = [c3 c4⊺; c4 c5].
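A one-level sketch of the 5-multiplication scheme, instantiated over C with the skew orthogonal matrix Y = i·I (so multiplying a block by Y is just a scalar multiplication by i), for a real input of even dimension; for brevity this sketch does plain full additions rather than the triangular Low(...) updates:

```python
import numpy as np

def syrk_5mult(A):
    """One level of the 5-multiplication C = A·Aᵀ scheme, with Y = i·I."""
    n = A.shape[0] // 2
    A11, A12 = A[:n, :n], A[:n, n:]
    A21, A22 = A[n:, :n], A[n:, n:]
    # 4 additions and 2 multiplications by Y = i·I
    s1 = 1j * (A21 - A11)        # (a21 - a11)·Y
    s2 = A22 - 1j * A21          # a22 - a21·Y
    s3 = -1j * A11 - s2          # -a11·Y - s2
    s4 = s3 + A12
    # 5 multiplications (3 would be recursive calls, 2 are general products)
    p1 = A11 @ A11.T
    p2 = A12 @ A12.T
    p3 = A22 @ s4.T
    p4 = s1 @ s2.T
    p5 = s3 @ s3.T
    # final additions
    c2 = p1 + p5 + p4
    c3 = p1 + p2                 # C11
    c4 = c2 + p3                 # C21
    c5 = c2 + p4.T               # C22
    # imaginary parts cancel for real A (up to roundoff)
    return np.block([[c3, c4.T], [c4, c5]]).real
```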

SLIDE 15

Skew orthogonal matrices

SLIDE 23

Skew orthogonal matrices

Skew orthogonal matrices?

Definition (skew orthogonal matrix): Y such that Y·Y⊺ = −In.

✓ C: Y = i·In ➡ no-op: swap real/imaginary parts & 1 sign
✗ R
✗ Q
✓ F_{2^k} (p = 2): 1 = −1 is a square ➡ Y = In
✓ F_{p^k} with p ≡ 1 mod 4, or k even: −1 is a square ➡ Y = i·In
✓ Other finite fields, n ≥ 4: Y = [a b; −b a] ⊗ I_{n/2} = [a·I_{n/2} b·I_{n/2}; −b·I_{n/2} a·I_{n/2}] ∈ F_q^{n×n}

Proof. [a b; −b a]·[a −b; b a] = [a²+b² 0; 0 a²+b²] = [−1 0; 0 −1] if a² + b² = −1.
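A small sketch of the finite-field construction, over a prime field where −1 is not a square (p ≡ 3 mod 4): brute-force a pair with a² + b² ≡ −1 (the deck gives a fast algorithm for this later) and assemble the block matrix Y above:

```python
import numpy as np

p = 7  # prime with p ≡ 3 (mod 4), so -1 is not a square and Y = i·I is unavailable
# brute-force a, b with a² + b² ≡ -1 (mod p)
a, b = next((a, b) for a in range(p) for b in range(1, p)
            if (a * a + b * b) % p == p - 1)
n = 4
I = np.eye(n // 2, dtype=int)
# Y = [a b; -b a] ⊗ I_{n/2}, entries reduced mod p
Y = np.block([[a * I, b * I], [(p - b) * I, a * I]]) % p
# skew orthogonality: Y·Yᵀ ≡ -I_n (mod p)
assert ((Y @ Y.T) % p == (p - 1) * np.eye(n, dtype=int)).all()
```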

SLIDE 24

Complexity bounds for block algorithms

SLIDE 25

Complexity bounds for block algorithms

Leading terms of the complexity bounds for matrix multiplication

Definition (leading term of a complexity bound for matrix multiplication): n × n matrices can be multiplied at a cost dominated by MMω(n), where MMω(n) = cω·n^ω is such that matrix multiplication can be done in MMω(n) + o(MMω(n)) operations.

Classical method: MM3(n) = 2n³.
Strassen-Winograd, calling the classical algorithm below a given threshold:
➡ C(n) ≤ 7·C(⌈n/2⌉) + 15·⌈n/2⌉², with C(k) = 2k³ for k ≲ 1 000
➡ MMlog2(7)(n) = O(n^log2(7)), with log2(7) ≈ 2.807
. . .
[Le Gall 2014]: ω < 2.3728639

SLIDE 27

Complexity bounds for block algorithms

Leading terms of the complexity bounds for A·A⊺ over finite fields

Classical method: (1/2)·MM3(n) = n³.
Our algorithm:
➡ S(n) ≤ 3·S(⌈n/2⌉) + 2·MMω(⌈n/2⌉) + (7.5 + {2 or 6})·⌈n/2⌉², with S(k) = k³ for k ≲ 2 000
➡ leading term 2/(2^ω − 3)·MMω(n)

Problem: A·A⊺ ∈ F^{n×n}
Algorithm | O(n³) | O(n^log2(7)) | O(n^ω)
D & C | n³ | (2/3)·MMlog2(7)(n) | 2/(2^ω − 4)·MMω(n)
here | 0.8·n³ | (1/2)·MMlog2(7)(n) | 2/(2^ω − 3)·MMω(n)
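A quick numeric check of the leading constant in the O(n³) column, assuming MM3(m) = 2m³ and taking the additive term as 9.5·⌈n/2⌉² (7.5 additions plus 2 multiplications by a cheap Y, one of the deck's two cases):

```python
def S(n):
    """Operation count from S(n) = 3·S(n/2) + 2·MM3(n/2) + 9.5·(n/2)²,
    with MM3(m) = 2·m³ and a scalar base case; n a power of two."""
    if n == 1:
        return 1.0
    m = n // 2
    return 3 * S(m) + 2 * (2 * m**3) + 9.5 * m * m

n = 2**20
ratio = S(n) / n**3
# leading term 2/(2³ - 3)·MM3(n) = (2/5)·2·n³ = 0.8·n³
assert abs(ratio - 0.8) < 1e-3
```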

SLIDE 29

Complexity bounds for block algorithms

Leading terms of the complexity bounds over C

Problem | Algorithm | O(n³) | O(n^log2(7)) | O(n^ω)
A·B ∈ C^{n×n}, (U + iV)·(W + iX) | naive | 8n³ | 4·RRlog2(7)(n) | 4·RRω(n)
  | Karatsuba: 3M | 6n³ | 3·RRlog2(7)(n) | 3·RRω(n)
A·A⊺ ∈ C^{n×n} | 2M | 4n³ | 2·RRlog2(7)(n) | 2·RRω(n) (*)
  | D & C | 3n³ | 2·RRlog2(7)(n) | 6/(2^ω − 4)·RRω(n)
  | here | 2.4·n³ | (3/2)·RRlog2(7)(n) | 6/(2^ω − 3)·RRω(n)

(*) If ω < log2(6) ≈ 2.585, then 2 < 6/(2^ω − 3).

SLIDE 30

Space and time efficient implementation

SLIDE 32

Space and time efficient implementation

Scheduling: C ← α·A·A⊺

(Figure: task graph of the schedule over C22, C12, C21, C11 and the temporaries S1–S4, P1–P5, C1–C5.)

SYRK: C ← α·A·A⊺ + β·C

Operation | Location
S1 = (A21 − A11)·Y | tmp (n/2 × n/2)
S2 = A22 − A21·Y | C12
Up(C11) = Low(C22)⊺ | C11
P4⊺ = α·S2·S1⊺ | C22
S3 = S1 − A22 | tmp
P5 = α·S3·S3⊺ | C12
S4 = S3 + A12 | tmp
P3 = α·A22·S4⊺ + β·C21 | C21
P1 = α·A11·A11⊺ | tmp
U1 = P1 + P5 | C12
Up(U1) = Low(U1)⊺ | C12
U2 = U1 + P4 | C12
U4 = U2 + P3 | C21
U5 = U2 + P4⊺ + β·Up(C11)⊺ | C22
P2 = α·A12·A12⊺ + β·C11 | C11
U3 = P1 + P2 | C11

SLIDE 34

Space and time efficient implementation

Speed: LinBox/FFLAS-FFPACK

(Figure: effective Gfops, n³/(10⁹ × time), against n, for FSYRK on an i7-6700 @ 3.4 GHz; curves: classic OpenBLAS DSYRK, classic FSYRK modulo 131071, divide & conquer FSYRK modulo 131071, fast FSYRK modulo 131071, fast FSYRK modulo 131041; timing marks at 1 minute and 1'28''.)

SLIDE 35

Minimality

SLIDE 38

Minimality

On minimality of bilinear algorithms computing A·A⊺

Conjecture (5 multiplications): the best block variant has 3 recursive calls and 2 generic multiplications.
Conjecture (skew orthogonal): all variants with 5 multiplications require a skew orthogonal matrix.
Theorem (non-commutative): non-commutative 2×2 variants with 5 multiplications require at least 11 additions.
Theorem (9 additions): all block variants with 5 multiplications require at least 9 block additions.
➡ With the symmetries of the blocks this is reduced to (4 + 2 + 3/2)·n² = 7.5·n² additions.

SLIDE 40

Minimality

Strassen tensor

Example: Winograd's variant
p1 = a11·b11
p2 = a12·b21
p3 = (a11 − a21)·(b22 − b12)
p4 = (a21 + a22)·(b12 − b11)
p5 = (a11 + a12 − a21 − a22)·b22
p6 = a22·(b12 − b11 + b21 − b22)
p7 = (a21 − a11 + a22)·(b11 − b12 + b22)
c11 = p1 + p2
c12 = p1 + p4 + p5 + p7
c21 = p1 + p3 + p6 + p7
c22 = p1 + p3 + p4 + p7

Σ_{i=1}^{7} S_{i1} ⊗ S_{i2} ⊗ S_{i3} =
  [1 0; 0 0] ⊗ [1 0; 0 0] ⊗ [1 1; 1 1]
+ [0 1; 0 0] ⊗ [0 0; 1 0] ⊗ [1 0; 0 0]
+ [1 0; −1 0] ⊗ [0 −1; 0 1] ⊗ [0 0; 1 1]
+ [0 0; 1 1] ⊗ [−1 1; 0 0] ⊗ [0 1; 0 1]
+ [1 1; −1 −1] ⊗ [0 0; 0 1] ⊗ [0 1; 0 0]
+ [0 0; 0 1] ⊗ [−1 1; 1 −1] ⊗ [0 0; 1 0]
+ [−1 0; 1 1] ⊗ [1 −1; 0 1] ⊗ [0 1; 1 1]

Theorem ([de Groot 1978]): all the 7-multiplication variants (U, V, W unimodular) are given by Σ_i (U⁻¹·S_{i1}·V) ⊗ (V⁻¹·S_{i2}·W) ⊗ (W⁻¹·S_{i3}·U).

SLIDE 41

Minimality

Isotropy group and Gröbner basis: multiplications

5 multiplications for A·A⊺? 12 unknowns: U = [u11 u12; u21 u22], V = [v11 v12; v21 v22], W = [w11 w12; w21 w22].

31 equations of degree 4:
- 3 × 4 equations: recursive calls p_i = X·X⊺
- 2 × 2 × 4 equations: two symmetries p_j = X·Y⊺ = (Y·X⊺)⊺ = p_k⊺
- 3 equations: unimodularity of U, V, W

SLIDE 42

Minimality

Isotropy group and Gröbner basis: multiplications (cont.)

5 multiplications for A·A⊺? Gröbner basis via FGb:

With z = v11 − v21 (so v11 = z + v21), the basis expresses v22 and v12 rationally in z and v21, and
u11 = −((z + v21)² + v21²)·(w21 + w22)
u21 = −((z + v21)² + v21²)·(w11 + w12)
u12 = −((z + v21)² + v21²)·w22
u22 = ((z + v21)² + v21²)·w12
1 = w11·w22 − w12·w21 = ((z + v21)² + v21²)² + 1 (*)

Specializing z² = −1, v11 = 1, v21 = 0, w11 = 1, w12 = 0, w21 = 0, then scaling by 1/z ➡ our algorithm.

(*) 0 = ((z + v21)² + v21²)² + 1 imposes a square root of −1.

SLIDE 45

Conclusion

Perspective

Minimality-of-multiplications conjectures: reduce the polynomial system?
✓ 5 multiplications are required
✓ except: 4 multiplications for (p = 2) & (2×2) & (commutative)
Conjectures verified for the isotropies of Strassen-Winograd ...

[Schwartz et al. 2017, 2019] for A·A⊺? O(n²·log(n)) non-matrix-based pre-computations ➡ only 12 additions; from ≥ 5 recursive levels on, fewer operations than Strassen-Winograd for A·B. Probably for matrices larger than 64k ...

Accelerating LDL⊺ in practice for non-generic rank profiles?

SLIDE 51

Skew orthogonal matrices

Sum of squares in finite fields

Require: p ∈ P\{2} and k ∈ Z, a quadratic non-residue mod p.
Ensure: (a, b) ∈ Z², such that a² + b² ≡ k mod p.
1. Find s such that s is not a square and s − 1 is a square ➡ randomize (or the lowest quadratic non-residue is O(log²(p)) under ERH).
2. Find c such that c² ≡ s − 1 mod p ➡ c = ModSquareRoot(s − 1, p).
3. Let r ≡ k·s⁻¹ mod p.
4. Find a such that a² ≡ r mod p (a product of two non-squares is a square, so r is a square).
➡ Now k ≡ r·s ≡ a²·s ≡ a²·(1 + (s − 1)) ≡ a²·(1 + c²) mod p, i.e. k ≡ a² + (a·c)².

Dumas-Pernet-Sedoglavic On fast multiplication of a matrix by its transpose Séminaire CAS3C3 12 / 30
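A sketch of this method, restricted to primes p ≡ 3 (mod 4) so that modular square roots are a single exponentiation (a restriction of this sketch, not of the algorithm), and taking s as the smallest non-residue, whose predecessor is then automatically a residue:

```python
def sum_of_squares(k, p):
    """Write a quadratic non-residue k as a² + b² mod p (p prime, p ≡ 3 mod 4)."""
    def is_square(x):
        return pow(x, (p - 1) // 2, p) == 1       # Euler's criterion
    def sqrt_mod(x):                              # x a residue; valid since p ≡ 3 (mod 4)
        return pow(x, (p + 1) // 4, p)
    s = 2
    while is_square(s):                           # smallest non-residue: s - 1 is a residue
        s += 1
    c = sqrt_mod(s - 1)                           # c² ≡ s - 1
    r = k * pow(s, p - 2, p) % p                  # r ≡ k·s⁻¹, a residue (two non-residues)
    a = sqrt_mod(r)                               # a² ≡ r
    b = a * c % p                                 # k ≡ r·s ≡ a²·(1 + c²) ≡ a² + (a·c)²
    return a, b

p = 131071                                        # 2¹⁷ - 1, with p ≡ 3 (mod 4)
a, b = sum_of_squares(p - 1, p)                   # k = -1: a skew orthogonal pair
assert (a * a + b * b) % p == p - 1
```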

SLIDE 53

Complexity bounds for block algorithms

2M method over C

Usually, complex numbers are implemented as pairs of floats.

(U + iV)·(W + iX) takes 4 floating-point multiplications; 3M (Karatsuba) takes 3: U·W − V·X + i·((U + V)·(W + X) − U·W − V·X).

➡ 2M method: A·A⊺ = (U + iV)·(U⊺ + iV⊺)
1. G = (U + V)·(U⊺ − V⊺)
2. H = U·V⊺
3. Re = G − H⊺ + H
4. Im = H + H⊺
Then Re + i·Im = (U·U⊺ − V·V⊺) + i·(U·V⊺ + V·U⊺) = (U + iV)·(U⊺ + iV⊺).
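The four steps above can be sketched directly with NumPy (note this computes A·A⊺, not the conjugate-transpose product):

```python
import numpy as np

def syrk_2m(A):
    """2M method: A·Aᵀ for complex A = U + iV in two real matrix products."""
    U, V = A.real, A.imag
    G = (U + V) @ (U - V).T   # U·Uᵀ - U·Vᵀ + V·Uᵀ - V·Vᵀ
    H = U @ V.T
    Re = G - H.T + H          # U·Uᵀ - V·Vᵀ
    Im = H + H.T              # U·Vᵀ + V·Uᵀ
    return Re + 1j * Im
```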

SLIDE 56

Minimality

Brent equations for Strassen-like algorithms for 2×2 matrices

Cij = Σℓ Aiℓ·Bℓj: 4 equations, 16 monomials (1)
pk = (Σij αijk·Aij)·(Σij βijk·Bij): (4 + 4) × 7 unknowns (2)
Cij = Σ_{k=1}^{7} γijk·pk: 4 × 7 unknowns (3)

Substitute (2) in (3) and equate with (1): 84 unknowns in 64 equations of degree 3.
[Strassen 1969]: ∃ a solution.

A·A⊺ in 5 multiplications? ➡ still 55 unknowns in 48 equations of degree 3.

SLIDE 58

Minimality

Isotropy group and Gröbner basis: additions

9 additions for A·A⊺? Adapting [Bshouty 1995], we prove minimality of 9 additions.
1. As in the generic case, 4 is minimal for the pre-additions on each of A and B ➡ from 15 to 11.
2. 7 post-additions to compute C11, C12, C21, C22 is minimal: this is (4,1,1,1), (3,2,1,1) or (2,2,2,1) additions. Symmetry of the result: the best possible saving from C21 = C12⊺ is 2, in the (2,2,2,1) case ➡ from 11 to 9. Symmetry of the diagonal blocks C11 = C11⊺ & C22 = C22⊺ gives at most ((2+1)/2)·n² further savings.

(Alternatively, adding to the Gröbner basis the equations that each Sij ∈ { [1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1] }.)

SLIDE 59

Application to LDL⊺

Outline

1. Strassen-Winograd fast multiplication algorithm
2. Fast matrix product by its transpose
3. Skew orthogonal matrices
4. Complexity bounds for block algorithms
5. Space and time efficient implementation
6. Minimality
7. Application to LDL⊺

SLIDE 67

Application to LDL⊺

SYRDK: A·D·A⊺

SYRDK is a main source of speed in LDL⊺ factorizations; D is block diagonal with diagonal/anti-diagonal/anti-triangular blocks.

To compute A·D·A⊺ faster over finite fields:
1. D → ∆·∆⊺
2. Ã ← A·∆, once and for all
3. Ã·Ã⊺

➡ How to factor D into ∆·∆⊺?

[α 0; 0 β]: squares ➡ ok; otherwise, sums of squares ➡ [a b; c d]·[a c; b d] = [a²+b² ac+bd; ac+bd c²+d²] = [α 0; 0 β]

[0 γ; γ 0]: [1 1; 1 −1]·[γ/2 0; 0 −γ/2]·[1 1; 1 −1]⊺ = [0 γ; γ 0], reducing to the diagonal case; or, mod 2: ([1 0; 0 γ]·[1 0 1; 0 1 1])·([1 0; 0 γ]·[1 0 1; 0 1 1])⊺ = [0 γ; γ 0]

Char. 2, [0 γ; γ β]: [0 γ; γ β] = [γ/√β γ/√β; √β 0]·[γ/√β γ/√β; √β 0]⊺ = ([γ/√β 0; 0 √β]·[1 1; 1 0])·([γ/√β 0; 0 √β]·[1 1; 1 0])⊺.