

slide-1
SLIDE 1

General Equivariance

Zhuohui Zhang Amos Gropp

Department of Computer Science & Applied Math Weizmann Institute of Science

AGMDL, 2019

Zhuohui Zhang, Amos Gropp (WIS) General Equivariance AGMDL, 2019 1 / 19

slide-2
SLIDE 2

Outline

1. Definitions
   - Compact Group
   - What is a CNN
   - Compact Groups and Equivariance
   - Equivariant MFF-NN and G-CNN

2. How to prove?
   - Three ways to look at representations
   - The Proof


slide-3
SLIDE 3

What is a compact group?

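A (matrix) group is compact if it is closed and bounded. Finite groups are also compact.

A compact group has finite volume, so one can average a function over it:

$$ f \mapsto \frac{1}{|G|}\int_G f(g)\,dg \qquad\text{or, for finite } G,\qquad f \mapsto \frac{1}{|G|}\sum_{g\in G} f(g). $$

Convolution on $G$:

$$ (f * g)(x) = \int_G f(xy^{-1})\,g(y)\,dy. $$

Convolution theorem: the Fourier coefficients of $f * g$ are the elementwise product of the Fourier coefficients of $f$ and $g$ (for non-commutative $G$ each coefficient is a matrix, and the product within each coefficient is a matrix product):

$$ \widehat{f * g} = \hat{f} \odot \hat{g}. $$

What is a representation? A (complex) vector space $V$ on which $G$ acts by linear maps (matrices): each $g \in G$ acts as a matrix $g \in GL(V)$. We allow complex matrix entries.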
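The convolution theorem can be checked numerically in the simplest compact case. The sketch below assumes the finite cyclic group $\mathbb{Z}_n$ written additively, so that $xy^{-1}$ becomes $x - y$ and the Fourier transform is the discrete Fourier transform. The names `group_convolve` and `conv_theorem_holds` are illustrative, and the sum is left unnormalized (no $1/|G|$ factor).

```python
import numpy as np

def group_convolve(f, g):
    """(f * g)(x) = sum_y f(x - y) g(y) on the cyclic group Z_n."""
    n = len(f)
    return np.array([sum(f[(x - y) % n] * g[y] for y in range(n))
                     for x in range(n)])

rng = np.random.default_rng(0)
f = rng.normal(size=8)
g = rng.normal(size=8)

# Fourier coefficients of the convolution ...
lhs = np.fft.fft(group_convolve(f, g))
# ... versus the elementwise product of the Fourier coefficients.
rhs = np.fft.fft(f) * np.fft.fft(g)

conv_theorem_holds = np.allclose(lhs, rhs)
print(conv_theorem_holds)
```

Because $\mathbb{Z}_n$ is commutative, every irreducible representation is 1-dimensional, which is why a plain elementwise product suffices here.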


slide-4
SLIDE 4

Examples

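| Group | Representation |
|---|---|
| Permutation group $S_n$ | Permutation matrices |
| Rotation group $SO(2)$ | 1-dim reps $e^{2\pi i m\theta}$ |
| Rotation group $SO(3)$ | 3-dim rotation matrices; quaternions (2-dim) |
| Torus $[0,1]^n$ | 1-dim reps $e^{2\pi i(m_1\theta_1 + m_2\theta_2 + \dots + m_n\theta_n)}$ |

Irreducibility of $V$: there is no proper nonzero subspace $W \subset V$ closed under the group action.

Reducible $\Rightarrow$ simultaneous block-diagonalization: it is possible to choose a basis in which all elements $\{e, g_1, g_2, \dots\} \subset GL(V)$ are block-diagonal of the same shape:

$$ g_i = \begin{pmatrix} g_{i,1} & 0 \\ 0 & g_{i,2} \end{pmatrix}. $$

$G$-equivariant maps between representations:

$$ \hom_G(V_1, V_2) = \{ M : V_1 \to V_2 \ \text{linear} \mid M(gv) = g(M(v)) \ \text{for all } g \in G,\ v \in V_1 \}. $$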


slide-5
SLIDE 5

Functions on G/H, H\G, H\G/K

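How to view the functions on the coset spaces $G/H$, $H\backslash G$, $H\backslash G/K$?

First way: as functions on the coset spaces themselves. For $f \in L^2(G)$, there are projection operators to $L^2(G/H)$, $L^2(H\backslash G)$ and $L^2(H\backslash G/K)$:

$$ G/H:\quad \mathrm{Avg}_H f(x) = \frac{1}{|H|}\int_H f(xh)\,dh $$

$$ H\backslash G:\quad \mathrm{Avg}_H f(x) = \frac{1}{|H|}\int_H f(hx)\,dh $$

$$ H\backslash G/K:\quad \mathrm{Avg}_{H,K} f(x) = \frac{1}{|H||K|}\int_H\int_K f(hxk)\,dh\,dk $$

Second way: as functions on $G$ that are invariant on the right by $H$ (for $G/H$), on the left by $H$ (for $H\backslash G$), or on the left by $H$ and on the right by $K$ (for $H\backslash G/K$). There are lifting operators from these coset spaces to $G$:

$$ L^2(G/H),\ L^2(H\backslash G),\ L^2(H\backslash G/K) \to L^2(G), \qquad f \mapsto \tilde{f}. $$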
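The averaging projection can be made concrete on a small finite example. The sketch below assumes $G = \mathbb{Z}_6$ (written additively) and $H = \{0, 3\}$; both choices and the name `avg_H` are illustrative only. It checks that $\mathrm{Avg}_H f$ is $H$-invariant, so it descends to $G/H$, and that averaging twice changes nothing, so the operator is a projection.

```python
import numpy as np

n = 6
H = [0, 3]  # a subgroup of Z_6 (assumed for illustration)

def avg_H(f):
    """Avg_H f(x) = (1/|H|) * sum_{h in H} f(x + h)."""
    return np.array([np.mean([f[(x + h) % n] for h in H]) for x in range(n)])

f = np.arange(n, dtype=float)   # f(x) = x
pf = avg_H(f)

# The averaged function is constant on cosets x + H ...
is_invariant = all(np.isclose(pf[x], pf[(x + h) % n])
                   for x in range(n) for h in H)
# ... and averaging again leaves it unchanged.
is_idempotent = np.allclose(avg_H(pf), pf)

print(pf, is_invariant, is_idempotent)
```

The array `pf` is exactly the lift $\tilde{f}$ of a function on the three cosets of $H$: it repeats the same three values twice.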


slide-6
SLIDE 6

Convolution on G/H, H\G, H\G/K

We can also define convolutions of functions on coset spaces by taking the convolutions of their lifts, and then descending to the coset space determined by their invariance properties:

| $f$ | $g$ | $f * g$ |
|---|---|---|
| $f \in L^2(G)$ | $g \in L^2(G/H)$ | $f * g \in L^2(G/H)$ |
| $f \in L^2(G/H)$ | $g \in L^2(H\backslash G)$ | $f * g \in L^2(G)$ |
| $f \in L^2(G/H)$ | $g \in L^2(H\backslash G/K)$ | $f * g \in L^2(G/K)$ |

where $f * g := \tilde{f} * \tilde{g}$.


slide-7
SLIDE 7

Examples

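| $G$, $H$ | $G/H$ and functions |
|---|---|
| $G = S^1 = \{e^{i\theta}\}$, $H = \{\pm 1\}$ | $f = a_n \cos(n\theta)$ |
| $G = SO(3)$, $H = SO(2)$ | Spherical harmonics $Y_\ell^m(\theta, \phi)$ on $S^2$ |
| $G = S_n$, $H = S_k$, $K = S_{n-k}$ | Size-$k$ subsets of $\{1, \dots, n\}$ |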


slide-8
SLIDE 8

What is a CNN

A layer is considered as the space of maps

$$ L_l := L(X_l, V_l) = \{ f_l : X_l \to V_l \}, $$

associating each node in the index set $X_l$ to a vector in $V_l$.

Some group $G$ acts on the index set $X_l$ of each layer. The action can be transferred to $L(X_l, V_l)$:

$$ (g \cdot f)(x) = f(g^{-1}x). $$

$L(X_l, V_l)$ is a vector space with a linear group action. Therefore it is a representation of $G$.
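The claim that $(g \cdot f)(x) = f(g^{-1}x)$ defines a group action can be verified by brute force on a tiny case. The sketch below assumes $G = S_3$ acting on the index set $X = \{0, 1, 2\}$ with scalar-valued functions stored as arrays; the helper names `compose`, `inverse` and `act` are illustrative. It checks that acting by $gh$ equals acting by $h$ and then by $g$, which is exactly what the inverse in the formula guarantees.

```python
import itertools
import numpy as np

def compose(g, h):
    """(g h)(x) = g(h(x)); a permutation p is a tuple with p[x] = image of x."""
    return tuple(g[h[x]] for x in range(len(h)))

def inverse(g):
    inv = [0] * len(g)
    for x, gx in enumerate(g):
        inv[gx] = x
    return tuple(inv)

def act(g, f):
    """(g . f)(x) = f(g^{-1} x)."""
    g_inv = inverse(g)
    return np.array([f[g_inv[x]] for x in range(len(f))])

S3 = list(itertools.permutations(range(3)))
f = np.array([1.0, 2.0, 3.0])

# (g h) . f == g . (h . f) for every pair of group elements.
is_homomorphism = all(
    np.array_equal(act(compose(g, h), f), act(g, act(h, f)))
    for g in S3 for h in S3
)
print(is_homomorphism)
```

Since `act(g, ·)` only reorders entries, it is also linear in `f`, so each group element acts as a (permutation) matrix on the layer.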


slide-9
SLIDE 9

What is a CNN

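Between each pair of layers $L_{l-1}$ and $L_l$ there is

  • an (affine) linear map $\phi_l : L_{l-1} \to L_l$,
  • a nonlinearity $\sigma_l : V_l \to V_l$.

An MFF-NN (multilayer feed-forward neural network) is a sequence of such maps:

$$ L_0 \to L_1 \to L_2 \to \dots $$

$$ f_0 \mapsto (\sigma_1 \circ \phi_1)(f_0) \mapsto (\sigma_2 \circ \phi_2)(\sigma_1 \circ \phi_1)(f_0) \mapsto \dots $$

What are the $\phi_l$'s? $\phi_l$ can be an arbitrary linear function on $L_{l-1}$ (fully connected):

$$ \phi_l : f_{l-1} \mapsto \int_{X_{l-1}} w_l(y, x)\, f_{l-1}(y)\,dy, $$

where the weights $w_l(y, x)$ are a function on $X_{l-1} \times X_l$ learned through back-propagation. In particular, $w_l$ can be a convolution kernel $\chi_l(xy^{-1})$.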


slide-10
SLIDE 10

Equivariance and G-CNN

We require the index sets to be $X_i = G/H_i$ for some closed subgroup $H_i \subset G$. We also require the map $\phi_l : L(X_{l-1}, V_{l-1}) \to L(X_l, V_l)$ to be $G$-equivariant:

$$ g \circ (\phi_l(f)) = \phi_l(g \circ f). $$

An MFF-NN is called a G-CNN if the index sets are $X_i = G/H_i$ and the linear maps $\phi_l : L(X_{l-1}, V_{l-1}) \to L(X_l, V_l)$ are convolutions

$$ \phi_l(f_{l-1}) = f_{l-1} * \chi_l $$

for some filter $\chi_l$ on $H_{l-1}\backslash G/H_l$ with values in $V_{l-1} \times V_l$, or more correctly, $V_{l-1}^* \otimes V_l$.
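The equivariance condition $g \circ (\phi_l(f)) = \phi_l(g \circ f)$ is easy to test in a toy setting. The sketch below assumes $G = \mathbb{Z}_7$ with trivial subgroups $H_i$, so that $X_i = G$ and the action is a cyclic shift; `translate`, `conv` and the random dense matrix `W` are illustrative names and data. A convolution commutes with the shift, while a generic fully connected map does not.

```python
import numpy as np

n = 7
rng = np.random.default_rng(1)

def translate(g, f):
    """(g . f)(x) = f(x - g) on Z_n."""
    return np.roll(f, g)

def conv(f, chi):
    """(f * chi)(x) = sum_y f(x - y) chi(y)."""
    return np.array([sum(f[(x - y) % n] * chi[y] for y in range(n))
                     for x in range(n)])

f = rng.normal(size=n)
chi = rng.normal(size=n)      # convolution filter
W = rng.normal(size=(n, n))   # arbitrary fully connected weights
g = 3

conv_is_equivariant = np.allclose(conv(translate(g, f), chi),
                                  translate(g, conv(f, chi)))
dense_is_equivariant = np.allclose(W @ translate(g, f),
                                   translate(g, W @ f))
print(conv_is_equivariant, dense_is_equivariant)
```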


slide-11
SLIDE 11

Explicitly, set $d_l = \dim V_l$ and write down the coordinates of each function in $L_{l-1}$:

$$ (a_1, \dots, a_{d_{l-1}}) * \begin{pmatrix} \chi_{1,1} & \dots & \chi_{1,d_l} \\ \vdots & & \vdots \\ \chi_{d_{l-1},1} & \dots & \chi_{d_{l-1},d_l} \end{pmatrix} = \left( \sum_{i=1}^{d_{l-1}} a_i * \chi_{i,1},\ \dots,\ \sum_{i=1}^{d_{l-1}} a_i * \chi_{i,d_l} \right), $$

where $a_i : X_{l-1} \to \mathbb{C}$ and $\chi_{i,j} : H_{l-1}\backslash G/H_l \to \mathbb{C}$ are the coordinate functions of the layer and the filter, respectively.

Theorem

An MFF-NN with each layer indexed by $X_l = G/H_l$ is $G$-equivariant if and only if it is a G-CNN.

Proving properties of a function $X_l \to V$ can be reduced to proving properties of functions $X_l \to \mathbb{C}$.
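The coordinate formula says that output channel $j$ is $\sum_i a_i * \chi_{i,j}$. A minimal sketch of this multi-channel layer follows, assuming $G = \mathbb{Z}_5$ with trivial subgroups and real-valued channels; the sizes `d_in`, `d_out` and the function names are illustrative. It also confirms that the vector-valued layer inherits equivariance from the scalar convolutions, which is the reduction stated above.

```python
import numpy as np

n, d_in, d_out = 5, 2, 3
rng = np.random.default_rng(4)
a = rng.normal(size=(d_in, n))           # coordinate functions a_i
chi = rng.normal(size=(d_in, d_out, n))  # filter coordinates chi_{i,j}

def conv(f, k):
    """(f * k)(x) = sum_y f(x - y) k(y) on Z_n."""
    return np.array([sum(f[(x - y) % n] * k[y] for y in range(n))
                     for x in range(n)])

def layer(a, chi):
    """Output channel j is sum_i a_i * chi_{i,j}."""
    return np.array([sum(conv(a[i], chi[i, j]) for i in range(d_in))
                     for j in range(d_out)])

out = layer(a, chi)

# Shifting every input channel shifts every output channel the same way.
shifted_in = np.roll(a, 2, axis=1)
is_equivariant = np.allclose(layer(shifted_in, chi), np.roll(out, 2, axis=1))
print(out.shape, is_equivariant)
```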


slide-12
SLIDE 12

First Way to Look at Representations

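Intuitively, we can view a group representation $V$ as a specification of matrix representations for the elements $g \in G$.

Permutation group $S_3$: this representation is NOT irreducible.

$$ \begin{pmatrix} 1&0&0\\0&1&0\\0&0&1 \end{pmatrix},\ \begin{pmatrix} 0&1&0\\1&0&0\\0&0&1 \end{pmatrix},\ \begin{pmatrix} 1&0&0\\0&0&1\\0&1&0 \end{pmatrix},\ \begin{pmatrix} 0&0&1\\0&1&0\\1&0&0 \end{pmatrix},\ \begin{pmatrix} 0&1&0\\0&0&1\\1&0&0 \end{pmatrix},\ \begin{pmatrix} 0&0&1\\1&0&0\\0&1&0 \end{pmatrix} $$

Circle $S^1 \cong SO(2)$: the irreducibles are the $1 \times 1$ matrices $\left( e^{im\theta} \right)$ with $m \in \mathbb{Z}$. The rotation matrix

$$ \begin{pmatrix} \cos m\theta & \sin m\theta \\ -\sin m\theta & \cos m\theta \end{pmatrix} \sim \begin{pmatrix} e^{im\theta} & 0 \\ 0 & e^{-im\theta} \end{pmatrix} $$

splits as two 1-dim matrices.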
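The splitting of the $2 \times 2$ rotation matrix can be checked directly. The sketch below uses one fixed change of basis `U` (columns $(1, i)/\sqrt{2}$ and $(1, -i)/\sqrt{2}$, chosen here for illustration) that diagonalizes the rotation for every angle at once, which is what simultaneous block-diagonalization means.

```python
import numpy as np

def rotation(m, theta):
    c, s = np.cos(m * theta), np.sin(m * theta)
    return np.array([[c, s], [-s, c]])

# One unitary basis change, independent of theta.
U = np.array([[1, 1], [1j, -1j]]) / np.sqrt(2)

def diagonalized(m, theta):
    """U^dagger R U, expected to be diag(e^{i m theta}, e^{-i m theta})."""
    return U.conj().T @ rotation(m, theta) @ U

splits = all(
    np.allclose(diagonalized(m, t),
                np.diag([np.exp(1j * m * t), np.exp(-1j * m * t)]))
    for m in (1, 2, 3) for t in (0.3, 1.1, 2.5)
)
print(splits)
```

The basis change needs complex entries, which is one reason the slides allow complex matrices: over the real numbers this 2-dim representation does not split.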


slide-13
SLIDE 13

First Way to Look at Representations

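Rotation group $SO(3)$ with $(\psi, \theta, \phi)$ Euler angle coordinates. There is one such irreducible matrix in each dimension (dimensions 2, 3 and 4 are shown; strictly, the even-dimensional ones are representations of the double cover $SU(2)$):

$$ \begin{pmatrix} \cos\frac{\theta}{2}\, e^{-\frac{\psi+\phi}{2} i} & -\sin\frac{\theta}{2}\, e^{\frac{\phi-\psi}{2} i} \\ \sin\frac{\theta}{2}\, e^{-\frac{\phi-\psi}{2} i} & \cos\frac{\theta}{2}\, e^{\frac{\psi+\phi}{2} i} \end{pmatrix} $$

$$ \begin{pmatrix} \cos^2\frac{\theta}{2}\, e^{-i(\psi+\phi)} & -\frac{e^{-i\psi}\sin\theta}{\sqrt{2}} & \sin^2\frac{\theta}{2}\, e^{i(\phi-\psi)} \\ \frac{e^{-i\phi}\sin\theta}{\sqrt{2}} & \cos\theta & -\frac{e^{i\phi}\sin\theta}{\sqrt{2}} \\ \sin^2\frac{\theta}{2}\, e^{-i(\phi-\psi)} & \frac{e^{i\psi}\sin\theta}{\sqrt{2}} & \cos^2\frac{\theta}{2}\, e^{i(\psi+\phi)} \end{pmatrix} $$

$$ \begin{pmatrix} \cos^3\frac{\theta}{2}\, e^{-\frac{3}{2}i(\psi+\phi)} & -\frac{\sqrt{3}}{4}\sin^2\theta\csc\frac{\theta}{2}\, e^{-\frac{1}{2}i(3\psi+\phi)} & \frac{\sqrt{3}}{2}\sin\frac{\theta}{2}\sin\theta\, e^{\frac{1}{2}i(\phi-3\psi)} & -\sin^3\frac{\theta}{2}\, e^{\frac{3}{2}i(\phi-\psi)} \\ \frac{\sqrt{3}}{4}\sin^2\theta\csc\frac{\theta}{2}\, e^{-\frac{1}{2}i(\psi+3\phi)} & \frac{1}{2}\cos\frac{\theta}{2}(3\cos\theta-1)\, e^{-\frac{1}{2}i(\psi+\phi)} & -\frac{1}{2}\sin\frac{\theta}{2}(3\cos\theta+1)\, e^{\frac{1}{2}i(\phi-\psi)} & \frac{\sqrt{3}}{2}\sin\frac{\theta}{2}\sin\theta\, e^{\frac{1}{2}i(3\phi-\psi)} \\ \frac{\sqrt{3}}{2}\sin\frac{\theta}{2}\sin\theta\, e^{-\frac{1}{2}i(3\phi-\psi)} & \frac{1}{2}\sin\frac{\theta}{2}(3\cos\theta+1)\, e^{-\frac{1}{2}i(\phi-\psi)} & \frac{1}{2}\cos\frac{\theta}{2}(3\cos\theta-1)\, e^{\frac{1}{2}i(\psi+\phi)} & -\frac{\sqrt{3}}{4}\sin^2\theta\csc\frac{\theta}{2}\, e^{\frac{1}{2}i(\psi+3\phi)} \\ \sin^3\frac{\theta}{2}\, e^{-\frac{3}{2}i(\phi-\psi)} & \frac{\sqrt{3}}{2}\sin\frac{\theta}{2}\sin\theta\, e^{-\frac{1}{2}i(\phi-3\psi)} & \frac{\sqrt{3}}{4}\sin^2\theta\csc\frac{\theta}{2}\, e^{\frac{1}{2}i(3\psi+\phi)} & \cos^3\frac{\theta}{2}\, e^{\frac{3}{2}i(\psi+\phi)} \end{pmatrix} $$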


slide-14
SLIDE 14

Second Way to Look at Representations

We can look at representations as a vector space $V$ with a $G$-action.

Representations $V$, $W$ are said to be isomorphic if it is possible to choose a basis for each space such that the $G$-action matrices are the same.

Space of $G$-equivariant maps: $f \in \hom_G(V, W)$ means $f(gv) = g(f(v))$.

Schur's lemma: if $V$, $W$ are irreducible, then

$$ \hom_G(V, W) = \begin{cases} 0 & \text{if } V \not\cong W \\ \mathbb{C} I & \text{if } V \cong W \end{cases} $$

In general, if we can break $V$, $W$ into irreducibles $V = \oplus V_i$ and $W = \oplus W_j$, then

$$ \hom_G(V, W) = \oplus_i \oplus_j \hom_G(V_i, W_j). $$


slide-15
SLIDE 15

Why does Schur's lemma make sense?

For the circle group $SO(2)$, the irreducible representations are 1-dimensional, each with an abstract basis vector $v_n$ on which $G$ acts by

$$ v_n \mapsto e^{in\theta} v_n. $$

Consider a $G$-equivariant map $\phi_l$ acting on the basis elements:

$$ \phi_l(v_n) = \sum_m b_{m,n} v_m. $$

Since the map $\phi_l$ is $G$-equivariant, we should have

$$ \phi_l(e^{in\theta} v_n) = \sum_m b_{m,n} e^{im\theta} v_m. $$

But this map is also linear, so

$$ \phi_l(e^{in\theta} v_n) = \sum_m b_{m,n} e^{in\theta} v_m. $$

Thus

$$ \sum_m b_{m,n} e^{i(m-n)\theta} v_m = \sum_m b_{m,n} v_m \quad \text{for every } \theta, $$

so $b_{m,n}$ is required to vanish except for $m = n$.
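The same argument can be run numerically on a truncated set of frequencies. The sketch below assumes the five frequencies $n = -2, \dots, 2$ and a random real matrix `B` holding the coefficients $b_{m,n}$; these choices and the name `rho` are illustrative. Entry $(m, n)$ of $B\rho(\theta) - \rho(\theta)B$ equals $b_{m,n}(e^{in\theta} - e^{im\theta})$, so only a diagonal `B` can commute with the action for every $\theta$.

```python
import numpy as np

freqs = np.arange(-2, 3)  # frequencies n = -2, ..., 2 (truncation assumed)

def rho(theta):
    """Action of the rotation by theta: v_n -> e^{i n theta} v_n."""
    return np.diag(np.exp(1j * freqs * theta))

rng = np.random.default_rng(2)
B = rng.normal(size=(5, 5))  # coefficients b_{m,n} of an arbitrary linear map
theta = 0.7

commutator = B @ rho(theta) - rho(theta) @ B
phases = np.exp(1j * freqs * theta)
# Entry (m, n) should be b_{m,n} * (e^{i n theta} - e^{i m theta}).
expected = B * (phases[None, :] - phases[:, None])
formula_holds = np.allclose(commutator, expected)

# Keeping only the diagonal b_{n,n} gives an equivariant map.
D = np.diag(np.diag(B))
diagonal_commutes = np.allclose(D @ rho(theta), rho(theta) @ D)
print(formula_holds, diagonal_commutes)
```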


slide-16
SLIDE 16

Third Way to Look at Representations

We can look at representations as functions on $G$.

Peter-Weyl Theorem: for any compact group $G$,

$$ L^2(G) = \widehat{\bigoplus}_{\tau \in \mathrm{Irrep}(G)} \mathrm{Mat}(\tau), $$

where $\mathrm{Mat}(\tau)$ is the space generated by the matrix coefficients of an irreducible representation $\tau$. $\mathrm{Mat}(\tau)$ is isomorphic to $\dim \tau$ copies of $\tau$.

The set $\mathrm{Irrep}(G)$ is described combinatorially. For example, for groups like $S_n$ and $SU(n)$, the set is in one-to-one correspondence with integer partitions $n = (n_1, \dots, n_k)$.

The Fourier basis of $G$ consists of these matrix coefficients.

To prove the main theorem, it suffices to prove that all $G$-equivariant maps in $\hom_G(L_{l-1}, L_l)$ are convolution operators.


slide-17
SLIDE 17

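We sketch the proof for the circle group $SO(2)$.

The functions on $G$ are Fourier series

$$ f = \sum_{n \in \mathbb{Z}} a_n e^{in\theta}. $$

If there is a $G$-equivariant map $\phi_l$ mapping $f$ to $\phi_l(f)$, assume that on the basis elements

$$ \phi_l(e^{in\theta}) = \sum_m b_{m,n} e^{im\theta}. $$

By Schur's lemma, this map is $G$-equivariant if and only if $\phi_l$ is of the form

$$ \phi_l(e^{in\theta}) = b_{n,n} e^{in\theta}. $$

Hence

$$ \phi_l(f) = \sum_{n \in \mathbb{Z}} a_n b_{n,n} e^{in\theta}, $$

which by the convolution theorem is equal to $f * g_{\phi_l}$, where

$$ g_{\phi_l} = \sum_{n \in \mathbb{Z}} \phi_l(e^{in\theta}). $$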
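A finite analogue of this proof sketch can be run end to end. The sketch below assumes $G = \mathbb{Z}_6$ in place of the circle and builds an equivariant linear map `phi` by averaging an arbitrary matrix `W` over the group; this construction and all names are illustrative, not part of the slides. It then checks that `phi` is a convolution with the filter obtained by applying `phi` to the delta function at the identity, and that it acts diagonally on Fourier coefficients, mirroring $a_n \mapsto a_n b_{n,n}$.

```python
import numpy as np

n = 6
rng = np.random.default_rng(3)

def shift_matrix(g):
    """Matrix of the translation (T_g f)(x) = f(x - g) on Z_n."""
    return np.roll(np.eye(n), g, axis=0)

def conv(f, chi):
    """(f * chi)(x) = sum_y f(x - y) chi(y)."""
    return np.array([sum(f[(x - y) % n] * chi[y] for y in range(n))
                     for x in range(n)])

# Average an arbitrary linear map over the group to make it equivariant.
W = rng.normal(size=(n, n))
phi = sum(shift_matrix(-g) @ W @ shift_matrix(g) for g in range(n)) / n

is_equivariant = all(np.allclose(phi @ shift_matrix(g), shift_matrix(g) @ phi)
                     for g in range(n))

# The filter is the response of phi to the delta function at the identity.
chi = phi[:, 0]
f = rng.normal(size=n)
is_convolution = np.allclose(phi @ f, conv(f, chi))

# In the Fourier basis phi multiplies each coefficient by a scalar.
is_diagonal_in_fourier = np.allclose(np.fft.fft(phi @ f),
                                     np.fft.fft(f) * np.fft.fft(chi))
print(is_equivariant, is_convolution, is_diagonal_in_fourier)
```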


slide-18
SLIDE 18

Message Passing NN as G-CNN

Message functions $M_t$ and vertex update functions $U_t$. States of vertices $h_v^t$, messages $m_v^{t+1}$.

Message passing:

$$ m_v^{t+1} = \sum_{w \in N(v)} M_t(h_v^t, h_w^t, e_{vw}), \qquad h_v^{t+1} = U_t(h_v^t, m_v^{t+1}), $$

with a readout function $\hat{y} = R(\{h_v^T\})$.

The message update is invariant under permutations of $N(v)$. It is a function on $S_n/(S_k \times S_{n-k})$, where $n = |V|$ and $k = |N(v)|$.

The coset space $S_n/(S_k \times S_{n-k})$ encodes all $k$-subsets of vertices, not only subsets of contiguous vertices. This is not a problem, because we can remove the redundancy by setting the message functions to 0 on those vertices.
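The permutation invariance of the message update comes from the sum over $N(v)$. The sketch below illustrates this with a made-up message function; `message`, its `tanh` form, and the feature sizes are hypothetical stand-ins for $M_t$, not anything specified in the slides. Reordering the neighbors together with their edge features leaves the aggregated message unchanged.

```python
import numpy as np

def message(h_v, h_w, e_vw):
    """Hypothetical stand-in for the message function M_t."""
    return np.tanh(h_v + 2.0 * h_w + e_vw)

def aggregate(h_v, neighbor_states, edge_features):
    """m_v = sum over neighbors w of M_t(h_v, h_w, e_vw)."""
    return sum(message(h_v, h_w, e)
               for h_w, e in zip(neighbor_states, edge_features))

rng = np.random.default_rng(5)
h_v = rng.normal(size=4)                  # state of vertex v
neighbor_states = rng.normal(size=(3, 4)) # states of the k = 3 neighbors
edge_features = rng.normal(size=(3, 4))   # one feature vector per edge

perm = [2, 0, 1]  # a reordering of the neighbors
m_original = aggregate(h_v, neighbor_states, edge_features)
m_permuted = aggregate(h_v, neighbor_states[perm], edge_features[perm])

is_invariant = np.allclose(m_original, m_permuted)
print(is_invariant)
```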


slide-19
SLIDE 19

Thank You For Listening
