


Statistical Estimation in the Presence of Group Actions

Alex Wein MIT Mathematics

1 / 39


In memoriam

Amelia Perry 1991 – 2018

2 / 39

My research interests

◮ Statistical and computational limits of average-case inference problems (signal planted in random noise)
  ◮ Community detection (stochastic block model)
  ◮ Spiked matrix/tensor problems
  ◮ Synchronization / group actions (today)
◮ Connections to...
  ◮ Statistical physics
    ◮ Phase transitions: easy, hard, impossible
  ◮ Algebra
    ◮ Group theory, representation theory, invariant theory
◮ Today: problems involving group actions
  ◮ A meeting point of statistics, algebra, signal processing, computer science, statistical physics, ...

3 / 39

Motivation: cryo-electron microscopy (cryo-EM)

Image credit: [Singer, Shkolnisky '11]

◮ Biological imaging method: determine the structure of a molecule
◮ 2017 Nobel Prize in Chemistry
◮ Given many noisy 2D images of a 3D molecule, taken from different unknown angles
◮ Goal is to reconstruct the 3D structure of the molecule
◮ Group action by SO(3) (rotations in 3D)

4 / 39

Other examples

Other problems involving random group actions:

◮ Image registration
  Image credit: [Bandeira, PhD thesis '15]
  Group: SO(2) (2D rotations)
◮ Multi-reference alignment
  Image credit: Jonathan Weed
  Group: Z/p (cyclic shifts)
◮ Applications: computer vision, radar, structural biology, robotics, geology, paleontology, ...
◮ Methods used in practice often lack provable guarantees...

5 / 39


Part I: Synchronization

6 / 39

Synchronization problems

The synchronization approach [1]: learn the group elements

◮ Fix a group G
  ◮ e.g. SO(3)
◮ g ∈ G^n: vector of unknown group elements
  ◮ e.g. the rotation of each image
◮ Given pairwise information: for each i < j, a noisy measurement of g_i g_j^{-1}
  ◮ e.g. by comparing two images
◮ Goal: recover g up to global right-multiplication
  ◮ can't distinguish (g_1, . . . , g_n) from (g_1 h, . . . , g_n h)

In cryo-EM: once you learn the rotations, it is possible to reconstruct a de-noised model of the molecule [2]

[1] Singer '11  [2] Singer, Shkolnisky '11

7 / 39

A simple model: Gaussian Z/2 synchronization

◮ G = Z/2 = {±1}
◮ True signal x ∈ {±1}^n (vector of group elements)
◮ For each i, j observe x_i x_j + N(0, σ²)
◮ Specifically, observe the n × n matrix Y = (λ/n) xx⊤ (signal) + (1/√n) W (noise)
  ◮ λ ≥ 0: signal-to-noise parameter
  ◮ W: random noise matrix, symmetric with entries N(0, 1)
  ◮ Y_ij is a noisy measurement of x_i x_j (same/diff)
  ◮ Normalization: MMSE is a constant (depending on λ)

This is a spiked Wigner model: in general x_i ∼ P (some prior)

Statistical physics makes extremely precise (non-rigorous) predictions about this type of problem
◮ Often later proved correct

8 / 39
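As a concrete illustration of this model (a minimal sketch; the parameter values and the sanity check below are illustrative choices, not from the slides), one can generate a spiked Wigner instance and verify that the sign of each off-diagonal entry Y_ij is a weakly informative guess for x_i x_j:

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 500, 2.0

x = rng.choice([-1.0, 1.0], size=n)               # true signal in {±1}^n
A = rng.normal(size=(n, n))
W = (A + A.T) / np.sqrt(2)                        # symmetric; off-diagonal entries N(0, 1)
Y = (lam / n) * np.outer(x, x) + W / np.sqrt(n)   # observed matrix

# Each off-diagonal Y_ij is a (very) noisy measurement of x_i x_j:
# its sign agrees with x_i x_j only slightly more than half the time.
off = ~np.eye(n, dtype=bool)
agree = np.mean(np.sign(Y[off]) == np.outer(x, x)[off])
```

No single entry is informative on its own; the point of the normalization is that all ~n² entries together carry a constant amount of information per coordinate of x.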


A simple model: Gaussian Z/2 synchronization

◮ G = Z/2 = {±1}
◮ True signal x ∈ {±1}^n (vector of group elements)
◮ Observe the n × n matrix Y = (λ/n) xx⊤ (signal) + (1/√n) W (noise)

Image credit: [Deshpande, Abbe, Montanari '15]

9 / 39

Statistical physics and inference

What does statistical physics have to do with Bayesian inference?

In inference, we observe Y = (λ/n) xx⊤ + (1/√n) W and want to infer x

Posterior distribution: Pr[x|Y] ∝ exp(λ x⊤ Y x)

In physics, this is called a Boltzmann/Gibbs distribution: Pr[x] ∝ exp(−β H(x))

◮ Energy ("Hamiltonian"): H(x) = −x⊤ Y x
◮ Inverse temperature: β = λ

So the posterior distribution of Bayesian inference obeys the same equations as a disordered physical system (e.g. a magnet or a spin glass)

10 / 39
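For tiny n this correspondence can be checked directly by brute force: treat exp(λ x'⊤ Y x') as a Boltzmann weight and compute posterior averages by enumerating all 2^n spin configurations (a sketch; the instance size and the pairwise-product summary are illustrative choices):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
n, lam = 8, 3.0
x = rng.choice([-1.0, 1.0], size=n)
A = rng.normal(size=(n, n))
Y = (lam / n) * np.outer(x, x) + (A + A.T) / np.sqrt(2 * n)

# Boltzmann weights with H(x') = -x'^T Y x' and inverse temperature beta = lam
configs = np.array(list(product([-1.0, 1.0], repeat=n)))      # all 2^n spin vectors
energy = -np.einsum('ci,ij,cj->c', configs, Y, configs)       # H(x') for each config
logw = -lam * energy
w = np.exp(logw - logw.max())
w /= w.sum()                                                  # exact posterior over {±1}^n

# Posterior mean of the pairwise products x_i x_j
# (well-defined despite the global x -> -x symmetry of the posterior)
M = np.einsum('c,ci,cj->ij', w, configs, configs)
```

At this strong signal level M is close to xx⊤, i.e. the Gibbs measure concentrates near the planted configuration and its global flip.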

BP and AMP

"Axiom" from statistical physics: the best algorithm for every* problem is BP (belief propagation) [1]

◮ Each unknown x_i is a "node"
◮ Each observation ("interaction") Y_ij is an "edge"
  ◮ In our case, a complete graph
◮ Nodes iteratively pass "messages" or "beliefs" to each other along edges, and then update their own beliefs
◮ Hard to analyze

In our case (since interactions are "dense"), we can use a simplification of BP called AMP (approximate message passing) [2]

◮ Easy/possible to analyze
◮ Provably optimal mean squared error for many problems

[1] Pearl '82  [2] Donoho, Maleki, Montanari '09

11 / 39

AMP for Z/2 synchronization

Y = (λ/n) xx⊤ + (1/√n) W,  x ∈ {±1}^n

AMP algorithm:

◮ State v ∈ R^n: estimate for x
◮ Initialize v to a small random vector
◮ Repeat:
  1. Power iteration: v ← Yv
  2. Onsager correction: v ← v + [Onsager term]
  3. Entrywise soft projection: v_i ← tanh(λ v_i) (for all i)
    ◮ Resulting values lie in [−1, 1]

12 / 39
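A minimal implementation of this loop (a sketch: the slide leaves the correction abstract as "[Onsager term]"; below I fill it in with the standard spiked-Wigner Onsager coefficient for the tanh denoiser, which is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 2000, 2.0
x = rng.choice([-1.0, 1.0], size=n)
A = rng.normal(size=(n, n))
Y = (lam / n) * np.outer(x, x) + (A + A.T) / np.sqrt(2 * n)

v = 0.01 * rng.normal(size=n)          # small random initialization
v_prev, b = np.zeros(n), 0.0           # previous iterate and Onsager coefficient
for _ in range(30):
    m = Y @ v - b * v_prev             # power iteration plus Onsager correction
    v_prev, v = v, np.tanh(lam * m)    # entrywise soft projection into [-1, 1]
    b = lam * np.mean(1.0 - v ** 2)    # standard coefficient for the next correction
overlap = abs(v @ x) / (np.linalg.norm(v) * np.sqrt(n))
```

For λ above the threshold (here λ = 2), the iterate quickly develops a macroscopic overlap with ±x; without the correction term the iteration behaves measurably worse.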


AMP is optimal

Y = (λ/n) xx⊤ + (1/√n) W,  x ∈ {±1}^n

For Z/2 synchronization, AMP is provably optimal. [Deshpande, Abbe, Montanari '15]

13 / 39

Free energy landscapes

What do physics predictions look like?

f(γ) = γ²/(4λ²) + γ/2 − E_{z∼N(0,1)} log(2 cosh(γ + √γ z))

◮ x-axis γ: correlation with the true signal (related to MSE)
◮ y-axis f: free energy, AMP's "objective function" (to be minimized)
◮ AMP: gradient descent starting from γ = 0 (left side)
◮ STAT (statistical): global minimum
◮ So the landscape yields both the computational and the statistical MSE for each λ

[Lesieur, Krzakala, Zdeborová '15]

14 / 39
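This landscape picture is easy to reproduce numerically (a sketch: I use a standard replica-symmetric potential built around the E_{z∼N(0,1)} log(2 cosh(γ + √γ z)) entropy term that appears on the slide; the exact constants in the deck's formula may differ):

```python
import numpy as np

# Gauss–Hermite quadrature for expectations over z ~ N(0, 1)
nodes, wts = np.polynomial.hermite.hermgauss(80)
z, w = np.sqrt(2.0) * nodes, wts / np.sqrt(np.pi)

def free_energy(gamma, lam):
    # Assumed replica-symmetric potential: quadratic terms minus the
    # E log(2 cosh(gamma + sqrt(gamma) z)) entropy term
    ent = np.sum(w * np.log(2.0 * np.cosh(gamma + np.sqrt(gamma) * z)))
    return gamma ** 2 / (4.0 * lam ** 2) + gamma / 2.0 - ent

def argmin_gamma(lam):
    # Locate the global minimizer on a grid (the "STAT" point of the landscape)
    grid = np.linspace(0.0, 8.0, 1601)
    return grid[int(np.argmin([free_energy(g, lam) for g in grid]))]
```

With this potential the stationary points satisfy γ = λ² E tanh(γ + √γ z), the state-evolution fixed-point equation: for λ > 1 the minimum sits at a strictly positive γ (non-trivial estimation), while for λ < 1 it stays at γ = 0.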

Our contributions

Joint work with Amelia Perry, Afonso Bandeira, Ankur Moitra

◮ Using representation theory, we define a very general Gaussian observation model for synchronization over any compact group
  ◮ Significantly generalizes the Z/2 case
◮ We give a precise analysis of the statistical and computational limits of this model
  ◮ Uses non-rigorous (but well-established) ideas from statistical physics
  ◮ Methods proven correct in related settings
◮ Includes an AMP algorithm which we believe is optimal among all polynomial-time algorithms
◮ Also some rigorous statistical lower and upper bounds

Perry, W., Bandeira, Moitra, Message-passing algorithms for synchronization problems over compact groups, to appear in CPAM
Perry, W., Bandeira, Moitra, Optimality and Sub-optimality of PCA for Spiked Random Matrices and Synchronization, part I to appear in Ann. Stat.

15 / 39

Multi-frequency U(1) synchronization

◮ G = U(1) = {z ∈ C : |z| = 1} (angles)
◮ True signal x ∈ U(1)^n
◮ W^(k): complex Gaussian noise (GUE)
◮ Observe
  Y^(1) = (λ_1/n) x x* + (1/√n) W^(1)
  Y^(2) = (λ_2/n) x² (x²)* + (1/√n) W^(2)
  · · ·
  Y^(K) = (λ_K/n) x^K (x^K)* + (1/√n) W^(K)
  where x^k means the entry-wise kth power
◮ This model has information on different frequencies
◮ Challenge: how to synthesize information across frequencies?

16 / 39

AMP for U(1) synchronization

Y^(k) = (λ_k/n) x^k (x^k)* + (1/√n) W^(k)  for k = 1, . . . , K

Algorithm's state: v^(k) ∈ C^n for each frequency k

◮ v^(k) is an estimate of (x_1^k, . . . , x_n^k)

AMP algorithm:

◮ Power iteration (separately on each frequency): v^(k) ← Y^(k) v^(k)
◮ "Soft projection" (separately on each index i): v_i^(·) ← F(v_i^(·))
  ◮ This synthesizes the frequencies in a non-trivial way
◮ Onsager correction term

Analysis of AMP:

◮ Exact expression for AMP's MSE (as n → ∞) as a function of λ_1, . . . , λ_K
◮ Also, an exact expression for the statistically optimal MSE

17 / 39
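The slides leave the soft projection F abstract. One plausible concrete form (an illustrative sketch of the idea, not the paper's exact denoiser, and omitting the precise per-frequency scaling constants): interpret the per-frequency beliefs as a log-likelihood over the angle θ, form the resulting posterior on a discretized circle, and output its Fourier coefficients E[e^{ikθ}]:

```python
import numpy as np

def soft_project(beliefs, weights, n_grid=256):
    """Combine per-frequency beliefs for one entry into posterior-mean
    estimates of (e^{i theta}, e^{2i theta}, ..., e^{iK theta}).

    beliefs: complex array; beliefs[k-1] is the current estimate of x_i^k
    weights: per-frequency confidence weights (assumed given)
    """
    K = len(beliefs)
    theta = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    # Log-likelihood of the angle: alignment of e^{ik theta} with each belief
    ll = np.zeros(n_grid)
    for k in range(1, K + 1):
        ll += weights[k - 1] * np.real(np.conj(beliefs[k - 1]) * np.exp(1j * k * theta))
    p = np.exp(ll - ll.max())
    p /= p.sum()                       # discretized posterior over the circle
    return np.array([np.sum(p * np.exp(1j * k * theta)) for k in range(1, K + 1)])

# A confident belief on frequency 1 (pointing at angle 0) also sharpens
# the frequency-2 estimate, even with no frequency-2 information at all:
out = soft_project(np.array([5.0 + 0j, 0.0 + 0j]), [1.0, 1.0])
```

This is where the frequencies talk to each other: the posterior over θ is shared, so evidence at any one frequency moves the estimates at all of them.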

Results for U(1) synchronization

Y^(k) = (λ_k/n) x^k (x^k)* + (1/√n) W^(k)  for k = 1, . . . , K

◮ Single frequency: given Y^(k), one can non-trivially estimate x^k iff λ_k > 1
◮ Information-theoretically, with λ_1 = · · · = λ_K = λ, need λ ∼ √(log K / K) (for large K)
◮ But AMP (and conjecturally, any poly-time algorithm) requires λ_k > 1 for some k
◮ Computationally hard to synthesize sub-critical (λ ≤ 1) frequencies
◮ But once above the λ = 1 threshold, adding frequencies helps reduce the MSE of AMP

18 / 39
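The single-frequency λ_k > 1 threshold is the familiar spectral (BBP-type) transition for a spiked random matrix, and it is easy to observe numerically (a sketch; the instance size and the cutoffs in the check are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

def top_overlap(lam):
    """Squared correlation between the top eigenvector of a single-frequency
    observation Y = (lam/n) x x* + (1/sqrt(n)) W and the planted signal x."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    x = np.exp(1j * theta)                        # x in U(1)^n
    A = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
    W = (A + A.conj().T) / np.sqrt(2)             # GUE-like Hermitian noise
    Y = (lam / n) * np.outer(x, x.conj()) + W / np.sqrt(n)
    v = np.linalg.eigh(Y)[1][:, -1]               # top (unit) eigenvector
    return np.abs(np.vdot(v, x)) ** 2 / n

high = top_overlap(2.0)   # above threshold: macroscopic correlation
low = top_overlap(0.5)    # below threshold: the eigenvector carries ~no signal
```

Above the transition the squared overlap is of constant order (roughly 1 − 1/λ² for large n); below it, the top eigenvector is essentially orthogonal to the signal.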


Results for U(1) synchronization

Solid: AMP (n = 100), one curve per number of frequencies K. Dotted: theoretical prediction (n → ∞). Same λ on each frequency.

Image credit: Perry, W., Bandeira, Moitra, Message-passing algorithms for synchronization problems over compact groups, to appear in CPAM

19 / 39

General groups

All of the above extends to any compact group

◮ E.g. Any finite group; SO(3)

How to even define the model?

◮ Need to add “noise” to a group element gig−1 j

Answer: Use representation theory to represent a group element as a matrix (and then add Gaussian noise)

◮ A representation ρ of G is a way to assign a matrix ρ(g) to each g ∈ G

◮ Formally, a homomorphism ρ : G → GL(C^d) = {d × d invertible matrices}

Frequencies are replaced by irreducible representations of G

◮ Fourier theory for functions G → C

For U(1), there is one 1-dimensional irreducible representation for each k ∈ Z: ρ_k(g) = g^k

20 / 39
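To make the U(1) case concrete, here is a minimal numeric sketch (my own illustration, not from the talk): it checks that ρ_k(g) = g^k is a homomorphism, and shows what "adding noise to a group element" means once the element is represented as a (here 1 × 1) matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

def rho(k, g):
    """Irreducible representation of U(1): rho_k(g) = g^k, a 1x1 complex matrix."""
    return g ** k

g, h = np.exp(1j * 0.7), np.exp(1j * 2.1)   # two elements of U(1)
for k in range(-3, 4):
    # homomorphism property: rho_k(g h) = rho_k(g) rho_k(h)
    assert np.isclose(rho(k, g * h), rho(k, g) * rho(k, h))

# "noisy group element": observe rho_1(g) plus complex Gaussian noise
sigma = 0.1
observed = rho(1, g) + sigma * (rng.standard_normal() + 1j * rng.standard_normal())
print(abs(observed - rho(1, g)))  # on the order of sigma
```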

slide-96
SLIDE 96

Part II: Orbit Recovery

21 / 39

slide-101
SLIDE 101

Back to cryo-EM

Image credit: [Singer, Shkolnisky ’11]

Synchronization is not the ideal model for cryo-EM

◮ The synchronization approach disregards the underlying signal (the molecule)

◮ Our Gaussian synchronization model assumes independent noise on each pair i, j of images, whereas in reality there is independent noise on each image

◮ For high noise, it is impossible to reliably recover the rotations

◮ So we should not try to estimate the rotations!

22 / 39

slide-106
SLIDE 106

Orbit recovery problem

Let G be a compact group acting linearly and continuously on a finite-dimensional real vector space V = Rp.

◮ Compact: e.g. any finite group, SO(2), SO(3)

◮ Linear: ρ : G → GL(V) = {invertible p × p matrices} (a homomorphism)

◮ Action: g · x = ρ(g)x for g ∈ G, x ∈ V

◮ Continuous: ρ is continuous

23 / 39
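As a sanity check on these definitions, the sketch below (my own, not the paper's code) realizes the cyclic-shift action of G = Z/p on R^p as permutation matrices ρ(g), and verifies both the homomorphism property and that g · x = ρ(g)x is the cyclic shift.

```python
import numpy as np

p = 5

def rho(g):
    """p x p permutation matrix for the shift-by-g action of G = Z/p on R^p."""
    M = np.zeros((p, p))
    for j in range(p):
        M[j, (j - g) % p] = 1.0
    return M

x = np.arange(p, dtype=float)
g1, g2 = 2, 4
# rho is a homomorphism into GL(R^p): rho(g1 + g2) = rho(g1) rho(g2)
assert np.allclose(rho((g1 + g2) % p), rho(g1) @ rho(g2))
# the action g . x = rho(g) x is exactly the cyclic shift
assert np.allclose(rho(g1) @ x, np.roll(x, g1))
print("Z/p acts linearly on R^p via permutation matrices")
```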

slide-110
SLIDE 110

Orbit recovery problem

Let G be a compact group acting linearly and continuously on a finite-dimensional real vector space V = R^p.

Unknown signal x ∈ V (e.g. the molecule)

For i = 1, . . . , n observe y_i = g_i · x + ε_i where. . .

◮ g_i ∼ Haar(G) (the “uniform distribution” on G)

◮ ε_i ∼ N(0, σ^2 I_p) (noise)

Goal: Recover some x̃ in the orbit {g · x : g ∈ G} of x

24 / 39
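The observation model is easy to simulate. A minimal sampler (my own sketch; the function name is mine), specialized to G = Z/p acting by cyclic shifts so that Haar measure is just the uniform distribution on {0, . . . , p − 1}:

```python
import numpy as np

def sample_orbit_recovery(x, n, sigma, rng):
    """Draw y_i = g_i . x + eps_i for G = Z/p acting on R^p by cyclic shifts."""
    p = len(x)
    shifts = rng.integers(0, p, size=n)            # g_i ~ Haar(G) = uniform on {0,...,p-1}
    noise = sigma * rng.standard_normal((n, p))    # eps_i ~ N(0, sigma^2 I_p)
    return np.stack([np.roll(x, int(g)) for g in shifts]) + noise

rng = np.random.default_rng(0)
x = np.array([1.0, -2.0, 0.5, 3.0])
y = sample_orbit_recovery(x, n=1000, sigma=0.1, rng=rng)
print(y.shape)  # (1000, 4)
```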

slide-123
SLIDE 123

Special case: multi-reference alignment (MRA)

G = Z/p acts on R^p via cyclic shifts

For i = 1, . . . , n observe y_i = g_i · x + ε_i with ε_i ∼ N(0, σ^2 I)

Method of invariants [1,2]: measure features of the signal x that are shift-invariant

Degree-1: Σ_i x_i (mean)

Degree-2: Σ_i x_i^2, x_1x_2 + x_2x_3 + · · · + x_p x_1, . . . (autocorrelation)

Degree-3: x_1x_2x_4 + x_2x_3x_5 + · · · (triple correlation)

Invariant features are easy to estimate from the samples

[1] Bandeira, Rigollet, Weed, Optimal rates of estimation for multi-reference alignment, 2017
[2] Perry, Weed, Bandeira, Rigollet, Singer, The sample complexity of multi-reference alignment, 2017

25 / 39
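The invariant features above can be computed and their shift-invariance verified directly. A small sketch (my own, not the papers' code):

```python
import numpy as np

def invariants(x):
    """Degree-1/2/3 shift-invariant features: mean, autocorrelation, triple correlation."""
    p = len(x)
    mean = x.sum()                                                      # degree 1
    autocorr = np.array([np.dot(x, np.roll(x, s)) for s in range(p)])   # degree 2
    triple = np.array([[np.sum(x * np.roll(x, s) * np.roll(x, t))
                        for t in range(p)] for s in range(p)])          # degree 3
    return mean, autocorr, triple

x = np.array([1.0, -2.0, 0.5, 3.0])
m0, a0, t0 = invariants(x)
for g in range(len(x)):                  # every cyclic shift has identical invariants
    m, a, t = invariants(np.roll(x, g))
    assert np.isclose(m, m0) and np.allclose(a, a0) and np.allclose(t, t0)
print("invariant features are shift-invariant")
```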

slide-128
SLIDE 128

Sample complexity

Theorem [1]:
(Upper bound) With noise level σ, one can estimate the degree-d invariants using n = O(σ^{2d}) samples.
(Lower bound) If x^{(1)}, x^{(2)} agree on all invariants of degree ≤ d − 1, then Ω(σ^{2d}) samples are required to distinguish them.

◮ The method of invariants is optimal

Question: What degree d* of invariants do we need to learn before we can recover (the orbit of) x?

◮ Optimal sample complexity is n = Θ(σ^{2d*})

Answer (for MRA) [1]:

◮ For “generic” x, degree 3 is sufficient, so the sample complexity is n = Θ(σ^6)

◮ But for a measure-zero set of “bad” signals, much higher degree is needed (as high as p)

[1] Bandeira, Rigollet, Weed, Optimal rates of estimation for multi-reference alignment, 2017

26 / 39
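The upper bound's mechanism can be seen in a toy experiment (my own sketch, not the estimator of [1]): the empirical degree-2 invariants of the noisy samples equal the true invariants plus a known noise bias, which we subtract; the residual error shrinks like 1/√n and grows with σ.

```python
import numpy as np

rng = np.random.default_rng(1)
p, sigma, n = 4, 1.0, 100_000
x = np.array([1.0, -2.0, 0.5, 3.0])

shifts = rng.integers(0, p, size=n)
y = np.stack([np.roll(x, int(g)) for g in shifts]) + sigma * rng.standard_normal((n, p))

# degree-2 invariants: E[sum_j y_j y_{j+s}] = sum_j x_j x_{j+s} + p * sigma^2 * [s == 0]
est = np.array([np.mean(np.sum(y * np.roll(y, s, axis=1), axis=1)) for s in range(p)])
est[0] -= p * sigma ** 2                       # subtract the known noise bias
truth = np.array([np.dot(x, np.roll(x, s)) for s in range(p)])
print(np.max(np.abs(est - truth)))             # small; grows with sigma, shrinks with n
```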

slide-134
SLIDE 134

Another viewpoint: mixtures of Gaussians

MRA sample: y = g · x + ε with g ∼ G, ε ∼ N(0, σ^2 I)

The distribution of y is a (uniform) mixture of |G| Gaussians centered at {g · x : g ∈ G}

◮ For infinite groups, a mixture of infinitely many Gaussians

Method of moments: Estimate the moments E[y], E[yy^⊤], . . ., E[y^{⊗d}]; up to known noise terms, E[y^{⊗k}] gives E_g[(g · x)^{⊗k}]

Fact: Moments are equivalent to invariants

◮ E_g[(g · x)^{⊗k}] contains the same information as the degree-k invariant polynomials

27 / 39
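For the second moment this relationship is exact and easy to check numerically: E[yy^⊤] = E_g[(g · x)(g · x)^⊤] + σ^2 I. A sketch (mine):

```python
import numpy as np

rng = np.random.default_rng(2)
p, sigma, n = 3, 0.5, 200_000
x = np.array([1.0, -1.5, 2.0])

shifts = rng.integers(0, p, size=n)
y = np.stack([np.roll(x, int(g)) for g in shifts]) + sigma * rng.standard_normal((n, p))

second_moment = np.einsum('ni,nj->ij', y, y) / n   # estimate of E[y y^T]
group_avg = np.mean([np.outer(np.roll(x, g), np.roll(x, g)) for g in range(p)], axis=0)

# E[y y^T] = E_g[(g.x)(g.x)^T] + sigma^2 I: moments of y hand us group-averaged moments of x
print(np.max(np.abs(second_moment - (group_avg + sigma ** 2 * np.eye(p)))))
```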

slide-139
SLIDE 139

Our contributions

Joint work with Ben Blum-Smith, Afonso Bandeira, Amelia Perry, Jonathan Weed

◮ We generalize from MRA to any compact group

◮ Again, the method of invariants/moments is optimal

◮ We give an (inefficient) algorithm that achieves the optimal sample complexity: solve a polynomial system

◮ To determine what degree of invariants is required, we use invariant theory and algebraic geometry

◮ How to tell whether polynomial equations have a unique solution

Bandeira, Blum-Smith, Perry, Weed, W., Estimation under group actions: recovering orbits from invariants, 2017

28 / 39

slide-144
SLIDE 144

Invariant theory

Variables x_1, . . . , x_p (corresponding to the coordinates of x)

The invariant ring R[x]^G is the subring of R[x] := R[x_1, . . . , x_p] consisting of polynomials f such that f(g · x) = f(x) ∀g ∈ G.

◮ Aside: A main result of invariant theory is that R[x]^G is finitely generated

R[x]^G_{≤d} – the invariants of degree ≤ d

(Simple) algorithm:

◮ Pick d* (to be chosen later)

◮ Using Θ(σ^{2d*}) samples, estimate the invariants up to degree d*: learn the value f(x) for all f ∈ R[x]^G_{≤d*}

◮ Solve for an x̂ that is consistent with those values: f(x̂) = f(x) ∀f ∈ R[x]^G_{≤d*} (a polynomial system of equations)

29 / 39
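The "solve a polynomial system" step can be carried out by hand in the smallest MRA example, p = 2, where the degree ≤ 2 invariants x_1 + x_2 and x_1^2 + x_2^2 already determine the orbit {(x_1, x_2), (x_2, x_1)}. A sketch (my own, not the paper's algorithm): estimate the invariants from noisy samples, debias, and solve the resulting quadratic.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, n = 0.3, 200_000
x = np.array([1.0, 2.5])                      # MRA with p = 2: orbit = {(1, 2.5), (2.5, 1)}

shifts = rng.integers(0, 2, size=n)
y = np.stack([np.roll(x, int(g)) for g in shifts]) + sigma * rng.standard_normal((n, 2))

e1 = np.mean(y.sum(axis=1))                              # degree-1 invariant: x1 + x2
power = np.mean((y ** 2).sum(axis=1)) - 2 * sigma ** 2   # x1^2 + x2^2, noise-debiased
e2 = (e1 ** 2 - power) / 2                               # elementary symmetric x1 * x2
x_hat = np.sort(np.roots([1.0, -e1, e2]).real)           # solve t^2 - e1 t + e2 = 0
print(x_hat)  # ~ [1.0, 2.5]: the orbit of x, recovered without estimating any g_i
```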

slide-148
SLIDE 148

All invariants determine orbit

Theorem [1]: If G is compact, then for every x ∈ V the full invariant ring R[x]^G determines x up to orbit.

◮ In the sense that if x, x′ do not lie in the same orbit, there exists f ∈ R[x]^G that separates them: f(x) ≠ f(x′)

Corollary: Suppose that for some d, R[x]^G_{≤d} generates R[x]^G (as an R-algebra). Then R[x]^G_{≤d} determines x up to orbit, and so the sample complexity is O(σ^{2d}).

Problem: This is for worst-case x ∈ V. For MRA (cyclic shifts) this requires d = p, whereas generic x only requires d = 3 [2].

We actually care about whether R[x]^G_{≤d} generically determines R[x]^G

[1] Kac, Invariant theory lecture notes, 1994
[2] Bandeira, Rigollet, Weed, Optimal rates of estimation for multi-reference alignment, 2017

30 / 39
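The separation statement can be checked numerically in a small case (my own sketch): the degree ≤ 3 invariant features agree across the whole orbit of x, but differ from those of a signal outside the orbit.

```python
import numpy as np

def feats(x):
    """Degree <= 3 shift-invariant features: mean, autocorrelations, triple correlations."""
    p = len(x)
    return np.concatenate((
        [x.sum()],
        [np.dot(x, np.roll(x, s)) for s in range(p)],
        [np.sum(x * np.roll(x, s) * np.roll(x, t)) for s in range(p) for t in range(p)],
    ))

x = np.array([1.0, -2.0, 0.5, 3.0])
same_orbit = np.roll(x, 2)            # lies in the orbit of x
other = x[[0, 2, 1, 3]]               # a non-cyclic permutation: a different orbit here
assert np.allclose(feats(x), feats(same_orbit))
assert not np.allclose(feats(x), feats(other))
print("invariants agree on the orbit and separate it from another orbit")
```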

slide-153
SLIDE 153

Do polynomials generically determine other polynomials?

Say we have A ⊆ B ⊆ R[x]

◮ (Technically need to assume B is finitely generated)

Question: Do the values {a(x) : a ∈ A} generically determine the values {b(x) : b ∈ B}?

Definition: Polynomials f_1, . . . , f_m are algebraically independent if there is no nonzero P ∈ R[y_1, . . . , y_m] with P(f_1, . . . , f_m) ≡ 0.

Definition: For U ⊆ R[x], the transcendence degree trdeg(U) is the maximum number of algebraically independent polynomials in U.

Answer: Suppose trdeg(A) = trdeg(B). If x is “generic”, then the values {a(x) : a ∈ A} determine a finite number of possibilities for the entire collection {b(x) : b ∈ B}.

◮ “Generic”: x lies in a particular full-measure set

31 / 39

slide-157
SLIDE 157

How to test algebraic independence?

This is actually easy!

Theorem (Jacobian criterion): Polynomials f_1, . . . , f_m ∈ R[x_1, . . . , x_p] are algebraically independent if and only if the m × p Jacobian matrix J_ij = ∂f_i/∂x_j has full row rank. (Still true if you evaluate J at a generic point x.)

◮ Why: This tests whether the map (x_1, . . . , x_p) → (f_1(x), . . . , f_m(x)) is locally surjective

32 / 39
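The classical symmetric-polynomial dependence f_3 = f_1^2 − 2 f_2 makes a concrete test case (example mine, not from the talk): evaluate the Jacobian at a random "generic" point and check its rank.

```python
import numpy as np

# Jacobian criterion, evaluated numerically at a random ("generic") point.
# f1 = x1 + x2, f2 = x1*x2, f3 = x1^2 + x2^2: since f3 = f1^2 - 2*f2, the three are dependent.
rng = np.random.default_rng(4)
x1, x2 = rng.standard_normal(2)

J = np.array([
    [1.0, 1.0],          # gradient of x1 + x2
    [x2, x1],            # gradient of x1 * x2
    [2 * x1, 2 * x2],    # gradient of x1^2 + x2^2
])
print(np.linalg.matrix_rank(J[:2]))  # 2: f1, f2 are algebraically independent (full row rank)
print(np.linalg.matrix_rank(J))      # 2 < 3: adding f3 contributes nothing new
```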

slide-164
SLIDE 164

Generic list recovery

Our main result is an efficient procedure that takes the problem setup as input (group G and action on V ) and outputs the degree d∗ of invariants required for generic list recovery.

◮ List recovery: output a finite list x̂^{(1)}, x̂^{(2)}, . . ., one of which (approximately) lies in the orbit of the true x

◮ List recovery may be good enough in practice?

Procedure:

◮ Need to test whether R[x]^G_{≤d} determines R[x]^G (generically)

◮ So need to check whether trdeg(R[x]^G_{≤d}) = trdeg(R[x]^G)

◮ trdeg(R[x]^G) is easy: dim(V) − dim(generic orbit)

◮ trdeg(R[x]^G_{≤d}) via the Jacobian criterion

33 / 39
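Putting the pieces together for MRA, here is a sketch of the procedure (my own implementation choices: a finite-difference Jacobian and a rank tolerance, neither taken from the paper). Compute trdeg(R[x]^G_{≤d}) as the rank of the Jacobian of the degree ≤ d invariant features at a generic point, and find the smallest d reaching trdeg(R[x]^G) = p (G is finite, so generic orbits are 0-dimensional).

```python
import numpy as np

def invariant_feats(x, d):
    """Cyclic (Z/p) invariants of degree <= d: mean, autocorrelations, triple correlations."""
    p = len(x)
    f = [x.sum()]
    if d >= 2:
        f += [np.dot(x, np.roll(x, s)) for s in range(p)]
    if d >= 3:
        f += [np.sum(x * np.roll(x, s) * np.roll(x, t)) for s in range(p) for t in range(p)]
    return np.array(f)

def trdeg_leq_d(p, d, rng, h=1e-6):
    """trdeg of the degree <= d invariants = rank of their Jacobian at a generic point."""
    x = rng.standard_normal(p)
    f0 = invariant_feats(x, d)
    J = np.stack([(invariant_feats(x + h * e, d) - f0) / h for e in np.eye(p)], axis=1)
    return np.linalg.matrix_rank(J, tol=1e-3)

rng = np.random.default_rng(5)
p = 5
target = p  # trdeg(R[x]^G) = dim(V) - dim(generic orbit) = p - 0 for a finite group
for d in (1, 2, 3):
    print(d, trdeg_leq_d(p, d, rng))  # for p = 5: trdeg 1, 3, 5 -- degree 3 suffices generically
```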

slide-168
SLIDE 168

Generic list recovery

Our main result is an efficient procedure that takes the problem setup as input (group G and action on V ) and outputs the degree d∗ of invariants required for generic list recovery.

◮ List recovery: output a finite list x̂^{(1)}, x̂^{(2)}, . . ., one of which (approximately) lies in the orbit of the true x

◮ List recovery may be good enough in practice?

Comments:

◮ For e.g. MRA (cyclic shifts), we need to test each p separately on a computer

◮ This is not an efficient algorithm to solve any particular instance

◮ There is also an algorithm to bound the size of the list (or test for unique recovery), but it is not efficient (Gröbner bases)

34 / 39

slide-174
SLIDE 174

Generalized orbit recovery problem

Extensions:

◮ Projection (e.g. cryo-EM):
  ◮ Observe y_i = Π(g_i · x) + ε_i
  ◮ Π : V → W linear
  ◮ ε_i ∼ N(0, σ^2 I)

◮ Heterogeneity:
  ◮ K signals x^{(1)}, . . . , x^{(K)}
  ◮ Mixing weights (w_1, . . . , w_K) ∈ ∆_K
  ◮ Observe y_i = Π(g_i · x^{(k_i)}) + ε_i
  ◮ k_i ∼ {1, . . . , K} according to w

Same methods apply!

◮ Order-d moments now only give access to a particular subspace of R[x]^G

◮ For heterogeneity, work over a bigger group G^K acting on (x^{(1)}, . . . , x^{(K)}) ∈ V^{⊕K}

35 / 39
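The generalized model is again straightforward to simulate. A sketch (mine; the function name and the toy coordinate projection Π are my own choices) combining both extensions for cyclic shifts:

```python
import numpy as np

def sample_heterogeneous(signals, weights, Pi, n, sigma, rng):
    """y_i = Pi(g_i . x^{(k_i)}) + eps_i: random shift g_i, class k_i ~ weights, linear Pi."""
    K, p = signals.shape
    ks = rng.choice(K, size=n, p=weights)        # k_i ~ {1,...,K} according to w
    shifts = rng.integers(0, p, size=n)          # g_i ~ uniform (Haar on Z/p)
    clean = np.stack([np.roll(signals[k], int(g)) for k, g in zip(ks, shifts)])
    return clean @ Pi.T + sigma * rng.standard_normal((n, Pi.shape[0]))

rng = np.random.default_rng(6)
signals = np.array([[1.0, 0.0, -1.0, 2.0], [0.5, 0.5, -2.0, 1.0]])
Pi = np.eye(4)[:2]                               # toy linear Pi: V -> W keeps 2 coordinates
y = sample_heterogeneous(signals, [0.7, 0.3], Pi, n=1000, sigma=0.2, rng=rng)
print(y.shape)  # (1000, 2)
```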

slide-177
SLIDE 177

Results: cryo-EM

◮ Our methods show that for cryo-EM, generic list recovery is possible at degree 3
◮ So the information-theoretic sample complexity is Θ(σ⁶)
◮ Ongoing work: a polynomial-time algorithm for cryo-EM

36 / 39

slide-180
SLIDE 180

Efficient recovery: tensor decomposition

Restrict to a finite group. Recall: with O(σ⁶) samples, we can estimate the third moment

    T₃(x) = Σ_{g∈G} (g · x)^⊗3

This is an instance of tensor decomposition: given Σ_{i=1}^m a_i^⊗3 for some a_1, . . . , a_m ∈ R^p, recover {a_i}.

For MRA: since m ≤ p (“undercomplete”), Jennrich’s algorithm can decompose the tensor efficiently [1]

[1] Perry, Weed, Bandeira, Rigollet, Singer, The sample complexity of multi-reference alignment, 2017

37 / 39
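The undercomplete case can be made concrete. Below is a minimal NumPy sketch of Jennrich’s simultaneous-diagonalization idea for a symmetric 3-tensor Σᵢ a_i^⊗3 with m ≤ p linearly independent components: two random contractions of the tensor share eigenvectors equal to the a_i up to scale. The function name and the least-squares rescaling step are my own choices, not details from the talk.

```python
import numpy as np

def jennrich(T, m, rng=None):
    """Decompose a symmetric 3-tensor T = sum_i a_i^(x3), i = 1..m <= p,
    with linearly independent components (Jennrich's algorithm sketch)."""
    rng = np.random.default_rng(rng)
    p = T.shape[0]
    u, v = rng.standard_normal(p), rng.standard_normal(p)
    Mu = np.einsum('ijk,k->ij', T, u)   # = sum_i (u.a_i) a_i a_i^T
    Mv = np.einsum('ijk,k->ij', T, v)
    # Eigenvectors of Mu Mv^+ with nonzero eigenvalue are the a_i up to scale,
    # with eigenvalues (u.a_i)/(v.a_i); keep the m largest in magnitude.
    vals, vecs = np.linalg.eig(Mu @ np.linalg.pinv(Mv))
    idx = np.argsort(-np.abs(vals))[:m]
    dirs = np.real(vecs[:, idx])        # unit-norm directions of the a_i
    # Recover each component's scale by least squares on the flattened tensor.
    X = np.stack([np.einsum('i,j,k->ijk', d, d, d).ravel() for d in dirs.T],
                 axis=1)
    c = np.linalg.lstsq(X, T.ravel(), rcond=None)[0]
    return dirs * np.cbrt(c)            # columns are the recovered a_i
```

This recovers the components only up to permutation (and the cube-root rescaling absorbs the sign ambiguity of each eigenvector), which matches the orbit-recovery setting: the moments never determine more than the unordered set {a_i}.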

slide-185
SLIDE 185

Example: heterogeneous MRA

MRA with multiple signals x^(1), . . . , x^(K):

    T_d(x) = Σ_{k=1}^K Σ_{g∈G} (g · x^(k))^⊗d

◮ Jennrich’s algorithm works if given the 5th moment, i.e. n = O(σ¹⁰) samples [1]
◮ Information-theoretically, the 3rd moment suffices if K ≤ p/6
◮ If the signals x^(k) are random (i.i.d. Gaussian), efficient recovery from the 3rd moment is conjectured to be possible iff K ≤ √p [2]
◮ New result (with A. Moitra): if K ≤ √p/polylog(p), then for random signals efficient recovery is possible from the 3rd moment
  ◮ Based on random overcomplete 3-tensor decomposition [3]

[1] Perry, Weed, Bandeira, Rigollet, Singer ’17
[2] Boumal, Bendory, Lederman, Singer ’17
[3] Ma, Shi, Steurer ’16

38 / 39
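To make T_d concrete in the MRA case, here is a brute-force computation of the population third moment over the cyclic group (unit mixing weights; the function name is my own). A quick check confirms the point that moments only see the orbit: shifting every signal leaves T₃ unchanged.

```python
import numpy as np

def third_moment(signals):
    """T3 = sum_k sum_{g in Z/p} (g . x^(k))^(x3) for heterogeneous MRA,
    computed by brute force over all cyclic shifts (unit mixing weights)."""
    signals = np.asarray(signals, dtype=float)   # shape (K, p)
    K, p = signals.shape
    T = np.zeros((p, p, p))
    for x in signals:
        for s in range(p):
            v = np.roll(x, s)                    # group element g = shift by s
            T += np.einsum('i,j,k->ijk', v, v, v)
    return T
```

Since T₃ is invariant under shifting the signals, any estimator built from it can at best identify the orbits {g · x^(k)}, which is exactly the orbit-recovery guarantee discussed above.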

slide-193
SLIDE 193

Acknowledgements

◮ Ankur Moitra
◮ Michel Goemans
◮ Philippe Rigollet
◮ Afonso Bandeira
◮ Collaborators
◮ Family

Thank you!

39 / 39