[PPT] - Randomized sparse Kaczmarz methods Dirk Lorenz, joint with Frank PowerPoint Presentation

SLIDE 1

Randomized sparse Kaczmarz methods

Dirk Lorenz, joint with Frank Schöpfer, Feb 9, 2018

Inverse Problems and Machine Learning, Caltech 2018

SLIDE 2

The Kaczmarz method Randomization Sparsity Split feasibility problems Convergence rates

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 2 of 30

SLIDE 3

Just solving systems of linear equations

Ax = b pretty arbitrary (but consistent), m rows, n columns

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 3 of 30

SLIDE 4

Just solving systems of linear equations

Ax = b pretty arbitrary (but consistent), m rows, n columns Solve only one row ai, x = b by projecting onto the hyperplane of solutions: xk+1 = xk − xk, ai − bi ai2 ai

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 3 of 30

SLIDE 5

Just solving systems of linear equations

Ax = b pretty arbitrary (but consistent), m rows, n columns Solve only one row ai, x = b by projecting onto the hyperplane of solutions: xk+1 = xk − xk, ai − bi ai2 ai Each projection just needs O(n) operations

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 3 of 30

SLIDE 6

Just solving systems of linear equations

Ax = b pretty arbitrary (but consistent), m rows, n columns Solve only one row ai, x = b by projecting onto the hyperplane of solutions: xk+1 = xk − xk, ai − bi ai2 ai Each projection just needs O(n) operations Amount for one pass through all columns same as applying A

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 3 of 30

SLIDE 7

Just solving systems of linear equations

Ax = b pretty arbitrary (but consistent), m rows, n columns Solve only one row ai, x = b by projecting onto the hyperplane of solutions: xk+1 = xk − xk, ai − bi ai2 ai Each projection just needs O(n) operations Amount for one pass through all columns same as applying A Stefan Kaczmarz [1937]: Convergent to some solution for all consistent systems

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 3 of 30

SLIDE 8

Learning with Kaczmarz

Unknown distibution ρ on X × Y = Rd × R, regression function fρ(a) = y dρ(y | a) Hypothesis space H = {fx ∈ L2

ρX, x ∈ Rd}, fx(a) = a, x

Learning: Obtain samples a ∈ X′, b ∈ Y sequentially and try to learn x Kaczmarz: Update xk by xk+1 = xk − xk, a − b a2 a Goal: Show that xk converges to some x∗ such that fx∗ = argmin

f ∈H

E(f ) = argmin

f ∈H

X×Y(b − f (a))2dρ

[Lin, Zhou 2015] Here focus on Kaczmarz as an algorithm for solving systems

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 4 of 30

SLIDE 9

Convergence speed?

m = 6 rows, n = 2 columns: −1 1 −1 −0.5 0.5 1 −1 −0.5 0 0.5 1 −1 −0.5 0.5 1 linear order random order

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 5 of 30

SLIDE 10

Convergence speed?

m = 12 rows, n = 2 columns: −1 1 −1 −0.5 0.5 1 −1 −0.5 0 0.5 1 −1 −0.5 0.5 1 linear order random order

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 5 of 30

SLIDE 11

Convergence speed?

rows, n = 2 columns: −1 1 −1 −0.5 0.5 1 −1 −0.5 0 0.5 1 −1 −0.5 0.5 1 linear order random order Btw: Randomized Kaczmarz is stochastic gradient descent for ∑i(ai, x − bi)2

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 5 of 30

SLIDE 12

The Kaczmarz method Randomization Sparsity Split feasibility problems Convergence rates

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 6 of 30

SLIDE 13

Randomization leads to linear convergence

In each iteration, choose index i with probability pi. If ˆ x solves (i.e. ˆ x, ai = bi), then xk+1 − ˆ x2 = xk − ˆ x2 − (xk − ˆ x, ai)2 ai2 . Taking the expectation over the choice of i gives E(xk+1 − ˆ x2) = xk − ˆ x2 − ∑

i

pi (xk − ˆ x, ai)2 ai2 = xk − ˆ x2 − A(xk − ˆ x), DA(xk − ˆ x) with D = diag(pi/ai2). Gives uniform improvement E(xk+1 − ˆ x2) ≤ (1 − λ)xk − ˆ x2, λ = λmin(ATDA)

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 7 of 30

SLIDE 14

Theorem

A ∈ Rm×n, m ≥ n with full rank, Aˆ x = b, then iterates of randomized Kaczmarz fulfill E(xk − ˆ x2) ≤ (1 − λ)kx0 − ˆ x2 with λ = λmin(ATDA), D = diag(pi/ai2).

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 8 of 30

SLIDE 15

Theorem

A ∈ Rm×n, m ≥ n with full rank, Aˆ x = b, then iterates of randomized Kaczmarz fulfill E(xk − ˆ x2) ≤ (1 − λ)kx0 − ˆ x2 with λ = λmin(ATDA), D = diag(pi/ai2). Result due to [Stohmer, Vershynin 2009]

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 8 of 30

SLIDE 16

Theorem

A ∈ Rm×n, m ≥ n with full rank, Aˆ x = b, then iterates of randomized Kaczmarz fulfill E(xk − ˆ x2) ≤ (1 − λ)kx0 − ˆ x2 with λ = λmin(ATDA), D = diag(pi/ai2). Result due to [Stohmer, Vershynin 2009] Choice pi = ai2

A2

F gives D = A−2

F I, i.e.

λ = λmin(ATA) A2

F

= σmin(A) A2

F

=: κ(A)

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 8 of 30

SLIDE 17

Theorem

A ∈ Rm×n, m ≥ n with full rank, Aˆ x = b, then iterates of randomized Kaczmarz fulfill E(xk − ˆ x2) ≤ (1 − λ)kx0 − ˆ x2 with λ = λmin(ATDA), D = diag(pi/ai2). Result due to [Stohmer, Vershynin 2009] Choice pi = ai2

A2

F gives D = A−2

F I, i.e.

λ = λmin(ATA) A2

F

= σmin(A) A2

F

=: κ(A) Experimentally: above p not optimal, other p give larger λ

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 8 of 30

SLIDE 18

Underdetermined systems

Consider Ax = b, underdetermined but consistent

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 9 of 30

SLIDE 19

Underdetermined systems

Consider Ax = b, underdetermined but consistent Which solution does Kaczmarz pick?

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 9 of 30

SLIDE 20

Underdetermined systems

Consider Ax = b, underdetermined but consistent Which solution does Kaczmarz pick? Initialization x0 = 0 (or x0 ∈ rg AT), then all iterates xk ∈ rg AT

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 9 of 30

SLIDE 21

Underdetermined systems

Consider Ax = b, underdetermined but consistent Which solution does Kaczmarz pick? Initialization x0 = 0 (or x0 ∈ rg AT), then all iterates xk ∈ rg AT Assume ˆ x solution in rg AT

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 9 of 30

SLIDE 22

Underdetermined systems

Consider Ax = b, underdetermined but consistent Which solution does Kaczmarz pick? Initialization x0 = 0 (or x0 ∈ rg AT), then all iterates xk ∈ rg AT Assume ˆ x solution in rg AT Z ∈ Rn×m, columns form ONB of rg AT, then xk = ZZTxk, ZZTˆ x = ˆ x.

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 9 of 30

SLIDE 23

Underdetermined systems

Consider Ax = b, underdetermined but consistent Which solution does Kaczmarz pick? Initialization x0 = 0 (or x0 ∈ rg AT), then all iterates xk ∈ rg AT Assume ˆ x solution in rg AT Z ∈ Rn×m, columns form ONB of rg AT, then xk = ZZTxk, ZZTˆ x = ˆ x. As above: E(xk − ˆ x2) ≤ (1 − λ)kx0 − ˆ x2 λ = λmin(ZTATDAZ), D = diag(pi/ai2)

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 9 of 30

SLIDE 24

Underdetermined systems

Consider Ax = b, underdetermined but consistent Which solution does Kaczmarz pick? Initialization x0 = 0 (or x0 ∈ rg AT), then all iterates xk ∈ rg AT Assume ˆ x solution in rg AT Z ∈ Rn×m, columns form ONB of rg AT, then xk = ZZTxk, ZZTˆ x = ˆ x. As above: E(xk − ˆ x2) ≤ (1 − λ)kx0 − ˆ x2 λ = λmin(ZTATDAZ), D = diag(pi/ai2) Convergence to minimum-norm solution ˆ x

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 9 of 30

SLIDE 25

The Kaczmarz method Randomization Sparsity Split feasibility problems Convergence rates

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 10 of 30

SLIDE 26

Kaczmarz converging to sparse solutions?

Kaczmarz converges to (unique) solution in x0 + rg AT (if consistent)

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 11 of 30

SLIDE 27

Kaczmarz converging to sparse solutions?

Kaczmarz converges to (unique) solution in x0 + rg AT (if consistent) This is the solution with min x2

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 11 of 30

SLIDE 28

Kaczmarz converging to sparse solutions?

Kaczmarz converges to (unique) solution in x0 + rg AT (if consistent) This is the solution with min x2 Convergence to other solutions? (e.g. min x1)

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 11 of 30

SLIDE 29

Kaczmarz converging to sparse solutions?

Kaczmarz converges to (unique) solution in x0 + rg AT (if consistent) This is the solution with min x2 Convergence to other solutions? (e.g. min x1) Kaczmarz xk+1 = xk − aT

i xk − bi

ai2

2

ai

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 11 of 30

SLIDE 30

Kaczmarz converging to sparse solutions?

Kaczmarz converges to (unique) solution in x0 + rg AT (if consistent) This is the solution with min x2 Convergence to other solutions? (e.g. min x1) Sparse Kaczmarz zk+1 = zk − aT

i xk − bi

ai2

2

ai xk+1 = Sλ(zk+1)

Sλ −λ λ

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 11 of 30

SLIDE 31

Kaczmarz converging to sparse solutions?

Kaczmarz converges to (unique) solution in x0 + rg AT (if consistent) This is the solution with min x2 Convergence to other solutions? (e.g. min x1) Sparse Kaczmarz zk+1 = zk − aT

i xk − bi

ai2

2

ai xk+1 = Sλ(zk+1)

Sλ −λ λ

Theorem [L, Schöpfer, Wenger, Magnor 2014]: The sequence xk, when initialized with x0 = 0, converges to the solution of min x1 +

1 2λx2 2 such that Ax = b

if every i appears infinitely often

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 11 of 30

SLIDE 32

Sparse Kaczmarz and linearized Bregman

zk+1 = zk − aT

r(k)xk − br(k)

ar(k)2

2

ar(k) xk+1 = Sλ(zk+1) Two interesting things:

1. Very similar to Kaczmarz. Other “minimum-J-solutions” possible?
2. Very similar to linearized Bregman iteration.

zk+1 = zk − tkAT(Axk − b), tk ≤

1 A2

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 12 of 30

SLIDE 33

Sparse Kaczmarz and linearized Bregman

zk+1 = zk − aT

r(k)xk − br(k)

ar(k)2

2

ar(k) xk+1 = Sλ(zk+1) Two interesting things:

1. Very similar to Kaczmarz. Other “minimum-J-solutions” possible?
2. Very similar to linearized Bregman iteration.

zk+1 = zk − tkAT(Axk − b), tk ≤

1 A2

Approach taken here: “Split feasibility problems” will answer the first and explain the second point.

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 12 of 30

SLIDE 34

The Kaczmarz method Randomization Sparsity Split feasibility problems Convergence rates

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 13 of 30

SLIDE 35

Convex split feasibility problems

Split feasibility problem (SFP): Find x, such that x ∈

NC

i=1

Ci, Aix ∈ Qi, i = 1, . . . , NQ Ci, Qi convex sets, Ai linear

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 14 of 30

SLIDE 36

Convex split feasibility problems

Split feasibility problem (SFP): Find x, such that x ∈

NC

i=1

Ci, Aix ∈ Qi, i = 1, . . . , NQ Ci, Qi convex sets, Ai linear For a mere “feasibility problem”: Do alternating projections xk+1 = PCi(xk) i = (k mod NC) + 1 “control sequence”

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 14 of 30

SLIDE 37

Convex split feasibility problems

Split feasibility problem (SFP): Find x, such that x ∈

NC

i=1

Ci, Aix ∈ Qi, i = 1, . . . , NQ Ci, Qi convex sets, Ai linear For a mere “feasibility problem”: Do alternating projections xk+1 = PCi(xk) i = (k mod NC) + 1 “control sequence” [1933 von Neumann (two subspaces), 1962 Halperin (several subspaces), Dijkstra, Censor, Combettes, Bauschke, Borwein, Deutsch, Lewis, Luke…]

Feb 9, 2018 Dirk Lorenz Randomized sparse Kaczmarz methods Page 14 of 30

SLIDE 38