

SLIDE 1

Randomized sparse Kaczmarz methods

Dirk Lorenz, joint with Frank Schöpfer, Feb 9, 2018

Inverse Problems and Machine Learning, Caltech 2018

SLIDE 2

Outline:
• The Kaczmarz method
• Randomization
• Sparsity
• Split feasibility problems
• Convergence rates


SLIDE 3

Just solving systems of linear equations

• $Ax = b$, pretty arbitrary (but consistent), $m$ rows, $n$ columns
• Solve only one row $\langle a_i, x\rangle = b_i$ by projecting onto its hyperplane of solutions:
$$x_{k+1} = x_k - \frac{\langle a_i, x_k\rangle - b_i}{\|a_i\|_2^2}\, a_i$$
• Each projection needs only $O(n)$ operations
• One pass through all rows costs the same as one application of $A$
• Stefan Kaczmarz [1937]: converges to some solution for every consistent system
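To make the update concrete, here is a minimal numpy sketch of the cyclic method described above; the function name and toy data are illustrative, not from the talk:

```python
import numpy as np

def kaczmarz(A, b, sweeps=50, x0=None):
    """Cyclic Kaczmarz: project onto one row's solution hyperplane at a time."""
    m, n = A.shape
    x = np.zeros(n) if x0 is None else x0.astype(float)
    row_sq = (A ** 2).sum(axis=1)      # precompute ||a_i||_2^2
    for _ in range(sweeps):
        for i in range(m):             # one sweep over all m rows ~ cost of one A @ x
            x -= (A[i] @ x - b[i]) / row_sq[i] * A[i]
    return x

# consistent toy system
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = A @ rng.standard_normal(10)
print(np.linalg.norm(A @ kaczmarz(A, b) - b))   # residual decays toward 0
```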

SLIDE 8

Learning with Kaczmarz

• Unknown distribution $\rho$ on $X \times Y = \mathbb{R}^d \times \mathbb{R}$, regression function $f_\rho(a) = \int y \, d\rho(y \mid a)$
• Hypothesis space $H = \{f_x \in L^2_{\rho_X} : x \in \mathbb{R}^d\}$, $f_x(a) = \langle a, x\rangle$
• Learning: obtain samples $(a, b) \in X \times Y$ sequentially and try to learn $x$
• Kaczmarz: update $x_k$ by $x_{k+1} = x_k - \frac{\langle a, x_k\rangle - b}{\|a\|^2}\, a$
• Goal: show that $x_k$ converges to some $x_*$ such that
$$f_{x_*} = \operatorname*{argmin}_{f \in H} E(f) = \operatorname*{argmin}_{f \in H} \int_{X \times Y} (b - f(a))^2 \, d\rho$$
[Lin, Zhou 2015]
• Here: focus on Kaczmarz as an algorithm for solving systems

SLIDE 9

Convergence speed?

$m = 6$ and $m = 12$ rows, $n = 2$ columns. [Figure: iterates in the plane for linear row order vs. random row order.]

Btw: randomized Kaczmarz is stochastic gradient descent for $\sum_i (\langle a_i, x\rangle - b_i)^2$.

SLIDE 12

Outline: The Kaczmarz method · Randomization · Sparsity · Split feasibility problems · Convergence rates

SLIDE 13

Randomization leads to linear convergence

In each iteration, choose index $i$ with probability $p_i$. If $\hat x$ solves the system (i.e. $\langle a_i, \hat x\rangle = b_i$), then
$$\|x_{k+1} - \hat x\|^2 = \|x_k - \hat x\|^2 - \frac{\langle x_k - \hat x, a_i\rangle^2}{\|a_i\|^2}.$$
Taking the expectation over the choice of $i$ gives
$$\mathbb{E}(\|x_{k+1} - \hat x\|^2) = \|x_k - \hat x\|^2 - \sum_i p_i \frac{\langle x_k - \hat x, a_i\rangle^2}{\|a_i\|^2} = \|x_k - \hat x\|^2 - \langle A(x_k - \hat x), D A(x_k - \hat x)\rangle$$
with $D = \operatorname{diag}(p_i / \|a_i\|^2)$. This gives the uniform improvement
$$\mathbb{E}(\|x_{k+1} - \hat x\|^2) \le (1 - \lambda)\|x_k - \hat x\|^2, \qquad \lambda = \lambda_{\min}(A^T D A).$$
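A sketch of the randomized variant, sampling row $i$ with probability $p_i$ (here the norm-weighted choice from the next slide; numpy-based and illustrative):

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=5000, seed=0):
    """Randomized Kaczmarz: sample row i with probability p_i = ||a_i||^2 / ||A||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_sq = (A ** 2).sum(axis=1)
    p = row_sq / row_sq.sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=p)
        x -= (A[i] @ x - b[i]) / row_sq[i] * A[i]
    return x
```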

SLIDE 14

Theorem

$A \in \mathbb{R}^{m \times n}$, $m \ge n$, with full rank, $A\hat x = b$; then the iterates of randomized Kaczmarz fulfill
$$\mathbb{E}(\|x_k - \hat x\|^2) \le (1 - \lambda)^k \|x_0 - \hat x\|^2$$
with $\lambda = \lambda_{\min}(A^T D A)$, $D = \operatorname{diag}(p_i / \|a_i\|^2)$.

• Result due to [Strohmer, Vershynin 2009]
• The choice $p_i = \|a_i\|^2 / \|A\|_F^2$ gives $D = \|A\|_F^{-2} I$, i.e.
$$\lambda = \frac{\lambda_{\min}(A^T A)}{\|A\|_F^2} = \frac{\sigma_{\min}(A)^2}{\|A\|_F^2} =: \kappa(A)$$
• Experimentally: the above $p$ is not optimal; other choices of $p$ give larger $\lambda$
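The contraction factor $\lambda = \lambda_{\min}(A^T D A)$ can be checked numerically for different sampling distributions $p$; a small sketch, assuming numpy and a random Gaussian test matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
row_sq = (A ** 2).sum(axis=1)

def contraction(p):
    # lambda = lambda_min(A^T D A) with D = diag(p_i / ||a_i||^2)
    D = np.diag(p / row_sq)
    return np.linalg.eigvalsh(A.T @ D @ A).min()

p_sv = row_sq / row_sq.sum()                      # Strohmer-Vershynin weights
p_uniform = np.full(len(row_sq), 1 / len(row_sq))
print(contraction(p_sv), contraction(p_uniform))  # both in (0, 1); neither dominates in general
```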

SLIDE 18

Underdetermined systems

• Consider $Ax = b$, underdetermined but consistent
• Which solution does Kaczmarz pick?
• Initialization $x_0 = 0$ (or $x_0 \in \operatorname{rg} A^T$); then all iterates $x_k \in \operatorname{rg} A^T$
• Assume $\hat x$ is the solution in $\operatorname{rg} A^T$
• $Z \in \mathbb{R}^{n \times m}$ with columns forming an ONB of $\operatorname{rg} A^T$; then $x_k = ZZ^T x_k$ and $ZZ^T \hat x = \hat x$
• As above: $\mathbb{E}(\|x_k - \hat x\|^2) \le (1 - \lambda)^k \|x_0 - \hat x\|^2$ with $\lambda = \lambda_{\min}(Z^T A^T D A Z)$, $D = \operatorname{diag}(p_i / \|a_i\|^2)$
• Convergence to the minimum-norm solution $\hat x$
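This can be observed numerically: starting from $x_0 = 0$ on an underdetermined consistent system, randomized Kaczmarz lands on the minimum-norm solution $A^+ b$. An illustrative numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 20, 60
A = rng.standard_normal((m, n))          # underdetermined, full row rank w.h.p.
b = A @ rng.standard_normal(n)

row_sq = (A ** 2).sum(axis=1)
p = row_sq / row_sq.sum()
x = np.zeros(n)                           # x0 = 0 keeps all iterates in rg(A^T)
for _ in range(20000):
    i = rng.choice(m, p=p)
    x -= (A[i] @ x - b[i]) / row_sq[i] * A[i]

x_min_norm = np.linalg.pinv(A) @ b        # minimum-norm solution
print(np.linalg.norm(x - x_min_norm))     # ~ 0
```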

SLIDE 25

Outline: The Kaczmarz method · Randomization · Sparsity · Split feasibility problems · Convergence rates

SLIDE 26

Kaczmarz converging to sparse solutions?

• Kaczmarz converges to the (unique) solution in $x_0 + \operatorname{rg} A^T$ (if consistent)
• This is the solution with minimal $\|x\|_2$
• Convergence to other solutions? (e.g. minimal $\|x\|_1$)
• Kaczmarz:
$$x_{k+1} = x_k - \frac{a_i^T x_k - b_i}{\|a_i\|_2^2}\, a_i$$
• Sparse Kaczmarz:
$$z_{k+1} = z_k - \frac{a_i^T x_k - b_i}{\|a_i\|_2^2}\, a_i, \qquad x_{k+1} = S_\lambda(z_{k+1})$$
with $S_\lambda$ the soft-thresholding function (zero on $[-\lambda, \lambda]$).
• Theorem [L, Schöpfer, Wenger, Magnor 2014]: the sequence $x_k$, when initialized with $x_0 = 0$, converges to the solution of
$$\min \|x\|_1 + \tfrac{1}{2\lambda}\|x\|_2^2 \quad \text{s.t.} \quad Ax = b$$
if every index $i$ appears infinitely often.
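A minimal sketch of the (randomized) sparse Kaczmarz iteration above, with soft thresholding $S_\lambda$; names and parameters are illustrative:

```python
import numpy as np

def soft_threshold(z, lam):
    """S_lambda: componentwise shrinkage, exactly zero on [-lam, lam]."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def sparse_kaczmarz(A, b, lam=1.0, iters=20000, seed=0):
    """Kaczmarz step on the auxiliary variable z, soft threshold to get the sparse x."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_sq = (A ** 2).sum(axis=1)
    p = row_sq / row_sq.sum()
    z = np.zeros(n); x = np.zeros(n)      # x0 = z0 = 0
    for _ in range(iters):
        i = rng.choice(m, p=p)
        z -= (A[i] @ x - b[i]) / row_sq[i] * A[i]
        x = soft_threshold(z, lam)
    return x
```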

SLIDE 32

Sparse Kaczmarz and linearized Bregman

$$z_{k+1} = z_k - \frac{a_{r(k)}^T x_k - b_{r(k)}}{\|a_{r(k)}\|_2^2}\, a_{r(k)}, \qquad x_{k+1} = S_\lambda(z_{k+1})$$

Two interesting things:
• 1. Very similar to Kaczmarz. Are other "minimum-$J$ solutions" possible?
• 2. Very similar to the linearized Bregman iteration
$$z_{k+1} = z_k - t_k A^T(Ax_k - b), \qquad t_k \le \frac{1}{\|A\|^2}$$

Approach taken here: "split feasibility problems" will answer the first and explain the second point.
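For comparison with the sketch above, the linearized Bregman iteration replaces the single-row step by a full-gradient step on $z$; a minimal illustrative sketch:

```python
import numpy as np

def linearized_bregman(A, b, lam=1.0, iters=2000):
    """Linearized Bregman: full-gradient z-update, then soft thresholding."""
    t = 1.0 / np.linalg.norm(A, 2) ** 2        # constant stepsize t <= 1/||A||^2
    z = np.zeros(A.shape[1]); x = np.zeros(A.shape[1])
    for _ in range(iters):
        z -= t * (A.T @ (A @ x - b))           # all rows at once instead of one row
        x = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)   # x = S_lam(z)
    return x
```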

SLIDE 34

Outline: The Kaczmarz method · Randomization · Sparsity · Split feasibility problems · Convergence rates

SLIDE 35

Convex split feasibility problems

Split feasibility problem (SFP): find $x$ such that
$$x \in \bigcap_{i=1}^{N_C} C_i, \qquad A_i x \in Q_i, \quad i = 1, \dots, N_Q,$$
with $C_i, Q_i$ convex sets and $A_i$ linear.

For a mere "feasibility problem": do alternating projections
$$x_{k+1} = P_{C_i}(x_k), \qquad i = (k \bmod N_C) + 1 \quad \text{("control sequence")}$$

[1933 von Neumann (two subspaces), 1962 Halperin (several subspaces), Dykstra, Censor, Combettes, Bauschke, Borwein, Deutsch, Lewis, Luke, …]
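For the plain feasibility case, alternating projections take only a few lines; a sketch with two simple convex sets (a box and a hyperplane; the sets and code are illustrative):

```python
import numpy as np

# Feasibility problem: find x in a box C1 with <c, x> = d (a hyperplane C2).
def project_box(x, lo=-1.0, hi=1.0):
    return np.clip(x, lo, hi)

def project_hyperplane(x, c, d):
    return x - (c @ x - d) / (c @ c) * c

c = np.array([1.0, 2.0, -1.0]); d = 0.5
x = np.array([5.0, -3.0, 2.0])
for _ in range(100):                 # cyclic control sequence over the two sets
    x = project_hyperplane(project_box(x), c, d)
print(np.abs(project_box(x) - x).max(), c @ x - d)   # both ~ 0: x is (nearly) feasible
```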

SLIDE 38

Tackling split feasibility problems

Projecting onto $\{x \mid Ax \in Q\}$: project onto the separating hyperplane
$$H_k = \{x \mid \langle Ax_k - P_Q(Ax_k),\, Ax - P_Q(Ax_k)\rangle \le 0\}$$
(separates $x_k$ from $\{x \mid Ax \in Q\}$).

• $x_{k+1} = P_{C_i}(x_k)$ for a constraint $x \in C_i$
• $x_{k+1} = P_{H_k}(x_k)$ for a constraint $A_i x \in Q_i$

Converges to a feasible point. E.g. $Q = \{b\}$:
$$x_{k+1} = x_k - t_k A^T(Ax_k - b) \;\to\; \text{minimum-norm solution of } Ax = b$$

SLIDE 45

Towards sparse solutions with Bregman projections

• $P_C(x) = \operatorname{argmin}_{y \in C} \|x - y\|_2$: orthogonal projection
• $J : X \to \mathbb{R}$ convex, $z \in \partial J(x)$:
$$D_J^z(x, y) = J(y) - J(x) - \langle z, y - x\rangle \qquad \text{(Bregman distance)}$$
[Figure: $D_J^z(x, y)$ as the gap between $J(y)$ and the linearization of $J$ at $x$ with slope $z$.]
• Bregman projection:
$$\Pi_C^z(x) = \operatorname*{argmin}_{y \in C} D_J^z(x, y)$$

SLIDE 46

Bregman projections

Assume $J : \mathbb{R}^n \to \mathbb{R}$ continuous and $\alpha$-strongly convex ($\Rightarrow \nabla J^*$ is $\alpha^{-1}$-Lipschitz). Bregman projections onto hyperplanes $H = \{x \mid a^T x = \beta\}$ are simple: if $z \in \partial J(x)$,
$$\Pi_H^z(x) = \nabla J^*(z - \bar t a), \qquad \bar t = \operatorname*{argmin}_t\; J^*(z - ta) + t\beta.$$
Moreover, $z - \bar t a \in \partial J(\Pi_H^z(x))$: a new subgradient at $\Pi_H^z(x)$.

RBPSFP: random Bregman projections for the SFP $x \in \bigcap C_i$, $A_i x \in Q_i$:
• Initialize $z_0 \in \partial J(x_0)$
• $x_{k+1} = \Pi_{C_i}^{z_k}(x_k)$ or $x_{k+1} = \Pi_{H_i}^{z_k}(x_k)$; update $z_k \in \partial J(x_k)$
• random: every index appears infinitely often
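For $J(x) = \lambda\|x\|_1 + \tfrac12\|x\|_2^2$ one has $J^*(z) = \tfrac12\|S_\lambda(z)\|_2^2$ and $\nabla J^* = S_\lambda$, so the Bregman projection onto a hyperplane reduces to a one-dimensional convex problem in $t$. A sketch, using scipy's generic 1-D minimizer in place of the specialized piecewise-quadratic solver mentioned later:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def S(z, lam):                                  # soft thresholding = grad J*
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def bregman_project_hyperplane(z, a, beta, lam):
    """Exact Bregman projection onto {x : <a, x> = beta} for
    J(x) = lam*||x||_1 + 0.5*||x||_2^2, using J*(z) = 0.5*||S_lam(z)||^2."""
    phi = lambda t: 0.5 * np.sum(S(z - t * a, lam) ** 2) + t * beta
    t_bar = minimize_scalar(phi).x              # 1-D convex, piecewise quadratic in t
    z_new = z - t_bar * a                       # new subgradient: z_new in dJ(x_new)
    return S(z_new, lam), z_new

z = np.array([2.0, -0.3, 1.5]); a = np.ones(3); beta = 1.0
x_new, z_new = bregman_project_hyperplane(z, a, beta, lam=0.5)
print(a @ x_new - beta)                         # ~ 0: x_new lies on the hyperplane
```

At the minimizer, $\varphi'(\bar t) = \beta - \langle a, S_\lambda(z - \bar t a)\rangle = 0$, which is exactly the hyperplane constraint for the new iterate.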

SLIDE 47

Convergence

Theorem [Schöpfer, L., Wenger 2014]: RBPSFP converges to a feasible point $\bar x \in C := \bigcap C_i \cap \{x \mid A_i x \in Q_i\}$.

Application to $\min J(x)$ s.t. $Ax = b$. Multiple possibilities, e.g.
• 1. only one "difficult constraint": $Ax \in Q = \{b\}$
• 2. many simple constraints $C_i = \{x \mid a_i^T x = b_i\}$

In both cases: convergence to the minimum-$J$ solution.

SLIDE 50

Sparse solutions

• $J(x) = \lambda\|x\|_1$ does not work: not strongly convex
• $J(x) = \lambda\|x\|_1 + \tfrac12\|x\|_2^2$: strongly convex with constant 1
• Bregman projection onto hyperplanes $H = \{x \mid a^T x = \beta\}$: if $z \in \partial J(x)$,
$$\Pi_H^z(x) = \nabla J^*(z - \bar t a), \qquad \bar t = \operatorname*{argmin}_t\; J^*(z - ta) + t\beta$$
• $\nabla J^*(x) = (\partial J)^{-1}(x) = S_\lambda(x)$ [Figure: graphs of $J$, $\partial J$ and $\nabla J^* = S_\lambda$ with kinks at $\pm\lambda$]

SLIDE 54

Basic algorithm and special cases

• Variant 1: one difficult constraint $Ax = b$
• Variant 2: many simple constraints $a_r^T x = b_r$
• In general: block processing $A_r x = b_r$

Iteration: calculate
$$z_{k+1} = z_k - t_k A_r^T w_k, \qquad x_{k+1} = \nabla J^*(z_{k+1})$$
with appropriate stepsize $t_k$ (depending on $w_k$ and $\beta_k$).

Special cases:
• $J(x) = \|x\|_2^2/2$, variant 1: Landweber iteration
• $J(x) = \|x\|_2^2/2$, variant 2: Kaczmarz method
• $J(x) = \lambda\|x\|_1 + \|x\|_2^2/2$, variant 1: linearized Bregman
• $J(x) = \lambda\|x\|_1 + \|x\|_2^2/2$, variant 2: sparse Kaczmarz

SLIDE 57

Inexact stepsizes are allowed

Instead of projecting exactly, it suffices to move close enough. Linearized Bregman:
$$t_k = \frac{\|Ax_k - b\|^2}{\|A^T(Ax_k - b)\|^2}, \qquad \text{or} \qquad t_k \le \frac{1}{\|A\|^2}.$$
However: to compute the exact stepsize, one solves a one-dimensional piecewise quadratic optimization problem (for $J(x) = \lambda\|x\|_1 + \|x\|_2^2/2$ this can be done in $O(n \log n)$, usually faster).
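A small experiment in the spirit of the comparison on the next slide: linearized Bregman with the constant step $t \le 1/\|A\|^2$ versus the dynamic step $t_k = \|Ax_k - b\|^2 / \|A^T(Ax_k - b)\|^2$. An illustrative numpy sketch with made-up test data:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((100, 200))
x_true = np.zeros(200)
x_true[rng.choice(200, 10, replace=False)] = rng.standard_normal(10)
b = A @ x_true
lam = 2.0
t_const = 1.0 / np.linalg.norm(A, 2) ** 2

def run(dynamic, iters=500):
    z = np.zeros(200); x = np.zeros(200)
    for _ in range(iters):
        r = A @ x - b
        g = A.T @ r
        t = (r @ r) / (g @ g) if dynamic and g @ g > 0 else t_const
        z -= t * g
        x = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
    return np.linalg.norm(A @ x - b)

print(run(False), run(True))    # the dynamic step is usually markedly faster
```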

SLIDE 58

Stepsize comparison: linearized Bregman

[Figure: $\|x_k - x^\dagger\|$ over iterations, constant stepsize vs. exact stepsize.]
$A \in \mathbb{R}^{1000 \times 2000}$ with Gaussian distributed entries, $x^\dagger$ with 20 non-zeros (also Gaussian distributed).

SLIDE 59

Outline: The Kaczmarz method · Randomization · Sparsity · Split feasibility problems · Convergence rates

SLIDE 60

Convergence rates for RBPSFP

Theorem (Schöpfer, L. 2018): RBPSFP with $C = \bigcap C_i \cap \{x \mid A_i x \in Q_i\}$ converges with a rate
$$\mathbb{E}(\operatorname{dist}(x_k, C)) = O(1/\sqrt{k})$$
if $\{C_i\}_i$ and each $\{Q_i, \operatorname{rg}(A_i)\}$ is boundedly linearly regular and $J$ is strongly convex. If, additionally, $J$ is piecewise linear-quadratic, then the method converges linearly, i.e. $\mathbb{E}(\operatorname{dist}(x_k, C)) = O(q^k)$.

Proof based on error bounds.

Corollary: the randomized sparse Kaczmarz (RaSK) method converges linearly.

SLIDE 61

Tailored results for randomized sparse Kaczmarz

Theorem (Schöpfer, L. 2018): For RaSK with exact steps (ERaSK) applied to a consistent overdetermined system $Ax = b$ it holds that
$$\mathbb{E}(\|x_k - \hat x\|_2) \le (1 - \epsilon)^{k/2} \sqrt{2\lambda\|\hat x\|_1 + \|\hat x\|_2^2}$$
with
$$\epsilon = \frac{\tilde\sigma_{\min}^2(A)}{2\|A\|_F^2} \cdot \frac{|\hat x|_{\min}}{|\hat x|_{\min} + 2\lambda},$$
where $\tilde\sigma_{\min} = \min\{\sigma_{\min}(A_J) \mid A_J \ne 0 \text{ submatrix}\}$ and $|\hat x|_{\min} = \min\{|\hat x_j| \mid \hat x_j \ne 0\}$.

SLIDE 62

Randomized sparse Kaczmarz for noisy data

Following [Needell 2010] and [Lai, Yin 2013]:

Theorem: For $Ax = b^\delta$ with $\|b^\delta - b\|_2 \le \delta$ it holds for RaSK that
$$\mathbb{E}(\|x_k - \hat x\|_2) \le (1 - \epsilon)^{k/2} \sqrt{2\lambda\|\hat x\|_1 + \|\hat x\|_2^2} + \sqrt{\frac{2|\hat x|_{\min} + 4\lambda}{|\hat x|_{\min}}}\, \frac{\delta}{\tilde\sigma_{\min}(A)},$$
and for ERaSK the upper bound is
$$(1 - \epsilon)^{k/2} \sqrt{2\lambda\|\hat x\|_1 + \|\hat x\|_2^2} + \sqrt{\frac{2|\hat x|_{\min} + 4\lambda}{|\hat x|_{\min}}}\, \frac{\delta}{\tilde\sigma_{\min}(A)} \sqrt{1 + 4\|A\|_{2,1}\,\delta}.$$

SLIDE 63

Sparsity also helps for overdetermined systems

200 columns, 1000 rows, consistent system $Ax = b$, unique solution $x^\dagger$, $\operatorname{nnz}(x^\dagger) = 25$.
[Figure: $\|Ax_k - b\|/\|b\|$ and $\|x_k - \hat x\|/\|\hat x\|$ over $k$. Black: randomized Kaczmarz; red: randomized sparse Kaczmarz; green: exact-step randomized sparse Kaczmarz.]

200 columns, 1000 rows, inconsistent system $Ax = b$, 10% relative error.
[Figure: $\|Ax_k - b^\delta\|/\|b^\delta\|$ and $\|x_k - \hat x\|/\|\hat x\|$ over $k$, same color coding.]

SLIDE 65

Randomization also helps

Matrix from fan-beam CT, consistent system $Ax = b$, unique solution $x^\dagger$:
• 100 columns, 1164 rows, $\operatorname{nnz}(x^\dagger) = 20$
• 900 columns, 3660 rows, $\operatorname{nnz}(x^\dagger) = 180$
[Figure: $\|Ax_k - b\|/\|b\|$ and $\|x_k - \hat x\|/\|\hat x\|$ over $k$. Blue: sparse Kaczmarz; red: randomized sparse Kaczmarz.]

SLIDE 67

Conclusion

• Randomization gives uniform expected progress, hence convergence rates
• Randomization usually improves convergence (random reshuffling also works)
• The extension to sparse solutions is simple; the exact stepsize matters, though
• Convergence of RaSK and ERaSK is linear
• Exact steps are faster, but reach lower accuracy for noisy data

SLIDE 68

References

• Dirk A. Lorenz, Frank Schöpfer, and Stephan Wenger, The linearized Bregman method via split feasibility problems: Analysis and generalizations, SIAM Journal on Imaging Sciences 7 (2014), no. 2, 1237–1262.
• Dirk A. Lorenz, Stephan Wenger, Frank Schöpfer, and Marcus Magnor, A sparse Kaczmarz solver and a linearized Bregman method for online compressed sensing, 2014 IEEE International Conference on Image Processing (ICIP), IEEE, 2014, pp. 1347–1351.
• Frank Schöpfer and Dirk A. Lorenz, Linear convergence of the randomized sparse Kaczmarz method, to appear in Mathematical Programming, 2018.