SLIDE 1

Surfing: Iterative Optimization Over Incrementally Trained Deep Networks

Ganlin Song, Zhou Fan, John Lafferty

Department of Statistics and Data Science, Yale University

SLIDE 2

Background

We consider inverting a trained generative network $G$ by solving
$$\min_x f(x) = \min_x \|G(x) - y\|^2$$

[Figures: Generative Model; Invert a Generator]
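A minimal sketch of this inversion by plain gradient descent, assuming a pretrained PyTorch generator G that maps a latent code of dimension latent_dim to an image; the generator interface, step size, and iteration count are illustrative assumptions, not the authors' setup:

    import torch

    def invert(G, y, latent_dim, steps=500, lr=0.05):
        # Gradient descent on f(x) = ||G(x) - y||^2 over the latent code x.
        x = torch.zeros(1, latent_dim, requires_grad=True)
        opt = torch.optim.SGD([x], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = ((G(x) - y) ** 2).sum()
            loss.backward()
            opt.step()
        return x.detach()

As the next slides note, f is non-convex in x, so this plain descent can stall at a spurious stationary point; surfing is aimed at exactly that difficulty.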

SLIDE 3

Background

  • Compressed sensing framework (Bora, Jalal, Price & Dimakis 2017): observe $z = Ay + \epsilon$; recover $y$ by
    $$\min_x f(x) = \min_x \|AG(x) - z\|^2$$

SLIDE 4

Background

  • $f(x)$ is non-convex, so gradient descent is not guaranteed to reach the global optimum (see the sketch below)
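The compressed-sensing variant only changes the objective. A hedged sketch, assuming G's output is flattened to a vector so that a measurement matrix A can be applied (the interface and step size are again illustrative):

    import torch

    def invert_cs(G, A, z, latent_dim, steps=500, lr=0.05):
        # Gradient descent on f(x) = ||A G(x) - z||^2 over the latent code x.
        x = torch.zeros(1, latent_dim, requires_grad=True)
        opt = torch.optim.SGD([x], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            residual = A @ G(x).reshape(-1) - z
            loss = (residual ** 2).sum()
            loss.backward()
            opt.step()
        return x.detach()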

SLIDE 5

Motivation

Landscape of $x \mapsto -f_\theta(x) = -\|G_\theta(x) - y\|^2$ as the weights $\theta$ are trained

SLIDE 6

Algorithm

Intuition

  • The landscape for the initial random network is “nice”
  • Initialize with a random network and track the optimum for the intermediate networks

Surfing Algorithm

  • Obtain a sequence of parameters $\theta_0, \theta_1, \ldots, \theta_T$ during training
  • Optimize the empirical risk functions $f_{\theta_0}, f_{\theta_1}, \ldots, f_{\theta_T}$ iteratively using gradient descent
  • For each $t \in \{1, \ldots, T\}$, initialize gradient descent at the solution from time $t - 1$ (a sketch of this procedure follows below)
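A minimal sketch of surfing as described above, assuming the training loop saved a list checkpoints of weight snapshots and that G_theta(x, theta) evaluates the generator at the given weights; these names and the inner loop's step size are illustrative assumptions, not the authors' implementation:

    import torch

    def surf(G_theta, checkpoints, y, latent_dim, steps=200, lr=0.05):
        # checkpoints = [theta_0, theta_1, ..., theta_T], with theta_0 the random init.
        # For each theta_t, run gradient descent on f_{theta_t}(x) = ||G_{theta_t}(x) - y||^2,
        # warm-started at the minimizer found for theta_{t-1}.
        x = torch.zeros(1, latent_dim)
        for theta in checkpoints:
            x = x.clone().requires_grad_(True)
            opt = torch.optim.SGD([x], lr=lr)
            for _ in range(steps):
                opt.zero_grad()
                loss = ((G_theta(x, theta) - y) ** 2).sum()
                loss.backward()
                opt.step()
            x = x.detach()
        return x

The final x is then a candidate minimizer of $f_{\theta_T}$, the objective for the fully trained network.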

SLIDE 7

Theory and Experiments

Theoretical Results

  1. If $G_\theta$ has random parameters, then with high probability all critical points of $f_\theta(x)$ lie in a small neighborhood of 0 (builds on Hand & Voroninski 2017; restated schematically below)
  2. Under certain conditions, modified surfing can track the minimizer
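Schematically, result 1 says that with high probability the stationary points of $f_\theta$ are confined to a small ball around the origin; the radius $\rho$ and the bound $\delta$ below are placeholders, not the paper's explicit constants and conditions:
$$\Pr\bigl[\,\{x : \nabla f_\theta(x) = 0\} \subseteq B(0, \rho)\,\bigr] \;\ge\; 1 - \delta \quad \text{for small } \rho,\ \delta > 0$$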

SLIDE 8

Theory and Experiments

Experiments

For a DCGAN trained on Fashion-MNIST, experiments consider the two objectives
$$\min_x \|G_\theta(x) - G_\theta(x_0)\|^2 \qquad \text{and} \qquad \min_x \|AG_\theta(x) - Ay\|^2$$
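A hedged sketch of how such targets might be formed, reusing the hypothetical G_theta, checkpoints, and surf from the sketch after Slide 6; the latent dimension, measurement count, and Gaussian A are assumptions for illustration:

    import torch

    latent_dim, m = 100, 200                     # assumed DCGAN latent size and measurement count
    x0 = torch.randn(1, latent_dim)              # held-out latent code defining the target
    y = G_theta(x0, checkpoints[-1]).detach()    # target image G_theta(x0) from the trained network

    # First objective: recover x0 by surfing on ||G_theta(x) - y||^2.
    x_hat = surf(G_theta, checkpoints, y, latent_dim)

    # Second objective: Gaussian measurements z = A y; surfing would use the
    # compressed-sensing loss ||A G_theta(x) - z||^2 in the same warm-started loop.
    A = torch.randn(m, y.numel()) / m ** 0.5
    z = A @ y.reshape(-1)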
