
SLIDE 1

Some Recent Advances in Non-convex Optimization

Purushottam Kar IIT KANPUR

SLIDE 2

Outline of the Talk

  • Recap of Convex Optimization
  • Why Non-convex Optimization?
  • Non-convex Optimization: A Brief Introduction
  • Robust Regression: A Non-convex Approach
  • Robust Regression: Application to Face Recognition
  • Robust PCA: A Sketch and Application to Foreground Extraction in Images
SLIDE 3

Recap of Convex Optimization

SLIDE 4

Convex Optimization

(figures: a convex function and a convex set)

SLIDE 5

Examples

  • Linear Programming
  • Quadratic Programming
  • Semidefinite Programming

SLIDE 6

Applications

  • Resource Allocation
  • Classification
  • Regression
  • Clustering/Partitioning
  • Signal Processing
  • Dimensionality Reduction

SLIDE 7

Techniques

  • Projected (Sub)gradient Methods
    • Stochastic, mini-batch variants
    • Primal, dual, primal-dual approaches
    • Coordinate update techniques
  • Interior Point Methods
    • Barrier methods
    • Annealing methods
  • Other Methods
    • Cutting plane methods
    • Accelerated routines
    • Proximal methods
    • Distributed optimization
    • Derivative-free optimization
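As a concrete illustration of the first family above, here is a minimal projected gradient descent sketch (illustrative Python, not part of the original slides): take a gradient step on a smooth convex objective, then project back onto the convex feasible set. The L2-ball projection is just one example of a feasible set.

```python
import numpy as np

def project_l2_ball(w, radius=1.0):
    """Euclidean projection onto the L2 ball of a given radius."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def projected_gradient_descent(grad, w0, project, step=0.1, iters=500):
    """Projected gradient descent: gradient step, then project back
    onto the (convex) feasible set."""
    w = w0
    for _ in range(iters):
        w = project(w - step * grad(w))
    return w
```

For example, minimizing ||w - c||^2 over the unit ball with c outside the ball converges to the boundary point closest to c.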
SLIDE 8

Why Non-convex Optimization?

SLIDE 9

Gene Expression Analysis

www.tes.com

DNA micro-array gene expression data

SLIDE 10

Recommender Systems


SLIDE 11

Image Reconstruction and Robust Face Recognition

(figure: each test face is approximated as a weighted combination of reference faces, with weights such as 0.05 + 0.90 + 0.05, 0.01 + 0.92 + 0.07, and 0.15 + 0.65 + 0.20)

SLIDE 12

Image Denoising and Robust Face Recognition


SLIDE 13

Large Scale Surveillance

  • Foreground-background separation


www.extremetech.com

SLIDE 14

Non-convex Optimization

  • Sparse Recovery
  • Robust PCA
  • Robust Regression
  • Matrix Completion

SLIDE 15

Non-convex Optimization: A Brief Introduction

SLIDE 16

Relaxation-based Techniques

  • “Convexify” the feasible set
SLIDE 17

Alternating Minimization

Matrix Completion, Robust PCA … also Robust Regression, coming up

slide-18
SLIDE 18

Projected Gradient Descent

Sparse Recovery

Projection step: keep the top t elements by magnitude (sparse recovery), or perform an l-truncated SVD (low-rank recovery).
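The sparse-recovery case can be sketched as iterative hard thresholding: projected gradient descent where the "projection" keeps the top t entries by magnitude (illustrative Python; the step size and iteration count below are assumptions).

```python
import numpy as np

def hard_threshold(w, k):
    """Projection onto k-sparse vectors: keep the top-k entries
    by magnitude, zero out the rest."""
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]
    out[idx] = w[idx]
    return out

def iht(X, y, k, step=None, iters=200):
    """Projected gradient descent for sparse recovery (IHT):
    gradient step on the least-squares loss, then hard-threshold."""
    n, d = X.shape
    if step is None:
        step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1/L with L = ||X||_2^2
    w = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ w - y)
        w = hard_threshold(w - step * grad, k)
    return w
```

The projection set (k-sparse vectors) is non-convex, which is exactly what distinguishes this from the convex case.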

SLIDE 19

Pursuit and Greedy Methods

Set of “atoms”

Sparse Recovery
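A standard greedy pursuit over a set of atoms is Orthogonal Matching Pursuit; here is a minimal sketch (illustrative Python, assuming the sparsity level k is known): repeatedly pick the atom most correlated with the current residual, then refit by least squares on the selected atoms.

```python
import numpy as np

def omp(X, y, k):
    """Orthogonal Matching Pursuit: greedily pick the atom (column of X)
    most correlated with the residual, then refit by least squares."""
    n, d = X.shape
    support, residual = [], y.copy()
    for _ in range(k):
        corr = np.abs(X.T @ residual)
        corr[support] = -np.inf            # never pick an atom twice
        support.append(int(np.argmax(corr)))
        w_s, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ w_s
    w = np.zeros(d)
    w[support] = w_s
    return w
```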

SLIDE 20

Robust Regression: A Non-convex Approach

SLIDE 21

Linear Regression

image.frompo.com

SLIDE 24

Linear Regression with Noise

Residual

SLIDE 29

Linear Regression with Corruptions

www.toonvectors.com

SLIDE 30

Robust Regression

Corruptions are adversarial, adaptive, but only on a “few” locations

SLIDE 31

Robust Regression

Corruptions are adversarial, adaptive, but only on a “few” locations. Attempt 1.
SLIDE 34

Robust Regression

Corruptions are adversarial, adaptive, but only on a “few” locations. Attempt 2 [Wright and Ma, 2010; Nguyen et al., 2013].

SLIDE 35

Lessons from History

If among these errors are some which appear too large to be admissible, then those equations which produced these errors will be rejected, as coming from too faulty experiments, and the unknowns will be determined by means of the other equations, which will then give much smaller errors

Adrien-Marie Legendre, On the Method of Least Squares, 1805

SLIDE 36

Linear Regression with Corruptions
SLIDE 39

Linear Regression with Corruptions

TORRENT-FC

Thresholding Operator-based Robust RegrEssioN meThod [Bhatia et al, 2015]

SLIDE 40

TORRENT in Action!

SLIDE 63

Alt-Min in Theory

Robust against adaptive adversaries: the adversary has access to the data, the gold model, and the noise. Requirement: the data needs to satisfy some “nice” properties, and enough data needs to be present. Guarantee: TORRENT will recover the gold model if these conditions hold.

Recovery Guarantees

SLIDE 65

Alt-Min in Theory

Linear rate of convergence. Suppose each alternation ≡ one step. After T = log(1/ε) time steps, the model estimate is ε-accurate. Invariant: at each time step t, a condition on the “active set” is maintained.

Convergence Rates

SLIDE 73

Alt-Min in Practice

[Bhatia et al 2015]

Quality of Recovery

SLIDE 74

Alt-Min in Practice

[Bhatia et al 2015]

Speed of Recovery

SLIDE 75

Robust Regression: Application to Face Recognition

Extended Yale B dataset, 38 people, 800 images

SLIDE 76

Face Recognition

(results shown at 10%, 30%, 50%, and 70% noise) [Bhatia et al 2015]

SLIDE 78

Robust PCA: A Sketch and Application to Foreground Extraction in Images

SLIDE 79

The Alternating Projection Procedure

[Netrapalli et al 2014]
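A simplified sketch in the spirit of an alternating projection procedure for robust PCA (illustrative Python; it assumes the rank r and the number k of corrupted entries are known, and uses a plain top-k thresholding rule rather than the exact threshold schedule of [Netrapalli et al 2014]): alternate between a rank-r truncated SVD for the low-rank part and hard thresholding of the residual for the sparse part.

```python
import numpy as np

def robust_pca_altproj(M, r, k, iters=30):
    """Alternate between (1) projecting M - S onto rank-r matrices via a
    truncated SVD, and (2) keeping the k largest-magnitude residual
    entries as the sparse part S."""
    S = np.zeros_like(M)
    L = np.zeros_like(M)
    for _ in range(iters):
        U, sig, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :r] * sig[:r]) @ Vt[:r]          # rank-r projection
        R = M - L
        thresh = np.partition(np.abs(R).ravel(), -k)[-k]
        S = np.where(np.abs(R) >= thresh, R, 0.0)  # top-k entries of residual
    return L, S
```

In the foreground-extraction application, L models the static background and S the sparse foreground.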

SLIDE 81

Concluding Comments

Non-convex optimization is an exciting area with widespread applications.

  • Much better modelling of problems
  • Much more scalable algorithms
  • Provable guarantees

So …

  • Full of opportunities
  • Full of challenges
SLIDE 82

Acknowledgements

Portions of this talk were based on joint work with:

  • Kush Bhatia, Microsoft Research
  • Prateek Jain, Microsoft Research
  • Ambuj Tewari, U. Michigan, Ann Arbor

http://research.microsoft.com/en-us/projects/altmin/default.aspx

SLIDE 83

The Data Sciences Gang@IITK

Arnab Bhattacharya, Medha Atre, Sumit Ganguly, Purushottam Kar, Harish Karnick, Vinay Namboodiri, Piyush Rai, Indranil Saha, Gaurav Sharma, Sandeep Shukla

SLIDE 84

Our Strengths

  • Machine Learning
  • Databases, Data Mining
  • Online, Streaming Algorithms
  • Vision, Image Processing
  • Cyber-physical Systems

SLIDE 85

Questions?

SLIDE 86
  • TORRENT indeed performs Alt-Min
  • Two variables in TORRENT – the active set and the model
  • The active set encodes the complement of the corruption vector
  • TORRENT alternates between
    • Fixing the model and choosing the active set
    • Fixing the active set and choosing the model
  • Both steps reduce the residual as much as possible

TORRENT as an Alt-Min Procedure
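Based on the description above, the alternation can be sketched as follows (illustrative Python, not the authors' reference implementation; the corruption-fraction bound beta and the iteration count are assumptions): fix the active set and refit the model by least squares, then fix the model and keep the points with smallest residuals as the new active set.

```python
import numpy as np

def torrent_fc(X, y, beta, iters=30):
    """TORRENT-FC-style alternating minimization:
    (1) fix the active set, fit the model by least squares on it;
    (2) fix the model, choose as active set the (1 - beta) * n points
        with smallest absolute residuals (hard thresholding)."""
    n, d = X.shape
    keep = n - int(beta * n)            # size of the active set
    active = np.arange(n)               # start with all points active
    for _ in range(iters):
        w, *_ = np.linalg.lstsq(X[active], y[active], rcond=None)
        residuals = np.abs(y - X @ w)
        active = np.argsort(residuals)[:keep]
    return w
```

Both steps reduce the residual on the current active set, matching the Alt-Min view above.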

SLIDE 87

TORRENT-GD

Linear Regression with Corruptions

Thresholding Operator-based Robust RegrEssioN meThod [Bhatia et al, 2015]

SLIDE 88

TORRENT-HYB

Linear Regression with Corruptions

Thresholding Operator-based Robust RegrEssioN meThod [Bhatia et al, 2015]