SLIDE 1

Stable and Efficient Representation Learning with Nonnegativity Constraints

Tsung-Han Lin and H.T. Kung

SLIDE 2

Unsupervised Representation Learning

[Diagram: input → Layer 1 Representation → Layer 2 Representation → Layer 3 Representation]

Each layer encodes its input against a large dictionary using a sparse encoder (e.g., l1-regularized sparse coding); a minimal sketch of the stacking follows below.
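A minimal sketch of this stacked pipeline, assuming a generic `sparse_encode(signal, dictionary)` function (hypothetical here; later slides instantiate it with OMP/NOMP) and one dictionary per layer:

```python
import numpy as np

def encode_layers(x, dictionaries, sparse_encode):
    """Run an input through a stack of (dictionary, sparse encoder) layers.

    Each layer's sparse code becomes the next layer's input, yielding the
    Layer 1, Layer 2, ... representations of the diagram.
    """
    representations = []
    rep = np.asarray(x, dtype=float)
    for D in dictionaries:
        rep = sparse_encode(rep, D)  # e.g., l1 sparse coding, OMP, or NOMP
        representations.append(rep)
    return representations
```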

SLIDE 3

Why Sparse Representations?

  • Prior knowledge is better encoded into sparse representations
    – Data is explained by only a few underlying factors
    – Representations are more linearly separable

[Illustration: two classes plotted against Feature A and Feature B]

  • Simplifies supervised classifier training: sparse representations work well even when labeled samples are few

SLIDE 4

Computing Sparse Representations

Sparse approximation: represent an input x as a weighted sum of a few dictionary atoms, x ≈ Dz with z sparse.

[Illustration: an input patch decomposed as 0.5 × (atom 1) + 0.3 × (atom 2)]
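A tiny numeric version of the decomposition the illustration shows (the atoms and coefficients here are made up for illustration):

```python
import numpy as np

# Hypothetical dictionary: three unit-norm atoms as columns.
D = np.array([[1.0, 0.0, 0.6],
              [0.0, 1.0, 0.8]])

# A sparse code: only two of the three coefficients are nonzero.
z = np.array([0.5, 0.3, 0.0])

x = D @ z   # the input is rebuilt from just two atoms
print(x)    # [0.5 0.3]
```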

SLIDE 5

Computing Sparse Representations

Sparse approximation:

  • L1 relaxation approach: good classification accuracy, but computation is expensive
  • Greedy approach (e.g., orthogonal matching pursuit): fast, but yields suboptimal classification accuracy

CIFAR-10 classification with a single-layer architecture [Coates 2011]:

| Encoder        | Classification accuracy (%) |
|----------------|-----------------------------|
| L1-regularized | 78.7                        |
| OMP            | 76.0                        |
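A hedged sketch of the two encoder families using scikit-learn; the estimator settings (alpha, n_nonzero_coefs) are illustrative choices, not values from the paper:

```python
import numpy as np
from sklearn.linear_model import Lasso, OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))   # dictionary: 64-dim signals, 256 atoms
D /= np.linalg.norm(D, axis=0)       # unit-norm atoms
x = rng.standard_normal(64)

# L1 relaxation: solve an l1-regularized least-squares problem (iterative, slow).
z_l1 = Lasso(alpha=0.1, fit_intercept=False, max_iter=10000).fit(D, x).coef_

# Greedy: pick a fixed number of atoms one at a time (fast).
z_omp = OrthogonalMatchingPursuit(n_nonzero_coefs=10,
                                  fit_intercept=False).fit(D, x).coef_

print(np.count_nonzero(z_l1), np.count_nonzero(z_omp))
```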

SLIDE 6

Major Findings

  • Weak stability is the key to OMP’s suboptimal performance
  • By allowing only additive features (via nonnegativity constraints), classification with OMP delivers higher accuracy by large margins
  • Competitive classification accuracy with deep neural networks

SLIDE 7

Stability of Representations

[Diagram: an input x and a noise-perturbed input x + n both pass through the encoder; do the two representations stay close?]

SLIDE 8

Orthogonal Matching Pursuit (OMP)

Goal: select k atoms from a dictionary D that minimize |x − Dz|.

Iteration 1: select the atom that has the largest correlation with the residual and add it to the support set.

[Diagram: input x among atoms d1, d2, d3; d1 enters the support set]

SLIDE 9

Orthogonal Matching Pursuit (OMP)

Goal: select k atoms from a dictionary D that minimize |x − Dz|. Each iteration:

1. Select the atom that has the largest correlation with the residual.
2. Estimate the coefficients of the selected atoms by least squares, giving the current estimate Dz(1).
3. Update the residual r(1) = x − Dz(1).

[Diagram: support set {d1}; residual r(1) among atoms d1, d2, d3]

SLIDE 10

Orthogonal Matching Pursuit (OMP)

Goal: select k atoms from a dictionary D that minimize |x − Dz|.

Iteration 2 repeats the same three steps and the support set grows to {d1, d3}; a from-scratch sketch of the full loop follows below.

[Diagram: support set {d1, d3}; the residual shrinks toward x]
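A minimal from-scratch sketch of the loop these three slides animate, assuming unit-norm dictionary columns (not the authors' code):

```python
import numpy as np

def omp(x, D, k):
    """Greedy OMP: select k atoms of D (unit-norm columns) to approximate x."""
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(k):
        # 1. Atom with the largest (absolute) correlation with the residual.
        correlations = D.T @ residual
        support.append(int(np.argmax(np.abs(correlations))))
        # 2. Least-squares coefficients over the selected atoms.
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        # 3. Residual update from the current estimate Dz.
        residual = x - D[:, support] @ coeffs
    z = np.zeros(D.shape[1])
    z[support] = coeffs
    return z
```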

SLIDE 11

OMP vs. Nonnegative OMP

Nonnegative OMP: use only additive features by constraining the atoms and coefficients to be nonnegative.

  • 1. Larger region for noise tolerance
  • 2. Terminates without overfitting

[Diagram: residual vectors near atoms d1 and d2 with noise n; OMP’s signed selection regions (“+d1”, “−d2”) sit close together, while NOMP’s nonnegative selection region leaves a margin δ for the same noise]
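A matching sketch of nonnegative OMP: the selection rule keeps only positively correlated atoms, and the coefficient fit swaps least squares for scipy's nonnegative least squares; a sketch under those assumptions, not the paper's reference implementation:

```python
import numpy as np
from scipy.optimize import nnls

def nomp(x, D, k):
    """Nonnegative OMP: like OMP, but atoms only add (coefficients >= 0)."""
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(k):
        correlations = D.T @ residual
        best = int(np.argmax(correlations))   # only positive correlations help
        if correlations[best] < 1e-12:
            break                             # no additive atom left: stop early
        support.append(best)
        # Nonnegative least squares over the selected atoms.
        coeffs, _ = nnls(D[:, support], x)
        residual = x - D[:, support] @ coeffs
    z = np.zeros(D.shape[1])
    if support:
        z[support] = coeffs
    return z
```

The early break mirrors the “terminate without overfitting” point above: once no atom can contribute constructively, the encoder stops instead of fitting noise.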

SLIDE 12

Allowing Only Additive Features

[Illustration: an input written as a signed combination of two atoms whose contributions partially cancel]

With signed coefficients, selected atoms can cancel each other out; the sketch below shows such a cancellation.
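A tiny numeric illustration of cancellation, reusing the `omp`/`nomp` sketches above; the two-atom dictionary and the input are made up:

```python
import numpy as np

D = np.column_stack([[0.6, 0.8], [0.8, 0.6]])  # two correlated unit-norm atoms
x = np.array([-0.2, 0.25])                     # a small, awkwardly placed input

print(omp(x, D, 2))   # ~[ 1.14, -1.11]: large opposite-signed terms that cancel
print(nomp(x, D, 2))  # ~[ 0.08,  0.  ]: the additive-only code stops early
```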

SLIDE 13

Allowing Only Additive Features

Enforce nonnegativity at three points to eliminate cancellation (a sketch follows below):

  • On input: sign splitting (each signed dimension becomes two nonnegative ones, a “+” channel value and a “−” channel value)
  • On dictionary: any nonnegative sparse coding algorithm works; we use spherical K-means
  • On representation: encode with nonnegative OMP (NOMP)

[Diagram: a signed 3-dimensional input split into nonnegative “+” and “−” channels]
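A sketch of the input and dictionary steps; `sign_split` is the transform just described, while `spherical_kmeans` is a minimal version (fixed iterations, random init), not the authors' training pipeline:

```python
import numpy as np

def sign_split(x):
    """Split a signed vector into nonnegative '+' and '-' channels."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([np.maximum(x, 0.0), np.maximum(-x, 0.0)])

def spherical_kmeans(X, n_atoms, n_iter=20, seed=0):
    """Learn unit-norm atoms from nonnegative data X of shape (n_samples, dim)."""
    rng = np.random.default_rng(seed)
    D = X[rng.choice(len(X), n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=1, keepdims=True) + 1e-12
    for _ in range(n_iter):
        assign = np.argmax(X @ D.T, axis=1)       # nearest atom by cosine
        for j in range(n_atoms):
            members = X[assign == j]
            if len(members):
                centroid = members.sum(axis=0)
                D[j] = centroid / (np.linalg.norm(centroid) + 1e-12)
    return D.T  # columns are atoms, ready for nomp(x, D, k)
```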

SLIDE 14

Evaluate the Stability of Representations

Procedure:

1. Take grating A and rotate it by some small angle δ to obtain grating B.
2. Encode both with OMP/NOMP against a feature dictionary learned from image datasets.
3. Measure the change as the correlation between representation A and representation B.

Correlation between representations A and B (higher is more stable):

| Encoder | δ = 0 | 0.01π | 0.02π | 0.03π | 0.04π |
|---------|-------|-------|-------|-------|-------|
| OMP     | 1     | 0.71  | 0.54  | 0.43  | 0.34  |
| NOMP    | 1     | 0.92  | 0.80  | 0.68  | 0.57  |
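A sketch of this measurement reusing the `omp`/`nomp` sketches above; the grating generator, the scipy rotation, and the dictionary placeholder `D` are assumptions of this sketch:

```python
import numpy as np
from scipy.ndimage import rotate

def grating(size=32, freq=4):
    """A simple sinusoidal grating image with values in [0, 1]."""
    row = 0.5 + 0.5 * np.sin(np.linspace(0, 2 * np.pi * freq, size))
    return np.tile(row, (size, 1))

def stability(encode, D, delta_deg, k=10):
    a = grating()
    b = rotate(a, delta_deg, reshape=False, mode='nearest')  # grating B
    z_a = encode(a.ravel(), D, k)
    z_b = encode(b.ravel(), D, k)
    return np.corrcoef(z_a, z_b)[0, 1]  # change measured by correlation

# With a learned dictionary D (columns = atoms over flattened 32x32 images):
# print(stability(omp, D, 1.8), stability(nomp, D, 1.8))   # 0.01π rad = 1.8°
```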

SLIDE 15

Classification: NOMP vs OMP

[Plot: classification accuracy on CIFAR-10; NOMP shows a ~3% improvement over OMP]

SLIDE 16

NOMP Outperforms When Fewer Labeled Samples Are Available

[Plot: classification accuracy on CIFAR-10 with fewer labeled training samples]

SLIDE 17

STL-10: 10 classes, 100 labeled samples/class, 96x96 images (airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck)

| Method                               | Accuracy |
|--------------------------------------|----------|
| Hierarchical matching pursuit (2012) | 64.5%    |
| This work                            | 67.9%    |

CIFAR-100: 100 classes, 500 labeled samples/class, 32x32 images (superclasses: aquatic mammals, fish, flowers, food containers, fruit and vegetables, household electrical devices, household furniture, insects, large carnivores, large man-made outdoor things, large natural outdoor scenes, large omnivores and herbivores, medium-sized mammals, non-insect invertebrates, people, reptiles, small mammals, trees, vehicles)

| Method                | Accuracy |
|-----------------------|----------|
| Maxout network (2013) | 61.4%    |
| This work             | 60.1%    |

SLIDE 18

Conclusion

  • A greedy sparse encoder is useful: it gives a scalable unsupervised representation learning pipeline that attains state-of-the-art classification performance
  • The proper choice of encoder is critical: the stability of the encoder is key to the quality of the representations