

SLIDE 1

Compressed Sensing Meets Machine Learning:
Classification of Mixture Subspace Models via Sparse Representation

Allen Y. Yang <yang@eecs.berkeley.edu>
Feb. 25, 2008, UC Berkeley

SLIDE 2

What is Sparsity?

Sparsity: a signal is sparse if most of its coefficients are (approximately) zero.

Figure: 2-D DCT transform. (a) Harmonic functions; (b) magnitude spectrum.

SLIDE 3

Sparsity in the Spatial Domain

Gene microarray data [Drmanac et al. 1993].

SLIDE 4

Sparsity in the Human Visual Cortex [Olshausen & Field 1997, Serre & Poggio 2006]

1. Feed-forward: no iterative feedback loop.
2. Redundancy: on average, 80-200 neurons per feature representation.
3. Recognition: information exchange between stages is not about individual neurons, but about how many neurons fire together as a group.

SLIDE 5

Sparsity and ℓ1-Minimization

The "black gold" age: sparse deconvolution in seismic exploration [Claerbout & Muir 1973; Taylor, Banks & McCoy 1979].

Figure: Deconvolution of a spike train.

SLIDE 6

Sparse Support Estimators

Sparse support estimation [Donoho 1992; Meinshausen & Buhlmann 2006; Yu 2006; Wainwright 2006; Ramchandran 2007; Gastpar 2007].

Basis pursuit [Chen & Donoho 1999]: given y = Ax with x unknown,

   x* = arg min_x ‖x‖₁  subject to  y = Ax.

The Lasso (least absolute shrinkage and selection operator) [Tibshirani 1996]:

   x* = arg min_x ‖y − Ax‖₂  subject to  ‖x‖₁ ≤ k.
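As a concrete illustration (not from the talk), the basis-pursuit program can be recast as a linear program via the standard split x = u − v with u, v ≥ 0 and handed to an off-the-shelf LP solver; the dictionary size and the 2-sparse test signal below are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||x||_1 s.t. y = A x as an LP: with x = u - v and u, v >= 0,
    minimize 1^T(u + v) subject to [A, -A][u; v] = y."""
    d, n = A.shape
    c = np.ones(2 * n)                       # objective: sum(u) + sum(v) = ||x||_1
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y, bounds=(0, None))
    return res.x[:n] - res.x[n:]

# Toy check: recover a 2-sparse x0 from 15 random measurements of a 40-dim signal.
rng = np.random.default_rng(0)
A = rng.standard_normal((15, 40))
x0 = np.zeros(40)
x0[[3, 17]] = [1.5, -2.0]
print(np.allclose(basis_pursuit(A, A @ x0), x0, atol=1e-6))   # expect: True
```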

SLIDE 7

Taking Advantage of Sparsity

What generates sparsity? (d'après Emmanuel Candès) "Measure first, analyze later"; the curse of dimensionality.

1. Numerical analysis: sparsity reduces the cost of storage and computation.
2. Regularization in classification:

Figure: Linear support vector machine (SVM). (a) Decision boundary; (b) maximal margin.

SLIDE 8

Our Contributions

1. Classification via compressed sensing.
2. Performance in face recognition.
3. Extensions: outlier rejection, occlusion compensation.
4. Distributed pattern recognition in sensor networks.

SLIDE 9

Problem Formulation in Face Recognition

1. Notation
   Training: for K classes, collect training samples {v_{1,1}, ..., v_{1,n_1}}, ..., {v_{K,1}, ..., v_{K,n_K}} ⊂ R^D.
   Test: given a new sample y ∈ R^D, solve for label(y) ∈ {1, 2, ..., K}.

2. Construct the R^D sample space via stacking.
   Figure: for a 3-channel 640 × 480 image, D = 3 · 640 · 480 ≈ 10^6.

3. Assume y belongs to Class i [Belhumeur et al. 1997, Basri & Jacobs 2003]:

   y = α_{i,1} v_{i,1} + α_{i,2} v_{i,2} + ... + α_{i,n_i} v_{i,n_i} = A_i α_i,

   where A_i = [v_{i,1}, v_{i,2}, ..., v_{i,n_i}].

SLIDE 10

1. Nevertheless, i is the variable we need to solve for. Global representation:

   y = [A_1, A_2, ..., A_K] [α_1; α_2; ...; α_K] = A x_0.

2. Over-determined system: A ∈ R^{D×n}, where D ≫ n = n_1 + ... + n_K.
   x_0 encodes the membership of y: if y belongs to Subject i,

   x_0 = [0, ..., 0, α_i, 0, ..., 0]^T ∈ R^n.

Problems to face:
  • Solving for x_0 in R^D is intractable.
  • The true solution x_0 is sparse: on average only 1/K of its terms are non-zero.
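A small sketch (with made-up shapes, not the talk's data) of how the global dictionary and the membership-coded coefficient vector fit together:

```python
import numpy as np

# Hypothetical stand-in for real face data: K classes, 4 stacked D-dim samples each.
D, K = 1000, 3
rng = np.random.default_rng(0)
train = {i: [rng.standard_normal(D) for _ in range(4)] for i in range(K)}

# Global dictionary A = [A_1, ..., A_K]: one column per training sample.
A = np.hstack([np.column_stack(train[i]) for i in range(K)])    # shape (D, n)

# Membership coding: a sample from class i is represented by a coefficient
# vector that is zero outside class i's block of columns.
n_i = [len(train[i]) for i in range(K)]
offsets = np.cumsum([0] + n_i)
x0 = np.zeros(A.shape[1])
x0[offsets[1]:offsets[2]] = rng.standard_normal(n_i[1])   # nonzero only in class 1's block
y = A @ x0                                                # y lies in class 1's subspace
```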

SLIDE 11

Dimensionality Reduction

1. Construct a linear projection R ∈ R^{d×D}, where d ≪ D is the feature dimension:

   ỹ := R y = R A x_0 = Ã x_0 ∈ R^d,

   with Ã = RA ∈ R^{d×n}; x_0 is unchanged.

2. Holistic features: Eigenfaces [Turk 1991], Fisherfaces [Belhumeur 1997], Laplacianfaces [He 2005].

3. Partial features.

4. Unconventional features: downsampled faces, random projections.
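A minimal sketch of the random-projection features (dimensions are illustrative; the point is that the same sparse x_0 represents y in the projected space):

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, n = 1000, 54, 120              # illustrative sizes (d = 54 matches the tables)
A = rng.standard_normal((D, n))      # stand-in for the stacked training dictionary

# Random projection R: i.i.d. Gaussian entries, no training and no data needed.
R = rng.standard_normal((d, D)) / np.sqrt(d)

A_tilde = R @ A                      # projected dictionary Ã = RA, shape (d, n)
# For any test sample y = A @ x0, we get R @ y = A_tilde @ x0: x0 is unchanged.
```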

SLIDE 12

ℓ0-Minimization

1. Solve for the sparsest solution via ℓ0-minimization:

   x_0 = arg min_x ‖x‖₀  s.t.  ỹ = Ã x,

   where ‖·‖₀ simply counts the number of nonzero terms.

2. ℓ0-Ball: the ℓ0-ball is not convex, and ℓ0-minimization is NP-hard.

SLIDE 13

ℓ1/ℓ0 Equivalence

1. Compressed sensing: if x_0 is sparse enough, ℓ0-minimization is equivalent to

   (P1)  min ‖x‖₁  s.t.  ỹ = Ã x,   where ‖x‖₁ = |x_1| + |x_2| + ... + |x_n|.

2. ℓ1-Ball: ℓ1-minimization is convex, and its solution equals the ℓ0-minimizer.

3. ℓ1/ℓ0 equivalence [Donoho 2002, 2004; Candès et al. 2004; Baraniuk 2006]: given ỹ = Ã x_0, there exists an equivalence breakdown point (EBP) ρ(Ã) such that if ‖x_0‖₀ < ρ, the ℓ1-solution is unique and x_1 = x_0.

SLIDE 14

ℓ1-Minimization Routines

Matching pursuit [Mallat 1993]
1. Find the vector v_i in Ã most correlated with y: i = arg max_j ⟨y, v_j⟩.
2. Update Ã ← Ã \ {v_i}, set x_i ← ⟨y, v_i⟩, and peel off y ← y − x_i v_i.
3. Repeat until ‖y‖ < ε.

Basis pursuit [Chen 1998]
1. Assume x_0 is m-sparse.
2. Select m linearly independent vectors B_m in Ã as a basis: x_m = B_m† y.
3. Repeatedly swap one basis vector in B_m for another vector in Ã if the swap reduces ‖y − B_m x_m‖.
4. Stop when ‖y − B_m x_m‖₂ < ε.

Quadratic solvers: for noisy measurements ỹ = Ã x_0 + z ∈ R^d with ‖z‖₂ < ε,

   x* = arg min { ‖x‖₁ + λ ‖ỹ − Ã x‖₂ }   [Lasso, second-order cone programming]: more expensive.

MATLAB toolboxes: ℓ1-Magic by Candès at Caltech; SparseLab by Donoho at Stanford; cvx by Boyd at Stanford.
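For concreteness, a minimal matching-pursuit sketch along the lines above (assuming unit-norm dictionary columns; all sizes are illustrative):

```python
import numpy as np

def matching_pursuit(A, y, eps=1e-6, max_iter=100):
    """Greedy matching pursuit: repeatedly peel off the dictionary column most
    correlated with the current residual. Columns of A are assumed unit-norm."""
    x = np.zeros(A.shape[1])
    r = y.astype(float).copy()              # residual
    for _ in range(max_iter):
        if np.linalg.norm(r) < eps:
            break
        corr = A.T @ r                      # <r, v_j> for every column j
        i = np.argmax(np.abs(corr))         # most correlated column
        x[i] += corr[i]                     # accumulate its coefficient
        r -= corr[i] * A[:, i]              # update the residual
    return x
```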

SLIDE 15

Classification

1. Project x_1 onto the face subspaces:

   δ_1(x_1) = [α_1; 0; ...; 0], δ_2(x_1) = [0; α_2; ...; 0], ..., δ_K(x_1) = [0; ...; 0; α_K].   (1)

2. Define the residual r_i = ‖ỹ − Ã δ_i(x_1)‖₂ for Subject i, and assign

   id(y) = arg min_{i=1,...,K} r_i.
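The classification rule is a few lines of code once the ℓ1 solution x_1 is in hand; here `labels[j]`, giving the class of dictionary column j, is assumed bookkeeping:

```python
import numpy as np

def src_classify(A_tilde, y_tilde, x1, labels):
    """Residual-based classification: keep each class's coefficient block in
    turn (the δ_i projection) and pick the class whose reconstruction is
    closest to ỹ."""
    labels = np.asarray(labels)
    residuals = {}
    for i in np.unique(labels):
        delta_i = np.where(labels == i, x1, 0.0)          # zero out other classes
        residuals[i] = np.linalg.norm(y_tilde - A_tilde @ delta_i)
    return min(residuals, key=residuals.get), residuals   # id(y) = arg min_i r_i
```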

SLIDE 16

AR Database, 100 Subjects (Illumination and Expression Variation)

Table I. Nearest Neighbor [%]
Dimension   30    54    130   540
Eigen       68.1  74.8  79.3  80.5
Laplacian   73.1  77.1  83.8  89.7
Random      56.7  63.7  71.4  75.0
Down        51.7  60.9  69.2  73.7
Fisher      83.4  86.8  N/A   N/A

Table II. Nearest Subspace [%]
Dimension   30    54    130   540
Eigen       64.1  77.1  82.0  85.1
Laplacian   66.0  77.5  84.3  90.3
Random      59.2  68.2  80.0  83.3
Down        56.2  67.7  77.0  82.1
Fisher      80.3  85.8  N/A   N/A

Table III. Linear SVM [%]
Dimension   30    54    130   540
Eigen       73.0  84.3  89.0  92.0
Laplacian   73.4  85.8  90.8  95.7
Random      54.1  70.8  81.6  88.8
Down        51.4  73.0  83.4  90.3
Fisher      86.3  93.3  N/A   N/A

Table IV. ℓ1-Minimization [%]
Dimension   30    54    130   540
Eigen       71.1  80.0  85.7  92.0
Laplacian   73.7  84.7  91.0  94.3
Random      57.8  75.5  87.6  94.7
Down        46.8  67.0  84.6  93.9
Fisher      87.0  92.3  N/A   N/A

SLIDE 17

Sparsity vs. Non-sparsity: ℓ1 and SVM decisively outperform NN and NS.

1. Our framework seeks sparsity in the representation of y.
2. SVM seeks sparsity in the decision boundaries on A = [v_1, ..., v_n].
3. NN and NS do not enforce sparsity.

ℓ1-Minimization vs. SVM: the performance of SVM depends on the choice of features.

1. Random projections perform poorly with SVM.
2. ℓ1-minimization guarantees performance convergence across different features.
3. In lower-dimensional feature spaces, Fisher features outperform the rest.

(See Tables III and IV on the previous slide.)

SLIDE 18

Randomfaces

Blessing of dimensionality [Donoho 2000]: in a high-dimensional data space R^D, with overwhelming probability, ℓ1/ℓ0 equivalence holds for a random projection R.

Unconventional properties:
1. Domain independent!
2. Data independent!
3. Fast to generate and compute!

Reference: Yang et al., Feature selection in face recognition: A sparse representation perspective. Berkeley Tech Report, 2007.

SLIDE 19

Variation: Outlier Rejection

ℓ1-coefficients for invalid images.

Outlier rejection: when the ℓ1-solution is not sparse, or not concentrated on one subspace, the test sample is invalid.

Sparsity Concentration Index:

   SCI(x) := ( K · max_i ‖δ_i(x)‖₁ / ‖x‖₁ − 1 ) / (K − 1) ∈ [0, 1].
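The SCI formula transcribes directly; the rejection threshold tau below is an assumed tuning parameter, not a value from the talk:

```python
import numpy as np

def sci(x, labels, K):
    """Sparsity Concentration Index: 1 when all ℓ1 mass lies in one class's
    coefficient block, 0 when it is spread evenly over all K classes."""
    labels = np.asarray(labels)
    max_block = max(np.sum(np.abs(x[labels == i])) for i in range(K))
    return (K * max_block / np.sum(np.abs(x)) - 1.0) / (K - 1)

def is_valid(x1, labels, K, tau=0.5):
    # tau is an assumed tuning parameter, not a value from the talk.
    return sci(x1, labels, K) > tau
```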

SLIDE 20

Variation: Occlusion Compensation

1. Sparse representation + sparse error:

   y = A x + e.

2. Occlusion compensation:

   y = [A | I] [x; e] = B w.
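In code, this just augments the dictionary with an identity block and solves the same ℓ1 program in the longer variable w = [x; e] (a sketch):

```python
import numpy as np

def occlusion_dictionary(A):
    """B = [A | I]: the identity block absorbs sparse per-pixel errors e,
    so that y = A x + e becomes y = B w with w = [x; e]."""
    return np.hstack([A, np.eye(A.shape[0])])

# Solve min ||w||_1 s.t. y = B w (e.g., with basis_pursuit above), then split:
#   n = A.shape[1]; x, e = w[:n], w[n:]
```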

Reference: Wright et al. Robust face recognition via sparse representation. UIUC Tech Report, 2007.

SLIDE 21

Distributed Pattern Recognition

Figure: d-Oracle, distributed object recognition via wireless camera networks.

Key components:
1. Each sensor observes only a partial profile of the event: this demands a global classification framework.
2. Each individual sensor has limited classification ability: sensors become active only when certain events are detected locally.
3. The network configuration is dynamic: the global classifier must adapt to the changing set of active sensors.

SLIDE 22

Problem Formulation for Distributed Action Recognition

Architecture:
  • 8 sensors distributed on the human body.
  • Sensor locations are given and fixed.
  • Each sensor carries a triaxial accelerometer and a biaxial gyroscope.
  • Sampling frequency: 20 Hz.

Figure: Readings from the 8 x-axis accelerometers and x-axis gyroscopes for a stand-kneel-stand sequence.

SLIDE 23

Challenges for Distributed Action Recognition

1. Simultaneous segmentation and classification.
2. Individual sensors are not sufficient to classify all human actions.
3. Sensor failure and network congestion are simulated by running with different subsets of active sensors.
4. Identity independence: the test subject's own examples are excluded from the training data.

Figure: The same actions performed by two subjects.

SLIDE 24

Mixture Subspace Model for Distributed Action Recognition

1. Training samples are segmented manually with correct labels.

2. On each sensor node i, stack the readings into normalized vector form:

   v_i = [x(1), ..., x(h), y(1), ..., y(h), z(1), ..., z(h), θ(1), ..., θ(h), ρ(1), ..., ρ(h)]^T ∈ R^{5h}.

3. Full-body motion:
   Training sample: v = [v_1; ...; v_8].
   Test sample: y = [y_1; ...; y_8] ∈ R^{8·5h}.

4. Mixture subspace model:

   y = [y_1; ...; y_8] = [ [v_1; ...; v_8]_1, ..., [v_1; ...; v_8]_n ] x = A x.
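A sketch of the per-node stacking step (the window length h and the sensor arrays are placeholders; θ and ρ denote the two gyroscope axes):

```python
import numpy as np

def node_feature(accel_xyz, gyro_2ax, h):
    """Stack h samples of 3-axis accelerometer and 2-axis gyroscope readings
    into v_i = [x(1..h), y(1..h), z(1..h), θ(1..h), ρ(1..h)] in R^{5h}.
    accel_xyz: array of shape (h, 3); gyro_2ax: array of shape (h, 2)."""
    v = np.concatenate([accel_xyz[:h].T.ravel(), gyro_2ax[:h].T.ravel()])
    return v / np.linalg.norm(v)             # normalized, as on the slide

def body_feature(node_vectors):
    """Full-body sample: stack all 8 nodes' vectors into R^{8·5h}."""
    return np.concatenate(node_vectors)
```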

SLIDE 25

Localized Classifiers

Distributed sparse representation:

   [y_1; ...; y_8] = [ [v_1; ...; v_8]_1, ..., [v_1; ...; v_8]_n ] x
   ⇔  y_i = (v_{i,1}, ..., v_{i,n}) x  for each node i = 1, ..., 8.

On each sensor node i:
1. Given a (long) test sequence at time t, apply multiple duration hypotheses: y_i ∈ R^{5h}.
2. Choose Fisher features R_i ∈ R^{10×5h}:

   ỹ_i = R_i y_i = R_i A_i x = Ã_i x ∈ R^10.

SLIDE 26

Localized Classifiers

1. Equivalently, define R′_i = (0, ..., R_i, ..., 0), so that

   ỹ_i = (0, ..., R_i, ..., 0) [y_1; ...; y_8]
       = (0, ..., R_i, ..., 0) [ [v_1; ...; v_8]_1, ..., [v_1; ...; v_8]_n ] x
       = R′_i A x ∈ R^10.

2. For all segmentation hypotheses, apply the sparsity concentration index (SCI) threshold σ_1:

   Figure: (a) valid segmentation; (b) invalid segmentation.

   Local sparsity threshold σ_1: if SCI(x) > σ_1, sensor i becomes active and transmits ỹ_i ∈ R^10; ỹ_i provides a segmentation hypothesis at time t with duration h_i.

SLIDE 27

Adaptive Global Classifier

Adaptive classification for a subset of active sensors (suppose sensors 1, ..., L are active at time t with durations h_i). Define the global feature matrix

   R′ = [ R_1 ···  0  ··· 0
           ⋮   ⋱   ⋮      ⋮
           0  ··· R_L ··· 0 ],

so that

   [ỹ_1; ...; ỹ_L] = R′ [y_1; ...; y_8] = R′ [A_1; ...; A_8] x = R′ A x.

Global segmentation: regardless of which L sensors are active, given a global threshold σ_2, if SCI(x) > σ_2, accept y as a global segmentation with the label given by x.

Distributed classification via compressed sensing:
1. Adaptive classification is reformulated via the feature matrix R′.
   Local:  R′ = (0, ..., R_i, ..., 0).
   Global: R′ = the block matrix above.
   In either case, R′ y = R′ A x.
2. The representation x and the training matrix A remain invariant.
3. Segmentation, recognition, and outlier rejection are all unified on x.
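A sketch of assembling R′ for whichever sensors are active (8 nodes and 10-D Fisher features as on the slides; the function and its arguments are illustrative bookkeeping, not code from the talk):

```python
import numpy as np

def global_feature_matrix(R_list, active, col_sizes):
    """Assemble R' for the active sensors: node i's 10 x 5h_i Fisher matrix
    R_list[i] occupies its own block of columns; inactive nodes contribute
    nothing (their columns stay zero and no rows are added)."""
    col_offsets = np.cumsum([0] + list(col_sizes))
    rows = sum(R_list[i].shape[0] for i in active)
    R_prime = np.zeros((rows, col_offsets[-1]))
    r0 = 0
    for i in sorted(active):
        Ri = R_list[i]
        R_prime[r0:r0 + Ri.shape[0], col_offsets[i]:col_offsets[i + 1]] = Ri
        r0 += Ri.shape[0]
    return R_prime
```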

SLIDE 28

Experiment

The algorithm depends on only two parameters, (σ_1, σ_2). All data communicated between the sensors and the base station are 10-D action features.

Precision vs. recall:

Sensors    2     7     2,7   1,2,7  1-3,7,8  1-8
Prec [%]   89.8  94.6  94.4  92.8   94.6     98.8
Rec [%]    65.0  61.5  82.5  80.6   89.5     94.2

Confusion tables: (a) Sensors 1-8; (b) Sensor 7.

Reference: Yang et al., Distributed segmentation and classification of human actions using a wearable motion sensor network. Berkeley Tech Report, 2007.

SLIDE 29

Conclusion

1. Sparsity is important for the classification of high-dimensional data.
2. A new recognition framework via compressed sensing.
3. In a high-dimensional feature space, the choice of an "optimal" feature becomes insignificant.
4. Randomfaces, outlier rejection, and occlusion compensation.
5. Distributed pattern recognition in body sensor networks.

SLIDE 30

Future Directions

  • Distributed camera networks
  • Biosensor networks in health care

SLIDE 31

Acknowledgments

Collaborators:
  • Berkeley: Shankar Sastry, Ruzena Bajcsy
  • UIUC: Yi Ma
  • UT-Dallas: Roozbeh Jafari

MATLAB Toolboxes:
  • ℓ1-Magic by Candès at Caltech
  • SparseLab by Donoho at Stanford
  • cvx by Boyd at Stanford

References:
  • Robust face recognition via sparse representation. Submitted to PAMI, 2008.
  • Distributed segmentation and classification of human actions using a wearable motion sensor network. Berkeley Tech Report, 2007.