
Compressed Sensing Meets Machine Learning: Classification of Mixture Subspace Models via Sparse Representation
Allen Y. Yang
Outline: Introduction | Classification via Sparse Representation | Distributed Pattern Recognition | Conclusion


  1. Compressed Sensing Meets Machine Learning: Classification of Mixture Subspace Models via Sparse Representation. Allen Y. Yang <yang@eecs.berkeley.edu>, Feb. 25, 2008, UC Berkeley.

  2. What is Sparsity? Sparsity: a signal is sparse if most of its coefficients are (approximately) zero. Figure: 2-D DCT transform; (a) harmonic functions, (b) magnitude spectrum.
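
A minimal numerical sketch of the slide's DCT point, assuming SciPy is available; the "image" below is a synthetic smooth surface standing in for a real photograph.

```python
# Sketch: a smooth image has most of its 2-D DCT coefficients near zero,
# i.e. it is (approximately) sparse in the DCT basis.
import numpy as np
from scipy.fft import dctn, idctn

x, y = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64))
img = np.exp(-((x - 0.5) ** 2 + (y - 0.5) ** 2) / 0.02)   # smooth placeholder image

C = dctn(img, norm="ortho")                                # 2-D DCT coefficients
frac_small = np.mean(np.abs(C) < 1e-3 * np.abs(C).max())
print(f"fraction of coefficients that are (approximately) zero: {frac_small:.1%}")

# Keep only the largest 2% of coefficients and reconstruct the image.
thresh = np.quantile(np.abs(C), 0.98)
C_sparse = np.where(np.abs(C) >= thresh, C, 0.0)
img_rec = idctn(C_sparse, norm="ortho")
print("relative reconstruction error:",
      np.linalg.norm(img_rec - img) / np.linalg.norm(img))
```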

  3. Sparsity in the spatial domain: gene microarray data [Drmanac et al. 1993].

  4. Sparsity in the human visual cortex [Olshausen & Field 1997, Serre & Poggio 2006].
     1. Feed-forward: no iterative feedback loop.
     2. Redundancy: on average 80-200 neurons per feature representation.
     3. Recognition: information exchange between stages is not about individual neurons, but rather about how many neurons fire together as a group.

  5. Sparsity and ℓ1-Minimization. The "black gold" age [Claerbout & Muir 1973; Taylor, Banks & McCoy 1979]. Figure: deconvolution of a spike train.

  6. Sparse Support Estimators [Donoho 1992, Meinshausen & Buhlmann 2006, Yu 2006, Wainwright 2006, Ramchandran 2007, Gastpar 2007].
     Basis pursuit [Chen & Donoho 1999]: given y = A x with x unknown,
         x* = arg min ‖x‖_1  subject to  y = A x.
     The Lasso (least absolute shrinkage and selection operator) [Tibshirani 1996]:
         x* = arg min ‖y − A x‖_2  subject to  ‖x‖_1 ≤ k.
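
A sketch of the two programs on this slide, assuming NumPy, SciPy and scikit-learn are installed: basis pursuit cast as a linear program, plus scikit-learn's Lasso for comparison (note scikit-learn solves the penalized form, not the constrained form written above). All sizes and data are illustrative.

```python
# Sketch: basis pursuit (min ||x||_1 s.t. y = A x) as a linear program,
# plus the penalized Lasso from scikit-learn on the same toy problem.
import numpy as np
from scipy.optimize import linprog
from sklearn.linear_model import Lasso

def basis_pursuit(A, y):
    """min ||x||_1 s.t. A x = y, via the standard split x = u - v with u, v >= 0."""
    d, n = A.shape
    c = np.ones(2 * n)                       # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([A, -A])                # equality constraint: A u - A v = y
    res = linprog(c, A_eq=A_eq, b_eq=y,
                  bounds=[(0, None)] * (2 * n), method="highs")
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(0)
n, d, k = 200, 60, 5
A = rng.standard_normal((d, n)) / np.sqrt(d)
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x0

x_bp = basis_pursuit(A, y)
# scikit-learn's Lasso minimizes ||y - A x||^2 / (2 d) + alpha * ||x||_1.
x_lasso = Lasso(alpha=1e-3, fit_intercept=False, max_iter=10000).fit(A, y).coef_
print("basis pursuit error:", np.linalg.norm(x_bp - x0))
print("Lasso error:        ", np.linalg.norm(x_lasso - x0))
```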

  7. Taking Advantage of Sparsity.
     What generates sparsity? (d'après Emmanuel Candès) Measure first, analyze later; the curse of dimensionality.
     1. Numerical analysis: sparsity reduces the cost of storage and computation.
     2. Regularization in classification (see the sketch below). Figure: linear support vector machine (SVM); (a) decision boundary, (b) maximal margin.
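
To accompany the maximal-margin figure, a small scikit-learn sketch of fitting a linear SVM; the 2-D data and the regularization parameter are illustrative placeholders, not the talk's experiments.

```python
# Sketch: a linear SVM selects the maximal-margin decision boundary,
# the regularization effect the slide alludes to.  Toy data only.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
labels = np.array([0] * 50 + [1] * 50)

clf = LinearSVC(C=1.0).fit(X, labels)       # C trades margin width against slack
print("boundary normal w:", clf.coef_[0], " bias b:", clf.intercept_[0])
print("training accuracy:", clf.score(X, labels))
```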

  8. Our Contributions.
     1. Classification via compressed sensing.
     2. Performance in face recognition.
     3. Extensions: outlier rejection, occlusion compensation.
     4. Distributed pattern recognition in sensor networks.

  9. Problem Formulation in Face Recognition. Notations:
     1. Training: for K classes, collect training samples {v_{1,1}, ..., v_{1,n_1}}, ..., {v_{K,1}, ..., v_{K,n_K}} ∈ R^D. Test: present a new y ∈ R^D and solve for label(y) ∈ {1, 2, ..., K}.
     2. Construct the R^D sample space via stacking (see the stacking sketch after this slide). Figure: for a 3-channel 640 × 480 image, D = 3 · 640 · 480 ≈ 1e6.
     3. Assume y belongs to Class i [Belhumeur et al. 1997, Basri & Jacobs 2003]:
         y = α_{i,1} v_{i,1} + α_{i,2} v_{i,2} + · · · + α_{i,n_i} v_{i,n_i} = A_i α_i,   where A_i = [v_{i,1}, v_{i,2}, ..., v_{i,n_i}].
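
A short sketch of the stacking step: each image is vectorized into R^D and the n_i training images of class i become the columns of A_i. The random arrays below are placeholders for real face images.

```python
# Sketch: stack each training image into a column of R^D and collect the
# class-i samples into A_i = [v_{i,1}, ..., v_{i,n_i}].
import numpy as np

H, W, C = 480, 640, 3
D = H * W * C                               # D = 3 * 640 * 480 = 921600, about 1e6
n_i = 7                                     # number of training images for class i

images = np.random.rand(n_i, H, W, C)       # placeholder training images
A_i = images.reshape(n_i, D).T              # each column is one stacked image
print(A_i.shape)                            # (921600, 7)
```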

  10. Nevertheless, i is the variable we need to solve for.
     1. Global representation:
         y = [A_1, A_2, · · · , A_K] [α_1; α_2; · · · ; α_K] = A x_0.
        Over-determined system: A ∈ R^{D×n}, where D ≫ n = n_1 + · · · + n_K.
     2. x_0 encodes the membership of y: if y belongs to Subject i, then x_0 = [0 · · · 0  α_i  0 · · · 0]^T ∈ R^n.
     Problems to face (a small sketch of A and x_0 follows this slide):
     1. Solving for x_0 in R^D is intractable.
     2. The true solution x_0 is sparse: only the n_i coefficients of Subject i are nonzero (roughly n/K of the n entries on average).
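
A sketch of the global dictionary A = [A_1, ..., A_K] and the block-sparse coefficient vector x_0 described on this slide; the dimensions are small synthetic placeholders (the real D is about 1e6).

```python
# Sketch: concatenate the class dictionaries into A = [A_1, ..., A_K]; a test
# sample from class i has coefficients supported only on the i-th block.
import numpy as np

rng = np.random.default_rng(0)
K, n_per_class, D = 10, 7, 120              # placeholder sizes
A_blocks = [rng.standard_normal((D, n_per_class)) for _ in range(K)]
A = np.hstack(A_blocks)                     # A in R^{D x n}, n = K * n_per_class

i = 3                                       # true class of the test sample
alpha_i = rng.standard_normal(n_per_class)
y = A_blocks[i] @ alpha_i                   # y = A_i alpha_i

# The ideal x_0 is zero outside block i: [0 ... 0, alpha_i, 0 ... 0]^T.
x0 = np.zeros(K * n_per_class)
x0[i * n_per_class:(i + 1) * n_per_class] = alpha_i
print(np.allclose(A @ x0, y))               # True: y = A x_0
```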

  11. Dimensionality Reduction.
     1. Construct a linear projection R ∈ R^{d×D}, where d is the feature dimension and d ≪ D:
         ỹ := R y = R A x_0 = Ã x_0 ∈ R^d.   Ã ∈ R^{d×n}, but x_0 is unchanged.
     2. Holistic features: Eigenfaces [Turk 1991], Fisherfaces [Belhumeur 1997], Laplacianfaces [He 2005].
     3. Partial features.
     4. Unconventional features: downsampled faces, random projections.
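
A sketch of the projection step ỹ = R y = (R A) x_0 using a random R ∈ R^{d×D} (one of the "unconventional features" on the slide); the sizes below are placeholders kept small enough to run.

```python
# Sketch: dimensionality reduction with a random projection R in R^{d x D};
# x_0 is unchanged, only the dictionary shrinks to A~ = R A.
import numpy as np

rng = np.random.default_rng(0)
D, n, d = 10_000, 70, 540                   # placeholder sizes (real D ~ 1e6)
A = rng.standard_normal((D, n))
x0 = np.zeros(n); x0[:7] = rng.standard_normal(7)
y = A @ x0

R = rng.standard_normal((d, D)) / np.sqrt(d)   # random projection
A_tilde = R @ A                                # A~ = R A in R^{d x n}
y_tilde = R @ y
print(np.allclose(y_tilde, A_tilde @ x0))      # True: the same sparse x_0 explains y~
```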

  12. ℓ0-Minimization.
     1. Solve for the sparsest solution via ℓ0-minimization:
         x_0 = arg min_x ‖x‖_0  s.t.  ỹ = Ã x.
        ‖·‖_0 simply counts the number of nonzero terms.
     2. ℓ0-ball: the ℓ0-ball is not convex, and ℓ0-minimization is NP-hard (a brute-force sketch follows).
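
To make the combinatorial cost concrete, a brute-force ℓ0 solver that enumerates supports of growing size; this is only feasible for tiny toy problems, which is exactly the slide's point.

```python
# Sketch: brute-force l0-minimization by enumerating supports of size 1, 2, ...
# The number of candidate supports grows combinatorially with n.
import numpy as np
from itertools import combinations

def l0_min(A, y, tol=1e-8):
    d, n = A.shape
    for k in range(1, n + 1):
        for support in combinations(range(n), k):
            sub = A[:, support]
            coef, *_ = np.linalg.lstsq(sub, y, rcond=None)
            if np.linalg.norm(sub @ coef - y) < tol:
                x = np.zeros(n)
                x[list(support)] = coef
                return x
    return None

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 15))
x0 = np.zeros(15); x0[[2, 9]] = [1.0, -2.0]
x_hat = l0_min(A, A @ x0)
print(np.nonzero(x_hat)[0])                 # recovers the support {2, 9}
```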

  13. ℓ1/ℓ0 Equivalence.
     1. Compressed sensing: if x_0 is sparse enough, ℓ0-minimization is equivalent to
         (P1)   min ‖x‖_1  s.t.  ỹ = Ã x,   where ‖x‖_1 = |x_1| + |x_2| + · · · + |x_n|.
     2. ℓ1-ball: ℓ1-minimization is convex, and its solution equals that of ℓ0-minimization.
     3. ℓ1/ℓ0 equivalence [Donoho 2002, 2004; Candès et al. 2004; Baraniuk 2006]: given ỹ = Ã x_0, there exists an equivalence breakdown point (EBP) ρ(Ã) such that if ‖x_0‖_0 < ρ, the ℓ1-solution is unique and x_1 = x_0 (an empirical sketch follows).
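
An empirical sketch of the breakdown behaviour: sweep the sparsity of x_0 and check when (P1) stops recovering it exactly. This uses cvxpy, the Python counterpart of the cvx toolbox listed on the next slide; it is assumed installed, and the problem sizes are illustrative.

```python
# Sketch: probe the l1/l0 equivalence breakdown point empirically by sweeping
# the sparsity k of x_0 and checking exact recovery by (P1).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
d, n = 50, 200
A = rng.standard_normal((d, n)) / np.sqrt(d)

for k in (2, 5, 10, 20, 40):
    x0 = np.zeros(n)
    x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    y = A @ x0

    x = cp.Variable(n)
    cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y]).solve()
    recovered = np.linalg.norm(x.value - x0) < 1e-5
    print(f"k = {k:2d}: exact recovery = {recovered}")
```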

  14. ℓ1-Minimization Routines.
     Matching pursuit [Mallat 1993] (a sketch follows this slide):
     1. Find the vector v_i in Ã most correlated with y: i = arg max_j ⟨y, v_j⟩.
     2. Â ← Ã_i, x_i ← ⟨y, v_i⟩, y ← y − x_i v_i.
     3. Repeat until ‖y‖ < ε.
     Basis pursuit [Chen 1998]:
     1. Assume x_0 is m-sparse.
     2. Select m linearly independent vectors B_m in Ã as a basis: x_m = B_m^† y.
     3. Repeat swapping one basis vector in B_m with another vector in Ã if it improves ‖y − B_m x_m‖.
     4. If ‖y − B_m x_m‖_2 < ε, stop.
     Quadratic solvers, for ỹ = Ã x_0 + z ∈ R^d with ‖z‖_2 < ε:
         x* = arg min { ‖x‖_1 + λ ‖y − Ã x‖_2 }   [Lasso, second-order cone programming]: more expensive.
     Matlab toolboxes: ℓ1-Magic by Candès at Caltech; SparseLab by Donoho at Stanford; cvx by Boyd at Stanford.
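
A minimal implementation sketch of the matching-pursuit loop outlined above; variable names and problem sizes are illustrative, and columns are assumed unit-norm.

```python
# Sketch of matching pursuit [Mallat 1993]: repeatedly pick the column most
# correlated with the residual, record its coefficient, and subtract its
# contribution until the residual is small.
import numpy as np

def matching_pursuit(A, y, eps=1e-6, max_iter=100):
    n = A.shape[1]
    x = np.zeros(n)
    r = y.copy()
    for _ in range(max_iter):
        corr = A.T @ r                      # <r, v_j> for every column v_j
        i = np.argmax(np.abs(corr))         # most correlated atom
        x[i] += corr[i]                     # valid when columns are unit-norm
        r -= corr[i] * A[:, i]              # remove that atom's contribution
        if np.linalg.norm(r) < eps:
            break
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 200))
A /= np.linalg.norm(A, axis=0)              # normalize columns
x0 = np.zeros(200); x0[[5, 50, 120]] = [2.0, -1.5, 1.0]
x_hat = matching_pursuit(A, A @ x0)
print(np.sort(np.argsort(np.abs(x_hat))[-3:]))  # largest entries near {5, 50, 120}
```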

  15. Classification.
     1. Project x_1 onto the face subspaces:
         δ_1(x_1) = [α_1; 0; · · · ; 0],   δ_2(x_1) = [0; α_2; · · · ; 0],   · · · ,   δ_K(x_1) = [0; 0; · · · ; α_K].   (1)
     2. Define the residual r_i = ‖ỹ − Ã δ_i(x_1)‖_2 for Subject i:
         id(y) = arg min_{i=1,···,K} { r_i }   (a sketch of this rule follows).
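
A sketch of this decision rule: keep only the coefficients of x_1 belonging to each class (the δ_i operator), form the residual r_i, and return the label with the smallest residual. The sparse coefficients x_1 would come from any of the ℓ1 solvers on the previous slide; here they are simply passed in, and the toy data are placeholders.

```python
# Sketch of the residual-based classification rule on this slide.
import numpy as np

def src_classify(A_tilde, y_tilde, x1, class_slices):
    """class_slices[i] selects the columns of A_tilde belonging to class i."""
    residuals = []
    for idx in class_slices:
        delta_i = np.zeros_like(x1)
        delta_i[idx] = x1[idx]              # keep only class-i coefficients
        residuals.append(np.linalg.norm(y_tilde - A_tilde @ delta_i))
    return int(np.argmin(residuals)), residuals

# Toy usage: 3 classes with 4 training samples each.
rng = np.random.default_rng(0)
A_tilde = rng.standard_normal((30, 12))
slices = [slice(0, 4), slice(4, 8), slice(8, 12)]
x1 = np.zeros(12); x1[4:8] = rng.standard_normal(4)   # coefficients of a class-1 sample
y_tilde = A_tilde @ x1
label, r = src_classify(A_tilde, y_tilde, x1, slices)
print("predicted class:", label)            # 1, the class with the smallest residual
```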

  16. AR Database, 100 subjects (illumination and expression variation). Recognition rates [%] versus feature dimension.

     Table I. Nearest Neighbor
     Dimension    30     54     130    540
     Eigen        68.1   74.8   79.3   80.5
     Laplacian    73.1   77.1   83.8   89.7
     Random       56.7   63.7   71.4   75
     Down         51.7   60.9   69.2   73.7
     Fisher       83.4   86.8   N/A    N/A

     Table II. Nearest Subspace
     Dimension    30     54     130    540
     Eigen        64.1   77.1   82     85.1
     Laplacian    66     77.5   84.3   90.3
     Random       59.2   68.2   80     83.3
     Down         56.2   67.7   77     82.1
     Fisher       80.3   85.8   N/A    N/A

     Table III. Linear SVM
     Dimension    30     54     130    540
     Eigen        73     84.3   89     92
     Laplacian    73.4   85.8   90.8   95.7
     Random       54.1   70.8   81.6   88.8
     Down         51.4   73     83.4   90.3
     Fisher       86.3   93.3   N/A    N/A

     Table IV. ℓ1-Minimization
     Dimension    30     54     130    540
     Eigen        71.1   80     85.7   92
     Laplacian    73.7   84.7   91     94.3
     Random       57.8   75.5   87.6   94.7
     Down         46.8   67     84.6   93.9
     Fisher       87     92.3   N/A    N/A
