
A Two-Stage Approach for Learning a Sparse Model with Sharp Excess Risk Analysis
Zhe Li, Tianbao Yang, Lijun Zhang, Rong Jin
The University of Iowa, Nanjing University, Alibaba Group
December 10, 2015


  1. A Two-Stage Approach for Learning a Sparse Model with Sharp Excess Risk Analysis
  Zhe Li∗, Tianbao Yang∗, Lijun Zhang♮, Rong Jin†
  ∗The University of Iowa, ♮Nanjing University, †Alibaba Group
  December 10, 2015

  2. Problem
  Let $x \in \mathbb{R}^d$ and $y \in \mathbb{R}$ denote an input-output pair.
  Let $w_*$ be an optimal model that minimizes the expected error:
  $$w_* = \arg\min_{\|w\|_1 \le B} \frac{1}{2}\,\mathbb{E}_P\big[(w^\top x - y)^2\big]$$
  Key problem: $w_*$ is not necessarily sparse.
  The goal: learn a sparse model $w$ that achieves a small excess risk:
  $$\mathrm{ER}(w, w_*) = \mathbb{E}_P\big[(w^\top x - y)^2\big] - \mathbb{E}_P\big[(w_*^\top x - y)^2\big] \le \epsilon$$
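To make the excess-risk objective concrete, here is a minimal NumPy sketch (not from the paper) that estimates $\mathrm{ER}(w, w_*)$ by replacing the expectation over $P$ with an empirical average over a held-out sample; the function name excess_risk_estimate is hypothetical.

```python
import numpy as np

def excess_risk_estimate(w, w_star, X, y):
    """Empirical estimate of ER(w, w_*) from a held-out sample (X, y) ~ P.

    ER(w, w_*) = E_P[(w^T x - y)^2] - E_P[(w_*^T x - y)^2],
    with each expectation approximated by a mean over the rows of X.
    """
    risk = lambda v: np.mean((X @ v - y) ** 2)
    return risk(w) - risk(w_star)
```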

  3. The challenges
  $L(w) = \mathbb{E}_P[(w^\top x - y)^2]$ is not necessarily strongly convex.
  Stochastic optimization: $O(1/\epsilon^2)$ sample complexity and no sparsity guarantee.
  Empirical risk minimization + $\ell_1$ penalty: $O(1/\epsilon^2)$ sample complexity and no sparsity guarantee.
  Challenges:
  Can we reduce the sample complexity (e.g., to $O(1/\epsilon)$)?
  Can we also guarantee the sparsity of the model?
  Our solution: a two-stage approach.

  4. The first stage
  Our first-stage algorithm is motivated by the EPOCH-GD algorithm [1], which requires a strongly convex objective.
  How do we avoid the strong convexity assumption?
  $$L(w) = \mathbb{E}_P\big[(w^\top x - y)^2\big] = h(Aw) + b^\top w + c$$
  where $h(\cdot)$ is a strongly convex function.
  The optimal solution set is a polyhedron.
  By Hoffman's bound, we have
  $$2\big(L(w) - L_*\big) \ge \frac{1}{\kappa}\,\|w - w^+\|_2^2$$
  where $w^+$ is the solution closest to $w$ in the optimal solution set.
  [1] Elad Hazan, Satyen Kale. Beyond the regret minimization barrier: optimal algorithm for stochastic strongly-convex optimization.
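As a rough illustration of how the EPOCH-GD template could be instantiated here, the sketch below runs projected SGD on the $\ell_1$ ball in epochs whose length doubles and whose step size halves, following Hazan and Kale's schedule. This is an assumption-laden sketch, not the paper's algorithm: the actual first-stage updates, parameter schedule, and use of Hoffman's bound may differ, and sample_xy, T1, and eta1 are hypothetical names.

```python
import numpy as np

def project_l1_ball(v, B):
    """Euclidean projection onto {w : ||w||_1 <= B} (Duchi et al., 2008)."""
    if np.abs(v).sum() <= B:
        return v
    u = np.sort(np.abs(v))[::-1]          # sorted magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > css - B)[0][-1]
    theta = (css[rho] - B) / (rho + 1.0)  # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def epoch_sgd(sample_xy, B, w0, T1=100, eta1=0.1, num_epochs=5):
    """Epoch-style projected SGD for min_{||w||_1 <= B} E[(w^T x - y)^2] / 2.

    Each epoch restarts from the previous epoch's averaged iterate; the
    epoch length doubles and the step size halves between epochs,
    mirroring the EPOCH-GD schedule.
    """
    w, T, eta = w0, T1, eta1
    for _ in range(num_epochs):
        avg = np.zeros_like(w)
        for _ in range(T):
            x, y = sample_xy()                      # one fresh stochastic sample
            grad = (w @ x - y) * x                  # gradient of the square loss
            w = project_l1_ball(w - eta * grad, B)
            avg += w / T
        w = avg                                     # restart from the average
        T, eta = 2 * T, eta / 2
    return w
```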

  5. The second stage
  Our second-stage algorithm: Randomized Sparsification ($\hat{w}$ denotes the first-stage solution).
  For $k = 1, \ldots, K$:
      Sample $i_k \in [d]$ according to $\Pr(i_k = j) = p_j$
      Compute $[\tilde{w}_k]_{i_k} = [\tilde{w}_{k-1}]_{i_k} + \hat{w}_{i_k} / p_{i_k}$
  End For
  We sample with
  $$p_j = \frac{\sqrt{\hat{w}_j^2\, \mathbb{E}[x_j^2]}}{\sum_{j'=1}^{d} \sqrt{\hat{w}_{j'}^2\, \mathbb{E}[x_{j'}^2]}} \quad \text{instead of} \quad p_j = \frac{|\hat{w}_j|}{\|\hat{w}\|_1} \ \text{[2]},$$
  which reduces the constant in the $O(1/\epsilon)$ bound on the sparsity.
  [2] Shalev-Shwartz, Srebro, Zhang. Trading accuracy for sparsity in optimization problems with sparsity constraints.
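Below is a minimal NumPy sketch of the sparsification loop above, using the slide's sampling distribution $p_j \propto \sqrt{\hat{w}_j^2\,\mathbb{E}[x_j^2]}$. The final division by K (averaging the importance-weighted draws so the output is unbiased for $\hat{w}$) is my assumption, since the slide shows only the per-draw update; randomized_sparsification and second_moments are hypothetical names.

```python
import numpy as np

def randomized_sparsification(w_hat, second_moments, K, rng=None):
    """Sparsify a dense first-stage solution w_hat by importance sampling.

    second_moments[j] approximates E[x_j^2]. Coordinates are drawn with
    probability p_j proportional to sqrt(w_hat[j]^2 * E[x_j^2]), and each
    draw adds w_hat[i_k] / p[i_k] to coordinate i_k.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = w_hat.shape[0]

    scores = np.sqrt(w_hat**2 * second_moments)   # slide's sampling weights
    p = scores / scores.sum()

    w_sparse = np.zeros(d)
    for i_k in rng.choice(d, size=K, p=p):        # i_1, ..., i_K
        w_sparse[i_k] += w_hat[i_k] / p[i_k]      # importance-weighted update
    return w_sparse / K                           # assumed averaging for unbiasedness
```

Since at most K coordinates are ever touched, the output has at most K nonzeros regardless of d, which is the sparsity the second stage trades against excess risk.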

  6. Experimental Results
  [Three plots on the E2006-tfidf dataset: (i) 1st stage, RMSE vs. iteration, comparing Epoch-SGD and SGD; (ii) 2nd stage, RMSE vs. K, comparing MG-Sparsification, DD-Sparsification, and the full model; (iii) RMSE vs. sparsity (%), comparing SpT (K = 500) against SpS (B = 1, ..., 5).]
