

SLIDE 1

A Two-Stage Approach for Learning a Sparse Model with Sharp Excess Risk Analysis

Zhe Li∗, Tianbao Yang∗, Lijun Zhang♮, Rong Jin†

∗The University of Iowa, ♮Nanjing University, †Alibaba Group

December 10, 2015


SLIDE 2

Problem

Let $x \in \mathbb{R}^d$ and $y \in \mathbb{R}$ denote an input-output pair, and let $w_*$ be an optimal model that minimizes the expected error:

$$w_* = \arg\min_{\|w\|_1 \le B} \frac{1}{2}\,\mathbb{E}_P\big[(w^\top x - y)^2\big]$$

Key problem: $w_*$ is not necessarily sparse.

The goal: learn a sparse model $w$ that achieves a small excess risk:

$$\mathrm{ER}(w, w_*) = \mathbb{E}_P\big[(w^\top x - y)^2\big] - \mathbb{E}_P\big[(w_*^\top x - y)^2\big] \le \epsilon$$
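As a concrete illustration (not from the slides), here is a minimal Python sketch of how the excess risk could be estimated by Monte Carlo, replacing the expectation over $P$ with an average over held-out samples; the function and argument names are hypothetical:

```python
import numpy as np

def excess_risk(w, w_star, X, y):
    """Monte Carlo estimate of ER(w, w*) = E_P[(w^T x - y)^2] - E_P[(w*^T x - y)^2],
    using held-out samples (X, y) in place of the expectation over P."""
    risk_w = np.mean((X @ w - y) ** 2)          # empirical risk of the learned model
    risk_star = np.mean((X @ w_star - y) ** 2)  # empirical risk of the optimal model
    return risk_w - risk_star
```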


SLIDE 3

The challenges

$L(w) = \mathbb{E}_P[(w^\top x - y)^2]$ is not necessarily strongly convex.

Stochastic optimization: $O(1/\epsilon^2)$ sample complexity and no sparsity guarantee. Empirical risk minimization + $\ell_1$ penalty: $O(1/\epsilon^2)$ sample complexity and no sparsity guarantee.

Challenges:

Can we reduce the sample complexity (e.g., to $O(1/\epsilon)$)? Can we also guarantee the sparsity of the model?

Our solution: a two-stage approach, presented on the following slides.


SLIDE 4

The first stage

Our first-stage algorithm is motivated by the EPOCH-GD algorithm [Hazan & Kale, 2011], which assumes a strongly convex objective. How can we avoid the strong-convexity assumption?

$L(w) = \mathbb{E}_P[(w^\top x - y)^2] = h(Aw) + b^\top w + c$, where $h(\cdot)$ is a strongly convex function.

The optimal solution set is a polyhedron. By Hoffman's bound we have

$$2\big(L(w) - L_*\big) \ge \frac{1}{\kappa}\,\|w - w^+\|_2^2$$

where $w^+$ is the closest solution to $w$ in the optimal solution set.
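For intuition, a minimal Python sketch of the epoch structure the first stage inherits from EPOCH-GD: projected SGD on the $\ell_1$ ball, with the step size halved and the epoch length doubled across epochs. This is a simplification under assumed defaults (a hypothetical `sample` oracle and a generic schedule), and it omits the paper's use of the Hoffman constant $\kappa$ in place of strong convexity; the projection is the standard sorting-based $\ell_1$-ball projection:

```python
import numpy as np

def project_l1_ball(w, B):
    """Euclidean projection onto {w : ||w||_1 <= B} (standard sorting-based method)."""
    if np.abs(w).sum() <= B:
        return w
    u = np.sort(np.abs(w))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(w) + 1) > css - B)[0][-1]
    theta = (css[rho] - B) / (rho + 1.0)
    return np.sign(w) * np.maximum(np.abs(w) - theta, 0.0)

def epoch_sgd(sample, d, B, eta1=0.1, T1=100, n_epochs=5):
    """Epoch-style SGD sketch: within each epoch run projected SGD and average
    the iterates; across epochs halve the step size and double the epoch length."""
    w = np.zeros(d)
    eta, T = eta1, T1
    for _ in range(n_epochs):
        avg = np.zeros(d)
        for _ in range(T):
            x, y = sample()                      # draw (x, y) ~ P
            g = (w @ x - y) * x                  # stochastic gradient of (1/2)(w^T x - y)^2
            w = project_l1_ball(w - eta * g, B)  # projected SGD step on the l1 ball
            avg += w / T
        w = avg                                  # restart next epoch from the average iterate
        eta, T = eta / 2.0, 2 * T
    return w
```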

[1] Elad Hazan, Satyen Kale. Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization. COLT 2011.

SLIDE 5

The second stage

Our second-stage algorithm: Randomized Sparsification.

For $k = 1, \ldots, K$:
  Sample $i_k \in [d]$ according to $\Pr(i_k = j) = p_j$
  Update $[\tilde{w}_k]_{i_k} = [\tilde{w}_{k-1}]_{i_k} + \dfrac{\hat{w}_{i_k}}{K\, p_{i_k}}$
End For

We use

$$p_j = \frac{\sqrt{\hat{w}_j^2\, \mathbb{E}[x_j^2]}}{\sum_{j'=1}^{d} \sqrt{\hat{w}_{j'}^2\, \mathbb{E}[x_{j'}^2]}} \quad \text{instead of} \quad p_j = \frac{|\hat{w}_j|}{\|\hat{w}\|_1} \;\text{[Shalev-Shwartz et al., 2010]},$$

which reduces the constant in the $O(1/\epsilon)$ bound on sparsity.
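A minimal Python sketch of this sparsification step; `Ex2` stands for a vector of (estimated) second moments $\mathbb{E}[x_j^2]$, the names are hypothetical, and the $1/K$ normalization (which makes the output an unbiased estimate of $\hat{w}$) reflects one reading of the update above:

```python
import numpy as np

def randomized_sparsify(w_hat, Ex2, K, rng=None):
    """Randomized sparsification sketch: draw K coordinates with probability
    p_j proportional to sqrt(w_hat_j^2 * E[x_j^2]) and form an unbiased,
    at-most-K-sparse estimate of w_hat."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.sqrt(w_hat ** 2 * Ex2)
    p /= p.sum()
    w_tilde = np.zeros_like(w_hat)
    idx = rng.choice(len(w_hat), size=K, p=p)  # i_1, ..., i_K drawn i.i.d. from p
    for i in idx:
        w_tilde[i] += w_hat[i] / (K * p[i])    # so that E[w_tilde] = w_hat
    return w_tilde
```

Since at most $K$ distinct coordinates are ever touched, the output has at most $K$ non-zero entries regardless of the dimension $d$.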

[2] Shai Shalev-Shwartz, Nathan Srebro, Tong Zhang. Trading accuracy for sparsity in optimization problems with sparsity constraints. SIAM Journal on Optimization, 2010.

SLIDE 6

Experimental Results

[Figure: results on the E2006-tfidf dataset. Left panel (1st stage): RMSE vs. epoch $k$, comparing Epoch-SGD and SGD. Middle panel (2nd stage): RMSE vs. $K$, comparing MG-Sparsification, DD-Sparsification, and the full model. Right panel (RMSE vs. Sparsity): RMSE vs. sparsity (%), comparing SpT ($K = 500$) and SpS with $B = 1, \ldots, 5$.]