

SLIDE 1

Compressive Extreme Learning Machines

Improved Models Through Exploiting Time-Accuracy Trade-offs

Mark van Heeswijk, Amaury Lendasse, Yoan Miche. September 5, 2014.

SLIDE 2

Outline

- Motivation
- Extreme Learning Machines
- Compressive Extreme Learning Machine
- Experiments
- Conclusions

SLIDE 3

Outline (current section: Motivation)

- Motivation
- Extreme Learning Machines
- Compressive Extreme Learning Machine
- Experiments
- Conclusions

SLIDE 4

Trade-offs in Training Neural Networks

Ideally:

- training results in the best possible test accuracy
- training is fast
- the model is efficient to evaluate at test time

However, in practice, training a neural network involves a trade-off between:

- test accuracy
- training time
- testing time

Furthermore, the optimal trade-off depends on the user's requirements.

SLIDE 5

Contributions

- The paper explores time-accuracy trade-offs in various Extreme Learning Machines (ELMs).
- The Compressive Extreme Learning Machine is introduced:
  - it allows for a flexible time-accuracy trade-off by training the model in a reduced space
  - experiments indicate that this trade-off is efficient, in the sense that it may yield better models in less time

SLIDE 6

Outline (current section: Extreme Learning Machines)

- Motivation
- Extreme Learning Machines
- Compressive Extreme Learning Machine
- Experiments
- Conclusions

SLIDE 7

Standard ELM

Given a training set $(x_i, y_i)$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$, an activation function $f : \mathbb{R} \to \mathbb{R}$, and $M$ the number of hidden nodes:

1. Randomly assign input weights $w_i$ and biases $b_i$, $i \in [1, M]$.
2. Calculate the hidden layer output matrix $H$.
3. Calculate the output weights matrix $\beta = H^{\dagger} Y$.

where

$$H = \begin{pmatrix} f(w_1 \cdot x_1 + b_1) & \cdots & f(w_M \cdot x_1 + b_M) \\ \vdots & \ddots & \vdots \\ f(w_1 \cdot x_N + b_1) & \cdots & f(w_M \cdot x_N + b_M) \end{pmatrix}$$
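A minimal NumPy sketch of these three steps (the sigmoid activation and all names are illustrative choices, not prescribed by the slides):

```python
import numpy as np

def elm_train(X, y, M, rng=None):
    """Train a standard ELM: random hidden layer, least-squares output weights."""
    rng = rng or np.random.default_rng(0)
    d = X.shape[1]
    W = rng.standard_normal((d, M))            # step 1: random input weights w_i
    b = rng.standard_normal(M)                 # step 1: random biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))     # step 2: hidden layer output matrix H
    beta = np.linalg.pinv(H) @ y               # step 3: beta = pinv(H) @ Y
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```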


SLIDE 8

ELM Theory vs Practice

- In theory, the ELM is a universal approximator.
- In practice, the number of samples is limited, and there is a risk of overfitting.

Therefore:

- the functional approximation should use as few neurons as possible
- the hidden layer should extract and retain as much useful information as possible from the input samples

SLIDE 9

ELM Theory vs Practice

- Weight considerations: the weight range determines the typical activation of the transfer function (remember $\langle w_i, x \rangle = |w_i||x| \cos \theta$); therefore, normalize or tune the length of the weight vectors in some way.
- Linear vs non-linear: since sigmoid neurons operate in a nonlinear regime, add $d$ linear neurons so that the ELM also works well on (almost) linear problems (see the sketch below).
- Avoiding overfitting: use efficient L2 regularization.
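For the second point, appending the raw inputs to $H$ realizes the $d$ linear neurons. A sketch, reusing the illustrative names from the previous slide:

```python
import numpy as np

def hidden_layer_with_linear(X, W, b):
    """Sigmoid hidden layer augmented with d linear neurons (the raw inputs),
    so that (almost) linear problems are easy to fit."""
    H_sigmoid = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.hstack([H_sigmoid, X])   # N x (M + d) hidden layer output matrix
```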


SLIDE 10

Ternary Weight Scheme

[Illustration: weight matrix whose columns are ternary vectors, grouped by the number of active variables (1 var, 2 vars, 3 vars, ...).]

Until there are enough neurons [vanHeeswijk2014]:

- add $w \in \{-1, 0, 1\}^d$ with 1 active variable ($3^1 \times \binom{d}{1}$)
- add $w \in \{-1, 0, 1\}^d$ with 2 active variables ($3^2 \times \binom{d}{2}$)
- add $w \in \{-1, 0, 1\}^d$ with 3 active variables ($3^3 \times \binom{d}{3}$)
- ...

For each subspace, weights are added in random order to avoid bias toward particular variables (see the sketch below).
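A sketch of one way to realize this scheme: for each subspace size k, enumerate the vectors with exactly k nonzero entries in {-1, +1} and shuffle them before adding. All names are illustrative, not from the paper:

```python
import numpy as np
from itertools import combinations, product

def ternary_weights(d, M, rng=None):
    """Generate up to M ternary weight vectors, filling the subspaces with
    1, 2, 3, ... active variables in turn, in random order within each size."""
    rng = rng or np.random.default_rng(0)
    W = []
    for k in range(1, d + 1):                   # k = number of active variables
        block = []
        for idx in combinations(range(d), k):   # choose the k-variable subspace
            for signs in product((-1.0, 1.0), repeat=k):
                w = np.zeros(d)
                w[list(idx)] = signs
                block.append(w)
        rng.shuffle(block)                      # random order avoids variable bias
        W.extend(block)
        if len(W) >= M:
            break
    return np.array(W[:M]).T                    # d x M input weight matrix
```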


SLIDE 11

Time-accuracy Trade-offs for Several ELMs

- ELM: the standard model.
- OP-ELM: Optimally Pruned ELM, with neurons ranked by relevance and then pruned to optimize the leave-one-out error.
- TR-ELM: Tikhonov-regularized ELM, with efficient optimization of the regularization parameter λ, using the SVD approach to computing H†.
- TROP-ELM: Tikhonov-regularized OP-ELM.
- BIP(0.2), BIP(rand), BIP(CV): ELMs pretrained using the Batch Intrinsic Plasticity mechanism, which adapts the hidden layer weights and biases such that they retain as much information as possible; the BIP parameter is either fixed, randomized, or cross-validated over 20 possible values.


SLIDE 12

ELM Time-accuracy Trade-offs (Abalone UCI)

[Figure: training time vs. number of hidden neurons (100-1000), and test MSE vs. number of hidden neurons, for OP-3-ELM, TROP-3-ELM, TR-3-ELM, BIP(CV)-TR-3-ELM, BIP(0.2)-TR-3-ELM and BIP(rand)-TR-3-ELM.]


SLIDE 13

ELM Time-accuracy Trade-offs (Abalone UCI)

[Figure: test MSE vs. training time, and test MSE vs. testing time, for the same six models.]


SLIDE 14

ELM Time-accuracy Trade-offs (Abalone UCI)

Depending on the user's criteria, these results suggest:

- if training time is most important: BIP(rand)-TR-3-ELM (almost optimal performance, while keeping training time low)
- if test error is most important: BIP(CV)-TR-3-ELM (slightly better accuracy, but training time is 20 times as high)
- if testing time is most important: BIP(rand)-TR-3-ELM, surprisingly (OP-ELM and TROP-ELM tend to be faster at test time, but suffer from slight overfitting)

Since TR-3-ELM offers attractive trade-offs between speed and accuracy, this model will be central in the rest of the paper.


SLIDE 15

Outline (current section: Compressive Extreme Learning Machine)

- Motivation
- Extreme Learning Machines
- Compressive Extreme Learning Machine
- Experiments
- Conclusions

SLIDE 16

Two approaches for improving models

Time-accuracy trade-offs suggest two possible strategies for obtaining models that are preferable to other models:

- reducing test error, using a better algorithm (in terms of the training time-accuracy plot: "pushing the curve down")
- reducing computational time, while retaining as much accuracy as possible (in terms of the training time-accuracy plot: "pushing the curve to the left")

Compressive ELM focuses on reducing computational time by performing the training in a reduced space, and then projecting the solution back to the original space (a sketch follows below).
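A sketch of that idea for the Tikhonov-regularized solve: approximate the SVD of H in a k-dimensional sketch space (using the randomized_svd routine spelled out on the next slide), then form β in the original space. The fixed λ and all names are illustrative assumptions:

```python
import numpy as np

def compressive_tr_solve(H, y, k, lam=1e-2):
    """Train in a reduced space: k-term randomized SVD of H (next slide),
    then the usual Tikhonov solution, mapped back to the original space."""
    U, D, Vt = randomized_svd(H, k)   # defined on the next slide
    # beta = V (D^2 + lam I)^{-1} D U^T y, using only the k retained terms
    return Vt.T @ ((D / (D ** 2 + lam)) * (U.T @ y))
```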


SLIDE 17

Compressive ELM

Given an $m \times n$ matrix $A$, compute a $k$-term approximate SVD $A \approx U D V^T$ [Halko2009]:

1. Form the $n \times (k+p)$ random matrix $\Omega$ (where $p$ is a small oversampling parameter).
2. Form the $m \times (k+p)$ sampling matrix $Y = A\Omega$ (sketch $A$ by applying $\Omega$).
3. Form the $m \times (k+p)$ orthonormal matrix $Q$ such that $\mathrm{range}(Q) = \mathrm{range}(Y)$.
4. Compute $B = Q^* A$.
5. Form the SVD of $B$, so that $B = \hat{U} D V^T$.
6. Compute the matrix $U = Q \hat{U}$.
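These steps translate nearly line for line into NumPy. A sketch (the oversampling default p = 10 and the names are assumptions):

```python
import numpy as np

def randomized_svd(A, k, p=10, rng=None):
    """k-term approximate SVD A ~ U D V^T via random sketching [Halko2009]."""
    rng = rng or np.random.default_rng(0)
    m, n = A.shape
    Omega = rng.standard_normal((n, k + p))       # 1: random test matrix
    Y = A @ Omega                                 # 2: sketch, m x (k+p)
    Q, _ = np.linalg.qr(Y)                        # 3: range(Q) = range(Y)
    B = Q.T @ A                                   # 4: B = Q* A, small
    U_hat, D, Vt = np.linalg.svd(B, full_matrices=False)   # 5: SVD of B
    U = Q @ U_hat                                 # 6: U = Q U_hat
    return U[:, :k], D[:k], Vt[:k, :]
```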


SLIDE 18

Faster Sketching?

The bottleneck in the algorithm is the time it takes to sketch the matrix. Rather than using Gaussian random matrices for sketching $A$, use random matrices that are sparse or structured in some way and allow for faster multiplication, of the form $\Phi = P H D$ with $P \in \mathbb{R}^{k \times n}$, $H \in \mathbb{R}^{n \times n}$, $D \in \mathbb{R}^{n \times n}$:

- Fast Johnson-Lindenstrauss Transform (FJLT), introduced in [Ailon2006], for which $P$ is a sparse matrix of random Gaussian variables, and $H$ encodes the Discrete Hadamard Transform.
- Subsampled Randomized Hadamard Transform (SRHT), for which $P$ is a matrix selecting $k$ random columns from $H$, and $H$ encodes the Discrete Hadamard Transform.

(Experiments did not show a substantial difference in terms of computational time. Dataset too small?)
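A dense-H sketch of the SRHT, for clarity (the fast O(n log n) version uses an in-place Walsh-Hadamard transform instead of forming H). Here D is a diagonal of random signs and P samples k rows; the padding to a power of two and the scaling constants are standard choices, not from the slides:

```python
import numpy as np
from scipy.linalg import hadamard

def srht_sketch(A, k, rng=None):
    """Apply Phi = P H D to the rows of A (to sketch columns, pass A.T)."""
    rng = rng or np.random.default_rng(0)
    n = A.shape[0]
    n_pad = 1 << (n - 1).bit_length()                # Hadamard needs a power of two
    A_pad = np.vstack([A, np.zeros((n_pad - n, A.shape[1]))])
    D = rng.choice([-1.0, 1.0], size=n_pad)          # D: random sign flips
    HDA = hadamard(n_pad) @ (D[:, None] * A_pad) / np.sqrt(n_pad)
    rows = rng.choice(n_pad, size=k, replace=False)  # P: select k random rows
    return np.sqrt(n_pad / k) * HDA[rows]
```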


SLIDE 19

Outline (current section: Experiments)

- Motivation
- Extreme Learning Machines
- Compressive Extreme Learning Machine
- Experiments
- Conclusions

SLIDE 20

Compressive ELM (CalHousing, FJLT)

[Figure: training time vs. number of hidden neurons (100-1000), and test MSE vs. number of hidden neurons, for TR-3-ELM and CS-TR-3-ELM with k ∈ {50, 100, 200, 400, 600}.]


SLIDE 21

Compressive ELM (CalHousing, FJLT)

[Figure: test MSE vs. training time, and test MSE vs. testing time, for the same models.]


SLIDE 22

Outline (current section: Conclusions)

- Motivation
- Extreme Learning Machines
- Compressive Extreme Learning Machine
- Experiments
- Conclusions

SLIDE 23

Conclusions

Contributions:

- Compressive ELM provides a flexible way to reduce training time by doing the optimization in a reduced space of k dimensions.
- Given k large enough, Compressive ELM achieves the best test error for each computational time (i.e. there are no models that achieve better test error and can be trained in the same or less time).

Future work:

- let theory/bounds on low-distortion embeddings inform the choice of k


SLIDE 24

Questions?


SLIDE 25

Backup Slides


SLIDE 26

Batch Intrinsic Plasticity

Suppose $(x_1, \ldots, x_N) \in \mathbb{R}^{N \times d}$, and the output of neuron $i$ is $h_i = f(a_i w_i \cdot x_k + b_i)$, where $f$ is an invertible transfer function. For each neuron $i$:

- from an exponential distribution with mean $\mu_{\mathrm{exp}}$, draw targets $t = (t_1, t_2, \ldots, t_N)$ and sort such that $t_1 < t_2 < \ldots < t_N$
- compute all presynaptic inputs $s_k = w_i \cdot x_k$, and sort such that $s_1 < s_2 < \ldots < s_N$
- now, find $a_i$ and $b_i$ such that

$$\begin{pmatrix} s_1 & 1 \\ \vdots & \vdots \\ s_N & 1 \end{pmatrix} \begin{pmatrix} a_i \\ b_i \end{pmatrix} = \begin{pmatrix} f^{-1}(t_1) \\ \vdots \\ f^{-1}(t_N) \end{pmatrix}$$

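A least-squares sketch of this procedure for a sigmoid $f$ (so $f^{-1}$ is the logit); the clipping of the targets and all names are assumptions for illustration:

```python
import numpy as np

def bip_pretrain(X, W, mu_exp=0.2, rng=None):
    """Batch Intrinsic Plasticity: per neuron, fit slope a_i and bias b_i so the
    sorted presynaptic inputs map onto sorted exponential targets."""
    rng = rng or np.random.default_rng(0)
    N, M = X.shape[0], W.shape[1]
    a, b = np.ones(M), np.zeros(M)
    for i in range(M):
        t = np.sort(rng.exponential(mu_exp, size=N))   # sorted targets t_1 < ... < t_N
        t = np.clip(t, 1e-6, 1.0 - 1e-6)               # assumption: keep the logit finite
        s = np.sort(X @ W[:, i])                       # sorted presynaptic inputs s_k
        Phi = np.column_stack([s, np.ones(N)])         # rows [s_k, 1]
        rhs = np.log(t / (1.0 - t))                    # f^{-1}(t_k) for sigmoid f
        a[i], b[i] = np.linalg.lstsq(Phi, rhs, rcond=None)[0]
    return a, b
```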

SLIDE 27

Fast leave-one-out cross-validation

The leave-one-out (LOO) error can be computed using the PRESS statistic:

$$E_{\mathrm{loo}} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{y_i - \hat{y}_i}{1 - \mathrm{hat}_{ii}} \right)^2$$

where $\mathrm{hat}_{ii}$ is the $i$th value on the diagonal of the HAT matrix, which can be computed quickly, given $H^{\dagger}$:

$$\hat{Y} = H\beta = HH^{\dagger}Y = \mathrm{HAT} \cdot Y$$
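A direct NumPy sketch of this computation (all names are illustrative):

```python
import numpy as np

def loo_mse(H, y):
    """Exact leave-one-out MSE of beta = pinv(H) y via the PRESS statistic."""
    H_pinv = np.linalg.pinv(H)
    hat_diag = np.einsum('ij,ji->i', H, H_pinv)   # diag(H H^+) without forming HAT
    residual = y - H @ (H_pinv @ y)               # y - HAT y
    return np.mean((residual / (1.0 - hat_diag)) ** 2)
```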


SLIDE 28

Fast leave-one-out cross-validation

Using the SVD decomposition $H = UDV^T$, it is possible to obtain all information needed for computing the PRESS statistic without recomputing the pseudo-inverse for every $\lambda$:

$$\hat{Y} = H\beta = H(H^T H + \lambda I)^{-1} H^T Y = HV(D^2 + \lambda I)^{-1} D U^T Y = UDV^T V(D^2 + \lambda I)^{-1} D U^T Y = UD(D^2 + \lambda I)^{-1} D U^T Y = \mathrm{HAT} \cdot Y$$

Compressive Extreme Learning Machines 28/24 Mark van Heeswijk, Amaury Lendasse, Yoan Miche September 5, 2014 Improved Models Through Exploiting Time-Accuracy Trade-offs

SLIDE 29

Fast leave-one-out cross-validation

where $D(D^2 + \lambda I)^{-1} D$ is a diagonal matrix with $\frac{d_{ii}^2}{d_{ii}^2 + \lambda}$ as the $i$th diagonal entry. Now:

$$\mathrm{MSE}_{\mathrm{TR\text{-}PRESS}} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{y_i - \hat{y}_i}{1 - h_{i\cdot} (H^T H + \lambda I)^{-1} h_{i\cdot}^T} \right)^2 = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{y_i - \hat{y}_i}{1 - u_{i\cdot} \, \mathrm{diag}\!\left( \frac{d_{ii}^2}{d_{ii}^2 + \lambda} \right) u_{i\cdot}^T} \right)^2$$
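Putting slides 28-29 together: one SVD of $H$, then the PRESS error for each candidate $\lambda$ at the cost of a few vector operations. A sketch with illustrative names:

```python
import numpy as np

def tr_press_lambda_search(H, y, lambdas):
    """Pick lambda by Tikhonov-regularized LOO (PRESS), reusing one SVD of H."""
    U, D, _ = np.linalg.svd(H, full_matrices=False)
    Uy = U.T @ y
    best_lam, best_mse = None, np.inf
    for lam in lambdas:
        s = D ** 2 / (D ** 2 + lam)                  # d_ii^2 / (d_ii^2 + lambda)
        y_hat = U @ (s * Uy)                         # HAT y, with HAT = U diag(s) U^T
        hat_diag = np.einsum('ij,j,ij->i', U, s, U)  # hat_ii = u_i diag(s) u_i^T
        mse = np.mean(((y - y_hat) / (1.0 - hat_diag)) ** 2)
        if mse < best_mse:
            best_lam, best_mse = lam, mse
    return best_lam, best_mse
```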


SLIDE 30

Better Weights

- Random layer weights and biases are drawn from e.g. a uniform or normal distribution with a certain range or variance.
- The typical transfer function is $f(\langle w_i, x \rangle + b_i)$.
- From $\langle w_i, x \rangle = |w_i||x| \cos \theta$, it can be seen that the typical activation of $f$ depends on:
  - the expected length of $w_i$
  - the expected length of $x$
  - the angles $\theta$ between the weights and the samples


SLIDE 31

Better Weights: Orthogonality?

Idea 1: improve the diversity of the weights by taking weights that are mutually orthogonal (e.g. $M$ $d$-dimensional basis vectors, randomly rotated in the $d$-dimensional space).

- However, this does not give significantly better accuracy.
- Apparently, for the tested cases, the random weight scheme of the ELM already covers the possible weight space quite well.


SLIDE 32

Better Weights: Sparsity!

Idea 2: improve the diversity of the weights by having each of them work in a different subspace (e.g. each weight vector takes a different subset of the variables as input).

- Spoiler: this significantly improves accuracy, at no extra computational cost.
- Experiments suggest this is due to the weight scheme enabling implicit variable selection.


SLIDE 33

Binary Weight Scheme

[Illustration: weight matrix whose columns are binary vectors, grouped by the number of active variables (1 var, 2 vars, 3 vars, ...).]

Until there are enough neurons:

- add $w \in \{0, 1\}^d$ with 1 active variable ($\# = 2^1 \times \binom{d}{1}$)
- add $w \in \{0, 1\}^d$ with 2 active variables ($\# = 2^2 \times \binom{d}{2}$)
- add $w \in \{0, 1\}^d$ with 3 active variables ($\# = 2^3 \times \binom{d}{3}$)
- ...

For each subspace, weights are added in random order to avoid bias toward particular variables.


SLIDE 34

Ternary Weight Scheme

[Illustration: weight matrix whose columns are ternary vectors, grouped by the number of active variables (1 var, 2 vars, 3 vars, ...).]

Until there are enough neurons:

- add $w \in \{-1, 0, 1\}^d$ with 1 active variable ($3^1 \times \binom{d}{1}$)
- add $w \in \{-1, 0, 1\}^d$ with 2 active variables ($3^2 \times \binom{d}{2}$)
- add $w \in \{-1, 0, 1\}^d$ with 3 active variables ($3^3 \times \binom{d}{3}$)
- ...

For each subspace, weights are added in random order to avoid bias toward particular variables.


SLIDE 35

Experimental Settings

Data               Abbreviation   # variables   # training   # test
Abalone            Ab             8             2000         2177
CaliforniaHousing  Ca             8             8000         12640
CensusHouse8L      Ce             8             10000        12784
DeltaElevators     De             6             4000         5517
ComputerActivity   Co             12            4000         4192

Models compared: BIP(CV)-TR-ELM vs BIP(CV)-TR-2-ELM vs BIP(CV)-TR-3-ELM

- Experiment 1: relative performance
- Experiment 2: robustness against irrelevant variables
- Experiment 3: implicit variable selection

(All results are averaged over 100 repetitions, each with a randomly drawn training/test set.)



SLIDE 41

Exp 1: numhidden vs. RMSE (Abalone)

[Figure: test RMSE vs. number of hidden neurons (100-1000) for BIP(CV)-TR-ELM, BIP(CV)-TR-2-ELM and BIP(CV)-TR-3-ELM.]

- averages over 100 runs
- gaussian outperforms binary; ternary outperforms gaussian (in test RMSE)
- ternary reaches better RMSE with far fewer neurons


SLIDE 42

Exp 1: numhidden vs. RMSE (CpuActivity)

[Figure: test RMSE vs. number of hidden neurons (100-1000) for BIP(CV)-TR-ELM, BIP(CV)-TR-2-ELM and BIP(CV)-TR-3-ELM.]

- averages over 100 runs
- binary outperforms gaussian; ternary outperforms gaussian (in test RMSE)
- ternary reaches better RMSE with far fewer neurons


SLIDE 43

Exp 2: Robustness against irrelevant variables (Abalone)

[Figure: test RMSE vs. number of added noise variables (2-30) for BIP(CV)-TR-ELM, BIP(CV)-TR-2-ELM and BIP(CV)-TR-3-ELM.]

- 1000 neurons
- the binary weight scheme gives similar RMSE
- the ternary weight scheme makes the ELM more robust against irrelevant variables


SLIDE 44

Exp 2: Robustness against irrelevant variables (CpuActivity)

[Figure: test RMSE vs. number of added noise variables (2-30) for BIP(CV)-TR-ELM, BIP(CV)-TR-2-ELM and BIP(CV)-TR-3-ELM.]

- 1000 neurons
- the binary and ternary weight schemes make the ELM more robust against irrelevant variables


SLIDE 45

Exp 2: Robustness against irrelevant variables

                              Ab                           Co
                  gaussian  binary   ternary   gaussian  binary   ternary
RMSE, original variables      0.6497  0.6544   0.6438     0.1746  0.1785   0.1639
RMSE, 30 added irr. vars      0.6982  0.6932   0.6788     0.3221  0.2106   0.1904
RMSE loss                     0.0486  0.0388   0.0339     0.1475  0.0321   0.0265

Table: Average RMSE loss of ELMs with 1000 hidden neurons, trained on the original data and on the data with 30 added irrelevant variables.


SLIDE 46

Exp 3: Implicit Variable Selection (CpuAct)

The relevance of each input variable is quantified as $\sum_{i=1}^{M} |\beta_i \times w_i|$ (a sketch follows below).

[Figure: relevance of variables D1-D5, R1-R5, C1-C12 under the gaussian weight scheme.]
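A one-liner realizes this measure, where W is the d x M input weight matrix and beta the M output weights (illustrative names):

```python
import numpy as np

def variable_relevance(W, beta):
    """Per input variable: sum over hidden neurons i of |beta_i * w_i|."""
    return np.sum(np.abs(W * beta[None, :]), axis=1)   # length-d relevance vector
```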


SLIDE 47

Exp 3: Implicit Variable Selection (CpuAct)

The relevance of each input variable is quantified as $\sum_{i=1}^{M} |\beta_i \times w_i|$.

[Figure: relevance of variables D1-D5, R1-R5, C1-C12 under the binary weight scheme.]


SLIDE 48

Exp 3: Implicit Variable Selection (CpuAct)

The relevance of each input variable is quantified as $\sum_{i=1}^{M} |\beta_i \times w_i|$.

[Figure: relevance of variables D1-D5, R1-R5, C1-C12 under the ternary weight scheme.]
