

SLIDE 1

Feature Significance in Wide Neural Networks

Janusz A. Starzyk (1,2) – starzykj@ohio.edu – Google: Janusz Starzyk
Rafał Niemiec (2) – rniemiec@wsiz.rzeszow.pl – Google: Rafał Niemiec
Adrian Horzyk (3) – horzyk@agh.edu.pl – Google: Horzyk

(1) Ohio University, School of Electrical Engineering and Computer Science, Athens, Ohio, U.S.A.
(2) University of Information Technology and Management, Rzeszow, Poland
(3) AGH University of Science and Technology, Krakow, Poland

SLIDE 2

Introduction

Wide neural networks were recently proposed as a less costly alternative to deep neural networks; they do not suffer from exploding or vanishing gradients.

We analyzed the quality of features and the properties of wide neural networks.

We compared the random selection of weights in the hidden layer to the selection based on radial basis functions.

Our study is devoted to feature selection and feature significance in wide neural networks.

We introduced a measure to compare various feature selection techniques.

We proved that this approach is computationally more efficient.

SLIDE 3

Wide (Broad) Neural Network

Wide neural networks can be described by the equations:

Y1 = Z1 · W1
Z2 = Ψ(Y1)
Y2 = Z2 · W2
W2 = Z2+ · YD

W1 – ASSUMED (it is usually generated randomly, but we propose a different approach to reduce the number of necessary neurons and connections)

W2 – COMPUTED using the pseudoinverse

We try to measure the feature quality under the assumption that the assumed weights W1 are correct.

C.L.P. Chen and Z. Liu, "Broad Learning System: An Effective and Efficient Incremental Learning System Without the Need for Deep Architecture", IEEE Transactions on Neural Networks and Learning Systems, Vol. 29, Issue 1, Jan. 2018, pp. 10-24. Broad learning systems trained from 276 to 1543 times faster than autoencoders, multilayer perceptrons, and deep neural networks.
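A minimal NumPy sketch of the training equations above, assuming tanh for the activation Ψ and one-hot targets YD; the function names and the choice of activation are illustrative, not taken from the paper.

    import numpy as np

    def train_wide_network(Z1, YD, n_hidden, seed=0):
        """Wide (broad) network: W1 is assumed (here random), W2 is computed."""
        rng = np.random.default_rng(seed)
        W1 = rng.standard_normal((Z1.shape[1], n_hidden))  # ASSUMED hidden weights (m x n)
        Z2 = np.tanh(Z1 @ W1)                               # Z2 = Psi(Y1), Y1 = Z1 . W1
        W2 = np.linalg.pinv(Z2) @ YD                        # W2 = Z2+ . YD  (pseudoinverse)
        return W1, W2

    def predict(Z1, W1, W2):
        return np.tanh(Z1 @ W1) @ W2                        # Y2 = Z2 . W2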

SLIDE 4

Neural Features

Z1 (k×m) · W1 (m×n) → Ψ → Z2 · W2 (n×o) = Y2 (k×o)
INPUT FEATURES → NEURAL FEATURES → OUTPUT FEATURES

It is all about a proper setup of the column space of W1.

SLIDE 5

Random Features (used as reference)

[Figure: classification error approximated experimentally]
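The error reported on these plots is measured experimentally; a minimal sketch of one such measurement, assuming one-hot targets YD and network outputs Y2 as defined on the previous slide:

    import numpy as np

    def classification_error(Y2, YD):
        """Fraction of samples whose predicted class differs from the target class."""
        return float(np.mean(np.argmax(Y2, axis=1) != np.argmax(YD, axis=1)))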

SLIDE 6

Radial Basis Features

Hidden neuron function: [formula]
Distance between the input data and a hidden neuron: [formula]
Mean value of the norms of differences between the weights of hidden neurons: [formula]
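The formulas themselves are not reproduced in this text, so the sketch below assumes a standard Gaussian radial basis construction: hidden-neuron weights used as centers, the Euclidean distance between inputs and those centers, and a width taken from the mean pairwise distance between the centers. This illustrates the general technique, not necessarily the exact formulas from the slide.

    import numpy as np

    def rbf_features(Z1, centers):
        """Gaussian radial basis hidden layer: one feature per hidden neuron (center)."""
        # Distance between every input sample and every hidden neuron.
        d = np.linalg.norm(Z1[:, None, :] - centers[None, :, :], axis=-1)        # (k, n)
        # Width taken from the mean norm of differences between hidden-neuron weights.
        pw = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)  # (n, n)
        sigma = pw[np.triu_indices(len(centers), k=1)].mean()
        return np.exp(-(d / sigma) ** 2)                                          # (k, n)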

SLIDE 7

Comparison of Random and Radial Basis Feature Selection Approaches

[Figure panels: RANDOM FEATURE SELECTION vs. RADIAL BASIS FEATURE SELECTION]

SLIDE 8

Random vs Radial Basis Features

[Figure: classification error for RND and RBF features; 1/o shown as reference]

Open question: how significant can the improvement be?

SLIDE 9

Feature Significance

[Figures: e1 = 2/o, the theoretical limit of RVFLNN networks, vs. e2 for random weights W1; and e1 for random weights W1 vs. e2 for RBF weights W1]

Random weights are always better than the reference weights, and this advantage keeps growing with the number of neurons. RBF weights are always better than random weights in the range of 0 to 6000 neurons, but this benefit diminishes as the number of neurons grows.

SLIDE 10

% Increase of RND Relative to RBF

[Figure: percentage increase of the classification error with RND features relative to RBF features]
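A minimal sketch of this quantity, assuming it is computed as 100 * (e_RND - e_RBF) / e_RBF from errors measured at the same network sizes; this interpretation of the plotted percentage is an assumption.

    import numpy as np

    def pct_increase_rnd_vs_rbf(e_rnd, e_rbf):
        """Percentage increase of the RND error relative to the RBF error, per network size."""
        e_rnd = np.asarray(e_rnd, dtype=float)
        e_rbf = np.asarray(e_rbf, dtype=float)
        return 100.0 * (e_rnd - e_rbf) / e_rbf
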
SLIDE 11

Incremental Feature Significance

Suppose we have n features and want to add nf new features. We want to measure how many random features (nRND) can be saved if the nf features are added (see the examples on the following slides and the sketch below).
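One plausible reading of this measure, sketched below under explicit assumptions: nRND is the number of random features a network would need to reach the error obtained after adding the nf improved features, so nRND - (n + nf) random features are saved. The dictionaries of measured errors and the matching rule are illustrative, not the paper's exact procedure.

    import numpy as np

    def random_features_saved(err_rnd, err_new, n, nf):
        """Random features saved by adding nf improved features to an n-feature network.

        err_rnd, err_new : dicts {network size: measured classification error}
        for purely random features and for the improved (e.g. RBF) features.
        """
        target = err_new[n + nf]                 # error reached after adding nf features
        sizes = np.array(sorted(err_rnd))
        errs = np.array([err_rnd[s] for s in sizes])
        reachable = sizes[errs <= target]        # random-feature sizes matching that error
        if reachable.size == 0:
            return None                          # target error not reachable with random features
        return int(reachable.min()) - (n + nf)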

SLIDE 12

Example A

[Figure: error curves e1 and e2 vs. the number of features n]

SLIDE 13

Example B

Incremental significance of endpoint features

SLIDE 14

Fully Connected Cascade

The number of connections (and weights to train) is 7840!

SLIDE 15

Modified Cascade

SLIDE 16

Summary & Future Plans

We discussed the significance of feature selection for wide neural networks.

We compared recognition accuracy on the MNIST dataset using two approaches.

We compared wide networks to fully connected cascades.

We introduced two simple feature significance measures.

In the future, we want to explore the tradeoff between the number of hidden neurons in wide neural networks and the width and depth of deep neural networks, to better understand the tradeoffs between these parameters in modern network structures.

SLIDE 17

Thank You


SLIDE 18

Pseudoinverse

Z2 = U Σ V^T
Z2+ = (U Σ V^T)+ = V Σ+ U^T

Σ (m×n) = | σ1  0   0 |      Σ+ (n×m) = | 1/σ1  0     0 |
          | 0   σ2  0 |                 | 0     1/σ2  0 |
          | 0   0   0 |                 | 0     0     0 |

Σ Σ+ ≈ I (m×m) and Σ+ Σ ≈ I (n×n): projections where possible, with the nullspace wiped; correspondingly, Z2 Z2+ projects onto C(Z2) and Z2+ Z2 onto C(Z2^T).

Minimum norm least squares solution: W2 ≈ Z2+ · YD

(See: Matrix Methods in Data Analysis, Signal Processing, and Machine Learning.)
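A minimal NumPy sketch of this construction: the pseudoinverse of Z2 from its SVD, with near-zero singular values wiped, and the minimum norm least squares solution for W2. The tolerance and names are illustrative.

    import numpy as np

    def pinv_via_svd(Z2, rtol=1e-10):
        """Moore-Penrose pseudoinverse Z2+ = V Sigma+ U^T from the SVD of Z2."""
        U, s, Vt = np.linalg.svd(Z2, full_matrices=False)
        mask = s > rtol * s.max()            # keep only significant singular values
        s_inv = np.zeros_like(s)
        s_inv[mask] = 1.0 / s[mask]          # invert sigma_i, wipe the rest (nullspace)
        return Vt.T @ (s_inv[:, None] * U.T)

    # Minimum norm least squares solution:  W2 ~= Z2+ . YD
    # W2 = pinv_via_svd(Z2) @ YD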