

SLIDE 1

Feature Significance in Wide Neural Networks

Janusz A. Starzyk (1,2) – starzykj@ohio.edu – Google: Janusz Starzyk
Rafał Niemiec (2) – rniemiec@wsiz.rzeszow.pl – Google: Rafał Niemiec
Adrian Horzyk (3) – horzyk@agh.edu.pl – Google: Horzyk

(1) Ohio University, School of Electrical Engineering and Computer Science, Athens, Ohio, U.S.A.
(2) University of Information Technology and Management, Rzeszow, Poland
(3) AGH University of Science and Technology, Krakow, Poland

SLIDE 2

Introduction

Wide neural networks were recently proposed as a less costly alternative to deep neural networks; they do not suffer from exploding or vanishing gradients.

We analyzed the quality of features and the properties of wide neural networks.

We compared the random selection of weights in the hidden layer to the selection based on radial basis functions.

Our study is devoted to feature selection and feature significance in wide neural networks.

We introduced a measure to compare various feature selection techniques.

We proved that this approach is computationally more efficient.

SLIDE 3

Wide (Broad) Neural Network

Wide neural networks can be described by the equations:

Y1 = Z1 · W1
Z2 = Ψ(Y1)
Y2 = Z2 · W2
W2 = Z2+ · YD

W1 – ASSUMED (it is usually generated randomly, but we propose a different approach to reduce the number of necessary neurons and connections)

W2 – COMPUTED using the pseudoinverse

We try to measure the feature quality under the assumption that the assumed weights W1 are correct.

C.L.P. Chen and Z. Liu, "Broad Learning System: An Effective and Efficient Incremental Learning System Without the Need for Deep Architecture", IEEE Transactions on Neural Networks and Learning Systems, Vol. 29, Issue 1, Jan. 2018, pp. 10-24. Broad learning systems trained from 276 to 1543 times faster than autoencoders, multilayer perceptrons, and deep neural networks.
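A minimal NumPy sketch of the training equations above, assuming tanh for the activation Ψ and one-hot targets YD; the function names and the choice of activation are illustrative, not taken from the paper.

    import numpy as np

    def train_wide_network(Z1, YD, n_hidden, seed=0):
        """Wide (broad) network: W1 is assumed (here random), W2 is computed."""
        rng = np.random.default_rng(seed)
        W1 = rng.standard_normal((Z1.shape[1], n_hidden))  # ASSUMED hidden weights (m x n)
        Z2 = np.tanh(Z1 @ W1)                               # Z2 = Psi(Y1), Y1 = Z1 . W1
        W2 = np.linalg.pinv(Z2) @ YD                        # W2 = Z2+ . YD  (pseudoinverse)
        return W1, W2

    def predict(Z1, W1, W2):
        return np.tanh(Z1 @ W1) @ W2                        # Y2 = Z2 . W2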

SLIDE 4

Neural Features

Z1 (k×m) · W1 (m×n) → Ψ → Z2 · W2 (n×o) = Y2 (k×o)
INPUT FEATURES → NEURAL FEATURES → OUTPUT FEATURES

It is all about a proper setup of the column space of W1.

SLIDE 5

Random Features (used as reference)

[Figure: classification error approximated experimentally]
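The error reported on these plots is measured experimentally; a minimal sketch of one such measurement, assuming one-hot targets YD and network outputs Y2 as defined on the previous slide:

    import numpy as np

    def classification_error(Y2, YD):
        """Fraction of samples whose predicted class differs from the target class."""
        return float(np.mean(np.argmax(Y2, axis=1) != np.argmax(YD, axis=1)))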

SLIDE 6

Radial Basis Features

Hidden neuron function: [formula]
Distance between the input data and a hidden neuron: [formula]
Mean value of the norms of differences between the weights of hidden neurons: [formula]
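The formulas themselves are not reproduced in this text, so the sketch below assumes a standard Gaussian radial basis construction: hidden-neuron weights used as centers, the Euclidean distance between inputs and those centers, and a width taken from the mean pairwise distance between the centers. This illustrates the general technique, not necessarily the exact formulas from the slide.

    import numpy as np

    def rbf_features(Z1, centers):
        """Gaussian radial basis hidden layer: one feature per hidden neuron (center)."""
        # Distance between every input sample and every hidden neuron.
        d = np.linalg.norm(Z1[:, None, :] - centers[None, :, :], axis=-1)        # (k, n)
        # Width taken from the mean norm of differences between hidden-neuron weights.
        pw = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)  # (n, n)
        sigma = pw[np.triu_indices(len(centers), k=1)].mean()
        return np.exp(-(d / sigma) ** 2)                                          # (k, n)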

SLIDE 7

Comparison of Random and Radial Basis Feature Selection Approaches

[Figure panels: RANDOM FEATURE SELECTION vs. RADIAL BASIS FEATURE SELECTION]

SLIDE 8

Random vs Radial Basis Features

[Figure: classification error for RND and RBF features; 1/o shown as reference]

Open question: how significant can the improvement be?

SLIDE 9

Feature Significance

[Figures: e1 = 2/o, the theoretical limit of RVFLNN networks, vs. e2 for random weights W1; and e1 for random weights W1 vs. e2 for RBF weights W1]

Random weights are always better than the reference weights, and this advantage keeps growing with the number of neurons. RBF weights are always better than random weights in the range of 0 to 6000 neurons, but this benefit diminishes as the number of neurons grows.

SLIDE 10

% Increase of RND Relative to RBF

[Figure: percentage increase of the classification error with RND features relative to RBF features]
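A minimal sketch of this quantity, assuming it is computed as 100 * (e_RND - e_RBF) / e_RBF from errors measured at the same network sizes; this interpretation of the plotted percentage is an assumption.

    import numpy as np

    def pct_increase_rnd_vs_rbf(e_rnd, e_rbf):
        """Percentage increase of the RND error relative to the RBF error, per network size."""
        e_rnd = np.asarray(e_rnd, dtype=float)
        e_rbf = np.asarray(e_rbf, dtype=float)
        return 100.0 * (e_rnd - e_rbf) / e_rbf
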
SLIDE 11

Incremental Feature Significance

Suppose we have n features and want to add nf new features. We want to measure how many random features (nRND) can be saved if the nf features are added (see the examples on the following slides and the sketch below).
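One plausible reading of this measure, sketched below under explicit assumptions: nRND is the number of random features a network would need to reach the error obtained after adding the nf improved features, so nRND - (n + nf) random features are saved. The dictionaries of measured errors and the matching rule are illustrative, not the paper's exact procedure.

    import numpy as np

    def random_features_saved(err_rnd, err_new, n, nf):
        """Random features saved by adding nf improved features to an n-feature network.

        err_rnd, err_new : dicts {network size: measured classification error}
        for purely random features and for the improved (e.g. RBF) features.
        """
        target = err_new[n + nf]                 # error reached after adding nf features
        sizes = np.array(sorted(err_rnd))
        errs = np.array([err_rnd[s] for s in sizes])
        reachable = sizes[errs <= target]        # random-feature sizes matching that error
        if reachable.size == 0:
            return None                          # target error not reachable with random features
        return int(reachable.min()) - (n + nf)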

SLIDE 12

Example A

[Figure: error curves e1 and e2 vs. the number of features n]

SLIDE 13

Example B

Incremental significance of endpoint features

SLIDE 14

Fully Connected Cascade

The number of connections (and weights to train) is 7840!

SLIDE 15

Modified Cascade

SLIDE 16

Summary & Future Plans

We discussed the significance of feature selection for wide neural networks.

We compared recognition accuracy on the MNIST dataset using two approaches.

We compared wide networks to fully connected cascades.

We introduced two simple feature significance measures.

In the future, we want to explore the tradeoff between the number of hidden neurons in wide neural networks and the width and depth of deep neural networks, to better understand the tradeoffs between these parameters in modern network structures.

SLIDE 17

Thank You


SLIDE 18

Pseudoinverse

Z2 = U Σ V^T
Z2+ = (U Σ V^T)+ = V Σ+ U^T

Σ (m×n) = | σ1  0   0 |      Σ+ (n×m) = | 1/σ1  0     0 |
          | 0   σ2  0 |                 | 0     1/σ2  0 |
          | 0   0   0 |                 | 0     0     0 |

Σ Σ+ ≈ I (m×m) and Σ+ Σ ≈ I (n×n): projections where possible, with the nullspace wiped; correspondingly, Z2 Z2+ projects onto C(Z2) and Z2+ Z2 onto C(Z2^T).

Minimum norm least squares solution: W2 ≈ Z2+ · YD

(See: Matrix Methods in Data Analysis, Signal Processing, and Machine Learning.)
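A minimal NumPy sketch of this construction: the pseudoinverse of Z2 from its SVD, with near-zero singular values wiped, and the minimum norm least squares solution for W2. The tolerance and names are illustrative.

    import numpy as np

    def pinv_via_svd(Z2, rtol=1e-10):
        """Moore-Penrose pseudoinverse Z2+ = V Sigma+ U^T from the SVD of Z2."""
        U, s, Vt = np.linalg.svd(Z2, full_matrices=False)
        mask = s > rtol * s.max()            # keep only significant singular values
        s_inv = np.zeros_like(s)
        s_inv[mask] = 1.0 / s[mask]          # invert sigma_i, wipe the rest (nullspace)
        return Vt.T @ (s_inv[:, None] * U.T)

    # Minimum norm least squares solution:  W2 ~= Z2+ . YD
    # W2 = pinv_via_svd(Z2) @ YD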