An Introduction to Neural Network Rule Extraction Algorithms
By Sarah Jackson
Can we trust magic?
✗ Neural Networks
  ✗ Machine learning black boxes
  ✗ Magical, unexplainable results
✗ Problems
  ✗ People won't trust Neural Networks because they are difficult to understand
  ✗ The end result isn't always the only thing we are looking for
  ✗ Unacceptable risk for certain scenarios
Why do we want them then?
✗ Neural Networks have been shown to accurately classify data
✗ Neural Networks are capable of learning and classifying in ways that other machine learning techniques may not be
Who cares about rules?
✗ Rules help to bridge the gap between connectionist and symbolic methods
✗ Rule extraction from Neural Networks will increase their acceptance
✗ Rules will also improve the usefulness of data gathered from Neural Networks
What do we do with these rules?
✗ Validation
  ✗ We can tell something has been learned
✗ Integration
  ✗ Rules can be used with symbolic systems
✗ Theory discovery
  ✗ Reveals relationships that may not have been seen otherwise
✗ Explanation ability
  ✗ Allows exploration of the knowledge in the network
Are the rules good?
✗ Accuracy
  ✗ Correctly classifies unseen examples
✗ Fidelity
  ✗ Exhibits the same behavior as the Neural Network
✗ Consistency
  ✗ Classifies unseen examples the same way
✗ Comprehensibility
  ✗ Size of the rule set and number of clauses per rule
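The first two criteria above can be made concrete. As a minimal sketch (the predictor functions are hypothetical stand-ins for an extracted rule set and a trained network), accuracy compares a predictor against true labels, while fidelity compares the rules against the network itself:

```python
def fidelity(rules_predict, network_predict, examples):
    """Fraction of examples on which the extracted rules mimic the network."""
    agree = sum(1 for x in examples if rules_predict(x) == network_predict(x))
    return agree / len(examples)

def accuracy(predict, examples, labels):
    """Fraction of examples the predictor classifies correctly."""
    correct = sum(1 for x, y in zip(examples, labels) if predict(x) == y)
    return correct / len(examples)
```

Note that high fidelity does not imply high accuracy: rules that faithfully copy a poor network score well on fidelity but badly on accuracy.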
How does extraction work?
✗ Knowledge in Neural Networks is represented by numerical weights
✗ Extraction algorithms attempt to directly or indirectly analyze this numerical data
✗ Neural Network behavior is explained through the extracted representation
Decompositional Algorithms
✗ Knowledge is extracted from each node in the network individually
✗ Each node's rules are based on previous layers
✗ Usually simply described and accurate
✗ Require threshold approximation for each node
✗ Restricted generalization and scalability
  ✗ Special training procedure
  ✗ Special network architecture
  ✗ Require sigmoidal transfer functions for hidden nodes
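To illustrate the per-node idea, here is a hedged sketch (the weights and bias are made up, not from any cited paper) of extracting rules for a single sigmoid node by approximating it with a threshold at net input 0, i.e. treating sigmoid(net) > 0.5 as "fires":

```python
from itertools import product

def extract_node_rules(weights, bias):
    """Enumerate every boolean input vector that activates a sigmoid node,
    approximating the sigmoid by a step at net input 0."""
    rules = []
    for inputs in product([0, 1], repeat=len(weights)):
        net = bias + sum(w * x for w, x in zip(weights, inputs))
        if net > 0:          # sigmoid(net) > 0.5 -> node "fires"
            rules.append(inputs)
    return rules
```

The exhaustive enumeration over 2^n input combinations is exactly why the slide notes restricted scalability for decompositional methods.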
Global Algorithms
✗ Describe output nodes as functions of input nodes
✗ The internal structure of the network is not important
✗ Represent networks as decision trees
✗ Extract rules from the constructed decision trees
✗ May not be efficient as the complexity of the network grows
Combinatorial Algorithms
✗ Use aspects of both decompositional and global algorithms
✗ The network architecture and the values of the weights are necessary
✗ Attempt to gain the advantages of each without the disadvantages
TREPAN
✗ Trees Parroting Networks
✗ Global method
✗ Represents network knowledge through a decision tree
✗ Uses the same construction as C4.5 and CART
✗ Expands the tree best-first rather than depth-first, growing the node expected to improve the tree most
TREPAN
✗ The classes used for the decision tree are those defined by the neural network
✗ A list of leaf nodes is kept, each with related data:
  ✗ Subset of the training data
  ✗ Set of complementary data
  ✗ Set of constraints
✗ The data sets are used to determine whether a node should be further divided or left as a terminal leaf
✗ The data sets meet the constraints
TREPAN
✗ Nodes are removed from the list when they are split or become terminal leaves
  ✗ Never added to the list again
  ✗ Their children are added to the list
✗ The decision function determines the type of decision tree constructed:
  ✗ M-of-N – each node represents an m-of-n test
  ✗ 1-of-N – each node represents a 1-of-n test
  ✗ Simple – each node represents a test for one attribute (true or false)
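The three decision functions above can be sketched in a few lines. This is an illustrative example, not TREPAN's implementation; the attribute names and thresholds are invented:

```python
def m_of_n(m, conditions, example):
    """True if at least m of the n boolean conditions hold for the example."""
    return sum(1 for cond in conditions if cond(example)) >= m

# A 2-of-3 test over three made-up attribute checks:
conds = [lambda e: e["a"] > 0, lambda e: e["b"] > 0, lambda e: e["c"] > 0]
```

A 1-of-n test is simply `m_of_n(1, conds, example)` (a disjunction), and a simple test is the degenerate case with a single condition; this is why m-of-n trees can be much smaller, and also why each node is harder to read.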
TREPAN
✗ Comparison on UCI Tic-Tac-Toe Data
✗ 27 inputs, 20 hidden nodes, 2 outputs
TREPAN
✗ Typically, the shortest tree is the easiest to understand
✗ M-of-N has the fewest nodes, but each node is very difficult to understand
✗ TREPAN provides higher quality information
Another Global Algorithm
✗ Uses only the training data to construct the decision tree
  ✗ TREPAN uses the training data and may also use artificially generated data
✗ Uses the CN2 and C4.5 algorithms
BDT
✗ Bound Decomposition Tree
✗ Decompositional algorithm
✗ Designed with the goals of no retraining, high accuracy, and low complexity
✗ The algorithm works for Multi-Layer Perceptrons
BDT
✗ Maximum upper bound on any neuron:
  ✗ All inputs with positive weights have a value of 1
  ✗ Inputs with negative weights have a value of 0
✗ Minimum lower bound on any neuron:
  ✗ Only inputs with negative weights have a value of 1
  ✗ Inputs with positive weights have a value of 0
BDT
✗ Each neuron has its own minimum and maximum bounds
  ✗ The minimum is found by adding the bias and all negative weights
  ✗ The maximum is found by adding the bias and all positive weights
Input   Weight   Min Bound   Max Bound
I1      -0.25    -0.25
I2       0.65                 0.65
I3      -0.48    -0.48
I4       0.72                 0.72
Bias    -1.00    -1.00       -1.00
Total            -1.73        0.37
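The bound computation in the table above reduces to two sums. A minimal sketch, using the slide's example weights:

```python
def neuron_bounds(weights, bias):
    """BDT neuron bounds: min = bias + all negative weights (those inputs
    set to 1, positive-weight inputs set to 0); max = bias + all positive
    weights (the opposite assignment)."""
    lo = bias + sum(w for w in weights if w < 0)
    hi = bias + sum(w for w in weights if w > 0)
    return lo, hi

# Example from the table: I1..I4 weights with a bias of -1
lo, hi = neuron_bounds([-0.25, 0.65, -0.48, 0.72], -1.0)
```

This reproduces the table's totals of -1.73 and 0.37; because the interval straddles 0, the neuron's firing is uncertain and must be decomposed further.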
BDT
✗ Each neuron (cube) is divided into two subcubes based on the first input
  ✗ One subcube assumes 0 as the value and the other assumes 1
✗ The remaining inputs are used to construct the input vectors for each subcube
✗ Bounds are calculated for each subcube:
  ✗ Positive subcube – lower bound is positive
  ✗ Negative subcube – upper bound is negative
  ✗ Uncertain subcube – lower bound is negative and upper bound is positive
BDT
✗ Positive subcubes will always fire
  ✗ Each represents a rule for the neuron
✗ Negative subcubes will never fire
✗ Uncertain subcubes must be further subdivided until positive and/or negative subcubes are reached
✗ The rules for a neuron are the set of all input vectors on positive subcubes
✗ A Δ greater than 0 can be used to prune the neuron
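The subcube recursion described above can be sketched as follows. This is a simplified illustration rather than the paper's algorithm: it assumes boolean inputs, treats "fires" as net input strictly greater than 0, and omits the Δ pruning margin:

```python
def bdt_rules(weights, bias, fixed=()):
    """Recursively split a neuron's input cube, returning the fixed input
    prefixes of all positive subcubes (the neuron's rules)."""
    i = len(fixed)                                   # inputs fixed so far
    partial = bias + sum(w * v for w, v in zip(weights, fixed))
    rest = weights[i:]
    lo = partial + sum(w for w in rest if w < 0)     # subcube lower bound
    hi = partial + sum(w for w in rest if w > 0)     # subcube upper bound
    if lo > 0:
        return [fixed]       # positive subcube: every completion fires
    if hi <= 0:
        return []            # negative subcube: never fires
    # uncertain subcube: split on the next input (0-branch, then 1-branch)
    return (bdt_rules(weights, bias, fixed + (0,)) +
            bdt_rules(weights, bias, fixed + (1,)))
```

For an AND-like neuron with weights (1, 1) and bias -1.5, the recursion returns only the vector (1, 1), i.e. the single rule "both inputs on".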
Sources
Milare, R., De Carvalho, A., & Monard, M. (2002). An Approach to Explain Neural Networks Using Symbolic Algorithms. International Journal of Computational Intelligence and Applications, 2(4), 365-376.

Heh, J. S., Chen, J. C., & Chang, M. (2008). Designing a decompositional rule extraction algorithm for neural networks with bound decomposition tree. Neural Computing and Applications, 17, 297-309.

Nobre, C., Martinelle, E., Braga, A., De Carvalho, A., Rezende, S., Braga, J. L., & Ludermir, T. (1999). Knowledge Extraction: A Comparison between Symbolic and Connectionist Methods. International Journal of Neural Systems, 9(3), 257-264.