Cascade-Correlation and Deep Learning
Scott E. Fahlman
Professor Emeritus, Language Technologies Institute
February 12, 2020

Two Ancient Papers
▪ Fahlman, S. E. and C. Lebiere (1990) "The Cascade-Correlation Learning Architecture," in Advances in Neural Information Processing Systems 2, D. S. Touretzky (ed.), Morgan Kaufmann.
▪ Fahlman, S. E. (1991) "The Recurrent Cascade-Correlation Architecture," in Advances in Neural Information Processing Systems 3, Morgan Kaufmann.
The Herd Effect
▪ All hidden units are being trained at once, changing the environment seen by the other units as they train.
▪ Each unit must find a distinct job -- some component of the error to correct.
▪ All units scramble for the most important jobs. No central authority or communication.
▪ Once a job is taken, it disappears, and units head for the next-best job.
▪ A chaotic game of “musical chairs” develops.
▪ This is a very inefficient way to assign a distinct useful job to each unit.
[Figure: a Cascade-Correlation net before any hidden units are added. Input units connect directly to the output units (f); all weights are trainable.]
[Figure: the net after adding the first hidden unit (f). The new unit receives all inputs; its incoming weights are frozen after training, while the output weights remain trainable.]
[Figure: the net after adding the second hidden unit (f). It receives all inputs plus the first hidden unit's output; the incoming weights of both hidden units are frozen, and only the output weights remain trainable.]
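A minimal NumPy sketch of the forward pass these figures depict: each newly installed hidden unit receives the original inputs plus the outputs of all earlier hidden units, and the output units see everything. The names (cascade_forward, hidden_weights, output_weights) are illustrative assumptions, not from the slides; bias terms are omitted for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cascade_forward(x, hidden_weights, output_weights):
    # Activations available so far: the raw inputs...
    acts = list(x)
    # ...then each frozen hidden unit, in the order it was installed.
    # Unit k's weight vector is one entry longer than unit k-1's,
    # because it also sees unit k-1's output.
    for w in hidden_weights:
        acts.append(sigmoid(np.dot(w, acts)))
    # Output units see the inputs and every hidden unit.
    return sigmoid(np.dot(output_weights, acts))
```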
Training the Candidate Units
▪ Create a pool of candidate units. Each gets all available inputs. Outputs are not yet connected to anything.
▪ Train the incoming weights to maximize the match (covariance) between each unit's output and the residual error:
  S = Σ_o | Σ_p (V_p − V̄)(E_p,o − Ē_o) |
  where V_p is the candidate's output on training pattern p and E_p,o is the residual error at output o (sketched in code below).
▪ When all are quiescent, tenure the winner and add it to the active net. Kill all the other candidates.
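A minimal NumPy sketch of the score S above, as given in the 1990 paper. Each candidate climbs the gradient of S with respect to its incoming weights (the ascent itself, e.g. via Quickprop, is not shown); the function name is hypothetical.

```python
import numpy as np

def candidate_score(V, E):
    # V: (n_patterns,) candidate unit's output on each training pattern.
    # E: (n_patterns, n_outputs) residual error of the current net.
    Vc = V - V.mean()            # V_p - V-bar
    Ec = E - E.mean(axis=0)      # E_p,o - E-bar_o
    # S = sum over outputs o of | sum over patterns p of Vc_p * Ec_p,o |
    return np.abs(Vc @ Ec).sum()
```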
[Figure: a Recurrent Cascade-Correlation unit. A sigmoid unit with trainable input weights Wi and a trainable self-weight Ws, whose own output is fed back through a one-step delay.]
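A sketch of the unit update this figure depicts, assuming a single self-recurrent weight Ws applied to the unit's previous output; names are illustrative.

```python
import numpy as np

def rcc_unit_step(x_t, v_prev, Wi, Ws):
    # Net input: current inputs through Wi, plus the unit's own output
    # from the previous time step, fed back through the one-step delay
    # with its trainable self-weight Ws.
    net = np.dot(Wi, x_t) + Ws * v_prev
    return 1.0 / (1.0 + np.exp(-net))  # sigmoid activation
```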
▪ Note: Don’t need a unit for every pattern or every time-slice.
Advantages
▪ Eliminates inefficiency due to moving targets and the herd effect.
▪ Freezing allows for incremental “lesson-plan” training.
▪ Unit training/selection is very parallelizable.
▪ Dean will be looking at using this for NLP applications, specifically aimed at the language in patents.
▪ Dean has also done some work on word embeddings, developing a version of word2vec using Scone.
▪ Not many learning-speed results published.
▪ Ideally, we want something like “link crossing” counts for comparison.