Cascade-Correlation and Deep Learning
Scott E. Fahlman
Professor Emeritus Language Technologies Institute February 27, 2019
Two Ancient Papers
▪ Fahlman, S. E. and C. Lebiere (1990), “The Cascade-Correlation Learning Architecture.”
Scott E. Fahlman <sef@cs.cmu.edu>, CMU/LTI
▪ All hidden units are being trained at once, each changing the environment seen by the others (the “moving target” problem).
▪ Each unit must find a distinct job -- some component of the error to correct.
▪ All units scramble for the most important jobs. No central authority or communication coordinates them.
▪ Once a job is taken, it disappears, and the remaining units all head for the next-best job at once (the “herd effect”).
▪ A chaotic game of “musical chairs” develops.
▪ This is a very inefficient way to assign a distinct useful job to each unit.
[Figure: the starting network -- inputs connected directly to the output units; all weights trainable, no hidden units yet.]
[Figure: the network after adding the first hidden unit; its incoming weights are frozen, while the weights feeding the outputs remain trainable.]
[Figure: the network after adding the second hidden unit, which receives the external inputs plus the first hidden unit’s output; all incoming hidden weights are frozen, output weights remain trainable.]
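A minimal sketch of the cascade forward pass this figure depicts (not from the slides; assumes NumPy and sigmoid units): each hidden unit sees the external inputs plus the outputs of all earlier hidden units, and its incoming weights are frozen once it is installed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cascade_forward(x, hidden_weights, output_weights):
    """Forward pass through a cascade network.

    x              : (n_inputs,) external input vector
    hidden_weights : list of 1-D arrays; the k-th array has
                     n_inputs + k + 1 entries (external inputs, the k
                     earlier hidden units, and a trailing bias) --
                     frozen after the unit is tenured
    output_weights : (n_outputs, n_inputs + n_hidden + 1) array,
                     still trainable
    """
    activations = list(x)
    for w in hidden_weights:
        z = np.dot(w[:-1], activations) + w[-1]   # bias is the last entry
        activations.append(sigmoid(z))            # later units see this output
    a = np.append(activations, 1.0)               # bias input for output layer
    return sigmoid(output_weights @ a)
```

Because each new unit's inputs include every previous unit's output, the network deepens by one layer with every unit added.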
Scott E. Fahlman <sef@cs.cmu.edu>
CMU/LTI
▪ Create a pool of candidate units. Each gets all available inputs (the external inputs plus the outputs of all existing hidden units).
▪ Train each candidate’s incoming weights to maximize the match (covariance) between its activation and the network’s residual output error.
▪ When all are quiescent, tenure the winner and add it to the active net. Kill the remaining candidates.
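The candidate step above can be sketched as follows (an illustrative reconstruction, not the original code; the real system trains candidates with quickprop, for which a crude numeric-gradient ascent stands in here). Each candidate is scored by S = Σ_o |Σ_p (V_p − V̄)(E_p,o − Ē_o)|, the summed magnitude of covariance between its activation V and the residual error E:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def candidate_score(w, X, E):
    """Cascade-Correlation candidate objective.

    S = sum over outputs o of | sum over patterns p of
        (V_p - V_mean) * (E_po - E_mean_o) |
    X : (patterns, inputs) candidate input matrix (bias column included)
    E : (patterns, outputs) residual errors of the current network
    """
    V = sigmoid(X @ w)                 # candidate activation on each pattern
    Vc = V - V.mean()
    Ec = E - E.mean(axis=0)
    return np.abs(Vc @ Ec).sum()

def pick_winner(X, E, n_candidates=8, steps=50, lr=0.2, rng=None):
    """Train a pool of candidates by ascent on S, then tenure the best."""
    rng = np.random.default_rng(rng)
    best_w, best_s = None, -np.inf
    for _ in range(n_candidates):
        w = rng.normal(scale=0.5, size=X.shape[1])
        for _ in range(steps):
            g = np.zeros_like(w)
            for i in range(len(w)):    # central-difference gradient of S
                d = np.zeros_like(w); d[i] = 1e-4
                g[i] = (candidate_score(w + d, X, E)
                        - candidate_score(w - d, X, E)) / 2e-4
            w += lr * g
        s = candidate_score(w, X, E)
        if s > best_s:                 # "tenure the winner"
            best_w, best_s = w, s
    return best_w, best_s
```

Only the scored covariance matters during this phase; the output weights are retrained afterward, so candidates can be trained independently (and in parallel).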
[Figure: a recurrent candidate unit -- a sigmoid unit with trainable input weights Wi and a trainable self-weight Ws on a one-step delayed feedback of its own output.]
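A minimal sketch of the recurrent unit in the figure (my reconstruction, assuming NumPy): the unit’s previous output is fed back through a one-step delay with weight Ws, so v(t) = sigmoid(Wi·x(t) + Ws·v(t−1) + bias).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def recurrent_unit(xs, wi, ws, bias=0.0):
    """Recurrent Cascade-Correlation unit.

    xs   : sequence of input vectors x(t)
    wi   : trainable input weights Wi
    ws   : trainable self-recurrent weight Ws (one-step delay)
    Returns the unit's output v(t) at every time step.
    """
    v = 0.0                            # delayed output starts at zero
    outputs = []
    for x in xs:
        v = sigmoid(np.dot(wi, x) + ws * v + bias)
        outputs.append(v)
    return outputs
```

The single self-loop gives each tenured unit its own bit of state, so the cascade can grow memory one unit at a time.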
▪ Note: Don’t need a unit for every pattern or every time-slice.
▪ Eliminates inefficiency due to moving targets and the herd effect.
▪ Freezing allows for incremental “lesson-plan” training.
▪ Unit training/selection is very parallelizable.
▪ Dean will be looking at using this for NLP applications specifically aimed …
▪ Dean also has done some work on word embeddings, developing a …
▪ Ian is now looking for good sequential benchmarks to compare the …
▪ It’s surprisingly hard to find reported results that we can compare for …