

SLIDE 1

For Thursday

  • Read chapter 23, sections 1-3
  • Homework:

– Chapter 18, exercise 25, parts a and b only

SLIDE 2

Program 4

  • Any questions?
SLIDE 3

PAC Learning

  • The only reasonable expectation of a learner is that with high probability it learns a close approximation to the target concept.
  • In the PAC model, we specify two small parameters, ε and δ, and require that with probability at least (1 − δ) a system learn a concept with error at most ε.

SLIDE 4

Version Space

  • Bounds on the generalizations of a set of examples

SLIDE 5

Consistent Learners

  • A learner L using a hypothesis space H and training data D is said to be a consistent learner if it always outputs a hypothesis with zero error on D whenever H contains such a hypothesis.
  • By definition, a consistent learner must produce a hypothesis in the version space for H given D.
  • Therefore, to bound the number of examples needed by a consistent learner, we just need to bound the number of examples needed to ensure that the version space contains no hypotheses with unacceptably high error.

SLIDE 6

ε-Exhausted Version Space

  • The version space, VS(H,D), is said to be ε-exhausted iff every hypothesis in it has true error less than or equal to ε.
  • In other words, there are enough training examples to guarantee that any consistent hypothesis has error at most ε.
  • One can never be sure that the version space is ε-exhausted, but one can bound the probability that it is not.
  • Theorem 7.1 (Haussler, 1988): If the hypothesis space H is finite, and D is a sequence of m ≥ 1 independent random examples for some target concept c, then for any 0 ≤ ε ≤ 1, the probability that the version space VS(H,D) is not ε-exhausted is less than or equal to |H|e^(−εm).

SLIDE 7

Sample Complexity Analysis

  • Let δ be an upper bound on the probability of not exhausting the version space. So:

    P(VS(H,D) not ε-exhausted) ≤ |H|e^(−εm) ≤ δ
    e^(−εm) ≤ δ/|H|
    −εm ≤ ln(δ/|H|)              (take logs)
    m ≥ (1/ε) ln(|H|/δ)          (flip inequality)
    m ≥ (1/ε)(ln|H| + ln(1/δ))

SLIDE 8

Sample Complexity Result

  • Therefore, any consistent learner, given at least

    m ≥ (1/ε)(ln|H| + ln(1/δ))

    examples will produce a result that is PAC.
  • Just need to determine the size of a hypothesis space to instantiate this result for learning specific classes of concepts.
  • This gives a sufficient number of examples for PAC learning, but not a necessary number. Several approximations, like that used to bound the probability of a disjunction, make this a gross over-estimate in practice.

SLIDE 9

Sample Complexity of Conjunction Learning

  • Consider conjunctions over n boolean features. There are 3^n of these, since each feature can appear positively, appear negatively, or not appear in a given conjunction. Therefore |H| = 3^n, so a sufficient number of examples to learn a PAC concept is:

    m ≥ (1/ε)(n ln 3 + ln(1/δ))

  • Concrete examples:
    – δ=ε=0.05, n=10 gives 280 examples
    – δ=0.01, ε=0.05, n=10 gives 312 examples
    – δ=ε=0.01, n=10 gives 1,560 examples
    – δ=ε=0.01, n=50 gives 5,954 examples
  • Result holds for any consistent learner.
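The bound above is easy to evaluate directly. A minimal sketch (the function name `pac_sample_bound` is ours, not from the slides) that reproduces the concrete numbers on this slide:

```python
import math

def pac_sample_bound(eps, delta, ln_H):
    """Sufficient number of examples for any consistent learner:
    m >= (1/eps) * (ln|H| + ln(1/delta)).  Worst-case upper bound."""
    return math.ceil((ln_H + math.log(1.0 / delta)) / eps)

# Conjunctions over n boolean features: |H| = 3^n, so ln|H| = n*ln(3).
print(pac_sample_bound(0.05, 0.05, 10 * math.log(3)))   # 280
print(pac_sample_bound(0.05, 0.01, 10 * math.log(3)))   # 312
print(pac_sample_bound(0.01, 0.01, 10 * math.log(3)))   # 1560
print(pac_sample_bound(0.01, 0.01, 50 * math.log(3)))   # 5954
```

Passing ln|H| rather than |H| itself keeps the function usable even when |H| is astronomically large (e.g. |H| = 2^(2^n) on the next slide).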

SLIDE 10

Sample Complexity of Learning Arbitrary Boolean Functions

  • Consider any boolean function over n boolean features, such as the hypothesis space of DNF formulas or decision trees. There are 2^(2^n) of these, so a sufficient number of examples to learn a PAC concept is:

    m ≥ (1/ε)(2^n ln 2 + ln(1/δ))

  • Concrete examples:
    – δ=ε=0.05, n=10 gives 14,256 examples
    – δ=ε=0.05, n=20 gives 14,536,410 examples
    – δ=ε=0.05, n=50 gives 1.561×10^16 examples

SLIDE 11

COLT Conclusions

  • The PAC framework provides a theoretical basis for analyzing the effectiveness of learning algorithms.
  • The sample complexity for any consistent learner using some hypothesis space, H, can be determined from a measure of its expressiveness, |H| or VC(H), quantifying bias and relating it to generalization.
  • If sample complexity is tractable, then the computational complexity of finding a consistent hypothesis in H governs its PAC learnability.
  • Constant factors are more important in sample complexity than in computational complexity, since our ability to gather data is generally not growing exponentially.
  • Experimental results suggest that theoretical sample complexity bounds over-estimate the number of training instances needed in practice, since they are worst-case upper bounds.

SLIDE 12

COLT Conclusions (cont.)

  • Additional results produced for analyzing:
    – Learning with queries
    – Learning with noisy data
    – Average-case sample complexity given assumptions about the data distribution
    – Learning finite automata
    – Learning neural networks
  • Analyzing practical algorithms that use a preference bias is difficult.
  • Some effective practical algorithms motivated by theoretical results:
    – Winnow
    – Boosting
    – Support Vector Machines (SVM)

SLIDE 13

Beyond a Single Learner

  • Ensembles of learners often work better than individual learning algorithms.
  • Several possible ensemble approaches:
    – Ensembles created by using different learning methods and voting
    – Bagging
    – Boosting

SLIDE 14

Bagging

  • Random selections of examples are used to train the various members of the ensemble.
  • Seems to work fairly well, but no real guarantees.
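The idea can be sketched in a few lines; `learn` here stands for any base learning method, and all the names are ours, not from the slides:

```python
import random

def bagging_ensemble(examples, learn, k, rng=None):
    """Train k ensemble members, each on a bootstrap sample of the
    training set (drawn with replacement, same size as the original)."""
    rng = rng or random.Random(0)
    n = len(examples)
    return [learn([examples[rng.randrange(n)] for _ in range(n)])
            for _ in range(k)]

def vote(ensemble, x):
    """Classify x by majority vote over the ensemble's predictions."""
    preds = [h(x) for h in ensemble]
    return max(set(preds), key=preds.count)
```

Because each bootstrap sample omits some examples and repeats others, the members differ slightly, and voting smooths out their individual errors.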

SLIDE 15

Boosting

  • Most used ensemble method.
  • Based on the concept of a weighted training set.
  • Works especially well with weak learners.
  • Start with all weights at 1.
  • Learn a hypothesis from the weighted training set.
  • Increase the weights of all misclassified examples and decrease the weights of all correctly classified examples.
  • Learn a new hypothesis.
  • Repeat.
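The loop above can be sketched directly. This is the generic reweighting scheme the slide describes, not the specific AdaBoost update; `weak_learn` and the reweighting `factor` are illustrative assumptions:

```python
def boost(examples, weak_learn, rounds, factor=2.0):
    """Boosting sketch: keep one weight per example, learn a hypothesis
    from the weighted set, then raise the weights of misclassified
    examples and lower the weights of correct ones.  Repeat."""
    weights = [1.0] * len(examples)          # start with all weights at 1
    ensemble = []
    for _ in range(rounds):
        h = weak_learn(examples, weights)    # hypothesis from weighted set
        for i, (x, t) in enumerate(examples):
            if h(x) != t:
                weights[i] *= factor         # misclassified: increase
            else:
                weights[i] /= factor         # correct: decrease
        ensemble.append(h)
    return ensemble, weights
```

Each round forces the weak learner to concentrate on the examples previous hypotheses got wrong.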
SLIDE 16

Why Neural Networks?

SLIDE 17

Why Neural Networks?

  • Analogy to biological systems, the best examples we have of robust learning systems.
  • Models of biological systems allow us to understand how they learn and adapt.
  • Massive parallelism allows for computational efficiency.
  • Graceful degradation due to distributed representations that spread knowledge representation over large numbers of computational units.
  • Intelligent behavior is an emergent property of large numbers of simple units rather than resulting from explicit symbolically encoded rules.

SLIDE 18

Neural Speed Constraints

  • Neuron “switching time” is on the order of milliseconds, compared to nanoseconds for current transistors.
  • A factor of a million difference in speed.
  • However, biological systems can perform significant cognitive tasks (vision, language understanding) in seconds or tenths of seconds.

SLIDE 19

What That Means

  • Therefore, there is only time for about a hundred serial steps in performing such tasks.
  • Even with limited abilities, current AI systems require orders of magnitude more serial steps.
  • The human brain has approximately 10^11 neurons, each connected on average to 10^4 others, and therefore must exploit massive parallelism.

SLIDE 20

Real Neurons

  • Cells forming the basis of neural tissue:
    – Cell body
    – Dendrites
    – Axon
    – Synaptic terminals
  • The electrical potential across the cell membrane exhibits spikes called action potentials.
  • Originating in the cell body, this spike travels down the axon and causes chemical neurotransmitters to be released at synaptic terminals.
  • This chemical diffuses across the synapse into the dendrites of neighboring cells.

SLIDE 21

Real Neurons (cont.)

  • Synapses can be excitatory or inhibitory.
  • Size of the synaptic terminal influences the strength of the connection.
  • Cells “add up” the incoming chemical messages from all neighboring cells and, if the net positive influence exceeds a threshold, they “fire” and emit an action potential.

SLIDE 22

Model Neuron (Linear Threshold Unit)

  • Neuron modelled by a unit (j) connected by weights, wji, to other units (i).
  • Net input to a unit is defined as:

    netj = Σi wji · oi

  • Output of a unit is a threshold function on the net input:
    – 1 if netj > Tj
    – 0 otherwise

SLIDE 23

Neural Computation

  • McCulloch and Pitts (1943) showed how linear threshold units can be used to compute logical functions.
  • Can build basic logic gates:
    – AND: let all wji be (Tj/n) + ε, where n = number of inputs
    – OR: let all wji be Tj + ε
    – NOT: let one input be a constant 1 with weight Tj + ε, and let the input to be inverted have weight −Tj
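These constructions can be checked with a small threshold unit. A sketch (the specific T and ε values are arbitrary choices; any T > ε > 0 works):

```python
def ltu(weights, threshold):
    """Linear threshold unit: output 1 iff sum_i w_i * o_i > T."""
    return lambda inputs: int(
        sum(w * o for w, o in zip(weights, inputs)) > threshold)

T, eps = 1.0, 0.1
AND = ltu([T / 2 + eps] * 2, T)   # all weights (T/n) + eps, n = 2 inputs
OR  = ltu([T + eps] * 2, T)       # all weights T + eps
NOT = lambda x: ltu([T + eps, -T], T)((1, x))  # constant-1 input, inverted input
```

For AND, all n inputs active gives net input T + nε > T, while any fewer falls short; for OR, a single active input already exceeds T.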
SLIDE 24

Neural Computation (cont.)

  • Can build arbitrary logic circuits, finite-state machines, and computers given these basic gates.
  • Given negated inputs, two layers of linear threshold units can specify any boolean function using a two-layer AND-OR network.

SLIDE 25

Learning

  • Hebb (1949) suggested that if two units are both active (firing), then the weight between them should increase:

    wji = wji + η oj oi

    – η is a constant called the learning rate
    – Supported by physiological evidence

SLIDE 26

Alternate Learning Rule

  • Rosenblatt (1959) suggested that if a target output value is provided for a single neuron with fixed inputs, we can incrementally change the weights to learn to produce these outputs using the perceptron learning rule.
    – Assumes binary-valued inputs/outputs
    – Assumes a single linear threshold unit
    – Assumes input features are detected by fixed networks

SLIDE 27

Perceptron Learning Rule

  • If the target output for output unit j is tj:

    wji = wji + η(tj − oj)oi

  • Equivalent to the intuitive rules:
    – If the output is correct, don't change the weights.
    – If the output is low (oj = 0, tj = 1), increment the weights for all inputs which are 1.
    – If the output is high (oj = 1, tj = 0), decrement the weights for all inputs which are 1.
  • Must also adjust the threshold:

    Tj = Tj − η(tj − oj)

  • Or equivalently, assume there is a weight wj0 = −Tj for an extra input unit 0 that has constant output o0 = 1, so that the threshold is always 0.

SLIDE 28

Perceptron Learning Algorithm

  • Repeatedly iterate through the examples, adjusting weights according to the perceptron learning rule, until all outputs are correct:

    Initialize the weights to all zero (or randomly)
    Until outputs for all training examples are correct
        For each training example, e, do
            Compute the current output oj
            Compare it to the target tj and update the weights
            according to the perceptron learning rule
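A minimal implementation of this loop, using the bias-weight trick from slide 27 (the function names are ours):

```python
def train_perceptron(examples, eta=0.1, max_epochs=100):
    """Perceptron learning: iterate through the examples, applying
    wji += eta * (t - o) * oi, until an epoch produces no errors.
    The threshold is folded in as weight w[0] on a constant input 1."""
    n = len(examples[0][0])
    w = [0.0] * (n + 1)                       # w[0] plays the role of -T
    for _ in range(max_epochs):
        errors = 0
        for x, t in examples:
            x1 = (1,) + tuple(x)              # prepend the constant input
            o = int(sum(wi * xi for wi, xi in zip(w, x1)) > 0)
            if o != t:
                errors += 1
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x1)]
        if errors == 0:                       # all outputs correct: converged
            break
    return w

def classify(w, x):
    x1 = (1,) + tuple(x)
    return int(sum(wi * xi for wi, xi in zip(w, x1)) > 0)
```

On linearly separable data such as boolean AND, this reaches a perfect epoch within a handful of passes.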

SLIDE 29

Algorithm Notes

  • Each execution of the outer loop is called an epoch.
  • If the output is considered as concept membership and the inputs as binary input features, then this is easily applied to concept learning problems.
  • For multiple-category problems, learn a separate perceptron for each category and assign to the class whose perceptron most exceeds its threshold.
  • When will this algorithm terminate (converge)?
SLIDE 30

Representational Limitations

  • Perceptrons can only represent linear threshold functions, and can therefore only learn data which is linearly separable (positive and negative examples are separable by a hyperplane in n-dimensional space).
  • Cannot represent exclusive-or (xor).
SLIDE 31

Perceptron Learnability

  • A system obviously cannot learn what it cannot represent.
  • Minsky and Papert (1969) demonstrated that many functions, like parity (the n-input generalization of xor), could not be represented.
  • In visual pattern recognition, they assumed that input features are local and extract features within a fixed radius. In that case, no such input features support learning:
    – Symmetry
    – Connectivity
  • These limitations discouraged subsequent research on neural networks.

SLIDE 32

Perceptron Convergence and Cycling Theorems

  • Perceptron Convergence Theorem: If there is a set of weights that is consistent with the training data (i.e. the data is linearly separable), the perceptron learning algorithm will converge (Minsky & Papert, 1969).
  • Perceptron Cycling Theorem: If the training data is not linearly separable, the perceptron learning algorithm will eventually repeat the same set of weights and threshold at the end of some epoch, and therefore enter an infinite loop.

SLIDE 33

Perceptron Learning as Hill Climbing

  • The search space for perceptron learning is the space of possible values for the weights (and threshold).
  • The evaluation metric is the error these weights produce when used to classify the training examples.
  • The perceptron learning algorithm performs a form of hill-climbing (gradient descent), at each point altering the weights slightly in a direction that helps minimize this error.
  • The perceptron convergence theorem guarantees that for the linearly separable case there is only one local minimum and the space is well behaved.

SLIDE 34

Perceptron Performance

  • Can represent and learn conjunctive concepts and M-of-N concepts (true if at least M of a set of N selected binary features are true).
  • Although simple and restrictive, this high-bias algorithm performs quite well on many realistic problems.
  • However, the representational restriction is limiting in many applications.

SLIDE 35

Multi-Layer Neural Networks

  • Multi-layer networks can represent arbitrary functions, but building an effective learning method for such networks was thought to be difficult.
  • Generally, networks are composed of an input layer, hidden layer, and output layer, and activation feeds forward from input to output.
  • Patterns of activation are presented at the inputs and the resulting activation of the outputs is computed.
  • The values of the weights determine the function computed.
  • A network with one hidden layer with a sufficient number of units can represent any boolean function.

SLIDE 36

Basic Problem

  • The general approach to the learning algorithm is to apply gradient descent.
  • However, for the general case, we need to be able to differentiate the function computed by a unit, and the standard threshold function is not differentiable at the threshold.

SLIDE 37

Differentiable Threshold Unit

  • Need some sort of non-linear output function to allow computation of arbitrary functions by multi-layer networks (a multi-layer network of linear units can still only represent a linear function).
  • Solution: use a nonlinear, differentiable output function such as the sigmoid or logistic function:

    oj = 1/(1 + e^−(netj − Tj))

  • Can also use other functions such as tanh or a Gaussian.

SLIDE 38

Error Measure

  • Since there are multiple continuous outputs, we can define an overall error measure:

    E(W) = 1/2 Σd∈D Σk∈K (tkd − okd)^2

    where D is the set of training examples, K is the set of output units, tkd is the target output for the kth unit given input d, and okd is the network output for the kth unit given input d.

SLIDE 39

Gradient Descent

  • The derivative of the output of a sigmoid unit with respect to its net input is:

    ∂oj/∂netj = oj(1 − oj)

  • This can be used to derive a learning rule which performs gradient descent in weight space in an attempt to minimize the error function:

    Δwji = −η (∂E/∂wji)

SLIDE 40

Backpropagation Learning Rule

  • Each weight wji is changed by:

    Δwji = η δj oi

    δj = oj(1 − oj)(tj − oj)          if j is an output unit
    δj = oj(1 − oj) Σk δk wkj         otherwise

    where η is a constant called the learning rate, tj is the correct output for unit j, and δj is an error measure for unit j.

  • First determine the error for the output units, then backpropagate this error layer by layer through the network, changing weights appropriately at each layer.

SLIDE 41

Backpropagation Learning Algorithm

  • Create a three-layer network with N hidden units, fully connecting input units to hidden units and hidden units to output units, with small random weights. Until all examples produce the correct output within ε, or the mean-squared error ceases to decrease (or other termination criteria):

    Begin epoch
        For each example in the training set do:
            Compute the network output for this example.
            Compute the error between this output and the correct output.
            Backpropagate this error and adjust weights to decrease it.
    End epoch

  • Since continuous outputs only approach 0 or 1 in the limit, we must allow for some ε-approximation to learn binary functions.
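A compact sketch of this algorithm for a 2-2-1 sigmoid network learning xor (the network size, learning rate, and epoch count are illustrative choices; thresholds are handled as bias weights):

```python
import math
import random

def train_xor(epochs=5000, eta=0.5, seed=1):
    """Backpropagation sketch: 2 inputs, 2 hidden sigmoid units, 1 output.
    Each unit's weight list starts with a bias term (threshold trick)."""
    rng = random.Random(seed)
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    wh = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
    wo = [rng.uniform(-1, 1) for _ in range(3)]
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    for _ in range(epochs):
        for (x1, x2), t in data:
            h = [sig(w[0] + w[1] * x1 + w[2] * x2) for w in wh]
            o = sig(wo[0] + wo[1] * h[0] + wo[2] * h[1])
            do = o * (1 - o) * (t - o)                # output-unit delta
            dh = [h[j] * (1 - h[j]) * do * wo[j + 1]  # backpropagated deltas
                  for j in range(2)]
            wo = [wo[0] + eta * do,
                  wo[1] + eta * do * h[0],
                  wo[2] + eta * do * h[1]]
            for j in range(2):
                wh[j] = [wh[j][0] + eta * dh[j],
                         wh[j][1] + eta * dh[j] * x1,
                         wh[j][2] + eta * dh[j] * x2]
    def net(x1, x2):
        h = [sig(w[0] + w[1] * x1 + w[2] * x2) for w in wh]
        return sig(wo[0] + wo[1] * h[0] + wo[2] * h[1])
    return net
```

With most random initializations this drives the summed squared error toward 0, though (as the next slide notes) convergence is not guaranteed.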

SLIDE 42

Comments on Training

  • There is no guarantee of convergence; the network may oscillate or reach a local minimum.
  • However, in practice many large networks can be adequately trained on large amounts of data for realistic problems.
  • Many epochs (thousands) may be needed for adequate training; large data sets may require hours or days of CPU time.
  • Termination criteria can be:
    – Fixed number of epochs
    – Threshold on training set error

SLIDE 43

Representational Power

Multi-layer sigmoidal networks are very expressive.

  • Boolean functions: Any boolean function can be represented by a two-layer network by simulating a two-layer AND-OR network. But the number of required hidden units can grow exponentially in the number of inputs.
  • Continuous functions: Any bounded continuous function can be approximated with arbitrarily small error by a two-layer network. Sigmoid functions provide a set of basis functions from which arbitrary functions can be composed, just as any function can be represented by a sum of sine waves in Fourier analysis.
  • Arbitrary functions: Any function can be approximated to arbitrary accuracy by a three-layer network.

SLIDE 44

Sample Learned XOR Network

Hidden unit A represents ¬(X ∧ Y)
Hidden unit B represents ¬(X ∨ Y)
Output O represents: A ∧ ¬B = ¬(X ∧ Y) ∧ (X ∨ Y) = X ⊕ Y

[Network diagram with the learned weight values omitted]
SLIDE 45

Hidden Unit Representations

  • Trained hidden units can be seen as newly constructed features that re-represent the examples so that they are linearly separable.
  • On many real problems, hidden units can end up representing interesting recognizable features such as vowel-detectors, edge-detectors, etc.
  • However, particularly with many hidden units, they become more “distributed” and are hard to interpret.

SLIDE 46

Input/Output Coding

  • Appropriate coding of inputs and outputs can make the learning problem easier and improve generalization.
  • Best to encode each binary feature as a separate input unit, and for multi-valued features to include one binary unit per value, rather than trying to encode input information in fewer units using binary coding or continuous values.

SLIDE 47

I/O Coding (cont.)

  • Continuous inputs can be handled by a single input unit by scaling them between 0 and 1.
  • For disjoint categorization problems, it is best to have one output unit per category rather than encoding n categories into log n bits. Continuous output values then represent certainty in the various categories. Assign test cases to the category with the highest output.
  • Continuous outputs (regression) can also be handled by scaling between 0 and 1.

SLIDE 48

Neural Net Conclusions

  • Learned concepts can be represented by networks of linear threshold units and trained using gradient descent.
  • Analogy to the brain and numerous successful applications have generated significant interest.
  • Generally much slower to train than other learning methods, but explores a rich hypothesis space that seems to work well in many domains.
  • Potential to model biological and cognitive phenomena and increase our understanding of real neural systems.
    – Backprop itself is not very biologically plausible.

SLIDE 49

Natural Language Processing

  • What’s the goal?
SLIDE 50

Communication

  • Communication for the speaker:
    – Intention: Deciding why, when, and what information should be transmitted. May require planning and reasoning about agents' goals and beliefs.
    – Generation: Translating the information to be communicated into a string of words.
    – Synthesis: Output of the string in the desired modality, e.g. text on a screen or speech.

SLIDE 51

Communication (cont.)

  • Communication for the hearer:
    – Perception: Mapping the input modality to a string of words, e.g. optical character recognition or speech recognition.
    – Analysis: Determining the information content of the string.
        • Syntactic interpretation (parsing): Find the correct parse tree showing the phrase structure.
        • Semantic interpretation: Extract the (literal) meaning of the string in some representation, e.g. FOPC.
        • Pragmatic interpretation: Consider the effect of the overall context on the meaning of the sentence.
    – Incorporation: Decide whether or not to believe the content of the string and add it to the KB.

SLIDE 52

Ambiguity

  • Natural language sentences are highly ambiguous and must be disambiguated.

    I saw the man on the hill with the telescope.
    I saw the Grand Canyon flying to LA.
    I saw a jet flying to LA.
    Time flies like an arrow.
    Horse flies like a sugar cube.
    Time runners like a coach.
    Time cars like a Porsche.

SLIDE 53

Syntax

  • Syntax concerns the proper ordering of words and its effect on meaning.

    The dog bit the boy.
    The boy bit the dog.
    * Bit boy the dog the.
    Colorless green ideas sleep furiously.

SLIDE 54

Semantics

  • Semantics concerns the meaning of words, phrases, and sentences. Generally restricted to “literal meaning”:
    – “plant” as a photosynthetic organism
    – “plant” as a manufacturing facility
    – “plant” as the act of sowing

SLIDE 55

Pragmatics

  • Pragmatics concerns the overall communicative and social context and its effect on interpretation.

    – Can you pass the salt?
    – Passerby: Does your dog bite?
      Clouseau: No.
      Passerby: (pets dog) Chomp! I thought you said your dog didn't bite!!
      Clouseau: That, sir, is not my dog!

SLIDE 56

Modular Processing

[Pipeline diagram: sound waves → (speech recognition: acoustic/phonetic) → words → (parsing: syntax) → parse trees → (semantics) → literal meaning → (pragmatics) → meaning]

SLIDE 57

Examples

  • Phonetics
    “grey twine” vs. “great wine”
    “youth in Asia” vs. “euthanasia”
    “yawanna” -> “do you want to”
  • Syntax
    I ate spaghetti with a fork.
    I ate spaghetti with meatballs.

SLIDE 58

More Examples

  • Semantics
    I put the plant in the window.
    Ford put the plant in Mexico.
    The dog is in the pen.
    The ink is in the pen.
  • Pragmatics
    The ham sandwich wants another beer.
    John thinks vanilla.