SLIDE 1
ECE 4524 Artificial Intelligence and Engineering Applications
Lecture 23: Learning Theory
Reading: AIAMA 18.4-18.5
Today's Schedule: Evaluating Hypotheses/Models; PAC Learning and Sample Complexity; Assumptions about Training and
SLIDE 2
SLIDE 3
Error Rate
We define the error rate as the proportion of mistakes made by h over a set of N examples:

    error(h) = (1/N) ∑_{i=1}^{N} 1[y_i ≠ h(x_i)]

where 1[·] is the indicator function.
◮ When this error rate is zero over the training set, h is said to
be consistent.
◮ It is always possible to find a hypothesis space H complex
enough so that some h ∈ H is consistent.
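The error-rate definition and the consistency check above can be sketched as follows (a minimal sketch; the function names and the toy hypothesis are illustrative, not from the lecture):

```python
def error_rate(h, examples):
    """Proportion of examples (x, y) on which hypothesis h makes a mistake."""
    mistakes = sum(1 for x, y in examples if h(x) != y)
    return mistakes / len(examples)

def is_consistent(h, examples):
    """h is consistent when its error rate over the training set is zero."""
    return error_rate(h, examples) == 0.0

# Toy labeled set and a simple threshold hypothesis h(x) = (x >= 0)
examples = [(-2, False), (-1, False), (0, True), (1, True), (3, False)]
h = lambda x: x >= 0
print(error_rate(h, examples))    # 0.2 (one mistake out of five)
print(is_consistent(h, examples)) # False
```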
SLIDE 4
Test Error Rate
Thus we are more concerned with the test error rate.
◮ A low test error indicates h generalizes well.
◮ Often a consistent hypothesis has worse generalization than a
less-complex one.
◮ This trade-off between the complexity of H and the test
performance is the core of supervised machine learning.
SLIDE 5
Cross-Validation
◮ So, the test error is the final word on the performance of h,
but recall that we can only use the test set once. Otherwise we are said to be peeking.
◮ However, if we use the entire training set for training we will
likely over-train.
◮ The answer is to use cross-validation to estimate the
generalization performance of h. We partition the training set into a training and validation set.
◮ holdout cross-validation - reserve a percentage (typically 1/3)
from D for validation.
◮ k-fold cross-validation - partition D into k disjoint subsets (folds);
each fold is held out once for validation while training on the rest,
giving k estimates of generalization performance
◮ when k = N this is called leave-one-out cross-validation.
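The k-fold scheme above can be sketched as follows (a minimal sketch; `learn` stands for any learning algorithm and the mean-predicting "learner" in the usage example is a hypothetical stand-in):

```python
import random

def k_fold_split(data, k, seed=0):
    """Partition data into k disjoint folds of approximately equal size."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    return [[data[i] for i in idx[f::k]] for f in range(k)]

def cross_validate(learn, loss, data, k):
    """Hold out each fold once for validation, training on the remaining
    k-1 folds; return the k validation-error estimates."""
    folds = k_fold_split(data, k)
    errors = []
    for f in range(k):
        val = folds[f]
        train = [ex for g in range(k) if g != f for ex in folds[g]]
        h = learn(train)
        errors.append(sum(loss(y, h(x)) for x, y in val) / len(val))
    return errors

# Hypothetical usage: a "learner" that always predicts the training mean of y.
mean_learner = lambda train: (lambda x, m=sum(y for _, y in train) / len(train): m)
sq_loss = lambda y, yhat: (y - yhat) ** 2
data = [(x, 2.0 * x) for x in range(10)]
scores = cross_validate(mean_learner, sq_loss, data, k=5)
print(sum(scores) / len(scores))  # average validation-error estimate
```

Setting k = len(data) gives leave-one-out cross-validation.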
SLIDE 6
SLIDE 7
Selecting Hypothesis Complexity
So, to select an optimal h we need a learning algorithm, a way to
optimize the parameters over a given set H.
◮ Define the size of H as some parameter which adjusts the
complexity of H.
◮ For increasing values of size use cross-validation and the
learning algorithm to give an estimate of the training and validation error.
◮ stop when h is consistent or the training error has converged
◮ search backwards to find the size with the smallest validation
error
◮ finally, train h at the optimal size using the full training set.
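The size-selection loop above can be sketched as follows (a minimal sketch; the error curves in the usage example are illustrative stand-ins, not results from a real learner):

```python
def select_size(learn_at_size, train_err, val_err, max_size):
    """Sweep the complexity parameter 'size' upward, recording training and
    validation error; stop when a consistent h is found, then search
    backwards for the size with the smallest validation error."""
    history = []
    for size in range(1, max_size + 1):
        h = learn_at_size(size)
        history.append((size, train_err(h), val_err(h)))
        if train_err(h) == 0.0:  # h is consistent; stop growing H
            break
    best_size = min(reversed(history), key=lambda t: t[2])[0]
    return best_size, history

# Illustrative stand-ins: training error falls with size, validation
# error is U-shaped (overfitting sets in past size 3).
train_curve = {1: 0.30, 2: 0.20, 3: 0.10, 4: 0.05, 5: 0.0}
val_curve   = {1: 0.35, 2: 0.25, 3: 0.15, 4: 0.22, 5: 0.30}
best, hist = select_size(lambda s: s, train_curve.get, val_curve.get, 5)
print(best)  # 3: the size with the smallest validation error
```

The final step, retraining at the chosen size on the full training set, would then use `best` with the real learning algorithm.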
SLIDE 8
SLIDE 9
Loss Functions
Minimizing the error rate assumes that all errors matter equally to the success of the agent. From our discussion of Utility we know this is not true.
◮ In ML it is traditional to work with a cost rather than a utility,
via a loss function:

    L(x, y, ŷ) = U(result of y given x) − U(result of ŷ given x)

where y = f(x) and ŷ = h(x).
◮ We often assume the loss has no dependence on x, so we just
have L(y, ŷ).
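The standard x-independent losses can be written directly (a minimal sketch of the usual 0/1, absolute-value, and squared-error losses):

```python
def l01(y, yhat):
    """0/1 loss: 1 if the prediction is wrong, else 0."""
    return 0 if y == yhat else 1

def l1(y, yhat):
    """Absolute-value loss |y - yhat|."""
    return abs(y - yhat)

def l2(y, yhat):
    """Squared-error loss (y - yhat)^2."""
    return (y - yhat) ** 2

print(l01(1, 0), l1(2.0, 3.5), l2(2.0, 3.5))  # 1 1.5 2.25
```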
SLIDE 10
Empirical Loss
◮ We would like to minimize the expected loss over the
validation set:

    ∑_{i=1}^{N} L(y_i, h(x_i)) P(x_i, y_i)

however we don’t know the joint probability P(x, y).
◮ Instead we assume a uniform distribution and optimize the
empirical loss:

    (1/N) ∑_{i=1}^{N} L(y_i, h(x_i))
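The empirical loss is just the average loss over the observed examples (a minimal sketch; the example data and hypothesis are illustrative):

```python
def empirical_loss(L, h, examples):
    """Average loss of h over the examples: the expected loss under a
    uniform distribution on the observed (x, y) pairs."""
    return sum(L(y, h(x)) for x, y in examples) / len(examples)

# Example with squared-error loss and hypothesis h(x) = 2x
sq = lambda y, yhat: (y - yhat) ** 2
examples = [(1, 2.0), (2, 4.5), (3, 5.5)]
print(empirical_loss(sq, lambda x: 2 * x, examples))  # (0 + 0.25 + 0.25) / 3
```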
SLIDE 11
Probably Approximately Correct Learning
For Boolean functions (binary classifiers) define the error as the
expected 0/1 loss:

    error(h) = ∑_x ∑_y L_{0/1}(y, h(x)) P(x, y)

A hypothesis h is approximately correct if error(h) ≤ ε. For any
consistent h to be probably approximately correct — approximately
correct with probability at least 1 − δ — the number of training
examples must satisfy the sample-complexity bound:

    N ≥ (1/ε) (ln(1/δ) + ln |H|)
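The sample-complexity bound above is easy to evaluate numerically (a minimal sketch; the function name and the choice of hypothesis space in the example are illustrative):

```python
from math import ceil, log

def pac_sample_complexity(epsilon, delta, H_size):
    """Smallest integer N with N >= (1/epsilon) * (ln(1/delta) + ln|H|):
    enough examples that, with probability at least 1 - delta, any
    hypothesis consistent with all N examples has error(h) <= epsilon."""
    return ceil((1.0 / epsilon) * (log(1.0 / delta) + log(H_size)))

# Example: all Boolean functions of n = 10 attributes, |H| = 2^(2^n).
n = 10
print(pac_sample_complexity(0.05, 0.05, 2 ** (2 ** n)))
```

Note how the ln |H| term dominates for rich hypothesis spaces: the bound grows with the logarithm of the number of hypotheses, which motivates restricting H.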
SLIDE 12