1 Hypothesis Space Inductive Learning Hypothesis Restrict learned - PDF document

Classification (Categorization) • Given: – A description of an instance, x ∈ X , where X is the CS 391L: Machine Learning: instance language or instance space . – A fixed set of categories: C= { c 1 , c 2 ,… c n } Inductive Classification • Determine: – The category of x : c ( x ) ∈ C, where c ( x ) is a categorization function whose domain is X and whose range is C . – If c ( x ) is a binary function C ={0,1} ({true,false}, Raymond J. Mooney {positive, negative}) then it is called a concept . University of Texas at Austin 1 2 Learning for Categorization Sample Category Learning Problem • A training example is an instance x ∈ X, • Instance language: <size, color, shape> – size ∈ {small, medium, large} paired with its correct category c ( x ): – color ∈ {red, blue, green} < x , c ( x )> for an unknown categorization – shape ∈ {square, circle, triangle} function, c . • C = {positive, negative} • Given a set of training examples, D . • D : Example Size Color Shape Category • Find a hypothesized categorization function, 1 small red circle positive h ( x ), such that: 2 large red circle positive ∀ < > ∈ = x , c ( x ) D : h ( x ) c ( x ) 3 small red triangle negative Consistency 4 large blue circle negative 3 4 Hypothesis Selection Generalization • Many hypotheses are usually consistent with the • Hypotheses must generalize to correctly training data. classify instances not in the training data. – red & circle • Simply memorizing training examples is a – (small & circle) or (large & red) consistent hypothesis that does not – (small & red & circle) or (large & red & circle) generalize. – not [ ( red & triangle) or (blue & circle) ] • Occam’s razor : – not [ ( small & red & triangle) or (large & blue & circle) ] • Bias – Finding a simple hypothesis helps ensure – Any criteria other than consistency with the training data generalization. that is used to select a hypothesis. 5 6 1

Hypothesis Space Inductive Learning Hypothesis • Restrict learned functions a priori to a given hypothesis • Any function that is found to approximate the target space , H , of functions h ( x ) that can be considered as concept well on a sufficiently large set of training definitions of c ( x ). examples will also approximate the target function well on • For learning concepts on instances described by n discrete- unobserved examples. valued features, consider the space of conjunctive • Assumes that the training and test examples are drawn hypotheses represented by a vector of n constraints independently from the same underlying distribution. < c 1 , c 2 , … c n > where each c i is either: • This is a fundamentally unprovable hypothesis unless – ?, a wild card indicating no constraint on the i th feature additional assumptions are made about the target concept – A specific value from the domain of the i th feature and the notion of “approximating the target function well – Ø indicating no value is acceptable on unobserved examples” is defined appropriately (cf. • Sample conjunctive hypotheses are computational learning theory). – <big, red, ?> – <?, ?, ?> (most general hypothesis) – < Ø, Ø, Ø> (most specific hypothesis) 7 8 Evaluation of Classification Learning Category Learning as Search • Category learning can be viewed as searching the • Classification accuracy (% of instances hypothesis space for one (or more) hypotheses that are classified correctly). consistent with the training data. • Consider an instance space consisting of n binary features – Measured on an independent test data. which therefore has 2 n instances. • For conjunctive hypotheses, there are 4 choices for each • Training time (efficiency of training feature: Ø, T, F, ?, so there are 4 n syntactically distinct algorithm). hypotheses. • However, all hypotheses with 1 or more Øs are equivalent, • Testing time (efficiency of subsequent so there are 3 n +1 semantically distinct hypotheses. • The target binary categorization function in principle could classification). be any of the possible 2 2^ n functions on n input bits. • Therefore, conjunctive hypotheses are a small subset of the space of possible functions, but both are intractably large. • All reasonable hypothesis spaces are intractably large or even infinite. 9 10 Learning by Enumeration Efficient Learning • For any finite or countably infinite hypothesis • Is there a way to learn conjunctive concepts space, one can simply enumerate and test without enumerating them? hypotheses one at a time until a consistent one is • How do human subjects learn conjunctive found. concepts? � ✁ ✂ ✄ ☎ ✆ ✝ ✞ ✟ ✠ ✡ ☛ ✁ ☞ ✌ ✍ ✏ ✏ ✑ ✏ ✏ ✏ ✏ • Is there a way to efficiently find an ✞ ✟ ✎ ✆ ✁ ✠ ✎ ✟ ✎ ✄ ✠ ✟ ✝ ✝ ✄ ✂ ☎ ✟ ✠ ✟ ✠ ✒ ☛ ☎ ☎ ✓ ✔ unconstrained boolean function consistent ✏ ✏ ✏ ✏ ✖ ✝ ✄ ✠ ✄ ✂ ✕ ✟ ✠ ☎ ✄ ☎ ✠ ☛ ✂ ✄ ✂ ✠ ✞ ✗ with a set of discrete-valued training • This algorithm is guaranteed to terminate with a instances? consistent hypothesis if one exists; however, it is obviously computationally intractable for almost • If so, is it a useful/practical algorithm? any practical problem. 11 12 2

Conjunctive Rule Learning Limitations of Conjunctive Rules • Conjunctive descriptions are easily learned by finding • If a concept does not have a single set of necessary all commonalities shared by all positive examples. and sufficient conditions, conjunctive learning fails. Example Size Color Shape Category 1 small red circle positive Example Size Color Shape Category 2 large red circle positive 1 small red circle positive 3 small red triangle negative 2 large red circle positive 4 large blue circle negative 3 small red triangle negative Learned rule: red & circle → positive 4 large blue circle negative 5 medium red circle negative • Must check consistency with negative examples. If inconsistent, no conjunctive rule exists. Learned rule: red & circle → positive Inconsistent with negative example #5! 13 14 Disjunctive Concepts Using the Generality Structure • By exploiting the structure imposed by the generality of • Concept may be disjunctive. hypotheses, an hypothesis space can be searched for consistent hypotheses without enumerating or explicitly Example Size Color Shape Category exploring all hypotheses. 1 small red circle positive • An instance, x ∈ X , is said to satisfy an hypothesis, h , iff 2 large red circle positive h ( x )=1 (positive) • Given two hypotheses h 1 and h 2 , h 1 is more general than 3 small red triangle negative or equal to h 2 ( h 1 ≥ h 2 ) iff every instance that satisfies h 2 4 large blue circle negative also satisfies h 1 . 5 medium red circle negative • Given two hypotheses h 1 and h 2 , h 1 is ( strictly ) more general than h 2 ( h 1 > h 2 ) iff h 1 ≥ h 2 and it is not the case that Learned rules: small & circle → positive h 2 ≥ h 1. large & red → positive • Generality defines a partial order on hypotheses. 15 16 Examples of Generality Sample Generalization Lattice • Conjunctive feature vectors Size: {sm, big} Color: {red, blue} Shape: {circ, squr} – <?, red, ?> is more general than <?, red, circle> <?, ?, ?> – Neither of <?, red, ?> and <?, ?, circle> is more general than the other. <?,?,circ> <big,?,?> <?,red,?> <?,blue,?> <sm,?,?> <?,?,squr> • Axis-parallel rectangles in 2-d space C < ?,red,circ><big,?,circ><big,red,?><big,blue,?><sm,?,circ><?,blue,circ> <?,red,squr><sm.?,sqr><sm,red,?><sm,blue,?><big,?,squr><?,blue,squr> A B < big,red,circ><sm,red,circ><big,blue,circ><sm,blue,circ>< big,red,squr><sm,red,squr><big,blue,squr><sm,blue,squr> – A is more general than B < Ø, Ø, Ø> – Neither of A and C are more general than the other. Number of hypotheses = 3 3 + 1 = 28 17 18 3

1 Hypothesis Space Inductive Learning Hypothesis Restrict learned - PDF document

Classification (Categorization) Given: A description of an instance, x X , where X is the CS 391L: Machine Learning: instance language or instance space . A fixed set of categories: C= { c 1 , c 2 , c n } Inductive

Testing by Implicit Learning Ilias Diakonikolas Columbia University March 2009 2 What this

CS 4700: Foundations of Artificial Intelligence Prof. Bart Selman selman@cs.cornell.edu

Decision Tree Learning: Part 1 Yingyu Liang Computer Sciences 760 Fall 2017

Learning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I What We Did The

For Monday No reading No homework Program 1 Questions? Homework Decision Tree

Dark Matter Radio (DM Radio) Kent Irwin for the DM Radio Collaboration DM Radio Pathfinder

Twelve Key Ideas In Machine Learning Pedro Domingos Dept. of Computer Science & Eng.

PSYC 001 : General Psychology If this is your first day, see an instructor after class.

Efficient Tensor Decomposition and Its Application Naoki KAWASHIMA (ISSP) Dec. 3, 2018 Occam's

Machine Learning @Quora: Beyond Deep Learning 08/02/2016 Xavier Amatriain (@xamat) Our Mission

Computational challenges & opportunities P. Perona California Institute of Technology 4

Kolmogorov Complexity of Categories Complexity Programing Language Kolmogorov Noson S.

Techniques for Private Data Analysis Sofya Raskhodnikova Penn State University Based on joint

Model Selection Frank Wood December 10, 2009 Standard Linear Regression Recipe Identify the

Prediction models of Social Media data Daniel Preotiuc-Pietro daniel@dcs.shef.ac.uk 11.10.2013

Machine learning theory Nonuniform learnability Hamid Beigy Sharif university of technology

Advances in Gaussian Processes Tutorial at NIPS 2006 in Vancouver Carl Edward Rasmussen Max

Bayesian Model Comparison Roberto Trotta - www.robertotrotta.com Analytics, Computation and

arXiv:2007.10928v1 [cs.LG] 21 Jul 2020 Abstract The No Free Lunch theorems prove that under a

tic r The he e extr xtragala lactic ray sk y sky Thr hree a appr pproa oache

Econometric Evaluation of Social Programs Part I: Counterfactuals, Causality and Structural

Sigmoid curves and a case for close-to-linear nonlinear models Charles Y. Tan charles

COL866: Foundations of Data Science Ragesh Jaiswal, IITD Ragesh Jaiswal, IITD COL866:

Learnability and models of decision making under uncertainty Pathikrit Basu Federico Echenique

1 Hypothesis Space Inductive Learning Hypothesis Restrict learned - PDF document

Classification (Categorization) Given: A description of an instance, x X , where X is the CS 391L: Machine Learning: instance language or instance space . A fixed set of categories: C= { c 1 , c 2 , c n } Inductive

Testing by Implicit Learning Ilias Diakonikolas Columbia University March 2009 2 What this

CS 4700: Foundations of Artificial Intelligence Prof. Bart Selman selman@cs.cornell.edu

Decision Tree Learning: Part 1 Yingyu Liang Computer Sciences 760 Fall 2017

Learning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I What We Did The

For Monday No reading No homework Program 1 Questions? Homework Decision Tree

Dark Matter Radio (DM Radio) Kent Irwin for the DM Radio Collaboration DM Radio Pathfinder

Twelve Key Ideas In Machine Learning Pedro Domingos Dept. of Computer Science &amp; Eng.

PSYC 001 : General Psychology If this is your first day, see an instructor after class.

Efficient Tensor Decomposition and Its Application Naoki KAWASHIMA (ISSP) Dec. 3, 2018 Occam's

Machine Learning @Quora: Beyond Deep Learning 08/02/2016 Xavier Amatriain (@xamat) Our Mission

Computational challenges &amp; opportunities P. Perona California Institute of Technology 4

Kolmogorov Complexity of Categories Complexity Programing Language Kolmogorov Noson S.

Techniques for Private Data Analysis Sofya Raskhodnikova Penn State University Based on joint

Model Selection Frank Wood December 10, 2009 Standard Linear Regression Recipe Identify the

Prediction models of Social Media data Daniel Preotiuc-Pietro daniel@dcs.shef.ac.uk 11.10.2013

Machine learning theory Nonuniform learnability Hamid Beigy Sharif university of technology

Advances in Gaussian Processes Tutorial at NIPS 2006 in Vancouver Carl Edward Rasmussen Max

Bayesian Model Comparison Roberto Trotta - www.robertotrotta.com Analytics, Computation and

arXiv:2007.10928v1 [cs.LG] 21 Jul 2020 Abstract The No Free Lunch theorems prove that under a

tic r The he e extr xtragala lactic ray sk y sky Thr hree a appr pproa oache

Econometric Evaluation of Social Programs Part I: Counterfactuals, Causality and Structural

Sigmoid curves and a case for close-to-linear nonlinear models Charles Y. Tan charles

COL866: Foundations of Data Science Ragesh Jaiswal, IITD Ragesh Jaiswal, IITD COL866:

Learnability and models of decision making under uncertainty Pathikrit Basu Federico Echenique

Twelve Key Ideas In Machine Learning Pedro Domingos Dept. of Computer Science & Eng.

Computational challenges & opportunities P. Perona California Institute of Technology 4