
Chapter 18: Learning from Observations (Sections 1-3)



Learning
• Essential for unknown environments, i.e. when the designer lacks omniscience.
• Learning modifies the agent's decision mechanisms to improve performance.

Learning Agents
• Performance element: decides what actions to take.
• Learning element: modifies the performance element so that it makes better decisions.

Inductive Learning
• The learner is given the correct value of the unknown function for particular inputs, and must try to recover the unknown function or something close to it.
• Pure inductive inference (or induction): given a collection of examples of f, return a function h that approximates f. An example is a pair (x, f(x)); f is the target function and h is the hypothesis.
• It is not easy to tell whether any particular h is a good approximation of f.

Inductive Learning Method
• Construct/adjust h to agree with f on the training set.
• h is consistent if it agrees with f on all examples.

[Figure slide: Inductive Learning Method (cont.-1).]

[Figure slides: Inductive Learning Method (cont.-2) and (cont.-3).]

[Figure slide: Inductive Learning Method (cont.-4).]

Inductive Learning Method (cont.-5)
• How do we choose from among multiple consistent hypotheses?
• Ockham's razor: prefer the simplest hypothesis consistent with the data.
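To make this concrete, here is a minimal sketch (my own illustration, not from the slides) that fits two polynomial hypotheses to the same training set; the five data points are invented. The degree-4 polynomial is consistent (it agrees with the data on every example), yet Ockham's razor still favors the simpler, near-linear hypothesis.

```python
import numpy as np

# Hypothetical training set: five (x, f(x)) pairs, invented for illustration.
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([0.1, 0.9, 2.2, 2.8, 4.1])

for degree in (1, 4):
    h = np.poly1d(np.polyfit(xs, ys, degree))  # hypothesis of the given degree
    worst = np.max(np.abs(ys - h(xs)))         # worst disagreement with f on the data
    # Degree 4 interpolates all five points exactly (worst ~ 0), i.e. it is
    # consistent; degree 1 is simpler but only approximately agrees.
    print(f"degree {degree}: worst training error = {worst:.3f}")
```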

Attribute-based Representations
• Examples are described by attribute values (Boolean, discrete, continuous, etc.), e.g. situations where I will/won't wait for a table.

Learning Decision Trees
• Problem: decide whether to wait for a table at a restaurant, based on the following attributes:
  1. Alternate: is there an alternative restaurant nearby?
  2. Bar: is there a comfortable bar area to wait in?
  3. Fri/Sat: is today Friday or Saturday?
  4. Hungry: are we hungry?
  5. Patrons: number of people in the restaurant (None, Some, Full)
  6. Price: price range ($, $$, $$$)
  7. Raining: is it raining outside?
  8. Reservation: have we made a reservation?
  9. Type: kind of restaurant (French, Italian, Thai, Burger)
  10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60 minutes)
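For the code sketches below, one such example might be represented as an attribute dictionary paired with a label. The keys follow the slide's attribute names; the particular values here are made up for illustration.

```python
# A hypothetical training example: (attribute values, WillWait label).
x1 = {"Alternate": True, "Bar": False, "FriSat": False, "Hungry": True,
      "Patrons": "Some", "Price": "$$$", "Raining": False,
      "Reservation": True, "Type": "French", "WaitEstimate": "0-10"}
example = (x1, True)
```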

Decision Trees

[Figure slide: an example decision tree.]

Expressiveness
• Decision trees can express any function of the input attributes.
• Trivially, there exists a consistent decision tree for any training set (e.g. one with a separate path to a leaf for each example).
• We prefer to find more compact decision trees.

Hypothesis Spaces
• How many distinct decision trees are there with n Boolean attributes?
  = number of Boolean functions of n attributes
  = number of distinct truth tables with $2^n$ rows
  = $2^{2^n}$
• E.g. with 6 Boolean attributes, there are $2^{2^6} = 2^{64}$ = 18,446,744,073,709,551,616 trees.

Hypothesis Spaces (cont.)
• How many purely conjunctive hypotheses are there (e.g. Hungry ∧ ¬Rain)?
• Each attribute can be in (positive), in (negative), or out ⇒ $3^n$ distinct conjunctive hypotheses.
• A more expressive hypothesis space:
  - increases the chance that the target function can be expressed
  - increases the number of hypotheses consistent with the training set ⇒ may yield worse predictions
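These counts are easy to sanity-check with two lines of Python arithmetic (not from the slides):

```python
n = 6
print(2 ** (2 ** n))  # distinct Boolean functions / decision trees: 18446744073709551616
print(3 ** n)         # purely conjunctive hypotheses: 729
```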

Decision Tree Learning
• Aim: find a small tree consistent with the training examples.
• Idea: (recursively) choose the "most significant" attribute as the root of each (sub)tree; see the sketch below.

Choosing An Attribute
• Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative".

[Figure slide: candidate attribute splits of the training examples.]

• Patrons? is a better choice.
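A minimal Python skeleton of this recursive scheme (my own paraphrase of the idea, not the textbook's pseudocode; the example representation and helper names are assumptions of this sketch):

```python
from collections import Counter

def plurality_label(examples):
    """Most common label among (attributes, label) pairs (majority vote)."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def dtl(examples, attributes, importance, parent_examples=()):
    """Decision-tree learning: recursively pick the 'most significant'
    attribute, as scored by importance(attribute, examples), as the root
    of each (sub)tree. Returns either a bare label (leaf) or a pair
    (attribute, {value: subtree})."""
    if not examples:                       # no examples left: inherit majority
        return plurality_label(parent_examples)
    labels = {label for _, label in examples}
    if len(labels) == 1:                   # all examples agree: make a leaf
        return labels.pop()
    if not attributes:                     # attributes exhausted: majority vote
        return plurality_label(examples)
    best = max(attributes, key=lambda a: importance(a, examples))
    branches = {}
    for value in {attrs[best] for attrs, _ in examples}:
        subset = [(attrs, lbl) for attrs, lbl in examples if attrs[best] == value]
        rest = [a for a in attributes if a != best]
        branches[value] = dtl(subset, rest, importance, examples)
    return (best, branches)
```

An `importance` function based on the information gain defined on the next slide turns this into the greedy scheme the slides describe.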

Using Information Theory
• Used to implement Choose-Attribute in the DTL algorithm.
• Information content (entropy):

  $I(P(v_1), \ldots, P(v_n)) = -\sum_{i=1}^{n} P(v_i) \log_2 P(v_i)$

• For a training set containing p positive and n negative examples:

  $I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) = -\frac{p}{p+n} \log_2 \frac{p}{p+n} - \frac{n}{p+n} \log_2 \frac{n}{p+n}$

Information Gain
• A chosen attribute A with v distinct values divides the training set E into subsets E_1, ..., E_v according to their values for A:

  $\mathit{remainder}(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p+n} \, I\left(\frac{p_i}{p_i+n_i}, \frac{n_i}{p_i+n_i}\right)$

• Information gain (IG) is the reduction in entropy from the attribute test:

  $IG(A) = I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) - \mathit{remainder}(A)$

• Choose the attribute with the largest IG.
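These definitions translate directly into Python. A small sketch (the function names and the list-of-(p_i, n_i) split representation are my own choices):

```python
import math

def entropy(probs):
    """I(P(v_1), ..., P(v_n)) = -sum_i P(v_i) * log2 P(v_i)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def binary_entropy(p, n):
    """I(p/(p+n), n/(p+n)) for p positive and n negative examples."""
    return entropy((p / (p + n), n / (p + n))) if p + n else 0.0

def remainder(splits):
    """remainder(A) for splits = [(p_1, n_1), ..., (p_v, n_v)],
    one (positive, negative) pair per value of attribute A."""
    total = sum(p + n for p, n in splits)
    return sum((p + n) / total * binary_entropy(p, n) for p, n in splits)

def information_gain(splits):
    """IG(A) = I(p/(p+n), n/(p+n)) - remainder(A)."""
    p = sum(pi for pi, _ in splits)
    n = sum(ni for _, ni in splits)
    return binary_entropy(p, n) - remainder(splits)
```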

Information Gain (cont.)
• For the training set, p = n = 6, so I(6/12, 6/12) = 1 bit.
• Consider the attributes Patrons and Type (and the others too):

  $IG(\mathit{Patrons}) = 1 - \left[ \frac{2}{12} I(0,1) + \frac{4}{12} I(1,0) + \frac{6}{12} I\left(\frac{2}{6}, \frac{4}{6}\right) \right] \approx 0.541$ bits

  $IG(\mathit{Type}) = 1 - \left[ \frac{2}{12} I\left(\frac{1}{2}, \frac{1}{2}\right) + \frac{2}{12} I\left(\frac{1}{2}, \frac{1}{2}\right) + \frac{4}{12} I\left(\frac{2}{4}, \frac{2}{4}\right) + \frac{4}{12} I\left(\frac{2}{4}, \frac{2}{4}\right) \right] = 0$ bits

• Patrons has the highest IG of all attributes and so is chosen by the DTL algorithm as the root.

Example
• Decision tree learned from the 12 examples: [figure slide omitted]
• Substantially simpler than the "true" tree: a more complex hypothesis isn't justified by the small amount of data.
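The slide's numbers can be reproduced with the information_gain sketch above by feeding in the positive/negative counts for each attribute value:

```python
# Per-value (positive, negative) counts for the 12 restaurant examples:
# Patrons: None -> (0, 2), Some -> (4, 0), Full -> (2, 4)
# Type: French -> (1, 1), Italian -> (1, 1), Thai -> (2, 2), Burger -> (2, 2)
print(information_gain([(0, 2), (4, 0), (2, 4)]))           # ~0.541 bits
print(information_gain([(1, 1), (1, 1), (2, 2), (2, 2)]))   # 0.0 bits
```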

Performance Measurement
• How do we know that h ≈ f?
  1. Use theorems of computational/statistical learning theory.
  2. Try h on a new test set of examples (drawn from the same distribution over the example space as the training set).
• Learning curve = % correct on the test set as a function of training set size; see the sketch after the summary.

Summary
• Learning is needed for unknown environments (and for lazy designers).
• Learning agent = performance element + learning element.
• For supervised learning, the aim is to find a simple hypothesis approximately consistent with the training examples.
• Decision tree learning uses information gain.
• Learning performance = prediction accuracy measured on the test set.
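A learning curve can be produced by training on increasingly large prefixes of the training data and scoring on a held-out test set. A schematic sketch; `train` and `predict` are hypothetical stand-ins for whatever learner is being measured:

```python
def learning_curve(examples, train, predict, test_fraction=0.25, steps=10):
    """% correct on a held-out test set vs. training set size.
    examples: list of (x, y) pairs; assumes enough data for a non-empty
    test split. train(examples) -> h and predict(h, x) -> label are
    hypothetical hooks, not a real API."""
    cut = int(len(examples) * (1 - test_fraction))
    training, test = examples[:cut], examples[cut:]
    curve = []
    for k in range(1, steps + 1):
        m = max(1, cut * k // steps)          # size of the training prefix
        h = train(training[:m])
        accuracy = sum(predict(h, x) == y for x, y in test) / len(test)
        curve.append((m, accuracy))
    return curve
```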
