  1. Machine Learning 2007: Lecture 4
     Instructor: Tim van Erven (Tim.van.Erven@cwi.nl)
     Website: www.cwi.nl/~erven/teaching/0708/ml/
     September 27, 2007

  2. Overview
     ● Organisational Matters
     ● An Unbiased Hypothesis Space for LIST-THEN-ELIMINATE?
     ● Math: Directed Graphs and Trees
     ● Decision Trees for Classification
       ✦ Hypothesis Space: Decision Trees
       ✦ Method: ID3
     ● Math: Probability Distributions

  3. Organisational Matters
     Course Organisation:
     ● Biweekly exercises: you get a full week instead of 5 days.
     ● Exercise 2 available this evening.
     ● Grades for Exercise 1 available this week.
     Study Guide:
     ● You don't have to know the details of the CANDIDATE-ELIMINATION algorithm, just that it does the same thing as the LIST-THEN-ELIMINATE algorithm.
     ● But sections 2.6 and 2.7 of Mitchell are very important! Just replace each occurrence of CANDIDATE-ELIMINATION by LIST-THEN-ELIMINATE when reading them.
     This Lecture versus Mitchell:
     ● Decision trees are in Mitchell, but I will discuss the underlying mathematics in much more detail.

  4. Overview (same as slide 2)

  5. LIST-THEN-ELIMINATE
     Algorithm Description:
     ● LIST-THEN-ELIMINATE finds the set, VersionSpace, of all hypotheses that are consistent with all the training data.
     ● It can only classify a new feature vector x if all the hypotheses in VersionSpace agree.
     Hypothesis Space:
     H = {⟨?, ?, ?, ?, ?, ?⟩, ⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨Warm, ?, ?, ?, ?, ?⟩, ..., ⟨∅, ∅, ∅, ∅, ∅, ∅⟩}
     ● Has a very strong representation bias: only 973 out of 2^96 ≈ 10^29 possible hypotheses can be represented.
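The algorithm on this slide can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: hypotheses are tuples of constraints where '?' matches any value, and the toy two-feature domain below (instead of the 6-feature space from the slide) is chosen purely to keep the enumeration small.

```python
from itertools import product

def matches(hypothesis, x):
    """A hypothesis matches x if every constraint is '?' or equals x's value."""
    return all(h == '?' or h == xi for h, xi in zip(hypothesis, x))

def list_then_eliminate(hypothesis_space, data):
    """Keep exactly the hypotheses consistent with every labelled example.

    data is a list of (x, y) pairs with y in {True, False}.
    """
    return [h for h in hypothesis_space
            if all(matches(h, x) == y for x, y in data)]

def classify(version_space, x):
    """Classify x only if all remaining hypotheses agree; otherwise None."""
    labels = {matches(h, x) for h in version_space}
    return labels.pop() if len(labels) == 1 else None

# Illustrative two-feature domain (values are assumptions for the sketch):
values = [['Sunny', 'Rainy'], ['Warm', 'Cold']]
space = list(product(*[v + ['?'] for v in values]))
data = [(('Sunny', 'Warm'), True), (('Rainy', 'Cold'), False)]
vs = list_then_eliminate(space, data)
print(classify(vs, ('Sunny', 'Warm')))  # all consistent hypotheses agree: True
print(classify(vs, ('Sunny', 'Cold')))  # hypotheses disagree, so: None
```

Note how the second call already hints at the slide's point: the algorithm refuses to answer whenever the version space disagrees.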

  6. An Unbiased Hypothesis Space
     All Possible Hypotheses:
     Why not take all possible hypotheses as a hypothesis space for LIST-THEN-ELIMINATE?
     Hypothesis Space:
     H = {h | h is a function from X to Y},
     where
     ● X = set of possible feature vectors,
     ● Y = set of possible labels,
     ● |H| = |Y|^|X| = 2^96.

  7. An Unbiased Hypothesis Space (continued)
     Classifying a New Feature Vector:
     ● Given: data D = ((y1, x1), ..., (yn, xn)).
     ● What happens if we try to classify a new feature vector x_{n+1}?
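The counting claim |H| = |Y|^|X| can be checked by brute force on a tiny domain (the names X, Y, functions below are illustrative, not from the lecture):

```python
from itertools import product

# Enumerate every function from a tiny X to Y = {True, False} explicitly:
# each function is one assignment of a label to each feature vector.
X = ['x1', 'x2', 'x3']
Y = [True, False]
functions = [dict(zip(X, labels)) for labels in product(Y, repeat=len(X))]
print(len(functions))  # |Y| ** |X| = 2 ** 3 = 8
```

With 96 binary feature vectors instead of 3, the same count gives the slide's 2^96.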

  8. Classifying New Instances
     For any hypothesis h ∈ H, there exists an h′ ∈ H such that
     ● h(x) ≠ h′(x) if x = x_{n+1},
     ● h(x) = h′(x) for any other x.

  9. Classifying New Instances (continued)
     Consequence:
     ● Suppose x_{n+1} does not occur in D.
     ● Then for every h ∈ VersionSpace, there exists an alternative h′ ∈ VersionSpace that disagrees on the label of x_{n+1}: h(x_{n+1}) ≠ h′(x_{n+1}).
     Conclusion: In an unbiased hypothesis space, the LIST-THEN-ELIMINATE algorithm cannot generalise at all. Bias is unavoidable!
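The conclusion can be made concrete on a toy domain. In this sketch (the three-point domain is an assumption chosen for brevity), the unbiased space of all functions X → {True, False} is filtered against the data, and the surviving hypotheses split exactly evenly on the unseen point:

```python
from itertools import product

# The unbiased hypothesis space: ALL functions from X to {True, False}.
X = [0, 1, 2]  # three possible feature vectors
all_functions = [dict(zip(X, ys)) for ys in product([True, False], repeat=len(X))]

D = [(0, True), (1, False)]  # labelled data; x = 2 is unseen
version_space = [h for h in all_functions if all(h[x] == y for x, y in D)]

votes = [h[2] for h in version_space]
print(votes.count(True), votes.count(False))  # 1 1: exact disagreement on x = 2
```

Every consistent hypothesis has an equally consistent twin that flips the label of the unseen point, so no prediction is possible.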

  10. Overview (same as slide 2)

  11. Directed Graphs
     A directed graph G is an ordered pair G = (V, E), where
     ● V = {v1, ..., vm} is a set of vertices / nodes;
     ● E = {e1, ..., en} is a set of directed edges between the vertices in V.
     ● Each directed edge e from vertex u to vertex v is an ordered pair e = (u, v).
     ● I can draw the same directed graph in different ways.
     [Figure: a directed graph on vertices v1–v7.]

  12. Directed Graphs (continued)
     [Figure: two different drawings of the same directed graph on v1–v7.]
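The definition G = (V, E) translates directly into code. A minimal sketch, with an illustrative vertex and edge set of my own (not the graph drawn on the slide):

```python
# A directed graph as a pair of a vertex set and a set of ordered pairs.
V = {'v1', 'v2', 'v3', 'v4'}
E = {('v1', 'v2'), ('v1', 'v3'), ('v3', 'v4')}  # edge (u, v) points from u to v

def out_neighbours(u, E):
    """Vertices reachable from u along a single directed edge."""
    return {v for (a, v) in E if a == u}

print(sorted(out_neighbours('v1', E)))  # ['v2', 'v3']
```

Because a drawing is not part of the pair (V, E), any two layouts of the same sets are the same graph, which is exactly the slide's point.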

  13. Directed Graphs with Edge Labels
     ● We can also label edges with labels from some set of possible labels L. Now G = (V, E, L).
     ● Each directed edge e with label l ∈ L from vertex u to vertex v is an ordered triple e = (u, l, v).
     Example:
     Let L = {a, b, c}.
     [Figure: a labelled directed graph on v1–v7 with edge labels from {a, b, c}.]

  14. Tree Examples
     [Figure: five example trees (Examples 1–5) on nodes v1–v8, some with edge labels a and b.]
     ● In all examples the root of the tree is v1.
     ● The nodes without outgoing edges (shown in red) are called leaves.
     ● The other nodes are called internal nodes.

  15. Directed Trees
     A directed graph is a (directed) tree T = (V, E) with root v ∈ V if and only if either:
     1. v is the only node: T = ({v}, ∅), or
     2. ● T1, ..., Tk are trees with roots t1, ..., tk,
        ● v, T1, ..., Tk have no nodes in common, and
        ● T looks like:
     [Figure: root v with an edge to each root t1, ..., tk of the disjoint subtrees T1, ..., Tk.]
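The recursive definition suggests an equally recursive data structure: a tree is a root together with a (possibly empty) list of disjoint subtrees. A hedged sketch, with class and method names of my own choosing:

```python
from dataclasses import dataclass, field

@dataclass
class Tree:
    root: str
    children: list = field(default_factory=list)  # the subtrees T1, ..., Tk

    def nodes(self):
        """All node names, root first (assumes the subtrees share no nodes)."""
        result = [self.root]
        for subtree in self.children:
            result.extend(subtree.nodes())
        return result

# The single-node base case, and a composite tree built from subtrees:
leaf = Tree('v4')
t = Tree('v1', [Tree('v2'), Tree('v3', [leaf])])
print(t.nodes())  # ['v1', 'v2', 'v3', 'v4']
```

Case 1 of the definition is `Tree('v4')` with an empty child list; case 2 is the constructor call that attaches existing trees under a fresh root.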

  16. Properties of Trees
     Let T be a (directed) tree.
     ● If T contains an edge e = (u, v) from node u to node v, then
       ✦ u is called the parent of v,
       ✦ v is called the child of u.

  17. Properties of Trees (continued)
     Number of Parents:
     ● Each node has exactly one parent, except for the root, which has no parents.

  18. Properties of Trees (continued)
     Number of Children:
     ● Each node may have any (finite) number of children.
     ● The leaves are the nodes without children.
     ● The internal nodes have at least one child.
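All of these properties can be read off the edge set alone. A small sketch on an assumed example tree, recovering parent, children, leaves, and root from E:

```python
# An example tree as (V, E); the vertices and edges are illustrative.
V = {'v1', 'v2', 'v3', 'v4'}
E = {('v1', 'v2'), ('v1', 'v3'), ('v3', 'v4')}

parent = {v: u for (u, v) in E}            # each non-root node has one parent
children = {u: {v for (a, v) in E if a == u} for u in V}
leaves = {v for v in V if not children[v]}  # nodes without children
root = (V - set(parent)).pop()              # the unique node with no parent

print(root, sorted(leaves))  # v1 ['v2', 'v4']
```

In a tree the parent dictionary is well defined (no node appears twice as a second component) and exactly one vertex is missing from it, which is the root.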

  19. Overview (same as slide 2)
