Learning Sets of Rules


SLIDE 1

Learning Sets of Rules

  • Sequential covering algorithms
  • FOIL
  • Induction as inverse of deduction
  • Inductive Logic Programming

Web resources:

  • http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/ilp.html
  • http://www-ai.ijs.si/∼ilpnet2

SLIDE 2

Learning Disjunctive Sets of Rules

Method 1: Learn decision tree, convert to rules

Method 2: Sequential covering algorithm:

  1. Learn one rule with high accuracy, any coverage
  2. Remove positive examples covered by this rule
  3. Repeat

SLIDE 3

Sequential Covering Algorithm

Sequential-covering(Target attribute, Attributes, Examples, Threshold)

  • Learned rules ← {}
  • Rule ← learn-one-rule(Target attribute, Attributes, Examples)
  • while performance(Rule, Examples) > Threshold, do
    – Learned rules ← Learned rules + Rule
    – Examples ← Examples − {examples correctly classified by Rule}
    – Rule ← learn-one-rule(Target attribute, Attributes, Examples)
  • Learned rules ← sort Learned rules according to performance over Examples
  • return Learned rules
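The loop above can be sketched in Python. This is a minimal sketch, not the exact procedure: examples are assumed to be (instance, label) pairs, a rule is any callable mapping an instance to a predicted label (or None if it does not cover the instance), and `learn_one_rule` and `performance` are supplied by the caller; the toy learner below simply predicts the majority label of the remaining examples.

```python
def sequential_covering(examples, learn_one_rule, performance, threshold):
    """Greedy sequential covering: learn rules one at a time, removing
    the examples each new rule classifies correctly."""
    all_examples = list(examples)      # keep a copy for the final sort
    remaining = list(examples)
    learned_rules = []
    rule = learn_one_rule(remaining)
    while performance(rule, remaining) > threshold:
        learned_rules.append(rule)
        # drop the examples the new rule classifies correctly
        remaining = [(x, y) for x, y in remaining if rule(x) != y]
        rule = learn_one_rule(remaining)
    # sort rules by performance over the full training set
    learned_rules.sort(key=lambda r: performance(r, all_examples), reverse=True)
    return learned_rules

def majority_rule_learner(examples):
    """Toy learn-one-rule: predict the majority label of what is left."""
    if not examples:
        return lambda x: None          # covers nothing
    labels = [y for _, y in examples]
    majority = labels.count(True) >= labels.count(False)
    return lambda x: majority

def accuracy(rule, examples):
    """Fraction of covered examples the rule classifies correctly."""
    covered = [(x, y) for x, y in examples if rule(x) is not None]
    if not covered:
        return 0.0
    return sum(rule(x) == y for x, y in covered) / len(covered)

data = [(0, False), (1, False), (2, True), (3, True), (4, True)]
rules = sequential_covering(data, majority_rule_learner, accuracy, 0.5)
```

On this toy data the first learned rule predicts the majority class True, the second mops up the remaining False examples.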

SLIDE 4

Learn-One-Rule

[Figure: general-to-specific search for Learn-One-Rule. The search starts from the most general rule IF THEN PlayTennis=yes and specializes it one literal at a time, e.g. IF Wind=weak THEN PlayTennis=yes, IF Wind=strong THEN PlayTennis=no, IF Humidity=normal THEN PlayTennis=yes, IF Humidity=high THEN PlayTennis=no; the branch IF Humidity=normal THEN PlayTennis=yes is further specialized with Outlook=sunny, Wind=weak, Wind=strong, and Outlook=rain.]

SLIDE 5

Learn-One-Rule

  • Pos ← positive Examples
  • Neg ← negative Examples
  • while Pos, do (learn a NewRule)
    – NewRule ← most general rule possible
    – NewRuleNeg ← Neg
    – while NewRuleNeg, do (add a new literal to specialize NewRule)
      1. Candidate literals ← generate candidates
      2. Best literal ← argmax over L ∈ Candidate literals of Performance(SpecializeRule(NewRule, L))
      3. Add Best literal to NewRule preconditions
      4. NewRuleNeg ← subset of NewRuleNeg that satisfies NewRule preconditions
    – Learned rules ← Learned rules + NewRule
    – Pos ← Pos − {members of Pos covered by NewRule}
  • Return Learned rules
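The inner specialization loop can be sketched in Python (a sketch, not the original implementation): examples are assumed to be attribute → value dicts, candidate literals are attribute=value tests, and Performance is accuracy over the examples the specialized rule still covers, with positive coverage as a tie-breaker.

```python
def matches(example, preconds):
    """True if the example satisfies every precondition of the rule."""
    return all(example.get(a) == v for a, v in preconds.items())

def learn_one_rule(pos, neg, attributes):
    """Greedy general-to-specific search for a single rule.
    Returns the rule's preconditions as an attribute -> value dict."""
    preconds = {}                      # most general rule: covers everything
    rule_neg = list(neg)               # negatives still covered by the rule
    while rule_neg:
        free = [a for a in attributes if a not in preconds]
        if not free:
            break                      # cannot specialize any further
        # candidate literals: attribute = value for each unused attribute
        candidates = sorted({(a, ex[a]) for a in free for ex in pos + rule_neg})
        def performance(literal):
            a, v = literal
            trial = dict(preconds, **{a: v})
            p = sum(matches(ex, trial) for ex in pos)
            n = sum(matches(ex, trial) for ex in rule_neg)
            # accuracy over covered examples, positives as tie-breaker
            return (p / (p + n) if p + n else 0.0, p)
        best_a, best_v = max(candidates, key=performance)
        preconds[best_a] = best_v
        rule_neg = [ex for ex in rule_neg if matches(ex, preconds)]
    return preconds

pos = [{"Wind": "weak", "Humidity": "normal"},
       {"Wind": "strong", "Humidity": "normal"}]
neg = [{"Wind": "weak", "Humidity": "high"}]
rule = learn_one_rule(pos, neg, ["Wind", "Humidity"])
```

On this tiny PlayTennis-style sample the search picks Humidity=normal, since it covers both positives and excludes the negative.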

SLIDE 6

Subtleties: Learn One Rule

  1. May use beam search
  2. Easily generalizes to multi-valued target functions
  3. Choose evaluation function to guide search:
     • Entropy (i.e., information gain)
     • Sample accuracy: n_c / n, where n_c = correct rule predictions, n = all predictions
     • m-estimate: (n_c + m·p) / (n + m), where p is the prior probability of the class the rule predicts and m weights that prior
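These two evaluation functions translate directly into Python (a small sketch; note that m = 0 recovers plain sample accuracy):

```python
def sample_accuracy(n_c, n):
    """Fraction of the rule's predictions that are correct: n_c / n."""
    return n_c / n

def m_estimate(n_c, n, p, m):
    """m-estimate of rule accuracy: (n_c + m*p) / (n + m).
    Shrinks the raw accuracy toward the prior p; larger m trusts
    the prior more, and m = 0 gives plain sample accuracy."""
    return (n_c + m * p) / (n + m)
```

The m-estimate is useful when a rule covers few examples: a rule with 3 of 4 predictions correct keeps accuracy 0.75, but with prior p = 0.5 and m = 2 its estimate shrinks toward 4/6.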

SLIDE 7

Variants of Rule Learning Programs

  • Sequential or simultaneous covering of data?
  • General → specific, or specific → general?
  • Generate-and-test, or example-driven?
  • Whether and how to post-prune?
  • What statistical evaluation function?

SLIDE 8

Learning First Order Rules

Why do that?

  • Can learn sets of rules such as

      Ancestor(x, y) ← Parent(x, y)
      Ancestor(x, y) ← Parent(x, z) ∧ Ancestor(z, y)

  • General purpose programming language Prolog: programs are sets of such rules
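Read procedurally, the two Ancestor clauses behave like a recursive program. A Python transcription (a sketch assuming the Parent relation is a finite, acyclic set of pairs):

```python
def ancestor(x, y, parent):
    """Direct transcription of the two Horn clauses:
    Ancestor(x, y) <- Parent(x, y)
    Ancestor(x, y) <- Parent(x, z) ∧ Ancestor(z, y)
    `parent` is a set of (p, c) pairs; assumes the relation is acyclic."""
    if (x, y) in parent:
        return True                    # first clause: direct parent
    # second clause: some child z of x is an ancestor of y
    return any(ancestor(z, y, parent) for p, z in parent if p == x)

parents = {("Tom", "Bob"), ("Bob", "Ann")}
```

With these facts, Tom is an ancestor of Ann via the recursive clause, while the relation is not symmetric.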

SLIDE 9

First Order Rule for Classifying Web Pages

[Slattery, 1997]

course(A) ← has-word(A, instructor),
            Not has-word(A, good),
            link-from(A, B),
            has-word(B, assign),
            Not link-from(B, C)

Train: 31/31, Test: 31/34

SLIDE 10

FOIL

  • First Order Inductive Learner (FOIL)
  • Learns Horn clauses without functions
  • Allows negated literals in rule body
  • Sequential covering algorithm

    – Greedy, hill-climbing approach
    – Seeks only rules for predicting True

  • Each new rule generalizes overall concept (S → G)
  • Each added conjunct specializes rule (G → S)

SLIDE 11

FOIL(Target predicate, Predicates, Examples)

  • Pos ← positive Examples
  • Neg ← negative Examples
  • while Pos, do (learn a NewRule)
    – NewRule ← most general rule possible
    – NewRuleNeg ← Neg
    – while NewRuleNeg, do (add a new literal to specialize NewRule)
      1. Candidate literals ← generate candidates
      2. Best literal ← argmax over L ∈ Candidate literals of Foil Gain(L, NewRule)
      3. Add Best literal to NewRule preconditions
      4. NewRuleNeg ← subset of NewRuleNeg that satisfies NewRule preconditions
    – Learned rules ← Learned rules + NewRule
    – Pos ← Pos − {members of Pos covered by NewRule}
  • Return Learned rules

SLIDE 12

Specializing Rules in FOIL

Learning rule: P(x1, x2, . . . , xk) ← L1 . . . Ln

Candidate specializations add a new literal of form:

  • Q(v1, . . . , vr), where at least one of the vi in the created literal must already exist as a variable in the rule
  • Equal(xj, xk), where xj and xk are variables already present in the rule
  • The negation of either of the above forms of literals

SLIDE 13

Information Gain in FOIL

Foil Gain(L, R) ≡ t ( log2 (p1 / (p1 + n1)) − log2 (p0 / (p0 + n0)) )

Where

  • L is the candidate literal to add to rule R
  • p0 = number of positive bindings of R
  • n0 = number of negative bindings of R
  • p1 = number of positive bindings of R + L
  • n1 = number of negative bindings of R + L
  • t = number of positive bindings of R also covered by R + L

Note:

  • − log2 (p0 / (p0 + n0)) is the optimal number of bits to indicate the class of a positive binding covered by R
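The formula translates directly into Python (a minimal sketch; it assumes p0 > 0 and p1 > 0, since otherwise the logarithms are undefined):

```python
from math import log2

def foil_gain(p0, n0, p1, n1, t):
    """Foil_Gain(L, R) = t * (log2(p1/(p1+n1)) - log2(p0/(p0+n0))).
    Assumes p0 > 0 and p1 > 0 so both logarithms are defined."""
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))
```

For example, a literal that keeps all 4 positive bindings of a rule while eliminating all 4 negative ones gains 4 · (log2 1 − log2 ½) = 4 bits.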

SLIDE 14

FOIL Example

[Figure: a directed graph over numbered nodes; an edge x → y represents LinkedTo(x, y).]

Instances:

  • pairs of nodes, e.g. ⟨1, 5⟩, with the graph described by literals LinkedTo(0,1), ¬LinkedTo(0,8), etc.

Target function:

  • CanReach(x, y) true iff there is a directed path from x to y

Hypothesis space:

  • Each h ∈ H is a set of Horn clauses using predicates LinkedTo (and CanReach)
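The target function itself can be sketched directly as breadth-first search over the LinkedTo edges (an illustration of the concept to be learned, not the learned hypothesis; BFS keeps it safe on cyclic graphs):

```python
from collections import deque

def can_reach(x, y, linked_to):
    """CanReach(x, y): true iff some directed path leads from x to y
    in the graph given as a set of LinkedTo (a, b) edge pairs."""
    frontier, seen = deque([x]), {x}
    while frontier:
        node = frontier.popleft()
        for a, b in linked_to:
            if a == node and b not in seen:
                if b == y:
                    return True
                seen.add(b)
                frontier.append(b)
    return False

edges = {(0, 1), (1, 5), (5, 8)}
```

Here node 8 is reachable from 0 through 1 and 5, but not the other way around.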

SLIDE 15

Induction as Inverted Deduction

Induction is finding h such that

(∀⟨xi, f(xi)⟩ ∈ D) B ∧ h ∧ xi ⊢ f(xi)

where

  • xi is ith training instance
  • f(xi) is the target function value for xi
  • B is other background knowledge

So let’s design inductive algorithms by inverting operators for automated deduction!

SLIDE 16

Induction as Inverted Deduction

“pairs of people, ⟨u, v⟩ such that child of u is v,”

f(xi) : Child(Bob, Sharon)
xi : Male(Bob), Female(Sharon), Father(Sharon, Bob)
B : Parent(u, v) ← Father(u, v)

What satisfies (∀⟨xi, f(xi)⟩ ∈ D) B ∧ h ∧ xi ⊢ f(xi)?

h1 : Child(u, v) ← Father(v, u)
h2 : Child(u, v) ← Parent(v, u)

SLIDE 17

Induction is, in fact, the inverse operation of deduction, and cannot be conceived to exist without the corresponding operation, so that the question of relative importance cannot arise. Who thinks of asking whether addition or subtraction is the more important process in arithmetic? But at the same time much difference in difficulty may exist between a direct and inverse operation; . . . it must be allowed that inductive investigations are of a far higher degree of difficulty and complexity than any questions of deduction. . . . (Jevons 1874)

SLIDE 18

Induction as Inverted Deduction

We have mechanical deductive operators F(A, B) = C, where A ∧ B ⊢ C.

We need inductive operators O(B, D) = h, where (∀⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi)

SLIDE 19

Induction as Inverted Deduction

Positives:

  • Subsumes earlier idea of finding h that “fits” training data
  • Domain theory B helps define meaning of “fitting” the data: B ∧ h ∧ xi ⊢ f(xi)
  • Suggests algorithms that search H guided by B

SLIDE 20

Induction as Inverted Deduction

Negatives:

  • Doesn’t allow for noisy data. Consider (∀⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi)
  • First order logic gives a huge hypothesis space H
    → overfitting...
    → intractability of calculating all acceptable h’s

SLIDE 21

Deduction: Resolution Rule

    P ∨ L    ¬L ∨ R
    ───────────────
         P ∨ R

  1. Given initial clauses C1 and C2, find a literal L from clause C1 such that ¬L occurs in clause C2.
  2. Form the resolvent C by including all literals from C1 and C2, except for L and ¬L. More precisely, the set of literals occurring in the conclusion C is

       C = (C1 − {L}) ∪ (C2 − {¬L})

     where ∪ denotes set union, and “−” denotes set difference.
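A propositional resolution step is a small set computation. In this sketch a clause is a set of literal strings, with a '~' prefix marking negation:

```python
def negate(lit):
    """Map P to ~P and ~P back to P."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Resolvent of two clauses: find a literal L in C1 whose negation
    occurs in C2 and return (C1 - {L}) ∪ (C2 - {~L}).
    Returns None if no complementary pair exists."""
    for lit in c1:
        if negate(lit) in c2:
            return (c1 - {lit}) | (c2 - {negate(lit)})
    return None
```

For instance, resolving P ∨ L with ¬L ∨ R yields P ∨ R, exactly the inference rule above.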

SLIDE 22

Inverting Resolution

[Figure: a resolution step and its inverse. Deduction (left): from C1: PassExam ∨ ¬KnowMaterial and C2: KnowMaterial ∨ ¬Study, resolution derives C: PassExam ∨ ¬Study. Inverse resolution (right): given C and C1, the inverse operator recovers C2.]

SLIDE 23

Inverted Resolution (Propositional)

  1. Given initial clauses C1 and C, find a literal L that occurs in clause C1, but not in clause C.
  2. Form the second clause C2 by including the following literals:

       C2 = (C − (C1 − {L})) ∪ {¬L}
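The two steps can be sketched with the same set representation used for resolution (literal strings, '~' for negation); given the resolvent C and one parent C1, the operator recovers a possible second parent C2:

```python
def negate(lit):
    """Map P to ~P and ~P back to P."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def invert_resolution(c, c1):
    """Propositional inverse resolution: pick a literal L occurring in
    C1 but not in C and return C2 = (C - (C1 - {L})) ∪ {~L}.
    Returns None if C1 is a subset of C (no literal was resolved away)."""
    for lit in c1:
        if lit not in c:
            return (c - (c1 - {lit})) | {negate(lit)}
    return None
```

With the PassExam example: from C = {PassExam, ~Study} and C1 = {PassExam, ~KnowMaterial}, the operator recovers C2 = {KnowMaterial, ~Study}, and resolving C1 with C2 re-derives C.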

SLIDE 24

First order resolution

First order resolution:

  1. Find a literal L1 from clause C1, a literal L2 from clause C2, and a substitution θ such that L1θ = ¬L2θ.
  2. Form the resolvent C by including all literals from C1θ and C2θ, except for L1θ and ¬L2θ. More precisely, the set of literals occurring in the conclusion C is

       C = (C1 − {L1})θ ∪ (C2 − {L2})θ

SLIDE 25

Inverting First order resolution

C2 = (C − (C1 − {L1})θ1)θ2⁻¹ ∪ {¬L1θ1θ2⁻¹}

SLIDE 26

Cigol

[Figure: an inverse resolution step in first-order Cigol. From GrandChild(Bob, Shannon) and Father(Shannon, Tom), with substitution {Shannon/x}, Cigol infers GrandChild(Bob, x) ∨ ¬Father(x, Tom); combining with Father(Tom, Bob) under {Bob/y, Tom/z} yields GrandChild(y, x) ∨ ¬Father(x, z) ∨ ¬Father(z, y).]

SLIDE 27

Progol

Progol: reduce combinatorial explosion by generating the most specific acceptable h

  1. User specifies H by stating predicates, functions, and forms of arguments allowed for each.
  2. Progol uses a sequential covering algorithm. For each ⟨xi, f(xi)⟩:
     • Find the most specific hypothesis hi s.t. B ∧ hi ∧ xi ⊢ f(xi)
       – actually, considers only k-step entailment
  3. Conduct a general-to-specific search bounded by the specific hypothesis hi, choosing the hypothesis with minimum description length.

SLIDE 28

Summary: Learning Rule Sets

  • Sequential (set) covering
  • Inductive Logic Programming (ILP)
    – FOIL
    – Inverse resolution
