Learning Sets of Rules


SLIDE 1

Learning Sets of Rules

  • Sequential covering algorithms
  • FOIL
  • Induction as inverse of deduction
  • Inductive Logic Programming

Web resources:

  • http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/ilp.html
  • http://www-ai.ijs.si/∼ilpnet2

SLIDE 2

Learning Disjunctive Sets of Rules

Method 1: Learn decision tree, convert to rules

Method 2: Sequential covering algorithm:

  1. Learn one rule with high accuracy, any coverage
  2. Remove positive examples covered by this rule
  3. Repeat

SLIDE 3

Sequential Covering Algorithm

Sequential-covering(Target attribute, Attributes, Examples, Threshold)

  • Learned rules ← {}
  • Rule ← learn-one-rule(Target attribute, Attributes, Examples)
  • while performance(Rule, Examples) > Threshold, do
    – Learned rules ← Learned rules + Rule
    – Examples ← Examples − {examples correctly classified by Rule}
    – Rule ← learn-one-rule(Target attribute, Attributes, Examples)
  • Learned rules ← sort Learned rules according to performance over Examples
  • return Learned rules
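The loop above can be sketched in Python. This is a minimal sketch, not the exact procedure: examples are assumed to be (instance, label) pairs, a rule is any callable mapping an instance to a predicted label (or None if it does not cover the instance), and `learn_one_rule` and `performance` are supplied by the caller; the toy learner below simply predicts the majority label of the remaining examples.

```python
def sequential_covering(examples, learn_one_rule, performance, threshold):
    """Greedy sequential covering: learn rules one at a time, removing
    the examples each new rule classifies correctly."""
    all_examples = list(examples)      # keep a copy for the final sort
    remaining = list(examples)
    learned_rules = []
    rule = learn_one_rule(remaining)
    while performance(rule, remaining) > threshold:
        learned_rules.append(rule)
        # drop the examples the new rule classifies correctly
        remaining = [(x, y) for x, y in remaining if rule(x) != y]
        rule = learn_one_rule(remaining)
    # sort rules by performance over the full training set
    learned_rules.sort(key=lambda r: performance(r, all_examples), reverse=True)
    return learned_rules

def majority_rule_learner(examples):
    """Toy learn-one-rule: predict the majority label of what is left."""
    if not examples:
        return lambda x: None          # covers nothing
    labels = [y for _, y in examples]
    majority = labels.count(True) >= labels.count(False)
    return lambda x: majority

def accuracy(rule, examples):
    """Fraction of covered examples the rule classifies correctly."""
    covered = [(x, y) for x, y in examples if rule(x) is not None]
    if not covered:
        return 0.0
    return sum(rule(x) == y for x, y in covered) / len(covered)

data = [(0, False), (1, False), (2, True), (3, True), (4, True)]
rules = sequential_covering(data, majority_rule_learner, accuracy, 0.5)
```

On this toy data the first learned rule predicts the majority class True, the second mops up the remaining False examples.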

SLIDE 4

Learn-One-Rule

[Figure: general-to-specific search for Learn-One-Rule. The search starts from the most general rule IF THEN PlayTennis=yes and specializes it one literal at a time, e.g. IF Wind=weak THEN PlayTennis=yes, IF Wind=strong THEN PlayTennis=no, IF Humidity=normal THEN PlayTennis=yes, IF Humidity=high THEN PlayTennis=no; the branch IF Humidity=normal THEN PlayTennis=yes is further specialized with Outlook=sunny, Wind=weak, Wind=strong, and Outlook=rain.]

SLIDE 5

Learn-One-Rule

  • Pos ← positive Examples
  • Neg ← negative Examples
  • while Pos, do (learn a NewRule)
    – NewRule ← most general rule possible
    – NewRuleNeg ← Neg
    – while NewRuleNeg, do (add a new literal to specialize NewRule)
      1. Candidate literals ← generate candidates
      2. Best literal ← argmax over L ∈ Candidate literals of Performance(SpecializeRule(NewRule, L))
      3. Add Best literal to NewRule preconditions
      4. NewRuleNeg ← subset of NewRuleNeg that satisfies NewRule preconditions
    – Learned rules ← Learned rules + NewRule
    – Pos ← Pos − {members of Pos covered by NewRule}
  • Return Learned rules
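The inner specialization loop can be sketched in Python (a sketch, not the original implementation): examples are assumed to be attribute → value dicts, candidate literals are attribute=value tests, and Performance is accuracy over the examples the specialized rule still covers, with positive coverage as a tie-breaker.

```python
def matches(example, preconds):
    """True if the example satisfies every precondition of the rule."""
    return all(example.get(a) == v for a, v in preconds.items())

def learn_one_rule(pos, neg, attributes):
    """Greedy general-to-specific search for a single rule.
    Returns the rule's preconditions as an attribute -> value dict."""
    preconds = {}                      # most general rule: covers everything
    rule_neg = list(neg)               # negatives still covered by the rule
    while rule_neg:
        free = [a for a in attributes if a not in preconds]
        if not free:
            break                      # cannot specialize any further
        # candidate literals: attribute = value for each unused attribute
        candidates = sorted({(a, ex[a]) for a in free for ex in pos + rule_neg})
        def performance(literal):
            a, v = literal
            trial = dict(preconds, **{a: v})
            p = sum(matches(ex, trial) for ex in pos)
            n = sum(matches(ex, trial) for ex in rule_neg)
            # accuracy over covered examples, positives as tie-breaker
            return (p / (p + n) if p + n else 0.0, p)
        best_a, best_v = max(candidates, key=performance)
        preconds[best_a] = best_v
        rule_neg = [ex for ex in rule_neg if matches(ex, preconds)]
    return preconds

pos = [{"Wind": "weak", "Humidity": "normal"},
       {"Wind": "strong", "Humidity": "normal"}]
neg = [{"Wind": "weak", "Humidity": "high"}]
rule = learn_one_rule(pos, neg, ["Wind", "Humidity"])
```

On this tiny PlayTennis-style sample the search picks Humidity=normal, since it covers both positives and excludes the negative.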

SLIDE 6

Subtleties: Learn One Rule

  1. May use beam search
  2. Easily generalizes to multi-valued target functions
  3. Choose evaluation function to guide search:
     • Entropy (i.e., information gain)
     • Sample accuracy: n_c / n, where n_c = correct rule predictions, n = all predictions
     • m-estimate: (n_c + m·p) / (n + m), where p is the prior probability of the class the rule predicts and m weights that prior
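These two evaluation functions translate directly into Python (a small sketch; note that m = 0 recovers plain sample accuracy):

```python
def sample_accuracy(n_c, n):
    """Fraction of the rule's predictions that are correct: n_c / n."""
    return n_c / n

def m_estimate(n_c, n, p, m):
    """m-estimate of rule accuracy: (n_c + m*p) / (n + m).
    Shrinks the raw accuracy toward the prior p; larger m trusts
    the prior more, and m = 0 gives plain sample accuracy."""
    return (n_c + m * p) / (n + m)
```

The m-estimate is useful when a rule covers few examples: a rule with 3 of 4 predictions correct keeps accuracy 0.75, but with prior p = 0.5 and m = 2 its estimate shrinks toward 4/6.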

SLIDE 7

Variants of Rule Learning Programs

  • Sequential or simultaneous covering of data?
  • General → specific, or specific → general?
  • Generate-and-test, or example-driven?
  • Whether and how to post-prune?
  • What statistical evaluation function?

SLIDE 8

Learning First Order Rules

Why do that?

  • Can learn sets of rules such as

      Ancestor(x, y) ← Parent(x, y)
      Ancestor(x, y) ← Parent(x, z) ∧ Ancestor(z, y)

  • General purpose programming language Prolog: programs are sets of such rules
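Read procedurally, the two Ancestor clauses behave like a recursive program. A Python transcription (a sketch assuming the Parent relation is a finite, acyclic set of pairs):

```python
def ancestor(x, y, parent):
    """Direct transcription of the two Horn clauses:
    Ancestor(x, y) <- Parent(x, y)
    Ancestor(x, y) <- Parent(x, z) ∧ Ancestor(z, y)
    `parent` is a set of (p, c) pairs; assumes the relation is acyclic."""
    if (x, y) in parent:
        return True                    # first clause: direct parent
    # second clause: some child z of x is an ancestor of y
    return any(ancestor(z, y, parent) for p, z in parent if p == x)

parents = {("Tom", "Bob"), ("Bob", "Ann")}
```

With these facts, Tom is an ancestor of Ann via the recursive clause, while the relation is not symmetric.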

SLIDE 9

First Order Rule for Classifying Web Pages

[Slattery, 1997]

course(A) ← has-word(A, instructor),
            Not has-word(A, good),
            link-from(A, B),
            has-word(B, assign),
            Not link-from(B, C)

Train: 31/31, Test: 31/34

SLIDE 10

FOIL

  • First Order Inductive Learner (FOIL)
  • Learns Horn clauses without functions
  • Allows negated literals in rule body
  • Sequential covering algorithm

    – Greedy, hill-climbing approach
    – Seeks only rules for predicting True

  • Each new rule generalizes overall concept (S → G)
  • Each added conjunct specializes rule (G → S)

SLIDE 11

FOIL(Target predicate, Predicates, Examples)

  • Pos ← positive Examples
  • Neg ← negative Examples
  • while Pos, do (learn a NewRule)
    – NewRule ← most general rule possible
    – NewRuleNeg ← Neg
    – while NewRuleNeg, do (add a new literal to specialize NewRule)
      1. Candidate literals ← generate candidates
      2. Best literal ← argmax over L ∈ Candidate literals of Foil Gain(L, NewRule)
      3. Add Best literal to NewRule preconditions
      4. NewRuleNeg ← subset of NewRuleNeg that satisfies NewRule preconditions
    – Learned rules ← Learned rules + NewRule
    – Pos ← Pos − {members of Pos covered by NewRule}
  • Return Learned rules

SLIDE 12

Specializing Rules in FOIL

Learning rule: P(x1, x2, . . . , xk) ← L1 . . . Ln

Candidate specializations add a new literal of form:

  • Q(v1, . . . , vr), where at least one of the vi in the created literal must already exist as a variable in the rule
  • Equal(xj, xk), where xj and xk are variables already present in the rule
  • The negation of either of the above forms of literals

SLIDE 13

Information Gain in FOIL

Foil Gain(L, R) ≡ t ( log2 (p1 / (p1 + n1)) − log2 (p0 / (p0 + n0)) )

Where

  • L is the candidate literal to add to rule R
  • p0 = number of positive bindings of R
  • n0 = number of negative bindings of R
  • p1 = number of positive bindings of R + L
  • n1 = number of negative bindings of R + L
  • t = number of positive bindings of R also covered by R + L

Note:

  • − log2 (p0 / (p0 + n0)) is the optimal number of bits to indicate the class of a positive binding covered by R
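The formula translates directly into Python (a minimal sketch; it assumes p0 > 0 and p1 > 0, since otherwise the logarithms are undefined):

```python
from math import log2

def foil_gain(p0, n0, p1, n1, t):
    """Foil_Gain(L, R) = t * (log2(p1/(p1+n1)) - log2(p0/(p0+n0))).
    Assumes p0 > 0 and p1 > 0 so both logarithms are defined."""
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))
```

For example, a literal that keeps all 4 positive bindings of a rule while eliminating all 4 negative ones gains 4 · (log2 1 − log2 ½) = 4 bits.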

SLIDE 14

FOIL Example

[Figure: a directed graph over numbered nodes; an edge x → y represents LinkedTo(x, y).]

Instances:

  • pairs of nodes, e.g. ⟨1, 5⟩, with the graph described by literals LinkedTo(0,1), ¬LinkedTo(0,8), etc.

Target function:

  • CanReach(x, y) true iff there is a directed path from x to y

Hypothesis space:

  • Each h ∈ H is a set of Horn clauses using predicates LinkedTo (and CanReach)
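The target function itself can be sketched directly as breadth-first search over the LinkedTo edges (an illustration of the concept to be learned, not the learned hypothesis; BFS keeps it safe on cyclic graphs):

```python
from collections import deque

def can_reach(x, y, linked_to):
    """CanReach(x, y): true iff some directed path leads from x to y
    in the graph given as a set of LinkedTo (a, b) edge pairs."""
    frontier, seen = deque([x]), {x}
    while frontier:
        node = frontier.popleft()
        for a, b in linked_to:
            if a == node and b not in seen:
                if b == y:
                    return True
                seen.add(b)
                frontier.append(b)
    return False

edges = {(0, 1), (1, 5), (5, 8)}
```

Here node 8 is reachable from 0 through 1 and 5, but not the other way around.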

SLIDE 15

Induction as Inverted Deduction

Induction is finding h such that

(∀⟨xi, f(xi)⟩ ∈ D) B ∧ h ∧ xi ⊢ f(xi)

where

  • xi is ith training instance
  • f(xi) is the target function value for xi
  • B is other background knowledge

So let’s design inductive algorithms by inverting operators for automated deduction!

SLIDE 16

Induction as Inverted Deduction

“pairs of people, ⟨u, v⟩ such that child of u is v,”

f(xi) : Child(Bob, Sharon)
xi : Male(Bob), Female(Sharon), Father(Sharon, Bob)
B : Parent(u, v) ← Father(u, v)

What satisfies (∀⟨xi, f(xi)⟩ ∈ D) B ∧ h ∧ xi ⊢ f(xi)?

h1 : Child(u, v) ← Father(v, u)
h2 : Child(u, v) ← Parent(v, u)

SLIDE 17

Induction is, in fact, the inverse operation of deduction, and cannot be conceived to exist without the corresponding operation, so that the question of relative importance cannot arise. Who thinks of asking whether addition or subtraction is the more important process in arithmetic? But at the same time much difference in difficulty may exist between a direct and inverse operation; . . . it must be allowed that inductive investigations are of a far higher degree of difficulty and complexity than any questions of deduction. . . . (Jevons 1874)

SLIDE 18

Induction as Inverted Deduction

We have mechanical deductive operators F(A, B) = C, where A ∧ B ⊢ C.

We need inductive operators O(B, D) = h, where (∀⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi)

SLIDE 19

Induction as Inverted Deduction

Positives:

  • Subsumes earlier idea of finding h that “fits” training data
  • Domain theory B helps define meaning of “fitting” the data: B ∧ h ∧ xi ⊢ f(xi)
  • Suggests algorithms that search H guided by B

SLIDE 20

Induction as Inverted Deduction

Negatives:

  • Doesn’t allow for noisy data. Consider (∀⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi)
  • First order logic gives a huge hypothesis space H
    → overfitting...
    → intractability of calculating all acceptable h’s

SLIDE 21

Deduction: Resolution Rule

    P ∨ L    ¬L ∨ R
    ───────────────
         P ∨ R

  1. Given initial clauses C1 and C2, find a literal L from clause C1 such that ¬L occurs in clause C2.
  2. Form the resolvent C by including all literals from C1 and C2, except for L and ¬L. More precisely, the set of literals occurring in the conclusion C is

       C = (C1 − {L}) ∪ (C2 − {¬L})

     where ∪ denotes set union, and “−” denotes set difference.
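A propositional resolution step is a small set computation. In this sketch a clause is a set of literal strings, with a '~' prefix marking negation:

```python
def negate(lit):
    """Map P to ~P and ~P back to P."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Resolvent of two clauses: find a literal L in C1 whose negation
    occurs in C2 and return (C1 - {L}) ∪ (C2 - {~L}).
    Returns None if no complementary pair exists."""
    for lit in c1:
        if negate(lit) in c2:
            return (c1 - {lit}) | (c2 - {negate(lit)})
    return None
```

For instance, resolving P ∨ L with ¬L ∨ R yields P ∨ R, exactly the inference rule above.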

SLIDE 22

Inverting Resolution

[Figure: a resolution step and its inverse. Deduction (left): from C1: PassExam ∨ ¬KnowMaterial and C2: KnowMaterial ∨ ¬Study, resolution derives C: PassExam ∨ ¬Study. Inverse resolution (right): given C and C1, the inverse operator recovers C2.]

SLIDE 23

Inverted Resolution (Propositional)

  1. Given initial clauses C1 and C, find a literal L that occurs in clause C1, but not in clause C.
  2. Form the second clause C2 by including the following literals:

       C2 = (C − (C1 − {L})) ∪ {¬L}
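The two steps can be sketched with the same set representation used for resolution (literal strings, '~' for negation); given the resolvent C and one parent C1, the operator recovers a possible second parent C2:

```python
def negate(lit):
    """Map P to ~P and ~P back to P."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def invert_resolution(c, c1):
    """Propositional inverse resolution: pick a literal L occurring in
    C1 but not in C and return C2 = (C - (C1 - {L})) ∪ {~L}.
    Returns None if C1 is a subset of C (no literal was resolved away)."""
    for lit in c1:
        if lit not in c:
            return (c - (c1 - {lit})) | {negate(lit)}
    return None
```

With the PassExam example: from C = {PassExam, ~Study} and C1 = {PassExam, ~KnowMaterial}, the operator recovers C2 = {KnowMaterial, ~Study}, and resolving C1 with C2 re-derives C.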

SLIDE 24

First order resolution

First order resolution:

  1. Find a literal L1 from clause C1, a literal L2 from clause C2, and a substitution θ such that L1θ = ¬L2θ.
  2. Form the resolvent C by including all literals from C1θ and C2θ, except for L1θ and ¬L2θ. More precisely, the set of literals occurring in the conclusion C is

       C = (C1 − {L1})θ ∪ (C2 − {L2})θ

SLIDE 25

Inverting First order resolution

C2 = (C − (C1 − {L1})θ1)θ2⁻¹ ∪ {¬L1θ1θ2⁻¹}

SLIDE 26

Cigol

[Figure: an inverse resolution step in first-order Cigol. From GrandChild(Bob, Shannon) and Father(Shannon, Tom), with substitution {Shannon/x}, Cigol infers GrandChild(Bob, x) ∨ ¬Father(x, Tom); combining with Father(Tom, Bob) under {Bob/y, Tom/z} yields GrandChild(y, x) ∨ ¬Father(x, z) ∨ ¬Father(z, y).]

SLIDE 27

Progol

Progol: reduce combinatorial explosion by generating the most specific acceptable h

  1. User specifies H by stating predicates, functions, and forms of arguments allowed for each.
  2. Progol uses a sequential covering algorithm. For each ⟨xi, f(xi)⟩:
     • Find the most specific hypothesis hi s.t. B ∧ hi ∧ xi ⊢ f(xi)
       – actually, considers only k-step entailment
  3. Conduct a general-to-specific search bounded by the specific hypothesis hi, choosing the hypothesis with minimum description length.

SLIDE 28

Summary: Learning Rule Sets

  • Sequential (set) covering
  • Inductive Logic Programming (ILP)
    – FOIL
    – Inverse resolution
