1 Greedy Sequential Covering Example Greedy Sequential Covering - PDF document

CS 391L: Machine Learning: Rule Learning
Raymond J. Mooney
University of Texas at Austin

Learning Rules
• If-then rules in logic are a standard representation of knowledge that have proven useful in expert-systems and other AI systems.
  – In propositional logic a set of rules for a concept is equivalent to DNF.
• Rules are fairly easy for people to understand and therefore can help provide insight and comprehensible results for human users.
  – Frequently used in data mining applications where goal is discovering understandable patterns in data.
• Methods for automatically inducing rules from data have been shown to build more accurate expert systems than human knowledge engineering for some applications.
• Rule-learning methods have been extended to first-order logic to handle relational (structural) representations.
  – Inductive Logic Programming (ILP) for learning Prolog programs from I/O pairs.
  – Allows moving beyond simple feature-vector representations of data.

Rule Learning Approaches
• Translate decision trees into rules (C4.5)
• Sequential (set) covering algorithms
  – General-to-specific (top-down) (CN2, FOIL)
  – Specific-to-general (bottom-up) (GOLEM, CIGOL)
  – Hybrid search (AQ, Chillin, Progol)
• Translate neural-nets into rules (TREPAN)

Decision-Trees to Rules
• For each path in a decision tree from the root to a leaf, create a rule with the conjunction of tests along the path as an antecedent and the leaf label as the consequent.

  Tree:
    color
      red → shape
              circle → A
              square → B
              triangle → C
      blue → B
      green → C

  Rules:
    red ∧ circle → A
    red ∧ square → B
    red ∧ triangle → C
    blue → B
    green → C

Post-Processing Decision-Tree Rules
• Resulting rules may contain unnecessary antecedents that are not needed to remove negative examples and result in over-fitting.
• Rules are post-pruned by greedily removing antecedents or rules until performance on training data or validation set is significantly harmed.
• Resulting rules may lead to competing conflicting conclusions on some instances.
• Sort rules by training (validation) accuracy to create an ordered decision list. The first rule in the list that applies is used to classify a test instance.

    red ∧ circle → A (97% train accuracy)
    red ∧ big → B (95% train accuracy)
    :
    :
    Test case: <big, red, circle> assigned to class A

Sequential Covering
• A set of rules is learned one at a time, each time finding a single rule that covers a large number of positive instances without covering any negatives, removing the positives that it covers, and learning additional rules to cover the rest.

    Let P be the set of positive examples
    Until P is empty do:
        Learn a rule R that covers a large number of elements of P but no negatives.
        Add R to the list of rules.
        Remove positives covered by R from P

• This is an instance of the greedy algorithm for minimum set covering and does not guarantee a minimum number of learned rules.
• Minimum set covering is an NP-hard problem and the greedy algorithm is a standard approximation algorithm.
• Methods for learning individual rules vary.
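The sequential covering loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of any particular system: examples are dicts of discrete feature values, a rule is a conjunction of feature = value tests, and `learn_one_rule` is a hypothetical placeholder learner that greedily adds the test with the best precision (ties broken by positives covered). The toy data is also made up for the sketch.

```python
from typing import Dict, List

Example = Dict[str, str]
Rule = Dict[str, str]  # conjunction of feature == value tests


def covers(rule: Rule, example: Example) -> bool:
    """A rule covers an example if every test in the rule is satisfied."""
    return all(example.get(f) == v for f, v in rule.items())


def learn_one_rule(pos: List[Example], neg: List[Example]) -> Rule:
    """Hypothetical single-rule learner (an assumption of this sketch):
    add tests until no negatives remain or no features are left."""
    rule: Rule = {}
    pos, neg = list(pos), list(neg)
    while neg:
        # Candidate tests come from still-covered positives, so each covers >= 1 positive.
        candidates = sorted({(f, v) for e in pos for f, v in e.items() if f not in rule})
        if not candidates:
            break

        def score(lit):
            f, v = lit
            p = sum(e[f] == v for e in pos)
            n = sum(e.get(f) == v for e in neg)
            return (p / (p + n), p)  # precision first, then coverage

        f, v = max(candidates, key=score)
        rule[f] = v
        pos = [e for e in pos if e[f] == v]
        neg = [e for e in neg if e.get(f) == v]
    return rule


def sequential_covering(pos: List[Example], neg: List[Example]) -> List[Rule]:
    """Let P be the positives; until P is empty, learn a rule and remove what it covers."""
    rules: List[Rule] = []
    remaining = list(pos)
    while remaining:
        rule = learn_one_rule(remaining, neg)
        if any(covers(rule, e) for e in neg):
            break  # no consistent rule exists in this rule language; stop
        rules.append(rule)
        remaining = [e for e in remaining if not covers(rule, e)]
    return rules


# Made-up toy data: no single conjunction covers both positives.
positives = [{"color": "red", "shape": "circle"}, {"color": "blue", "shape": "square"}]
negatives = [
    {"color": "red", "shape": "square"},
    {"color": "blue", "shape": "circle"},
    {"color": "green", "shape": "circle"},
]
rules = sequential_covering(positives, negatives)
for r in rules:
    print(" AND ".join(f"{f}={v}" for f, v in r.items()), "-> positive")
```

Because each rule is chosen greedily, the resulting rule list is consistent with the training data but is not guaranteed to be the smallest possible one.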

Greedy Sequential Covering Example
[Figure sequence: positive (+) examples plotted on X and Y axes; successive panels show fewer uncovered positives remaining.]

Non-optimal Covering Example
[Figure: positive (+) examples plotted on X and Y axes.]

Greedy Sequential Covering Example
[Figure sequence: positive (+) examples plotted on X and Y axes; successive panels show fewer uncovered positives remaining.]

Greedy Sequential Covering Example
[Figure sequence: the last few uncovered positive (+) examples on X and Y axes, removed panel by panel until none remain.]

Strategies for Learning a Single Rule
• Top Down (General to Specific):
  – Start with the most-general (empty) rule.
  – Repeatedly add antecedent constraints on features that eliminate negative examples while maintaining as many positives as possible.
  – Stop when only positives are covered.
• Bottom Up (Specific to General):
  – Start with a most-specific rule (e.g. complete instance description of a random instance).
  – Repeatedly remove antecedent constraints in order to cover more positives.
  – Stop when further generalization results in covering negatives.

Top-Down Rule Learning Example
[Figure: positive (+) examples plotted on X and Y axes.]
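The bottom-up (specific-to-general) strategy can be sketched as below. This is a minimal illustration under assumptions of this sketch: examples are dicts of discrete feature values, the seed is chosen by the caller rather than at random, and at each step the constraint whose removal covers the most positives is dropped. The function name `bottom_up_rule` and the toy data are hypothetical.

```python
from typing import Dict, List

Example = Dict[str, str]
Rule = Dict[str, str]  # conjunction of feature == value tests


def covers(rule: Rule, example: Example) -> bool:
    """A rule covers an example if every test in the rule is satisfied."""
    return all(example.get(f) == v for f, v in rule.items())


def bottom_up_rule(seed: Example, pos: List[Example], neg: List[Example]) -> Rule:
    """Start from the complete description of one positive instance and
    drop constraints while the rule still covers no negatives."""
    rule: Rule = dict(seed)  # most-specific rule
    while rule:
        best, best_cover = None, -1
        for f in sorted(rule):
            candidate = {k: v for k, v in rule.items() if k != f}
            if any(covers(candidate, e) for e in neg):
                continue  # this generalization would cover a negative
            n_pos = sum(covers(candidate, e) for e in pos)
            if n_pos > best_cover:
                best, best_cover = candidate, n_pos
        if best is None:
            break  # every further generalization covers negatives: stop
        rule = best
    return rule


# Made-up toy data for illustration.
positives = [
    {"size": "small", "color": "red", "shape": "circle"},
    {"size": "big", "color": "red", "shape": "circle"},
]
negatives = [
    {"size": "small", "color": "red", "shape": "triangle"},
    {"size": "big", "color": "blue", "shape": "circle"},
]
rule = bottom_up_rule(positives[0], positives, negatives)
print(rule)
```

On this data the size constraint is dropped first, since doing so covers both positives; dropping either remaining constraint would admit a negative, so generalization stops there.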

Top-Down Rule Learning Example
[Figure sequence: positive (+) examples on X and Y axes; the rule is specialized one constraint per panel: $Y > C_1$, then $X > C_2$, then $Y < C_3$, then $X < C_4$.]

Bottom-Up Rule Learning Example
[Figure sequence: positive (+) examples plotted on X and Y axes.]

Bottom-Up Rule Learning Example (continued)
[Figure sequence: positive (+) examples plotted on X and Y axes.]

Bottom-Up Rule Learning Example (continued)
[Figure sequence: positive (+) examples plotted on X and Y axes.]

Learning a Single Rule in FOIL
• Top-down approach originally applied to first-order logic (Quinlan, 1990).
• Basic algorithm for instances with discrete-valued features:

    Let A = {} (set of rule antecedents)
    Let N be the set of negative examples
    Let P be the current set of uncovered positive examples
    Until N is empty do
        For every feature-value pair (literal) ($F_i = V_{ij}$) calculate Gain($F_i = V_{ij}$, P, N)
        Pick literal, L, with highest gain.
        Add L to A.
        Remove from N any examples that do not satisfy L.
        Remove from P any examples that do not satisfy L.
    Return the rule: $A_1 \wedge A_2 \wedge \dots \wedge A_n \rightarrow$ Positive

Foil Gain Metric
• Want to achieve two goals
  – Decrease coverage of negative examples
    • Measure increase in percentage of positives covered when literal is added to the rule.
  – Maintain coverage of as many positives as possible.
    • Count number of positives covered.

    Define Gain(L, P, N)
        Let p be the subset of examples in P that satisfy L.
        Let n be the subset of examples in N that satisfy L.
        Return: $|p| \cdot \left[ \log_2 \frac{|p|}{|p| + |n|} - \log_2 \frac{|P|}{|P| + |N|} \right]$

Sample Disjunctive Learning Data

    Example   Size     Color   Shape      Category
    1         small    red     circle     positive
    2         big      red     circle     positive
    3         small    red     triangle   negative
    4         big      blue    circle     negative
    5         medium   red     circle     negative
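The FOIL single-rule procedure and gain metric can be tried on the five sample examples with the sketch below. It is a minimal illustration, not Quinlan's implementation, and makes three assumptions that are not in the slides: a literal that covers no positives is given gain 0 (the logarithm is undefined there), only features not already used in the rule are considered, and ties are broken by sorted literal order. The names `foil_gain` and `learn_rule_foil` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Example = Dict[str, str]
Literal = Tuple[str, str]  # (feature, value), read as feature = value


def foil_gain(literal: Literal, P: List[Example], N: List[Example]) -> float:
    """|p| * [log2(|p| / (|p| + |n|)) - log2(|P| / (|P| + |N|))]."""
    f, v = literal
    p = sum(e[f] == v for e in P)
    n = sum(e[f] == v for e in N)
    if p == 0:
        return 0.0  # assumption: no positives covered means zero gain
    return p * (math.log2(p / (p + n)) - math.log2(len(P) / (len(P) + len(N))))


def learn_rule_foil(P: List[Example], N: List[Example]) -> List[Literal]:
    """Add the highest-gain literal until no negatives remain covered."""
    A: List[Literal] = []
    P, N = list(P), list(N)
    while N:
        used = {f for f, _ in A}
        # Assumption: candidates are drawn from covered positives, unused features only.
        literals = sorted({(f, v) for e in P for f, v in e.items() if f not in used})
        if not literals:
            break  # features exhausted; rule still covers some negatives
        best = max(literals, key=lambda lit: foil_gain(lit, P, N))
        A.append(best)
        f, v = best
        P = [e for e in P if e[f] == v]
        N = [e for e in N if e[f] == v]
    return A


# The sample disjunctive learning data (examples 1-5).
P = [
    {"size": "small", "color": "red", "shape": "circle"},
    {"size": "big", "color": "red", "shape": "circle"},
]
N = [
    {"size": "small", "color": "red", "shape": "triangle"},
    {"size": "big", "color": "blue", "shape": "circle"},
    {"size": "medium", "color": "red", "shape": "circle"},
]
print(round(foil_gain(("color", "red"), P, N), 3))
rule = learn_rule_foil(P, N)
print(" AND ".join(f"{f}={v}" for f, v in rule), "-> positive")
```

With these tie-breaking choices the learned rule covers only example 2, so a sequential covering loop would need a second rule for example 1, which is what makes the concept disjunctive. A different tie-break between color = red and shape = circle (equal gain at the first step) could yield a different first rule.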
