The logic of learning: logic and knowledge representation in machine learning
Peter A. Flach, Department of Computer Science, University of Bristol
LICS'01 workshop


SLIDE 1

The logic of learning:
logic and knowledge representation in machine learning

Peter A. Flach
Department of Computer Science
University of Bristol
www.cs.bris.ac.uk/~flach/

SLIDE 2

Overview of this talk

- A quick overview of ILP
- Knowledge representation
  - individual-centred representations
- Learning as inference
  - inductive consequence relations
- Conclusions and outlook

SLIDE 3

Overview of this talk

- A (very) quick overview of ILP
- Knowledge representation
  - individual-centred representations
- Learning as inference
  - inductive consequence relations
- Conclusions and outlook

SLIDE 4

Inductive concept learning

- Given: descriptions of instances and non-instances
- Find: a concept covering all instances and no non-instances

[Diagram: instances (+) and non-instances (-); the hypotheses not yet refuted form the Version Space, bounded by hypotheses that are too general (covering non-instances) and hypotheses that are too specific (not covering instances)]

SLIDE 5

Concept learning in logic

- Given:
  - positive examples P: facts to be entailed,
  - negative examples N: facts not to be entailed,
  - background knowledge B: a set of predicate definitions;
- Find: a hypothesis H (one or more predicate definitions) such that
  - for every p∈P: B ∪ H |= p (completeness),
  - for every n∈N: B ∪ H |≠ n (consistency).

SLIDE 6

ILP methods

- top-down (language-driven)
  - descend the generality ordering
    - start with a short, general rule
  - specialise by
    - substituting variables
    - adding conditions
- bottom-up (data-driven)
  - climb the generality ordering
    - start with a long, specific rule
  - generalise by
    - introducing variables
    - removing conditions

SLIDE 7

Top-down induction: example

  example       action       hypothesis
  +p(b,[b])     add clause   p(X,Y).
  -p(x,[])      specialise   p(X,[V|W]).
  -p(x,[a,b])   specialise   p(X,[X|W]).
  +p(b,[a,b])   add clause   p(X,[X|W]).
                             p(X,[V|W]):-p(X,W).
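The final hypothesis is exactly Prolog's list-membership predicate member/2. As a cross-check, a minimal Haskell rendering of the two learned clauses (the name p simply mirrors the slide):

  -- p(X,[X|W]).          X is in a list if it is the head...
  -- p(X,[V|W]):-p(X,W).  ...or if it is in the tail.
  p :: Eq a => a -> [a] -> Bool
  p x (v:w) = x == v || p x w
  p _ []    = False   -- no clause covers the empty list

  -- p 'b' ['b']      == True   (+p(b,[b]))
  -- p 'x' []         == False  (-p(x,[]))
  -- p 'x' ['a','b']  == False  (-p(x,[a,b]))
  -- p 'b' ['a','b']  == True   (+p(b,[a,b]))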

SLIDE 8

Bottom-up induction: example

- Treat positive examples + ground background facts as body
- Choose two examples as heads and anti-unify (a sketch of anti-unification follows below)

  q([1,2],[3,4],[1,2,3,4]):-
      q([1,2],[3,4],[1,2,3,4]),q([a],[],[a]),q([],[],[]),q([2],[3,4],[2,3,4])
  q([a],[],[a]):-
      q([1,2],[3,4],[1,2,3,4]),q([a],[],[a]),q([],[],[]),q([2],[3,4],[2,3,4])
  q([A|B],C,[A|D]):-
      q([1,2],[3,4],[1,2,3,4]),q([A|B],C,[A|D]),q(W,C,X),q([S|B],[3,4],[S,T,U|V]),
      q([R|G],K,[R|L]),q([a],[],[a]),q(Q,[],Q),q([P],K,[P|K]),
      q(N,K,O),q(M,[],M),q([],[],[]),q(G,K,L),
      q([F|G],[3,4],[F,H,I|J]),q([E],C,[E|C]),q(B,C,D),q([2],[3,4],[2,3,4])

- Generalise by removing literals until negative examples would be covered
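Anti-unification computes Plotkin's least general generalisation (lgg) of two terms, as used for the clause heads above. A minimal Haskell sketch, assuming a small first-order Term datatype (all names here are illustrative):

  data Term = Var String | Fun String [Term] deriving (Eq, Show)

  -- lgg replaces each pair of disagreeing subterms by a variable,
  -- reusing the same variable when the same pair recurs (the table tab).
  lgg :: Term -> Term -> Term
  lgg s t = fst (go s t [])
    where
      go (Fun f ss) (Fun g ts) tab
        | f == g && length ss == length ts =
            let (us, tab') = goList ss ts tab in (Fun f us, tab')
      go u v tab = case lookup (u, v) tab of
        Just x  -> (Var x, tab)
        Nothing -> let x = 'X' : show (length tab)
                   in (Var x, ((u, v), x) : tab)
      goList (u:us) (v:vs) tab =
        let (w,  tab')  = go u v tab
            (ws, tab'') = goList us vs tab'
        in (w : ws, tab'')
      goList _ _ tab = ([], tab)

With Prolog lists encoded as './2' and '[]' terms, the lgg of the two example heads q([1,2],[3,4],[1,2,3,4]) and q([a],[],[a]) is q([A|B],C,[A|D]) as on the slide, modulo variable names; note how the disagreement pair (1,a) is generalised to the same variable A in both positions.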

SLIDE 9

Progol predicting carcinogenicity

- A molecular compound is carcinogenic if:
  (1) it tests positive in the Salmonella assay; or
  (2) it tests positive for sex-linked recessive lethal mutation in Drosophila; or
  (3) it tests negative for chromosome aberration; or
  (4) it has a carbon in a six-membered aromatic ring with a partial charge of ≥ 0.13; or
  (5) it has a primary amine group and no secondary or tertiary amines; or
  (6) it has an aromatic (or resonant) hydrogen with partial charge ≥ 0.168; or
  (7) it has a hydroxy oxygen with a partial charge ≥ -0.616 and an aromatic (or resonant) hydrogen; or
  (8) it has a bromine; or
  (9) it has a tetrahedral carbon with a partial charge ≤ -0.144 and tests positive on Progol's mutagenicity rules.

SLIDE 10

ILP example: East-West trains

[Figure: Michalski's trains challenge; five trains going east (1-5) and five trains going west (1-5)]

SLIDE 11

Prolog representation (flattened)

- Example:

  eastbound(t1).

- Background knowledge:

  car(t1,c1). car(t1,c2). car(t1,c3). car(t1,c4).
  rectangle(c1). rectangle(c2). rectangle(c3). rectangle(c4).
  short(c1). long(c2). short(c3). long(c4).
  open(c1). open(c2). peaked(c3). open(c4).
  two_wheels(c1). three_wheels(c2). two_wheels(c3). two_wheels(c4).
  load(c1,l1). load(c2,l2). load(c3,l3). load(c4,l4).
  circle(l1). hexagon(l2). triangle(l3). rectangle(l4).
  one_load(l1). one_load(l2). one_load(l3). three_loads(l4).

- Hypothesis:

  eastbound(T):-car(T,C),short(C),not open(C).


SLIDE 13

Prolog representation (terms)

- Example:

  eastbound([c(rectangle,short,open,2,l(circle,1)),
             c(rectangle,long,open,3,l(hexagon,1)),
             c(rectangle,short,peaked,2,l(triangle,1)),
             c(rectangle,long,open,2,l(rectangle,3))]).

- Background knowledge: member/2, arg/3
- Hypothesis:

  eastbound(T):-member(C,T),arg(2,C,short),not arg(3,C,open).


SLIDE 15

Machine learning vs. ILP

[Diagram relating attribute-value concept learning, Prolog program synthesis, individual-centred representations and multi-instance learning, with a question mark for the space in between]

SLIDE 16

Overview of this talk

- A quick overview of ILP
- Knowledge representation
  - individual-centred representations
- Learning as inference
  - inductive consequence relations
- Conclusions and outlook

SLIDE 17

Knowledge Representation

- Entity-Relationship (ER) diagrams
- Relational Database
- Individual-Centred Representations
- Strongly typed language
- XML?

SLIDE 18

ER diagram for East-West trains

[ER diagram: Train (Direction) Has (1:M) Car (Shape, Length, Roof, Wheels); Car Has (1:1) Load (Number, Object)]

SLIDE 19

A particular train

[Instance diagram: train1 (Direction) Has car1, car2, car3, each with Shape, Length, Roof and Wheels; car1 Has load1, car2 Has load2, car3 Has load3, each with Number and Object]

SLIDE 20

Database representation

  TRAIN_TABLE
  TRAIN  DIRECTION
  t1     EAST
  t2     EAST
  ...    ...
  t6     WEST
  ...    ...

  CAR_TABLE
  CAR  TRAIN  SHAPE      LENGTH  ROOF    WHEELS
  c1   t1     rectangle  short   open    2
  c2   t1     rectangle  long    open    3
  c3   t1     rectangle  short   peaked  2
  c4   t1     rectangle  long    open    2
  ...  ...    ...        ...     ...     ...

  LOAD_TABLE
  LOAD  CAR  OBJECT     NUMBER
  l1    c1   circle     1
  l2    c2   hexagon    1
  l3    c3   triangle   1
  l4    c4   rectangle  3
  ...   ...  ...        ...

  SELECT DISTINCT TRAIN_TABLE.TRAIN
  FROM TRAIN_TABLE, CAR_TABLE
  WHERE TRAIN_TABLE.TRAIN = CAR_TABLE.TRAIN
  AND CAR_TABLE.SHAPE = 'rectangle'
  AND CAR_TABLE.ROOF != 'open'

SLIDE 21

Individual-centred representations

- ER diagram is a tree (approximately)
  - root denotes individual
  - looking downwards from the root, only one-to-one or one-to-many relations are allowed
  - one-to-one cycles are allowed
- Database can be partitioned into sub-databases, each describing a single individual
- Alternative: all information about a single individual packed together in a term
  - tuples, lists, sets, multisets, trees, ...

SLIDE 22

Strongly typed languages

- Type signature specifies 'data model'
  - similar to ER diagram
- Each example described by a single statement
- Hypothesis construction guided by types
  - interaction between structural functions/predicates referring to subterms and utility predicates giving properties of subterms
- Example language: Escher
  - functional logic programming

SLIDE 23

East-West trains in Escher

- Type signature:

  data Shape = Rectangle | Hexagon | …;
  data Length = Long | Short;
  data Roof = Open | Peaked | …;
  data Object = Circle | Hexagon | …;
  type Wheels = Int;
  type Load = (Object,Number);
  type Number = Int;
  type Car = (Shape,Length,Roof,Wheels,Load);
  type Train = [Car];
  eastbound::Train->Bool;

- Example:

  eastbound([(Rectangle,Short,Open,2,(Circle,1)),
             (Rectangle,Long,Open,3,(Hexagon,1)),
             (Rectangle,Short,Peaked,2,(Triangle,1)),
             (Rectangle,Long,Open,2,(Rectangle,3))]) = True

- Hypothesis:

  eastbound(t) = (exists \c -> member(c,t) && LengthP(c)==Short && RoofP(c)!=Open)
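For readers without Escher, a minimal Haskell approximation of the same example (a sketch: the O-prefixed Object constructors only avoid Haskell's flat constructor namespace, which Escher does not share):

  data Shape  = Rectangle | Hexagon deriving Eq
  data Length = Long | Short deriving Eq
  data Roof   = Open | Peaked deriving Eq
  data Object = OCircle | OHexagon | OTriangle | ORectangle deriving Eq

  type Wheels = Int
  type Number = Int
  type Load   = (Object, Number)
  type Car    = (Shape, Length, Roof, Wheels, Load)
  type Train  = [Car]

  -- hypothesis: the train has a short car whose roof is not open
  eastbound :: Train -> Bool
  eastbound = any (\(_, len, roof, _, _) -> len == Short && roof /= Open)

  -- the example train; eastbound t1 == True (third car: short, peaked)
  t1 :: Train
  t1 = [ (Rectangle, Short, Open,   2, (OCircle,    1))
       , (Rectangle, Long,  Open,   3, (OHexagon,   1))
       , (Rectangle, Short, Peaked, 2, (OTriangle,  1))
       , (Rectangle, Long,  Open,   2, (ORectangle, 3)) ]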


SLIDE 25

Mutagenesis

[ER diagram: Molecule (Class, Ind1, IndA, Lumo, LogP) Has (1:M) Atom (Element, AtomType, Charge); atoms are connected by Bond (BondType)]

SLIDE 26

Mutagenesis in Escher

- Type signature:

  data Element = Br | C | Cl | F | H | I | N | O | S;
  type Ind1 = Bool;
  type IndA = Bool;
  type Lumo = Float;
  type LogP = Float;
  type AtomID = Int;
  type AtomType = Int;
  type Charge = Float;
  type BondType = Int;
  type Atom = (AtomID,Element,AtomType,Charge);
  type Bond = ({AtomID},BondType);
  type Molecule = (Ind1,IndA,Lumo,LogP,{Atom},{Bond});
  mutagenic::Molecule->Bool;

SLIDE 27

Mutagenesis in Escher

- Examples:

  mutagenic(True,False,-1.246,4.23,
            {(1,C,22,-0.117), (2,C,22,-0.117), …, (26,O,40,-0.388)},
            {({1,2},7), …, ({24,26},2)}) = True;

  (the first set argument holds the atoms, the second the bonds)

- NB. Naming of sub-terms cannot be avoided here, because molecules are graphs rather than trees

SLIDE 28

Mutagenesis in Escher

- Hypothesis:

  mutagenic(m) = ind1P(m) == True
    || lumoP(m) <= -2.072
    || (exists \a -> a 'in' atomSetP(m) && elementP(a)==C
          && atomTypeP(a)==26 && chargeP(a)==0.115)
    || (exists \b1 b2 -> b1 'in' bondSetP(m) && b2 'in' bondSetP(m)
          && bondTypeP(b1)==1 && bondTypeP(b2)==2
          && not disjoint(labelSetP(b1),labelSetP(b2)))
    || (exists \a -> a 'in' atomSetP(m) && elementP(a)==C && atomTypeP(a)==29
          && (exists \b1 b2 -> b1 'in' bondSetP(m) && b2 'in' bondSetP(m)
                && bondTypeP(b1)==7 && bondTypeP(b2)==1
                && labelP(a) 'in' labelSetP(b1)
                && not disjoint(labelSetP(b1),labelSetP(b2))))
    || …;

SLIDE 29

Complexity of classification problems

- Simplest case: single table with primary key
  - attribute-value or propositional learning
  - example corresponds to a tuple of constants
- Next: single table without primary key
  - multi-instance problem
  - example corresponds to a set of tuples of constants
- Complexity resides in many-to-one foreign keys
  - non-determinate variables
  - lists, sets, multisets

SLIDE 30

Understanding ILP

- Back to Prolog: what do we learn from all this?
  - structural predicates introduce local variables, utility predicates consume them
  - interactions between local variables should not be broken up ===> features
  - enhancement of existing transformation methods (e.g. LINUS) through feature construction

SLIDE 31

The key steps in rule learning

- Hypothesis construction: find a set of n rules
  - usually simplified by n separate rule constructions (see the covering-loop sketch below)
- Rule construction: find a pair (Head, Body)
  - e.g. select class and construct body
- Body construction: find a set of m literals
  - usually simplified by adding one literal at a time
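The n separate rule constructions follow the standard covering loop: build one rule, remove the positives it covers, repeat. A generic Haskell sketch (mkRule and covers are assumed parameters standing in for the inner construction steps, not part of the talk):

  -- Covering loop behind hypothesis construction. Terminates provided
  -- mkRule always returns a rule covering at least one remaining positive.
  learn :: ([e] -> [e] -> r)   -- construct one rule from pos/neg examples
        -> (r -> e -> Bool)    -- does a rule cover an example?
        -> [e]                 -- positive examples
        -> [e]                 -- negative examples
        -> [r]
  learn mkRule covers pos neg
    | null pos  = []
    | otherwise = rule : learn mkRule covers rest neg
    where rule = mkRule pos neg
          rest = filter (not . covers rule) pos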

SLIDE 32

The key steps in rule learning

- Hypothesis construction: find a set of n rules
  - usually simplified by n separate rule constructions
- Rule construction: find a pair (Head, Body)
  - e.g. select class and construct body
- Body construction: find a set of m features
  - usually simplified by adding one feature at a time
- Feature construction: find a set of k literals
  - e.g. interesting subgroup, frequent itemset
  - discovery task rather than classification task

SLIDE 33

First-order features

- Features concern interactions of local variables
- The following rule has one feature, 'has a short closed car':

  eastbound(T):-car(T,C),short(C),not open(C).

- The following rule has two features, 'has a short car' and 'has a closed car':

  eastbound(T):-car(T,C1),short(C1),car(T,C2),not open(C2).

SLIDE 34

Propositionalising rules

- Equivalently:

  eastbound(T):-hasShortCar(T),hasClosedCar(T).
  hasShortCar(T):-car(T,C1),short(C1).
  hasClosedCar(T):-car(T,C2),not open(C2).

- Given a way to construct and select first-order features, body construction in ILP is semi-propositional
  - head and all literals in body have the same global variable(s)
  - corresponds to a single table, one row per example

SLIDE 35

Prolog feature bias

- Flattened representation, but derived from strongly-typed term representation
  - one free global variable
  - each (binary) structural predicate introduces a new existential local variable and uses either the global variable or a local variable introduced by another structural predicate
  - utility predicates only use variables
  - all variables are used
- NB. features can be non-boolean
  - if all structural predicates are one-to-one

SLIDE 36

Example: mutagenesis

- 42 regression-unfriendly molecules
- 57 first-order features with one utility literal
- LINUS using CN2: 83%

  mutagenic(M,false):-not (has_atom(M,A),atom_type(A,21)),
                      logP(M,L),L>1.99,L<5.64.
  mutagenic(M,false):-not (has_atom(M,A),atom_type(A,195)),
                      lumo(M,Lu),Lu>-1.74,Lu<-0.83,
                      logP(M,L),L>1.81.
  mutagenic(M,false):-lumo(M,Lu),Lu>-0.77.
  mutagenic(M,true):-has_atom(M,A),atom_type(A,21),
                     lumo(M,Lu),Lu<-1.21.
  mutagenic(M,true):-logP(M,L),L>5.64,L<6.36.
  mutagenic(M,true):-lumo(M,Lu),Lu>-0.95,
                     logP(M,L),L<2.21.

SLIDE 37

Feature construction: summary

- All the expressiveness of ILP is in the features
  - body construction is essentially propositional
  - every ILP system does constructive induction
- Feature construction is a discovery task
  - use of discovery systems such as Warmr, Tertius or Midos
  - alternative: use a relevancy filter

SLIDE 38

Overview of this talk

- A quick overview of ILP
- Knowledge representation
  - individual-centred representations
- Learning as inference
  - inductive consequence relations
- Conclusions and outlook

SLIDE 39

Inductive consequence relations

- I write E |< H for 'H is a possible inductive hypothesis given evidence E'
  - like deduction: from input to output
  - unlike deduction: possibly unsound
- What are sensible properties of |< ?
- What are possible material definitions of |< ?

SLIDE 40

General induction postulates

(I1) If α |< β and |= α∧β→γ, then α∧γ |< β.
(I2) If α |< β and |= α∧β→γ, then α∧¬γ |≮ β.
(I2′) If |= β→¬α, then α |≮ β.
(I3) If α |< β and |= α∧β→γ, then α |< β∧γ.
(I4) If α |< β, then α |< α.
(I5) If α |< β, then β |< β.
(I6) If α |< β and |= β↔γ, then α |< γ.
(I7) If α |< γ and |= α↔β, then β |< γ.

(|≮ abbreviates 'not |<'.)

SLIDE 41

Explanatory induction

- E |< H is interpreted as 'evidence E is explained by hypothesis H'
  - induction as reverse deduction
- Close link with abduction
  - Peirce: 'if A were true, C would be a matter of course'
- Depends on notion of explanation

SLIDE 42

Explanatory induction postulates

(E1) If α |< β, |= γ→β and γ |< γ, then α |< γ.
(E2) If γ |< γ and ¬α |≮ γ, then α |< α.
(E3) If α |< β∧γ, then β→α |< γ.
(E4) If α |< γ and β |< γ, then α∧β |< γ.
(E5) If α |< γ and |= α→β, then β |< γ.

SLIDE 43

Explanatory semantics

- Let |~ be an explanation mechanism, and define the explanatory power of a formula α as C~(α) = { γ | α |~ γ }
- The explanatory consequence relation |< based on |~ is defined as: α |< β iff C~(α) ⊆ C~(β) ⊂ L
- (E1–5) are sound and complete if |~ = |=
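A worked instance of this semantics, for the classical case |~ = |= : the explanatory power of a formula is then its set of deductive consequences, C~(α) = { γ | α |= γ } = Cn(α), so

  α |< β  iff  Cn(α) ⊆ Cn(β) ⊂ L  iff  β |= α and β is consistent

i.e. the hypotheses for evidence E are exactly the consistent formulas entailing E: induction as reverse deduction, as claimed on the previous slide. For example, with E = flies(tweety) and H = bird(tweety) ∧ ∀x (bird(x) → flies(x)), H is consistent and H |= E, hence E |< H.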

SLIDE 44

Confirmatory induction

- E |< H is interpreted as 'evidence E confirms hypothesis H'
- A kind of closed-world reasoning
  - 'assume that everything you haven't seen behaves like something you have seen'
  - closely related to non-monotonic reasoning

SLIDE 45

Confirmatory induction postulates

(C1) If α |< β and |= β→γ, then α |< γ.
(C2) If α |< α and α |≮ ¬β, then β |< β.
(C3) If α |< β and α |< γ, then α |< β∧γ.
(C4) If α |< γ and β |< γ, then α∨β |< γ.
(C5) If α |< β and α |< γ, then α∧γ |< β.

SLIDE 46

Confirmatory semantics

- Let Reg be a function constructing a set of regular models from observations α
- The confirmatory consequence relation |< based on Reg is defined as: α |< β iff ∅ ⊂ Reg(α) ⊆ [β]
- (C1–5) are sound and complete if Reg(α) are the most preferred models of α

SLIDE 47

Overview of this talk

- A quick overview of ILP
- Knowledge representation
  - individual-centred representations
- Learning as inference
  - inductive consequence relations
- Conclusions and outlook

SLIDE 48

First-order representations in…

- …probabilistic models
  - Koller's probabilistic relational models
  - first-order Bayesian classification with 1BC
  - towards first-order Bayesian networks
- …support vector machines
  - kernels on sequences
  - a kernel on Escher terms
- …neural networks
  - recurrent NN for Escher terms

SLIDE 49

The naive Bayes classifier

- Bayesian classifier:

  argmax_c P(c|d) = argmax_c P(d|c)P(c)/P(d) = argmax_c P(d|c)P(c)

- Naive Bayes assumption (propositional case):

  argmax_c P(d|c)P(c) = argmax_c P(c) ∏_i P(A_i = a_i | c)
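A minimal sketch of this decision rule in Haskell (the priors and per-attribute likelihoods are assumed to have been estimated elsewhere; all names are illustrative):

  import Data.List (maximumBy)
  import Data.Ord (comparing)

  -- argmax_c  P(c) * product_i P(A_i = a_i | c)
  classify :: [(c, Double)]                  -- priors P(c)
           -> (Int -> String -> c -> Double) -- likelihoods P(A_i = a_i | c)
           -> [String]                       -- attribute values a_i of d
           -> c
  classify priors lik attrs =
    fst (maximumBy (comparing snd)
          [ (c, p * product [ lik i a c | (i, a) <- zip [0 ..] attrs ])
          | (c, p) <- priors ])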

SLIDE 50

Naive Bayes net

[Diagram: naive Bayes net for an Individual; node Class is the parent of attribute nodes A1, A2, A3]

SLIDE 51

Towards first-order Bayes nets

[Diagram: naive-Bayes-style net for a Molecule, with Class as parent of LogP, LUMO and AtomSet ({Atom}); alongside, the ER view: Molecule (Class, Lumo, LogP) Has (1:1) AtomSet, which Contains (1:M) Atom (Element, AtomType, Charge)]

SLIDE 52

Support vector machines

- Wide margin classifier
  - support vectors are the datapoints closest to the separating hyperplane
- Kernel: (implicit) transformation to feature space
  - to deal with problems that are not linearly separable in input space
  - feature space is often high-dimensional

SLIDE 53

Primal and dual form

- Linear classifiers construct a hyperplane separating the input points
  - decision rule: h(x) = sgn(w·x + b)
  - hypothesis: w = ∑_i α_i y_i x_i
  - equivalently: h(x) = sgn(∑_i α_i y_i (x_i·x) + b), where the α_i represent the hypothesis in dual co-ordinates
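The dual form translates directly into code. A minimal Haskell sketch (the support-vector triples (α_i, y_i, x_i) and the bias b are assumed to come from training):

  -- h(x) = sgn( sum_i alpha_i * y_i * <x_i, x> + b )
  dot :: [Double] -> [Double] -> Double
  dot u v = sum (zipWith (*) u v)

  predict :: [(Double, Double, [Double])]  -- (alpha_i, y_i, x_i)
          -> Double                        -- bias b
          -> [Double]                      -- input x
          -> Double                        -- +1/-1 (0 on the boundary)
  predict svs b x =
    signum (sum [ a * y * dot xi x | (a, y, xi) <- svs ] + b)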

SLIDE 54

Kernels

- Learning in feature space:

  h(x) = sgn(∑_i α_i y_i (φ(x_i)·φ(x)) + b)

- A kernel calculates the inner product directly in input space:

  K(x,z) = φ(x)·φ(z)

  - This measures the similarity between x and z in terms of features φ
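In code, the kernel trick is just a matter of swapping the inner product of the previous sketch for a kernel function, so the classifier works on any input type for which a kernel is defined:

  -- Same dual-form classifier; k x z plays the role of phi(x).phi(z).
  predictK :: (a -> a -> Double)     -- kernel K(x,z)
           -> [(Double, Double, a)]  -- (alpha_i, y_i, x_i)
           -> Double -> a -> Double
  predictK k svs b x =
    signum (sum [ al * y * k xi x | (al, y, xi) <- svs ] + b)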

SLIDE 55

A kernel for Escher terms

- Let x and z be terms of type T. We define K_T(x,z) recursively as follows:
  - If T = T1 x ... x Tn is a tuple type, x = (x1,...,xn) and z = (z1,...,zn), then
    K_T(x,z) = K_T1(x1,z1) + ... + K_Tn(xn,zn).
  - If T = {T'} is a set type, x = {x1,...,xn} and z = {z1,...,zm}, then
    K_T(x,z) = K_T'(x1,z1) + ... + K_T'(x1,zm) + K_T'(x2,z1) + ... + K_T'(x2,zm) + ... + K_T'(xn,zm).
  - If x = f(x1,...,xn) and z = f(z1,...,zn) where f is a data constructor of type T1 -> ... -> Tn -> T, then K_T(x,z) = 1 + K_T1(x1,z1) + ... + K_Tn(xn,zn); if x and z have different data constructors then K_T(x,z) = 0.
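A minimal executable sketch of this kernel, over a generic untyped term representation rather than typed Escher terms (the Term constructors are assumptions of the sketch):

  -- Generic terms: constructor applications, tuples and sets.
  data Term = Con String [Term]  -- f(x1,...,xn); constants have no arguments
            | Tup [Term]         -- (x1,...,xn)
            | Set [Term]         -- {x1,...,xn}

  kernel :: Term -> Term -> Double
  kernel (Tup xs) (Tup zs) = sum (zipWith kernel xs zs)            -- componentwise
  kernel (Set xs) (Set zs) = sum [ kernel x z | x <- xs, z <- zs ] -- all pairs
  kernel (Con f xs) (Con g zs)
    | f == g    = 1 + sum (zipWith kernel xs zs)  -- same data constructor
    | otherwise = 0                               -- different constructors
  kernel _ _ = 0                                  -- mismatched kinds of term

  -- e.g. kernel (Con "Short" []) (Con "Short" []) == 1
  --      kernel (Con "Short" []) (Con "Long"  []) == 0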

SLIDE 56

Recurrent neural networks

- Consist of a recurrent or folding part that is unfolded to encode a given input tree, followed by a traditional feed-forward network
- Folding part trained by backpropagation through structure
- Generalises naturally to terms

SLIDE 57

Recurrent NN for Escher terms

[Diagram: the term (f((a,b),(c,d)), [4,21,42]) unfolded into its subterm tree. f((a,b),(c,d)) decomposes into (a,b) and (c,d); the list [4,21,42] decomposes into (:) nodes for 4, 21, 42 and []. Nodes are annotated with types such as T x List Int, T' x T', T' x T' -> T' x T' -> T, and Int -> List Int -> List Int]

SLIDE 58

Concluding remarks

- Data models and knowledge representation are integral parts of any approach to learning, modelling and reasoning
- Individual-centred representations are natural in classification and provide a better understanding of the relation with propositional approaches
- There is still much to explore in upgrading existing propositional approaches with richer knowledge representation

SLIDE 59

Acknowledgements

- Joint work with
  - Nicolas Lachiche
  - John Lloyd
  - Christophe Giraud-Carrier
  - Nada Lavrac
  - Thomas Gaertner
  - Elias Gyftodimos