Aggregation and Selection in Relational Data Mining - Celine Vens

SLIDE 1

Introduction Aggregation and Selection Relational Decision Trees Relational Random Forests Experimental Results Conclusions

Aggregation and Selection in Relational Data Mining

Celine Vens, Anneleen Van Assche, Hendrik Blockeel, Sašo Džeroski
Department of Computer Science - K.U.Leuven
Department of Knowledge Technologies - Jozef Stefan Institute, Slovenia

C. Vens, A. Van Assche, H. Blockeel, S. Džeroski: Aggregation and Selection in Relational Data Mining

SLIDE 2

Outline

◮ Introduction
◮ Aggregation and Selection
◮ Relational Decision Trees
◮ Relational Random Forests
◮ Experimental Results
◮ Conclusions

SLIDE 3

Relational Data Mining

◮ Data Mining: searching for patterns in (large) databases.
◮ Propositional (Classical) Data Mining:
  ◮ data is stored in a single table
  ◮ patterns involve intra-tuple relations
◮ Relational Data Mining:
  ◮ data is stored in multiple tables (relational database)
  ◮ patterns involve inter-tuple or inter-table relations
  ◮ how to deal with 1-n or m-n relations (sets)?

SLIDE 4

Working Example

Current relational learners: two approaches to dealing with sets

SLIDE 6

First approach: Aggregation

◮ Use SQL-like aggregation to summarize the set in one big table
◮ Apply classical data mining technique (e.g. decision tree inducer)
◮ Optimized for highly non-determinate domains

SLIDE 7

Second approach: Selection

◮ Apply relational data mining technique (e.g. relational decision tree inducer)
◮ Test for existence of specific elements in the set
◮ Optimized for structurally complex domains
◮ e.g. ILP: Inductive Logic Programming
  ◮ database and patterns in Prolog
  ◮ possibility to add background knowledge

SLIDE 8

Example concepts

1. Persons that have two books.
2. Persons that have a computer book.
3. Persons that have two computer books.

How to express concept 3?

◮ Selective methods need an aggregate function in the background knowledge.
◮ Aggregating methods need a separate relation for each genre.

Solution: combine aggregation and selection in context of relational data mining
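
As a rough illustration, the three example concepts can be written over a toy person/book relation in Python. The data, the names `books`, `count_books` and `is_computer`, and the reading of "two" as "at least two" are assumptions made for this sketch, not taken from the slides.

```python
from typing import Callable, Dict, List, Tuple

Book = Tuple[str, str]  # (title, genre)

# Hypothetical toy relation: person -> books owned (a 1-n relation).
books: Dict[str, List[Book]] = {
    "ann": [("Prolog Basics", "computer"), ("SQL Primer", "computer")],
    "bob": [("Dune", "fiction"), ("Prolog Basics", "computer")],
    "cat": [("Dune", "fiction")],
}


def is_computer(book: Book) -> bool:
    """Selection condition on a single element of the set."""
    return book[1] == "computer"


def count_books(person: str, keep: Callable[[Book], bool] = lambda b: True) -> int:
    """Count aggregate over the books of `person` that satisfy `keep`."""
    return sum(1 for book in books[person] if keep(book))


# Concept 1: aggregation only (count over the whole set).
concept1 = {p for p in books if count_books(p) >= 2}
# Concept 2: selection only (existence of a specific element).
concept2 = {p for p in books if any(is_computer(b) for b in books[p])}
# Concept 3: aggregation applied to a selected subset.
concept3 = {p for p in books if count_books(p, is_computer) >= 2}
```

Concept 3 needs the count to be taken after the selection, which neither approach offers on its own.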

SLIDE 9

Outline

◮ Introduction
◮ Aggregation and Selection
◮ Relational Decision Trees
  ◮ Decision Trees
  ◮ Relational Decision Trees
  ◮ Combining Aggregation and Selection
◮ Relational Random Forests
◮ Experimental Results
◮ Conclusions

SLIDE 10

Decision Trees

◮ One of the most widely used and practical data mining methods
◮ Each internal node contains a test on some attribute
◮ Each leaf contains a prediction
◮ Classification of new instance

SLIDE 11

Decision Trees: learning them

◮ Divide & conquer algorithm
◮ Pseudocode:

grow_node(Node, Examples):
  IF stop criterion holds:
    assign majority class from Examples to Node
  ELSE:
    generate all possible tests for Node
    associate best test with Node
    grow two child nodes Left and Right
    split Examples into ExamplesPass and ExamplesFail
    grow_node(Left, ExamplesPass)
    grow_node(Right, ExamplesFail)
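
A minimal Python sketch of this divide-and-conquer procedure for the propositional case follows. Several details are assumptions not stated on the slide: tests have the form `attribute <= threshold`, the best test minimizes weighted Gini impurity, and growth stops when a node is pure, too small, or no test improves impurity. All names are illustrative.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Example = Tuple[Dict[str, float], str]  # (attribute values, class label)
Test = Tuple[str, float]                # assumed test form: attribute <= threshold


@dataclass
class Node:
    prediction: Optional[str] = None    # set on leaves only
    test: Optional[Test] = None         # set on internal nodes only
    left: Optional["Node"] = None       # subtree for examples passing the test
    right: Optional["Node"] = None      # subtree for examples failing the test


def passes(test: Test, attrs: Dict[str, float]) -> bool:
    attr, threshold = test
    return attrs[attr] <= threshold


def gini(examples: List[Example]) -> float:
    counts = Counter(label for _, label in examples)
    return 1.0 - sum((c / len(examples)) ** 2 for c in counts.values())


def grow_node(examples: List[Example], min_size: int = 2) -> Node:
    labels = Counter(label for _, label in examples)
    majority = labels.most_common(1)[0][0]
    # Stop criterion (assumed): pure node or too few examples.
    if len(labels) == 1 or len(examples) < min_size:
        return Node(prediction=majority)
    # Generate all possible tests: one threshold per observed attribute value.
    tests = sorted({(a, v) for attrs, _ in examples for a, v in attrs.items()})
    best, best_score, best_split = None, gini(examples), None
    for test in tests:
        ok = [e for e in examples if passes(test, e[0])]
        fail = [e for e in examples if not passes(test, e[0])]
        if not ok or not fail:
            continue  # a test that does not split the examples is useless
        score = (len(ok) * gini(ok) + len(fail) * gini(fail)) / len(examples)
        if score < best_score:
            best, best_score, best_split = test, score, (ok, fail)
    if best is None:
        return Node(prediction=majority)
    return Node(test=best,
                left=grow_node(best_split[0], min_size),
                right=grow_node(best_split[1], min_size))


def classify(node: Node, attrs: Dict[str, float]) -> str:
    """Sort a new instance down the tree until a leaf is reached."""
    while node.prediction is None:
        node = node.left if passes(node.test, attrs) else node.right
    return node.prediction
```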

SLIDE 12

Relational Decision Trees: learning them

◮ Upgrade of classical algorithm: Tilde [Blockeel and De Raedt ’98]
◮ Trees are relational: contain first order logic literals in test of internal node
◮ Selective approach (ILP)
◮ Tests can introduce variables: possible tests may differ at each node

SLIDE 13

Adding aggregation

◮ User specifies basic components: aggregate functions, sets to be aggregated, query to generate set to be aggregated
◮ Aggregate conditions are created, using discretization
◮ Aggregate conditions are added to the set of possible tests
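
One way to picture how aggregate conditions could be produced is sketched below in Python. The slides do not say which discretization is used, so this sketch assumes thresholds at the midpoints between consecutive observed aggregate values, and reads a condition `(name, t)` as "name(set) >= t". The names `AGGREGATES` and `aggregate_conditions` are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# User-specified aggregate functions (illustrative selection).
AGGREGATES: Dict[str, Callable[[Sequence[float]], float]] = {
    "count": len,
    "min": min,
    "max": max,
    "avg": mean,
}


def aggregate_conditions(
    sets: List[List[float]],
    functions: Sequence[str] = ("count", "max"),
) -> List[Tuple[str, float]]:
    """Build candidate conditions (function name, threshold) from example sets.

    `sets` holds, per training example, the values produced by the
    user-specified query that generates the set to be aggregated.
    """
    conditions: List[Tuple[str, float]] = []
    for name in functions:
        func = AGGREGATES[name]
        # min/max/avg are undefined on an empty set, so only count sees empty sets.
        values = sorted({func(s) for s in sets if s or name == "count"})
        # Assumed discretization: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            conditions.append((name, (low + high) / 2))
    return conditions
```

The resulting conditions would then be offered to the tree learner alongside its usual candidate tests.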

SLIDE 14

Adding selections to aggregation: first manner

◮ If a node contains an aggregation, any node in its left subtree can add a selection within that aggregate condition
◮ Local search within aggregate condition
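
The refinement step can be sketched in Python as an aggregate condition that carries its own list of selections. The class `AggregateCondition`, its count-only aggregate, and the attribute/value form of a selection are assumptions for illustration; the slides do not prescribe a representation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Item = Dict[str, object]  # one element of the set being aggregated


@dataclass(frozen=True)
class AggregateCondition:
    """Condition 'count of selected items >= threshold' (count only, for brevity)."""
    threshold: int
    selections: Tuple[Tuple[str, object], ...] = ()  # (attribute, required value)

    def holds(self, items: List[Item]) -> bool:
        # Apply every selection first, then aggregate what is left.
        kept = [it for it in items
                if all(it.get(attr) == value for attr, value in self.selections)]
        return len(kept) >= self.threshold

    def refine(self, attribute: str, value: object) -> "AggregateCondition":
        """Return a new condition with one more selection inside the aggregate."""
        return AggregateCondition(self.threshold,
                                  self.selections + ((attribute, value),))


# Usage: refine "has at least two books" into "has at least two computer books".
two_books = AggregateCondition(threshold=2)
two_computer_books = two_books.refine("genre", "computer")
```

In the tree-growing pseudocode the left child receives the examples that pass the parent's test, so a refinement in the left subtree only sees examples for which the unrefined aggregate condition already holds.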

SLIDE 15

Adding selections to aggregation: second manner

◮ Lookahead
  ◮ technique to look ahead in refinement lattice
  ◮ add several literals at once
  ◮ computationally expensive

SLIDE 16

Relational Decision Trees with aggregation and selection: learning them

◮ Pseudocode:

grow_node(Node, Examples):
  IF stop criterion holds:
    assign majority class from Examples to Node
  ELSE:
    generate all possible first order tests for Node:
      usual tests
      aggregate functions
      refinement of aggregate function higher in tree
    associate best test with Node
    grow two child nodes Left and Right
    split Examples into ExamplesPass and ExamplesFail
    grow_node(Left, ExamplesPass)
    grow_node(Right, ExamplesFail)

SLIDE 17

Relational Decision Trees with aggregation and selection: problem

◮ Number of tests at each node in the tree grows very fast
◮ Need some way to deal with it
◮ Make use of technique from classical data mining: Random Forests [Breiman ’01]

SLIDE 18

Outline

◮ Introduction
◮ Aggregation and Selection
◮ Relational Decision Trees
◮ Relational Random Forests
  ◮ Random Forests
  ◮ Relational Random Forests
◮ Experimental Results
◮ Conclusions

SLIDE 19

Random Forests

◮ Random Forests

SLIDE 20

Random Forests

◮ Random decision tree algorithm: at each node, the best test is chosen from a random subset T′ of the candidate tests T, with |T′| = f(|T|), e.g. f(x) = 0.1x or f(x) = √x
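
The size relation |T′| = f(|T|) can be sketched in Python as follows. Rounding to the nearest integer and clamping the size to the range 1..|T| are assumptions of this sketch, and the function names are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def subset_size(n_tests: int, f: Callable[[float], float]) -> int:
    """Size of the random subset: f(n) rounded, clamped to 1..n."""
    return max(1, min(n_tests, round(f(n_tests))))


def sample_tests(tests: Sequence[T], f: Callable[[float], float],
                 rng: random.Random) -> List[T]:
    """Draw the random subset T' of candidate tests, without replacement."""
    return rng.sample(tests, subset_size(len(tests), f))


# Usage with the two example functions from the slide.
candidates = list(range(100))
rng = random.Random(0)
ten_percent = sample_tests(candidates, lambda x: 0.1 * x, rng)
square_root = sample_tests(candidates, math.sqrt, rng)
```

With 100 candidate tests both choices of f happen to give a subset of 10. They diverge as the number of candidates grows, since √x grows much more slowly than 0.1x.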

SLIDE 21

Relational Random Forests

◮ Random forests with our relational decision tree algorithm.
◮ Pseudocode:

grow_node(Node, Examples, Probability):
  IF stop criterion holds:
    assign majority class from Examples to Node
  ELSE:
    generate all possible first order tests for Node:
      usual tests
      aggregate functions
      refinement of aggregate function higher in tree
    select random subset from possible tests using Probability
    associate best test out of random subset with Node
    grow two child nodes Left and Right
    split Examples into ExamplesPass and ExamplesFail
    grow_node(Left, ExamplesPass, Probability)
    grow_node(Right, ExamplesFail, Probability)
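
The two forest-specific ingredients can be sketched in Python: the random test subset used inside each node, and the ensemble built around the tree learner. This sketch assumes each candidate test is kept independently with the given probability, and follows the general random forest recipe [Breiman ’01] of bootstrap samples plus majority voting. The tree learner is passed in as the hypothetical callables `grow_tree` and `predict`; none of these names come from the slides.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, TypeVar

E = TypeVar("E")   # training example
M = TypeVar("M")   # tree model
X = TypeVar("X")   # candidate test


def random_test_subset(tests: Sequence[X], probability: float,
                       rng: random.Random) -> List[X]:
    """Keep each candidate test with the given probability; never return empty."""
    subset = [t for t in tests if rng.random() < probability]
    return subset or [rng.choice(tests)]


def grow_forest(examples: Sequence[E], grow_tree: Callable[[List[E]], M],
                n_trees: int, rng: random.Random) -> List[M]:
    """Grow each tree on a bootstrap sample (drawn with replacement)."""
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(examples) for _ in examples]
        forest.append(grow_tree(bootstrap))
    return forest


def forest_predict(forest: Sequence[M], predict: Callable[[M, object], str],
                   instance: object) -> str:
    """Majority vote over the predictions of the individual trees."""
    votes = Counter(predict(tree, instance) for tree in forest)
    return votes.most_common(1)[0][0]
```

Restricting each node to a random subset keeps the cost per node manageable even when aggregates and their refinements make the full set of candidate tests very large.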

SLIDE 22

Outline

◮ Introduction
◮ Aggregation and Selection
◮ Relational Decision Trees
◮ Relational Random Forests
◮ Experimental Results
  ◮ Real world data
  ◮ Artificial data
◮ Conclusions

SLIDE 23

Experimental Setup: Real world data

◮ Average over 5 times 5-fold cross-validation
◮ Different parameters:
  ◮ number of trees: 3, 11, 33
  ◮ proportion of feature sample: 100%, 75%, 50%, 25%, 10%, sqrt
  ◮ level of aggregates: No Aggregates (NA), Simple Aggregates (SA), Refined Aggregates (RA), Lookahead Aggregates (LA)
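
The evaluation protocol, an average over 5 repetitions of 5-fold cross-validation, can be sketched in plain Python. The fold assignment by striding, the reshuffle before each repetition, and the `train_and_score` callback are assumptions of this sketch, since the slides do not describe how folds were drawn.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, TypeVar

E = TypeVar("E")


def repeated_cv_score(
    examples: Sequence[E],
    train_and_score: Callable[[List[E], List[E]], float],
    folds: int = 5,
    repeats: int = 5,
    seed: int = 0,
) -> float:
    """Average a score over `repeats` runs of `folds`-fold cross-validation."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        shuffled = list(examples)
        rng.shuffle(shuffled)  # new random partition for every repetition
        for k in range(folds):
            # Fold k is every `folds`-th example, starting at position k.
            test = shuffled[k::folds]
            train = [e for i, e in enumerate(shuffled) if i % folds != k]
            scores.append(train_and_score(train, test))
    return mean(scores)
```

Here `train_and_score` would train a forest on the training part with one parameter setting (number of trees, feature proportion, aggregate level) and return its accuracy on the held-out fold.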

SLIDE 24

Experimental Results: Real world data

◮ The effect of aggregates and the number of trees (P = 0.25)

SLIDE 25

Experimental Results: Real world data

◮ The effect of the number of features (e.g. Mutagenesis)

FORF (33 trees)

  P       LA      RA      SA      NA
  1       0.779   0.774   0.777   0.731
  0.75    0.774   0.775   0.764   0.720
  0.50    0.777   0.781   0.789   0.747
  0.25    0.770   0.772   0.758   0.736
  0.10    0.790   0.765   0.761   0.653
  sqrt    0.785   0.752   0.743   0.690

Tilde (NA): 0.720

SLIDE 26

Experimental Results: Real world data

◮ Compared to other systems (FORF-SA uses 33 trees and 25% of the features)

◮ Financial

  FORF-SA         DINUS-C         RELAGGS         PROGOL
  0.993 (0.005)   0.851 (0.103)   0.880 (0.065)   0.863 (0.071)

◮ Diterpenes

  FORF-SA         FOIL            IBL-matchings   ICL
  0.928 (0.006)   0.783 (0.011)   0.935 (0.006)   0.860 (0.009)

SLIDE 27

Experimental Results: Artificial data

Summary of experimental results so far:

◮ Positive effect of random forest
◮ Positive effect of adding (simple) aggregates
◮ Effect of combination of aggregates and selection?
  ◮ Artificial dataset

SLIDE 28

Experimental Results: Artificial data

Data generator for eastbound/westbound trains.

◮ 800 trains, 400 in each direction
◮ Target concept:

SLIDE 29

Experimental Results: Artificial data

Results (P = 0.25, number of trees = 33)

◮ Accuracy

  LA      RA      SA      NA
  0.98    0.84    0.79    0.75

◮ Avg number of nodes in a tree

  LA      RA      SA      NA
  2.87    9.42    10.03   9.78

◮ Average induction time

  LA      RA      SA      NA
  36.41   6.93    4.97    2.84

SLIDE 31

Conclusions

◮ First order random forest induction algorithm based on Tilde
◮ Feature space enlarged by including aggregates
◮ Refinement operator adjusted to include selection conditions within the aggregates
◮ Strength was experimentally shown

SLIDE 32

Acknowledgements and References

Acknowledgements:

◮ Maurice Bruynooghe

Full Paper:

◮ C. Vens, A. Van Assche, H. Blockeel, and S. Džeroski, First Order Random Forests with Complex Aggregates, Proceedings of the 14th International Conference on Inductive Logic Programming (ILP-2004), Porto, Portugal, 2004
