SLIDE 8 Possible Discretization of Continuous Attributes
- Use C4.5-Disc
- Quick overview:
– Extract a reduced data set that contains only the attribute to discretize and the desired classification.
– From that, build a decision tree using the C4.5 algorithm (another rule induction algorithm).
– Result: a decision tree with binary decisions: x ≤ a → go left; x > a → go right.
– Each path from the root to a leaf defines one categorical interval.
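The overview above can be sketched in a few lines of Python. This is a simplified stand-in, not C4.5-Disc itself: it splits on plain information gain, places cuts at midpoints, and does no pruning. The function names and the `min_cases` parameter are hypothetical, and the sample values are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_threshold(pairs):
    """Return the cut a with the highest information gain for x <= a vs. x > a."""
    pairs = sorted(pairs)
    labels = [c for _, c in pairs]
    base, best_gain, best_cut = entropy(labels), 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between two equal values
        left, right = labels[:i], labels[i:]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if gain > best_gain:
            best_gain, best_cut = gain, (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_cut


def cut_points(pairs, min_cases=2):
    """Grow a tree on the single attribute and collect every threshold it uses."""
    if len(pairs) < 2 * min_cases or len({c for _, c in pairs}) == 1:
        return []
    cut = best_threshold(pairs)
    if cut is None:
        return []
    left = [p for p in pairs if p[0] <= cut]   # x <= a: go left
    right = [p for p in pairs if p[0] > cut]   # x > a: go right
    return cut_points(left, min_cases) + [cut] + cut_points(right, min_cases)


def to_intervals(cuts):
    """Each root-to-leaf path becomes one interval (low, high], i.e. low < x <= high."""
    bounds = [float("-inf")] + sorted(cuts) + [float("inf")]
    return list(zip(bounds[:-1], bounds[1:]))


# Reduced data set: (attribute value, class) pairs only. Values are illustrative.
reduced = [(1, "a"), (2, "a"), (3, "a"), (10, "b"), (11, "b"), (12, "b")]
print(to_intervals(cut_points(reduced)))  # [(-inf, 6.5), (6.5, inf)]
```

Each resulting interval can then be treated as one categorical value of the attribute.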
AntMiner’s Parameters
- Number of ants (3000 used in experiments). This also limits the
maximum number of rules found for a classification. The limit is not necessarily reached, because the algorithm might converge earlier.
- Minimum number of cases per rule (10). Each rule must cover at
least this many cases. This avoids overfitting.
- Maximum number of uncovered cases in the training set (10).
The algorithm stops once no more than this many cases remain uncovered.
- Number of rules to test for the convergence of the ants (10). The
algorithm waits this many rules for an improvement before it treats the ants as converged.
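As a rough sketch, the four parameters can be collected in one configuration object, with each parameter driving one check. All field and function names here are hypothetical; only the default values come from the slide. The treatment of the exact boundary (exactly 10 uncovered cases stops the search) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AntMinerParams:
    # Hypothetical field names; defaults are the values quoted on the slide.
    num_ants: int = 3000
    min_cases_per_rule: int = 10
    max_uncovered_cases: int = 10
    rules_for_convergence: int = 10


def keep_discovering(num_uncovered, params):
    """Outer loop: keep searching for rules while too many cases are uncovered."""
    return num_uncovered > params.max_uncovered_cases  # boundary is an assumption


def rule_is_admissible(num_covered, params):
    """A rule must cover at least min_cases_per_rule cases."""
    return num_covered >= params.min_cases_per_rule


def ants_finished(ants_used, rules_without_improvement, params):
    """Inner loop ends when the ants are used up or no improvement was seen for long enough."""
    return (ants_used >= params.num_ants
            or rules_without_improvement >= params.rules_for_convergence)


params = AntMinerParams()
print(keep_discovering(14, params))   # True: 14 uncovered cases exceed the limit of 10
print(ants_finished(25, 10, params))  # True: converged long before 3000 ants were used
```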
Sample Run Start
- Deciding whether to play outside
– Attributes: outlook, temperature, humidity, windy, play
– Classes: play (yes), do not play (no)
– sunny,hot,high,FALSE,no (1)
– sunny,hot,high,TRUE,no (2)
– overcast,hot,normal,FALSE,yes (3)
– rainy,mild,high,FALSE,yes (4)
– rainy,cool,normal,FALSE,yes (5)
– rainy,cool,normal,TRUE,no (6)
– overcast,cool,normal,TRUE,yes (7)
– sunny,mild,high,FALSE,no (8)
– sunny,cool,normal,FALSE,yes (9)
– rainy,mild,normal,FALSE,yes (10)
– sunny,mild,normal,TRUE,yes (11)
– overcast,mild,high,TRUE,yes (12)
– overcast,hot,normal,FALSE,yes (13)
– rainy,mild,high,TRUE,no (14)
- Sample run for finding one rule set.
- Start: I={all}, R={}
- Ant 1: Chooses probabilistically
outlook=overcast (then play=yes)
- Ant 1: Chooses values for other
attributes…
- Ant 1: Finishes because all attributes are
used.
- Ant 1: Last three conditions are pruned
away.
- I={1,2,4,5,6,8,9,10,11,14},
R={outlook=overcast → yes}
- Ant 2: Chooses outlook=rainy (then
play=yes)
- Rule is not good enough (covers 3 yes : 2 no)
- Ant 2: Chooses windy=true (then play=no)
- Ant 2 finishes because otherwise covered
set would be too small.
- No pruning possible either.
- …
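The bookkeeping in this sample run can be checked with a short script. It is only a sketch of the covering step (which cases a rule covers, and which remain in I afterwards), not of the ants' probabilistic choices or of pruning. The helper names are hypothetical; the 14 cases are the ones listed on the slide.

```python
# Weather data from the slide, keyed by case number:
# (outlook, temperature, humidity, windy, play)
CASES = {
    1: ("sunny", "hot", "high", False, "no"),
    2: ("sunny", "hot", "high", True, "no"),
    3: ("overcast", "hot", "normal", False, "yes"),
    4: ("rainy", "mild", "high", False, "yes"),
    5: ("rainy", "cool", "normal", False, "yes"),
    6: ("rainy", "cool", "normal", True, "no"),
    7: ("overcast", "cool", "normal", True, "yes"),
    8: ("sunny", "mild", "high", False, "no"),
    9: ("sunny", "cool", "normal", False, "yes"),
    10: ("rainy", "mild", "normal", False, "yes"),
    11: ("sunny", "mild", "normal", True, "yes"),
    12: ("overcast", "mild", "high", True, "yes"),
    13: ("overcast", "hot", "normal", False, "yes"),
    14: ("rainy", "mild", "high", True, "no"),
}
ATTRS = ("outlook", "temperature", "humidity", "windy")


def covers(conditions, case):
    """True if the case satisfies every attribute=value condition of the antecedent."""
    return all(case[ATTRS.index(attr)] == value for attr, value in conditions.items())


def class_counts(conditions, case_ids):
    """Return (yes, no) counts among the given cases that the antecedent covers."""
    classes = [CASES[i][-1] for i in case_ids if covers(conditions, CASES[i])]
    return classes.count("yes"), classes.count("no")


def remove_covered(conditions, case_ids):
    """Cases that stay in I after a rule with this antecedent is added to R."""
    return sorted(i for i in case_ids if not covers(conditions, CASES[i]))


all_ids = sorted(CASES)                                  # Start: I = {all}
print(class_counts({"outlook": "overcast"}, all_ids))    # (4, 0)
remaining = remove_covered({"outlook": "overcast"}, all_ids)
print(remaining)                                         # [1, 2, 4, 5, 6, 8, 9, 10, 11, 14]
print(class_counts({"outlook": "rainy"}, remaining))     # (3, 2)
```

The output reproduces the set I after Ant 1's rule and the 3:2 split that makes Ant 2's first candidate rule too weak.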