RainForest – A Framework for Fast Decision Tree Construction of Large Datasets
Presented by:
Leila Homaeian
CMPUT 695
Nov. 25, 2004
Outline
- 1. Introduction
- 2. Problem Definition
- 3. Related Work
- 4. RainForest Framework
  - Family of Algorithms
- 5. Experiments
- 6. Conclusion
Introduction
Classification is an important data mining problem.
- Input: a database of training records
- Each record has a class label and a set of predictor attributes
- The resulting model assigns class labels to test records
Decision tree construction algorithms:
- Easily assimilated by humans
- Can be constructed fast
- Highly accurate
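To make the setup concrete, here is a minimal sketch, assuming a binary decision tree whose internal nodes test one ordered attribute against a threshold. The `Record` and `Node` types, the `classify` function, and the attribute names `age` and `salary` are all hypothetical and for illustration only; they are not taken from the RainForest paper.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class Record:
    # Predictor attribute values (hypothetical names) plus a class label;
    # label is None for an unlabeled test record.
    attributes: Dict[str, Union[int, float, str]]
    label: Optional[str] = None


@dataclass
class Node:
    # Internal node: splits on "attribute <= threshold".
    # Leaf node: carries only a class label.
    attribute: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[str] = None

    def is_leaf(self) -> bool:
        return self.label is not None


def classify(node: Node, record: Record) -> str:
    # Follow the split predicates from the root down to a leaf,
    # then return that leaf's class label.
    while not node.is_leaf():
        value = record.attributes[node.attribute]
        node = node.left if value <= node.threshold else node.right
    return node.label


# Hypothetical tree: age <= 30 -> "high"; otherwise salary <= 50000 -> "low", else "high".
tree = Node(attribute="age", threshold=30,
            left=Node(label="high"),
            right=Node(attribute="salary", threshold=50000,
                       left=Node(label="low"), right=Node(label="high")))

test = Record(attributes={"age": 45, "salary": 40000})
print(classify(tree, test))  # low
```

The readability claim on the slide shows up here: each root-to-leaf path is a conjunction of simple attribute tests that a person can read directly.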
Introduction (cont’d)
Proposed approaches to deal with large datasets:
- Discretize ordered attributes
- Sample at each node of the classification tree
  (these approaches assume the dataset fits in main memory)
- Partition the data so that each partition fits in main memory
  (quality of the resulting classifier?)
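The three approaches above can be illustrated with small helpers. This is a rough sketch under stated assumptions: equal-width binning is assumed as the discretization scheme, uniform random sampling as the per-node sampling scheme, and consecutive fixed-size chunks as the partitioning scheme. The slides do not say which concrete schemes the cited work uses, and the function names `equal_width_bins`, `sample_node`, and `partition` are hypothetical.

```python
import random
from typing import List, Sequence


def equal_width_bins(values: Sequence[float], k: int) -> List[int]:
    """Discretize an ordered attribute into k equal-width intervals (0..k-1).

    Equal-width binning is an assumed scheme, chosen for illustration.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # guard against all-equal values
    # min(..., k - 1) puts the maximum value into the last bin
    return [min(int((v - lo) / width), k - 1) for v in values]


def sample_node(records: Sequence, n: int, seed: int = 0) -> list:
    """Draw a uniform random sample of at most n records at a tree node."""
    rng = random.Random(seed)
    return rng.sample(list(records), min(n, len(records)))


def partition(records: Sequence, max_size: int) -> List[list]:
    """Split records into consecutive chunks of at most max_size records,
    standing in for 'partitions that fit in main memory'."""
    return [list(records[i:i + max_size]) for i in range(0, len(records), max_size)]


ages = [18, 22, 35, 47, 51, 64]
print(equal_width_bins(ages, 3))      # [0, 0, 1, 1, 2, 2]
print(partition(list(range(7)), 3))   # [[0, 1, 2], [3, 4, 5], [6]]
```

Each helper trades accuracy for memory in a different way. Binning discards the exact ordering inside each interval. Sampling sees only part of the data at each node. Partitioning means no single tree sees all the records, which is the quality concern the slide raises.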