Outline Utilizing Diversity and Performance Introduction - PDF document

2009-03-24

Utilizing Diversity and Performance Measures for Ensemble Creation
Tuve Löfström
Licentiate Thesis

Outline
• Introduction
• Data Mining
• Predictive Modelling
• Ensembles
• Diversity
• Information Fusion
• Problem Statement
• Research and Results
• Implicit Diversity
• Estimating Ensemble Performance
• Evaluating Optimization Criteria
• Combining Measures
• Conclusions
• Discussion

Introduction
• "Data mining is the process of exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules" (Berry and Linoff 1997)
• The aim of data mining is to "be able to respond to the patterns, to act on them, ultimately turning the data into information, the information into action, and the action into value" (Berry and Linoff 1997)

Introduction
• Predictive modeling is one of the key tasks in data mining
• The objective when performing predictive modeling is to predict a value for a specific variable: the target variable
• Most often a predictive model is found from directed data mining
  • a top-down approach where a mapping from an input vector to a scalar output is learnt from samples

Introduction
• The task is either classification or regression
  • When performing classification the target value must be any of a pre-defined set of values
  • For regression, the target value is a continuous value
• The normal procedure is to use historical data with known target values to build models that could later be used for prediction

[Figure: data table labelled Instances, Variables, and Targets]

A decision tree
[Figure: a decision tree]

A rule set
JChipper rules:
===========
  IF ( petalwidth <= 0.6 ) THEN Iris-setosa [50/0]
  IF ( petalwidth <= 1.7 ) AND ( petallength <= 5.0 ) THEN Iris-versicolor [50/2]
  DEFAULT: Iris-virginica [50/2]
Number of Rules : 3
Number of Conditions : 4

A neural net
[Figure: a neural net]

Ensembles
• An ensemble is a composite model, aggregating multiple base models into one predictive model
• An ensemble prediction, consequently, is a function of all included base models
• Both theory and a wealth of empirical studies have established that ensembles are generally more accurate than single predictive models

[Equation: the ensemble prediction y expressed as a function f of the base model outputs]

An ensemble
[Figure: base models M_1, M_2, ..., M_n combined by F(M_1, M_2, ..., M_n) into the prediction ŷ]

Diversity
• For the ensemble approach to work, the ensemble must contain diversity
  • There would be no point in combining only models that always
    • Make the same mistakes
    • Add the same information
• We want models that perform well individually and complement each other
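To make the statement that an ensemble prediction is a function of all included base models concrete, here is a minimal sketch in Python that uses majority voting as the combining function F. Majority voting is an assumption chosen for illustration; the slides do not say which combination scheme is used. The first base model encodes the JChipper rule set shown above, while `width_only` and `length_only` (including the 2.5 threshold) are hypothetical models invented for this example.

```python
from collections import Counter

def rule_set(x):
    # The JChipper rule set from the slide.
    if x["petalwidth"] <= 0.6:
        return "Iris-setosa"
    if x["petalwidth"] <= 1.7 and x["petallength"] <= 5.0:
        return "Iris-versicolor"
    return "Iris-virginica"

def width_only(x):
    # Hypothetical base model that only looks at petal width.
    if x["petalwidth"] <= 0.6:
        return "Iris-setosa"
    if x["petalwidth"] <= 1.7:
        return "Iris-versicolor"
    return "Iris-virginica"

def length_only(x):
    # Hypothetical base model that only looks at petal length.
    if x["petallength"] <= 2.5:
        return "Iris-setosa"
    if x["petallength"] <= 5.0:
        return "Iris-versicolor"
    return "Iris-virginica"

def majority_vote(labels):
    # Return the most frequent label; ties go to the label seen first.
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label

def ensemble_predict(models, x):
    # F(M_1, ..., M_n): collect every base model's output, then vote.
    return majority_vote([model(x) for model in models])

models = [rule_set, width_only, length_only]
flower = {"petalwidth": 1.5, "petallength": 5.2}
print([m(flower) for m in models])
print(ensemble_predict(models, flower))
```

In this example the base models disagree on the flower, and the vote resolves the disagreement. If all three models always made the same mistakes, the vote could never correct any of them, which is the point of the Diversity slide.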

The need for diversity
• Overall ensemble error depends on average error of ensemble members and diversity
• Increasing diversity decreases overall error
• Provided it does not result in an increase in average error
(Krogh and Vedelsby, 1995)

$E = \bar{E} - \bar{A}$

Unfortunately average error and diversity are highly correlated

[Figure: base classifiers h_1, h_2, ..., h_S combined as ĥ = F(h_1, h_2, ..., h_S) to give the prediction ŷ]

Diversity Measures
• Diversity is well defined for regression problems
  • Not for classification problems
• Several different heuristic diversity measures for a classification context have been proposed.
• Two types of measures
  • Pairwise measures
    • Compare all pairs and average over the results
  • Non-pairwise measures
    • Measure all members together

Information Fusion
• Information fusion is the research about how to aid decision makers with different tasks, by combining data and information from various sources
• It is characterized by the necessity to gather data about objects or situations from multiple sources and combine them to enable effective decision support, often under severe time and resource constraints
• Each source can only provide information from its specific point of view and often only about some specific feature.

Ensembles in Information Fusion
• One of the characteristics of information fusion is the need to combine data from several sources
  • To understand the whole picture from all the various fractions of data that is gathered
• Obviously, the use of ensembles is a very natural framework for information fusion
  • New base models can be added when new sources are added
  • Old models can be updated or dropped when they become too faulty or sources are removed or lost

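Returning to the error decomposition on the "need for diversity" slide: for a regression ensemble that averages its members with equal weights, the squared error of the ensemble equals the average member error minus the average ambiguity (the spread of the members around the ensemble output). The sketch below checks this numerically for a single instance. The equal weighting and the example numbers are assumptions made for illustration only.

```python
def ambiguity_decomposition(outputs, target):
    """Return (E, E_bar, A_bar) for one instance and an equal-weight average."""
    n = len(outputs)
    ensemble = sum(outputs) / n
    # E: squared error of the ensemble output.
    ensemble_error = (ensemble - target) ** 2
    # E_bar: average squared error of the individual members.
    avg_error = sum((f - target) ** 2 for f in outputs) / n
    # A_bar: average squared deviation of the members from the ensemble.
    avg_ambiguity = sum((f - ensemble) ** 2 for f in outputs) / n
    return ensemble_error, avg_error, avg_ambiguity

# Hypothetical outputs of three base models for one instance.
E, E_bar, A_bar = ambiguity_decomposition([2.0, 3.0, 4.0], target=2.5)
print(E, E_bar - A_bar)
```

Both printed values agree up to floating-point rounding. Since the ambiguity term is never negative, the ensemble error can never exceed the average member error, and more spread among the members lowers the ensemble error as long as the average member error does not rise with it.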
Diversity and Information Fusion
• Diversity in ensembles is achieved by dividing datasets into:
  • Different feature sets
  • Different subsets of data
  • Measurements of the problem from different perspectives
• The data used in Information Fusion often come:
  • from different kinds of sensors
  • with different intervals
  • from sensors at different positions

Problem Statement
• The main problem:
  How should ensembles be created to maximize predictive performance?
• The problem statement:
  How could measurements of diversity and predictive performance on available data be used when combining or selecting base classifiers in order to maximize ensemble predictive performance on unseen data?
  • The final goal when building predictive models is to achieve as high predictive performance as possible; this is inherent in the need for a predictive model
  • An ensemble can be formed either by simply combining available base classifiers, or by selecting a subset of base classifiers
  • This means that diversity and performance measures can be used either to guide the selection or as an implicit goal when creating the models to combine
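The two kinds of measurement named in the problem statement can both be computed from base classifier predictions on available data. The sketch below computes accuracy as the performance measure and a simple label-based disagreement rate, averaged over all pairs, as the diversity measure. The disagreement rate is only one example of a pairwise measure and is an assumption here; the slides do not say which measures the thesis evaluates. The prediction vectors are made up for illustration.

```python
from itertools import combinations

def accuracy(predictions, truth):
    # Fraction of instances classified correctly.
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)

def disagreement(pred_a, pred_b):
    # Fraction of instances on which two classifiers predict different labels.
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)

def average_pairwise_disagreement(all_predictions):
    # Pairwise measure: compare all pairs and average over the results.
    pairs = list(combinations(all_predictions, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)

# Hypothetical validation labels and predictions from three base classifiers.
truth = [0, 1, 1, 0]
predictions = [
    [0, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
]

print([accuracy(p, truth) for p in predictions])
print(average_pairwise_disagreement(predictions))
```

Here every classifier has the same accuracy but errs on a different instance, so the set is as diverse as three equally accurate classifiers can be on this data. Whether such measurements on available data actually predict ensemble performance on unseen data is the question the problem statement poses.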
