
9. General Improvement Techniques

There are several concepts/techniques for improving learning results that are general enough to apply to many learning approaches, often using task- or application-specific knowledge to achieve their results. These techniques can be grouped into

• preprocessing
• ensemble/combination
• cooperation
• postprocessing


9.1 Preprocessing


The following is based on slides by Michael M. Richter. Data preprocessing aims to solve/lessen the following problems:

• data quality not sufficient
• too large a number of features
• too large a number of examples
• wrong representation of data

Types of Data Preprocessing


Cleaning of data:
• removing of errors and fixing of missing data
• elimination of noise

Data integration and transformation:
• creating examples out of several databases (projection and join, see the sketch below)
• change of representation

Data reduction:
• eliminating features and/or examples

All of these need substantial application-specific knowledge.
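To make the integration step concrete, here is a minimal sketch of projection and join with pandas; the tables, keys, and column names are all hypothetical:

```python
import pandas as pd

# Two hypothetical source tables, e.g. from different databases.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "age": [34, 51, 27],
    "city": ["Calgary", "Toronto", "Calgary"],
})
purchases = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "amount": [120.0, 80.0, 200.0, 40.0],
})

# Join: combine rows from both tables on the shared key.
joined = customers.merge(purchases, on="customer_id", how="inner")

# Projection: keep only the columns that will serve as features.
examples = joined[["age", "city", "amount"]]
print(examples)
```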

Dealing with insufficient data quality


• Deleting examples with missing feature values
• Deleting a particular feature (whose values are often missing) from all examples
• Manually adding/correcting feature values
• Extending the values of a feature by a new value “unknown”
• Adding a “default” value for a missing feature value
• Trying to predict a missing feature value (→ constitutes its own learning problem)
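As a rough illustration, the following sketch (pandas assumed, data and column names made up) applies several of these options to one small table:

```python
import pandas as pd

# Hypothetical examples with missing feature values.
df = pd.DataFrame({
    "age":    [34, None, 27, None],
    "income": [50_000, 62_000, None, 48_000],
    "city":   ["Calgary", None, "Toronto", "Calgary"],
})

# Delete examples with missing feature values.
dropped_rows = df.dropna()

# Delete a particular feature from all examples.
dropped_feature = df.drop(columns=["age"])

# Extend a symbolic feature by the new value "unknown".
df["city"] = df["city"].fillna("unknown")

# Add a default value (here: the mean) for a missing numerical feature.
df["income"] = df["income"].fillna(df["income"].mean())

# Predicting a missing value, e.g. regressing "age" on the other
# features, would constitute its own learning problem.
print(df)
```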

Dealing with insufficient data quality due to noisy data

Identifying inconsistent values and outliers by
• checking semantic consistency conditions
• using set partitioning techniques

Treating detected examples by
• the previous methods
• using the value from the nearest cluster
• binning for numerical values (see the sketch below)
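As a minimal sketch of the binning idea (NumPy assumed, values made up): split the value range into equal-width bins and replace each noisy value by the mean of its bin.

```python
import numpy as np

# Hypothetical noisy numerical feature values.
values = np.array([4.0, 8.0, 9.0, 15.0, 21.0, 21.0, 24.0, 25.0, 26.0])

# Equal-width binning: split the value range into k bins.
k = 3
edges = np.linspace(values.min(), values.max(), k + 1)
bins = np.clip(np.digitize(values, edges) - 1, 0, k - 1)

# Smooth by replacing each value with the mean of its bin.
smoothed = np.array([values[bins == b].mean() for b in bins])
print(smoothed)  # the three lowest values all become their bin mean 7.0
```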

Dealing with too many features


Also known as reduction of dimensionality or attribute (or feature) selection. Aims at deleting
• irrelevant features and
• redundant features

Identifying irrelevant features:
• small variance
• little or no correlation with the classification feature
• not important for the classification feature
  • for example, by learning a decision tree from a small subset of examples and observing that the feature does not occur in the tree
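A possible screening sketch for the first two criteria, assuming pandas; the data and the cut-off thresholds are made up:

```python
import pandas as pd

# Hypothetical examples: f1..f3 are features, y is the class.
df = pd.DataFrame({
    "f1": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],   # (nearly) constant
    "f2": [0.2, 0.9, 0.4, 0.8, 0.1, 0.7],
    "f3": [5.0, 1.0, 4.0, 2.0, 6.0, 1.5],
    "y":  [0, 1, 0, 1, 0, 1],
})
features = df.drop(columns=["y"])

# Small variance -> candidate for deletion.
low_variance = features.columns[features.var() < 1e-3]

# Little or no correlation with the classification feature -> candidate.
corr_with_class = features.corrwith(df["y"]).abs()
uncorrelated = corr_with_class[corr_with_class < 0.1].index

print(list(low_variance), list(uncorrelated))
```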

Dealing with too many features (cont.)


Identifying redundant features:
• performing semantic analysis of the data (for example, realizing that both birth year and age appear as features)
• features that have a high correlation with other features

But: sometimes redundant features can be relevant! For example, features that have been computed out of others obviously have a high correlation with those features. We do not want to remove those!
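One way to flag redundancy candidates is a pairwise correlation scan; a sketch with hypothetical data follows. As noted above, a flagged pair still needs a semantic check before anything is deleted.

```python
import pandas as pd

df = pd.DataFrame({
    "birth_year": [1990, 1985, 2000, 1970],
    "age":        [35, 40, 25, 55],   # redundant: derived from birth_year
    "height":     [175, 182, 168, 171],
})

# Feature pairs with very high correlation are redundancy candidates.
corr = df.corr().abs()
pairs = [
    (a, b)
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if corr.loc[a, b] > 0.95
]
print(pairs)  # [('birth_year', 'age')] -- check before removing!
```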

Dealing with too many examples


Perform data sampling! Different sampling methods:
• random sampling
• cluster sampling: randomly put the examples into clusters and randomly select the clusters to sample from
• stratified sampling: create simple clusters (for example, by clustering according to the different values of one feature) and randomly select examples out of each cluster (see the sketch below)
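A small sketch of random and stratified sampling with pandas (data made up; cluster sampling would instead select whole groups at random and keep all of their examples):

```python
import pandas as pd

df = pd.DataFrame({
    "feature": [1.2, 3.4, 0.7, 2.2, 5.1, 4.4, 0.9, 3.3],
    "label":   ["a", "a", "b", "b", "a", "b", "a", "b"],
})

# Random sampling: pick 50% of the examples uniformly.
random_sample = df.sample(frac=0.5, random_state=0)

# Stratified sampling: group by the values of one feature and
# randomly select examples out of each group.
stratified_sample = (
    df.groupby("label", group_keys=False).sample(frac=0.5, random_state=0)
)
print(stratified_sample)
```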

Transformation of data


Used to get data into a form that makes it more suitable for a chosen learning method. Includes
• methods for noise reduction (as looked at before)
• changing the type of certain data (i.e. the value space of a feature)
• applying generalization/using taxonomies
• aggregation of several examples into one
• construction of new features out of other ones

Changing the type of certain data


Aimed at changing the representation but not the information content of the data.
• Normalizing numerical feature value spaces
  • into [0..1]
  • into their logarithms
• Transformation of integer codes into symbolic values (and vice versa)
  • like 1 -> male, 2 -> female
• Conversion of types
  • like a string into a symbolic value
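A compact sketch of these transformations, assuming pandas/NumPy; the feature names and the integer coding are made up:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "salary": [30_000.0, 45_000.0, 120_000.0],
    "sex":    [1, 2, 1],
})

# Normalize a numerical feature value space into [0..1].
s = df["salary"]
df["salary_01"] = (s - s.min()) / (s.max() - s.min())

# ... or map it onto its logarithm.
df["salary_log"] = np.log(s)

# Transform integer codes into symbolic values.
df["sex"] = df["sex"].map({1: "male", 2: "female"})
print(df)
```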

Generalization


Reduction of the information content of a feature.
• Exchange of a numerical feature for a symbolic one (that still has some quantitative meaning)
  • for example: high, medium, low
• Combination of several feature values into one value
  • for example: specific diseases into disease types
  → use of a taxonomy
• Clustering of the values of a feature can also be used to find good groups to combine.
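A sketch of both generalization steps, assuming pandas; the bin boundaries and the small taxonomy are invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "blood_pressure": [95, 120, 150, 180],
    "diagnosis": ["flu", "cold", "measles", "flu"],
})

# Exchange a numerical feature for a symbolic one that keeps
# a quantitative meaning (low < medium < high).
df["bp_level"] = pd.cut(
    df["blood_pressure"],
    bins=[0, 110, 140, 300],
    labels=["low", "medium", "high"],
)

# Combine specific feature values into coarser ones via a taxonomy.
taxonomy = {"flu": "respiratory", "cold": "respiratory", "measles": "viral"}
df["disease_type"] = df["diagnosis"].map(taxonomy)
print(df)
```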

Aggregation


Combining several examples fulfilling a certain condition. The new feature values for this new example need to be generated by:
• summing “old” values up
• creating the average of “old” values
• counting entries
• ...
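A minimal aggregation sketch with pandas, combining hypothetical transaction examples that fulfil the condition "same customer":

```python
import pandas as pd

df = pd.DataFrame({
    "customer": ["a", "a", "b", "b", "b"],
    "amount":   [10.0, 20.0, 5.0, 5.0, 15.0],
})

# One new example per customer; new feature values are generated by
# summing old values up, averaging them, and counting entries.
aggregated = df.groupby("customer").agg(
    total=("amount", "sum"),
    average=("amount", "mean"),
    entries=("amount", "count"),
)
print(aggregated)
```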


Construction of new features


The new feature should be relevant for the goal of the learning (→ application knowledge required). Note that the difference to aggregation is that the feature value computation is done within an example, not out of several examples. Example: profit := income - expenses
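The profit example, written as a within-example (row-wise) feature construction in pandas:

```python
import pandas as pd

df = pd.DataFrame({
    "income":   [1000.0, 2500.0, 1800.0],
    "expenses": [800.0, 2600.0, 1200.0],
})

# The new feature is computed within each example, not across examples.
df["profit"] = df["income"] - df["expenses"]
print(df)
```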