  1. Time-series-based Ensemble Modeling for Bio-Medical Applications
Maciej Ogorzałek (1,2), in collaboration with: Christian Merkwirth, Grzegorz Surowka, Leszek Nowak, Katarzyna Grzesiak-Kopec (1), Joerg Wichard (3)
(1) Department of Information Technologies, Jagiellonian University, Kraków
(2) Chair of Bio-signals and Systems, Hong Kong Polytechnic University (under DSS)
(3) FMP Berlin, Germany
M. Ogorzałek – p. 1/1

  2. Learning a Dependency from Data
Given: a sample of input-output pairs (x_µ, y_µ) with µ = 1, …, N, drawn from a functional dependence y(x) (maybe corrupted by noise).
Aim: choose a model (function) f̂ out of the hypothesis space H that is as close to the true dependency f as possible.
• Classification: discrete classes, f: R^D → {0, 1, 2, …}
• Regression: continuous output, f: R^D → R
Implementation usually via solution of an appropriate optimization problem:
• Matrix inversion in the case of linear regression
• Minimization of a loss function on the training data
• A quadratic programming problem for SVMs
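The optimization view above can be made concrete for the linear-regression case, where minimizing the squared loss on the training data reduces to a matrix inversion (the normal equations). A minimal sketch on synthetic data (the dependency y = 2x + 1 is made up for illustration):

```python
import numpy as np

# Synthetic sample of input-output pairs (x_mu, y_mu), mu = 1..N,
# drawn from a known linear dependency y(x) = 2*x + 1 (noise-free here).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 1))
y = 2.0 * X[:, 0] + 1.0

# Augment with a constant column so the intercept is fitted too.
A = np.hstack([X, np.ones((X.shape[0], 1))])

# Minimizing the squared loss reduces to the normal equations
# A^T A w = A^T y, solved here by matrix inversion.
w = np.linalg.solve(A.T @ A, A.T @ y)
print(w)  # -> approximately [2. 1.]
```

With noise-free data the fitted weights recover the true slope and intercept up to floating-point error; with noisy targets the same solve gives the least-squares estimate.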

  3. Validation and Model Selection
• Generalization error: how does the model perform on unseen data (samples)?
• The exact generalization error is not accessible, since we have only a limited number of observations.
• Training on a small data set tends to overfit, causing the generalization error to be significantly higher than the training error.
• This is a consequence of the mismatch between the capacity of the hypothesis space H (its VC (Vapnik-Chervonenkis) dimension) and the number of training observations.
• Validation: estimating the generalization error using just the given data set
  – Needed for choosing the optimal model structure or learning parameters (step sizes etc.)
• Model selection: selecting the model with the lowest (estimated) generalization error
• But the estimation of the generalization error is very unreliable on small data sets.
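Cross-validation, mentioned above as the way to estimate the generalization error from the given data alone, can be sketched as follows (the noisy linear data and the linear base model are both assumptions made for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 60
x = rng.uniform(-1.0, 1.0, N)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(N)  # noisy linear dependency

def fit(xt, yt):
    # Least-squares line through (xt, yt): returns slope and intercept.
    A = np.column_stack([xt, np.ones_like(xt)])
    return np.linalg.lstsq(A, yt, rcond=None)[0]

# 5-fold cross-validation: the held-out squared error estimates the
# generalization error using just the given data set.
folds = np.array_split(rng.permutation(N), 5)
errs = []
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(N), test_idx)
    w = fit(x[train_idx], y[train_idx])
    pred = w[0] * x[test_idx] + w[1]
    errs.append(np.mean((y[test_idx] - pred) ** 2))
cv_error = np.mean(errs)
print(cv_error)  # typically close to the noise variance (0.01 here)
```

Note the caveat from the slide: on small data sets this estimate itself has high variance, which is one motivation for the ensemble methods that follow.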

  4. Improving Generalization for Single Models
Remedies:
• Manipulating the training algorithm (e.g. early stopping)
• Regularization by adding a penalty to the loss function
• Using algorithms with built-in capacity control (e.g. SVM)
• Relying on criteria like BIC (Bayesian Information Criterion), AIC (Akaike), GCV (Generalized Cross-Validation) or cross-validation to select the optimal model complexity
• Reformulating the loss function:
  – ε-insensitive loss
  – Huber loss
  – SVM loss for classification
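Of the reformulated loss functions listed above, the ε-insensitive loss is the simplest to state: residuals smaller than ε cost nothing, and larger residuals are penalized linearly beyond the tube. A minimal sketch:

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    # Residuals inside the epsilon tube contribute zero loss;
    # outside the tube the loss grows linearly with the residual.
    r = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return np.maximum(r - eps, 0.0)

print(eps_insensitive_loss([1.0, 1.0, 1.0], [1.05, 1.3, 0.6]))
# -> approximately [0. 0.2 0.3]
```

The flat region is what makes this loss useful for decorrelating models later in the talk: models trained with it are free to disagree inside the tube.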

  6. Question
• Are there any other methods to improve the generalization error?
• Yes: by combining several individual models!

  7. Ensemble Methods
Ensemble: averaging the output of several separately trained models
• Simple average: f̄(x) = (1/K) Σ_{k=1..K} f_k(x)
• Weighted average: f̄(x) = Σ_k w_k f_k(x), with Σ_k w_k = 1

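The two combination rules can be written down directly. A minimal sketch, where the trained models f_k are stand-in linear functions (hypothetical, for illustration only):

```python
import numpy as np

# Stand-ins for separately trained models f_k (made up for illustration).
models = [lambda x: 2.0 * x,
          lambda x: 2.2 * x - 0.1,
          lambda x: 1.9 * x + 0.2]

def simple_average(models, x):
    # f_bar(x) = (1/K) * sum_k f_k(x)
    return np.mean([f(x) for f in models], axis=0)

def weighted_average(models, weights, x):
    # f_bar(x) = sum_k w_k * f_k(x), with sum_k w_k = 1
    assert np.isclose(np.sum(weights), 1.0)
    return np.sum([w * f(x) for w, f in zip(weights, models)], axis=0)

x = np.array([0.0, 1.0])
print(simple_average(models, x))
print(weighted_average(models, [0.5, 0.25, 0.25], x))
```

The simple average is the weighted average with uniform weights w_k = 1/K; non-uniform weights are typically chosen from estimated generalization errors, as a later slide notes.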

  10. Ensemble Methods
Error decomposition for the simple average f̄(x) = (1/K) Σ_k f_k(x), at each input x:
• Ensemble error: e(x) = (y(x) − f̄(x))²
• Average individual error: ε̄(x) = (1/K) Σ_{k=1..K} (y(x) − f_k(x))²
• Ambiguity (model spread): ā(x) = (1/K) Σ_{k=1..K} (f_k(x) − f̄(x))²
Then e(x) = ε̄(x) − ā(x), and integrating over the input space: E = Ē − Ā.
Interpretation:
• The ensemble generalization error is never larger than the average error of the individual models
• An ensemble should consist of well-trained but diverse models
• An ensemble often outperforms its best constituent model
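The decomposition e(x) = ε̄(x) − ā(x) can be checked numerically at a single input point; the model outputs and target below are made-up values:

```python
import numpy as np

# Made-up model outputs f_k(x) at one input point, and the target y(x).
f_k = np.array([1.2, 0.8, 1.1, 0.9])
y = 1.05

f_bar = np.mean(f_k)                  # ensemble output f_bar(x)
e = (y - f_bar) ** 2                  # ensemble error e(x)
eps_bar = np.mean((y - f_k) ** 2)     # average individual error
a_bar = np.mean((f_k - f_bar) ** 2)   # ambiguity (spread of the models)

# The identity e = eps_bar - a_bar holds exactly; since a_bar >= 0,
# the ensemble error never exceeds the average individual error.
print(e, eps_bar - a_bar)
```

Because the ambiguity ā is non-negative by construction, the inequality e ≤ ε̄ holds for any target and any set of model outputs, which is the core argument for ensembling.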

  12. Decorrelating Models
E = Ē − Ā. How can we obtain models that have a low generalization error (small Ē) but are mutually uncorrelated (large Ā)?
• Varying the model structure (e.g. topology)
• Exploiting the disadvantage of getting stuck in local minima:
  – Varying initial conditions
  – Varying parameters of the training procedure
  – Using an ε-insensitive loss function
• Training a large population of models
• Applying resampling or sequencing techniques
Resampling: generating new data sets by omitting or duplicating samples of the original data set. These techniques can be used to estimate generalization errors and for model construction.
• Bootstrapping: generate bootstrap replicates by randomly drawing samples from the training set
• Cross-Validation: divide the data set repeatedly into training and test parts
• Bumping: construct models on bootstrap replicates and choose the best model on the full data set
• Bagging: bootstrap aggregation; create several models on bootstrap replicates and average them
• Boosting: create a sequence of models where the training of the next model depends on the output of the previous model
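Bagging, from the list above, fits in a few lines: draw bootstrap replicates by sampling with replacement, train one model per replicate, and average the outputs. A sketch with a least-squares line as a stand-in base model (the data are synthetic):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 40
x = rng.uniform(-1.0, 1.0, N)
y = 2.0 * x + 0.2 * rng.standard_normal(N)  # noisy linear data

def fit_line(xt, yt):
    # Least-squares slope and intercept as a stand-in base model.
    A = np.column_stack([xt, np.ones_like(xt)])
    return np.linalg.lstsq(A, yt, rcond=None)[0]

# Bagging: each bootstrap replicate is drawn with replacement from
# the training set, so the resulting models differ slightly.
K = 25
models = []
for _ in range(K):
    idx = rng.integers(0, N, size=N)  # indices of one bootstrap replicate
    models.append(fit_line(x[idx], y[idx]))

# Ensemble prediction: average the K bootstrap models.
x_new = np.array([0.5])
bagged = np.mean([w[0] * x_new + w[1] for w in models], axis=0)
print(bagged)  # close to the true value 2.0 * 0.5 = 1.0
```

The resampling both decorrelates the models and, via the out-of-replicate samples, can be used to estimate generalization errors, as the slide notes.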

  14. Crosstraining – Constructing Ensembles
• Finesse: efficiently reuse samples by combining training, validation and selection of models
• Additional benefit of reduced correlation between the models
• Repeatedly partition the data set randomly into two sample classes:
  – Training set, used for training and stopping criteria
  – Test set, used only for assessing the generalization error after the model has been trained
• Train a population of (heterogeneous) models and select the best ones according to their error on the test set
• Repartition the data set, taking care that the test sets are mutually disjoint
• Combine the best models of all partitionings into an ensemble
• Optionally weight the models according to the estimated generalization error on the total data set
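The crosstraining scheme can be sketched as below: repeated random two-way partitions, the best model per partition chosen by its test error, and the winners averaged into an ensemble. The polynomial model population is a hypothetical stand-in, and the mutual-disjointness refinement of the test sets is skipped for brevity:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 60
x = rng.uniform(-1.0, 1.0, N)
y = np.sin(2.0 * x) + 0.1 * rng.standard_normal(N)  # synthetic target

def fit_poly(deg):
    # Heterogeneous model population: polynomials of different degree.
    def fit(xt, yt):
        c = np.polyfit(xt, yt, deg)
        return lambda xq: np.polyval(c, xq)
    return fit

trainers = [fit_poly(1), fit_poly(3), fit_poly(5)]

# Repeatedly partition into training and test parts; per partition,
# train all candidates and keep the one with the lowest test error.
ensemble = []
for _ in range(5):
    perm = rng.permutation(N)
    tr, te = perm[: N // 2], perm[N // 2:]
    fitted = [t(x[tr], y[tr]) for t in trainers]
    errs = [np.mean((y[te] - f(x[te])) ** 2) for f in fitted]
    ensemble.append(fitted[int(np.argmin(errs))])

# Ensemble output: simple average of the selected models.
xq = np.array([0.25])
pred = np.mean([f(xq) for f in ensemble], axis=0)
print(pred)  # close to sin(0.5), about 0.479
```

Each sample thus serves for training in some partitions and for validation in others, which is the "efficient reuse" the slide refers to.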

  16. Pros and Cons of Ensembles
Ensemble methods in general:
• Advantages
  – Straightforward extension of existing modeling algorithms
  – Almost fool-proof minimization of the generalization error
  – Make no assumptions about the structure of the underlying models
  – Simplify the problem of model selection
• Disadvantages
  – Increased computational effort
  – Interpreting an ensemble is even harder than drawing conclusions from a single model
Combining heterogeneous models:
• Advantages
  – Often one model type performs better than the others on the given data set
  – The probability of using an unsuited model type decreases
  – Inherent decorrelation, even without manipulating the data set or the training parameters
• Disadvantages
  – Assessing the generalization performance of heterogeneous models is even more difficult than for models of the same type

  18. The ENTOOL Toolbox for Statistical Learning
• The ENTOOL toolbox for statistical learning is designed to make state-of-the-art machine learning algorithms available under a common interface
• Allows the construction of single models or ensembles of (heterogeneous) models
• Supports decorrelation of models by offering resampling techniques
• Though primarily designed for regression, it is possible to construct ensembles of classifiers with ENTOOL
• Requirements: Matlab (TM)
• Supported operating systems: Windows, Linux, Solaris (limited)
