SLIDE 1

Time-series-based Ensemble Modeling for Bio-Medical Applications

Maciej Ogorzałek^{1,2}, in collaboration with: Christian Merkwirth, Grzegorz Surowka, Leszek Nowak, Katarzyna Grzesiak-Kopec^1, Joerg Wichard^3

^1 Department of Information Technologies, Jagiellonian University, Kraków; ^2 Chair of Bio-signals and Systems, Hong Kong Polytechnic University (under DSS); ^3 FMP Berlin, Germany

SLIDE 2

Learning a Dependency from Data

Given: a sample of input-output pairs (x^µ, y^µ) with µ = 1, ..., N, and a functional dependence y(x), possibly corrupted by noise.

Aim: choose a model (function) f̂ out of a hypothesis space H that is as close as possible to the true dependence f.

  • Classification: f : R^D → {0, 1, 2, ...} (discrete classes)
  • Regression: f : R^D → R (continuous output)

Implementation is usually via the solution of an appropriate optimization problem:

  • Matrix inversion in case of linear regression
  • Minimization of a loss function on the training data
  • Quadratic programming problem for SVMs
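As a minimal illustration of the first bullet above, here is a sketch of linear regression solved through the normal equations in plain MATLAB (synthetic data, independent of ENTOOL; all variable names are illustrative only):

    % Linear regression via the normal equations (the matrix-inversion case above).
    % Synthetic data: y = 2*x1 - x2 + 0.5 + noise.
    N = 100;
    X = [randn(N, 2), ones(N, 1)];        % design matrix with a bias column
    y = X * [2; -1; 0.5] + 0.1 * randn(N, 1);

    w = (X' * X) \ (X' * y);              % solve the normal equations (backslash, not inv)
    fprintf('Training MSE: %.4f\n', mean((y - X * w).^2));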
SLIDE 3

Validation and Model Selection

  • Generalization error: how does the model perform on unseen data (samples)?
  • The exact generalization error is not accessible, since we only have a limited number of observations.
  • Training on a small data set tends to overfit, causing the generalization error to be significantly higher than the training error.
  • This is a consequence of the mismatch between the capacity of the hypothesis space H (its VC (Vapnik-Chervonenkis) dimension) and the number of training observations.
  • Validation: estimating the generalization error using just the given data set
    – Needed for choosing the optimal model structure or learning parameters (step sizes etc.)
  • Model selection: selecting the model with the lowest (estimated) generalization error
  • But estimation of the generalization error is very unreliable on small data sets.
SLIDE 4

Improving Generalization for Single Models

  • Remedies:
    – Manipulating the training algorithm (e.g. early stopping)
    – Regularization by adding a penalty to the loss function
    – Using algorithms with built-in capacity control (e.g. SVM)
    – Relying on criteria like BIC (Bayesian Information Criterion), AIC (Akaike), GCV (Generalized Cross-Validation) or cross-validation to select the optimal model complexity
    – Reformulating the loss function:
      • ǫ-insensitive loss
      • Huber loss
      • SVM loss for classification
SLIDE 6

Question

  • Are there any other methods to improve the generalization error?
  • Yes, by combining several individual models!

SLIDE 10

Ensemble Methods

Ensemble: Averaging the output of several separately trained models

  • Simple average: f̄(x) = (1/K) Σ_{k=1..K} f_k(x)
  • Weighted average: f̄(x) = Σ_k w_k f_k(x) with Σ_k w_k = 1

Interpretation:

  • The ensemble generalization error is always smaller than the expected error of the individual models
  • An ensemble should consist of well trained but diverse models
  • An ensemble often outperforms the best constituting model

Error decomposition at an input point x:

  e(x) = (y(x) − f̄(x))²
  ε̄(x) = (1/K) Σ_{k=1..K} (y(x) − f_k(x))²
  ā(x) = (1/K) Σ_{k=1..K} (f_k(x) − f̄(x))²
  e(x) = ε̄(x) − ā(x)

Integrating over the input space:

  E = Ē − Ā

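A small numerical check of the decomposition above, as a plain-MATLAB sketch (the model outputs f_k here are synthetic stand-ins for already-trained models):

    % Ambiguity decomposition e(x) = eps_bar(x) - a_bar(x) at a single input point.
    fk = [1.2, 0.8, 1.1, 0.95];            % outputs of K trained models at one point
    y  = 1.0;                              % target value at that point

    f_bar   = mean(fk);                    % simple ensemble average
    e       = (y - f_bar)^2;               % squared error of the ensemble
    eps_bar = mean((y - fk).^2);           % mean squared error of the individual models
    a_bar   = mean((fk - f_bar).^2);       % ambiguity (spread of the models)

    fprintf('e = %.4f,  eps_bar - a_bar = %.4f\n', e, eps_bar - a_bar);  % the two agree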
SLIDE 12

Decorrelating Models

E = Ē − Ā. How can we obtain models that have low generalization error (small Ē) but are mutually uncorrelated (large Ā)?

  • Varying the model structure (e.g. topology)
  • Exploiting the disadvantage of getting stuck in local minima:
    – Varying initial conditions
    – Varying parameters of the training procedure
    – Using an ǫ-insensitive loss function
  • Training a large population of models
  • Applying resampling or sequencing techniques:

Resampling: generating new data sets by omitting or duplicating samples of the original data set. These techniques can be used to estimate generalization errors and for model construction.

  • Bootstrapping: generate bootstrap replicates by randomly drawing samples from the training set
  • Cross-validation: divide the data set repeatedly into a training and a test part
  • Bumping: construct models on bootstrap replicates and choose the best model on the full data set
  • Bagging: bootstrap aggregation; create several models on bootstrap replicates and average them (a sketch follows below)
  • Boosting: create a sequence of models where the training of the next model depends on the output of the previous model

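Of the resampling schemes listed above, bagging is the easiest to sketch; the following plain-MATLAB example bags simple polynomial least-squares models on synthetic data (an illustration of the idea, not an ENTOOL call):

    % Bagging: fit one model per bootstrap replicate, then average the predictions.
    N = 200;  K = 25;                       % number of samples and of bagged models
    x = linspace(-2, 2, N)';
    y = sin(2 * x) + 0.2 * randn(N, 1);
    X = [x, x.^2, x.^3, ones(N, 1)];        % fixed polynomial features

    W = zeros(size(X, 2), K);
    for k = 1:K
        idx = randi(N, N, 1);               % bootstrap replicate: N draws with replacement
        W(:, k) = X(idx, :) \ y(idx);       % least-squares fit on the replicate
    end

    y_bagged = mean(X * W, 2);              % bagged prediction = average over the K models
    fprintf('MSE single model: %.4f, bagged: %.4f\n', ...
            mean((y - X * W(:, 1)).^2), mean((y - y_bagged).^2));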
SLIDE 14

Crosstraining – Constructing Ensembles

  • Finesse: efficiently reuse samples by combining training, validation and selection of models
  • Additional benefit of reduced correlation between models
  • Repeatedly partition the data set randomly into two sample classes:
    – Training set, used for training and stopping criteria
    – Test set, used only for assessing the generalization error after the model has been trained
  • Train a population of (heterogeneous) models and select the best ones according to the error on the test set
  • Repartition the data set, taking care that the test sets are mutually disjoint
  • Combine the best models of all partitionings into the ensemble
  • Optionally weight the models according to the estimated generalization error on the total data set

A sketch of this partition-train-select loop follows below.

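The sketch below uses low-order polynomial fits as stand-ins for the heterogeneous model population (plain MATLAB; ENTOOL's crosstrainensemble class implements the full scheme, including mutually disjoint test sets, which this simplified version does not enforce):

    % Crosstraining sketch: random train/test splits, keep the best model per split,
    % average the kept models. Polynomial degree plays the role of "model type".
    N = 150;  P = 5;                                     % samples, partitions
    x = linspace(0, 3, N)';
    y = exp(-x) .* sin(4 * x) + 0.1 * randn(N, 1);
    kept = cell(1, P);                                   % best model of each partition

    for p = 1:P
        idx  = randperm(N);
        test = idx(1:round(0.2 * N));                    % 20% held out for selection
        trn  = idx(round(0.2 * N) + 1:end);
        best_err = inf;
        for d = 1:6                                      % train the "population" of models
            c   = polyfit(x(trn), y(trn), d);
            err = mean((y(test) - polyval(c, x(test))).^2);
            if err < best_err, best_err = err; kept{p} = c; end
        end
    end

    preds = cellfun(@(c) polyval(c, x), kept, 'UniformOutput', false);
    y_ens = mean(cell2mat(preds), 2);                    % ensemble = average of kept models
    fprintf('Ensemble MSE on all data: %.4f\n', mean((y - y_ens).^2));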
SLIDE 16

Pros and Cons of Ensembles

Ensemble methods

  • Advantages
    – Straightforward extension of existing modeling algorithms
    – Almost fool-proof minimization of the generalization error
    – Makes no assumptions about the structure of the underlying models
    – Simplifies the problem of model selection
  • Disadvantages
    – Increased computational effort
    – Interpretation of an ensemble is even harder than drawing conclusions from a single model

Combining heterogeneous models

  • Advantages
    – Often one model type performs superior on the given data set
    – The probability of using an unsuited model type decreases
    – Inherent decorrelation even without manipulating the data set or the training parameters
  • Disadvantages
    – Assessing the generalization performance of heterogeneous models is even more difficult than for models of the same type

SLIDE 18

The ENTOOL Toolbox for Statistical Learning

  • The ENTOOL toolbox for statistical learning is designed to make state-of-the-art machine learning algorithms available under a common interface.
  • Allows construction of single models or ensembles of (heterogeneous) models.
  • Supports decorrelation of models by offering resampling techniques.
  • Though primarily designed for regression, it is possible to construct ensembles of classifiers with ENTOOL.
  • Requirements:
    – Matlab (TM)
  • Operating systems:
    – Windows
    – Linux
    – Solaris (limited)

SLIDE 20

ENTOOL Software Architecture

  • Each model type is implemented as a separate class.
  • All model classes share a common interface.
  • Model types are exchanged by exchanging the constructor call.
  • Automatic generation of ensembles of models.
  • Models are divided into two brands:
    1. Primary models like linear models, neural networks, SVMs etc.
    2. Secondary models that rely on primary models to calculate their output. All ensemble models are secondary models.
  • The lifecycle of a model can be divided into three phases:
    1. During construction, the topology of the model is specified. The model can't be used yet.
    2. The model then has to be trained on some training data set (x_i, y_i).
    3. After training, the model can be evaluated on new/unseen inputs (x_n).
  • Constructors should assign random default topologies in order to create uncorrelated models.
  • It is possible to construct ensembles of ensembles.

SLIDE 24

Syntax

  • Constructor syntax:
    model = perceptron;       creates an MLP model with default topology
    model = perceptron(12);   MLP model with 12 hidden-layer neurons
    model = ridge;            creates a linear model by ridge regression
  • Training syntax:
    model = train(model, x, y, [], [], 0.05);
    trains the model with an ǫ-insensitive loss of 0.05 on the data set (x_i, y_i)
  • Evaluation syntax:
    y_new = calc(model, x_new)
    evaluates the model on new inputs
  • How to build an ensemble of models:
    ens = crosstrainensemble;   creates an empty ensemble object
    ens = train(ens, x, y, [], [], 0.05);
    calls the training routines for several primary models and joins them into the ensemble object

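Putting the calls above together into one pass (this assumes ENTOOL is installed and on the MATLAB path; the data x, y are synthetic, and applying calc to the ensemble object is assumed to work because ensembles are secondary models sharing the common interface):

    % End-to-end sketch with the ENTOOL calls shown on this slide.
    x = randn(200, 3);                              % N x D inputs
    y = x(:, 1) - 2 * x(:, 2).^2 + 0.1 * randn(200, 1);

    model = perceptron(12);                         % single MLP with 12 hidden-layer neurons
    model = train(model, x, y, [], [], 0.05);       % epsilon-insensitive loss of 0.05
    y_mlp = calc(model, x);                         % evaluate the trained model

    ens = crosstrainensemble;                       % ensemble built by the crosstraining scheme
    ens = train(ens, x, y, [], [], 0.05);
    y_ens = calc(ens, x);                           % ensemble prediction
    fprintf('Ensemble training-set MSE: %.4f\n', mean((y - y_ens).^2));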
SLIDE 28

Adjusting class specific training parameters

  • The 5th argument when calling train specifies the training parameters.
  • Besides the topology, training parameters often have to be specified:

    tp = get(perceptron, 'trainparams')
        error_loss_margin: 0.0100
        decay: 0.0010
        rounds: 500
        mrate_init: 0.0100
        max_weight: 10
        mrate_grow: 1.2000
        mrate_shrink: 0.5000

  • Assign a new value: tp.decay = 0.05
  • And pass the training parameters while training:
    model = train(perceptron, x, y, [], tp, 0.05);

SLIDE 31

Specifying which Model Types to Ensemble

  • The ensemble constructor will train several models on the data set:

    tp = get(crosstrainensemble, 'trainparams')
        nr_cv_partitions: 8
        frac_test: 0.2000
        minimum_testsamples: 5
        remove_worst: 0.3300
        use_models: 0.8000
        weight_models:
        modelclasses: 6x3 cell
        scaledata: 1

  • Assign a new value:
    tp.modelclasses = {'perceptron', [], {}; ...
        {'lssvm', [], {'function', 'RBF_kernel', 100, 2}}
  • And pass the training parameters while training:
    ens = train(crosstrainensemble, x, y, [], tp, 0.05);

SLIDE 32

Primary Model Types

ares           Adaptation of Friedman's MARS algorithm
ridge          Linear model based on ridge regression with implicit LOO cross-validation for selecting the optimal ridge penalty
perceptron     Multilayer perceptron with iRPROP+ training
perceptron2    Magnus Nørgaard's single-layer perceptron, trained with Levenberg-Marquardt
prbfn          Shimon Cohen's projection-based radial basis function network
rbf            Mark Orr's radial basis function code
vicinal        k-nearest-neighbor regression with adaptive metric
mpmr           Thomas Strohmann's Minimax Probability Machine Regression
lssvm          Johan Suykens' least-squares SVM toolbox
tree           Adaptation of Matlab's built-in regression/classification trees
osusvm         SVM code based on Chih-Jen Lin's libSVM
vicinalclass   k-nearest-neighbor classification

SLIDE 33

Ensemble Classes

ensemble              Virtual parent class for all ensemble classes
crosstrainensemble    Ensemble class that trains models according to the crosstraining scheme; creates ensembles of decorrelated models
cvensemble            Ensemble class that trains models according to the cross-validation/out-of-training scheme; can be used to assess the OOT error
extendingsetensemble  Boosting variant for regression
subspaceensemble      Creates an ensemble of models where each single model is trained on a random subspace of the input data set
optimalsvm            Wrapper that trains an RBF osusvm/lssvm with optimal parameter settings (C and γ)
featureselector       Does feature selection and trains a model on the selected subset

SLIDE 34

ENTOOL webpage

http://zti.if.uj.edu.pl/~merkwirth/entool.htm

SLIDE 35

Application Examples

  • Applications using ENTOOL:
    – Nonlinear Regression of Skin Permeability
    – Sequence Analysis
  • Molecular Graph Networks:
    – Classification on NCI Data Set
    – Regression on KDD Challenge Data
    – Skin Cancer Diagnosis

SLIDE 36

Receiver Operating Characteristics

  • The most basic task of the diagnostician is to separate abnormal subjects from normal subjects.
  • In many cases there is significant overlap in terms of the appearance of the image:
    – Some abnormal patients are normal-looking
    – Some normal patients are abnormal-looking

SLIDE 37

2 x 2 decision matrix

                         Actually Abnormal      Actually Normal
Diagnosed as Abnormal    True Positive (TP)     False Positive (FP)
Diagnosed as Normal      False Negative (FN)    True Negative (TN)

SLIDE 38

ROC curves (cont.)

  • For a single threshold value and the population being studied, a single value for TP, TN, FP, and FN can be computed.
  • The sum TP + TN + FP + FN equals the total number of normals and abnormals in the study population.
  • The "true" diagnosis must be determined independently, based on biopsy confirmation, long-term patient follow-up, etc.

SLIDE 40

ROC curves (cont.)

  • True-positive fraction (TPF) = TP/(TP + FN)
  • False-positive fraction (FPF) = FP/(FP + TN)
  • An ROC curve is a plot of the true-positive fraction versus the false-positive fraction. A single threshold value produces a single point on the ROC curve.
  • In practice, 5 points are realized based on the confidence level of the observer (definitely there, maybe there, uncertain, maybe not there, and definitely not there).

SLIDE 41

Sensitivity and specificity

  • Sensitivity is the fraction of abnormal cases that a decision maker actually calls abnormal:
      Sensitivity = TP / (TP + FN)
  • Specificity is the fraction of normal cases that a decision maker actually calls normal:
      Specificity = TN / (TN + FP)
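These quantities, and the ROC curve discussed on the previous slides, can be computed directly from continuous classifier scores; a plain-MATLAB sketch with synthetic labels and scores (not part of ENTOOL):

    % ROC curve by threshold sweep: sensitivity (TPF) vs. 1 - specificity (FPF).
    labels = [ones(50, 1); zeros(150, 1)];          % 1 = abnormal, 0 = normal
    scores = [randn(50, 1) + 1.5; randn(150, 1)];   % synthetic classifier outputs

    thr = sort(scores, 'descend');
    TPF = zeros(numel(thr), 1);  FPF = TPF;
    for i = 1:numel(thr)
        pred = scores >= thr(i);                    % "diagnosed as abnormal"
        TP = sum(pred & labels == 1);   FN = sum(~pred & labels == 1);
        FP = sum(pred & labels == 0);   TN = sum(~pred & labels == 0);
        TPF(i) = TP / (TP + FN);                    % sensitivity
        FPF(i) = FP / (FP + TN);                    % 1 - specificity
    end
    fprintf('AUC = %.3f\n', trapz(FPF, TPF));       % area under the ROC curve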

SLIDE 42

Interpretation

  • An ROC curve is essentially a way of analyzing the SNR associated with a certain diagnostic task.
  • In addition to the inherent SNR of the imaging modality under investigation, different human observers have internal noise, which affects individual performance.
  • Different radiologists may have different ROC curves.

SLIDE 44

Interpretation (cont.)

  • Set A has almost complete overlap between abnormal and normal cases:
    – The SNR is near zero; ROC curve A represents pure guessing in terms of the diagnosis.
  • As the separation between normal and abnormal cases increases (sets B & C), the corresponding ROC curves approach the upper left corner.
  • The area under the ROC curve, A_Z, is a measure of detectability:
    – For worst performance, A_Z = 0.5
    – For best performance, A_Z = 1

SLIDE 47

Sensitivity Analysis for Regression

  • Motivation: determine variable importance with respect to prediction accuracy.
  • Might help uncovering causal relationships of the underlying process.
  • Problem: an ensemble of heterogeneous (nonlinear) models is even more difficult to analyze than single models.
  • Idea: combine the surrogate data method with the OOT (out-of-train) calculation.
  • Retraining is unnecessary and would mask the importance of correlated inputs.
  • Uncovers linear and nonlinear relationships.
  • To determine the importance of the n-th variable (a sketch follows below):
    – Create a surrogate/replicate of the original input data set in which the values of the n-th variable are permuted randomly to destroy their information content
    – Calculate the OOT output for the surrogate data set
    – Compare the errors of the OOT output for the surrogate and the original data set
    – If the OOT error increases significantly, the n-th variable is important!
    – Average the importance over several surrogate data sets for the same variable to smooth out noise

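A sketch of the surrogate-data loop above for a single input variable (plain MATLAB; oot_predict is a hypothetical handle to whatever routine returns the ensemble's out-of-train predictions, not an ENTOOL function name):

    % Permutation-based importance of the n-th input variable.
    % oot_predict: function handle returning out-of-train predictions for an input matrix.
    function imp = variable_importance(oot_predict, x, y, n, n_surrogates)
        base_err = mean((y - oot_predict(x)).^2);       % OOT error on the original data
        imp = 0;
        for s = 1:n_surrogates
            xs = x;
            xs(:, n) = x(randperm(size(x, 1)), n);      % permute column n: destroy its information
            imp = imp + mean((y - oot_predict(xs)).^2) - base_err;
        end
        imp = imp / n_surrogates;                       % average increase in OOT error
    end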
SLIDE 49

Nonlinear Regression of Skin Permeability

  • 93 compounds described by 131 descriptors
  • Ensemble of linear ridge models and k-nearest-neighbor models
  • Identified 8 descriptors by sensitivity analysis: 'Mass', 'Log P (oct/wat)', 'Cosmo', 'weinerPol', 'logP(o/w)', 'SM 5.0R', 'TPSA', 'vol'
  • Exhaustive check of all combinations of these descriptors leads to two final models:
    – 'Mass', 'logP(o/w)', 'Cosmo' with an OOT error on the training data set of 0.30 RMSE and on the validation set of 0.31 RMSE
    – 'Mass', 'SM 5.0R', 'Log P (oct/wat)' with RMSE 0.28/0.28

[Figures: out-of-train prediction of log(kP) vs. measurement, relative MSE 0.272; importances of the descriptors Mass, SM, log P]

SLIDE 51

Sensitivity Analysis for Sequence Analysis

  • Motivation: determine the importance of amino acid positions with respect to genotype-phenotype prediction accuracy.
  • Same idea as the sensitivity analysis for regression, but:
    – decrease in AUC (area under the curve in the ROC plot) instead of increase of MSE
    – random permutation of amino acids for each position (a sketch follows below)

Application to HIV Receptor Interaction

  • Data set of 355 samples with 63 AA positions
  • Binary classification problem with 89 sequences that can use the CXCR4 receptor and 266 negatives
  • The data set must be aligned first
  • Ensemble of SVM, linear and k-NN classifiers
  • Drawback: the quality of the sensitivity analysis strongly depends on the OOT prediction accuracy
  • Pro: the method can be used universally for genotype-phenotype matching and other classification settings

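The classification variant described above only swaps the error measure; a minimal plain-MATLAB sketch (oot_scores and auc_fn are hypothetical handles for the ensemble's out-of-train scores and an AUC routine such as the ROC sweep sketched earlier; neither is an ENTOOL function):

    % Delta-AUC importance of sequence position n (surrogate-data idea for classification).
    function d = position_importance(oot_scores, auc_fn, seq, labels, n)
        base  = auc_fn(labels, oot_scores(seq));       % AUC on the original sequences
        sp    = seq;
        sp(:, n) = seq(randperm(size(seq, 1)), n);     % shuffle amino acids at position n
        d = base - auc_fn(labels, oot_scores(sp));     % decrease in AUC = importance
    end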
SLIDE 53

Sequence Analysis cont.

  • Reasonable prediction accuracy on the original data
  • OOT AUC of 0.91

[Figure: ROC curve, fraction of true positives vs. fraction of false positives]

  • Only a few sequence positions seem to be relevant:

[Figure: ∆AUC per sequence position]

SLIDE 55

NCI Data Set

  • DTP AIDS Antiviral Screen
  • Total of 42682 compounds (7 outtakes)
  • Three classes:
    1. CA - confirmed active, 423 compounds
    2. CM - confirmed moderate, 1080 compounds
    3. CI - confirmed inactive compounds
  • No information about targets
  • Random partition into a training set of 35000 compounds and a test set of 7682 compounds
  • Ensemble of networks trained with classification loss

[Figure: output histograms for low, medium and high actives]

  • Multiple modes of activity possible

SLIDE 57

Results: Classification on NCI Data Set

[Figures: ROC curves (fraction of true positives vs. fraction of false positives) for test and OOT training predictions of classes CI, CM and CA; second panel zooms in on the low false-positive region]

SLIDE 60

Results: Toxicity Prediction

  • EPA Fathead Minnow Acute Toxicity Data Set of 617 industrial organic chemicals
  • Predicting experimental LC50
  • MGN with 8 feature nets of 2-9 layers
  • 50-fold cross-validation with 10% test on 577 compounds
  • r² = 0.58

Russom, C.L., S.P. Bradbury, S.J. Broderius, D.E. Hammermeister, and R.A. Drummond (1997): Predicting modes of action from chemical structure: Acute toxicity in the fathead minnow (Pimephales promelas), Environmental Toxicology and Chemistry 16(5), 948-967

[Figure: predicted vs. actual log10 LC50]

  • Predictive toxicity remains difficult

SLIDE 61

Image differentiation

Dysplastic

Melanoma

SLIDE 62

Measurements

Geometry:

  • Vertical and horizontal symmetry
  • Color symmetry
  • Height and width
  • Area of the lesion against the size of the photograph
  • Perimeter (length of borders)

Statistical measurements:

  • Color distribution (white, black and grey-blue)
  • Estimated area
  • Estimated perimeter
  • Average distribution of RGB components in the lesion
  • Average distribution of color components (HSV, YIQ, YCbCr)
  • Binary connections of color components

SLIDE 63

TDS (Total Dermoscopy Score)

TDS = A × 1.3 + B × 0.1 + C × 0.5 + D × 0.5

ABCD evaluation:

  Property                              Weight in TDS
  Asymmetry (A)                         × 1.3
  Border (B)                            × 0.1
  Color (C)                             × 0.5
  Different structural components (D)   × 0.5

Outcome:

  < 4.75       benign
  4.8 - 5.45   suspected melanoma
  > 5.45       probable melanoma
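The scoring rule above written out as a small MATLAB helper (a sketch; the gap between 4.75 and 4.8 is left unclassified, exactly as on the slide):

    % Total Dermoscopy Score from the ABCD criteria, with the thresholds above.
    function [tds, verdict] = total_dermoscopy_score(A, B, C, D)
        tds = 1.3 * A + 0.1 * B + 0.5 * C + 0.5 * D;   % asymmetry, border, color, structures
        if tds < 4.75
            verdict = 'benign';
        elseif tds >= 4.8 && tds <= 5.45
            verdict = 'suspected melanoma';
        elseif tds > 5.45
            verdict = 'probable melanoma';
        else
            verdict = 'borderline (between 4.75 and 4.8)';
        end
    end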

SLIDE 64

Test for all coefficients

  • AUC = 0.8239

Coefficients ranked from strongest to weakest (feature number in parentheses):

 1 (21) Sum of background color components
 2 (33) Average Cr component
 3 (39) Average V of background
 4 (14) Average red
 5 (15) Average green
 6 (38) Average S of background
 7 (29) Average S
 8 (16) Average blue
 9 (31) Average luminance
10 (36) Average Q
11 (45) Average Q of background
12 (4)  Estimated size (px)
13 (34) Average Y
14 (13) Symmetry (%)
15 (1)  Area of the lesion (%)
16 (2)  Area of the lesion (px)
17 (11) Gray-blue (px)
18 (44) Average I of background
19 (3)  Area of background (px)
20 (5)  Height (px)
21 (35) Average I
22 (37) Average H of background
23 (43) Average Y of background
24 (9)  White color (px)
25 (30) Average V
26 (42) Average Cr component of background
27 (40) Average luminance of background
28 (7)  Borders
29 (32) Average Cb component
30 (18) Average green in background
31 (19) Average blue in background
32 (6)  Width (px)
33 (28) Average H
34 (10) Black color (px)
35 (17) Average red in background
36 (12) Grey-blue
37 (20) Sum of color components
38 (25) Binary sum of GBR
39 (41) Average Cb component of background
40 (8)  Estimated borderline
41 (22) Binary RGB composition
42 (24) Binary GRB composition
43 (23) Binary RBG composition
44 (27) Binary BGR composition
45 (26)

SLIDE 65

Single coefficient test

  • AUC - 0.4212
SLIDE 66

6 coefficients

  • AUC - 0.9483
SLIDE 67

Test with 15 strongest coefficients

  • AUC - 0.9175
SLIDE 68

Test for whole data set

  • AUC - 0.9529
SLIDE 69

Test for 15 best coefficients

  • AUC = 0.9851

The 15 best coefficients (feature number in parentheses):

 1 (21) Sum of color components of the background
 2 (14) Average red
 3 (15) Average green
 4 (13) Symmetry (%)
 5 (16) Average blue
 6 (34) Average Y
 7 (38) Average S of background
 8 (43) Average Y of background
 9 (12) Grey-blue – black and white
10 (36) Average Q
11 (18) Average green of background
12 (31) Average luminance
13 (39) Average V of background
14 (45) Average Q of background
15 (30) Average V

SLIDE 70

Set of 17 coefficients (best results)

The 17 coefficients giving the best results (feature number in parentheses):

 1 (21) Sum of color components of the background
 2 (14) Average red
 3 (15) Average green
 4 (13) Symmetry (%)
 5 (16) Average blue
 6 (34) Average Y
 7 (38) Average S of the background
 8 (43) Average Y of the background
 9 (12) Grey-blue – black and white
10 (36) Average Q
11 (18) Average green of the background
12 (31) Average luminance
13 (39) Average V of the background
14 (45) Average Q of the background
15 (30) Average V
16 (35) Average I
17 (44) Average I of the background

  • AUC = 0.9963
SLIDE 71

Image verification

Dysplastic

Melanoma

SLIDE 73

Summary

  • Ensemble methods for classification and regression
  • ENTOOL - ensemble toolbox for Matlab
  • State-of-the-art machine learning techniques
  • Variety of primary and secondary model types
  • Out-of-train technique for assessing the generalization error
  • Sensitivity analysis for classification and regression
  • Application to skin permeability
  • Application to genotype-phenotype matching
  • Applicable to data sets of any size
  • Classification of active/inactive compounds in the NCI Antiviral Screen
  • Toxicity prediction as a regression problem

SLIDE 74

Literature

  • Krogh, Vedelsby: Neural Network Ensembles, Cross Validation and Active Learning. Advances in Neural Information Processing Systems 7, MIT Press, 1995
  • Perrone, Cooper: When networks disagree: Ensemble methods for neural networks. Neural Networks for Speech and Image Processing, Chapman Hall, 1993
  • Hastie, Tibshirani, Friedman: The Elements of Statistical Learning. Springer, 2001
  • Vapnik: The Nature of Statistical Learning Theory. Springer, 1999
  • Chih-Chung Chang, Chih-Jen Lin: LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2001
  • Freund: Short introduction to boosting, 1999
  • Sacks, Welch, Mitchell, Wynn: Design and analysis of computer experiments. Statistical Science, 4(4):409-435, 1989
  • Domingos: A unified bias-variance decomposition for zero-one and squared loss. Proceedings of the Seventeenth National Conference on Artificial Intelligence, 2000
  • Breiman: Bagging Predictors. Machine Learning, 24, 1996
