SLIDE 1

Using Meta-learning for Model Type Selection in Predictive Big Data Analytics

Mustafa Nural, Hao Peng, John A. Miller Department of Computer Science University of Georgia

SLIDE 2

What is Predictive Analytics?

  • The process of building a statistical model from data to capture the relationships between variables, in order to
  • make sense of the data
  • predict outcomes
  • Model

y = f(x) + ε

  • Modeling Technique / Model Type
  • E.g., OLS Regression, Lasso Regression
  • Classification
  • Target outcome of the model is a categorical variable
  • Prediction
  • Target outcome of the model is a non-categorical variable
  • Includes many types of regression models
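The model form y = f(x) + ε can be made concrete with a minimal sketch. This uses numpy rather than the ScalaTion/R implementations from the talk, and fits f as a linear function (OLS Regression, the simplest model type on the list):

```python
import numpy as np

# Toy illustration of the model y = f(x) + eps, with f linear.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_beta = np.array([2.0, -1.0, 0.5])
y = X @ true_beta + rng.normal(scale=0.1, size=100)

# OLS Regression: minimize ||y - X beta||^2 via least squares.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Since the target here is non-categorical, this is a prediction (regression) problem in the slide's terminology; a categorical target would make it classification.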
SLIDE 3

What is the Problem?

  • Choosing the most predictive model from a set of candidate models is non-trivial

  • No free lunch theorem (Wolpert & Macready, 1997)
  • No single modeling technique can consistently outperform others
  • Different restrictions per problem
  • Interpretability
  • Parsimony
  • Etc.
SLIDE 4

Meta-learning

  • β€œLearning to learn”
  • Active area of research in machine learning
  • Learning performance of classification algorithms
  • Hyper-parameter optimization
  • Pre-processing of datasets
  • Little focus has been given to prediction algorithms
  • No previous work on the regression family
  • OLS Regression
  • Regression with regularization
  • Generalized Linear Models
SLIDE 5

Overview of Meta-learning

The pipeline has two phases:

  • Training phase: Training Datasets → Feature Extraction (Meta-features); the same datasets are run through the candidate Modeling Techniques to collect Performance Statistics, reporting the most predictive technique for each dataset. Together these form the Training Set used to train the Meta-learner.
  • Suggestion phase: Candidate Dataset → Meta-features → Suggestion Engine → Most Predictive Technique(s)
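The two phases of the pipeline can be sketched end to end in a toy form. This is a hedged Python stand-in, not the system from the talk (which uses ScalaTion and R): `extract_meta_features` computes only two placeholder features, the "best technique" labels come from a made-up oracle, and a 1-nearest-neighbor classifier plays the meta-learner:

```python
import numpy as np

def extract_meta_features(X, y):
    # Stand-in feature extractor: two toy meta-features per dataset.
    m, n = X.shape
    return np.array([np.log(n / m), y.std() / abs(y.mean())])

# Offline (training) phase: meta-features + best-technique label per dataset.
rng = np.random.default_rng(4)
train_meta, labels = [], []
for _ in range(20):
    X = rng.normal(size=(rng.integers(50, 200), rng.integers(2, 10)))
    y = rng.normal(loc=5.0, size=len(X))
    train_meta.append(extract_meta_features(X, y))
    labels.append("lasso" if X.shape[1] > 5 else "ols")  # toy oracle label
train_meta = np.array(train_meta)

# Online (suggestion) phase: 1-NN meta-learner picks a technique
# for a previously unseen candidate dataset.
Xc = rng.normal(size=(120, 8))
yc = rng.normal(loc=5.0, size=120)
q = extract_meta_features(Xc, yc)
nearest = np.argmin(np.linalg.norm(train_meta - q, axis=1))
suggestion = labels[nearest]
```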

SLIDE 6

Meta-feature Extraction

  • Features from the literature
  • base df, base rdf, non-negative response, domain of response, distinct ratio of response, % numeric, % categorical, % binary variables

  • Grand mean: stddev, mean, skewness, and kurtosis of numeric variables

  • Grand mean: min, max, mean, stddev of categorical variables
  • Additional features particularly relevant for regression problems

  • Log Dimensionality
  • Matrix Condition Number
  • Skewness & Kurtosis of Response
  • Coefficient of Variation of Response
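The regression-specific meta-features above can be sketched in numpy. The exact definitions used in the paper are not given on the slide, so the formulas below (e.g. log dimensionality as log(n/m)) are plausible reconstructions, labeled as such:

```python
import numpy as np

def regression_meta_features(X, y):
    """Sketch of the slide's regression-specific meta-features.
    The formulas are plausible reconstructions, not the paper's exact code."""
    m, n = X.shape
    mu, sd = y.mean(), y.std()
    z = (y - mu) / sd
    return {
        # Log dimensionality: variables per instance (one common definition).
        "log_dimensionality": np.log(n / m),
        # Condition number of the design matrix (max/min singular value).
        "condition_number": np.linalg.cond(X),
        # Skewness and (excess) kurtosis of the response.
        "response_skewness": (z ** 3).mean(),
        "response_kurtosis": (z ** 4).mean() - 3.0,
        # Coefficient of variation of the response.
        "response_cv": sd / mu,
    }

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = np.abs(rng.normal(loc=5.0, size=200))
feats = regression_meta_features(X, y)
```

A large condition number flags near-collinear predictors, which is exactly the regime where regularized techniques (Ridge, Lasso) tend to beat plain OLS.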
SLIDE 7

Target Modeling Techniques

  • Ordinary Least Squares Regression (ScalaTion)
  • Weighted Least Squares Regression (ScalaTion)
  • Back-elim Regression (ScalaTion)
  • Response Surface Analysis (Quadratic Expansion) (ScalaTion)
  • Response Surface Analysis (Cubic Expansion) (ScalaTion)
  • Log Transformed Regression (ScalaTion)
  • Root Transformed Regression (ScalaTion)
  • Exponential Regression (R)
  • Poisson Regression (R)
  • Inverse Gaussian Regression (R)
  • Gamma Regression (R)
  • Ridge Regression (R, ScalaTion)
  • Lasso Regression (R, ScalaTion)
  • Partial Least Squares Regression (R)
  • Principal Components Regression (R)

SLIDE 8

Generating Training Set

  • Performance metrics
  • Root mean squared error (RMSE)
  • Root relative squared error (RRSE), i.e., √(1 − R²)
  • 15 modeling techniques
  • 114 datasets
  • UCI, OpenML, R, Luis Torgo collection, Bilkent Univ. collection, etc.
  • https://github.com/scalation/data
  • 10-fold cross validation repeated 10 times per dataset/technique to get more reliable estimates
  • Hyper-parameter optimization is done by some modeling techniques
  • E.g., λ penalty for L1 (Lasso) and L2 (Ridge) regularization
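The evaluation protocol above (RRSE under 10-fold cross validation repeated 10 times) can be sketched in numpy only, with plain OLS standing in for an arbitrary modeling technique:

```python
import numpy as np

def rrse(y_true, y_pred):
    # Root relative squared error: sqrt(SSE / SST), i.e. sqrt(1 - R^2).
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return np.sqrt(sse / sst)

def repeated_cv_rrse(X, y, folds=10, repeats=10, seed=0):
    """Mean RRSE over `repeats` runs of `folds`-fold cross validation."""
    rng = np.random.default_rng(seed)
    m = len(y)
    scores = []
    for _ in range(repeats):
        idx = rng.permutation(m)
        for f in range(folds):
            test = idx[f::folds]                  # every folds-th index
            train = np.setdiff1d(idx, test)       # the rest
            beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
            scores.append(rrse(y[test], X[test] @ beta))
    return np.mean(scores)

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 4))
y = X @ np.array([1.0, 2.0, 0.0, -1.0]) + rng.normal(scale=0.5, size=150)
score = repeated_cv_rrse(X, y)
```

Averaging over 10 repetitions reduces the variance introduced by any one random fold assignment, which is why the slide's protocol repeats the 10-fold split.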
SLIDE 9

Training the Meta-learner

  • Meta-features are used as predictors
  • Top-performing modeling technique as the response
  • Random Forest Classifier, k-NN Classifier
  • Evaluation metrics
  • Mean Average Precision (MAP@k)
  • Rank-wise precision
  • Loose Accuracy (LA@k)
  • If any of the top-k predictions match the actual top-1 => 1
  • Otherwise => 0
  • Normalized Discounted Cumulative Gain (NDCG@k)
  • Graded penalty if rankings are out of order
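Loose Accuracy and NDCG can be pinned down with a short sketch (technique names and relevance grades below are made-up for illustration):

```python
import numpy as np

def loose_accuracy_at_k(ranked_preds, actual_best, k):
    """LA@k: 1 if the actual top technique is among the top-k predictions."""
    return 1.0 if actual_best in ranked_preds[:k] else 0.0

def ndcg_at_k(ranked_preds, relevance, k):
    """NDCG@k: discounted gain of the predicted ranking, normalized by
    the gain of the ideal ranking, so out-of-order items pay a graded penalty."""
    gains = [relevance[t] for t in ranked_preds[:k]]
    dcg = sum(g / np.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / np.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg

preds = ["lasso", "ols", "ridge"]           # meta-learner's ranking
rel = {"ols": 3, "lasso": 2, "ridge": 1}    # true graded relevance
la1 = loose_accuracy_at_k(preds, "ols", 1)  # 0.0: top-1 prediction misses
la3 = loose_accuracy_at_k(preds, "ols", 3)  # 1.0: the true best is in the top 3
ndcg = ndcg_at_k(preds, rel, 3)             # < 1.0: ranking is slightly out of order
```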
SLIDE 10

Results (Cont’d)

Meta-learner     LA@1   LA@3   MAP@3   NDCG@1   NDCG@3
Random Forest    0.53   0.77   0.56    0.70     0.84
kNN              0.45   0.74   0.55    0.65     0.83

SLIDE 11

Conclusions & Future Work

  • Meta-learning can be used for predictive analytics, including the regression family of techniques
  • Random forest classifier is a viable alternative as a meta-learner for prediction
  • Dimensionality and characteristics of the response variable are the most important meta-features.
  • Generalized Linear Models have specific assumptions on the response variable.
  • Low dimensionality and negative base degrees of freedom are important indicators for using a regularization technique such as Lasso or Ridge.
  • Future work includes:
  • More thorough comparison with AutoWEKA
  • Comparison with Ontology-based and Subsampling-based approaches
SLIDE 12

Questions?

SLIDE 13

Current Approaches

  • Exhaustive Search
  • Meta-learning
  • Ontology-based Semantics
  • Other/Proprietary
SLIDE 14

Exhaustive Search

  • NaΓ―ve approach
  • Build a model using each modeling technique to find the optimal model
  • 238 models in the R caret package
  • > 10,000 packages in R total
  • Examples: AutoWEKA, caret (R), performanceEstimation (R), SPSS Auto Modeler, Data Robot, …

  • PROS
  • Conceptually simple
  • Not complex to implement
  • CONS
  • Might be tedious to implement
  • Time consuming
  • Doesn’t scale well w.r.t. dataset size and number of techniques
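The naive exhaustive-search loop can be sketched directly: fit every candidate technique, score each on a holdout set, and keep the winner. The two candidates below are toy stand-ins (plain OLS and a log-transformed variant), not any package's actual search:

```python
import numpy as np

def ols(Xtr, ytr, Xte):
    b, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)
    return Xte @ b

def log_transformed(Xtr, ytr, Xte):
    # Fit on log(y), then map predictions back to the original scale.
    b, *_ = np.linalg.lstsq(Xtr, np.log(ytr), rcond=None)
    return np.exp(Xte @ b)

candidates = {"OLS": ols, "LogTrans": log_transformed}

# Synthetic data whose response is exponential in the predictors,
# so the log-transformed technique should win.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
y = np.exp(X @ np.array([0.5, 0.3, -0.2]) + rng.normal(scale=0.05, size=200))
train, test = np.arange(150), np.arange(150, 200)

def rmse(name):
    pred = candidates[name](X[train], y[train], X[test])
    return np.sqrt(np.mean((y[test] - pred) ** 2))

best = min(candidates, key=rmse)  # exhaustive search over all candidates
```

The cost is one full fit-and-evaluate cycle per technique per dataset, which is exactly why this approach fails to scale with dataset size and the number of candidate techniques.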
SLIDE 15

Meta-learning

  • Applying a learning algorithm to pick a base machine learning algorithm
  • Learns a mapping between dataset characteristics and top-performing technique(s) among candidates
  • Has been studied extensively for classification problems.

  • Limited work on
  • predictive models & regression based models
  • mapping data to a model (rather than a technique)
SLIDE 16

Meta-learning (cont’d)

  • PROS
  • Fast once trained
  • Scalable w.r.t. dataset size (let m be the number of instances and n the number of variables)
  • CONS
  • Training required
  • Adding new techniques not possible without re-training
SLIDE 17

Ontology-based Semantics

  • Leverage domain expertise captured formally in an ontology

  • Use logical reasoning to suggest optimal model(s)
SLIDE 18

Ontology-based Semantics (cont’d)

  • PROS
  • Fast
  • Scalable
  • Extending is straightforward
  • CONS
  • Requires manual curation
SLIDE 19

Other/Proprietary: A More Modern Approach

  • No expertise needed
  • Limited analysis capabilities
  • Doesn’t let you change default model criteria and diagnostics
  • Not transparent
  • Doesn’t walk you through the decisions it’s making
  • Therefore limited statistical insight
  • Emphasizes Text Analysis

Screenshot taken from Watson Analytics platform

SLIDE 20

Other/Proprietary (cont’d)

  • Examples: IBM Watson Analytics, Google Prediction API, …

  • PROS
  • Very simple to use
  • CONS
  • Decision-making process is not transparent (Watson Analytics, Google Prediction API)
  • The chosen technique is not known (Google Prediction API)

SLIDE 21

Generating Training Set

  • 114 datasets
  • 43 datasets from UCI Machine Learning Repository
  • 17 datasets from OpenML
  • 16 datasets from publicly available packages in R
  • 12 datasets from Luis Torgo Regression datasets collection
  • 9 datasets from Bilkent University Function Approximation Library
  • 9 datasets from NCI-60 Cell Line panel:
  • Similar to (Lee et al. 2011), we used gene expression data obtained from Affymetrix HG-U133A and B chips, normalized using the GCRMA method, as predictors of the 9 highest-variance proteins obtained from Reverse-phase protein lysate arrays (RPLA).

  • 8 datasets from various other sources

https://github.com/scalation/data