Regression. Albert Bifet, May 2012. COMP423A/COMP523A Data Stream Mining.

SLIDE 1

Regression

Albert Bifet May 2012

SLIDE 2

COMP423A/COMP523A Data Stream Mining

Outline

  • 1. Introduction
  • 2. Stream Algorithmics
  • 3. Concept drift
  • 4. Evaluation
  • 5. Classification
  • 6. Ensemble Methods
  • 7. Regression
  • 8. Clustering
  • 9. Frequent Pattern Mining
  • 10. Distributed Streaming
SLIDE 3

Data Streams

Big Data & Real Time

SLIDE 4

Regression

Definition

Given a numeric class attribute, a regression algorithm builds a model that predicts, for every unlabelled instance I, a numeric value ŷ = f(x) as accurately as possible.

Example

Stock-Market price prediction

Example

Airplane delays

SLIDE 5

Evaluation

  • 1. Error estimation: Hold-out or Prequential
  • 2. Evaluation performance measures: MSE or MAE
  • 3. Statistical significance validation: Nemenyi test

Evaluation Framework
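Prequential error estimation, listed above, can be sketched as a test-then-train loop. This is an illustrative Python sketch, not code from the slides; the running-mean "model" stands in for any incremental regressor.

```python
# Prequential ("test-then-train") evaluation sketch: each streamed instance
# is first used to test the current model, then to train it.
# The running-mean predictor below is a placeholder model, not part of
# the original slides.
def prequential_mse(stream):
    total_sq_err, n = 0.0, 0
    mean, count = 0.0, 0              # running-mean "model"
    for x, y in stream:
        pred = mean                   # 1) test on the incoming instance
        total_sq_err += (pred - y) ** 2
        n += 1
        count += 1                    # 2) then train on it
        mean += (y - mean) / count
    return total_sq_err / n

stream = [((1.0,), 2.0), ((2.0,), 2.0), ((3.0,), 2.0)]
print(prequential_mse(stream))
```

Because every instance is tested before it is learned, the stream is used fully for both error estimation and training, with no held-out set.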

SLIDE 6
  • 2. Performance Measures

Regression mean measures

◮ Mean square error:

MSE = Σᵢ (f(xᵢ) − yᵢ)² / N

◮ Root mean square error:

RMSE = √MSE = √( Σᵢ (f(xᵢ) − yᵢ)² / N )

Forgetting mechanism for estimating measures

Sliding window of size w with the most recent observations
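The sliding-window forgetting mechanism above can be sketched in Python; this is an illustrative example (class name `WindowedMSE` is mine), assuming errors outside the window are discarded entirely.

```python
from collections import deque
import math

# Sliding-window MSE/RMSE: keep only the w most recent squared errors,
# implementing the forgetting mechanism from the slide.
class WindowedMSE:
    def __init__(self, w):
        self.errors = deque(maxlen=w)    # oldest errors fall out automatically
    def update(self, pred, y):
        self.errors.append((pred - y) ** 2)
    def mse(self):
        return sum(self.errors) / len(self.errors)
    def rmse(self):
        return math.sqrt(self.mse())

m = WindowedMSE(w=2)
for pred, y in [(1.0, 2.0), (2.0, 2.0), (5.0, 2.0)]:
    m.update(pred, y)
print(m.mse())   # only the last two errors remain: (0 + 9) / 2 = 4.5
```

With `maxlen=w` the deque drops the oldest error on each append, so the estimate reflects only the most recent w observations.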

SLIDE 7
  • 2. Performance Measures

Regression relative measures

◮ Relative square error:

RSE = Σᵢ (f(xᵢ) − yᵢ)² / Σᵢ (ȳ − yᵢ)²

◮ Root relative square error:

RRSE = √RSE = √( Σᵢ (f(xᵢ) − yᵢ)² / Σᵢ (ȳ − yᵢ)² )

Forgetting mechanism for estimating measures

Sliding window of size w with the most recent observations

SLIDE 8
  • 2. Performance Measures

Regression absolute measures

◮ Mean absolute error:

MAE = Σᵢ |f(xᵢ) − yᵢ| / N

◮ Relative absolute error:

RAE = Σᵢ |f(xᵢ) − yᵢ| / Σᵢ |ȳ − yᵢ|

Forgetting mechanism for estimating measures

Sliding window of size w with the most recent observations
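The absolute measures can be computed directly over a window of (prediction, target) pairs; this is an illustrative sketch (function name `mae_rae` is mine), assuming RAE normalises by the error of the naive mean predictor ȳ.

```python
# MAE and RAE over a window of (prediction, target) pairs. RAE divides the
# model's absolute error by that of always predicting the mean target ȳ,
# so RAE < 1 means the model beats the naive mean predictor.
def mae_rae(preds, ys):
    n = len(ys)
    y_bar = sum(ys) / n
    abs_err = sum(abs(p - y) for p, y in zip(preds, ys))
    mae = abs_err / n
    rae = abs_err / sum(abs(y_bar - y) for y in ys)
    return mae, rae

mae, rae = mae_rae([1.5, 2.0, 2.5], [1.0, 2.0, 3.0])
print(mae, rae)   # 0.333..., 0.5 — RAE < 1: better than predicting ȳ
```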

SLIDE 9

Linear Methods for Regression

Linear Least Squares fitting

◮ Linear regression model:

f(x) = β₀ + Σⱼ βⱼxⱼ = Xβ,  summing over j = 1, …, p

◮ Minimize the residual sum of squares:

RSS(β) = Σᵢ (yᵢ − f(xᵢ))² = (y − Xβ)′(y − Xβ),  summing over i = 1, …, N

◮ Solution:

β̂ = (X′X)⁻¹X′y
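For a single feature, the normal-equations solution β̂ = (X′X)⁻¹X′y reduces to the familiar closed-form slope and intercept; the sketch below (function name `fit_linear` is mine) illustrates this one-feature case.

```python
# Least-squares fit of a one-feature linear model f(x) = b0 + b1*x.
# For p = 1 the normal equations (X'X)b = X'y reduce to the classic
# closed-form slope/intercept formulas used here.
def fit_linear(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
         sum((x - x_bar) ** 2 for x in xs)
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = fit_linear([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
print(b0, b1)   # exact fit: intercept 1.0, slope 2.0
```

Note this is a batch solution: it needs all N points at once, which motivates the incremental methods on the following slides.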

SLIDE 10

Perceptron

[Figure: single-layer perceptron with input attributes 1–5, weights w₁ … w₅, and output h_w(xᵢ)]

◮ Data stream: ⟨xᵢ, yᵢ⟩

◮ Classical perceptron: h_w(xᵢ) = wᵀxᵢ

◮ Minimize the mean-square error: J(w) = ½ Σᵢ (yᵢ − h_w(xᵢ))²

SLIDE 11

Perceptron

◮ Minimize the mean-square error: J(w) = ½ Σᵢ (yᵢ − h_w(xᵢ))²

◮ Stochastic gradient descent: w = w − η∇J, evaluated at instance xᵢ

◮ Gradient of the error function: ∇J = − Σᵢ (yᵢ − h_w(xᵢ)) xᵢ

◮ Weight update rule: w = w + η Σᵢ (yᵢ − h_w(xᵢ)) xᵢ
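The weight update rule, applied one instance at a time, gives an online learner; this is an illustrative sketch (function name `sgd_step` is mine), assuming a fixed learning rate η.

```python
# Online (stochastic) gradient-descent update for the linear perceptron
# regressor h_w(x) = w.x: one weight update per streamed instance,
# w <- w + eta * (y - h_w(x)) * x.
def sgd_step(w, x, y, eta):
    pred = sum(wi * xi for wi, xi in zip(w, x))            # h_w(x)
    return [wi + eta * (y - pred) * xi for wi, xi in zip(w, x)]

w = [0.0, 0.0]
# Stream consistent with the exact solution w = [1, 1].
for x, y in [([1.0, 1.0], 2.0), ([1.0, 2.0], 3.0)] * 100:
    w = sgd_step(w, x, y, eta=0.1)
print(w)   # approaches [1.0, 1.0]
```

Each update touches only the current instance, so memory and per-item cost are constant, which is what makes the perceptron suitable for streams.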

SLIDE 12

Fast Incremental Model Tree with Drift Detection (FIMT-DD)

Differences between FIMT-DD and the Hoeffding Tree (HT):

  • 1. Splitting Criterion
  • 2. Numeric attribute handling using BINTREE
  • 3. Linear model at the leaves
  • 4. Concept Drift Handling: Page-Hinckley
  • 5. Alternate Tree adaption strategy
SLIDE 13

Splitting Criterion

Standard Deviation Reduction Measure

◮ Classification

Information Gain = Entropy(before split) − Entropy(after split)

Entropy = − Σᵢ pᵢ · log pᵢ  (over the c classes)

Gini Index = Σᵢ pᵢ(1 − pᵢ) = 1 − Σᵢ pᵢ²

◮ Regression

Gain = SD(before split) − SD(after split)

Standard deviation: SD = √( Σᵢ (ȳ − yᵢ)² / N )
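The standard-deviation-reduction criterion can be sketched as follows; this is an illustrative example (function names `sd` and `sdr` are mine), assuming branch SDs are weighted by branch size.

```python
import math

# Standard-deviation reduction (SDR) for a candidate split: the target's
# SD before the split minus the size-weighted SDs of the two branches.
def sd(ys):
    m = sum(ys) / len(ys)
    return math.sqrt(sum((y - m) ** 2 for y in ys) / len(ys))

def sdr(ys, left, right):
    n = len(ys)
    return sd(ys) - (len(left) / n) * sd(left) - (len(right) / n) * sd(right)

ys = [1.0, 1.0, 9.0, 9.0]
print(sdr(ys, [1.0, 1.0], [9.0, 9.0]))   # perfect split: reduction = 4.0
```

A split that separates the targets into homogeneous groups drives the branch SDs toward zero, maximizing the gain, exactly as information gain does for classification.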

SLIDE 14

Numeric Handling Methods

Exhaustive Binary Tree (BINTREE – Gama et al, 2003)

◮ Closest implementation of a batch method
◮ Incrementally updates a binary tree as data is observed
◮ Issues: high memory cost, high cost of split search, dependence on data order
SLIDE 15

Page Hinckley Test

◮ The CUSUM test:

g₀ = 0,  gₜ = max(0, gₜ₋₁ + εₜ − υ)
if gₜ > h then alarm and gₜ = 0

◮ The Page-Hinckley test:

g₀ = 0,  gₜ = gₜ₋₁ + (εₜ − υ)
Gₜ = min(g₁, …, gₜ)
if gₜ − Gₜ > h then alarm and gₜ = 0
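The Page-Hinckley test above can be sketched as a small detector class; this is an illustrative example (class name `PageHinckley` is mine), assuming state is reset after each alarm and with υ and h chosen for the toy stream.

```python
# Page-Hinckley change detector, following the slide: accumulate the
# deviations eps_t - upsilon, track the running minimum G_t, and raise
# an alarm when g_t - G_t exceeds the threshold h.
class PageHinckley:
    def __init__(self, upsilon=0.05, h=2.0):
        self.upsilon, self.h = upsilon, h
        self.g = 0.0
        self.G = 0.0
    def update(self, eps):
        self.g += eps - self.upsilon
        self.G = min(self.G, self.g)
        if self.g - self.G > self.h:
            self.g, self.G = 0.0, 0.0   # reset after the alarm
            return True
        return False

ph = PageHinckley(upsilon=0.05, h=2.0)
errors = [0.1] * 10 + [1.0] * 10        # error level jumps at t = 10: drift
alarms = [t for t, e in enumerate(errors) if ph.update(e)]
print(alarms)   # first alarm shortly after the jump
```

Feeding the detector the model's errors εₜ, as FIMT-DD does, turns a sustained rise in error into a drift alarm while tolerating fluctuations smaller than υ.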

SLIDE 16

Lazy Methods

kNN Nearest Neighbours:

  • 1. Mean value of the k nearest neighbours:

f̂(x_q) = ( Σᵢ f(xᵢ) ) / k,  summing over the k nearest neighbours xᵢ

  • 2. Depends on the distance function
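Both points can be illustrated together; this is a sketch (function name `knn_predict` is mine), assuming Euclidean distance as the distance function.

```python
# k-nearest-neighbour regression: predict the mean target of the k
# training points closest to the query under Euclidean distance.
def knn_predict(train, xq, k):
    # train: list of (x, y) pairs, with x a tuple of floats
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    nearest = sorted(train, key=lambda xy: dist(xy[0], xq))[:k]
    return sum(y for _, y in nearest) / k

train = [((0.0,), 1.0), ((1.0,), 2.0), ((2.0,), 3.0), ((10.0,), 50.0)]
print(knn_predict(train, (1.2,), k=2))   # mean of the two closest: 2.5
```

As a lazy method it builds no model: all cost is deferred to prediction time, and swapping the distance function changes which neighbours vote.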