Regression
Albert Bifet May 2012
COMP423A/COMP523A Data Stream Mining

Outline
1. Introduction
2. Stream Algorithmics
3. Concept drift
4. Evaluation
5. Classification
6. Ensemble Methods
7. Regression
8. Clustering
9. Frequent Pattern
Definition
Given a numeric class attribute, a regression algorithm builds a model that predicts, for every unlabelled instance I, a numeric value y = f(x) as accurately as possible.
Example
Stock-Market price prediction
Example
Airplane delays
Regression mean measures
◮ Mean square error:
  MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
◮ Root mean square error:
  RMSE = \sqrt{MSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}
Forgetting mechanism for estimating measures
Sliding window of size w with the most recent observations
Regression relative measures
◮ Relative square error:
  RSE = \frac{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{N} (\bar{y} - y_i)^2}
◮ Root relative square error:
  RRSE = \sqrt{RSE} = \sqrt{\frac{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{N} (\bar{y} - y_i)^2}}
Forgetting mechanism for estimating measures
Sliding window of size w with the most recent observations
Regression absolute measures
◮ Mean absolute error:
  MAE = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|
◮ Relative absolute error:
  RAE = \frac{\sum_{i=1}^{N} |\hat{y}_i - y_i|}{\sum_{i=1}^{N} |\bar{y} - y_i|}
Forgetting mechanism for estimating measures
Sliding window of size w with the most recent observations
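These windowed estimates are straightforward to compute incrementally. A minimal Python sketch of the idea (the class name, default window size w and plain-list recomputation are illustrative assumptions, not part of the original slides; it assumes a non-empty window with non-constant targets):

from collections import deque
import math

class WindowRegressionEvaluator:
    # Estimate the error measures above over a sliding window of size w.
    def __init__(self, w=1000):
        self.window = deque(maxlen=w)   # (y_true, y_pred) pairs; oldest dropped first

    def add(self, y_true, y_pred):
        self.window.append((y_true, y_pred))

    def measures(self):
        n = len(self.window)
        ys = [y for y, _ in self.window]
        y_bar = sum(ys) / n
        mse = sum((y - p) ** 2 for y, p in self.window) / n
        mae = sum(abs(y - p) for y, p in self.window) / n
        rse = sum((p - y) ** 2 for y, p in self.window) / sum((y_bar - y) ** 2 for y in ys)
        rae = sum(abs(p - y) for y, p in self.window) / sum(abs(y_bar - y) for y in ys)
        return {'MSE': mse, 'RMSE': math.sqrt(mse), 'MAE': mae,
                'RSE': rse, 'RRSE': math.sqrt(rse), 'RAE': rae}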
Linear Least Squares fitting
◮ Linear Regression Model
  f(\vec{x}) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j = X\beta
◮ Minimize residual sum of squares
  RSS(\beta) = \sum_{i=1}^{N} (y_i - f(\vec{x}_i))^2 = (y - X\beta)'(y - X\beta)
◮ Solution:
  \hat{\beta} = (X'X)^{-1} X'y
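The closed-form solution can be computed directly; a sketch in Python with NumPy (the function name and toy data are assumptions for illustration):

import numpy as np

def least_squares(X, y):
    # beta_hat = (X'X)^{-1} X'y, computed via the normal equations.
    # X is N x (p+1) with a leading column of ones for the intercept beta_0.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Toy usage: recover y = 1 + 2x
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
beta_hat = least_squares(X, y)   # approximately [1.0, 2.0]

Solving the normal equations with np.linalg.solve avoids forming the matrix inverse explicitly, which is cheaper and numerically safer than computing (X'X)^{-1} directly.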
Perceptron
[Figure: a perceptron with inputs Attribute 1 to Attribute 5, weights w_1, ..., w_5, and output h_{\vec{w}}(\vec{x}_i)]
◮ Data stream: \langle \vec{x}_i, y_i \rangle
◮ Classical perceptron: h_{\vec{w}}(\vec{x}_i) = \vec{w}^T \vec{x}_i
◮ Minimize mean-square error: J(\vec{w}) = \frac{1}{2} \sum_i (y_i - h_{\vec{w}}(\vec{x}_i))^2
◮ Stochastic Gradient Descent: \vec{w} = \vec{w} - \eta \nabla J \vec{x}_i
◮ Gradient of the error function: \nabla J = -\sum_i (y_i - h_{\vec{w}}(\vec{x}_i))
◮ Weight update rule: \vec{w} = \vec{w} + \eta \sum_i (y_i - h_{\vec{w}}(\vec{x}_i)) \vec{x}_i
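In the streaming setting this update is applied one instance at a time. A minimal Python sketch of such an online perceptron regressor (the class name, learning rate eta = 0.01 and zero initialisation are assumptions):

import numpy as np

class PerceptronRegressor:
    # Linear model h_w(x) = w . x, trained online with the update rule above.
    def __init__(self, n_features, eta=0.01):
        self.w = np.zeros(n_features)
        self.eta = eta

    def predict(self, x):
        return self.w @ x

    def update(self, x, y):
        # Per-instance gradient step: w = w + eta * (y - h_w(x)) * x
        self.w += self.eta * (y - self.predict(x)) * x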
FIMT-DD (Fast Incremental Model Trees with Drift Detection; Ikonomovska et al., 2011) differences with the Hoeffding Tree (HT):
Standard Deviation Reduction Measure
◮ Classification
  Information Gain = Entropy(before split) - Entropy(after split)
  Entropy = -\sum_{i=1}^{c} p_i \log p_i
  Gini Index = 1 - \sum_{i=1}^{c} p_i^2
◮ Regression
  Gain = SD(before split) - SD(after split)
  Standard Deviation: SD = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\bar{y} - y_i)^2}
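The standard deviation can be maintained incrementally from three sufficient statistics per split candidate (count, sum, sum of squares), so no examples need to be stored. A sketch (the naive variance formula and the weighting of children by their fraction of examples are assumptions consistent with the SD reduction measure above):

import math

class SDStats:
    # Sufficient statistics for the standard deviation of the target y.
    def __init__(self):
        self.n, self.total, self.total_sq = 0, 0.0, 0.0

    def add(self, y):
        self.n += 1
        self.total += y
        self.total_sq += y * y

    def sd(self):
        mean = self.total / self.n
        return math.sqrt(max(0.0, self.total_sq / self.n - mean * mean))

def sd_reduction(parent, left, right):
    # Gain = SD(before split) - weighted SD(after split)
    after = (left.n * left.sd() + right.n * right.sd()) / parent.n
    return parent.sd() - after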
Exhaustive Binary Tree (BINTREE – Gama et al., 2003)
◮ Closest implementation of a batch method
◮ Incrementally update a binary tree as data is observed
◮ Issues: high memory cost, high cost of split search, data
Change detection tests:
◮ The CUSUM test
  g_0 = 0, \quad g_t = \max(0,\ g_{t-1} + \epsilon_t - \upsilon)
  if g_t > h then alarm and g_t = 0
◮ The Page-Hinckley test
  g_0 = 0, \quad g_t = g_{t-1} + (\epsilon_t - \upsilon), \quad G_t = \min(g_t)
  if g_t - G_t > h then alarm and g_t = 0
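A sketch of the Page-Hinckley test as stated above, where epsilon_t would typically be the model's absolute error at time t (the parameter values are illustrative assumptions):

class PageHinckley:
    # Alarm when g_t - min(g_t) exceeds the threshold h.
    def __init__(self, upsilon=0.005, h=50.0):
        self.upsilon = upsilon   # allowed magnitude of change
        self.h = h               # detection threshold
        self.g = 0.0             # cumulative statistic g_t
        self.g_min = 0.0         # running minimum G_t

    def add(self, epsilon):
        self.g += epsilon - self.upsilon
        self.g_min = min(self.g_min, self.g)
        if self.g - self.g_min > self.h:
            self.g, self.g_min = 0.0, 0.0   # reset after an alarm
            return True    # change detected
        return False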
kNN Nearest Neighbours: the prediction for a query point \vec{x}_q is the mean target value of its k nearest neighbours:
\hat{f}(\vec{x}_q) = \frac{1}{k} \sum_{i=1}^{k} f(\vec{x}_i)
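In a stream, the neighbours are usually restricted to a sliding window of recent instances. A sketch (the window size, k and the use of Euclidean distance are assumptions):

import heapq
import math
from collections import deque

class KNNRegressor:
    # Predict the mean target of the k nearest instances in a sliding window.
    def __init__(self, k=3, window_size=1000):
        self.k = k
        self.window = deque(maxlen=window_size)   # (x, y) pairs

    def add(self, x, y):
        self.window.append((x, y))

    def predict(self, x_q):
        nearest = heapq.nsmallest(self.k, self.window,
                                  key=lambda xy: math.dist(x_q, xy[0]))
        return sum(y for _, y in nearest) / len(nearest)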