

SLIDE 1

CS489/698 Lecture 22: March 27, 2017

Bagging and Distributed Computing. Readings: [RN] Sec. 18.10, [M] Sec. 16.2.5, [B] Chap. 14, [HTF] Chap. 15-16, [D] Chap. 11

CS489/698 (c) 2017 P. Poupart

SLIDE 2

Boosting vs Bagging

  • Review: boosting trains its predictors sequentially, upweighting the examples earlier predictors got wrong; bagging trains its predictors independently on random subsets of the data and combines their outputs
SLIDE 3

Independent classifiers/predictors

  • How can we obtain independent classifiers/predictors for bagging?
  • Bootstrap sampling
    – Sample (with replacement) a subset of the data
  • Random projection
    – Sample (without replacement) a subset of the features
  • Learn a different classifier/predictor from each data subset and feature subset (see the sketch after this list)
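A minimal NumPy sketch of the two sampling steps; the dataset sizes and the sqrt(d) feature count are illustrative assumptions, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 50  # hypothetical number of instances and features

# Bootstrap sample: n row indices drawn WITH replacement,
# so each subset overlaps the data differently
data_idx = rng.choice(n, size=n, replace=True)

# Feature subset: a few column indices drawn WITHOUT replacement
feat_idx = rng.choice(d, size=int(np.sqrt(d)), replace=False)
```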

SLIDE 4

Bagging

For k = 1 to K:
    sample a data subset D_k (bootstrap sample)
    sample a feature subset F_k
    train classifier/predictor h_k on D_k restricted to the features in F_k

Classification: majority vote of h_1(x), ..., h_K(x)
Regression: average of h_1(x), ..., h_K(x)
Random forest: a bag of decision trees
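A runnable sketch of this loop, assuming scikit-learn decision trees as the base learner (so the result is a small random forest); the function names and the sqrt(d) feature count are my own choices:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, K=25, seed=0):
    """Train K trees, each on a bootstrap sample and a random feature subset."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = max(1, int(np.sqrt(d)))                      # features per tree
    ensemble = []
    for _ in range(K):
        rows = rng.choice(n, size=n, replace=True)   # data subset D_k
        cols = rng.choice(d, size=m, replace=False)  # feature subset F_k
        h = DecisionTreeClassifier().fit(X[np.ix_(rows, cols)], y[rows])
        ensemble.append((h, cols))
    return ensemble

def bagging_predict(ensemble, X):
    """Classification: majority vote over the K trees (assumes integer labels)."""
    votes = np.stack([h.predict(X[:, cols]) for h, cols in ensemble])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```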

SLIDE 5

Application: Xbox 360 Kinect

  • Microsoft Cambridge
  • Body part recognition: supervised learning


SLIDE 6

Depth camera

  • Kinect

[Figures: infrared image and grayscale depth map captured by the Kinect]


SLIDE 7

Kinect Body Part Recognition

  • Problem: label each pixel with a body part


SLIDE 8

Kinect Body Part Recognition

  • Features: depth differences between pairs of pixels
  • Classification: forest of decision trees
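Assuming the depth-difference feature described in Microsoft's published Kinect work (Shotton et al., 2011), where the two probe offsets are scaled by the inverse depth of the pixel so the feature is roughly invariant to how far the body stands from the camera, a hypothetical sketch looks like this:

```python
import numpy as np

def depth_diff_feature(depth, px, u, v, big=1e6):
    """One split feature for pixel px = (row, col): depth at offset u minus
    depth at offset v, with both offsets scaled by 1/depth(px).
    Assumes depth(px) > 0 for pixels on the body."""
    r, c = px
    d = depth[r, c]
    def probe(off):
        rr, cc = int(r + off[0] / d), int(c + off[1] / d)
        if 0 <= rr < depth.shape[0] and 0 <= cc < depth.shape[1]:
            return depth[rr, cc]
        return big  # probes that fall off the image get a large constant depth
    return probe(u) - probe(v)
```

Each decision-tree node thresholds one such feature, and the per-pixel votes of the trees in the forest are combined to label the body part.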


SLIDE 9

Large Scale Machine Learning

  • Big data
    – Large number of data instances
    – Large number of features
  • Solution: distribute the computation (parallel computation)
    – GPU (Graphics Processing Unit)
    – Many cores

SLIDE 10

GPU computation

  • Many machine learning algorithms consist of vector, matrix and tensor operations
    – A tensor is a multidimensional array
  • GPUs (Graphics Processing Units) can perform arithmetic operations on all elements of a tensor in parallel
  • Packages that facilitate ML programming on GPUs: TensorFlow, Theano, Torch, Caffe, DL4J
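For instance, with a modern TensorFlow build (one of the packages above), tensor expressions are placed on a GPU automatically when one is visible; the matrix sizes here are arbitrary:

```python
import tensorflow as tf  # assumes a TensorFlow 2.x build with GPU support

A = tf.random.normal([4096, 4096])
B = tf.random.normal([4096, 4096])

# One line of math is millions of scalar operations: the GPU applies the
# matrix multiply and the elementwise add/ReLU across all entries in parallel
C = tf.nn.relu(A @ B + 1.0)
```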

SLIDE 11

Multicore Computation

  • Idea: train a different classifier/predictor with a subset of the data on each core (a multiprocessing sketch follows this list)
  • How can we combine the classifiers/predictors?
  • Should we take the average of the parameters of the classifiers/predictors? No: this can produce a worse classifier/predictor, and it is especially problematic for models with hidden variables/units such as neural networks and hidden Markov models
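A minimal sketch of the per-core training step, assuming Python's multiprocessing and scikit-learn trees; the shard count is illustrative, and safely combining the resulting predictors is the topic of the last slide:

```python
import numpy as np
from multiprocessing import Pool
from sklearn.tree import DecisionTreeClassifier

def train_on_shard(shard):
    Xs, ys = shard
    return DecisionTreeClassifier().fit(Xs, ys)

def train_parallel(X, y, n_cores=4):
    # one data subset per core, one predictor per subset
    # (call from under `if __name__ == "__main__":` on platforms that spawn)
    shards = list(zip(np.array_split(X, n_cores), np.array_split(y, n_cores)))
    with Pool(n_cores) as pool:
        return pool.map(train_on_shard, shards)
```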

SLIDE 12

Bad case of parameter averaging

  • Consider two threshold neural networks that both encode the exclusive-or (XOR) Boolean function
  • Averaging the weights yields a new neural network that does not encode exclusive-or (a numeric demonstration follows)
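A concrete numeric instance, with weights chosen by me for illustration: both networks compute XOR using an OR hidden unit and an AND hidden unit, just in opposite orders, yet their parameter average outputs 0 on every input:

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)  # threshold activation

def forward(x, W1, b1, W2, b2):
    h = step(W1 @ x + b1)       # hidden layer of threshold units
    return step(W2 @ h + b2)[0]

# Network A: hidden units [OR, AND]; output fires iff OR and not AND
W1a = np.array([[1., 1.], [1., 1.]]); b1a = np.array([-0.5, -1.5])
W2a = np.array([[1., -1.]]);          b2a = np.array([-0.5])

# Network B: the same function with the hidden units swapped ([AND, OR])
W1b, b1b = W1a.copy(), b1a[::-1].copy()
W2b = np.array([[-1., 1.]]);          b2b = b2a.copy()

# Parameter average of the two XOR networks
W1m, b1m = (W1a + W1b) / 2, (b1a + b1b) / 2
W2m, b2m = (W2a + W2b) / 2, (b2a + b2b) / 2

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x = np.array(x, float)
    print(x, forward(x, W1a, b1a, W2a, b2a),   # XOR: 0 1 1 0
             forward(x, W1b, b1b, W2b, b2b),   # XOR: 0 1 1 0
             forward(x, W1m, b1m, W2m, b2m))   # averaged net: always 0
```

The averaged output weights cancel to zero because the two networks assign opposite signs to interchangeable hidden units; this permutation symmetry of hidden units is exactly why parameter averaging fails for such models.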

SLIDE 13

Safely Combining Predictions

  • A safe approach to ensemble learning is to combine the predictions (not the parameters); both rules are sketched below
  • Classification: majority vote of the classes predicted by the classifiers
  • Regression: average of the predictions computed by the regressors
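The two combination rules as code, applicable to any list of already-trained predictors (for example the per-core models from slide 11); the vote assumes integer class labels:

```python
import numpy as np

def combine_classification(models, X):
    # majority vote of the classes predicted by the classifiers
    votes = np.stack([m.predict(X) for m in models])   # K x n label matrix
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

def combine_regression(models, X):
    # average of the predictions computed by the regressors
    return np.mean([m.predict(X) for m in models], axis=0)
```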