
INF3490 - Biologically inspired computing

Support Vector Machines, Ensemble Learning, and Dimensionality Reduction

Weria Khaksar

October 17, 2018


Support Vector Machines (SVM)


Support Vector Machines (SVM): Background


SVM is used for extreme classification cases, e.g. deciding between a CAT and a DOG when the example is ambiguous.


Support Vector Machines (SVM): Background


Remember the inefficiency of the Perceptron?


Support Vector Machines (SVM): Background


Linear Separability


Support Vector Machines (SVM): Background


A trick to solve it …

It is always possible to separate out two classes with a linear function, provided that you project the data into the correct set of dimensions.
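As a minimal sketch of this trick (the dataset and the derived feature below are hypothetical, not from the lecture): a 1D dataset where the class depends on |x| cannot be split by any single threshold on x, but adding the derived feature x² makes the two classes linearly separable.

```python
import numpy as np

# Points with |x| < 1 are class 0, the rest class 1; no threshold on x alone
# separates them. Projecting into 2D with the extra feature x**2 makes the
# line x2 = 1 a perfect separator.
x = np.array([-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0])
labels = (np.abs(x) > 1).astype(int)      # 0 inside the interval, 1 outside

projected = np.column_stack([x, x ** 2])  # project the 1D data into 2D
for point, label in zip(projected, labels):
    print(point, "class", label, "-> above x2 = 1:", point[1] > 1)
```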


Support Vector Machines (SVM): The Margin


Which line is the best separator?


Why do we need the best line?


Which line is the best separator? The one with the largest margin.


Support Vector Machines (SVM): Support Vectors


Which data points are important? Support Vectors

The data points in each class that lie closest to the classification line are called Support Vectors.
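To make the definition concrete (an illustration assuming scikit-learn, not part of the slides): a fitted SVC exposes exactly these closest points through its support_vectors_ attribute.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy dataset: two linearly separable clusters.
rng = np.random.default_rng(0)
class_a = rng.normal(loc=-2.0, scale=0.5, size=(20, 2))
class_b = rng.normal(loc=+2.0, scale=0.5, size=(20, 2))
X = np.vstack([class_a, class_b])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only the points closest to the separating line are kept as support vectors;
# the classifier is fully defined by them, so the rest of the data could be
# discarded without changing its predictions.
print("support vectors per class:", clf.n_support_)
print(clf.support_vectors_)
```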


Support Vector Machines (SVM): Optimal Separation


  • The margin should be as large as possible.
  • The best classifier is the one that goes through the middle of the marginal area.
  • We can throw away all the other data and use only the support vectors for classification.


Support Vector Machines (SVM): The Math.


$$\text{maximize } M \quad \text{s.t. } \; t_i\,(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1, \qquad i = 1, \dots, n$$

Since the margin is $M = 1/\lVert\mathbf{w}\rVert$, this is equivalent to minimizing $\tfrac{1}{2}\lVert\mathbf{w}\rVert^2$ under the same constraints.


Support Vector Machines (SVM): Slack Variables for Non-Linearly Separable Problems
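A hedged reconstruction of the standard soft-margin formulation this title refers to (not copied from the slide): each data point $i$ gets a slack variable $\xi_i \ge 0$ that lets it violate the margin, at a cost controlled by the parameter $C$.

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\;
    \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad
    t_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i,
    \qquad \xi_i \ge 0,\; i = 1,\dots,n
```

Large $C$ punishes margin violations hard (close to the hard-margin case above); small $C$ tolerates more misclassified points in exchange for a wider margin.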


Support Vector Machines (SVM): KERNELS


  • The trick is to modify the input features in some way, to be able to linearly classify the data.
  • The main idea is to replace the input features, $\mathbf{x}$, with some function $\phi(\mathbf{x})$ (see the sketch after this list).
  • The main challenge is to automate the algorithm to find the proper function without suitable domain knowledge.
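A minimal numeric sketch of why kernels help (illustrative; phi below is an explicit feature map for the degree-2 polynomial kernel on 2D inputs): the kernel value computed in the original input space equals the dot product after the projection, so the algorithm never has to build the high-dimensional features.

```python
import numpy as np

def phi(v):
    # Explicit 6D feature map corresponding to (x . z + 1)**2 in 2D.
    x1, x2 = v
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     1.0])

def poly_kernel(x, z):
    # Evaluates the same dot product without ever computing phi.
    return (np.dot(x, z) + 1) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(np.dot(phi(x), phi(z)))   # dot product after explicit projection
print(poly_kernel(x, z))        # same number, computed in the input space
```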



Support Vector Machines (SVM): SVM Algorithm:
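A minimal end-to-end sketch of the workflow, assuming scikit-learn (which solves the margin optimization internally); the data and parameter values are hypothetical: choose a kernel, fit on the training data, and classify new points with the learned support vectors.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical toy data: two noisy Gaussian blobs.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-1, 0.8, (50, 2)), rng.normal(1, 0.8, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. choose a kernel; 2. solve the margin optimization (inside fit);
# 3. keep the support vectors; 4. classify unseen data.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```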


Support Vector Machines (SVM): SVM Examples:


Performing nonlinear classification via linear separation in higher dimensional space


The SVM learning about a linearly separable dataset (top row) and a dataset that needs two straight lines to separate in 2D (bottom row) with left the linear kernel, middle the polynomial kernel of degree 3, and right the RBF kernel.


The effects of different kernels when learning a version of XOR.


Ensemble Learning


Ensemble Learning: Background


  • Having lots of simple learners that each provide slightly different results,
  • Putting them together in a proper way,
  • The results are significantly better.


The Basic Idea:
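As a concrete illustration of the idea (a sketch assuming scikit-learn; the dataset and the choice of learners are hypothetical): train a few different simple learners on the same task and combine their predictions by majority vote.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

# Three different simple learners, combined by majority (hard) voting.
ensemble = VotingClassifier(
    estimators=[("tree", DecisionTreeClassifier(max_depth=3)),
                ("knn", KNeighborsClassifier(n_neighbors=5)),
                ("logreg", LogisticRegression())],
    voting="hard",
)
ensemble.fit(X, y)
print("training accuracy of the combined model:", ensemble.score(X, y))
```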


Ensemble Learning: Important Considerations


  • Which learners should we use?
  • How should we ensure that they learn different things?
  • How should we combine their results?



Ensemble Learning: Background


  • If we take a collection of very poor learners, each performing only just better than chance, then by putting them together it is possible to make an ensemble learner that can perform arbitrarily well.
  • We just need lots of low-quality learners, and a way to put them together usefully, and we can make a learner that will do very well.



Ensemble Learning: How it works?


Ensemble Learning: BOOSTING


As points are misclassified, boosting increases their weights (shown by the data points getting larger), so the later classifiers pay more attention to them.


AdaBoost:


AdaBoost: How it works?
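A compact sketch of AdaBoost (assuming scikit-learn decision stumps as the weak learners; the toy data is hypothetical). The comments mark the two standard steps: computing the learner weight alpha from the weighted error, and re-weighting the data points.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=10):
    """Sketch of AdaBoost for labels y in {-1, +1}."""
    n = len(X)
    w = np.full(n, 1.0 / n)            # start with uniform data-point weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y]) / np.sum(w)       # weighted training error
        alpha = 0.5 * np.log((1 - err) / (err + 1e-10))
        # Misclassified points get larger weights, so the next stump
        # concentrates on them; correct points shrink.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def predict(stumps, alphas, X):
    # Weighted vote of all weak learners.
    scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(scores)

# Hypothetical XOR-like toy data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)
stumps, alphas = adaboost(X, y, n_rounds=20)
print("training accuracy:", np.mean(predict(stumps, alphas, X) == y))
```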



AdaBoost in Action


Ensemble Learning: BAGGING


Bagging (Bootstrap Aggregating):


Bagging (Bootstrap Aggregating): How it works?
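A hedged sketch of the procedure (hypothetical data; a plain NumPy bootstrap plus scikit-learn trees): draw bootstrap samples with replacement from the training set, fit one learner per sample, and take a majority vote over their predictions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

n_models, n = 25, len(X)
models = []
for _ in range(n_models):
    # Bootstrap sample: n points drawn with replacement.
    idx = rng.integers(0, n, size=n)
    tree = DecisionTreeClassifier()
    tree.fit(X[idx], y[idx])
    models.append(tree)

# Majority vote over all bootstrap models.
votes = np.mean([m.predict(X) for m in models], axis=0)
print("training accuracy:", np.mean((votes > 0.5).astype(int) == y))
```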


Bagging (Bootstrap Aggregating): Examples:


Ensemble Learning: Summary


Dimensionality reduction



Dimensionality reduction: Why?


  • When looking at data and plotting results, we can never go beyond three dimensions.
  • The higher the number of dimensions we have, the more training data we need.
  • The dimensionality is an explicit factor for the computational cost of many algorithms.
  • Remove noise.
  • Significantly improve the results of the learning algorithm.
  • Make the dataset easier to work with.
  • Make the results easier to understand.


Dimensionality reduction: How?


  • Feature Selection: Looking through the features that are available and seeing whether or not they are actually useful (see the sketch after this list).
  • Feature Derivation: Deriving new features from the old ones, generally by applying transforms to the dataset.
  • Clustering: Grouping together similar data points, and seeing whether this allows fewer features to be used.
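A small sketch of the first option (the data and the variance threshold are hypothetical; variance is just one simple usefulness criterion): features whose values barely change cannot help discriminate between classes, so they can be dropped.

```python
import numpy as np

# Hypothetical dataset: 100 samples, 4 features; feature 2 is nearly constant.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))
X[:, 2] = 5.0 + 0.001 * rng.normal(size=100)   # almost no variance

variances = X.var(axis=0)
keep = variances > 0.1                         # simple selection threshold
print("feature variances:", np.round(variances, 4))
print("selected feature indices:", np.where(keep)[0])
X_reduced = X[:, keep]                         # dataset with fewer dimensions
print("reduced shape:", X_reduced.shape)
```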


Dimensionality reduction: Example


Dimensionality reduction: Principal Components Analysis (PCA)


  • The principal component is the direction in the data with the largest variance.
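A minimal sketch of PCA with plain NumPy (hypothetical data): center the data, take the eigenvectors of the covariance matrix, sort by eigenvalue, and project onto the leading components; the eigenvector with the largest eigenvalue is the principal component.

```python
import numpy as np

# Hypothetical correlated 2D data.
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.5, 0.5]])

X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)      # eigh: covariance is symmetric

order = np.argsort(eigvals)[::-1]           # sort by variance, largest first
components = eigvecs[:, order]
print("variance along each component:", eigvals[order])

# Project the samples onto the first principal component (2D -> 1D).
X_projected = X_centered @ components[:, :1]
print("reduced shape:", X_projected.shape)
```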




  • PCA is a linear transformation; it does not directly help with data that is not linearly separable.
  • However, it may make learning easier because of the reduced complexity.
  • PCA removes some information from the data: that information might just be noise, or it might contain helpful nuances that would have been of use to some classifiers.


Dimensionality reduction: Principal Components Analysis (PCA) Example

How to project samples into the variable space.