Learning, Linear Separability and Linear Programming (Lecture 22)



slide-1
SLIDE 1

CS 573: Algorithms, Fall 2013

Learning, Linear Separability and Linear Programming

Lecture 22

November 12, 2013

Sariel (UIUC) CS573 1 Fall 2013 1 / 28

slide-2
SLIDE 2

Labeling...

1. Given examples: a database of cars.
2. We would like to determine which cars are sport cars.
3. Each car record is interpreted as a point in high dimensions.
4. Example: a sport car with 4 doors, manufactured in 1997 by Quaky (manufacturer ID 6): (4, 1997, 6). Labeled as a sport car.
5. A tractor by General Mess (manufacturer ID 3), manufactured in 1998: (0, 1998, 3). Labeled as not a sport car.
6. Real world: hundreds of attributes. In some cases even millions of attributes!
7. Automate this classification process: label sports/regular cars automatically.


slide-6
SLIDE 6

Automatic classification...

1. Learning algorithm:
   1. given several (or many) classified examples...
   2. ...develops its own conjecture for a rule of classification.
   3. ...and can use it for classifying new data.
2. Learning: training + classifying.
3. Learn a function: $f : \mathbb{R}^d \to \{-1, 1\}$.
4. Challenge: f might have infinite complexity...
5. ...a rare situation in the real world. Assume learnable functions.
6. Red and blue points that are linearly separable.
7. Trying to learn a line ℓ that separates the red points from the blue points.


slide-13
SLIDE 13

Linear separability example...


slide-14
SLIDE 14

Learning linear separation

1. Given red and blue points, how do we compute the separating line ℓ?
2. A line/plane/hyperplane is the zero set of a linear function.
3. Form: for all $x \in \mathbb{R}^d$, $f(x) = \langle a, x \rangle + b$, where $a = (a_1, \ldots, a_d) \in \mathbb{R}^d$ and $b \in \mathbb{R}$. Here $\langle a, x \rangle = \sum_i a_i x_i$ is the dot product of a and x.
4. Classification is done by computing the sign of f(x): sign(f(x)).
5. If sign(f(x)) is negative, x is not in the class. If positive, it is inside.
6. A set of training examples: $S = \{ (x_1, y_1), \ldots, (x_n, y_n) \}$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, 1\}$, for $i = 1, \ldots, n$.
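The classification rule sign(f(x)) with f(x) = ⟨a, x⟩ + b can be sketched in a few lines of Python. The function names `dot` and `classify`, and the sample line x1 + x2 − 3 = 0, are illustrative assumptions and do not come from the slides.

```python
from typing import Sequence


def dot(a: Sequence[float], x: Sequence[float]) -> float:
    """Dot product <a, x> = sum_i a_i * x_i."""
    return sum(ai * xi for ai, xi in zip(a, x))


def classify(a: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return sign(f(x)) for f(x) = <a, x> + b: +1, -1, or 0 on the hyperplane."""
    value = dot(a, x) + b
    return (value > 0) - (value < 0)


# Hypothetical separator in the plane: the line x1 + x2 - 3 = 0.
a, b = (1.0, 1.0), -3.0
print(classify(a, b, (4.0, 2.0)))  # 4 + 2 - 3 = 3 > 0, prints 1
print(classify(a, b, (0.5, 1.0)))  # 0.5 + 1 - 3 < 0, prints -1
```

A point exactly on the hyperplane gets the value 0 here; the slides do not say how such ties should be labeled.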


slide-20
SLIDE 20

Classification...

1. A linear classifier h is a pair (w, b), where $w \in \mathbb{R}^d$ and $b \in \mathbb{R}$.
2. The classification of $x \in \mathbb{R}^d$ is $\mathrm{sign}(\langle w, x \rangle + b)$.
3. For a labeled example (x, y), h classifies (x, y) correctly if $\mathrm{sign}(\langle w, x \rangle + b) = y$.
4. Assume a linear classifier exists.
5. Given n labeled examples, how do we compute the linear classifier for these examples?
6. Use linear programming...
7. We are looking for (w, b) such that for all $(x_i, y_i)$ we have $\mathrm{sign}(\langle w, x_i \rangle + b) = y_i$, which is $\langle w, x_i \rangle + b \ge 0$ if $y_i = 1$, and $\langle w, x_i \rangle + b \le 0$ if $y_i = -1$.


slide-25
SLIDE 25

Classification...

1. Or equivalently, let $x_i = (x_i^1, \ldots, x_i^d) \in \mathbb{R}^d$, for $i = 1, \ldots, n$, and let $w = (w_1, \ldots, w_d)$. Then we get the linear constraint
   $\sum_{k=1}^{d} w_k x_i^k + b \ge 0$ if $y_i = 1$, and
   $\sum_{k=1}^{d} w_k x_i^k + b \le 0$ if $y_i = -1$.
   Thus, we get a set of linear constraints, one for each training example, and we need to solve the resulting linear program.
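As a rough sketch of how the constraints are assembled, both cases can be folded into one row per example: multiplying by −y_i turns each constraint into row · (w_1, …, w_d, b) ≤ 0. The helper names `build_constraints` and `satisfies` and the four sample points are hypothetical; the code only builds the rows and checks a candidate (w, b), it does not call an LP solver.

```python
def build_constraints(examples):
    """One row per example: -y * (x^1, ..., x^d, 1), encoding row . (w, b) <= 0."""
    return [[-y * c for c in (*x, 1.0)] for x, y in examples]


def satisfies(rows, w, b):
    """Check whether the candidate (w, b) satisfies every linear constraint."""
    z = (*w, b)
    return all(sum(r * zk for r, zk in zip(row, z)) <= 0 for row in rows)


# Hypothetical training set: two positive and two negative examples in the plane.
examples = [((2.0, 3.0), 1), ((3.0, 1.5), 1), ((0.0, 1.0), -1), ((1.0, 0.0), -1)]
rows = build_constraints(examples)
print(rows[0])                              # [-2.0, -3.0, -1.0]
print(satisfies(rows, (1.0, 1.0), -3.0))    # True
print(satisfies(rows, (-1.0, -1.0), 3.0))   # False
```

One caveat about this encoding: since the inequalities are not strict, w = 0, b = 0 satisfies all rows trivially, so an actual solver run would need an extra requirement (for example a positive margin) to rule that out.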

slide-26
SLIDE 26

Linear programming for learning?

1. Stumbling block: linear programming is very sensitive to noise.
2. If points are misclassified ⟹ no solution.
3. Use an iterative algorithm that converges to the optimal solution if it exists...


slide-29
SLIDE 29

Perceptron algorithm...

perceptron(S: a set of l examples)
    w_0 ← 0, k ← 0
    R = max_{(x,y)∈S} ‖x‖
    repeat
        for (x, y) ∈ S do
            if sign(⟨w_k, x⟩) ≠ y then
                w_{k+1} ← w_k + y ∗ x
                k ← k + 1
    until no mistakes are made in the classification
    return w_k and k
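The pseudocode above translates fairly directly into Python. The following is a minimal sketch, not the lecture's reference implementation: the `max_passes` cap and the sample data are assumptions added so the loop always terminates, and, like the slide, the classifier has no offset term b (the separator passes through the origin).

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def perceptron(S, max_passes=1000):
    """Return (w, k): the final weight vector and the number of updates.

    S is a list of (x, y) pairs with x a tuple of floats and y in {-1, +1}.
    max_passes is a safety cap (an assumption, not on the slide) so the loop
    stops even if S is not linearly separable through the origin.
    """
    w = [0.0] * len(S[0][0])
    k = 0
    for _ in range(max_passes):
        mistakes = 0
        for x, y in S:
            # y * <w, x> <= 0 covers sign(<w, x>) != y, including <w, x> = 0.
            if y * dot(w, x) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                k += 1
                mistakes += 1
        if mistakes == 0:
            return w, k
    raise RuntimeError("no separating vector found within max_passes")


# Hypothetical data, separable by a line through the origin.
S = [((2.0, 1.0), 1), ((1.0, 3.0), 1), ((-1.0, -2.0), -1), ((-3.0, -1.0), -1)]
w, k = perceptron(S)
print(w, k)  # [2.0, 1.0] 1
```

On this data the very first example triggers the only update, after which every point is classified correctly.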

slide-30
SLIDE 30

Perceptron algorithm

1. Why does the perceptron algorithm converge?
2. Assume it made a mistake on a sample (x, y) with y = 1. Then $\langle w_k, x \rangle < 0$, and
   $\langle w_{k+1}, x \rangle = \langle w_k + y x, x \rangle = \langle w_k, x \rangle + y \langle x, x \rangle = \langle w_k, x \rangle + y \|x\|^2 > \langle w_k, x \rangle$.
3. "Walking" in the right direction.
4. ...the new value assigned to x by $w_{k+1}$ is larger ("more positive") than the old value assigned to x by $w_k$.
5. After enough iterations of such fix-ups, the label would change...
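A single update can be checked numerically. In this small sketch the vectors `w_k` and `x` are made-up values chosen so that x is misclassified with y = 1; the update raises ⟨w, x⟩ by exactly y‖x‖².

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Hypothetical state: x has label y = 1 but <w_k, x> is negative (a mistake).
w_k = (-1.0, 0.0)
x, y = (1.0, 2.0), 1

before = dot(w_k, x)                                   # -1.0
w_next = tuple(wi + y * xi for wi, xi in zip(w_k, x))  # (0.0, 2.0)
after = dot(w_next, x)                                 # 4.0

# The value assigned to x grows by y * ||x||^2 = 5.
print(before, after, after - before == y * dot(x, x))  # -1.0 4.0 True
```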


slide-34
SLIDE 34

Perceptron algorithm converges

Theorem. Let S be a training set of examples, and let $R = \max_{(x,y) \in S} \|x\|$. Suppose that there exists a vector $w_{\mathrm{opt}}$ such that $\|w_{\mathrm{opt}}\| = 1$, and a number $\gamma > 0$, such that $y \langle w_{\mathrm{opt}}, x \rangle \ge \gamma$ for all $(x, y) \in S$. Then the number of mistakes made by the online perceptron algorithm on S is at most $(R / \gamma)^2$.
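To make the quantities in the theorem concrete, here is a small sketch that computes R, γ and the bound (R/γ)² for a given separating direction. The function name `mistake_bound` and the sample data are assumptions; the direction passed in is normalized first so that the theorem's condition ‖w_opt‖ = 1 holds.

```python
import math


def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def mistake_bound(S, w_opt):
    """Return (R, gamma, (R / gamma) ** 2) for a direction w_opt separating S."""
    norm = math.sqrt(dot(w_opt, w_opt))
    w = [c / norm for c in w_opt]                 # enforce ||w_opt|| = 1
    R = max(math.sqrt(dot(x, x)) for x, _ in S)   # radius of the point set
    gamma = min(y * dot(w, x) for x, y in S)      # margin achieved by w_opt
    if gamma <= 0:
        raise ValueError("w_opt does not separate S with a positive margin")
    return R, gamma, (R / gamma) ** 2


# Hypothetical data separated by the direction (1, 1).
S = [((3.0, 4.0), 1), ((4.0, 3.0), 1), ((-3.0, -4.0), -1)]
R, gamma, bound = mistake_bound(S, (1.0, 1.0))
print(R, gamma, bound)  # R = 5, gamma = 7 / sqrt(2), bound = 50 / 49
```

Here the margin is large relative to R, so the bound allows only one mistake; shrinking the margin (points closer to the separating line) makes the bound grow quadratically.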

slide-35
SLIDE 35

Claim by figure...

[Figure: two panels. "hard": R, w_opt, margin γ; # errors: (R/γ)^2. "easy": R, w_opt, margin γ′; # errors: (R/γ′)^2.]

slide-39
SLIDE 39

Proof of Perceptron convergence...

1. Idea of proof: the perceptron weight vector converges to $w_{\mathrm{opt}}$.
2. Distance between $w_{\mathrm{opt}}$ and the kth update vector: $\alpha_k = \left\| w_k - \frac{R^2}{\gamma} w_{\mathrm{opt}} \right\|^2$.
3. Quantify the change between $\alpha_k$ and $\alpha_{k+1}$.
4. The example being misclassified is (x, y).


slide-43
SLIDE 43

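Proof of Perceptron convergence...

1. The example being misclassified is (x, y) (both are constants).
2. $w_{k+1} \leftarrow w_k + y x$
3. Expanding the definition of $\alpha_{k+1}$:

$$
\begin{aligned}
\alpha_{k+1}
&= \left\| w_{k+1} - \frac{R^2}{\gamma} w_{\mathrm{opt}} \right\|^2
 = \left\| w_k + y x - \frac{R^2}{\gamma} w_{\mathrm{opt}} \right\|^2
 = \left\| \left( w_k - \frac{R^2}{\gamma} w_{\mathrm{opt}} \right) + y x \right\|^2 \\
&= \left\langle \left( w_k - \frac{R^2}{\gamma} w_{\mathrm{opt}} \right) + y x,\ \left( w_k - \frac{R^2}{\gamma} w_{\mathrm{opt}} \right) + y x \right\rangle \\
&= \left\langle w_k - \frac{R^2}{\gamma} w_{\mathrm{opt}},\ w_k - \frac{R^2}{\gamma} w_{\mathrm{opt}} \right\rangle
 + 2y \left\langle w_k - \frac{R^2}{\gamma} w_{\mathrm{opt}},\ x \right\rangle
 + \langle x, x \rangle \\
&= \alpha_k + 2y \left\langle w_k - \frac{R^2}{\gamma} w_{\mathrm{opt}},\ x \right\rangle + \|x\|^2 .
\end{aligned}
$$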


slide-52
SLIDE 52

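Proof of Perceptron convergence...

1. We proved: $\alpha_{k+1} = \alpha_k + 2y \left\langle w_k - \frac{R^2}{\gamma} w_{\mathrm{opt}},\ x \right\rangle + \|x\|^2$.
2. (x, y) is misclassified: $\mathrm{sign}(\langle w_k, x \rangle) \ne y$
3. ⟹ $\mathrm{sign}(y \langle w_k, x \rangle) = -1$
4. ⟹ $y \langle w_k, x \rangle < 0$.
5. $\|x\| \le R$ ⟹
   $\alpha_{k+1} \le \alpha_k + R^2 + 2y \langle w_k, x \rangle - 2y \left\langle \frac{R^2}{\gamma} w_{\mathrm{opt}},\ x \right\rangle \le \alpha_k + R^2 - \frac{2R^2}{\gamma}\, y \langle w_{\mathrm{opt}}, x \rangle$
6. ...since $2y \langle w_k, x \rangle < 0$.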


slide-58
SLIDE 58

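Proof of Perceptron convergence...

1. Proved: $\alpha_{k+1} \le \alpha_k + R^2 - \frac{2R^2}{\gamma}\, y \langle w_{\mathrm{opt}}, x \rangle$.
2. $\mathrm{sign}(\langle w_{\mathrm{opt}}, x \rangle) = y$.
3. By the margin assumption: $y \langle w_{\mathrm{opt}}, x \rangle \ge \gamma$ for all $(x, y) \in S$.
4. $\alpha_{k+1} \le \alpha_k + R^2 - \frac{2R^2}{\gamma}\, y \langle w_{\mathrm{opt}}, x \rangle \le \alpha_k + R^2 - \frac{2R^2}{\gamma}\, \gamma \le \alpha_k + R^2 - 2R^2 \le \alpha_k - R^2$.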


slide-65
SLIDE 65

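Proof of Perceptron convergence...

1. We have: $\alpha_{k+1} \le \alpha_k - R^2$
2. $\alpha_0 = \left\| 0 - \frac{R^2}{\gamma} w_{\mathrm{opt}} \right\|^2 = \frac{R^4}{\gamma^2} \|w_{\mathrm{opt}}\|^2 = \frac{R^4}{\gamma^2}$.
3. For all i, $\alpha_i \ge 0$.
4. Q: What is the maximum number of classification errors the algorithm can make?
5. ...the number of updates.
6. ...the number of updates is at most $\alpha_0 / R^2$...
7. A: At most $\frac{R^2}{\gamma^2}$.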
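The potential argument can be observed on a small run. The sketch below tracks α_k = ‖w_k − (R²/γ) w_opt‖² after every update; the function name `run_with_potential` and the data set are assumptions, and the loop relies on the theorem's hypotheses (a unit vector w_opt with margin γ) to terminate.

```python
import math


def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def run_with_potential(S, w_opt, R, gamma):
    """Run the perceptron on S and return the potentials [alpha_0, alpha_1, ...]."""
    target = [(R * R / gamma) * c for c in w_opt]

    def alpha(w):
        diff = [wi - ti for wi, ti in zip(w, target)]
        return dot(diff, diff)

    w = [0.0] * len(w_opt)
    alphas = [alpha(w)]
    changed = True
    while changed:                      # one pass per iteration of the loop
        changed = False
        for x, y in S:
            if y * dot(w, x) <= 0:      # mistake: apply the update rule
                w = [wi + y * xi for wi, xi in zip(w, x)]
                alphas.append(alpha(w))
                changed = True
    return alphas


# Hypothetical data separated by the unit vector (1, 1) / sqrt(2).
S = [((1.0, 2.0), 1), ((2.0, 0.5), 1), ((-1.0, -1.0), -1), ((-0.5, -2.0), -1)]
w_opt = (1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0))
R = max(math.sqrt(dot(x, x)) for x, _ in S)          # sqrt(5)
gamma = min(y * dot(w_opt, x) for x, y in S)         # sqrt(2)
alphas = run_with_potential(S, w_opt, R, gamma)
updates = len(alphas) - 1
print(updates, (R / gamma) ** 2)  # number of updates vs. the bound R^2 / gamma^2
```

On this data α starts at R⁴/γ² = 12.5 and a single update brings it down to about 2.5, a drop of more than R² = 5, consistent with the inequality α_{k+1} ≤ α_k − R².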


slide-72
SLIDE 72

Concluding comment...

Any linear program can be written as the problem of separating red points from blue points. As such, the perceptron algorithm can be used to solve linear programs.


slide-73
SLIDE 73

Learning a circle...

1. Given a set of red points and a set of blue points in the plane, we want to learn a circle that contains all the red points and does not contain any of the blue points.

2. Q: How to compute the circle $\sigma$?

3. Lifting: $\ell : (x, y) \to (x, y, x^2 + y^2)$.

4. $z(P) = \left\{ \ell(x, y) = (x, y, x^2 + y^2) \;\middle|\; (x, y) \in P \right\}$.


slide-78
SLIDE 78

Learning a circle...

Theorem

Two sets of points $R$ and $B$ are separable by a circle in two dimensions if and only if $\ell(R)$ and $\ell(B)$ are separable by a plane in three dimensions.


slide-79
SLIDE 79

Proof

1. $\sigma \equiv (x - a)^2 + (y - b)^2 = r^2$: circle containing $R$, and all points of $B$ outside.

2. $\forall (x, y) \in R$: $(x - a)^2 + (y - b)^2 \le r^2$.
   $\forall (x, y) \in B$: $(x - a)^2 + (y - b)^2 > r^2$.

3. $\forall (x, y) \in R$: $-2ax - 2by + (x^2 + y^2) - r^2 + a^2 + b^2 \le 0$.
   $\forall (x, y) \in B$: $-2ax - 2by + (x^2 + y^2) - r^2 + a^2 + b^2 > 0$.

4. Setting $z = z(x, y) = x^2 + y^2$:
   $h(x, y, z) = -2ax - 2by + z - r^2 + a^2 + b^2$,
   $\forall (x, y) \in R$: $h(x, y, z(x, y)) \le 0$.

5. $\iff$ $\forall (x, y) \in R$: $h(\ell(x, y)) \le 0$, and $\forall (x, y) \in B$: $h(\ell(x, y)) > 0$.

6. $p \in \sigma \iff h(\ell(p)) \le 0$.

7. Proved: if the point set is separable by a circle $\implies$ the lifted point sets $\ell(R)$ and $\ell(B)$ are separable by a plane.
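This direction of the proof can be traced in code. The sketch below builds the coefficients of h from a circle (a, b, r) and evaluates h on lifted points. The function names, the circle, and the sample points are all hypothetical choices for illustration.

```python
def lift(p):
    """Lifting map l(x, y) = (x, y, x^2 + y^2)."""
    x, y = p
    return (x, y, x * x + y * y)

def circle_to_plane(a, b, r):
    """Coefficients (A, B, C, D) of h(x, y, z) = A*x + B*y + C*z + D
    for the circle with centre (a, b) and radius r."""
    return (-2.0 * a, -2.0 * b, 1.0, a * a + b * b - r * r)

def h_value(plane, q):
    """Evaluate h at the 3D point q."""
    A, B, C, D = plane
    x, y, z = q
    return A * x + B * y + C * z + D

# Hypothetical circle: centre (1, 2), radius 3.
plane = circle_to_plane(1.0, 2.0, 3.0)
red = [(1.0, 2.0), (3.0, 3.0), (1.0, -0.5)]    # inside the circle
blue = [(5.0, 2.0), (-3.0, 2.0), (1.0, 6.0)]   # outside the circle

for p in red + blue:
    print(p, h_value(plane, lift(p)))
```

For every point, h(ℓ(p)) equals the squared distance to the centre minus r², so its sign tells on which side of the circle the point lies.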


slide-86
SLIDE 86

Proof: Other direction

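1. Assume $\ell(R)$ and $\ell(B)$ are linearly separable. Let the separating plane be $h \equiv ax + by + cz + d = 0$.

2. $\forall (x, y, x^2 + y^2) \in \ell(R)$: $ax + by + c(x^2 + y^2) + d \le 0$.

3. $\forall (x, y, x^2 + y^2) \in \ell(B)$: $ax + by + c(x^2 + y^2) + d \ge 0$.

4. $U(h) = \left\{ (x, y) \;\middle|\; h((x, y, x^2 + y^2)) \le 0 \right\}$.

5. If $U(h)$ is a circle $\implies$ $R \subset U(h)$ and $B \cap U(h) = \emptyset$.

6. $U(h) \equiv ax + by + c(x^2 + y^2) \le -d$.

7. Dividing by $c$ (assuming $c > 0$): $\iff \left( x^2 + \frac{a}{c} x \right) + \left( y^2 + \frac{b}{c} y \right) \le -\frac{d}{c}$.

8. $\iff \left( x + \frac{a}{2c} \right)^2 + \left( y + \frac{b}{2c} \right)^2 \le \frac{a^2 + b^2}{4c^2} - \frac{d}{c}$.

9. This is a disk in the plane, as claimed.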
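The completed square gives the disk explicitly. A minimal sketch, assuming c > 0 (the case in which dividing by c keeps the direction of the inequality); the function name `plane_to_disk` is hypothetical.

```python
import math

def plane_to_disk(a, b, c, d):
    """Return (centre_x, centre_y, radius) of
    U(h) = {(x, y) : a*x + b*y + c*(x^2 + y^2) + d <= 0}, assuming c > 0."""
    if c <= 0:
        raise ValueError("this sketch only handles c > 0")
    cx = -a / (2.0 * c)
    cy = -b / (2.0 * c)
    r_squared = (a * a + b * b) / (4.0 * c * c) - d / c
    if r_squared < 0:
        raise ValueError("U(h) is empty")
    return cx, cy, math.sqrt(r_squared)

# The plane -2x - 4y + z - 4 = 0 corresponds to the circle with centre (1, 2), radius 3.
print(plane_to_disk(-2.0, -4.0, 1.0, -4.0))
```

Scaling all four coefficients by the same positive constant leaves the disk unchanged, as expected for a plane equation.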


slide-95
SLIDE 95

A closing comment...

Linear separability is a powerful technique that can be used to learn concepts that are considerably more complicated than just hyperplane separation. The lifting technique shown above is known as the kernel technique, or linearization.
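As an illustration of linearization, the sketch below learns a circle by running a perceptron on lifted points. It assumes the standard perceptron update, appends a constant coordinate 1 to the lifted point so that the offset d becomes part of the weight vector, and uses hypothetical data: red points on a circle of radius 0.5 and blue points on a circle of radius 1.5.

```python
import math

def lift(p):
    """Lift (x, y) to (x, y, x^2 + y^2, 1); the trailing 1 absorbs the offset d."""
    x, y = p
    return (x, y, x * x + y * y, 1.0)

def perceptron(samples, max_epochs=10_000):
    """Standard perceptron; returns a weight vector that separates the samples."""
    w = [0.0] * len(samples[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in samples:
            if label * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + label * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            return w
    raise RuntimeError("no separator found within max_epochs")

# Hypothetical data: red inside (label -1), blue outside (label +1).
angles = [2.0 * math.pi * k / 8 for k in range(8)]
red = [(0.5 * math.cos(t), 0.5 * math.sin(t)) for t in angles]
blue = [(1.5 * math.cos(t), 1.5 * math.sin(t)) for t in angles]
samples = [(lift(p), -1) for p in red] + [(lift(p), +1) for p in blue]

a, b, c, d = perceptron(samples)

def inside(p):
    """True when p lies in U(h) = {(x, y) : h(lift(p)) <= 0}."""
    x, y = p
    return a * x + b * y + c * (x * x + y * y) + d <= 0

print("plane coefficients:", (a, b, c, d))
```

The learned plane (a, b, c, d) in the lifted space corresponds to a disk in the plane that contains every red point and no blue point.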


slide-96
SLIDE 96

A Little Bit On VC Dimension

1. Q: How complex is the function we are trying to learn?

2. VC-dimension is one way of capturing this notion (VC = Vapnik, Chervonenkis, 1971).

3. A matter of expressivity. What is harder to learn:

   1. A rectangle in the plane.
   2. A halfplane.
   3. A convex polygon with k sides.


slide-99
SLIDE 99

Thinking about concepts as binary functions...

1. $X = \{p_1, p_2, \ldots, p_m\}$: points in the plane.

2. $H$: set of all halfplanes.

3. A halfplane $r \in H$ defines a binary vector $r(X) = (b_1, \ldots, b_m)$, where $b_i = 1$ if and only if $p_i$ is inside $r$.

4. Possible binary vectors generated by halfplanes: $U(X, H) = \{ r(X) \mid r \in H \}$.

5. A set $X$ of $m$ elements is shattered by a set of ranges $R$ if $|U(X, R)| = 2^m$.

6. What does this mean?

7. The VC-dimension of a set of ranges $R$ is the size of the largest set that it can shatter.
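The definition of shattering can be tested by brute force for halfplanes. The sketch below is one possible approach, not taken from the lecture: it sorts the points along one generic direction between each pair of consecutive critical directions (normals of lines through two points) and collects every prefix of that order, since a halfplane always cuts off such a prefix. Subsets are stored as sets of indices rather than binary vectors; the function names are hypothetical.

```python
import math
from itertools import combinations

def halfplane_subsets(points):
    """All subsets of `points` (as frozensets of indices) cut out by halfplanes."""
    crit = []
    for (x1, y1), (x2, y2) in combinations(points, 2):
        dx, dy = x2 - x1, y2 - y1
        # The two directions perpendicular to the line through the pair.
        crit.append(math.atan2(dx, -dy))
        crit.append(math.atan2(-dx, dy))
    if not crit:                       # fewer than two points
        crit = [0.0, math.pi]
    crit.sort()
    subsets = set()
    m = len(crit)
    for i in range(m):
        lo = crit[i]
        hi = crit[(i + 1) % m] + (2.0 * math.pi if i == m - 1 else 0.0)
        if hi - lo < 1e-9:             # skip duplicate critical directions
            continue
        theta = (lo + hi) / 2.0        # generic direction: no ties in projection
        ux, uy = math.cos(theta), math.sin(theta)
        order = sorted(range(len(points)),
                       key=lambda k: ux * points[k][0] + uy * points[k][1])
        for j in range(len(order) + 1):
            subsets.add(frozenset(order[:j]))
    return subsets

def is_shattered(points):
    """True when halfplanes realize all 2^m subsets of the m points."""
    return len(halfplane_subsets(points)) == 2 ** len(points)

triangle = [(0, 0), (1, 0), (0, 1)]
quad = [(0, 0), (4, 1), (5, 5), (1, 3)]
print(len(halfplane_subsets(triangle)), is_shattered(triangle))
print(len(halfplane_subsets(quad)), is_shattered(quad))
```

Three points in general position are shattered, while the four points in convex position are not: no halfplane contains exactly the two endpoints of a diagonal.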


slide-104
SLIDE 104

Examples

What is the VC dimension of circles in the plane?

$X$ is a set of $n$ points in the plane, and $C$ is the set of all circles.

$X = \{p, q, r, s\}$. What subsets of $X$ can we generate by a circle?

[Figure: four points p, q, r, s in the plane.]


slide-106
SLIDE 106

Subsets realized by disks

[Figure: the four points p, q, r, s.]

$\{\}$, $\{r\}$, $\{p\}$, $\{q\}$, $\{s\}$, $\{p, s\}$, $\{p, q\}$, $\{p, r\}$, $\{r, q\}$, $\{q, s\}$, and $\{r, p, q\}$, $\{p, r, s\}$, $\{p, s, q\}$, $\{s, q, r\}$, and $\{r, p, q, s\}$.

We got only 15 sets. There is one set which is not there. Which one?

The VC dimension of circles in the plane is 3.
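The count can be checked mechanically. The sketch below enumerates all 16 subsets of {p, q, r, s} and subtracts the 15 subsets listed above; it only does bookkeeping on the list and does not test geometric realizability.

```python
from itertools import combinations

points = ["p", "q", "r", "s"]

# All 2^4 = 16 subsets of the four points.
all_subsets = {frozenset(c) for k in range(len(points) + 1)
               for c in combinations(points, k)}

# The 15 subsets listed on the slide, one string per subset.
listed = {frozenset(s) for s in [
    "", "r", "p", "q", "s",
    "ps", "pq", "pr", "rq", "qs",
    "rpq", "prs", "psq", "sqr",
    "rpqs",
]}

missing = all_subsets - listed
print(len(listed), [sorted(s) for s in missing])
```

Exactly one subset is absent from the list, so these four points are not shattered.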


slide-109
SLIDE 109

Sauer’s Lemma

Lemma (Sauer Lemma)

If $R$ has VC dimension $d$ then $|U(X, R)| = O(m^d)$, where $m$ is the size of $X$.
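The lemma is usually stated with the explicit bound $\sum_{i=0}^{d} \binom{m}{i}$, which is $O(m^d)$ for fixed $d$. This explicit form is not on the slide and is added here as the standard statement. A small sketch comparing it with $2^m$ (the function name `sauer_bound` is hypothetical):

```python
from math import comb

def sauer_bound(m, d):
    """Sum of C(m, i) for i = 0..d: the explicit bound behind the O(m^d) statement."""
    return sum(comb(m, i) for i in range(d + 1))

for m in (3, 4, 10, 20):
    print(m, sauer_bound(m, 3), 2 ** m)
```

For d = 3 and m = 4 the bound is 15, matching the 15 subsets realized by disks on four points, while for larger m it grows polynomially and falls far below 2^m.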
