Fast K-Means with Accurate Bounds, James Newling & François Fleuret (presentation slides)



SLIDE 1

Fast K-Means with Accurate Bounds

James Newling & François Fleuret

Idiap Research Institute

Computer Vision and Learning Group

& EPFL

June 20th, 2016

ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE

SLIDE 2

K-Means

Problem Statement and Lloyd’s Algorithm

Given data $(x_i)_{i=1}^{N} \in (\mathbb{R}^d)^N$, find centers $(c_k)_{k=1}^{K} \in (\mathbb{R}^d)^K$ minimising

$$\sum_{i=1}^{N} \min_{k=1:K} \|x_i - c_k\|^2.$$

  • NP-hard, so heuristic algorithms such as Lloyd’s are used
  • Lloyd’s algorithm run for T iterations requires dKNT FLOPs
  • We are interested in making it faster
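As a point of reference for the cost quoted above, here is a minimal sketch of plain Lloyd’s algorithm in pure Python. The function names `sq_dist` and `lloyd`, the `max_iters` parameter, and the stopping rule (stop when no assignment changes) are illustrative assumptions and are not taken from the slides.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(data, centers, max_iters=100):
    """Plain Lloyd's algorithm; returns (centers, assignments, energy).

    Assumes max_iters >= 1 and at least one center.
    """
    centers = [list(c) for c in centers]
    assignments = None
    for _ in range(max_iters):
        # Assignment step: N * K distance computations, each costing O(d).
        new_assignments = [
            min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))
            for x in data
        ]
        if new_assignments == assignments:
            break  # no assignment changed, so the centers would not move
        assignments = new_assignments
        # Update step: move each center to the mean of its assigned points.
        for k in range(len(centers)):
            members = [x for x, a in zip(data, assignments) if a == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    # Energy: the k-means objective evaluated at the final assignment.
    energy = sum(sq_dist(x, centers[a]) for x, a in zip(data, assignments))
    return centers, assignments, energy
```

Calling `lloyd(data, initial_centers)` on a list of equal-length tuples runs the assignment and update steps until the assignments stop changing. The assignment step is the part that the accelerated algorithms in the rest of the talk try to avoid.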

SLIDES 3-15

Lloyd’s Algorithm

[Animation: data points (×) and centers (•). For each of iterations 1 to 4, successive frames show the assignment of a single datapoint, then all assignments, then the center updates.]
SLIDES 16-17

Lloyd’s Algorithm

How to Accelerate

Two approaches:

(1) approximate it (only exact methods for the next 13 minutes)

(2) be more efficient: get exactly the same output as Lloyd’s algorithm without computing all data-center distances

  • Pelleg et al. (1999)
  • Kanungo et al. (2002)
  • ∆ Hamerly (2010)
  • ∆ Elkan (2003), best high-d
  • ∆ Yinyang (2015), best mid-d
  • ∆ Annular (2013), best low-d

SLIDES 18-19

Using The Triangle Inequality

Elkan’s Two Techniques

Elkan uses the triangle inequality in two distinct ways:

(1) center-center distances to bound data-center distances

(2) directly maintain bounds on data-center distances

[Figure: a data point (×) and centers (•), with upper bounds U and lower bounds L.]

(A) We show that (1) + (2) is slower than just (2). Simplifying helps!
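The two uses of the triangle inequality can be sketched as below. This is a minimal illustration under assumptions, not Elkan’s full algorithm: the function names, the argument layout, and the clamping of lower bounds at zero are choices made here for clarity.

```python
import math


def dist(p, q):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def can_skip_by_center_distance(u, d_center_center):
    """Technique (1): if ||c_a - c_k|| >= 2 * u, where u >= ||x - c_a||,
    the triangle inequality gives ||x - c_k|| >= ||x - c_a||."""
    return d_center_center >= 2.0 * u


def drift_bounds(u, lowers, moves, a):
    """Technique (2): keep the bounds valid after the centers have moved.

    u      : upper bound on the distance from x to its assigned center a
    lowers : lowers[k] is a lower bound on the distance from x to center k
    moves  : moves[k] is the distance center k moved in the last update
    """
    new_u = u + moves[a]
    new_lowers = [max(0.0, l - p) for l, p in zip(lowers, moves)]
    return new_u, new_lowers


def can_skip_by_bounds(u, lower_k):
    """Center k cannot be strictly closer than the assigned center if u <= lower_k."""
    return u <= lower_k
```

In both cases a passed test means the distance from x to center k does not need to be computed in this round.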

SLIDES 20-22

Using The Triangle Inequality

  • Elkan: K − 1 lower bounds
  • Yinyang: group lower bounds
  • Hamerly: 1 lower bound

[Figure, one frame per algorithm: a data point (×) and centers (•), with upper bound U and lower bound(s) L.]
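To make the difference in bookkeeping concrete, the sketch below derives the three kinds of lower bounds from one set of exact distances for a single data point. The function names and the representation of groups as lists of center indices are assumptions made for illustration; excluding the assigned center from its group is also an assumption of this sketch.

```python
def elkan_lower_bounds(dists, a):
    """One lower bound per non-assigned center: K - 1 values."""
    return {k: d for k, d in enumerate(dists) if k != a}


def yinyang_lower_bounds(dists, a, groups):
    """One lower bound per group of centers: the minimum distance over the
    group's members, excluding the assigned center."""
    bounds = []
    for group in groups:
        others = [dists[k] for k in group if k != a]
        bounds.append(min(others) if others else float("inf"))
    return bounds


def hamerly_lower_bound(dists, a):
    """A single lower bound: the distance to the nearest non-assigned center."""
    return min(d for k, d in enumerate(dists) if k != a)
```

Fewer bounds mean less memory and less updating per point, at the price of a looser test: the single Hamerly bound equals the smallest of the Elkan bounds.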

SLIDES 23-34

Lower bound updating

[Animation: a data point (×) and a center (•) over successive frames; the final frame labels the ‖Σ·‖-bound (norm of sum) and the Σ‖·‖-bound (sum of norms).]

SLIDES 35-36

‖Σ·‖-bounds

All upper and lower bounds in Elkan, Hamerly, Yinyang and Annular are Σ‖·‖-bounds (sum of norms), and can be replaced by tighter ‖Σ·‖-bounds (norm of sum).

There is a cost to ‖Σ·‖-bounds: additional memory is required to

  • Store historical centers from all rounds
  • Store the round in which bounds are made tight

This memory overhead can be controlled by periodically clearing the history, which requires a Σ‖·‖-bound update.

(B) We show that ‖Σ·‖-bounding generally improves algorithms.
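The difference between the two kinds of bound can be illustrated for a single lower bound. In this sketch, `history` is a hypothetical list holding one center’s positions, from the round in which the bound was last tight (`history[0]`) to the current round (`history[-1]`). The names and data layout are assumptions, not the authors’ implementation.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a sequence."""
    return math.sqrt(sum(a * a for a in v))


def sub(p, q):
    """Componentwise difference p - q."""
    return [a - b for a, b in zip(p, q)]


def sum_of_norms_lower_bound(tight_dist, history):
    """Sum-of-norms update: subtract the norm of every per-round displacement."""
    drift = sum(norm(sub(new, old)) for old, new in zip(history, history[1:]))
    return tight_dist - drift


def norm_of_sum_lower_bound(tight_dist, history):
    """Norm-of-sum update: subtract the norm of the net displacement since the
    round in which the bound was tight (this needs the stored old center)."""
    return tight_dist - norm(sub(history[-1], history[0]))
```

By the triangle inequality the net displacement is never longer than the summed per-round displacements, so the norm-of-sum lower bound is never smaller than the sum-of-norms one. The gap is largest when a center oscillates rather than drifting in one direction.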

SLIDES 37-40

Hamerly (2010)

[Animation: a data point (×) and centers (•), in four frames:]

  • bound test, failure 1
  • bound test, failure 2
  • compute all distances
  • reset bounds
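The four frames above can be read as one assignment step for a single data point, sketched below. The mapping of "failure 1" and "failure 2" onto the two tests is an interpretation made here, and the function signature is hypothetical; the sketch assumes at least two centers.

```python
import math


def dist(p, q):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def hamerly_assign(x, centers, a, u, l):
    """One assignment step for point x in the style of Hamerly (2010).

    a : index of the currently assigned center
    u : upper bound on ||x - centers[a]||
    l : lower bound on the distance from x to every other center
    Returns (a, u, l, n_dist), where n_dist counts distance computations.
    """
    if u <= l:
        return a, u, l, 0          # bound test passes: nothing to compute
    u = dist(x, centers[a])        # failure 1: tighten the upper bound
    if u <= l:
        return a, u, l, 1          # the tightened bound passes the test
    # Failure 2: compute all distances, then reset both bounds to be tight.
    dists = [dist(x, c) for c in centers]
    order = sorted(range(len(centers)), key=dists.__getitem__)
    a, b = order[0], order[1]
    return a, dists[a], dists[b], 1 + len(centers)
```

The last branch is the expensive one, and it is where the elimination zones on the following slides reduce the number of centers to check.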
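SLIDE 41

Eliminating distance calculations

$$c \notin B(x, r) \Rightarrow c \notin \{c_a^{\text{new}}, c_b^{\text{new}}\}, \qquad r = \max_{c \in \{c_a^{\text{old}}, c_b^{\text{old}}\}} \|x - c\|$$

[Figure: a data point (×), the ball B(x, r) of radius r, and the centers $c_a^{\text{old}}$ and $c_b^{\text{old}}$ (•).]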
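A short justification of the implication, added here as a reading aid rather than taken from the slides. It assumes that $a$ and $b$ index the previously nearest and second-nearest centers of $x$, with distances measured to their current positions, and that $c$ is some other center.

```latex
\begin{align*}
\|x - c\| > r = \max\bigl(\|x - c_a^{\text{old}}\|, \|x - c_b^{\text{old}}\|\bigr)
  &\implies \|x - c\| > \|x - c_a^{\text{old}}\| \ \text{and}\ \|x - c\| > \|x - c_b^{\text{old}}\| \\
  &\implies \text{at least two centers are strictly closer to } x \text{ than } c \\
  &\implies c \notin \{c_a^{\text{new}}, c_b^{\text{new}}\}.
\end{align*}
```

So only centers inside the ball can become the new nearest or second-nearest center, and the remaining question is how to find them cheaply.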

SLIDES 42-46

Annular (2013) elimination zone

  • $\|c\| > \|x\| + r \Rightarrow c \notin B(x, r)$
  • $\|c\| < \|x\| - r \Rightarrow c \notin B(x, r)$
  • combined: $\bigl|\|c\| - \|x\|\bigr| > r \Rightarrow c \notin B(x, r)$

(• : centers eliminated)

Elimination is O(log N) if the centers c are sorted.

[Figure: a data point (×), the radius r, and the annulus bounded by ‖x‖ − r and ‖x‖ + r; in the final frame the remaining centers are marked with ?.]
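A minimal sketch of the annular test, assuming the centers are sorted by their norm so that both ends of the annulus can be located by binary search with the standard `bisect` module. The helper names and the two-list layout are illustrative assumptions.

```python
import bisect
import math


def norm(v):
    """Euclidean norm of a vector given as a sequence."""
    return math.sqrt(sum(a * a for a in v))


def sort_centers_by_norm(centers):
    """Return (sorted_norms, sorted_ids): center norms in ascending order and
    the matching center indices."""
    pairs = sorted((norm(c), k) for k, c in enumerate(centers))
    return [n for n, _ in pairs], [k for _, k in pairs]


def annular_candidates(x, r, sorted_norms, sorted_ids):
    """Indices of centers whose norm lies in [||x|| - r, ||x|| + r].

    Every other center satisfies | ||c|| - ||x|| | > r and is outside B(x, r).
    """
    nx = norm(x)
    lo = bisect.bisect_left(sorted_norms, nx - r)
    hi = bisect.bisect_right(sorted_norms, nx + r)
    return sorted_ids[lo:hi]
```

The test is conservative: a center inside the annulus may still lie outside the ball, so the surviving candidates still need an exact distance computation.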

SLIDES 47-58

Exponion (ours) elimination zone

$$\|c - c_a^{\text{old}}\| > 2\|x - c_a^{\text{old}}\| + \|x - c_b^{\text{old}}\| \Rightarrow c \notin B(x, r)$$

[Animation: a data point (×) with the ball of radius r, and the center $c_a^{\text{old}}$ (•); the final frame adds the radius R.]
SLIDES 59-60

Exponion (ours) elimination zone

$$\|c - c_a^{\text{old}}\| > R \Rightarrow c \notin B(x, r), \qquad R = 2\|x - c_a^{\text{old}}\| + \|x - c_b^{\text{old}}\|$$

(• : centers eliminated)

[Figure: a data point (×) with the ball of radius r, enclosed in the ball of radius R around $c_a^{\text{old}}$ (•).]

(C) We find that Exponion is generally faster than Annular
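The Exponion test can be sketched in the same way as the annular one, with distances from the previously assigned center taking the place of norms. This is a simplified illustration that fully sorts the other centers by distance from each center and then uses binary search. It is not claimed to match the authors’ implementation, and the function names are hypothetical.

```python
import bisect
import math


def dist(p, q):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def build_neighbour_table(centers):
    """For each center a, the other centers sorted by distance from c_a,
    stored as a pair (sorted distances, matching center indices)."""
    table = []
    for a, ca in enumerate(centers):
        pairs = sorted((dist(ca, c), k) for k, c in enumerate(centers) if k != a)
        table.append(([d for d, _ in pairs], [k for _, k in pairs]))
    return table


def exponion_candidates(x, centers, a, b, table):
    """Centers other than a that lie within R of c_a, where
    R = 2 * ||x - c_a|| + ||x - c_b||; all other centers are outside B(x, r)."""
    R = 2.0 * dist(x, centers[a]) + dist(x, centers[b])
    dists, ids = table[a]
    return ids[: bisect.bisect_right(dists, R)]
```

The neighbour table depends only on the centers, so it is built once per round and shared by all data points. Unlike the annulus, the search region is a ball around a center close to x rather than a shell around the origin.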
slide-61
SLIDE 61

Experiments and Results 22 datasets (d : 2 → 784, N : 60k → 2.6m) and K ∈ {100, 1000} 4 public code bases (mlpack, BaylorML, PowerGraph, VLFeat) + + all from scratch

9 / 9

slide-62
SLIDE 62

Experiments and Results 22 datasets (d : 2 → 784, N : 60k → 2.6m) and K ∈ {100, 1000} 4 public code bases (mlpack, BaylorML, PowerGraph, VLFeat) + + all from scratch (A) Simplification accelerates,

  • Elkan in 16/18 high-d experiments, mean speed-up 15%
  • Yinyang in 43/44 all-d experiments, mean speed-up 60%

9 / 9

slide-63
SLIDE 63

Experiments and Results 22 datasets (d : 2 → 784, N : 60k → 2.6m) and K ∈ {100, 1000} 4 public code bases (mlpack, BaylorML, PowerGraph, VLFeat) + + all from scratch (A) Simplification accelerates,

  • Elkan in 16/18 high-d experiments, mean speed-up 15%
  • Yinyang in 43/44 all-d experiments, mean speed-up 60%

(B) Replacing · -bounding by ·-bounding helps

  • In high-d speed-up in 15/20 experiments, mean speed-up of

12%

9 / 9

slide-64
SLIDE 64

Experiments and Results 22 datasets (d : 2 → 784, N : 60k → 2.6m) and K ∈ {100, 1000} 4 public code bases (mlpack, BaylorML, PowerGraph, VLFeat) + + all from scratch (A) Simplification accelerates,

  • Elkan in 16/18 high-d experiments, mean speed-up 15%
  • Yinyang in 43/44 all-d experiments, mean speed-up 60%

(B) Replacing · -bounding by ·-bounding helps

  • In high-d speed-up in 15/20 experiments, mean speed-up of

12% (C) Exponion is generally faster than Annular

  • In low-d Exponion is faster than Annular in 18/22 experiments,

mean speed-up of 35%

9 / 9

SLIDES 65-66

Conclusion

Speed-up: run-times of any of the other 4 implementations of any algorithm relative to our fastest implementations of our algorithms

[Plot: fraction of data sets (0.2 to 1) and speed-up (1 to 8), with curves for K = 10^2 and K = 10^3.]

Our multi-threaded & easy-to-use code is available under an open source licence

SLIDE 67

The end

james.newling@idiap.ch