Collaborative Filtering Yun-Ta Tsai 1 , Markus Steinberger 2 , Dawid - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Yun-Ta Tsai¹, Markus Steinberger², Dawid Pająk³, Kari Pulli⁴

FastANN for High Quality Collaborative Filtering

¹Google, Inc. ²TU Graz ³NVIDIA Research ⁴Light

slide-2
SLIDE 2

Story

slide-3
SLIDE 3
slide-4
SLIDE 4
slide-5
SLIDE 5
slide-6
SLIDE 6

Collaborative Filtering - Aggregation

slide-7
SLIDE 7

Collaborative Filtering - Result

slide-8
SLIDE 8

Collaborative Filtering - Result

slide-9
SLIDE 9

Related Work

slide-10
SLIDE 10

Garcia et al. ICIP 2010

Distance Table
0.13  1.43  0.98  1.33  1.21  2.33
1.22  0.31  0.45  2.01  1.75  0.48
2.11  1.12  0.92  3.16  0.33  0.21

Distance Table (sorted)
0.13  0.98  1.21  1.33  1.43  2.33
0.31  0.45  0.48  1.22  1.75  2.01
0.21  0.33  0.92  1.12  2.11  3.16
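The distance-table idea on this slide can be sketched roughly as follows: build a table of distances, sort each group, and keep the first k entries as the nearest neighbors. This is only an illustrative sketch, not the cited implementation. The function name `knn_from_distance_table` is hypothetical, and splitting the 18 values into three groups of six is an assumption (it is the grouping under which the sorted values on the slide line up).

```python
def knn_from_distance_table(table, k):
    """For each row of distances, return the indices of the k smallest
    entries, nearest first (brute-force selection by sorting)."""
    result = []
    for row in table:
        order = sorted(range(len(row)), key=lambda j: row[j])
        result.append(order[:k])
    return result


# Values taken from the slide; the 3 x 6 grouping is an assumption.
distance_table = [
    [0.13, 1.43, 0.98, 1.33, 1.21, 2.33],
    [1.22, 0.31, 0.45, 2.01, 1.75, 0.48],
    [2.11, 1.12, 0.92, 3.16, 0.33, 0.21],
]

# Sorting each group reproduces the "Distance Table (sorted)" values.
sorted_table = [sorted(row) for row in distance_table]

# Indices of the two nearest candidates per group.
neighbors = knn_from_distance_table(distance_table, k=2)
```

Sorting every row costs O(n log n) per query even though only k entries are needed, which is one reason an exhaustive table becomes expensive for large candidate sets.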

slide-11
SLIDE 11

Cayton et al. ADMS 2010

slide-12
SLIDE 12

Adams et al. SIGGRAPH 2009

slide-13
SLIDE 13

Limitation

slide-14
SLIDE 14
slide-15
SLIDE 15

Our solution

slide-16
SLIDE 16

Efficient implementation on GPU
General solution for different filters
High image quality
Applicable to different applications

Our solution

slide-17
SLIDE 17

Our solution

slide-18
SLIDE 18

Our solution

Clustering kNN Query Filtering

slide-19
SLIDE 19

Design challenges

Register pressure
Memory access pattern
Thread divergence
Kernel launch overhead
Memory footprint

slide-20
SLIDE 20

Our solution

Clustering kNN Query Filtering

slide-21
SLIDE 21

Tiling

 

slide-22
SLIDE 22

Clustering

slide-23
SLIDE 23

Clustering

slide-24
SLIDE 24

Warp-wide operation

3 8 2 6 3 9 1 4

slide-25
SLIDE 25

Warp-wide operation

 3  8  2  6  3  9  1  4
    3  8  2  6  3  9  1

slide-26
SLIDE 26

Warp-wide operation

3 11 10 8 9 12 10 5

slide-27
SLIDE 27

Warp-wide operation

 3 11 10  8  9 12 10  5
       3 11 10  8  9 12

slide-28
SLIDE 28

Warp-wide operation

3 11 13 19 19 20 19 17

slide-29
SLIDE 29

Warp-wide operation

 3 11 13 19 19 20 19 17
             3 11 13 19

slide-30
SLIDE 30

Warp-wide operation

3 11 13 19 22 31 32 36

Reduce register usage
Better parallelism
Minimize thread and warp divergence
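The number sequences on the warp-wide operation slides match an inclusive prefix sum computed by repeatedly adding a copy of the lanes shifted by 1, 2, then 4 positions. The sketch below emulates that pattern on the CPU in plain Python. On a GPU the shift would typically be a warp shuffle rather than a list copy; the function name `warp_inclusive_scan` and the 8-lane width are assumptions made to mirror the slide, not details from the paper.

```python
def warp_inclusive_scan(lanes):
    """Emulate a shift-and-add inclusive scan across the lanes of a warp.

    Returns the lane values after each step, so the intermediate states
    can be compared against the slides.
    """
    values = list(lanes)
    steps = []
    offset = 1
    while offset < len(values):
        # Lane i reads the value held by lane i - offset (if that lane exists).
        shifted = [values[i - offset] if i >= offset else None
                   for i in range(len(values))]
        # Lanes without a source lane keep their value unchanged.
        values = [v if s is None else v + s for v, s in zip(values, shifted)]
        steps.append(list(values))
        offset *= 2
    return steps


steps = warp_inclusive_scan([3, 8, 2, 6, 3, 9, 1, 4])
for state in steps:
    print(state)
```

The scan finishes in log2(width) steps, and every lane executes the same instruction in each step, which fits the listed goals of low register use and little divergence.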

slide-31
SLIDE 31

Clustering

kNN kNN kNN kNN

slide-32
SLIDE 32

Our solution

Clustering

slide-33
SLIDE 33

Our solution

kNN Query

slide-34
SLIDE 34

kNN Query

kNN kNN kNN kNN

slide-35
SLIDE 35

kNN Query

kNN p0 p1 p2 p3 p4 p0 p1 p2 p3 p4 p0 p1 p2 p3 p4

slide-36
SLIDE 36

kNN Query

kNN p0 p1 p2 p3 p4 p0 p1 p2 p3 p4 p0 p1 p2 p3 p4

slide-37
SLIDE 37

kNN Query

kNN

0.0 0.4 1.3 0.9 0.7

p0 p0 p1 p2 p3 p4 p0 p1 p2 p3 p4

slide-38
SLIDE 38

kNN Query

        p0   p1   p2   p3   p4
p0      0.0  0.4  1.3  0.9  0.7
≤ 0.8   1    1              1

slide-39
SLIDE 39

kNN Query

        p0   p1   p2   p3   p4
p0      0.0  0.4  1.3  0.9  0.7
≤ 0.0   1

slide-40
SLIDE 40

kNN Query

        p0   p1   p2   p3   p4
p0      0.0  0.4  1.3  0.9  0.7
≤ 0.0   1
≤ 0.4   1    1

slide-41
SLIDE 41

kNN Query

        p0   p1   p2   p3   p4
p0      0.0  0.4  1.3  0.9  0.7
≤ 0.0   1
≤ 0.4   1    1
≤ 1.3   1    1    1    1    1

slide-42
SLIDE 42

kNN Query

        p0   p1   p2   p3   p4
p0      0.0  0.4  1.3  0.9  0.7
≤ 0.0   1
≤ 0.4   1    1
≤ 1.3   1    1    1    1    1
≤ 0.9   1    1         1    1

slide-43
SLIDE 43

kNN Query

        p0   p1   p2   p3   p4
p0      0.0  0.4  1.3  0.9  0.7
≤ 0.0   1
≤ 0.4   1    1
≤ 1.3   1    1    1    1    1
≤ 0.9   1    1         1    1
≤ 0.7   1    1              1

slide-44
SLIDE 44

kNN Query

        p0   p1   p2   p3   p4
p0      0.0  0.4  1.3  0.9  0.7
≤ 0.0   1
≤ 0.4   1    1
≤ 1.3   1    1    1    1    1
≤ 0.9   1    1         1    1
≤ 0.7   1    1              1
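One way to read the voting slides: each candidate distance acts as a threshold, every candidate votes 1 if its distance is within that threshold, and the threshold whose vote count reaches k selects the neighbors, which are then stored as a bit mask. The sketch below follows that reading. The function name `vote_knn`, the tie handling, and the choice k = 4 are assumptions; k = 4 is used because it reproduces the first row "1 1 0 1 1" of the binary table on slide 45.

```python
def vote_knn(distances, k):
    """Select k nearest candidates by threshold voting instead of sorting.

    Returns (counts, mask): counts[i] is the number of candidates whose
    distance is <= distances[i]; mask[j] is 1 if candidate j is selected.
    """
    if not 0 < k <= len(distances):
        raise ValueError("k must be between 1 and the number of candidates")
    # votes[i][j] == 1 when candidate j lies within threshold distances[i].
    votes = [[1 if d <= t else 0 for d in distances] for t in distances]
    counts = [sum(row) for row in votes]
    # Pick the tightest threshold that still admits at least k candidates.
    best = min((i for i, c in enumerate(counts) if c >= k),
               key=lambda i: counts[i])
    return counts, votes[best]


distances_from_p0 = [0.0, 0.4, 1.3, 0.9, 0.7]  # values from the slides
counts, mask = vote_knn(distances_from_p0, k=4)

# Binary coding: pack the mask into one integer, bit j for candidate p_j.
code = sum(bit << j for j, bit in enumerate(mask))
```

Each threshold can be evaluated independently, so the counting maps naturally onto parallel lanes, and the packed mask needs one bit per candidate.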

slide-45
SLIDE 45

kNN Query

kNN p0 p1 p2 p3 p4

1 1 0 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 0

slide-46
SLIDE 46

Filtering

slide-47
SLIDE 47

Our solution

Hierarchical 2-mean clustering
Warp-wide operators
kNN search
Distance table
Voting
Binary coding
Filtering and aggregation

Clustering kNN Query Filtering
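As a rough illustration of the "hierarchical 2-mean clustering" item, the sketch below recursively splits a set of vectors in two with a small 2-means loop until every cluster holds at most `max_size` vectors. The seeding rule (first point and the point farthest from it), the iteration count, the fallback split, and all function names are assumptions for illustration; the slides do not specify these details.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(p[d] for p in points) / len(points)
            for d in range(len(points[0]))]


def two_means(points, iterations=8):
    """Split points into two groups with a few 2-means iterations."""
    c0 = points[0]
    c1 = max(points, key=lambda p: sq_dist(p, c0))  # assumed seeding rule
    left, right = [], []
    for _ in range(iterations):
        left = [p for p in points if sq_dist(p, c0) <= sq_dist(p, c1)]
        right = [p for p in points if sq_dist(p, c0) > sq_dist(p, c1)]
        if not left or not right:
            break
        c0, c1 = mean(left), mean(right)
    if not left or not right:
        # Degenerate case (e.g. identical points): split by position.
        half = len(points) // 2
        left, right = points[:half], points[half:]
    return left, right


def hierarchical_two_means(points, max_size):
    """Recursively split until each cluster has at most max_size points."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    if len(points) <= max_size:
        return [points]
    left, right = two_means(points)
    return (hierarchical_two_means(left, max_size)
            + hierarchical_two_means(right, max_size))


points = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2], [9.0], [9.1]]
clusters = hierarchical_two_means(points, max_size=3)
```

Bounding the cluster size keeps the later within-cluster kNN query small, since each query only compares against the members of its own cluster.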

slide-48
SLIDE 48

Results

slide-49
SLIDE 49

NN quality

% of patches matching the kNN result, k=16 (the higher the better)

Randomized KD-trees       24.87
K-means                   34.86
Composite                 35.21
Hierarchical Clustering    7.18
Generalized PatchMatch     0.22
Random Ball Cover         97.88
Ours                      39.01

slide-50
SLIDE 50

NN quality

D_ann/D_knn, k=16 (the lower the better)

Randomized KD-trees        3.01
K-means                    2.00
Composite                  1.99
Hierarchical Clustering    7.38
Generalized PatchMatch    23.91
Random Ball Cover          1.01
Ours                       1.32
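The slides do not define the two NN quality measures precisely. A plausible reading, sketched below, is that the first counts how many approximate neighbors also appear in the exact kNN set, and the second divides the total distance of the approximate neighbors by the total distance of the exact ones (1.0 would mean the approximate result is as close as the exact one). Both definitions, the function names, and the toy inputs are assumptions.

```python
def match_percentage(ann_ids, knn_ids):
    """Percentage of exact kNN ids that the approximate search also found."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(ann_ids, knn_ids))
    total = sum(len(e) for e in knn_ids)
    return 100.0 * hits / total


def distance_ratio(ann_dists, knn_dists):
    """Total approximate-neighbor distance over total exact-neighbor distance."""
    return sum(map(sum, ann_dists)) / sum(map(sum, knn_dists))


# Toy data for two queries with k = 2 (made up for illustration only).
match = match_percentage(ann_ids=[[1, 2], [3, 5]], knn_ids=[[1, 4], [3, 5]])
ratio = distance_ratio(ann_dists=[[1.0, 4.0], [2.0, 2.0]],
                       knn_dists=[[1.0, 1.0], [2.0, 2.0]])
```

Under this reading a method can score low on exact matches yet still have a ratio near 1, because a missed neighbor may be replaced by one that is almost as close.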

slide-51
SLIDE 51

Single Frame Noise Reduction

Nonlocal Means – PSNR [dB], k=16 (the higher the better)

Randomized KD-trees       26.88
K-means                   27.13
Composite                 27.02
Hierarchical Clustering   25.65
Generalized PatchMatch    21.24
Random Ball Cover         27.83
Ours                      27.79
Exhaustive Search         28.55
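For reference, PSNR is commonly computed as 10·log10(peak² / MSE) between a clean reference and the denoised result. The sketch below uses that standard definition on flat lists of pixel values. The peak value and the toy inputs are assumptions, since the slides do not state the image range used.

```python
import math


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length signals."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / mse)


# Toy example on a [0, 1] range: a constant error of 0.1 per pixel.
value = psnr([0.0, 0.0], [0.1, 0.1], peak=1.0)
```

Because the scale is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error.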

slide-52
SLIDE 52

Single Frame Noise Reduction

BM3D – PSNR [dB], k=16 (the higher the better)

Randomized KD-trees       30.72
K-means                   30.68
Composite                 30.57
Hierarchical Clustering   29.87
Generalized PatchMatch    28.92
Random Ball Cover         30.71
Ours                      31.05
Exhaustive Search         31.10

slide-53
SLIDE 53

Single Frame Noise Reduction

Run-time [ms], k=16, 0.25MPix (the lower the better)

Randomized KD-trees        788
K-means                    982
Composite                 1024
Hierarchical Clustering   1017
Generalized PatchMatch    8930
Ours (GPU)                8.19

Legend: Query, Clustering

slide-54
SLIDE 54

Single Frame Noise Reduction

Run-time [ms], k=16, 0.25MPix (the lower the better)

kNN-Garcia (GPU)           26359
Random Ball Cover (GPU)    11328
Window Search (GPU)       594.99
Window Search Opt (GPU)     48.3
Ours (GPU)                  8.19

Legend: Query, Clustering
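The run-times on this slide can be turned into relative speedups and a throughput figure with simple arithmetic. The sketch below assumes the five values map to the five labels in the order they appear on the slide, and that throughput is simply image size divided by run-time; the variable names are illustrative.

```python
# Run-times in milliseconds, k=16, 0.25 MPix (values from the slide,
# assumed to be listed in the same order as the method labels).
run_time_ms = {
    "kNN-Garcia (GPU)": 26359.0,
    "Random Ball Cover (GPU)": 11328.0,
    "Window Search (GPU)": 594.99,
    "Window Search Opt (GPU)": 48.3,
    "Ours (GPU)": 8.19,
}

ours = run_time_ms["Ours (GPU)"]

# How many times slower each method is than "Ours (GPU)".
speedup = {name: t / ours for name, t in run_time_ms.items()}

# Throughput of "Ours (GPU)" in megapixels per second.
throughput_mpix_per_s = 0.25 / (ours / 1000.0)

for name, factor in speedup.items():
    print(f"{name}: {factor:.1f}x")
print(f"Ours throughput: {throughput_mpix_per_s:.1f} MPix/s")
```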

slide-55
SLIDE 55

Architectures

MPix/s/Watt, k=16, 0.25MPix (the higher the better)

Core i7-950        0.02
GeForce GTX 680    0.16
Tegra K1           0.29

slide-56
SLIDE 56

Burst Denoising – Single Frame

First frame of stack 26.45dB

slide-57
SLIDE 57

First frame of stack 26.45dB

Burst Denoising – Single Frame

slide-58
SLIDE 58

Burst Denoising – Single Frame

First frame of stack 26.45dB
GKD-Trees / first frame 31.1dB / 11.3s
Ours NLM / first frame 31.9dB / 0.2s

slide-59
SLIDE 59

Burst Denoising – All Frames

First frame of stack 26.45dB
GKD-Trees / stack 31.53dB / 1080s
Ours NLM / stack 34.10dB / 0.52s

slide-60
SLIDE 60

Burst Denoising – All Frames

First frame of stack 26.45dB
GKD-Trees / stack 31.53dB / 1080s
Ours NLM / stack 34.10dB / 0.52s

slide-61
SLIDE 61

Global Illumination

slide-62
SLIDE 62

Global Illumination

slide-63
SLIDE 63

Global Illumination

slide-64
SLIDE 64

Geometry Noise Reduction

Noisy Input

slide-65
SLIDE 65

Geometry Noise Reduction

Ours

slide-66
SLIDE 66

Geometry Noise Reduction

Ours Exhaustive Search

slide-67
SLIDE 67

Conclusions

Efficient implementation on GPU
High image quality
Applicable to different applications

slide-68
SLIDE 68

Thank you

Paper and Binary:

http://bit.ly/fast-ann