SLIDE 1
FastANN for High Quality Collaborative Filtering
Yun-Ta Tsai (1), Markus Steinberger (2), Dawid Pająk (3), Kari Pulli (4)
(1) Google, Inc.  (2) TU Graz  (3) NVIDIA Research  (4) Light
SLIDE 2
Story
SLIDE 3
SLIDE 4
SLIDE 5
SLIDE 6 Collaborative Filtering - Aggregation
SLIDE 7
Collaborative Filtering - Result
SLIDE 8
Collaborative Filtering - Result
SLIDE 9
Related Work
SLIDE 10 Garcia et al. ICIP 2010

Distance Table
0.13  1.43  0.98  1.33  1.21  2.33
1.22  0.31  0.45  2.01  1.75  0.48
2.11  1.12  0.92  3.16  0.33  0.21

Distance Table (sorted)
0.13  0.98  1.21  1.33  1.43  2.33
0.31  0.45  0.48  1.22  1.75  2.01
0.21  0.33  0.92  1.12  2.11  3.16
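The distance-table idea on this slide can be illustrated with a brute-force search: compute every query-to-reference distance, sort each row of the table, and keep the first k entries. The following is a minimal Python sketch, not the GPU implementation of Garcia et al.; the function name `brute_force_knn`, the squared Euclidean metric, and the sample points are assumptions made for illustration.

```python
def brute_force_knn(queries, references, k):
    """Exhaustive kNN: build a distance table, sort each row, keep k entries."""
    result = []
    for q in queries:
        # One row of the distance table: (squared distance, reference index).
        row = [(sum((a - b) ** 2 for a, b in zip(q, r)), j)
               for j, r in enumerate(references)]
        row.sort()  # corresponds to one row of the sorted distance table
        result.append([j for _, j in row[:k]])
    return result

# Hypothetical sample data: four 2-D reference points and one query.
refs = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
neighbors = brute_force_knn([(0.1, 0.0)], refs, k=2)
```

Sorting a full row costs more than selecting only the k smallest entries, which is presumably one reason this approach scales poorly to large candidate sets.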
SLIDE 11 Cayton et al. ADMS 2010
SLIDE 12 Adams et al. SIGGRAPH 2009
SLIDE 13
Limitation
SLIDE 14
SLIDE 15
Our solution
SLIDE 16
Efficient implementation on GPU
General solution for different filters
High image quality
Applicable to different applications
Our solution
SLIDE 17
Our solution
SLIDE 18 Our solution
Clustering kNN Query Filtering
SLIDE 19
Design challenges
Register pressure
Memory access pattern
Thread divergence
Kernel launch overhead
Memory footprint
SLIDE 20 Our solution
Clustering kNN Query Filtering
SLIDE 22
Clustering
SLIDE 23
Clustering
SLIDE 24
Warp-wide operation
3 8 2 6 3 9 1 4
SLIDE 25
Warp-wide operation
3  8  2  6  3  9  1  4
   3  8  2  6  3  9  1
SLIDE 26
Warp-wide operation
3 11 10 8 9 12 10 5
SLIDE 27
Warp-wide operation
3   11  10  8   9   12  10  5
        3   11  10  8   9   12
SLIDE 28
Warp-wide operation
3 11 13 19 19 20 19 17
SLIDE 29
Warp-wide operation
3   11  13  19  19  20  19  17
                3   11  13  19
SLIDE 30
Warp-wide operation
3 11 13 19 22 31 32 36
Reduce register usage
Better parallelism
Minimize thread and warp divergence
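The numbers on slides 24 to 30 trace an inclusive prefix sum computed in three steps, with each lane adding the value held by the lane 1, 2, and then 4 positions to its left. The Python sketch below simulates that pattern on a list; it is only an illustration of the data flow. The slides do not name the hardware primitive, so mapping each step to a warp shuffle (for example CUDA's `__shfl_up_sync`) is an assumption, as is the function name `warp_inclusive_scan`.

```python
def warp_inclusive_scan(lanes):
    """Simulate a warp-wide inclusive prefix sum with doubling offsets."""
    values = list(lanes)
    offset = 1
    while offset < len(values):
        # Every lane reads the value of the lane `offset` positions below it.
        # All reads happen before any write, as in a lock-step warp.
        shifted = [values[i - offset] if i >= offset else 0
                   for i in range(len(values))]
        values = [v + s for v, s in zip(values, shifted)]
        offset *= 2
    return values

# The eight lane values shown on slide 24.
scan = warp_inclusive_scan([3, 8, 2, 6, 3, 9, 1, 4])
# scan == [3, 11, 13, 19, 22, 31, 32, 36], the row shown on slide 30
```

Because the lanes exchange values directly, no per-thread scratch array is needed and every lane executes the same instruction sequence, which is consistent with the register and divergence benefits listed on the slide.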
SLIDE 31 Clustering
kNN kNN kNN kNN
SLIDE 32 Our solution
Clustering
SLIDE 33 Our solution
kNN Query
SLIDE 34 kNN Query
kNN kNN kNN kNN
SLIDE 35 kNN Query
kNN p0 p1 p2 p3 p4 p0 p1 p2 p3 p4 p0 p1 p2 p3 p4
SLIDE 36 kNN Query
kNN p0 p1 p2 p3 p4 p0 p1 p2 p3 p4 p0 p1 p2 p3 p4
SLIDE 37 kNN Query
kNN
0.0 0.4 1.3 0.9 0.7
p0 p0 p1 p2 p3 p4 p0 p1 p2 p3 p4
SLIDE 38 kNN Query
      p0   p1   p2   p3   p4
p0    0.0  0.4  1.3  0.9  0.7
≤ 0.8: 1 1 1
SLIDE 39 kNN Query
      p0   p1   p2   p3   p4
p0    0.0  0.4  1.3  0.9  0.7
≤ 0.0: 1
SLIDE 40 kNN Query
      p0   p1   p2   p3   p4
p0    0.0  0.4  1.3  0.9  0.7
≤ 0.0: 1
≤ 0.4: 1 1
SLIDE 41 kNN Query
      p0   p1   p2   p3   p4
p0    0.0  0.4  1.3  0.9  0.7
≤ 0.0: 1
≤ 0.4: 1 1
≤ 1.3: 1 1 1 1 1
SLIDE 42 kNN Query
      p0   p1   p2   p3   p4
p0    0.0  0.4  1.3  0.9  0.7
≤ 0.0: 1
≤ 0.4: 1 1
≤ 1.3: 1 1 1 1 1
≤ 0.9: 1 1 1 1
SLIDE 43 kNN Query
      p0   p1   p2   p3   p4
p0    0.0  0.4  1.3  0.9  0.7
≤ 0.0: 1
≤ 0.4: 1 1
≤ 1.3: 1 1 1 1 1
≤ 0.9: 1 1 1 1
≤ 0.7: 1 1 1
SLIDE 44 kNN Query
      p0   p1   p2   p3   p4
p0    0.0  0.4  1.3  0.9  0.7
≤ 0.0: 1
≤ 0.4: 1 1
≤ 1.3: 1 1 1 1 1
≤ 0.9: 1 1 1 1
≤ 0.7: 1 1 1
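The threshold rows on slides 38 to 44 suggest a sort-free selection: each candidate distance is tried as a threshold, every candidate votes 1 if its distance is at or below it, and the threshold that collects exactly k votes marks the k nearest neighbors. The Python sketch below follows that reading of the slides. The function name `kth_by_voting` and the choice k = 4 are assumptions for illustration; the slides do not state k for this example.

```python
def kth_by_voting(distances, k):
    """Find the threshold with exactly k votes and the candidates it selects."""
    for threshold in distances:
        # Each candidate votes 1 if its distance is within the threshold.
        votes = [1 if d <= threshold else 0 for d in distances]
        if sum(votes) == k:
            selected = [i for i, v in enumerate(votes) if v]
            return threshold, selected
    # No threshold gives exactly k votes (possible when distances tie).
    return None, []

# Distances from p0 to p0..p4 as shown on the slides; k = 4 is assumed.
threshold, selected = kth_by_voting([0.0, 0.4, 1.3, 0.9, 0.7], k=4)
# threshold == 0.9, selected == [0, 1, 3, 4]
```

Every threshold test is independent of the others, so on a GPU each lane could evaluate one threshold in parallel and count votes with a warp-wide operation rather than sorting the row.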
SLIDE 45 kNN Query
kNN p0 p1 p2 p3 p4
1 1 0 1 1
1 1 1 0 1
0 1 1 1 1
1 1 1 0 1
1 1 1 1 0
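The 0/1 flags on this slide can be stored compactly by packing each query's flags into one integer, with bit i set when candidate i is among the selected neighbors. The following Python sketch shows one such encoding; the helper names `encode_neighbors` and `decode_neighbors` and the bit order are assumptions, not details given on the slides.

```python
def encode_neighbors(selected):
    """Pack a set of candidate indices into an integer bitmask."""
    mask = 0
    for i in selected:
        mask |= 1 << i  # bit i marks candidate i as a neighbor
    return mask

def decode_neighbors(mask, n_candidates):
    """Recover the candidate indices stored in a bitmask."""
    return [i for i in range(n_candidates) if (mask >> i) & 1]

# Neighbors of p0 among p0..p4, matching the first five flags on the slide.
mask = encode_neighbors([0, 1, 3, 4])
bits = [(mask >> i) & 1 for i in range(5)]
# bits == [1, 1, 0, 1, 1]
```

A mask of this kind needs one bit per candidate instead of one index per neighbor, which may help with the memory footprint listed among the design challenges.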
SLIDE 46
Filtering
SLIDE 47 Our solution
Hierarchical 2-mean clustering
Warp-wide operators
kNN search
Distance table
Voting
Binary coding
Filtering and aggregation
Clustering kNN Query Filtering
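Hierarchical 2-means clustering, named in the summary above, can be sketched as a recursive split: run k-means with k = 2 on a set of points, then repeat on each half until the clusters are small enough for an exhaustive search. The Python sketch below is a plain CPU illustration under several assumptions that the slides do not confirm: seeding from the first and last point, a fixed iteration count, and the stopping rule `max_leaf`.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def two_means(points, iterations=8):
    """Split points into two groups with a few Lloyd iterations (k = 2)."""
    c0, c1 = points[0], points[-1]  # assumed seeding, for illustration only
    left, right = list(points), []
    for _ in range(iterations):
        left = [p for p in points if dist2(p, c0) <= dist2(p, c1)]
        right = [p for p in points if dist2(p, c0) > dist2(p, c1)]
        if not left or not right:
            break  # degenerate split, stop refining
        c0, c1 = mean(left), mean(right)
    return left, right

def hierarchical_two_means(points, max_leaf):
    """Recursively split until every cluster holds at most max_leaf points."""
    if len(points) <= max_leaf:
        return [points]
    left, right = two_means(points)
    if not left or not right:
        return [points]  # cannot split further (e.g. identical points)
    return (hierarchical_two_means(left, max_leaf)
            + hierarchical_two_means(right, max_leaf))

# Hypothetical 1-D data with two obvious groups.
data = [(0.0,), (0.2,), (0.1,), (5.0,), (5.2,), (5.1,)]
clusters = hierarchical_two_means(data, max_leaf=3)
```

Each query then only needs to compare against the points in its own cluster, which bounds the size of the distance table built in the kNN query stage.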
SLIDE 48
Results
SLIDE 49 NN quality
Randomized KD-trees        24.87
K-means                    34.86
Composite                  35.21
Hierarchical Clustering    7.18
Generalized PatchMatch     0.22
Random Ball Cover          97.88
Ours                       39.01
% of patches matching the kNN result, k=16
(the higher the better)
SLIDE 50 NN quality
Randomized KD-trees        3.01
K-means                    2.00
Composite                  1.99
Hierarchical Clustering    7.38
Generalized PatchMatch     23.91
Random Ball Cover          1.01
Ours                       1.32
Dann/Dknn, k=16
(the lower the better)
SLIDE 51 Single Frame Noise Reduction
Randomized KD-trees        26.88
K-means                    27.13
Composite                  27.02
Hierarchical Clustering    25.65
Generalized PatchMatch     21.24
Random Ball Cover          27.83
Ours                       27.79
Exhaustive Search          28.55
Nonlocal Means – PSNR [dB], k=16 (the higher the better)
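PSNR, the metric used on this and the following slide, is derived from the mean squared error between a reference image and the denoised result: 10 * log10(peak^2 / MSE). The Python sketch below computes it for flat lists of pixel values. The peak of 255 assumes 8-bit images, which the slides do not state, and the function name `psnr` is chosen for illustration.

```python
import math

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)

# Worst case for 8-bit data: every pixel is off by the full range.
worst = psnr([0, 0, 0, 0], [255, 255, 255, 255])
same = psnr([10, 20, 30], [10, 20, 30])
```

Because the scale is logarithmic, a gain of 1 dB corresponds to roughly a 21% reduction in mean squared error.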
SLIDE 52 Single Frame Noise Reduction
Randomized KD-trees        30.72
K-means                    30.68
Composite                  30.57
Hierarchical Clustering    29.87
Generalized PatchMatch     28.92
Random Ball Cover          30.71
Ours                       31.05
Exhaustive Search          31.10
BM3D – PSNR [dB], k=16 (the higher the better)
SLIDE 53 Single Frame Noise Reduction
Randomized KD-trees        788
K-means                    982
Composite                  1024
Hierarchical Clustering    1017
Generalized PatchMatch     8930
Ours (GPU)                 8.19
Run-time [ms], k=16, 0.25MPix (the lower the better)
Legend: Query, Clustering
SLIDE 54 Single Frame Noise Reduction
kNN-Garcia (GPU)           26359
Random Ball Cover (GPU)    11328
Window Search (GPU)        594.99
Window Search Opt (GPU)    48.3
Ours (GPU)                 8.19
Run-time [ms], k=16, 0.25MPix (the lower the better)
Legend: Query, Clustering
SLIDE 55 Architectures
Core i7-950        0.02
GeForce GTX 680    0.16
Tegra K1           0.29
MPix/s/Watt, k=16, 0.25MPix (the higher the better)
SLIDE 56
Burst Denoising – Single Frame
First frame of stack 26.45dB
SLIDE 57
First frame of stack 26.45dB
Burst Denoising – Single Frame
SLIDE 58
Burst Denoising – Single Frame
First frame of stack 26.45dB
GKD-Trees / first frame 31.1dB / 11.3s
Ours NLM / first frame 31.9dB / .2s
SLIDE 59
Burst Denoising – All Frames
First frame of stack 26.45dB
GKD-Trees / stack 31.53dB / 1080s
Ours NLM / stack 34.10dB / 0.52s
SLIDE 60
Burst Denoising – All Frames
First frame of stack 26.45dB
GKD-Trees / stack 31.53dB / 1080s
Ours NLM / stack 34.10dB / 0.52s
SLIDE 61
Global Illumination
SLIDE 62
Global Illumination
SLIDE 63
Global Illumination
SLIDE 64 Geometry Noise Reduction
Noisy Input
SLIDE 65
Geometry Noise Reduction
Ours
SLIDE 66
Geometry Noise Reduction
Ours Exhaustive Search
SLIDE 67
Conclusions
Efficient implementation on GPU
High image quality
Applicable to different applications
SLIDE 68
Thank you
Paper and Binary:
http://bit.ly/fast-ann