 
FastANN for High Quality Collaborative Filtering
Yun-Ta Tsai (1), Markus Steinberger (2), Dawid Pająk (3), Kari Pulli (4)
(1) Google, Inc. (2) TU Graz (3) NVIDIA (4) Light Research
Story
Collaborative Filtering - Aggregation 
Collaborative Filtering - Result
Related Work
Distance Table
0.13 1.43 0.98 1.33 1.21 2.33
1.22 0.31 0.45 2.01 1.75 0.48
2.11 1.12 0.92 3.16 0.33 0.21
Distance Table (sorted)
0.13 0.98 1.21 1.33 1.43 2.33
0.31 0.45 0.48 1.22 1.75 2.01
0.21 0.33 0.92 1.12 2.11 3.16
Garcia et al. ICIP 2010
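The distance-table slide can be read as a brute-force kNN: compute one row of distances per query, sort each row, and keep the first k entries. The sketch below reproduces the sorted table from the slide values. The function name `k_nearest_from_row` and the choice k=2 are illustrative assumptions, not taken from the talk or from Garcia et al.

```python
def k_nearest_from_row(dist_row, k):
    """Return (column index, distance) pairs for the k smallest entries of one row."""
    order = sorted(range(len(dist_row)), key=lambda j: dist_row[j])
    return [(j, dist_row[j]) for j in order[:k]]

# Distance table copied from the slide: one row per query, one column per candidate.
distance_table = [
    [0.13, 1.43, 0.98, 1.33, 1.21, 2.33],
    [1.22, 0.31, 0.45, 2.01, 1.75, 0.48],
    [2.11, 1.12, 0.92, 3.16, 0.33, 0.21],
]

# Sorting every row gives the "Distance Table (sorted)" from the slide.
sorted_table = [sorted(row) for row in distance_table]

# Keeping the candidate indices as well tells us which patches the neighbors are.
# k=2 is an arbitrary value chosen for this example.
nearest_two = [k_nearest_from_row(row, 2) for row in distance_table]

for row in sorted_table:
    print(row)
print(nearest_two)
```

Sorting a full row costs more than is needed when k is much smaller than the row length, which is one reason to look for selection schemes that avoid a full sort.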
Cayton et al. ADMS 2010
Adams et al. SIGGRAPH 2009
Limitation
Our solution: efficient implementation on GPU; general solution for different filters; high image quality; applicable to different applications
Our solution: Clustering, kNN Query, Filtering
Design challenges: register pressure, memory access pattern, thread divergence, kernel launch overhead, memory footprint
Tiling  
Clustering
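The summary slide later in the deck names the clustering stage as hierarchical 2-mean clustering. A minimal CPU sketch of that idea follows: split the point set in two with 2-means, then recurse until each cluster is small enough. The initialization (first point and the point farthest from it), the iteration count, and the `max_cluster_size` stopping rule are assumptions made for this example; the slides do not specify them.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def two_means(points, iterations=10):
    """Split points into two groups with a few Lloyd iterations (k = 2)."""
    # Assumed deterministic initialization: first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    left, right = [], []
    for _ in range(iterations):
        left = [p for p in points if sq_dist(p, c0) <= sq_dist(p, c1)]
        right = [p for p in points if sq_dist(p, c0) > sq_dist(p, c1)]
        if not left or not right:
            break  # degenerate split, e.g. all points identical
        c0, c1 = mean(left), mean(right)
    return left, right

def hierarchical_two_means(points, max_cluster_size):
    """Recursively bisect until every cluster holds at most max_cluster_size points."""
    if len(points) <= max_cluster_size:
        return [points]
    left, right = two_means(points)
    if not left or not right:
        return [points]  # cannot split further
    return (hierarchical_two_means(left, max_cluster_size)
            + hierarchical_two_means(right, max_cluster_size))

# Toy 2-D "patches": three well-separated groups.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (10.0, 0.0), (10.1, 0.0)]
clusters = hierarchical_two_means(points, max_cluster_size=3)
for c in clusters:
    print(c)
```

Bounding the cluster size keeps the later per-cluster kNN search small, since each query only has to be compared against the members of its own cluster.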
Warp-wide operation (inclusive prefix sum across lanes)
Input:          3  8  2  6  3  9  1  4
Add lane i-1:   3 11 10  8  9 12 10  5
Add lane i-2:   3 11 13 19 19 20 19 17
Add lane i-4:   3 11 13 19 22 31 32 36
Benefits: reduce register usage; better parallelism; minimize thread and warp divergence
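The numbers on the warp-wide operation slides match an inclusive prefix sum computed in log2(n) steps, where each lane adds the value held by the lane 1, 2, then 4 positions to its left. The sketch below simulates that pattern on the CPU with a list standing in for the lanes of a warp. On a GPU this exchange would typically be done with warp shuffle instructions, but the slides do not say which mechanism is used, so treat that as an assumption.

```python
def warp_inclusive_scan(values):
    """Simulate a lane-parallel inclusive prefix sum and return (result, per-step states)."""
    lanes = list(values)
    steps = []
    offset = 1
    while offset < len(lanes):
        # Every lane reads its left neighbor's value from before this step,
        # which is why we build a new list instead of updating in place.
        lanes = [
            lanes[i] + lanes[i - offset] if i >= offset else lanes[i]
            for i in range(len(lanes))
        ]
        steps.append(list(lanes))
        offset *= 2
    return lanes, steps

# Input values from the slide (8 lanes shown instead of a full 32-lane warp).
result, steps = warp_inclusive_scan([3, 8, 2, 6, 3, 9, 1, 4])
for state in steps:
    print(state)
```

Each lane keeps a single running value, and all lanes execute the same instruction at every step, which lines up with the benefits listed on the slide.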
Clustering [figure: four kNN blocks]
Our solution Clustering
Our solution kNN Query
kNN Query [figure: four kNN blocks]
kNN Query: example for p0 over candidates p0 … p4
Distances from p0:  0.0  0.4  1.3  0.9  0.7
Threshold   p0 p1 p2 p3 p4
≤ 0.8        1  1  0  0  1
≤ 0.0        1  0  0  0  0
≤ 0.4        1  1  0  0  0
≤ 1.3        1  1  1  1  1
≤ 0.9        1  1  0  1  1
≤ 0.7        1  1  0  0  1
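The threshold rows in the kNN Query example suggest a selection scheme that avoids sorting: each candidate uses its own distance as a threshold, all candidates vote on whether they fall at or below it, and the threshold that collects exactly k votes marks the k nearest neighbors. The sketch below is one reading of those slides, not a confirmed description of the authors' kernel. The function names and the choice k=3 are assumptions, and ties between distances (which can leave no threshold with exactly k votes) are not handled.

```python
def vote_table(distances):
    """For every candidate distance used as a threshold, return (threshold, 0/1 mask)."""
    return [
        (threshold, [1 if d <= threshold else 0 for d in distances])
        for threshold in distances
    ]

def knn_mask_by_voting(distances, k):
    """Return (threshold, mask) for the threshold that collects exactly k votes, else None."""
    for threshold, mask in vote_table(distances):
        if sum(mask) == k:
            return threshold, mask
    return None  # can happen when distances tie; left unhandled in this sketch

# Distances from p0 to p0..p4, copied from the slide.
distances = [0.0, 0.4, 1.3, 0.9, 0.7]

for threshold, mask in vote_table(distances):
    print("<=", threshold, mask)

# k=3 is an assumed value; it selects the last row of the slide's table.
threshold, mask = knn_mask_by_voting(distances, k=3)
print(threshold, mask)
```

On a GPU, the per-threshold vote is the kind of operation that maps to a warp-wide ballot followed by a population count, though the slides do not name the instructions used.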
kNN Query: binary kNN table over p0 … p4
1 1 1 1 0
1 1 1 0 1
0 1 1 1 1
1 1 1 0 1
1 1 0 1 1
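A 0/1 neighbor mask like the ones in the binary table can be stored compactly as the bits of a single integer. The sketch below packs a mask with bit i standing for candidate p_i, unpacks it again, and counts neighbors shared by two masks. The bit order and the helper names are assumptions for illustration; the slides only show the binary table itself.

```python
def pack_mask(mask):
    """Pack a list of 0/1 flags into an integer, with bit i representing candidate i."""
    code = 0
    for i, bit in enumerate(mask):
        if bit:
            code |= 1 << i
    return code

def unpack_mask(code, n):
    """Recover the list of n 0/1 flags from a packed integer."""
    return [(code >> i) & 1 for i in range(n)]

def shared_neighbors(code_a, code_b):
    """Count candidates that are marked in both packed masks."""
    return bin(code_a & code_b).count("1")

# The two masks below are rows taken from the voting example for p0.
code_a = pack_mask([1, 1, 0, 0, 1])
code_b = pack_mask([1, 1, 0, 1, 1])
print(code_a, code_b, shared_neighbors(code_a, code_b))
print(unpack_mask(code_a, 5))
```

With at most 32 candidates per cluster, a mask of this kind would fit in one 32-bit word, which keeps the memory footprint of the neighbor lists small.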
Filtering
Our solution
Clustering: hierarchical 2-mean clustering, warp-wide operators
kNN Query: kNN search, distance table, voting, binary coding
Filtering: filtering and aggregation
Results
NN quality: % of patches matching the kNN result, k=16 (higher is better)
Bar values: 97.88, 39.01, 35.21, 34.86, 24.87, 7.18, 0.22
Methods compared: Randomized KD-trees, K-means, Composite, Hierarchical Clustering, Generalized PatchMatch, Random Ball Cover, Ours
NN quality: D_ann / D_knn, k=16 (lower is better)
Bar values: 23.91, 7.38, 3.01, 2.00, 1.99, 1.32, 1.01
Methods compared: Randomized KD-trees, K-means, Composite, Hierarchical Clustering, Generalized PatchMatch, Random Ball Cover, Ours
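The two NN quality measures can be sketched as follows. This assumes that the match percentage is the share of exact k nearest neighbors also returned by the approximate search, and that D_ann / D_knn is the ratio of summed neighbor distances; the slides do not spell out either definition, so both are assumptions, as are the function names and the toy inputs.

```python
def match_percentage(ann_ids, knn_ids):
    """Percentage of exact kNN indices that the approximate search also returned."""
    exact = set(knn_ids)
    hits = sum(1 for i in ann_ids if i in exact)
    return 100.0 * hits / len(knn_ids)

def distance_ratio(ann_dists, knn_dists):
    """Summed approximate-neighbor distance over summed exact-neighbor distance."""
    return sum(ann_dists) / sum(knn_dists)

# Made-up toy inputs for one query with k=4 (not data from the talk).
ann_ids, knn_ids = [1, 2, 5, 7], [1, 2, 3, 4]
ann_dists, knn_dists = [0.5, 0.5, 1.0, 2.0], [0.5, 0.5, 0.5, 0.5]

print(match_percentage(ann_ids, knn_ids))
print(distance_ratio(ann_dists, knn_dists))
```

Under these definitions a perfect approximate search scores 100% on the first measure and 1.0 on the second, which is consistent with the "higher is better" and "lower is better" notes on the slides.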
Single Frame Noise Reduction: Nonlocal Means, PSNR [dB], k=16 (higher is better)
Bar values: 28.55, 27.83, 27.79, 27.13, 27.02, 26.88, 25.65, 21.24
Methods compared: Randomized KD-trees, K-means, Composite, Hierarchical Clustering, Generalized PatchMatch, Random Ball Cover, Ours, Exhaustive Search
Single Frame Noise Reduction: BM3D, PSNR [dB], k=16 (higher is better)
Bar values: 31.10, 31.05, 30.72, 30.71, 30.68, 30.57, 29.87, 28.92
Methods compared: Randomized KD-trees, K-means, Composite, Hierarchical Clustering, Generalized PatchMatch, Random Ball Cover, Ours, Exhaustive Search
Single Frame Noise Reduction: run-time [ms], k=16, 0.25 MPix (lower is better); bars split into Query and Clustering
Bar values: 8930, 1024, 1017, 982, 788, 8.19
Methods compared: Randomized KD-trees, K-means, Composite, Hierarchical Clustering, Generalized PatchMatch, Ours (GPU)
Single Frame Noise Reduction: run-time [ms], k=16, 0.25 MPix (lower is better); bars split into Query and Clustering
Bar values: 26359, 11328, 594.99, 48.3, 8.19
Methods compared: kNN-Garcia (GPU), Random Ball Cover (GPU), Window Search (GPU), Window Search Opt (GPU), Ours (GPU)
Architectures: MPix/s/Watt, k=16, 0.25 MPix (higher is better)
Bar values: 0.29, 0.16, 0.02
Platforms compared: Core i7-950, GeForce GTX 680, Tegra K1
Burst Denoising – Single Frame
First frame of stack: 26.45dB
GKD-Trees / first frame: 31.01dB / 11.3s
Ours NLM / first frame: 31.90dB / 0.02s
Burst Denoising – All Frames
First frame of stack: 26.45dB
GKD-Trees / stack: 31.53dB / 1080s
Ours NLM / stack: 34.10dB / 0.52s
Global Illumination
Geometry Noise Reduction: Noisy Input, Exhaustive Search, Ours
Conclusions: efficient implementation on GPU; high image quality; applicable to different applications
Thank you
Paper and binary: http://bit.ly/fast-ann