Optimal Multi-Step k-Nearest Neighbor Search
Thomas Seidl and Hans-Peter Kriegel
University of Munich, Germany
ACM SIGMOD '98, Seattle

Survey
• Similarity search for complex similarity models
• Analysis of previous solution for k-nn search
• An optimality criterion for k-nn search
• Optimal algorithm for k-nn search
• Performance analysis
(c) 1998 Thomas Seidl SIGMOD ‘98 - 2

Distance-based Similarity Search
Principle: Small Distances ↔ Strong Similarity
• RangeQuery(q, ε): { o ∈ DB | d(o, q) ≤ ε }
• k-NearestNeighborQuery(q, k): a mapping {1, …, k} → DB that is monotonous in d_q
[Figure: a range query may yield no answer or too many answers; a k-nn query yields the k nearest neighbors, ranked 1st, 2nd, 3rd, 4th]

Complex Similarity Models
• Quadratic Form Distance Functions: d_A^2(p, q) = (p − q) · A · (p − q)^T
  – Color Histograms for Image Databases (QBIC): 256-D histograms (Niblack et al. 93) (Hafner et al. 95)
  – Shape Similarity for 2D and 3D: Up to 4,096-D vectors (Thesis Seidl 97)
  – …
• Max-Morphological Distance
  – 2D images: Tumor shapes (Korn et al. 96)
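The quadratic form above can be sketched in a few lines of Python. This is only an illustration: the function name and the toy 2-D matrices are assumptions, not taken from the slides, and A is assumed to be positive semi-definite so that the square root is defined. The double sum makes the quadratic cost per evaluation in the dimension visible.

```python
import math

def quadratic_form_distance(p, q, A):
    """d_A(p, q) = sqrt((p - q) · A · (p - q)^T) for a similarity matrix A."""
    diff = [pi - qi for pi, qi in zip(p, q)]
    n = len(diff)
    # Double sum over all index pairs: O(n^2) work per evaluation.
    total = sum(diff[i] * A[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(total)

# Hypothetical toy inputs: with the identity matrix the distance is Euclidean.
identity = [[1.0, 0.0], [0.0, 1.0]]
# Off-diagonal entries model cross-talk between dimensions (e.g. similar colors).
cross_talk = [[1.0, 0.5], [0.5, 1.0]]

print(quadratic_form_distance([0.0, 0.0], [3.0, 4.0], identity))    # 5.0
print(quadratic_form_distance([0.0, 0.0], [3.0, 4.0], cross_talk))
```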

Cost of Single Evaluations
– Quadratic Form Distance Functions: evaluation time [msec] by dimension
  dimension:    21     64    112    256    1,024   4,096
  time [msec]:  0.23   0.4   1.1    6.2    102     1,656
– Max-Morphological Distance (Korn et al. 96): 12.69 seconds (avg) per distance evaluation

Multi-Step Query Processing
• Multi-Step Similarity Search
  – Range Queries (Faloutsos et al. 94)
  – k-Nearest Neighbor Queries (Korn et al. 96)
• No False Drops? Lower-Bounding Property: d_f(p, q) ≤ d_o(p, q), where d_f is the filter distance and d_o is the object distance
[Figure: Filter Step (index-based) → candidates → Refinement Step (exact evaluation) → results]

Previous k-nn Algorithm (Korn et al. 96)
query (q, k)
• First Phase: k-nn query on Index (d_f); the k objects found determine the primary d_max (d_o)
• Second Phase: d_max query on Index (d_f), which returns >> k objects; final k-nn (d_o)
• More candidates generated than necessary: d_max is fixed in the 2nd phase!
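The two phases above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Korn et al.: linear scans stand in for the index queries, the function name `previous_knn` is made up for illustration, and the toy filter (difference in the first coordinate, which lower-bounds the Euclidean distance) is hypothetical.

```python
import heapq
import math

def previous_knn(db, q, k, d_f, d_o):
    """Two-phase k-nn; returns (k nearest objects, number of exact evaluations in phase 2)."""
    # Phase 1: k-nn query under the filter distance d_f (linear scan as index stand-in).
    primary = heapq.nsmallest(k, db, key=lambda o: d_f(o, q))
    # Primary d_max: largest exact distance among the k primary candidates.
    d_max = max(d_o(o, q) for o in primary)
    # Phase 2: range query under d_f with the fixed radius d_max.
    candidates = [o for o in db if d_f(o, q) <= d_max]
    # Refinement: exact distance for every candidate, keep the k best.
    result = heapq.nsmallest(k, candidates, key=lambda o: d_o(o, q))
    return result, len(candidates)

# Hypothetical toy data: 2-D points, exact distance = Euclidean,
# filter distance = difference in the first coordinate only.
db = [(1.0, 0.0), (0.5, 5.0), (2.0, 0.0), (3.0, 0.0), (0.2, 9.0), (4.0, 0.0)]
q = (0.0, 0.0)
d_o = math.dist
d_f = lambda o, q: abs(o[0] - q[0])

neighbors, n_candidates = previous_knn(db, q, 2, d_f, d_o)
print(neighbors, n_candidates)
```

On this toy data the primary candidates happen to be far away under the exact distance, so the fixed d_max is large and the second phase passes all six objects to refinement, although only two neighbors are requested.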

Number of Candidates
[Figure: object distances and filter distances plotted against the rank according to filter distance (ranks 1 to 49); the k-th object distance and d_max are marked]

Optimality of k-NN Algorithms
Lemma
– Let d_f be a lower-bounding filter of d_o: d_f(p, q) ≤ d_o(p, q)
– For a multi-step k-nn algorithm based on d_o and d_f, the optimal set of candidates is: { o ∈ DB | d_f(o, q) ≤ ε_k }
– where ε_k is the k-th object similarity distance: ε_k = max { d_o(o, q) | o ∈ NN_q(k) }
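The candidate set from the lemma can be computed by brute force on a small example. The sketch below is for illustration only: it evaluates d_o on every object to obtain ε_k, which is exactly what a real multi-step algorithm tries to avoid. The function name and the toy data are assumptions.

```python
import math

def optimal_candidates(db, q, k, d_f, d_o):
    """Candidate set from the lemma: all objects o with d_f(o, q) <= eps_k."""
    # eps_k: the k-th smallest exact distance (brute force, for illustration only).
    eps_k = sorted(d_o(o, q) for o in db)[k - 1]
    return [o for o in db if d_f(o, q) <= eps_k]

# Hypothetical toy data: Euclidean object distance, first-coordinate filter distance.
db = [(1.0, 0.0), (0.5, 5.0), (2.0, 0.0), (3.0, 0.0), (0.2, 9.0), (4.0, 0.0)]
q = (0.0, 0.0)
d_o = math.dist
d_f = lambda o, q: abs(o[0] - q[0])

print(optimal_candidates(db, q, 2, d_f, d_o))
```

Here ε_2 = 2.0, so the four objects with filter distance at most 2.0 form the candidate set. Objects with d_f > ε_k cannot be among the k nearest neighbors because d_o ≥ d_f, while every object with d_f ≤ ε_k could still turn out to be closer and so has to be checked.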

Optimal k-nn Algorithm (new)
query (q, k)
• init ranking on Index (d_f)
• while d_f(o, q) ≤ d_max do: get next o from index and adjust d_max (d_o)
• result: final k-nn: d_o(o, q) ≤ d_max
d_max is adjusted step by step!
THEOREM:
1. No false drops
2. No unnecessary candidates
Required: Incremental Ranking on index (Hjaltason & Samet 95)
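The loop above can be sketched in Python as follows. This is a sketch under assumptions rather than the authors' implementation: a binary heap over all filter distances stands in for the incremental ranking on an index, d_max is treated as infinite until k objects have been evaluated, and all names and the toy data are made up for illustration.

```python
import heapq
import math

def incremental_ranking(db, q, d_f):
    """Yield (filter distance, object) in ascending d_f order; stand-in for an index ranking."""
    heap = [(d_f(o, q), i, o) for i, o in enumerate(db)]
    heapq.heapify(heap)
    while heap:
        dist, _, o = heapq.heappop(heap)
        yield dist, o

def optimal_knn(db, q, k, d_f, d_o):
    """Multi-step k-nn; returns (k nearest objects, number of exact evaluations)."""
    result = []        # max-heap via negated distances: (-d_o, tiebreak, object)
    d_max = math.inf   # unknown until k objects have been evaluated
    evaluations = 0
    for n, (filter_dist, o) in enumerate(incremental_ranking(db, q, d_f)):
        if filter_dist > d_max:
            break      # lower bound: no remaining object can be closer than d_max
        exact = d_o(o, q)
        evaluations += 1
        if len(result) < k:
            heapq.heappush(result, (-exact, n, o))
        elif exact < -result[0][0]:
            heapq.heapreplace(result, (-exact, n, o))
        if len(result) == k:
            d_max = -result[0][0]  # adjust d_max to the current k-th exact distance
    neighbors = [o for _, _, o in sorted(result, key=lambda t: -t[0])]
    return neighbors, evaluations

# Hypothetical toy data: Euclidean object distance, first-coordinate filter distance.
db = [(1.0, 0.0), (0.5, 5.0), (2.0, 0.0), (3.0, 0.0), (0.2, 9.0), (4.0, 0.0)]
q = (0.0, 0.0)
d_o = math.dist
d_f = lambda o, q: abs(o[0] - q[0])

print(optimal_knn(db, q, 2, d_f, d_o))
```

On this toy data the loop stops after four exact evaluations: d_max shrinks from about 9.0 to about 5.0 to 2.0 as closer objects are found, and the next object in the ranking has filter distance 3.0 > 2.0.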

Minimal Set of Candidates
[Figure: object distances and filter distances plotted against the rank according to filter distance (ranks 1 to 49); the primary d_max and the optimal d_max are marked]
The higher the filter distance, the better the filter selectivity

Uniformly Distributed Data (20-D)
Experimental Setting
• 100,000 Objects, 20-D
• Matrices: sim-id, 1-0, 2-2
• Queries: k = 10 (0.01%)
• Index: 15-D
Avg. Improvement Factors
• Candidates: 72, 120, 64
• Overall Time: 26, 48, 23
Number of candidates by similarity matrix
  previous algorithm:  sim-id 26,546   sim-1-0 42,891   sim-2-2 71,610
  optimal algorithm:   sim-id 370      sim-1-0 358      sim-2-2 1,118
Overall runtime [sec] by similarity matrix
  previous algorithm:  sim-id 419      sim-1-0 664      sim-2-2 1,117
  optimal algorithm:   sim-id 16       sim-1-0 14       sim-2-2 48

2-D Shape Similarity (1,024-D)
Experimental Setting
• 10,000 Images, 32x32 Pixel
• 'Neighborhood Area': 9-1
• Queries: k = 5 (0.05%)
• Index (KLT): 16-D, …, 64-D
Avg. Improvement Factors
• Candidates: 2.3
• Overall Time: 1.6 to 2.3
[Figures: number of candidates and overall runtime [sec] against the dimension of the index (16-D, 32-D, 48-D, 64-D), previous algorithm vs. optimal algorithm]

Color Histograms (112-D)
Experimental Setting
• 112,700 Histograms (112-D)
• Quadratic Form Distance
• Queries: k = 2,…,12 (0.01%)
• Index (KLT): 12-D
Avg. Improvement Factors
• Candidates: 17
• Overall Time: 8.5
[Figures: number of candidates and overall runtime [sec] against the query parameter k (2 to 12), previous algorithm vs. optimal algorithm]

Conclusions
• Complex Similarity Search: Expensive similarity evaluations
• Multi-Step Approach: Lower-bounding filter distance function
• Optimal Algorithm: Minimum number of exact evaluations
• Average Improvement Factors:
  – up to 120 (number of candidates)
  – up to 48 (overall runtime)
• Future Work: New applications; Integration with Data Mining
