LUDWIG-MAXIMILIANS-UNIVERSITY MUNICH, DEPARTMENT INSTITUTE FOR INFORMATICS, DATABASE SYSTEMS GROUP

A Novel Probabilistic Pruning Approach to Speed Up Similarity Queries in Uncertain Databases

Thomas Bernecker*, Tobias Emrich*, Hans-Peter Kriegel*, Nikos Mamoulis**, Matthias Renz* and Andreas Zuefle*

*) Ludwig-Maximilians-Universität München (LMU), Munich, Germany
   http://www.dbs.ifi.lmu.de
   {bernecker, emrich, kriegel, renz, zuefle}@dbs.ifi.lmu.de
**) University of Hong Kong (HKU), Hong Kong
   http://www.cs.hku.hk
   nikos@cs.hku.hk

Outline
• Background
  – Uncertain Data Model
  – Similarity Queries
• Probabilistic Pruning
  – Obtaining probability bounds
  – Using probability bounds for pruning
• Evaluation

Uncertain Data Model
• Uncertain attribute
  An attribute x is uncertain if its value is given by a probability density function (PDF), which describes all possible values v of x, each associated with a probability P(x = v).
  − Discrete PDF (e.g., derived from missing data, see Julia's talk; or derived from time series data, see Saket's talk)
  − Continuous PDF (e.g., sensor measurement error)

Uncertain Data Model
• Uncertain Object X
  − Has d ≥ 1 uncertain attributes.
  − X is a random variable, where the set of attribute values of X is described by a multi-dimensional probability distribution.
  − X has a spatial region $UR_X$ (uncertain region), where $PDF_X(t) > 0$ if $t \in UR_X$ and $PDF_X(t) = 0$ otherwise.
• Uncertain Object Database
  − Contains N uncertain objects
  − Object Independence Assumption
[Figure: uncertain objects A, B and C, with a density labelled $PDF_X$.]
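
The data model above can be pictured with a small sketch. It assumes the discrete case, where an object is a finite list of alternative positions with probabilities. The class name `UncertainObject`, its fields, and the use of an axis-aligned bounding box as a stand-in for the uncertain region are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, ...]


@dataclass
class UncertainObject:
    """Hypothetical discrete uncertain object: alternative positions with probabilities."""
    name: str
    samples: List[Point]   # possible positions t
    probs: List[float]     # P(X = t) for each position

    def __post_init__(self) -> None:
        if len(self.samples) != len(self.probs):
            raise ValueError("need one probability per sample")
        if abs(sum(self.probs) - 1.0) > 1e-9:
            raise ValueError("probabilities must sum to 1")

    def uncertain_region(self) -> Tuple[Point, Point]:
        """Bounding box (lower corner, upper corner) of all positions with P > 0."""
        support = [s for s, p in zip(self.samples, self.probs) if p > 0]
        dims = range(len(support[0]))
        lower = tuple(min(s[d] for s in support) for d in dims)
        upper = tuple(max(s[d] for s in support) for d in dims)
        return lower, upper


a = UncertainObject("A", [(1.0, 1.0), (2.0, 3.0), (1.5, 2.0)], [0.5, 0.25, 0.25])
print(a.uncertain_region())  # ((1.0, 1.0), (2.0, 3.0))
```

The bounding box is only an approximation of the region where the PDF is non-zero, which is the kind of simple spatial key a filter step can work with.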

Probabilistic Similarity Queries
• Probabilistic k-Nearest Neighbor query
  − What are the k objects closest to Q?
• Probabilistic Similarity Ranking
  − Return all objects sorted by their distance to Q.
• Probabilistic Reverse k-Nearest Neighbor queries
• …
Note: The query object may now be uncertain as well!
[Figure: uncertain objects A, B and C and query object Q.]

Similarity Queries: Example
• Probabilistic Nearest Neighbor query
• Which object is the nearest neighbor of Q?
[Figure: uncertain objects A, B and C and query object Q.]
In some possible worlds A is the nearest neighbor of Q, in other possible worlds A is not the nearest neighbor of Q.
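
The possible-worlds view of the nearest neighbor example can be sketched as follows. The sketch assumes discrete objects and relies on the object independence assumption, so the probability of a world is the product of the probabilities of the chosen alternatives. The function name `nn_probabilities` and the one-dimensional toy data are hypothetical.

```python
from itertools import product
from math import dist, prod


def nn_probabilities(query, objects):
    """Probability of each object being the nearest neighbor of the query.

    query:   list of (point, probability) alternatives of Q
    objects: dict name -> list of (point, probability) alternatives
    Every combination of alternatives is one possible world.
    """
    names = list(objects)
    result = {name: 0.0 for name in names}
    for q_point, q_prob in query:
        for world in product(*(objects[name] for name in names)):
            world_prob = q_prob * prod(p for _, p in world)
            dists = [dist(q_point, point) for point, _ in world]
            winner = names[dists.index(min(dists))]  # ties go to the first object
            result[winner] += world_prob
    return result


query = [((0.0,), 1.0)]
objects = {
    "A": [((1.0,), 0.5), ((3.0,), 0.5)],
    "B": [((2.0,), 1.0)],
    "C": [((2.5,), 0.5), ((0.5,), 0.5)],
}
probs = nn_probabilities(query, objects)
print(probs)  # {'A': 0.25, 'B': 0.25, 'C': 0.5}
```

The number of worlds grows exponentially with the number of objects, which is why exact computation is expensive and pruning pays off.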

General Framework
• Efficient probabilistic similarity search:
  – Approximation (Index)
    • Simplification of spatial-probabilistic keys
  – Spatial Filter
    • Filter objects according to simple spatial keys
  – Probabilistic Filter
    • Derive lower/upper bounds of the qualification probability (by means of simple spatial-probabilistic keys)
    • Filter objects according to lower/upper probability bounds
  – Verification
    • Computation of the exact probability (very expensive)
    • Monte-Carlo Sampling (many samples required)
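
The filter and verification steps can be sketched as a small filter-refine loop for a threshold query. This is a minimal sketch, assuming that lower and upper bounds of the qualification probability are already available for each object. The function `filter_refine`, the `verify` callback and all numbers are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def filter_refine(
    bounds: Dict[str, Tuple[float, float]],
    tau: float,
    verify: Callable[[str], float],
) -> List[str]:
    """Return all objects whose qualification probability is at least tau.

    bounds: name -> (lower bound, upper bound) of the qualification probability
    verify: expensive exact (or sampled) probability, called only when needed
    """
    results = []
    for name, (lb, ub) in bounds.items():
        if lb >= tau:
            results.append(name)          # true hit, no verification needed
        elif ub < tau:
            continue                      # true drop, pruned by the filter
        elif verify(name) >= tau:
            results.append(name)          # undecided candidate, verified
    return results


verified = []  # records which objects needed the expensive step


def verify(name: str) -> float:
    verified.append(name)
    return {"C": 0.55}[name]              # made-up exact probability


bounds = {"A": (0.6, 0.9), "B": (0.0, 0.3), "C": (0.4, 0.7)}
hits = filter_refine(bounds, tau=0.5, verify=verify)
print(hits, verified)  # ['A', 'C'] ['C']
```

Only the candidate whose bounds straddle the threshold reaches the verification step.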

Spatial Filter
Pruning based on rectangular approximations only [1]. The figure marks three regions around the rectangles A and B:
• For any Q in this region, A is closer to Q than B.
• For any Q in this region, A is not closer to Q than B.
• For any Q in this region, A may possibly be closer to Q than B.
[Figure: rectangles A and B with the three regions.]

[1] Tobias Emrich, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Andreas Züfle: Boosting Spatial Pruning: On Optimal Pruning of MBRs. SIGMOD Conference 2010: 39-50
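
A three-way spatial decision on rectangles can be sketched with the simple minimum/maximum distance test. Note that this swapped-in test is conservative and is not the optimal criterion of [1]: it may answer "undecided" in cases where the optimal criterion can still decide. Rectangles are given as (lower corner, upper corner), and all names are illustrative.

```python
from math import sqrt
from typing import Tuple

Rect = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lower corner, upper corner)


def min_dist(r: Rect, s: Rect) -> float:
    """Smallest Euclidean distance between any point of r and any point of s."""
    total = 0.0
    for rl, ru, sl, su in zip(r[0], r[1], s[0], s[1]):
        gap = max(sl - ru, rl - su, 0.0)   # 0 if the intervals overlap
        total += gap * gap
    return sqrt(total)


def max_dist(r: Rect, s: Rect) -> float:
    """Largest Euclidean distance between any point of r and any point of s."""
    total = 0.0
    for rl, ru, sl, su in zip(r[0], r[1], s[0], s[1]):
        far = max(su - rl, ru - sl)        # farthest pair of interval endpoints
        total += far * far
    return sqrt(total)


def classify(a: Rect, b: Rect, q: Rect) -> str:
    """Decide whether A is closer to Q than B using rectangles only."""
    if max_dist(a, q) < min_dist(b, q):
        return "A is closer"
    if min_dist(a, q) > max_dist(b, q):
        return "A is not closer"
    return "undecided"


q = ((0.0, 0.0), (1.0, 1.0))
a = ((2.0, 0.0), (3.0, 1.0))
b = ((10.0, 0.0), (11.0, 1.0))
c = ((2.5, 0.0), (4.0, 1.0))
print(classify(a, b, q))  # A is closer
print(classify(b, a, q))  # A is not closer
print(classify(a, c, q))  # undecided
```

Pairs that end up "undecided" are the ones for which probability bounds are needed.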

Probabilistic Pruning
How many objects are closer to Q than A?
• Lower Probability Bound: “$B_1$ is closer to Q than A with a probability of at least x%”
• Upper Probability Bound: “$B_2$ is closer to Q than A with a probability of at most x%”
[Figure: two panels, each showing A, Q and one other object ($B_1$ for the lower bound, $B_2$ for the upper bound).]

Uncertain Generating Functions
• What we have now is:
  − $B_1$ is closer to Q than A with a probability of at least $p_1^{lb}$ and at most $p_1^{ub}$
  − $B_2$ is closer to Q than A with a probability of at least $p_2^{lb}$ and at most $p_2^{ub}$
  − ...
• How can we derive the probability that at least (at most, exactly) k objects are closer to Q than A?

Uncertain Generating Functions
• Let φ be a predicate and let $X_1, \dots, X_n$ be uncertain objects. Let $p_i^{lb}$ and $p_i^{ub}$ be lower and upper bounds of the probability that $X_i$ satisfies φ.
• How many objects satisfy φ?
• We consider the following generating function:
  $$\prod_{i=1}^{n} \left( p_i^{lb}\, x + \left(p_i^{ub} - p_i^{lb}\right) y + \left(1 - p_i^{ub}\right) \right)$$

Example
• Assume the following probability bounds have been derived:
  − $X_1$ satisfies φ with a probability of at least 0.2 and at most 0.5
  − $X_2$ satisfies φ with a probability of at least 0.6 and at most 0.8
• What is the probability that the number #X of objects that satisfy φ is at least (at most, exactly) k?
  − Consider the following Generating Function: (0.2x + 0.3y + 0.5) * (0.6x + 0.2y + 0.2)
  − Expansion yields: 0.12x² + 0.34x + 0.1 + 0.22xy + 0.16y + 0.06y²
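
The expansion in this example can be reproduced with a short sketch that multiplies the factors one object at a time. The function name `expand_ugf` and the dictionary representation (exponent pair (i, j) of x^i y^j mapped to its coefficient) are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def expand_ugf(bounds: List[Tuple[float, float]]) -> Dict[Tuple[int, int], float]:
    """Expand prod_i (lb_i * x + (ub_i - lb_i) * y + (1 - ub_i)).

    bounds: one (lower bound, upper bound) pair per object
    returns: {(i, j): coefficient of x^i y^j}
    """
    poly = {(0, 0): 1.0}
    for lb, ub in bounds:
        nxt = defaultdict(float)
        for (i, j), c in poly.items():
            nxt[(i + 1, j)] += c * lb          # multiply by lb * x
            nxt[(i, j + 1)] += c * (ub - lb)   # multiply by (ub - lb) * y
            nxt[(i, j)] += c * (1.0 - ub)      # multiply by (1 - ub)
        poly = dict(nxt)
    return poly


coeffs = expand_ugf([(0.2, 0.5), (0.6, 0.8)])
for (i, j), c in sorted(coeffs.items(), reverse=True):
    print(f"x^{i} y^{j}: {c:.2f}")
```

For n objects the expansion has at most (n + 1)(n + 2) / 2 terms, since the exponents satisfy i + j ≤ n, so it can be built incrementally without enumerating possible worlds.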

Uncertain Generating Functions
  − Expansion yields: 0.12x² + 0.34x + 0.1 + 0.22xy + 0.16y + 0.06y²
[Figure, built up over six slides: bar chart of P(#X = k) for k = 0, 1, 2, with the y-axis marked at 20 %, 40 %, 60 % and 80 %; successive builds emphasize different terms of the expansion.]

Approximated PDF
The result is an approximated PDF of #X.
[Figure: bar chart of P(#X = k) for k = 0, 1, 2; y-axis marked at 20 %, 40 %, 60 % and 80 %.]
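
One way to read the approximated PDF off the expansion is sketched below. The reading is an assumption: the coefficient of x^i y^j is taken as probability mass for which at least i and at most i + j objects satisfy φ. Under that reading, the x^k term alone gives a lower bound of P(#X = k), and the sum over all terms with i ≤ k ≤ i + j gives an upper bound. The coefficients are copied from the example expansion; the function name `count_bounds` is hypothetical.

```python
from typing import Dict, Tuple

# Coefficients of x^i y^j from the example expansion
# 0.12x² + 0.34x + 0.1 + 0.22xy + 0.16y + 0.06y²
coeffs: Dict[Tuple[int, int], float] = {
    (2, 0): 0.12, (1, 0): 0.34, (0, 0): 0.10,
    (1, 1): 0.22, (0, 1): 0.16, (0, 2): 0.06,
}


def count_bounds(coeffs: Dict[Tuple[int, int], float], k: int) -> Tuple[float, float]:
    """Lower and upper bound of P(#X = k) under the assumed reading."""
    lower = coeffs.get((k, 0), 0.0)  # mass where exactly k objects satisfy phi for sure
    upper = sum(c for (i, j), c in coeffs.items() if i <= k <= i + j)
    return lower, upper


for k in range(3):
    lb, ub = count_bounds(coeffs, k)
    print(f"P(#X = {k}) in [{lb:.2f}, {ub:.2f}]")
# P(#X = 0) in [0.10, 0.32]
# P(#X = 1) in [0.34, 0.78]
# P(#X = 2) in [0.12, 0.40]
```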

Uncertain Generating Functions
[Figure: bar chart of P(#X = k) for k = 0, 1, 2; y-axis marked at 20%, 40%, 60% and 80%.]
Now let #X denote the number of objects that are closer to Q than A. The pdf of #X corresponds directly to the similarity rank of A to Q.
Example Query: Return all objects that are the nearest neighbor of Q with a probability of at least 50%.
⇒ A can be pruned.

Uncertain Generating Functions
[Figure: bar chart of P(#X = k) for k = 0, 1, 2; y-axis marked at 20%, 40%, 60% and 80%.]
Now let #X denote the number of objects that are closer to Q than A. The pdf of #X corresponds directly to the similarity rank of A to Q.
Example Query: Return the most likely rank of each object.
⇒ For A, Rank 1 can be pruned.
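
Both pruning decisions for A can be checked with a few lines. This is a sketch under assumptions: the bounds of P(#X = k) are derived from the example expansion by taking the x^k coefficient as lower bound and the sum of all x^i y^j coefficients with i ≤ k ≤ i + j as upper bound, and k objects closer to Q than A is read as A having rank k + 1. The function names are hypothetical.

```python
from typing import List

# Assumed bounds of P(#X = k) for k = 0, 1, 2, derived from the example expansion
lower = [0.10, 0.34, 0.12]
upper = [0.32, 0.78, 0.40]


def can_prune_by_threshold(upper_bound: float, tau: float) -> bool:
    """A threshold query with probability at least tau can drop the object
    as soon as the upper bound lies below tau."""
    return upper_bound < tau


def prunable_ranks(lower: List[float], upper: List[float]) -> List[int]:
    """Ranks that cannot be the most likely one: their upper bound lies
    below the lower bound of some other rank."""
    best_lower = max(lower)
    return [k + 1 for k, ub in enumerate(upper) if ub < best_lower]


# Nearest neighbor with probability at least 50%: A is the nearest neighbor iff #X = 0
print(can_prune_by_threshold(upper[0], tau=0.5))  # True, A can be pruned
# Most likely rank of A
print(prunable_ranks(lower, upper))  # [1], rank 1 can be pruned
```

Rank 1 is excluded because its upper bound of 0.32 lies below the lower bound of 0.34 for rank 2; ranks 2 and 3 remain undecided at this level of approximation.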

Evaluation
[Figure: runtime (sec), axis from 0 to 180, plotted against k from 1 to 25 for τ = 0.5; the legend compares "with PF" and "MC w/o PF".]
