Keyword Searching in Hypercubic Manifolds, Yu-En Lu (PowerPoint PPT Presentation)





SLIDE 1

Keyword Searching in Hypercubic Manifolds

Yu-En Lu, Steven Hand, Pietro Lio

University of Cambridge Computer Laboratory

SLIDE 2

Motivation

Unstructured P2P networks such as Gnutella evaluate complex queries by flooding the network, while nothing can be guaranteed.

Distributed Hash Tables evaluate simple queries due to hashing, whilst a guarantee is provided (at least theoretically).

What if we could cluster similar objects in similar regions of the network via hashing alone?

  • No preprocessing needed
  • No global knowledge required, only the hash
  • Plug & play on top of current DHT designs

SLIDE 3

[Figure: Qube positioned alongside related DHT systems such as PHT, Mercury, and P-Grid]

Types of Queries

  • Exact Query: K1 ∧ K2 ∧ K3, e.g. “Harry Potter V.mpg”
  • Range Query: K1 ∧ K2 ∧ K3 ∧ (K4 ≥ 128), e.g. “Harry Potter V.mpg AND bit-rate > 128kbps”
  • Partial Match Query: K1 ∧ (K2 ∨ K3 ∨ K4), e.g. “Harry Potter [III,IV].mpg”
  • Flawed Query: Ki+1 ∧ Kj−20, e.g. “Hary Porter.mpg”
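The four query types above can be sketched over objects modelled as keyword sets. This is a hypothetical illustration, not the paper's code; the function names and the Hamming-style tolerance for flawed queries are assumptions.

```python
# Illustrative sketch: the slide's query types over keyword-set objects.

def exact(obj, keys):
    # Exact query: K1 AND K2 AND K3 -- every keyword must be present.
    return all(k in obj for k in keys)

def partial(obj, required, alternatives):
    # Partial match: K1 AND (K2 OR K3 OR K4).
    return all(k in obj for k in required) and any(k in obj for k in alternatives)

def flawed(obj, keys, tolerance=1):
    # Flawed query ("Hary Porter"): accept if at most `tolerance`
    # query keywords are missing from the object.
    missing = sum(1 for k in keys if k not in obj)
    return missing <= tolerance

movie = {"harry", "potter", "v", "mpg"}
print(exact(movie, {"harry", "potter"}))        # True
print(partial(movie, {"harry"}, {"iii", "v"}))  # True
print(flawed(movie, {"hary", "potter"}))        # True: one misspelt keyword tolerated
```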

SLIDE 4

Possible answers to the query

[Figure: a projection of the object feature space, with the query point near “Harry” and “Movie” objects and far from “French Cuisine”]

SLIDE 5

A Qube View of the Mappings: Features - Overlay – Nodes

[Figure: vertices of the abstract graph mapped onto nodes of the network topology]

Each object is represented as a bit-string, where 1 denotes that it contains a keyword and 0 that it does not. Each bit-string is then hashed onto the P2P name space. The nodes in the network choose positions in the P2P space randomly and link with each other in some overlay topology; in our case, a hypercube is used.

[Figure examples: “Harry Potter”, “Bon Jovi”]
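The hypercube overlay can be sketched as follows: each node picks a random b-bit ID and links to the b IDs that differ from it in exactly one bit. The parameter name `b` follows the slides; everything else is an assumed minimal illustration.

```python
# Minimal sketch of the hypercube topology: a node's neighbours are the
# b IDs obtained by flipping each bit of its own ID in turn.
import random

b = 4  # hypercube dimensionality = node ID length in bits

def random_node_id():
    # Nodes choose positions in the P2P space at random.
    return random.getrandbits(b)

def neighbours(node_id):
    # XOR with each single-bit mask enumerates the hypercube neighbours.
    return [node_id ^ (1 << i) for i in range(b)]

node = 0b0101
print([format(n, "04b") for n in neighbours(node)])
# ['0100', '0111', '0001', '1101']
```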

SLIDE 6

Design Principles and Fundamental Trade-offs

Latency vs. Message Complexity

  • Low message complexity usually means low latency; not entirely true for DHT systems.

Fairness vs. Performance

  • Sending everything to a handful of ultra-peers is fast and simple.
  • Having things spread across the network means a fairer system (and perhaps better availability).

Storage vs. Synchronisation Complexity

  • The most popular queries may be processed by querying one random node, thanks to generous replication/caching.
  • For some applications, such as a distributed inverted index, frequent synchronisation is costly.

SLIDE 7

Hashing and Network Topology

Keywords such as “Harry Potter”, “music”, “movie”, “7” are summarised as a bit-string, e.g. 1 1 0 0 1 1.

Summary hash h : {0, 1}* → {0, 1}^b

  • Non-expansive: d(h(x), h(y)) ≤ d(x, y), so similar objects are located in manifolds of the hypercube.
  • Fair partitioning: |h−1(u)| = |h−1(v)| for any two points u, v of the name space.
  • Keyword edges link nodes at one-word distance.

[Figure examples: “Harry Potter music”, “Harry Potter movie”, “Harry Potter”, “Harry Potter movie 7”]
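One possible way to realise such a summary hash (an assumed construction, not necessarily the paper's exact one) is to let each keyword deterministically set one of b bits, Bloom-filter style: objects whose keyword sets differ in k words then have hashes within Hamming distance k, which is exactly the non-expansive property.

```python
# Sketch of a non-expansive summary hash h : {0,1}* -> {0,1}^b.
# Each keyword sets one stable bit position, so changing one keyword
# changes at most one bit of the hash.
import hashlib

b = 16  # hash length in bits (hypercube dimensionality)

def summary_hash(keywords):
    h = 0
    for kw in keywords:
        # Derive a stable bit position for this keyword.
        pos = int.from_bytes(hashlib.sha1(kw.encode()).digest()[:4], "big") % b
        h |= 1 << pos
    return h

def hamming(u, v):
    return bin(u ^ v).count("1")

x = summary_hash({"harry", "potter", "music"})
y = summary_hash({"harry", "potter", "movie"})
# The keyword sets differ in two words (music vs. movie), so the hashes
# differ in at most 2 bits and stay close on the hypercube.
print(hamming(x, y) <= 2)  # True
```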

SLIDE 8

Query Processing

[Figure: a 4-bit hypercube with node IDs 0000–1111; objects such as “Harry Potter”, “Harry Potter Music”, and “Harry Potter VII” sit on nearby vertices, and a query is resolved by probing that neighbourhood]
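The query-processing idea in the figure can be sketched as probing the nodes within a small Hamming radius of the query's hash, since matching objects cluster in that neighbourhood of the hypercube. The enumeration below is an assumed illustration of that idea, not the paper's routing algorithm.

```python
# Sketch: enumerate the node IDs within a given Hamming radius of the
# query's summary hash; these are the nodes a query would probe.
from itertools import combinations

b = 4  # hypercube dimensionality

def hamming_ball(centre, radius):
    ids = [centre]
    for r in range(1, radius + 1):
        for bits in combinations(range(b), r):
            flipped = centre
            for i in bits:
                flipped ^= 1 << i  # flip each chosen bit
            ids.append(flipped)
    return ids

query_hash = 0b0011  # e.g. hash of "Harry Potter"
print(len(hamming_ball(query_hash, 1)))  # 5 nodes: the centre plus 4 neighbours
```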

SLIDE 9

Experimental Setup

  • A hypercube DHT is instantiated where end-to-end distance is drawn from the King dataset*, which contains latency measurements of a set of DNS servers.
  • Surrogates for a logical ID are chosen based on a Plaxton-style post-fix matching scheme.
  • Nodes choose DHT IDs randomly; no network proximity is used, to expose worst-case performance and the trade-off with dimensionality and caching.
  • A sample of FreeDB**, a free online CD album database containing 20 million songs, is used to reflect actual objects in the real world.
  • Gnutella query traces served as our query samples.

* http://pdos.csail.mit.edu/p2psim/kingdata/ ** http://www.freedb.org
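Plaxton-style post-fix matching, as used above to pick a surrogate, routes to the live node whose ID shares the longest common suffix with the target logical ID. The helper names below are hypothetical; only the matching rule itself comes from the slide.

```python
# Sketch of Plaxton-style post-fix (suffix) matching for surrogate choice.

def suffix_match_len(a, b):
    # Count matching digits from the right-hand end of both IDs.
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def surrogate(logical_id, live_nodes):
    # The surrogate is the live node sharing the longest suffix.
    return max(live_nodes, key=lambda nid: suffix_match_len(logical_id, nid))

nodes = ["0110", "1011", "1101"]
print(surrogate("0011", nodes))  # "1011": shares the 3-digit suffix "011"
```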

SLIDE 10

Retrieval Cost

* Recall rate = percentage of relevant objects found. ** Legend tuples denote (b, n), where b is the dimensionality and n is the size of the network.

SLIDE 11

Query Latency in Wide Area

Selection of b controls the degree of clustering

SLIDE 12

Network Performance

* This result uses the query “bon jovi” as an example; FreeDB contains 3242 distinct, related songs for it.

SLIDE 13

Conclusion

  • Qube spreads objects across the network by their similarity: better fairness and availability, zero preprocessing, little synchronisation needed.
  • By tuning the parameter b, one may choose the degree of the performance/fairness trade-off.
  • We are further investigating lower-latency schemes to trim probing cost and decouple query accuracy from network size.

SLIDE 14

Future Work

  • Large-scale simulation (>100K nodes with a realistic network latency generator)
  • Flash-crowd query model and replication/caching
  • Distributed proximity searches, such as kNN under the Euclidean metric

SLIDE 15

Thank you!

Yu-En.Lu@cl.cam.ac.uk