
HashGraph: Semantic Hashing using external knowledge base
C. Gravier, J. Subercaze
Satin team, LT2C laboratory, Télécom Saint-Étienne, Université Jean Monnet, Saint-Étienne, France

Outline
1. Preamble
2. Semantic Hashing
   ◮ Introduction
   ◮ Existing solutions
3. HashGraph
   ◮ User profile: graph of terms
   ◮ Graph to binary footprint
   ◮ Evaluation
4. HashGraph and HashWordnet
   ◮ On hashing node values
   ◮ Using an external is-a taxonomy
5. Demos

Preamble: References
This presentation is based on:
◮ [BambaCIKM12]: Bamba P., Subercaze J., Gravier C., Benmira N., Fontaine J., The Twitaholic Next Door, Proc. of the 21st ACM International Conference on Information and Knowledge Management (CIKM'12), pp. 2275–2278, Maui, Hawai'i, USA, October 30th, 2012.
◮ [SubercazeWI13]: Subercaze J., Gravier C., HashGraph: an expressive and scalable Twitter users profile for recommendation, 2013 IEEE/WIC/ACM International Conference on Web Intelligence (WI'13), Atlanta, USA, November 17th–20th, 2013.
... with a different agenda, additional information and thoughts.

Preamble: Who are we?
◮ Christophe Gravier
   ◮ Associate Professor in Computer Science
   ◮ Working at Télécom Saint-Étienne (Université Jean Monnet)
◮ Julien Subercaze
   ◮ Researcher in Computer Science
   ◮ Working at Télécom Saint-Étienne (Université Jean Monnet)
◮ Contacts:
   ◮ mail: {julien.subercaze,christophe.gravier}@univ-st-etienne.fr
   ◮ homepage: http://satin-ppl.telecom-st-etienne.fr/cgravier/ and http://satin-ppl.telecom-st-etienne.fr/jsubercaze/
   ◮ twitter: @chgravier and @JulienSubercaze



Semantic Hashing: Introduction
Hashing techniques for Information Retrieval
◮ Methods for embedding high-dimensional data into a similarity-preserving low-dimensional Hamming space [Kim and Choi, 2011].
◮ Usually the hash space is an "absolute partitioning of the space of document representation" [Stein and Potthast, 2007].
Historically, the goal is to learn a function $h_\phi$ that partitions the Hamming space so that two documents whose similarity in the original space is at least a threshold $\theta$ are assigned to the same bucket in the Hamming space.
Figure: Hashing for information retrieval (from [Stein and Potthast, 2007]).
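To make the idea of a similarity-preserving embedding into a Hamming space concrete, here is a minimal sketch. It uses sign random projections (random hyperplane hashing), which is a swapped-in, well-known technique and not the method of the slides or of the cited papers. The function names, the code length of 64 bits and the toy document vectors are all assumptions made for illustration.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes in a dim-dimensional space (assumed setup)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def hash_vector(vec, hyperplanes):
    """Map a real-valued vector to a tuple of bits: one bit per hyperplane side."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    )

def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))

# Toy document vectors (hypothetical term weights).
planes = make_hyperplanes(num_bits=64, dim=4, seed=42)
doc_a = [1.0, 0.9, 0.0, 0.1]
doc_b = [0.9, 1.0, 0.1, 0.0]      # nearly the same direction as doc_a
doc_c = [-1.0, -0.9, 0.0, -0.1]   # opposite direction to doc_a

code_a = hash_vector(doc_a, planes)
code_b = hash_vector(doc_b, planes)
code_c = hash_vector(doc_c, planes)

# Similar documents should end up closer in Hamming distance than dissimilar ones.
dist_ab = hamming(code_a, code_b)
dist_ac = hamming(code_a, code_c)
```

With this scheme, the expected Hamming distance between two codes grows with the angle between the original vectors, which is one way to obtain the similarity-preserving property described above.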


Semantic Hashing: Introduction
Semantic hashing
Similarity search
◮ In similarity search, a document is used as the query.
◮ This is fundamentally different from the standard keyword search paradigm, e.g., in TREC [Zhang et al., 2010].
Semantic hashing
Semantic hashing is about providing the function(s) $h_\phi$ that index documents in the Hamming space for fast similarity search.
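The role of the hash function as an index can be sketched as a table from binary codes to document identifiers, queried with a whole document rather than with keywords. The code below is only an illustration: it uses a SimHash-style majority vote over MD5-derived term bits as a stand-in hash function, and `NUM_BITS`, `term_bits`, `simhash` and `BucketIndex` are hypothetical names, not part of HashGraph.

```python
import hashlib
from collections import defaultdict

NUM_BITS = 16  # assumed code length, chosen only for illustration (must be <= 128)

def term_bits(term):
    """Derive NUM_BITS deterministic pseudo-random bits from a term via MD5."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")
    return [(value >> i) & 1 for i in range(NUM_BITS)]

def simhash(terms):
    """SimHash-style code: each bit is a majority vote over the terms' bits."""
    votes = [0] * NUM_BITS
    for term in terms:
        for i, bit in enumerate(term_bits(term)):
            votes[i] += 1 if bit else -1
    return sum(1 << i for i, v in enumerate(votes) if v > 0)

class BucketIndex:
    """Maps each binary code (bucket) to the documents hashed into it."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def add(self, doc_id, terms):
        self.buckets[simhash(terms)].append(doc_id)

    def query(self, terms):
        """Return the documents sharing the query document's bucket."""
        return list(self.buckets.get(simhash(terms), []))

index = BucketIndex()
index.add("doc1", ["semantic", "hashing", "twitter"])
index.add("doc2", ["graph", "taxonomy", "wordnet"])

# The query is itself a document (a bag of terms), not a keyword expression.
same_bucket = index.query(["twitter", "hashing", "semantic"])
```

An exact bucket lookup only retrieves documents with an identical code; retrieving documents at a small Hamming distance requires probing neighbouring buckets as well.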


Semantic Hashing: Introduction
kNN and ε-kNN problems
◮ We use a document $q$ as a query: we hash it to identify its bucket, and then use the bucket value to address the two problems below¹:
1. kNN search: find the $k$ documents nearest to $\mathrm{hash}(q)$ in the Hamming space (aka top-k search).
2. ε-kNN search: find all documents $p$ such that $d(q, p) \leq (1 + \epsilon) \times d(q, P)$, where $d(q, P)$ is the distance from $q$ to its closest point in $P$ (Hamming ball of size $\epsilon$).
Remark on perfect semantic hashing
It is possible to provide a perfect hashing scheme [Linial et al., 1995], but at a prohibitive code length cost. All semantic hashing schemes therefore provide either an approximation (that is, hashing with guarantees on the preservation of semantic relatedness) or a heuristic.
¹ As coined by the founding paper on Semantic Hashing [Gionis et al., 1999].
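The two search problems can be illustrated with a brute-force scan over binary codes stored as integers. This is a sketch under assumptions: a real system would exploit the bucket structure instead of scanning every code, the ε-kNN condition is taken as $d(q, p) \leq (1 + \epsilon) \times d(q, P)$, and the function names and toy codes are hypothetical.

```python
import heapq

def hamming(a, b):
    """Hamming distance between two codes stored as non-negative integers."""
    return bin(a ^ b).count("1")

def knn(query_code, codes, k):
    """Return the k (doc_id, distance) pairs closest to query_code."""
    scored = ((doc_id, hamming(query_code, code)) for doc_id, code in codes.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])

def eps_nn(query_code, codes, eps):
    """Return all doc ids within (1 + eps) times the nearest-neighbour distance."""
    distances = {doc_id: hamming(query_code, code) for doc_id, code in codes.items()}
    d_min = min(distances.values())
    return sorted(doc_id for doc_id, d in distances.items() if d <= (1 + eps) * d_min)

# Toy 4-bit codes (hypothetical documents).
codes = {"d1": 0b0000, "d2": 0b0001, "d3": 0b0111, "d4": 0b1111}
query = 0b0011

top2 = knn(query, codes, k=2)               # the two closest documents
near_tight = eps_nn(query, codes, eps=0.5)  # within 1.5 x nearest distance
near_loose = eps_nn(query, codes, eps=1.0)  # within 2 x nearest distance
```

Here the nearest distance is 1 (for `d2` and `d3`), so `eps=0.5` keeps only those two documents, while `eps=1.0` also admits the documents at distance 2.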


Semantic Hashing: Introduction
A "good" semantic hashing function?
1. Entropy maximizing [Baluja and Covell, 2008]. Large coverage of the set of $2^l$ binary strings of length $l$.
2. Complexity. Obviously, a "good" semantic hashing function should exhibit a computational complexity as low as possible.
3. Monotonicity. The quality of the embedding should improve as the number of bits in the code increases.
4. Independence from dimensions [Stein and Potthast, 2007]. As most approaches rely on embedding a high-dimensional space of dimension $d$ into a Hamming space of dimension $d'$, the semantic hashing strategy should scale well w.r.t. the increase of $d$.
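The first criterion can be checked empirically on a set of produced codes. The sketch below shows one possible reading of "entropy maximizing", not necessarily the measure used by [Baluja and Covell, 2008]: it computes the Shannon entropy of the observed code distribution, which is at most $l$ bits for codes of length $l$, and the fraction of the $2^l$ possible codes actually used. The function names are hypothetical.

```python
import math
from collections import Counter

def code_entropy(codes):
    """Shannon entropy (in bits) of the empirical distribution of codes."""
    counts = Counter(codes)
    total = len(codes)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def coverage(codes, num_bits):
    """Fraction of the 2**num_bits possible codes that appear at least once."""
    return len(set(codes)) / 2 ** num_bits

# Toy 2-bit codes: one hash function spreads documents over all buckets,
# the other sends every document to the same bucket.
spread_codes = [0b00, 0b01, 0b10, 0b11]
collapsed_codes = [0b01, 0b01, 0b01, 0b01]

spread_entropy = code_entropy(spread_codes)        # reaches the 2-bit maximum
collapsed_entropy = code_entropy(collapsed_codes)  # no information in the code
```

A hash function whose codes collapse into a few buckets has low entropy and forces long scans inside each bucket, which is why a large coverage of the code space is desirable.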
