IR: Information Retrieval
FIB, Master in Innovation and Research in Informatics
Slides by Marta Arias, José Luis Balcázar, Ramon Ferrer-i-Cancho, Ricard Gavaldá
Department of Computer Science, UPC
Fall 2018 http://www.cs.upc.edu/~ir-miri
◮ By stacking together k hash functions
  ◮ h(x) = (h_1(x), ..., h_k(x)) where h_i ∈ F
  ◮ Probability of collision of similar objects decreases to s^k
  ◮ Probability of collision of dissimilar objects decreases even more
◮ By repeating the process m times
  ◮ Probability of collision of similar objects increases to 1 - (1 - s^k)^m
◮ Choosing k and m appropriately, can achieve a family that
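
As a numerical illustration of the amplification described above, the following minimal sketch assumes the standard AND-OR analysis: two objects whose base collision probability is s collide on a stack of k functions with probability s^k, and on at least one of m independent stacks with probability 1 - (1 - s^k)^m. The function name and the sample values of s, k and m are illustrative choices, not taken from the slides.

```python
def and_or_collision_probability(s: float, k: int, m: int) -> float:
    """Probability that two objects with base collision probability s
    share a bucket in at least one of m tables, each keyed by k stacked
    hash functions (assumes the hash functions are independent)."""
    same_bucket_in_one_table = s ** k          # AND step: all k must agree
    miss_every_table = (1.0 - same_bucket_in_one_table) ** m
    return 1.0 - miss_every_table              # OR step: at least one table hits


if __name__ == "__main__":
    # Compare a dissimilar pair, a borderline pair and a similar pair.
    for s in (0.2, 0.5, 0.8):
        p = and_or_collision_probability(s, k=5, m=20)
        print(f"s = {s:.1f}  ->  collision probability {p:.4f}")
```

Larger k pushes the curve down for dissimilar pairs, and larger m pulls it back up for similar pairs, which is why the two parameters are tuned together.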
◮ for each x ∈ X
  ◮ stack k hash functions and form x_i = (h_1(x), ..., h_k(x))
  ◮ store x in bucket given by f(x_i)

◮ stack k hash functions and form q_i = (h_1(q), ..., h_k(q))
◮ Z_i = { objects found in bucket f(q_i) }
◮ Z = Z ∪ Z_i
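
The indexing and query steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the slides' implementation: the family F is assumed to be bit sampling over fixed-length binary vectors (each h_i returns one randomly chosen coordinate), the steps are repeated for i = 1..m tables, and a Python dictionary keyed by the stacked tuple stands in for the bucket function f. The class name `LSHIndex` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


class LSHIndex:
    """m hash tables, each keyed by k stacked bit-sampling hash functions."""

    def __init__(self, dim: int, k: int, m: int, seed: int = 0):
        rng = random.Random(seed)
        # Assumption: h_j(x) = x[c] for a random coordinate c (bit sampling).
        self.coords = [[rng.randrange(dim) for _ in range(k)] for _ in range(m)]
        # The dict plays the role of f: it maps a stacked key to a bucket.
        self.tables = [defaultdict(list) for _ in range(m)]

    def _key(self, i: int, x: tuple) -> tuple:
        # Stack the k hash values of table i: (h_1(x), ..., h_k(x)).
        return tuple(x[c] for c in self.coords[i])

    def add(self, x: tuple) -> None:
        # Store x in one bucket per table.
        for i, table in enumerate(self.tables):
            table[self._key(i, x)].append(x)

    def query(self, q: tuple) -> set:
        # Z is the union of the buckets that q falls into, over all tables.
        Z = set()
        for i, table in enumerate(self.tables):
            Z.update(table.get(self._key(i, q), []))
        return Z


# Usage: index three binary vectors and retrieve candidates for a query.
a = (1, 0, 1, 1, 0, 0, 1, 0)
b = (1, 0, 1, 1, 0, 0, 1, 1)   # differs from a in one position
c = (0, 1, 0, 0, 1, 1, 0, 1)   # differs from a in every position
index = LSHIndex(dim=8, k=3, m=4)
for x in (a, b, c):
    index.add(x)
candidates = index.query(a)
```

The returned set Z contains candidates only; a full nearest-neighbour search would still compare the query against each candidate with the true distance.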