Efficient and Effective Query Auto-Completion

SLIDE 1

Efficient and Effective Query Auto-Completion

Giulio Ermanno Pibiri, Simon Gog, Rossano Venturini

ACM Conference on Research and Development in Information Retrieval (SIGIR), 2020

27/07/2020

SLIDE 2

Query Auto-Completion

Given a collection S of scored strings and a partially completed user query Q, find the top-k strings that “match” Q in S.
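As a reference point for the problem statement above, here is a naive sketch: scan all of S, keep the strings that match Q under a pluggable match predicate, and return the k highest-scored ones. The function name `top_k_completions`, the scores, and the use of plain string-prefix matching in the example are illustrative assumptions, not the authors' implementation.

```python
from typing import Callable, List, Tuple


def top_k_completions(
    collection: List[Tuple[str, int]],
    query: str,
    k: int,
    matches: Callable[[str, str], bool],
) -> List[str]:
    """Naive baseline: scan all of S, keep matching strings, return the k best-scored."""
    hits = [(score, s) for s, score in collection if matches(query, s)]
    # Highest score first; ties broken lexicographically so the output is deterministic.
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [s for _, s in hits[:k]]


def starts_with(q: str, s: str) -> bool:
    # Hypothetical predicate: plain string-prefix matching.
    return s.startswith(q)


# Hypothetical scored collection S (scores are made up for illustration).
S = [("bmw i3 sedan", 90), ("bmw i3 sport", 70), ("bmw i3 sportback", 80), ("audi q8", 60)]
print(top_k_completions(S, "bmw i3 s", 2, starts_with))  # ['bmw i3 sedan', 'bmw i3 sportback']
```

This costs a full scan of S per query; the indexes discussed in the following slides avoid scanning all of S.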

SLIDE 3

Setting

We focus on matching algorithms, not on ranking mechanisms: we simply return the “most popular” results, as recorded in a query log. Many matching algorithms are possible, e.g., exact, prefix, pattern (substring), and edit-distance matching.
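The matching modes listed above can be written as simple predicates over a query q and a candidate string s. This is a minimal sketch; in particular, edit-distance matching is assumed here to mean that q is within a small edit distance of some prefix of s, which is one common choice for auto-completion but not necessarily the one used in the paper. The function names and the `max_dist` parameter are hypothetical.

```python
def exact_match(q: str, s: str) -> bool:
    return s == q


def prefix_match(q: str, s: str) -> bool:
    return s.startswith(q)


def pattern_match(q: str, s: str) -> bool:
    # Substring matching: q may occur anywhere in s.
    return q in s


def prefix_edit_distance(q: str, s: str) -> int:
    """Smallest Levenshtein distance between q and any prefix of s."""
    # prev[j] holds the distance between the current prefix of q and s[:j].
    prev = list(range(len(s) + 1))
    for i in range(1, len(q) + 1):
        cur = [i] + [0] * len(s)
        for j in range(1, len(s) + 1):
            cost = 0 if q[i - 1] == s[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # delete q[i-1]
                         cur[j - 1] + 1,      # insert s[j-1]
                         prev[j - 1] + cost)  # match or substitute
        prev = cur
    # Last row: distance between all of q and each prefix s[:j]; keep the best.
    return min(prev)


def edit_distance_match(q: str, s: str, max_dist: int = 1) -> bool:
    return prefix_edit_distance(q, s) <= max_dist


s = "bmw i3 sedan"
print(prefix_match("bmw i3 s", s), pattern_match("i3 se", s), prefix_match("i3 se", s))  # True True False
print(edit_distance_match("bmw i3 sdan", s))  # True: the missing "e" costs one edit
```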

SLIDE 4

Setting

This work considers prefix and conjunctive matching.

SLIDE 5

Setting

SLIDE 6

Conjunctive-Search

Return the strings that contain all the tokens of the query prefix (every token but the last) and at least one token prefixed by the query suffix (the last, possibly incomplete, token).

Build an inverted index where docids are assigned in decreasing score order: smaller docids are better.
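A minimal sketch of this index construction, assuming a score is known for every string: sort by decreasing score so that the docid is the rank, then append each docid to the list of every distinct term it contains. Termids are assigned in lexicographic order of the terms; this is an assumption here, consistent with the term ranges such as [7,9] and [1,3] used in the examples. The collection is the one from the Forward search example, with hypothetical scores chosen to reproduce its docids.

```python
from typing import List, Tuple


def build_index(collection: List[Tuple[str, int]]):
    """Return (completions by docid, sorted terms, inverted lists indexed by termid)."""
    # Decreasing score order: docid = rank, so smaller docids are better.
    ranked = sorted(collection, key=lambda pair: (-pair[1], pair[0]))
    completions = [s for s, _ in ranked]
    # Assumption: termids follow the lexicographic order of the terms.
    terms = sorted({t for s in completions for t in s.split()})
    termid = {t: i for i, t in enumerate(terms)}
    inverted: List[List[int]] = [[] for _ in terms]
    for docid, s in enumerate(completions):
        for t in set(s.split()):  # each docid is added once per distinct term
            inverted[termid[t]].append(docid)  # docids arrive in increasing order
    return completions, terms, inverted


# Hypothetical scores, chosen so that the docids match the Forward search example.
COLLECTION = [
    ("audi a 3 2015", 20), ("audi q 8 2017", 50), ("bmw x 1", 30),
    ("bmw i 3 2015", 70), ("bmw i 3 2016", 40), ("bmw i 3 2017", 60),
    ("bmw i 8 2015", 10),
]
completions, terms, inverted = build_index(COLLECTION)
for tid, (t, lst) in enumerate(zip(terms, inverted)):
    print(tid, t, lst)  # prints, among others, "8 bmw [0, 1, 3, 4, 6]"
```

Because docids are visited in increasing order, every inverted list comes out sorted without an extra sorting pass.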

SLIDE 7

Conjunctive-Search

bmw s|

SLIDE 8

Conjunctive-Search

bmw s|

bmw i3 sedan

SLIDE 9

Conjunctive-Search

bmw s|

bmw i3 sedan
bmw i3 sportback

SLIDE 10

Conjunctive-Search

bmw s|

bmw i3 sedan
bmw i3 sportback
bmw i3 sport

SLIDE 11

Conjunctive-Search

bmw s|

bmw i3 sedan
bmw i3 sportback
bmw i3 sport

bmw [7,9]: intersect the inverted list of bmw with the union of the inverted lists of the termids in [7,9], i.e., the terms prefixed by s.
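Because termids follow the lexicographic order of the terms, all terms prefixed by the suffix occupy a contiguous termid range such as [7,9], which two binary searches over the sorted dictionary can find. A minimal sketch using Python's `bisect`; the dictionary behind [7,9] is not shown, so the example uses the small dictionary of the Forward search example, where the suffix 2 maps to [1,3]. The function name `termid_range` is hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def termid_range(terms: List[str], suffix: str) -> Optional[Tuple[int, int]]:
    """Closed range [lo, hi] of termids whose term starts with `suffix`, or None.

    Assumes `terms` is sorted lexicographically and termid == position in `terms`.
    """
    lo = bisect_left(terms, suffix)
    # Any term starting with `suffix` sorts before suffix + the largest code point
    # (ignoring the corner case of terms that themselves contain U+10FFFF).
    hi = bisect_left(terms, suffix + "\U0010ffff") - 1
    return (lo, hi) if lo <= hi else None


# Dictionary of the Forward search example: termid i is TERMS[i].
TERMS = ["1", "2015", "2016", "2017", "3", "8", "a", "audi", "bmw", "i", "q", "x"]
print(termid_range(TERMS, "2"))  # (1, 3)
print(termid_range(TERMS, "z"))  # None
```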

SLIDE 12

Conjunctive-Search

bmw s|

bmw i3 sedan
bmw i3 sportback
bmw i3 sport

bmw [7,9]

Heap-based approach:
 (1) Much better than explicitly materializing the union.
 (2) Drawback: the terms involved in the union may be too many!
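A minimal sketch of the heap-based approach, under simplifying assumptions: the inverted lists of the range are merged lazily with a min-heap (one cursor per list) instead of materializing their union, and each candidate docid is checked against the lists of the prefix tokens. Membership is tested with Python sets rather than with next-geq searches on compressed lists, and the function names are hypothetical. The example reuses the Forward search data (bmw and the range [1,3]), since the lists behind [7,9] are not shown.

```python
import heapq
from typing import Iterator, List


def heap_union(lists: List[List[int]]) -> Iterator[int]:
    """Lazily yield the sorted, duplicate-free union of sorted docid lists."""
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    last = None
    while heap:
        docid, i, pos = heapq.heappop(heap)
        if docid != last:  # the same docid may appear in several lists
            yield docid
            last = docid
        if pos + 1 < len(lists[i]):
            heapq.heappush(heap, (lists[i][pos + 1], i, pos + 1))


def conjunctive_topk(prefix_lists: List[List[int]], range_lists: List[List[int]], k: int) -> List[int]:
    """Smallest k docids in every prefix list and in at least one range list."""
    # Simplification: membership via Python sets instead of next-geq on compressed lists.
    prefix_sets = [set(lst) for lst in prefix_lists]
    result: List[int] = []
    for docid in heap_union(range_lists):  # increasing docid = decreasing score
        if all(docid in s for s in prefix_sets):
            result.append(docid)
            if len(result) == k:
                break  # early exit: later docids can only have lower scores
    return result


# Inverted lists from the Forward search example: bmw, then termids 1..3 (terms prefixed by "2").
BMW = [0, 1, 3, 4, 6]
RANGE_LISTS = [[0, 5, 6], [3], [1, 2]]
print(conjunctive_topk([BMW], RANGE_LISTS, 10))  # [0, 1, 3, 6]
```

The heap keeps one cursor per term in the range, so its cost grows with the size of the range: that is the drawback noted in point (2).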

SLIDE 13

Conjunctive-Search

Forward search

SLIDE 14

Conjunctive-Search

bmw 2|

termid  term   inverted list
0       1      4
1       2015   0, 5, 6
2       2016   3
3       2017   1, 2
4       3      0, 1, 3, 5
5       8      2, 6
6       a      5
7       audi   2, 5
8       bmw    0, 1, 3, 4, 6
9       i      0, 1, 3, 6
10      q      2
11      x      4

docid  completion      termid set
5      audi a 3 2015   7, 6, 4, 1
2      audi q 8 2017   7, 10, 5, 3
4      bmw x 1         8, 11, 0
0      bmw i 3 2015    8, 9, 4, 1
3      bmw i 3 2016    8, 9, 4, 2
1      bmw i 3 2017    8, 9, 4, 3
6      bmw i 8 2015    8, 9, 5, 1

Termids follow the lexicographic order of the terms, so the terms prefixed by 2 (2015, 2016, 2017) form the termid range [1,3].

Forward search

SLIDE 15

Conjunctive-Search

bmw 2|

Forward search

SLIDE 16

Conjunctive-Search

bmw 2|

Forward search

SLIDE 17

Conjunctive-Search

bmw 2|

bmw i 3 2015

Forward search

SLIDE 18

Conjunctive-Search

bmw 2|

bmw i 3 2015

Forward search

SLIDE 19

Conjunctive-Search

bmw 2|

bmw i 3 2015
bmw i 3 2017

Forward search

SLIDE 20

Conjunctive-Search

bmw 2|

bmw i 3 2015
bmw i 3 2017

Forward search

SLIDE 21

Conjunctive-Search

bmw 2|

bmw i 3 2015
bmw i 3 2017
bmw i 3 2016

Forward search

SLIDE 22

Conjunctive-Search

bmw 2|

bmw i 3 2015
bmw i 3 2017
bmw i 3 2016

Forward search

SLIDE 23

Conjunctive-Search

bmw 2|

bmw i 3 2015
bmw i 3 2017
bmw i 3 2016

Forward search

SLIDE 24

Conjunctive-Search

bmw 2|

bmw i 3 2015
bmw i 3 2017
bmw i 8 2015
bmw i 3 2016

Forward search

SLIDE 25

Conjunctive-Search

bmw 2|

bmw i 3 2015
bmw i 3 2017
bmw i 8 2015
bmw i 3 2016

Forward search

Forward-Search approach:
 (1) No heap management.
 (2) Needs “direct” access to completions, either through a forward index (Fwd) or a front-coded dictionary (FC).
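A minimal sketch of forward search, reconstructed from the example: scan the docids of the prefix token(s) in increasing order and, for each one, check in the forward index whether the completion contains some termid of the suffix range; no heap is needed and the scan stops after k hits. Plain Python lists stand in for the compressed forward index, the function and variable names are hypothetical, and the sketch assumes the query has at least one completed token.

```python
from typing import List


def forward_search(prefix_lists: List[List[int]], forward: List[List[int]],
                   lo: int, hi: int, k: int) -> List[int]:
    """Top-k docids containing all prefix terms and some termid in [lo, hi]."""
    # Assumes at least one completed (prefix) token, so prefix_lists is non-empty.
    shortest = min(range(len(prefix_lists)), key=lambda i: len(prefix_lists[i]))
    others = [set(lst) for i, lst in enumerate(prefix_lists) if i != shortest]
    result: List[int] = []
    for docid in prefix_lists[shortest]:  # increasing docid = decreasing score
        if not all(docid in s for s in others):
            continue
        # Forward index lookup: does this completion contain a term in [lo, hi]?
        if any(lo <= t <= hi for t in forward[docid]):
            result.append(docid)
            if len(result) == k:
                break
    return result


# Forward index of the example: FORWARD[docid] is the termid set of that completion.
FORWARD = [[8, 9, 4, 1], [8, 9, 4, 3], [7, 10, 5, 3], [8, 9, 4, 2],
           [8, 11, 0], [7, 6, 4, 1], [8, 9, 5, 1]]
COMPLETIONS = ["bmw i 3 2015", "bmw i 3 2017", "audi q 8 2017", "bmw i 3 2016",
               "bmw x 1", "audi a 3 2015", "bmw i 8 2015"]
BMW = [0, 1, 3, 4, 6]
hits = forward_search([BMW], FORWARD, 1, 3, 10)  # query "bmw 2|", range [1, 3]
print([COMPLETIONS[d] for d in hits])
# ['bmw i 3 2015', 'bmw i 3 2017', 'bmw i 3 2016', 'bmw i 8 2015']
```

Docid 4 (bmw x 1) is skipped because its termid set {8, 11, 0} has no termid in [1, 3]. The results come out in the same order the slides reveal them.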

SLIDE 26

Experiments

Machine equipped with Intel i9-9900K cores (@3.60 GHz), 64 GB of RAM, and running Linux 5 (64 bits).

C++ code available at https://github.com/jermp/autocomplete

Datasets

SLIDE 27

Experiments — Efficiency

Top-10 conjunctive-search query timings in μsec per query, varying the query length and the percentage of the last query token that has been typed.

SLIDE 28

Experiments — Effectiveness

Percentage of better-scored results returned by conjunctive-search compared with those returned by prefix-search, for top-10 queries.

SLIDE 29

Experiments — Space

Space usage in total MiB and bytes per completion (bpc).

SLIDE 30

Experiments — Space


SLIDE 31

Experiments — Space


SLIDE 32

Take-away Messages

  • Conjunctive-search overcomes the limited effectiveness of prefix-search by returning more and better-scored results.
  • While prefix-search is very fast (less than 3 μsec per query on average), conjunctive-search is more expensive and costs between 4 and 500 μsec per query, depending on the size of the query.
  • Our optimized implementation of conjunctive-search substantially outperforms both a classical and a blocked inverted index, while using only slightly more space, or even less.

SLIDE 33

Thanks for your attention!

SLIDE 34

Prefix-Search

SLIDE 35

Prefix-Search

bmw i3 s

SLIDE 36

Prefix-Search

bmw i3 s

SLIDE 37

Prefix-Search

bmw i3 s

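SLIDE 38

Prefix-Search

bmw i3 s

Docids are assigned in decreasing score order, so the top-k completions are the k smallest docids in the (lexicographic) range of completions prefixed by the query: the top-k algorithm reduces to Range Minimum Queries (RMQ).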
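A minimal sketch of the RMQ reduction, assuming the completions are kept in a sorted array with a parallel array of docids; this is a simplification, and the paper's actual data structures may differ. A sparse table answers each RMQ in O(1) after O(n log n) preprocessing; one standard way to extract the top k is then a heap of subranges: report the minimum of a range, split the range around it, and push both halves. Class and function names are hypothetical, and the example reuses the small collection of the Forward search example.

```python
import heapq
from bisect import bisect_left
from typing import List


class SparseTableRMQ:
    """Position of the minimum of a[l..r] in O(1), after O(n log n) preprocessing."""

    def __init__(self, a: List[int]):
        self.a = a
        n = len(a)
        self.table = [list(range(n))]  # table[j][i] = argmin of a[i : i + 2**j]
        j = 1
        while (1 << j) <= n:
            prev, half = self.table[-1], 1 << (j - 1)
            self.table.append([
                prev[i] if a[prev[i]] <= a[prev[i + half]] else prev[i + half]
                for i in range(n - (1 << j) + 1)
            ])
            j += 1

    def argmin(self, l: int, r: int) -> int:
        j = (r - l + 1).bit_length() - 1  # two overlapping blocks of length 2**j cover [l, r]
        x, y = self.table[j][l], self.table[j][r - (1 << j) + 1]
        return x if self.a[x] <= self.a[y] else y


def prefix_topk(completions: List[str], docids: List[int], rmq: SparseTableRMQ,
                prefix: str, k: int) -> List[int]:
    """k smallest docids among the sorted completions that start with `prefix`."""
    l = bisect_left(completions, prefix)
    r = bisect_left(completions, prefix + "\U0010ffff") - 1
    if l > r or k <= 0:
        return []
    p = rmq.argmin(l, r)
    heap = [(docids[p], p, l, r)]
    result: List[int] = []
    while heap and len(result) < k:
        docid, p, lo, hi = heapq.heappop(heap)
        result.append(docid)
        # Split the range around the reported minimum and push both halves.
        for sub_lo, sub_hi in ((lo, p - 1), (p + 1, hi)):
            if sub_lo <= sub_hi:
                q = rmq.argmin(sub_lo, sub_hi)
                heapq.heappush(heap, (docids[q], q, sub_lo, sub_hi))
    return result


DOCID_OF = {"audi a 3 2015": 5, "audi q 8 2017": 2, "bmw x 1": 4, "bmw i 3 2015": 0,
            "bmw i 3 2016": 3, "bmw i 3 2017": 1, "bmw i 8 2015": 6}
COMPLETIONS = sorted(DOCID_OF)               # lexicographic order
DOCIDS = [DOCID_OF[c] for c in COMPLETIONS]  # docid of each sorted completion
rmq = SparseTableRMQ(DOCIDS)
print(prefix_topk(COMPLETIONS, DOCIDS, rmq, "bmw i", 3))  # [0, 1, 3]
print(prefix_topk(COMPLETIONS, DOCIDS, rmq, "bmw 2", 3))  # []
```

Prefix-search for bmw 2 finds nothing, because no completion starts with that string, whereas the conjunctive Forward search example returns four results for the same query.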

SLIDE 39

Prefix-Search

bmw i3 s

bmw i3 sedan

SLIDE 40

Prefix-Search

bmw i3 s

bmw i3 sedan
bmw i3 sportback

SLIDE 41

Prefix-Search

bmw i3 s

bmw i3 sedan
bmw i3 sportback
bmw i3 sport