Combining Solr and Elasticsearch to Improve Autosuggestion on Mobile Local Search


SLIDE 1

Apache: Big Data 2015

Combining Solr and Elasticsearch to Improve Autosuggestion on Mobile Local Search

Toan Vinh Luu, PhD
Senior Search Engineer
local.ch AG

SLIDE 2

In this talk

  • Requirements of an autosuggestion feature

  • Autosuggestion architecture
  • Evaluation

SLIDE 3

local.ch

  • Local search engine in Switzerland (web, mobile)
  • Each month:

– > 4 million unique users
– > 8 million queries on mobile (iOS, Android, …)

  • Users search for:

– Services (e.g. “restaurant zurich”)
– Resident information (e.g. “toan luu”)
– Phone numbers (e.g. 079574xxyy)
– Addresses, weather
– ...

SLIDE 4

Why is autosuggestion important?

The user taps the phone 8 times instead of 34 times to reach the result list when searching for “Electric installation Wallisellen”.

SLIDE 5

What should we suggest to the user?

SLIDE 6

Popular data suggestion

SLIDE 7

Popular queries suggestion

  • > 2000 queries/month for “cablecom”, which has only 1 entry
  • “mc donalds” has fewer entries than “muller” but is queried > 10x as often

SLIDE 8

Query history suggestion

  • 9% of mobile queries are historical queries.
  • 38% of users search with a query they have used in the past.

SLIDE 9

Spellchecker suggestion

>700’000 mistakes per month on mobile (9%)

SLIDE 10

Detail entry suggestion

SLIDE 11

Special information suggestion

SLIDE 12

Autosuggestion Architecture

[Architecture diagram. Labels, top to bottom:]

  • Autosuggest API / Search API
  • SuggestData component, Query history component, Popular query component, Spellchecker component
  • Index (5 boxes)
  • Query log, Popular query processor, Local.ch Database

SLIDE 13

Data suggestion

  • Pre-generating suggested queries from the data
  • Entry:

– Name: Subito
– Category: Restaurant
– Street: Konradstrasse
– Zipcode: 8005
– City: Zürich

Possible suggested queries:

  • Restaurant
  • Subito
  • Restaurant Zürich
  • Restaurant Subito
  • Restaurant Subito Zürich
  • Konradstrasse, 8005 Zürich
  • Zürich
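
The pre-generation step can be sketched in Python. This is a minimal illustration, not the implementation used at local.ch: the function name `suggested_queries`, the dictionary keys, and the exact set of field combinations are assumptions inferred from the Subito example above.

```python
def suggested_queries(entry):
    """Build candidate suggestions from one directory entry.

    The combinations mirror the Subito example; a real system may
    generate more or fewer templates (assumption).
    """
    name = entry["name"]
    category = entry["category"]
    street = entry["street"]
    zipcode = entry["zipcode"]
    city = entry["city"]
    return [
        category,
        name,
        f"{category} {city}",
        f"{category} {name}",
        f"{category} {name} {city}",
        f"{street}, {zipcode} {city}",
        city,
    ]


subito = {
    "name": "Subito",
    "category": "Restaurant",
    "street": "Konradstrasse",
    "zipcode": "8005",
    "city": "Zürich",
}
for suggestion in suggested_queries(subito):
    print(suggestion)
```

Each generated string would then be indexed once per entry, so that the number of entries sharing a suggestion can later be counted.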

SLIDE 14

Compute data popularity

  • Use faceting to get suggested queries sorted by frequency
  • This approach guarantees near real-time suggestion
  • Suggested queries are copied to 2 fields:

– Search field: used for matching; analyzers, tokenizers, … are applied
– Facet field: used for display and for computing frequency

  • Example:

– q=restaurant zu* => suggest “Restaurant Zürich”
– q=zurich restau* => suggest “Restaurant Zürich”
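
A faceting request of this kind could be built as follows. This is a sketch under assumptions: the field names `suggest_search` and `suggest_facet` are hypothetical, and only standard Solr parameters (`q`, `q.op`, `rows`, `facet`, `facet.field`, `facet.sort`, `facet.mincount`, `facet.limit`) are used. The last typed word is treated as a prefix.

```python
from urllib.parse import urlencode


def facet_suggest_params(user_input, limit=10):
    """Solr parameters that return suggestions sorted by frequency.

    Matching happens on the analyzed search field; the facet on the
    untouched field returns the display strings with their counts.
    Special query characters are not escaped in this sketch.
    """
    terms = user_input.lower().split()
    if not terms:
        raise ValueError("empty input")
    terms[-1] += "*"  # the word being typed is matched as a prefix
    return {
        "q": "suggest_search:(" + " ".join(terms) + ")",
        "q.op": "AND",           # every typed word must match
        "rows": 0,               # only the facet counts are needed
        "facet": "true",
        "facet.field": "suggest_facet",
        "facet.sort": "count",   # most frequent suggestion first
        "facet.mincount": 1,
        "facet.limit": limit,
    }


params = facet_suggest_params("Restaurant zu")
print(urlencode(params))
```

Because the match is made on tokens rather than on the whole string, both “restaurant zu*” and “zurich restau*” can reach the same facet value, provided the search field's analyzer folds “Zürich” and “zurich” together.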

SLIDE 15

Improvement

  • Faceting is expensive for short prefix match queries

⇒ Cache the suggested results for all queries of 1 or 2 characters

  • Filter duplicated suggestions

– “Restaurant Subito” and “Restaurant Subito Zürich” refer to the same entity if they have the same frequency => keep only 1 suggestion

  • Store location and language with the suggested queries to filter out suggestions that are irrelevant to the user.
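
The duplicate filter can be sketched like this. The rule coded here is an assumption: a suggestion is dropped when an already kept, shorter suggestion with the same frequency is a word-level prefix of it. The slide only says to keep one of the two, so keeping the shorter one is a choice made for the example.

```python
def filter_duplicates(suggestions):
    """Drop suggestions that extend a kept suggestion of equal frequency.

    `suggestions` is a list of (text, frequency) pairs.
    """
    kept = []
    # Visit shorter texts first so that the shorter variant survives.
    for text, freq in sorted(suggestions, key=lambda s: len(s[0])):
        duplicate = any(
            freq == kept_freq and text.startswith(kept_text + " ")
            for kept_text, kept_freq in kept
        )
        if not duplicate:
            kept.append((text, freq))
    return kept


candidates = [
    ("Restaurant Subito Zürich", 1),
    ("Restaurant Subito", 1),
    ("Restaurant", 250),
]
print(filter_duplicates(candidates))
```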

SLIDE 16

How do we process “popular queries”

  • Popular is not just high frequency!
  • Depends on the user’s language

– 4 languages are used in Switzerland. We fail if we suggest “bäckerei” to a French-speaking user

  • Depends on location

– We fail if we suggest a hospital in Zurich to a user in Geneva

  • Misspellings

– We fail if we suggest both “zürich” and “züruch”

  • Number of unique users

– We fail if we suggest “toan” just because I searched for my name thousands of times

  • Blacklist

– We fail if we suggest “f**k”, “pe**is”

SLIDE 17

Popular query processor

  • Preprocessing the query log:

– Text normalization, stopword removal, blacklist filtering, keeping only queries that return results, …

  • A query log item in the Elasticsearch index:

    {
      "q": "restaurant",
      "language": "de",
      "lon": 8.50646,
      "lat": 47.4192,
      "datetime": "2014-06-02 11:10:07",
      "user": "eeaad0c09abc41676c1c99530693"
    }
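
The preprocessing steps could look roughly like the sketch below. The stopword and blacklist sets are placeholders, and the function name `preprocess` and the `result_count` argument are hypothetical; the talk does not show how these steps are implemented.

```python
# Placeholder word lists (assumptions, not the lists used at local.ch).
STOPWORDS = {"in", "the", "der", "die", "das", "le", "la"}
BLACKLIST = {"badword"}


def preprocess(raw_query, result_count):
    """Return the normalized query, or None if it should be discarded."""
    if result_count == 0:
        return None  # keep only queries that returned results
    tokens = raw_query.lower().split()  # lowercase, collapse whitespace
    if any(token in BLACKLIST for token in tokens):
        return None
    tokens = [token for token in tokens if token not in STOPWORDS]
    return " ".join(tokens) or None


print(preprocess("  Restaurant  in Zürich ", result_count=12))
```

Only queries that survive this step would be written to the Elasticsearch query log index.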

SLIDE 18

Find candidate popular queries for each language

    {
      "query" : { "query_string" : { "query" : "language:" + language } },
      "facets" : {
        "q" : { "terms" : { "field" : "q.untouched", "size" : TOP_POPULAR } }
      }
    }
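
The request above uses the facets API, which was removed in Elasticsearch 2.0. On later versions the same counts come from a terms aggregation. The following sketch builds such a request body as a Python dictionary; the function name and the default of 1000 for `top_popular` are assumptions, since the talk does not give the value of TOP_POPULAR.

```python
import json


def candidate_popular_queries_body(language, top_popular=1000):
    """Request body listing the most frequent queries for one language."""
    return {
        "size": 0,  # no hits needed, only the aggregation buckets
        "query": {"query_string": {"query": "language:" + language}},
        "aggs": {
            "q": {"terms": {"field": "q.untouched", "size": top_popular}}
        },
    }


print(json.dumps(candidate_popular_queries_body("de"), indent=2))
```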

SLIDE 19

Find number of unique users given a query

    {
      "query" : { "query_string" : { "query" : "q.untouched:" + query } },
      "aggs" : {
        "num_users" : { "cardinality" : { "field" : "user" } }
      }
    }
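
Reading the result of this aggregation might look as follows. The sample response is hand-written for illustration, and the threshold of 100 users is taken from the `users:[100 TO *]` filter in the Solr request shown later in the talk. Note that the cardinality aggregation returns an approximate count.

```python
MIN_UNIQUE_USERS = 100  # from the "users:[100 TO *]" filter in the Solr request


def unique_users(response):
    """Extract the (approximate) distinct user count from the response."""
    return response["aggregations"]["num_users"]["value"]


def is_popular_candidate(response, min_users=MIN_UNIQUE_USERS):
    """A query issued by too few distinct users is not considered popular."""
    return unique_users(response) >= min_users


# Hand-written sample response, not real data from the query log.
sample_response = {"aggregations": {"num_users": {"value": 7435}}}
print(unique_users(sample_response), is_popular_candidate(sample_response))
```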

SLIDE 20

Bounding box to limit popular queries given location

[Figure: query frequency (axis 50 to 300) by longitude (axis 5.95 to 10.45), with a 90% range marked]

Popular query: Chuv (Centre Hospitalier Universitaire Vaudois)

SLIDE 21

[Figure: histogram of the query “chuv” by frequency, longitude (axis 5.95 to 10.45) and latitude (axis 45.81 to 47.77)]

SLIDE 22

[Map with labelled coordinates: 46.5243,6.6397; 46.52,6.63; 46.53,6.64]

SLIDE 23

Percentiles aggregation to find the min and max values of the querying location

    {
      "query" : { "match" : { "q" : { "query" : "chuv" } } },
      "aggs" : {
        "lat_outlier" : { "percentiles" : { "field" : "lat", "percents" : [5, 95] } },
        "lon_outlier" : { "percentiles" : { "field" : "lon", "percents" : [5, 95] } }
      }
    }

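Turning the two percentile results into a bounding box could be done as sketched here. It assumes the default keyed response format of the percentiles aggregation, where the values are stored under the keys "5.0" and "95.0". The sample numbers are hand-written and reuse the values of the “chuv” document stored in Solr.

```python
def bounding_box(aggregations):
    """Build the box from the 5th and 95th percentiles of lat and lon."""
    lat = aggregations["lat_outlier"]["values"]
    lon = aggregations["lon_outlier"]["values"]
    return {
        "min_lat": lat["5.0"],
        "max_lat": lat["95.0"],
        "min_lon": lon["5.0"],
        "max_lon": lon["95.0"],
    }


# Hand-written sample of the "aggregations" part of a response.
sample_aggregations = {
    "lat_outlier": {"values": {"5.0": 46.2245, "95.0": 46.9909}},
    "lon_outlier": {"values": {"5.0": 6.29637, "95.0": 7.3332}},
}
print(bounding_box(sample_aggregations))
```

Cutting at the 5th and 95th percentiles keeps the central 90% of the values on each axis and discards outliers, such as a few users querying “chuv” from far away.
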
SLIDE 24

Popular query stored in Solr index

    {
      "q": "chuv",
      "lang": ["de", "fr", "en"],
      "users": 7435,
      "min_lat": 46.2245,
      "max_lon": 7.3332,
      "max_lat": 46.9909,
      "min_lon": 6.29637,
      "freq": 9524
    }

SLIDE 25

Solr request to suggest popular query

    q:ch* lang:en users:[100 TO *]
    min_lat:[* TO " + user_lat + "] min_lon:[* TO " + user_lon + "]
    max_lat:[" + user_lat + " TO *] max_lon:[" + user_lon + " TO *]
    &sort=freq desc
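
The request can be assembled as in the sketch below. The function names are hypothetical, and joining the clauses with an explicit AND is an assumption: the slide lists them separated by spaces and does not state the default operator. The helper `in_bounding_box` only illustrates what the four range clauses check.

```python
def popular_query_params(prefix, lang, user_lat, user_lon, min_users=100):
    """Solr parameters for popular queries matching a prefix near the user."""
    clauses = [
        f"q:{prefix}*",
        f"lang:{lang}",
        f"users:[{min_users} TO *]",
        f"min_lat:[* TO {user_lat}]",
        f"min_lon:[* TO {user_lon}]",
        f"max_lat:[{user_lat} TO *]",
        f"max_lon:[{user_lon} TO *]",
    ]
    return {"q": " AND ".join(clauses), "sort": "freq desc"}


def in_bounding_box(doc, lat, lon):
    """True when the user's position lies inside the stored box."""
    return (doc["min_lat"] <= lat <= doc["max_lat"]
            and doc["min_lon"] <= lon <= doc["max_lon"])


# The "chuv" document from the previous slide.
CHUV_DOC = {"min_lat": 46.2245, "max_lat": 46.9909,
            "min_lon": 6.29637, "max_lon": 7.3332}

print(popular_query_params("ch", "en", 46.5243, 6.6397)["q"])
print(in_bounding_box(CHUV_DOC, 46.5243, 6.6397))  # a position in Lausanne
print(in_bounding_box(CHUV_DOC, 47.3769, 8.5417))  # a position in Zurich
```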

SLIDE 26

Evaluation

  • Several metrics are used to evaluate the autosuggestion feature:

– Number of typed characters to get to the result list

  • Average length of input: 10.0 chars
  • Average length of suggestion: 15.4 chars

– Number of clicks on suggested items
– Average rank of clicked item
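
These metrics could be computed from a click log as sketched below. The event format (`typed`, `clicked`, `rank`) is hypothetical and the two sample events are invented for the example; they are not data from local.ch.

```python
def evaluate(events):
    """Compute the evaluation metrics from suggestion click events.

    Each event records what the user had typed, which suggestion was
    clicked, and the 1-based rank of that suggestion in the list.
    """
    count = len(events)
    if count == 0:
        raise ValueError("no click events")
    return {
        "clicks": count,
        "avg_input_len": sum(len(e["typed"]) for e in events) / count,
        "avg_suggestion_len": sum(len(e["clicked"]) for e in events) / count,
        "avg_rank": sum(e["rank"] for e in events) / count,
    }


sample_events = [
    {"typed": "rest", "clicked": "Restaurant", "rank": 1},
    {"typed": "zu", "clicked": "Zürich", "rank": 3},
]
print(evaluate(sample_events))
```

The gap between the average suggestion length and the average input length is the number of characters a user does not have to type when picking a suggestion.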

SLIDE 27

Number of clicks on suggested items since query history release

[Chart with the release date marked]

SLIDE 28

Average rank of clicked item

[Chart: axis from 0.5 to 2.5, with the release of query history suggestion marked]

SLIDE 29

Conclusion

  • Requirement of an autosuggestion feature:

– Reduce the number of interactions the user needs with your application to get to the search results.

  • We can combine 2 search frameworks to bring a better search experience to the user:

– Solr is efficient for querying, faceting and caching
– Elasticsearch is efficient for big data aggregation and query log storage

SLIDE 30

Contact information

  • Search team at local.ch

– toan.luu@localsearch.ch
– cesar.fuentes@localsearch.ch
– pascal.chollet@localsearch.ch