
Combining Solr and Elasticsearch to Improve Autosuggestion on Mobile Local Search
Toan Vinh Luu, PhD
Senior Search Engineer, local.ch AG
Apache: Big Data 2015

In this talk
• Requirements of an autosuggestion feature
• Autosuggestion architecture
• Evaluation

local.ch
• Local search engine in Switzerland (web, mobile)
• Each month:
  – > 4 million unique users
  – > 8 million queries on mobile (iOS, Android, …)
• Users search for:
  – Services (e.g. “restaurant zurich”)
  – Resident information (e.g. “toan luu”)
  – Phone numbers (e.g. 079574xxyy)
  – Addresses, weather, ...

Why is autosuggestion important?
The user taps the phone 8 times instead of 34 times to get to the result list when searching for “Electric installation Wallisellen”.

What should we suggest to the user?

Popular data suggestion

Popular queries suggestion
• “mc donalds” has fewer entries than “muller” but is queried more than 10x as often
• More than 2000 queries/month for “cablecom”, which has only 1 entry

Query history suggestion
• 9% of mobile queries are historical queries
• 38% of users search with a query they have used in the past

Spellchecker suggestion
• > 700’000 mistakes per month on mobile (9%)

Detail entry suggestion

Special information suggestion

Autosuggestion Architecture
[Architecture diagram. Labeled elements: Autosuggest API/Search API; Data suggestion component, Spellchecker component, Popular query component and Query history component, each with its own index; Popular query processor; local.ch database; query log.]

Data suggestion
• Pre-generating suggested queries from the data
• Entry:
  – Name: Subito
  – Category: Restaurant
  – Street: Konradstrasse
  – Zipcode: 8005
  – City: Zürich
• Possible suggested queries:
  – Restaurant
  – Subito
  – Restaurant Zürich
  – Restaurant Subito
  – Restaurant Subito Zürich
  – Konradstrasse, 8005 Zürich
  – Zürich
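The pre-generation step for the Subito entry could look roughly like the sketch below. The dictionary keys and the function name `suggested_queries` are hypothetical; the slide only lists the resulting queries, not how they are produced.

```python
from typing import Dict, List


def suggested_queries(entry: Dict[str, str]) -> List[str]:
    """Pre-generate candidate suggestions from one directory entry (sketch)."""
    name = entry["name"]
    category = entry["category"]
    city = entry["city"]
    address = f'{entry["street"]}, {entry["zipcode"]} {city}'
    # These combinations mirror the ones listed on the slide.
    candidates = [
        category,
        name,
        f"{category} {city}",
        f"{category} {name}",
        f"{category} {name} {city}",
        address,
        city,
    ]
    # Drop duplicates while keeping order (e.g. if name equals category).
    return list(dict.fromkeys(candidates))


# Hypothetical field names; the values come from the slide.
entry = {
    "name": "Subito",
    "category": "Restaurant",
    "street": "Konradstrasse",
    "zipcode": "8005",
    "city": "Zürich",
}
queries = suggested_queries(entry)
```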

Compute data popularity
• Use faceting to get suggested queries sorted by frequency
• This approach guarantees near real-time suggestion
• Suggested queries are copied to 2 fields:
  – Search field: used for matching, applies analyzers, tokenizer, …
  – Facet field: used for displaying and for computing frequency
• Example:
  – q=restaurant zu* => suggest “Restaurant Zürich”
  – q=zurich restau* => suggest “Restaurant Zürich”
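A minimal sketch of how such a faceting request might be assembled, using standard Solr parameters (`facet`, `facet.field`, `facet.limit`, `facet.mincount`, `facet.sort`, `rows`). The field names `suggest_search` and `suggest_facet` are assumptions, as is joining the typed words with AND; the slide does not show the actual request. Special characters in the input are not escaped here.

```python
from typing import Dict


def facet_suggest_params(user_input: str, limit: int = 10) -> Dict[str, str]:
    """Build Solr request parameters for a facet-based suggestion (sketch)."""
    terms = user_input.lower().split()
    if not terms:
        raise ValueError("empty input")
    # Complete words must match as typed; the last word is treated as a prefix.
    clauses = terms[:-1] + [terms[-1] + "*"]
    return {
        # Hypothetical analyzed field used only for matching.
        "q": "suggest_search:(" + " AND ".join(clauses) + ")",
        # No documents are needed, only the facet counts.
        "rows": "0",
        "facet": "true",
        # Hypothetical unanalyzed field holding the display string.
        "facet.field": "suggest_facet",
        "facet.limit": str(limit),
        "facet.mincount": "1",
        # Most frequent suggested queries first.
        "facet.sort": "count",
    }


params = facet_suggest_params("restaurant zu")
```

Because matching happens on the analyzed field and display on the facet field, both word orders from the example reach the same facet value, provided the analyzer of the search field folds “ü” to “u”.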

Improvement
• Faceting is expensive for short prefix match queries
  ⇒ Store suggested results in a cache for all queries with 1 or 2 characters
• Filter duplicated suggestions
  – “Restaurant Subito” and “Restaurant Subito Zürich” are treated as 1 entity if they have the same frequency => keep only 1 suggestion
• Store location and language with suggested queries to filter out suggestions that are irrelevant to the user
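The duplicate filter can be sketched as follows. Which of the two variants is kept is not stated on the slide; this sketch assumes the shorter one survives, and the frequencies in the sample data are made up for illustration.

```python
from typing import List, Tuple

Suggestion = Tuple[str, int]  # (suggested query, frequency)


def filter_duplicates(suggestions: List[Suggestion]) -> List[Suggestion]:
    """Drop a suggestion that extends an already kept one with equal frequency."""
    kept: List[Suggestion] = []
    # Visit shorter texts first so a longer variant is compared with the shorter one.
    for text, freq in sorted(suggestions, key=lambda s: (len(s[0]), s[0])):
        is_duplicate = any(
            freq == kept_freq and text.startswith(kept_text + " ")
            for kept_text, kept_freq in kept
        )
        if not is_duplicate:
            kept.append((text, freq))
    # Most frequent first, as facet counts would be returned.
    return sorted(kept, key=lambda s: -s[1])


# Hypothetical frequencies.
sample = [
    ("Restaurant Subito Zürich", 1),
    ("Restaurant Subito", 1),
    ("Restaurant Zürich", 250),
    ("Restaurant", 900),
]
filtered = filter_duplicates(sample)
```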

How do we process “popular queries”?
• Popular is not just high frequency!
• Depends on the user’s language
  – 4 languages are used in Switzerland. Fail if we suggest “bäckerei” to a French-speaking user
• Depends on location
  – Fail if we suggest a hospital in Zurich to a user in Geneva
• Misspellings
  – Fail if we suggest both “zürich” and “züruch”
• Number of unique users
  – Fail if we suggest “toan” just because I searched my name thousands of times
• Blacklist
  – Fail if we suggest “f**k”, “pe**is”

Popular query processor
• Preprocessing the query log:
  – Text normalization, stopwords, blacklist, keep only queries that return results, …
• A query log item in the Elasticsearch index:
{
  "q": "restaurant",
  "language": "de",
  "lon": 8.50646,
  "lat": 47.4192,
  "datetime": "2014-06-02 11:10:07",
  "user": "eeaad0c09abc41676c1c99530693"
}
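As a rough illustration of the preprocessing step, the sketch below normalizes a raw query, removes stopwords and rejects blacklisted queries. The word lists and the function name are placeholders, and the "keep only queries that return results" check is left out because it depends on the search backend.

```python
import unicodedata
from typing import Optional, Set

# Placeholder word lists; the real lists are not given in the talk.
STOPWORDS: Set[str] = {"in", "der", "la", "the"}
BLACKLIST: Set[str] = {"badword"}


def normalize_query(raw: str) -> Optional[str]:
    """Return the cleaned query, or None if it should be dropped from the log."""
    # Unify Unicode representation and case, then collapse whitespace via split().
    text = unicodedata.normalize("NFC", raw).lower()
    tokens = [token for token in text.split() if token not in STOPWORDS]
    if not tokens:
        return None
    if any(token in BLACKLIST for token in tokens):
        return None
    return " ".join(tokens)


cleaned = normalize_query("  Restaurant   in Zürich ")
```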

Find candidate popular queries for each language
{
  "query" : {
    "query_string" : { "query" : "language:" + language }
  },
  "facets" : {
    "q" : {
      "terms" : { "field" : "q.untouched", "size" : TOP_POPULAR }
    }
  }
}
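Facets were later removed from Elasticsearch in favor of aggregations, so on a current cluster the same request would probably be written with a `terms` aggregation. The sketch below only builds the request body; the helper name is hypothetical, and `"size": 0` (suppress the hit list) is an addition that is not on the slide.

```python
from typing import Any, Dict


def candidate_queries_body(language: str, top_popular: int) -> Dict[str, Any]:
    """Request body listing the most frequent queries for one language (sketch)."""
    return {
        # Only the aggregation result is needed, not the matching log items.
        "size": 0,
        "query": {"query_string": {"query": "language:" + language}},
        "aggs": {
            "q": {"terms": {"field": "q.untouched", "size": top_popular}}
        },
    }


body = candidate_queries_body("de", 500)
```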

Find number of unique users given a query
{
  "query" : {
    "query_string" : { "query" : "q.untouched:" + query }
  },
  "aggs" : {
    "num_users" : {
      "cardinality" : { "field" : "user" }
    }
  }
}
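To make the purpose of the unique-user count concrete, here is an in-memory sketch of the same idea: count distinct users per query and keep only queries searched by enough different people. It uses exact sets, whereas the Elasticsearch `cardinality` aggregation returns an approximate count. The function name and the sample log are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def queries_with_enough_users(
    log: Iterable[Dict[str, str]], min_users: int
) -> List[str]:
    """Keep queries issued by at least `min_users` distinct users."""
    users_by_query: Dict[str, Set[str]] = defaultdict(set)
    for item in log:
        users_by_query[item["q"]].add(item["user"])
    return sorted(
        query for query, users in users_by_query.items() if len(users) >= min_users
    )


# One user repeating a query many times versus several users sharing a query.
sample_log = [{"q": "toan", "user": "u1"} for _ in range(1000)]
sample_log += [{"q": "restaurant", "user": user} for user in ("u1", "u2", "u3")]
popular = queries_with_enough_users(sample_log, min_users=3)
```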

Bounding box to limit popular queries given location
Popular query: Chuv (Centre Hospitalier Universitaire Vaudois)
[Chart: query frequency (0 to 300) by longitude (5.95 to 10.45), with a bounding box marked “90%”.]

Histogram of query “chuv” based on frequency, longitude and latitude
[Figure: histogram over longitude (5.95 to 10.45) and latitude (45.81 to 47.77).]

[Figure: labeled coordinates (latitude, longitude): 46.52, 6.63; 46.5243, 6.6397; 46.53, 6.64.]

Percentiles aggregation to find min, max value of querying location
{
  "query" : {
    "match" : { "q" : { "query" : "chuv" } }
  },
  "aggs" : {
    "lat_outlier" : {
      "percentiles" : { "field" : "lat", "percents" : [5, 95] }
    },
    "lon_outlier" : {
      "percentiles" : { "field" : "lon", "percents" : [5, 95] }
    }
  }
}
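The effect of the 5th and 95th percentiles can be illustrated without a cluster. The sketch below uses a simple nearest-rank percentile, which is an assumption made for readability; Elasticsearch computes percentiles approximately and may interpolate differently. The sample points are synthetic: 38 locations around Lausanne plus 2 around Zurich.

```python
import math
from typing import Dict, List, Sequence, Tuple


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def bounding_box(
    points: List[Tuple[float, float]], low: int = 5, high: int = 95
) -> Dict[str, float]:
    """Box around the (lat, lon) points that ignores the outer percentiles."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return {
        "min_lat": percentile(lats, low),
        "max_lat": percentile(lats, high),
        "min_lon": percentile(lons, low),
        "max_lon": percentile(lons, high),
    }


# Synthetic data: a cluster near Lausanne and two far-away outliers near Zurich.
points = [(46.52 + 0.001 * i, 6.63 + 0.001 * i) for i in range(38)]
points += [(47.37, 8.54), (47.38, 8.55)]
box = bounding_box(points)
```

With this data the two Zurich points fall outside the box, so a user in Zurich would not be offered the query.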

Popular query stored in Solr index
{
  "q": "chuv",
  "lang": ["de", "fr", "en"],
  "users": 7435,
  "min_lat": 46.2245,
  "max_lon": 7.3332,
  "max_lat": 46.9909,
  "min_lon": 6.29637,
  "freq": 9524
}

Solr request to suggest popular query
q:ch*
lang:en
users:[100 TO *]
min_lat:[* TO " + user_lat + "]
min_lon:[* TO " + user_lon + "]
max_lat:[" + user_lat + " TO *]
max_lon:[" + user_lon + " TO *]
& sort=freq desc
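Put together, the request could be built as in the sketch below. Joining the clauses with AND is an assumption (the slide lists them without operators), and `inside_box` is a hypothetical helper that restates the four range clauses in plain Python so the logic can be checked against the stored “chuv” document.

```python
from typing import Dict


def popular_query_params(
    prefix: str, lang: str, user_lat: float, user_lon: float, min_users: int = 100
) -> Dict[str, str]:
    """Solr parameters for popular queries valid at the user's location (sketch)."""
    clauses = [
        f"q:{prefix}*",
        f"lang:{lang}",
        f"users:[{min_users} TO *]",
        # The user must lie inside the stored bounding box of the query.
        f"min_lat:[* TO {user_lat}]",
        f"min_lon:[* TO {user_lon}]",
        f"max_lat:[{user_lat} TO *]",
        f"max_lon:[{user_lon} TO *]",
    ]
    return {"q": " AND ".join(clauses), "sort": "freq desc"}


def inside_box(doc: Dict[str, float], user_lat: float, user_lon: float) -> bool:
    """Same condition as the four range clauses, evaluated locally."""
    return (
        doc["min_lat"] <= user_lat <= doc["max_lat"]
        and doc["min_lon"] <= user_lon <= doc["max_lon"]
    )


# Bounding box of the "chuv" document from the previous slide.
chuv = {"min_lat": 46.2245, "max_lat": 46.9909, "min_lon": 6.29637, "max_lon": 7.3332}
params = popular_query_params("ch", "en", 46.5243, 6.6397)
```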

Evaluation
• Several metrics are used to evaluate the autosuggestion feature:
  – Number of typed characters to get to the result list
    • Average length of input: 10.0 chars
    • Average length of suggestion: 15.4 chars
  – Number of clicks on suggested items
  – Average rank of clicked item

Number of clicks on suggested items since query history release
[Chart: clicks on suggested items over time, with the release date marked.]

Average rank of clicked item
[Chart: y-axis from 0 to 2.5, with the release of query history suggestion marked.]

Conclusion
• Requirement of an autosuggestion feature:
  – Reduce the number of interactions the user needs with your application to get search results
• We can combine 2 search frameworks to bring a better search experience to the user:
  – Solr is efficient for querying, faceting and caching
  – Elasticsearch is efficient for big data aggregation and query log storage

Contact information
• Search team at local.ch
  – toan.luu@localsearch.ch
  – cesar.fuentes@localsearch.ch
  – pascal.chollet@localsearch.ch
