smart autocomple you complete me Anne Veling June 5th, 2012




SLIDE 1

you complete me

Anne Veling – June 5th, 2012 – Berlin Buzzwords @anneveling

smart autocomple…

SLIDE 2

agenda

  • 9292.nl Public Transport Site
  • Naive Address Autocompletion
  • Field Inspection
  • Semantic Autocompletion
  • Conclusions

SLIDE 3
SLIDE 4

9292.NL

  • Largest public transport site in the Netherlands
  • 1M travel advice requests per day
  • Completely new site built by Q42
  • Linking to the existing routing engine
  • Moving from multiple input boxes to one
  • Mobile applications for Windows, iPhone, Android

SLIDE 5
SLIDE 6
SLIDE 7
SLIDE 8
SLIDE 9

data

  • 10M points: train and metro stations, bus stops, places of interest, streets, street ranges, addresses
  • Highly ambiguous: streets / city names / POI
  • Spelling mistakes
  • No single order

SLIDE 10

Naive implementation

  • One concatenated field in Lucene
  • Tune tokenizer/analyzer
  • Tune query analyzer
  • Tune weights
  • Syntax only
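A minimal sketch of the naive idea, assuming plain Python in place of Lucene: every field value of a record is concatenated into one text, tokenized, and a record matches when each typed term is a prefix of some token. The `NaiveIndex` class, the `tokenize` helper and the sample records are hypothetical illustrations, not the code behind the talk, and no weighting or analyzer tuning is modelled here.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class NaiveIndex:
    """Toy stand-in for a single concatenated search field (not Lucene)."""

    def __init__(self):
        # Each entry is (record, tokens of the concatenated field).
        self.entries = []

    def add(self, record):
        # Concatenate every field value into one text and tokenize it once.
        combined = " ".join(record.values())
        self.entries.append((record, tokenize(combined)))

    def suggest(self, user_input, limit=5):
        """Return records in which every typed term is a prefix of some token."""
        terms = tokenize(user_input)
        if not terms:
            return []
        hits = [
            record
            for record, tokens in self.entries
            if all(any(tok.startswith(term) for tok in tokens) for term in terms)
        ]
        return hits[:limit]


# Invented sample records, for illustration only.
index = NaiveIndex()
index.add({"city": "Etten-Leur", "street": "Zeil"})
index.add({"city": "Rotterdam", "station": "Rotterdam Centraal"})
```

Because matching is purely syntactic, `index.suggest("zeil etten")` and `index.suggest("etten zei")` find the same record: the sketch knows nothing about which term is a city and which is a street.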

SLIDE 11

Figure: effort versus quality (marks at 80% and 100%)

SLIDE 12
SLIDE 13

Field inspection

  • Taking advantage of the number of fields and the speed of Lucene
  • Query analysis: for each term, query in all fields
  • Does it appear in that field? Count > 0?
  • Use that information to do semantic interpretation
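The per-term check can be sketched as follows, assuming an in-memory count table instead of Lucene queries. The field names, the helper functions and the sample records are invented for illustration; the records are chosen so that the terms "etten", "leur" and "zeil" occur in the same fields as in the talk's example.

```python
import re
from collections import Counter

# Hypothetical field names; the real index schema is not given in the talk.
FIELDS = ("city", "station", "bus_stop", "street")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_field_counts(records):
    """Map each field to a Counter of term -> number of records containing it."""
    counts = {field: Counter() for field in FIELDS}
    for record in records:
        for field in FIELDS:
            for term in set(tokenize(record.get(field, ""))):
                counts[field][term] += 1
    return counts


def inspect_fields(user_input, counts):
    """For each typed term, list the fields in which it occurs (count > 0)."""
    return {
        term: [field for field in FIELDS if counts[field][term] > 0]
        for term in tokenize(user_input)
    }


# Invented sample records, for illustration only.
RECORDS = [
    {
        "city": "Etten-Leur",
        "station": "Etten-Leur",
        "bus_stop": "Etten-Leur, Zeil",
        "street": "Zeil",
    },
    {"city": "Breda", "street": "Weg naar Etten-Leur"},
]

inspection = inspect_fields("etten leur zeil", build_field_counts(RECORDS))
```

With this data, `inspection["zeil"]` lists only the bus stop and street fields, while "etten" and "leur" occur in all four. In the real system each lookup would presumably be a cheap count query per field, which is where the speed of Lucene matters.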

SLIDE 14

            etten  leur  zeil
city?         ☑     ☑     ☒
station?      ☑     ☑     ☒
bus stop?     ☑     ☑     ☑
street?       ☑     ☑     ☑

city:etten-leur street:zeil
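One way to turn such a table into candidate interpretations is sketched below. The talk does not say how the final interpretation was chosen, so this is an assumption: the sketch enumerates every split of the input into contiguous runs of terms, assigns each run to a field in which all of its terms occur, and uses each field at most once. Ranking the candidates is left out. The `interpretations` and `render` functions are hypothetical names, and joining terms with a hyphen only mirrors the slide's notation.

```python
def interpretations(terms, table, used=()):
    """Yield candidate interpretations as lists of (field, run of terms).

    `table` maps each term to the set of fields it occurs in.
    """
    if not terms:
        yield []
        return
    for end in range(1, len(terms) + 1):
        run = terms[:end]
        # Fields in which every term of this run occurs.
        shared = set.intersection(*(table[term] for term in run))
        for field in sorted(shared - set(used)):
            for rest in interpretations(terms[end:], table, used + (field,)):
                yield [(field, run)] + rest


def render(interpretation):
    """Render an interpretation in the fielded notation used on the slide."""
    return " ".join(f"{field}:{'-'.join(run)}" for field, run in interpretation)


# The inspection table from the slide, as term -> fields with a match.
ALL_FIELDS = {"city", "station", "bus_stop", "street"}
TABLE = {
    "etten": set(ALL_FIELDS),
    "leur": set(ALL_FIELDS),
    "zeil": {"bus_stop", "street"},
}

candidates = list(interpretations(["etten", "leur", "zeil"], TABLE))
```

The slide's result, `city:etten-leur street:zeil`, is one of the candidates. No candidate reads "zeil" as a city or station, because the table rules those fields out.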

SLIDE 15

results

  • Implemented in Scala
  • Lucene RequestHandler in Solr
  • Ajax front-end

SLIDE 16
SLIDE 17

tuning

  • Iterative tuning
  • Using real user inputs from production log files
  • Regression testing to track index/algorithm changes over time
  • For how many test queries is the expected result:
      • the top result?
      • in the top 5?
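
The two regression questions above can be counted with a few lines of code. This is a sketch under assumptions: `evaluate` is a hypothetical helper, `suggest` stands for whatever function returns the ranked suggestions, and the queries and canned rankings are invented rather than taken from real log files.

```python
def evaluate(test_cases, suggest, k=5):
    """Count how often the expected result is ranked first or within the top k."""
    top1 = 0
    top_k = 0
    for query, expected in test_cases:
        results = suggest(query)
        if results[:1] == [expected]:
            top1 += 1
        if expected in results[:k]:
            top_k += 1
    return {"total": len(test_cases), "top1": top1, "top_k": top_k}


# Invented rankings standing in for the real autocomplete back end.
CANNED = {
    "etten leur zeil": ["Zeil, Etten-Leur", "Etten-Leur (station)"],
    "rotterdam cs": ["Rotterdam", "Rotterdam Zuid", "Rotterdam Centraal"],
    "amsterdm": ["Amstelveen"],
}


def canned_suggest(query):
    """Return the canned ranking for a query, or no suggestions."""
    return CANNED.get(query, [])


# Invented (query, expected result) pairs, for illustration only.
TEST_CASES = [
    ("etten leur zeil", "Zeil, Etten-Leur"),
    ("rotterdam cs", "Rotterdam Centraal"),
    ("amsterdm", "Amsterdam"),
]

report = evaluate(TEST_CASES, canned_suggest)
```

Running the same test set after every index or algorithm change and comparing the counts shows whether a tuning step improved or degraded the results.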
SLIDE 18

conclusions

  • Very positive feedback
  • Iterative tuning based on actual user input from log files
  • Regression test
  • Lucene is fast: entire type-ahead still within 40ms
  • But: partner currently evaluating naive-only approach; sometimes good enough is good enough
  • Field Inspection will allow high quality selection, with fallback to naive syntactic search

SLIDE 19

Thank you

@anneveling