
End-2-End Search, MICES 2018, Duncan Blythe - PowerPoint PPT Presentation


  1. End-2-End Search, MICES 2018, Duncan Blythe

  2. About Me: Duncan Blythe
     ● Research Scientist @ Zalando Research
     ● M.Math.Phil in Mathematics/Philosophy @ Oxford
     ● Ph.D. & M.Sc. in Machine Learning/Computational Neuroscience @ TU Berlin
     ● Postdoc in Biomedical Engineering @ DZNE Bonn (Helmholtz)

  3. Zalando
     ● Multinational fashion e-commerce company
     ● Operates in 14 countries in the EU
     ● > 23 million active customers
     ● More than 14,000 employees
     ● > 1,900 tech employees
     ● More than 100 data scientists/research engineers etc.
     ● Large data-science component

  4. Zalando

  5. Zalando

  6. Zalando Research
     ● 15 researchers with PhDs
     ● Backgrounds: CS/ML/Physics/Math/Linguistics
     ● 4 subteams: RL, Vision, NLP, Retrieval
     ● We are growing/hiring!

  7. NLP Team @ Zalando Research: Roland Vollgraf, Alan Akbik, Duncan Blythe, Leonidas Lefakis

  8. NLP Team @ Zalando Research: credit to Han Xiao (now @ Tencent) for the initial work

  9. Outline
     1. How do many product search systems work?
     2. Why do we need an end-to-end product search system?
     3. What data do we need to build end-to-end search?
     4. End-to-end retrieval model
     5. Discussion

  10. Classical full-text retrieval framework
     [Diagram: Query → Parsing → string/symbolic representation; Product → Indexing (offline) → string/symbolic representation; the two representations meet in Matching.]
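The classical framework can be illustrated with a tiny inverted index. This is a minimal sketch, assuming lowercase splitting on non-alphanumeric characters as the "parsing" step and boolean AND over tokens as the "matching" step; the product ids and texts are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplistic stand-in for parsing)."""
    return [t for t in re.split(r"[^0-9a-zäöüß]+", text.lower()) if t]

def build_index(products):
    """Offline indexing: map each token to the set of product ids whose text contains it."""
    index = defaultdict(set)
    for pid, text in products.items():
        for token in tokenize(text):
            index[token].add(pid)
    return index

def match(index, query):
    """Online matching: ids of products containing every query token (boolean AND)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))  # copy so the index is never mutated
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical catalogue texts.
products = {
    "P1": "Denim shirt, blue",
    "P2": "Oxford shirt, white",
    "P3": "Denim jacket, blue",
}
index = build_index(products)
print(sorted(match(index, "blue denim")))   # ['P1', 'P3']
print(sorted(match(index, "denim shirt")))  # ['P1']
```

Both sides are reduced to exact symbolic tokens, so a misspelled query token such as "dennim" matches nothing; this is the gap the pipeline components below try to close.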

  11. Classic product search system with filter query
     [Diagram: product attributes such as { "brand": "Miss Selfridge", "category": "Umhängetasche" (shoulder bag), "color": "red", ... } arrive through a message queue and are indexed into a structured string index; a filter query such as brand="nike" AND color="orange" is matched against that index.]
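A filter query over structured attributes can be sketched as an index from (attribute, value) pairs to SKUs, with AND evaluated as set intersection. This is a minimal sketch under that assumption; the catalogue below is hypothetical and is not how any particular production index stores data.

```python
from collections import defaultdict

def build_attribute_index(products):
    """Offline: map each (attribute, value) pair to the set of SKUs that carry it."""
    index = defaultdict(set)
    for product in products:
        for attr, value in product.items():
            if attr != "sku":
                index[(attr, value)].add(product["sku"])
    return index

def run_filter_query(index, conditions):
    """Evaluate attr1=value1 AND attr2=value2 ... by intersecting the posting sets."""
    postings = [index.get(cond, set()) for cond in conditions.items()]
    return set.intersection(*postings) if postings else set()

# Hypothetical catalogue; SKU "D" has no color attribute at all.
products = [
    {"sku": "A", "brand": "nike", "color": "orange", "category": "sneaker"},
    {"sku": "B", "brand": "nike", "color": "black", "category": "sneaker"},
    {"sku": "C", "brand": "Miss Selfridge", "color": "red", "category": "Umhängetasche"},
    {"sku": "D", "brand": "nike", "category": "sneaker"},
]
index = build_attribute_index(products)
print(sorted(run_filter_query(index, {"brand": "nike", "color": "orange"})))  # ['A']
```

SKU "D" can never satisfy a color filter, even if the item is in fact orange: exact attribute matching inherits every missing or wrong attribute in the catalogue.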

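  12. Parsing a full-text query to a filter query
     [Diagram: product attributes such as { "brand": "Miss Selfridge", "category": "Umhängetasche", "color": "red", ... } are indexed into a structured string index; the full-text query is parsed into a filter query such as { "color": "red", "category": "shirt" }, which is matched against the index.]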

  13. Query understanding as a pipeline (ideal)
     User query: "Jeckk Wolfskin BluEjackets"
     normalize → jeckk wolfskin bluejackets
     tokenize → jeckk+wolfskin+blue+jackets
     lemmatize → “jeckk wolfskin”+blue+jacket
     spell-correct → “jack wolfskin”+blue+jacket
     recognize named-entity → “jack wolfskin”+blue+jacket
     recognize synonym & acronym → “jack wolfskin”+blue+coat
     disambiguate → “jack wolfskin”+blue+coat vs. jack+wolf-skin+blue+coat
     query-builder → filter query: brand="Jack Wolfskin" AND category="coat" AND color="blue"
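The ideal pipeline can be mimicked with toy stages, each consuming the previous stage's output. This is a minimal sketch, not a production query parser: the vocabulary, lemma table, brand gazetteer, synonym table and attribute sets are all hypothetical, spell correction uses Python's `difflib.get_close_matches`, compound splitting is greedy longest-prefix matching, and the disambiguation step is omitted.

```python
import difflib

# Hypothetical toy resources; a real system would need far larger, curated ones.
VOCAB = ["jack", "wolfskin", "blue", "jacket", "jackets", "coat", "red", "shirt"]
LEMMAS = {"jackets": "jacket"}
BRANDS = {("jack", "wolfskin"): "Jack Wolfskin"}
SYNONYMS = {"jacket": "coat"}
COLORS = {"blue", "red"}
CATEGORIES = {"coat", "shirt"}

def normalize(query):
    return " ".join(query.lower().split())

def segment(word):
    """Greedy longest-prefix split of glued words, e.g. 'bluejackets' -> ['blue', 'jackets']."""
    parts, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                parts.append(word[i:j])
                i = j
                break
        else:
            return [word]  # cannot segment: keep the word whole
    return parts

def tokenize(text):
    return [part for word in text.split() for part in segment(word)]

def lemmatize(tokens):
    return [LEMMAS.get(t, t) for t in tokens]

def spell_correct(tokens):
    """Replace out-of-vocabulary tokens with their closest vocabulary entry, if any."""
    out = []
    for t in tokens:
        close = [] if t in VOCAB else difflib.get_close_matches(t, VOCAB, n=1, cutoff=0.6)
        out.append(close[0] if close else t)
    return out

def recognize_brand(tokens):
    """Pull out a known two-token brand name; return (brand or None, remaining tokens)."""
    for i in range(len(tokens) - 1):
        brand = BRANDS.get((tokens[i], tokens[i + 1]))
        if brand:
            return brand, tokens[:i] + tokens[i + 2:]
    return None, tokens

def apply_synonyms(tokens):
    return [SYNONYMS.get(t, t) for t in tokens]

def build_filter_query(brand, tokens):
    conditions = [f'brand="{brand}"'] if brand else []
    conditions += [f'category="{t}"' for t in tokens if t in CATEGORIES]
    conditions += [f'color="{t}"' for t in tokens if t in COLORS]
    return " AND ".join(conditions)

def understand(query):
    tokens = spell_correct(lemmatize(tokenize(normalize(query))))
    brand, rest = recognize_brand(tokens)
    return build_filter_query(brand, apply_synonyms(rest))

print(understand("Jeckk Wolfskin BluEjackets"))
# brand="Jack Wolfskin" AND category="coat" AND color="blue"
```

Every stage depends on hand-maintained resources and on the exact output of the stage before it; removing "jack" from `VOCAB`, for instance, silently loses the brand condition.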

  14. Query understanding in practice
     [Diagram: the same components (normalize, tokenize, lemmatize, spell-correct, recognize named-entity, recognize synonym & acronym, disambiguate, query-builder), grouped under query parsing, take the full-text query "Jeckk Wolfskin BluEjackets" to the filter query brand="Jack Wolfskin" AND category="coat" AND color="blue".]

  15. Pros & cons of a pipeline system
     Upside: intuitive, modular, many off-the-shelf packages, easy to collaborate
     Downsides:
     ● Fragile
     ● Complicated dependencies
     ● Not straightforward to improve the overall search experience
     ● Difficult to scale out to other languages
     ● No concept of “continuous improvement”

  16. Pros & cons of a query/attribute-based system
     Upside: it makes sense to use attributes if they are there
     Downsides:
     ● Attributes can be wrong
     ● Attributes can be missing
     ● Attributes can’t easily capture “style” succinctly
     ● Synonyms need to be hardcoded

  17. Question 1: If finding the right article is the final goal, then why should we even care about pipeline components?

  18. Question 2: How can we associate “petite” with “for the smaller man” without hard-coding for each language?

  19. “End-2-End” product search
     ● Product search system trained as a mapping from queries to products
     ● Fully data-driven
     ● All components trained with deep learning

  20. “End-2-End” product search
     ● Question 1: If finding the right article is the final goal, then why should we even care about spell-checking? → Eliminate pipeline components from the architecture.
     ● Question 2: How can we associate “fur mamas” (i.e. “für Mamas”, “for moms”) with “Schwangerschaftsmode” (“maternity wear”) without hard-coding on each domain? → Find a better representation for query and product.
     ● An end-to-end product search system with deep learning: smarter, simpler, more robust, easier to maintain, more scalable

  21. Classical system vs end-to-end product search system
     [Diagram: in the classical system, products are ① indexed offline and queries are ② parsed into symbolic representations, which are then ③ matched; in the end-to-end system, deep learning maps products (offline) and queries into latent representations, which are then matched.]

  22. Text ↔ product data sources

  23. Three types of data sources
     ● Click-through data
     ● Crowdsourced annotations
     ● Customer reviews (user-generated content)

  24. Extracting Query ↔ Product mapping from message queue
     [Diagram. User timeline: type in search box → see search-result page → click a product → see PDP (product detail page) → click on reco → see PDP.
      Message-queue events over the same time: receive-query "denim shirt" → retrieval-search-result → click-through: SKU00000-001 → retrieve-reco → click-through: SKU00000-002.
      Extracted record: { query: "denim shirt", skus: ["SKU00000-001", "SKU00000-002"] }]
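The extraction step can be sketched as a pass over session-ordered events that attributes each click-through to the most recent query in the same session, which reproduces the record on the slide. This is a minimal sketch, assuming a hypothetical `(session_id, timestamp, event_type, payload)` event tuple; real message-queue schemas and attribution rules will differ.

```python
# Hypothetical event log: (session_id, timestamp, event_type, payload).
EVENTS = [
    ("s1", 1, "receive-query", "denim shirt"),
    ("s1", 2, "retrieval-search-result", None),
    ("s1", 3, "click-through", "SKU00000-001"),
    ("s1", 4, "retrieve-reco", None),
    ("s1", 5, "click-through", "SKU00000-002"),
    ("s2", 1, "click-through", "SKU00000-003"),  # click with no preceding query: ignored
]

def extract_query_skus(events):
    """Attribute every click-through to the most recent query issued in the same session."""
    records = []      # one {"query": ..., "skus": [...]} record per issued query
    open_record = {}  # session_id -> record currently collecting clicks
    for session, _, kind, payload in sorted(events, key=lambda e: (e[0], e[1])):
        if kind == "receive-query":
            record = {"query": payload, "skus": []}
            records.append(record)
            open_record[session] = record
        elif kind == "click-through" and session in open_record:
            open_record[session]["skus"].append(payload)
    return records

print(extract_query_skus(EVENTS))
# [{'query': 'denim shirt', 'skus': ['SKU00000-001', 'SKU00000-002']}]
```

Like the slide, this attributes the click on the recommendation to "denim shirt" as well; whether to keep such indirect clicks is a design choice.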

  25. Example of Query → Product map
     {"query": "ananas", "skus": [
       {"id": "CE321D0HP-A11", "freq": 371},
       {"id": "RL651E02D-F11", "freq": 273},
       {"id": "EV411AA0K-T11", "freq": 243},
       {"id": "L1211E001-A11", "freq": 208},
       {"id": "ES121D0ON-C11", "freq": 180},
       ...
       {"id": "TO226K009-I11", "freq": 2},
       {"id": "BH523F01J-A11", "freq": 2},
       {"id": "MOC83C00C-J11", "freq": 1},
       {"id": "MOC83C001-J11", "freq": 1},
       {"id": "HG223F04A-A11", "freq": 1}]}
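A map in this format can be sketched by counting click-through `(query, sku)` pairs and listing each query's SKUs by descending frequency. This is a minimal sketch; the input pairs and SKU ids below are hypothetical, and ties keep first-seen order because `Counter.most_common` sorts stably.

```python
from collections import Counter, defaultdict

def query_to_product_map(pairs):
    """Aggregate (query, sku) click pairs into records with per-SKU click frequencies."""
    counts = defaultdict(Counter)
    for query, sku in pairs:
        counts[query][sku] += 1
    return [
        {"query": query, "skus": [{"id": sku, "freq": freq} for sku, freq in c.most_common()]}
        for query, c in counts.items()
    ]

# Hypothetical click pairs, e.g. produced by a session-attribution step.
pairs = [("ananas", "SKU-A"), ("ananas", "SKU-B"), ("ananas", "SKU-A"), ("kleid", "SKU-C")]
for record in query_to_product_map(pairs):
    print(record)
# {'query': 'ananas', 'skus': [{'id': 'SKU-A', 'freq': 2}, {'id': 'SKU-B', 'freq': 1}]}
# {'query': 'kleid', 'skus': [{'id': 'SKU-C', 'freq': 1}]}
```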

  26. Example of Product → Query map
     {"sku": "CZ621C04O-G11", "queries": [
       {"text": "chi+chi+london", "freq": 998},
       {"text": "abendkleid", "freq": 403},
       {"text": "ballkleid", "freq": 394},
       {"text": "cocktailkleid", "freq": 134},
       {"text": "kleid", "freq": 125},
       {"text": "kleider", "freq": 118},
       {"text": "abendkleider", "freq": 79},
       {"text": "abendkleid+lang", "freq": 58},
       {"text": "kleid+lang", "freq": 46},
       {"text": "abiballkleid", "freq": 46},
       {"text": "chi+chi", "freq": 43},
       {"text": "lange+kleider", "freq": 40},
       {"text": "ballkleider", "freq": 36},
       {"text": "ballkkeid+lang", "freq": 1},
       {"text": "ball+kleid", "freq": 1},
       {"text": "abschlusskleid+leng", "freq": 1},
       {"text": "abschlussballkleider", "freq": 1},
       {"text": "abschluss+kleider+rot", "freq": 1},
       {"text": "abenkleid", "freq": 1},
       {"text": "abendskleid", "freq": 1},
       {"text": "abendkleider+in+lang", "freq": 1},
       {"text": "abendkleider+abendkleider", "freq": 1},
       {"text": "abendkleid+damen", "freq": 1},
       {"text": "abendkleid+chi+chi+london", "freq": 1},
       {"text": "abendkleid+/ballkleid", "freq": 1},
       {"text": "abend+kleid", "freq": 1}]}
     (German queries, misspellings included: abendkleid = evening dress, ballkleid = ball gown, kleid/kleider = dress/dresses, lang = long; Chi Chi London is the brand.)
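The product-side view can be sketched by inverting query → product records, summing frequencies per (SKU, query). This is a minimal sketch; the records and counts below are made up for illustration and are not taken from the slide.

```python
from collections import Counter, defaultdict

def product_to_query_map(query_records):
    """Invert query -> SKU records into SKU -> query records, summing click frequencies."""
    by_sku = defaultdict(Counter)
    for record in query_records:
        for entry in record["skus"]:
            by_sku[entry["id"]][record["query"]] += entry["freq"]
    return [
        {"sku": sku, "queries": [{"text": text, "freq": freq} for text, freq in c.most_common()]}
        for sku, c in by_sku.items()
    ]

# Hypothetical query -> product records with made-up frequencies.
records = [
    {"query": "abendkleid", "skus": [{"id": "SKU-1", "freq": 3}, {"id": "SKU-2", "freq": 1}]},
    {"query": "ballkleid", "skus": [{"id": "SKU-1", "freq": 5}]},
]
print(product_to_query_map(records)[0])
# {'sku': 'SKU-1', 'queries': [{'text': 'ballkleid', 'freq': 5}, {'text': 'abendkleid', 'freq': 3}]}
```

Each product thereby collects many free-text descriptions, misspelled ones included, which is what makes this data usable as training pairs for a model that skips explicit spell-correction.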

  27. Encoder-Matcher Architecture
     [Diagram: a query-encoder (character embeddings fed through a chain of RNN cells), an image-encoder, and an attribute-encoder (input e.g. {brand: "Nike", color: "olive"}) feed a matcher that outputs YES/NO.]
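The data flow of the encoder-matcher design can be sketched in plain Python: a character-level Elman RNN as query encoder, an averaged-embedding attribute encoder, and a cosine-similarity matcher thresholded into YES/NO. This is a structural sketch only, with assumptions: the weights are random and untrained, embeddings are pseudo-random vectors seeded by key, `DIM` and `threshold` are arbitrary, and the image encoder is omitted.

```python
import math
import random

DIM = 8  # embedding / hidden size (arbitrary for this sketch)

def embed(namespace, key):
    """Deterministic pseudo-random vector per key; stands in for a learned embedding table."""
    r = random.Random(f"{namespace}:{key}")
    return [r.gauss(0.0, 1.0) for _ in range(DIM)]

_rng = random.Random(0)
# Untrained recurrent weights; in an end-to-end system all parameters are learned jointly.
W_IN = [[_rng.gauss(0.0, 0.3) for _ in range(DIM)] for _ in range(DIM)]
W_REC = [[_rng.gauss(0.0, 0.3) for _ in range(DIM)] for _ in range(DIM)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def query_encoder(query):
    """Character-level Elman RNN: h_t = tanh(W_in x_t + W_rec h_{t-1}); returns the last state."""
    h = [0.0] * DIM
    for ch in query:
        x = embed("char", ch)
        h = [math.tanh(a + b) for a, b in zip(matvec(W_IN, x), matvec(W_REC, h))]
    return h

def attribute_encoder(attributes):
    """Average of per-attribute embeddings (a bag-of-attributes encoder)."""
    vecs = [embed("attr", f"{k}={v}") for k, v in attributes.items()]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def matcher(query, attributes, threshold=0.5):
    """Score the pair by cosine similarity and threshold it into a YES/NO decision."""
    score = cosine(query_encoder(query), attribute_encoder(attributes))
    return score, ("YES" if score >= threshold else "NO")

score, decision = matcher("nike olive", {"brand": "Nike", "color": "olive"})
print(round(score, 3), decision)  # arbitrary values: the weights are untrained
```

Because both sides land in the same vector space, training only has to move the encoders so that matching pairs score high; no intermediate symbolic representation is needed.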

  28. Component parts
     ● Query encoder: recurrent neural network (LSTM/GRU etc.); character input; no preprocessing; language flag
     ● Image encoder: convolutional neural network (VGG/ResNet/AlexNet); multiple views?; pretraining methods; data augmentation
     ● Attribute encoder: textual embedding (word2vec-esque); pretraining on fashion corpora
     ● Matcher: “objective function”; ranking/classification; nonlinear/linear, e.g. cosine similarity; test-time considerations
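The matcher's "ranking/classification" choice can be made concrete with one common loss for each view, applied to similarity scores such as cosine similarities. This is a minimal sketch: the hinge margin and the logit `scale` are hypothetical hyperparameters, and the deck does not say which losses were actually used.

```python
import math

def pairwise_ranking_loss(pos_score, neg_score, margin=0.2):
    """Ranking view: hinge loss that is zero once the matching product
    outscores a non-matching one by at least `margin`."""
    return max(0.0, margin - pos_score + neg_score)

def binary_classification_loss(score, is_match, scale=5.0):
    """Classification view: logistic loss on a YES/NO label with scale * score as the logit.
    Computes log(1 + exp(-z)), where z is the logit for matches and its negation otherwise."""
    logit = scale * score
    z = logit if is_match else -logit
    return math.log1p(math.exp(-z))

print(pairwise_ranking_loss(0.9, 0.1))                # 0.0 (already separated by the margin)
print(round(pairwise_ranking_loss(0.3, 0.4), 3))      # 0.3
print(round(binary_classification_loss(0.0, True), 4))  # 0.6931 (= log 2, undecided)
```

The ranking loss only cares about relative order, which matches how results are shown; the classification loss also calibrates an absolute YES/NO threshold.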

  29. Convolutional image encoders
     [Diagram: a stack of convolutional layers: Layer 1, Layer 2, Layer 3, ..., Layer n]

  30. Convolutional image encoders: fDNA (credit: Sebastian Heinz, Christian Bracher)
     [Figures: nearest neighbours; product map]
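Nearest-neighbour lookups over image embeddings can be sketched as exact top-k search by cosine similarity, with product vectors normalized once offline so each score is a plain dot product. This is a minimal sketch; the three-dimensional vectors are hypothetical (real image embeddings have many more dimensions), and a large catalogue would typically use an approximate nearest-neighbour index instead of brute force.

```python
import heapq
import math

def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def build_index(product_vecs):
    """Offline: store L2-normalized product embeddings so cosine similarity is a dot product."""
    return {sku: unit(v) for sku, v in product_vecs.items()}

def nearest_neighbours(index, query_vec, k=2):
    """Online: exact top-k search by cosine similarity (brute force over the catalogue)."""
    q = unit(query_vec)
    scored = ((sum(a * b for a, b in zip(q, v)), sku) for sku, v in index.items())
    return [sku for _, sku in heapq.nlargest(k, scored)]

# Hypothetical 3-dimensional embeddings.
index = build_index({"A": [1.0, 0.0, 0.0], "B": [0.9, 0.1, 0.0], "C": [0.0, 0.0, 1.0]})
print(nearest_neighbours(index, [1.0, 0.0, 0.0]))  # ['A', 'B']
```

The same lookup works whether the query vector comes from another product's image or from a query encoder, as long as both live in the same embedding space.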

  31. Query encoders
     [Diagram: the query "denim shirt" enters a chain of RNN cells one token at a time as <male> d e n i m _ s h i r t, i.e. a <male> flag token first and "_" for the space.]
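The input sequence on this slide can be reproduced by prepending a flag token and mapping spaces to "_", then converting tokens to padded integer ids for batching. This is a minimal sketch; the `<pad>`/`<unk>` tokens, the id assignment order and `max_len` are assumptions, not details from the talk.

```python
def query_to_tokens(query, flag):
    """Character tokens with a leading flag token; spaces become '_'. No other preprocessing."""
    return [f"<{flag}>"] + ["_" if ch == " " else ch for ch in query]

def build_vocab(token_sequences):
    """Assign integer ids in order of first appearance; 0 and 1 are reserved."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for seq in token_sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab

def to_ids(tokens, vocab, max_len):
    """Map tokens to ids, truncating or right-padding to a fixed length for batching."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))

tokens = query_to_tokens("denim shirt", "male")
print(" ".join(tokens))  # <male> d e n i m _ s h i r t
vocab = build_vocab([tokens])
print(to_ids(tokens, vocab, max_len=14))
# [2, 3, 4, 5, 6, 7, 8, 9, 10, 6, 11, 12, 0, 0]
```

Working on raw characters means misspellings such as "jeckk" still produce a usable input, and the flag token lets one encoder condition on context such as gender or language.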

  32. Alignment objective We want this to be large:
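The slide's formula did not survive extraction. One common way to make the similarity of aligned query-product pairs large relative to non-aligned pairs is an in-batch softmax (cross-entropy) objective; the sketch below shows that technique and is not necessarily the objective used in the talk. The similarity matrices are hypothetical.

```python
import math

def in_batch_softmax_loss(sim):
    """sim[i][j]: similarity of query i and product j, with true pairs on the diagonal.
    Returns the mean of -log softmax(sim[i])[i], i.e. cross-entropy over the batch."""
    total = 0.0
    for i, row in enumerate(sim):
        m = max(row)  # subtract the row maximum for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_z - row[i]
    return total / len(sim)

print(round(in_batch_softmax_loss([[0.0, 0.0], [0.0, 0.0]]), 4))   # 0.6931 (= log 2)
print(in_batch_softmax_loss([[5.0, 0.0], [0.0, 5.0]]) < math.log(2))  # True
```

Minimizing this loss pushes each diagonal (aligned) similarity up and the off-diagonal ones down, so the other products in the batch serve as negatives at no extra cost.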

  33. Demo

  34. Code Review
