APPROACHES TO IMPLEMENT SEMANTIC SEARCH Johannes Peter Product - - PowerPoint PPT Presentation


SLIDE 1

APPROACHES TO IMPLEMENT SEMANTIC SEARCH

Johannes Peter Product Owner / Architect for Search

SLIDE 2

WHAT IS SEMANTIC SEARCH ?

SLIDE 3

Success of search

  • Interface of shops to brains of customers
  • Wide range of usage
  • Success depends on a proper understanding

[Figure: "?" and "Search"]

SLIDE 4

Simple keyword search

Query: mymobile 7 without contract

title | type | description | attribute
MyMobile 7 | Smartphone | ... with a contract ... | contract
MyMobile 7 | Smartphone | ... |
Marriage without contract | DVD | ... |
MyMobile 6 | Smartphone | ... |
Sitcom season 7 | DVD | ... 7 … |
MyMobile 6 | Smartphone | ... with a contract ... | contract
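The weakness of plain keyword matching on this slide can be illustrated with a small sketch. It scores each document from the table by how many query terms occur in any field. The scoring function and the field names are assumptions made for illustration; a real search engine ranks differently, but the effect is similar.

```python
# Toy documents from the slide (descriptions are elided on the slide as well).
DOCS = [
    {"title": "MyMobile 7", "type": "Smartphone", "description": "... with a contract ...", "attribute": "contract"},
    {"title": "MyMobile 7", "type": "Smartphone", "description": "...", "attribute": ""},
    {"title": "Marriage without contract", "type": "DVD", "description": "...", "attribute": ""},
    {"title": "MyMobile 6", "type": "Smartphone", "description": "...", "attribute": ""},
    {"title": "Sitcom season 7", "type": "DVD", "description": "... 7 ...", "attribute": ""},
    {"title": "MyMobile 6", "type": "Smartphone", "description": "... with a contract ...", "attribute": "contract"},
]


def keyword_score(doc, query):
    """Count how many distinct query terms occur anywhere in the document."""
    text = " ".join(doc.values()).lower()
    tokens = set(text.replace(".", " ").split())
    return sum(1 for term in set(query.lower().split()) if term in tokens)


def keyword_search(query):
    """Return (score, title, attribute) for every document matching at least one term."""
    hits = [(keyword_score(d, query), d["title"], d["attribute"]) for d in DOCS]
    return sorted((h for h in hits if h[0] > 0), key=lambda h: -h[0])


results = keyword_search("mymobile 7 without contract")
```

Under these assumptions every document matches at least one term, and the best-scoring hit is the MyMobile 7 that comes with a contract, which is the opposite of what the query asks for.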

SLIDE 5

Identifying entities

Query: mymobile 7 without contract
Interpretation: product / product group without certain attribute

Entity | Example
Products | mymobile 7
Attributes | contract
Product with / without attribute | mymobile 7 without contract
Product group with approximate price | mymobile under 300 euro

SLIDE 6

Semantic search

Query: mymobile 7 without contract
Interpretation: product / product group without certain attribute

title | type | description | attribute
MyMobile 7 | smartphone | ... |
MyMobile 6 | smartphone | ... |
MyMobile 7 | smartphone | ... with a contract ... | contract
MyMobile 6 | smartphone | ... with a contract ... | contract

SLIDE 7

Core benefits

  • Better precision
  • Better recommendation
  • Facilitated search management

SLIDE 8

Future perspectives

  • Voice search
  • Sophisticated sales advisors
  • Chat bots

SLIDE 9

APPROACHES

SLIDE 10

ONTOLOGIES & RULE COLLECTIONS

SLIDE 11

Ontologies & rule collections

Query: mymobile 7 without contract

Step | Example
Identify entities | product(mymobile 7) without attribute(contract)
Execute rules to combine entities | product(mymobile 7) not(attribute(contract))
Translate into search query | title:("mymobile 7") AND NOT flag:(contract)
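The three steps on this slide can be sketched end to end as follows. This is a minimal illustration, not the implementation behind the talk: the tiny dictionaries, the function names, and the greedy two-word lookup are all assumptions. Only the final query syntax is taken from the slide.

```python
# Hypothetical mini-ontology: surface forms mapped to entity types.
PRODUCTS = {"mymobile 7", "mymobile 6"}
ATTRIBUTES = {"contract"}


def identify_entities(query):
    """Step 1: tag known products and attributes, keep other words as plain terms."""
    tokens = query.lower().split()
    tagged = []
    i = 0
    while i < len(tokens):
        two_words = " ".join(tokens[i:i + 2])
        if two_words in PRODUCTS:
            tagged.append(("product", two_words))
            i += 2
        elif tokens[i] in ATTRIBUTES:
            tagged.append(("attribute", tokens[i]))
            i += 1
        else:
            tagged.append(("term", tokens[i]))
            i += 1
    return tagged


def apply_rules(tagged):
    """Step 2: 'without' between a product and an attribute negates the attribute."""
    out = []
    i = 0
    while i < len(tagged):
        is_without = tagged[i] == ("term", "without")
        after_product = bool(out) and out[-1][0] == "product"
        before_attribute = i + 1 < len(tagged) and tagged[i + 1][0] == "attribute"
        if is_without and after_product and before_attribute:
            out.append(("not_attribute", tagged[i + 1][1]))
            i += 2
        else:
            out.append(tagged[i])
            i += 1
    return out


def to_query(entities):
    """Step 3: translate the combined entities into a Lucene-style query string."""
    clauses = []
    for kind, value in entities:
        if kind == "product":
            clauses.append(f'title:("{value}")')
        elif kind == "attribute":
            clauses.append(f"flag:({value})")
        elif kind == "not_attribute":
            clauses.append(f"NOT flag:({value})")
        else:
            clauses.append(value)
    return " AND ".join(clauses)


query = to_query(apply_rules(identify_entities("mymobile 7 without contract")))
```

With the example query, `query` ends up as the string shown in the last row of the table.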

SLIDE 12

Ontologies

  • Hierarchies of entities
  • Products, attributes and relations

product
  mymobile
    mymobile 6
    mymobile 7

attribute
  color
    black
    white
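One simple way to hold such a hierarchy in code is a child-to-parent mapping. The sketch below is an assumption about a possible representation, not the data model used in the talk; the helper names are hypothetical.

```python
# Hypothetical ontology fragment mirroring the slide: child -> parent.
PARENT = {
    "mymobile 6": "mymobile",
    "mymobile 7": "mymobile",
    "mymobile": "product",
    "black": "color",
    "white": "color",
    "color": "attribute",
}


def ancestors(entity):
    """Walk up the hierarchy and return all ancestors, nearest first."""
    chain = []
    while entity in PARENT:
        entity = PARENT[entity]
        chain.append(entity)
    return chain


def entity_type(entity):
    """The root of the chain tells whether something is a product or an attribute."""
    chain = ancestors(entity)
    return chain[-1] if chain else entity


def members(group):
    """All entities below a group, e.g. every model of a product group."""
    return sorted(e for e in PARENT if group in ancestors(e))
```

For example, `entity_type("white")` resolves to `"attribute"`, and `members("mymobile")` lists both models, which is what a query for the product group as a whole would need.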

SLIDE 13

Rule collections

Example: mymobile 7 without contract

  • Condition: The term "without" appears between a product and an attribute
  • Action: Negate the attribute

Example: pink dvd

  • Pink: color or artist? → Disambiguation
  • Condition: The term "pink" appears together with entities related to music or movies
  • Action: Annotate the term "pink" as artist
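The disambiguation rule for "pink" can be written as a condition/action pair in a few lines. The vocabularies below are made up for illustration, and falling back to "color" when no media context is present is an assumption that the slide does not state.

```python
# Hypothetical vocabularies; a real system would take them from the ontology.
MEDIA_TERMS = {"dvd", "cd", "album", "concert"}
COLORS = {"pink", "black", "white"}
ARTISTS = {"pink"}


def annotate(query):
    """Annotate each term; an ambiguous term counts as artist only in a media context."""
    tokens = query.lower().split()
    media_context = any(t in MEDIA_TERMS for t in tokens)
    annotations = []
    for t in tokens:
        if t in ARTISTS and t in COLORS:
            # Condition: the ambiguous term appears together with media entities.
            # Action: annotate it as artist; otherwise fall back to color (assumption).
            annotations.append((t, "artist" if media_context else "color"))
        elif t in ARTISTS:
            annotations.append((t, "artist"))
        elif t in COLORS:
            annotations.append((t, "color"))
        elif t in MEDIA_TERMS:
            annotations.append((t, "media"))
        else:
            annotations.append((t, "term"))
    return annotations
```

`annotate("pink dvd")` tags "pink" as artist, while `annotate("pink mymobile")` tags it as color.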

SLIDE 14

Implementation

  • Two parts of implementation
  • Development of the application
  • Information extraction part (creation of ontologies & rule collections)
  • Service for ontology extraction
  • Solr and Elasticsearch are not suitable
  • Highly scalable and performant solution with Spring Boot & Apache Lucene (using term vectors as payloads)

  • Rule engine
  • Configurable rulesets
  • Routing concept

SLIDE 15

Implementation

  • Well suited for agile development
  • Pieces of information can be extracted fairly independently

Sprint(s) | Extract prices | Ontology for products
Sprint(s) | Combinations of products & prices | Rules for products
… | … |
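Price extraction is a good example of a piece that can be built in isolation. The sketch below handles only the phrasing "under <amount> euro" from the earlier entity example; the pattern and the function name are assumptions, and real queries would need many more variants.

```python
import re

# Assumed phrasing: "under <amount> euro"; other wordings are not covered.
PRICE_PATTERN = re.compile(
    r"\bunder\s+(\d+(?:[.,]\d+)?)\s*(?:euro|eur|€)", re.IGNORECASE
)


def extract_price(query):
    """Return (remaining query, maximum price or None) without touching other entities."""
    match = PRICE_PATTERN.search(query)
    if not match:
        return query, None
    max_price = float(match.group(1).replace(",", "."))
    remaining = query[:match.start()] + query[match.end():]
    return " ".join(remaining.split()), max_price
```

The remaining query text (here "mymobile") can then be handed to whatever product recognition exists at that point, so the two pieces can be developed in separate sprints.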

SLIDE 16

Implementation

  • More complex cases
  • Extract information out of product descriptions
  • Understanding of natural language

[Figure: Developers; Analysts / Linguists]

  • Requires maintenance for ontologies and rule collections

SLIDE 17

MACHINE LEARNING

SLIDE 18

Machine learning

[Diagram: training data, model, new query]

SLIDE 19

Machine learning

term | mymobile | 7 | without | contract
part of speech | noun | digit | preposition | noun
relation head | | mymobile | contract | mymobile
chunks | noun phrase (mymobile 7) | noun with negation (without contract)
entity | product with negated attribute

[Diagram: training data, model, new query]
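To show how the annotation layers on this slide build on each other, the sketch below starts from hand-written annotations for "mymobile 7 without contract" and derives the final entity from them. In practice a trained model would predict these layers. The field names, the assignment of relation heads to individual terms, and the derivation logic are assumptions made for illustration.

```python
# Hand-written annotations following the slide; a trained model would predict them.
TOKENS = [
    {"term": "mymobile", "pos": "noun", "head": None},
    {"term": "7", "pos": "digit", "head": "mymobile"},
    {"term": "without", "pos": "preposition", "head": "contract"},
    {"term": "contract", "pos": "noun", "head": "mymobile"},
]
NEGATING_PREPOSITIONS = {"without"}


def interpret(tokens):
    """Derive a product and its negated attributes from the dependency heads."""
    root = next(t for t in tokens if t["head"] is None)
    # Digits attached to the root belong to the product name (the noun phrase).
    digits = [t["term"] for t in tokens
              if t["head"] == root["term"] and t["pos"] == "digit"]
    product = " ".join([root["term"]] + digits)
    negated = []
    for t in tokens:
        if t["pos"] == "noun" and t["head"] == root["term"]:
            # A noun is negated if a negating preposition depends on it.
            if any(d["head"] == t["term"] and d["term"] in NEGATING_PREPOSITIONS
                   for d in tokens):
                negated.append(t["term"])
    return {"product": product, "negated_attributes": negated}


result = interpret(TOKENS)
```

Each layer (part of speech, heads, chunks, entity) would typically come from its own model, which is why errors in an early layer propagate to the final interpretation.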

SLIDE 20

Machine learning – NLP

  • How natural is the language used for queries?
  • Considering grammatical information can be complicated
  • Disambiguation is very difficult for some cases

term | pink | mymobile
part of speech | adjective | noun

term | pink | dvd
part of speech | proper noun | noun

  • Natural language processing:
  • "The label saw potential in Pink and offered her a contract."

SLIDE 21

Implementation

  • Established procedures from the area of natural language processing
  • Libraries (e.g. spaCy) providing
  • Functionalities fairly easy to use
  • High performance
  • Customizations
  • All discussed steps require their own model (training + evaluation data)
  • Still highly experimental
  • Fail early?
  • Continuous delivery?

SLIDE 22

TERM CO-OCCURRENCES

SLIDE 23

Term co-occurrences

  • Enrich documents by contextual information
  • Using collaborative filters (recommendation)
  • Which terms / attributes appear in the context of a product?

SLIDE 24

Term co-occurrences

title | category | color | description
MyMobile 7 | MyMobile | black | Smartphone MyMobile 7 black with 128 gb
MyMobile 7 | MyMobile | white | New smartphone MyMobile 7, 64 gb, white
Sitcom season 7 | DVD | | Season number 7 of the sitcom …
MyMobile 6 | MyMobile | black | MyMobile 6 – smartphone – 32 gb – black
MyMobile 6 | MyMobile | white | MyMobile 6, smartphone black with 128 gb

  • Co-occurring terms for category MyMobile:
      → Term "smartphone": 7, black, white

Query: mymobile 7
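Counting co-occurrences within a category takes only a few lines. The sketch below uses the descriptions from the table on this slide; the tokenizer and the function names are assumptions, and it returns raw counts rather than the filtered list shown on the slide.

```python
import re
from collections import Counter

# Rows from the slide: (category, description).
DOCS = [
    ("MyMobile", "Smartphone MyMobile 7 black with 128 gb"),
    ("MyMobile", "New smartphone MyMobile 7, 64 gb, white"),
    ("DVD", "Season number 7 of the sitcom …"),
    ("MyMobile", "MyMobile 6 – smartphone – 32 gb – black"),
    ("MyMobile", "MyMobile 6, smartphone black with 128 gb"),
]


def tokenize(text):
    """Lowercase and keep only runs of ASCII letters or digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cooccurring_terms(term, category):
    """Count terms that share a description with `term` inside one category."""
    counts = Counter()
    for doc_category, description in DOCS:
        tokens = set(tokenize(description))
        if doc_category == category and term in tokens:
            counts.update(tokens - {term})
    return counts


counts = cooccurring_terms("smartphone", "MyMobile")
```

The raw counts also contain terms such as "gb" or "with", and "black" is counted for a document whose color field says white. Both illustrate why the approach depends on data quality and some filtering of the resulting terms.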

SLIDE 25

Term co-occurrences

title | category | color | context
MyMobile 7 | MyMobile | black | 6, white
MyMobile 7 | MyMobile | white | 6, black
MyMobile 6 | MyMobile | black | 7, white
MyMobile 6 | MyMobile | white | 7, black
Sitcom season 7 | DVD | | …

Query: mymobile 7
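The context column on this slide can be reproduced with a simple set difference: collect the distinguishing terms of a category, then give each document the terms it does not contain itself. This is one possible reading of the slide, not a confirmed algorithm; the function names are hypothetical, and the DVD row is left out because its context is elided on the slide.

```python
from collections import defaultdict

# Rows from the slide: (title, category, color).
DOCS = [
    ("MyMobile 7", "MyMobile", "black"),
    ("MyMobile 7", "MyMobile", "white"),
    ("MyMobile 6", "MyMobile", "black"),
    ("MyMobile 6", "MyMobile", "white"),
]


def own_terms(title, category, color):
    """Terms that distinguish a document inside its category."""
    category_words = category.lower().split()
    model_terms = [t for t in title.lower().split() if t not in category_words]
    return set(model_terms) | {color}


def add_context(docs):
    """Attach to each document the category terms it does NOT contain itself."""
    by_category = defaultdict(set)
    for title, category, color in docs:
        by_category[category] |= own_terms(title, category, color)
    enriched = []
    for title, category, color in docs:
        context = by_category[category] - own_terms(title, category, color)
        enriched.append({"title": title, "category": category,
                         "color": color, "context": sorted(context)})
    return enriched


enriched = add_context(DOCS)
```

For the black MyMobile 7 this yields the context terms "6" and "white", matching the first row of the table.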

SLIDE 26

Implementation

  • Fairly easy to implement
  • Generic
  • Produces side effects
  • Requires high data quality
  • Only partially solves problems related to semantic search
  • Not suitable for complex cases

SLIDE 27

Conclusion

 | Term co-occurrences | Ontologies + rules | Machine learning
Effort | moderate | high | high
Holistic solution | no | yes | yes
Suitable for complex cases | no | yes | yes
Maintenance effort | low | high | low
Success factors | High data quality | Agile development; ability of linguists; quality of rules | Agile development; ability of data scientists; quality of training data
Risk factors | Side effects | Never-ending rule-building | Never-ending generation of training data; too high expectations

SLIDE 28

THANK YOU !!

BTW: We are hiring … peterj@mediamarktsaturn.com