Joining in Lucene Martijn van Groningen - - PowerPoint PPT Presentation



SLIDE 1

Joining in Lucene

Martijn van Groningen martijn.vangroningen@searchworkings.com Lucene Committer & PMC Member

Monday, June 4, 2012

SLIDE 2

Searchworkings.org - The online search community

Introduction

  • Data is often relational, but Lucene’s document model is not.
  • Parent-child style search is supported as of Lucene 3.4.
  • Not a SQL join.
  • The parent and children are stored in separate documents.
  • Two types:
      • Index time join
      • Query time join

SLIDE 3

Index time join

  • Two block join queries:
      • ToParentBlockJoinQuery
      • ToChildBlockJoinQuery
  • One Lucene collector:
      • ToParentBlockJoinCollector
  • Index time join requires block indexing.

SLIDE 4

Block indexing

  • Atomically adding documents.
  • A block of documents.
  • Each document in the block gets a sequentially assigned Lucene document id.
  • IndexWriter#addDocuments(docs);

SLIDE 5

Block indexing

  • Index doesn't record blocks.
  • App is responsible for identifying block documents.
  • Marking a document in a block.
  • Segment merging doesn’t re-order documents in a segment.
  • Adding a document to a block requires you to reindex the whole block.
  • Removing a document from a block doesn’t require reindexing the block.

SLIDE 6

Domain example

  • Product
      • Name
      • Description
  • Product-item
      • Color
      • Size
      • Price
  • Goal: Show the most applicable product based on product-item criteria.

SLIDE 7

Domain example

  • Parent is the last document in a block.

SLIDE 8

Block indexing

Marking parent documents
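
A minimal sketch of the marking idea in plain Java, without Lucene itself: the parent document carries a marker field that the children lack, and a bit set of parent positions is derived from that marker. The field name `docType`, the value `product`, and every class and method name here are assumptions made for illustration, not part of the Lucene API.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ParentMarkerSketch {
    // Hypothetical marker field and value; any field that only parents carry would do.
    static final String MARKER_FIELD = "docType";
    static final String MARKER_VALUE = "product";

    /** Builds a one-field document; a Map stands in for a Lucene document. */
    static Map<String, String> doc(String field, String value) {
        Map<String, String> d = new HashMap<>();
        d.put(field, value);
        return d;
    }

    /** Returns a copy of the given document with the parent marker added. */
    static Map<String, String> markAsParent(Map<String, String> fields) {
        Map<String, String> marked = new HashMap<>(fields);
        marked.put(MARKER_FIELD, MARKER_VALUE);
        return marked;
    }

    /** Sets bit i when the document at position i carries the parent marker. */
    static BitSet parentBits(List<Map<String, String>> docsInIndexOrder) {
        BitSet bits = new BitSet(docsInIndexOrder.size());
        for (int i = 0; i < docsInIndexOrder.size(); i++) {
            if (MARKER_VALUE.equals(docsInIndexOrder.get(i).get(MARKER_FIELD))) {
                bits.set(i);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        List<Map<String, String>> block = new ArrayList<>();
        block.add(doc("color", "red"));                  // child (product-item)
        block.add(doc("color", "blue"));                 // child (product-item)
        block.add(markAsParent(doc("name", "shirt")));   // parent (product), last in block
        System.out.println(parentBits(block));           // prints {2}
    }
}
```

Only the parent is marked, so a filter on the marker field is enough to tell parents from children at query time.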

SLIDE 9

Block indexing

Add block
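
As a rough model of what adding a block means, the plain-Java sketch below appends the children and then the parent in one step, so that the whole block receives consecutive ids with the parent last. It does not use Lucene; the list position merely plays the role of the Lucene document id, and `BlockIndexSketch` and `addBlock` are hypothetical names. In Lucene itself the slides name `IndexWriter#addDocuments(docs)` for this step.

```java
import java.util.ArrayList;
import java.util.List;

class BlockIndexSketch {
    // Stand-in for an index segment: the list position acts as the document id.
    private final List<String> docs = new ArrayList<>();

    /**
     * Appends the children first and the parent last, as one unit.
     * Returns the id assigned to the parent.
     */
    synchronized int addBlock(List<String> children, String parent) {
        docs.addAll(children);
        docs.add(parent);
        return docs.size() - 1;
    }

    String get(int docId) {
        return docs.get(docId);
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        BlockIndexSketch index = new BlockIndexSketch();
        List<String> items = new ArrayList<>();
        items.add("item: red, M");
        items.add("item: blue, L");
        int parentId = index.addBlock(items, "product: shirt");
        System.out.println("parent doc id = " + parentId); // prints parent doc id = 2
    }
}
```

Because nothing else can be appended between the children and the parent, the ids inside a block stay contiguous, which is what the block join queries rely on.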

SLIDE 10

ToParentBlockJoinQuery

  • Parent filter marks the parent documents.
  • The child query runs against the child documents, and its matches are joined up to their parent documents.
  • ToChildBlockJoinQuery works in the opposite direction.
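
The join in both directions can be sketched with two bit sets, assuming the block layout described earlier (children first, parent last). This is a simplified plain-Java model, not Lucene's implementation; `BlockJoinSketch`, `toParents` and `toChildren` are hypothetical names, and the sketch assumes that the child matches never include a parent document.

```java
import java.util.BitSet;

class BlockJoinSketch {
    /** Child to parent: each matching child maps to the first parent bit at or after it. */
    static BitSet toParents(BitSet matchingChildren, BitSet parentBits) {
        BitSet parents = new BitSet();
        for (int child = matchingChildren.nextSetBit(0); child >= 0;
                child = matchingChildren.nextSetBit(child + 1)) {
            int parent = parentBits.nextSetBit(child);
            if (parent >= 0) {
                parents.set(parent);
            }
        }
        return parents;
    }

    /** Parent to child: the children sit between the previous parent and this parent. */
    static BitSet toChildren(BitSet matchingParents, BitSet parentBits) {
        BitSet children = new BitSet();
        for (int parent = matchingParents.nextSetBit(0); parent >= 0;
                parent = matchingParents.nextSetBit(parent + 1)) {
            // previousSetBit(-1) returns -1, so the first block starts at id 0.
            int firstChild = parentBits.previousSetBit(parent - 1) + 1;
            children.set(firstChild, parent); // half-open range [firstChild, parent)
        }
        return children;
    }

    public static void main(String[] args) {
        BitSet parentBits = new BitSet();
        parentBits.set(2); // block 1: children 0-1, parent 2
        parentBits.set(5); // block 2: children 3-4, parent 5
        BitSet childHits = new BitSet();
        childHits.set(4);
        System.out.println(toParents(childHits, parentBits)); // prints {5}
    }
}
```

No per-document parent pointer is needed: the parent bit set plus the contiguous ids inside each block are enough to move between the two spaces.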

SLIDE 11

Block joining & ElasticSearch

  • ElasticSearch has support for nested objects since version 0.17.0
  • Nested type in the mapping definition.
  • NestedQuery & NestedFilter
  • Uses ToParentBlockJoinQuery
  • Allows querying nested objects as if they were separate documents and then returning the root object.

SLIDE 12

Query time joining

  • Query time joining is executed in two phases and is field based:
      • fromField
      • toField
  • Doesn’t require block indexing.

SLIDE 13

Query time joining

  • The first phase collects all the terms in the fromField for the documents that match the original query.
  • The second phase returns the documents whose toField matches the terms collected in the first phase.
  • One public method:
      • JoinUtil#createJoinQuery(...)
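
The two phases can be modeled in a few lines of plain Java. This is only a sketch of the idea and does not call `JoinUtil`; documents are represented as maps, the "query" is a predicate, and the names `QueryTimeJoinSketch`, `join` and `doc`, as well as the field names `id`, `productId` and `color`, are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

class QueryTimeJoinSketch {
    /** Builds a document from alternating field/value arguments. */
    static Map<String, String> doc(String... fieldsAndValues) {
        Map<String, String> d = new HashMap<>();
        for (int i = 0; i + 1 < fieldsAndValues.length; i += 2) {
            d.put(fieldsAndValues[i], fieldsAndValues[i + 1]);
        }
        return d;
    }

    static List<Map<String, String>> join(
            List<Map<String, String>> fromDocs,
            Predicate<Map<String, String>> fromQuery,
            String fromField,
            List<Map<String, String>> toDocs,
            String toField) {
        // Phase 1: collect the fromField terms of every document matching the query.
        Set<String> terms = new HashSet<>();
        for (Map<String, String> d : fromDocs) {
            String value = d.get(fromField);
            if (value != null && fromQuery.test(d)) {
                terms.add(value);
            }
        }
        // Phase 2: return the documents whose toField holds one of the collected terms.
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> d : toDocs) {
            if (terms.contains(d.get(toField))) {
                hits.add(d);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, String>> products = new ArrayList<>();
        products.add(doc("id", "1", "name", "shirt"));
        products.add(doc("id", "2", "name", "hat"));
        List<Map<String, String>> items = new ArrayList<>();
        items.add(doc("productId", "1", "color", "red", "size", "M"));
        items.add(doc("productId", "1", "color", "red", "size", "L"));
        items.add(doc("productId", "2", "color", "blue", "size", "M"));
        List<Map<String, String>> hits =
                join(items, d -> "red".equals(d.get("color")), "productId", products, "id");
        System.out.println(hits.size()); // prints 1
    }
}
```

The two lists are independent, which mirrors the point that the from side and the to side do not have to live in the same index. The cost of the first phase grows with the number of matching from-documents, since every one of their terms has to be collected.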

SLIDE 14

Query time joining - Indexing

Refers to the product id.

SLIDE 15

Query time joining - Indexing

SLIDE 16

Query time joining

  • Result will contain one product.
  • Possible to join over two indices.

SLIDE 17

Final thoughts

  • The joining module offers good solutions for modeling parent-child relations.
  • Joining has an impact on query time.
  • Index time joining is much faster than query time joining.
  • Query time joining is more flexible than index time joining.
  • Mostly a Lucene-only feature.
  • All code is annotated as experimental.

SLIDE 18

Any questions?

SLIDE 19

ToParentBlockJoinCollector

  • TopGroups contains one group per top N parent document.
  • Each group contains a parent and its matching child documents.
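
The shape of that result can be sketched in plain Java: matching children are grouped under their parent, and only the best N groups are kept. This is an illustrative model rather than the collector itself; `ParentGroupsSketch`, `Group` and `topGroups` are hypothetical names, and ranking the groups by their best child score is an arbitrary choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ParentGroupsSketch {
    /** One group: a parent id plus the ids of its matching children. */
    static class Group {
        final int parentDoc;
        final List<Integer> childDocs = new ArrayList<>();
        float maxChildScore = Float.NEGATIVE_INFINITY;

        Group(int parentDoc) {
            this.parentDoc = parentDoc;
        }
    }

    /** Groups scored child hits by parent and keeps the topN best-scoring groups. */
    static List<Group> topGroups(Map<Integer, Float> childScores, BitSet parentBits, int topN) {
        Map<Integer, Group> byParent = new TreeMap<>();
        // Visit the children in id order so each group lists them in index order.
        for (Map.Entry<Integer, Float> e : new TreeMap<>(childScores).entrySet()) {
            int parent = parentBits.nextSetBit(e.getKey());
            if (parent < 0) {
                continue; // no parent after this child; ignored in this sketch
            }
            Group g = byParent.computeIfAbsent(parent, p -> new Group(p));
            g.childDocs.add(e.getKey());
            g.maxChildScore = Math.max(g.maxChildScore, e.getValue());
        }
        List<Group> groups = new ArrayList<>(byParent.values());
        // Highest score first; the sort is stable, so ties keep parent id order.
        groups.sort((a, b) -> Float.compare(b.maxChildScore, a.maxChildScore));
        return new ArrayList<>(groups.subList(0, Math.min(topN, groups.size())));
    }

    public static void main(String[] args) {
        BitSet parentBits = new BitSet();
        parentBits.set(2);
        parentBits.set(5);
        Map<Integer, Float> childScores = new TreeMap<>();
        childScores.put(0, 1.0f);
        childScores.put(1, 3.0f);
        childScores.put(3, 5.0f);
        for (Group g : topGroups(childScores, parentBits, 2)) {
            System.out.println(g.parentDoc + " -> " + g.childDocs);
        }
    }
}
```

Each returned group pairs one parent with the children that matched, which is the information needed to show a product together with the product-items that caused it to match.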
