Joining in Lucene - Martijn van Groningen


  1. Joining in Lucene Martijn van Groningen martijn.vangroningen@searchworkings.com Lucene Committer & PMC Member Monday, June 4, 2012

  2. Joining Introduction ‣ Data is often relational, but Lucene’s document model is not. ‣ Support for parent-child style search since Lucene 3.4. ‣ Not a SQL join. ‣ The parent and children are stored in separate documents. ‣ Two types: ‣ Index time join ‣ Query time join Searchworkings.org - The online search community Monday, June 4, 2012

  3. Joining Index time join ‣ Two block join queries: ‣ ToParentBlockJoinQuery ‣ ToChildBlockJoinQuery ‣ One Lucene collector: ‣ ToParentBlockJoinCollector ‣ Index time join requires block indexing.

  4. Joining Block indexing ‣ Atomically adding documents. ‣ A block of documents. ‣ Each document gets a sequentially assigned Lucene document id. ‣ IndexWriter#addDocuments(docs);

  5. Joining Block indexing ‣ The index doesn't record blocks. ‣ The app is responsible for identifying block documents. ‣ Marking a document in a block. ‣ Segment merging doesn’t re-order documents in a segment. ‣ Adding a document to a block requires you to reindex the whole block. ‣ Removing a document from a block doesn’t require reindexing the block.

  6. Joining Domain example ‣ Product ‣ Name ‣ Description ‣ Product-item ‣ Color ‣ Size ‣ Price ‣ Goal: Show the most applicable product based on product-item criteria.

  7. Joining Domain example ‣ Parent is the last document in a block.

  8. Joining Block indexing Marking parent documents

  9. Joining Block indexing Add block Add block
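
The block indexing rules from the slides above (sequential document ids, parent last, the application marks the parents) can be pictured with a small in-memory model. This is only a sketch in plain Java, not Lucene code: `BlockIndexSketch`, `Doc`, and the `"product"` type marker are hypothetical names chosen for illustration, and `addBlock` merely stands in for the role of `IndexWriter#addDocuments(docs)`.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified in-memory model of block indexing; this is not Lucene code. */
final class BlockIndexSketch {

    /** A document reduced to a type marker and one value (both hypothetical fields). */
    static final class Doc {
        final String type;
        final String value;

        Doc(String type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    // A document's position in this list plays the role of its Lucene document id.
    private final List<Doc> docs = new ArrayList<>();

    /**
     * Adds the children and then the parent as one contiguous block, so the
     * ids are sequential and the parent is the last document of the block.
     * Returns the id assigned to the parent.
     */
    int addBlock(Doc parent, List<Doc> children) {
        docs.addAll(children);
        docs.add(parent);
        return docs.size() - 1;
    }

    /** The index does not record blocks; the type marker identifies parents. */
    boolean isParent(int docId) {
        return "product".equals(docs.get(docId).type);
    }

    int size() {
        return docs.size();
    }
}
```

In this model a product with two product-items occupies ids 0 to 2, with the product at id 2. Adding a third product-item later would mean re-adding the whole block, since ids are assigned only at insertion time.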

  10. Joining ToParentBlockJoinQuery ‣ Parent filter marks the parent documents. ‣ Child query is executed in the parent space. ‣ ToChildBlockJoinQuery works in the opposite direction.
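
Because a parent is the last document of its block, mapping between children and parents reduces to bit arithmetic over the parent filter. The following is a simplified model of that idea using `java.util.BitSet`, not the actual Lucene implementation; `BlockJoinSketch`, `toParents`, and `toChildren` are hypothetical names.

```java
import java.util.BitSet;

/** Simplified model of block joining over document ids; this is not Lucene code. */
final class BlockJoinSketch {

    /**
     * Child-to-parent direction: the parent of a matching child is the
     * first parent bit at or after the child's document id.
     */
    static BitSet toParents(BitSet parentBits, BitSet childMatches) {
        BitSet parents = new BitSet();
        for (int child = childMatches.nextSetBit(0); child >= 0;
                child = childMatches.nextSetBit(child + 1)) {
            if (parentBits.get(child)) {
                continue; // assumption: the child query should only match children
            }
            int parent = parentBits.nextSetBit(child);
            if (parent >= 0) {
                parents.set(parent);
            }
        }
        return parents;
    }

    /**
     * Parent-to-child direction: the children of a parent are the documents
     * after the previous parent, up to but excluding the parent itself.
     */
    static BitSet toChildren(BitSet parentBits, BitSet parentMatches) {
        BitSet children = new BitSet();
        for (int parent = parentMatches.nextSetBit(0); parent >= 0;
                parent = parentMatches.nextSetBit(parent + 1)) {
            int firstChild = parentBits.previousSetBit(parent - 1) + 1;
            children.set(firstChild, parent); // the end index is exclusive
        }
        return children;
    }
}
```

With parents at ids 2 and 6, child hits at ids 1 and 4 map to both parents, and parent 6 maps back to children 3, 4, and 5. Scoring is left out of this sketch entirely.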

  11. Joining Block joining & ElasticSearch ‣ ElasticSearch has supported nested objects since version 0.17.0. ‣ Nested type in the mapping definition. ‣ NestedQuery & NestedFilter ‣ Uses ToParentBlockJoinQuery ‣ Allows querying nested objects as if they were separate documents and then returning the root object.

  12. Joining Query time joining ‣ Query time joining is executed in two phases and is field based: ‣ fromField ‣ toField ‣ Doesn’t require block indexing.

  13. Joining Query time joining ‣ The first phase collects all the terms in the fromField for the documents that match the original query. ‣ The second phase returns the documents whose toField matches the terms collected in the first phase. ‣ One public method: ‣ JoinUtil#createJoinQuery(...)
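
The two phases can be sketched over plain maps. This is an illustrative model only, not the code behind `JoinUtil#createJoinQuery(...)`: `QueryTimeJoinSketch`, its `doc` helper, and the way the original query is passed as a `Predicate` are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Simplified model of a two-phase, field-based join; this is not Lucene code. */
final class QueryTimeJoinSketch {

    /** Builds a document from alternating field names and values. */
    static Map<String, String> doc(String... fieldsAndValues) {
        Map<String, String> d = new HashMap<>();
        for (int i = 0; i + 1 < fieldsAndValues.length; i += 2) {
            d.put(fieldsAndValues[i], fieldsAndValues[i + 1]);
        }
        return d;
    }

    /** Returns the positions in toDocs that join to documents matching fromQuery. */
    static List<Integer> join(List<Map<String, String>> fromDocs,
                              Predicate<Map<String, String>> fromQuery,
                              String fromField,
                              List<Map<String, String>> toDocs,
                              String toField) {
        // Phase 1: collect the fromField terms of every document matching the query.
        Set<String> terms = new HashSet<>();
        for (Map<String, String> d : fromDocs) {
            String term = d.get(fromField);
            if (term != null && fromQuery.test(d)) {
                terms.add(term);
            }
        }
        // Phase 2: return the documents whose toField holds one of the collected terms.
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < toDocs.size(); i++) {
            String term = toDocs.get(i).get(toField);
            if (term != null && terms.contains(term)) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

Since `fromDocs` and `toDocs` are separate lists, the same shape also covers a join over two indices. The cost of phase 1 grows with the number of matching documents, which is one way to see why a query time join is slower than a block join.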

  14. Joining Query time joining - Indexing ‣ The product-item document refers to the product id.

  15. Joining Query time joining - Indexing

  16. Joining Query time joining ‣ Result will contain one product. ‣ Possible to join over two indices.

  17. Joining Final thoughts ‣ The joining module has good solutions for modelling parent-child relations. ‣ Joining has an impact on query time. ‣ Index time joining is much faster than query time joining. ‣ Query time joining is more flexible than index time joining. ‣ Mostly a Lucene-only feature. ‣ All code is annotated as experimental.

  18. Any questions?

  19. Joining ToParentBlockJoinCollector ‣ TopGroups contains one group for each of the top N parent documents. ‣ Each group contains a parent and its child documents.
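
The grouping itself can be pictured as collecting each matching child under its parent. The sketch below is a plain Java model with hypothetical names (`ParentGroupingSketch`, `groupByParent`); it leaves out scoring and the top N cut-off, and it does not use Lucene's `TopGroups` type.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Simplified model of grouping child hits per parent; this is not Lucene code. */
final class ParentGroupingSketch {

    /** Maps each parent document id to the ids of its matching children. */
    static Map<Integer, List<Integer>> groupByParent(BitSet parentBits, int[] matchingChildren) {
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (int child : matchingChildren) {
            // The parent is the first parent bit at or after the child id.
            int parent = parentBits.nextSetBit(child);
            if (parent < 0 || parent == child) {
                continue; // no enclosing block, or the hit is itself a parent
            }
            groups.computeIfAbsent(parent, p -> new ArrayList<>()).add(child);
        }
        return groups;
    }
}
```

A real collector would additionally rank the parents and keep only the best N groups; this sketch simply orders groups by parent id.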
