  1. Flexible Similarity Search of Semantic Vectors Using Fulltext Search Engines
     Michal Růžička, Vít Novotný, Petr Sojka; Jan Pomikálek, Radim Řehůřek
     Masaryk University, Faculty of Informatics, Brno, Czech Republic: mruzicka@mail.muni.cz, witiko@mail.muni.cz, sojka@fi.muni.cz
     RaRe Technologies: honza@rare-technologies.com, radim@rare-technologies.com
     https://mir.fi.muni.cz/ https://rare-technologies.com/
     Illustrations by Jiří Franek.

  2. Outline
     1 Semantic Indexing and Searching
     2 String Encoding of Semantic Vectors
     3 Results
     Flexible Similarity Search of Semantic Vectors Using Fulltext Search Engines, ISWC 2017 workshop HSSUES, Vienna, Austria, October 21, 2017

  4. Semantic Indexing
     • Input: document as a file (e-mail, …)
     • DataReader (e.g. pdf2text): document as plain text
     • Tokenizer: document as a token list
     • Segmenter (e.g. paragraph / logical part [table, formula] segmenter): document as a segment list
     • SemanticModeler / Segment2Vec (e.g. TF-IDF, LSI, deep learning, doc2vec), built over the segments in all documents: document as a list of points representing segments
     • Output: Index of Vectors
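The indexing stages can be illustrated with a small, dependency-free Python sketch. It assumes the input is already plain text (so DataReader is skipped), uses blank-line paragraphs as segments, and uses a hand-rolled TF-IDF as the Segment2Vec step; for simplicity it segments before tokenizing. All function names are hypothetical, and a real system would more likely use a library model (e.g. from gensim) for the vectors.

```python
import math
import re
from collections import Counter


def segment(text: str) -> list[str]:
    """Segmenter (hypothetical): split plain text into blank-line-separated paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def tokenize(segment_text: str) -> list[str]:
    """Tokenizer (hypothetical): lowercase word tokens."""
    return re.findall(r"\w+", segment_text.lower())


def build_index(documents: dict[str, str]) -> dict[str, list[dict[str, float]]]:
    """Segment2Vec via TF-IDF: one unit-length sparse vector per segment.

    IDF is computed over the segments of all documents, so every document
    becomes a list of points, one point per segment.
    """
    segmented = {
        doc_id: [tokenize(s) for s in segment(text)]
        for doc_id, text in documents.items()
    }
    all_segments = [toks for segs in segmented.values() for toks in segs]
    n_segments = len(all_segments)
    df = Counter(term for toks in all_segments for term in set(toks))
    idf = {term: math.log(n_segments / count) for term, count in df.items()}

    index = {}
    for doc_id, segs in segmented.items():
        points = []
        for toks in segs:
            tf = Counter(toks)
            # Terms present in every segment get IDF 0 and are dropped.
            vec = {t: c * idf[t] for t, c in tf.items() if idf[t] > 0}
            norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
            points.append({t: w / norm for t, w in vec.items()})
        index[doc_id] = points
    return index
```

For example, `build_index({"a": "Cats purr.\n\nDogs bark.", "b": "Cats sleep."})` yields two points for document `a` and one for `b`.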

  5. Semantic Searching with Nuggets
     • Query document as a file → Query Document Indexing Pipeline → query as semantic vectors (Query Nuggets)
     • Document Nuggets: each indexed document contributes its nuggets (semantic vectors)
     • Similarity Search: query nuggets are compared with document nuggets by the dot product (⋅) of their semantic vectors → Candidate Nuggets
     • Ranker → Results as Sorted Nuggets → Results as Sorted Documents
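A minimal sketch of the search flow, assuming dense nugget vectors that are already unit-length (so the dot product equals cosine similarity). Two further assumptions are not stated on the slide: a document nugget is scored by its best match against any query nugget, and a document is ranked by its best-scoring nugget. The function names are hypothetical.

```python
def dot(u, v):
    """Dot product; equals cosine similarity when u and v are unit-length."""
    return sum(a * b for a, b in zip(u, v))


def search_with_nuggets(query_nuggets, document_nuggets, top_k=10):
    """Rank document nuggets against query nuggets, then rank documents.

    query_nuggets: list of vectors produced from the query document.
    document_nuggets: {doc_id: [vector, ...]} for the indexed documents.
    Returns (sorted nuggets as (score, doc_id, position), sorted doc ids).
    """
    candidates = []
    for doc_id, nuggets in document_nuggets.items():
        for position, nugget in enumerate(nuggets):
            # Assumption: a nugget's score is its best match to any query nugget.
            score = max(dot(q, nugget) for q in query_nuggets)
            candidates.append((score, doc_id, position))

    # Ranker: results as sorted nuggets (highest score first).
    sorted_nuggets = sorted(candidates, key=lambda c: c[0], reverse=True)[:top_k]

    # Assumption: a document's score is that of its best nugget; because the
    # list is already sorted, the first nugget seen per document is its best.
    doc_scores = {}
    for score, doc_id, _ in sorted_nuggets:
        doc_scores.setdefault(doc_id, score)
    sorted_documents = sorted(doc_scores, key=doc_scores.get, reverse=True)
    return sorted_nuggets, sorted_documents
```

With query nugget `[1.0, 0.0]`, document `d1` holding `[1.0, 0.0]` and `[0.0, 1.0]`, and `d2` holding `[0.6, 0.8]`, the nuggets rank as d1/0, d2/0, d1/1 and the documents as d1, d2.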

  6. Re-Ranking Techniques
     1 Fast: find candidate nuggets via Elasticsearch.
     2 Slow but precise: re-rank candidate nuggets with an exact similarity metric:
       • Cosine similarity.
       • Euclidean similarity.
       • …
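The slow, exact stage can be sketched as a function that rescores the candidates returned by the fast stage. The candidate dictionary stands in for Elasticsearch hits, and turning Euclidean distance into a similarity via 1 / (1 + distance) is an assumption; the slide does not say how "Euclidean similarity" is defined. Names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Exact cosine similarity of two dense vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def euclidean_similarity(u, v):
    """Assumption: map Euclidean distance into (0, 1], larger = more similar."""
    return 1.0 / (1.0 + math.dist(u, v))


def rerank(query, candidates, metric=cosine_similarity, k=10):
    """Rescore candidate nuggets exactly and keep the top k.

    candidates: {nugget_id: vector}, e.g. the hits of the fast stage.
    Returns a list of (score, nugget_id), highest score first.
    """
    scored = [(metric(query, vec), nid) for nid, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

The choice of metric matters: for the query `[0.12, -0.13, 0.08]` and candidates `a = [0.12, -0.13, 0.07]`, `b = [0.12, -0.13, 0.50]`, `c = [-0.12, 0.13, -0.07]`, cosine similarity ranks a, b, c, whereas this Euclidean similarity ranks a, c, b, because c points the opposite way but lies closer in space than b.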

  9. String Encoding of Semantic Vectors
     • Encoding of semantic vectors to strings (feature tokens).
     • Semantic vector of three dimensions: v⃗ = [0.12, −0.13, x₃], where the third component x₃ has three decimal places.
     • Rounding to two decimal places, string encoded: v⃗ ≈ [0.12, −0.13, 0.07] → ['0P2i0d12', '1P2ineg0d13', '2P2i0d07']
     • Feature tokens: 0P2i0d12, 1P2ineg0d13, 2P2i0d07. Each token joins the dimension index (0, 1, 2), the precision (P2, i.e. two decimal places), and the rounded value: i followed by the integer part (prefixed with neg for a negative value), then d followed by the decimal digits.
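A Python sketch of this encoding, assuming the token layout read off the example (dimension index, P plus precision, i plus optional neg and integer part, d plus decimal digits). The helpers `to_feature_tokens`, `build_postings`, and `candidate_nuggets` are hypothetical; the in-memory postings dictionary only stands in for the fulltext engine's inverted index, and counting shared tokens is a simplification of the engine's own scoring.

```python
from collections import Counter, defaultdict


def to_feature_tokens(vector, precision=2):
    """Encode each vector component as a feature token, e.g. -0.13 -> '1P2ineg0d13'."""
    if precision < 1:
        raise ValueError("precision must be at least 1")
    tokens = []
    for dim, value in enumerate(vector):
        rounded = f"{abs(value):.{precision}f}"  # e.g. '0.13'
        int_part, dec_part = rounded.split(".")
        # A value that rounds to zero gets no 'neg' marker.
        sign = "neg" if value < 0 and float(rounded) != 0.0 else ""
        tokens.append(f"{dim}P{precision}i{sign}{int_part}d{dec_part}")
    return tokens


def build_postings(nuggets, precision=2):
    """Inverted index (hypothetical stand-in): feature token -> nugget ids."""
    postings = defaultdict(set)
    for nugget_id, vector in nuggets.items():
        for token in to_feature_tokens(vector, precision):
            postings[token].add(nugget_id)
    return postings


def candidate_nuggets(query_vector, postings, k=10, precision=2):
    """Fast stage: nuggets sharing the most feature tokens with the query."""
    hits = Counter()
    for token in to_feature_tokens(query_vector, precision):
        for nugget_id in postings.get(token, ()):
            hits[nugget_id] += 1
    return [nugget_id for nugget_id, _ in hits.most_common(k)]
```

`to_feature_tokens([0.12, -0.13, 0.07])` reproduces the three tokens listed above. Because tokens match only on exact rounded values, the precision parameter trades recall against selectivity of the candidate set.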
