Full-Text Search Explained

Philipp Krenn (@xeraa), Infrastructure | Developer Advocate. ViennaDB, Papers We Love Vienna.

Who uses databases? Who uses search? Databases vs. full-text search. But I can do... SELECT * FROM my_table


  1. { "took": 2, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.39556286, "hits": [ { "_index": "starwars", "_type": "quotes", "_id": "1", "_score": 0.39556286, "_source": { "quote": "These are <em>not</em> the droids you are looking for." } } ] } }

  2. POST /starwars/_search { "query": { "match": { "quote": { "query": "van", "fuzziness": "AUTO" } } } }

  3. { "took": 14, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.18155496, "hits": [ { "_index": "starwars", "_type": "quotes", "_id": "2", "_score": 0.18155496, "_source": { "quote": "Obi-Wan never told you what happened to your father." } } ] } }

  4. SELECT * FROM starwars WHERE quote LIKE '%_an%' OR quote LIKE '%V_n%' OR quote LIKE '%Va_%'
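The fuzzy query above finds "Obi-Wan" for the input "van" because the two terms are one edit apart. Below is a minimal sketch of that idea in Python, assuming a plain Levenshtein distance and a hand-written token list. Elasticsearch itself works on analyzed terms and, by default, also counts a transposition as a single edit, so this is an illustration rather than its actual implementation. The names `levenshtein` and `fuzzy_matches` are hypothetical helpers. With `"fuzziness": "AUTO"`, terms of 3 to 5 characters are allowed one edit, which is what `max_edits=1` stands in for here.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(query: str, tokens: list[str], max_edits: int = 1) -> list[str]:
    """Return the tokens within max_edits of the query term."""
    return [token for token in tokens if levenshtein(query, token) <= max_edits]


# Assumption: lowercased tokens of the second quote, written out by hand.
tokens = ["obi", "wan", "never", "told", "you", "what",
          "happened", "to", "your", "father"]
matches = fuzzy_matches("van", tokens)
print(matches)
```

Only "wan" is within one edit of "van", which is why the SQL version needs one LIKE pattern per possible substitution position.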

  5. Scoring

  6. MongoDB

  7. > db.starwars.find({ $text: { $search: "droid" }}, {score: {$meta: "textScore"}}) { "_id": ObjectId("57f2d54de814412463c3adef"), "quote": "These are not the droids you are looking for.", "score": 0.75 } Fetched 1 record(s) in 14ms

  8. One Term https://github.com/mongodb/mongo/blob/v3.2/src/mongo/db/fts/fts_spec.cpp#L219 double coeff = (0.5 * data.count / numTokens) + 0.5; data.count: number of matches; numTokens: number of stemmed words

  9. Search for droid "These are not the droids you are looking for." droid look == 1 match, 2 tokens; coeff: 0.5 · 1/2 + 0.5 = 0.75

  10. Search for father "No. I am your father." father == 1 match, 1 token; coeff: 0.5 · 1/1 + 0.5 = 1.0

  11. Search for father "Obi-Wan never told you what happened to your father." obi wan never told happen father == 1 match, 6 tokens; coeff: 0.5 · 1/6 + 0.5 ≈ 0.5833
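The three single-term examples can be checked with a few lines of Python. This is a sketch that simply restates the `coeff` expression quoted from `fts_spec.cpp`; the function name `term_coefficient` and the hard-coded match and token counts are assumptions taken from the slides, not MongoDB code.

```python
def term_coefficient(match_count: int, num_tokens: int) -> float:
    """coeff = (0.5 * data.count / numTokens) + 0.5, as quoted on the slide."""
    return (0.5 * match_count / num_tokens) + 0.5


# (search term, matches, stemmed tokens) as given on the slides.
examples = [
    ("droid", 1, 2),   # "These are not the droids you are looking for."
    ("father", 1, 1),  # "No. I am your father."
    ("father", 1, 6),  # "Obi-Wan never told you what happened to your father."
]

for term, match_count, num_tokens in examples:
    print(term, num_tokens, round(term_coefficient(match_count, num_tokens), 4))
```

The coefficient always lies between 0.5 and 1.0, and a match in a document with fewer tokens scores higher.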

  12. > db.starwars.find({ $text: { $search: "obi-wan" }}, {score: {$meta: "textScore"}}) { "_id": ObjectId("57f2d56fe814412463c3adf0"), "quote": "Obi-Wan never told you what happened to your father.", "score": 1.1666666666666667 } Fetched 1 record(s) in 6ms

  13. Multiple Terms https://github.com/mongodb/mongo/blob/v3.2/src/mongo/db/fts/fts_spec.cpp#L228 score += (weight * data.freq * coeff * adjustment); weight: method parameter; data.freq and adjustment: 1

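  14. Search for obi-wan, term obi: obi wan never told happen father == 1 match, 6 tokens; coeff: 0.5 · 1/6 + 0.5 ≈ 0.5833

  15. Search for obi-wan, term wan: obi wan never told happen father == 1 match, 6 tokens; coeff: 0.5 · 1/6 + 0.5 ≈ 0.5833

  16. Search for obi-wan score: 0.5833 for each of the two terms; Sum: 0.5833 + 0.5833 ≈ 1.1667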
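The multi-term sum can be sketched the same way. The function `text_score` below is a hypothetical helper that applies the quoted `score +=` line once per matching query term, with `data.freq` and `adjustment` fixed at 1 as the slide states and `weight` assumed to be 1. The token list is the stemmed form shown on the slides, written out by hand.

```python
def text_score(query_terms: list[str], doc_tokens: list[str],
               weight: float = 1.0) -> float:
    """Sum weight * freq * coeff * adjustment over the matching query terms."""
    num_tokens = len(doc_tokens)
    score = 0.0
    for term in query_terms:
        match_count = doc_tokens.count(term)
        if match_count == 0:
            continue  # a term that does not occur contributes nothing
        coeff = (0.5 * match_count / num_tokens) + 0.5
        freq = 1.0        # assumption from the slide: data.freq is 1
        adjustment = 1.0  # assumption from the slide: adjustment is 1
        score += weight * freq * coeff * adjustment
    return score


# Stemmed tokens of "Obi-Wan never told you what happened to your father."
doc_tokens = ["obi", "wan", "never", "told", "happen", "father"]
score = text_score(["obi", "wan"], doc_tokens)
print(score)
```

The result agrees with the `textScore` of about 1.1667 that MongoDB reported for the obi-wan query.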

  17. Elasticsearch

  18. Term Frequency / Inverse Document Frequency (TF/IDF) Search one term

  19. BM25 https://speakerdeck.com/elastic/improved-text-scoring-with-bm25

  20. Term Frequency

  21. Inverse Document Frequency

  22. Field-Length Norm

  23. Putting it Together: score(q,d) = queryNorm(q) · coord(q,d) · ∑_{t in q} ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )
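The combined formula can be sketched in Python. The slides do not spell out the individual factors, so the definitions below are assumptions based on Lucene's classic TF/IDF similarity: tf = sqrt(frequency), idf = 1 + ln(numDocs / (docFreq + 1)), norm = 1 / sqrt(number of terms in the field), queryNorm = 1 / sqrt(sum of squared term weights), and coord = matching terms / query terms. Real Lucene also stores the norm in a lossy one-byte encoding and computes statistics per shard, so actual scores will differ. All function names and the tiny hand-tokenized corpus are hypothetical.

```python
import math


def tf(freq: int) -> float:
    return math.sqrt(freq)


def idf(doc_freq: int, num_docs: int) -> float:
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def norm(num_terms: int) -> float:
    return 1.0 / math.sqrt(num_terms)


def score(query_terms: list[str], doc_tokens: list[str],
          doc_freqs: dict[str, int], num_docs: int, boost: float = 1.0) -> float:
    """score(q,d) = queryNorm(q) * coord(q,d) * sum of per-term contributions."""
    matched = [t for t in query_terms if t in doc_tokens]
    if not matched:
        return 0.0
    sum_squared_weights = sum(
        (idf(doc_freqs.get(t, 0), num_docs) * boost) ** 2 for t in query_terms
    )
    query_norm = 1.0 / math.sqrt(sum_squared_weights)
    coord = len(matched) / len(query_terms)
    total = sum(
        tf(doc_tokens.count(t)) * idf(doc_freqs.get(t, 0), num_docs) ** 2
        * boost * norm(len(doc_tokens))
        for t in matched
    )
    return query_norm * coord * total


# Assumption: lowercased tokens without stemming or stop-word removal.
docs = [
    "these are not the droids you are looking for".split(),
    "no i am your father".split(),
    "obi wan never told you what happened to your father".split(),
]
doc_freqs = {"father": 2, "droids": 1}
for tokens in docs:
    print(round(score(["father"], tokens, doc_freqs, len(docs)), 4))
```

In this sketch the shorter "No. I am your father." quote outranks the longer Obi-Wan quote for the same term, which is the effect of the field-length norm.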
