

  1. Math Indexer and Searcher under the Hood: History and Development of a Winning Strategy
     Michal Růžička, Petr Sojka, Martin Líška
     Masaryk University, Faculty of Informatics, Brno, Czech Republic
     mruzicka@mail.muni.cz, sojka@fi.muni.cz, 255768@mail.muni.cz
     https://mir.fi.muni.cz/
     Illustrations by Jiří Franek.

  2. Outline
     1 Results Comparison
     2 Approach
     3 Summary
     Math Indexer and Searcher under the Hood: History and Development of a Winning Strategy NTCIR 2014, Tokyo, Japan, December 11th, 2014

  4. NTCIR-10 Math Task
     • The first (pilot) year of the math task took place last year (i.e. 2013).
     • Formula search and Full-text search.
        • 4 runs submitted, differing in query language.
        • PMath – Run #1.
        • CMath – Run #2.
        • PCMath – Run #3.
        • TeX – Run #4.
     • Open Information Retrieval.
        • 1 run submitted – TeX + text mixed queries.

  5. NTCIR-10 Math Task Results
     Table 1: Result metrics for submitted runs in Formula Search with Relevance Level ≥ 3 (Relevant)
     Metric      Run 1      Run 2      Run 4
     P-10 avg    0.105      0.191      0.219
     P-5 avg     0.133      0.229      0.276
     MAP avg     0.060      0.112      0.127
     Precision   0.109      0.185      0.123
                 (64/589)   (92/496)   (96/778)

     Table 2: Result metrics for submitted runs in Formula Search with Relevance Level ≥ 1 (Partially Relevant)
     Metric      Run 1      Run 2      Run 4
     P-10 avg    0.143      0.214      0.267
     P-5 avg     0.181      0.267      0.343
     MAP avg     0.066      0.081      0.100
     Precision   0.148      0.232      0.161
                 (87/589)   (115/496)  (125/778)
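The Precision row in both tables pairs a decimal with a fraction. A minimal sketch of that arithmetic follows, assuming each fraction means (relevant results) / (results returned); the slide does not label the two numbers, so that reading and the names `counts` and `precision` are illustrative assumptions.

```python
# Fractions copied from the Precision rows of Table 1 and Table 2.
# Assumption: numerator = relevant results, denominator = results returned.
counts = {
    "Table 1 (Relevance Level >= 3)": {
        "Run 1": (64, 589), "Run 2": (92, 496), "Run 4": (96, 778),
    },
    "Table 2 (Relevance Level >= 1)": {
        "Run 1": (87, 589), "Run 2": (115, 496), "Run 4": (125, 778),
    },
}


def precision(hits: int, returned: int) -> float:
    """Share of returned results that were judged relevant."""
    return hits / returned


for table, runs in counts.items():
    for run, (hits, returned) in runs.items():
        print(f"{table}, {run}: {precision(hits, returned):.3f}")
```

Rounded to three decimals, the quotients reproduce the decimals printed in the tables, which supports reading the row this way.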

  6. NTCIR-11 Math-2 Task
     • Only one type of query.
     • 50 queries, each with
        • 1–4 formulae,
        • 1–4 keyphrases.
     • Wikipedia task in addition to the Main task.

  7. NTCIR-11 Math-2 Main Task Results
     Table: Results of submitted runs with Relevance Level ≥ 3 (Relevant). Main task team rank is in [ ] for our best runs.
                 PMath     CMath        PCMath       TeX
     MAP avg     0.3073    0.3630 [1]   0.3594       0.3357
     P@10 avg    0.3040    0.3520 [1]   0.3480       0.3380
     P@5 avg     0.5120    0.5680 [1]   0.5560       0.5400

     Table: Results of submitted runs with Relevance Level ≥ 1 (Partially Relevant). Number in [ ] is team rank of all runs.
                 PMath     CMath        PCMath       TeX
     MAP avg     0.2557    0.2807 [2]   0.2799       0.2747
     P@10 avg    0.5020    0.5440       0.5520 [1]   0.5400
     P@5 avg     0.8440    0.8720 [2]   0.8640       0.8480
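For readers unfamiliar with the metrics in these tables, here is a minimal sketch of the textbook definitions of P@k and average precision (MAP is the mean of the latter over all queries). It assumes binary relevance after applying the relevance-level threshold; the function names and toy data are illustrative, and this is not the evaluation tool used by the task organizers.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked results that are relevant (divides by k)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant result appears,
    normalised by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: Sequence[tuple]) -> float:
    """Average of average_precision over (ranked, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy data, not taken from the task: two of the five results are relevant,
# and one relevant document ("x") is never retrieved.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "x"}
print(precision_at_k(ranked, relevant, 5))
print(average_precision(ranked, relevant))
```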

  8. NTCIR-11 Math-2 Wikipedia Task Results
     • Topics with results:
        • 75 out of 100 (CMath run)
     • Average position:
        • 64 correct results in top 100
        • 58 correct results in top 20
        • 56 correct results in top 10
        • 53 correct results in top 5
        • 52 correct results in top 4
        • 50 correct results in top 3
        • 48 correct results in top 2
        • 46 correct results in top 1

  9. NTCIR-11 Math-2 Main Task Approach

  10. NTCIR-11 Math-2 Main Task Approach: News
      • Query expansion & strip-merging of subresults.
      • Query expansion.
         query 1 (the original query): g g l l l
         query 2: g g l l
         query 3: g g l
         query 4: g g
         query 5: g l l l
         query 6: l l l
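The six queries on this slide follow a regular pattern: terms of the second kind are dropped one at a time, then terms of the first kind are dropped while all terms of the second kind are kept. A minimal sketch of that pattern follows, assuming the two kinds of terms are formulae and keywords (the task description says each query has 1–4 formulae and 1–4 keyphrases) and assuming terms are dropped from the end of each list. The function name `expand_query` and the placeholder terms are hypothetical.

```python
from typing import List, Sequence


def expand_query(formulae: Sequence[str], keywords: Sequence[str]) -> List[List[str]]:
    """Derive subqueries from one query, in the order suggested by the slide.

    Assumption: terms are removed from the end of each list; the slide does
    not show which formula survives in the single-formula subquery.
    """
    queries: List[List[str]] = []
    # Original query first, then drop keywords one at a time.
    for n in range(len(keywords), -1, -1):
        queries.append(list(formulae) + list(keywords[:n]))
    # Then keep every keyword and drop formulae one at a time.
    for m in range(len(formulae) - 1, -1, -1):
        queries.append(list(formulae[:m]) + list(keywords))
    # A query with no terms left is not worth sending to the index.
    return [q for q in queries if q]


for i, query in enumerate(expand_query(["f1", "f2"], ["k1", "k2", "k3"]), start=1):
    print(f"query {i}: {' '.join(query)}")
```

With two formulae and three keywords this yields six queries whose term counts match the slide.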

  11. NTCIR-11 Math-2 Main Task Approach: News
      • Strip-merging of subresults.
      • Example on three subqueries (the original one and two derived subqueries).
      Results of the original query: 11 results (1–11).
      Results of the subquery 1: 5 results (1–5).
      Results of the subquery 2: 5 results (1–5).
      The final result list:
         1–3: original
         4–5: subquery 1
         6: subquery 2
         7–9: original
         10–11: subquery 1
         12: subquery 2
         13–15: original
         16: subquery 1 (no more results from subquery 1)
         17: subquery 2
         18–19: original (no more results from the original query)
         20–21: subquery 2 (no more results from subquery 2)
         22–1000: random
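The example on this slide can be reproduced by a round-robin merge that takes a fixed-size strip from each result list in turn. The following is a minimal sketch under that assumption; the strip sizes 3, 2 and 1 are inferred from the example rather than stated on the slide, and the name `strip_merge` is hypothetical. Padding the list with random results up to 1000 entries is left out.

```python
from typing import List, Sequence


def strip_merge(result_lists: Sequence[Sequence[str]],
                strip_sizes: Sequence[int]) -> List[str]:
    """Interleave ranked lists by repeatedly taking a strip from each one."""
    if len(result_lists) != len(strip_sizes):
        raise ValueError("one strip size is needed per result list")
    if any(size < 1 for size in strip_sizes):
        raise ValueError("strip sizes must be positive")

    positions = [0] * len(result_lists)
    merged: List[str] = []
    # Keep cycling until every list is exhausted.
    while any(pos < len(res) for pos, res in zip(positions, result_lists)):
        for i, (results, size) in enumerate(zip(result_lists, strip_sizes)):
            strip = results[positions[i]:positions[i] + size]
            merged.extend(strip)
            positions[i] += len(strip)
    return merged


# The slide's example: 11 results from the original query, 5 from each subquery.
original = [f"original {i}" for i in range(1, 12)]
subquery1 = [f"subquery 1, result {i}" for i in range(1, 6)]
subquery2 = [f"subquery 2, result {i}" for i in range(1, 6)]

merged = strip_merge([original, subquery1, subquery2], strip_sizes=[3, 2, 1])
for position, result in enumerate(merged, start=1):
    print(f"{position}: {result}")
```

Giving the original query the largest strip keeps its results dominant near the top while still letting the derived subqueries contribute early.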

  12. Query Expansion Results’ Insight
      [Figure: Relative number of results found using different subqueries for every query in CMath run. Horizontal axis: the percentage of results returned by individual subqueries (0% to 70%); vertical axis ticks: 1 to 20; legend: Original Query, Subquery 1 to Subquery 7.]

  13. NTCIR-11 Math-2 Wikipedia Task Content Topics
      • Exactly the same fully automatic system was used for the main NTCIR Math Task and the Wikipedia subtask.
         • Only the data differ.
         • No tuning or modifications for the Wikipedia task.
      • Input Content MathML was transformed to the format of the main NTCIR math task.
         • Presentation MathML and TeX representations of the data were added manually.
      • All four runs (CMath, PMath, PCMath, TeX) were performed as in the main task.
      • No query expansion & strip-merging possible, as queries consist of a single formula only.

  14. Summary
      • Our results have improved significantly since last year.
      • Query expansion & strip-merging of subresults helps a lot.
      • Better unification definitely needed.
      • Wikipedia task very useful.

  15. Questions?
