Which One Is Better: Presentation-Based or Content-Based Math Search?


  1. Which One Is Better: Presentation-Based or Content-Based Math Search?
     Minh-Quoc NGHIEM, Giovanni Yoko KRISTIANTO, Goran TOPIĆ, Akiko AIZAWA

  2. Outline
     • Introduction
     • Math Search Systems
     • Method
     • Evaluation
     • Conclusion

  3. Introduction
     • Math Search
       – Presentation-based
         • LaTeX
         • Presentation MathML
       – Content-based
         • Content MathML
         • OpenMath
     • NTCIR Math Track
       – http://ntcir-math.nii.ac.jp/
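
To make the presentation/content distinction concrete, the sketch below encodes the same expression, x squared, in both MathML flavors and lists the element tags each one uses. It is only an illustration: the `tag_sequence` helper is a hypothetical name, and the MathML namespace declaration is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Presentation MathML describes layout: a base "x" with a superscript "2".
PRESENTATION = "<math><msup><mi>x</mi><mn>2</mn></msup></math>"

# Content MathML describes meaning: the power operator applied to x and 2.
CONTENT = "<math><apply><power/><ci>x</ci><cn>2</cn></apply></math>"


def tag_sequence(mathml: str) -> list[str]:
    """Return the element tags of a MathML string in document order."""
    return [element.tag for element in ET.fromstring(mathml).iter()]


presentation_tags = tag_sequence(PRESENTATION)
content_tags = tag_sequence(CONTENT)

print(presentation_tags)  # layout elements: msup, mi, mn
print(content_tags)       # semantic elements: apply, power, ci, cn
```

The presentation form says nothing about whether the superscript is an exponent, an index, or an inverse; the content form commits to one reading, which is what a content-based search system indexes.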

  4. Introduction
     • Content-based systems use SnuggleTeX or LaTeXML for semantic enrichment
     • No evaluation exists of how the semantic enrichment module contributes to the search system
     • Which one is better: content-based search or presentation-based search?

  5. Mathematical Search Systems
     • Presentation-based systems
       – Springer LaTeX Search
       – MathFind
       – The Digital Library of Mathematical Functions
       – EgoMath
       – Math Indexer and Searcher
       – ActiveMath
       – …

  6. Mathematical Search Systems
     • Content-based systems
       – Wolfram Functions Site
       – MathWebSearch
       – MathGO!
       – MathDA
       – The system of Nguyen et al.
       – …

  7. Method
     • Use a semantic enrichment module to convert Presentation MathML to Content MathML
     • Use Content MathML for indexing
     • Allow the user to input queries in Presentation MathML

  8. System framework
     • Presentation MathML expressions → Semantic Enrichment → Content MathML expressions → Indexing → Ranking

  9. Semantic Enrichment
     • Semantic enrichment method of Nghiem et al. (CICM 2013)
       – Segmentation rules: segment Presentation MathML trees into smaller trees
       – Translation rules: translate Presentation MathML trees into Content MathML trees
       – Each rule is associated with a probability
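
The idea of probability-weighted translation rules can be sketched as follows. This is a toy illustration, not the method of Nghiem et al.: the rule table, the probabilities, the string keys standing in for already-segmented Presentation MathML subtrees, and the function names `translate` and `enrich` are all assumptions made for the example.

```python
import math

# Hypothetical rule table. Each key stands in for one segmented presentation
# subtree; each value lists candidate content meanings with made-up probabilities.
TRANSLATION_RULES = {
    "sin^-1": [("arcsin", 0.9), ("reciprocal-of-sin", 0.1)],
    "arcsin": [("arcsin", 1.0)],
    "H_n": [("HermiteH", 0.8), ("subscripted-variable", 0.2)],
}


def translate(segment: str) -> tuple[str, float]:
    """Pick the most probable content meaning for one segment."""
    candidates = TRANSLATION_RULES.get(segment)
    if not candidates:
        # Simplification: segments without a rule pass through unchanged.
        return (segment, 1.0)
    return max(candidates, key=lambda candidate: candidate[1])


def enrich(segments: list[str]) -> tuple[list[str], float]:
    """Translate segments independently and multiply the rule probabilities.

    Treating segments as independent is an assumption of this sketch.
    """
    translated = [translate(segment) for segment in segments]
    meanings = [meaning for meaning, _ in translated]
    score = math.prod(probability for _, probability in translated)
    return meanings, score


meanings, score = enrich(["sin^-1", "H_n"])
print(meanings, score)
```

Under these toy rules, `sin^-1` and `arcsin` both map to the same content symbol, so a query written one way can match an indexed expression written the other way.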

  10. Indexing
      • Indexing method of Topić et al. (NTCIR 2013)
        – Opaths: paths in the XML tree, with sibling order
        – Upaths: paths in the XML tree, without order
        – Sisters: sister nodes in a subtree
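
One loose reading of these three feature types is sketched below. It follows only the one-line definitions on the slide and should not be taken as the actual feature definitions of Topić et al.; the `label` and `extract_features` names, the `position#label` step encoding, and the sorted-tuple representation of sisters are assumptions.

```python
import xml.etree.ElementTree as ET


def label(element: ET.Element) -> str:
    """Node label: the tag, plus the text for leaves such as <ci>x</ci>."""
    text = (element.text or "").strip()
    return f"{element.tag}:{text}" if text else element.tag


def extract_features(root: ET.Element):
    """Collect ordered paths, unordered paths, and sister groups from a tree."""
    opaths, upaths, sisters = [], [], []

    def walk(element, position, ordered_prefix, unordered_prefix):
        # Ordered path steps record the child position; unordered ones do not.
        opath = ordered_prefix + [f"{position}#{label(element)}"]
        upath = unordered_prefix + [label(element)]
        opaths.append("/".join(opath))
        upaths.append("/".join(upath))
        children = list(element)
        if children:
            # Sisters: labels of the children of one node, sorted to ignore order.
            sisters.append(tuple(sorted(label(child) for child in children)))
        for index, child in enumerate(children):
            walk(child, index, opath, upath)

    walk(root, 0, [], [])
    return opaths, upaths, sisters


# x + y and y + x in Content MathML.
op_a, up_a, sis_a = extract_features(
    ET.fromstring("<apply><plus/><ci>x</ci><ci>y</ci></apply>"))
op_b, up_b, sis_b = extract_features(
    ET.fromstring("<apply><plus/><ci>y</ci><ci>x</ci></apply>"))

print(set(op_a) == set(op_b))  # ordered paths distinguish the two expressions
print(set(up_a) == set(up_b))  # unordered paths do not
```

In this sketch the ordered paths reward exact structural matches, while the unordered paths and sister groups still let commutative variants such as x + y and y + x share features.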

  11. Evaluation
      • Data
        – 20k math expressions from the Wolfram Functions Site (WFS)
        – 15 queries (modified from NTCIR)
      • Systems
        – Presentation MathML (PMathML)
        – Content MathML (CMathML)
        – Semantic Enrichment (SE)

  12. Evaluation
      • Metrics
        – Precision at 10 (P@10)
          • Precision in the top k results, here k = 10
        – Normalized Discounted Cumulative Gain (nDCG)
          • Ranking quality
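
Both metrics can be computed from a ranked list of relevance grades, as in the minimal sketch below. The slides do not say which DCG variant was used, so the linear-gain form with a log2 discount is an assumption, as is computing the ideal ranking from the retrieved list alone rather than from the full judged pool.

```python
import math


def precision_at_k(relevances: list[float], k: int = 10) -> float:
    """Fraction of the top-k results that are relevant (grade above zero)."""
    return sum(1 for grade in relevances[:k] if grade > 0) / k


def dcg(relevances: list[float]) -> float:
    """Discounted cumulative gain: grade divided by log2(rank + 1), rank from 1."""
    return sum(grade / math.log2(rank + 1)
               for rank, grade in enumerate(relevances, start=1))


def ndcg(relevances: list[float]) -> float:
    """DCG divided by the DCG of the same grades in the best possible order."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Relevance grades of the top 10 results for one hypothetical query.
ranking = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]
print(precision_at_k(ranking, k=10))
print(ndcg(ranking))
```

P@10 ignores where in the top 10 the relevant results appear, whereas nDCG rewards placing them earlier, which is why the two are reported together.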

  13. Queries
      • [Slide of 15 query formulas; the mathematical notation did not survive text extraction. Legible fragments include cosh, sinh, log, lim, cos, and sin terms and integrals with infinite limits.]

  14. Evaluation: search performance
      • [Chart: nDCG and P@10 for the PMathML, CMathML, and SE systems; vertical axis from 0.6 to 1]
      • Using content markup improves search performance

  15. Evaluation: search performance
      • [Chart: precision at k for k = 1, 3, 5, 7, 9, comparing the PMathML, CMathML, and SE systems; vertical axis from 0.6 to 1]
      • Using content markup improves search performance
      • Relevant results are ranked higher

  16. PMathML and SE systems
      • SE system is better
        – Functions have specific meanings
          • Poly-Gamma, Hermite-H
        – More than one way to represent a math expression
          • sin⁻¹ and arcsin
      • PMathML system is better
        – Elementary functions
          • Power, logarithm, trigonometric functions

  17. Summary
      • Content-based math search is better than presentation-based math search
      • The performance of the semantic enrichment module affects math search performance
      • Both presentation-based and content-based systems have their strong points

  18. Thank you for your attention!
