Overview of the Recognizing Inference in TExt (RITE-2) at NTCIR-10



  1. Overview of the Recognizing Inference in TExt (RITE-2) at NTCIR-10
Yotaro Watanabe (Tohoku University), Yusuke Miyao (NII), Junta Mizuno (Tohoku University), Tomohide Shibata (Kyoto University), Hiroshi Kanayama (IBM Research), Cheng-Wei Lee (Academia Sinica), Chuan-Jie Lin (National Taiwan Ocean University), Shuming Shi (MSRA), Teruko Mitamura (CMU), Noriko Kando (NII), Hideki Shima (CMU), Kohichi Takeda (IBM Research)

  2. Overview of RITE-2
• RITE-2 is a generic benchmark task that addresses a common semantic inference required in various NLP/IA applications.
t1: The Kamakura Shogunate was considered to have begun in 1192, but the current leading theory is that it was effectively formed in 1185.
t2: The Kamakura Shogunate began in Japan in the 12th century.
Can t2 be inferred from t1? (entailment?)
The 10th NTCIR Conference

  3. Motivation
• Natural Language Processing (NLP) / Information Access (IA) applications
  - Question Answering, Information Retrieval, Information Extraction, Text Summarization, Automatic Evaluation for Machine Translation, Complex Question Answering
• Current entailment recognition systems are not yet mature
  - The highest accuracy on the Japanese BC subtask in NTCIR-9 RITE was only 58%
  - There is still ample room to advance entailment recognition technologies

  4. RITE vs. RITE-2
[Figure: a pyramid of inference technology. Top: application-oriented inference for NLP/IA applications (QA, IR, biology apps) over multiple sentences and documents, covered by RITE-2's Search subtask (sentence-document entailment). Middle: sentence-level inference over sentence pairs, covered by both RITE and RITE-2 (BC: entailment?; MC: paraphrase? entailment? contradiction?). Foundation: linguistic phenomena-level inference over sentence pairs, covered by RITE-2's UnitTest (case alternation, quantification, coordination, modification, phrase relations, lexical relations, negation, world knowledge, ...).]

  5. RITE-2 Subtasks

  6. BC and MC subtasks
t1: The Kamakura Shogunate was considered to have begun in 1192, but the current leading theory is that it was effectively formed in 1185.
t2: The Kamakura Shogunate began in Japan in the 12th century.
BC output: Y or N. MC output: B, F, C or I.
• BC subtask
  - Entailment (t1 entails t2) or Non-Entailment (otherwise)
• MC subtask
  - Bi-directional Entailment (B): t1 entails t2 and t2 entails t1
  - Forward Entailment (F): t1 entails t2 and t2 does not entail t1
  - Contradiction (C): t1 contradicts t2, or they cannot be true at the same time
  - Independence (I): otherwise
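The four MC labels above are fully determined by three binary judgments. A minimal sketch of that mapping (the function and its signature are ours, purely illustrative, not part of any RITE-2 tool):

```python
def mc_label(t1_entails_t2: bool, t2_entails_t1: bool, contradicts: bool) -> str:
    """Map entailment judgments for a <t1, t2> pair to an MC subtask label."""
    if contradicts:
        return "C"  # Contradiction: the two cannot be true at the same time
    if t1_entails_t2 and t2_entails_t1:
        return "B"  # Bi-directional entailment
    if t1_entails_t2:
        return "F"  # Forward entailment
    return "I"      # Independence

# Kamakura example: t1 entails t2, but t2 does not entail t1
print(mc_label(True, False, False))  # F
```

The BC subtask collapses the same judgments to Y (t1 entails t2) or N (otherwise).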

  7. Development of BC and MC data
• Retrieve pairs of sentences, and edit the pairs if needed, producing candidate examples <t1, t2>
• For each example, 5 annotators assigned its semantic label
• An example was accepted into the RITE-2 BC/MC data if 4 or more annotators assigned it the same label
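The agreement filter described above can be sketched as follows (a hypothetical helper illustrating the 4-of-5 majority rule, not the organizers' actual pipeline code):

```python
from collections import Counter

def accept_example(labels, min_agreement=4):
    """Accept a <t1, t2> example only if at least `min_agreement` annotators
    assigned the same semantic label; returns (label, accepted)."""
    label, count = Counter(labels).most_common(1)[0]
    if count >= min_agreement:
        return label, True
    return None, False

print(accept_example(["Y", "Y", "Y", "Y", "N"]))  # ('Y', True)
print(accept_example(["Y", "Y", "Y", "N", "N"]))  # (None, False)
```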

  8. Entrance Exam subtasks (Japanese only)
• Entrance exam problems are taken from the National Center Test for University Admissions (Daigaku Nyushi Center Shiken)
t1: スレイマン1世は数多くの軍事的成功を収めてオスマン帝国を最盛期に導いた。 (Suleiman I achieved numerous military successes and led the Ottoman Empire to its peak.)
t2: オスマン帝国ではスレイマン1世の時代が最盛期であった。 (The Ottoman Empire's peak was during the reign of Suleiman I.)

  9. Entrance Exam subtask: BC and Search
• Entrance Exam BC
  - Binary classification problem (Entailment or Non-entailment)
  - t1 and t2 are given
• Entrance Exam Search
  - Binary classification problem (Entailment or Non-entailment)
  - Only t2 and a set of documents are given
  - Systems are required to search sentences in Wikipedia and textbooks to decide the semantic labels

  10. UnitTest (Japanese only)
• Motivation
  - Evaluate how well systems can handle linguistic phenomena that affect entailment relations
• Task definition
  - Binary classification problem (same as the BC subtask)
Example (category: modifier)
t1: In the Meiji Constitution, a legally clear distinction between the Imperial Family and Japan had been allowed.
t2: In the Meiji Constitution, a distinction between the Imperial Family and Japan had been allowed.
Example (category: meronymy)
t1: In the Meiji Constitution, a distinction between the Imperial Family and Japan had been allowed.
t2: In the Meiji Constitution, a distinction between the Emperor and Japan had been allowed.

  11. Development of the UnitTest data
• Procedure
  - Sentence pairs {<t1, t2>} were sampled from the BC subtask data
  - An annotator transformed each sampled sentence pair from t1 to t2, breaking the pair down into a set of linguistic phenomena (e.g., pair 1 yields sub-pairs 1.1, 1.2, ...)
• [Kaneko+ 13] (to appear in ACL 2013)

  12. Distribution of the linguistic phenomena in UnitTest data

  Phenomenon              dev  test
  list                     11     3
  quantity                  1     0
  scrambling               16    15
  inference                 4     2
  implicit relation        10    18
  apposition                3     1
  temporal                  2     1
  spatial                   4     1
  disagree: lexical         5     2
  disagree: phrase         25    25
  disagree: modality        2     1
  disagree: spatial         1     1
  disagree: temporal        0     1
  lexical: synonymy        10    10
  lexical: hypernymy        6     3
  lexical: meronymy         1     1
  lexical: entailment       1     0
  phrase: synonymy         45    35
  phrase: hypernymy         3     0
  phrase: entailment       28    45
  case alternation          9     7
  modifier                 30    42
  nominalization            2     1
  coreference              12     4
  clause                   29    14
  relative clause          10     8
  transparent head          2     1
  Total                   272   241

  13. RITE4QA (Chinese only)
• Motivation
  - Can an entailment recognition system rank a set of unordered answer candidates in QA?
• Dataset
  - Developed from NTCIR-7 and NTCIR-8 CLQA data
  - t1: an answer-candidate-bearing sentence
  - t2: a question rephrased in affirmative form
• Requirements
  - Systems must generate confidence scores for the ranking process
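Given such confidence scores, producing the ranked answer list that RITE4QA evaluates is a simple descending sort. A sketch with illustrative names (not part of the RITE4QA tooling):

```python
def rank_candidates(candidates):
    """candidates: (answer, confidence) pairs, one per <t1, t2> pair scored
    by an entailment recognizer. Returns answers ordered by descending
    confidence, ready for Top1/MRR evaluation."""
    return [answer for answer, conf in sorted(candidates, key=lambda x: -x[1])]

print(rank_candidates([("Tokyo", 0.2), ("Kyoto", 0.9), ("Nara", 0.5)]))
# ['Kyoto', 'Nara', 'Tokyo']
```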

  14. Evaluation Metrics
• Macro F1 and Accuracy (BC, MC, ExamBC, ExamSearch and UnitTest)
  Accuracy = 100 × N_correct / N_examples
  MacroF1 = (1 / |C|) × Σ_{c ∈ C} F1_c
• Correct Answer Ratio (Entrance Exam)
  - Y/N labels are mapped into selections of answers, and the accuracy of the answers is calculated
• Top1 and MRR (RITE4QA)
  Top1 = (1 / |Q|) × Σ_{i=1}^{|Q|} [top answer for question i is correct]
  MRR = (1 / |Q|) × Σ_{i=1}^{|Q|} (1 / rank_i)
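The four metric definitions above translate directly into code. A minimal sketch (function names are ours, not from the official evaluators):

```python
def accuracy(n_correct, n_examples):
    """Percentage of examples labeled correctly."""
    return 100.0 * n_correct / n_examples

def macro_f1(f1_per_class):
    """Unweighted mean of per-class F1 scores over the label set C."""
    return sum(f1_per_class) / len(f1_per_class)

def top1(is_top_correct):
    """Fraction of questions whose top-ranked answer is correct
    (one boolean per question)."""
    return sum(is_top_correct) / len(is_top_correct)

def mrr(ranks):
    """Mean reciprocal rank; ranks are 1-based positions of the first
    correct answer per question."""
    return sum(1.0 / r for r in ranks) / len(ranks)

print(round(accuracy(325, 610), 2))  # 53.28
print(mrr([1, 2, 4]))                # 0.5833333333333334
```

Note that Macro F1, unlike Accuracy, weights each label equally regardless of how many examples carry it, so a system cannot score well by always predicting the majority label.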

  15. Organization Effort

  16. Generic Framework
• We provided pre-processed data and tools to lower barriers to entry
  (1) Pre-processed data: sentences and documents run through a linguistic analyzer
  (2) A fundamental entailment recognition tool: an entailment recognizer that produces outputs
  (3) RITE-2 evaluators: an evaluator that produces evaluation results (accuracies, F1 values, ...)

  17. (1) Pre-processed data
• Morphological and syntactic analysis
  - MeCab [Kudo+ 05] + CaboCha [Kudo+ 02]
  - Juman + KNP
  - Provided as XML data
• Search results for the Exam Search subtask
  - Used TSUBAKI [Shinzato+ 11] to provide search results
  - Provided at most five search results extracted from Wikipedia and textbooks

  18. (2) A fundamental entailment recognition tool (baseline tool)
• A machine learning-based entailment recognition system: a Feature Extractor turns each instance (XML) into features, and a machine learning component outputs Y/N (BC) or B/F/C/I (MC)
• Simple features are implemented in the Feature Extractor
  - Bag of {content words, aligned chunks, head words}
  - Ratio of aligned {content words, aligned chunks}
• New features can be easily added
• Outputs files compatible with the format of the RITE-2 formal run
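The ratio-of-aligned-content-words feature listed above can be sketched roughly as follows. This is a simplified illustration under the assumption that alignment is plain set intersection of content words; the distributed Feature Extractor is more elaborate:

```python
def overlap_features(t1_words, t2_words):
    """Word-overlap features for a <t1, t2> pair: the fraction of each
    side's content words that also appear on the other side."""
    s1, s2 = set(t1_words), set(t2_words)
    aligned = s1 & s2
    return {
        "aligned_ratio_t2": len(aligned) / len(s2) if s2 else 0.0,
        "aligned_ratio_t1": len(aligned) / len(s1) if s1 else 0.0,
    }

print(overlap_features(["shogunate", "began", "1192"],
                       ["shogunate", "began", "japan"]))
```

A high `aligned_ratio_t2` means most of t2 is lexically covered by t1, which correlates with (but does not guarantee) entailment; such features feed the machine learning component.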

  19. (3) RITE-2 Evaluators
• Generic Evaluator (all of the subtasks)

  $ java -jar rite2eval.jar -g RITE2_JA_test_bc.xml -s output_bc.txt
  ------------------------------------------------------------
  |Label|   #|        Precision|           Recall|        F1|
  |    N| 354| 60.18( 204/ 339)| 57.63( 204/ 354)|     58.87|
  |    Y| 256| 44.65( 121/ 271)| 47.27( 121/ 256)|     45.92|
  ------------------------------------------------------------
  Accuracy: 53.28( 325/ 610)
  Macro F1: 52.40
  Confusion Matrix
  ---------------------
  |gold \ sys|   N   Y|
  ---------------------
  |        N| 204 150|
  |        Y| 135 121|
  ---------------------

• Additional Evaluator (Entrance Exam)
  - Calculates the correct answer ratio
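Every figure in such an evaluator report can be reproduced from the confusion matrix alone. The sketch below (our helper, not code from rite2eval.jar) checks the numbers of the sample run:

```python
def prf(tp, sys_total, gold_total):
    """Precision, recall, and F1 (as percentages) for one label:
    tp = correct predictions of the label, sys_total = times the system
    predicted it, gold_total = times it appears in the gold data."""
    p = 100.0 * tp / sys_total
    r = 100.0 * tp / gold_total
    f1 = 2 * p * r / (p + r)
    return p, r, f1

# Confusion matrix from the sample run:
# gold N -> (sys N: 204, sys Y: 150); gold Y -> (sys N: 135, sys Y: 121)
p_n, r_n, f1_n = prf(204, 204 + 135, 354)  # label N
p_y, r_y, f1_y = prf(121, 150 + 121, 256)  # label Y

print(round(f1_n, 2), round(f1_y, 2))       # 58.87 45.92
print(round(100.0 * (204 + 121) / 610, 2))  # 53.28  (Accuracy)
print(round((f1_n + f1_y) / 2, 2))          # 52.4   (Macro F1)
```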
