
A Class-oriented Approach to Building a Paraphrase Corpus - PDF document



  1. Slide 1: A Class-oriented Approach to Building a Paraphrase Corpus
     Atsushi FUJITA (1), Kentaro INUI (2)
     (1) Kyoto University
     (2) Nara Institute of Science and Technology
     < IWP 2005, Oct. 14th, 2005 >

     Slide 2: Requirements for handling paraphrases
     - Transformation rules / patterns
       - X verb ⇔ S0(X) + Oper1(S0(X))
       - burst into tears ⇔ cry
       - X finds a solution to Y ⇔ X solves Y
       - Handcrafting: [Iordanskaja et al., 1991] [Dras, 1999] [Sato et al., 1999] [Kondo et al., 1999] [Kondo et al., 2001] [Iida et al., 2001] etc.
       - Automatic acquisition: [Barzilay et al., 2001] [Lin et al., 2001] [Shinyama et al., 2002] [Shimohata et al., 2002] [Pang et al., 2003] etc.
     - Paraphrase corpus (collection of paraphrase examples)
       - Few freely available resources [Dolan et al., 2004]
       - The leading indicators measure the economy... / The leading index measures the economy....

     Slide 3: Purposes of paraphrase corpus
     - Beneficial to activate the research field
     - Paraphrase corpus (our aim)
       - Deep analysis of phenomena
       - Knowledge discovery
       - Gold-standard for evaluation
       - Example-based paraphrasing
       - Paraphrase rule induction
       - Design better evaluation methods

     Slide 4: Outline
     1. Background
     2. Issues and our class-oriented approach
     3. Semi-automatic example collection
     4. Preliminary trials
        1. Specification
        2. Discussion
     5. Conclusion

     Slide 5: Building paraphrase corpus
     - Issues
       - to consider: variety, source, organization
       - to maximize: coverage, reliability, cost-efficiency
     - Previous work
       - Manual production (reliability): [Shirai et al., 2001] [Kinjo et al., 2003]
       - Automatic acquisition (cost-efficiency): [Barzilay et al., 2003] [Shinyama et al., 2003] [Shimohata et al., 2004] [Dolan et al., 2004]
       - Coverage is not ensured
       - No focus on sorts/variety of paraphrases

     Slide 6: Variety of paraphrases
     - Lexical paraphrase (→ automatic acquisition)
       - Emma burst into tears and he tried to comfort her. / Emma cried and he tried to console her.
     - Syntactic paraphrase (→ describable)
       - It was his best suit that John wore last night. / John wore his best suit last night.
     - Lexically compositional paraphrase (syntactically regular, semantically compositional)
       - Steven made an attempt to stop playing Hearts. / Steven attempted to stop playing Hearts.
       - The breeze sways the trees. / The trees sway in the breeze.
       - The room has already been warmed up. / The room is already warm.

  2. Slide 7: A class-oriented approach
     - Separately collect examples for each class
       - Paraphrase corpus = sub-corpus for class A + sub-corpus for class B + sub-corpus for class C
     - Semi-automatic example collection
       - Automatic generation + human judgment
       - Step 1: Define a paraphrase class based on morpho-syntactic transformation patterns
       - Step 2: Collect all candidates using a paraphrase engine
       - Step 3: Judge candidate paraphrases by hand

     Slide 8: Aim of this study
     - Confirm the feasibility of the method through practice
     - Given …
       - A paraphrase class
       - A text collection
     - Collect paraphrase examples belonging to the class
       - At a minimal human labor cost
       - As exhaustively as possible from the text collection
       - As reliable as humanly possible

     Slide 9: Outline
     1. Background
     2. Issues and our class-oriented approach
     3. Semi-automatic example collection
     4. Preliminary trials
        1. Specification
        2. Discussion
     5. Conclusion

     Slide 10: Semi-automatic example collection (Overview)
     [Figure: overview of the process. Elements shown: (1) Manual description, paraphrasing rules (V, V(N), X-{NOM,ACC,DAT}, N), lexical resources, text collection, paraphrase generation system, (2) Automatic generation, paraphrase candidates, (3) Evaluation and correction, judgment guidelines.]

     Slide 11: Step 1: Pattern description
     - Morpho-syntactic paraphrasing patterns
       - Pairs of dependency trees
       - Implemented on a paraphrase generation system [Takahashi et al., 2001]
     - Variables
       - X: variable for any word
       - N: variable for a noun
       - V: variable for a verb
       - V(N): verbalized form of N
     - Pattern: X, N-ACC, V ⇒ X, V(N)
     - Example (glosses of the Japanese sentences)
       - film-NOM him-DAT impression-ACC to give-ACTIVE (The film made an impression on him.)
       - film-NOM him-ACC to be impressed-CAUSATIVE (The film impressed him.)

     Slide 12: Step 3: Manual judgment (mutual judgment)
     - "Correct"-preferred judgment process
       - To reduce labor cost
     [Figure: judgment flow. A candidate paraphrase passes from the 1st annotator to the 2nd annotator and then to discussion, which refines the judgment guidelines. Labels shown: Correct, Incorrect, Deferred, Unchecked.]
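Steps 2 and 3 above can be sketched in a few lines of Python. This is only a toy illustration, not the authors' system: the slides describe patterns over Japanese dependency trees, whereas the sketch uses a surface regular expression over English. The names `NOUN_TO_VERB`, `LVC_PATTERN`, `Candidate`, `generate_candidates`, and `judge_candidates` are hypothetical, and the `judge` callback merely stands in for a human annotator.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical toy dictionary of <deverbal noun, verb> pairs (past-tense forms).
NOUN_TO_VERB: Dict[str, str] = {"attempt": "attempted", "impression": "impressed"}

# Hypothetical surface pattern standing in for a dependency-tree pattern:
# "made a/an N" is rewritten to the verbalized form V(N).
LVC_PATTERN = re.compile(r"\bmade an? (\w+)\b")


@dataclass
class Candidate:
    source: str
    paraphrase: str
    judgment: Optional[str] = None  # "correct" or "incorrect", set in Step 3


def generate_candidates(sentences: List[str]) -> List[Candidate]:
    """Step 2: apply the pattern to every sentence and keep each rewrite."""
    candidates = []
    for sentence in sentences:
        match = LVC_PATTERN.search(sentence)
        if match and match.group(1) in NOUN_TO_VERB:
            verb = NOUN_TO_VERB[match.group(1)]
            rewritten = sentence[:match.start()] + verb + sentence[match.end():]
            candidates.append(Candidate(sentence, rewritten))
    return candidates


def judge_candidates(
    candidates: List[Candidate], judge: Callable[[Candidate], str]
) -> List[Candidate]:
    """Step 3: record one judgment per candidate and return the accepted ones."""
    for candidate in candidates:
        candidate.judgment = judge(candidate)
    return [c for c in candidates if c.judgment == "correct"]


if __name__ == "__main__":
    texts = [
        "Steven made an attempt to stop playing Hearts.",
        "The film made an impression on him.",
        "John wore his best suit last night.",
    ]
    for cand in generate_candidates(texts):
        print(cand.source, "=>", cand.paraphrase)
```

The second sentence yields the ill-formed candidate "The film impressed on him.", which illustrates why automatic generation is followed by human judgment and correction.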

  3. Slide 13: Step 3: Manual judgment (I/F)
     [Figure: annotation interface. Fields are marked on the slide as Given, Obligatory, or Optional.]
     - (a) source sentence
     - (b) automatically generated paraphrase
     - (c) annotator's judgment (correct / incorrect)
     - (c) second opinion (correct / incorrect)
     - (d) error tags
     - (e) confirmed (revised) paraphrase
     - (f) free comments

     Slide 14: Outline
     1. Background
     2. Issues and our class-oriented approach
     3. Semi-automatic example collection
     4. Preliminary trials
        1. Specification
        2. Discussion
     5. Conclusion

     Slide 15: Target classes
     - Paraphrases of light-verb constructions (LVC)
       - film-NOM him-DAT impression-ACC to give-ACTIVE (The film made an impression on him.)
       - film-NOM him-ACC to be impressed-CAUSATIVE (The film impressed him.)
     - Transitivity alternation (TransAlt)
       - breeze-NOM tree-ACC to sway-Transitive (The breeze sways the trees.)
       - tree-NOM breeze-DAT to sway-Intransitive (The trees sway in the breeze.)

     Slide 16: Resources
     - LVC
       - 4 paraphrasing patterns (e.g. (7) in paper)
       - 20,155 pairs of <deverbal noun, verb>
         - <impression, to be impressed>
         - <invitation, to invite>
     - TransAlt
       - 8 paraphrasing patterns (e.g. (10) in paper)
       - 212 pairs of <intransitive verb, transitive verb>
         - <to sway-Intransitive, to sway-Transitive>
         - <to break-Intransitive, to break-Transitive>

     Slide 17: Results of trials
             Paraphrase class              LVC      TransAlt
     Step 1  # of paraphrasing patterns    4        8
             Size of dictionary            20,155   212
     Step 2  # of source sentences         10,000   25,000
             # of generated candidates     2,566    985
     Step 3  # of judged candidates        1,067    964
             # of incorrect candidates     520      503
             # of correct candidates       547      461
             # of paraphrase examples      591      484
             Working hours                 118      169.5
     Working hours: 2 annotators' working time for (1) Judgment, (2) Discussion, and (3) Re-judgment after refining guidelines

     Slide 18: Aim of this study (reminder)
     - Confirm the feasibility of the method through practice
     - Given
       - A paraphrase class
       - A text collection
     - Collect paraphrase examples belonging to the class
       - At a minimal human labor cost
       - As exhaustively as possible from the text collection
       - As reliable as humanly possible
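The counts in the results table can be sanity-checked with a short script. The sketch below copies the figures from the table; the `RESULTS` dictionary and the `summarize` helper are hypothetical names introduced here, and the two ratios it reports are derived for illustration rather than quoted from the slides.

```python
# Figures copied from the "Results of trials" table.
RESULTS = {
    "LVC": {"generated": 2566, "judged": 1067, "incorrect": 520, "correct": 547},
    "TransAlt": {"generated": 985, "judged": 964, "incorrect": 503, "correct": 461},
}


def summarize(row):
    """Return (consistent, judged_share, correct_share) for one paraphrase class."""
    # Every judged candidate is counted as either correct or incorrect.
    consistent = row["correct"] + row["incorrect"] == row["judged"]
    # Share of generated candidates that were actually judged.
    judged_share = row["judged"] / row["generated"]
    # Share of judged candidates that were accepted as correct.
    correct_share = row["correct"] / row["judged"]
    return consistent, judged_share, correct_share


if __name__ == "__main__":
    for name, row in RESULTS.items():
        consistent, judged_share, correct_share = summarize(row)
        print(f"{name}: consistent={consistent}, "
              f"judged={judged_share:.1%}, correct={correct_share:.1%}")
```

In both classes the correct and incorrect counts add up to the number of judged candidates, and roughly half of the judged candidates were accepted.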

  4. Slide 19: Cost-efficiency
     [Figure labels: LVC, TransAlt]
     - Not so wasteful human labor cost
       - 7.1 candidates / man-hour
       - 3.7 paraphrase examples / man-hour
       - TransAlt is 1.75 times more difficult than LVC due to test

     Slide 20: Exhaustiveness
     - The initial resource is not necessarily optimal
       - Paraphrasing patterns
       - Derivation pairs
     - How are they optimal?
       - Estimated coverage: 77% (158/(158+48))
       - 158 paraphrases for 750 excerpted sentences
       - Manual examination obtained another 48 paraphrases
       - 47 misses can be salvaged by resource enhancement
       - Errors of shallow parsers hurt only once
     - Use of patterns is a realistic approach
       - Manual examination ensures coverage

     Slide 21: Reliability
     - Strategy
       - Classification is based on guidelines & linguistic intuition
       - Inter-annotator discussion refined the judgment guidelines
     - Agreement ratio increased (in the case of LVC)
       - 74% (3rd day) → 77% (6th day) → 88% (9th day) → 93% (11th day)
       - It's still not easy to explain "why this is correct / incorrect"
     - Future plan
       - Involve an expert to make sure of judgment guidelines
       - Involve a 3rd annotator for judgment

     Slide 22: Conclusion
     - Feasibility of a semi-automatic example collection
       - Class-oriented example collection
       - Employing a paraphrase generation system
     - Promising results
       - Reasonable human labor cost, but it needs reduction
       - Moderately exhaustive at the initial stage
       - Typically reliable, but some marginal cases
     - Paraphrase sub-corpora consist of
       - LVC: 1067 candidates / 591 examples
       - TransAlt: 964 candidates / 484 examples

     Slide 23: Future work
     - Discussion on required expertise
       - It is not easy to explain "why this is correct / incorrect"
       - Involve an expert to make sure of judgment guidelines
     - Build sub-corpora for other paraphrase classes
     - Extrinsic evaluation through case studies
       - The resultant resource provides both correct and incorrect examples
       - Immediately available for analysis and system evaluation
     - Publicly open the resource
       - Paraphrase corpus, Lexical resources, Judgment guidelines
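The cost-efficiency and coverage figures quoted above can be re-derived from the trial counts and working hours. The slides do not state how each figure was computed, so the sketch below is one reading that happens to reproduce them: throughput is taken over both classes combined, and the 1.75 ratio is taken as working hours per paraphrase example. All variable names are introduced here for illustration.

```python
# Counts and hours copied from the trial results.
HOURS = {"LVC": 118.0, "TransAlt": 169.5}
JUDGED = {"LVC": 1067, "TransAlt": 964}
EXAMPLES = {"LVC": 591, "TransAlt": 484}

total_hours = sum(HOURS.values())

# Throughput over both classes combined (assumed reading of the slide).
candidates_per_hour = sum(JUDGED.values()) / total_hours
examples_per_hour = sum(EXAMPLES.values()) / total_hours

# Relative difficulty, read here as hours spent per paraphrase example.
hours_per_example = {name: HOURS[name] / EXAMPLES[name] for name in HOURS}
relative_difficulty = hours_per_example["TransAlt"] / hours_per_example["LVC"]

# Estimated coverage: paraphrases found by the patterns over all paraphrases
# found, including the 48 obtained only through manual examination.
coverage = 158 / (158 + 48)

if __name__ == "__main__":
    print(f"candidates / man-hour: {candidates_per_hour:.1f}")
    print(f"examples / man-hour:   {examples_per_hour:.1f}")
    print(f"TransAlt vs LVC:       {relative_difficulty:.2f}")
    print(f"estimated coverage:    {coverage:.0%}")
```

Under these assumptions the script prints 7.1, 3.7, 1.75, and 77%, matching the slides.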
