
Fact-based Text Editing. Hayate Iso, Chao Qiao, Hang Li


  1. Fact-based Text Editing. Hayate Iso, Chao Qiao, Hang Li

  2. The status quo of Text Editing
     ‣ A model, p(y | x), learns how to edit the input x into the desired output y.
     Style Transfer
       x = “This is the worst game!”
       y = “This is the best game!”
     Simplification
       x = “Last year, I read the book that is authored by Jane”
       y = “Jane wrote a book. I read it last year”
     Grammatical Error Correction
       x = “Fish firming uses the lots of specials”
       y = “Fish firming uses a lot of specials”

  3. What is Fact-based Text Editing?
     • The goal of fact-based text editing is to revise a given document to better describe the facts in a knowledge base (e.g., several triples).
     Set of triples
       { (Baymax, creator, Duncan Rouleau),
         (Duncan Rouleau, nationality, American),
         (Baymax, creator, Steven T. Seagle),
         (Steven T. Seagle, nationality, American),
         (Baymax, series, Big Hero 6),
         (Big Hero 6, starring, Scott Adsit) }
     Draft text
       Baymax was created by Duncan Rouleau, a winner of Eagle Award. Baymax is a character in Big Hero 6.
     Revised text
       Baymax was created by American creators Duncan Rouleau and Steven T. Seagle. Baymax is a character in Big Hero 6 which stars Scott Adsit.

  4. Overview of this research
     • Data Creation: We have proposed a data construction method for fact-based text editing and created two datasets.
     • Fact-based Text Editing model: We have proposed a model for fact-based text editing, which performs the task by generating a sequence of actions instead of words.

  5. Data Creation: Factual Masking
     • For every table-to-text pair in the training data, we create a template by factual masking.
     Original pair
       Τ = {(Baymax, voice, Scott_Adsit)}
       x = “Scott_Adsit does the voice for Baymax”
     After masking
       Τ’ = {(AGENT-1, voice, PATIENT-1)}
       x’ = “PATIENT-1 does the voice for AGENT-1”
     The masked text x’ is stored in the set of templates for Τ’.
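The masking step can be illustrated with a small sketch. This is not the authors' implementation: the function name `mask_facts`, the whitespace tokenization, and the rule for choosing a role (an entity that is both a subject and an object becomes BRIDGE, a subject-only entity becomes AGENT, anything else PATIENT) are assumptions made for illustration, and the numbering will not necessarily match the placeholders shown on later slides.

```python
def mask_facts(triples, text):
    """Replace entity mentions in triples and text with role placeholders.

    Assumed role rule (hypothetical): BRIDGE if the entity is both a subject
    and an object, AGENT if it is only a subject, PATIENT otherwise.
    """
    subjects = {s for s, _, _ in triples}
    objects = {o for _, _, o in triples}
    mapping, counters = {}, {}

    def placeholder(entity):
        # Assign a new numbered placeholder the first time an entity is seen.
        if entity not in mapping:
            if entity in subjects and entity in objects:
                role = "BRIDGE"
            elif entity in subjects:
                role = "AGENT"
            else:
                role = "PATIENT"
            counters[role] = counters.get(role, 0) + 1
            mapping[entity] = f"{role}-{counters[role]}"
        return mapping[entity]

    masked_triples = [(placeholder(s), p, placeholder(o)) for s, p, o in triples]
    # Entities are assumed to appear in the text as single tokens.
    masked_text = " ".join(mapping.get(tok, tok) for tok in text.split())
    return masked_triples, masked_text, mapping


triples = [("Baymax", "voice", "Scott_Adsit")]
masked_triples, masked_text, mapping = mask_facts(
    triples, "Scott_Adsit does the voice for Baymax"
)
print(masked_triples)  # [('AGENT-1', 'voice', 'PATIENT-1')]
print(masked_text)     # PATIENT-1 does the voice for AGENT-1
```

The returned `mapping` is kept so that the placeholders can later be turned back into entities.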

  6. Data Creation: Retrieve LCS matched template
     Masked target instance
       Τ’ = {(AGENT-1, occupation, PATIENT-3),
             (AGENT-1, was_a_crew_member_of, BRIDGE-1),
             (BRIDGE-1, operator, PATIENT-2)}
       y’ = AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission that was operated by PATIENT-2 .
     Retrieve, from the set of templates for
       {(AGENT-1, occupation, PATIENT-3),
        (AGENT-1, was_a_crew_member_of, BRIDGE-1)},
     the template with the best longest-common-subsequence (LCS) match to y’:
       x̂’ = AGENT-1 served as PATIENT-3 was a crew member of the BRIDGE-1 mission.
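A minimal sketch of LCS-based retrieval follows. It assumes the candidate templates have already been restricted to those stored for the chosen subset of triples; `lcs_length` and `retrieve_template` are hypothetical helper names, and the second candidate template is invented purely for the example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        cur = [0]
        for j, tok_b in enumerate(b, 1):
            if tok_a == tok_b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def retrieve_template(target, candidates):
    """Return the candidate template whose tokens share the longest LCS with the target."""
    target_tokens = target.split()
    return max(candidates, key=lambda c: lcs_length(c.split(), target_tokens))


y_masked = ("AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission "
            "that was operated by PATIENT-2 .")
candidates = [
    "AGENT-1 served as PATIENT-3 was a crew member of the BRIDGE-1 mission .",
    "PATIENT-3 was the job of AGENT-1 .",  # invented distractor template
]
print(retrieve_template(y_masked, candidates))
```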

  7. Data Creation: Token Alignment
       x̂’ = AGENT-1 served as PATIENT-3 was a crew member of the BRIDGE-1 mission .
       y’ = AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission that was operated by PATIENT-2 .
     [Figure: tokens of x̂’ aligned with tokens of y’; a span of y’ is marked “To delete”.]

  8. Data Creation: Delete Substring
       x̂’ = AGENT-1 served as PATIENT-3 was a crew member of the BRIDGE-1 mission .
       y’ = AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission that was operated by PATIENT-2 .
     Starting from a copy of y’, segments are labeled Keep, Keep, Delete; the segment marked “To delete” (“that was operated by PATIENT-2”) is removed:
       x’ = AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission.
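The alignment and deletion steps can be sketched as follows. The slides do not spell out the exact deletion rule, so this sketch assumes one: align y’ with the retrieved template using Python's `difflib.SequenceMatcher` (a stand-in for whatever alignment the authors use), and delete every unaligned span of y’ that mentions a placeholder absent from the retrieved template. The function name `delete_unsupported_spans` is hypothetical.

```python
import re
from difflib import SequenceMatcher

# Tokens such as AGENT-1, PATIENT-3 or BRIDGE-1 are treated as placeholders.
PLACEHOLDER = re.compile(r"^(AGENT|PATIENT|BRIDGE)-\d+$")


def delete_unsupported_spans(y_masked, x_hat):
    """Drop unaligned spans of y_masked that mention placeholders unknown to x_hat."""
    y, x = y_masked.split(), x_hat.split()
    known = {tok for tok in x if PLACEHOLDER.match(tok)}
    kept = []
    matcher = SequenceMatcher(None, y, x, autojunk=False)
    for tag, i1, i2, _, _ in matcher.get_opcodes():
        span = y[i1:i2]
        introduces_new_fact = any(
            PLACEHOLDER.match(tok) and tok not in known for tok in span
        )
        if tag != "equal" and introduces_new_fact:
            continue  # assumed rule: this span describes a fact outside the subset
        kept.extend(span)
    return " ".join(kept)


y_masked = ("AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission "
            "that was operated by PATIENT-2 .")
x_hat = "AGENT-1 served as PATIENT-3 was a crew member of the BRIDGE-1 mission ."
print(delete_unsupported_spans(y_masked, x_hat))
# AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission .
```

Under this assumed rule, unaligned words such as “performed” and “on” survive because they introduce no new placeholder, which agrees with the result shown on the slide.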

  9. Data Creation: Fact Unmasking
     • Recover the factual information using the original facts, Τ.
       x’ = AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission.
       Τ = {(Alan_Bean, occupation, Test_pilot),
            (Alan_Bean, was a crew member of, Apollo_12),
            (Apollo_12, operator, NASA)}
     Unmask:
       x = Alan_Bean performed as Test_pilot on Apollo_12 mission.
     Fact-based Text Editing instance
       Τ = {(Alan_Bean, occupation, Test_pilot),
            (Alan_Bean, was a crew member of, Apollo_12),
            (Apollo_12, operator, NASA)}
       x = Alan_Bean performed as Test_pilot on Apollo_12 mission.
       y = Alan_Bean performed as Test_pilot on Apollo_12 mission that was operated by NASA .
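Unmasking is the inverse of factual masking. The sketch below assumes a placeholder-to-entity dictionary is available from the masking step; the helper name `unmask` and the dictionary layout of the final instance are illustrative assumptions, not the format used in the released data.

```python
def unmask(masked_text, placeholder_to_entity):
    """Substitute each placeholder token with the entity it stands for."""
    return " ".join(
        placeholder_to_entity.get(tok, tok) for tok in masked_text.split()
    )


placeholder_to_entity = {
    "AGENT-1": "Alan_Bean",
    "PATIENT-3": "Test_pilot",
    "BRIDGE-1": "Apollo_12",
    "PATIENT-2": "NASA",
}
triples = [
    ("Alan_Bean", "occupation", "Test_pilot"),
    ("Alan_Bean", "was a crew member of", "Apollo_12"),
    ("Apollo_12", "operator", "NASA"),
]

# One editing instance: the triples, the draft x, and the revised text y.
instance = {
    "triples": triples,
    "draft": unmask(
        "AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission .",
        placeholder_to_entity,
    ),
    "revised": unmask(
        "AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission "
        "that was operated by PATIENT-2 .",
        placeholder_to_entity,
    ),
}
print(instance["draft"])
print(instance["revised"])
```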

  10. Data Creation: Statistics
      • We applied our data creation method to two publicly available datasets, WebNLG (Gardent et al., 2017) and RotoWire (Wiseman et al., 2017), to create the fact-based text editing datasets WebEdit and RotoEdit.

                 WebEdit                   RotoEdit
                 Train    Valid    Test    Train    Valid    Test
        #D       181k     23k      29k     27k      5.3k     4.9k
        #W_d     4.1M     495k     624k    4.7M     904k     839k
        #W_r     4.2M     525k     649k    5.6M     1.1M     1.0M
        #S       403k     49k      62k     209k     40k      36k

      https://github.com/isomap/factedit

  11. How to model Fact-based Text Editing?
      • A natural choice is an encoder-decoder model with attention and copy that generates the revised text from scratch.
      ✘ Unnecessary word replacement could happen.
      ✘ Inefficient for long inputs and outputs.
      [Figure: a table encoder reads the triples T and a text encoder reads the draft x; a decoder with attention and copy generates the revised text y.]

  12. Approach: Editing through Tagging
      • Instead of generating words from scratch, the model just predicts predefined actions.
      ✓ The model focuses only on the explicit edits.
      ✓ Robust to the length of the input and output.
      Draft text x:      Bakewell pudding is Dessert that can be served Warm or cold .
      Revised text y:    Bakewell pudding is Dessert that originates from Derbyshire Dales .
      Action sequence a: Keep Keep Keep Keep Gen(originates) Gen(from) Gen(Derbyshire Dales) Drop Drop Drop Drop Keep
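To make the action sequence concrete, the sketch below derives Keep / Gen / Drop actions from a draft and a revised token list. It uses `difflib.SequenceMatcher` as a stand-in aligner, which is an assumption and not necessarily how the authors obtain reference actions. Entity mentions are assumed to be single tokens (for example `Bakewell_pudding` and `Warm_or_cold`), which is why the slide's example yields four Keep and four Drop actions.

```python
from difflib import SequenceMatcher


def derive_actions(draft, revised):
    """Turn a draft/revised token pair into a Keep / Gen / Drop action list."""
    actions = []
    matcher = SequenceMatcher(None, draft, revised, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            actions += [("Keep", None)] * (i2 - i1)
        else:
            # Generate the new words first, then drop the replaced draft words.
            actions += [("Gen", word) for word in revised[j1:j2]]
            actions += [("Drop", None)] * (i2 - i1)
    return actions


draft = ["Bakewell_pudding", "is", "Dessert", "that",
         "can", "be", "served", "Warm_or_cold", "."]
revised = ["Bakewell_pudding", "is", "Dessert", "that",
           "originates", "from", "Derbyshire_Dales", "."]

for name, word in derive_actions(draft, revised):
    print(name if word is None else f"{name}({word})")
```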

  13. A running example: Keep
      [Figure: the stream, the buffer holding the draft tokens, and the triples {(Bakewell_pudding, region, Derbyshire_Dales), (Bakewell_pudding, course, Dessert)}. Predicted action: Keep.]

  14. A running example: Keep
      [Figure: executing Keep pops the top token from the buffer and pushes it onto the stream.]

  15. A running example: Gen
      [Figure: same layout. Predicted action: Gen (originates).]

  16. A running example: Gen
      [Figure: executing Gen pushes the embedding of the generated word onto the stream; the buffer is unchanged.]

  17. A running example: Drop
      [Figure: same layout, after “originates from Derbyshire_Dales” has been generated. Predicted action: Drop.]

  18. A running example: Drop
      [Figure: executing Drop pops the top token from the buffer; nothing is pushed onto the stream.]
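The push and pop behavior in the running example can be traced with a small simulation. This is a token-level sketch only: in the model the stream and buffer hold learned representations, whereas here they hold plain strings, and the function name `run_actions` is a hypothetical choice.

```python
from collections import deque


def run_actions(draft_tokens, actions):
    """Execute Keep / Gen / Drop actions over a buffer and return the stream."""
    buffer = deque(draft_tokens)  # draft tokens still to be processed
    stream = []                   # revised text produced so far
    for name, word in actions:
        if name == "Keep":
            stream.append(buffer.popleft())  # pop from buffer, push to stream
        elif name == "Drop":
            buffer.popleft()                 # pop from buffer, push nothing
        elif name == "Gen":
            stream.append(word)              # push generated word, buffer unchanged
        else:
            raise ValueError(f"unknown action: {name}")
    if buffer:
        raise ValueError("actions ended before the buffer was empty")
    return stream


draft = ["Bakewell_pudding", "is", "Dessert", "that",
         "can", "be", "served", "Warm_or_cold", "."]
actions = (
    [("Keep", None)] * 4
    + [("Gen", "originates"), ("Gen", "from"), ("Gen", "Derbyshire_Dales")]
    + [("Drop", None)] * 4
    + [("Keep", None)]
)
print(" ".join(run_actions(draft, actions)))
# Bakewell_pudding is Dessert that originates from Derbyshire_Dales .
```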

  19. Experimental Results
      • The proposed model, FactEditor, shows generally better performance.
      [Tables: results on WebEdit and RotoEdit.]
      Further results are in the paper.

  20. Examples
      Set of triples
        { (Ardmore Airport, runwayLength, 1411.0),
          (Ardmore Airport, 3rd runway SurfaceType, Poaceae),
          (Ardmore Airport, operatingOrganisation, Civil Aviation Authority of New Zealand),
          (Ardmore Airport, elevationAboveTheSeaLevel, 34.0),
          (Ardmore Airport, runwayName, 03R/21L) }
      Draft text
        Ardmore Airport , ICAO Location Identifier UTAA . Ardmore Airport 3rd runway is made of Poaceae and Ardmore Airport . 03R/21L is 1411.0 m long and Ardmore Airport is 34.0 above sea level .
      Revised text
        Ardmore Airport is operated by Civil Aviation Authority of New Zealand . Ardmore Airport 3rd runway is made of Poaceae and Ardmore Airport name is 03R/21L . 03R/21L is 1411.0 m long and Ardmore Airport is 34.0 above sea level .
      EncDecEditor
        Ardmore Airport , ICAO Location Identifier UTAA , is operated by Civil Aviation Authority of New Zealand . Ardmore Airport 3rd runway is made of Poaceae and Ardmore Airport . 03R/21L is 1411.0 m long and Ardmore Airport is 34.0 m long .
      FactEditor
        Ardmore Airport is operated by Civil Aviation Authority of New Zealand . Ardmore Airport 3rd runway is made of Poaceae and Ardmore Airport . 03R/21L is 1411.0 m long and Ardmore Airport is 34.0 above sea level .

                                   EncDecEditor   FactEditor
        Fluency                    ☺              ☺
        Adequacy                   ☹              ☺
        Unnecessary paraphrasing   ☹              ☺

  21. Runtime analysis
      • FactEditor shows the second-fastest inference performance, behind Table-to-Text.
      • It is roughly three times faster than EncDecEditor on the RotoEdit dataset (1,412 vs. 505).

                        WebEdit   RotoEdit
        Table-to-Text   4,083     1,834
        Text-to-Text    2,751     581
        EncDecEditor    2,487     505
        FactEditor      3,295     1,412

  22. Summary
      • We introduced a new task, Fact-based Text Editing.
      • We have proposed a data construction method for fact-based text editing and created two datasets.
      • We have proposed a model for fact-based text editing, which performs the task by generating a sequence of actions.
      Code & Data available at https://github.com/isomap/factedit
