Diamonds in the Rough: Generating Fluent Sentences from Early-stage Drafts for Academic Writing Assistance



SLIDE 1

Diamonds in the Rough: Generating Fluent Sentences from Early-stage Drafts for Academic Writing Assistance

Takumi Ito1,2, Tatsuki Kuribayashi1,2, Hayato Kobayashi3,4, Ana Brassard4,1, Masato Hagiwara5, Jun Suzuki1,4 and Kentaro Inui1,4

1: Tohoku University, 2: Langsmith Inc., 3: Yahoo Japan Corporation, 4: RIKEN, 5: Octanove Labs LLC

SLIDE 2

The writing process

2019/10/29

INLG 2019

FIRST DRAFT: "Model have good results."
  ↓ Revising
"Our model show good result in this task."
"Our model shows good results in this task."
  ↓ Editing
"Our model shows a excellent perfomance in this task."
  ↓ Proofreading
FINAL VERSION: "Our model shows excellent performance in this task."

SLIDE 3

Automatic writing assistance


  • insufficient fluency
  • awkward style
  • collocation errors
  • missing words
  • grammatical errors
  • spelling errors


SLIDE 4

Automatic writing assistance


EXISTING STUDIES: Grammatical error correction (GEC)

  ✓ grammatical errors
  ✓ spelling errors
  ✗ insufficient fluency
  ✗ awkward style
  ✗ collocation errors
  ✗ missing words

SLIDE 5

Automatic writing assistance

Sentence-level revision (SentRev)

  ✓ grammatical errors
  ✓ spelling errors
  ✓ insufficient fluency
  ✓ awkward style
  ✓ collocation errors
  ✓ missing words

OUR FOCUS: sentence-level revision (SentRev) addresses all of the above, in contrast to grammatical error correction (GEC).

SLIDE 6

Proposed Task: Sentence-level Revision

• input: early-stage draft sentence
  - has errors (e.g., collocation errors)
  - has information gaps (denoted by <*>)

• output: final-version sentence
  - error-free
  - gaps correctly filled in


draft: "Our aproach idea is <*> at read patern of normal human."
  ↓ revising, editing, proofreading
final version: "The idea of our approach derives from the normal human reading pattern."

SLIDE 7

Proposed Task: Sentence-level Revision

• issue: lack of evaluation resources

SLIDE 8

Our contributions


• Created an evaluation dataset for SentRev
  - Set of Modified Incomplete TecHnical paper sentences (SMITH)
• Analyzed the characteristics of the dataset
• Established baseline scores for SentRev


SLIDE 9

Evaluation Dataset Creation

Goal: collect pairs of draft sentence and final version


draft: "Our model <*> results"
final: "Our model shows competitive results"

SLIDE 10

Evaluation Dataset Creation

Goal: collect pairs of draft sentence and final version


Straightforward approach: experts modify collected drafts into final versions.
Limitation: early-stage draft sentences are not usually publicly available.

Note: plenty of final-version sentences are publicly accessible.

SLIDE 11

Evaluation Dataset Creation

Goal: collect pairs of draft sentence and final version


Straightforward approach: experts modify collected drafts into final versions.
Our approach: create draft sentences from final-version sentences.

SLIDE 12

Crowdsourcing Protocol for Creating an Evaluation Dataset

Our approach: create draft sentences from final-version sentences (sampled from the ACL Anthology).

1. Automatically translate the final sentence into Japanese:
   "Our model shows competitive results" → 「私達のモデルは匹敵する結果を示しました。」
2. Japanese native workers translate it back into English, producing the draft:
   "Our model <*> results"

SLIDE 13

Crowdsourcing Protocol for Creating an Evaluation Dataset

Workers insert <*> wherever they cannot think of a good expression.
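The two-step protocol above can be sketched end to end. Here `mt_en_ja` and `worker_back_translation` are hypothetical stand-ins (canned lookups) for the real machine-translation system and the human crowdworkers, so the pipeline is runnable:

```python
def mt_en_ja(sentence: str) -> str:
    """Step 1: automatic EN->JA translation (canned stand-in for a real MT system)."""
    canned = {
        "Our model shows competitive results":
            "私達のモデルは匹敵する結果を示しました。",
    }
    return canned[sentence]

def worker_back_translation(ja_sentence: str) -> str:
    """Step 2: a Japanese native worker translates back into English,
    inserting <*> where no good expression comes to mind (canned stand-in)."""
    canned = {
        "私達のモデルは匹敵する結果を示しました。": "Our model <*> results",
    }
    return canned[ja_sentence]

def make_pair(final_sentence: str) -> tuple:
    """Produce one (draft, final) pair for the evaluation dataset."""
    draft = worker_back_translation(mt_en_ja(final_sentence))
    return (draft, final_sentence)

pair = make_pair("Our model shows competitive results")
```

The draft side inherits exactly the kinds of noise the task targets: dropped content and a <*> gap.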

SLIDE 14

Statistics


Dataset   size   w/<*>   w/change   Levenshtein distance
Lang-8    2.1M   –       42%        3.5
AESW      1.2M   –       39%        4.8
JFLEG     1.5K   –       86%        12.4
SMITH     10K    33%     99%        47.0

• collected 10,804 pairs
• SMITH simulates significant editing
• larger Levenshtein distance ⇨ more drastic editing

w/<*>: percentage of source sentences containing <*>
w/change: percentage of pairs where the source and target sentences differ
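The Levenshtein-distance column can be reproduced with the standard dynamic-programming recurrence. A minimal sketch, assuming character-level edits (the slide does not say whether the reported averages are character- or token-level):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))           # distances for the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                 # delete ca
                curr[j - 1] + 1,             # insert cb
                prev[j - 1] + (ca != cb),    # substitute ca -> cb
            ))
        prev = curr
    return prev[-1]

dist = levenshtein("Our model <*> results",
                   "Our model shows competitive results")
```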

SLIDE 15

Examples of SMITH

(1) Wording problems
draft: "I research the rate of workable SQL <*> at the generated result."
final: "We study the percentage of executable SQL queries in the generated results."

(2) Information gaps
draft: "For <*>, we used Adam using weight decay and gradient clipping."
final: "We used Adam with a weight decay and gradient clipping for optimization."

(3) Spelling and grammatical errors
draft: "In the model aechitecture, as shown in Figure 1, it is based an AE and GAN."
final: "The model architecture, as illustrated in figure 1, is based on the AE and GAN."


SLIDE 21

Experiments


draft: "many study <*> in grammar error correction"
final version: "A great deal of research has been carried out in grammar error correction."

Baseline models:
• built baseline revision models (draft ⇨ final version)
  - training data: synthetic data generated with noising methods
• evaluated their performance on SMITH
  - using various reference-based and reference-less evaluation metrics
SLIDE 22

Noising and Denoising


Noising: automatically generate drafts from final versions (sampled from the ACL Anthology).

final version: "A great deal of research has been carried out in grammar error correction."
  ↓ noising methods
draft: "many study <*> in grammar error correction"

SLIDE 23

Noising and Denoising


Denoising: generate final versions from the drafts.

draft: "many study <*> in grammar error correction"
  ↓ denoising models (baseline models)
final version: "A great deal of research has been carried out in grammar error correction."

SLIDE 24

Noising methods

Noising methods turn final versions into drafts:

• Grammatical error generation
  "it is not surprising that the random policy has the worst performance."
  → "it is not surprisingly that the random policy have the worst performing."
• Style removal
  "we observe a similar trend on larger datasets."
  → "we see the same on larger data."
• Entailed sentence generation
  "Figure 2 illustrates the effectiveness of different features."
  → "Figure 2 illustrates effectiveness"
• Heuristic
  "lower perplexity indicates a better model."
  → "perplexity indicates a <*> model."

SLIDE 25

Noising methods: grammatical error generation

Train an Enc-Dec noising model (clean ⇨ erroneous) using Lang-8 [Mizumoto+ 11], AESW [Daudaravicius+ 15], and JFLEG [Napoles+ 17].

SLIDE 26

Noising methods: style removal

Train an Enc-Dec noising model (academic ⇨ non-academic) using the ParaNMT-50M dataset [Wieting+ 18].

SLIDE 27

Noising methods: entailed sentence generation

Train an Enc-Dec noising model (⇨ entailed sentence) using SNLI [Bowman+ 15] and MultiNLI [Williams+ 18].

SLIDE 28

Noising methods: heuristic

Heuristic noising rules: randomly deleting tokens, replacing them with <*> or common terms, and swapping them.
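The heuristic rules can be sketched as follows; the per-rule probability `p` and the `COMMON_TERMS` list are illustrative assumptions, not the paper's actual settings:

```python
import random

COMMON_TERMS = ["model", "result", "method"]  # hypothetical common-term list

def heuristic_noise(tokens, p=0.15, rng=None):
    """Noise a token sequence: randomly delete tokens, replace tokens
    with <*> or a common term, and swap one adjacent pair."""
    rng = rng or random.Random()
    out = []
    for tok in tokens:
        r = rng.random()
        if r < p:                             # random deletion
            continue
        if r < 2 * p:                         # replacement with <*> or a common term
            out.append(rng.choice(["<*>"] + COMMON_TERMS))
        else:
            out.append(tok)
    if len(out) > 1 and rng.random() < p:     # random adjacent swap
        i = rng.randrange(len(out) - 1)
        out[i], out[i + 1] = out[i + 1], out[i]
    return out

final = "lower perplexity indicates a better model .".split()
draft = heuristic_noise(final, rng=random.Random(0))
```

Seeding the generator makes the synthetic corpus reproducible across runs.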

SLIDE 29

Baseline models


• Noising-and-denoising models
  - Heuristic noising and denoising model (H-ND):
    rule-based heuristic noising (e.g., random token replacement)
  - Enc-Dec noising and denoising model (ED-ND):
    rule-based heuristic noising + trained error-generation models (e.g., grammatical error generation)
• SOTA GEC model [Zhao+ 19]

Baseline models revise drafts into final versions:
draft: "many study <*> in grammar error correction"
final version: "A great deal of research has been carried out in grammar error correction."

SLIDE 30

Experiment settings


• Noising and denoising model architecture
  - Transformer [Vaswani+ 17]
  - Optimizer: Adam (learning rate 0.0005, β₁ = 0.9, β₂ = 0.98, ε = 10⁻⁹)

• Evaluation metrics
  - BLEU
  - ROUGE-L
  - F0.5
  - BERTScore [Zhang+ 19]
  - Grammaticality score [Napoles+ 16]: 1 − (#errors in sent / #tokens in sent)
  - Perplexity (PPL): 5-gram LM trained on ACL Anthology papers
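Two of these metrics are simple enough to sketch directly; the inputs below are toy values (in practice the error count comes from a grammar checker and precision/recall from aligned edits). As a sanity check, the GEC row of the results table (P = 22.2, R = 6.2) gives F0.5 ≈ 14.6.

```python
def grammaticality(num_errors: int, num_tokens: int) -> float:
    """Grammaticality score [Napoles+ 16]: 1 - (#errors in sent / #tokens in sent)."""
    return 1.0 - num_errors / num_tokens

def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score; beta = 0.5 emphasizes precision over recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

g = grammaticality(num_errors=1, num_tokens=8)   # 1 - 1/8 = 0.875
f = f_beta(precision=0.222, recall=0.062)        # ~0.146
```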
SLIDE 31

Results


Model      BLEU   ROUGE-L  BERT-P  BERT-R  BERT-F  P     R     F0.5  Gramm.  PPL
Draft      9.8    46.8     75.9    78.2    77.0    –     –     –     92.9    1454
H-ND       8.2    45.0     77.0    76.1    76.5    5.4   2.9   4.6   94.1    406
ED-ND      15.4   51.1     80.9    80.0    80.4    21.8  12.8  19.2  96.3    236
GEC        11.9   49.0     80.8    79.1    79.9    22.2  6.2   14.6  96.7    414
Reference  –      –        –       –       –       –     –     –     96.5    147

• The ED-ND model outperforms the other models
  - the ED-ND noising methods induced noise closer to real-world drafts
• The SOTA GEC model showed higher precision but low recall
  - the GEC model is conservative
SLIDE 32

Examples of the baseline models’ output


Draft:     Yhe input and output <*> are one - hot encoding of the center word and the context word , <*> .
H-ND:      The input and output are one - hot encoding of the center word and the context word , respectively .
ED-ND:     The input and output layers are one - hot encoding of the center word and the context word , respectively .
GEC:       Yhe input and output are one - hot encoding of the center word and the context word , .
Reference: The input and output layers are center word and context word one - hot encodings , respectively .

The ED-ND model replaced the <*> tokens with plausible words.

SLIDE 33

Analysis: error types of drafts in SMITH & training data

[Bar chart (%): error-type distribution of drafts in SMITH vs. drafts in the synthetic training data]

Similar error-type distribution.

SLIDE 34

Conclusions

• proposed the SentRev task
  - input: an incomplete, rough draft sentence
  - output: a more fluent, complete sentence in the academic domain
• created the SMITH dataset with crowdsourcing for development and evaluation of this task
  - available at https://github.com/taku-ito/INLG2019_SentRev
• established baseline performance with a synthetic training dataset
  - training dataset available at the same link as above


SLIDE 35

Appendix


SLIDE 36

Criteria for evaluating crowdworkers


Criteria (judgment or score adjustment):
- Working time is too short (< 2 minutes): Reject
- All answers are too short (< 4 words): Reject
- No answer ends with "." or "?": Reject
- Contains identical answers: Reject
- Some answers contain Japanese words: Reject
- No answer is recognized as English: Reject
- Too close to the automatic translation (L.D. <= 10): Reject
- Some answers are too short (< 4 words): −2 points
- Some answers use fewer than 4 kinds of words: −2 points
- Too close to the automatic translation (20 <= L.D. <= 30): −0.5 points/answer
- Too close to the automatic translation (10 <= L.D. <= 20): −1.5 points/answer
- All answers end with "." or "?": +1 point
- Some answers contain <*>: +1 point
- All answers are recognized as English: +1 point

We filtered the crowdworkers' answers using these criteria and accepted answers with a score of 0 or higher.
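The filtering above can be sketched as a scoring function. This is a simplified reading: unmarked point values are treated as penalties (consistent with accepting scores of 0 or higher), overlapping L.D. bands are resolved arbitrarily, and the language-identification checks are omitted:

```python
def judge_submission(answers, working_minutes, lds):
    """Return "reject" or a numeric score for one worker's batch.
    `lds` holds each answer's Levenshtein distance to the automatic
    translation; accept when the returned score is >= 0."""
    too_short = [len(a.split()) < 4 for a in answers]
    if working_minutes < 2:
        return "reject"                      # working time too short
    if all(too_short):
        return "reject"                      # all answers too short
    if not any(a.endswith((".", "?")) for a in answers):
        return "reject"                      # no answer ends with "." or "?"
    if len(set(answers)) < len(answers):
        return "reject"                      # identical answers
    if any(ld <= 10 for ld in lds):
        return "reject"                      # too close to the automatic translation
    score = 0.0
    if any(too_short):
        score -= 2
    if any(len(set(a.split())) < 4 for a in answers):
        score -= 2                           # fewer than 4 kinds of words
    for ld in lds:
        if 20 <= ld <= 30:
            score -= 0.5
        elif ld < 20:                        # 10 < ld < 20 given the reject above
            score -= 1.5
    if all(a.endswith((".", "?")) for a in answers):
        score += 1
    if any("<*>" in a for a in answers):
        score += 1
    return score

verdict = judge_submission(
    ["Our model <*> shows results .", "We study the percentage ."],
    working_minutes=5, lds=[40, 45])
```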

SLIDE 37

Comparison of the top 10 frequent errors observed in the 3 datasets

[Bar chart (%): top-10 error types in SMITH, JFLEG, and AESW]

SMITH included more "OTHER" errors than the other two datasets.

SLIDE 38

Examples of “OTHER” in SMITH

Draft: "the best models are very effective on the condition that they are far greater than human." (OTHER)
Reference: "The best models are very effective in the local context condition where they significantly outperform humans."

Draft: "Results show MARM tend to generate <*> and very short responces." (OTHER)
Reference: "The results indicate that MARM tends to generate specific but very short responses."

SMITH emphasizes a "completion-type" task setting for writing assistance.