Learning Simplifications for Specific Target Audiences
Carolina Scarton and Lucia Specia
{c.scarton, l.specia}@sheffield.ac.uk
ACL 2018, Melbourne, Australia

Text Simplification
Original: If the trend continues, the researchers say, some of the rarer amphibians could disappear in as few as six years from roughly half the sites where they're now found, while the more common species could see similar declines in 26 years.
Simplified: If the trend continues, some of the rarer amphibians could be gone from roughly half the sites where they are now found in as few as six years. More common species could see similar declines in 26 years.
◮ For a specific target audience, e.g. non-native speakers
◮ For improving NLP tasks, e.g. MT

Newsela Corpus
◮ Wikipedia – Simple Wikipedia (W–SW)
  ◮ rather small
  ◮ not professionally simplified
  ◮ no defined target audience
◮ Newsela (version 2016-01-29.1)
  ◮ simplified versions target different grade levels in the US
  ◮ professionally simplified
◮ Automatic sentence-level alignments
  ◮ Identical (146,251)
  ◮ Many-to-one (merge) (24,661)
  ◮ One-to-many (split) (121,582)
  ◮ Elaboration (258,150)
◮ Newsela: ≈ 550K sentence pairs (≈ 280K for W–SW)
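As a quick consistency check, the four alignment counts listed on this slide can be summed to confirm the ≈ 550K figure. The sketch below uses only the numbers given above; the variable names are illustrative and not part of the original work.

```python
# Alignment counts as listed on the slide (Newsela version 2016-01-29.1).
alignment_counts = {
    "identical": 146_251,
    "merge": 24_661,
    "split": 121_582,
    "elaboration": 258_150,
}

# Total number of aligned sentence pairs.
total_pairs = sum(alignment_counts.values())

# Share of each alignment type, as a percentage of all pairs.
shares = {
    name: 100 * count / total_pairs
    for name, count in alignment_counts.items()
}

print(total_pairs)
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

The total comes to 550,644 pairs, which matches the ≈ 550K reported for Newsela.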

Sequence-to-Sequence TS
◮ Sequence-to-Sequence: state-of-the-art for other text-to-text transformation tasks
  ◮ NTS [Nisioi et al., 2017] → state-of-the-art on W–SW
◮ Previous work disregards specificities of different audiences
◮ Google’s multilingual NMT approach [Johnson et al., 2017]: artificial token to guide the encoder
  <2es> How are you? → Cómo estás?
◮ Our approach: artificial token representing the grade level of the target sentence
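The artificial-token idea can be sketched as a small preprocessing step that prefixes each source sentence with the grade level of its target. This is a minimal illustration, not the authors' code: the function name `to_training_pairs` and the tuple layout are assumptions, and the example sentences are the ones used elsewhere in this deck.

```python
from typing import List, Tuple

SOURCE = "dusty handprints stood out against the rust of the fence near Sasabe."

# Hypothetical aligned examples: (source, simplified target, grade of the target).
aligned_examples: List[Tuple[str, str, int]] = [
    (SOURCE, "dusty handprints could be seen on the fence near Sasabe.", 2),
    (SOURCE, SOURCE, 4),
]


def to_training_pairs(examples: List[Tuple[str, str, int]]) -> List[Tuple[str, str]]:
    """Prefix each source sentence with an artificial <grade> token."""
    return [
        (f"<{grade}> {source}", target)
        for source, target, grade in examples
    ]


training_pairs = to_training_pairs(aligned_examples)
for tagged_source, target in training_pairs:
    print(tagged_source, "->", target)
```

Because the token differs, the same source sentence yields two distinct training inputs, one per target grade.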

TS for Different Grade Levels
◮ Advantages:
  ◮ More adequate simplifications for audiences with different educational levels
  ◮ Real-world scenario → grade level is given by the end-user
  ◮ Robust to repetitions of source sentences
<2> dusty handprints stood out against the rust of the fence near Sasabe. → dusty handprints could be seen on the fence near Sasabe.
<4> dusty handprints stood out against the rust of the fence near Sasabe. → dusty handprints stood out against the rust of the fence near Sasabe.

Simplification Operations Information
◮ Sentence-level alignments → coarse-grained operations
  ◮ Identical, Elaborate, Split, Merge
  <elaboration> dusty handprints stood out against the rust of the fence near Sasabe. → dusty handprints could be seen on the fence near Sasabe.
◮ Problem: not available at test time
◮ Simplification operations classification
  ◮ four-class classifier → Naive Bayes with nine features
  ◮ Accuracy: 0.51
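One way to read "sentence-level alignments → coarse-grained operations" is as a mapping from the shape of an alignment to one of the four labels. The sketch below is an assumption about how such gold labels could be derived, not the authors' procedure: the function `operation_label` is hypothetical, and treating every non-identical one-to-one pair as an elaboration is a simplifying choice.

```python
from typing import Sequence


def operation_label(source_sentences: Sequence[str],
                    target_sentences: Sequence[str]) -> str:
    """Map one alignment to a coarse-grained operation label (assumed rules)."""
    n_src, n_tgt = len(source_sentences), len(target_sentences)
    if n_src == 0 or n_tgt == 0:
        raise ValueError("an alignment needs at least one sentence on each side")
    if n_src > 1 and n_tgt == 1:
        return "merge"        # many-to-one
    if n_src == 1 and n_tgt > 1:
        return "split"        # one-to-many
    if n_src == 1 and n_tgt == 1:
        # One-to-one: unchanged text is "identical", anything else "elaboration".
        if source_sentences[0] == target_sentences[0]:
            return "identical"
        return "elaboration"
    raise ValueError("many-to-many alignments are not covered by this sketch")


label = operation_label(
    ["dusty handprints stood out against the rust of the fence near Sasabe."],
    ["dusty handprints could be seen on the fence near Sasabe."],
)
print(label)
```

Labels of this kind exist only when the target side is known, which is why a classifier is needed to predict them at test time.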

Experiments and Results
◮ NMT approach → default OpenNMT

Experiments and Results
◮ NTS (w2v): no artificial tokens
◮ Our models:
  ◮ s2s (baseline): no artificial tokens
  ◮ s2s+to-grade → <2>
  ◮ s2s+operation (pred/gold) → <elaboration>
  ◮ s2s+to-grade+operation (pred/gold) → <2-elaboration>

Model                            BLEU ↑   SARI ↑   Flesch ↑
NTS                              61.60    33.40    79.95
s2s                              61.78    33.72    79.86
s2s+to-grade                     62.91    41.04    82.91
s2s+operation (pred)             59.83    37.36    84.96
s2s+to-grade+operation (pred)    61.48    40.56    83.11
s2s+operation (gold)             63.24    41.81    84.47
s2s+to-grade+operation (gold)    64.78    45.41    85.44
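The four input configurations differ only in the artificial token placed before the source sentence. The following sketch shows one possible way to build those tokens in the formats listed above; `build_token` and `tag_source` are hypothetical helpers introduced for illustration.

```python
from typing import Optional


def build_token(grade: Optional[int] = None,
                operation: Optional[str] = None) -> str:
    """Return "", "<2>", "<elaboration>" or "<2-elaboration>" style tokens."""
    parts = [str(part) for part in (grade, operation) if part is not None]
    return "<" + "-".join(parts) + ">" if parts else ""


def tag_source(source: str,
               grade: Optional[int] = None,
               operation: Optional[str] = None) -> str:
    """Prefix the source with the token, or leave it untouched for the baseline."""
    token = build_token(grade, operation)
    return f"{token} {source}" if token else source


source = "dusty handprints stood out against the rust of the fence near Sasabe."
variants = {
    "s2s": tag_source(source),
    "s2s+to-grade": tag_source(source, grade=2),
    "s2s+operation": tag_source(source, operation="elaboration"),
    "s2s+to-grade+operation": tag_source(source, grade=2, operation="elaboration"),
}
for name, tagged in variants.items():
    print(f"{name}: {tagged}")
```

For the (pred) variants the operation would come from the classifier's prediction, and for the (gold) variants from the alignment itself; the token format is the same in both cases.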
