SLIDE 1

Learning Simplifications for Specific Target Audiences

Carolina Scarton and Lucia Specia

{c.scarton, l.specia}@sheffield.ac.uk

ACL 2018, Melbourne, Australia

1 / 14

SLIDE 4

Text Simplification

Original: If the trend continues, the researchers say, some of the rarer amphibians could disappear in as few as six years from roughly half the sites where they're now found, while the more common species could see similar declines in 26 years.

Simplified: If the trend continues, some of the rarer amphibians could be gone from roughly half the sites where they are now found in as few as six years. More common species could see similar declines in 26 years.

◮ For a specific target audience, e.g. non-native speakers
◮ For improving NLP tasks, e.g. MT

2 / 14

SLIDE 8

Newsela Corpus

◮ Wikipedia – Simple Wikipedia (W–SW)
  ◮ rather small
  ◮ not professionally simplified
  ◮ no defined target audience
◮ Newsela (version 2016-01-29.1)
  ◮ simplified versions target different grade levels in the US
  ◮ professionally simplified
◮ Automatic sentence-level alignments
  ◮ Identical (146,251)
  ◮ Many-to-one (merge) (24,661)
  ◮ One-to-many (split) (121,582)
  ◮ Elaboration (258,150)
◮ Newsela: ≈550K sentence pairs (W–SW: ≈280K)
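As a quick check, the four alignment counts above can be summed to see where the ≈550K figure comes from. This is only an arithmetic sketch; the dictionary keys are illustrative names, and the numbers are copied from the slide.

```python
# Alignment counts as reported on the slide (Newsela, version 2016-01-29.1).
alignment_counts = {
    "identical": 146_251,
    "merge (many-to-one)": 24_661,
    "split (one-to-many)": 121_582,
    "elaboration": 258_150,
}

total = sum(alignment_counts.values())

# Share of each alignment type in the aligned corpus.
shares = {name: count / total for name, count in alignment_counts.items()}

for name, count in alignment_counts.items():
    print(f"{name:22s} {count:>8,d}  {shares[name]:6.1%}")
print(f"{'total':22s} {total:>8,d}")
```

The total is 550,644 pairs, which matches the rounded ≈550K on the slide.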

3 / 14

SLIDE 12

Sequence-to-Sequence TS

◮ Sequence-to-Sequence: state-of-the-art for other text-to-text transformation tasks
  ◮ NTS [Nisioi et al., 2017] → state-of-the-art on W–SW
◮ Previous work disregards specificities of different audiences
◮ Google’s multilingual NMT approach [Johnson et al., 2017]: artificial token to guide the encoder

  <2es> How are you? → Cómo estás?

◮ Our approach: artificial token representing the grade level of the target sentence
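The artificial-token idea amounts to a small preprocessing step on the source side. The following is a minimal sketch, not the authors' code: `tag_source` is a hypothetical helper name, and the only assumption is that the token is written in angle brackets and separated from the sentence by a space, as in the examples on the slides.

```python
def tag_source(source: str, token: str) -> str:
    """Prepend an artificial token, e.g. "2es" or "2", to a source sentence."""
    return f"<{token}> {source}"


# Target-language token, in the style of the multilingual NMT example above.
print(tag_source("How are you?", "2es"))

# Target grade-level token; at inference time the grade would come from the end-user.
target_grade = 2
print(tag_source(
    "dusty handprints stood out against the rust of the fence near Sasabe.",
    str(target_grade),
))
```

Because the token is just another symbol in the input sequence, a standard sequence-to-sequence model needs no architectural change to use it.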

4 / 14

SLIDE 13

TS for Different Grade Levels

<2> dusty handprints stood out against the rust of the fence near Sasabe.
→ dusty handprints could be seen on the fence near Sasabe.

5 / 14

SLIDE 14

TS for Different Grade Levels

◮ Advantages:
  ◮ More adequate simplifications for audiences with different educational levels
  ◮ Real-world scenario → grade level is given by the end-user
  ◮ Robust to repetitions of source sentences

<2> dusty handprints stood out against the rust of the fence near Sasabe.
→ dusty handprints could be seen on the fence near Sasabe.

<4> dusty handprints stood out against the rust of the fence near Sasabe.
→ dusty handprints stood out against the rust of the fence near Sasabe.
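The point about repeated source sentences can be illustrated with a small sketch of how training pairs might be built when one source sentence has several simplified versions. `build_training_pairs` is a hypothetical helper, and the pairing of the grade 2 and grade 4 targets is read off the slide's example; this is an assumption about the data layout, not the authors' pipeline.

```python
from typing import Dict, List, Tuple


def build_training_pairs(
    source: str, targets_by_grade: Dict[int, str]
) -> List[Tuple[str, str]]:
    """Create one (tagged source, target) pair per available target grade."""
    return [
        (f"<{grade}> {source}", target)
        for grade, target in sorted(targets_by_grade.items())
    ]


source = "dusty handprints stood out against the rust of the fence near Sasabe."
pairs = build_training_pairs(
    source,
    {
        2: "dusty handprints could be seen on the fence near Sasabe.",
        4: "dusty handprints stood out against the rust of the fence near Sasabe.",
    },
)

for tagged_source, target in pairs:
    print(tagged_source, "->", target)
```

Without the token, the same input string would be paired with two different outputs; with it, each tagged input is unique, so the two targets no longer compete.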

6 / 14

SLIDE 19

Simplification Operations Information

◮ Sentence-level alignments → coarse-grained operations
  ◮ Identical, Elaborate, Split, Merge

<elaboration> dusty handprints stood out against the rust of the fence near Sasabe.
→ dusty handprints could be seen on the fence near Sasabe.

◮ Problem: not available at test time
◮ Simplification operations classification
  ◮ four-class classifier → Naive Bayes with nine features
  ◮ Accuracy: 0.51

7 / 14
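At training time, the coarse-grained operation can be read directly off the shape of a sentence-level alignment. The sketch below assumes non-empty alignments and treats every remaining case (typically a one-to-one pair whose target differs from the source) as an elaboration; `operation_label` is a hypothetical name, and the exact rules used to build the corpus labels are not given on the slides.

```python
from typing import List


def operation_label(source_sentences: List[str], target_sentences: List[str]) -> str:
    """Map a sentence-level alignment to a coarse-grained operation label."""
    if len(source_sentences) > 1 and len(target_sentences) == 1:
        return "merge"  # many-to-one
    if len(source_sentences) == 1 and len(target_sentences) > 1:
        return "split"  # one-to-many
    if source_sentences == target_sentences:
        return "identical"
    return "elaboration"  # assumption: any other aligned pair


src = ["dusty handprints stood out against the rust of the fence near Sasabe."]
tgt = ["dusty handprints could be seen on the fence near Sasabe."]

label = operation_label(src, tgt)
print(f"<{label}> {src[0]}")
```

At test time no target exists, so this lookup is impossible; that is why the label has to be predicted from the source alone, which is the job of the classifier mentioned above.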

SLIDE 20

Experiments and Results

◮ NMT approach → default OpenNMT

8 / 14

SLIDE 25

Experiments and Results

◮ NTS (w2v): no artificial tokens
◮ Our models:
  ◮ s2s (baseline): no artificial tokens
  ◮ s2s+to-grade → <2>
  ◮ s2s+operation (pred/gold) → <elaboration>
  ◮ s2s+to-grade+operation (pred/gold) → <2-elaboration>

                                 BLEU ↑   SARI ↑   Flesch ↑
NTS                              61.60    33.40    79.95
s2s                              61.78    33.72    79.86
s2s+to-grade                     62.91    41.04    82.91
s2s+operation (pred)             59.83    37.36    84.96
s2s+to-grade+operation (pred)    61.48    40.56    83.11
s2s+operation (gold)             63.24    41.81    84.47
s2s+to-grade+operation (gold)    64.78    45.41    85.44
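The Flesch column is presumably the Flesch Reading Ease score, where higher values mean easier text (consistent with the ↑ in the header). The sketch below implements the standard formula from pre-computed counts; how words, sentences and syllables were counted for the reported numbers is not stated in the slides, so the counts here are illustrative inputs only.

```python
def flesch_reading_ease(n_words: int, n_sentences: int, n_syllables: int) -> float:
    """Standard Flesch Reading Ease: higher scores indicate easier text."""
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Illustrative counts: the same 20 words and 30 syllables,
# written as one long sentence or as two shorter ones.
one_sentence = flesch_reading_ease(n_words=20, n_sentences=1, n_syllables=30)
two_sentences = flesch_reading_ease(n_words=20, n_sentences=2, n_syllables=30)
print(one_sentence, two_sentences)
```

Splitting a sentence raises the score even when the wording is unchanged, which is one reason a readability score is reported alongside BLEU and SARI rather than on its own.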

9 / 14

SLIDE 26

Example

Original: We want to reassure you that we take fire safety very seriously and we are doing everything we can to make sure our residents are safe.

s2s Grades 10-7: We want to reassure you that we take fire safety very seriously and we are doing everything we can to make sure our residents are safe.

Grades 6-5: We want to reassure you that we take fire safety very seriously. We are doing everything we can to make sure our residents are safe.

Grade 4: We want to make sure we take fire safety very seriously. We are doing everything we can to make sure our people are safe.

Grade 3: We want to make sure people take fire safety very seriously. We are doing everything we can to make sure our people are safe.

Grade 2: We want to make sure people take fire safety very seriously. We are doing everything we can to make sure people are safe.

10 / 14

SLIDE 34

Zero-shot TS

◮ Zero-shot TS among grade levels
  ◮ Example: from grade level 12 to grade level 4
  ◮ No instances of 12-to-4 in the training set
  ◮ Other pairs into level 4 are seen in training (e.g. 10-to-4, 6-to-4)

                              BLEU ↑   SARI ↑   Flesch ↑
12-to-4   s2s                 44.56    37.56    79.50
          s2s+to-grade        49.43    50.76    91.04
          s2s+to-grade+zs     50.18    50.85    91.08
6-to-5    s2s                 69.71    26.47    84.74
          s2s+to-grade        69.39    26.32    87.07
          s2s+to-grade+zs     68.78    26.23    86.80
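A zero-shot setting of this kind can be set up by holding out one source-to-target grade direction from the training data. The sketch below shows one possible way to build such a split; the `Pair` record, the `hold_out_direction` helper and the placeholder sentences are all hypothetical, and the slides do not describe how the authors constructed theirs.

```python
from typing import List, NamedTuple, Tuple


class Pair(NamedTuple):
    source_grade: int
    target_grade: int
    source: str
    target: str


def hold_out_direction(
    pairs: List[Pair], held_out: Tuple[int, int]
) -> Tuple[List[Pair], List[Pair]]:
    """Split pairs into training data and a held-out grade direction."""
    train = [p for p in pairs if (p.source_grade, p.target_grade) != held_out]
    test = [p for p in pairs if (p.source_grade, p.target_grade) == held_out]
    return train, test


# Placeholder data: only the grade levels matter for this illustration.
corpus = [
    Pair(12, 4, "source A", "target A"),
    Pair(10, 4, "source B", "target B"),
    Pair(6, 4, "source C", "target C"),
    Pair(6, 5, "source D", "target D"),
]

train, test = hold_out_direction(corpus, held_out=(12, 4))
print(len(train), "training pairs;", len(test), "held-out 12-to-4 pairs")
```

The model still sees the `<4>` token during training through the 10-to-4 and 6-to-4 pairs, which is what makes simplifying unseen 12-to-4 inputs plausible.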

11 / 14

SLIDE 37

Conclusions

◮ TS without target audience → results not ideal
◮ Using a simple artificial token with grade level to guide the encoder
  ◮ can improve the quality of TS
  ◮ enables target-audience-oriented simplifications
  ◮ enables zero-shot TS
◮ Simplification operation information can help
  ◮ improve classifier for the task
  ◮ explore multi-task learning

12 / 14

SLIDE 39

References I

Johnson, M., Schuster, M., Le, Q. V., Krikun, M., Wu, Y., Chen, Z., Thorat, N., Viégas, F., Wattenberg, M., Corrado, G., Hughes, M., and Dean, J. (2017). Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. TACL, 5:339–351.

Nisioi, S., Štajner, S., Ponzetto, S. P., and Dinu, L. P. (2017). Exploring neural text simplification models. In Proceedings of ACL, pages 85–91.

14 / 14