Controlling Linguistic Style Aspects in Neural Language Generation



slide-1
SLIDE 1

Controlling Linguistic Style Aspects in Neural Language Generation

Jessica Ficler and Yoav Goldberg

ISCOL 2017

slide-3
SLIDE 3

Our goal is to generate text while allowing control of its style.

slide-4
SLIDE 4

Style

The same message (e.g. expressing a positive sentiment towards a movie) can be conveyed in different ways.

slide-12
SLIDE 12

Style Aspects (Example)

“OMG... This movie actually made me cry a little bit because I laughed so hard at some parts.”

  • Colloquial style
  • Personal voice
  • Few adjectives

“A genuinely unique, full-on sensory experience that treads its own path between narrative clarity and pure visual expression.”

  • Professional critic
  • Impersonal voice
  • Many adjectives

slide-15
SLIDE 15

The challenge

Generate text that conforms to a set of content-based and stylistic requirements.

  • more than 2 requirements
  • full length, natural sentences

slide-16
SLIDE 16

Example

Theme: Acting Descriptive: True

slide-20
SLIDE 20

Example

Theme: Acting Descriptive: True
“A wholly original, well-acted, romantic comedy that's elevated by the modest talents of a lesser known cast.”

Theme: Plot Descriptive: False
“I think the poor writing and script are what caused this movie to bomb.”

slide-23
SLIDE 23

Formal Definition

  • We assume a set of k parameters $p_1 \dots p_k$, each parameter $p_i$ with a set of possible values $V_{p_i}$
  • Input: a specific assignment to these parameters, e.g.

    Parameter      Value
    Professional   False
    Personal       True
    Length         ≤ 10
    Descriptive    False
    Theme          Other
    Sentiment      Positive

  • Output: a text that is compatible with the parameter values, e.g. “I don't understand why it is rated so poorly.”
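The input side of this definition can be sketched in a few lines of Python. This is only an illustration: the names `PARAMETER_VALUES` and `is_valid_assignment` are assumptions, and the value sets are the ones listed on the task-description slides.

```python
# Hypothetical encoding of the parameters and their value sets; the names are
# illustrative and not taken from the authors' code.
PARAMETER_VALUES = {
    "Professional": {True, False},
    "Personal": {True, False},
    "Length": {"<=10", "11-20", "21-40", ">40"},
    "Descriptive": {True, False},
    "Theme": {"Plot", "Acting", "Production", "Effects", "Other"},
    "Sentiment": {"Positive", "Neutral", "Negative"},
}


def is_valid_assignment(assignment):
    """Check that every parameter is assigned exactly one allowed value."""
    if set(assignment) != set(PARAMETER_VALUES):
        return False
    return all(assignment[name] in PARAMETER_VALUES[name] for name in assignment)


# The input shown on the slide.
slide_input = {
    "Professional": False,
    "Personal": True,
    "Length": "<=10",
    "Descriptive": False,
    "Theme": "Other",
    "Sentiment": "Positive",
}
print(is_valid_assignment(slide_input))
```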

slide-24
SLIDE 24

This work

We consider 6 parameters and values from the movie reviews domain

Content: Sentiment, Theme
Style: Professional, Personal, Descriptive, Length

slide-25
SLIDE 25

Content Parameters

slide-26
SLIDE 26

Task Description – Content Parameters

Sentiment - The score that the reviewer gave the movie

Positive - “This movie is so much to keep you on the edge of your seat.”
Neutral - “While the film doesn't quite reach the level of sugar fluctuations, it's beautifully animated.”
Negative - “It’s a very low-budget movie that just seems to be a bunch of fluff.”
slide-32
SLIDE 32

Task Description – Content Parameters

Theme - Whether the sentence's content is about the Plot, Acting, Production, Effects or none of these (Other)

Plot - “The storyline had me laughing out loud.”
Acting - “The cast are all excellent.”
Production - “The director's magical.”
Effects - “Only saving grace is the sound effects.”
Other - “I'm afraid that the movie is aimed at kids and adults weren't sure what to say about it.”

slide-33
SLIDE 33

Style Parameters

slide-34
SLIDE 34

Task Description – Style Parameters

Length – Number of words

  • ≤ 10 words
  • 11-20 words
  • 21-40 words
  • > 40 words

slide-37
SLIDE 37

Task Description – Style Parameters

Professional - Whether the review is written in the style of a professional critic or not

True - “This is a breath of fresh air, it's a welcome return to the franchise's brand of satirical humor.”
False - “So glad to see this movie !!”

slide-40
SLIDE 40

Task Description – Style Parameters

Personal - Whether the review describes subjective experience (written in personal voice) or not

True - “I could see the movie again”
False - “Very similar to the book.”

slide-43
SLIDE 43

Task Description – Style Parameters

Descriptive - Whether the review is in descriptive (contains a high ratio of adjectives) style or not

True - “Such a hilarious and funny romantic comedy.”
False - “A definite must see for fans of anime fans, pop culture references and animation with a good laugh too.”

slide-44
SLIDE 44

And we would like to control for all these aspects simultaneously

slide-45
SLIDE 45

Parameter      Type      Value
Professional   Style     False
Personal       Style     True
Length         Style     ≤ 10
Descriptive    Style     False
Theme          Content   Other
Sentiment      Content   Positive

“I don't understand why it is rated so poorly.”

slide-46
SLIDE 46

Model

slide-48
SLIDE 48

Model

a conditioned language model:

$$P(w_1 \dots w_n \mid c) = \prod_{t=1}^{n} P(w_t \mid w_1, \dots, w_{t-1}, c)$$

Condition each word on the history, as well as on a context c.
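The factorization on this slide says that the probability of a sentence is the product of per-word probabilities, each conditioned on the preceding words and on c. A minimal Python sketch of that chain rule follows; `next_word_prob` is a hypothetical stand-in for a trained model, and the uniform toy distribution is an assumption made only so the example runs.

```python
import math


def sequence_log_prob(words, context, next_word_prob):
    """Sum log P(w_t | w_1..w_{t-1}, c) over the sentence (chain rule)."""
    total = 0.0
    for t, word in enumerate(words):
        history = words[:t]
        total += math.log(next_word_prob(word, history, context))
    return total


def toy_next_word_prob(word, history, context):
    # Toy stand-in for a trained model: uniform over a 4-word vocabulary,
    # ignoring both the history and the context.
    return 0.25


log_p = sequence_log_prob(["an", "entertaining", "story"], None, toy_next_word_prob)
print(math.exp(log_p))
```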
slide-56
SLIDE 56

Model

In our case, c is a concatenation of the embedding vectors of the parameter values

c: Theme:Plot Professional:True Descriptive:True Length:≤10 Sentiment:Positive Personal:False

[Figure: the model unrolled over one sentence; each predicted word is fed back as the next input]

Input: start An entertaining and visually attractive family-friendly story
Output: An entertaining and visually attractive family-friendly story .
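Building c by concatenation can be sketched as follows. This is an illustration under stated assumptions: the embedding size, the random vectors (standing in for learned embeddings), the fixed alphabetical parameter order, and all function names are hypothetical.

```python
import random

EMBEDDING_DIM = 4  # assumed size, chosen only for illustration

PARAMETER_VALUES = {
    "Theme": ["Plot", "Acting", "Production", "Effects", "Other"],
    "Professional": [True, False],
    "Descriptive": [True, False],
    "Length": ["<=10", "11-20", "21-40", ">40"],
    "Sentiment": ["Positive", "Neutral", "Negative"],
    "Personal": [True, False],
}


def make_embedding_table(parameter_values, dim=EMBEDDING_DIM, seed=0):
    """Assign a random vector to every (parameter, value) pair."""
    rng = random.Random(seed)
    return {
        (name, value): [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for name, values in parameter_values.items()
        for value in values
    }


def context_vector(assignment, table):
    """Concatenate the embeddings of the assigned values in a fixed order."""
    c = []
    for name in sorted(assignment):
        c.extend(table[(name, assignment[name])])
    return c


table = make_embedding_table(PARAMETER_VALUES)
assignment = {
    "Theme": "Plot", "Professional": True, "Descriptive": True,
    "Length": "<=10", "Sentiment": "Positive", "Personal": False,
}
c = context_vector(assignment, table)
print(len(c))
```

With six parameters and four dimensions each, the resulting vector has 24 entries.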

slide-57
SLIDE 57

The model is simple, but…

we need training data annotated with the appropriate values.

slide-62
SLIDE 62

[Figure: parameters are extracted from the text using meta data and heuristics; the model is then trained on the text and its parameters]

Rotten Tomatoes website. 7,500 movies. 1,002,625 movie reviews.

slide-65
SLIDE 65

Professional

On Rotten Tomatoes the critic reviews are separated from the audience reviews

  • Critic reviews: Professional
  • Audience reviews: Non-professional

slide-66
SLIDE 66

Some of the non-professional reviewers are considered “super reviewers”. Their reviews are also labeled as professional.

slide-67
SLIDE 67

Sentiment

slide-68
SLIDE 68

Sentiment

Sentiment scores

slide-69
SLIDE 69

Sentiment

We normalized the critics' scores to be on a 0-5 scale

  • Negative: 0-2
  • Neutral: 3
  • Positive: 4-5
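The score-to-label mapping on this slide can be sketched as below. The function name is hypothetical, and the sketch assumes the normalized score is already an integer between 0 and 5; how fractional scores were handled is not stated on the slide.

```python
def sentiment_label(score):
    """Map an integer review score on the 0-5 scale to a sentiment value."""
    if not isinstance(score, int) or not 0 <= score <= 5:
        raise ValueError("expected an integer score between 0 and 5")
    if score <= 2:
        return "Negative"
    if score == 3:
        return "Neutral"
    return "Positive"


print([sentiment_label(s) for s in range(6)])
```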

slide-71
SLIDE 71

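[Figure: parameters are extracted from the text using meta data and heuristics; the model is then trained on the text and its parameters]

Heuristics: content words, function words, POS tags

Rotten Tomatoes website. 7,500 movies. 1,002,625 movie reviews.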

slide-73
SLIDE 73

Theme (content words)

To determine the value for the theme parameter we searched for words that are related to the 4 topics and are common in our data set

  • Plot: Story, Storytelling, Plot, Script, Manuscript, Tale, Scene
  • Acting: Acting, Cast, Performance, Play, Role, Miscasting, Actor
  • Production: Director, Directed, Production, co-production
  • Effects: Effects, Song, Music, Voice, Visual, Soundtrack, Shot

Each sentence was labeled with the category that has the most words in the sentence. Sentences that do not include any words from our lists are labeled as Other.
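The theme heuristic described on this slide can be sketched as a keyword count. Several details here are assumptions and not from the slide: the lowercase regex tokenization, the tie-breaking rule (first theme in a fixed order wins), the split of the word list between Plot and Effects, and the function name.

```python
import re

# Word lists as read from the slide; the Plot/Effects split is an assumption.
THEME_WORDS = {
    "Plot": {"story", "storytelling", "plot", "script", "manuscript", "tale", "scene"},
    "Acting": {"acting", "cast", "performance", "play", "role", "miscasting", "actor"},
    "Production": {"director", "directed", "production", "co-production"},
    "Effects": {"effects", "song", "music", "voice", "visual", "soundtrack", "shot"},
}


def theme_label(sentence):
    """Label a sentence with the theme whose words occur most often."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())
    counts = {
        theme: sum(token in words for token in tokens)
        for theme, words in THEME_WORDS.items()
    }
    best = max(counts, key=counts.get)  # ties go to the first theme listed
    return best if counts[best] > 0 else "Other"


print(theme_label("The cast are all excellent."))
print(theme_label("Very similar to the book."))
```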

slide-74
SLIDE 74

Personal Voice (personal pronouns)

To determine whether a review is written in personal voice we search for words that express subjectivity

  • True: the sentence contains “I” or “My”
  • False: other cases
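A minimal sketch of the personal-voice check, assuming case-insensitive matching on alphabetic tokens (so that "I'm" also counts); the tokenization and the function name are illustrative assumptions.

```python
import re

PERSONAL_WORDS = {"i", "my"}  # the two words shown on the slide


def is_personal(sentence):
    """Return True if the sentence contains one of the personal words."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return any(token in PERSONAL_WORDS for token in tokens)


print(is_personal("I could see the movie again"))
print(is_personal("Very similar to the book."))
```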

slide-75
SLIDE 75

Descriptiveness (distribution of part-of-speech tags)

We assume that descriptive texts make heavy use of adjectives

  • True: % JJ ≥ 35
  • False: other cases
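The adjective-ratio rule can be sketched on top of an existing list of POS tags. The sketch assumes Penn Treebank style tags produced by some external tagger, counts only the exact tag `JJ`, and divides by all tokens in the sentence; these choices and the function name are assumptions, since the slide only gives the 35% threshold.

```python
def is_descriptive(pos_tags, threshold=35.0):
    """Return True if at least `threshold` percent of the tags are JJ."""
    if not pos_tags:
        return False
    adjective_share = 100.0 * sum(tag == "JJ" for tag in pos_tags) / len(pos_tags)
    return adjective_share >= threshold


print(is_descriptive(["DT", "JJ", "JJ", "NN"]))
print(is_descriptive(["DT", "NN", "VBZ", "JJ"]))
```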

slide-76
SLIDE 76

Length

  • ≤ 10 words
  • 11-20 words
  • 21-40 words
  • > 40 words
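The four length buckets can be computed with a small helper. Counting words by whitespace splitting is an assumption here, as is the function name; the slide does not say how words were counted.

```python
def length_bucket(sentence):
    """Map a sentence to one of the four length values by word count."""
    n_words = len(sentence.split())
    if n_words <= 10:
        return "<=10"
    if n_words <= 20:
        return "11-20"
    if n_words <= 40:
        return "21-40"
    return ">40"


print(length_bucket("Very similar to the book."))
```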

slide-77
SLIDE 77

Dataset Statistics

  • Our final data set includes 2,773,435 sentences
  • We divided the data set into training (~2.7M), development (~2K) and test (~2K) sets
  • Each sentence is labeled with the 6 parameters

slide-83
SLIDE 83

[Figure: Text → Parameter values is easy (extract). Parameter values → Text is hard (conditioned language model).]

Does this work?

slide-93
SLIDE 93

Examples of Generated Sentences

Parameter      Value
Professional   False
Personal       True
Length         11-20
Descriptive    True
Theme          Other
Sentiment      Negative

“Ultimately, I can honestly say that this movie is full of stupid stupid and stupid stupid stupid stupid stupid.”

Parameter      Value
Professional   True
Personal       False
Length         11-20
Descriptive    False
Theme          Other
Sentiment      Positive

“The film’s simple, and a refreshing take on the complex family drama of the regions of human intelligence.”

We would like to quantitatively measure our model's capabilities.

slide-94
SLIDE 94

Evaluation

  • Evaluating LM Quality (Perplexity)
  • Evaluating the Generated Sentences
slide-95
SLIDE 95

Evaluating LM Quality

slide-97
SLIDE 97

Sanity Check

  • 1. Conditioned vs. Unconditioned

Does knowing the parameters indeed help in achieving better language modeling results?

Perplexity        Dev    Test
Not-conditioned   25.8   24.4
Conditioned       24.8   23.3

Knowing the correct parameter values indeed results in better perplexity!
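For reference, perplexity is commonly computed as the exponential of the average negative log-probability per token. The sketch below uses that standard definition with natural logarithms; whether the reported numbers were computed exactly this way (for example, which tokens were counted) is not stated on the slide, and the function name is illustrative.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# A model that assigns probability 0.25 to every token has perplexity 4.
print(perplexity([math.log(0.25)] * 8))
```

Lower is better: a model that uses the conditioning values well assigns higher probability to the observed words, which lowers this number.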

slide-100
SLIDE 100

Baseline

  • 2. Conditioned vs. Dedicated LMs

Is our model effective compared to training a separate unconditioned LM on a subset of the data (dedicated LM)?

Data Set

  • Sentiment:Positive
  • Sentiment:Neutral
  • Sentiment:Negative

When generating text, we would choose the model that corresponds to the requested value

slide-101
SLIDE 101

Baseline

  • 2. Conditioned vs. Dedicated LMs

Is our model effective compared to training a separate unconditioned LM on a subset of the data (dedicated LM)?

Data Set

  • Sentiment:Positive; Professional:True
  • Sentiment:Positive; Professional:False
  • Sentiment:Neutral; Professional:True
  • Sentiment:Neutral; Professional:False
  • Sentiment:Negative; Professional:True
  • Sentiment:Negative; Professional:False

The number of models that need to be trained depends on the number of parameters and the possible values

slide-102
SLIDE 102

Baseline

  • 2. Conditioned vs. Dedicated LMs

Is our model effective compared to training a separate unconditioned LM on a subset of the data (dedicated LM)?

Data Set

  • Sentiment:Positive; Professional:True; Theme:Other; Personal:False; Length:21-40; Descriptive:False
  • Sentiment:Negative; Professional:False; Theme:Other; Personal:True; Length:21-40; Descriptive:False
  • . . . .

240

slide-106
SLIDE 106

Evaluation (Language Model Quality)

We hypothesize that the conditioned LM will be able to:

  • Generalize across properties-combinations
  • Share data between the different settings

And thus will be more effective than a dedicated LM

slide-107
SLIDE 107

Evaluation (Language Model Quality)

We verify this hypothesis by training dedicated models and comparing their results on the corresponding data to the results achieved by our model

slide-110
SLIDE 110

Baseline

For a set of parameters and values $p_1 \dots p_n$, we train n sub-models. Each sub-model $m_i$ is trained on the subset of sentences that match parameters $p_1 \dots p_i$.

Example - given the set of parameter values personal:false, sentiment:pos, professional:false, theme:other and length:≤10, we train 5 sub-models:

1. personal:false
2. personal:false and sentiment:positive
3. personal:false, sentiment:positive and professional:false
4. personal:false, sentiment:positive, professional:false and theme:other
5. personal:false, sentiment:positive, professional:false, theme:other and length:≤10

As we add parameters, the size of the training set of the sub-model decreases.
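The nested training subsets of this baseline can be sketched as successive filters. The data layout (a list of dicts with a `params` field) and the function name are hypothetical; only the nesting scheme comes from the slide.

```python
def nested_subsets(sentences, ordered_params):
    """Return, for i = 1..n, the sentences matching the first i (name, value) pairs."""
    subsets = []
    current = list(sentences)
    for name, value in ordered_params:
        current = [s for s in current if s["params"].get(name) == value]
        subsets.append(current)
    return subsets


# Toy data, invented for illustration only.
toy_sentences = [
    {"text": "a", "params": {"personal": False, "sentiment": "positive"}},
    {"text": "b", "params": {"personal": False, "sentiment": "negative"}},
    {"text": "c", "params": {"personal": True, "sentiment": "positive"}},
]
toy_subsets = nested_subsets(
    toy_sentences, [("personal", False), ("sentiment", "positive")]
)
print([len(subset) for subset in toy_subsets])
```

Each added filter can only keep or shrink the subset, which is why the dedicated sub-models see less training data as parameters are added.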

slide-111
SLIDE 111

Baseline

We measure the perplexity of the dedicated models on the test-set sentences that match the criteria and compare it to our conditioned LM and to an unconditioned language model. We do this for 4 different parameter-sets.

slide-114
SLIDE 114

Evaluation (Language Model Quality)

The dedicated model achieves better perplexity than our model on data with personal:false.

The gap gets smaller as the dedicated model includes more properties.

Eventually the conditioned model result is better than the dedicated model result.

slide-115
SLIDE 115

Evaluation (Language Model Quality)

This is also the case in the other 3 parameter sets that we experimented with

slide-117
SLIDE 117

Evaluation (Language Model Quality)

The dedicated LM scores are better than our model when:

  • Only a few conditioning parameters are needed
  • The coverage of the parameter combination in the training set is large enough

We conclude that the conditioned model manages to generalize from sentences with different sets of properties, and is also effective with a large number of conditioning factors.

slide-119
SLIDE 119

Evaluation (Language Model Quality)

  • 3. Conditioned vs. Flipped Conditioning

How effective are the conditioning parameters individually?

We compare the perplexity when using the correct conditioning values to the perplexity achieved when flipping the parameter value to an incorrect one.

slide-120
SLIDE 120

Evaluation (Language Model Quality)

Conditioning                                 Perplexity
Correct value                                23.3
Replacing Descriptive with non-Descriptive   27.2
Replacing Personal                           27.5
Replacing Professional                       25
Replacing Sentiment Pos with Neg             24.3

The model distinguishes descriptive text and personal voice better than it distinguishes sentiment and professional text.

slide-121
SLIDE 121

Evaluating the Generated Sentences

slide-122
SLIDE 122

Evaluation (Generated Sentences)

How well do sentences generated by the model match the requested behavior (conditioning properties)?

slide-123
SLIDE 123

Evaluation (Generated Sentences)

  • 1. Capturing Individual Properties

For each parameter, we measure the correspondence of the sentences to the requested values.

slide-124
SLIDE 124

Evaluation (Generated Sentences)

  • 1. Capturing Individual Properties

Length

Requested Length   Avg    Min   Max
≤ 10               7.6    1     21
11-20              20.6   5     25
21-40              34     7     49

slide-125
SLIDE 125

Evaluation (Generated Sentences)

Descriptive

We measure the percentage of sentences that are considered descriptive when requesting descriptive:true, and when requesting descriptive:false

  • descriptive:true – 85.7% descriptive
  • descriptive:false – 96% non-descriptive

slide-126
SLIDE 126

Evaluation (Generated Sentences)

Personal

We measure the percentage of sentences that are considered personal voice when requesting personal:true, and when requesting personal:false

  • personal:true – 100% personal
  • personal:false – 99.85% non-personal

slide-127
SLIDE 127

Evaluation (Generated Sentences)

Theme

% Other % Effects % Prod % Acting % Plot Requested value 0.3 0.2 0.8 98.7 Plot 1.6 0.6 95.3 2.5 Acting 2.6 97.4 Production 2.4 91.7 5.9 Effects 99.9 0.03 0.03 0.04 Other

For each of the possible theme values, we compute the proportion of the sentences that were generated with the corresponding value. The confusion matrix shows that the majority of sentences are generated according to the requested theme.

slide-130
SLIDE 130

Evaluation (Generated Sentences)

Professional

The professional property could not be evaluated automatically. We performed manual evaluation using Mechanical Turk.

Can a person distinguish professional:true from professional:false?

We randomly created 1000 sentence pairs, each consisting of:

  • a sentence generated with professional:true
  • a sentence generated with professional:false
slide-131
SLIDE 131

Evaluation (Generated Sentences)

Which of the sentences was written by a professional critic?

(t) “This film has a certain sense of imagination and a sobering look at the clandestine indictment.”
(f) “I know it’s a little bit too long, but it’s a great movie to watch !!!!”

slide-132
SLIDE 132

Evaluation (Generated Sentences)

Settings: 5 different annotators, majority vote.

Result: The annotators were able to tell apart the professional from the non-professional generated sentences in 72.1% of the cases.
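The majority-vote protocol can be sketched as follows. With 5 annotators choosing between two sentences there is always a strict majority. The data layout and function names are hypothetical, and the toy votes are invented for illustration; they are not the study's data.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label chosen by most annotators."""
    return Counter(votes).most_common(1)[0][0]


def identification_rate(pairs):
    """pairs: list of (annotator_votes, correct_label) tuples."""
    correct = sum(majority_vote(votes) == gold for votes, gold in pairs)
    return correct / len(pairs)


# Toy votes: each annotator names the sentence ("t" or "f") they believe
# was written by a professional critic; "t" is the intended answer.
toy_pairs = [
    (["t", "t", "f", "t", "f"], "t"),
    (["f", "f", "f", "t", "t"], "t"),
]
print(identification_rate(toy_pairs))
```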

slide-133
SLIDE 133

Evaluation (Generated Sentences)

Analysis

In a few cases the sentence that was generated for professional:true was indeed not professional enough:

“Looking forward to the trailer.”

slide-134
SLIDE 134

Evaluation (Generated Sentences)

In many cases, both sentences could indeed be considered as either professional or not.

Example:
(t) “This is a cute movie with some funny moments, and some of the jokes are funny and entertaining.”
(f) “Absolutely amazing story of bravery and dedication.”

slide-136
SLIDE 136

Evaluation (Generated Sentences)

Sentiment

Manual annotations using Mechanical Turk

We randomly created 300 pairs of generated sentences for each of the following settings:

  • positive/negative
  • positive/neutral
  • negative/neutral.

Which of the reviewers liked the movie more than the other?

slide-137
SLIDE 137

Evaluation (Generated Sentences)

Settings: 5 different annotators, majority vote.

Results

Positive/Negative   86.3%
Positive/Neutral    63%
Negative/Neutral    69.7%

slide-138
SLIDE 138

Evaluation (Generated Sentences)

Examples where the intended sentiment was not recognized by the annotators:

(Pos) “It’s a shame that this film is not as good as the previous film, but it still delivers.” (Neg) “The premise is great, the acting is not bad, but the special effects are so bad.”

slide-139
SLIDE 139

Evaluation (Generated Sentences)

  • 2. Generalization Ability

Can the model generate sentences for parameter combinations it has not seen in training?

slide-141
SLIDE 141

Evaluation (Generated Sentences)

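theme:plot and personal:true

We removed from the training set the ~75K sentences (75,421) that were labeled as both theme:plot and personal:true. We don’t train on these.

[Figure: diagram of the theme:plot and personal:true sentence sets, with region sizes 336,567, 75,421 and 477,738]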
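Holding out one parameter combination from training can be sketched as a simple filter. The data layout and the function name are hypothetical; only the idea of removing every sentence labeled with both theme:plot and personal:true comes from the slide.

```python
def split_held_out(sentences, held_out):
    """Separate sentences that match every held-out (name, value) pair."""
    kept, removed = [], []
    for sentence in sentences:
        params = sentence["params"]
        if all(params.get(name) == value for name, value in held_out.items()):
            removed.append(sentence)
        else:
            kept.append(sentence)
    return kept, removed


# Toy data, invented for illustration only.
toy_sentences = [
    {"text": "a", "params": {"theme": "plot", "personal": True}},
    {"text": "b", "params": {"theme": "plot", "personal": False}},
    {"text": "c", "params": {"theme": "acting", "personal": True}},
]
kept, removed = split_held_out(toy_sentences, {"theme": "plot", "personal": True})
print(len(kept), len(removed))
```

Sentences that match only one of the two values stay in the training set, so the model still sees theme:plot and personal:true separately.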
slide-143
SLIDE 143

Evaluation (Generated Sentences)

We then asked the trained model to generate sentences with theme:plot and personal:true

Results

  • 100% of the generated sentences indeed contained personal pronouns
  • 82.4% of them fit the theme:plot criteria (the result achieved by the full model is 97.8%)

slide-144
SLIDE 144

Evaluation (Generated Sentences)

Examples:

“Some parts weren’t as good as I thought it would be and the acting and script were amazing.”

“I really liked the story and the performances were likable and the chemistry between the two leads is great.”

slide-145
SLIDE 145

Comparison to Previous Work

slide-146
SLIDE 146

Comparison to Previous Work

Most work focuses on the content that needs to be conveyed in the generated text

  • Review generation conditioned on category and numeric rating scores (Lipton et al., 2015; Tang et al., 2016)
  • Dialog generation conditioned on a dialog act (“request”, “inform”) and information to be conveyed (“price=low, food=italian, near=citycenter”) (Wen et al., 2015; Dusek and Jurcicek, 2016b,a)

slide-147
SLIDE 147

Comparison to Previous Work

  • Recipe generation conditioned on a list of ingredients (Kiddon et al., 2016)
  • Textual biography generation conditioned on Wikipedia infoboxes (Lebret et al., 2016)

slide-148
SLIDE 148

Comparison to Previous Work

Generation conditioned on style:

  • In dialog generation, Li et al. (2016) condition the text on the speaker’s identity (age, gender, location) for improving the factual consistency of the utterances
  • In machine translation, Sennrich et al. (2016a) translate English to German with a feature that encodes whether the generated text (in German) should express politeness.

slide-149
SLIDE 149

Comparison to Previous Work

  • Hu et al. (2017) tackle the same problem as ours: conditioning on multiple aspects of the generated text. Their model features a VAE-based method coupled with a discriminator network. Hu et al. (2017) restrict themselves to sentences of up to length 16, and only two conditioning aspects (sentiment and tense).

slide-150
SLIDE 150

Summary

  • Most work on neural natural language generation focuses on controlling the content of the generated text
  • We experiment with controlling several stylistic aspects of the generated text, in addition to its content
  • The method is based on a conditioned RNN language model
  • Simple but very effective!
  • We demonstrate the approach on the movie reviews domain
  • We show that it is successful in generating coherent sentences corresponding to the required linguistic style and content