Controlling Linguistic Style Aspects in Neural Language Generation
Jessica Ficler and Yoav Goldberg
ISCOL 2017
The same message (e.g. expressing a positive sentiment towards a movie) can be conveyed in different ways.
“OMG... This movie actually made me cry a little bit because I laughed so hard at some parts.”
Colloquial style, personal voice, few adjectives.
“A genuinely unique, full-on sensory experience that treads its own path between narrative clarity and pure visual expression.”
Professional critic, impersonal voice, many adjectives.
Our goal is to generate text (full-length, natural sentences) that conforms to a set of content-based and stylistic requirements (more than two at once).
Examples:
Theme: Acting; Descriptive: True
“A wholly original, well-acted, romantic comedy that's elevated by the modest talents ...”
Theme: Plot; Descriptive: False
“I think the poor writing and script are what caused this movie to bomb.”
Input: a set of parameters, each with a set of possible values, e.g.:
Parameter | Value
Sentiment | Positive
Theme | Other
Descriptive | False
Length | ≤ 10
Personal | True
Professional | False
Output: a text that is compatible with the parameter values, e.g. “I don't understand why it is rated so poorly.”
We consider 6 parameters and their values from the movie-review domain:
Content: Sentiment, Theme
Style: Professional, Personal, Descriptive, Length
Sentiment: the score that the reviewer gave the movie.
Positive: “This movie is so much to keep you on the edge of your seat.”
Neutral: “While the film doesn't quite reach the level of sugar fluctuations, it's beautifully animated.”
Negative: “It’s a very low-budget movie that just seems to be a bunch ...”
Theme: whether the sentence's content is about the Plot, Acting, Production, Effects, or none of these (Other).
Plot: “The storyline had me laughing out loud.”
Acting: “The cast are all excellent.”
Production: “The director's magical.”
Effects: “Only saving grace is the sound effects.”
Other: “I'm afraid that the movie is aimed at kids and adults weren't sure what to say about it.”
Length: number of words, bucketed as ≤ 10, 11-20, 21-40, or > 40 words.
Professional: whether the review is written in the style of a professional critic or not.
True: “This is a breath of fresh air, it's a welcome return to the franchise's brand of satirical humor.”
False: “So glad to see this movie !!”
Personal: whether the review describes a subjective experience (is written in personal voice) or not.
True: “I could see the movie again”
False: “Very similar to the book.”
Descriptive: whether the review is written in a descriptive style (contains a high ratio of adjectives) or not.
True: “Such a hilarious and funny romantic comedy.”
False: “A definite must see for fans of anime fans, pop culture references and animation with a good laugh too.”
And we would like to control all these aspects simultaneously.
Parameter | Type | Value
Sentiment | Content | Positive
Theme | Content | Other
Descriptive | Style | False
Length | Style | ≤ 10
Personal | Style | True
Professional | Style | False
Output: “I don't understand why it is rated so poorly.”
Model: a conditioned language model, which conditions each word on the history as well as on a context c:
$$P(w_1, \dots, w_n \mid c) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1}, c)$$
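The factorization can be illustrated with a minimal sketch: the log-probability of a sentence given c is the sum of the per-word conditional log-probabilities. The model here is a hypothetical hand-written toy (`toy_model`), not the neural network from the talk; `sequence_log_prob` and `NextWordProb` are illustrative names.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical model interface: returns P(word | history, context).
NextWordProb = Callable[[Tuple[str, ...], Dict[str, str], str], float]


def sequence_log_prob(words: List[str], context: Dict[str, str],
                      next_word_prob: NextWordProb) -> float:
    """log P(w_1..w_n | c) = sum over i of log P(w_i | w_1..w_{i-1}, c)."""
    total = 0.0
    for i, word in enumerate(words):
        history = tuple(words[:i])  # w_1 .. w_{i-1}
        total += math.log(next_word_prob(history, context, word))
    return total


def toy_model(history, context, word):
    """Toy stand-in for the neural LM: 'great' is likelier under Sentiment:Positive."""
    if word == "great":
        return 0.5 if context.get("Sentiment") == "Positive" else 0.1
    return 0.25


words = ["a", "great", "movie"]
lp_pos = sequence_log_prob(words, {"Sentiment": "Positive"}, toy_model)
lp_neg = sequence_log_prob(words, {"Sentiment": "Negative"}, toy_model)
print(lp_pos > lp_neg)  # True: the context changes the probability of the same words
```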
In our case, c is a concatenation of the embedding vectors of the parameter values:
c = [Theme:Plot; Professional:True; Descriptive:True; Length:≤10; Sentiment:Positive; Personal:False]
Generation proceeds word by word, each predicted word being fed back as the next input, with c available at every step:
start → An → entertaining → and → visually → attractive → family-friendly → story → .
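A minimal sketch of building c, assuming one embedding vector per "Parameter:Value" pair and a fixed concatenation order. Plain Python lists stand in for tensors; `EMB_DIM`, `init_embeddings`, and `context_vector` are hypothetical names, and the embedding size used in the actual model is not given in the slides.

```python
import random
from typing import Dict, List

# The six parameters and their values, as listed in this section.
PARAMETERS: Dict[str, List[str]] = {
    "Sentiment": ["Positive", "Neutral", "Negative"],
    "Theme": ["Plot", "Acting", "Production", "Effects", "Other"],
    "Professional": ["True", "False"],
    "Personal": ["True", "False"],
    "Descriptive": ["True", "False"],
    "Length": ["<=10", "11-20", "21-40", ">40"],
}
EMB_DIM = 4  # hypothetical; a real model would use a larger size


def init_embeddings(dim: int = EMB_DIM, seed: int = 0) -> Dict[str, List[float]]:
    """One vector per 'Parameter:Value' pair (random here; learned during training in a real model)."""
    rng = random.Random(seed)
    return {f"{p}:{v}": [rng.gauss(0.0, 0.1) for _ in range(dim)]
            for p, vals in PARAMETERS.items() for v in vals}


def context_vector(values: Dict[str, str], emb: Dict[str, List[float]]) -> List[float]:
    """c = concatenation of the value embeddings, in a fixed parameter order."""
    c: List[float] = []
    for p in PARAMETERS:  # dict order gives a fixed concatenation order
        c.extend(emb[f"{p}:{values[p]}"])
    return c


emb = init_embeddings()
values = {"Theme": "Plot", "Professional": "True", "Descriptive": "True",
          "Length": "<=10", "Sentiment": "Positive", "Personal": "False"}
c = context_vector(values, emb)
print(len(c))  # 24 = 6 parameters x 4 dimensions
```

Fixing the order matters: the same position in c must always hold the same parameter, regardless of the order in which values are specified.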
The model is simple, but we need training data annotated with the appropriate parameter values.
We extract the parameter values for each text (from metadata and with heuristics), and then train the model on the annotated texts.
Data: the Rotten Tomatoes website; 7,500 movies; 1,002,625 movie reviews.
Professional (metadata): on Rotten Tomatoes, critic reviews are separated from audience reviews, which gives us professional and non-professional reviews. Some of the non-professional reviewers are designated “super reviewers”; we consider their reviews professional as well.
Sentiment (metadata): we normalized the critics' scores to a 0-5 scale and mapped them to sentiment values: Negative 0-2, Neutral 3, Positive 4-5.
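A minimal sketch of the score mapping. The slides do not say how scores on other scales were normalized; the linear rescaling with round-half-up in `normalize_score` is an assumption, and both function names are hypothetical. Only the 0-5 buckets come from the slide.

```python
def normalize_score(score: float, max_score: float) -> int:
    """Hypothetical linear rescaling of a score in [0, max_score] to the 0-5 scale (rounded half up)."""
    if not 0 <= score <= max_score:
        raise ValueError(f"score {score} outside [0, {max_score}]")
    return int(score * 5 / max_score + 0.5)


def sentiment_label(score_0_5: int) -> str:
    """Negative for 0-2, Neutral for 3, Positive for 4-5."""
    if score_0_5 <= 2:
        return "Negative"
    if score_0_5 == 3:
        return "Neutral"
    return "Positive"


print(sentiment_label(normalize_score(8, 10)))  # Positive (8/10 -> 4)
print(sentiment_label(normalize_score(3, 5)))   # Neutral
```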
The heuristics rely on content words, function words, and POS tags.
Theme (content words): to determine the value of the theme parameter, we searched for words that are related to the 4 topics and are common in our data set:
Plot: Story, Storytelling, Plot, Script, Manuscript, Tale, Scene
Acting: Acting, Cast, Performance, Play, Role, Miscasting, Actor
Production: Director, Directed, Production, co-production
Effects: Effects, Song, Music, Voice, Visual, Soundtrack, Shot
Each sentence was labeled with the category that has the most words in the sentence. Sentences that do not include any words from our lists are labeled as Other.
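The labeling rule can be sketched as follows, assuming the keyword lists group into the four topics as shown. The matching rule is an assumption: here a token counts for a topic if it starts with one of the topic's keywords, so "storyline" matches "story"; this is crude (e.g. "talents" would also match "tale"). `token_theme` and `theme_label` are hypothetical names, and tie-breaking is not specified in the slides.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

# Keyword lists from the slide, lower-cased.
THEME_WORDS: Dict[str, List[str]] = {
    "Plot": ["story", "storytelling", "plot", "script", "manuscript", "tale", "scene"],
    "Acting": ["acting", "cast", "performance", "play", "role", "miscasting", "actor"],
    "Production": ["director", "directed", "production", "co-production"],
    "Effects": ["effects", "song", "music", "voice", "visual", "soundtrack", "shot"],
}


def token_theme(token: str) -> Optional[str]:
    """Assumed matching rule: a token counts for a topic if it starts with one of its keywords."""
    for theme, keywords in THEME_WORDS.items():
        if any(token.startswith(k) for k in keywords):
            return theme
    return None


def theme_label(sentence: str) -> str:
    """Topic with the most keyword hits in the sentence; 'Other' if there are none."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())
    counts = Counter(t for t in map(token_theme, tokens) if t is not None)
    if not counts:
        return "Other"
    # Ties are not addressed on the slide; most_common keeps first-encountered order.
    return counts.most_common(1)[0][0]


print(theme_label("The storyline had me laughing out loud."))  # Plot
print(theme_label("The cast are all excellent."))              # Acting
```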
Personal (function words): to determine whether a review is written in personal voice, we search for words that express subjectivity (personal pronouns): True if the sentence contains “I” or “My”; False in other cases.
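A minimal sketch of the personal-voice rule, using only the two words shown on the slide; the list used in practice may be longer. `PERSONAL_WORDS` and `is_personal` are hypothetical names, and the lower-cased alphabetic tokenization is an assumption.

```python
import re

# Words shown on the slide; the full list may be longer.
PERSONAL_WORDS = {"i", "my"}


def is_personal(sentence: str) -> bool:
    """True if the sentence contains any of the personal-voice words, False otherwise."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return any(tok in PERSONAL_WORDS for tok in tokens)


print(is_personal("I could see the movie again"))  # True
print(is_personal("Very similar to the book."))    # False
```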
Descriptive (POS tags): we assume that descriptive texts make heavy use of adjectives, and use the distribution of part-of-speech tags: True if the percentage of adjective (JJ) tags is at least 35%; False in other cases.
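A minimal sketch of the descriptive rule, assuming the input is already POS-tagged with Penn Treebank tags (by any tagger), that JJ, JJR, and JJS all count as adjectives, and that the denominator is all tokens including punctuation; the slides do not pin these down. The tags below are hand-assigned for illustration, and `is_descriptive` is a hypothetical name.

```python
from typing import List, Tuple


def is_descriptive(tagged: List[Tuple[str, str]], threshold: float = 0.35) -> bool:
    """True if adjective tags (JJ, JJR, JJS) make up at least `threshold` of the tokens."""
    if not tagged:
        return False
    n_adj = sum(1 for _, tag in tagged if tag.startswith("JJ"))
    return n_adj / len(tagged) >= threshold


desc = [("Such", "PDT"), ("a", "DT"), ("hilarious", "JJ"), ("and", "CC"),
        ("funny", "JJ"), ("romantic", "JJ"), ("comedy", "NN"), (".", ".")]
plain = [("Very", "RB"), ("similar", "JJ"), ("to", "TO"), ("the", "DT"),
         ("book", "NN"), (".", ".")]
print(is_descriptive(desc), is_descriptive(plain))  # True False (3/8 vs 1/6)
```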
Length: the number of words, bucketed as ≤ 10, 11-20, 21-40, or > 40 words.
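The length buckets as a small function, assuming words are counted by splitting tokenized text on whitespace; `length_bucket` is a hypothetical name.

```python
def length_bucket(sentence: str) -> str:
    """Bucket a sentence by its number of whitespace-separated words."""
    n = len(sentence.split())
    if n <= 10:
        return "<=10"
    if n <= 20:
        return "11-20"
    if n <= 40:
        return "21-40"
    return ">40"


print(length_bucket("The cast are all excellent ."))  # <=10
```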
Our final data set includes 2,773,435 sentences, each labeled with the 6 parameters. We divided it into training (~2.7M), development (~2K), and test (~2K) sets.
Going from text to parameter values is easy: we extract them. Going from parameter values to text is hard: this is what the conditioned language model does. Does this work?
Generated examples:
Sentiment: Negative; Theme: Other; Descriptive: True; Length: 11-20; Personal: True; Professional: False
“Ultimately, I can honestly say that this movie is full of stupid stupid and stupid stupid stupid stupid stupid.”
Sentiment: Positive; Theme: Other; Descriptive: False; Length: 11-20; Personal: False; Professional: True
“The film’s simple, and a refreshing take on the complex family drama of the regions of human intelligence.”
We would like to quantitatively measure our model's capabilities.
Does knowing the parameter values indeed help achieve better language modeling results?
Perplexity (lower is better):
Model | Dev | Test
Not-conditioned | 25.8 | 24.4
Conditioned | 24.8 | 23.3
Knowing the correct parameter values indeed results in better (lower) perplexity!
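For reference, a minimal sketch of the standard perplexity computation: the exponentiated average negative log-probability per token. The slides do not state whether end-of-sentence tokens are counted, so this sketch simply averages over whatever token log-probabilities it is given; `perplexity` is a hypothetical name.

```python
import math
from typing import List


def perplexity(sentences_log_probs: List[List[float]]) -> float:
    """exp(-(1/N) * sum of natural-log token probabilities), N = total number of tokens."""
    total = sum(sum(sent) for sent in sentences_log_probs)
    n_tokens = sum(len(sent) for sent in sentences_log_probs)
    return math.exp(-total / n_tokens)


# Two toy "test sentences" whose every token gets probability 1/24 from the model.
uniform = [[math.log(1 / 24)] * 5, [math.log(1 / 24)] * 3]
print(round(perplexity(uniform), 6))  # 24.0
```

A model that assigns higher probability to the test tokens, as the conditioned model does when given the correct values, yields a lower perplexity.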
Is our model more effective than training separate, dedicated unconditioned LMs?
The alternative: split the data set by parameter values and train one LM per subset, e.g. for Sentiment:Positive, Sentiment:Neutral, and Sentiment:Negative. When generating text, we would choose the model that corresponds to the requested value.
With two parameters, we need one model per combination of values:
Sentiment:Positive; Professional:True
Sentiment:Positive; Professional:False
Sentiment:Neutral; Professional:True
Sentiment:Neutral; Professional:False
Sentiment:Negative; Professional:True
Sentiment:Negative; Professional:False
The number of models that need to be trained depends on the number of parameters and their possible values. With all six parameters:
Sentiment:Positive; Professional:True; Theme:Other; Personal:False; Length:21-40; Descriptive:False
Sentiment:Negative; Professional:False; Theme:Other; Personal:True; Length:21-40; Descriptive:False
. . . .
240
We hypothesize that the conditioned LM will be able to:
And thus will be more effective than a dedicated LM
We verify this hypothesis by training dedicated models and comparing their results on the corresponding data to the results achieved by our model.
For a set of parameter values $p_1, \dots, p_n$, we train n sub-models: each sub-model $m_i$ is trained on the subset of sentences that match the parameter values $p_1, \dots, p_i$.
Example: given the parameter values personal:false, sentiment:positive, professional:false, theme:other, and length:≤10, we train 5 sub-models:
1. personal:false
2. personal:false and sentiment:positive
3. personal:false, sentiment:positive, and professional:false
4. personal:false, sentiment:positive, professional:false, and theme:other
5. personal:false, sentiment:positive, professional:false, theme:other, and length:≤10
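A minimal sketch of how the nested training subsets can be built, using the example values from this slide and a small hypothetical labeled data set. `nested_conditions` and `matching` are illustrative names; each labeled sentence is represented as a plain dict of parameter values.

```python
from typing import Dict, List, Sequence, Tuple

Condition = Tuple[str, str]  # (parameter, value)


def nested_conditions(values: Sequence[Condition]) -> List[List[Condition]]:
    """Sub-model m_i is trained on sentences matching the first i values p_1..p_i."""
    return [list(values[:i]) for i in range(1, len(values) + 1)]


def matching(data: List[Dict[str, str]], conds: List[Condition]) -> List[Dict[str, str]]:
    """Sentences whose labels match every (parameter, value) condition."""
    return [s for s in data if all(s.get(p) == v for p, v in conds)]


values = [("personal", "false"), ("sentiment", "positive"), ("professional", "false"),
          ("theme", "other"), ("length", "<=10")]
keys = ["personal", "sentiment", "professional", "theme", "length"]
toy_data = [dict(zip(keys, row)) for row in [
    ("false", "positive", "false", "other", "<=10"),
    ("false", "positive", "true", "plot", "11-20"),
    ("false", "negative", "false", "other", "<=10"),
    ("true", "positive", "false", "other", "<=10"),
]]
sizes = [len(matching(toy_data, c)) for c in nested_conditions(values)]
print(sizes)  # [3, 2, 1, 1, 1]: the training subset shrinks as parameters are added
```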
As we add parameters, the size of the training set of the sub-model decreases.
We measure the perplexity of the dedicated models on the test-set sentences that match their criteria, and compare it to the perplexity of our conditioned LM and of an unconditioned language model on the same sentences. We do this for 4 different parameter sets.
The dedicated model achieves better perplexity than our model on data with personal:false. The gap gets smaller as the dedicated model includes more properties, and eventually the conditioned model's result is better than the dedicated model's. This is also the case in the other 3 parameter sets we experimented with.
The dedicated LM scores better than our model when:
We conclude that the conditioned model manages to generalize from sentences with different sets of properties, and remains effective with a large number of conditioning factors.
How effective are the conditioning parameters individually? We compare the perplexity when using the correct conditioning values to the perplexity achieved when flipping the parameter value to an incorrect one.
Conditioning values | Perplexity
Correct values | 23.3
Descriptive replaced with non-descriptive | 27.2
Personal replaced | 27.5
Professional replaced | 25
Sentiment replaced (Pos with Neg) | 24.3
The model distinguishes descriptive text and personal voice better than it distinguishes sentiment and professional text.
How well do sentences generated by the model match the requested behavior (conditioning properties)?
For each parameter, we measure the correspondence of the sentences to the requested values.
Length (number of words in the generated sentences):
Requested length | Avg | Min | Max
≤ 10 | 7.6 | 1 | 21
11-20 | 20.6 | 5 | 25
21-40 | 34 | 7 | 49
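Statistics of this kind can be computed with a short sketch like the one below, assuming generated sentences are tokenized and words are counted by whitespace splitting (punctuation tokens included). The sentences and the `length_stats` name are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def length_stats(generated: List[Tuple[str, str]]) -> Dict[str, Tuple[float, int, int]]:
    """(avg, min, max) word count of generated sentences, per requested length value."""
    counts: Dict[str, List[int]] = defaultdict(list)
    for requested, sentence in generated:
        counts[requested].append(len(sentence.split()))
    return {req: (mean(ns), min(ns), max(ns)) for req, ns in counts.items()}


generated = [
    ("<=10", "Great fun ."),
    ("<=10", "A must see for the whole family ."),
    ("11-20", "An entertaining and visually attractive family-friendly story that kids will love ."),
]
print(length_stats(generated))
```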
Descriptive: we measure the percentage of generated sentences that are considered descriptive when requesting descriptive:true, and non-descriptive when requesting descriptive:false.
descriptive:true: 85.7% descriptive
descriptive:false: 96% non-descriptive
Personal: we measure the percentage of generated sentences that are considered personal voice when requesting personal:true, and non-personal when requesting personal:false.
personal:true: 100% personal
personal:false: 99.85% non-personal
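Theme (% of generated sentences per detected theme, for each requested value):
Requested value | % with the requested theme | Remaining % in the row (other themes)
Plot | 98.7 | 0.3, 0.2, 0.8
Acting | 95.3 | 1.6, 0.6, 2.5
Production | 97.4 | 2.6
Effects | 91.7 | 2.4, 5.9
Other | 99.9 | 0.03, 0.03, 0.04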
For each possible theme value, we compute how the sentences generated with that value are distributed over the themes. The confusion matrix shows that the majority of sentences are generated according to the requested theme.
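A minimal sketch of a row-normalized confusion computation. It assumes each generated sentence is re-labeled by some theme classifier (plausibly the same keyword heuristic used for the training data, though the slides do not say); `classify` is a hypothetical parameter and `toy_classify` is a stand-in for illustration only.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple


def confusion_percentages(generated: List[Tuple[str, str]],
                          classify: Callable[[str], str]) -> Dict[str, Dict[str, float]]:
    """Row-normalized confusion: for each requested theme, % of its sentences per detected theme."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for requested, sentence in generated:
        counts[requested][classify(sentence)] += 1
    return {req: {pred: 100.0 * n / sum(row.values()) for pred, n in row.items()}
            for req, row in counts.items()}


def toy_classify(sentence: str) -> str:
    """Toy classifier standing in for the theme heuristic."""
    return "Plot" if "story" in sentence else "Other"


generated = [("Plot", "a great story"), ("Plot", "the story drags"),
             ("Plot", "so much fun"), ("Plot", "a story about love")]
print(confusion_percentages(generated, toy_classify))  # {'Plot': {'Plot': 75.0, 'Other': 25.0}}
```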
Professional: the professional property could not be evaluated automatically, so we performed a manual evaluation using Mechanical Turk. Can a person distinguish professional:true from professional:false? We randomly created 1000 sentence pairs, each with one sentence generated with professional:true (t) and one with professional:false (f):
(t) “This film has a certain sense of imagination and a sobering look at the clandestine indictment.” (f) “I know it’s a little bit too long, but it’s a great movie to watch !!!!” Which of the sentences was written by a professional critic?
Settings: 5 different annotators; majority vote. Result: the annotators were able to tell apart the professional from the non-professional generated sentences in 72.1% of the cases.
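A minimal sketch of majority-vote scoring, assuming each pair is judged by 5 annotators choosing between two sentences ("A" or "B"), and a pair counts as correct when the majority picks the intended one. With 5 annotators and 2 options a tie cannot occur. The annotation data and function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Answer chosen by most annotators (5 annotators and 2 options cannot tie)."""
    return Counter(answers).most_common(1)[0][0]


def majority_accuracy(annotations: List[Sequence[str]], gold: List[str]) -> float:
    """% of pairs whose majority answer is the intended one."""
    hits = sum(majority_vote(a) == g for a, g in zip(annotations, gold))
    return 100.0 * hits / len(gold)


# Each inner list: which sentence 5 annotators picked as the professional one.
annotations = [["A", "A", "B", "A", "B"], ["B", "B", "B", "A", "A"],
               ["A", "B", "B", "B", "A"], ["A", "A", "A", "A", "A"]]
gold = ["A", "A", "B", "A"]
print(majority_accuracy(annotations, gold))  # 75.0
```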
Analysis: in a few cases, the sentence generated for professional:true was indeed not professional enough, e.g. “Looking forward to the trailer.”
In many cases, both sentences could indeed be considered either professional or not. Example: (t) “This is a cute movie with some funny moments, and some of the jokes are funny and entertaining.” (f) “Absolutely amazing story of bravery and dedication.”
Sentiment: manual annotation using Mechanical Turk. We randomly created 300 pairs of generated sentences for each of the following settings: Positive/Negative, Positive/Neutral, and Negative/Neutral.
Which of the reviewers liked the movie more than the other?
Settings: 5 different annotators; majority vote. Results:
Positive/Negative: 86.3%
Positive/Neutral: 63%
Negative/Neutral: 69.7%
Examples where the intended sentiment was not recognized by the annotators:
(Pos) “It’s a shame that this film is not as good as the previous film, but it still delivers.” (Neg) “The premise is great, the acting is not bad, but the special effects are so bad.”
Can the model generate sentences for parameter combinations it has not seen in training?
Held-out combination: theme:plot and personal:true.
We removed from the training set the 75,421 (~75K) sentences labeled as both theme:plot and personal:true, so the model is never trained on this combination.
336,567 477,738
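A minimal sketch of removing one parameter combination from the training data, assuming each sentence carries a dict of labels. Only sentences matching all of the held-out values are dropped, so each value still appears on its own in the remaining data. `remove_combination` and the example sentences are hypothetical.

```python
from typing import Dict, List


def remove_combination(data: List[Dict[str, str]], combo: Dict[str, str]) -> List[Dict[str, str]]:
    """Drop every sentence whose labels match ALL of the values in `combo`."""
    return [s for s in data if not all(s.get(p) == v for p, v in combo.items())]


data = [
    {"text": "The plot made me cry.", "theme": "plot", "personal": "true"},
    {"text": "The plot is thin.", "theme": "plot", "personal": "false"},
    {"text": "I loved the cast.", "theme": "acting", "personal": "true"},
]
kept = remove_combination(data, {"theme": "plot", "personal": "true"})
print([s["text"] for s in kept])  # ['The plot is thin.', 'I loved the cast.']
```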
We then asked the trained model to generate sentences with theme:plot and personal:true.
Results: 100% of the generated sentences indeed contained personal pronouns, and 82.4% of them fit the theme:plot criteria (compared with 97.8% for the full model).
Examples: “Some parts weren’t as good as I thought it would be and the acting and script were amazing.” “I really liked the story and the performances were likable and the chemistry between the two leads is great.”
Most work focuses on the content that needs to be conveyed in the generated text:
scores (Lipton et al., 2015; Tang et al., 2016)
and information to be conveyed (“price=low, food=italian, near=citycenter”) (Wen et al., 2015; Dusek and Jurcicek, 2016a,b)
al.,2016)
(Lebret et al., 2016)
Generation conditioned on style:
speaker’s identity (age, gender, location) for improving the factual consistency of the utterances
English to German with a feature that encodes whether the generated text (in German) should express politeness.
conditioning on multiple aspects of the generated text. Their model features a VAE-based method coupled with a discriminator network. Hu et al. (2017) restrict themselves to sentences of length up to 16 and to only two conditioning aspects (sentiment and tense).
the content of the generated text
generated text, in addition to its content
corresponding to the required linguistic style and content