Diverse Paraphrasing and its Effectiveness in Data Augmentation
Ashutosh Kumar*, Satwik Bhattamishra*, Manik Bhandari, Partha Talukdar
Machine and Language Learning Lab (MALL), Indian Institute of Science, Bangalore
*Equal Contributions
User: I want to book a flight from Minneapolis to New York
Bot: Sure. When are you planning to travel?
User: Can you book plane to New York from Minneapolis
Bot: Sorry, I don’t understand what you’re saying
Data augmentation might help
Rephrasing a given text in multiple ways
Source: how do i increase body height ?
Paraphrases: meaning preserving, with lexical and syntactic variety
Synonym or phrase replacement
Source: how do i increase body height ?
Synonym: how do i grow body height ?
Phrase: how do i increase the body measurement vertically?
Encoder-Decoder: Sentence → Paraphrase
Subsequence Selection - Beam Search (Top-k)
Source: how do i increase body height ?
[Figure: beam of top-k candidates for the source sentence]
Subsequence Selection - Beam Search (Diverse selection)
Source: how do i increase body height ?
[Figure: beam with diverse candidate selection for the source sentence]
Find k diverse paraphrases with high fidelity
Method based on subset selection of candidate (sub)sequences
Candidate subsequences: how do i increase my … ; how can i decrease the … ; how can i grow the … ; what ways exist to increase … ; how would I increase the … ; how do I decrease the … ; i am 17 , what … ; are there ways to increase …
Selected subset: how do i increase my … ; how can i grow the … ; what ways exist to increase … ; are there ways to increase …
$$\operatorname*{argmax}_{X \subseteq V^{(t)},\ |X| = k} F(X)$$
If F is submodular and monotone, a greedy algorithm with good approximation bounds exists.
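The greedy procedure referred to above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `greedy_select` and the toy word-coverage objective `coverage` are names introduced here for the example. For a monotone submodular F under a cardinality constraint, the classic guarantee is that greedy selection reaches at least a (1 - 1/e) fraction of the optimal value.

```python
from typing import Callable, List


def greedy_select(candidates: List[str], k: int, F: Callable[[List[str]], float]) -> List[str]:
    """Pick up to k items by repeatedly adding the one with the largest marginal gain in F."""
    remaining = list(candidates)
    selected: List[str] = []
    while remaining and len(selected) < k:
        base = F(selected)
        # Marginal gain of each remaining candidate given the current selection;
        # ties are broken in favour of the earliest candidate.
        best = max(remaining, key=lambda c: F(selected + [c]) - base)
        selected.append(best)
        remaining.remove(best)
    return selected


def coverage(X: List[str]) -> float:
    """Toy objective (an assumption for this demo): number of distinct words covered by X."""
    return float(len({w for x in X for w in x.split()}))


candidates = [
    "how do i increase my",
    "how can i grow the",
    "how would i increase the",
    "what ways exist to increase",
    "are there ways to increase",
]
picked = greedy_select(candidates, k=2, F=coverage)
print(picked)
```

Each step costs one evaluation of F per remaining candidate, so the whole selection needs on the order of k times the number of candidates evaluations.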
F = # Unique Coloured items
# Items = 4, F = 2
# Items = 4 + 1, F = 2 + 1
# Items = 5 + 1, F = 3 + 0
Diminishing Returns
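The counting example can be replayed in a few lines. The colours below are hypothetical, chosen only to reproduce the counts on the slide; `unique_colours` and `marginal_gain` are names introduced for this sketch.

```python
from typing import List, Sequence


def unique_colours(items: Sequence[str]) -> int:
    """F(X) = number of distinct colours among the items in X."""
    return len(set(items))


def marginal_gain(items: List[str], new_item: str) -> int:
    """Increase in F obtained by adding new_item to items."""
    return unique_colours(items + [new_item]) - unique_colours(items)


# Hypothetical bag: 4 items, 2 distinct colours.
bag = ["red", "red", "blue", "blue"]
print(unique_colours(bag))

gain_green = marginal_gain(bag, "green")  # a new colour raises F by one
bag.append("green")
gain_red = marginal_gain(bag, "red")      # a repeated colour adds nothing
print(gain_green, gain_red)

# Diminishing returns: adding an item to a subset A gains at least as much
# as adding the same item to a superset B.
A, B = ["red"], ["red", "blue"]
gain_A, gain_B = marginal_gain(A, "blue"), marginal_gain(B, "blue")
print(gain_A >= gain_B)
```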
Induce Diversity while not compromising on Fidelity
Source Sentence: Where can I get that movie?
ENCODER → DECODER: <sos> Where can I … <eos>
3k Candidate Subsequences: Where can I get that film? ; How can I get that picture? ; …
k sequences selected
Diversity Components: rewards unique n-grams; rewards structural coverage
Fidelity Components: synonym (similar embeddings); source sentence
Diversity Components
3k Candidate Subsequences: Where can I get that film? ; How can I get that picture? ; …
N-gram uniqueness (rewards unique n-grams, e.g. where , can , film , I , How , find that , that picture , .. I get , can I , Where can I):
$$\sum_{n=1}^{N} \beta_n \left| \bigcup_{x \in X} x_{n\text{-gram}} \right|$$
Structural Coverage (rewards structural coverage):
$$\sum_{x_i \in V^{(t)}} \sum_{x_j \in X} \mathcal{R}(x_i, x_j), \qquad \mathcal{R}(x_i, x_j) = 1 - \frac{\operatorname{EditDistance}(x_i, x_j)}{|x_i| + |x_j|}$$
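The two diversity terms can be computed directly from their definitions. The sketch below assumes token-level edit distance, non-empty sequences, and whitespace tokenisation; the function names and the weights passed as `betas` are illustrative choices, not taken from the slides.

```python
from typing import List, Sequence, Set, Tuple

Tokens = Sequence[str]


def ngrams(tokens: Tokens, n: int) -> Set[Tuple[str, ...]]:
    """Set of n-grams in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_uniqueness(X: List[Tokens], betas: Sequence[float]) -> float:
    """sum_n beta_n * |union over x in X of the n-grams of x|; betas[0] weights unigrams."""
    total = 0.0
    for n, beta in enumerate(betas, start=1):
        union: Set[Tuple[str, ...]] = set()
        for x in X:
            union |= ngrams(x, n)
        total += beta * len(union)
    return total


def edit_distance(a: Tokens, b: Tokens) -> int:
    """Token-level Levenshtein distance, computed with a single rolling row."""
    row = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        prev_diag, row[0] = row[0], i
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            prev_diag, row[j] = row[j], min(row[j] + 1, row[j - 1] + 1, prev_diag + cost)
    return row[-1]


def structural_similarity(xi: Tokens, xj: Tokens) -> float:
    """R(x_i, x_j) = 1 - EditDistance(x_i, x_j) / (|x_i| + |x_j|); assumes non-empty inputs."""
    return 1.0 - edit_distance(xi, xj) / (len(xi) + len(xj))


def structural_coverage(V: List[Tokens], X: List[Tokens]) -> float:
    """Sum of R(x_i, x_j) over all candidates x_i in V and selected x_j in X."""
    return sum(structural_similarity(xi, xj) for xi in V for xj in X)


V = [s.split() for s in (
    "where can i get that film ?",
    "how can i get that picture ?",
    "where can i find that film ?",
)]
X = V[:2]
print(ngram_uniqueness(X, betas=[1.0, 0.5]))
print(structural_coverage(V, X))
```

The first term grows only when a selected sequence contributes n-grams not already covered, while the second grows when the selected set stays close, in edit distance, to every candidate in the pool.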
Fidelity Components
Source Sentence: s
3k Candidate Subsequences: Where can I get that film? ; How can I get that picture? ; …
Lexical Similarity:
$$\sum_{x \in X} \sum_{n=1}^{N} \beta_n \left| x_{n\text{-gram}} \cap s_{n\text{-gram}} \right|$$
Embedding based Similarity (synonyms have similar embeddings):
$$\sum_{x \in X} \mathcal{T}(x, s), \qquad \mathcal{T}(x, s) = \frac{1}{|x|} \sum_{w_i \in x} \max_{w_j \in s} \psi(v_{w_i}, v_{w_j})$$
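A small sketch of the two fidelity terms follows. The slides do not say which similarity function ψ is, so cosine similarity is used here purely as an assumption, and the 2-d word vectors are hypothetical values picked so that synonyms point in similar directions.

```python
import math
from typing import Callable, Dict, List, Sequence, Set, Tuple

Tokens = Sequence[str]
Vector = Sequence[float]


def ngrams(tokens: Tokens, n: int) -> Set[Tuple[str, ...]]:
    """Set of n-grams in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def lexical_similarity(X: List[Tokens], s: Tokens, betas: Sequence[float]) -> float:
    """sum over x in X and n of beta_n * |n-grams of x shared with the source s|."""
    return sum(
        beta * len(ngrams(x, n) & ngrams(s, n))
        for x in X
        for n, beta in enumerate(betas, start=1)
    )


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; stands in for psi, which the slides leave unspecified."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embedding_similarity(
    X: List[Tokens],
    s: Tokens,
    vectors: Dict[str, Vector],
    psi: Callable[[Vector, Vector], float] = cosine,
) -> float:
    """sum over x in X of T(x, s): each word of x is matched to its most similar source word."""
    def T(x: Tokens) -> float:
        # Assumes x and s are non-empty and every word has a vector.
        return sum(max(psi(vectors[wi], vectors[wj]) for wj in s) for wi in x) / len(x)
    return sum(T(x) for x in X)


s = "where can i get that movie ?".split()
X = ["where can i get that film ?".split()]
lex = lexical_similarity(X, s, betas=[1.0, 0.5])
print(lex)

# Hypothetical toy vectors for four words.
toy_vectors = {"get": (1.0, 0.0), "find": (0.8, 0.6), "movie": (0.0, 1.0), "film": (0.0, 1.0)}
emb = embedding_similarity([["find", "film"]], ["get", "movie"], toy_vectors)
print(emb)
```

The lexical term gives no credit for replacing "movie" with "film", whereas the embedding term does, which is why the two are used together.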
Diversity Components: rewards unique n-grams; rewards structural coverage
Fidelity Components: synonym (similar embeddings); source sentence s
$$\operatorname*{argmax}_{X \subseteq V,\ |X| = k} F(X), \qquad F(X) = \lambda\big(\mu_1 D_1(X) + \mu_2 D_2(X)\big) + (1 - \lambda)\big(\nu_1 L_1(X, s) + \nu_2 L_2(X, s)\big)$$
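To see how λ trades diversity against fidelity, the objective can be assembled from pluggable component functions. Everything below is a sketch under stated assumptions: `make_objective` is a name introduced here, and the stand-in components (distinct-word count, word overlap with the source, and two zero functions) are simplified placeholders rather than the D and L terms from the slides.

```python
from typing import Callable, List, Sequence, Tuple

Tokens = Sequence[str]
DiversityFn = Callable[[List[Tokens]], float]          # D(X)
FidelityFn = Callable[[List[Tokens], Tokens], float]   # L(X, s)


def make_objective(
    D1: DiversityFn, D2: DiversityFn, L1: FidelityFn, L2: FidelityFn,
    s: Tokens, lam: float,
    mu: Tuple[float, float] = (0.5, 0.5), nu: Tuple[float, float] = (0.5, 0.5),
) -> Callable[[List[Tokens]], float]:
    """F(X) = lam * (mu1*D1 + mu2*D2) + (1 - lam) * (nu1*L1 + nu2*L2)."""
    def F(X: List[Tokens]) -> float:
        diversity = mu[0] * D1(X) + mu[1] * D2(X)
        fidelity = nu[0] * L1(X, s) + nu[1] * L2(X, s)
        return lam * diversity + (1.0 - lam) * fidelity
    return F


# Stand-in components, simplified for the demo (assumptions, not the slide formulas).
def distinct_words(X: List[Tokens]) -> float:
    return float(len({w for x in X for w in x}))


def word_overlap(X: List[Tokens], s: Tokens) -> float:
    return float(sum(len(set(x) & set(s)) for x in X))


def zero_diversity(X: List[Tokens]) -> float:
    return 0.0


def zero_fidelity(X: List[Tokens], s: Tokens) -> float:
    return 0.0


s = "where can i get that movie".split()
faithful = ["where can i get that movie".split(), "where can i get that film".split()]
diverse = ["how can i find that picture".split(), "where can i get that film".split()]

for lam in (0.0, 0.5, 1.0):
    F = make_objective(distinct_words, zero_diversity, word_overlap, zero_fidelity,
                       s, lam, mu=(1.0, 0.0), nu=(1.0, 0.0))
    print(lam, F(faithful), F(diverse))
```

With λ = 0 only fidelity counts and the near-copy set scores higher; with λ = 1 only diversity counts and the varied set wins.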
[Figure: BLEU (Fidelity) and 4-Distinct (Diversity) for models SBS, DBS, VAE-SVG, DPP, SSR, DiPS (Ours)]
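The slides do not define 4-Distinct. One common definition of distinct-n, assumed here, is the number of distinct n-grams divided by the total number of n-grams across a set of generated sentences; the exact variant used for the reported numbers may differ. `distinct_n` is a name introduced for this sketch.

```python
from typing import List, Sequence, Tuple


def distinct_n(sentences: List[Sequence[str]], n: int) -> float:
    """Distinct n-grams divided by total n-grams, pooled over all sentences."""
    all_ngrams: List[Tuple[str, ...]] = [
        tuple(toks[i:i + n])
        for toks in sentences
        for i in range(len(toks) - n + 1)
    ]
    # No n-grams at all (e.g. every sentence is shorter than n) scores 0.
    return len(set(all_ngrams)) / len(all_ngrams) if all_ngrams else 0.0


near_duplicates = [
    "how do i increase my height".split(),
    "how do i increase my weight".split(),
]
varied = [
    "how do i increase my height".split(),
    "what ways exist to grow taller".split(),
]
print(distinct_n(near_duplicates, 4))
print(distinct_n(varied, 4))
```

Near-duplicate outputs share most of their 4-grams and score lower than outputs that differ in wording and structure.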
[Figure: Accuracy on the Quora dataset for No Aug, SBS, DPP, SSR, DBS, DiPS (Ours); legend: LogReg, SiameseLSTM]
[Figure: Accuracy on SNIPS for SBS, DBS, DiPS (Ours); legend: LogReg, LSTM]
[Figure: Accuracy on Yahoo-L31 for No Aug., SBS, DBS, Syn.Rep, DiPS (Ours); legend: LogReg, LSTM]
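The augmentation setup behind these comparisons can be sketched generically: each training example is expanded with paraphrases that inherit its label. The `augment` helper, the `BookFlight` label, and the canned toy paraphraser below are hypothetical; any paraphrase generator with the same call signature could be plugged in.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (text, label)


def augment(
    dataset: List[Example],
    paraphrase: Callable[[str, int], List[str]],
    k: int,
) -> List[Example]:
    """Append up to k paraphrases per example, each carrying the original label."""
    augmented = list(dataset)
    seen = {text for text, _ in dataset}
    for text, label in dataset:
        for candidate in paraphrase(text, k):
            # Skip paraphrases that duplicate an existing training text.
            if candidate not in seen:
                seen.add(candidate)
                augmented.append((candidate, label))
    return augmented


train: List[Example] = [
    ("i want to book a flight from minneapolis to new york", "BookFlight"),
]

# Canned outputs standing in for a real paraphrase model.
canned: Dict[str, List[str]] = {
    train[0][0]: [
        "can you book plane to new york from minneapolis",
        "book me a flight to new york from minneapolis",
        "i want to book a flight from minneapolis to new york",
    ],
}


def toy_paraphrase(text: str, k: int) -> List[str]:
    return canned.get(text, [])[:k]


augmented = augment(train, toy_paraphrase, k=3)
for text, label in augmented:
    print(label, "|", text)
```

A classifier is then trained on `augmented` instead of `train`; paraphrases that merely repeat the source add nothing, which is where diversity in the generated set matters.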
Diversity in Paraphrases Without compromising on Fidelity
Sub-modular
Data Augmentation Using Paraphrasing
Seq2Seq + Diversity
https://github.com/malllabiisc/DiPS
ashutosh@iisc.ac.in, satwik55@gmail.com