Chunk-based Verb Reordering in VSO Sentences for Arabic-English SMT


Chunk-based Verb Reordering in VSO Sentences for Arabic-English SMT. Arianna Bisazza, Marcello Federico, FBK-irst, Trento, Italy. WMT 2010, Uppsala, 15-16 July 2010.


slide-1
SLIDE 1

Chunk-based Verb Reordering in VSO Sentences for Arabic-English SMT

Arianna Bisazza, Marcello Federico

FBK-irst, Trento, Italy
WMT 2010, Uppsala, 15-16 July 2010

1

slide-2
SLIDE 2

2

Introduction

  • English word order: Subject-Verb-Object
  • Arabic: both SVO and VSO
  • Common errors in phrase-based SMT outputs:
      − wrong order of syntactic constituents
      − verbless sentences


WMT 2010, Uppsala

  • A. Bisazza, M. Federico
slide-3
SLIDE 3

3

Outline

  • Reordering patterns in Arabic-English
  • Chunk-based verb reordering: technique and analysis
  • Impact of VSO sentences on translation quality
  • Chunk-based reordering lattices

slide-6
SLIDE 6

6

Reordering patterns in Arabic-English

VSO sentence: the Arabic verb is anticipated (comes earlier) with respect to English

Several local reorderings, one long reordering involving the verb

Typical phrase-based SMT outputs:
  *The Moroccan monarch King Mohamed VI __ his support to…
  *He renewed the Moroccan monarch King Mohamed VI his support to…


slide-7
SLIDE 7

7

Previous work (Habash '07; Crego & Habash '08; Elming & Habash '09)

  • preprocess source data to approximate target word order
  • address all reorderings
  • deterministic reordering => 1 most probable permutation
  • non-deterministic => word reordering lattices

Our work:

  • only one class of reorderings
  • mixed approach: deterministic for train, lattices for test

slide-9
SLIDE 9

9

Reordering patterns in Arabic-English

Working hypothesis: uneven distribution of reordering phenomena

Many local
  − adjectival modifiers following their noun
  − head-initial genitive constructions (idafa)

Example => (figure not preserved)

Few global
  − Verb-Subject-Object sentences

slide-11
SLIDE 11

11

Reordering patterns in Arabic-English

Working hypothesis: uneven distribution of reordering phenomena

Many local
  − adjectives follow nouns
  − head-initial genitive constructions (idafa)

Example => (figure not preserved)

Few global
  − Verb-Subject-Object sentences

slide-12
SLIDE 12

12

Reordering patterns in Arabic-English

VSO sentences:
  moving the verb after the subject simplifies reordering

Other (local) reorderings:
  handled inside phrases or through distortion
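
To make the point about long verb jumps concrete, the jump sizes can be worked out on a toy sentence. The sketch below uses the usual phrase-based distortion distance |start_i − end_{i−1} − 1|; the chunk layout and word positions are invented for illustration and do not come from the talk.

```python
def jumps(spans):
    """Distortion distance for each source span, in the order it is translated.

    spans: list of (start, end) source word positions, inclusive.
    """
    prev_end = -1  # decoding starts just before the first source word
    out = []
    for start, end in spans:
        out.append(abs(start - prev_end - 1))
        prev_end = end
    return out

# Hypothetical VSO source: verb at word 0, a 7-word subject at 1..7,
# an object at 8..10. English needs subject, verb, object.
vso_order = [(1, 7), (0, 0), (8, 10)]

# Same sentence after the verb has been moved behind the subject in the
# source: subject 0..6, verb 7, object 8..10, translated monotonically.
reordered = [(0, 6), (7, 7), (8, 10)]

print(jumps(vso_order))   # jump sizes needed on the original VSO input
print(jumps(reordered))   # jump sizes needed on the verb-reordered input
```

In this toy case the original input needs a jump of 8 words to reach the verb, which a distortion limit of 6 would forbid, while the verb-reordered input needs no jumps at all.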


slide-16
SLIDE 16

16

Chunk-based verb reordering

– Simplifying assumptions:
  1) verb reordering only between shallow syntax chunks;
  2) no overlap between consecutive verb movements
– Possible movements:
  move verb chunk…
  ...or verb chunk + next chunk (e.g. adverbials)
  by up to X chunks to the right


slide-17
SLIDE 17

17

Chunk-based verb reordering

Best movement:
  minimizes distortion with respect to the English translation
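
One way to read "minimizes distortion" is as a search over candidate movements scored against the word alignment. The sketch below is an illustration under stated assumptions, not the authors' procedure: each source chunk is assumed to carry an aligned English span, distortion is taken as the sum of target-side jumps, and only the verb chunk alone is moved. All names and spans are hypothetical.

```python
def total_distortion(order, tgt_span):
    """Sum of target-side jumps when source chunks are read in `order`."""
    prev_end = -1
    cost = 0
    for chunk in order:
        start, end = tgt_span[chunk]
        cost += abs(start - prev_end - 1)
        prev_end = end
    return cost

def best_movement(chunks, verb_idx, tgt_span, max_shift=6):
    """Keep the verb in place or move it 1..max_shift chunks to the right."""
    best_order = list(chunks)
    best_cost = total_distortion(best_order, tgt_span)
    after = chunks[verb_idx + 1:]
    for shift in range(1, min(max_shift, len(after)) + 1):
        order = (chunks[:verb_idx] + after[:shift]
                 + [chunks[verb_idx]] + after[shift:])
        cost = total_distortion(order, tgt_span)
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost

# Hypothetical chunked VSO sentence with the English span aligned to each chunk.
chunks = ["V", "SUBJ", "OBJ", "PP"]
tgt_span = {"SUBJ": (0, 3), "V": (4, 4), "OBJ": (5, 7), "PP": (8, 10)}
order, cost = best_movement(chunks, 0, tgt_span)
print(order, cost)
```

Here the unmoved order costs 10, and moving the verb one chunk to the right brings the cost to 0, so that movement is selected.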


slide-18
SLIDE 18

18

Chunk-based verb reordering: corpus analysis

Distribution by movement length
(charts: intersection of GIZA++ alignments; manual alignments)


slide-19
SLIDE 19

19

Chunk-based verb reordering: corpus analysis

Distribution by movement length

=> Good coverage (≥ 99.5%) with max movement length 6
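
A coverage figure of this kind is a cumulative share over the movement-length histogram. The sketch below shows the computation; the counts are made up for illustration and are not the figures from the talk.

```python
from itertools import accumulate

def coverage_by_max_length(counts):
    """Map each maximum movement length to the share of movements it covers.

    counts: dict {movement_length_in_chunks: number_of_verb_movements}.
    """
    total = sum(counts.values())
    lengths = sorted(counts)
    running = accumulate(counts[n] for n in lengths)
    return {n: c / total for n, c in zip(lengths, running)}

# Made-up counts, NOT the data behind the slide.
counts = {1: 500, 2: 250, 3: 120, 4: 70, 5: 35, 6: 21, 7: 2, 8: 2}
cov = coverage_by_max_length(counts)

# Smallest maximum length that covers at least 99.5% of the movements.
smallest = min(n for n, share in cov.items() if share >= 0.995)
print(smallest, cov[smallest])
```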


slide-23
SLIDE 23

23

Impact of VSO sentences on MT quality

  • Baseline: Moses, 30M words newswire from NIST09
  • Shallow syntax chunking: AMIRA (Diab & al. 2004)
  • Verb-reorder training and devset, re-train whole system
  • Verb-reorder test aligned with reference (oracle)
  • Tested with different Distortion Limits (DL) from 2 to 10 and wide beam search

slide-27
SLIDE 27

27

Impact of VSO sentences on MT quality

%BLEU scores on Eval08-NW (MERT on Dev06-NW):

Verb reordering of training data only => positive effect
  (9% more phrases extracted)

Verb reordering of training and test => further gain
  (+1.2 with 1/3 of sentences modified)

Relaxing the DL to high values doesn't help


slide-28
SLIDE 28

28

Impact of VSO sentences on MT quality

To summarize:

  • VSO sentences negatively affect phrase-based SMT
  • Specific models needed to handle verb reordering of test sentences

slide-30
SLIDE 30

30

Chunk-based reordering lattices

  • Word lattices: represent input ambiguities
    (segmentation, decompounding, … ordering)
  • Thanks to our assumptions, we build compact reordering lattices
    and run non-monotonic decoding on them
  • Double strategy
      • for global reordering: lattices
      • for local reordering: standard (phrase-internal and distortion)

slide-34
SLIDE 34

34

Word-based vs chunk-based lattices

  1+6 reordering paths
  1+3 reordering paths
  Chunk-to-word expansion
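
The path counts can be reproduced on a toy input if the "1+N" labels are read as one plain-order path plus N reordered paths, with the larger count belonging to the word-based lattice. That reading is an interpretation, since the figure itself is not preserved. The sketch below uses a hypothetical sentence with placeholder tokens and moves only the verb.

```python
def reordering_paths(units, verb_idx, max_shift):
    """All orders obtained by moving units[verb_idx] 1..max_shift units right."""
    after = units[verb_idx + 1:]
    paths = []
    for shift in range(1, min(max_shift, len(after)) + 1):
        paths.append(units[:verb_idx] + after[:shift]
                     + [units[verb_idx]] + after[shift:])
    return paths

def expand(path):
    """Chunk-to-word expansion: flatten a path over chunks into words."""
    return [w for chunk in path for w in chunk]

# Hypothetical input: a one-word verb chunk followed by three two-word chunks.
chunks = [("v",), ("s1", "s2"), ("o1", "o2"), ("p1", "p2")]
# The same sentence with every word treated as its own unit.
words = [(w,) for chunk in chunks for w in chunk]

word_paths = reordering_paths(words, 0, max_shift=6)
chunk_paths = reordering_paths(chunks, 0, max_shift=6)
print(len(word_paths), len(chunk_paths))
print(expand(chunk_paths[0]))
```

Every chunk-level path expands to one of the word-level paths, so the chunk-based lattice keeps a subset of the word-based alternatives: those that do not break a chunk.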


slide-37
SLIDE 37

37

Chunk-based reordering lattices

Lattice representation of the rule:
  "move 1 or 2 chunks by up to 6 chunk positions right"

Simple edge weighting scheme:
  − 1 for the plain order path
  − 0.25 for the reordering paths
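
The rule and the weighting scheme can be illustrated by listing the alternatives a lattice would encode. The sketch below is a simplification: it enumerates whole paths and attaches one weight per path, whereas a real lattice shares edges between paths and carries the weights on edges. The function name and the example chunks are hypothetical.

```python
PLAIN_WEIGHT = 1.0     # weight for the plain order path
REORDER_WEIGHT = 0.25  # weight for the reordering paths

def weighted_paths(chunks, verb_idx, max_shift=6):
    """Return [(weight, word_sequence)] for the plain and reordered paths."""
    def words(order):
        return [w for chunk in order for w in chunk]

    paths = [(PLAIN_WEIGHT, words(chunks))]
    for block_size in (1, 2):  # move 1 or 2 chunks
        block = chunks[verb_idx:verb_idx + block_size]
        if len(block) < block_size:
            continue
        after = chunks[verb_idx + block_size:]
        for shift in range(1, min(max_shift, len(after)) + 1):
            order = chunks[:verb_idx] + after[:shift] + block + after[shift:]
            paths.append((REORDER_WEIGHT, words(order)))
    return paths

# Hypothetical chunked sentence with placeholder tokens.
chunks = [("v",), ("adv",), ("s1", "s2"), ("o",)]
for weight, sequence in weighted_paths(chunks, 0):
    print(weight, " ".join(sequence))
```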


slide-38
SLIDE 38

38

Chunk-based reordering lattices: Evaluation

  • Same Moses-based system
  • Training and tuning on verb-reordered data
  • Non-monotonic decoding of word lattices by Dyer et al. (2008)

slide-41
SLIDE 41

41

Chunk-based reordering lattices: Evaluation

%BLEU scores on Eval08-NW, Reo08-NW (specific set containing only VSO sentences) and Eval09-NW:

  • +0.9/0.6/0.8% abs. improvement (lattice over baseline)
  • Gap between baseline and oracle is largely but not totally filled

System                          DL  eval08nw  reo08nw  eval09nw
baseline                         6    43.10    46.90    48.13
reord. training + plain input    6    43.67    46.64    48.53
reord. training + lattice        4    44.04    47.51    48.96
oracle reord.                    4    44.36    48.25    49.26
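
The improvement figures follow directly from the table. The short script below recomputes the lattice gain over the baseline and compares it with the oracle gain; the scores are copied from the table, and the "share of gap" ratio is an added illustration rather than a number from the talk.

```python
scores = {
    "baseline": (43.10, 46.90, 48.13),
    "reord. training + plain input": (43.67, 46.64, 48.53),
    "reord. training + lattice": (44.04, 47.51, 48.96),
    "oracle reord.": (44.36, 48.25, 49.26),
}
test_sets = ("eval08nw", "reo08nw", "eval09nw")

def delta(system, reference="baseline"):
    """Absolute %BLEU difference between two systems on each test set."""
    return tuple(round(a - b, 2)
                 for a, b in zip(scores[system], scores[reference]))

gain = delta("reord. training + lattice")  # lattice system vs baseline
gap = delta("oracle reord.")               # oracle reordering vs baseline

for name, g, o in zip(test_sets, gain, gap):
    print(f"{name}: lattice {g:+.2f}, oracle {o:+.2f}, share of gap {g / o:.0%}")
```

The gains round to the +0.9/0.6/0.8 quoted on the slide, and in each case the lattice system recovers only part of the oracle gain.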

slide-42
SLIDE 42

42

Conclusions

  • We have focused on a class of significant reorderings
  • analysed their distribution and measured their impact on SMT
  • developed techniques to:
      • verb-reorder the parallel data,
      • represent likely verb movements in test sentences
  • Positive results (+0.8% BLEU on Nist09-NW) but further improvement possible

slide-43
SLIDE 43

43

Conclusions

Future work will include:

  • devising a more discriminative weighting scheme for lattices
  • evaluating with reordering-specific metrics by Birch & al. (2010)
  • developing linguistically informed reordering constraints, as an alternative to lattices

slide-44
SLIDE 44

44

Appendices


slide-45
SLIDE 45

45

Improved MT outputs


slide-46
SLIDE 46

46

Evaluation (also on No-reo-08)

%BLEU scores on: Eval08-NW, Reo08-NW (only sentences needing reordering), No-reo08-NW (the other sentences), and Eval09-NW:

System                          DL  eval08nw  reo08nw  no-reo08nw  eval09nw
baseline                         6    43.10    46.90     40.68      48.13
reord. training + plain input    6    43.67    46.64     41.79      48.53
reord. training + lattice        4    44.04    47.51     41.83      48.96
oracle reord.                    4    44.36    48.25     41.79      49.26

slide-47
SLIDE 47

47

Evaluation (with L-weights)

%BLEU scores on: Eval08-NW, Reo08-NW (only sentences needing reordering), No-reo08-NW (the other sentences), and Eval09-NW

Length-based edge weighting scheme (L-weights):

System                                  DL  eval08nw  reo08nw  no-reo08nw  eval09nw
baseline                                 6    43.10    46.90     40.68      48.13
reord. training + plain input            6    43.67    46.64     41.79      48.53
reord. training + lattice                4    44.04    47.51     41.83      48.96
reord. training + lattice (L-weights)    4    44.18    47.40     42.13      49.06
oracle reord.                            4    44.36    48.25     41.79      49.26