
Machine Translation: Classical and Statistical Approaches

Session 2: Syntactic Transfer

Jonas Kuhn
Universität des Saarlandes, Saarbrücken
The University of Texas at Austin
jonask@coli.uni-sb.de

DGfS/CL Fall School 2005, Ruhr-Universität Bochum, September 19-30, 2005

Jonas Kuhn: MT 2

Session 2: Syntactic Transfer

Syntactic Transfer
  • Steps: Analysis, Transfer, Generation
  • How are the various types of divergence dealt with?

For lab exercise: Quick Prolog Intro/Recap
  • Basic Prolog Terminology and Syntax
  • Lists and Definite Clause Grammars (DCGs)

Syntactic Transfer

  • Source-language (SL) syntactic analysis: construct the SL analysis tree
  • Transfer: tree-to-tree transformations are applied recursively to the SL tree to construct the target-language (TL) tree
      - recursive, non-deterministic top-down process
  • (No syntactic generation required in TL)
  • Morphological generation
  • Consolidation: applying TL grammar constraints to the TL structure to enforce grammaticality (and fill in underspecified values)

[Diagram: String → Syntactic Structure → Syntactic Structure → String]

Syntactic Transfer: Resources

The translation process is governed by three sets of rules:

  • Standard grammar specification for source-language analysis (e.g., context-free grammars)
  • Transfer “grammar”: transformation rules
      - Include translation variables (e.g., tv(X) in Trujillo’s tree notation)
      - The set of transformation rules is applied recursively to each occurrence of a translation variable
  • Standard grammar specification for target-language generation


Example

English grammar:
    NP  → Det N1
    N1  → Adj N
    Det → a
    Adj → delicious
    N   → soup

English tree: [NP [Det a] [N1 [Adj delicious] [N soup]]]

Transfer grammar:
    NP(tv(X), tv(Y)) ↔ NP(tv(X), tv(Y))
    N1(Adj tv(A), N tv(B)) ↔ N1(N tv(B), Adj tv(A))
    Det a ↔ Det una
  • delicious ↔ deliciosa
  • soup ↔ sopa

Spanish grammar:
    NP  → Det N1
    N1  → N Adj
    Det → una
    Adj → deliciosa
    N   → sopa

Spanish tree: [NP [Det una] [N1 [N sopa] [Adj deliciosa]]]


Tree-to-tree transformation

Transformations: Prolog notation

Actual Prolog code by Trujillo (the structural analysis differs slightly from the one in the textbook)

We will come back to the details of this notation…

[np|_]/_ dtrs [ DetE, N1E ] <==>
[np|_]/_ dtrs [ DetS, N1S ] :-
    DetE <==> DetS,
    N1E <==> N1S.

[n1|_]/_ dtrs [ [ap|_]/_ dtrs [ AdjvE ], [n1|_]/_ dtrs [ NE ]] <==>
[n1|_]/_ dtrs [ [n1|_]/_ dtrs [ NS ], [ap|_]/_ dtrs [ AdjvS ]] :-
    AdjvE <==> AdjvS,
    NE <==> NS.

[n|_]/soup <==> [n|_]/sopa.
[adjv|_]/delicious <==> [adjv|_]/deliciosa.
[det|_]/a <==> [det|_]/una.

Divergences in syntactic transfer

Thematic divergence

En: You like her
Sp: Ella te gusta

Divergences in syntactic transfer

Head switching

En: The baby just ate
Sp: El bebé acaba de comer

Divergences in syntactic transfer

Structural

En: Luisa entered the house
Sp: Luisa entró a la casa
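One possible way to encode this structural divergence as a transfer rule is sketched below, in the spirit of the `<==>` notation shown earlier. This is an illustration under stated assumptions, not Trujillo's actual rule: the operator priority of `<==>`, the helper predicates `yield/2`, `yield_list/2` and `translate_vp/1`, the use of lemmas (`enter`, `entrar`) instead of inflected forms, and the hard-wired determiner `la` are all assumptions made for this example.

```prolog
:- op(450, xfx, dtrs).
:- op(700, xfx, <==>).     % assumed priority, not given in the slides

% Structural divergence, tied to the verb pair enter/entrar:
% English V + NP  <==>  Spanish V + PP, with the preposition "a" inserted.
[vp|_]/_ dtrs [[v|_]/enter, NPE] <==>
[vp|_]/_ dtrs [[v|_]/entrar, [pp|_]/_ dtrs [[p|_]/a, NPS]] :-
    NPE <==> NPS.

[np|_]/_ dtrs [DetE, NE] <==> [np|_]/_ dtrs [DetS, NS] :-
    DetE <==> DetS,
    NE   <==> NS.

[det|_]/the <==> [det|_]/la.     % gender agreement is hard-wired here
[n|_]/house <==> [n|_]/casa.

% Hypothetical helper: read the words off the leaves of a tree.
yield(_ dtrs Ds, Ws) :- !, yield_list(Ds, Ws).
yield(_/W, [W]).

yield_list([], []).
yield_list([D|Ds], Ws) :-
    yield(D, W1),
    yield_list(Ds, W2),
    append(W1, W2, Ws).

% Transfer a hand-built English VP tree for "enter the house".
translate_vp(Words) :-
    Src = [vp]/_ dtrs [[v]/enter, [np]/_ dtrs [[det]/the, [n]/house]],
    Src <==> Tgt,
    yield(Tgt, Words).
```

The query `?- translate_vp(Ws).` yields `Ws = [entrar, a, la, casa]`; producing the inflected form entró would be left to morphological generation.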


Divergences in syntactic transfer

Categorial

En: a little bread
Sp: un poco de pan

Divergences in syntactic transfer

Lexical gaps (conflational divergence)

En: Camillo got up early
Sp: Camillo madrugó

Divergences in syntactic transfer

Lexicalization (lexical divergence)

En: Susan swam across the channel
Sp: Susan cruzó el canal nadando

Divergences in syntactic transfer

Collocational

En: Jan made a decision
Sp: Jan tomó/*hizo una decisión

Divergences in syntactic transfer

Idiomatic

En: Socrates kicked the bucket
Sp: Socrates estiró la pata

Quick Prolog Intro/Recap

  • Compare: Blackburn, Bos & Striegnitz: Learn Prolog Now! [www.coli.uni-sb.de/~kris/learn-prolog-now/]
  • Freely available (open-source) compiler: SWI Prolog
      - Developed since 1987 at the University of Amsterdam, The Netherlands
      - http://www.swi-prolog.org/
      - Available for MS-Windows, Mac, and Linux
  • Logic programming, i.e., a Prolog program is (mostly) not a sequence of commands, but a set of facts and rules used to prove or refute new facts


Interpreter and knowledge base

How we communicate with the system:

  • Knowledge base (file we can edit)

woman(mia).
playsAirGuitar(jody).

  • Interpreter (shell in which we can type queries)

?- woman(mia).
Yes

In order to use a knowledge base, we have to load or consult one:

?- ['my-knowledge-base-file.pl'].

  • With SWI running under MS Windows, the File menu can be used

To quit the interpreter at the end of your session, type

?- halt.

Terminology

Knowledge base:
  • Facts
  • Rules
      - Inference rules to derive new facts from given facts

listensToMusic(yolanda) :- happy(yolanda).

      - Read: “If Yolanda is happy, then she listens to music.”
  • Facts and rules define predicates
      - Examples: happy, listensToMusic

Interpreter:
  • Query
      - Clause for which we ask: is there a proof from the knowledge base?

Prolog rules

  • A predicate definition may consist of several clauses
      - Disjunctive interpretation

playsAirGuitar(butch) :- happy(butch).
playsAirGuitar(butch) :- listensToMusic(butch).

  • Each clause ends in a period
  • The condition part (right-hand side) of a rule may contain several terms
      - Conjunctive interpretation

playsAirGuitar(vincent) :- listensToMusic(vincent), happy(vincent).

  • The consequence part (left-hand side) may only contain one term

Variables

Capitalized identifiers are interpreted as variables (undergoing unification)

woman(mia).
woman(jody).
woman(yolanda).
loves(vincent,mia).
loves(marcellus,mia).
loves(pumpkin,honey_bunny).
loves(honey_bunny,pumpkin).
jealous(X,Y) :- loves(X,Z), loves(Y,Z).

?- woman(X).
X=mia ;
X=jody

?- jealous(marcellus,W).
W=vincent

Hitting semicolon tells Prolog to find alternative solutions (backtracking)
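The jealous/2 query above shows only the first answer. The following minimal sketch reuses the loves/2 facts from the slide to show what further backtracking would return. findall/3 is a standard built-in; the predicates all_rivals/2 and jealous_of_other/2 are hypothetical additions for illustration and are not part of the original slides.

```prolog
% Facts and rule copied from the slide.
loves(vincent, mia).
loves(marcellus, mia).
loves(pumpkin, honey_bunny).
loves(honey_bunny, pumpkin).

jealous(X, Y) :- loves(X, Z), loves(Y, Z).

% Collect every answer that typing ";" would enumerate one by one.
all_rivals(X, Ws) :- findall(W, jealous(X, W), Ws).

% Hypothetical variant: X and Y must be two different people.
jealous_of_other(X, Y) :- loves(X, Z), loves(Y, Z), X \== Y.
```

The query `?- all_rivals(marcellus, Ws).` gives `Ws = [vincent, marcellus]`: since nothing forces X and Y to differ, marcellus also comes out as jealous of himself. The variant jealous_of_other/2 filters that answer out.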


Variables

  • The match predicate “=” can be used to state that two things are the same

jealous(X,Y) :- loves(X,U), loves(Y,V), U=V.

  • Normally, variables are simply re-used in predicate definitions in order to express that two argument positions have to be the same
  • When a variable is used just once, this is often due to a typo
      - Prolog will issue a warning for variables used only once in a clause
      - To suppress the warning, a leading underscore can be used

in_love(X) :- loves(X,_Someone).

Variables

Special variable: _ (the “anonymous” variable)
  • Can match any arbitrary value, even if used several times in the same clause!

in_love(X) :- loves(X,_).
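The difference between several occurrences of `_` and a repeated named variable can be checked with a small sketch. The facts are taken from the earlier slide; the two predicate names are hypothetical and chosen only for this illustration.

```prolog
loves(vincent, mia).
loves(pumpkin, honey_bunny).
loves(honey_bunny, pumpkin).

% Each "_" is a fresh variable: this asks whether anybody loves anybody.
someone_loves_someone :- loves(_, _).

% P is an ordinary named variable: both argument positions must be
% filled by the same person.
someone_loves_self :- loves(P, P).
```

With these facts, `?- someone_loves_someone.` succeeds, while `?- someone_loves_self.` fails, because no fact has the same person in both positions.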

Prolog survival guide

  • Clauses (facts/rules/queries) end in a period
  • Uppercase identifiers are variables; functors/atoms have to start with a lowercase letter!
  • Prolog variables are logical variables tied to a particular value within the scope of a clause (unlike variables in other programming languages, where values of variables can be changed)
  • Don’t forget to consult your knowledge base (and to re-consult it after making changes)
  • To exit the Prolog interpreter, type

?- halt.

(and don’t forget the period!)

Prolog lists

  • Important data structure for linguistic tasks
  • List elements can be enumerated within brackets

[fred, ann, pete]

  • Special case: the empty list: []
  • For flexible access to list elements, Prolog has a built-in operator for decomposing lists into head and tail: the “|” operator

?- [ X | Y ] = [fred, ann, pete].
X = fred
Y = [ann, pete]


Prolog lists

Lists are typically manipulated in recursive predicates

trans(eins,one).
trans(zwei,two).
trans(drei,three).

trans_list([],[]).
trans_list([H|T],[H1|T1]) :-
    trans(H,H1),
    trans_list(T,T1).

Example application:

?- trans_list([zwei,eins,drei],X).
X = [two,one,three]
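Because trans_list/2 is a relation rather than a function, the same definition can also be queried with the second argument given and the first one unknown. The sketch below repeats the slide's clauses; the two wrapper predicates are hypothetical names added only to make the direction of use explicit.

```prolog
trans(eins, one).
trans(zwei, two).
trans(drei, three).

trans_list([], []).
trans_list([H|T], [H1|T1]) :-
    trans(H, H1),
    trans_list(T, T1).

% Hypothetical wrappers that only fix the direction of use.
german_to_english(Ds, Es) :- trans_list(Ds, Es).
english_to_german(Es, Ds) :- trans_list(Ds, Es).
```

For example, `?- english_to_german([three,one], Ds).` gives `Ds = [drei, eins]`. This kind of bidirectional use is what the `<==>` transfer rules rely on as well.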

Built-in list predicates

Some important, generic list predicates are predefined in most Prolog versions (“built-in”)

  • member/2
      - member(X,L) is true if and only if X is an element of the list L
      - Examples: member(b,[a,b,c]), member([2,3],[1,[2,3]])
  • append/3
      - append(L1,L2,L3) is true if and only if L3 is the concatenation of lists L1 and L2
      - Examples: append([a],[b,c],[a,b,c]), append([],[1,2],[1,2])
  • reverse/2
      - reverse(L1,L2) is true if and only if L1 is the reversed version of list L2
      - Examples: reverse([a,b,c],[c,b,a])
  • length/2
      - length(L,N) is true if and only if the integer N is the length (number of elements) of list L
      - Examples: length([a,b,c],3), length([],0)
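These predicates can also be called with unbound arguments, which makes them useful as building blocks. The sketch below assumes SWI-Prolog, where append/3 and reverse/2 come from the autoloaded lists library; the predicate names splits/2, last_element/2 and palindrome/1 are hypothetical.

```prolog
% All ways of splitting a list into a prefix and a suffix, via append/3.
splits(List, Splits) :-
    findall(Prefix-Suffix, append(Prefix, Suffix, List), Splits).

% Last element of a list, defined with append/3.
last_element(List, X) :- append(_, [X], List).

% A list is a palindrome if it equals its own reversal.
palindrome(List) :- reverse(List, List).
```

For example, `?- splits([a,b], S).` gives `S = [[]-[a,b], [a]-[b], [a,b]-[]]`, and `?- last_element([a,b,c], X).` gives `X = c`.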

Definite Clause Grammars (DCGs)

  • Simple built-in grammar formalism
  • Rewrite rules for (augmented) context-free grammars

s --> np, vp.
np --> det, n.
vp --> v, np.
vp --> v.
det --> [the].
n --> [dog].
v --> [barks].

Definite Clause Grammars (DCGs)

Internally, the rewrite rule notation is compiled out as follows (using a “difference list notation” for phrase coverage):

s(X,Z) :- np(X,Y), vp(Y,Z).
np(X,Z) :- det(X,Y), n(Y,Z).
vp(X,Z) :- v(X,Y), np(Y,Z).
vp(X,Z) :- v(X,Z).
det([the|T],T).
n([dog|T],T).
v([barks|T],T).


Parsing with the DCG notation

For asking queries, we have to know about the internal encoding: two arguments for the difference list notation

?- s([the,dog,barks],[]).
?- np([the,dog],[]).

Definition of a simple parse (or recognition) predicate:

parse(Sent) :- s(Sent,[]).
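As an alternative to calling s/2 directly, most Prolog systems (including SWI-Prolog) provide the built-in phrase/2, which hides the two extra arguments. The sketch below restates the slide's grammar and defines parse/1 through phrase/2; all_sentences/1 is a hypothetical helper added for illustration.

```prolog
s --> np, vp.
np --> det, n.
vp --> v, np.
vp --> v.
det --> [the].
n --> [dog].
v --> [barks].

% Recognition through the standard phrase/2 interface.
parse(Sent) :- phrase(s, Sent).

% Enumerate every sentence the grammar licenses (finite for this grammar).
all_sentences(Ss) :- findall(S, phrase(s, S), Ss).
```

Here `?- parse([the,dog,barks]).` succeeds, and `?- all_sentences(Ss).` gives `Ss = [[the,dog,barks,the,dog], [the,dog,barks]]`, which shows that the same grammar can be used for generation as well as recognition.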

“Augmented” DCGs

  • The atomic category symbols can be augmented with one or more arguments
  • We can use variables to encode linguistic “features”

s --> np(Num), vp(Num).
np(Num) --> det(Num), n(Num).
vp(Num) --> v(Num), np(_).
vp(Num) --> v(Num).
det(_) --> [the].
n(sg) --> [dog].
n(pl) --> [dogs].
v(sg) --> [barks].
v(pl) --> [bark].
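The effect of the shared Num variable can be tried out with a short sketch. The grammar is the one from the slide; grammatical/1 is a hypothetical helper that wraps the built-in phrase/2.

```prolog
s --> np(Num), vp(Num).
np(Num) --> det(Num), n(Num).
vp(Num) --> v(Num), np(_).
vp(Num) --> v(Num).
det(_) --> [the].
n(sg) --> [dog].
n(pl) --> [dogs].
v(sg) --> [barks].
v(pl) --> [bark].

% Succeeds only if the subject NP and the verb agree in number.
grammatical(Sent) :- phrase(s, Sent).
```

`?- grammatical([the,dogs,bark]).` succeeds, whereas `?- grammatical([the,dogs,barks]).` fails: the subject binds Num to pl, and v(pl) does not cover barks. The object NP is unconstrained (np(_)), so `[the,dogs,bark,the,dog]` is accepted.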

“Augmented” DCGs

Special use of an argument position in DCGs: building up a syntactic representation

:- op(450,xfx,dtrs).

s( [s] dtrs [NPtree, VPtree]) --> np(NPtree), vp(VPtree).
vp([vp] dtrs [Vtree, NPtree] ) --> v(Vtree), np(NPtree).
np([np] dtrs [DETtree, Ntree]) --> det(DETtree), n(Ntree).
det([det] dtrs [the] ) --> [the].
n( [n] dtrs [cat] ) --> [cat].
n( [n] dtrs [dog] ) --> [dog].
v( [v] dtrs [chase]) --> [chases].

The op directive defines the letter sequence “dtrs” as a special infix operator (similar to + or * in arithmetic expressions)

“Augmented” DCGs

This notation should be used with a special “pretty print” predicate

parse(String) :-
    s(Tree,String,[]),
    pretty_print(3,Tree).

pretty_print(Tabs,A) :-
    atom(A),
    tab(Tabs), write(A), nl.
pretty_print(Tabs, M dtrs [W] ) :- !,
    tab(Tabs), write(M), write(' dtrs '), write([W]), nl.
pretty_print(Tabs, M dtrs D) :-
    tab(Tabs), write(M), write(' dtrs '), nl,
    tab(Tabs), write(' ['), nl,
    Tabs1 is Tabs + 8,
    pretty_print_list(Tabs1,D),
    tab(Tabs), write(' ]'), nl.

pretty_print_list(_Tabs,[]).
pretty_print_list(Tabs,[H|T]) :-
    pretty_print(Tabs,H),
    pretty_print_list(Tabs,T).


Effect of pretty_print

?- parse([the,dog,chases,the,cat]).
   [s] dtrs
    [
           [np] dtrs
            [
                   [det] dtrs [the]
                   [n] dtrs [dog]
            ]
           [vp] dtrs
            [
                   [v] dtrs [chase]
                   [np] dtrs
                    [
                           [det] dtrs [the]
                           [n] dtrs [cat]
                    ]
            ]
    ]

“Augmented” DCGs

  • Arguments for constructing parse trees can be combined with other arguments for linguistic features
  • Besides a syntactic tree, we can also build a more abstract semantic representation (possibly using additional operators like dtrs, e.g. a slash “/” following the category)
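A minimal sketch of the first point, combining a number feature with a tree-building argument, might look as follows. The grammar is a small hypothetical fragment written for this illustration (intransitive sentences only, with lemmas stored at the leaves), and parse_tree/2 is an assumed helper name.

```prolog
:- op(450, xfx, dtrs).

% First argument: number feature. Second argument: the tree being built.
s([s] dtrs [NP, VP])        --> np(Num, NP), vp(Num, VP).
np(Num, [np] dtrs [Det, N]) --> det(Num, Det), n(Num, N).
vp(Num, [vp] dtrs [V])      --> v(Num, V).
det(_,  [det] dtrs [the])   --> [the].
n(sg,   [n] dtrs [dog])     --> [dog].
n(pl,   [n] dtrs [dog])     --> [dogs].
v(sg,   [v] dtrs [bark])    --> [barks].
v(pl,   [v] dtrs [bark])    --> [bark].

% Parse a word list and return its tree; fails on agreement errors.
parse_tree(Sent, Tree) :- phrase(s(Tree), Sent).
```

`?- parse_tree([the,dogs,bark], T).` returns a tree with the lemmas dog and bark at its leaves, while `?- parse_tree([the,dog,bark], T).` fails because of the number mismatch. The number feature is checked during parsing but is not recorded in the tree in this sketch.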

Coming back to syntactic transfer

We can now understand most aspects of Trujillo’s Prolog notation

[np|_]/_ dtrs [ DetE, N1E ] <==>
[np|_]/_ dtrs [ DetS, N1S ] :-
    DetE <==> DetS,
    N1E <==> N1S.

[n1|_]/_ dtrs [ [ap|_]/_ dtrs [ AdjvE ], [n1|_]/_ dtrs [ NE ]] <==>
[n1|_]/_ dtrs [ [n1|_]/_ dtrs [ NS ], [ap|_]/_ dtrs [ AdjvS ]] :-
    AdjvE <==> AdjvS,
    NE <==> NS.

[n|_]/soup <==> [n|_]/sopa.
[adjv|_]/delicious <==> [adjv|_]/deliciosa.
[det|_]/a <==> [det|_]/una.
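To try these rules out, they need operator declarations and a source tree. The sketch below supplies both under stated assumptions: the priority chosen for `<==>` is a guess (the slides only give the declaration for dtrs), and english_tree/1, yield/2, yield_list/2 and translate/1 are hypothetical helpers added for this example. The five transfer clauses themselves are the ones shown above.

```prolog
:- op(450, xfx, dtrs).     % as declared on the earlier slide
:- op(700, xfx, <==>).     % assumed priority, not given in the slides

[np|_]/_ dtrs [DetE, N1E] <==>
[np|_]/_ dtrs [DetS, N1S] :-
    DetE <==> DetS,
    N1E  <==> N1S.

[n1|_]/_ dtrs [[ap|_]/_ dtrs [AdjvE], [n1|_]/_ dtrs [NE]] <==>
[n1|_]/_ dtrs [[n1|_]/_ dtrs [NS], [ap|_]/_ dtrs [AdjvS]] :-
    AdjvE <==> AdjvS,
    NE    <==> NS.

[n|_]/soup <==> [n|_]/sopa.
[adjv|_]/delicious <==> [adjv|_]/deliciosa.
[det|_]/a <==> [det|_]/una.

% Hypothetical helper: read the words off the leaves of a tree.
yield(_ dtrs Ds, Ws) :- !, yield_list(Ds, Ws).
yield(_/W, [W]).

yield_list([], []).
yield_list([D|Ds], Ws) :-
    yield(D, W1),
    yield_list(Ds, W2),
    append(W1, W2, Ws).

% Hypothetical English analysis tree for "a delicious soup".
english_tree([np]/_ dtrs [[det]/a,
                          [n1]/_ dtrs [[ap]/_ dtrs [[adjv]/delicious],
                                       [n1]/_ dtrs [[n]/soup]]]).

% Transfer the English tree and return the Spanish word sequence.
translate(Words) :-
    english_tree(E),
    E <==> S,
    yield(S, Words).
```

The query `?- translate(Ws).` gives `Ws = [una, sopa, deliciosa]`: the second rule swaps the adjective phrase and the noun, and the lexical rules replace the words at the leaves.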