SLIDE 1

3GTM: A Third-Generation Translation Memory

Fabrizio Gotti†, Philippe Langlais†, Elliott Macklovitch†, Didier Bourigault⋆, Benoit Robichaud‡ and Claude Coulombe‡

† RALI, Département d’informatique et de recherche opérationnelle, Université de Montréal
‡ Lingua Technologies Inc., Montréal
⋆ ERSS-CNRS, Toulouse

CLiNE — August 26th 2005

RALI, Lingua Technologies Inc., ERSS-CNRS · 3GTM · CLiNE, August 26th 2005

SLIDE 2

Outline

1. Overview of the 3GTM project
2. Experimental Setting
3. Experiments
     • Sentence Coverage
     • Random Substring Coverage
     • Chunk-Based Coverage
     • Tree-Phrase Coverage
4. Discussion

SLIDE 3

Overview of the 3GTM project

Translation Memory

A computer-assisted tool which eases the recycling of past translations

  • 1st-generation TM: never translates again a sentence that has already been translated
      Full-sentence repetition is a rather marginal phenomenon
  • 2nd-generation TM: two source sentences might be considered identical if they differ only slightly (named entities, edit distance, etc.)
      Fuzzy matching
  • 3rd-generation TM (3GTM): recycles at a sub-sentential level

A project currently funded by PRECARN
Lingua Technologies Inc., RALI, Transetix Inc.
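To make the 2nd-generation idea concrete, here is a minimal sketch of fuzzy matching based on a word-level edit distance. The function names, the similarity formula and the 0.7 threshold are illustrative assumptions; they are not taken from the talk or from any particular TM product.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete wa
                           cur[j - 1] + 1,              # insert wb
                           prev[j - 1] + (wa != wb)))   # substitute, or keep if equal
        prev = cur
    return prev[-1]


def fuzzy_match(query, memory, threshold=0.7):
    """Return (score, source, target) for the best TM entry above the threshold, else None."""
    q = query.split()
    best = None
    for source, target in memory:
        s = source.split()
        # Assumed similarity: 1 minus the distance normalised by the longer sentence.
        score = 1 - edit_distance(q, s) / max(len(q), len(s), 1)
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best


# Toy memory (hypothetical data): one sentence pair.
memory = [("il travaille dans la chocolaterie", "he works in the chocolate factory")]
print(fuzzy_match("elle travaille dans la chocolaterie", memory))
```

One substituted word out of five gives a similarity of 0.8, so the stored pair is proposed even though the sentence was never translated verbatim.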

SLIDE 4

3GTM in a Screenshot

SLIDE 6

Experimental Setting

  • English-French language pair: querying the French side, proposing English material
  • TM populated with Canadian Hansard texts
  • Coverage statistics computed over a test corpus
      • help appreciate the number of useful units that can be queried/found
      • the easiest thing to implement at an early stage of a project
      • ultimately, we target human evaluation runs (or simulations)

SLIDE 7

Training Material

Number of sentences, tokens and types in the training corpus

Language           English        French
Nb. sentences      1 753 443      1 753 443
Nb. tokens         31 637 775     34 150 039
Nb. types          85 810         106 987
Avg. word/sent.    17.5           19.3

SLIDE 8

Test Material

  • 1000 sentences (Hansard corpus), chronologically distinct from the training material
  • French = query or source language
  • English = output or target language

SLIDE 9

Tools used

  • JAPA: an in-house sentence aligner, http://rali.iro.umontreal.ca/Japa/
  • LUCENE: a freely available full-featured text search engine, http://lucene.apache.org
  • SIMAC: an in-house implementation of a word aligner (Simard and Langlais, 2003)
  • GIZA++: a tool to train translation models (Och and Ney, 2000)
  • GRAMMATICUM: a constituent-based parser (Coulombe, 1991)
  • SYNTEX: a dependency-based parser (Bourigault and Fabre, 2000)

SLIDE 11

Full Sentence Coverage

Using Verbatim Match

Nb. of sentences                      1000
Nb. of sent. found verbatim           148
Avg. size of sent. in test corpus     19.2
Avg. size of sent. found verbatim     11.1

14.8% because of Hansard idioms:

  • I don’t know
  • Mr. Speaker: Order, please.

Within a TM ≡ TSRALI.com (6.6 M. pairs of phrases), we only found 11 out of 1000 sentences of the EuroParl corpus.
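The verbatim-match figures above (148 / 1000 = 14.8%) come from an exact lookup of each test sentence on the source side of the memory. A minimal sketch of that measurement follows; the function name and the toy data are assumptions made for illustration, and the actual experiment queried a LUCENE index rather than a Python set.

```python
def verbatim_coverage(test_sentences, tm_sources):
    """Return (fraction of test sentences found verbatim, average length of those found)."""
    index = set(tm_sources)                      # exact-match lookup table
    found = [s for s in test_sentences if s in index]
    avg_len = sum(len(s.split()) for s in found) / len(found) if found else 0.0
    return len(found) / len(test_sentences), avg_len


# Toy data (hypothetical): short formulaic sentences are the ones that repeat.
tm_sources = ["je ne sais pas", "à l' ordre , s' il vous plaît"]
test = ["je ne sais pas", "le comité se réunit demain matin"]
print(verbatim_coverage(test, tm_sources))
```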

SLIDE 13

Random Substring Coverage

Protocol

1. Query the TM with any sequence of the source (French) material (length ≥ 2)
     A query found at least once is a valid one
2. Compute a source (French) optimal coverage
     Maximizing the source coverage while minimizing the number of queries
3. Consider the associated target (English) material
     By following the word alignment
4. Compute a target (English) optimal coverage
     Wait for details
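Steps 1 and 2 of the protocol can be sketched as follows. The slides do not spell out how the optimal coverage was computed, so the dynamic program below is only one plausible reading of "maximizing the source coverage while minimizing the number of queries"; the function names and the half-open span convention are assumptions.

```python
def substrings(tokens, min_len=2):
    """All contiguous token spans (i, j), j exclusive, of at least min_len words."""
    n = len(tokens)
    return [(i, j) for i in range(n) for j in range(i + min_len, n + 1)]


def optimal_coverage(n, valid_spans):
    """Pick non-overlapping spans covering the most tokens, using as few spans as possible."""
    ending_at = {}
    for i, j in valid_spans:
        ending_at.setdefault(j, []).append(i)
    # best[k] = (tokens covered, -number of spans, chosen spans) for the first k tokens
    best = [(0, 0, [])]
    for k in range(1, n + 1):
        cand = best[k - 1]                       # option: leave token k-1 uncovered
        for i in ending_at.get(k, []):
            cov, neg, spans = best[i]
            option = (cov + k - i, neg - 1, spans + [(i, k)])
            if option[:2] > cand[:2]:            # more coverage first, then fewer spans
                cand = option
        best.append(cand)
    return best[n][2]


tokens = "m. mcinnes : je m' excuse".split()
print(len(substrings(tokens)))                   # number of candidate queries for 6 tokens
# Spans assumed to have been found in the TM (hypothetical outcome of step 1).
valid = [(0, 2), (0, 3), (2, 6), (3, 6), (4, 6)]
print(optimal_coverage(len(tokens), valid))
```

A sentence of n tokens yields n(n-1)/2 candidate queries of length ≥ 2, which is why the number of TM look-ups grows quickly with sentence length.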

SLIDE 17

Random Substring Coverage

S: Il travaille dans la chocolaterie
T: He works in a chocolate factory
q: la chocolaterie

Match:
S: Charlie_1 et_2 [la_3 chocolaterie_4,5]
T: Charlie and [the chocolate factory]
m: the chocolate factory
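The example can be replayed with a small sketch: locate the query in a TM source sentence, then follow the word alignment to collect the target material. The function names are hypothetical, and the alignment is written with 0-based positions, whereas the slide numbers target words from 1.

```python
def find(tokens, query):
    """Index of the first occurrence of the token list `query` in `tokens`, or -1."""
    for i in range(len(tokens) - len(query) + 1):
        if tokens[i:i + len(query)] == query:
            return i
    return -1


def target_material(query, tm_source, tm_target, alignment):
    """Target words aligned to the matched source span, in target order (None if no match)."""
    start = find(tm_source, query)
    if start < 0:
        return None
    positions = sorted({t for s in range(start, start + len(query))
                        for t in alignment.get(s, [])})
    return " ".join(tm_target[t] for t in positions)


tm_source = ["Charlie", "et", "la", "chocolaterie"]
tm_target = ["Charlie", "and", "the", "chocolate", "factory"]
# source position -> target positions (chocolaterie is aligned to two words)
alignment = {0: [0], 1: [1], 2: [2], 3: [3, 4]}

print(target_material(["la", "chocolaterie"], tm_source, tm_target, alignment))
```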

SLIDE 18

Random Substring Coverage

Coverage statistics

Metric                     Source    Target
Optimal coverage           98.8%     55.8%
Cov. unit size (words)     4.09      2.98
Number of cov. units       4.65      3.23

  • Avg. nb. LUCENE queries per sentence: 226

SLIDE 19

Random Substring Coverage

The unsustainable Truth

S: m. mcinnes : je m’ excuse
T: mr . mcinnes : i apologize

↗          m.    mcinnes    :     je       m’      excuse
excuse     –     –          –     –        –       –
m’         –     –          –     –        –       446
je         –     –          –     –        3719    347
:          –     –          –     12330    185     107
mcinnes    –     –          43    4
m.         –     69         43    4
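A table of this kind can be produced by counting, for every pair of start and end positions, how often the corresponding substring occurs in the memory. The sketch below assumes that reading of the table (row = first word of the query, column = last word) and counts n-grams in a tiny in-memory corpus; the real experiment relied on LUCENE queries, and all names and data here are hypothetical.

```python
from collections import Counter


def ngram_counts(corpus, max_len):
    """Count every contiguous word sequence of 2..max_len words in the corpus."""
    counts = Counter()
    for sentence in corpus:
        toks = sentence.split()
        for i in range(len(toks)):
            for j in range(i + 2, min(i + max_len, len(toks)) + 1):
                counts[tuple(toks[i:j])] += 1
    return counts


def count_matrix(tokens, counts):
    """matrix[i][j] = frequency of tokens[i..j] inclusive; None when fewer than 2 words."""
    n = len(tokens)
    return [[counts.get(tuple(tokens[i:j + 1]), 0) if j > i else None
             for j in range(n)] for i in range(n)]


# Toy memory (hypothetical data).
corpus = ["m. mcinnes : je m' excuse", "m. mcinnes : merci", "je m' excuse"]
tokens = "m. mcinnes : je m' excuse".split()
matrix = count_matrix(tokens, ngram_counts(corpus, max_len=len(tokens)))
for word, row in zip(tokens, matrix):
    print(word, row)
```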

SLIDE 21

Random Substring Coverage

The unsustainable Truth

S: m. mcinnes : je m’ excuse
T: mr . mcinnes : i apologize

  • m. mcinnes : (69)
      42  mr . mcinnes :
      17  mr . mcinnes )
       2  mr . mcinnes ) ,
       1  mr . mcinnes (
       1  mr . mcinnes : it not is required reading , mr . speaker
       1  mr . mcinnes moved
       1  mr . mcinnes moves
  • : je m’ excuse (107)
      46  : i am sorry ,
      16  : i am sorry
      14  : i am sorry to
      14  : i apologize ,
       8  : i apologize for interrupting
       8  : i apologize to
       8  : i do apologize
       6  : excuse me ,
       6  : : i apologize
       6  : i beg your pardon

SLIDE 25

Chunk-Based Coverage

Querying the Memory with Chunks : Pros

  • Speeding up
      By limiting the number of queries
  • Improving the target material extraction
      By taking into account chunk boundaries computed on both sides
  • Avoiding overwhelming the user with too much data
      Fewer queries, reduced target material overlaps

SLIDE 26

Chunk-Based Coverage

Protocol

1. The test material was first chunked using a tool from Lingua Technologies Inc. (Coulombe, 1991)
2. 28.35 chunks per source (French) sentence on average
3. 11.7 chunks if we only consider those of size ≥ 2
4. We used those selected chunks to query the TM
5. Everything else was kept identical to the previous experiment
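The effect of steps 3 and 4 on the number of queries can be illustrated with a short sketch. The chunker itself is not reproduced here: the chunk segmentation below is supplied by hand and is purely hypothetical, as is the function name.

```python
def chunk_queries(chunks, min_len=2):
    """Turn chunks (lists of tokens) of at least min_len words into query strings."""
    return [" ".join(chunk) for chunk in chunks if len(chunk) >= min_len]


# Hand-written chunking of one sentence (an assumption, not the output of the real tool).
chunks = [["m.", "mcinnes", ":"], ["je", "m'", "excuse"], ["excuse"]]
queries = chunk_queries(chunks)
print(queries)

# For comparison: every substring of length >= 2 of the same 6-token sentence.
n = 6
print(len(queries), "chunk queries versus", n * (n - 1) // 2, "substring queries")
```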

SLIDE 27

Chunk-Based Coverage

Coverage Statistics

Random substring querying (previous experiment)

Metric                     Source    Target
Optimal coverage           98.8%     55.8%
Cov. unit size (words)     4.09      2.98
Number of cov. units       4.65      3.23

  • Avg. nb. LUCENE queries per sentence: 226

Chunk-based querying

Metric                     Source    Target
Optimal coverage           59.9%     59.3%
Cov. unit size (words)     3.73      2.99
Number of cov. units       3.08      3.47

  • Avg. nb. LUCENE queries per sentence: 11.7

SLIDE 30

Chunk-Based Coverage

S: m. mcinnes : je m’ excuse
T: mr . mcinnes : i apologize

  • m. mcinnes : (69)
      42  mr . mcinnes :
      17  mr . mcinnes )
       2  mr . mcinnes ) ,
       1  mr . mcinnes (
       1  mr . mcinnes : it not is required reading , mr . speaker
       1  mr . mcinnes moved
       1  mr . mcinnes moves
  • je m’ excuse (347)
      40  i am sorry ,
      33  i apologize to
      24  i apologize
      21  i am sorry
      15  i apologize for
      13  i am sorry to
      13  i apologize ,
       6  i apologize if
       6  i do apologize to
       5  i apologize for interrupting

SLIDE 34

Tree-Phrase Coverage

Motivation: The translations of "a good friend" could be useful to translate "a very good friend", which does not appear in the TM.

From now on, TM = collection of Tree-Phrases (TPs), where TP = a combination of a treelet (TL) and an elastic-phrase (EP)

SLIDE 35

Tree-Phrase Coverage

SYNTEX (Bourigault and Fabre, 2000)

Dependency parser available for French and English.

On demande des crédits fédéraux

demande
  ├─ SUB: On
  └─ OBJ: crédits
            ├─ DET: des
            └─ ADJ: fédéraux

  • A link identifies two words: a governor and its dependent (e.g. demande governs crédits)
  • We do not consider link labels in this study

SLIDE 36

Tree-Phrase Coverage

Facts

  • We parsed the source (French) part of the training material with SYNTEX
  • We extracted all TLs of depth 1
  • We collected more than 3 million types of TLs, from which we projected 6.5 million kinds of EPs
  • The TLs range in size from 2 to 8 words, and EPs from 1 to 9
  • Roughly half the TLs and 40% of the EPs are contiguous ones
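A depth-1 treelet groups a governor with its direct dependents. The sketch below extracts one such treelet per governor from a list of unlabeled dependency links for "On demande des crédits fédéraux". It is a simplified illustration: the slides do not say whether partial treelets (a governor with only some of its dependents) were also collected, and the function name and input format are assumptions.

```python
from collections import defaultdict


def depth1_treelets(tokens, links):
    """links: (governor_index, dependent_index) pairs; link labels are ignored.

    Returns one (governor, [dependents in sentence order]) pair per governor.
    """
    children = defaultdict(list)
    for gov, dep in links:
        children[gov].append(dep)
    return [(tokens[g], [tokens[d] for d in sorted(deps)])
            for g, deps in sorted(children.items())]


tokens = ["on", "demande", "des", "crédits", "fédéraux"]
# demande governs on and crédits; crédits governs des and fédéraux
links = [(1, 0), (1, 3), (3, 2), (3, 4)]
print(depth1_treelets(tokens, links))
```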

SLIDE 37

Tree-Phrase Coverage

On demande des crédits fédéraux / We request for federal funding

alignment: demande ≡ request for // fédéraux ≡ federal // crédits ≡ funding

treelets:
  demande            crédits
  ├─ on              ├─ des
  └─ crédits         └─ fédéraux

tree-phrases:
  TL⋆ {{on@-1} demande {crédits@2}}       EP⋆ |request@0||for@1||funding@3|
  TL  {{des@-1} crédits {fédéraux@1}}     EP  |federal@0||funding@1|
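The notation suggests that dependents carry their offset relative to the head of the treelet, and that each word of the elastic phrase carries its offset relative to the first projected target word. Under that reading (an interpretation of the example, not a documented format), the two tree-phrases can be regenerated from the sentence pair and its word alignment. The function name and the 0-based alignment dictionary are hypothetical.

```python
def tree_phrase(src, head, deps, tgt, align):
    """Render one treelet (TL) and its projected elastic phrase (EP) as strings."""
    # Dependents before the head go to its left, the others to its right.
    left = "".join(f"{{{src[d]}@{d - head}}} " for d in sorted(deps) if d < head)
    right = "".join(f" {{{src[d]}@{d - head}}}" for d in sorted(deps) if d > head)
    tl = "{" + left + src[head] + right + "}"
    # Target positions aligned to any word of the treelet, in target order.
    covered = sorted({t for s in [head, *deps] for t in align.get(s, [])})
    ep = "".join(f"|{tgt[t]}@{t - covered[0]}|" for t in covered)
    return tl, ep


src = ["on", "demande", "des", "crédits", "fédéraux"]
tgt = ["we", "request", "for", "federal", "funding"]
# demande ≡ request for, crédits ≡ funding, fédéraux ≡ federal (0-based positions)
align = {1: [1, 2], 3: [4], 4: [3]}

print(tree_phrase(src, 1, [0, 3], tgt, align))
print(tree_phrase(src, 3, [2, 4], tgt, align))
```

The gap between for@1 and funding@3 is what makes the first elastic phrase non-contiguous: federal sits between them in the target sentence but belongs to the other treelet.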

SLIDE 38

Tree-Phrase Coverage

Coverage Statistics

Chunk-based querying (previous experiment)

Metric                     Source    Target
Optimal coverage           59.9%     59.3%
Cov. unit size (words)     3.73      2.99
Number of cov. units       3.08      3.47

  • Avg. nb. LUCENE queries per sentence: 11.7

Tree-phrase querying

Metric                     Source    Target
Optimal coverage           62.7%     56.4%
Cov. by contiguous TPs     46.0%     38.6%

SLIDE 39

Tree-Phrase Coverage

S: présentation de le 1er rapport de le comité permanent
T: presentation of first report of standing committee

[Figure: French treelets headed by rapport and comité, with dependents drawn from de, le, 1er and permanent, each paired with its candidate English elastic phrases]

  • rapport: report; report of; of report; first report; of first report; first report of
  • comité: committee; of committee; standing committee; of standing committee

SLIDE 42

Discussion

We have considered distinct approaches to query a TM

  • Full-sentence queries yield poor coverage
  • Random substring querying does well at coverage, but does not seem viable without stringent pruning strategies (Simard, 2003)
  • Chunk-based querying seems attractive
  • TP querying seems a viable alternative, and non-contiguous units might be useful to the end user
  • Coverage simulations are only approximations (Langlais and Simard, 2003)
