


24.01.2012

HOW MACHINE TRANSLATION OF WEB PAGES WORKS ?

E-BUSINESS TECHNOLOGIES, BCM1, WS 2011

Student: Lei Sio Meng
Professor: Dr. Eduard Heindl

Introduction

  • Machine Translation (MT)
      – Translates text from one natural language into another using the computational power of a computer
  • MT was proposed soon after the computer was developed
  • A demonstration of MT was given as early as the 1950s

Introduction

  • Processing was slow before the 1980s
      – Results were poor
      – Computing power was not sufficient to solve the MT problem
  • After the 1980s, attention returned to MT
      – Driven by the growth in computing power
  • Nowadays, different approaches can be implemented
      – Rule-based machine translation (RBMT)
      – Statistical machine translation (SMT)…

Introduction

  • The Web opened a new stage for MT after the 1990s
  • The application of MT in globalization and e-business is becoming more important
  • 260 countries are connected by the Internet, with over 26 major languages
  • Non-English speakers make up about 43 percent of the online population

Introduction

  • Key issues for web pages in globalization:
      – How users obtain multilingual information
      – How a company presents and translates its information
  • Solution: Web-based machine translation services
      – General
      – Fast
      – Instant
      – Effective
      – Low-cost
  • For accurate results, human post-editing is still needed
  • MT can speed up the traditional translation process

Web-MT Service

  • Various translation services exist
  • For instance: Google Translate

[Figure: screenshots of the online translator and of document/web page translation]


Web-MT Service

[Figure: screenshots of browser integration, web page integration, and the mobile version]

Basic Work flow

  • Build a client/server translation service using one of various architectures
  • Provide the translation service through a connection to a web server
  • The MT application still relies on traditional MT approaches
  • Different language rules and translation approaches are developed
      – As a series of modules
      – On the back-end server side
  • The front-end client interface accepts the translation request and sends it to the server side
  • The result is sent back to the client after processing
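
The request/response cycle described above can be sketched in a few lines of Python. This is a minimal in-process sketch, not a real web service: the JSON message format, the function names `handle_request` and `client_translate`, and the toy `MODULES` table are all assumptions made for illustration, and a dictionary lookup stands in for the real MT back end.

```python
import json

# Hypothetical back-end modules, keyed by (source, target) language pair.
# Each "module" here is only a toy word dictionary.
MODULES = {
    ("en", "de"): {"hello": "hallo", "world": "welt"},
}

def handle_request(raw_request: str) -> str:
    """Server side: decode the request, translate, encode the response."""
    request = json.loads(raw_request)
    pair = (request["source"], request["target"])
    lexicon = MODULES.get(pair)
    if lexicon is None:
        return json.dumps({"error": "unsupported language pair"})
    words = request["text"].lower().split()
    translated = " ".join(lexicon.get(w, w) for w in words)
    return json.dumps({"translation": translated})

def client_translate(text: str, source: str, target: str) -> str:
    """Client side: build the request, 'send' it, and read the result."""
    raw_request = json.dumps({"text": text, "source": source, "target": target})
    raw_response = handle_request(raw_request)  # stands in for an HTTP round trip
    response = json.loads(raw_response)
    return response.get("translation", response.get("error"))

print(client_translate("Hello world", "en", "de"))  # prints: hallo welt
```

In a deployed service the direct call to `handle_request` would be replaced by a network request to the web server; the division of work between client and server stays the same.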

Basic Work flow

[Figure: the client web interface sends source-language text to the machine translation server and receives target-language text. The server combines MT approaches (SMT, RBMT, ...), modules (Module 1, Module 2, Module 3, ...), and resources (probabilistic data/corpus, lexicon/dictionary, grammar rules, ...).]

E.g. use Google Translate to translate English to German: English = source language (SL), German = target language (TL).

Basic Work flow

  • Different MT approaches
      – Rule-based (RBMT)
      – Statistical (SMT)
      – Example-based
      – Hybrid (RBMT + SMT)…
  • Different modules
      – HTML fetching
      – Word segmentation
      – Part-of-speech tagging…
      – Which modules are used depends on the MT approach
  • Different resources:
      – Probabilistic data
      – Lexicon
      – Grammar rules


Translation approaches

1st Generation (1960s - 1980s)

  • Direct approach

2nd Generation (1980s - )

  • Rule-based approach
      – Transfer
      – Interlingua

3rd Generation (1990s - )

  • Corpus-based approach
      – Example-based
      – Statistical

The 1st generation is simple, the 2nd generation relies on linguistic analysis, and the 3rd generation uses a corpus to train statistical data from which the result is obtained.

Direct approach

  • Earliest, most basic approach
  • Dictionary-based approach
  • No linguistic model is involved
  • Translates word by word
  • Results are poor

[Figure: Source Language Text -> Translation (using dictionaries) -> Target Language Text]
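
A word-by-word dictionary lookup is easy to sketch, and the sketch also shows why the results are poor. The tiny English-to-German `DICTIONARY` and the function name `translate_direct` are assumptions made for illustration only.

```python
# Toy English-to-German dictionary (illustrative entries only).
DICTIONARY = {"i": "ich", "have": "habe", "seen": "gesehen", "the": "den", "dog": "hund"}

def translate_direct(sentence: str) -> str:
    """Replace each word with its dictionary entry, keeping the source word order."""
    words = sentence.lower().split()
    # Unknown words are passed through unchanged.
    return " ".join(DICTIONARY.get(word, word) for word in words)

print(translate_direct("I have seen the dog"))  # prints: ich habe gesehen den hund
```

The output keeps the English word order, whereas correct German places the participle at the end ("ich habe den Hund gesehen"). Without a linguistic model, the direct approach cannot make that kind of adjustment.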


Transfer approach

  • 3 steps:

1. Analysis: parses the input into an abstract source representation (SL intermediate)

2. Transfer: translates the SL intermediate into an abstract target representation (TL intermediate)

3. Generation: maps the TL intermediate into the output

[Figure: Source Language Text -> Analysis stage (SL dictionary + grammar rules) -> SL intermediate -> Transfer stage (bilingual dictionary + grammar rules) -> TL intermediate -> Generation stage (TL dictionary + grammar rules) -> Target Language Text]
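
The three stages can be illustrated with a toy English-to-Spanish example. This is a sketch under strong assumptions: the "intermediate representation" is only a list of (word, part-of-speech) pairs, the single transfer rule swaps adjective and noun, and the lexicons `SL_LEXICON` and `BILINGUAL` are invented for the example. Words missing from the lexicons raise a `KeyError`.

```python
# Toy resources (assumptions for illustration only).
SL_LEXICON = {"the": "DET", "red": "ADJ", "car": "NOUN"}    # SL dictionary
BILINGUAL = {"the": "el", "red": "rojo", "car": "coche"}     # bilingual dictionary

def analyse(sentence):
    """Analysis: tag each source word, giving the SL intermediate."""
    return [(word, SL_LEXICON[word]) for word in sentence.lower().split()]

def transfer(sl_intermediate):
    """Transfer: reorder ADJ NOUN to NOUN ADJ, then map words to the TL."""
    nodes = list(sl_intermediate)
    i = 0
    while i < len(nodes) - 1:
        if nodes[i][1] == "ADJ" and nodes[i + 1][1] == "NOUN":
            nodes[i], nodes[i + 1] = nodes[i + 1], nodes[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return [(BILINGUAL[word], tag) for word, tag in nodes]

def generate(tl_intermediate):
    """Generation: linearise the TL intermediate into output text."""
    return " ".join(word for word, _ in tl_intermediate)

print(generate(transfer(analyse("the red car"))))  # prints: el coche rojo
```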

Interlingua approaches

  • Similar to the transfer approach
  • 2 steps:

1. Analysis: the input is converted to a single interlingua representation
      – A summarized, abstract meaning expressed in a neutral, universal language

2. Generation: the interlingua is converted to the target text

[Figure: Source Language Text -> Analysis stage (source dictionary + grammar rules) -> Interlingua representation -> Generation stage (target dictionary + grammar rules) -> Target Language Text]
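
A toy version of the two steps is sketched below. The "interlingua" here is only a list of language-neutral concept labels such as `WATER`, and the `LEXICONS` table and function names are assumptions made for illustration; a real interlingua would also have to capture grammar and meaning relations.

```python
# Toy lexicons mapping words to language-neutral concept labels (assumed).
LEXICONS = {
    "en": {"water": "WATER", "house": "HOUSE"},
    "de": {"wasser": "WATER", "haus": "HOUSE"},
    "es": {"agua": "WATER", "casa": "HOUSE"},
}

def analyse(text, source):
    """Analysis: convert source words into interlingua concepts."""
    return [LEXICONS[source][word] for word in text.lower().split()]

def generate(concepts, target):
    """Generation: express each concept in the target language."""
    concept_to_word = {concept: word for word, concept in LEXICONS[target].items()}
    return " ".join(concept_to_word[c] for c in concepts)

def translate(text, source, target):
    return generate(analyse(text, source), target)

print(translate("Haus Wasser", "de", "es"))  # prints: casa agua
```

In this sketch, supporting another language only requires one more lexicon entry, because every language is mapped to and from the same concept labels rather than to each other language separately.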


Statistical Machine approaches

  • The most widely used approach
  • Trained on a large parallel bilingual corpus
      – A bilingual corpus is a set of documents in the SL and the TL, together with the translation relationship between them
  • Makes it easier to build multilingual MT
      – Regardless of whether the languages in a pair are closely related

Statistical Machine approaches

1. The input text is segmented into phrases and strings

2. The segments are translated using probability theory and the training data

3. The translated segments are arranged and combined using probability theory and the training data

[Figure: Source Language Text -> Source Segments -> Translation Model (trained on the parallel bilingual corpus) -> Target Segments -> Language Model -> Target Language Text]


Statistical Machine approaches

  • The statistical approach is implemented using Bayes’ rule
  • Translation model
      – Calculates the probability of matching source segments to target segments, using a bilingual corpus
  • Language model
      – Calculates the best sequence of target segments and combines them into the final output

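$$P(T \mid S) = \frac{P(T)\,P(S \mid T)}{P(S)} \propto P(T)\,P(S \mid T)$$

Here $S$ is the source-language text and $T$ is a candidate target-language text; $P(T)$ is the language model and $P(S \mid T)$ is the translation model. $P(S)$ is the same for every candidate, so it can be ignored when choosing the best $T$.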
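
How the language model and the translation model combine can be shown with a toy scoring function. All probabilities below are hand-picked, hypothetical values (a real system estimates them from corpora), and the names `TRANSLATION_MODEL`, `LANGUAGE_MODEL`, and `best_translation` are assumptions for this sketch.

```python
import math

# Hypothetical, hand-picked probabilities for illustration only.
# Translation model: probability of the source phrase given a target candidate.
TRANSLATION_MODEL = {
    ("das haus", "the house"): 0.7,
    ("das haus", "the home"): 0.2,
    ("das haus", "house the"): 0.7,
}
# Language model: how fluent each target candidate is.
LANGUAGE_MODEL = {"the house": 0.05, "the home": 0.03, "house the": 0.0001}

def best_translation(source):
    """Return the candidate maximising log(language model) + log(translation model)."""
    candidates = [t for (s, t) in TRANSLATION_MODEL if s == source]

    def score(target):
        return (math.log(LANGUAGE_MODEL[target])
                + math.log(TRANSLATION_MODEL[(source, target)]))

    return max(candidates, key=score)

print(best_translation("das haus"))  # prints: the house
```

"house the" matches the source words as well as "the house" does, but the language model assigns it a much lower probability, so the fluent ordering wins.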

Example-Based approaches

  • Uses the aligned bilingual corpus and a TL model

1. The input is decomposed into a set of segments

2. The segments are translated into target segments
      – A closely matching translation pair is found among the examples in the aligned bilingual corpus

3. The target segments are recombined into the target output text

[Figure: Source Language Text -> Source Segments -> Target Segments (found via the aligned bilingual corpus) -> recombination with the Target Language Model -> Target Language Text]


Example-Based approaches

  • The MT system “imitates” the translation of similar segments in the corpus

Input: She sells flowers in the farmers’ market every day

Example: She sells flowers every day -> Dia menjual bunga setiap hari.
Example: The lady in the farmers’ market is my cousin -> Wanita di pasar tani itu ialah sepupu saya.

Decomposed: “She sells flowers … every day” -> “Dia menjual bunga … setiap hari”; “in the farmers’ market” -> “di pasar tani itu”

Recombined translation: Dia menjual bunga di pasar tani itu setiap hari.
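
The decompose, translate, and recombine steps for this sentence can be sketched as follows. The sketch assumes that the segment pairs in `SEGMENT_PAIRS` have already been extracted from the two aligned examples, which is the hard part of a real example-based system, and it simply recombines the target segments in source order, which happens to be correct for this sentence.

```python
# Segment pairs assumed to have been extracted from the two aligned examples.
SEGMENT_PAIRS = {
    "she sells flowers": "dia menjual bunga",
    "in the farmers' market": "di pasar tani itu",
    "every day": "setiap hari",
}

def decompose(sentence):
    """Greedily split the input into the longest segments found in the examples."""
    words = sentence.lower().split()
    segments, start = [], 0
    while start < len(words):
        for end in range(len(words), start, -1):
            candidate = " ".join(words[start:end])
            if candidate in SEGMENT_PAIRS:
                segments.append(candidate)
                start = end
                break
        else:
            raise ValueError(f"no example covers: {words[start]}")
    return segments

def translate(sentence):
    """Translate each segment by example and recombine in source order."""
    return " ".join(SEGMENT_PAIRS[segment] for segment in decompose(sentence))

print(translate("She sells flowers in the farmers' market every day"))
# prints: dia menjual bunga di pasar tani itu setiap hari
```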

Hybrid approach

  • Building a rule-based MT system is expensive
      – Adding linguistic rules can make the results inconsistent
      – Adding a new language is difficult
  • The statistical approach cannot reach a quality that people can fully understand
      – Especially when translating long sentences
  • New idea: build a hybrid approach
      – E.g. SYSTRAN: hybrid rule-based + statistical approach in 2010

Modules and Resources

  • Various modules and resources are used across different MT applications
  • Case: a multi-language MT system based on the statistical approach

[Figure: modules (word-to-word alignment, segment extraction, statistical translator) and resources (bilingual corpus, monolingual corpus, associated sequence, segment table, translation model, language model) that turn SL text into TL text]

Process and architectures of Web MTs

  • Various architectures are used across MT applications
  • Case 1: translating English to Bangla and Punjabi to Hindi

1. Parsing the HTML source code: an HTML parser strips the HTML tags and extracts the content text, which is combined into the SL input text

2. Translating the input text into TL text

3. Modifying the original HTML code: the SL content is replaced by the TL text, and the modified HTML code is returned to the client

[Figure: HTML Source Code -> Parsing -> SL Content Text -> Translate -> TL Text -> Replace -> Modified HTML Code]
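
The parse, translate, and replace steps can be sketched with Python's standard `html.parser` module. This is not the system from Case 1: `toy_translate` is a placeholder that upper-cases text instead of translating it, and, unlike the process described above, the sketch translates each text node separately rather than combining them into one input text. Comments, doctype declarations, and script or style content are not handled.

```python
from html.parser import HTMLParser

def toy_translate(text):
    """Placeholder for the MT engine (an assumption): upper-cases the text."""
    return text.upper()

class PageTranslator(HTMLParser):
    """Copies tags through unchanged and translates only the text content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.output = []

    def handle_starttag(self, tag, attrs):
        self.output.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.output.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.output.append(f"</{tag}>")

    def handle_data(self, data):
        # Leave whitespace-only text untouched.
        self.output.append(toy_translate(data) if data.strip() else data)

def translate_page(html):
    parser = PageTranslator()
    parser.feed(html)
    parser.close()
    return "".join(parser.output)

print(translate_page('<p class="intro">Hello <b>world</b></p>'))
# prints: <p class="intro">HELLO <b>WORLD</b></p>
```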


Process and architectures of Web MTs

  • Case 2: a web MT system that translates Arabic, Chinese, and Spanish to English using the statistical approach

[Figure: on the client, a web page (form); on the server, a CGI script and the MT front-end, which forwards requests to the wrappers for language 1, language 2, language 3, ..., each containing a pre-processing module and an MT system]

Process and architectures of Web MTs

  • 2-level layered architecture
  • Web site user interface: the user enters the SL text and chooses the language pair
      – The request is sent through an HTML form
  • A CGI (Common Gateway Interface) script: communicates between the web site and the machine translator on the server side
  • MT front-end: forwards translation requests to the appropriate language wrapper

[Figure: Client: Web Page (form) -> Server: CGI script -> MT front-end]


Process and architectures of Web MTs

  • Different language-pair wrappers (such as Chinese to English, Spanish to English)
      – Include the kernel of the MT system
      – Also include a pre-processing module for the SL
  • The MT program performs the translation, and the result is sent back to the client in the opposite direction

[Figure: Server: language 1 wrapper containing the pre-processing module and the MT system]
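
The way an MT front-end forwards a request to the right language wrapper can be sketched as a lookup table of functions. Everything here is hypothetical: the wrapper names, the form fields `pair` and `text`, the toy lexicons that stand in for real MT kernels, and the pre-processing rules.

```python
def preprocess_spanish(text):
    """Toy pre-processing: strip the inverted punctuation used in Spanish."""
    return text.replace("¿", "").replace("¡", "").strip()

def preprocess_default(text):
    return text.strip()

def make_wrapper(preprocess, lexicon):
    """A language wrapper bundles SL pre-processing with an MT kernel."""
    def wrapper(text):
        words = preprocess(text).lower().split()
        return " ".join(lexicon.get(w, w) for w in words)
    return wrapper

# Hypothetical wrappers; the lexicons stand in for real MT systems.
WRAPPERS = {
    "es-en": make_wrapper(preprocess_spanish, {"hola": "hello", "mundo": "world"}),
    "zh-en": make_wrapper(preprocess_default, {"你好": "hello"}),
}

def mt_front_end(form):
    """Forward the form data to the wrapper for the chosen language pair."""
    wrapper = WRAPPERS.get(form["pair"])
    if wrapper is None:
        return "unsupported language pair"
    return wrapper(form["text"])

print(mt_front_end({"pair": "es-en", "text": "¡Hola mundo"}))  # prints: hello world
```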

Process and architectures of Web MTs

  • Case 3: using the Moses toolkit to build 3 different web MT systems
  • Moses is an open-source toolkit for building statistical MT systems
  • Design 1:

[Figure: Clients A, B, C -> Apache Web Server -> Tomcat Server -> Moses Toolkit, which loads the translation modules for language pairs 1 to 4]


Process and architectures of Web MTs

  • Apache Web Server and Tomcat Server: communicate between the kernel MT system (Moses toolkit) and the clients
  • The Moses toolkit handles only 1 request at a time
  • When multiple clients send requests, the Tomcat Server queues them (FIFO)
  • The Moses toolkit activates and reloads the relevant language-pair MT system for every request
  • Simple, but not efficient

[Figure: Client A -> Apache Web Server -> Tomcat Server -> Moses Toolkit, with the translation modules for language pair 1 loaded]
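
The cost of this design can be made visible with a small simulation. It does not use Moses, Apache, or Tomcat: `load_system` is a hypothetical stand-in for loading a language-pair system, and a `deque` plays the role of the FIFO request queue.

```python
from collections import deque

load_count = 0  # how many times a language-pair system has been loaded

def load_system(pair):
    """Stand-in for the costly step of loading a language-pair MT system."""
    global load_count
    load_count += 1
    return lambda text: f"[{pair}] {text}"

def serve_fifo(requests):
    """Process queued requests one at a time, reloading for each request."""
    queue = deque(requests)
    results = []
    while queue:
        pair, text = queue.popleft()   # first in, first out
        system = load_system(pair)     # reloaded every time, as in Design 1
        results.append(system(text))
    return results

results = serve_fifo([("en-de", "hello"), ("en-de", "world"), ("en-fr", "hello")])
print(results, load_count)
# prints: ['[en-de] hello', '[en-de] world', '[en-fr] hello'] 3
```

Three requests cause three loads, even though two of them use the same language pair.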

Process and architectures of Web MTs

  • Design 2:

[Figure: Clients A, B, C -> Apache Web Server -> Tomcat Server -> a single Moses Server holding modules A and B for each of language pairs 1 to 4]


Process and architectures of Web MTs

  • Moses Server is used instead of the Moses toolkit
  • Moses Server: loads multiple language-pair translation modules and handles multiple requests at a time
  • The language-pair MT system does not have to be reloaded
  • Resources are shared
      – E.g. when translating into the same target language repeatedly
  • Reduces the server’s memory usage

[Figure: Client A -> Apache Web Server -> Tomcat Server -> Moses Server, holding modules A and B for language pairs 1 and 2]
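
A minimal sketch of the idea behind this design is a long-running server object that loads each language pair once and then serves concurrent requests from memory. The `SharedServer` class is hypothetical and does not reflect the actual Moses Server interface; a thread pool stands in for simultaneous clients.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

class SharedServer:
    """One long-running server that keeps every loaded language pair in memory."""

    def __init__(self):
        self._systems = {}
        self._lock = threading.Lock()
        self.load_count = 0

    def _get_system(self, pair):
        with self._lock:  # load each language pair at most once
            if pair not in self._systems:
                self.load_count += 1
                self._systems[pair] = lambda text, p=pair: f"[{p}] {text}"
            return self._systems[pair]

    def translate(self, pair, text):
        return self._get_system(pair)(text)

server = SharedServer()
requests = [("en-de", "hello"), ("en-de", "world"), ("en-fr", "hello")]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda r: server.translate(*r), requests))
print(results, server.load_count)
# prints: ['[en-de] hello', '[en-de] world', '[en-fr] hello'] 2
```

Three requests over two language pairs trigger only two loads, because the second "en-de" request reuses the system that is already in memory.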

Process and architectures of Web MTs

  • Design 3:

[Figure: Clients A, B, C -> Apache Web Server -> Tomcat Server -> Moses Servers 1 to 4, one per language pair (1 to 4), each holding modules A and B]


Process and architectures of Web MTs

  • For each language pair, the system creates a separate Moses server
  • Keeps translation resources, such as CPU and memory, independent of each other
  • The workload on each resource is reduced
  • Compared with the second design, this design performs better

[Figure: Client A -> Apache Web Server -> Tomcat Server -> Moses Server 1 (language pair 1, modules A and B) and Moses Server 2 (language pair 2, modules A and B)]
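
Routing each request to a dedicated server for its language pair can be sketched with one worker and one queue per pair. This is only an illustration of the routing: the worker threads below are hypothetical stand-ins for separate Moses server processes, and threads in one Python process do not actually give independent CPU and memory the way separate servers would.

```python
import queue
import threading

def pair_server(pair, inbox, outbox):
    """A dedicated server for one language pair with its own loaded system."""
    system = lambda text: f"[{pair}] {text}"  # loaded once, not shared
    while True:
        item = inbox.get()
        if item is None:  # shutdown signal
            break
        request_id, text = item
        outbox.put((request_id, system(text)))

pairs = ["en-de", "en-fr"]
inboxes = {pair: queue.Queue() for pair in pairs}
outbox = queue.Queue()
threads = [threading.Thread(target=pair_server, args=(p, inboxes[p], outbox))
           for p in pairs]
for t in threads:
    t.start()

requests = [("en-de", "hello"), ("en-fr", "hello"), ("en-de", "world")]
for request_id, (pair, text) in enumerate(requests):
    inboxes[pair].put((request_id, text))  # route by language pair
for inbox in inboxes.values():
    inbox.put(None)
for t in threads:
    t.join()

results = dict(outbox.get() for _ in requests)
print([results[i] for i in range(len(requests))])
# prints: ['[en-de] hello', '[en-fr] hello', '[en-de] world']
```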

Conclusion

  • Numerous web MT applications exist on the market
  • They use various approaches, resources, and architectures
      – Yahoo! Babel Fish: rule-based approach
      – Google Translate / Bing Translator: statistical approach
  • They give the user a fast, instant translation
      – Although they cannot provide high-quality output
      – The output is satisfactory for general purposes

Conclusion

  • Various ways to use web-based MT services
      – Basic client-server MT
      – Automatic web page translation
      – Toolkit integration
      – Browser integration
      – Mobile web translation applications
      – Real-time speech translation
  • Thanks to the powerful computation and resources available on the Web, web machine translation has become more convenient and more easily accessible