SLIDE 1 Intro to SMT
Sara Stymne 2019-09-09
Partly based on slides by Jörg Tiedemann and Fabienne Cap
SLIDE 2
The revolution of the empiricists
Classical approaches require lots of manual work!
- long development times
- low coverage, not robust
- disambiguation at various levels → slow!
Learn from translation data:
- example databases for CAT and MT
- bilingual lexicon/terminology extraction
- statistical translation models
SLIDE 3
Motivation for Data-Driven MT
How do we learn to translate?
- grammar vs. examples
- teacher vs. practice
- intuition vs. experience
Is it possible to create an MT engine without any human effort?
- no writing of grammar rules
- no bilingual lexicography
- no writing of preference & disambiguation rules
SLIDE 4
Motivating example
Imagine a spaceship with aliens coming to earth, telling you:
peli kaj meni
Translation? Anyone?
SLIDE 5
Motivating example
Imagine a spaceship with aliens coming to earth, telling you:
peli kaj meni
Translation? Anyone?
Problem:
- Human translators may not be available
- Human translators are expensive
SLIDE 6
Motivating example
Imagine a spaceship with aliens coming to earth, telling you:
peli kaj meni
Translation? Anyone?
Problem:
- Human translators may not be available
- Human translators are expensive
Possible solution: We found a collection of translated text!
SLIDE 7
Practical exercise
15–20 minutes
Try to learn to translate the alien language!
SLIDE 8 What can we learn from this exercise?
We can learn to translate from translated texts
- 1-to-1 translations are easier to identify than 1-to-n or n-to-1 translations
- unseen words cannot be translated
- ambiguity: some words have more than one correct translation → the context helps determine which one
- sometimes words need to be reordered
SLIDE 9
Motivation for Data-Driven MT
Learning to translate:
- there is a bunch of translated stuff (collect it all)
- learn common word/phrase translations from this collection
- look at typical sentences in the target language
- learn how to write a sentence in the target language
SLIDE 10
Motivation for Data-Driven MT
Learning to translate:
- there is a bunch of translated stuff (collect it all)
- learn common word/phrase translations from this collection
- look at typical sentences in the target language
- learn how to write a sentence in the target language
Translation:
- try various translations of words/phrases in the given sentence
- put them together, shuffle them around
- check which translation candidate looks best
SLIDE 11 Statistical Machine Translation
Noisy channel for MT: “What could have been the sentence that has generated the observed source language sentence?”
[Diagram: noisy channel for MT. A decoder turns source language text back into target language text, using a language model P(Target) and a translation model P(Source|Target).]
... what a strange idea!
SLIDE 12 Statistical Machine Translation
Ideas borrowed from Speech Recognition:
[Diagram: noisy channel for speech recognition. A decoder turns the speech signal back into an utterance, using an utterance model P(Utterance) and a pronunciation model P(Speech|Utterance).]
SLIDE 13 Statistical Machine Translation
[Diagram: noisy channel for MT. A decoder turns source language text back into target language text, using a language model P(Target) and a translation model P(Source|Target).]
Probabilistic view on MT (T = target language, S = source language):
T̂ = argmax_T P(T|S) = argmax_T P(S|T) P(T)
SLIDE 14
Noisy Channel Model vs SMT
Noisy Channel Model            SMT                                 Example
Source signal (desired)        SMT output (target language text)   English text
(noisy) Channel                Translation
Receiver (distorted message)   SMT input (source language text)    Foreign text
SLIDE 15
Statistical Machine Translation Modeling
- model translation as an optimization (search) problem
- look for the most likely translation T for a given input S
- use a probabilistic model that assigns these conditional likelihoods
- use Bayes theorem to split the model into 2 parts:
  - a language model (for the target language)
  - a translation model (source language given target language)
SLIDE 16
Statistical Machine Translation
- Learn statistical models automatically from bilingual corpora
- Bilingual corpora: collections of texts translated by humans
- Use the models to translate unseen texts
SLIDE 17
Statistical Machine Translation
- Learn statistical models automatically from bilingual corpora
- Bilingual corpora: collections of texts translated by humans
- Use the models to translate unseen texts
Models can have different granularity:
- Word-based
- Phrase-based – sequences of words
- Hierarchical – tree structures
- Syntactic – linguistically motivated tree structures
SLIDE 18 Some (very) basic concepts of probability theory
- probability P(X) maps event X to a number between 0 and 1
- P(X) represents the likelihood of observing event X in some kind of experiment
- discrete probability distribution: Σ_i P(X = x_i) = 1
SLIDE 19 Some (very) basic concepts of probability theory
- probability P(X) maps event X to a number between 0 and 1
- P(X) represents the likelihood of observing event X in some kind of experiment
- discrete probability distribution: Σ_i P(X = x_i) = 1
- P(X|Y) = conditional probability (likelihood of event X given that event Y has been observed before)
SLIDE 20 Some (very) basic concepts of probability theory
- probability P(X) maps event X to a number between 0 and 1
- P(X) represents the likelihood of observing event X in some kind of experiment
- discrete probability distribution: Σ_i P(X = x_i) = 1
- P(X|Y) = conditional probability (likelihood of event X given that event Y has been observed before)
- joint probability: P(X, Y) (likelihood of seeing both events)
- P(X, Y) = P(X) ∗ P(Y|X) = P(Y) ∗ P(X|Y)
SLIDE 21 Some (very) basic concepts of probability theory
- probability P(X) maps event X to a number between 0 and 1
- P(X) represents the likelihood of observing event X in some kind of experiment
- discrete probability distribution: Σ_i P(X = x_i) = 1
- P(X|Y) = conditional probability (likelihood of event X given that event Y has been observed before)
- joint probability: P(X, Y) (likelihood of seeing both events)
- P(X, Y) = P(X) ∗ P(Y|X) = P(Y) ∗ P(X|Y), therefore:
Bayes Theorem: P(X|Y) = P(X) ∗ P(Y|X) / P(Y)
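As a quick sanity check of these identities, here is a minimal Python sketch; the numeric values of P(X), P(Y|X) and P(Y) are made up for illustration and are not from the slides.

```python
# Toy numbers (assumptions for illustration), consistent with a valid joint distribution.
p_x = 0.3            # P(X)
p_y_given_x = 0.5    # P(Y|X)
p_y = 0.25           # P(Y)

# Joint probability via the chain rule: P(X, Y) = P(X) * P(Y|X)
p_xy = p_x * p_y_given_x                # 0.15

# Bayes theorem: P(X|Y) = P(X) * P(Y|X) / P(Y)
p_x_given_y = p_x * p_y_given_x / p_y   # 0.6

# The two factorizations of the joint probability agree: P(Y) * P(X|Y) = P(X) * P(Y|X)
assert abs(p_y * p_x_given_y - p_xy) < 1e-12
print(p_xy, p_x_given_y)                # 0.15 0.6
```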
SLIDE 22
Some quick words on probability theory & Statistics
Where do the probabilities come from? → Experience!
- Use experiments (and repeat them often ...)
- Maximum Likelihood Estimation (rely on N experiments only): P(X) ≈ count(X) / N
SLIDE 23
Some quick words on probability theory & Statistics
Where do the probabilities come from? → Experience!
- Use experiments (and repeat them often ...)
- Maximum Likelihood Estimation (rely on N experiments only): P(X) ≈ count(X) / N
- For conditional probabilities: P(X|Y) = P(X, Y) / P(Y) ≈ (count(X, Y) / N) / (count(Y) / N) = count(X, Y) / count(Y)
SLIDE 24
Translation Model Parameters
Lexical translations:
- das → the
- haus → house, home, building, household, shell
- ist → is
- klein → small, low
Multiple translation options:
- learn translation probabilities from data
- use the most common one in that context
SLIDE 25
Context-independent models
Count translation statistics: How often is Haus translated into each alternative?

Translation of Haus   Count
house                  8,000
building               1,600
home                     200
household                150
shell                     50
total                 10,000
SLIDE 26
Context-independent models
Maximum likelihood estimation (MLE):

t(s|t) = count(s, t) / count(t)    (1)

For s = Haus:
- t(s|t) = 0.8 if t = house
- t(s|t) = 0.16 if t = building
- t(s|t) = 0.02 if t = home
- t(s|t) = 0.015 if t = household
- t(s|t) = 0.005 if t = shell
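A minimal Python sketch that reproduces these estimates by normalising the Haus counts from the previous slide (only the counts come from the slides; the code itself is illustrative):

```python
# Counts of how often "Haus" was translated, taken from the previous slide.
counts = {"house": 8000, "building": 1600, "home": 200,
          "household": 150, "shell": 50}

total = sum(counts.values())                    # 10,000 occurrences of "Haus"
t = {e: c / total for e, c in counts.items()}   # MLE: fraction of Haus tokens translated as e

for word, prob in sorted(t.items(), key=lambda kv: -kv[1]):
    print(f"{word:<10} {prob:.3f}")
# house 0.800, building 0.160, home 0.020, household 0.015, shell 0.005
```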
SLIDE 27
(Classical) Statistical Machine Translation
T̂ = argmax_T P(T|S) = argmax_T P(S|T) P(T) / P(S) = argmax_T P(S|T) P(T)
SLIDE 28
(Classical) Statistical Machine Translation
T̂ = argmax_T P(T|S) = argmax_T P(S|T) P(T) / P(S) = argmax_T P(S|T) P(T)
- Translation model: P(S|T), estimated from (big) parallel corpora, takes care of adequacy
- Language model: P(T), estimated from (huge) monolingual target language corpora, takes care of fluency
- Decoder: global search for argmax_T P(S|T) P(T) for a given sentence S
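A minimal sketch of how the three components fit together at decoding time; the candidate translations and their probabilities below are invented for illustration, not real model scores:

```python
# Hypothetical candidates for a German input, each with an (invented)
# translation model score P(S|T) and language model score P(T).
candidates = {
    "the house is small": {"tm": 0.80, "lm": 1e-2},     # adequate and fluent
    "the building is small": {"tm": 0.16, "lm": 1e-2},  # less adequate
    "the is house small": {"tm": 0.80, "lm": 1e-6},     # adequate but not fluent
}

# The decoder picks argmax_T P(S|T) * P(T) (here: brute force over 3 candidates).
best = max(candidates, key=lambda t: candidates[t]["tm"] * candidates[t]["lm"])
print(best)   # the house is small
```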
SLIDE 29 Modelling Statistical Machine Translation
[Diagram: a Sith-English parallel corpus is statistically analysed into the translation model P(sith | english), and an English corpus into the language model P(english). For the input "Tegu mus kelias antai kash", the decoding algorithm searches over candidate English sentences (e.g. "Let's climb in there", "Let's in there climb", "Let's climb there in", "There in let's climb") for argmax_english P(sith | english) ∗ P(english), and outputs the fluent translation "Let's climb in there".]
SLIDE 30
The role of the translation and language model
Translation model: prefer adequate translations
P(Das Haus ist klein | The house is small) > P(Das Haus ist klein | The building is small) > P(Das Haus ist klein | The shell is low)
Language model: prefer fluent translations:
P(The house is small) > P(The is house small)
SLIDE 31
Word-based SMT models
Why do we need word alignment?
Cannot directly estimate P(S|T) ... Why not?
SLIDE 32
Word-based SMT models
Why do we need word alignment?
Cannot directly estimate P(S|T) ... Why not?
- almost all sentences are unique: sparse counts!
- → no good estimates
- → decompose into smaller chunks!
SLIDE 33
Word-based SMT models
Why do we need word alignment?
Cannot directly estimate P(S|T) ... Why not?
- almost all sentences are unique: sparse counts!
- → no good estimates
- → decompose into smaller chunks!
Word-based model: assume that words in one language have been generated by words in another!
→ a (hidden) word alignment explains this process
SLIDE 34
Word-based Translation Models
SLIDE 35
Word-based Translation Models
What do we need to estimate model parameters?
- lexical translation
- distortion/re-ordering
- fertility
- NULL insertion
→ We need a word-aligned parallel corpus!
SLIDE 36
Word alignment
What is word alignment? A simple example:

1     2      3    4
das   Haus   ist  klein
↑     ↑      ↑    ↑
the   house  is   small
1     2      3    4
SLIDE 37 Word alignment
Another visualization:
[Alignment matrix visualization: the words of "das Haus ist klein" on one axis and "the house is small" on the other, with the aligned word pairs marked.]
SLIDE 38
Word alignment
Natural languages are not that easy ...
- not always a 1:1 relation between words
- some words may be dropped
- word order can be quite different
SLIDE 39 Word alignment example
A moment ago I had just lost my ice cream
Nyss hade jag precis tappat bort glassen
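A word alignment like the one above can be represented simply as a set of index pairs. The sketch below shows one plausible manual alignment of this sentence pair; the specific links (and leaving "my" unaligned) are an assumption for illustration, not a gold standard.

```python
english = "A moment ago I had just lost my ice cream".split()
swedish = "Nyss hade jag precis tappat bort glassen".split()

# One plausible alignment as (English index, Swedish index) pairs. Note the
# n:1 links ("A moment ago" - "Nyss", "ice cream" - "glassen"), the 1:n link
# ("lost" - "tappat bort"), and "my" being left unaligned (a dropped word).
alignment = {(0, 0), (1, 0), (2, 0),   # A moment ago - Nyss
             (3, 2),                   # I            - jag
             (4, 1),                   # had          - hade
             (5, 3),                   # just         - precis
             (6, 4), (6, 5),           # lost         - tappat bort
             (8, 6), (9, 6)}           # ice cream    - glassen

for e, s in sorted(alignment):
    print(f"{english[e]:<8} {swedish[s]}")
```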
SLIDE 40
Statistical word alignment models
Standard word-based translation models:
- IBM 1: lexical translation probabilities
- IBM 2: adds absolute reordering
- IBM 3: adds fertility
- IBM 4: relative reordering
- IBM 5: fixes deficiency
- HMM model (alternative to IBM 2): relative distortion
SLIDE 41 Training word alignment models
Learning with incomplete data
- word alignment is hidden
- need to fill in the gaps in the data
Expectation Maximization (EM) algorithm
1. Initialize model parameters (e.g. uniform)
2. Assign probabilities to the missing data
3. Estimate model parameters from completed data
4. Iterate steps 2–3 (to convergence, or a set number of times)
SLIDE 42
EM algorithm
Initialization: all alignments are equally likely.
The model learns that la, for example, is often aligned with the.
SLIDE 43
EM algorithm
After one iteration: certain alignments, for example between la and the, are now more likely.
SLIDE 44
EM algorithm
After another iteration: it becomes apparent that other alignments, such as fleur and flower, are more likely.
SLIDE 45
EM algorithm
Convergence: the inherent hidden structure is revealed by EM.
SLIDE 46
EM algorithm
SLIDE 47
IBM Model 1 and EM
The EM algorithm consists of two steps.
Expectation step: apply the model to the data
- parts of the model are hidden (here: alignments)
- using the model, assign probabilities to possible alignments
Maximization step: estimate the model from the data
- take assigned values as fractional counts
- collect counts (weighted by probabilities)
- estimate model from counts
Iterate these steps until convergence
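As a concrete illustration of these two steps, here is a minimal sketch of EM training for IBM Model 1 (lexical translation only; the NULL word is omitted for brevity). The three-sentence toy corpus and the fixed number of iterations are assumptions for illustration; real training uses large sentence-aligned corpora and tools such as GIZA++.

```python
from collections import defaultdict

# Toy sentence-aligned corpus (an assumption for illustration): (source, target) pairs.
corpus = [("das haus".split(), "the house".split()),
          ("das buch".split(), "the book".split()),
          ("ein buch".split(), "a book".split())]

# 1. Initialize the lexical translation probabilities t(s|t) uniformly.
source_vocab = {s for src, _ in corpus for s in src}
t = defaultdict(lambda: 1.0 / len(source_vocab))

for _ in range(10):                       # a fixed number of EM iterations
    count = defaultdict(float)            # expected counts c(s, t)
    total = defaultdict(float)            # expected counts c(t)

    # 2. E-step: using the current model, assign probabilities to possible alignments.
    for src, tgt in corpus:
        for s in src:
            norm = sum(t[(s, w)] for w in tgt)
            for w in tgt:
                p = t[(s, w)] / norm      # probability that s is aligned to w
                count[(s, w)] += p        # collect fractional counts
                total[w] += p

    # 3. M-step: re-estimate t(s|t) from the collected fractional counts.
    for (s, w), c in count.items():
        t[(s, w)] = c / total[w]

print(round(t[("haus", "house")], 3))     # approaches 1.0 as EM converges
```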
SLIDE 48
EM and the IBM models
- IBM Model 1: lexical translation
- IBM Model 2: adds absolute reordering model
- IBM Model 3: adds fertility model
- IBM Model 4: relative reordering model
- IBM Model 5: fixes deficiency
The EM algorithm can be applied to all IBM models.
With the lower IBM models we can apply certain mathematical tricks to simplify calculations (see the course textbook).
Only with IBM Model 1 are we guaranteed to reach a global maximum.
SLIDE 49
EM and the IBM models
- IBM Model 1: lexical translation
- IBM Model 2: adds absolute reordering model
- IBM Model 3: adds fertility model
- IBM Model 4: relative reordering model
- IBM Model 5: fixes deficiency
From IBM Model 3 onwards, computation becomes more expensive and sampling over high-probability alignments is employed.
Typical training scheme: use all IBM models sequentially, using results from one to initialize the next.
Popular implementation: GIZA++
SLIDE 50 Typical Training Scheme
Iterations over alignment models of increasing complexity:
1. n EM iterations of IBM Model 1 with uniform initialization
2. n EM iterations of IBM Model 2 or HMM, initialized with Model 1
3. parameter transfer from IBM Model 2 / HMM to IBM Model 3
4. n hill-climbing iterations of IBM Model 3 based on the best alignment
5. parameter transfer from IBM Model 3 to IBM Model 4
6. n hill-climbing iterations of IBM Model 4 based on the best alignment
Typical number of iterations: 5
Popular implementation: GIZA++
SLIDE 51
Statistical Machine Translation
Remember: T̂ = argmax_T P(S|T) P(T)
- aligned parallel corpora → translation model
What is missing?
SLIDE 52
Statistical Machine Translation
Remember: T̂ = argmax_T P(S|T) P(T)
- aligned parallel corpora → translation model
What is missing?
- aligned parallel corpora → translation model P(S|T)
- we still need the language model P(T)
→ Standard N-gram language models
SLIDE 53
Statistical Machine Translation: Language Modeling
Language modeling: a (probabilistic) LM predicts the likelihood of any given string.
What is the likelihood P(T) of observing sentence T?
- P_LM(the house is small) > P_LM(small the is house)
- P_LM(small step) > P_LM(little step)
SLIDE 54 N-gram language models
Markov chain
p(w_1, w_2, ..., w_n) = p(w_1) p(w_2|w_1) ... p(w_n|w_1, w_2, ..., w_{n-1})
Markov assumption
p(w_n | w_1, ..., w_{n-1}) ≃ p(w_n | w_{n-m}, ..., w_{n-2}, w_{n-1})
Maximum likelihood estimation
p(w_3 | w_1, w_2) = count(w_1, w_2, w_3) / count(w_1, w_2)
SLIDE 55 N-gram language models
Markov chain
p(w_1, w_2, ..., w_n) = p(w_1) p(w_2|w_1) ... p(w_n|w_1, w_2, ..., w_{n-1})
Markov assumption
p(w_n | w_1, ..., w_{n-1}) ≃ p(w_n | w_{n-m}, ..., w_{n-2}, w_{n-1})
Maximum likelihood estimation
p(w_3 | w_1, w_2) = count(w_1, w_2, w_3) / count(w_1, w_2)

- unigram model: P(T) = P(t_1) ∗ P(t_2) ... P(t_n)
- bigram model: P(T) = P(t_1) ∗ P(t_2|t_1) ∗ P(t_3|t_2) ... P(t_n|t_{n-1})
- trigram model: P(T) = P(t_1) ∗ P(t_2|t_1) ∗ P(t_3|t_1, t_2) ... P(t_n|t_{n-2}, t_{n-1})
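A minimal sketch of an MLE bigram language model (a bigram rather than a trigram, to keep it short). The tiny corpus is an assumption for illustration; real language models are trained on huge corpora and need smoothing for unseen n-grams.

```python
from collections import Counter

# Tiny toy corpus with sentence boundary markers (an assumption for illustration).
corpus = ["<s> the house is small </s>",
          "<s> the house is big </s>",
          "<s> the building is small </s>"]

unigram, bigram = Counter(), Counter()
for line in corpus:
    words = line.split()
    unigram.update(words)
    bigram.update(zip(words, words[1:]))

def p_bigram(prev, word):
    # MLE: p(word | prev) = count(prev, word) / count(prev)
    return bigram[(prev, word)] / unigram[prev] if unigram[prev] else 0.0

def p_sentence(sentence):
    words = sentence.split()
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= p_bigram(prev, word)
    return prob

print(p_sentence("<s> the house is small </s>"))   # ~0.44
print(p_sentence("<s> the is house small </s>"))   # 0.0: unseen bigrams get zero probability
```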
SLIDE 56
A note on word-based SMT
Today, word-based translation models are outdated, but they introduce some important concepts which are still relevant for state-of-the-art SMT models:
- generative modelling
- noisy-channel model
- word alignment and IBM models 1–5
- expectation-maximisation algorithm
Tomorrow we will focus on phrase-based SMT!
SLIDE 57
Summary
- MT can be put into a probabilistic framework
- translation models: estimated from parallel corpora
- language models: estimated from monolingual corpora
- global search = decoding = translating
→ fully automatic (!!!)
→ various simplifications / assumptions necessary
→ probabilistic variant of direct translation
SLIDE 58
Coming up
This week:
- Lecture on PBSMT
- Assignment 2: Moses
Coming weeks:
- Lectures on sequence models and NMT
- Assignments: on LMs and NMT
- Lab: sequence-to-sequence models and attention
- Guest lecture