

Slide 1

Lecture 18

Natural Language Processing

Marco Chiarandini

Department of Mathematics & Computer Science, University of Southern Denmark

Slides by Dan Klein at Berkeley

Slide 2

Course Overview

✔ Introduction
  ✔ Artificial Intelligence
  ✔ Intelligent Agents
✔ Search
  ✔ Uninformed Search
  ✔ Heuristic Search
✔ Uncertain knowledge and Reasoning
  ✔ Probability and Bayesian approach
  ✔ Bayesian Networks
  ✔ Hidden Markov Chains
  ✔ Kalman Filters
✔ Learning
  ✔ Supervised: Decision Trees, Neural Networks, Learning Bayesian Networks
  ✔ Unsupervised: EM Algorithm
  ✔ Reinforcement Learning
◮ Games and Adversarial Search
  ◮ Minimax search and Alpha-beta pruning
  ◮ Multiagent search
◮ Knowledge representation and Reasoning
  ◮ Propositional logic
  ◮ First order logic
  ◮ Inference
  ◮ Planning

Slide 3

Outline

  • 1. Recap
  • 2. Speech Recognition
  • 3. Machine Translation
       • Statistical MT
       • Rule-based MT

Slide 4

Recap: Sequential data

Slide 5

Recap: Filtering

Slide 6

Recap: State Trellis

◮ State trellis: graph of states and transitions over time
◮ Each arc represents some transition xt−1 → xt
◮ Each arc has weight Pr(xt | xt−1) Pr(et | xt)
◮ Each path is a sequence of states
◮ The product of the weights on a path is that sequence's probability
◮ Can think of the Forward (and now Viterbi) algorithms as computing sums over all paths (resp. best paths) in this graph
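
A minimal NumPy sketch of Viterbi over such a trellis, maximizing the product of the arc weights Pr(xt | xt−1) Pr(et | xt) (the array layout and the function name are illustrative assumptions, not from the slides):

```python
import numpy as np

def viterbi(init, trans, emit, observations):
    """Most likely state sequence of an HMM given observations.

    init[j]     = Pr(x_1 = j)
    trans[i, j] = Pr(x_t = j | x_{t-1} = i)
    emit[j, o]  = Pr(e_t = o | x_t = j)
    """
    T, S = len(observations), len(init)
    delta = np.zeros((T, S))            # best probability of a path ending in state j at time t
    back = np.zeros((T, S), dtype=int)  # backpointer to the best predecessor
    delta[0] = init * emit[:, observations[0]]
    for t in range(1, T):
        # scores[i, j] = delta[t-1, i] * Pr(j | i) * Pr(e_t | j)
        scores = delta[t - 1][:, None] * trans * emit[:, observations[t]]
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0)
    path = [int(delta[-1].argmax())]    # best final state
    for t in range(T - 1, 0, -1):       # follow backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Replacing max/argmax with a sum over predecessors turns this into the Forward algorithm mentioned above.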

Slide 7

Recap: Forward/Viterbi

Slide 8

Recap: Particle Filtering

Particles: track samples of states rather than an explicit distribution
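
A minimal sketch of one propagate–weight–resample step (the helper names transition_sample and obs_weight stand for the HMM's transition and sensor models and are illustrative assumptions):

```python
import random

def particle_filter_step(particles, transition_sample, obs_weight, evidence):
    """One elapse-time + observe step with resampling.

    particles         : list of sampled states
    transition_sample : x -> x' drawn from Pr(X' | x)
    obs_weight        : (e, x') -> Pr(e | x')
    """
    moved = [transition_sample(x) for x in particles]            # 1. propagate
    weights = [obs_weight(evidence, x) for x in moved]           # 2. weight by evidence
    return random.choices(moved, weights=weights, k=len(moved))  # 3. resample
```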

Slide 9

Natural Language

◮ Humans started to speak about 100,000 years ago
◮ Humans started to write about 7,000 years ago

Machines process natural language to:

◮ acquire information
◮ communicate with humans

Slide 10

Natural Language Processing

◮ Speech technologies
  ◮ Automatic speech recognition (ASR)
  ◮ Text-to-speech synthesis (TTS)
  ◮ Dialog systems
◮ Language processing technologies
  ◮ Machine translation
  ◮ Information extraction
  ◮ Web search, question answering
  ◮ Text classification, spam filtering, etc.

Slide 11

Outline

  • 1. Recap
  • 2. Speech Recognition
  • 3. Machine Translation
       • Statistical MT
       • Rule-based MT

Slide 12

Digitizing Speech

Speech input is an acoustic waveform

Slide 13

Spectral Analysis

Slide 14

Acoustic Feature Sequence

Slide 15

State Space

◮ Pr(E | X) encodes which acoustic vectors are appropriate for each phoneme (each kind of sound)
◮ Pr(X | X′) encodes how sounds can be strung together
◮ We will have one state for each sound in each word
◮ From some state x, we can only:
  ◮ stay in the same state (e.g. speaking slowly)
  ◮ move to the next position in the word
  ◮ at the end of the word, move to the start of the next word
◮ We build a little state graph for each word and chain them together to form our state space X (see the sketch below)
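
A hedged sketch of such a per-word state graph as an adjacency structure (the phoneme encoding, the EXIT marker linking to the next word, and the self-loop probability p_stay are illustrative assumptions):

```python
def word_state_graph(phonemes, p_stay=0.5):
    """Left-to-right HMM states for one word: every phoneme state may
    stay in place (self-loop) or advance; the last state exits toward
    the start of the next word. Returns {state: [(successor, prob), ...]}.
    """
    graph = {}
    for i, ph in enumerate(phonemes):
        stay = ((i, ph), p_stay)
        advance = ("EXIT" if i == len(phonemes) - 1 else (i + 1, phonemes[i + 1]),
                   1.0 - p_stay)
        graph[(i, ph)] = [stay, advance]
    return graph

# e.g. a word pronounced with the phoneme sequence [n iy d]
print(word_state_graph(["n", "iy", "d"]))
```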

Slide 16

HMM for speech

Slide 17

Transition with Bigrams

Slide 18

Decoding

◮ While there are some practical issues, finding the words given the acoustics is an HMM inference problem
◮ We want to know which state sequence x1:T is most likely given the evidence e1:T:

  x*1:T = argmax_{x1:T} Pr(x1:T | e1:T)

◮ From the sequence x, we can simply read off the words

Slide 19

Outline

  • 1. Recap
  • 2. Speech Recognition
  • 3. Machine Translation
       • Statistical MT
       • Rule-based MT

Slide 20

Machine Translation

◮ Fundamental goal: analyze and process human language, broadly, robustly, accurately...
◮ End systems that we want to build:
  ◮ Ambitious: speech recognition, machine translation, information extraction, dialog interfaces, question answering...
  ◮ Modest: spelling correction, text categorization, language recognition, genre classification.

Slide 21

Language Models

◮ A language is defined by a set of strings and by rules, called a grammar, for generating them.
◮ Formal languages also need semantics that define meaning.
◮ Natural languages are:

  • 1. not definitive: there is disagreement about the grammar rules
       "Not to be invited is sad" vs. "To be not invited is sad"
  • 2. ambiguous:
       "Entire store 25% off"
       "I will bring my bike tomorrow if it looks nice in the morning."

  • 3. large and constantly changing

Slide 22

◮ n-gram: a sequence of n characters (or of n words, syllables, ...)
◮ n-gram models: probability distributions over these sequences
◮ An n-gram model is a Markov chain of order n − 1. For a character trigram model:

  Pr(ci | c1:i−1) = Pr(ci | ci−2:i−1)

  Pr(c1:N) = ∏_{i=1}^{N} Pr(ci | c1:i−1) = ∏_{i=1}^{N} Pr(ci | ci−2:i−1)

◮ With an alphabet of 100 characters, the trigram table already has millions of entries
◮ With words instead of characters it is far worse
◮ Corpus: a body of text
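
The conditional probabilities of such a trigram model can be estimated by counting over a corpus. A minimal sketch (the function name and the unsmoothed maximum-likelihood estimate are illustrative assumptions, not from the slides):

```python
from collections import Counter, defaultdict

def train_char_trigram(text):
    """Estimate Pr(c_i | c_{i-2:i-1}) from trigram counts in `text`."""
    counts = defaultdict(Counter)
    for a, b, c in zip(text, text[1:], text[2:]):
        counts[a + b][c] += 1            # context "ab" followed by c
    return {ctx: {c: n / sum(ctr.values()) for c, n in ctr.items()}
            for ctx, ctr in counts.items()}

model = train_char_trigram("the theory of the thing")
print(model["th"])   # distribution over the characters that follow "th"
```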

Slide 23

Language identification

For each candidate language l, learn from a corpus: Pr(ci | ci−2:i−1, l)

Most probable language:

  l* = argmax_l Pr(l | c1:N)
     = argmax_l Pr(l) Pr(c1:N | l)                         (Bayes' rule)
     = argmax_l Pr(l) ∏_{i=1}^{N} Pr(ci | ci−2:i−1, l)     (Markov property)

Character-trigram models can reach about 99% accuracy on this task.
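
In log space the argmax becomes a sum; a hedged sketch reusing trigram tables of the form built above (the names models and priors and the smoothing floor for unseen trigrams are assumptions):

```python
import math

def best_language(text, models, priors, floor=1e-7):
    """argmax_l  log Pr(l) + sum_i log Pr(c_i | c_{i-2:i-1}, l).

    models[l] : two-character context -> distribution over the next character
    priors[l] : Pr(l)
    floor     : stand-in probability for unseen trigrams (smoothing)
    """
    def score(l):
        s = math.log(priors[l])
        for a, b, c in zip(text, text[1:], text[2:]):
            s += math.log(models[l].get(a + b, {}).get(c, floor))
        return s
    return max(models, key=score)
```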

Slide 24

Machine Translation

◮ Rough translation: gives the main point but contains errors
◮ Pre-edited translation: the original text is written in a constrained language that is easier to translate automatically
◮ Restricted-source translation: fully automatic, but only for narrow technical content, e.g. weather forecasts

Slide 25

Machine Translation Systems

Very much simplified, there are three types of machine translation:

Statistical machine translation (SMT): learns relational dependencies of features such as n-grams, lemmas, etc.
  • Requires large data sets
  • Relatively easy to implement
  • Example: Google Translate

Rule-based machine translation (RBMT): uses grammatical rules and language constructions to analyze syntax and semantics
  • Uses moderate-size data sets
  • Long development time; requires expertise

Hybrid machine translation: either constructs a translation with RBMT and uses SMT to post-process and optimize the result, or uses grammatical rules to derive further features that are then fed into the statistical learner
  • A new direction of research

Slide 26

Brief History

Slide 27

◮ Interlingual model: the source language, i.e. the text to be translated, is transformed into an interlingua, an abstract language-independent representation. The target language is then generated from the interlingua.

◮ Transfer model: the source language is transformed into an abstract, less language-specific representation. Linguistic rules specific to the language pair then transform the source-language representation into an abstract target-language representation, and from this the target sentence is generated.

◮ Direct model: words are translated directly without passing through an additional representation.

Slide 28

Levels of Transfer

Vauquois pyramid, from the base (words) to the apex (interlingua):

  English Words: John loves Mary  ↔  French Words: Jean aime Marie
  English Syntax: S(NP(John), VP(loves, NP(Mary)))  ↔  French Syntax: S(NP(Jean), VP(aime, NP(Marie)))
  English Semantics: Loves(John, Mary)  ↔  French Semantics: Aime(Jean, Marie)
  Interlingua Semantics: Attraction(NamedJohn, NamedMary, High)

Slide 29

Levels of Transfer

Slide 30

The problem with dictionary look ups

Slide 31

Statistical machine translation

Data-driven MT

Slide 32

◮ e: sequence of strings in English
◮ f: sequence of strings in French

  f* = argmax_f Pr(f | e) = argmax_f Pr(e | f) Pr(f)

◮ Pr(e | f) is learned from a bilingual (parallel) corpus made of phrases seen before
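
Once the two model scores exist, this noisy-channel argmax is a one-liner over a candidate set; a minimal sketch (all names are illustrative assumptions, and real decoders search the candidate space rather than enumerate it):

```python
def noisy_channel_best(candidates, p_e_given_f, p_f):
    """f* = argmax_f Pr(e | f) Pr(f) over explicit candidates.

    p_e_given_f : translation model, f -> Pr(e | f) for the fixed input e
    p_f         : language model, f -> Pr(f)
    """
    return max(candidates, key=lambda f: p_e_given_f(f) * p_f(f))
```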

Slide 33

[Figure: the English sentence "There is a smelly wumpus sleeping in 2 2" segmented into phrases e1, ..., e5, aligned with the French "Il y a un wumpus qui dort malodorant à 2 2" segmented into f1, ..., f5, with the French phrases appearing in the order f1 f3 f2 f4 f5 and distortions d1 = 0, d2 = +1, d3 = −2, d4 = +1, d5 = 0]

Given an English sentence e, find the French sentence f*:

  • 1. break English e into phrases e1, . . . , en
  • 2. for each ei choose a French phrase fi with probability Pr(fi | ei)
  • 3. choose a permutation of the phrases f1, . . . , fn: for each fi choose the distortion di, the number of words that phrase fi has moved with respect to fi−1

  Pr(f, d | e) = ∏_{i=1}^{n} Pr(fi | ei) Pr(di)

With 100 candidate French phrases per English phrase, a sentence broken into 5 phrases has 100^5 possible phrase choices and 5! reorderings.
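
Scoring one candidate segmentation and alignment under this model is a straight product; a minimal sketch (the lookup tables p_phrase and p_distortion are assumed to have been estimated from an aligned corpus):

```python
def phrase_model_prob(pairs, distortions, p_phrase, p_distortion):
    """Pr(f, d | e) = prod_i Pr(f_i | e_i) * Pr(d_i).

    pairs       : [(e_1, f_1), ..., (e_n, f_n)] phrase pairs
    distortions : [d_1, ..., d_n] word offsets of each French phrase
    """
    prob = 1.0
    for (e_i, f_i), d_i in zip(pairs, distortions):
        prob *= p_phrase[(e_i, f_i)] * p_distortion[d_i]
    return prob
```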

Slide 34

Learn probabilities

  • 1. Collect a parallel corpus: parliamentary debates, web pages
  • 2. Segment the texts into sentences: periods are good indicators, with some care
  • 3. Align sentences: the lengths of the sentences are one indicator, landmark words another (see the toy sketch below)
  • 4. Align phrases within sentences: an iterative process that aggregates the evidence that no other pair appears as frequently in the corpus; this yields Pr(fi | ei)
  • 5. Extract distortions: count how often each distortion appears in the corpus after phrase alignment (with smoothing)
  • 6. Improve the estimates of Pr(f | e) and Pr(d) with EM.
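
The slides keep the sentence-alignment step abstract; as a toy illustration of step 3 in the spirit of length-based alignment (the word-count proxy and the constant c, an assumed average French/English length ratio, are illustrative assumptions):

```python
def length_alignment_score(e_sents, f_sents, c=1.1):
    """Score a 1-1 sentence alignment by length compatibility: French is
    assumed to run about c times as long as English on average."""
    score = 0.0
    for e, f in zip(e_sents, f_sents):
        ratio = len(f.split()) / (c * max(len(e.split()), 1))
        score -= abs(ratio - 1.0)   # a perfect length match costs nothing
    return score
```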

Slide 35

Learning to translate

Slide 36

An HMM model

Slide 37

Machine translation systems

Slide 38

Grammars

Grammar: a set of rules (rewriting left-hand sides into right-hand sides) that describe how to form strings from the language's alphabet that are valid according to the language's syntax (a language generator).

Parsing is the process of recognizing a string of a natural language by breaking it down into a set of symbols and analyzing each one against the grammar of the language, i.e., determining whether the string belongs to the language or is grammatically incorrect. The result is a parse tree.

◮ context-free grammars (see http://en.wikipedia.org/wiki/Chomsky_hierarchy)
◮ probabilistic context-free grammars
◮ lexicalized probabilistic context-free grammars

A toy probabilistic parser is sketched below.
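
A hedged sketch of parsing with a probabilistic context-free grammar in Chomsky normal form, via the CYK algorithm (the toy grammar and the chart representation are illustrative assumptions, not from the slides):

```python
from collections import defaultdict

def cyk_pcfg(words, lexicon, rules):
    """Most probable constituents for a PCFG in Chomsky normal form.

    lexicon : {(POS, word): prob}   unary rules POS -> word
    rules   : {(A, B, C): prob}     binary rules A -> B C
    Returns a chart {(i, j, symbol): (prob, backpointer)}.
    """
    n = len(words)
    chart = defaultdict(lambda: (0.0, None))
    for i, w in enumerate(words):                       # fill in the words
        for (pos, word), p in lexicon.items():
            if word == w:
                chart[(i, i + 1, pos)] = (p, w)
    for span in range(2, n + 1):                        # combine shorter spans
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in rules.items():
                    cand = p * chart[(i, k, b)][0] * chart[(k, j, c)][0]
                    if cand > chart[(i, j, a)][0]:
                        chart[(i, j, a)] = (cand, (k, b, c))
    return chart

# Toy grammar: S -> NP VP, VP -> V NP
rules = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
lexicon = {("NP", "Jean"): 0.5, ("NP", "Marie"): 0.5, ("V", "aime"): 1.0}
print(cyk_pcfg(["Jean", "aime", "Marie"], lexicon, rules)[(0, 3, "S")])
```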

Slide 39

Parsing as search

Slide 40

Probabilistic Context Free Grammars

Slide 41

Hybrid Systems

The translated sentence can be checked against a monolingual corpus.

Slide 42

Machine Translation

◮ Translate text from one language to another
◮ Recombine fragments of example translations
◮ Challenges:
  ◮ Which fragments? [learning to translate]
  ◮ How to make it efficient? [fast translation search]

Slide 43

Machine Translation

◮ After a first bubble, the sector is now moving at full speed
◮ In spite of the economic crisis, 7% growth worldwide
◮ Commercial and technological focus
◮ Danish is a marginal language and existing systems cannot be applied reliably
◮ www.eicom.dk and www.oversaetterhuset.dk pursue development in collaboration with research institutions (SDU, CBS, ASB)

Slide 44

Announcement

Need for human resources; possibilities for thesis work and individual study activities together with:

◮ the Visual Interactive Syntax Learning project at the Institute for Language and Communication of SDU: http://beta.visl.sdu.dk/constraint_grammar.html
◮ Eckhard Bick, project leader: http://en.wikipedia.org/wiki/Eckhard_Bick

If interested, contact me.
