Introductory Notes on Machine Translation and Deep Learning


SLIDE 1

NPFL116 Compendium of Neural Machine Translation

Introductory Notes on Machine Translation and Deep Learning

February 20, 2017 Jindřich Libovický, Jindřich Helcl

Charles University in Prague, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics

SLIDE 2

What is machine translation?

Time for discussion.

SLIDE 3

What we think…

  • MT does not care what translation is
  • we believe people know what translation is and that it is captured in the data
  • we evaluate how well we can mimic what humans do when they translate

SLIDE 4

Deep Learning

  • machine learning that hierarchically infers suitable data representations with increasing levels of complexity and abstraction (Goodfellow et al.)
  • formulating the end-to-end relation between a problem's raw inputs and raw outputs as parameterizable real-valued functions and finding good parameters for those functions (me)
  • industrial/marketing buzzword for machine learning with neural networks (backpropaganda, ha, ha)

SLIDE 5

Neural Network

Forward pass:

  h1 = f(W1 x + b1)
  h2 = f(W2 h1 + b2)
  …
  hn = f(Wn hn−1 + bn)
  o  = g(Wo hn + bo)
  E  = e(o, t)

Backward pass (the error gradient flows back through the same layers), e.g. for the output layer:

  ∂E/∂Wo = ∂E/∂o · ∂o/∂Wo
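A minimal numpy sketch of the forward pass and the output-layer gradient above; the layer sizes, the tanh/identity choices for f and g, and the squared-error loss are illustrative assumptions, not taken from the slides.

import numpy as np

def f(z):                       # hidden-layer nonlinearity (tanh as an example)
    return np.tanh(z)

def g(z):                       # output transformation (identity here)
    return z

rng = np.random.default_rng(0)
# toy sizes (assumed): input 4, two hidden layers of 5, output 3
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 5)), np.zeros(5)
Wo, bo = rng.normal(size=(3, 5)), np.zeros(3)

x = rng.normal(size=4)          # input
t = rng.normal(size=3)          # target

# forward pass
h1 = f(W1 @ x + b1)
h2 = f(W2 @ h1 + b2)
o = g(Wo @ h2 + bo)
E = 0.5 * np.sum((o - t) ** 2)  # error e(o, t), here squared error (assumed)

# backward pass for the output layer: ∂E/∂Wo = ∂E/∂o · ∂o/∂Wo
dE_do = o - t                   # ∂E/∂o for squared error with identity output
dE_dWo = np.outer(dE_do, h2)    # ∂E/∂Wo
dE_dbo = dE_do                  # ∂E/∂bo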

SLIDE 6

Building Blocks (1)

  • individual neurons / more complex units like recurrent cells (allows innovations like inventing LSTM cells, ReLU activation)
  • libraries like Keras, Lasagne, TFSlim conceptualize on the layer level (allows innovations like batch normalization, dropout)
  • sometimes a higher-level conceptualization, similar to functional programming concepts (allows innovations like attention)

SLIDE 7

Building Blocks (2)

Single Neuron

  • computational model from the 1940s
  • adds up weighted inputs and transforms them into an output

Layer

f(Wx + b)

f …nonlinearity, W …weight matrix, b …bias

  • having the network in layers allows using matrix multiplication
  • allows GPU acceleration
  • vector space interpretations
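A small numpy sketch of why the layer view pays off: once several input vectors are stacked into a matrix, the whole batch is one matrix multiplication. The sizes and the ReLU choice of f are assumptions for illustration.

import numpy as np

# a layer computes f(Wx + b); stacking input vectors as rows of X
# turns the whole batch into a single matrix multiplication
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))     # weight matrix: 5 inputs -> 3 outputs (assumed sizes)
b = np.zeros(3)                 # bias

def relu(z):                    # one possible nonlinearity f
    return np.maximum(z, 0.0)

X = rng.normal(size=(8, 5))     # batch of 8 input vectors, one per row
H = relu(X @ W.T + b)           # shape (8, 3): the layer applied to the whole batch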
SLIDE 8

Encoder & Decoder

Encoder: functional fold (reduce) with a function: foldl a s xs
Decoder: inverse operation, a functional unfold: unfoldr a s

Source: Colah’s blog (http://colah.github.io/posts/2015-09-NN-Types-FP/)
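To make the fold/unfold analogy concrete, here is a small Python sketch: the encoder is functools.reduce over input vectors, and the decoder unfolds outputs from the final state. The step function, dimensions, and feeding the output back as the next input are illustrative assumptions, not the slides' model.

from functools import reduce
import numpy as np

rng = np.random.default_rng(0)
dim = 4                                   # state/embedding size (assumed)
W_in, W_h = rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim))
W_out = rng.normal(size=(dim, dim))

def step(state, x):                       # the folded function a: (state, input) -> new state
    return np.tanh(W_in @ x + W_h @ state)

def encode(xs, s0=np.zeros(dim)):         # foldl a s xs
    return reduce(step, xs, s0)

def decode(state, length):                # unfoldr-style: emit an output, update the state
    outputs = []
    for _ in range(length):
        y = np.tanh(W_out @ state)        # emitted output representation
        outputs.append(y)
        state = step(state, y)            # feed the output back in
    return outputs

xs = [rng.normal(size=dim) for _ in range(5)]   # toy "sentence" of 5 vectors
final_state = encode(xs)
ys = decode(final_state, length=3)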

SLIDE 9

RNNs & Convolutions

General RNN: map with an accumulator: mapAccumR a s xs
Bidirectional RNN: zip the left and right accumulating maps: zip (mapAccumR a s xs) (mapAccumL a' s' xs)
Convolution: zip neighbors and apply a function: zipWith a xs (tail xs)

Source: Colah’s blog (http://colah.github.io/posts/2015-09-NN-Types-FP/)
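A rough Python rendering of these patterns: an RNN as a map with an accumulator that keeps every intermediate state, and a width-2 convolution as a function applied to neighboring pairs. The recurrent update and the filter function are placeholder assumptions.

import numpy as np

def map_accum(step, state, xs):
    # mapAccum: thread a state through the sequence, keep every output
    outputs = []
    for x in xs:
        state, y = step(state, x)
        outputs.append(y)
    return state, outputs

def rnn_step(state, x):
    new_state = np.tanh(state + x)         # toy recurrent update (assumed)
    return new_state, new_state            # output = hidden state

def conv1d(filt, xs):
    # zipWith a xs (tail xs): apply the filter to each pair of neighbors
    return [filt(a, b) for a, b in zip(xs, xs[1:])]

xs = [np.array([float(i)]) for i in range(5)]
final_state, hidden_states = map_accum(rnn_step, np.zeros(1), xs)
conv_out = conv1d(lambda a, b: a + b, xs)  # toy width-2 "filter"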

SLIDE 10

Optimization

  • data is constant, treat the network as a function of its parameters
  • the differentiable error is a function of the parameters as well
  • clever variants of the gradient descent algorithm
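A minimal sketch of the basic gradient descent update on a parameter vector; the toy loss, its gradient, and the learning rate are placeholder assumptions (the "clever variants" such as momentum or Adam change how the step is computed, not the overall loop).

import numpy as np

def loss(theta):                  # toy differentiable error E(theta) (assumed)
    return np.sum((theta - 3.0) ** 2)

def grad(theta):                  # its gradient dE/dtheta
    return 2.0 * (theta - 3.0)

theta = np.zeros(4)               # parameters
learning_rate = 0.1
for _ in range(100):
    theta -= learning_rate * grad(theta)   # plain gradient descent step
# theta is now close to the minimizer (3, 3, 3, 3)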
SLIDE 11

Deep Learning as Alchemy

  • there is no rigorous manual on how to develop a good deep learning model, just rules of thumb
  • we don’t know how to interpret the weights the network has learned
  • there is no theory able to predict the results of experiments (as in physics), there are only experiments

SLIDE 12

Recoding in Mathematics

Algebraic equations such as 10x² − x − 60 = 0, 2x³ − 2x² − 10x + 4 = 0, −2x² − 10 = 0 …became planar curves

[Plot: the polynomials drawn as curves f(x), g(x), h(x) in the X–Y plane]

Image: Existential comics (http://existentialcomics.com/)

SLIDE 13

Watching Learning Curves

Source: Convolutional Neural Networks for Visual Recognition at Stanford University (http://cs231n.github.io/neural-networks-3/)

SLIDE 14

Other Things to Watch During Training (1)

  • train and validation loss
SLIDE 15

Other Things to Watch During Training (2)

  • target metric on training and validation data
  • L2 and L1 norms of the parameters
SLIDE 16

Other Things to Watch During Training (3)

  • gradients of the parameters
  • saturation of the non-linearities
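A small sketch of how one might log the quantities from the last three slides during training (losses, parameter norms, gradient norms, saturation of a tanh layer); the function signature, the list-of-arrays layout of parameters and gradients, and the 0.99 saturation threshold are assumptions for illustration, not a prescription.

import numpy as np

def diagnostics(train_loss, valid_loss, params, grads, hidden):
    # collect per-step training diagnostics as a dict of scalars
    return {
        "train_loss": train_loss,
        "valid_loss": valid_loss,
        "param_l2": sum(float(np.sum(p ** 2)) for p in params) ** 0.5,
        "param_l1": sum(float(np.sum(np.abs(p))) for p in params),
        "grad_l2": sum(float(np.sum(g ** 2)) for g in grads) ** 0.5,
        # fraction of tanh units close to -1 or 1, i.e. saturated (threshold assumed)
        "tanh_saturation": float(np.mean(np.abs(hidden) > 0.99)),
    }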
SLIDE 17

What’s Strange about Neural MT

  • we naturally think of translation in terms of manipulating symbols
  • a neural network represents everything as real-valued vectors
  • it ignores pretty much everything we know about language
SLIDE 18

Reading for the Next Week

LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. “Deep learning.” Nature 521.7553 (2015): 436.
http://pages.cs.wisc.edu/~dyer/cs540/handouts/deep-learning-nature2015.pdf

Question: Can you identify some implicit assumptions the authors make about sentence meaning while talking about NMT? Do you think they are correct? How do the properties that the authors attribute to LSTM networks correspond to your own ideas about how language should be computationally processed?