SLIDE 1

INF5820: Language technological applications Course summary

Andrey Kutuzov, Lilja Øvrelid, Stephan Oepen, Taraka Rama & Erik Velldal

University of Oslo

20 November 2018

SLIDE 2

Today

◮ Exam preparations
◮ Collectively summing up
◮ Results of obligatory assignment(s)
◮ Current trends, beyond INF5820:
  ◮ Cutting edge in word embedding pre-training
  ◮ Transfer and multi-task learning
  ◮ Adversarial learning
  ◮ Transformers
  ◮ And more...

SLIDE 3

Exam

◮ When: Monday November 26, 09:00 AM (4 hours).
◮ Where: Store fysiske lesesal, Fysikkbygningen
◮ How:
  ◮ No aids (no textbooks, etc.)
  ◮ Pen and paper (not Inspera)
  ◮ Not a programming exam
  ◮ Focus on conceptual understanding
  ◮ Could still involve equations, but no complicated calculations by hand
  ◮ Details of use cases we’ve considered (in lectures or assignments) are also relevant

SLIDE 4

Neural Network Methods for NLP

(Image: The Great Wave off Kanagawa by Katsushika Hokusai)

SLIDE 5-6

What has changed?

◮ We’re still within the realm of supervised machine learning. But:
◮ A shift from linear models with discrete representations of manually specified features,
◮ to non-linear models with distributed and learned representations.
◮ We’ll consider two main themes running through the semester: architectures and representations.

SLIDE 7-14

Architectures and model design

◮ Linear classifiers, feed-forward networks (MLPs and CNNs), and RNNs.
◮ Various instantiations of 1d CNNs:
  ◮ Multi-channel, stacked / hierarchical, graph CNNs
  ◮ Other choices: pooling strategy, window sizes, number of filters, stride...
◮ Variations beyond simple RNNs:
  ◮ (Bi)LSTM + GRU (gating), attention and stacking.
  ◮ Variations of how RNNs can be used: acceptors, transducers, conditioned generation (encoder-decoder / seq.-to-seq.)
  ◮ Various ways of performing sequence labeling with RNNs (see the sketch below)

Various aspects of modeling common to all the neural architectures:

◮ dimensionalities, regularization, initialization, handling OOVs, activation functions, batches, loss functions, learning rate, optimizer, ...
◮ Embedding pre-training and text pre-processing
◮ Backpropagation, vanishing / exploding gradients
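To make the sequence-labeling point concrete, here is a minimal sketch of a BiLSTM tagger in PyTorch. The framework choice, class name and all sizes are illustrative assumptions, not code from the course:

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Minimal BiLSTM sequence labeler: one tag per input token."""
    def __init__(self, vocab_size, emb_dim, hidden_dim, n_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Bidirectional: each position sees both left and right context.
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # Per-token classification over the two concatenated directions.
        self.out = nn.Linear(2 * hidden_dim, n_tags)

    def forward(self, token_ids):                  # (batch, seq_len)
        states, _ = self.lstm(self.embed(token_ids))
        return self.out(states)                    # (batch, seq_len, n_tags)

# Hypothetical usage with made-up sizes:
tagger = BiLSTMTagger(vocab_size=10000, emb_dim=100, hidden_dim=64, n_tags=17)
logits = tagger(torch.randint(0, 10000, (2, 12)))  # two 12-token sentences
```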

SLIDE 15-17

Representations

◮ An important part of the neural ‘revolution’ in NLP: the input representations provided to the learner.
◮ Traditional feature vectors: High-dimensional, sparse, categorical and discrete. Based on manually specified feature templates.
◮ Word embeddings: Low-dimensional, dense, continuous and distributed. Often learned automatically, e.g. as a language model.
◮ Main benefit of using embeddings rather than one-hot encodings (contrasted in the sketch below):
  ◮ Information-sharing between features, which counteracts data sparseness.
  ◮ Can be computed from unlabelled data.
◮ We’ve also considered various tasks for intrinsic evaluation of distributional word vectors.
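As a small illustration of that contrast, a PyTorch sketch (the sizes and the random ‘pre-trained’ matrix are stand-ins of my own, not real word2vec vectors):

```python
import torch
import torch.nn as nn

vocab_size, emb_dim = 5, 3

# One-hot encoding: every word gets its own axis; all words are equally
# distant from each other, so no information is shared between them.
one_hot = torch.eye(vocab_size)                  # (5, 5)

# Dense embeddings: rows can be pre-trained on unlabelled data (the random
# matrix below is a stand-in for, e.g., word2vec vectors), so related
# words can receive related vectors.
pretrained = torch.randn(vocab_size, emb_dim)
embed = nn.Embedding.from_pretrained(pretrained, freeze=False)

ids = torch.tensor([0, 3, 3, 1])                 # a 4-token 'sentence'
print(one_hot[ids].shape)                        # torch.Size([4, 5])
print(embed(ids).shape)                          # torch.Size([4, 3])
```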

SLIDE 18-19

Representation learning

◮ With neural network models, our main interest is not always in the final classification outcome itself.
◮ Rather, we might be interested in the learned internal representations.
◮ Examples?
  ◮ Embeddings in neural models:
    ◮ Pre-trained or learned from scratch (with one-hot input)
    ◮ Static (frozen) or dynamic.
  ◮ The pooling layer of a CNN or the final hidden state of an RNN provides a fixed-length representation of an arbitrary-length sequence (sketched below).
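A tiny sketch of that last point, with made-up sizes: whatever the input length, the final hidden state has the same dimensionality:

```python
import torch
import torch.nn as nn

# The final hidden state of an RNN is a fixed-length vector
# regardless of how long the input sequence is.
lstm = nn.LSTM(input_size=50, hidden_size=32, batch_first=True)

short_seq = torch.randn(1, 5, 50)    # 5 tokens
long_seq = torch.randn(1, 40, 50)    # 40 tokens

_, (h_short, _) = lstm(short_seq)
_, (h_long, _) = lstm(long_seq)
print(h_short.shape, h_long.shape)   # both torch.Size([1, 1, 32])
```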

SLIDE 20-21

Specialized NN architectures

◮ Focus of manual engineering shifted from features to architecture decisions and hyper-parameters.
◮ The elimination of feature-engineering is only partially true:
  ◮ Need for specialized NN architectures that extract higher-level features: CNNs and RNNs.
◮ Pitch: layers and architectures are like Lego bricks – mix and match.
◮ Examples of things you could be asked to reflect on:
  ◮ When would you use each architecture?
  ◮ What are some of the ways we’ve combined the various bricks?
  ◮ When choosing to apply a non-hierarchical CNN, what assumptions are you implicitly making about the nature of your task or data?
  ◮ Why could it make sense to run a CNN over the word-by-word vector outputs of an RNN (e.g. a BiLSTM)? (See the sketch below.)
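As a hedged answer sketch to that last reflection question (all sizes are made up): the BiLSTM first produces contextualized per-token vectors, and the CNN then extracts local patterns over those vectors:

```python
import torch
import torch.nn as nn

# BiLSTM gives contextualized vectors for each token; the 1d CNN then
# extracts local 'n-gram over context' features from them.
lstm = nn.LSTM(input_size=50, hidden_size=32, batch_first=True,
               bidirectional=True)
# Conv1d expects (batch, channels, seq_len); channels = 2 * hidden size.
conv = nn.Conv1d(in_channels=64, out_channels=16, kernel_size=3, padding=1)

tokens = torch.randn(4, 20, 50)           # batch of 4, 20 tokens each
states, _ = lstm(tokens)                  # (4, 20, 64)
features = conv(states.transpose(1, 2))   # (4, 16, 20)
pooled, _ = features.max(dim=2)           # (4, 16) fixed-length summary
```

The implicit design choice: the RNN supplies long-range context, while the convolution and max-pooling extract position-invariant local features from it.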

SLIDE 22-31

INF5820: Experiment Design

Methodology

◮ Small, elite group of I:ST finishers;
◮ Engage everyone from start to finish;
◮ an Olympic twist: friendly competition;
◮ acquire practical skills and intuitions.

Main Results

◮ We are very happy with the results from the experiment (so far);
◮ we commonly apply two key metrics in internal evaluation:
  ◮ retention rate: 9/9; survival rate: 9/9.

F1 = 1.0. Hooray!

Post Mortem Debugging

◮ Please submit your views through the on-line course evaluation!

SLIDE 32

INF5820: Evaluation Protocol

SLIDE 33-38

INF5820: Empirical Results

◮ Total points: Stig Berggren
◮ Average rank: Celina Moldestad, Filip Stefaniuk

Congratulations! To Everyone!

SLIDE 39

The Olympic Spirit (Laboratory this Thursday)

SLIDE 40

Looking Ahead: Yoav Goldberg in 2018

SLIDE 41

Contents

1. Results of obligatory assignments
2. Recent topics in Deep Learning NLP not covered in this course:
   1. Multi-task learning
   2. Adversarial generators
   3. Transformers
   4. Pre-trained language models

SLIDE 42-44

1. Multi-task learning

◮ Sharing parameters between models trained on multiple tasks:
  ◮ tying the weights of different layers [Collobert et al., 2011] (sketched below)
◮ Conceptually, even using pre-trained word embeddings is multi-task learning or semi-supervised learning.
◮ See Chapter 20 of [Goldberg, 2017]
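A minimal sketch of such weight tying (hard parameter sharing), with two hypothetical tasks and made-up sizes; this follows the spirit of, not the exact setup in, [Collobert et al., 2011]:

```python
import torch
import torch.nn as nn

# One shared encoder, two task-specific heads.
embed = nn.Embedding(10000, 100)
encoder = nn.LSTM(100, 64, batch_first=True)   # shared across tasks

pos_head = nn.Linear(64, 17)   # task 1: e.g. POS tagging
ner_head = nn.Linear(64, 9)    # task 2: e.g. named entity recognition

tokens = torch.randint(0, 10000, (2, 12))
states, _ = encoder(embed(tokens))   # one shared representation
pos_logits = pos_head(states)        # each task's loss backpropagates
ner_logits = ner_head(states)        # into the same shared weights
```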

SLIDE 45-47

1. Multi-task learning

◮ Way to inject linguistic information (inductive bias) into models.
◮ Human eye-tracking data helps document classification [Barrett et al., 2018]
◮ Dedicated benchmarks for multi-task learning are appearing:
  ◮ Natural Language Decathlon [McCann et al., 2018]

SLIDE 48-51

2. Adversarial generators

◮ Generative Adversarial Networks (GANs): several neural networks contesting with each other [Goodfellow et al., 2014]
◮ For example, one generates a text and another tries to tell the generated text from natural text (a toy sketch follows below).
◮ Eventually, the first network learns to ‘deceive’ the second one...
◮ ... that is, to generate natural-looking text.

‘GANs were poorly understood and hard to get to work in the beginning and only took off once researchers figured out the right tricks and learned how to make them work.’ (Yann LeCun)
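A toy sketch of the adversarial game on 2-d data rather than text (GANs over text need extra tricks for the discrete outputs); all names and sizes are illustrative:

```python
import torch
import torch.nn as nn

# G maps noise to fake samples; D scores real vs. fake.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(16, 2) + 3.0               # stand-in for natural data

# Discriminator step: label real samples 1, generated samples 0.
fake = G(torch.randn(16, 8)).detach()
d_loss = bce(D(real), torch.ones(16, 1)) + bce(D(fake), torch.zeros(16, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: try to make D call fresh generations 'real'.
g_loss = bce(D(G(torch.randn(16, 8))), torch.ones(16, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```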

SLIDE 52

2. Adversarial generators

(Figure from [Goodfellow et al., 2014])

SLIDE 53-57

2. Adversarial generators

◮ Adding ‘adversarial’ examples to the training data makes models more robust (one common recipe is sketched below).
◮ One can find cases where the model is accurate for the wrong reasons.
◮ Shown to be useful for question answering systems [Mudrakarta et al., 2018]

...but ‘How fast are the bricks speaking on either side of the building?’ still produces the same answer!
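One common recipe, sketched here under assumptions of my own, is an FGSM-style perturbation of (embedded) inputs; note this is a generic robustness trick, not the method of [Mudrakarta et al., 2018]:

```python
import torch
import torch.nn as nn

# Perturb the input in the gradient direction and train on the
# perturbed version as well.
model = nn.Linear(10, 2)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(4, 10, requires_grad=True)   # e.g. embedded inputs
y = torch.tensor([0, 1, 0, 1])

loss_fn(model(x), y).backward()              # gradients w.r.t. the input

epsilon = 0.1
x_adv = (x + epsilon * x.grad.sign()).detach()   # worst-case nudge
adv_loss = loss_fn(model(x_adv), y)   # add to the training objective
```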

SLIDE 58-60

3. Transformers

◮ Transformer idea: no separate encoders and decoders, only multi-head self-attention [Vaswani et al., 2017].
◮ A transduction model computing representations of input and output without using sequence-aligned RNNs or convolutions.
◮ Brought major improvements in machine translation.

SLIDE 61-62

3. Transformers

Some code walk-throughs

◮ https://nlp.seas.harvard.edu/2018/04/03/attention.html
◮ https://github.com/tensorflow/tensor2tensor

Recently, transformers were combined with bidirectional pre-trained language models in BERT [Devlin et al., 2018].
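Complementing those walk-throughs, a minimal self-attention sketch using PyTorch’s nn.MultiheadAttention (sizes are illustrative; the full Transformer of [Vaswani et al., 2017] wraps this in positional encodings, residual connections, layer normalization and feed-forward blocks):

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=8)

x = torch.randn(10, 2, 64)       # (seq_len, batch, embed_dim)
# Self-attention: queries, keys and values all come from the same sequence.
out, weights = attn(x, x, x)
print(out.shape)                 # torch.Size([10, 2, 64])
print(weights.shape)             # torch.Size([2, 10, 10]), avg over heads
```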

SLIDE 63-65

4. Pre-trained language models

Language models can provide contextualized word embeddings, with different representations in different contexts.

◮ Embeddings from Language MOdels (ELMo) use LSTMs [Peters et al., 2018]
◮ Bidirectional Encoder Representations from Transformers (BERT) use bidirectional transformers [Devlin et al., 2018]

SLIDE 66-67

4. Pre-trained language models

ELMo embeddings seem to improve any NLP task you apply them to:
‘ImageNet for NLP’ (Sebastian Ruder)

SLIDE 68-71

4. Pre-trained language models

Modes of usage

1. ‘as is’: contextualized representations are fed into the overarching architecture like the old-school ‘static’ embeddings;
2. the whole model is fine-tuned on target task data.

(Both modes are sketched below.)

Layers of ELMo reflect language tiers

◮ word embedding layer: morphology;
◮ the first LSTM layer: syntax;
◮ the second LSTM layer: semantics (including word senses).

More info

◮ https://allennlp.org/elmo
◮ https://github.com/allenai/bilm-tf
◮ https://github.com/google-research/bert
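A generic sketch of the two modes; the encoder here is a hypothetical stand-in, where a real contextualizer would come from the ELMo/BERT resources linked above:

```python
import torch
import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self, pretrained_encoder, freeze=True):
        super().__init__()
        self.encoder = pretrained_encoder
        if freeze:                       # mode 1: use 'as is'
            for p in self.encoder.parameters():
                p.requires_grad = False  # encoder weights stay fixed
        # mode 2 (freeze=False): encoder is fine-tuned with the task head
        self.head = nn.Linear(64, 2)

    def forward(self, x):
        return self.head(self.encoder(x))

# Hypothetical stand-in encoder with a matching output size:
clf = Classifier(nn.Sequential(nn.Linear(100, 64), nn.Tanh()), freeze=True)
out = clf(torch.randn(4, 100))
```

Freezing keeps the pre-trained knowledge intact and trains quickly; fine-tuning tends to help when enough target-task data is available.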

SLIDE 72

Deep Learning in NLP: the future is bright!

(From Min-Yen Kan’s keynote speech at COLING-2018)

New and exciting research is coming; stay tuned to arXiv.org and the ACL Anthology!

SLIDE 73

References I

Barrett, M., Bingel, J., Hollenstein, N., Rei, M., and Søgaard, A. (2018). Sequence classification with human attention. In Proceedings of the 22nd Conference on Computational Natural Language Learning, pages 302–312. Association for Computational Linguistics.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12(Aug):2493–2537.

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

SLIDE 74

References II

Goldberg, Y. (2017). Neural network methods for natural language processing. Synthesis Lectures on Human Language Technologies, 10(1):1–309.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680.

McCann, B., Keskar, N. S., Xiong, C., and Socher, R. (2018). The natural language decathlon: Multitask learning as question answering. arXiv preprint arXiv:1806.08730.

SLIDE 75

References III

Mudrakarta, P. K., Taly, A., Sundararajan, M., and Dhamdhere, K. (2018). Did the model understand the question? In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1896–1906. Association for Computational Linguistics.

Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. (2018). Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237. Association for Computational Linguistics.

SLIDE 76

References IV

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.