  1. INF5820: Language technological applications
  Course summary
  Andrey Kutuzov, Lilja Øvrelid, Stephan Oepen, Taraka Rama & Erik Velldal
  University of Oslo, 20 November 2018

  2. Today
  ◮ Exam preparations
  ◮ Collectively summing up
  ◮ Results of obligatory assignment(s)
  ◮ Current trends, beyond INF5820:
    ◮ Cutting edge in word embedding pre-training
    ◮ Transfer and multitask learning
    ◮ Adversarial learning
    ◮ Transformers
    ◮ And more...

  3. Exam
  ◮ When: Monday November 26, 09:00 AM (4 hours)
  ◮ Where: Store fysiske lesesal, Fysikkbygningen
  ◮ How:
    ◮ No aids (no textbooks, etc.)
    ◮ Pen and paper (not Inspera)
    ◮ Not a programming exam
    ◮ Focus on conceptual understanding
    ◮ Could still involve equations, but no complicated calculations by hand
    ◮ Details of use cases we’ve considered (in lectures or assignments) are also relevant

  4. Neural Network Methods for NLP
  (The Great Wave off Kanagawa by Katsushika Hokusai)

  6. What has changed?
  ◮ We’re still within the realm of supervised machine learning. But:
  ◮ A shift from linear models with discrete representations of manually specified features,
  ◮ to non-linear models with distributed and learned representations.
  ◮ We’ll consider two main themes running through the semester: architectures and representations.
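To make this shift concrete, here is a minimal, hypothetical sketch (in PyTorch, which the slides do not prescribe) contrasting a linear classifier over a sparse bag-of-words vector with a small non-linear model over learned, dense embeddings; all names and sizes are made up for illustration.

    import torch
    import torch.nn as nn

    VOCAB, EMB_DIM, CLASSES = 10_000, 100, 2  # illustrative sizes only

    # "Old style": a linear model over a discrete, sparse bag-of-words vector.
    linear_model = nn.Linear(VOCAB, CLASSES)

    # "New style": a non-linear model over distributed, learned representations.
    class EmbeddingMLP(nn.Module):
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(VOCAB, EMB_DIM)      # learned word vectors
            self.hidden = nn.Linear(EMB_DIM, 64)
            self.out = nn.Linear(64, CLASSES)

        def forward(self, token_ids):                    # (batch, seq_len)
            x = self.emb(token_ids).mean(dim=1)          # average the word embeddings
            return self.out(torch.relu(self.hidden(x)))  # non-linearity in between

    bow = torch.zeros(1, VOCAB)                  # sparse, one-hot style input
    bow[0, [12, 512, 2048]] = 1.0
    token_ids = torch.tensor([[12, 512, 2048]])  # dense, index-based input
    print(linear_model(bow).shape)               # torch.Size([1, 2])
    print(EmbeddingMLP()(token_ids).shape)       # torch.Size([1, 2])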

  14. Architectures and model design
  ◮ Linear classifiers, feed-forward networks (MLPs and CNNs) and RNNs.
  ◮ Various instantiations of 1d CNNs:
    ◮ Multi-channel, stacked / hierarchical, graph CNNs
    ◮ Other choices: pooling strategy, window sizes, number of filters, stride, ...
  ◮ Variations beyond simple RNNs:
    ◮ (Bi)LSTM + GRU (gating), attention and stacking.
    ◮ Variations of how RNNs can be used: acceptors, transducers, conditioned generation (encoder-decoder / seq.-to-seq.)
    ◮ Various ways of performing sequence labeling with RNNs
  Various aspects of modeling common to all the neural architectures:
  ◮ dimensionalities, regularization, initialization, handling OOVs, activation functions, batches, loss functions, learning rate, optimizer, ...
  ◮ Embedding pre-training and text pre-processing
  ◮ Backpropagation, vanishing / exploding gradients
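As a purely illustrative example of some of these CNN design choices (window sizes, number of filters, max-pooling over time), here is a hypothetical multi-window 1d CNN text encoder in PyTorch; none of the names or sizes come from the course assignments.

    import torch
    import torch.nn as nn

    class MultiWindowCNN(nn.Module):
        """Toy 1d CNN over word embeddings with several window sizes and max-pooling."""
        def __init__(self, vocab=10_000, emb_dim=100, n_filters=50, windows=(2, 3, 4), classes=2):
            super().__init__()
            self.emb = nn.Embedding(vocab, emb_dim)
            # One convolution per window size; each yields n_filters feature maps.
            self.convs = nn.ModuleList(
                nn.Conv1d(emb_dim, n_filters, kernel_size=w) for w in windows
            )
            self.out = nn.Linear(n_filters * len(windows), classes)

        def forward(self, token_ids):                     # (batch, seq_len)
            x = self.emb(token_ids).transpose(1, 2)       # (batch, emb_dim, seq_len)
            # Max-pool over time for each window size, then concatenate.
            pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
            return self.out(torch.cat(pooled, dim=1))     # (batch, classes)

    model = MultiWindowCNN()
    print(model(torch.randint(0, 10_000, (8, 20))).shape)  # torch.Size([8, 2])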

  17. Representations
  ◮ An important part of the neural ‘revolution’ in NLP: the input representations provided to the learner.
  ◮ Traditional feature vectors: high-dimensional, sparse, categorical and discrete. Based on manually specified feature templates.
  ◮ Word embeddings: low-dimensional, dense, continuous and distributed. Often learned automatically, e.g. as a language model.
  ◮ Main benefit of using embeddings rather than one-hot encodings:
    ◮ Information-sharing between features counteracts data sparseness.
    ◮ Can be computed from unlabelled data.
  ◮ We’ve also considered various tasks for intrinsic evaluation of distributional word vectors.
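The workhorse of intrinsic evaluation (e.g. word-similarity benchmarks) is cosine similarity between embedding vectors. The tiny, hand-made vectors below are purely illustrative; in practice they would come from a pre-trained model.

    import numpy as np

    # Hand-made toy vectors; real ones would be loaded from a pre-trained embedding model.
    vectors = {
        "oslo":   np.array([0.9, 0.1, 0.3]),
        "bergen": np.array([0.8, 0.2, 0.4]),
        "banana": np.array([0.1, 0.9, 0.0]),
    }

    def cosine(u, v):
        """Cosine similarity, the usual scoring function in word-similarity evaluation."""
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine(vectors["oslo"], vectors["bergen"]))  # high: related words
    print(cosine(vectors["oslo"], vectors["banana"]))  # low: unrelated words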

  19. Representation learning
  ◮ With neural network models, our main interest is not always in the final classification outcome itself.
  ◮ Rather, we might be interested in the learned internal representations.
  ◮ Examples?
    ◮ Embeddings in neural models
      ◮ Pre-trained or learned from scratch (with one-hot input)
      ◮ Static (frozen) or dynamic.
    ◮ The pooling layer of a CNN or the final hidden state of an RNN provides a fixed-length representation of an arbitrary-length sequence.
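To illustrate the last point, the hypothetical PyTorch sketch below runs a BiLSTM over sequences of different lengths and keeps only the final hidden states, yielding a fixed-length vector either way; the sizes are arbitrary and not taken from the course.

    import torch
    import torch.nn as nn

    emb = nn.Embedding(10_000, 100)
    bilstm = nn.LSTM(input_size=100, hidden_size=64, batch_first=True, bidirectional=True)

    def encode(token_ids):
        """Map a (batch, seq_len) tensor of token ids to a fixed-length representation."""
        _, (h_n, _) = bilstm(emb(token_ids))       # h_n: (2, batch, 64), one state per direction
        return torch.cat([h_n[0], h_n[1]], dim=1)  # (batch, 128), independent of seq_len

    print(encode(torch.randint(0, 10_000, (4, 7))).shape)   # torch.Size([4, 128])
    print(encode(torch.randint(0, 10_000, (4, 31))).shape)  # torch.Size([4, 128])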

  21. Specialized NN architectures
  ◮ Focus of manual engineering shifted from features to architecture decisions and hyper-parameters.
  ◮ The elimination of feature-engineering is only partially true:
    ◮ Need for specialized NN architectures that extract higher-level features: CNNs and RNNs.
  ◮ Pitch: layers and architectures are like Lego bricks – mix and match.
  ◮ Examples of things you could be asked to reflect on:
    ◮ When would you use each architecture?
    ◮ What are some of the ways we’ve combined the various bricks?
    ◮ When choosing to apply a non-hierarchical CNN, what assumptions are you implicitly making about the nature of your task or data?
    ◮ Why could it make sense to run a CNN over the word-by-word vector outputs of an RNN (e.g. a BiLSTM)?
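One hypothetical answer to the last question, sketched as a ‘Lego’ combination in PyTorch: a BiLSTM first produces contextualised word-by-word vectors, and a 1d CNN with max-pooling then extracts local n-gram features from those outputs. The sizes are illustrative, not the course’s reference solution.

    import torch
    import torch.nn as nn

    class BiLSTMThenCNN(nn.Module):
        """BiLSTM yields contextualised token vectors; a CNN then pools local patterns over them."""
        def __init__(self, vocab=10_000, emb_dim=100, hidden=64, n_filters=50, classes=2):
            super().__init__()
            self.emb = nn.Embedding(vocab, emb_dim)
            self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
            self.conv = nn.Conv1d(2 * hidden, n_filters, kernel_size=3)
            self.out = nn.Linear(n_filters, classes)

        def forward(self, token_ids):                     # (batch, seq_len)
            states, _ = self.bilstm(self.emb(token_ids))  # (batch, seq_len, 2 * hidden)
            feats = torch.relu(self.conv(states.transpose(1, 2)))  # (batch, n_filters, seq_len - 2)
            return self.out(feats.max(dim=2).values)      # max-pool over time, then classify

    model = BiLSTMThenCNN()
    print(model(torch.randint(0, 10_000, (8, 20))).shape)  # torch.Size([8, 2])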

  26. INF5820: Experiment Design
  Methodology
  ◮ Small, elite group of I:ST finishers;
  ◮ Engage everyone from start to finish;
  ◮ an Olympic twist: friendly competition;
  ◮ acquire practical skills and intuitions.
  Main Results
  ◮ We are very happy with the results from the experiment (so far);
  ◮ commonly apply two key metrics in internal evaluation:
    ◮ retention rate: 9 / 9;
