

SLIDE 1

IN4080 – 2020 FALL

NATURAL LANGUAGE PROCESSING

Jan Tore Lønning

SLIDE 2

Lecture 13, 9 Nov.

Neural LMs, Recurrent networks, Sequence labeling, Information Extraction, Named-Entity Recognition, Evaluation

SLIDE 3

Today

 Feedforward neural networks
 Neural Language Models
 Recurrent networks
 Information Extraction
 Named Entity Recognition
 Evaluation

SLIDE 4

Last week

 Feedforward neural networks (partly recap)
 Model
 Training
 Computational graphs
 Neural Language Models
 Recurrent networks
 Information Extraction

SLIDE 5

Neural NLP

 (Multi-layered) neural networks
 Using embeddings as word representations
 Example: Neural language model (k-gram): P(x_j | x_{j-k}, …, x_{j-1})
 Use embeddings for representing the x_j's
 Use a neural network for estimating P(x_j | x_{j-k}, …, x_{j-1})

SLIDE 6

[Figure: the feedforward neural language model, from J&M, 3.ed., 2019]

SLIDE 7

Pretrained embeddings

 The last slide uses pretrained embeddings
 Trained with some method: skip-gram, CBOW, GloVe, …
 On some specific corpus
 Can be downloaded from the web
 Pretrained embeddings can also be the input to other tasks, e.g. text classification
 The task of neural language modeling was also the basis for training the embeddings
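Not from the slides, but for concreteness: pretrained embeddings such as GloVe can be fetched with gensim's downloader (the model name below is one of gensim's standard datasets):

```python
import gensim.downloader as api

# Downloads on first use; 100-dimensional GloVe vectors
# trained on Wikipedia + Gigaword.
wv = api.load("glove-wiki-gigaword-100")

print(wv["language"][:5])           # first dimensions of one embedding
print(wv.most_similar("language"))  # nearest neighbours in the vector space
```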

SLIDE 8

Training the embeddings

 Alternatively we may start with one-hot representations of words and train the embeddings as the first layer in our models (= the way we trained the embeddings)
 If the goal is a task different from language modeling, this may result in embeddings better suited for the specific task.
 We may even use two sets of embeddings for each word – one pretrained and one which is trained during the task.

SLIDE 9

Computational graph

 The computational graph of the k-gram neural LM (here k = 3), with embedding matrix E, hidden-layer weights W, output weights U, and biases b[1], b[2]:

u_1 = E x_1,  u_2 = E x_2,  u_3 = E x_3   (embedding lookup)
u = concat(u_1, u_2, u_3)
z_1 = W u + b[1]
a = g(z_1)   (activation function)
z_2 = U a + b[2]
ŷ = softmax(z_2)

 This picture is if we train the embeddings E. With pretrained embeddings, we instead look up the u_i for each word in a table.
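A minimal numpy sketch of this forward pass (the dimensions and the choice of ReLU for the activation g are assumptions for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

V, d, dh = 10_000, 100, 200      # vocab size, embedding dim, hidden dim (assumed)
rng = np.random.default_rng(0)
E = rng.normal(0, 0.1, (V, d))   # embedding matrix, one row per word
W = rng.normal(0, 0.1, (dh, 3 * d))
b1 = np.zeros(dh)
U = rng.normal(0, 0.1, (V, dh))
b2 = np.zeros(V)

def forward(x1, x2, x3):
    """Distribution over the next word given word indices x1..x3."""
    u = np.concatenate([E[x1], E[x2], E[x3]])  # embedding lookup + concat
    a = np.maximum(0, W @ u + b1)              # hidden layer, ReLU assumed
    return softmax(U @ a + b2)                 # distribution over the vocabulary

y_hat = forward(12, 47, 3)
print(y_hat.shape, y_hat.sum())  # (10000,) 1.0
```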

SLIDE 10

Recurrent networks

SLIDE 11

Today

 Feedforward neural networks
 Recurrent networks
 Model
 Language Model
 Sequence Labeling
 Advanced architecture
 Information Extraction
 Named Entity Recognition
 Evaluation

SLIDE 12

Recurrent neural nets

 Model sequences/temporal phenomena
 A cell may send a signal back to itself – at the next moment in time

[Figure: the network, and its processing unrolled over time; from https://en.wikipedia.org/wiki/Recurrent_neural_network]

SLIDE 13

Forward

 U, V and W are edges with weights (matrices)
 x_1, x_2, …, x_n is the input sequence
 Forward:

1. Calculate h_1 from h_0 and x_1.
2. Calculate y_1 from h_1.
3. Calculate h_t from h_{t-1} and x_t, and y_t from h_t, for t = 1, …, n

From J&M, 3.ed., 2019

SLIDE 14

Forward

 h_t = g(U h_{t-1} + W x_t)
 y_t = f(V h_t)
 g and f are activation functions
 (There are also bias terms which we didn't include in the formulas)

From J&M, 3.ed., 2019
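A minimal sketch of this recurrence in numpy, assuming tanh for g and softmax for f:

```python
import numpy as np

def rnn_forward(X, U, W, V, g=np.tanh):
    """Simple RNN in the slide's notation:
    h_t = g(U h_{t-1} + W x_t),  y_t = f(V h_t) with f = softmax."""
    h = np.zeros(U.shape[0])          # h_0
    outputs = []
    for x in X:                       # X: sequence of input vectors
        h = g(U @ h + W @ x)
        z = V @ h
        e = np.exp(z - z.max())
        outputs.append(e / e.sum())   # softmax output y_t
    return outputs

# Toy dimensions (assumed): input dim 4, hidden dim 3, output dim 5
rng = np.random.default_rng(0)
U, W, V = rng.normal(size=(3, 3)), rng.normal(size=(3, 4)), rng.normal(size=(5, 3))
X = rng.normal(size=(6, 4))           # a sequence of 6 input vectors
ys = rnn_forward(X, U, W, V)
print(len(ys), ys[0].sum())           # 6 outputs, each summing to 1
```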

SLIDE 15

Training

 At each output node:
 Calculate the loss and the δ-term
 Backpropagate the error, e.g.
 the δ-term at h_2 is calculated from the δ-term at h_3 by U and the δ-term at y_2 by V
 Update
 V from the δ-terms at the y_t's, and
 U and W from the δ-terms at the h_t's

From J&M, 3.ed., 2019

SLIDE 16

Remark

 J&M, 3. ed., 2019, sec. 9.1.2 explains this at a high level using vectors and matrices, which is fine
 The formulas, however, are not correct:
 Describing derivatives of matrices and vectors demands a little more care, e.g. one has to transpose matrices
 It is beyond this course to explain how this can be done in detail
 But you should be able to do the actual calculations if you stick to the entries of the vectors and matrices, as we did above (ch. 7).

SLIDE 17

Today

 Feedforward neural networks
 Recurrent networks
 Model
 Language Model
 Sequence Labeling
 Advanced architecture
 Information Extraction
 Named Entity Recognition
 Evaluation

SLIDE 18

RNN Language model

 ŷ = P(x_n | x_1, …, x_{n-1}) = softmax(V h_n)
 In principle:
 unlimited history
 a word depends on all preceding words
 The word x_t is represented by an embedding
 or by a one-hot vector, and the embedding is made by the LM

[Figure: RNN LM unrolled over the input <s> w1 w2; from J&M, 3.ed., 2019]

SLIDE 19

Autoregressive generation

 Generated by probabilities:
 Choose word in accordance with the probability distribution
 Part of more complex models
 Encoder-decoder models
 Translation

From J&M, 3.ed., 2019
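A sketch of the sampling loop, with next_word_dist standing in for any trained LM (the uniform toy model below is just a placeholder):

```python
import numpy as np

def generate(next_word_dist, bos, eos, max_len=20, seed=0):
    """Autoregressive generation: draw each word from the model's
    distribution given the words generated so far."""
    rng = np.random.default_rng(seed)
    words = [bos]
    while len(words) < max_len:
        p = next_word_dist(words)          # P(w | history), a distribution
        w = int(rng.choice(len(p), p=p))
        if w == eos:
            break
        words.append(w)
    return words[1:]                       # drop the <s> symbol

# Placeholder model (assumption): uniform over a 5-word vocabulary
print(generate(lambda history: np.full(5, 0.2), bos=0, eos=4))
```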

SLIDE 20

Today

 Feedforward neural networks
 Recurrent networks
 Model
 Language Model
 Sequence Labeling
 Advanced architecture
 Information Extraction
 Named Entity Recognition
 Evaluation

SLIDE 21

Neural sequence labeling: tagging

 ŷ = P(t_n | x_1, …, x_n) = softmax(V h_n)

From J&M, 3.ed., 2019

SLIDE 22

Sequence labeling

 Actual models for sequence labeling, e.g. tagging, are more complex
 For example, they may take words after the tag into consideration.

SLIDE 23

Today

 Feedforward neural networks
 Recurrent networks
 Model
 Language Model
 Sequence Labeling
 Advanced architecture
 Information Extraction
 Named Entity Recognition
 Evaluation

SLIDE 24

Stacked RNN

 Can yield better results than single layers
 Reason?
 Higher layers of abstraction
 similar to image processing (convolutional nets)

From J&M, 3.ed., 2019

SLIDE 25

Bidirectional RNN

 Example: Tagger
 Considers both preceding and following words

From J&M, 3.ed., 2019

SLIDE 26

LSTM

 Problems for RNNs:
 Keeping track of distant information
 Vanishing gradient
 During backpropagation going backwards through several layers, the gradient approaches 0
 Long Short-Term Memory (LSTM)
 An advanced architecture with additional layers and weights
 We do not consider the details here
 Bi-LSTM (bidirectional LSTM)
 Popular standard architecture in NLP

SLIDE 27

Information extraction

SLIDE 28

Today

 Feedforward neural networks (partly recap)
 Recurrent networks
 Information extraction, IE
 Chunking
 Named Entity Recognition
 Evaluation

SLIDE 29

IE basics

 Bottom-up approach
 Start with unrestricted texts, and do the best you can
 The approach was in particular developed by the Message Understanding Conferences (MUC) in the 1990s
 Select a particular domain and task

Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. (Wikipedia)

SLIDE 30

A typical pipeline

[Figure: the IE pipeline, from NLTK]
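The first stages of the pipeline correspond to the ie_preprocess function from the NLTK book (requires the punkt and POS-tagger models, e.g. via nltk.download('punkt')):

```python
import nltk

def ie_preprocess(document):
    """Raw text -> sentences -> tokens -> POS-tagged tokens."""
    sentences = nltk.sent_tokenize(document)                # sentence segmentation
    sentences = [nltk.word_tokenize(s) for s in sentences]  # tokenization
    return [nltk.pos_tag(s) for s in sentences]             # part-of-speech tagging

print(ie_preprocess("United Airlines said Friday it has increased fares.")[0])
```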

SLIDE 31

Some example systems

 Stanford CoreNLP: http://corenlp.run/
 SpaCy (Python): https://spacy.io/docs/api/
 OpenNLP (Java): https://opennlp.apache.org/docs/
 GATE (Java): https://gate.ac.uk/
 https://cloud.gate.ac.uk/shopfront
 UDPipe: http://ufal.mff.cuni.cz/udpipe
 Online demo: http://lindat.mff.cuni.cz/services/udpipe/
 Collection of tools for NER:
 https://www.clarin.eu/resource-families/tools-named-entity-recognition

SLIDE 32

Today

 Feedforward neural networks (partly recap)
 Recurrent networks
 Information extraction, IE
 Chunking
 Named Entity Recognition
 Evaluation

SLIDE 33

Next steps

 Chunk together words to phrases

SLIDE 34

NP-chunks

 Exactly what is an NP-chunk?
 It is an NP
 But not all NPs are chunks
 Flat structure: no NP-chunk is part of another NP-chunk
 Maximally large
 Opposing restrictions

[ The/DT market/NN ] for/IN [ system-management/NN software/NN ] for/IN [ Digital/NNP ] [ 's/POS hardware/NN ] is/VBZ fragmented/JJ enough/RB that/IN [ a/DT giant/NN ] such/JJ as/IN [ Computer/NNP Associates/NNPS ] should/MD do/VB well/RB there/RB ./.

SLIDE 35

Chunking methods

 Hand-written rules
 Regular expressions
 Supervised machine learning

SLIDE 36

Regular Expression Chunker

 Input: POS-tagged sentences
 Use a regular expression over POS tags to identify NP-chunks
 NLTK example:
 It inserts parentheses

grammar = r"""
NP: {<DT|PP\$>?<JJ>*<NN>}
    {<NNP>+}
"""

SLIDE 37

IOB-tags

 B-NP: First word in an NP
 I-NP: Part of an NP, not the first word
 O: Not part of an NP (phrase)
 Properties:
 One tag per token
 Unambiguous
 Does not insert anything in the text itself
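NLTK can convert between chunk trees and IOB triples; a small illustration with tree2conlltags:

```python
import nltk
from nltk.chunk import tree2conlltags

cp = nltk.RegexpParser(r"NP: {<DT>?<JJ>*<NN>}")
sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"),
            ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
print(tree2conlltags(cp.parse(sentence)))
# [('the', 'DT', 'B-NP'), ('little', 'JJ', 'I-NP'), ('cat', 'NN', 'I-NP'),
#  ('sat', 'VBD', 'O'), ('on', 'IN', 'O'), ('the', 'DT', 'B-NP'), ('mat', 'NN', 'I-NP')]
```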

SLIDE 38

Assigning IOB-tags

 The process can be considered a form of tagging:
 POS-tagging: word to POS-tag
 IOB-tagging: POS-tag to IOB-tag
 But one may use additional features, e.g. the words themselves
 Can use various types of classifiers
 NLTK uses a MaxEnt classifier (= logistic regression, but the implementation is slow)
 We can modify along the lines of mandatory assignment 2, using scikit-learn; see the sketch below
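A sketch of the scikit-learn route, in the spirit of mandatory assignment 2; the feature set and the data format ([(word, pos, iob), …] per sentence, as in CoNLL-2000) are assumptions for illustration:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def token_features(sent, i):
    """Features for token i of a POS-tagged sentence [(word, pos), ...]."""
    word, pos = sent[i]
    return {
        "pos": pos,
        "word": word.lower(),
        "prev_pos": sent[i - 1][1] if i > 0 else "<S>",
        "next_pos": sent[i + 1][1] if i < len(sent) - 1 else "</S>",
    }

def train_iob_tagger(train_sents):
    """train_sents: list of sentences of (word, pos, iob) triples."""
    X = [token_features([(w, p) for w, p, _ in s], i)
         for s in train_sents for i in range(len(s))]
    y = [iob for s in train_sents for _, _, iob in s]
    model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X, y)   # one classification decision per token
    return model
```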

SLIDE 39

[Figure from J&M, 3. ed.]

SLIDE 40

Today

 Feedforward neural networks (partly recap)
 Recurrent networks
 Information extraction, IE
 Chunking
 Named Entity Recognition
 Evaluation

SLIDE 41

Named entities

 Named entity:
 Anything you can refer to by a proper name
 i.e. not all NPs (chunks): high fuel prices
 Maybe a longer NP than just the chunk: Bank of America
 Find the phrases
 Classify them

Citing high fuel prices, [ORG United Airlines] said [TIME Friday] it has increased fares by [MONEY $6] per round trip on flights to some cities also served by lower-cost carriers. [ORG American Airlines], a unit of [ORG AMR Corp.], immediately matched the move, spokesman [PER Tim Wagner] said. [ORG United], a unit of [ORG UAL Corp.], said the increase took effect [TIME Thursday] and applies to most routes where it competes against discount carriers, such as [LOC Chicago] to [LOC Dallas] and [LOC Denver] to [LOC San Francisco].

SLIDE 42

Types of NE

 The set of types varies between different systems
 Which classes are useful depends on the application

SLIDE 43

Ambiguities

SLIDE 44

Gazetteer

 Useful: lists of names, e.g.
 Gazetteer: a list of geographical names
 But does not remove all ambiguities
 cf. example

SLIDE 45

Representation (IOB)

SLIDE 46

Feature-based NER

 Similar to tagging and chunking
 You will need features from several layers
 Features may include
 Words, POS-tags, chunk-tags, graphical properties
 and more (see J&M, 3. ed.)

SLIDE 47

Neural sequence labeling: NER

 We can use IOB-tags
 IOB-tagged training data
 RNN
 Similarly to POS-tagging

From J&M, 3.ed., 2019

SLIDE 48

A more advanced model

 Bi-LSTM
 CRF top-layer
 Optimizes the sequence of tags
 In contrast to optimizing individual tags (as we did in mandatory 2)

From J&M, 3.ed., 2019

SLIDE 49

Today

 Feedforward neural networks (partly recap)
 Recurrent networks
 Information extraction, IE
 Named Entity Recognition
 Evaluation
 in general
 chunkers and NER

SLIDE 50

Evaluation measure: Accuracy

 What does accuracy 0.81 tell us?
 Given a test set of 500 documents:
 The classifier will classify 405 correctly
 And 95 incorrectly
 A good measure given:
 The 2 classes are equally important
 The 2 classes are roughly equally sized
 Examples:
 Woman/man
 Movie reviews: pos/neg
slide-51
SLIDE 51

But

52

 For some tasks, the classes aren't equally important

 Worse to loose an important mail than to receive yet another spam mail

 For some tasks the different classes have different sizes.

slide-52
SLIDE 52

Information retrieval (IR)

53

 Traditional IR, e.g. a library

 Goal: Find all the documents on a particular topic out of 100 000 documents,

 Say there are 5

 The system delivers 10 documents: all irrelevant

 What is the accuracy?  For these tasks, focus on

 The relevant documents  The documents returned by the system

 Forget the

 Irrelevant documents which are not returned

slide-53
SLIDE 53

IR - evaluation

54

slide-54
SLIDE 54

Confusion matrix

 Beware what the rows

and columns are:

 NLTKs

ConfusionMatrix swaps them compared to this table

55

SLIDE 55

Evaluation measures

 Accuracy: (tp+tn)/N
 Precision: P = tp/(tp+fp)
 Recall: R = tp/(tp+fn)
 F-score combines P and R:
 F1 = 2PR/(P+R) = 1/((1/P + 1/R)/2)
 F1 is the harmonic mean of P and R
 General form:
 F = 1/(β/P + (1−β)/R), for some 0 < β < 1

                  Is in C
                  Yes   No
Classifier  Yes    tp    fp
            No     fn    tn
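The same measures in code (a small helper; the counts in the example call are made up):

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from the confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

print(metrics(tp=70, fp=30, fn=20, tn=880))  # (0.95, 0.7, 0.777..., 0.736...)
```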

SLIDE 56

Confusion matrix

 Precision, recall and F-score can be calculated for each class against the rest

SLIDE 57

Today

 Feedforward neural networks (partly recap)
 Recurrent networks
 Information extraction, IE
 Named Entity Recognition
 Evaluation
 in general
 chunkers and NER

SLIDE 58

Evaluation

 Have we found the correct NEs?
 Evaluate precision and recall as for chunking
 For the correctly identified NEs, have we labelled them correctly?

SLIDE 59

Evaluating (IOB-)chunkers

 cp = nltk.RegexpParser("")
 test_sents = conll2000.chunked_sents('test.txt', chunk_types=['NP'])
 IOB Accuracy: 43.4%
 Precision: 0.0%
 Recall: 0.0%
 F-Measure: 0.0%
 What do we evaluate?
 IOB-tags? or
 Whole chunks?
 Yields different results
 For IOB-tags:
 Baseline: majority class O yields > 33%
 Whole chunks:
 Which chunks did we find?
 Harder
 Lower numbers

SLIDE 60

Evaluating (IOB-)chunkers

 cp = nltk.RegexpParser("")
 test_sents = conll2000.chunked_sents('test.txt', chunk_types=['NP'])
 IOB Accuracy: 43.4%
 Precision: 0.0%
 Recall: 0.0%
 F-Measure: 0.0%

>>> cp = nltk.RegexpParser(r"NP: {<[CDJNP].*>+}")

 IOB Accuracy: 87.7%
 Precision: 70.6%
 Recall: 67.8%
 F-Measure: 69.2%
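A runnable version of these two experiments, assuming the corpus has been fetched with nltk.download('conll2000'); the printed scores are the ones shown above:

```python
import nltk
from nltk.corpus import conll2000

test_sents = conll2000.chunked_sents('test.txt', chunk_types=['NP'])

# Baseline: a chunker with no rules tags every token O
cp = nltk.RegexpParser("")
print(cp.evaluate(test_sents))   # IOB Accuracy 43.4%, P/R/F 0.0%

# One rule: any run of tokens whose tags start with C, D, J, N or P
# (numbers, determiners, adjectives, nouns, possessives) is an NP-chunk
cp = nltk.RegexpParser(r"NP: {<[CDJNP].*>+}")
print(cp.evaluate(test_sents))   # IOB Accuracy 87.7%, F-Measure 69.2%
```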


SLIDE 62

Next week

 Relation extraction (sec. 17.2)
 Encoder-Decoder Models (sec. 10.1-10.2)