Chatbot models, NLU & ASR Pierre Lison IN4080 : Natural - PDF document

Chatbot models, NLU & ASR
Pierre Lison
IN4080: Natural Language Processing (Fall 2020)
12.10.2020
www.nr.no

Plan for today
• Obligatory assignment
• Chatbot models (cont'd)
• Natural Language Understanding (NLU) for dialogue systems
• Speech recognition

Oblig 3
Three parts:
1. Chatbot trained on movie and TV subtitles
2. Silence detector in audio files
3. (Simulated) talking elevator

Oblig 3
• Deadline: November 6
• Concrete delivery: Jupyter notebook
• Requires a version of Python with additional (Anaconda) packages
• See the obligatory assignment for details
• Computing the utterance embeddings in Part 1 requires some patience (or enough computational resources)

Chatbot models: recap
• Rule-based models:
  if (some pattern matches X on user input) then respond Y to user
• IR models using cosine similarities between vectors:
  $$\arg\max_{c \in C} \frac{q \cdot c}{\|q\|\,\|c\|}$$
  where C is the set of utterances in the dialogue corpus (in a vector representation) and q is the user input (also in vector form)

Dual encoders
Another type of IR-based chatbot
• We compute here the dot product between the user input (called the "context") and a possible response

[Figure: the context "Where are you?" is passed through the utterance encoder (context) to give $u_c$, and the response "Over there!" through the utterance encoder (response) to give $u_r$. The dot product $u_c \cdot u_r$ is a score expressing how good/appropriate the response is for the given context.]
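The IR approach from the recap above can be sketched in a few lines. This is a minimal illustration, not the assignment's method: the bag-of-words vectors, the toy corpus and the function names are all assumptions, and replying with the utterance that followed the best match is one common design choice rather than something stated on the slide.

```python
import math
from collections import Counter

def vectorize(utterance):
    """Bag-of-words count vector, stored sparsely as a Counter (an assumption)."""
    return Counter(utterance.lower().split())

def cosine(q, c):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(q[w] * c[w] for w in q if w in c)
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_c = math.sqrt(sum(v * v for v in c.values()))
    if norm_q == 0 or norm_c == 0:
        return 0.0
    return dot / (norm_q * norm_c)

def retrieve(query, corpus):
    """Return the index of the corpus utterance most similar to the query."""
    q = vectorize(query)
    scores = [cosine(q, vectorize(c)) for c in corpus]
    return max(range(len(corpus)), key=scores.__getitem__)

# Toy dialogue corpus (hypothetical): each utterance is followed by its reply.
corpus = ["where are you", "over there", "how are you today", "fine thanks"]

best = retrieve("where are you now", corpus)
# Reply with the utterance that followed the best match, if there is one.
reply = corpus[best + 1] if best + 1 < len(corpus) else None
print(corpus[best], "->", reply)
```

With real data, the count vectors would typically be replaced by TF-IDF vectors or utterance embeddings; the argmax over cosine similarities stays the same.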

Dual encoders
• The encoders are typically deep neural networks, such as LSTMs or transformers
• The two encoders often rely on a shared neural network, apart from a last transformation step that is specific to the context or the response

[Figure: same dual-encoder architecture, with $u_c$ computed from the context "Where are you?", $u_r$ from the response "Over there!", and the score $u_c \cdot u_r$.]

Dual encoders
• We can add a sigmoid function to compress the score into the [0,1] range: $\sigma(u_c \cdot u_r)$
• Dual encoders are trained with both positive and negative examples:
  • Positive: actual consecutive pairs of utterances observed in the corpus → output=1
  • Negative: random pairs of utterances → output=0
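To make the training signal concrete, here is a small sketch of the score and loss for one positive and one negative pair. The vectors are made-up encoder outputs, and binary cross-entropy is an assumption: the slides only say that positive pairs should output 1 and negative pairs 0, without naming the loss.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def score(u_c, u_r):
    """Sigmoid of the dot product between context and response vectors."""
    return sigmoid(dot(u_c, u_r))

def binary_cross_entropy(pred, label):
    """Loss for one (context, response) pair; label is 1 (positive) or 0 (negative)."""
    return -(label * math.log(pred) + (1 - label) * math.log(1 - pred))

# Hypothetical encoder outputs (in practice produced by an LSTM or transformer).
u_context = [0.5, 1.0, -0.5]
u_actual_response = [1.0, 2.0, 0.0]    # observed right after the context: label 1
u_random_response = [-1.0, 0.5, 2.0]   # sampled at random from the corpus: label 0

pos = score(u_context, u_actual_response)
neg = score(u_context, u_random_response)
loss = binary_cross_entropy(pos, 1) + binary_cross_entropy(neg, 0)
print(round(pos, 3), round(neg, 3), round(loss, 3))
```

Training would adjust the encoder weights to lower this loss, pushing scores of observed pairs towards 1 and scores of random pairs towards 0.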

Dual encoders
• At prediction time, we search for the response with the maximum score $\sigma(u_c \cdot u_r)$
• We can precompute the vectors $u_r$ for all possible responses in the corpus
• Given a new user input, we have to:
  • Compute the context embedding $u_c$
  • Compute its dot product with all responses
  • Search for the response with the max score

Seq2seq models
• Sequence-to-sequence models generate a response token-by-token
  • Akin to machine translation
• Advantage: can generate «creative» responses not observed in the corpus
• Two steps:
  • First «encode» the input with e.g. an LSTM
  • Then «decode» the output token-by-token
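Returning to the dual-encoder prediction steps listed above (precompute the response vectors, encode the context, take the maximum dot product), a minimal sketch could look as follows. The embeddings are invented numbers, and `encode_context` is a hypothetical stand-in for a trained context encoder.

```python
# Hypothetical precomputed response embeddings u_r, one per corpus response.
response_vectors = {
    "Over there!": [0.9, 0.1, 0.0],
    "Fine, thanks.": [0.0, 0.8, 0.3],
    "At ten o'clock.": [0.1, 0.0, 0.9],
}

def encode_context(utterance):
    """Stand-in for the trained context encoder (a toy lookup table)."""
    toy_embeddings = {
        "Where are you?": [1.0, 0.0, 0.1],
        "How are you?": [0.0, 1.0, 0.0],
    }
    return toy_embeddings[utterance]

def best_response(utterance):
    """Score every precomputed response against the context and return the best."""
    u_c = encode_context(utterance)
    scores = {
        response: sum(a * b for a, b in zip(u_c, u_r))
        for response, u_r in response_vectors.items()
    }
    return max(scores, key=scores.get)

print(best_response("Where are you?"))
```

Since the sigmoid is monotonic, ranking by the raw dot product gives the same best response as ranking by the sigmoid score, so the sigmoid is skipped here.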

Seq2seq models
[Figure: seq2seq network reading the user input and producing the chatbot response. Image borrowed from "Deep Learning for Chatbots: Part 1"]
NB: state-of-the-art seq2seq models use an attention mechanism (not shown here) above the recurrent layer

Seq2seq models
• Interesting models for dialogue research
• But:
  • Difficult to «control» (hard to know in advance what the system may generate)
  • Lack of diversity in the responses (often stick to generic answers: «I don't know» etc.)
  • Getting a seq2seq model that works reasonably well takes a lot of time (and tons of data)
[Li, Jiwei, et al. (2015), "A diversity-promoting objective function for neural conversation models", ACL]

Example from Meena (Google)
2.6 billion parameters, trained on 341 GB of text (public domain social media conversations)
https://ai.googleblog.com/2020/01/towards-conversational-agent-that-can.html

Taking stock
• Rule-based chatbots
  Pro: Fine-grained control on interaction
  Con: Difficult to build, scale and maintain
• Corpus-based chatbots
  • IR approaches
    Pro: Easy to build, well-formed responses
    Con: Can only repeat existing responses in corpus
  • Seq2seq
    Pro: Powerful model, can generate anything
    Con: Difficult to train, hard to control, needs lots of data
Corpus-based approaches seen so far are often limited to chit-chat dialogues (for which we can easily crawl data)

NLU-based chatbots
[Figure: pipeline from Language Understanding to Generation / response selection]
• Can we build data-driven chatbots for task-specific interactions (not just chit-chat)?
  • "Standard" case for commercial chatbots
  • Typically: no available task-specific data

NLU-based chatbots
[Figure: pipeline from Language Understanding to Generation / response selection]
• Solution: NLU as a classification task
  • From a set of (predefined) possible intents
• Response selection generally handcrafted
  • Chatbot owners want to have full control over what the chatbot actually says

Intent recognition
Goal: map user utterance to its most likely intent
• Input: sequence (of characters or tokens) + possibly preceding context
• Output: intent (what the user tries to accomplish)

Example:
  "When is the recycling station open?"
  → Intent recognition → Intent = GetInfoOpenHours(RecyclingStation)
  → Response selection → "The recycling station is open on weekdays from 10 to 18"
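The intent recognition and response selection pipeline above can be illustrated with a toy classifier. This sketch swaps in a bag-of-words Naive Bayes model with add-one smoothing instead of a neural network, purely to stay self-contained. The training utterances, the `GetInfoLocation` intent and its response text are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical annotated training utterances (utterance, intent).
TRAINING_DATA = [
    ("when is the recycling station open", "GetInfoOpenHours"),
    ("what are the opening hours of the recycling station", "GetInfoOpenHours"),
    ("where is the recycling station", "GetInfoLocation"),
    ("what is the address of the recycling station", "GetInfoLocation"),
]

# Handcrafted responses, one per intent (the second one is a placeholder).
RESPONSES = {
    "GetInfoOpenHours": "The recycling station is open on weekdays from 10 to 18",
    "GetInfoLocation": "(handwritten answer about the location)",
}

def train(data):
    """Count words per intent and intents overall."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    for utterance, intent in data:
        intent_counts[intent] += 1
        word_counts[intent].update(utterance.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, intent_counts, vocab

def classify(utterance, model):
    """Return the intent with the highest Naive Bayes log-probability."""
    word_counts, intent_counts, vocab = model
    total = sum(intent_counts.values())
    best_intent, best_logp = None, -math.inf
    for intent, n in intent_counts.items():
        logp = math.log(n / total)
        denom = sum(word_counts[intent].values()) + len(vocab)
        for w in utterance.lower().split():
            if w in vocab:  # words never seen in training are ignored
                logp += math.log((word_counts[intent][w] + 1) / denom)
        if logp > best_logp:
            best_intent, best_logp = intent, logp
    return best_intent

model = train(TRAINING_DATA)
intent = classify("when does the recycling station open", model)
print(intent, "->", RESPONSES[intent])
```

The point of the split is the one made on the slide: the classifier is learned from data, while the response text stays fully under the chatbot owner's control.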

Intent recognition
• Many possible machine learning models
  • Convolutional, recurrent, transformers, etc.
[Figure: the utterance "When is ... open?" is mapped to embeddings, passed through an utterance-level network (often an LSTM), then through a softmax layer giving a distribution over intents]
• Must collect training data: user utterances (manually) annotated with intents
  • Often done by "chatbot trainers" in industry

Small amounts of data?
1. Use transfer learning to exploit models trained on related domains
[Figure: a source model is trained on the source domain (with large amounts of training data; Data_s → Output_s). The source model is then combined with a target-specific model for the target domain (with small amounts of training data; Data_t → Output_t)]

Small amounts of data?
1. Use transfer learning to exploit models trained on related domains
2. Use data augmentation to generate new labelled utterances from existing ones
   "When is the recycling station open?" → GetInfoOpenHours(RecyclingStation)
   Replace with synonyms:
   "At what time is the recycling station open?" → GetInfoOpenHours(RecyclingStation)
3. Collect raw (unlabelled) utterances and use weak supervision to label those
[see e.g. Mallinar et al. (2019), "Bootstrapping conversational agents with weak supervision", IAAI]
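The synonym-replacement idea in point 2 can be sketched as below. The synonym table and the `augment` function are hypothetical; a real system would draw alternatives from a thesaurus or paraphrase resource and would check that the new utterances still fit the intent.

```python
# Hypothetical synonym table: word -> alternative words or phrases.
SYNONYMS = {
    "when": ["at what time"],
    "open": ["accessible"],
}

def augment(utterance, intent):
    """Generate new (utterance, intent) pairs by replacing one word at a time."""
    tokens = utterance.split()
    new_examples = []
    for i, token in enumerate(tokens):
        for alternative in SYNONYMS.get(token.lower(), []):
            new_tokens = tokens[:i] + [alternative] + tokens[i + 1:]
            # The label is copied unchanged from the original utterance.
            new_examples.append((" ".join(new_tokens), intent))
    return new_examples

examples = augment("when is the recycling station open",
                   "GetInfoOpenHours(RecyclingStation)")
for text, label in examples:
    print(text, "->", label)
```

Each generated utterance inherits the label of its source, which is what makes augmentation cheap compared with manual annotation.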

Slot filling
• In addition to intents, we also sometimes need to detect specific entities ("slots"), such as mentions of places or times
  «Show me morning flights from Boston to San Francisco on Tuesday»
• Slots are domain-specific
  • And so are the ontologies listing all possible values for each slot

Slot filling
Can be framed as a sequence labelling task (as in NER), using e.g. BIO schemes
[Illustration from D. Jurafsky]
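As an illustration of the BIO scheme mentioned above, the following sketch turns a tagged token sequence into slot values. The tags are written by hand here (a trained sequence labeller would predict them), and the slot names are hypothetical.

```python
def bio_to_slots(tokens, tags):
    """Group tokens into (slot_label, text) pairs according to their BIO tags."""
    slots = []
    current_label, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new slot begins; close the previous one if it is still open.
            if current_label is not None:
                slots.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            # "O" tag, or an I- tag that does not continue the open slot.
            if current_label is not None:
                slots.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_label is not None:
        slots.append((current_label, " ".join(current_tokens)))
    return slots

tokens = "Show me morning flights from Boston to San Francisco on Tuesday".split()
# Hand-written tags with hypothetical slot names.
tags = ["O", "O", "B-DEPART_TIME", "O", "O", "B-ORIGIN", "O",
        "B-DESTINATION", "I-DESTINATION", "O", "B-DEPART_DATE"]

print(bio_to_slots(tokens, tags))
```

The B- prefix marks the first token of a slot and I- a continuation, which is how the two-token value "San Francisco" ends up as a single slot.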

Response selection
[Figure: User utterance → NLU → Intent → Response selection → System response]
• Given an intent, how to create a response?
• In commercial systems, system responses are typically written by hand
  • Possibly in templated form, e.g. "{Place} is open from {Start-time} to {Close-time}"
• But data-driven generation methods also exist
[see e.g. Garbacea & Mei (2020), "Neural Language Generation: Formulation, Methods, and Evaluation"]
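Templated response selection can be sketched with Python's built-in string formatting. The template table, the fallback message and the function name are assumptions. The placeholder names are written with underscores (`start_time` rather than `Start-time`) only to keep them valid Python identifiers.

```python
# Handwritten templates, one per intent (hypothetical table).
TEMPLATES = {
    "GetInfoOpenHours": "{place} is open from {start_time} to {close_time}",
}

def select_response(intent, slots):
    """Fill the template for the given intent with the detected slot values."""
    template = TEMPLATES.get(intent)
    if template is None:
        return "Sorry, I did not understand that."  # hypothetical fallback
    return template.format(**slots)

print(select_response(
    "GetInfoOpenHours",
    {"place": "The recycling station", "start_time": "10", "close_time": "18"},
))
# prints: The recycling station is open from 10 to 18
```

A missing slot value raises a `KeyError` here; a deployed system would more likely ask the user a follow-up question instead.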

Spoken dialogue systems
[Figure: Speech recognition → transcription hypotheses → Language Understanding → Generation / response selection → Text → Speech synthesis]
Spoken interfaces add a layer of complexity:
• Need to handle uncertainties, ASR errors etc.
• Speech communicates more than just words (intonation, emotions in voice, etc.)
• Need to handle turn-taking
A difficult problem!

The speech chain
[Figure from Denes and Pinson (1993), «The speech chain»]

Speech production
• Sounds are variations in air pressure
• How are they produced?
  • An air supply: the lungs (we usually speak by breathing out)
  • A sound source setting the air in motion (e.g. vibrating) in ways relevant to speech production: the larynx, in which the vocal folds are located
  • A set of 3 filters modulating the sound: the pharynx, the oral tract (teeth, tongue, palate, lips, etc.) & the nasal tract
