Conversational Agents
Human-AI Interaction Luigi De Russis
Academic Year 2019/2020
Background: Voice and Speech
§ Human voice is an efficient input modality: it allows people to give commands to a computer quickly, on their own terms
§ Fully understanding natural language remains a dream (for now)
§ Voice and speech interaction has become mainstream in recent years
§ Such applications simulate a natural language interaction to different extents
have to learn and remember
Human Computer Interaction
§ From a computer perspective, voice-based interaction is mainly:
§ Applications may leverage one or both
added
§ Examples:
§ Spoken interaction is successful in some cases…
§ … and it encounters some issues, as well
also for privacy issues
extreme customization
1. Initiation
2. Knowing what to say
3. Recognition errors (speech-to-text)
4. Correcting errors
5. Mapping to possible actions
parts
6. Feedback and dialogs
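Several of the challenges listed above (recognition errors, correcting errors, mapping to possible actions, feedback) meet at one decision point: what the system does with a single speech-to-text result. The following is a minimal sketch of that decision, not a description of any real assistant. The thresholds, the `ACTIONS` table, and the wording of the prompts are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical thresholds; a real system would tune these on its own data.
CONFIRM_BELOW = 0.80
REJECT_BELOW = 0.45

# Hypothetical mapping from recognized commands to application actions.
ACTIONS = {
    "turn on the lights": "lights_on",
    "turn off the lights": "lights_off",
}


@dataclass
class Recognition:
    transcript: str
    confidence: float  # 0.0 to 1.0, as a speech-to-text engine might report it


def next_step(rec: Recognition) -> str:
    """Decide which feedback to give for one recognition result."""
    action = ACTIONS.get(rec.transcript.strip().lower())
    # Unknown command or very low confidence: ask again and hint at what to say.
    if action is None or rec.confidence < REJECT_BELOW:
        return "reprompt: Sorry, I did not get that. You can say 'turn on the lights'."
    # Medium confidence: let the user correct a possible recognition error.
    if rec.confidence < CONFIRM_BELOW:
        return f"confirm: Did you say '{rec.transcript}'?"
    # High confidence: map the command to its action.
    return f"execute: {action}"
```

For example, `next_step(Recognition("Turn on the lights", 0.6))` asks for confirmation instead of acting, which gives the user a cheap way to correct an error before anything happens.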
… and their User Interfaces
§ Voice User Interfaces (VUIs) allow the user to interact with a system through voice or speech commands
§ Voice User Interfaces or Conversational User Interfaces?
§ Contemporary VUIs can be divided into:
§ Most contemporary voice interaction happens on screen-first devices
§ Impressive speech recognition and language processing features
§ Main limitations
§ Users can start a task via voice, but subsequent steps require them to use the touchscreen
§ Visual affordances are missing (or poor)
(e.g., it does not show that people can edit a text message before sending it)
§ Tasks with some support for multi-step voice input exhibit a screen design:
GUI version
available to the user
§ No visual display at all
§ Quite good accuracy in speech recognition
§ They are quite verbose in their answers
§ You have to know what to say!
§ Some operations are "challenging", e.g.,
§ Some actions are neither allowed nor expected, e.g.,
§ Voice-only devices… with a screen
§ A system that primarily accepts user input via voice commands and may augment audio output with visual information
devices
§ Typically, the display is a touch screen
… and their UI
§ Voice interaction between people and devices is analogous to learning a foreign language
§ Easily learnt through immersion
§ Successful examples on voice-first devices:
§ Beware: people often have unrealistic expectations
§ To design a VUI, you first need to have a clear picture of
needs are
§ Then, you can write some sample dialogs and sketch a diagram of the conversation flow
§ Focus on the spoken conversation before considering any visual element
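A conversation flow diagram can also be written down as a small state machine, which makes it easy to replay sample dialogs before any visual element exists. The sketch below is one possible encoding, assuming a hypothetical weather assistant; the state names, event names, and prompts are invented for illustration.

```python
# Hypothetical flow: state -> (prompt spoken on entering the state, transitions).
FLOW = {
    "start": ("Which city?", {"city_given": "ask_day"}),
    "ask_day": ("Today or tomorrow?", {"day_given": "answer"}),
    "answer": ("Here is the forecast.", {}),
}


def walk(events):
    """Follow the flow from 'start' and return the prompts spoken along the way."""
    state = "start"
    prompts = [FLOW[state][0]]
    for event in events:
        transitions = FLOW[state][1]
        if event not in transitions:
            # Unexpected input: stay in the same state and repeat its prompt.
            prompts.append("Sorry, I did not understand. " + FLOW[state][0])
            continue
        state = transitions[event]
        prompts.append(FLOW[state][0])
    return prompts
```

Calling `walk(["city_given", "day_given"])` replays the happy path, while `walk(["day_given"])` shows how the flow reacts when the user answers a question that has not been asked yet.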
§ Controlling: specifying a goal together with the means of achieving it
§ Delegating: asking for an outcome without specifying how to achieve it
§ Guiding: discussing the means of achieving a goal
§ Collaborating: both participants jointly deciding on the goals
Currently adopted by contemporary VUIs
§ By Microsoft Research
us/research/project/guidelines-for-human-ai-interaction/
§ Saleema Amershi et al. Guidelines for Human-AI
90605.3300233
Weather Web App: let's "chat" about the weather
§ Natural language understanding platforms
§ To design and integrate voice user interfaces into mobile apps, web applications, devices, …
§ Focus on simplicity and abstraction
§ Two main families:
1. Extension of a product
platform, typically integrated in "suites" of cloud services
§ "Create a Private by Design voice assistant that runs on the edge"
§ France-based startup, founded in 2013, acquired by Sonos in 2019
§ Runs on the edge, not in the cloud
§ Free for makers and for building prototypes
§ 6 fully supported languages, mostly uses Node.js
§ "Build natural and rich conversational experiences"
§ California-based startup, founded in 2010, acquired by Google in 2016
§ Free for basic usage
§ One-click integration with several services
§ Support for multiple languages
§ REST API and various (official) SDKs
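To give an idea of what talking to the REST API involves, the sketch below only builds the URL and JSON body of a text query and does not send anything over the network. It assumes the Dialogflow v2 `detectIntent` endpoint shape; the helper name `build_detect_intent_request` and the project and session identifiers are hypothetical, and a real call would also need an OAuth access token. Check the current API reference before relying on it.

```python
import json
from urllib.parse import quote


def build_detect_intent_request(project_id, session_id, text, language_code="en"):
    """Return (url, json_body) for a text query; nothing is sent over the network."""
    project = quote(project_id, safe="")
    session = quote(session_id, safe="")
    # Assumed endpoint shape for the v2 REST API.
    url = (
        "https://dialogflow.googleapis.com/v2/"
        f"projects/{project}/agent/sessions/{session}:detectIntent"
    )
    body = {"queryInput": {"text": {"text": text, "languageCode": language_code}}}
    return url, json.dumps(body)


# Hypothetical identifiers, for illustration only.
url, payload = build_detect_intent_request("my-project", "abc123", "weather in Turin")
```

The session identifier is chosen by the client: reusing the same one across requests lets the agent treat them as turns of a single conversation.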
§ Each application (an agent) will have different entities and intents
§ Intent
the agent
§ Typically, an intent is composed of:
§ Different out-of-the-box intents can be enabled on Dialogflow
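To make the notion of intent more concrete, here is a toy matcher that picks the intent whose training phrases share the most words with what the user said. This is not how Dialogflow matches intents (the platform uses trained language models); it is a simplified illustration. The `Intent` fields, the example intents, and the threshold are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    name: str
    training_phrases: list
    response: str


# Hypothetical intents for a weather agent.
INTENTS = [
    Intent("weather.ask", ["what is the weather", "will it rain today"],
           "Let me check the forecast."),
    Intent("greeting", ["hello", "hi there"], "Hi! Ask me about the weather."),
]


def _overlap(a, b):
    """Jaccard similarity between the word sets of two phrases."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def match_intent(utterance, intents=INTENTS, threshold=0.3):
    """Return the best-scoring intent, or None when no score reaches the threshold."""
    best, best_score = None, 0.0
    for intent in intents:
        score = max(_overlap(utterance, p) for p in intent.training_phrases)
        if score > best_score:
            best, best_score = intent, score
    return best if best_score >= threshold else None
```

Returning `None` plays the role of a fallback: the agent found nothing that matches well enough and should ask the user to rephrase.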
§ Entities
§ Many pre-existing entities are available on the platform
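An entity can be thought of as a typed parameter extracted from the user's sentence. The sketch below assumes an entity is a set of canonical values, each with a list of synonyms, and looks for them as whole words. The `CITY_ENTITY` table is hypothetical, and real platforms use far more robust extraction than a regular expression.

```python
import re

# Hypothetical "city" entity: canonical value -> accepted synonyms.
CITY_ENTITY = {
    "Turin": ["turin", "torino"],
    "Milan": ["milan", "milano"],
}


def extract_entity(utterance, entity=CITY_ENTITY):
    """Return the canonical value of the first entry with a synonym in the text, else None."""
    text = utterance.lower()
    for value, synonyms in entity.items():
        for synonym in synonyms:
            # \b keeps "milan" from matching inside a longer word.
            if re.search(rf"\b{re.escape(synonym)}\b", text):
                return value
    return None
```

With this table, "What's the weather in Torino?" and "weather in Turin" both resolve to the same value, so the rest of the application only has to deal with one spelling.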
§ Base implementation:
§ HTML+CSS+JS and Python
§ Uses the Dialogflow v2 library
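The course's actual code is not shown here, so the following is only a guess at one small piece of the Python side: reducing a detect-intent response to the few fields a web page needs. It assumes the JSON field names of the v2 REST response (`queryResult`, `intent.displayName`, `parameters`, `fulfillmentText`); `summarize_result` is a hypothetical helper and `SAMPLE` is a hand-written example, not real output from the service.

```python
def summarize_result(response):
    """Extract intent name, parameters, and reply text from a response dict."""
    result = response.get("queryResult", {})
    return {
        "intent": result.get("intent", {}).get("displayName"),
        "parameters": result.get("parameters", {}),
        "reply": result.get("fulfillmentText", ""),
    }


# Hand-written sample shaped like a v2 response; values are invented.
SAMPLE = {
    "queryResult": {
        "queryText": "weather in Turin",
        "parameters": {"city": "Turin"},
        "fulfillmentText": "Let me check the weather in Turin.",
        "intent": {"displayName": "weather.ask"},
    }
}

summary = summarize_result(SAMPLE)
```

Using `.get` with defaults keeps the function from raising when the agent returns no matched intent or an empty reply, which the front end can then handle as a fallback.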
§ Multimodal Interaction – slides and video lectures:
§ Voice User Interfaces – slides and video lecture:
§ Voice User Interfaces on the Web – slides and video lectures:
§ These slides are distributed under a Creative Commons license “Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)”
§ You are free to:
§ Under the following terms:
you or your use.
under the same license as the original.
§ https://creativecommons.org/licenses/by-nc-sa/4.0/