
SLIDE 1

The SmartKom Multimodal Corpus - Data Collection and End-to-End Evaluation

Nicole Beringer Institut für Phonetik und Sprachliche Kommunikation LMU München

SLIDE 2

Where can the IPSK (LMU) be found within the project?

Data Collection, Evaluation, Annotation

Feedback about user reactions

Modules

User behaviour?
Implementation of problem-solving strategies
Improved prototype

SLIDE 3

Responsibilities of the IPSK−group in SmartKom

Overview:

Data Collection
  WOZ design
  WOZ experiments
  Some useful results

End-to-End Evaluation
  Problems with multimodality
  Evaluation framework

Annotation
  Transliteration of the audio data
  Prosodic annotation
  Annotation of the gestures
  Annotation of facial expression
  Annotation of user states

SLIDE 4

Responsibility Network

Data Collection
Evaluation
User modelling
WOZ system, studio recordings
Annotation of audio, gesture, emotion
Distribution
Modules
Providing data for recognition

SLIDE 5

Data Collection

Creating and publishing data for:

the training of recognizers (speech, prosodic features, gesture, facial expression, emotion)
dialogue creation
generation of information (speech)

Research:

user modelling
evaluation (usability and technical evaluation)

Software

SLIDE 6

Wizard-of-Oz

Training of recognizers
User modelling

Different users
Instruction: "Market Research"
2 recordings (4.5 minutes each)
Recording of audio (different characteristics)
Recording of video (face, profile, display, gestures)
Interview

The BIG problem: how to persuade users of a non-existent system just by simulation?

SLIDE 7

Realistic prototype created by partners & LMU
Influence on development
Playback of atmosphere
Creation of the studio
Reliability
Quality of speech output
Experiment design
WOZ system with technical defects
Evocation of behaviour (trial and error, gestures, emotion)
Instruction
Provoking of different behaviour (new gestures, anger, new input facilities)
Design of the display
Few associations to existing systems
Dialogue with intelligent machine, no ordinary input facilities

SLIDE 8

Reliability: the deception should not be noticed

Good preparation
Intensive training of the wizards
System makes mistakes

Perception of the SmartKom system:
The system is a machine
The system is a person
The system is something in between

"That's a telephone box, I wouldn't expect to talk to a human. I do not have illusions!"

SLIDE 9

Only a few associations to existing systems allowed

Simulation of a personal assistant
Existing dialogue partner
Assistant has "personality"
Assistant leads through the dialogue and makes proposals

[Figure: "Polite Users". Bar chart (axis labelled "Percent", 0 to 1) of the share of subjects who used polite expressions, greetings, thanks, and "sorry".]

SLIDE 10

[Figure: "positive aspects". Bar chart of the number of mentions (N, axis 0 to 20) per category: verbal interaction with the assistant works well; individual applications or pages; positive rating of the persona; fast; a good idea overall; clearly laid out; practical; fun to use; multimodality; other.]

[Figure: "negative aspects". Bar chart of the number of mentions (N, axis 0 to 20) per category: criticism of the speech output; too slow; too limited in scope; too little support; criticism of the speech input; not good overall; street noise is disturbing; criticism of the persona; gesture input not good; display.]

- Verbal interaction!
- Multimodality is only noticed by a few users
- Too slow
- Too few possibilities
- More help needed
- Persona not often criticized!

SLIDE 11

What characterizes a comfortable system?

[Figure: chart of answer categories: ease of use; speech recognition; hardware/equipment; display layout; speed; range of services; multimodality; synthesis; other.]

SLIDE 12

SmartKom WOZ−Recordings and Processing of the Data at the LMU

WOZ recordings:
Coordinates of graphics tablet
DV video, front view
DV video, side view
Beamer output
SIVIT stream
11 audio streams

Processing (US = user state):
Cutting
Transliteration (TRL)
Preparation of gesture label stream
Holistic US labeling (USH)
Prosodic US labeling (TRP)
Gesture labeling (GES)
US labeling of facial expression (USM)

Delivery:
Files to DFKI server
Recording of DVD

SLIDE 13

Annotation of emotions

System is simulated
Subjects are recorded (audio and video)
4.5 min interaction, e.g. "find a movie for this evening"
Emotions are partly provoked by the wizards

[Figure: subjects during a recording, front view and side view.]

SLIDE 14

Orthographical Annotation
Marking of repetitions, hesitations, noise, speech disfluencies etc.

w001_pkw_003_SMA: <Ger"ausch> @1hier @1sehen <:<#> Sie:> <:<#> eine:> "Ubersicht "uber das Programm der ~Heidelberger Kinos .
w001_pkd_004_AAA: mhm [PA] [B3 cont] . <Ger"ausch> oh<Z> [B2] , ~F<Z>ight+Club<ROT> <!1 Flight−Club> [NA] [B2] , ~Das+f"unfte+Element<Z><ROT> [NA] [B2] , ~Drum%<ROT> , ~Jakob+der+L"ugner<ROT> [NA] [B3 cont] . <A> ah<OOT> [PA] [B2] , ich w"urde gerne [NA] ~Aimee+_ <"ah> _und+Jaguar [PA] sehen [B3 fall] . <Ger"ausch> wo [PA] wird das gespielt<Z> [NA] [B3 rise] ? <PP>
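To make the markup in the sample turn easier to read, here is a minimal Python sketch that separates a turn into its turn id, angle-bracket markers, square-bracket prosodic labels, and plain words. The token classes are inferred from the sample above only, not from the official transliteration conventions, and the names `TAG`, `PROSODY`, and `split_turn` are hypothetical.

```python
import re

# Assumption: token classes are inferred from the sample turn only,
# not from the official transliteration conventions.
TAG = re.compile(r"<[^<>]*>")          # e.g. <Z>, <Ger"ausch>, <"ah>
PROSODY = re.compile(r"\[[^\[\]]*\]")  # e.g. [PA], [NA], [B3 cont]


def split_turn(line):
    """Split 'turn_id: text' into turn id, tags, prosodic labels and words."""
    turn_id, _, text = line.partition(": ")
    tags = TAG.findall(text)
    prosody = PROSODY.findall(text)
    # Tags are deleted in place so that 'oh<Z>' becomes 'oh'; labels become gaps.
    plain = PROSODY.sub(" ", TAG.sub("", text))
    # Keep only tokens that contain at least one letter (drops '.', ',' and '?').
    words = [tok for tok in plain.split() if any(ch.isalpha() for ch in tok)]
    return turn_id, tags, prosody, words


# Opening of the second sample turn, copied from the slide.
sample = 'w001_pkd_004_AAA: mhm [PA] [B3 cont] . <Ger"ausch> oh<Z> [B2] ,'
turn_id, tags, prosody, words = split_turn(sample)
print(turn_id, tags, prosody, words)
```

The sketch does not handle nested markers such as `<:<#> Sie:>` in the first turn; those would need a real parser built from the transliteration conventions.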

SLIDE 15

Annotation of gestures in 3 categories:

Interactional gestures: pointing (long & short), free gestures
Supporting gestures: reading, searching, counting
Residual gestures: emotional gestures, not identifiable

[Figure: example gestures labelled I-Point (short -) and R-Emotional (+ cubus).]

SLIDE 16

Annotation of emotions

3 steps:

Prosodic annotation: audio only, formal labelling system
Holistic labelling: facial expression, audio, context
Facial expression: labelling without audio

Holistic labelling includes context information, which is not relevant for the facial expression recognizer. Therefore we included a "facial expression only" labelling step (no audio). For the analysis of the prosody, the speech had to be labelled. The functional approach did not seem to work with speech, so for the prosody we adopted a formal coding step that was used in Verbmobil (Fischer, 1999). The holistic and the formal step for the speech can be combined to obtain ecologically valid data.

SLIDE 17

Annotation of emotions

Categories for the prosody step:

Pauses between phrases
Pauses between words
Pauses between syllables
Irregular length of syllables
Emphasized words
Strongly emphasized words
Clearly articulated words
Hyperarticulated words
Words overlapped by laughing

Labeling with some defined subjective categories:

"anger/irritation"
"joy/gratification (being successful)"
"helplessness"
"pondering/reflecting"
"surprise"
"neutral"
"unidentifiable episode"

SLIDE 18

Conclusion (WOZ)

WOZ: realistic data for man-machine interaction

Training of recognizers
Observation of user behaviour

The WOZ technique is time-consuming and expensive
BUT: results from user observations and questionnaires can influence the development of the system early on

SLIDE 19

Website: http://www.smartkom.org/
http://www.phonetik.uni-muenchen.de/Forschung/Publications/index.html

Corpus Overview: Schiel, F. et al. (2002): Integration of multi-modal data and annotations into a simple extendable form: the extension of the BAS Partitur Format. LREC Conference.

Steininger, S. et al. (2002b): User-State Labeling Procedures For The Multimodal Data Collection Of SmartKom. LREC Conference.

Beringer, N. (2001): Evoking Gestures in SmartKom - Design of the Graphical User Interface. Gesture Workshop 2001, London, UK. To appear in: Springer "Gesture Workshop 2001, London".

Labeling of gestures: Steininger, S. et al. (2001): Labeling of Gestures in SmartKom - The Coding System. Gesture Workshop 2001, London.

Transliteration: Oppermann, D. et al.: Transliterationskonventionen.

SLIDE 20

General Criteria of Dialogue System Evaluation (End−to−End Evaluation)

"The performance of the evaluation is very often driven by the characteristics of the system that has to be judged" [Andenfilger-97].

An evaluation framework must abstract from the system itself and from different dialogue strategies.

Combination of the developers' and the users' needs as well as the constraints on the evaluation of multimodal systems in general.

Combination of objective and subjective evaluation criteria

SLIDE 21

PARADISE: Paradigm for Dialogue Systems Evaluation

Comparison of Dialogue Strategies
Direct Comparison with other Dialogue Systems
Comparison of usability and objectively measurable results
Generalization and normalization over measures

Standardization of the evaluation of successful transactions via Attribute Value Matrices
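The slide only names Attribute Value Matrices. In the PARADISE framework as published by Walker et al. (1997), task success is derived from a confusion matrix over attribute values using the kappa coefficient, κ = (P(A) − P(E)) / (1 − P(E)). The following Python sketch illustrates that computation under this assumption; the function name `kappa` and the example matrix are hypothetical and not taken from the SmartKom data.

```python
def kappa(confusion):
    """Kappa coefficient for a square confusion matrix (list of rows).

    Rows are the values actually exchanged in the dialogue, columns the
    values in the reference key (an assumption about orientation; the
    agreement term is the same either way).
    """
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    # P(A): proportion of cases on the diagonal (dialogue agrees with the key).
    p_agree = sum(confusion[i][i] for i in range(size)) / total
    # P(E): chance agreement, estimated from the column sums.
    col_sums = [sum(row[j] for row in confusion) for j in range(size)]
    p_chance = sum((t / total) ** 2 for t in col_sums)
    # Note: undefined (division by zero) if all mass sits in one column.
    return (p_agree - p_chance) / (1 - p_chance)


# Hypothetical 2x2 matrix, for illustration only.
example = [[20, 5], [10, 15]]
print(kappa(example))
```

A value of 1 means perfect agreement with the key; 0 means agreement no better than chance.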

SLIDE 22

Evaluation framework for unimodal Dialogue Systems − Problems

Usability
What about multimodal systems?
Separation of user satisfaction and dialogue complexity
Unique scales

Objective measures
Multimodal costs
Higher dimensional AVMs
There exist no static definitions of the "keys" necessary to compute an AVM

SLIDE 23

Problems with Spoken Dialogue Evaluation Frameworks in Multimodal Dialogue Environments

How to score multimodal inputs or outputs?
How to score the use of multimodal technologies?
How to weight the several multimodal components of recognition systems?
How to evaluate different scenarios?

SLIDE 24

Problems with Spoken Dialogue Evaluation Frameworks in Multimodal Dialogue Environments

How to define an optimal dialogue?
How to evaluate uncompleted tasks?
How to deal with bad performance due to user incooperativity?

SLIDE 25

Usability

Multimodal evaluation criteria
Questionnaire adapted to cost functions
User Satisfaction is compiled separately
Standardization of questions
User Satisfaction ranges from -3 to +3

SLIDE 26

Objective Evaluation Measures

Optimal dialogues depend on the system processing
Length of the dialogue is defined by the user
Weighting of quality and quantity measures and task success by correlation between user satisfaction and objective measure
Definition of multimodal costs
Definition of a bipolar function τ for the compilation of task success via biunique information clusters
Integration of uncompleted tasks: τ(j) = -1: task failure

SLIDE 27

Definition of Weights

Quality and quantity measures | Usability question
Transaction success; Task complexity | The task was easy to solve
Misunderstanding of input; Offtalk | SmartKom has understood my input
Misunderstanding of output | SmartKom can easily be understood
Semantical, syntactical correctness; Incremental compatibility | SmartKom has answered properly in most cases
Mean system response time; Mean user response time | The speed of the system was acceptable for each situation
Timeout | I always knew what to say
Acc. gesture recognition | The gestural input was successful
Acc. ASR | The speech input was successful

SLIDE 28

Definition of Weights

Quality and quantity measures | Usability question
Dialogue complexity | SmartKom worked as assumed; SmartKom reacted quickly to my input; SmartKom is easy to handle
Percentage of appropriate/inappropriate system directive diagnostic utterances | SmartKom offered an adequate amount of high quality information
Percentage of explicit recovery answers | SmartKom is easy to handle
Repetitions; No. of ambiguities; Diagnostic error messages; Rejections | SmartKom needs input only once to successfully complete a task
Timeout; Help-analyzer | SmartKom offers adequate help

SLIDE 29

Definition of Weights

Quality and quantity measures | Usability question
Output complexity (display) | The display is clearly designed
Mean elapsed time; Task completion time; Dialogue elapsed time | SmartKom reacted fast to my input
Duration of speech input; Duration of ASR | SmartKom reacted fast to speech input
Duration of gestural input; Duration of gesture recognition | SmartKom reacted fast to gestural input
BargeIn; Cancel | SmartKom allows interrupts
Dialogue complexity | Was the task difficult?
Gesture turns | Input via graphical display
Ways of interaction; Display turns | Output via graphical display

SLIDE 30

Definition of Weights

Quality and quantity measures | Usability question
Speech input | Speech input
Speech synthesis (synchronicity) | Speech output
N-way communication; Ways of interaction; Error rate of questions; Input complexity | Possibility to interact in a quasi-human way with SmartKom
Recognition/duration of facial expression; Prosodic features | SmartKom reacted to my emotional state
Synchronicity; Graphical output (turns) | How do you score the competence of the agent?
Cooperativity | Were actions of the persona natural?
Gestural input | Gestural input

SLIDE 31

Information Clusters

Extract different superordinate concepts depending on the task at hand.

Example: EPG

"City of Angels" (assumption: unique day, time, channel) => one piece of information needed
Movie today at 8 p.m. on SAT1 (channel) => three pieces of information needed
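The EPG example can be sketched in Python as follows. This is only an illustration under stated assumptions: an information cluster is represented as a plain dictionary of attribute-value pairs, and a task counts as successful when every requested item was understood exactly as given. Neither the representation nor the names `needed_information` and `tau` come from the slides; only the +1/-1 coding of task success and failure does.

```python
def needed_information(request):
    """Number of pieces of information the request specifies."""
    return len(request)


def tau(request, understood):
    """Bipolar task success: +1 if every requested item was understood
    as given, otherwise -1 (task failure, including uncompleted tasks)."""
    ok = all(understood.get(key) == value for key, value in request.items())
    return 1 if ok else -1


# The two EPG requests from the slide, in a hypothetical encoding.
by_title = {"title": "City of Angels"}
by_slot = {"day": "today", "time": "20:00", "channel": "SAT1"}

print(needed_information(by_title), needed_information(by_slot))
print(tau(by_slot, {"day": "today", "time": "20:00"}))  # channel missing
```

Because the cluster is built per request, the same scoring works whether the user names one unique title or a combination of day, time, and channel.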

SLIDE 32

User Incooperativity

"Smartakus, do the dishes!"

Other frameworks: task failure attributed to the system

Only dialogues with cooperative users are evaluated using empirical methods

Only dialogues which terminate with finished tasks are evaluated.

SLIDE 33

How to score multimodal inputs or outputs?

Multimodal cost functions "no. of multiple input" and "ways of interaction"

Weighting of recognition scores via defined user satisfaction score

SLIDE 34

How to evaluate different scenarios?

Intra-scenarios: normalization over tasks

Inter-scenarios: three systems

Possibility to compute the performance over the three scenarios after all evaluation periods

SLIDE 35

j = biunique information cluster
τ(j) = +1: task success
τ(j) = -1: task failure
c_i = cost function i

\[ \text{Performance} = \alpha \cdot \bar{\tau} - \sum_{i=1}^{n} \omega_i \cdot N(c_i) \]

α = correlation between User Satisfaction and the mean value of τ
ω_i = correlation between User Satisfaction and normalized costs

\[ N(x) = \frac{x - \bar{x}}{\sigma_x} \]
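A minimal Python sketch of the performance formula on this slide, applied literally: α and each ω_i are Pearson correlations with user satisfaction, and each cost is z-score normalized. Several details are assumptions rather than statements from the slides: scores are computed per dialogue, σ is taken as the population standard deviation, and the function names and the sample numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def normalize(values):
    """N(x) = (x - mean) / sigma; assumes sigma > 0 (population std)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def pearson(xs, ys):
    """Pearson correlation; assumes neither sequence is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def performance(satisfaction, mean_tau, costs):
    """Per-dialogue score: alpha * mean_tau - sum_i omega_i * N(c_i).

    satisfaction: one user satisfaction value per dialogue
    mean_tau:     mean task success (tau) per dialogue
    costs:        list of cost measures, each with one value per dialogue
    """
    alpha = pearson(satisfaction, mean_tau)
    norm_costs = [normalize(c) for c in costs]
    # Each weight keeps whatever sign the correlation yields.
    omegas = [pearson(satisfaction, nc) for nc in norm_costs]
    return [
        alpha * t - sum(w * nc[d] for w, nc in zip(omegas, norm_costs))
        for d, t in enumerate(mean_tau)
    ]


# Hypothetical data for four dialogues and one cost (e.g. elapsed time).
scores = performance([3, 1, -1, -3], [1.0, 0.5, -0.5, -1.0], [[10, 20, 30, 40]])
print(scores)
```

Because the weights are estimated from the data, the scores are only comparable across dialogues evaluated with the same set of weights.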

SLIDE 36

Conclusion (Evaluation)

PROMISE offers an overall evaluation result integrating cost functions and user satisfaction

PROMISE can deal with multimodality

PROMISE is independent of task definitions (static or dynamic tasks)

SLIDE 37

Beringer et al. (2002): End-to-End Evaluation of Multimodal Dialogue Systems - can we Transfer Established Methods? Proc. of the Third International Conference on Language Resources and Evaluation. Las Palmas, Gran Canaria, Spain.

Beringer et al. (2002): PROMISE: A Procedure for Multimodal Interactive System Evaluation. Proceedings of the Workshop 'Multimodal Resources and Multimodal Systems Evaluation' 2002, Las Palmas, Gran Canaria, Spain, pp. 77-80.

Beringer et al. (2002): How to relate User Satisfaction and System Performance in Multimodal Dialogue Situations - a Graphical Approach. Proceedings of the International CLASS Workshop on Natural,