Speech-driven gesture generation - PowerPoint PPT Presentation


SLIDE 1

Speech Encoder

SLIDE 2

Importance of body language

SLIDE 3

Why data-driven?

✔ Scalability ✔ Adaptability ✔ Variability

Cassell et al. "BEAT: the Behavior Expression Animation Toolkit." In SIGGRAPH, 2001.
Yoon et al. "Robots Learn Social Skills: End-to-End Learning of Co-Speech Gesture Generation for Humanoid Robots." In ICRA, 2019.

SLIDE 4

Speech-driven gesture generation

SLIDE 5

Related work

Sadoughi et al. "Speech-driven animation with meaningful behaviors." Speech Communication 110, 2019.

• Hybrid between data-driven and rule-based approaches
• Based on a PGM with an additional hidden node for a constraint
• Evaluates 3 hand gestures and 2 head motions
• Applies smoothing afterwards

SLIDE 6

Related work

Hasegawa et al. "Evaluation of Speech-to-Gesture Generation Using Bi-Directional LSTM Network." In IVA '18. ACM, 2018.

• From speech to 3D motion
• Deep-learning based approach
• Applies heavy smoothing as post-processing

SLIDE 7

Contributions

1. A novel speech-driven method for non-verbal behavior generation that can be applied to any embodiment.
2. Evaluation of the importance of representation, both for the motion and for the speech.

SLIDE 8

General framework

SLIDE 9

Our baseline model

Hasegawa, Dai, Naoshi Kaneko, Shinichi Shirakawa, Hiroshi Sakuta, and Kazuhiko Sumi. "Evaluation of Speech-to-Gesture Generation Using Bi-Directional LSTM Network." In Proceedings of the 18th International Conference on Intelligent Virtual Agents, pp. 79-86. ACM, 2018.
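The slides do not spell out the network details; a minimal sketch of this kind of bidirectional-LSTM speech-to-motion baseline, assuming MFCC-style speech features as input and per-frame 3D joint coordinates as output (layer sizes and dimensions are illustrative, not taken from the paper), might look like this:

```python
# Minimal sketch (not the authors' code): a Bi-LSTM that maps a window of
# speech features to a window of motion frames, in the spirit of Hasegawa et al.
import torch
import torch.nn as nn

class SpeechToMotionBiLSTM(nn.Module):
    def __init__(self, speech_dim=26, hidden_dim=256, motion_dim=64):
        # speech_dim / motion_dim are illustrative placeholders, not the paper's values
        super().__init__()
        self.bilstm = nn.LSTM(speech_dim, hidden_dim, num_layers=2,
                              batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, motion_dim)

    def forward(self, speech):            # speech: (batch, frames, speech_dim)
        h, _ = self.bilstm(speech)        # h: (batch, frames, 2 * hidden_dim)
        return self.out(h)                # per-frame pose: (batch, frames, motion_dim)

model = SpeechToMotionBiLSTM()
speech_window = torch.randn(1, 40, 26)    # e.g. 40 frames of speech features
predicted_motion = model(speech_window)   # -> (1, 40, 64)
```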

SLIDE 10

Proposed method

Step 1
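The scraped text does not say what Step 1 consists of. Given the stated contribution about the importance of the motion representation, a plausible reading is that Step 1 learns a compact motion representation, e.g. with an autoencoder over pose frames; the sketch below is an assumption along those lines, not the authors' architecture:

```python
import torch
import torch.nn as nn

class MotionAutoencoder(nn.Module):
    # Hypothetical: compress a 384-dim pose frame (the original motion dimensionality
    # mentioned later in the deck) into a small latent representation and back.
    def __init__(self, pose_dim=384, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(pose_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, pose_dim))

    def forward(self, pose):
        z = self.encoder(pose)        # learned motion representation
        return self.decoder(z), z     # reconstruction + latent code

model = MotionAutoencoder()
pose = torch.randn(8, 384)
reconstruction, latent = model(pose)  # trained with MSE(reconstruction, pose)
```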

SLIDE 11

Proposed method

Step 2

SLIDE 12

Proposed method

Step 3
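Again, the step itself is not described in the scraped text. If Step 2 trains a speech encoder to predict the learned motion representation, a natural Step 3 is to chain that speech encoder with the motion decoder so that poses are generated directly from speech. A sketch under that assumption:

```python
# Hypothetical composition of the two assumed components above:
# speech features -> latent motion representation -> pose frames.
import torch
import torch.nn as nn

speech_encoder = nn.Sequential(            # placeholder for a trained speech-to-latent network
    nn.Linear(26, 128), nn.ReLU(), nn.Linear(128, 32))
motion_decoder = nn.Sequential(            # placeholder for the trained autoencoder's decoder
    nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 384))

def generate_motion(speech_frames):        # speech_frames: (frames, 26)
    latent = speech_encoder(speech_frames)
    return motion_decoder(latent)          # (frames, 384) pose vectors

poses = generate_motion(torch.randn(40, 26))
```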

SLIDE 13

Proposed method

SLIDE 14

Experimental results

SLIDE 15

Dataset used

Takeuchi et al. "Creating a gesture-speech dataset for speech-based automatic gesture generation." In HCII, 2017.

• Japanese language
• 171 min of speech and 3D motion
• Speech in mp3 format
• Motion in bvh format
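A minimal sketch of loading such data, assuming librosa for the mp3 audio and MFCC-style features (the deck does not state which speech features or frame rate were used, and parsing the BVH motion is left as a stub):

```python
# Illustrative data loading, not the authors' pipeline.
import librosa

def load_speech_features(mp3_path, n_mfcc=26, fps=20):
    # Load the mp3 and compute MFCCs roughly aligned to an assumed motion frame rate.
    audio, sr = librosa.load(mp3_path, sr=16000)
    hop = sr // fps                                    # one feature vector per motion frame
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc, hop_length=hop)
    return mfcc.T                                      # (frames, n_mfcc)

# The .bvh motion files (joint values per frame) would be read with a BVH parser
# of choice; that part is omitted here.
features = load_speech_features("recording.mp3")       # placeholder path
```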

SLIDE 16

Dimensionality choice

Original dim. was 384
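One common way to choose a reduced dimensionality for 384-dim pose vectors is to check how much variance (or reconstruction quality) is retained at each candidate size; the PCA-based sketch below only illustrates that kind of analysis and is not the procedure used in the deck:

```python
# Illustrative dimensionality analysis on 384-dim motion frames.
import numpy as np
from sklearn.decomposition import PCA

poses = np.random.randn(10000, 384)        # stand-in for real pose vectors
pca = PCA().fit(poses)
cumulative = np.cumsum(pca.explained_variance_ratio_)
for d in (8, 16, 32, 64, 128):
    print(f"{d} dims keep {cumulative[d - 1]:.1%} of the variance")
```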

SLIDE 17

Input feature analysis

SLIDE 18

Histogram for wrist joints
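A histogram over wrist-joint motion is a common way to compare the statistics of generated and ground-truth gestures; a small sketch, assuming per-frame 3D wrist positions and an assumed frame rate, might look like this:

```python
# Illustrative: histogram of wrist speeds for generated vs. ground-truth motion.
import numpy as np

def wrist_speed_histogram(wrist_positions, fps=20, bins=30):
    # wrist_positions: (frames, 3) array of 3D positions for one wrist
    velocities = np.diff(wrist_positions, axis=0) * fps     # units per second
    speeds = np.linalg.norm(velocities, axis=1)
    return np.histogram(speeds, bins=bins, density=True)

generated = np.cumsum(np.random.randn(1000, 3) * 0.5, axis=0)  # stand-in trajectory
hist, bin_edges = wrist_speed_histogram(generated)
```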

SLIDE 19

User study measures

All were evaluated on a Likert scale from 1 to 7

SLIDE 20

User study results

19 participants with 10 videos x 9 questions x 2 conditions = 180 ratings each
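The asterisk in the scraped text presumably marks statistical significance in the results figure; the deck does not say which test was used. For paired Likert ratings of two conditions, a non-parametric test such as the Wilcoxon signed-rank test is a typical choice, e.g.:

```python
# Illustrative significance test for paired Likert ratings (the test choice is an assumption).
import numpy as np
from scipy.stats import wilcoxon

baseline_scores = np.array([4, 5, 3, 4, 6, 4, 5, 3, 4, 5])   # made-up ratings, 1-7 scale
proposed_scores = np.array([5, 6, 4, 6, 6, 5, 6, 4, 5, 6])
statistic, p_value = wilcoxon(baseline_scores, proposed_scores)
print(f"Wilcoxon signed-rank: W={statistic}, p={p_value:.3f}")
```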

SLIDE 21

Visual comparison

No smoothing was applied

SLIDE 22

Visual comparison

No smoothing was applied

SLIDE 23

Conclusion

SLIDE 24

The team

SLIDE 25

SLIDE 26

Questions?

SLIDE 27

Related work

Chiu, Chung-Cheng, Louis-Philippe Morency, and Stacy Marsella. "Predicting Co-verbal Gestures: A Deep and Temporal Modeling Approach." In International Conference on Intelligent Virtual Agents. Springer, Cham, 2015.

• DNN + CRF = DCNF
• Virtual character
• Discrete set of motions

SLIDE 28

Human-robot communication

Speech / Body language (figure labels)

https://www.ald.softbankrobotics.com