

Feb 07, 2023


SLIDE 1

Speech Encoder

SLIDE 2

Importance of body language

SLIDE 3

Why data-driven?

Cassell et al. "BEAT: the Behavior Expression Animation Toolkit." In SIGGRAPH, 2001.

Yoon et al. "Robots Learn Social Skills: End-to-End Learning of Co-Speech Gesture Generation for Humanoid Robots." In ICRA, 2019.

✔ Scalability ✔ Adaptability ✔ Variability

SLIDE 4

?

Speech-driven gesture generation

SLIDE 5

Sadoughi et al. "Speech-driven animation with meaningful behaviors." Speech Communication 110, 2019.

• Hybrid between data-driven and rule-based approaches
• Based on a probabilistic graphical model (PGM) with an additional hidden node for a constraint
• Evaluated on 3 hand gestures and 2 head motions
• Smoothing applied afterwards

Related work

SLIDE 6

Hasegawa et al. "Evaluation of Speech-to-Gesture Generation Using Bi-Directional LSTM Network." In IVA’18. ACM. 2018.

• From speech to 3D motion
• Deep-learning-based approach
• Applied a lot of smoothing as post-processing

Related work
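Both related-work slides mention smoothing as a post-processing step. The slides do not say which filter was used, so the following is only a minimal sketch that assumes a centred moving average over time; the function name `moving_average` and the `window` parameter are hypothetical and not taken from the cited papers.

```python
from typing import List


def moving_average(frames: List[List[float]], window: int = 5) -> List[List[float]]:
    """Smooth each coordinate over time with a centred moving average.

    `frames` is a sequence of poses, one list of coordinates per frame.
    The window is truncated at the start and end of the sequence.
    This filter is an assumption for illustration only.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(frames)
    smoothed = []
    for t in range(n):
        lo = max(0, t - half)        # first frame inside the window
        hi = min(n, t + half + 1)    # one past the last frame inside the window
        span = frames[lo:hi]
        dims = len(frames[t])
        # Average every coordinate over the frames in the window.
        smoothed.append([sum(f[d] for f in span) / len(span) for d in range(dims)])
    return smoothed


if __name__ == "__main__":
    # Toy one-dimensional "motion" with a single spike in the middle frame.
    print(moving_average([[0.0], [3.0], [0.0]], window=3))
```

A larger window removes more jitter but also flattens fast, expressive movements, which is one reason to compare generated motion before any smoothing is applied.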

SLIDE 7

1. A novel speech-driven method for non-verbal behavior generation that can be applied to any embodiment.

2. Evaluation of the importance of representation both for the motion and for the speech.

Contributions

SLIDE 8

General framework

SLIDE 9

Our baseline model

Hasegawa, Dai, Naoshi Kaneko, Shinichi Shirakawa, Hiroshi Sakuta, and Kazuhiko Sumi. "Evaluation of Speech-to-Gesture Generation Using Bi-Directional LSTM Network." In Proceedings of the 18th International Conference on Intelligent Virtual Agents. ACM, pp. 79-86. 2018.

SLIDE 10

Step 1

Proposed method

SLIDE 11

Step 2

Proposed method

SLIDE 12

Step 3

Proposed method

SLIDE 13

Proposed method

SLIDE 14

Experimental results

SLIDE 15

Takeuchi et al. "Creating a gesture-speech dataset for speech-based automatic gesture generation." In HCII. 2017.

• Japanese language
• 171 min of speech and 3D motion
• Speech in mp3 format
• Motion in bvh format

Dataset used

SLIDE 16

Original dimensionality was 384

Dimensionality choice

SLIDE 17

Input feature analysis

SLIDE 18

Histogram for wrist joints

SLIDE 19

User study measures

All measures were rated on a Likert scale from 1 to 7

SLIDE 20

19 participants, each giving 10 videos x 9 questions x 2 conditions = 180 ratings

User study results
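The rating count on this slide can be checked with a few lines of arithmetic. The per-participant figure comes from the slide; the study-wide total is derived here and assumes every participant completed every rating, which the slide does not state.

```python
# Figures taken from the slide.
participants = 19
videos = 10
questions = 9
conditions = 2

# Ratings given by a single participant (matches the slide's 180).
ratings_per_participant = videos * questions * conditions

# Study-wide total, assuming no missing responses (an assumption).
total_ratings = participants * ratings_per_participant

print(ratings_per_participant)  # 180
print(total_ratings)            # 3420
```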


SLIDE 21

Visual comparison

No smoothing was applied

SLIDE 22

Visual comparison

No smoothing was applied

SLIDE 23

Conclusion

SLIDE 24


The team

SLIDE 25

SLIDE 26

Questions?

SLIDE 27

Chung-Cheng Chiu, Louis-Philippe Morency, and Stacy Marsella. Predicting co-verbal gestures: a deep and temporal modeling approach. International Conference on Intelligent Virtual Agents. Springer, Cham, 2015.

• DNN + CRF = DCNF
• Virtual character
• Discrete set of motions

Related work


SLIDE 28


https://www.ald.softbankrobotics.com

[Figure labels: Speech, Body language]

Human-robot communication