Cross-lingual topic prediction for speech using translations



slide-1
SLIDE 1

Cross-lingual topic prediction for speech using translations

Sameer Bansal Herman Kamper Adam Lopez Sharon Goldwater

slide-2
SLIDE 2

Automated speech-to-text

Translation, Information Retrieval

slide-3
SLIDE 3

Current systems

English audio: ?

downstream task: translation, IR

slide-4
SLIDE 4

Current systems

English audio -> Automatic Speech Recognition -> English text: “Where is the nearest hospital?”

downstream task: translation, IR

slide-5
SLIDE 5

~100 languages supported by Google Translate ...


slide-6
SLIDE 6

Unwritten languages

  • ~3,000 languages with no writing system
  • Traditional ASR-based systems will not work!

Mboshi audio -> ASR -> Mboshi text

Aikuma: Bird et al. 2014; LIG-Aikuma: Blachon et al. 2016; Godard et al. 2018

slide-7
SLIDE 7

Unwritten languages

Mboshi audio -> ASR -> Mboshi text
Mboshi audio paired with French text

Efforts to collect speech and translations using mobile apps

Aikuma: Bird et al. 2014; LIG-Aikuma: Blachon et al. 2016; Godard et al. 2018

slide-8
SLIDE 8

Unwritten languages

Mboshi audio -> ASR -> Mboshi text
Mboshi audio paired with French text

Build cross-lingual speech-to-text systems (ST)

Aikuma: Bird et al. 2014; LIG-Aikuma: Blachon et al. 2016; Godard et al. 2018

slide-9
SLIDE 9

Why speech input?


https://tnw.to/ieUbS

“For many Indians, searching by voice rather than text is their first choice.”

slide-10
SLIDE 10

Radio content analysis in Uganda


https://bit.ly/2mL4pf6

55% households: radio main source of information

Quinn and Hidalgo-Sanchis, 2017

slide-11
SLIDE 11

Radio content analysis in Uganda


https://bit.ly/2mL4pf6

Collect data from public radio conversations

Quinn and Hidalgo-Sanchis, 2017

slide-12
SLIDE 12

Radio content analysis in Uganda

“Insights about the spread of infectious diseases, small-scale disasters, etc.”

Quinn and Hidalgo-Sanchis, 2017

Topics: healthcare, disasters

https://bit.ly/2mL4pf6

slide-13
SLIDE 13

Radio content analysis in Uganda

Luganda audio -> Topic?

Topic prediction task

https://radio.unglobalpulse.net/uganda
https://bit.ly/2mL4pf6

slide-14
SLIDE 14

Radio content analysis in Uganda

Luganda audio -> ASR -> “Eddwaliro lyaffe temuli yadde …” (“… they have built health centers”) -> Topic?

Speech to text system

https://radio.unglobalpulse.net/uganda
https://bit.ly/2mL4pf6

slide-15
SLIDE 15

Radio content analysis in Uganda

Luganda audio -> ASR -> “Eddwaliro lyaffe temuli yadde …” (“… they have built health centers”) -> Topic prediction -> healthcare

Keywords indicate topic information

https://radio.unglobalpulse.net/uganda
https://bit.ly/2mL4pf6

slide-16
SLIDE 16

Radio content analysis in Uganda

Luganda audio -> ASR -> “Eddwaliro lyaffe temuli yadde …” (“… they have built health centers”) -> Topic prediction -> healthcare

Availability of ASR!

https://radio.unglobalpulse.net/uganda
https://bit.ly/2mL4pf6

slide-17
SLIDE 17

Radio content analysis in Uganda

Luganda audio -> ASR -> “Eddwaliro lyaffe temuli yadde …” (“… they have built health centers”) -> Topic prediction -> healthcare

Can we predict topics using ST?

https://radio.unglobalpulse.net/uganda
https://bit.ly/2mL4pf6

slide-19
SLIDE 19

Radio content analysis in Uganda

Luganda audio -> ASR -> “Eddwaliro lyaffe temuli yadde …” (“… they have built health centers”) -> Topic prediction -> healthcare

UN study dataset not available!

https://radio.unglobalpulse.net/uganda
https://bit.ly/2mL4pf6

slide-20
SLIDE 20

Our work: topic prediction for Spanish speech

Spanish audio -> ST -> English text prediction -> Topic prediction -> topic?

ST trained in simulated low-resource settings

slide-21
SLIDE 21

ST performance in low-resource settings

Spanish-English              BLEU
160 hours - Weiss et al.     46

*for comparison text-to-text = 58

Good performance if trained on 100+ hours

slide-22
SLIDE 22

ST performance in low-resource settings

Spanish-English                   BLEU
160 hours - Weiss et al.          46
20 hours - Bansal et al. 2019     19

*for comparison text-to-text = 58

Mediocre performance in low-resource settings

slide-23
SLIDE 23

ST performance in low-resource settings

Spanish-English                   BLEU
160 hours - Weiss et al.          46
20 hours - Bansal et al. 2019     19

*for comparison text-to-text = 58

“Good applications for crummy machine translation” Church & Hovy, 1993

slide-24
SLIDE 24

Sample translations

Spanish: soy católica pero no en realidad casi no voy a la iglesia
English: i am catholic but actually i hardly go to church

slide-25
SLIDE 25

Sample translations

Spanish: soy católica pero no en realidad casi no voy a la iglesia
English: i am catholic but actually i hardly go to church
20h: i’m catholics but reality i don’t go to the church

“Crummy” translation

slide-26
SLIDE 26

Sample translations

Spanish: soy católica pero no en realidad casi no voy a la iglesia
English: i am catholic but actually i hardly go to church
20h: i’m catholics but reality i don’t go to the church
topic: religion

Keywords can be useful for topic prediction
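A minimal sketch of the keyword point above, in Python. It compares the content words of the gold English translation with those of the 20h ST output from this slide. The stopword list and the helper name content_words are assumptions made for this illustration only and are not taken from the talk.

```python
import re

# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = {"i", "am", "i'm", "but", "to", "the", "go", "don't"}


def content_words(sentence):
    """Lowercase, normalise curly apostrophes, tokenise, and drop stopwords."""
    normalised = sentence.lower().replace("’", "'")
    tokens = re.findall(r"[a-z']+", normalised)
    return {token for token in tokens if token not in STOPWORDS}


gold = "i am catholic but actually i hardly go to church"
st_20h = "i’m catholics but reality i don’t go to the church"

# Content words that survive in the low-resource ST output.
shared = content_words(gold) & content_words(st_20h)
print(shared)
```

With exact string matching only "church" is shared; "catholic" and "catholics" would also match if a stemmer were applied, which this sketch leaves out.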

slide-27
SLIDE 27

Our work: topic prediction for Spanish speech

Spanish audio -> ST -> English text prediction -> Topic prediction -> topic?

ST trained in simulated low-resource settings

slide-28
SLIDE 28

Our work: topic prediction for Spanish speech

Spanish audio -> ST -> English text prediction -> Topic prediction -> topic?

Gold topic labels not available!

slide-29
SLIDE 29

Learning topic labels

Spanish audio -> Gold topic label?

slide-30
SLIDE 30

Learning topic labels

Spanish audio
Gold translation: I like to listen to jazz
Gold topic label?

slide-31
SLIDE 31

Learning topic labels

Spanish audio
Gold translation: I like to listen to jazz
Gold topic label?

Use gold translations to infer topic labels

slide-32
SLIDE 32

Learning topic labels

Spanish audio
Gold translation: I like to listen to jazz
Silver topic label

Use gold translations to infer topic labels

slide-33
SLIDE 33

Learning topic labels

Training set: Spanish audio

Gold human translation:
  • I listen to english music
  • I am catholic
  • hello how are you

Gold human translation -> Topic model

slide-34
SLIDE 34

Learning topic labels

Training set: Spanish audio

Gold human translation:
  • I listen to english music
  • I am catholic
  • hello how are you

Gold human translation -> Topic model

Topic          Terms
small-talk     hello, fine, name
music          dance, listen, music
religion       god, bible, believe
...            ...

slide-35
SLIDE 35

Learning topic labels

Spanish audio

Gold human translation:
  • I listen to english music
  • I am catholic
  • hello how are you

Gold human translation -> Topic model

Topic          Terms
small-talk     hello, fine, name
music          dance, listen, music
religion       god, bible, believe
...            ...

Number of topics set to 10

slide-36
SLIDE 36

Learning topic labels

Spanish audio

Gold human translation:
  • I listen to english music
  • I am catholic
  • hello how are you

Gold human translation -> Topic model

Topic          Terms
small-talk     hello, fine, name
music          dance, listen, music
religion       god, bible, believe
...            ...

small-talk most frequent
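A rough Python sketch of how a topic/terms table like the one above could be used to label a translation. This is a simple term-overlap stand-in, not the topic model used in the talk (the slides do not describe it at this point). Only the three topics and nine terms shown on the slide are included, and the function name assign_topic and the fallback rule are assumptions.

```python
# Topic terms copied from the slide; the real model has 10 topics.
TOPIC_TERMS = {
    "small-talk": {"hello", "fine", "name"},
    "music": {"dance", "listen", "music"},
    "religion": {"god", "bible", "believe"},
}


def assign_topic(text, fallback="small-talk"):
    """Pick the topic whose terms overlap most with the words in `text`.

    Falls back to the most frequent topic (small-talk, per the slide)
    when no listed term appears in the text.
    """
    words = set(text.lower().split())
    scores = {topic: len(words & terms) for topic, terms in TOPIC_TERMS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else fallback


print(assign_topic("I listen to english music"))  # music
print(assign_topic("hello how are you"))          # small-talk
print(assign_topic("I like like"))                # small-talk (fallback)
```

Note that "I am catholic" would also fall back to small-talk here, because none of the three listed religion terms occurs in it; a learned topic model weights many more terms per topic.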

slide-37
SLIDE 37

Topic prediction and evaluation

Evaluation set: Spanish audio

Topic model

slide-38
SLIDE 38

Topic prediction and evaluation

Evaluation set: Spanish audio

Gold translation: I like to listen to jazz -> Topic model -> Silver: music

slide-39
SLIDE 39

Topic prediction and evaluation

Spanish audio

Gold translation: I like to listen to jazz -> Topic model -> Silver: music
ST translation: I like jazz -> Topic model -> Predicted: music

Compare predicted and silver topic label

slide-40
SLIDE 40

Topic prediction and evaluation

Spanish audio

Gold translation: I like to listen to jazz -> Topic model -> Silver: music
ST translation: I like jazz -> Topic model -> Predicted: music

Good prediction

slide-41
SLIDE 41

Topic prediction and evaluation

Spanish audio

Gold translation: I like to listen to jazz -> Topic model -> Silver: music
ST translation: I like like -> Topic model -> Predicted: small-talk

Poor prediction

slide-42
SLIDE 42

Topic prediction and evaluation

Spanish audio

Gold translation -> Topic model -> Silver
ST translation -> Topic model -> Predicted

Evaluate over a 100-hour test set
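The comparison of silver and predicted labels can be sketched in Python as a plain accuracy computation. The function name topic_accuracy and the four toy label pairs are invented for illustration; they are not results from the talk.

```python
def topic_accuracy(silver, predicted):
    """Fraction of utterances whose predicted topic equals the silver topic."""
    if len(silver) != len(predicted):
        raise ValueError("silver and predicted must have the same length")
    if not silver:
        raise ValueError("need at least one utterance")
    matches = sum(s == p for s, p in zip(silver, predicted))
    return matches / len(silver)


# Toy labels for illustration only.
silver = ["music", "music", "religion", "small-talk"]
predicted = ["music", "small-talk", "religion", "small-talk"]

print(topic_accuracy(silver, predicted))  # 0.75
```

Here the second utterance mirrors the "Poor prediction" case: the silver label is music, but the ST output leads to small-talk.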

slide-43
SLIDE 43

Topic prediction accuracy


  • ST trained on <= 20 hours of Spanish-English
  • Pretrained on English ASR
slide-44
SLIDE 44

Topic prediction accuracy


small-talk topic is the majority class baseline
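A minimal Python sketch of a majority-class baseline, assuming the majority topic is taken from the training labels and then predicted for every evaluation utterance. The function name majority_baseline and the toy labels are assumptions for illustration, not data from the talk.

```python
from collections import Counter


def majority_baseline(train_labels, eval_labels):
    """Return the most frequent training topic and its accuracy on eval_labels."""
    majority, _count = Counter(train_labels).most_common(1)[0]
    hits = sum(label == majority for label in eval_labels)
    return majority, hits / len(eval_labels)


# Toy labels for illustration only.
train_labels = ["small-talk", "small-talk", "music", "religion"]
eval_labels = ["small-talk", "music", "small-talk", "religion", "small-talk"]

print(majority_baseline(train_labels, eval_labels))  # ('small-talk', 0.6)
```

An ST-based predictor is only useful for this task if its accuracy exceeds this number.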

slide-45
SLIDE 45

Topic prediction accuracy

Poor performance for ST models trained on <= 5 hours

slide-46
SLIDE 46

Topic prediction accuracy


10-20h ST models outperform majority baseline

slide-47
SLIDE 47

Topic prediction accuracy


10-20h ST models outperform majority baseline

BLEU = 13

slide-48
SLIDE 48

Topic prediction accuracy


slide-49
SLIDE 49

Takeaways

  • Low-resource ST can still be useful for building downstream applications
  • Silver evaluation for this preliminary study
    ○ Future: human evaluation
  • Experiments on low-resource/unwritten languages
    ○ Datasets required
  • Keyword spotting

Thanks!

  • Check out: “Analyzing ASR pretraining for low-resource speech-to-text translation”, Stoian et al.

slide-50
SLIDE 50

Backup


slide-51
SLIDE 51


Topic prediction accuracy

slide-52
SLIDE 52


Silver labels

Speakers were provided discussion prompts

slide-53
SLIDE 53


Topic labels

slide-54
SLIDE 54


Spanish dataset discussion prompts

slide-55
SLIDE 55

Spanish speech to English text

Spanish audio -> Encoder -> Attention -> Decoder -> English text

  • Telephone speech (unscripted)
  • Realistic noise conditions
  • Multiple speakers and dialects
  • Crowdsourced English text translations

Closer to real-world conditions

slide-56
SLIDE 56

Neural ST model

Encoder: MFCCs (1.5 s, 150 x 13) -> CNN (37 x 512) -> biLSTM (37 x 512)
Decoder: Embedding (previous time step) -> LSTM -> Attention -> FF-Softmax
Example: “yo vivo en bronx” -> “i live in bronx EOS”

Code available on Github
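The shape figures on this slide (150 x 13 MFCCs for 1.5 s, then 37 x 512) can be checked with a little arithmetic, sketched here in Python. The 10 ms frame shift and the overall time downsampling factor of 4 in the CNN are assumptions chosen because they reproduce the numbers on the slide; the slide itself does not state them.

```python
FRAME_SHIFT_S = 0.01    # assumption: one MFCC frame every 10 ms
N_MFCC = 13             # from the slide: 13 coefficients per frame
TIME_DOWNSAMPLING = 4   # assumption: e.g. two stride-2 layers over time
HIDDEN_SIZE = 512       # from the slide: 512-dimensional encoder states


def encoder_shapes(duration_s):
    """Return (input shape, encoder output shape) for a clip of duration_s seconds."""
    n_frames = round(duration_s / FRAME_SHIFT_S)
    n_states = n_frames // TIME_DOWNSAMPLING
    return (n_frames, N_MFCC), (n_states, HIDDEN_SIZE)


print(encoder_shapes(1.5))  # ((150, 13), (37, 512))
```

Under these assumptions the attention mechanism sees 37 encoder states for a 1.5 s clip rather than 150 frames, which shortens the sequence the decoder attends over.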

slide-57
SLIDE 57

Cross-lingual applications for low-resource languages

  • Sheridan et al., 1997
    ○ German speech retrieval system using French text queries.
  • Projects LORELEI, OpenCLIR
    ○ Query speech/text in a low-resource language using English (or a similar high-resource language).
  • Dredze et al. (2010) and Siu et al. (2014)
    ○ Unsupervised clustering of speech into topics
  • Our work: Speech paired with text translations