  1. Unsupervised neural network based feature extraction using weak top-down constraints
Herman Kamper¹,², Micha Elsner³, Aren Jansen⁴, Sharon Goldwater²
¹CSTR and ²ILCC, School of Informatics, University of Edinburgh, UK
³Department of Linguistics, The Ohio State University, USA
⁴HLTCOE and CLSP, Johns Hopkins University, USA
ICASSP 2015

  3. Introduction
◮ Huge amounts of speech audio data are becoming available online.
◮ Even for severely under-resourced and endangered languages (e.g. unwritten ones), data is being collected.
◮ Generally, this data is unlabelled.
◮ We want to build speech technology on the available unlabelled data.
◮ This means we need unsupervised speech processing techniques.

  11. Example application: query-by-example search
Spoken query: [audio examples shown on slide]
What features should we use to represent the speech for such unsupervised tasks?

  16. Supervised neural network feature extraction
Output: predict phone states (e.g. ay, ey, k, v)
Phone classifier (learned jointly)
Feature extractor (learned from data)
Input: speech frame(s), e.g. MFCCs, filterbanks
But what if we do not have phone class targets to train our network?
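The architecture sketched on this slide can be illustrated as follows. This is a toy sketch, not the paper's network: the layer sizes are made up, and the random weights stand in for parameters that would in practice be learned by backpropagation against phone-state targets.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Hypothetical layer sizes and placeholder random weights; in a real
# system these would be learned with supervised phone-state targets.
rng = np.random.default_rng(0)
W1, b1 = 0.1 * rng.normal(size=(39, 100)), np.zeros(100)  # feature extractor
W2, b2 = 0.1 * rng.normal(size=(100, 40)), np.zeros(40)   # feature extractor
W3, b3 = 0.1 * rng.normal(size=(40, 48)), np.zeros(48)    # phone classifier head

def extract_features(frames):
    """Run frames through the lower layers only; the hidden
    activations serve as the learned features."""
    h1 = relu(frames @ W1 + b1)
    return relu(h1 @ W2 + b2)

frames = rng.normal(size=(5, 39))   # e.g. 5 frames of 39-d MFCCs + deltas
feats = extract_features(frames)    # one 40-d feature vector per frame
```

The classifier head (W3, b3) is only used during training; at test time the lower layers alone act as the feature extractor.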

  23. Weak supervision: unsupervised term discovery
Can we use these discovered word pairs to provide us with weak supervision?

  27. Weak supervision: align the discovered word pairs
Use the correspondence idea from [Jansen et al., 2013].
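The frame-level alignment of a discovered word pair can be obtained with dynamic time warping (DTW). A minimal sketch, assuming Euclidean frame distances and toy data in place of real speech features:

```python
import numpy as np

def dtw_align(x, y):
    """Align feature sequences x (n, d) and y (m, d) with DTW and
    return the warping path as a list of (i, j) frame-index pairs."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1],
                                 cost[i - 1, j],
                                 cost[i, j - 1])
    # Backtrack from the end to recover the aligned frame pairs.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): cost[i - 1, j - 1],
                 (i - 1, j): cost[i - 1, j],
                 (i, j - 1): cost[i, j - 1]}
        i, j = min(steps, key=steps.get)
    return path[::-1]

# Two tokens of the "same" word with different durations (toy data)
rng = np.random.default_rng(0)
word_a = rng.normal(size=(12, 13))
word_b = np.repeat(word_a, 2, axis=0)   # a slowed-down second token
pairs = dtw_align(word_a, word_b)       # aligned (frame_a, frame_b) pairs
```

Each (i, j) pair gives one input/target frame pair for the weak supervision described next.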

  30. Autoencoder (AE) neural network
Input: a speech frame. Output: the same as the input.
A normal autoencoder neural network is trained to reconstruct its input. This reconstruction criterion can be used to pretrain a deep neural network.
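A minimal single-hidden-layer autoencoder trained by gradient descent, with random toy data standing in for speech frames (a sketch of the reconstruction criterion, not the paper's exact architecture):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 39))   # toy stand-in for 39-d speech frames

d, h = 39, 20
W1, b1 = 0.1 * rng.normal(size=(d, h)), np.zeros(h)
W2, b2 = 0.1 * rng.normal(size=(h, d)), np.zeros(d)

def recon_loss(X):
    H = np.tanh(X @ W1 + b1)
    return float(((H @ W2 + b2 - X) ** 2).mean())

loss_before = recon_loss(X)
lr = 0.01
for _ in range(500):
    H = np.tanh(X @ W1 + b1)
    err = (H @ W2 + b2) - X              # target is the input itself
    gW2 = H.T @ err / len(X)             # gradients by backpropagation
    gb2 = err.mean(axis=0)
    dH = (err @ W2.T) * (1.0 - H ** 2)   # tanh derivative
    gW1 = X.T @ dH / len(X)
    gb1 = dH.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
loss_after = recon_loss(X)   # lower than loss_before
```

Stacking several such layers, each trained to reconstruct the one below, gives the layer-wise pretraining mentioned on the slide.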

  32. The correspondence autoencoder (cAE)
Input: a frame from one word in a discovered pair. Output: the corresponding frame from the other word.
The correspondence autoencoder (cAE) takes a frame from one word and tries to reconstruct the corresponding frame from the other word in the pair. In this way we learn an unsupervised feature extractor using the weak word-pair supervision.
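The training loop is the same as for a plain autoencoder; only the target changes, from the input frame to the aligned frame of the other word in the pair. A toy sketch with synthetic "aligned" frame pairs:

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy aligned frame pairs from discovered words: frame k of one token
# corresponds to frame k of the other, with small variation.
x_frames = rng.normal(size=(200, 39))
y_frames = x_frames + 0.1 * rng.normal(size=(200, 39))

d, h = 39, 20
W1, b1 = 0.1 * rng.normal(size=(d, h)), np.zeros(h)
W2, b2 = 0.1 * rng.normal(size=(h, d)), np.zeros(d)

lr = 0.01
for _ in range(500):
    H = np.tanh(x_frames @ W1 + b1)
    err = (H @ W2 + b2) - y_frames   # reconstruct the *other* word's frame
    gW2 = H.T @ err / len(x_frames)
    gb2 = err.mean(axis=0)
    dH = (err @ W2.T) * (1.0 - H ** 2)
    gW1 = x_frames.T @ dH / len(x_frames)
    gb1 = dH.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

# After training, the hidden layer is the unsupervised feature extractor.
features = np.tanh(x_frames @ W1 + b1)
```

Because the two frames share word identity but differ in speaker and channel detail, the hidden representation is pushed towards what the frames have in common.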

  33. Complete unsupervised cAE training algorithm
(1) Apply unsupervised term discovery to the speech corpus.
(2) Align the frames of the discovered word pairs.
(3) Train a stacked autoencoder on the corpus (pretraining) and use it to initialize the weights of the correspondence autoencoder.
(4) Train the correspondence autoencoder on the aligned frame pairs; the result is an unsupervised feature extractor.

  51. Evaluation of features: the same-different task
Word tokens: “apple”, “pie”, “grape”, “apple”, “apple”, “like”.
Treat one “apple” token as the query and the remaining tokens as the terms to search. For each pair, compute the DTW distance dᵢ and predict “same” if dᵢ < threshold, otherwise “different”:
◮ “apple” vs “pie”: d₁ → different ✓
◮ “apple” vs “grape”: d₂ → same ✗
◮ “apple” vs “apple”: d₃ → same ✓
◮ “apple” vs “apple”: d₄ → different ✗
◮ …
◮ “apple” vs “like”: d_N → different ✓
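The thresholding step above can be sketched as follows. The distances and labels here are made-up illustrations, not values from the paper:

```python
import numpy as np

# Hypothetical DTW distances between a query token and other word
# tokens, with ground-truth labels (True = same word type as the query).
distances = np.array([0.9, 0.4, 0.3, 0.8, 0.35, 0.2])
same_word = np.array([False, True, True, False, False, True])

threshold = 0.45
predict_same = distances < threshold   # small distance -> predict "same"

# Precision and recall at this single threshold; the actual evaluation
# sweeps the threshold over all pairs rather than fixing one value.
tp = int(np.sum(predict_same & same_word))
fp = int(np.sum(predict_same & ~same_word))
fn = int(np.sum(~predict_same & same_word))
precision = tp / (tp + fp)   # 3 correct "same" out of 4 predicted
recall = tp / (tp + fn)      # all 3 true "same" pairs recovered
```

Better features move same-word pairs closer together and different-word pairs further apart, improving this trade-off at every threshold.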
