NUS Sung and Spoken Lyrics Corpus: A quantitative comparison of sung and spoken lyrics

  1. NUS Sung and Spoken Lyrics Corpus: A quantitative comparison of sung and spoken lyrics
     Zhiyan Duan, Haotian Fang, Bo Li, Khe Chai Sim, Ye Wang

  2. Outline
     ❖ Motivation
     ❖ Dataset Description
     ❖ Duration Analysis
     ❖ Spectral Analysis
     ❖ Conclusion
     ❖ Future Work

  4. Motivation
     ❖ Understanding the characteristics of the singing voice
     ❖ Benefiting a wide range of research problems
     ❖ Lack of a comprehensive dataset with phoneme-level annotation

  6. Dataset
     ❖ Diversity: in gender, accent, tempo, etc.
     ❖ Size: number of songs and subjects
     ❖ Balance the two
     (Image by Digitalnative)

  7. Song Selection
     ❖ Phonetic richness: to get the most out of the selected songs
     ❖ Phonetic balance: to minimize bias
     ❖ Tempo balance: to cover songs with different tempos
     ❖ Popularity: easier to recruit subjects
     ❖ Ease of learning: easier for subjects to learn

  8. Song Selection
     ❖ Songs: 20
     ❖ Estimated phoneme count: 140 to 980 per song
     ❖ Tempo: 68 to 150 bpm

  9. Subjects
     ❖ 6 males, 6 females
     ❖ All levels of vocal experience: amateur to 10+ years of vocal training
     ❖ All common voice types: soprano, alto, tenor, baritone and bass

  10. Subjects - Accents
      [Bar chart: number of subjects with different accents, shown separately for singing and speech. Accent categories: North American, Mild Malay, Malay, Mild Singaporean, Singaporean, North Chinese.]

  11. Recording
      ❖ Sound-proof recording studio
      ❖ 44.1 kHz, 16-bit
      ❖ Pro Tools 9
      ❖ Metronome with downbeat accent (through earphone)
      ❖ Lyrics printouts on music stand

  12. Annotation
      ❖ Phoneme set: CMU Dictionary *
      ❖ Annotators: with musical and phonetic backgrounds
      ❖ Software: Audacity
      * http://www.speech.cs.cmu.edu/cgi-bin/cmudict

  13. Annotation

  14. Annotation
      ❖ Annotated sung tracks: 48 tracks
      ❖ Subjects: 12 (6 male, 6 female), 4 tracks per subject
      ❖ Total length: 169 mins
      ❖ Phoneme count: 25,474
      ❖ Spoken data: alignment of labels from sung data

  16. Duration Analysis
      ❖ Focus on consonants
      ❖ Stretching in time and subject variations
      ❖ Proportion in syllable and position effects
      ❖ Comparison among different types of consonants

  17. Phoneme Classes (class: CMU phonemes)
      Vowels: AA, AE, AH, AO, AW, AY, EH, ER, EY, IH, IY, OW, OY, UH, UW
      Semivowels: W, Y
      Stops: B, D, G, K, P, T
      Affricates: CH, JH
      Fricatives: DH, F, S, SH, TH, V, Z, ZH
      Aspirates: HH
      Liquids: L, R
      Nasals: M, N, NG
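The class table on this slide can be expressed as a lookup, which is convenient when grouping annotated phonemes for the analyses that follow. This is a minimal sketch, not the authors' code: the names `PHONEME_CLASSES`, `CLASS_OF` and `phoneme_class` are hypothetical, and stripping the trailing stress digit (as in `AH0`) assumes labels follow the CMU dictionary convention of marking vowel stress with 0, 1 or 2.

```python
# Phoneme classes exactly as listed on the slide (CMU dictionary symbols).
PHONEME_CLASSES = {
    "Vowels": ["AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
               "EY", "IH", "IY", "OW", "OY", "UH", "UW"],
    "Semivowels": ["W", "Y"],
    "Stops": ["B", "D", "G", "K", "P", "T"],
    "Affricates": ["CH", "JH"],
    "Fricatives": ["DH", "F", "S", "SH", "TH", "V", "Z", "ZH"],
    "Aspirates": ["HH"],
    "Liquids": ["L", "R"],
    "Nasals": ["M", "N", "NG"],
}

# Reverse lookup: phoneme symbol -> class name.
CLASS_OF = {p: cls for cls, symbols in PHONEME_CLASSES.items() for p in symbols}


def phoneme_class(symbol):
    """Return the class of a CMU symbol, or None if it is not in the table.

    Assumption: a trailing stress digit (0, 1 or 2) may be present and is ignored.
    """
    base = symbol.upper().rstrip("012")
    return CLASS_OF.get(base)
```

For example, `phoneme_class("AH0")` and `phoneme_class("AH")` both map to the vowel class, while an unknown label returns `None` so that it can be filtered out rather than silently miscounted.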

  18. Consonant Stretching
      ❖ Intuitively, vowels can be stretched arbitrarily.
      ❖ Are consonants stretched less?

  19. Consonant Stretching
      [Bar chart: duration in seconds of vowels and consonants, speech vs. singing.]

  20. Consonant Stretching
      Stretching Ratio = Singing Duration / Speech Duration
      [Bar chart: average stretching ratio for male, female and overall, by consonant type (semivowels, stops, affricates, fricatives, aspirates, liquids, nasals).]
      Average stretching ratio comparison of different types of consonants
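The stretching ratio defined on this slide can be computed per phoneme token and then averaged per consonant class. The sketch below is illustrative only: the function name `average_stretching_ratio` and the `(class, sung_duration, spoken_duration)` tuple layout are hypothetical, and it assumes the average is taken over per-token ratios (the slide does not say whether ratios or durations are averaged first).

```python
from collections import defaultdict
from statistics import mean


def average_stretching_ratio(tokens):
    """Average sung/spoken duration ratio per phoneme class.

    tokens: iterable of (class_name, sung_duration_s, spoken_duration_s),
    one entry per aligned phoneme token (hypothetical layout).
    """
    ratios = defaultdict(list)
    for cls, sung, spoken in tokens:
        if spoken <= 0:
            continue  # skip tokens without a usable spoken duration
        ratios[cls].append(sung / spoken)
    # Assumption: mean of per-token ratios, not ratio of mean durations.
    return {cls: mean(values) for cls, values in ratios.items()}


# Made-up durations, for illustration only.
example = [
    ("Nasals", 0.75, 0.25),
    ("Nasals", 0.25, 0.25),
    ("Stops", 0.5, 0.5),
]
result = average_stretching_ratio(example)
```

A ratio above 1 means the class is longer when sung than when spoken; a ratio near 1 means singing barely changes its duration.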

  21. Consonant Stretching - Subject Variations
      Comparison of the probability density functions of consonant duration stretching ratios with respect to gender.

  22. Consonant Stretching - Subject Variations
      Subject 05: gender: female; accent: Malay; musical exposure: 2 years of choral experience
      Subject 08: gender: male; accent: Northern Chinese; musical exposure: no vocal training

  23. Consonant Stretching - Subject Variations
      Comparison of the consonant duration stretching ratios of subjects 05 and 08.

  24. Consonant Proportion
      [Bar chart: consonant proportion in syllable (%) for male, female and overall, by consonant type (semivowels, stops, affricates, fricatives, aspirates, liquids, nasals).]
      Phoneme proportion in syllable comparison of different types of consonants

  25. Consonant Proportion
      ❖ Syllabic proportions of consonants are higher for male subjects
      ❖ Absolute durations of both consonants and syllables are also longer for male subjects

  26. Consonant Proportion - Position Effect (type: description; example)
      Starting: at the beginning of a word; /g/ in go
      Preceding: preceding a vowel, but not at the beginning of a word; /m/ in small
      Succeeding: succeeding a vowel, but not at the end of a word; /l/ in angel
      Ending: at the end of a word; /t/ in at
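The four position types above can be assigned mechanically from a word's phoneme sequence. The following is a rough sketch under stated assumptions rather than the authors' procedure: the function name `consonant_positions` is hypothetical, a consonant that sits between two vowels is labelled "Preceding" (the slide does not say how such ties are resolved), and a word-internal consonant with no adjacent vowel is labelled "Other".

```python
# Vowel symbols from the phoneme class table (CMU dictionary symbols).
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


def consonant_positions(phonemes):
    """Label each consonant in one word by its position type.

    phonemes: list of CMU symbols for a single word; stress digits are ignored.
    Returns a list of (consonant, label) pairs in word order.
    """
    base = [p.rstrip("012") for p in phonemes]
    last = len(base) - 1
    labels = []
    for i, p in enumerate(base):
        if p in VOWELS:
            continue
        if i == 0:
            label = "Starting"
        elif i == last:
            label = "Ending"
        elif base[i + 1] in VOWELS:
            label = "Preceding"   # assumption: wins over "Succeeding" on ties
        elif base[i - 1] in VOWELS:
            label = "Succeeding"
        else:
            label = "Other"       # assumption: no adjacent vowel inside the word
        labels.append((p, label))
    return labels
```

For instance, with the illustrative transcription `["S", "M", "AO1", "L"]` for "small", the /m/ is labelled "Preceding", matching the example on the slide.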

  27. Consonant Proportion - Position Effect
      [Bar chart: consonant proportion in syllable (%) for starting, preceding, succeeding and ending positions, by consonant type (semivowels, stops, affricates, fricatives, aspirates, liquids, nasals).]
      The effect of positioning on consonant proportion in syllable

  29. Spectral Analysis
      ❖ Likelihood score comparison of sung and spoken phonemes
      ❖ Discrepancies between the effects of duration and pitch on MFCC features

  30. Likelihood Score Comparison
      ❖ Using a GMM-HMM system trained on the WSJ0 corpus
      ❖ Alignment is performed on both speech and singing data
      ❖ Phoneme boundaries are fixed for sung tracks

  31. Likelihood Score Comparison
      [Diagram: spoken and sung phonemes are each passed through the GMM-HMM system to obtain a score, and the two scores are subtracted.]

  32. Likelihood Score Comparison
      Average likelihood difference = |Average likelihood score (sung) - Average likelihood score (spoken)|
      [Bar chart: average likelihood difference for male, female and overall, by phoneme type (vowels, semivowels, stops, affricates, fricatives, aspirates, liquids, nasals).]
      Average likelihood difference comparison of different types of phonemes
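The difference measure on this slide is a simple absolute gap between two averages. A minimal sketch follows, assuming the per-phoneme likelihood scores have already been produced by the aligner and grouped by class; the function name and the dictionary layout are hypothetical, and the numbers in the example are made up.

```python
from statistics import mean


def average_likelihood_difference(sung_scores, spoken_scores):
    """|mean(sung) - mean(spoken)| per phoneme class.

    sung_scores, spoken_scores: dicts mapping class name -> list of
    likelihood scores (hypothetical layout). Only classes present in
    both dicts are compared.
    """
    shared = sung_scores.keys() & spoken_scores.keys()
    return {
        cls: abs(mean(sung_scores[cls]) - mean(spoken_scores[cls]))
        for cls in shared
    }


# Made-up scores, for illustration only.
diff = average_likelihood_difference(
    {"Vowels": [-80.0, -100.0], "Stops": [-50.0]},
    {"Vowels": [-40.0, -60.0], "Nasals": [-30.0]},
)
```

Because the measure is an absolute value, it shows how far sung phonemes drift from the speech-trained model, but not in which direction.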

  33. Effects of Duration & Pitch on Acoustic Features
      ❖ Discretize phoneme duration/pitch into 10 bins
      ❖ Ensure bins have balanced cumulative density masses
      ❖ Cluster using a decision tree
      ❖ A lower reduction rate indicates a larger impact on low-level acoustic features (i.e. MFCC)
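The first two steps on this slide (10 bins with balanced cumulative density masses) can be read as quantile binning, where cut points are chosen so that each bin holds roughly the same number of samples. The sketch below illustrates that reading with the Python standard library; it is an assumption about the procedure, and the names `quantile_bin_edges` and `assign_bins` are hypothetical. The decision-tree clustering step is not covered.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_bin_edges(values, n_bins=10):
    """Return the n_bins - 1 interior cut points for equal-mass bins."""
    return quantiles(values, n=n_bins)


def assign_bins(values, edges):
    """Map each value to a bin index in the range 0 .. len(edges)."""
    return [bisect_right(edges, v) for v in values]


# Made-up "durations" 1..100, for illustration only.
durations = list(range(1, 101))
edges = quantile_bin_edges(durations)
bins = assign_bins(durations, edges)
counts = [bins.count(b) for b in range(10)]
```

Unlike equal-width bins, this keeps sparsely populated ranges (for example very long sung phonemes) from ending up in nearly empty bins, at the cost of bins that cover unequal value ranges.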

  34. Effects of Duration & Pitch on Acoustic Features
      [Bar chart: model reduction rate for duration and pitch, sung vs. spoken.]

  36. Conclusion
      ❖ Created the NUS-48E dataset of sung and spoken lyrics
      ❖ Conducted a comparative study of sung and spoken phonemes in both the time and frequency domains

  38. Future Work
      ❖ Continue to annotate the remaining tracks (currently 80 out of 420 are annotated)
      ❖ Annotate the spoken data
      ❖ Repeat some previous work related to singing voice using the new dataset
      ❖ Further exploration based on current observations

  39. Thank you!

  40. Question & Answer
