

  1. Effect of Telephone-Line Transmission and Digital Audio Format on Formant Tracking Measurements (27.07.2011)
     Christoph Meinerz, Landeskriminalamt Brandenburg, Germany (christoph.meinerz@gmx.de)
     Herbert Masthoff, Department of Phonetics, University of Trier, Germany (masthoff@uni-trier.de)

  2. Outline
     • Introduction: formants, speaker ID and audio compression
     • Method: experimental set-up, hardware, software
     • Results: formant shift
     • Conclusion: what to do

  3. Introduction
     • (revival of) reports on formant measurements for speaker identification (e.g. Nolan/Grigoras, 2005; Becker et al., 2007; Jessen et al., 2010; Simpson/French, 2010)
     • reports on the effects of telephone transmission and lossy compression on acoustic parameters (Künzel, 2001; Köster/Grasmück, 2004; Gonzalez et al., 2003)
     • the problem is real: telephone intercepts in low-bit-rate .mp3 format!
     ➡ results of a preliminary study: effects of telephone-line transmission and lossy low-bit-rate audio compression on LPC-based formant measurement, with no intra-speaker variation
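
     The introduction refers to LPC-based formant measurement. As a rough illustration of what such a measurement involves, here is a minimal Python sketch (assuming NumPy is available) that fits an all-pole model with the autocorrelation method and the Levinson-Durbin recursion, then reads formant frequencies and bandwidths off the roots of the prediction polynomial. It is not the authors' procedure: Praat, listed in the references, has a standard formant command, To Formant (burg)..., which uses Burg's method instead, and this sketch leaves out pre-emphasis, windowing and frame-by-frame tracking. All function names and the synthetic test signal are hypothetical.

```python
import numpy as np


def lpc_autocorrelation(x, order):
    """Prediction polynomial [1, a1, ..., ap] via the autocorrelation method
    and the Levinson-Durbin recursion.

    The prediction-error filter is A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Biased autocorrelation estimates r[0] ... r[order]
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient of stage i
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k  # remaining prediction-error power
    return a


def formants_from_lpc(a, fs):
    """Formant frequencies and bandwidths (Hz) from the roots of A(z)."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]  # one root per complex-conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi
    idx = np.argsort(freqs)
    return freqs[idx], bandwidths[idx]


def synthetic_vowel(formants_hz, bandwidths_hz, fs, n, seed=0):
    """Noise-excited all-pole test signal with known resonances (not real speech)."""
    a = np.array([1.0])
    for f, b in zip(formants_hz, bandwidths_hz):
        radius = np.exp(-np.pi * b / fs)
        theta = 2 * np.pi * f / fs
        a = np.convolve(a, [1.0, -2 * radius * np.cos(theta), radius ** 2])
    e = np.random.default_rng(seed).standard_normal(n)
    x = np.zeros(n)
    for t in range(n):
        past = x[max(0, t - len(a) + 1):t][::-1]  # x[t-1], x[t-2], ...
        x[t] = e[t] - np.dot(a[1:1 + len(past)], past)
    return x


fs = 10_000
x = synthetic_vowel([500, 1500, 2500], [100, 100, 100], fs, n=20_000)
freqs, bws = formants_from_lpc(lpc_autocorrelation(x, order=6), fs)
print(np.round(freqs))  # expected to lie close to 500, 1500 and 2500 Hz
```

     The synthetic signal has known resonances, so the estimates can be checked against them; real recordings offer no such ground truth.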

  4. Method I: Experimental set-up, "The Plan"
     [figure: diagram of the experimental set-up]

  5. Method II: Audio formats and hardware
     Channel  File  Coding  Sampling rate  Bit rate
     mike     .wav  PCM     44.1 kHz       705 kbps
     mike     .wma  CBR     22 kHz         20 kbps
     mike     .mp3  CBR     8 kHz          8 kbps
     tel      .wav  PCM     44.1 kHz       705 kbps
     tel      .wma  CBR     22 kHz         20 kbps
     tel      .mp3  CBR     8 kHz          8 kbps
     Tech specs (mike): Sound Studio UoT; microphone: Neumann M147 Tube; soundcard: RME Hammerfall
     Tech specs (tel): "Re-Tel" telephone recording adapter 157; soundcard: MBox 2 Pro
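
     The 705 kbps figure for the .wav files follows from the PCM bit-rate formula (sampling rate × bits per sample × channels) if one assumes 16-bit mono, which the slide does not state. The short Python sketch below reproduces it, shows roughly how far the lossy formats cut the bit rate, and computes the Nyquist limit fs/2 for the 8 kHz files, assuming "8 kHz" on the slide is the sampling rate. The function names are hypothetical.

```python
def pcm_bitrate_kbps(sample_rate_hz: float, bits_per_sample: int, channels: int) -> float:
    """Bit rate of uncompressed linear PCM in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at a given sampling rate."""
    return sample_rate_hz / 2.0


# Assumption: 16-bit mono; the slide gives only "PCM 44.1 kHz 705 kbps".
wav_kbps = pcm_bitrate_kbps(44_100, 16, 1)
print(f".wav: {wav_kbps:.1f} kbit/s")  # .wav: 705.6 kbit/s

# Constant bit rates (CBR) of the lossy files as listed in the table
for label, kbps in ((".wma, 20 kbps", 20.0), (".mp3, 8 kbps", 8.0)):
    print(f"{label}: about 1/{wav_kbps / kbps:.0f} of the .wav bit rate")

# Files sampled at 8 kHz cannot represent anything above 4 kHz
print(f"Nyquist limit at 8 kHz: {nyquist_hz(8_000):.0f} Hz")
```

     The ratios come out at about 1/35 for .wma and 1/88 for .mp3.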

  6. Results I: Shift of average formant frequency according to format (males)
     [figure: average F1, F2 and F3 (axis 0 to 2,400 Hz) for mike .wav, mike .wma, mike .mp3, tel .wav, tel .wma, tel .mp3]

  7. Results II: Shift of average formant frequency according to format (females)
     [figure: average F1, F2 and F3 (axis 0 to 2,400 Hz) for mike .wav, mike .wma, mike .mp3, tel .wav, tel .wma, tel .mp3]

  8. Results III: Mean shift of average formant frequency according to format, in % (all speakers)
     [figure: F1, F2 and F3 for mike .wav, mike .wma, mike .mp3, tel .wav, tel .wma, tel .mp3; values range from 77 % to 104 %]

  9. Results IV: Sonagraphic symptoms
     [spectrograms: top mike .wav, bottom mike .mp3]

  10. Results V: Sonagraphic symptoms
     [spectrograms: top tel .wav, bottom tel .mp3]

  11. Results VI: Sonagraphic symptoms
     [spectrograms: top mike .wav, bottom mike .mp3]

  12. Results VII: Sonagraphic symptoms
     [spectrograms: top tel .wav, bottom tel .mp3]

  13. Summary
     • shift of formant frequencies (all speakers):
       - F3: downward by ≈ 2-23 %
       - F2: downward by ≈ 1-17 %
       - F1: mike downward by ≈ 1-16 %; tel .wav and .wma upward by ≈ 2-4 %; tel .mp3 downward by ≈ 1 %
     • the largest shift occurs in tel .mp3 (8 kbps)
     • telephone-line transmission alone shifts F2 and F3 by about as much as .mp3 compression of the microphone signal (mike .mp3)
     • sonagraphic and auditory symptoms:
       - spectral cancellations ("the moth")
       - "musical noise" effect
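
     The percentages above express each condition's average formant frequency relative to a reference. A minimal Python sketch of that arithmetic follows, assuming the reference is the uncompressed microphone recording (mike .wav); the function name and the input values are hypothetical, not measurements from the study.

```python
def relative_shift(reference_hz: float, measured_hz: float):
    """Return (percent of reference, signed shift in %, direction) for one formant."""
    if reference_hz <= 0:
        raise ValueError("reference frequency must be positive")
    percent_of_reference = 100.0 * measured_hz / reference_hz
    shift_percent = percent_of_reference - 100.0
    if shift_percent > 0:
        direction = "upward"
    elif shift_percent < 0:
        direction = "downward"
    else:
        direction = "none"
    return percent_of_reference, shift_percent, direction


# Hypothetical values for illustration only (not data from the study)
print(relative_shift(1000.0, 900.0))  # (90.0, -10.0, 'downward')
print(relative_shift(500.0, 515.0))   # (103.0, 3.0, 'upward')
```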

  14. Conclusion
     • results confirm those already reported (e.g. Becker et al., 2011)
     • take formant shifts into account when measuring formants and in formant-based automatic speaker recognition (LPC)
     • include a larger population for statistical significance; possibly detect a "critical" bit rate
     • possibly cross-check with FFT-based measurements
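
     The conclusion proposes cross-checking LPC-based values with FFT-based measurements. One simple way to do that, sketched here in Python with NumPy, is to average FFT power spectra over frames, smooth the log spectrum and pick the strongest well-separated peaks. The function name and every default (512-point Hann frames, 5-bin smoothing, 300 Hz minimum peak spacing, 0.97 pre-emphasis) are hypothetical choices, not the authors'. The result is a set of spectral peaks, not tracked formants; for voiced speech, whose harmonics produce many small peaks, the spectrum would usually need heavier smoothing (for example cepstral) before peak picking.

```python
import numpy as np


def fft_formant_peaks(x, fs, n_fft=512, hop=256, smooth_bins=5,
                      n_peaks=3, min_sep_hz=300.0, pre_emphasis=0.97):
    """Frequencies (Hz) of the strongest peaks of a smoothed, frame-averaged
    FFT power spectrum. A simplified cross-check, not a formant tracker."""
    x = np.asarray(x, dtype=float)
    # First-order pre-emphasis tilts the spectrum upward, as in formant analysis
    x = np.append(x[0], x[1:] - pre_emphasis * x[:-1])
    window = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * window
              for i in range(0, len(x) - n_fft + 1, hop)]
    power = np.mean([np.abs(np.fft.rfft(f)) ** 2 for f in frames], axis=0)
    # Moving average over the log spectrum suppresses random ripple
    level_db = 10 * np.log10(power + 1e-12)
    smooth_db = np.convolve(level_db, np.ones(smooth_bins) / smooth_bins,
                            mode="same")
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    # Local maxima, strongest first
    candidates = [i for i in range(1, len(smooth_db) - 1)
                  if smooth_db[i] > smooth_db[i - 1]
                  and smooth_db[i] >= smooth_db[i + 1]]
    candidates.sort(key=lambda i: smooth_db[i], reverse=True)
    # Greedily keep peaks that are at least min_sep_hz apart
    picked = []
    for i in candidates:
        if all(abs(freqs[i] - freqs[j]) >= min_sep_hz for j in picked):
            picked.append(i)
        if len(picked) == n_peaks:
            break
    return np.sort(freqs[picked])
```

     Comparing its output with LPC estimates from the same recording would flag cases where the two methods disagree; the peak frequencies are quantised to the FFT bin spacing (fs / n_fft).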

  15. Moth-Zilla (Becker et al., Vienna 2011)
     Thank you for your attention!

  16. References
     Becker, T. et al.: Forensic speaker verification using formant features and Gaussian Mixture Models. Interspeech 2008 Special Session: Forensic Speaker Recognition – Traditional and Automatic Approaches, Brisbane.
     Boersma, P./D. Weenink: Praat: doing phonetics by computer [Computer program]. Version 5.2.17, retrieved 26 March 2011 from http://www.praat.org/
     Gonzalez, J. et al.: Acoustic analysis of pathological voices compressed with MPEG System. Journal of Voice, 17, 2003, 126-139.
     Grasmück, C./J.-P. Köster: Die Auswirkung von mp3 und ATRAC-Kompression auf sprechertypische Parameter des Sprachsignals [The effect of mp3 and ATRAC compression on speaker-specific parameters of the speech signal]. In: Nolte, B.: Proceedings „Schall und Schwingungen in sensibler Umgebung“ [Sound and vibration in sensitive environments], 2004, Bonn, 126-132.
     Harrison, P.: Formant measurement errors for multiple synthetic speakers. IAFPA Annual Conference 2010, Trier.
     Jessen, M. et al.: Correlation between long-term formant measurements and automatic speaker recognition in forensic case material. IAFPA Annual Conference 2010, Trier.
     Künzel, H.J.: Beware of the telephone effect: the influence of telephone transmission on the measurement of formant frequencies. Forensic Linguistics, 8, 2001, 80-99.
     Nolan, F./C. Grigoras: A case for formant analysis in forensic speaker identification. International Journal of Speech, Language and the Law, 12, 2005, 143-173.
     Simpson, S./P. French: Testing the speaker discrimination ability of formant measurements. IAFPA Annual Conference 2010, Trier.
