
Intelligibility and Space-based Voice with Relaxed Delay Constraints
Sam Nguyen, Clayton Okino, and Michael Cheng
Jet Propulsion Laboratory
Presented at IEEE Aerospace Conference, Big Sky, Montana, 5 March 2008

Outline
• Background: Space communications considerations
• Luby-Transform (LT) Codes
• Metrics used in testing & experimental setup
• Results
• Intelligibility Overview
• Results
• Conclusions
• Future directions

Space Communications Characteristics
• End-to-end latency is significant relative to the terrestrial environment
  – E.g. ~1.3 sec one-way propagation delay Moon-Earth
• Wireless communications channels are potentially noisy, resulting in bit errors and/or dropped packets
• Automatic repeat request (ARQ) techniques rely on a return channel (feedback), which may be undesirable and impose too high a constraint when a simplex channel would suffice
  – Operation over simplex channel
  – Tolerate errors, or exploit error concealment techniques

Terrestrial Networks               Space Networks
• Lower latency                    • Higher latency
• Lower BER                        • Higher BER
• Can request resend on error      • Require anticipatory error recovery
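
The ~1.3 sec Moon-Earth figure above can be checked with a short calculation. This is a minimal sketch: the mean Earth-Moon distance and the speed of light are standard constants supplied here as assumptions, not values taken from the slides.

```python
# Assumed constants (not from the slides).
MEAN_EARTH_MOON_DISTANCE_KM = 384_400.0   # mean centre-to-centre distance
SPEED_OF_LIGHT_KM_PER_S = 299_792.458


def one_way_delay_s(distance_km: float) -> float:
    """Free-space propagation delay for a radio signal over distance_km."""
    return distance_km / SPEED_OF_LIGHT_KM_PER_S


moon_delay = one_way_delay_s(MEAN_EARTH_MOON_DISTANCE_KM)
print(f"one-way delay:    {moon_delay:.2f} s")
print(f"round-trip delay: {2 * moon_delay:.2f} s")
```

An ARQ scheme needs at least one round trip to request and receive a retransmission, which is why the round-trip figure matters for the feedback-channel argument.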

Encoder for LT codes
A message block of information packets: v_1, v_2, v_3, v_4, v_5, v_6, v_7
Code symbols: c_1, c_2, c_3, c_4, c_5, c_6, c_7, c_8
For each code symbol:
1. Randomly select the number of information packets to be XORed according to the robust soliton distribution. Example: 3 packets for symbol c_1.
2. Randomly select the positions of the information packets to be XORed according to a uniform distribution. Example: positions 1, 3, 5 for symbol c_1.
3. XOR the selected packets to generate the code symbol. Example: c_1 = v_1 + v_3 + v_5 (where + denotes XOR).
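
The three encoder steps can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the tuning parameters `c` and `delta`, the seed, and the 10-byte packet size are all hypothetical choices made for the example.

```python
import math
import random


def robust_soliton(k, c=0.1, delta=0.5):
    """Return mu, where mu[d] is the probability of degree d (mu[0] is unused).

    c and delta are the usual tuning parameters; the defaults are assumptions.
    """
    R = c * math.log(k / delta) * math.sqrt(k)
    # Location of the spike in tau, clamped to a valid degree for small k.
    spike = min(max(int(round(k / R)), 1), k)
    rho = [0.0] * (k + 1)  # ideal soliton part
    tau = [0.0] * (k + 1)  # extra mass added by the robust variant
    rho[1] = 1.0 / k
    for d in range(2, k + 1):
        rho[d] = 1.0 / (d * (d - 1))
    for d in range(1, spike):
        tau[d] = R / (d * k)
    tau[spike] = max(R * math.log(R / delta) / k, 0.0)
    beta = sum(rho) + sum(tau)  # normalisation constant
    return [(r + t) / beta for r, t in zip(rho, tau)]


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def lt_encode_symbol(packets, mu, rng):
    """Generate one code symbol; returns (0-based positions, symbol bytes)."""
    k = len(packets)
    # Step 1: draw the degree from the robust soliton distribution.
    degree = rng.choices(range(1, k + 1), weights=mu[1:])[0]
    # Step 2: choose that many distinct positions uniformly at random.
    positions = sorted(rng.sample(range(k), degree))
    # Step 3: XOR the selected information packets together.
    symbol = bytes(len(packets[0]))
    for p in positions:
        symbol = xor_bytes(symbol, packets[p])
    return positions, symbol


rng = random.Random(2008)                           # fixed seed, arbitrary
packets = [bytes([i + 1] * 10) for i in range(7)]   # v_1..v_7, 10 bytes each
mu = robust_soliton(len(packets))
code_symbols = [lt_encode_symbol(packets, mu, rng) for _ in range(8)]
for idx, (positions, symbol) in enumerate(code_symbols, start=1):
    print(f"c_{idx} = XOR of v at positions {[p + 1 for p in positions]}")
```

The decoder has to know which positions were combined into each symbol; in practice this is usually conveyed by sharing the random seed or by a small header, which the sketch sidesteps by returning the positions directly.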

Decoders for LT codes
Algebraic decoder:
Each code symbol establishes a constraint with the information packets in a message block. So a collection of code symbols establishes a system of linear equations. The solution to this system of equations is the original information packets.

\[
\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_k \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 1 & 0 & 1 & \cdots & 0 \\
1 & 1 & 0 & 0 & 0 & \cdots & 0 \\
\vdots & & & & & & \vdots \\
1 & 0 & 0 & 0 & 0 & \cdots & 0
\end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_k \end{bmatrix},
\qquad \mathbf{c} = G\,\mathbf{v}
\]

1. Collect code symbols c until G is full rank.
2. Recover v by computing G^{-1} c.
Advantage: low average overhead.
Disadvantage: inverting a matrix is of complexity O(k^3).
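
A minimal sketch of the algebraic decoder, using Gauss-Jordan elimination over GF(2) rather than an explicit matrix inverse. The representation is an assumption made for brevity: each row of G is stored as an integer bitmask (bit j set means the symbol includes v_{j+1}) and each packet is a small integer, so XOR is a single operation. The function name `gf2_solve` is hypothetical.

```python
def gf2_solve(equations, k):
    """Solve c = G v over GF(2).

    equations: list of (mask, value) pairs, one per received code symbol.
    Returns the k recovered values, or None if G is not yet full rank.
    """
    rows = [[mask, value] for mask, value in equations]
    rank = 0
    for col in range(k):
        # Find a row at or below `rank` with a 1 in this column.
        pivot = next(
            (i for i in range(rank, len(rows)) if (rows[i][0] >> col) & 1),
            None,
        )
        if pivot is None:
            return None  # rank deficient: collect more code symbols
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Eliminate this column from every other row (row addition is XOR).
        for i in range(len(rows)):
            if i != rank and (rows[i][0] >> col) & 1:
                rows[i][0] ^= rows[rank][0]
                rows[i][1] ^= rows[rank][1]
        rank += 1
    # Row j now has mask 1 << j, so its value is the j-th information packet.
    return [rows[j][1] for j in range(k)]


v = [0x11, 0x22, 0x33]                     # stand-ins for v_1, v_2, v_3
received = [
    (0b101, v[0] ^ v[2]),                  # c_1 = v_1 + v_3
    (0b011, v[0] ^ v[1]),                  # c_2 = v_1 + v_2
    (0b001, v[0]),                         # c_3 = v_1
]
decoded = gf2_solve(received, k=3)
print(decoded == v)

# Two identical equations plus one more: rank 2 < 3, so decoding must wait.
not_enough = gf2_solve([received[1], received[1], received[2]], k=3)
print(not_enough)
```

The nested loops over columns and rows, each row operation touching up to k bits, are where the O(k^3) cost quoted above comes from.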

Decoders for LT codes (cont.)
Belief Propagation (BP) decoder:
1. Find a code symbol c_i that is connected to only one information packet v_j. (If there is no such code symbol, the decoder halts and declares a decoder failure.)
2. Set v_j = c_i.
3. Add v_j to all code symbols c_i' that are connected to v_j.
4. Remove all edges connected to the information packet v_j.
5. Repeat steps 1-4 until all information packets are recovered.
[Figure: example decoding graph connecting information packets v_1, v_2 to code symbols c_1, c_2, c_3, showing c_3 updated to c_2 + c_3 after a packet is recovered.]
Advantage: decoding complexity is ~O(k log k).
Disadvantage: average overhead is higher than the algebraic decoder.
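
The five BP steps can be sketched as a simple "peeling" loop. This is a sketch written for readability, assuming packets are small integers and each code symbol arrives as a set of 0-based positions plus a value; it rescans the symbol list on every pass, so it does not achieve the ~O(k log k) cost an optimized implementation would. The name `bp_decode` is hypothetical.

```python
def bp_decode(code_symbols, k):
    """Peeling decoder. Returns the k recovered values, or None on failure."""
    symbols = [(set(pos), val) for pos, val in code_symbols]
    recovered = [None] * k
    remaining = k
    while remaining:
        # Step 1: look for a code symbol connected to exactly one packet.
        hit = next(((pos, val) for pos, val in symbols if len(pos) == 1), None)
        if hit is None:
            return None  # decoder failure: no degree-one symbol left
        j = next(iter(hit[0]))
        # Step 2: that symbol is the packet itself.
        recovered[j] = hit[1]
        remaining -= 1
        # Steps 3 and 4: add (XOR) v_j into its neighbours and drop the edges.
        updated = []
        for pos, val in symbols:
            if j in pos:
                pos = pos - {j}
                val ^= recovered[j]
            if pos:  # symbols with no edges left carry no more information
                updated.append((pos, val))
        symbols = updated
    return recovered


v = [5, 9, 12]
good = [({0}, v[0]), ({0, 1}, v[0] ^ v[1]), ({1, 2}, v[1] ^ v[2])]
print(bp_decode(good, k=3))

# Full rank as a linear system, but no degree-one symbol: BP halts.
stuck = [({0, 1}, v[0] ^ v[1]), ({1, 2}, v[1] ^ v[2]), ({0, 1, 2}, v[0] ^ v[1] ^ v[2])]
print(bp_decode(stuck, k=3))
```

The second example is solvable by the algebraic decoder but not by BP, which illustrates why BP needs more received symbols on average.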

Metrics Used & Experimental Set Up
• Speech Quality
  – The Perceptual Evaluation of Speech Quality (PESQ) algorithm provides an objective measure of speech quality.
  – This is as opposed to the subjective Mean Opinion Score (MOS) approach.
  – The basic simulation modeling approach follows Florian Hammer and is shown below.
[Figure: block diagram of the simulation. Blocks: speech database, reference speech sample, codec, simulator (MATLAB/C) driven by a bit error rate, decoder, degraded speech samples, evaluation (PESQ), estimated speech quality (PESQ-MOS).]

Codec
• Codec analysis did not encompass all possible candidates, and work focused on one codec as an initial assessment
  – The selected codec has good PESQ performance for its bandwidth efficiency but is not necessarily the optimal choice
  – As described in [kataoka], the G.729 codec is an 8 kbps conjugate structure code excited linear prediction (CS-CELP) algorithm
    • Operates on 10 ms blocks of encoded speech
    • Utilizes linear predictive coding analysis
    • Utilizes codebooks for the set of possible sequences
    • Conjugate relationship between two codebooks used for the random excitation vector
      – Similar relationship for the gain vector

[kataoka] A. Kataoka, T. Moriya, "An 8 kb/s Conjugate Structure CELP (CS-CELP) Speech Coder", IEEE Transactions on Speech and Audio Processing, Vol. 4, No. 6, November 1996.
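
The 8 kbps rate and 10 ms block size fix the size of each encoded frame. A short worked calculation, assuming the voice payload is simply a whole number of frames per packet with no header overhead counted (the 20 ms and 60 ms packet durations are the ones used in the results):

```python
BIT_RATE_BPS = 8_000   # G.729 bit rate
FRAME_MS = 10          # one encoded block covers 10 ms of speech

bits_per_frame = BIT_RATE_BPS * FRAME_MS // 1000
bytes_per_frame = bits_per_frame // 8


def payload_bytes(packet_ms: int) -> int:
    """Codec payload for a packet carrying packet_ms of speech (headers excluded)."""
    frames = packet_ms // FRAME_MS
    return frames * bytes_per_frame


print(f"{bits_per_frame} bits = {bytes_per_frame} bytes per 10 ms frame")
for packet_ms in (20, 60):
    print(f"{packet_ms} ms packet: {packet_ms // FRAME_MS} frames, "
          f"{payload_bytes(packet_ms)} payload bytes")
```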

Results
• G.729 codec PESQ performance degradation for various LT code sizes and numbers of 10 ms frames per packet
[Figure: "K = 30, n v. PESQ". x-axis: size of n in LT codec (30 to 75); y-axis: PESQ (1 to 4). Curves: 5%, 1%, and 0.1% drop with 60 ms packets with LT; 1% and 0.1% drop with 20 ms packets with LT; 1% and 0.1% drop with 20 ms packets without LT; 5%, 1%, and 0.1% drop with 60 ms packets without LT.]

Intelligibility Overview
• Diagnostic Rhyme Test (DRT)

Voicing       Nasality      Sustention
Veal-Feel     Meat-Beat     Vee-Bee
Bean-Peen     Need-Deed     Sheet-Cheat
Gin-Chin      Mitt-Bit      Vill-Bill
Dint-Tint     Nip-Dip       Thick-Tick
Zoo-Sue       Moot-Boot     Foo-Pooh

• Speech Recognition

Results
• Diagnostic Rhyme Test (DRT)

Speaker   DRT Score   Standard Error
RH        96.9        0.74
JE        93.9        0.72
CH        96.4        0.96
VW        95.6        0.55
KS        98.0        0.69
MP        97.5        0.39

• Speech Recognition

Speaker   # correctly identified   # wrongly identified   % of words correctly identified
RH        172                      20                     89.58
JE        161                      31                     83.85
CH        167                      25                     86.98
VW        141                      51                     73.44
KS        156                      36                     81.25
MP        150                      42                     78.13
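
The percentage column of the speech recognition table follows from the two count columns, assuming it is computed as correct / (correct + wrong). The sketch below recomputes it from the counts in the table and compares it with the reported values; every speaker's counts sum to 192 words.

```python
# (correct, wrong, reported percentage) copied from the speech recognition table.
asr_results = {
    "RH": (172, 20, 89.58),
    "JE": (161, 31, 83.85),
    "CH": (167, 25, 86.98),
    "VW": (141, 51, 73.44),
    "KS": (156, 36, 81.25),
    "MP": (150, 42, 78.13),
}


def percent_correct(correct: int, wrong: int) -> float:
    """Share of test words the recognizer identified correctly, in percent."""
    return 100.0 * correct / (correct + wrong)


recomputed = {spk: percent_correct(c, w) for spk, (c, w, _) in asr_results.items()}
for spk, (c, w, reported) in asr_results.items():
    print(f"{spk}: {c + w} words, recomputed {recomputed[spk]:.3f}%, reported {reported}%")
```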

Conclusions
• Utilizing LT codes as a means of reducing packet erasures due to corrupted packets on an RF link can result in higher voice quality
  – E.g. tolerating 720 ms of delay can result in error-free G.729 performance for a 5% packet drop rate channel
• Automatic speech recognition (ASR) as a means of obtaining a metric related to DRT is a promising area for further work
• The PESQ-MOS measure was used to analyze voice degradation over space links, tested for LT code size and number of 10 ms frames per packet

Future Directions
• Extensions that use LT codes to improve packet erasure performance, combined with the use of ASR, could provide a solid means of identifying the benefit, in terms of intelligibility, of voice communications in space-based networks
