7-Speech Quality Assessment (PowerPoint presentation)



  1. 7-Speech Quality Assessment
     – Quality levels
     – Subjective tests
     – Objective tests
     – Intelligibility
     – Naturalness

  2. Quality levels
     – Synthetic quality: below 4.8 kbps
     – Communication quality: 4.8 to 13 kbps
     – Toll quality: 13 to 64 kbps
     – Broadcast quality: above 64 kbps
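As a quick illustration, the bit-rate ranges of slide 2 can be written as a lookup function. The name `quality_level` and the treatment of the exact boundary values are assumptions; the slide only gives the ranges.

```python
def quality_level(bitrate_kbps: float) -> str:
    """Return the quality level for a speech coder bit rate in kbps.

    Boundary handling is an assumption: 4.8 kbps and 13 kbps are assigned
    to the higher class, while 64 kbps stays "toll" because broadcast
    quality is defined as above 64 kbps.
    """
    if bitrate_kbps < 0:
        raise ValueError("bit rate must be non-negative")
    if bitrate_kbps < 4.8:
        return "synthetic"
    if bitrate_kbps < 13:
        return "communication"
    if bitrate_kbps <= 64:
        return "toll"
    return "broadcast"


for rate in (2.4, 8, 32, 128):
    print(rate, quality_level(rate))
```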

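  3. Test types
     – Subjective, intelligibility: DRT, MRT
     – Subjective, naturalness: MOS, DAM
     – Objective, intelligibility: none yet (possibly future ASR systems)
     – Objective, naturalness: AI, global SNR, segmental SNR, frequency-weighted segmental SNR, Itakura measure, WSSM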

  4. First class: subjective intelligibility tests
     Diagnostic Rhyme Test (DRT)
     – The listener chooses between two CVC (consonant-vowel-consonant) words that differ only in the first consonant.
     – The first consonants are chosen to have specific properties. Examples: hop / fop, than / dan.
     Modified Rhyme Test (MRT)
     – The listener chooses among several CVC words that differ in the first consonant.
     – Example: cat, bat, rat, mat, fat, sat.

  5. First class (cont'd): subjective intelligibility tests
     – The DRT is widely used and considered reliable.
     – In this test the listener hears each item only once.
     – Score:
     $$\mathrm{DRT} = \frac{N_{\text{correct}} - N_{\text{incorrect}}}{N_{\text{tests}}} \times 100\,\%$$
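A minimal sketch of the DRT score from slide 5, assuming every trial is answered so that N_tests = N_correct + N_incorrect. The function name `drt_score` is hypothetical. Subtracting the errors corrects for guessing: a listener picking at random between the two words gets about half right, which scores 0 % rather than 50 %.

```python
def drt_score(n_correct: int, n_incorrect: int) -> float:
    """DRT score in percent: (N_correct - N_incorrect) / N_tests * 100.

    Assumes every trial was answered, so N_tests = N_correct + N_incorrect.
    """
    n_tests = n_correct + n_incorrect
    if n_tests == 0:
        raise ValueError("no trials")
    return 100.0 * (n_correct - n_incorrect) / n_tests


print(drt_score(90, 10))  # 90 right, 10 wrong out of 100 trials
```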

  6. Second class: subjective naturalness tests
     Mean Opinion Score (MOS)
     – MOS is widely used and considered reliable.
     – In this test the listener may hear the speech many times.
     Diagnostic Acceptability Measure (DAM)
     – This test is very complex.

  7. Mean Opinion Score (MOS)
     MOS uses the following scale:
     Score   Speech quality
     1       Not acceptable
     2       Weak
     3       Medium
     4       Good
     5       Excellent
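Since MOS is the mean of the listeners' ratings on the scale of slide 7, a sketch needs little more than validation and an average. The names `MOS_LABELS` and `mean_opinion_score` are hypothetical; the labels are the ones listed on the slide.

```python
from statistics import mean

# Score -> speech-quality label, as listed on the slide.
MOS_LABELS = {1: "Not acceptable", 2: "Weak", 3: "Medium", 4: "Good", 5: "Excellent"}


def mean_opinion_score(ratings):
    """Average the listeners' 1-5 ratings of one speech sample."""
    ratings = list(ratings)
    if not ratings:
        raise ValueError("need at least one rating")
    for r in ratings:
        if r not in MOS_LABELS:
            raise ValueError(f"rating {r!r} is not on the 1-5 scale")
    return float(mean(ratings))


print(mean_opinion_score([4, 5, 3, 4, 4]))
```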

  8. Diagnostic Acceptability Measure (DAM)
     – This test is very complex.
     – Listeners score 19 different parameters, divided into 3 main groups:
       – Signal quality
       – Background quality
       – Total quality

  9. Objective tests
     – These tests cannot be used to measure intelligibility, because an automatic system cannot (yet) judge whether speech is intelligible.
     – Objective tests can therefore only be used to assess speech naturalness.

  10. Objective tests (cont'd)
     – Articulation Index (AI)
     – Signal-to-Noise Ratio (SNR)
       – Global (classic) SNR
       – Segmental SNR
       – Frequency-weighted segmental SNR

  11. Articulation Index (AI)
     – AI assumes that the distortions in different frequency bands are independent, and measures signal quality separately in each band.
     – In each band it determines the percentage of the signal that is perceptible to the listener.
     – 20 bands spanning 200 Hz to 6100 Hz.

  12. Articulation Index (cont'd)
     A signal is perceptible to the listener if it is:
     (1) above the threshold of hearing,
     (2) below the threshold of pain,
     (3) above the masking noise level.
     In each case, either condition (1) or condition (3) is the one that prevails.

  13. Articulation Index (cont'd)
     In AI, the SNR is measured separately in each band, and the band SNRs (in dB) are combined as
     $$\mathrm{AI} = \frac{1}{20}\sum_{j=1}^{20}\frac{\min(\mathrm{SNR}_j,\,30)}{30}$$
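The AI formula of slide 13 averages, over the 20 bands, each band's SNR in dB capped at 30 dB and normalized by 30. The sketch below follows that formula literally (including the absence of a lower clip); the function name is hypothetical and the band SNRs are assumed to be measured already.

```python
def articulation_index(band_snr_db):
    """AI = (1/20) * sum over j=1..20 of min(SNR_j, 30) / 30, SNR_j in dB.

    Follows the slide literally: SNRs are capped at 30 dB but not clipped
    from below, so a band with negative SNR lowers the index.
    """
    snrs = list(band_snr_db)
    if len(snrs) != 20:
        raise ValueError("expected one SNR value per band (20 bands)")
    return sum(min(snr, 30.0) / 30.0 for snr in snrs) / 20.0


print(articulation_index([45.0] * 10 + [0.0] * 10))
```

With half the bands above 30 dB and half at 0 dB, the index is 0.5.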

  14. Signal-to-Noise Ratio (SNR)
     The error signal is $\varepsilon(n) = s(n) - \hat{s}(n)$. With $\sigma_s^2 = E\{s^2(n)\}$ and $\sigma_\varepsilon^2 = E\{[s(n)-\hat{s}(n)]^2\}$, and both expectations estimated by sums over n:
     $$\mathrm{SNR}_{\text{global}} = 10\log_{10}\frac{\sigma_s^2}{\sigma_\varepsilon^2} = 10\log_{10}\frac{\sum_n s^2(n)}{\sum_n [s(n)-\hat{s}(n)]^2}$$
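A direct sketch of the global SNR of slide 14, with s the original signal and s_hat the processed signal given as equal-length sequences of samples. The function name is hypothetical; returning infinity for identical signals is a choice made here, not something the slide specifies.

```python
import math


def global_snr_db(s, s_hat):
    """Global SNR = 10 log10( sum s(n)^2 / sum (s(n) - s_hat(n))^2 )."""
    if len(s) != len(s_hat):
        raise ValueError("signals must have equal length")
    signal_energy = sum(x * x for x in s)
    noise_energy = sum((x - y) ** 2 for x, y in zip(s, s_hat))
    if signal_energy == 0:
        raise ValueError("reference signal has zero energy")
    if noise_energy == 0:
        return math.inf  # identical signals: no error at all
    return 10.0 * math.log10(signal_energy / noise_energy)


print(global_snr_db([1.0, -1.0, 1.0, -1.0], [0.9, -0.9, 0.9, -0.9]))
```

Here each sample is off by 10 %, so the error energy is 1/100 of the signal energy and the result is about 20 dB.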

  15. Segmental SNR
     $$\mathrm{SNR}_{\text{seg}} = \frac{1}{N}\sum_{j=1}^{N} 10\log_{10}\left[\frac{\sum_{n=m_j-M+1}^{m_j} s^2(n)}{\sum_{n=m_j-M+1}^{m_j}[s(n)-\hat{s}(n)]^2}\right]$$
     – N: number of frames
     – M: frame length
     – The bracketed term is the SNR of the j-th frame, which ends at sample $m_j$.
     – The average is usually taken over "good frames" only: frames whose SNR is above -10 dB, with frame SNRs saturated at +30 dB.
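A sketch of the segmental SNR of slide 15 with the "good frames" rule. Assumptions not stated on the slide: frames are non-overlapping, a trailing partial frame is ignored, and silent reference frames (zero energy) are skipped because their SNR is undefined. The function and parameter names are hypothetical.

```python
import math


def segmental_snr_db(s, s_hat, frame_len, low_db=-10.0, high_db=30.0):
    """Average per-frame SNR (dB) over non-overlapping frames of length frame_len.

    "Good frames": frames whose SNR is not above low_db are dropped, and
    frame SNRs are saturated at high_db.
    """
    if len(s) != len(s_hat):
        raise ValueError("signals must have equal length")
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    good = []
    # Step through complete frames only; a trailing partial frame is ignored.
    for start in range(0, len(s) - frame_len + 1, frame_len):
        frame = s[start:start + frame_len]
        frame_hat = s_hat[start:start + frame_len]
        sig = sum(x * x for x in frame)
        err = sum((x - y) ** 2 for x, y in zip(frame, frame_hat))
        if sig == 0:
            continue  # silent frame: SNR undefined, not a good frame
        snr = high_db if err == 0 else 10.0 * math.log10(sig / err)
        if snr > low_db:
            good.append(min(snr, high_db))
    if not good:
        raise ValueError("no good frames")
    return sum(good) / len(good)


s = [1.0, -1.0, 1.0, -1.0] * 2
s_hat = [0.9, -0.9, 0.9, -0.9, 1.0, -1.0, 1.0, -1.0]
print(segmental_snr_db(s, s_hat, frame_len=4))
```

The first frame has an SNR of about 20 dB and the second is error-free, so it is saturated at 30 dB; the average is about 25 dB.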

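  16. Frequency-weighted segmental SNR: Siemens formula
     $$\mathrm{SNR}_{\text{FWS}} = \frac{1}{N}\sum_{k=1}^{N}\frac{1}{W_k}\sum_{j=1}^{F} w_{j,k}\,10\log_{10}\frac{\sum_n s^2(n)}{\sum_n [s(n)-\hat{s}(n)]^2}, \qquad W_k = \sum_{j=1}^{F} w_{j,k}$$
     – F: number of frequency bands
     – N: number of frames
     – Here k indexes frames and j indexes bands; the sums over n run over band j of frame k.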

  17. Frequency-weighted segmental SNR: Deller formula
     $$\mathrm{SNR}_{\text{fw-seg}} = \frac{1}{M}\sum_{j=0}^{M-1} 10\log_{10}\left[\frac{\sum_{k=1}^{K} w_{j,k}\,10\log_{10}\left[E_{s,k}(m_j)/E_{e,k}(m_j)\right]}{\sum_{k=1}^{K} w_{j,k}}\right]$$

  18. Frequency-weighted segmental SNR: other formulas
     $$\mathrm{SNR}_{\text{fw-seg}} = \frac{1}{M}\sum_{j=0}^{M-1}\frac{1}{\sum_{k=1}^{K} w_{j,k}}\sum_{k=1}^{K} w_{j,k}\,10\log_{10}\frac{E_{s,k}(m_j)}{E_{e,k}(m_j)}$$
     $$\mathrm{SNR}_{\text{fw-seg}} = \frac{1}{M}\sum_{j=0}^{M-1}\frac{\sum_{k=1}^{K} w_{j,k}\,10\log_{10}\left[E_{s,k}(m_j)/E_{e,k}(m_j)\right]}{\sum_{k=1}^{K} w_{j,k}}$$

  19. The final formula
     The correct formula for the fw-seg SNR is therefore:
     $$\mathrm{SNR}_{\text{fw-seg}} = \frac{1}{M}\sum_{j=0}^{M-1}\frac{\sum_{k=1}^{K} w_{j,k}\,10\log_{10}\left[E_{s,k}(m_j)/E_{e,k}(m_j)\right]}{\sum_{k=1}^{K} w_{j,k}}$$

  20. The final formula (cont'd)
     where
     – M is the number of frames
     – j is the frame index, and $m_j$ is the last sample of frame j
     – K is the number of frequency bands
     – k is the frequency-band index
     – $w_{j,k}$ is the weight of the k-th band in the j-th frame
     – $E_{s,k}$ and $E_{e,k}$ are the energies of the k-th band of the signal and of the noise (error), respectively
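The final fw-seg SNR formula (slides 19 and 20) is a per-frame weighted average of band SNRs in dB, then a mean over frames. The sketch below assumes the band energies $E_{s,k}(m_j)$, $E_{e,k}(m_j)$ and the weights $w_{j,k}$ have already been computed (how the bands and weights are chosen is not fixed by the slides) and are passed as nested lists indexed `[j][k]`. The function name is hypothetical.

```python
import math


def fw_seg_snr_db(signal_energy, noise_energy, weights):
    """Frequency-weighted segmental SNR in dB.

    signal_energy[j][k], noise_energy[j][k], weights[j][k] hold E_{s,k}(m_j),
    E_{e,k}(m_j) and w_{j,k} for frame j (M frames) and band k (K bands).
    """
    M = len(signal_energy)
    if M == 0 or len(noise_energy) != M or len(weights) != M:
        raise ValueError("need the same number (> 0) of frames in every input")
    total = 0.0
    for es_row, ee_row, w_row in zip(signal_energy, noise_energy, weights):
        if not (len(es_row) == len(ee_row) == len(w_row)):
            raise ValueError("every frame needs K values in each input")
        w_sum = sum(w_row)
        if w_sum <= 0:
            raise ValueError("band weights of a frame must sum to a positive value")
        # Weighted average of this frame's band SNRs, each in dB.
        weighted_db = sum(
            w * 10.0 * math.log10(es / ee) for w, es, ee in zip(w_row, es_row, ee_row)
        )
        total += weighted_db / w_sum
    return total / M  # mean over the M frames


# Two frames, two bands each, equal weights.
print(fw_seg_snr_db([[100.0, 100.0], [1000.0, 1000.0]],
                    [[1.0, 10.0], [1.0, 1.0]],
                    [[1.0, 1.0], [1.0, 1.0]]))
```

The first frame's bands have SNRs of 20 dB and 10 dB (frame value 15 dB), the second frame's bands are both at 30 dB, so the result is about 22.5 dB.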

  21. Itakura Measure (  ) H (  ) S (  ) H Is the envelope spectrum        2 ( ) { ( )} ( ) | ( ) | S F R S X Use from All-Pole (AR) Model

  22. Itakura measure (cont'd)
     $$H(\omega) = \frac{1}{1 - \sum_{i=1}^{p} a_i e^{-j\omega i}}$$
     The measure is based on the spectral difference between the original signal and the signal under assessment.
     – $a_i$: autoregressive coefficients
     – $K_i$: reflection coefficients
     – $R_i$: autocorrelation coefficients
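Slide 22 lists three equivalent parameter sets for the all-pole model. A standard way to get the AR coefficients $a_i$ and reflection coefficients $K_i$ from the autocorrelation values $R_i$ is the Levinson-Durbin recursion, a technique not named on the slide; the sketch below uses it with the sign convention of $H(\omega)$ above. Sign conventions for reflection coefficients differ between texts, and the function names are hypothetical.

```python
import cmath


def levinson_durbin(r, p):
    """AR and reflection coefficients from autocorrelation r[0..p].

    Uses the prediction convention x(n) ~ sum_i a_i x(n - i), which matches
    H(w) = 1 / (1 - sum_i a_i e^{-j w i}). Returns (a, k, err) with
    a = [a_1..a_p], k = [K_1..K_p] and err the final prediction error energy.
    """
    if p < 1 or len(r) < p + 1:
        raise ValueError("need r[0..p] with p >= 1")
    if r[0] <= 0:
        raise ValueError("r[0] (signal energy) must be positive")
    a = [0.0] * (p + 1)  # a[1..p]; a[0] is unused
    k = [0.0] * (p + 1)  # k[1..p]
    err = r[0]           # error energy of the order-0 predictor
    for i in range(1, p + 1):
        acc = r[i] - sum(a[m] * r[i - m] for m in range(1, i))
        k[i] = acc / err  # fails if err reached 0 (perfectly predictable signal)
        prev = a[:]
        a[i] = k[i]
        for m in range(1, i):
            a[m] = prev[m] - k[i] * prev[i - m]
        err *= 1.0 - k[i] ** 2
    return a[1:], k[1:], err


def ar_envelope(a, omega):
    """|H(omega)| with H(omega) = 1 / (1 - sum_i a_i e^{-j omega i})."""
    denom = 1 - sum(ai * cmath.exp(-1j * omega * i) for i, ai in enumerate(a, start=1))
    return abs(1 / denom)


# Autocorrelation of a first-order AR process with coefficient 0.5.
a, k, err = levinson_durbin([1.0, 0.5, 0.25], p=2)
print(a, k, err, ar_envelope(a, 0.0))
```

For this first-order example the recursion returns a = [0.5, 0.0], K = [0.5, 0.0], an error energy of 0.75, and an envelope value of 2 at ω = 0.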

  23. Itakura measure (cont'd)
     $$d\big(g_s(m), g_{\hat{s}}(m)\big) = \frac{1}{M}\sum_{l=1}^{M}\left[g_s(l,m) - g_{\hat{s}}(l,m)\right]^2$$
     – m: frame index
     – l: coefficient index

  24. Itakura measure (cont'd)
     $$\tilde{d}_{lp}\big(g_s(m), g_{\hat{s}}(m')\big) = \left[\sum_{l=1}^{M} W_{l,m,m'}\right]^{-1}\sum_{l=1}^{M} W_{l,m,m'}\left[g_s(l,m) - g_{\hat{s}}(l,m')\right]^2$$
     – $g_s(l,m)$ is the l-th parameter of the frame ending at sample m.
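The distances on slides 23 and 24 compare two parameter vectors of length M. Those vectors could hold any of the parameter sets of slide 22 ($a_i$, $K_i$ or $R_i$). A sketch of the weighted form, under the assumption that the weights for the frame pair are supplied by the caller; with all weights equal to 1 it reduces to the unweighted mean of squared differences. The function name is hypothetical.

```python
def weighted_param_distance(g_s, g_hat, weights=None):
    """[sum_l W_l]^-1 * sum_l W_l * (g_s[l] - g_hat[l])^2.

    With weights=None (all W_l = 1) this is (1/M) * sum_l (g_s[l] - g_hat[l])^2.
    """
    if len(g_s) != len(g_hat):
        raise ValueError("parameter vectors must have equal length")
    if weights is None:
        weights = [1.0] * len(g_s)
    if len(weights) != len(g_s) or sum(weights) <= 0:
        raise ValueError("need one weight per parameter, with a positive sum")
    total = sum(w * (x - y) ** 2 for w, x, y in zip(weights, g_s, g_hat))
    return total / sum(weights)


print(weighted_param_distance([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(weighted_param_distance([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], [1.0, 1.0, 2.0]))
```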

  25. Weighted Spectral Slope Measure (WSSM)
     $$\Delta|s(k,m)| = |s(k+1,m)| - |s(k,m)|, \qquad \Delta|\hat{s}(k,m)| = |\hat{s}(k+1,m)| - |\hat{s}(k,m)|$$
     – $|s(k,m)|$ and $|s(k+1,m)|$ are in dB.
     – $s(k,m)$ is the STFT of the k-th band of the frame ending at sample m.
     $$d_{\mathrm{WSSM}}\big(|s(m)|, |\hat{s}(m)|\big) = \sum_{k=1}^{36} W_{k,m}\left[\Delta|s(k,m)| - \Delta|\hat{s}(k,m)|\right]^2$$
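A sketch of the per-frame WSSM of slide 25. It assumes the band magnitudes in dB and the weights $W_{k,m}$ for one frame are already available; how the bands are formed and the weights chosen is left to the caller. Since each slope uses bands k and k+1, K slopes need K+1 band magnitudes (37 for the slide's 36 terms). The function name is hypothetical.

```python
def wssm_frame(mag_db, mag_hat_db, weights):
    """WSSM distance for one frame: sum_k W_k * (dS_k - dS_hat_k)^2.

    mag_db[k], mag_hat_db[k]: band magnitudes |s(k,m)| and |s_hat(k,m)| in dB.
    weights[k]: W_{k,m}. With K+1 magnitudes there are K slopes, so weights
    has one entry fewer than mag_db.
    """
    if len(mag_db) != len(mag_hat_db) or len(weights) != len(mag_db) - 1:
        raise ValueError("expected K+1 magnitudes per signal and K weights")
    # Spectral slope: difference between neighbouring bands, in dB.
    slope = [mag_db[k + 1] - mag_db[k] for k in range(len(weights))]
    slope_hat = [mag_hat_db[k + 1] - mag_hat_db[k] for k in range(len(weights))]
    return sum(w * (d - d_hat) ** 2 for w, d, d_hat in zip(weights, slope, slope_hat))


print(wssm_frame([0, 10, 20], [0, 5, 20], [1, 1]))
```

In this toy frame the reference slopes are 10 and 10 dB while the processed ones are 5 and 15 dB, giving 25 + 25 = 50.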

  26. PESQ: Perceptual Evaluation of Speech Quality

  27. PESQ
     The most prominent output of PESQ is a MOS, which directly expresses voice quality. The PESQ MOS defined in ITU-T Recommendation P.862 ranges in practice from 1.0 (worst) to 4.5 (best). This may be surprising at first, since the ITU scale goes up to 5.0, but the explanation is simple: PESQ simulates a listening test and is optimized to reproduce the average result over all listeners (MOS stands for Mean Opinion Score). Statistics show that the best average one can generally expect from a listening test is not 5.0 but about 4.5: subjects are reluctant to give a 5 ("excellent") even when there is no degradation at all.
