The University of Edinburgh
Analysis of the Voice Conversion Challenge 2016 Evaluation Results
Mirjam Wester, Zhizheng Wu & Junichi Yamagishi
Voice converted voices were evaluated in terms of naturalness and similarity. The questions we addressed were:
sound?
sound compared to the target speaker and to the source speaker?
—> 50 minutes
judge naturalness and similarity
pairs. How about just a single ST pair?
solution.
encounter one gender condition and listeners needed to encounter the full range of gender conditions as ratings are context-sensitive.
source-target (ST) pairs
two sets as comparable as possible.
should reflect their opinion of how natural or unnatural the sentence sounded
replacement from pool of 30 test sentences
the listening tests (hence not 54 sentences)
to 5 may not be all that meaningful.
everyday speech perception.
do all the time.
instructions:
produced by the same speaker? Some of the samples may sound somewhat degraded/distorted. Please try to listen beyond the distortion and concentrate on identifying the voice. Are the two voices the same or different? You have the option to indicate how sure you are of your decision.”
source speaker.
random, ensuring all ST pairs were covered across listeners.
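A minimal sketch of one way to hand out ST pairs at random while still covering every pair across listeners. It is an illustration only, not the procedure used in the challenge. The function name `assign_pairs`, the speaker labels, and the listener and trial counts are all hypothetical.

```python
import random


def assign_pairs(pairs, n_listeners, per_listener, seed=0):
    """Deal ST pairs to listeners in random order.

    The full list of pairs is shuffled and used up before any pair is
    reused, so every pair is covered as soon as the total number of
    trials reaches len(pairs). Assumption: no listener should hear the
    same pair twice.
    """
    if per_listener > len(pairs):
        raise ValueError("per_listener cannot exceed the number of pairs")
    rng = random.Random(seed)
    deck = []
    assignments = []
    for _ in range(n_listeners):
        chosen, skipped = [], []
        while len(chosen) < per_listener:
            if not deck:
                # Start a fresh, reshuffled pass over all pairs.
                deck = list(pairs)
                rng.shuffle(deck)
            candidate = deck.pop()
            if candidate in chosen:
                skipped.append(candidate)  # keep it for a later listener
            else:
                chosen.append(candidate)
        deck.extend(skipped)
        assignments.append(chosen)
    return assignments


# Hypothetical labels: 5 source and 5 target speakers give 25 ST pairs.
sources = ["S1", "S2", "S3", "S4", "S5"]
targets = ["T1", "T2", "T3", "T4", "T5"]
st_pairs = [(s, t) for s in sources for t in targets]
plan = assign_pairs(st_pairs, n_listeners=7, per_listener=4, seed=1)
```

Dealing from a shuffled list, rather than sampling each listener independently, keeps the order random but makes coverage certain instead of merely likely.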
[Figure: score (1 to 5) by system, shown for all ST pairs, Set 1 and Set 2]
Significance
[Figure: score (1 to 5) by system for each gender condition: MM, FF, MF, FM]
[Figure: responses by system in four categories (Different: absolutely sure; Different: not sure; Same: not sure; Same: absolutely sure), axis 20 to 100, with panels Source and Target]
inevitable.
ratings not ideal.
target was informative.
http://dx.doi.org/10.7488/ds/1430