 
Summary of the 2015 NIST Language Recognition i-Vector Machine Learning Challenge
Craig Greenberg, Audrey Tong, Alvin Martin, George Doddington (National Institute of Standards and Technology)
Douglas Reynolds, Elliot Singer (MIT Lincoln Laboratory)
Desire Banse, John Howard, Hui Zhao (NIST contractors)
Daniel Garcia-Romero, Alan McCree (HLT Center of Excellence, JHU)
Jaime Hernandez-Cordero, Lisa Mason (U.S. Department of Defense)
Odyssey 2016 Special Session, June 24, 2016
Motivation
• Attract researchers outside of the speech processing community to work on language recognition
• Explore new ideas in machine learning for use in language recognition
• Improve the performance of language recognition technology
The Task: Open Set Language Identification
[Figure: an audio segment is assigned to one of the target languages (Language 1, Language 2, …, Language N) or to the unknown (out-of-set) languages]
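One simple way to make the open-set decision concrete is sketched below in Python, assuming each segment arrives as an i-vector (one row of a NumPy array). This is an illustrative assumption, not the challenge baseline or any participant's system: each test i-vector is scored by cosine similarity against the length-normalized mean i-vector of every target language, and it is labeled out-of-set when even the best score falls below a threshold. Because the training set contains no out-of-set data, rejection by threshold is a natural fit. The names `fit_class_means`, `identify`, the label `"oos"`, and the `threshold` parameter are all hypothetical.

```python
import numpy as np

OOS = "oos"  # hypothetical label for the out-of-set class


def _length_normalize(x):
    # Scale each row (one i-vector per row) to unit Euclidean length.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def fit_class_means(ivectors, labels):
    """Return sorted language labels and one unit-length mean i-vector per language."""
    x = _length_normalize(np.asarray(ivectors, dtype=float))
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()))
    means = np.stack([x[labels == c].mean(axis=0) for c in classes])
    return classes, _length_normalize(means)


def identify(ivectors, classes, means, threshold):
    """Assign each i-vector to its best-scoring target language, or OOS if that score < threshold."""
    x = _length_normalize(np.asarray(ivectors, dtype=float))
    scores = x @ means.T          # cosine similarity to every language mean
    best = scores.argmax(axis=1)  # index of the top-scoring target language
    return [classes[b] if scores[i, b] >= threshold else OOS
            for i, b in enumerate(best)]
```

Raising `threshold` sends more segments to the out-of-set class; in practice it would have to be tuned, for example on the development set.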
The Data & Language Selection
• Telephone conversations and narrowband broadcasts
  – Previous NIST LREs (1996–2011)
  – Selected data from the IARPA Babel Program
• Data-driven selection
  – Only languages with multiple sources were selected, to reduce the source-to-language effect
  – Language pairs with higher confusability were preferred
Data Size
Set     Number of Languages              Segments per Language   Total Segments
Train   50                               300                     15000
Dev.    65 (50 target + 15 out-of-set)   ≈100                    6431
Test    65 (50 target + 15 out-of-set)   100                     6500
Test set composition:
  – 5000 target segments: 1500 progress, 3500 evaluation
  – 1500 out-of-set segments: 450 progress, 1050 evaluation
• Training set did not include out-of-set data
• Development set included unlabeled out-of-set data
• Test set was divided into progress and evaluation subsets
Data Sources
• Distribution of sources across languages was similar
Segment Speech Durations
• Distribution of segment speech durations was relatively similar
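Performance Metric
$$\mathrm{Cost} = \frac{1 - P_{oos}}{n} \sum_{k=1}^{n} P_{error}(k) + P_{oos} \cdot P_{error}(oos)$$
• k = target language
• oos = out-of-set language
• n = total number of target languages = 50
• P_oos = prior probability of the oos language = 0.23
• P_error(k) = number of errors for class k divided by the number of trials for class k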
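The cost can be computed directly from the true and predicted labels of the test segments. The Python sketch below assumes that P_error(c) is the fraction of segments of class c that were assigned any other label; the function name `challenge_cost` and the label `"oos"` for the out-of-set class are hypothetical, while the defaults n = 50 and P_oos = 0.23 are the values given for the challenge.

```python
from collections import Counter

OOS = "oos"  # hypothetical label for the out-of-set class


def challenge_cost(true_labels, predicted_labels, n_targets=50, p_oos=0.23):
    """Cost = (1 - P_oos)/n * sum_k P_error(k) + P_oos * P_error(oos)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    trials = Counter(true_labels)  # number of trials per true class
    errors = Counter(t for t, p in zip(true_labels, predicted_labels) if t != p)

    def p_error(c):
        # Assumption: a class with no trials contributes zero error.
        return errors[c] / trials[c] if trials[c] else 0.0

    target_term = sum(p_error(c) for c in trials if c != OOS)
    return (1 - p_oos) / n_targets * target_term + p_oos * p_error(OOS)
```

A perfect system scores 0, and a system that misclassifies every segment of every class scores (1 - P_oos) + P_oos = 1, so the weighting spreads 77% of the cost evenly over the target languages and 23% over the out-of-set class.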
Participation
• Worldwide participation
  – 6 continents
  – 31 countries
• 78 participants downloaded data
• 56 participants submitted results
  – 44 unique organizations
• 3773 valid submissions during the challenge period (May 1 to September 1, 2015)
• 4021 submissions as of January 2016
Contrast with Traditional LREs
                          i-Vector                                         Traditional
Input                     i-vector representations of audio segments       Audio segments
Task                      Identification (n-class problem)                 Detection (2-class problem)
Metric                    Cost based on error rates                        Cost based on miss and false alarm rates
Target Languages          50                                               10–25
Segment Speech Duration   Log-normal distribution with a mean of 35 secs   Uniform distribution over 3, 10, and 30 secs
Challenge Duration        4 months                                         1 week
Scoring                   Results feedback on a portion of the test set    No results feedback
Evaluation Platform       Web-based                                        Manual
Web-based Evaluation Platform
• The goal was to facilitate the evaluation process with limited human involvement
• All evaluation activities were conducted via a web interface
  – Download training and evaluation data
  – Upload submissions for validation and scoring
  – Track submission status
  – View results and site ranking
Daily Best Cost During the Challenge
Best Cost Per Participant at End of Challenge
Number of Submissions Per Participant
Results by Target Language
Results by Speech Duration
Lessons Learned
• Record participation, exceeding all previous LREs
  – 56 participants from 44 unique organizations in 31 countries on 6 continents
• 46 of 56 systems were better than the baseline system
  – 6 were better than the oracle system
• Half of the improvement was made in the first few weeks of the four-month challenge
• Top systems did well on Burmese, but less well on English and Hindi
• Performance on the out-of-set (OOS) language class was in the middle of the per-language results
• Few system descriptions were received, so it is not clear whether new methods were investigated
  – The top system did develop a novel technique to improve out-of-set detection (talk upcoming)
Benchmark Your System
• NIST plans to keep the web-based evaluation platform up for the foreseeable future as a system development tool
  – Visit http://ivectorchallenge.nist.gov
  – Download the training and development data
  – Test your system on the test data used in the Challenge
Upcoming Activities
• SRE16 & Workshop – 15th edition
  – Speaker detection of telephone speech recorded over a variety of handsets
  – Introduction of a fixed training condition
  – Test segments with more speech duration variability
  – Data collected outside North America
  – Inclusion of trials using same and different phone numbers
  – http://www.nist.gov/itl/iad/mig/sre16.cfm
• 2016 LRE Analysis Workshop
  – In-depth analysis of results from LRE15
  – Co-located with SLT16 in San Juan, Puerto Rico