 
              Identifying Interviewer Falsification using Speech Recognition: A Proof of Concept Study Hanyu Sun, Gonzalo Rivero, Ting Yan, Westat
Background: Computer Audio-Recorded Interviewing (CARI) ❯ Computer Audio-Recorded Interviewing (CARI) allows studies to monitor interviewer performance in real time during the field data collection period (e.g. Hicks et al. 2010). ❯ The success of such evaluation, however, often depends on labor- intensive coding in a timely manner. • Often a small number of items in the questionnaire or a small portion of the interview will be coded by human coders. ❯ There is a blooming interest in the survey field to explore the use of machine learning on different aspects of the data collection process (e.g. Eck et al. 2018, Thompson, Book, and Tamby, 2018). • May use speech recognition to detect interviewer falsification and performance issues. 2
Background: Interviewer Falsification and Performance ❯ Interviewer falsification: “the intentional departure from the designed interviewer guidelines or instructions, unreported by the interviewer, which could result in the contamination of data” (AAPOR 2003). • Fabricate all or part of an interview -> Number of speakers ❯ Interviewer performance issues that we could train the interviewers to do better. • Are the questions being administered as expected? ->Interviewer reads verbatim 3
Conceptual Framework CARI Diarize Transcribe Assess Recordings Similarity • Convert audio • Identify audio • Create speech • Compare recordings segments by to text each into required speaker transcriptions transcription format for of all to a reference • Create processing segments text (i.e. individual files question) and by segment calculate similarity Detect Assess interviewer interviewer falsification performance 4
Study Design (1) ❯ Mock interviews created by six testing pairs: • 2 female interviewers (native speaker vs. non-native speaker) • 3 female respondents (1 native speaker vs. 2 non-native speakers) ❯ Varied recording quality: • Background noise • Far field effects ❯ Seven scripted question and answer sequences (5 valid vs. 2 falsified) • N=30 valid across the testing pairs • N=4 falsified, two for each interviewer 5
Study Design (2) ❯ Outcome measures: • Similarity between the question wording and the transcript -> interviewer performance - Conversational turn level transcript when the interviewer delivers the question - String metric: • WordNGram Jaccard distance: 0-1, 1=exactly the same • Levenshtein distance: 0- max. length, 0=exactly the same • Greedy String Tiling (GST): 0-1, 1=exactly the same • Number of speakers detected -> interviewer falsification, only 1 speaker if falsified. 6
Findings: Recording Quality and Similarity Measures ❯ Factors affecting the quality of the audio recording do not significantly affect the similarity measures: • Marginally significant effects of R Native Speaker on GST ( 𝐺 = 4.08, p=0.05) WordNGram Jaccard Levenshtein GST Recording Quality I Native Speaker Yes 0.54 63.81 0.80 No 0.51 81.29 0.78 R Native Speaker Yes 0.56 56.60 0.86 No 0.47 82.90 0.66 Background Noise Yes 0.52 71.00 0.86 No 0.61 40.71 0.89 Far Field Effects Yes 0.57 64.00 0.89 No 0.61 40.71 0.89 7
Findings: Interviewer Verbatim and Similarity Measures ❯ Two scripted question and answer sequences of the interviewer falsification (i.e. true values are known). ❯ Higher similarity measures if the interviewer read verbatim, and lower measures if the interviewer changed the question wording. WordNGram Interviewer Verbatim Jaccard Levenshtein GST Verbatim 0.60 50.77 0.85 Minor Wording Change 0.50 67.33 0.63 Major Wording Change 0.31 128.50 0.74 Test Statistic 𝐺 = 6.22, p=0.005 𝐺 = 3.68, p=0.04 𝐺 = 2.51, p=0.10 Westat @ APOR 2019 8
Findings: Interviewer Falsification and Number of Speakers ❯ The speech recognition approach is able to identify interviewer falsification using the number of speakers detected in the audio recording for most of the cases ( 𝑦 2 (2) =10.27, p=0.006). ❯ Factors affecting the quality of the audio recording seems to impact the number of speakers detected. Interviewer falsification: intentionally changed her Interviewer Falsification vocal attributes. Number of Speakers Detected Yes (1 Speaker) No (2 Speakers) 1 3 3 2 1 26 R non-native 3 0 1 speakers, Background noise Background noise 9
Conclusion and Discussion ❯ The CARI speech recognition approach works robustly: • Factors affecting the quality of the audio recording does not affect the similarity measures. • Higher similarity between the transcript and the speech if the interviewer read verbatim. • This approach can detect the number of speakers correctly for most of the cases. ❯ Future research: • Explore this approach with the field interviewing data. • Extend to other measures of interviewer performance, e.g. did the interviewer maintain the question meaning, did the respondent provide adequate responses. 10
Thank you! Any Questions? HanyuSun@Westat.com 11
Reference ❯ Hicks, W., Edwards, B., Tourangeau, K., McBride, B., Harris-Kojetin, L., and Moss, A. (2010). “Using CARI tools to understand measurement error”, Public Opinion Quarterly , 74 (5), 985-1003. ❯ Kern, C. (2018) “Data - driven Prediction of Panel Attrition”, Paper presented at the 2018 Conference of the American Association for Public Opinion Research, Dever, CO, U.S.A. ❯ Rivero, G., Tourangeau, R., Edwards, B. and Cook, T. (2018). “Using Machine Learning Methods to Improve Responsive Designs in Face-to- Face Surveys”, Paper presented at the 2018 Conference of the American Association for Public Opinion Research, Dever, Colorado, U.S.A. 12
Recommend
More recommend