Identifying Interviewer Falsification using Speech Recognition: A Proof of Concept Study
Hanyu Sun, Gonzalo Rivero, Ting Yan, Westat
Background: Computer Audio-Recorded Interviewing (CARI)
❯ Computer Audio-Recorded Interviewing (CARI) allows studies to monitor interviewer performance in real time during the field data collection period (e.g. Hicks et al. 2010).
❯ The success of such evaluation, however, often depends on labor-intensive coding performed in a timely manner.
- Often only a small number of items in the questionnaire, or a small portion of the interview, will be coded by human coders.
❯ There is growing interest in the survey field in exploring the use of machine learning in different aspects of the data collection process (e.g. Eck et al. 2018; Thompson, Book, and Tamby 2018).
- Speech recognition may be used to detect interviewer falsification and performance issues.
Background: Interviewer Falsification and Performance
❯ Interviewer falsification: “the intentional departure from the designed interviewer guidelines or instructions, unreported by the interviewer, which could result in the contamination of data” (AAPOR 2003).
- Fabricating all or part of an interview -> detectable via the number of speakers
❯ Interviewer performance issues that interviewers could be trained to improve:
- Are the questions being administered as expected? -> interviewer reads verbatim
Conceptual Framework
❯ Pipeline:
- CARI Recordings: convert audio recordings into the format required for processing.
- Diarize: identify audio segments by speaker and create individual files by segment.
- Transcribe: create speech-to-text transcriptions of all segments.
- Assess Similarity: compare each transcription to a reference text (i.e. the question) and calculate similarity.
- Outputs: detect interviewer falsification; assess interviewer performance.
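This pipeline can be sketched in a few lines of Python. Everything here is a toy stand-in under stated assumptions: real diarization and transcription would call speech-processing services on the audio files, so the diarized interview is mocked as (speaker, utterance) pairs, and a simple word-overlap score stands in for the string metrics.

```python
# Toy sketch of the CARI pipeline; diarization and ASR are mocked.

def diarize(turns):
    """Group transcribed turns by detected speaker (mock stand-in)."""
    segments = {}
    for speaker, text in turns:
        segments.setdefault(speaker, []).append(text)
    return segments

def similarity(transcript, reference):
    """Word-overlap similarity as a stand-in for the string metrics."""
    a = set(transcript.lower().split())
    b = set(reference.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0

# Mock diarized interview: (speaker, utterance) pairs
interview = [
    ("interviewer", "In the past 12 months, have you seen a doctor?"),
    ("respondent", "Yes, twice."),
]
question = "In the past 12 months, have you seen a doctor?"

segments = diarize(interview)
flag_falsified = len(segments) < 2   # only 1 speaker -> possible falsification
performance = similarity(" ".join(segments["interviewer"]), question)
print(flag_falsified, round(performance, 2))
```

The two outputs map onto the two branches at the end of the pipeline: the speaker count feeds the falsification check, and the transcript-to-question similarity feeds the performance assessment.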
Study Design (1) ❯ Mock interviews created by six testing pairs:
- 2 female interviewers (native speaker vs. non-native speaker)
- 3 female respondents (1 native speaker vs. 2 non-native speakers)
❯ Varied recording quality:
- Background noise
- Far field effects
❯ Seven scripted question and answer sequences (5 valid vs. 2 falsified)
- N=30 valid across the testing pairs
- N=4 falsified, two for each interviewer
Study Design (2) ❯ Outcome measures:
- Similarity between the question wording and the transcript -> interviewer performance
- Conversational-turn-level transcript when the interviewer delivers the question
- String metrics:
- WordNGram Jaccard: 0-1, 1 = exactly the same
- Levenshtein distance: 0 to max. length, 0 = exactly the same
- Greedy String Tiling (GST): 0-1, 1 = exactly the same
- Number of speakers detected -> interviewer falsification; only 1 speaker if falsified
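The first two string metrics are easy to sketch in plain Python. The n-gram size (n=2) is an assumption not stated in the slides, and Greedy String Tiling is omitted for brevity:

```python
# Sketch of two of the string metrics; n-gram size n=2 is an assumption.

def word_ngrams(text, n=2):
    """Return the set of word n-grams of a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b, n=2):
    """WordNGram Jaccard: 0-1, 1 = exactly the same."""
    sa, sb = word_ngrams(a, n), word_ngrams(b, n)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def levenshtein(a, b):
    """Levenshtein edit distance: 0 = exactly the same."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

reference = "during the past 12 months have you seen a doctor"
delivered = "in the past 12 months did you see a doctor"
print(round(jaccard(reference, delivered), 2), levenshtein(reference, delivered))
```

A minor wording change like the one in the example already drives the n-gram Jaccard score well below 1 while leaving a moderate Levenshtein distance, which is the contrast the verbatim analysis relies on.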
Findings: Recording Quality and Similarity Measures
Recording Quality     | WordNGram Jaccard | Levenshtein | GST
I Native Speaker  Yes | 0.54              | 63.81       | 0.80
I Native Speaker  No  | 0.51              | 81.29       | 0.78
R Native Speaker  Yes | 0.56              | 56.60       | 0.86
R Native Speaker  No  | 0.47              | 82.90       | 0.66
Background Noise  Yes | 0.52              | 71.00       | 0.86
Background Noise  No  | 0.61              | 40.71       | 0.89
Far Field Effects Yes | 0.57              | 64.00       | 0.89
Far Field Effects No  | 0.61              | 40.71       | 0.89
❯ Factors affecting the quality of the audio recording do not significantly affect the similarity measures:
- Marginally significant effect of R Native Speaker on GST (G = 4.08, p = 0.05)
Findings: Interviewer Verbatim and Similarity Measures
❯ Two of the scripted question and answer sequences were falsified by the interviewer (i.e. the true values are known).
❯ Similarity measures are higher when the interviewer reads verbatim, and lower when the interviewer changes the question wording.
Westat @ AAPOR 2019
Interviewer Verbatim | WordNGram Jaccard   | Levenshtein        | GST
Verbatim             | 0.60                | 50.77              | 0.85
Minor Wording Change | 0.50                | 67.33              | 0.63
Major Wording Change | 0.31                | 128.50             | 0.74
Test Statistic       | G = 6.22, p = 0.005 | G = 3.68, p = 0.04 | G = 2.51, p = 0.10
Findings: Interviewer Falsification and Number of Speakers
❯ The speech recognition approach is able to identify interviewer falsification using the number of speakers detected in the audio recording in most cases (χ²(2) = 10.27, p = 0.006).
❯ Factors affecting the quality of the audio recording seem to impact the number of speakers detected.
Number of Speakers Detected | Falsified (1 speaker expected) | Valid (2 speakers expected)
1                           | 3                              | 3
2                           | 1                              | 26
3                           | 0                              | 1
- Falsified interview detected with 2 speakers: the interviewer intentionally changed her vocal attributes; background noise.
- Valid interviews detected with 1 or 3 speakers: background noise; R non-native speakers.
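The reported test statistic can be reproduced from the speaker-count table with a stdlib chi-square-of-independence computation. The zero cell (no falsified interview detected with 3 speakers) is inferred from the N=4 falsified / N=30 valid totals; a library such as scipy would also return the p-value directly.

```python
# Chi-square test of independence on the speaker-count table.
# Rows: number of speakers detected (1, 2, 3); columns: falsified vs. valid.

observed = [
    [3, 3],    # 1 speaker detected
    [1, 26],   # 2 speakers detected
    [0, 1],    # 3 speakers detected
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(3) for j in range(2)
)
df = (3 - 1) * (2 - 1)
print(round(chi2, 2), df)   # matches the reported chi-square(2) = 10.27
```

Note that with cell counts this small the chi-square approximation is rough, so the p-value should be read as indicative rather than exact.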
Conclusion and Discussion
❯ The CARI speech recognition approach works robustly:
- Factors affecting the quality of the audio recording do not affect the similarity measures.
- Similarity between the transcript and the question wording is higher when the interviewer reads verbatim.
- The approach detects the number of speakers correctly in most cases.
❯ Future research:
- Explore this approach with field interviewing data.
- Extend to other measures of interviewer performance, e.g. did the interviewer maintain the question meaning, did the respondent provide adequate responses.
Thank you!
Any Questions? HanyuSun@Westat.com
Reference
❯ Hicks, W., Edwards, B., Tourangeau, K., McBride, B., Harris-Kojetin, L., and Moss, A. (2010). “Using CARI tools to understand measurement error”, Public Opinion Quarterly, 74(5), 985-1003.
❯ Kern, C. (2018). “Data-driven Prediction of Panel Attrition”, Paper presented at the 2018 Conference of the American Association for Public Opinion Research, Denver, CO, U.S.A.
❯ Rivero, G., Tourangeau, R., Edwards, B., and Cook, T. (2018). “Using Machine Learning Methods to Improve Responsive Designs in Face-to-Face Surveys”, Paper presented at the 2018 Conference of the American Association for Public Opinion Research, Denver, CO, U.S.A.