Identifying Interviewer Falsification using Speech Recognition: A Proof of Concept Study
Hanyu Sun, Gonzalo Rivero, Ting Yan, Westat
Background: Computer Audio-Recorded Interviewing (CARI)
❯ Computer Audio-Recorded Interviewing (CARI) allows studies to monitor interviewer performance in real time during the field data collection period (e.g. Hicks et al. 2010).
❯ The success of such evaluation, however, often depends on labor-intensive coding performed in a timely manner.
- Often only a small number of items in the questionnaire, or a small portion of the interview, will be coded by human coders.
❯ There is growing interest in the survey field in exploring the use of machine learning in different aspects of the data collection process (e.g. Eck et al. 2018; Thompson, Book, and Tamby 2018).
- Speech recognition may be used to detect interviewer falsification and performance issues.
Background: Interviewer Falsification and Performance
❯ Interviewer falsification: “the intentional departure from the designed interviewer guidelines or instructions, unreported by the interviewer, which could result in the contamination of data” (AAPOR 2003).
- Fabricating all or part of an interview -> detectable via the number of speakers
❯ Interviewer performance issues that interviewers could be trained to improve:
- Are the questions being administered as expected? -> interviewer reads verbatim
Conceptual Framework
❯ Pipeline:
- CARI Recordings: convert audio recordings into the format required for processing.
- Diarize: identify audio segments by speaker and create individual files by segment.
- Transcribe: create speech-to-text transcriptions of all segments.
- Assess Similarity: compare each transcription to a reference text (i.e. the question) and calculate similarity.
- Outputs: detect interviewer falsification; assess interviewer performance.
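This pipeline can be sketched in a few lines of Python. Everything here is a toy stand-in under stated assumptions: real diarization and transcription would call speech-processing services on the audio files, so the diarized interview is mocked as (speaker, utterance) pairs, and a simple word-overlap score stands in for the string metrics.

```python
# Toy sketch of the CARI pipeline; diarization and ASR are mocked.

def diarize(turns):
    """Group transcribed turns by detected speaker (mock stand-in)."""
    segments = {}
    for speaker, text in turns:
        segments.setdefault(speaker, []).append(text)
    return segments

def similarity(transcript, reference):
    """Word-overlap similarity as a stand-in for the string metrics."""
    a = set(transcript.lower().split())
    b = set(reference.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0

# Mock diarized interview: (speaker, utterance) pairs
interview = [
    ("interviewer", "In the past 12 months, have you seen a doctor?"),
    ("respondent", "Yes, twice."),
]
question = "In the past 12 months, have you seen a doctor?"

segments = diarize(interview)
flag_falsified = len(segments) < 2   # only 1 speaker -> possible falsification
performance = similarity(" ".join(segments["interviewer"]), question)
print(flag_falsified, round(performance, 2))
```

The two outputs map onto the two branches at the end of the pipeline: the speaker count feeds the falsification check, and the transcript-to-question similarity feeds the performance assessment.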
Study Design (1) ❯ Mock interviews created by six testing pairs:
- 2 female interviewers (native speaker vs. non-native speaker)
- 3 female respondents (1 native speaker vs. 2 non-native speakers)
❯ Varied recording quality:
- Background noise
- Far field effects
❯ Seven scripted question and answer sequences (5 valid vs. 2 falsified)
- N=30 valid across the testing pairs
- N=4 falsified, two for each interviewer
Study Design (2) ❯ Outcome measures:
- Similarity between the question wording and the transcript -> interviewer performance
- Conversational-turn-level transcript when the interviewer delivers the question
- String metrics:
- WordNGram Jaccard: 0-1, 1 = exactly the same
- Levenshtein distance: 0 to max. length, 0 = exactly the same
- Greedy String Tiling (GST): 0-1, 1 = exactly the same
- Number of speakers detected -> interviewer falsification; only 1 speaker if falsified
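The first two string metrics are easy to sketch in plain Python. The n-gram size (n=2) is an assumption not stated in the slides, and Greedy String Tiling is omitted for brevity:

```python
# Sketch of two of the string metrics; n-gram size n=2 is an assumption.

def word_ngrams(text, n=2):
    """Return the set of word n-grams of a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b, n=2):
    """WordNGram Jaccard: 0-1, 1 = exactly the same."""
    sa, sb = word_ngrams(a, n), word_ngrams(b, n)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def levenshtein(a, b):
    """Levenshtein edit distance: 0 = exactly the same."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

reference = "during the past 12 months have you seen a doctor"
delivered = "in the past 12 months did you see a doctor"
print(round(jaccard(reference, delivered), 2), levenshtein(reference, delivered))
```

A minor wording change like the one in the example already drives the n-gram Jaccard score well below 1 while leaving a moderate Levenshtein distance, which is the contrast the verbatim analysis relies on.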
Findings: Recording Quality and Similarity Measures
Recording Quality     | WordNGram Jaccard | Levenshtein | GST
I Native Speaker  Yes | 0.54              | 63.81       | 0.80
I Native Speaker  No  | 0.51              | 81.29       | 0.78
R Native Speaker  Yes | 0.56              | 56.60       | 0.86
R Native Speaker  No  | 0.47              | 82.90       | 0.66
Background Noise  Yes | 0.52              | 71.00       | 0.86
Background Noise  No  | 0.61              | 40.71       | 0.89
Far Field Effects Yes | 0.57              | 64.00       | 0.89
Far Field Effects No  | 0.61              | 40.71       | 0.89
❯ Factors affecting the quality of the audio recording do not significantly affect the similarity measures:
- Marginally significant effect of R Native Speaker on GST (G = 4.08, p = 0.05)
Findings: Interviewer Verbatim and Similarity Measures
❯ Two of the scripted question and answer sequences were falsified by the interviewer (i.e. the true values are known).
❯ Similarity measures are higher when the interviewer reads verbatim, and lower when the interviewer changes the question wording.
Westat @ AAPOR 2019
Interviewer Verbatim | WordNGram Jaccard   | Levenshtein        | GST
Verbatim             | 0.60                | 50.77              | 0.85
Minor Wording Change | 0.50                | 67.33              | 0.63
Major Wording Change | 0.31                | 128.50             | 0.74
Test Statistic       | G = 6.22, p = 0.005 | G = 3.68, p = 0.04 | G = 2.51, p = 0.10
Findings: Interviewer Falsification and Number of Speakers
❯ The speech recognition approach is able to identify interviewer falsification using the number of speakers detected in the audio recording in most cases (χ²(2) = 10.27, p = 0.006).
❯ Factors affecting the quality of the audio recording seem to impact the number of speakers detected.
Number of Speakers Detected | Falsified (1 speaker expected) | Valid (2 speakers expected)
1                           | 3                              | 3
2                           | 1                              | 26
3                           | 0                              | 1
- Falsified interview detected with 2 speakers: the interviewer intentionally changed her vocal attributes; background noise.
- Valid interviews detected with 1 or 3 speakers: background noise; R non-native speakers.
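The reported test statistic can be reproduced from the speaker-count table with a stdlib chi-square-of-independence computation. The zero cell (no falsified interview detected with 3 speakers) is inferred from the N=4 falsified / N=30 valid totals; a library such as scipy would also return the p-value directly.

```python
# Chi-square test of independence on the speaker-count table.
# Rows: number of speakers detected (1, 2, 3); columns: falsified vs. valid.

observed = [
    [3, 3],    # 1 speaker detected
    [1, 26],   # 2 speakers detected
    [0, 1],    # 3 speakers detected
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(3) for j in range(2)
)
df = (3 - 1) * (2 - 1)
print(round(chi2, 2), df)   # matches the reported chi-square(2) = 10.27
```

Note that with cell counts this small the chi-square approximation is rough, so the p-value should be read as indicative rather than exact.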
Conclusion and Discussion
❯ The CARI speech recognition approach works robustly:
- Factors affecting the quality of the audio recording do not affect the similarity measures.
- Similarity between the transcript and the question wording is higher when the interviewer reads verbatim.
- The approach detects the number of speakers correctly in most cases.
❯ Future research:
- Explore this approach with field interviewing data.
- Extend to other measures of interviewer performance, e.g. did the interviewer maintain the question meaning, did the respondent provide adequate responses.
Thank you!
Any Questions? HanyuSun@Westat.com
Reference
❯ Hicks, W., Edwards, B., Tourangeau, K., McBride, B., Harris-Kojetin, L., and Moss, A. (2010). “Using CARI tools to understand measurement error”, Public Opinion Quarterly, 74(5), 985-1003.
❯ Kern, C. (2018). “Data-driven Prediction of Panel Attrition”, Paper presented at the 2018 Conference of the American Association for Public Opinion Research, Denver, CO, U.S.A.
❯ Rivero, G., Tourangeau, R., Edwards, B., and Cook, T. (2018). “Using Machine Learning Methods to Improve Responsive Designs in Face-to-Face Surveys”, Paper presented at the 2018 Conference of the American Association for Public Opinion Research, Denver, CO, U.S.A.