DolphinAttack: Inaudible Voice Commands


slide-1
SLIDE 1

DolphinAttack: Inaudible Voice Commands

Guoming Zhang, Chen Yan, Xiaoyu Ji, Tianchen Zhang, Taimin Zhang, Wenyuan Xu Zhejiang University

Presenter: Huichen Li

This paper won the CCS 2017 Best Paper award

slide-2
SLIDE 2

Speech Recognition Systems

  • Apple Siri
  • Google Now
  • Amazon Alexa
  • Huawei HiVoice

slide-3
SLIDE 3

Obfuscated Voice Commands

Hidden Voice Commands

slide-4
SLIDE 4

Threat Model

  • Inaudible (ultrasound, f > 20 kHz).
  • No owner interaction.
  • Whitebox.
  • No (physical) target device access.
  • Attacker has the required equipment (e.g. speakers for transmitting ultrasound near target devices).

slide-6
SLIDE 6

Voice Controllable System

Q: Which parts of the VCS are most vulnerable? (No known answer)

slide-7
SLIDE 7

Voice Controllable System

slide-8
SLIDE 8

Voice Controllable System

ambient voices: recorded -> amplified -> filtered -> digitized

slide-9
SLIDE 9

Voice Controllable System

  • remove frequencies that are beyond the audible sound range
  • discard signal segments that contain sounds too weak to be identified
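
The second pre-processing step above can be sketched as frame-wise energy gating. This is only an illustration under stated assumptions: the function name `drop_weak_frames`, the frame length, and the threshold are hypothetical and not taken from any particular VCS, and the low-pass filtering step is not shown.

```python
import math

def frame_rms(frame):
    """Root-mean-square amplitude of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def drop_weak_frames(samples, frame_len=160, threshold=0.01):
    """Keep only whole frames whose RMS is at least `threshold`.

    `frame_len` and `threshold` are hypothetical values chosen for the demo.
    Trailing samples that do not fill a whole frame are discarded.
    """
    kept = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if frame_rms(frame) >= threshold:
            kept.extend(frame)
    return kept

# Toy input: one near-silent frame followed by one frame of a 440 Hz tone at 16 kHz.
fs = 16000
quiet = [0.001] * 160
loud = [0.5 * math.sin(2 * math.pi * 440 * n / fs) for n in range(160)]
result = drop_weak_frames(quiet + loud)
print(len(result))
```

In this toy input the near-silent frame falls below the threshold and is dropped, while the tone frame is kept unchanged.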
slide-10
SLIDE 10

Voice Controllable System

slide-11
SLIDE 11

Voice Controllable System

  • say pre-defined wake words
  • press a special key

Performed locally e.g. Siri

slide-12
SLIDE 12

Voice Controllable System

Via a cloud service

signals sent to servers -> extract features (e.g. Mel-frequency cepstral coefficients, MFCC) -> recognize commands (e.g. machine learning)

slide-13
SLIDE 13

Voice Controllable System

launch the corresponding application or execute an operation

slide-14
SLIDE 14

Voice Controllable System

Q: Which parts of the VCS are most vulnerable? (No known answer) Take a guess!

slide-15
SLIDE 15

Focus of Attack

Inaudible!

slide-16
SLIDE 16

Doubts on Inaudible Voice Commands

  • How can inaudible sounds be audible to devices?
  • How can inaudible sounds be intelligible to SR systems?
  • How can inaudible sounds cause unnoticed security breach to VCS?

  • low-pass filters?
  • low audio sampling rates?
  • SR systems do not recognize signals that do not match human tonal features?
  • speaker-dependent wake words?

slide-17
SLIDE 17

Microphone

air pressure change -> capacitive change -> AC signal

Pros:

  • miniature package sizes
  • low power consumption

slide-18
SLIDE 18

Nonlinearity of Microphone

in ultrasound bands f > 20kHz

m(t): target voice signal

Fourier transform -> low-pass filter (LPF)

slide-19
SLIDE 19

s1(t) = cos(2π f1 t) at frequency f1 = 38 kHz
s2(t) = cos(2π f2 t) at frequency f2 = 40 kHz
s_hi(t) = s1(t) + s2(t)

Inaudible Voice Commands: The Long-Range Attack and Defense
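
The two-tone signal above can be pushed through a toy nonlinearity to see why an audible component appears. The sketch below assumes a quadratic microphone model, s_out = A*s + B*s^2; the gains `A` and `B`, the simulation sample rate, and the helper `tone_amplitude` are hypothetical choices for illustration, not values from the slides.

```python
import math

FS = 192_000          # simulation sample rate in Hz (assumption)
N = 1920              # 10 ms of samples, so every tone below falls on an exact DFT bin
A, B = 1.0, 0.1       # hypothetical linear and quadratic gains of the microphone

def tone_amplitude(signal, freq_hz):
    """Amplitude of the sinusoidal component at freq_hz (single-bin DFT)."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * n / FS) for n, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * n / FS) for n, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

f1, f2 = 38_000, 40_000
s_hi = [math.cos(2 * math.pi * f1 * n / FS) + math.cos(2 * math.pi * f2 * n / FS)
        for n in range(N)]

# Assumed nonlinear response: linear term plus a squared term.
s_out = [A * s + B * s * s for s in s_hi]

amp_in = tone_amplitude(s_hi, f2 - f1)
amp_out = tone_amplitude(s_out, f2 - f1)
print(f"2 kHz component before nonlinearity: {amp_in:.6f}")
print(f"2 kHz component after nonlinearity:  {amp_out:.6f}")
```

Under this model the squared term contains the product 2*s1*s2 = cos(2π(f2 - f1)t) + cos(2π(f1 + f2)t), so a 2 kHz difference tone of amplitude B appears even though the input contains only 38 kHz and 40 kHz.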

slide-20
SLIDE 20

Modulated Tone Traversing Voice Capture Device

Modulation Demodulation

slide-21
SLIDE 21

Nonlinearity Evaluation: Questions

  • Will the demodulation work well in practice?
  • Will the demodulated voice signal remain similar to the original one?

slide-22
SLIDE 22

Nonlinearity Evaluation: Experimental Setup

iPhone SE -> vector signal generator -> power amplifier -> ultrasonic speaker

baseband signal -> modulated onto a carrier -> amplified -> transmitted

slide-23
SLIDE 23

Nonlinearity Evaluation: Single Tone Results

  • original
  • output signal of MEMS microphone
  • output signal of ECM microphone

Demodulation successful!

20 kHz carrier, 2 kHz baseband

slide-24
SLIDE 24

Nonlinearity Evaluation: Voices Results

  • original: TTS-generated voice
  • recorded as the original voice is played
  • recorded as the modulated voice is played by an ultrasonic speaker

Mel-Cepstral Distortion (MCD) quantifies the distortion between two MFCCs. Two voices are considered acceptable to voice recognition systems if their MCD value is smaller than 8.

MCD between original and recorded: 3.1, 7.6

Similar!
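
As a rough illustration of how an MCD value is obtained, the sketch below uses the commonly cited definition, (10 / ln 10) * sqrt(2 * sum of squared coefficient differences), averaged over frames. Whether the evaluation on this slide used exactly this variant (number of coefficients, frame alignment) is an assumption, and the coefficient values are made up for the demo.

```python
import math

def mel_cepstral_distortion(mfcc_a, mfcc_b):
    """Mean MCD in dB over aligned frames; each frame is a list of coefficients.

    Assumes the 0th (energy) coefficient is already excluded and that both
    sequences are time-aligned with the same number of frames.
    """
    if len(mfcc_a) != len(mfcc_b) or not mfcc_a:
        raise ValueError("need two non-empty sequences with equal frame counts")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for frame_a, frame_b in zip(mfcc_a, mfcc_b):
        squared_diff = sum((a - b) ** 2 for a, b in zip(frame_a, frame_b))
        total += scale * math.sqrt(2.0 * squared_diff)
    return total / len(mfcc_a)

# Made-up coefficients for illustration only, not measured data.
original = [[1.0, 0.5, -0.2], [0.8, 0.4, -0.1]]
recorded = [[1.1, 0.4, -0.1], [0.9, 0.6, -0.3]]

mcd = mel_cepstral_distortion(original, recorded)
acceptable = mcd < 8  # threshold quoted on the slide
print(f"MCD = {mcd:.2f} dB, acceptable = {acceptable}")
```

Identical inputs give an MCD of 0, and larger values indicate that the two clips are further apart in the cepstral domain.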

slide-25
SLIDE 25

Attack Design

  • Generate voice commands
  • Modulate baseband signals
  • Launch attack with a portable transmitter
slide-26
SLIDE 26

Activation Voice Commands Generation: Brute Force

Siri is trained with Google TTS

slide-27
SLIDE 27

Activation Voice Commands Generation: Concatenative

slide-28
SLIDE 28

Amplitude Modulation (AM): Depth (index)

directly related to the utilization of the nonlinearity effect of microphones

slide-29
SLIDE 29

Analysis: Modulation Depth

As the modulation depth increases, the demodulated signals become stronger, and the signal-to-noise ratio and the attack success rate get higher.
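
The link between modulation depth and demodulated signal strength can be reproduced with a toy model. The sketch assumes a quadratic microphone nonlinearity, s_out = s + B*s^2, and a single-tone baseband; the carrier and baseband frequencies, the gain `B`, and the helper names are hypothetical.

```python
import math

FS = 192_000      # simulation sample rate in Hz (assumption)
N = 1920          # 10 ms window; all tones fall on exact DFT bins
F_C = 30_000      # carrier frequency in Hz (hypothetical)
F_B = 2_000       # baseband tone in Hz (hypothetical)
B = 0.1           # hypothetical quadratic gain of the microphone

def tone_amplitude(signal, freq_hz):
    """Amplitude of the sinusoidal component at freq_hz (single-bin DFT)."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * n / FS) for n, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * n / FS) for n, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

def demodulated_amplitude(depth):
    """Baseband amplitude after the assumed quadratic nonlinearity for a given AM depth."""
    s_in = [(1 + depth * math.cos(2 * math.pi * F_B * n / FS))
            * math.cos(2 * math.pi * F_C * n / FS) for n in range(N)]
    s_out = [s + B * s * s for s in s_in]
    return tone_amplitude(s_out, F_B)

for depth in (0.25, 0.5, 1.0):
    print(f"depth={depth:.2f}  baseband amplitude={demodulated_amplitude(depth):.4f}")
```

In this model the recovered baseband amplitude equals B times the depth, so doubling the depth doubles the demodulated signal.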

slide-34
SLIDE 34

Amplitude Modulation (AM): Carrier Frequency f

Factors for choosing f:

  • frequency range of ultrasounds
  • bandwidth of the baseband signal
  • cut-off frequency of the low pass filter
  • frequency response of the microphone on the VCS
  • frequency response of the attacking speaker

Inaudibility: lowest frequency > 20 kHz

w: frequency range of voice command

f - w > 20 kHz

Otherwise, the carrier will not be filtered.
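
The inaudibility condition f - w > 20 kHz can be written as a small check. The function names and the example numbers below are hypothetical; only the 20 kHz limit and the inequality come from the slide.

```python
AUDIBLE_LIMIT_HZ = 20_000  # upper bound of human hearing used on the slide

def lowest_sideband_hz(carrier_hz, bandwidth_hz):
    """Lowest frequency in an AM signal: carrier minus baseband bandwidth."""
    return carrier_hz - bandwidth_hz

def is_inaudible(carrier_hz, bandwidth_hz):
    """True when f - w > 20 kHz, i.e. the whole lower sideband is ultrasonic."""
    return lowest_sideband_hz(carrier_hz, bandwidth_hz) > AUDIBLE_LIMIT_HZ

# Hypothetical numbers: a 6 kHz-wide voice command on two candidate carriers.
print(is_inaudible(25_000, 6_000))  # lower sideband reaches down to 19 kHz
print(is_inaudible(30_000, 6_000))  # lower sideband starts at 24 kHz
```

The check also shows why a voice with a small bandwidth w is attractive: it lowers the minimum usable carrier frequency.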

slide-35
SLIDE 35

Amplitude Modulation (AM): Carrier Frequency f

slide-36
SLIDE 36

Analysis: Carrier Wave Frequency

400 Hz baseband and higher order harmonics

slide-37
SLIDE 37

Analysis: Carrier Wave Frequency

400 Hz baseband and higher order harmonics

amplitude of the harmonics larger than baseband

Unacceptable to SR systems!

slide-38
SLIDE 38

Amplitude Modulation (AM): Voice Selection

  • Various voices map to various baseband frequency ranges.
  • A voice with a small bandwidth shall be selected to create baseband voice signals.

f - w > 20 kHz

slide-39
SLIDE 39

Voice Commands Transmitter

  • Powerful transmitter: driven by a dedicated signal generator
  • Portable transmitter: driven by a smartphone

slide-40
SLIDE 40

Experimental Goal

  • Examining the feasibility of attacks.
  • Quantifying the parameters in tuning a successful attack.

  • Measuring the attack performance.
slide-41
SLIDE 41

Feasibility Experiments: Device/System & Commands

slide-42
SLIDE 42

Impact: Languages

slide-43
SLIDE 43

Impact: Background Noise

slide-44
SLIDE 44

Impact: Distance

slide-45
SLIDE 45

Impact: Sound Pressure Levels

slide-46
SLIDE 46

Results

Almost all the systems can be attacked!

slide-47
SLIDE 47

Defense: Hardware-based

  • Microphone Enhancement.
      • Suppress any acoustic signals whose frequencies are in the ultrasound range.
  • Inaudible Voice Command Cancellation.
      • Demodulate the signals to obtain the baseband and subtract it.

slide-48
SLIDE 48

Defense: Software-based

  • original
  • recorded
  • recovered

support vector machine (SVM)

  • 10 training samples (5 positive, 5 negative)
  • 14 testing samples

100% true positive and true negative rates

Q: rigorous?

slide-49
SLIDE 49

Remote attack?

slide-50
SLIDE 50

Related Work

  • Embed commands into songs -> distribute through the internet
  • Use multiple speakers to mitigate leakage
slide-51
SLIDE 51

Thanks!