SLIDE 1 Hidden Voice Commands
Nicholas Carlini*, Pratyush Mishra*, Tavish Vaidya**, Yuankai Zhang**, Micah Sherr**, Clay Shields**, David Wagner*, Wenchao Zhou** * University of California, Berkeley ** Georgetown University
SLIDE 2
SLIDE 3
Voice channel opens up new
possibilities for attack
SLIDE 4
Today: "Okay google, text [premium SMS number]"
SLIDE 5
In the future? "Okay google, pay John $100"
SLIDE 6
SLIDE 7
We make voice commands stealthy.
SLIDE 8
We produce audio which is noise to humans, but speech to devices.
SLIDE 9 This is an instance of attacks
SLIDE 10
Background
SLIDE 11
Background
Machine Learning Algorithm
Text
SLIDE 12
Background
ML Algorithm Feature Extraction
Text
SLIDE 13
Feature Extraction
SLIDE 14
Feature Extraction
SLIDE 15
Feature Extraction
SLIDE 16 Feature Extraction
[x0] [x1] [x2]
MFCC MFCC MFCC
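The per-frame feature extraction pictured above can be sketched in a few lines. This is a minimal, standard-library illustration of a textbook MFCC pipeline (frame, window, power spectrum, mel filterbank, log, DCT). The frame size, hop, filter count, and coefficient count are assumptions chosen to keep the toy small; they are not the settings of any particular recognizer.

```python
import cmath
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def split_frames(signal, size, hop):
    # Overlapping frames x0, x1, x2, ... as in the slide.
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def power_spectrum(frame):
    n = len(frame)
    # Hamming window, then a naive DFT (fine for short toy frames).
    win = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
           for i, s in enumerate(frame)]
    return [abs(sum(w * cmath.exp(-2j * math.pi * k * i / n)
                    for i, w in enumerate(win))) ** 2 / n
            for k in range(n // 2 + 1)]

def mel_filterbank(num_filters, n_fft, sample_rate):
    # Triangular filters spaced evenly on the mel scale.
    top = hz_to_mel(sample_rate / 2)
    hz_points = [mel_to_hz(top * i / (num_filters + 1))
                 for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * p / sample_rate)) for p in hz_points]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank

def mfcc(signal, sample_rate, frame_size=64, hop=32,
         num_filters=10, num_coeffs=5):
    bank = mel_filterbank(num_filters, frame_size, sample_rate)
    features = []
    for frame in split_frames(signal, frame_size, hop):
        power = power_spectrum(frame)
        # Log filterbank energies; the floor avoids log(0) for empty filters.
        log_e = [math.log(max(sum(f * p for f, p in zip(row, power)), 1e-10))
                 for row in bank]
        # Unnormalized DCT-II, keeping only the first few coefficients.
        features.append([sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                             for j, e in enumerate(log_e))
                         for c in range(num_coeffs)])
    return features

# Toy input: 128 samples of a 440 Hz tone at 8 kHz gives three frames.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(128)]
features = mfcc(signal, sample_rate)
```

Each frame is reduced from 64 samples to 5 numbers, which is why the mapping discards information and many different waveforms can share the same features.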
SLIDE 17
ML Algorithm Feature Extraction
Text
SLIDE 18
SLIDE 19
First Attack: White-Box
Assume complete system knowledge
(model, parameters, etc)
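To see why complete knowledge of the model matters, consider a toy sketch: when the attacker knows the parameters, they can compute gradients of the model's output with respect to the input and steer the input toward a chosen output. The example below uses a small linear softmax classifier as a stand-in. It is not a speech recognizer, and the weights, step size, and iteration count are all illustrative assumptions.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(weights, x):
    # Known model: class probabilities = softmax(W x).
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])

def white_box_attack(weights, x, target, step=0.5, iters=200):
    """Gradient descent on the input to maximize the target class probability."""
    x = list(x)
    for _ in range(iters):
        probs = predict(weights, x)
        # Gradient of -log p[target] w.r.t. x is W^T (p - onehot(target)).
        grad = [sum((probs[c] - (1.0 if c == target else 0.0)) * weights[c][i]
                    for c in range(len(weights)))
                for i in range(len(x))]
        x = [xi - step * g for xi, g in zip(x, grad)]
    return x

def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical three-class model with fully known weights.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
start = [2.0, 0.0]
adversarial = white_box_attack(weights, start, target=2)
start_class = argmax(predict(weights, start))
final_class = argmax(predict(weights, adversarial))
```

The gradient computation is only possible because `weights` is known, which is exactly the assumption this attack makes.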
SLIDE 20
Recognition
ML Algorithm Feature Extraction
Text
SLIDE 21
Attack
ML Algorithm Feature Extraction
Text
SLIDE 22
Attack
ML Algorithm Feature Extraction
Text
SLIDE 23
Attack
ML Algorithm Feature Extraction
Text
SLIDE 24
Inverting Feature Extraction
[x0] [x1] [x2] MFCC⁻¹ MFCC⁻¹ MFCC⁻¹
SLIDE 25
Inverting Feature Extraction
[x0] [x1] [x2] MFCC⁻¹ MFCC⁻¹ MFCC⁻¹
SLIDE 26
Inverting Feature Extraction
[x0] MFCC⁻¹
SLIDE 27
Inverting Feature Extraction
[x0] [x1] MFCC⁻¹ MFCC⁻¹
SLIDE 28
Inverting Feature Extraction
[x0] [x1] MFCC⁻¹ MFCC⁻¹
SLIDE 29
Inverting Feature Extraction
[x0] [x1] [x2] MFCC⁻¹ MFCC⁻¹ MFCC⁻¹
SLIDE 30
Inverting Feature Extraction
[x0] [x1] [x2] MFCC⁻¹ MFCC⁻¹ MFCC⁻¹
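A toy sketch of why inverting features can produce audio that differs from the original: feature extraction is many-to-one, so "inverting" means finding some waveform with the right features, not the original one. The example below substitutes a much simpler feature (the magnitude spectrum of a single frame) for MFCC, and rebuilds a waveform with random phases. This is an assumption made for brevity; it ignores the mel filterbank, the DCT, and the overlap between neighboring frames.

```python
import cmath
import math
import random

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def magnitude_features(x):
    # Stand-in feature: magnitudes only, phase is thrown away.
    return [abs(c) for c in dft(x)]

def invert_features(mags, rng):
    """Build a real waveform (even length) whose magnitude spectrum is mags."""
    n = len(mags)
    spec = [0j] * n
    spec[0] = complex(mags[0], 0.0)
    spec[n // 2] = complex(mags[n // 2], 0.0)
    for k in range(1, n // 2):
        # Any phase works; conjugate symmetry keeps the waveform real.
        spec[k] = cmath.rect(mags[k], rng.uniform(-math.pi, math.pi))
        spec[n - k] = spec[k].conjugate()
    return idft_real(spec)

n = 32
original = [math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.sin(2 * math.pi * 7 * t / n)
            for t in range(n)]
target = magnitude_features(original)
forged = invert_features(target, random.Random(0))

# The forged frame matches the features but not the waveform.
feature_error = max(abs(a - b) for a, b in zip(target, magnitude_features(forged)))
waveform_gap = max(abs(a - b) for a, b in zip(original, forged))
```

Here `feature_error` is at the level of floating-point rounding while `waveform_gap` is large: the two signals look identical to the feature extractor yet are different sounds.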
SLIDE 31
Actually not that easy
SLIDE 32 Playing attacks over-the-air
- 1. Create a model of the physical channel
- 2. Use the model to predict the effect of over-the-air playback
- 3. Validate the model by playing potential obfuscated commands during generation
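The three steps above can be sketched with a very simple channel model. The version below assumes the speaker-air-microphone path behaves like a linear filter (convolution with an impulse response) plus additive Gaussian noise. That modeling choice, and the names `apply_channel` and `prediction_error`, are illustrative assumptions and not the model used in the talk.

```python
import random

def apply_channel(signal, impulse_response, noise_std=0.0, rng=None):
    """Predict the received audio: convolve with the channel, then add noise."""
    rng = rng or random.Random(0)
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    if noise_std > 0.0:
        out = [v + rng.gauss(0.0, noise_std) for v in out]
    return out

def prediction_error(predicted, recorded):
    """Mean squared error between the model's prediction and a real recording."""
    return sum((p - r) ** 2 for p, r in zip(predicted, recorded)) / len(predicted)

# Step 1: a hypothetical channel with a direct path and one weaker echo.
echo_channel = [1.0, 0.0, 0.5]

# Step 2: predict what a candidate command would sound like after playback.
candidate = [1.0, 0.0, 0.0, 0.0]
predicted = apply_channel(candidate, echo_channel)
# predicted == [1.0, 0.0, 0.5, 0.0, 0.0, 0.0]

# Step 3: compare against what the microphone actually recorded
# (here a made-up recording, used only to exercise the function).
recorded = [0.9, 0.0, 0.6, 0.0, 0.0, 0.0]
error = prediction_error(predicted, recorded)
```

A small `error` on real recordings would suggest the model is good enough to use during generation; a large one means the channel model needs refitting.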
SLIDE 33
Demo
SLIDE 34
Demo
SLIDE 35
Okay Google, take a picture
SLIDE 36
Demo
SLIDE 37
Okay Google, text 12345
SLIDE 38
Demo
SLIDE 39
Okay Google, browse to evil.com
SLIDE 40
Not Over-The-Air Demo
SLIDE 41
Okay Google, browse to evil.com
SLIDE 42
SLIDE 43
Limitations
Works only with no background noise, in an echo-free room. Assumes complete knowledge of the model.
SLIDE 44
SLIDE 45
Can we make this attack practical?
Can we remove the white-box assumption?
SLIDE 46
Yes.
... but at the expense of attack quality.
SLIDE 47
SLIDE 48
Audio Obfuscator Speech Recognition
Text
Black-Box Attack
SLIDE 49
Speech Recognition
Text
Black-Box Attack
MFCC MFCC⁻¹
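The black-box pipeline in the diagram (obfuscate the audio, then ask the recognizer what it hears) can be sketched as a simple search loop that treats the recognizer as an opaque function. Everything concrete below is a stand-in: `toy_obfuscate` and the toy recognizer are hypothetical placeholders for an MFCC / MFCC⁻¹ round trip and a real speech recognition service, and the list of settings is made up.

```python
def search_obfuscated(audio, obfuscate, recognize, target_text, settings):
    """Keep every obfuscation setting whose output is still recognized."""
    accepted = []
    for setting in settings:
        candidate = obfuscate(audio, setting)
        # Black-box: only the recognizer's text output is observed.
        if recognize(candidate) == target_text:
            accepted.append((setting, candidate))
    return accepted

def toy_obfuscate(features, keep):
    # Stand-in for MFCC followed by inverse MFCC: keep the first
    # `keep` coefficients and discard the rest.
    return [v if i < keep else 0.0 for i, v in enumerate(features)]

def make_toy_recognizer(reference, text):
    # Stand-in recognizer: "hears" the command only if the first
    # three coefficients survive unchanged.
    def recognize(candidate):
        return text if candidate[:3] == reference[:3] else ""
    return recognize

command = [0.9, 0.7, 0.5, 0.3, 0.2, 0.1, 0.05, 0.01]
recognize = make_toy_recognizer(command, "okay google")
accepted = search_obfuscated(command, toy_obfuscate, recognize,
                             "okay google", settings=[8, 5, 3, 2, 1])

# The most aggressive setting that the recognizer still accepts.
most_obfuscated = min(setting for setting, _ in accepted)
```

In a real attack a human listening test would also be needed, since the goal is audio that the device accepts and people do not understand; this loop only covers the device half.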
SLIDE 50
Evaluation
SLIDE 51
Demo
SLIDE 52
SLIDE 53
SLIDE 54
SLIDE 55
SLIDE 56 White-Box
Attack on open system
Commands heavily obfuscated
Works when played over-the-air
Doesn't tolerate background noise
Black-Box
Practical real-world attack
Somewhat possible to recognize
Works when played over-the-air
Background noise and echo okay
SLIDE 57
SLIDE 58
Defenses?
Notify the user that an action was taken.
Challenge the user to perform an action.
Detect and prevent the malicious commands.
SLIDE 59
Detect and Prevent
We successfully trained a simple machine learning classifier to learn the difference between attack commands and actual commands.
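A minimal sketch of what such a detector could look like, assuming logistic regression over two numeric audio features. The slide does not name the classifier or the features, so both are assumptions here, and the training data below is synthetic and exists only to exercise the code.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(samples, labels, lr=0.1, epochs=200):
    """Plain stochastic gradient descent on the logistic loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def is_attack(w, b, x):
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) >= 0.5

# Synthetic, made-up feature vectors: label 0 = actual command, 1 = attack.
rng = random.Random(0)
normal = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(20)]
attack = [[rng.gauss(3.0, 0.5), rng.gauss(3.0, 0.5)] for _ in range(20)]
samples = normal + attack
labels = [0] * len(normal) + [1] * len(attack)

w, b = train_logistic(samples, labels)
correct = sum(is_attack(w, b, x) == bool(y) for x, y in zip(samples, labels))
accuracy = correct / len(samples)
```

On real audio the hard part is choosing features that separate obfuscated commands from ordinary speech; the classifier itself can stay this simple.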
SLIDE 60
SLIDE 61 Conclusion
Voice: new paradigm for human-device interaction.
This brings many new risks.
Our hidden voice commands are practical.
The impact of these attacks will increase.
Future work is needed to construct defenses.
http://hiddenvoicecommands.com/
SLIDE 62
SLIDE 63
SLIDE 64