Metamorph: Injecting Inaudible Commands into Over-the-air Voice Controlled Systems (PowerPoint presentation)

SLIDE 1

Metamorph: Injecting Inaudible Commands into Over-the-air Voice Controlled Systems

Tao Chen¹, Longfei Shangguan², Zhenjiang Li¹, Kyle Jamieson³

¹City University of Hong Kong, ²Microsoft, ³Princeton University

SLIDES 2-4

Voice Assistants in Smart Home

111.8 million people in the U.S. use voice assistants and related services!

Source: https://www.emarketer.com/content/us-voice-assistant-users-2019

SLIDE 5

Are they safe enough?

SLIDE 6

How to attack the voice assistant?

Target: speech recognition (SR) models, which are neural networks.

SLIDES 7-10

How to attack the voice assistant? Audio Adversarial Attack

Audio clip: I, with transcript T = SR(I) = “This is for you”
Perturbation: δ
Adversarial example: I + δ, with transcript T′ = SR(I + δ) = “Open the door”

Find the quietest perturbation that turns the transcript into the target:

$$\min_{\delta}\ \mathrm{dB}_I(\delta) \quad \text{such that} \quad SR(I) = T,\quad SR(I+\delta) = T'$$

where dB_I(δ) = dB(δ) − dB(I) is the loudness of δ relative to I, and dB(x) = max_i 20·log10|x_i|.

Nicholas Carlini et al. Audio Adversarial Examples, Deep Learning and Security Workshop, 2018
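The optimization above can be illustrated with a toy sketch in NumPy. It implements the relative loudness metric dB_I(δ) as defined by Carlini and Wagner, and a targeted attack that bounds the peak of δ (which caps dB_I(δ)) while pushing the recognizer toward T′. The recognizer here is a hypothetical linear scorer `toy_sr` standing in for a real SR model such as DeepSpeech; `targeted_attack`, `tau`, `lr`, and `margin` are illustrative assumptions, not the talk's implementation.

```python
import numpy as np


def db(x: np.ndarray) -> float:
    """Loudness in decibels: 20 * log10 of the peak absolute sample value."""
    return 20.0 * np.log10(np.max(np.abs(x)))


def db_relative(delta: np.ndarray, clip: np.ndarray) -> float:
    """dB_I(delta) = dB(delta) - dB(I); more negative means a quieter perturbation."""
    return db(delta) - db(clip)


def toy_sr(x: np.ndarray, W: np.ndarray) -> int:
    """Hypothetical stand-in for SR: index of the highest-scoring 'transcript' class."""
    return int(np.argmax(W @ x))


def targeted_attack(I, W, target, tau, steps=500, lr=0.01, margin=0.1):
    """Projected gradient descent: make toy_sr(I + delta) == target with |delta|_inf <= tau.

    Clipping the peak of delta to tau caps dB_I(delta), in the spirit of Carlini and Wagner.
    """
    delta = np.zeros_like(I, dtype=float)
    for _ in range(steps):
        scores = W @ (I + delta)
        others = np.delete(np.arange(len(scores)), target)
        rival = others[np.argmax(scores[others])]
        if scores[target] - scores[rival] > margin:
            break  # the target transcript wins by the requested margin
        grad = W[rival] - W[target]  # gradient of (rival score - target score) w.r.t. delta
        delta = np.clip(delta - lr * grad, -tau, tau)
    return delta


if __name__ == "__main__":
    I = np.array([0.5, 0.4, 0.0, 0.0])  # toy "audio clip"
    W = np.eye(2, 4)                     # class 0 plays the role of T, class 1 of T'
    delta = targeted_attack(I, W, target=1, tau=0.2)
    print(toy_sr(I, W), toy_sr(I + delta, W), round(db_relative(delta, I), 1))
```

A real attack replaces the linear scorer with the SR network and the margin loss with a sequence loss such as CTC; the peak clip is kept because it directly bounds the dB_I term.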

SLIDES 11-17

Is it a real threat? Yes!

Adversarial example: but it fails when played over the air!

SLIDES 18-19

Challenge

Channel effect: attenuation, multi-path, hardware heterogeneity

SR(I + δ) vs. SR(H(I + δ))

The channel H is unknown in advance!
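To make the gap between SR(I + δ) and SR(H(I + δ)) concrete, here is a minimal NumPy sketch that models H as a linear time-invariant channel, i.e. convolution with a channel impulse response (CIR) made of a line-of-sight tap plus delayed reflections. The LTI model, the helper names `make_cir` and `over_the_air`, and the tap delays and gains of the two rooms are illustrative assumptions, not measurements from the talk.

```python
import numpy as np


def make_cir(delays, gains, length):
    """Discrete CIR: a tap of size gains[k] at sample offset delays[k] (LOS path plus reflections)."""
    h = np.zeros(length)
    for d, g in zip(delays, gains):
        h[d] += g
    return h


def over_the_air(x, h):
    """H(x) under an LTI assumption: linear convolution with the CIR, cut to len(x)."""
    return np.convolve(x, h)[: len(x)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adv = rng.normal(scale=0.1, size=64)                 # stand-in for I + delta
    room_a = make_cir([0, 5, 12], [0.8, 0.3, 0.1], 16)   # hypothetical room A
    room_b = make_cir([0, 3, 9], [0.7, 0.4, 0.2], 16)    # hypothetical room B
    # The same adversarial signal reaches the microphone differently in each room,
    # so an example tuned for SR(I + delta) need not survive SR(H(I + delta)).
    print(np.abs(over_the_air(adv, room_a) - over_the_air(adv, room_b)).max())
```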

SLIDE 20

Understand Over-the-air Attack

Channel effect: hardware heterogeneity, attenuation, multi-path

SLIDES 21-24

Attenuation

With attenuation normalization, the transcript stays “Open the door”.

Attenuation has no frequency selectivity, so it does not matter at all!
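Why frequency-flat attenuation is harmless can be checked numerically. This sketch assumes attenuation is a single scalar gain and that amplitude is undone by peak normalization (an assumption; the slides do not say which normalization is used). The function names are hypothetical.

```python
import numpy as np


def attenuate(x, gain):
    """Frequency-flat attenuation: every frequency component is scaled by the same gain."""
    return gain * x


def peak_normalize(x):
    """Rescale so the loudest sample has magnitude 1."""
    return x / np.max(np.abs(x))


def spectral_gain(x, y):
    """Per-frequency magnitude ratio |Y(f)| / |X(f)|."""
    return np.abs(np.fft.rfft(y)) / np.abs(np.fft.rfft(x))


if __name__ == "__main__":
    x = np.random.default_rng(0).normal(size=256)
    y = attenuate(x, 0.1)
    g = spectral_gain(x, y)
    print(g.min(), g.max())                                     # the same gain at every frequency
    print(np.allclose(peak_normalize(y), peak_normalize(x)))    # normalization removes it
```

Because the ratio is identical at every frequency, the spectral shape the recognizer relies on is unchanged; only overall loudness differs, and normalization removes that.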

SLIDE 25

Understand Over-the-air Attack

Channel effect: hardware heterogeneity, attenuation, multi-path; noise

SLIDES 26-30

Hardware Heterogeneity

Anechoic chamber testing: transmitter and receiver in a room lined with anechoic materials

The hardware effect is not strong; it is an inherent feature of the device and can be compensated!

It is static, predictable, and compensable! [Figure: Character Success Rate (CSR)]
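Since the hardware response is static, one way to compensate it (an assumption; the slides only say it is compensable) is to measure the device's frequency response R(f), for example in an anechoic chamber, and pre-filter the signal with a regularized inverse so the device's own filtering cancels it. The sketch uses an idealized circular (FFT-domain) filter model; the response `R`, the regularizer `eps`, and all function names are hypothetical.

```python
import numpy as np


def apply_response(x, response):
    """Filter x by a device frequency response sampled on the rfft grid (circular model)."""
    return np.fft.irfft(np.fft.rfft(x) * response, n=len(x))


def precompensate(x, response, eps=1e-3):
    """Regularized inverse filter conj(R) / (|R|^2 + eps); eps limits the boost where |R| is small."""
    inverse = np.conj(response) / (np.abs(response) ** 2 + eps)
    return np.fft.irfft(np.fft.rfft(x) * inverse, n=len(x))


def rel_error(y, ref):
    """Relative L2 error of y with respect to ref."""
    return np.linalg.norm(y - ref) / np.linalg.norm(ref)


if __name__ == "__main__":
    n = 1024
    x = np.random.default_rng(1).normal(size=n)
    k = np.arange(n // 2 + 1)
    R = 1.0 / (1.0 + (k / k[-1]) ** 2)  # hypothetical low-pass speaker/microphone response
    raw = apply_response(x, R)
    fixed = apply_response(precompensate(x, R), R)
    print(f"error without compensation: {rel_error(raw, x):.3f}, with: {rel_error(fixed, x):.4f}")
```

The regularized inverse is used instead of a plain 1/R(f) so that frequencies the device barely reproduces are not amplified without bound.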

SLIDE 31

Understand Over-the-air Attack

Channel effect: hardware heterogeneity, attenuation, multi-path

SLIDES 32-36

Multi-path

[Figure: measurement setup in an office, a corridor, and a home: HIVI M200MK3 speaker (Tx) and Samsung S7 phone (Rx) over the air, distance marked with a ruler; Tx-to-Rx distance from 0.5 m to 8 m]

Multi-path: near range

[Figure: I/Q plot of the LOS path, reflection 1, reflection 2, and the superimposed signal]

At near range, multi-path is also not strong, and the channels are similar!

SLIDES 37-41

Multi-path: long range

[Figure: same setup (office, corridor, home; Tx to Rx from 0.5 m to 8 m) with an I/Q plot of the LOS path, reflection 1, reflection 2, and the superimposed signal]

At long range, reflections are stronger and unpredictable!

Multi-path is highly unpredictable! [Figure: Character Success Rate (CSR)]
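The near-range versus long-range contrast can be reproduced with a phasor model: the superimposed signal is the sum of the LOS phasor and the reflection phasors, whose real and imaginary parts are the I and Q components in the plots. This sketch assumes spherical spreading (amplitude proportional to gain/distance), a speed of sound of 343 m/s, and hypothetical path lengths and reflection gains; the peak-to-dip ratio of |H(f)| serves as a simple measure of frequency selectivity.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C


def channel_response(freqs_hz, path_lengths_m, path_gains):
    """Superimposed phasor H(f) = sum_k a_k * exp(-j 2 pi f tau_k).

    a_k = gain_k / d_k assumes spherical spreading; tau_k = d_k / c is the path delay.
    Real and imaginary parts of H(f) are the I and Q components.
    """
    d = np.asarray(path_lengths_m, dtype=float)
    amp = np.asarray(path_gains, dtype=float) / d
    tau = d / SPEED_OF_SOUND
    f = np.asarray(freqs_hz, dtype=float)[:, None]
    return (amp * np.exp(-2j * np.pi * f * tau)).sum(axis=1)


def peak_to_dip(freqs_hz, path_lengths_m, path_gains):
    """Max/min of |H(f)| over the band: 1.0 means flat, larger means more frequency-selective."""
    mag = np.abs(channel_response(freqs_hz, path_lengths_m, path_gains))
    return mag.max() / mag.min()


if __name__ == "__main__":
    freqs = np.arange(100.0, 8000.0, 0.5)
    gains = [1.0, 0.5, 0.5]  # LOS path, reflection 1, reflection 2 (hypothetical)
    near = peak_to_dip(freqs, [0.5, 3.0, 4.0], gains)   # LOS 0.5 m, reflections 3 m and 4 m
    far = peak_to_dip(freqs, [8.0, 9.0, 10.0], gains)   # LOS 8 m, reflections 9 m and 10 m
    print(f"near range: {near:.2f}, long range: {far:.2f}")
```

At near range the LOS phasor dominates, so reflections only wobble the sum slightly; at long range the LOS path has decayed to the same order as the reflections, so the sum swings between reinforcement and cancellation across frequency.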

SLIDES 42-45

Design Inspiration

The adversarial example I + δ (“Open the door”) reaches the recognizer as SR(H(I + δ)).

The channel H is unknown, but channels share similarity!

Idea: take channels H_i, i = 1, ..., M, from public acoustic CIR datasets and optimize δ against all of them:

$$\arg\min_{\delta}\ \alpha \cdot \mathrm{dB}_I(\delta) + \frac{1}{M}\sum_{i=1}^{M} \mathrm{Loss}\big(SR(H_i(I+\delta)),\ T'\big)$$
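The channel-averaged term (1/M)·Σ_i Loss(SR(H_i(I + δ)), T′) can be sketched by backpropagating through each CIR. Here each H_i is written as a convolution matrix so the gradient is simply Hᵢᵀ times the loss gradient. Assumptions: the recognizer is a hypothetical linear scorer with a hinge loss, the CIRs are toy values rather than entries from a public dataset, and the α·dB_I(δ) term is swapped for a peak clip |δ|∞ ≤ τ, as Carlini and Wagner do to bound loudness.

```python
import numpy as np


def conv_matrix(h, n):
    """n x n lower-triangular Toeplitz matrix with conv_matrix(h, n) @ x == np.convolve(x, h)[:n]."""
    H = np.zeros((n, n))
    for k, hk in enumerate(h[:n]):
        H += hk * np.eye(n, k=-k)
    return H


def margin_loss_and_grad(W, x, target, margin=0.1):
    """Hinge loss pushing a hypothetical linear recognizer toward `target`; returns (loss, dL/dx)."""
    scores = W @ x
    others = np.delete(np.arange(len(scores)), target)
    j = others[np.argmax(scores[others])]
    loss = max(0.0, margin + scores[j] - scores[target])
    grad = (W[j] - W[target]) if loss > 0 else np.zeros_like(x)
    return loss, grad


def channel_averaged_attack(I, W, target, cirs, tau, steps=300, lr=0.01):
    """Minimize (1/M) * sum_i Loss(SR(H_i(I + delta)), T') subject to |delta|_inf <= tau."""
    n = len(I)
    mats = [conv_matrix(h, n) for h in cirs]
    delta = np.zeros(n)
    for _ in range(steps):
        total_loss, total_grad = 0.0, np.zeros(n)
        for H in mats:
            loss, g = margin_loss_and_grad(W, H @ (I + delta), target)
            total_loss += loss / len(mats)
            total_grad += H.T @ g / len(mats)  # chain rule through y = H x
        if total_loss == 0.0:
            break  # the target wins in every sampled channel
        delta = np.clip(delta - lr * total_grad, -tau, tau)
    return delta


if __name__ == "__main__":
    n = 8
    I = np.zeros(n)
    I[0], I[1] = 0.5, 0.4
    W = np.zeros((2, n))
    W[0, 0], W[1, 1] = 1.0, 1.0
    cirs = [np.array([1.0]), np.array([0.8, 0.2]), np.array([0.6, 0.0, 0.3])]  # toy CIRs
    delta = channel_averaged_attack(I, W, 1, cirs, tau=0.3)
    print([int(np.argmax(W @ np.convolve(I + delta, h)[:n])) for h in cirs])
```

Averaging over several channels trades a little extra perturbation for robustness: δ must work for every sampled H_i rather than for the single identity channel.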

SLIDE 46

Design Inspiration

Optimizing δ against channels from public acoustic CIR datasets:

$$\arg\min_{\delta}\ \alpha \cdot \mathrm{dB}_I(\delta) + \frac{1}{M}\sum_{i=1}^{M} \mathrm{Loss}\big(SR(H_i(I+\delta)),\ T'\big)$$

[Figure: transcripts and Character Success Rate]

SLIDE 47

Design Inspiration

I + δ (“Open the door”) is recognized as SR(H(I + δ)), with H taken from public acoustic CIR datasets.

Domain (environment-specific) information dominates!

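SLIDE 48

Metamorph: Meta-Enha

Clean the domain information by subtracting a domain term L_d with weight β:

$$\arg\min_{\delta}\ \alpha \cdot \mathrm{dB}_I(\delta) + \frac{1}{M}\sum_{i=1}^{M} \mathrm{Loss}\big(SR(H_i(I+\delta)),\ T'\big) - \beta \cdot L_d$$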
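The slides do not define L_d. One plausible reading, stated here as an assumption, is the loss of a domain classifier that tries to tell which environment (CIR source) H_i came from. Because the term enters with a minus sign, minimizing the objective pushes L_d up, so the optimized δ carries less environment-specific information, similar in spirit to adversarial domain adaptation. The sketch below computes such a hypothetical L_d with a linear classifier `V` and assembles the objective; all names are illustrative.

```python
import numpy as np


def softmax_cross_entropy(logits, label):
    """Numerically stable cross-entropy of one example."""
    z = logits - np.max(logits)
    log_probs = z - np.log(np.sum(np.exp(z)))
    return -log_probs[label]


def domain_loss(features, domain_labels, V):
    """Hypothetical L_d: mean cross-entropy of a linear domain classifier V.

    features[i] is a feature vector of H_i(I + delta); domain_labels[i] says which
    environment or CIR dataset H_i came from.
    """
    return float(np.mean([softmax_cross_entropy(V @ f, d)
                          for f, d in zip(features, domain_labels)]))


def meta_enha_objective(db_rel, channel_losses, L_d, alpha, beta):
    """alpha * dB_I(delta) + (1/M) * sum_i Loss_i - beta * L_d, to be minimized over delta."""
    return alpha * db_rel + float(np.mean(channel_losses)) - beta * L_d


if __name__ == "__main__":
    V = np.zeros((2, 4))                    # a classifier that cannot tell the domains apart
    feats = [np.ones(4), 2 * np.ones(4)]
    L_d = domain_loss(feats, [0, 1], V)     # log(2): maximal confusion for two domains
    print(meta_enha_objective(-20.0, [1.0, 3.0], L_d, alpha=0.1, beta=2.0))
```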

SLIDE 49

Metamorph: Meta-Qual

  • Acoustic Graffiti: distance(δ, N̂)
  • Reducing Perturbation’s Coverage: L1/L2 regularization
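The two Meta-Qual terms can be sketched as penalties added to the attack objective. Assumptions: N̂ is a reference noise clip the perturbation should resemble, distance(δ, N̂) is Euclidean, and "coverage" means the fraction of samples where δ is non-zero. Soft-thresholding (the proximal step of an L1 penalty) shows how L1 regularization shrinks that coverage. All names and weights are hypothetical.

```python
import numpy as np


def acoustic_graffiti_distance(delta, noise_ref):
    """distance(delta, N_hat), assumed Euclidean: how far delta is from a reference noise N_hat."""
    return float(np.linalg.norm(delta - noise_ref))


def coverage(delta, tol=0.0):
    """Fraction of samples where the perturbation is active (|delta| > tol)."""
    return np.count_nonzero(np.abs(delta) > tol) / delta.size


def prox_l1(delta, t):
    """Soft-thresholding, the proximal step of an L1 penalty: zeroes small samples, shrinks the rest."""
    return np.sign(delta) * np.maximum(np.abs(delta) - t, 0.0)


def meta_qual_penalty(delta, noise_ref, lam_graffiti, lam_l1, lam_l2):
    """Hypothetical quality term: stay close to N_hat, keep delta sparse (L1) and small (L2)."""
    return (lam_graffiti * acoustic_graffiti_distance(delta, noise_ref)
            + lam_l1 * np.sum(np.abs(delta))
            + lam_l2 * np.sum(delta ** 2))


if __name__ == "__main__":
    delta = np.array([0.5, -0.02, 0.01, -0.4, 0.03])
    print(coverage(delta), coverage(prox_l1(delta, 0.05)))
```

L1 drives small samples exactly to zero, which limits where the perturbation is audible; L2 mainly limits its overall energy.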

SLIDE 50

Evaluation: Audio Quality

  • Examples

| Audio | Original transcript | Meta-Enha transcript | Meta-Qual transcript |
|---|---|---|---|
| Classical music | [no transcription] | “hello world” | “hello world” |
| Human speech | “your son went to serve at a distant place and became a centurion” | “open the door” | “open the door” |

SLIDE 51

Evaluation: Attack Success Rate

Setup: a multi-path-prevalent office

  • Attack target: DeepSpeech (white-box)
SLIDE 52

Evaluation: Attack Success Rate

  • Line-of-Sight (LOS) Attack

Meta-Enha: > 90% attack success rate

[Figure: Transcript Success Rate and Character Success Rate]
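The slides report two metrics without defining them. A minimal sketch under stated assumptions: Transcript Success Rate is the fraction of trials whose transcript exactly matches the target, and Character Success Rate is 1 − (edit distance / target length), floored at 0 and averaged over trials. These definitions are assumptions for illustration, not the authors' confirmed formulas.

```python
def levenshtein(a: str, b: str) -> int:
    """Character edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb))) # substitute (free if equal)
        prev = cur
    return prev[-1]


def transcript_success_rate(transcripts, target):
    """Assumed TSR: share of trials whose transcript equals the target exactly."""
    return sum(t == target for t in transcripts) / len(transcripts)


def character_success_rate(transcripts, target):
    """Assumed CSR: mean of max(0, 1 - edit_distance / len(target)) over trials."""
    scores = [max(0.0, 1 - levenshtein(t, target) / len(target)) for t in transcripts]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    trials = ["open the door", "open the dor", "hello"]
    print(transcript_success_rate(trials, "open the door"),
          character_success_rate(trials, "open the door"))
```

Under these definitions, CSR gives partial credit for near misses such as “open the dor”, which is why it can sit well above TSR.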

SLIDE 53

Evaluation: Attack Success Rate

  • Non-Line-of-Sight (NLOS) Attack

[Figure: Character Success Rate and Transcript Success Rate]

Meta-Enha: over 85% attack success rate at 11 of the 20 NLOS locations!

SLIDE 54

Conclusion

  • 1. Investigate over-the-air audio adversarial attacks systematically.
  • 2. Propose a “generate-and-clean” two-phase design and improve the audio quality.

  • 3. Develop a prototype and conduct extensive evaluations.

Visit acoustic-metamorph-system.github.io for more information!