Learning-based Practical Smartphone Eavesdropping with Built-in Accelerometer
Authors: Zhongjie Ba, Tianhang Zheng, Xinyu Zhang, Zhan Qin, Baochun Li, Xue Liu
Motion sensors: gyroscope, accelerometer
Magnetic sensor: magnetometer
Voice sensor: microphone
Image sensor: camera
Accelerometer
Speaker              Speaker identification   Digit recognition
Mixed female/male    50%                      17%
Female speakers      45%                      26%
Male speakers        65%                      23%
Gyroscope Accelerometer
Through a shared surface
Through air
The threat does not go beyond the Loudspeaker-Same-Surface setup studied by Michalevsky et al.
Fundamental frequency range of human speech: 85-180 Hz (adult male), 165-255 Hz (adult female)
Delay option     Delay     Sampling rate
DELAY_NORMAL     200 ms    5 Hz
DELAY_UI         60 ms     16.7 Hz
DELAY_GAME       20 ms     50 Hz
DELAY_FASTEST    0 ms      as fast as possible
Model             Year    Sampling rate
Moto G4           2016    100 Hz
Samsung J3        2016    100 Hz
LG G5             2016    200 Hz
Huawei Mate 9     2016    250 Hz
Samsung S8        2017    420 Hz
Google Pixel 3    2018    410 Hz
Huawei P20 Pro    2018    500 Hz
Huawei Mate 20    2018    500 Hz
[1] "Sensor Overview," https://developer.android.com/guide/topics/sensors/sensors_overview.
Sampling frequencies supported by Android [1]
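The link between delay and sampling rate is plain arithmetic, and the Nyquist criterion then gives the highest frequency a given rate can capture. The sketch below illustrates both; the helper names are hypothetical, and the device rates are a subset of the measured values listed above.

```python
# Sketch: delay between samples -> sampling rate -> Nyquist limit.
# Function names are illustrative and not part of any Android API.

def sampling_rate_hz(delay_ms: float) -> float:
    """Sampling rate in Hz for a fixed delay (in ms) between samples."""
    return 1000.0 / delay_ms


def nyquist_hz(rate_hz: float) -> float:
    """Highest frequency that can be represented at the given sampling rate."""
    return rate_hz / 2.0


# Fixed delays from the delay table (the fastest option has no fixed delay).
for delay_ms in (200, 60, 20):
    print(f"{delay_ms} ms -> {sampling_rate_hz(delay_ms):.1f} Hz")

# Measured accelerometer rates for a few of the listed phones.
device_rates = {"Moto G4": 100, "LG G5": 200, "Huawei Mate 20": 500}
LOWEST_F0_HZ = 85  # lower end of the speech fundamental frequency range
for name, rate in device_rates.items():
    limit = nyquist_hz(rate)
    print(f"{name}: Nyquist {limit:.0f} Hz, reaches 85 Hz: {limit >= LOWEST_F0_HZ}")
```

At 500 Hz the Nyquist limit is 250 Hz, which spans most of the fundamental frequency range of speech, whereas a 100 Hz rate stops at 50 Hz.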
Table setting Handhold setting
Time (ms)    x-axis (m/s²)    y-axis (m/s²)    z-axis (m/s²)
1                                              10.0020
2                                              9.9970
3                                              9.9970
5                                              10.0070
8                                              10.0120
10                                             10.0070
Fill in the missing time points using linear interpolation.
Time (ms)    x-axis (m/s²)    y-axis (m/s²)    z-axis (m/s²)
1                                              10.0020
2                                              9.9970
3                                              9.9970
4                                              10.0020
5                                              10.0070
6                                              10.0087
7                                              10.0103
8                                              10.0120
9                                              10.0095
10                                             10.0070
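The interpolated values can be reproduced with a few lines of code. This is a minimal sketch, assuming a 1 ms target grid; the function name `interpolate_uniform` is hypothetical.

```python
def interpolate_uniform(times, values, step=1):
    """Linearly interpolate (times, values) onto a uniform grid with spacing `step`."""
    out = []
    i = 0
    t = times[0]
    while t <= times[-1]:
        # Advance to the interval [times[i], times[i + 1]] that contains t.
        while i < len(times) - 2 and t > times[i + 1]:
            i += 1
        t0, t1 = times[i], times[i + 1]
        v0, v1 = values[i], values[i + 1]
        out.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
        t += step
    return out


# Unevenly spaced readings from the table before interpolation.
times = [1, 2, 3, 5, 8, 10]
values = [10.0020, 9.9970, 9.9970, 10.0070, 10.0120, 10.0070]

resampled = interpolate_uniform(times, values)
for t, v in resampled:
    print(t, f"{v:.4f}")
```

The samples at 4, 6, 7 and 9 ms are the ones that had no measurement; every other value passes through unchanged.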
Convert the signal along each axis to the frequency domain and eliminate frequency components below 80 Hz.
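A frequency-domain high-pass filter of this kind can be sketched as follows. The example swaps in a plain DFT over the whole signal for simplicity, and the 1000 Hz sampling rate, the test tones and all function names are assumptions made for illustration.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (fine for short signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def high_pass(x, fs, cutoff_hz):
    """Zero every frequency component below cutoff_hz, then transform back."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        freq = min(k, n - k) * fs / n  # bins above n/2 mirror negative frequencies
        if freq < cutoff_hz:
            spectrum[k] = 0
    return idft(spectrum)


fs = 1000  # assumed sampling rate in Hz
n = 200
low = [math.sin(2 * math.pi * 20 * t / fs) for t in range(n)]          # 20 Hz motion-like tone
high = [0.1 * math.sin(2 * math.pi * 200 * t / fs) for t in range(n)]  # 200 Hz speech-band tone
filtered = high_pass([a + b for a, b in zip(low, high)], fs, cutoff_hz=80)

# The filtered signal should match the 200 Hz tone alone.
max_err = max(abs(f - h) for f, h in zip(filtered, high))
print(f"max deviation from the 200 Hz tone: {max_err:.2e}")
```

Zeroing bin 0 also removes the constant gravity offset, which is why the later steps can work with a zero-centered signal.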
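Calculate the magnitude of the acceleration signal and smooth the obtained magnitude sequence with a moving average.
Locate the regions where the smoothed magnitude is greater than a threshold.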
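The segmentation idea (magnitude, moving average, threshold) can be sketched as below. The window size, the threshold and the toy data are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def magnitude(samples):
    """Euclidean magnitude of each (x, y, z) sample."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]


def moving_average(seq, window):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(seq)):
        chunk = seq[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def regions_above(seq, threshold):
    """Return (start, end) index pairs, end exclusive, where seq > threshold."""
    regions, start = [], None
    for i, v in enumerate(seq):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(seq)))
    return regions


# Toy high-pass-filtered signal: silence, a burst, silence, another burst.
samples = ([(0.0, 0.0, 0.0)] * 5 + [(0.3, 0.0, 0.4)] * 6
           + [(0.0, 0.0, 0.0)] * 5 + [(0.0, 0.5, 0.0)] * 4)
smoothed = moving_average(magnitude(samples), window=3)
segments = regions_above(smoothed, threshold=0.25)
print(segments)
```

Each returned index pair marks one candidate word region that can be cut out of the three-axis signal.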
Divide the signal into short segments with a fixed overlap.
Window each segment and calculate its spectrum through STFT (Short-Time Fourier Transform).
Obtain three spectrograms (one per axis) for each single-word signal.
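A spectrogram built from overlapping, windowed segments can be sketched as follows. The Hamming window, the segment length of 64 samples, the overlap of 48 samples and the 1000 Hz sampling rate are assumptions of this example, not values taken from the slides.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(signal, seg_len, overlap):
    """Magnitude spectrogram: one row per segment, one column per frequency bin."""
    hop = seg_len - overlap
    window = hamming(seg_len)
    rows = []
    for start in range(0, len(signal) - seg_len + 1, hop):
        seg = [s * w for s, w in zip(signal[start:start + seg_len], window)]
        # Keep only the non-negative frequency bins 0 .. seg_len / 2.
        rows.append([abs(sum(seg[t] * cmath.exp(-2j * math.pi * k * t / seg_len)
                             for t in range(seg_len)))
                     for k in range(seg_len // 2 + 1)])
    return rows


fs = 1000  # assumed sampling rate in Hz
tone = [math.sin(2 * math.pi * 125 * t / fs) for t in range(256)]
spec = spectrogram(tone, seg_len=64, overlap=48)

peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), "segments,", len(spec[0]), "bins, peak at", peak_bin * fs / 64, "Hz")
```

Running the same function on the x, y and z signals of one word would give the three per-axis spectrograms.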
Combine the three spectrograms into an m x n x 3 tensor.
Take the square root of all elements in the tensor and map the obtained values to integers between 0 and 255.
Export the tensor as an image in PNG format.
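Turning three per-axis spectrograms into image-like data can be sketched as below. The element-wise transform applied before scaling is taken to be a square root here, which is an assumption of this sketch, as are the min-max scaling and the function names; writing the actual image file is left out.

```python
import math


def stack_axes(spec_x, spec_y, spec_z):
    """Combine three m x n spectrograms into one m x n x 3 nested list."""
    return [[[spec_x[i][j], spec_y[i][j], spec_z[i][j]]
             for j in range(len(spec_x[0]))]
            for i in range(len(spec_x))]


def to_uint8(tensor, transform=math.sqrt):
    """Apply `transform` element-wise, then scale linearly to integers in 0..255."""
    flat = [transform(v) for row in tensor for pixel in row for v in pixel]
    lo, hi = min(flat), max(flat)
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    return [[[round((transform(v) - lo) * scale) for v in pixel] for pixel in row]
            for row in tensor]


# Tiny made-up 2 x 2 spectrograms, one per axis.
spec_x = [[0.0, 1.0], [4.0, 9.0]]
spec_y = [[16.0, 0.0], [1.0, 4.0]]
spec_z = [[9.0, 16.0], [0.0, 1.0]]

image = to_uint8(stack_axes(spec_x, spec_y, spec_z))
print(image[0][0])  # the three channel values of the top-left pixel
```

Each pixel then carries the x, y and z spectrogram values in its three channels, which is the shape an image classifier expects.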
Huang, Gao, et al. "Densely connected convolutional networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
Previous SOTA results: 26% on recognizing digits
Previous SOTA results: 50% on recognizing 10 speakers
Traditional ML + gyroscope + Loudspeaker-Same-Surface
AccDataRec Audio Player
Insensitive words Hot words
Hot word    TPR     FPR
Password    94%     0.4%
Username    97%     0.4%
Social      100%    0.3%
Security    91%     0.0%
Number      88%     0.1%
Email       88%     1.4%
Credit      88%     0.3%
Card        97%     1.4%
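For reference, TPR and FPR follow directly from the confusion counts of a detector. The sketch below uses made-up counts purely to illustrate the arithmetic; they are not the counts behind the table.

```python
def tpr_fpr(tp, fn, fp, tn):
    """True positive rate and false positive rate from confusion counts."""
    tpr = tp / (tp + fn)  # share of real hot words that were detected
    fpr = fp / (fp + tn)  # share of other words wrongly flagged as the hot word
    return tpr, fpr


# Hypothetical counts: 100 occurrences of the hot word, 1000 other words.
tpr, fpr = tpr_fpr(tp=94, fn=6, fp=4, tn=996)
print(f"TPR = {tpr:.1%}, FPR = {fpr:.1%}")
```

A low FPR matters as much as a high TPR here, since insensitive words far outnumber hot words in ordinary speech.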
insensitive words and one to three hot words.
digit password.
mappings (inspired by ResNet)
H(x) = F(x, W_i) + x
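The residual formula says that a block only has to learn the difference F between its output and its input. A minimal sketch is given below, assuming F consists of two small linear layers with a ReLU in between; the slides do not specify the layers, so this choice and all names are illustrative.

```python
def linear(x, weights):
    """Matrix-vector product: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v):
    return [max(0.0, a) for a in v]


def residual_block(x, w1, w2):
    """H(x) = F(x, {W1, W2}) + x, with F(x) = W2 * relu(W1 * x)."""
    f = linear(relu(linear(x, w1)), w2)
    return [fi + xi for fi, xi in zip(f, x)]


x = [1.0, -2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

# With all-zero weights F vanishes and the block passes x through unchanged.
print(residual_block(x, zeros, zeros))
# With identity weights F(x) = relu(x), so the output is relu(x) + x.
print(residual_block(x, identity, identity))
```

The skip connection means that a block with weights near zero behaves like an identity mapping, which is what makes deep stacks of such blocks easier to train.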
Johnson, Justin, et al. "Perceptual losses for real-time style transfer and super-resolution." European Conference on Computer Vision. Springer, Cham, 2016.
The sampling rates for the user interface and mobile games are 16.7 Hz and 50 Hz respectively.
<uses-permission>
Recognition accuracy on the digits dataset