FINDING YOUR VOICE IN THE REGULATORY AGE - NIGEL CANNINGS, CTO


slide-1
SLIDE 1

FINDING YOUR VOICE IN THE REGULATORY AGE

NIGEL CANNINGS CTO

nigel.cannings@intelligentvoice.com @intelligentvox

slide-2
SLIDE 2

THE YEAR OF VOICE

LIBOR / FX scandal: banks face multi-billion $ fines

Amazon Alexa, Siri(?)

With almost 50% of all corporate data expected to have a voice component within 5 years, either as audio or video, all companies, and banks and insurance companies in particular, need to get a handle not just on where this data is held, but on what is being said in it and who is saying it.

2015

2016?

2017!

slide-3
SLIDE 3

AUDIENCE PARTICIPATION

slide-4
SLIDE 4

HOW OFTEN DO YOU USE A VOICE ASSISTANT?

Of the people with a smartphone, how many use their integrated voice assistant (e.g. Siri, Cortana)?

[Bar chart: percentage of respondents (axis 0% to 90%) answering Daily, Weekly, Monthly, Never]

Results taken from a survey of 1,500 people across Europe on 5th October 2017.

slide-5
SLIDE 5

HOW OFTEN DO YOU USE A VOICE ASSISTANT?

Of the people with an Alexa home assistant, how often do they use it?

[Bar chart: percentage of respondents (axis 0% to 50%) answering Daily, Weekly, Monthly]

Results taken from a survey of 1,500 people across Europe on 5th October 2017.

slide-6
SLIDE 6

IT’S A DOUBLE WHAMMY

Where?

GDPR MiFID II

What? Who?

slide-7
SLIDE 7

CLOUD SECURITY

Where is your voice stored?

Your voice recordings could be used to:

  • Impersonate you, by using (or editing) the recordings
  • Learn about you: your identity, gender, nationality (accent), emotional state, and so on
  • Track you through uploads or communications of voice recordings

WHERE

slide-8
SLIDE 8

ENCRYPTED SPEECH PROCESSING

Privacy preserving encrypted phonetic search of speech data. C Glackin, G Chollet, N Dugan, N Cannings, J Wall, S Tahir, IG Ray. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.

A New Secure and Lightweight Searchable Encryption Scheme over Encrypted Cloud Data. S Tahir, S Ruj, Y Rahulamathavan, M Rajarajan, C Glackin. IEEE Transactions on Emerging Topics in Computing, 2017.

AES Encryption (Public key)

Powered by machine learning

Powered by GPU
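To make the idea of searching speech data without exposing its content more concrete, here is a minimal sketch. It is not the scheme from the papers cited on this slide; it swaps in a much simpler technique, a keyed-hash (HMAC) index over phone trigrams, which leaks repetition patterns and is for illustration only. The key, the recording ids and the phone sequences are all hypothetical.

```python
import hashlib
import hmac


def trapdoor(key: bytes, term: str) -> str:
    """Keyed hash of a search term; the server only ever sees this token."""
    return hmac.new(key, term.encode("utf-8"), hashlib.sha256).hexdigest()


def phone_ngrams(phones, n=3):
    """Overlapping n-grams of a phone sequence, joined into strings."""
    return [" ".join(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def build_index(key, recordings, n=3):
    """Map each hashed phone n-gram to the ids of recordings containing it."""
    index = {}
    for rec_id, phones in recordings.items():
        for gram in phone_ngrams(phones, n):
            index.setdefault(trapdoor(key, gram), set()).add(rec_id)
    return index


def search(index, key, query_phones, n=3):
    """Return ids of recordings that contain every n-gram of the query."""
    grams = phone_ngrams(query_phones, n)
    if not grams:
        return set()
    hits = [index.get(trapdoor(key, gram), set()) for gram in grams]
    return set.intersection(*hits)


# Hypothetical data: the phone strings are made up, not recogniser output.
KEY = b"demo-key-not-for-production"
RECORDINGS = {
    "call-1": ["l", "ay", "b", "ao", "r"],
    "call-2": ["f", "ao", "r", "eh", "k", "s"],
}
INDEX = build_index(KEY, RECORDINGS)
print(search(INDEX, KEY, ["l", "ay", "b", "ao", "r"]))
```

The index stored in the cloud contains only hash tokens, so whoever hosts it cannot read the phone strings without the key. Matches are candidates rather than confirmed hits, because the sketch checks that all trigrams occur, not that they occur contiguously.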

WHERE

slide-9
SLIDE 9

DEEP SPIKING NEURAL NETWORKS FOR SPEECH ENHANCEMENT

Recurrent lateral inhibitory spiking networks for speech enhancement J Wall, C Glackin, N Cannings, G Chollet, N Dugan International Joint Conference on Neural Networks (IJCNN), pp. 1023-1028, 2016.

TECHNICAL

slide-10
SLIDE 10

CONVOLUTIONAL NEURAL NETWORKS FOR ACOUSTIC MODELLING

  • TIMIT Speech Corpus
  • 1.4M spectrograms for the training set
  • Sliding window used for timing
  • 4 to 5 phones in each 0.256 second window
  • 61 phoneme classes

  • Beaten the current NTIMIT state of the art!
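The sliding window on this slide can be sketched as follows. TIMIT audio is sampled at 16 kHz, so a 0.256 second window covers 4,096 samples. The hop between windows is not given in the slides; the 0.064 second value below is an assumption chosen only for illustration, as is the function name.

```python
SAMPLE_RATE_HZ = 16_000   # TIMIT sampling rate
WINDOW_SECONDS = 0.256    # window length quoted on the slide
HOP_SECONDS = 0.064       # assumed hop size; not stated in the slides


def sliding_windows(samples, sample_rate=SAMPLE_RATE_HZ,
                    window_s=WINDOW_SECONDS, hop_s=HOP_SECONDS):
    """Cut a sample sequence into overlapping fixed-length windows.

    Only full windows are returned; a trailing partial window is dropped.
    """
    window = round(window_s * sample_rate)
    hop = round(hop_s * sample_rate)
    last_start = len(samples) - window
    return [samples[start:start + window]
            for start in range(0, last_start + 1, hop)]


# One second of placeholder samples standing in for real audio.
one_second = list(range(SAMPLE_RATE_HZ))
windows = sliding_windows(one_second)
print(len(windows), len(windows[0]))
```

Each window would then be turned into a spectrogram image and passed to the network; that step is left out here.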

TECHNICAL

slide-11
SLIDE 11

TECHNICAL

slide-12
SLIDE 12

HOW FAST?

[Chart: processing speed in times real time; axis ticks run from 10 to 100 and from 20 to 120]

WHAT

slide-13
SLIDE 13

UNDERSTANDING

100x Realtime using P5000
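As a rough worked example of what "100x real time" means in practice, the arithmetic is simply audio duration divided by the speed-up. The 1,000 hour archive below is a hypothetical figure, and the sketch assumes the quoted rate holds steadily for the whole job.

```python
def processing_time_seconds(audio_seconds, times_real_time):
    """Time needed to process audio at a given multiple of real time."""
    if times_real_time <= 0:
        raise ValueError("times_real_time must be positive")
    return audio_seconds / times_real_time


archive_hours = 1_000  # hypothetical archive size
seconds = processing_time_seconds(archive_hours * 3600, 100)
print(f"{archive_hours} hours of audio -> {seconds / 3600:.1f} hours of processing")
```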

WHAT

slide-14
SLIDE 14

TELEFONICA/O2

But this is just the beginning: voice data is generated not only inside the organisation but also externally, for example as YouTube content. One commonly forgotten area is mobile telephony. MiFID II now places a strong requirement on recording not just calls made from a regulated organisation's premises, but its mobile calls as well.

Intelligent Voice is working with Telefonica/O2 to capture, index and analyse mobile phone calls, and to introduce them into a compliance and monitoring workflow for MiFID II.

WHAT

slide-15
SLIDE 15

CREDIBILITY

WHAT IS WRONG WITH THESE STATEMENTS?

“Woke up at 7:30. Had a shower. Made breakfast and read the newspaper. At 8:30, drove to work.”

“We should have done a better job.”

“That’s their way of doing things.”

“You’d better ask them.”

Alleged robbery victim: “The man asked for my money.” “He told me not to look at him. He said he would shoot me if I screamed.”

WHAT

slide-16
SLIDE 16

CREDIBILITY INDICATORS

Pronouns: omission, improper use, higher rates of third person plural pronouns

Complexity: parameters such as number of letters/syllables per word, higher word count, higher rate of pauses

Speaking verbs: strong tone (told, demanded, telling), soft tone (said, asked, stated, saying), and changes in tone

Tempo: slow tempo (indicator of cognitive load), fast tempo (indicator of arousal and negative effects)

Pitch: higher pitch or lower voice quality at specific times indicates fraud-related utterances

Specific words: explainers (so, since, therefore, because…)

These are just a few of the indicators of suspicious language.
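A few of the word-based indicators can be counted with a simple lookup, as a minimal sketch. The verb and explainer lists come from this slide; the pronoun list and the function name are assumptions added for illustration. This is plain word counting, not a credibility model.

```python
import re

# Verb and explainer lists are taken from the slide; the pronoun list is assumed.
INDICATOR_WORDS = {
    "strong_verb": {"told", "demanded", "telling"},
    "soft_verb": {"said", "asked", "stated", "saying"},
    "explainer": {"so", "since", "therefore", "because"},
    "third_person_plural": {"they", "them", "their", "theirs", "themselves"},
}


def count_indicators(statement):
    """Count how often each indicator word class appears in a statement."""
    tokens = re.findall(r"[a-z]+(?:['’][a-z]+)?", statement.lower())
    counts = {name: 0 for name in INDICATOR_WORDS}
    for token in tokens:
        for name, words in INDICATOR_WORDS.items():
            if token in words:
                counts[name] += 1
    return counts


print(count_indicators("That's their way of doing things. You'd better ask them."))
```

Acoustic indicators such as tempo and pitch need the audio itself and are outside the scope of this sketch.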

WHAT

slide-17
SLIDE 17

CREDIBILITY NETWORK

Voice Activity Detection, then i-vector diarization

INTERVIEWER: What happened next?

CALLER: He told me not to look at him. He said he would shoot me if…

[Diagram: the words “… He told me not to look at him . He said …” pass through an embedding layer and two LSTM layers, which tag a strong tone followed by a weak tone]

  • Inspired by recurrent networks for named entity recognition and part-of-speech tagging
  • We can use bi-directional recurrent networks to attach credibility tags to the speech transcription
  • Bi-directionality is important for context
  • Network can tag explainers, changes in tone, pronouns etc.
  • GPU-accelerated RNN-based speech to text
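The kind of output such a tagger produces, one tag per word plus detected changes in tone, can be illustrated with a rule-based stand-in. This is deliberately not a recurrent network: it uses the fixed verb lists from the credibility indicators slide, whereas the network described here would learn its tags from context. The tag names and function names are assumptions.

```python
import re

STRONG_VERBS = {"told", "demanded", "telling"}
SOFT_VERBS = {"said", "asked", "stated", "saying"}


def tag_tone(transcript):
    """Attach a tag to every word: STRONG, SOFT, or O for anything else."""
    tagged = []
    for token in re.findall(r"[a-z]+", transcript.lower()):
        if token in STRONG_VERBS:
            tag = "STRONG"
        elif token in SOFT_VERBS:
            tag = "SOFT"
        else:
            tag = "O"
        tagged.append((token, tag))
    return tagged


def tone_changes(tagged):
    """Return pairs of consecutive speaking verbs whose tone differs."""
    verbs = [(token, tag) for token, tag in tagged if tag != "O"]
    return [(first[0], second[0])
            for first, second in zip(verbs, verbs[1:])
            if first[1] != second[1]]


caller = "He told me not to look at him. He said he would shoot me if I screamed."
print(tone_changes(tag_tone(caller)))
```

On the caller's statement this flags the move from "told" (strong) to "said" (soft), the same strong-then-weak pattern shown in the diagram.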

WHAT

slide-18
SLIDE 18

SPEAKER IDENTIFICATION

RASTA SOX MATLAB PYTHON RASTA 12

Dialect identification via images and DIGITS

NIST evaluation of 500 hours and 20 dialects

WHO

slide-19
SLIDE 19

NIST EVALUATION

Preliminary Results

[Bar chart: preliminary results per dialect, axis ticks at 50 and 100. Dialects shown: English-, Portuguese-Brazilian, Spanish-, Spanish-European, Chinese-Min_Dong, Arabic, Chinese-Cantonese, Arabic-Egyptian, English-British, Spanish-Caribbean, Slavic-Russian, Arabic-Maghrebi, Chinese-Mandarin, Arabic-Iraqi, English-American, Chinese-Wu, Slavic-Polish, French-Haitian, Arabic-Levantine, French-West_African]

WHO

slide-20
SLIDE 20

WHO

slide-21
SLIDE 21

CELEBRITY SOUNDALIKE

https://celebsoundalike.com/

Tweet your results to @intelligentvox

WHO

slide-22
SLIDE 22

CONCLUSION

THANK YOU

nigel.cannings@intelligentvoice.com @intelligentvox