Beyond the Equal Error Rate: About the inter-relationship between algorithm and application

ISCA Archive


SLIDE 1

Beyond the Equal Error Rate

About the inter-relationship between algorithm and application

Renana Peres

Comverse Technology

SLIDE 2

authentication

Market needs

Effective authentication tools for remote services

  • Direct banking
  • Service Centers
  • Home shopping
  • Calling Cards
  • E-commerce
  • Mobile Commerce
  • Smart cards

RF Signatures, Questions, Profiles, PIN codes

Unsafe. Expensive. Not friendly.

SLIDE 3

Telecom Service centers; Calling Cards, 97’, US; Visa, 96’, US; AT&T, 94’, US

10-30 B$/Y; 1 B$; 2 B$; 0.5 B$

FRAUD

IMAGE SERVICE ECONOMIC

Customer Satisfaction Expenses Profitability

Effective authentication

The barrier in the expansion of remote commerce services

SLIDE 4

Operational Scenarios

Free speech and vocal password applications

Applications: Call Centers, Cellular Roamers, Calling cards, Voice / IP

Claimed id.

Verify Accept/Reject

  • Speaker Verification

Free speech (Text Independent)

Applications: Credit Cards, IVR Interactions, Physical Access, E-commerce

Verify Accept/Reject

Claimed id.

Vocal password (Text Dependent)

SLIDE 5

Voice based verification

Authentication solution for any remote services

Friendly

  • Combines with transaction flow
  • No passwords
  • Use of natural speech

Personal, biometric verification

Safe; Saves Costs

  • Fraud prevention
  • Reduce bureaucracies
  • Increase service volumes
  • Shortens call duration

SLIDE 6

Typical Architecture

Integrated into the service provider infrastructures

  • Audio system
  • Storage system
  • Calling application
  • Management
  • Processing units
  • Coordinator

SLIDE 7

Research challenges

  • Speaker separation and segmentation
  • Segmentation with unknown no. of speakers
  • Non-speech and silence vox
  • Non-password speech vox

Audio issues

IVR Transfer volumes:

  • Call Center: 100 - 3000 agents = 100 concurrent calls
  • Telecom: 10 - 30 trunks = 300 - 900 concurrent calls
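The Telecom figure implies 30 concurrent calls per trunk (300 / 10 and 900 / 30), which matches the 30 voice channels of an E1 trunk. The slide does not name the trunk type, so that reading is an assumption. A minimal sketch of the arithmetic, with a hypothetical `CHANNELS_PER_TRUNK` constant:

```python
# Assumption: each trunk carries 30 simultaneous voice channels (E1-style).
# The slide only gives the totals; the per-trunk figure is inferred from them.
CHANNELS_PER_TRUNK = 30


def concurrent_calls(trunks: int, channels_per_trunk: int = CHANNELS_PER_TRUNK) -> int:
    """Upper bound on simultaneous calls carried by `trunks` trunks."""
    return trunks * channels_per_trunk


print(concurrent_calls(10), concurrent_calls(30))  # 300 900
```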

Audio system Storage system Calling application Management Processing units Coordinator

Free speech Vocal password

SLIDE 8

Storage

Internal vs. external storage architecture

Internal storage

Audio system Processing units

External storage

Audio system Processing units Coordinator Coordinator Audio system Processing units Coordinator

Disk chase

SLIDE 9

Storage

Coordinator; Processing units; Audio System

World models & data; Claimed identities; Verification results statistics; Verification audio

Storage issues:

  • Large storage volumes: 1 minute audio = 0.5 Mbyte (PCM)
  • Storage of audio objects
  • Backup, redundancy
  • Voice signature maintenance
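The 0.5 Mbyte per minute figure is what 8 kHz, 8-bit, mono telephony PCM works out to. The slide does not state the sampling parameters, so they are assumed here. A rough sizing sketch (function names and the recording scenario are illustrative only):

```python
# Assumed telephony PCM parameters (not stated on the slide).
SAMPLE_RATE_HZ = 8000   # narrowband telephone audio
BYTES_PER_SAMPLE = 1    # 8-bit samples, e.g. G.711-style companded PCM


def audio_bytes(seconds: float) -> int:
    """Raw size in bytes of mono audio of the given duration."""
    return int(seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def daily_storage_gb(concurrent_calls: int, busy_hours: float) -> float:
    """Gigabytes per day if every concurrent call is recorded for `busy_hours`."""
    total_bytes = concurrent_calls * audio_bytes(busy_hours * 3600)
    return total_bytes / 1e9


print(audio_bytes(60))           # 480000 bytes, about 0.5 Mbyte per minute
print(daily_storage_gb(300, 8))  # 69.12
```

Under these assumptions, even the low end of the Telecom range fills tens of gigabytes a day if all audio is kept, which is why compact speaker models and re-training without audio appear among the storage challenges.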

Large, dynamic storage, containing voice & data

Storage operations

  • Create new voice signature (VS)
  • Add audio to VS
  • Remove session from VS
  • Remove VS
  • Add audio to world model
  • Store speaker model
  • Modify claimed id.
  • Get VS data

Voice signatures; Speaker models; Audio
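The storage operations listed above suggest a small storage interface. The following in-memory sketch is one hypothetical shape for it. The class name, the method names, and the choice to key audio by session id are all assumptions, not the system's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VoiceSignature:
    """Audio sessions for one subscriber plus the model trained from them."""
    claimed_id: str
    sessions: Dict[str, bytes] = field(default_factory=dict)
    speaker_model: Optional[bytes] = None


class VoiceSignatureStore:
    """Hypothetical in-memory stand-in for the storage system."""

    def __init__(self) -> None:
        self._signatures: Dict[str, VoiceSignature] = {}
        self._world_audio: List[bytes] = []

    def create_vs(self, claimed_id: str) -> None:
        if claimed_id in self._signatures:
            raise KeyError(f"VS already exists: {claimed_id}")
        self._signatures[claimed_id] = VoiceSignature(claimed_id)

    def add_audio(self, claimed_id: str, session_id: str, audio: bytes) -> None:
        self._signatures[claimed_id].sessions[session_id] = audio

    def remove_session(self, claimed_id: str, session_id: str) -> None:
        del self._signatures[claimed_id].sessions[session_id]

    def remove_vs(self, claimed_id: str) -> None:
        del self._signatures[claimed_id]

    def add_world_audio(self, audio: bytes) -> None:
        self._world_audio.append(audio)

    def store_speaker_model(self, claimed_id: str, model: bytes) -> None:
        self._signatures[claimed_id].speaker_model = model

    def modify_claimed_id(self, old_id: str, new_id: str) -> None:
        if new_id in self._signatures:
            raise KeyError(f"VS already exists: {new_id}")
        vs = self._signatures.pop(old_id)
        vs.claimed_id = new_id
        self._signatures[new_id] = vs

    def get_vs(self, claimed_id: str) -> VoiceSignature:
        return self._signatures[claimed_id]


store = VoiceSignatureStore()
store.create_vs("subscriber-1")
store.add_audio("subscriber-1", "call1", b"\x00\x01")
print(sorted(store.get_vs("subscriber-1").sessions))  # ['call1']
```

Keeping sessions individually addressable is what makes "Remove session from VS" possible, for example when a session is later identified as faulty.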

SLIDE 10

Storage

Voice signature maintenance

Audio sessions are added to VS; VS is re-trained

[Timeline diagram: call1, call2, call3 along a Time axis, with labels Add to VS, Verify, Training, car, cellular]

Research challenges

  • Time evolution of VS
  • VS update policy
  • Identification of faulty sessions
  • Re-training without audio
  • Compact speaker models

SLIDE 11

Recognition phases

Calibration, enrolment, verification

[Timeline diagram: Calibration and Enrolment phases along a Time axis for subscribers, with labels Add to Cal, Add to VS, Train, Verify]

  • Calibration: Initial parameter settings, creating world models
  • Enrolment: VS data accumulation, creating speaker model
  • Verification: Match an incoming call against a claimed identity
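The three phases can be illustrated with a deliberately tiny toy. It assumes each call is reduced to a list of one-dimensional feature values and that both the world model and the speaker model are single Gaussians. The slides do not specify features or model types, so every name and modeling choice below is a stand-in.

```python
import math
from statistics import mean, pstdev
from typing import List, Tuple

Gaussian = Tuple[float, float]  # (mean, standard deviation)


def fit(features: List[float]) -> Gaussian:
    """Fit a single Gaussian; the floor on sigma avoids division by zero."""
    return mean(features), max(pstdev(features), 1e-3)


def log_pdf(x: float, model: Gaussian) -> float:
    mu, sigma = model
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def calibrate(pooled_features: List[float]) -> Gaussian:
    """Calibration: build the world model from audio of many speakers."""
    return fit(pooled_features)


def enrol(subscriber_features: List[float]) -> Gaussian:
    """Enrolment: build a speaker model from the subscriber's accumulated VS data."""
    return fit(subscriber_features)


def verify(call_features: List[float], speaker: Gaussian, world: Gaussian,
           threshold: float = 0.0) -> bool:
    """Verification: average log-likelihood ratio of the call against a threshold."""
    llr = mean(log_pdf(x, speaker) - log_pdf(x, world) for x in call_features)
    return llr > threshold


world = calibrate([0.0, 1.0, 2.0, 3.0, 4.0])
subscriber = enrol([3.8, 4.0, 4.2])
print(verify([3.9, 4.1], subscriber, world))  # True
print(verify([1.0, 2.0], subscriber, world))  # False
```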

SLIDE 12

Calibration

Initial parameter setting

Calibration data: world models, tuning data, other params

  • Large amount of audio
  • Heavy computation
  • No source labeling

Research challenges

  • Calibration with mixed source data (unsupervised clustering ?)
  • Time evolution of world models
  • Calibration for text-dependent applications (no impostor repetitions, no language info)

SLIDE 13

Enrolment

Voice Signature for each subscriber

During enrolment, alternative authentication methods are used

Research challenges

  • Minimum user involvement
  • Signature robustness
  • Mixed source signature
  • Mixed source corpora
  • Measurement for VS quality

Free speech (off-line operation): First 2-3 calls; Add call to VS; Train; VS ready for verification, or Problem in VS

Vocal Password: Enrolment session; Repeat password; Train; More audio?; VS ready for verification

SLIDE 14

Verification

The most frequent mission

[Diagram labels: DTMF, Speech recognition, CLI; Claimed id.; Verification API: Verify; Accept / Reject; Another trial required]

Research challenges

  • Multi trial verification
  • Share info between trials
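One way to picture the three outcomes (Accept, Reject, Another trial required) is a two-threshold rule over the scores of all trials so far. This is a sketch under assumptions: averaging is just one simple way to share information between trials, and the threshold values, the trial limit, and all names are hypothetical.

```python
from enum import Enum
from typing import List


class Outcome(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    ANOTHER_TRIAL = "another trial required"


def decide(trial_scores: List[float], accept_at: float = 2.0,
           reject_at: float = -2.0, max_trials: int = 3) -> Outcome:
    """Decide from every trial so far; scores are assumed log-likelihood ratios."""
    if not trial_scores:
        raise ValueError("at least one trial score is required")
    # Sharing information between trials: pool them instead of judging the last alone.
    combined = sum(trial_scores) / len(trial_scores)
    if combined >= accept_at:
        return Outcome.ACCEPT
    if combined <= reject_at or len(trial_scores) >= max_trials:
        return Outcome.REJECT
    return Outcome.ANOTHER_TRIAL


print(decide([0.5]))            # Outcome.ANOTHER_TRIAL
print(decide([0.5, 4.0]))       # Outcome.ACCEPT
print(decide([0.5, 0.2, 0.1]))  # Outcome.REJECT
```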

SLIDE 15

Result update policies

For free speech applications

[Timeline: Call Start, Transaction 1, Transaction 2, Call End]

  • Upon request
  • Fixed intervals
  • Confidence level
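The three policies can be read as three triggers for reporting the running verification result during a free speech call. A minimal sketch follows. The `UpdatePolicy` class and its fields are hypothetical, and the slide does not say whether the policies can be combined as they are here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UpdatePolicy:
    interval_s: Optional[float] = None      # "Fixed intervals": report every N seconds
    min_confidence: Optional[float] = None  # "Confidence level": report once reached

    def should_report(self, elapsed_s: float, last_report_s: float,
                      confidence: float, requested: bool = False) -> bool:
        """Return True when the current result should be sent to the application."""
        if requested:  # "Upon request": the calling application asked explicitly
            return True
        if self.interval_s is not None and elapsed_s - last_report_s >= self.interval_s:
            return True
        if self.min_confidence is not None and confidence >= self.min_confidence:
            return True
        return False


every_10s = UpdatePolicy(interval_s=10.0)
print(every_10s.should_report(elapsed_s=12.0, last_report_s=0.0, confidence=0.1))  # True
print(every_10s.should_report(elapsed_s=8.0, last_report_s=0.0, confidence=0.1))   # False
```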

SLIDE 16

Decision Policy

[Plot: false acceptance (FA) and false rejection (FR) versus Threshold; operating regions labeled Service oriented and Security oriented]

Algorithmic results + application cost function
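A sketch of how algorithmic results and an application cost function might combine: given genuine and impostor scores, pick the threshold that minimizes expected cost rather than the one where FA equals FR. The cost weights, the prior, the toy scores, and the function names are all illustrative assumptions.

```python
from typing import Sequence, Tuple


def error_rates(genuine: Sequence[float], impostor: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (FA, FR) when scores at or above the threshold are accepted."""
    fa = sum(s >= threshold for s in impostor) / len(impostor)
    fr = sum(s < threshold for s in genuine) / len(genuine)
    return fa, fr


def best_threshold(genuine: Sequence[float], impostor: Sequence[float],
                   cost_fa: float, cost_fr: float,
                   impostor_prior: float = 0.5) -> float:
    """Threshold among the observed scores that minimizes expected cost."""
    def cost(threshold: float) -> float:
        fa, fr = error_rates(genuine, impostor, threshold)
        return cost_fa * impostor_prior * fa + cost_fr * (1 - impostor_prior) * fr

    candidates = sorted(set(genuine) | set(impostor))
    return min(candidates, key=cost)


genuine = [1.0, 2.0, 3.0, 4.0]
impostor = [0.0, 0.5, 1.5, 2.5]
# Security oriented: a false acceptance costs ten times a false rejection.
print(best_threshold(genuine, impostor, cost_fa=10, cost_fr=1))  # 3.0
# Service oriented: a false rejection costs ten times a false acceptance.
print(best_threshold(genuine, impostor, cost_fa=1, cost_fr=10))  # 1.0
```

The same scores give different operating points once the application's costs are taken into account, which is the sense in which the decision goes beyond the equal error rate.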

SLIDE 17

Decision and Scoring

Research challenges

  • Effective scoring
  • Likelihood ratio -> FA / FR

[Plot: Intra speaker and Inter speaker distributions over the Likelihood Ratio, with a Decision Threshold and the FA and FR regions; also labeled: Posterior Probability]
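The link between a likelihood ratio and a posterior probability is Bayes' rule: with prior p that the claimed speaker is genuine, posterior = LR * p / (LR * p + (1 - p)). The sketch below shows that conversion and the equivalent likelihood-ratio threshold for a required posterior. The function names and the example prior are assumptions, and the slide does not say which form the system uses.

```python
def posterior_from_lr(likelihood_ratio: float, prior: float = 0.5) -> float:
    """Posterior probability that the speaker is genuine, by Bayes' rule."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    weighted = likelihood_ratio * prior
    return weighted / (weighted + (1.0 - prior))


def lr_threshold_for_posterior(min_posterior: float, prior: float = 0.5) -> float:
    """Likelihood-ratio threshold equivalent to requiring `min_posterior`."""
    return (min_posterior * (1.0 - prior)) / ((1.0 - min_posterior) * prior)


print(posterior_from_lr(1.0))  # 0.5: the audio is uninformative, so the prior stands
print(posterior_from_lr(9.0))  # 0.9
```

A lower prior (more expected impostors) raises the likelihood ratio needed for the same posterior, which is one way the application's conditions feed back into the decision threshold.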

SLIDE 18

Management

Tools for system monitoring and maintenance

General information

  • System status
  • Mission status
  • Loads

Speaker Recognition information

  • No. of trained voice signatures
  • Data collection status
  • Rejection cases
  • Performance measurements
  • Feedback

SLIDE 19

Summary

Algorithms; Applications

Telephony; Transfer volumes; Service; User behavior

New research challenges

SLIDE 20

Summary of Research challenges

Audio

  • Speaker separation and segmentation
  • Segmentation with unknown no. of speakers
  • Non-speech and silence vox
  • Non-password speech vox

Enrolment

  • Minimum user involvement
  • Signature robustness
  • Mixed source signature
  • Time evolution of signatures
  • Measurement for VS quality

Verification

  • Multi trial verification
  • Share info between trials

Storage

  • Re-training without audio
  • Identification of faulty sessions
  • VS update policy
  • Time evolution of VS
  • Compact speaker models

Calibration

  • Calibration with mixed source data (unsupervised clustering ?)
  • Time evolution of world models
  • Calibration for text-dependent applications (no impostor repetitions, no language info)

Scoring

  • Effective scoring
  • Likelihood ratio -> FA / FR

Corpora

  • Mixed source
  • Cellular
  • Free password