Making decisions with biometric systems: the usefulness of a Bayesian perspective
A. Nautsch, D. Ramos Castro, J. González Rodríguez, Christian Rathgeb, Christoph Busch. Hochschule Darmstadt, CRISP, CASED, da/sec


slide-1
SLIDE 1

Making decisions with biometric systems: the usefulness of a Bayesian perspective

  • A. Nautsch⋆, D. Ramos Castro†, J. González Rodríguez†, Christian Rathgeb⋆, Christoph Busch⋆

⋆Hochschule Darmstadt, CRISP, CASED, da/sec Security Research Group
  • †Universidad Autónoma de Madrid, ATVS Biometric Recognition Group

NIST IBPC’16, Gaithersburg, 03.05.2016

Nautsch, Ramos, et al. Bayesian Biometrics / NIST IBPC’16, Gaithersburg, 03.05.2016 1/32

slide-2
SLIDE 2

Outline

  • 1. Decision Frameworks in Biometrics and Forensics
  • 2. Bayesian Method: making good decisions
  • 3. Metrics, operating points and examples
  • 4. Conclusion


slide-3
SLIDE 3

Decision Frameworks: Biometric Systems in ISO/IEC JTC1 SC37 SD11

⇒ Note: separate decision subsystem

slide-4
SLIDE 4

Decision Frameworks: Making Decisions with Biometric Systems

Decisions are involved in most applications of biometric systems

◮ Access control: accept/reject decision
◮ Forensic investigation: decide the k-best candidate list to investigate (e.g., AFIS)
◮ Intelligence: decide where to establish relevant links in a database
◮ Forensic evaluation: communicate for the court to decide a verdict


slide-8
SLIDE 8

Decision Frameworks: Making Decisions with Biometric Systems

◮ The decision maker faces multiple sources of information

The biometric system is one of them, but also . . .

◮ Prior knowledge about users/impostors/suspects
◮ Other evidence from other biometric systems
◮ . . .

◮ Decisions must consider all that information
◮ Formalizing the decision framework helps, especially in complex problems
◮ Example: medical diagnosis support

slide-9
SLIDE 9

Bayesian Method: Bayesian Decisions with Biometric Systems

◮ A proposal: Bayesian decision theory

◮ Decisions are made based on posterior probabilities
◮ Considering all the relevant information available
◮ Updating strategy: likelihood ratios (LR)

Example: biometric systems in forensic evaluation of the evidence

Prior probability: all information prior to the (forensic) evidence
Posterior probability: all information, including the (forensic) evidence
Weight of the evidence: Likelihood Ratio (LR)

[1] I. Evett: Towards a uniform framework for reporting opinions in forensic science casework, Science and Justice, 1998.

slide-10
SLIDE 10

Bayesian Method: Value of Evidence, the Likelihood Ratio (LR)

◮ Two-class (H1, H2) decision framework
◮ Likelihood ratio: probabilistic value of the evidence, also the ratio of posterior to prior odds

P(H1)/P(H2) × P(E | H1)/P(E | H2) = P(H1 | E)/P(H2 | E)
(prior odds) × (LR) = (posterior odds)

Inference example: prior odds 1:99, i.e. P(H1) = 1%; with LR = 1000, posterior odds 1000:99, i.e. P(H1 | E) ≈ 91%

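The odds update on this slide can be sketched in a few lines of Python (the function name is illustrative, not from the talk):

```python
def posterior_from_prior_and_lr(p_h1: float, lr: float) -> float:
    """Turn a prior probability P(H1) into the posterior P(H1 | E) via Bayes:
    posterior odds = prior odds x LR."""
    prior_odds = p_h1 / (1.0 - p_h1)
    posterior_odds = prior_odds * lr
    return posterior_odds / (1.0 + posterior_odds)

# The slide's inference example: prior odds 1:99 (P(H1) = 1%), LR = 1000
p = posterior_from_prior_and_lr(0.01, 1000.0)
print(round(100 * p))  # -> 91, i.e. posterior odds 1000:99, P(H1 | E) ~ 91%
```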

slide-11
SLIDE 11

Bayesian Method: Decisions Using Biometric Systems

◮ Binary classes (hypotheses): H1 and H2
◮ Inference

◮ Prior probability: before knowing the biometric system outcome
◮ Posterior probability: after the biometric system outcome
◮ The LR is the value of the biometric evidence

⇒ Changes prior odds into posterior odds

[Diagram: prior odds → inference with LR (biometric system) → posterior odds P(H1 | E)/P(H2 | E)]


slide-12
SLIDE 12

Bayesian Method: Decisions Using Biometric Systems

◮ Costs: penalty of making a wrong decision against H1 (Cf1) or against H2 (Cf2).

◮ The costs can be different. Example in access control:

◮ is it worse to reject a genuine user (cost Cf1)
◮ or to accept an impostor (cost Cf2)?

[Diagram: prior odds → inference with LR (biometric system) and costs Cf1, Cf2 → posterior odds]


slide-13
SLIDE 13

Bayesian Method: Decisions Using Biometric Systems

◮ Decision: minimum-risk decision, i.e. minimum mean cost

◮ The decision rule considers

◮ Posterior odds
◮ Costs

[Diagram: prior odds → inference with LR (biometric system), costs Cf1, Cf2 → decision H1 or H2 by comparing P(H1 | E) Cf1 against P(H2 | E) Cf2]


slide-14
SLIDE 14

Bayesian Method: Decision Process, Competences

◮ Total separation between

◮ The comparator (biometric system outputting an LR)
◮ The decision maker (depends on the application)

[Diagram: prior odds → LR → posterior odds → costs Cf1, Cf2 → decision H1 or H2; competence of the comparator (biometric system) vs. competence of the decision maker]


slide-15
SLIDE 15

Bayesian Method: Decision Process, Consequences

◮ Duty of the biometric system: yielding LR values that lead to the correct decisions

◮ The LR should support H1 when H1 is actually true
◮ The LR should support H2 when H2 is actually true

◮ LR values must be calibrated, which leads to better decisions

[Diagram: prior odds → LR → posterior odds → costs Cf1, Cf2 → decision H1 or H2; should lead to the correct decision!]


slide-16
SLIDE 16

Bayesian Method: Biometric Systems

◮ Score-based architecture

◮ Widely deployed
◮ Especially in black-box implementations (COTS)

[Diagram: criminal / suspect → biometric system → score]

◮ The score is in general the only output of the system

◮ It may not be directly interpretable as a likelihood ratio
◮ This depends on its calibration performance

slide-17
SLIDE 17

Bayesian Method: LR-Based Computation with Biometric Systems

◮ A further stage is necessary: a score-to-LR transformation

[Diagram: biometric system → score → score-to-LR stage → LR]

◮ Objective of the score-based architecture: output discriminating scores, i.e. improve ROC/DET curves
◮ Objective of the score-to-LR stage: transform the score into a meaningful LR

⇒ Calibration of LRs [2,3]

[2] N. Brümmer and J. du Preez: Application-Independent Evaluation of Speaker Detection, Computer Speech and Language, 2006.
[3] D. Ramos and J. González Rodríguez: Reliable support: Measuring calibration of likelihood ratios, Forensic Science International, 2013.


slide-20
SLIDE 20

Bayesian Method: Bayesian Decisions, Advantages

◮ The competences of the biometric system are delimited:

◮ Biometric system: comparator
◮ Decision maker: final decision considering all the information
◮ Separation of roles: important in some fields (e.g. forensics)!

◮ Information is integrated formally

⇒ LR within a probabilistic framework

◮ LR computation: great experience in other fields

⇒ Example: forensic biometrics

[Diagram: prior odds → LR (biometric system), costs Cf1, Cf2 → decision H1 or H2]


slide-21
SLIDE 21

Metrics and Examples: Revisiting ISO/IEC JTC1 SC37 SD11

[Diagram: FNMR, FMR → DET; prior odds P(H1)/P(H2) = π/(1−π) ⇒ π; with costs Cf1, Cf2: DCF → APE & NBER, ECE, Cllr]



slide-26
SLIDE 26

Metrics and Examples: Detection Error Trade-off (DET) Diagrams

[Figures: score pdfs under H1 and H2; FNMR and FMR vs. threshold; steppy DET curve, FMR vs. FNMR in %, with 30 false non-matches]

[4] N. Brümmer and E. de Villiers: The BOSARIS Toolkit User Guide: Theory, Algorithms and Code for Binary Classifier Score Processing, Tech. Rep., AGNITIO Research, 2011.
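The steppy DET curve of the figure is traced by sweeping a threshold over empirical error rates; a minimal sketch with toy scores (the helper name is illustrative):

```python
def fnmr_fmr(genuine_scores, impostor_scores, threshold):
    """Empirical error rates at one decision threshold:
    FNMR: fraction of genuine (H1) scores below the threshold,
    FMR:  fraction of impostor (H2) scores at or above it."""
    fnmr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    fmr = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return fnmr, fmr

genuine = [2.0, 3.5, -0.5, 4.0]     # toy H1 scores
impostor = [-3.0, -1.0, 0.5, -2.5]  # toy H2 scores
print(fnmr_fmr(genuine, impostor, threshold=0.0))  # -> (0.25, 0.25)
```

Sweeping the threshold over all observed scores yields the (FMR, FNMR) pairs plotted in a DET diagram.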

slide-27
SLIDE 27

Metrics and Examples: From Bayesian Decisions to Cost Functions

◮ Bayes theorem: P(H1)/P(H2) × P(E | H1)/P(E | H2) = P(H1 | E)/P(H2 | E), i.e. prior odds × LR = posterior odds

◮ Decision rule: decide H1 iff P(H1 | E) Cf1 ≥ P(H2 | E) Cf2 ⇔ P(H1 | E)/P(H2 | E) ≥ Cf2/Cf1

◮ Bayesian threshold η for log-LRs (LLRs) by posterior odds: decide H1 iff

LLR ≥ η = log(Cf2/Cf1) − log(P(H1)/P(H2))
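The threshold and the decision rule above can be computed directly; a minimal sketch (function names are illustrative):

```python
import math

def bayes_threshold(p_h1: float, c_f1: float, c_f2: float) -> float:
    """Bayesian LLR threshold: eta = log(Cf2/Cf1) - log(P(H1)/P(H2))."""
    return math.log(c_f2 / c_f1) - math.log(p_h1 / (1.0 - p_h1))

def decide(llr: float, eta: float) -> str:
    """Minimum-risk decision: support H1 iff LLR >= eta."""
    return "H1" if llr >= eta else "H2"

eta = bayes_threshold(p_h1=0.5, c_f1=1.0, c_f2=1.0)
print(eta, decide(2.3, eta))  # -> 0.0 H1 (equal priors and costs put eta at 0)
```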


slide-30
SLIDE 30

Metrics and Examples: From Bayesian Decisions to Cost Functions

◮ Bayesian error rate: the Decision Cost Function (DCF)

DCF(P(H1), P(H2), Cf1, Cf2) = P(H1) FNMR(η) Cf1 + P(H2) FMR(η) Cf2, with η = log(Cf2/Cf1) − log(P(H1)/P(H2))

◮ Simplifying the operating point: (P(H1), P(H2), Cf1, Cf2) → π̃

  • 1. Mutually exclusive priors: log(P(H1)/P(H2)) = log(π/(1−π)) = logit π

DCF(π, Cf1, Cf2) = π FNMR(η) Cf1 + (1 − π) FMR(η) Cf2

  • 2. Introducing an effective prior: π̃ = π Cf1 / (π Cf1 + (1−π) Cf2)

DCF(π̃) = π̃ FNMR(η) + (1 − π̃) FMR(η) = DCF(π̃, 1, 1), with η = − logit π̃

⇒ meaningful LLR operating points: π̃ or η

[4] N. Brümmer and E. de Villiers: The BOSARIS Toolkit User Guide: Theory, Algorithms and Code for Binary Classifier Score Processing, Tech. Rep., AGNITIO Research, December 2011.
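Step 2 above, collapsing priors and costs into one effective prior, can be sketched as follows (a 1:100 cost ratio is used as a toy operating point):

```python
import math

def effective_prior(pi: float, c_f1: float, c_f2: float) -> float:
    """pi~ = pi*Cf1 / (pi*Cf1 + (1-pi)*Cf2): one number for the operating point."""
    return pi * c_f1 / (pi * c_f1 + (1.0 - pi) * c_f2)

def dcf(pi_eff: float, fnmr: float, fmr: float) -> float:
    """DCF(pi~) = pi~ * FNMR(eta) + (1 - pi~) * FMR(eta)."""
    return pi_eff * fnmr + (1.0 - pi_eff) * fmr

# equal priors, but errors against H2 are 100x as costly (a 1:100 cost ratio)
pi_eff = effective_prior(pi=0.5, c_f1=1.0, c_f2=100.0)
eta = -math.log(pi_eff / (1.0 - pi_eff))  # eta = -logit(pi~)
print(round(pi_eff, 4), round(eta, 2))    # -> 0.0099 4.61, i.e. pi~ = 1/101, eta = log(100)
```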


slide-35
SLIDE 35

Metrics and Examples: Example on Decision Cost Functions (DCFs)

◮ Speaker recognition i-vector/PLDA scores (I4U list / NIST SRE'12)

[Figures: LLR pdfs under H1 and H2; FNMR and FMR vs. threshold η]

◮ Example: DCF(1:1, η = 0) vs. DCF(1:100, η ≈ 4.6)

[Figures: cost vs. η for DCF(π̃ = 1/2) and DCF(π̃ = 1/101), actual and minimum]

⇒ actual vs. minimum DCF: calibration loss
⇒ LLR meaning: aligning scores for Bayesian support



slide-41
SLIDE 41

Metrics and Examples: Visualizing DCFs

◮ Applied Probability of Error (APE) curve

◮ Simulating DCFs over multiple operating points
◮ default: all LLRs = 0, i.e. DCF = min(π̃, 1 − π̃)

◮ Area under the APE curve: cost of the LLR scores

⇒ Goodness of LLRs: Cllr

[Figure: APE plot, P(error) vs. logit π̃ = −η, with actual DCF, minimum DCF and default curves; EER 0.5%; interpolation by max. minDCF / min. ROC convex hull (ROCCH); the areas give Cllr and Cllr^min]

[5] N. Brümmer: FoCal: Tools for Fusion and Calibration of automatic speaker detection systems, Tech. Rep., 2005.
[6] D.A. van Leeuwen and N. Brümmer: An Introduction to Application-Independent Evaluation of Speaker Recognition Systems, Speaker Classification I: Fundamentals, Features, and Methods, Springer LNCS, 2007.

slide-42
SLIDE 42

Metrics and Examples: Normalized Bayesian Error Rate (NBER)

◮ The APE plot is visually misleading on error impact

◮ EER operating point: lots of scores to mismatch
◮ FMR1000 operating point: few scores to mismatch

◮ Normalizing by the default performance

⇒ a wider range of operating points can be compared

[Figure: NBER vs. η = −logit π̃, with default (LLRs = 0), actual DCFnorm and minimum DCFnorm curves; 30 FMs / 30 FNMs bounds]

[4] N. Brümmer and E. de Villiers: The BOSARIS Toolkit User Guide: Theory, Algorithms and Code for Binary Classifier Score Processing, Tech. Rep., AGNITIO Research, December 2011. Note: in the BOSARIS toolkit, the x-axis is swapped, i.e. depicting purely the effective prior.
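Assuming the default system (all LLRs = 0) decides from the prior alone, its cost is min(π̃, 1 − π̃), and the normalization can be sketched as (names illustrative):

```python
def nber(pi_eff: float, fnmr: float, fmr: float) -> float:
    """Normalized Bayesian error rate: DCF at pi~ divided by the default cost,
    where the default system (all LLRs = 0) decides from the prior alone."""
    dcf = pi_eff * fnmr + (1.0 - pi_eff) * fmr
    default = min(pi_eff, 1.0 - pi_eff)
    return dcf / default

# values below 1.0 mean the system beats prior-only decisions at this operating point
print(nber(0.5, 0.1, 0.1))  # -> 0.2
```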

slide-43
SLIDE 43

Metrics and Examples: Revisiting ISO/IEC JTC1 SC37 SD11

[Diagram: FNMR, FMR → DET; prior odds P(H1)/P(H2) = π/(1−π) ⇒ π; with costs Cf1, Cf2: DCF → APE & NBER, ECE, Cllr; here highlighting ECE]


slide-44
SLIDE 44

Metrics and Examples: Empirical Cross-Entropy (ECE)

◮ Objective measure of performance
◮ Motivated by information theory

◮ Prior entropy → (evidence: information gain) → posterior entropy

◮ Divergence of the system from the ground truth (GoT)
◮ ECE: approximating the Kullback-Leibler divergence D(GoT || system)

[Diagram: Hsystem(H1, H2) decomposed into D(GoT || system)(H1, H2 | LLRs) and HGoT(H1, H2 | LLRs), the information carried by the LLRs]


slide-45
SLIDE 45

Metrics and Examples: Empirical Cross-Entropy (ECE)

◮ We expect the reference, but obtain the system's LLRs
◮ Measuring the performance of LRs in terms of uncertainty

◮ The lower, the better
◮ Calibration loss: overall performance vs. discriminating power

◮ Cllr is the ECE at log(odds) = 0

⇒ no prior information favoring H1 or H2

[Figure: ECE vs. prior log10(odds), with system, optimally calibrated and default (LLRs = 0) curves; the values at log(odds) = 0 give Cllr and Cllr^min]

[7] D. Ramos Castro and J. González Rodríguez: Cross-entropy Analysis of the Information in Forensic Speaker Recognition, Odyssey, 2008.


slide-47
SLIDE 47

Metrics and Examples: Examples

◮ Signature recognition [8]

◮ Performance of feature-space normalization
◮ Simulation of application-independent decision performance

[Figures: ECE vs. prior log10(odds) for the baseline GMM-UBM and with feature warping]

[8] A. Nautsch, C. Rathgeb, C. Busch: Bridging Gaps: An Application of Feature Warping to Online Signature Verification, ICCST, 2014.

slide-48
SLIDE 48

Metrics and Examples: Examples

◮ Speaker recognition [9]

◮ Overview of application-dependent decision costs in the 10 dB / 10 s condition
◮ Conventional vs. quality-based score normalization

[Figure: NBER vs. η = −logit π̃ for the baseline and the quality-based variant, each with DCFnorm, minDCFnorm and 30 FMs / 30 FNMs bounds]

[9] A. Nautsch, R. Saeidi, C. Rathgeb, C. Busch: Analysis of mutual duration and noise effects in speaker recognition: benefits of condition-matched cohort selection in score normalization, Interspeech, 2015.

slide-49
SLIDE 49

Metrics and Examples: Examples

◮ Speaker recognition [10]

◮ Examining calibration schemes in 55 quality conditions
◮ Discrimination vs. calibration loss on the 55-pooled condition
◮ Goal: approximate the binning performance while avoiding binning

[Figure: Cllr in bits, split into discrimination and calibration loss, for the calibration schemes: conventional, QMF, FQE, binning]

[10] A. Nautsch, R. Saeidi, C. Rathgeb, C. Busch: Robustness of Quality-based Score Calibration of Speaker Recognition Systems with respect to low-SNR and short-duration conditions, Odyssey, 2016 (to appear).

slide-50
SLIDE 50

Metrics and Examples: Examples

◮ Recurring challenges in biometrics

◮ NIST Speaker Recognition Evaluation (SRE)

⇒ DCFs (since 1996) & Cllr (since 2006)

◮ ICDAR Competition on Signature Verification and Writer Identification (SigWIcomp)

⇒ Cllr & Cllr^min (both since 2011)

◮ Non-biometric forensics [11]

◮ Glass objects
◮ Car paints
◮ Inks

[11] G. Zadora, A. Martyna, D. Ramos, C. Aitken: Statistical Analysis in Forensic Science: Evidential Values of Multivariate Physicochemical Data, John Wiley and Sons, 2014.

slide-51
SLIDE 51

Conclusion: Summary

◮ Bayesian decision framework

◮ Bayes theorem & decision rule employing costs
◮ Biometric systems: generators of Bayesian support (LLRs)
◮ Decisions by posterior knowledge of priors and the LLR score

◮ Score-to-LLR calibration: meaningful LLRs

◮ A necessary step, requiring a calibration data set
◮ Essential for validation/accreditation

◮ Performance reporting

◮ Decoupled decision policy
◮ APE curves
◮ NBER diagrams
◮ ECE plots
◮ Scalars: actDCF, minDCF, Cllr & Cllr^min

[Figure: Cllr in bits, split into discrimination and calibration loss]


slide-53
SLIDE 53

Conclusion: Perspectives

◮ From forensics to biometrics in general
◮ Forensics: distinct separation of role provinces

[Diagram: suspect reference and recovered probe → feature extraction → evidence analysis (comparison) → score → guilty (accept) / not guilty (reject); province of the forensic scientist vs. province of the court]

⇒ Non-forensic biometric companion/equivalent: . . . vendor system vs. . . . customer decision policy

Note: neither forensic scientists nor courts shall be automated; it's an analogy.

slide-54
SLIDE 54

Conclusion: Application Fields

◮ Operating-point-independent performance reporting

◮ Discrimination loss → goodness of scores without calibration
◮ System calibration (meaningful)
◮ Forensic state of the art

⇒ European Network of Forensic Science Institutes (ENFSI): adopted the Bayesian methodology (strong recommendation)

◮ Fixed operational testing: not needed

⇒ But: fundamental in technology testing

This work has been funded by the Center for Advanced Security Research Darmstadt (CASED), and the Hesse government (project no. 467/15-09, BioMobile).

slide-56
SLIDE 56

Evaluation of evidence strength

◮ Metrics in the Bayesian framework

◮ Application-independent generalization [2]: goodness of (log-likelihood-ratio) scores, Cllr

Cllr = 0.5/|H1| · Σ_{S∈H1} log2(1 + e^(−S)) + 0.5/|H2| · Σ_{S∈H2} log2(1 + e^(S))

◮ Information-theoretic generalization [7]: Empirical Cross-Entropy (ECE)

ECE = π/|H1| · Σ_{S∈H1} log2(1 + e^(−(S + logit π))) + (1−π)/|H2| · Σ_{S∈H2} log2(1 + e^(S + logit π))

◮ Both metrics represent (cross-)entropy in bits
◮ Performance reporting with a decoupled decision layer

[2] N. Brümmer and J. du Preez: Application-Independent Evaluation of Speaker Detection, Computer Speech and Language, 2006.
[7] D. Ramos Castro and J. González Rodríguez: Cross-entropy Analysis of the Information in Forensic Speaker Recognition, Odyssey, 2008.
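The two formulas above can be transcribed almost directly (log2 is the "ld" of the slide; the scores below are toy values, not from the talk):

```python
import math

def cllr(llrs_h1, llrs_h2):
    """Cllr in bits: mean log2-cost of the LLRs under H1 and H2, equally weighted."""
    c1 = sum(math.log2(1.0 + math.exp(-s)) for s in llrs_h1) / len(llrs_h1)
    c2 = sum(math.log2(1.0 + math.exp(s)) for s in llrs_h2) / len(llrs_h2)
    return 0.5 * c1 + 0.5 * c2

def ece(llrs_h1, llrs_h2, pi):
    """Empirical cross-entropy at prior pi: the LLRs are shifted by logit(pi)."""
    lo = math.log(pi / (1.0 - pi))
    t1 = sum(math.log2(1.0 + math.exp(-(s + lo))) for s in llrs_h1) / len(llrs_h1)
    t2 = sum(math.log2(1.0 + math.exp(s + lo)) for s in llrs_h2) / len(llrs_h2)
    return pi * t1 + (1.0 - pi) * t2

print(cllr([0.0], [0.0]))      # -> 1.0 bit: uninformative LLRs
print(ece([0.0], [0.0], 0.5))  # -> 1.0 as well: Cllr equals ECE at prior log-odds 0
```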

slide-57
SLIDE 57

Brief introduction to calibration

◮ Linear: logistic regression (robust model)

◮ Transform: Scal = w0 + w1 · S

◮ Non-linear: Pool-Adjacent-Violators (PAV) algorithm (optimal)

◮ Transform: monotonic, non-parametric mapping function

[Figure: PAV-calibrated LLRs vs. system scores, a monotonic step mapping]
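A compact version of the PAV step (pooling adjacent violators of monotonicity) might look like this; turning the pooled proportions into LLRs requires a further logit-minus-prior step, which is omitted here:

```python
def pav(values):
    """Isotonic (non-decreasing) fit by Pool-Adjacent-Violators.
    For calibration, `values` are H1-labels (1/0) ordered by increasing score;
    the pooled block means are then monotone posterior estimates."""
    blocks = []  # list of (mean, count) per pooled block
    for v in values:
        blocks.append((float(v), 1))
        # merge backwards while a block exceeds its successor (a violation)
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            (m1, n1), (m2, n2) = blocks[-2], blocks[-1]
            blocks[-2:] = [((m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2)]
    out = []
    for m, n in blocks:
        out.extend([m] * n)
    return out

# toy labels sorted by system score: the dip at position 3 gets pooled away
print(pav([0, 1, 0, 1, 1]))  # -> [0.0, 0.5, 0.5, 1.0, 1.0]
```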