

slide-1
SLIDE 1

A Bayesian nonparametric method for the LR assessment in case of rare type match

Giulia Cereda October 8, 2015

Giulia Cereda () Short title October 8, 2015 1 / 33

slide-9
SLIDE 9

Forensic Statistics

Ingredients:

  • Crime case
  • Evidence (E)
  • 2 hypotheses of interest: Hp vs Hd
  • Background (B)
  • D = (E, B)

The court asks for the likelihood ratio (LR):

\[
\frac{\Pr(H_p \mid D)}{\Pr(H_d \mid D)}
= \underbrace{\frac{\Pr(D \mid H_p)}{\Pr(D \mid H_d)}}_{\text{LR}} \cdot \frac{\Pr(H_p)}{\Pr(H_d)}
\]
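The odds form of Bayes' rule can be illustrated with a few lines of Python. This is only a sketch: the function names and the numbers (an LR of 1000 and prior odds of 1 to 10000) are hypothetical and chosen purely for illustration.

```python
def posterior_odds(likelihood_ratio: float, prior_odds: float) -> float:
    """Posterior odds of Hp against Hd: LR times prior odds."""
    return likelihood_ratio * prior_odds


def odds_to_probability(odds: float) -> float:
    """Convert odds in favour of Hp into the probability of Hp."""
    return odds / (1.0 + odds)


# Hypothetical inputs, not taken from the slides.
lr = 1000.0
prior = 1.0 / 10000.0

post_odds = posterior_odds(lr, prior)
post_prob = odds_to_probability(post_odds)
print(post_odds, post_prob)
```

The statistician reports only the LR; the prior odds, and therefore the posterior odds, are left to the court.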

slide-16
SLIDE 16

Example

Ingredients:

  • Crime case: murder
  • Evidence (E): the profile of the DNA trace found at the crime scene matches the suspect’s DNA profile.
  • 2 hypotheses of interest:
      Hp: the suspect left the stain
      Hd: someone else left the stain
  • Background (B): database of DNA profiles from the population of possible perpetrators

slide-20
SLIDE 20

DNA profiles

A DNA profile is a list of integers, e.g. h = (4 − 5 − 2 − 10), that codes some characteristics of certain portions of an individual's DNA sequence: different persons can share the same profile.

  • Under Hp the match is a sure event.
  • Under Hd the match is a random event with probability $p_h$, the frequency of the suspect's profile h in the population of possible perpetrators.

slide-23
SLIDE 23

DNA database

Database: a list of DNA profiles from a sample of the population of possible perpetrators.

DATABASE of size 10

  Person 1     (4 − 10 − 6 − 7)
  Person 2     (3 − 5 − 6 − 8)
  Person 3     (3 − 7 − 8 − 10)
  Person 4     (10 − 1 − 4 − 5)
  Person 5     (3 − 7 − 8 − 10)
  Person 6     (3 − 7 − 8 − 10)
  Person 7     (1 − 5 − 7 − 2)
  Person 8     (3 − 7 − 8 − 10)
  Person 9     (3 − 5 − 6 − 8)
  Person 10    (3 − 7 − 8 − 10)

The database is used to find out the rarity of the matching profile.

slide-28
SLIDE 28

LR assessment in the rare type match case

My research focuses on the LR assessment in the rare type match case, that is:

  • There is a match between the suspect’s DNA profile and the crime stain’s DNA profile.
  • This profile is not contained in the database B.
  • Especially if the database is big, the profile seems to be rare. How rare?
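To see why the rare type match case needs special treatment, consider the naive estimate of a profile's frequency: its relative frequency in the database. The Python sketch below uses the ten-person example database from these slides; the unseen profile (2, 2, 9, 9) and the helper name are hypothetical. This is not the method proposed in the talk, only an illustration of the problem.

```python
from collections import Counter

# Example database of size 10 from the slides (one tuple per person).
database = [
    (4, 10, 6, 7),
    (3, 5, 6, 8),
    (3, 7, 8, 10),
    (10, 1, 4, 5),
    (3, 7, 8, 10),
    (3, 7, 8, 10),
    (1, 5, 7, 2),
    (3, 7, 8, 10),
    (3, 5, 6, 8),
    (3, 7, 8, 10),
]

counts = Counter(database)


def relative_frequency(profile):
    """Naive estimate: share of database entries with this profile."""
    return counts[profile] / len(database)


common = relative_frequency((3, 7, 8, 10))  # profile seen 5 times
unseen = relative_frequency((2, 2, 9, 9))   # hypothetical profile, never seen
print(common, unseen)
```

For a profile that never appears in the database the naive estimate is zero, so it says nothing about how rare the profile actually is.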

slide-32
SLIDE 32

Previous models

  • Frequentist model: (Cereda 2015) Frequentist approach to LR assessment in case of rare haplotype match. arXiv:1502.04083
  • Bayesian model: (Cereda 2015) Full Bayesian approach to LR assessment in case of rare haplotype match. arXiv:1502.02406
  • (Cereda 2015) Nonparametric Bayesian approach to LR assessment in case of rare haplotype match. arXiv:1506.08444

slide-36
SLIDE 36

Assumptions

Assumption 1

There are so many different DNA types that they may be considered infinite.

Parameter: $p = (p_t \mid t \in T)$, with $T$ a countably infinite set, $p_t > 0$ and $\sum_{t \in T} p_t = 1$, representing the (unknown) frequencies of all DNA types in Nature.

Assumption 2

The particular list of integers that forms a DNA type is just a category: no structure is assumed. “DNA types” and “colors” are now the same thing.

slide-39
SLIDE 39

Random partitions of [n]

Let [n] denote the set {1, 2, ..., n}.

  • A partition of the set [n] will be denoted as $\pi_{[n]}$.
  • A random partition of the set [n] will be denoted as $\Pi_{[n]}$.

slide-42
SLIDE 42

DNA database can be reduced

DATABASE of size 10

  Person 1     (4 − 10 − 6 − 7)
  Person 2     (3 − 5 − 6 − 8)
  Person 3     (3 − 7 − 8 − 10)
  Person 4     (10 − 1 − 4 − 5)
  Person 5     (3 − 7 − 8 − 10)
  Person 6     (3 − 7 − 8 − 10)
  Person 7     (1 − 5 − 7 − 2)
  Person 8     (3 − 7 − 8 − 10)
  Person 9     (3 − 5 − 6 − 8)
  Person 10    (3 − 7 − 8 − 10)

By Assumption 2, the data can be replaced by the equivalence classes of the indices under the relation “to have the same DNA type”. This is a partition of the set [n]:

  {{1}, {2, 9}, {3, 5, 6, 8, 10}, {4}, {7}}
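The reduction from a list of profiles to a partition of [n] can be sketched in Python as follows. The function name is hypothetical; the database is the ten-person example from the slides, with persons numbered from 1.

```python
from collections import defaultdict

# Example database of size 10 from the slides (one tuple per person).
database = [
    (4, 10, 6, 7),
    (3, 5, 6, 8),
    (3, 7, 8, 10),
    (10, 1, 4, 5),
    (3, 7, 8, 10),
    (3, 7, 8, 10),
    (1, 5, 7, 2),
    (3, 7, 8, 10),
    (3, 5, 6, 8),
    (3, 7, 8, 10),
]


def partition_from_profiles(profiles):
    """Group the indices 1..n by profile; each group is one block."""
    blocks = defaultdict(list)
    for index, profile in enumerate(profiles, start=1):
        blocks[profile].append(index)
    # Order the blocks by their smallest element for a canonical display.
    return sorted(blocks.values(), key=lambda block: block[0])


partition = partition_from_profiles(database)
print(partition)
# [[1], [2, 9], [3, 5, 6, 8, 10], [4], [7]]
```

Only the grouping of the indices survives; the actual integers in each profile are discarded, which is exactly what Assumption 2 permits.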

slide-48
SLIDE 48

Reduced data

The data D consist of the database plus 2 new observations:

  D = $\pi_{[n+2]}$, a partition of the set {1, 2, ..., n + 2}.

Example:

  Database → $\pi_{[10]}$ = {{1}, {2, 9}, {3, 5, 6, 8, 10}, {4}, {7}}
  D → $\pi_{[12]}$ = {{1}, {2, 9}, {3, 5, 6, 8, 10}, {4}, {7}, {11, 12}}

We can see the data as a random variable. In that case, D = $\Pi_{[n+2]}$.

slide-51
SLIDE 51

The distribution of D = $\Pi_{[n+2]}$ depends on p. However, it does not depend on the order of the $p_i$.

↓

We can therefore work directly with the ordered vector

\[
p \in \nabla_\infty = \Big\{ (p_1, p_2, \ldots) : p_1 \ge p_2 \ge \cdots > 0,\ \sum_{i} p_i = 1 \Big\}.
\]

For instance, $p_3$ is the frequency of the third most frequent DNA type in Nature.

slide-54
SLIDE 54

Prior distribution on p ∈ ∇∞

Bayesian nonparametrics: we need a prior for the parameter p.

  • Two-parameter Poisson-Dirichlet distribution, PD(α, θ)
  • Parameters: 0 < α < 1, θ > −α
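One standard way to draw an approximate sample from PD(α, θ) is stick-breaking: draw independent $V_i \sim \mathrm{Beta}(1 - \alpha, \theta + i\alpha)$, set $W_i = V_i \prod_{j<i} (1 - V_j)$, and rank the weights in decreasing order. The Python sketch below is not taken from the slides; the function name, the truncation at a finite number of sticks and the parameter values are assumptions made for illustration.

```python
import random


def sample_ranked_weights(alpha, theta, n_sticks=1000, seed=0):
    """Truncated stick-breaking draw, ranked in decreasing order.

    Returns the ranked weights and the mass left unassigned by the
    truncation, so the result is only an approximation to PD(alpha, theta).
    """
    rng = random.Random(seed)
    remaining = 1.0  # length of the stick not yet broken off
    weights = []
    for i in range(1, n_sticks + 1):
        v = rng.betavariate(1.0 - alpha, theta + i * alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.sort(reverse=True)
    return weights, remaining


# Hypothetical parameter values satisfying 0 < alpha < 1 and theta > -alpha.
weights, leftover = sample_ranked_weights(alpha=0.5, theta=10.0, n_sticks=500, seed=1)
print(weights[:3], leftover)
```

The ranking is exact only in the limit of infinitely many sticks; with a finite truncation the unassigned mass `leftover` indicates how much of the tail has been ignored.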

slide-65
SLIDE 65

The model (first part)

  • $(A, \Theta) \sim f$
  • $P \in \nabla_\infty$, with $P \mid \alpha, \theta \sim \mathrm{PD}(\alpha, \theta)$
  • $X_1, X_2, \ldots, X_n \in \mathbb{N}$, where $X_i = j$ means that the i-th observation has the j-th most common type in Nature
  • Suspect: $X_{n+1}$, with $X_1, \ldots, X_{n+1} \mid p \overset{\text{i.i.d.}}{\sim} p$
  • $H \in \{H_p, H_d\}$
  • Crime stain: $X_{n+2}$, with

\[
X_{n+2} \mid p, H, x_{n+1} \sim
\begin{cases}
\delta_{x_{n+1}} & \text{if } H = H_p \\
p & \text{if } H = H_d
\end{cases}
\]

slide-66
SLIDE 66

The model (first part)

[Figure: graphical model with nodes (A, Θ), P, X1, X2, ..., Xn, Xn+1, Xn+2 and H]

slide-69
SLIDE 69

Random partitions

Some notation: given random variables $X_1, \ldots, X_n \in \mathbb{N}$, $\Pi_{[n]}(X_1, X_2, \ldots, X_n)$ is the random partition defined by the equivalence classes of the relation $i \sim j$ iff $X_i = X_j$.

\[
\begin{aligned}
X_1, \ldots, X_n &\longrightarrow \Pi_{[n]} = \pi^{Db}_{[n]} \\
X_1, \ldots, X_n, X_{n+1} &\longrightarrow \Pi_{[n+1]} = \pi^{Db+}_{[n+1]} \\
X_1, \ldots, X_n, X_{n+1}, X_{n+2} &\longrightarrow \Pi_{[n+2]} = \pi^{Db++}_{[n+2]}
\end{aligned}
\]

$X_1, \ldots, X_n$ are not observed, but generate the same partition as the original database.

Data can be defined as D = $\Pi_{[n+2]}$.

slide-70
SLIDE 70

The complete model

[Figure: graphical model with nodes (A, Θ), H, P, X1, X2, ..., Xn, Xn+1, Xn+2 and D]

slide-75
SLIDE 75

Pitman sampling formula

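If

\[
P \sim \mathrm{PD}(\alpha, \theta), \qquad X_1, X_2, \ldots, X_n \mid P = p \overset{\text{i.i.d.}}{\sim} p,
\]

then $\Pi_{[n]} = \Pi_{[n]}(X_1, \ldots, X_n)$ has the following distribution:

\[
\Pr(\Pi_{[n]} = \pi_{[n]} \mid \alpha, \theta) = P^{n}_{\alpha,\theta}(\pi_{[n]})
= \frac{[\theta + \alpha]_{k-1;\alpha}}{[\theta + 1]_{n-1;1}} \prod_{i=1}^{k} [1 - \alpha]_{n_i - 1;1}.
\]

In our model

\[
\Pr(D \mid \alpha, \theta, h) = \Pr(\Pi_{[n+2]} = \pi^{Db++}_{[n+2]} \mid \alpha, \theta, h) =
\begin{cases}
P^{n+2}_{\alpha,\theta}(\pi^{Db++}_{[n+2]}) & \text{if } h = H_d \\
P^{n+1}_{\alpha,\theta}(\pi^{Db+}_{[n+1]}) & \text{if } h = H_p
\end{cases}
\]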
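The Pitman sampling formula is easy to evaluate numerically once the bracket notation is fixed. The Python sketch below assumes the usual convention $[x]_{m;a} = x (x + a) \cdots (x + (m-1) a)$ with $[x]_{0;a} = 1$, and takes $k$ as the number of blocks and $n_i$ as the block sizes; the slides do not spell these out, so treat them as assumptions. The function names and the parameter values are hypothetical.

```python
import math


def log_rising(x, m, step):
    """log of [x]_{m;step} = x (x + step) ... (x + (m - 1) step); 0.0 if m == 0."""
    return sum(math.log(x + j * step) for j in range(m))


def log_pitman_probability(block_sizes, alpha, theta):
    """Log probability of one partition with the given block sizes."""
    n = sum(block_sizes)
    k = len(block_sizes)
    log_p = log_rising(theta + alpha, k - 1, alpha)
    log_p -= log_rising(theta + 1.0, n - 1, 1.0)
    log_p += sum(log_rising(1.0 - alpha, size - 1, 1.0) for size in block_sizes)
    return log_p


# Hypothetical parameter values with 0 < alpha < 1 and theta > -alpha.
alpha, theta = 0.5, 10.0

# Example from the slides: database blocks of sizes 1, 2, 5, 1, 1 (n = 10),
# plus a singleton for the suspect, then the crime stain joining that block.
db_plus = [1, 2, 5, 1, 1, 1]        # partition of [n + 1]
db_plus_plus = [1, 2, 5, 1, 1, 2]   # partition of [n + 2]

ratio = math.exp(
    log_pitman_probability(db_plus_plus, alpha, theta)
    - log_pitman_probability(db_plus, alpha, theta)
)
expected = (1.0 - alpha) / (10 + 1 + theta)  # (1 - alpha) / (n + 1 + theta)
print(ratio, expected)
```

Working on the log scale keeps the computation stable for large n, where the individual products would underflow.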

slide-77
SLIDE 77

The model, simplified

[Figure: graphical model with nodes (A, Θ), H and D]

D = $\Pi_{[n+2]}$.

slide-79
SLIDE 79

Lemma

[Figure: graphical model with nodes A, H, X and Y]

Lemma

Given four random variables A, H, X and Y, as above, the likelihood function for h, given X = x and Y = y, satisfies

\[
\operatorname{lik}(h \mid x, y) \propto E\big(p(y \mid x, A, h) \mid X = x\big).
\]

slide-81
SLIDE 81

Lemma

[Figure: graphical model with nodes (A, Θ), H, $\Pi_{[n+1]}$ and $\Pi_{[n+2]}$]

\[
\operatorname{lik}(h \mid \pi_{[n+1]}, \pi_{[n+2]}) \propto E\big(p(\pi_{[n+2]} \mid \pi_{[n+1]}, A, \Theta, h) \mid \Pi_{[n+1]} = \pi_{[n+1]}\big).
\]

slide-85
SLIDE 85

Likelihood ratio

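\[
\mathrm{LR} = \frac{p(\pi_{[n+2]} \mid H_p)}{p(\pi_{[n+2]} \mid H_d)}
= \frac{p(\pi_{[n+1]}, \pi_{[n+2]} \mid H_p)}{p(\pi_{[n+1]}, \pi_{[n+2]} \mid H_d)}
= \frac{\operatorname{lik}(H_p \mid \pi_{[n+1]}, \pi_{[n+2]})}{\operatorname{lik}(H_d \mid \pi_{[n+1]}, \pi_{[n+2]})}
\]

The Lemma allows us to write

\[
\mathrm{LR}
= \frac{E\big(\overbrace{p(\pi_{[n+2]} \mid \pi_{[n+1]}, A, \Theta, H_p)}^{1} \,\big|\, \Pi_{[n+1]} = \pi_{[n+1]}\big)}
       {E\big(\underbrace{p(\pi_{[n+2]} \mid \pi_{[n+1]}, A, \Theta, H_d)}_{\frac{1-A}{n+1+\Theta}} \,\big|\, \Pi_{[n+1]} = \pi_{[n+1]}\big)}
= \frac{1}{E\big(\frac{1-A}{n+1+\Theta} \,\big|\, \Pi_{[n+1]} = \pi_{[n+1]}\big)}.
\]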
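The value $\frac{1-A}{n+1+\Theta}$ under $H_d$ can be checked against the Pitman sampling formula. The sketch below assumes the usual convention $[x]_{m;a} = \prod_{j=0}^{m-1} (x + ja)$, which the slides do not spell out. In the rare type match case the suspect's type forms a singleton block of $\pi^{Db+}_{[n+1]}$; under $H_d$ the crime stain joins that block, so the number of blocks $k$ is unchanged and one block grows from size 1 to size 2:

\[
p(\pi^{Db++}_{[n+2]} \mid \pi^{Db+}_{[n+1]}, \alpha, \theta, H_d)
= \frac{P^{n+2}_{\alpha,\theta}(\pi^{Db++}_{[n+2]})}{P^{n+1}_{\alpha,\theta}(\pi^{Db+}_{[n+1]})}
= \frac{[\theta+1]_{n;1}}{[\theta+1]_{n+1;1}} \cdot \frac{[1-\alpha]_{1;1}}{[1-\alpha]_{0;1}}
= \frac{1-\alpha}{n+1+\theta}.
\]

Under $H_p$ the crime stain is a copy of the suspect's type, so the corresponding conditional probability is 1.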

slide-91
SLIDE 91

\[
\mathrm{LR} = \frac{1}{E\big(\frac{1-A}{n+1+\Theta} \,\big|\, \Pi_{[n+1]} = \pi_{[n+1]}\big)}
\]

By defining the random variable $\Phi = n \, \frac{1-A}{n+1+\Theta}$ we can write the LR as

\[
\mathrm{LR} = \frac{n}{E(\Phi \mid \Pi_{[n+1]} = \pi_{[n+1]})}.
\]

We are interested in the distribution of $\Phi, \Theta \mid \Pi_{[n+1]}$:

\[
p(\phi, \theta \mid \pi_{[n+1]}) \propto p(\pi_{[n+1]} \mid \phi, \theta) \, f(\phi, \theta)
\]

slide-93
SLIDE 93

Log likelihood with φ and θ

[Figure: contour plot of $\log_{10} p(\pi_{[n+1]} \mid \phi, \theta)$ with θ and φ on the axes and the point $(\phi_{\mathrm{MLE}}, \theta_{\mathrm{MLE}})$ marked. The contours of $\log L(\phi, \theta \mid \pi_{[n+1]})$ are compared with those of $\log N\big((\phi_{\mathrm{MLE}}, \theta_{\mathrm{MLE}}), I_{\mathrm{obs}}^{-1}(\phi_{\mathrm{MLE}}, \theta_{\mathrm{MLE}})\big)$, together with the 95% and 99% confidence intervals. Dutch Y-STR database, 7 loci, N = 18,925.]

slide-98
SLIDE 98

Log likelihood as a function of φ and θ

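[Figure: contour plot of the log likelihood with θ and φ on the axes and the point $(\phi_{\mathrm{MLE}}, \theta_{\mathrm{MLE}})$ marked. The contours of $\log L(\phi, \theta \mid \pi_{[n+1]})$ are compared with those of $\log N\big((\phi_{\mathrm{MLE}}, \theta_{\mathrm{MLE}}), I_{\mathrm{obs}}^{-1}(\phi_{\mathrm{MLE}}, \theta_{\mathrm{MLE}})\big)$, together with the 95% and 99% confidence intervals.]

As a function of (φ, θ), the likelihood is approximately Gaussian:

\[
p(\pi_{[n+1]} \mid \phi, \theta) \approx N\big((\phi_{\mathrm{MLE}}, \theta_{\mathrm{MLE}}), I^{-1}_{\mathrm{MLE}}\big)
\]

\[
p(\phi, \theta \mid \pi_{[n+1]}) \propto p(\pi_{[n+1]} \mid \phi, \theta) \, f(\phi, \theta)
\]

If the prior is smooth around the MLE then

\[
p(\phi, \theta \mid \pi_{[n+1]}) \approx N\big((\phi_{\mathrm{MLE}}, \theta_{\mathrm{MLE}}), I^{-1}_{\mathrm{MLE}}\big).
\]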

slide-101
SLIDE 101

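\[
p(\phi, \theta \mid \pi_{[n+1]}) \approx N\big((\phi_{\mathrm{MLE}}, \theta_{\mathrm{MLE}}), I^{-1}_{\mathrm{MLE}}\big).
\]

It follows that $E(\Phi \mid \Pi_{[n+1]} = \pi_{[n+1]}) \approx \phi_{\mathrm{MLE}}$, and hence

\[
\mathrm{LR} = \frac{n}{E(\Phi \mid \Pi_{[n+1]} = \pi_{[n+1]})} \approx \frac{n + 1 + \theta_{\mathrm{MLE}}}{1 - \alpha_{\mathrm{MLE}}}.
\]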
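As a rough numerical illustration of the plug-in approximation, the Python sketch below evaluates $(n + 1 + \theta_{\mathrm{MLE}}) / (1 - \alpha_{\mathrm{MLE}})$. The inputs are the rounded values quoted elsewhere in these slides for the Dutch database (N = 18925, $\alpha_{\mathrm{MLE}} = 0.51$, $\theta_{\mathrm{MLE}} = 216$); taking n equal to the full database size is an assumption made here for illustration, and the function name is hypothetical.

```python
import math


def plug_in_lr(n, alpha_mle, theta_mle):
    """Approximate LR obtained by replacing E(Phi | data) with phi_MLE."""
    return (n + 1 + theta_mle) / (1.0 - alpha_mle)


# Rounded values quoted in the slides; n = database size is an assumption.
lr = plug_in_lr(n=18925, alpha_mle=0.51, theta_mle=216.0)
log10_lr = math.log10(lr)
print(lr, log10_lr)
```

With these inputs the approximate LR is of the order of a few times 10^4, which is larger than the database size because α_MLE is well above zero.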

slide-103
SLIDE 103

Sorted relative frequencies: how good is our prior?

Comparison between the spectrum from a big database and simulations from PD(α, θ), using the maximum likelihood estimates of the parameters.

[Figure: ranked relative frequencies against rank, both axes on a log scale, for the database with N = 18925, $\alpha_{\mathrm{MLE}} = 0.51$, $\theta_{\mathrm{MLE}} = 216$. Thick black line: ranked relative frequencies in the database. Thin black lines: simulations from PD($\alpha_{\mathrm{MLE}}$, $\theta_{\mathrm{MLE}}$). Dotted line: asymptotic behavior.]

slide-108
SLIDE 108

The LR when p is known

Imagine we know p. Then

\[
\mathrm{LR} \mid p = \frac{p(\pi^{Db++}_{[n+2]} \mid H_p, p)}{p(\pi^{Db++}_{[n+2]} \mid H_d, p)}
= \frac{1}{E(p_{x_{n+1}} \mid \pi^{Db+}_{[n+1]}, p)},
\]

where the last equality follows by applying the Lemma.

How does this compare with the LR we get with our method when p is unknown?

slide-110
SLIDE 110

Test Dutch database (N=2085, 7 loci)

Database of 2085 Y-STR profiles from Dutch men.

Test: compare the distributions of $\log_{10}(\mathrm{LR} \mid p)$ and $\log_{10} \mathrm{LR}$ obtained from 100 samples of size 100 from this population.

slide-111
SLIDE 111

Results

Compare the distributions of $\log_{10}(\mathrm{LR} \mid p)$ and $\log_{10} \mathrm{LR}$ obtained from 100 samples of size 100 from this population.

[Figure, two panels. (a) Comparison: distributions of $\log_{10}(\mathrm{LR})$ and $\log_{10}(\mathrm{LR} \mid p)$. (b) Error: distribution of the error.]