SLIDE 1 A Bayesian nonparametric method for the LR assessment in case of rare type match
Giulia Cereda October 8, 2015
Giulia Cereda () Short title October 8, 2015 1 / 33
SLIDE 9 Forensic Statistics
Ingredients:
Crime case
Evidence (E)
2 hypotheses of interest: Hp vs Hd
Background (B)
D = (E, B)
The court asks for the likelihood ratio:
\[ \frac{\Pr(H_p \mid D)}{\Pr(H_d \mid D)} = \frac{\Pr(D \mid H_p)}{\Pr(D \mid H_d)} \times \frac{\Pr(H_p)}{\Pr(H_d)} \]
The likelihood ratio (LR) is the first factor on the right-hand side: it turns the prior odds into the posterior odds.
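The odds form of Bayes' theorem on this slide can be sketched in a few lines of Python. The function names and the numbers are hypothetical, chosen only to illustrate the arithmetic; they do not come from the talk.

```python
def posterior_odds(likelihood_ratio: float, prior_odds: float) -> float:
    """Odds form of Bayes' theorem: posterior odds = LR * prior odds."""
    if likelihood_ratio < 0 or prior_odds < 0:
        raise ValueError("odds and likelihood ratios must be non-negative")
    return likelihood_ratio * prior_odds


def odds_to_probability(odds: float) -> float:
    """Convert odds o = p / (1 - p) back to the probability p."""
    return odds / (1.0 + odds)


# Illustrative values only: an LR of 1000 and prior odds of 1 to 10000.
post = posterior_odds(likelihood_ratio=1000.0, prior_odds=1.0 / 10000.0)
print(post, odds_to_probability(post))
```

The split matters in practice: the expert reports only the LR, while the prior odds are left to the court.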
SLIDE 16 Example
Ingredients:
Crime case: murder
Evidence (E): the profile of the DNA trace found at the crime scene matches the suspect’s DNA profile.
2 hypotheses of interest:
Hp: The suspect left the stain
Hd: Someone else left the stain
Background (B): database of DNA profiles from the population of possible perpetrators
SLIDE 20 DNA profiles
A DNA profile is a list of integers, for example h = (4 − 5 − 2 − 10), that codes some characteristics of some portions of an individual's DNA sequence. Different persons can share the same profile.
Under Hp the match is a sure event.
Under Hd the match is a random event with probability p_h, the frequency of the suspect's profile h in the population of possible perpetrators.
SLIDE 23 DNA database
Database: a list of DNA profiles from a sample of the population of possible perpetrators.
DATABASE of size 10
Person 1     (4 − 10 − 6 − 7)
Person 2     (3 − 5 − 6 − 8)
Person 3     (3 − 7 − 8 − 10)
Person 4     (10 − 1 − 4 − 5)
Person 5     (3 − 7 − 8 − 10)
Person 6     (3 − 7 − 8 − 10)
Person 7     (1 − 5 − 7 − 2)
Person 8     (3 − 7 − 8 − 10)
Person 9     (3 − 5 − 6 − 8)
Person 10    (3 − 7 − 8 − 10)
The database is used to assess the rarity of the matching profile.
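A minimal sketch of the naive way to read rarity off the database: count how often a profile occurs. The tuple encoding of profiles and the unseen profile (2, 2, 2, 2) are assumptions made for illustration.

```python
from collections import Counter

# The toy database of size 10 from the slide, one tuple per person.
DATABASE = [
    (4, 10, 6, 7), (3, 5, 6, 8), (3, 7, 8, 10), (10, 1, 4, 5), (3, 7, 8, 10),
    (3, 7, 8, 10), (1, 5, 7, 2), (3, 7, 8, 10), (3, 5, 6, 8), (3, 7, 8, 10),
]


def relative_frequency(profile, database):
    """Fraction of database entries that carry the given profile."""
    counts = Counter(database)
    return counts[profile] / len(database)


print(relative_frequency((3, 7, 8, 10), DATABASE))  # common profile
print(relative_frequency((2, 2, 2, 2), DATABASE))   # hypothetical unseen profile
```

A profile that never occurs in the database gets relative frequency 0, which is clearly unusable as an estimate of its population frequency.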
SLIDE 28 LR assessment in the rare type match case
My research focuses on the LR assessment in the rare type match case, that is:
A match between the suspect’s DNA profile and the crime stain’s DNA profile.
This profile is not contained in the database B.
Especially if the database is big, the profile seems to be rare.
How rare?
SLIDE 32 Previous models
Frequentist model: (Cereda 2015) Frequentist approach to LR assessment in case of rare haplotype match, arXiv:1502.04083
Bayesian model: (Cereda 2015) Full Bayesian approach to LR assessment in case of rare haplotype match, arXiv:1502.02406
(Cereda 2015) Nonparametric Bayesian approach to LR assessment in case of rare haplotype match, arXiv:1506.08444
SLIDE 36 Assumptions
Assumption 1
There are so many different DNA types that they may be considered infinite.
Parameter: $p = (p_t \mid t \in T)$, with $T$ a countably infinite set, $p_t > 0$ and $\sum_{t \in T} p_t = 1$, to represent the (unknown) frequencies of all DNA types in Nature.
Assumption 2
The particular list of integers that forms a DNA type is just a category: no structure is assumed.
“DNA types” and “colors” are now interchangeable.
SLIDE 39 Random partitions of [n]
Let [n] denote the set [n] = {1, 2, ..., n}.
A partition of the set [n] will be denoted as π[n].
Random partitions of the set [n] will be denoted as Π[n].
SLIDE 42 DNA database can be reduced
DATABASE of size 10
Person 1     (4 − 10 − 6 − 7)
Person 2     (3 − 5 − 6 − 8)
Person 3     (3 − 7 − 8 − 10)
Person 4     (10 − 1 − 4 − 5)
Person 5     (3 − 7 − 8 − 10)
Person 6     (3 − 7 − 8 − 10)
Person 7     (1 − 5 − 7 − 2)
Person 8     (3 − 7 − 8 − 10)
Person 9     (3 − 5 − 6 − 8)
Person 10    (3 − 7 − 8 − 10)
Assumption 2 → the data can be replaced by the equivalence classes of the indices under the relation “to have the same DNA type”. This is a partition of the set [n]:
{{1}, {2, 9}, {3, 5, 6, 8, 10}, {4}, {7}}
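The reduction from a list of profiles to a partition of the indices can be sketched as follows. The function name and the tuple encoding of profiles are assumptions made for illustration.

```python
# The toy database of size 10 from the slide, one tuple per person.
DATABASE = [
    (4, 10, 6, 7), (3, 5, 6, 8), (3, 7, 8, 10), (10, 1, 4, 5), (3, 7, 8, 10),
    (3, 7, 8, 10), (1, 5, 7, 2), (3, 7, 8, 10), (3, 5, 6, 8), (3, 7, 8, 10),
]


def partition_from_types(types):
    """Partition of {1, ..., n} induced by the relation 'has the same type'."""
    blocks = {}
    for index, dna_type in enumerate(types, start=1):
        blocks.setdefault(dna_type, []).append(index)
    # Order the blocks by their smallest index so the output is deterministic.
    return sorted(blocks.values(), key=lambda block: block[0])


print(partition_from_types(DATABASE))
```

Only the block structure survives; the actual integers in each profile are discarded, which is exactly what Assumption 2 permits.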
SLIDE 48 Reduced data
Data D is made of the database + 2 new observations (the suspect's profile and the crime stain's profile).
D = π[n+2], a partition of the set {1, 2, ..., n + 2}
Example:
Database → π[10] = {{1}, {2, 9}, {3, 5, 6, 8, 10}, {4}, {7}}
D → π[12] = {{1}, {2, 9}, {3, 5, 6, 8, 10}, {4}, {7}, {11, 12}}
We can see the data as a random variable. In that case, D = Π[n+2].
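In the rare type match case the two new observations share a type that is absent from the database, so they form a new block of size two. A minimal sketch, with a hypothetical function name:

```python
def add_rare_type_match(database_partition):
    """Extend a partition of [n] to a partition of [n + 2] by adding the block
    {n + 1, n + 2}: suspect and crime stain match each other, and their type
    does not occur in the database."""
    n = sum(len(block) for block in database_partition)
    extended = [list(block) for block in database_partition]
    extended.append([n + 1, n + 2])
    return extended


pi_10 = [[1], [2, 9], [3, 5, 6, 8, 10], [4], [7]]
print(add_rare_type_match(pi_10))
```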
SLIDE 51
The distribution of D = Π[n+2] depends on p. However, it does not depend on the order of the p_i.
↓
We can consider directly the ordered vector
\[ p \in \nabla_\infty = \Big\{ (p_1, p_2, \ldots) : p_1 \ge p_2 \ge \cdots > 0,\ \sum_{i} p_i = 1 \Big\}. \]
For instance, p_3 is the frequency of the third most frequent DNA type in Nature.
SLIDE 54 Prior distribution on p ∈ ∇∞
Bayesian nonparametrics: we need a prior for the parameter p.
Two-parameter Poisson-Dirichlet distribution.
Parameters: 0 < α < 1, θ > −α
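One common way to draw an approximate sample from PD(α, θ) is stick-breaking: draw V_i ∼ Beta(1 − α, θ + iα) independently, set the i-th weight to V_i times the stick left over, and rank the weights. The sketch below truncates the infinite sequence at `n_sticks`; the truncation, the function name and its arguments are assumptions for illustration, not part of the talk.

```python
import random


def sample_pd_truncated(alpha, theta, n_sticks=1000, rng=None):
    """Approximate draw from PD(alpha, theta) via truncated stick-breaking.

    Returns the ranked weights and the leftover (unassigned) mass.
    """
    if not 0.0 < alpha < 1.0 or theta <= -alpha:
        raise ValueError("need 0 < alpha < 1 and theta > -alpha")
    if rng is None:
        rng = random.Random()
    remaining = 1.0
    weights = []
    for i in range(1, n_sticks + 1):
        # theta + i * alpha > 0 because theta > -alpha and i >= 1.
        v = rng.betavariate(1.0 - alpha, theta + i * alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.sort(reverse=True)
    return weights, remaining


weights, leftover = sample_pd_truncated(0.5, 10.0, n_sticks=200, rng=random.Random(1))
print(weights[:3], leftover)
```

The leftover mass indicates how much of the distribution the truncation ignores; for α close to 1 it shrinks slowly and more sticks are needed.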
SLIDE 65 The model (first part)
[Diagram: graphical model with nodes (A, Θ), P, X1, X2, ..., Xn, Xn+1, Xn+2 and H.]
(A, Θ) ∼ f
P ∈ ∇∞, with P | α, θ ∼ PD(α, θ)
X1, X2, ..., Xn ∈ N (database): Xi = j means that the i-th observation has the j-th most common type in Nature.
Suspect: Xn+1, with X1, ..., Xn+1 | p i.i.d. ∼ p
Crime stain: Xn+2
H ∈ {Hp, Hd}
\[ X_{n+2} \mid p, H, x_{n+1} \sim \begin{cases} \delta_{x_{n+1}} & \text{if } H = H_p \\ p & \text{if } H = H_d \end{cases} \]
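The generative model can be simulated without ever drawing the infinite vector p. The sketch below integrates P out and uses the Chinese restaurant process predictive rule of PD(α, θ): with m observations in k blocks, the next observation joins a block of size n_i with probability (n_i − α)/(m + θ) and opens a new block with probability (θ + kα)/(m + θ). This rule is a standard property of the prior rather than something stated on the slides. The function names are hypothetical, α and θ are taken as fixed, and the labels record order of first appearance, not frequency rank.

```python
import random


def crp_step(block_sizes, alpha, theta, rng):
    """Index of the block joined by the next observation.

    The value len(block_sizes) means that a new block is opened.
    """
    if not block_sizes:
        return 0  # the first observation always opens the first block
    k = len(block_sizes)
    weights = [size - alpha for size in block_sizes] + [theta + k * alpha]
    return rng.choices(range(k + 1), weights=weights)[0]


def simulate_labels(n, alpha, theta, hypothesis, rng):
    """Block labels for X_1..X_n (database), X_{n+1} (suspect), X_{n+2} (stain)."""
    labels, sizes = [], []
    for _ in range(n + 1):  # database plus suspect
        j = crp_step(sizes, alpha, theta, rng)
        if j == len(sizes):
            sizes.append(0)
        sizes[j] += 1
        labels.append(j)
    if hypothesis == "Hp":
        j = labels[-1]  # the stain is a copy of the suspect's type
    elif hypothesis == "Hd":
        j = crp_step(sizes, alpha, theta, rng)  # the stain is a fresh draw
    else:
        raise ValueError("hypothesis must be 'Hp' or 'Hd'")
    labels.append(j)
    return labels


print(simulate_labels(10, 0.5, 5.0, "Hp", random.Random(0)))
```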
SLIDE 69 Random partitions
Some notation: given random variables X1, ..., Xn ∈ N, Π[n](X1, X2, ..., Xn) is the random partition of [n] defined by the equivalence classes of the relation i ∼ j iff Xi = Xj.
\[ X_1, \ldots, X_n \longrightarrow \Pi_{[n]} = \pi^{Db}_{[n]} \]
\[ X_1, \ldots, X_n, X_{n+1} \longrightarrow \Pi_{[n+1]} = \pi^{Db+}_{[n+1]} \]
\[ X_1, \ldots, X_n, X_{n+1}, X_{n+2} \longrightarrow \Pi_{[n+2]} = \pi^{Db++}_{[n+2]} \]
X1, ..., Xn are not observed, but generate the same partition as the ...
Data can be defined as D = Π[n+2].
SLIDE 70 The complete model
[Diagram: graphical model with nodes (A, Θ), H, P, X1, X2, ..., Xn, Xn+1, Xn+2 and D.]
SLIDE 75 Pitman sampling formula
If
\[ P \sim PD(\alpha, \theta), \qquad X_1, X_2, \ldots, X_n \mid P = p \ \overset{i.i.d.}{\sim}\ p, \]
then Π[n] = Π[n](X1, ..., Xn) has the following distribution:
\[ \Pr(\Pi_{[n]} = \pi_{[n]} \mid \alpha, \theta) = P^{n}_{\alpha,\theta}(\pi_{[n]}) = \frac{[\theta + \alpha]_{k-1;\alpha}}{[\theta + 1]_{n-1;1}} \prod_{i=1}^{k} [1 - \alpha]_{n_i - 1;1}, \]
where k is the number of blocks of π[n], n_i is the size of the i-th block, and $[x]_{m;a} = x(x + a) \cdots (x + (m-1)a)$ with $[x]_{0;a} = 1$.
In our model,
\[ \Pr(D \mid \alpha, \theta, h) = \Pr(\Pi_{[n+2]} = \pi^{Db++}_{[n+2]} \mid \alpha, \theta, h) = \begin{cases} P^{n+2}_{\alpha,\theta}(\pi^{Db++}_{[n+2]}) & \text{if } h = H_d \\ P^{n+1}_{\alpha,\theta}(\pi^{Db+}_{[n+1]}) & \text{if } h = H_p \end{cases} \]
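A direct transcription of the Pitman sampling formula is short, since the probability depends on the partition only through its block sizes. The sketch below uses plain products, which is fine for small n; the function names are hypothetical, and working with logarithms would be the safer choice for a database of realistic size.

```python
def rising(x, m, step):
    """[x]_{m;step} = x (x + step) ... (x + (m - 1) step); equals 1 when m = 0."""
    result = 1.0
    for j in range(m):
        result *= x + j * step
    return result


def pitman_probability(block_sizes, alpha, theta):
    """Probability of one specific partition with the given block sizes."""
    n = sum(block_sizes)
    k = len(block_sizes)
    value = rising(theta + alpha, k - 1, alpha) / rising(theta + 1.0, n - 1, 1.0)
    for size in block_sizes:
        value *= rising(1.0 - alpha, size - 1, 1.0)
    return value


# Sanity check on [3]: one partition with sizes (3), three with sizes (2, 1),
# one with sizes (1, 1, 1). Their probabilities should add up to 1.
total = (pitman_probability([3], 0.5, 1.0)
         + 3 * pitman_probability([2, 1], 0.5, 1.0)
         + pitman_probability([1, 1, 1], 0.5, 1.0))
print(total)
```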
SLIDE 77 The model, simplified
[Diagram: graphical model with nodes (A, Θ), H and D.]
D = Π[n+2].
SLIDE 79 Lemma
[Diagram: graphical model with nodes A, H, X and Y.]
Lemma
Given four random variables A, H, X and Y, as above, the likelihood function for h, given X = x and Y = y, satisfies
\[ \mathrm{lik}(h \mid x, y) \propto E\big( p(y \mid x, A, h) \mid X = x \big). \]
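A short derivation sketch of the Lemma. It assumes a dependence structure that the extracted diagram no longer shows explicitly: A and H are independent, X depends on A but not on H, and Y may depend on A, H and X. Densities are written with a generic p.

```latex
\begin{align*}
p(x, y \mid h)
  &= \int p(y \mid x, a, h)\, p(x \mid a)\, p(a)\, da \\
  &= p(x) \int p(y \mid x, a, h)\, p(a \mid x)\, da \\
  &= p(x)\, E\big( p(y \mid x, A, h) \mid X = x \big).
\end{align*}
```

Under these assumptions p(x) does not depend on h, so it is absorbed into the proportionality constant of the likelihood.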
SLIDE 81 Lemma
[Diagram: graphical model with nodes (A, Θ), H, Π[n+1] and Π[n+2].]
\[ \mathrm{lik}(h \mid \pi_{[n+1]}, \pi_{[n+2]}) \propto E\big( p(\pi_{[n+2]} \mid \pi_{[n+1]}, A, \Theta, h) \mid \Pi_{[n+1]} = \pi_{[n+1]} \big). \]
SLIDE 85 Likelihood ratio
\[ LR = \frac{p(\pi_{[n+2]} \mid H_p)}{p(\pi_{[n+2]} \mid H_d)} = \frac{p(\pi_{[n+1]}, \pi_{[n+2]} \mid H_p)}{p(\pi_{[n+1]}, \pi_{[n+2]} \mid H_d)} = \frac{\mathrm{lik}(H_p \mid \pi_{[n+1]}, \pi_{[n+2]})}{\mathrm{lik}(H_d \mid \pi_{[n+1]}, \pi_{[n+2]})} \]
The Lemma allows us to write
\[ LR = \frac{E\big( p(\pi_{[n+2]} \mid \pi_{[n+1]}, A, \Theta, H_p) \mid \Pi_{[n+1]} = \pi_{[n+1]} \big)}{E\big( p(\pi_{[n+2]} \mid \pi_{[n+1]}, A, \Theta, H_d) \mid \Pi_{[n+1]} = \pi_{[n+1]} \big)} = \frac{1}{E\left( \frac{1 - A}{n + 1 + \Theta} \,\middle|\, \Pi_{[n+1]} = \pi_{[n+1]} \right)}, \]
since $p(\pi_{[n+2]} \mid \pi_{[n+1]}, A, \Theta, H_p) = 1$ and $p(\pi_{[n+2]} \mid \pi_{[n+1]}, A, \Theta, H_d) = \frac{1 - A}{n + 1 + \Theta}$.
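The factor (1 − A)/(n + 1 + Θ) can be checked numerically as a ratio of two Pitman sampling probabilities: the partition of [n+2] in which the crime stain joins the suspect's singleton block, divided by the partition of [n+1]. The sketch below does this for the toy database with fixed, arbitrary values of α and θ; the function names are hypothetical.

```python
def rising(x, m, step):
    """[x]_{m;step} = x (x + step) ... (x + (m - 1) step); equals 1 when m = 0."""
    result = 1.0
    for j in range(m):
        result *= x + j * step
    return result


def pitman_probability(block_sizes, alpha, theta):
    """Pitman sampling formula for a partition with the given block sizes."""
    n = sum(block_sizes)
    k = len(block_sizes)
    value = rising(theta + alpha, k - 1, alpha) / rising(theta + 1.0, n - 1, 1.0)
    for size in block_sizes:
        value *= rising(1.0 - alpha, size - 1, 1.0)
    return value


def match_probability_under_hd(sizes_n_plus_1, alpha, theta):
    """P^{n+2}(pi_[n+2]) / P^{n+1}(pi_[n+1]); the suspect's block is listed last."""
    if sizes_n_plus_1[-1] != 1:
        raise ValueError("rare type match: the suspect's block must be a singleton")
    grown = list(sizes_n_plus_1[:-1]) + [2]
    return (pitman_probability(grown, alpha, theta)
            / pitman_probability(sizes_n_plus_1, alpha, theta))


# Toy database blocks (sizes 1, 2, 5, 1, 1) plus the suspect as a new singleton.
sizes = [1, 2, 5, 1, 1, 1]
n = sum(sizes) - 1
alpha, theta = 0.4, 3.0
print(match_probability_under_hd(sizes, alpha, theta), (1 - alpha) / (n + 1 + theta))
```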
SLIDE 91
\[ LR = \frac{1}{E\left( \frac{1 - A}{n + 1 + \Theta} \,\middle|\, \Pi_{[n+1]} = \pi_{[n+1]} \right)} \]
By defining the random variable
\[ \Phi = n\, \frac{1 - A}{n + 1 + \Theta} \]
we can write the LR as
\[ LR = \frac{n}{E(\Phi \mid \Pi_{[n+1]} = \pi_{[n+1]})}. \]
We are interested in the distribution of (Φ, Θ) | Π[n+1]:
\[ p(\varphi, \theta \mid \pi_{[n+1]}) \propto p(\pi_{[n+1]} \mid \varphi, \theta)\, f(\varphi, \theta) \]
SLIDE 93 Log likelihood with φ and θ
[Figure: contour plot of log10 p(π[n+1] | φ, θ), with θ ranging from 150 to 250 and φ from 0.40 to 0.55, and the point (φMLE, θMLE) marked. Overlaid: contours of log L(φ, θ | π[n+1]) and of log N((φMLE, θMLE), I_obs^{-1}(φMLE, θMLE)), with the 95% and 99% confidence intervals.]
Dutch Y-STR database, 7 loci, N = 18,925
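A surface like the one in the figure can be explored with a crude grid search. The sketch below evaluates the Pitman log-likelihood in the (φ, θ) parametrization, using α = 1 − φ(n + 1 + θ)/n, which inverts the definition Φ = n(1 − A)/(n + 1 + Θ). The function names, the toy block sizes and the grid are assumptions for illustration; they are not the Dutch database, and a grid search is a stand-in for whatever optimizer was actually used.

```python
import math


def log_pitman(block_sizes, alpha, theta):
    """Natural log of the Pitman sampling formula, via log-gamma functions."""
    total = sum(block_sizes)
    k = len(block_sizes)
    value = sum(math.log(theta + j * alpha) for j in range(1, k))
    value -= math.lgamma(theta + total) - math.lgamma(theta + 1.0)
    for size in block_sizes:
        value += math.lgamma(size - alpha) - math.lgamma(1.0 - alpha)
    return value


def alpha_from_phi(phi, theta, n):
    """Invert phi = n (1 - alpha) / (n + 1 + theta)."""
    return 1.0 - phi * (n + 1 + theta) / n


def grid_mle(block_sizes, phis, thetas):
    """Best (log-likelihood, phi, theta) on the grid; block_sizes describes pi_[n+1]."""
    n = sum(block_sizes) - 1
    best = None
    for phi in phis:
        for theta in thetas:
            alpha = alpha_from_phi(phi, theta, n)
            if not 0.0 < alpha < 1.0 or theta <= -alpha:
                continue  # outside the parameter space of PD(alpha, theta)
            ll = log_pitman(block_sizes, alpha, theta)
            if best is None or ll > best[0]:
                best = (ll, phi, theta)
    return best


toy_sizes = [5, 2, 1, 1, 1, 1]  # hypothetical pi_[n+1] with n = 10
print(grid_mle(toy_sizes, [0.2, 0.4, 0.6, 0.8], [0.5, 1.0, 2.0, 5.0]))
```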
SLIDE 98 Log likelihood as a function of φ and θ
[Figure: contour plot of the log likelihood over θ (150 to 250) and φ (0.40 to 0.55), with the point (φMLE, θMLE) marked. Overlaid: contours of log L(φ, θ | π[n+1]) and of log N((φMLE, θMLE), I_obs^{-1}(φMLE, θMLE)), with the 95% and 99% confidence intervals.]
As a function of (φ, θ), and up to a normalizing constant, the likelihood is approximately Gaussian:
\[ p(\pi_{[n+1]} \mid \varphi, \theta) \approx N\big( (\varphi_{MLE}, \theta_{MLE}),\ I^{-1}_{MLE} \big) \]
\[ p(\varphi, \theta \mid \pi_{[n+1]}) \propto p(\pi_{[n+1]} \mid \varphi, \theta)\, f(\varphi, \theta) \]
If the prior is smooth around the MLE, then
\[ p(\varphi, \theta \mid \pi_{[n+1]}) \approx N\big( (\varphi_{MLE}, \theta_{MLE}),\ I^{-1}_{MLE} \big). \]
It follows that $E(\Phi \mid \Pi_{[n+1]} = \pi_{[n+1]}) \approx \varphi_{MLE}$.
SLIDE 101
\[ p(\varphi, \theta \mid \pi_{[n+1]}) \approx N\big( (\varphi_{MLE}, \theta_{MLE}),\ I^{-1}_{MLE} \big). \]
It follows that $E(\Phi \mid \Pi_{[n+1]} = \pi_{[n+1]}) \approx \varphi_{MLE}$, and therefore
\[ LR = \frac{n}{E(\Phi \mid \Pi_{[n+1]} = \pi_{[n+1]})} \approx \frac{n}{\varphi_{MLE}} = \frac{n + 1 + \theta_{MLE}}{1 - \alpha_{MLE}} \]
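The plug-in approximation is a one-line computation. The sketch below evaluates it for N = 18,925 together with αMLE = 0.51 and θMLE = 216, values that appear elsewhere in these slides; pairing them is an assumption, since the talk does not state that the estimates were computed on exactly that database. The function names are hypothetical.

```python
import math


def phi_from_alpha_theta(n, alpha, theta):
    """phi = n (1 - alpha) / (n + 1 + theta), the definition of Phi."""
    return n * (1.0 - alpha) / (n + 1 + theta)


def plug_in_lr(n, alpha_mle, theta_mle):
    """Approximate LR = (n + 1 + theta_MLE) / (1 - alpha_MLE)."""
    if not 0.0 < alpha_mle < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return (n + 1 + theta_mle) / (1.0 - alpha_mle)


# Assumed pairing of values taken from different slides.
lr = plug_in_lr(n=18925, alpha_mle=0.51, theta_mle=216.0)
print(lr, math.log10(lr))
```

By construction the result equals n divided by `phi_from_alpha_theta(n, alpha_mle, theta_mle)`, matching the two equivalent forms of the approximation.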
SLIDE 103 Sorted relative frequencies: how good is our prior?
Comparison between the spectrum from a big database and simulations from PD(α, θ) using the maximum likelihood estimates of the parameters.
[Figure: relative frequencies against rank on logarithmic axes (ranks from 1 to 5000, relative frequencies from 5e−05 to 5e−02), with αMLE = 0.51 and θMLE = 216. Thick black line: ranked relative frequencies in the database. Thin black lines: simulations from PD(αMLE, θMLE). Dotted line: asymptotic behavior.]
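The thick line in the figure is the ranked spectrum of the database. A minimal sketch of how such a spectrum can be computed, shown on the toy database of size 10 rather than on the real one; the function name and tuple encoding are assumptions.

```python
from collections import Counter

# The toy database of size 10 used earlier in the talk.
DATABASE = [
    (4, 10, 6, 7), (3, 5, 6, 8), (3, 7, 8, 10), (10, 1, 4, 5), (3, 7, 8, 10),
    (3, 7, 8, 10), (1, 5, 7, 2), (3, 7, 8, 10), (3, 5, 6, 8), (3, 7, 8, 10),
]


def ranked_relative_frequencies(database):
    """Relative frequency of each observed type, sorted from largest to smallest."""
    counts = Counter(database)
    n = len(database)
    return sorted((count / n for count in counts.values()), reverse=True)


print(ranked_relative_frequencies(DATABASE))
```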
SLIDE 108 The LR when p is known
Imagine we know p. Applying the Lemma to the ratio of likelihoods gives
\[ LR|p = \frac{p(\pi^{Db++}_{[n+2]} \mid H_p, p)}{p(\pi^{Db++}_{[n+2]} \mid H_d, p)} = \frac{1}{E(p_{x_{n+1}} \mid \pi^{Db+}_{[n+1]}, p)}. \]
How does this compare with the LR we get with our method when p is unknown?
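For a very small example, LR|p can be computed by brute force: enumerate every assignment of types to the n + 1 individuals, keep those that induce the observed partition, and average the frequency of the suspect's type with weights proportional to the probability of each assignment. The sketch below assumes a finite vector p, whereas the talk works with infinitely many types, and it is exponential in n, so it is only an illustration; the function names are hypothetical.

```python
from itertools import product


def induced_partition(labels):
    """Partition of {1, ..., m} induced by equal labels."""
    blocks = {}
    for index, label in enumerate(labels, start=1):
        blocks.setdefault(label, []).append(index)
    return sorted(blocks.values())


def lr_given_p(p, partition):
    """LR | p = 1 / E(p_{x_{n+1}} | partition, p); the last index is the suspect."""
    m = sum(len(block) for block in partition)
    target = sorted(sorted(block) for block in partition)
    numerator = 0.0    # sum of weight * frequency of the suspect's type
    denominator = 0.0  # sum of weights of assignments matching the partition
    for labels in product(range(len(p)), repeat=m):
        if induced_partition(labels) != target:
            continue
        weight = 1.0
        for label in labels:
            weight *= p[label]
        denominator += weight
        numerator += weight * p[labels[-1]]
    if numerator == 0.0:
        raise ValueError("the partition has probability zero under p")
    return denominator / numerator


# With four equally frequent types the suspect's type has frequency 1/4.
print(lr_given_p([0.25, 0.25, 0.25, 0.25], [[1, 2], [3]]))
```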
SLIDE 110 Test Dutch database (N=2085, 7 loci)
Database of 2085 Y-STR profiles from Dutch men.
Test: compare the distributions of log10(LR|p) and log10 LR obtained from 100 samples of size 100 from this population.
SLIDE 111 Results
Compare the distributions of log10(LR|p) and log10 LR obtained from 100 samples of size 100 from this population.
[Figure, panel (a) Comparison: distributions of log10(LR) and log10(LR|p), on an axis from 3.0 to 4.0. Panel (b) Error: on an axis from −0.4 to 0.4.]