Bayesian networks
- II. Model building
Advanced Herd Management Anders Ringgaard Kristensen
Outline
- Determining the graphical structure
  - Milk test
  - Mastitis diagnosis
  - Pregnancy
- Determining the conditional probabilities
- Modeling methods and tricks
Sensitivity and specificity determine the conditional probabilities.
Direction of the edge: along the causal direction, against the “reasoning” direction.
[Figure: edge from the node “Infected?” {“Yes”, “No”} to the test node {“Positive”, “Negative”}]
Are the infection states of different days independent? Probably not!
- Markov property
- Duration of disease
[Figure: network over the nodes Inf_1 … Inf_7 and Test_1 … Test_7]
Correctness of the test depends on whether it was correct yesterday. To determine whether it was correct yesterday, we need:
- The true infection state yesterday (not observed, so we need the conditional probabilities).
- The test result yesterday.
[Figure: network over the nodes Inf_1 … Inf_7 and Test_1 … Test_7]
A simplifying intermediate variable Cor_i ∈ {“yes”, “no”} indicating whether the test on day i was correct.
[Figure: network over the nodes Inf_1 … Inf_7, Test_1 … Test_7 and Cor_1 … Cor_6]
- Separate variables (don’t pool mastitis and heat).
- Check conditional independence: are “Conductivity” and “Temperature” independent, given “Mastitis”?
[Figure: nodes Mastitis {“No”, “Subclinical”, “Clinical”}, Heat {“No”, “Yes”}, Conductivity and Temperature]
If conductivity influences temperature.
[Figure: network over Mastitis, Heat, Conductivity and Temperature]
If temperature influences conductivity.
[Figure: network over Mastitis, Heat, Conductivity and Temperature]
The causal direction may be difficult to determine.
If the direction of an edge cannot be determined, a variable is often missing!
[Figure: network over Mastitis, Heat, Conductivity and Temperature, extended with a node “Other disease”]
A goat is mated, and six weeks later we want to test it for pregnancy. We have three tests available:
- Blood test
- Urine test
- Scanning
The variables of our problem are:
- Pregnant {“yes”, “no”}
- Blood {“positive”, “negative”}
- Urine {“positive”, “negative”}
- Scan {“positive”, “negative”}
Check for conditional independence.
[Figure: nodes Pregnant, Blood, Urine and Scan]
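The blood test and the urine test both measure a hormone level. The scanning does something completely different. Blood and Urine are therefore not conditionally independent given Pregnant, so an intermediate variable Hormone is introduced.
[Figure: nodes Pregnant, Hormone, Blood, Urine and Scan]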
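A small numerical sketch of why the hidden hormone level matters. All probabilities and the names `P_HORMONE_HIGH` and `P_POS_GIVEN_HORMONE` are hypothetical illustration values, not taken from the lecture. The sketch only shows that two tests driven by the same Hormone node are dependent given Pregnant once Hormone is summed out.

```python
# Hypothetical CPTs (illustration only, not from the lecture).
P_HORMONE_HIGH = {"yes": 0.9, "no": 0.05}    # P(Hormone = high | Pregnant)
P_POS_GIVEN_HORMONE = {                       # P(test = positive | Hormone)
    "blood": {"high": 0.95, "low": 0.02},
    "urine": {"high": 0.90, "low": 0.05},
}


def p_hormone(h, pregnant):
    """P(Hormone = h | Pregnant = pregnant)."""
    p_high = P_HORMONE_HIGH[pregnant]
    return p_high if h == "high" else 1.0 - p_high


def p_test(test, result, h):
    """P(test = result | Hormone = h)."""
    p_pos = P_POS_GIVEN_HORMONE[test][h]
    return p_pos if result == "positive" else 1.0 - p_pos


def joint_blood_urine(blood, urine, pregnant):
    """P(Blood, Urine | Pregnant), summing out the hidden Hormone node."""
    return sum(
        p_hormone(h, pregnant) * p_test("blood", blood, h) * p_test("urine", urine, h)
        for h in ("high", "low")
    )


def marginal(test, result, pregnant):
    """P(test = result | Pregnant), summing out Hormone."""
    return sum(p_hormone(h, pregnant) * p_test(test, result, h) for h in ("high", "low"))


joint = joint_blood_urine("positive", "positive", "yes")
prod = marginal("blood", "positive", "yes") * marginal("urine", "positive", "yes")
print(f"P(Blood+, Urine+ | pregnant)             = {joint:.4f}")
print(f"P(Blood+ | pregnant) P(Urine+ | pregnant) = {prod:.4f}")
```

With these assumed numbers the joint probability and the product of the marginals differ, which is the signature of a missing common parent between Blood and Urine.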
Statistical model with parameters estimated from data. Law of nature. Experts of the domain.
The P(Test_i | Inf_i) conditional probability is supplied by the test retailer.
Defined by the sensitivity and specificity of the test:

Inf_i    P(Test_i = “yes” | Inf_i)    P(Test_i = “no” | Inf_i)
“yes”    0.99                         0.01
“no”     0.01                         0.99
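Because the edge points in the causal direction, reasoning from a test result back to the infection state uses Bayes' rule. A minimal sketch follows, using the sensitivity and specificity of 0.99 from the table. The prior of 0.01 and the function name `posterior_infected` are assumptions made for illustration only.

```python
def posterior_infected(prior, sensitivity=0.99, specificity=0.99):
    """P(Inf = yes | Test = yes) by Bayes' rule."""
    p_pos_given_inf = sensitivity            # P(Test = yes | Inf = yes)
    p_pos_given_not = 1.0 - specificity      # P(Test = yes | Inf = no)
    evidence = prior * p_pos_given_inf + (1.0 - prior) * p_pos_given_not
    return prior * p_pos_given_inf / evidence


# Hypothetical prior P(Inf = yes), for illustration only.
prior = 0.01
print(f"P(Inf = yes | Test = yes) = {posterior_infected(prior):.3f}")
```

With this assumed prior, a positive result from a test that is 99% accurate in both directions still leaves the infection probability at about one half, which is why the prior has to be modeled with care.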
The P(Cor_i | Inf_i, Test_i) conditional probability is trivial: if Inf_i and Test_i agree, the test is correct!

Inf_i    Test_i    P(Cor_i = y | Inf_i, Test_i)    P(Cor_i = n | Inf_i, Test_i)
“yes”    “yes”     1                               0
“yes”    “no”      0                               1
“no”     “yes”     0                               1
“no”     “no”      1                               0
The P(Test_i | Inf_i, Cor_{i-1}) conditional probabilities need some assumptions.
Assumptions:
- A correct test has a 99.9% chance of being correct next time.
- An incorrect test has a 30% chance of being incorrect next time. Thus, it is still most likely to be correct.
This is in agreement with the example file provided from the homepage, and in disagreement with the textbook.

Inf_i    Cor_{i-1}    P(Test_i = y | Inf_i, Cor_{i-1})    P(Test_i = n | Inf_i, Cor_{i-1})
“yes”    “yes”        0.999                               0.001
“yes”    “no”         0.7                                 0.3
“no”     “yes”        0.001                               0.999
“no”     “no”         0.3                                 0.7
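The table can be derived mechanically from the two stated assumptions. This is a sketch of one way to do that, not the lecture's own procedure; the function name `build_test_cpt` and the dictionary layout are assumptions made for illustration. A correct test reports the true infection state and an incorrect test reports the opposite.

```python
def build_test_cpt(p_stay_correct=0.999, p_stay_incorrect=0.3):
    """Build P(Test_i | Inf_i, Cor_{i-1}) as {(inf, cor_prev): {"yes": p, "no": p}}."""
    cpt = {}
    for inf in ("yes", "no"):
        for cor_prev in ("yes", "no"):
            # Probability that today's test is correct, given yesterday's correctness.
            if cor_prev == "yes":
                p_correct = p_stay_correct
            else:
                p_correct = 1.0 - p_stay_incorrect
            # A correct test reports the true infection state.
            p_test_yes = p_correct if inf == "yes" else 1.0 - p_correct
            cpt[(inf, cor_prev)] = {"yes": p_test_yes, "no": 1.0 - p_test_yes}
    return cpt


cpt = build_test_cpt()
for key, row in cpt.items():
    print(key, {k: round(v, 3) for k, v in row.items()})
```

Writing the table as a function of the two assumed probabilities makes it easy to rerun the model with the textbook's values instead.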
The P(Inf_1) probabilities must be modeled.
Assume that the milk test is made at single-cow level at the farm. We need the probability λ that the milk from a particular cow is infected on an arbitrary day (i.e. P(Inf_i = “yes”) = λ). The farmer has no knowledge about λ, but the dairy performs a very precise bulk tank test:
- If the milk from just one cow is infected, the bulk tank test will be positive.
- On average, the bulk tank test is positive once a month.
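Further assumptions:
- λ is the same for all cows.
- Cows are infected independently.
Under those assumptions, with the exponents corresponding to a herd of 50 cows and a bulk tank test that is negative on 29 out of 30 days:
(1 - λ)^50 = 29/30 ⇔ λ = 1 - (29/30)^0.02 ≈ 0.0007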
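The calculation can be checked numerically. A minimal sketch follows; the function name and its parameters are assumptions introduced for illustration, with defaults matching the herd of 50 cows and the 29/30 probability of a negative bulk tank test implied by the formula.

```python
def per_cow_infection_probability(n_cows=50, p_bulk_negative=29 / 30):
    """Solve (1 - lam) ** n_cows = p_bulk_negative for lam."""
    return 1.0 - p_bulk_negative ** (1.0 / n_cows)


lam = per_cow_infection_probability()
print(f"lambda = {lam:.4f}")   # prints "lambda = 0.0007"
```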
The P(Inf_i | Inf_{i-1}) conditional probabilities must be modeled.
Assume the following properties of the infection:
- A not-infected cow has probability q of becoming infected.
- An infection always lasts for at least 2 days.
- After 2 days, the daily probability of recovery is π.
Define a state space model: s_i ∈ {nn, ny, yn, yy}, where e.g. “ny” means: not infected on day i-1 but infected on day i.
Transition probabilities (rows: state on day i; columns: state on day i+1; P(yes) is the probability of being infected on day i+1):

Day i    nn      ny    yn    yy      P(yes)
nn       1-q     q     0     0       q
ny       0       0     0     1       1
yn       1-q     q     0     0       q
yy       0       0     π     1-π     1-π

Only assumptions on minimum duration, q and π.
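The transition probabilities can be written down as a small table in code and sanity-checked. This is a sketch under the three stated assumptions; the values q = 0.001 and π = 0.2 and the helper names are hypothetical, chosen only to make the example run.

```python
STATES = ("nn", "ny", "yn", "yy")


def transition_matrix(q, pi):
    """Rows: state on day i; columns: state on day i+1."""
    return {
        "nn": {"nn": 1 - q, "ny": q, "yn": 0.0, "yy": 0.0},
        "ny": {"nn": 0.0, "ny": 0.0, "yn": 0.0, "yy": 1.0},   # at least 2 days infected
        "yn": {"nn": 1 - q, "ny": q, "yn": 0.0, "yy": 0.0},
        "yy": {"nn": 0.0, "ny": 0.0, "yn": pi, "yy": 1 - pi},
    }


def p_infected_next(row):
    """P(infected on day i+1): sum over states whose second letter is 'y'."""
    return sum(p for state, p in row.items() if state[1] == "y")


# Hypothetical parameter values, for illustration only.
P = transition_matrix(q=0.001, pi=0.2)
for s in STATES:
    assert abs(sum(P[s].values()) - 1.0) < 1e-12   # each row is a distribution
    print(s, P[s], "P(yes) =", p_infected_next(P[s]))
```

The P(yes) column is not an extra parameter: it is the sum of the “ny” and “yy” entries in each row.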
There are basically 3 parameters:
- Minimum duration: 2 days
- Probability of becoming infected: q
- Daily probability of recovery (after 2 days): π
The 3 parameters should be estimated from data. If data are not available, we may have to rely on experts. Experts’ guesses may be calibrated to the overall probability of infection on a given day.
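One possible way to carry out such a calibration is sketched below. It is not taken from the lecture. It assumes the four-state chain defined earlier and uses the fact that, in the long run, the fraction of infected days equals the mean infection length divided by the mean length of a full healthy-plus-infected cycle. The value π = 0.2 is a hypothetical expert guess; the target 0.0007 is the λ estimated from the bulk tank test.

```python
def stationary_infection_probability(q, pi):
    """Long-run P(Inf_i = yes) for the four-state chain."""
    mean_infected_days = 1.0 + 1.0 / pi   # first day, then geometric with rate pi
    mean_healthy_days = 1.0 / q           # geometric with rate q
    return mean_infected_days / (mean_infected_days + mean_healthy_days)


def calibrate_q(target, pi):
    """Choose q so that the long-run infection probability equals target."""
    mean_infected_days = 1.0 + 1.0 / pi
    return target / (mean_infected_days * (1.0 - target))


pi_guess = 0.2      # hypothetical expert guess
target = 0.0007     # overall daily infection probability (lambda)
q = calibrate_q(target, pi_guess)
print(f"q = {q:.6f}")
print(f"check: {stationary_infection_probability(q, pi_guess):.6f}")
```

The design choice here is to fix the parameter the expert is most comfortable guessing (π in this sketch) and solve for the other, so that the model reproduces the observed overall infection probability.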
The P(Inf_i | Inf_{i-1}) conditional probabilities must be modeled.
Some assumption compensating for the fact that we don’t know the infection state two days ago…
John is suffering from a serious hereditary disease caused by a recessive gene.
- State space for a horse: aa, aA or AA.
- The genotype aa is diseased.
- The genotype aA is carrier.
- We want to cull all carriers!
[Figure: pedigree of the horses Ann, Brian, Cecily, Fred, Dorothy, Eric, Gwenn, Henry, Irene and John]
Conditional distribution (P(aa), P(aA), P(AA)) of the offspring genotype, given the genotypes of the parents:

Mother \ Father    aa               aA                   AA
aa                 (1, 0, 0)        (0.5, 0.5, 0)        (0, 1, 0)
aA                 (0.5, 0.5, 0)    (0.25, 0.5, 0.25)    (0, 0.5, 0.5)
AA                 (0, 1, 0)        (0, 0.5, 0.5)        (0, 0, 1)
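The inheritance table follows from each parent passing on one of its two alleles with equal probability. A minimal sketch that regenerates it is given below; the names `P_PASS_A` and `offspring_distribution` are introduced here for illustration.

```python
GENOTYPES = ("aa", "aA", "AA")

# Probability that a parent passes on the recessive allele a.
P_PASS_A = {"aa": 1.0, "aA": 0.5, "AA": 0.0}


def offspring_distribution(father, mother):
    """Return (P(aa), P(aA), P(AA)) for the offspring."""
    f, m = P_PASS_A[father], P_PASS_A[mother]
    return (f * m, f * (1 - m) + (1 - f) * m, (1 - f) * (1 - m))


for mother in GENOTYPES:
    row = [offspring_distribution(father, mother) for father in GENOTYPES]
    print(mother, row)
```

Since the rule is symmetric in the two parents, it does not matter which parent is placed on the rows and which on the columns.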
Two unknown parents:
- Assume that the distribution reflects the population probabilities of being healthy or carrier (if they had been diseased, they would not have survived until “breeding age”).
One unknown parent:
- Introduce a “dummy” parent reflecting the population distribution.
The state aa is impossible for all horses other than John. Two options:
- Delete the state and adjust all probabilities accordingly.
- Keep the state and enter the evidence that the horses are either healthy or carriers.
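One reading of “adjust all probabilities accordingly” is to drop the aa entry from each conditional distribution and renormalise what remains. The sketch below shows that reading only; the helper `drop_state` is hypothetical and not part of the lecture material.

```python
def drop_state(dist, index=0):
    """Remove one state from a distribution and renormalise the rest."""
    kept = [p for i, p in enumerate(dist) if i != index]
    total = sum(kept)
    if total == 0:
        raise ValueError("all remaining states have probability zero")
    return tuple(p / total for p in kept)


# Offspring of two carriers: (P(aa), P(aA), P(AA)) = (0.25, 0.5, 0.25).
p_carrier, p_healthy = drop_state((0.25, 0.5, 0.25))
print(f"P(aA | not aa) = {p_carrier:.4f}, P(AA | not aa) = {p_healthy:.4f}")
```

Rows with an aa parent are dropped entirely under this option, since the parents of the other horses cannot be aa either. Keeping the state and entering evidence gives the same posterior without editing any table by hand.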
Sources:
- Pure data estimation (frequency counts)
- Model and parameter estimation from data
- Provided “by nature”
- Subjective expert assessments
We observe the spread of a contagious disease in a population of, say, 50 animals. Each animal is either
- Susceptible
- Infective
- Removed (recovered/dead)
Let S_i, I_i and R_i be the number of susceptible, infective and removed animals, respectively, at time i.
Basic problem: S_i, I_i and R_i are not independent.
- This cannot be solved by directed (causal) edges.
- Even though the conditional probabilities of the model may be correct, it may happen that S_i + I_i + R_i ≠ 50.
[Figure: network over the nodes S_1, I_1, R_1, S_2, I_2, R_2, S_3, I_3, R_3]
P(C_i = Valid | S_i, I_i, R_i) = 1 if S_i + I_i + R_i = 50
P(C_i = Valid | S_i, I_i, R_i) = 0 if S_i + I_i + R_i ≠ 50
Enter the evidence C_i = Valid and propagate.
[Figure: the same network with constraint nodes C_1, C_2, C_3 ∈ {“Valid”, “Invalid”}]
Refer also to the sock-sorting problem in the textbook.
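The effect of the constraint node can be illustrated by brute-force enumeration. In the sketch below, a population of N = 5 replaces the 50 animals so that the joint distribution stays small. The three marginal distributions are hypothetical and deliberately specified independently of each other; none of the numbers come from the lecture.

```python
from itertools import product

N = 5  # small hypothetical population so the joint can be enumerated


def normalise(weights):
    """Turn non-negative weights into a probability distribution."""
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical, independently specified distributions over 0..N.
P_S = normalise([1, 2, 3, 4, 3, 2])
P_I = normalise([3, 4, 3, 2, 1, 1])
P_R = normalise([4, 3, 2, 1, 1, 1])

# Constraint node C: P(C = Valid | s, i, r) = 1 if s + i + r == N, else 0.
valid = {
    (s, i, r): P_S[s] * P_I[i] * P_R[r]
    for s, i, r in product(range(N + 1), repeat=3)
    if s + i + r == N
}
p_valid = sum(valid.values())   # P(C = Valid) before evidence is entered

# Entering the evidence C = Valid renormalises over the consistent configurations.
posterior = {config: p / p_valid for config, p in valid.items()}

assert all(s + i + r == N for s, i, r in posterior)
print(f"P(C = Valid) = {p_valid:.3f}, configurations kept: {len(posterior)}")
```

After the evidence is entered, every configuration with positive probability satisfies S + I + R = N, even though no directed edge between S, I and R was added.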