UNCERTAINTY, AMBIGUITY, INDECISION
Roger Cooke
Resources for the Future
Dept. Math, Delft Univ. of Technology
Oct. 24, 2011
Websites & Links
Radiation Protection Dosimetry 90: (2000)
http://rpd.oxfordjournals.org/cgi/content/short/90/3/295
uncertainty analysis
http://www.osti.gov/bridge/basicsearch.jsp
http://www.osti.gov/energycitations/basicsearch.jsp
assessment using COSYMA
http://cordis.europa.eu/fp5-euratom/src/lib_docs.htm
http://www.rff.org/rff/Events/Expert-Judgment-Workshop.cfm
http://dutiosc.twi.tudelft.nl/~risk/
AMBIGUITY INDECISION UNCERTAINTY
History Structured Expert Judgment in Risk Analysis
NUREG/CR-6372, 1997
18820EN, 2000
Characterizing, Communicating, and Incorporating Scientific Uncertainty in Climate Decision Making 2009
2 Very Different Guidelines: The story you hear today is NOT the
Overview
NOT
AMBIGUITY: Is John (1.87 m) tall?
INDECISION: Evacuate?
UNCERTAINTY: How harmful is 100 Gy of gamma radiation in 1 hr?

AMBIGUITY: What does it mean?
INDECISION: What's best?
UNCERTAINTY: What is?

AMBIGUITY: Analysts' job
INDECISION: Problem owners' job
UNCERTAINTY: Experts' job
“A big part of my frustration was that scientists would give me a range. And I would ask, ‘Please just tell me at which point you are safe, and we can do that.’ But they would give a range, say, from 5 to 25 parts per billion.”
Christine Todd Whitman, quoted in Environmental Science & Technology Online, April 20, 2005
Christine Todd Whitman Administrator EPA, 2001-2003
Operational Definitions
Mach, Hertz, Einstein, Bohr
IF BOB says
“The Loch Ness monster exists with degree of possibility 0.0731” to which sentences in the natural language not containing “degree of possibility” is BOB committed?
Operational definition: Subjective probability
Consider two events:
F: France wins next World Cup Soccer tournament
US: USA wins next World Cup Soccer tournament
Two lottery tickets:
L(F): worth $10,000 if F, worth $1000 otherwise
L(US): worth $10,000 if US, worth $1000 otherwise
John may choose ONE.
"John's degree of belief (F) ≥ John's degree of belief (US)" is operationalized as: John chooses L(F) in the above choice situation.
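The lottery choice can be sketched in a few lines of Python. This is an illustration only: the probabilities 0.10 and 0.02 and the function names are assumptions, not values from the talk. Because both tickets carry the same two payoffs, the ticket with the higher expected utility is the one whose event has the higher probability, for any utility that prefers $10,000 to $1000.

```python
import math

def expected_utility(p_event, utility=lambda x: x, win=10_000, otherwise=1_000):
    """Expected utility of a ticket paying `win` if the event occurs, else `otherwise`."""
    return p_event * utility(win) + (1 - p_event) * utility(otherwise)

def preferred_ticket(p_f, p_us, utility=lambda x: x):
    """Return the ticket an expected-utility maximizer would choose."""
    eu_f = expected_utility(p_f, utility)
    eu_us = expected_utility(p_us, utility)
    if eu_f > eu_us:
        return "L(F)"
    if eu_us > eu_f:
        return "L(US)"
    return "indifferent"

# Hypothetical degrees of belief for John (assumed for illustration).
P_F, P_US = 0.10, 0.02

choice_linear = preferred_ticket(P_F, P_US)          # money valued linearly
choice_log = preferred_ticket(P_F, P_US, math.log)   # risk-averse utility
print(choice_linear, choice_log)
```

The choice does not change when the utility function changes, which is why the choice between L(F) and L(US) reveals degree of belief rather than attitude to money.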
Fundamental Theorem of Decision Theory
If, e.g.:
B: Belgium wins next World Cup Soccer tournament.
– L(F) > L(US) and L(US) > L(B) ⇒ L(F) > L(B)
– L(F) > L(US) ⇔ L(F or B) > L(US or B)
(plus some technical axioms)
Then there is a UNIQUE probability P which represents degree of belief:
DegBel(F) > DegBel(US) ⇔ P(F) > P(US)
AND a utility function, unique up to the choice of 0 and 1, that represents values:
L(F) > L(US) ⇔ Exp'd Utility(L(F)) > Exp'd Utility(L(US))
PROOF (4 hrs) EJCoursenotes-Theory-Rational-Decision.doc
Goals of an EJ study
EJCoursenotes_review-EJ-literature.doc
EJ for RATIONAL CONSENSUS:
RESS-TUDdatabase.pdf
Parties pre-commit to a method which satisfies necessary conditions for scientific method:
– Traceability/accountability
– Neutrality (don't encourage untruthfulness)
– Fairness (ab initio, all experts equal)
– Empirical control (performance measurement)
Withdrawal post hoc incurs burden of proof.
Goal: comply with these principles and combine experts' judgments to get a Good Probability Assessor: the "Classical Model for EJ".
CLASSICAL MODEL
What is a GOOD subjective probability assessor?
– Statistical accuracy (calibration): are the expert's probability statements statistically accurate? Measured as the P-value of a statistical test
– Informativeness: probability mass concentrated in a small region, relative to a background measure
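A minimal Python sketch of the two measures, under stated assumptions: experts give 5%, 50% and 95% quantiles (the `levels` argument can be changed), the calibration P-value uses the asymptotic chi-square distribution of 2N times the relative information of the empirical bin frequencies with respect to the theoretical bin probabilities, and the background measure is uniform on a range [lo, hi] supplied by the caller. All function names and the example numbers are illustrative, not taken from the slides.

```python
import math

def chi2_sf(x, dof):
    """Chi-square survival function P(X > x), via the closed-form recurrence in dof."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    while k < dof:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def bin_probs(levels):
    """Probability mass between consecutive quantile levels, e.g. 0.05, 0.45, 0.45, 0.05."""
    edges = [0.0, *levels, 1.0]
    return [b - a for a, b in zip(edges, edges[1:])]

def calibration_score(quantiles, realizations, levels=(0.05, 0.5, 0.95)):
    """P-value of the hypothesis that realizations fall in the bins with the stated probabilities."""
    p = bin_probs(levels)
    counts = [0] * len(p)
    for qs, x in zip(quantiles, realizations):
        counts[sum(x > q for q in qs)] += 1   # index of the interquantile bin hit
    n = len(realizations)
    s = [c / n for c in counts]
    rel_info = sum(si * math.log(si / pi) for si, pi in zip(s, p) if si > 0)
    return chi2_sf(2 * n * rel_info, len(p) - 1)

def information_score(qs, lo, hi, levels=(0.05, 0.5, 0.95)):
    """Relative information of the expert's distribution w.r.t. a uniform background on [lo, hi]."""
    p = bin_probs(levels)
    edges = [lo, *qs, hi]
    widths = [b - a for a, b in zip(edges, edges[1:])]
    return math.log(hi - lo) + sum(pi * math.log(pi / w) for pi, w in zip(p, widths))

# Illustrative data: 20 seed variables, every assessment is (10, 20, 30).
assessments = [(10, 20, 30)] * 20
well_spread = [5] + [15] * 9 + [25] * 9 + [35]   # 1, 9, 9, 1 hits per bin
all_in_tails = [5] * 10 + [35] * 10              # every realization outside the 90% band

cal_good = calibration_score(assessments, well_spread)
cal_bad = calibration_score(assessments, all_in_tails)
info_wide = information_score((5, 50, 95), 0, 100)     # matches the uniform background
info_narrow = information_score((45, 50, 55), 0, 100)  # mass concentrated near 50
print(cal_good, cal_bad, info_wide, info_narrow)
```

The first expert's hit pattern matches the stated probabilities and scores close to 1; the second is overconfident and scores close to 0. The narrow assessment is more informative than the wide one, independent of where the realization falls.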
Performance based score (weight):
Calibration score × information score, subject to a calibration cutoff
– Requires that experts assess uncertainty for variables for which we (will) know the true values: calibration / performance / seed variables
– Any expert, or combination of experts (Decision Maker, DM), can be regarded as a statistical hypothesis
Expert maximizes long run expected score by, and only by, stating percentiles which (s)he believes
EJCoursenotes-ScoringRules.doc (4 hrs)
Performance score is a strictly proper scoring rule
Equal weight decision maker
Performance Based Combinations
performance, linear pool of weighted experts
Combining Experts
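The two combinations can be sketched as follows. This is a minimal illustration, assuming each expert already has a calibration and an information score, a fixed cutoff of 0.05 (in practice the cutoff may be chosen differently), and expert distributions given as CDF callables. The names `performance_weights`, `equal_weights`, `linear_pool` and `uniform_cdf` are hypothetical helpers.

```python
def performance_weights(calibration, information, cutoff=0.05):
    """Weight proportional to calibration x information; zero below the calibration cutoff."""
    raw = [c * i if c >= cutoff else 0.0 for c, i in zip(calibration, information)]
    total = sum(raw)
    if total == 0:
        raise ValueError("no expert passes the calibration cutoff")
    return [r / total for r in raw]

def equal_weights(n_experts):
    """Every expert gets the same weight."""
    return [1.0 / n_experts] * n_experts

def linear_pool(weights, expert_cdfs, x):
    """Decision maker's CDF at x: weighted average of the experts' CDFs."""
    return sum(w * cdf(x) for w, cdf in zip(weights, expert_cdfs))

def uniform_cdf(lo, hi):
    """Stand-in expert distribution, used only to make the example runnable."""
    return lambda x: min(1.0, max(0.0, (x - lo) / (hi - lo)))

# Illustrative scores for three experts (assumed numbers).
calibration = [0.5, 0.001, 0.2]
information = [1.0, 3.0, 2.0]
experts = [uniform_cdf(0, 10), uniform_cdf(4, 6), uniform_cdf(0, 20)]

w_perf = performance_weights(calibration, information)
w_equal = equal_weights(len(experts))
print(w_perf, linear_pool(w_perf, experts, 5.0), linear_pool(w_equal, experts, 5.0))
```

The second expert is the most informative but falls below the cutoff, so that expert receives zero weight in the performance based DM while still counting for one third in the equal weight DM.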
Harvard-Kuwait SEJ Health Effects of Oil Fires: UN Claims commission
All-cause mortality, percent increase per 1 μg/m3 increase in PM2.5 (RESS-PM25.pdf)

Study                                       Median/best estimate   Ratio 95%/5%
Amer Cancer Soc. (reanal.)                  0.7                    2.5
Six Cities Study (reanal.)                  1.4                    4.8
Harvard Kuwait, Equal weights (US)          0.9657                 257
Harvard Kuwait, Performance weights (US)    0.6046                 63
68% of 84 NIS established since 1959 associated with transoceanic shipping (Ricciardi 2006)
Robert Wood Johnson Foundation
Campylobacter: Chicken Processing Model
[Figure: processing model diagram with compartments Chicken, Environment and Feces; labels Nenv, Next, cenv, aextA, b, ca, Cint, aint wint, (1-aint) wint; transport from skin]
Campylobacter Infection
Expert 7
Expert 10
Campylobacter: Chicken Processing Model
[Figure: processing model diagram with compartments Chicken, Environment and Feces; labels Nenv, Next, cenv, aextA, aextB, b, ca, Cint, aint wint, (1-aint) wint; transport from skin, transport from feathers]
Calibration questions for PM2.5
RESS-PM25.pdf
In London 2000, weekly average PM10 was 18.4 μg/m3. What is the ratio:
(# non-accidental deaths in the week with the highest average PM10 concentration, 33.4 μg/m3) / (weekly average # non-accidental deaths)?
5%: _______ 25%: _______ 50%: _______ 75%: _______ 95%: _______
Very informative assessors may be statistically least accurate
PM25-Range-graphs.doc
Experts are sometimes well calibrated
AMS-OPTION-TRADERS-RANGE-GRAPHS.doc realestate-range graphs.doc RWJF-CoveringKids-Penn-RangeGraphs.doc
Sometimes not
GL-invasive-species-range-graphs.doc
Experts sometimes agree
Dispersion-USNRC-EU-RANGE-GRAPHS.doc
And sometimes don’t
Campy-range-graphs.doc Earlyhealth-USNRC-EU-Range-graphs.doc
Classical model usually works, not always
Soil-animal-USNRC-EU-range-graphs.doc RWJ – Nebraska- range graphs.docx
TUD EJ database - calibration scores
[Figures: scatter plots of statistical accuracy (p-values), axes from 0 to 1: performance based DM vs. equal weight DM; best expert vs. equal weight DM]
TUD EJ database - information scores
[Figures: scatter plots of informativeness, axes from 0 to 5: performance based DM vs. equal weight DM; best expert vs. equal weight DM]
TUD EJ database – combined scores
[Figures: scatter plots of combined scores, axes from 0 to 2.5, three panels: equal weight DM and best expert; best expert and performance based DM; equal weight DM and performance based DM]
However, performance scores are calculated within-sample: weights are
calculated on the basis of available data (realizations), and performance scores are then calculated using the same data.
True out-of-sample validation
PROXY Out-of-sample validation
Remove-One-at-a-Time (ROAT) (Clemen 2008)*
variables
using weights calculated in step 2
calculate the score for this set of distributions
*Clemen RT. Comment on Cooke's classical method. RESS 2008;93:760-765
ROAT findings
From Clemen RT. Comment on Cooke's classical method. RESS 2008;93:760-765
In only 9 of 14 cases did the PW-DM do better than the EW-DM! Not statistically convincing.
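One way to see why 9 of 14 is not convincing is a one-sided sign test. This is an assumed check for illustration, not necessarily the test used by Clemen: if the PW-DM and EW-DM were equally good, each study would be a fair coin flip, and the question is how often 9 or more wins out of 14 would occur by chance.

```python
from math import comb

def sign_test_p_value(wins, trials):
    """One-sided P(at least `wins` successes in `trials` fair coin flips)."""
    return sum(comb(trials, k) for k in range(wins, trials + 1)) / 2 ** trials

p_value = sign_test_p_value(9, 14)
print(round(p_value, 3))  # roughly 0.21, well above conventional significance levels
```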
Issue with ROAT
DM
– split sample in half
– see how well the DM based on performance on the first half does on the second half (and vice versa)
– need at least 16 questions for statistical power
Results half-sample validation
Ratios of combined scores: PW/Eq
[Figure: bar chart on a log scale from 0.001 to 1000, one bar per half-sample: TUD disper 1, TUD disper 2, TUD dep 1, TUD dep 2, Oper risk 1, Oper risk 2, Dike ring 1, Dike ring 2, Therm bld 1, Therm bld 2, Real estate 1, Real estate 2, EUNRC Dis 1, EUNRC Dis 2, EUNRC Intd 1, EUNRC Intd 2, EUNRC SOIL 1, EUNRC SOIL 2, Gas Envir 1, Gas Envir 2, AOT 1, AOT 2, EU WD 1, EU WD 2, ESTEC 1, ESTEC 2]
significant
ROAT volatility of weights
ROAT bias
P(Heads), experts 1 & 2: P1(Heads) = 0.8, P2(Heads) = 0.2.
DM's probability for heads: Pdm = w P1 + (1-w) P2, with weights proportional to the likelihood of each expert's distribution, given the data.
Observe 10 Heads and 10 Tails: the experts' likelihood ratio is
(0.8^10 × 0.2^10) / (0.2^10 × 0.8^10) = 0.8^0 / 0.2^0 = 1, so w = 1/2.
If # Tails = 9 (one Tail removed), the weight ratio is
(0.8^10 × 0.2^9) / (0.2^10 × 0.8^9) = 0.8 / 0.2 = 4, so w = 4/5 and
Pdm(Heads) = (4/5) 0.8 + (1/5) 0.2 = 0.68 ... used to predict a TAIL!! STRONG BIAS.
True out of sample, with 20 fresh observations, the PW model would use w = 1/2.
TRUE PW / ROAT likelihood ratio = (1/2)^20 / (0.32)^20 = 7523.
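The arithmetic of the coin example can be checked with a short Python sketch. The function names are illustrative; the numbers are the ones on the slide. Removing an observation tilts the weights toward the expert who was wrong about that observation, so each held-out toss receives probability 0.32 instead of 0.5.

```python
def likelihood(p_heads, heads, tails):
    """Likelihood of the observed tosses under an expert's P(Heads)."""
    return p_heads ** heads * (1 - p_heads) ** tails

def likelihood_weight(p1, p2, heads, tails):
    """Weight of expert 1 when weights are proportional to likelihood."""
    l1, l2 = likelihood(p1, heads, tails), likelihood(p2, heads, tails)
    return l1 / (l1 + l2)

def dm_prob_heads(w, p1, p2):
    """Decision maker's P(Heads) as a linear pool of the two experts."""
    return w * p1 + (1 - w) * p2

P1, P2 = 0.8, 0.2

w_full = likelihood_weight(P1, P2, 10, 10)            # all 20 tosses: 1/2
w_drop_tail = likelihood_weight(P1, P2, 10, 9)        # one Tail removed: 4/5
p_removed_tail = 1 - dm_prob_heads(w_drop_tail, P1, P2)   # probability given to that Tail
w_drop_head = likelihood_weight(P1, P2, 9, 10)        # one Head removed: 1/5
p_removed_head = dm_prob_heads(w_drop_head, P1, P2)       # probability given to that Head

# Likelihood of the 20 tosses under ROAT versus a true out-of-sample PW model (w = 1/2).
roat_likelihood = p_removed_tail ** 10 * p_removed_head ** 10
true_pw_likelihood = dm_prob_heads(0.5, P1, P2) ** 20
ratio = true_pw_likelihood / roat_likelihood
print(w_full, w_drop_tail, p_removed_tail, p_removed_head, ratio)
```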
Studies of EJ DATA:
Special issue on expert judgment, Reliability Engineering & System Safety, Volume 93, Issue 5, May 2008 (available online 12 March 2007).
1. Cooke, R.M., Goossens, L.H.J. (2008) TU Delft Expert Judgment Data Base.
2. Shi-Woei Lin and Bier, V.M. (2008) A Study of Expert Overconfidence.
3. Wisse, B., Bedford, T., Quigley, J. (2008) Expert Judgement Combination using Moment Methods.
4. Cooke, R.M., ElSaadany, S., Huang, X. (2008) On the Performance of Social Network and Likelihood Based Expert Weighting Schemes.
5. Clemen, R.T. (2008) Comment on Cooke's classical method, pp. 760-765.
6. Cooke, R.M. (2008) Response to Comments, pp. 775-777.
ALSO
1. Shi-Woei Lin, Chih-Hsing Cheng (2009) The reliability of aggregated probability judgments ... pp. 149-161.
2. Shi-Woei Lin, Chih-Hsing Cheng (2008) "Can Cooke's Model Sift Out Better Experts and Produce Well-Calibrated Aggregated Probabilities?" Proceedings of the 2008 IEEE IEEM.
3. Flandoli, F., Giorgi, E., Aspinall, W.P. and Neri, A. (2010) "Comparing the performance of different expert elicitation models using a cross-validation technique", appearing in Reliability Engineering and System Safety.
Knowledge
Scientific method, NOT EJ methods, produces agreement among experts.
EJ is for quantifying, not removing, uncertainty.
as subjective probability
TU DELFT Expert Judgment database: 45 applications (anno 2005)

Sector                                                       # experts   # variables   # elicitations
Nuclear applications                                         98          2,203         20,461
Chemical & gas industry                                      56          403           4,491
Groundwater / water pollution / dike ring / barriers         49          212           3,714
Aerospace sector / space debris / aviation                   51          161           1,149
Occupational sector: ladders / buildings (thermal physics)   13          70            800
Health: bovine / chicken (Campylobacter) / SARS              46          240           2,979
Banking: options / rent / operational risk                   24          119           4,328
Volcanoes / dams                                             231         673           29,079
Rest group                                                   19          56            762
TOTAL                                                        587         4,137         67,001
PERFORMANCE !!!
Confidence, blue ribbons, citations, status ... do NOT predict performance
EU-USNRC Expert Panels: Statistical accuracy and informativeness
[Figure: statistical accuracy and informativeness of the SocNet, Perf and Equal decision makers for seven panels: Early Health, Internal Dose, Soil/Plant, Animal, Wet Deposition, Dry Deposition, Dispersion]
RESS-SocNet&Likelwgts.pdf
“In the first few weeks of the Montserrat crisis there was perhaps, at times, some unwarranted scientific dogmatism about what might or might not happen at the volcano, …. The result was a dip in the confidence of the authorities in the Montserrat Volcano Observatory team and, with it, some loss of public credibility; this was not fully restored until later, when a consensual approach was achieved.”
Aspinall et al The Montserrat Volcano Observatory: its evolution, organization, rôle and activities. ALSO: Aspinall_mvo_exerpts.pdf, Aspinall et al Geol Soc _.pdf , Aspinall & Cooke PSAM4 3- 9.pdf, SparksAspinall_VolcanicActivity.pdf
“The goal should be to quantify uncertainty, not to remove it from the decision process” (Aspinall, Nature, 21 Jan. 2010)
assessment
Nature, 12 May 2011
Neri, A. et al. (Editors) (2008). Evaluating explosive eruption risk at European volcanoes. J. Volcanol. Geotherm. Res. Spec. Vol. 178.
Aspinall, W. (2010) A route to more tractable expert advice. Nature, 463, 294-295.
Aspinall WP, Woo G, Voight B, Baxter
volcanology: an application to volcanic
128: 273-285.
Sheep Scab
The choice is NOT whether to use EJ; but: do it well or do it badly?
Using Uncertainty to Manage Volcano Risk Response
Aspinall et al Geol Soc _.pdf
Sheep Scab
Practical issues
1. The seed variables should sufficiently cover the case structures for elicitation.
2. For each panel at least 10 seed variables are needed, preferably more.
3. Expert names and qualifications published, but not associated with assessments.
Preparation of Elicitation Protocol
doc