Integrated Crisis Early Warning System (ICEWS) Computational Social Science Experimental Proving Ground (CSS:EPG)
- Dr. Sean O’Brien
December 29, 2008
Distribution authorized to U.S. Government Agencies only
Program Summary
Objective
– Create a comprehensive, integrated, automated, validated, analytic system that forecasts shifts toward/away from country/regional instability
Approach
– Monitor and assess, in near-real time, events and trends that may trigger crises
– Develop and integrate validated, transparent, scientifically rigorous, robust and replicable models from diverse perspectives across multiple domains/levels of analysis
February 2007: Solicitation for ICEWS
September 2007: Contract Awards
October 2007: Program start
December 2008: Phase 1 Go/No-Go and Phase 2 Option Decisions
March 2009: Phase 2 Options Awarded
June 2010: Phase 3 Option Decisions
September 2010: Phase 3 Options Awarded
Performers and Locations
Arlington, VA; Atlanta, GA; Lawrence, KS; Vienna, VA; Philadelphia, PA; Lexington, SC; Athens, GA; Seattle, WA; Vienna, VA; Arlington, VA; Philadelphia, PA; Weston, MA; Washington, DC; College Park, MD; Arlington, VA; Cambridge, MA; Oahu, HI; Albany, NY; Syracuse, NY; Hilliard, OH; Aspen, CO; Washington, DC
LM-ATL Raven System Concept
[Figure: system concept diagram. Labels recovered from the slide, grouped approximately:]
– Model Inputs: News Feeds; Blogs & Reports; Databases
– Data Ingest Services: Data Collection; Event Coding (KU, SAE); Text Processing & Analytics; SME Interview & QC Tools
– Models and Model Services, DIAS Framework (ANL) adapted by LM: Agent Based Models (UP, LC, LM); Bayesian Models (IDI, EG, SAE); Statistical Models (KU, SAE, UW, EG)
– Leaders; Economics; Groups & Factions; Institutions
– Stability Forecasting: Events of Interest; Overall Stability; Key Indicators; Aggregation, Explanation, & Transparency
– DIME Action Modeling: DIME Action Mining (LM); DIME Action Models (IDI, LM, UP); Region/Countries (DIME Effects) (LC, LM); Regional Players (DIME Strategies) (UP); Stability Impacts (Linkages to Forecasting Models’ Levers)
– Stability Assessment & Mitigation Planning: DIME Action Exploration; DIME Actions; Futures Exploration; Interactive Interfaces to View & Query Forecasts and Mitigation Strategies
– Legend: Primary Phase I Focus; Extension of Phase I; Primary Phase II Focus
Phase 1 Achievements
Largest data set ever collected/analyzed for instability forecasting project
– 6.5M news stories from 75 national, regional, and international sources (253M lines of text)
– 100 other sources of data on country social, demographic, economic, leadership and political factors
Correlates of War project
Developed fully automated capability to monitor and forecast political activity around the globe
– Automatically convert news reports into structured indices that reflect the character and intensity of interactions between key leaders, organizations, and countries: who is doing what to whom, when, where and how around the world
stories coded in four major categories—verbal cooperation/conflict, material cooperation/conflict—comprising 130 variables to measure and monitor the character and intensity of a broad range of political activities
– Resulting indices are used in computer models to identify trends, and cue analysts to impending conflicts
By aggregating some of the indices, performers were able to successfully forecast a variety of instability events in the Pacific Command area of responsibility
– Though only one performer passed Phase 1 gates
Some novel insights
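As an illustration of how coded news events could be turned into structured indices, the sketch below counts events per actor pair in the four categories named on this slide (verbal cooperation, verbal conflict, material cooperation, material conflict). The record layout, the actor codes and the function name `quad_counts` are assumptions made for the example; the slide does not describe the performers' actual coding scheme or its 130 variables.

```python
from collections import Counter

# The four category names follow the slide; everything else here is hypothetical.
QUAD_CATEGORIES = (
    "verbal_cooperation",
    "verbal_conflict",
    "material_cooperation",
    "material_conflict",
)


def quad_counts(events):
    """Count coded events per (source, target) pair and quad category.

    events: iterable of (source_actor, target_actor, category) tuples.
    Returns a dict mapping (source, target) to a Counter of categories.
    """
    counts = {}
    for source, target, category in events:
        if category not in QUAD_CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        counts.setdefault((source, target), Counter())[category] += 1
    return counts


# Hypothetical coded events: who did what to whom.
events = [
    ("GOV", "REB", "material_conflict"),
    ("GOV", "REB", "verbal_conflict"),
    ("GOV", "REB", "material_conflict"),
    ("REB", "GOV", "verbal_cooperation"),
]
indices = quad_counts(events)
```

Counts like these, taken per time period, are one simple form an index could take before being passed to a forecasting model.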
Phase 1 New Insights on Forces Driving Country Instability
Identified and demonstrated a “repulsion” effect based on social similarity connectedness
– In countries with similar temporal event patterns, the presence of social unrest in one country reduces its likelihood in connected countries.
Empirically confirmed inverted “U” relationship between government repression and probability of ethnic/religious violence
– However, effects are conditional on the ethnic composition of the society
Weakness in the combined strength of the two dominant rival parties is a leading indicator for higher violence
[Figure: curves for ELF values of .004, .16, .36, .6 and .86 plotted against the Political Terror Scale (1 to 5); vertical axis ticks at 0.5 and 1]
Evaluation Methodology
GFI Data: “Authoritative” set of data on the occurrences and intensity of de-stabilizing events in 29 countries of the PACOM AOR for the period 1998-2006
– Intensity levels 0-4* (common yardstick)
– De-stabilizing events (variable; best 3 scored)
rebellion or insurgency (e.g. power struggle between two political factions involving disruptive strikes or violent clashes between supporters)
directed against the government
more states that could lead to conflict
– Data for 1998-2004 provided to performers for training
– Data for 2005-2006 withheld for testing
*Index of Instability/Conflict Intensity from the Heidelberg Institute for International Conflict Research
Evaluation Methodology (cont’d)
Performers calculate probability of 4 intensity levels for each country 6 mos hence
– Bin probabilities using the 2/3 rule to determine “Low”, “Moderate” and “High” intensity
– Compare performer’s matrix with true matrix for accuracy, recall and precision
Example: Burma, 2005-2006 (slide column headers: Max Intensity = 0, 1, 2, 3, 4; four probabilities are listed per quarter)
Year | Quarter | Probabilities
2005 | 1 | 0.0192 | 0.4399 | 0.5343 | 0.0065
2005 | 2 | 0.0489 | 0.1817 | 0.5238 | 0.2456
2005 | 3 | 0.1807 | 0.3831 | 0.3587 | 0.0774
2005 | 4 | 0.3012 | 0.4243 | 0.2456 | 0.0288
2006 | 1 | 0.1478 | 0.3432 | 0.4087 | 0.1003
2006 | 2 | 0.1700 | 0.3826 | 0.3683 | 0.0791
2006 | 3 | 0.2492 | 0.4085 | 0.2939 | 0.0484
2006 | 4 | 0.3143 | 0.4450 | 0.2184 | 0.0224
Legend:
– Sum of Probabilities >= 2/3, Forecast = High Intensity
– Sum of Probabilities >= 2/3, Forecast = Moderate Intensity
– Sum of Probabilities >= 2/3, Forecast = Low Intensity
Performers calculate probability of discrete de-stabilizing events for each country 6 mos hence
– Use 2/3 rule to determine “0” or “1”
– Compare performer’s matrix with true matrix for accuracy, recall and precision
– Best forecasts for 3 de-stabilizing events will be scored and reported
Example: Bangladesh, 2005-2006
Year | Quarter | Rebellion | Insurgency | Domestic Political Crisis | Ethnic/Religious Violence | International Crisis
2005 | 1 | 0.0027 | 0.0004 | 1.0000 | 0.0000 | 0.4657
2005 | 2 | 0.0034 | 0.2269 | 0.8918 | 0.0000 | 0.4657
2005 | 3 | 0.0000 | 0.0682 | 0.9999 | 0.0000 | 0.0438
2005 | 4 | 0.0050 | 0.1720 | 0.9910 | 0.0000 | 0.0438
2006 | 1 | 0.0034 | 0.9992 | 0.9996 | 0.0000 | 0.4657
2006 | 2 | 0.0013 | 0.4501 | 0.9979 | 0.0005 | 0.4116
2006 | 3 | 0.0000 | 0.1695 | 0.9974 | 0.0000 | 0.8099
2006 | 4 | 0.0001 | 0.9999 | 0.9943 | 0.0000 | 0.8099
Legend:
– Probability >= 2/3, Forecast = Insurgency
– 1/3 < Probability < 2/3, No forecast
– Probability <= 1/3, Forecast = No Rebellion
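The 2/3 rule for discrete events and the comparison against the true matrix can be sketched as follows. The thresholds come from the slide; the function names, the choice to leave "No forecast" cells out of the scoring, and the ground-truth vector are assumptions made for illustration, since the slide does not say how unforecast cells were scored and does not list the true outcomes.

```python
def apply_two_thirds_rule(probability):
    """Map a probability to 1 (event), 0 (no event) or None (no forecast)."""
    if probability >= 2 / 3:
        return 1
    if probability <= 1 / 3:
        return 0
    return None


def score(forecasts, truth):
    """Return (accuracy, recall, precision) over cells that got a forecast."""
    tp = fp = tn = fn = 0
    for forecast, actual in zip(forecasts, truth):
        if forecast is None:
            continue  # assumption: "No forecast" cells are not scored
        if forecast == 1 and actual == 1:
            tp += 1
        elif forecast == 1 and actual == 0:
            fp += 1
        elif forecast == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else None
    recall = tp / (tp + fn) if (tp + fn) else None
    precision = tp / (tp + fp) if (tp + fp) else None
    return accuracy, recall, precision


# Insurgency probabilities for Bangladesh, 2005 Q1 to 2006 Q4, from the slide.
insurgency = [0.0004, 0.2269, 0.0682, 0.1720, 0.9992, 0.4501, 0.1695, 0.9999]
forecasts = [apply_two_thirds_rule(p) for p in insurgency]

# Hypothetical ground truth, invented only to exercise the scoring function.
hypothetical_truth = [0, 0, 0, 0, 1, 1, 0, 0]
accuracy, recall, precision = score(forecasts, hypothetical_truth)
```

With these inputs the 2006 Q2 probability (0.4501) falls between 1/3 and 2/3, so that quarter receives no forecast.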
Forecasting-Performance Metrics
GO/NO GO metrics
Phase | Accuracy | Recall | Precision | Forecast Window
Phase 0 (Benchmark) | 80% (Retrospective) | 80% | 70% | Annual
Phase 1 | > 80% (Retrospective) | 80% | 70% | 1-6 mos.
Phase 2 | > 85% (Real-Time) | 80% | 70% | 1-3 mos.
Phase 3 | > 85% (Real-Time) | 80% | 70% | 1 mo.
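A go/no-go check against these thresholds might look like the sketch below. The numbers are transcribed from the slide (accuracy above 80% or 85%, recall 80%, precision 70%); treating recall and precision as "at least" thresholds, and the names `GATES` and `passes_gate`, are assumptions.

```python
# Thresholds for Phases 1-3; Phase 0 is a benchmark and is left out.
GATES = {
    1: {"accuracy": 0.80, "recall": 0.80, "precision": 0.70},
    2: {"accuracy": 0.85, "recall": 0.80, "precision": 0.70},
    3: {"accuracy": 0.85, "recall": 0.80, "precision": 0.70},
}


def passes_gate(phase, accuracy, recall, precision):
    """Return True if all three metrics meet the gate for the given phase."""
    gate = GATES[phase]
    return (
        accuracy > gate["accuracy"]  # the slide shows a strict ">" here
        and recall >= gate["recall"]  # assumption: inclusive threshold
        and precision >= gate["precision"]  # assumption: inclusive threshold
    )
```

For example, `passes_gate(1, 0.83, 0.80, 0.72)` is true, while the same scores fail the Phase 2 gate because accuracy does not exceed 85%.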
Results: LM-ATL
– Exceeds metrics for the maximum intensity index and 3 instability events: Rebellion, Insurgency, and Ethnic/Religious Violence
– Passes Phase 1 gates
– By integrating improved versions of best-of-breed models from multiple perspectives, team achieves more accurate, precise forecasts than any one model alone
[Figure: bar chart, 0% to 100%, of Accuracy, Recall and Precision for Max Intensity (HI), Rebellion (Reb), Insurgency (Insur), Domestic Political Crisis (DPC), Ethnic/Religious Violence (ERV) and International Crisis (IC), with lines marking the Precision Threshold and the Accuracy & Recall Threshold]
LM-ATL: Quarterly Results
(Phase 2 goal)
– Exceeds metrics for the maximum intensity index and 2 instability events: Rebellion, Insurgency
– Very close on metrics for Ethnic/Religious Violence
– Almost passes Phase 2 metrics (retrospective vice near real time)
[Figure: bar chart, 0% to 100%, of Accuracy, Recall and Precision for Max Intensity (HI), Rebellion (Reb), Insurgency (Insur), Domestic Political Crisis (DPC), Ethnic/Religious Violence (ERV) and International Crisis (IC), with lines marking the Precision Threshold and the Accuracy & Recall Threshold]
LM-ATL Hypothesis
Aggregating model scores with a learned Bayesian network
model
– Because different models cover the EOIs and countries with varying levels of performance
[Figure: Distance (in quarters) between probability prediction and ground truth vector for Rebellion, for UW and SAE B models, 2005-2006; vertical axis 0.00 to 3.00. Countries in axis order: Burma, Russia, China, Nepal, Malaysia, Australia, North Korea, Bhutan, Vietnam, Solomon Islands, Laos, Papua New Guinea, Japan, South Korea, Singapore, Mongolia, Fiji, New Zealand, Mauritius, India, Comoros, Taiwan, Madagascar, Cambodia, Indonesia, Bangladesh, Thailand, Philippines, Sri Lanka. Annotations: “UW model performs better on these countries”; “SAE B model performs better on these countries”]
[Figure: Average distance, in quarters and over 29 countries, between probability prediction and ground truth vector for Rebellion, 2005-2006; vertical axis 0.00 to 1.20; bars for LM ATL, UW, LC, UPenn, SAE L, UK, SAE B]
Lower value = less error
Models:
– University of Washington, Dr. Michael Ward: geo-spatial statistics applied to trade ties, flow of people, social similarity
– University of Penn, agent-based models (only 6 and 4 “hard” countries resp.)
– University of Kansas, Dr. Phil Schrodt; College of William and Mary, Dr. Steve Shellman: logit-based statistical models
– College of William and Mary, Dr. Steve Shellman: Bayesian statistical model
– Aggregation model
Except for Bangladesh, Rebellion is picked up by some other model
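To make the aggregation idea concrete, the sketch below combines binary forecasts from several models with a naive Bayes classifier, which is the simplest form of Bayesian network (one outcome node with each model's vote as a conditionally independent child). This is a stand-in chosen for illustration: the slide does not give the structure or training procedure of the network LM-ATL learned, and the training data, function names and smoothing constant here are hypothetical.

```python
def train_naive_bayes(votes, truth, smoothing=1.0):
    """Learn P(event) and P(model votes 1 | event) from binary training data.

    votes: list of rows, each holding one 0/1 vote per model.
    truth: list of 0/1 outcomes, one per row.
    """
    n_models = len(votes[0])
    class_counts = {0: 0, 1: 0}
    ones = {0: [0] * n_models, 1: [0] * n_models}  # votes equal to 1, per class
    for row, outcome in zip(votes, truth):
        class_counts[outcome] += 1
        for j, vote in enumerate(row):
            ones[outcome][j] += vote
    total = len(truth)
    prior = {
        y: (class_counts[y] + smoothing) / (total + 2 * smoothing) for y in (0, 1)
    }
    likelihood = {
        y: [
            (ones[y][j] + smoothing) / (class_counts[y] + 2 * smoothing)
            for j in range(n_models)
        ]
        for y in (0, 1)
    }
    return prior, likelihood


def posterior_event(row, prior, likelihood):
    """P(event = 1 | votes), assuming votes are independent given the outcome."""
    joint = {}
    for y in (0, 1):
        p = prior[y]
        for j, vote in enumerate(row):
            p *= likelihood[y][j] if vote == 1 else 1.0 - likelihood[y][j]
        joint[y] = p
    return joint[1] / (joint[0] + joint[1])


# Hypothetical training data: model A tracks the outcome well, model B less so.
votes = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
truth = [1, 1, 0, 0, 1, 0]
prior, likelihood = train_naive_bayes(votes, truth)
p_a_only = posterior_event([1, 0], prior, likelihood)
p_b_only = posterior_event([0, 1], prior, likelihood)
```

When the two models disagree, the posterior follows the model that was more reliable in training, which is the behavior the slide's hypothesis relies on: different models perform best on different countries and events.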
ICEWS Phase 2 CONOPS
[Figure: tiered inputs (Tier 0, Tier 1, Tier 2) feeding the Stability/Instability Forecast]
– Leadership Characteristics: Medium-term (intermittent)
– Internal/External Interactions: Short Term (weekly)
– Macro-Structural Conditions: Longer Term (annually)
Phase 1: Retrospective 3-6 months forecast of crises, identification of leading indicators
Phase 2 Objective 1: Real-time 1 month forecast of EOIs, refinement of leading indicators. Also, 2-3 year forecast window
Phase 2 Objective 2: Identification, validation and modeling of linkages between DIME actions and leading indicators
Phase 2 DIME data: Taxonomy
2005-present; Continue collection during Phase 2
Callouts: “This is the original Phase 3 objective”; “This is a new objective, requested by PACOM”
Timeline And Evaluation
[Figure: Phase 2 timeline by month, running to month 24. Tracks: DIME data collection/analysis; Objective 1: Forecasts (monthly forecasts 1 - 9 and 2-3 year forecasts); Objective 2: Linkage Evaluations (LE). Milestones: PACOM Evaluation Workshop 1; PACOM Evaluation Workshop 2. Marker: Graded for Go/No Go Assessment]
Computational Social Science Experimental Proving Ground (CSS:EPG)
Objective: Develop a synthetic lab to formalize, validate, and integrate social science theories, in a disciplined and cumulative manner, to enable efficient spin
The synthetic lab (transition 1) continuously develops the social science knowledge to enable more and better military applications (transition 2)
knowledge
Synthetic Lab
Military & Intelligence Applications
Possibly Proprietary & Classified Non-Proprietary & Open Source Social Science Research Community Social Computing Performer Community
CSS:EPG will revolutionize the way social science knowledge is developed, formally validated and applied to real-world strategic and operational decisions
CSS:EPG Trajectory
Today
Do we have the tools to compose, integrate, and evaluate the theories for predictive potency?
– Yes, simulations have been used to resolve competing theoretical claims and make predictions, in manual, “one-off” cases.
Do we have theories that can reliably address operational questions?
– Yes, but they are “islands” of competing theoretical claims that need to be integrated, formalized, tested and evaluated holistically
Program End
Performance
CSS projects, quickly forgotten
software level is ad-hoc, error-prone, and un-validatable
Real time news feeds Real time Simulation
Theory-driven, data-driven simulations to reliably and flexibly address a broad range of
social, cultural, and behavioral implications
Time
Dec 2013
Can we automate the process so theory/data-driven simulations can be applied to current, real world operational questions?
– Yes, massive simulations can be analyzed, automatically compiled, and executed in real time, and real time exploitation of relevant data is mature.
CSS:EPG Three Challenges
Generality
– Number of analytic cases and number of answerable questions that the simulation environment can address reliably
Composability
– Integrating new theories from 3rd parties
  – Formalize, integrate and evaluate new theories within the overarching theoretical framework and simulation environment
– Question refinement
  – Decompose operational question into theoretical components, match theoretical components to the relevant theories necessary to answer them, select and assemble theories, simulation components, models, and data streams to answer the question
Timeliness
– Meta-control of question refinement, theory selection and integration, data processing, execution, and analysis
– Near real-time data ingest to update events and changing circumstances in AOR
– Timely provision of answers to commander’s question
Increasing generality to the scale necessary for usability by domain experts and operators requires composability and timeliness
Components of Experimentation Lab
[Figure: flow diagram of the HPC Experimentation Environment. Labels: Commander’s forward looking question or Theoretical question; Analysis Question; Formal mapping into theoretical framework; Social Science Theories; Data; Historical data; Real-time data; artifacts; modifiers; Integrated model instantiation; Experimentation protocol; Simulations; Analysis of results]
Top-Level CSS:EPG Program Delivery
Columns: Phase 1 (18 mos) | Phase 2 (18 mos) | Phase 3 (18 mos)
Generality
– Phase 1: 3 training cases for each analytic question (1 chosen by performer; 2 developed by govt); one case withheld by govt for testing, for each analytic question; 3 analytic questions for training, 1 analytic question, withheld by govt, for testing
– Phase 2: 10-12 cases; 10 analytic questions for training; 3 questions withheld for testing
– Phase 3: Entire AOR, contemporaneous; class of questions relevant to JTF commander during Phase 0 and Phase 4
Composability
– Phase 1: No composability
– Phase 2: Composability via external competition for the solution of an ... relevant, social science argument
– Phase 3: Composability via external competition
Timeliness
– Phase 1: Automation of set-up only; no real-time
– Phase 2: Automated explanation and assessment, done in batch; real-time data
– Phase 3: Meta-control of execution and analysis; real-time data and execution
Performer deliveries
– Phase 1: Each CSS:EPG performer delivers a single modeling environment capable of answering a few questions from a single problem class (Counterinsurgency - questions in backup)
– Phase 2: CSS:EPG performers extend their simulation platform to answer questions from multiple problem classes and demonstrate ability to formalize and integrate theories proposed by others
– Phase 3: CSS:EPG performer extends simulation platform to (semi) automate question refinement, theory selection, model integration, real-time execution, integration of static & real-time data, and analysis
Two Transitions:
applications
& expand infrastructure
Legend: Technical Challenge emphasized in each phase
Top-Level CSS:EPG Program Structure
Columns: Phase 1 (18 mos) | Phase 2 (18 mos) | Phase 3 (18 mos)
Generality
– Phase 1: 3 training cases for each analytic question (1 chosen by performer; 2 developed by govt); one blind case for testing; 4 analytic questions for training, 1 blind analytic question for testing
– Phase 2: 10-12 training cases, 3 blind cases for testing; 10 analytic questions for training; 3 blind questions for testing; broader questions that focus on outstanding, ... science arguments
– Phase 3: All major problems currently relevant to a selected AOR; class of questions relevant to JTF commander for Phase 0 through Phase 4
Composability
– Phase 1: Manual composability within ... simulation environment
– Phase 2: Manual composability (with help) by external Grand Challenge competitions among participants for the solution of the analytic questions (next slide)
– Phase 3: Composability (without help) by external competition participants
Timeliness (cells in slide order)
– Automation of set-up only
– Automated compilation and execution
– Automated analysis of results, done in batch
– Near real-time data
– Automation and meta-control ... theory selection, model integration, batch and real-time data ingest, execution, and analysis
– Near real-time data; execution quick enough for answer to be relevant to JTF commander
Legend: Technical Challenge emphasized in each phase
Summary
– The Department currently has no reliable, validated means to answer human, social, cultural, and behavioral questions of major import to US national security.
– Current approach of integrating unproven/untested models/theories at the software level won’t get us there; it just provides the means to get to the wrong answer quicker.
– Current academic approach of testing theories in inter- (and intra-) disciplinary isolation won’t get us there: competing theoretical claims depend on theories from other disciplines at different levels of analysis for their resolution.
– CSS:EPG has the potential to revolutionize the way that social science knowledge is developed, evaluated, and applied to important operational challenges
Test Metrics
Test metrics geared to ensure simulation environment accurately captures the value and trend
[Figure: two plots against time (t1, t2, ..., tn): “Ground Truth For v1” (value) and “First Derivative For v1”, each with a 30% band around ground truth. Annotations: “Want the average simulation ... truth”; “Also want the average simulation ... ground truth”]
Example Phase 1 Dependent Variables: Number of violent acts; intensity and kind of gov’t counter-insurgent activities; popular support for gov’t/insurgents; political participation; economic development
Test Metrics - Mean Distance Between Simulation Runs & Ground Truth
[Figure: value against time (t1, t2, ..., tn) showing Ground Truth For v1 with values y1, ..., yn, Simulations 1 through m, and distances d1, ..., dn marked for Simulation 1]
Average Relative Error for Simulation 1 = (1/n)(d1/|y1| + d2/|y2| + d3/|y3| + ... + dn/|yn|)
Average Error = (1/m)(sum of average error for each simulation)
Requirement: Average Error < 0.3
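The value metric can be written out as a short sketch. It assumes that each distance d_i is the absolute difference between the simulated value and the ground truth value y_i at time t_i, which is how the figure appears to define it; the function names and the sample numbers are hypothetical.

```python
def average_relative_error(simulation, ground_truth):
    """(1/n) * sum(d_i / |y_i|), with d_i = |s_i - y_i| (assumed definition)."""
    if not ground_truth or len(simulation) != len(ground_truth):
        raise ValueError("series must be non-empty and of equal length")
    total = 0.0
    for s, y in zip(simulation, ground_truth):
        if y == 0:
            raise ValueError("relative error is undefined where ground truth is 0")
        total += abs(s - y) / abs(y)
    return total / len(ground_truth)


def mean_error_over_runs(runs, ground_truth):
    """(1/m) * sum of the average relative error of each of the m runs."""
    return sum(average_relative_error(r, ground_truth) for r in runs) / len(runs)


# Hypothetical ground truth and two simulation runs.
ground_truth = [10.0, 20.0, 40.0]
runs = [[11.0, 18.0, 44.0], [9.0, 22.0, 40.0]]
error = mean_error_over_runs(runs, ground_truth)
passes = error < 0.3  # the 30% band from the slide
```

Note that the formula divides by |y_i|, so a dependent variable that reaches zero in the ground truth would need separate handling; the sketch simply raises an error in that case.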
Test Metrics - Mean Distance Between 1st Derivative of Simulation Runs and Ground Truth
[Figure: two plots against time (t1, t2, ..., tn). Upper: Ground Truth For v1 with values y1, ..., yn and Simulations 1 through m. Lower: First Derivative For v1 with values z2, ..., zn, first derivatives of Simulations 1 through m, and distances d2, ..., dn marked for Simulation 1]
Average Relative Error for 1st Derivative of Simulation 1 = (1/(n-1))(d2/|z2| + d3/|z3| + ... + dn/|zn|)
Average Error = (1/m)(sum of average error for each simulation)
Requirement: Average Error < 0.3
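The trend metric follows the same pattern applied to first derivatives. The sketch assumes the derivative is approximated by first differences between consecutive time steps (z_i = y_i - y_(i-1) for i = 2..n) and that d_i is the absolute difference between the simulated and ground truth differences; neither detail is stated on the slide, and the names and numbers are hypothetical.

```python
def first_differences(series):
    """Approximate the first derivative with unit-step differences."""
    return [b - a for a, b in zip(series, series[1:])]


def derivative_relative_error(simulation, ground_truth):
    """(1/(n-1)) * sum(d_i / |z_i|) over the n-1 first differences."""
    z = first_differences(ground_truth)
    dz = first_differences(simulation)
    if not z or len(z) != len(dz):
        raise ValueError("series must have equal length of at least 2")
    total = 0.0
    for s, t in zip(dz, z):
        if t == 0:
            raise ValueError("relative error is undefined where the trend is 0")
        total += abs(s - t) / abs(t)
    return total / len(z)


def mean_derivative_error(runs, ground_truth):
    """(1/m) * sum of the derivative error of each of the m runs."""
    return sum(derivative_relative_error(r, ground_truth) for r in runs) / len(runs)


# Hypothetical ground truth and two simulation runs.
ground_truth = [10.0, 20.0, 40.0]
runs = [[10.0, 22.0, 40.0], [12.0, 22.0, 42.0]]
trend_error = mean_derivative_error(runs, ground_truth)
passes = trend_error < 0.3
```

The second run is offset from the ground truth by a constant, so its trend error is zero even though its value error is not; this is why the slide tests value and trend separately.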
CSS:EPG – Phase 1 Analytic Questions
– When will introducing or increasing foreign force presence in a region increase or decrease the insurgency or shift its focus?
  Training cases: Iraq; Afghanistan; Performer Bid
– When will bolstering Host Nation forces help suppress the insurgency? Re-orient violence
  Training cases: Iraq; Afghanistan; Performer Bid
– Under what conditions does government repression depress/increase illegal or violent anti-government activity?
  Training cases: Iraq; Afghanistan; Performer Bid
– What effects does killing or capturing a high profile insurgent leader have on: popular support for insurgency; insurgent cohesion; insurgent strategy; military/political effectiveness?
  Training cases: Iraq; Afghanistan; Performer Bid
– Blind Question
Test case: Blind Case
CSS:EPG – Phase 2 Analytic Questions
– Under what conditions does discontent with government policy or performance transform into threats to the regime?
  Cases: Eastern European countries (1989-90); China (1989); Philippines (1983-86); Israel in the West Bank and Gaza (1987-1989); Georgia (2003-04); Bolivia (1983-2008); Egypt (1977)
– Under what conditions can societies deeply divided along communal lines produce stabilizing overarching identities?
  Cases: Canada (1970-1990); Lebanon (1960-1980); Tanzania & Kenya (1960-1990)
– Under what conditions can antagonistic groups live together peacefully within a single legal political order?
  Cases: Sudan (1960-Present); Yugoslavia (1950-90); Israel (1948-Present); Algeria (1970-2000); Malaysia (1960-2000)
– When does corruption increase political stability and when does it decrease it?
  Cases: Central Asian Republics post-Soviet Union; Bangladesh, Indonesia, Vietnam, China (1990s, 2000s)
– Under what conditions do elections foster unity among antagonistic ethnic groups and under what conditions do elections accentuate tendencies toward violent intergroup conflict or secession?
  Cases: India, Bangladesh, Indonesia, Nigeria (last 20 or 30 years)
– Do political parties based on religion differ in their performance and democratic potential from parties based on class or ethnicity?
  Cases: India, Malaysia, Morocco, Sri Lanka, Israel, Turkey
CSS:EPG – Phase 2 Analytic Questions (cont’d)
– Under what conditions does strengthening civil society provide the basis for overthrowing the regime and under what conditions does it support the stability of the regime?
  Cases: Pre- vs post-1989 Soviet Bloc countries
– What kinds of authoritarian regimes can democratize and what kinds cannot?
  Cases: 1980s Philippines (Marcos), Morocco, Russia, China, Iran, Egypt, Malaysia, Algeria, Pakistan
– What policies toward particular types of authoritarian regimes can produce democratization?
  Cases: South Africa, Zimbabwe, Philippines
– What transitions to stability are available for polities divided between modern and prosperous urban sectors and populous but poor and traditionalist rural areas?
  Cases: Thailand 2000s; Turkey 1960s-1990s; China 1970s-2000s
– How can democratic transitions be managed to reduce the threat of external aggression?
  Cases: Georgia 1990s, 2000s; Lebanon 2000s; Yugoslavia early 1990s
– What is the political effect of different kinds of violence on the stability of different kinds of political regimes enjoying different levels of political support?
  Cases: Bangladesh 1990s-2000; Afghanistan post 2001; Northern Ireland 1960s-1990s; India 1960s-2000s; Zimbabwe