 
Challenges in Intelligence Analysis Under Data Overload
Emily S. Patterson, PhD
Research Scientist
Associate Director, Converging Perspectives on Data (CPoD)

Intelligence Analysis: Avoiding Surprise
August 7, 1998
Bombing of US Embassy in Africa
224 killed, including 12 US personnel

Data Overload Definition
• Data overload in inferential analysis*:
  – A condition where a domain practitioner, supported by artifacts and other human agents, finds it extremely challenging to focus in on, assemble, and synthesize the significant subset of data for the problem context into a coherent assessment of a situation, where the subset of data is a small portion of a vast data field
*See Woods, Patterson, and Roth, 2002 for alternative definitions

Interdisciplinary (CSE + Design) Approach
• Identify vulnerabilities to failing to meet work demands
• Reduce vulnerabilities with innovations that:
  – Cope with context sensitivity in interpreting the meaning of data
  – Are robust to brittleness in machine processing
• Focus on leverage points:
  1. Process vulnerability (either now or in the future) that has high consequences for failure
  2. “New” technological or organizational capability
  3. Confident in predicting impact on performance
     • well-developed “design” and “science” research base, experience in other worlds…

CSEL “Study base” on Intel Analysis
Studies:
• 10 expert (13 yrs) NASIC analysts on Ariane 501
• 2 junior (not cleared) NASIC analysts on Ariane 501
• 6 expert NASIC analysts critiquing junior analyst on Ariane 501
• 6-day observations of army captains (in training) doing collaborative counter-terrorism (Germans in 1940s)
• 4-day observations of army lieutenants (in training) doing modern-day Stability and Support Operations (SASO)
• Interviews of ~50 army intelligence analysts
• 2 novice, 3 expert NSA analysts critiquing junior NASIC analyst on Ariane 501
• A lot not in Intel Analysis

Community “knowledge base” (very incomplete – help welcome)
• Bamford’s “The Puzzle Palace: A Report on America's Most Secret Agency”
• Richards Heuer’s The Psychology of Intelligence Analysis (1999)
• Klein Assoc studies of “profilers” (Klein, 2001; Hutchins, 2003; Pirolli et al., 2004)
• Department of Defense’s “Novel Intelligence in Massive Data / Glass Box”
• Anthropologically-based needs analysis (Johnston, 2005)
• Laboratory experiments (Cheikes and Taylor, 2003; Cheikes, Brown, Lehner and Adleman, 2004)
• Website: http://www.tkb.org has great data on terrorism
• Friends of the Intelligence Community (FOIC - Brian Moon)

Convergent broadening / narrowing model of decision-making for intelligence analysis
[Figure: model diagram with the labels Down Collect, Conflict and Corroboration, Hypothesis Exploration, and Broadening checks]

Down Collect
• 10 NASIC experts, 1 novice doing Ariane 501:
  – Refine until manageable (22 – 419 documents)
  – Open based on dates and titles (4 – 29 documents)
  – Rely heavily on small number (1-4 documents)
• NSA expert: Start with key terms…something jumps out at me and I follow that route…“I know it when I find it”…Always look for dates - current means less than 2 years…do anything to reduce the number of hits…Wean by year…I go through 2-3 filter and sort processes (hi-level, sorting, choosing what to use) story? Outliers? Conflicts?
• NSA expert: 57 hits, that’s nothing…lead information, jot down tidbits for later digging, anything that can be pulled on, names, companies, software, buildings…biggest problem is getting right search terminology to get what you want…‘gold nuggets’ concept…I build a house, get started and fill in…write down search terms on paper, but do most of the analysis in my head
• NSA expert: These search terms are generic, going to get lots of the same stuff…query is too broad…add in ‘economic impact’ or ‘political’…looking for consensus on what happened…take little trails off the main path to investigate subtopics

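The narrowing steps the analysts describe (drop documents older than about 2 years, then add query terms until the hit list is manageable) can be sketched in code. This is only an illustration: the function name `narrow_hits`, the document fields (`title`, `published`, `text`), and the `manageable` threshold are assumptions made for the example, not part of any tool used in the studies.

```python
from datetime import date, timedelta


def narrow_hits(hits, today, extra_terms, manageable=30, max_age_years=2):
    """Apply filters one at a time, stopping once the hit list is manageable.

    Each hit is a dict with 'title', 'published' (datetime.date), and 'text'.
    All field names and thresholds here are hypothetical.
    Returns (remaining_hits, names_of_filters_applied).
    """
    applied = []
    if len(hits) <= manageable:
        return hits, applied

    # Filter 1: keep only "current" documents (roughly the last 2 years).
    cutoff = today - timedelta(days=365 * max_age_years)
    hits = [h for h in hits if h["published"] >= cutoff]
    applied.append("recent")

    # Filter 2 onward: add one query term at a time while the list is too long.
    for term in extra_terms:
        if len(hits) <= manageable:
            break
        hits = [h for h in hits if term.lower() in h["text"].lower()]
        applied.append(f"term:{term}")

    return hits, applied
```

Calling `narrow_hits(hits, date.today(), ["economic impact", "political"])` returns the reduced list together with the filters that were used, so the path taken to reach the final set stays visible.
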
Down Collect
• NASIC critiquing interviews
  – Look for one or two articles that specifically talk about incident, get feel, then go back and search.
  – Use a broad query to pull in lots of things
  – Need to check intelligence sources not just open source
  – Refine search if you know there should be lots of information out there
  – Only going to get 10% of the data if you're lucky - frequently only 1%
  – Use a broad query if unsure of what is being looked for
  – Documents that have more detailed information might be more valuable
  – Commercial translations miss subtleties - use translators to get all of the little connotations for critical data
  – Documents before an event can give you background that might not come up in later documents

Down Collect (Document Selection)
• NSA expert: I do quick glancing, I don’t read whole documents right away; names, technology, places, too many nuances for an automated system…
• NSA expert:
  – scandals and dirt and stolen intellectual property are always important to find…Event recognition from newspapers…
  – there are usually 4-5 I’m quoting large chunks from; requirements for me:
    • doc uses certain set of phrases
    • provides a good succinct history of matter at hand
    • good lay-translation
    • written 2-6 weeks after the event in question
  – historical familiarity
  – favorite source…sometimes serendipity…good analysis
  – “editorial” style docs more useful for tidbits and pointers, someone’s opinion, some fact but lower weighting…sometimes useful to capture the debate
• NSA novice: Q: If you could see more than date and title, what would you want? Set up like MS Outlook preview, 1st three lines…want to see source of document too. I go straight down the list 1, 2, 3.

Low and High Profit Documents (Indistinguishable with Current Interface)
Outcome Comparison: Did Not Rely on High Profits vs. Did
* Significant difference using Wilcoxon-Mann-Whitney Non-Parametric Test

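For readers unfamiliar with the Wilcoxon-Mann-Whitney test named on this slide, the sketch below shows how the U statistic and an exact two-sided p-value can be computed for two small groups. It is a generic illustration of the test, not the analysis behind the slide: the function names are made up for the example, and any scores passed in are the caller's own. The exact p-value enumerates every relabeling, so it is only practical for small samples.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """U for group_a: number of (a, b) pairs with a > b, ties counting 0.5."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    return sum(ranks[:n_a]) - n_a * (n_a + 1) / 2


def exact_two_sided_p(group_a, group_b):
    """Share of relabelings whose U is at least as far from its mean as observed."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n = len(group_a), len(ranks)
    offset = n_a * (n_a + 1) / 2
    mean_u = n_a * (n - n_a) / 2
    observed = abs(sum(ranks[:n_a]) - offset - mean_u)
    extreme = total = 0
    for idx in combinations(range(n), n_a):
        u = sum(ranks[i] for i in idx) - offset
        total += 1
        if abs(u - mean_u) >= observed - 1e-12:
            extreme += 1
    return extreme / total
```

A rank-based test like this suits small groups of analysts because it makes no assumption that the outcome scores are normally distributed.
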
“Narrowing” Search Tactics
Conflict and Corroboration
• 10 NASIC experts, 2 novices doing Ariane 501:
  – Trust “key” documents
  – Mixed on whether explicitly search for conflicting assessments (high level only, not on details)
  – Mixed on whether reference sources (considered unprofessional to put multiple explanations in analysis document)
  – (When noticed) break ties based on “quality” attributes:
    • Language: Technical expertise, translation, biased interpretation, “facts” vs implications, past vs future, consensus vs multiple interpretations, uncertainty
    • Source: reason for deception, access to privileged information, “trustworthiness” of source

Conflict and Corroboration
• Individual strategies vary - little electronic support
  – Search more documents to break ties
  – Check if multiple reports were from the same press release
  – Look to see whether corrections were made later
  – Look for how things are generally done to see if different
  – Use highlighter pens for all documents on that topic
  – Print out documents and highlight one topic per color from independent sources
  – Highlight “loose ends” phrases in Word with colored font
  – Highlight when data comes from the same original source
  – Tracking reference information for discrepant information
  – Ask expert in the area
  – Sort printed documents into categories (one paragraph each) and then review for differences of opinion
  – Pick “best” source and cut/paste

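One of the manual checks listed above, whether multiple reports came from the same press release, is the kind of step that simple electronic support could assist. The sketch below is one possible approach, not something the analysts used: it compares reports by the overlap of their three-word sequences (Jaccard similarity on word shingles). The function names and the 0.5 threshold are assumptions chosen for illustration.

```python
import re


def shingles(text, size=3):
    """Return the set of consecutive word sequences of the given size."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def likely_same_origin(reports, threshold=0.5):
    """Return (id, id, score) for report pairs whose overlap meets the threshold.

    `reports` maps a report id to its text. The threshold is a hypothetical
    starting point and would need tuning on real documents.
    """
    ids = sorted(reports)
    sets = {rid: shingles(reports[rid]) for rid in ids}
    pairs = []
    for i, left in enumerate(ids):
        for right in ids[i + 1:]:
            score = jaccard(sets[left], sets[right])
            if score >= threshold:
                pairs.append((left, right, round(score, 2)))
    return pairs
```

A flagged pair only suggests shared wording. The analyst still has to judge whether the two reports are independent confirmations or one original source repeated.
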
Conflict and Corroboration
• NSA interviews
  – NSA novice: I compare it across multiple sources…gather as much data as possible, and rely on my mentor for feedback/guidance.
  – NSA expert: dealing with contraindicating facts…have to dig hard to deconflict…the weight I put into the source affects how I deconflict facts…aware of ultimate source (creeping validity) problem – in regards to multiple instances of essentially the same data…hard to tell when a cited source has been updated…
  – NSA expert: never take one source’s word…corroborate; want unaffected contributions – sigint, imageint, etc.; go from different data sources…reports can look like multiple sources of information, if there’s no serial number…go outside my document frequently to verify things, and will say I spoke to experts in X shop and they agree that Y is the case…not just citing reports, but saying talked to actual person or office
  – NSA novice: have electronic notes of who said what, reference the document, build all together in one big notes page…write down different POVs

Conflict and Corroboration
• NASIC critiquing interviews
  – Talk to other analysts to discuss the problem
  – The source that information comes from is very important; it loses validity if it is 2nd or 3rd hand information
  – It's necessary to corroborate information, might not use if only in one source.
  – Be aware of directed sources, where they only put in what they want you to believe
  – Need multiple sources to confirm data
  – Talk to other people to get their take
  – Reports six months or so after an event (depending on the event) probably have more accurate information than those immediately around event
  – Take open source with a grain of salt - might be on soap box, misled themselves, intentionally misleading audience
  – Human sources have to have direct knowledge for credibility

Reasons for Inaccurate Statements
1. Relying on default assumptions
2. Repeating inaccurate information
3. Missing updates that overturn analyses

Recommend
More recommend