[Bracketing] the Black Swan in Intelligence Analysis (Part I)

This article is part of a recurring series by Steve Coulthart, a PhD candidate at the Graduate School of Public and International Affairs at the University of Pittsburgh. If you have any questions or comments, feel free to contact him at SJC62@pitt.edu.

Intelligence analysts have a ‘black swan problem,’ or, if you want to be more academic, the ‘problem of induction.’ The problem of induction is the conundrum of how we can know the future given only the experience we have today. In other words, at what point can we say we know what we know?

An example from Taleb’s well-known book on black swans helps to clarify. Take the turkey’s dilemma: for the first 1,000 days of its life the turkey is fed and treated well, and each day its expectation grows that the following day will be the same. Yet on day 1,001 that assumption is proven false and the turkey ends up as Thanksgiving dinner.
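To see the trap in miniature, here is a minimal Python sketch using Laplace’s rule of succession as the turkey’s estimator (my choice for illustration; Taleb does not specify one): inductive confidence climbs toward certainty right up until the day the pattern breaks.

```python
# Toy model of the turkey's induction, using Laplace's rule of
# succession: after n consecutive good days, the estimated probability
# that tomorrow is also good is (n + 1) / (n + 2).

def confidence_after(n_good_days):
    """Estimated P(fed tomorrow) after n_good_days of being fed."""
    return (n_good_days + 1) / (n_good_days + 2)

for day in (1, 10, 100, 1000):
    print(f"Day {day:>4}: P(fed tomorrow) = {confidence_after(day):.3f}")

# Day 1000 prints ~0.999, and day 1,001 is Thanksgiving.
```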

[Figure 1]

Now substitute an analyst for the turkey, one trying to forecast the next revolution, coup, or terrorist attack, and the task comes into focus. To avoid surprise, analysts attempt to foresee different possible outcomes. We can think of these outcomes as hypotheses about what could happen in the future. For example, in the case of the Syrian Civil War, several hypotheses are floating around: the Assad government wins, the stalemate lingers, the rebels win, etc.

The first step for analysts working on a forecasting task is to conjure up hypotheses from their own experience, intelligence reports, experts, etc. Most likely, the analyst will identify a few well-known hypotheses (such as the ones mentioned above). We know this because how well known a hypothesis is, measured by how often it is cited in discussion (e.g., in news articles or among experts), fits a power-law distribution. The practical result is that there is a set of core hypotheses almost everyone knows and a long tail of lesser-known ones (this is due to Zipf’s Law; check out the link for more information).
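To make the shape of that distribution concrete, here is a toy Python sketch with made-up counts, where the rank-r hypothesis is mentioned in proportion to 1/r, the classic Zipf pattern:

```python
# Toy illustration (with made-up counts) of a Zipf-like power law:
# the frequency of the rank-r hypothesis is proportional to 1/r.

counts = {f"hypothesis_{r}": round(1000 / r) for r in range(1, 21)}

total = sum(counts.values())
head = sum(list(counts.values())[:3])  # the few well-known hypotheses
print(f"Top 3 of 20 hypotheses account for {head / total:.0%} of mentions")
```

Roughly half of all mentions land on the top three hypotheses, which is exactly why brainstorming tends to surface the same familiar candidates.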

In his pioneering research in the policy analysis field, William Dunn found that the most cited or discussed hypotheses sit on the extreme left of the distribution, while the black swans sit on the extreme right. In intelligence analysis, these are the hypotheses that are often ignored until it is too late (e.g., the 9/11 attacks). In our Syria example, this could include something seemingly unlikely, such as an Iranian invasion of Syria.

[Figure 2]

What can analysts do to ‘reach out’ onto the tail? The common answer is to encourage analysts to think creatively and/or consider the complexity of the situation. To that end, analysts are trained in ‘imaginative’ structured analytic techniques that supposedly open their minds. The U.S. Intelligence Community’s tradecraft primer lists a few of these techniques, and Heuer and Pherson’s standard text includes several for hypothesis generation. Unfortunately, these techniques have a crucial weakness: there is no stopping rule.

What is a stopping rule? It is a criterion that tells the analyst when the search can end. Without one, the analyst, like the turkey in the example above, doesn’t know when he or she can stop considering new hypotheses, including a potential black swan waiting in the wings (no pun intended).

Consider a hypothetical group of analysts brainstorming the outcomes of the Syrian Civil War. At what point should they stop generating hypotheses? Perhaps they have identified our black swan of an Iranian invasion, but what now? Are they done? The common answer is to stop when it “feels right,” but as we know, cognitive biases can creep in, and what if the black swan is still lurking out on the tail?

One possible answer, yet to be discussed in the intelligence analysis literature, is boundary analysis, a method developed by Dunn. As the name implies, boundary analysis is a way to determine the analytic ‘boundaries’ of a problem, in this case the number of plausible hypotheses. The technique also addresses the stopping rule problem plaguing imaginative structured analytic techniques.

Here’s how it works:

The first step in boundary analysis is specifying the analytic problem, for example: “What are the likely outcomes of the Syrian Civil War?” Next, analysts sample data sources that contain hypotheses related to the analytic question; a common source is open source documents, such as news reports. Once the documents are compiled, they can be mined by coding each unique hypothesis.
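In code, the tail end of that process might look something like the sketch below; the document contents and hypothesis labels are hypothetical stand-ins, since in practice coding each hypothesis is a human judgment call.

```python
# Sketch of the coding step. Assume each sampled document has already
# been read and reduced to the hypotheses it mentions (hypothetical
# labels below; real coding is done by human analysts).

documents = [
    ["assad_wins", "stalemate"],
    ["stalemate", "rebels_win"],
    ["assad_wins", "stalemate", "negotiated_settlement"],
    # ... one entry per sampled news report, expert comment, etc.
]

unique_hypotheses = set()
for doc in documents:
    unique_hypotheses.update(doc)

print(sorted(unique_hypotheses))
```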

At first the list of hypotheses will grow rapidly with each document. However, the analyst will soon see something very puzzling: after the initial burst of new hypotheses, each successive document yields fewer and fewer new ones. This rapid leveling-off is due to Bradford’s Law.
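That leveling-off is what supplies the missing stopping rule. Here is one possible way to operationalize it in Python; the ‘patience’ threshold (stop after several consecutive documents with nothing new) is my own illustrative assumption, not part of Dunn’s specification.

```python
# One possible stopping rule: stop sampling once `patience` consecutive
# documents contribute no new hypotheses. The threshold is illustrative.

def find_boundary(documents, patience=5):
    seen = set()
    quiet_streak = 0
    for i, doc in enumerate(documents, start=1):
        new = set(doc) - seen
        seen |= set(doc)
        quiet_streak = 0 if new else quiet_streak + 1
        if quiet_streak >= patience:
            print(f"Stopping at document {i}: {len(seen)} hypotheses, "
                  f"none new in the last {patience} documents")
            break
    return seen
```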

[Figure 3]

An Example of Bradford’s Law: Citations

In 1934, the British mathematician and librarian Samuel Bradford was searching physics journals and found that approximately two dozen core journals contained the bulk of all academic citations in physics. Beyond these core journals, each subsequent journal provided a diminishing number of new citations. The same leveling-off effect of Bradford’s Law applies to hypotheses, and it provides a stopping point at which analysts know they have reviewed almost all known hypotheses.
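Bradford’s observation is often stated schematically: sources can be split into zones that each yield roughly the same number of relevant items, while the number of sources per zone grows geometrically (1 : n : n²). A toy sketch, with made-up numbers:

```python
# Bradford's Law in schematic form: each zone yields roughly the same
# share of relevant items, but requires geometrically more sources.
# The numbers below are made up for illustration.

core_journals, multiplier = 25, 5   # ~two dozen core journals; n = 5
for k in range(3):
    n_journals = core_journals * multiplier**k
    print(f"Zone {k + 1}: {n_journals} journals -> ~1/3 of citations")
```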

Returning to our power-law distribution of hypotheses, we can imagine that a boundary analysis might get us closer to finding the black swan. Still, boundary analysis is no panacea: at this point we simply do not know how well the technique performs at identifying possible black swans in intelligence analysis tasks.

[Figure 4]

Fortunately, how boundary analysis performs on intelligence analysis tasks is an empirical question that can be answered. In my next blog post I will present results from a research study applying boundary analysis to a ‘real world’ intelligence analysis problem.