In my last blog post I discussed whether it is possible to bracket, or identify, a ‘black swan,’ an extremely rare event with significant consequences. Trying to identify a black swan event is a pretty tall order, since these events are, by definition, highly unlikely. As I discussed in my blog entry last month, the challenge is to ‘reach out’ along the statistical distribution toward the unlikely hypotheses.
Research on knowledge systems suggests that the hypotheses most commonly identified by a group of experts sit on the extreme left of the distribution. In most analytic tasks, the most instrumental hypothesis probably lies here as well. For example, there are a few commonly discussed hypotheses for the outcome of the Syrian Civil War (e.g., the Assad regime wins, a stalemate, etc.). In the graph below, these hypotheses would fall in the green shaded region as H1, H2, and H3. In the case of black swan events, however, the relevant hypotheses are suggested less frequently and lie further out on the right. In the Syrian example, this might include Iran invading and achieving victory, out in the yellow shaded region.
Imaginative structured analytic techniques help analysts reach further out on this distribution, but some of these techniques have notable limitations. For example, brainstorming assumes equal participation among diverse group members, which runs contrary to common experience. Further, most of these techniques cannot tell analysts where they are on the distribution or, more importantly, when they have reached saturation and generated the bulk of plausible hypotheses. In a traditional brainstorming session, saturation is usually signaled by a lull in the conversation, once participants are satisfied that they have captured the likely hypotheses.
Boundary analysis, developed by William N. Dunn, is another way to generate hypotheses. The technique requires analysts to sample documents containing hypotheses (e.g., news reports) and write down each hypothesis. As an analyst records more hypotheses, he should observe the effect of Bradford’s Law: after a point, the number of new hypotheses gathered from each additional document drops precipitously. Since the hypotheses come from the documents rather than from the group itself, the technique may ameliorate some of the negative effects of group dynamics on hypothesis generation. Furthermore, one can simply expand the search to more documents to gain access to rarely cited hypotheses.
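The bookkeeping behind boundary analysis is simple enough to sketch in code. The Python below is a minimal illustration under stated assumptions, not Dunn’s own procedure: it assumes each document has already been coded by hand into a list of hypothesis labels, it treats two hypotheses as the same only when their labels match exactly, and the sample labels are invented. The hypothetical `boundary_curve` function reports, for each document in order, how many new hypotheses it contributed and the running total of distinct hypotheses; plotting that running total gives the Bradford-style curve that flattens out.

```python
from typing import Iterable, List, Set, Tuple


def boundary_curve(documents: Iterable[Iterable[str]]) -> List[Tuple[int, int]]:
    """For each coded document, in order, return (new hypotheses it adds, running distinct total)."""
    seen: Set[str] = set()
    curve: List[Tuple[int, int]] = []
    for hypotheses in documents:
        # Exact label match only; merging paraphrased hypotheses is left to the human coder.
        new = set(hypotheses) - seen
        seen |= new
        curve.append((len(new), len(seen)))
    return curve


# Hypothetical coded documents (invented labels), already sorted by publication date.
docs = [
    ["exam avoidance", "prankster"],
    ["prankster", "grudge against administration"],
    ["exam avoidance", "disgruntled employee"],
    ["prankster"],
    ["grudge against administration"],
    ["prankster", "exam avoidance"],
]

for i, (new, total) in enumerate(boundary_curve(docs), start=1):
    print(f"document {i}: {new} new, {total} distinct so far")
```

In this toy sample, the last three documents add nothing new, which is the flattening that Bradford’s Law predicts.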
Stopping Point of Bradford’s Law
For most analytic tasks, stopping at the “knee of the curve” (where the marginal frequency of new hypotheses levels off) will likely capture the correct hypothesis. For “black swan” events, however, we have no such defined rule. By definition, it would seem that a black swan should fall beyond the stopping point, but it is also entirely possible that the black swan really was foreseeable.
We simply don’t know.
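The knee is usually judged by eye from the cumulative curve. One way to turn it into an explicit rule, and this is our assumption rather than a standard definition, is to stop once a fixed number of consecutive documents (`window`) adds nothing new. The hypothetical `knee_stopping_point` function below takes per-document counts of new hypotheses and returns the last document that contributed something before that quiet run began.

```python
from typing import List, Optional


def knee_stopping_point(new_counts: List[int], window: int = 3) -> Optional[int]:
    """Return the 1-based index of the last document that added a hypothesis
    before `window` consecutive documents added none.

    Returns None if no such quiet run occurs (keep reading), and 0 if the very
    first `window` documents add nothing.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    quiet_run = 0
    for i, new in enumerate(new_counts):
        if new == 0:
            quiet_run += 1
            if quiet_run == window:
                # i is 0-based; the last productive document is at 0-based index
                # i - window, which is 1-based index i - window + 1.
                return i - window + 1
        else:
            quiet_run = 0
    return None


# Hypothetical per-document counts of new hypotheses, in date order.
new_counts = [3, 2, 2, 1, 0, 1, 0, 0, 0, 1]
stop = knee_stopping_point(new_counts, window=3)
print(stop, sum(new_counts[:stop]), sum(new_counts))
```

With these invented counts, the rule stops after document 6, having captured 9 of the 10 hypotheses. The one hypothesis first seen in document 10 lies past the stopping point, which is exactly where a black swan would hide. The choice of `window` is arbitrary: a larger window trades extra reading for a better chance of reaching the tail.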
To find out whether boundary analysis can bracket a black swan, I teamed up with my colleague Jay Rickabaugh to apply the technique retrospectively to a ‘real world’ intelligence analysis task: the 2012 University of Pittsburgh bomb threats.
The Pitt Bomb Threats
Over the course of ten weeks in the spring of 2012, the University of Pittsburgh received approximately 140 bomb threats. While the threats took a variety of forms, beginning with threats scrawled in campus restrooms, the most persistent and numerous came from emails sent through a remailer, which masked the perpetrator’s location. Further confounding the investigation were copycat actions, false accusations, and others seeking publicity by capitalizing on the chaos. The swarming of these threats set this case apart from a traditional bomb scare and made a black swan explanation seem more plausible.
During the multi-agency investigation, several leads were pursued, but each led to a dead end. Finally, on April 19th, after weeks of threats that cost the University of Pittsburgh more than $300,000 in direct costs alone, the University met one threatener’s demand to rescind a $50,000 reward, and the emailed threats stopped immediately afterward.
In mid-August, after a months-long investigation, authorities held a press conference to announce that they were charging Adam Busby, a 64-year-old Scottish nationalist involved with the Scottish National Liberation Army (SNLA), in connection with the emailed threats. The result was stunning, and it was best summed up by Andrew Fournaridis, administrator of a blog created during the bomb threats, who wrote:
“This is the mind-bending stuff intelligence analysts must deal with on a daily basis, especially in the 21st century cyber-crime era.”
To this day authorities have never divulged Busby’s motivation.
The question is: will boundary analysis find the black swan before the stopping rule?
Using Boundary Analysis & Findings
For our analysis, we used open source documents from two local newspapers (the Pittsburgh Post-Gazette and the Pittsburgh Tribune-Review) and blog postings from www.stopthepittbombthreats.blogspot.com, a major platform for crowd-sourcing during the threats. After compiling all the sources, we had more than 130 news articles and numerous blog posts dating from January 1, 2012 to August 30, 2012.
Articles that did not contain useful information (e.g. articles about how students coped with threats) were omitted, leaving us with 73 articles that we coded by date in an Excel spreadsheet. Next, each article was scrutinized for hypotheses, a process that took a single coder approximately 8-10 hours.
Our boundary analysis of the bomb threats yields two findings:
- Boundary analysis identified the ‘usual suspects’ quickly
In conducting our retrospective boundary analysis, we quickly reached our stopping point. In fact, within roughly one month, from March to April, almost all of our hypotheses had been identified in our documents (see graph). These early hypotheses included typical explanations such as students avoiding exams, students in conflict with the university administration, pranksters, etc.
The ability of boundary analysis to locate the main hypotheses quickly may also be helpful when combined with hypothesis-testing techniques. For example, once the analyst has extracted the most common hypotheses, he can begin testing each one with a diagnostic technique (for example, Analysis of Competing Hypotheses) and move further out on the distribution as needed.
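To illustrate the hand-off to a diagnostic technique, here is a minimal sketch of the scoring step in Richards Heuer’s Analysis of Competing Hypotheses: each piece of evidence is rated against each hypothesis, and hypotheses are ranked by how much evidence is inconsistent with them, since ACH emphasizes disconfirmation. The hypotheses echo the ‘usual suspects’ above, but the evidence items `E1` to `E3` and their ratings are invented purely for illustration, and the unweighted count is a simplification of the full method.

```python
from typing import Dict, List, Tuple

# Heuer-style ratings: "C" = consistent, "I" = inconsistent, "N" = not applicable.
Matrix = Dict[str, Dict[str, str]]  # evidence item -> hypothesis -> rating


def rank_by_inconsistency(matrix: Matrix, hypotheses: List[str]) -> List[Tuple[str, int]]:
    """Rank hypotheses by the number of evidence items rated inconsistent, fewest first."""
    scores = {h: 0 for h in hypotheses}
    for ratings in matrix.values():
        for h in hypotheses:
            if ratings.get(h) == "I":
                scores[h] += 1
    # sorted() is stable, so tied hypotheses keep the order in which they were listed.
    return sorted(scores.items(), key=lambda item: item[1])


hypotheses = ["exam avoidance", "grudge against administration", "prankster"]
# Invented evidence items and ratings, for illustration only.
matrix: Matrix = {
    "E1": {"exam avoidance": "I", "grudge against administration": "C", "prankster": "C"},
    "E2": {"exam avoidance": "I", "grudge against administration": "I", "prankster": "C"},
    "E3": {"exam avoidance": "C", "grudge against administration": "N", "prankster": "I"},
}

for h, inconsistent in rank_by_inconsistency(matrix, hypotheses):
    print(f"{h}: {inconsistent} inconsistent item(s)")
```

A low count means a hypothesis is the least refuted, not that it is confirmed. If every usual suspect accumulates inconsistencies, that is a cue to move further right on the distribution.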
- The normal stopping rule did not bracket the black swan hypothesis
After examining our three data sources, we found that the correct hypothesis, a foreign national from the UK pranking the University, was not identified in the documents. However, we stopped our analysis at the stopping point, or “knee of the curve.” We do not have enough information to suggest what a good limit would be, but applying the same principles to more black swan intelligence cases (the DC Sniper, Eric Rudolph, etc.) would give us a better indication. With more research, we can begin to identify how far past the knee one would need to search to be reasonably confident that the black swans have been identified. That way, when unanticipated or abnormal events begin to occur, we will not be applying ordinary methods to unique circumstances.
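One way to frame the “how far past the knee” question for future retrospective studies is sketched below. It assumes the coded documents are in date order and that the knee has already been chosen. The `extension_factor` measure (the position of the correct hypothesis’s first appearance divided by the knee position) is our own hypothetical metric, not an established one, and the sample documents are invented. A factor at or below 1.0 means the stopping point bracketed the hypothesis; `None` means it never appeared in the sampled documents at all, as happened in the Pitt case.

```python
from typing import Optional, Sequence


def first_appearance(documents: Sequence[Sequence[str]], hypothesis: str) -> Optional[int]:
    """1-based index of the first document that records `hypothesis`, or None if absent."""
    for i, hyps in enumerate(documents, start=1):
        if hypothesis in hyps:
            return i
    return None


def extension_factor(documents: Sequence[Sequence[str]], hypothesis: str, knee: int) -> Optional[float]:
    """Hypothetical metric: first appearance of `hypothesis` divided by the knee position.

    <= 1.0: bracketed by the stopping point; > 1.0: found only by reading past the knee;
    None: never found in the sampled documents.
    """
    first = first_appearance(documents, hypothesis)
    return None if first is None else first / knee


# Invented documents in date order, with the knee assumed to be at document 2.
docs = [
    ["prankster"],
    ["prankster", "grudge against administration"],
    ["grudge against administration"],
    ["prankster"],
    ["overseas hoaxer"],
]

print(extension_factor(docs, "prankster", knee=2))
print(extension_factor(docs, "overseas hoaxer", knee=2))
print(extension_factor(docs, "insider", knee=2))
```

Computing this factor across several cases and looking at its spread would suggest how much extra reading past the knee is needed to catch most black swans.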
While we were unable to bracket the black swan using traditional limits, the two findings have important implications for intelligence analysis. The greatest benefit of boundary analysis may be that it gives analysts a list of ‘usual suspect’ hypotheses. Analysts can then use diagnostic techniques to whittle down the number of plausible hypotheses. If none of these usual hypotheses holds up, the analyst can keep moving to the right of the distribution by extending the boundary analysis or by employing an imaginative technique. As noted above, one area for future research is applying boundary analysis retrospectively to more cases to determine whether there is a stopping rule that will catch most black swans.