In calling for the improvement of intelligence analysis, other professions, such as medicine and law, have been invoked. But what about that most quintessentially American of all sports: baseball? In the story of Moneyball we find a modern-day David versus Goliath story that has been used to challenge thinking in fields as diverse as business, medicine, and even corrections. As with these other fields, the lessons of Moneyball can transform intelligence analysis.
Moneyball is the story of how, in 2002, the cash-strapped Oakland A’s management revolutionized baseball. In the film version of the story, A’s general manager Billy Beane, played by Brad Pitt, sums up the difficult position of his club: “There are rich teams, and there are poor teams. Then there’s 50 feet of crap. And then there’s us.” And Beane wasn’t exaggerating; the A’s $30 million budget was a mere fraction of that of powerhouse clubs such as the New York Yankees (roughly $130 million).
2003 Major League Budgets by Team
Graph Source: http://en.wikipedia.org/wiki/Moneyball
Taking an unorthodox view of both recruitment and traditional baseball statistics, the A’s front office imported sophisticated statistical analysis to build a highly competitive team on a shoestring budget. With their revolutionary approach the A’s won 20 consecutive games, an American League record, and to this day they remain among the most efficient teams, purchasing a win in the 2013 season for $631,000 while big-payroll clubs such as the Yankees paid $2.6 million per win.
But if you dig deeper, the real message of Moneyball isn’t about statistics, or even money, for that matter: it is really about improving a profession by challenging orthodoxy with novel ideas.
Experience is useful, but it’s no crystal ball
Consider the role of experience in baseball recruiting. Traditionally, baseball recruiting is a subjective art practiced by experienced scouts. According to established baseball thinking:
you found a big league ballplayer by driving sixty thousand miles, staying in a hundred crappy motels, and eating god knows how many meals at Denny’s all so you could watch 200 high school and college baseball games inside of four months….Most of your worth derived from your membership in the fraternity of old scouts who did this for a living… (Lewis, p. 37)
After staying in a ‘hundred crappy motels’ and eating all those Denny’s meals, the scouts relied on their experience to select their top picks on the basis of a player’s appearance and the anecdotal information they knew about him, an approach that led some players to be vastly overvalued and others undervalued. Players who didn’t “look” the part of a major leaguer or who didn’t have the right back story were consistently passed up, something the A’s capitalized on to buy the best team for their buck. Take, for example, Chad Bradford and his unusual submarine pitch. While Bradford ended up a staple relief pitcher for the A’s, he was so overlooked by scouts that he began his pitching career at a community college.
Chad Bradford delivering his unique submarine pitch
But Bradford’s case could be a ‘black swan,’ and you could even reason that it is OK to miss the occasional submarine pitcher. The problem, however, isn’t just about ‘unique’ ballplayers. Consider the common scouting task of selecting the superior hitter. Look at the picture below of Pittsburgh Pirates slugger Neil Walker. Can you guess what his 2013 batting average was?
Guess Neil Walker’s Batting Average
If you know baseball you might come close, because you know roughly how good Walker is relative to other standout hitters. If you never watch baseball, however, I am willing to bet you’d be wildly off (for those of you keeping score at home, Neil Walker batted .280). Now, if you were a scout, you might come a little closer than the average fan, but you probably wouldn’t guess much differently.
A meta-analysis of clinical judgment in medicine suggests that experts (e.g., baseball scouts) slightly outperform well-informed observers (e.g., baseball fans). However, as the complexity of the judgment task increases and the opportunities to learn go down, as is the case with most intelligence tasks, experts will not do much better than well-informed observers. In foreign affairs forecasting, Tetlock found a similar dynamic: the returns on expertise diminished rapidly.
The Diminishing Returns of Expertise
In one study I asked participants to judge the extent to which the Assad regime would comply with a UN resolution to remove and destroy all declared weapons before June 30th, 2014. The participants included 24 graduate students in a security studies program, representing well-informed observers, along with 16 members of the International Association for Intelligence Education (IAFIE) and 5 analysts from the IC, representing experts, for a total of 45 participants.
This modest experiment seems to reflect the diminishing-returns-of-experience argument: the grad students’ average estimate was 62%, versus 51% for the IAFIE members and IC analysts. Whether one group came closer to the actual percentage of weapons destroyed is a more complex matter for another discussion, but there is at least no large difference between the two groups, despite the presumably large experience gap.
Estimates of Percentage of Weapons Destroyed & Removed
Since such estimative judgments are but one part of the analyst’s job, I also asked each participant for their rationale. Like the estimates, the rationales were similar but not the same. Below are frequency distributions of each group’s cited reasons for their judgments. Both groups identified the reluctance of the Assad regime, interference from the civil war, and the difficult timeline as their main reasons, although they prioritized the first two justifications differently. The one substantial difference I found was that the IAFIE and analyst group noted the importance of the weapons falling into a third party’s hands (e.g., a rebel group).
Graduate Student Justifications
IAFIE and Analyst Justifications
The point of this lesson and the findings of the study is not that experience doesn’t matter, but that it can only take us so far: the difference between well-informed observers and experts is not large. In light of this, we need to keep developing new methodologies and techniques that can supplement, not supplant, the experience of analysts. But if analytic methodologies and techniques are to be created, we would do well to heed the next Moneyball lesson.
Don’t assume the established way of thinking or doing something is right
Realizing the limitations of the scouts, the A’s turned to sabermetrics, the statistical analysis of baseball, but they didn’t just assume the numbers would save them. In fact, the A’s scrutinized many of the standard baseball statistics and found some were just as biased as traditional scouting.
Take, for example, the fielding error, which occurs when a fielder misplays a ball such that a runner from the opposing team can advance. The statistic was devised in the early years of baseball to account for how barehanded players fielded (baseball gloves weren’t common until the 1890s). A century later, baseball statisticians began noticing that the fielding error was misleading: an error could only be charged if a player made an attempt on the ball in the first place, punishing those who tried for the ball and rewarding those who either avoided it or couldn’t reach it in time.
In short, fielding error statistics “weren’t just inadequate; they lied” (Lewis, p. 67).
The result of the misleading error statistic was that many players were passed up, and teams relying on errors as a measure were mismanaging how they appraised their defense.
The Fielding Error made more sense in the time of rough fields & bare hands
Similar mistakes can be made in intelligence analysis. Consider, for example, social network analysis (SNA), the analysis of links between people, often used in intelligence to study terrorist and criminal groups. With the rise of SNA tools and ‘big data,’ analysts increasingly rely on SNA and associated statistics such as degree centrality, a simple measure of how well connected a person is in a network. However, relying on this statistic to determine influence in a network can be as troublesome as relying on the error statistic to determine fielding skill.
Consider the case, presented by Bienenstock and Salwen (forthcoming), of Abu Zubaida. While Zubaida was identified as the number 3 in al Qaeda by U.S. leadership, he was later found to be a low-level operative. Yet he was heavily connected in the network because of his role as a courier, and therefore would have had a high degree centrality score. Below is a sociogram from Marc Sageman’s well-known global violent Salafist data, linking al Qaeda to other violent Salafist plots, with Zubaida near the center of the graph, close to bin Laden.
Al Qaeda-Madrid-Singapore Network (source: Sageman et al)
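To see how a courier can dominate a connectivity ranking, here is a minimal sketch in plain Python. The network, the node names, and the edge list are entirely hypothetical, invented for illustration; they are not drawn from Sageman’s data or any real case.

```python
from collections import defaultdict

# Hypothetical edge list: who communicates with whom. The "courier"
# relays messages between two cells but commands no one.
edges = [
    ("leader_a", "member1"), ("leader_a", "member2"),
    ("leader_b", "member3"), ("leader_b", "member4"),
    ("courier", "leader_a"), ("courier", "leader_b"),
    ("courier", "member1"), ("courier", "member3"),
]

# Count each node's ties (undirected degree).
degree = defaultdict(int)
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

# Degree centrality: degree divided by (n - 1) possible ties.
n = len(degree)
centrality = {node: d / (n - 1) for node, d in degree.items()}
ranked = sorted(centrality, key=centrality.get, reverse=True)

print(ranked[0])  # the courier tops the ranking despite no command role
```

In this toy graph the courier holds more ties than either cell leader, so degree centrality flags him as the most ‘important’ node, which is precisely the kind of misreading the Zubaida case illustrates.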
I witnessed firsthand how network statistics can be misleading while working with Mike Kenney on a project that used SNA to map al-Muhajiroun, an Islamist extremist group in the UK. To build our network we used a massive dataset of news reports and automated data-extraction tools similar to what the IC uses, but what was novel was that we cross-validated our network statistics with in-depth field research. Over the course of two years, Mike visited the UK several times, interviewing 86 people within al-Muhajiroun, from the top leaders down to rank-and-file members.
When we attempted to cross-validate our networks against hours of interview recordings, we found something pretty surprising: the standard network statistics, by themselves, were incredibly misleading. For example, our SNA ranked artificially high some individuals who had no operational ties to the network, such as Osama bin Laden (yellow). Others, such as Salahuddin Amin (green), were ranked high because they were well known in the British media, but they certainly weren’t ranking leaders of the network, as the SNA would imply.
Betweenness Centrality in al-Muhajiroun
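The same distortion can be reproduced with betweenness centrality, which scores a node by how many shortest paths pass through it. Below is a brute-force sketch (fine for toy graphs, far too slow for real datasets) on an entirely hypothetical network in which a media ‘spokesman’ bridges journalists to the group; all names and ties are invented for illustration and are not taken from the al-Muhajiroun data.

```python
from collections import deque
from itertools import combinations

# Hypothetical ties: news coverage links a "spokesman" to reporters,
# inflating his centrality even though those ties are not operational.
edges = [
    ("emir", "cell_lead"), ("cell_lead", "fighter1"), ("cell_lead", "fighter2"),
    ("emir", "spokesman"),
    ("spokesman", "reporter1"), ("spokesman", "reporter2"), ("spokesman", "reporter3"),
]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def shortest_paths(s, t):
    """Enumerate all shortest paths from s to t via breadth-first search."""
    paths, best = [], None
    queue = deque([[s]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break                      # past the shortest length: stop
        node = path[-1]
        if node == t:
            best = len(path)
            paths.append(path)
            continue
        for nxt in graph[node]:
            if nxt not in path:        # no revisits (avoids cycles)
                queue.append(path + [nxt])
    return paths

# Betweenness: for each node pair, credit the interior nodes of each
# shortest path with 1/(number of shortest paths for that pair).
betweenness = {v: 0.0 for v in graph}
for s, t in combinations(graph, 2):
    paths = shortest_paths(s, t)
    for p in paths:
        for v in p[1:-1]:
            betweenness[v] += 1 / len(paths)

print(max(betweenness, key=betweenness.get))  # prints "spokesman"
```

Because every path between the reporters and the rest of the network runs through the spokesman, he outranks the operational leadership in this toy graph, mirroring how media-driven ties can inflate an actor’s apparent importance in an SNA built from news reports.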
The moral of the fielding error, Zubaida, and al-Muhajiroun stories is not that these statistics have no value, but that we must make a conscious effort to determine whether a particular way of thinking or method actually works. Evaluating some of these practices may be difficult, but it is entirely possible with additional effort.
Looking to the outside for innovation
If the Moneyball revolution had an ideological father, it would certainly be Bill James. It was his mistrust of experienced judgment and his concerns about traditional baseball statistics, such as the fielding error, that led the A’s management to storm the Bastille of professional baseball. Yet James was an ‘outsider’ in the purest sense of the word; he penned his first tract on baseball analysis while working as a night watchman at a bean factory.
Beginning with his self-published books in the 1970s, James built a cult following among computer and stats nerds, but even as his circle grew larger and larger, his message fell on deaf ears among the men who ran professional baseball. This all changed once the A’s front office put James’ ideas into practice, drawing the ire of the traditional ‘baseball men.’ Still, even against strong resistance, the A’s were able to inaugurate the Moneyball revolution with a combination of outsider innovation and insider know-how.
Bill James on the cover of a 1981 issue of Sports Illustrated
A similar dynamic is underway in intelligence analysis, as many post-9/11 programs have opened numerous channels for new ideas, such as the Intelligence Advanced Research Projects Activity’s (IARPA) grants and the Centers of Academic Excellence program. In short, it would seem inroads are being made to bring in more outsiders.
Still, there are few places and opportunities for those interested in implementing the core message of Moneyball in intelligence analysis. For my own part, I am trying to validate some of the structured analytic techniques promoted after 9/11, but I have faced an endless set of institutional and cultural barriers. As a young intelligence studies researcher, I am caught between a rock and a hard place: my research subject is unfamiliar to an academic audience, and some practitioners are distrustful of applied social science research.
It would seem we need not just an institutional shift but also a cultural shift to bring in new ideas. There will certainly be resistance at first, but if the objective is to improve the profession, the lessons of Moneyball, recognizing the limitations of experience and questioning ‘what works,’ can take us a long way.