Expert Wrongness and the Streetlight Effect

As the measurement of marketing effectiveness has shifted from something of a black art toward an increasing reliance on analytics, the field has opened itself up to some new and considerable risks of providing misleading guidance to marketers. These dangers stem not merely from problems unique to marketing analytics, but from booby traps that are inherent in the process of trying to wring useful understanding from measurement.

I came across these risks during the three years I spent researching where experts go wrong, particularly scientists, but also economists, consultants, managers, pundits and others. The simple answer to where they go wrong: everywhere. They ask the wrong questions, mismeasure, misanalyze, cheat, collaborate counterproductively, circumvent safeguards, play to the crowds, and ignore contradictory evidence, among many other missteps. In fact, so many forces align to pull experts in the wrong direction that it’s harder to explain how experts sometimes manage to get it right.

How much wrongness do these problems cause? That isn’t easily measured in any consistent and useful way, but here’s one indication: Scientists who study the reliability of medical research findings have carefully calculated that about two-thirds of the findings in the most prestigious research journals are ultimately revealed within a few years to be significantly or even completely wrong, or to heavily exaggerate the intensity of an effect. There is evidence that wrongness rates are worse in most other fields.

A 1992 study by economists at Harvard and elsewhere analyzed dozens of papers from a range of economics journals, and calculated that the “rightness” rate of the papers was approximately zero.

What can we do about it? A good start is to identify the ways in which experts slip up. And among the many factors that contribute to wrongness, the one that stands out as being especially ubiquitous and pernicious is the streetlight effect, as I’ve taken to calling it. This effect takes its name from the old joke about a police officer coming across a drunk man late at night who says he’s searching for his car keys, which he dropped across the street. When the cop asks him why he isn’t across the street searching for them, the drunk man explains the light’s better on this side.

We badly want hard data on which to base our decisions. Some 130 years ago, Lord Kelvin put it this way: “When you can measure what you are speaking about, and express it in numbers, you know something about it.” Unfortunately, the numbers we really want are rarely available. How do you directly and reliably measure the impact of a website change on the intention of a visitor to return, or to buy? Or the goodwill and brand recognition that a particular facet of a campaign generates? Until mind-reading devices come along, good luck with that.

What we do, of course, is turn to “surrogate” measures that we argue are well correlated with what we’re really after. We can track online behavior in any number of ways, and supplement with self-reported data from surveys. But surrogate measures almost always throw us off to some extent. Albert Einstein was bitten by this problem in 1915 when, unable to directly measure the forces on an electron in order to confirm an important new theoretical prediction of his, he set up an experiment to measure the forces indirectly via a twisting iron bar. He confirmed his prediction, all right, and with great precision. Unfortunately, his prediction turned out to be off by a factor of two; the surrogate measurement had merely served to cement the error. If Einstein was tripped up by a measurement on a twisting bar, how likely is anyone to nail precise, accurate insights into human decision-making via simple measurements of a few easily observable online behaviors?

We think we can get around these problems simply by being more judicious and careful in our measurements. And that can help, of course, but it won’t solve the problem. In medicine, for example, researchers hold up as a gold standard the randomized controlled trial, in which results from two or more groups are compared in order to isolate the effects of changing just one variable. But analysis of these studies has shown that even the very best are wrong at least a quarter of the time. And only a tiny percentage of studies are nearly as carefully executed as these are.

The more marketing has access to and relies on hard measurement, the bigger the dangers from the streetlight effect become. The simple reason is that when dealing with well-recognized vagaries, as in, say, political and sports predictions, our guard is up, our skepticism is high, and we rely on our judgment and the judgment of others to guide us in assigning trustworthiness to a pronouncement. But when we’re dealing with what seems to be a precise, clear, straightforward measurement that yields hard, analyzable data, we’re much more likely to swallow the resulting conclusions, blinded to where the process might have veered off the rails. As one science historian told me about the difficulty of recognizing when scientists are mismeasuring, “You find out 80 years later who was right.” Marketers aren’t likely to have that much better a track record than scientists.

I’m often asked what experts are supposed to do to fix the problems caused by the streetlight effect and other measurement missteps. But that’s really the wrong question. You can’t fix it. It’s woven into the fabric of measurement and the way the world works. The right question is: How do I learn to live with it in the least damaging way possible? It’s not necessarily about making better measurements or performing more careful analyses of the data. It’s often about adding in judgment, the very thing we’re often trying to remove from the equation. We need to remain questioning, skeptical, dubious, and wary of our conclusions.

Many marketing and other experts are afraid to admit to these sorts of qualifications and concerns. But in the long run this sort of self-questioning candor is usually appreciated. Take a tip from Einstein, who was nothing if not humble about his own fallibility. He put it this way: “If we knew what we were doing, it wouldn’t be called research, would it?”