In the mid-1990s, I attended a talk by Professor Tom Tyler. Among justice researchers, Dr. Tyler is something of a legend, having done much of the seminal work on procedural fairness. He’s also an engaging speaker, and this talk was no exception. On this particular day, Professor Tyler was discussing one of his recent studies. His results showed strong procedural justice effects in actual legal cases, even when the outcomes were important. Near the end of the talk, Dr. Tyler summarized the importance of his study by observing that skeptics tended to question the efficacy of procedural justice in the “real world” when the stakes were high. His findings, of course, spoke directly to this concern.

This would have been good news for almost any behavioral scientist, but I suspect it held a special place in the hearts of us justice scholars. Dr. Tyler was describing the prevailing concerns with external validity, which haunted our work in those early days. Much of the original research, though by no means all, had been conducted in laboratory settings. These employed artificial tasks and put little at stake for the participant. Procedures, so the skeptical argument went, would only appear important in these ersatz settings. In real life the fair process effect would be overpowered by other stimuli, such as the size of the outcomes. Professor Tyler knew of these external validity arguments, and so he (and others) directly confronted them. Thanks to this effort, we don’t hear these arguments so much nowadays, but they were prevalent back in the day and are worth revisiting.

I say this not because those arguments were especially compelling. As it turned out, they were not. Rather, I want to re-open the discussion because, even at the time, it struck me that our detractors never fully grasped their own argument. Of course, procedural justice effects are not equally strong in every situation and for every person. For instance, there is good evidence that perceptions of process fairness are egocentrically biased. But these sorts of observations should serve as a call for further investigation, not as an attempt to “wave away” a demonstrable psychological phenomenon. If we understand the nature of external validity, then we see why it was appropriate to explore the causal effects of procedural justice in the lab and, as these effects were better understood, to test them in field settings. Put differently, we first need to have some understanding of the theoretical relationships, and then we can systematically attempt to generalize them to other settings. With this in mind, let’s take a look at external validity.

The earliest mention of “external validity,” at least of which I am aware, was by Donald Campbell in a 1957 article for Psychological Bulletin. Modern approaches follow from Campbell’s work, though the precise definitions of external validity vary somewhat. In general, though, we can say that something is “externally valid” when an empirical inference, typically concerning a cause-effect relationship, can be generalized across people, situations, and procedures. That is, we are usually asking if an effect that holds for one group of people (or in one situation) will hold when it is considered in a second group of people (or a second situation). For a more detailed discussion, I recommend Lucas, 2003, Sociological Theory. As always, I am simplifying.

In our field, external validity is important for theory testing conducted by basic researchers, but it has a special significance for applied behavioral scientists, such as clinical psychologists and industrial-organizational psychologists. All of us, but especially those involved with a scientifically-based practice, will wish to know if an effect can be generalized from the proverbial undergraduate psychology laboratory to the “real world” and thereby make a practical difference in people’s lives. By this reckoning, external validity should be understood as more than a dichotomous “yes/no” question. It is not (only) whether the effect continues to be statistically significant but also the size of the effect that is critical. For instance, a clinical psychologist might examine the effectiveness of a new cognitive behavioral technique in her laboratory. If the technique works well in this controlled setting, it remains an open question whether it will bring the same (hopefully sizable) benefits to actual clients. The very claim that one is an applied behavioral scientist, and not some other sort of practitioner, is predicated on the notion that the science of human behavior can be generalized across settings and persons. If our knowledge cannot be generalized, and if every case is therefore distinct from every other case, then one cannot deploy the organized body of knowledge that comes with being a scientist.

For these reasons, external validity is a serious matter. Though it could potentially concern any research design, the biggest worries are with laboratory experiments. These settings are structured and often artificial. As such, laboratory studies tend to lack “ecological validity.” That is, they often do not share many seemingly relevant features with the real-world settings to which we hope their findings generalize. For example, social psychologists often study college undergraduates (rather than more diverse populations) by placing them in unusual situations (which they would not otherwise encounter) and rewarding them with academic credit (worthless in other life domains). Intuitively, it strikes many observers as odd that these deliberately constructed and unusual settings should provide models for behavior elsewhere. Can you blame them? If we can’t address these important concerns, then we have a problem.

There are basically two closely related responses to worries about the external validity of laboratory studies. The first is philosophical and indirect; the second is empirical and direct. The empirical approach has the more intuitive appeal, but it follows closely from the philosophical arguments. For this reason, the philosophical approach, though it takes a bit more work, is probably the more critical consideration.

Given all of this, let’s begin with the philosophical response to concerns about external validity. A famous argument was made in 1982 by Berkowitz and Donnerstein in American Psychologist. In their well-known paper, Berkowitz and Donnerstein distinguished between “psychological realism” and “mundane realism.” The former refers to whether the relevant psychological states are engendered in the subject population. The latter refers to whether the particular features of the research setting correspond to those in the setting to which one hopes to generalize. It is psychological realism that a researcher hopes to create and, given subsequent work, to later generalize. Whether or not the concrete elements of the situation resemble a target setting (the ecological validity) is a secondary consideration for theory testing, though it is important for indexing the size of population effects. In their words, it is “the meaning that Ss [subjects] assign to the laboratory setting” that impacts external validity. To state the matter differently, Berkowitz and Donnerstein grant that the situation a person faces in a psychology lab will differ from the situation a person faces at, say, their workplace. However, the psychological phenomenon could be the same, even if the particular attributes of the setting are otherwise distinct. It is this psychological phenomenon that is most relevant to the science of human behavior.

To illustrate Berkowitz and Donnerstein’s reasoning, let’s diagram how early procedural justice experiments were sometimes conducted. An undergraduate might enter a social psychological laboratory and work on a mock task for a small reward. Some participants would lose the reward through an unfair process. They would then be asked about their feelings, attitudes, potential behaviors, or some combination thereof. Clearly, this is an artificial setting. It lacks mundane realism because it does not correspond with anything that the individual is likely to encounter elsewhere. However, if it induces a sense that the reward was allocated unjustly (that is, the psychological realism of procedural injustice), then it remains a useful test of a causal theory. It is the theory, and not the particular attributes of the situation, that we hope to generalize. If individuals sometimes experience procedural injustice, and if they respond predictably to these feelings, then these cause-effect sequences (the theory) are likely to occur in other settings as well.

Probably my favorite rendition of this argument was made by another justice researcher, Dr. Jerry Greenberg. Dr. Greenberg’s analysis appeared in Academy of Management Review, the same year that Berkowitz and Donnerstein’s paper was published. Greenberg was responding to earlier and very powerful criticisms of laboratory research. His argument was nuanced and thorough, but for our present purposes we will focus on one important point. Greenberg maintained that scholars do laboratory research to test scientific theories. These psychological phenomena, not the idiosyncratic elements of the situation, are what we seek to generalize across situations or populations. In other words, the fact that a laboratory is dissimilar from, say, an actual organization is not necessarily a problem.

It might even be a strength. This is because a laboratory experiment needs to be structured in such a way that a theory can be examined. For this reason, the laboratory setting should be artificial, if this artificiality strengthens causal inferences (i.e., if it boosts internal validity). If the theory is supported, then it is this that should be examined in other settings for evidence of external validity. Moreover, valid causal theories are not unlikely to generalize outside the laboratory if they properly operationalize and test the psychological phenomena in question.

Notice that Greenberg, like Berkowitz and Donnerstein, emphasizes that laboratory studies are used to investigate theories. Strong tests of theories, which capture and explain psychological phenomena, do not require ecological validity. Rather, they require psychological realism. The phenomenon must be real (it must exist) in the minds of the participants. When a phenomenon has been shown to exist, then it can be extended into the real world.

Drs. Berkowitz, Donnerstein, and Greenberg have made compelling arguments, but are they correct? It is important to test their ideas directly, assessing whether effects observed in laboratory settings can also be found in field settings. There have been a number of papers exploring this possibility. Probably the best study on this topic was conducted by Anderson, Lindsay, and Bushman (1999, Current Directions in Psychological Science). Anderson and his colleagues reviewed the research literature to collect 38 pairs of laboratory and field studies. These examined various psychological phenomena, such as aggression, depression, self-efficacy, and others. Each of the observed effects was converted into a d (the standardized mean difference) to facilitate comparisons. As the authors put it, the results for “lab and field studies tended to be similar” (page 5). The overall correlation between the two sets of studies was about .73, which is quite respectable.
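For readers who like to see the arithmetic, here is a minimal sketch of the kind of comparison described above: convert each study's result to a standardized mean difference, then correlate the laboratory values with the field values. This is my own illustration and not the procedure Anderson and his colleagues actually used. The function names (`cohens_d`, `pearson_r`) are hypothetical, the pooled-standard-deviation form of d is an assumption on my part, and the effect sizes are invented placeholders rather than data from any study.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Standardized mean difference, using the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    covariation = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sum((x - mx) ** 2 for x in xs)
    spread_y = sum((y - my) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)


# Invented effect sizes for four paired phenomena (NOT real data).
# Each position holds the lab d and the field d for the same phenomenon.
lab_d = [0.2, 0.5, 0.8, 1.1]
field_d = [0.3, 0.4, 0.9, 1.0]

# A high positive r means phenomena with large lab effects also tend
# to show large field effects.
lab_field_r = pearson_r(lab_d, field_d)
print(f"lab-field correlation: {lab_field_r:.2f}")
```

Note that a high correlation only tells us that the ordering of effects is similar across settings. It does not by itself show that the lab and field effects are the same size, which is a separate question for applied work.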

There have also been meta-analyses that report promising findings for procedural justice. For instance, Cohen-Charash and Spector (2001, Organizational Behavior and Human Decision Processes) found that procedural justice relates to important criterion variables in field settings (e.g., counterproductive work behavior, organizational citizenship behavior). This is good news for justice researchers, but we should not become smug. Professors Cohen-Charash and Spector also found some differences between laboratory and field research. For instance, procedural justice showed stronger relationships to job performance among studies conducted in the field, but weaker relationships among studies conducted in the laboratory. We still have some theoretical and empirical homework to do, but we should be happy to know that procedural justice effects are not exclusive to the psychological lab.