The best introduction to the Yule-Simpson Paradox is an example based on an actual event.

Later we will review how the math works, how to detect this paradox, how to avoid it (as much as possible!), and ways in which this paradox can occur.

First, let’s look at one example.

## Yule-Simpson Paradox Example: University Hiring

To move toward gender balance among faculty members, a university set a goal to hire a larger percentage of women than men in two departments, History and Geography.

## Hiring Decisions Led to a Yule-Simpson Paradox

The History department hired 3 of 13 applicants: 2 of the 8 women, and 1 of the 5 men. So a higher percentage of women succeeded: 25% versus 20%.

The Geography department also met the goal by hiring 4 of the 5 female applicants, but only 6 of the 8 males. The percentages favored women, 80% to 75%.

However, the combined numbers showed that the university failed to achieve its target. There were 13 applicants of each gender, but only 6 women were hired, one fewer than the 7 men. The university’s overall rate of hiring women was 46%, lower than the 54% rate for male applicants.

How could that be possible? Our instincts say that, since the women’s success rates were better than the men’s in both departments, the women should also have the higher combined success rate. There’s the Yule-Simpson paradox in a nutshell.
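The arithmetic of the hiring example can be checked with a few lines of Python. This is only a sketch: the counts come from the example above, while the names `hiring`, `rate`, and `combined` are chosen here for illustration.

```python
from fractions import Fraction

# Counts taken from the hiring example: (hired, applicants).
hiring = {
    "History":   {"women": (2, 8), "men": (1, 5)},
    "Geography": {"women": (4, 5), "men": (6, 8)},
}

def rate(hired, applicants):
    """Exact hiring rate as a fraction."""
    return Fraction(hired, applicants)

def combined(gender):
    """Pool hires and applicants across departments, then take the rate."""
    hired = sum(counts[gender][0] for counts in hiring.values())
    applicants = sum(counts[gender][1] for counts in hiring.values())
    return rate(hired, applicants)

# Per-department rates: women lead in both departments.
for dept, counts in hiring.items():
    w = rate(*counts["women"])
    m = rate(*counts["men"])
    print(f"{dept}: women {float(w):.0%}, men {float(m):.0%}")

# Combined rates: the ordering flips.
print(f"Combined: women {float(combined('women')):.0%}, "
      f"men {float(combined('men')):.0%}")
```

Note that the combined rate pools the raw counts first; it is not the average of the two departmental percentages.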

## Simpson’s Paradox vs. Yule-Simpson Paradox vs. Simpson’s Reversal

Simpson’s Paradox is the reversal of an association, or trend, that is found in partitioned subpopulations when the same association is measured across the whole population. (Even the Stanford Encyclopedia of Philosophy leads its *Simpson’s Paradox* article with a lengthy definition.)

The names Yule-Simpson Paradox, Simpson’s Paradox, and Simpson’s Reversal all refer to this same phenomenon.

## What Causes a Simpson’s Paradox?

In the discipline of Statistics, the technical term “population” refers to the complete pool of events in a study. A “subpopulation” is a segment of that whole population. In the hiring example above, the population refers to all the applications by men and women to all departments. The four subpopulations are the men and women who applied to either the History or Geography departments.

A Yule-Simpson Paradox reveals itself when a statistical trend that holds true for all partitioned subpopulations is reversed in the overall population.

This definition does not mean that people only notice the Simpson’s Reversal when they change focus from the partitioned subpopulations to the full population. They might discover the Simpson’s Paradox when partitioning the population after calculating the overall trend.

## How to Explain the Simpson Reversal

What explains a Yule-Simpson Paradox? Often there is a “lurking variable”: some factor that lies hidden beneath the situation under review. We will discuss these factors further below. Also, let’s defer examining the general mathematics. Instead, let’s review the university hiring example in more detail.

Each department takes responsibility for its own hiring practices. Therefore, it is natural to partition the population by department. Also, the goal distinguished men from women; so that too is a natural partition.

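| Department | Women hired / applied | Men hired / applied | Total hired / applied |
| --- | --- | --- | --- |
| History | 2 / 8 (25%) | 1 / 5 (20%) | 3 / 13 (23%) |
| Geography | 4 / 5 (80%) | 6 / 8 (75%) | 10 / 13 (77%) |
| Combined | 6 / 13 (46%) | 7 / 13 (54%) | 13 / 26 (50%) |

How do the History and Geography departments differ in this table? The History department hired far fewer people than did Geography: 3 overall, versus 10.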

Did the men and women behave differently? More women applied to the History department; more men to Geography.

Let’s summarize the previous paragraphs. More women applied to the department that hired fewer people. More men applied where more hiring occurred. Within each department, female applicants were more successful than males; but females “squandered” their efforts by applying to the department that hired fewer applicants overall.
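One way to make this summary concrete is to write each gender’s overall rate as a weighted average of the departmental rates, where the weights are the shares of that gender’s applications sent to each department. The sketch below uses the counts from the example; the function name `weighted_overall_rate` is an illustrative choice.

```python
from fractions import Fraction

# (hired, applicants) per department, from the hiring example.
women = {"History": (2, 8), "Geography": (4, 5)}
men = {"History": (1, 5), "Geography": (6, 8)}

def weighted_overall_rate(by_dept):
    """Sum over departments of (share of applications) x (department rate)."""
    total_applicants = sum(applicants for _, applicants in by_dept.values())
    overall = Fraction(0)
    for dept, (hired, applicants) in by_dept.items():
        share = Fraction(applicants, total_applicants)  # where this gender applied
        dept_rate = Fraction(hired, applicants)         # how well they did there
        print(f"  {dept}: share {float(share):.0%}, rate {float(dept_rate):.0%}")
        overall += share * dept_rate
    return overall

print("Women")
women_overall = weighted_overall_rate(women)
print("Men")
men_overall = weighted_overall_rate(men)
print(f"Overall: women {float(women_overall):.0%}, men {float(men_overall):.0%}")
```

Women put about 62% of their applications into the department with the low hiring rate, while men put about 62% into the department with the high one. That difference in weights is enough to outweigh the women’s five-point lead within each department.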

## The Mathematics of a Yule-Simpson Paradox

Here is a general, but rather extreme, example of the Yule-Simpson Paradox.

In Group *A*, Type *alpha* succeeds in 2 of 3 trials; and Type *beta* succeeds in 19 of 30. In Group *B*, Type *alpha* succeeds in 39 of 90 trials; and Type *beta* succeeds in 1 of 3. Within both the *A* and *B* groups, Type *alpha* succeeds slightly more often than Type *beta*.

However, Type *beta* has a much higher percentage of success for the total population.
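Working through the numbers shows the reversal (percentages rounded to one decimal place):

$$
\begin{aligned}
\text{Group } A:&\quad \tfrac{2}{3} \approx 66.7\% \;>\; \tfrac{19}{30} \approx 63.3\% \\
\text{Group } B:&\quad \tfrac{39}{90} \approx 43.3\% \;>\; \tfrac{1}{3} \approx 33.3\% \\
\text{Total}:&\quad \tfrac{2 + 39}{3 + 90} = \tfrac{41}{93} \approx 44.1\% \;<\; \tfrac{19 + 1}{30 + 3} = \tfrac{20}{33} \approx 60.6\%
\end{aligned}
$$

In each line the left-hand value belongs to Type *alpha* and the right-hand value to Type *beta*.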

The Yule-Simpson Paradox may occur with more than two Groups or more than two Types, but let’s use the two-by-two example where Type *alpha* has better success in the partitions, but less success in the total population. A Simpson’s Reversal has at least some of the following characteristics in the partitioned data:

- The rate of success in one Group is significantly better than in the other, for both Types.
- Type *alpha* has fewer trials than Type *beta* in the Group with the higher success rate.
- Type *alpha* has more trials than Type *beta* in the Group with the lower success rate.
- Type *alpha* has more trials in the Group with the lower success rate than it has in the other Group.
- Type *beta*’s pattern is opposite to Type *alpha*’s, with more trials in the Group where the success rate is generally higher.

This pattern may not be sufficient for Simpson’s Reversal; the difference in “number of trials” or “success rate” has to be fairly large.

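A short set of inequalities expresses it succinctly:

$$\frac{S_{aA}}{T_{aA}} > \frac{S_{bA}}{T_{bA}}$$

and

$$\frac{S_{aB}}{T_{aB}} > \frac{S_{bB}}{T_{bB}}$$

but

$$\frac{S_{aA} + S_{aB}}{T_{aA} + T_{aB}} < \frac{S_{bA} + S_{bB}}{T_{bA} + T_{bB}}$$

where

- $S$ is the number of successes;
- $T$ is the number of trials;
- $a$ and $b$ are the Types (*alpha* and *beta*);
- $A$ and $B$ are the Groups;
- a subscript such as $aA$ means Type $a$ within Group $A$.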

One could say that Type *beta* follows a strategy to defeat Type *alpha* at the total population level, while admitting defeat in the partitions. The strategy would be, “Put more trials into the Group where successes are more likely”.
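The condition for a reversal (one Type wins in every Group, yet loses once the Groups are pooled) is easy to test mechanically. The following is a minimal sketch, not a standard library routine: the function name `is_simpson_reversal` and the dictionary layout are assumptions made for illustration, and both Types are assumed to have trials in the same Groups.

```python
from fractions import Fraction

def rate(successes, trials):
    """Exact success rate as a fraction."""
    return Fraction(successes, trials)

def is_simpson_reversal(a, b):
    """Return True if Type a beats Type b in every Group but loses overall.

    a and b each map a Group name to a (successes, trials) pair and are
    assumed to share the same Group names.
    """
    wins_every_group = all(rate(*a[g]) > rate(*b[g]) for g in a)
    total_a = rate(sum(s for s, _ in a.values()), sum(t for _, t in a.values()))
    total_b = rate(sum(s for s, _ in b.values()), sum(t for _, t in b.values()))
    return wins_every_group and total_a < total_b

# The alpha/beta example from this section.
alpha = {"A": (2, 3), "B": (39, 90)}
beta = {"A": (19, 30), "B": (1, 3)}
print(is_simpson_reversal(alpha, beta))  # True

# The university hiring example: women versus men.
women = {"History": (2, 8), "Geography": (4, 5)}
men = {"History": (1, 5), "Geography": (6, 8)}
print(is_simpson_reversal(women, men))  # True
```

Using `Fraction` rather than floating-point division keeps the comparisons exact, which matters when the rates being compared are close.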

## Backing into a Simpson Reversal

Let’s consider how to look for a Yule-Simpson Paradox given a result for a total population. That would be a backwards approach, compared to finding a common trend for partitioned groups.

Let’s change the first example. Imagine that the university had never set a target for hiring women over men, but was criticized after publishing the results for the total population. The administration might respond by looking for a Simpson Reversal.

Suppose that there was a helpful pattern by applicants’ age, rather than by department. Simply replace “History Department” with “Applicants under age 35”, and “Geography Department” with “Applicants aged 35 and over”. More of the women who applied were younger rather than older; but younger applicants were less likely than older scholars to gain employment.

In other words, it may be possible to “torture the data” to produce a Simpson’s Reversal.

## Important Examples of the Simpson Reversal

The Yule-Simpson Paradox may affect choices that are based on past results. Let’s use the example of two successful surgeons.

Dr. *alpha* has slightly better results than Dr. *beta*, in emergency surgery as well as for scheduled operations. However, Dr. *beta*‘s overall success is slightly better than Dr. *alpha*‘s.

The explanation is that Dr. *alpha* mainly works in the emergency department, which has a slightly lower success rate than for scheduled operations. Dr. *beta* mainly performs scheduled surgeries. However, they sometimes cover the other department.

Would you prefer Dr. *beta* as your surgeon, based on her overall record? Or does Dr. *alpha* have your confidence, based on the partitioned subpopulations?

Simpson Reversals may occur in many different fields.

In medicine, a Type *beta* treatment may be better for the overall population; but Type *alpha* is better for younger and also for older patients; or for “less severe” and also “more severe” conditions.

Other areas include sports statistics and student grades. In general, any partitioned comparison of performance may display Simpson’s Paradox.

## Can We Avoid the Yule-Simpson Paradox?

A scientist designing a test should make equal-sized subpopulations, if at all possible, so that each Type has the same number of trials in every Group. That should rule out the conditions required for the Simpson Reversal, for the groups or types in the planned experiment.
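To see why balance helps, the sketch below keeps the within-department hiring rates from the university example but imagines a balanced design. The figure of 100 applicants per gender per department is purely hypothetical, as is the function name `overall_rate`.

```python
from fractions import Fraction

# Within-department hiring rates from the university example.
rates = {
    "women": {"History": Fraction(2, 8), "Geography": Fraction(4, 5)},
    "men":   {"History": Fraction(1, 5), "Geography": Fraction(6, 8)},
}

def overall_rate(group_rates, trials_per_group):
    """Overall rate when every Group receives the same number of trials."""
    n_groups = len(group_rates)
    successes = sum(r * trials_per_group for r in group_rates.values())
    return successes / (n_groups * trials_per_group)

# Hypothetical balanced design: 100 applicants of each gender per department.
balanced_women = overall_rate(rates["women"], 100)
balanced_men = overall_rate(rates["men"], 100)
print(float(balanced_women), float(balanced_men))  # 0.525 0.475
```

With equal weights, each overall rate is just the plain average of the departmental rates, so a Type that leads in every Group must also lead overall.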

It may be impossible to balance subpopulations, however. If the selection of data is beyond the researcher’s control, paradox conditions may occur.

For example, the university departments could neither pre-select the genders of the applicants, nor their selection of departments. In an epidemic, the severity or combination of symptoms for a specific disease may influence the likelihood of one treatment over another; but the doctors cannot pre-select their patients’ symptoms. If one group of symptoms offers little to choose between two treatments, perhaps the cheaper treatment will be used more often. But some patients may insist on the more expensive treatment.

Finally, as already noted in the second look at the hiring example, a researcher might actively seek to partition the data to create a desirable Simpson’s Reversal. The excuse “Most of my tasks were harder than his; but I did better on the fair comparisons” is an informal search for a Yule-Simpson Paradox.

## Self Selection and Simpson Reversals

Self-selection may provide the “lurking variable” behind many Simpson’s Reversals.

One type of “self-selection lurking variable” occurs in medical treatment, when the number of trials in a group is determined by “who walks in the door”.

Doctors treat sick patients according to their symptoms, medical history, and other factors outside the control of a test designer. Having rare but severe symptoms may open the door for experimental or expensive treatments. Being allergic to one medication may force some patients to use a less effective medicine.

## Closing Notes about the Yule-Simpson Paradox

A study may show that one type performs better than another in each group of tests; yet the other type performs better overall. This surprising result, known as the Yule-Simpson Paradox, can always be explained by “checking the math”: count the number of trials and successes of each type in each group. The type with the best overall result usually has more trials in the group with the highest success rate; and vice versa for the other type.

The Simpson Paradox may have a perfectly logical explanation, such as self-selection; or it may be an artifact of carefully partitioning the population to make the math work. Careful experimental design can reduce the likelihood of “logical” Simpson Reversals; but the effect may reflect a valid problem inherent in the experimental situation. Finally, when choosing a course of action based on results that show a Simpson Reversal, it’s important to consider whether your situation fits in one of the partitioned groups, or in the overall population.
