Yale professor Dan Kahan informally reported his surprise that, statistically, a sample of American Tea Party supporters scored higher on science literacy than more liberal Americans did, and higher than the average for Republican Party supporters.
How should we interpret his report? What do these statistics mean? What conclusions can we draw?
Tea Party Science: What Professor Kahan Reported
In “Some data on education, religiosity, ideology, and science comprehension,” Prof. Kahan reports statistics from several similar studies. Participants completed a battery of tests that produced a score combining science literacy with cognitive reflection. The combined score is meant to measure how well people understand and apply scientific thinking. Prof. Kahan calls it a “science comprehension” score.
He drew data from several different groups of people. Besides the science comprehension score, each group also reported on some other personal characteristics, which gave a separate score. He then applied standard statistical methods to determine how the two scores were correlated in each sample group. Here is a summary; let’s explain ‘r’ and ‘p’ afterwards.
- Higher education had a positive correlation with science comprehension: “r=0.36; p<0.01”.
- Religiosity had a negative correlation to science comprehension: “r=-0.26; p<0.01”.
- Holding “conservative Republican” views also had a negative correlation to science comprehension, but only a “small correlation”: “r=-0.05; p=0.03”.
- Identifying oneself as “part of the Tea Party movement” had a surprising, but small, positive correlation with science comprehension: “r=0.05; p=0.05”. The last two items come from the same data set, gathered from the same people.
Note that values such as “r=0.36; p<0.01” can also be written as percentages, “r=36%; p<1%”, although r is a coefficient on a scale from -1 to +1 rather than a true proportion.
The Meaning of R-Value in Statistics
The r-value is the “Pearson product-moment correlation coefficient”; its square, r², is known as the coefficient of determination. In simple terms, the r-value indicates whether the two types of score go up together, or whether one goes up as the other goes down. It always lies between -1 and +1.
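As a minimal sketch, the Python below computes r directly from its definition: the sum of the products of each score’s deviation from its mean, divided by the square root of the product of the two sums of squared deviations. It then squares r to get r². The scores and the function name `pearson_r` are hypothetical, invented for illustration; they are not Kahan’s data or code.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists.

    Undefined (division by zero) if either list is constant.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (covariance times n)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (each variance times n)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores, invented for illustration (not Kahan's data)
education = [10, 12, 12, 14, 16, 16, 18, 20]
comprehension = [4, 5, 7, 6, 8, 7, 9, 10]

r = pearson_r(education, comprehension)
r_squared = r ** 2  # coefficient of determination
print(f"r = {r:.2f}, r squared = {r_squared:.2f}")
```

Perfectly rising data such as `pearson_r([1, 2, 3], [2, 4, 6])` gives exactly 1.0, and perfectly falling data such as `pearson_r([1, 2, 3], [6, 4, 2])` gives exactly -1.0.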
In very simple terms, think of plotting a graph with a straight diagonal line. If both scores are first converted to standardized units (z-scores), the slope of the best-fit line is exactly the r-value.
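Strictly speaking, the slope of a best-fit line equals r only when both scores are standardized: subtract the mean, then divide by the standard deviation. The sketch below, with hypothetical data and helper names of our own choosing, fits a least-squares line twice, once to the raw scores and once to their z-scores.

```python
import math


def mean(values):
    return sum(values) / len(values)


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    m = mean(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / sd for v in values]


def ols_slope(xs, ys):
    """Slope of the least-squares line that predicts ys from xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical scores on two different scales (illustration only)
education = [10, 12, 12, 14, 16, 16, 18, 20]
comprehension = [40, 50, 70, 60, 80, 70, 90, 100]

raw_slope = ols_slope(education, comprehension)  # depends on the units used
std_slope = ols_slope(z_scores(education), z_scores(comprehension))  # equals r
print(f"raw slope = {raw_slope:.2f}, standardized slope = {std_slope:.2f}")
```

The raw slope (about 5.5 here) would change if the comprehension scores were rescaled, while the standardized slope stays at r.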
Statistical graphs often show scattered points, with a straight line fitted through them. The line that best fits the points is usually found by the “least squares” method, which chooses the slope and intercept that minimize the sum of the squared vertical distances between the plotted points and the line.
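Here is a minimal least-squares sketch in Python, using invented points. `fit_line` applies the standard closed-form solution for the slope and intercept, and `sum_squared_residuals` (a hypothetical helper name) measures the quantity being minimized. Nudging either the slope or the intercept away from the fitted values should only increase that sum.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    The closed-form solution minimizes the sum of squared vertical
    distances between the points and the line.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx  # the line passes through (mean x, mean y)
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of squared vertical distances from each point to the given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical scattered points (illustration only)
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

slope, intercept = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, SSE = {best:.4f}")

# Compare against nearby lines with a perturbed slope or intercept
perturbed = {}
for d_slope, d_intercept in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    sse = sum_squared_residuals(xs, ys, slope + d_slope, intercept + d_intercept)
    perturbed[(d_slope, d_intercept)] = sse
    print(f"perturbed by ({d_slope}, {d_intercept}): SSE = {sse:.4f}")
```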
The sign of the slope is very important. A positive slope, with the line rising from lower left to upper right, indicates a positive correlation. For example, higher education had a positive correlation with science comprehension. If the line falls from upper left to lower right, then the r-value is negative: as one score increases, the other tends to decrease.
A negative correlation can be just as informative as a positive one. One study might find a positive correlation between higher education and higher income, while another finds a negative correlation between higher education and living in poverty. Both studies could provide legitimate insights of similar validity, even though their r-values have opposite signs.
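To see why opposite signs need not conflict, the hypothetical sketch below correlates invented education data with an invented income figure, and with a “poverty gap” defined as a hypothetical threshold of 70 minus income. Because the gap simply runs opposite to income, the two r-values have the same size and opposite signs.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, invented for illustration only
education = [10, 12, 12, 14, 16, 18]      # years of schooling
income = [28, 30, 35, 41, 50, 62]         # thousands of dollars per year
poverty_gap = [70 - v for v in income]    # shortfall below a hypothetical threshold

r_income = pearson_r(education, income)
r_poverty = pearson_r(education, poverty_gap)
print(f"education vs income:      r = {r_income:+.2f}")
print(f"education vs poverty gap: r = {r_poverty:+.2f}")
```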
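Let’s make two final notes about r-values. First, the slope of a best-fit line drawn through the raw scores depends on how spread out the data are, not only on how strongly they are correlated: the raw slope equals r multiplied by the ratio of the standard deviation of the vertical (y-axis) scores to that of the horizontal (x-axis) scores. If all the test subjects had very similar scores for science comprehension, the raw best-fit line would be nearly horizontal no matter how closely the two scores were related. The r-value avoids this problem, because changing the units or scale of either score does not change its size.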
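The link between slope and r can be checked numerically. In this sketch, with hypothetical data and our own helper names, the same comprehension pattern is squeezed into a narrow band around 70. `statistics.pstdev` is the Python standard library’s population standard deviation. The raw slope shrinks twentyfold, r stays the same, and in both cases the slope equals r times the ratio of the standard deviations.

```python
import math
from statistics import pstdev


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(xs, ys):
    """Slope of the least-squares line that predicts ys from xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data (illustration only)
education = [10, 12, 12, 14, 16, 16, 18, 20]
wide = [40, 50, 70, 60, 80, 70, 90, 100]
# Same pattern squeezed into a narrow band around 70
narrow = [70 + (s - 70) / 20 for s in wide]

results = {}
for label, scores in (("wide", wide), ("narrow", narrow)):
    slope = ols_slope(education, scores)
    r = pearson_r(education, scores)
    # Raw slope = r * (standard deviation of y / standard deviation of x)
    assert math.isclose(slope, r * pstdev(scores) / pstdev(education))
    results[label] = (slope, r)
    print(f"{label:>6}: slope = {slope:.3f}, r = {r:.3f}")
```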
Finally, suppose the science comprehension scores vary considerably over a wide range, but the best-fit line is flat, with an r-value close to zero. That indicates very little linear correlation between the two types of scores.
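A small invented example of widely spread scores with no correlation: the hypothetical comprehension scores range from 10 to 90, but they are arranged symmetrically, so they neither rise nor fall with the other score. The sum of the products of deviations cancels to zero, so r is exactly zero and the least-squares line is flat. Because r measures linear association only, a value near zero does not rule out every kind of relationship.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data (illustration only)
other_score = [1, 2, 3, 4, 5, 6, 7, 8]
comprehension = [10, 90, 20, 80, 80, 20, 90, 10]  # mirror-symmetric about the middle

r = pearson_r(other_score, comprehension)
spread = max(comprehension) - min(comprehension)
print(f"range of comprehension scores = {spread}, r = {r:.2f}")
```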