Stats literacy
by Tom Temple
15 June 2005
Suppose we want to answer the following question. “Do abstinence pledges reduce teen pregnancy or STDs?” Let’s take a survey and find out if the answer is yes or no.
Hey look! Kids who take an abstinence pledge are more likely to go without sex for 12-18 months afterward than their non-pledging peers. Well, at least that’s what they self-report. But c’mon, why would you lie about breaking a pledge? Let’s trust them.
Wait a second, what if the only kids who take the pledge planned on not having sex anyway? Really all our survey data show is correlation. So let’s look at some other factors. If pledgers abstained more across the board, then we might be more inclined to think that pledging makes a difference. That way we remove the shared cause issue. There would still remain the “Does abstinence cause pledging?” question, but that could only be resolved by controlled experiment.
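To make the shared-cause worry concrete, here is a small Python sketch. Everything in it is made up for illustration: the hidden `intends` trait, the probabilities, and the function name `simulate` are my assumptions, not anything from the survey. The pledge is given no causal effect at all, yet pledgers still come out abstaining more.

```python
import random

def simulate(n=20000, seed=0):
    """Abstinence rates by pledge status when a hidden trait drives both."""
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # pledged -> [abstained, total]
    for _ in range(n):
        # Hypothetical shared cause: the kid already planned to abstain.
        intends = rng.random() < 0.5
        # The trait makes pledging more likely (assumed probabilities).
        pledged = rng.random() < (0.6 if intends else 0.1)
        # Abstaining depends only on the trait, never on the pledge.
        abstained = rng.random() < (0.8 if intends else 0.3)
        counts[pledged][0] += abstained
        counts[pledged][1] += 1
    return {status: a / total for status, (a, total) in counts.items()}

rates = simulate()
print("pledgers:", round(rates[True], 2), "non-pledgers:", round(rates[False], 2))
```

With these made-up probabilities the expected rates are roughly 0.73 for pledgers against 0.45 for non-pledgers, a gap produced entirely by the hidden trait.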
So what else correlates with pledging? In order: 1) Asianness (shrug); 2, tied) religiosity and not having a paramour (more variance than religion); 3) being unpopular; 4) parents disapproving of sex.
Funny, the things that correlate with no sex are pretty similar: 1) no lover, 2) Asian, 3) religious, 4) parental disapproval, 5) unpopular. I am not going to pick my way through the fog of multivariate regression. The paper couldn’t do a good job of it, and neither will I. It is imaginable that the pledge has some remaining effect. But 18 months? We all know that is too much to possibly be true.
Heritage has two studies simultaneously attacking and defending the work. Here is the NYT? piece on the issue.
The team needs to do “a lot of work” on its paper, said David Landry, a senior research associate at the Alan Guttmacher Institute in New York. He said in an interview that it was “a glaring error” to use the result of a statistical test at a 0.10 level of significance when journals generally use a lower and more rigorous level of 0.05.
Dramatic Interlude
NYT reader: OMFG! They didn’t use the .05 level?! That’s not scholarly… Actually what does that mean?
poof
Stats Fairy: That means that there is more than a 5% chance that he erroneously rejected the hypothesis that there was no effect. In other words, the odds that the variation was by mere chance were more than 5% but less than 10%.
NYTR: To say that, wouldn’t you need to know the odds ahead of time?
SF: Well, kinda. You need to know the odds of getting certain results. Luckily someone long, long ago did some math and made tables. Sometimes things turn out just like in the tables. For instance coin flips…
NYTR: I can handle coin flips. Is this like coin flips?
SF: Well, no, but we statisticians can handle coin flips too. You see if I flip a coin a very large number of times…
NYTR: You just said it isn’t like coin flips.
SF: Do you like garlic knots? Let’s go to Ramuntos.
poof
NYTR: Hey, come back! I have another question.
poof
NYTR: What is so special about .05 and .1? Why can’t they just tell me the odds straight away?
SF: No, no. Once you’ve rejected a hypothesis, it is completely rejected; it is utterly gone forever. It is not rejected until you get to .05 (or .1 if you live in a backwards country like Uzbekistan) and then blam. You see .05 is a very special number. Look at your hand. How many fingers are there?
NYTR: Where are you going with this?
SF: Right, there are 5. That is why we use 5% significance testing.
NYTR: You’ve got to be joking.
SF: God wouldn’t have given you five fingers if it weren’t the right level of significance.
Seems like we need to do a quick credibility check. So what do we have for papers by Bearman? What do we have for Rector and Johnson? Landry seems potentially credible except for the fact that he deliberately misled the reporter.
No surprises there considering the articles. We’ve got academic hacks versus political hacks. Rather than defend either, I think I am going to cut this post off short.

Jun 16, 10:42 AM
Continuing my stats conversation with Tom:
Your complaint about significance levels is correct, Tom, and that’s why we have p-values.
Reporting that a result was “significant at the .05 level” when performing a test of some hypothesis isn’t popular anymore (with statisticians). Much preferred is to report the p-value, which is the smallest level at which we would still reject the null hypothesis.
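A quick sketch of what that looks like, using coin flips because the arithmetic is exact. The function name `two_sided_p` and the 60-heads-in-100 example are mine, chosen only for illustration. The p-value is the probability, under a fair coin, of any outcome at least as unlikely as the one observed.

```python
from math import comb

def two_sided_p(heads, flips):
    """Exact two-sided p-value for the null hypothesis of a fair coin."""
    observed = comb(flips, heads)
    # Count every outcome that is no more likely than the observed one.
    tail = sum(
        comb(flips, k) for k in range(flips + 1) if comb(flips, k) <= observed
    )
    return tail / 2 ** flips

p_value = two_sided_p(60, 100)
for level in (0.10, 0.05):
    verdict = "reject" if p_value <= level else "do not reject"
    print(f"p = {p_value:.4f}, level {level}: {verdict}")
```

For this example the p-value lands between .05 and .10, so the same data get "rejected" at one level and not at the other. Reporting the number itself leaves that call to the reader.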
Sadly, not everyone has caught on.
Even better is to report a p-value and a confidence interval, though this isn’t always possible.
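As a rough illustration of the interval half of that, here is a normal-approximation (Wald) interval for a proportion. This is one of several possible interval methods, and the 55-out-of-100 figure is a hypothetical example, not data from any study mentioned here.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(successes, trials, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / trials
    half_width = z * sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - half_width, p_hat + half_width

low, high = wald_interval(55, 100)
print(f"estimate 0.55, 95% interval ({low:.3f}, {high:.3f})")
```

The interval shows how wide the range of plausible values is, which a bare "significant" or "not significant" hides.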
People need to stop demanding an answer from stats and recognize that it’s just a way to quantify evidence for and against various options. So at the end of the day, stats doesn’t give yes/no answers, just good/better/bad/ambiguous evidence for stuff.
Sheesh.
Jun 16, 07:10 PM
Joran wrote: “People need to stop demanding an answer from stats”
I agree, for the reasons you suggest. But if you want people to stop demanding answers, then I think those who give answers based on statistics need to learn to be more careful about their claims.
This is often the fault of the media, who tighten up copy at the expense of clarity, but it’s also a favourite dodge of politicians everywhere. The fact that We the People are overall so mathematically illiterate doesn’t help much, either.
Jun 17, 08:30 AM
Michael wrote: “But if you want people to stop demanding answers, then I think those who give answers based on statistics need to learn to be more careful about their claims.”
Indeed. The stories you hear hanging out with actual statisticians are pretty incredible. The number of times scientists (not always social scientists!) come to them with data and say “Give me an answer!” is astounding.
My advisor told me about some surgeons who were testing 3 types of artificial heart valves. They showed up with data on two of each! And they wanted to know if there was a “statistically significant difference” in blood leakage. Yikes.
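One way to see why two of each is hopeless is a permutation test, sketched below. This is my choice of method for illustration, not what the surgeons or my advisor used, and the leakage numbers are invented to be as favorable as possible. With three groups of two there are only 90 ways to assign the labels, and at least 6 of them (the relabelings of the observed grouping) tie the observed result.

```python
from itertools import permutations

def between_group_ss(values, labels):
    """Between-group sum of squares, the test statistic used here."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    grand_mean = sum(values) / len(values)
    return sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in groups.values()
    )

def permutation_p(values, labels):
    """Exact permutation p-value over every distinct relabeling."""
    observed = between_group_ss(values, labels)
    relabelings = set(permutations(labels))
    # Small tolerance so float rounding does not drop exact ties.
    extreme = sum(
        between_group_ss(values, r) >= observed - 1e-12 for r in relabelings
    )
    return extreme / len(relabelings), len(relabelings)

leakage = [1.0, 2.0, 5.0, 6.0, 9.0, 10.0]  # made-up numbers, not real data
valve = ["A", "A", "B", "B", "C", "C"]
p_value, n_labelings = permutation_p(leakage, valve)
print(f"{n_labelings} labelings, p = {p_value:.3f}")
```

Even with the groups perfectly separated, the p-value cannot drop below 6/90, about .067, so this test can never reach the .05 level with that design.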
My favorite, though, comes from official US Forest Service policy. My advisor did an internship at the USFS while a grad student, working with some plant researcher, a trained biologist. My advisor got to help this guy out with doing his stats.
Data collection: they had to find some type of fern and then measure some random aspect. Basically, they drove around forest service roads until the biologist saw some ferns. Then they’d get out and measure. Official FS policy stated that this sampling plan was to be referred to as “subjective sampling without bias” in their papers.
In my brief experience, the worst offenders are actually ecologists and organismal biologists. They’ll spend 4 months counting the number of bugs in some random spot of forest. After 4 months, they’ve found like 5! And they want us statisticians to extrapolate some species population estimate. (Not to mention species trends!) Good God. They usually aren’t happy if you just say “Not many”.