Out of control
by Tom Temple
Mar 28, 10:28 PM
So another study came out about parenting. It’s times like these when our inability to talk precisely about data gets pretty painful. Too bad the study itself is pay-only. Here’s a paragraph from Slate where Emily Bazelon is trying to say that all the major media outlets are misreading it.
The source of the fuss is the latest installment of a long-running $200 million effort by the National Institute of Child Health and Human Development. Since 1991, a team of researchers has been tracking more than 1,300 children, following them from infancy through various child-care settings (home with mother, home with another relative, home with nanny, or at day care) and into elementary school. In the March/April issue of Child Development, the team asks “Are There Long-Term Effects of Early Child Care?” To answer that question, the researchers report their findings about the kids’ academic achievement and behavior through sixth grade. The study controls for a host of variables, like socioeconomic status, quality of parenting (annoyingly, this measure involves only mothers), quality of child care, and quality of the elementary-school classroom. It’s all very well-done and careful.
My problem here is “controls for.” If the study had a million kids in it, you could perhaps “control for” half of those things. I’m pretty sure she means “collected data on.” In fact, on the second page she explains how the study failed to control for the quality of the child care.
Does that distinction matter? Absolutely. You have 1,300 kids: some do better on tests, some get in trouble, some went to day care, some watched Baby Einstein. It’s easy to say that two variables are correlated; you just ignore all the other data. But what you would like to say is that the correlation is not an artifact of some other relationship. That requires taking all the rest of the data into account, which is what it means to “control” for those other variables.
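To make “artifact of some other relationship” concrete, here is a minimal sketch in Python on made-up numbers. Nothing in it comes from the study: the hidden variable z, the noise levels, and the names are all assumptions chosen for illustration. By construction x and y have no direct link; both simply depend on z.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

random.seed(0)  # fixed seed so the run is repeatable

# Made-up data, NOT the study's: a hidden binary variable z drives both
# x and y, which have no direct effect on each other.
kids = []
for _ in range(1300):
    z = random.randint(0, 1)
    x = z + random.gauss(0, 0.5)
    y = z + random.gauss(0, 0.5)
    kids.append((z, x, y))

# Correlation when z is ignored.
raw = pearson([x for _, x, _ in kids], [y for _, _, y in kids])

# Correlation inside each level of z, i.e. with z held fixed.
within = {}
for level in (0, 1):
    group = [(x, y) for z, x, y in kids if z == level]
    within[level] = pearson([x for x, _ in group], [y for _, y in group])

print(f"ignoring z: r = {raw:.2f}")
for level, r in within.items():
    print(f"within z = {level}: r = {r:.2f}")
```

With these settings the correlation that ignores z should land near 0.5, while the correlation inside each level of z should hover around zero. Holding z fixed is the simplest form of “controlling” for it, and it only works here because z has two values and each one gets hundreds of kids.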
The problem is that controlling properly requires enough data to show a valid relationship for every possible combination of values of the variables being controlled for. Clearly that is impossible with a sample that small. I’d guess that they fit linear relationships and then asked whether there were correlations in the residuals. I won’t even start on how dumb that is. But in their defense, I should add that there is no “correct” way to summarize data. This has the troubling implication that no statistics can ever be exact. I guess it’s understandable, then, that we don’t have a universal language for describing data.
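A back-of-the-envelope count shows how quickly the combinations eat up a sample of 1,300. The bin counts below are pure assumptions (three levels per variable, picked only to keep the numbers small); I have no idea how the study actually coded these variables.

```python
# Hypothetical bin counts; the study's actual coding is not known here.
levels_per_variable = {
    "socioeconomic status": 3,
    "quality of parenting": 3,
    "quality of child care": 3,
    "quality of classroom": 3,
}
care_settings = 4   # mother, other relative, nanny, day care
sample_size = 1300  # "more than 1,300 children"

# Every combination of levels, in every care setting, is one cell.
cells = care_settings
for n_levels in levels_per_variable.values():
    cells *= n_levels

kids_per_cell = sample_size / cells
print(f"{cells} cells, about {kids_per_cell:.1f} kids per cell")
# prints "324 cells, about 4.0 kids per cell"
```

Four kids per cell is already hopeless, and that is with a coarse three-way split that assumes the kids spread evenly. Bump each variable to five levels and there are 2,500 cells, more cells than children.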
So there is only one thing to be done and that is to make the data public.
