Thursday, May 1, 2014

More Notes on Planning Research: Always Include Your Sample Size, and Stop It with Pie Charts

The American Planning Association just released new survey data about what people want from their communities. The link is here. This is an update to their 2012 survey, which made some interesting claims. However, the 2012 survey was poorly described and I was skeptical that the data could support generalizable conclusions, so I wrote a post about it. I know the authors of the 2012 report read my previous post because they sent me emails about it. They have not improved their research methods at all in the interim.

My main complaints are the same as before: when you present survey data you must include sample sizes, and you should stop using pie charts. The APA loves endless sheets of pie charts without any information about the sample size within each group. It is not hard to include your n. (Here is a link to the only appropriate use of a pie chart.)

The sample size matters for understanding the accuracy of the data. For the entire sample the margin of error is about 2.7% (at a 95% confidence interval). The data are broken into subsets, though, and we aren't told how many millennials or older people are actually in each subset. Millennials are about 25% of the total population, so if they are proportionately sampled that's about 325 people, and a margin of error of 5.4% for their responses. That MoE changes the interpretation of the data quite a bit by introducing much more uncertainty into the claims. Uncertainty doesn't lend itself easily to infographics, though.
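For readers who want to check these numbers, here is a minimal sketch of the worst-case margin-of-error calculation (p = 0.5, 95% CI, simple random sampling assumed). Note that the full-sample size of roughly 1,300 is my inference from the reported 2.7% MoE; the APA does not publish it:

```python
import math

def margin_of_error(n, z=1.96, p=0.5):
    """Worst-case margin of error for a simple random sample.

    Normal approximation: MoE = z * sqrt(p * (1 - p) / n).
    p = 0.5 maximizes the MoE, so it is the conservative default.
    """
    return z * math.sqrt(p * (1 - p) / n)

# ~1,300 respondents is inferred from the reported 2.7% MoE, not published.
print(f"Full APA sample (n=1300): {margin_of_error(1300):.1%}")  # ~2.7%
# Millennials at ~25% of a proportionate sample: ~325 respondents.
print(f"Millennials (n=325):      {margin_of_error(325):.1%}")   # ~5.4%
```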

This Transportation for America survey from a couple weeks ago suffers from similar sample size issues, but at least it presents the data in a way that lets the reader assess its veracity, and it doesn't have noxious pie charts. That doesn't stop reporters from gleaning far too much insight from the data. See here, here, here, or just find one of the many other examples. The MoE problem recurs, though: the data sheet reports the MoE as 3.7%, but that figure is for the full sample of 703, not for the subgroups created. Since the survey collected roughly 70 responses in each of 10 different cities, the MoE is actually much larger for the data as presented. For any given city the MoE is 11.7% at a 95% CI, so the data should be used with caution when analyzing subgroups.
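Applying the same hypothetical margin_of_error helper from the sketch above to the Transportation for America numbers confirms both figures:

```python
# Reusing the margin_of_error helper sketched above:
print(f"Full T4A sample (n=703): {margin_of_error(703):.1%}")  # ~3.7%
print(f"One city (n=70):         {margin_of_error(70):.1%}")   # ~11.7%
```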

So below is what I wrote two years ago, and it stands for these reports again. Analysis using descriptive data can be very powerful if done well, and the difference in effort between doing it well and doing it poorly isn't that big.
Reports like this bother me in part because I teach planning research courses and would be distraught if any of my students turned in a report of this quality (without additional explanation, anyway). But the larger issue is that low-quality research--whether it confirms or opposes your personal preferences--reduces the signal-to-noise ratio. Reports like "Planning in America" are noise that clouds our ability to understand critical issues and policy (the signal in this case). At the very least, the full methodology should be explained, pie charts jettisoned, and sample sizes included in tables and graphs.

As for planning research, reports like this are why I argue planning education should focus primarily on numerical literacy and well-crafted basic research with descriptive statistics rather than advanced regression analysis. We should train planners to communicate with data rather than claim to be pseudo-econometricians. Many of the greatest failures of planning can be directly attributed to planners' inability to understand the fundamentals of quantitative data. (See here for an explanation of the most egregious example.) Reports like "Planning in America" make the situation worse, at least as currently presented. Let's not get excited about the claims made in it.
