Andrew Gelman is a professor of statistics and political science and the director of the Applied Statistics Center at Columbia. A student of math and physics at MIT, he developed an interest in statistics as a college senior and has gone on to become a leading educator and blogger in the field. His work has focused particularly on American politics, including research on the ability to predict elections, the power of the individual voter, and the benefits of redistricting. He blogs at andrewgelman.com.
Lion Profiles: Andrew Gelman
In your blog, you regularly call out and discuss statistical misinterpretation and deception. What are some important statistical lies being propagated now?
The biggest lie, I think, is that certainty is easy to attain using routine methods. This is a lie that many people tell to themselves. As the saying goes, the first step in fooling others is to fool yourself.
You’ve described your successful projects as endeavors aimed at “big fat targets”, such as voting patterns and election incumbency. What targets interest you most now?
In political science I’ve lately been interested in studying polarization and the role of social groups. We’ve been thinking a lot about what we call the social penumbra, which is the set of people connected to a group. For example, the number of gays in America is about the same as the number of Muslims in America, but in surveys, a lot more people report having a close friend or family member who is gay than report the same of a Muslim. Two groups of approximately the same size have very different penumbras.
What larger statistical questions, in general, will emerge in coming years?
At one extreme, there’s been lots of difficult statistical work on integration of large streams of data, for problems ranging from internet marketing to self-driving cars. At the other extreme, lots of decisions are still being made based on whether a comparison is “statistically significant.” To consider one application area: there’s lots of talk about personalized medicine based on each person’s genome, but new medical therapies are still being evaluated using crude between-person experiments. How can this be? If we can barely come to a consensus about what works in medicine, or what the effects of different diets are, how can we hope to design individualized therapies? In many application areas, we need more local and relevant data and less reliance on statistical significance.