I may overuse the parable about the blind men describing the elephant when talking about polling and pollsters, and why we aggregate … but that’s because it’s a great metaphor for what’s going on. Think of the electorate as the elephant, and the pollsters as the men trying to describe the elephant, one small sample at a time, except one pollster has his hand on its trunk, one on its tusk, one on its tail, and so on. Individually, their stories seem to contradict one another, and might sound bizarre taken on their own. (“The electorate is sharp and pointy! No, the electorate is long and bristly!”)
That’s never been clearer than this week, when The Upshot on the New York Times website commissioned a poll from Siena, and then gave the raw data to four other pollsters to see what final results they would arrive at. Remarkably, the results ranged from Clinton +4 to Trump +1. Mind you, this wasn’t four different samples; this was four different interpretations of the same sample, just using different judgments about how to determine likely voters and how to weight the different demographic subsamples! This really reaffirmed that there’s no one right poll, only a range of acceptable answers.
What the men describing the elephant really needed was someone off to the side, jotting down all the different descriptions and trying to assemble them into a complete picture of an elephant. If you get enough of the conflicting samples in one place, and try to put them in an order that makes sense, you might actually get a clearer sense of what everyone is fumbling around with. That’s where poll aggregation comes in! It gives you more data points to flesh out that “range of acceptable answers.”
(I’m not using Sam Wang’s much grosser metaphor: “Think of each poll as being a mystery body part; what PEC provides is the head cheese.”)
Unfortunately, even the job of being the guy on the sidelines compiling all the reports of tusks and trunks doesn’t come with its own objective set of rules. Do we listen to every description equally, or do we give extra weight to the guys who already have a decent track record of successfully describing animals? Do we add in some historical precedent, like whether there have been other elephant sightings in recent years? Do we try and make small corrections to the descriptions when we’re pretty sure it’s an elephant, but some of the describers keep insisting they feel fur? Do we outright discard the reports from the describer who seems clearly drunk (we’ll just call that guy “Emerson”)?
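To make those judgment calls concrete, here is a minimal sketch in Python of one way an aggregator could encode them. Everything in it is an assumption for illustration: the `Poll` fields, the `aggregate` function, the ratings, the house effects, and the made-up pollsters and margins. It is not how any real aggregator does it, just a weighted average with four knobs: track-record weights, a correction for a known lean, an optional historical prior, and a list of describers to ignore.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    pollster: str
    margin: float               # candidate A minus candidate B, in points
    rating: float               # hypothetical track-record weight (higher = more trusted)
    house_effect: float = 0.0   # hypothetical known lean, subtracted before averaging


def aggregate(polls, prior_margin=None, prior_weight=0.0, excluded=()):
    """Weighted average of house-effect-corrected margins, with an optional prior."""
    total = 0.0
    weight_sum = 0.0
    for p in polls:
        if p.pollster in excluded:
            continue  # outright discard this describer
        total += p.rating * (p.margin - p.house_effect)
        weight_sum += p.rating
    if prior_margin is not None and prior_weight > 0:
        # Treat historical precedent as one more "poll" with its own weight.
        total += prior_weight * prior_margin
        weight_sum += prior_weight
    if weight_sum == 0:
        raise ValueError("no polls or prior to aggregate")
    return total / weight_sum


# Made-up numbers, purely for illustration.
polls = [
    Poll("Pollster A", 4.0, rating=1.0),
    Poll("Pollster B", -1.0, rating=0.5),
    Poll("Pollster C", 2.0, rating=1.0, house_effect=1.0),
    Poll("Pollster D", 9.0, rating=0.25),
]

everyone = aggregate(polls)
without_d = aggregate(polls, excluded={"Pollster D"})
with_prior = aggregate(polls, prior_margin=0.0, prior_weight=2.5,
                       excluded={"Pollster D"})
```

Each of the three calls at the bottom gives a different answer from the same four polls, which is the whole point: every knob is a defensible choice, and turning any of them moves the final number.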