This is cross-posted on democraticSPACE.
There are a lot of published opinion polls out there: I count
41 since September 8. The trouble is that many of them contradict each
other – just what are we supposed to make of all this? Let's haul out
our introductory statistics textbooks and take a closer look.
Firstly, let's get an idea of the sort of variation we should
expect to see – that is, the inherent uncertainty in drawing random
samples. Ekos, Nanos and Harris-Decima are all publishing daily
tracking polls, with announced 95% margins of error of 1.6%, 3.1%
and 2.6%, respectively. If sampling error were the only source of
uncertainty, what would a typical deviation between Nanos and
Harris-Decima look like? (I'm choosing those two because they have the
larger announced MOEs; what follows will be an upper bound for the
variations we should expect.) The answer is not 3.1 + 2.6 = 5.7%.
If we play around a bit with the arithmetic of confidence intervals,
the standard deviations for the Nanos and Harris-Decima estimates are
1.58 and 1.33, respectively. If Nanos and Harris-Decima are operating
independently, the variances of their estimates add, so the standard
deviation for the difference between the two estimates is (1.58² + 1.33²)^(1/2) = 2.06. From this, the half-width of the 95% confidence interval is 1.96*2.06 = 4.05.
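Here's a quick sketch of that arithmetic in Python; the 1.96 factor is the usual normal-approximation conversion between a 95% margin of error and a standard deviation:

    import math

    # Announced 95% margins of error (CI half-widths), in percentage points
    moe_nanos = 3.1
    moe_harris_decima = 2.6

    # Work back from margin of error to standard deviation: MOE = 1.96 * sd
    sd_nanos = moe_nanos / 1.96                  # ~1.58
    sd_hd = moe_harris_decima / 1.96             # ~1.33

    # For independent estimates the variances add, so the gap between
    # the two polls has standard deviation sqrt(sd1^2 + sd2^2)
    sd_diff = math.sqrt(sd_nanos**2 + sd_hd**2)  # ~2.06

    # Half-width of the 95% confidence interval for the gap
    print(f"95% half-width: {1.96 * sd_diff:.2f}")  # ~4.05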
If sampling error were the only thing to worry about, the chances of
seeing a gap of more than four percentage points are 1 in 20. It also
follows that the probability of observing a discrepancy of 5 or more
points is 1 in 65, and the odds of seeing a gap of 6 points or more are 274:1
against.
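Those tail probabilities are easy to check with the same normal approximation; here's a sketch (it also covers the 8-point discrepancies that come up below):

    import math

    def normal_cdf(z):
        # Standard normal CDF, via the error function
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    sd_diff = 2.06  # standard deviation of the gap between two polls, from above

    for gap in (4, 5, 6, 8):
        # Two-tailed probability of a discrepancy at least this large
        p = 2.0 * (1.0 - normal_cdf(gap / sd_diff))
        print(f"{gap}-point gap: p = {p:.5f}, about 1 in {1 / p:,.0f}")

The small differences between these figures and the ones quoted in the text come down to rounding in the intermediate steps.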
So much for the theory; what are the variations we're actually
seeing? First up is the support for the Conservative party; click the
image for a full-sized version. In all the graphs that follow, the
vertical axes use the same scale, so that a given difference in poll
results appears as the same distance in every graph.
So far, so good: it is almost always the case that the polling firms
differ by 3 percentage points or fewer. Indeed, there have been many
days where two or three polling firms come up with the same estimate.
It's also interesting to note that of the 26 polls published since
September 15, only two are outside the range of 35-38%.
Now the New Democrats; again, click the image for a larger version.
The range of variation is greater than it was for the
Conservatives, but at least some of that can be attributed to the fact
that NDP support appears to have been growing during this period. Aside
from Ekos – which started with high initial support – all the polling
firms show a discernible upward trend.
For the Conservative and NDP numbers, the variation across polling
firms is pretty much in line with what we'd expect. But things start to
break down when we look at the Liberals; click on the graph for a
full-size version:
In contrast to the fairly tight variation we've seen so far, the
gaps between the reported estimates for Liberal support are implausibly
large: discrepancies of 8 points – in theory, an event that should
occur 1 time in 10,000 – are not uncommon. Something is seriously wrong
here.
The same problem shows up for the Greens; again, click the image for a larger version:
Discrepancies of 5 or 6 percentage points should be rare (especially
for the Greens), but we're seeing them pretty much every day.
It seems pretty clear that Nanos is the outlier here: their numbers
for the Liberals are consistently higher, and their estimates for the
Greens are consistently lower than what the other firms are reporting.
It seems that there is a significant chunk of the population –
something around 4% – that is telling Nanos that they'll vote Liberal,
but telling everyone else that they will vote Green. Why? And what does
this mean for what will happen on October 14?
It's not too hard to come up with a plausible answer to the first question:
Mr. Nanos says the key difference is methodology.
Unlike other polling firms, his asks open-ended questions on voter
intention. Instead of offering a list of choices – "Would you vote a)
Conservative, b) Liberal …" – Nanos phone operators ask an
open-ended question that requires respondents to come up with their own
answers instead of multiple choice.

"If they don't get the list, you get the cleanest
read because they have to articulate their support," Mr. Nanos said.
The open-ended question eliminates the importance of the order in which
the parties are listed, although most companies vary the order to
mitigate this factor.

Also, the open-ended method tends to put the Greens
lower than other parties because, Mr. Nanos believes, respondents are
not reminded of the party when they answer. Some will choose the Greens
as a none-of-the-above if they hear the party name on a list before
answering.
But it's much less easy to figure out what this segment of the
population will actually do on election day. Will they indeed vote
Liberal? Will they be reminded of the alternatives when they get to the
voting booth (the party names are listed on the ballot) and vote Green?
Or will they simply not vote?

Interesting analysis. The methodology is clearly a big factor in these numbers.
You may want to have a look at a couple of posts on the Green Party website by myself and Jim Harris.
http://www.greenparty.ca/en/node/7289
Thanks,
Jim
I thought the 2.06 should be divided by the square root of two, even if you assume that the populations of the two samples are the same. I'm coming at that from the t-test and could be completely wrong though. Am I?