Just a quick post, as I've been without net, electricity and heat for the past week. I finally have the modern conveniences back and it happens to be election day. A few other posts should follow in the next week, as I had a lot of time to think.
Today votes are cast for the election of the President of the United States. Technically voters don't elect the president directly; they choose 538 electors.1 To be elected President a candidate needs 270 or more electoral votes.
There are dozens of polls all with some noise. Some are designed and executed better than others, some have built-in biases and most tend to move around during a campaign. The press tends to focus on them and take them as the odds that a candidate might win or lose the race.
Recently Nate Silver, author of the 538 blog, has been criticized for suggesting that President Obama currently has a better than 90% chance of being re-elected despite the likelihood that the election might be very close - possibly a one percent or less popular vote margin for the winner. This strikes some as entirely daft - how could the President have such a good chance in such a "tight" race?
It turns out this is a nice example of a statistical approach. There is no inconsistency with a close election and a relatively high probability of a certain result - in this case the result required for a win is the accumulation of 270 or more electoral votes.
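One way to see why a small lead and a high win probability can coexist: what matters is the size of the lead relative to the uncertainty in it, not the size of the lead alone. The sketch below assumes the true margin is normally distributed around the polled lead, which is a simplification and not Silver's model; the function name and the numbers are hypothetical.

```python
import math

def win_probability(lead_pts: float, std_error_pts: float) -> float:
    """P(true margin > 0), assuming true margin ~ Normal(lead, std_error).

    Both arguments are in percentage points. The normal assumption is
    an illustration only, not any forecaster's actual model.
    """
    z = lead_pts / std_error_pts
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

if __name__ == "__main__":
    # Same hypothetical 1-point lead, measured with shrinking uncertainty
    # (for example, as more polls are averaged together).
    for std_error in (3.0, 1.5, 0.75):
        p = win_probability(1.0, std_error)
        print(f"lead 1.0 pt, std error {std_error} pt -> P(win) = {p:.0%}")
```

The lead never changes in that loop; only the uncertainty does, and the probability of winning climbs with it.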
Rather than rely on one or a small number of national polls, the trick is to take a group of state polls and calculate the total probability that a candidate reaches 270 or more electoral votes across all the combinations of states that get there. Silver's model, like those of others who use this approach, makes assumptions about the past accuracy and predictive value of each individual poll.
There are many possibilities to sort through - about 2.25 quadrillion (2^51) if each of the 50 states and the District of Columbia is treated as a two-way, winner-take-all contest. No one tries to enumerate them all; some who use this meta-analysis of polls instead look at a few thousand possibilities, usually chosen to focus on the important "battleground" states.2
It should be noted that the various meta-analysis approaches are currently giving Obama anywhere between a 70% and 90% chance of keeping his job.
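A rough sketch of the sampling idea, in Python. Everything here is hypothetical: the electoral vote counts treated as safe, the list of battleground contests, and the per-state win probabilities are made-up inputs, and the states are assumed to be independent of each other, which real models do not assume.

```python
import random

# Treating each of the 50 states plus DC as a two-way contest gives
# 2**51 possible outcomes, far too many to enumerate one by one.
TOTAL_OUTCOMES = 2 ** 51

def simulate_win_probability(safe_ev, battlegrounds, trials=10_000,
                             seed=0, needed=270):
    """Estimate P(candidate reaches `needed` electoral votes) by sampling.

    safe_ev       -- electoral votes assumed certain for the candidate
    battlegrounds -- list of (electoral_votes, win_probability) pairs
    States are sampled independently (a simplifying assumption).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        total = safe_ev + sum(ev for ev, p in battlegrounds
                              if rng.random() < p)
        if total >= needed:
            wins += 1
    return wins / trials

if __name__ == "__main__":
    # Hypothetical inputs, not taken from any real poll.
    battlegrounds = [(29, 0.50), (18, 0.75), (13, 0.60), (10, 0.85),
                     (9, 0.65), (6, 0.80), (6, 0.85), (4, 0.75), (15, 0.25)]
    print(f"possible outcomes: {TOTAL_OUTCOMES:,}")
    print(simulate_win_probability(237, battlegrounds))
```

A few thousand samples are enough to get a stable estimate here because only the handful of uncertain states actually move the total.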
We tend to have a notion that the popular vote is important, but to first order it isn't. There is a weighting forced by the electoral college - the game that really matters - and districting.
It is easy to make some wrong assumptions about the real problem, add our own biases (and there are many with politics!) and slip into innumeracy. I'm reminded of a section from John Allen Paulos' book Innumeracy:
If from some stock market advisor you received in the mail for six weeks in a row correct predictions on a certain stock index and were asked to pay for the seventh such prediction, would you? Assume you really are interested in making an investment of some sort, and assume further that the question is being posed to you before the stock crash of October 19th, 1987. If you would be willing to pay for the seventh prediction (or even if you wouldn't), consider the following con game.
Some would-be advisor puts a logo on some fancy stationery and sends out 32,000 letters to potential investors in a stock index. The letters tell of his company's elaborate computer model, his financial expertise, and inside contacts. In 16,000 of these letters he predicts the index will rise, and in the other 16,000 he predicts a decline. No matter whether the index rises or falls, a followup letter is sent, but only to the 16,000 people who initially received a correct "prediction". To 8000 of them a rise is predicted for the next week, to the other 8000 a decline. Whatever happens now, 8000 people will have received two correct "predictions". Again to these 8000 people only, letters are sent concerning the index's performance the following week, 4000 predicting a rise, 4000 a decline. Whatever the outcome, 4000 people now have received three straight correct predictions.
This is iterated a few more times until 500 people have received six straight correct "predictions". These 500 people are now reminded of this and told that in order to continue to receive this valuable information for the seventh week they must each contribute $500. If they all pay, that's $250,000 for our advisor. If this is done knowingly and with intent to defraud, this is an illegal con game. Yet it's considered acceptable if it's done unknowingly by earnest, but ignorant publishers of stock newsletters, or by practitioners of quack medicine, or by television evangelists. There's always enough random success to justify almost anything to someone who wants to believe.
There is another quite different problem exemplified by these stock market forecasts and fanciful explanations of success. Since they're quite varied in format and often incomparable and very numerous, people can't act on all of them. The people who try their luck and don't fare well will generally be quiet about their experiences. But there'll always be some people who will do extremely well, and they will loudly swear to the efficacy of whatever system they've used. Other people will soon follow suit, and a fad will be born and thrive for a while despite its baselessness.
There is a strong, general tendency to filter out the bad and the failed and to focus on the good and the successful. Casinos encourage this tendency by making sure that every quarter that's won in a slot machine causes lights to blink and makes its own little tinkle in the metal tray. Seeing all the lights and hearing all the tinkles, it's not hard to get the impression that everyone's winning. Losses or failures are silent. The same applies to well-publicized stock market killings vs. relatively invisible stock market ruinations, and to the faith healer who takes credit for any accidental improvement, but will deny responsibility if, for example, he ministers to a blind man who then becomes lame.
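The arithmetic of Paulos' con game is easy to check. This small sketch just halves the mailing list each week, using the figures from the passage (32,000 letters, six weeks, a $500 fee); the function name is my own.

```python
def con_game(letters=32_000, weeks=6):
    """Number of recipients still holding a perfect record after each week.

    Each week half the remaining recipients are told "rise" and half
    "decline", so exactly half of them see a correct "prediction".
    """
    remaining = letters
    history = [remaining]
    for _ in range(weeks):
        remaining //= 2
        history.append(remaining)
    return history

if __name__ == "__main__":
    history = con_game()
    for week, count in enumerate(history):
        print(f"after week {week}: {count} recipients with a perfect record")
    fee = 500
    print(f"take if everyone pays: ${history[-1] * fee:,}")
```

No forecasting skill appears anywhere in that loop, which is the point.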
I'm a huge believer in basic statistics and probability courses in high school - they are at least as important as basic calculus for modern citizens who need to sort through the information available and not be so inclined to take predictions at face value. It is critically important to understand as much as possible about information and how it is filtered and manipulated. There is real danger in considering the thing some call "data" as pristine and reliable.
Back to the election. Many segments of the universe of voters are extremely stable and elections can be influenced by encouraging or suppressing some of them. Black and Hispanic voters tend to vote for Democrats, so elections with poor turnout can pivot favorably for Republicans and the urge for that party to suppress votes has to be strong. On the other hand the Democrats would want to focus on voter registration and get-out-the-vote efforts in districts where a strong turnout can make a difference. The same can be said about fundamentalist Christians voting for Republicans, and there are a huge number of segments of varying degrees of strength and reliability. All of this gives enormous opportunities to datamine and produce lists so these potential voters can be reached and potentially influenced.
It will be interesting to see how the meta-analysis folks make out this time. They usually do better than those who analyze (and mostly over-analyze) popular vote polls.
1 The number is based on the membership of the US Congress: 435 House members, 100 Senators, and 3 electors from the District of Columbia.
2 There are different approaches. The Princeton Election Consortium uses a technique that looks at the probability of getting an exact number of electoral votes. It turns out you can do this by calculating a relatively simple polynomial expression. For a deeper look at a daily prediction, see the Princeton group's site.
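One reading of the polynomial idea in footnote 2, sketched in Python: give each state the factor (1 - p) + p*x^EV, where p is the candidate's chance of winning the state and EV is its electoral votes, and multiply the factors together. The coefficient of x^k in the product is then the probability of exactly k electoral votes. This assumes the states are independent and is not a copy of the Princeton group's code; the function names and the inputs are hypothetical.

```python
def ev_distribution(states):
    """Coefficients of the product of ((1 - p) + p * x**ev) over all states.

    states -- list of (electoral_votes, win_probability) pairs
    Returns a list where entry k is P(exactly k electoral votes),
    assuming the states are independent.
    """
    dist = [1.0]  # the polynomial "1": zero votes with certainty
    for ev, p in states:
        new = [0.0] * (len(dist) + ev)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)   # candidate loses this state
            new[k + ev] += prob * p      # candidate wins this state
        dist = new
    return dist

def prob_at_least(states, needed=270):
    """P(candidate wins `needed` or more electoral votes)."""
    return sum(ev_distribution(states)[needed:])

if __name__ == "__main__":
    # Hypothetical toy map: 237 safe votes plus three uncertain states.
    states = [(237, 1.0), (29, 0.5), (18, 0.75), (13, 0.6)]
    print(prob_at_least(states))
```

Unlike sampling, this covers every combination exactly, and the work grows with the number of states times the number of electoral votes instead of with 2^51.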