A Technical Analysis Of Exit Polls In Venezuela

Much of the controversy about the recall referendum results in Venezuela centers around exit polls sponsored by the opposition showing diametrically opposite results (see previous post).  

According to the Miami Herald,

Experts in elections questioned the reliability of exit polls.  ''Exit polls are not to be trusted,'' said Horacio Boneo, who has monitored elections in more than 60 countries. Boneo says it is not unusual for voters to be untruthful.  ''Sometimes, people will tell the person doing the poll what they think the person wants to hear,'' he said.

According to the Chicago Tribune,

Alejandro Plaz, president of Sumate, which has worked closely with opposition groups to observe the referendum process, said a quick count by his organization also showed Chavez winning Sunday's referendum.

At the same time, Plaz said a nationwide exit poll commissioned by Sumate of 20,000 voters showed the opposition winning 59 percent of the vote.

"Our results have given us a doubt the machines produced an outcome that reflects the will of the people," Plaz said.

But Carter, a veteran election monitor, discounted the Sumate and other exit polls, which he explained are "quite unreliable" in predicting electoral outcomes.

"There is a high chance even in the best of circumstances that exit polls are biased," Carter said Monday.

The brief pronouncement by Jimmy Carter does not do justice to the technical reasons behind that statement.  I would have liked to present a technical analysis of the Sumate exit polls, but Sumate has not disclosed its methodology.  Venezuelanalysis.com suggested that the Sumate exit polls are biased because the interviewers did not go out to the lower-class barrios where the Chavista supporters live.  This could be determined easily if Sumate disclosed its sampling points (technical point: Sumate interviewed 20,000 voters and there are 8,500 voting stations across the country, so they must have selected a sample), the polling results at those points, and the method by which they aggregated those results into the grand average.

In lieu of the Sumate exit polls, I will analyze an exit poll conducted by a reputable North American company, Penn, Schoen and Berland.  The PS&B exit poll results were published in a press release shown below.  My technical analysis will cover points that apply to any exit poll.

[Post-script: It would turn out that the Sumate and PS&B polls are one and the same, but this does not affect the technical discussion.]


New York, August 15, 2004, 7:30pm EST - With Venezuela's voting set to end at 8:00pm EST according to election officials, final exit poll results from Penn, Schoen & Berland Associates, an independent New York-based polling firm, show a major victory for the "Yes" movement, defeating Chavez in the Venezuelan presidential recall referendum.

With more than 8 million Venezuelans having cast their ballots so far, the results of a national exit poll show that Chavez has been ousted by referendum.

The Penn, Schoen & Berland Associates exit poll shows 59% in favor of recalling Chavez (the "Si" or "Yes", anti-Chavez vote) and 41% against recalling Chavez (the "No", pro-Chavez vote).

The poll results referred to in this release are based on an exit poll just concluded in Venezuela.

This is a national exit poll conducted in 267 voting centers throughout the country. The centers were selected to be broadly representative of the national electorate in regional and demographic terms.

In these centers, 20,382 voters were interviewed. Voters were selected at random but according to a strict demographic breakdown by age and gender to ensure a representative mix reflective of the national electorate. Those voters who were randomly selected to participate in this exit poll were asked to indicate only their vote ("Si" - for "Yes" - or "No") on a small ballot which they could then personally drop into a large envelope in order to maintain secrecy and anonymity. Data was sent by exit poll workers to a central facility in Caracas, Venezuela for processing and verification.

The margin of error for these final exit poll results referred to in this release is under +/-1%.

As a bit of background, I am a survey statistician by profession and I have conducted survey studies in Venezuela since 1994.  My technical analysis will focus on those aspects that are common to all survey samples.  I do not have all the detailed information, so I can only suggest the important aspects to look at.

Quality Control

This exit poll is conducted under the auspices of Penn, Schoen & Berland, which has three offices in the United States and none in Venezuela.  It is not expected that PS&B would be able to import several hundred of its own interviewers, who would all need to speak fluent Venezuelan-accented Spanish, into this country for this one-day-only study.  So this project must have been sub-contracted to one or more local field suppliers.

The sub-contracting arrangement is standard.  My own survey studies were always sub-contracted.  However, I recognize that my goals (that is, to obtain the most accurate survey estimates within the budget) are not identical to the goals of my suppliers (that is, to appear to complete the work as quickly and hassle-free as possible, which may involve taking unauthorized short-cuts), and the goals of my suppliers do not match the goals of their workers (that is, to appear to complete their personal tasks as quickly and hassle-free as possible; sometimes, this might involve filling out surveys themselves).  That is why I had to devise all sorts of procedural and data checks to make sure that the work was executed correctly.

In summary, the brand name Penn, Schoen & Berland does not automatically guarantee a high quality in the exit poll.  I will emphasize that I have no evidence that the work was of poor quality.  I will repeat: I have no evidence that the work was of poor quality.  But you should not take it on account of the PS&B brand name that it is perfect, either.

Margin of Error

The press release ends with this sentence: "The margin of error for these final exit poll results referred to in this release is under +/-1%."  How did this number come about?

Here it is necessary to introduce some statistical terminology.  The margin of error in a survey is related to sampling error.  A sample is a subset of a population.  When you draw a sample, you get a certain estimate from this sample (e.g. 60%); if you now go and draw another sample from this population, you will get a different estimate (e.g. 59%).  So the margin of error reflects how widely these sample estimates can differ.

Among other things, the margin of error depends on the sample size.  If I draw a sample of only one person from the total population for an exit poll, I would expect a large margin of error (basically, the margin of error is going to be +/-100% and the estimate is useless).  If I draw a sample of seven million people from the 10 million people who actually voted, I would have a very tiny margin of error because I have covered most of the voting population.  In practice, there are limitations on resource availability, and PS&B reports having interviewed 20,382 voters.

If these 20,382 voters were a simple random sample from the 10+ million voters, then the margin of error is calculated as follows:

Let P be the percentage of persons who chose SI (=59%)
Let N be the sample size (=20,382)

Then the margin of error = 2 x SQRT [Px(100-P)/N] = 0.7% rounded up to 1%

The literal interpretation is that there is a 95% chance that the actual percentage of SI's is between 58% and 60%.
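
The arithmetic above can be checked with a few lines of Python.  This is only a sketch of the simple-random-sample formula quoted in the text; the function name `srs_margin_of_error` is my own label, the multiplier of 2 is the rounded 95% value used above, and the sample size is the 20,382 interviews stated in the press release.

```python
import math


def srs_margin_of_error(p_percent, n, z=2.0):
    """Margin of error, in percentage points, for a simple random sample.

    p_percent: estimated percentage (0-100)
    n:         sample size
    z:         confidence multiplier (2 is the rounded 95% value used in the text)
    """
    return z * math.sqrt(p_percent * (100.0 - p_percent) / n)


moe = srs_margin_of_error(59.0, 20382)
print(f"margin of error: +/-{moe:.1f} points")
print(f"interval: {59.0 - moe:.1f}% to {59.0 + moe:.1f}%")
```

The unrounded value is about 0.7 points, which is what the press release rounds to "under +/-1%".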

So far so good?  Unfortunately, this formula is only good if this is a simple random sample, and this sample definitely is not.  Conceptually, a simple random sample can be drawn by putting the identity numbers of the 10 million actual voters in a box.  You stir, shake and mix the numbers up and then you select 20,382 of them at random.  This is physically not possible because you don't have such a list and you must interview the people when they exit the voting center.

Instead, PS&B says that the exit polls were conducted at "267 voting centers throughout the country. The centers were selected to be broadly representative of the national electorate in regional and demographic terms."  This is a two-stage sample, also known as a cluster sample, and it incurs an additional source of sampling error.

On one hand, if I told you that I intend to cover all 8,500 voting centers, then there is no margin of error involved related to any sampling of voting centers.  Things are fine.

On the other hand, suppose I told you that I intend to select just one voting center in the country and conduct my exit poll there.  You should be deeply disturbed at the margin of error involved.  If that one voting center happens to be in upper-class El Hatillo, the SI vote may be 95%; if it happens to be in Petare, the SI vote may be 10%.  The margin of error from the first stage of selecting a voting center is so big as to make any subsequent survey work useless.  I might as well not bother.

Within the limits of its resources, PS&B opted to visit 267 out of the 8,500 voting centers.  This introduces a sampling error component that is absent from the +/-1% margin of error.  A correct margin of error would require PS&B to go back and apply the formula appropriate for a two-stage cluster sample.  It will lead to an increase in the margin of error, but I don't have the information to tell you what it might be.

Let me give you an extreme illustration.  Suppose that the country is so geographically polarized that all voters at a voting center will vote the same way (i.e. either 100% or 0% for SI).  At each of the 267 sampled voting centers, you will only have to ask one voter what his/her preference is.  All other voters at that voting center will have the identical response.  That being the case, it wouldn't matter if you interviewed 267 persons (one per voting center), or 20,382 persons or 500,000 persons.  Your sample will behave as if the sample size was only 267 and the margin of error will be +/- 2 x SQRT [59x41/267] = +/-6%.  PS&B's margin of error should be higher than +/-1% but not as much as +/-6%.
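
The two bounds in this illustration can be connected with the standard Kish design-effect approximation, in which the effective sample size is n / (1 + (m - 1) x rho), where m is the average number of interviews per voting center and rho is the intra-center correlation of the vote.  The sketch below is mine, not PS&B's calculation; the rho values are purely illustrative assumptions, since the real value is unknown, and the function names are hypothetical.

```python
import math


def margin_of_error(p_percent, n_effective, z=2.0):
    """Margin of error in percentage points for a given effective sample size."""
    return z * math.sqrt(p_percent * (100.0 - p_percent) / n_effective)


def effective_sample_size(n_total, n_clusters, rho):
    """Kish approximation: n_eff = n / (1 + (m - 1) * rho), m = average cluster size."""
    m = n_total / n_clusters
    design_effect = 1.0 + (m - 1.0) * rho
    return n_total / design_effect


N_INTERVIEWS = 20382  # from the press release
N_CENTERS = 267       # from the press release

# rho = 0 reproduces the simple random sample; rho = 1 is the fully
# polarized extreme described above. The values in between are made up.
for rho in (0.0, 0.05, 0.2, 1.0):
    n_eff = effective_sample_size(N_INTERVIEWS, N_CENTERS, rho)
    moe = margin_of_error(59.0, n_eff)
    print(f"rho={rho:<4}  effective n={n_eff:8.0f}  margin of error=+/-{moe:.1f}")
```

With rho = 1 the effective sample size collapses to the 267 voting centers and the margin of error is about 6 points; with rho = 0 it is the 0.7 points of the simple random sample.  Any real value of rho puts the answer somewhere in between.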

Response Rates

PS&B claimed to have completed 20,382 exit interviews at 267 voting centers.  On the average, they interviewed 20,382 / 267 = 76 persons per voting center.  This is enough to arouse suspicion about how the work was conducted.

The recall referendum was scheduled to be held between 6am and 4pm on Sunday, August 15, 2004.  In practice, due to the long lines, some voting centers stayed open until as late as 2 am in order to accommodate all voters.

What is the typical field worker's task assignment for the day?  He/she is told to proceed to a certain voting center in the country, which may be remote, and then told to ask voters how they voted as they leave.  This will take all of 5 seconds to do.  Over the 10 hours that were officially scheduled, plus whatever additional hours, they somehow only managed to complete 76 interviews per voting center.  The average number of voters per voting center is about 10,000,000 / 8,500 = 1,176, of which 76 (about 6.5%) were interviewed.  The productivity is incredibly low.  Here are some possible explanations, and I don't know which ones are true:

Size Measures

Not all voting centers have the same number of registered voters eligible to vote there.  The CNE recognized that lower-class areas have four times as many registered voters per voting center than upper-class areas.  In principle, they should have set up more voting centers.  However, they did not want to create confusion among people about where to go, and so the voting centers remained the same as before.

Imagine the following scenarios.  In voting center #1, the percentage of SI was 90% from 100 exit interviews.  In voting center #2, the percentage of SI was 10% from 100 exit interviews.  What is the average percentage of SI?

If the two voting centers had equal number of actual voters, the average percentage would be a straight average = (10% + 90%)/2 = 50%.

But if voting center #1 has 1,000 actual voters and voting center #2 has 4,000 actual voters, the average percentage would be (0.90 x 1,000 + 0.10 x 4,000) / (1,000 + 4,000) = (900 + 400) / (5,000) = 26%.

So you cannot just take the percentages from the 267 voting centers and take a straight arithmetic average.  That answer would be biased towards the voting centers with fewer actual voters (to wit, the upper-class areas) and away from those with more actual voters (to wit, the lower-class areas).  Thus, the percentage of SI's would be overstated.  PS&B's press release does not say what they did.
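
The two-center example can be written out as a small Python sketch.  The function names and the `(percent_si, actual_voters)` layout are my own choices for illustration; nothing here is taken from PS&B's actual processing.

```python
def straight_average(centers):
    """Unweighted mean of the per-center SI percentages."""
    return sum(pct for pct, _ in centers) / len(centers)


def weighted_percentage(centers):
    """SI percentage weighted by the number of actual voters at each center."""
    total_voters = sum(voters for _, voters in centers)
    return sum(pct * voters for pct, voters in centers) / total_voters


# (percent SI in the exit interviews, actual voters at the center)
centers = [(90.0, 1000), (10.0, 4000)]

print(straight_average(centers))     # 50.0
print(weighted_percentage(centers))  # 26.0
```

The same function applies unchanged to 267 centers, provided the number of actual voters at each one is known.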

As the example above shows, the bias can be corrected by weighting.  This turns out to be an exceptionally difficult problem on this day.

The number of registered eligible voters at each voting center is known.  But the relevant number is the number of actual voters on Sunday, August 15, 2004.  This is not the same thing at all, because the abstention rates differ across voting centers.  In a country like the United States, there are extensive databases of voter turnout over time, so that it is possible to estimate the number of actual voters at the voting center level with reasonable accuracy.

When Hugo Chávez was elected President, 6 million people voted in total, of which 3.8 million voted for him.  The recall referendum this time was said to be unprecedented in the history of Venezuela because more than 10 million out of 14 million registered voters came out to cast their votes.  And this was not a presidential election with multiple candidates; this was the culmination of years of political struggle.  In other words, any historical data about voter turnout in Venezuela are inoperative for this day.  When the voter turnout rates are eventually published, PS&B will be in a position to calculate how their numbers would have come out by re-weighting their data accordingly.

The above discussion covered a list of technical issues which lie behind Jimmy Carter's brief statement about the unreliability of exit polls.  If exit polls were really that accurate, we wouldn't need to count the votes.

From Vheadline.com

Sumate directors confirm that they employed the exit polls system on Sunday to analyze results and came up with the conclusion that the  government's NO vote totaled 40.6% against the opposition's YES vote of 59.4%.

The polls were undertaken three times on Sunday (9:00 a.m., 5:00 p.m., and 11:00 p.m.).

While admitting that exit polls aren't perfect, Plaz and Machado contend that the differences between their exit polls and National Elections Council (CNE) figures is so great that they want an investigation.  They agree with Carter and Gaviria regarding the results, but the doubt lies in which side got the highest percentage.

The following questions are raised from this very sparse information:

Additional information about the Sumate exit poll was published on their website:

However, to certify this result, Palacios stated that they decided to conduct exit polls at 300 centers, which is a fairly large and far superior sample. For this they received advice from experts at national universities and international organizations on the procedure for selecting suitable centers, deciding to choose by random techniques 19 in Distrito Capital, 1 in Amazonas, 13 in Anzoátegui, 5 in Apure, 18 in Aragua, 9 in Barinas, 9 in Bolívar, 28 in Carabobo, 1 in Cojedes, 13 in Falcón, 7 in Guárico, 11 in Lara. These centers reflect the sentiment of the universe of Venezuela: leaning toward Sí and toward No, rural, urban, large and small.

Palacios indicated that they recruited and trained the interviewers for more than a month to make sure that they applied the proper emotional-intelligence techniques so that citizens would respond appropriately, in order to guarantee confidentiality. 267 surveys were carried out over the whole day, from 9 in the morning until 5 in the afternoon, plus an additional one at 11 at night, obtaining 12,097 responses in favor of Sí and 8,285 responses in favor of No, which reflects a figure of 59.4 in favor of Sí. Palacios said that from the early hours of the morning, from 7:00 to 9:00 a.m., the figure in favor of Sí was 56 percent, at 11 in the morning it was 59 percent, and at 1:00 p.m. 61 percent, observing a clear trend indicating that as the day went on the proportion of citizens voting Sí was greater.

This information gave the opposition sectors high confidence that the final result of the tally sheets would be favorable to the people supporting Sí. However, Palacios said, at night when the results began to arrive, that significant deviation from the poll results appeared. For that reason they decided to go into detail at those centers where the poll was conducted, looking for the tally sheet (acta de totalización) and managing to find some significant deviations. As an example, Palacios gave voting center Nº 1.323, facing Plaza Concordia, in Distrito Capital, with 2,734 voters registered as of July 2004, where the tally sheet received by telephone indicates that 905 citizens voted in favor of Sí (53%) and 800 in favor of No; however, of the 80 exit poll interviews that were conducted, 68 supported the Sí option, which represented 85 percent. This means that between the machine results and the polls there is a difference of more than 32 percentage points, which is higher than statistics indicates for this type of procedure. They also reviewed the people who signed at that center to request the presidential recall, finding that more than 1,075 come from that voting center; however, in the tally sheet only 905 people appear as voting for Sí, which yields a difference of 170 citizens, which is very unlikely.

I can say the following:

The following news report came just after the above entry was posted.

(Associated Press via Yahoo! News)  U.S. Poll Firm in Hot Water in Venezuela.  By Andrew Selsky.  August 19, 2004.

A U.S. pollster whose firm wrongly predicted President Hugo Chavez would lose a recall referendum on Thursday defended the exit poll, which has landed in the center of a national controversy.

The poll by Penn, Schoen & Berland Associates has become such a hot issue because the opposition, which spent more than a year mounting the drive to force Chavez from office, insists it shows the results from Sunday's referendum itself were fraudulent.

Former President Carter and the secretary general of the Organization of American States, Cesar Gaviria, both monitored the vote and endorsed the referendum results.

The exit poll, released 4 1/2 hours before voting stations closed, said 59 percent would vote Chavez out of office. But in fact, the opposite was true — Chavez ended up trouncing his enemies and capturing 59 percent of the vote.

Pollster Doug Schoen said his firm has been involved in polling for years and recently correctly called elections in the Dominican Republic and Mexico.  "We've done this all over the world," Schoen said in a telephone interview. "To be off by 34 points as we are alleged to be, strains credulity — there was no real independent verification of the electronic count. There was almost certainly fraud in the central counting process," he said.

The opposition also claims electronic voting machines were rigged, but has provided no conclusive evidence.

Carter and Gaviria, both experienced election monitors, have said their independent sampling of results conformed with the official results.

Critics of the exit poll have questioned how it was conducted because Penn, Schoen & Berland worked with a U.S.-funded Venezuela group that the Chavez government considers to be sided with the opposition.

The firm had members of Sumate, a Venezuelan group that helped organize the recall initiative, do the fieldwork for the poll, election observers said.  Schoen said his firm "worked with a wide variety of volunteers that were provided by Sumate" but that they "were trained to administer the poll."

Venezuelan Minister of Communications Jesse Chacon said it was a mistake for Sumate to be involved because it might have skewed the results of the poll.  "If you use an activist as a pollster, he will eventually begin to act like an activist," Chacon told The Associated Press.

Roberto Abdul, a Sumate official, said the nonprofit organization received a $53,400 grant from the National Endowment for Democracy, which in turn receives funds from the U.S. Congress but did not use any of those funds to pay for the exit polling.

The issue is potentially explosive because even before the referendum, Chavez himself cited Washington's funding of Sumate as evidence that the Bush administration was financing efforts to oust him — an allegation U.S. officials deny.

Chris Sabatini, senior program officer for the National Endowment for Democracy, defended Sumate as "independent and impartial."  "Exit polls are notoriously unreliable," Sabatini said by telephone from Washington. "Just because they're off doesn't mean that the group that conducted them is partial to one side."

I had gotten the impression that Sumate conducted an exit poll which was confirmed exactly by another done by a US firm, Penn, Schoen & Berland.  If you go back to the above descriptions of the two polls: PS&B's press release said that they sampled 267 voting centers, while the Sumate website said that they sampled 300 voting centers.  Since 267 is not the same as 300, I thought that they must be different (although I should have noted that they both had exactly 20,382 interviews).  So someone changed the parameters to create the illusion that they were different studies that confirmed each other.

Most of the information coming from the AP article concerns political issues, as discussed in Penn & Schoen's Inaccurate and Dishonest "Exit Poll" on Chávez Vote by Al Giordano at NarcoNews.  From the technical perspective, it is now established that PS&B had little or nothing to do with the field operations themselves.  They did not sub-contract a professional and neutral survey company.  Instead, they relied on non-professional Sumate volunteers to conduct the interviewing.  When an interviewer is a volunteer rather than a professional, it is easy to imagine greater reluctance to show up and work in a chavista neighborhood.

There is also a discrepancy between PS&B methodology and the Sumate practice.  PS&B claimed the following:

Voters were selected at random but according to a strict demographic breakdown by age and gender to ensure a representative mix reflective of the national electorate. Those voters who were randomly selected to participate in this exit poll were asked to indicate only their vote ("Si" - for "Yes" - or "No") on a small ballot which they could then personally drop into a large envelope in order to maintain secrecy and anonymity. Data was sent by exit poll workers to a central facility in Caracas, Venezuela for processing and verification.

Since the PS&B press release went out on the evening of August 15, it is impossible to see how 20,382 envelopes could have been shipped from all over the country to the central facility in Caracas for processing and verification in time for tabulation.  In fact, Sumate also said: "The polls were undertaken three times on Sunday (9:00 a.m., 5:00 p.m., and 11:00 p.m.)" which is impossible since the PS&B press release went out at 7:30pm.  PS&B could not have done what they described, and a more practically possible methodology is the one observed in the field by Josh Gindin at Venezuelanalysis.com:

According to Súmate, there are forty-five thousand of these volunteers all over the country—at least one at every single voting station, and at those voting stations deemed more important, there are as many as twenty.

Altamira, apparently, is one such location. Twenty conscripts stand around outside the voting center, clipboard in hand waiting for unsuspecting citizens to emerge, fresh from having voted.  “Good afternoon,” they purr, “would you mind telling us if you voted ‘Yes’ or ‘No’?” and “Yes, yes, yes,” is the most common response.

“How many ‘No’ votes have you received?” I asked, playing the naïve reporter.

“Let’s see,” she offered, tapping her tennis shoes, “there are no ‘Nos’ on this page, and one on this page. I have one ‘No’.”

“Just one?” I persisted.

“Well, I don’t know about the others, but I have just one,” she answered, then, spotting some emerging voters in the distance, she scampered off to collect more “Yeses.”

Where was the evidence of randomly selecting respondents according to a strict demographic breakdown by age and gender?  They were grabbing everyone that they could!  Where were those large envelopes in which small ballots were dropped?  All the responses were just written down on listing sheets that were counted and called in!  This is an unprofessional operation gone haywire, but this is one way to get the results out by early evening.  PS&B was describing the operation that they would have liked to run, but the ground crew went in another direction.  PS&B also went along with the process, because their press release was issued from Washington DC at 7:30pm on August 15, 2004 and they had to know that the methodology they wrote down in the press release could not have been implemented as stated.

(Venezuelanalysis.com)  The Statistical Fraud of Venezuela’s Opposition.  By Gregory Wilpert.  August 21, 2004.

First, with regard to the exit polls, apparently the opposition had organized about five exit polls. Perhaps the most authoritative of these polls was one that was supposedly conducted by Penn, Schoen and Berland, a market research company. According to their press release, they conducted the poll in 267 voting centers and surveyed 20,382 voters immediately after they cast their vote on August 15. Their exit poll indicated that 59% of those surveyed voted “Si” or “yes,” in favor of recalling Chavez, and 41% voted “no,” against recalling Chavez. The margin of error was +/- 1%.

When the actual results turned out to be exactly the opposite, the opposition said that based on its exit polls (not on the basis of anyone else’s), the vote must have been a massive fraud, in effect stealing about 2 million “si” votes and turning them into “no” votes – no doubt one of the largest electoral frauds in world history of otherwise free elections.

Such a comparison of exit poll results with voting results should beg the question of which is a more reliable measure of voter will: an untransparent exit poll, conducted by an interested party, or an extremely closely monitored vote, with disinterested international observers and rules that both sides agreed upon? Nonetheless, Venezuela’s opposition insists that the exit polls are the valid measure, enough so to call the vote a fraud.

Perhaps instead of focusing on how a fraud could have been possible (for which they have no clue), the opposition would be better advised to carefully examine its own exit polls. Several possible (charitable) explanations exist.  Jesse Chacon, the Minister of Communication and Information, suggested that the reason for the false results had to do with the selection of the people who conducted the polls, who were provided by Sumate, the US-funded organization that provided much of the opposition’s logistical support for bringing the referendum about. That is, by choosing volunteers who are sympathizers of the opposition, the polls would inevitably present a bias towards the opposition, especially given the clear class difference in Venezuelan voting patterns and people’s general preference to approach people who are more likely to think, dress, and act like them. Another reason Chacon mentioned for the erroneous exit polling data was that Sumate urged Venezuelans to give their voting information to the exit pollsters. This, of course, immediately would brand the exit poll as an opposition operation and encouraged opposition sympathizers to reveal their vote, but probably acted as a disincentive for Chavez supporters.

A professional exit poll operation, however, should know to avoid such errors. Another explanation for wrong exit poll results would have to do with the selection of voting centers. Generally, these should be selected at random from all available centers. However, considering that the Sumate volunteers who conducted the poll for Penn et al. tend to come from the middle class, it is doubtful that they had enough volunteers from the barrios to staff the barrio exit polls. What this means is that they would have had to send middle class volunteers into the barrios, a proposition most average Venezuelans find quite unimaginable. To most middle class Venezuelans, the barrios are dangerous territory that is to be avoided at all costs, much like most whites avoid Harlem in New York City. It seems extremely unlikely that Sumate was able to properly cover the barrio voting centers. This inability to cover the barrios distorts the exit poll result disproportionately because, on average, barrio voting centers are much larger than the ones in middle class neighborhoods.

Finally, the +/- 1% margin of error that Penn et al. cite is very misleading. This margin of error refers to the accuracy of the poll with respect to the vote at the polled voting centers (a sample of 20,000 out of about 500,000 population at these centers, results in a +/- 1% margin of error). However, one must also consider the margin of error when sampling the voting centers themselves. That is, a sample of 264 polled centers out of a total population of 4,766 automated voting centers results in a margin of error of +/- 5.83%.  If the selection of representative voting centers ends up being off by only a few percent, the polls at those centers, especially given the biases already mentioned (such as partisan poll takers), the error would be magnified even more. That is, just a small error in the selection of voting centers in favor of anti-Chavez centers, could end up with an amplified distortion in favor of the “yes” vote, since voting in many centers tended to be very polarized (some centers disproportionately in favor of Chavez—the barrios—and some disproportionately against Chavez).
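
The article does not show how the +/- 5.83% figure was obtained.  As a sketch only, one standard calculation that lands close to it treats the voting centers themselves as a simple random sample, uses the worst-case proportion of 50% with the 1.96 multiplier for 95% confidence, and applies a finite population correction.  These are my assumptions, not something stated in the quoted text, and the function name is hypothetical.

```python
import math


def center_level_margin(n_sampled, n_total, p=0.5, z=1.96):
    """Margin of error (percentage points) treating voting centers as the
    sampled units, with a finite population correction.

    p = 0.5 is the worst case; z = 1.96 is the 95% multiplier.
    Both are assumptions made for this illustration.
    """
    standard_error = math.sqrt(p * (1.0 - p) / n_sampled)
    fpc = math.sqrt((n_total - n_sampled) / (n_total - 1))
    return 100.0 * z * standard_error * fpc


# 267 centers as in the press release, 264 as written in the article
print(round(center_level_margin(267, 4766), 2))
print(round(center_level_margin(264, 4766), 2))
```

Under these assumptions the figure for 267 centers comes out at about 5.83 and the figure for 264 centers at about 5.86, so the published number is consistent with 267 centers having been used.  Either way it is of the same order as the +/-6% in my extreme illustration above.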

Postscript:  A different version of this post appears at ZonaLatina.com.