Project description

Virtually every story about election polling includes some treatment of cross-tabs. These articles typically cherry-pick sub-groups of voters whose preferences seem unusually striking. They can also be highly misleading, because they often rely on samples too small to yield statistically significant results and conflate the impact of collinear variables. For example, single women reliably vote Democratic, but that is largely because they tend to be younger than married women are. Our story “All politics is identity politics” avoided both of these pitfalls. It used a database large enough (125,000 individual responses) to provide reliable estimates of the opinions of small sub-groups. And it used logistic regression to isolate the impact of each variable, with all others held constant. This yielded powerful new insights, such as the pattern that among blacks, younger voters are actually more conservative than older ones, perhaps because they have experienced less direct exposure to segregation. Our story “British voters are unimpressed by Theresa May’s Brexit deal” took a similar approach, although there we wanted to give our readers a sense of what the country thought of three options (remain, May’s deal and no deal), and where the fault lines in the electorate lay.
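To make the collinearity point concrete, here is a minimal Python sketch (the project itself used R). The counts are invented for illustration and are not YouGov data: they are built so that age drives the vote while marital status has no effect of its own, but single voters are disproportionately young. The names `CELLS`, `raw_gap` and `fit_logistic` are hypothetical, and the plain gradient-ascent fit merely stands in for whatever fitting routine the models actually used.

```python
import math

# Hypothetical counts, invented for illustration (not YouGov data).
# Each row: (young, single, respondents, Democratic voters).
CELLS = [
    (1, 1, 3500, 2170),  # young, single:  62% Democratic
    (1, 0, 1500, 930),   # young, married: 62% Democratic
    (0, 1, 1500, 570),   # older, single:  38% Democratic
    (0, 0, 3500, 1330),  # older, married: 38% Democratic
]


def raw_gap(cells):
    """Cross-tab view: Democratic share among single minus married voters."""
    share = {}
    for flag in (0, 1):
        rows = [c for c in cells if c[1] == flag]
        share[flag] = sum(c[3] for c in rows) / sum(c[2] for c in rows)
    return share[1] - share[0]


def fit_logistic(cells, steps=5000, rate=1.0):
    """Fit intercept, young and single coefficients by gradient ascent
    on the average log-likelihood of the grouped data."""
    total = sum(c[2] for c in cells)
    beta = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for young, single, n, dem in cells:
            x = (1.0, young, single)
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            for j in range(3):
                # Observed minus expected Democratic votes in this cell.
                grad[j] += x[j] * (dem - n * p) / total
        beta = [b + rate * g for b, g in zip(beta, grad)]
    return beta


gap = raw_gap(CELLS)
intercept, young_coef, single_coef = fit_logistic(CELLS)
print(f"raw single-vs-married gap: {gap:.3f}")
print(f"young coefficient: {young_coef:.3f}, single coefficient: {single_coef:.3f}")
```

In this contrived table the cross-tab makes single voters look nearly ten points more Democratic than married ones, yet once age is held constant the fitted coefficient on marital status is effectively zero.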

What makes this project innovative?

YouGov kindly provided us with 125,000 individual responses from voters across America and 90,000 from Britain, and we used these microdata to create two models relating basic demographic factors to voters’ preferences on the US mid-term elections and on Brexit. We’re not aware of any other news organisation that has created a model like ours and used it in this way. In the American voter model, we modelled a binary choice between Democrats and Republicans and displayed the result on a simple line chart. For Brexit, however, voters had three choices (remain, May’s deal, or no deal). To present these options, we used a “ternary plot”, showing our confidence that a person with given demographics would opt for each of the three choices. This lets readers survey the full landscape of support (and also shows how shallow support for May’s deal seems to be among the public).
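As a rough illustration of how three-way probabilities can be placed on a ternary plot, the Python sketch below turns three scores into probabilities and maps them to a point inside an equilateral triangle. It assumes a multinomial (softmax) form for the three-choice model, which this write-up does not confirm. The scores are invented, the corner layout is an arbitrary choice, and `softmax` and `ternary_xy` are hypothetical helper names rather than part of the published interactive.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ternary_xy(p_remain, p_deal, p_no_deal):
    """Map three shares summing to 1 onto an equilateral triangle.

    Corners (an arbitrary layout choice): remain at (0, 0),
    no deal at (1, 0), May's deal at the apex (0.5, sqrt(3)/2).
    """
    x = p_no_deal + 0.5 * p_deal
    y = (math.sqrt(3) / 2) * p_deal
    return x, y


# Hypothetical scores for one voter profile, invented for illustration.
scores = {"remain": 1.2, "deal": 0.1, "no_deal": 0.4}
probs = softmax([scores["remain"], scores["deal"], scores["no_deal"]])
point = ternary_xy(*probs)
print("probabilities:", [round(p, 3) for p in probs])
print("ternary position:", tuple(round(v, 3) for v in point))
```

A profile that is certain to pick one option sits on that option's corner, while a profile split evenly three ways sits at the centre of the triangle. A point that stays close to the bottom edge therefore signals weak support for the option at the apex.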

What was the impact of your project? How did you measure it?

Both the "Build an American voter" and "Build a British voter" interactives were very popular on social media, with many readers selecting their own profile, marvelling at the model's accuracy and sharing the results. The former was also the second most-read article of its issue and has won two awards (The Pudding Cup's best visual and data-driven story of 2018, and Bronze at Malofiej for innovation/customisation).

Source and methodology

YouGov provided us with the raw survey data.

Technologies Used

We used R to build the logistic regression models, Illustrator to design the print versions and d3 to create the interactive versions.

Project members

Dan Rosenheck, Matt McLean, James Fransham, Evan Hensleigh, Martin Gonzalez


Additional links


