Project description

The Democratic Republic of Congo (DRC) had not seen a peaceful transition of power outside the Kabila family since its independence from Belgian colonial rule in 1960. Members of the Kabila family had been in power for more than two decades by the time of the 2018 presidential election. Contemporaneous opinion polling indicated that the incumbent, Joseph Kabila, commanded nothing close to a majority of support and was likely to lose the presidency.

All signs pointed to victory in the December 30 election for an opposition candidate, Martin Fayulu. However, the DRC's Independent National Electoral Commission (CENI) publicly declared on January 10, 2019, that the presidency had been won by a third candidate, Felix Tshisekedi, a supporter of President Kabila. This announcement included only total vote counts; CENI refused to release disaggregated polling station-level data.

For 20 days, from December 31 until January 19, internet traffic was blocked throughout the DRC. During this time, the Financial Times obtained a leak of results compiled from electronic voting machines across the country, as well as counts of confirmatory paper ballots collected by the Congolese Catholic Church, the only organisation conducting extensive election monitoring in the DRC. We found that the two datasets closely agreed with one another and that both showed Mr Fayulu had been elected president, contradicting the results declared by CENI. Our analysis indicated Mr Fayulu won the presidency with nearly 60 per cent of the vote, with Mr Tshisekedi a distant second on 19 per cent.

Although we trusted our sources — some of whom provided information at great personal risk — further verification of the datasets was essential before we could countenance publishing a story that had very real potential to cause civil unrest.

 

We compared the two datasets on the vote shares of each of the three main candidates in each of approximately 200 polling districts, and found correlations between them ranging from 0.976 to 0.991 (1 representing a perfect match). We also checked the vote counts in both datasets against Benford's Law, the expected distribution of leading digits in many naturally occurring sets of numbers, which is commonly used in fields such as forensic accounting to identify fraudulent data. We found no significant evidence of deviation from the law.

What makes this project innovative?

Deviation from Benford’s Law had previously been used as an indicator of electoral fraud in Russian elections, but here it was employed in conjunction with tests of correlation between vote share figures in two leaked datasets from separate sources. To our knowledge, this is a novel set of circumstances and one of the clearest indications of electoral fraud on a national level to have been derived from a journalistic statistical analysis.

What was the impact of your project? How did you measure it?

Within hours of our story’s publication, the US House Committee on Foreign Affairs announced that its chairman, Representative Eliot Engel, had written to Secretary of State Mike Pompeo to request “a detailed explanation of how the State Department is working with other members of the diplomatic community to ensure that the [DRC] election results are confirmed transparently and in accordance with Congolese laws and regulations” (https://twitter.com/HouseForeign/status/1085246790637707264).
The African Union subsequently said there were “serious doubts on the conformity of the provisional results” among its member heads of state and called on the DRC’s constitutional court to halt plans to rule on a dispute over the outcome.
The EU urged CENI to publish disaggregated vote counts from individual local compilation centres.
The US imposed financial sanctions on three CENI officials, citing allegations of corruption linked to last year’s discredited presidential and parliamentary polls.
The story was picked up by both Congolese and international media including the BBC, The Economist and the international francophone network TV5Monde.

Source and methodology

The primary dataset was a leaked comma-separated values (CSV) file of results from 62,716 electronic voting machines at 17,782 polling stations across the DRC, representing approximately 86 per cent of all votes cast in the election. According to our source, this had been retrieved from the CENI central database before CENI announced the results. The second dataset was obtained from the Congolese Catholic Church, whose 40,000 observers counted paper ballots from 28,733 voting machines, representing approximately 43 per cent of all votes cast.

We calculated each candidate’s vote share for each of the approximately 200 polling districts covered by both datasets, then calculated the Pearson’s correlation coefficient between the voting machine and paper ballot figures. Two independent calculations within the FT, using two different tool sets, arrived at the same results, with correlation coefficients ranging from 0.976 to 0.991 for each of the three main candidates, indicating high levels of fidelity between the datasets.
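
The vote-share and correlation step described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the pandas and R code the team used. The row layout (district, candidate, votes), the function names and the sample figures are all assumptions made for the example and do not reflect the structure or contents of the leaked files.

```python
from collections import defaultdict
from math import sqrt


def vote_shares(rows, candidate):
    """Return {district: candidate's share of that district's votes}.

    `rows` is an iterable of (district, candidate_name, votes) tuples.
    This layout is hypothetical, chosen only for illustration.
    """
    totals = defaultdict(int)
    candidate_votes = defaultdict(int)
    for district, name, votes in rows:
        totals[district] += votes
        if name == candidate:
            candidate_votes[district] += votes
    return {d: candidate_votes[d] / totals[d] for d in totals if totals[d] > 0}


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def share_correlation(machine_rows, paper_rows, candidate):
    """Correlate a candidate's district vote shares across two datasets,
    using only the districts present in both."""
    machine = vote_shares(machine_rows, candidate)
    paper = vote_shares(paper_rows, candidate)
    common = sorted(machine.keys() & paper.keys())
    return pearson([machine[d] for d in common], [paper[d] for d in common])


# Invented figures for three districts and two candidates (not real results).
machine_rows = [
    ("district_1", "Candidate A", 620), ("district_1", "Candidate B", 380),
    ("district_2", "Candidate A", 310), ("district_2", "Candidate B", 690),
    ("district_3", "Candidate A", 505), ("district_3", "Candidate B", 495),
]
paper_rows = [
    ("district_1", "Candidate A", 300), ("district_1", "Candidate B", 190),
    ("district_2", "Candidate A", 160), ("district_2", "Candidate B", 340),
    ("district_3", "Candidate A", 240), ("district_3", "Candidate B", 250),
]
print(share_correlation(machine_rows, paper_rows, "Candidate A"))
```

Restricting the comparison to districts present in both datasets matters here because the two sources covered different proportions of the vote.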

The test of deviation from Benford’s Law was conducted on the first and second digits of the district-level vote counts in both the CENI and Catholic Church datasets, using a range of statistical tests including tests of Euclidean distance, a chi-square goodness of fit test, and a Kolmogorov-Smirnov test.
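
As a rough illustration of one of these checks, the sketch below computes a chi-square goodness-of-fit statistic for first digits against Benford’s Law. It is written in plain Python and is not the BenfordTests or benford.analysis code used on the story. It covers first digits only, the function names are assumptions, and the sample inputs are stand-in numbers rather than vote counts.

```python
from collections import Counter
from math import log10

# Chi-square critical value for 8 degrees of freedom at the 5 per cent level
# (nine digit categories minus one).
CHI2_CRITICAL_8DF_5PCT = 15.507


def benford_probability(digit):
    """Expected proportion of values whose first digit is `digit` (1 to 9)."""
    return log10(1 + 1 / digit)


def first_digit(count):
    """Leading decimal digit of a positive integer."""
    return int(str(count)[0])


def benford_chi_square(counts):
    """Chi-square statistic comparing observed first digits with Benford's Law.

    `counts` is an iterable of integers; zero and negative values are skipped.
    """
    digits = [first_digit(c) for c in counts if c > 0]
    n = len(digits)
    if n == 0:
        raise ValueError("no positive counts to test")
    observed = Counter(digits)
    statistic = 0.0
    for d in range(1, 10):
        expected = n * benford_probability(d)
        statistic += (observed[d] - expected) ** 2 / expected
    return statistic


# Stand-in data: powers of two are known to follow Benford's Law closely,
# whereas the integers 100 to 999 have uniformly distributed first digits.
benford_like = [2 ** k for k in range(1, 200)]
uniform_like = list(range(100, 1000))

for label, data in (("powers of two", benford_like), ("100 to 999", uniform_like)):
    statistic = benford_chi_square(data)
    verdict = "deviates" if statistic > CHI2_CRITICAL_8DF_5PCT else "no significant deviation"
    print(label, round(statistic, 2), verdict)
```

A statistic below the critical value means the first-digit distribution is not significantly different from Benford’s Law at the 5 per cent level. That is evidence consistent with untampered counts, not proof of it.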

Technologies Used

Python was the primary programming language used on this story, with the pandas library (https://pandas.pydata.org/) used for data cleaning, aggregation and correlation testing and the Altair library (https://altair-viz.github.io/) used for exploratory visualisation.
R was also used for correlation testing and for performing the Benford’s Law tests using the BenfordTests (https://cran.r-project.org/web/packages/BenfordTests/BenfordTests.pdf) and benford.analysis (https://cran.r-project.org/web/packages/benford.analysis/benford.analysis.pdf) packages.
A customised version of a JavaScript- and D3.js-based (http://d3js.org/) scatterplot template from the Financial Times’ Visual Vocabulary (https://github.com/ft-interactive/visual-vocabulary-templates) was used for the primary graphic.

Project members

Tom Wilson, David Blood, David Pilling, Andrew Garthwaite, Joanna Kao
