For the 2018 Brazilian general elections, our team built a highly detailed results map that breaks down the vote by polling place. This may seem trivial in countries with more advanced open-data policies, but the Brazilian electoral authorities don't provide the geographic coordinates of most polling places, only their addresses, and even this basic data is often too dirty to use easily.
To overcome this, we spent months cleaning the database and geocoding around 52,000 locations. That alone would have amounted to an unprecedented level of detail, but we decided to add another layer to our analysis.
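The methodology notes explain how the geocoded points were vetted: only Google results rated at the highest reliability level were kept, and a random 10% of those were then checked by hand. The Python sketch below illustrates that two-step filter. It is not the team's code. It assumes each result is a dict shaped like a Google Geocoding API result, and it treats the API's most precise `geometry.location_type` value, `"ROOFTOP"`, as the "highest reliability" level, which is our assumption. The `polling_place` key, the function names and the seed are hypothetical.

```python
import random


def vet_geocodes(results, sample_fraction=0.10, seed=2018):
    """Keep only top-precision geocodes and draw a random sample for manual review.

    Assumption: each item looks like a Google Geocoding API result, so
    item["geometry"]["location_type"] exists; mapping "highest reliability"
    to "ROOFTOP" is an illustrative choice, not a documented method.
    """
    kept = [r for r in results if r["geometry"]["location_type"] == "ROOFTOP"]
    # Number of points a human will check (10% of the kept points by default).
    k = round(len(kept) * sample_fraction)
    if kept and k == 0:
        k = 1  # always review at least one point when anything was kept
    # A seeded generator makes the sample reproducible.
    sample = random.Random(seed).sample(kept, k)
    return kept, sample


def error_rate(manual_checks):
    """Share of sampled points a reviewer flagged as wrong (True = wrong)."""
    return sum(manual_checks) / len(manual_checks) if manual_checks else 0.0


if __name__ == "__main__":
    # Hypothetical data: 20 precise results and 5 approximate ones.
    results = [
        {"polling_place": f"place_{i}",
         "geometry": {"location": {"lat": -23.5, "lng": -46.6},
                      "location_type": "ROOFTOP" if i < 20 else "APPROXIMATE"}}
        for i in range(25)
    ]
    kept, sample = vet_geocodes(results)
    print(len(kept), len(sample))  # 20 2
```

After the reviewers mark each sampled point as right or wrong, `error_rate` gives the share of wrong points, which is the figure the notes compare against an acceptable margin.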
Most people in the country vote at the polling place closest to where they live, a practice encouraged by electoral registration officers. For that reason, we used a Voronoi diagram to approximate the ‘area of influence’ of each point; that is, we drew polygons on the map, each one approximating the area where the people who vote at a given polling place live.
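To make the ‘area of influence’ idea concrete: the Voronoi cell of a polling place is the set of points closer to it than to any other polling place, which equals the intersection of one half-plane per competing site. The Python sketch below builds a cell exactly that way, by clipping a bounding box against each half-plane. It is a brute-force illustration (quadratic in the number of sites), not the team's implementation; a real pipeline would more likely call a library routine such as `scipy.spatial.Voronoi`. It assumes coordinates in a projected, planar system, since distances measured in raw latitude/longitude degrees are distorted, and the site names are hypothetical.

```python
def clip_halfplane(poly, a, b, c):
    """Keep the part of convex polygon `poly` where a*x + b*y <= c."""
    out = []
    n = len(poly)
    for i in range(n):
        p, q = poly[i], poly[(i + 1) % n]
        fp = a * p[0] + b * p[1] - c
        fq = a * q[0] + b * q[1] - c
        if fp <= 0:  # vertex p is on the kept side
            out.append(p)
        if (fp < 0 < fq) or (fq < 0 < fp):  # edge p->q crosses the boundary line
            t = fp / (fp - fq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out


def voronoi_cell(site, others, bbox):
    """Voronoi cell of `site` among `others`, clipped to bbox=(xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = bbox
    cell = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    px, py = site
    for qx, qy in others:
        # |x - p| <= |x - q|  <=>  (q - p) . x <= (|q|^2 - |p|^2) / 2
        a, b = qx - px, qy - py
        c = (qx * qx + qy * qy - px * px - py * py) / 2
        cell = clip_halfplane(cell, a, b, c)
        if not cell:
            break
    return cell


if __name__ == "__main__":
    # Hypothetical polling places inside a 10 x 10 planar box.
    sites = {"school_a": (2.0, 5.0), "school_b": (8.0, 5.0)}
    for name, xy in sites.items():
        others = [v for k, v in sites.items() if k != name]
        print(name, voronoi_cell(xy, others, (0.0, 0.0, 10.0, 10.0)))
    # school_a gets the left half of the box (x <= 5), school_b the right half.
```

Because every cell is an intersection of half-planes, it is always convex, a property that makes later overlay steps simpler.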
Having computed those areas, we were able to use census tract data to approximate the demographic and socioeconomic characteristics of the people who vote at any polling place in the country.
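The methodology notes describe the rule for attaching census figures to each polygon: a tract's population is split in proportion to how much of its area falls inside the polygon, on the assumption that residents are spread evenly. A minimal Python sketch of that area-weighted apportionment follows. It assumes planar coordinates, tracts stored as hypothetical dicts with a `polygon` and a table of `counts`, and a convex neighborhood polygon listed counterclockwise (Voronoi cells are always convex), which lets the Sutherland-Hodgman algorithm clip each tract against the cell. Function names and data are illustrative, not the team's code.

```python
def polygon_area(poly):
    """Unsigned area of a simple polygon (shoelace formula); 0.0 if empty."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2


def clip_to_convex(subject, clip):
    """Sutherland-Hodgman: the part of `subject` inside convex, counterclockwise `clip`."""
    out = list(subject)
    m = len(clip)
    for j in range(m):
        if not out:
            break
        (ax, ay), (bx, by) = clip[j], clip[(j + 1) % m]
        inp, out = out, []
        for i in range(len(inp)):
            p, q = inp[i], inp[(i + 1) % len(inp)]
            # Cross products: >= 0 means the point is left of edge a->b, i.e. inside.
            sp = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
            sq = (bx - ax) * (q[1] - ay) - (by - ay) * (q[0] - ax)
            if sp >= 0:
                out.append(p)
            if (sp > 0 > sq) or (sq > 0 > sp):
                t = sp / (sp - sq)
                out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out


def apportion(cell, tracts):
    """Area-weighted estimate of each census variable inside `cell`."""
    totals = {}
    for tract in tracts:
        full = polygon_area(tract["polygon"])
        inside = polygon_area(clip_to_convex(tract["polygon"], cell))
        share = inside / full if full > 0 else 0.0  # e.g. 0.5 if half the tract is inside
        for variable, count in tract["counts"].items():
            totals[variable] = totals.get(variable, 0.0) + share * count
    return totals


if __name__ == "__main__":
    cell = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]  # counterclockwise
    tracts = [  # hypothetical tracts: one fully inside, one half inside
        {"polygon": [(2, 2), (4, 2), (4, 4), (2, 4)], "counts": {"women": 10, "men": 20}},
        {"polygon": [(5, 0), (15, 0), (15, 10), (5, 10)], "counts": {"women": 40, "men": 60}},
    ]
    print(apportion(cell, tracts))  # {'women': 30.0, 'men': 50.0}
```

The second tract straddles the cell boundary, so only half of its residents are credited to this neighborhood, mirroring the "50% inside, half the people" rule.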
Again, this kind of information is not provided by the Brazilian government or any other institution, so these results were unprecedented. The newsroom used the data to find ultra-local stories we would not otherwise have seen, and it was also presented to the public as a news app that allowed free exploration and filtering.
What makes this project innovative?
Our team engaged in an eight-month effort to gather, clean and combine data in order to produce exclusive information. By merging data from many different sources, we were able to find and tell stories that no other organization had. We also offered readers an interactive app that lets them tailor the data to their own needs and curiosity. None of this information was available before: we are not simply facilitating access to public data, but combining multiple sources to offer something new. This is probably the most detailed results map ever published by the media for a Brazilian election.
What was the impact of your project? How did you measure it?
The 2018 election marked a profound shift in the Brazilian political landscape, driven by the emergence of new actors and the decline of traditional parties. Our goal was to have a long-term impact on public debate by providing rigorous, comprehensive data about this key event in recent history. For the first time, the media, researchers and the general public can use the map to examine voter behavior at a granular level. This allows for a deeper, evidence-based discussion of the social, economic and geographic factors that shape the political forces in the country.
Source and methodology
The raw data comes from three sources: Brazil’s Superior Electoral Court (TSE), the Brazilian Institute of Geography and Statistics (IBGE) and the Google Maps API. We tried to be as transparent with the public as possible, so a detailed description of the methodology is provided at the bottom of the story. Below is an English translation:
"[...] An initiative of this scope has several technical limitations and methodological issues that need to be clarified. Here, we try to answer frequently asked questions about the project.
GEOLOCATION
- How did we find the polling places?
To put the polling places on the map, we needed the latitude and longitude of every one of them. Some of these coordinates were published by the Superior Electoral Court (TSE). The rest were obtained through automated queries to Google Maps.
- How accurate are the results of those queries?
Since the coordinates were looked up automatically and the country has tens of thousands of polling places, we could not manually check whether every query returned the correct latitude and longitude. However, we put checks in place to make sure that most polling places are correctly positioned. The first test was done within Google Maps itself: we kept only the polling places whose query result was rated by Google's own system at the highest possible reliability level. With the low-reliability data removed, 10% of the coordinates were randomly selected and their latitude and longitude were checked manually: fewer than 5% were wrong, a margin of error we found acceptable. Thus, even though not every polling place is positioned with 100% precision, most of them are. Accuracy tends to be higher in large urban centers and lower in remote areas.
- So not all of Brazil's polling places were included?
Correct. About 50% of the locations are represented on the map.
Nevertheless, more than 86% of the electorate is included, since location accuracy is higher in large urban centers, where polling places are larger. The project does not aim to show who won the election in the country as a whole, but to reveal, in unprecedented detail, how each candidate performed in many places across Brazil. For this reason, we think it is better to 'lose' votes than to put them in the wrong place on the map.
- So the map does not count all of the country's votes?
Correct. Some polling places are missing, and their votes were therefore not counted. However, every polling place shown on the map has all of its votes counted.
NEIGHBORHOODS
- Polling places are buildings. Why does each one cover a large area of the map instead of being represented by a point?
Given the coordinates of each polling place, we estimated its 'area of influence'. Each polygon on the map covers the area that is closer to that specific polling place than to any other. To do this calculation, we used an algorithm that builds a Voronoi diagram, as explained in the article. We assumed that most voters vote at the polling place closest to where they live, so we can say that the people who live inside a polygon vote there.
- Does each polygon REALLY contain the people who vote at its polling place?
Most voters are registered to vote in places close to their homes, as the electoral registry encourages them to do so. However, this does not hold in every case. It is an approximation.
- Why do some cities have no polygons?
To compute the polygons inside a given city, the algorithm we used requires that at least four of its polling places have known geographic coordinates. In smaller cities, where Google's geocoding service does not work as well, this minimum is often not reached. In these cases, the whole city is represented as a single polygon containing all the votes registered there.
DEMOGRAPHICS
- How did you find out the demographic profile of each neighborhood?
The 2010 census tracts of the Brazilian Institute of Geography and Statistics (IBGE) were used to approximate the social and economic characteristics of each neighborhood. Census tracts are territorial units for which detailed information about the resident population is gathered. They reveal, for instance, exactly how many black, white or brown people lived in each of them in 2010, the year of the last census. With the polygon for each polling place already drawn, we selected the census tracts that intersect it. If a tract lies completely inside a polling place's polygon, we consider that its whole population lives in that neighborhood. If the tract is only 50% inside, we consider that only half of its residents live there. Thus, by adding the population of the census tracts to each neighborhood polygon in proportion to the share of each tract's area that falls inside it, we were able to estimate how many people of each race, gender and age live there.
- Again: are you sure this REALLY represents the population that votes there?
As before, it is an approximation. The method assumes that the people living in each census tract are evenly distributed across its area. It is also important to remember that the data comes from the 2010 census, the most recent available. Demographic changes that may have occurred between 2010 and the publication of this article are therefore not reflected on the map.
- How can I trust that you have done all these calculations the right way?
All the source code behind the data is available on our GitHub page. If you know how to program, you can check it yourself."
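The small-city rule in the methodology notes (per-polling-place polygons only where at least four polling places have known coordinates, otherwise a single polygon for the whole city) can be sketched as a simple grouping step. This Python sketch only decides how each city is represented; it draws no geometry. The record fields (`city`, `id`, `coords`, `votes`) and the function name are hypothetical. In cities that do get per-place polygons, it leaves out polling places without coordinates, consistent with the notes' statement that some votes are not counted on the map.

```python
from collections import defaultdict

MIN_GEOCODED = 4  # threshold stated in the methodology notes


def plan_city_units(polling_places):
    """Decide, city by city, whether to draw per-polling-place polygons."""
    by_city = defaultdict(list)
    for place in polling_places:
        by_city[place["city"]].append(place)

    plan = {}
    for city, places in by_city.items():
        geocoded = [p for p in places if p["coords"] is not None]
        if len(geocoded) >= MIN_GEOCODED:
            # One Voronoi cell per geocoded place; places without
            # coordinates (and their votes) are left off the map.
            plan[city] = {"mode": "per_place", "places": [p["id"] for p in geocoded]}
        else:
            # Too few points: the whole city becomes a single polygon
            # holding every vote registered there.
            plan[city] = {"mode": "whole_city", "votes": sum(p["votes"] for p in places)}
    return plan


if __name__ == "__main__":
    # Hypothetical records: a city with 5 geocoded places, a town with only 2.
    places = [
        {"city": "Big City", "id": f"b{i}", "coords": (i, i) if i < 5 else None, "votes": 100}
        for i in range(6)
    ] + [
        {"city": "Small Town", "id": f"s{i}", "coords": (0, i) if i < 2 else None, "votes": 50}
        for i in range(3)
    ]
    for city, unit in plan_city_units(places).items():
        print(city, unit)
```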
Rodrigo Menegat - Data analysis and story writing
Ariel Tonglet - Visual design and app development
Bruno Ponceano - Visual design and infographics
Cecília do Lago - Story writing
Daniel Bramatti - Data journalism editor
Vinicius Sueiro - Visuals and infographics editor