Project description

“Perfil Político” is a news application developed by the non-profit Open Knowledge Brasil, with the support of Brasil.IO and The Intercept Brazil. The project aimed to help voters find information about their candidates that they usually would not find in political campaigns. As an example, a journalist used “Perfil Político” and found that a Brazilian politician's wealth had grown 49,000% in four years while in office. Getting rich is not a crime, but doing so while in office, and at such a tremendous rate, might raise flags. You wouldn’t hear THAT from the candidate, right?

Choosing the right candidate is one of the most important tasks of the electorate, but it can be an overwhelming process. In some countries, like Brazil, there can be thousands of candidates to choose from. How can one make an informed decision with thousands of options out there? How can new candidates, especially those from underrepresented communities, become known? How do voters become familiar with their ideas and projects? What if there were a way to answer just a few questions about the things one believes are important to one's community and get a list of the candidates most aligned with those answers? These are some of the questions our project, “Perfil Político”, set out to answer.

General elections in Brazil are massive. There are, on average, 45,000+ candidates from 35 parties running for positions in the legislative and executive branches across 5,800+ municipalities. We built a system that combines around ten massive databases about candidates: their ideas, profiles, political background, wealth growth, and more. We took historical data from the past 15 years and made it easy for anyone looking for a candidate to filter the options and select the one most aligned with what they want for their region.

One of the features that really makes “Perfil Político” stand out is the ability to filter candidates by gender and by whether they are running for the first time. It is also possible to filter for those who have been in power the longest. We also made it easy to find candidates who could increase the representation of women, black people, and indigenous communities in both the legislative and executive branches. With “Perfil Político”, anyone can also find information about party alliances and how much wealth incumbent politicians accumulated while they were in power.

“Perfil Político” set out to use data science to make elections in Brazil more ethical, transparent and balanced. The project expanded access to reliable information for profiling political candidates and facilitated citizen and journalistic research during the primaries and election campaign.

What makes this project innovative?

Picture this: with 35 parties in Brazil and thousands of candidates, it is REALLY hard to keep track of whether candidates changed parties (some do all the time), how their wealth grew, and what they stand for. Most Brazilians feel discouraged from digging for information about their candidates, because the datasets that give the whole picture are fragmented across multiple Brazilian government websites. Until “Perfil Político”, there was no simple way to connect the dots between these datasets, which span 20 years of election data in the country. You’d need to be a data scientist and really know what you were looking for.

A feature that really stands out in our project is that it used survey techniques to create a matchmaking system that analyzed gigabytes of data to surface the most relevant candidates. While it is really easy for the user, all the computation and complex operations happen behind the scenes, making it possible for anyone to find out more about their candidates. No technical background is required, as long as people can use the web to access “Perfil Político”.

A team of experienced data scientists had to organize a large volume of public documents as part of a comparative and detailed journalistic process, from which several feature reports arose. In such a big country, with 27 states, up to 45 thousand candidates, and a population with limited internet access, empowering journalists with an easy-to-use news app for consulting big data also empowers local journalism and civic groups.

We found and organized 16 million party affiliations in Brazil, dating back to 2004. This gives journalists a history of every candidate and helps visualize the network of power surrounding politicians. Perfil Político also shows more than 3 million declared assets. With this information, people can compare how much politicians have increased their holdings.

What was the impact of your project? How did you measure it?

Stories written using “Perfil Político” data had great engagement, organically reaching over 150,000 people on the Facebook pages of Open Knowledge and Operação Serenata de Amor, with more than 1,000 shares.

With the data we gathered, we unveiled the top 100 incumbent politicians whose wealth grew the most from their last term to the 2018 elections. The wealth growth of the politician in first place was R$ 168 million (around US$ 40 million) and that of the last one was R$ 2 million, with an average of more than R$ 10 million per candidate in the period. The top 3 on this list gained more than $ 50 million between one election and the next.

The data helped journalists uncover stories such as the one about state representative Zeca Viana, published by The Intercept Brazil. Elected in 2014, the deputy went from R$ 87 million to R$ 255 million, an increase of R$ 168 million, or more than US$ 40 million. Zeca Viana wasn't reelected. Like him, many candidates were seeking reelection after more than ten years in office.

The Intercept also published the story of Doda de Tião. Among cities with up to 100,000 inhabitants, Doda was the politician elected in 2014 with the highest percentage increase in wealth during the term, and he ran for reelection in 2018. His wealth grew by 716% in four years, from just over R$ 500 thousand to R$ 4.5 million. He also wasn't reelected.

To help journalists and other projects use our data, we created an API to facilitate data queries. While misinformation spread, Perfil Político applied data science techniques to empower citizens and democracy by giving access to reliable information about political candidates.
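The wealth growth figures quoted above come down to simple arithmetic on two declared totals. The following is a minimal sketch of that calculation, not the project's actual code: the function names and the tuple layout are assumptions made for illustration, and the second entry in the sample list is hypothetical. Only the Zeca Viana figures (R$ 87 million to R$ 255 million) come from the text.

```python
def wealth_growth(initial, final):
    """Return (absolute_change, percent_change) between two declared totals."""
    if initial <= 0:
        # Percentage growth is undefined when the starting value is zero or negative.
        raise ValueError("initial declared wealth must be positive")
    change = final - initial
    percent = round(change / initial * 100, 1)
    return change, percent


def rank_by_growth(declarations, top_n=100):
    """Rank (name, initial, final) tuples by absolute growth, largest first."""
    ranked = sorted(declarations, key=lambda d: d[2] - d[1], reverse=True)
    return [(name, *wealth_growth(initial, final))
            for name, initial, final in ranked[:top_n]]


# Values in millions of reais. The second entry is a hypothetical placeholder.
declarations = [
    ("Hypothetical Candidate", 2.0, 4.0),
    ("Zeca Viana", 87, 255),
]

for name, change, percent in rank_by_growth(declarations):
    print(f"{name}: +R$ {change} million ({percent}%)")
```

For the figures in the text, the absolute increase is R$ 168 million, which corresponds to growth of about 193% over the starting value.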

Source and methodology

Perfil Político used official Brazilian government sources to gather different data:

- Public information about companies (open data about their registration with the Federal Revenue, roughly 80,000,000 records)
- Dataset of donations to public campaigns (historical data containing over 26,000,000 rows)
- Dataset of candidates from 2004 to 2018 (over 3,000,000 names)
- Dataset of all party affiliations (over 16,000,000 rows)
- Text of bills in the Chamber of Deputies (roughly 2,500/year)
- Text of bills in the Federal Senate (roughly 400/year)
- Dataset of votes and electoral results (over 19,000,000 rows)

Normalizing this massive volume of data was a significant challenge: we had to apply fuzzy matching techniques to match candidates' names across different databases. We also found records, such as affiliation numbers, with possible errors (for example, eleven rows with the number "1" as the affiliation number) that we had to check with public officials.

We organized all the data in one single table. This one-table strategy is meant to improve speed of use: we don't have to connect dozens of files and then show the results to the user. We organize the data once and provide a full analysis experience.
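To illustrate the kind of fuzzy name matching mentioned above, here is a minimal sketch using only Python's standard library. The text does not say which algorithm or library the team used, so `difflib.SequenceMatcher` is a stand-in, and the helper names, the 0.9 threshold, and the sample names are all assumptions made for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize_name(name):
    """Uppercase, strip accents and collapse whitespace before comparing."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.upper().split())


def best_match(name, candidates, threshold=0.9):
    """Return (candidate, score) for the closest name, or None below threshold.

    The threshold is an arbitrary illustrative value, not one from the project.
    """
    target = normalize_name(name)
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, target, normalize_name(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best_score >= threshold:
        return best, best_score
    return None


# Hypothetical names as they might appear in two different datasets.
registry = ["JOAO DA SILVA ARAUJO", "MARIA APARECIDA DE SOUZA"]

print(best_match("João  da Silva Araújo", registry))
print(best_match("Maria Aparecida Souza", registry))
print(best_match("Completely Different Name", registry))
```

Normalizing accents, case, and spacing first means that purely typographic differences already compare as identical, leaving the similarity score to handle real variations such as a dropped "de". Matches close to the threshold would still need manual review, much like the suspicious affiliation numbers described above.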

Technologies Used

On the backend, the project was developed in Python (with libraries such as Pandas) and Postgres. We needed to make access to the data as simple as possible to minimize loading delays on the platform, so we created an API with Django. The frontend was developed in JavaScript.
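The single-table strategy described under Source and methodology, combined with the goal of minimizing loading delays, can be sketched as follows. This is an illustration under stated assumptions, not the project's schema: it uses Python's built-in `sqlite3` module as a stand-in for Postgres, and every table name, column name, and value is hypothetical. The idea is to pay the cost of joining source tables once, ahead of time, so that each request becomes a single-row lookup.

```python
import sqlite3


def build_profile_table(conn):
    """Load hypothetical source tables and precompute one row per candidate."""
    conn.executescript("""
        CREATE TABLE candidates (id INTEGER PRIMARY KEY, name TEXT, party TEXT);
        CREATE TABLE assets (candidate_id INTEGER, year INTEGER, value REAL);

        INSERT INTO candidates VALUES
            (1, 'Candidate A', 'Party X'),
            (2, 'Candidate B', 'Party Y');
        INSERT INTO assets VALUES
            (1, 2014, 100.0), (1, 2018, 250.0), (2, 2018, 40.0);

        -- Join and aggregate once, so that requests need no joins at all.
        CREATE TABLE profiles AS
        SELECT c.id, c.name, c.party,
               COALESCE(SUM(CASE WHEN a.year = 2014 THEN a.value END), 0)
                   AS assets_2014,
               COALESCE(SUM(CASE WHEN a.year = 2018 THEN a.value END), 0)
                   AS assets_2018
        FROM candidates c
        LEFT JOIN assets a ON a.candidate_id = c.id
        GROUP BY c.id, c.name, c.party;
    """)


def get_profile(conn, candidate_id):
    """Fetch a precomputed profile with a single lookup on one table."""
    return conn.execute(
        "SELECT name, party, assets_2014, assets_2018 "
        "FROM profiles WHERE id = ?",
        (candidate_id,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
build_profile_table(conn)
print(get_profile(conn, 1))
```

The trade-off in a design like this is that the precomputed table has to be rebuilt whenever the source data changes, which is a reasonable price for election data that is updated in batches rather than continuously.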

Project members

Álvaro Justen, Carol Rozendo, Eduardo Cuducos, Letícia Portella, Pedro Villanova and Tatiana Balachova.



Additional links

