Project description

We analyzed more than 6 million public contracts from the SECOP platform in Colombia. We worked with several experts to identify red flags for detecting irregularities in public contracting.
Our motivation for this special report goes back four years, when we collected more than 2.5 million contracts using scrapers. With the advances in open technologies and open data, we were able to analyze a larger dataset. We saw that public procurement data in Colombia is a very good proxy for examining relevant public policies and best practices in government expenditure and, more importantly, a way to measure civil rights. This is why at Datasketch we decided to extract all available open contracting data in Colombia from SECOP, the platform of the procurement agency Colombia Compra Eficiente. Since most corruption scandals in Colombia involve embezzlement, we dug deep into the data to uncover patterns that can help identify future corruption risks.
The dataset was large: 6,225,175 contracts. We used it to see contracting patterns, analyze revolving door issues and organize the relevant data for an automatic system that red-flags corruption risks from the public data feeds. We placed strong emphasis on investigating the complex legislation behind public contracting in Colombia and presenting it in a simple way, so that citizens can become active on our platform and become an Open Contracting Stalker. There we prepared information for activists and civil society organizations to act on contextual issues, such as all the contracts around the Peace Agreement with the FARC rebels. We conducted an analysis of the "red flags" in public contracting and presented them in a Video-Dato (as we call it), featuring interviews with experts in the field and data-driven animations. The full report is presented as an interactive scrolly-telling piece that appeals to a general audience, with links to an in-depth analysis of the state of open data in Colombia, recommendations for the government and detailed information for the comptroller on how to identify corruption risks with this data.

What makes this project innovative?

This project is innovative because it addresses problems that are of great national relevance but that also happen to be boring for many citizens. For the first time, this initiative goes deep into the full public procurement data archive of Colombia, and for the first time it presents an alternative for a system of red flags in public procurement, which indicate possible irregularities in contracting processes. Another innovative element was the data diagnosis, which gave public entities in Colombia specific recommendations for presenting data more clearly and with higher quality. The investigation was featured in several public events, including one from Presidential Counselors. For this data journalism investigation, we used different formats to raise public awareness of how open data can be used to tackle corruption. Through animated data-driven GIFs, a video, data visualizations and simplified datasets prepared for activists, we were able to explain this issue in a really simple way. Furthermore, our research served as a basis for other projects, including the collection of information on congressmen's financial assets and a web platform for exploring political campaign financing in Colombia and its relationship with public procurement. It has also been used for journalistic investigations carried out by other media and civic technology organizations.

What was the impact of your project? How did you measure it?

In qualitative terms, this research has allowed several media outlets in Colombia to conduct their own investigations based on the data published there. One example is the Colombian portal Cuestión Pública, which investigated the financial assets of congressmen in Colombia and of people close to them; public procurement data was key to tying individuals to different companies. The research has also opened space for technological tools that cross-reference other types of information, such as the platform by the Colombian chapter of Transparency International, an online tool that matches electoral campaign financiers with public contracts and their connections to networks of powerful public contractors.

Source and methodology

Public procurement in Colombia has always been both a challenge and an opportunity to understand different issues, the most obvious one being corruption. We decided to extract the public procurement data from SECOP I, the online portal where Colombia Compra Eficiente publishes contracts. We collected 6 million data points by downloading the complete database available in the open data portal of the Colombian government. Early on we saw that the quality of the data required great care, so we created different data cleaning algorithms in R to help with the cleaning process. We also created data-driven animations with R packages, including gganimate. To explain the "red flags" in contracting, we made a Video-Dato with interviews with experts in the field: Nicolás Penagos (Manager for Latin America of Open Contracting Partnership and former official of Colombia Compra Eficiente), Marcela Restrepo (Director of the Knowledge Management area of Transparencia por Colombia), Camilo Vallejo Giraldo (Manager of the Civic Corporation of Caldas) and Aura Cifuentes (former Coordinator of the Anticorruption Observatory of the Transparency Secretariat), accompanied by relevant information available in the open data. The analyses and visualizations were made using the programming language R. This research was carried out thanks to the support of Open Contracting Partnership.
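
The cleaning and flagging steps described above can be sketched as follows. This is a minimal illustration in Python rather than the R scripts used in the project. The field names (contract_id, entity, contractor, amount), the number format (a dot as thousands separator and a comma as decimal mark), the thresholds and the concentration rule itself are all assumptions made for the example, not the project's actual schema or red flags.

```python
from collections import Counter


def parse_amount(raw):
    """Parse an amount assumed to use '.' for thousands and ',' for decimals."""
    if raw is None:
        return None
    text = str(raw).strip().replace("$", "").replace(" ", "")
    if not text:
        return None
    text = text.replace(".", "").replace(",", ".")
    try:
        return float(text)
    except ValueError:
        return None  # unparseable values are kept as missing, not guessed


def normalize_name(raw):
    """Uppercase a name and collapse repeated whitespace."""
    return " ".join((raw or "").upper().split())


def clean_contracts(rows):
    """Drop rows without an id, drop duplicate ids, normalize text and amounts."""
    cleaned, seen = [], set()
    for row in rows:
        contract_id = (row.get("contract_id") or "").strip()
        if not contract_id or contract_id in seen:
            continue
        seen.add(contract_id)
        cleaned.append({
            "contract_id": contract_id,
            "entity": normalize_name(row.get("entity")),
            "contractor": normalize_name(row.get("contractor")),
            "amount": parse_amount(row.get("amount")),
        })
    return cleaned


def flag_concentration(contracts, min_share=0.5, min_contracts=3):
    """Hypothetical red flag: one contractor holds a large share of an
    entity's contracts. Returns (entity, contractor, count, total) tuples."""
    per_entity = Counter(c["entity"] for c in contracts)
    per_pair = Counter((c["entity"], c["contractor"]) for c in contracts)
    flags = []
    for (entity, contractor), count in per_pair.items():
        total = per_entity[entity]
        if total >= min_contracts and count / total >= min_share:
            flags.append((entity, contractor, count, total))
    return sorted(flags)


# Made-up sample rows, for illustration only.
sample_rows = [
    {"contract_id": "A1", "entity": " alcaldía  x ", "contractor": "Acme sas", "amount": "1.500.000,50"},
    {"contract_id": "A1", "entity": "Alcaldía X", "contractor": "Acme SAS", "amount": "1.500.000,50"},
    {"contract_id": "A2", "entity": "Alcaldía X", "contractor": "ACME SAS", "amount": "2.000.000"},
    {"contract_id": "A3", "entity": "ALCALDÍA X", "contractor": "Other", "amount": ""},
]
sample_clean = clean_contracts(sample_rows)
sample_flags = flag_concentration(sample_clean)
```

A flag produced this way is only a prompt for closer review of the contracts involved; it does not by itself show an irregularity.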

Technologies Used

We used our own FOIR platform for Colombia. We used R for the data cleaning scripts and algorithms, the data-driven animations and all data processing, and JavaScript for the scrolly-telling. We could make this happen because we are a lovely balanced team :) Juan Pablo (data scientist), Ana (mathematician), Camila (statistician), David (developer), María Isabel (journalist), Andrea (journalist), Verónica (anthropologist), Juliana (political scientist), David (journalist).

Project members

Ana Hernández, Camila Achury, Andrea Cervera, David Daza, Juan Pablo Marín, Juliana Galvis, María Isabel Magaña, Verónica Toro


