Project description

Postdata is the only media outlet doing data journalism in Cuba. It is a journalistic organization whose members work voluntarily, without any funding and in their free time. At Postdata I have worked journalistically on all published articles, some written by me alone and others in collaboration with members of the team, while I have been responsible for all the data analysis, the creation of all the graphics and visualizations, and the structuring of every database we create. I am also responsible for all the programming and for managing publication on the site.

During this year the articles have covered a good variety of topics, including sports, the economy, politics and elections. The data analysis process has been different for each article, and in each case it has been necessary to use different techniques. We have also tried to build appropriate visualizations, always experimenting with new ways of presenting the data and looking for the most suitable tools. Remarkable, too, is the set of databases that had to be built from unstructured sources and that are now available for use in Cuba for the first time.

Some of the works in the portfolio can be highlighted. For the article on transfers of allegiance in athletics, we built a database of such transfers over the last 20 years, showing that this is a multi-causal problem that concerns the IAAF's top management. Another article, on electric motorbikes, drew on multiple sources of information, from interviews to the statistics Panama reports about visits by Cubans; we collected more than three years of advertisements published on the web and programmed different tools to process them and identify the relevant data.
We also built interactive visualizations covering geography, brands and prices, all to tell a story about a multidimensional topic. Other articles that stand out are those covering the Cuban electoral process. This is the first time the Cuban electoral process has been covered through data analysis. This process is unique in the world and cannot be analyzed the way other elections traditionally are. To cover it, across different pieces, we characterized the parliament, built a data application where each reader could analyze the parliament according to the criteria they cared about, and analyzed how the next Council of State of Cuba could be formed.

What makes this project innovative?

Postdata is the only journalistic organization doing data journalism in Cuba, so every time we publish a new article, a new visualization or a new database we put at society's disposal tools to understand and analyze its problems. We have also contributed in a practical and objective way to the necessary debate on access to information and the use of open data. In the same way, we have increased our engagement with the audience and addressed topics of public interest at moments of high news interest. In recent months we have covered issues of great impact for our readers, such as private work in Cuba or the elections that will culminate in April and change current Cuban history, since Raúl Castro has announced his retirement as President and the generation that has historically held power in Cuba is stepping down. None of these issues had been analyzed through data before, nor were there databases for others to use in their own analyses. We have created them and made them public. We also make all our work verifiable, explaining how it was done and what tools were used; this practice lets us maintain the credibility we have earned step by step. Equally important is our interest in experimenting: finding new narratives, new ways to analyze the data and new tools for designing and building visualizations.

What was the impact of your project? How did you measure it?

Since the emergence of Postdata we have had a significant social impact, because we offer journalistic analysis of real value together with interactive graphics. We also publish the databases we create, and we are based on GitHub, which lets anyone interested examine our working methodology. Postdata has had a high impact in the last year. First, we have become the reference for data journalism in Cuba, and we have also been recognized at the Latin American level. For this reason, the journalism school of the University of Havana asked us to design and teach a data journalism course for journalism students; the Postdata team, in this case directed by me, is responsible for the only university course on data journalism offered, so far, in Cuba. I have also given talks and lectures about Postdata and data journalism in different institutions and spaces. We have received various forms of recognition: one of our works was selected for presentation at the Latin American investigative journalism congress COLPIN 2017, in the panel "Investigations using data"; at the first Cuban congress Cybersociety 2017, in the solutions fair, Postdata won one of the first prizes for its contribution to society and its innovative use of data analysis and information technologies; and we were invited to contribute to the next edition of the Data Journalism Handbook. In this period several Cuban and foreign media have written reports about Postdata, emphasizing its voluntary nature, its constant innovation and its commitment to the use of public and open data. According to Google Analytics, visits to the site have increased, some high-impact articles receive constant traffic, and every new publication brings many more visits, which shows readers' interest in our articles.
Many of our texts have been cited by other media, and our journalistic tools and the databases we share have served as sources for other journalistic works.

Source and methodology

For the works we do there are some general guidelines: use public data, make each story verifiable, and give each story its own design. From there, the development of a story follows the data analysis. Often a database must be built first, because the data does not exist in structured form, so we begin by identifying the data sources. The data is then downloaded manually, or programs are written to download it automatically. Once we have the data, we define the structure of the databases, create them programmatically and, if necessary, keep them updated. With the databases in place, we carry out the analysis and, starting from the hypotheses that arise, obtain the insights that form the basis of the story. We then discuss how the story should be told, what other elements or journalistic techniques are needed, which visualizations could best accompany the work, and how we can innovate a little beyond what has been done before. Finally, after the visualizations are built and the reporting and interviewing are done, we write the piece and do the final packaging. Each published article explains where the data came from, how to obtain it and how the work was done.
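The database-building step described above can be sketched in a few lines of Python. This is a minimal illustration, not Postdata's actual code: the rows, fields and values are invented placeholders standing in for lines extracted from an unstructured source such as a PDF or a scraped web page.

```python
import json

# Hypothetical raw rows extracted from an unstructured source,
# e.g. lines pulled out of a PDF with pdftotext.
raw_rows = [
    "2017;athletics;transfer;Cuba",
    "2018;athletics;transfer;Qatar",
]

def build_database(rows):
    """Turn raw semicolon-separated lines into structured records."""
    records = []
    for row in rows:
        year, sport, event, country = row.split(";")
        records.append({
            "year": int(year),
            "sport": sport,
            "event": event,
            "country": country,
        })
    return records

db = build_database(raw_rows)
# Persist as JSON, the format the published stories consume.
print(json.dumps(db, ensure_ascii=False))
```

Once the records are structured like this, the analysis and visualization steps can work against a stable schema instead of the original messy source.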

Technologies Used

At Postdata we decided to use GitHub as our work platform, so all our articles, databases and tools are available for study and reuse by anyone interested; publishing is simply committing to our GitHub repository. We have always used JSON as the base format for the data our stories consume. To create databases it has been necessary to use tools such as pdftotext, to download web pages with wget or Scrapy, and to write our own Python programs to process the data. We often create CSV files as intermediate formats to be processed with Python or with LibreOffice. All data analysis is done in Python, using libraries such as NumPy, NLTK, scikit-learn and NetworkX, which have allowed us to do statistical processing, natural language processing, clustering and network analysis. All the web pages have been programmed by us using HTML5, CSS and JavaScript. jQuery is used for manipulating the HTML DOM. D3.js and C3.js are the main charting libraries, although ECharts has also been used, and jVectorMap and Google Maps serve for working with maps. For scrollytelling-style visualizations we use the Scrollama library. Other libraries used include Tooltipster, Flip.js and Horizon-swiper, among others.
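The CSV-to-JSON step mentioned above can be illustrated with a short Python sketch using only the standard library. The brand names and prices below are made-up placeholders, not real figures from the motorbike article; they simply show the shape of the conversion from an intermediate CSV to the JSON a story would load.

```python
import csv
import io
import json

# Hypothetical intermediate CSV, as might be produced after scraping
# advertisements; values are invented for illustration only.
csv_text = """brand,price,year
BrandA,1300,2017
BrandB,1500,2018
"""

# Read the CSV and coerce numeric columns to proper types.
reader = csv.DictReader(io.StringIO(csv_text))
records = [
    {"brand": row["brand"], "price": int(row["price"]), "year": int(row["year"])}
    for row in reader
]

# Emit JSON, the format the front-end visualizations consume.
print(json.dumps(records, ensure_ascii=False))
```

In the real workflow the CSV would come from a file on disk rather than an inline string, and the resulting JSON would be committed to the GitHub repository alongside the article.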


Additional links

