Project description

Postdata.club is the first and only data journalism website in Cuba. It has only three members, who work in their free time and without any funding. Over the last year our team has focused on new topics for doing journalism that matters, using open, public and accessible data. In some cases we obtained the data from existing data sets, and in others we built databases from unstructured data.

Cuba also went through a historic change of power, with a new generation assuming the leadership of the country, and a draft of a new constitution was debated and then approved in February 2019. We covered that political situation with data and followed the whole process.

For the constitution to be approved, people's opinions had to be discussed, and at Postdata we created the only data app in Cuba built for this purpose. It took the form of a digital space that included a tool for comparing the draft with the previous constitution article by article. It gave Cubans, inside and outside the country, the opportunity to comment on each article separately while also reading the opinions of experts who joined the debate in the app. In this way people could attend the debates and finally decide their vote taking into account the angles that emerged from the analysis provided by Postdata.club. More than 700 actions related to the draft constitution were proposed using this tool.

To accompany the debate on the Constitution, in addition to the comparison tool, we wrote data journalism articles on specific topics, such as same-sex marriage, how long presidents remain in power, remote voting and how constitutions are rewritten in other countries. We also took into account the discussions in a Facebook group with 2,377 members and more than 11,000 comments on the subject of the Constitution.

We also produced a work on Cuban internal migration that was very well received. It was the result of the Data Journalism course that our team teaches to a group of journalism students at the University of Havana. Together, the Postdata team and the students, under our guidance, produced the most complete work on Cuban internal migration, with 16 stories (one for each Cuban province), 14 audio testimonies and more than 55 interactive graphs, including 16 maps. Here we also experimented with sonification, creating 16 sonifications, one for each province, based on internal and external migration rates.
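As a rough illustration of the sonification idea, the sketch below maps a series of migration rates onto MIDI note numbers so that higher rates sound higher. The rates, the pitch range and the function names are assumptions made for this example; they are not taken from the published work, which may use a different mapping.

```python
def rates_to_midi_notes(rates, low_note=48, high_note=84):
    """Linearly rescale numeric rates to integer MIDI note numbers."""
    if not rates:
        return []
    lo, hi = min(rates), max(rates)
    if hi == lo:
        # A flat series has no contour; play the middle of the range.
        return [(low_note + high_note) // 2] * len(rates)
    span = high_note - low_note
    return [low_note + round((r - lo) / (hi - lo) * span) for r in rates]


def midi_to_frequency(note):
    """Convert a MIDI note number to a frequency in Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


# Hypothetical net migration rates for one province, one value per year.
example_rates = [-4.0, -2.0, 0.0, 2.0, 5.0]
notes = rates_to_midi_notes(example_rates)
frequencies = [round(midi_to_frequency(n), 1) for n in notes]
```

The resulting note list can be handed to any synthesizer or audio library; a linear mapping is only one option, and a musical scale could be used instead to make the result more pleasant to hear.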

Another interesting work was a data app to identify the differences between municipalities in Cuba. For this we used Artificial Intelligence algorithms to identify, without bias, groups of municipalities by crossing different characteristics reported by official institutions. In this work we also introduced the idea of audio-telling: an audio story plays while interactive animations show our insights and how to use the tool, and at any moment users can interact with the app to explore the concepts and ideas that the audio describes.
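The grouping step can be pictured with a small clustering sketch. The source does not say which algorithm was used, so plain k-means is swapped in here as an assumption, written with the standard library only. The municipality names and indicator values are invented for the example.

```python
import random


def min_max_scale(rows):
    """Rescale every column to [0, 1] so no indicator dominates the distance."""
    scaled_columns = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        width = (hi - lo) or 1.0  # avoid dividing by zero on constant columns
        scaled_columns.append([(v - lo) / width for v in col])
    return [list(row) for row in zip(*scaled_columns)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100, seed=0):
    """Plain k-means: returns one cluster label per input point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster emptied out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Invented indicators: average salary, net migration rate, rural share.
municipalities = ["A", "B", "C", "D", "E", "F"]
indicators = [
    [520.0, -6.1, 0.62],
    [540.0, -5.4, 0.58],
    [530.0, -7.0, 0.65],
    [910.0, 8.3, 0.05],
    [880.0, 7.1, 0.09],
    [930.0, 9.0, 0.03],
]
labels = k_means(min_max_scale(indicators), k=2)
groups = dict(zip(municipalities, labels))
```

Scaling the indicators first matters because salaries and rates are measured in very different units; without it the salary column would decide the groups almost alone.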

Also related to Artificial Intelligence, we created a chatbot to predict the results of the last Cuban elections. It is the only model presented in Cuba so far to predict election results, and it was built with supervised learning algorithms that use past election results to create a classification model. Readers interact with this model by talking to Amanda, our chatbot.
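To make the idea of a classification model trained on past results more concrete, here is a minimal sketch of one supervised method, k-nearest neighbours. It is an assumption chosen for brevity, not the model behind Amanda. The feature columns and class names are placeholders, because the source does not say which variables or classes the model used.

```python
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Predict a label by majority vote among the k closest training rows."""
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, query)), label)
        for row, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Placeholder training data: two numeric features per past record,
# already scaled to the range [0, 1], with a known outcome class.
past_features = [
    [0.90, 0.80], [0.85, 0.70], [0.95, 0.90],
    [0.20, 0.10], [0.15, 0.30], [0.10, 0.20],
]
past_labels = ["class_a", "class_a", "class_a",
               "class_b", "class_b", "class_b"]

prediction_high = knn_predict(past_features, past_labels, [0.80, 0.75])
prediction_low = knn_predict(past_features, past_labels, [0.20, 0.20])
```

A chatbot front end would only need to turn a reader's question into a feature row and report the predicted class back in plain language.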

We even tried to show that people can produce data journalism with only a pencil and colors. So we drew data visualizations by hand and published them as works. Many people were amazed by the effectiveness of these simple ideas.

We also covered other issues, such as gender gaps, the effect on mass culture of movies like Bohemian Rhapsody, the football World Cup and the composition of the ministries.

What makes this project innovative?

Postdata.club is the only journalistic organization doing data journalism in Cuba, so every time we publish a new article, a new visualization or a new database, we put at people's disposal tools to understand and analyze the problems of society. In addition, we have contributed in a practical and objective way to the necessary debate on access to information and the use of open data. We have also strengthened our commitment to the audience and addressed topics of public interest at the moments when they were in the news.

Throughout this year we have found new ways of telling stories and showing data to people, and we have experimented with many technologies for doing journalism. In the coverage of the Cuban elections, we used Artificial Intelligence to create a chatbot and the only prediction model for Cuban elections. This chatbot was able not only to "chat" but also to predict the odds for the Cuban elections and to find all available data about any candidate to the Parliament. AI was also used recently to find the gaps between the municipalities of Cuba in salary, race, migration and the dependency of non-working people. This way, we found that some areas of the country, mainly those with Black and mixed-race populations, have lower salaries than the tourist areas.

Sonification was another resource we used. It was significant for showing the behavior of migration inside the country in a different way. We also made audio-telling: an interactive map that performs different actions at different moments according to what is said in the audio story. It gives the opportunity to interact with the map live or to see the events only when the audio indicates. It is important for us to maintain a balance between traditional ways of doing journalism, opening data, teaching other people how we work, data science and creating data apps. This combination is our innovative way of working.

What was the impact of your project? How did you measure it?

We have steadily increased our audience and raised awareness in Cuba of the importance of using data for decision making. We have also received important recognition. Last year we won the General Excellence in Online Journalism Award for micro newsrooms at the 2018 Online Journalism Awards. We were also shortlisted in the first edition of the RELE Awards for our efforts to use public data from the government and to open that data. Postdata.club team members published a chapter in the second edition of the Data Journalism Handbook, about searching for data in a country where little data is available. Thanks to the importance of our work, the School of Communication of the University of Havana created a space for young journalists, who learn data journalism with us. For the second consecutive year our team was invited to speak at the Latin American Conference on Investigative Journalism (COLPIN 2018), to share our experience of covering how political power behaved during that year's elections.

Our chatbot, Amanda, was not only widely used but also mentioned by media around the world as the election predictor bot. The work on executive power exposed the gender gap through the low presence of women at the head of ministries and other important institutions. Five doodles were shared on social media and praised by many journalists at COLPIN. They are among our best-loved works because they were handmade and proved that data journalism can be done with markers and some imagination.

The coverage of the constitutional process was one of our most impactful works. The tool we created (http://www.postdata.club/issues/201808/proyecto-constitucion.html) for comparing the old constitution with the draft, which has since been approved, was shared not only by people before the official debate but also by Cuban embassies overseas, government officials and diplomatic missions in the country. Some of those diplomatic representatives contacted Postdata.club to learn how we were covering and thinking about the constitutional process. The articles, together with the tool, helped people know what kind of requests to make to the Constitutional Committee, so we contributed to people making more informed decisions. Another important result was that the databases and the app we built and published were used in other interesting, related works by different media.

The Bohemian Rhapsody effect piece had a high impact on social media because of the many cultural references used in its publication. Our recent work counting migration inside Cuba had a huge impact, not only because of the number of reads on the website but because of the problem it helped to show: people are moving to Havana or to tourist areas to live. Over a year we collected 16 life stories and monitored the migration statistics. This was the result of the 2017-2018 Data Journalism course at the University of Havana, so it has a strong educational component. We also used sonification for the first time in Postdata.club's history, an unprecedented milestone in Cuban journalism.

Our latest work has had the greatest impact. It is about finding the gaps between people in Cuba using parameters such as race, salary, age ratio, internal migration and rural population. Here again we used AI algorithms and introduced the idea of audio-telling. Both were celebrated by the community, including people and organizations related to data journalism and to local development. Within two days of publication it became the most visited and most shared article on social networks in Postdata.club's history. Most recently, one of our team members was selected as a fellow for the 2019 Chequeado Programme for Fact Checking, a line of work we are starting to develop in Cuba.

Source and methodology

We follow some general guidelines: we use public data, every story can be verified, and each story has its own design. The first thing we do is find the data. Occasionally we are lucky and find the complete data, but most of the time we have to create our own databases, either manually or with computational tools that we also build. Next comes data verification, for which we cross different sources to verify all the information. After that we start the data analysis, where we discuss the main ideas and use different tools such as AI algorithms, Natural Language Processing tools, numeric approaches and social network analysis. Then, based on the hypotheses that arise, we obtain the insights that will be the core of the stories.

Later we discuss how the story should be told, what other elements or journalistic techniques are needed, which visualizations would best accompany the work and how we can innovate a little beyond what has been done before. Finally, once the visualizations are built, we do the reporting and interviewing, then the writing of the work and its final packaging. The published article tells you where the data came from, how to obtain it and how the work was done.

It is also important that we check how the article is displayed in different formats and on different devices (PC, tablet, laptop) and try to make the adjustments needed for it to look good on each of them. We also try to have all the information the article needs downloaded at the beginning, so that once you open the work no further connection is required and you can use the graphs and interactions offline. This is very important given the connectivity conditions in Cuba. It is not always possible to accomplish, but we always try to do it when we can.
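The verification step, crossing different sources, can be sketched as a simple comparison that flags what needs manual review. The function name, the two sources and the figures below are invented for illustration and are not part of the original workflow.

```python
def cross_check(source_a, source_b, tolerance=0.0):
    """Compare two {key: number} mappings and report what needs manual review."""
    report = {"match": [], "mismatch": [], "missing": []}
    for key in sorted(set(source_a) | set(source_b)):
        if key not in source_a or key not in source_b:
            report["missing"].append(key)  # present in only one source
        elif abs(source_a[key] - source_b[key]) <= tolerance:
            report["match"].append(key)
        else:
            report["mismatch"].append(key)
    return report


# Hypothetical population figures for the same municipalities from two sources.
yearbook = {"Municipality A": 52000, "Municipality B": 31000, "Municipality C": 18500}
scraped = {"Municipality A": 52000, "Municipality B": 30100, "Municipality D": 7700}

report = cross_check(yearbook, scraped)
```

Entries under "mismatch" and "missing" are the ones a journalist would go back and check by hand against a third source.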

Technologies Used

We use GitHub as our work platform, so all our articles, databases and tools are available for study and use by whoever is interested. In this way, publishing is simply committing to our GitHub repository. We have always used the JSON format as the basis for the data that our stories use. To create databases we have needed tools such as pdftotext, downloaded web pages with wget or Scrapy, or written our own Python programs to process the data. Sometimes we use SQL and the SQLite client to get data from traditional databases. Many times we have created CSV files as intermediate formats to be processed with Python or with LibreOffice.

All the data analysis is done in Python. Libraries such as NumPy, NLTK, scikit-learn, spaCy, Keras and NetworkX have been used. These have allowed us to do statistical processing, natural language processing, clustering, supervised learning and network analysis.

We have programmed all the web pages ourselves using HTML5, CSS and JavaScript. jQuery has been used for processing the HTML DOM. D3.js and C3.js are the main libraries used for graphics, although Echarts.js has also been used. Jvectormap and Leafletjs have been used for working with maps, and Google Maps for geographical localization and verification. We used SoundCloud and its Widget API for working with sounds, and Twotone.io for sonifications. Other libraries used include Tooltipster, Horizon-swiper, Simptip and Bootstrap-treeview, among others. For text searching and text interactions in the data apps we used the Lunr JavaScript library and also Fusejs.
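The path from a traditional database to a CSV intermediate file and then to the JSON used by a story might look like the sketch below, written with Python's standard library. The table, column names and values are hypothetical and only stand in for a real source.

```python
import csv
import io
import json
import sqlite3


def query_to_csv(connection, sql):
    """Run a query and return its result as CSV text with a header row."""
    cursor = connection.execute(sql)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor.fetchall())
    return buffer.getvalue()


def csv_to_json(csv_text, numeric_fields=()):
    """Turn CSV text into a JSON array of objects, casting selected fields to float."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for field in numeric_fields:
            row[field] = float(row[field])
        records.append(row)
    return json.dumps(records, ensure_ascii=False, indent=2)


# Hypothetical table standing in for a traditional database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE migration (province TEXT, net_rate REAL)")
connection.executemany(
    "INSERT INTO migration VALUES (?, ?)",
    [("Province A", -3.2), ("Province B", 4.5)],
)

csv_text = query_to_csv(
    connection, "SELECT province, net_rate FROM migration ORDER BY province"
)
json_text = csv_to_json(csv_text, numeric_fields=["net_rate"])
connection.close()
```

Keeping the CSV as a separate step means the same intermediate file can also be opened and checked in a spreadsheet before the JSON is committed.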

Project members

Yudivián Almeida, Saimi Reyes
