Project description

Postdata.club is the first and only media outlet doing data journalism in Cuba.
It is a journalistic organization that carries out its work
voluntarily, without any funding, in the free time of its three members. At Postdata.club I have worked on every journalistic piece we have produced. I have written some of them entirely on my own and, in others, I have collaborated on the writing.

I have also done the data analysis for every one of our stories, as well as all the programming, data visualization, data gathering and the structuring of the databases we use, build or create. I have even drawn by hand some visualizations that we published. All team members, myself included, participate in defining the ideas for the stories and how we want to tell them. After that, I am the one responsible for making them real.

During this year we have covered topics related to politics, the gender gap, the Cuban government, mass culture, football, inequality, internal migration, elections and more. We have conducted two extensive investigations: one on the constitutional process in Cuba and the other on internal migration in the country. I was in charge of both.

The data analysis process has been different for every piece: we tried to use the most appropriate tool in each case, and the same goes for the interactive visualizations we thought best suited each article. We place emphasis on experimenting with new ideas and forms of expression for presenting our work. That is why we have used what we call audio-telling, a chatbot, sonifications and AI algorithms. Also remarkable is the set of databases that had to be created from unstructured sources and that became available for use in Cuba for the first time. Equally important was the verification process applied to all the data we use, which allowed us to identify some errors in Cuba's official statistics and warn the authorities so they could clarify and fix them.

Some of the works that make up this portfolio are worth highlighting.
First, we provided full coverage, data-driven but not only, of the Constitutional Reform Process. We started this investigation and coverage by creating the only online news app that allowed Cuban citizens and others to submit proposals, with all the information related to the draft constitution well structured so it could be compared with the previous constitution, article by article, and the differences seen graphically. This app was used by many people, and even by institutions such as many of the diplomatic representations in and outside Cuba. The data we produced was used in many of the analyses of that work, as well as in journalistic products from other media.
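
To give a sense of the article-by-article comparison at the heart of that app, here is a minimal sketch in Python using difflib. The file names and JSON structure are assumptions made for illustration; the app's actual code lives in our GitHub repositories.

import json
import difflib

# Hypothetical input files: each maps an article label to its text.
# We do store our data as JSON, but the exact structure shown here
# is an assumption for this sketch.
with open("constitucion_1976.json", encoding="utf-8") as f:
    old_articles = json.load(f)
with open("proyecto_constitucion_2018.json", encoding="utf-8") as f:
    new_articles = json.load(f)

def compare_article(label):
    """Return a unified diff between the old and new text of one article."""
    old_text = old_articles.get(label, "").splitlines()
    new_text = new_articles.get(label, "").splitlines()
    return "\n".join(difflib.unified_diff(
        old_text, new_text,
        fromfile=f"1976 {label}", tofile=f"2018 draft {label}", lineterm=""))

for label in sorted(set(old_articles) | set(new_articles)):
    diff = compare_article(label)
    if diff:  # show only the articles that actually changed
        print(diff)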

This app was not the only thing I did related to the Constitutional Reform. We also analyzed the most controversial topics using a large amount of data from inside and outside Cuba (constitutions and laws from more than 200 countries, Cuban laws, media reports, etc.). Some of these topics were the presidential election, the vote of citizens living outside Cuba and same-sex marriage, among others.

When the discussions about the constitution finished, I analyzed all the proposals submitted through our news app as well as those submitted in the main Facebook group created for this purpose. Then, when the final text of the constitution was released, we analyzed all the differences with respect to the draft and to people's suggestions. Finally, when the constitution was approved in a referendum, we made a short analysis of the results and published an article that summed up our coverage of the process.

Another important investigation was the one on internal migration in Cuba. It was relevant because it was the most complete research done on this topic in Cuba, covering the 16 provinces. It was also important because it was a product of our teaching role: we involved in this research the students of the Data Journalism course we give to journalism students at the University of Havana. In the end we created a choral story made up of 16 stories (one for each province), which included 14 audio testimonies, more than 55 graphics including 16 interactive maps, and experiments with sonification: we created one for each province based on its internal and external migration rates.
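
The published sonifications were produced with Twotone.io (see Technologies Used below), but the underlying idea is simply mapping each province's migration rates onto pitch over time. A rough sketch of that mapping in Python, with invented rates and an assumed pentatonic scale, could look like this:

import math
import struct
import wave

# Hypothetical yearly net migration rates (per 1000 inhabitants) for one
# province; the real values come from our migration database.
rates = [-4.2, -3.8, -3.5, -2.9, -3.1, -2.6, -2.2, -1.9]

# Map each rate to a pitch: higher (less negative) rates -> higher notes.
scale = [261.63, 293.66, 329.63, 392.00, 440.00]  # C major pentatonic, in Hz
lo, hi = min(rates), max(rates)

def rate_to_freq(rate):
    """Linearly map a rate onto an index of the pentatonic scale."""
    position = (rate - lo) / (hi - lo) if hi != lo else 0.5
    return scale[round(position * (len(scale) - 1))]

sample_rate = 44100
note_seconds = 0.5
frames = bytearray()
for rate in rates:
    freq = rate_to_freq(rate)
    for i in range(int(sample_rate * note_seconds)):
        sample = 0.4 * math.sin(2 * math.pi * freq * i / sample_rate)
        frames += struct.pack("<h", int(sample * 32767))

with wave.open("provincia_sonificacion.wav", "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(sample_rate)
    wav.writeframes(bytes(frames))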

Also relevant was the data app we created to explore the differences between Cuba's municipalities. Here I used unsupervised learning algorithms to cluster municipalities by crossing the official statistics for each of them. Anyone can choose one or two of the five parameters we selected and see what distribution appears; people can even select between two different algorithms and decide whether or not to exclude the capital.
It was in this work that we introduced the idea of audio-telling.
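
A minimal sketch of the kind of clustering behind that app, using the scikit-learn library we list under Technologies Used. The indicator names, the input file and the choice of k-means and agglomerative clustering as the two selectable algorithms are assumptions for illustration, not the app's actual configuration:

import csv
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, AgglomerativeClustering

# Hypothetical input: one row per municipality with official indicators.
# Column names and the file itself are assumptions for this sketch.
FEATURES = ["salario_medio", "tasa_migracion", "poblacion_rural",
            "razon_dependencia", "poblacion_no_blanca"]

rows = list(csv.DictReader(open("municipios.csv", encoding="utf-8")))

def cluster(selected_features, algorithm="kmeans", exclude_capital=False, k=4):
    """Cluster municipalities on the chosen indicators."""
    data = [r for r in rows
            if not (exclude_capital and r["provincia"] == "La Habana")]
    X = np.array([[float(r[f]) for f in selected_features] for r in data])
    X = StandardScaler().fit_transform(X)  # put indicators on a comparable scale
    if algorithm == "kmeans":
        model = KMeans(n_clusters=k, n_init=10, random_state=0)
    else:
        model = AgglomerativeClustering(n_clusters=k)
    labels = model.fit_predict(X)
    return {r["municipio"]: int(label) for r, label in zip(data, labels)}

# Example: cluster by salary and migration only, excluding the capital.
print(cluster(["salario_medio", "tasa_migracion"], exclude_capital=True))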

I also created a chatbot with the first election prediction model to exist in Cuba. The model uses supervised learning trained on the history of previous elections.
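
The real model behind the chatbot is in our repositories; purely as an illustration of supervised learning on the history of previous elections, a sketch along these lines could be trained with scikit-learn (the feature names, the input file and the choice of logistic regression are assumptions):

import csv
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: one row per candidate in past elections,
# with a few numeric features and whether the candidate was elected.
FEATURES = ["edad", "anos_experiencia", "es_incumbente", "votos_previos_pct"]

rows = list(csv.DictReader(open("elecciones_historicas.csv", encoding="utf-8")))
X = np.array([[float(r[f]) for f in FEATURES] for r in rows])
y = np.array([int(r["electo"]) for r in rows])

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

def predict_odds(candidate):
    """Return the estimated probability that a candidate is elected."""
    x = np.array([[float(candidate[f]) for f in FEATURES]])
    return model.predict_proba(x)[0, 1]

# Example query, roughly as the chatbot would answer it.
print(predict_odds({"edad": 45, "anos_experiencia": 10,
                    "es_incumbente": 1, "votos_previos_pct": 62.0}))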

Other relevant articles published were those covering the gender gap in salaries and in government participation.

What makes this project innovative?

Postdata.club is the only journalistic organization that does data journalism in Cuba. This means that every time we publish a new article, a new visualization or a new database, we are putting at people's disposal tools to understand and analyze problems in our society. In addition, we have contributed in a practical and concrete way to the necessary debate on access to information and the use of open data. In the same way, we have increased our engagement with the audience and we have addressed topics of public interest at moments of high informative interest.

Throughout this year we have found new ways of telling stories and showing data to people, and we have experimented with many technologies for doing journalism. In the coverage of the Cuban elections, we used artificial intelligence to create a chatbot and the only prediction model for Cuban elections. This chatbot was able not only to "chat" but also to predict the odds in the Cuban elections and to find all available data about any candidate to the Parliament. We also recently used AI to find the gaps between Cuba's municipalities in salary, race, migration and the dependency ratio of non-working people; this way, we found that some zones of the country, mainly those with Black and mixed-race populations, get lower salaries than the touristic ones. Sonification was another resource we used; it was significant for showing the behavior of migration inside the country in a different way. We also made audio-telling: an interactive map that performs different actions at different moments according to what is said in the audio story. It gives the audience the opportunity to interact with the map live, or to see the events only when the audio indicates them.

Another key point is that we were the first site in Cuba doing data opening, publishing data that could be used not only by us: other media and institutions have used our data in different works and forms. We are firm in this purpose, not only creating data stories but also finding and creating databases and publishing and opening that data. It is important for us to maintain a balance between traditional ways of doing journalism, data opening, teaching other people how to do this work, data science and creating data apps. This combination is our innovative way of working.

What was the impact of your project? How did you measure it?

We have constantly increased our audience and raised awareness in Cuba about the importance of using data for decision making. We have also received important recognition. We won the General Excellence in Online Journalism Award for a micro newsroom at the Online Journalism Awards 2018. We were also finalists in the first edition of the RELE Awards for our efforts to use public data from the government and to open that data; in connection with those awards, the OAS mentioned us for the work we are doing in a scenario where data is difficult to find. Postdata.club team members published a chapter in the Data Journalism Handbook (2nd edition) about searching for data in a country where little data is available; I was one of the authors. Also, thanks to the importance of our work, the School of Communication of the University of Havana created a space for young journalists, who learn data journalism with us. Our team was invited, for the second consecutive year, to speak at the Latin American Investigative Journalism Conference (COLPIN 2018) to share the experience of covering how political power behaved through that year's elections.

Our chatbot, named Amanda, was not only widely used but also mentioned by media around the world as the election-predictor bot. The work about executive power showed the gender gap in the low presence of women at the head of ministries and other important institutions. Five doodles were shared on social media and praised by many journalists at COLPIN; it was one of our most beloved works because they were handmade and proved that data journalism can be done with markers and some imagination.

The coverage of the constitutional process was one of our highest-impact works. The tool we created for comparing the old constitution and the draft, now approved (http://www.postdata.club/issues/201808/proyecto-constitucion.html), was shared not only by people ahead of the official debate, but also by Cuban embassies overseas, government officials and diplomatic representations in the country. The articles, together with the tool, helped people know what kind of requests to make to the Constitutional Committee, so we contributed to making people's decisions better informed. Another important point is that the databases and app we built and published were used in other interesting, related works by different media. Our tool was used by the diplomatic representations in Cuba, and some of those representatives were in contact with Postdata.club to learn how we covered and thought about the constitutional process.

The Bohemian Rhapsody effect piece had a high impact on social media because of all the cultural references used in its publication. Counting internal migration in Cuba, one of our recent works, also had a huge impact, not only because of the reads on the website but because of the problem it helped reveal: people are moving to Havana or to touristic places to live. Over a year we collected 16 life stories and monitored the migration statistics. This was the result of the 2017-2018 Data Journalism course at the University of Havana, so it has a deep educational component. We also used sonification for the first time in Postdata.club's history, an unprecedented milestone in Cuban journalism. The final work has been the most important in terms of impact: it is about finding the gaps between people in Cuba using parameters like race, salary, age ratio, internal migration and rural population.
Here again we used AI algorithms and introduced the idea of audio-telling. Both were celebrated by the community, including people and organizations related to data journalism as well as to local development. Within two days of publication this work became the most visited and most shared article on social networks in Postdata.club's history. In recent months, many articles related to data journalism have mentioned the work Postdata.club and I have been doing in this area.

Source and methodology

We follow some general guidelines: use public data, make sure every story can be verified, and give each story its own design. The first thing we do is find the data. Sometimes, though rarely, we are lucky and find the data already complete, but most of the time we have to create our own databases, manually or using computational tools that we also build. Next comes data verification, for which we cross different sources to check all the information. After that we start the data analysis process, where we discuss the main ideas and the use of different tools such as AI algorithms, natural language processing, numerical approaches, social network analysis, etc. Then, based on the hypotheses that arise, we get the insights that will be the core of the story. Later we discuss how the story should be told, what other elements or journalistic techniques are needed, what visualizations can best accompany the work, and how we can innovate a little beyond what has been done before. Finally, after the visualizations are built and the reporting and interviewing are done, the writing of the piece and its final packaging follow.

The published article tells readers where the data came from, how to obtain it and how the work was done, and we open the data we have used. Besides that, since we are based on GitHub, anyone can see our data, our code and our history at any moment. It is also important that we check how the article displays in different formats and devices (PC, tablet, laptop) and make the adjustments needed for it to look good on each of them. We also try to have all the information an article needs downloaded at the beginning, so that once the work is open no further connection is required and the graphs and interactions can be used offline; this is very important given the connectivity conditions in Cuba. It is not always possible to accomplish, but we always try when we can.
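
As one concrete illustration of the verification step described above, here is a minimal sketch of cross-checking the same figure across two sources; the file names, column names and tolerance are assumptions, not our actual scripts:

import csv

# Hypothetical exports of the same indicator from two different sources,
# e.g. a statistical yearbook and a ministry report; names are assumptions.
def load(path, key_col, value_col):
    with open(path, encoding="utf-8") as f:
        return {row[key_col]: float(row[value_col]) for row in csv.DictReader(f)}

yearbook = load("anuario_onei.csv", "provincia", "poblacion")
ministry = load("informe_ministerio.csv", "provincia", "poblacion")

# Flag every figure where the two sources disagree beyond a small tolerance.
TOLERANCE = 0.005  # 0.5 % relative difference
for province in sorted(set(yearbook) | set(ministry)):
    a, b = yearbook.get(province), ministry.get(province)
    if a is None or b is None:
        print(f"{province}: missing in one source")
    elif abs(a - b) / max(a, b) > TOLERANCE:
        print(f"{province}: yearbook={a:.0f} ministry={b:.0f} -> check manually")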

Technologies Used

We use GitHub as our work platform, so all our articles, databases and tools are available for study and use by whoever is interested; publishing is simply committing to our GitHub repository. We have always used the JSON format as the basis for the data our stories use. To create databases it has been necessary to use tools such as pdftotext, to download web pages with wget or Scrapy, and to write our own Python programs to process the data. Sometimes we use SQL and the SQLite client to get data out of traditional databases, and many times we have created CSV files as intermediate formats to be processed with Python or LibreOffice. All the data analysis is done in Python; libraries such as NumPy, NLTK, scikit-learn, spaCy, Keras and NetworkX have allowed us to do statistical processing, natural language processing, clustering, supervised learning and network analysis. All the web pages are programmed by us using HTML5, CSS and JavaScript. jQuery is used for manipulating the HTML DOM. Likewise, D3.js and C3.js are the main libraries used for graphics, although ECharts.js has also been used, and jVectorMap and Leaflet.js for working with maps, along with Google Maps for geographic localization and verification. We used SoundCloud and its Widget API for working with sounds, and Twotone.io for sonifications. Other libraries used include Tooltipster, Horizon-swiper, Simptip and Bootstrap-treeview, among others. For text search in the data apps and text interactions we used the Lunr JavaScript library and also Fuse.js.
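
To make that pipeline concrete, here is a minimal sketch going from a pdftotext dump to an intermediate CSV and then to the JSON a story would load. The file names, the column layout of the assumed PDF table and the regular expression are illustrative assumptions, not code from our repositories:

import csv
import json
import re
import subprocess

# Step 1: extract raw text from a (hypothetical) official PDF report.
# pdftotext's -layout flag keeps table columns roughly aligned.
subprocess.run(["pdftotext", "-layout", "informe_oficial.pdf", "informe.txt"],
               check=True)

# Step 2: parse lines that look like "<province name>  <number>  <number>".
# The pattern is an assumption about this particular table.
pattern = re.compile(
    r"^(?P<nombre>[A-Za-zÁÉÍÓÚáéíóúñÑ .]+?)\s+(?P<poblacion>[\d ]+)\s+(?P<salario>[\d.]+)\s*$")
records = []
with open("informe.txt", encoding="utf-8") as f:
    for line in f:
        m = pattern.match(line)
        if m:
            records.append({
                "provincia": m.group("nombre").strip(),
                "poblacion": int(m.group("poblacion").replace(" ", "")),
                "salario_medio": float(m.group("salario")),
            })

# Step 3: keep a CSV as the intermediate, human-checkable format...
with open("informe.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["provincia", "poblacion", "salario_medio"])
    writer.writeheader()
    writer.writerows(records)

# ...and publish the JSON that the story's visualization will load.
with open("informe.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)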

Project members

Yudivián Almeida, Saimi Reyes, Ernesto Guerra

Link

Additional links
