DW Data is the data-driven editorial unit of Deutsche Welle (DW), Germany’s public international broadcaster. We are part of the Design Department but function as an editorial unit, working independently on data-driven stories and collaborating with different specialized editorial or regional desks. Even though DW is a large and long-standing media organization, DW Data is comparatively young (founded in June 2017): starting as a team of two, we had grown to four team members by 2019 (one of us full time). Editorially, we focus on original and investigative stories across a variety of topics. DW addresses people all over the world, which allows us to tell data-driven stories on a regional, international and global scale. Our outlet publishes online content in 30 languages. This allows DW Data to reach not only English- and German-speaking audiences, but also audiences fluent in Hindi, Persian, Greek and Portuguese, for example. Above all, the true value of being part of an organization with that many editorial desks is the chance to bring data to life through different (cultural) perspectives.
Although the organization offers many opportunities, it naturally also comes with certain restrictions and a complex infrastructure. In 2018/19, DW Data established itself within the organization and developed more and more routines that make us more efficient, which is reflected in our output displayed on dw.com/data. With every piece we try to explore new methodologies, cover new topics or explore new visual forms to expand the DW chart portfolio. For now, we mostly focus on static visualizations, not only to circumvent technological pitfalls but also to fully pursue a mobile-first strategy. In addition, we experiment with GIFs and video explainers based on our investigations. Apart from the efforts visible from the outside, DW Data has strengthened in-house collaboration, building bridges between different departments within DW, and has started training workshops and knowledge-sharing sessions for other DW journalists, as well as supporting external trainings offered by DW’s own academy (DW Akademie).
What makes this project innovative?
Working for DW brings the advantage and challenge of dealing with diversity. We have a diverse target audience distributed across the globe; we have the opportunity to publish in 30 different languages and address topics ranging from environment to sport and culture. We addressed these challenges by establishing workflows and evaluation methods dedicated to catering to this diversity. For every piece we ask how we might set it up so that it can be tailored to different target groups. Usually this leads to a master article in English that is offered to different language desks for personalization and/or regionalization. Furthermore, our team's efforts stand out in bringing together people from different departments, joining their efforts, overcoming different understandings and setting foot in another department's "world". This eventually leads to stronger networks and ties within our organization and supports a more collaborative culture. In addition, our work pushes for developing methods to measure "quality": evaluating not only quantitative measures but also making non-quantifiable measures of success count.
What was the impact of your project? How did you measure it?
Our goal is to find stories that have not been told before. We align our efforts with DW’s general goal of producing stories for a global audience, focusing on topics such as environment and migration, among others. Since DW is a public broadcaster, we don’t look at metrics in the classical revenue-driven way, where success is mostly measured quantitatively. Since the impact of data-driven journalism is largely qualitative, we tried to find quantifiable proxies that would measure quality. Generally, we take into account three different measures of quality: (1) Dwell time: rather than on visits, we focus on the time users spend reading our stories. On average, the dwell time of data-driven stories on dw.com is more than three times the average dwell time of DW online content. (2) Adaptation of our data-driven stories by DW's 30 language desks: it speaks to the compatibility and relevance of our stories that other teams deem them worthy of the time spent on translation and publication. If these efforts did not pay off for their language programs, they would not continue to invest time and resources in them. Instead, we see increased in-house interest in our team's work. (3) Social media echo: we monitor who likes, comments on and shares the corresponding posts. We see not only that our stories are recognized by the DDJ community (e.g. a mention in GIJN’s Data Journalism Top 10), but, more importantly, that our stories reach the target groups we identify as stakeholders for the individual stories. For instance, our piece on the real cost of travel in Europe (train vs. plane) was shared by environmental activists in Europe, whereas the piece on climate risk insurance got more exposure in Africa.
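As a rough illustration of the first measure, the dwell-time comparison could be sketched in Python as follows. The story records, field names and numbers are hypothetical placeholders, not DW analytics data; the sketch only shows one way such a ratio might be computed.

```python
from statistics import mean

# Hypothetical dwell times in seconds; these are NOT real DW analytics figures.
stories = [
    {"title": "data story A", "is_data_story": True, "dwell_seconds": 240},
    {"title": "data story B", "is_data_story": True, "dwell_seconds": 300},
    {"title": "news item C", "is_data_story": False, "dwell_seconds": 80},
    {"title": "news item D", "is_data_story": False, "dwell_seconds": 100},
]


def dwell_time_ratio(records):
    """Mean dwell time of data stories divided by the mean over all content."""
    data_mean = mean(r["dwell_seconds"] for r in records if r["is_data_story"])
    overall_mean = mean(r["dwell_seconds"] for r in records)
    return data_mean / overall_mean


ratio = dwell_time_ratio(stories)
print(f"Data stories hold readers {ratio:.2f}x as long as the average piece")
```

A ratio above 1 means data-driven stories are read for longer than the average piece of content in the sample.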
Source and methodology
The DW Data approach to data-driven stories is derived from academic research. At the very beginning of each project we start with a hypothesis and develop research questions in order to prove or disprove it. To answer them, we use different methodologies. Although the DW Data team works independently on stories, we usually team up with the corresponding desks. The collaboration can be very close, with a journalist specialized in the subject exchanging intensively with us about the findings of the data analysis (e.g. Climate Insurance). In other cases we, the data journalists, process, clean, interpret and visualize the data and write the journalistic piece ourselves. In order to verify the findings and ensure quality, we double-check each piece within our team and with the specialized editorial desks. Since our data-driven investigations tackle a variety of topics, the methodologies and sources differ as well and are always adjusted to the needs of a given story idea. Thus we draw on data from various sources, ranging from databases like Eurostat or UNHCR and other structured open data to reports (e.g. by the German Federal Foreign Ministry) and crowdsourced data, as well as data compiled by individuals such as Richard Brunner (see the Hollywood stereotype story for more detail). Since not all data comes in ready-made spreadsheets, a great deal of our work is liberating it from websites or documents with the help of scraping methods. To date, almost every project has involved scraping data to some extent: among others, we scraped data from the movie tropes wiki TvTropes.org, looked up prices of airline tickets on Google Flights and rail tickets on Trainline, and collected data from various football sites. Our methods are as diverse as our data sources: working on a story on the representation of minorities in Hollywood movies, for example, we wrangled user-generated data, normalized it and applied pattern recognition methods.
In order to find another angle on the football World Cup, we developed a customized score allowing us to measure how different football leagues compare. Even though our methods might differ from story to story, one element is key to all of them: making our sources and methods transparent and reproducible. This not only helps us to ensure quality and avoid mistakes, it also allows others to better comprehend each step that led us to our findings and conclusions, thereby adding credibility to our journalistic work. All data and code behind our stories can be found at https://github.com/dw-data
We have the great advantage that, thanks to the individual skill sets of our team members, we can cover a wide range of technologies: we work with Excel, R, Python, SQL and d3.js. We are adept at scraping with various libraries and at using mapping software like Mapbox or QGIS for geospatial analysis. We also use tools like Tabula and ABBYY FineReader for liberating data, and OpenRefine or regular expressions for data cleaning. Another recurring tool is Jupyter Notebook, which we use to document the process. When it comes to visualizing our findings for the end user, we use different libraries in Python (e.g. Matplotlib and Seaborn) and R (e.g. ggplot2) for preliminary visualizations and finalize them with Adobe Illustrator. At the end of each data piece we feed our stories into the editorial planning system (OpenMedia) in order to offer them to the different language desks for adaptation. In order to be transparent, we publish the data sources, analysis and methodology on GitHub (github.com/dw-data).
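To illustrate the kind of data cleaning with regular expressions mentioned above, here is a minimal Python sketch that turns scraped price strings into numbers. The input strings, the `parse_price` helper and the assumed number formats are hypothetical and chosen for illustration only; they are not taken from the DW Data repositories.

```python
import re


def parse_price(raw):
    """Convert a scraped price string to a float, or None if it holds no digits.

    Assumption: a final '.' or ',' followed by one or two digits is the
    decimal mark; every other separator is a thousands separator.
    """
    # Keep only digits and separators, then drop stray leading/trailing separators.
    cleaned = re.sub(r"[^\d.,]", "", raw).strip(".,")
    if not cleaned:
        return None
    match = re.fullmatch(r"(.*?)(?:[.,](\d{1,2}))?", cleaned)
    integer_part = re.sub(r"[.,]", "", match.group(1))
    decimal_part = match.group(2) or "0"
    return float(f"{integer_part}.{decimal_part}")


# Hypothetical examples of inconsistently formatted scraped values.
for raw in ["€ 89,90", "EUR 1.234,50", "$1,234.50", "n/a"]:
    print(repr(raw), "->", parse_price(raw))
```

The decimal-mark heuristic is deliberately simple and will misread some values (for example, a thousands group written with only two digits), so real data usually needs source-specific rules.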
Gianna-Carina Grün, Kira Schacht, Tom Wills, Eva Lopez, Daniel Pelz, Ayu Purwaningsih, Shitao Li, Rodion Ebbighausen, Ruby Russell