I am a 25-year-old data journalist based in Germany and work for the Berliner Morgenpost. I was the Interactive Team’s data trainee for a year and took over from Julius Tröger as team leader this April. Currently, I am finishing my Bachelor’s program at TU Dortmund, where I am studying Science Journalism with a focus on Data Journalism. I am also the co-founder and CEO of Journocode UG, an organization of young journalists, designers and programmers who have made it their business to impart data-driven skills. Together with the other Journocoders, I give workshops on data journalism; we organized a one-day data journalism conference in March 2018 and published projects such as the Data-driven Advent Calendar last December (https://advent17.journocode.com/).
As the Berliner Morgenpost’s Interactive Team trainee, I worked on some great data projects this past year. With each project I kept learning and improving my skills. As the team’s data wrangler, I was mainly responsible for data sourcing, cleaning and analysis in our projects. What I enjoy most are the projects in which we try something completely new, be it a new technique or a new type of visualization; after all, that is when the challenge is greatest. I also really enjoy data cleaning, even if it often takes hours. I know that liking the cleaning part is a rather unpopular opinion, but what appeals to me is that I already know how the cleaned data should be structured in the end and then have to find the smartest way to get there. I am also interested in the other areas of data journalism, where I like to look over my brilliant colleagues’ shoulders.
I chose the attached projects for my submission because they show different approaches to creating very different pieces of data journalism, and because I worked very intensively on all of them, from the very beginning of the idea, through methodological structuring, to implementation and publication. Sometimes I was responsible for data cleaning, sometimes for analysis, sometimes even for visualization. Of course, each of our projects is a product of teamwork, and none of these projects would have been possible without the other members of my team. The chosen projects also show the variety of topics I worked on this past year. Since 2017 was an election year in Germany, three of the projects were shaped by the election; nevertheless, the approaches and forms of presentation we chose differ. We strive to be innovative and surprising in every project, yet always choose the form of presentation that lets readers grasp the context as quickly as possible. As the new team leader, I am very much looking forward to continuing my work with the team.
What makes this project innovative?
Live Link: The sound of the SPD's steep fall. You can find a detailed description in our DJA18 submission of this project: https://bit.ly/2upTevM.

Link 1: Where women earn more than men. On the subject of the gender pay gap, I analyzed and visualized income data. The project shows that women out-earn men in only a few professional groups and how the former division of Germany still affects the gender pay gap today. Screenshots with annotations in English: https://bit.ly/2GpUQe5.

Link 2: Not Berlin - Munich is the capital of the new arrivals. We were very surprised by some of the results when we compared various data on the five biggest cities in Germany. The project shows that many clichés about Berlin are wrong. For example, Berlin is considered the German city with the most foreigners, but Munich actually leads that ranking. Screenshots with annotations: https://bit.ly/2GvO2vg.

Link 3: This is how close the parties are - and how close their candidates are. In preparation for the 2017 federal election, I analyzed the German parties' answers to questions on social and political issues and calculated the similarity between all parties. In the interactive chart, users can select a party to compare with the others, which are then sorted by how closely their answers match those of the selected party. Screenshots with annotations: https://bit.ly/2DUdtS0.

Link 4: How Berlin ticks along your line. Our DJA18 submission: https://bit.ly/2ulaqTf.

Link 5: Enduring party heartlands: Where the parties can count on their voters. Our DJA18 submission: https://bit.ly/2pEQB4E.

Link 6: How Germany voted – the federal election in infographics. During election night we worked on several projects simultaneously, including live maps. I was mainly responsible for the analysis of the election results and wrote an R script for this purpose, which you can find here: https://bit.ly/2pIggbX. Our DJA18 submission: https://bit.ly/2IVKMaO.
What was the impact of your project? How did you measure it?
We are a local medium and our projects deal with topics of particular interest to Berliners, though this sometimes includes national issues such as the federal election. The impact depends on the topic and the target audience, but our projects generally achieve very good visibility and their interactive elements are heavily used. We make sure not to hide any functions and always use the best visualization method, so that readers can understand everything at a glance while our methods remain completely sound. In return, we receive very good feedback from our readers. We also publish methodological statements and, where appropriate, the code of my analyses and plots. The published code for my election analysis was even reused by the newspaper "Der Standard" to cover the Austrian elections, which we really appreciated.
Source and methodology
For most projects mentioned above, we have requested the data from federal departments. We scraped the poll data from a German poll data platform, and obtained the election data via an API.
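As an illustration, the scraping step could look something like the following sketch in R with the rvest package. The HTML snippet and column names here are invented stand-ins for the poll platform's actual markup, which is not part of this document; this is a sketch of the pattern, not the project's real code.

```r
# A minimal, self-contained sketch of the scraping step, assuming the
# rvest package. The HTML snippet is an invented stand-in for the poll
# platform's real page -- names and structure are illustrative only.
library(rvest)

page <- minimal_html('
  <table>
    <tr><th>Party</th><th>Percent</th></tr>
    <tr><td>SPD</td><td>21</td></tr>
    <tr><td>CDU/CSU</td><td>33</td></tr>
  </table>')

# Pick out the table and parse it into a data frame
polls <- page %>% html_element("table") %>% html_table()
polls
```

In practice the same pipeline would start from `read_html("https://...")` pointed at the live page instead of an inline snippet.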
To check the data, we take samples and verify them manually against our own and, where possible, other sources. We also check whether the results in the graphics make sense. We examine the minimum and maximum values, and especially outliers, separately. Where possible, we ask locals (for the project on party heartlands we even called the mayors or party leaders of some heartlands), or discuss the results again with our data source. I also consult statisticians for my analyses.
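A check of extremes and outliers like the one described could be sketched in R roughly as follows; the data frame and its column names are invented for the example, and the 1.5 × IQR rule is just one common way to flag candidates for manual verification.

```r
# Illustrative sketch of the sanity checks described above, using dplyr.
# The data and column names are invented for the example.
library(dplyr)

df <- tibble(
  region = c("A", "B", "C", "D"),
  value  = c(12.3, 15.1, 14.8, 98.7)   # 98.7 is a deliberate outlier
)

# Look at the rows behind the minimum and maximum first
extremes <- df %>% arrange(value) %>% slice(c(1, n()))

# Flag values beyond 1.5 * IQR for a manual check against the source
bounds   <- quantile(df$value, c(0.25, 0.75)) + c(-1.5, 1.5) * IQR(df$value)
outliers <- df %>% filter(value < bounds[1] | value > bounds[2])
outliers   # these rows get verified by hand (or with a phone call)
```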
I use the statistical software R for the entire data processing and sometimes also for the visualization. In R I work almost exclusively with the tidyverse packages tidyr, dplyr and ggplot2. Depending on the project, I use R for different steps of the workflow. For "How Berlin ticks along your line" (Link 4) I cleaned, matched and transformed the data into the input format the developers needed for the graphics. For "This is how close the parties are - and how close their candidates are" (Link 3) and "The sound of the SPD's steep fall" (Live Link) I also analyzed the data and created graphics during the analysis. You can find my script for "The sound of the SPD's steep fall" here: https://interaktiv.morgenpost.de/spd-absturz-sound/data/methodik-spd-absturz-sound.html. The graphics shown in that project do not come from R, though: plots in R are quick to produce and help us not only with the analysis but also with deciding which form of presentation works best before we implement it in JavaScript. Still, some of my plots have made it to the reader. For example, I created the map you see in our project "Where women earn more than men" (Link 1), although our designer added the labels manually. For the analysis of the federal election, everything had to be faster and more automated, so the choropleths shown in that project are the unchanged ones I built in R.
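To give an idea of what such a tidyverse workflow looks like, here is a minimal, hypothetical sketch: the input table, districts and vote shares are invented, but the reshape-then-plot pattern matches the kind of cleaning and exploratory plotting described above.

```r
# Hypothetical sketch of a tidyverse cleaning-and-plotting step.
# The input data and column names are invented for illustration.
library(dplyr)
library(tidyr)
library(ggplot2)

# Wide table, as data often arrives from a statistics office ...
raw <- tibble(
  district = c("Mitte", "Pankow"),
  spd_2013 = c(30.1, 28.4),
  spd_2017 = c(21.7, 19.9)
)

# ... reshaped into long/tidy form for analysis
tidy <- raw %>%
  pivot_longer(-district,
               names_to  = c("party", "year"),
               names_sep = "_",
               values_to = "share")

# Quick exploratory plot; the published graphic would then be
# rebuilt in JavaScript
p <- ggplot(tidy, aes(year, share)) +
  geom_col() +
  facet_wrap(~ district)
```

The exploratory ggplot2 object is kept deliberately plain; its job is to test the form of presentation, not to be publication-ready.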