Project description

SPIEGEL Data is the data journalism team of the major German weekly Der Spiegel and its online offshoot Spiegel Online, one of the most widely read news websites in Germany. Although we're a rather small unit, our tasks are quite diverse: we support colleagues from other departments in their day-to-day business, work on medium-term in-depth data analyses, and publish complex data-driven investigations and visual storytelling.

Let’s dive into some of our latest projects:

1. Black box Schufa
Schufa is the most influential credit bureau in Germany. We investigated its scoring algorithm using more than 2,000 credit reports that consumers requested and shared with us in a crowdsourcing project. We found that many people are classified as a risk through no fault of their own.

2. Commuting in Germany
More and more German employees are commuters. We built a personalized article that responds to each user's individual commuting situation and shows them their personal story.

3. Explanatory election maps
We've developed a new approach to explaining federal and state elections at a glance: a dense static map that focuses on the most important information and can be generated reproducibly with little effort.

4. The football fan atlas
Knowledge about friendships and rivalries in German football has been largely anecdotal. We collected 60,000 survey responses and gained an unprecedentedly deep look into the field. The resulting story is rich in visualizations, and the collected data has been shared with several researchers.

5. Ultra long-haul flights
More and more airlines offer ultra-long-haul nonstop flights, for instance from Singapore to New York (16,500 km). We used tracking data from a flight portal to calculate the distances actually flown and compared the CO2 emissions of these nonstop flights with those of connecting flights with layovers (a sketch of the underlying distance calculation follows after this list).

6. Alternative medical health courses
Many German adult education centers (Volkshochschulen) offer health courses based on alternative medical methods that are considered scientifically untenable. We scraped and evaluated thousands of course descriptions and showed which centers stand out the most.

7. How a speed limit could save lives
The potential effects of a speed limit in Germany are a gap in the research. With a spatial analysis based on fine-grained open data, we were able to run a model calculation showing that up to 140 road deaths per year could be avoided.

8. When to buy a new phone
Flagship smartphones are getting more and more expensive. By analyzing data from price comparison sites, we were able to advise our readers on how long they would likely have to wait for new models to fall below a certain price level (see the second sketch after this list).

9. Tapping the data leak
In December 2018, a teenager published personal data of hundreds of German politicians and celebrities. We examined the material and found evidence that the attacker had used compromised credentials from previous data breaches to gain access to the victims' e-mail and cloud storage accounts.
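
As a rough illustration of the distance work behind project 5, here is a minimal Python sketch: it computes the great-circle distance between Singapore and New York with the haversine formula and compares it with the 16,500 km route length quoted above. The airport pair and coordinates are our own example choices, not necessarily the ones used in the story.

from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Singapore Changi (SIN) to New York Newark (EWR)
gc = haversine_km(1.3644, 103.9915, 40.6925, -74.1687)
print(f"Great-circle distance: {gc:,.0f} km")

# The tracked route is usually longer than the great-circle path,
# e.g. because of winds, weather and airspace restrictions.
tracked_km = 16_500  # route length cited in the project description
print(f"Detour factor: {tracked_km / gc:.2f}")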
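
And for project 8, a hedged sketch of the underlying question: given a daily series of lowest offers for a flagship phone, after how many days does it first drop below a target price? All prices and dates here are invented for illustration, not taken from our analysis.

from datetime import date, timedelta

launch = date(2018, 9, 21)  # hypothetical launch date
daily_lowest_eur = [849, 849, 839, 835, 829, 819, 805, 799, 795, 789]

def days_below(prices, target):
    """Return days after launch until the lowest offer first undercuts target."""
    for day, price in enumerate(prices):
        if price < target:
            return day
    return None

wait = days_below(daily_lowest_eur, 800)
if wait is not None:
    print(f"Below 800 euros after {wait} days (on {launch + timedelta(days=wait)})")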

What makes this project innovative?

For journalists, technology is both a useful tool to find and tell stories and a subject to report on. As data journalists, it is our obligation and our privilege to take full advantage of both. The most recent innovations in our work are crowd-sourcing and personalization in the former category, and algorithmic accountability reporting in the latter.

Crowd-sourcing can be a powerful tool to collect data that was previously unavailable. Reports on Schufa's credit scoring, for example, have always had one big flaw: they were grounded solely in anecdotal evidence. Thanks to a crowd-sourcing initiative, our project was the first systematic, data-driven investigation of the topic. Access to the data was a primary prerequisite for (at least partially) deconstructing the algorithm at work in the credit scoring process. Algorithms are becoming more and more important in many aspects of everyday life. At the same time, traditional journalism lacks the tools to question how fairly they are constructed. In our eyes, this is a field of rapidly rising importance for data journalists.

Personalization is a tool that leverages the potential of online journalism. Our story on commuting in Germany is based on local statistics that, by themselves, are frankly rather boring. But since commuting is a very personal experience that depends heavily on local conditions, we decided to tell the story in a highly personalized way. The text varies with the day and time you read the story, and, more importantly, all maps and statistics put the reader's own situation into perspective against nationwide trends. It is a story that can only be told online, about an important topic that is hard to convey by conventional means.
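
To make the personalization mechanism concrete, here is a minimal Python sketch of the kind of logic involved. It is illustrative only: the published article presumably runs this kind of logic client-side, and the greeting texts and the national median figure below are invented, not taken from the story.

from datetime import datetime

# Illustrative placeholder, not the figure used in the published article.
NATIONWIDE_MEDIAN_KM = 10.5

def greeting(now: datetime) -> str:
    """Pick an opening line that matches the moment the reader opens the story."""
    if now.weekday() >= 5:
        return "Enjoying a commute-free weekend?"
    if 5 <= now.hour < 10:
        return "On your way to work right now?"
    if 16 <= now.hour < 20:
        return "Heading home from work?"
    return "Think back to your commute this morning."

def compare_to_nation(reader_km: float) -> str:
    """Put the reader's own commute into perspective against the national median."""
    ratio = reader_km / NATIONWIDE_MEDIAN_KM
    if ratio > 1.5:
        return f"Your {reader_km:.0f} km commute is far above the national median."
    if ratio < 0.5:
        return f"Your {reader_km:.0f} km commute is well below the national median."
    return f"Your {reader_km:.0f} km commute is close to the national median."

print(greeting(datetime.now()))
print(compare_to_nation(23.0))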

What was the impact of your project? How did you measure it?

In 2018 our online articles were visited by more than six million people. Our major pieces usually reach a six-digit number of unique users, who on average spend two to four minutes reading them. Articles in the print magazine (like the ones about the data leak or the alternative medical health courses) have an even bigger audience: the magazine is bought by 700,000+ people and read by more than five million people each week. Our Twitter account has about 4,200 followers, making it the most-followed account of any German data journalism (ddj) team. On GitHub we publish datasets and scripts for some of our projects. Our self-created geodata on German judicial districts (https://bit.ly/2uJ6cCw) was quickly adopted by the dataviz tool Datawrapper. Sometimes our stories have political impact as well: after our investigation into the German credit bureau Schufa's scoring algorithm, the Federal Minister of Justice and Consumer Protection called for more transparency for consumers.

Source and methodology

Depending on the topic, we use very different data sources: official and governmental data (sometimes obtained via FOI requests), data we scraped ourselves, and data gathered through crowdsourcing. Most of our articles have a separate section on the sources and methodology we used. Whenever we create new datasets that are of public interest, we make them available to the public. For some projects we also publish complete documentation and a reproducible workflow on GitHub.

Technologies Used

Our team is divided into two camps: Pythoneers and R-tists. But we get along pretty well. In larger projects it is common to separate scraping, processing, analyzing and visualizing into different scripts, so everybody can use the language they're most comfortable with (a minimal sketch of this split follows below). For processing geodata we often use QGIS. We also extract data from PDFs with Tabula and ABBYY FineReader, build interactive maps with Mapbox GL, create static infographics with Adobe Illustrator and interactive graphics with D3 and JavaScript, and draft our articles in good old Microsoft Word.
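
A minimal sketch of what this separation of stages can look like in practice: each step reads and writes plain CSV, so a colleague can pick up any stage in R instead of Python. The URL and field names are hypothetical, not from an actual project.

import csv

import requests

def scrape(url: str, raw_path: str) -> None:
    """Stage 1: fetch the raw data and store it untouched."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    with open(raw_path, "w", encoding="utf-8") as f:
        f.write(response.text)

def process(raw_path: str, clean_path: str) -> None:
    """Stage 2: turn the raw dump into a tidy table for analysis elsewhere."""
    with open(raw_path, encoding="utf-8") as f_in, \
         open(clean_path, "w", newline="", encoding="utf-8") as f_out:
        writer = csv.writer(f_out)
        writer.writerow(["course", "center"])
        for line in f_in:
            if ";" in line:
                writer.writerow(line.strip().split(";", 1))

if __name__ == "__main__":
    scrape("https://example.org/courses.csv", "raw.csv")
    process("raw.csv", "clean.csv")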

Project members

Holger Dambeck: science writer for SPIEGEL since 2004, the last four years as head of the science department. Math is his favorite topic; he has published more than 200 math riddles. He joined SPIEGEL Data at the end of 2018.

Marcel Pauly: head of SPIEGEL Data. During journalism school he realized how much fun Excel can be; at Columbia's Lede Program he discovered how much more fun Python is. His areas of expertise include party funding, parliamentarianism, algorithmic accountability and right-wing extremism.

Patrick Stotz: data journalist at SPIEGEL Data since 2015. An urban planner by training, he is largely self-taught in the field of data journalism. He does most of his analytical work in R, and he also loves to visualize data, build interactive tools in D3 and JavaScript, and work with spatial data.

Achim Tack: part-time data journalist with a background in urban data analysis and mobility issues. A self-identified geo-nerd at home in the world of GIS, Python and large amounts of data, he also leads a team of analysts at a Hamburg/Berlin-based consultancy.
