Project description

The project examines dialect change through data analysis and through on-site reporting.
We analyzed crowdsourced data from more than 60'000 Swiss German speakers who had indicated how they pronounce certain words and self-reported their age, gender and local dialect. We then picked two extremes. The first is the place with the most so-called linguistic change in apparent time, i.e., where the youngest generation speaks most differently from the oldest generation: a town called Fribourg. The second is the place with the least linguistic change, i.e., where the children still speak like their grandparents: a town called Baar.

In each of these places we visited a family with three generations and discussed their dialect with them: whether the grandparents and the grandchildren speak alike, why this may or may not be the case and what it feels like when the granddaughter uses words that the grandmother would never say.

In the resulting article, the story is told through these dialogues between grandchild, mother and grandmother, through videos that feature their speech, and through graphics that show how much the Swiss linguistic landscape has changed between the oldest and the youngest generation in Fribourg, and how stable it has remained in Baar.

The story addresses a wide audience: dialects are the prestige linguistic variety in Switzerland, whereas Standard German is mostly used in writing and in more formal oral contexts. Furthermore, German-speaking Switzerland exhibits strong dialect variation within a relatively confined area. The Swiss therefore typically show a strong interest in dialect variation and change.

The final article was adapted for wide screens, for mobile screens (the graphics, in particular), as well as for the printed newspaper (parts of the story, particularly around the graphics, were rewritten).

What makes this project innovative?

The project is based on a dataset crowdsourced via a mobile app that predicts people's dialect based on how they pronounce a set of words. This work was done by a group of Swiss researchers, one of whom is the first author of the present article. This method for crowdsourcing linguistic data was novel at the time (2013), and a dataset of this size is still an exception in linguistic research. Our analysis combines novel data, methods for analyzing linguistic change, and statistical methods for measuring variation in categorical variables. We also believe that the story is innovative in one more way: it revolves around people and their language, and it features dialogues and video snippets. It is data journalism in which the graphics do not take center stage, yet the story is completely data-driven.

What was the impact of your project? How did you measure it?

We measured success using several metrics, such as article views and engaged time, as well as the resonance of the article on social media and in the real world. The article reached a wide audience on our news site, as measured by our analytics tools, as well as on social media. A high number of spontaneous, personal messages from the general public and from experts in the academic field of linguistics showed us that it conveyed the topic equally well to both groups.

Source and methodology

We worked with a dataset crowdsourced via a mobile app called "Dialäkt Äpp". The app mainly featured a dialect quiz: tell the app how you pronounce particular words, and the app tells you where you are from. The app was published in March 2013, more than half a year before the New York Times dialect quiz. The app also allows its users to submit their data for research, along with their age, their gender and the local dialect they speak. The dataset used for this story was collected between 2013 and 2017, and it contains dialect data from more than 60'000 speakers from all over Switzerland.

There are many ways to measure linguistic change. One method is to compare the younger generations with the older generations and to track differences. We defined young speakers as 20 years old or younger and old speakers as 55 years old or older.

How should one measure linguistic variability in categorical variables? There are several options; we picked a metric called "variation around the mode" (also: "deviation from the mode"). We calculated this value for the young and the old generation in each place and divided one value by the other. We then picked the places with extreme values as "pars pro toto" examples: one where the deviation from the mode was almost equal between the young and the old generation (Baar), and one where there was much more variability in the younger generation (Fribourg). We disregarded smaller places because they often featured too little data: we needed a certain number of data points to make claims about the linguistic situation in a place.

More details on the methodology can be found in the methods part of the final article. More details on the methodology behind the crowdsourced dataset can be found in a separate article.
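The comparison between generations described above can be illustrated with a small sketch. It is written in Python rather than the R used for the project, and it is not the authors' code. It assumes that "deviation from the mode" refers to Wilcox's index, K(N - f_mode) / (N(K - 1)), where N is the number of answers, K the number of possible variants and f_mode the count of the most frequent variant. It also assumes that the young value is divided by the old value; the text does not state the direction. The function names and the `min_speakers` threshold are hypothetical.

```python
from collections import Counter


def deviation_from_mode(answers, n_categories):
    """Wilcox's deviation from the mode for one group of speakers.

    Returns 0.0 if everyone gives the same variant and 1.0 if the
    answers are spread evenly over all n_categories variants.
    """
    if n_categories < 2:
        raise ValueError("need at least two possible variants")
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(answers)
    if len(counts) > n_categories:
        raise ValueError("more observed variants than n_categories")
    n = len(answers)
    f_mode = max(counts.values())  # frequency of the most common variant
    return n_categories * (n - f_mode) / (n * (n_categories - 1))


def change_ratio(young, old, n_categories, min_speakers=30):
    """Ratio of young to old deviation from the mode for one place.

    Returns None when either group is too small (hypothetical threshold)
    or when the old group shows no variation, so the ratio is undefined.
    """
    if len(young) < min_speakers or len(old) < min_speakers:
        return None
    dm_old = deviation_from_mode(old, n_categories)
    if dm_old == 0:
        return None
    return deviation_from_mode(young, n_categories) / dm_old
```

With made-up toy data in which 9 of 10 older speakers and 6 of 10 younger speakers use the most common of two variants, the index is 0.2 for the old group and 0.8 for the young group, giving a ratio of 4. A ratio close to 1 would correspond to a stable place in the sense described above.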

Technologies Used

The data were crowdsourced via a mobile app and stored in an SQL database. We fetched the data from the database with SQL queries and analyzed them in R. For the analysis, we mainly used custom R code, along with methods used in linguistic research. The basis for the visualizations was created in R using the library 'ggplot2'. These plots were then processed further in Adobe Illustrator. The dialogues in the story can be read in Swiss German as well as in Standard German. The interactive tool that allows users to switch from one variety to the other was built by the Editorial Tech Team of the Neue Zürcher Zeitung.
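As a rough illustration of the fetch step, the following Python sketch pulls variant counts per place and age group out of an SQL database. The actual schema and queries are not described here, so the table `responses` and its columns `place`, `age`, `word` and `variant` are assumptions, as is the use of SQLite. The age thresholds (20 or younger, 55 or older) are the ones given under "Source and methodology".

```python
import sqlite3

# Hypothetical schema: responses(place, age, word, variant).
# Speakers between the two age thresholds are left out of the comparison.
QUERY = """
SELECT place,
       CASE WHEN age <= 20 THEN 'young' ELSE 'old' END AS age_group,
       variant,
       COUNT(*) AS n
FROM responses
WHERE word = ? AND (age <= 20 OR age >= 55)
GROUP BY place, age_group, variant
ORDER BY place, age_group, variant
"""


def variant_counts(conn, word):
    """Return (place, age_group, variant, n) tuples for one quiz word."""
    return conn.execute(QUERY, (word,)).fetchall()
```

Counting inside the database keeps the transfer small; the resulting table can then be handed to whatever analysis environment is used for the variability measures.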

Project members

Authors: Marie-José Kolly / Alexandra Kohler / Stefanie Hasler (video) / Anna Wiederkehr (graphics)
With help from: David Bauer (concept), Markus Stein (video), Beni Buess and Philip Küng (interactive elements).


