Project description

Ahead of the Oscars, attention falls not only on stars and celebrities but also, increasingly, on the people missing from the stories Hollywood tells. Although there has been great reporting on how much representation minorities get in Hollywood, a data-driven look at how these minority characters are presented has been missing from the conversation. So, to illustrate how stereotypes have developed in Hollywood, we analyzed the TVTropes entries of more than 6,000 movies that have been eligible for the Oscars since 1928.
Our analysis and the accompanying reporting show that, even though some of the more obviously racist tropes have faded from cinema in the past decades, many stereotypical narratives have shifted rather than disappeared: The evil “yellow peril” stereotype about Asians and Asian Americans has given way to the nerdy “model minority” – a stereotype that, while arguably more benign, still has devastating effects. And while blackface has become largely unacceptable, black characters are still presented as angry or sassy, and they still often get killed off first.
While the data-driven approach to this story aims to show how minorities are represented, the accompanying research and interviews try to convey the mechanisms behind and the real-world impact of stereotypical depictions in media.
We explored the most common stereotypes about Latin characters, Germans, Brits and Russians as well, but kept our main focus on Asians and Asian Americans and on black people. These groups are not only some of the largest minorities in America, they also reflect the audience of Deutsche Welle. As an international broadcaster, Deutsche Welle has readers in Asia and Africa as well, and interest in the Academy Awards and in Hollywood movies in general is still very much present among them. With Hollywood’s influence over the global media market, its presentation of ethnic groups affects people in all parts of the world. Not only do they see American stereotypes about themselves in the Hollywood movies they watch, they’re also affected by the preconceptions Westerners form about them through those depictions.
The project was published in English and German and adapted for Turkish and Brazilian audiences as well.

What makes this project innovative?

There have been amazing data-driven projects about the amount of representation minorities get in Hollywood, but analyses of how these roles actually play out on screen have remained largely anecdotal. With this project, we attempted a deeper look at the topic of representation: Which stereotypes specifically characterize the depiction of various ethnic groups? We aimed for a sample size capable of adding a new dimension of support to the qualitative analyses that media experts, and people of color in general, have been publishing for decades. The larger our sample, the more stable our results would be – and the harder they’d be to dismiss.
But content analyses are usually expensive and time-consuming, especially at the scale we needed. Our approach worked around that by using the best aspects of user-generated data and avoiding its drawbacks as far as possible. We built a sample of more than 28,000 movies that have been eligible for the Oscars since 1928, and scraped the user-generated wiki TVTropes for a list of tropes appearing in these movies. In scraping, matching and analyzing the data, we used a mix of automation, pattern recognition and manual work, not only to get the best possible results but also to make our process as transparent as possible.
Our methodology, as well as the data and code themselves, are published on our GitHub page. This way, we keep ourselves accountable to our audience and help any other interested party benefit from our work. With a final sample of 6,637 matched eligible movies and 21,789 unique encountered tropes, this project offers a scale of analysis that previous projects, whether journalistic or scientific, did not reach. The final article aims to combine the power of data-driven reporting with traditional journalistic work that explains the why, the how, and the real-world implications of stereotypical depictions in Hollywood.

What was the impact of your project? How did you measure it?

At DW Data, we mostly use the average time spent on our articles to get a sense of how a project is received. The average dwell time for this article was four times higher than that for the average DW article. Users generally spend more time with our data-driven stories than they do with other DW content, but this one stood out even among data-driven articles. In addition, the social media post on the DW News Facebook channel, for example, was shared more often than the average post. The Turkish and Brazilian teams of DW also adapted the article, prompting discussion on social media in these communities as well.

Source and methodology

You can find a full account of our methodology, as well as the data and code behind the analysis, on our GitHub page (see additional links).
To analyze the representation of minorities in Hollywood movies, the first thing we needed was a reliable sample of movies that were at least moderately impactful in Hollywood. A good indicator of this is their eligibility for the Academy Awards. Thankfully, any movie that passes the eligibility criteria is recorded in a "Reminder List" that the Academy sends out to the jury each year. These lists have been compiled by Richard Brunner on his website. After talking to him about his process, we used the scraped data from his site as the basis for our sample. To check the data for accuracy, we used the four publicly available reminder lists, as well as the official Academy database of nominees and winners. We also merged these datasets with IMDb data where possible for further metadata and fact-checking. This yielded a list of around 28,200 eligible movies.
For the stereotypes present in those movies, we systematically scraped results from the user-generated wiki TVTropes. Users of the wiki collect and maintain lists of tropes that they find to be present in movies. As with all user-generated data, there is bound to be some margin of error. Still, TVTropes is quite simply the best option for getting detailed and large-scale data on something as complex as movie tropes. As a precaution, we excluded entries that had only been edited by one user. To counter the effect of newer movies being more likely to be documented than older ones, we analyzed the share of a trope among all tropes recorded for that period, as opposed to the share of movies that have a trope.
The main challenge was matching TVTropes entries to the correct movies and avoiding false matches. For this, we used a mix of pattern recognition rules and manual work in R and OpenRefine. When in doubt, we erred on the side of caution, and in the end, we had a sample of 6,637 matched eligible movies and 21,789 unique encountered tropes to work with.
In the analysis, we mainly focused on the representation of Asians and Asian Americans as well as black people. To find tropes related to these groups, we used keyword searches as well as background research on common media stereotypes. To keep the readers' focus on the most common tropes and their development over time, we made the unusual choice to leave out the y-axis on some charts, as it would have been more distraction than help in this case. In addition to the methodology page, we also created an interactive table that allowed users to search through our trope database themselves (see additional links).
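The normalization described above, where a trope is measured as a share of all trope mentions in a period rather than as a share of movies, can be illustrated with a minimal sketch. This is not the project's code (which was written in R and is published on the GitHub page). It is a Python stand-in, and the records, the grouping by decade and the function names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (movie, year, trope) records; illustrative only, not project data.
records = [
    ("Movie A", 1935, "Yellow Peril"),
    ("Movie A", 1935, "Damsel in Distress"),
    ("Movie B", 1938, "Yellow Peril"),
    ("Movie C", 1995, "Model Minority"),
    ("Movie C", 1995, "Damsel in Distress"),
    ("Movie D", 1998, "Model Minority"),
    ("Movie D", 1998, "Big Bad"),
    ("Movie D", 1998, "Damsel in Distress"),
]


def decade(year):
    """Map a year to the first year of its decade, e.g. 1938 -> 1930."""
    return year // 10 * 10


def trope_share_by_decade(records):
    """Return {trope: {decade: share of all trope mentions in that decade}}.

    Dividing by the number of trope mentions (not by the number of movies)
    keeps well-documented recent decades from dominating the comparison.
    """
    totals = Counter()                # all trope mentions per decade
    per_trope = defaultdict(Counter)  # mentions of each trope per decade
    for _movie, year, trope in records:
        d = decade(year)
        totals[d] += 1
        per_trope[trope][d] += 1
    return {
        trope: {d: n / totals[d] for d, n in counts.items()}
        for trope, counts in per_trope.items()
    }


shares = trope_share_by_decade(records)
for trope, by_decade in sorted(shares.items()):
    print(trope, {d: round(s, 2) for d, s in sorted(by_decade.items())})
```

In this toy data the 1990s have more trope mentions overall than the 1930s, so comparing raw counts would overstate recent tropes. Comparing shares within each decade puts both periods on the same footing.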

Technologies Used

The main tool used for this story was the statistical programming language R. More specifically, we used the scraping library rvest to gather the data, as well as the tidyverse packages, mainly dplyr for analysis and ggplot2 for preliminary visualization. The data cleaning software OpenRefine was used for pattern-based matching between metadata and TVTropes entries. We also used Adobe Illustrator to adapt the visuals for publication.
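To give a rough idea of what pattern-based matching between movie metadata and wiki entries can look like, here is a minimal sketch. It is written in Python rather than in the R and OpenRefine setup the project actually used. The normalization rules, the function names and the example titles are assumptions for illustration, not the project's actual rules. The sketch accepts a match only when exactly one candidate remains and sets everything else aside for manual review.

```python
import re
from collections import defaultdict


def normalize_title(title):
    """Lowercase, replace punctuation with spaces, drop a leading article."""
    t = title.lower()
    t = re.sub(r"[^a-z0-9 ]+", " ", t).strip()
    t = re.sub(r"^(the|a|an) ", "", t)
    return re.sub(r"\s+", " ", t).strip()


def match_entries(eligible, wiki_entries):
    """Match wiki entries to eligible movies; return (matches, needs_review).

    eligible:     list of (title, year)
    wiki_entries: list of (page_title, year or None)
    """
    by_title = defaultdict(list)
    for title, year in eligible:
        by_title[normalize_title(title)].append((title, year))

    matches, needs_review = {}, []
    for page_title, year in wiki_entries:
        candidates = by_title.get(normalize_title(page_title), [])
        if year is not None:
            # A known year narrows the candidates to that year only.
            candidates = [c for c in candidates if c[1] == year]
        if len(candidates) == 1:
            matches[(page_title, year)] = candidates[0]
        else:
            # No candidate or several candidates: leave for manual checking.
            needs_review.append((page_title, year))
    return matches, needs_review


# Hypothetical titles, for illustration only.
eligible = [("The Long Road", 1937), ("Harbor Lights", 1996), ("Harbor Lights", 2004)]
wiki_entries = [
    ("the long road", 1937),
    ("Harbor Lights", 2004),
    ("Harbor Lights", None),
    ("Unknown Film", 1950),
]

matches, needs_review = match_entries(eligible, wiki_entries)
print(matches)
print(needs_review)
```

Sending ambiguous and unmatched entries to a review list instead of guessing trades a smaller matched sample for fewer false matches.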

Project members

Kira Schacht (she is the sole contributor and should get all the credit for it; I'm only handing this project in on behalf of our team, DW Data)


Additional links

