The Associated Press data journalism team combines news applications development, data engineering, news automation, data analysis and data visualization. We work with journalists across the world — not only our AP colleagues, but also journalists at member and partner organizations. We cover every beat AP covers, from government and politics to environment and education, and we are dedicated to making data journalism more accessible to everyone while maintaining the high standards of all AP journalism.
Our work may appear as a full text story, a crucial paragraph in a longer story or as a graphic or interactive. We deliver election results maps, contribute to breaking news and to long-term investigations and enterprise. We build tools used across the newsroom, allowing reporters who don’t code to create their own interactive digital content. In many cases, the most important product we deliver is the data itself — vetted and documented, with full details on our methodology — for other news organizations to create a version of the story that is right for their audience. For the past year, we’ve been working with data sharing platform data.world to make the data behind our work available, which has led to hundreds of text and visual stories that would not otherwise have been possible.
The team has also made critical contributions to the wider data journalism community in the past year. For the first time, the AP Stylebook included a complete section on data journalism. The chapter, written by members of the data team, covers everything from evaluating data sources to acquiring data to writing with numbers. It also sets basic standards for reproducible analysis and data visualization.
The AP has also started looking for opportunities to collaborate with member news organizations as well as those outside of journalism on issues of vital public interest. Our first effort in this space is Sunshine Hub, an online tool that brings together AP journalists, member news organizations, and First Amendment advocates from around the country to keep tabs on state laws that could affect access to government information. Through the portal, users can identify and tag relevant state legislation, track its progress through the legislature and discuss bills with other members to identify cross-state trends.
The AP team began as an offshoot of our interactive graphics team, with four members and an editor. In the past four years, that team has grown to 11 data journalists with specialties ranging from statistics and demographics to devops and full stack development. We’re still a relatively small team given the size of our organization and the number of news organizations we serve, but we multiply our efforts by helping others tell their own stories with the data.
What makes this project innovative?
Perhaps most importantly, the data and methodology for each of these projects were shared on data.world with thousands of AP member news organizations, giving them access to vetted data and story ideas they might not otherwise have been able to pursue. AP data journalism team members guided reporters from hundreds of news organizations in using data to bolster their local reporting, hosting webinars to walk them through the analysis and even pre-writing SQL queries to help them make sense of the numbers. Over a 12-month period, the AP data team distributed roughly two dozen datasets on everything from county-level opioid prescribing to the more than 4,000 grants given by the NRA to individual schools and community groups.
In every case, whether the techniques were common or cutting-edge, they have served the story first, and we've made our methodology clear enough that other news organizations have been able to build their work on our analysis with confidence.
To support this work, we built and open sourced a toolkit for managing data journalism projects that helps us rapidly start new projects, share data and review one another's work. We have also developed automation solutions and tools that have expanded our output of graphics and stories, freeing AP graphic artists and reporters to focus their efforts on journalism with maximum impact.
What was the impact of your project? How did you measure it?
We also measure impact in governmental change and response: A partnership with Reveal/The Center for Investigative Reporting on modern-day redlining has led to separate investigations from the Pennsylvania attorney general and state treasurer. An analysis of Superfund sites in areas prone to flooding has prompted a GAO investigation, which is still ongoing. A story on federal judiciary diversity sparked a line of questions to Attorney General Jeff Sessions in a subsequent congressional hearing. A story analyzing NRA grant funding led a number of local school districts to refuse NRA money in future years.
In the past year, other news organizations have queried our datasets more than 6,000 times. Six separate data distributions were accessed by at least 100 unique news organizations, sparking dozens of stories on school segregation, sexual misconduct in statehouses, mortgage loan redlining, NRA grants, Superfund sites in flood-prone areas and FEMA public claims appeals. AP and member reporters used Sunshine Hub to identify and track more than 150 transparency-related bills in the 2017 legislative session alone, leading to a series of Sunshine Week stories highlighting these issues.
Source and methodology
In rare cases, when the data does not otherwise exist, we have worked with data from a non-governmental source. One example is the work we have done with data collected by the Gun Violence Archive. For this series, which was a partnership with the USA TODAY Network, we began with the data provided by this nonprofit and then used public records requests and our own news searches to verify the cases from which we drew the stories.
We have used Python for machine learning tasks and Neo4j for network analysis. Our databases are typically built using Postgres or SQLite. We use D3 for interactive visualizations, including our election night maps. We have built data administration tools for data entry and tracking, as well as world-facing news applications, using Ruby on Rails and Django. While most of the geographic analysis we’ve done has been accomplished with R, we have also used QGIS for some tasks, as well as Mapbox and Esri mapping tools for visualizations.
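As an illustration only, and not AP's actual code, schema or data, the kind of pre-written SQL query a member newsroom might run against a shared county-level dataset could look something like the sketch below. The table name, column names, county names and figures are all hypothetical, and an in-memory SQLite database stands in for a real distribution.

```python
import sqlite3

# Hypothetical rows: (county, state, events, population).
# These names and numbers are invented purely for illustration.
ROWS = [
    ("Alpha County", "PA", 50, 100000),
    ("Beta County", "PA", 30, 20000),
    ("Gamma County", "OH", 10, 50000),
]

# A pre-written, parameterized query: a reporter only supplies a state code.
# It converts raw counts to a per-100,000 rate so counties of different
# sizes can be compared, then ranks counties from highest to lowest rate.
PREWRITTEN_QUERY = """
SELECT county,
       ROUND(events * 100000.0 / population, 1) AS rate_per_100k
FROM county_counts
WHERE state = ?
ORDER BY rate_per_100k DESC
"""


def build_demo_db():
    """Create an in-memory SQLite database holding the hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE county_counts ("
        "county TEXT, state TEXT, events INTEGER, population INTEGER)"
    )
    conn.executemany("INSERT INTO county_counts VALUES (?, ?, ?, ?)", ROWS)
    conn.commit()
    return conn


def rates_for_state(conn, state):
    """Run the pre-written query for one state and return (county, rate) rows."""
    return conn.execute(PREWRITTEN_QUERY, (state,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for county, rate in rates_for_state(conn, "PA"):
        print(f"{county}: {rate} per 100,000 residents")
```

Passing the state as a bound parameter rather than pasting it into the SQL text means a reporter can reuse the same query for any state without editing the query itself.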