This project was a massive undertaking that involved two years of collecting data and building a machine that would compare model legislation written by special interests to 1 million bills introduced in statehouses nationwide.
What we found was stunning. By flagging exact phrases found among the bills, we identified a bare minimum of 10,000 bills directly copied from language carefully crafted by corporations and powerful lobbying groups. Some state lawmakers had unknowingly sponsored dozens of these copycat bills with no idea who wrote them or what their true intent was.
To crunch the data, we needed the equivalent of 150 computers. They ran non-stop for months to complete the matching.
There were too many people involved in the project to name everyone, but the work couldn’t have been done without our former colleagues Tom Foster and Bob Barnes in the IT department, who built the machine, and lead reporter Rob O’Dell in Arizona, who was critical both in reporting the main story and in solving the math that let us identify actual copies. It was Arizona Republic Investigative Editor Michael Squires who originally pitched the project and guided it to completion. USAT investigative reporter Nick Penzenstadler and our newsrooms in Nashville and Wisconsin dedicated months to the effort.
Along the way, we discovered that the Center for Public Integrity had been working on a very similar effort and we proposed a partnership. Their first major contribution to the project was a story showing how auto dealers sparked a model bill campaign to try to avoid being held responsible for repairing recalled cars with deadly defects.
It all came together beautifully in the end, thanks to the work of folks in our design studios and in the storytelling studio. The digital story is chock full of really nice touches, including a cool scrolly graphic by Mitchell Thorson and Shawn Sullivan, illustrations by Andrea Brunty, an amazing graphic video explainer by Patrick Shanahan, design by Spencer Holladay, and graphics by Jim Sergent, Veronica Bravo, Lauren Lapid, Pim Linders and Ramon Padilla. And on the print side, David Anesta led the design with others helping to get the packages rolled out to our papers across the country.
As part of the project, we’ve rolled out a tool to all Gannett papers that lets reporters see every model bill we found in each state. Reporters can share suspected model bills to be analyzed and get email alerts when new hits are uncovered in their home state. So far, more than 100 of our reporters have sought training on the Legislative Influence Detector. They’ve already searched thousands of bills, with the heaviest use from our journalists in New York, Florida, Minnesota, South Dakota and Oregon.
The tool couldn’t have happened without more help from our friends in IT and an assist from John Kelly and Matt Wynn on the data team.
With coordination from USAT I-team editor Amy Pyle, more than 30 reporters across the Network have contributed to reporting on local model bill efforts and sponsors. More stories based on that and additional reporting are to come throughout 2019.
This is signature work that gives the entire Network opportunities to produce unique local reporting. If you’re interested in using the tool, please contact USAT Data Director John Kelly.
What makes this project innovative?
See description. We took a subject that many take for granted and used an algorithm to generate data that gets into the details. Our job was to make it clear, concise, and visual.
What was the impact of your project? How did you measure it?
Still under analysis, but overall numbers are strong: the visual story is being consumed in full and directing readers to the main investigation. We are measuring depth of engagement and time on page. The project also makes the data set and tool available to the entire network, and we may soon open-source them.
Source and methodology
How do you find 10,000 needles in 50 haystacks? That, in effect, is what journalists and developers with USA TODAY and The Arizona Republic set out to do two years ago: identify, among the roughly 100,000 bills introduced in the 50 states each year, what’s been copied from drafts pushed by special interests.

Here’s how we did it

Using data provided by LegiScan, which tracks every proposed law introduced in the U.S., we pulled in digital copies of nearly 1 million pieces of legislation introduced between 2010 and Oct. 15, 2018. The data included a limited number of bills from 2008 and 2009. We then asked a dozen reporters covering state legislatures for USA TODAY Network newsrooms across the nation to build a list of model bills by searching special-interest groups' websites, scouring news coverage and interviewing lobbyists and lawmakers. We identified more than 2,100 models, a list that is far from complete because many groups don't make their models public.

We then used a computer algorithm designed to recognize similar words and phrases and compared each model in our database to the bills that lawmakers had introduced. These comparisons were powered by the equivalent of more than 150 computers, called virtual machines, that ran nonstop for months.

How did we compare bills with model legislation?

Even with that computing power, we couldn’t compare every model in its entirety against every bill. To cut computing time, we used keywords – guns, abortion, etc. Some bills have 30 to 40 keywords associated with them. The system compared a model with a bill only if they had at least one keyword in common. If there was a keyword match, the system compared the documents looking for strings of six or more words that appeared in both. For this search, the system used “stemmed” words, meaning they had been converted to their root. (For example, walk, walks, walked, and walking all become walk.)
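The keyword gate and stemmed six-word-string comparison described above can be sketched roughly as follows. This is an illustrative simplification, not the project's actual code: the stemmer here is a crude placeholder for a real one (such as the Porter stemmer), and every function name is hypothetical.

```python
# Illustrative sketch of the matching approach described above -- not the
# project's actual code. The stemmer is a crude stand-in for a real stemmer.

def stem(word):
    # Placeholder stemmer: strip a few common suffixes. A real system
    # would use something like the Porter stemmer instead.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    # Lowercase, split on whitespace, and stem each word.
    return [stem(w) for w in text.lower().split()]

def shares_keyword(model_keywords, bill_keywords):
    # Compare a model with a bill only if they have a keyword in common.
    return bool(set(model_keywords) & set(bill_keywords))

def shared_word_strings(model_tokens, bill_tokens, n=6):
    # Return every n-word stemmed string that appears in both documents.
    grams = lambda toks: {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
    return grams(model_tokens) & grams(bill_tokens)
```

A pair that passed the keyword gate and shared at least one six-word string would then move on to scoring.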
If a bill and a model shared at least one keyword and one six-word string, the system assigned a score reflecting how similar the two documents were.

How our scoring system worked

Our scoring system is based on three factors: the longest string of common text between a model and a bill; the number of common strings of five or more words; and the number of common strings of 10 or more words. Based on those factors, bills received scores on a 100-point scale. The closer to 100, the more likely a bill was copied from model legislation.

For its analysis, USA TODAY/Arizona Republic used only bills that scored 80 or higher. At that level, substantial amounts of text have been duplicated. Another estimated 10,000 bills below the 80-point threshold were likely copied from model legislation but matched less of the model's text. Out of caution, USA TODAY/Arizona Republic cited in its investigation only bills with substantial portions copied from a model. In addition, if legislators copied an idea but not the precise language, a bill would not be flagged.

Joe Walsh, a former data scientist at the University of Chicago, used what’s known as the Smith-Waterman algorithm to create the Legislative Influence Detector, which also finds similarities between model legislation and bills. His system has been used by reporters around the country to find model bills. Walsh reviewed USA TODAY/Arizona Republic’s investigation and findings and applauded its scoring system for showing when a bill has been substantially copied from model legislation. “It’s really clear, the numbers are nice and round, and it's easy to show and explain,” Walsh said. “I wish that we were able to do some of this stuff. I am glad someone is.”

Can I examine the results?

USA TODAY/Arizona Republic continues to search legislation and compare it with known model bills from around the country, furthering its investigation of outside influences on state lawmakers.
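One way to sketch a score built from the three published factors is shown below. The actual formula and weights were not disclosed, so the weights, the simple weighted sum, and the cap at 100 are all invented placeholders; only the three inputs (longest common word string, counts of five-plus and 10-plus word strings) come from the methodology above.

```python
# Hypothetical scoring sketch based on the three published factors.
# The weights and the weighted-sum formula are placeholders, not the
# formula actually used by USA TODAY/Arizona Republic.

def longest_common_run(a, b):
    # Length, in words, of the longest contiguous string shared by the
    # two token lists (classic dynamic-programming approach).
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def count_shared_strings(a, b, n):
    # Proxy for "number of common strings of n or more words": count the
    # exact n-word strings that appear in both documents.
    grams = lambda toks: {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
    return len(grams(a) & grams(b))

def similarity_score(a, b, w_longest=2.0, w_five=0.5, w_ten=1.0):
    # Combine the three factors onto a 0-100 scale. Weights are invented.
    raw = (w_longest * longest_common_run(a, b)
           + w_five * count_shared_strings(a, b, 5)
           + w_ten * count_shared_strings(a, b, 10))
    return min(100.0, raw)
```

Whatever the real weighting, the shape is the same: long verbatim runs dominate the score, which is why bills at 80 or above contain substantial duplicated text.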
Initially, the system is being rolled out to USA TODAY Network journalists for use in reporting on state legislatures.

How were bills categorized?

Special-interest groups, both liberal and conservative, have for years crafted and lobbied for model bills. Generally, the organizations that craft the bills have a clear mission or ideological bent. The American Legislative Exchange Council, the best-known and one of the most prolific model-bill factories, supports conservative ideas and efforts. The State Innovation Exchange, once known as ALICE, is in effect ALEC's liberal counterpart. We classified bills based on the mission or ideological orientation of the organizations that created each model. In some cases, groups with a conservative bent also push bills that benefit industry. We labeled each bill according to the most dominant characteristic.

How can I help?

If you know of a model bill, particularly one that you think has not been made public, we want to hear from you. Please complete this form and include text of the model bill. We will try to include it in our system.
The team behind this investigation

REPORTING AND ANALYSIS: Natalie Allison, Chris Amico, Daniel Bice, Giacomo Bologna, Ben Botkin, David Boucher, Jon Campbell, Amy DiPierro, Paul Egan, Dustin Gardiner, Ronald J. Hansen, Greg Hilburn, Greg Holman, Joe Hong, Lisa Kaczke, Keegan Kyle, Kaitlin Lange, Pamela Ren Larson, Aamer Madhani, Patrick Marley, Kelsey Mo, Dan Nowicki, Rob O’Dell, Geoff Pender, Nick Penzenstadler, Agnel Philip, Justin Price, Nick Pugliese, Yvonne Wingett Sanchez, Jeff Schwaner, Chris Sikich, Michael Squires, Matt Wynn

FROM THE CENTER FOR PUBLIC INTEGRITY: Jared Bennett, Kristian Hernandez, Sameea Kamal, Rui Kaneya, Mark Olalde, Pratheek Rebala, Peter Smith, Liz Essley Whyte

EDITING: Chris Davis, John Kelly, Amy Pyle, Michael Squires, Kytja Weir (CPI), Gordon Witkin (CPI)

GRAPHICS AND ILLUSTRATIONS: Andrea Brunty, Veronica Bravo, Lauren Lapid, Pim Linders, Ramon Padilla, Jim Sergent, Shawn Sullivan, Mitchell Thorson

PHOTOGRAPHY AND VIDEOGRAPHY: Patrick Breen, Chris Powers, Pat Shanahan

DIGITAL PRODUCTION AND DEVELOPMENT: Robert Barnes, Christian Baucom, Andrea Brunty, Tom Foster, Tyler Hawkins, Spencer Holladay, Ryan Marx, Annette Meade, Josh Miller, Michael Varano, Stan Wilson

SOCIAL MEDIA, ENGAGEMENT AND PROMOTION: Mary Bowerman, P. Kim Bui, Anne Godlasky, Danielle Woodward