Project description

How do the approximately 1.7 million students entering colleges ranked in the WSJ/THE college rankings make their final decision about where to attend? The College Comparison tool was built on the idea that traditional college rankings, released during application season in the fall, didn’t effectively serve an audience that was thinking about college choice differently once acceptance letters arrived.
Because the spring decision process demands direct, head-to-head comparisons that aren’t necessary during application season, we translated the idea of a product comparison to college choice. We gave students (and parents) a clearer indication of how colleges stacked up against one another rather than against the field at large, and in the process gave them the chance to read about two schools’ strengths and weaknesses side by side.
We also aimed to make the WSJ/THE college rankings more accessible to a new kind of WSJ reader. The disparity in traffic between the rankings themselves and the methodology pages indicated that users weren’t engaging with the rankings as deeply as they could. While readers previously had the chance to dig into the numbers, the new tool was designed to bring the methodology to life and explain, succinctly and plainly, where each college excelled or fell short of its peer institutions.
The project was behind WSJ’s dynamic paywall and was designed to generate new subscriptions by targeting a new audience.

What makes this project innovative?

We used artificial intelligence (Natural Language Generation) to automatically create brief descriptions for 968 universities based on college rankings data collected by WSJ/THE. These descriptions were presented to our readers in an interactive comparison tool as well as a data explorer. Each description highlighted and explained the rankings subcategory in which a college did best and worst, and presented the unifying element of any education: the cost. In total, the project consisted of almost 250,000 words. The automation process works by having a human editor write templates from which the software automatically creates text outputs directly from structured data (organized in rows and columns). Writing these templates is commonly referred to as “branch writing” because the story can have multiple variations. Branch writing means telling the natural language generation system to write a certain word or sentence under a particular condition defined by the editor; as in computer programming, it’s an if-then-else logic. There were nearly 50 core variations of the article template, and each college’s story was unique.
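The branch-writing idea described above can be sketched as editor-defined if-then-else rules applied to each row of structured data. The field names, thresholds and phrasing below are hypothetical illustrations, not the actual Wordsmith template used for the project:

```python
# Minimal sketch of branch writing: a template with conditional
# variations, applied row by row to structured rankings data.
# All field names and wording here are invented examples.

def describe_college(row):
    """Generate a short description from one row of rankings data."""
    name = row["name"]

    # Branch on the subcategory where the college scored best.
    if row["best_category"] == "outcomes":
        strength = (f"{name} stands out for student outcomes, "
                    f"ranking No. {row['outcomes_rank']} in that category.")
    elif row["best_category"] == "resources":
        strength = (f"{name} excels in academic resources, "
                    f"ranking No. {row['resources_rank']} in that category.")
    else:
        strength = f"{name}'s strongest showing is in {row['best_category']}."

    # Branch on cost relative to a peer-group average.
    if row["net_price"] < row["peer_avg_price"]:
        cost = "Its average net price is below that of its peer institutions."
    else:
        cost = "Its average net price is above that of its peer institutions."

    return f"{strength} {cost}"

rows = [
    {"name": "Example University", "best_category": "outcomes",
     "outcomes_rank": 12, "net_price": 18500, "peer_avg_price": 21000},
]

for row in rows:
    print(describe_college(row))
```

Each branch the editor adds multiplies the number of possible story variations, which is how roughly 50 core template variations could yield a unique description for each of 968 colleges.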

What was the impact of your project? How did you measure it?

To measure success, we looked at the tool’s reach, engagement and ability to drive habit: how widely the tool was used, how effectively the audience used it and what they did afterward. The tool has had more visitors than nearly 99% of all pieces published by WSJ since the beginning of 2019, indicating its broad appeal. More than six in 10 visitors were new to the Journal, highlighting the data application’s ability to reach audiences beyond the paper’s usual readership. The tool over-indexed in WSJ’s internally calculated engagement metrics: measured by time spent against stories of similar length, and by conversions against its relative exposure around the internet, it outperformed expectations. On subscription generation specifically, the tool performed 136% better than stories with similar numbers of nonsubscriber page views. The average tool user expanded nearly five college descriptions, meaning they had the chance to read roughly 1,250 words, more than the average story published by WSJ. In addition to attracting new audiences, the tool helped strengthen the relationship between WSJ and current members: of the subscribers who viewed the college comparison tool, 36% were more active on wsj.com in the week following their interaction with the tool compared to the week prior.

Source and methodology

The full methodology for the WSJ/THE college rankings can be found here (https://www.timeshighereducation.com/USmethodology2019). “Data comes from a variety of sources: the US government’s Integrated Postsecondary Education Data System (IPEDS), the US Department of Education’s Federal Student Aid (FSA), the College Scorecard, the Bureau of Economic Analysis (BEA), the THE US Student Survey, the THE Academic Survey, and the Elsevier bibliometric dataset.” The final data set was then audited for accuracy by PwC (https://www.timeshighereducation.com/sites/default/files/us-college-rankings-2019-methodology-pwc.pdf) and published as part of the WSJ/THE College Rankings in September 2018. The same data was used to create the automated template developed by Kevin McAllister and Francesco Marconi in February 2019. A sample of the data representing the template’s core variations was reviewed by WSJ editors and the Standards and Ethics team for both accuracy and style. Dov Friedman developed the tool in its entirety and verified that the data rendered correctly and the comparison elements functioned properly.

Technologies Used

The automated story templates were created using Natural Language Generation via the Wordsmith interface from Automated Insights. The automated stories were exported and stored on Amazon S3 for retrieval. The interactive tool, built with React for the user interface and MobX for state management, pulls the automated stories and comparative college data based on user interaction and displays them to the reader.

Project members

Kevin McAllister, Dov Friedman, Francesco Marconi

Link
