Project description

I’m submitting seven stories that I’ve worked on at The Times and The Sunday Times since April 2017.
Project 1 (main link and links 1 and 2), paedophiles: An investigation into paedophiles sharing child abuse imagery on the dark net.
Project 2 (links 3 and 4), hackers: An investigation into the black market in stolen credentials, focusing on passwords belonging to UK politicians.
Project 3 (link 5), land banking: A story revealing that a company owned by the British finance minister had been ‘land banking’, on the day he announced an inquiry into the practice.
Project 4 (link 6), halal: An analysis of animal welfare violations at abattoirs practising non-stunned halal slaughter.
Project 5 (link 7), trains: A project to analyse train delays at every station in Britain over a five-week period.
Project 6 (link 8), church land: A story revealing the extent of mineral rights owned by the Church of England, often underneath people’s homes.
Project 7 (link 9), AlphaBay: A story revealing the online identity of Alexandre Cazes, the founder of the AlphaBay dark net market.

What makes this project innovative?

Project 1, paedophiles: We combined data journalism with investigative tools and methods to identify 50 British users of dark net paedophile sites. While previous investigations have focused on individuals or sites, we analysed both, handing over details to the National Crime Agency.
Project 2, hackers: I brought a forensic approach to lists of leaked email addresses and passwords that had been floating around on the dark net for some time, revealing that they contained details on thousands of government officials, including cabinet ministers and the UK Foreign Office’s own head of IT.
Project 3, land banking: I completed this quick story in about two days, just in time for the British finance minister’s 2017 budget statement. It was picked up widely in the UK and the allegations were put to Philip Hammond in a radio interview, derailing his announcement of an inquiry into ‘land banking’.
Project 4, halal: Rather than starting from a position of prejudiced opposition to halal slaughter, I looked at the data. I persuaded the UK Food Standards Agency to release historical data on inspections of slaughterhouses and found that abattoirs practising non-stunned slaughter were three times as likely to violate animal welfare standards as others.
Project 5, trains: I looked at train running data on a scale (millions of rows) that no UK data team had attempted before, working out how many trains were delayed over a five-week period at every station in the country.
Project 6, church land: I worked closely with The Times’s religious affairs correspondent to get to the story behind a massive increase in registrations of ‘mineral rights’ by the Church of England.
Project 7, AlphaBay: While other journalists were scrambling to link Alexandre Cazes to AlphaBay using material that was still online, I searched a huge archive of dark net markets scraped between 2013 and 2015 and found clear evidence that Cazes was Alpha02, the market’s founder.

What was the impact of your project? How did you measure it?

Project 1, paedophiles: our investigation led to the identification of 50 users of dark net paedophile forums, including a former employee of a children’s hospital and a former school governor. We handed a detailed file on each user to the National Crime Agency for use in making arrests. In response to our investigation, the National Association for People Abused in Childhood called for police to be given more technical resources to tackle online child abuse.
Project 2, hackers: on the day before publication, the National Cyber Security Centre downplayed the significance of our findings, emphasising that the data had been obtained in breaches that occurred some time ago. The following day, after the story had appeared on the front page, the parliamentary email system was subjected to a sustained cyber-attack later attributed to Iran. A number of MPs had their emails accessed. The method of the attack strongly implied that it had relied on lists of email addresses and passwords similar, if not identical, to those reported in my story.
Project 3, land banking: appearing on Budget Day, this story was picked up by opposition Labour Party politicians, including the shadow chancellor, and used to discredit Philip Hammond’s announcement of an inquiry into ‘land banking’. John Humphrys put the story to Hammond in a primetime interview on the Today programme, which would have been heard by millions of listeners.

Source and methodology

Project 1, paedophiles: we discovered leaked data posted online by hackers who had breached the hosting provider of a number of major child abuse forums. The data included SQL databases containing users’ email addresses and other details, which we linked to real identities using services such as FullContact and the ‘find my friends’ features on social networks.
Project 2, hackers: a database of email addresses and passwords was compiled by hackers from previous breaches of sites such as LinkedIn. I discovered this database on a hacking forum and used a SQL database to extract credentials belonging to government officials, matching them against a list of government domain names.
Project 3, land banking: the Land Registry maintains a comprehensive database of land owned by companies in the UK. I used R to extract property owned by Castlemead, Philip Hammond’s company, and compared this manually with satellite imagery of the relevant sites and planning permission documents published on local council websites.
Project 4, halal: I asked the Food Standards Agency to release historical data on inspections of slaughterhouses. I combined this with data from the Halal Monitoring Committee showing accredited non-stun slaughterhouses.
Project 5, trains: I wrote a web scraper in R to download data from http://www.realtimetrains.co.uk/ for every UK train station over a five-week period. I then analysed this data in R to generate delay statistics for each station, and to look at delays by day of the week, time of day, etc.
Project 6, church land: I extracted all land owned by the Church Commissioners for England from the Land Registry commercial and corporate ownership data. I then matched the land identified with shapefiles of fracking licence areas published by the Oil and Gas Authority.
Project 7, AlphaBay: I used an archive of scraped dark net markets maintained by the researcher Gwern Branwen (http://www.gwern.net/DNM-archives), crawling it for relevant information using Python scripts.
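
The Project 5 analysis described above comes down to grouping train services by station (or by day of the week) and working out the share that ran late. What follows is a minimal sketch of that idea in Python rather than the R used for the project. The record layout, station codes, sample rows and five-minute lateness threshold are all assumptions made for illustration, not the project's actual data or definitions.

```python
from collections import defaultdict
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M"
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical records: (station code, scheduled arrival, actual arrival).
SAMPLE_SERVICES = [
    ("AAA", "2017-10-02 08:00", "2017-10-02 08:07"),
    ("AAA", "2017-10-02 09:00", "2017-10-02 09:01"),
    ("AAA", "2017-10-03 08:00", "2017-10-03 08:00"),
    ("BBB", "2017-10-02 08:30", "2017-10-02 08:45"),
    ("BBB", "2017-10-07 10:00", "2017-10-07 10:02"),
]

# Assumed definition of "delayed": five or more minutes behind schedule.
LATE_THRESHOLD_MINUTES = 5


def delay_minutes(scheduled, actual):
    """Minutes between the scheduled and actual arrival times."""
    delta = (datetime.strptime(actual, TIME_FORMAT)
             - datetime.strptime(scheduled, TIME_FORMAT))
    return delta.total_seconds() / 60


def by_station(station, scheduled):
    """Grouping key: the station code."""
    return station


def by_weekday(station, scheduled):
    """Grouping key: the weekday name of the scheduled arrival."""
    return WEEKDAYS[datetime.strptime(scheduled, TIME_FORMAT).weekday()]


def share_late(services, key):
    """Return {group: fraction of services that were late} for a grouping key."""
    totals = defaultdict(int)
    late = defaultdict(int)
    for station, scheduled, actual in services:
        group = key(station, scheduled)
        totals[group] += 1
        if delay_minutes(scheduled, actual) >= LATE_THRESHOLD_MINUTES:
            late[group] += 1
    return {group: late[group] / totals[group] for group in totals}


station_stats = share_late(SAMPLE_SERVICES, by_station)
weekday_stats = share_late(SAMPLE_SERVICES, by_weekday)
```

Passing the grouping key as a function means the same counting loop serves both the per-station figures and the breakdown by day of the week.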

Technologies Used

For all projects, I worked primarily in R, my main language. I loaded large datasets unsuitable for in-memory access, such as the Land Registry data, into SQLite files and connected to them through the dbplyr package. Other frequently used packages include dplyr, readr, tidyr, purrr, lubridate and rvest.
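
The SQLite workflow described above can be illustrated with a short sketch, here using Python's built-in sqlite3 module as a stand-in for R and dbplyr. The table name, column names and sample rows are hypothetical. The point is that the filter runs inside SQLite, so only the matching rows are pulled into memory.

```python
import sqlite3

# Hypothetical rows standing in for a large ownership dataset.
SAMPLE_ROWS = [
    ("T001", "EXAMPLE HOLDINGS LTD", "Essex"),
    ("T002", "ANOTHER COMPANY LTD", "Kent"),
    ("T003", "EXAMPLE HOLDINGS LTD", "Kent"),
]


def build_database(path=":memory:"):
    """Create the table, load the rows and index the lookup column.

    The default keeps the sketch self-contained; for a real dataset,
    pass a file path so the data lives on disk rather than in memory.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE titles (title_number TEXT, proprietor TEXT, county TEXT)"
    )
    conn.executemany("INSERT INTO titles VALUES (?, ?, ?)", SAMPLE_ROWS)
    # An index on the filter column avoids a full table scan on each query.
    conn.execute("CREATE INDEX idx_proprietor ON titles (proprietor)")
    conn.commit()
    return conn


def titles_owned_by(conn, proprietor):
    """Return (title_number, county) pairs for one proprietor."""
    cursor = conn.execute(
        "SELECT title_number, county FROM titles "
        "WHERE proprietor = ? ORDER BY title_number",
        (proprietor,),
    )
    return cursor.fetchall()


conn = build_database()
example_titles = titles_owned_by(conn, "EXAMPLE HOLDINGS LTD")
```

The query uses a bound parameter (`?`) rather than string formatting, which keeps company names containing quotes or other special characters from breaking the SQL.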

Project members

Tom Wills
George Arbuthnott
Andrew Gilligan
Sam Joiner
Kaya Burgess
