We collected data on the share investments of the 17 largest Danish pension funds, imported it from different sources (CSV, PDF, XLSX) into one file and cleaned it. The next step was to collect the exclusion lists from the same pension funds. Each fund keeps a list of companies it does not want to invest in, and these lists are published online. We imported the exclusion lists, mostly from PDFs, and cross-referenced them with the investment data to see which holdings of each fund had been excluded by the other funds, and for what reasons. In this way we identified 357 billion DKK in total investments, and out of this more than 42 billion DKK (6.4 percent of the total investments) in controversial companies dealing with nuclear weapons, other weapons, tobacco, coal, tar sands etc. The purpose of the story was to shed light on how the pension funds use their enormous collective wealth and to focus attention on the controversial shares they own. We have different audiences: first of all our regular readers (we are a small Danish daily print newspaper), but also an online audience. We share our data freely with readers, other media, organisations etc.
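The cross-referencing step amounts to a join between the holdings and the exclusion lists. The Python sketch below illustrates that logic; it is not the newsroom's actual workflow, which used OpenRefine and Excel. The record layout, the names `Holding`, `Exclusion`, `flag_excluded_by_others`, `summarize` and `value_by_reason`, and the toy data are all hypothetical. The sketch also assumes company names have already been cleaned so that the same company is spelled identically across funds.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Holding:
    fund: str         # pension fund that owns the shares
    company: str      # company name, already cleaned and reconciled
    value_dkk: float  # market value of the holding in DKK


@dataclass(frozen=True)
class Exclusion:
    fund: str     # pension fund that has excluded the company
    company: str  # company name, spelled as in the holdings data
    reason: str   # stated reason, e.g. "tobacco" or "coal"


def flag_excluded_by_others(holdings, exclusions):
    """Return (holding, reasons) pairs for holdings that another fund has excluded."""
    excluded_by = defaultdict(list)  # company -> [(excluding fund, reason), ...]
    for ex in exclusions:
        excluded_by[ex.company].append((ex.fund, ex.reason))
    flagged = []
    for h in holdings:
        # Ignore a fund's own exclusions; only other funds' lists count here.
        reasons = sorted({r for f, r in excluded_by.get(h.company, []) if f != h.fund})
        if reasons:
            flagged.append((h, reasons))
    return flagged


def summarize(holdings, flagged):
    """Total value, flagged value and the flagged share of the total."""
    total = sum(h.value_dkk for h in holdings)
    controversial = sum(h.value_dkk for h, _ in flagged)
    return total, controversial, (controversial / total if total else 0.0)


def value_by_reason(flagged):
    """Flagged value per reason; a holding with several reasons counts under each."""
    totals = defaultdict(float)
    for h, reasons in flagged:
        for reason in reasons:
            totals[reason] += h.value_dkk
    return dict(totals)


# Hypothetical toy data, not figures from the project
holdings = [
    Holding("Fund A", "Acme Tobacco", 100.0),
    Holding("Fund A", "Green Energy", 300.0),
    Holding("Fund B", "Acme Tobacco", 50.0),
    Holding("Fund B", "Coal Corp", 50.0),
]
exclusions = [
    Exclusion("Fund B", "Acme Tobacco", "tobacco"),
    Exclusion("Fund A", "Coal Corp", "coal"),
]
flagged = flag_excluded_by_others(holdings, exclusions)
print(summarize(holdings, flagged))  # (500.0, 150.0, 0.3)
print(value_by_reason(flagged))      # {'tobacco': 100.0, 'coal': 50.0}
```

In the toy data, Fund B's own holding in Acme Tobacco is not flagged because only Fund B itself excludes that company, which matches the "excluded by the other funds" criterion. Because a company can be excluded for several reasons, the per-reason totals can add up to more than the overall flagged value.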
What makes this project innovative?
The data work on this story was extremely complicated. We spent months collecting, cleaning, compiling and manually verifying the data, using Tabula to extract tables from PDFs, OpenRefine to clean the data and Excel to do the calculations. After the cleanup we "invited the pension funds into the vault": we shared our entire dataset with them and had them verify it. Based on the data we researched a number of stories. When the first article in the series was published, we published the entire dataset online at the same time (here: https://data.information.dk/2019/03/dataviz-pension/public/26MAR19Data_til_offentliggoerelse.xlsx). This way we invited other media, organisations, readers etc. to do their own investigations and find stories in the data.
What was the impact of your project? How did you measure it?
Since we contacted the Danish pension funds with our data, they have sold off 500 million DKK (66.8 million Euro) worth of controversial shares, and some funds have publicly confirmed that our coverage has made them change their policy on what kinds of shares they will invest in. In an interview, one fund says that it will no longer invest in the development of nuclear weapons, as a direct reaction to our articles (https://www.information.dk/indland/2019/04/paedagogernes-pension-markant-kursskifte-slut-atomvaaben). Another fund says it will no longer invest in companies that supply weapons to Saudi Arabia, again as a direct reaction to the coverage (https://www.information.dk/indland/2019/04/danica-droppe-investeringer-selskaber-eksporterer-vaaben-saudi-arabien). There were many reactions on social media, with discussions for and against the types of investments we described. We also set up the interactive so that users could react directly and write to their pension funds from the website. This has led to a lot of direct dialogue between our users and their pension funds; some users have forwarded the replies they received from the funds, and we have interviewed several of the users who opened a dialogue with their pension funds. Several organisations used our data to start a public debate on how pension funds invest in potentially controversial industries.
Source and methodology
Most, but not all, Danish pension funds publish lists of their investments online. We downloaded the lists that were available and requested lists from the funds that don't publish them. The lists came in different formats, so we had to extract the data; most of this work was done with Tabula, which can extract tables from PDF files. The data was then imported into OpenRefine and cleaned there. One issue was that the 17 pension funds we looked at all name their shareholdings differently. We used OpenRefine's facets and clustering tools to reconcile the names and then checked the data manually. All the compiled data was then shared with the pension funds so that they could verify their own investments in the dataset. In this process some of the funds discovered that they had published incorrect investment lists, so we obtained the correct data, imported and cleaned it again, and shared it with the funds once more to verify that all investment data in our dataset was correct. As stated above, we have published the dataset in its entirety, both in the hope that other media will use it and in the hope that readers and organisations can check the data and point us to stories we might have missed. Reactions came quickly, and we have already received tips, comments and even alternative data visualizations from readers.
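The name-reconciliation step can be illustrated with OpenRefine's default clustering method, key collision with the "fingerprint" keying function: names are trimmed, lowercased, stripped of punctuation, folded to ASCII, split into tokens that are sorted and de-duplicated, and names whose keys collide become cluster candidates. The text above does not say which of OpenRefine's clustering methods was used, so the Python sketch below is only an approximation of that idea, not OpenRefine's code. The function names `fingerprint` and `cluster`, the extra mapping for Danish "ø" and "æ" (which Unicode normalisation does not fold), and the sample names are our own assumptions.

```python
import re
import unicodedata
from collections import defaultdict

# Letters that Unicode normalisation does not fold to ASCII (our own choice)
EXTRA_ASCII = str.maketrans({"ø": "o", "æ": "ae"})


def fingerprint(name: str) -> str:
    """Approximation of OpenRefine's 'fingerprint' key for a company name."""
    s = name.strip().lower().translate(EXTRA_ASCII)
    # Strip accents: decompose characters, then drop the combining marks ("å" -> "a")
    s = "".join(ch for ch in unicodedata.normalize("NFKD", s) if not unicodedata.combining(ch))
    s = re.sub(r"[^\w\s]", "", s)    # remove punctuation such as "." "," "/"
    tokens = sorted(set(s.split()))  # order-insensitive, duplicates removed
    return " ".join(tokens)


def cluster(names):
    """Group names whose fingerprints collide; singletons are left out."""
    groups = defaultdict(list)
    for name in names:
        groups[fingerprint(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical spellings, not taken from the funds' lists
names = [
    "Royal Dutch Shell PLC",
    "Shell, Royal Dutch plc",
    "Ørsted A/S",
    "ORSTED AS",
    "Novo Nordisk B",
]
for group in cluster(names):
    print(group)
```

Run as is, this should print two candidate clusters, the two Shell spellings and the two Ørsted spellings, while "Novo Nordisk B" stays on its own. Key collision only proposes merges. Different companies can end up with the same key, and variants such as a hyphenated name can end up with different keys, so a manual check like the one described above is still needed.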
Tabula (freeware) was important for extracting tables from PDFs: http://schoolofdata.org/handbook/recipes/extracting-data-from-pdf-with-tabula/
OpenRefine (freeware) did the heavy lifting: https://libjohn.github.io/openrefine/preamble.html
Additional scraping was done in OutWit Hub (not freeware).
Regex searches etc. in the data were done with Tika and Sublime Text (excellent description here: https://gijn.org/2016/06/27/a-poor-journalists-text-mining-toolkit/).
Calculations, pivots etc. were done in Excel (not freeware).
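The project ran its regex searches in Tika and Sublime Text. The Python sketch below only illustrates the kind of pattern such a search might use on text extracted from a PDF holdings list. The line layout (a company name, whitespace, then an amount in Danish number format with "." as the thousands separator and "," as the decimal separator), the names `LINE`, `parse_danish_number` and `parse_line`, and the sample lines are all hypothetical.

```python
import re

# Hypothetical layout: company name, whitespace, then a Danish-format amount
# such as "1.234.567,89" ("." thousands separator, "," decimal separator).
LINE = re.compile(r"^(?P<name>.+?)\s+(?P<amount>\d+(?:\.\d{3})*(?:,\d+)?)$")


def parse_danish_number(text: str) -> float:
    """Convert e.g. '1.234.567,89' to 1234567.89."""
    return float(text.replace(".", "").replace(",", "."))


def parse_line(line: str):
    """Return (name, amount) for a holdings line, or None if it does not match."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    return match.group("name"), parse_danish_number(match.group("amount"))


# Hypothetical extracted text: a header line followed by two holdings
text = """Selskab    Markedsværdi (DKK)
Novo Nordisk A/S   1.234.567,89
Ørsted A/S    98.765,00"""
rows = [row for row in map(parse_line, text.splitlines()) if row is not None]
print(rows)  # [('Novo Nordisk A/S', 1234567.89), ('Ørsted A/S', 98765.0)]
```

The pattern reads "1.234" as 1234, following the Danish convention. A list that used English number formatting would need a different pattern, which is one reason extracted values still have to be checked against the source.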
Sebastian Gjerding, Lasse Skou Andersen, Bo Elkjaer (All three from Dagbladet Information)