If you’re like me, you’ve lived through many moments of extreme professional frustration.
You know that somewhere, a long time ago, you came across the name of an important person, place or company as part of your research or reporting. The problem is, you can’t remember where you saw it.
The idea of re-reading hundreds, even thousands, of pages of documents to find that one piece of information seems overwhelming.
Built by the International Consortium of Investigative Journalists, Datashare is an application that allows you to efficiently search and organize your documents.
Datashare is built on some of the same technology that helped ICIJ produce its biggest projects, like Panama Papers and Paradise Papers – but rather than rely on ICIJ’s servers, Datashare can be installed on your own computer.
ICIJ has big plans for Datashare. In the future, you will be able to use the program to collaborate securely with reporters, or anyone else, around the world – without needing ICIJ’s data team to index hundreds of files.
A word of warning, from one reporter – not data engineer – to another. Datashare is currently in testing mode. Techies call it “beta.”
That means that you will almost certainly experience glitches and moments where the program doesn’t work. I’ve already downloaded it 17 times.
At this point, you might be tempted to sashay away. But stick with us! Channel your frustration into helping ICIJ make the best tool possible.
Think of Datashare as an experiment and of yourselves as testers.
If you do encounter a problem, send us an email: email@example.com. Or, if you are more technically inclined, then you can contribute to the development of Datashare (and leave any issues) on our GitHub repository.
Be as precise as you can: tell us what kind of computer you are on, what error message you see (or what you don’t see) or send a screenshot of the problematic page.
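If you happen to have Python installed, a few lines can gather the "what kind of computer" details for your email. This is only a sketch for convenience, not something ICIJ asks for: the function name `describe_computer` is made up for this example, and it relies solely on Python's standard `platform` module.

```python
import platform


def describe_computer() -> dict:
    """Collect basic facts about this computer to paste into a bug report.

    Hypothetical helper for illustration; it is not part of Datashare.
    """
    return {
        # "Windows", "Darwin" (which means macOS) or "Linux"
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "os_version": platform.version(),
        # Processor family, for example "x86_64" or "arm64"
        "machine": platform.machine(),
        "python_version": platform.python_version(),
    }


# Print one "name: value" line per fact, ready to copy into an email.
for name, value in describe_computer().items():
    print(f"{name}: {value}")
```

Paste the printed lines into your message alongside the error text or screenshot.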
Now that you’re both excited and forewarned, let’s get started.
Step 1. Download Datashare
Enjoy the purplish-pink. At the very least, it will get you in the mood for dancing.
Also, if you’re using a Mac, you may get told the application doesn’t come from a ‘verified developer.’ So you’ll have to take an extra step and jump into your security preferences to allow it. There are more details on that here.
Welcome to Datashare.
Step 2. Welcome the whale
Installing Datashare necessarily comes with a whale. Call it Ishmael, if you like. Otherwise, it’s known as Docker.
Docker, which will appear on your computer as a blue or white whale, is a program that allows you to run applications in isolation from the rest of your computer system. There’s a more technical explanation available if you want to find out more.
Step 3. Restart your computer
Once you’re back up and running, the small whale will appear at the bottom of your screen if you’re using a Windows computer, where it is now white (image 1). If you’re on a Mac, it’ll appear in the top menu bar (image 2). It looks like it’s bubbling from multiple blowholes.
The Docker whale on Windows (left) and Mac (right).
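If you can't spot the whale, you can also ask Docker directly whether it is up. The sketch below assumes the `docker` command-line tool was installed along with Docker and is on your PATH, which may not hold for every setup. It runs the standard `docker info` command, which typically exits with an error when Docker isn't running. The function name `docker_is_running` is invented for this example.

```python
import shutil
import subprocess


def docker_is_running(timeout_seconds: float = 30.0) -> bool:
    """Return True if the `docker info` command succeeds, False otherwise.

    Hypothetical helper for illustration; it is not part of Datashare.
    """
    # If the docker command isn't installed (or not on PATH), stop here.
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,  # we only care about success or failure
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if docker_is_running():
    print("Docker is running.")
else:
    print("Docker is not running (or not installed).")
```

If it reports that Docker is not running, try starting Docker from your applications menu and wait for the whale to settle before opening Datashare.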
Step 4. Open Datashare
Open Datashare just like you would any other program.
It will take some time to load when you open Datashare for the first time. Your computer might start whirring; the whale, or Docker, may reappear. If you use Windows, it will ask for access to your C Drive. Say “yes.”
Then, hey presto, Datashare will launch in an internet browser window.
Get ready to search your documents.
Step 5. Add documents
Datashare can’t do its job until it knows what documents to search.
Emails, court transcripts, academic articles, your typed interview notes, scans of foreign language reports – go crazy.
But not too crazy.
Remember that Datashare is still in testing mode. Don’t add too many documents or you risk breaking it and having to start again. Start with one or two files: a 50-page court document, say, or a boring parliamentary report you would like to be able to skim-read again.
Datashare created an empty folder on your computer during the installation process (in Windows, this folder is on your desktop). Put the documents you want to search into the Datashare folder (the same way you’d move any documents around on your computer).
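Dragging and dropping works fine, but if you prefer a script, here is a minimal sketch that copies a handful of files into the Datashare folder. The folder location is an assumption you must fill in yourself (on Windows the article says it sits on your desktop; elsewhere, check where the installer put it), and `copy_documents` is a name made up for this example.

```python
import shutil
from pathlib import Path


def copy_documents(sources, datashare_folder):
    """Copy each source file into the Datashare folder and return the new paths.

    Hypothetical helper for illustration; it is not part of Datashare.
    Files whose names already exist in the folder are skipped, not overwritten.
    """
    destination = Path(datashare_folder)
    if not destination.is_dir():
        raise FileNotFoundError(f"Datashare folder not found: {destination}")

    copied = []
    for source in sources:
        source = Path(source)
        target = destination / source.name
        if target.exists():
            continue  # leave the existing copy alone
        shutil.copy2(source, target)  # copy2 also keeps the file's timestamps
        copied.append(target)
    return copied


# Example call (both paths are placeholders; replace them with your own):
# copy_documents(["court_document.pdf"], Path.home() / "Desktop" / "Datashare")
```

Copying, not moving, keeps your originals where they were, which is a sensible precaution while the program is still in beta.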
Don’t forget that if you are lost, there’s a “How to add documents” button on the Datashare page in your browser.
Step 6. Analyze your documents
For mere mortals, the “Analyze documents” button tells Datashare to scan the documents you just put in the Datashare folder. For the technical demi-gods… well, you should read the GitHub for more details on what’s powering this beast.
After you click “Analyze documents,” click “Extract text.” Select “Yes” when it asks if you wish to extract text from images and PDFs.
The magic: extract text from your documents so you can easily find names, locations and more!
This will allow Datashare to read through all the documents in your Datashare folder and turn them into text that can be read by your computer.
One more time, be patient (unlike me). You’ll see a bar grow as the extraction completes. Wait until it is at 100% and marked as “done” before you move on.
Step 7. Wait. Then click “find people, organizations and locations.”
You can now click the neighboring button: “find people, organizations and locations.”
This is one of Datashare’s best functions!
If it works, it will allow you to see at a glance the names of every person, company, organization, country, city and town within your document. No more scrolling through endless pages of documents to find the one reference that you just know is in there.
Here’s an example of my own.
I’m a pretty regular reader of United Nations reports. Fun, right? Many such reports have provided leads for me to chase down names in the Panama Papers and Paradise Papers.
Some of these reports are long. Other important reports are old and can’t be searched with a simple Control/Command + F.
Take, for example, this one 38-page document from 2002 that details allegations about the illegal exploitation of natural resources in the Democratic Republic of Congo.
On the left, you’ll see one page of the original. Dense text, right?
On the right, you’ll see what Datashare has done to the report.
Every city or town named in the report is highlighted in green. The name of every person is highlighted in pink. The name of every organization appears in yellow.
A document (left) now indexed by Datashare (right) is much easier to read and quick to search.
It’s not perfect. The Congolese mining company “Gecamines,” for example, is labeled as a location.
But it’s a start – and a real timesaver.
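To make the idea of labeled names concrete, here is a toy sketch of the marking-up step. It is emphatically not how Datashare works under the hood: Datashare finds the names by itself, whereas this example is handed the names and their categories in advance. The sample sentence, the label list and the function name `highlight` are all invented for illustration, and square brackets stand in for the colors.

```python
import re


def highlight(text, entities):
    """Wrap every known name in the text with its category label.

    `entities` maps a name to a category, e.g. {"Kinshasa": "LOCATION"}.
    Toy illustration only; it does no name detection of its own.
    """
    if not entities:
        return text
    # Try longer names first so a short name never splits a longer one.
    names = sorted(entities, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return pattern.sub(
        lambda match: f"[{entities[match.group(0)]}: {match.group(0)}]",
        text,
    )


# Invented sample sentence and hand-written labels.
sample = "Gecamines signed the contract in Kinshasa."
labels = {"Gecamines": "ORGANIZATION", "Kinshasa": "LOCATION"}
print(highlight(sample, labels))
# prints: [ORGANIZATION: Gecamines] signed the contract in [LOCATION: Kinshasa].
```

Change the label for "Gecamines" to "LOCATION" and you reproduce the kind of mislabeling described above, which shows why it still pays to read the highlighted results with a critical eye.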
Step 8. Give it a whirl
The more people who try Datashare now, the better the program will be.
Remember, you can write to firstname.lastname@example.org with any questions or to report any bugs.
With your input, we’ll be working on Datashare throughout 2019 and beyond, so stay tuned for updates!
We’ll post our latest developments on our GitHub, and will keep our community of testers and readers informed of major releases or new features, so please keep checking back and sign up for email updates from ICIJ.
This article was originally published by the International Consortium of Investigative Journalists (ICIJ). It was edited and republished on the Data Journalism Awards website with permission.