Project description

Every year since 1996, the National Statistics Institute (INE) has published detailed information about births, deaths, marriages and vegetative growth for each of the more than 8,100 municipalities in Spain. In just 24 hours, we compiled, analysed and visualized data from 1,144 CSV files (52 per year, one for each Spanish province, covering 1996 to 2017) to publish a report about the increase in negative vegetative growth over the last 20 years. Most of the municipalities affected are located in the Northwest of Spain.

What makes this project innovative?

The innovative phase of this project was the systematic collection of the 1,144 CSV files published by the INE (source: http://www.ine.es/dynt3/inebase/es/index.htm?padre=3413). First, we gathered all the data from the INE with a Python library called Requests, which lets you send multiple requests to a domain and download the content. We wanted data for the 52 provinces in Spain, so this approach made it quick and easy to obtain the 1,144 CSV files. After that, we combined all the data into a single CSV file with the bulk_csv function of the R package muckrakr, developed by Andrew Ba Tran. Then we cleaned and analysed the data in Excel and exported it to a large JSON file with an array for each town containing its births and deaths from 1996 to 2017. Finally, we showed the result in a column chart developed with JavaScript and Highcharts. The whole process took about 24 hours: the INE published the data at noon on 11 December and our report was published in El Confidencial on the afternoon of 12 December. Because we had not planned this report and only started scraping the CSV files after the INE published them, this project could also fit the best use of data in a breaking news story category or the open data award category.
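
As an illustration of the collection step, here is a minimal Python sketch of how one file per province and year could be fetched with a Requests session. It is not the script used for the report: the URL template, the file names and the helper names build_jobs and download_all are hypothetical, since the actual INE download addresses are not given here. Only the counts (52 provinces, 1996 to 2017) come from the description above.

```python
from pathlib import Path

# Placeholder only: the real INE download URLs are not stated in the text.
URL_TEMPLATE = "https://example.org/ine/{year}/province_{province:02d}.csv"
YEARS = range(1996, 2018)   # 1996..2017 inclusive, 22 years
PROVINCES = range(1, 53)    # 52 provinces, numbered 1..52 for this sketch


def build_jobs(years=YEARS, provinces=PROVINCES):
    """Return one (url, filename) pair per year and province."""
    return [
        (URL_TEMPLATE.format(year=year, province=province),
         f"{year}_{province:02d}.csv")
        for year in years
        for province in provinces
    ]


def download_all(jobs, session, out_dir):
    """Fetch every job with a Requests-style session and save it in out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for url, filename in jobs:
        response = session.get(url, timeout=30)
        response.raise_for_status()  # stop on HTTP errors instead of saving them
        target = out_dir / filename
        target.write_bytes(response.content)
        saved.append(target)
    return saved
```

With the Requests library installed, download_all(build_jobs(), requests.Session(), "ine_csv") would request 22 x 52 = 1,144 files. Passing the session in as an argument also makes it possible to try the function with a stand-in object before hitting the real server.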

What was the impact of your project? How did you measure it?

This project went viral on Twitter, especially the map showing more than two decades of negative vegetative growth in the Northwest of Spain. Spaniards are aware of the aging of the Northwest and of rural areas, but seeing it on a map that condenses 20 years of information had a great impact.

Source and methodology

The innovative phase of this project was the systematic collection of the 1,144 CSV files published by the INE (source: http://www.ine.es/dynt3/inebase/es/index.htm?padre=3413). First, we gathered all the data from the INE with a Python library called Requests, which lets you send multiple requests to a domain and download the content. We wanted data for the 52 provinces in Spain, so this approach made it quick and easy to obtain the 1,144 CSV files. After that, we combined all the data into a single CSV file with the bulk_csv function of the R package muckrakr, developed by Andrew Ba Tran. Then we cleaned and analysed the data in Excel and exported it to a large JSON file with an array for each town containing its births and deaths from 1996 to 2017. Finally, we showed the result in a column chart developed with JavaScript and Highcharts, and we produced the map with QGIS and Adobe Illustrator. The three other infographics were produced with Datawrapper.
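
The merge and export steps can be sketched in Python as well. This is a stand-in written with the standard library, not the R (bulk_csv) and Excel workflow described above. The column names municipality, year, births and deaths and all function names are assumptions made for the example, as the layout of the INE files is not described here. Vegetative growth is computed as births minus deaths.

```python
import csv
import json
from collections import defaultdict

YEARS = list(range(1996, 2018))  # 1996..2017 inclusive


def merge_csvs(paths):
    """Read every CSV file and return all rows as one list of dicts."""
    rows = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


def build_town_series(rows, years=YEARS):
    """Group rows by town into year-ordered births and deaths arrays."""
    index = {year: i for i, year in enumerate(years)}
    towns = defaultdict(lambda: {"births": [None] * len(years),
                                 "deaths": [None] * len(years)})
    for row in rows:
        position = index.get(int(row["year"]))
        if position is None:
            continue  # skip years outside the requested range
        series = towns[row["municipality"]]  # assumed column names
        series["births"][position] = int(row["births"])
        series["deaths"][position] = int(row["deaths"])
    return dict(towns)


def vegetative_growth(series):
    """Births minus deaths per year; None where either value is missing."""
    return [
        None if births is None or deaths is None else births - deaths
        for births, deaths in zip(series["births"], series["deaths"])
    ]


def export_json(towns, path):
    """Write the per-town arrays to a single JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(towns, handle, ensure_ascii=False)
```

Keeping one fixed-length array per town, ordered by year, means a charting library can plot the values directly against the list of years, and missing years stay visible as null in the JSON output.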

Technologies Used

Python, R, Excel, JavaScript, Highcharts, QGIS, Adobe Illustrator and Datawrapper.

Project members

Antonio Hernández, Daniele Grasso
